I have a requirement: there will be a number of .txt files in one location, c:\onelocation, and I want to write their content to another location in txt format. That part is easy and straightforward, but there is a catch.
There is a time interval, say 120 seconds. Read the files from the location above and write their content to an output txt file for 120 seconds, then save that file with a timestamp as its name.
After 120 seconds, create another file named with the new timestamp, but continue reading from where the cursor stopped during the previous interval.
Can you suggest any ideas? Code would also be appreciated.
Thanks, Damu.
How about this? A writer that automatically changes where it is writing to every 120 seconds.
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
public class TimeBoxedWriter extends Writer {
private static DateFormat FORMAT = new SimpleDateFormat("yyyyDDDHHmm");
/** Milliseconds to each time box */
private static final int TIME_BOX = 120000;
/** For testing only */
public static void main(String[] args) throws IOException {
Writer w = new TimeBoxedWriter(new File("."), "test");
// write one line per second for 500 seconds.
for(int i = 0;i < 500;i++) {
w.write("testing " + i + "\n");
try {
Thread.sleep(1000);
} catch (InterruptedException ie) {}
}
w.close();
}
/** Output folder */
private File dir_;
/** Timestamp for current file */
private long stamp_ = 0;
/** Stem for output files */
private String stem_;
/** Current output writer */
private Writer writer_ = null;
/**
* Create new output writer
*
* @param dir
* the output folder
* @param stem
* the stem used to generate file names
*/
public TimeBoxedWriter(File dir, String stem) {
dir_ = dir;
stem_ = stem;
}
@Override
public void close() throws IOException {
synchronized (lock) {
if (writer_ != null) {
writer_.close();
writer_ = null;
}
}
}
@Override
public void flush() throws IOException {
synchronized (lock) {
if (writer_ != null) writer_.flush();
}
}
private void rollover() throws IOException {
synchronized (lock) {
long now = System.currentTimeMillis();
// roll over once the current time box has fully elapsed
if ((stamp_ + TIME_BOX) <= now) {
if (writer_ != null) {
writer_.flush();
writer_.close();
}
// align the stamp to the start of the current time box
stamp_ = TIME_BOX * (now / TIME_BOX);
String time = FORMAT.format(new Date(stamp_));
writer_ = new FileWriter(new File(dir_, stem_ + "." + time
+ ".txt"));
}
}
}
@Override
public void write(char[] cbuf, int off, int len) throws IOException {
synchronized (lock) {
rollover();
writer_.write(cbuf, off, len);
}
}
}
Use RandomAccessFile in Java to move the cursor within the file.
Before you start copying, check the file's modification time (or creation time for new files). Copy the file only if that time is less than 2 minutes old; otherwise skip it.
Keep a counter of the number of bytes or lines read for each file, move the cursor to that position, and continue reading from there.
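The "remember the position and seek back to it" idea could be sketched roughly like this. It is only a sketch under some assumptions: the class name ResumableReader and its offset field are made up for illustration, the files are assumed to be UTF-8 text that is only ever appended to, and the new content between two calls is assumed to fit in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ResumableReader {
    /** Byte offset where the previous read stopped (hypothetical bookkeeping field). */
    private long offset = 0;

    /** Returns everything appended to the file since the last call and advances the offset. */
    public String readNewContent(Path source) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(source.toFile(), "r")) {
            long length = raf.length();
            if (length <= offset) {
                return ""; // nothing new since the last call
            }
            // assumption: the unread part is small enough for one in-memory array
            byte[] chunk = new byte[(int) (length - offset)];
            raf.seek(offset);     // move the cursor to where we stopped
            raf.readFully(chunk); // read exactly the unread bytes
            offset = length;      // remember the new position
            return new String(chunk, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("resume", ".txt");
        ResumableReader reader = new ResumableReader();
        Files.write(file, "first\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(reader.readNewContent(file));
        Files.write(file, "second\n".getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);
        System.out.print(reader.readNewContent(file));
        Files.delete(file);
    }
}
```

With several input files you would keep one such reader (or one offset in a map) per file and call readNewContent once per interval.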
You can duplicate the file rather than using reading and writing operations.
sample code:
FileChannel ic = new FileInputStream("<source file location>").getChannel();
FileChannel oc = new FileOutputStream("<destination location>").getChannel();
ic.transferTo(0, ic.size(), oc);
ic.close();
oc.close();
HTH
File I/O is simple in Java. Here is an example I found on the web that copies one file to another.
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
public class Copy {
public static void main(String[] args) throws IOException {
File inputFile = new File("farrago.txt");
File outputFile = new File("outagain.txt");
FileReader in = new FileReader(inputFile);
FileWriter out = new FileWriter(outputFile);
int c;
while ((c = in.read()) != -1)
out.write(c);
in.close();
out.close();
}
}
folder structure is here
console output is here
I'd like to write a test class for the 2 methods below
package jfe;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZFile;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.utils.IOUtils;
public class JThreadFile {
/**
* uncompresses .tar file
* @param in
* @param out
* @throws IOException
*/
public static void decompressTar(String in, File out) throws IOException {
try (TarArchiveInputStream tin = new TarArchiveInputStream(new FileInputStream(in))){
TarArchiveEntry entry;
while ((entry = tin.getNextTarEntry()) != null) {
if (entry.isDirectory()) {
continue;
}
File curfile = new File(out, entry.getName());
File parent = curfile.getParentFile();
if (!parent.exists()) {
parent.mkdirs();
}
// close each output stream once the entry has been copied
try (FileOutputStream fos = new FileOutputStream(curfile)) {
IOUtils.copy(tin, fos);
}
}
}
}
/**
* uncompresses .7z file
* @param in
* @param destination
* @throws IOException
*/
public static void decompressSevenz(String in, File destination) throws IOException {
//@SuppressWarnings("resource")
SevenZFile sevenZFile = new SevenZFile(new File(in));
SevenZArchiveEntry entry;
while ((entry = sevenZFile.getNextEntry()) != null){
if (entry.isDirectory()){
continue;
}
File curfile = new File(destination, entry.getName());
File parent = curfile.getParentFile();
if (!parent.exists()) {
parent.mkdirs();
}
try (FileOutputStream out = new FileOutputStream(curfile)) {
// read in chunks: a single read() call is not guaranteed to fill the buffer
byte[] buffer = new byte[8192];
int n;
while ((n = sevenZFile.read(buffer)) != -1) {
out.write(buffer, 0, n);
}
}
}
sevenZFile.close();
}
}
I use TestNG and try to read from a folder, filter the folder for certain extensions (.tar and .7z), feed those files to the uncompress methods, and compare the result to the expected output folder with assertEquals. I manage to read the file names from the folder (see console output) but can't feed them to decompressTar(String in, File out). Is this because "result" is a String array and I need a String? I have no clue how the DataProvider of TestNG handles data. Any help would be appreciated :) Thank you :)
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import org.testng.Assert;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;
public class JThreadFileTest {
protected static final File ACT_INPUT = new File("c:\\Test\\TestInput\\"); //input directory
protected static final File EXP_OUTPUT = new File("c:\\Test\\ExpectedOutput\\"); //expected output directory
protected static final File TEST_OUTPUT = new File("c:\\Test\\TestOutput\\");
@DataProvider(name = "tarJobs")
public Object[] getTarJobs() {
//1
String[] tarFiles = ACT_INPUT.list(new FilenameFilter()
{
public boolean accept(File dir, String name)
{
return name.endsWith(".tar");
}
});
//2
Object[] result = new Object[tarFiles.length];
int i = 0;
for (String filename : tarFiles) {
result[i] = filename;
i++;
}
return result;
}
@Test(dataProvider = "tarJobs")
public void testTar(String result) throws IOException {
System.out.println("Running test" + result);
JThreadFile.decompressTar(result, TEST_OUTPUT); // <-- this is the call I can't get to work
Assert.assertEquals(TEST_OUTPUT, EXP_OUTPUT);
}
}
I’m currently using PDFBox to read the text of a set of pdfs that I’ve inherited.
I’m only interested in reading the text, not making any changes to the file.
The code that works for most of the files is:
File pdfFile = myPath.toFile();
PDDocument document = PDDocument.load(pdfFile );
Writer sw = new StringWriter();
PDFTextStripper stripper = new PDFTextStripper();
stripper.setStartPage( 1 );
stripper.writeText( document, sw );
String documentText = sw.toString();
For most files, I wind up with the text in the documentText field.
But for 3 of the 24 files, the documentText content is “\r\n” for the first file, “\r\n\r\n” for the second, and “\r\n\r\n\r\n” for the third. The three files are not consecutive; multiple good files sit between each of them.
The File is derived from a java.nio.Path. The WindowsFileAttribute that is part of the Path has a size of 279K, so the file is not empty on disk.
I can open the file and view the data, and it looks like the other files that my code reads.
I’m using Java 8.0.121, and PDFBox 2.0.4. (this is the latest version, I believe.)
Any suggestions? Is there a better way to read the text? (I’m not interested in the formatting, or fonts used, just the text.)
Thanks.
Reading multiple PDF docs using pdfbox in java
package readwordfile;
import java.io.BufferedReader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
/**
* This is an example on how to extract words from PDF document
*
* @author saravanan
*/
public class GetWordsFromPDF extends PDFTextStripper {
static List<String> words = new ArrayList<String>();
public GetWordsFromPDF() throws IOException {
}
/**
* @param args
* @throws IOException If there is an error parsing the document.
*/
public static void main(String[] args) throws IOException {
String files;
// FileWriter fs = new FileWriter("C:\\Users\\saravanan\\Desktop\\New Text Document (2).txt");
// FileInputStream fstream1 = new FileInputStream("C:\\Users\\saravanan\\Desktop\\New Text Document (2).txt");
// DataInputStream in1 = new DataInputStream(fstream1);
// BufferedReader br1 = new BufferedReader(new InputStreamReader(in1));
String path = "C:\\Users\\saravanan\\Desktop\\New folder\\"; //local folder path name
File folder = new File(path);
File[] listOfFiles = folder.listFiles();
for (int i = 0; i < listOfFiles.length; i++) {
if (listOfFiles[i].isFile()) {
files = listOfFiles[i].getName();
if (files.endsWith(".pdf") || files.endsWith(".PDF")) {
String nfiles = "C:\\Users\\saravanan\\Desktop\\New folder\\";
String fileName1 = nfiles + files;
System.out.print("\n\n" + files+"\n");
PDDocument document = null;
try {
document = PDDocument.load(new File(fileName1));
PDFTextStripper stripper = new GetWordsFromPDF();
stripper.setSortByPosition(true);
stripper.setStartPage(1); // PDFBox page numbers are 1-based
stripper.setEndPage(document.getNumberOfPages());
Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
stripper.writeText(document, dummy);
int x = 0;
System.out.println("");
for (String word : words) {
if (word.startsWith("xxxxxx")) { //here you can give your pdf doc starting word
x = 1;
}
if (x == 1) {
if (!(word.endsWith("YYYYYY"))) { //here you can give your pdf doc ending word
System.out.print(word + " ");
// fs.write(word);
} else {
x = 0;
break;
}
}
}
} finally {
if (document != null) {
document.close();
words.clear();
}
}
}
}
}
}
/**
* Override the default functionality of PDFTextStripper.writeString()
*
* @param str
* @param textPositions
* @throws java.io.IOException
*/
@Override
protected void writeString(String str, List<TextPosition> textPositions) throws IOException {
String[] wordsInStream = str.split(getWordSeparator());
if (wordsInStream != null) {
for (String word : wordsInStream) {
words.add(word); //store the pdf content into the List
}
}
}
}
I have a 10GB PDF file that I would like to break up into 10 files each 1GB in size. I need to do this operation in parallel, which means spinning 10 threads which each starts from a different position and read up to 1GB of data and write to a file. Basically the final result should be 10 files that each contain a portion of the original 10GB file.
I looked at FileChannel, but the position is shared, so once I modify the position in one thread, it impacts the other thread. I also looked at AsynchronousFileChannel in Java 7 but I'm not sure if that's the way to go. I appreciate any suggestion on this issue.
I wrote this simple program that reads a small text file to test the FileChannel idea, doesn't seem to work for what I'm trying to achieve.
package org.cas.filesplit;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
public class ConcurrentRead implements Runnable {
private int myPosition = 0;
public int getPosition() {
return myPosition;
}
public void setPosition(int position) {
this.myPosition = position;
}
static final String filePath = "C:\\Users\\temp.txt";
@Override
public void run() {
try {
readFile();
} catch (IOException e) {
e.printStackTrace();
}
}
private void readFile() throws IOException {
Path path = Paths.get(filePath);
FileChannel fileChannel = FileChannel.open(path);
fileChannel.position(myPosition);
ByteBuffer buffer = ByteBuffer.allocate(8);
int noOfBytesRead = fileChannel.read(buffer);
while (noOfBytesRead != -1) {
buffer.flip();
System.out.println("Thread - " + Thread.currentThread().getId());
while (buffer.hasRemaining()) {
System.out.print((char) buffer.get());
}
System.out.println(" ");
buffer.clear();
noOfBytesRead = fileChannel.read(buffer);
}
fileChannel.close();
}
}
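One thing that may help with the shared-position concern: FileChannel also has a positional read, read(ByteBuffer dst, long position), which reads at an explicit offset and leaves the channel's own position untouched, so several threads can use one channel at once. Below is a rough sketch of splitting a file that way. The class name ParallelSplitter, the part-file naming and the 64 KB buffer are my own assumptions, and error handling inside the worker threads is left minimal.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class ParallelSplitter {
    /** Copies bytes [start, start + length) of source into part using positional reads. */
    static void copyRange(FileChannel source, long start, long length, Path part) throws IOException {
        try (FileChannel out = FileChannel.open(part, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
            long position = start;
            long end = start + length;
            while (position < end) {
                buffer.clear();
                buffer.limit((int) Math.min(buffer.capacity(), end - position));
                // positional read: does not move the channel's shared position
                int read = source.read(buffer, position);
                if (read == -1) {
                    break; // reached end of file early
                }
                position += read;
                buffer.flip();
                while (buffer.hasRemaining()) {
                    out.write(buffer);
                }
            }
        }
    }

    /** Splits source into files named prefix.0, prefix.1, ... in dir, one thread per part. */
    static List<Path> split(Path source, Path dir, String prefix, int parts)
            throws IOException, InterruptedException {
        List<Path> result = new ArrayList<>();
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long size = in.size();
            long chunk = (size + parts - 1) / parts; // round up so the last part gets the remainder
            List<Thread> threads = new ArrayList<>();
            for (int i = 0; i < parts; i++) {
                long start = i * chunk;
                long length = Math.max(0, Math.min(chunk, size - start));
                Path part = dir.resolve(prefix + "." + i);
                result.add(part);
                Thread t = new Thread(() -> {
                    try {
                        copyRange(in, start, length, part);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                threads.add(t);
                t.start();
            }
            for (Thread t : threads) {
                t.join(); // keep the source channel open until every part is written
            }
        }
        return result;
    }
}
```

A call such as split(Paths.get("big.pdf"), Paths.get("out"), "part", 10) would then produce ten raw byte ranges of the source (both paths are placeholders). Whether ten threads beat one sequential pass depends on the disk, so it is worth measuring.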
So I'm trying to make a program where you input a Flash game URL and it downloads the .swf file, as shown here:
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
/**
* Main.java
*
*
*/
public class Main {
/**
* Reads a web page into a StringBuilder object
* and prints it out to console along with the
* size of the page.
*/
public void getWebSite() {
try {
URL url = new URL("http://www.vivalagames.com");
URLConnection urlc = url.openConnection();
BufferedInputStream buffer = new BufferedInputStream(urlc.getInputStream());
StringBuilder builder = new StringBuilder();
int byteRead;
while ((byteRead = buffer.read()) != -1)
builder.append((char) byteRead);
buffer.close();
Logger.log(builder.toString());
System.out.println("The size of the web page is " + builder.length() + " bytes.");
} catch (MalformedURLException ex) {
ex.printStackTrace();
} catch (IOException ex) {
ex.printStackTrace();
}
}
/**
* Starts the program
*
* @param args the command line arguments
*/
public static void main(String[] args) {
new Main().getWebSite();
}
}
I have got to the part where it downloads the website's HTML and puts it into a file called output.txt. Now what I'm trying to do is make it search that text file until it finds the text ".swf". The searcher code is:
import java.io.*;
import java.util.Scanner;
import java.util.regex.MatchResult;
public class Searcher {
public static void main() throws FileNotFoundException {
Scanner s = new Scanner(new File("output.txt"));
while (null != s.findWithinHorizon("(?i)\\b.swf\\b", 0)) {
MatchResult mr = s.match();
System.out.printf("Word found: %s at index %d to %d.%n", mr.group(),
mr.start(), mr.end());
}
}
}
Now how do I make the Main.java code run the function from Searcher.java?
This should do it:
public static void main(String[] args) {
new Main().getWebSite();
Searcher.main();
}
Make an instance of the Searcher class in the Main class.
public static void main(String[] args) {
new Main().getWebSite();
Searcher search = new Searcher();
}
Or simply use Searcher.main();
First of all, storing the downloaded HTML in a file to re-read this file right after is not a really good idea. You could do everything in memory.
Think in terms of objects and methods. You basically have two objects here: a Downloader and a Searcher. And you don't want two main methods to your program: only a single one. This main method should look like this:
// create the object which downloads the HTML
Downloader downloader = new Downloader();
// Ask it to download, and store the result into a String variable
String downloadedHtml = downloader.download();
// create the object which can search into a String for .swf references
Searcher searcher = new Searcher();
// pass it the String to search into
searcher.searchSwfIn(downloadedHtml);
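To make that outline a bit more concrete, here is one way the two objects could look. Treat it as a sketch only: the wrapper class SwfFinder, the Downloader constructor argument, the UTF-8 decoding and the regular expression are my assumptions, not something taken from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SwfFinder {
    /** Hypothetical Downloader: fetches a page into a String instead of a file. */
    static class Downloader {
        private final String address;

        Downloader(String address) {
            this.address = address;
        }

        String download() throws IOException {
            try (InputStream in = new URL(address).openStream()) {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    bytes.write(buffer, 0, n);
                }
                // assumption: the page is UTF-8 encoded
                return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }

    /** Hypothetical Searcher: collects every token in the text that ends in .swf. */
    static class Searcher {
        // a run of characters without whitespace, quotes or angle brackets, ending in ".swf"
        private static final Pattern SWF =
                Pattern.compile("[^\\s\"'<>]+\\.swf\\b", Pattern.CASE_INSENSITIVE);

        List<String> searchSwfIn(String html) {
            List<String> found = new ArrayList<>();
            Matcher m = SWF.matcher(html);
            while (m.find()) {
                found.add(m.group());
            }
            return found;
        }
    }

    public static void main(String[] args) throws IOException {
        String html = new Downloader("http://www.vivalagames.com").download();
        for (String reference : new Searcher().searchSwfIn(html)) {
            System.out.println(reference);
        }
    }
}
```

Keeping the search separate from the download also means the Searcher can be tried on a plain String without any network access.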
You need to put your classes into packages and import the Searcher class into your Main class.
Example:
package foo.bar.package1;
import foo.bar.package2.Searcher;
/*
Other import declarations
*/
public class Main {
/*
Your code
*/
public static void main(String[] args) throws FileNotFoundException {
new Main().getWebSite();
new Searcher().search();
}
}
package foo.bar.package2;
/*
Import declarations
*/
public class Searcher {
public void search() throws FileNotFoundException {
Scanner s = new Scanner(new File("output.txt"));
while (null != s.findWithinHorizon("(?i)\\bjava\\b", 0)) {
MatchResult mr = s.match();
System.out.printf("Word found: %s at index %d to %d.%n", mr.group(),
mr.start(), mr.end());
}
}
}
So I was wondering whether it is possible to write all the console output to a separate file outside of Java. I know about the PrintWriter and FileWriter approach. However, in my experience those work when I use them all within one method, and I don't think I can do that with the code I have right now. Below is what I have...
Java Code
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
public class XMLTagParser extends DefaultHandler
{
private int i;
public XMLTagParser()
{
traverse(new File("C:/Documents and Settings/user/workspace/Intern Project/Proposals/Converted Proposals/Extracted Items"));
}
private static final class SaxHandler extends DefaultHandler
{
private StringBuffer buffer;
private String heading = ""; // start empty so the first append does not produce "null..."
private boolean inHeading;
public void startElement(String uri, String localName, String qName, Attributes attrs)
{
if ("w:pStyle".equals(qName))
{
String val = attrs.getValue("w:val");
if (val.contains("Heading"))
{
if (isHeading(val))
{
System.out.println(val);
inHeading = true;
}
}
}
if("w:t".equals(qName))
{
if (inHeading == true)
{
buffer = new StringBuffer();
}
}
}
public void characters(char buff[], int offset, int length) throws SAXException
{
String s = new String(buff, offset, length);
if(buffer != null)
{
buffer.append(s);
heading += s;
}
}
public void endElement(String uri, String localName, String qName)
{
buffer = null;
//if the qName is "w:p" and it is in the heading, print out the heading and then reset
if ("w:p".equals(qName) && inHeading == true)
{
System.out.println(heading);
heading = "";
inHeading = false;
}
}
// method to verify whether element is an actual heading
private static boolean isHeading(String heading)
{
String headingNumber = heading.substring(7,8);
String headingName = heading.substring(0,7);
if (headingName.equals("Heading"))
{
if (headingNumber.equals("1")
|| headingNumber.equals("2")
|| headingNumber.equals("3")
|| headingNumber.equals("4")
|| headingNumber.equals("5")
|| headingNumber.equals("6"))
{
return true;
}
}
return false;
}
}
/*private void writeFile(File file)
{
try
{
PrintWriter out = new PrintWriter(new FileWriter(file + "/" + i++));
out.close();
}
catch (IOException e)
{
e.printStackTrace(System.out);
}
}*/
private void traverse(File directory)
{
//Get all files in directory
File[] files = directory.listFiles();
for (File file : files)
{
if (file.getName().equals("document.xml"))
{
try
{
// creates and returns new instance of SAX-implementation:
SAXParserFactory factory = SAXParserFactory.newInstance();
// create SAX-parser...
SAXParser parser = factory.newSAXParser();
// prints out the current working proposal, traversing up the directory structure
System.out.println(file.getParentFile().getParentFile().getName());
// .. define our handler:
SaxHandler handler = new SaxHandler();
// and parse:
parser.parse(file.getAbsolutePath(), handler);
try
{
// instantiates new printwriter which writes out to a file
PrintWriter out = new PrintWriter(new FileWriter(file.getParentFile().getParentFile() + "/" + i++ + ".txt"));
out.close();
}
catch (IOException e)
{
e.printStackTrace(System.out);
}
}
catch (Exception ex)
{
ex.printStackTrace(System.out);
}
}
else if (file.isDirectory())
{
//It's a directory so (recursively) traverse it
traverse(file);
}
}
}
}
So I've instantiated the PrintWriter in there, but obviously it's no good if I have nothing to write to it. I'm not really sure how to get what's printed to the console written to that file. Any ideas? Thanks in advance.
If you really want to you can redirect System.out to any PrintStream like this:
PrintStream stream = new PrintStream("filename.txt");
System.setOut(stream);
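If only part of the program should go to the file, it may be worth restoring the console afterwards. A small sketch of that, where the class name RedirectDemo, the helper runWithOutputTo and the file name headings.txt are all made up for illustration:

```java
import java.io.FileNotFoundException;
import java.io.PrintStream;

public class RedirectDemo {
    /** Runs the task with System.out pointed at the given file, then restores the console. */
    static void runWithOutputTo(String fileName, Runnable task) throws FileNotFoundException {
        PrintStream console = System.out; // remember the original stream
        try (PrintStream file = new PrintStream(fileName)) {
            System.setOut(file);
            task.run(); // every System.out.println in here lands in the file
        } finally {
            System.setOut(console); // put the console back even if the task fails
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        runWithOutputTo("headings.txt", () -> System.out.println("Heading1"));
        System.out.println("back on the console");
    }
}
```

In the question's code, the task would be the call that starts the directory traversal, so none of the existing println calls need to change.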
If you get into rolling your own file logger, you'll spend more time dealing with I/O issues, rolling files, file sizes, etc. You should use log4j instead! It will handle things like this and make your logging more flexible. It's pretty much the standard for Java logging.
System.out is a PrintStream, which by default points to the console. Instead, you could create a new FileOutputStream pointing to the file of your choice, wrap it in a PrintStream, and install that stream through System.setOut. That will redirect the output for the whole life cycle of the program/application. Check this link for complete code.
Instead of using System.out, you could use a FileWriter, write to it, and flush it. It is unclear why you increment i in your code; I guess you want to write everything to just one file.
Also, it looks like you never write to the Writer that you initialize.
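A rough sketch of the "write to one writer" idea: create the writer once and hand it to whatever currently calls System.out.println. The class HeadingReport and the file name headings.txt are hypothetical names used only for this example.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

public class HeadingReport {
    private final PrintWriter out;

    /** Wraps any Writer, for example a FileWriter for real runs or a StringWriter for tests. */
    public HeadingReport(Writer target) {
        this.out = new PrintWriter(target);
    }

    /** Call this wherever the handler currently calls System.out.println(...). */
    public void println(String line) {
        out.println(line);
    }

    /** Pushes buffered text through to the underlying writer. */
    public void flush() {
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        try (Writer file = new FileWriter("headings.txt")) {
            HeadingReport report = new HeadingReport(file);
            report.println("Heading1");
            report.println("Introduction");
            report.flush();
        }
    }
}
```

The SAX handler could take such an object in its constructor, which keeps all output in a single file without touching System.out.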
You could keep using System.out, and redirect it using ">" to the file when you invoke the application. You can still retain stderr for direct console output.
Or do you mean something else when you write "outside java"?