Read large text file in Java, infeasible?

I'm using the following method to read a file into a JTextArea:
public void readFile(File file) throws java.io.FileNotFoundException,
        java.io.IOException {
    if (file == null) return;
    jTextArea1.setText("");
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        String line = "";
        while ((line = reader.readLine()) != null) {
            jTextArea1.append(line + "\n");
        }
    }
}
It works OK with a normal-sized file (a few hundred kilobytes), but when I tested a 30,000-line, 42 MB file that Notepad can open in about 5 seconds, my file reader took forever. I couldn't wait for it to finish: after about 15-20 minutes it was still running and using 30% of my CPU.
Could you please give me a solution for this? I'm handling text files only, not binary files, and all I know is that BufferedReader is the best choice.

The problem is likely not in the file reading but in the processing. Repeated calls to JTextArea.append are likely to be very inefficient with large datasets, since each call updates the text area's document.
Consider using a StringBuilder. This class is designed for quickly creating long strings from parts (on a single thread; see StringBuffer for a multi-threaded counterpart).
public void readFile(File file) throws IOException {
    if (file == null) return;
    StringBuilder sb = new StringBuilder();
    jTextArea1.setText("");
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        String line;
        while ((line = reader.readLine()) != null) { // assignment (=), not comparison (==)
            sb.append(line);
            sb.append('\n');
        }
    }
    jTextArea1.setText(sb.toString()); // one update instead of one append per line
}
As suggested in the comments, you may wish to perform this action in a new thread so the user doesn't think your program has frozen.
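Following that suggestion, a SwingWorker can do the reading on a background thread and update the JTextArea once, on the Event Dispatch Thread, when it has finished. This is a minimal sketch, assuming Java 8+ and the platform default charset (which is what FileReader uses); BackgroundFileLoader, readAll and loadInBackground are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import javax.swing.JTextArea;
import javax.swing.SwingWorker;

public class BackgroundFileLoader {

    /** Reads the whole file into one String, building it with a StringBuilder. */
    public static String readAll(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    /**
     * Reads the file on a SwingWorker background thread, then sets the text
     * in a single call on the Event Dispatch Thread.
     */
    public static SwingWorker<String, Void> loadInBackground(File file, JTextArea area) {
        SwingWorker<String, Void> worker = new SwingWorker<String, Void>() {
            @Override
            protected String doInBackground() throws IOException {
                return readAll(file); // runs off the EDT, so the UI stays responsive
            }

            @Override
            protected void done() {
                try {
                    area.setText(get()); // done() is invoked on the EDT
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (ExecutionException e) {
                    area.setText("Could not read file: " + e.getCause());
                }
            }
        };
        worker.execute();
        return worker;
    }
}
```

From the file-chooser handler you would call BackgroundFileLoader.loadInBackground(file, jTextArea1); the handler returns immediately, and the text appears when reading completes.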

Related

Process won't run unless printing output + ProcessBuilder

I've come across a strange issue. I've used process builder several times to call an executable from a program but have never encountered this before. For debug purposes I made a method which prints the output of the executable to System.out. Everything worked fine and my program nicely exported all of the test gifs I ran.
When it came time to run this program properly for 1000+ GIFs, I commented out the printout method to improve performance. Once the whole program had run, I came back to find that exportGIF did not work. The program ran with no errors, but the process simply did not export the GIFs as expected.
After isolating lines in the printout method, it seems that the deciding bit of code is reader.readLine(). Why would this be the case? The executable should have already run, and the debug method should only read the output stream after the fact, correct? I'd rather not loop through its output stream every time, as it slows the program down considerably.
private void printProcessOutput(Process process) {
    BufferedReader reader =
            new BufferedReader(new InputStreamReader(process.getInputStream()));
    StringBuilder builder = new StringBuilder();
    String line = null;
    try {
        while ((line = reader.readLine()) != null) {
            builder.append(line);
            builder.append(System.getProperty("line.separator"));
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    System.out.println(builder.toString());
}
private void exportGIF(String dirPath) throws IOException {
    List<String> lines = Arrays.asList("/Users/IdeaProjects/MasterFormat/MasterFormat-Java/MasterFormat/timMaster_4.1.png \"{200.0,467.0}\"");
    Path headImageFile = Paths.get(System.getProperty("user.dir") + File.separator + "headImageInfo.txt");
    Files.write(headImageFile, lines, Charset.forName("UTF-8"));
    String templatePath = dirPath + File.separator + "template.mp4";
    String outputPath = dirPath + File.separator;
    String headImagePath = headImageFile.toString();
    String gifExportExecPath = "/Users/IdeaProjects/MasterFormat/MasterFormat-Java/MasterFormat/GIFExport";
    Process process = new ProcessBuilder(gifExportExecPath, "-s", templatePath, "-o", outputPath, "-h", headImagePath).start();
    printProcessOutput(process);
    Files.delete(headImageFile);
}
EDIT
One thing I should add: I noticed that when I comment out the debug method, it clocks through all 1000+ iterations in less than ten minutes, but of course the GIFs do not export (the executable doesn't run...? Not sure).
When I include the printout method it is a lot slower. I tried running it overnight but it got stuck after 183 iterations. I've tried profiling to see if it was causing some thrashing but the GC seems to run fine.
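You need to consume the output of the Process, or it may hang: the operating system gives the child's output pipe only a limited buffer, and once that fills up with nobody reading it, the child process blocks. So you can't comment out printProcessOutput(process);. Instead, comment out only the lines that do the printing: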
try {
    while ((line = reader.readLine()) != null) {
        //builder.append(line);
        //builder.append(System.getProperty("line.separator"));
    }
} catch (IOException e) {
    e.printStackTrace();
}
//System.out.println(builder.toString());
I generally use this method, which also merges the error stream into the output stream so that one loop drains both:
public static void runProcess(ProcessBuilder pb) throws IOException {
    pb.redirectErrorStream(true);
    Process p = pb.start();
    BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
    String line;
    while ((line = reader.readLine()) != null) {
        //System.out.println(line);
    }
}
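A slightly fuller sketch along the same lines: drainAndWait reads everything the child prints and then calls waitFor() to get its exit code, while discardAndWait (Java 9+) uses ProcessBuilder.Redirect.DISCARD so the operating system throws the output away and no reading loop is needed. Waiting for the exit may also matter in exportGIF above, which deletes headImageInfo.txt right after the process starts; without a wait, the file may be gone before the executable reads it. ProcessRunner and both method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ProcessRunner {

    /** Reads (and ignores) everything the child prints, then returns its exit code. */
    public static int drainAndWait(ProcessBuilder pb) throws IOException, InterruptedException {
        pb.redirectErrorStream(true); // merge stderr into stdout so one loop drains both
        Process p = pb.start();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            while (reader.readLine() != null) {
                // discard the line; reading it is what keeps the pipe from filling up
            }
        }
        return p.waitFor();
    }

    /** Java 9+: the OS discards the output, so the child can never block on a full pipe. */
    public static int discardAndWait(ProcessBuilder pb) throws IOException, InterruptedException {
        pb.redirectErrorStream(true); // stderr follows stdout into the discard sink
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        return pb.start().waitFor();
    }
}
```

In exportGIF, replacing printProcessOutput(process) with a call like ProcessRunner.discardAndWait(new ProcessBuilder(gifExportExecPath, "-s", templatePath, "-o", outputPath, "-h", headImagePath)) would both avoid the hang and keep the head-image file around until the executable has exited.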

How to split my flowfile content based on '\n'?

I have tried to create a sample custom processor that reads lines, makes some changes to each input line, and then writes the result to the flowfile.
This is my code to read the flowfile:
String inputRow; // must be a field, not a local: a local variable cannot be assigned inside an anonymous class
session.read(flowFile, new InputStreamCallback() {
    @Override
    public void process(InputStream in) throws IOException {
        inputRow = IOUtils.toString(in);
    }
});
I took that code from the reference below:
http://www.nifi.rocks/developing-a-custom-apache-nifi-processor-json/
After reading the lines, I'm not able to split them on the line-feed character.
The upstream connection for my processor yields the sample input below.
My sample input lines:
No,Name,value
1,Si,21
2,LI,321
3,Ji,11
The lines above are stored in "inputRow".
But I am using the code below to split it on '\n':
String[] splits=inputRow.split("\n");
I have tried both '\n' and '\r\n' to split those lines, but neither worked.
Can anyone show me how to split those lines into the expected output below?
splits[0]=No,Name,value
splits[1]=1,Si,21
splits[2]=2,LI,321
splits[3]=3,Ji,11
Any help appreciated.
As mentioned in another answer, you should be able to use a BufferedReader to read line-by-line. You should also avoid loading the entire contents of the flow file into memory whenever possible.
Imagine that this NiFi processor is processing 1GB CSV files and that there could be 2-3 files processed concurrently. If you read the whole flow file content into memory, you will hit out-of-memory if you have less than 3GB of heap allocated to the JVM. If you stream each file line-by-line you would only have 2-3 lines in memory at one time and would need very little overall memory.
The following snippet shows how you could read in a line, process it, and write it out, without ever having the whole content in memory:
flowFile = session.write(flowFile, new StreamCallback() {
    @Override
    public void process(InputStream in, OutputStream out) throws IOException {
        try (InputStreamReader inReader = new InputStreamReader(in);
             BufferedReader reader = new BufferedReader(inReader);
             OutputStreamWriter outWriter = new OutputStreamWriter(out);
             BufferedWriter writer = new BufferedWriter(outWriter)) {
            String line = reader.readLine();
            while (line != null) {
                // processLine: your own per-line logic (it needs a name other than
                // process, which this callback already defines)
                line = processLine(line);
                writer.write(line);
                writer.newLine();
                line = reader.readLine();
            }
        }
    }
});
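The core of that callback can be tried outside NiFi with plain java.io streams. This is a minimal sketch, assuming UTF-8 content; LineStreamer and transformLines are hypothetical names, and the transform argument stands in for whatever per-line processing the processor does.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

public class LineStreamer {

    /**
     * Copies in to out one line at a time, applying transform to each line.
     * Only the current line is held in memory, whatever the input size.
     */
    public static void transformLines(InputStream in, OutputStream out,
                                      UnaryOperator<String> transform) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) { // readLine() accepts \n, \r\n and \r
            writer.write(transform.apply(line));
            writer.write('\n'); // fixed separator instead of the platform-dependent newLine()
        }
        writer.flush(); // push buffered output through; closing the streams is left to the caller
    }
}
```

Because the reader and writer wrap whatever streams they are given, the same method works on a ByteArrayInputStream in a unit test and on the streams NiFi passes into the callback.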
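You can split on the regex \r?\n, which matches both Unix (\n) and Windows (\r\n) line endings; in a Java string literal the backslashes are doubled: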
String[] splits = inputRow.split("\\r?\\n");
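A small self-contained check of that split, plus, assuming Java 11+, String.lines(), which also treats a lone \r as a line break (useful if the content turns out to use old Mac-style endings). SplitLines and both method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SplitLines {

    /** Regex split: "\r?\n" matches both \n and \r\n; trailing empty strings are dropped. */
    public static String[] splitRegex(String text) {
        return text.split("\\r?\\n");
    }

    /** Java 11+: String.lines() recognises \n, \r\n and a lone \r. */
    public static List<String> splitLines(String text) {
        return text.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "No,Name,value\r\n1,Si,21\r\n2,LI,321\r\n3,Ji,11\r\n";
        System.out.println(Arrays.toString(splitRegex(input)));
        // prints [No,Name,value, 1,Si,21, 2,LI,321, 3,Ji,11]
    }
}
```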
Why push everything into a single string? Just read the content line by line and add each line to a List right there:
List<String> inputRows = new ArrayList<>();
...
and within your callback you use a BufferedReader like this:
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
String line;
while ((line = reader.readLine()) != null) {
    inputRows.add(line);
}

Java replace line in a text file

I found this code in another question:
private void updateLine(String toUpdate, String updated) throws IOException {
    BufferedReader file = new BufferedReader(new FileReader(data));
    String line;
    String input = "";
    while ((line = file.readLine()) != null)
        input += line + "\n";
    input = input.replace(toUpdate, updated);
    FileOutputStream os = new FileOutputStream(data);
    os.write(input.getBytes());
    file.close();
    os.close();
}
This is my file before I replace some lines
example1
example2
example3
But when I replace a line, the file now looks like this
example1example2example3
Which makes it impossible to read the file when there are a lot of lines in it.
How would I go about editing the code above to make my file look like it did at the start?
Use System.lineSeparator() instead of \n.
while ((line = file.readLine()) != null)
    input += line + System.lineSeparator();
The issue is that on Unix systems, the line separator is \n while on Windows systems, it's \r\n.
In Java versions older than Java 7, you have to use System.getProperty("line.separator") instead.
As pointed out in the comments, if you have concerns about memory usage, it would be wise to not store the entire output in a variable, but write it out line-by-line in the loop that you're using to process the input.
If you read and modify the file line by line, you have the advantage that you don't need to fit the whole file in memory. I'm not sure this is possible in your case, but it is generally a good thing to aim for streaming. In your case this would also remove the need to concatenate strings, and you don't need to choose a line terminator, because you can write each transformed line with println(). It requires writing to a different file, which is generally a good thing because it is crash-safe: if you rewrite a file in place and get aborted, you lose data.
private void updateLine(String toUpdate, String updated) throws IOException {
    // try-with-resources closes both files even if an exception is thrown
    try (BufferedReader file = new BufferedReader(new FileReader(data));
         PrintWriter writer = new PrintWriter(new File(data + ".out"), "UTF-8")) {
        String line;
        while ((line = file.readLine()) != null) {
            line = line.replace(toUpdate, updated);
            writer.println(line);
        }
        if (writer.checkError()) // PrintWriter swallows IOExceptions; checkError() reports them
            throw new IOException("cannot write");
    }
}
This assumes that the text to replace never spans more than one line. I also added an explicit encoding and used a writer, since the output is a string.
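The version above leaves its result in a separate ".out" file. A sketch of the full crash-safe cycle, assuming Java 7+ (java.nio.file) and UTF-8 content, streams into a temporary file in the same directory and then moves it over the original, atomically where the file system supports it. LineReplacer is a hypothetical name, and the method takes a Path instead of the asker's data field.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class LineReplacer {

    /** Replaces toUpdate with updated on every line of file, via a temp file. */
    public static void updateLine(Path file, String toUpdate, String updated) throws IOException {
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "update", ".tmp"); // same directory, so the move is a rename
        try {
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
                 BufferedWriter writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    writer.write(line.replace(toUpdate, updated));
                    writer.newLine(); // platform line separator, same as System.lineSeparator()
                }
            }
            try {
                Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }
}
```

If the program dies while writing, the original file is untouched and only the temporary file is left behind.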
This is because you use an OutputStream, which is better suited to binary data. Try using a PrintWriter and its println() method instead of adding the line terminator yourself.

Reading every 10 lines using a BufferedReader

Is there a way of reading, say, every 10 lines from a .txt file using a BufferedReader? At the moment my BufferedReader is reading every line, splitting the different values and storing them in an array list; which is then used elsewhere in my program.
Use LineNumberReader, which keeps track of the current line number for you:
LineNumberReader reader = new LineNumberReader(fileReader);
ArrayList<String> goodLines = new ArrayList<String>();
String line = null;
while ((line = reader.readLine()) != null) {
    // after readLine(), getLineNumber() is the 1-based number of the line just read
    if (reader.getLineNumber() % 10 == 0) { // lines 10, 20, 30, ...
        goodLines.add(line);
    }
}
Use a loop to read all the lines you don't want, then read the line you do want.
BufferedReader br = new BufferedReader(new FileReader(file));
int index = 10;
int lineNumber = 0;
while (lineNumber < index - 1) { // skip the first index - 1 lines
    lineNumber++;
    br.readLine();
}
String lineYouWant = br.readLine(); // null if the file has fewer than index lines
if (lineYouWant != null) {
    // Do stuff with lineYouWant
}
br.close();
Since all of your lines are the same size, you could look at the skip() method of BufferedReader: read a line, then skip 9 * lineLength characters (counting the line terminator) to land on the tenth line after it, read that line, and so on.
The purpose of a buffered reader is to make reading logical units like lines easy. Reading multiple lines would complicate your code and not provide a great performance boost since the buffered reader is already reading large blocks of data into its buffer.
Edit: Since your records are fixed size you could use a lower level reader and just read the amount of bytes required.
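The fixed-size idea can be sketched with RandomAccessFile.seek(), which jumps straight to a byte offset instead of reading through the skipped lines. This is a minimal sketch, assuming every record, including its line terminator, has exactly the same length in bytes and the file uses a single-byte encoding such as ASCII; FixedRecordReader and everyNthRecord are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FixedRecordReader {

    /**
     * Returns the n-th, 2n-th, 3n-th, ... record of a file of fixed-length records.
     * recordLength is in bytes and must include the line terminator.
     */
    public static List<String> everyNthRecord(String path, int recordLength, int n) throws IOException {
        List<String> result = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            byte[] buf = new byte[recordLength];
            // the n-th record (1-based) starts at byte offset (n - 1) * recordLength
            for (long offset = (long) (n - 1) * recordLength; offset < raf.length();
                 offset += (long) n * recordLength) {
                raf.seek(offset); // skipped records are never read
                int len = (int) Math.min(recordLength, raf.length() - offset); // last record may lack a terminator
                raf.readFully(buf, 0, len);
                String record = new String(buf, 0, len, StandardCharsets.US_ASCII);
                result.add(record.replaceAll("[\\r\\n]+$", "")); // drop the line terminator
            }
        }
        return result;
    }
}
```

If the lines are not exactly equal in length, the offsets drift, and the line-by-line approaches above are the safer choice.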

Reading groups of lines in HUGE files

I have no idea how to do the following: I want to process a really huge text file (almost 5 gigabytes). Since I cannot load the whole file into memory, I thought of reading the first 500 lines (or as many as fit into memory, I am not sure about that yet), doing something with them, then going on to the next 500 until I am done with the whole file.
Could you post an example of the "loop" or command that you need for that? All the ways I tried ended up starting from the beginning again, but I want to continue after finishing the previous 500 lines.
Help appreciated.
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    ArrayList<String> allLines = new ArrayList<String>();
    while ((line = br.readLine()) != null) {
        allLines.add(line);
        if (allLines.size() == 500) { // exactly 500 lines per batch
            processLines(allLines);    // processLines is your own method
            allLines.clear();
        }
    }
    processLines(allLines); // the remaining (fewer than 500) lines
}
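The key point for the question is that a single reader stays open for the whole file, so each batch continues exactly where the previous one stopped. A reusable sketch of the same pattern, assuming Java 8+; BatchReader and forEachBatch are hypothetical names, and the Consumer plays the role of processLines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchReader {

    /**
     * Reads lines from reader and hands them to handler in batches of at most
     * batchSize lines. The reader keeps its position between batches.
     */
    public static void forEachBatch(BufferedReader reader, int batchSize,
                                    Consumer<List<String>> handler) throws IOException {
        List<String> batch = new ArrayList<>(batchSize);
        String line;
        while ((line = reader.readLine()) != null) {
            batch.add(line);
            if (batch.size() == batchSize) {
                handler.accept(batch);
                batch = new ArrayList<>(batchSize); // fresh list, so the handler may keep the old one
            }
        }
        if (!batch.isEmpty()) {
            handler.accept(batch); // the last, possibly shorter, batch
        }
    }
}
```

Peak memory is roughly one batch of lines, independent of the file size.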
Ok so you indicated in a comment above that you only want to keep certain lines, writing them to a new file based on certain logic. You can read in one line at a time, decide whether to keep it and if so write it to the new file. This approach will use very little memory since you are only holding one line at a time in memory. Here is one way to do that:
try (BufferedReader br = new BufferedReader(new FileReader(file));
     BufferedWriter bw = new BufferedWriter(new FileWriter(new File("newfile.txt"), false))) {
    String lineRead;
    while ((lineRead = br.readLine()) != null) {
        if (true) { // put your test conditions here
            bw.write(lineRead);
            bw.newLine(); // readLine() strips the terminator, so write one back
        }
    }
} // closing the writer flushes it; no per-line flush() needed
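The same filter can also be written with Java 8 streams: Files.lines reads the file lazily, so only a small window of it is in memory while the matching lines are written out. A sketch assuming UTF-8 content (Files.lines throws an UncheckedIOException on malformed input); StreamLineFilter and filterLines are hypothetical names, and the Predicate replaces the if (true) placeholder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Predicate;
import java.util.stream.Stream;

public class StreamLineFilter {

    /** Writes the lines of source that satisfy keep to target, one line at a time. */
    public static void filterLines(Path source, Path target, Predicate<String> keep) throws IOException {
        try (Stream<String> lines = Files.lines(source)) { // closing the stream releases the file
            Iterable<String> kept = lines.filter(keep)::iterator;
            Files.write(target, kept); // each line followed by the platform line separator
        }
    }
}
```

A call such as StreamLineFilter.filterLines(Paths.get("huge.txt"), Paths.get("newfile.txt"), line -> line.startsWith("X")) would keep only lines starting with "X"; the file names and the condition here are placeholders.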
