Reading groups of lines in HUGE files - Java

I have no idea how to do the following: I want to process a really huge text file (almost 5 gigabytes). Since I cannot load the whole file into memory, I thought of reading the first 500 lines (or as many as fit into memory, I am not sure about that yet), doing something with them, then moving on to the next 500 until I am done with the whole file.
Could you post an example of the loop or command needed for that? All the approaches I tried end up starting from the beginning again, but I want to continue where the previous 500 lines left off.
Help appreciated.

BufferedReader br = new BufferedReader(new FileReader(file));
String line = null;
ArrayList<String> allLines = new ArrayList<String>();
while ((line = br.readLine()) != null) {
    allLines.add(line);
    if (allLines.size() >= 500) { // process in batches of exactly 500
        processLines(allLines);
        allLines.clear();
    }
}
processLines(allLines); // process the remaining lines (fewer than 500)
br.close();

Ok so you indicated in a comment above that you only want to keep certain lines, writing them to a new file based on certain logic. You can read in one line at a time, decide whether to keep it and if so write it to the new file. This approach will use very little memory since you are only holding one line at a time in memory. Here is one way to do that:
BufferedReader br = new BufferedReader(new FileReader(file));
String lineRead = null;
FileWriter fw = new FileWriter(new File("newfile.txt"), false);
while ((lineRead = br.readLine()) != null) {
    if (true) { // put your test conditions here
        fw.write(lineRead);
        fw.write(System.lineSeparator()); // readLine() strips the line terminator
    }
}
fw.close(); // close() flushes, so per-line flush() calls are unnecessary
br.close();

Related

How to split the my flowfile content based on '\n'?

I have tried to create a sample custom processor that reads lines from the flowfile, makes some changes to them, and writes the result back to the flowfile.
This is my code to read the flowfile:
String inputRow;
session.read(flowFile, new InputStreamCallback() {
    @Override
    public void process(InputStream in) throws IOException {
        inputRow = IOUtils.toString(in);
    }
});
I based that code on the following reference:
http://www.nifi.rocks/developing-a-custom-apache-nifi-processor-json/
After reading the lines, I am not able to split them on the line-feed character.
The upstream connection for my processor yields the sample input below.
My sample input:
No,Name,value
1,Si,21
2,LI,321
3,Ji,11
The lines above are stored in "inputRow".
But when I use the code below to split on '\n':
String[] splits = inputRow.split("\n");
I have tried both '\n' and '\r\n', but neither worked.
Can anyone please guide me to split those lines into the expected output below?
splits[0]=No,Name,value
splits[1]=1,Si,21
splits[2]=2,LI,321
splits[3]=3,Ji,11
Any help appreciated.
As mentioned in another answer, you should be able to use a BufferedReader to read line-by-line. You should also avoid loading the entire contents of the flow file into memory whenever possible.
Imagine that this NiFi processor is processing 1GB CSV files and that there could be 2-3 files processed concurrently. If you read the whole flow file content into memory, you will hit out-of-memory if you have less than 3GB of heap allocated to the JVM. If you stream each file line-by-line you would only have 2-3 lines in memory at one time and would need very little overall memory.
The following snippet shows how you could read in a line, process it, and write it out, without ever having the whole content in memory:
flowFile = session.write(flowFile, new StreamCallback() {
    @Override
    public void process(InputStream in, OutputStream out) throws IOException {
        try (InputStreamReader inReader = new InputStreamReader(in);
             BufferedReader reader = new BufferedReader(inReader);
             OutputStreamWriter outWriter = new OutputStreamWriter(out);
             BufferedWriter writer = new BufferedWriter(outWriter)) {
            String line = reader.readLine();
            while (line != null) {
                line = processLine(line); // your per-line transformation (it cannot be named process, which the callback method shadows)
                writer.write(line);
                writer.newLine();
                line = reader.readLine();
            }
        }
    }
});
You can use this regex for splitting: \\r?\\n.
String[] splits = inputRow.split("\\r?\\n");
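As a quick standalone check, the `\r?\n` pattern handles both Unix and Windows line endings; the sample data below mirrors the input from the question (the class name is just illustrative):

```java
public class SplitDemo {
    public static void main(String[] args) {
        // Sample input with mixed line endings, mirroring the flow file content above
        String inputRow = "No,Name,value\r\n1,Si,21\n2,LI,321\n3,Ji,11";
        String[] splits = inputRow.split("\\r?\\n");
        for (int i = 0; i < splits.length; i++) {
            System.out.println("splits[" + i + "]=" + splits[i]); // splits[0]=No,Name,value ... splits[3]=3,Ji,11
        }
    }
}
```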
Why push everything into a single string? Just read the content line by line and push those lines into a List right there:
List<String> inputRows = new ArrayList<>();
...
and within your callback, use a BufferedReader like this:
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
String line;
while ((line = reader.readLine()) != null) {
    inputRows.add(line);
}

BufferedReader is not reading my whole file - Java

BufferedReader reader = new BufferedReader(new FileReader("C:\\Users\\NormenYu\\Desktop\\Programming\\Java\\eclipse\\Book\\"+thebook+".txt"));
String line = reader.readLine();
System.out.println(line);
My File:
(tab)You are on a hiking trip with your friend(also lives with you in a rented apartment). You suddenly find yourself walking into a jungle. As you walk, you suddenly find yourself very lonely. “Help!”, you heard.
(enter)(tab)“What was that,” you ask your friend. There is no reply. Wait... where is your friend? You start to find your way back, and suddenly you find your friend stuck in quicksand.
Do you: Walk towards your friend and try to save him or Stay away because you might also get stuck in quicksand
The program prints: You are on a hiking trip with your friend(also lives with you in a rented apartment). You suddenly find yourself walking into a jungle. As you walk, you suddenly find yourself very lonely. “Help!”, you heard.
HELP!! By the way, the things in parentheses are not actually written in the file.
Using a loop you can read each line in the file.
BufferedReader reader = new BufferedReader(new FileReader("C:\\Users\\NormenYu\\Desktop\\Programming\\Java\\eclipse\\Book\\"+thebook+".txt"));
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
reader.close();
You are only reading in one line with the method readLine. You need to loop over the file until you reach the end. Something like this:
BufferedReader in = new BufferedReader(new FileReader(file));
while (in.ready()) { // note: ready() is not a reliable end-of-stream check; readLine() != null is safer
    String s = in.readLine();
    System.out.println(s);
}
in.close();
BufferedReader reader = new BufferedReader(new FileReader("C:\\Users\\NormenYu\\Desktop\\Programming\\Java\\eclipse\\Book\\"+thebook+".txt"));
String full = "";
String line;
while ((line = reader.readLine()) != null) {
    full += line + "\n"; // readLine() strips the newline, so add it back
}
reader.close();
// full now contains the whole content of your file.

Read large text file in java, infeasible?

I'm using the following method to read a file into a JTextArea:
public void readFile(File file) throws java.io.FileNotFoundException,
        java.io.IOException {
    if (file == null) return;
    jTextArea1.setText("");
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        String line = "";
        while ((line = reader.readLine()) != null) {
            jTextArea1.append(line + "\n");
        }
    }
}
It works OK with a normal-sized file (a few hundred kilobytes), but when I tested a 30000-line file of 42 MB that Notepad can open in about 5 seconds, my file reader took forever. I couldn't wait for it to finish; I had waited for about 15-20 minutes and it was still working consuming 30% of my CPU usage.
Could you please give me a solution for this? I'm handling text files only, not binary files, and as far as I know using a BufferedReader is the best approach.
The problem is likely not in the file reading but the processing. Repeated calls to append are likely to be very inefficient with large datasets.
Consider using a StringBuilder. This class is designed for quickly creating long strings from parts (on a single thread; see StringBuffer for a multi-threaded counterpart).
if (file == null) return;
StringBuilder sb = new StringBuilder();
jTextArea1.setText("");
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
    String line = "";
    while ((line = reader.readLine()) != null) {
        sb.append(line);
        sb.append('\n');
    }
    jTextArea1.setText(sb.toString());
}
As suggested in the comments, you may wish to perform this action in a new thread so the user doesn't think your program has frozen.
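One way to do that background loading is with a SwingWorker. This is a sketch assuming the same jTextArea1 component as in the question; the class name and constructor are illustrative:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import javax.swing.JTextArea;
import javax.swing.SwingWorker;

public class FileLoadWorker extends SwingWorker<String, Void> {
    private final File file;
    private final JTextArea jTextArea1; // same field name as in the question

    public FileLoadWorker(File file, JTextArea jTextArea1) {
        this.file = file;
        this.jTextArea1 = jTextArea1;
    }

    @Override
    protected String doInBackground() throws IOException {
        // Build the full text off the Event Dispatch Thread so the UI stays responsive
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    @Override
    protected void done() {
        try {
            // done() runs on the Event Dispatch Thread, so touching the UI is safe here
            jTextArea1.setText(get());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```

You would kick it off from the UI code with `new FileLoadWorker(file, jTextArea1).execute();`.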

Reading every 10 lines using a BufferedReader

Is there a way of reading, say, every 10 lines from a .txt file using a BufferedReader? At the moment my BufferedReader is reading every line, splitting the different values and storing them in an array list; which is then used elsewhere in my program.
Use LineNumberReader which is intended for this very purpose:
LineNumberReader reader = new LineNumberReader(fileReader);
ArrayList<String> goodLines = new ArrayList<String>();
String line = null;
while ((line = reader.readLine()) != null) {
    // after readLine(), getLineNumber() is the 1-based number of the line just read
    if (reader.getLineNumber() % 10 == 0) {
        goodLines.add(line);
    }
}
reader.close();
Use a loop to read all the lines you don't want, then read the line you do want.
BufferedReader br = new BufferedReader(new FileReader(file));
int index = 10;
int lineNumber = 0;
while (lineNumber < index - 1) {
    lineNumber++;
    br.readLine();
}
String lineYouWant = br.readLine(); // null if the file has fewer than 10 lines
// Do stuff with lineYouWant
br.close();
Since all of your lines are the same size you could look at the skip() method in the BufferedReader. You would basically read a line and then skip 10 * lineSize and read the next line, etc...
The purpose of a buffered reader is to make reading logical units like lines easy. Reading multiple lines would complicate your code and not provide a great performance boost since the buffered reader is already reading large blocks of data into its buffer.
Edit: Since your records are fixed size you could use a lower level reader and just read the amount of bytes required.
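The skip-based idea above can be sketched as follows. This is a standalone illustration under the fixed-width assumption; RECORD_LEN, the '\n'-only terminator, and the generated data are all hypothetical (with '\r\n' terminators the skip count would differ):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SkipDemo {
    // Hypothetical fixed record width, excluding the single '\n' terminator
    static final int RECORD_LEN = 5;

    // Returns lines 1, 11, 21, ... by skipping 9 fixed-width records after each read
    static List<String> everyTenth(String content) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(content))) {
            String line;
            while ((line = br.readLine()) != null) {
                result.add(line);
                br.skip(9L * (RECORD_LEN + 1)); // 9 records of RECORD_LEN chars plus one '\n' each
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= 30; i++) {
            sb.append(String.format("L%04d", i)).append('\n'); // 30 fixed-width lines
        }
        System.out.println(everyTenth(sb.toString())); // [L0001, L0011, L0021]
    }
}
```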

Reading CSV file using BufferedReader resulting in reading alternative lines

I'm trying to read a CSV file from my Java code, using the following piece of code:
public void readFile() throws IOException {
    BufferedReader br = new BufferedReader(new FileReader(fileName));
    lines = new ArrayList<String>();
    String newLine;
    while ((newLine = br.readLine()) != null) {
        newLine = br.readLine();
        System.out.println(newLine);
        lines.add(newLine);
    }
    br.close();
}
The output I get from the above piece of code is that only every alternate line [2nd, 4th, 6th lines] is read and returned by the readLine() method. I'm not sure why this behavior exists. Please correct me if I am missing something while reading the CSV file.
The first readLine() call, in the while condition, reads a line without processing it; the second call inside the loop reads the next line, and that is the one you process. Each call to readLine() reads one line and advances the reader to the following line, so every iteration consumes two lines but only keeps the second.
This:
while ((newLine = br.readLine()) != null) {
    newLine = br.readLine();
    System.out.println(newLine);
    lines.add(newLine);
}
Should be changed to this:
while ((newLine = br.readLine()) != null) {
    System.out.println(newLine);
    lines.add(newLine);
}
This way each line is read exactly once and then processed, instead of being skipped by a second read.
You need to remove the first line of the loop body:
newLine = br.readLine();
In Java 8, we can achieve this easily:
InputStream is = new ByteArrayInputStream(byteArr);
BufferedReader br = new BufferedReader(new InputStreamReader(is));
List<List<String>> dataList = br.lines()
        .map(k -> Arrays.asList(k.split(",")))
        .collect(Collectors.toCollection(LinkedList::new));
The outer list holds the rows, and each inner list holds the corresponding column values.
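For a self-contained illustration of that pipeline (the `byteArr` above is whatever bytes you already have; here a StringReader stands in for it so the example runs as-is):

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.stream.Collectors;

public class CsvLinesDemo {
    // Same stream pipeline as above, fed from an in-memory string
    static List<List<String>> parse(String csv) {
        BufferedReader br = new BufferedReader(new StringReader(csv));
        return br.lines()
                .map(k -> Arrays.asList(k.split(",")))
                .collect(Collectors.toCollection(LinkedList::new));
    }

    public static void main(String[] args) {
        List<List<String>> rows = parse("No,Name,value\n1,Si,21\n2,LI,321");
        System.out.println(rows.get(1)); // [1, Si, 21]
    }
}
```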
