How to split my flowfile content based on '\n'? - java

I have tried to create a sample custom processor that reads the lines of a flow file, makes some changes to them, and then writes the result back into the flow file.
This is my code to read the flow file content:
String inputRow;
session.read(flowFile, new InputStreamCallback() {
    @Override
    public void process(InputStream in) throws IOException {
        inputRow = IOUtils.toString(in);
    }
});
I adapted that code from the reference below.
http://www.nifi.rocks/developing-a-custom-apache-nifi-processor-json/
After reading the lines, I am not able to split them on the line feed character.
The upstream connection for my processor yields the sample input below.
My sample input:
No,Name,value
1,Si,21
2,LI,321
3,Ji,11
The lines above are stored in "inputRow".
I then use the code below to split it on '\n'.
String[] splits = inputRow.split("\n");
I have tried both '\n' and '\r\n' to split those lines, but neither worked.
Can anyone please guide me on splitting those lines into the expected output below?
splits[0]=No,Name,value
splits[1]=1,Si,21
splits[2]=2,LI,321
splits[3]=3,Ji,11
Any help appreciated.

As mentioned in another answer, you should be able to use a BufferedReader to read line-by-line. You should also avoid loading the entire contents of the flow file into memory whenever possible.
Imagine that this NiFi processor is processing 1 GB CSV files and that there could be 2-3 files processed concurrently. If you read the whole flow file content into memory, you will hit an OutOfMemoryError if you have less than 3 GB of heap allocated to the JVM. If you stream each file line by line, you will only have 2-3 lines in memory at any one time and will need very little memory overall.
The following snippet shows how you could read in a line, process it, and write it out, without ever having the whole content in memory:
flowFile = session.write(flowFile, new StreamCallback() {
    @Override
    public void process(InputStream in, OutputStream out) throws IOException {
        try (InputStreamReader inReader = new InputStreamReader(in);
             BufferedReader reader = new BufferedReader(inReader);
             OutputStreamWriter outWriter = new OutputStreamWriter(out);
             BufferedWriter writer = new BufferedWriter(outWriter)) {

            String line = reader.readLine();
            while (line != null) {
                line = process(line); // process(String) is your own per-line transformation
                writer.write(line);
                writer.newLine();
                line = reader.readLine();
            }
        }
    }
});
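Note that session.write returns the updated FlowFile reference; after the callback completes you would typically transfer that flow file to a relationship (for example session.transfer(flowFile, REL_SUCCESS), where REL_SUCCESS is whatever success relationship your processor defines).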

You can use this regex for splitting: \\r?\\n.
String[] splits = inputRow.split("\\r?\\n");
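As a quick sanity check, here is a minimal sketch using the sample rows from the question; the optional '\r' in the pattern is what makes it work if the flow file happens to use Windows line endings:
String inputRow = "No,Name,value\r\n1,Si,21\r\n2,LI,321\r\n3,Ji,11";
String[] splits = inputRow.split("\\r?\\n");
System.out.println(splits[0]); // No,Name,value
System.out.println(splits[1]); // 1,Si,21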

Why push everything into a single string? Just read the content line by line and push those lines into a List right there:
List<String> inputRows = new ArrayList<>();
...
and within your callback you use a BufferedReader like this:
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
String line;
while ((line = reader.readLine()) != null) {
    inputRows.add(line);
}
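Putting it together, a minimal sketch of how this could look inside the NiFi read callback, assuming inputRows is declared final (or is an instance field) so it is visible to the anonymous class:
final List<String> inputRows = new ArrayList<>();
session.read(flowFile, new InputStreamCallback() {
    @Override
    public void process(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        String line;
        while ((line = reader.readLine()) != null) {
            inputRows.add(line); // one row per element, no manual splitting needed
        }
    }
});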

Related

Keep new lines when reading in a file

I'm trying to read in a file and modify the text, but I need to keep new lines when doing so. For example, if I were to read in a file that contained:
This is some text.
This is some more text.
It would just read in as
This is some text.This is some more text.
How do I keep that space? I think it has something to do with the /n escape character. I've seen examples using BufferedReader and FileReader, but we haven't learned those in my class yet, so is there another way? What I've tried is something like this:
if (ch == 10)
{
    ch = '\n';
    fileOut.print(ch);
}
10 is the ASCII table code for a new line, so I thought Java could recognize it as that, but it doesn't.
In Java 8:
You can read lines using:
List<String> yourFileLines = Files.readAllLines(Paths.get("your_file"));
Then collect strings:
String collect = yourFileLines.stream().filter(StringUtils::isNotBlank).collect(Collectors.joining(" "));
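Note that joining with a single space drops the line breaks. If you want to keep them, a small variation on the same idea is to join with the line separator instead:
String collect = yourFileLines.stream()
        .filter(StringUtils::isNotBlank)
        .collect(Collectors.joining(System.lineSeparator()));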
The problem is that you (possibly) want to read your file a line at a time, and then you want to write it back a line at a time (keeping empty lines).
The following source does that: it reads the input file one line at a time and writes it back one line at a time (keeping empty lines).
The only problem is that it may change the newline style: you might read a Unix file and write a DOS file, or vice versa, depending on the system you are running on and the original line endings of the file you are reading.
Keeping the original newline can introduce a lot of complexity; read the BufferedReader and PrintWriter API docs for more information.
public void process(File input, File output) {
    try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(new FileInputStream(input), "utf-8"));
         PrintWriter writer = new PrintWriter(
                 new OutputStreamWriter(new FileOutputStream(output), "utf-8"))) {
        String line;
        while ((line = reader.readLine()) != null) {
            String processed = processLine(line);
            writer.println(processed); // println appends the platform line separator
        }
    } catch (IOException e) {
        // Some exception management
    }
}

public String processLine(String line) {
    return line;
}
/n should be \n
if (ch == 10)
{
ch = '\n';
fileOut.print(ch);
}
Is that a typo?
ch = '/n';
otherwise use
ch = '\n';

Java replace line in a text file

I found this code from another question
private void updateLine(String toUpdate, String updated) throws IOException {
    BufferedReader file = new BufferedReader(new FileReader(data));
    String line;
    String input = "";

    while ((line = file.readLine()) != null)
        input += line + "\n";

    input = input.replace(toUpdate, updated);

    FileOutputStream os = new FileOutputStream(data);
    os.write(input.getBytes());

    file.close();
    os.close();
}
This is my file before I replace some lines
example1
example2
example3
But when I replace a line, the file now looks like this
example1example2example3
Which makes it impossible to read the file when there are a lot of lines in it.
How would I go about editing the code above to make my file look what it looked like at the start?
Use System.lineSeparator() instead of \n.
while ((line = file.readLine()) != null)
input += line + System.lineSeparator();
The issue is that on Unix systems, the line separator is \n while on Windows systems, it's \r\n.
In Java versions older than Java 7, you would have to use System.getProperty("line.separator") instead.
As pointed out in the comments, if you have concerns about memory usage, it would be wise to not store the entire output in a variable, but write it out line-by-line in the loop that you're using to process the input.
If you read and modify line by line, this has the advantage that you don't need to fit the whole file in memory. Not sure if this is possible in your case, but streaming is generally a good thing to aim for. It would also remove the need to concatenate strings, and you don't need to pick a line terminator, because you can write each transformed line with println(). It requires writing to a different file, which is generally a good thing as it is crash safe: you would lose data if you rewrote the file in place and were aborted.
private void updateLine(String toUpdate, String updated) throws IOException {
    BufferedReader file = new BufferedReader(new FileReader(data));
    PrintWriter writer = new PrintWriter(new File(data + ".out"), "UTF-8");

    String line;
    while ((line = file.readLine()) != null) {
        line = line.replace(toUpdate, updated);
        writer.println(line);
    }

    file.close();
    if (writer.checkError())
        throw new IOException("cannot write");
    writer.close();
}
In this case, it assumes that you need to do the replace only on complete lines, not multiple lines. I also added an explicit encoding and use a writer, as you have a string to output.
This is because you use an OutputStream, which is better suited to binary data. Try using a PrintWriter instead and let its println method add the line terminator for you.

How to overcome out of memory exception with PrintWriter?

The following code reads a bunch of .csv files and then combines them into one .csv file. I tried System.out.println ... all data points are correct; however, when I try to use the PrintWriter I get:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space.
I tried to use FileWriter but got the same error. How should I correct my code?
public class CombineCsv {
    public static void main(String[] args) throws IOException {
        PrintWriter output = new PrintWriter("C:\\User\\result.csv");
        final File file = new File("C:\\Users\\is");
        int i = 0;

        for (final File child : file.listFiles()) {
            BufferedReader CSVFile = new BufferedReader(new FileReader("C:\\Users\\is\\" + child.getName()));
            String dataRow = CSVFile.readLine();

            while (dataRow != null) {
                String[] dataArray = dataRow.split(",");
                for (String item : dataArray) {
                    System.out.println(item + "\t");
                    output.append(item + "," + child.getName().replaceAll(".csv", "") + ",");
                    i++;
                }
                dataRow = CSVFile.readLine(); // Read next line of data.
            }
            // Close the file once all data has been read.
            CSVFile.close();
        }

        output.close();
        System.out.println(i);
    }
}
I can only think of two scenarios in which that code could result in an OOME:
If the file directory has a very large number of elements, then file.listFiles() could create a very large array of File objects.
If one of the input files includes a line that is very long, then CSVFile.readLine() could use a lot of memory in the process of reading it. (Up to 6 times the number of bytes in the line.)
The simplest approach to solving both of these issues is to increase the Java heap size using the -Xmx JVM option.
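For example, a hypothetical invocation might be java -Xmx2g CombineCsv, which lets the heap grow to 2 GB.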
I can see no reason why your use of a PrintWriter would be the cause of the problem.
Try wrapping the file name in a FileWriter so you can use the autoFlush constructor:
boolean autoFlush = true;
PrintWriter output = new PrintWriter(new FileWriter(myFileName), autoFlush);
This creates a PrintWriter instance that flushes its buffer every time println, printf, or format is called.

Read large text file in java, infeasible?

I'm using the following method to read a file into a JTextArea:
public void readFile(File file) throws java.io.FileNotFoundException, java.io.IOException {
    if (file == null) return;

    jTextArea1.setText("");
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        String line = "";
        while ((line = reader.readLine()) != null) {
            jTextArea1.append(line + "\n");
        }
    }
}
It works OK with a normal-sized file (a few hundred kilobytes), but when I tested a 30000-line file of 42 MB that Notepad can open in about 5 seconds, my file reader took forever. I couldn't wait for it to finish; I had waited for about 15-20 minutes and it was still working consuming 30% of my CPU usage.
Could you please give me a solution for this? I'm handling text files only, not binary files, and all I know is that BufferedReader is supposed to be the best approach.
The problem is likely not in the file reading but the processing. Repeated calls to append are likely to be very inefficient with large datasets.
Consider using a StringBuilder. This class is designed for quickly creating long strings from parts (on a single thread; see StringBuffer for a multi-threaded counterpart).
if (file == null) return;

StringBuilder sb = new StringBuilder();
jTextArea1.setText("");

try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = reader.readLine()) != null) {
        sb.append(line);
        sb.append('\n');
    }
    jTextArea1.setText(sb.toString());
}
As suggested in the comments, you may wish to perform this action in a new thread so the user doesn't think your program has frozen.
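A minimal sketch of that idea, assuming the text area field is called jTextArea1 as in the question: the file is read on a background thread, and the text area is only updated on the Event Dispatch Thread once reading finishes.
new SwingWorker<String, Void>() {
    @Override
    protected String doInBackground() throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    @Override
    protected void done() {
        try {
            jTextArea1.setText(get()); // done() runs on the EDT
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}.execute();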

Reading group of lines in HUGE files

I have no idea how to do the following: I want to process a really huge text file (almost 5 gigabytes). Since I cannot load the whole file into memory, I thought of reading the first 500 lines (or as many as fit into memory, I am not sure about that yet), doing something with them, then moving on to the next 500 until I am done with the whole file.
Could you post an example of the loop or command needed for that? All the ways I tried resulted in starting from the beginning again, but I want to continue after finishing the previous 500 lines.
Help appreciated.
BufferedReader br = new BufferedReader(new FileReader(file));
String line = null;
ArrayList<String> allLines = new ArrayList<String>();

while ((line = br.readLine()) != null) {
    allLines.add(line);
    if (allLines.size() >= 500) {
        processLines(allLines); // handle the current batch of 500 lines
        allLines.clear();
    }
}
processLines(allLines); // handle whatever is left over
br.close();
Ok so you indicated in a comment above that you only want to keep certain lines, writing them to a new file based on certain logic. You can read in one line at a time, decide whether to keep it and if so write it to the new file. This approach will use very little memory since you are only holding one line at a time in memory. Here is one way to do that:
BufferedReader br = new BufferedReader(new FileReader(file));
FileWriter fw = new FileWriter(new File("newfile.txt"), false);
String lineRead = null;

while ((lineRead = br.readLine()) != null) {
    if (true) { // put your test conditions here
        // readLine() strips the line terminator, so add one back when writing
        fw.write(lineRead + System.lineSeparator());
        fw.flush();
    }
}

fw.close();
br.close();
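If the output file grows large, wrapping the FileWriter in a BufferedWriter and dropping the per-line flush() would reduce the I/O overhead considerably; the buffered data is still flushed when the writer is closed.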
