The following code reads a number of .csv files and combines them into one .csv file. When I print every data point with System.out.println, they are all correct; however, when I try to use the PrintWriter I get:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
I tried FileWriter as well but got the same error. How should I correct my code?
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;

public class CombineCsv {
    public static void main(String[] args) throws IOException {
        PrintWriter output = new PrintWriter("C:\\User\\result.csv");
        final File file = new File("C:\\Users\\is");
        int i = 0;
        for (final File child : file.listFiles()) {
            BufferedReader csvFile = new BufferedReader(new FileReader("C:\\Users\\is\\" + child.getName()));
            String dataRow = csvFile.readLine();
            while (dataRow != null) {
                String[] dataArray = dataRow.split(",");
                for (String item : dataArray) {
                    System.out.println(item + "\t");
                    output.append(item + "," + child.getName().replaceAll(".csv", "") + ",");
                    i++;
                }
                dataRow = csvFile.readLine(); // Read the next line of data.
            }
            csvFile.close(); // Close the file once all data has been read.
        }
        output.close();
        System.out.println(i);
    }
}
I can only think of two scenarios in which that code could result in an OOME:
If the file directory has a very large number of elements, then file.listFiles() could create a very large array of File objects.
If one of the input files includes a line that is very long, then CSVFile.readLine() could use a lot of memory in the process of reading it. (Up to 6 times the number of bytes in the line.)
The simplest approach to solving both of these issues is to increase the Java heap size using the -Xmx JVM option.
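For example (a sketch; the heap size is just illustrative), you could launch the program with:
java -Xmx1024m CombineCsv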
I can see no reason why your use of a PrintWriter would be the cause of the problem.
Try
boolean autoFlush = true;
PrintWriter output = new PrintWriter(new FileWriter(myFileName), autoFlush);
(PrintWriter has no (String, boolean) constructor, so the file name has to be wrapped in a FileWriter.) This creates a PrintWriter instance that flushes its buffer every time println, printf, or format is called.
I'm having a problem working with a 1.3 GB CSV file (it contains 3 million rows). I want to sort the file on a field called "Timestamp", and I can't split the file into multiple reads because then the sorting won't work properly. At one point I get the following error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
This is my code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

import com.opencsv.CSVWriter;

public class createCSV {
    public static BufferedReader br = null;
    public static String csvFile = "/Scrivania/dataset";
    public static String newcsvFile = "/Scrivania/ordinatedataset";
    public static String extFile = ".csv";

    public static void main(String[] args) {
        try {
            List<List<String>> csvLines = new ArrayList<>();
            br = new BufferedReader(new FileReader(csvFile + extFile));
            CSVWriter writer = new CSVWriter(new FileWriter(newcsvFile + extFile));

            String line = br.readLine();
            String[] fields = line.split(",");
            writer.writeNext(fields);

            line = br.readLine();
            while (line != null) {
                // every parsed row is retained in this list, which is what fills up the heap
                csvLines.add(Arrays.asList(line.split(",")));
                line = br.readLine();
            }

            csvLines.sort(new Comparator<List<String>>() {
                @Override
                public int compare(List<String> o1, List<String> o2) {
                    return o1.get(8).compareTo(o2.get(8));
                }
            });

            for (List<String> lin : csvLines) {
                writer.writeNext(lin.toArray(new String[0]));
            }
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
I have tried increasing the heap size to the maximum of 2048 MB (-Xms512M -Xmx2048M in Run -> Run Configurations), but it still gives me the error. How can I sort the whole file? Thanks in advance.
Reading the whole file into memory is what exhausts the heap; what you need is to stream through the file. You can do that with the java.util.Scanner class, or with the LineIterator from the Apache Commons IO library.
With Scanner:
List<List<String>> csvLines = new ArrayList<>();
FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // note: collecting every line here still holds the whole file in memory;
        // to truly stream, process each line inside this loop instead of storing it
        csvLines.add(Arrays.asList(line.split(",")));
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}
Apache Commons:
LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}
Hopefully you can find an existing library that will do this for you, or use a command line tool called from Java to do this instead. If you need to code this yourself, here's a suggestion as to a pretty simple approach you might code up...
There's a simple general approach to sorting a large file like this. I call it a "shard sort". Here's what you do:
Pick a number N of shards, and a function that produces a value between 0 and N-1 for each input entry such that you get roughly the same number of entries in each shard. For example, you could choose N to be 10 and use the seconds part of your timestamp, with shard id = seconds % 10. This should "randomly" spread your entries across the 10 shards.
Now open the input file and 10 output files, one for each shard. Read each entry from the input file, compute its shard id, and write it to the file for that shard id.
Now read each shard file into memory, sort it based on each entry's timestamp, and write it back out to the file. For this example, this will take 10% of the memory needed to sort the whole file.
Now open the 10 shard files for reading and a new result file to contain the final result. Read the next entry from all 10 input files. Write out the earliest of those 10 entries, timestamp-wise, to the output file. When you write out a value, read a new one from the shard file it came from. Repeat this process until all the shard files are empty and all entries in memory have been written.
If your file is so big that 10 shards isn't enough, use more. You could, for example, use 60 shard files and use the entire seconds value from your timestamp for the shard id.
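Here is a minimal sketch of that shard sort, assuming the layout from the question above (timestamp in column index 8, one header row). The file names, the shard count, and the round-robin shard function are made up for the example; any function that spreads rows evenly works, since the final merge does the ordering:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ShardSort {

    static final int N_SHARDS = 10;  // use more shards for bigger files
    static final int TS_COLUMN = 8;  // index of the "Timestamp" field

    // lexicographic comparison; assumes an ISO-like, zero-padded timestamp format
    static String timestamp(String csvLine) {
        return csvLine.split(",", -1)[TS_COLUMN];
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get("dataset.csv");
        Path output = Paths.get("sorted.csv");

        // Step 1: distribute the rows round-robin across N shard files.
        List<BufferedWriter> writers = new ArrayList<>();
        for (int i = 0; i < N_SHARDS; i++) {
            writers.add(Files.newBufferedWriter(Paths.get("shard-" + i + ".csv")));
        }
        String header;
        try (BufferedReader reader = Files.newBufferedReader(input)) {
            header = reader.readLine(); // keep the header line aside
            String line;
            int row = 0;
            while ((line = reader.readLine()) != null) {
                BufferedWriter w = writers.get(row++ % N_SHARDS);
                w.write(line);
                w.newLine();
            }
        }
        for (BufferedWriter w : writers) {
            w.close();
        }

        // Step 2: sort each shard in memory; only ~1/N of the data is loaded at a time.
        for (int i = 0; i < N_SHARDS; i++) {
            Path shard = Paths.get("shard-" + i + ".csv");
            List<String> lines = Files.readAllLines(shard);
            lines.sort(Comparator.comparing(ShardSort::timestamp));
            Files.write(shard, lines);
        }

        // Step 3: k-way merge; only one line per shard is in memory at a time.
        BufferedReader[] readers = new BufferedReader[N_SHARDS];
        // the queue holds {line, shardIndex} pairs ordered by timestamp
        PriorityQueue<String[]> heads =
                new PriorityQueue<>(Comparator.comparing((String[] e) -> timestamp(e[0])));
        for (int i = 0; i < N_SHARDS; i++) {
            readers[i] = Files.newBufferedReader(Paths.get("shard-" + i + ".csv"));
            String first = readers[i].readLine();
            if (first != null) {
                heads.add(new String[] { first, String.valueOf(i) });
            }
        }
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            if (header != null) {
                out.write(header);
                out.newLine();
            }
            while (!heads.isEmpty()) {
                String[] smallest = heads.poll();
                out.write(smallest[0]);
                out.newLine();
                // refill from the shard the written entry came from
                String next = readers[Integer.parseInt(smallest[1])].readLine();
                if (next != null) {
                    heads.add(new String[] { next, smallest[1] });
                }
            }
        }
        for (BufferedReader r : readers) {
            r.close();
        }
    }
}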
I am looking at the performance of a particular stock over a period of 30 days. I downloaded the data from Yahoo Finance, which arrives as a CSV file. I would like to create a new column in my dataset showing the daily percentage change between open and close, with column H as where the output should appear. How should I do this in Java?
Thanks in advance!
You can edit your file line by line and append the new value to each line, adding a separator character with the concat() function.
Open the file with a FileReader or BufferedReader and then start parsing.
Official Javadoc: https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#concat(java.lang.String)
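As a minimal sketch (the file names are hypothetical, and the usual Yahoo Finance column layout Date,Open,High,Low,Close,Adj Close,Volume is assumed), appending the percentage change as a new last column:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class AddChangeColumn {
    public static void main(String[] args) throws IOException {
        // assumed layout: Date,Open,High,Low,Close,Adj Close,Volume
        List<String> lines = Files.readAllLines(Paths.get("stock.csv"));
        List<String> out = new ArrayList<>();
        out.add(lines.get(0).concat(",PctChange")); // extend the header row
        for (String line : lines.subList(1, lines.size())) {
            String[] f = line.split(",");
            double open = Double.parseDouble(f[1]);
            double close = Double.parseDouble(f[4]);
            double pct = (close - open) / open * 100.0; // daily % change, open -> close
            out.add(line.concat(",").concat(String.format("%.2f", pct)));
        }
        Files.write(Paths.get("stock-with-change.csv"), out);
    }
}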
With OpenCSV: http://opencsv.sourceforge.net/
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;

import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;

public class OpenCSVTest {
    public static void main(String[] args) {
        StringWriter target = new StringWriter();
        try (CSVReader reader = new CSVReader(new StringReader("my,test"));
             CSVWriter writer = new CSVWriter(target)) {
            for (String[] line : reader) {
                String[] added = Arrays.copyOf(line, line.length + 1);
                added[added.length - 1] = "addition";
                writer.writeNext(added);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(target.toString());
    }
}
You can replace the StringReader and StringWriter with your input and output. The reader will then iterate through each line in your csv for you. You can make a copy of the original line and add your new column and write it out into the target.
Breakdown of the code:
try (CSVReader reader = new CSVReader(new StringReader("my,test"));
     CSVWriter writer = new CSVWriter(target)) { ... }
This is a try-with-resources block. It makes sure that the reader and writer are closed after they are done.
for (String[] line : reader) {
    String[] added = Arrays.copyOf(line, line.length + 1);
    added[added.length - 1] = "addition";
    writer.writeNext(added);
}
This is a standard for loop. The CSVReader implements the Iterable interface, which enables us to use it in the for loop directly.
Arrays.copyOf(line, line.length + 1); creates a copy of the array passed to it, but with a new size. Because you want to add a column, we make a copy of the original line my,test with one extra slot at the end, and then assign the new value to that slot: added[added.length-1] = "addition";
Finally, we just pass that to the writer which will then correctly write the values into the target, in this case a StringWriter, in your case likely a file.
I tried to create a sample custom processor that reads lines, makes some changes to the input lines, and processes them into the flow file.
This is my code to read the flow file:
// a local variable captured by an anonymous class must be effectively final,
// so a one-element array is used here as a mutable holder
final String[] inputRowHolder = new String[1];
session.read(flowFile, new InputStreamCallback() {
    @Override
    public void process(InputStream in) throws IOException {
        inputRowHolder[0] = IOUtils.toString(in);
    }
});
String inputRow = inputRowHolder[0];
I based that code on the example at the reference below.
http://www.nifi.rocks/developing-a-custom-apache-nifi-processor-json/
After reading the lines, I am not able to split them on the line-feed character. The upstream connection for my processor yields the sample input below.
My sample input lines:
No,Name,value
1,Si,21
2,LI,321
3,Ji,11
The lines above are stored in "inputRow", and I am using the code below to split them on '\n':
String[] splits = inputRow.split("\n");
I have tried both '\n' and '\r\n' to split the lines, but neither worked. Can anyone guide me to split the lines into the expected output below?
splits[0]=No,Name,value
splits[1]=1,Si,21
splits[2]=2,LI,321
splits[3]=3,Ji,11
Any help appreciated.
As mentioned in another answer, you should be able to use a BufferedReader to read line-by-line. You should also avoid loading the entire contents of the flow file into memory whenever possible.
Imagine that this NiFi processor is processing 1GB CSV files and that there could be 2-3 files processed concurrently. If you read the whole flow file content into memory, you will hit out-of-memory if you have less than 3GB of heap allocated to the JVM. If you stream each file line-by-line you would only have 2-3 lines in memory at one time and would need very little overall memory.
The following snippet shows how you could read in a line, process it, and write it out, without ever having the whole content in memory:
flowFile = session.write(flowFile, new StreamCallback() {
    @Override
    public void process(InputStream in, OutputStream out) throws IOException {
        try (InputStreamReader inReader = new InputStreamReader(in);
             BufferedReader reader = new BufferedReader(inReader);
             OutputStreamWriter outWriter = new OutputStreamWriter(out);
             BufferedWriter writer = new BufferedWriter(outWriter)) {
            String line = reader.readLine();
            while (line != null) {
                line = process(line); // process(String) is a placeholder for your per-line logic
                writer.write(line);
                writer.newLine();
                line = reader.readLine();
            }
        }
    }
});
You can use this regex for splitting: \\r?\\n.
String[] splits = inputRow.split("\\r?\\n");
Why push everything into a single string? Just read the content line by line and push those lines into a List right there:
List<String> inputRows = new ArrayList<>();
...
and within your callback you use a BufferedReader like this:
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
String line;
while ((line = reader.readLine()) != null) {
    inputRows.add(line);
}
My problem is to read non-primes from a txt file and write their prime factors back to the same file.
I don't actually know how BufferedReader works. From my understanding, I am trying to read the file data into a buffer (8 KB) and write the prime factors to the file (by creating a new one).
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;

class PS_Task2 {
    public static void main(String[] args) {
        String line = null;
        int x;
        try {
            FileReader file2 = new FileReader("nonprimes.txt");
            BufferedReader buff2 = new BufferedReader(file2);
            File file1 = new File("nonprimes.txt");
            file1.createNewFile();
            PrintWriter d = new PrintWriter(file1);
            while ((line = buff2.readLine()) != null) {
                x = Integer.parseInt(line);
                d.printf("%d--> ", x);
                while (x % 2 == 0) {
                    d.flush();
                    d.print("2" + "*");
                    x = x / 2;
                }
                for (int i = 3; i <= Math.sqrt(x); i = i + 2) {
                    while (x % i == 0) {
                        d.flush();
                        d.printf("%d*", i);
                        x = x / i;
                    }
                }
                if (x > 2) {
                    d.flush();
                    d.printf("%d ", x);
                }
                d.flush(); // FLUSHING THE STREAM TO FILE
                d.println("\n");
            }
            d.close(); // CLOSING FILE
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Feel free to give a detailed explanation. :D Thanks, ~anirudh
Reading from and writing to a file in Java doesn't edit the file in place; opening it for writing clears the old content and creates a new file. There are many approaches: for example, get your data, modify it, keep it in memory in a StringBuilder or a collection or whatever, and then re-write it.
Well, I created fileOne.txt containing the following data:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
and I want to multiply all those numbers by 10, then re-write them again:
public static void main(String [] args) throws Exception{ // just for the example
// locate the file
File fileOne = new File("fileOne.txt");
FileReader inputStream = new FileReader(fileOne);
BufferedReader reader = new BufferedReader(inputStream);
// create a LinkedList to hold the data read
List<Integer> numbers = new LinkedList<Integer>();
// prepare variables to refer to the temporary objects
String line = null;
int number = 0;
// start reading
do{
// read each line
line = reader.readLine();
// check if the read data is not null, so not to use null values
if(line != null){
number = Integer.parseInt(line);
numbers.add(number*10);
}
}while(line != null);
// free resources
reader.close();
// check the new numbers before writing to file
System.out.println("NEW NUMBERS IN MEMORY : "+numbers);
// assign a printer
PrintWriter writer = new PrintWriter(fileOne);
// write down data
for(int newNumber : numbers){
writer.println(newNumber);
}
// free resources
writer.flush();
writer.close();
}
This approach is not very good when dealing with massive data, though.
As per your problem statement, you need to take input from a file, do some processing, and write the processed data back to the same file. For this, please note the points below; a short sketch follows the list.
You may not create a second file with the same name in the same directory, so you must either create the new file at some other location, or write the content to a different file and then rename it after deleting the original.
While your file is open for reading, modifying the same file is not a good idea. You could use the approach below:
Read the content of the file and store it in a data structure like an array or an ArrayList.
Close the file.
Process the data stored in the data structure.
Open the file in write mode (overwrite mode rather than append mode).
Write the processed data back into the file.
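A minimal sketch of that sequence for the problem above, assuming one integer per line in nonprimes.txt (the factoring helper is just for the example):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class FactorsRewrite {
    public static void main(String[] args) throws IOException {
        Path file = Paths.get("nonprimes.txt");

        // steps 1-2: read everything into memory; the file is closed afterwards
        List<String> numbers = Files.readAllLines(file);

        // step 3: process the data while no stream is open on the file
        List<String> output = new ArrayList<>();
        for (String line : numbers) {
            output.add(line + "--> " + factorize(Integer.parseInt(line.trim())));
        }

        // steps 4-5: reopen in overwrite mode and write the processed data back
        Files.write(file, output); // truncates and overwrites by default
    }

    static String factorize(int x) {
        List<String> factors = new ArrayList<>();
        for (int f = 2; (long) f * f <= x; f++) {
            while (x % f == 0) {
                factors.add(String.valueOf(f));
                x /= f;
            }
        }
        if (x > 1) {
            factors.add(String.valueOf(x));
        }
        return String.join("*", factors);
    }
}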
I have a text file that I want to edit using Java. It has many thousands of lines. I basically want to iterate through the lines and change/edit/delete some text. This will need to happen quite often.
From the solutions I saw on other sites, the general approach seems to be:
Open the existing file using a BufferedReader
Read each line, make modifications to each line, and add it to a StringBuilder
Once all the text has been read and modified, write the contents of the StringBuilder to a new file
Replace the old file with the new file
This solution seems slightly "hacky" to me, especially if I have thousands of lines in my text file.
Anybody know of a better solution?
I haven't done this in Java recently, but reading an entire file into memory seems like a bad idea.
The best idea that I can come up with is to open a temporary file in write mode at the same time, and for each line, read it, modify it if necessary, then write it into the temporary file. At the end, delete the original and rename the temporary file.
If you have modify permissions on the file system, you probably also have deleting and renaming permissions.
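A minimal sketch of that idea; the file name and the per-line edit rule are placeholders:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class EditViaTempFile {
    public static void main(String[] args) throws IOException {
        Path original = Paths.get("data.txt"); // hypothetical file name
        Path temp = Files.createTempFile(original.toAbsolutePath().getParent(), "edit", ".tmp");

        try (BufferedReader reader = Files.newBufferedReader(original);
             BufferedWriter writer = Files.newBufferedWriter(temp)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String modified = line.replace("foo", "bar"); // placeholder edit rule
                if (!modified.isEmpty()) { // delete lines by simply not writing them
                    writer.write(modified);
                    writer.newLine();
                }
            }
        }
        // replace the original with the edited copy (atomic where the OS supports it)
        Files.move(temp, original, StandardCopyOption.REPLACE_EXISTING);
    }
}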
If the file is just a few thousand lines, you should be able to read the entire file in one read and convert that to a String.
You can use Apache Commons IO's IOUtils, or hand-rolled helpers like the following.
public static String readFile(String filename) throws IOException {
    File file = new File(filename);
    int len = (int) file.length();
    byte[] bytes = new byte[len];
    FileInputStream fis = null;
    try {
        fis = new FileInputStream(file);
        int read = fis.read(bytes);
        assert len == read;
    } finally {
        close(fis); // close on the success path as well as on failure
    }
    return new String(bytes, "UTF-8");
}

public static void writeFile(String filename, String text) throws IOException {
    FileOutputStream fos = null;
    try {
        fos = new FileOutputStream(filename);
        fos.write(text.getBytes("UTF-8"));
    } finally {
        close(fos);
    }
}

public static void close(Closeable closeable) {
    if (closeable == null) {
        return;
    }
    try {
        closeable.close();
    } catch (IOException ignored) {
    }
}
You can use RandomAccessFile in Java to modify the file, on one condition:
The size of each line has to be fixed; otherwise, when a new string is written back, it might overwrite the string in the next line.
Therefore, in my example, I set the line length to 100 and pad with spaces when creating the file and when writing back to it.
So, in order to allow updates, you need to set the line length a little larger than the longest line in the file.
public class RandomAccessFileUtil {

    public static final long RECORD_LENGTH = 100;
    public static final String EMPTY_STRING = " ";
    public static final String CRLF = "\n";
    public static final String PATHNAME = "/home/mjiang/JM/mahtew.txt";

    /**
     * Example file content:
     *
     *   one two three
     *   Text to be appended with
     *   five six seven
     *   eight nine ten
     *
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        String starPrefix = "Text to be appended with";
        String replacedString = "new text has been appended";
        RandomAccessFile file = new RandomAccessFile(new File(PATHNAME), "rw");
        String line = "";
        while ((line = file.readLine()) != null) {
            if (line.startsWith(starPrefix)) {
                file.seek(file.getFilePointer() - RECORD_LENGTH - 1);
                file.writeBytes(replacedString);
            }
        }
    }

    public static void createFile() throws IOException {
        RandomAccessFile file = new RandomAccessFile(new File(PATHNAME), "rw");
        String line1 = "one two three";
        String line2 = "Text to be appended with";
        String line3 = "five six seven";
        String line4 = "eight nine ten";
        file.writeBytes(paddingRight(line1));
        file.writeBytes(CRLF);
        file.writeBytes(paddingRight(line2));
        file.writeBytes(CRLF);
        file.writeBytes(paddingRight(line3));
        file.writeBytes(CRLF);
        file.writeBytes(paddingRight(line4));
        file.writeBytes(CRLF);
        file.close();
        System.out.println(String.format("File is created in [%s]", PATHNAME));
    }

    public static String paddingRight(String source) {
        StringBuilder result = new StringBuilder(100);
        if (source != null) {
            result.append(source);
            for (int i = 0; i < RECORD_LENGTH - source.length(); i++) {
                result.append(EMPTY_STRING);
            }
        }
        return result.toString();
    }
}
If the file is large, you might want to stream the output instead (for example, with a FileOutputStream), but otherwise this is pretty much the simplest way to do what you're asking (and without more specificity, i.e. on what types of changes / edits / deletions you're trying to do, it's impossible to determine what more complicated way might work).
No reason to buffer the entire file.
Simply write each line as you read it; insert lines when necessary, delete lines when necessary, replace lines when necessary.
Fundamentally, you will not get around having to recreate the file wholesale, especially if it's just a text file.
What kind of data is it? Do you control the format of the file?
If the file contains name/value pairs (or similar), you could have some luck with Properties, or perhaps cobbling together something using a flat file JDBC driver.
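If it really is name/value pairs, here is a minimal sketch with java.util.Properties (the file name and keys are made up for the example):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

public class PropsEdit {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream("settings.properties")) {
            props.load(in); // read all name/value pairs into memory
        }
        props.setProperty("lastRun", "2009-06-01"); // edit in memory
        try (FileOutputStream out = new FileOutputStream("settings.properties")) {
            props.store(out, "updated by PropsEdit"); // rewrites the whole file
        }
    }
}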
Alternatively, have you considered not writing the data so often? Operating on an in-memory copy of your file should be relatively trivial. If there are no external resources which need real time updates of the file, then there is no need to go to disk every time you want to make a modification. You can run a scheduled task to write periodic updates to disk if you are worried about data backup.
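A minimal sketch of that periodic-flush idea using a ScheduledExecutorService; the in-memory copy and the save action are hypothetical stand-ins:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicFlush {
    public static void main(String[] args) {
        StringBuilder inMemoryCopy = new StringBuilder("file contents"); // hypothetical in-memory copy
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // every 30 seconds (interval illustrative), write the in-memory copy back to disk;
        // the println stands in for the actual save-to-file call
        scheduler.scheduleAtFixedRate(
                () -> System.out.println("saving " + inMemoryCopy.length() + " chars"),
                30, 30, TimeUnit.SECONDS);
    }
}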
In general you cannot edit the file in place; it's simply a very long sequence of characters, which happens to include newline characters. You could edit in place if your changes don't change the number of characters in each line.
Can't you use regular expressions, if you know what you want to change? Jakarta Regexp should probably do the trick.
Although this question was posted a while ago, I think it is good to put my answer here.
I think that the best approach in this scenario is to use FileChannel from the java.nio.channels package, but only if you need good performance! You would need to get a FileChannel via a RandomAccessFile, like this:
java.nio.channels.FileChannel channel = new java.io.RandomAccessFile("/my/fyle/path", "rw").getChannel();
After this, you need to create a ByteBuffer which you will read into from the FileChannel.
That looks something like this:
java.nio.ByteBuffer inBuffer = java.nio.ByteBuffer.allocate(100);
int pos = 0;
int aux = 0;
byte[] b;
StringBuilder sb = new StringBuilder();
while (pos != -1) {
    aux = channel.read(inBuffer, pos);
    pos = (aux != -1) ? pos + aux : -1;
    if (aux > 0) {
        b = inBuffer.array();
        sb.delete(0, sb.length());
        for (int i = 0; i < aux; ++i) { // only the bytes actually read are valid
            sb.append((char) b[i]);
        }
        // here you can do your stuff on sb
    }
    inBuffer = java.nio.ByteBuffer.allocate(100);
}
Hope that my answer will help you!
I think FileOutputStream.getChannel() will help a lot; see the FileChannel API:
http://java.sun.com/javase/6/docs/api/java/nio/channels/FileChannel.html
private static void modifyFile(String filePath, String oldString, String newString) {
    File fileToBeModified = new File(filePath);
    StringBuilder oldContent = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(new FileReader(fileToBeModified))) {
        String line = reader.readLine();
        while (line != null) {
            oldContent.append(line).append(System.lineSeparator());
            line = reader.readLine();
        }
        String content = oldContent.toString();
        // note: replaceAll treats oldString as a regex; use replace() for a literal match
        String newContent = content.replaceAll(oldString, newString);
        try (FileWriter writer = new FileWriter(fileToBeModified)) {
            writer.write(newContent);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
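A hypothetical call, with the path and strings made up for the example:
modifyFile("/tmp/data.txt", "foo", "bar");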
You can change the txt file to Java by clicking "Save As" and saving it with a *.java extension.