I have 13 .xlsx files with about 1000 rows in each of them. Now I want to merge them into one .xlsx file with one sheet. I'm using the code from here:
https://blog.sodhanalibrary.com/2014/11/merge-excel-files-using-java.html#.Vi9ns36rSUk.
Here's my code (a few changes; the addSheet method is unchanged):
try {
    FileInputStream excellFile1 = new FileInputStream(new File("tmp_testOut1000.xlsx"));
    XSSFWorkbook workbook1 = new XSSFWorkbook(excellFile1);
    XSSFSheet sheet1 = workbook1.getSheetAt(0);

    for (int i = 2; i < 14; i++) {
        FileInputStream excellFile2 = new FileInputStream(new File("tmp_testOut" + i + "000.xlsx"));
        XSSFWorkbook workbook2 = new XSSFWorkbook(excellFile2);
        XSSFSheet sheet2 = workbook2.getSheetAt(0);
        System.out.println("add " + i);
        addSheet(sheet1, sheet2);
    }
    excellFile1.close();

    // save merged file
    System.out.println("merging");
    File mergedFile = new File("merged.xlsx");
    if (!mergedFile.exists()) {
        mergedFile.createNewFile();
    }
    FileOutputStream out = new FileOutputStream(mergedFile);
    System.out.println("write");
    workbook1.write(out);
    out.close();
    System.out.println("Files were merged succussfully");
} catch (Exception e) {
    e.printStackTrace();
}
All files load and merge, but after the "write" sysout I get:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.xmlbeans.impl.store.Xobj.new_cursor(Xobj.java:1829)
at org.apache.xmlbeans.impl.values.XmlObjectBase.newCursor(XmlObjectBase.java:293)
at org.apache.xmlbeans.impl.values.XmlComplexContentImpl.arraySetterHelper(XmlComplexContentImpl.java:1151)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTFontsImpl.setFontArray(Unknown Source)
at org.apache.poi.xssf.model.StylesTable.writeTo(StylesTable.java:424)
at org.apache.poi.xssf.model.StylesTable.commit(StylesTable.java:496)
at org.apache.poi.POIXMLDocumentPart.onSave(POIXMLDocumentPart.java:341)
at org.apache.poi.POIXMLDocumentPart.onSave(POIXMLDocumentPart.java:345)
at org.apache.poi.POIXMLDocument.write(POIXMLDocument.java:206)
at Start.main(Start.java:275)
What can I do? Why is this happening, and how can I prevent it?
POI is notoriously memory-hungry, so running out of memory is not uncommon when handling large Excel files.
Since you are able to load all the original files and only run into trouble when writing the merged file, you could try using an SXSSFWorkbook instead of an XSSFWorkbook and do regular flushes after adding a certain amount of content (see the POI documentation of the org.apache.poi.xssf.streaming package). That way you do not have to keep the whole generated file in memory, only small portions of it.
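Roughly, the merge could then look like the sketch below. This is not the addSheet code from the linked blog post, just a minimal illustration of the SXSSF idea: the target workbook keeps only a window of 100 rows in memory and flushes the rest to temporary files. It assumes POI 4.x (getCellType() returning a CellType enum) and that the cells contain only plain values; adjust the switch for formulas, styles, etc.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class StreamingMerge {
    public static void main(String[] args) throws Exception {
        // keep at most 100 rows in memory; older rows are flushed to temp files on disk
        try (SXSSFWorkbook target = new SXSSFWorkbook(100);
             FileOutputStream out = new FileOutputStream("merged.xlsx")) {
            Sheet targetSheet = target.createSheet("merged");
            int nextRow = 0;
            for (int i = 1; i <= 13; i++) {
                try (FileInputStream in = new FileInputStream(new File("tmp_testOut" + i + "000.xlsx"));
                     XSSFWorkbook source = new XSSFWorkbook(in)) {
                    for (Row sourceRow : source.getSheetAt(0)) {
                        Row targetRow = targetSheet.createRow(nextRow++);
                        for (Cell sourceCell : sourceRow) {
                            Cell targetCell = targetRow.createCell(sourceCell.getColumnIndex());
                            switch (sourceCell.getCellType()) {
                                case NUMERIC: targetCell.setCellValue(sourceCell.getNumericCellValue()); break;
                                case BOOLEAN: targetCell.setCellValue(sourceCell.getBooleanCellValue()); break;
                                default:      targetCell.setCellValue(sourceCell.toString());            break;
                            }
                        }
                    }
                }
            }
            target.write(out);
            target.dispose(); // delete the temporary files backing the streaming workbook
        }
    }
}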
Try allocating more memory, e.g.
java -Xmx8192m
You can also try merging in one xlsx file at a time instead of loading them all at once.
You can also move this line into your for loop:
excellFile1.close();
so that it is closed right away.
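A quick sketch of those two suggestions combined, using the file names from the question: each source workbook is opened, merged and closed inside the loop, so only one source file is held open at a time (in the original loop, the stream opened on each iteration, excellFile2, is the one that benefits most from being closed right away).

// Sketch only: merge one source file at a time and close it immediately afterwards.
for (int i = 2; i < 14; i++) {
    try (FileInputStream excellFile2 = new FileInputStream(new File("tmp_testOut" + i + "000.xlsx"));
         XSSFWorkbook workbook2 = new XSSFWorkbook(excellFile2)) {
        addSheet(sheet1, workbook2.getSheetAt(0)); // addSheet(...) as in the question
    } // workbook2 and excellFile2 are closed here, before the next file is opened
}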
This issue occurs for the following reason:
The java.lang.OutOfMemoryError: GC overhead limit exceeded error is the JVM's way of signalling that your application spends too much time doing garbage collection with too little result. By default the JVM throws this error if it spends more than 98% of the total time doing GC and less than 2% of the heap is recovered afterwards.
If you just want to ignore this check, you can set the following VM option:
-XX:-UseGCOverheadLimit
Refer to the link on GC overhead for more information.
You can also use the switches below to assign more heap memory to your application. Run a pilot of your application for some time to identify how much memory it actually needs:
-Xms128m -Xmx512m (these switches set the initial heap size to 128 MB and the maximum heap size to 512 MB)
If you can avoid the convenient but memory-hungry workbook APIs, work instead with the streaming logic that processes the data row by row, which is much more memory-efficient.
In particular, pay attention to the use of XSSFReader.SheetIterator for looping over the sheets, and take a good look at the XSSFSheetXMLHandler API for processing the rows within a sheet.
See the code in this project:
https://github.com/jeevatkm/excelReader/blob/master/src/main/java/com/myjeeva/poi/ExcelReader.java
You define how you want to process each row by creating your own SheetContentsHandler (a minimal sketch follows after the readSheet method below).
This is quite like SAX parsing; it will barely touch your RAM.
private void readSheet(StylesTable styles, ReadOnlySharedStringsTable sharedStringsTable,
        InputStream sheetInputStream) throws IOException, ParserConfigurationException, SAXException {
    SAXParserFactory saxFactory = SAXParserFactory.newInstance();
    XMLReader sheetParser = saxFactory.newSAXParser().getXMLReader();
    ContentHandler handler =
            new XSSFSheetXMLHandler(styles, sharedStringsTable, sheetContentsHandler, true);
    sheetParser.setContentHandler(handler);
    sheetParser.parse(new InputSource(sheetInputStream));
}
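The sheetContentsHandler passed in above is where your own row processing goes. Below is a minimal, hypothetical implementation (the class name PrintingSheetHandler is made up) that just prints each row as it streams past, so no row is kept in memory longer than necessary:

import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;

public class PrintingSheetHandler implements SheetContentsHandler {
    private final StringBuilder currentRow = new StringBuilder();

    @Override
    public void startRow(int rowNum) {
        currentRow.setLength(0); // start collecting a fresh row
    }

    @Override
    public void cell(String cellReference, String formattedValue, XSSFComment comment) {
        currentRow.append(cellReference).append('=').append(formattedValue).append(' ');
    }

    @Override
    public void endRow(int rowNum) {
        System.out.println("row " + rowNum + ": " + currentRow);
    }

    @Override
    public void headerFooter(String text, boolean isHeader, String tagName) {
        // headers and footers are not needed for this example
    }
}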
Related
I am trying to instantiate a FileReader, giving it a String path to a file, inside a while loop.
The problem is that after 30 files I get the error
java.lang.OutOfMemoryError: Java heap space
The path is dynamically given to the FileReaders inside the loop:
DocumentBuilder doc = DocumentBuilderFactory.newInstance().newDocumentBuilder();
while (Index < Events.size()) {
    String FileName = getFileName(canal, Date, time);
    Doc = doc.parse(new File(Deposit + File.separator + FileName));
    FileReader fR = new FileReader(Deposit + File.separator + FileName);
    PlaylistString = FileCopyUtils.copyToString(fR);
    if (Doc != null && PlaylistString.contains(some_text)) {
        // Some treatment **********
    }
}
Here the FileName changes on each iteration.
I want to find a way to avoid the out-of-memory error.
I tried fR.close() and System.gc() but I still get the error.
Does anyone have an idea?
Something is holding a reference to the FileReader objects or to another object. (BTW: you have to close the FileReader; look at try-with-resources for easier handling.) It is not necessarily the FileReader that is the problem; it could also be another object that is retained (referenced) inside your program.
Try a profiler like VisualVM (or YourKit, which is not free) to track down memory leaks.
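A minimal sketch of the try-with-resources suggestion, assuming Spring's FileCopyUtils as in the question; the helper method name readPlaylist is made up. The reader is closed automatically at the end of the block, even if copyToString(...) throws:

import java.io.FileReader;
import java.io.IOException;
import org.springframework.util.FileCopyUtils;

public class PlaylistFiles {
    // Hypothetical helper: read one file into a String and always close the reader,
    // even if copyToString(...) throws.
    public static String readPlaylist(String path) throws IOException {
        try (FileReader reader = new FileReader(path)) {
            return FileCopyUtils.copyToString(reader);
        }
    }
}

Inside the while loop you would then call readPlaylist(Deposit + File.separator + FileName) instead of creating the FileReader by hand.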
I am trying to create an mxGraph and an image from that graph in Java. Below is the code that creates the image from the mxGraph.
BufferedImage image = mxCellRenderer.createBufferedImage(graph,
        null, 1, Color.WHITE, graphComponent.isAntiAlias(), null,
        graphComponent.getCanvas());

// Creates the URL-encoded XML data
mxCodec codec = new mxCodec();
String xml = URLEncoder.encode(mxXmlUtils.getXml(codec.encode(graph.getModel())), "UTF-8");
mxPngEncodeParam param = mxPngEncodeParam.getDefaultEncodeParam(image);
param.setCompressedText(new String[] { "mxGraphModel", xml });

// Saves as a PNG file
outputStream = new FileOutputStream(new File("graph.jpg"));
ImageIO.write(image, "jpg", outputStream);
outputStream.close();
image = null;
I am using a hierarchical layout in the graph.
But I am getting the out-of-memory error when creating the image for larger graphs.
How can I get rid of this memory issue? Is there any alternative way to solve this problem, apart from increasing the heap size?
See this post here:
http://forum.jgraph.com/questions/5408/save-as-png-detect-out-of-memory
especially the bottom part. There is a check in JGraphX which determines whether there is enough memory. That check is wrong: there may appear to be too little memory simply because the GC hasn't run yet. If the GC ran, memory would be freed and the createBufferedImage method could succeed. So instead of checking the free memory, the allocation should simply have been attempted in a try { ... } catch (Error err) { ... } block.
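A caller-side workaround sketch along those lines (the post argues the real fix belongs inside JGraphX itself): attempt the rendering and, if an OutOfMemoryError is reported, give the GC a chance to run and retry once. The variable names graph and graphComponent are the ones from the question.

BufferedImage image;
try {
    image = mxCellRenderer.createBufferedImage(graph, null, 1, Color.WHITE,
            graphComponent.isAntiAlias(), null, graphComponent.getCanvas());
} catch (OutOfMemoryError err) {
    // free buffers left over from earlier attempts, then try once more
    System.gc();
    image = mxCellRenderer.createBufferedImage(graph, null, 1, Color.WHITE,
            graphComponent.isAntiAlias(), null, graphComponent.getCanvas());
}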
I need to know how applications like BareTail or BareGrep can handle such large text files without a huge memory footprint.
I am trying to do something similar in Java as in this question:
Viewing large log files in JavaFX in a ListView
But when I handle large text log files (900 MB up to 2.5 GB of text) I run into issues: the JVM memory size increases dramatically when I read the files.
One other approach would be to retrieve only the lines I am interested in, but I am not aware of a way to do this in Java. I have to start reading line by line until I get to the required line (say line 1000) and then grab hold of that text, but in doing so I have 999 lines in memory waiting to be GC'd.
BareGrep, for instance, scans multiple files in a folder looking for a pattern. If I open the task manager I can hardly see the memory footprint growing. What kind of technique or way of scanning are these programs using?
Is there a technology out there that I can use in my application to handle large text files?
I might add that my log files are generated by a Java application and the length of each line is not the same.
One correction... by memory footprint I mean I cannot read a 6 GB file into memory: if I specify a small VM size with -Xmx, the application runs out of memory when reading the 6 GB file.
Here are two ways I tried to read the text from the 758 MB log file.
Method 1
@FXML
private void handleButtonAction(ActionEvent event) {
    final String fileName = "D:/Development/Logs/File1.log";
    try {
        BufferedReader in = new BufferedReader(new FileReader(fileName));
        while (in.ready()) {
            String s = in.readLine();
        }
        in.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Method 2
@FXML
private void handleButtonAction(ActionEvent event) {
    final String fileName = "D:/Development/Logs/File1.log";
    Scanner scan = null;
    try {
        File file = new File(fileName);
        if (!file.exists()) {
            return;
        }
        scan = new Scanner(file);
        long start = System.nanoTime();
        while (scan.hasNextLine()) {
            final String line = scan.nextLine();
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        scan.close();
    }
}
I think "MemoryMappedFile" is what you are looking for.
I found some links to help you:
http://www.linuxtopia.org/online_books/programming_books/thinking_in_java/TIJ314_029.htm
http://javarevisited.blogspot.de/2012/01/memorymapped-file-and-io-in-java.html
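A minimal sketch of what the memory-mapped approach from those links looks like, using the log path from the question: the file content is mapped into the process address space instead of being copied onto the Java heap, so scanning it does not blow up -Xmx. Note that a single MappedByteBuffer is limited to 2 GB, so larger files have to be mapped in several regions; this sketch maps only the first region and simply counts lines.

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedScan {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("D:/Development/Logs/File1.log", "r");
             FileChannel channel = raf.getChannel()) {
            long size = Math.min(channel.size(), Integer.MAX_VALUE); // first region only, for brevity
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            long lines = 0;
            while (buffer.hasRemaining()) {
                if (buffer.get() == '\n') {
                    lines++;
                }
            }
            System.out.println("lines in mapped region: " + lines);
        }
    }
}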
Both of the applications you mentioned may "handle" large files, but they don't actually need to load entire files into memory. The first one sounds like it seeks directly to the end of the file, while the second operates on a line-by-line basis.
It is possible they use native code via JNI to achieve the low memory use.
Edit: In fact, they look to be purely C or C++ applications, so they don't need to wait for GC like Java applications do.
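For the "seek to the end" idea, here is a sketch of how a tail-style viewer can grab just the last chunk of a huge file without streaming the whole thing through the heap; the 64 KB chunk size is an arbitrary choice for illustration:

import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class TailChunk {
    public static void main(String[] args) throws Exception {
        final int chunkSize = 64 * 1024;
        try (RandomAccessFile raf = new RandomAccessFile("D:/Development/Logs/File1.log", "r")) {
            long start = Math.max(0, raf.length() - chunkSize);
            raf.seek(start); // jump near the end; nothing before this offset is ever read
            byte[] tail = new byte[(int) (raf.length() - start)];
            raf.readFully(tail);
            System.out.print(new String(tail, StandardCharsets.UTF_8));
        }
    }
}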
I'm trying to export some files from a system and save them to my drive. The problem is that some files are pretty big and I get the Java out-of-memory error.
FileOutputStream fileoutstream = new FileOutputStream(filenameExtension);
fileoutstream.write(dataManagement.getContent(0).getData());
fileoutstream.flush();
fileoutstream.close();
Any recommendation I can try? I added the flush but there is no difference. This calls the export method, generates the file and saves it. I'm using a cursor, not an array, to run over the data that I'm exporting. I tried to add more memory but the files are too big.
You are loading the whole file into memory before writing it. Instead you should:
load only a chunk of data
write it
repeat the steps above until you have processed all the data
If the files are really big, you may need to read/write them in chunks. If the files are small enough to fit in memory, you can increase the size of the virtual machine memory,
e.g.:
java -Xmx512M ...
FileInputStream fi = new FileInputStream(infile);    // infile: the source file
FileOutputStream fo = new FileOutputStream(outfile); // outfile: the destination file
byte[] buffer = new byte[5000];
int n;
// copy in 5000-byte chunks so only one small buffer is ever held in memory
while ((n = fi.read(buffer)) > 0) {
    fo.write(buffer, 0, n);
}
fi.close();
fo.close();
Hope this helps you get the idea.
You can use the Spring Batch framework to do the reading and writing of the file in chunks:
http://static.springsource.org/spring-batch/
I have the code to copy a file to another location.
public static void copyFile(String sourceDest, String newDest) throws IOException {
    File sourceFile = new File(sourceDest);
    File destFile = new File(newDest);
    if (!destFile.exists()) {
        destFile.createNewFile();
    }
    FileChannel source = null;
    FileChannel destination = null;
    try {
        source = new FileInputStream(sourceFile).getChannel();
        destination = new FileOutputStream(destFile).getChannel();
        destination.transferFrom(source, 0, source.size());
    } finally {
        if (source != null) {
            source.close();
        }
        if (destination != null) {
            destination.close();
        }
    }
}
While copying small files, say 300-400 MB, everything works like magic, but when I tried to copy a file of 1.5 GB it failed. The stack trace is:
run:
12.01.2011 11:16:36 FileCopier main
SEVERE: Exception occured while copying file. Try again.
java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:748)
at sun.nio.ch.FileChannelImpl.transferFromFileChannel(FileChannelImpl.java:527)
at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:590)
at FileCopier.copyFile(FileCopier.java:64)
at FileCopier.main(FileCopier.java:27)
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745)
... 4 more
BUILD SUCCESSFUL (total time: 0 seconds)
I haven't worked with NIO closely. Could you please help me out? Thank you so much in advance.
I think you might have been hit by an old bug that was encountered some time ago. I was not trying to copy a file but rather to seek through a memory-mapped file, which failed as well. For me the workaround was to seek through the file in a loop and request the GC and finalizers to run every now and then.
The memory-mapped ByteBuffers release their mapping in the finalizer, which makes room for new mappings. This is very ugly, but at least it works. Let's hope they do something about this in an upcoming NIO iteration.
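Adapted to the copy case, the workaround could look like the sketch below: instead of one huge transferFrom(...) call (which maps the whole source file at once), transfer fixed-size slices and occasionally ask the GC and finalizers to run so old mappings can be released. The 64 MB slice size is arbitrary; source and destination are the FileChannels from the question's copyFile method.

long size = source.size();
long position = 0;
final long slice = 64L * 1024 * 1024; // 64 MB per transfer, so each internal mapping stays small
while (position < size) {
    position += destination.transferFrom(source, position, Math.min(slice, size - position));
    System.gc();              // hints only: encourage the mapped buffers to be finalized
    System.runFinalization(); // and their mappings to be released
}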
You are memory-mapping the file, but there is a limited memory address space in a 32-bit JVM (which I presume you are using), so the map method fails. I don't think you can map much more than 1.3-1.4 GB of disk data. What heap size are you using?
You can try reducing your heap size or using a 64-bit JRE. Alternatively, don't read the file by mapping it into memory with NIO; instead, use the traditional approach of buffered reads and writes to copy the data from one file to the other.
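A sketch of that non-NIO alternative, using buffered byte streams (rather than Reader/Writer, since the copied file may be binary): it never maps the file and never needs more heap than its small buffer, regardless of the file size.

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class BufferedCopy {
    public static void copyFile(String sourceDest, String newDest) throws IOException {
        try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(sourceDest));
             BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(newDest))) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read); // copy one small buffer at a time
            }
        }
    }
}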