Instantiate FileReader inside a loop without getting a Java heap space error

I am trying to instantiate a FileReader, giving it a String path to a file, inside a while loop. The problem is that after 30 files I get the error:

java.lang.OutOfMemoryError: Java heap space

The path is given dynamically to the FileReaders inside the loop:
DocumentBuilder doc = DocumentBuilderFactory.newInstance().newDocumentBuilder();
while (Index < Events.size()) {
    String FileName = getFileName(canal, Date, time);
    Doc = doc.parse(new File(Deposit + File.separator + FileName));
    FileReader fR = new FileReader(Deposit + File.separator + FileName);
    PlaylistString = FileCopyUtils.copyToString(fR);
    if (Doc != null && PlaylistString.contains(some_text)) {
        // Some treatment **********
    }
}
Here FileName changes on each iteration. I want to find a way to avoid the out-of-memory error. I tried fR.close() and System.gc(), but I still get the error. Does anyone have an idea?

Something is holding a reference to the FileReader objects or to another object. (By the way: you have to close the FileReader; look at try-with-resources for easier handling, as in the sketch below.) It is not necessarily the FileReader that is the problem; it could also be another object which is retained (referenced) inside your program.
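As a sketch, reusing the question's variable names Deposit and FileName but reading with plain JDK I/O instead of Spring's FileCopyUtils, the per-iteration read could look like this, so the reader is closed deterministically even when an exception is thrown:

String playlistString;
try (BufferedReader reader = new BufferedReader(
        new FileReader(Deposit + File.separator + FileName))) {
    // Read the whole playlist file; the reader is closed on exit
    // from the try block, success or failure.
    StringBuilder sb = new StringBuilder();
    String line;
    while ((line = reader.readLine()) != null) {
        sb.append(line).append('\n');
    }
    playlistString = sb.toString();
}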
Try a profiler like VisualVM (or YourKit, which is not free) to track down memory leaks.

How to build a jar with a resource file using IntelliJ 2020, the terminal, or Maven plugins

Here is my simple app that reads a resource file; it works fine:
public class App {
    public static void main(String[] args) throws IOException {
        BufferedReader bufferedReader =
                new BufferedReader(
                        new FileReader(
                                Objects.requireNonNull(
                                        Objects.requireNonNull(
                                                App.class.getClassLoader().getResource("file.txt")
                                        ).getFile()
                                )
                        )
                );
        String line = "";
        while ((line = bufferedReader.readLine()) != null) {
            System.out.println(line);
        }
    }
}
I want to make an executable jar file that includes the resource file. I follow this. Unfortunately, when I run the jar it can't find the file. Error: ReadFileExample.jar!/file.txt (No such file or directory). Actually, I don't need to use the IDE if it is easier from the terminal or with Maven plugins. Please let me know how I can add my resource file to the jar, either with the IDE, from the terminal, or with any Maven plugin.
new FileReader(

This bit means it will never work. FileReader reads files, and only files; hence the name. You 'luck' into it working during development, because at that point the resource is an actual file.
There's good news, though: your code is incredibly complicated and can be made much simpler.
.getClassLoader().getResource(...) is more code AND worse than just .getResource. getClassLoader() can return null in exotic cases. Make sure to adjust the parameter: it is now relative to the package your class is in, and you can get back to 'go from root' by putting a slash in front.
Don't use FileReader, obviously. Actually, never use that class.
Your code fails to specify an encoding. This is bad; the encoding will default to whatever the system you run on has as its default, which by definition is not guaranteed to match what you stuck in that jar file. Always be explicit.
These are 25-year-old APIs; Java has nicer ones these days. Let's use them to make this code easier to read.
The requireNonNull calls are useless here; you already get an NPE if you pass a null ref to e.g. FileReader's constructor.
Your code opens a resource and doesn't safely close it. This leaks handles, which means your app will soon be 'dead': it has objects in memory holding open OS-level file handles, and the OS simply won't give any further handles to your process. Anything that interacts with the OS (making network connections, opening files, etcetera) will flat out stop working until you shut down the VM and start it up again. That's why try-with-resources exists, to protect against this. This problem is hard to catch with trivial apps and tests.
Putting it all together:
try (var raw = App.class.getResourceAsStream("/file.txt");
     var in = new BufferedReader(new InputStreamReader(raw, StandardCharsets.UTF_8))) {
    String line;
    while ((line = in.readLine()) != null) {
        // process line here.
    }
}

Java File is disappearing from the path /tmp/hsperfdata_*username*/

This is a very confusing problem.
We have a Java application (Java 8, running on JBoss 6.4) that loops over a certain number of objects and writes some rows to a file on each round.
On each round we check whether we received the File object as a parameter, and if we did not, we create a new object and create the physical file:
if (file == null) {
    file = new File(filename);
    try {
        file.createNewFile();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
So the idea is that the file gets created only once; after that, this step is skipped and we proceed straight to writing. The variable filename is not a path, just a file name with no path, so the file gets created under jboss_root/tmp/hsperfdata_username/.
Edit 1: I'll add here the methods used for writing, in case they are relevant:
fw = new FileWriter(indeksiFile, true); // append = true
bw = new BufferedWriter(fw);
out = new PrintWriter(bw);
// ...
out.println(..);
// ...
out.flush();
out.close(); // this flushes as well -> line above is useless
So now the problem: occasionally, though quite rarely, the physical file disappears from the path in the middle of the process. The Java object reference is never lost, but the file itself seems to disappear, because the code automatically creates the file again at the same path and keeps writing to it. This would not happen unless the condition file == null evaluated to true. The effect, obviously, is that we lose the rows which were written to the previous file. The Java application does not notice any errors and keeps working.
So I have three strongly related questions, for which I was not able to find answers on Google:
If we call File.createNewFile(), is the resulting file a permanent file in the filesystem or some JVM proxy file?
If it's a permanent file, do you have any idea why it disappears? The default behavior in our case is that at some point the file is always deleted from the path. My guess is that the same mechanism is deleting the file too early; I just don't know how to control that mechanism.
My best guess is that this is related to the path jboss_root/tmp/hsperfdata_username/, which is some temp-data folder created by the JVM, and there is probably some default behavior that cleans that path. Am I even close?
Help appreciated! Thanks!
File.createNewFile is something I have never used in my code: it is not needed. When you afterwards actually write to the file, it is created anew or appended to anyway.
In every case there is a race on the file system, and since these are not atomic actions, you might end up with something unstable.
So you want to write to a file, either appending to an existing file or creating it.
For UTF-8 text:
Path path = Paths.get(filename);
try (PrintWriter out = new PrintWriter(
        Files.newBufferedWriter(path, StandardOpenOption.CREATE, StandardOpenOption.APPEND),
        false)) {
    out.println("Killroy was here");
}
After comment
Honestly, as you are interested in the cause, it is hard to say. An application restart or I/O exceptions would show up in the logs. Add logging to a dedicated log for every append to these files, and a (logged) periodic check for those files' existence.
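Such a periodic check could be as simple as this sketch (the variable filename comes from the question; the interval and the plain System.out logging are placeholders):

import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Log every 30 seconds whether the file still exists, to pinpoint
// the moment at which it disappears.
ScheduledExecutorService checker = Executors.newSingleThreadScheduledExecutor();
File indexFile = new File(filename);
checker.scheduleAtFixedRate(
        () -> System.out.println(indexFile + " exists: " + indexFile.exists()),
        0, 30, TimeUnit.SECONDS);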
Safe-guard
Here we are doing repeated physical access to the file system.
To prevent appending to the same file twice at the same time (from which I would expect an exception), one can build a critical section in some form. For example, picking a semaphore by a hash of the file name:
// For 16 semaphores:
final int semaphoreCount = 16;
final int semaphoreMask = 0xF;
Semaphore[] semaphores = new Semaphore[semaphoreCount];
for (int i = 0; i < semaphores.length; ++i) {
    semaphores[i] = new Semaphore(1, true); // fair: FIFO
}

int hash = filename.hashCode() & semaphoreMask; // toLowerCase() the name on Windows
Semaphore semaphore = semaphores[hash];
semaphore.acquire(); // throws InterruptedException
try {
    // ... append to the file
} finally {
    semaphore.release();
}
File locks would be a more technical solution, which I would not like to propose.
The best solution, which you perhaps already have, would be to queue the messages per file.
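A sketch of that idea, assuming a single index file (the name filename is illustrative): one dedicated writer thread drains a queue, so no two threads ever append concurrently.

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

BlockingQueue<String> lines = new LinkedBlockingQueue<>();
Thread writer = new Thread(() -> {
    try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(
            Paths.get(filename), StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
        while (!Thread.currentThread().isInterrupted()) {
            out.println(lines.take()); // blocks until a message arrives
            out.flush();
        }
    } catch (InterruptedException | IOException e) {
        Thread.currentThread().interrupt();
    }
});
writer.setDaemon(true);
writer.start();

lines.add("Killroy was here"); // producers only enqueue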

Why can't I get a file from resources?

URL resource = getClass().getClassLoader().getResource("input data/logic test.csv");
System.out.println("Found " + resource);
CSVParser parser = new CSVParserBuilder().withSeparator(';').build();
CSVReader reader = new CSVReaderBuilder(new FileReader(resource.getFile()))
        .withSkipLines(1)
        .withCSVParser(parser)
        .build();
Console output:
Found file:/home/alexandr/Repos/OTUS/first_home_work/target/classes/input%20data/logic%20test.csv
Exception in thread "main" java.io.FileNotFoundException: /home/alexandr/Repos/OTUS/first_home_work/target/classes/input%20data/logic%20test.csv (Нет такого файла или каталога)
(The Russian message means "No such file or directory".)
There is an inherent logic problem with this line:

CSVReader reader = new CSVReaderBuilder(
        new FileReader(resource.getFile()))..

Once the CSV is part of a jar, it will no longer be accessible as a File object. But something like this should work directly with the URL:

CSVReader reader = new CSVReaderBuilder(
        new InputStreamReader(resource.openStream()))..
I changed the spaces to _ in the directory name and file name, and it works.
This will only work while the resource is not in a jar file.
It's:

try (InputStream raw = ClassThisIn.class.getResourceAsStream("input data/logic test.csv")) {
    InputStreamReader isr = new InputStreamReader(raw, StandardCharsets.UTF_8);
    BufferedReader br = new BufferedReader(isr);
    // now use br as if it were your FileReader.
}
This addresses many issues:
It still works regardless of how you run it: your snippet only works when running straight class files (vs., say, from a jar), and doesn't work if spaces are involved.
It still works even if your class is subclassed (getClass().getClassLoader().getResource won't, which is why you should not do that).
It still works even if the platform's default charset encoding is weird (the snippet in this answer is explicit about the encoding; that's always a good idea).
It doesn't have a resource leak. Your code never safely closes the reader you open. If you open resources, either do so in a try-with-resources construct, or store the resource in a field and implement AutoCloseable.
I changed the spaces to _ in the directory name and file name, and it works.... omg.
The answer is in your console output: the file was simply not found.
I would try the same code you have written, but with a file whose name contains no spaces, and see whether it is still not found.

GC overhead limit exceeded with Apache POI

I have 13 .xlsx files with about 1000 rows in each of them. Now I want to merge them into one .xlsx file with one sheet. I'm using code from here:
https://blog.sodhanalibrary.com/2014/11/merge-excel-files-using-java.html#.Vi9ns36rSUk
Here's my code (a few changes; the addSheet method is unchanged):
try {
    FileInputStream excellFile1 = new FileInputStream(new File("tmp_testOut1000.xlsx"));
    XSSFWorkbook workbook1 = new XSSFWorkbook(excellFile1);
    XSSFSheet sheet1 = workbook1.getSheetAt(0);

    for (int i = 2; i < 14; i++) {
        FileInputStream excellFile2 = new FileInputStream(new File("tmp_testOut" + i + "000.xlsx"));
        XSSFWorkbook workbook2 = new XSSFWorkbook(excellFile2);
        XSSFSheet sheet2 = workbook2.getSheetAt(0);
        System.out.println("add " + i);
        addSheet(sheet1, sheet2);
    }
    excellFile1.close();

    // save merged file
    System.out.println("merging");
    File mergedFile = new File("merged.xlsx");
    if (!mergedFile.exists()) {
        mergedFile.createNewFile();
    }
    FileOutputStream out = new FileOutputStream(mergedFile);
    System.out.println("write");
    workbook1.write(out);
    out.close();
    System.out.println("Files were merged successfully");
} catch (Exception e) {
    e.printStackTrace();
}
All files load and merge fine, but after the "write" sysout I get:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.xmlbeans.impl.store.Xobj.new_cursor(Xobj.java:1829)
at org.apache.xmlbeans.impl.values.XmlObjectBase.newCursor(XmlObjectBase.java:293)
at org.apache.xmlbeans.impl.values.XmlComplexContentImpl.arraySetterHelper(XmlComplexContentImpl.java:1151)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTFontsImpl.setFontArray(Unknown Source)
at org.apache.poi.xssf.model.StylesTable.writeTo(StylesTable.java:424)
at org.apache.poi.xssf.model.StylesTable.commit(StylesTable.java:496)
at org.apache.poi.POIXMLDocumentPart.onSave(POIXMLDocumentPart.java:341)
at org.apache.poi.POIXMLDocumentPart.onSave(POIXMLDocumentPart.java:345)
at org.apache.poi.POIXMLDocument.write(POIXMLDocument.java:206)
at Start.main(Start.java:275)
What can I do? Why is this happening, and how can I prevent it?
POI is notoriously memory-hungry, so running out of memory is not uncommon when handling large Excel files.
If you are able to load all the original files and only get into trouble writing the merged file, you could try using an SXSSFWorkbook instead of an XSSFWorkbook and do regular flushes after adding a certain amount of content (see the POI documentation of the org.apache.poi.xssf.streaming package). That way you do not have to keep the whole generated file in memory, only small portions of it.
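A minimal sketch of that approach (the row-filling loop is a stand-in for the question's addSheet logic, which would copy cells from the source sheets):

import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class StreamingMergeSketch {
    public static void main(String[] args) throws Exception {
        // Keep at most 100 rows per sheet in memory; older rows are
        // flushed to a temporary file instead of staying on the heap.
        SXSSFWorkbook workbook = new SXSSFWorkbook(100);
        try (FileOutputStream out = new FileOutputStream("merged.xlsx")) {
            Sheet sheet = workbook.createSheet("merged");
            for (int r = 0; r < 13_000; r++) {              // e.g. 13 files x ~1000 rows
                Row row = sheet.createRow(r);
                row.createCell(0).setCellValue("row " + r); // real code would copy source cells
            }
            workbook.write(out);
        } finally {
            workbook.dispose(); // delete the temporary files backing the streamed rows
        }
    }
}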
Try allocating more memory, e.g.:

java -Xmx8192m

Also, you can try merging in one xlsx file at a time instead of keeping them all loaded at once.
You can also move this line into your for loop, so that each input is closed right away:

excellFile1.close();
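In that spirit, each source stream and workbook could be scoped to the loop body with try-with-resources (a sketch reusing the question's sheet1 and addSheet; in recent POI versions XSSFWorkbook is AutoCloseable):

for (int i = 2; i < 14; i++) {
    try (FileInputStream in = new FileInputStream("tmp_testOut" + i + "000.xlsx");
         XSSFWorkbook workbook2 = new XSSFWorkbook(in)) {
        addSheet(sheet1, workbook2.getSheetAt(0)); // merge, then both resources are closed
    }
}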
This issue occurs for the following reason:
The java.lang.OutOfMemoryError: GC overhead limit exceeded error is the JVM's way of signalling that your application spends too much time doing garbage collection with too little result. By default, the JVM throws this error if it spends more than 98% of the total time doing GC and recovers less than 2% of the heap per collection.
If you just want to ignore the issue, you can set the following VM option:

-XX:-UseGCOverheadLimit

Refer to the link on GC overhead for more information.
You can also use the switches below to assign more heap memory to your application. Run a pilot of your application for some time to identify how much memory suits it:

-Xms128m -Xmx512m (these switches set the initial heap size to 128 MB and the maximum to 512 MB)
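For example, with a hypothetical jar name:

java -Xms128m -Xmx512m -jar excel-merger.jar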
If you can avoid the convenient but memory-hungry workbook APIs, work instead with the streaming logic of processing data row by row, which is much more memory-efficient.
In particular, pay attention to the usage of XSSFReader.SheetIterator for looping over the sheets.
And take a good look at the usage of the API XSSFSheetXMLHandler for processing the rows within a sheet.
See the code in this project:
https://github.com/jeevatkm/excelReader/blob/master/src/main/java/com/myjeeva/poi/ExcelReader.java
You define how you want to process each row by creating your own:

new SheetContentsHandler...

This is quite like SAX parsing; it will barely touch your RAM.
private void readSheet(StylesTable styles, ReadOnlySharedStringsTable sharedStringsTable,
        InputStream sheetInputStream) throws IOException, ParserConfigurationException, SAXException {
    SAXParserFactory saxFactory = SAXParserFactory.newInstance();
    XMLReader sheetParser = saxFactory.newSAXParser().getXMLReader();
    ContentHandler handler =
            new XSSFSheetXMLHandler(styles, sharedStringsTable, sheetContentsHandler, true);
    sheetParser.setContentHandler(handler);
    sheetParser.parse(new InputSource(sheetInputStream));
}
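For illustration, a minimal SheetContentsHandler that just prints each cell might look like this (a sketch; real code would collect the values instead of printing them):

import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;

SheetContentsHandler sheetContentsHandler = new SheetContentsHandler() {
    @Override public void startRow(int rowNum) { }               // called before each row
    @Override public void endRow(int rowNum) { System.out.println(); }
    @Override public void cell(String cellReference, String formattedValue, XSSFComment comment) {
        System.out.print(formattedValue + "\t");                 // one cell at a time, no workbook in memory
    }
    @Override public void headerFooter(String text, boolean isHeader, String tagName) { }
};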

BufferedReader was never closed, but file was able to delete

Recently I reviewed our application code, and I found one issue in it.
/**
 * truncate cat tree(s) from the import file
 */
private void truncateCatTreesInFile(File file, String userImplCode) throws Exception
{
    String rowStr = null, treeCode = null;
    BufferedReader reader = new BufferedReader(new FileReader(file));
    rowStr = reader.readLine(); // skip 1st row - header
    Impl impl;
    List<String> row = null;
    Set<String> truncatedTrees = new HashSet<String>();
    while ((rowStr = reader.readLine()) != null)
    {
        row = CrudServiceHelper.getRowFromFile(rowStr);
        if (row == null) continue;
        impl = getCatImportImpl(row.get(ECatTreeExportImportData.IMPL.getIndex()), userImplCode);
        treeCode = row.get(ECatTreeExportImportData.TREE_CODE.getIndex());
        if (truncatedTrees.contains(treeCode)) continue;
        truncatedTrees.add(treeCode);
        CatTree catTree = _treeDao.findByCodeAndImpl(treeCode, impl.getId());
        if (catTree != null) _treeDao.makeTransient(catTree);
    }
    _treeDao.flush();
}
Looking at the code above, the reader is never closed. I thought that could be an issue, but actually it just works fine; Tomcat is able to delete the file.
javax.servlet.context.tempdir>
[java] 2013-03-27 17:45:54,285 INFO [org.apache.struts2.dispatcher.Dispatcher] -
Basically, what I am trying to do is upload a file from the browser and generate SQL based on the file to insert data into our database. After all that is done, the file is deleted.
I am surprised this code works fine. Does anybody have an idea here? I tried to google it, but I did not get any answers.
Thanks,
Jack
Not closing a reader may result in a resource leak. Deleting an open file may still be perfectly fine.
Under Linux (and other Unix variants), deleting a file just unlinks a name from it. A file with no names left is actually freed. So opening a file, deleting it (removing its name), and then reading and writing to it is a well-known way to obtain a temporary file. Once the file is closed, the space is freed, but not earlier.
Under Windows, certain programs lock the files they read, which prevents other processes from removing such a file. But not all programs do so, and I don't have a Windows machine around to test how Java handles this.
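A small sketch of the Unix behaviour (Java 11+; the file name is illustrative). On a typical Linux filesystem this prints the line even though the file has already been deleted:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeleteWhileOpen {
    public static void main(String[] args) throws IOException {
        Path path = Files.writeString(Path.of("demo.txt"), "still readable\n");
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            Files.delete(path);                     // unlinks the name; the data lives on
            System.out.println(reader.readLine());  // reads from the still-open handle
        }
    }
}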
The fact that the code does not crash does not mean it works completely correctly. The problem you noticed might only become visible much later, if the app consumes more and more RAM due to the leak. That is unlikely here, though: the garbage collector will eventually close abandoned readers, and probably soon enough, because reader is a local variable and never escapes the method.
