Hibernate - Searching - Java heap space - java

I have a Java heap space problem with Hibernate since I added a modification to my code.
The program loads information from a text file (2 GB):
BufferedReader input = new BufferedReader(new FileReader(file));
while (line.compareTo(EOF) != 0) {
    // Load of infoObject: lots of parsing of information from the text file.
    // Lines are read with: line = input.readLine();
    getSession().save(infoObject);
    counter++;
    if (counter % 100 == 0) {
        getSession().flush();
        System.gc();
    }
}
This works great. Now, when the infoObject already exists in my DB, I need to update the record instead, so I use this:
BufferedReader input = new BufferedReader(new FileReader(file));
while (line.compareTo(EOF) != 0) {
    // Load of infoObject
    iObject infoObject_tmp = new iObject();
    infoObject_tmp.setNumAccount(numAccount);
    infoObject_tmp.setCloseDate(new Date("02/24/2011"));
    iObject infoObject_search = (iObject) getSession().load(iObject.class, infoObject_tmp);
    if (infoObject_search != null) {
        getSession().update(infoObject);
    } else {
        getSession().save(infoObject);
    }
    counter++;
    if (counter % 100 == 0) {
        getSession().flush();
        System.gc();
    }
}
Free memory:
Registry 1: 750283136
Registry 10000: 648229608
Registry 50000: 411171048
Registry 100000: Java heap space
How can I fix the Java heap space problem?
I know that the problem occurs when I check whether the iObject already exists.

You need a flush() followed by a clear(). flush() only pushes pending updates to the database; clear() removes the objects from the session. You need to be careful with clear(), though, because any other code that still holds a reference to an object that was in the session now has a "detached" object.
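For illustration, a minimal sketch of where clear() fits in the question's loop, keeping the batch size of 100 from the original snippet:

while (line.compareTo(EOF) != 0) {
    // ... parse the line, populate infoObject, then save or update it ...
    counter++;
    if (counter % 100 == 0) {
        getSession().flush(); // push pending inserts/updates to the database
        getSession().clear(); // evict all objects from the session so they can be garbage collected
    }
}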

It's tough to tell what exactly is going on without seeing more of your code. It's also a bit confusing since your BufferedReader does not appear to be used.
That said, one possibility, given what it looks like you are doing, is that somewhere you are calling substring on some text input and keeping a reference to the String object that is returned as a result. For instance, you might be doing something like this:
List<String> allMySubstrings = new ArrayList<String>();
while ((line = input.readLine()) != null) {
    String mySubstring = line.substring(40, 42);
    // mySubstring still has a reference to the whole character array from line
    allMySubstrings.add(mySubstring);
}
You might think that you would then only have references to allMySubstrings and not to every line that was read. However, because of the way substring was implemented in older JDKs (before 7u6), the resulting String object keeps a reference to the entire original character array from line, not just the relevant substring, so garbage collection won't reclaim the memory you expect.
If you are doing this, you can get around it by creating a new String object:
List<String> allMySubstrings = new ArrayList<String>();
while ((line = input.readLine()) != null) {
    String mySubstring = new String(line.substring(40, 42));
    allMySubstrings.add(mySubstring);
}
Note that this might be happening even if you do not explicitly call substring, since common methods like split may be responsible.
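For instance, a hedged sketch of the same defensive copy applied to split() output (the delimiter and field index here are hypothetical):

String[] fields = line.split(";");
// In older JDKs each element may share line's backing array, so copy the piece you keep:
String keeper = new String(fields[2]);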

You declare BufferedReader input, but you never seem to use it. Could it be that the line object you refer to in the while loop is accessing the underlying File in a non-buffered way?

Related

Trying to clear screen in java

I'm printing a single string from a server which contains multiple '\n' in it and would like to clear the screen before each new message. However, the following code causes the screen to be cleared after each line in the single string.
while (true) {
    String s = server.readLine();
    if (s == null) {
        throw new NullPointerException();
    }
    ConsoleCleaner.clean();
    System.out.println(s.toString());
}
Again, s is one single string with multiple '\n' inside which leads to one line being printed and the screen cleared each time.
I'm assuming that server is a BufferedReader here, since you haven't specified otherwise. And for the purposes of BufferedReader.readLine(), there's no such thing as "a single string with multiple \n": when the method encounters the first \n, that's where the returned string ends.
You could avoid this issue by keeping track of the non-whitespace length of the last message printed, and only clearing the screen when it's non-zero.
Could you adapt this example to meet your requirement?
Read all lines with BufferedReader
Perhaps like this:
String line;
String s = "";
while ((line = server.readLine()) != null) {
    s += line + System.getProperty("line.separator");
}
ConsoleCleaner.clean();
System.out.println(s);
Not tried this...

Removing duplicate lines from a text file

I have a text file that is sorted alphabetically, with around 94,000 lines of names (one name per line, text only, no punctuation).
Example:
Alice
Bob
Simon
Simon
Tom
Each line takes the same form, first letter is capitalized, no accented letters.
My code:
try {
    BufferedReader br = new BufferedReader(new FileReader("orderedNames.txt"));
    PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("sortedNoDuplicateNames.txt", true)));
    ArrayList<String> textToTransfer = new ArrayList();
    String previousLine = "";
    String current = "";
    // Load first line into previousLine
    previousLine = br.readLine();
    // Add first line to the transfer list
    textToTransfer.add(previousLine);
    while ((current = br.readLine()) != previousLine && current != null) {
        textToTransfer.add(current);
        previousLine = current;
    }
    int index = 0;
    for (int i = 0; i < textToTransfer.size(); i++) {
        out.println(textToTransfer.get(i));
        System.out.println(textToTransfer.get(i));
        index++;
    }
    System.out.println(index);
} catch (Exception e) {
    e.printStackTrace();
}
From what I understand, the first line of the file is read and loaded into the previousLine variable as intended; current is set to the second line of the file; current is then compared against the previous line and against null; and if it's not the same as the last line and not null, it is added to the array list.
previousLine is then set to current's value, so the next readLine can replace the current value of current and the comparison continues in the while loop.
I cannot see what is wrong with this.
If a duplicate is found, surely the loop should break?
Sorry in advance when it turns out to be something stupid.
Use a TreeSet instead of an ArrayList.
Set<String> textToTransfer = new TreeSet<>();
The TreeSet is sorted and does not allow duplicates.
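A minimal sketch of the whole pipeline with a TreeSet, assuming the file names from the question and the usual java.io/java.util imports:

Set<String> names = new TreeSet<>();
try (BufferedReader br = new BufferedReader(new FileReader("orderedNames.txt"));
     PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("sortedNoDuplicateNames.txt")))) {
    String line;
    while ((line = br.readLine()) != null) {
        names.add(line); // duplicates are silently ignored
    }
    for (String name : names) {
        out.println(name); // TreeSet iterates in sorted order
    }
} catch (IOException e) {
    e.printStackTrace();
}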
Don't reinvent the wheel!
If you don't want duplicates, you should consider using a Collection that doesn't allow them. The easiest way to remove repeated elements is to add the contents to a Set, which will not allow duplicates:
import java.util.*;
import java.util.stream.*;

public class RemoveDups {
    public static void main(String[] args) {
        Set<String> dist = Arrays.asList(args).stream().collect(Collectors.toSet());
    }
}
Another way is to remove the duplicates from the text file before reading it with the Java code, for example on Linux (far quicker than doing it in Java):
sort myFileWithDuplicates.txt | uniq > myFileWithoutDuplicates.txt
(Note: uniq -u would remove every occurrence of a duplicated line, rather than keeping one copy.)
While, like the others, I recommend using a collection that does not allow repeated entries, I think I can identify what is wrong with your function. The way you compare strings in your while loop is incorrect in Java: == (and its counterpart !=) determine whether two references point to the same object, which is not the same as determining whether their values are equal. Luckily, Java's String class has a comparison method in equals(). You may want something like this:
while ((current = br.readLine()) != null && !current.equals(previousLine)) {
Keep in mind that breaking out of your while loop here will stop your file reading entirely, which may or may not be what you intended.
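For completeness, a hedged sketch of a loop that skips duplicates instead of stopping at the first one (variable names taken from the question; it relies on the file being sorted):

String previousLine = br.readLine();
textToTransfer.add(previousLine);
String current;
while ((current = br.readLine()) != null) {
    if (!current.equals(previousLine)) { // skip consecutive duplicates in the sorted file
        textToTransfer.add(current);
    }
    previousLine = current;
}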

Why is storing data directly with the println() method faster than building it up in a string and then writing it to a file?

Let's consider this scenario: I am reading a file, tweaking each line a bit, and then storing the data in a new file. I tried two ways to do it:
1. Storing the data in a String and then writing it to the target file at the end, like this:
InputStream ips = new FileInputStream(file);
InputStreamReader ipsr = new InputStreamReader(ips);
BufferedReader br = new BufferedReader(ipsr);
PrintWriter desFile = new PrintWriter(targetFilePath);
String data = "";
while ((line = br.readLine()) != null) {
    if (line.contains("_Stop_"))
        continue;
    String[] s = line.split(";");
    String newLine = s[2];
    for (int i = 3; i < s.length; i++) {
        newLine += "," + s[i];
    }
    data += newLine + "\n";
}
desFile.write(data);
desFile.close();
br.close();
2. Directly using the println() method of PrintWriter in the while loop, as below:
while ((line = br.readLine()) != null) {
    if (line.contains("_Stop_"))
        continue;
    String[] s = line.split(";");
    String newLine = s[2];
    for (int i = 3; i < s.length; i++) {
        newLine += "," + s[i];
    }
    desFile.println(newLine);
}
desFile.close();
br.close();
The second process is way faster than the first. My question is: what is so different between these two processes that the execution times differ so much?
Appending to your string will:
allocate memory for a new string,
copy all the data previously accumulated,
copy the data from the new piece.
You repeat this process for every single line, meaning that for N lines of output you copy O(N^2) bytes around.
Meanwhile, writing to your PrintWriter will:
copy the data to the buffer,
occasionally flush the buffer.
Meaning that for N lines of output you copy only O(N) bytes around.
For one, you're creating an awful lot of new String objects by appending with +=. I think that will definitely slow things down.
Try appending to a StringBuilder sb declared outside the loop and then calling desFile.write(sb.toString());, and see how that performs.
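A minimal sketch of that suggestion, keeping the parsing logic from the question:

StringBuilder sb = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
    if (line.contains("_Stop_"))
        continue;
    String[] s = line.split(";");
    sb.append(s[2]);
    for (int i = 3; i < s.length; i++) {
        sb.append(',').append(s[i]); // amortized O(1) append instead of a full string copy
    }
    sb.append('\n');
}
desFile.write(sb.toString());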
First of all, the two processes don't produce identical data: println appends the platform line separator, whereas the version that builds everything up in a buffer appends a bare "\n".
But the reason for the performance difference is probably the enormous number of String and StringBuilder objects you are generating and throwing away, the memory that needs to be allocated to hold the complete file contents in memory, and the time taken by the garbage collector.
If you're going to be doing a significant amount of string concatenation, especially in a loop, it is better to create a StringBuilder before the loop and use it to accumulate the results in the loop.
However, if you're going to be processing large files, it is probably better to write the output as you go. The memory requirements of your application will be lower, whereas if you build up the entire result in memory, the memory required will be equal to the size of the output file.

Optimum time to perform an operation: within or after a loop

I am reading a file to parse later on. The file is not likely to exceed an MB in size, so this is perhaps not a crucial question for me at this stage. But for best-practice reasons, I'd like to know when the optimum time to perform an operation is.
Example:
Using a method I've pasted from http://www.dzone.com/snippets/java-read-file-string, I am reading a buffer into a string. I would now like to remove all whitespace. My method is currently this:
private String listRaw;

public boolean readList(String filePath) throws java.io.IOException {
    StringBuffer fileData = new StringBuffer(1024);
    BufferedReader reader = new BufferedReader(new FileReader(filePath));
    char[] buf = new char[1024];
    int numRead = 0;
    while ((numRead = reader.read(buf)) != -1) {
        String readData = String.valueOf(buf, 0, numRead);
        fileData.append(readData);
        buf = new char[1024];
    }
    reader.close();
    listRaw = fileData.toString().replaceAll("\\s", "");
    return true;
}
So, I remove all whitespace from the string at the time I store it, in its entirety, to a class variable.
To me, this means less processing but more memory usage. Would I be better off applying the replaceAll() operation to the readData variable as I append it to fileData, for best-practice reasons? That would use more processing but avoid passing superfluous whitespace around.
I imagine this has little impact for a small file like the one I am working on, but what if it's a 200MB log file?
Is it entirely case-dependent, or is there a consensus I'd do better to follow?
Thanks for the input everybody. I'm sure you've helped to aim my mindset in the right direction for writing Java.
I've updated my code to take into consideration the points raised. Including the suggestion by Don Roby that at some point, I may want to keep spaces. Hopefully things read better now!
private String listRaw;

public boolean readList(String filePath) throws java.io.IOException {
    StringBuilder fileData = new StringBuilder(51200);
    BufferedReader reader = new BufferedReader(new FileReader(filePath));
    char[] buf = new char[51200];
    boolean spaced = false;
    int numRead;
    while ((numRead = reader.read(buf)) != -1) {
        for (int i = 0; i < numRead; i++) { // only scan the chars actually read
            char c = buf[i];
            if (c != '\t' && c != '\r' && c != '\n') {
                if (c == ' ') {
                    if (spaced) {
                        continue;
                    }
                    spaced = true;
                } else {
                    spaced = false;
                }
                fileData.append(c);
            }
        }
    }
    reader.close();
    listRaw = fileData.toString().trim();
    return true;
}
You'd do better to create and apply the regexp replacement only once, at the end. But you would gain much more by:
initializing the StringBuilder with a reasonable size;
avoiding the creation of a String inside the loop, by appending the read characters directly to the StringBuilder;
avoiding the instantiation of a new char buffer, for nothing, at each iteration.
To avoid creating an unnecessarily long temporary String, you could read char by char and only append a char to the StringBuilder if it is not whitespace. In the end, the StringBuilder would contain only the good characters, and you wouldn't need any replaceAll() call.
There are actually several very significant inefficiencies in this code, and you'd have to fix them before worrying about the relatively less important issue you've raised.
First, don't create a new buf object on each iteration of the loop; reuse the same one! There's no problem with doing so: the new data overwrites the old, and you save on object allocation (one of the more expensive operations you can do).
Second, similarly, don't create a String just to call append(); use the overload of append that takes a char array, an offset (0, in this case), and a length (numRead, in this case). Again, that's one less object per loop iteration.
Finally, to come to the question you actually asked: doing the replacement in the loop would create a String object per iteration, but with the tuning we've just done you're creating zero objects per iteration, so removing the whitespace after the loop is the clear winner! A sketch of the tuned loop follows.
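A minimal sketch of the tuned read loop described above, reusing the buffer and the three-argument append (names follow the question's code):

char[] buf = new char[1024];
int numRead;
while ((numRead = reader.read(buf)) != -1) {
    fileData.append(buf, 0, numRead); // no per-iteration String or buffer allocation
}
reader.close();
listRaw = fileData.toString().replaceAll("\\s", "");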
Depending somewhat on the parse you're going to do, you may well be better off not removing the spaces in a separate step at all, and just ignore them during the parse.
It's also reasonably rare to want to remove all whitespace. Are you sure you don't want to just replace multiple spaces with single spaces?

Android - OutOfMemory when reading text file

I'm making a dictionary app on Android. During startup, the app loads the content of a .index file (~2 MB, 100,000+ lines).
However, when I use BufferedReader.readLine() and do something with the returned string, the app runs out of memory (OutOfMemoryError).
// Read file snippet
Set<String> indexes = new HashSet<String>();
FileInputStream is = new FileInputStream(indexPath);
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
String readLine;
while ((readLine = reader.readLine()) != null) {
    indexes.add(extractHeadWord(readLine));
}

// And the extractHeadWord method
private String extractHeadWord(String string) {
    String[] splitted = string.split("\\t");
    return splitted[0];
}
Reading the log, I found that during execution the GC explicitly frees objects many times (GC_EXPLICIT freed xxx objects, where xxx is a big number such as 15000 or 20000).
And I tried another way:
final int BUFFER = 50;
char[] readChar = new char[BUFFER];
// .. construct BufferedReader
while (reader.read(readChar) != -1) {
    indexes.add(new String(readChar));
    readChar = new char[BUFFER];
}
...and it runs very fast. But it was not exactly what I wanted.
Is there any solution that runs as fast as the second snippet and is as easy to use as the first?
Regards.
extractHeadWord uses the String.split method. This method does not create new strings but relies on the underlying string (in your case the line object) and uses indexes to point out the "new" string.
Since you are not interested in the rest of the string, you need to discard it so it gets garbage collected; otherwise the whole string stays in memory even though you are only using a part of it.
Calling the constructor String(String) (the "copy constructor") discards the rest of the string:
private String extractHeadWord(String string) {
    String[] splitted = string.split("\\t");
    return new String(splitted[0]);
}
What happens if your extractHeadWord returns new String(splitted[0]) instead?
It will not reduce the number of temporary objects, but it might reduce the footprint of the application. I don't know whether split does about the same as substring, but I guess it does: substring creates a new view over the original data, which means the full character array is kept in memory. Explicitly invoking new String(string) truncates the data.
