Why does java Grep crash with OutOfMemoryError? - java

I'm running the following code more or less out of the box
http://download.oracle.com/javase/1.4.2/docs/guide/nio/example/Grep.java
I'm using the following VM arguments
-Xms756m -Xmx1024m
It crashes with an OutOfMemoryError on a 400 MB file. What am I doing wrong?
Stack trace:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapCharBuffer.<init>(Unknown Source)
at java.nio.CharBuffer.allocate(Unknown Source)
at java.nio.charset.CharsetDecoder.decode(Unknown Source)
at com.alluvialtrading.tools.Importer.<init>(Importer.java:46)
at com.alluvialtrading.tools.ReutersImporter.<init>(ReutersImporter.java:24)
at com.alluvialtrading.tools.ReutersImporter.main(ReutersImporter.java:20)

You are not doing anything wrong.
The problem is that the application maps the entire file into memory, and then creates a second, in-heap copy of the file. The mapped file does not consume heap space, though it does use part of the JVM's virtual address space.
It is the second copy, and the process of creating it, that actually fills the heap. The second copy contains the file content expanded into 16-bit characters. A contiguous array of ~400 million characters (800 million bytes) is too big for a 1 GB heap, considering how the heap spaces are partitioned.
In short, the application is simply using too much memory.
You could try increasing the maximum heap size, but the real problem is that the application is too simple-minded in the way it manages memory.
The other point to make is that the application you are running is an example designed to illustrate how to use NIO. It is not designed to be a general-purpose, production-quality utility. You need to adjust your expectations accordingly.

Probably because the 400 MB file is loaded into a CharBuffer, so it takes twice as much memory in UTF-16 encoding. That does not leave much memory for the pattern matcher.
If you're using a recent version of Java, try -XX:+UseCompressedStrings so that it represents strings internally as byte arrays and consumes less memory. You might have to put the CharBuffer into a String.
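For reference, the flag is passed on the command line together with the heap settings. As far as I know it was an experimental option on some Java 6 HotSpot builds and was removed in later JDKs, so treat this as a sketch rather than a guaranteed recipe:

java -Xms756m -Xmx1024m -XX:+UseCompressedStrings Grep <pattern> <file>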
So the exception is
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
at java.nio.CharBuffer.allocate(CharBuffer.java:329)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:777)
at Grep.grep(Grep.java:118)
at Grep.main(Grep.java:136)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
The line in question is in the constructor of HeapCharBuffer:
super(-1, 0, lim, cap, new char[cap], 0);
which means it cannot allocate a char array as large as the decoded file.
If you want to grep large files in Java, you need an algorithm that accepts a Reader of some sort and processes the input incrementally. The standard Java library does not have such functionality built in.
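As an illustration, here is a minimal line-by-line sketch (not part of the Oracle example; it assumes Java 7+ for try-with-resources and uses ISO-8859-15, which I believe is the charset the example declares, so adjust as needed). It only ever holds one decoded line on the heap:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreamingGrep {
    public static void main(String[] args) throws IOException {
        Pattern pattern = Pattern.compile(args[0]);
        // Decode and scan one line at a time instead of the whole file at once.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(args[1]), "ISO-8859-15"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = pattern.matcher(line);
                if (m.find()) {
                    System.out.println(line);
                }
            }
        }
    }
}

This trades the mapped-file trick for plain streaming I/O, which is usually fast enough and keeps memory use flat regardless of file size.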

I would assume it is because the class as given loads the ENTIRE file into memory. Exactly where, I'm not sure, as I do not know the Java NIO classes well; I would suspect, though, that classes like MappedByteBuffer and CharBuffer might be the issue.
A stack trace might be able to tell you where it's coming from.

Related

How to append large text files to a JTextArea in Java Swing

I have implemented a Java Swing application. In it I have written Open File functionality. I have tried a lot of ways to read the file and write it into the JTextArea (I have tried the append(), setText() and read() methods). But it only works up to 100 MB. If I try to open a file over 100 MB, it raises an "OutOfMemoryError: Java heap space" at textarea.append(). Is there any way to append over 100 MB of data to a JTextArea, or any way to increase the memory capacity of the JTextArea? Please give suggestions for the above issue. Thank you.
Possibly a duplicate of "Java using up far more memory than allocated with -Xmx", as your problem is really that your Java instance is running out of memory.
Java can theoretically open files of any size, as long as you have enough memory to read them.
I would, however, recommend that you only read parts of a file into memory at a time, and when you've finished with one part, move on to the next chunk of text.
Anyhow, for this one case, and if this is not a regular problem, you could use -Xmx800m, which would let Java use 800 MB of heap space.
If this is not a one-time thing, you really should look into reading only parts of the file at a time. http://www.baeldung.com/java-read-lines-large-file should point you in the right direction.
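A minimal sketch of that chunked approach (the class name, the 64 KB buffer size, and the assumption that you call it from a background thread are mine, not from the question):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;

public class ChunkedFileLoader {
    // Call from a background thread (e.g. a SwingWorker), not from the EDT.
    // Appends the file to the text area in 64 KB chunks instead of one huge String.
    public static void load(final JTextArea area, String path) throws IOException {
        char[] buffer = new char[64 * 1024];
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            int read;
            while ((read = reader.read(buffer)) != -1) {
                final String chunk = new String(buffer, 0, read);
                // Swing components must only be touched on the Event Dispatch Thread.
                SwingUtilities.invokeLater(new Runnable() {
                    public void run() {
                        area.append(chunk);
                    }
                });
            }
        }
    }
}

Bear in mind that the JTextArea's Document still ends up holding all of the text, so this mainly avoids the single giant temporary String; for files much larger than the heap you would have to display only a window of the file at a time.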

Out Of Memory Error: Java heap space - Using big array size

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
I am using a 2D array of [100000][100000] and two other arrays of [100000] each. I need these three arrays throughout the program, so I can't free their memory.
I have already tried the VM option -Xmx512m in NetBeans.
Please be specific and step by step; I am a newbie in Java and NetBeans.
Thanks in advance for your help.
Let's do some math. You're allocating a 10,000,000,000-element two-dimensional array, plus another two arrays of 100,000 elements each.
That's 10,000,200,000 elements. If each of them is an int, that's 40,000,800,000 bytes, or roughly 37 GB.
Your -Xmx512m isn't nearly enough; you'd need something closer to -Xmx60G if these are really ints, or -Xmx15G in the best case scenario, in which the elements are bytes (e.g. booleans). But that probably won't work, since you (probably) don't have enough physical memory. To me it sounds like you need some disk-backed storage, or a database.
Either re-think what you're doing and how you're doing it, or use a machine with that much physical memory.
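If you do go the disk-backed route, one simple (if slow) way to get it without a full database is to keep the matrix in a file and seek into it. This is only an illustrative sketch; the class name and the fixed 4-bytes-per-int layout are my own choices:

import java.io.IOException;
import java.io.RandomAccessFile;

// A disk-backed rows x cols int matrix, so the JVM heap never holds the data.
public class FileBackedIntMatrix implements AutoCloseable {
    private final RandomAccessFile file;
    private final long cols;

    public FileBackedIntMatrix(String path, long rows, long cols) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
        this.cols = cols;
        this.file.setLength(rows * cols * 4L);  // 4 bytes per int
    }

    public void set(long row, long col, int value) throws IOException {
        file.seek((row * cols + col) * 4L);
        file.writeInt(value);
    }

    public int get(long row, long col) throws IOException {
        file.seek((row * cols + col) * 4L);
        return file.readInt();
    }

    public void close() throws IOException {
        file.close();
    }
}

Random access through a file is orders of magnitude slower than an in-memory array, so a real solution would add caching or memory-mapped regions, or use a proper embedded database.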

Send huge json object from Java Servlet

Java servlet returns JSON object.
response.setContentType("application/json");
response.getWriter().write(json.toString());
The JSON object contains data fetched from a database table whose size is > 50 MB.
On running, the servlet throws this error:
java.lang.OutOfMemoryError: Java heap space
It seems the issue is while writing the JSON data: the server is unable to allocate a contiguous chunk of memory of size > 50 MB for the String.
I am unable to figure out a fix for this issue. How can I send a huge JSON object from a servlet?
json.toString() is likely the cause of the error. It creates one big String from the already existing JSON object before anything has been sent out.
Slurping everything into memory is convenient, but not very wise when any limit comes into play. Process your database records one by one and stream them to the client immediately instead of copying them around in memory. Rule of thumb: "Any limitation given will be exceeded at some time."
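A sketch of that streaming approach, assuming Jackson 2.x is on the classpath (JsonGenerator is Jackson's streaming API; the column and field names here are invented for illustration):

import java.io.IOException;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.servlet.http.HttpServletResponse;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;

public class JsonStreamer {
    // Streams rows as a JSON array of objects; only one row is in memory at a time.
    public static void stream(ResultSet rows, HttpServletResponse response)
            throws IOException, SQLException {
        response.setContentType("application/json");
        JsonGenerator gen = new JsonFactory().createGenerator(response.getWriter());
        gen.writeStartArray();
        while (rows.next()) {
            gen.writeStartObject();
            gen.writeStringField("name", rows.getString("name"));   // hypothetical column
            gen.writeNumberField("value", rows.getLong("value"));   // hypothetical column
            gen.writeEndObject();
        }
        gen.writeEndArray();
        gen.close();  // flushes remaining buffered output to the servlet writer
    }
}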
Splitting the JSON data structure into smaller parts is definitely one way to solve the problem at hand. But an alternative via heap size increase might do the job in this case as well.
The "java.lang.OutOfMemoryError: Java heap space" error is triggered when you try to add more data into the heap space area in memory, but the size of this data is larger than the JVM can accommodate in the Java heap.
Note that the JVM gets a limited amount of memory from the OS, specified during startup. There are several startup parameters controlling the separate regions of memory allocated, but in your case you are interested in the heap region, which you can set (or increase if already set) as in the following example, which sets the heap size to 1 GB:
java -Xmx1024m com.mycompany.MyClass

Fast JVM start / JVM persistence - starting the JVM with data from a heap dump

I am developing an in-memory data structure, and would like to add persistence.
I am looking for a way to do this fast. I thought about dumping a heap dump once in a while.
Is there a way to load this Java heap dump as-is into my memory? Or is it impossible?
Otherwise, are there other suggestions for fast write and fast read of the entire information?
(Serialization might take a lot of time.)
-----------------edited explanation:--------
Since my memory might be full of small pieces of information referencing each other, serialization may require me to inefficiently scan all of my memory. Reloading is also potentially problematic.
On the other hand, I can define a gigantic array, and every object I create I put into the array. Links are then a long number representing the place in the array. Now I can just dump this array as-is, and also reload it as-is.
There are even some JVMs, like JRockit, that utilize the disk space, so maybe it is possible to dump as-is very quickly and to reload very quickly.
To prove my point, a Java heap dump contains all the information of the JVM, and it is produced quickly.
Sorry, but serializing 4 GB isn't anywhere close to the few seconds a dump takes.
Also, memory is memory, and there are operating systems that let you dump RAM quickly.
https://superuser.com/questions/164960/how-do-i-dump-physical-memory-in-linux
When you think about it, this is quite a good strategy for persistent data structures. There has been quite a hype about in-memory databases in the last decade. But why settle for that? What if I want a Fibonacci heap to be "almost persistent"? That is, every 5 minutes I dump the information (quickly), and in case of a power outage, I have a backup from 5 minutes ago.
-----------------end of edited explanation:--------
Thank you.
In general, there is no way to do this on HotSpot.
Objects in the heap have 2 words of header, the second of which points into permgen for the class metadata (known as a klassOop). You would have to dump all of permgen as well, which includes all the pointers to compiled code - so basically the entire process.
There would be no sane way to recover the heap state correctly.
It may be better to explain precisely what you want to build & why already-existing products don't do what you need.
Use Serialization. Implement java.io.Serializable, add a serialVersionUID to all of your classes, and you can persist them to any OutputStream (file, network, whatever). Just create a starting object from which all your objects are reachable (even indirectly).
I don't think serialization would take that long; it's optimized code in the JVM.
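A minimal sketch of that approach (the class and method names are my own; the key point is that writeObject() walks the entire object graph reachable from the root):

import java.io.*;

public class SnapshotStore {
    // Writes the whole object graph reachable from 'root' to a file.
    public static void save(Serializable root, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(root);
        }
    }

    // Reads the object graph back; the caller casts it to its real type.
    public static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return in.readObject();
        }
    }
}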
You can use jhat or jvisualvm to load your dump to analyze it. I don't know whether the dump file can be loaded and restarted again.

Huge LinkedList is causing GC overhead limit, is there another solution?

here is my code:
public void mapTrace(String Path) throws FileNotFoundException, IOException {
    FileReader arq = new FileReader(new File(Path));
    BufferedReader leitor = new BufferedReader(arq, 41943040);
    Integer page;
    String std;
    Integer position = 0;

    while ((std = leitor.readLine()) != null) {
        position++;
        page = Integer.parseInt(std, 16);
        LinkedList<Integer> values = map.get(page);
        if (values == null) {
            values = new LinkedList<>();
            map.put(page, values);
        }
        values.add(position);
    }

    for (LinkedList<Integer> referenceList : map.values()) {
        Collections.reverse(referenceList);
    }
}
This is the HashMap structure
Map<Integer, LinkedList<Integer>> map = new HashMap<>();
For 50 MB - 100 MB trace files I don't have any problem, but for bigger files I get:
Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: GC overhead limit exceeded
I don't know if the reverse method is increasing the memory use, if the LinkedList is using more space than another List structure, or if the way I'm adding the lists to the map is taking more space than it should. Can anyone tell me what's using so much space?
Can anyone tell me what's using so much space?
The short answer is that it is probably the space overhead of the data structure you have chosen that is using the space.
By my reckoning, a LinkedList<Integer> on a 64 bit JVM uses about 48 bytes of storage per integer in the list including the integers themselves.
By my reckoning, a Map<?, ?> on a 64 bit machine will use in the region of 48 bytes of storage per entry excluding the space need to represent the key and the value objects.
Now, your trace size estimates are rather too vague for me to plug the numbers in, but I'd expect a 1.5 GB trace file to need a LOT more than 2 GB of heap.
Given the numbers you've provided, a reasonable rule of thumb is that a trace file will occupy roughly 10 times its file size in heap memory, using the data structure that you are currently using.
You don't want to configure a JVM to try to use more memory than the physical RAM available. Otherwise you are liable to push the machine into thrashing, and the operating system is liable to start killing processes. So for an 8 GB machine, I wouldn't advise going over -Xmx8g.
Putting that together: with an 8 GB machine you should be able to cope with a 600 MB trace file (assuming my estimates are correct), but a 1.5 GB trace file is not feasible. If you really need to handle trace files that big, my advice would be to either:
design and implement custom collection types for your specific use case that use memory more efficiently (see the sketch after this list),
rethink your algorithms so that you don't need to hold the entire trace file in memory, or
get a bigger machine.
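As an illustration of the first option, here is a sketch (my own example, not a library class) of a growable list of primitive ints that stores 4 bytes per value instead of the ~48 bytes per element that LinkedList<Integer> needs:

import java.util.Arrays;

public class IntList {
    private int[] data = new int[16];
    private int size = 0;

    public void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2);  // grow geometrically
        }
        data[size++] = value;
    }

    public int get(int index) {
        return data[index];
    }

    public int size() {
        return size;
    }
}

Used as the value type in the map (e.g. Map<Integer, IntList>), a structure like this should cut the heap footprint of the trace by a large factor, though you would still have to check whether the largest files then fit.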
I did some tests before reading your comment: I put -Xmx14g and processed the 600 MB file. It took some minutes (about 10), but it did fine.
The -Xmx14g option sets the maximum heap size. Based on the observed behaviour, I expect that the JVM didn't need anything like that much memory, and didn't request it from the OS. If you'd looked at memory usage in the task manager, I expect you'd have seen numbers consistent with that.
Then I put -Xmx18g and tried to process the 1.5 GB file, and it's been running for about 20 minutes. My memory usage in the task manager is going from 7.80 to 7.90 GB. I wonder if this will finish. How can I use MORE memory than I have? Does it use the HD as virtual memory?
Yes, that is what it does.
Yes, each page of your process's virtual address space corresponds to a page on the hard disc.
If you've got more virtual pages than physical memory pages, at any given time some of those virtual memory pages will live on disk only. When your application tries to use one of those non-resident pages, the VM hardware generates an interrupt, and the operating system finds an unused physical page, populates it from the disc copy, and then hands control back to your program. But if your application is busy, the OS will have had to free that physical page by evicting another page, which may have involved writing the contents of the evicted page to disc.
The net result is that when you try to use significantly more virtual address pages than you have physical memory, the application generates lots of interrupts that result in lots of disc reads and writes. This is known as thrashing. If your system thrashes too badly, it will spend most of its time waiting for disc reads and writes to finish, and performance will drop dramatically. On some operating systems, the OS will attempt to "fix" the problem by killing processes.
Further to Stephen's quite reasonable answer, everything has its limit and your code simply isn't scalable.
In cases where the input is "large" (as in your case), the only reasonable approach is a stream-based one, which, while (usually) more complicated to write, uses very little memory and few resources. Essentially you hold in memory only what you need to process the current task, then release it as soon as possible.
You may find that Unix command-line tools are your best weapon, perhaps using a combination of awk, sed, grep, etc. to massage your raw data into, hopefully, a usable "end format".
I once stopped a colleague from writing a Java program to read in and parse XML and issue insert statements to a database: I showed him how to use a series of piped commands to produce executable SQL, which was then piped directly into the database command-line tool. It took about 30 minutes to get right, but the job was done. And the file was massive, so in Java it would have required a SAX parser and JDBC, which aren't fun.
To build this structure, I would put the data in a key/value datastore like Berkeley DB Java Edition.
Pseudo-code:
putData(db, page, value)
{
    DatabaseEntry key = new DatabaseEntry();
    DatabaseEntry data = new DatabaseEntry();
    List<Integer> positions = new LinkedList<Integer>();
    IntegerBinding.intToEntry(page, key);
    // fetch the positions already stored for this page, if any
    if (db.get(null, key, data, LockMode.DEFAULT) == OperationStatus.SUCCESS)
    {
        TupleInput in = new TupleInput(data.getData());
        int n = in.readInt();
        for (int i = 0; i < n; ++i) positions.add(in.readInt());
    }
    positions.add(value);
    // write the list back as a length-prefixed sequence of ints
    TupleOutput out = new TupleOutput();
    out.writeInt(positions.size());
    for (int v : positions) out.writeInt(v);
    data = new DatabaseEntry(out.toByteArray());
    db.put(null, key, data);
}
