Nashorn OutOfMemoryError when building large js objects (strings) - java

I'm using Nashorn on 1.8u60 to create model objects to pass back to the view tier (Thymeleaf). Part of the model object is a somewhat large string (not big enough to cause any issues in plain Java) containing HTML. When trying to convert the object back into Java using ScriptObjectMirror methods, I'm hitting the following exception. Changing the max heap size doesn't seem to have any effect (changed from 900 MB to 1800 MB, same error). I couldn't find much online about this, but are there any restrictions that Nashorn places on object sizes? I'm going to try the latest 1.8 JDK now.
java.lang.OutOfMemoryError: Java heap space
at jdk.nashorn.internal.runtime.ConsString.flatten(ConsString.java:105)
at jdk.nashorn.internal.runtime.ConsString.flattened(ConsString.java:98)
at jdk.nashorn.internal.runtime.ConsString.toString(ConsString.java:69)
at jdk.nashorn.api.scripting.ScriptObjectMirror.wrap(ScriptObjectMirror.java:704)
at jdk.nashorn.api.scripting.ScriptObjectMirror.wrapLikeMe(ScriptObjectMirror.java:721)
at jdk.nashorn.api.scripting.ScriptObjectMirror.wrapLikeMe(ScriptObjectMirror.java:730)
at jdk.nashorn.api.scripting.ScriptObjectMirror.access$300(ScriptObjectMirror.java:64)
at jdk.nashorn.api.scripting.ScriptObjectMirror$13.call(ScriptObjectMirror.java:371)
at jdk.nashorn.api.scripting.ScriptObjectMirror$13.call(ScriptObjectMirror.java:364)
at jdk.nashorn.api.scripting.ScriptObjectMirror.inGlobal(ScriptObjectMirror.java:859)
at jdk.nashorn.api.scripting.ScriptObjectMirror.entrySet(ScriptObjectMirror.java:364)
...
Thanks, Adrian

That line reads
final char[] chars = new char[length];
so it appears there's indeed not enough memory for the final string. Nashorn uses ConsString to amortize concatenation costs by delaying the actual concatenation until the result is needed (most JS engines use this optimization; without it, concatenating lots of strings in a loop would require O(n^2) time).
This means that the result of many + operations on strings may be a tree of ConsString objects that gets "flattened" all at once. The tradeoff for making concatenation linear in time is the need to keep those ConsStrings around, which requires more than twice the memory of the final string (more than twice because of the ConsString objects' own overhead).
One way to get around this is to periodically invoke str.toString(). It is seemingly a no-op, but internally it forces flattening of the concatenation tree. Try introducing it into your code at some point and see whether it helps.
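As a minimal, self-contained illustration of that workaround (the loop bounds, the HTML fragment, and the class name are all made up here, and the embedded script just stands in for whatever builds your model string):

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class FlattenDemo {
    public static void main(String[] args) throws Exception {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");
        // The periodic s = s.toString() forces Nashorn to flatten the
        // ConsString tree before it grows large.
        String script =
              "var s = '';"
            + "for (var i = 0; i < 100000; i++) {"
            + "  s += '<p>fragment ' + i + '</p>';"
            + "  if (i % 1000 === 0) { s = s.toString(); }"
            + "}"
            + "s.length;";
        System.out.println(engine.eval(script)); // prints the final length
    }
}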

Related

What happens when a Java String overflows?

As far as I understand, Java Strings are just arrays of characters, with the maximum length being an integer value.
If I understand this answer correctly, it is possible to cause an overflow with a String - albeit in "unusual circumstances".
Since Java Strings are based on char arrays and Java automatically checks array bounds, buffer overflows are only possible in unusual scenarios:
If you call native code via JNI
In the JVM itself (usually written in C++)
The interpreter or JIT compiler does not work correctly (Java bytecode mandates bounds checks)
Correct me if I'm wrong, but I believe this means that you can write outside the bounds of the array, without triggering the ArrayIndexOutOfBounds (or similar) exception.
I've encountered issues in C++ with buffer overflows, and I can find plenty of advice about other languages, but none specifically answering what would happen if you caused a buffer overflow with a String (or any other array type) in Java.
I know that Java Strings are bounds-checked, and can't be overflowed by native Java code alone (unless issues are present in the compiler or JVM, as per points 2 and 3 above), but the first point implies that it is technically possible to get a char[] into an... undesirable position.
Given this, I have two specific questions about the behaviour of such issues in Java, assuming the above is correct:
If a String can overflow, what happens when it does?
What would the implications of this behaviour be?
Thanks in advance.
To answer your first question: I actually had the luck of causing such an error, and execution just stopped, throwing one of these errors:
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
So that was my case; I don't know whether it represents a security problem the way buffer overflows do in C and C++.
A String in Java is immutable, so once created there is no writing to the underlying array of char or array of byte (which one is used depends on the Java version and the contents of the String). OK, using JNI could circumvent that, but with pure Java it is impossible to leave the bounds of the array without causing an ArrayIndexOutOfBoundsException or the like.
The only way to cause a kind of overflow in the context of String handling would be to create a new String that is too long. Make sure that your JVM has enough heap (around 36 GB), create a char array of Integer.MAX_VALUE - 1 elements, populate it appropriately, call new String( char [] ) with that array, and then execute
string = string.concat( new String( array ) );
But the result is just an exception telling you that an attempt was made to create an array that is too large.
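Put together, the experiment looks roughly like this; a sketch with a made-up class name, and note that the exact error (and whether the initial allocation even succeeds) can vary by JVM:

import java.util.Arrays;

public class StringOverflowDemo {
    public static void main(String[] args) {
        // Requires a very large heap, e.g. -Xmx36g, as noted above; on some
        // VMs the maximum array size is slightly below Integer.MAX_VALUE - 1.
        char[] array = new char[Integer.MAX_VALUE - 1];
        Arrays.fill(array, 'x');
        String string = new String(array);
        // The result would need a char[] longer than any JVM allows, so this
        // throws an error instead of silently overflowing:
        string = string.concat(new String(array));
        System.out.println(string.length()); // never reached
    }
}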

how to create our own O(1) substring function in java as it was in jdk 6.

How can we create our own O(1) substring function in Java, as it was in JDK 6? Is there any way to use the JDK 6 behaviour of substring() on newer versions of the JDK?
The O(1) substring was possible because the underlying character array of the string could be shared between objects. Hence substring simply required creating an object with a pointer to the original string's array, along with an offset and length. There was no copying of the actual data itself, which had the annoying effect that taking a small substring of a huge string, then deleting the huge one, didn't actually free up memory. This led to code such as:
String newstr = new String(oldStr.substring(5,9));
rather than the more sensible-looking:
String newstr = oldStr.substring(5,9);
Since strings no longer share data (Java 7 Update 6 is where I think this changed), that's not possible, so if you want to get back that O(1) performance, you'll basically have to construct your own string class to do it.
Just be aware that you may be worrying about something that's not so important. Unless your strings are very large, the extra cost (in space and time) of copying the data for them may be inconsequential.
And the extra effort of converting your O1String into String for every function that needs the latter, as well as the less-than-perfect integration with string literals, may well make things even worse.
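For what it's worth, a minimal sketch of such a class might look like this (O1String is an invented name; a real version would also want equals, hashCode, and bounds checking on charAt):

// A string "view" that restores O(1) substring by sharing the backing
// String instead of copying, much like the JDK 6 String did internally.
final class O1String implements CharSequence {
    private final String base;  // shared, never copied
    private final int offset;
    private final int length;

    O1String(String base) { this(base, 0, base.length()); }

    private O1String(String base, int offset, int length) {
        this.base = base;
        this.offset = offset;
        this.length = length;
    }

    // O(1): just creates a new view over the same backing string.
    O1String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > length || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new O1String(base, offset + beginIndex, endIndex - beginIndex);
    }

    @Override public int length() { return length; }
    @Override public char charAt(int index) { return base.charAt(offset + index); }
    @Override public CharSequence subSequence(int start, int end) { return substring(start, end); }
    @Override public String toString() { return base.substring(offset, offset + length); } // O(n) copy
}

Note that this brings back the old memory-retention quirk: every view keeps the entire base string alive.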
Here you can view how it was implemented in Java 6:
OpenJDK

Java StringBuilder high volumes of data blocks thread

I'm running a small program that processes around 215K records in the database. These records contain XML that is used by JAXB to marshal and unmarshal to objects.
The program was trying to find XMLs that, due to legacy, couldn't be unmarshalled anymore. Each time I hit the unmarshalling exception, I saved the exception message, containing the XML, in an ArrayList. In the end I wanted to send out a mail with all failed records and the causing exception messages, so I used the messages in the ArrayList together with a StringBuilder to compose the email body.
However, there were around 75K failures, and while I was building the body, the StringBuilder just stopped appending at a certain point in the for loop and the thread was blocked. I have since changed my approach to not append the XML from the exception message anymore, but I'm still not clear on why it didn't work.
Could it be that the VM went out of memory, or can Strings only be of a certain size (doubtful, I believe, certainly in the 64-bit era)? Is there a better way I could have solved this? I contemplated sending the StringBuilder to my service instead of saving the strings in an ArrayList first, but that would be such a dirty interface :(
Any architectural insights would be appreciated.
EDIT
As requested, here is the code; it's no rocket science. Assume that the failures list contains around 75K entries, and each entry contains an XML of, on average, 500 to 1000 lines.
private String createBodyMessage(List<String> failures) {
    StringBuilder builder = new StringBuilder();
    builder.append("Failed operations\n");
    builder.append("=================\n\n");
    for (String failure : failures) {
        builder.append(failure);
        builder.append("\n");
    }
    return builder.toString();
}
You might just be successful with
int sizeEstimate = failures.size() * 20;
StringBuilder builder = new StringBuilder(sizeEstimate);
builder.append("Failed operations\n");
builder.append("=================\n\n");
while (!failures.isEmpty()) {
    builder.append(failures.remove(0));
    builder.append('\n');
}
This does less resizing of the StringBuilder's internal buffer, and it consumes failures as it goes to reduce memory use.
It might not solve the problem if the text is too huge.
A compressed attachment, however, is standard procedure.
StringBuilder (like StringBuffer) is backed by an array, and the maximum number of elements in a Java array is 2^31-1.
Reaching this size will normally throw an error on Java 7, but I'm not completely sure.
The solution is to swap your data out to a file before your StringBuilder reaches a fixed size.
Could it be that the VM went out of memory?
If you filled up the heap, you would get an OutOfMemoryError exception.
or can Strings only be of a certain size (doubtful, I believe, certainly in the 64-bit era)?
Actually, yes. A Java String or StringBuilder can contain at most 2^31 - 1 characters¹.
Is there a better way I could have solved this? I contemplated sending the StringBuilder to my service instead of saving the strings in an ArrayList first ...
That won't help if the real problem is that the concatenation of the strings is too large to hold in a StringBuilder.
Actually, a better approach would be to stream the strings into a PipedOutputStream, and use the corresponding PipedInputStream to construct a MimeBodyPart that you then attach to the email. You could include a compressor in the stream stack too.
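A rough sketch of that streaming idea, assuming the JavaMail API (javax.mail and javax.activation) is on the classpath; the class name, method name, and attachment file name are invented:

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPOutputStream;
import javax.activation.DataHandler;
import javax.mail.internet.MimeBodyPart;
import javax.mail.util.ByteArrayDataSource;

class FailureReport {
    static MimeBodyPart buildAttachment(List<String> failures) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out, 64 * 1024);

        // Producer thread: writes and gzips the report; the full body never
        // exists as one huge String or StringBuilder.
        new Thread(() -> {
            try (Writer w = new OutputStreamWriter(
                    new GZIPOutputStream(out), StandardCharsets.UTF_8)) {
                w.write("Failed operations\n=================\n\n");
                for (String failure : failures) {
                    w.write(failure);
                    w.write('\n');
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }).start();

        MimeBodyPart attachment = new MimeBodyPart();
        // The consumer side drains the pipe; only compressed bytes are buffered.
        attachment.setDataHandler(new DataHandler(
                new ByteArrayDataSource(in, "application/gzip")));
        attachment.setFileName("failures.txt.gz");
        return attachment;
    }
}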
But an even better approach would be not to attempt to send gigabytes of erroneous data as email attachments. Save them as files that can be fetched (or whatever) if the email recipient wants them.
¹ Surprisingly, the javadocs don't seem to state this explicitly. However, String.length() returns an int, and various string manipulation methods take int arguments to specify offsets and lengths. And certainly, the standard implementations of String and StringBuilder use a single char[] as backing store, and arrays are limited to 2^31-1 elements by the JLS and the JVM spec.

best way of loading a large text file in java

I have a text file with a sequence of integers per line:
47202 1457 51821 59788
49330 98706 36031 16399 1465
...
The file has 3 million lines of this format. I have to load this file into memory, extract 5-grams out of it, and do some statistics on it. I do have a memory limitation (8 GB RAM). I tried to minimize the number of objects I create (only one class with 6 float variables and some methods). Each line of that file basically generates a number of objects of this class (proportional to the size of the line in terms of the number of words). I'm starting to feel that Java is not a good way to do these things when C++ is around.
Edit:
Assume that each line produces (n-1) objects of that class, where n is the number of tokens in that line separated by spaces (e.g. 1457). So, considering an average of 10 words per line, each line gets mapped to 9 objects on average, and there will be 9 * 3 * 10^6 = 27 million objects. The memory needed is then: 27 * 10^6 * (8-byte object header + 6 * 4-byte floats), plus a Map(String, Objects) and another Map(Integer, ArrayList(Objects)). I need to keep everything in memory, because there will be some mathematical optimization happening afterwards.
Reading/Parsing the file:
The best way to handle large files, in any language, is to try NOT to load them into memory.
In Java, have a look at MappedByteBuffer. It allows you to map a file into process memory and access its contents without loading the whole thing into your heap.
You might also try reading the file line by line and discarding each line after you read it, again to avoid holding the entire file in memory at once.
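For example, a minimal line-by-line sketch (the file name is made up):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LineStream {
    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader("ngrams.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] tokens = line.split(" ");
                // ... update the 5-gram statistics from tokens here; each
                // line becomes garbage as soon as the next one is read.
            }
        }
    }
}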
Handling the resulting objects
For dealing with the objects you produce while parsing, there are several options:
Same as with the file itself: if you can perform whatever it is you want to perform without keeping all of them in memory (while "streaming" the file), that is the best solution. You didn't describe the problem you're trying to solve, so I don't know whether that's possible.
Compression of some sort: switch from wrapper objects (Float) to primitives (float), use something like the flyweight pattern to store your data in giant float[] arrays and only construct short-lived objects to access it (see the sketch after this list), or find some pattern in your data that allows you to store it more compactly.
Caching/offload: if your data still doesn't fit in memory, "page it out" to disk. This can be as simple as extending Guava to page out to disk, or bringing in a library like Ehcache or the like.
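A sketch of the flyweight option above, using the six floats per record mentioned in the question (the class name is invented):

// Parallel primitive storage: one giant float[] instead of millions of
// small objects, each record addressed by its index.
final class RecordStore {
    private static final int FIELDS = 6;   // six floats per record, as in the question
    private final float[] data;

    RecordStore(int capacity) {
        data = new float[capacity * FIELDS];
    }

    float get(int record, int field) {
        return data[record * FIELDS + field];
    }

    void set(int record, int field, float value) {
        data[record * FIELDS + field] = value;
    }
}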
A note on Java collections, and maps in particular
For small objects, Java collections, and maps in particular, incur a large memory penalty (due mostly to everything being wrapped as an Object and to the Map.Entry inner-class instances). At the cost of a slightly less elegant API, you should probably look at the GNU Trove collections if memory consumption is an issue.
Optimal would be to hold only integers and line ends.
To that end, one way would be to convert the file into two files:
one binary file of integers (4 bytes each)
one binary file with the indexes where the next line starts.
For this one can use a Scanner to read, and a DataOutputStream+BufferedOutputStream to write.
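A sketch of that conversion step (file names invented for illustration):

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Scanner;

public class ConvertToBinary {
    public static void main(String[] args) throws IOException {
        try (Scanner scanner = new Scanner(new File("input.txt"));
             DataOutputStream ints = new DataOutputStream(
                     new BufferedOutputStream(new FileOutputStream("integers.bin")));
             DataOutputStream ends = new DataOutputStream(
                     new BufferedOutputStream(new FileOutputStream("lineEnds.bin")))) {
            int count = 0;
            while (scanner.hasNextLine()) {
                for (String token : scanner.nextLine().trim().split("\\s+")) {
                    if (token.isEmpty()) continue;  // skip blank lines
                    ints.writeInt(Integer.parseInt(token));
                    count++;
                }
                ends.writeInt(count); // index where the next line starts
            }
        }
    }
}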
Then you can load those two files in arrays of primitive type:
int[] integers = new int[(int)integersFile.length() / 4];
int[] lineEnds = new int[(int)lineEndsFile.length() / 4];
Reading can be done with MappedByteBuffer.asIntBuffer(). (You then would not even need the arrays, though the code would become a bit COBOL-like in its verbosity.)
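For example, the integers file could be mapped and bulk-read like this (a sketch, with the same invented file name):

import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class LoadIntegers {
    public static void main(String[] args) throws Exception {
        try (FileChannel ch = FileChannel.open(
                Paths.get("integers.bin"), StandardOpenOption.READ)) {
            // Map the whole file and view it as ints; no per-value parsing.
            IntBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size())
                             .asIntBuffer();
            int[] integers = new int[buf.remaining()];
            buf.get(integers); // bulk copy into the primitive array
            System.out.println("loaded " + integers.length + " integers");
        }
    }
}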

Which uses less memory String.format or +/StringBuilder?

I've been searching through SO for a while, and all I can find is references to speed for strings and one or two rather misguided attempts at memory benchmarking.
My situation is that we have a ton of logging messages in our application, and we're wondering whether there is any measurable MEMORY advantage to using String.format vs. + vs. StringBuilder.
I've got a solid grip on measuring the time each of these is taking and there are plenty of SO posts for that.
Can anyone tell me which one is better for lowering memory consumption?
Example:
if(LOG.isDebugEnabled()) LOG.debug(String.format("Invoice id = %s is waiting for processing", invoice.getId()));
Since String.format() is much more complicated (it supports format sequences and data types: %s, %d, etc.), it is expected to be more expensive in both time and memory. However, I believe this may be significant for very long strings only.
I believe String.format would use less memory.
When using StringBuilder, you need to create a new builder object and then append strings to it. The creation of the object and the repeated calls into it (via append and similar methods) would seem to me to be more memory-intensive than returning a straightforward string using String.format.
StringBuilder uses a temporary object before creating the final string.
String.format seems like a more direct way and therefore less memory intensive.
Moreover, StringBuilder asks for a specific size when initialized (or else it defaults to a capacity of 16 characters).
You could compare the default allocated memory values of a StringBuilder object versus a plain old String object.
You could test these two options with a large dataset of strings to be built, and assess the time and memory each approach takes.
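As a crude starting point, something like the following can give a rough feel for the difference; this is only a sketch (the class name is invented), and for trustworthy numbers you would want a real harness such as JMH with its GC profiler:

public class FormatVsConcat {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();
        String last = null;
        for (int i = 0; i < 100_000; i++) {
            last = String.format("Invoice id = %s is waiting for processing", i);
            // vs: last = "Invoice id = " + i + " is waiting for processing";
            // vs: last = new StringBuilder("Invoice id = ").append(i)
            //         .append(" is waiting for processing").toString();
        }
        long after = rt.totalMemory() - rt.freeMemory();
        // Very approximate: only reflects what is still live, not total churn.
        System.out.println(last.length() + " / approx delta: " + (after - before));
    }
}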
Hope this helps.
