Why does FileChannel.map take up to Integer.MAX_VALUE of data? - java

I am getting the following exception when using FileChannel.map:
Exception in thread "main" java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at sun.nio.ch.FileChannelImpl.map(Unknown Source)
at niotest.NioTest.readUsingNio(NioTest.java:38)
at niotest.NioTest.main(NioTest.java:64)
A quick look at the OpenJDK implementation shows that the method map(..) in FileChannelImpl takes size as a long. But inside the body, it compares size with Integer.MAX_VALUE and throws an exception if it is greater. Why take a long size as input but limit it to the maximum int value?
Does anyone know the specific reason behind this implementation?
Or is it some kind of bug?
Source URL - http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/nio/ch/FileChannelImpl.java
I am running this program using a 64-bit JRE on 64-bit Windows Server 2008.

It's not an implementation-specific bug. The size parameter of FileChannel.map is declared as long, but the documentation says:
size - The size of the region to be mapped; must be non-negative and no greater than Integer.MAX_VALUE
All compliant JVM implementations will behave this way. I suspect the reason is a combination of history (who would need to access a file larger than 2GB? ;) and leaving room for later versions of Java (it will be easier to allow values larger than Integer.MAX_VALUE than it would be to change the parameter type from int to long).
A lot of people find this int-based thinking in the Java API regarding anything file-related very confounding and short-sighted. But remember, Java dates back to 1995! I'm sure 2GB seemed like a relatively safe value at the time.

ByteBuffer's capacity is limited to Integer.MAX_VALUE, so there is no way to map anything larger than that.
Look at: MappedByteBuffer map(MapMode mode, long position, long size)
position has to be long for obvious reasons.
size does not strictly need to be a long, but in any calculation it has to be promoted anyway: for example, position + size has to be a positive long. The OS may indeed use a long to describe the mapping, and the underlying map function (mmap) may need to map more than Integer.MAX_VALUE bytes in order to respect page alignment, but ByteBuffer just can't use that.
Overall, int lies very deep in Java's design and there is no C-like size_t type; using long everywhere instead of int would hurt performance. So in the end: if you need to map more than 2GB, just use more than one ByteBuffer.
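The "more than one ByteBuffer" suggestion could be sketched roughly as follows. This is only an illustration under assumptions: the class name ChunkedMapping, the helper names and the chunkSize parameter are made up for this example, and the file is mapped read-only.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK.
final class ChunkedMapping {

    /** Maps the whole file read-only as consecutive regions of at most chunkSize bytes. */
    static List<MappedByteBuffer> mapInChunks(Path file, long chunkSize) throws IOException {
        if (chunkSize <= 0 || chunkSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunkSize must be in 1..Integer.MAX_VALUE");
        }
        List<MappedByteBuffer> chunks = new ArrayList<>();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long fileSize = channel.size();
            for (long position = 0; position < fileSize; position += chunkSize) {
                // Each individual mapping stays within the int-sized limit.
                long size = Math.min(chunkSize, fileSize - position);
                chunks.add(channel.map(FileChannel.MapMode.READ_ONLY, position, size));
            }
        }
        // A mapping remains valid after the channel that created it is closed.
        return chunks;
    }

    /** Reads the byte at an absolute file offset by selecting the right chunk. */
    static byte get(List<MappedByteBuffer> chunks, long chunkSize, long offset) {
        int chunkIndex = (int) (offset / chunkSize);
        int indexInChunk = (int) (offset % chunkSize);
        return chunks.get(chunkIndex).get(indexInChunk);
    }
}
```

Values that straddle a chunk boundary (an int or long spanning two regions) need extra care; this sketch only reads single bytes.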

Related

What happens when a Java String overflows?

As far as I understand, Java Strings are just an array of characters, with the maximum length being an integer value.
If I understand this answer correctly, it is possible to cause an overflow with a String - albeit in "unusual circumstances".
Since Java Strings are based on char arrays and Java automatically checks array bounds, buffer overflows are only possible in unusual scenarios:
If you call native code via JNI
In the JVM itself (usually written in C++)
The interpreter or JIT compiler does not work correctly (Java bytecode mandated bounds checks)
Correct me if I'm wrong, but I believe this means that you can write outside the bounds of the array, without triggering the ArrayIndexOutOfBounds (or similar) exception.
I've encountered issues in C++ with buffer overflows, and I can find plenty of advice about other languages, but none specifically answering what would happen if you caused a buffer overflow with a String (or any other array type) in Java.
I know that Java Strings are bounds-checked, and can't be overflowed by native Java code alone (unless issues are present in the compiler or JVM, as per points 2 and 3 above), but the first point implies that it is technically possible to get a char[] into an... undesirable position.
Given this, I have two specific questions about the behaviour of such issues in Java, assuming the above is correct:
If a String can overflow, what happens when it does?
What would the implications of this behaviour be?
Thanks in advance.
To answer your first question: I had the luck of actually causing such an error, and execution simply stopped with this error:
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
So that was my case; I don't know whether it represents a security problem the way a buffer overflow does in C and C++.
A String in Java is immutable, so once created there is no writing to the underlying array of char or array of byte (it depends on the Java version and contents of the String whether one or the other is used). OK, using JNI could circumvent that, but with pure Java it is impossible to leave the bounds of the array without causing an ArrayIndexOutOfBoundsException or the like.
The only way to cause a kind of overflow in the context of String handling would be to create a new String that is too long. Make sure that your JVM has enough heap (around 36 GB), create a char array of length Integer.MAX_VALUE - 1, populate it appropriately, create var string = new String( array ) from that array, and then execute
string = string.concat( new String( array ) );
But the result is just an error telling you that an attempt was made to create an array that is too large.

Efficiently Storing a Short History of Boolean Events for many Components

To preface this - I have no influence on the design of this problem and I can't really give a lot of details about the technical background.
Say I have a lot of components of the same type that regularly get a boolean event - and I need to hold a short history of these boolean events.
A coworker of mine wrote a rather naive implementation using the type Map<Component, CircularFifoQueue<Boolean>>, CircularFifoQueue being a data structure from Apache Commons. The code works, but given how generics work in Java and the dimensions used, this is really inefficient, as it stores a reference to one of the two singleton Boolean objects instead of just one bit.
Generally there are around 100K components and the history is supposed to hold the 5-10 most recent boolean values (might be subject to change but probably won't be larger than 10). This currently means that around 1.5GB of RAM are allocated just for these history maps. Also these changes happen quite frequently so it wouldn't hurt to increase the CPU efficiency if possible.
One obvious change would be to move the history into the Component class to remove the HashMap-induced overhead.
The more complicated question is how to efficiently store the last few boolean values.
One possible way would be to use BitSets, but as those use long[] as their underlying data structure, I doubt it would be the most efficient way to store what is essentially 5 bits.
Another option would be to directly use an integer and shift the value as a way to remove old entries. So basically
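int history = 0;

public void set(int length, boolean active) {
    // write the new value one position above the window (bit index `length`)
    if (active) {
        history |= 1 << length;
    } else {
        history &= ~(1 << length);
    }
    // shift one to the right to remove the oldest entry (bit 0);
    // the new value ends up at bit index length - 1
    history = history >>> 1;
}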
Just off the top of my head. This code is untested. I don't know how efficient it is or whether it works, but that is about what I had in mind.
But that would still lead to quite some overhead compared to the optimal case of storing 5 bits of data using 5 bits of memory.
One could achieve some additional savings if the histories of the different components were stored in a contiguous array, but I'm not sure how to handle one giant contiguous BitSet, or alternatively a large byte[] where each byte represents one boolean history as explained above.
This is a weirdly specific problem and I'd be really glad about any suggestions.
Setting aside the bit manipulations, which I'm sure you'll conquer, please think about how efficient is efficient enough.
Every instance of
class Foo {}
allocates 16 bytes. So if you were to introduce
class ComponentHistory {
private final int bits;
}
that's about 20 bytes (the exact figure depends on the JVM, its object header size and its alignment rules).
If you replace the int with a byte, you're still at about 20 bytes: the JVM pads a byte field to at least 4 bytes.
If you define a global array of bits somewhere and refer to it from ComponentHistory, the reference itself is at least 4 bytes.
Basically, you can't win :)
But consider this: if you go with the simplest approach that you have already outlined, that produces simple readable code, your 100K component histories will take up 2MB of RAM - substantial savings from your current level of 1.5GB. Specifically, you've saved 1498MB.
Suppose you do invent a cumbersome yet working way of storing only 5 bits per history. You'd then need 500 Kbit, or roughly 60KB, to store all histories. With the baseline of 1.5GB, your savings are now 1499.94MB. Savings improve by about 0.1%. Does that matter at all? More often than not, I'd prefer not to over-optimize here at the cost of simplicity.
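For comparison, the "large byte[]" idea from the question could look roughly like the sketch below. It rests on assumptions that are not in the original: every component has a dense int index, the class and method names are invented for this example, and one byte per component limits the history to 8 events (a short[] would cover up to 16).

```java
// Hypothetical sketch: one byte of history per component, newest event in bit 0.
final class PackedHistories {
    private final byte[] histories; // one slot per component index
    private final int length;       // number of events kept, 1..8
    private final int mask;         // keeps only the lowest `length` bits

    PackedHistories(int componentCount, int length) {
        if (length < 1 || length > 8) {
            throw new IllegalArgumentException("length must be in 1..8");
        }
        this.histories = new byte[componentCount];
        this.length = length;
        this.mask = (1 << length) - 1;
    }

    /** Records a new event; the oldest event falls off the top of the window. */
    void record(int component, boolean active) {
        int h = histories[component] & 0xFF;
        h = ((h << 1) | (active ? 1 : 0)) & mask;
        histories[component] = (byte) h;
    }

    /** Returns the event recorded `age` steps ago (0 = most recent). */
    boolean get(int component, int age) {
        if (age < 0 || age >= length) {
            throw new IndexOutOfBoundsException("age out of range: " + age);
        }
        return (((histories[component] & 0xFF) >> age) & 1) != 0;
    }

    /** Counts how many of the remembered events were true. */
    int activeCount(int component) {
        return Integer.bitCount(histories[component] & 0xFF);
    }
}
```

With 100K components this is a single array of about 100KB, so it sits between the per-object int field and the fully bit-packed variant discussed above.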

File size vs. in memory size in Java

If I take an XML file that is around 2kB on disk and load the contents as a String into memory in Java and then measure the object size it's around 33kB.
Why the huge increase in size?
If I do the same thing in C++ the resulting string object in memory is much closer to the 2kB.
To measure the memory in Java I'm using Instrumentation.
For C++, I take the length of the serialized object (e.g string).
I think there are multiple factors involved.
First of all, as Bruce Martin said, objects in Java have an overhead of 16 bytes per object; C++ objects do not.
Second, Strings in Java might use 2 bytes per character instead of 1.
Third, it could be that Java reserves more memory for its Strings than C++'s std::string does.
Please note that these are just ideas where the big difference might come from.
Assuming that your XML file contains mainly ASCII characters and uses an encoding that represents them as single bytes, you can expect the in-memory size to be at least double, since Java uses UTF-16 internally (I've heard of some JVMs that try to optimize this, though). Added to that is the overhead for 2 objects (the String instance and an internal char array) with some fields, IIRC about 40 bytes overall.
So your "object size" of 33kB is definitely not correct, unless you're using a weird JVM. There must be some problem with the method you use to measure it.
In Java, String objects carry some extra data that increases their size.
This is the object header, the array header and some other fields, such as the array reference, offset and length.
Visit http://www.javamex.com/tutorials/memory/string_memory_usage.shtml for details.
String: a String's memory growth tracks its internal char array's growth. However, the String class adds another 24 bytes of overhead.
For a nonempty String of size 10 characters or less, the added overhead cost relative to useful payload (2 bytes for each char plus 4 bytes for the length), ranges from 100 to 400 percent.
More:
What is the memory consumption of an object in Java?
Yes, you should GC and give it time to finish. Just call System.gc() and print the used memory (totalMemory() minus freeMemory() from Runtime) in a loop. You would also do better to create a million copies of the string in an array (measure the size of the empty array and then of the array filled with strings), to be sure that you measure the size of the strings and not of other service objects that may be present in your program. A String alone cannot take 32 kB. But a hierarchy of XML objects can.
That said, I cannot resist the irony that nobody cares about memory (and cache hits) in the world of Java. We all know that the JIT is improving and can outperform native C++ code in some cases. So there is no need to bother about memory optimization. Premature optimization is the root of all evil.
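As stated in other answers, Java's String adds overhead. If you need to store a large number of strings in memory, I suggest you store them as byte[] instead. That way the size in memory should be about the same as the size on disk (plus the array header), provided you use the same encoding as the file.
String -> byte[] (naming the charset explicitly, here java.nio.charset.StandardCharsets.UTF_8, avoids depending on the platform default):
String a = "hello";
byte[] aBytes = a.getBytes(StandardCharsets.UTF_8);
byte[] -> String:
String b = new String(aBytes, StandardCharsets.UTF_8);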

Huge String Table in Java

I've got a question about storing a huge number of Strings in application memory. I need to load about 5 million lines from a file and store them, each of them at most 255 chars (URLs), but mostly ~50. From time to time I'll need to search for one of them. Is it possible to make this app run in ~1GB of RAM?
Will
ArrayList <String> list = new ArrayList<String>();
work?
As far as I know, a String in Java is encoded in UTF-8, which gives me huge memory use. Is it possible to make such an array with Strings encoded in ANSI?
This is a console application run with these parameters:
java -Xmx1024M -Xms1024M -jar "PServer.jar" nogui
Some JVMs can store strings that only use ASCII (or Latin-1) characters as a byte[] internally: certain Java 6 updates offer this via -XX:+UseCompressedStrings, and Java 9 and later do it by default (compact strings).
Having several GB of text in a List isn't a problem, but it can take a while to load from disk (many seconds)
If the average URL is 50 chars which are ASCII, with 32 bytes of overhead per String, 5 M entries could use about 400 MB which isn't much for a modern PC or server.
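The estimate above is simple arithmetic and can be written down as a small sketch. The class and method names are made up for this example, and the 32 bytes of overhead per String is the figure assumed in the answer, not a measured value.

```java
// Hypothetical back-of-the-envelope estimate, not a measurement.
final class StringMemoryEstimate {

    /** count * (characters * bytes per character + fixed overhead per String). */
    static long estimateBytes(long count, int avgChars, int bytesPerChar, int overheadPerString) {
        return count * ((long) avgChars * bytesPerChar + overheadPerString);
    }

    public static void main(String[] args) {
        // 5 million URLs of about 50 ASCII chars stored at 1 byte per char
        System.out.println(estimateBytes(5_000_000L, 50, 1, 32) / 1_000_000 + " MB");
        // the same data stored as UTF-16, 2 bytes per char
        System.out.println(estimateBytes(5_000_000L, 50, 2, 32) / 1_000_000 + " MB");
    }
}
```

The list itself adds one reference per entry on top of this, which is small compared to the strings.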
A Java String is a full-blown object. This means that apart from the characters of the string themselves, there is other information to store in it (a pointer to the class of the object, locking and garbage-collection bookkeeping, and some other infrastructure data). So an empty String already takes 45 bytes in memory (as you can see here).
Now you just have to add the maximum length of your strings and do some easy calculations to get the maximum memory use of that list.
Anyway, I would suggest you load the strings as byte[] if you have memory issues. That way you can control the encoding and you can still do searches.
Is there some reason you need to restrict it to 1G? If you want to search through them, you definitely don't want to swap to disk, but if the machine has more memory it makes sense to go higher than 1G.
If you have to search, use a SortedSet, not an ArrayList.
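A minimal sketch of the SortedSet suggestion, assuming one URL per line in a UTF-8 text file; the class name UrlIndex and the method names are invented for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper for illustration only.
final class UrlIndex {

    /** Loads every non-empty line of the file into a sorted set. */
    static SortedSet<String> load(Path file) throws IOException {
        SortedSet<String> urls = new TreeSet<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    urls.add(line);
                }
            }
        }
        return urls;
    }

    /** Lookup takes O(log n) comparisons instead of a linear scan of a list. */
    static boolean contains(SortedSet<String> urls, String url) {
        return urls.contains(url);
    }
}
```

A TreeSet adds its own per-entry overhead, so with a tight heap a sorted String[] searched with Arrays.binarySearch may be the more economical variant of the same idea.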

What is the max. capacity of byte-Array?

I made a Java class that does addition, subtraction, multiplication, etc.
The numbers look like (155^199 [+,-,*,/] 555^669 [+,-,*,/] ..... [+,-,*,/] x^n);
Each number is stored in a byte array, and the byte array can contain at most 66.442 values.
example:
(byte) array = [1][0] + [9][0] = [1][0][0]
(byte) array = [9][0] * [9][0] = [8][1][0][0]
My class does not work if the number is bigger than that (example: 999^999).
How can I solve this problem so that I can add much bigger numbers?
When the byte array reaches 66.443 values, the VM gives this error:
Caused by: java.lang.ClassNotFoundException, which is actually not the correct error description.
In other words, if I have a byte array with 66.443 values, the class cannot be read correctly.
Solved:
Used a multidimensional byte array to solve this problem.
array{array, ... nth-array} [+, -, /] nth-array{array, ... nth-array}
It now takes only a few seconds to add big numbers.
Thank you!
A single method in Java is limited to 64KB of byte code. When you initialise an array in code it uses byte code to do this. This would limit the maximum size you can define an array to about this size.
If you have a large byte array of values, I suggest you store it in an external file and load it at runtime. This way you can have a byte array of up to 2 GB. If you need more than this, you need an array of arrays.
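Loading the values at runtime instead of initialising them in code could look like the sketch below. The file format is an assumption made for this example: a plain text file containing the decimal digits of the number. DigitLoader and loadDigits are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration only.
final class DigitLoader {

    /** Reads a text file of decimal digits into a byte array holding one digit value per element. */
    static byte[] loadDigits(Path file) throws IOException {
        byte[] raw = Files.readAllBytes(file);
        byte[] digits = new byte[raw.length];
        int n = 0;
        for (byte b : raw) {
            if (b >= '0' && b <= '9') {
                digits[n++] = (byte) (b - '0');
            }
            // anything else, such as a trailing newline, is skipped
        }
        return Arrays.copyOf(digits, n);
    }
}
```

Because the data never appears as an array initialiser in the source, the 64KB limit on a method's byte code no longer applies to it.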
What does your actual code look like? What error are you getting?
A Java byte array can hold up to 2^31-1 values, if there is that much contiguous memory available.
Each array can hold a maximum of Integer.MAX_VALUE values. If it crashes, I guess you see an OutOfMemoryError. Fix that by starting your Java VM with more heap space:
java -Xmx1024M <...>
(this example gives the VM 1024 MB of heap space)
java.lang.ClassNotFoundException is thrown if the virtual machine needs a class and can't load it - usually because it is not on the class path (sometimes the case when we simply forget to compile a java source file..). This exception is totally unrelated to java array operations.
To continue the discussion in the comments section:
The name of the missing class is very important. At the line of code, where the exception is thrown, the VM tries to load the class ClassBigMath for the very first time and fails. The classloader can't find a file ClassBigMath.class on the classpath.
Double check first if the compiled java file is really present and double check that you don't have a typo in your source code. Typical reasons for this error:
We simply forget to compile a source file
A class file is on the classpath at compilation time but not at execution time
We do a Class.forName("MyClass") and have a typo in the class name
java.math.BigInteger is a much better solution for handling large numbers. Is there any reason you have chosen a byte array?
The maximum size of an array in Java is given by Integer.MAX_VALUE. This is 2^31-1 elements. You might get OOM exceptions for less if there is not enough memory free. Besides that, for what you are doing you might want to look at the BigInteger class. It seems you are doing your math in some form of decimal representation, which is not very memory efficient.
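As a rough illustration of the BigInteger suggestion, the expressions from the question could be computed like this. The class name BigPowers and the helper power are made up for this example.

```java
import java.math.BigInteger;

// Hypothetical example class, for illustration only.
final class BigPowers {

    /** Computes base^exponent exactly, however large the result gets. */
    static BigInteger power(long base, int exponent) {
        return BigInteger.valueOf(base).pow(exponent);
    }

    public static void main(String[] args) {
        BigInteger sum = power(155, 199).add(power(555, 669));
        BigInteger product = power(999, 999).multiply(power(999, 999));
        System.out.println("digits in sum: " + sum.toString().length());
        System.out.println("digits in product: " + product.toString().length());
    }
}
```

BigInteger stores the magnitude in binary rather than one decimal digit per element, which is why it needs far less memory than a digit-per-byte array.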
