I have a Java Card smart card and I want to assess the available EEPROM.
To do this, I use JCSystem.getAvailableMemory(JCSystem.MEMORY_TYPE_PERSISTENT).
Since the return type of this function is a short, I get the value 0x7FFF without allocating any data. To work around this, I create byte arrays with new byte[(short) 0x7FFF] and deduce the available persistent memory from what remains.
If I create two arrays:
arr1 = new byte[(short) 0x7FFF];
arr2 = new byte[(short) 0x7FFF];
Then 0x1144 bytes of available memory remain according to JCSystem.getAvailableMemory(JCSystem.MEMORY_TYPE_PERSISTENT). Summing, this means there are 32767*2 + 4420 = 69954 bytes available.
But when I change the size of my arrays:
arr1 = new byte[(short) 0x7FFF];
arr2 = new byte[(short) 0x6FFF];
then 0x2244 bytes of available memory remain. Summing, this means there are 70210 bytes available.
Another example:
With
arr1 = new byte[(short) 0x7FFF];
arr2 = new byte[(short) 0x5FFF];
0x3344 bytes of available memory remain. Summing, this means there are 70466 bytes available.
Even if the difference is negligible, why does it exist? (70210 differs from 70466.)
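The arithmetic can be checked with a small plain-Java snippet (the numbers are taken from the experiments above). Interestingly, the implied total grows by exactly 256 bytes each time the second array shrinks by 0x1000 bytes, which would be consistent with an allocation overhead proportional to array size:

```java
public class JcMemoryMath {
    public static void main(String[] args) {
        // total = bytes allocated in arrays + free bytes reported afterwards
        int t1 = 0x7FFF + 0x7FFF + 0x1144; // two 0x7FFF arrays -> 69954
        int t2 = 0x7FFF + 0x6FFF + 0x2244; // 0x7FFF + 0x6FFF   -> 70210
        int t3 = 0x7FFF + 0x5FFF + 0x3344; // 0x7FFF + 0x5FFF   -> 70466
        System.out.println(t1 + ", " + t2 + ", " + t3);
        // Shrinking the second array by 0x1000 bytes frees 0x1000 + 256 bytes,
        // i.e. the card seems to charge extra overhead per allocated byte.
        System.out.println("deltas: " + (t2 - t1) + ", " + (t3 - t2));
    }
}
```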
In the same way, I want to test how many AESKey objects I can allocate in one applet. So I try to find the available memory as described above, but with AESKey arrays.
With the same card, when I create an AESKey array this way:
arr = new AESKey[(short) 0x03E8];
for (short i = 0x0000; i < 0x03E8; i++) {
    arr[i] = (AESKey) KeyBuilder.buildKey(KeyBuilder.TYPE_AES, KeyBuilder.LENGTH_AES_256, false);
}
So I create an array of a thousand 256-bit AES keys. I thought it would take 32 KB, but JCSystem.getAvailableMemory(JCSystem.MEMORY_TYPE_PERSISTENT) indicates that only 0x0022 bytes remain available. Why this result?
If I test with half as many keys (i.e. 500):
arr = new AESKey[(short) 0x01F4];
for (short i = 0x0000; i < 0x01F4; i++) {
    arr[i] = (AESKey) KeyBuilder.buildKey(KeyBuilder.TYPE_AES, KeyBuilder.LENGTH_AES_256, false);
}
the method JCSystem.getAvailableMemory(JCSystem.MEMORY_TYPE_PERSISTENT) indicates that 0x55EE (21998) bytes are available. I definitely don't see the relation to the 1000-key case if the available EEPROM is about 70 KB, as I explained at the beginning...
Could someone describe in detail how the memory is allocated in Java Card to explain the results cited above?
There are a few reasons for this:
there is object allocation overhead;
there may be overhead with regards to aligning data;
there may be overhead with regards to memory fragmentation;
for keys, there may be overhead to keep them secure.
All these issues reduce the amount of memory available to you. In that regard, you should treat getAvailableMemory as a rough indication of the maximum amount of memory available.
How much overhead is required depends on the Java Card runtime.
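To put rough numbers on that overhead, one can take the difference between the two AESKey experiments in the question. This is plain Java arithmetic, not Java Card code, and the ~44 bytes/key figure is only an inference from the reported values (it also lumps in the cost of the 500 extra array slots):

```java
public class AesKeyCost {
    public static void main(String[] args) {
        int freeAfter1000 = 0x0022; // 34 bytes free after 1000 keys
        int freeAfter500  = 0x55EE; // 21998 bytes free after 500 keys
        // What the extra 500 keys (and 500 extra array slots) consumed:
        int extra = freeAfter500 - freeAfter1000; // 21964 bytes
        double perKey = extra / 500.0;            // ~43.9 bytes per key
        // A 256-bit key holds 32 bytes of key material, so roughly 12 bytes
        // of per-object overhead per key on this particular card.
        System.out.println(extra + " bytes / 500 keys = " + perKey + " bytes per key");
    }
}
```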
Well, for a short answer:
Java Card isn't too clever about storing arrays; it needs additional bookkeeping data. So if you fill a byte array with x bytes to stay under the 0x7FFF threshold, the array will internally need more than x bytes to store the data, hence the difference you noticed.
If you are working with JCOP cards you can circumvent the problem by using UtilX.getAvailableMemory().
For a little more knowledge read this:
http://ruimtools.com/doc.php?doc=jc_best, in particular the section on reducing EEPROM consumption (some parts are outdated, however)
Related
I have read some answers to this question (Why I can't create an array with large size? and https://bugs.openjdk.java.net/browse/JDK-8029587) and I don't understand the following.
"In the GC code we pass around the size of objects in words as an int." As far as I know, the size of a word in the JVM is 4 bytes. According to this, if we pass around the size of a long array of large size (for example, MAX_INT - 5) in words as an int, we should get an OutOfMemoryError with "Requested array size exceeds VM limit" because the size is too large for an int, even without the header size. So why do arrays of different types have the same limit on the maximum number of elements?
Only addressing the why arrays of different types have the same limit on max count of elements? part:
Because it doesn't matter too much in practical reality, and it allows the code implementing the JVM to be simpler.
When there is only one limit that is the same for all kinds of arrays, you can handle all arrays with the same code, instead of having a lot of type-specific code.
And given that the people who need "large" arrays can still create them, and only those who need really, really large arrays are affected, why spend the effort?
The answer is in the JDK sources as far as I can tell (I'm looking at jdk-9); after writing this I am not sure whether it should be a comment instead (and whether it answers your question), but it's too long for a comment...
First the error is thrown from hotspot/src/share/vm/oops/arrayKlass.cpp here:
if (length > arrayOopDesc::max_array_length(T_ARRAY)) {
    report_java_out_of_memory("Requested array size exceeds VM limit");
    ....
}
Now, T_ARRAY is actually an enum of type BasicType that looks like this:
public static final BasicType T_ARRAY = new BasicType(tArray);
// tArray is an int with value = 13
That is the first indication that, when computing the maximum size, the JDK does not care what the array will hold (T_ARRAY does not specify the element type of the array).
Now the method that actually validates the maximum array size looks like this:
static int32_t max_array_length(BasicType type) {
  assert(type >= 0 && type < T_CONFLICT, "wrong type");
  assert(type2aelembytes(type) != 0, "wrong type");
  const size_t max_element_words_per_size_t =
      align_size_down((SIZE_MAX/HeapWordSize - header_size(type)), MinObjAlignment);
  const size_t max_elements_per_size_t =
      HeapWordSize * max_element_words_per_size_t / type2aelembytes(type);
  if ((size_t)max_jint < max_elements_per_size_t) {
    // It should be ok to return max_jint here, but parts of the code
    // (CollectedHeap, Klass::oop_oop_iterate(), and more) uses an int for
    // passing around the size (in words) of an object. So, we need to avoid
    // overflowing an int when we add the header. See CRs 4718400 and 7110613.
    return align_size_down(max_jint - header_size(type), MinObjAlignment);
  }
  return (int32_t)max_elements_per_size_t;
}
I did not dive too deeply into the code, but it is based on HeapWordSize, which is at least 8 bytes. Here is a good reference (I tried to look it up in the code itself, but there are too many references to it).
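A rough Java transcription of the fallback branch may make the logic clearer. The 2-word header and 1-word object alignment used below are assumptions for a typical 64-bit HotSpot build, not values read from this code:

```java
public class MaxArrayLength {
    // Mirrors: return align_size_down(max_jint - header_size(type), MinObjAlignment);
    static int maxArrayLength(int headerWords, int minObjAlignmentWords) {
        int max = Integer.MAX_VALUE - headerWords;
        return max - (max % minObjAlignmentWords); // align_size_down
    }

    public static void main(String[] args) {
        // With a 2-word header the cap lands just below max_jint, which is why
        // a request like new long[Integer.MAX_VALUE - 1] can still throw
        // "Requested array size exceeds VM limit" regardless of element type.
        System.out.println(maxArrayLength(2, 1)); // 2147483645
    }
}
```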
I am trying to create 2D array in Java as follows:
int[][] adjecancy = new int[96295][96295];
but it is failing with the following error:
JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2017/04/07 11:58:55 - please wait.
JVMDUMP032I JVM requested System dump using 'C:\eclipse\workspaces\TryJavaProj\core.20170407.115855.7840.0001.dmp' in response to an event
JVMDUMP010I System dump written to C:\eclipse\workspaces\TryJavaProj\core.20170407.115855.7840.0001.dmp
JVMDUMP032I JVM requested Heap dump using 'C:\eclipse\workspaces\TryJavaProj\heapdump.20170407.115855.7840.0002.phd' in response to an event
JVMDUMP010I Heap dump written to C:\eclipse\workspaces\TryJavaProj\heapdump.20170407.115855.7840.0002.phd
A way to solve this is by increasing the JVM memory but I am trying to submit the code for an online coding challenge. There it is also failing and I will not be able to change the settings there.
Is there any standard limit or guidance for creating large arrays which one should not exceed?
int[][] adjecancy = new int[96295][96295];
When you do that, you are trying to allocate 96295 * 96295 * 32 bits, which is nearly 37091 MB, i.e. nearly 37 GB. It is practically impossible to get that much memory from a PC for Java alone.
I don't think you need that much data in hand at the initialization of your program. You should probably look at ArrayList, which gives you dynamic allocation of size; freeing entries up at runtime is the key thing to consider.
There is no limit or restriction on creating an array. As long as you have memory, you can use it. But keep in mind that you should not hold a block of memory that makes the JVM's life hectic.
The array must obviously fit into memory. If it does not, the typical solutions are:
Do you really need int (max value 2,147,483,647)? Maybe byte (max value 127) or short is good enough? byte is 4 times smaller than int.
Do you really have many identical values in the array (like zeros)? Try using sparse arrays.
for instance:
Map<Integer, Map<Integer, Integer>> map = new HashMap<>();
map.put(27, new HashMap<Integer, Integer>()); // row 27 exists
map.get(27).put(54, 1); // row 27, column 54 has value 1.
They need more memory per value stored, but have essentially no limits on the array space (you can use Long rather than Integer as the index to make them really huge).
Maybe you just do not know how long the array should be? Try ArrayList, it self-resizes. Use ArrayList of ArrayLists for 2D array.
If nothing else helps, use RandomAccessFile to store your overgrown data in the filesystem. 100 GB or so is not a problem these days on a good workstation; you just need to compute the required offset into the file. The filesystem is obviously much slower than RAM, but with a good SSD drive it may be bearable.
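A minimal sketch of the RandomAccessFile idea, assuming a fixed rows x cols layout of 4-byte ints (the class name and file name are made up for the example):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class FileBackedIntMatrix {
    private final RandomAccessFile file;
    private final long cols;

    FileBackedIntMatrix(String path, long rows, long cols) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
        this.cols = cols;
        file.setLength(rows * cols * 4L); // zero-filled, sparse on most filesystems
    }

    // Cell (row, col) lives at byte offset (row * cols + col) * 4.
    void set(long row, long col, int value) throws IOException {
        file.seek((row * cols + col) * 4L);
        file.writeInt(value);
    }

    int get(long row, long col) throws IOException {
        file.seek((row * cols + col) * 4L);
        return file.readInt();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("matrix", ".bin");
        f.deleteOnExit();
        FileBackedIntMatrix m = new FileBackedIntMatrix(f.getPath(), 1000, 1000);
        m.set(12, 34, 1);
        System.out.println(m.get(12, 34)); // prints 1
    }
}
```

Wrapping the file in a buffered or memory-mapped layer would be needed for real performance, but the offset arithmetic is the whole trick.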
It is recommended that the maximum heap size be at most 1/4 of the machine's RAM.
One int in Java takes 4 bytes, and your array allocation needs approximately 37.09 GB of memory.
In that case, even if I assume you are allocating the full heap to just this one array, your machine would need around 148 GB of RAM. That is huge.
Have a look at below.
Ref: http://docs.oracle.com/javase/8/docs/technotes/guides/vm/gc-ergonomics.html
Hope this helps.
It depends on the maximum memory available to your JVM and the element type of the array. An int takes 4 bytes of memory, so if 1 MB of memory is available on your machine, it can hold a maximum of 1024 * 256 integers (1 MB = 1024 * 1024 bytes). Keeping that in mind, you can size your 2D array accordingly.
Array that you can create depends upon JVM heap size.
96295 * 96295 * 4 (bytes per number) = 37,090,908,100 bytes = ~34.54 GB. Most JVMs in competitive-coding judges don't have that much memory. Hence the error.
To get a good idea of what array size you can use for given heap size -
Run this code snippet with different -Xmx settings:
Scanner scanner = new Scanner(System.in);
while (true) {
    System.out.println("Enter 2-D array size: ");
    int size = scanner.nextInt();
    int[][] numbers = new int[size][size];
    numbers = null;
}
e.g. with -Xmx512M -> a 2-D array of roughly 10k x 10k elements.
Generally, most online judges have a ~1.5-2 GB heap when evaluating submissions.
I have to read a text file of 226mb made like this:
0 25
1 1382
2 99
3 3456
4 921
5 1528
6 578
7 122
8 528
9 81
the first number is an index, the second a value. I want to load a vector of shorts (8349328 positions) by reading this file, so I wrote this code:
Short[] docsofword = new Short[8349328];
br2 = new BufferedReader(new FileReader("TermOccurrenceinCollection.txt"));
ss = br2.readLine();
while (ss != null)
{
    docsofword[Integer.valueOf(ss.split("\\s+")[0])] = Short.valueOf(ss.split("\\s+")[1]); // [indexTerm] - numOccInCollection
    ss = br2.readLine();
}
br2.close();
It turns out that the load takes an incredible 4.2 GB of memory. I really don't understand why; I expected a ~15 MB vector.
Thanks for any answer.
There are multiple effects at work here.
First, you declared your array as type Short[] instead of short[]. The former is a reference type, meaning each value is wrapped in an instance of Short, incurring the overhead of a full-blown object (most likely 16 bytes instead of two). This also inflates each array slot from two bytes to the reference size (generally 4 or 8 bytes, depending on heap size and 32/64-bit VM). The minimum size you can expect for the fully populated array is thus approximately 8349328 x 20 = 160 MB.
Your reading code is also happily producing tons of garbage objects. You are again using a wrapper type (Integer) to address the array where a simple int would do; that's at least 16 bytes of garbage where it would be zero with int. String.split is another culprit: you force the compilation of two regular expressions per line, plus the creation of two strings. That's numerous short-lived objects that become garbage for each line. All of that could be avoided with a few more lines of code.
So you have a relatively memory-hungry array, and lots of garbage. The garbage memory can be cleaned up, but the JVM decides when. The decision is based on the maximum heap size and the garbage collector parameters. If you supplied no arguments for either, the JVM will happily fill your machine's memory before it attempts to reclaim garbage.
TLDR: Inefficient reading code paired with no JVM parameters.
If the file is generated by you, use ObjectOutputStream; it makes reading the file back very easy.
As @Durandal said, change the code accordingly. I am giving sample code below.
short[] docsofword = new short[8349328];
BufferedReader br2 = new BufferedReader(new FileReader("TermOccurrenceinCollection.txt"));
String ss = br2.readLine();
int strIndex, index;
while (ss != null)
{
    strIndex = ss.indexOf(' ');
    index = Integer.parseInt(ss.substring(0, strIndex));
    docsofword[index] = Short.parseShort(ss.substring(strIndex + 1));
    ss = br2.readLine();
}
br2.close();
You can optimise even further. Instead of indexOf(), we could write our own method that scans the characters and parses the integer as it goes, until it hits the space; that yields both the position of the space and the first number in a single pass, and the remainder of the string can then be parsed the same way.
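For illustration, such a single-pass parser might look like this. It is a sketch: parsePair is a made-up helper, and it assumes well-formed "index value" lines of ASCII digits with a single separating space:

```java
public class LineParser {
    // Hand-rolled parse of "index value" lines: one scan, no substring/split,
    // so each parsed line allocates only the small result array.
    static int[] parsePair(String line) {
        int index = 0, value = 0, i = 0;
        while (line.charAt(i) != ' ') {           // digits before the space
            index = index * 10 + (line.charAt(i++) - '0');
        }
        i++;                                      // skip the space
        while (i < line.length()) {               // digits after the space
            value = value * 10 + (line.charAt(i++) - '0');
        }
        return new int[] { index, value };
    }

    public static void main(String[] args) {
        int[] pair = parsePair("3 3456");
        System.out.println(pair[0] + " -> " + pair[1]); // 3 -> 3456
    }
}
```

In the actual loop one would store straight into docsofword[index] instead of returning an array, making the per-line garbage zero.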
If I take an XML file that is around 2kB on disk and load the contents as a String into memory in Java and then measure the object size it's around 33kB.
Why the huge increase in size?
If I do the same thing in C++ the resulting string object in memory is much closer to the 2kB.
To measure the memory in Java I'm using Instrumentation.
For C++, I take the length of the serialized object (e.g string).
I think there are multiple factors involved.
First of all, as Bruce Martin said, objects in Java have an overhead of 16 bytes per object; C++ objects do not.
Second, Strings in Java use 2 bytes per character instead of 1.
Third, Java might reserve more memory for its Strings than C++'s std::string does.
Please note that these are just ideas where the big difference might come from.
Assuming that your XML file contains mainly ASCII characters and uses an encoding that represents them as single bytes, you can expect the in-memory size to be at least double, since Java uses UTF-16 internally (I've heard of some JVMs that try to optimize this, though). Added to that is the overhead of two objects (the String instance and an internal char array) with some fields, IIRC about 40 bytes overall.
So your "object size" of 33kb is definitely not correct, unless you're using a weird JVM. There must be some problem with the method you use to measure it.
In Java, a String object has some extra data that increases its size.
It is the object header, the array data, and some other fields. These can include the array reference, offset, length, etc.
Visit http://www.javamex.com/tutorials/memory/string_memory_usage.shtml for details.
String: a String's memory growth tracks its internal char array's growth. However, the String class adds another 24 bytes of overhead.
For a nonempty String of size 10 characters or less, the added overhead cost relative to useful payload (2 bytes for each char plus 4 bytes for the length), ranges from 100 to 400 percent.
More:
What is the memory consumption of an object in Java?
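A back-of-the-envelope estimate along those lines; the ~40-byte fixed overhead is an assumption for a pre-Java-9 JVM (where String wraps a char[]), not an exact figure:

```java
public class StringMemoryEstimate {
    // 2 bytes per char (UTF-16 code units in the char[]) plus a fixed
    // overhead for the String object, its fields, and the array header.
    static int estimateBytes(int chars) {
        final int FIXED_OVERHEAD = 40; // assumed; varies by JVM and bitness
        return FIXED_OVERHEAD + 2 * chars;
    }

    public static void main(String[] args) {
        // A 2 kB ASCII file held as one String should cost roughly 4 kB,
        // nowhere near the 33 kB reported in the question.
        System.out.println(estimateBytes(2048)); // 4136
    }
}
```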
Yes, you should GC and give it time to finish. Just call System.gc(); and print totalMemory() in a loop. You would also do better to create a million copies of the string in an array (measure the empty array's size first, then the array filled with strings), to be sure you are measuring the size of the strings and not of other service objects that may be present in your program. A String alone cannot take 32 kB, but a hierarchy of XML objects can.
That said, I cannot resist the irony that nobody cares about memory (and cache hits) in the world of Java. We know that the JIT keeps improving and can outperform native C++ code in some cases, so supposedly there is no need to bother with memory optimization. Premature optimization is the root of all evil.
As stated in other answers, Java's String adds overhead. If you need to store a large number of strings in memory, I suggest storing them as byte[] instead. Doing so, the size in memory should be about the same as the size on disk.
String -> byte[] :
String a = "hello";
byte[] aBytes = a.getBytes(StandardCharsets.UTF_8);
byte[] -> String :
String b = new String(aBytes, StandardCharsets.UTF_8);
(Specifying the charset explicitly avoids depending on the platform default.)
I am trying to build a map from the contents of a file, and my code is as follows:
System.out.println("begin to build the sns map....");
String basePath = PropertyReader.getProp("oldbasepath");
String pathname = basePath + "\\user_sns.txt";
FileReader fr;
Map<Integer, List<Integer>> snsMap =
        new HashMap<Integer, List<Integer>>(2000000);
try {
    fr = new FileReader(pathname);
    BufferedReader br = new BufferedReader(fr);
    String line;
    int i = 1;
    while ((line = br.readLine()) != null) {
        System.out.println("line number: " + i);
        i++;
        String[] strs = line.split("\t");
        int key = Integer.parseInt(strs[0]);
        int value = Integer.parseInt(strs[1]);
        List<Integer> list = snsMap.get(key);
        // if the follower is not in the map
        if (snsMap.get(key) == null)
            list = new LinkedList<Integer>();
        list.add(value);
        snsMap.put(key, list);
        System.out.println("map size: " + snsMap.size());
    }
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println("finish building the sns map....");
return snsMap;
The program is very fast at first, but slows down a lot by the time the printed information reads:
map size: 1138338
line number: 30923602
map size: 1138338
line number: 30923603
....
I tried to find the reason with the two System.out.println() calls, judging the performance of BufferedReader and HashMap that way instead of using a Java profiler.
Sometimes it takes a while to get the map-size information after the line-number information, and sometimes it takes a while to get the line-number information after the map size. My question is: which makes my program slow, the BufferedReader on a big file, or the HashMap for a big map?
If you are testing this from inside Eclipse, you should be aware of the huge performance penalty of writing to stdout/stderr, due to Eclipse capturing that output in the Console view. Printing inside a tight loop is always a performance issue, even outside of Eclipse.
But, if what you are complaining about is the slowdown experienced after processing 30 million lines, then I bet it's a memory issue. First it slows down due to intense GC'ing and then it breaks with OutOfMemoryError.
You will have to check you program with some profiling tools to understand why it is slow.
In general, file access is much slower than in-memory operations (unless you are constrained in memory and doing excessive GC), so the guess would be that reading the file is the slower part here.
Until you have profiled, you will not know what is slow and what isn't.
Most likely, the System.out calls will show up as the bottleneck, and you'll then have to profile without them again. System.out is the worst thing you can do for finding performance bottlenecks, because in doing so you usually add an even worse bottleneck.
An obvious optimization for your code is to move the line
snsMap.put(key, list);
into the if statement. You only need to put the list into the map when you have just created it. Otherwise, the put just replaces the current value with itself.
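Concretely, the fixed loop body would look something like this (a sketch using the question's variable names, with a single key/value pair standing in for a line of the file):

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class PutOnceDemo {
    public static void main(String[] args) {
        Map<Integer, List<Integer>> snsMap = new HashMap<Integer, List<Integer>>();
        int key = 1, value = 42;

        List<Integer> list = snsMap.get(key);
        if (list == null) {
            list = new LinkedList<Integer>();
            snsMap.put(key, list); // put() moved inside the branch: runs once per key
        }
        list.add(value);           // no redundant put() on every line

        System.out.println(snsMap); // {1=[42]}
    }
}
```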
The cost associated with Integer objects in Java (and in particular the use of Integer in the Java Collections API) is largely a memory (and thus garbage collection!) issue. You can sometimes get significant gains by using primitive collections such as GNU Trove, depending on how well you can adjust your code to use them efficiently. Most of the gains of Trove are in memory usage. Definitely try rewriting your code to use TIntArrayList and TIntObjectMap from GNU Trove. I'd avoid linked lists too, in particular for primitive types.
Roughly estimated, a HashMap<Integer, List<Integer>> needs at least 3*16 bytes per entry. The doubly linked list needs at least another 2*16 bytes per entry stored. 1M keys + 30M values ~ 1 GB, with no overhead included yet. With GNU Trove's TIntObjectHashMap<TIntArrayList> it should be 4+4+16 bytes per key and 4 bytes per value, so about 144 MB. The overhead is probably similar for both.
The reason that Trove uses less memory is because the types are specialized for primitive values such as int. They will store the int values directly, thus using 4 bytes to store each.
A Java Collections HashMap consists of many objects. Roughly, it looks like this: there are Entry objects that each point to a key and a value object. These must be objects because of the way generics are handled in Java. In your case, the key will be an Integer object, which uses 16 bytes (4 bytes mark, 4 bytes type, 4 bytes actual int value, 4 bytes padding) AFAIK. These are all 32-bit-system estimates. So a single entry in the HashMap will probably need some 16 (entry) + 16 (Integer key) + 32 (still-empty LinkedList) bytes of memory, all of which need to be considered for garbage collection.
If you have lots of Integer objects, it just will take 4 times as much memory as if you could store everything using int primitives. This is the cost you pay for the clean OOP principles realized in Java.
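The two estimates above work out as follows (pure arithmetic; the per-entry byte counts are the rough figures from this answer, not measured values):

```java
public class CollectionMemoryEstimate {
    public static void main(String[] args) {
        long keys = 1_000_000L, values = 30_000_000L;
        // HashMap<Integer, List<Integer>>: ~3*16 bytes per entry,
        // plus ~2*16 bytes per doubly-linked-list node.
        long hashMapBytes = keys * 48 + values * 32;  // 1,008,000,000 ~ 1 GB
        // Trove TIntObjectHashMap<TIntArrayList>: ~(4+4+16) bytes per key,
        // plus 4 bytes per primitive int value.
        long troveBytes = keys * 24 + values * 4;     // 144,000,000 ~ 144 MB
        System.out.println(hashMapBytes + " vs " + troveBytes);
    }
}
```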
The best way is to run your program with a profiler (for example, JProfiler) and see which parts are slow. Debug output can also slow your program down, for example.
HashMap is not slow; in reality it is the fastest among the maps. Hashtable is the only thread-safe one among them, and it can be slow sometimes.
Important note: close the BufferedReader and the file after you read the data... this might help.
e.g. br.close()
file.close()
Please check your system processes from the task manager; there may be too many processes running in the background.
Sometimes Eclipse is really resource-heavy, so try running your program from the console to check.