How big is an object reference? - java

What is the size that a reference in Android's Java VM consumes?
More info:
By that I mean, if we have
String str = "Watever";
I need to know what str itself takes, not "Watever" -- "Watever" is what is saved at the location that the reference held by str points to.
Also, if we have
String str = null;
how much memory does it consume? Is it the same as the other str?
Now, if we have:
Object obj[] = new Object[2];
how much does obj consume, and how much do obj[0] and obj[1] consume?
The reason for the question is the following: (in case someone can recommend something).
I'm working on an app that manages many pictures downloaded from the internet.
I started storing those pictures in a "bank" (which consists of a list of pictures).
When displaying those pictures in a gallery, I used to search for the picture in the list (SLOW) and then, if the picture wasn't there, show a temporary placeholder image until the picture was downloaded.
Since that happened on the UI Thread, the app became very slow, so I thought about implementing a hash table on the bank instead of the list I had.
As I explained before, this search occurs in the UI Thread (and I can't change that). Because of that, collisions can become a problem if they start slowing the thread.
I have read that "To balance time and space efficiency, the hash table should be around half full", but that makes collisions occur half of the time (not practical for the UI Thread). That makes me think about having a very large hash table (compared to the number of pictures saved) and using more RAM (having less free VM heap).
Before determining the size of the hash table, I wanted to know how much memory it would consume, in order not to exaggerate.
I know that the size of the hash table might be very small compared to the memory that the pictures might consume, but I wanted to make sure I wasn't consuming more memory than necessary.
Before asking this question I searched, among other places, in:
How big is an object reference in Java and precisely what information does it contain?
reference type size in java
Hashing Tutorial
(Yes, I know two of the places contradict each other, that's part of the reason for the question).

An object or array reference occupies one 32-bit word (4 bytes) on a 32-bit JVM or Dalvik VM. A null takes the same space as a reference. (It has to, because a null has to fit in any reference-typed slot, i.e. an instance field, local variable, etc.)
On the other hand, an object occupies a minimum of two 32-bit words (8 bytes), and an array occupies a minimum of three 32-bit words (12 bytes). The actual size depends on the number and kinds of fields for an object, and on the number and kind of elements for an array.
For a 64-bit JVM, the size of a reference is 64 bits, unless you have configured the JVM to use compressed pointers:
-XX:+UseCompressedOops: Enables the use of compressed pointers (object references represented as 32-bit offsets instead of 64-bit pointers) for optimized 64-bit performance with Java heap sizes less than 32 GB.
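For example, enabling it looks like this on the command line (the heap size and main class here are just placeholders):

    java -XX:+UseCompressedOops -Xmx4g com.example.Main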
This is the nub of your question, I think.
Before determining the size of the hash table, I wanted to know how much memory it would consume, in order not to exaggerate.
If you allocate a HashMap or Hashtable with a large initial size, the majority of the space will be occupied by the hash array. This is an array of references, so its size will be (3 + initialSize) 32-bit words. It is unlikely that this will be significant ... unless you get your size estimate drastically wrong.
However, I think you are probably worrying unnecessarily about performance. If you are storing objects in a default allocated HashMap or Hashtable, the class will automatically resize the hash table as it gets larger. So, provided that your objects have a decent hash function (not too slow, not hashing everything to a small number of values) the hash table should not be a direct CPU performance concern.
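If you do want to pre-size the map for your picture bank, a minimal sketch (the expected count and the Bitmap value type are assumptions about your app) would be:

    // (imports: java.util.HashMap, java.util.Map, android.graphics.Bitmap)
    // Pre-size so the default 0.75 load factor is not hit before ~1000 pictures are cached.
    int expectedPictures = 1000;  // your own estimate goes here
    Map<String, Bitmap> bank =
            new HashMap<String, Bitmap>((int) (expectedPictures / 0.75f) + 1);

Even pre-sized like that, the backing array is only a couple of thousand references, i.e. around 8 KB on a 32-bit VM.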

References are nearly free. Even more so when compared to images.
Having a few collisions in a Map isn't a real problem. Collisions can be resolved far quicker than a linear search through a list of items. That said, a Binary Search through a sorted list of items would be a good way to keep memory usage down (compared to a Map).
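A minimal sketch of that sorted-list alternative (the URL key and Bitmap value type are assumptions):

    // (imports: java.util.*, android.graphics.Bitmap)
    // Two parallel lists kept sorted by URL; lookups are O(log n) with no per-entry node objects.
    class SortedPictureBank {
        private final List<String> urls = new ArrayList<String>();      // sorted keys
        private final List<Bitmap> pictures = new ArrayList<Bitmap>();  // parallel values

        Bitmap find(String url) {
            int i = Collections.binarySearch(urls, url);
            return i >= 0 ? pictures.get(i) : null;   // null -> caller shows the placeholder
        }

        void add(String url, Bitmap picture) {
            int i = Collections.binarySearch(urls, url);
            if (i < 0) {                              // not present: insert at the sorted position
                urls.add(-i - 1, url);
                pictures.add(-i - 1, picture);
            }
        }
    }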
I can vouch for the effectiveness of having smaller initial sizes for Maps - I recently wrote a program that makes a Trie structure of 170,000 English words. When I set the initial size to 26, I would run out of memory by the time I got to words starting with R. Cutting it down to 5, I was able to create the maps without memory issues and can search the tree (with many collisions) in effectively no time.
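The node looked roughly like this (simplified from memory):

    // (imports: java.util.HashMap, java.util.Map)
    // Each node maps the next character to a child node. An initial capacity of 5 instead of 26
    // keeps the per-node table small, since most nodes have only a few children.
    class TrieNode {
        final Map<Character, TrieNode> children = new HashMap<Character, TrieNode>(5);
        boolean isWord;
    }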
[Edit] If a reference is 32 bits (4 bytes) and your average image is around 2 megabytes, you could fit 500,000 references into the same space that a single image would take. You don't have to worry about the references.

Related

Massive map performance (java)

I'm working on a project that requires that I store (potentially) millions of key-value mappings, and make (potentially) hundreds of queries a second. There are some checks I can do around the data I'm working with, but they will only reduce the load by a bit. In addition, I will be making (potentially) hundreds of puts/removes a second, so my question is: Is there a map sufficient for this task? Is there any way I might optimize the map? Is there something faster that would work for storing key-value mappings?
Some additional information;
- The key will be a point in 3D space; I feel like this means I could use arrays, but the arrays would have to be massive
- The value must be an object
Any help would be greatly appreciated!
Back-of-envelope estimates help in getting to terms with this sort of thing. If you have millions of entries in a map, let's say 32M, and a key is a 3D point (so 3 ints -> 3 * 4B -> 12 bytes), then 12B * 32M = 384MB. You didn't mention the size of the value, but assuming you have a similarly sized value, let's double that figure. This is Java, so assuming a 64-bit platform with compressed oops (which is the default and what most people are on), you pay an extra 12B of object header per object. So: 32M * 2 * 24B = 1536MB.
Now if you use a HashMap, each entry requires an extra HashMap.Node; in Java 8 on the platform above you are looking at 32B per node (use OpenJDK JOL to find out object sizes), which brings us to 2560MB. Also throw in the cost of the HashMap array: with 32M entries you are looking at a table with 64M slots (because the array size is a power of 2 and you need some slack beyond your entries), so that's an extra 256MB. All together, let's round it up to 3GB?
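For reference, checking those per-object numbers with JOL is a one-liner (the Point3D key class below is a stand-in for whatever your key actually is):

    // Needs the org.openjdk.jol:jol-core dependency on the classpath.
    import org.openjdk.jol.info.ClassLayout;

    public class SizeCheck {
        static final class Point3D {
            final int x, y, z;   // 12B of data + 12B header = 24B with compressed oops
            Point3D(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }
        }

        public static void main(String[] args) {
            System.out.println(ClassLayout.parseClass(Point3D.class).toPrintable());
        }
    }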
Most servers these days have quite large amounts of memory (tens to hundreds of GB) and adding an extra 3GB to the JVM live set should not scare you. You might consider it disappointing that the overhead exceeds the data in your case, but this is not about your emotional well-being, it's a question of whether it will work ;-)
Now that you've loaded up the data, you are mutating it at a rate of hundreds of inserts/deletes per second, let's say 1024. Reusing the quantities above, we can sum it up as: 1024 * (24 * 2 + 32) = 80KB per second. Churning 80KB of garbage per second is small change for many applications, and not something you necessarily need to sweat about. To put it in context, a JVM these days will contend with collecting many hundreds of MB of young generation in a matter of tens of milliseconds.
So, in summary, if all you need is to load the data and query/mutate it along the lines you describe you might just find that a modern server can easily contend with a vanilla solution. I'd recommend you give that a go, maybe prototype with some representative data set, and see how it works out. If you have an issue you can always find more exotic/efficient solutions.

Serialization - differences between C++ and Java

I've recently been running some benchmarks trying to find the "best" serialization frameworks for C++ and also in Java. The factors that make up "best" for me are
the speed of de/serializing and also the resulting size of the serialized object.
If I look at the results of various frameworks in Java, I see that the resulting byte[] is generally smaller than the object size in memory. This is even the case with the built in Java serialization. If you then look at some of the other offerings (protobuf etc.) the size decreases even more.
I was quite surprised that when I looked at things on the C++ side (Boost, protobuf) the resulting object is generally no smaller (and in some cases bigger) than the original object.
Am I missing something here? Why do I get a fair amount of "compression" for free in Java but not in C++?
N.B. for measuring the size of the objects in Java, I'm using Instrumentation: http://docs.oracle.com/javase/6/docs/api/java/lang/instrument/Instrumentation.html
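For what it's worth, the serialized size itself can be measured with nothing more than this (the Sample class is just a stand-in):

    import java.io.ByteArrayOutputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    public class SerializedSizeDemo {
        static class Sample implements Serializable {
            int id = 42;
            String name = "example";
        }

        public static void main(String[] args) throws Exception {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(new Sample());
            out.close();
            // The in-memory size comes from Instrumentation.getObjectSize, which needs a -javaagent.
            System.out.println("serialized size: " + bytes.size() + " bytes");
        }
    }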
Did you compare the absolute size of the data? I would say that Java has more overhead, so if you "compress" the data into a serialized buffer, the amount of overhead decreases a lot more. In C/C++ you have almost the bare minimum required for the physical data size, so there is not much room for compression. And in fact, you have to add additional information to deserialize it, which could even result in a growth.
An object's in-memory size can be bigger than its actual data size because of the padding between data members.
When an object is serialized, this padding is discarded, and as a result the serialized form is smaller.
Because Java is a managed environment, it needs more of this layout and bookkeeping data to control memory and ownership; therefore, its apparent compression rate is bigger.

What do you do when you need more Java Heap Space?

Sorry if this has been asked before (though I can't really find a solution).
I'm not really too good at programming, but anyways, I am crawling a bunch of websites and storing information about them on a server. I need a Java program to process vector coordinates associated with each of the documents (about a billion or so documents, with a grand total of 500,000 numbers, plus or minus, associated with each of the documents). I need to calculate the singular value decomposition of that whole matrix.
Now Java, obviously, can't handle as big a matrix as that, to my knowledge. If I try making a relatively small array (about 44 million elements) then I get a heap error. I use Eclipse, and so I tried changing the -xmx value to 1024m (it won't go any higher for some reason, even though I have a computer with 8 GB of RAM).
What solution is there to this? Another way of retrieving the data I need? Calculating the SVD in a different way? Using a different programming language to do this?
EDIT: Just for right now, pretend there are a billion entries with 3 words associated with each. I am setting the Xmx and Xms correctly (from run configurations in Eclipse -> this is the equivalent of running java -XmsXXXX -XmxXXXX ...... at the command prompt).
The Java heap space can be set with the -Xmx (note the initial capital X) option and it can certainly reach far more than 1 GB, provided you are using a 64-bit JVM and the corresponding physical memory is available. You should try something along the lines of:
java -Xmx6144m ...
That said, you need to reconsider your design. There is a significant space cost associated with each object, with a typical minimum somewhere around 12 to 16 bytes per object, depending on your JVM. For example, a String has an overhead of about 36-40 bytes...
Even with a single object per document with no book-keeping overhead (impossible!), you just do not have the memory for 1 billion (1,000,000,000) documents. Even for a single int per document you need about 4 GB.
You should re-design your application to make use of any sparseness in the matrix, and possibly to make use of disk-based storage when possible. Having everything in memory is nice, but not always possible...
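A minimal sketch of what exploiting sparseness can look like (names are illustrative):

    import java.util.Arrays;

    // Store only the non-zero entries of a document's row as parallel arrays,
    // instead of a dense double[500000] per document.
    public class SparseRow {
        final int[] columns;    // indices of the non-zero entries, sorted ascending
        final double[] values;  // values at those indices

        SparseRow(int[] columns, double[] values) {
            this.columns = columns;
            this.values = values;
        }

        double get(int column) {
            int i = Arrays.binarySearch(columns, column);
            return i >= 0 ? values[i] : 0.0;
        }
    }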
Are you using a 32-bit JVM? These cannot have more than 2 GB of heap; I never managed to allocate more than 1.5 GB. Instead, use a 64-bit JVM, as these can allocate much more heap.
Or you could apply some math to it and use a divide-and-conquer strategy. That means splitting the problem into smaller problems that combine to the same result.
Don't know much about SVD but maybe this page can be helpful:
http://www.netlib.org/lapack/lug/node32.html
-Xms and -Xmx are different. The one containing s is the starting heap size and the one with x is the maximum heap size.
so
java -Xms512m -Xmx1024m
would give you 512 MB to start with and a 1024 MB maximum (without the m suffix the values would be interpreted as bytes).
As other people have said though you may need to break your problem down to get this to work. Are you using 32 or 64 bit java?
For data of that size, you should not plan to store it all in memory. The most common scheme to externalize this kind of data is to store it all in a database and structure your program around database queries.
Just for right now, pretend there are a billion entries with 3 words associated with each.
If you have one billion entries you need 1 billion times the size of each entry. If you mean 3 ints per entry, that's 12 GB at least just for the data. If you mean the words as Strings, you would enumerate the words (there are only about 100K words in English) and it would take the same amount of space.
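A sketch of that enumeration (illustrative only):

    import java.util.HashMap;
    import java.util.Map;

    // Assign each distinct word a small int id once, then store the ids instead of Strings.
    public class WordDictionary {
        private final Map<String, Integer> ids = new HashMap<String, Integer>();

        public int idOf(String word) {
            Integer id = ids.get(word);
            if (id == null) {
                id = ids.size();     // next free id
                ids.put(word, id);
            }
            return id;
        }
    }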
Given that 16 GB costs a few hundred dollars, I would suggest buying more memory.

Techniques for keeping data in the cache, locality?

For ultra-fast code it is essential that we maintain locality of reference: keep as much of the data that is used closely together in the CPU cache:
http://en.wikipedia.org/wiki/Locality_of_reference
What techniques are there to achieve this? Could people give examples?
I am interested in Java and C/C++ examples. It would be interesting to know of ways people use to prevent lots of cache swapping.
Greetings
This is probably too generic to have a clear answer. The approaches in C or C++ compared to Java will differ quite a bit (the way the languages lay out objects differs).
The basic idea is to keep data that will be accessed in tight loops together. If your loop operates on type T, and it has members m1...mN, but only m1...m4 are used in the critical path, consider breaking T into a T1 that contains m1...m4 and a T2 that contains m5...mN. You might want to add to T1 a pointer that refers to T2. Try to avoid objects that are unaligned with respect to cache boundaries (very platform dependent).
Use contiguous containers (a plain old array in C, vector in C++) and try to manage the iterations to go up or down, not randomly jumping all over the container. Linked lists are killers for locality: two consecutive nodes in a list might be at completely different random locations.
Object containers (and generics) in Java are also a killer: while in a Vector the references are contiguous, the actual objects are not (there is an extra level of indirection). In Java there is also a lot of extra overhead; if you new two objects one right after the other, they will probably end up in almost contiguous memory locations, but there will be some object-management data (usually two or three words of header) in between. The GC will move objects around, but hopefully won't make things much worse than they were before it ran.
If you are focusing on Java, create compact data structures. If you have an object that has a position, and that object is accessed in a tight loop, consider holding x and y primitive fields inside your object rather than creating a Point and holding a reference to it. Reference types need to be newed, and that means a separate allocation, an extra indirection and less locality.
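A tiny illustration of that last point (class names are made up):

    // Hot coordinates inlined as primitives: one object, one allocation, good locality.
    class Particle {
        double x, y;       // touched on every iteration of the tight loop
        String label;      // cold data; a reference here is harmless
    }

    // Versus a separate Point object: an extra allocation and an extra pointer chase per access.
    class Point { double x, y; }
    class PointParticle {
        Point position;
        String label;
    }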
Two common techniques include:
Minimalism (of data size and/or code size/paths)
Use cache oblivious techniques
Example for minimalism: In ray tracing (a 3D graphics rendering paradigm), it is a common approach to use 8-byte Kd-tree nodes to store static scene data. The traversal algorithm fits in just a few lines of code. Then, the Kd-tree is often compiled in a manner that minimizes the number of traversal steps by having large, empty nodes at the top of the tree ("Surface Area Heuristics" by Havran).
Mispredictions typically have a probability of 50%, but are of minor cost, because a great many nodes fit in a cache line (consider that you get 128 nodes per KiB!), and one of the two child nodes is always a direct neighbour in memory.
Example for cache-oblivious techniques: Morton array indexing, also known as Z-order-curve indexing. This kind of indexing might be preferred if you usually access nearby array elements in unpredictable directions. It might be valuable for large image or voxel data where pixels can be 32 or even 64 bytes big, and where you have millions of them (a typical compact camera is measured in megapixels, right?) or even thousands of billions for scientific simulations.
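A minimal 2D sketch of Morton indexing (the 3D case interleaves three coordinates the same way):

    // Interleave the low 16 bits of x and y into one 32-bit Z-order index, so elements
    // that are close in 2D tend to end up close together in the 1D array.
    static int mortonIndex(int x, int y) {
        return (spreadBits(y) << 1) | spreadBits(x);
    }

    // Insert a zero bit between each of the low 16 bits of v.
    static int spreadBits(int v) {
        v &= 0x0000FFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }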
However, both techniques have one thing in common: keep the most frequently accessed stuff nearby; less frequently used things can be further away, spanning the whole range from L1 cache through main memory to hard disk, then other computers in the same room, the next room, the same country, worldwide, other planets.
Some random tricks that come to my mind, some of which I used recently:
Rethink your algorithm. For example, you have an image with a shape and a processing algorithm that looks for corners of the shape. Instead of operating on the image data directly, you can preprocess it, save all the shape's pixel coordinates in a list and then operate on the list. You avoid randomly jumping around the image
Shrink data types. Regular int will take 4 bytes, and if you manage to use e.g. uint16_t you will cache 2x more stuff
Sometimes you can use bitmaps; I used one for processing a binary image. I stored one pixel per bit, so I could fit 8*32 pixels in a single cache line. It really boosted the performance (a sketch follows after this list)
From Java, you can use JNI (it's not difficult) and implement your critical code in C to control the memory
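Here is roughly what the bit-packed binary image from the bitmap trick looks like (simplified):

    // One bit per pixel: a binary image packed into a long[], so a single cache line
    // holds hundreds of pixels instead of a handful of ints.
    class BitImage {
        final long[] bits;
        final int width;

        BitImage(int width, int height) {
            this.width = width;
            this.bits = new long[(width * height + 63) / 64];
        }

        boolean get(int x, int y) {
            int i = y * width + x;
            return (bits[i >>> 6] & (1L << (i & 63))) != 0;
        }

        void set(int x, int y) {
            int i = y * width + x;
            bits[i >>> 6] |= 1L << (i & 63);
        }
    }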
In the Java world the JIT is going to be working hard to achieve this, and trying to second guess this is likely to be counterproductive. This SO question addresses Java-specific issues more fully.

B-tree implementation for variable size keys

I'm looking to implement a B-tree (in Java) for a "one use" index where a few million keys are inserted, and queries are then made a handful of times for each key. The keys are <= 40 byte ascii strings, and the associated data always takes up 6 bytes. The B-tree structure has been chosen because my memory budget does not allow me to keep the entire temporary index in memory.
My issue is about the practical details in choosing a branching factor and storing nodes on disk. It seems to me that there are two approaches:
One node always fits within one block. This is achieved by choosing a branching factor k so that even for the worst-case key length the storage requirement for keys, data and control structures is <= the system block size. k is likely to be low, and nodes will in most cases have a lot of empty room.
One node can be stored on multiple blocks. Branching factor is chosen independent of key size. Loading a single node may require that multiple blocks are loaded.
The questions are then:
Is the second approach what is usually used for variable-length keys? or is there some completely different approach I have missed?
Given my use case, would you recommend a different overall solution?
I should in closing mention that I'm aware of the jdbm3 project, and am considering using it. I will attempt to implement my own in any case, both as a learning exercise and to see whether case-specific optimization can yield better performance.
Edit: Reading about SB-Trees at the moment:
S(b)-Trees
Algorithms and Data Structures for External Memory
I'm missing option C here:
At least two tuples always fit into one block, and the block size is chosen accordingly. Blocks are filled up with as many key/value pairs as possible, which means the branching factor is variable. If the block size is much greater than the average size of a (key, value) tuple, the wasted space will be very low. Since the optimal IO size for disks is usually 4k or greater and you have a maximum tuple size of 46 bytes, this is automatically true in your case.
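As a back-of-envelope check on that (the per-entry overhead below is a guess):

    // Worst-case entries per block for option C: 4 KiB block, 40-byte key, 6-byte value,
    // plus an assumed 4 bytes of per-entry bookkeeping (e.g. a key-length field and offset).
    static int worstCaseEntriesPerBlock(int blockSize, int maxKeyLen, int valueLen, int perEntryOverhead) {
        return blockSize / (maxKeyLen + valueLen + perEntryOverhead);
    }
    // worstCaseEntriesPerBlock(4096, 40, 6, 4) == 81, so even worst-case keys give a healthy fan-out.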
And for all options you have some variants: B* or B+ Trees (see Wikipedia).
The JDBM BTree is already self-balancing. It also has defragmentation, which is very fast and solves all the problems described above.
One node can be stored on multiple blocks. Branching factor is chosen independent of key size. Loading a single node may require that multiple blocks are loaded.
Not necessarily. JDBM3 uses mapped memory, so it never reads a full block from disk into memory. It creates a 'view' on top of a block and only reads the partial data actually needed. So instead of reading a full 4KB block, it may read just 2x128 bytes. This depends on the underlying OS block size.
Is the second approach what is usually used for variable-length keys? or is there some completely different approach I have missed?
I think you missed the point that increasing the on-disk size decreases performance, as more data has to be read. And a single tree can share both approaches (newly inserted nodes use the first; after defragmentation the second applies).
Anyway, a flat file with a memory-mapped buffer is probably best for your problem, since you have a fixed record size and just a few million records.
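A bare-bones sketch of that flat-file idea (the record layout and class name are assumptions):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    // Fixed 46-byte records (40-byte key + 6-byte value), addressed by record number.
    // Keep the records sorted by key so lookups can binary-search the mapped file.
    public class MappedRecordFile {
        static final int RECORD_SIZE = 46;
        private final MappedByteBuffer buf;

        public MappedRecordFile(String path, int recordCount) throws Exception {
            RandomAccessFile file = new RandomAccessFile(path, "rw");
            buf = file.getChannel().map(FileChannel.MapMode.READ_WRITE,
                                        0, (long) recordCount * RECORD_SIZE);
        }

        // Read the 6-byte value stored after the 40-byte key of record i.
        public byte[] readValue(int i) {
            byte[] value = new byte[6];
            buf.position(i * RECORD_SIZE + 40);
            buf.get(value);
            return value;
        }
    }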
Also have a look at LevelDB. It has a new Java port which almost beats JDBM:
https://github.com/dain/leveldb
http://code.google.com/p/leveldb/
You could avoid this hassle if you use some embedded database. Those have already solved these problems, and more, for you.
You also write: "a few million keys" ... "[max] 40 byte ascii strings" and "6 bytes [associated data]". This does not add up. One gig of RAM would allow you more than "a few million" entries.
