Maximum size of 2D array in Java

I have to create 1000 heaps, each of which can contain 10^6 nodes. For easy access of the nodes, for deletion of nodes and for updating keys of nodes, I am planning to create a 2D array of size 10^6 * 1000 in which I'll be storing the references of nodes. But, is an array of such a big size possible to create in Java?
Is there a better way to access a particular node from the heaps without creating an array?
I could go through each node of a heap in order to search for my node, but this process is on the order of n for one heap, and if I have to delete a particular node from all heaps, the whole process would be on the order of 1000*n.

If you need an array with more elements than an int can index, you can use sun.misc.Unsafe to allocate memory off-heap. See here: https://dzone.com/articles/understanding-sunmiscunsafe

Java arrays are indexed by ints, so the maximum index is 2^31 - 1, which is 2147483647 (approx. 2E9), so you should be OK with your 1E9-element array. However, you will need enough RAM: don't forget that a billion longs takes 8 GB.
There's a much more detailed discussion in Do Java arrays have a maximum size?
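A minimal sketch of the reference-table idea, assuming a hypothetical Node class; the demo is scaled down because the full 1000 x 10^6 table would need several gigabytes of heap (roughly 4 GB of references alone with compressed oops):

// Hypothetical Node class, for illustration only.
class Node {
    int key;
    Node(int key) { this.key = key; }
}

public class HeapIndexDemo {
    public static void main(String[] args) {
        // Real sizes would be HEAPS = 1000 and NODES_PER_HEAP = 1_000_000,
        // i.e. 10^9 references; scaled down here so it runs in a default heap.
        final int HEAPS = 10;
        final int NODES_PER_HEAP = 1_000;

        // A Java "2D array" is an array of row arrays; each dimension is
        // indexed by int, so each is limited to 2^31 - 1 elements.
        Node[][] refs = new Node[HEAPS][NODES_PER_HEAP];

        refs[3][42] = new Node(7);   // record the reference when the node is created
        Node n = refs[3][42];        // O(1) lookup by (heap, slot); no per-heap search
        System.out.println(n.key);
    }
}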

Related

How much memory is needed to do bucket-sorting on 1TB of integers in Java?

I think the answer to this question should be 16 GB; here is how I calculated it:
One integer is 32 bits.
In Java, the range of int is from -2^31 to 2^31 - 1, so the total number of distinct values is 2^32.
We would need an int array of size 2^32 to do the bucket sorting.
So I get 32 bits * 2^32 = 16 GB.
Can anyone tell me if this is correct? I found people saying it should be 4 GB, and I don't know how 4 GB is calculated.
One example can be found:
https://www.quora.com/How-would-you-sort-a-100-TB-file-with-only-4-GB
The example question linked to is not asking the same thing. A bucket sort on 1 TB of integers would need 2^32 (about 4 billion) buckets, and in the worst case a single bucket would have to be able to hold the entire 1 TB of input. This would not be a reasonable approach.
As for the linked question, to sort a large file, some variation of an external bottom-up k-way merge sort would be used, probably with k == 16. The trade-off on k is the number of passes the sort takes (fewer passes if k is larger) versus the compare time to find the smallest of k elements for each element merged (longer compare time if k is larger).
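For concreteness, a small sketch of the arithmetic behind the two figures, assuming 4-byte counters: 16 GiB is the memory needed for one counter per possible int value, while the "4G" people mention is the number of counters/buckets, not gigabytes of memory:

public class BucketMath {
    public static void main(String[] args) {
        long distinctInts = 1L << 32;    // 2^32 possible int values
        long counterBytes = 4;           // one 32-bit counter per value

        // Counting-sort style: one counter per possible int value.
        long counterArrayBytes = distinctInts * counterBytes;
        System.out.println(counterArrayBytes);   // 17179869184 bytes = 16 GiB

        // "4G" refers to the *number* of buckets/counters (about 4 billion),
        // not to 4 GB of memory.
        System.out.println(distinctInts);        // 4294967296
    }
}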

Hashmap hashtable size limit less than max allowed limit for array index

I just wish to validate my below understanding so please suggest.
In Java, a regular array can have indices up to the maximum value of the int type, which is 2^31 - 1, and since HashMap's MAXIMUM_CAPACITY is an int too, it could in principle go up to that value as well.
But since HashMap internally requires the table length (number of buckets) to be a power of two, the limit gets curtailed to static final int MAXIMUM_CAPACITY = 1 << 30;, since that is the nearest power of two below 2^31 - 1.
Am I correct in my understanding?
The answers I have found only mention the sign-bit limit, not the power-of-two requirement:
/**
* The table, resized as necessary. Length MUST Always be a power of two.
*/
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
Also, I understand that the size limit for an array or a HashMap table (bucket count) has nothing to do with system/heap memory limitations; it comes only from the maximum range of the int data type (the index type) and other logical requirements (like the power-of-two constraint).
You are (more or less) correct in your reasoning about array size.
But the size limit on the internal array HashMap.table does not limit the size of the HashMap (i.e. the number of entries which can be stored in it).
Each element in that array is effectively a linked list of Entry objects of unlimited size, therefore there is no hard limit on number of entries which can be stored in a HashMap.
The limit for the table array is 2^30, as that is the largest power of two you can have for an array length. However, that does not mean the hash map is limited to 2^30 entries; rather, it is around this point that the hash map degrades into a table of linked lists (or trees in Java 8), i.e. there is no limit to the number of entries in each bucket.
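A small snippet of my own (not HashMap source) illustrating the power-of-two point: 1 << 30 is the largest power of two that fits in a positive int, which is why the table length is capped there while the entry count is not:

public class PowerOfTwoCap {
    public static void main(String[] args) {
        // Largest power of two that fits in a positive int:
        System.out.println(Integer.highestOneBit(Integer.MAX_VALUE)); // 1073741824 = 1 << 30

        // 1 << 31 overflows into the sign bit and becomes negative,
        // so a power-of-two table length can be at most 1 << 30.
        System.out.println(1 << 31);                                  // -2147483648

        // HashMap caps its table length accordingly:
        //     static final int MAXIMUM_CAPACITY = 1 << 30;
        // but each bucket is a linked list (or tree since Java 8),
        // so the number of *entries* is not capped at 1 << 30.
    }
}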

Maximum size of an array - Type mismatch: cannot convert from long to int

I see that the maximum size of an array can only be the maximum value of an int. Why does Java not allow an array of size Long.MAX_VALUE?
long no = 10000000000L;
int[] nums = new int[no]; // error here: Type mismatch: cannot convert from long to int
You'll have to address the "why" question to the Java designers. Anyone else can only speculate. My speculation is that they felt that a two-billion-element array ought to be enough for anybody (which, in fairness, it probably is).
An int-sized length allows arrays of 2^31 - 1 ("~2 billion") elements. In the gigantically overwhelming majority of arrays' uses, that's plenty.
An array of that many elements will take between 2 gigabytes and 16 gigabytes of memory, depending on the element type. When Java appeared in 1995, new PCs had only around 8 megabytes of RAM. And those 32-bit operating systems, even if they used virtual memory on disk, had a practical limit on the size of a contiguous chunk of memory they could allocate which was quite a bit less than 2 gigabytes, because other allocated things are scattered around in a process's address space. Thus the limits of an int-sized array length were untestable, unforeseeable, and just very far away.
On 32-bit CPUs, arithmetic with ints is much faster than with longs.
Arrays are a basic internal type and they are used numerously. A long-sized length would take an extra 4 bytes per array to store, which in turn could affect packing together of arrays in memory, potentially wasting more bytes between them. (Even though the longer length would almost never be useful.)
If you ever do need in-RAM storage for more than ~2 billion items, you can use an array of arrays.
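A rough sketch of that idea: a hypothetical long-indexed container backed by an array of long[] chunks. The class name and chunk size are my own choices, not a standard API:

// Hypothetical long-indexed array backed by an array of arrays; a sketch, not a production container.
public class BigLongArray {
    private static final int CHUNK_BITS = 27;            // 2^27 longs (1 GiB) per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final long[][] chunks;
    private final long length;

    public BigLongArray(long length) {
        this.length = length;
        int nChunks = (int) ((length + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        chunks = new long[nChunks][];
        long remaining = length;
        for (int i = 0; i < nChunks; i++) {
            chunks[i] = new long[(int) Math.min(CHUNK_SIZE, remaining)];
            remaining -= chunks[i].length;
        }
    }

    public long get(long index) {
        return chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)];
    }

    public void set(long index, long value) {
        chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)] = value;
    }

    public long length() { return length; }

    public static void main(String[] args) {
        // Small demo size; an instance with more than 2^31 elements needs a huge heap.
        BigLongArray a = new BigLongArray(1_000_000L);
        a.set(999_999L, 42L);
        System.out.println(a.get(999_999L)); // 42
    }
}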
Unfortunately Java does not support arrays with more than 2^31 - 1 elements, i.e. at most about 16 GiB of space for a long[] array.
Try creating this:
Object[] array = new Object[Integer.MAX_VALUE - 4];
You should get an OutOfMemoryError, so on that VM the practical maximum is around Integer.MAX_VALUE - 5 (the exact headroom is VM-dependent).

Sorting 1 billion integers with small physical memory

I want to sort 1 billion integers and my system has just 1 GB of RAM. What would be the fastest and most efficient way to sort them?
Say the input is a text file with one integer per line.
We are using a Java program to sort.
I have specified the RAM because we cannot hold all the input integers in memory at once.
Update: Integers are 7 digit numbers.
Integers are 7 digit numbers.
So there are only 10 million possible values.
You have 1GB of RAM. Make an array of counters, one for each possible value.
Read through the file once, count up the counters.
When done, output the numbers according to the final counter values.
Every number can occur at most 1 billion times, so a 32-bit counter is enough. That means a 10M x 4 bytes = 40 MB array.
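A counting-sort sketch along those lines, assuming one integer per line in the range 0..9,999,999 and hypothetical input/output file arguments:

import java.io.*;

// Counting-sort sketch for the "7-digit numbers" case described above.
public class CountingSortFile {
    public static void main(String[] args) throws IOException {
        int[] counts = new int[10_000_000];          // 10M x 4 bytes = 40 MB

        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                counts[Integer.parseInt(line.trim())]++;   // one counter per possible value
            }
        }

        try (PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(args[1])))) {
            for (int value = 0; value < counts.length; value++) {
                for (int c = 0; c < counts[value]; c++) {
                    out.println(value);              // emit each value as often as it occurred
                }
            }
        }
    }
}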
The simplest thing to do is break the input into smaller files that can fit in memory and sort each, and then merge the results.
Guido van Rossum has a good description of doing this in Python; while it's obviously not the same language, the principle is the same.
You specified that you are sorting a billion 7-(decimal)-digit numbers.
If there were no duplicates, you could sort in memory with 10^7 bits using radix sort. Since you must have duplicates (10^7 is less than 10^9), you could implement radix sort using (say) an array of 10^7 8-bit counters, with a HashMap<Integer, Integer> to deal with the relatively few cases where a counter overflows. Or just an array of 10^7 32-bit counters.
Another more general approach (that works for any kind of value) is to split the file into N smaller subfiles, sort each subfile in memory, and then perform an N-way merge of the sorted subfiles.
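A simplified sketch of that general approach: split the input into sorted runs that fit in memory, then do an N-way merge of all runs with a priority queue. The run size, file handling and error handling here are simplifying assumptions:

import java.io.*;
import java.nio.file.*;
import java.util.*;

public class ExternalSort {
    static final int RUN_SIZE = 1_000_000;   // integers per in-memory run (tunable assumption)

    public static void main(String[] args) throws IOException {
        List<Path> runs = splitIntoSortedRuns(Paths.get(args[0]));
        mergeRuns(runs, Paths.get(args[1]));
    }

    // Pass 1: read RUN_SIZE integers at a time, sort them in memory,
    // and write each sorted run to a temporary file.
    static List<Path> splitIntoSortedRuns(Path input) throws IOException {
        List<Path> runs = new ArrayList<>();
        int[] buffer = new int[RUN_SIZE];
        try (BufferedReader in = Files.newBufferedReader(input)) {
            String line;
            int n = 0;
            while ((line = in.readLine()) != null) {
                buffer[n++] = Integer.parseInt(line.trim());
                if (n == RUN_SIZE) { runs.add(writeRun(buffer, n)); n = 0; }
            }
            if (n > 0) runs.add(writeRun(buffer, n));
        }
        return runs;
    }

    static Path writeRun(int[] buffer, int n) throws IOException {
        Arrays.sort(buffer, 0, n);
        Path run = Files.createTempFile("run", ".txt");
        try (BufferedWriter out = Files.newBufferedWriter(run)) {
            for (int i = 0; i < n; i++) { out.write(Integer.toString(buffer[i])); out.newLine(); }
        }
        return run;
    }

    // Pass 2: N-way merge. The priority queue holds the smallest unread
    // value of each run as {value, runIndex}.
    static void mergeRuns(List<Path> runs, Path output) throws IOException {
        Comparator<int[]> byValue = Comparator.comparingInt(e -> e[0]);
        PriorityQueue<int[]> heap = new PriorityQueue<>(byValue);
        List<BufferedReader> readers = new ArrayList<>();
        for (int i = 0; i < runs.size(); i++) {
            BufferedReader r = Files.newBufferedReader(runs.get(i));
            readers.add(r);
            String line = r.readLine();
            if (line != null) heap.add(new int[] { Integer.parseInt(line.trim()), i });
        }
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            while (!heap.isEmpty()) {
                int[] smallest = heap.poll();
                out.write(Integer.toString(smallest[0]));
                out.newLine();
                String next = readers.get(smallest[1]).readLine();
                if (next != null) heap.add(new int[] { Integer.parseInt(next.trim()), smallest[1] });
            }
        }
        for (BufferedReader r : readers) r.close();
    }
}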
Using a bit set covering all 4 billion possible int values occupies 512 MB. Just set a bit for every int value you see and write the values out in order (they come out naturally sorted).
This only works if you don't care about duplicates.
If counting duplicates matters, I would still consider either a memory-mapped file for counting, or a merge sort of sorted subsections of the data. (I believe the latter is the expected answer.)
I recently bought a PC with 24 GB for under £1K, so a few GB isn't that much unless you are limited by a hosted solution (or are using a mobile device).
Assuming every integer occurs exactly once, you can read the file and set a bit for every number you find. The bit array has to hold 10,000,000 bits, which is only about 1.25 MB of RAM and should be available. After you have read all the integers, you just go through the array and output the numbers whose bit is set.
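A sketch of this bit-array approach using java.util.BitSet, again assuming one 7-digit integer per line, no duplicates, and hypothetical input/output file arguments:

import java.io.*;
import java.util.BitSet;

// Bit-array sketch for the duplicate-free case: one bit per possible
// 7-digit value, i.e. 10^7 bits, roughly 1.25 MB.
public class BitSetSort {
    public static void main(String[] args) throws IOException {
        BitSet seen = new BitSet(10_000_000);

        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                seen.set(Integer.parseInt(line.trim()));   // mark the value as present
            }
        }

        try (PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(args[1])))) {
            // Iterating the set bits in ascending order yields sorted output.
            for (int v = seen.nextSetBit(0); v >= 0; v = seen.nextSetBit(v + 1)) {
                out.println(v);
            }
        }
    }
}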

Why does creating a big Java array consume so much memory?

Why does the following line
Object[] objects = new Object[10000000];
result in a lot of memory (~40M) being used by the JVM? Is there any way to know the internal workings of the VM when allocating arrays?
Well, that allocates enough space for 10000000 references, as well as a small amount of overhead for the array object itself.
The actual size will depend on the VM - but it's surely not surprising that it's taking up a fair amount of memory... I'd expect at least 40MB, and probably 80MB on a 64-bit VM, unless it's using compressed oops for arrays.
Of course, if you populate the array with that many distinct objects, that will take much, much more memory... but the array itself still needs space just for the references.
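A rough way to see this for yourself; the numbers are VM-dependent (compressed oops or not), so treat the output as an estimate:

public class ArrayMemoryDemo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        Object[] objects = new Object[10_000_000];   // references only, all null

        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("Approximate bytes used: " + (after - before));
        // Expect roughly 40 MB with 4-byte compressed oops,
        // or roughly 80 MB with plain 8-byte references.

        // Keep a live use of the array so it isn't collected before we measure.
        System.out.println(objects.length);
    }
}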
What do you mean by "a lot of memory"? You are allocating 10,000,000 references, each taking 4 bytes (on a 32-bit machine); that is about 40 MB of memory.
You are creating ten million references to an object. A reference is at least 4 bytes; IIRC in Java it might be 8, but I'm unsure of that.
So with that one line you're creating 40 or 80 megabytes of data.
You are reserving space for ten million references. That is quite a bit.
It results in a lot of memory being used because it needs to allocate contiguous heap space for 10 million object references plus the array's own overhead.
To look into the internal workings of the JVM, you can check out its source code, as it is open source.
Your array has to hold 10 million object references, which on modern platforms are 64 bit (8 byte) pointers. Since it is allocated as a contiguous chunk of storage, it should take 80 million bytes. That's big in one sense, small compared to the likely amount of memory you have. Why does it bother you?
It creates an array with 10,000,000 reference pointers, all initialized to null.
What did you expect, given that you call this "a lot"?
Further reading
Size of object references in Java
One of the principal reasons arrays are used so widely is that their elements can be accessed in constant time. This means that the time taken to access a[i] is the same for each index i. This is because the address of a[i] can be determined arithmetically by adding a suitable offset to the address of the head of the array. The reason is that space for the contents of an array is allocated as a contiguous block of memory.
According to this site, the memory usage for an array is a 12-byte header plus 4 bytes per element. If you declare an empty array of Object holding 10M elements, you have about 40 MB of memory used from the start. If you then actually fill that array with 10M objects, the size increases quite rapidly.
From this site, and I just tested it on my 64-bit machine, the size of a plain Object is about 31 bytes, so an array of 10M Objects is about 12 bytes + (4 + 31 bytes) * 10M = 350,000,012 bytes (about 334 MiB).
If your array is holding other type of objects, then the size will be even larger.
I would suggest you use some kind of random access file(s) to hold your data if you have to keep so much data inside your program. Or even use a database such as Apache Derby, which will also enable you to sort and filter your data, etc.
I may be behind the times but I understood from the book Practical Java that Vectors are more efficient and faster than Arrays. Is it possible to use a Vector instead of an array?
