I have a 4-dimensional array that stores String values, which are used to create a map that is then displayed on screen with paintComponent. I have read many articles saying that using huge arrays is very inefficient (especially since the array dimensions are 16x16x3x3). I was wondering whether there is any way to store the String values (I use them as ID values) differently to save memory or reduce fetch time. If you have any ideas or methods, I would appreciate it. Thanks!
Well, if your matrix is full, that is, every element contains data, then I believe an array is optimally efficient. But if your matrix is sparse, you could look into linked data structures instead.
The first thing I would do is stop using strings as IDs and use ints instead. It'll reduce the size of your structure a lot.
Also, that array really isn't that big, I wouldn't worry about efficiency if that's the only data structure you have. It's only 2304 elements large.
First off, 16*16*3*3 = 2304 - quite modest really. At this size I'd be more worried about the confusion likely to be caused by a 4D array than the size it is taking!
As others have said, if it is fully populated, arrays are OK. If it has gaps, an ArrayList or similar would be better.
If the Strings are just IDs, why not store an enum (or even Integers) instead of a string?
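As a rough sketch of the enum idea, assuming the IDs come from a small fixed set (the names TileId, TileMap and the enum constants below are made up for illustration, since the question does not list the real IDs):

```java
import java.util.Arrays;

// Hypothetical tile IDs; the real set of IDs is not given in the question.
enum TileId { EMPTY, GRASS, WATER, WALL }

// Same 16x16x3x3 shape as in the question, but holding enum constants instead of Strings.
final class TileMap {
    private final TileId[][][][] tiles = new TileId[16][16][3][3];

    TileMap() {
        // Give every cell a default so callers never see null.
        for (TileId[][][] plane : tiles) {
            for (TileId[][] block : plane) {
                for (TileId[] row : block) {
                    Arrays.fill(row, TileId.EMPTY);
                }
            }
        }
    }

    void set(int a, int b, int c, int d, TileId id) {
        tiles[a][b][c][d] = id;
    }

    TileId get(int a, int b, int c, int d) {
        return tiles[a][b][c][d];
    }
}
```

Each array slot still holds one reference, but there is only one object per enum constant, and comparisons can use == instead of equals().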
Keep in mind that the String values are separate from the array. The array itself takes the same memory space regardless of what string values it links to. Accessing a specific address in your array will take the same amount of time regardless of what type of object you have saved there, or what the value of that object is.
However, if you find that many of your string values represent exactly the same string, you can avoid having multiple copies of the same string by leveraging String.intern(). If you store the interned string, and you don't have any other references to the non-interned string, that frees the non-interned string up to be garbage-collected. Your array will then have multiple entries that point to the same memory space, rather than different memory addresses with equivalent string objects.
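A small sketch of what interning does, using a made-up ID value "grass" (the class name InternDemo is just for illustration):

```java
final class InternDemo {
    // Returns true when two distinct String objects with equal contents
    // map to the same canonical object after intern().
    static boolean sameObjectAfterIntern() {
        String a = new String("grass"); // new String(...) always creates a fresh object
        String b = new String("grass");
        boolean distinctBefore = (a != b);
        boolean sameAfter = (a.intern() == b.intern());
        return distinctBefore && sameAfter;
    }
}
```

Storing the result of intern() in the array, rather than the original reference, is what lets the duplicates be collected.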
See also:
Is it good practice to use java.lang.String.intern()?
http://www.codeinstructions.com/2009/01/busting-javalangstringintern-myths.html
Depending on the requirements of your IDs, you may also want to look into using a different data structure than strings. For example, while the array itself would be the same size, storing int values would avoid the need to allocate extra space for each individual entry.
Also, a 4-dimensional array may not be the best data structure for your needs in the first place. Can you describe why you've chosen this data structure for what you're trying to represent?
The Strings only take up the space of a reference in each array element. There could be a saving if the strings come from a very small set of values. A more important question is: are your 4-dimensional arrays sparse or mostly filled? If you have very few values actually specified, then you might get a big saving by replacing the 4-d array with a Map from the indices to the String. Let me know if you want a code sample.
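Here is one way such a Map from indices to String might look. This is only a sketch: the class name SparseGrid4D and the row-major key encoding are my own assumptions, and it only pays off if most cells are empty.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sparse replacement for a String[16][16][3][3] array.
final class SparseGrid4D {
    private static final int D0 = 16, D1 = 16, D2 = 3, D3 = 3;
    private final Map<Integer, String> cells = new HashMap<>();

    // Flatten the four indices into one int key (row-major order).
    private static int key(int a, int b, int c, int d) {
        if (a < 0 || a >= D0 || b < 0 || b >= D1 || c < 0 || c >= D2 || d < 0 || d >= D3) {
            throw new IndexOutOfBoundsException("index outside 16x16x3x3");
        }
        return ((a * D1 + b) * D2 + c) * D3 + d;
    }

    void put(int a, int b, int c, int d, String id) {
        cells.put(key(a, b, c, d), id);
    }

    // Returns null for cells that were never set.
    String get(int a, int b, int c, int d) {
        return cells.get(key(a, b, c, d));
    }

    int storedCells() {
        return cells.size();
    }
}
```

For a fully populated 2304-element grid, the plain array is smaller and faster than this, because every map entry carries its own object overhead.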
Do you actually have a 4D array of 16x16x3x3 (i.e. 2k) string objects? That doesn't sound that big to me. An array is the most efficient way to store a collection of objects, in terms of memory. An ArrayList can be slightly less efficient (up to 50% wasted space).
The only other way I can think of is to store the Strings end-to-end in one giant String and then use substring() to get the bit you need from that, but you would still need to store the indexes somewhere.
Are you running out of memory? If so, check that the Strings in your array are the size you think they are: the backing array of a String instance in Java can be much larger than the string itself. If you call substring() on a 1 GB string, the returned string instance shares the 1 GB array of the first string (this was the behaviour before Java 7u6, which changed substring() to copy the characters), so it will keep that array from being GC'd longer than you might expect.
Related
I'm trying to create a byte array whose size is of type long. For example, think of it as:
long x = _________;
byte[] b = new byte[x];
Apparently you can only specify an int for the size of a byte array.
Before anyone asks why I would need a byte array so large, I'll say I need to encapsulate data of message formats that I am not writing, and one of these message types has a length of an unsigned int (long in Java).
Is there a way to create this byte array?
I am thinking if there's no way around it, I can create a byte array output stream and keep feeding it bytes, but I don't know if there's any restriction on a size of a byte array...
(It is probably a bit late for the OP, but it might still be useful for others)
Unfortunately, Java does not support arrays with more than 2^31−1 elements. The maximum consumption is 2 GiB of space for a byte[] array, or 16 GiB of space for a long[] array.
While it is probably not applicable in this case, if the array is going to be sparse, you might be able to get away with using an associative data structure like a Map to match each used offset to the appropriate value. In addition, Trove provides a more memory-efficient implementation for storing primitive values than the standard Java collections.
If the array is not sparse and you really, really do need the whole blob in memory, you will probably have to use a two-dimensional structure, e.g. with a Map matching each offset divided by 1024 to the proper 1024-byte array, with the offset modulo 1024 giving the position inside that array. This approach might be more memory-efficient even for sparse arrays, since adjacent filled cells can share the same Map entry.
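A minimal sketch of the Map-of-chunks idea, assuming 1024-byte chunks as in the text. The class name SparseByteStore is hypothetical, and unset bytes are assumed to read as 0:

```java
import java.util.HashMap;
import java.util.Map;

// Bytes addressed by a long offset, stored in 1024-byte chunks that are
// allocated only when something is written into them.
final class SparseByteStore {
    private static final int CHUNK = 1024;
    private final Map<Long, byte[]> chunks = new HashMap<>();

    void set(long offset, byte value) {
        checkOffset(offset);
        // offset / CHUNK selects the chunk, offset % CHUNK the position inside it.
        byte[] chunk = chunks.computeIfAbsent(offset / CHUNK, k -> new byte[CHUNK]);
        chunk[(int) (offset % CHUNK)] = value;
    }

    byte get(long offset) {
        checkOffset(offset);
        byte[] chunk = chunks.get(offset / CHUNK);
        return chunk == null ? 0 : chunk[(int) (offset % CHUNK)];
    }

    int allocatedChunks() {
        return chunks.size();
    }

    private static void checkOffset(long offset) {
        if (offset < 0) {
            throw new IndexOutOfBoundsException("negative offset: " + offset);
        }
    }
}
```

Two writes to neighbouring offsets land in the same chunk, which is why this can beat a per-byte Map even for sparse data.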
A byte[] with size of the maximum 32-bit signed integer would require 2GB of contiguous address space. You shouldn't try to create such an array. Otherwise, if the size is not really that large (and it's just a larger type), you could safely cast it to an int and use it to create the array.
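If you do go the cast route, it is probably worth checking the range first so that an oversized value fails loudly instead of being silently truncated. A small sketch (SafeAlloc and allocate are made-up names):

```java
final class SafeAlloc {
    // Allocates a byte[] for a length held in a long, refusing values
    // that cannot be represented as an int array size.
    static byte[] allocate(long size) {
        if (size < 0 || size > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("size does not fit in an int: " + size);
        }
        // Note: the JVM may still throw OutOfMemoryError for sizes close
        // to Integer.MAX_VALUE or larger than the available heap.
        return new byte[(int) size];
    }
}
```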
You should probably be using a stream to read your data in and another to write it out. If you are going to need access to data later on in the file, save it. If you need access to something you haven't run into yet, you need a two-pass system where you run through once and store the "stuff" you'll need for the second pass, then run through again.
Compilers work this way.
The only case for loading in the entire array at once is if you have to repeatedly randomly access many locations throughout the array. If this is the case, I suggest you load it into multiple byte arrays all stored in a single container class.
The container class would have an array of byte arrays, but from outside all the accesses would seem contiguous. You would just ask for byte 49874329128714391837 and your class would divide your Long by the size of each byte array to calculate which array to access, then use the remainder to determine the byte.
It could also have methods to store and retrieve "chunks" that could span byte-array boundaries, which would require creating a temporary copy. But the cost of creating a few temporary arrays would be more than made up for by the fact that you don't have a locked 2 GB space allocated, which I think could just destroy your performance.
Edit: ps. If you really need the random access and can't use streams then implementing a containing class is a Very Good Idea. It will let you change the implementation on the fly from a single byte array to a group of byte arrays to a file-based system without any change to the rest of your code.
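The container class described above could be sketched roughly like this. The name BigByteArray and the 1 MiB chunk size are arbitrary choices of mine, not anything prescribed:

```java
// A long-indexed byte container backed by several ordinary byte arrays.
final class BigByteArray {
    private static final int CHUNK_SIZE = 1 << 20; // 1 MiB per chunk (arbitrary)
    private final byte[][] chunks;
    private final long length;

    BigByteArray(long length) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length");
        }
        this.length = length;
        // Round up so a partial last chunk is still allocated.
        int count = Math.toIntExact((length + CHUNK_SIZE - 1) / CHUNK_SIZE);
        chunks = new byte[count][];
        for (int i = 0; i < count; i++) {
            long remaining = length - (long) i * CHUNK_SIZE;
            chunks[i] = new byte[(int) Math.min(CHUNK_SIZE, remaining)];
        }
    }

    byte get(long index) {
        check(index);
        // Quotient picks the chunk, remainder picks the byte inside it.
        return chunks[(int) (index / CHUNK_SIZE)][(int) (index % CHUNK_SIZE)];
    }

    void set(long index, byte value) {
        check(index);
        chunks[(int) (index / CHUNK_SIZE)][(int) (index % CHUNK_SIZE)] = value;
    }

    long length() {
        return length;
    }

    private void check(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + " outside 0.." + (length - 1));
        }
    }
}
```

Callers only ever see get, set and length, so the storage behind them could later be swapped for a file-based one without touching the rest of the code.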
It's not of immediate help, but creating arrays with larger sizes (via longs) is a proposed language change for Java 7. Check out the Project Coin proposals for more info.
One way to "store" the array is to write it to a file and then access it (if you need to access it like an array) using a RandomAccessFile. The API for that file uses a long as the index into the file instead of an int. It will be slower, but much less hard on memory.
This is when you can't extract what you need during the initial input scan.
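A rough sketch of the RandomAccessFile approach. The wrapper class FileBackedBytes is hypothetical; seek, readByte, writeByte and setLength are the standard RandomAccessFile methods:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Array-like access to bytes that live in a file rather than on the heap.
final class FileBackedBytes implements AutoCloseable {
    private final RandomAccessFile file;

    FileBackedBytes(File backing, long length) throws IOException {
        file = new RandomAccessFile(backing, "rw");
        // Grow (or truncate) the file to the requested size. The contents of a
        // newly extended region are not defined by the API, so write before reading.
        file.setLength(length);
    }

    byte get(long index) throws IOException {
        file.seek(index);       // the file pointer is a long, not an int
        return file.readByte(); // throws EOFException past the end of the file
    }

    void set(long index, byte value) throws IOException {
        file.seek(index);
        file.writeByte(value);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Every get and set here is a system call, so for heavy random access you would probably want some buffering on top.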
I have a String that I need to search for in a collection of Strings. I'll need to search for multiple representations of the required String (original representation, trimmed, UTF-8 encoded, non-ASCII characters encoded). The collection size will be in the order of thousands.
I'm trying to figure out what's the best representation to use for the collection in order to have the best performance:
ArrayList - iterate over the array and check if any of the elements match any of the Strings representations
HashMap - check if map contains any of my Strings representation
Any other?
Generally speaking, a HashMap (or any other hashtable-based data structure) is strongly preferred for lookups. The reason is simple: those data structures support lookup in constant time (independent of collection size).
But... in your scenario (single query for collection), you probably will not gain any performance improvements from using HashMap instead of ArrayList. Reasons:
Putting data inside a HashMap will take some time. Not a significant amount of time, but comparable to one full pass over the initial list.
Your collection is pretty small: iterating over 5000 elements is a matter of a couple of milliseconds (or faster?). Since you need to "search" only once, you will not save much time on that.
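To make the trade-off concrete, here is a sketch of both options. The class and method names are invented, and a HashSet is used instead of a HashMap because only membership matters here:

```java
import java.util.List;
import java.util.Set;

final class RepresentationLookup {
    // One pass over the collection, checking each element against the
    // (few) representations. No setup cost, so fine for a single query.
    static boolean containsAnyLinear(List<String> collection, List<String> representations) {
        for (String candidate : collection) {
            if (representations.contains(candidate)) {
                return true;
            }
        }
        return false;
    }

    // Constant-time lookup per representation, but the Set has to be built
    // first (for example with new HashSet<>(collection)), which costs about
    // one pass over the collection. Worth it when the index is reused.
    static boolean containsAnyHashed(Set<String> index, List<String> representations) {
        for (String representation : representations) {
            if (index.contains(representation)) {
                return true;
            }
        }
        return false;
    }
}
```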
Hi, I need a multidimensional array to store a large amount of numbers, but I am getting a heap space error. I have 4 GB of RAM.
double[][] array = new double[100000][100000];
I know it would need a lot of memory. Can anyone help me tackle this issue? Thanks for helping.
If the array is sparse (more empty array cells than filled ones), you could look at using a hash map instead.
For the hash map, use the index of the array as the key.
example:
{ 23: 'foo', 23945: 'bar' }
This will be much more memory-efficient!
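In Java, the hash map idea could look something like the sketch below. The class name SparseMatrix and the choice to treat unset cells as 0.0 are my assumptions, and it only helps if the matrix really is mostly empty:

```java
import java.util.HashMap;
import java.util.Map;

// Sparse stand-in for double[rows][cols]; only cells that were set use memory.
final class SparseMatrix {
    private final int rows;
    private final int cols;
    private final Map<Long, Double> values = new HashMap<>();

    SparseMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
    }

    // Combine row and column into one long key; the cast avoids int overflow.
    private long key(int row, int col) {
        if (row < 0 || row >= rows || col < 0 || col >= cols) {
            throw new IndexOutOfBoundsException("(" + row + ", " + col + ")");
        }
        return (long) row * cols + col;
    }

    void set(int row, int col, double value) {
        values.put(key(row, col), value);
    }

    // Cells that were never set read as 0.0, like a freshly created double array.
    double get(int row, int col) {
        return values.getOrDefault(key(row, col), 0.0);
    }

    int storedCells() {
        return values.size();
    }
}
```

Each stored cell costs far more than 8 bytes because of boxing and map entries, so this loses to a plain array once a large fraction of the cells are filled.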
If you absolutely need that much memory, you will need at minimum 100000*100000*8 B, or 80 GB of RAM (plus whatever overhead there is for the array organization itself). I will go as far as to say that you probably cannot work with that much data in RAM.
The only real way you'll be able to have an array that big is to write the parts of the array that aren't in use at a given moment in time out to disk. This will take some work to do though because you have to track where you are in the array to push and pull the array's data on the disk.
I need to store multiple data types (mostly int or String) inside a two-dimensional array. Using Object[][] does solve the problem, but is it a good way to do so?
How does the Object[][] array then reserve heap space? I mean, in accordance with which data type? Does it lead to any waste of resources?
I was trying to do something like this:
Object[][] dataToBeWritten={ {"Pami",34,45},
{"Ron","x",""},
{"spider","x",""}
};
Edit: Feel free to suggest better alternatives, if any exist.
See How to calculate the memory usage of a Java array and Memory usage of Java objects: general guide.
For example, let's consider a 10x10 int array. Firstly, the "outer" array has its 12-byte object header followed by space for the 10 elements. Those elements are object references to the 10 arrays making up the rows. That comes to 12+4*10=52 bytes, which must then be rounded up to the next multiple of 8, giving 56. Then, each of the 10 rows has its own 12-byte object header, 4*10=40 bytes for the actual row of ints, and again, 4 bytes of padding to bring the total for that row to a multiple of 8. So in total, that gives 11*56=616 bytes. That's a bit bigger than if you'd just counted on 10*10*4=400 bytes for the hundred "raw" ints themselves.
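The arithmetic above can be reproduced with a few lines of code. This is only a sketch under the same assumptions as the text (12-byte header, 4-byte references, 4-byte ints, 8-byte alignment); real JVMs may use different values, and the class name ArrayFootprint is made up:

```java
final class ArrayFootprint {
    private static final int HEADER_BYTES = 12;   // assumed, as in the text
    private static final int REFERENCE_BYTES = 4; // assumed, as in the text
    private static final int INT_BYTES = 4;

    // Round up to the next multiple of 8.
    static long alignTo8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Estimated footprint of new int[rows][cols] under the assumptions above.
    static long intMatrixBytes(int rows, int cols) {
        long outer = alignTo8(HEADER_BYTES + (long) REFERENCE_BYTES * rows);
        long eachRow = alignTo8(HEADER_BYTES + (long) INT_BYTES * cols);
        return outer + rows * eachRow;
    }
}
```

For the 10x10 case this gives 56 bytes for the outer array plus 10 rows of 56 bytes each, i.e. the 616 bytes worked out above.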
I think this is for HotSpot only, though. References to any object are, just like ints, 4 bytes each, regardless of the actual object, or of the reference being null. The space requirement for the objects themselves is a whole different story, though, as that space isn't reserved or anything like that at array creation.
All objects are stored by reference. So a reference to the heap memory is stored. Therefore the amount of memory allocated for an array is one sizeof ( reference ) per entry.
An array of Objects is basically an array of pointers. However, that's what you get with any array of non-primitive types in Java: an array of Objects, an array of Strings, and an array of Robots of equal length take up the exact same amount of space. Heap space for the actual objects isn't allocated until you initialize the objects.
Alternative:
Use proper classes. You are trying to take a dynamic approach in a statically typed language. The thing is that Object[] tells the reader of your code nothing about what they are reading. In fact, I can't even suggest a design for a class because I can't make sense of your example. What is {"Pami",34,45} and how is this supposed to be related to {"spider","x",""}?
So, supposing this information is something foo-like, you should create a class Foo and collect all that stuff in a Foo[] or a List<Foo>.
Remember: Not only comments store information about your code. The type system contains valuable information about what you're trying to accomplish. Object contains no such information.
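Purely as an illustration of the shape this could take: the field names below are invented, and treating "x" and "" as missing numbers is a guess on my part, since the question never says what the columns mean.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row type. The real meaning of the columns is unknown,
// so 'first' and 'second' are placeholder names.
final class Foo {
    private final String name;
    private final Integer first;  // null when no number is available (assumption)
    private final Integer second; // null when no number is available (assumption)

    Foo(String name, Integer first, Integer second) {
        this.name = name;
        this.first = first;
        this.second = second;
    }

    String name() { return name; }
    Integer first() { return first; }
    Integer second() { return second; }
}

final class FooDemo {
    // The rows from the question, rebuilt as typed objects.
    static List<Foo> sample() {
        List<Foo> rows = new ArrayList<>();
        rows.add(new Foo("Pami", 34, 45));
        rows.add(new Foo("Ron", null, null));
        rows.add(new Foo("spider", null, null));
        return rows;
    }
}
```

With real field names in place of first and second, the compiler and the reader both know what each value is supposed to be.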