2D array: preferred way to access items - Java

So here I am tonight with a question that came to mind:
What is your favourite way to access the items of an m x n matrix?
There is the normal way, where you use one index for the rows
and another index for the columns: matrix[i][j].
And there's another way, where your matrix is a vector of length m*n
and you access the items using [i*n+j] as the index.
Tell me which method you prefer. Are there any other methods
that would work for specific cases?
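To make the two options concrete, here is a minimal Java sketch (the class name, the helper offset() and the sizes are made up for illustration). It fills a regular int[m][n] and a flat int[m*n] with the same values and reads the same element back from both:

```java
class FlatIndexDemo {
    // Row-major offset of element (i, j) in a matrix with n columns.
    static int offset(int i, int j, int n) {
        return i * n + j;
    }

    public static void main(String[] args) {
        int m = 3, n = 4;                  // example sizes, chosen arbitrarily
        int[][] matrix = new int[m][n];    // m separate row arrays
        int[] flat = new int[m * n];       // one contiguous array

        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                matrix[i][j] = 10 * i + j;
                flat[offset(i, j, n)] = 10 * i + j;
            }
        }
        // Both layouts hold the same value at (2, 1).
        System.out.println(matrix[2][1] + " " + flat[offset(2, 1, n)]); // prints "21 21"
    }
}
```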

Let's say we have this piece of C(++) code:
int x = 3;
int y = 4;
arr2d[x][y] = 0xFF;
arr1d[x*10+y] = 0xFF;
Where:
unsigned char arr2d[10][10];
unsigned char arr1d[10*10];
And now let's look at the compiled version of it (assembly; using debugger):
As you can see there's absolutely no penalty or slowdown when accessing array elements no matter if you're using 2D arrays or not, since both of the methods are actually the same.

There are only two reasons to go for the one-dimensional array to represent n-dimensions I can think of:
Performance: The usual way to allocate n-dimensional arrays gives us n levels of arrays that are not necessarily allocated in one piece, which isn't great for spatial locality (and may also cost some additional memory accesses; in the worst case we need 1 additional read for each access). In C/C++ you can get around this (allocate the memory in one piece, then set up the correct pointers afterwards; just be really careful not to forget this when you delete it), and other languages (C#) can already do this out of the box. Also note that in a language with a stop-and-copy GC the locality argument matters less, since the objects will be allocated near each other anyhow. You still avoid the additional overhead for each single dimension, though, so you use your memory and cache a bit better.
For some algorithms it's nicer to just use a one dimensional array which may make the code shorter and slightly faster - that's probably the one thing that can be argued as subjective here.

I think that if you need a 2D array, it is because you want to access it as a 2D array, not as a 1D array.
Otherwise you can do a simple multiplication to treat it as a 1D array.

If I was to use a 2-D array, I would vote for matrix[i][j]. I think this is more readable. However, I might consider using Guava's Table class.
http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Table.html

I don't think that your "favourite" way, or the most aesthetically pleasing way is a good approach to take with this issue - underlying performance would be my main concern.
Storing a matrix as a contiguous array is often the most efficient way of doing matrix calculations. If you take a look at optimised BLAS (Basic Linear Algebra Subprograms) libraries, such as the Intel MKL, the AMD ACML, ATLAS, etc., contiguous matrix storage is used. When contiguous storage is used and contiguous data access patterns are exploited, higher performance can result from the improved locality of reference (i.e. cache performance) of the operations.
In some languages (e.g. C++) you could use operator overloading to achieve the data[i][j] style of indexing while doing the 1D array index mapping behind the scenes.
Hope this helps.
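Java has no operator overloading, but a small wrapper class gets close to the same idea. The following is only a sketch under my own assumptions: FlatMatrix, get, set and sum are hypothetical names, and the storage is row-major in a single double[]:

```java
class FlatMatrix {
    private final int rows;
    private final int cols;
    private final double[] data;   // row-major: element (i, j) lives at i * cols + j

    FlatMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    double get(int i, int j) { return data[index(i, j)]; }

    void set(int i, int j, double v) { data[index(i, j)] = v; }

    // Bounds-checks each coordinate; a bare i * cols + j would silently
    // accept an out-of-range j that still lands inside the array.
    private int index(int i, int j) {
        if (i < 0 || i >= rows || j < 0 || j >= cols) {
            throw new IndexOutOfBoundsException("(" + i + ", " + j + ")");
        }
        return i * cols + j;
    }

    // Sums all elements by walking the backing array front to back.
    double sum() {
        double total = 0;
        for (double v : data) total += v;
        return total;
    }
}
```

Client code then writes m.get(i, j) instead of data[i][j], and the storage layout can change later without touching the callers.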

Related

Implementing efficient data structure using Arrays only

As part of my programming course I was given an exercise to implement my own String collection. I was planning on using an ArrayList or a similar collection, but one of the constraints is that we are not allowed to use any Java API to implement it, so only arrays are allowed. I could have implemented this with plain arrays; however, efficiency is very important, as is the amount of data that this code will be tested with. It was suggested that I use hash tables or ordered trees, as they are more efficient than arrays. After doing some research I decided to go with hash tables because they seemed easy to understand and implement, but once I started writing code I realised it is not as straightforward as I thought.
So here are the problems I have come up with and would like some advice on what is the best approach to solve them again with efficiency in mind:
ACTUAL SIZE: If I understood it correctly hash tables are not ordered (indexed) so that means that there are going to be gaps in between items because hash function gives different indices. So how do I know when array is full and I need to resize it?
RESIZE: One of the difficulties is that I need to create a dynamic data structure using arrays. If I have an array String[100], once it gets full I will need to resize it by some amount (I decided to increase it by 100 each time). Once I do that, I need to change the positions of all existing values, since their hash keys will be different, as the key is calculated like this:
int position = "orange".hashCode() % currentArraySize;
So if I try to find a certain value its hash key will be different from what it was when array was smaller.
HASH FUNCTION: I was also wondering if built-in hashCode() method in String class is efficient and suitable for what I am trying to implement or is it better to create my own one.
DEALING WITH MULTIPLE OCCURRENCES: one of the requirements is to be able to add multiple words that are the same, because I need to be able to count how many times the word is stored in my collection. Since they are going to have the same hash code I was planning to add the next occurrence at the next index hoping that there will be a gap. I don't know if it is the best solution but here how I implemented it:
public int count(String word) {
    int count = 0;
    while (collection[(word.hashCode() % size) + count] != null && collection[(word.hashCode() % size) + count].equals(word))
        count++;
    return count;
}
Thank you in advance for your advice. Please ask if anything needs to be clarified.
P.S. The length of words is not fixed and varies greatly.
UPDATE: Thank you for your advice. I know I made a few stupid mistakes there; I will try to do better. I took all your suggestions and quickly came up with the following structure. It is not elegant, but I hope it is roughly what you meant. I had to make a few judgement calls, such as the bucket size: for now I use half the number of elements, but is there a way to calculate it, or some generally accepted value? Another uncertainty was by what factor to grow my array: should I multiply by some number n, or is adding a fixed number also acceptable? I was also wondering about general efficiency, because I am now creating instances of classes; but String is a class too, so I am guessing the difference in performance should not be too big?
ACTUAL SIZE: The built-in Java HashMap just resizes when the total number of elements exceeds the number of buckets multiplied by a number called the load factor, which is by default 0.75. It does not take into account how many buckets are actually full. You don't have to, either.
RESIZE: Yes, you'll have to rehash everything when the table is resized. The value returned by hashCode() stays the same, but the position derived from it (hash % size) has to be recomputed for every entry.
So if I try to find a certain value its hash key will be different from what it was when array was smaller.
Yup.
HASH FUNCTION: Yes, you should use the built in hashCode() function. It's good enough for basic purposes.
DEALING WITH MULTIPLE OCCURRENCES: This is complicated. One simple solution would just be to have the hash entry for a given string also keep count of how many occurrences of that string are present. That is, instead of keeping multiple copies of the same string in your hash table, keep an int along with each String counting its occurrences.
So how do I know when array is full and I need to resize it?
You keep track of the size, as HashMap does. When the size used > capacity * load factor, you grow the underlying array, either as a whole or in part.
int position = "orange".hashCode() % currentArraySize;
Some things to consider.
The % of a negative value is a negative value (or zero), and hashCode() can be negative.
Math.abs can return a negative value (for Integer.MIN_VALUE).
Using & with a bit mask is faster; however, you need a size which is a power of 2.
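A short Java sketch of the three points above. BucketIndex and its two methods are hypothetical names; Math.floorMod is a standard method (Java 8 and later), which may or may not count as allowed API in your exercise:

```java
class BucketIndex {
    // Works for any positive capacity; Math.floorMod never returns a negative
    // result when the divisor is positive.
    static int byFloorMod(int hash, int capacity) {
        return Math.floorMod(hash, capacity);
    }

    // Requires capacity to be a power of two; the mask keeps the low bits.
    static int byMask(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    public static void main(String[] args) {
        int h = -7;
        System.out.println(h % 16);                           // -7, unusable as an index
        System.out.println(byFloorMod(h, 16));                // 9
        System.out.println(byMask(h, 16));                    // 9
        System.out.println(Math.abs(Integer.MIN_VALUE) < 0);  // true
    }
}
```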
I was also wondering if built-in hashCode() method in String class is efficient and suitable for what I am trying to implement or is it better to create my own one.
The built-in hashCode is cached, so it is fast. However, it is not a great hash: it has poor randomness in the lower bits, and in the higher bits for short strings. You might want to implement your own hashing strategy, possibly a 64-bit one.
DEALING WITH MULTIPLE OCCURRENCES:
This is usually done with a counter for each key. This way you can have say 32767 duplicates (if you use short) or 2 billion (if you use int) duplicates of the same key/element.
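Putting the advice together, here is one possible arrays-only sketch: a counter per key, a power-of-two capacity with a bit mask, and a resize once the load factor 0.75 is exceeded. WordCounter and its method names are invented for illustration, and linear probing is my choice here, not something the answers prescribe:

```java
class WordCounter {
    private String[] keys = new String[16];   // capacity stays a power of two
    private int[] counts = new int[16];
    private int distinct = 0;                 // number of occupied slots

    // Linear probing: returns the slot holding word, or the first empty slot.
    private int slotFor(String word, String[] table) {
        int mask = table.length - 1;
        int i = word.hashCode() & mask;
        while (table[i] != null && !table[i].equals(word)) {
            i = (i + 1) & mask;
        }
        return i;
    }

    void add(String word) {
        int i = slotFor(word, keys);
        if (keys[i] == null) {
            keys[i] = word;
            distinct++;
        }
        counts[i]++;
        // Grow once occupied slots exceed capacity * 0.75 (i.e. 3/4).
        if (distinct * 4 > keys.length * 3) {
            resize();
        }
    }

    int count(String word) {
        int i = slotFor(word, keys);
        return keys[i] == null ? 0 : counts[i];
    }

    // Doubles the capacity and re-inserts every key at its new slot.
    private void resize() {
        String[] newKeys = new String[keys.length * 2];
        int[] newCounts = new int[newKeys.length];
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] != null) {
                int j = slotFor(keys[i], newKeys);
                newKeys[j] = keys[i];
                newCounts[j] = counts[i];
            }
        }
        keys = newKeys;
        counts = newCounts;
    }
}
```

Because the table is never allowed to fill up completely, the probe loop always reaches either the word or an empty slot.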

Appropriate data structure for storing large number of objects retrievable by sparse identifier

I guess I'm looking for a sparse array implementation, but I really need this to be efficient in terms of memory usage, and one peculiarity of my data that an implementation could take advantage of is that the indices are populated such that if the value for an index i is present, the indices i-1 and i+1 are also likely to have values present, and similarly if the value for i has no value present, i-1 and i+1 are likely to not have values present.
I'm working in Java, and I need the index type to be long rather than the more usual int, if this makes a difference. I have approximately 50 million objects that will need to be stored. I've looked into Trove4J's TLongObjectHashMap, unfortunately this will require around 1.6GB for the hash table alone, and I really need to improve on this.
Can anyone point me towards something that can optimize for long runs of sequentially allocated identifiers? Logarithmic performance of insert/get is acceptable to me, so perhaps something tree-based?
Btrees have quite small memory overhead, so I will try those.
Maybe you could use a database instead of an array? An in-memory embedded database like H2, for instance.

Speeding up code - 3D array

I'm trying to improve the speed of some code I've written. I was wondering how efficient accessing data from a 3d array of integers is?
I have an array
int cube[][][] = new int[10][10][10];
which I populate with values. I then access these values several thousand times.
I was wondering, seeing as all 3D arrays are theoretically stored in 1D arrays in memory, is there a way to turn my 3D array into a 1D one? For instance, I could have cube[0] referring to the old cube[0][0][0] and cube[1] referring to the old cube[0][0][1].
I'm not sure how to go about doing it. I'm sure it's possible but my brain is worn out.
Thanks
You can create the single-dimension array as follows:
int cube[] = new int[w * h * d];
And to access an element:
int value = cube[x * h * d + y * d + z];
But I doubt it will be much faster and you're losing some convenience and safety. Before deciding to go through with this change it might be a good idea to perform some benchmark tests on your data to see if you actually have a problem and whether the change gives a sufficiently large improvement to be worth the extra complexity.
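If you do want to try it, hiding the index arithmetic behind a small class keeps the call sites readable. This is only a sketch; Cube, index, get and set are hypothetical names, and the layout matches the formula above with z varying fastest:

```java
class Cube {
    final int w, h, d;
    final int[] cells;

    Cube(int w, int h, int d) {
        this.w = w;
        this.h = h;
        this.d = d;
        this.cells = new int[w * h * d];
    }

    // x in [0, w), y in [0, h), z in [0, d); z varies fastest in memory.
    int index(int x, int y, int z) {
        return (x * h + y) * d + z;   // same as x*h*d + y*d + z
    }

    int get(int x, int y, int z) { return cells[index(x, y, z)]; }

    void set(int x, int y, int z, int v) { cells[index(x, y, z)] = v; }
}
```

Note that nothing here checks the individual coordinates, so an out-of-range y can silently land in a neighbouring row; that is part of the safety you give up.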
That's exactly what Java is doing behind the scenes. A three dimensional array is simply an array of arrays of arrays. In theory you could separate the arrays into 10 two dimensional arrays or 100 one-dimensional arrays (and even into 1000 individual variables), but it would be unlikely to speed up your performance. Focus on optimizing your algorithm instead.
int cube[] = new int[ X*Y*Z ];
// i in [0, X), j in [0, Y), k in [0, Z)
cube[ i*Y*Z + j*Z + k ] = ...
But, as others already said: It's not expected to be faster (as the calculations have to be done anyway). Let Java do its stuff for reasons of error-avoidance.
Do not do it - Java handles all this for you. You can of course make it a 1D array and then do the calculations, but you will hardly beat the optimized JVM code which does the same in the background. Also - is this really causing a performance bottleneck according to a profiler? If not, you might be optimizing your code prematurely.
You could use a LinkedList and store a 2D array in each Node. That would be more efficient I believe.

in java, which is better - three arrays of booleans or 1 array of bytes?

I know the question sounds silly, but consider this: I have an array of ints (1..N) and a labelling algorithm. At any point the item each int represents is in one of three states. The current version holds these states in a byte array, where 0, 1 and 2 represent the three states. Alternatively, I could have three arrays of boolean, one for each state. Which is better (consumes less memory) depends on how the JVM (Sun's version) stores the arrays: is a boolean represented by 1 bit? Is there any other magic happening behind the scenes? (P.S. Don't start with all that "this is not the way OO/Java works" - I know, but here performance comes first. Plus the algorithm is simple and perfectly readable even in this form.)
Thanks a lot
Instead of two booleans or 1 int, just use a BitSet - http://java.sun.com/j2se/1.4.2/docs/api/java/util/BitSet.html
You can then have two bits per label/state. And BitSet being a standard java class, you are likely to get good performance.
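A rough sketch of the two-bits-per-state idea, in case it helps. PackedStates is a made-up name; BitSet.set(int, boolean) and BitSet.get(int) are the standard java.util.BitSet methods:

```java
import java.util.BitSet;

class PackedStates {
    private final BitSet bits;

    PackedStates(int n) {
        bits = new BitSet(2 * n);   // two bits per element
    }

    // state must be 0, 1 or 2 (3 would also fit in two bits).
    void set(int i, int state) {
        bits.set(2 * i, (state & 1) != 0);
        bits.set(2 * i + 1, (state & 2) != 0);
    }

    int get(int i) {
        return (bits.get(2 * i) ? 1 : 0) | (bits.get(2 * i + 1) ? 2 : 0);
    }
}
```

Every read and write now costs two bit operations instead of one array access, so this trades some speed for memory; whether that pays off is something to measure.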
Theoretically, with 3 boolean arrays you'll need to do:
firstState[n] = false;
secondState[n] = true;
thirdState[n] = false;
every time you want to change the n-th element's state. That is 3 index operations and 3 assignments.
With 1 byte array you'll need:
elements[n] = 1;
It's more readable and does a third of the work (one store instead of three). One more advantage of this solution is that you can easily add as many new states as you want (whereas with boolean arrays you'd need to introduce new arrays).
But I don't think you'll ever see the performance difference.
UPD: actually I'd do it the more Java way (even though you're not looking for the easy way) and use an array of enums. This makes the code much clearer and gives you some flexibility (maybe in the future you'll decide that OOP is not such a bad thing):
enum ElementState {
FIRST, SECOND, THIRD;
}
ElementState[] elementStates = new ElementState[N];
...
elementStates[i] = ElementState.FIRST;
The JVM second edition spec (http://java.sun.com/docs/books/jvms/second_edition/html/Overview.doc.html) specifies that boolean arrays are encoded as (0,1), but doesn't specify the type used. So the particular JVM may or may not use bit - it could use int.
However, if performance is paramount, using a single byte would in any case seem to be your best option anyway.
EDIT: I incorrectly said that boolean arrays are stored as bit arrays - this is possible but implementation specific.
If you want a guaranteed minimum you could use three java.util.BitSets. These will only use one bit per flag (though you will have the extra object overhead, that may outweigh the benefits if the number of flags is small.) I would say if you have a large number of objects BitSet may be a better alternative, otherwise an array of byte constants or enums will lead to more readable code (and the extra storage shouldn't be a real concern.)
The array of bytes is much better!
A boolean typically takes 1 byte per array element (that is what Sun's JVM does, although the spec doesn't require it)! So with three arrays you use 3 bytes for every item, and you can do this with only 1 byte (in theory you can shrink it further, down to bits; see the other posts).
With a byte array, you can simply set the element to the value you want. With three arrays you have to change the value in every array!
While developing your application, you may find you need an extra state. That means creating yet another array, and then changing 4 values on every update (see the second point).
So, I hope we persuaded you!

Do 2D arrays use more resources than 1D arrays in Java?

For example, would a full int[50][8] use more resources (RAM and CPU) than 8 full int[50] arrays?
In the first case you have one array object pointing to fifty array objects holding 8 int's.
So 1 + 50 array objects + fifty pointers in the first array object.
In the second case you have one array object pointing to 8 array objects holding 50 int's.
So 1 + 8 array objects + eight pointers in the first array object. Holding the int's is a wash.
There is not a good way to evaluate CPU usage for this.
There appear to be three things to compare here.
new int[50][8]
new int[8][50]
new int[400]
Now, I get this confused, but the way to remember is to think of new int[50][] which is valid.
So new int[50][8] is an array of 50 arrays of size 8 (51 objects). new int[8][50] is an array of 8 arrays of size 50 (9 objects). 9 objects will have a lower overhead than 51. new int[400] is just one object.
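The object counts can be checked directly. In this small sketch (ArrayShapes and objectCount are names I made up) the helper counts the outer array plus every row that has actually been allocated:

```java
class ArrayShapes {
    // Counts the array objects that make up a two-level int array:
    // one outer array plus one object per non-null row.
    static int objectCount(int[][] a) {
        int count = 1;
        for (int[] row : a) {
            if (row != null) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] partial = new int[50][];   // outer array only; rows are null
        partial[0] = new int[8];           // rows are allocated separately
        System.out.println(objectCount(partial));          // 2
        System.out.println(objectCount(new int[50][8]));   // 51
        System.out.println(objectCount(new int[8][50]));   // 9
    }
}
```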
However, at this size it probably doesn't make any measurable difference to the performance of your program. You might want to encapsulate the array(s) within an object that will allow you to change the implementation and provide a more natural interface to client code.
One additional usage point (it came from a reference I unfortunately can't find now, but it's fairly commonsensical):
The authors of the paper were testing various ways of compressing sparse arrays into multidimensional arrays. One thing they noticed is that the order in which you iterate makes a difference in terms of speed.
The idea was that if you have int[i][j] it was faster to do
for (i) {
for (j)
than to do
for (j) {
for (i)
because in the first instance you're iterating through elements stored contiguously.
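Spelled out in Java, the two loop orders look like this. It is just a sketch (LoopOrder and both method names are hypothetical); the two methods return the same sum and differ only in how they walk memory, and any speed difference is something you would have to measure on your own data:

```java
class LoopOrder {
    // Row by row: the inner loop walks one int[] from start to end.
    static long sumRowMajor(int[][] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                sum += a[i][j];
            }
        }
        return sum;
    }

    // Column by column: the inner loop touches a different row array on
    // every step. Assumes a rectangular array with at least one row.
    static long sumColumnMajor(int[][] a) {
        long sum = 0;
        for (int j = 0; j < a[0].length; j++) {
            for (int i = 0; i < a.length; i++) {
                sum += a[i][j];
            }
        }
        return sum;
    }
}
```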
You could save a tiny amount of memory by using an int[] myInt = new int[400] array and manually accessing the int at position (x,y) with myInt[x+y*50].
That would save you 50 32-bit pieces of memory. Accessing it that way will maybe (who knows exactly what the HotSpot compiler does with this) take one more instruction for the multiplication.
That kind of micro-optimisation will most likely not make your app perform better, and it will decrease readability.
I suggest writing a small performance test for this with very large arrays to see the actual difference. In reality I don't think this would make the slightest difference.
int[50][8] is 50 arrays of length 8
int[8][50] is 8 arrays of length 50
int[400] is one array of length 400.
Each array has an overhead of about 16 bytes.
However, for the sizes you have here, it really doesn't matter. You are not going to be saving much either way.
