Converting to a column oriented array in Java

Converting to a column oriented array in Java - java

Although I have Java in the title, this could be for any OO language.
I'd like to know a few new ideas to improve the performance of something I'm trying to do.
I have a method that is constantly receiving an Object[] array. I need to split the Objects in this array through multiple arrays (List or something), so that I have an independent list for each column of all arrays the method receives.
Example:
List<List<Object>> column-oriented = new ArrayList<ArrayList<Object>>();
public void newObject(Object[] obj) {
for(int i = 0; i < obj.length; i++) {
column-oriented.get(i).add(obj[i]);
}
}
Note: For simplicity I've omitted the initialization of objects and stuff.
The code I've shown above is slow of course. I've already tried a few other things, but would like to hear some new ideas.
How would you do this knowing it's very performance sensitive?
EDIT:
I've tested a few things and found that:
Instead of using ArrayList (or any other Collection), I wrapped an Object[] array in another object to store individual columns. If this array reaches its capacity, I create another array with twice de size and copy the contents from one to another using System.copyArray. Surprisingly (at least for me) this is faster that using ArrayList to store the inner columns...

The answer depends on the data and usage profile. How much data do you have in such collections? What is proportions of reads/writes (adding objects array)? This affects what structure for inner list is better and many other possible optimizations.
The fastest way to copy data is avoid copying at all. If you know that obj array is not modified further by the caller code (this is important condition), one of possible tricks is to implement you custom List class to use as inner list. Internally you will store shared List<Object[]>. Each call we just add new array to that list. Custom inner list class will know which column it represents (let it be n) and when it is asked to give item at position m, it will transpose m and n and query internal structure to get internalArray.get(m)[n]. This implementation is unsafe because of limitation on the caller that is easy to forget about but might be faster under some conditions (however, this might be slower under other).

I would try using LinkedList for the inner list, because it should have better performance for insertions. Maybe wrappping Object arra into collection and using addAll might help as well.

ArrayList may be slow, due to copying of arrays (It uses a similar approach as your self-written collection).
As an alternate solution you could try to simply store the Rows at first and create columns when neccessary. This way, copying of the internal arrays at the list is reduced to a minimum.
Example:
//Notice: You can use a LinkedList for rows, as no index based access is used.
List<Object[]> rows =...
List<List<Object>> columns;
public void processColumns() {
columns = new ArrayList<List<Object>>();
for(Object[] aRow : rows){
while (aRow.size() > columns.size()){
//This ensures that the ArrayList is big enough, so no copying is necessary
List<Object> newColumn = new ArrayList<Object>(rows.size())
columns.add(newColumn);
}
for (int i = 0; i < aRow.length; i++){
columns.get(i).add(aRow[i]);
}
}
}
Depending on the number of columns, it's still possible that the outer list is copying arrays internally, but normal tables contains far more rows than columns, so it should be a small array only.

Use a LinkedList for implementing the column lists. It's grows linearly with the data and is O(1). (If you use ArrayList it has to resize the internal array from time to time).
After collecting the values you can convert that linked lists to arrays. If N is the number of rows you will pass from holding 3*N refs for each list (each LInkedList has prevRef/nextRef/itemRef) to only N refs.
It would be nice to have an array for holding the different column lists, but of course, it's not a big improvement and you can do it only if you know the column count in advance.
Hope it helps!
Edit tests and theory indicate that ArrayList is better in amortized cost, it is, the total cost divided by the number of items processed... so don't follow my 'advice' :)

Related

Are there better ways to make an ArrayList bigger?

I am trying to use an ArrayList to store and retrieve items by an index value. My code is similar to this:
ArrayList<Object> items = new ArrayList<>();
public void store (int index, Object item)
{
while(items.size() < index) items.add(null);
items.set(index, item);
}
The loop seems ugly and I would like to use items.setMinimumSize(index + 1) but it does not exist. I could use items.addAll(Arrays.asList(new Object[index - items.size()])) but that allocates memory and seems overly complex as a way to just make an array bigger. Most of the time the loop will iterate zero times, so I prefer simplicity over speed.
The index values probably won't exceed 200, so I could use:
Object[] items = new Object[200];
public void store (int index, Object item)
{
items[index] = item;
}
but this would break if it ever needs over 200 values.
Are these really my only options? It seems that I am forced into something more complex than it should be.

I would consider using a Map instead of a List construct. Why not this :
//depending on your requirements, you might want to use another Map
//implementation, just read through the docs...
Map<Integer, Object> items = new HashMap<>();
public void store (int index, Object item)
{
items.put(index, item);
}
This way you can avoid that senseless for loop.

The problem is, what you want isn't really an arraylist. It feels like what you want is a Map<Integer, T> instead of an ArrayList<T>, as you clearly want to map an index to a value; you want this operation:
Add a value to this list such that list.get(250) returns it.
And that is possible with arraylist, but not really what it is for, and when you use things in a way that wasn't intended, what usually ends up happening is that you write a bunch of weird code and go: "really? Is this right?" - and so it is here.
There's nothing particularly wrong with doing what you're doing, given that you said the indices aren't going to go much beyond 200, but, generally, I advise not creating semantic noise by using things for what they aren't. If you must, create an entirely new type that encapsulates exactly this notion of a sparse list. If you must, implement it by using an arraylist under the hood, that'd be fine.
Alternatively, use something that already exists.
a map
From the core library, why not.. new TreeMap<Integer, T>()? treemaps keep themselves sorted (so if you loop through its keys, you get them in order; if you add 5, then 200, then 20, you get the keys in order '5, 20, 200' as you'd expect. The performance is O(1) or O(log n), but with a higher runup (this is extremely unlikely to matter one iota if you have a collection of ~200 items in it. Wake me up when you add a million, then performance might be even measurable, let alone noticable) - and you can toss a 'key' of a few million at it if you want, no problem. The 'worst case scenario' is far better here, you basically cannot cause this to be a problem, whereas with a sparse, array-backed list, if I tossed an index of 3 billion at it, you would then have a few GB worth of blank memory allocated; you'd definitely notice that!
A sparse list
java itself doesn't have sparse lists, but other libraries do. Search the web for something you like, add it as a dependency, and keep on going.

The loop seems ugly and I would like to use items.setMinimumSize(index + 1) but it does not exist.
A List contains a sequence of references, without gaps (but possibly with null elements). Indexes into a list correlate with that sequence, and the list size is defined as the number of elements it contains. You cannot manipulate elements with indexes greater than or equal to the list's current size.
The size is not to be confused with the capacity of an ArrayList, which is the number of elements it presently is able to accommodate without acquiring additional storage. To a first approximation, you can and should ignore ArrayList capacity. Working with that makes your code specific to ArrayList, rather than general to Lists, and it's mostly an issue of optimization.
Thus, if you want to increase the size of an ArrayList (or most other kinds of List) so as to ensure that a certain index is valid for it, then the only alternative is to add elements to make it at least that large. But you don't need to add them one at a time in a loop. I actually like your
items.addAll(Arrays.asList(new Object[index - items.size()]))
, but you need one more element. The size needs to be at least index + 1 in order to set() the element at index index.
Alternatively, you could use
items.addAll(Collections.nCopies(1 + index - items.size(), null));
That might even be cheaper, too.

Fixed number of generic objects which can be iterated over

I have a class which always holds four objects:
class Foo<E> {
Cow<E> a, b, c, d;
}
I want to be able to iterate over them, so ideally I'd like to use an array:
class Foo<E> {
Cow<E>[] cows = new Cow<E>[4]; // won't work, can't create generic array
}
I don't want to use a list or a set since I want there to always be 4 Cow objects. What's the best solution for me?

If you want to preserve the genericity, you will have to reimplement something similar to a list and I don't think it is worth it.
You said:
The first is that you can add and remove elements to and from a list.
Well you can create an unmodifiable list:
List<E> list = Collections.unmodifiableList(Arrays.asList(a, b, c, d));
The second is that I'm creating a quadtree data structure and using a list wouldn't be too good for performance. Quadtrees have a lot of quadrants and using lists would decrease performance significantly.
First you can initialise the list to the right size:
List<E> list = new ArrayList<>(4);
Once you have done that, the list will only use a little bit more memory than an array (probably 8 bytes: 4 byte for the backing array reference and another 4 byte for the size).
And in terms of performance an ArrayList performs almost as good as an array.
Bottom line: I would start by using a list and measure the performance. If it is not good enough AND it is due to using a list instead of an array, then you will have to adapt your design - but I doubt that this will be your main issue.

Use a generic ArrayList and simply have methods to insert values into your object, and do checks inside those methods, to make sure you don't end up having more than 4 Cow objects.

I will suggest creating a bounded list. Java does not have an inbuilt one however you can create a custom one using Google collections or use the one in Apache collections. See Is there a bounded non-blocking Collection in Java?

Use Collection instead of array:
List<Cow<E>> cows = new ArrayList<>(); // in Java 7
Or
List<Cow<E>> cows = new ArrayList<Cow<E>>(); //Java 6 and below
More information will show why it is IMPOSSIBLE to have arrays whit generics. You can see here

Cow<E>[] cows = (Cow<E>[])new Cow[4];
or
Cow<E>[] cows = (Cow<E>[])new Cow<?>[4];

Performance primitive Array vs ArrayList

I want to know if there is a difference in performance if I use a primitive array and then rebuild it to add new elements like this:
AnyClass[] elements = new AnyClass[0];
public void addElement(AnyClass e) {
AnyClass[] temp = new AnyClass[elements.length + 1];
for (int i = 0; i < elements.length; i++) {
temp[i] = elements[i];
}
temp[elements.length] = e;
elements = temp;
}
or if I just use an ArrayList and add the elements.
I am not certain that is why I ask, is it the same speed because an ArrayList is build in the same way as I did it with the primitive array or is there really a difference and a primitive array is always faster even if I rebuild it everytime I add an element?

ArrayLists work in a similar way but instead of rebuilding every time they double there capacity every time the limit is reached. so if you are constantly adding to it ArrayLists will be faster because recreating the array is fairly slow.
So your implementation could use less memory if you are not adding to it often but as far as speed goes it will be slower most of the time.

In a nutshell, stick with ArrayList. It is:
widely understood;
well tested;
will probably be more performant that your own implementation (for example, ArrayList.add() is guaranteed to be amortised constant-time, which your method is not).

When an ArrayList resizes it doubles itself, so that you are not wasting time resizing each time. Amortized, that means that it doesn't take any time to resize. That's why you shouldn't waste time recreating the wheel. The people who created the first one already learned how to make one more efficient and know more about the platform than you do.

There is no performance issue in both Arrays and ArrayList.
Arrays and ArrayList are index based so both will work in same way.
If you required the dynamic Array you can use arrayList.
If Array size is static then go with Array.

Your implementation is likely to lose clearly to Java's ArrayList in terms of speed. One particularly expensive thing you're doing is reallocating the array every time you want to add elements, while Java's ArrayList tries to optimize by having some "buffer" before having to reallocate.

ArrayList will also use internally Array Only , so this is true Array will be faster than ArrayList. While writing high performance code always use Array. For the same reason Array is back bone for most of the collections.
You must go through JDK implementation of Collections.
We use ArrayList when we are developing some application and we are not concerned about such minor performance issues and we do trade off because we get already written API to put , get , resize etc.

Context is very important: I mean if you are constantly inserting new items/elements ArrayList will certainly be faster than Array. On the other hand if you just want to access an element at a known position-say arrayItems[8], Array is faster than ArrayList.get(8); Sine there is overhead of get() function calls and other steps and checks.

What's more efficient and compact: A huge set of linkedlist variables or a two-dimensional arraylist containing each of these?

I want to create a large matrix (n by n) where each element corresponds to a LinkedList (of certain objects).
I can either
Create the n*n individual linked lists and name them with the help of eval() within a loop that iterates through both dimensions (or something similar), so that in the end I'll have LinkedList_1_1, LinkedList_1_2 etc. Each one has a unique variable name. Basically, skipping the matrix altogether.
Create an ArrayList of ArrayLists and then push into each element a linked list.
Please recommend me a method if I want to conserve time & space, and ease-of-access in my later code, when I want to reference individual LinkedLists. Ease-of-acess will be poor with Method 1, as I'll have to use eval whenever I want to access a particular linked list.
My gut-feeling tells me Method 2 is the best approach, but how exactly do I form my initializations?

As you know the sizes to start with, why don't you just use an array? Unfortunately Java generics prevents the array element itself from being a concrete generic type, but you can use a wildcard:
LinkedList<?>[][] lists = new LinkedList<?>[n][n];
Or slightly more efficient in memory, just a single array:
LinkedList<?>[] lists = new LinkedList<?>[n * n];
// Then for access...
lists[y * n + x] = ...;
Then you'd need to cast on each access - using #SuppressWarnings given that you know it will always work (assuming you encapsulate it appropriately). I'd put that in a single place:
#SuppressWarnings("unchecked")
private LinkedList<Foo> getList(int x, int y) {
if (lists[y][x] == null) {
lists[y][x] = new LinkedList<Foo>();
}
// Cast won't actually have any effect at execution time. It's
// just to tell the compiler we know what we're doing.
return (LinkedList<Foo>) lists[y][x];
}
Of course in both cases you'd then need to populate the arrays with empty linked lists if you needed to. (If several of the linked lists never end up having any nodes, you may wish to consider only populating them lazily.)
I would certainly not generate a class with hundreds of variables. It would make programmatic access to the lists very painful, and basically be a bad idea in any number of ways.

Is having a List and an array with the exact same element bad programming style?

I have a short (12 elements) LinkedList of short strings (7 characters each).
I need to search through this list both by index and by content (i.e. search a particular string and get its index in the list).
I thought about making a copy of the LinkedList as an array at runtime (just once, since the LinkedList is a static member of my class), so I can access the strings by index more quickly.
Given that the LinkedList is never changed at runtime, is this bad programming practice or is this an idea worth considering?
IMPORTANT EDIT: the array can't be sorted, I need it to map specific strings to specific numbers.

Instead of a LinkedList just use an ArrayList - you can look up fast based on an index, and you can easily search through it.

What problem are you trying to solve here? Are you worried that accessing elements by index is too slow in LinkedList? If so, you might want to use ArrayList instead.
But for a 12-element list, the improvement probably won't make any measurable difference. Unless this is something you're accessing several hundred times a second, I wouldn't waste any time on trying to optimize it.

Another idea you might want to consider is using a Map:
Map someMap<int, String>
It's easy to search for values in a map by both key and value.
Might also not be the best idea, but at least better then creating 2 lists with the same values =)

The question is, why are you using a LinkedList in the first place?
The main reason to choose a LinkedList over an array list is if you need to make a number of insertions/deletions in the middle of the List or if you don't know the exact size of the list and don't want to make a number of reallocations of the Array.
The main reason to choose an ArrayList over a LinkedList is if you need to have random access to each of the elements.
(There are other advantages/disadvantages to each, but those are probably the main ones that come to mind)
It looks like you do need random access to the list, so why did you pick a LinkedList over an ArrayList

I would say it depends on your intention and the effect it really has.
With only 12 elements it seems unlikely to me that converting the LinkedList to an array has an impact on performance. So it could make the code unnecessarily (slightly) harder to understand for other people. From this point of view it could be considered a non optimal programming style.
If the number of elements increases, i.g. you're need to pre-process some data which would require a dynamic data structure. And for later use an indexed lookup performs much better, this wouldn't be a bad programming style, rather a required improvement.

Given that you know the exact amount of elements you are going to be using why not use an array from the start?
string[] myArray = new string[7];
// Add your data
Sort(myArray); // Sort your strings
int value = binarySearch(myArray, "key"); // Search your array
Or since you cant sort the array you could just make a linear search method
public int Search(string[] array, string key)
{
for(int i = 0; i < array.legnth(); i++)
{
if(array[i] == key)
return i;
}
return -1;
}
Edit: After re-loading the page and reading peoples responses I agree that ArrayList should be exactly what you need.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.