Are there better ways to make an ArrayList bigger? - java

I am trying to use an ArrayList to store and retrieve items by an index value. My code is similar to this:
ArrayList<Object> items = new ArrayList<>();
public void store (int index, Object item)
{
while(items.size() < index) items.add(null);
items.set(index, item);
}
The loop seems ugly and I would like to use items.setMinimumSize(index + 1) but it does not exist. I could use items.addAll(Arrays.asList(new Object[index - items.size()])) but that allocates memory and seems overly complex as a way to just make an array bigger. Most of the time the loop will iterate zero times, so I prefer simplicity over speed.
The index values probably won't exceed 200, so I could use:
Object[] items = new Object[200];
public void store (int index, Object item)
{
items[index] = item;
}
but this would break if it ever needs over 200 values.
Are these really my only options? It seems that I am forced into something more complex than it should be.

I would consider using a Map instead of a List construct. Why not this :
//depending on your requirements, you might want to use another Map
//implementation, just read through the docs...
Map<Integer, Object> items = new HashMap<>();
public void store (int index, Object item)
{
items.put(index, item);
}
This way you can avoid that senseless for loop.

The problem is, what you want isn't really an arraylist. It feels like what you want is a Map<Integer, T> instead of an ArrayList<T>, as you clearly want to map an index to a value; you want this operation:
Add a value to this list such that list.get(250) returns it.
And that is possible with arraylist, but not really what it is for, and when you use things in a way that wasn't intended, what usually ends up happening is that you write a bunch of weird code and go: "really? Is this right?" - and so it is here.
There's nothing particularly wrong with doing what you're doing, given that you said the indices aren't going to go much beyond 200, but, generally, I advise not creating semantic noise by using things for what they aren't. If you must, create an entirely new type that encapsulates exactly this notion of a sparse list. If you must, implement it by using an arraylist under the hood, that'd be fine.
Alternatively, use something that already exists.
a map
From the core library, why not.. new TreeMap<Integer, T>()? treemaps keep themselves sorted (so if you loop through its keys, you get them in order; if you add 5, then 200, then 20, you get the keys in order '5, 20, 200' as you'd expect. The performance is O(1) or O(log n), but with a higher runup (this is extremely unlikely to matter one iota if you have a collection of ~200 items in it. Wake me up when you add a million, then performance might be even measurable, let alone noticable) - and you can toss a 'key' of a few million at it if you want, no problem. The 'worst case scenario' is far better here, you basically cannot cause this to be a problem, whereas with a sparse, array-backed list, if I tossed an index of 3 billion at it, you would then have a few GB worth of blank memory allocated; you'd definitely notice that!
A sparse list
java itself doesn't have sparse lists, but other libraries do. Search the web for something you like, add it as a dependency, and keep on going.

The loop seems ugly and I would like to use items.setMinimumSize(index + 1) but it does not exist.
A List contains a sequence of references, without gaps (but possibly with null elements). Indexes into a list correlate with that sequence, and the list size is defined as the number of elements it contains. You cannot manipulate elements with indexes greater than or equal to the list's current size.
The size is not to be confused with the capacity of an ArrayList, which is the number of elements it presently is able to accommodate without acquiring additional storage. To a first approximation, you can and should ignore ArrayList capacity. Working with that makes your code specific to ArrayList, rather than general to Lists, and it's mostly an issue of optimization.
Thus, if you want to increase the size of an ArrayList (or most other kinds of List) so as to ensure that a certain index is valid for it, then the only alternative is to add elements to make it at least that large. But you don't need to add them one at a time in a loop. I actually like your
items.addAll(Arrays.asList(new Object[index - items.size()]))
, but you need one more element. The size needs to be at least index + 1 in order to set() the element at index index.
Alternatively, you could use
items.addAll(Collections.nCopies(1 + index - items.size(), null));
That might even be cheaper, too.

Related

Duplicate item's index in linkedHashSet

I am adding some values to a LinkedHashSet and based on add() method's output i.e. true/false, I am performing other operations.
If the Set contains duplicate element it returns false and in this case I want to know the index of the duplicate element in the Set as I need to use that index somewhere else. Being a 'linked' collection there must be some way to get the index, but I couldn't find any such thing in Set/LinkedHashSet API.
LinkedHashSet is not explicitly indexed per se. If you require an index, using a Set for such application is usually a sign of wrong abstraction and/or lousy programming. LinkedHashSet only guarantees you predictable iteration order, not proper indexing of elements. You should use a List in such cases, since that's the interface giving you indexing guarantee. You can, however, infer the index using a couple of methods, for example (not recommended, mind me):
a) use indexed iteration through the collection (e.g. with for loop), seeking the duplicate and breaking when it's found; it's O(n) complexity for getting the index,
Object o; // this is the object you want to add to collection
if ( !linkedHashSet.add(o) ) {
int index = 0;
for( Object obj : linkedHashSet ) {
if ( obj == o ) // or obj.equals(o), depending on your code's semantics
return index;
index++;
}
}
b) use .toArray() and find the element in the array, e.g. by
Object o; // this is the object you want to add to collection
int index;
if ( !linkedHashSet.add(o) )
index = Arrays.asList(linkedHashSet.toArray()).indexOf(o);
again, O(n) complexity of acquiring index.
Both would incur heavy runtime penalty (the second solution is obviously worse with respect to efficiency, as it creates an array every time you seek the index; creating a parallel array mirroring the set would be better there). All in all, I see a broken abstraction in your example. You say
I need to use that index somewhere else
... if that's really the case, using Set is 99% of the time wrong by itself.
You can, on the other hand, use a Map (HashMap for example), containing an [index,Object] (or [Object,index], depending on the exact use case) pairs in it. It'd require a bit of refactoring, but it's IMO a preferred way to do this. It'd give you the same order of complexity for most operations as LinkedHashSet, but you'd get O(1) for getting index essentially for free (Java's HashSet uses HashMap internally anyway, so you're not losing any memory by replacing HashSet with HashMap).
Even better way would be to use a class explicitly handling integer maps - see HashMap and int as key for more information; tl;dr - http://trove.starlight-systems.com/ has TIntObjectHashMap & TObjectIntHashMap , giving you possibly the best speed for such operations possible.

Is having a List and an array with the exact same element bad programming style?

I have a short (12 elements) LinkedList of short strings (7 characters each).
I need to search through this list both by index and by content (i.e. search a particular string and get its index in the list).
I thought about making a copy of the LinkedList as an array at runtime (just once, since the LinkedList is a static member of my class), so I can access the strings by index more quickly.
Given that the LinkedList is never changed at runtime, is this bad programming practice or is this an idea worth considering?
IMPORTANT EDIT: the array can't be sorted, I need it to map specific strings to specific numbers.
Instead of a LinkedList just use an ArrayList - you can look up fast based on an index, and you can easily search through it.
What problem are you trying to solve here? Are you worried that accessing elements by index is too slow in LinkedList? If so, you might want to use ArrayList instead.
But for a 12-element list, the improvement probably won't make any measurable difference. Unless this is something you're accessing several hundred times a second, I wouldn't waste any time on trying to optimize it.
Another idea you might want to consider is using a Map:
Map someMap<int, String>
It's easy to search for values in a map by both key and value.
Might also not be the best idea, but at least better then creating 2 lists with the same values =)
The question is, why are you using a LinkedList in the first place?
The main reason to choose a LinkedList over an array list is if you need to make a number of insertions/deletions in the middle of the List or if you don't know the exact size of the list and don't want to make a number of reallocations of the Array.
The main reason to choose an ArrayList over a LinkedList is if you need to have random access to each of the elements.
(There are other advantages/disadvantages to each, but those are probably the main ones that come to mind)
It looks like you do need random access to the list, so why did you pick a LinkedList over an ArrayList
I would say it depends on your intention and the effect it really has.
With only 12 elements it seems unlikely to me that converting the LinkedList to an array has an impact on performance. So it could make the code unnecessarily (slightly) harder to understand for other people. From this point of view it could be considered a non optimal programming style.
If the number of elements increases, i.g. you're need to pre-process some data which would require a dynamic data structure. And for later use an indexed lookup performs much better, this wouldn't be a bad programming style, rather a required improvement.
Given that you know the exact amount of elements you are going to be using why not use an array from the start?
string[] myArray = new string[7];
// Add your data
Sort(myArray); // Sort your strings
int value = binarySearch(myArray, "key"); // Search your array
Or since you cant sort the array you could just make a linear search method
public int Search(string[] array, string key)
{
for(int i = 0; i < array.legnth(); i++)
{
if(array[i] == key)
return i;
}
return -1;
}
Edit: After re-loading the page and reading peoples responses I agree that ArrayList should be exactly what you need.

Converting to a column oriented array in Java

Although I have Java in the title, this could be for any OO language.
I'd like to know a few new ideas to improve the performance of something I'm trying to do.
I have a method that is constantly receiving an Object[] array. I need to split the Objects in this array through multiple arrays (List or something), so that I have an independent list for each column of all arrays the method receives.
Example:
List<List<Object>> column-oriented = new ArrayList<ArrayList<Object>>();
public void newObject(Object[] obj) {
for(int i = 0; i < obj.length; i++) {
column-oriented.get(i).add(obj[i]);
}
}
Note: For simplicity I've omitted the initialization of objects and stuff.
The code I've shown above is slow of course. I've already tried a few other things, but would like to hear some new ideas.
How would you do this knowing it's very performance sensitive?
EDIT:
I've tested a few things and found that:
Instead of using ArrayList (or any other Collection), I wrapped an Object[] array in another object to store individual columns. If this array reaches its capacity, I create another array with twice de size and copy the contents from one to another using System.copyArray. Surprisingly (at least for me) this is faster that using ArrayList to store the inner columns...
The answer depends on the data and usage profile. How much data do you have in such collections? What is proportions of reads/writes (adding objects array)? This affects what structure for inner list is better and many other possible optimizations.
The fastest way to copy data is avoid copying at all. If you know that obj array is not modified further by the caller code (this is important condition), one of possible tricks is to implement you custom List class to use as inner list. Internally you will store shared List<Object[]>. Each call we just add new array to that list. Custom inner list class will know which column it represents (let it be n) and when it is asked to give item at position m, it will transpose m and n and query internal structure to get internalArray.get(m)[n]. This implementation is unsafe because of limitation on the caller that is easy to forget about but might be faster under some conditions (however, this might be slower under other).
I would try using LinkedList for the inner list, because it should have better performance for insertions. Maybe wrappping Object arra into collection and using addAll might help as well.
ArrayList may be slow, due to copying of arrays (It uses a similar approach as your self-written collection).
As an alternate solution you could try to simply store the Rows at first and create columns when neccessary. This way, copying of the internal arrays at the list is reduced to a minimum.
Example:
//Notice: You can use a LinkedList for rows, as no index based access is used.
List<Object[]> rows =...
List<List<Object>> columns;
public void processColumns() {
columns = new ArrayList<List<Object>>();
for(Object[] aRow : rows){
while (aRow.size() > columns.size()){
//This ensures that the ArrayList is big enough, so no copying is necessary
List<Object> newColumn = new ArrayList<Object>(rows.size())
columns.add(newColumn);
}
for (int i = 0; i < aRow.length; i++){
columns.get(i).add(aRow[i]);
}
}
}
Depending on the number of columns, it's still possible that the outer list is copying arrays internally, but normal tables contains far more rows than columns, so it should be a small array only.
Use a LinkedList for implementing the column lists. It's grows linearly with the data and is O(1). (If you use ArrayList it has to resize the internal array from time to time).
After collecting the values you can convert that linked lists to arrays. If N is the number of rows you will pass from holding 3*N refs for each list (each LInkedList has prevRef/nextRef/itemRef) to only N refs.
It would be nice to have an array for holding the different column lists, but of course, it's not a big improvement and you can do it only if you know the column count in advance.
Hope it helps!
Edit tests and theory indicate that ArrayList is better in amortized cost, it is, the total cost divided by the number of items processed... so don't follow my 'advice' :)

How can I create a java list using the member variables of an existing list, without using a for loop?

I have a java list
List<myclass> myList = myClass.selectFromDB("where clause");
//myClass.selectFromDB returns a list of objects from DB
But I want a different list, specifically.
List<Integer> goodList = new ArrayList<Integer>();
for(int i = 0;i++; i<= myList.size()) {
goodList[i] = myList[i].getGoodInteger();
}
Yes, I could do a different query from the DB in the initial myList creation, but assume for now I must use that as the starting point and no other DB queries. Can I replace the for loop with something much more efficient?
Thank you very much for any input, apologies for my ignorance.
In order to extract a field from the "myclass", you're going to have to loop through the entire contents of the list. Whether you do that with a for loop, or use some sort of construct that hides the for loop from you, it's still going to take approximately the same time and use the same amount of resources.
An important question is: why do you want to do this? Are you trying to make your code cleaner? If so, you could write a method along these lines:
public static List<Integer> extractGoodInts (List<myClass> myList) {
List<Integer> goodInts = new ArrayList<Integer>();
for(int i = 0; i < myList.size(); i++){
goodInts.add(myList.get(i).getGoodInteger());
}
return goodInts;
}
Then, in your code, you can just go:
List<myClass> myList = myClass.selectFromDB("where clause");
List<Integer> goodInts = myClass.extractGoodInts(myList);
However, if you're trying to make your code more efficient and you're not allowed to change the query, you're out of luck; somehow or another, you're going to need to individually grab each int from the list, which means you're going to be running in O(n) time no matter what clever tricks you can come up with.
There are really only two ways I can think of that you can make this more "efficient":
Somehow split this up between multiple cores so you can do the work in parallel. Of course, this assumes that you've got other cores, they aren't doing anything useful already, and that there's enough processing going on that the overheard of doing this is even worth it. My guess is that (at least) the last point isn't true in your case given that you're just calling a getter. If you wanted to do this you'd try to have a number of threads (I'd probably actually use an Executor and Futures for this) equal to the number of cores, and then give roughly equal amounts of work to each of them (probably just by slicing your list into roughly equal sized pieces).
If you believe that you'll only be accessing a small subset of the resulting List, but are unsure of exactly which elements, you could try doing things lazily. The easiest way to do that would be to use a pre-built lazy mapping List implementation. There's one in Google Collections Library. You use it by calling Lists.transform(). It'll immediately return a List, but it'll only perform your transformation on elements as they are requested. Again, this is only more efficient if it turns out that you only ever look at a small fraction of the output List. If you end up looking at the entire thing this will not be more efficient, and will probably work out to be less efficient.
Not sure what you mean by efficient. As the others said, you have to call the getGoodInteger method on every element of that list one way or another. About the best you can do is avoid checking the size every time:
List<Integer> goodInts = new ArrayList<Integer>();
for (MyClass myObj : myList) {
goodInts.add(myObj.getGoodInteger());
}
I also second jboxer's suggestion of making a function for this purpose.

Best way to remove repeats in a collection in Java?

This is a two-part question:
First, I am interested to know what the best way to remove repeating elements from a collection is. The way I have been doing it up until now is to simply convert the collection into a set. I know sets cannot have repeating elements so it just handles it for me.
Is this an efficient solution? Would it be better/more idiomatic/faster to loop and remove repeats? Does it matter?
My second (related) question is: What is the best way to convert an array to a Set? Assuming an array arr The way I have been doing it is the following:
Set x = new HashSet(Arrays.asList(arr));
This converts the array into a list, and then into a set. Seems to be kinda roundabout. Is there a better/more idiomatic/more efficient way to do this than the double conversion way?
Thanks!
Do you have any information about the collection, like say it is already sorted, or it contains mostly duplicates or mostly unique items? With just an arbitrary collection I think converting it to a Set is fine.
Arrays.asList() doesn't create a brand new list. It actually just returns a List which uses the array as its backing store, so it's a cheap operation. So your way of making a Set from an array is how I'd do it, too.
Use HashSet's standard Collection conversion constructor. According to The Java Tutorials:
Here's a simple but useful Set idiom.
Suppose you have a Collection, c, and
you want to create another Collection
containing the same elements but with
all duplicates eliminated. The
following one-liner does the trick.
Collection<Type> noDups = new HashSet<Type>(c);
It works by creating a Set (which, by
definition, cannot contain a
duplicate), initially containing all
the elements in c. It uses the
standard conversion constructor
described in the The Collection
Interface section.
Here is a minor variant of this idiom
that preserves the order of the
original collection while removing
duplicate element.
Collection<Type> noDups = new LinkedHashSet<Type>(c);
The following is a generic method that
encapsulates the preceding idiom,
returning a Set of the same generic
type as the one passed.
public static <E> Set<E> removeDups(Collection<E> c) {
return new LinkedHashSet<E>(c);
}
Assuming you really want set semantics, creating a new Set from the duplicate-containing collection is a great approach. It's very clear what the intent is, it's more compact than doing the loop yourself, and it leaves the source collection intact.
For creating a Set from an array, creating an intermediate List is a common approach. The wrapper returned by Arrays.asList() is lightweight and efficient. There's not a more direct API in core Java to do this, unfortunately.
I think your approach of putting items into a set to produce the collection of unique items is the best one. It's clear, efficient, and correct.
If you're uncomfortable using Arrays.asList() on the way into the set, you could simply run a foreach loop over the array to add items to the set, but I don't see any harm (for non-primitive arrays) in your approach. Arrays.asList() returns a list that is "backed by" the source array, so it doesn't have significant cost in time or space.
1.
Duplicates
Concurring other answers: Using Set should be the most efficient way to remove duplicates. HashSet should run in O(n) time on average. Looping and removing repeats would run in the order of O(n^2). So using Set is recommended in most cases. There are some cases (e.g. limited memory) where iterating might make sense.
2.
Arrays.asList() is a cheap operation that doesn't copy the array, with minimal memory overhead. You can manually add elements by iterating through the array.
public static Set arrayToSet(T[] array) {
Set set = new HashSet(array.length / 2);
for (T item : array)
set.add(item);
return set;
}
Barring any specific performance bottlenecks that you know of (say a collection of tens of thousands of items) converting to a set is a perfectly reasonable solution and should be (IMO) the first way you solve this problem, and only look for something fancier if there is a specific problem to solve.

Categories

Resources