Duplicate item's index in linkedHashSet - java

I am adding some values to a LinkedHashSet and, based on the add() method's return value (true/false), I am performing other operations.
If the Set already contains the element, add() returns false, and in this case I want to know the index of the existing element in the Set, as I need to use that index somewhere else. Being a 'linked' collection, there must be some way to get the index, but I couldn't find any such thing in the Set/LinkedHashSet API.

LinkedHashSet is not explicitly indexed per se. If you require an index, using a Set for such an application is usually a sign of the wrong abstraction and/or lousy programming. LinkedHashSet only guarantees a predictable iteration order, not proper indexing of elements. You should use a List in such cases, since that's the interface that gives you an indexing guarantee. You can, however, infer the index using a couple of methods, for example (not recommended, mind you):
a) use indexed iteration through the collection (e.g. with for loop), seeking the duplicate and breaking when it's found; it's O(n) complexity for getting the index,
Object o; // this is the object you want to add to collection
if ( !linkedHashSet.add(o) ) {
    int index = 0;
    for( Object obj : linkedHashSet ) {
        if ( obj == o ) // or obj.equals(o), depending on your code's semantics
            return index;
        index++;
    }
}
b) use .toArray() and find the element in the array, e.g. by
Object o; // this is the object you want to add to collection
int index;
if ( !linkedHashSet.add(o) )
index = Arrays.asList(linkedHashSet.toArray()).indexOf(o);
again, O(n) complexity of acquiring index.
Both would incur heavy runtime penalty (the second solution is obviously worse with respect to efficiency, as it creates an array every time you seek the index; creating a parallel array mirroring the set would be better there). All in all, I see a broken abstraction in your example. You say
I need to use that index somewhere else
... if that's really the case, using Set is 99% of the time wrong by itself.
You can, on the other hand, use a Map (HashMap for example), containing an [index,Object] (or [Object,index], depending on the exact use case) pairs in it. It'd require a bit of refactoring, but it's IMO a preferred way to do this. It'd give you the same order of complexity for most operations as LinkedHashSet, but you'd get O(1) for getting index essentially for free (Java's HashSet uses HashMap internally anyway, so you're not losing any memory by replacing HashSet with HashMap).
An even better way would be to use a class that explicitly handles integer maps - see HashMap and int as key for more information; tl;dr - http://trove.starlight-systems.com/ has TIntObjectHashMap & TObjectIntHashMap, which are likely to give you close to the best speed available for such operations.
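A minimal sketch of the [Object,index] map idea above, using only java.util. The class name IndexedSet and the method addOrFind are made up for illustration, and the sketch assumes elements are never removed, so each insertion index stays valid.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: remembers the insertion index of every distinct element.
class IndexedSet<T> {
    private final Map<T, Integer> indexOf = new HashMap<>();

    /**
     * Adds the element if it is absent and returns -1; otherwise returns the
     * index at which the equal element was first inserted.
     */
    int addOrFind(T element) {
        // The current size is the index the element would get if it is new.
        Integer existing = indexOf.putIfAbsent(element, indexOf.size());
        return existing == null ? -1 : existing;
    }

    int size() {
        return indexOf.size();
    }
}
```

Both the membership test and the index lookup are a single hash lookup here, at the cost of not supporting removal without renumbering.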

Related

Are there better ways to make an ArrayList bigger?

I am trying to use an ArrayList to store and retrieve items by an index value. My code is similar to this:
ArrayList<Object> items = new ArrayList<>();
public void store (int index, Object item)
{
    // the size must be at least index + 1 before set(index, ...) is valid
    while (items.size() <= index) items.add(null);
    items.set(index, item);
}
The loop seems ugly and I would like to use items.setMinimumSize(index + 1) but it does not exist. I could use items.addAll(Arrays.asList(new Object[index - items.size()])) but that allocates memory and seems overly complex as a way to just make an array bigger. Most of the time the loop will iterate zero times, so I prefer simplicity over speed.
The index values probably won't exceed 200, so I could use:
Object[] items = new Object[200];
public void store (int index, Object item)
{
items[index] = item;
}
but this would break if it ever needs over 200 values.
Are these really my only options? It seems that I am forced into something more complex than it should be.
I would consider using a Map instead of a List construct. Why not this :
//depending on your requirements, you might want to use another Map
//implementation, just read through the docs...
Map<Integer, Object> items = new HashMap<>();
public void store (int index, Object item)
{
items.put(index, item);
}
This way you can avoid that senseless while loop.
The problem is, what you want isn't really an arraylist. It feels like what you want is a Map<Integer, T> instead of an ArrayList<T>, as you clearly want to map an index to a value; you want this operation:
Add a value to this list such that list.get(250) returns it.
And that is possible with arraylist, but not really what it is for, and when you use things in a way that wasn't intended, what usually ends up happening is that you write a bunch of weird code and go: "really? Is this right?" - and so it is here.
There's nothing particularly wrong with doing what you're doing, given that you said the indices aren't going to go much beyond 200, but, generally, I advise not creating semantic noise by using things for what they aren't. If you must, create an entirely new type that encapsulates exactly this notion of a sparse list. If you must, implement it by using an arraylist under the hood, that'd be fine.
Alternatively, use something that already exists.
a map
From the core library, why not.. new TreeMap<Integer, T>()? Treemaps keep themselves sorted, so if you loop through the keys you get them in order: if you add 5, then 200, then 20, you get the keys in order '5, 20, 200' as you'd expect. Lookups and insertions are O(log n) rather than O(1), with a somewhat higher constant overhead, but this is extremely unlikely to matter one iota if you have a collection of ~200 items in it. (Wake me up when you add a million; then the performance might even be measurable, let alone noticeable.) You can also toss a 'key' of a few million at it if you want, no problem. The 'worst case scenario' is far better here: you basically cannot cause this to be a problem, whereas with a sparse, array-backed list, if I tossed an index of 2 billion at it, you would then have a few GB worth of blank memory allocated; you'd definitely notice that!
A sparse list
Java itself doesn't have sparse lists, but other libraries do. Search the web for something you like, add it as a dependency, and keep on going.
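The TreeMap option mentioned above could be wrapped in a small type of its own, as suggested earlier. This is only a sketch; the name SparseStore and its methods are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sparse store keyed by index; no filler elements are allocated.
class SparseStore<T> {
    private final TreeMap<Integer, T> items = new TreeMap<>();

    void store(int index, T item) {
        items.put(index, item);
    }

    // Returns null when nothing was stored at that index.
    T get(int index) {
        return items.get(index);
    }

    // Keys come back in ascending order regardless of insertion order.
    List<Integer> indexes() {
        return new ArrayList<>(items.keySet());
    }
}
```

Storing at 5, then 200, then 20 leaves three entries in the map, and a very large index costs no more memory than a small one.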
The loop seems ugly and I would like to use items.setMinimumSize(index + 1) but it does not exist.
A List contains a sequence of references, without gaps (but possibly with null elements). Indexes into a list correlate with that sequence, and the list size is defined as the number of elements it contains. You cannot manipulate elements with indexes greater than or equal to the list's current size.
The size is not to be confused with the capacity of an ArrayList, which is the number of elements it presently is able to accommodate without acquiring additional storage. To a first approximation, you can and should ignore ArrayList capacity. Working with that makes your code specific to ArrayList, rather than general to Lists, and it's mostly an issue of optimization.
Thus, if you want to increase the size of an ArrayList (or most other kinds of List) so as to ensure that a certain index is valid for it, then the only alternative is to add elements to make it at least that large. But you don't need to add them one at a time in a loop. I actually like your
items.addAll(Arrays.asList(new Object[index - items.size()]))
, but you need one more element. The size needs to be at least index + 1 in order to set() the element at index index.
Alternatively, you could use
items.addAll(Collections.nCopies(1 + index - items.size(), null));
That might even be cheaper, too.
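Put together, the padding approach could look like the sketch below. The class name GrowableStore is hypothetical; the guard is there because Collections.nCopies rejects a negative count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper around the store() method from the question.
class GrowableStore {
    private final List<Object> items = new ArrayList<>();

    void store(int index, Object item) {
        // Number of null elements needed so that `index` becomes a valid position.
        int missing = index + 1 - items.size();
        if (missing > 0) {
            items.addAll(Collections.nCopies(missing, null));
        }
        items.set(index, item);
    }

    Object get(int index) {
        return items.get(index);
    }

    int size() {
        return items.size();
    }
}
```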

Why don't we count linear search cost as a prerequisite bottleneck for the insertion operation of a linked list, compared to ArrayList?

I have had this question for a while but I have been unsatisfied with the answers because the distinctions appear to be arbitrary and more like conventional wisdom that is sort of blindly accepted rather than assessed critically.
In an ArrayList it is said that insertion cost (for a single element) is linear. If we are inserting at index p for 0 <= p < n where n is the size of the list, then the remaining n-p elements are shifted over first before the new element is copied into position p.
In a LinkedList, it is said that insertion cost (for a single element) is constant. For instance if we already have a node and we want to insert after it, we rearrange some pointers and it's done quickly. But getting this node in the first place, I don't see how it can be done other than a linear search first (assuming it isn't a trivial case like prepending at the start of the list or appending at the end).
And yet in the case of the LinkedList, we don't count that initial search time. To me this is confusing because it's sort of like saying "The ice cream is free... after you pay for it." It's like, well, of course it is... but that sort of skips the hard part of paying for it. Of course inserting in a LinkedList is going to be constant time if you already have the node you want, but getting that node in the first place may take some extra time! I could easily say that inserting in an ArrayList is constant time... after I move the remaining n-p elements.
So I don't understand why this distinction is made for one but not the other. You could argue that insertion is considered constant for LinkedLists because of the cases where you insert at the front or back where linear time operations are not required, whereas in an ArrayList, insertion requires copying of the suffix array after position p, but I could easily counter that by saying if we insert at the back of an ArrayList, it is amortized constant time and doesn't require extra copying in most cases unless we reach capacity.
In other words, we separate the linear stuff from the constant stuff for LinkedList, but we don't separate them for ArrayList, even though in both cases the linear operations may or may not be invoked.
So why do we consider them separate for LinkedList and not for ArrayList? Or are they only being defined here in the context where LinkedList is overwhelmingly used for head/tail appends and prepends as opposed to elements in the middle?
This is basically a limitation of the Java interface for List and LinkedList, rather than a fundamental limitation of linked lists. That is, in Java there is no convenient concept of "a pointer to a list node".
Every type of list has a few different concepts loosely associated with the idea of pointing to a particular item:
The idea of a "reference" to a specific item in a list
The integer position of an item in the list
The value of an item that may be in the list (possibly multiple times)
The most general concept is the first one, and is usually encapsulated in the idea of an iterator. As it happens, the simple way to implement an iterator for an array backed list is simply to wrap an integer which refers to the position of the item in a list. So for array lists only, the first and second ways of referring to items are pretty tightly bound.
For other list types, however, and even for most other container types (trees, hashes, etc) that is not the case. The generic reference to an item is usually something like a pointer to the wrapper structure around one item (e.g., HashMap.Entry or LinkedList.Entry). For these structures the idea of accessing the nth element isn't necessarily natural or even possible (e.g., unordered collections like sets and many hash maps).
Perhaps unfortunately, Java made the idea of getting an item by its index a first-class operation. Many of the operations directly on List objects are implemented in terms of list indexes: remove(int index), add(int index, ...), get(int index), etc. So it's kind of natural to think of those operations as being the fundamental ones.
For LinkedList though it's more fundamental to use a pointer to a node to refer to an object. Rather than passing around a list index, you'd pass around the pointer. After inserting an element, you'd get a pointer to the element.
In C++ this concept is embodied in the concept of the iterator, which is the first class way to refer to items in collections, including lists. So does such a "pointer" exist in Java? It sure does - it's the Iterator object! Usually you think of an Iterator as being for iteration, but you can also think of it as pointing to a particular object.
So the key observation is: given a pointer (iterator) to an object, you can remove and add from linked lists in constant time, but from an array-like list this takes linear time in general. There is no inherent need to search for an object before deleting it: there are plenty of scenarios where you can maintain or take as input such a reference, or where you are processing the entire list, and here the constant time deletion of linked lists does change the algorithmic complexity.
Of course, if you need to do something like delete the first entry containing the value "foo", that implies both a search and a delete operation. Both array-based and linked lists take O(n) for the search, so they don't vary here - but you can meaningfully separate the search and delete operations.
So you could, in principle, pass around Iterator objects rather than list indexes or object values - at least if your use case supports it. However, at the top I said that "Java has no convenient notion of a pointer to a list node". Why?
Well, because using Iterator this way is actually very inconvenient. First of all, it's tough to get an Iterator to an object in the first place: for example, and unlike C++, the add() methods don't return an Iterator - so to get a pointer to the item you just added, you need to go ahead and iterate over the list or use the listIterator(int index) call, which is inherently inefficient for linked lists. Many methods (e.g., subList()) support only a version that takes indexes, but not Iterators - even when such a method could be efficiently supported.
Add to that the restrictions around iterator invalidation when the list is modified, and they actually become pretty useless for referring to elements except in immutable lists.
So Java's support of pointers to list elements is pretty half-hearted, and so it's tough to leverage the constant time operations that linked lists offer, except in cases such as adding to the front of a list, or deleting items during iteration.
It's not limited to lists, either - ConcurrentLinkedQueue is also a linked structure which supports constant time deletes, but you can't reliably use that ability from Java.
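The case where the iterator-as-pointer idea does work in Java, editing while processing the whole list, can be sketched with ListIterator. The class and method names below are made up for illustration, and single-threaded use with no other modification during the pass is assumed.

```java
import java.util.List;
import java.util.ListIterator;

class IteratorEdits {
    // Inserts `extra` after every element equal to `marker` and removes every
    // element equal to `unwanted`, in a single pass over the list.
    static <T> void edit(List<T> list, T marker, T extra, T unwanted) {
        ListIterator<T> it = list.listIterator();
        while (it.hasNext()) {
            T current = it.next();
            if (current.equals(unwanted)) {
                it.remove();   // removes the element just returned by next()
            } else if (current.equals(marker)) {
                it.add(extra); // inserts right after it; next() will not return it
            }
        }
    }
}
```

On a LinkedList each remove() and add() here only relinks neighbouring nodes, whereas on an ArrayList each one shifts the tail of the backing array.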
If you're using a LinkedList, chances are you're not going to use it for a random access insert. LinkedList offers constant time for push (insert at the beginning) or add (because it has a ref to the final element IIRC). You are correct in your suspicion that an insert into a random index (e.g. insert sorted) will take linear time - not constant.
ArrayList, by contrast, is worst case linear. An insert in the middle does an arraycopy to shift the later elements; that is a fast low-level bulk copy, but it is still linear in the number of elements moved. Appending at the end is amortized constant time, and only when the backing array has to be resized is the whole array copied.

ConcurrentSkipList? That is, not a ConcurrentSkipListSet

I need a very fast (insert, remove, contains) highly concurrent list that can be sorted using a comparator/comparable.
The existing ConcurrentSkipListSet would be ideal, if it was a list and not a set. I need to insert multiple items which are equal into the data structure.
I'm currently thinking of using a LinkedDeque if I can't find anything better, but that structure is considerably slower than a skiplist at high contention.
Any suggestions?
EDIT: What I actually need, bare minimum, is something that is sorted using compareTo, can insert concurrently and can remove/get items using object identity. All other concurrent requirements mentioned in comments still apply.
The existing ConcurrentSkipListSet would be ideal, if it was a list and not a set.
So the SkipList data structure at its core is a linked list. If you are worried about order and the ability to traverse it easily and in order, the SkipList will work very well for that as well. It is also a probabilistic alternative to a balanced tree, which is why it can also be a Set or a Map. The data structure in memory looks something like the following:
[Figure: skip list diagram]
To quote from the Javadocs:
This class implements a concurrent variant of SkipLists providing expected average log(n) time cost for the containsKey, get, put and remove operations and their variants. Insertion, removal, update, and access operations safely execute concurrently by multiple threads. Iterators are weakly consistent, returning elements reflecting the state of the map at some point at or since the creation of the iterator. They do not throw ConcurrentModificationException, and may proceed concurrently with other operations. Ascending key ordered views and their iterators are faster than descending ones.
If you explain more about what features you want from List, I can answer better whether ConcurrentSkipListSet will be able to work.
Edit:
Ah, I see. After some back and forth in the comment, it seems like you need to be able to stick two objects that are equivalent into the Set which isn't possible. What we worked out is to never have compareTo(...) return 0. It's a bit of a hack but using AtomicLong to generate a unique number for each object, you can then compare those numbers whenever the real comparison field (in this case a numerical timeout value) is equal. This will allow objects with the same field to be inserted into the Set and kept in the proper order based on the field.
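A sketch of that tie-breaker, under the assumption that the sort field is a long timeout. The class TimedEntry and its members are hypothetical names. Note that in this variant the comparison returns 0 only when an object is compared with itself, which is what lets remove() and contains() still find a specific object.

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical element type: `timeout` is the real sort field, `id` breaks ties.
class TimedEntry {
    private static final AtomicLong NEXT_ID = new AtomicLong();

    final long timeout;
    final long id = NEXT_ID.getAndIncrement(); // unique per instance

    TimedEntry(long timeout) {
        this.timeout = timeout;
    }

    // Orders by timeout first, then by the unique id.
    static final Comparator<TimedEntry> ORDER =
            Comparator.comparingLong((TimedEntry e) -> e.timeout)
                      .thenComparingLong(e -> e.id);

    static ConcurrentSkipListSet<TimedEntry> newSet() {
        return new ConcurrentSkipListSet<>(ORDER);
    }
}
```

Entries with the same timeout are kept side by side, ordered by creation, and each one can be removed individually.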
You can create the Set with a comparator that never returns 0.
private Set<Obj> entities = new ConcurrentSkipListSet<>((o1, o2) -> {
    if (o1.equals(o2)) {
        // Return -1 or 1 - decide where you want to place an object when it equals another one
        return -1;
    }
    // Implement the sorting order below
    if (o1.getTimestamp() < o2.getTimestamp()) {
        return -1;
    }
    if (o1.getTimestamp() > o2.getTimestamp()) {
        return 1;
    }
    // Equal timestamps: never report 0, so both objects are kept.
    // Caveat: because compare() never returns 0, contains() and remove() cannot find elements.
    return -1;
});

for-each vs for vs while

I wonder what is the best way to implement a "for-each" loop over an ArrayList or every kind of List.
Which of the following implementations is the best and why? Or is there a better way?
Thank you for your help.
List<String> values = new ArrayList<>();
values.add("one");
values.add("two");
values.add("three");
...
//#0
for (String value : values) {
    ...
}
//#1
for (int i = 0; i < values.size(); i++) {
    String value = values.get(i);
    ...
}
//#2
for (Iterator<String> it = values.iterator(); it.hasNext(); ) {
    String value = it.next();
    ...
}
//#3
Iterator<String> it = values.iterator();
while (it.hasNext()) {
    String value = it.next();
    ...
}
#3 has a disadvantage because the scope of the iterator it extends beyond the end of the loop. The other solutions don't have this problem.
#2 is exactly the same as #0, except #0 is more readable and less prone to error.
#1 is (probably) less efficient because it calls .size() every time through the loop.
#0 is usually best because:
it is the shortest
it is least prone to error
it is idiomatic and easy for other people to read at a glance
it is efficiently implemented by the compiler
it does not pollute your method scope (outside the loop) with unnecessary names
The short answer is to use version 0. Take a peek at the section titled Use Enhanced For Loop Syntax in Android's documentation on Designing for Performance. That page has a bunch of goodies and is very clear and concise.
#0 is the easiest to read, in my opinion, but #2 and #3 will work just as well. There should be no performance difference between those three.
In almost no circumstances should you use #1. You state in your question that you might want to iterate over "every kind of List". If you happen to be iterating over a LinkedList then #1 will be n^2 complexity: not good. Even if you are absolutely sure that you are using a list that supports efficient random access (e.g. ArrayList) there's usually no reason to use #1 over any of the others.
In response to this comment from the OP.
However, #1 is required when updating (if not just mutating the current item or building the results as a new list) and comes with the index. Since the List<> is an ArrayList<> in this case, the get() (and size()) is O(1), but that isn't the same for all List-contract types.
Let's look at these issues:
It is certainly true that get(int) is not O(1) for all implementations of the List contract. However, AFAIK, size() is O(1) for all List implementations in java.util. But you are correct that #1 is suboptimal for many List implementations. Indeed, for lists like LinkedList where get(int) is O(N), the #1 approach results in a O(N^2) list iteration.
In the ArrayList case, it is a simple matter to manually hoist the call to size(), assigning it to a (final) local variable. With this optimization, the #1 code is significantly faster than the other cases ... for ArrayLists.
Your point about changing the list while iterating the elements raises a number of issues:
If you do this with a solution that explicitly or implicitly uses iterators, then depending on the list class you may get ConcurrentModificationExceptions. If you use one of the concurrent collection classes, you won't get the exception, but the javadocs state that the iterator won't necessarily return all of the list elements.
If you do this using the #1 code (as is) then, you have a problem. If the modification is performed by the same thread, you need to adjust the index variable to avoid missing entries, or returning them twice. Even if you get everything right, a list entry concurrently inserted before the current position won't show up.
If the modification in the #1 case is performed by a different thread, it is hard to synchronize properly. The core problem is that get(int) and size() are separate operations. Even if they are individually synchronized, there is nothing to stop the other thread from modifying the list between a size and get call.
In short, iterating a list that is being concurrently modified is tricky, and should be avoided ... unless you really know what you are doing.
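For the single-threaded case, the two ways of removing elements during iteration discussed above can be sketched as follows. The class and method names are made up for illustration.

```java
import java.util.Iterator;
import java.util.List;

class SafeRemoval {
    // Removes all empty strings through the iterator's own remove(), which
    // keeps the iterator and the list in sync.
    static void dropEmpty(List<String> values) {
        for (Iterator<String> it = values.iterator(); it.hasNext(); ) {
            if (it.next().isEmpty()) {
                it.remove();
            }
        }
    }

    // The same task with an index loop (#1 style): step back after each
    // removal so the element that shifted into position i is not skipped.
    static void dropEmptyIndexed(List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            if (values.get(i).isEmpty()) {
                values.remove(i);
                i--;
            }
        }
    }
}
```

Calling values.remove(...) directly inside a for-each loop over an ArrayList, by contrast, typically ends in a ConcurrentModificationException on the next iteration step.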

Storing & lookup double array

I have a fairly expensive array calculation (SpectralResponse) which I'd like to keep to a minimum. I figured the best way is to store the results and bring them back up when the same array is needed again in the future. The decision is made using BasicParameters.
So right now, I use a LinkedList of objects for the SpectralResponse arrays, and another LinkedList for the BasicParameters. BasicParameters has an isParamsEqualTo(BasicParameters) method to compare parameter sets.
LinkedList<SpectralResponse> responses
LinkedList<BasicParameters> fitParams
LinkedList<Integer> responseNumbers
So to look up, I just go through the list of BasicParameters, check for match, if matched, return the SpectralResponse. If no match, then calculate the SpectralResponse.
Here is the for loop I used for the lookup.
size: LinkedList size, limited to a reasonable value
responseNumber: just another variable to distinguish the SpectralResponse.
for (i = size - 1; i >= 0; i--) { // i >= 0 so that the first entry is checked too
    if (responseNumbers.get(i) == responseNum)
    {
        tempFit = fitParams.get(i);
        if (tempFit.isParamsEqualTo(fit))
        {
            return responses.get(i);
        }
    }
}
But somehow, doing it this way not only takes up a lot of memory, it's actually slower than just calculating the SpectralResponse directly. Much slower.
So is it my implementation that's wrong, or was I mistaken that precalculating and looking up is faster?
You are accessing a LinkedList by index, this is the worst possible way to access it ;)
You should use ArrayList instead, or use iterators for all your lists.
Possibly you should merge the three objects into one, and keep them in a map with responseNum as key.
Hope this helps!
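A rough sketch of the merged-map idea. It assumes the parameters can be given a proper equals()/hashCode(); here a String stands in for BasicParameters, and ResponseKey and ResponseCache are invented names. The cache is unbounded in this sketch, so a size limit would have to be added separately.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical cache key combining the response number with the parameters.
final class ResponseKey {
    final int responseNum;
    final String params; // stand-in for BasicParameters

    ResponseKey(int responseNum, String params) {
        this.responseNum = responseNum;
        this.params = params;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ResponseKey)) return false;
        ResponseKey other = (ResponseKey) o;
        return responseNum == other.responseNum && Objects.equals(params, other.params);
    }

    @Override
    public int hashCode() {
        return Objects.hash(responseNum, params);
    }
}

class ResponseCache<V> {
    private final Map<ResponseKey, V> cache = new HashMap<>();

    // Returns the cached value, computing and storing it on the first request.
    V get(ResponseKey key, Function<ResponseKey, V> compute) {
        return cache.computeIfAbsent(key, compute);
    }
}
```

The lookup becomes one hash computation and an equals() check instead of a walk over three linked lists.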
You should probably use an array-backed type (such as Vector or ArrayList), not linked lists. Linked lists are best for stack or queue operations, not indexing (since you have to traverse the list from one end). Vector is an auto-resizing array, which has less overhead when accessing indexes.
The get(i) methods of LinkedList require that to fetch each item it has to go further and further along the list. Consider using an ArrayList, the iterator() method, or just an array.
The second line, 'if (responseNumbers.get(i) == responseNum)' will also be inefficient as the responseNumbers.get(i) is an Integer, and has to be unboxed to an int (Java 5 onwards does this automatically; your code would not compile on Java 1.4 or earlier if responseNum is declared as an int). See this for more information on boxing.
To remove this unboxing overhead, use an IntList from the Apache Commons Primitives library. This library contains collections that store the underlying values (ints in your case) in a primitive array (e.g. int[]) instead of an Object array. This means no boxing is required, as the IntList's methods return primitive types, not Integers.
