Sorting a list from the beginning - java

I have a list of objects that will take lots of adds/removes. I want the list to be sorted according to a certain function.
Right now, every time I add a new object, I do:
list.add(obj1);
Collections.sort(list, comparator);
Removing an object doesn't "unsort" the list, so I only need to do this for the add operation.
However, Collections.sort is O(N log N), which is not very fast to pay on every single add.
Is there any structure in java that allows me to keep a sorted list from the very beginning?
Edit: I forgot to mention that I tried to use TreeSet. It allows me to pass a comparator, which will be used for sorting but also for removing elements, which is not what I want. I want the list to be sorted, but the remove functionality to be identical to a list's.

As Alencar proposes, use Collections.binarySearch(yourList, key, comparator) to find the correct insert position. This is far faster than looking up the whole list, since binarySearch only needs log2(size of list) queries to find the correct insertion position.
So, if your insert code was
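static <T> void sortedInsert(List<T> list, T value, Comparator<? super T> comparator) {
    ListIterator<T> it = list.listIterator();
    // Walk forward until the first element that is greater than value.
    while (it.hasNext()) {
        if (comparator.compare(value, it.next()) < 0) {
            it.previous(); // step back so add() inserts before that element
            break;
        }
    }
    it.add(value); // inserts at the iterator's current position
}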
... and a faster version would be ...
static <T> void sortedInsert2(List<T> list, T value, Comparator<? super T> comparator) {
    int pos = Collections.binarySearch(list, value, comparator);
    if (pos < 0) {
        pos = -pos - 1; // returns -(insertion point) - 1 if not found; see javadoc
    }
    list.add(pos, value); // List has no insert(); add(index, element) shifts later elements right
}
Note that the difference may not be that great, because inserting into a non-linked list requires shifting elements up. If the underlying list is an ArrayList, copying on average half the elements one place to the right to make room for the new element costs O(n) extra time per insertion. And if the list is linked, you still pay that O(n) penalty, but this time to perform the binary search; in that case, it would probably be better to use the first version.
Note that the first version uses a ListIterator just in case you decide to use linked lists. For ArrayLists, it would be easier to use list.get(pos) and list.add(pos, value), but those are a very bad idea when iterating over linked lists.
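A minimal sketch of the binary-search insertion applied to the original question, assuming an ArrayList (random access) and a comparator that orders by string length only. It shows that the list stays sorted on every add, while removal is left to List.remove(Object), which matches by equals() rather than by the comparator (the TreeSet behavior the question wanted to avoid). The class SortedInsertDemo and method insertSorted are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortedInsertDemo {
    // Inserts value at a position that keeps `list` sorted according to `cmp`.
    static <T> void insertSorted(List<T> list, T value, Comparator<? super T> cmp) {
        int pos = Collections.binarySearch(list, value, cmp);
        if (pos < 0) {
            // Not found: binarySearch returns -(insertion point) - 1.
            pos = -pos - 1;
        }
        // Found (pos >= 0): insert at the index of an element that compares equal.
        list.add(pos, value);
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        // Hypothetical ordering by length only, so "bb" and "cc" compare as equal.
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        for (String w : new String[] {"ccc", "a", "bb", "dddd", "cc"}) {
            insertSorted(words, w, byLength);
        }
        System.out.println(words);
        // remove(Object) uses equals(), not the comparator, so only "bb" is removed.
        words.remove("bb");
        System.out.println(words);
    }
}
```

Unlike a TreeSet with the same comparator, this list keeps both "bb" and "cc" even though they compare as equal, and removing one leaves the other in place.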

Can't you add the object in the "right" place, so the list stays sorted?
You said that every time you add a new object you sort the list/collection.
Maybe you can use a binary search, or your comparator, to find the exact index where you need to insert the new value.

Related

Does the ArrayList's contains() method work faster if the ArrayList is ordered?

I suspect it doesn't. If I want to use the fact that the list is ordered, should I implement my own contains() method, using binary search, for example? Are there any methods that assume that the list is ordered?
This question is different to the possible duplicate because the other question doesn't ask about the contains() method.
No. ArrayList is backed by an array, and contains() internally calls indexOf(Object o), which searches sequentially, so whether the list is sorted is irrelevant. Here's the source code:
/**
* Returns the index of the first occurrence of the specified element
* in this list, or -1 if this list does not contain the element.
* More formally, returns the lowest index <tt>i</tt> such that
* <tt>(o==null ? get(i)==null : o.equals(get(i)))</tt>,
* or -1 if there is no such index.
*/
public int indexOf(Object o) {
if (o == null) {
for (int i = 0; i < size; i++)
if (elementData[i]==null)
return i;
} else {
for (int i = 0; i < size; i++)
if (o.equals(elementData[i]))
return i;
}
return -1;
}
Use Collections.binarySearch to search in an ordered ArrayList:
Collections.<T>binarySearch(List<T> list, T key)
ArrayList.contains treats the list like any unordered list and takes O(n) time, whereas binary search is O(log n) in the worst case.
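A minimal sketch of a contains() replacement for a sorted list, assuming the elements use their natural ordering and the list has already been sorted (otherwise the binarySearch result is undefined). The names SortedContains and sortedContains are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedContains {
    // Precondition: list is sorted in ascending natural order; otherwise the result is undefined.
    static <T extends Comparable<? super T>> boolean sortedContains(List<T> list, T key) {
        // binarySearch returns an index >= 0 only when the key is found: O(log n) comparisons.
        return Collections.binarySearch(list, key) >= 0;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(42, 7, 19, 3, 25));
        Collections.sort(numbers); // establish the precondition once
        System.out.println(sortedContains(numbers, 19)); // true
        System.out.println(sortedContains(numbers, 20)); // false
    }
}
```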
No. contains uses indexOf:
public boolean contains(Object var1) {
return this.indexOf(var1) >= 0;
}
and indexOf just simply iterates over the internal array:
for(var2 = 0; var2 < this.size; ++var2) {
if (var1.equals(this.elementData[var2])) {
return var2;
}
}
Collections.binarySearch is what you're looking for:
Searches the specified list for the specified object using the binary
search algorithm. The list must be sorted into ascending order
according to the natural ordering of its elements (as by the
sort(List) method) prior to making this call. If it is not sorted, the
results are undefined.
Emphasis mine
Also consider using a SortedSet such as a TreeSet which will provide stronger guarantees that the elements are kept in the correct order, unlike a List which must rely on caller contracts (as highlighted above)
Does the ArrayList's contains() method work faster if the ArrayList is ordered?
It doesn't. The implementation of ArrayList does not know if the list is ordered or not. Since it doesn't know, it cannot optimize in the case when it is ordered. (And an examination of the source code bears this out.)
Could a (hypothetical) array-based-list implementation know? I think "No" for the following reasons:
Without either a Comparator or a requirement that elements implement Comparable, the concept of ordering is ill-defined.
The cost of checking that a list is ordered is O(N). The cost of incrementally checking that a list is still ordered is O(1) ... but still one or two calls to compare on each update operation. That is a significant overhead ... for a general purpose data structure to incur in the hope of optimizing (just) one operation in the API.
But that's OK. If you (the programmer) are able to ensure (ideally by efficient algorithmic means) that a list is always ordered, then you can use Collections.binarySearch ... with zero additional checking overhead in update operations.
Just to keep it simple.
If you have an array [5,4,3,2,1] and you sort it to [1,2,3,4,5], a linear search will work faster if you look for 1, but it will take longer to find 5. Consequently, from a mathematical point of view, sorting an array does not improve the worst case of a linear search: finding an item still requires looping from 1 to n.
Sorting may or may not help for your problem. Say you receive unordered timestamps, but your array is not too small, you want to avoid the additional cost of sorting after each new entry, you just want to find an object quickly, and you know the properties of the object you are searching for. Then you can create a KeyObject containing the properties you are looking for, implement equals and hashCode for it, and store your items in a Map. Map.containsKey(new KeyObject(prop1, prop2)) will in any case be faster than looping over the array. If you do not have the real object, you can always create a fake KeyObject, filled with the properties you expect, to check the Map.
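A minimal sketch of the KeyObject idea, assuming the lookup properties are a String and an int (prop1 and prop2 here are hypothetical, as are the class KeyLookupDemo and the sample values). The important part is that equals() and hashCode() agree, so a freshly built key finds the stored entry in a HashMap in O(1) on average.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key made of the two properties used for lookups.
final class KeyObject {
    private final String prop1;
    private final int prop2;

    KeyObject(String prop1, int prop2) {
        this.prop1 = prop1;
        this.prop2 = prop2;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof KeyObject)) return false;
        KeyObject other = (KeyObject) o;
        return prop2 == other.prop2 && Objects.equals(prop1, other.prop1);
    }

    @Override
    public int hashCode() {
        // Must agree with equals(): equal keys produce equal hash codes.
        return Objects.hash(prop1, prop2);
    }
}

class KeyLookupDemo {
    public static void main(String[] args) {
        Map<KeyObject, String> items = new HashMap<>();
        items.put(new KeyObject("sensor-a", 1), "first reading");
        items.put(new KeyObject("sensor-b", 2), "second reading");

        // A "fake" key built from the expected properties finds the entry without scanning.
        System.out.println(items.containsKey(new KeyObject("sensor-a", 1))); // true
        System.out.println(items.containsKey(new KeyObject("sensor-a", 2))); // false
    }
}
```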

Efficient way to Find most similar List<String>

I have a List<String> list1 and 1000 other List<String> lists. I need to choose the list with the most exact-match values.
Today I go over each List<String>, compare it to list1, save the coverage in some sorted list, and at the end choose the most similar list.
public static <T> List<T> intersection(List<T> list1, List<T> list2) {
List<T> list = new ArrayList<T>();
for (T t : list1) {
if(list2.contains(t)) {
list.add(t);
}
}
return list;
}
Going over all 1000 lists this way takes a lot of time, and I have many lists to compare against.
Could you please suggest me an efficient way / algorithm to do it?
Your lists are not sorted, so any contains() operation needs to search the whole list (or until found so N/2 on average).
So first sort (Collections.sort()) all lists, then use Collections.binarySearch() to find whether the String is contained or not. Each lookup then needs only about log N comparisons instead of N/2 as before.
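A minimal sketch of the sort-then-binary-search approach, assuming Strings in natural order. The names SortedOverlap and countMatches are hypothetical; like the question's intersection method, it counts a repeated value in the query list once per occurrence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedOverlap {
    // Counts how many elements of `query` occur in `sortedCandidate`.
    // Precondition: sortedCandidate is sorted in natural order (e.g. by Collections.sort).
    static int countMatches(List<String> query, List<String> sortedCandidate) {
        int matches = 0;
        for (String s : query) {
            // O(log N) per lookup instead of a linear contains() scan.
            if (Collections.binarySearch(sortedCandidate, s) >= 0) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("apple", "kiwi", "plum");
        List<String> candidate = new ArrayList<>(Arrays.asList("plum", "fig", "apple", "pear"));
        Collections.sort(candidate); // sort once, then reuse for every query
        System.out.println(countMatches(list1, candidate)); // 2
    }
}
```

Sorting each candidate costs O(N log N) once, so this pays off when the same sorted lists are queried repeatedly.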
The accepted answer is good, but can still be improved. You can simply use a LinkedHashSet, which takes O(n) to fill with the data and O(1) on average for each contains operation. This helps if your list is big; for small ones, use the sort instead.
If you have duplicate entries in your lists, you may get unexpected results, because your original code adds the same value to the result more than once. In that case, use something like Google Guava's LinkedHashMultiset. If you don't have Guava on your classpath, you will likely have to write one yourself if you want O(1) search time.
As a side note, Collections.sort() alters the original list. If you need the original order later, or the list is unmodifiable, you must sort a copy, in which case I think you should try the set instead: both take about the same time to build, and a HashSet uses less time to perform a contains.
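A minimal sketch of the set-based approach, using a plain HashSet since only membership matters here: list1 is hashed once, then each candidate is scanned once, so the total cost is roughly linear in the combined size of all lists. The names MostSimilar and mostSimilar are hypothetical, and this sketch scores by distinct shared values so duplicates in a candidate are not counted twice.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MostSimilar {
    // Returns the candidate sharing the most distinct values with `target`,
    // or null if `candidates` is empty.
    static List<String> mostSimilar(List<String> target, List<List<String>> candidates) {
        Set<String> targetSet = new HashSet<>(target); // built once: O(n)
        List<String> best = null;
        int bestScore = -1;
        for (List<String> candidate : candidates) {
            // Collect distinct shared values so duplicates are not counted twice.
            Set<String> hits = new HashSet<>();
            for (String s : candidate) {
                if (targetSet.contains(s)) { // O(1) on average
                    hits.add(s);
                }
            }
            if (hits.size() > bestScore) {
                bestScore = hits.size();
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("a", "b", "c", "d");
        List<List<String>> others = Arrays.asList(
                Arrays.asList("a", "x"),
                Arrays.asList("b", "c", "d", "d"),
                Arrays.asList("y", "z"));
        System.out.println(mostSimilar(list1, others)); // [b, c, d, d]
    }
}
```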

iterating through a linkedhashmap but over a certain range

I know how to iterate through a whole LinkedHashMap from the beginning, but what if I only want to iterate through a certain portion of it? For example, I want to start from the end and go only 4 elements back. Is that possible, and how would I do it?
What you are searching for is a ListIterator, which would allow you to iterate backwards in a list. Unfortunately, LinkedHashMap does not expose its links to the previous element through its public API, and thus does not provide this iterator.
So, you end up with two solutions. One, you implement the method to find the X last elements: you hold, let's say an array (a circular buffer) of size X and keep there the last X elements you have seen. This solution is rather inefficient if you call this method frequently and for X much smaller than the size of your map.
A second solution is to keep a HashMap instead of a LinkedHashMap and an extra List to maintain the insertion order. E.g. an ArrayList or a LinkedList which provide a ListIterator and thus, backwards iteration.
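A minimal sketch of the first solution (the circular buffer), using an ArrayDeque bounded to x entries during one forward pass over the map. The names LastEntries and lastX are hypothetical; the result is in insertion order, so walk it from the end to go "backwards".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LastEntries {
    // Returns the last `x` entries of `map` in insertion order, keeping at most
    // x entries in the buffer. One full forward pass: O(size of map).
    static <K, V> List<Map.Entry<K, V>> lastX(LinkedHashMap<K, V> map, int x) {
        if (x <= 0) {
            return new ArrayList<>();
        }
        Deque<Map.Entry<K, V>> buffer = new ArrayDeque<>();
        for (Map.Entry<K, V> e : map.entrySet()) {
            if (buffer.size() == x) {
                buffer.removeFirst(); // drop the oldest entry to make room
            }
            buffer.addLast(e);
        }
        return new ArrayList<>(buffer);
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> map = new LinkedHashMap<>();
        for (int i = 1; i <= 10; i++) {
            map.put("k" + i, i);
        }
        List<Map.Entry<String, Integer>> tail = lastX(map, 4);
        // Start from the end and go back 4 elements.
        for (int i = tail.size() - 1; i >= 0; i--) {
            System.out.println(tail.get(i));
        }
    }
}
```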
You could use the ListIterator for this, by doing something like this.
List list = new ArrayList<>(map.keySet());
ListIterator li = list.listIterator(list.size());
while (li.hasPrevious()) {
System.out.println(map.get(li.previous()));
}
Since the LinkedHashMap maintains insertion order, you can simply create a list from the keys, which will be in the same order. Get a ListIterator positioned at the last index so that you can traverse backwards, with a counter (not shown) to stop after the required number of elements.
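A minimal sketch of the ListIterator approach with the counter filled in. The names BackwardRange and lastValuesReversed are hypothetical. Copying the keys is still O(n), so this mainly simplifies the backward walk rather than avoiding a full pass.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.ListIterator;

class BackwardRange {
    // Collects the values of the last `count` keys, starting from the end of the map.
    static <K, V> List<V> lastValuesReversed(LinkedHashMap<K, V> map, int count) {
        List<K> keys = new ArrayList<>(map.keySet()); // keys in insertion order
        ListIterator<K> li = keys.listIterator(keys.size()); // positioned after the last key
        List<V> result = new ArrayList<>();
        int seen = 0; // the counter that stops the backward walk
        while (li.hasPrevious() && seen < count) {
            result.add(map.get(li.previous()));
            seen++;
        }
        return result;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> map = new LinkedHashMap<>();
        for (int i = 1; i <= 10; i++) {
            map.put("k" + i, i);
        }
        System.out.println(lastValuesReversed(map, 4)); // [10, 9, 8, 7]
    }
}
```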
You would have to extend the standard implementation and override the methods that return iterators so that they return your own:
Iterator<K> newKeyIterator() { return new KeyIterator(); }
Iterator<V> newValueIterator() { return new ValueIterator(); }
Iterator<Map.Entry<K,V>> newEntryIterator() { return new EntryIterator(); }
The LinkedHashMap.Entry nodes form a doubly linked list, so you can go forward and backward.
LinkedHashMap.LinkedHashIterator is a base iterator for LinkedHashMap. Make what you need based on it.

How to make a sorted set with an O(1) random access by index

I need a collection of strings whose elements are kept sorted and free of duplicates, and which can be retrieved by index.
I can use TreeSet, which removes duplicates and sorts everything in order, but cannot retrieve by index. For retrieval by index, I can make an ArrayList and addAll the elements to it, but this addAll takes a lot of time.
or
I can use an ArrayList, insert the required elements, remove duplicates by some other method, and then use Collections.sort to sort them.
But all of these take time. Is there a straightforward way to achieve this: a collection that is sorted, free of duplicates, and offers O(1) random access by index?
There's a data type in Apache Commons Collections called SetUniqueList that I believe meets your needs (note that it enforces uniqueness but does not itself keep the elements sorted). Check it out:
https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/list/SetUniqueList.html
You can use the second idea:
I can use ArrayList, insert required and then remove duplicates by some other method, then using Collections.sort method to sort elements.
but instead of removing the duplicates before the sort, you could sort the ArrayList first, then all duplicates are on consecutive positions and can be removed in a single pass afterwards.
At this point, both your methods have the same overall complexity: O(N*logN) and it's worth noting that you cannot obtain a sorted sequence faster than this anyway (without additional exploitation of some knowledge about the values).
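A minimal sketch of sorting first and then removing the now-adjacent duplicates in a single pass, assuming non-null Strings in natural order. The names SortThenDedupe and sortedUnique are hypothetical; the result is an ArrayList, so access by index is O(1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortThenDedupe {
    // Sorts the strings, then removes duplicates in one pass: after sorting,
    // equal strings sit next to each other. O(N log N) overall.
    static List<String> sortedUnique(List<String> input) {
        List<String> sorted = new ArrayList<>(input);
        Collections.sort(sorted);
        List<String> result = new ArrayList<>(sorted.size());
        for (String s : sorted) {
            // Keep s only if it differs from the last kept element.
            if (result.isEmpty() || !result.get(result.size() - 1).equals(s)) {
                result.add(s);
            }
        }
        return result; // an ArrayList: O(1) access by index
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("pear", "apple", "fig", "apple", "pear");
        System.out.println(sortedUnique(raw)); // [apple, fig, pear]
    }
}
```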
The real problem here is that the OP has not told us the real problem. So lots of people guess at data structures and post answers without really thinking.
The real symptom, as the OP stated in a comment, is that it takes 700ms to put the strings in a TreeSet, and another 700 ms to copy that TreeSet into an ArrayList. Obviously, the program is not doing what the OP thinks it is, as the copy should take at most a few microseconds. In fact, the program below, running on my ancient Thinkpad, takes only 360ms to create 100,000 random strings, put them in a TreeSet, and copy that TreeSet into an ArrayList.
That said, the OP has selected an answer (twice). Perhaps if/when the OP decides to think about the real problem, this example of an SSCCE will be helpful. It's CW, so feel free to edit it.
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;
public class Microbench
{
public static void main(String[] argv)
throws Exception
{
ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
long start = threadBean.getCurrentThreadCpuTime();
executeTest();
long finish = threadBean.getCurrentThreadCpuTime();
double elapsed = (finish - start) / 1000000.0;
System.out.println(String.format("elapsed time = %7.3f ms", elapsed));
}
private static List<String> executeTest()
{
String[] data = generateRandomStrings(100000);
TreeSet<String> set = new TreeSet<String>();
for (String s : data)
set.add(s);
return new ArrayList<String>(set);
}
private static String[] generateRandomStrings(int size)
{
Random rnd = new Random();
String[] result = new String[size];
for (int ii = 0 ; ii < size ; ii++)
result[ii] = String.valueOf(rnd.nextLong());
return result;
}
}
The performance depends on how frequently the elements are added and how frequently they will be accessed by index.
I can use TreeSet, which removes duplicates and sorts everything in order, but cannot retrieve by index. For retrieving by index, I can make an ArrayList and addAll the elements to it, but this addAll takes a lot of time.
List.addAll(yourSortedSet) will take at least O(n) time and space each time you want to access the SortedSet as a List (i.e. by element index).
I can use ArrayList, insert required and then remove duplicates by some other method, then using Collections.sort method to sort elements.
Sorting will take O(n log n) time, which is more than O(n), each time you want a sorted view of your list.
One more solution
If you are not fetching by the index very often then it is more efficient to do it as follows:
Just store the Strings in a SortedSet, and maybe extend TreeSet to provide your own get(int i) method that iterates to the i-th element and returns it. This is O(i): O(n) in the worst case, and much less for small indexes. This way you are not performing any comparison, conversion, or copying of Strings, and no extra space is needed.
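A minimal sketch of the "extend TreeSet" suggestion. The class name IndexedTreeSet is hypothetical; get(int) walks the set's ascending iterator, so it costs O(i) and needs no copy of the elements.

```java
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical subclass adding index-based access by iteration.
class IndexedTreeSet<E> extends TreeSet<E> {
    private static final long serialVersionUID = 1L;

    // Returns the element at 0-based index i in sorted order: O(i), O(n) worst case.
    public E get(int i) {
        if (i < 0 || i >= size()) {
            throw new IndexOutOfBoundsException("Index: " + i + ", Size: " + size());
        }
        Iterator<E> it = iterator();
        for (int k = 0; k < i; k++) {
            it.next(); // skip the elements before index i
        }
        return it.next();
    }

    public static void main(String[] args) {
        IndexedTreeSet<String> set = new IndexedTreeSet<>();
        set.add("pear");
        set.add("apple");
        set.add("fig");
        set.add("apple"); // duplicate, ignored by the set
        System.out.println(set.get(1)); // fig
    }
}
```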
I am not sure: have you tried a map? I mean, use your strings as keys in a TreeMap.
In a HashMap, finding a key's position (via its hash value) is O(1) on average; in a TreeMap it is O(log n). TreeMap's keySet() returns the keys in sorted order.
Does this fit your requirement?
If you are bound to a List at the beginning and end of the operation, convert it into a Set with the "copy" constructor (or addAll) after the elements are populated; this removes the duplicates. If you convert it into a TreeSet with an appropriate Comparator, it will even sort it. Then you can convert it back into a List.
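A minimal sketch of the List to TreeSet to List round trip. TreeSet has no constructor taking both a Collection and a Comparator, so the sketch creates the set with the Comparator and then calls addAll. Duplicates are decided by the Comparator (compare returning 0), which the example shows with String.CASE_INSENSITIVE_ORDER; the names ListToSortedUniqueList and dedupeAndSort are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class ListToSortedUniqueList {
    static List<String> dedupeAndSort(List<String> populated, Comparator<? super String> order) {
        // Elements that compare as 0 under `order` are treated as duplicates and dropped.
        TreeSet<String> set = new TreeSet<>(order);
        set.addAll(populated);
        // Copy back into a List for index-based access.
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("b", "A", "a", "c", "b");
        System.out.println(dedupeAndSort(raw, String.CASE_INSENSITIVE_ORDER)); // [A, b, c]
    }
}
```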
Use a HashMap, which solves the problem of unique values, and then sort with a sorting method; use quicksort if possible.
Maybe use a LinkedList with a boolean method that checks whether an element is already in the list, plus a QuickSort algorithm (note that a LinkedList usually takes more memory per element than an ArrayList, not less). Any Java structure that keeps elements sorted and free of duplicates has to do this work somewhere, so everything takes time...
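There are two ways to do that: use a LinkedMap, where each key in the map is unique, or make your own extension of List and override the add method. A sketch of the second option, assuming the elements are Comparable so the sorted position can be found by binary search:
import java.util.ArrayList;
import java.util.Collections;

// Assumes V is Comparable (consistent with equals) so that add() can keep the list sorted.
public class MyList<V extends Comparable<? super V>> extends ArrayList<V> {
    private static final long serialVersionUID = 5847609794342633994L;

    @Override
    public boolean add(V object) {
        // The list is always sorted, so binary search finds the position (or an equal element).
        int pos = Collections.binarySearch(this, object);
        if (pos >= 0) {
            return false; // already present: keep each object unique
        }
        // binarySearch returns -(insertion point) - 1 when the element is absent.
        super.add(-pos - 1, object);
        return true;
    }
    // Note: add(int, V), addAll() and set() are not overridden and can still break the ordering.
}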
I also faced the problem of finding element at a certain position in a TreeMap. I enhanced the tree with weights that allow accessing elements by index and finding elements at indexes.
The project is called indexed-tree-map https://github.com/geniot/indexed-tree-map . The implementation for finding index of an element or element at an index in a sorted map is not based on linear iteration but on a tree binary search. Updating weights of the tree is also based on vertical tree ascent. So no linear iterations.

Random beginning index iterator for HashSet

I use HashSet for add(); remove(); clear(); iterator(); methods. So far everything worked like a charm. However, now I need to fulfill a different requirement.
I'd like to be able to start iterating from a certain index. For example, I'd like the following two programs to have same output.
Program 1
Iterator it=map.iterator();
for(int i=0;i<100;i++)
{
it.next();
}
while (it.hasNext())
{
doSomethingWith(it.next());
}
Program 2
Iterator it=map.iterator(100);
while (it.hasNext())
{
doSomethingWith(it.next());
}
The reason I don't want to use Program 1 is that it creates unnecessary overhead. In my research, I couldn't find a practical way to create an iterator with a beginning index.
So, my question is, what would be a good way to achieve my goal while minimizing the overhead?
Thank you.
There is a reason why add(), remove(), are fast in a HashSet. You are trading the ability to treat the elements in the set as a random access list for speed and memory costs.
I'm afraid you can't really do that unless you convert your Set into a List first. This is simple to do, but it usually involves processing every element in the Set. If you want to start the iterator from a certain place more than once from the same state, it might make sense. If not, you will probably be better off with your current approach.
And now for the code (assuming that Set<Integer> set = new HashSet<Integer>(); is your declared data structure):
List<Integer> list = new ArrayList<Integer>(set);
list.subList(100, list.size()).iterator(); // this will get your iterator.
A HashSet has no order, so you could put the set into a List, which supports access by index.
Example:
HashSet set = new HashSet();
//......
ArrayList list = new ArrayList(set);
Since a HashSet's iterator produces items in no particular order, it doesn't really make any difference whether you drop 100 items from the beginning or from the end.
Dropping items from the end would be faster.
Iterator it = map.iterator();
int n = map.size() - 100;
for (int i = 0; i < n; i++)
doSomethingWith(it.next());
You can make use of a NavigableMap. If you can rely on keys (and start from a certain key), that's out of the box.
Map<K,V> submap = navigableMap.tailMap(fromKey);
Then you'll use the resulting submap to simply get the iterator() and do your stuff.
Otherwise, if you must start at some index, you may need to make use of a temporary list.
K fromKey = new ArrayList<K>( navigableMap.keySet() ).get(index);
and then get the submap as above.
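A minimal sketch combining both steps for the index case, assuming a TreeMap as the NavigableMap. The names TailFromIndex and tailFromIndex are hypothetical. The temporary key list is O(n) to build; tailMap(fromKey, true) includes fromKey and returns a live view backed by the map, not a copy.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class TailFromIndex {
    // Returns a view of `map` starting at the key with the given 0-based index.
    static <K, V> NavigableMap<K, V> tailFromIndex(NavigableMap<K, V> map, int index) {
        // Temporary list of keys, used only to translate the index into a key: O(n).
        K fromKey = new ArrayList<K>(map.keySet()).get(index);
        // Inclusive tail view beginning at fromKey.
        return map.tailMap(fromKey, true);
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> map = new TreeMap<>();
        for (int i = 0; i < 200; i++) {
            map.put(i * 10, "v" + i);
        }
        NavigableMap<Integer, String> tail = tailFromIndex(map, 100);
        System.out.println(tail.firstKey()); // 1000
        for (Map.Entry<Integer, String> e : tail.entrySet()) {
            // do your stuff with e
        }
    }
}
```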
Following #toader's suggestion.
for(Integer i : new ArrayList<Integer>(set).subList(100, set.size())) {
// from index 100 (the 101st value) onwards.
}
Note: the nth value has no meaning with a HashSet. Perhaps you need a SortedSet, in which case the element at index 100 would be the 101st smallest value.
