Efficient way to Find most similar List<String> - java

I have one List<String> (list1) and 1000 other List<String> instances. I need to choose the list that has the most values exactly matching those in list1.
Currently I go over each List<String>, compare it to list1, save the coverage in a sorted list, and at the end choose the most similar list.
public static <T> List<T> intersection(List<T> list1, List<T> list2) {
    List<T> list = new ArrayList<T>();
    for (T t : list1) {
        if (list2.contains(t)) {
            list.add(t);
        }
    }
    return list;
}
Going over all 1000 lists this way takes a lot of time, given that I have many lists to compare against.
Could you please suggest an efficient way or algorithm to do it?

Your lists are not sorted, so any contains() operation has to search the whole list (or until the element is found, so N/2 comparisons on average).
So first sort all lists (Collections.sort()), then use Collections.binarySearch() to find out whether a String is contained or not. Each lookup then needs only about log N comparisons instead of N/2 as before.
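A minimal sketch of the sort-then-binary-search idea, assuming the candidate list may be copied and sorted up front. The class and method names (SortedMatchCounter, countMatches) are made up for illustration and are not part of the question's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedMatchCounter {
    // Counts how many entries of probe occur in sortedCandidate.
    // Precondition (assumed): sortedCandidate is sorted in natural String order.
    static int countMatches(List<String> probe, List<String> sortedCandidate) {
        int matches = 0;
        for (String s : probe) {
            // binarySearch returns a non-negative index only when s is found
            if (Collections.binarySearch(sortedCandidate, s) >= 0) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("a", "b", "c");
        // Sort a copy so the original candidate list keeps its order
        List<String> candidate = new ArrayList<>(Arrays.asList("d", "c", "a"));
        Collections.sort(candidate);
        System.out.println(countMatches(list1, candidate)); // prints 2
    }
}
```

Each of the 1000 lists only needs to be sorted once; after that, every comparison against a new list1 reuses the sorted copies.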

The accepted answer is good, but it can still be improved. You can simply use a LinkedHashSet: it takes O(n) to dump the data into the set and O(1) on average for each contains() operation. This helps if your lists are big; for small ones, use the sort instead.
If your lists contain duplicate entries, you may get unexpected results, because your original code adds a duplicated value to the result more than once. In that case, use something like Google Guava's LinkedHashMultiset. If you don't have Guava on your classpath, you will likely have to write one yourself if you want O(1) search time.
As a side note, Collections.sort() alters the original list. If you need the original order later, or the list is unmodifiable, you should sort a copy of it. In that case I think you should try the set instead, because the copy and the set take about the same time to build, and a HashSet takes less time to perform a contains().
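The set-based approach could look roughly like the sketch below. It is only an illustration under stated assumptions: similarity is taken to be the number of distinct shared values (so duplicates are counted once, unlike the question's intersection method), and the names MostSimilarList and indexOfMostSimilar are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MostSimilarList {
    // Returns the index of the candidate sharing the most distinct values
    // with list1, or -1 if there are no candidates. Ties go to the earliest.
    static int indexOfMostSimilar(List<String> list1, List<List<String>> candidates) {
        Set<String> wanted = new HashSet<>(list1); // built once, O(n)
        int bestIndex = -1;
        int bestScore = -1;
        for (int i = 0; i < candidates.size(); i++) {
            // Keep only the candidate's values that also occur in list1
            Set<String> common = new HashSet<>(candidates.get(i));
            common.retainAll(wanted);
            if (common.size() > bestScore) {
                bestScore = common.size();
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("a", "b", "c");
        List<List<String>> candidates = Arrays.asList(
                Arrays.asList("x", "y"),
                Arrays.asList("a", "b", "z"),
                Arrays.asList("a"));
        System.out.println(indexOfMostSimilar(list1, candidates)); // prints 1
    }
}
```

Only the running best is tracked here, so no sorted list of coverage values is needed.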

Related

Sorted Lists in Java [duplicate]

In Java there are the SortedSet and SortedMap interfaces. Both belong to the Java Collections framework and provide a sorted way to access the elements.
However, in my understanding there is no SortedList in Java. You can use java.util.Collections.sort() to sort a list.
Any idea why it is designed like that?
List iterators guarantee first and foremost that you get the list's elements in the internal order of the list (a.k.a. insertion order). More specifically, it is the order in which you inserted the elements or otherwise manipulated the list. Sorting can be seen as a manipulation of the data structure, and there are several ways to sort the list.
I'll order the ways in the order of usefulness as I personally see it:
1. Consider using Set or Bag collections instead
NOTE: I put this option at the top because this is what you normally want to do anyway.
A sorted set automatically sorts the collection at insertion, meaning that it does the sorting while you add elements into the collection. It also means you don't need to manually sort it.
Furthermore if you are sure that you don't need to worry about (or have) duplicate elements then you can use the TreeSet<T> instead. It implements SortedSet and NavigableSet interfaces and works as you'd probably expect from a list:
TreeSet<String> set = new TreeSet<String>();
set.add("lol");
set.add("cat");
// automatically sorts natural order when adding
for (String s : set) {
    System.out.println(s);
}
// Prints out "cat" and "lol"
If you don't want the natural ordering you can use the constructor parameter that takes a Comparator<T>.
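As a small illustration of the Comparator constructor, the sketch below builds a TreeSet that keeps its elements in reverse natural order. The class and method names are invented for the example.

```java
import java.util.Comparator;
import java.util.TreeSet;

class ReverseTreeSetDemo {
    // Builds a TreeSet ordered by the reverse of the natural String order.
    static TreeSet<String> reverseSorted(String... values) {
        TreeSet<String> set = new TreeSet<>(Comparator.<String>reverseOrder());
        for (String v : values) {
            set.add(v);
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(reverseSorted("lol", "cat", "dog")); // prints [lol, dog, cat]
    }
}
```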
Alternatively, you can use a Multiset (also known as a Bag), that is, a Set that allows duplicate elements. There are third-party implementations of them; most notably, the Guava libraries provide a TreeMultiset, which works a lot like the TreeSet.
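If pulling in a third-party library is not an option, the idea behind a sorted bag can be approximated with the standard library by counting occurrences in a TreeMap. This is a rough stand-in written for illustration (SortedBag and its methods are hypothetical), not Guava's TreeMultiset API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedBag {
    // Maps each value to the number of times it was added, sorted by key
    private final TreeMap<String, Integer> counts = new TreeMap<>();

    void add(String value) {
        counts.merge(value, 1, Integer::sum);
    }

    // Expands the counts back into a sorted list, duplicates included.
    List<String> toSortedList() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SortedBag bag = new SortedBag();
        bag.add("lol");
        bag.add("cat");
        bag.add("lol");
        System.out.println(bag.toSortedList()); // prints [cat, lol, lol]
    }
}
```

This only works well when equal elements are interchangeable, since the bag stores one key and a count rather than each instance.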
2. Sort your list with Collections.sort()
As mentioned above, sorting of Lists is a manipulation of the data structure. So for situations where you need "one source of truth" that will be sorted in a variety of ways then sorting it manually is the way to go.
You can sort your list with the java.util.Collections.sort() method. Here is a code sample on how:
List<String> strings = new ArrayList<String>();
strings.add("lol");
strings.add("cat");
Collections.sort(strings);
for (String s : strings) {
    System.out.println(s);
}
// Prints out "cat" and "lol"
Using comparators
One clear benefit is that you can pass a Comparator to the sort method. Java also provides some Comparator implementations, such as Collator, which is useful for locale-sensitive sorting of strings. Here is one example:
Collator usCollator = Collator.getInstance(Locale.US);
usCollator.setStrength(Collator.PRIMARY); // ignores casing
Collections.sort(strings, usCollator);
Sorting in concurrent environments
Do note though that using the sort method is not friendly in concurrent environments, since the collection instance will be manipulated, and you should consider using immutable collections instead. This is something Guava provides in the Ordering class and is a simple one-liner:
List<String> sorted = Ordering.natural().sortedCopy(strings);
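If Guava is not on the classpath, a comparable non-mutating sort can be sketched with the standard library alone: copy the list, sort the copy, and hand out a read-only view. The names SortedCopyDemo and sortedCopy below are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedCopyDemo {
    // Returns a sorted, unmodifiable copy; the source list is left untouched.
    static List<String> sortedCopy(List<String> source) {
        List<String> copy = new ArrayList<>(source);
        Collections.sort(copy);
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>(Arrays.asList("lol", "cat"));
        System.out.println(sortedCopy(strings)); // prints [cat, lol]
        System.out.println(strings);             // prints [lol, cat]
    }
}
```

Note that the source list itself still has to be protected from concurrent modification while the copy is being made.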
3. Wrap your list with java.util.PriorityQueue
Though there is no sorted list in Java there is however a sorted queue which would probably work just as well for you. It is the java.util.PriorityQueue class.
Nico Haase linked in the comments to a related question that also answers this.
In a sorted collection you most likely don't want to manipulate the internal data structure which is why PriorityQueue doesn't implement the List interface (because that would give you direct access to its elements).
Caveat on the PriorityQueue iterator
The PriorityQueue class implements the Iterable<E> and Collection<E> interfaces so it can be iterated as usual. However, the iterator is not guaranteed to return elements in the sorted order. Instead (as Alderath points out in the comments) you need to poll() the queue until empty.
Note that you can convert a list to a priority queue via the constructor that takes any collection:
List<String> strings = new ArrayList<String>();
strings.add("lol");
strings.add("cat");
PriorityQueue<String> sortedStrings = new PriorityQueue<String>(strings);
while (!sortedStrings.isEmpty()) {
    System.out.println(sortedStrings.poll());
}
// Prints out "cat" and "lol"
4. Write your own SortedList class
NOTE: You shouldn't have to do this.
You can write your own List class that sorts each time you add a new element. This can get rather computation heavy depending on your implementation and is pointless, unless you want to do it as an exercise, because of two main reasons:
It breaks the contract that List<E> interface has because the add methods should ensure that the element will reside in the index that the user specifies.
Why reinvent the wheel? You should be using the TreeSet or Multisets instead as pointed out in the first point above.
However, if you want to do it as an exercise here is a code sample to get you started, it uses the AbstractList abstract class:
public class SortedList<E> extends AbstractList<E> {
    private ArrayList<E> internalList = new ArrayList<E>();
    // Note that add(E e) in AbstractList is calling this one
    @Override
    public void add(int position, E e) {
        internalList.add(e);
        Collections.sort(internalList, null);
    }
    @Override
    public E get(int i) {
        return internalList.get(i);
    }
    @Override
    public int size() {
        return internalList.size();
    }
}
Note that if you haven't overridden the methods you need, then the default implementations from AbstractList will throw UnsupportedOperationExceptions.
Because the concept of a List is incompatible with the concept of an automatically sorted collection. The point of a List is that after calling list.add(7, elem), a call to list.get(7) will return elem. With an auto-sorted list, the element could end up in an arbitrary position.
Since all lists are already "sorted" by the order the items were added (FIFO ordering), you can "resort" them with another order, including the natural ordering of elements, using java.util.Collections.sort().
EDIT:
Lists as data structures are built on the idea that what matters is the order in which the items were inserted.
Sets do not have that information.
If you want to order by adding time, use List. If you want to order by other criteria, use SortedSet.
Set and Map are non-linear data structures. List is a linear data structure.
TreeSet and TreeMap implement the SortedSet and SortedMap interfaces respectively, using a red-black tree. This ensures that there are no duplicate items (or keys, in the case of a Map).
A List already maintains an ordered, index-based collection; trees are not index-based data structures.
The tree-based collections (TreeSet, TreeMap) cannot contain duplicates.
A List can contain duplicates, so there is no TreeList (i.e. no SortedList).
List maintains elements in insertion order. So if we want to sort the list we have to use java.util.Collections.sort(). It sorts the specified list into ascending order, according to the natural ordering of its elements.
JavaFX SortedList
Though it took a while, Java 8 does have a sorted List.
http://docs.oracle.com/javase/8/javafx/api/javafx/collections/transformation/SortedList.html
As you can see in the javadocs, it is part of the JavaFX collections, intended to provide a sorted view on an ObservableList.
Update: Note that with Java 11, the JavaFX toolkit has moved outside the JDK and is now a separate library. JavaFX 11 is available as a downloadable SDK or from MavenCentral. See https://openjfx.io
For any newcomers, as of April 2015, Android now has a SortedList class in the support library, designed specifically to work with RecyclerView. Here's the blog post about it.
Another point is the time complexity of insert operations.
For a list insert, one expects a complexity of O(1).
But this could not be guaranteed with a sorted list.
And the most important point is that lists assume nothing about their elements.
For example, you can make lists of things that do not implement equals or compare.
Think of it like this: the List interface has methods like add(int index, E element), set(int index, E element). The contract is that once you added an element at position X you will find it there unless you add or remove elements before it.
If any list implementation would store elements in some order other than based on the index, the above list methods would make no sense.
In case you are looking for a way to sort elements, but also be able to access them by index in an efficient way, you can do the following:
Use a random access list for storage (e.g. ArrayList)
Make sure it is always sorted
Then to add or remove an element you can use Collections.binarySearch to get the insertion / removal index. Since your list implements random access, you can efficiently modify the list with the determined index.
Example:
/**
 * @deprecated
 * Only for demonstration purposes. Implementation is incomplete and does not
 * handle invalid arguments.
 */
@Deprecated
public class SortingList<E extends Comparable<E>> {
    private ArrayList<E> delegate;
    public SortingList() {
        delegate = new ArrayList<>();
    }
    public void add(E e) {
        int insertionIndex = Collections.binarySearch(delegate, e);
        // < 0 if element is not in the list, see Collections.binarySearch
        if (insertionIndex < 0) {
            insertionIndex = -(insertionIndex + 1);
        }
        else {
            // Insertion index is index of existing element, to add new element
            // behind it increase index
            insertionIndex++;
        }
        delegate.add(insertionIndex, e);
    }
    public void remove(E e) {
        int index = Collections.binarySearch(delegate, e);
        // A negative index means the element is not in the list
        if (index >= 0) {
            delegate.remove(index);
        }
    }
    public E get(int index) {
        return delegate.get(index);
    }
}
(See a more complete implementation in this answer)
First line in the List API says it is an ordered collection (also known as a sequence). If you sort the list you can't maintain the order, so there is no TreeList in Java.
As the API says, the Java List was inspired by the mathematical sequence; see the properties of a sequence at http://en.wikipedia.org/wiki/Sequence_(mathematics)
This doesn't mean that you can't sort a list, but Java sticks strictly to this definition and doesn't provide sorted versions of lists by default.
I think none of the above answers this question, for the following reasons:
The same functionality can be achieved with other collections such as TreeSet, Collections, PriorityQueue, etc. (But these alternatives impose their own constraints, e.g. a Set removes duplicate elements. And even if they imposed no constraint, this would not answer why the Java community did not create a SortedList.)
List elements do not have to implement compare/equals methods. (This also holds for Set and Map: in general the items do not implement the Comparable interface, but when we need them in sorted order and want to use TreeSet/TreeMap, the items should implement Comparable.)
List uses indexing, which would not work with sorting. (This could easily be handled by introducing an intermediate interface or abstract class.)
None of these gives the exact reason. I believe this kind of question is best answered by the Java community itself, since it can have only one specific answer, but let me try my best to answer it as follows.
As we know, sorting is an expensive operation, and there is a basic difference between List and Set/Map: a List can have duplicates, but a Set/Map cannot.
This is the core reason why we got a default implementation for Set/Map in the form of TreeSet/TreeMap. Internally this is a red-black tree in which every operation (insert/delete/search) has a complexity of O(log N); because of duplicates, a List could not fit into this storage structure.
Now the question arises: couldn't a default sorting method have been chosen for List as well, such as the merge sort used by Collections.sort(list), with a complexity of O(N log N)? The community deliberately did not do this, since there are multiple choices of sorting algorithm for non-distinct elements, such as quicksort, shell sort, radix sort, etc., and there may be more in the future. Also, the same sorting algorithm sometimes performs differently depending on the data to be sorted. Therefore they wanted to keep this option open and left the choice to us. This was not the case with Set/Map, since O(log N) per operation is the best complexity for keeping elements sorted.
https://github.com/geniot/indexed-tree-map
Consider using indexed-tree-map. It is an enhanced version of the JDK's TreeSet that provides access to an element by index and finds the index of an element without iteration or hidden underlying lists backing the tree. The algorithm is based on updating the weights of changed nodes every time there is a change.
There is the Collections.sort(arr) method, which sorts an ArrayList arr. To sort in descending order, use Collections.sort(arr, Collections.reverseOrder()).
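A short sketch of the descending sort mentioned above. It sorts a copy so the input stays as it was; the class and method names are only illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class DescendingSortDemo {
    // Returns a new list with the input's elements in descending order.
    static List<Integer> sortedDescending(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input);
        Collections.sort(copy, Collections.reverseOrder());
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortedDescending(Arrays.asList(3, 1, 2))); // prints [3, 2, 1]
    }
}
```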

SortedList that maintains order like SortedSet but also permits duplicate elements [duplicate]


Sorting a list from the beginning

I have a list of objects that will take lots of adds/removes. I want the list to be sorted according to a certain function.
Right now, every time I add a new object, I do:
list.add(obj1);
Collections.sort(list, comparator);
Removing an object doesn't "unsort" the list, so I only need to do this for the add operation.
However, Collections.sort is O(N log N), which is not very fast when it runs on every add.
Is there any structure in java that allows me to keep a sorted list from the very beginning?
Forgot to mention
I tried to use TreeSet. It allows me to pass a comparator that will be used for sorting, but the same comparator will also be used to remove elements, which is not what I want. I want the collection to be sorted, but the remove functionality to be identical to a list's.
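The TreeSet behavior described here can be reproduced with a small sketch. It assumes a comparator that orders strings by length only, chosen purely for illustration: any two elements the comparator considers equal are treated as the same element for both add and remove.

```java
import java.util.Comparator;
import java.util.TreeSet;

class TreeSetComparatorEquality {
    // Builds a TreeSet that orders (and identifies) strings by length only.
    static TreeSet<String> byLength(String... values) {
        Comparator<String> lengthOnly = Comparator.comparingInt(String::length);
        TreeSet<String> set = new TreeSet<>(lengthOnly);
        for (String v : values) {
            set.add(v);
        }
        return set;
    }

    public static void main(String[] args) {
        TreeSet<String> set = byLength("cat", "dog", "horse");
        // "dog" compares equal to "cat", so it is never added
        System.out.println(set); // prints [cat, horse]
        // "cow" was never added, but it compares equal to "cat"
        set.remove("cow");
        System.out.println(set); // prints [horse]
    }
}
```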
As Alencar proposes, use Collections.binarySearch(yourList, key, comparator) to find the correct insert position. This is far faster than looking up the whole list, since binarySearch only needs log2(size of list) queries to find the correct insertion position.
So, if your insert code was
<T> void sortedInsert(List<T> list, T value, Comparator<T> comparator) {
    ListIterator<T> it = list.listIterator();
    while (it.hasNext()) {
        // stop at the first element that is greater than the new value
        if (comparator.compare(value, it.next()) < 0) {
            it.previous(); // step back so the value is inserted before it
            break;
        }
    }
    it.add(value);
}
... and a faster version would be ...
<T> void sortedInsert2(List<T> list, T value, Comparator<T> comparator) {
    int pos = Collections.binarySearch(list, value, comparator);
    if (pos < 0) {
        pos = -pos - 1; // returns negatives if not found; see javadoc
    }
    list.add(pos, value);
}
Note that the difference may not be that great, because inserting into a non-linked list requires shifting elements up. So if the underlying list is an ArrayList, copying, on average, half the elements one place to the right to make room for the new element results in O(n) extra time per insert. And if the list is linked, you still need to pay that O(n) penalty, but this time to perform the binary search; in that case, it would probably be better to use the first version.
Note that the 1st version uses a ListIterator just in case you decide to use linked lists. For ArrayLists, it would be easier to use list.get(pos) and list.add(pos, value), but those are a very bad idea in the context of iterating linked lists.
Can't you add the object in the "right" place, so that the list is already sorted?
You said that every time you add a new object you sort the list/collection.
Maybe you can use a binary search to find the exact index at which to insert the new value, or use your comparator to find it.

How to make a sorted set with an O(1) random access by index

I need a collection of strings in which the inserted elements are kept sorted and non-duplicate, and can be retrieved by index.
I can use a TreeSet, which removes duplicates and sorts everything in order, but cannot retrieve through index. For retrieving through index, I can make an ArrayList and addAll elements to it, but this addAll takes a lot of time.
or
I can use an ArrayList, insert the required elements, then remove duplicates by some other method, then use the Collections.sort method to sort the elements.
But the thing is, all these take time. Is there any straightforward way to achieve this: a collection that is sorted, non-duplicate, and has O(1) random access by index?
There's a data type in Apache Commons Collections called SetUniqueList that I believe meets your needs perfectly. Check it out:
https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/list/SetUniqueList.html
You can use the second idea:
I can use ArrayList,insert required and then remove duplicates by some
other method, then using Collections.sort method to sort elements.
but instead of removing the duplicates before the sort, you could sort the ArrayList first, then all duplicates are on consecutive positions and can be removed in a single pass afterwards.
At this point, both your methods have the same overall complexity: O(N*logN) and it's worth noting that you cannot obtain a sorted sequence faster than this anyway (without additional exploitation of some knowledge about the values).
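The sort-first, deduplicate-afterwards idea might be sketched as follows, assuming the strings are non-null. The names SortedDistinct and sortedDistinct are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class SortedDistinct {
    // Sorts a copy of the input, then drops duplicates in one pass.
    // After sorting, equal values sit next to each other, so comparing
    // with the last kept value is enough.
    static List<String> sortedDistinct(Collection<String> input) {
        List<String> sorted = new ArrayList<>(input);
        Collections.sort(sorted); // O(N log N)
        List<String> result = new ArrayList<>(sorted.size());
        for (String s : sorted) { // O(N)
            if (result.isEmpty() || !result.get(result.size() - 1).equals(s)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> out = sortedDistinct(Arrays.asList("b", "a", "b", "c", "a"));
        System.out.println(out);        // prints [a, b, c]
        System.out.println(out.get(1)); // prints b (O(1) access by index)
    }
}
```

The result is a snapshot: elements added later need another sort and deduplication pass, or a sorted insert that skips existing values.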
The real problem here is that the OP has not told us the real problem. So lots of people guess at data structures and post answers without really thinking.
The real symptom, as the OP stated in a comment, is that it takes 700ms to put the strings in a TreeSet, and another 700 ms to copy that TreeSet into an ArrayList. Obviously, the program is not doing what the OP thinks it is, as the copy should take at most a few microseconds. In fact, the program below, running on my ancient Thinkpad, takes only 360ms to create 100,000 random strings, put them in a TreeSet, and copy that TreeSet into an ArrayList.
That said, the OP has selected an answer (twice). Perhaps if/when the OP decides to think about the real problem, this example of an SSCCE will be helpful. It's CW, so feel free to edit it.
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;
public class Microbench
{
    public static void main(String[] argv)
    throws Exception
    {
        // CPU time of the current thread, reported in nanoseconds
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        long start = threadBean.getCurrentThreadCpuTime();
        executeTest();
        long finish = threadBean.getCurrentThreadCpuTime();
        double elapsed = (finish - start) / 1000000.0; // nanoseconds to milliseconds
        System.out.println(String.format("elapsed time = %7.3f ms", elapsed));
    }
    // Generates the strings, sorts and de-duplicates them in a TreeSet,
    // then copies the set into an ArrayList for access by index.
    private static List<String> executeTest()
    {
        String[] data = generateRandomStrings(100000);
        TreeSet<String> set = new TreeSet<String>();
        for (String s : data)
            set.add(s);
        return new ArrayList<String>(set);
    }
    private static String[] generateRandomStrings(int size)
    {
        Random rnd = new Random();
        String[] result = new String[size];
        for (int ii = 0 ; ii < size ; ii++)
            result[ii] = String.valueOf(rnd.nextLong());
        return result;
    }
}
The performance depends on how frequently the elements are added and how frequently they will be accessed by index.
I can use a TreeSet, which removes duplicates and keeps everything in order, but it cannot retrieve by index. To retrieve by index, I can make an ArrayList and addAll the elements to it, but this addAll takes a lot of time.
List.addAll(yourSortedSet) will take at least O(n) time and space each time you want to access the SortedSet as a List (i.e. by the index of an element).
I can use an ArrayList, insert the required elements, remove duplicates by some other method, and then sort the elements with Collections.sort.
Sorting will certainly take more than O(n) each time you want a sorted view of your list.
One more solution
If you are not fetching by index very often, it is more efficient to do it as follows:
Just store the Strings in a SortedSet; maybe extend TreeSet and provide your own get(int i) method that iterates to the i-th element and returns it. In the worst case this is O(n), otherwise much less. This way you are not performing any comparison, conversion, or copying of Strings, and no extra space is needed.
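Something along these lines might work as the TreeSet extension; IndexableTreeSet and its get method are invented names for this sketch, not part of the JDK. The method just walks the set's iterator, so it costs O(i) per call.

```java
import java.util.Iterator;
import java.util.TreeSet;

final class IndexableTreeSet<E> extends TreeSet<E> {
    private static final long serialVersionUID = 1L;

    // Returns the element at the given position in sorted order by iterating to it.
    E get(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index: " + index + ", size: " + size());
        }
        Iterator<E> it = iterator();
        for (int i = 0; i < index; i++) {
            it.next(); // skip the first `index` elements
        }
        return it.next();
    }

    public static void main(String[] args) {
        IndexableTreeSet<String> set = new IndexableTreeSet<>();
        set.add("b");
        set.add("a");
        set.add("c");
        set.add("a"); // duplicate, ignored by TreeSet
        System.out.println(set.get(0)); // a
        System.out.println(set.get(2)); // c
    }
}
```

This trades lookup speed for having no second copy of the data, so it only pays off when index access is rare.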
I am not sure, but have you tried a map? I mean using your strings as keys in a TreeMap.
Looking up a key is O(1) in a HashMap (via its hash value) but O(log n) in a TreeMap; in exchange, a TreeMap's keySet() returns a sorted set of keys.
Does this fit your requirement?
If you are bound to a List at the beginning and the end of the operation, convert it into a Set with the "copy" constructor (or addAll) after the elements are populated; this removes the duplicates. If you convert it into a TreeSet with an appropriate Comparator, it will even sort it. Then you can convert it back into a List.
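A short sketch of that round trip, assuming String elements. ListRoundTrip and its two method names are made up for illustration; String.CASE_INSENSITIVE_ORDER is used only as an example Comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class ListRoundTrip {
    // Natural ordering: the TreeSet copy constructor sorts and drops duplicates.
    static List<String> sortedDistinct(List<String> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }

    // Custom ordering: elements that compare as equal count as duplicates.
    static List<String> sortedDistinct(List<String> input, Comparator<? super String> order) {
        TreeSet<String> set = new TreeSet<>(order);
        set.addAll(input);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("pear", "apple", "pear", "fig");
        System.out.println(sortedDistinct(data)); // [apple, fig, pear]
        System.out.println(sortedDistinct(data, String.CASE_INSENSITIVE_ORDER));
    }
}
```

Note that with a Comparator, "duplicate" means "compares as 0", which can be looser than equals().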
Use a HashMap: that solves the problem of unique values, and then sort it with one of the sorting methods. If possible, use quicksort.
Maybe use a LinkedList (which takes less memory than an ArrayList) with a boolean method that determines whether the element is already in the list, plus a quicksort algorithm. All structures in Java have to be sorted and protected from duplicates somehow, I think, so everything takes time...
There are two ways to do that: use a LinkedMap, where each key in the map is unique, or make your own extension of a list and override its add method:
import java.util.ArrayList;
public class MyList<V> extends ArrayList<V> {
    private static final long serialVersionUID = 5847609794342633994L;
    @Override
    public boolean add(V object) {
        // make each object unique
        if (contains(object)) {
            return false;
        }
        // compute the position that matches your ordering here;
        // as a placeholder, this appends at the end
        int position = size();
        // use the inherited add(int, V), which returns void
        super.add(position, object);
        return true;
    }
}
I also faced the problem of finding the element at a certain position in a TreeMap. I enhanced the tree with weights that allow accessing elements by index and finding the index of an element.
The project is called indexed-tree-map https://github.com/geniot/indexed-tree-map . The implementation for finding the index of an element, or the element at an index, in a sorted map is not based on linear iteration but on a binary search of the tree. Updating the weights of the tree is also based on a vertical ascent of the tree. So there are no linear iterations.

Filtering List without using iterator

I need to filter a List of size 1000 or more and get a sublist out of it.
I don't want to use an iterator.
1) At present I am iterating over the List and comparing each element in Java. This is a time-consuming task. I need to improve the performance of my code.
2) I also tried Google Collections (Guava), but I think it also iterates in the background.
Predicate<String> validList = new Predicate<String>() {
    public boolean apply(String aid) {
        return aid.contains("1_15_12");
    }
};
Collection<String> finalList = com.google.common.collect.Collections2.filter(Collection, validList);
Can anyone suggest how I can get the sublist faster without iterating, or, if an iterator has to be used, how to get the result comparatively faster?
Consider what happens if you call size() on your sublist. That has to check every element, as every element may change the result.
If you have a very specialized way of using your list which means you don't touch every element in it, don't use random access, etc, perhaps you don't want the List interface at all. If you could tell us more about what you're doing, that would really help.
A List is an ordered collection of objects, so you must iterate over it in order to filter it.
To expand on my comment:
I think iterating is inevitable during filtering, as each element has to be checked.
Regarding Collections2.filter: it differs from a simple filter in that the returned Collection is still "predicated". That means an IllegalArgumentException will be thrown if an element that does not satisfy the predicate is added to the Collection.
If the performance is really your concern, most probably the predicate is pretty slow. What you can do is to Lists.partition your list, filter in parallel (you have to write this) and then concatenate the results.
There might be better ways to solve your problem, but we would need more information about the predicate and the data in the List.
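The partition, filter-in-parallel, and concatenate idea could be sketched roughly like this. To keep the example free of dependencies it uses List.subList instead of Guava's Lists.partition and java.util.function.Predicate instead of Guava's Predicate; ParallelFilter, chunkSize, and threads are names invented for the sketch. It only helps when the predicate is expensive, and for a cheap check like contains() on 1000 strings the thread overhead will likely dominate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

final class ParallelFilter {
    // Splits input into chunks, filters each chunk on a worker thread,
    // and joins the partial results in the original order.
    static <T> List<T> filter(List<T> input, Predicate<? super T> keep, int chunkSize, int threads) {
        if (chunkSize <= 0 || threads <= 0) {
            throw new IllegalArgumentException("chunkSize and threads must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<T>>> parts = new ArrayList<>();
            for (int from = 0; from < input.size(); from += chunkSize) {
                List<T> chunk = input.subList(from, Math.min(from + chunkSize, input.size()));
                Callable<List<T>> task = () -> {
                    List<T> kept = new ArrayList<>();
                    for (T item : chunk) {
                        if (keep.test(item)) {
                            kept.add(item);
                        }
                    }
                    return kept;
                };
                parts.add(pool.submit(task));
            }
            List<T> result = new ArrayList<>();
            for (Future<List<T>> part : parts) {
                try {
                    result.addAll(part.get()); // futures are read in submission order
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while filtering", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("predicate failed", e.getCause());
                }
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a_1_15_12", "b_2_16_12", "c_1_15_12_x", "d");
        System.out.println(filter(ids, s -> s.contains("1_15_12"), 2, 2));
        // [a_1_15_12, c_1_15_12_x]
    }
}
```

The input list must not be modified while the filter runs, since the chunks are subList views of it.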
