Iterating and comparing big data set - java

I receive two big lists of data from two different databases. The lists look like this:
List 1:
=============
A000001
A000002
A000003
.
.
A999999
List 2:
=============
121111
000111
000003
000001
.
.
I need to compare the two lists and find every item in List 1 that is also in List 2 (after prepending a standard key to the List 2 item). If it is, I put it in a third list for further manipulation. For example, A000001 is in List 1 and also in List 2 (as 000001, once the key "A" is prepended), so it goes in the third list.
My current code does this: for each row in List 1, it iterates through all the data in List 2 and compares. (Both are ArrayLists.)
List<String> list1 = //Data of list 1 from db
List<String> list2 = //Data of list 2 from db
for(String list1Item:list1) {
for(String list2Item:list2) {
String list2ItemAfterAppend = "A" + list2Item;
if(list1Item.equalsIgnoreCase(list2ItemAfterAppend)) {
//Add it to 3rd list
}
}
}
Yes, this logic works fine, but I feel it is not an efficient way to iterate over the lists. With timers in place, it takes 13444 milliseconds on average for a 2000x5000 data set. Is there any other logic you can think of or suggest to improve the performance of this code?
I hope I'm clear; if not, please let me know how I can improve the question.

You can sort both lists, then use a single loop that walks both at once, advancing whichever index points at the smaller value. Something like:
Collections.sort(list1);
Collections.sort(list2);
List<String> list3 = new ArrayList<>();
int index1 = 0;
int index2 = 0;
// Stop as soon as either list is exhausted (this also handles empty lists).
while (index1 < list1.size() && index2 < list2.size()) {
    String val1 = list1.get(index1);
    String val2 = "A" + list2.get(index2);
    int compare = val1.compareTo(val2);
    if (compare == 0) {
        list3.add(val1);
        index1++;
        index2++;
    } else if (compare > 0) {
        index2++; // val2 is smaller, so advance list2
    } else { // compare < 0
        index1++; // val1 is smaller, so advance list1
    }
}
Be careful about what kind of List you're using. get(int i) is expensive on a LinkedList, whereas it is cheap on an ArrayList. Calling list1.size() and list2.size() in the loop condition is fine, since ArrayList stores its size and does not recompute it. I'm not sure how much it helps, but you can create list3 with an initial capacity equal to the size of the smaller list, so list3 never has to resize.
The code above is not tested, but you get the idea. I believe it's faster than yours because the merge loop is O(n+m) rather than O(n*m); sort() and compareTo() add some time compared to your method (sorting is O(n log n)), but normally it shouldn't be too much. Note that compareTo() is case-sensitive, unlike the equalsIgnoreCase() in your code. If you can, have your RDBMS sort both lists when you fetch them, so you don't have to do it in the Java code.

I think it depends on how big the lists are and how much memory you have.
For under 1 million records, I would use a HashSet to make it faster.
The code might look like this:
Set<String> set1 = //Data of list 1 from db. Build a Set instead of a List when you read the data; a HashSet is enough.
List<String> list2 = //Data of list 2 from db
Then you just need to:
for (String list2Item : list2) {
    if (set1.contains("A" + list2Item)) {
        //Add it to 3rd list
    }
}
Hope this can help you.
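A fuller sketch of the HashSet approach, with made-up sample data standing in for the two database results. The names SetLookupSketch and findCommon are hypothetical, and it assumes the keys already share one letter case, since HashSet.contains is case-sensitive (unlike equalsIgnoreCase in the question):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SetLookupSketch {

    // Returns the keys ("A" + list2 item) that also occur in list1.
    static List<String> findCommon(List<String> list1, List<String> list2) {
        Set<String> set1 = new HashSet<>(list1);   // one pass over list1
        List<String> list3 = new ArrayList<>();
        for (String list2Item : list2) {           // one pass over list2
            String key = "A" + list2Item;
            if (set1.contains(key)) {              // expected constant-time lookup
                list3.add(key);
            }
        }
        return list3;
    }

    public static void main(String[] args) {
        // Sample data only; in the question these come from two databases.
        List<String> list1 = Arrays.asList("A000001", "A000002", "A000003");
        List<String> list2 = Arrays.asList("121111", "000111", "000003", "000001");
        System.out.println(findCommon(list1, list2)); // prints [A000003, A000001]
    }
}
```

The result follows the order of list2. Total work is roughly proportional to the sum of the two list sizes rather than their product.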

You can use the intersection method from Apache Commons Collections. Example:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import org.apache.commons.collections4.CollectionUtils;
public class NewClass {
public static void main(String[] args) {
List<String> list1 = Arrays.asList("A000001","A000002","A000003");
List<String> list2 = Arrays.asList("121111","000111","000001");
List<String> list3 = new ArrayList<>();
list2.stream().forEach((s) -> {list3.add("A"+s);});
Collection<String> common = CollectionUtils.intersection(list1, list3);
}
}

You could try the Stream API for this. The code to create the new list with streams is concise and straightforward, and it will probably perform about the same as your loop, since list1::contains still scans list1 for every element:
List<String> list3 = list2.stream()
.map(s->"A"+s)
.filter(list1::contains)
.collect(Collectors.toList());
If the lists are big, you could try processing them in parallel with multiple threads. This may or may not improve performance, so measure to check whether the parallel version is actually faster.
To process the stream in parallel, you only need to call the method parallel on the stream:
List<String> list3 = list2.stream()
.parallel()
.map(s->"A"+s)
.filter(list1::contains)
.collect(Collectors.toList());

Your code does a lot of String manipulation: equalsIgnoreCase converts characters to upper/lower case as it compares. This runs in your inner loop, and with lists of 5000x2000 the String manipulation is done millions of times.
Ideally, get your Strings in either upper or lower case from the database and avoid the conversion inside the inner loop. If that is not possible, converting the case of the Strings once at the beginning will probably improve performance.
Then you could create a new list with the elements of one of the lists and keep only the elements present in the other list. The code with the uppercase conversion could be:
list1.replaceAll(String::toUpperCase);
List<String> list3 = new ArrayList<>(list2);
list3.replaceAll(s->"A"+s.toUpperCase());
list3.retainAll(list1);

Related

Check if some elements are present in a List in Java

I would like to check if multiple elements are present in a list, at the same time.
For example
List<Integer> output = Arrays.asList(1,2,3,4);
Instead of checking for occurrence of 1,2 and 3 in the list as
output.contains(1);
output.contains(2);
output.contains(3);
I would like to know if there is a way to check for all elements in a single line.
if (output.containsAll(Arrays.asList(1,2,3))) {
// Your Code
}
Java has a method for this, called containsAll(). Keep in mind that under the hood it is no faster than calling contains() for each of the elements; for lists the running time is approximately O(n*m), where n and m are the sizes of the two collections.
Create a new list of the elements you wish to check, and then do:
List<Integer> output = Arrays.asList(1,2,3,4);
List<Integer> results = Arrays.asList(1,2,3);
if (output.containsAll(results)) {
//do stuff
}
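If the list being searched is large, one option is to copy it into a HashSet first, so each lookup is expected constant time instead of a scan. This is only a sketch; the class and method names are made up for illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ContainsAllSketch {

    // True if every element of needles occurs in haystack.
    static boolean containsAll(List<Integer> haystack, List<Integer> needles) {
        Set<Integer> lookup = new HashSet<>(haystack); // built once, O(n)
        return lookup.containsAll(needles);            // one hash lookup per needle
    }

    public static void main(String[] args) {
        List<Integer> output = Arrays.asList(1, 2, 3, 4);
        System.out.println(containsAll(output, Arrays.asList(1, 2, 3))); // prints true
        System.out.println(containsAll(output, Arrays.asList(1, 5)));    // prints false
    }
}
```

For a handful of elements, as in the question, calling containsAll directly on the list is simpler and fast enough.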

Most efficient way to find duplicates in a linkedlist of linkedlist of strings - java

Let us suppose we have a linkedlist of linkedlist of strings.
LinkedList<LinkedList<String>> lls = new LinkedList<LinkedList<String>> ();
LinkedList<String> list1 = new LinkedList<String>(Arrays.asList("dog", "cat", "snake"));
LinkedList<String> list2 = new LinkedList<String>(Arrays.asList("donkey", "fox", "dog"));
LinkedList<String> list3 = new LinkedList<String>(Arrays.asList("horse", "cat", "pig"));
lls.add(list1);
lls.add(list2);
lls.add(list3);
As you can see, these 3 linked lists of strings are different but have some elements in common.
My goal is to write a function that compares each list with the others and returns TRUE if there is at least one element in common (dog is in list1 and list2), FALSE otherwise.
I think the first thing I need is to compare all possible pairs of lists, and the comparison between two lists is element by element.
I'm not sure this is the most efficient approach.
Could you suggest a more efficient idea?
Assuming that the given lists should not be changed by removing elements or sorting them (which has O(nlogn) complexity, by the way), you basically need one function as a "building block" for the actual solution. Namely, a function that checks whether one collection contains any element that is contained in another collection.
Of course, this can be solved by using Collection#contains on the second collection. But for some collections (particularly, for lists), contains is O(n), and the overall running time of the check would be O(n*n).
To avoid this, you can create a Set that contains all elements of the second collection. For a hash-based Set such as HashSet or LinkedHashSet, the contains method runs in O(1) expected time.
Then, the actual check can be done conveniently, with Stream#anyMatch:
containing.stream().anyMatch(e -> set.contains(e))
So the complete example could be
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
public class DuplicatesInLinkedLists
{
public static void main(String[] args)
{
LinkedList<LinkedList<String>> lls =
new LinkedList<LinkedList<String>>();
LinkedList<String> list1 =
new LinkedList<String>(Arrays.asList("dog", "cat", "snake"));
LinkedList<String> list2 =
new LinkedList<String>(Arrays.asList("donkey", "fox", "dog"));
LinkedList<String> list3 =
new LinkedList<String>(Arrays.asList("horse", "cat", "pig"));
lls.add(list1);
lls.add(list2);
lls.add(list3);
checkDuplicates(lls);
}
private static void checkDuplicates(
List<? extends Collection<?>> collections)
{
for (int i = 0; i < collections.size(); i++)
{
for (int j = i + 1; j < collections.size(); j++)
{
Collection<?> ci = collections.get(i);
Collection<?> cj = collections.get(j);
boolean b = containsAny(ci, cj);
System.out.println(
"Collection " + ci + " contains any of " + cj + ": " + b);
}
}
}
private static boolean containsAny(Collection<?> containing,
Collection<?> contained)
{
Set<Object> set = new LinkedHashSet<Object>(contained);
return containing.stream().anyMatch(e -> set.contains(e));
}
}
A side note: the code that you posted almost certainly does not make sense in its current form. The declaration and creation of the lists should usually rely on List:
List<List<String>> lists = new ArrayList<List<String>>();
lists.add(Arrays.asList("dog", "cat", "snake"));
...
If the elements of the list have to be modifiable, then you could write
lists.add(new ArrayList<String>(Arrays.asList("dog", "cat", "snake")));
or, analogously, use LinkedList instead of ArrayList, but for the sketched use case, I can't imagine a strong reason to deliberately use LinkedList at all.
Add all the items from all the lists to one single list, then sort it (Collections.sort). Then iterate through it and check for adjacent duplicates.
E.g.
List<String> list = new ArrayList<>();
list.addAll(list1); // add the other lists as well
list.addAll(list2);
list.addAll(list3);
Collections.sort(list);
// After sorting, equal items sit next to each other.
for (int i = 1; i < list.size(); i++) {
    if (list.get(i).equals(list.get(i - 1))) {
        return true;
    }
}
return false;
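Use retainAll(). Note that it modifies list1 in place, so afterwards list1 holds only the elements that appear in every list: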
for (final LinkedList<String> ll : lls)
{
list1.retainAll(ll);
}
System.out.println("list1 = " + list1);
LinkedList is not the best collection for detecting duplicates. If you can, use a HashSet; if you cannot, you can still put all elements from the lists into a set. A HashSet holds no duplicates, so if there is a duplicated element in the lists, the set will contain fewer elements than all the lists combined.
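The size comparison could be sketched as follows. The names are hypothetical, and the sketch assumes that a repeat inside a single list should not count, so each list is reduced to its distinct elements before counting:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SharedElementSketch {

    // True if at least one element occurs in two different lists.
    static boolean haveCommonElement(List<? extends List<String>> lists) {
        Set<String> union = new HashSet<>();
        int distinctPerList = 0;
        for (List<String> list : lists) {
            Set<String> distinct = new HashSet<>(list); // ignore repeats inside one list
            distinctPerList += distinct.size();
            union.addAll(distinct);
        }
        // A shared element is counted once in the union but once per list in the sum.
        return union.size() < distinctPerList;
    }

    public static void main(String[] args) {
        List<List<String>> lls = Arrays.asList(
                Arrays.asList("dog", "cat", "snake"),
                Arrays.asList("donkey", "fox", "dog"),
                Arrays.asList("horse", "cat", "pig"));
        System.out.println(haveCommonElement(lls)); // prints true
    }
}
```

This reports only whether some element is shared, not which lists share it.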
Assuming you want to use LinkedLists and aren't allowed to convert to another data structure, you could create a method that accepts a variable number of LinkedLists. From there, take every unique pair of LinkedLists and compare their elements; if you find a common element, mark that pair of linked lists as having one. How you keep track of or return the data (a set of LinkedList pairs that have an element in common, for example) depends on what your output is supposed to look like, but that's the general structure of the code that I would use.

Exception with ListIterator in Java [duplicate]

Is it possible to add elements to a collection while iterating over it?
More specifically, I would like to iterate over a collection, and if an element satisfies a certain condition I want to add some other elements to the collection, and make sure that these added elements are iterated over as well. (I realise that this could lead to an unterminating loop, but I'm pretty sure it won't in my case.)
The Java Tutorial from Sun suggests this is not possible: "Note that Iterator.remove is the only safe way to modify a collection during iteration; the behavior is unspecified if the underlying collection is modified in any other way while the iteration is in progress."
So if I can't do what I want to do using iterators, what do you suggest I do?
How about building a Queue with the elements you want to iterate over; when you want to add elements, enqueue them at the end of the queue, and keep removing elements until the queue is empty. This is how a breadth-first search usually works.
There are two issues here:
The first issue is adding to a Collection after an Iterator is returned. As mentioned, there is no defined behavior when the underlying Collection is modified, as noted in the documentation for Iterator.remove:
... The behavior of an iterator is
unspecified if the underlying
collection is modified while the
iteration is in progress in any way
other than by calling this method.
The second issue is that, even if an Iterator could be obtained and then returned to the same element it was at, there is no guarantee about the order of the iteration, as noted in the Collection.iterator method documentation:
... There are no guarantees concerning the
order in which the elements are
returned (unless this collection is an
instance of some class that provides a
guarantee).
For example, let's say we have the list [1, 2, 3, 4].
Let's say 5 was added when the Iterator was at 3, and somehow, we get an Iterator that can resume the iteration from 4. However, there is no guarantee that 5 will come after 4. The iteration order may be [5, 1, 2, 3, 4], in which case the iterator will still miss the element 5.
As there is no guarantee to the behavior, one cannot assume that things will happen in a certain way.
One alternative could be to have a separate Collection to which the newly created elements can be added to, and then iterating over those elements:
Collection<String> list = Arrays.asList(new String[]{"Hello", "World!"});
Collection<String> additionalList = new ArrayList<String>();
for (String s : list) {
// Found a need to add a new element to iterate over,
// so add it to another list that will be iterated later:
additionalList.add(s);
}
for (String s : additionalList) {
// Iterate over the elements that needs to be iterated over:
System.out.println(s);
}
Edit
Elaborating on Avi's answer, it is possible to queue up the elements that we want to iterate over into a queue, and remove the elements while the queue has elements. This will allow the "iteration" over the new elements in addition to the original elements.
Let's look at how it would work.
Conceptually, if we have the following elements in the queue:
[1, 2, 3, 4]
And, when we remove 1, we decide to add 42, the queue will be as the following:
[2, 3, 4, 42]
As the queue is a FIFO (first-in, first-out) data structure, this ordering is typical. (As noted in the documentation for the Queue interface, this is not a necessity of a Queue. Take the case of PriorityQueue which orders the elements by their natural ordering, so that's not FIFO.)
The following is an example using a LinkedList (which is a Queue) in order to go through all the elements along with additional elements added during the dequeuing. Similar to the example above, the element 42 is added when the element 2 is removed:
Queue<Integer> queue = new LinkedList<Integer>();
queue.add(1);
queue.add(2);
queue.add(3);
queue.add(4);
while (!queue.isEmpty()) {
Integer i = queue.remove();
if (i == 2)
queue.add(42);
System.out.println(i);
}
The result is the following:
1
2
3
4
42
As hoped, the element 42 which was added when we hit 2 appeared.
You may also want to look at some of the more specialised types, like ListIterator, NavigableSet and (if you're interested in maps) NavigableMap.
Actually it is rather easy. Just think about the optimal way.
I believe the optimal way is:
for (int i = 0; i < list.size(); i++) {
    Level obj = list.get(i);
    // Here execute your code that may or may not add new element(s)
    //...
    i = list.indexOf(obj);
}
This example works perfectly in the most common case, when you don't need to iterate over new elements added before the current element. As for elements added after the current element, you might not want to iterate over them either. In that case, simply add a flag to your objects (or extend them with one) that marks them to be skipped.
Use ListIterator as follows:
List<String> l = new ArrayList<>();
l.add("Foo");
ListIterator<String> iter = l.listIterator(l.size());
while (iter.hasPrevious()) {
    String prev = iter.previous();
    // Your condition goes here. It must eventually be false for the
    // added elements, otherwise the loop never terminates.
    if (true) {
        iter.add("Bah");
        iter.add("Etc");
    }
}
The key is to iterate in reverse order: the added elements then appear on the next iterations.
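A runnable variant of the reverse ListIterator idea, as a sketch. The condition prev.equals("Foo") is an assumption chosen so that the added elements do not trigger further additions and the loop ends; the class and method names are made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class ReverseListIteratorSketch {

    // Walks the list backwards, adding two elements when "Foo" is seen.
    // Returns the elements in the order they were visited.
    static List<String> visitBackwards(List<String> list) {
        List<String> visited = new ArrayList<>();
        ListIterator<String> iter = list.listIterator(list.size());
        while (iter.hasPrevious()) {
            String prev = iter.previous();
            visited.add(prev);
            if (prev.equals("Foo")) {
                // add() inserts before the cursor, so previous() returns these next
                iter.add("Bah");
                iter.add("Etc");
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String> l = new ArrayList<>(Arrays.asList("Foo"));
        System.out.println(visitBackwards(l)); // prints [Foo, Etc, Bah]
        System.out.println(l);                 // prints [Bah, Etc, Foo]
    }
}
```

The added elements are visited in reverse insertion order, and they end up before the element that triggered them.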
I know this is quite old, but I thought it might be of use to someone else. Recently I came across a similar problem where I needed a queue that is modifiable during iteration. I used a ListIterator to implement it, along the same lines as what Avi suggested (see Avi's answer). See if this suits your needs.
ModifyWhileIterateQueue.java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;
public class ModifyWhileIterateQueue<T> {
ListIterator<T> listIterator;
int frontIndex;
List<T> list;
public ModifyWhileIterateQueue() {
frontIndex = 0;
list = new ArrayList<T>();
listIterator = list.listIterator();
}
public boolean hasUnservicedItems () {
return frontIndex < list.size();
}
public T deQueue() {
if (frontIndex >= list.size()) {
return null;
}
return list.get(frontIndex++);
}
public void enQueue(T t) {
listIterator.add(t);
}
public List<T> getUnservicedItems() {
return list.subList(frontIndex, list.size());
}
public List<T> getAllItems() {
return list;
}
}
ModifyWhileIterateQueueTest.java
@Test
public final void testModifyWhileIterate() {
ModifyWhileIterateQueue<String> queue = new ModifyWhileIterateQueue<String>();
queue.enQueue("one");
queue.enQueue("two");
queue.enQueue("three");
for (int i=0; i< queue.getAllItems().size(); i++) {
if (i==1) {
queue.enQueue("four");
}
}
assertEquals(true, queue.hasUnservicedItems());
assertEquals ("[one, two, three, four]", ""+ queue.getUnservicedItems());
assertEquals ("[one, two, three, four]", ""+queue.getAllItems());
assertEquals("one", queue.deQueue());
}
Using iterators...no, I don't think so. You'll have to hack together something like this:
Collection< String > collection = new ArrayList< String >( Arrays.asList( "foo", "bar", "baz" ) );
int i = 0;
while ( i < collection.size() ) {
String curItem = collection.toArray( new String[ collection.size() ] )[ i ];
if ( curItem.equals( "foo" ) ) {
collection.add( "added-item-1" );
}
if ( curItem.equals( "added-item-1" ) ) {
collection.add( "added-item-2" );
}
i++;
}
System.out.println( collection );
Which yields:
[foo, bar, baz, added-item-1, added-item-2]
Besides the solution of using an additional list and calling addAll to insert the new items after the iteration (as e.g. the solution by user Nat), you can also use concurrent collections like the CopyOnWriteArrayList.
The "snapshot" style iterator method uses a reference to the state of the array at the point that the iterator was created. This array never changes during the lifetime of the iterator, so interference is impossible and the iterator is guaranteed not to throw ConcurrentModificationException.
With this special collection (usually used for concurrent access) it is possible to manipulate the underlying list while iterating over it. However, the iterator will not reflect the changes.
Is this better than the other solution? Probably not, I don't know the overhead introduced by the Copy-On-Write approach.
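A small sketch of the CopyOnWriteArrayList behaviour described above: adding inside a for-each loop does not throw, but the loop only sees the snapshot taken when iteration began. The class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CopyOnWriteSketch {

    // Appends s + "!" for every element seen; returns the elements the loop visited.
    static List<String> addWhileIterating(CopyOnWriteArrayList<String> list) {
        List<String> visited = new ArrayList<>();
        for (String s : list) {   // iterates over a snapshot of the list
            visited.add(s);
            list.add(s + "!");    // allowed: each add copies the backing array
        }
        return visited;
    }

    public static void main(String[] args) {
        CopyOnWriteArrayList<String> list =
                new CopyOnWriteArrayList<>(Arrays.asList("a", "b"));
        System.out.println(addWhileIterating(list)); // prints [a, b]
        System.out.println(list);                    // prints [a, b, a!, b!]
    }
}
```

So this fits when the added elements do not need to be visited in the same pass. If they do, a queue-based loop is a closer match.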
public static void main(String[] args)
{
// This array list simulates source of your candidates for processing
ArrayList<String> source = new ArrayList<String>();
// This is the list where you actually keep all unprocessed candidates
LinkedList<String> list = new LinkedList<String>();
// Here we add few elements into our simulated source of candidates
// just to have something to work with
source.add("first element");
source.add("second element");
source.add("third element");
source.add("fourth element");
source.add("The Fifth Element"); // aka Milla Jovovich
// Add first candidate for processing into our main list
list.addLast(source.get(0));
// This is just here so we don't have to have helper index variable
// to go through source elements
source.remove(0);
// We will do this until there are no more candidates for processing
while(!list.isEmpty())
{
// This is how we get next element for processing from our list
// of candidates. Here our candidate is String, in your case it
// will be whatever you work with.
String element = list.pollFirst();
// This is where we process the element, just print it out in this case
System.out.println(element);
// This is simulation of process of adding new candidates for processing
// into our list during this iteration.
if(source.size() > 0) // When simulated source of candidates dries out, we stop
{
// Here you will somehow get your new candidate for processing
// In this case we just get it from our simulation source of candidates.
String newCandidate = source.get(0);
// This is the way to add new elements to your list of candidates for processing
list.addLast(newCandidate);
// In this example we add one candidate per while loop iteration and
// zero candidates when source list dries out. In real life you may happen
// to add more than one candidate here:
// list.addLast(newCandidate2);
// list.addLast(newCandidate3);
// etc.
// This is here so we don't have to use helper index variable for iteration
// through source.
source.remove(0);
}
}
}
For example, we have two lists:
public static void main(String[] args) {
ArrayList a = new ArrayList(Arrays.asList(new String[]{"a1", "a2", "a3","a4", "a5"}));
ArrayList b = new ArrayList(Arrays.asList(new String[]{"b1", "b2", "b3","b4", "b5"}));
merge(a, b);
a.stream().map( x -> x + " ").forEach(System.out::print);
}
public static void merge(List a, List b){
for (Iterator itb = b.iterator(); itb.hasNext(); ){
for (ListIterator it = a.listIterator() ; it.hasNext() ; ){
it.next();
it.add(itb.next());
}
}
}
a1 b1 a2 b2 a3 b3 a4 b4 a5 b5
I prefer to process collections functionally rather than mutate them in place. That avoids this kind of problem altogether, as well as aliasing issues and other tricky sources of bugs.
So, I would implement it like:
List<Thing> expand(List<Thing> inputs) {
List<Thing> expanded = new ArrayList<Thing>();
for (Thing thing : inputs) {
expanded.add(thing);
if (needsSomeMoreThings(thing)) {
addMoreThingsTo(expanded);
}
}
return expanded;
}
IMHO the safer way would be to create a new collection, to iterate over your given collection, adding each element in the new collection, and adding extra elements as needed in the new collection as well, finally returning the new collection.
Given a list List<Object> which you want to iterate over, the easy-peasy way is:
while (!list.isEmpty()){
Object obj = list.get(0);
// do whatever you need to
// possibly list.add(new Object obj1);
list.remove(0);
}
So, you iterate through a list, always taking the first element and then removing it. This way you can append new elements to the list while iterating.
Forget about iterators, they don't work for adding, only for removing. My answer applies to lists only, so don't punish me for not solving the problem for collections. Stick to the basics:
List<ZeObj> myList = new ArrayList<ZeObj>();
// populate the list with whatever
........
int noItems = myList.size();
for (int i = 0; i < noItems; i++) {
ZeObj currItem = myList.get(i);
// when you want to add, simply add the new item at last and
// increment the stop condition
if (currItem.asksForMore()) {
myList.add(new ZeObj());
noItems++;
}
}
I tried ListIterator but it didn't help in my case, where you have to use the list while adding to it. Here's what works for me:
Use LinkedList.
LinkedList<String> l = new LinkedList<String>();
l.addLast("A");
while(!l.isEmpty()){
String str = l.removeFirst();
if(/* Condition for adding new element*/)
l.addLast("<New Element>");
else
System.out.println(str);
}
This could give an exception or run into infinite loops. However, as you have mentioned
I'm pretty sure it won't in my case
checking corner cases in such code is your responsibility.
This is what I usually do, with collections like sets:
Set<T> adds = new HashSet<T>(), dels = new HashSet<T>();
for ( T e: target )
    if ( <has to be removed> ) dels.add ( e );
    else if ( <has to be added> ) adds.add ( <new element> );
target.removeAll ( dels );
target.addAll ( adds );
This creates some extra-memory (the pointers for intermediate sets, but no duplicated elements happen) and extra-steps (iterating again over changes), however usually that's not a big deal and it might be better than working with an initial collection copy.
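A concrete version of this pattern, as a sketch only. The rules used here (remove odd numbers, add the double of each even number) are invented to fill in the placeholders, and the class name is hypothetical:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class DeferredChangesSketch {

    // Collects changes while iterating, then applies them after the loop.
    static void apply(Set<Integer> target) {
        Set<Integer> adds = new HashSet<>();
        Set<Integer> dels = new HashSet<>();
        for (Integer e : target) {
            if (e % 2 != 0) {
                dels.add(e);      // odd: schedule for removal
            } else {
                adds.add(e * 2);  // even: schedule its double for addition
            }
        }
        target.removeAll(dels);   // safe: the iteration is over
        target.addAll(adds);
    }

    public static void main(String[] args) {
        Set<Integer> target = new HashSet<>(Arrays.asList(1, 2, 3, 4));
        apply(target);
        System.out.println(target.equals(new HashSet<>(Arrays.asList(2, 4, 8)))); // prints true
    }
}
```

The elements added this way are not visited by the loop; a second pass over adds is needed if they require processing too.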
Even though we cannot add items to the same list during iteration, we can use Java 8's flatMap, to add new elements to a stream. This can be done on a condition. After this the added item can be processed.
Here is a Java example that shows how to add an object to the ongoing stream depending on a condition, and then process it along with the other elements:
List<Integer> intList = new ArrayList<>();
intList.add(1);
intList.add(2);
intList.add(3);
intList = intList.stream().flatMap(i -> {
if (i == 2) return Stream.of(i, i * 10); // condition for adding the extra items
return Stream.of(i);
}).map(i -> i + 1)
.collect(Collectors.toList());
System.out.println(intList);
The output of the toy example is:
[2, 3, 21, 4]
In general, it's not safe, though for some collections it may be. The obvious alternative is to use some kind of for loop. But you didn't say what collection you're using, so that may or may not be possible.

Maintain order in ArrayList after sorting and removing duplicates

Hello I'd like to add Strings to an ArrayList and then sort it to remove duplicates. The order should remain the same way I added those Strings though.
What I want: [randomtext, testtext, anothertext]
What I get: [anothertext, randomtext, testtext]
Is this possible or is there an easier way?
ArrayList<String> abc = new ArrayList();
abc.add("randomtext");
abc.add("testtext");
abc.add("anothertext");
abc.add("randomtext");
abc.add("testtext");
abc.add("anothertext");
abc.add("randomtext");
abc.add("testtext");
abc.add("anothertext");
Collections.sort(abc);
for (int i = 1; i < abc.size() ; i++)
{
if(abc.get(i) == abc.get(i-1))
{
abc.remove(i);
i -= 1;
}
}
System.out.print(abc);
The best way is to ensure you don't add duplicates whenever you add something to the list.
if(!myList.contains(item)){
myList.add(item);
}
If you are receiving a List from outside the scope of your method/class, then the easiest may be adding the contents to a LinkedHashSet to eliminate duplicates and then getting them back out. LinkedHashSet maintains order.
LinkedHashSet<String> set = new LinkedHashSet<>();
set.addAll(myList); // assuming myList is List<String>
myList.clear();
myList.addAll(set);
EDIT: My answer is based on your statements (bold by me for emphasis)
Hello I'd like to add Strings to an ArrayList and then sort it to remove duplicates. The order should remain the same way I added those Strings though.
So you're only sorting to remove duplicates. My answer avoids the sort and puts the burden on LinkedHashSet.
Try this (convert your List into LinkedHashSet)
Set<String> a = new LinkedHashSet<String>(abc);
System.out.println(a);

How to search an array for a part of string?

I have an ArrayList<String> of words. I sort it using Collections.sort(wordsList);
I'm using this list for an auto-suggest drop-down box, so that when the user types a letter, they are given a list of suggestions similar to what they are typing.
How do I go about searching this list for a prefix of a string? Say the user types in "mount" and the list contains the word "mountain": how can I search the list and return similar values?
Here's my code so far:
public List<Interface> returnSuggestedList(String prefix) {
String tempPrefix = prefix;
suggestedPhrases.clear();
//suggestedPhrases = new ArrayList<Interface>();
//Vector<String> list = new Vector<String>();
//List<Interface> interfaceList = new ArrayList<Interface>();
Collections.sort(wordsList);
System.out.println("Sorted Vector contains : " + wordsList);
int i = 0;
while (i != wordsList.size()) {
int index = Collections.binarySearch(wordsList, prefix);
String tempArrayString = wordsList.get(index).toString();
if (tempArrayString.toLowerCase().startsWith(prefix.toLowerCase())) {
ItemInterface itemInt = new Item(tempArrayString);
suggestedPhrases.add(itemInt);
System.out.println(suggestedPhrases.get(i).toString());
System.out.println("Element found at : " + index);
}
i++;
}
return suggestedPhrases;
}
The most basic approach would be
List<String> result = new ArrayList<String>();
for (String str : words) {
    if (str.contains(keyword)) {
        result.add(str);
    }
}
You can improve on this version: if you only care about startsWith rather than contains, you can distribute the words in a HashMap (for example, keyed by first letter) so that each search covers a narrower set.
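One way to read the HashMap suggestion is to bucket the words by their first character, so a prefix search only scans one bucket. That keying scheme is an assumption, not something the answer specifies, and all names below are made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PrefixBucketsSketch {

    // Groups words by their first character (empty strings are skipped).
    static Map<Character, List<String>> index(List<String> words) {
        Map<Character, List<String>> buckets = new HashMap<>();
        for (String w : words) {
            if (w.isEmpty()) {
                continue;
            }
            buckets.computeIfAbsent(w.charAt(0), k -> new ArrayList<>()).add(w);
        }
        return buckets;
    }

    // Scans only the bucket that matches the first character of the prefix.
    static List<String> startingWith(Map<Character, List<String>> buckets, String prefix) {
        List<String> result = new ArrayList<>();
        if (prefix.isEmpty()) {
            return result;
        }
        List<String> bucket = buckets.getOrDefault(prefix.charAt(0), Collections.emptyList());
        for (String w : bucket) {
            if (w.startsWith(prefix)) {
                result.add(w);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Character, List<String>> buckets =
                index(Arrays.asList("mountain", "mouse", "apple", "mount"));
        System.out.println(startingWith(buckets, "mount")); // prints [mountain, mount]
    }
}
```

The index is built once and reused for every keystroke. The search is case-sensitive as written.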
For this task, there are better data structures than a sorted array of strings. You might look e.g. at DAWG (Directed acyclic word graph).
If wordList is fixed (does not change from one method call to the next), you should sort it somewhere else, because sorting is costly, and store it in lowercase.
In the rest of the method you would do something like:
List<String> selected = new ArrayList<String>();
String lowerPrefix = prefix.toLowerCase();
for (String w : wordList) {
    if (w.startsWith(lowerPrefix)) // or .contains(), depending on
        selected.add(w);           // what you want exactly
}
return selected;
Also see the trie data structure. This question has useful info. I should think its getPrefixedBy() will be more efficient than anything you can roll by hand quickly.
Of course, this will work for prefix searches only. Contains search is a different beast altogether.
As @Jiri says, you can use a DAWG, but if you don't want to go that far you can do some simple and useful things.
Make use of the sorting
If you want to sort the array of words, do it beforehand; don't sort it each time.
As it's sorted, you can find the first and the last word in the list that match. Then use list.subList(from, to) to return the sublist. It's a little more efficient than adding each one.
Use a pre-sorted structure
Use a TreeSet<String> for storing the strings (they will be sorted internally).
Then use treeSet.subSet(from, true, to, false);
Where from is the prefix and to is the "prefix plus one char". For example, if you're looking for abc, to must be abd. If you don't want to do that char transformation, you can ask for treeSet.tailSet(from) and iterate over it until the elements no longer start with the prefix.
This is especially useful if you read more than you write. Inserting into a sorted structure costs a little, but once the strings are ordered you can find them very fast (O(log n)).
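The subSet idea could be sketched like this. The helper name withPrefix is hypothetical, and the sketch assumes the last character of the prefix is not Character.MAX_VALUE, so that incrementing it gives a valid upper bound:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class PrefixSearchSketch {

    // Returns all words that start with prefix, in sorted order.
    static List<String> withPrefix(NavigableSet<String> words, String prefix) {
        if (prefix.isEmpty()) {
            return new ArrayList<>(words);
        }
        // Upper bound: the prefix with its last char incremented ("abc" -> "abd").
        char last = prefix.charAt(prefix.length() - 1);
        String upper = prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
        // from inclusive, to exclusive
        return new ArrayList<>(words.subSet(prefix, true, upper, false));
    }

    public static void main(String[] args) {
        NavigableSet<String> words = new TreeSet<>(
                Arrays.asList("mount", "mountain", "mouse", "moon", "mound"));
        System.out.println(withPrefix(words, "mount")); // prints [mount, mountain]
    }
}
```

The set is built once; each lookup then locates the range without scanning the whole collection.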
Case insensitive comparing
You can provide a Comparator<String> to the tree set to indicate how it must order the strings. You can implement it yourself, or use the built-in case-insensitive comparator String.CASE_INSENSITIVE_ORDER.
If you implement it yourself, its code should be:
int compare(String a, String b) {
return a.toLowerCase().compareTo(b.toLowerCase());
}
Here is a similar example:
-> http://samuelsjoberg.com/archive/2009/10/autocompletion-in-swing
