Binary search over a list of pairs - java

I need to find the pair whose elem matches a given element.
My program works but it is not efficient. I have a very large ArrayList<Obj> pairs (more than 4000 elements) and I use a binary search to find matching indexes.
public int search(String element) {
ArrayList<String> list = new ArrayList<String>();
for (int i = 0; i < pairs.size(); i++) {
list.add(pairs.get(i).getElem());
}
return Collections.binarySearch(list, element);
}
I wonder if there is a more efficient way than using a loop to copy half of the ArrayList pairs into a new ArrayList list.
Constructor for Obj: Obj x = new Obj(String elem, String word);

If your master list (pairs) does not change then I'd recommend creating a TreeMap to maintain reverse index structure, e.g.:
List<String> pairs = new ArrayList<String>(); //list containing 4000 entries
Map<String, Integer> indexMap = new TreeMap<>();
int index = 0;
for(String element : pairs){
indexMap.put(element, index++);
}
Now, while searching for an element, all you need to do is :
indexMap.get(element);
That will give you the required index or null if element doesn't exist. Also, if an element can be present in the list multiple times then, you can change the indexMap to be Map<String, List<Integer>>.
Your current algorithm copies the whole list on every call and then runs the binary search, so each lookup costs O(n) for the copy plus O(log n) for the search. A TreeMap lookup is guaranteed O(log n), so once the map is built it will be much quicker.
Here's the documentation of TreeMap.
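Here is a minimal sketch of the reverse-index idea applied to the question's list of pairs. The Obj class below is a hypothetical stand-in (the question only shows its constructor, plus a getElem() call), and PairIndex is a name made up for illustration. It uses a HashMap instead of the TreeMap above, since sorted order isn't needed just to look up an index; putIfAbsent keeps the first index when an elem occurs more than once.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the question's Obj class.
class Obj {
    private final String elem;
    private final String word;

    Obj(String elem, String word) {
        this.elem = elem;
        this.word = word;
    }

    String getElem() { return elem; }
    String getWord() { return word; }
}

// Hypothetical helper: builds the index once, then answers lookups from it.
class PairIndex {
    private final Map<String, Integer> indexByElem = new HashMap<>();

    // O(n) up front; each later lookup is O(1) on average.
    PairIndex(List<Obj> pairs) {
        for (int i = 0; i < pairs.size(); i++) {
            indexByElem.putIfAbsent(pairs.get(i).getElem(), i);
        }
    }

    // Index of the first pair whose elem matches, or -1 if there is none.
    int search(String element) {
        return indexByElem.getOrDefault(element, -1);
    }
}
```

As with the TreeMap version, this only stays correct while pairs does not change; the index has to be rebuilt after any insert or remove.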

It looks like the problem is solved.
My issue was that the pairs list holds Obj while element is a String, so I couldn't pass element to Collections.binarySearch directly. I decided to create a probe object:
Obj x = new Obj(element, "");
The empty string doesn't seem to cause any issues (it passed my JUnit tests), because my compareTo method compares the two elems and ignores the second field of Obj x.
My updated method:
public int search(String element) {
Obj x = new Obj(element, "");
return Collections.binarySearch(pairs, x);
}
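For anyone trying the probe-object approach, here is a self-contained sketch. The Obj class is a guess at what the asker's class looks like (only its constructor and getElem() appear in the question), and PairSearch is a made-up holder for the method. It assumes pairs is already sorted by elem, which Collections.binarySearch requires.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical version of Obj whose natural ordering looks only at elem.
class Obj implements Comparable<Obj> {
    private final String elem;
    private final String word;

    Obj(String elem, String word) {
        this.elem = elem;
        this.word = word;
    }

    String getElem() { return elem; }
    String getWord() { return word; }

    @Override
    public int compareTo(Obj other) {
        return elem.compareTo(other.elem); // word is deliberately ignored
    }
}

class PairSearch {
    // pairs must be sorted by elem; otherwise the result is undefined.
    // Returns the index of a match, or a negative value if there is none.
    static int search(List<Obj> pairs, String element) {
        return Collections.binarySearch(pairs, new Obj(element, ""));
    }
}
```

No copy of the list is made, so each lookup is O(log n) on an ArrayList.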

Related

How to check multiple contains operations faster?

I have a String list as below. I want to do some calculations based on if this list has multiple elements with same value.
I got nearly 120k elements and when I run this code it runs too slow. Is there any faster approach than contains method?
List<String> words= getWordsFromDB(); //words list has nearly 120k elements
List<String> tempWordsList = new LinkedList<String>(); //empty list
String[] keys = getKeysFromDB();
List<String> tempKeysList = new LinkedList<String>();
for (int x = 0; x < words.size(); x++) {
if (!tempWordsList.contains(words.get(x))) {
tempWordsList.add(words.get(x));
String key= keys[x];
tempKeysList.add(key);
} else {
int index = tempWordsList.indexOf(words.get(x));
String m = tempKeysList.get(index);
String n = keys[x];
if (!m.contains(n)) {
String newWord = m + ", " + n;
tempKeysList.set(index, newWord);
}
}
}
EDIT: the words list comes from a database, and the problem is that a service is continuously updating and inserting data into this table. I don't have any access to this service, and there are other applications that use the same table.
EDIT2: I have updated for full code.
You are searching the list twice per word: once for contains() and once for indexOf(). You could replace contains() with indexOf(), test the result for -1, and otherwise reuse the result instead of calling indexOf() again. But you are almost certainly using the wrong data structure. What exactly do you need the index for? Do you need it at all? I would use a HashSet, or a HashMap if you need to associate other data with each word.
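A rough sketch of the HashMap idea for the code in the question, assuming words and keys are parallel (keys[i] belongs to words.get(i)). KeyGrouping and groupKeysByWord are names invented here. Note one difference from the original loop: it skips a key that is already present by exact equality, whereas the question's m.contains(n) is a substring test.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class KeyGrouping {
    // Maps each distinct word to the distinct keys seen for it, in first-seen order.
    static Map<String, Set<String>> groupKeysByWord(List<String> words, String[] keys) {
        Map<String, Set<String>> grouped = new LinkedHashMap<>();
        for (int i = 0; i < words.size(); i++) {
            // One hash lookup per word instead of contains() plus indexOf().
            grouped.computeIfAbsent(words.get(i), w -> new LinkedHashSet<>()).add(keys[i]);
        }
        return grouped;
    }
}
```

To get the "k1, k3" style strings from the question, String.join(", ", grouped.get(word)) can be applied to each value.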
//1) if you can avoid using linked list use below solution
List<String> words= getWordsFromDB(); //words list has nearly 120k elements
//if you can avoid using linked list, use set instead
Set<String> set=new HashSet<>();
for (String s:words) {
if (!set.add(s)) {
//do some calculations
}
}
//2) if you can't avoid using linked list use below code
List<String> words= getWordsFromDB(); //words list has nearly 120k elements
List<String> tempList = new LinkedList<String>(); //empty list
//if you can't avoid LinkedList (tempList) you need to use a set
Set<String> set=new HashSet<>();
for (String s:words) {
if (set.add(s)) {
tempList.add(s);
} else {
int a = tempList.indexOf(s);
//do some calculations
}
}
LinkedList.get() runs in O(N) time. Either use ArrayList with O(1) lookup time, or avoid indexed lookups altogether by using an iterator:
for (String word : words) {
if (!tempList.contains(word)) {
tempList.add(word);
} else {
int firstIndex = tempList.indexOf(word);
//do some calculations
}
}
Disclaimer: The above was written under the questionable assumption that words is a LinkedList. I would still recommend the enhanced-for loop, since it's more conventional and its time complexity is not implementation-dependent. Either way, the suggestion below still stands.
You can further improve by replacing tempList with a HashMap. This will avoid the O(N) cost of contains() and indexOf():
Map<String, Integer> indexes = new HashMap<>();
int index = 0;
for (String word : words) {
Integer firstIndex = indexes.putIfAbsent(word, index++);
if (firstIndex != null) {
//do some calculations
}
}
Based on your latest update, it looks like you're trying to group "keys" by their corresponding "word". If so, you might give streams a spin:
List<String> words = getWordsFromDB();
String[] keys = getKeysFromDB();
Collection<String> groupedKeys = IntStream.range(0, words.size())
.boxed()
.collect(Collectors.groupingBy(
words::get,
LinkedHashMap::new, // if word order is significant
Collectors.mapping(
i -> keys[i],
Collectors.joining(", "))))
.values();
However, as mentioned in the comments, it would probably be best to move this logic into your database query.
Actually, tempList uses methods with linear time complexity:
if (!tempList.contains(words.get(x))) {
and
int a = tempList.indexOf(words.get(x));
It means that on each invocation, on average half of the list is traversed.
Besides, the two calls are redundant.
Calling indexOf() alone is enough:
for (int x = 0; x < words.size(); x++) {
int indexWord = tempList.indexOf(words.get(x));
if (indexWord == -1) {
tempList.add(words.get(x));
} else {
//do some calculations by using indexWord
}
}
But to speed up all of these lookups, you should change your data structure: wrap or replace the LinkedList with a LinkedHashSet.
A LinkedHashSet keeps the current behavior because, like a List, it defines an iteration order (the order in which elements were inserted into the set), but it also uses hashing to speed up access to its elements.

Java: matching ArrayList strings to an iterator, and incrementing the integers of a different ArrayList at the same index

noob here, so sorry if I say anything dumb.
I'm comparing strings in an ArrayList to an iterator of strings in an iterator of Sets. When I find a match, I want to grab the index of matched string in the ArrayList and increment that same index in a different ArrayList of integers. I have something that looks (to me) like it should work, but after this code runs, my integer ArrayList contains mostly -1 with a few 2,1, and 0.
I'm interested in fixing my code first, but I'd also be interested in different approaches, so here's the larger picture: I have a map where the keys are usernames in a social network, and the values are sets of usernames of people they follow. I need to return a list of all usernames in descending order of followers. In the code below I'm only trying to make an ArrayList of strings (that contains ALL the usernames in the map) that corresponds with a different ArrayList of integers like:
usernamesList ... numberOfFollowers
theRealJoe ... 7
javaNovice ... 3
FakeTinaFey ... 3
etc
Map<String, Set<String>> map = new HashMap<String, Set<String>>();
//edit: this map is populated. It's a parameter of the method I'm trying to write.
List<String> usernamesList = new ArrayList<String>();
//populate usernamesList with all strings in map
Iterator<Set<String>> setIter = map.values().iterator();
Iterator<String> strIter;
int strIterIndex = 0;
int w = 0;
List<Integer> numOfFollowers = new ArrayList<Integer>();
//initialize all elements to 0. not sure if necessary
for (int i = 0; i < usernamesList.size(); i++) {
numOfFollowers.add(0);
}
while (setIter.hasNext()) {
Set<String> currentSetIter = setIter.next();
strIter = currentSetIter.iterator();
while (strIter.hasNext()) {
String currentstrIter = strIter.next();
if (usernamesList.contains(currentstrIter)) {
strIterIndex = usernamesList.indexOf(currentstrIter);
numOfFollowers.set(strIterIndex, numOfFollowers.indexOf(strIterIndex) +1);
w++;
System.out.println("if statement has run " + w + " times." );
} else {
throw new RuntimeException("Should always return true. all usernames from guessFollowsGraph should be in usernamesList");
}
}
}
I think everything looks ok, except this one:
numOfFollowers.set(strIterIndex, numOfFollowers.indexOf(strIterIndex) +1);
When you call numOfFollowers.indexOf, you are looking for the index of an element whose value is strIterIndex. What you want is the value (follower count) of the element at index strIterIndex:
numOfFollowers.set(strIterIndex, numOfFollowers.get(strIterIndex) +1);
I would also suggest using int[] (array) instead of a list of indices. It would be faster and more straightforward.
Oh, one more thing: correct the "fake" constructors please, they won't work since there is no "new" keyword after the assignment...
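Since the question also asks about different approaches, here is one possible sketch that skips the parallel lists entirely and counts followers in a map. This is not the fix above, just an alternative; FollowerRanking and byFollowersDescending are made-up names, and it assumes the map has the shape described in the question (username to the set of usernames that user follows).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FollowerRanking {
    // followsGraph: username -> set of usernames that user follows.
    static List<String> byFollowersDescending(Map<String, Set<String>> followsGraph) {
        Map<String, Integer> followerCount = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : followsGraph.entrySet()) {
            followerCount.putIfAbsent(entry.getKey(), 0); // users nobody follows still appear
            for (String followed : entry.getValue()) {
                followerCount.merge(followed, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(followerCount.keySet());
        // Highest count first; ties are broken alphabetically so the order is deterministic.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(followerCount.get(b), followerCount.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return ranked;
    }
}
```

Each follow relationship is touched once, so there is no contains() or indexOf() scan per username.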

having problems with arraylist arrayList<int[]>

This is the question I am trying to answer:
Write a method which takes a sparse array as an argument and returns
a new equivalent dense array. The dense array only needs to be large enough to fit all of the values. For example, the resulting dense array only needs to hold 90 values if the last element in the sparse array is at index 89.
dense array: [3,8,4,7,9,0,5,0] (the numbers are generated randomly).
sparse array is an ArrayList of arrays: [[0,3],[1,8],[2,4],[3,7],[4,9],[6,5]]
So in the sparse array, if the generated number is not 0, the value and its index are stored in an array of size 2; if the generated number is 0, nothing is stored.
When each element in your collection has a fixed size (as an array), your solution is OK and it is fast.
But when your elements do not have a fixed size, such as [[1,2,3],[4,5],[6],[7,8,9,10,11]], you can iterate through each element:
for(int[] e : sparseArr)
{
for(int number : e)
{
tree.add(number);
}
}
This works no matter how many elements sparseArr has or how long each element is.
To sort the elements, I recommend TreeSet<E>: elements added to the tree are kept sorted automatically.
So if you just want to store 2 Integers paired together I recommend going with HashMaps. In your case you would use:
HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
HashMaps support .containsKey(key); as well as .containsValue(value);
If you want to check all entries you can transform the Map to an entrySet:
for(Entry<Integer, Integer> e : map.entrySet()) {
int one = e.getKey();
int two = e.getValue();
}
Unless you want to do something more special than just storing 2 paired Integers I really can recommend doing it this way!
The method you're after should do something like this
public int[] sparseToDense (ArrayList<int[]> sparse) {
int i = 0;
// the last stored index is the highest one, so the array needs that index + 1 slots
int[] dense = new int[sparse.get(sparse.size()-1)[0] + 1];
int[] sp;
ListIterator<int[]> iter = sparse.listIterator();
while (iter.hasNext()) {
sp = iter.next();
while (sp[0] != i) {
dense[i++] = 0;
}
dense[i++] = sp[1];
}
return dense;
}
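A slightly shorter variant of the method above, as a sketch: it relies on Java zero-filling new int arrays, so the gaps never have to be written explicitly. It assumes each entry is {index, value} and that entries are sorted by index, as in the question's example; SparseArrays and toDense are names chosen here.

```java
import java.util.List;

class SparseArrays {
    // Each entry is {index, value}; entries are assumed sorted by index.
    static int[] toDense(List<int[]> sparse) {
        if (sparse.isEmpty()) {
            return new int[0];
        }
        int lastIndex = sparse.get(sparse.size() - 1)[0];
        int[] dense = new int[lastIndex + 1]; // new int arrays start out as all zeros
        for (int[] entry : sparse) {
            dense[entry[0]] = entry[1];
        }
        return dense;
    }
}
```

For the sparse list in the question this gives an array of length 7, because the last stored index is 6.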
Just another way to do that: since you have Java 8, you can use streams. But if you're a beginner, I recommend trying with for loops and arrays first; that will be better for your learning.
public static ArrayList<Integer> returnDense(ArrayList<int[]> sparse) {
return sparse.stream().flatMap(p -> IntStream.of(p).boxed())
.collect(Collectors.toCollection(ArrayList::new));
}
Also, if you decide to change int[] to Integer[]:
public ArrayList<Integer> returnDense(ArrayList<Integer[]> sparse) {
return sparse.stream().flatMap(p -> Arrays.asList(p).stream()).filter(Objects::nonNull)
.collect(Collectors.toCollection(ArrayList::new));
}
.filter(Objects::nonNull) makes sure the result contains no null values; if you know there won't be any, it isn't necessary.

Fastest way to find substring in JAVA

Let's say I have a list of names.
ArrayList<String> nameslist = new ArrayList<String>();
nameslist.add("jon");
nameslist.add("david");
nameslist.add("davis");
nameslist.add("jonson");
and this list contains a few thousand names. What is the fastest way to count how many names in the list start with a given name?
String name = "jon";
The result should be 2.
I have tried comparing every element of the list using the substring function. It works, but it is very slow, especially when the list is huge.
Thanks in advance.
You could use a TreeSet for O(log n) access and write something like:
TreeSet<String> set = new TreeSet<String>();
set.add("jon");
set.add("david");
set.add("davis");
set.add("jonson");
set.add("henry");
Set<String> subset = set.tailSet("jon");
int count = 0;
for (String s : subset) {
if (s.startsWith("jon")) count++;
else break;
}
System.out.println("count = " + count);
which prints 2 as you expect.
Alternatively, you could use Set<String> subset = set.subSet("jon", "joo"); to return the full set of all names that start with "jon", but you need to give the first invalid entry that follows the jons (in this case: "joo").
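The "joo" bound can be computed rather than hard-coded. A small sketch, under these assumptions: the prefix is non-empty, its last character is not Character.MAX_VALUE, and duplicate names don't matter (a set keeps only one copy of each). PrefixCount and countWithPrefix are made-up names.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

class PrefixCount {
    // Counts the distinct strings in the sorted set that start with prefix.
    static int countWithPrefix(NavigableSet<String> sorted, String prefix) {
        // Bump the last character to get the first string past the range: "jon" -> "joo".
        char last = prefix.charAt(prefix.length() - 1);
        String upper = prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
        return sorted.subSet(prefix, true, upper, false).size();
    }

    public static void main(String[] args) {
        NavigableSet<String> names = new TreeSet<>();
        names.add("jon");
        names.add("david");
        names.add("davis");
        names.add("jonson");
        names.add("henry");
        System.out.println(countWithPrefix(names, "jon")); // prints 2
    }
}
```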
Have a look at Trie. It's a data structure aimed to perform fast searches according to word prefixes. You may need to manipulate it a bit in order to get the number of leafs in the subtree, but in any case you do not traverse the entire list.
The complexity of searching in ArrayList (or linear array) is O(n), where n is number of elements in array.
For best performance you can see Trie
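For reference, a bare-bones trie could look like the sketch below. Instead of counting leaves in a subtree at query time, each node stores how many inserted words pass through it, so a prefix count is just a walk down the prefix. PrefixTrie and its method names are invented for this example; it counts duplicates separately, unlike a set.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int wordsThroughHere; // how many inserted words pass through this node
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        node.wordsThroughHere++; // every word passes through the root
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
            node.wordsThroughHere++;
        }
    }

    // Number of inserted words that start with prefix; cost depends only on prefix length.
    int countPrefix(String prefix) {
        Node node = root;
        for (int i = 0; i < prefix.length(); i++) {
            node = node.children.get(prefix.charAt(i));
            if (node == null) {
                return 0;
            }
        }
        return node.wordsThroughHere;
    }
}
```

Building the trie costs time proportional to the total length of all names, so it pays off when the same list is queried many times.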
Iterate on the ArrayList, for each element, check if it begins with jon. Time complexity is O(n).
What exactly does "very slow" mean?
Really the only way to do this is to loop through the list and check every element:
int count = 0;
for (String name : nameslist) {
if (name.startsWith("jon")) {
count++;
}
}
System.out.println("Found: " + count);
If your strings in list are not too long you can use this cheat: store in HashSet all prefixes and your complexity will be ~O(1):
// Preprocessing
List<String> list = Arrays.asList("hello", "world"); // Your list
Set<String> set = new HashSet<>();
for (String s : list) {
for (int i = 1; i <= s.length(); i++) {
set.add(s.substring(0, i));
}
}
// Now you want to test
assert set.contains("wor");
If the strings are too long for that, you can use a full-text search engine like Apache Lucene.
I'd suggest you to create a Runnable for processing the list elements. Then you create an ExecutorService with fixed pool size, which processes the elements concurrently.
Rough example:
ExecutorService executor = Executors.newFixedThreadPool(5);
for (String str : coll){
Runnable r = new StringProcessor(str);
executor.execute(r);
}
I suggest TreeSet.
Similarly, you can visit the elements in sorted order and increment a count, stopping early once you are past the matches. Algorithmically, this improves performance.
int count = 0;
Iterator<String> iter = list.iterator();
String name;
while(iter.hasNext()) {
name = iter.next();
if (name.startsWith("jon")) {
count++;
}
if(name.startsWith("k")) break;
}
This break skips the remaining string comparisons; it is only valid if the elements are iterated in sorted order.
You can consider Boyer–Moore string search algorithm.
complexity O(n+m) worst case.
You need to iterate each name and find the name within it.
String name = "jon";
int count=0;
for(String n:nameslist){
if (n.contains(name)) {
count++;
}
}

Why do I get an UnsupportedOperationException when trying to remove an element from a List?

I have this code:
public static String SelectRandomFromTemplate(String template,int count) {
String[] split = template.split("|");
List<String> list=Arrays.asList(split);
Random r = new Random();
while( list.size() > count ) {
list.remove(r.nextInt(list.size()));
}
return StringUtils.join(list, ", ");
}
I get this:
06-03 15:05:29.614: ERROR/AndroidRuntime(7737): java.lang.UnsupportedOperationException
06-03 15:05:29.614: ERROR/AndroidRuntime(7737): at java.util.AbstractList.remove(AbstractList.java:645)
What would be the correct way to do this? Java.15
Quite a few problems with your code:
On Arrays.asList returning a fixed-size list
From the API:
Arrays.asList: Returns a fixed-size list backed by the specified array.
You can't add to it; you can't remove from it. You can't structurally modify the List.
Fix
Create a LinkedList, which supports faster remove.
List<String> list = new LinkedList<String>(Arrays.asList(split));
On split taking regex
From the API:
String.split(String regex): Splits this string around matches of the given regular expression.
| is a regex metacharacter; if you want to split on a literal |, you must escape it to \|, which as a Java string literal is "\\|".
Fix:
template.split("\\|")
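If the double backslash is easy to forget, Pattern.quote does the escaping for you. A tiny sketch (SplitDemo and splitOnPipe are just illustrative names):

```java
import java.util.regex.Pattern;

class SplitDemo {
    // Splits on a literal '|' character.
    static String[] splitOnPipe(String template) {
        // Pattern.quote turns "|" into a regex that matches it literally,
        // which here behaves the same as writing "\\|".
        return template.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        System.out.println(splitOnPipe("a|b|c").length); // prints 3
    }
}
```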
On better algorithm
Instead of calling remove one at a time with random indices, it's better to generate enough random numbers in the range, and then traverse the List once with a listIterator(), calling remove() at the appropriate indices. There are questions on Stack Overflow on how to generate random but distinct numbers in a given range.
With this, your algorithm would be O(N).
This one has burned me many times. Arrays.asList creates a fixed-size list: you can set elements, but you cannot add or remove them.
From the Javadoc: Returns a fixed-size list backed by the specified array.
Create a new list with the same content:
newList.addAll(Arrays.asList(newArray));
This will create a little extra garbage, but you will be able to mutate it.
Probably because you're working with unmodifiable wrapper.
Change this line:
List<String> list = Arrays.asList(split);
to this line:
List<String> list = new LinkedList<>(Arrays.asList(split));
The list returned by Arrays.asList() is fixed-size, so it cannot be resized. Could you try
List<String> list = new ArrayList<>(Arrays.asList(split));
I think that replacing:
List<String> list = Arrays.asList(split);
with
List<String> list = new ArrayList<String>(Arrays.asList(split));
resolves the problem.
Just read the JavaDoc for the asList method:
Returns a {@code List} of the objects
in the specified array. The size of
the {@code List} cannot be modified,
i.e. adding and removing are
unsupported, but the elements can be
set. Setting an element modifies the
underlying array.
This is from Java 6 but it looks like it is the same for the android java.
EDIT
The type of the resulting list is Arrays.ArrayList, which is a private class inside Arrays.class. Practically speaking, it is nothing but a List view on the array that you've passed to Arrays.asList. One consequence: if you change the array, the list changes too. And because an array is not resizable, the remove and add operations must be unsupported.
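The view behaviour is easy to see in a few lines. This is only a demonstration sketch (AsListView is a made-up class name), not anything from the Android or JDK sources:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AsListView {
    public static void main(String[] args) {
        String[] array = {"a", "b", "c"};
        List<String> view = Arrays.asList(array);

        view.set(0, "z"); // allowed: writes through to array[0]
        array[1] = "y";   // also visible via view.get(1)
        System.out.println(array[0] + " " + view.get(1)); // prints "z y"

        try {
            view.remove(0); // structural change on a fixed-size view
        } catch (UnsupportedOperationException e) {
            System.out.println("remove is unsupported");
        }

        List<String> copy = new ArrayList<>(view); // independent, resizable copy
        copy.remove(0);
        System.out.println(copy.size() + " " + view.size()); // prints "2 3"
    }
}
```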
The issue is that Arrays.asList() creates a List with a fixed length, meaning that we can't add or remove elements.
See the block of code below.
This loop throws an exception because the list was created by asList(), so remove and add are not possible; it is backed by a fixed-size array:
List<String> words = Arrays.asList("pen", "pencil", "sky", "blue", "sky", "dog");
for (String word : words) {
if ("sky".equals(word)) {
words.remove(word);
}
}
This works because we copy the elements into a new ArrayList, which supports removal. Use removeIf (or an explicit Iterator) rather than calling remove inside a for-each loop, which would throw a ConcurrentModificationException:
List<String> words1 = new ArrayList<String>(Arrays.asList("pen", "pencil", "sky", "blue", "sky", "dog"));
words1.removeIf("sky"::equals);
Arrays.asList() returns a list that doesn't allow operations affecting its size (note that this is not the same as "unmodifiable").
You could do new ArrayList<String>(Arrays.asList(split)); to create a real copy, but seeing what you are trying to do, here is an additional suggestion (you have a O(n^2) algorithm right below that).
You want to remove list.size() - count (let's call this k) random elements from the list. Just pick that many random elements and swap them into the last k positions of the list, then delete that whole range (e.g. using subList() and clear() on that). That would turn it into a lean and mean O(n) algorithm (O(k) is more precise).
Update: As noted below, this algorithm only makes sense if the elements are unordered, e.g. if the List represents a Bag. If, on the other hand, the List has a meaningful order, this algorithm would not preserve it (polygenelubricants' algorithm instead would).
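The swap-to-the-end idea might look roughly like this. It is a sketch that assumes a resizable random-access list such as ArrayList and 0 <= count; RandomTrim and trimToSize are names picked for the example, and as noted, the order of the surviving elements is not preserved.

```java
import java.util.Collections;
import java.util.List;
import java.util.Random;

class RandomTrim {
    // Removes random elements until only count remain; does not preserve order.
    static <T> void trimToSize(List<T> list, int count, Random random) {
        int size = list.size();
        for (int end = size; end > count; end--) {
            // Pick a random element from [0, end) and park it in slot end - 1.
            Collections.swap(list, random.nextInt(end), end - 1);
        }
        if (size > count) {
            list.subList(count, size).clear(); // one bulk removal from the tail
        }
    }
}
```

It would be called on a copy such as new ArrayList<>(Arrays.asList(split)), since the fixed-size list itself cannot be cleared.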
Update 2: So in retrospect, a better (linear, maintaining order, but with O(n) random numbers) algorithm would be something like this:
LinkedList<String> elements = ...; //to avoid the slow ArrayList.remove()
int k = elements.size() - count; //elements to select/delete
int remaining = elements.size(); //elements remaining to be iterated
for (Iterator i = elements.iterator(); k > 0 && i.hasNext(); remaining--) {
i.next();
if (random.nextInt(remaining) < k) {
//or (random.nextDouble() < (double)k/remaining)
i.remove();
k--;
}
}
This UnsupportedOperationException occurs when you try to perform an operation that the collection does not allow. In your case, Arrays.asList does not return a java.util.ArrayList; it returns a java.util.Arrays$ArrayList, which is a fixed-size list. You cannot add to it and you cannot remove from it.
I've got another solution for that problem:
List<String> list = Arrays.asList(split);
List<String> newList = new ArrayList<>(list);
work on newList ;)
Replace
List<String> list=Arrays.asList(split);
with
List<String> list = new ArrayList<>();
list.addAll(Arrays.asList(split));
or
List<String> list = new ArrayList<>(Arrays.asList(split));
or
List<String> list = new ArrayList<String>(Arrays.asList(split));
or (Better for Remove elements)
List<String> list = new LinkedList<>(Arrays.asList(split));
Yes, on Arrays.asList, returning a fixed-size list.
Other than using a linked list, you can simply use the list's addAll method.
Example:
String idList = "123,222,333,444";
List<String> parentRecepeIdList = new ArrayList<String>();
parentRecepeIdList.addAll(Arrays.asList(idList.split(",")));
parentRecepeIdList.add("555");
You can't remove, nor can you add to a fixed-size-list of Arrays.
But you can create your sublist from that list.
list = list.subList(0, list.size() - (list.size() - count));
public static String SelectRandomFromTemplate(String template, int count) {
String[] split = template.split("\\|");
List<String> list = Arrays.asList(split);
Random r = new Random();
while( list.size() > count ) {
list = list.subList(0, list.size() - (list.size() - count));
}
return StringUtils.join(list, ", ");
}
Another way is:
ArrayList<String> al = new ArrayList<String>(Arrays.asList(split));
This creates an ArrayList that is not fixed-size, unlike the list returned by Arrays.asList.
Arrays.asList() uses a fixed-size array internally.
You can't dynamically add to or remove from the list returned by Arrays.asList().
Use this:
ArrayList<String> narraylist = new ArrayList<>(Arrays.asList());
In narraylist you can easily add or remove items.
List<String> narraylist = Arrays.asList(); // Returns a fixed-size list
To make it resizable, the solution would be:
ArrayList<String> narraylist = new ArrayList<>(Arrays.asList());
Following is snippet of code from Arrays
public static <T> List<T> asList(T... a) {
return new ArrayList<>(a);
}
/**
* @serial include
*/
private static class ArrayList<E> extends AbstractList<E>
implements RandomAccess, java.io.Serializable
{
private static final long serialVersionUID = -2764017481108945198L;
private final E[] a;
So when the asList method is called, it returns a list of its own private static class, which does not override the add function from AbstractList to store elements in the array. So by default, the add method in AbstractList throws an exception.
So it is not a regular ArrayList.
Creating a new list and populating valid values in new list worked for me.
Code throwing the error:
List<String> list = new ArrayList<>();
for (String s : list) {
if (s == null || s.trim().isEmpty()) {
list.remove(s);
}
}
desiredObject.setValue(list);
After the fix:
List<String> list = new ArrayList<>();
List<String> newList = new ArrayList<>();
for (String s : list) {
if (s == null || s.trim().isEmpty()) {
continue;
}
newList.add(s);
}
desiredObject.setValue(newList);
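On Java 8 or later, the same filtering can probably be done in place with removeIf, as long as the list is a resizable one such as ArrayList (on a list from Arrays.asList it would still fail when something has to be removed). BlankFilter and removeBlank are hypothetical names for this sketch.

```java
import java.util.List;

class BlankFilter {
    // Removes null and blank strings from a resizable list, in place.
    static void removeBlank(List<String> list) {
        // removeIf handles the iteration internally, so there is no
        // remove call inside a for-each loop over the same list.
        list.removeIf(s -> s == null || s.trim().isEmpty());
    }
}
```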
