How can I optimize the following:
final String[] longStringArray = {"1","2","3".....,"9999999"};
String searchingFor = "9999998";
for(String s : longStringArray)
{
if(searchingFor.equals(s))
{
//After 9999998 iterations finally found it
// Do the rest of stuff here (not relevant to the string/array)
}
}
NOTE: The longStringArray is only searched once per runtime & is not sorted & is different every other time I run the program.
I'm sure there is a way to improve the worst-case performance here, but I can't seem to find it...
P.S. I would also appreciate a solution that handles the case where searchingFor does not exist in longStringArray.
Thank you.
Well, if you have to use an array, and you don't know if it's sorted, and you're only going to do one lookup, it's always going to be an O(N) operation. There's nothing you can do about that, because any optimization step would be at least O(N) to start with - e.g. populating a set or sorting the array.
Other options though:
If the array is sorted, you could perform a binary search. This will turn each lookup into an O(log N) operation.
If you're going to do more than one search, consider using a HashSet<String>. This will turn each lookup into an O(1) operation (assuming few collisions).
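The three cases above can be sketched roughly as follows. The class and method names are made up for illustration, and the binary search assumes the array is already sorted in natural String order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class; the names are illustrative only.
class LookupSketch {
    // One-off lookup in an unsorted array: a plain linear scan, O(N).
    static int linearIndexOf(String[] array, String target) {
        for (int i = 0; i < array.length; i++) {
            if (target.equals(array[i])) {
                return i;
            }
        }
        return -1; // target is not in the array
    }

    // Sorted array: binary search, O(log N) per lookup.
    // Assumes sortedArray is already sorted in natural String order.
    static boolean containsSorted(String[] sortedArray, String target) {
        return Arrays.binarySearch(sortedArray, target) >= 0;
    }

    // Many lookups: pay O(N) once to build the set,
    // then each contains() call is O(1) on average.
    static Set<String> buildIndex(String[] array) {
        return new HashSet<>(Arrays.asList(array));
    }
}
```

For a single lookup, building the set costs more than the scan it replaces, which is why it only pays off when the same data is searched repeatedly.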
import org.apache.commons.lang.ArrayUtils;
ArrayUtils.indexOf(array, string);
ArrayUtils documentation
You can create a second array with the hash codes of the string and binary search on that.
You will have to sort the hash array and reorder the elements of the original array to match. This gives you very fast lookups, but the arrays must be kept ordered, so inserting new elements is costly.
If you really have that much data and need to handle inserts, implementing a binary tree or a B-tree would be the best option and worth the effort.
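A minimal sketch of the sorted hash-code array idea, assuming the input contains no null elements. The class name is hypothetical. Different strings can share a hash code, so a hash match is always confirmed with equals. Building the index costs O(N log N), so it only helps when many lookups follow.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical index class; assumes the input has no null elements.
class HashIndexSketch {
    private final String[] sorted; // copy of the input, ordered by hashCode()
    private final int[] hashes;    // hashes[i] == sorted[i].hashCode(), ascending

    HashIndexSketch(String[] input) {
        sorted = input.clone();
        Arrays.sort(sorted, Comparator.comparingInt(String::hashCode));
        hashes = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            hashes[i] = sorted[i].hashCode();
        }
    }

    boolean contains(String target) {
        int h = target.hashCode();
        int pos = Arrays.binarySearch(hashes, h);
        if (pos < 0) {
            return false; // no element has this hash code
        }
        // Strings with equal hash codes sit next to each other, so check
        // the whole run of equal hashes on both sides of pos.
        for (int i = pos; i >= 0 && hashes[i] == h; i--) {
            if (sorted[i].equals(target)) {
                return true;
            }
        }
        for (int i = pos + 1; i < hashes.length && hashes[i] == h; i++) {
            if (sorted[i].equals(target)) {
                return true;
            }
        }
        return false;
    }
}
```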
Arrays.asList(longStringArray).contains(searchingFor)
Related
I'm looking for the most efficient way to determine whether a specific value exists in a small (16 element) array of integers in Java. The array is unsorted.
Options:
A boring but reliable for loop
Sort the array then Arrays.binarySearch(arr, targetVal)
List.contains method - example Arrays.asList(arr).contains(targetVal)
Something else.
Option 3 must have some overhead in "converting" to a List but I could use a List throughout rather than an array if that would be better overall. I've no feel for how List performs speed wise.
Because the array is unsorted, any search on it has O(n) complexity.
You could try your second option. In that case the cost is O(n*log(n)) for the sort plus O(log(n)) for the search.
But with such a small array and only a single search, a simple loop is the better choice, because it is hard to predict how long the conversion to a List will take, which sorting algorithm will be used, and so on.
A plain loop is a good choice.
FYI: a Stream will not be efficient in your case.
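The plain loop might look like the sketch below (the class and method names are illustrative). One caveat about the List.contains option, assuming arr is an int[]: Arrays.asList(arr) produces a List<int[]> holding the array itself as its single element, so contains(targetVal) would always return false. That option only works as written for an Integer[] or a real List<Integer>.

```java
// Hypothetical helper; a linear scan over a small primitive array.
class SmallArraySearch {
    static boolean contains(int[] values, int target) {
        for (int v : values) {
            if (v == target) {
                return true; // stop at the first match
            }
        }
        return false;
    }
}
```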
I have some code, and I noticed that the progress of iterating through an ArrayList became drastically slower over time. The code that seems to be causing the problem is as below:
public boolean isWordOfficial(String word){
return this.wordList.get(this.stringWordList.indexOf(word)).isWordOfficial();
}
Is there something about this code I don't know in terms of accessing the two arraylists?
I don't know exactly why, or by how much, your ArrayList performance is degrading, but from a quick glance at your use case, you are doing the following operations:
given a String word, look it up in stringWordList and return its numerical index
look up the entry in wordList at this index and return it
This pattern of usage would better be served by a Map, where the key would be the input word, possibly corresponding to an entry in stringWordList, and the output another word, from wordList.
A map lookup would be an O(1) operation, as compared to O(N) for the lookups in a list.
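A minimal sketch of the Map-based lookup. The Word class here is a hypothetical stand-in for whatever wordList holds in the question, and returning false for an unknown word is an assumption (the original code would throw an exception when indexOf returns -1).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical replacement for the two parallel lists.
class WordIndexSketch {
    // Stand-in for the element type of wordList; not from the question.
    static class Word {
        private final String text;
        private final boolean official;

        Word(String text, boolean official) {
            this.text = text;
            this.official = official;
        }

        String getText() {
            return text;
        }

        boolean isWordOfficial() {
            return official;
        }
    }

    // Maps the word text directly to its Word object.
    private final Map<String, Word> wordsByText = new HashMap<>();

    void add(Word word) {
        wordsByText.put(word.getText(), word);
    }

    // Average O(1) per call, regardless of how many words are stored.
    boolean isWordOfficial(String word) {
        Word found = wordsByText.get(word);
        return found != null && found.isWordOfficial();
    }
}
```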
this.stringWordList.indexOf is O(N) and that is the cause of your issue. As N increases (you add words to the list) these operations take longer and longer.
To avoid this, keep your list sorted and use Collections.binarySearch.
This takes each lookup from O(N) to O(log N).
In Java, I have an ArrayList with a list of objects. Each object has a date field that is just a long data type. The ArrayList is sorted by the date field. I want to insert a new object into the ArrayList so that it appears in the correct position with regard to its date. The only solution I can see is to iterate through all the items and compare the date field of the object being inserted to the objects being iterated on and then insert it once I reach the correct position. This will be a performance issue if I have to insert a lot of records.
What are some possible ways to improve this performance? Maybe an ArrayList is not the best solution?
I would say that you are correct in making the statement:
Maybe an ArrayList is not the best solution
Personally, I think that a tree structure would be better suited for this, specifically a binary search tree keyed on the object's date. Once the tree is built, searching it takes O(log n) time, provided the tree stays balanced.
Whether or not binary search + O(n) insertion is bad for you depends on at least these things:
size of the list,
access pattern (mostly insert or mostly read),
space considerations (ArrayList is far more compact than the alternatives).
Given the existence of these factors and their quite complex interactions you should not switch over to a binary search tree-based solution until you find out how bad your current design is—through measurements. The switch might even make things worse for you.
I would consider using a TreeSet and making your item Comparable. Then you get everything out of the box.
If this is not possible, I would search for the index via Collections.binarySearch(...).
EDIT: Make sure performance is an issue before you start optimizing
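The Collections.binarySearch route could look like the sketch below. The Event class and its fields are hypothetical; the question only says each object carries a long date. Finding the position is O(log n), but ArrayList.add(index, item) still shifts the following elements, so the insert as a whole remains O(n).

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for inserting into a list that is sorted by date.
class SortedInsertSketch {
    // Stand-in element type; only the long date field comes from the question.
    static class Event {
        final long date;
        final String label;

        Event(long date, String label) {
            this.date = date;
            this.label = label;
        }
    }

    static final Comparator<Event> BY_DATE = Comparator.comparingLong(e -> e.date);

    // Assumes sortedByDate is already sorted by date; keeps it sorted.
    static void insertSorted(List<Event> sortedByDate, Event item) {
        int pos = Collections.binarySearch(sortedByDate, item, BY_DATE);
        if (pos < 0) {
            // Not found: binarySearch returns -(insertionPoint) - 1.
            pos = -(pos + 1);
        }
        sortedByDate.add(pos, item);
    }
}
```

If several objects can share the same date, note that a TreeSet ordered by date alone would treat them as duplicates and keep only one, whereas the list version keeps all of them (in unspecified relative order).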
First, sort the ArrayList using:
ArrayList<Integer> arr = new ArrayList<>();
...
Collections.sort(arr);
Then find the index with:
int index = Collections.binarySearch(arr, 5);
More specifically, suppose I have an array with duplicates:
{3,2,3,4,2,2,1,4}
I want a data structure that supports searching for and removing the first occurrence of some value in less than O(n) time. For example, if the value is 4, the array becomes:
{3,2,3,2,2,1,4}
I also need to iterate the list from head according to the same order. Other operations like get(index) or insert are not needed.
You can use O(n) time to record the original data (say it's an int[]) in your data structure; I just need the later search and remove to be faster than O(n).
"Search and remove" is considered as ONE operation as shown above.
If I had to build it myself, I would use a LinkedList to store the data, and a HashMap to map every key to a list of all the nodes where it occurs, together with their previous and next nodes.
Is this the right approach? Are there any better choices already available in Java?
The data structure you describe, essentially a hybrid linked list and map, I think is the most efficient way of handling your stated problem. You'll have to keep track of the nodes yourself, since Java's LinkedList doesn't provide access to the actual nodes. The AbstractSequentialList may be helpful here.
The index structure you'll need is a map from an element value to the appearances of that element in the list. I recommend a hash table from hashCode % modulus to a linked list of (value, list of main-list nodes).
Note that this approach is still O(n) in the worst case, when you have universal hash collisions; this applies whether you use open or closed hashing. In the average case it should be something closer to O(ln(n)), but I'm not prepared to prove that.
Consider also whether the overhead of keeping track of all of this is really worth the gains. Unless you've actually profiled running code and determined that a LinkedList is causing problems because remove is O(n), stick with that until you do.
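A rough sketch of the hybrid structure, using int values as in the question. It uses a hand-rolled doubly linked list plus a java.util.HashMap from each value to a queue of its nodes, rather than the custom hash table described above. The class and method names are illustrative. Since insertion after construction is not required, each queue stays in list order and the head of the queue is always the first remaining occurrence.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical structure: linked list for order, map for finding nodes.
class IndexedListSketch {
    private static final class Node {
        final int value;
        Node prev;
        Node next;

        Node(int value) {
            this.value = value;
        }
    }

    private Node head;
    private Node tail;
    // For each value, its nodes in list order (earliest occurrence first).
    private final Map<Integer, Deque<Node>> occurrences = new HashMap<>();

    // O(n) construction from the original data.
    IndexedListSketch(int[] data) {
        for (int v : data) {
            Node n = new Node(v);
            if (tail == null) {
                head = n;
            } else {
                tail.next = n;
                n.prev = tail;
            }
            tail = n;
            occurrences.computeIfAbsent(v, k -> new ArrayDeque<>()).addLast(n);
        }
    }

    // Removes the first occurrence of value; average O(1).
    boolean removeFirst(int value) {
        Deque<Node> nodes = occurrences.get(value);
        if (nodes == null) {
            return false; // value is not present
        }
        Node n = nodes.pollFirst();
        if (nodes.isEmpty()) {
            occurrences.remove(value);
        }
        // Unlink the node from the list.
        if (n.prev == null) {
            head = n.next;
        } else {
            n.prev.next = n.next;
        }
        if (n.next == null) {
            tail = n.prev;
        } else {
            n.next.prev = n.prev;
        }
        return true;
    }

    // Iterates from the head in the original order.
    List<Integer> toList() {
        List<Integer> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) {
            out.add(n.value);
        }
        return out;
    }
}
```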
Since your requirement is that the first occurrence of the element should be removed and the remaining occurrences retained, there would be no way to do it faster than O(n), as you would definitely have to move through to the end of the list to find out if there is another occurrence. There is no standard API from Oracle in the Java packages that does this.
I need to efficiently find the ratio of (intersection size / union size) for pairs of Lists of strings. The lists are small (mostly about 3 to 10 items), but I have a huge number of them (~300K) and have to do this on every pair, so I need this actual computation to be as efficient as possible. The strings themselves are short unicode strings -- averaging around 5-10 unicode characters.
The accepted answer to "Efficiently compute Intersection of two Sets in Java?" looked extremely helpful, but (likely because my sets are small?) I haven't gotten much improvement by using the approach it suggests.
Here's what I have so far:
protected double uuEdgeWeight(UVertex u1, UVertex u2) {
Set<String> u1Tokens = new HashSet<String>(u1.getTokenlist());
List<String> u2Tokens = u2.getTokenlist();
int intersection = 0;
int union = u1Tokens.size();
for (String s:u2Tokens) {
if (u1Tokens.contains(s)) {
intersection++;
} else {
union++;
}
}
return ((double) intersection / union);
}
My question is, is there anything I can do to improve this, given that I'm working with Strings which may be more time consuming to check equality than other data types.
I think because I'm comparing multiple u2's against the same u1, I could get some improvement by doing the cloning of u2 into a HashSet outside of the loop (which isn't shown -- meaning I'd pass in the HashSet instead of the object from which I could pull the list and then clone into a set)
Anything else I can do to squeak out even a small improvement here?
Thanks in advance!
Update
I've updated the numeric specifics of my problem above. Also, due to the nature of the data, most (90%?) of the intersections are going to be empty. My initial attempt cloned one set and called retainAll with the other set to find the intersection, returning early when it was empty, before doing the clone and addAll to find the union. That was about as efficient as the code posted above, presumably because of the trade-off between it being a slower algorithm overall and being able to shortcut out a lot of the time. So, I'm thinking about ways to take advantage of the infrequency of overlapping sets, and would appreciate any suggestions in that regard.
Thanks in advance!
You would get a large improvement by moving the HashSet outside of the loop.
If the HashSet really only has a few entries in it, then an array is probably just as fast, since traversing an array is much simpler. I'm not sure where the threshold lies, but I'd measure both, and be sure to do the measurements correctly (i.e. run warm-up loops before the timed loops, etc.).
One thing to try might be using a sorted array for the things to compare against. Scan until you go past current and you can immediately abort the search. That will improve processor branch prediction and reduce the number of comparisons a bit.
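One way to take the sorted-array idea further is a merge-style scan over two sorted arrays, sketched below. This is a variation on the suggestion above rather than exactly what it describes. It assumes each token array is sorted in natural String order and contains no duplicates, and the choice to return 0.0 for two empty arrays is arbitrary. The class name is hypothetical.

```java
// Hypothetical helper: intersection size / union size for two sorted arrays.
class SortedJaccardSketch {
    // Assumes a and b are each sorted (natural order) and duplicate-free.
    static double jaccard(String[] a, String[] b) {
        int i = 0;
        int j = 0;
        int intersection = 0;
        while (i < a.length && j < b.length) {
            int cmp = a[i].compareTo(b[j]);
            if (cmp == 0) {
                intersection++;
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // a[i] is smaller, so it cannot appear later in b
            } else {
                j++; // b[j] is smaller, so it cannot appear later in a
            }
        }
        int union = a.length + b.length - intersection;
        return union == 0 ? 0.0 : (double) intersection / union;
    }
}
```

The arrays would be sorted once per vertex, outside the pairwise loop, so the per-pair cost is a single pass with no hashing.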
If you want to optimize this function (I'm not sure whether it fits your context), you could assign each unique String an int ID. When a String is added to a UVertex, set the bit at that ID in a BitSet held by the vertex.
The function then reduces to a set.or(otherSet) and a set.and(otherSet), followed by counting the set bits. Depending on the number of unique Strings, that could be efficient.
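A minimal sketch of the BitSet approach. The class, the ID assignment scheme, and the method names are all assumptions for illustration. Each token list would be encoded once, outside the pairwise loop. The intersects check is a cheap early exit for disjoint pairs, which the question says are the common case.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical encoder: one shared instance assigns IDs for all vertices.
class BitSetJaccardSketch {
    // Maps each unique token to a small integer ID (0, 1, 2, ...).
    private final Map<String, Integer> ids = new HashMap<>();

    // Encodes a token list as a BitSet with one bit per token ID.
    BitSet encode(List<String> tokens) {
        BitSet bits = new BitSet();
        for (String token : tokens) {
            Integer id = ids.get(token);
            if (id == null) {
                id = ids.size();
                ids.put(token, id);
            }
            bits.set(id);
        }
        return bits;
    }

    // Intersection size / union size; 0.0 when the sets do not overlap.
    static double jaccard(BitSet a, BitSet b) {
        if (!a.intersects(b)) {
            return 0.0; // early exit, no copies needed
        }
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        return (double) and.cardinality() / or.cardinality();
    }
}
```

With many unique strings each BitSet can become large and sparse, so it is worth measuring this against the HashSet version on real data.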