I have a String array that contains a lot of words. I wish to get the index of a word contained in the array (-1 if it is not contained).
I have first made a loop to search through all elements in the array while incrementing a variable and when I find it, I return the variable's value.
However, the array can be very, very big, so searching through all the elements is extremely slow. I have decided that before adding a new word to my string array, I would use hashCode() % arrayLength to get the index where I should put it. Then, to get the index back, I would just reuse hashCode() % arrayLength to instantly know where it is.
The problem is that sometimes there are "clashes", and two elements can have the same index in the array.
Does anyone have an idea how to deal with this? Or any other alternative to get the index of an element faster?
You are trying to implement open addressing using an array. Unless this is a homework exercise, the Java standard library already has classes to solve the search and collision problem.
You probably want to use a HashSet to check whether the String exists. Behind the scenes it uses a HashMap, which implements separate chaining to resolve collisions.
String[] words = { "a" };
Set<String> set = new HashSet<>(Arrays.asList(words));
// Note: contains() only answers membership, not position.
return set.contains("My Word") ? 1 : -1;
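If you actually need the position of the word rather than a yes/no membership answer, a HashMap from word to index gives the same expected O(1) lookup. A minimal sketch, with illustrative names (not from any particular library):

import java.util.HashMap;
import java.util.Map;

public class WordIndex {
    public static void main(String[] args) {
        String[] words = { "alpha", "beta", "gamma" };

        // Build the index once: word -> position in the array.
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < words.length; i++) {
            index.put(words[i], i);
        }

        // Expected O(1) lookup; -1 if the word is absent.
        System.out.println(index.getOrDefault("beta", -1));  // 1
        System.out.println(index.getOrDefault("delta", -1)); // -1
    }
}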
The technique you are referring to is one of the classic implementations of hash tables, a general approach called Open Addressing. If you have calculated the index of the word based on hashCode() % array.length and find a conflict (the slot is non-empty, or it holds an element other than the one you are looking for), then you have three common ways to perform conflict resolution:
Linear Probing
This is done by incrementing the position and checking whether it is empty or holds the element you are looking for. That is, your next position will be (hashCode(input) + 1) % array.length, then (hashCode(input) + 2) % array.length, and so on. The problem with this approach is that insertion and lookup performance degrade towards linear O(n) as the array becomes close to fully populated.
Quadratic Probing
This is an optimization of the above technique that jumps quadratically when you find a clash. So your next index will be (hashCode(input) + 1*1) % array.length, then (hashCode(input) + 2*2) % array.length, and so on, which helps break up the clusters of occupied slots that slow down linear probing.
Double Hashing
This is an even more effective way to handle resolution: you introduce a second hash function hashCode2() and use it in conjunction with the first one. In that case, your next search index will be (hashCode(input) + 1*hashCode2(input)) % array.length, then (hashCode(input) + 2*hashCode2(input)) % array.length, and so on.
The more randomly distributed your jumps are, the better the performance over large hash tables.
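As a rough illustration of linear probing (the simplest of the three), here is a sketch of insert and lookup over a plain array. The table size, hash function, and class name are placeholder choices for the example, not anything from a real library:

// A minimal linear-probing sketch; assumes the table never fills up completely.
public class ProbingTable {
    private final String[] table = new String[16]; // capacity chosen arbitrarily

    private int baseIndex(String s) {
        return Math.floorMod(s.hashCode(), table.length);
    }

    public void put(String s) {
        int idx = baseIndex(s);
        while (table[idx] != null && !table[idx].equals(s)) {
            idx = (idx + 1) % table.length; // linear step; quadratic or double hashing would change this line
        }
        table[idx] = s;
    }

    public int indexOf(String s) {
        int idx = baseIndex(s);
        while (table[idx] != null) {
            if (table[idx].equals(s)) {
                return idx;
            }
            idx = (idx + 1) % table.length;
        }
        return -1;
    }
}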
Hope this helps.
I've got an ArrayList consisting of card objects, which have the attributes (int) value, (String) symbol and (String) image, along with the corresponding getters.
ArrayList<card> cardDeck = new ArrayList<>();
cardDeck holds 52 elements, each a card object.
Now, if I want to print out all the cards in cardDeck there are two simple solutions:
First solution:
for (int i = 0; i < cardDeck.size(); i++) {
    System.out.println((i + 1) + ". card:");
    System.out.println(cardDeck.get(i).getValue());
    System.out.println(cardDeck.get(i).getSymbol());
    System.out.println(cardDeck.get(i).getImage());
}
Second solution:
for (int i = 0; i < cardDeck.size(); i++) {
    card temp = cardDeck.get(i);
    System.out.println((i + 1) + ". card:");
    System.out.println(temp.getValue());
    System.out.println(temp.getSymbol());
    System.out.println(temp.getImage());
}
My question is whether there is any noticeable difference in either execution time or complexity.
On first thought, in the first solution the program has to look up the card in the ArrayList every time before being able to print its info, which isn't the case in the second solution, since a temporary copy was made.
On second thought though, even in the second solution, the program would still need to look up the info of the temporary card object with every call.
Any help / ideas / advice appreciated!
So we have
3 array lookups in the first solution (with the same index and no modification to the array, so the compiler MAY optimize them)
error-prone code in the first solution (what happens if you need to change the index to i+1 and forget to correct the code in all 3 places?)
versus:
1 array lookup in the second solution - optimized without relying on the compiler
more readable code in the second solution (if you replace temp with card, which you can do once you start the class name with an uppercase letter, as is conventional: Card card)
Array lookups are not that cheap in Java: arrays are bounds-checked to prevent buffer-overflow vulnerabilities.
So you have two very good reasons that tell you to go with the second solution.
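For completeness: an enhanced for loop gives you the single lookup and the readability in one go (this assumes the class has been renamed Card as suggested above):

int i = 1;
for (Card card : cardDeck) {
    System.out.println(i++ + ". card:");
    System.out.println(card.getValue());
    System.out.println(card.getSymbol());
    System.out.println(card.getImage());
}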
Using Java 8:
cardDeck.parallelStream().forEach(card -> {
    System.out.println(card.getValue());
    System.out.println(card.getSymbol());
    System.out.println(card.getImage());
});
This does not guarantee better performance; that depends on the number of CPU cores available. Note also that a parallel stream makes no guarantee about the order in which the cards are printed.
Peter has already said what would be the better idea from the programming perspective.
I want to add that the OP asked about complexity. I interpret that as the asymptotic time required in relation to the size of the card deck.
The answer is that from a theoretical complexity perspective, both approaches are the same. Each array lookup adds a constant factor to required time. It's both O(n) with n being the number of cards.
On another note, the OP asked about copying elements of the list. Just to make it clear: The statement card temp = cardDeck.get(i) does not cause the ith list element to be copied. The temp variable now just points to the element that is located at the ith position of cardDeck at the time of running the loop.
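You can verify this yourself: the reference returned by get is identical to the one stored in the list, so a reference comparison with == returns true. A quick check, again assuming the Card naming:

Card first = cardDeck.get(0);
// '==' compares references, not contents; true here because no copy was made.
System.out.println(first == cardDeck.get(0)); // prints true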
First, you have other solutions, for example using a for-each loop, or the forEach method with a lambda expression.
As for speed, you don't need to worry about it as long as your program runs on regular computers and you don't have to deal with weak or slow processors. In your case, you can also make your code less complex by using a functional style, e.g.:
cardDeck.forEach(card -> {
    System.out.println(card.getValue());
    System.out.println(card.getSymbol());
    System.out.println(card.getImage());
});
I was recently asked in an interview to write code to determine whether an array of integers contains duplicates. Simple as it was, I confidently told him that I would iterate over the elements and add each one to a new list if the list doesn't already contain the element; if it does, I return true, otherwise false.
The code would be like this:
// complexity is O(N*N)
public static boolean findIfArrayHasDuplicates(int[] array) {
    List<Integer> intList = new ArrayList<Integer>();
    for (int var : array) {
        if (intList.contains(var)) {
            return true;
        } else {
            intList.add(var);
        }
    }
    return false;
}
He asked me to calculate the time complexity of the code I had written.
I answered:
N for the iteration of the loop
N(N+1)/2 for finding whether each element already exists in the new list
N for adding the elements to the list
Total: N + N + N*N/2 + N/2. In O() notation, dropping constant factors and lower-order terms as N tends to infinity, this simplifies to O(N^2).
He went on to ask me if there is any better way. I answered: add the elements to a set and compare the sizes; if the size of the set is less than that of the array, it contains duplicates. He asked what the complexity of that is, and guess what, it's still O(N^2), because the code that adds elements to the set will have to first see if each element is in the set already.
How can I reduce the complexity from O(N^2), using as much memory as needed?
Any ideas how this can be done?
"He went on to ask me if there is any better way. I answered: add the elements to a set and compare the sizes; if the size of the set is less than that of the array, it contains duplicates. He asked what the complexity of that is, and guess what, it's still O(N^2), because the code that adds elements to the set will have to first see if it's in the set already."
That's wrong. If you are adding the elements to a HashSet, it takes expected O(1) time to add each element (which includes checking if the element is already present), since all you have to do is compute the hashCode to locate the bin that may contain the element (constant time), and then search the elements stored in that bin (also expected constant time, assuming the average number of elements in each bin is bounded by a constant).
Therefore the total running time is O(N), and there's nothing to improve (you can't find duplicates in less than O(N)).
I think it would be useful to look at the basic working mechanism of a HashSet. A HashSet is internally an array, but data accesses, such as "checking if an element exists" or "adding/deleting an element", take O(1) time, because it uses a mapping mechanism to map an object to the index storing it. For example, if you have a HashSet and an integer and you call hashSet.contains(integer), the program will first compute the hash code of the integer, and then use the mapping mechanism (which differs from implementation to implementation) to find the index where it would be stored. Say we have a hash code of 4, and we map it to an index of 4 using the simplest mapping mechanism; then we check whether the 4th slot of the underlying array is empty. If it is, hashSet.contains(integer) returns false; otherwise the stored element is compared against the integer, and true is returned on a match.
The complexity of the code provided is O(N^2). However, the code that follows has complexity O(N). It uses a HashSet, which requires O(1) expected time for insert and lookup.
public static boolean findIfArrayHasDuplicates(int[] array) {
    HashSet<Integer> set = new HashSet<Integer>();
    for (int index = 0; index < array.length; index++) {
        // add() returns false if the element was already present.
        if (!set.add(array[index]))
            return true;
    }
    return false;
}
I have sort of a hard time understanding an implementation detail of Java 9's ImmutableCollections.SetN; specifically, why there is a need to double the size of the internal array.
Suppose you do this:
Set.of(1,2,3,4) // 4 elements, but internal array is 8
To be exact: I perfectly understand why this is done (a doubled capacity) in the case of a HashMap, where you almost never want the load factor to be one; a load factor != 1 improves search time, as entries are better dispersed across buckets.
But in the case of an immutable Set, I can't really tell why, especially given the way an index into the internal array is chosen.
Let me provide some details. First how the index is searched:
int idx = Math.floorMod(pe.hashCode() ^ SALT, elements.length);
pe is the actual value we put in the set. SALT is just 32 bits generated at start-up, once per JVM (this is the actual randomization, if you want). elements.length for our example is 8 (4 elements, but 8 here: double the size).
This expression is a negative-safe modulo operation. Notice that HashMap does the same logical thing, ((n - 1) & hash), when the bucket is chosen.
So if elements.length is 8 in our case, this expression will return any non-negative value that is less than 8 (0, 1, 2, 3, 4, 5, 6, 7).
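To see why floorMod is used rather than the plain % operator: % can yield a negative result for a negative hash code, while Math.floorMod never does. A quick illustration:

int hash = -7;
System.out.println(hash % 8);               // -7: unusable as an array index
System.out.println(Math.floorMod(hash, 8)); // 1: always in the range [0, 8)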
Now the rest of the method:
while (true) {
    E ee = elements[idx];
    if (ee == null) {
        return -idx - 1;
    } else if (pe.equals(ee)) {
        return idx;
    } else if (++idx == elements.length) {
        idx = 0;
    }
}
Let's break it down:
if (ee == null) {
    return -idx - 1;
This is good, it means that the current slot in the array is empty - we can put our value there.
} else if (pe.equals(ee)) {
    return idx;
This is bad: the slot is occupied and the entry already in place is equal to the one we want to put. Sets can't have duplicate elements, so an exception is thrown later.
} else if (++idx == elements.length) {
    idx = 0;
}
This means that the slot is occupied (a hash collision), but the elements are not equal. In a HashMap this entry would be put into the same bucket as a linked Node or TreeNode, but that is not the case here.
So the index is incremented and the next position is tried (with the small caveat that it wraps around in a circular way when it reaches the last position).
And here is the question: if nothing too fancy (unless I'm missing something) is being done when searching for the index, why is there a need to have an array twice as big? Or why was the function not written like this:
int idx = Math.floorMod(pe.hashCode() ^ SALT, input.length);
// notice the diff elements.length (8) and not input.length (4)
The current implementation of SetN is a fairly simple closed hashing scheme, as opposed to the separate chaining approach used by HashMap. ("Closed hashing" is also confusingly known as "open addressing".) In a closed hashing scheme, elements are stored in the table itself, instead of being stored in a list or tree of elements that are linked from each table slot, which is separate chaining.
This implies that if two different elements hash to the same table slot, this collision needs to be resolved by finding another slot for one of the elements. The current SetN implementation resolves this using linear probing, where the table slots are checked sequentially (wrapping around at the end) until an open slot is found.
If you want to store N elements, they'll certainly fit into a table of size N. You can always find any element that's in the set, though you might have to probe several (or many) successive table slots to find it, because there will be lots of collisions. But if the set is probed for an object that's not a member, linear probing will have to check every table slot before it can determine that object isn't a member. With a full table, most probe operations will degrade to O(N) time, whereas the goal of most hash-based approaches is for operations to be O(1) time.
Thus we have a classic space-time tradeoff. If we make the table larger, there will be empty slots sprinkled throughout the table. When storing items, there should be fewer collisions, and linear probing will find empty slots more quickly. The clusters of full slots next to each other will be smaller. Probes for non-members will proceed more quickly, since they're more likely to encounter an empty slot sooner while probing linearly -- possibly after not having to reprobe at all.
In bringing up the implementation, we ran a bunch of benchmarks using different expansion factors. (I used the term EXPAND_FACTOR in the code whereas most literature uses load factor. The reason is that the expansion factor is the reciprocal of the load factor, as used in HashMap, and using "load factor" for both meanings would be confusing.) When the expansion factor was near 1.0, the probe performance was quite slow, as expected. It improved considerably as the expansion factor was increased. The improvement was really flattening out by the time it got up to 3.0 or 4.0. We chose 2.0 since it got most of the performance improvement (close to O(1) time) while providing good space savings compared to HashSet. (Sorry, we haven't published these benchmark numbers anywhere.)
Of course, all of these are implementation specifics and may change from one release to the next, as we find better ways to optimize the system. I'm certain there are ways to improve the current implementation. (And fortunately we don't have to worry about preserving iteration order when we do this.)
A good discussion of open addressing and performance tradeoffs with load factors can be found in section 3.4 of
Sedgewick, Robert and Kevin Wayne. Algorithms, Fourth Edition. Addison-Wesley, 2011.
The book has a companion website online, but note that the print edition has much more detail.
public static void findNumber(int number) {
    int[] sortedArray = { 1, 5, 6, 8, 9 };
    for (int i = 0; i < sortedArray.length; i++) {
        for (int j = i + 1; j < sortedArray.length; j++) {
            if (sortedArray[i] + sortedArray[j] == number) {
                System.out.println(sortedArray[i] + "::" + sortedArray[j]);
                return;
            }
        }
    }
}
Using this code I am able to find the number, and its complexity is O(N^2), but I have to find it with O(N) complexity, i.e. using only one for loop, a hash map, or something similar, in Java.
I remember, I was watching the official Google video about this problem. Although it is not demonstrated in java, it is explained step-by-step in different variations of the problem. You should definitely check it:
How to: Work at Google — Example Coding/Engineering Interview
As explained in the Google video that Alexander G links to, use two array indexes. Initialize one to the first element (index 0) and the other to the last element (sortedArray.length - 1). In a loop, check the sum of the two elements at the two indexes. If the sum is the number you were looking for, you're done. If it's too high, you need a smaller number at one of the indexes; move the right index one step to the left (since the array is sorted, this is the right move). If, on the other hand, the sum was too low, move the left index one step to the right to obtain a larger first addend. When the two indexes meet, if you still haven't found the sum you were looking for, there isn't any. At that point you have been through the loop at most n - 1 times, so the algorithm runs in O(n).
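A sketch of that algorithm in Java (the method and variable names are mine, not taken from the video):

// Two-pointer scan over a sorted array; runs in O(n).
public static void findNumber(int[] sortedArray, int number) {
    int left = 0;
    int right = sortedArray.length - 1;
    while (left < right) {
        int sum = sortedArray[left] + sortedArray[right];
        if (sum == number) {
            System.out.println(sortedArray[left] + "::" + sortedArray[right]);
            return;
        } else if (sum > number) {
            right--; // sum too high: move the right index left for a smaller addend
        } else {
            left++;  // sum too low: move the left index right for a larger addend
        }
    }
    System.out.println("No pair found");
}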
We ought to first check the precondition, that the array is really sorted. This too can be done in O(n), so doing it doesn’t break any requirements.
The algorithm may need refinement if you are required to find all possible pairs of numbers that yield the desired sum rather than just one pair.
Is this answer superfluous when the video link has already said it? For one thing, my explanation is shorter, so if it suffices, you’re fine. Most importantly, if the video is removed or just moved to another URL, my answer will still be here.
With number fixed, for any chosen x in the array you just have to find whether number - x is in the array (note that you can also bound x). This will not give you O(n), but O(n log n).
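A sketch of that O(n log n) variant, assuming a sorted array (the method name is illustrative):

import java.util.Arrays;

// For each element x, binary-search for number - x among the elements to
// its right: n searches of O(log n) each, so O(n log n) overall. Searching
// only to the right of i also prevents an element from pairing with itself.
public static boolean hasPair(int[] sortedArray, int number) {
    for (int i = 0; i < sortedArray.length - 1; i++) {
        if (Arrays.binarySearch(sortedArray, i + 1, sortedArray.length,
                                number - sortedArray[i]) >= 0) {
            return true;
        }
    }
    return false;
}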
Maybe this remark will give a hint towards linear time: if you take a_i and a_j (j > i) and compare their sum against number, then if the result is greater, the next interesting tests are with a_(i-1) or a_(j-1); and if the result is lower, the next interesting tests are with a_(i+1) or a_(j+1).
I have a general programming question that I happened to use Java to answer. This is the question:
Given an array of ints, write a program to find out how many numbers that are not unique are in the array (e.g. in {2,3,2,5,6,1,3}, 2 numbers (2 and 3) are not unique). How many operations does your program perform (in O notation)?
This is my solution.
int counter = 0;
for (int i = 0; i < theArray.length - 1; i++) {
    for (int j = i + 1; j < theArray.length; j++) {
        if (theArray[i] == theArray[j]) {
            counter++;
            break; // go to next i; we know this element isn't unique, so no need to keep comparing
        }
    }
}
return counter;
Now, in my code every element is compared with every other element, so there are about n(n-1)/2 operations, giving O(n^2). Please tell me if you think my code is incorrect/inefficient or if my O expression is wrong.
Why not use a Map as in the following example:
// NOTE! I assume that elements of theArray are boxed as Integers when used
// as keys, since Maps can't take primitives as keys or values; autoboxing
// takes care of the conversion from int automatically.
Map<Integer, Integer> elementCount = new HashMap<Integer, Integer>();
for (int i = 0; i < theArray.length; i++) {
    if (elementCount.containsKey(theArray[i])) {
        elementCount.put(theArray[i], elementCount.get(theArray[i]) + 1);
    } else {
        elementCount.put(theArray[i], 1);
    }
}

List<Integer> moreThanOne = new ArrayList<Integer>();
for (Integer key : elementCount.keySet()) {
    if (elementCount.get(key) > 1) {
        moreThanOne.add(key);
    }
}
// do whatever you want with the moreThanOne list
Notice that this method requires iterating through the list twice (I'm sure there's a way to do it in one pass). It iterates once through theArray, and then implicitly again over the key set of elementCount, which, if no two elements are the same, will be exactly as large. However, iterating through the same list twice sequentially is still O(n), not O(n^2), and thus has a much better asymptotic running time.
Your code doesn't do what you want. If you run it using the array {2, 2, 2, 2}, you'll find that it returns 3 instead of 1. You'll have to find a way to make sure that the counting is never repeated.
However, your Big O expression is correct as a worst-case analysis, since every element might be compared with every other element.
Your analysis is correct, but you could easily get it down to O(n) time. Try using a HashMap<Integer, Integer> to store previously seen values as you iterate through the array (the key is the number you've seen, the value is the number of times you've seen it). Each time you are about to add an integer to the hashmap, check whether it's already there; if it is, just increment that integer's counter. Then, at the end, loop through the map and count the number of keys whose value is higher than 1.
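A compact sketch of that idea using Map.merge (Java 8+); the method name is illustrative:

import java.util.HashMap;
import java.util.Map;

public static long countNonUnique(int[] theArray) {
    Map<Integer, Integer> counts = new HashMap<>();
    for (int value : theArray) {
        counts.merge(value, 1, Integer::sum); // insert 1, or increment an existing count
    }
    // Count how many distinct values occurred more than once.
    return counts.values().stream().filter(c -> c > 1).count();
}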
First, your approach is what I would call "brute force", and it is indeed O(n^2) in the worst case. It's also incorrectly implemented, since numbers that repeat n times are counted n-1 times.
Setting that aside, there are a number of ways to approach the problem. The first (which a number of answers have suggested) is to iterate the array, using a map to keep track of how many times each element has been seen. Assuming the map uses a hash table for the underlying storage, the average-case complexity is O(n), since gets and inserts into the map are O(1) on average, and you only need to iterate the list and the map once each. Note that this is still O(n^2) in the worst case, since there's no guarantee that the hashing will produce constant-time results.
Another approach is to simply sort the array first, and then iterate the sorted array looking for duplicates (a sketch follows at the end of this answer). This approach is entirely dependent on the sort method chosen, and can be anywhere from O(n^2) (for a naive bubble sort) to O(n log n) worst case (for a merge sort) to O(n log n) average (and typical) case (for a quicksort).
That's the best you can do with the sorting approach assuming arbitrary objects in the array. Since your example involves integers, though, you can do much better by using radix sort, which has a worst-case complexity of O(dn), where d is essentially constant (it is bounded by the digit count of a 32-bit integer).
Finally, if you know that the elements are integers, and that their magnitude isn't too large, you can improve the map-based solution by using an array of size ElementMax, which would guarantee O(n) worst-case complexity, with the trade-off of requiring 4*ElementMax additional bytes of memory.
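For reference, a sketch of the sort-then-scan approach mentioned above (O(n log n) because of the sort; the input is copied so the caller's array is left untouched):

import java.util.Arrays;

public static int countNonUniqueSorted(int[] input) {
    int[] a = Arrays.copyOf(input, input.length);
    Arrays.sort(a); // O(n log n); duplicates end up adjacent
    int count = 0;
    for (int i = 1; i < a.length; i++) {
        // Count each run of equal values exactly once, at its first repeat.
        if (a[i] == a[i - 1] && (i == 1 || a[i] != a[i - 2])) {
            count++;
        }
    }
    return count;
}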
I think your time complexity of O(n^2) is correct.
If space complexity is not an issue, then you can use an int array of 256 entries (one per ASCII character, assuming the values fit in that range) and fill it with counts. For example:
// Java initializes int arrays to 0 automatically. The following runs in O(n + m),
// where n is the length of theArray and m is the length of array (256 here).
int[] array = new int[256];
for (int i = 0; i < theArray.length; i++)
    array[theArray[i]] = array[theArray[i]] + 1;
for (int i = 0; i < array.length; i++)
    if (array[i] > 1)
        System.out.print(i);
As others have said, an O(n) solution is quite possible using a hash. In Perl:
my @data = (2,3,2,5,6,1,3);
my %count;
$count{$_}++ for @data;
my $n = grep $_ > 1, values %count;
print "$n numbers are not unique\n";
OUTPUT
2 numbers are not unique