Binary search not detecting duplicates? - java

I have an array of items, cards, all with String names, so
Card c1 = new Card("TheCard");
Card c2 = new Card("TheOtherCard");
And then I am using a quicksort to sort the list and then trying a binary search to see if cards already exist before adding more
So,
if(cards.contains(c3)==true)
//do nothing
else
cards.add(c3)
And my cards.contains method is
Comparator<Card> c = new Comparator<Card>() {
@Override
public int compare(Card u1, Card u2) {
return u1.getName().compareTo(u2.getName());
}
};
int index;
index = Collections.binarySearch(cards, it, c);
if (index == -1) {
return false;
} else {
return true;
}
But the problem is that when it searches the cards list, it reports cards that aren't in the list as present and cards that are in the list as absent.
I am trying to add 10,000 cards, 8,000 of them being unique, but the contains method is returning 2,000 unique cards and when I check the list, they're not even unique https://i.imgur.com/N9kQtms.png
I've tried running the code un-sorted and that just returns about 4,000 results with the same problem of repeating cards, when I brute force and just use the base .contains, that works but it is super slow
(Also sorry if I messed up something in my post, it is my first time posting here)

The javadoc states the following:
Searches the specified list for the specified object using the binary search algorithm. The list must be sorted into ascending order according to the specified comparator (as by the sort(List, Comparator) method), prior to making this call. If it is not sorted, the results are undefined. If the list contains multiple elements equal to the specified object, there is no guarantee which one will be found.
It also states that it returns:
the index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size() if all elements in the list are less than the specified key. Note that this guarantees that the return value will be >= 0 if and only if the key is found.
Your list must therefore be sorted beforehand (with the same comparator), or the result will not make sense. Note also that the method returns either the index of the element or an encoded insertion point, (-(insertion point) - 1), which is always negative. Beware of this technicality: comparing the result against -1 only catches the case where the insertion point is 0.
There you could have this test to see if it is your card:
// Test that the search succeeded and that the card at the index found has the same name as the card you are actually looking for.
return index >= 0 && cards.get(index).getName().equals(it.getName());
You could also override equals to have something that is closer to:
return index >= 0 && cards.get(index).equals(it);
In both cases, the index >= 0 check ensures that we won't get an IndexOutOfBoundsException when the key is not found.

The binarySearch gives a non-negative index when it finds an item.
It gives the complement of the insert position: ~index == -index-1 when it is not found.
Search d in a b d e gives 2.
Search d in a b e g gives ~2 == -3, the insert position being 2.
So the check is:
int index = Collections.binarySearch(cards, it, c);
return index >= 0;
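As a minimal sketch of the return-value convention described above, the following uses plain String lists rather than the question's Card class. The names BinarySearchDemo, containsSorted and addIfAbsent are made up for illustration. It also shows how the negative result can be decoded into an insertion point so the list stays sorted after each add.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not part of the original question or answers.
class BinarySearchDemo {
    // Returns true if key is present in the (already sorted) list.
    static boolean containsSorted(List<String> sorted, String key) {
        return Collections.binarySearch(sorted, key) >= 0;
    }

    // Inserts key at its sorted position unless it is already present.
    // Returns true if the list was changed.
    static boolean addIfAbsent(List<String> sorted, String key) {
        int index = Collections.binarySearch(sorted, key);
        if (index >= 0) {
            return false; // already present
        }
        int insertionPoint = -index - 1; // same value as ~index
        sorted.add(insertionPoint, key);
        return true;
    }

    public static void main(String[] args) {
        List<String> found = Arrays.asList("a", "b", "d", "e");
        List<String> missing = new ArrayList<>(Arrays.asList("a", "b", "e", "g"));
        System.out.println(Collections.binarySearch(found, "d"));   // 2
        System.out.println(Collections.binarySearch(missing, "d")); // -3
        System.out.println(addIfAbsent(missing, "d"));              // true
        System.out.println(missing);                                // [a, b, d, e, g]
    }
}
```

Inserting at the decoded insertion point keeps the list sorted, so no re-sort is needed before the next search.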
Furthermore Card should have a correct equality:
public class Card implements Comparable<Card> {
...
@Override
public int compareTo(Card other) {
return name.compareTo(other.name);
}
@Override
public boolean equals(Object obj) {
if (!(obj instanceof Card)) {
return false;
}
Card other = (Card) obj;
return name.equals(other.name);
}
@Override
public int hashCode() {
return name.hashCode();
}
}
In this case, instead of a Comparator you can implement Comparable<Card>, since the name is the real identity of a card. A Comparator is better suited to alternative orderings, such as sorting persons by last name then first name, by first name then last name, or by city.
The hashCode allows using HashMap<Card, ...>.
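Once equals and hashCode are defined on the name, a HashSet is one possible way to reject duplicates without sorting or searching at all. The sketch below assumes the order of the cards does not matter. CardSetDemo and countUnique are illustrative names, and the nested Card is a cut-down stand-in for the class shown above.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical demo class; Card here is a reduced stand-in for the answer's class.
class CardSetDemo {
    static final class Card {
        private final String name;

        Card(String name) {
            this.name = Objects.requireNonNull(name);
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof Card && name.equals(((Card) obj).name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    // Adds every name as a Card and returns how many distinct cards remain.
    static int countUnique(String... names) {
        Set<Card> cards = new HashSet<>();
        for (String n : names) {
            cards.add(new Card(n)); // add() returns false and does nothing for a duplicate
        }
        return cards.size();
    }

    public static void main(String[] args) {
        System.out.println(countUnique("TheCard", "TheOtherCard", "TheCard")); // 2
    }
}
```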

Related

Unique list of objects using HashSet

Can anyone tell me what the issue with my code is here? I converted the code from this post to use a String array instead of two ints, where I want a unique list based on the 0th index of the String array. The problem is that the overridden equals function is never getting called therefore I have repeated entries.
public static void main(String[] args)
{
class bin
{
String[] data;
bin (String[] data)
{
this.data=data;
}
@Override
public boolean equals(Object me)
{
bin binMe = (bin)me;
if(this.data[0].equals(binMe.data[0])) { return true; }
else { return false; }
}
@Override
public int hashCode()
{
final int prime = 31;
int result = 1;
result = prime * result + Arrays.hashCode(data);
return result;
}
@Override
public String toString()
{
return data[0] + " " + data[1];
}
}
Set<bin> q= new HashSet<bin>();
q.add(new bin(new String[]{"100", "200"}));
q.add(new bin(new String[]{"101", "201"}));
q.add(new bin(new String[]{"101", "202"}));
q.add(new bin(new String[]{"103", "203"}));
System.out.println(q);
}
Gives an output of: [101 202, 100 200, 101 201, 103 203]
If you want the comparison based on the first element, don't take the hash code of the full array
Arrays.hashCode(data);
Use
data[0].hashCode();
The problem is that the overridden equals function is never getting
called therefore I have repeated entries.
This is not quite the whole story. A HashSet calls equals only on elements whose hash codes already match, so two elements are treated as duplicates only if hashCode returns the same int for both and equals returns true. You have overridden equals to do the logical comparison on the first element of the string array, but you also need to make sure hashCode returns the same int for any two elements that you consider logically equal.
So update the following statement in your hashCode implementation
from
result = prime * result + Arrays.hashCode(data);
to
result = prime * result + data[0].hashCode();
The way HashSet and HashMap work is by using the hashCode to group items into buckets (lists) and then searching only the matching bucket. If all the hashCodes are unique, each bucket holds a single item, which speeds up lookups.
If your hashCode points to the wrong list, there are no items to check for equality, so the equals method is never called, and every item gets added, not just the unique ones.
Instead of computing the hashCode over the whole array of Strings, use data[0].hashCode(), as per @cricket_007's answer.
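Putting the answers together, a corrected version of the question's class might look like the sketch below. The class is renamed Bin and wrapped in a hypothetical BinDemo holder for illustration. The instanceof guard is an addition that the original equals did not have.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical holder class for the corrected example.
class BinDemo {
    static final class Bin {
        final String[] data;

        Bin(String[] data) {
            this.data = data;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Bin)) {
                return false; // also covers null
            }
            return data[0].equals(((Bin) other).data[0]);
        }

        @Override
        public int hashCode() {
            // Hash the same field that equals compares, so equal objects share a bucket.
            return data[0].hashCode();
        }

        @Override
        public String toString() {
            return data[0] + " " + data[1];
        }
    }

    // Rebuilds the set from the question and returns its size.
    static int uniqueCount() {
        Set<Bin> q = new HashSet<>();
        q.add(new Bin(new String[]{"100", "200"}));
        q.add(new Bin(new String[]{"101", "201"}));
        q.add(new Bin(new String[]{"101", "202"})); // rejected: same first element as the previous entry
        q.add(new Bin(new String[]{"103", "203"}));
        return q.size();
    }

    public static void main(String[] args) {
        System.out.println(uniqueCount()); // 3
    }
}
```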

Why does .contains method on ArrayList of custom instances work?

I've been developing a small application for work, and I've come across something I can't figure out.
In the following code, I have an ArrayList of a Custom Class called 'Product' that contains data of type 'String'. I use the .contains method on this ArrayList to ensure it doesn't contain a certain String.
My IDE gives me the warning of 'Suspicious call to java.util.Collections.contains: Given object cannot contain instances of String (expected Product)'.
I completely understand the above message, because I'm comparing two different Types, so how can it ever evaluate correctly? I'm thinking it must be because the 'Product' class contains the data I want to compare, it is defaulting to using the toString method on the Product class (I override this in the Class) and comparing it with the String I want to compare it against.
It seems like JVM black magic to me.
private void createOrderListing(List<String[]> orderList)
{
//For each line of the order list file
for(String[] s : orderList)
{
if(s.length >= 28) //OrderLine should be of this length
{
if (!s[0].equalsIgnoreCase("ProductCode") && !s[0].isEmpty()) //Makes sure we're not including headers
{
//How does this bit work?
if(!productListing.contains(s[0]))
{
OrderLine order = new OrderLine();
//References product code of Product against Order Line, if match, then pack sizes and other basic fields ammended as appropriate
boolean productFound = false;
for (Product p : productListing)
{
if (s[0].contentEquals(p.getProductCode()))
{
order.initialAmendOrderLine(p.getProductCode(), p.getProductName(), p.getPackSize(), p.getProductType());
productFound = true;
}
}
if(productFound)
{
order.setOrderValues(s);
orderListing.add(order);
}
}
//System.out.println("\nOrder Product is: " + order.getProductName()+ "\nOrder Pack Size is: " + order.getInternalPackSize());
}
}
}
}
UPDATE
The reason this works, as pointed out in the comments, is that the condition is always true: the .contains call always returns false, and the ! inverts it. Sorry for the confusion caused by my carelessness.
Here is an implementation of contains method in ArrayList that I have in OpenJDK:
public boolean contains(Object o) {
return indexOf(o) >= 0;
}
public int indexOf(Object o) {
if (o == null) {
for (int i = 0; i < size; i++)
if (elementData[i]==null)
return i;
} else {
for (int i = 0; i < size; i++)
if (o.equals(elementData[i]))
return i;
}
return -1;
}
Basically, there is nothing complex in it. It iterates through all the elements of your ArrayList and checks whether your given object is equal to the current one. If the condition is true, then the element exists in the list.
So let's imagine that you are passing the String "SomeValue" to this method. The elements of the ArrayList are iterated and the following call is executed: "SomeValue".equals(elementData[i]), where elementData[i] is a Product.
Since the equals method of the String class returns false for any argument that is not a String, it returns false for a Product, and as a result you get false from the contains method.
To fix this situation you can iterate over the ArrayList manually and compare one of the Product's fields with your string. E.g. you can implement the following contains method:
public boolean contains(List<Product> products, String yourStringValue) {
for (Product p : products) {
if(p.getProductCode().equals(yourStringValue)){
return true;
}
}
return false;
}
productListing is a list of Product objects. Yet you are asking the list if it contains a specific String object -- which shouldn't ever happen.
What you should do is check whether your Product#getProductCode is equal to your specific String. This can be achieved by using streams:
if(!productListing.contains(s[0])) // replace this
// with this
if (!productListing.stream().filter(o -> o.getProductCode().equals(s[0])).findFirst().isPresent())
What does this code do? It checks all your Product elements to find one whose product code is equal to the String you're comparing.
Since contains relies on the equals implementation, when you do
if(!productListing.contains(s[0]))
you are asking a list of Product objects whether it contains a String.
That will always return false because the types are different, so it is not that it is working at all; it is that contains always returns false, and your negated condition is therefore always true.
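The difference between the two checks can be seen in a small sketch. The nested Product class below is a hypothetical stand-in with only a product code, and containsCode is an illustrative helper that uses anyMatch as a shorter equivalent of filter(...).findFirst().isPresent().

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; Product is a reduced stand-in for the question's class.
class ProductLookupDemo {
    static final class Product {
        private final String productCode;

        Product(String productCode) {
            this.productCode = productCode;
        }

        String getProductCode() {
            return productCode;
        }
    }

    // True if any product in the list has exactly this product code.
    static boolean containsCode(List<Product> products, String code) {
        return products.stream().anyMatch(p -> p.getProductCode().equals(code));
    }

    public static void main(String[] args) {
        List<Product> listing = Arrays.asList(new Product("A1"), new Product("B2"));
        System.out.println(listing.contains("A1"));      // false: a String never equals a Product
        System.out.println(containsCode(listing, "A1")); // true
        System.out.println(containsCode(listing, "Z9")); // false
    }
}
```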

Linked list doesn't contain values greater than a certain number

How can I create a method that would check whether or not a linked list contains any number larger than a parameter?
Let's say we have the linked list
[ 8 7 1 3 ]. This would return true and
[ 10 12 3 2] would return false.
Would this work?
public boolean f(int k) {
for (int=0; int<linkedList.size(); i++) {
if (linkedList.get(i)>k)
return false;
}
else
return true;
}
Also, I need to mention, this method would not change the list in any way and it should still work if the list contains null elements.
Thanks!
With Java 8
public boolean f(int k) {
return !linkedList.stream().anyMatch(i-> i> k );
}
Clarification: I assume that you want to return false from the method if even a single element is higher than the given k. Hence I use anyMatch, since we only need to look for one element that is higher. There is no need to loop over the whole list.
No this will not work how you have it currently. You need to loop through the whole list before returning. The only time you should return prematurely is if you find a reason to return false in this context. So move your return true outside of your loop and then you'd be fine.
Also, try to give meaning to your method and class definitions. Saying obj.f(12) doesn't really say much, whereas obj.noElementGreaterThan(12) says a lot more.
for example:
public boolean noElementGreaterThan( int k ) {
for( int i = 0; i < linkedList.size(); i++ )
{
if( linkedList.get(i) > k )
return false;
}
return true;
}
The reason this works is because it will loop through the entire list of objects, comparing each to the value passed in (k). If the value is greater than k, then it will return false, meaning in this case that it does have an element greater than k.
using streams you could do like this:
public boolean f(int k) {
List<Integer> filtered = linkedList.stream().filter(i -> i > k).collect(Collectors.toList());
return filtered.isEmpty();
}
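The question also asks that the method keep working when the list contains null elements, which the stream versions above do not handle: unboxing a null Integer throws a NullPointerException. A possible null-tolerant sketch follows. It assumes nulls should simply be ignored, and passes the list as a parameter (rather than reading a field) only to keep the example self-contained. The value k = 9 is an assumption chosen to match the question's two sample lists.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Objects;

// Hypothetical demo class for a null-tolerant check.
class NoGreaterDemo {
    // True if no non-null element is greater than k; the list is not modified.
    static boolean noElementGreaterThan(List<Integer> list, int k) {
        return list.stream()
                   .filter(Objects::nonNull)   // skip null elements instead of unboxing them
                   .noneMatch(i -> i > k);     // stops at the first element greater than k
    }

    public static void main(String[] args) {
        List<Integer> a = new LinkedList<>(Arrays.asList(8, 7, 1, 3));
        List<Integer> b = new LinkedList<>(Arrays.asList(10, 12, 3, 2));
        List<Integer> c = new LinkedList<>(Arrays.asList(8, null, 3));
        System.out.println(noElementGreaterThan(a, 9)); // true
        System.out.println(noElementGreaterThan(b, 9)); // false
        System.out.println(noElementGreaterThan(c, 9)); // true
    }
}
```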

Best big set data structure in Java

I need to find gaps in a big Integer Set populated by a read loop over files, and I want to know whether something already exists for this purpose, to avoid using a plain Set object with the risk of exhausting the heap.
To better explain my question I have to tell you how my ticketing java software works.
Every ticket has a global progressive number stored in a daily log file with other informations. I have to write a check procedure to verify if there are number gaps inside daily log files.
The first idea was to create a read loop with all log files, read each line, get the ticket number and store it in a Integer TreeSet Object and then find gaps in this Set.
The problem is that ticket numbers can be very high and could exhaust the heap space, and I want a solution that still works if I have to switch to Long objects.
The Set solution wastes a lot of memory: if I find that there are no gaps in the first 100 numbers, it makes no sense to keep storing them in the Set.
How can I solve this? Is there an existing data structure for this purpose?
I'm assuming that (A) the gaps you are looking for are the exception and not the rule and (B) the log files you are processing are mostly sorted by ticket number (though some out-of-sequence entries are OK).
If so, then I'd think about rolling your own data structure for this. Here's a quick example of what I mean (with a lot left to the reader).
Basically what it does is implement Set but actually store it as a Map, with each entry representing a range of contiguous values in the set.
The add method is overridden to maintain the backing Map appropriately. E.g., if you add 5 to the set and already have a range containing 4, then it just extends that range instead of adding a new entry.
Note that the reason for the "mostly sorted" assumption is that, for totally unsorted data, this approach will still use a lot of memory: the backing map will grow large (as unsorted entries get added all over the place) before growing smaller (as additional entries fill in the gaps, allowing contiguous entries to be combined).
Here's the code:
package com.matt.tester;
import java.util.Collection;
import java.util.Comparator;
import java.util.Iterator;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
public class SE {
public class RangeSet<T extends Long> implements SortedSet<T> {
private final TreeMap<T, T> backingMap = new TreeMap<T,T>();
@Override
public int size() {
// TODO Auto-generated method stub
return 0;
}
@Override
public boolean isEmpty() {
// TODO Auto-generated method stub
return false;
}
@Override
public boolean contains(Object o) {
if ( ! ( o instanceof Number ) ) {
throw new IllegalArgumentException();
}
T n = (T) o;
// Find the backingMap entry with the greatest key less than or equal to n
Map.Entry<T,T> floorEntry = backingMap.floorEntry(n);
if ( floorEntry == null ) {
return false;
}
final Long endOfRange = floorEntry.getValue();
if ( endOfRange >= n) {
return true;
}
return false;
}
@Override
public Iterator<T> iterator() {
throw new IllegalAccessError("Method not implemented. Left for the reader. (You'd need a custom Iterator class, I think)");
}
@Override
public Object[] toArray() {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public <T> T[] toArray(T[] a) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public boolean add(T e) {
if ( (Long) e < 1L ) {
throw new IllegalArgumentException("This example only supports counting numbers, mainly because it simplifies printGaps() later on");
}
if ( this.contains(e) ) {
// Already in set: return now, otherwise the range-merging code below would corrupt the map
return false;
}
final T eMinusOne = (T) (Long) (e-1L);
final T nextEntryKey = (T) (Long) (e+1L);
if ( this.contains(eMinusOne ) ) {
// Find the backingMap entry with the greatest key less than or equal to e
Map.Entry<T,T> floorEntry = backingMap.floorEntry(e);
final T startOfPrecedingRange;
startOfPrecedingRange = floorEntry.getKey();
if ( this.contains(nextEntryKey) ) {
// This addition will join two previously separated ranges
T endOfRange = backingMap.get(nextEntryKey);
backingMap.remove(nextEntryKey);
// Extend the prior entry to include the whole range
backingMap.put(startOfPrecedingRange, endOfRange);
return true;
} else {
// This addition will extend the range immediately preceding
backingMap.put(startOfPrecedingRange, e);
return true;
}
} else if ( this.backingMap.containsKey(nextEntryKey) ) {
// This addition will extend the range immediately following
T endOfRange = backingMap.get(nextEntryKey);
backingMap.remove(nextEntryKey);
// Extend the prior entry to include the whole range
backingMap.put(e, endOfRange);
return true;
} else {
// This addition is a new range, it doesn't touch any others
backingMap.put(e,e);
return true;
}
}
@Override
public boolean remove(Object o) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public boolean containsAll(Collection<?> c) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public boolean addAll(Collection<? extends T> c) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public boolean retainAll(Collection<?> c) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public boolean removeAll(Collection<?> c) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public void clear() {
this.backingMap.clear();
}
@Override
public Comparator<? super T> comparator() {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public SortedSet<T> subSet(T fromElement, T toElement) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public SortedSet<T> headSet(T toElement) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public SortedSet<T> tailSet(T fromElement) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public T first() {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public T last() {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
public void printGaps() {
Long lastContiguousNumber = 0L;
for ( Map.Entry<T, T> entry : backingMap.entrySet() ) {
Long startOfNextRange = (Long) entry.getKey();
Long endOfNextRange = (Long) entry.getValue();
if ( startOfNextRange > lastContiguousNumber + 1 ) {
System.out.println( String.valueOf(lastContiguousNumber+1) + ".." + String.valueOf(startOfNextRange - 1) );
}
lastContiguousNumber = endOfNextRange;
}
System.out.println( String.valueOf(lastContiguousNumber+1) + "..infinity");
System.out.println("Backing map size is " + this.backingMap.size());
System.out.println(backingMap.toString());
}
}
public static void main(String[] args) {
SE se = new SE();
RangeSet<Long> testRangeSet = se.new RangeSet<Long>();
// Start by putting 1,000,000 entries into the map with a few, pre-determined, hardcoded gaps
for ( long i = 1; i <= 1000000; i++ ) {
// Our pre-defined gaps...
if ( i == 58349 || ( i >= 87333 && i <= 87777 ) || i == 303998 ) {
// Do not put these numbers in the set
} else {
testRangeSet.add(i);
}
}
testRangeSet.printGaps();
}
}
And the output is:
58349..58349
87333..87777
303998..303998
1000001..infinity
Backing map size is 4
{1=58348, 58350=87332, 87778=303997, 303999=1000000}
I believe it's a perfect moment to get familiar with bloom-filter. It's a wonderful probabilistic data-structure which can be used for immediate proof that an element isn't in the set.
How does it work? The idea is pretty simple, the boost more complicated and the implementation can be found in Guava.
The idea
Initialize a filter as an array of bits long enough to hold the largest value of the hash function used. When adding an element to the set, calculate its hash, determine which of its bits are 1s, and make sure all of them are switched to 1 in the filter (array). When you want to check whether an element is in the set, calculate its hash and check whether all bits that are 1s in the hash are 1s in the filter. If any of those bits is 0 in the filter, the element definitely isn't in the set. If all of them are 1, the element might be in the set, so you have to check the actual elements to be sure.
The Boost
A simple probabilistic model tells you how big the filter (and the range of the hash function) should be to keep the chance of a false positive acceptably low. A false positive is the situation where all the bits are 1s but the element isn't in the set.
Implementation
The Guava implementation provides the following constructor to the bloom-filter: create(Funnel funnel, int expectedInsertions, double falsePositiveProbability). You can configure the filter on your own depending on the expectedInsertions and falsePositiveProbability.
False positive
Some people are wary of Bloom filters because of the possibility of false positives. A Bloom filter can be used in a way that doesn't rely on the "might be in the filter" answer alone: if the element might be in the set, you loop through the elements and check one by one whether it really is.
Possible usage
In your case, I'd create the filter for the set, then after all tickets are added simply loop through all the numbers (as you have to loop anyway) and check whether the filter says they might be in the set (mightContain in Guava). If you set falsePositiveProbability to 3%, you'll achieve complexity around O(n^2-0.03m*n), where m stands for the number of gaps. Correct me if I'm wrong with the complexity estimation.
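To make the idea concrete without depending on Guava, here is a minimal Bloom filter sketch built on java.util.BitSet. It follows the common textbook formulation, in which each element sets a fixed number of bit positions derived from its hash, rather than copying the 1 bits of a single hash as described above. It is not Guava's implementation. The class name, the sizes in the usage example and the mixing constant are all illustrative assumptions.

```java
import java.util.BitSet;

// Hypothetical, simplified Bloom filter for long values; not Guava's BloomFilter.
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits in the filter
    private final int hashCount; // number of bit positions set per element

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derives the i-th bit position from two base hashes (double hashing).
    private int position(long value, int i) {
        int h1 = Long.hashCode(value);
        int h2 = Long.hashCode(value * 0x9E3779B97F4A7C15L + 1); // arbitrary mixing constant
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(long value) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(value, i));
        }
    }

    // false means "definitely absent"; true means "possibly present".
    boolean mightContain(long value) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(value, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 3);
        for (long ticket = 1; ticket <= 1000; ticket++) {
            if (ticket != 500) {
                filter.add(ticket); // ticket 500 is the deliberate gap
            }
        }
        System.out.println(filter.mightContain(42));  // true: added elements are never reported absent
        System.out.println(filter.mightContain(500)); // usually false, but a false positive is possible
    }
}
```

Because a "possibly present" answer can be wrong, a gap that happens to collide with set bits would be missed unless it is confirmed some other way, which is the trade-off the answer describes.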
Well either you store everything in memory, and you risk overflowing the heap, or you don't store it in memory and you need to do a lot of computing.
I would suggest something in between - store the minimum needed information needed during processing. You could store the endpoints of the known non-gap sequence in a class with two Long fields. And all these sequence datatypes could be stored in a sorted list. When you find a new number, iterate through the list to see if it is adjacent to one of the endpoints. If so, change the endpoint to the new integer, and check if you can merge the adjacent sequence-objects (and hence remove one of the objects). If not, create a new sequence object in the properly sorted place.
This will end up being O(n) in memory usage and O(n) in cpu usage. But using any data structure which stores information about all numbers will simply be n in memory usage, and O(n*lookuptime) in cpu if lookuptime is not done in constant time.
Read as many ticket numbers as you can fit into available memory.
Sort them, and write the sorted list to a temporary file. Depending on the expected number of gaps, it might save time and space to use a run-length–encoding scheme when writing the sorted numbers.
After all the ticket numbers have been sorted into temporary files, you can merge them into a single, sorted stream of ticket numbers, looking for gaps.
If this would result in too many temporary files to open at once for merging, groups of files can be merged into intermediate files, and so on, maintaining the total number below a workable limit. However, this extra copying can slow the process significantly.
The old tape-drive algorithms are still relevant.
Here is an idea: if you know in advance the range of your numbers, then
1. Pre-calculate the sum of all the numbers that you expect to be there.
2. Then keep reading your numbers and produce the sum of all read numbers as well as the number of your numbers.
3. If the sum you come up with is the same as pre-calculated one, then there are no gaps.
4. If the sum is different and the count of your numbers is just one short of the expected count, then pre-calculated sum - actual sum will give you your missing number.
5. If the count of your numbers is short by more than one, then you will know how many numbers are missing and what their sum is.
The best part is that you will not need to store the collection of your numbers in memory.
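The steps above can be sketched as follows. The sketch assumes the input contains no duplicates, that every value lies inside the known range lo..hi, that ticket numbers are positive (so -1 can signal "nothing missing"), and that the sums fit in a long. SumGapCheck and findSingleMissing are illustrative names. The array parameter stands in for a stream of numbers read from the log files, since only a running sum and a count are actually needed.

```java
// Hypothetical demo class for the sum-and-count check.
class SumGapCheck {
    // Returns the single missing number in lo..hi, or -1 if nothing is missing.
    // Throws if more than one number is missing, reporting how many and their sum.
    static long findSingleMissing(long lo, long hi, long[] seen) {
        long expectedCount = hi - lo + 1;
        long expectedSum = (lo + hi) * expectedCount / 2; // arithmetic series sum
        long actualSum = 0;
        long actualCount = 0;
        for (long n : seen) { // in practice: one pass over the log files
            actualSum += n;
            actualCount++;
        }
        long missingCount = expectedCount - actualCount;
        if (missingCount == 0) {
            return -1;
        }
        if (missingCount == 1) {
            return expectedSum - actualSum;
        }
        throw new IllegalStateException(
            missingCount + " numbers missing, summing to " + (expectedSum - actualSum));
    }

    public static void main(String[] args) {
        System.out.println(findSingleMissing(1, 5, new long[]{1, 2, 4, 5})); // 3
        System.out.println(findSingleMissing(1, 3, new long[]{3, 1, 2}));    // -1
    }
}
```

As step 5 notes, with more than one gap this technique only yields the number of missing values and their sum, not the values themselves.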

Recursive isMember method with only two arguments!

I need to create a recursive Boolean method named isMember. The method should accept two arguments ONLY: an array and a value. The method should return true if the value is found in the array, or false if the value is not found in the array.
I think that the base case will be if the passed array is empty, but I need help with the recursive case:
public static boolean isMember(int[] array, int value)
{
if(array.length==0){
return false;
}else{
return isMember(???);
}
}
Here is how it looks with position variable:
public static boolean isMember(int[] array, int value, int position)
{
if (position > -1)
{
if (array[position] == value)
{
return true;
}
else
{
return isMember(array, value, position - 1);
}
}
return false;
}
If you need to use recursion, you can copy the array on each recursive call. This is inefficient, but recursion is inefficient here anyway compared with a plain loop.
public static boolean isMember(int[] array, int value) {
if(array.length == 0) return false;
if(array[0] == value) return true;
int[] array2 = new int[array.length-1];
System.arraycopy(array,1,array2,0,array2.length);
return isMember(array2, value);
}
There is a slight issue with your problem. If you are going to use recursion, then each array element needs to have a subset of elements; otherwise, what do you pass to the recursive method? If this is not the case and the case is as you stated, then solving this problem with recursion is not appropriate. Also, you are missing the value comparison.
See the MSDN Array class. This looks like it is c#. Maybe try the Array.Find<T> method.
Update:
For Java, I'd recommend looking at Arrays (Java 2 Platform):
binarySearch
public static int binarySearch(int[] a, int key)
Searches the specified array of ints for the specified value using the binary search algorithm. The array must be sorted (as by the sort method above) prior to making this call. If it is not sorted, the results are undefined. If the array contains multiple elements with the specified value, there is no guarantee which one will be found.
Parameters:
a - the array to be searched.
key - the value to be searched for.
Returns:
index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size(), if all elements in the list are less than the specified key. Note that this guarantees that the return value will be >= 0 if and only if the key is found.
See Also: sort(int[])
If this is homework and they want it recursive, then maybe you should:
1 look for the middle value of the array and check if it matches. If it matches, return true
2 apply the function to the first half of the array. If it returns true, return true
3 apply the function to the second half of the array. If it returns true, return true
4 return false
No code since it is homework.
EDIT: Is the array ordered?
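For readers who are not bound by the homework constraint, the four halving steps listed above could be sketched as below. The array is assumed to be unordered, so both halves are searched. RecursiveMember is an illustrative class name, and Arrays.copyOfRange is used to produce the halves so that the method keeps exactly two parameters.

```java
import java.util.Arrays;

// Hypothetical demo class for the divide-and-conquer variant.
class RecursiveMember {
    static boolean isMember(int[] array, int value) {
        if (array.length == 0) {
            return false; // base case: nothing left to search
        }
        int mid = array.length / 2;
        if (array[mid] == value) {
            return true; // step 1: the middle element matches
        }
        // Steps 2 and 3: search the elements before and after the middle one.
        return isMember(Arrays.copyOfRange(array, 0, mid), value)
            || isMember(Arrays.copyOfRange(array, mid + 1, array.length), value);
    }

    public static void main(String[] args) {
        int[] data = {4, 2, 9};
        System.out.println(isMember(data, 9)); // true
        System.out.println(isMember(data, 5)); // false
    }
}
```

Halving keeps the recursion depth logarithmic in the array length, although every element is still visited in the worst case because the array is not sorted.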
I was just doing the question, and checking answers for alternative ways. Maybe this might be useful when you have to match names to String arrays.
public class Recursion {
public static void main(String[] args) {
String[] array = {"Tom", "Mary"};
if(isMember(array,"John"))
System.out.print("Found!");
else
System.out.println("Not Found!");
}
public static boolean isMember(String[] array, String name)
{
int i = array.length;
if(array.length == 0)
return false;
if(array[i - 1].equals(name))
return true;
else
{
String[] array2 = new String[array.length - 1];
for(int b = 0; b< array.length -1; b++)
{
array2[b] = array[b];
}
return isMember(array2, name);
}
}
}
