Using a HashMap for search capabilities - java

I need a data structure that provides low search time, but I don't need to store key-value pairs.
All I need is to check whether an element exists in a collection.
I'm thinking of inserting all the values from an array into a HashMap (with the key and value being the same) and then performing searches on it.
Are there alternatives, or is this reasonable?

If you don't want to maintain key-value pairs, consider using java.util.HashSet.
I assume your main use case would be adding elements to it and then calling contains(), which is O(1) on average.
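A minimal sketch of the HashSet approach for the asker's case, assuming the values start out in an int[] (the class name MembershipLookup and the sample values are illustrative, not from the question): copy the array into a HashSet once, then answer each membership query with contains().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

class MembershipLookup {
    // Copy the array into a HashSet once: O(n) to build.
    static Set<Integer> toLookupSet(int[] values) {
        return Arrays.stream(values)
                .boxed()
                .collect(Collectors.toCollection(HashSet::new));
    }

    public static void main(String[] args) {
        int[] values = {4, 8, 15, 16, 23, 42};
        Set<Integer> lookup = toLookupSet(values);
        // Each contains() call is O(1) on average.
        System.out.println(lookup.contains(15)); // true
        System.out.println(lookup.contains(7));  // false
    }
}
```

Building the set is a one-time O(n) cost; it pays off when the same collection is queried many times.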

Why do you need a HashMap for this? Other collections are available, such as ArrayList and LinkedList (both implementations of List).
You specify the type of object a List stores with generics; the diamond operator (<>) lets the compiler infer that type on the right-hand side.
For example, LinkedList<String> stores String values.
Note, however, that contains() on a List is O(n). As the comments suggested, you can use a HashSet instead:
HashSet<String> hashSet = new HashSet<>();
hashSet.add("Item");

You can go with HashSet. Its contains(Object o) method does what you need: it returns true if the element is present and false otherwise.

You can use a Bloom filter. It is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not; in other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set but not removed (though this can be addressed with a "counting" filter), and the more elements are added to the set, the higher the probability of false positives.
Cool article on Bloom Filters.
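To make the Bloom filter idea concrete, here is a toy sketch built on java.util.BitSet. The class name, the bit-array size, and the double-hashing scheme derived from hashCode() are all assumptions for illustration, not a tuned implementation; for real use, a library class such as Guava's com.google.common.hash.BloomFilter is the safer choice.

```java
import java.util.BitSet;

class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Double hashing: derive the i-th bit position from two base hashes of the element.
    private int position(Object element, int i) {
        int h1 = element.hashCode();
        int h2 = Integer.reverse(h1) | 1; // second hash, forced odd so it is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(Object element) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(element, i));
        }
    }

    // false means "definitely not in set"; true means "possibly in set".
    boolean mightContain(Object element) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(element, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 3);
        filter.add("alice");
        filter.add("bob");
        System.out.println(filter.mightContain("alice")); // true: added elements are never missed
        System.out.println(filter.mightContain("carol")); // usually false, but a false positive is possible
    }
}
```

Because add() only ever sets bits, an element that was added always tests positive (no false negatives); an element that was never added can still land on bits set by other elements, which is where false positives come from.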

Related

Is it a good idea to store data as keys in HashMap with empty/null values?

I originally used an ArrayList to store unique values (usernames, i.e. Strings). I later needed to search the ArrayList to check whether a user existed in it. That's O(n) for the search.
My tech lead wanted me to change that to a HashMap and store the usernames as keys in the map, with empty Strings as values.
So, in Java -
hashmap.put("johndoe","");
I can see if this user exists later by running -
hashmap.containsKey("johndoe");
This is O(1) right?
My lead said this was a more efficient way to do this and it made sense to me, but it just seemed a bit off to put null/empty as values in the hashmap and store elements in it as keys.
My question is, is this a good approach? The efficiency beats ArrayList#contains or an array search in general. It works.
My worry is, I haven't seen anyone else do this after a search. I may be missing an obvious issue somewhere but I can't see it.
Since you have a set of unique values, a Set is the appropriate data structure. You can put your values inside HashSet, an implementation of the Set interface.
My lead said this was a more efficient way to do this and it made sense to me, but it just seemed a bit off to put null/empty as values in the hashmap and store elements in it as keys.
The advice of the lead is flawed. Map is not the right abstraction for this, Set is. A Map is appropriate for key-value pairs. But you don't have values, only keys.
Example usage:
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
Set<String> users = new HashSet<>(Arrays.asList("Alice", "Bob"));
System.out.println(users.contains("Alice"));
// -> prints true
System.out.println(users.contains("Jack"));
// -> prints false
Using a Map would be awkward, because what should the type of the values be? That question makes no sense in your use case, as you have just keys, not key-value pairs.
With a Set, you don't need to ask that, the usage is perfectly natural.
This is O(1) right?
Yes, searching in a HashMap or a HashSet is O(1) on average (assuming the hash function spreads the elements well across buckets), while searching in a List or an array is O(n) in the worst case.
Some comments point out that a HashSet is implemented in terms of HashMap.
That's fine at that level of abstraction.
At the level of abstraction of the task at hand (storing a collection of unique usernames), a set is a more natural choice than a map.
This is basically how HashSet is implemented, so I guess you can say it's a good approach. You might as well use HashSet instead of your HashMap with empty values.
For example, HashSet's implementation of add is:
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
where map is the backing HashMap and PRESENT is a dummy value.
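To make the comparison concrete, here is a hypothetical MapBackedSet that follows the same pattern as the add() excerpt: a HashMap whose keys are the elements and whose values all point to one shared dummy object. It is only meant to show why the lead's HashMap idea works; in practice HashSet already does this for you.

```java
import java.util.HashMap;
import java.util.Map;

class MapBackedSet<E> {
    // One shared placeholder value, analogous to HashSet's PRESENT.
    private static final Object PRESENT = new Object();
    private final Map<E, Object> map = new HashMap<>();

    // put() returns the previous value, so null means the element was new.
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSet<String> users = new MapBackedSet<>();
        System.out.println(users.add("johndoe"));      // true
        System.out.println(users.add("johndoe"));      // false: already present
        System.out.println(users.contains("johndoe")); // true
    }
}
```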
My worry is, I haven't seen anyone else do this after a search. I may be missing an obvious issue somewhere but I can't see it.
As I mentioned, the developers of the JDK are using this same approach.

Should you check for a duplicate before inserting into a set

I am learning to use sets. My question is: sets do not contain duplicates. When we try to insert a duplicate, it does not throw any error and automatically removes duplicates. Is it good practice to check whether each value already exists before inserting it into the set? Or is it OK to do something like the code below? I think Java internally does the check using .contains(value). What do you think?
What would be the Big O complexity in both cases, considering there are n elements going into the set?
import java.util.HashSet;
import java.util.Set;
public class DuplicateTest {
public static void main(String[] args) {
Set<Integer> mySet = new HashSet<Integer>();
mySet.add(10);
mySet.add(20);
mySet.add(30);
mySet.add(40);
mySet.add(50);
mySet.add(50);
mySet.add(50);
mySet.add(50);
mySet.add(50);
mySet.add(50);
System.out.println("Contents of the Hash Set :"+mySet);
}
}
As per the docs:
public boolean add(E e)
Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.
So the add() method already tells you, by returning true or false, whether the element was added. You don't need the additional check.
Compare with the API documentation of Set.add(E)
The add method checks if the element is already in the Set. If the element is already present, then the new element is not added, and the Set remains unchanged. In most situations, you don't need to check anything.
The complexity of the method depends on the concrete implementation of Set that you are using.
It's OK not to check. This is the main advantage of Sets over Lists: they automatically filter out duplicates.
HashSet has constant time performance (http://docs.oracle.com/javase/8/docs/api/java/util/HashSet.html)
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets
The add method returns a boolean that you can check to determine whether the item was already in the Set. Whether you check it depends on your needs; it isn't a best practice either way. It's also good to know that add() will not replace an item that is already there, so you can't rely on it to update an existing element with new information if you define equals based on surrogate keys from your database. This is the opposite of how Maps work: Map.put replaces any existing value with the new one and returns the old value.
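The difference between Set.add and Map.put described above can be seen in a short sketch. The User class, whose equality is based only on a hypothetical surrogate id, is an illustrative assumption and not from the original question.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AddVersusPut {
    // Hypothetical entity whose equality is based only on a surrogate id.
    static final class User {
        final int id;
        final String name;

        User(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof User && ((User) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    public static void main(String[] args) {
        Set<User> users = new HashSet<>();
        System.out.println(users.add(new User(1, "old name"))); // true: newly added
        System.out.println(users.add(new User(1, "new name"))); // false: an equal element is already present
        System.out.println(users.iterator().next().name);        // old name: the existing element was kept

        Map<Integer, String> names = new HashMap<>();
        names.put(1, "old name");
        String previous = names.put(1, "new name"); // replaces the value and returns the old one
        System.out.println(previous + " -> " + names.get(1)); // old name -> new name
    }
}
```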
Here are answers to your questions:
When we try to insert duplicates, it does not throw any error and automatically removes duplicates.
Your understanding is not quite correct: nothing is removed. The call to Set.add() simply does not add an item that is already in the set; this applies to all implementations of Set, including HashSet and TreeSet.
Is it a good practice to check each value before inserting into set whether it exists or not? or is it okay to do something like the below code? I think java would be internally doing the check using .contains(value) . What do you think?
Because add() already does that check, you do not need to check each value yourself before inserting it into the set. And yes, internally it does something like contains().
What would be the Big Oh complexity in both the cases considering there are "n" elements going into the set?
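For HashSet, each add() is O(1) on average, so inserting n elements is O(n) overall. For TreeSet (which you didn't use), each add() is O(log n), so n insertions take O(n log n). Calling contains() before each add() doesn't change these bounds; it just performs the same lookup twice.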
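Since add() already reports whether the element was new, a single pass with add() is enough to spot duplicates without a separate contains() call. A minimal sketch; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateFinder {
    // Uses the boolean returned by add() instead of a separate contains() call:
    // n insertions, each O(1) on average, so O(n) overall for a HashSet.
    static <T> List<T> findDuplicates(Iterable<T> items) {
        Set<T> seen = new HashSet<>();
        List<T> duplicates = new ArrayList<>();
        for (T item : items) {
            if (!seen.add(item)) { // false: item was already in the set
                duplicates.add(item);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(10, 20, 30, 40, 50, 50, 50);
        System.out.println(findDuplicates(input)); // [50, 50]
    }
}
```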

Java data structure to allow boolean flags on objects and sorting?

I want a set of objects, each with a boolean that marks it as "visited" or not. Naturally I thought of a Map that tells me whether an object has already been visited. But I also want the objects sorted, so that answering "Which is the 'smallest' visited object?" takes at most O(n) on that data structure.
In my specific case the objects are Date objects, but that's irrelevant.
Objects can be added to that data structure at any moment, and will be entered with 'false' values.
Use a SortedSet. When an object is visited, add it to the set. To find out if an object was visited, just use set.contains(). To find the smallest object:
T smallest = set.isEmpty() ? null : set.iterator().next();
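A minimal sketch of the SortedSet answer, storing only visited objects in a TreeSet so that absence means "not visited". The VisitTracker class is hypothetical, and java.time.LocalDate stands in for the question's Date (java.util.Date is also Comparable and would work the same way). On a non-empty TreeSet, first() returns the same element as iterator().next().

```java
import java.time.LocalDate;
import java.util.NavigableSet;
import java.util.TreeSet;

class VisitTracker<T extends Comparable<? super T>> {
    // Only visited objects are stored; absence from the set means "not visited".
    private final NavigableSet<T> visited = new TreeSet<>();

    void markVisited(T item) {
        visited.add(item); // O(log n)
    }

    boolean isVisited(T item) {
        return visited.contains(item); // O(log n)
    }

    // Smallest visited element, or null if nothing has been visited yet.
    T smallestVisited() {
        return visited.isEmpty() ? null : visited.first();
    }

    public static void main(String[] args) {
        VisitTracker<LocalDate> tracker = new VisitTracker<>();
        tracker.markVisited(LocalDate.of(2024, 3, 1));
        tracker.markVisited(LocalDate.of(2023, 12, 25));
        System.out.println(tracker.isVisited(LocalDate.of(2024, 3, 1))); // true
        System.out.println(tracker.smallestVisited());                    // 2023-12-25
    }
}
```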
You could use a Map<Boolean, TreeSet<Object>>, keeping all visited objects in the set mapped to true and all unvisited ones in the set mapped to false (assuming you're not dealing with duplicate objects). Insertion into a TreeSet runs in O(log n) time, and to get the smallest visited object you would call first(), which also runs in O(log n) time.
What you need is Guava's TreeMultiset. Please read about Multiset: the TreeMultiset implementation maintains the ordering of its elements, and you can supply a custom Comparator to define that ordering. A Multiset also counts how many times each element was added, which lets you find the most frequently visited object.
https://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained
If you use it, you won't need two-type-parameter structures such as Map<Something, Something>, and sorting comes out of the box once you implement a Comparator.

How to avoid duplicate strings in Java?

I want to add specific words from a text into a vector, but I want to avoid adding duplicate strings. The first thing that comes to mind is to compare each new string against all existing ones before adding it, but as the number of entries grows, this becomes really inefficient. The only "time efficient" solution I can think of is the unordered_multimap container that was added in C++11, but I couldn't find a Java equivalent. I was thinking of adding the strings to a map and, at the end, copying all entries to the vector; that would be much more efficient than the first solution. Is there a Java library that does what I want? If not, is there a Java equivalent of C++'s unordered_multimap that I missed?
You can use a Set<String> collection. It does not allow duplicates. You can then choose one of these implementations:
1) HashSet if you do not care about the order of elements (Strings).
2) LinkedHashSet if you want to keep the elements in the inserting order.
3) TreeSet if you want the elements to be sorted.
For example:
Set<String> mySet = new TreeSet<String>();
mySet.add("a_String");
...
Vector is "old-fashioned" (a legacy class) in Java. You had better avoid it and use a List such as ArrayList instead.
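A sketch of the Set-based suggestion applied to the original question: collect the words in a LinkedHashSet (duplicates are ignored and insertion order is kept), then copy them into a List once at the end, much as the asker planned to copy entries into a vector. Splitting the text on whitespace is an assumption; the question doesn't say how the "specific words" are selected, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UniqueWords {
    // Collect words into a LinkedHashSet (duplicates ignored, first-seen order kept),
    // then copy them into a List once at the end.
    static List<String> uniqueWords(String text) {
        Set<String> seen = new LinkedHashSet<>();
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty()) { // split() yields "" when the text starts with whitespace
                seen.add(word);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        System.out.println(uniqueWords("the cat and the hat")); // [the, cat, and, hat]
    }
}
```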
You can use a set (java.util.Set):
Set<String> i_dont_allow_duplicates = new HashSet<String>();
i_dont_allow_duplicates.add(my_string);
i_dont_allow_duplicates.add(my_string); // won't add 'my_string' this time
HashSet will do the job most efficiently, and if you want to keep insertion order, you can use LinkedHashSet.
Use a Set. A HashSet will do fine if you do not need to preserve order. A LinkedHashSet works if you need that.
You should consider using a Set:
A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. As implied by its name, this interface models the mathematical set abstraction.
HashSet should be good for your use:
HashSet class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
So simply define a Set like this and use it appropriately:
Set<String> myStringSet = new HashSet<String>();
Set<String> set = new HashSet<String>();
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.
This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
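The hashCode contract matters as soon as you store your own objects, rather than Strings, in a HashSet: equal objects must hash the same, or the set cannot detect duplicates. A minimal sketch with a hypothetical Word class that treats words case-insensitively; normalizing once in the constructor keeps equals() and hashCode() consistent by construction.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class Word {
    private final String normalized;

    Word(String text) {
        // Normalize once so equals() and hashCode() use exactly the same key.
        this.normalized = text.toLowerCase(Locale.ROOT);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Word && ((Word) o).normalized.equals(normalized);
    }

    // Equal Words have equal normalized strings, hence equal hash codes.
    @Override
    public int hashCode() {
        return normalized.hashCode();
    }

    @Override
    public String toString() {
        return normalized;
    }

    public static void main(String[] args) {
        Set<Word> words = new HashSet<>();
        words.add(new Word("Java"));
        words.add(new Word("java"));
        System.out.println(words.size()); // 1
    }
}
```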

How to check if a key in a Map starts with a given String value

I'm looking for a method like:
myMap.containsKeyStartingWith("abc"); // returns true if there's a key starting with "abc" e.g. "abcd"
or
MapUtils.containsKeyStartingWith(myMap, "abc"); // same
I wondered if anyone knew of a simple way to do this
Thanks
This can be done with a standard SortedMap (such as TreeMap):
Map<String,V> tailMap = myMap.tailMap(prefix);
boolean result = (!tailMap.isEmpty() && tailMap.firstKey().startsWith(prefix));
Unsorted maps (e.g. HashMap) don't intrinsically support prefix lookups, so for those you'll have to iterate over all keys.
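A runnable version of the SortedMap approach, assuming a TreeMap with natural String ordering (a custom comparator would break the reasoning). The class name PrefixLookup is hypothetical. Every key that starts with the prefix compares greater than or equal to the prefix, and no key lacking the prefix can sort between the prefix and such a key, so inspecting the first key of tailMap(prefix) is enough; on a TreeMap this costs O(log n).

```java
import java.util.SortedMap;
import java.util.TreeMap;

class PrefixLookup {
    // Keys that start with `prefix` form a contiguous run beginning at the first
    // key >= prefix (natural String ordering), so checking that one key is enough.
    static boolean containsKeyStartingWith(SortedMap<String, ?> map, String prefix) {
        SortedMap<String, ?> tail = map.tailMap(prefix);
        return !tail.isEmpty() && tail.firstKey().startsWith(prefix);
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> map = new TreeMap<>();
        map.put("abcd", 1);
        map.put("xyz", 2);
        System.out.println(containsKeyStartingWith(map, "abc")); // true
        System.out.println(containsKeyStartingWith(map, "abd")); // false
    }
}
```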
From the map you can get the Set of keys. If the keys are Strings, you can iterate over that Set and check each element with startsWith("abc").
To build on Adel Boutros's answer/comment about the efficiency of iterating over the keys: you could encapsulate the key iteration in a Map subclass or decorator.
Extending HashMap would give you a class to put the method in and keep the map-specific code out of your own method, which lowers complexity and makes the code read more naturally.
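A sketch of the subclass idea, using a hypothetical PrefixAwareHashMap. Because a HashMap has no key order, the method still scans every key (O(n)); the subclass only gives that scan a single home. A decorator that wraps a Map instead of extending HashMap would work just as well.

```java
import java.util.HashMap;

// Hypothetical subclass: keeps the prefix scan in one place.
class PrefixAwareHashMap<V> extends HashMap<String, V> {
    private static final long serialVersionUID = 1L;

    // HashMap has no key ordering, so this is a linear scan over all keys: O(n).
    boolean containsKeyStartingWith(String prefix) {
        for (String key : keySet()) {
            if (key != null && key.startsWith(prefix)) { // HashMap permits a null key
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PrefixAwareHashMap<Integer> map = new PrefixAwareHashMap<>();
        map.put("abcd", 1);
        System.out.println(map.containsKeyStartingWith("abc")); // true
        System.out.println(map.containsKeyStartingWith("xyz")); // false
    }
}
```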
