Inside an iterative algorithm I'm using a HashSet that is dynamically enlarged at each iteration by adding new objects (via the add method). Very frequently I check whether a generated object has already been put into the HashSet by using the contains method. Note that the HashSet may contain several thousand objects.
Here follows a citation from the doc about class HashSet:
"This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets."
Apart from other considerations provided inside the doc (not reported for simplicity), I see that add and contains are executed in constant time.
Please, can you suggest another data structure in Java that provides better performance for the "contains" operation with respect to my problem?
Classes from Apache Commons or Guava are also accepted.
The performance of HashSet.contains() will be as good as you can get provided your objects have a properly implemented hashCode() method. That will ensure proper distribution among the buckets.
See Best implementation for hashCode method
As other answers have already stated, "constant time" is the best runtime behaviour you can get.
Whether you actually get it depends on your hashCode() implementation, but since you use the NetBeans-generated one you shouldn't be too bad there.
As to how to keep the "constant time" as small as possible:
Try to allocate your HashSet large enough from the very beginning to avoid costly rehash operations.
You can cache the calculated hash code the first time hashCode() is called and return the cached value later on. There should be no need for a triggering mechanism that clears the cache on object updates, since your relevant fields should be immutable - if they aren't, you are bound to run into trouble with HashSet anyway.
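A minimal sketch of those two tips, assuming an immutable element class (the Element class and its state field are made up for illustration):
import java.util.HashSet;
import java.util.Set;

final class Element {
    private final int state;
    private int cachedHash; // 0 means "not computed yet"

    Element(int state) {
        this.state = state;
    }

    @Override
    public int hashCode() {
        int h = cachedHash;
        if (h == 0) { // compute once, reuse on later calls
            h = 31 + state;
            cachedHash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Element && ((Element) o).state == state;
    }
}

class PreSizedSetExample {
    public static void main(String[] args) {
        int expectedElements = 10_000;
        // pre-size so that expectedElements fit below the default load factor (0.75)
        // and no rehash happens while the algorithm runs
        Set<Element> seen = new HashSet<>((int) (expectedElements / 0.75f) + 1);
        seen.add(new Element(42));
        System.out.println(seen.contains(new Element(42))); // true
    }
}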
You can let your object remember whether it has been put into that HashSet: just add a boolean field that is set when the object is added. Then you don't need to call contains on the HashSet at all, you only read the field value of your object. This only works if the object is put into exactly one HashSet that checks the boolean field.
It can be extended to a fixed number of HashSets by keeping a java.util.BitSet inside the contained object, where every HashSet is identified by a unique integer index; this requires the number of HashSets to be known before the algorithm starts.
Since you say that you call contains frequently, it also makes sense to replace newly generated objects with equal existing objects (object pooling); the overhead of pooling amortizes because contains becomes a single field read.
As requested here is some sample code. The special set implementation is about 4 times faster than a normal hash set on my machine. However the question is how well this code reflects your use case.
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class FastSetContains {
public static class SetContainedAwareObject {
private final int state;
private boolean contained;
public SetContainedAwareObject(int state) {
this.state = state;
}
public void markAsContained() {
contained = true;
}
public boolean isContained() {
return contained;
}
public void markAsRemoved() {
contained = false;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + state;
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
SetContainedAwareObject other = (SetContainedAwareObject) obj;
if (state != other.state)
return false;
return true;
}
}
public static class FastContainsSet extends
HashSet<SetContainedAwareObject> {
@Override
public boolean contains(Object o) {
SetContainedAwareObject obj = (SetContainedAwareObject) o;
if (obj.isContained()) {
return true;
}
return super.contains(o);
}
@Override
public boolean add(SetContainedAwareObject e) {
boolean add = super.add(e);
e.markAsContained();
return add;
}
@Override
public boolean addAll(Collection<? extends SetContainedAwareObject> c) {
boolean addAll = super.addAll(c);
for (SetContainedAwareObject o : c) {
o.markAsContained();
}
return addAll;
}
@Override
public boolean remove(Object o) {
boolean remove = super.remove(o);
((SetContainedAwareObject) o).markAsRemoved();
return remove;
}
@Override
public boolean removeAll(Collection<?> c) {
boolean removeAll = super.removeAll(c);
for (Object o : c) {
((SetContainedAwareObject) o).markAsRemoved();
}
return removeAll;
}
}
private static final Random random = new Random(1234L);
private static final int additionalObjectsPerIteration = 10;
private static final int iterations = 100000;
private static final int differentObjectCount = 100;
private static final int containsCountPerIteration = 50;
private static long nanosSpentForContains;
public static void main(String[] args) {
Map<SetContainedAwareObject, SetContainedAwareObject> objectPool = new HashMap<>();
// switch the comments to use a different Set implementation
//Set<SetContainedAwareObject> set = new FastContainsSet();
Set<SetContainedAwareObject> set = new HashSet<>();
//warm up
for (int i = 0; i < 100; i++) {
addAdditionalObjects(objectPool, set);
callSetContainsForSomeObjects(set);
}
objectPool.clear();
set.clear();
nanosSpentForContains = 0L;
for (int i = 0; i < iterations; i++) {
addAdditionalObjects(objectPool, set);
callSetContainsForSomeObjects(set);
}
System.out.println("nanos spent for contains: " + nanosSpentForContains);
}
private static void callSetContainsForSomeObjects(
Set<SetContainedAwareObject> set) {
// do at most containsCountPerIteration lookups per iteration
int containsCount = set.size() > containsCountPerIteration ? containsCountPerIteration
        : set.size();
int[] indexes = new int[containsCount];
for (int i = 0; i < containsCount; i++) {
indexes[i] = random.nextInt(set.size());
}
Object[] elements = set.toArray();
long start = System.nanoTime();
for (int index : indexes) {
set.contains(elements[index]);
}
long end = System.nanoTime();
nanosSpentForContains += (end - start);
}
private static void addAdditionalObjects(
Map<SetContainedAwareObject, SetContainedAwareObject> objectPool,
Set<SetContainedAwareObject> set) {
for (int i = 0; i < additionalObjectsPerIteration; i++) {
SetContainedAwareObject object = new SetContainedAwareObject(
random.nextInt(differentObjectCount));
SetContainedAwareObject pooled = objectPool.get(object);
if (pooled == null) {
objectPool.put(object, object);
pooled = object;
}
set.add(pooled);
}
}
}
Another edit:
Using the following as the Set.contains implementation makes it about 8 times faster than a normal HashSet:
@Override
public boolean contains(Object o) {
SetContainedAwareObject obj = (SetContainedAwareObject) o;
return obj.isContained();
}
EDIT:
This technique has something in common with OpenJPA's class enhancement. The enhancement in OpenJPA enables a class to track its persistent state, which is used by the entity manager. The suggested method enables an object to track whether it is contained in a set, which is used by the algorithm.
Related
I am loading data on network traffic from a file. The information I'm loading is attacker IP address, victim IP address, and date. I've combined these data into a Traffic object, for which I've defined the hashCode and equals functions. Despite this, the HashMap I'm loading them into treats identical Traffic objects as different keys. The entire Traffic object complete with some simple test code in the main method follows:
import java.util.HashMap;
public class Traffic {
public String attacker;
public String victim;
public int date;
//constructors, getters and setters
@Override
public int hashCode() {
long attackerHash = 1;
for (char c:attacker.toCharArray()) {
attackerHash = attackerHash * Character.getNumericValue(c) + 17;
}
long victimHash = 1;
for (char c:victim.toCharArray()) {
victimHash = victimHash * Character.getNumericValue(c) + 17;
}
int IPHash = (int)(attackerHash*victimHash % Integer.MAX_VALUE);
return (IPHash + 7)*(date + 37) + 17;
}
public boolean equals(Traffic t) {
return this.attacker.equals(t.getAttacker()) && this.victim.equals(t.getVictim()) && this.date == t.getDate();
}
public static void main(String[] args) {
Traffic a = new Traffic("209.167.099.071", "172.016.112.100", 7);
Traffic b = new Traffic("209.167.099.071", "172.016.112.100", 7);
System.out.println(a.hashCode());
System.out.println(b.hashCode());
HashMap<Traffic, Integer> h = new HashMap<Traffic, Integer>();
h.put(a, new Integer(1));
h.put(b, new Integer(2));
System.out.println(h);
}
}
I can't speak to the strength of my hash method, but the outputs of the first two prints are identical, meaning it at least holds for this case.
Since a and b are identical in data (and therefore equals returns true), and the hashes are identical, the HashMap should recognize them as the same and update the value from 1 to 2 instead of creating a second entry with value 2. Unfortunately, it does not recognize them as the same and the output of the final print is the following:
{packagename.Traffic@1c051=1, packagename.Traffic@1c051=2}
My best guess at this is that HashMap's internal workings are ignoring my custom hashCode and equals methods, but if that's the case then why? And if that guess is wrong then what is happening here?
The problem here is your equals method, which does not override Object#equals. To prove this, the following will not compile with the @Override annotation:
@Override
public boolean equals(Traffic t) {
return this.attacker.equals(t.getAttacker()) &&
this.victim.equals(t.getVictim()) &&
this.date == t.getDate();
}
The implementation of HashMap uses Object#equals and not your custom implementation. Your equals method should accept an Object as a parameter instead:
@Override
public boolean equals(Object o) {
if (!(o instanceof Traffic)) {
return false;
}
Traffic t = (Traffic) o;
return Objects.equals(attacker, t.attacker) &&
Objects.equals(victim, t.victim) &&
date == t.date;
}
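As a side note (not required for the fix), a hashCode that is automatically consistent with this equals could simply delegate to java.util.Objects - just a suggestion, not part of the original code:
@Override
public int hashCode() {
    // equal attacker/victim/date values produce equal hashes, matching the equals above
    return Objects.hash(attacker, victim, date);
}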
I have a small bug probably stemming from my misunderstanding of HashMap and it's killing me. I've included a small snippet of test code that illustrates the problem.
I omitted the Prefix class for conciseness, but my prefixes are just arrays of words. They are immutable, so when they are constructed they clone the array of strings passed into the constructor. hashCode() and equals() are implemented so that the conditionals below pass. Essentially the problem is that I can only look up the suffix list using prefix1 and not prefix2 (it returns null in the latter case).
FYI, my Hashmap is simply declared as:
// Stores mappings between "prefixes" (consecutive word phrases) and "suffixes" (successor words).
private Map<Prefix, ArrayList<String>> prefixSuffixPairs;
Any help is appreciated.
ArrayList<String> suffixInList = new ArrayList<String>();
suffixInList.add("Suffix1");
suffixInList.add("Suffix2");
String[] prefixWords1 = new String[] {"big", "the"};
Prefix prefix1 = new Prefix(prefixWords1);
String[] prefixWords2 = new String[] {"big", "the"};
Prefix prefix2 = new Prefix(prefixWords2);
prefixSuffixPairs.put(prefix1, suffixInList);
if(prefix1.hashCode() == prefix2.hashCode()) {
System.out.println("HASH CODE MATCH");
}
if(prefix1.equals(prefix2)) {
System.out.println("VALUES MATCH");
}
ArrayList<String> suffixOutList = null;
suffixOutList = prefixSuffixPairs.get(prefix2);
suffixOutList = prefixSuffixPairs.get(prefix1);
public int hashCode() {
int result = 1;
for( int i = 0; i< words.length; i++ )
{
result = result * HASH_PRIME + words[i].hashCode();
}
return result;
}
public boolean equals(Prefix prefix) {
if(prefix.words.length != words.length) {
return false;
}
for(int i = 0; i < words.length; i++) {
if(!prefix.words[i].equals(words[i])) {
return false;
}
}
return true;
}
public boolean equals(Prefix prefix) {
That does not override Object#equals (and thus is not used by the HashMap).
You are merely providing an unrelated (overloaded) method of the same name - but you can delegate to it from the overriding method shown below:
Try
@Override
public boolean equals(Object prefix) {
The @Override is not strictly necessary, but it would have enabled the compiler to detect the problem if you had applied it to your first method (you get an error when a method claims to override but doesn't).
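For illustration, here is what a complete overriding method could look like, delegating to the existing equals(Prefix) overload (a sketch based on the Prefix class shown above):
@Override
public boolean equals(Object obj) {
    if (this == obj) {
        return true;
    }
    if (!(obj instanceof Prefix)) {
        return false;
    }
    // delegate to the existing equals(Prefix) overload
    return equals((Prefix) obj);
}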
I was asked this in an interview. Using Google Guava or a MultiMap is not an option.
I have a class
public class Alpha
{
String company;
int local;
String title;
}
I have many instances of this class (in order of millions). I need to process them and at the end find the unique ones and their duplicates.
e.g.
instance1 --> instance1, instance5, instance7 (instance1 has instance5 and instance7 as duplicates)
instance2 --> instance2 (no duplicates for instance 2)
My code works fine
declare datastructure
HashMap<Alpha,ArrayList<Alpha>> hashmap = new HashMap<Alpha,ArrayList<Alpha>>();
Add instances
for (Alpha x : arr)
{
ArrayList<Alpha> list = hashmap.get(x); ///<<<<---- doubt about this. comment#1
if (list == null)
{
list = new ArrayList<Alpha>();
hashmap.put(x, list);
}
list.add(x);
}
Print instances and their duplicates.
for (Alpha x : hashmap.keySet())
{
ArrayList<Alpha> list = hashmap.get(x); //<<< doubt about this. comment#2
System.out.println(x + "<---->");
for(Alpha y : list)
{
System.out.print(y);
}
System.out.println();
}
Question: My code works, but why? When I do hashmap.get(x) (comment#1 in the code), it is possible that two different instances have the same hashcode. In that case, I would add two different objects to the same list.
When I retrieve, I should get a list which has two different instances (comment#2), and when I iterate over the list I should see at least one instance which is not a duplicate of the key but still exists in the list. I don't. Why? I even tried returning a constant value from my hashCode function, and it still works fine.
If you want to see my implementation of equals and hashCode,let me know.
Bonus question: Any way to optimize it?
Edit:
@Override
public boolean equals(Object obj) {
if (obj==null || obj.getClass()!=this.getClass())
return false;
if (obj==this)
return true;
Alpha guest = (Alpha)obj;
return guest.getLocal()==this.getLocal()
&& guest.getCompany() == this.getCompany()
&& guest.getTitle() == this.getTitle();
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + (title==null?0:title.hashCode());
result = prime * result + local;
result = prime * result + (company==null?0:company.hashCode());
return result;
}
it is possible that two different instances might have same hashcode
Yes, but the hashCode method is only used to identify the bucket in which to store the element. Two or more keys can have the same hashCode, but that's why they are also compared using equals.
From Map#containsKey javadoc:
Returns true if this map contains a mapping for the specified key. More formally, returns true if and only if this map contains a mapping for a key k such that (key==null ? k==null : key.equals(k)). (There can be at most one such mapping.)
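To make the collision behaviour concrete, here is a small self-contained demo (the Key class is made up for this example): two keys with the same hash code still form separate entries as long as equals says they differ.
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // key whose hashCode is intentionally constant, so every instance collides
    static final class Key {
        final String name;
        Key(String name) { this.name = name; }

        @Override
        public int hashCode() { return 1; } // all keys land in the same bucket

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = new HashMap<>();
        map.put(new Key("a"), 1);
        map.put(new Key("b"), 2); // same hash, not equal -> new entry
        map.put(new Key("a"), 3); // same hash, equal key -> value replaced
        System.out.println(map.size()); // 2
    }
}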
Some enhancements to your current code:
Code to interfaces. Declare the variable as Map and instantiate it with HashMap; similarly List and ArrayList.
Compare Strings (and objects in general) using the equals method. == compares references, while equals compares the data stored in the object, depending on how that method is implemented. So, change the code in Alpha#equals (local is an int, so == stays correct for it):
public boolean equals(Object obj) {
if (obj==null || obj.getClass()!=this.getClass())
return false;
if (obj==this)
return true;
Alpha guest = (Alpha)obj;
return guest.getLocal() == this.getLocal() // local is an int, == is fine here
        && guest.getCompany().equals(this.getCompany())
        && guest.getTitle().equals(this.getTitle());
}
When iterating over all the entries of a map, use Map#entrySet instead; you avoid the extra Map#get lookups (since get is supposed to be O(1) you won't save much, but it is still better):
// assumes the map is declared as Map<Alpha, List<Alpha>>, as suggested in the first point
for (Map.Entry<Alpha, List<Alpha>> entry : hashmap.entrySet()) {
List<Alpha> list = entry.getValue();
System.out.println(entry.getKey() + "<---->");
for(Alpha y : list) {
System.out.print(y);
}
System.out.println();
}
Use equals together with hashCode to resolve the collision state.
Steps:
First compare on the basis of title in hashCode().
If the title is the same, then equals() compares the company name to resolve the collision.
Sample code
class Alpha {
String company;
int local;
String title;
public Alpha(String company, int local, String title) {
this.company = company;
this.local = local;
this.title = title;
}
@Override
public int hashCode() {
return title.hashCode();
}
@Override
public boolean equals(Object obj) {
if (obj instanceof Alpha) {
return this.company.equals(((Alpha) obj).company);
}
return false;
}
}
...
Map<Alpha, ArrayList<Alpha>> hashmap = new HashMap<Alpha, ArrayList<Alpha>>();
hashmap.put(new Alpha("a", 1, "t1"), new ArrayList<Alpha>());
hashmap.put(new Alpha("b", 2, "t1"), new ArrayList<Alpha>());
hashmap.put(new Alpha("a", 3, "t1"), new ArrayList<Alpha>());
System.out.println("Size : "+hashmap.size());
Output
Size : 2
I want to compare a database dump to XML and *.sql files. In the debugger, toRemove and toAdd differ only in size: toRemove has size 3, toAdd has size 4. But after running the removeAll calls in the code below, toRemove still has size 3 and toAdd still has size 4. What's wrong?
final DBHashSet fromdb = new DBHashSet(strURL, strUser, strPassword);
final DBHashSet fromxml = new DBHashSet(namefile);
Set<DBRecord> toRemove = new HashSet<DBRecord>(fromdb);
toRemove.removeAll(fromxml);
Set<DBRecord> toAdd = new HashSet<DBRecord>(fromxml);
toAdd.removeAll(fromdb);
Update:
public class DBRecord {
public String depcode;
public String depjob;
public String description;
public DBRecord(String newdepcode, String newdepjobe, String newdesc) {
this.depcode = newdepcode;
this.depjob = newdepjobe;
this.description = newdesc;
}
public String getKey() {
return depcode + depjob;
}
public boolean IsEqualsKey(DBRecord rec) {
return (this.getKey().equals(rec.getKey()));
}
public boolean equals(Object o) {
if (o == this)
return true;
if (o == null)
return false;
if (!(getClass() == o.getClass()))
return false;
else {
DBRecord rec = (DBRecord) o;
if ((rec.depcode.equals(this.depcode)) && (rec.depjob.equals(this.depjob)))
return true;
else
return false;
}
}
}
In order to properly use HashSet (and HashMap, for that matter), you must implement a hashCode() as per the following contract:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
The code you've supplied for DBRecord does not override it, hence the problem.
You'd probably want to override it in the following way, or something similar:
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + depcode.hashCode();
result = prime * result + depjob.hashCode();
return result;
}
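With that hashCode in place alongside the existing equals, the set difference behaves as expected. A quick check, using the DBRecord constructor from the question (a sketch, not part of the original code):
Set<DBRecord> db = new HashSet<>();
db.add(new DBRecord("D1", "J1", "x"));
db.add(new DBRecord("D2", "J2", "y"));
Set<DBRecord> xml = new HashSet<>();
xml.add(new DBRecord("D1", "J1", "a different description"));
db.removeAll(xml);
// only D2/J2 remains: equals and hashCode ignore the description field
System.out.println(db.size()); // 1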
Since arrays only have the identity-based equals inherited from Object, they don't work well with Set.
Hence, I wonder how to make a set of arrays in Java?
One possible way could be to put each array in a wrapper object and implement the equals function for that class, but would that decrease the performance too much?
Don't use raw arrays unless you absolutely have to because of some legacy API that requires an array.
Always try to use a type-safe ArrayList<T> instead and you won't have this kind of issue.
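For example, a set of lists compares by content out of the box, because List defines equals and hashCode element-wise (a minimal sketch):
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ListSetExample {
    public static void main(String[] args) {
        Set<List<String>> set = new HashSet<>();
        set.add(Arrays.asList("one", "two"));
        set.add(Arrays.asList("one", "two")); // equal content -> treated as a duplicate
        System.out.println(set.size()); // 1
        System.out.println(set.contains(Arrays.asList("one", "two"))); // true
    }
}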
If you make your Set an instance of TreeSet, you can specify a custom Comparator which will be used for all comparisons (even equality).
You could create a wrapper class for your array and override hashcode and equals accordingly.
For example:
public class MyArrayContainer {
int[] myArray = new int[100];
@Override
public boolean equals(Object other) {
if (null != other && other instanceof MyArrayContainer) {
MyArrayContainer o = (MyArrayContainer) other;
final int myLength = myArray.length;
if (o.myArray.length != myLength) {
return false;
}
for (int i = 0; i < myLength; i++) {
if (myArray[i] != o.myArray[i]) {
return false;
}
}
return true;
}
return false;
}
@Override
public int hashCode() {
return myArray.length;
}
}
Since Java 9, you can use the Arrays::compare method as the comparator of a TreeSet; it compares the contents of the arrays.
Set<String[]> set = new TreeSet<>(Arrays::compare);
String[] val1 = {"one", "two"};
String[] val2 = {"one", "two"};
String[] val3 = {"one", "two"};
set.add(val1);
set.add(val2);
System.out.println(set.size()); // 1
System.out.println(set.contains(val1)); // true
System.out.println(set.contains(val2)); // true
System.out.println(set.contains(val3)); // true
See also: Check if an array exists in a HashSet<int[]>
It is better to use lists for this problem.
Since the ArrayList class already wraps an array, you can extend it and override the equals and hashCode methods. Here is a sample:
public class MyArrayList extends ArrayList<MyClass> {
@Override
public boolean equals(Object o) {
if (o instanceof MyArrayList) {
//place your comparison logic here
return true;
}
return false;
}
@Override
public int hashCode() {
//just a sample, you can place your own code
return super.hashCode();
}
}
UPDATE:
You can even override it for a generic use, just changing the code to:
public class MyArrayList<T> extends ArrayList<T> {
//override the methods you need
@Override
public boolean equals(Object o) {
if (o instanceof MyArrayList) {
//place your comparison logic here
return true;
}
return false;
}
}
A class that implements Set and overrides the equals method could do it.