hashmap or hashset?

hashmap or hashset? - java

I have two list containing
List<MyObj>.
and MyObj has a "String ID" member.
I need to iterate them from time to time and sometimes I need to find objects which are similar on both.
I want a quicker way than lists. so I can know I can use hashMap
(than ask contains (String ) when comparing )
should I use hashmap or hashset?
note: in a hashset - I need to do implement my equals and when I run contains() - i think it will be slower than hashmap where upon inserting I put the string id in the key.
Am I correct?

note: in a hashset - I need to do implement my equals and when I run contains() - i think it will be slower than hashmap where upon inserting I put the string id in the key. Am I correct?
I don't think you would notice any performance difference. HashSet<E> is implemented using a HashMap<E, E> under the hood. So the only difference would be calling MyObj.equals() (which supposedly calls String.equals()) vs calling String.equals() directly. And the JIT compiler is pretty good at inlining calls...
The bottom line is, you should (almost) never worry about micro-optimizations, rather focus on making your design simple and consistent. If your only concern is to avoid duplication and to check for containment, a Set is a more logical choice.

This does not really make a difference at all, because when you look at the JDK source code, the Sun implementation of HashSet uses an instance of HashMap internally to store its values:
public class HashSet<E>
extends AbstractSet<E>
implements Set<E>, Cloneable, java.io.Serializable
{
static final long serialVersionUID = -5024744406713321676L;
private transient HashMap<E,Object> map;
// Dummy value to associate with an Object in the backing Map
......
And even if that was not the case, all other answers about that it does not really make a difference from a performance POV apply. The only real difference is that instead of using the equals() and hashCode() implementations of your key class you need to write your own for using the Set - but those could be as simple as delegating to the id field of your class, in case that the id field is the unique identifier.

Well, using HashMap you will be forced to store data in this way :
<ID1><MyObject>
<ID2><MyObject>
That isn't the best way, because you already have ID field in MyObject.
Using HashSet you will be able to store only unique instances of MyObject and you also will need to implement hashCode() in MyObject.
It's up to you to choose.

Related

Practical example for Immutable class

It is obvious that immutability increases the re-usability since it creates new object in each state change.Can somebody tells me a practical scenario where we need a immutable class ?

Consider java.lang.String. If it weren't immutable, every time you ever have a string you want to be confident wouldn't change underneath you, you'd have to create a copy.
Another example is collections: it's nice to be able to accept or return a genuinely immutable collection (e.g. from Guava - not just an immutable view on a mutable collection) and have confidence that it won't be changed.
Whether those count as "needs" or not, I don't know - but I wouldn't want to develop without them.

A good example is related to hashing. A class overrides the equals() and hashCode() methods so that it can be used in data structures like HashSet and (as keys in) HashMap, and the hash code is typically derived by some identifying member attributes. However, if these attributes were to change then so would the object's hash code, so the object is no longer usable in a hashing data structure.

Java provides a nice example: String.

This article has a good color example (since color definitions don't change).
http://www.ibm.com/developerworks/java/library/j-jtp02183/index.html

How to make class usable in different HashMaps in Java

I have a class Attribute which has 2 variables say int a,b;
I want to use class Attribute in two different HashSet.
The first hash set considers objects as equal when the value of a is same.
But the second hash set considers objects as equal when the value of b is same.
I know if I override the equals method the hashset will use the overriden version of equals to compare two objects but in this case I would need two different implementations of equals()
One way is to create two subclasses of attribute and provide them with different equals method but I want to know if there is a better way to do it such that I dont have to create subclass of Attribute.
Thanks.

One possible solution is to not use HashSet, but use TreeSet instead. It's the same Set interface, but there is a TreeSet constructor that lets you pass in a Comparator. That way you could leave the Attribute class unchanged- just create two different comparators and use it like
Set<Attribute> setA = new TreeSet<Attribute>(comparatorForA);
Set<Attribute> setB = new TreeSet<Attribute>(comparatorForB);
The comparator takes care of the equality check (e.g. if compare returns 0, the objects are equal)

Unfortunately there's no "Equalizer" class that can override the equals logic. There is such a thing for sorting, where you can either use natural sorting based on the Comparable implementation or provide your own Comparator. I've actually wondered why there's no such thing for equality checks.
Since the semantics of equality are defined by a class and could be considered a trait of that class, the two subclasses approach seems the most natural. Maybe someone knows a useful pattern for doing this in a more simple manner, but I've never encountered it.
EDIT: just thought of something... you could use two Map instances, like HashMap, with the first one using a as key and the second using b as key. It'd let you detect collisions. You could then simply link the attribute to the associated instance.

I did some thing different, Instead of using the HashSet, I have used HashMap where I have used int a as a key in first HashMap and the object is stored as value.
And in the other HashMap I have kept the key as int b and the object as value.
This provides me a way to Hash on both the variables a and b so I dont have to make any sub classes.
And also, I get O(1) time instead of O(log n). But I know I am paying the price by using some more memory but my main concern was time so I chose HashMap over TreeSet.
Thank you all for your comments and suggestions.

It would be very easy to modify HashMap and HashSet to accept hashing and equality-testing strategies.
public interface Hasher {
int hashCode(Object o);
}
public interface Equalizer {
int areEqual(Object o1, Object o2);
}

A simple solution is to bypass HashSet and use HashMap directly. For the first, store each Attribute using its a property as the key, and for the other use b.

I can propose a bit hacky but lesser effort solution :)
Swap the values of a and b when storing in second hashset so that uniqueness is defined by value of b and then when reading the class from hashset then swap the value of a and b again to retain the original state. So the same equals/hascode methods will serve the purpose.

In Java, what precautions should be taken when using a Set as a key in a Map?

I'm not sure what are prevailing opinions about using dynamic objects such as Sets as keys in Maps.
I know that typical Map implementations (for example, HashMap) use a hashcode to decide what bucket to put the entry in, and that if that hashcode should change somehow (perhaps because the contents of the Set should change at all, then that could mess up the HashMap by causing the bucket to be incorrectly computed (compared to how the Set was initially inserted into the HashMap).
However, if I ensure that the Set contents do not change at all, does that make this a viable option? Even so, is this approach generally considered error-prone because of the inherently volatile nature of Sets (even if precautions are taken to ensure that they are not modified)?
It looks like Java allows one to designate function arguments as final; this is perhaps one minor precaution that could be taken?
Do people even do stuff like this in commercial/open-source practice? (put List, Set, Map, or the like as keys in Maps?)
I guess I should describe sort of what I'm trying to accomplish with this, so that the motivation will become more clear and perhaps alternate implementations could be suggested.
What I am trying to accomplish is to have something of this sort:
class TaggedMap<T, V> {
Map<Set<T>, V> _map;
Map<T, Set<Set<T>>> _keys;
}
...essentially, to be able to "tag" certain data (V) with certain keys (T) and write other auxiliary functions to access/modify the data and do other fancy stuff with it (ie. return a list of all entries satisfying some criteria of keys). The function of the _keys is to serve as a sort of index, to facilitate looking up the values without having to cycle through all of _map's entries.
In my case, I intend to specifically use T = String, V = Integer. Someone I talked to about this had suggested substituting a String for the Set, viz, something like:
class TaggedMap<V> {
Map<String, V> _map;
Map<T, Set<String>> _keys;
}
where the key in _map is of the sort "key1;key2;key3" with keys separated by delimiter. But I was wondering if I could accomplish a more generalised version of this rather than having to enforce a String with delimiters between the keys.
Another thing I was wondering was whether there was some way to make this as a Map extension. I was envisioning something like:
class TaggedMap<Set<T>, V> implements Map<Set<T>, V> {
Map<Set<T>, V> _map;
Map<T, Set<Set<T>>> _keys;
}
However, I was not able to get this to compile, probably due to my inferior understanding of generics. With this as a goal, can anyone fix the above declaration so that it works according to the spirit of what I had described or suggest some slight structural modifications? In particular, I am wondering about the "implements Map, V>" clause, whether it is possible to declare such a complex interface implementation.

You are correct that if you ensure that
The Set contents are not modified, and
The Sets themselves are not modified
That it is perfectly safe to use them as keys in a Map.
It's difficult to ensure that (1) is not violated accidentally. One option might be to specifically design the class being stored inside the Set so that all instances of that class are immutable. This would prevent anyone from accidentally changing one of the Set keys, so (1) would not be possible. For example, if you use a Set<String> as a key, you don't need to worry about the Strings inside the Set changing due to external modification.
You can make (2) possible quite easily by using the Collections.unmodifiableSet method, which returns a wrapped view of a Set that cannot be modified. This can be done to any Set, which means that it's probably a very good idea to use something like this for your keys.
Hope this helps! And if your user name means what I think it does, good luck learning every language! :-)

As you mention, sets can change, and even if you prevent the set from changing (i.e., the elements it contains), the elements themselves may change. Those factor into the hashcode.
Can you describe what you are trying to do in higher-level terms?

#templatetypedef's answer is basically correct. You can only safely use a Set as a key in some data structure if the set's state cannot change while it is a key. If the set's state changes, the invariants of the data structure are violated and operations on it will give incorrect results.
The wrappers created using Collections.unmodifiableSet can help, but there is a hidden gotcha. If the original set is still directly reachable, the application could modify it; e.g.
public void addToMap(Set key, Object value);
someMap.put(Collections.unmodifiableSet(key), value);
}
// but ...
Set someKey = ...
addToMap(someKey, "Hi mum");
...
someKey.add("something"); // Ooops ...
To guarantee that this can't happen, you need to make a deep copy of the set before you wrap it. That could be expensive.
Another problem with using a Set as a key is that it can be expensive. There are two general approaches to implementing key / value mappings; using hashcode method or using a compareTo method that implements an ordering. Both of these are expensive for sets.

Removing duplicates without overriding hash method

I have a List which contains a list of objects and I want to remove from this list all the elements which have the same values in two of their attributes. I had though about doing something like this:
List<Class1> myList;
....
Set<Class1> mySet = new HashSet<Class1>();
mySet.addAll(myList);
and overriding hash method in Class1 so it returns a number which depends only in the attributes I want to consider.
The problem is that I need to do a different filtering in another part of the application so I can't override hash method in this way (I would need two different hash methods).
What's the most efficient way of doing this filtering without overriding hash method?
Thanks

Overriding hashCode and equals in Class1 (just to do this) is problematic. You end up with your class having an unnatural definition of equality, which may turn out to be other for other current and future uses of the class.
Review the Comparator interface and write a Comparator<Class1> implementation to compare instances of your Class1 based on your criteria; e.g. based on those two attributes. Then instantiate a TreeSet<Class>` for duplicate detection using the TreeSet(Comparator) constructor.
EDIT
Comparing this approach with #Tom Hawtin's approach:
The two approaches use roughly comparable space overall. The treeset's internal nodes roughly balance the hashset's array and the wrappers that support the custom equals / hash methods.
The wrapper + hashset approach is O(N) in time (assuming good hashing) versus O(NlogN) for the treeset approach. So that is the way to go if the input list is likely to be large.
The treeset approach wins in terms of the lines of code that need to be written.

Let your Class1 implements Comparable. Then use TreeSet as in your example (i.e. use addAll method).

As an alternative to what Roman said you can have a look at this SO question about filtering using Predicates. If you use Google Collections anyway this might be a good fit.

I would suggest introducing a class for the concept of the parts of Class1 that you want to consider significant in this context. Then use a HashSet or HashMap.

Sometimes programmers make things too complicated trying to use all the nice features of a language, and the answers to this question are an example. Overriding anything on the class is overkill. What you need is this:
class MyClass {
Object attr1;
Object attr2;
}
List<Class1> list;
Set<Class1> set=....
Set<MyClass> tempset = new HashSet<MyClass>;
for (Class1 c:list) {
MyClass myc = new MyClass();
myc.attr1 = c.attr1;
myc.attr2 = c.attr2;
if (!tempset.contains(myc)) {
tempset.add(myc);
set.add(c);
}
}
Feel free to fix up minor irregulairites. There will be some issues depending on what you mean by equality for the attributes (and obvious changes if the attributes are primitive). Sometimes we need to write code, not just use the builtin libraries.

Duplicate values in the Set collection?

Is it possible to allow duplicate values in the Set collection?
Is there any way to make the elements unique and have some copies of them?
Is there any functions for Set collection for having duplicate values in it?

Ever considered using a java.util.List instead?
Otherwise I would recommend a Multiset from Google Guava (the successor to Google Collections, which this answer originally recommended -ed.).

The very definition of a Set disallows duplicates. I think perhaps you want to use another data structure, like a List, which will allow dups.
Is there any way to make the elements unique and have some copies of them?
If for some reason you really do need to store duplicates in a set, you'll either need to wrap them in some kind of holder object, or else override equals() and hashCode() of your model objects so that they do not evaluate as equivalent (and even that will fail if you are trying to store references to the same physical object multiple times).
I think you need to re-evaluate what you are trying to accomplish here, or at least explain it more clearly to us.

From the javadocs:
"sets contain no pair of elements e1
and e2 such that e1.equals(e2), and at
most one null element"
So if your objects were to override .equals() so that it would return different values for whatever objects you intend on storing, then you could store them separately in a Set (you should also override hashcode() as well).
However, the very definition of a Set in Java is,
"A collection that contains no
duplicate elements. "
So you're really better off using a List or something else here. Perhaps a Map, if you'd like to store duplicate values based on different keys.

Sun's view on "bags" (AKA multisets):
We are extremely sympathetic to the desire for type-safe collections. Rather than adding a "band-aid" to the framework that enforces type-safety in an ad hoc fashion, the framework has been designed to mesh with all of the parameterized-types proposals currently being discussed. In the event that parameterized types are added to the language, the entire collections framework will support compile-time type-safe usage, with no need for explicit casts. Unfortunately, this won't happen in the the 1.2 release. In the meantime, people who desire runtime type safety can implement their own gating functions in "wrapper" collections surrounding JDK collections.
(source; note it is old and possibly obsolete -ed.)
Apart from Google's collections API, you can use Apache Commons Collections.
Apache Commons Collections:
http://commons.apache.org/collections/
Javadoc for Bag

I don't believe that you can have duplicate values within a set. A set is defined as a collection of unique values. You may be better off using an ArrayList.

These sound like interview questions, so I'll answer them like interview questions...
Is it possible to allow duplicate values in the Set collection?
Yes, but it requires that the person implementing the Set violate the design contract upon which Set is built. Basically, I could write a class that extends Set and doesn't enforce Set's promises.
In addition, other violations are possible. I could use a Set implementation that relies upon Java's hashCode() contract. Then if I provided an Object that violates Java's hashcode contract, I might be able to place two objects into the set which are equal, but yeild different hashcodes (because they might not be checked in equality against each other due to being in different hash bucket chains.
Is there any way to make the elements unique and have some copies of them?
It basically depends on how you define uniqueness. If an object's uniqueness is determined by its value, then one can have multiple copies of the same unique object; however, if the object's uniqueness is determined by its instance, then by definition it would not be possible to have multiple copies of the same object. You could however have multiple references to them.
Is there any functions for Set collection for having duplicate values in it?
The Set interface doesn't have any functions for detecting / reporting duplicates; however, it is based on the Collections interface, which has to support the List interface, so it is possible to pass duplicates into a Set; however, a properly implemented Set will just ignore the duplicates, and present one copy of every element determined to be unique.

I don't think so. The only way would be to use a List. You can also trick with function equals(), hashcode() or compareTo() but it is going to be ankward.

NO chance.... you can not have duplicate values in SET interface...
If you want duplicates then you can try Array-List

As mentioned choose the right collection for the task and likely a List will be what you need. Messing with the equals(), hashcode() or compareTo() to break identity is generally a bad idea simply to wedge an instance into the wrong collection to start with. Worse yet it may break code in other areas of the application that depend on these methods producing valid comparison results and be very difficult to debug or track down such errors.

This question was asked to me also in an interview. I think the answer is, ofcourse Set will not allow duplicate elements and instead ArrayList or other collections should be used for the same, however overriding equals() for the type of the object being stored in the set will allow you to manipulate on the comparison logic. And hence you may be able to store duplicate elements in the Set. Its more of a hack, which would allow non-unique elements in the Set and ofcourse is not recommended in production level code.

You can do so by overriding hashcode as given below:
public class Test
{
static int a=0;
#Override
public int hashCode()
{
a++;
return a;
}
public static void main(String[] args)
{
Set<Test> s=new HashSet<Test>();
Test t1=new Test();
Test t2=t1;
s.add(t1);
s.add(t2);
System.out.println(s);
System.out.println("--Done--");
}
}

Well, In this case we are trying to break the purpose of specific collection. If we want to allow duplicate records simply use list or multimap.

Set will store unique values and if you wants to store duplicate values then for list,but still if you want duplicate values in set then create set of ArrayList so that you can put duplicate elements into it.
Set<ArrayList> s = new HashSet<ArrayList>();
ArrayList<String> arr = new ArrayList<String>();
arr.add("First");
arr.add("Second");
arr.add("Third");
arr.add("Fourth");
arr.add("First");
s.add(arr);

You can use Tree Map instead :
Key can be used as element you wish to store
and Value will be the frequency of input element.
The insertion and removal will require custom handling.
Insertion : Check if the map already contains the element , if yes then increment its frequency. O(log N)
Removal : if the element's frequency is 1 then remove it , else decrease frequency by 1. O(log N)
More details can be found in the java docs of tree map
Overall time complexity will remain same as TreeSet O(log N) but worse than a HashSet O(1)
firstEntry() -> provides smallest element entry, Time Complexity : O(Log N)
lastEntry() -> provides greatest element entry, Time Complexity : O(Log N)

public class SET {
public static void main(String[] args) {
Set set=new HashSet();
set.add(new AB(10, "pawan#email"));
set.add(new AB(10, "pawan#email"));
set.add(new AB(10, "pawan#email"));
Iterator it=set.iterator();
while(it.hasNext()){
Object o=it.next();
System.out.println(o);
}
}
}
public class AB{
int id;
String email;
public AB() {
System.out.println("DC");
}
AB(int id,String email){
this.id=id;
this.email=email;
}
#Override public String toString() {
// TODO Auto-generated method stub return ""+id+"\t"+email;}
}
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.