Removing duplicates without overriding hash method

I have a List of objects, and I want to remove from it all the elements that have the same values in two of their attributes. I had thought about doing something like this:
List<Class1> myList;
....
Set<Class1> mySet = new HashSet<Class1>();
mySet.addAll(myList);
and overriding the hash method in Class1 so that it returns a number which depends only on the attributes I want to consider.
The problem is that I need to do a different filtering in another part of the application so I can't override hash method in this way (I would need two different hash methods).
What's the most efficient way of doing this filtering without overriding hash method?
Thanks

Overriding hashCode and equals in Class1 (just to do this) is problematic. You end up with your class having an unnatural definition of equality, which may turn out to be wrong for other current and future uses of the class.
Review the Comparator interface and write a Comparator<Class1> implementation to compare instances of your Class1 based on your criteria; e.g. based on those two attributes. Then instantiate a TreeSet<Class1> for duplicate detection using the TreeSet(Comparator) constructor.
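As a minimal sketch of the Comparator plus TreeSet idea: the Item class and its fields attr1, attr2 and other are hypothetical stand-ins for Class1, and the attributes are assumed to be non-null. The TreeSet treats two elements as duplicates when the comparator returns 0, so equals and hashCode are never consulted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for Class1; attr1 and attr2 are the two attributes that matter.
class Item {
    final String attr1;
    final int attr2;
    final String other; // ignored when detecting duplicates

    Item(String attr1, int attr2, String other) {
        this.attr1 = attr1;
        this.attr2 = attr2;
        this.other = other;
    }
}

class TreeSetDedup {
    // Orders by attr1, then attr2; a result of 0 means "duplicate" to the TreeSet.
    static final Comparator<Item> BY_ATTRS =
            Comparator.comparing((Item i) -> i.attr1).thenComparingInt(i -> i.attr2);

    // Returns the first occurrence of each (attr1, attr2) pair, in original list order.
    static List<Item> dedupe(List<Item> input) {
        Set<Item> seen = new TreeSet<>(BY_ATTRS);
        List<Item> result = new ArrayList<>();
        for (Item item : input) {
            if (seen.add(item)) { // add() returns false if an equal element is already present
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("x", 1, "first"),
                new Item("x", 1, "second"),
                new Item("y", 2, "third"));
        System.out.println(dedupe(items).size()); // prints 2
    }
}
```

A second filter elsewhere in the application would only need a different Comparator; Item itself stays unchanged.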
EDIT
Comparing this approach with @Tom Hawtin's approach:
The two approaches use roughly comparable space overall. The treeset's internal nodes roughly balance the hashset's array and the wrappers that support the custom equals / hash methods.
The wrapper + hashset approach is O(N) in time (assuming good hashing) versus O(NlogN) for the treeset approach. So that is the way to go if the input list is likely to be large.
The treeset approach wins in terms of the lines of code that need to be written.

Let your Class1 implement Comparable. Then use TreeSet as in your example (i.e. use the addAll method).

As an alternative to what Roman said you can have a look at this SO question about filtering using Predicates. If you use Google Collections anyway this might be a good fit.

I would suggest introducing a class for the concept of the parts of Class1 that you want to consider significant in this context. Then use a HashSet or HashMap.

Sometimes programmers make things too complicated trying to use all the nice features of a language, and the answers to this question are an example. Overriding anything on the class is overkill. What you need is this:
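import java.util.Objects;
class MyClass {
    Object attr1;
    Object attr2;
    // Without equals/hashCode the HashSet below would compare keys by identity.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MyClass)) return false;
        MyClass other = (MyClass) o;
        return Objects.equals(attr1, other.attr1) && Objects.equals(attr2, other.attr2);
    }
    @Override
    public int hashCode() {
        return Objects.hash(attr1, attr2);
    }
}
List<Class1> list;
Set<Class1> set=....
Set<MyClass> tempset = new HashSet<MyClass>();
for (Class1 c : list) {
    MyClass myc = new MyClass();
    myc.attr1 = c.attr1;
    myc.attr2 = c.attr2;
    // add() returns false when an equal key is already in the set
    if (tempset.add(myc)) {
        set.add(c);
    }
}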
Feel free to fix up minor irregularities. There will be some issues depending on what you mean by equality for the attributes (and obvious changes if the attributes are primitive). Sometimes we need to write code, not just use the builtin libraries.

Related

Should I implement List interface or extend ArrayList class

I am developing an application where, in the background, I need to monitor the user's activity on particular objects. Later, when the objects are visualized, they need to be sorted by the order in which the user used them (the last used object must be shown in the first row of a grid, for example).
So if I have an ArrayList where I store the objects the user is dealing with, then to add the last used object I need to check whether it is already in the list and, if so, move it to the first position. If the object is not there, I simply add it at the first position of the list.
So instead of doing all these steps I want to make my own list where the logic explained above will be available.
My question is which scenario is better:
Implement the list interface
Extend the ArrayList class and override the add method

Create a class that contains an ArrayList and handles any additional functionality.
I.e. prefer composition over inheritance (and in this case, implementing an interface). It's also possible to have that class implement List for relevant cases and just direct the (relevant) operations to the ArrayList inside.
Also note that LinkedHashMap supports insertion order (default) and access order for iteration, if you don't need a List (or if you can suitably replace it with a Map).
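A minimal sketch of the composition approach, assuming a hypothetical wrapper named RecentlyUsedList with a touch method (neither name comes from the question). The wrapper owns an ArrayList instead of extending it, so callers cannot bypass the move-to-front rule.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper: holds an ArrayList rather than extending it.
class RecentlyUsedList<T> {
    private final List<T> items = new ArrayList<>();

    // Moves the element to the front, adding it if it was not present.
    void touch(T element) {
        items.remove(element); // no-op if absent; relies on T.equals
        items.add(0, element);
    }

    // Read-only view, most recently used first.
    List<T> asList() {
        return Collections.unmodifiableList(items);
    }
}

class RecentlyUsedDemo {
    public static void main(String[] args) {
        RecentlyUsedList<String> recent = new RecentlyUsedList<>();
        recent.touch("a");
        recent.touch("b");
        recent.touch("a");
        System.out.println(recent.asList()); // prints [a, b]
    }
}
```

Both remove and add(0, ...) are linear in the list size on an ArrayList, which is usually fine for a short "recently used" list.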

So instead of doing all these steps i want to make my own list where
the logic explained above will be available.
I would try to refactor your design parameters (if you can) in order to be able to use the existing Java Collection Framework classes (perhaps a linked collection type). As a part of the Collections Framework, these have been optimized and maintained for years (so efficiency is likely already nearly optimal), and you won't have to worry about maintaining it yourself.
Of the two options you give, it is possible that neither is the easiest or best.
It doesn't sound like you'll be able to extend AbstractList (as a way of implementing List) so you'll have a lot of wheel reinvention to do.
The ArrayList class is not final, but not expressly designed and documented for inheritance. This can result in some code fragility as inheritance breaks encapsulation (discussed in Effective Java, 2nd Ed. by J. Bloch). This solution may not be the best way to go.
Of the options, if you can't refactor your design to allow use of the Collection classes directly, then write a class that encapsulates a List (or other Collection) as an instance field and add instrumentation to it. Favor composition over inheritance. In this way, your solution will be more robust and easier to maintain than a solution based on inheritance.

I think LinkedHashMap already does what you need - it keeps the elements in the order they were inserted or last accessed (this is determined by the parameter accessOrder in one of the constructors).
https://docs.oracle.com/javase/8/docs/api/java/util/LinkedHashMap.html
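To illustrate the accessOrder parameter, here is a small sketch; the class and method names are made up for the example. With accessOrder set to true, iteration runs from least recently accessed to most recently accessed, so the last-used entry comes last and would need to be read in reverse for a "most recent first" display.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AccessOrderDemo {
    // Returns the keys from least to most recently accessed.
    static List<String> keysAfterAccess() {
        // Arguments: initialCapacity 16, loadFactor 0.75f, accessOrder true
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // touching "a" moves it to the end of the iteration order
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterAccess()); // prints [b, c, a]
    }
}
```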
EDIT
I don't have enough reputation to comment, so I'm putting it here: You don't actually need a map, so Venkatesh's LinkedHashSet suggestion is better.
You can do something like this:
<T> void update(Set<T> set, T value) {
set.remove(value);
set.add(value);
}
and then
LinkedHashSet<String> set = new LinkedHashSet<>();
update(set, "a");
update(set, "b");
update(set, "c");
update(set, "a");
Iterator<String> it = new LinkedList<String>(set).descendingIterator();
while (it.hasNext()) {
System.out.println(it.next());
}
Output:
a
c
b

You might try using HashMap<Integer, TrackedObject>, where TrackedObject is the class of the object you're keeping track of.
When your user uses an object, do
void trackObject(TrackedObject object)
{
int x = hashMap.size();
hashMap.put(Integer.valueOf(x), object);
}
then when you want to read out the tracked objects in order of use:
TrackedObject[] getOrderedArray()
{
TrackedObject[] array = new TrackedObject[hashMap.size()];
for(int i = 0; i < hashMap.size(); i++)
{
array[i] = hashMap.get(Integer.valueOf(i));
}
return array;
}

A LinkedHashSet can also be helpful in your case. You can keep adding elements to it; it will keep them in insertion order and will also maintain only unique values.

How to make class usable in different HashMaps in Java

I have a class Attribute which has 2 variables say int a,b;
I want to use class Attribute in two different HashSet.
The first hash set considers objects as equal when the value of a is same.
But the second hash set considers objects as equal when the value of b is same.
I know that if I override the equals method, the HashSet will use the overridden version of equals to compare two objects, but in this case I would need two different implementations of equals().
One way is to create two subclasses of Attribute and give them different equals methods, but I want to know if there is a better way to do it, so that I don't have to create subclasses of Attribute.
Thanks.

One possible solution is to not use HashSet, but use TreeSet instead. It's the same Set interface, but there is a TreeSet constructor that lets you pass in a Comparator. That way you could leave the Attribute class unchanged- just create two different comparators and use it like
Set<Attribute> setA = new TreeSet<Attribute>(comparatorForA);
Set<Attribute> setB = new TreeSet<Attribute>(comparatorForB);
The comparator takes care of the equality check (e.g. if compare returns 0, the objects are equal)
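A minimal sketch of the two-comparator setup, using a cut-down Attribute class with only the fields a and b from the question (the demo class and method names are assumptions). Each TreeSet decides equality solely through its own comparator.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

// Cut-down version of the Attribute class from the question.
class Attribute {
    final int a;
    final int b;

    Attribute(int a, int b) {
        this.a = a;
        this.b = b;
    }
}

class TwoComparatorsDemo {
    // Returns { size of the set keyed on a, size of the set keyed on b }.
    static int[] sizes() {
        Set<Attribute> setA = new TreeSet<>(Comparator.comparingInt((Attribute x) -> x.a));
        Set<Attribute> setB = new TreeSet<>(Comparator.comparingInt((Attribute x) -> x.b));
        Attribute[] attrs = {
            new Attribute(1, 10), new Attribute(1, 20),
            new Attribute(2, 20), new Attribute(3, 20)
        };
        for (Attribute attr : attrs) {
            setA.add(attr); // duplicates judged by a only
            setB.add(attr); // duplicates judged by b only
        }
        return new int[] { setA.size(), setB.size() };
    }

    public static void main(String[] args) {
        int[] s = sizes();
        System.out.println(s[0] + " " + s[1]); // prints 3 2
    }
}
```

Note that such a TreeSet is not consistent with Attribute.equals, which is acceptable here because that inconsistency is exactly the point, but it does mean the set deviates from the general Set contract.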

Unfortunately there's no "Equalizer" class that can override the equals logic. There is such a thing for sorting, where you can either use natural sorting based on the Comparable implementation or provide your own Comparator. I've actually wondered why there's no such thing for equality checks.
Since the semantics of equality are defined by a class and could be considered a trait of that class, the two subclasses approach seems the most natural. Maybe someone knows a useful pattern for doing this in a more simple manner, but I've never encountered it.
EDIT: just thought of something... you could use two Map instances, like HashMap, with the first one using a as key and the second using b as key. It'd let you detect collisions. You could then simply link the attribute to the associated instance.

I did something different. Instead of using a HashSet, I used a HashMap: in the first HashMap, int a is the key and the object is stored as the value.
In the other HashMap, the key is int b and the object is the value.
This gives me a way to hash on both variables a and b, so I don't have to create any subclasses.
I also get O(1) lookups instead of O(log n). I know I am paying the price in extra memory, but my main concern was time, so I chose HashMap over TreeSet.
Thank you all for your comments and suggestions.

It would be very easy to modify HashMap and HashSet to accept hashing and equality-testing strategies.
public interface Hasher {
int hashCode(Object o);
}
public interface Equalizer {
boolean areEqual(Object o1, Object o2);
}

A simple solution is to bypass HashSet and use HashMap directly. For the first, store each Attribute using its a property as the key, and for the other use b.
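The two-map idea could look roughly like this. The Attr class and the AttributeIndex wrapper are hypothetical names, and the sketch assumes that the first object seen for a given key should win.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Attribute class from the question.
class Attr {
    final int a;
    final int b;

    Attr(int a, int b) {
        this.a = a;
        this.b = b;
    }
}

class AttributeIndex {
    final Map<Integer, Attr> byA = new HashMap<>();
    final Map<Integer, Attr> byB = new HashMap<>();

    // Keeps the first Attr seen for each value of a and for each value of b.
    void add(Attr attr) {
        byA.putIfAbsent(attr.a, attr);
        byB.putIfAbsent(attr.b, attr);
    }

    public static void main(String[] args) {
        AttributeIndex index = new AttributeIndex();
        index.add(new Attr(1, 10));
        index.add(new Attr(1, 20));
        index.add(new Attr(2, 20));
        System.out.println(index.byA.size() + " " + index.byB.size()); // prints 2 2
    }
}
```

The values() view of each map then plays the role of the corresponding set.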

I can propose a bit hacky but lesser effort solution :)
Swap the values of a and b when storing in the second HashSet, so that uniqueness is defined by the value of b; then, when reading the object back from the HashSet, swap the values of a and b again to restore the original state. So the same equals/hashCode methods will serve the purpose.

hashmap or hashset?

I have two lists of type
List<MyObj>.
MyObj has a "String ID" member.
I need to iterate over them from time to time, and sometimes I need to find objects that appear in both.
I want something quicker than lists, and I know I can use a HashMap
(and then call containsKey(String) when comparing).
Should I use a HashMap or a HashSet?
Note: with a HashSet I need to implement my own equals(), and when I run contains() I think it will be slower than a HashMap where, upon inserting, I put the String ID in the key.
Am I correct?

Note: with a HashSet I need to implement my own equals(), and when I run contains() I think it will be slower than a HashMap where, upon inserting, I put the String ID in the key. Am I correct?
I don't think you would notice any performance difference. HashSet<E> is implemented using a HashMap<E, Object> under the hood. So the only difference would be calling MyObj.equals() (which presumably calls String.equals()) vs calling String.equals() directly. And the JIT compiler is pretty good at inlining calls...
The bottom line is, you should (almost) never worry about micro-optimizations, rather focus on making your design simple and consistent. If your only concern is to avoid duplication and to check for containment, a Set is a more logical choice.

This does not really make a difference at all, because when you look at the JDK source code, the Sun implementation of HashSet uses an instance of HashMap internally to store its values:
public class HashSet<E>
extends AbstractSet<E>
implements Set<E>, Cloneable, java.io.Serializable
{
static final long serialVersionUID = -5024744406713321676L;
private transient HashMap<E,Object> map;
// Dummy value to associate with an Object in the backing Map
......
And even if that were not the case, the points made in the other answers would still apply: it does not really make a difference from a performance point of view. The only real difference is that with a Map you rely on the equals() and hashCode() implementations of your key class, whereas with a Set you need to write your own for MyObj; but those could be as simple as delegating to the id field of your class, provided the id field is the unique identifier.
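For illustration, delegating equals() and hashCode() to the id field might look like the sketch below. The field name id, the non-null assumption and the helper method common are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MyObj {
    final String id; // assumed non-null and unique per logical object

    MyObj(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MyObj && id.equals(((MyObj) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

class CommonElementsDemo {
    // Returns the objects whose id appears in both lists.
    static Set<MyObj> common(List<MyObj> first, List<MyObj> second) {
        Set<MyObj> result = new HashSet<>(first);
        result.retainAll(new HashSet<>(second)); // hash lookups instead of list scans
        return result;
    }

    public static void main(String[] args) {
        List<MyObj> first = Arrays.asList(new MyObj("a"), new MyObj("b"), new MyObj("c"));
        List<MyObj> second = Arrays.asList(new MyObj("b"), new MyObj("c"), new MyObj("d"));
        System.out.println(common(first, second).size()); // prints 2
    }
}
```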

Well, using HashMap you will be forced to store data in this way :
<ID1><MyObject>
<ID2><MyObject>
That isn't the best way, because you already have ID field in MyObject.
Using HashSet you will be able to store only unique instances of MyObject, and you will also need to implement equals() and hashCode() in MyObject.
It's up to you to choose.

Is there any way to add metadata to Java Collections?

Let's say I have a collection of objects which can be sorted using a number of different comparators based on the different fields of the object.
It would be nice to be able to know later on in the code which comparator was used to sort the Collection and whether it was ascending or descending. Is there any way to do this elegantly instead of using a bunch of Booleans to keep track of things?

Not for the Collection interface, but if you use a SortedSet there's a comparator() method where you can ask for its comparator.
Otherwise you'll have to subclass the collection class you're using to add the accessors you need.
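A short sketch of the comparator() lookup on a SortedSet; the BY_LENGTH comparator and the demo class are illustrative only. A set built without a comparator reports null, meaning natural ordering.

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

class ComparatorLookupDemo {
    // Illustrative comparator: shorter strings sort first.
    static final Comparator<String> BY_LENGTH = Comparator.comparingInt(String::length);

    static SortedSet<String> byLength() {
        SortedSet<String> words = new TreeSet<>(BY_LENGTH);
        words.add("pear");
        words.add("fig");
        return words;
    }

    public static void main(String[] args) {
        SortedSet<String> words = byLength();
        System.out.println(words.comparator() == BY_LENGTH); // prints true
        System.out.println(words.first());                   // prints fig

        SortedSet<String> natural = new TreeSet<>();
        System.out.println(natural.comparator() == null);    // prints true
    }
}
```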

No, there's nothing in the implementations that does this. You would need to track it yourself. You could subclass a Collection implementation to add fields which hold this information.
You could also map the implementations to metadata as you like with a Map -- in particular it seems like you want IdentityHashMap to do this, since you don't want two different collections to be compared for equality as keys with equals().
I would store a boolean (ascending/descending), and a reference to the Comparator used to sort, if that's what completely determines the sort. Or if it's sorted on field, store a String naming the field perhaps.
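One way the IdentityHashMap suggestion could be sketched is shown below. SortInfo and SortRegistry are hypothetical names, and the registry keeps a strong reference to every list it has seen, which a real implementation would need to manage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical metadata record: which field was sorted on, and in which direction.
class SortInfo {
    final String field;
    final boolean ascending;

    SortInfo(String field, boolean ascending) {
        this.field = field;
        this.ascending = ascending;
    }
}

class SortRegistry {
    // Keys are compared with ==, so two lists with equal contents stay distinct.
    private final Map<List<?>, SortInfo> info = new IdentityHashMap<>();

    <T> void sortAndRecord(List<T> list, Comparator<T> cmp, String field, boolean ascending) {
        list.sort(ascending ? cmp : cmp.reversed());
        info.put(list, new SortInfo(field, ascending));
    }

    // Returns null if this exact list instance was never sorted through the registry.
    SortInfo lookup(List<?> list) {
        return info.get(list);
    }

    public static void main(String[] args) {
        SortRegistry registry = new SortRegistry();
        List<String> names = new ArrayList<>(Arrays.asList("b", "a", "c"));
        registry.sortAndRecord(names, Comparator.naturalOrder(), "name", false);
        System.out.println(names);                            // prints [c, b, a]
        System.out.println(registry.lookup(names).ascending); // prints false
        List<String> copy = new ArrayList<>(names);
        System.out.println(registry.lookup(copy) == null);    // prints true
    }
}
```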

sure:
define methods for your decorated Collection<Foo>
public List<Comparator<Foo>> getComparators() { ... }
and
public int whichComparator() { ... }
that returns which Comparator is currently in use from the List. You could make it fancier with a Map and some sensible keys (say, enums - perhaps even enums which implement the comparators) if you're modifying which comparators might be used over the life of the object, but I think the above is a good enough start.

Duplicate values in the Set collection?

Is it possible to allow duplicate values in the Set collection?
Is there any way to make the elements unique and have some copies of them?
Is there any functions for Set collection for having duplicate values in it?

Ever considered using a java.util.List instead?
Otherwise I would recommend a Multiset from Google Guava (the successor to Google Collections, which this answer originally recommended -ed.).

The very definition of a Set disallows duplicates. I think perhaps you want to use another data structure, like a List, which will allow dups.
Is there any way to make the elements unique and have some copies of them?
If for some reason you really do need to store duplicates in a set, you'll either need to wrap them in some kind of holder object, or else override equals() and hashCode() of your model objects so that they do not evaluate as equivalent (and even that will fail if you are trying to store references to the same physical object multiple times).
I think you need to re-evaluate what you are trying to accomplish here, or at least explain it more clearly to us.

From the javadocs:
"sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element"
So if your objects were to override .equals() so that it returns false for the objects you intend to store, then you could store them separately in a Set (you should also override hashCode()).
However, the very definition of a Set in Java is,
"A collection that contains no duplicate elements."
So you're really better off using a List or something else here. Perhaps a Map, if you'd like to store duplicate values based on different keys.

Sun's view on "bags" (AKA multisets):
We are extremely sympathetic to the desire for type-safe collections. Rather than adding a "band-aid" to the framework that enforces type-safety in an ad hoc fashion, the framework has been designed to mesh with all of the parameterized-types proposals currently being discussed. In the event that parameterized types are added to the language, the entire collections framework will support compile-time type-safe usage, with no need for explicit casts. Unfortunately, this won't happen in the the 1.2 release. In the meantime, people who desire runtime type safety can implement their own gating functions in "wrapper" collections surrounding JDK collections.
(source; note it is old and possibly obsolete -ed.)
Apart from Google's collections API, you can use Apache Commons Collections.
Apache Commons Collections:
http://commons.apache.org/collections/
Javadoc for Bag

I don't believe that you can have duplicate values within a set. A set is defined as a collection of unique values. You may be better off using an ArrayList.

These sound like interview questions, so I'll answer them like interview questions...
Is it possible to allow duplicate values in the Set collection?
Yes, but it requires that the person implementing the Set violate the design contract upon which Set is built. Basically, I could write a class that extends Set and doesn't enforce Set's promises.
In addition, other violations are possible. I could use a Set implementation that relies upon Java's hashCode() contract. Then if I provided an Object that violates Java's hashCode contract, I might be able to place two objects into the set which are equal, but yield different hash codes (because they might not be checked for equality against each other due to being in different hash bucket chains).
Is there any way to make the elements unique and have some copies of them?
It basically depends on how you define uniqueness. If an object's uniqueness is determined by its value, then one can have multiple copies of the same unique object; however, if the object's uniqueness is determined by its instance, then by definition it would not be possible to have multiple copies of the same object. You could however have multiple references to them.
Is there any functions for Set collection for having duplicate values in it?
The Set interface doesn't have any functions for detecting or reporting duplicates. However, it is based on the Collection interface, which List also extends, so it is possible to pass duplicates into a Set (for example through add or addAll). A properly implemented Set will just ignore the duplicates and present one copy of every element determined to be unique.

I don't think so. The only way would be to use a List. You can also play tricks with equals(), hashCode() or compareTo(), but it is going to be awkward.

No chance... you cannot have duplicate values in a Set.
If you want duplicates, you can try ArrayList.

As mentioned, choose the right collection for the task; most likely a List is what you need. Messing with equals(), hashCode() or compareTo() to break identity, simply to wedge an instance into the wrong collection, is generally a bad idea. Worse yet, it may break code in other areas of the application that depend on these methods producing valid comparison results, and such errors can be very difficult to debug or track down.

I was also asked this question in an interview. I think the answer is that, of course, a Set will not allow duplicate elements, and an ArrayList or another collection should be used instead. However, overriding equals() for the type of object stored in the set lets you manipulate the comparison logic, and hence you may be able to store duplicate elements in the Set. It's more of a hack that allows non-unique elements in the Set, and of course it is not recommended in production-level code.

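You can do so by overriding hashCode as given below:
import java.util.HashSet;
import java.util.Set;
public class Test
{
    static int a = 0;
    // Deliberately breaks the hashCode contract: every call returns a new value.
    @Override
    public int hashCode()
    {
        a++;
        return a;
    }
    public static void main(String[] args)
    {
        Set<Test> s = new HashSet<Test>();
        Test t1 = new Test();
        Test t2 = t1; // second reference to the same object
        s.add(t1);
        s.add(t2); // hashes differently from the first add, so it is stored again
        System.out.println(s);
        System.out.println("--Done--");
    }
}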

Well, in this case we are trying to defeat the purpose of a specific collection. If we want to allow duplicate records, simply use a List or a multimap.

A Set stores unique values; if you want to store duplicate values, go for a List. But if you still want duplicate values in a Set, create a Set of ArrayLists so that you can put duplicate elements into it.
Set<ArrayList<String>> s = new HashSet<ArrayList<String>>();
ArrayList<String> arr = new ArrayList<String>();
arr.add("First");
arr.add("Second");
arr.add("Third");
arr.add("Fourth");
arr.add("First");
s.add(arr);

You can use a TreeMap instead:
Key can be used as element you wish to store
and Value will be the frequency of input element.
The insertion and removal will require custom handling.
Insertion : Check if the map already contains the element , if yes then increment its frequency. O(log N)
Removal : if the element's frequency is 1 then remove it , else decrease frequency by 1. O(log N)
More details can be found in the Javadoc for TreeMap.
Overall time complexity will remain same as TreeSet O(log N) but worse than a HashSet O(1)
firstEntry() -> provides smallest element entry, Time Complexity : O(Log N)
lastEntry() -> provides greatest element entry, Time Complexity : O(Log N)
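The insertion and removal rules above can be sketched as a small counting wrapper around TreeMap. TreeMultiset and its method names are made up for the example, elements are assumed to be non-null, and first() and last() use firstKey()/lastKey(), which throw NoSuchElementException on an empty map.

```java
import java.util.TreeMap;

// Hypothetical multiset: key = element, value = how many copies are stored.
class TreeMultiset<E extends Comparable<E>> {
    private final TreeMap<E, Integer> counts = new TreeMap<>();

    // Insertion: create the entry with count 1, or increment an existing count.
    void add(E e) {
        counts.merge(e, 1, Integer::sum);
    }

    // Removal: drop the entry when the last copy goes, otherwise decrement.
    boolean remove(E e) {
        Integer n = counts.get(e);
        if (n == null) {
            return false;
        }
        if (n == 1) {
            counts.remove(e);
        } else {
            counts.put(e, n - 1);
        }
        return true;
    }

    int count(E e) {
        return counts.getOrDefault(e, 0);
    }

    E first() {
        return counts.firstKey(); // smallest element
    }

    E last() {
        return counts.lastKey(); // greatest element
    }

    public static void main(String[] args) {
        TreeMultiset<String> bag = new TreeMultiset<>();
        bag.add("b");
        bag.add("a");
        bag.add("b");
        System.out.println(bag.count("b")); // prints 2
        bag.remove("b");
        System.out.println(bag.count("b")); // prints 1
        System.out.println(bag.first());    // prints a
    }
}
```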

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
public class SET {
    public static void main(String[] args) {
        Set<AB> set = new HashSet<AB>();
        // AB does not override equals()/hashCode(), so each instance counts as distinct.
        set.add(new AB(10, "pawan@email"));
        set.add(new AB(10, "pawan@email"));
        set.add(new AB(10, "pawan@email"));
        Iterator<AB> it = set.iterator();
        while (it.hasNext()) {
            AB o = it.next();
            System.out.println(o);
        }
    }
}
class AB {
    int id;
    String email;
    public AB() {
        System.out.println("DC");
    }
    AB(int id, String email) {
        this.id = id;
        this.email = email;
    }
    @Override
    public String toString() {
        return "" + id + "\t" + email;
    }
}
