How do I remove duplicate objects from two separate ArrayLists? - java

Before beginning, I think that this question has a very simple answer that I'm just overlooking. I figured a few more eyes on the question at hand will be able to point out my problem fairly quickly.
I have two ArrayLists that I want to compare, removing the duplicates from each of them. The first ArrayList holds the older information, whereas the second ArrayList contains the new information.
Like so
ArrayList<Person> contactList = new ArrayList<>();
contactList.add(new Person("Bob"));
contactList.add(new Person("Jake"));
contactList.add(new Person("Joe"));
contactList.add(new Person("Rob"));
ArrayList<Person> updatedContactList = new ArrayList<>();
updatedContactList.add(new Person("Bob"));
updatedContactList.add(new Person("Jake"));
updatedContactList.add(new Person("Joe"));
updatedContactList.add(new Person("Phil"));
My Person class is very simple, created solely for this example
public class Person {
private String name;
public Person(String a_name) {
name = a_name;
}
public String getName() {
return name;
}
}
So, using the above examples, I want to remove all duplicates. I'm trying to keep it to just the two ArrayLists if possible, but am willing to do a deep clone of one of the ArrayLists if I have to.
So I want the resulting ArrayList to have the following information in it once the comparison is done
contactList //removed Person
- Rob
updatedContactList //new Person
- Phil
Here is the code I've put together
for(int i = 0; i < contactList.size(); i++) {
for(int j = 0; j < updatedContactList.size(); j++) {
if(contactList.get(i).getName().equals(updatedContactList.get(j).getName())) {
//removed friends
contactList.remove(contactList.get(i));
//new friends ---- only one at a time works
//updatedContactList.remove(updatedContactList.get(j));
}
}
}
I'm only able to remove a Person from one of the ArrayLists in the above loop otherwise I get incorrect results.
So my question is: is there an easy way to remove the duplicated elements from both ArrayLists? If so, how do I go about it?
I realize that I could probably deep clone the updated ArrayList and just remove the objects from that one, but I'm wondering if there is a way without having to clone it.
I also realize that I could just stuff all the elements into a Set and it would remove the duplicates, but I want to keep the 'removed' and 'new' Person objects separate.

What you really have is not lists, but sets: model both the old and the new contacts as a Set. Also implement equals and hashCode for your Person class to ensure proper operation.
Once you have that, you'll be able to write one-liners to calculate the set differences (which is what you need):
final Set<Person> contactsBackup = new HashSet<>(contacts);
contacts.removeAll(updatedContacts);
updatedContacts.removeAll(contactsBackup);
Note that this involves making one more copy, but it is not a deep copy; only references are copied. This is a very lightweight operation and you should not worry about its impact.
If, for some reason not at all obvious to me, you really need lists, the same code will work for them, too (List also defines removeAll), but you will have to live with the O(n^2) complexity this operation entails for lists.
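To make the set-difference idea concrete, here is a minimal sketch. It assumes that two Person objects are equal when their names are equal (the question does not say so explicitly), and the ContactDiff class and the people helper are made up for this illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class ContactDiff {

    // Person from the question, with equals and hashCode added.
    // Assumption: two persons are the same contact when their names match.
    static final class Person {
        private final String name;

        Person(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            return Objects.equals(name, ((Person) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(name);
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // Hypothetical helper: builds a mutable set of Person objects from names.
    static Set<Person> people(String... names) {
        Set<Person> result = new HashSet<>();
        for (String n : names) {
            result.add(new Person(n));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Person> contacts = people("Bob", "Jake", "Joe", "Rob");
        Set<Person> updatedContacts = people("Bob", "Jake", "Joe", "Phil");

        // Copy the references first, because contacts is modified on the next line.
        Set<Person> contactsBackup = new HashSet<>(contacts);
        contacts.removeAll(updatedContacts);       // what is left was removed
        updatedContacts.removeAll(contactsBackup); // what is left is new

        System.out.println("removed: " + contacts);    // prints "removed: [Rob]"
        System.out.println("new: " + updatedContacts); // prints "new: [Phil]"
    }
}
```

Without the equals and hashCode overrides, removeAll would compare object identity, and nothing would be removed because every Person in the example is a distinct instance.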

Override equals() and hashCode() in your Person class and simply do:
Set<Person> temp = new HashSet<>(contactList);
contactList.removeAll(updatedContactList);
updatedContactList.removeAll(temp);
temp.clear(); // not necessary if this code is in a method

Create a Set and addAll from both the ArrayLists.
Set<Person> set = new HashSet<Person>();
http://docs.oracle.com/javase/6/docs/api/java/util/Set.html

This is a one line elegant solution making use of the Java 8 capabilities
public static final <T> void removeCommonEntries(Collection<T> a, Collection<T> b){
b.removeIf(i -> a.remove(i));
}
I usually put this solution in a custom CollectionUtils class.
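Here is a short usage sketch of that method. It uses Strings so that equals is already defined; with Person you would still need to override equals. The CollectionUtilsDemo class name is invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class CollectionUtilsDemo {

    // For every element of b, try to remove an equal element from a.
    // If that succeeds, the element is dropped from b as well.
    public static <T> void removeCommonEntries(Collection<T> a, Collection<T> b) {
        b.removeIf(i -> a.remove(i));
    }

    public static void main(String[] args) {
        List<String> contactList =
                new ArrayList<>(Arrays.asList("Bob", "Jake", "Joe", "Rob"));
        List<String> updatedContactList =
                new ArrayList<>(Arrays.asList("Bob", "Jake", "Joe", "Phil"));

        removeCommonEntries(contactList, updatedContactList);

        System.out.println(contactList);        // prints "[Rob]"
        System.out.println(updatedContactList); // prints "[Phil]"
    }
}
```

Two things to keep in mind: both collections must be modifiable and must not be the same instance, and common entries are paired off one to one, so an element that occurs twice in one list and once in the other leaves one occurrence behind.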

In this case, use a Set rather than a List if possible (for example, when you load the data from a database through Hibernate). Then override the equals and hashCode methods in the Person class so that the required comparisons are made when elements are added and duplicates are rejected. A LinkedHashSet is a good choice, since lookups in a List become slow as the data in it grows.

Related

Functional Programing; using streams to create a set/ list of objects in java

The objective of this function is to create a list of Competitors objects from two arrays containing times and names.
Note that this is an attempt to convert from object-oriented to functional programming, so it is actually a record rather than an object.
My initial attempt, using a sort of recursion:
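// counter and racers are passed along explicitly; start with makeObjects(0, new ArrayList<>(), time, name)
public static List<Competitors> makeObjects(int counter, List<Competitors> racers, int[] time, String[] name){
if (counter >= name.length){
return racers;
}else{
Competitors racer= new Competitors(name[counter],backToTime(time[counter]));
racers.add(racer);
return makeObjects(counter + 1,racers,time,name);
}
}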
And my attempt after discovering I can use streams:
public static List<Competitors> makeObjects(int[] time, String[] name){
List<Competitors> racers = new ArrayList<Competitors>();
IntStream.range(0,time.length).mapToObj(i-> ).collect(new Competitors(name[i] ,backToTime(time[i])))
}
Your second approach, makeObjects(), is the functional approach.
Where you go wrong is that you are creating the Competitor objects in the wrong place; you map each i to a new Competitor. Then in the collect() call, you specify what type of collection. In your case, it would be Collectors.toList() or Collectors.toSet(). There is no need to create an ArrayList; just assign the IntStream... to List<Competitor> racers.
Note that you should probably guard this method by first asserting that the arrays are the same length.
Also note that your class should be named Competitor, not Competitors; and makeObjects would be more descriptive as createCompetitorsList.
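Putting those points together, the method could look roughly like this. It is a sketch under assumptions: the Competitor class below is a stand-in for the asker's record, and backToTime is a hypothetical replacement for the helper of the same name, whose body is not shown in the question.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CompetitorFactory {

    // Stand-in for the asker's type; on Java 16+ this could be a record.
    static final class Competitor {
        final String name;
        final String time;

        Competitor(String name, String time) {
            this.name = name;
            this.time = time;
        }
    }

    // Hypothetical helper: formats a number of seconds as m:ss.
    static String backToTime(int seconds) {
        return String.format("%d:%02d", seconds / 60, seconds % 60);
    }

    static List<Competitor> createCompetitorsList(int[] time, String[] name) {
        // Guard: both arrays must describe the same competitors.
        if (time.length != name.length) {
            throw new IllegalArgumentException("time and name must have the same length");
        }
        return IntStream.range(0, time.length)
                // map each index to a new Competitor
                .mapToObj(i -> new Competitor(name[i], backToTime(time[i])))
                // then choose the kind of collection to build
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Competitor> racers =
                createCompetitorsList(new int[] {65, 130}, new String[] {"Ann", "Ben"});
        for (Competitor c : racers) {
            System.out.println(c.name + " " + c.time);
        }
    }
}
```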

Find indexOf of an object in custom list using one attribute

So, I have a custom class, and an arraylist of that class type. Now, I want to retrieve the index of an object in this arraylist using only an ID that is available to me among the other attributes that make up an object of that class.
Saw a few examples online, but I'm kinda confused, they're overriding hashCode() and equals(), and in equals() they're checking all the attributes, I just want to check with the ID value, since for every object ID is unique.
public class MyClass {
private String ID;
private String name;
private String userName;
private String position;
// Constructors and getters and setters
}
So, what I want is, for say this piece of code:
List<MyClass> list=new ArrayList<>();
//Values are populated into list
int i=list.indexOf(someObjectsID); //Where someObjectsID is a String and not a MyClass object
I want i to hold the index of the MyClass object in list whose ID is equal to someObjectsID.
If you're open to using a third party library, you can use detectIndex from Eclipse Collections.
int index = ListIterate.detectIndex(list, each -> each.getID().equals(someObjectsID));
If the list is of type MutableList, the detectIndex method is available directly on the list.
MutableList<MyClass> list = Lists.mutable.empty();
int index = list.detectIndex(each -> each.getID().equals(someObjectsID));
Note: I am a committer for Eclipse Collections
There is one absolutely guaranteed, efficient solution to this problem. Nothing else will work nearly so simply or efficiently.
That solution is to just write the loop and not try to get fancy.
for(int i = 0; i < list.size(); i++){
if (list.get(i).getID().equals(someObjectsID)) {
return i;
}
}
return -1;
No need to mess with hashCode or equals. No need to force indexes into streams not designed for them.
Override hashCode & equals in your custom object, then indexOf will Just Work (tm).
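One caveat with the equals/hashCode route: indexOf takes an object, so a bare String ID never matches a MyClass element. A minimal sketch of how it could work, assuming equality is defined by ID alone and that building a throwaway "probe" object is acceptable; the IndexByIdDemo class, the two-argument constructor and the indexOfId helper are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class IndexByIdDemo {

    static final class MyClass {
        private final String ID;
        private final String name;

        MyClass(String ID, String name) {
            this.ID = ID;
            this.name = name;
        }

        String getID() {
            return ID;
        }

        // Assumption: the ID is unique, so it alone defines equality.
        @Override
        public boolean equals(Object o) {
            return o instanceof MyClass && Objects.equals(ID, ((MyClass) o).ID);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(ID);
        }
    }

    // Wraps the ID in a probe object, because indexOf compares whole objects.
    static int indexOfId(List<MyClass> list, String someObjectsID) {
        return list.indexOf(new MyClass(someObjectsID, null));
    }

    public static void main(String[] args) {
        List<MyClass> list = new ArrayList<>();
        list.add(new MyClass("a1", "Alice"));
        list.add(new MyClass("b2", "Bob"));

        System.out.println(indexOfId(list, "b2")); // prints "1"
        System.out.println(indexOfId(list, "zz")); // prints "-1"
    }
}
```

If creating a probe object feels awkward, or if equality by ID alone is wrong elsewhere in the program, the plain loop is the simpler choice.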

How to sort a List<String> containing multiple fields delimited by '|~'

I have a List<String> theList
which has following kind of values
"2011-05-05|~JKED"
"2011-05-06|~ABC"
"2011-05-01|~XYZ"
"2011-05-01|~WWX"
As you can guess, there are two "fields" in each element of theList.
I want to sort theList on the first field and then on the second field, so that I get the following output after the sorting operation:
"2011-05-01|~WWX"
"2011-05-01|~XYZ"
"2011-05-05|~JKED"
"2011-05-06|~ABC"
If I put these two fields in separate lists and call Collections.sort(field1List) and Collections.sort(field2List), I get the desired output.
But I want to know how to use Collections.sort(theList, new Comparator(){}) to sort theList and get the desired output. If it is not possible to solve this through a Comparator, please suggest a method which might look like sortMultiFieldList(List<String> theList).
It is a long story why I have to have two or more fields in a single List.
Let me know if you need more clarification.
This is remarkably straightforward. You'll want to write a custom Comparator for this, and enforce its comparison logic to behave the way you want with respect to your two separate "fields".
The idea is to compare the date portion lexicographically first and fall back to the alphabetical string portion only when the dates are equal. Zero-padded dates in yyyy-MM-dd form sort lexicographically in the same order as chronologically, so a plain String comparison is enough here. If your dates can arrive in another format, convert them to a Date and compare that in-line instead.
Collections.sort(entries, new Comparator<String>() {
@Override
public int compare(String left, String right) {
String[] leftFragments = left.split("[|]");
String[] rightFragments = right.split("[|]");
if(leftFragments[0].compareTo(rightFragments[0]) == 0) {
return leftFragments[1].compareTo(rightFragments[1]);
} else {
return leftFragments[0].compareTo(rightFragments[0]);
}
}
});
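On Java 8 or later, the same two-level comparison can also be written with Comparator.comparing and thenComparing. This is a sketch, not a drop-in replacement: the MultiFieldSort class and the field helper are invented for the example, and the method name sortMultiFieldList is taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

public class MultiFieldSort {

    // The "|~" delimiter, quoted so that '|' is not read as regex alternation.
    private static final Pattern DELIMITER = Pattern.compile(Pattern.quote("|~"));

    // Hypothetical helper: returns the field at the given position of an entry.
    static String field(String entry, int index) {
        return DELIMITER.split(entry)[index];
    }

    // Sorts in place by the first field, then by the second field.
    static void sortMultiFieldList(List<String> theList) {
        theList.sort(Comparator
                .comparing((String entry) -> field(entry, 0))
                .thenComparing(entry -> field(entry, 1)));
    }

    public static void main(String[] args) {
        List<String> theList = new ArrayList<>(Arrays.asList(
                "2011-05-05|~JKED",
                "2011-05-06|~ABC",
                "2011-05-01|~XYZ",
                "2011-05-01|~WWX"));
        sortMultiFieldList(theList);
        theList.forEach(System.out::println);
    }
}
```

The entries are split again on every comparison, which is fine for small lists. For large ones it may be worth parsing each entry once into a small value class and sorting those instead.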

why java.util.Set can't return any value?

java.util.Set specifies only methods that return all records (via Iterator or array).
Why is there no option to return an arbitrary value from a Set?
It makes a lot of sense in real life. For example, I have a bowl of strawberries and I want to take just one of them. I don't care at all which one.
Why can't I do the same in Java?
This is not answerable. You'd have to ask the original designers of the Java collections framework.
One plausible reason is that methods with non-deterministic behavior tend to be problematic:
They make unit testing harder.
They make bugs harder to track down.
They are more easily misunderstood and misused by programmers who haven't bothered to read the API documentation.
For hashtable-based set organizations, the behavior of a "get some element" method is going to be non-deterministic, or at least difficult to determine / predict.
By the way, you can trivially get some element of a non-empty set as follows:
Object someObject = someSet.iterator().next();
Getting a truly (pseudo-)random element is a bit more tricky / expensive because you can't index the elements of a set. (You need to extract all of the set elements into an array ...)
On revisiting this, I realized that there is another reason. It is simply that Set is based on the mathematical notion of a set, and the elements of a set in mathematics have no order. It is simply meaningless to talk about the first element of a mathematical set.
A java.util.Set is an unordered collection; you can see it as a bag that contains things, but not in any particular order. It would not make sense to have a get(int index) method, because elements in a set don't have an index.
The designers of the standard Java library didn't include a method to get a random element from a Set. If you want to know why, that's something you can only speculate about. Maybe they didn't think it was necessary, or maybe they didn't even think about it.
It's easy to write a method yourself that gets a random element out of a Set.
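One possible way to write such a method is to walk the iterator a random number of steps. The SetPicker class and the pickRandom name are made up for this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Random;
import java.util.Set;

public class SetPicker {

    // Returns a pseudo-randomly chosen element of a non-empty set.
    // Runs in O(n), because a Set cannot be indexed directly.
    static <T> T pickRandom(Set<T> set, Random random) {
        if (set.isEmpty()) {
            throw new NoSuchElementException("cannot pick from an empty set");
        }
        int skip = random.nextInt(set.size());
        Iterator<T> it = set.iterator();
        for (int i = 0; i < skip; i++) {
            it.next(); // discard the elements before the chosen position
        }
        return it.next();
    }

    public static void main(String[] args) {
        Set<String> bowl = new HashSet<>(Arrays.asList("strawberry 1", "strawberry 2", "strawberry 3"));
        System.out.println(pickRandom(bowl, new Random()));
    }
}
```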
If you don't care about the index of the elements, try using Queue instead of Set.
Queue q = new ArrayDeque();
q.element(); // retrieves the first object but doesn't remove
q.poll(); // retrieves and removes first object
While a plain Set is in no particular order, SortedSet and NavigableSet provide a guaranteed order and methods which support this. You can use first() and last():
SortedSet<E> set = ...
E e1 = set.first(); // a value
E e2 = set.last(); // also a value.
Actually, an iterator is a lot better than using get(position) (which is something you can do on a java.util.List). For one thing, it allows the collection to be modified during iteration (through Iterator.remove()). The reason sets don't have get(position) is probably that most of them don't guarantee insertion order. You can always do something like new ArrayList<>(mySet).get(position)
If you are not concerned with performance you can create a new type and back the data in an arraylist.
(Please note before downvoting: this is just a naive implementation of the idea and not the proposed final solution.)
import ...
public class PickableSet<E> extends AbstractSet<E>{
private final List<E> arrayList = new ArrayList<E>();
private final Set<E> hashSet = new HashSet<E>();
private final Random random = new Random();
public boolean add( E e ) {
return hashSet.add( e ) && arrayList.add( e );
}
public int size() {
return arrayList.size();
}
public Iterator<E> iterator() {
return hashSet.iterator();
}
public E pickOne() {
return arrayList.get( random.nextInt( arrayList.size() ) );
}
}
Of course, since you're using a different interface you'll have to cast to invoke the method:
Set<String> set = new PickableSet<String>();
set.add("one");
set.add("other");
String oneOfThem = ((PickableSet<String>) set).pickOne();
ie
https://gist.github.com/1986763
Well, you can with a little bit of work like this
Set<String> s = new HashSet<String>();
Random r = new Random();
String res = s.toArray(new String[0])[r.nextInt(s.size())];
This grabs a randomly selected object from the set (the set must not be empty, otherwise nextInt throws an IllegalArgumentException).

Have I found a bug in java.util.ArrayList.containsAll?

In Java I've two lists:
List<Satellite> sats = new ArrayList<Satellite>();
List<Satellite> sats2 = new ArrayList<Satellite>();
Satellite sat1 = new Satellite();
Satellite sat2 = new Satellite();
sats.add(sat1);
sats2.add(sat1);
sats2.add(sat2);
When I do the following containsAll method on the first list:
sats.containsAll(sats2); //Returns TRUE!
It returns true. But the first List (sats) only contains 1 item and the second list contains 2. Therefore it's not even possible that the first list (sats) contains all items from the second list (sats2). Any idea why, or is this a bug in the Java JDK?
I've read in another StackOverflow question that this is not the most performant way to do something like this, so if anyone has a suggestion on how to make it more performant that would be great!
Thanks in advance!
As pointed out by @Progman, you're probably overriding the equals method in Satellite.
The program below prints false.
import java.util.*;
class Satellite {
}
class Test {
public static void main(String[] args) {
List<Satellite> sats = new ArrayList<Satellite>();
List<Satellite> sats2 = new ArrayList<Satellite>();
Satellite sat1 = new Satellite();
Satellite sat2 = new Satellite();
sats.add(sat1);
sats2.add(sat1);
sats2.add(sat2);
System.out.println(sats.containsAll(sats2));
}
}
(ideone.com demo)
I suggest that you print the contents of the two lists and check that the content corresponds to what you expect it to be.
For many classes, it makes sense that two objects created the same way (e.g. new Satellite()) would be considered equal. Keep in mind that containsAll doesn't care about the number of copies of an object that a Collection contains, just that it contains at least one of each distinct element in the Collection that it's given. So for example, if you had a List a that contained [A, A] and a list b that just contained [A], b.containsAll(a) and a.containsAll(b) would both return true. This is probably analogous to what's happening here.
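The [A, A] versus [A] case is easy to check with a short sketch; the ContainsAllDemo class and the containEachOther helper are invented for this illustration.

```java
import java.util.Arrays;
import java.util.List;

public class ContainsAllDemo {

    // True when each list contains every distinct element of the other,
    // regardless of how many copies each list holds.
    static boolean containEachOther(List<?> a, List<?> b) {
        return a.containsAll(b) && b.containsAll(a);
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("A", "A");
        List<String> b = Arrays.asList("A");

        System.out.println(b.containsAll(a));     // prints "true"
        System.out.println(a.containsAll(b));     // prints "true"
        System.out.println(a.size() == b.size()); // prints "false"
    }
}
```

So a smaller list can still "contain all" elements of a larger one whenever the extra elements are equal, according to equals, to elements it already holds.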
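It is late to reply, but regarding the second part of your question: a more efficient way to do containsAll is CollectionUtils.isSubCollection(subSet, superSet) from Apache Commons Collections.
containsAll on two lists is O(n^2), whereas isSubCollection is O(n). Note that isSubCollection also takes the number of occurrences of each element into account, so the two are not exactly equivalent when a list contains duplicates.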
