If I have two lists of objects from the Collection interface
list 1 = {John, Tim, Tom}
list 2 = {John, Tim}
and both of the lists are instances of ArrayList,
how does Java know whether list2 is contained in list1 when I call list1.containsAll(list2)?
I know that Java uses the contains() method inside the implementation of containsAll(), and that contains() uses the equals() method. I understand the differences, but I am not sure how Java iterates through the elements of list 1.
So if I use list1.containsAll(list2), the containsAll() method is implemented with a loop that iterates through every object of, in this case, list2, and returns false if one of the elements is not in list 1.
So my main question is: how does Java know that list 1 contains all of the elements without another loop to iterate through the elements of list 1? Does Java do that work internally or something?
I currently think that to do such a thing, I would have to use
for (int i = 0; i < list1.size(); i++)
    list1.get(i).containsAll(list2);
That seems more logical to me, taking into consideration that I would have to modify the code of containsAll for it to work correctly and also implement the get() method.
Let me answer this formally, because I think it is a good question.
The containsAll method iterates through the provided collection and calls contains() on each entry, which in turn iterates through the list being searched. See the extract from the Java source below:
public boolean containsAll(Collection<?> c) {
    for (Object e : c)
        if (!contains(e))
            return false;
    return true;
}
and
public boolean contains(Object o) {
    Iterator<E> it = iterator();
    if (o==null) {
        while (it.hasNext())
            if (it.next()==null)
                return true;
    } else {
        while (it.hasNext())
            if (o.equals(it.next()))
                return true;
    }
    return false;
}
This makes it O(n*m), or O(n^2) for lists of similar size, in the worst case (when the last values do not match, or when the list actually matches). That is really bad, especially if you are comparing big collections.
A better approach would be to do something like the following. (Obviously this needs to be adjusted if you are using objects or collections other than strings, and it needs some null checks.)
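// Requires java.util.List, java.util.Map and java.util.stream.Collectors.
public boolean containsAllStrings(List<String> list1, List<String> list2) {
    // The merge function keeps the first entry, so duplicates in list1 do not throw.
    Map<String, String> list1Map = list1.stream().collect(Collectors.toMap(c -> c, c -> c, (first, second) -> first));
    return list2.stream().allMatch(list1Map::containsKey);
}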
This way it iterates at most 2n times (one pass to add the items to the map and one to compare), n being the size of the bigger of the two lists, rather than n^2 times.
It may seem the same, but hash maps are nice because they locate a value directly from the hash of its key instead of iterating over all the values, so a lookup is O(1) on average (assuming a reasonable hashCode()).
Obviously, there are tradeoffs between the approaches, such as memory utilization, but for speed on large lists this is the better approach.
I'm new to coding and I decided to learn Java and Groovy. I am working on a simple exercise: I have two arrays and I must check whether they are equal. I take the values from two databases that hold the same data, so the values are equal but not in the same order. For example, I have:
ArrayList collection1 = ["test","a"]
ArrayList collection2 = ["a","test"]
Well I tried this:
assert collection1.equals(collection2)
But I know that this works only when the values in those arrays are in the same order.
I can think of two methods. The first one has three steps:
Check that they are equal sizes
Wrap the two arrays with Arrays.asList()
Check if a contains all elements from b
public static boolean equals(Object[] a, Object[] b) {
    return a.length == b.length && Arrays.asList(a).containsAll(Arrays.asList(b));
}
Another way would be to just iterate over both arrays at once and then check if the elements are equal:
public static boolean equals(Object[] a, Object[] b) {
    if (a.length != b.length) return false;
    outer: for (Object aObject : a) {
        for (Object bObject : b) {
            // Compare the elements, not the arrays themselves.
            if (aObject.equals(bObject)) continue outer;
        }
        return false;
    }
    return true;
}
Both methods are quadratic in the worst case but fast enough for small arrays. The first introduces an additional wrapper around the arrays, but its cost is negligible, as Arrays.asList() just uses the given array as a view and does not do any additional copying.
Now it seems that you're actually comparing two Collections; in that case you can just use this approach:
public static boolean equals(Collection<?> a, Collection<?> b) {
    return a.size() == b.size() && a.containsAll(b);
}
In an array, the order is important. If you want to compare without checking the order, you should use Sets (see the Sets tutorial).
However, if you don't want to use another type, I suggest you implement your own function that checks the presence of each element of one array in the other.
I hope this helps!
I know absolutely zilch about Java programming, but I've thought of this problem more generally for some time and I think I have a workable solution that is generalizable if you know a priori all the values that can be contained in the array.
If you assign a prime number to each possible string that can be in the array and then multiply the primes of all the elements of an array together, the resulting product will represent a unique combination, but not order, of the elements of the array. To close the loop, you then just have to compare the values of that multiplication. If there's a better answer, use that, but I thought I would share this idea.
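A minimal sketch of this prime-product idea in Java, assuming the full set of possible values is known up front. The class name PrimeFingerprint and its methods are hypothetical, and BigInteger is used only because the product would overflow a long very quickly.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PrimeFingerprint {
    private final Map<String, BigInteger> primeOf = new HashMap<>();

    // Assigns a distinct prime to every value that may appear (assumed known in advance).
    PrimeFingerprint(List<String> vocabulary) {
        long candidate = 1;
        for (String value : vocabulary) {
            if (primeOf.containsKey(value)) {
                continue; // already has a prime
            }
            do {
                candidate++;
            } while (!isPrime(candidate));
            primeOf.put(value, BigInteger.valueOf(candidate));
        }
    }

    // Simple trial division; fine for the small primes a sketch like this needs.
    private static boolean isPrime(long n) {
        if (n < 2) {
            return false;
        }
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Product of the primes of all elements; it does not depend on element order.
    BigInteger fingerprint(List<String> values) {
        BigInteger product = BigInteger.ONE;
        for (String value : values) {
            BigInteger prime = primeOf.get(value);
            if (prime == null) {
                throw new IllegalArgumentException("Unknown value: " + value);
            }
            product = product.multiply(prime);
        }
        return product;
    }

    // Two lists hold the same elements (with the same counts) iff their products match.
    boolean sameElements(List<String> a, List<String> b) {
        return fingerprint(a).equals(fingerprint(b));
    }
}
```

Because prime factorization is unique, duplicates are accounted for as well, but the product grows with every element, so this is more of a curiosity than a practical replacement for sorting or hashing.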
In Groovy, just sort them, and check the sorted lists:
assert listA.sort(false) == listB.sort(false)
Or, if they can't have duplicates, use Sets as suggested by @Baldwin.
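For readers working in plain Java rather than Groovy, the same sort-then-compare idea might look like the sketch below. It assumes the elements are Comparable and non-null; the class and method names are hypothetical. Like sort(false) in Groovy, it sorts copies and leaves the input lists untouched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SortedCompare {
    // Sorts copies of both lists and compares them element by element.
    static <T extends Comparable<? super T>> boolean sameElements(List<T> a, List<T> b) {
        if (a.size() != b.size()) {
            return false; // different sizes can never match
        }
        List<T> sortedA = new ArrayList<>(a);
        List<T> sortedB = new ArrayList<>(b);
        Collections.sort(sortedA);
        Collections.sort(sortedB);
        return sortedA.equals(sortedB);
    }
}
```

Unlike a Set-based comparison, this respects duplicates: [2, 2, 3] and [2, 3, 3] are reported as different.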
The official documentation (archive) of containsAll only says "Returns true if this list contains all of the elements of the specified collection.". However, I just tested this:
List<Integer> list1 = new ArrayList<>();
list1.add(1);
list1.add(2);
list1.add(1);
List<Integer> list2 = new ArrayList<>();
list2.add(2);
list2.add(1);
list2.add(2);
System.out.println(list1.containsAll(list2));
The result is true, even though list1 does not contain a second 2.
So what is the official, completely defined behaviour of containsAll? Does it act as if all duplicates were removed from both lists? I remember reading somewhere that it can cause problems with duplicates, but I don't know the exact case.
The List.containsAll method behaves just as documented: it returns true if all the elements of the given collection belong to this collection, false otherwise. The docs say nothing about the order or cardinality of the elements.
The documentation for containsAll does not explicitly say how it determines whether an element belongs to the Collection. But the documentation for contains (which is implicitly specifying the semantics of "contains") does: it uses equals. Again, no mention of cardinality.
The containsAll method is declared in the Collection interface and re-declared in the List and Set interfaces, but it's first implemented in the Collection hierarchy by the AbstractCollection class, as follows:
public boolean containsAll(Collection<?> c) {
    for (Object e : c)
        if (!contains(e))
            return false;
    return true;
}
As far as I know, this implementation is inherited by most common classes that implement the Collection interface in the Java Collections framework, except for the CopyOnWriteArrayList class and other specialized classes, such as empty lists and checked and immutable wrappers, etc.
So, if you look at the code, you'll see that it fulfils the docs you quoted:
Returns true if this list contains all of the elements of the specified collection.
In the docs of the AbstractCollection.containsAll method, there's also an @implSpec tag, which says the following:
@implSpec
This implementation iterates over the specified collection, checking each element returned by the iterator in turn to see if it's contained in this collection. If all elements are so contained true is returned, otherwise false.
With regard to possible optimizations, they are all delegated to the different implementations of the contains method, which AbstractCollection also implements in a naive, brute-force way. However, contains is overridden in, e.g., HashSet to take advantage of hashing, and also in ArrayList, where it scans the backing array by index.
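To illustrate how the cost depends on the receiver's contains, here is a small sketch; the class and method names are hypothetical. Copying the receiver into a HashSet first turns each lookup into a hash probe instead of a linear scan, at the price of extra memory and a reliance on consistent equals() and hashCode() implementations.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

final class ContainsAllDemo {
    // Same answer as haystack.containsAll(needles), but contains() is now HashSet's.
    static boolean containsAllHashed(List<?> haystack, Collection<?> needles) {
        return new HashSet<Object>(haystack).containsAll(needles);
    }
}
```

As with the plain list version, cardinality is ignored: a haystack of [1, 2, 1] still "contains all" of [2, 1, 2].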
You can iterate over one list and remove elements by value from the other, then check whether the other list's size is 0. If it is, then all elements of the second list were present in the first list at least as many times as in the second list.
public boolean containsAll(List<Character> source, List<Character> target) {
    // Note: this removes the matched elements from the caller's target list.
    for (Character character : source) {
        target.remove(character);
        if (target.isEmpty()) {
            return true;
        }
    }
    return target.size() == 0;
}
A HashMap will be more efficient if the lists are huge:
public static boolean containsAll(List<Character> source, List<Character> target) {
    Map<Character, Long> targetMap = target.stream().collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    for (Character character : source) {
        Long count = targetMap.get(character);
        if (count != null) {
            if (count > 1) {
                targetMap.put(character, --count);
            } else {
                targetMap.remove(character);
            }
        }
    }
    return targetMap.isEmpty();
}
For example in the code below:
public int commonTwo(String[] a, String[] b)
{
    Set<String> common = new HashSet<String>(Arrays.asList(a));
    common.retainAll(new HashSet<String>(Arrays.asList(b)));
    return common.size();
}
Let's peruse the code. The method retainAll is inherited from AbstractCollection and (at least in OpenJDK) looks like this:
public boolean retainAll(Collection<?> c) {
    boolean modified = false;
    Iterator<E> it = iterator();
    while (it.hasNext()) {
        if (!c.contains(it.next())) {
            it.remove();
            modified = true;
        }
    }
    return modified;
}
There is one big thing to note here: we loop over this.iterator() and call c.contains. So the time complexity is n calls to c.contains, where n = this.size(), and at most n calls to it.remove().
The important thing is that the contains method is called on the other Collection, and so the complexity depends on the complexity of the other Collection's contains.
So, whilst:
Set<String> common = new HashSet<>(Arrays.asList(a));
common.retainAll(new HashSet<>(Arrays.asList(b)));
Would be O(a.length), as HashSet.contains and HashSet.remove are both O(1) (amortized).
If you were to call
common.retainAll(Arrays.asList(b));
Then due to the O(n) contains on Arrays.ArrayList this would become O(a.length * b.length) - i.e. by spending O(n) copying the array to a HashSet you actually make the call to retainAll much faster.
As far as space complexity goes, no additional space (beyond the Iterator) is required by retainAll, but your invocation is actually quite expensive space-wise, as you allocate two new HashSet instances, each of which is backed by a fully fledged HashMap.
Two further things can be noted:
There is no reason to allocate a HashSet from the elements in a. A cheaper collection that also has O(1) removal from the middle, such as a LinkedList, can be used (cheaper in memory and also in build time, since no hash table is built).
Your modifications are being lost as you create new collection instances and only return b.size().
The implementation can be found in the java.util.AbstractCollection class. The way it is implemented looks like this:
public boolean retainAll(Collection<?> c) {
    Objects.requireNonNull(c);
    boolean modified = false;
    Iterator<E> it = iterator();
    while (it.hasNext()) {
        if (!c.contains(it.next())) {
            it.remove();
            modified = true;
        }
    }
    return modified;
}
So it will iterate everything in your common set and check if the collection that was passed as a parameter contains this element.
In your case both are HashSets, thus it will be O(n), as contains should be O(1) amortized and iterating over your common set is O(n).
One improvement you can make is simply not to copy a into a new HashSet: because it will be iterated anyway, you can keep it as a list.
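A sketch of that improvement, under a couple of assumptions. The class name CommonCount is hypothetical. The list returned by Arrays.asList(a) is fixed-size and does not support removal, so the elements of a are copied into an ArrayList (which provides its own retainAll); only b goes into a HashSet, because b is the side on which contains is called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CommonCount {
    // Counts the elements of a that also occur in b.
    // Assumption: a has no duplicates; if it does, each occurrence is counted.
    static int commonTwo(String[] a, String[] b) {
        Set<String> inB = new HashSet<>(Arrays.asList(b));       // O(1) expected contains
        List<String> common = new ArrayList<>(Arrays.asList(a)); // modifiable copy of a
        common.retainAll(inB);                                   // one inB.contains per element of a
        return common.size();
    }
}
```

Note the behavioral difference from the original: with a HashSet on both sides, duplicates in a are collapsed before counting, whereas here they are not.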
I have the following problem: I need to find pairs of the same elements in two lists, which are unordered. The thing about these two lists is that they are "roughly equal" - only certain elements are shifted by a few indexes e.g. (Note, these objects are not ints, I am just using integers in this example):
[1,2,3,5,4,8,6,7,10,9]
[1,2,3,4,5,6,7,8,9,10]
My first attempt would be to iterate through both lists and generate two HashMaps based on some unique key for each object. Then, upon the second pass, I would simply pull the elements from both maps. This yields O(2N) in space and time.
I was thinking about a different approach: we would keep pointers to the current element in both lists, as well as a currentlyUnmatched set for each list. The pseudocode would be something like the following:
while (elements to process) {
    elem1 = list1.get(index1)
    elem2 = list2.get(index2)
    if (elem1 == elem2) {
        // do work
        ...
        index1++
        index2++
    } else {
        // Move the index of the list that has no unmatched elems
        if (firstListUnmatched.size() == 0) {
            // Didn't find it in the other list either, so we save it for later
            if (secondListUnmatched.remove(elem1) != true)
                firstListUnmatched.insert(elem1)
            index1++
        } else {
            // same, but with the other index
        }
    }
}
The above probably does not work... I just wanted to get a rough idea of what you think about this approach. Basically, this maintains a hash set on the side of each list, whose size << problem size. This should be ~O(N) for a small number of misplaced elements and for small "gaps". Anyway, I look forward to your replies.
EDIT: I cannot simply return a set intersection of two object lists, as I need to perform operations (multiple operations even) on the objects I find as matching/non-matching
I cannot simply return a set intersection of two object lists, as I need to perform operations (multiple operations even) on the objects I find as matching/non-matching
You can maintain a set of the objects which don't match. This will be O(M) in space where M is the largest number of swapped elements at any point. It will be O(N) for time where N is the number of elements.
interface Listener<T> {
    void matched(T t1);
    void onlyIn1(T t1);
    void onlyIn2(T t2);
}
public static <T> void compare(List<T> list1, List<T> list2, Listener<T> tListener) {
    // Assumes both lists have the same size.
    Set<T> onlyIn1 = new HashSet<T>();
    Set<T> onlyIn2 = new HashSet<T>();
    for (int i = 0; i < list1.size(); i++) {
        T t1 = list1.get(i);
        T t2 = list2.get(i);
        if (t1.equals(t2)) {
            tListener.matched(t1);
            continue;
        }
        // t1 was seen earlier in list2: it is a match; otherwise remember it.
        if (onlyIn2.remove(t1))
            tListener.matched(t1);
        else
            onlyIn1.add(t1);
        // t2 was seen earlier in list1: it is a match as well; otherwise remember it.
        if (onlyIn1.remove(t2))
            tListener.matched(t2);
        else
            onlyIn2.add(t2);
    }
    for (T t1 : onlyIn1)
        tListener.onlyIn1(t1);
    for (T t2 : onlyIn2)
        tListener.onlyIn2(t2);
}
If I have understood your question correctly, you can use Collection.retainAll and then iterate over the collection that has been retained and do what you have to do.
list2.retainAll(list1);
Approaches based on sorted maps (such as TreeMap) will be O(n log(n)) at best, because building the map is effectively an insertion sort; hash-based maps avoid this with expected O(1) insertions. With sorted maps, the effect is to do an insertion sort on both lists and then compare them.
If the lists are nearly sorted to begin with, a sort step shouldn't take as long as the average case, and it is O(n log(n)) at worst, so just sort both and compare. This allows you to step through and perform your operations on the items that match or do not match, as appropriate.
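A sketch of the sort-and-step-through idea, assuming the elements are Comparable (or that a suitable key could be compared instead). The class SortedDiff and its three result lists are hypothetical names; in real code you would perform your operations at the points where elements are added to these lists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SortedDiff<T extends Comparable<? super T>> {
    final List<T> matched = new ArrayList<>();
    final List<T> onlyIn1 = new ArrayList<>();
    final List<T> onlyIn2 = new ArrayList<>();

    SortedDiff(List<T> list1, List<T> list2) {
        // Sort copies so the caller's lists keep their order.
        List<T> a = new ArrayList<>(list1);
        List<T> b = new ArrayList<>(list2);
        Collections.sort(a);
        Collections.sort(b);

        // Walk both sorted lists in step, like the merge phase of merge sort.
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) {
                matched.add(a.get(i)); // present in both lists
                i++;
                j++;
            } else if (cmp < 0) {
                onlyIn1.add(a.get(i++)); // smaller element has no partner in b
            } else {
                onlyIn2.add(b.get(j++)); // smaller element has no partner in a
            }
        }
        // Whatever is left over on either side is unmatched.
        while (i < a.size()) {
            onlyIn1.add(a.get(i++));
        }
        while (j < b.size()) {
            onlyIn2.add(b.get(j++));
        }
    }
}
```

The walk itself is linear; the overall cost is dominated by the two sorts.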
I have two Collections in a Java class. The first collection contains previous data; the second contains updated data from the previous collection.
I would like to compare the two collections, but I'm not sure of the best way to implement this efficiently. Both collections will contain the same number of items.
Then, based on the carType being the same in each collection, I want to execute the carType method.
Any help is appreciated
It is difficult to help, because you didn't tell us how you would like to compare the (equal-size) collections. Some ideas, hoping one will fit:
Compare both collections if they contain the same objects in the same order:
Iterator<?> targetIt = target.iterator();
for (Object obj : source) {
    if (!obj.equals(targetIt.next())) {
        // compare result -> false
    }
}
Compare both collections if they contain the same objects in any order:
for (Object obj : source) {
    if (!target.contains(obj)) {
        // compare result -> false
    }
}
Find elements in the other collection that have changed:
Iterator<?> targetIt = target.iterator();
for (Object obj : source) {
    if (!obj.equals(targetIt.next())) {
        // Element has changed
    }
}
Based on your comment, this algorithm would do it. It collects all Cars that have been updated. If the method result is an empty list, both collections contain equal entries in the same order. The algorithm relies on a correct implementation of equals() on the Car type!
public List<Car> findUpdatedCars(Collection<Car> oldCars, Collection<Car> newCars) {
    List<Car> updatedCars = new ArrayList<Car>();
    Iterator<Car> oldIt = oldCars.iterator();
    for (Car newCar : newCars) {
        if (!newCar.equals(oldIt.next())) {
            updatedCars.add(newCar);
        }
    }
    return updatedCars;
}
From set theory, the sets A and B are equal iff A is a subset of B and B is a subset of A. So, in Java, given two collections A and B, you can check their equality irrespective of the order of the elements with
boolean collectionsAreEqual = A.containsAll(B) && B.containsAll(A);
Iterate over the first collection and add it into a Map<Entity, Integer> whereby Entity is the class being stored in your collection and the Integer represents the number of times it occurs.
Iterate over the second collection and, for each element attempt to look it up in the Map - If it exists then decrement the Integer value by one and perform any action necessary when a match is found. If the Integer value has reached zero then remove the (Entity, Integer) entry from the map.
This algorithm will run in linear time assuming you've implemented an efficient hashCode() method.
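The two passes described above can be sketched as follows. This is only an illustration: the class and method names are hypothetical, a type parameter T stands in for Entity, and the equality verdict at the end (the map must be empty) is an assumption about how the result would be used.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

final class MultisetEquality {
    // True if both collections hold the same elements the same number of times.
    static <T> boolean sameElements(Collection<T> first, Collection<T> second) {
        if (first.size() != second.size()) {
            return false;
        }
        // Pass 1: count how often each element occurs in the first collection.
        Map<T, Integer> counts = new HashMap<>();
        for (T item : first) {
            counts.merge(item, 1, Integer::sum);
        }
        // Pass 2: look up each element of the second collection and decrement its count.
        for (T item : second) {
            Integer remaining = counts.get(item);
            if (remaining == null) {
                return false; // not in the first collection, or already used up
            }
            // A match was found here; any per-match action would go at this point.
            if (remaining == 1) {
                counts.remove(item); // count reached zero, drop the entry
            } else {
                counts.put(item, remaining - 1);
            }
        }
        return counts.isEmpty();
    }
}
```

Unlike the containsAll-based checks, this distinguishes (2,2,3) from (2,3,3), because the counts have to line up exactly.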
Slightly updated one considering null values:
static <T> boolean equals(Collection<T> lhs, Collection<T> rhs) {
    boolean equals = false;
    if (lhs != null && rhs != null) {
        equals = lhs.size() == rhs.size() && lhs.containsAll(rhs) && rhs.containsAll(lhs);
    } else if (lhs == null && rhs == null) {
        equals = true;
    }
    return equals;
}
If not worried about cases like (2,2,3), (2,3,3):
static <T> boolean equals(Collection<T> lhs, Collection<T> rhs) {
    return lhs.size() == rhs.size() && lhs.containsAll(rhs) && rhs.containsAll(lhs);
}
public static boolean isEqualCollection(java.util.Collection a,
java.util.Collection b)
Returns true if the given Collections contain exactly the same elements with exactly the same cardinalities.
That is, iff the cardinality of e in a is equal to the cardinality of e in b, for each element e in a or b.
Parameters:
a - the first collection, must not be null
b - the second collection, must not be null
Returns:
true if the collections contain the same elements with the same cardinalities.