Java - matching two unordered lists

I have the following problem: I need to find pairs of equal elements in two unordered lists. The two lists are "roughly equal": only certain elements are shifted by a few indexes. For example (note that the objects are not ints; I am just using integers for illustration):
[1,2,3,5,4,8,6,7,10,9]
[1,2,3,4,5,6,7,8,9,10]
My first attempt would be to iterate through both lists and generate two HashMaps based on some unique key for each object. Then, on a second pass, I would simply pull the elements from both maps. This is O(N) in both space and time (two passes, two maps).
I was thinking about a different approach: keep a pointer to the current element in each list, as well as a currentlyUnmatched set for each list. The pseudocode would be something like this:
while (elements to process) {
    elem1 = list1.get(index1)
    elem2 = list2.get(index2)
    if (elem1 == elem2) {
        // do work
        ...
        index1++
        index2++
    } else {
        // Move the index of the list that has no unmatched elements
        if (firstListUnmatched.size() == 0) {
            // Not found in the other list's unmatched set either, so save it for later
            if (secondListUnmatched.remove(elem1) != true)
                firstListUnmatched.insert(elem1)
            index1++
        } else {
            // same, but with the other index
        }
    }
}
The above probably does not work; I just wanted a rough idea of what you think about this approach. Basically, it maintains a hash set alongside each list, whose size is much smaller than the problem size. This should be ~O(N) for a small number of misplaced elements and for small "gaps". Anyway, I look forward to your replies.
EDIT: I cannot simply return a set intersection of two object lists, as I need to perform operations (multiple operations even) on the objects I find as matching/non-matching

I cannot simply return a set intersection of two object lists, as I need to perform operations (multiple operations even) on the objects I find as matching/non-matching
You can maintain a set of the objects which don't match. This will be O(M) in space where M is the largest number of swapped elements at any point. It will be O(N) for time where N is the number of elements.
interface Listener<T> {
    void matched(T t1);
    void onlyIn1(T t1);
    void onlyIn2(T t2);
}

public static <T> void compare(List<T> list1, List<T> list2, Listener<T> tListener) {
    // Elements seen so far in one list that have not yet turned up in the other.
    Set<T> onlyIn1 = new HashSet<T>();
    Set<T> onlyIn2 = new HashSet<T>();
    // Assumes both lists have the same size.
    for (int i = 0; i < list1.size(); i++) {
        T t1 = list1.get(i);
        T t2 = list2.get(i);
        if (t1.equals(t2)) {
            tListener.matched(t1);
            continue;
        }
        if (onlyIn2.remove(t1))
            tListener.matched(t1);
        else
            onlyIn1.add(t1);
        if (onlyIn1.remove(t2))
            tListener.matched(t2); // t2 was waiting in list1's unmatched set
        else
            onlyIn2.add(t2);
    }
    for (T t1 : onlyIn1)
        tListener.onlyIn1(t1);
    for (T t2 : onlyIn2)
        tListener.onlyIn2(t2);
}

If I have understood your question correctly, you can use Collection.retainAll and then iterate over the retained collection and do what you have to do. Note that retainAll modifies list2 in place, keeping only the elements that are also in list1.
list2.retainAll(list1);

Approaches based on sorted maps (such as TreeMap) are O(n log(n)) at best, because building the map is effectively an insertion sort; hash-based maps avoid the sort at the cost of extra memory. With a sorted map, the effect is to sort both lists and then compare them.
If the lists are nearly sorted to begin with, a sort step should take much less time than the average case (Java's sort for object lists is adaptive and needs far fewer comparisons on partially sorted input), with O(n log(n)) as the upper bound. So just sort both and compare. This allows you to step through and perform your operations on the items that match or do not match as appropriate.
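The sort-then-compare idea can be sketched as follows. This is only an illustration under assumptions: the elements are taken to be Comparable (the question only says they have some unique key), and the class name SortedCompare and its Result buckets are hypothetical names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedCompare {
    /** Three result buckets; the names are hypothetical, chosen for this sketch. */
    static final class Result<T> {
        final List<T> matched = new ArrayList<>();
        final List<T> onlyIn1 = new ArrayList<>();
        final List<T> onlyIn2 = new ArrayList<>();
    }

    static <T extends Comparable<? super T>> Result<T> compare(List<T> list1, List<T> list2) {
        // Sort copies so the callers' lists keep their original order.
        List<T> a = new ArrayList<>(list1);
        List<T> b = new ArrayList<>(list2);
        Collections.sort(a);
        Collections.sort(b);

        Result<T> result = new Result<>();
        int i = 0, j = 0;
        // Walk both sorted lists in step, always advancing the side with the smaller element.
        while (i < a.size() && j < b.size()) {
            int c = a.get(i).compareTo(b.get(j));
            if (c == 0) {
                result.matched.add(a.get(i));
                i++;
                j++;
            } else if (c < 0) {
                result.onlyIn1.add(a.get(i++));
            } else {
                result.onlyIn2.add(b.get(j++));
            }
        }
        // Whatever is left on either side has no partner.
        while (i < a.size()) result.onlyIn1.add(a.get(i++));
        while (j < b.size()) result.onlyIn2.add(b.get(j++));
        return result;
    }
}
```

Instead of collecting into lists, the three branches could call whatever per-element operations are needed on matching and non-matching objects.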

Related

How does the containsAll method from the Collection interface work in Java?

If I have two lists of objects from the Collection interface
list 1 = {John, Tim, Tom}
list 2 = {John, Tim}
and both lists are instances of ArrayList,
how does Java know whether list2 is contained in list1 with list1.containsAll(list2)?
I know that Java uses the contains method inside the implementation of containsAll(), and that contains uses the equals() method. I understand the differences, but I am not sure how Java iterates through the elements of list 1.
So if I use list1.containsAll(list2), containsAll() is implemented with a loop that iterates through every object of, in this case, list2, and returns false if one of the elements is not in list 1.
So my main question is: how does Java know that list 1 contains all of the elements without another loop to iterate through the elements of list 1? Does Java do the work internally or something?
I thought that to do such a thing, I would have to use
for (int i = 0; i < list1.size(); i++)
    list1.get(i).containsAll(list2);
That seems more logical to me, considering that I would have to modify the code for containsAll to work correctly and also implement the get() method.
Let me answer this formally, because I think it is a good question.
The containsAll method iterates through the provided collection and calls contains() on each entry, which in turn iterates through the list being searched. See the extract below from the Java source (AbstractCollection):
public boolean containsAll(Collection<?> c) {
    for (Object e : c)
        if (!contains(e))
            return false;
    return true;
}
and
public boolean contains(Object o) {
    Iterator<E> it = iterator();
    if (o==null) {
        while (it.hasNext())
            if (it.next()==null)
                return true;
    } else {
        while (it.hasNext())
            if (o.equals(it.next()))
                return true;
    }
    return false;
}
This makes it O(n*m), i.e. O(n^2) for lists of similar size. The worst case occurs when only the last values fail to match or when the list really does contain everything. That is really bad, especially if the collections you are comparing are big.
A better approach would be something like the following. (Obviously this needs to be adjusted if you are using objects or collections other than lists of strings; the elements must implement equals() and hashCode() consistently.)
public boolean containsAllStrings(List<String> list1, List<String> list2) {
    // A HashSet tolerates duplicates and nulls in list1, unlike Collectors.toMap
    Set<String> list1Set = new HashSet<>(list1);
    return list2.stream().allMatch(list1Set::contains);
}
This way it iterates at most 2n times (once to add the items to the set and once to compare), n being the size of the bigger of the two lists, rather than n^2.
It may seem the same, but hash-based collections are nice because they locate an entry directly from the hash of the key instead of iterating over all the values, making a lookup O(1) on average.
Obviously, there are tradeoffs between the approaches, such as memory utilization, but for speed this is usually the better approach.

Java LinkedList: remove a range of elements (from, to)

I have a java.util.LinkedList containing data logically like
1 > 2 > 3 > 4 > 5 > null
and I want to remove elements from 2 to 4 and make the LinkedList like this
1 > 5 > null
In principle we should be able to achieve this in O(n), considering you only have to break the chain at 2 and connect it to 5 in a single operation.
In Java's LinkedList I cannot find any function that removes a chain from the list, given from and to, in a single O(n) operation.
It only gives me the option to remove the elements individually (making each operation O(n)).
Is there any way I can achieve this in just a single operation (without writing my own List)?
One solution provided here solves the problem using a single line of code, but not in a single operation.
list.subList(1, 4).clear();
The question was more about algorithms and performance. When I checked the performance, this was actually slower than removing the elements one by one. I am guessing this solution does not actually remove the entire sublist in O(n) but removes each element one by one (each removal being O(n)), and adds extra computation to take the sublist.
Average of 1000000 computations in ms:
Without sublist: 1414
With the provided sublist solution: 1846
The way to do it in one step is
list.subList(1, 4).clear();
as documented in the Javadoc for java.util.LinkedList#subList(int, int).
Having checked the source code, I see that this ends up removing the elements one at a time. subList is inherited from AbstractList. This implementation returns a List that simply calls removeRange on the backing list when you invoke clear on it. removeRange is also inherited from AbstractList and the implementation is
protected void removeRange(int fromIndex, int toIndex) {
    ListIterator<E> it = listIterator(fromIndex);
    for (int i=0, n=toIndex-fromIndex; i<n; i++) {
        it.next();
        it.remove();
    }
}
As you can see, this removes the elements one at a time. listIterator is overridden in LinkedList, and it starts by finding the first node, following links from either the start or the end of the list (depending on whether fromIndex is in the first or second half of the list). This means that list.subList(i, j).clear() has time complexity
O(j - i + min(i, list.size() - i)).
Apart from the case where you are better off starting from the end and removing the elements in reverse order, I am not convinced there is a solution that is noticeably faster. Testing the performance of code is not easy, and it is easy to be drawn to false conclusions.
There is no way of using the public API of the LinkedList class to remove all the elements in the middle in one go. This surprised me, as about the only reason for using a LinkedList rather than an ArrayList is that you are supposed to be able to insert and remove elements from the middle efficiently, so I thought this case worth optimising (especially as it's so easy to write).
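If you absolutely need the O(1) performance that you should be able to get from a call such as
list.subList(1, list.size() - 1).clear();
you will either have to write your own implementation or do something fragile and unwise with reflection like this (on recent JDKs it fails unless java.util is opened to your code, for example with --add-opens java.base/java.util=ALL-UNNAMED):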
// Requires: import java.lang.reflect.Field; import java.lang.reflect.Method; import java.util.LinkedList;
public static void main(String[] args) {
    LinkedList<Integer> list = new LinkedList<>();
    for (int a = 0; a < 5; a++)
        list.add(a);
    removeRange_NEVER_DO_THIS(list, 2, 4);
    System.out.println(list); // [0, 1, 4]
}

public static void removeRange_NEVER_DO_THIS(LinkedList<?> list, int from, int to) {
    try {
        // Look up the private node(int) helper to reach the nodes on either side of the range.
        Method node = LinkedList.class.getDeclaredMethod("node", int.class);
        node.setAccessible(true);
        Object low = node.invoke(list, from - 1);
        Object hi = node.invoke(list, to);
        Class<?> clazz = low.getClass();
        Field nextNode = clazz.getDeclaredField("next");
        Field prevNode = clazz.getDeclaredField("prev");
        nextNode.setAccessible(true);
        prevNode.setAccessible(true);
        // Relink the two nodes directly, skipping everything in between.
        nextNode.set(low, hi);
        prevNode.set(hi, low);
        Field size = LinkedList.class.getDeclaredField("size");
        size.setAccessible(true);
        size.set(list, list.size() - to + from);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
To remove the middle elements in a single operation (method call) you could subclass java.util.LinkedList and then expose a call to the protected method AbstractList.removeRange(int, int):
list.removeRange(1, 4);
(Credit to the person who posted this answer then removed it. :)) However, even this method calls ListIterator.remove() n times.
I do not believe there is a way to remove n consecutive entries from a java.util.LinkedList without performing n operations under the hood.
In general removing n consecutive items from any linked list seems to require O(n) operations as one must traverse from the start index to the end index one item at a time - inherently - in order to find the next list entry in the modified list.
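For illustration, the subclassing approach mentioned above might look like the sketch below. The class name RangeLinkedList is hypothetical; the only thing it does is widen the visibility of the inherited removeRange, which still removes the elements one at a time through the list iterator.

```java
import java.util.LinkedList;

// Hypothetical subclass name; it only widens the visibility of removeRange.
class RangeLinkedList<E> extends LinkedList<E> {
    private static final long serialVersionUID = 1L;

    /** Removes the elements from fromIndex (inclusive) to toIndex (exclusive). */
    @Override
    public void removeRange(int fromIndex, int toIndex) {
        // Inherited from AbstractList: iterates and removes one element at a time.
        super.removeRange(fromIndex, toIndex);
    }
}
```

With a list holding 1 to 5, calling list.removeRange(1, 4) on such an instance leaves 1 and 5, the same result as list.subList(1, 4).clear().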

Fastest and optimized way to search for value in a List<T>

I have a List<Person> persons = new ArrayList<Person>(), the size of this list is 100+. I want to check whether a particular personID object is contained in this list or not. Currently I am doing it in this way :
for (Person person : persons)
{
    for (Long pid : listOfIDs)
    {
        if (person.personid == pid)
        {
            // do something
        }
        else
        {
            // do something
        }
    } // end of inner for
}
But I don't want to traverse the persons list for each element in listOfIDs. I thought of using a HashMap of Person with personid as the key and the Person object as the value, so that I only have to traverse listOfIDs and check with containsKey().
Is there any other way to do it?
Your implementation with nested loops will not scale well if the lists get long. The number of operations you will do is the product of the length of the two lists.
If at least one of your lists is sorted by ID, you can use binary search. This will be an improvement over nested loops.
Building a Map is a good idea and will scale well. Using this technique, you will iterate over the list of Persons once to build the map and then iterate over the list of IDs once to do the lookups. Make sure that you create the HashMap with enough initial capacity for the number of Persons (at least size / 0.75 with the default load factor), so that it does not have to rehash as you put the Persons into the Map. This is a very scalable option and does not require that either list be sorted.
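A minimal sketch of the map-based lookup, assuming a Person with a numeric ID field. The Person class below is only a stand-in for the one in the question, and the names PersonLookup, indexById and findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PersonLookup {
    /** Minimal stand-in for the question's Person class (assumed shape). */
    static final class Person {
        final long personId;
        final String name;

        Person(long personId, String name) {
            this.personId = personId;
            this.name = name;
        }
    }

    /** One pass over persons to index them by ID. */
    static Map<Long, Person> indexById(List<Person> persons) {
        // Capacity chosen so that persons.size() entries fit under the default 0.75 load factor.
        Map<Long, Person> byId = new HashMap<>((int) (persons.size() / 0.75f) + 1);
        for (Person p : persons) {
            byId.put(p.personId, p);
        }
        return byId;
    }

    /** One pass over the IDs; each lookup is a hash lookup rather than a scan of the list. */
    static List<Person> findAll(Map<Long, Person> byId, List<Long> ids) {
        List<Person> found = new ArrayList<>();
        for (Long id : ids) {
            Person p = byId.get(id);
            if (p != null) {
                found.add(p); // the ID belongs to a known Person
            }
            // else: no Person has this ID
        }
        return found;
    }
}
```

The total work is one pass over each list, so it grows with the sum of the two lengths rather than their product.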
If BOTH lists happen to be sorted by ID, you have another attractive alternative: jointly walk down the two lists. You will start at the beginning of both lists and move forward in the list with the smallest ID. If the IDs are equal, then you do your business logic for having found the person with that ID and step forward in both lists. As soon as you get to the end of either list, you are done.
Java's Collections provides a binary search, which is very fast, but it assumes you are searching for a member of the list. You could implement your own using your ID criteria:
Collections.sort(persons, (p1, p2) -> Long.compare(p1.personID, p2.personID));
if (binarySearch(persons, id)) {
    ...
}

boolean binarySearch(List<Person> personList, long id) {
    if (personList.isEmpty())
        return false;
    int indexToTest = personList.size() / 2;
    long idToTest = personList.get(indexToTest).personID;
    if (idToTest < id)
        return binarySearch(personList.subList(indexToTest + 1, personList.size()), id);
    else if (idToTest > id)
        return binarySearch(personList.subList(0, indexToTest), id);
    else
        return true;
}
If you don't want to sort your list then you could copy it to a sorted list and search on that: for large lists that would still be much faster than iterating through it. In fact that's pretty similar to keeping a separate hash map (though a hash map could be faster).
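One way to sketch the "sorted copy" idea without touching the original list is to copy just the IDs into a sorted array and use the standard Arrays.binarySearch on it. The class and method names here (SortedIdSearch, sortedCopy, contains) are hypothetical, and the sketch assumes the IDs are available as a List<Long>.

```java
import java.util.Arrays;
import java.util.List;

class SortedIdSearch {
    /** Copies the IDs into a sorted array; the source list is left untouched. */
    static long[] sortedCopy(List<Long> ids) {
        long[] copy = new long[ids.size()];
        for (int i = 0; i < copy.length; i++) {
            copy[i] = ids.get(i);
        }
        Arrays.sort(copy);
        return copy;
    }

    /** Binary search on the sorted copy: O(log n) per lookup. */
    static boolean contains(long[] sortedIds, long id) {
        // Arrays.binarySearch returns a non-negative index only when the key is found.
        return Arrays.binarySearch(sortedIds, id) >= 0;
    }
}
```

The copy costs one sort up front, so this pays off only when many lookups are made against the same list.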
If you must iterate, then you can at least use a parallel stream to take advantage of multiple cores if you have them:
if (persons.parallelStream().anyMatch(p -> p.personID == id)) {
...
}

How to find a missing element between two linked lists in O(n)?

I have two Singly Linked Lists of Integer. One of them is a subset of another (the order of numbers is different). What is the best way (regarding performance) to find a number which the first list does contain and the second one does not?
My thought is first to sort them (using merge sort) and then just compare element by element.
So, it takes O(n log n + m log m + n), but a better O(n) solution should exist.
This is an O(n) solution in both time and space.
Logic
Let's say the original linked list has size N; we'll call it LL1, and the second linked list LL2.
=> Prepare a HashMap of size N; the keys are the numbers in LL1 and the values are their frequencies in LL2
HashMap<Integer,Integer> map = new HashMap<Integer,Integer>();
=> Traverse LL1 and set the frequency to 0 for every number. Once all values in LL1 have been iterated, every number is present in the HashMap with frequency = 0
map.put(key, 0);
=> Now loop through LL2, using each number as the key and incrementing its value by 1. Once all values in LL2 have been iterated, every number common to LL1 and LL2 has frequency > 0 in the HashMap
map.put(key, map.get(key) + 1);
=> Now traverse the HashMap, searching for value = 0; when found, print the key, as this number is present only in LL1 and not in LL2
for (Map.Entry<Integer,Integer> entry : map.entrySet())
{
    if (entry.getValue() == 0)
        System.out.println(entry.getKey()); // This is a loner
}
Two passes over the lists plus one over the map, with O(n) memory and O(n) time.
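Put together, the steps above might look like the following sketch. It assumes the lists can be iterated as Iterable<Integer> (for example java.util.LinkedList); the names MissingFinder and missingFrom are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MissingFinder {
    /** Returns the numbers that occur in full but never in subset. */
    static List<Integer> missingFrom(Iterable<Integer> full, Iterable<Integer> subset) {
        Map<Integer, Integer> frequency = new HashMap<>();
        // Step 1: every number of the full list starts with frequency 0.
        for (Integer value : full) {
            frequency.put(value, 0);
        }
        // Step 2: count how often each number occurs in the subset.
        for (Integer value : subset) {
            frequency.merge(value, 1, Integer::sum);
        }
        // Step 3: numbers still at 0 were never seen in the subset.
        List<Integer> missing = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : frequency.entrySet()) {
            if (entry.getValue() == 0) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }
}
```

The order of the returned numbers follows the HashMap's iteration order, so it should not be relied on when more than one number is missing.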
You can put both of them in different maps and then compare them. Populating the maps takes two single for loops of m and n, and the lookup time for a map is O(1).
HashSet is the best data structure to use in this case.
With this code, you can achieve your result in O(n).
Let me know if you have more conditions; I can suggest something accordingly.
public class LinkedList {
    private ListNode head;
    public ListNode getHead() {
        return head;
    }
}
public class ListNode {
    public int value;
    public ListNode next;
    ListNode(int value) {
        this.value = value;
    }
}
public class UtilClass {
    // Returns a value that is in fullList but not in subList, or -1111 if there is none.
    public static int checkLists(LinkedList fullList, LinkedList subList) {
        // Collect every value of the smaller list.
        ListNode head = subList.getHead();
        HashSet<Integer> hashSet = new HashSet<Integer>();
        while (head != null) {
            hashSet.add(head.value);
            head = head.next;
        }
        // add() returns true only for a value the set did not already hold.
        head = fullList.getHead();
        while (head != null) {
            if (hashSet.add(head.value)) return head.value;
            head = head.next;
        }
        return -1111;
    }
}
You can use the removeAll method. All you have to do is create a method that accepts two lists, the original and the sublist, and returns a list of the missing elements (note that this modifies original in place):
List<Integer> getMissing(List<Integer> original, List<Integer> sub) {
    original.removeAll(sub);
    return original;
}
This runs in quadratic time though.
If you really want to force it to run in linear time, O(n), then you have to write a custom class that wraps your inputs so that, for each input, a flag records whether or not it has been added to the sublist. You can also design a class that facilitates addition and deletion of elements while monitoring the contents of both lists.
Let N = m+n.
Add the lists together. As they are linked lists, this is cheap: O(1).
Sort the combined list: O(N log N) (it may be better to have used an ArrayList).
Walk the list; on not finding a consecutive pair {x, x} you have found a missing one: O(N), as the second list is a subset.
So O(N log N) overall.
As the lists are not ordered, any speedup consists of something like sorting, and that costs. So O(N log N) is fine.
If you want O(N) you could do it as follows (simplified, using positive numbers):
BitSet present = new BitSet(Integer.MAX_VALUE);
for (int value : sublist)
    present.set(value);
for (int value : list)
    if (!present.get(value)) {
        System.out.println("Missing: " + value);
        break;
    }
This trades memory against time. Mind that this answer might not be acceptable, as the bit set needs about Integer.MAX_VALUE bits (roughly 256 MB), which also costs time to initialize/clear.
Possible solutions faster than O(N log N)
The most intelligent answer might be to (quasi) sort both lists cooperatively and detect the missing element during the sort: something like picking an arbitrary "median" element, shifting indices to split the lists, and dividing and conquering.
If the list sizes differ by 1
Then you only need to compute the sum of each list, the difference being the missing value: O(N).
This works even when the sums overflow.
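The sum trick can be sketched as below, assuming int values and lists that differ by exactly one element. The names SumDifference and missing are hypothetical. It relies on Java's int arithmetic wrapping around, so an overflowing running total still ends at the correct difference.

```java
class SumDifference {
    /**
     * Returns the single value present in full but absent from subset.
     * Only valid when full holds exactly one more element than subset.
     */
    static int missing(Iterable<Integer> full, Iterable<Integer> subset) {
        int difference = 0;
        // Add everything from the full list...
        for (int value : full) {
            difference += value; // may wrap around; that is harmless here
        }
        // ...and subtract everything from the subset; only the missing value remains.
        for (int value : subset) {
            difference -= value;
        }
        return difference;
    }
}
```

Unlike the hash-based approaches, this needs no extra memory, but it cannot report anything useful when more than one element is missing.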

How to retrieve elements from sorted TreeSet using Binary Search?

I am trying to merge multiple sorted lists into one TreeSet, and then I am thinking of applying a binary search on that TreeSet to retrieve an element in O(log n) time.
Below is my code, in which I pass a List of Lists to one of my methods and combine them into a TreeSet to avoid duplicates. All the lists inside inputs are sorted.
private TreeSet<Integer> tree = new TreeSet<Integer>();
public void mergeMultipleLists(final List<List<Integer>> inputs) {
    tree = new TreeSet<Integer>();
    for (List<Integer> input : inputs) {
        for (Integer ii : input) {
            tree.add(ii);
        }
    }
}
public List<Integer> getItem(final Integer x) {
    // extract elements from TreeSet in O(log n)
}
First of all, is this the right way to merge multiple sorted lists into a TreeSet? Is there any direct way to merge multiple sorted lists into a TreeSet efficiently?
Secondly, how would I extract an element from that TreeSet in O(log n) time? I would like to find an element x in that TreeSet; if it is there, return it; if it is not, return the next largest value from the TreeSet.
Or maybe I would be better off with a different data structure from the one I am using currently?
UPDATED CODE:
private TreeSet<Integer> tree = new TreeSet<Integer>();
public SearchItem(final List<List<Integer>> inputs) {
    tree = new TreeSet<Integer>();
    for (List<Integer> input : inputs) {
        tree.addAll(input);
    }
}
public Integer getItem(final Integer x) {
    if (tree.contains(x)) {
        return x;
    } else {
        // now how do I extract next largest
        // element from it if x is not present
    }
}
TreeSet is backed by a NavigableMap, specifically a TreeMap. Calling contains() on a TreeSet delegates to TreeMap.containsKey(), which performs a binary tree search in O(log n).
You can check if an object is contained in the set by using TreeSet.contains(), but you have to have the object first. If you want to be able to look up and retrieve an object, then a Map implementation will be better.
You could use TreeSet.floor(), which according to the docs
Returns the greatest element in this set less than or equal to the given element, or null if there is no such element.
If you want the next larger value instead, TreeSet.ceiling() returns the least element greater than or equal to the given element, or null if there is no such element.
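As a sketch of how the question's getItem could be completed, the version below reads "next largest" as "smallest element that is greater than or equal to x" and therefore uses TreeSet.ceiling() rather than floor(); that reading is an assumption. The class mirrors the SearchItem constructor from the updated code.

```java
import java.util.List;
import java.util.TreeSet;

class SearchItem {
    private final TreeSet<Integer> tree = new TreeSet<>();

    SearchItem(List<List<Integer>> inputs) {
        // addAll inserts every element; the set drops duplicates and keeps them sorted.
        for (List<Integer> input : inputs) {
            tree.addAll(input);
        }
    }

    /** Returns x if present, otherwise the smallest element greater than x, or null if none exists. */
    Integer getItem(Integer x) {
        // ceiling() is a single O(log n) tree lookup, so no separate contains() check is needed.
        return tree.ceiling(x);
    }
}
```

If the next smaller value were wanted instead, tree.floor(x) would be the matching call.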
TreeSet is by its nature a sorted set and uses a red-black tree, via TreeMap, as its backing structure.
Basically: TreeSet.add(E) -> TreeMap.put(E, PRESENT), where PRESENT is a shared dummy value.
As it is already a sorted binary tree structure, any 'get' or 'contains' is an O(log n) operation.
Your code and your question don't line up, though.
You're flattening a List<List<Integer>> and putting everything into the set to get all unique elements (or, at least, that's what this code will do).
But then your following method says "given this integer, give me a List<Integer>", which isn't achievable with the above code.
So, let me answer your questions in order:
1. Sure/Yes.
2. No. You misunderstand Sets (you can't extract by design). If you can do Set.contains(e), then you HAVE the element and need not extract anything.
If you need to do something like a "Set extraction", then use a TreeMap, or turn your set back into a list and do myList.get(Collections.binarySearch(myList, myElement));
