Compare Set & List in my case - java

I have a List<Integer>; this list contains duplicate elements:
//myList content is something like e.g. 1,2,1,3,3,6,7,...
List<Integer> myList = getNumbers();
I also have a Set<String>. As you all know, a Set only contains unique elements, no duplicates. My Set<String> contains integers stored as Strings:
// The set contains integers as Strings, e.g. "1", "3", "5" …
Set<String> mySet = getNumSet();
I would like to compare mySet with myList to find the elements that mySet has but myList doesn't, and remove those elements from mySet.
The way I do it now is with a nested iteration, like the following:
for(Integer i : myList){
    for(String s : mySet){
        if(!myList.contains(Integer.valueOf(s))){
            mySet.remove(s); // removing from mySet while iterating over it can throw ConcurrentModificationException
        }
    }
}
Is there a more efficient way to do it?

The easiest way may be to use Collection#retainAll(Collection<?> c), which each collection type can implement with its own optimizations:
mySet.retainAll(myList);
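However, mySet and myList must hold the same element type, i.e. Set<X> and List<X>. Because retainAll accepts any Collection<?>, the mismatched version compiles, but since no String is equal to an Integer it would empty mySet. I advise you to change the declaration of mySet to Set<Integer>, fill it using Integer#valueOf(String s), and then use the retainAll method.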
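A minimal sketch of that advice, assuming every string in mySet parses as an integer (Integer.valueOf throws NumberFormatException otherwise). The class and method names are hypothetical. Copying myList into a HashSet is an extra step beyond the answer: HashSet inherits retainAll from AbstractCollection, which calls contains on the argument once per set element, so a List argument costs a linear scan per element while a HashSet makes each check expected O(1).

```java
import java.util.*;

public class RetainAllExample {
    // Hypothetical helper: converts the String set to Integers, then keeps only
    // the values that also occur in the list.
    static Set<Integer> keepValuesInList(Set<String> mySet, List<Integer> myList) {
        Set<Integer> numbers = new HashSet<>();
        for (String s : mySet) {
            numbers.add(Integer.valueOf(s)); // assumes every string is a valid integer
        }
        // Copying the list into a HashSet makes each contains() call expected O(1).
        numbers.retainAll(new HashSet<>(myList));
        return numbers;
    }

    public static void main(String[] args) {
        List<Integer> myList = Arrays.asList(1, 2, 1, 3, 3, 6, 7);
        Set<String> mySet = new HashSet<>(Arrays.asList("1", "3", "5"));
        System.out.println(keepValuesInList(mySet, myList)); // [1, 3]
    }
}
```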

for (Iterator<String> it = mySet.iterator(); it.hasNext();) {
    // remove the strings whose integer value is NOT in the list
    if (!myList.contains(Integer.parseInt(it.next()))) {
        it.remove();
    }
}
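On Java 8+, the same iterator loop can be written with Collection.removeIf, which removes through the set's own iterator and therefore avoids ConcurrentModificationException. A sketch with hypothetical names, assuming the strings parse as integers; building a HashSet from the list first is an added optimization, not part of the answer above.

```java
import java.util.*;

public class RemoveIfExample {
    // Removes from mySet every string whose integer value does not occur in myList.
    static void removeMissing(Set<String> mySet, List<Integer> myList) {
        Set<Integer> present = new HashSet<>(myList); // duplicates in the list collapse here
        mySet.removeIf(s -> !present.contains(Integer.valueOf(s)));
    }

    public static void main(String[] args) {
        Set<String> mySet = new HashSet<>(Arrays.asList("1", "3", "5"));
        removeMissing(mySet, Arrays.asList(1, 2, 1, 3, 3, 6, 7));
        System.out.println(mySet); // [1, 3]
    }
}
```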

1. Copy the set.
2. Iterate through the list, removing found items from the copy.
3. Subtract the copy from the original set.
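A sketch of these copy / remove / subtract steps, assuming the strings are in canonical form ("1" rather than "01" or "+1") so that String.valueOf(n) matches them exactly. A TreeSet is used for the copy to match the O(log m) removal cost discussed below; the class and method names are hypothetical.

```java
import java.util.*;

public class CopyAndSubtract {
    // Keeps in mySet only the strings whose integer value appears in myList.
    static void retainListed(Set<String> mySet, List<Integer> myList) {
        // Step 1: copy the set (a TreeSet, so each removal is O(log m)).
        Set<String> copy = new TreeSet<>(mySet);
        // Step 2: iterate through the list, removing found items from the copy.
        for (Integer n : myList) {
            copy.remove(String.valueOf(n)); // assumes canonical string form
        }
        // Step 3: the copy now holds only values missing from the list; subtract it.
        mySet.removeAll(copy);
    }

    public static void main(String[] args) {
        Set<String> mySet = new TreeSet<>(Arrays.asList("1", "3", "5"));
        retainListed(mySet, Arrays.asList(1, 2, 1, 3, 3, 6, 7));
        System.out.println(mySet); // [1, 3]
    }
}
```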
Suppose you have m elements in the set and n elements in the list.
Then removing an item from the set is O(log m) (for a tree-based set such as TreeSet) and iterating through the list is O(n), so the removal pass is O(n × log m). The subtraction is at most O(m × log m), so the overall complexity is O(s × log s) where s = max(m, n).
The complexity of your algorithm is O(n × m × (n + log m)), because you
iterate over the list (O(n)) and for each item
iterate through the whole set (O(m)),
finding the corresponding item in the list (O(n)), and finally
removing it from the set if appropriate (O(log m)).

Actually, more information is needed to find the best approach.
If myList.size() >> mySet.size(), follow Oswald's solution, or:
Create a new Set of String.
For each item in myList, if its string form is in mySet, add it to the newSet.
Call mySet.retainAll(newSet).
If mySet.size() >> myList.size(), the best solution would be to convert myList to a new Set of String and then call mySet.retainAll with it.
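A sketch that picks between the two strategies by size. The helper name is hypothetical, and the plain `>` comparison used as the switch point is an assumption (the answer only distinguishes the "much larger" cases). In the big-list branch the new set only collects values that mySet already contains, so it never outgrows mySet; in both branches mySet.retainAll then keeps exactly the listed values.

```java
import java.util.*;

public class SizeBasedRetain {
    // Hypothetical helper: removes from mySet every string that does not occur in myList.
    static void retainListed(Set<String> mySet, List<Integer> myList) {
        Set<String> keep = new HashSet<>();
        if (myList.size() > mySet.size()) {
            // Big list: one pass over it, collecting only the values mySet already has,
            // so 'keep' never grows beyond mySet.size().
            for (Integer n : myList) {
                String s = String.valueOf(n);
                if (mySet.contains(s)) {
                    keep.add(s);
                }
            }
        } else {
            // Big set: convert the (smaller) list to a set of strings once.
            for (Integer n : myList) {
                keep.add(String.valueOf(n));
            }
        }
        // 'keep' is a HashSet, so each lookup made by retainAll is expected O(1).
        mySet.retainAll(keep);
    }

    public static void main(String[] args) {
        Set<String> mySet = new HashSet<>(Arrays.asList("1", "3", "5"));
        retainListed(mySet, Arrays.asList(1, 2, 1, 3, 3, 6, 7));
        System.out.println(new TreeSet<>(mySet)); // [1, 3]
    }
}
```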

Related

Remove items from a Set that are present in a List - Streams

I am trying to remove some objects from a Set (HashSet), but only if they are also present in a List (LinkedList).
How can I achieve this with Java 8+ features (streams)?
Set<MyObject> theSet -> want to remove items that are present in the list
List<MyObject> theList
I have overridden equals and hashCode for MyObject (they use only a few fields for the equality comparison).
If your set is mutable, you could just remove all entries present in your list from the set like so:
theSet.removeAll(theList);
If your set is immutable, you can instead filter it with a stream, producing a new set without the entries present in the list:
var newSet = theSet.stream()
.filter(n -> !theList.contains(n))
.collect(Collectors.toSet());
For Java 11+ you could use a method reference for the filter predicate via composition:
var newSet = theSet.stream()
.filter(Predicate.not(theList::contains))
.collect(Collectors.toSet());
Little note on performance
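Both approaches (removeAll and via streams) can run in O(N * M), where N is the size of theSet and M the size of theList: each List.contains call scans the list linearly, so it boils down to two nested for-loops. (HashSet inherits removeAll from AbstractSet, which instead iterates over the list and removes each element in expected O(1) when the set is larger than the list; the nested-loop cost applies when it is not.)
An easy improvement is to turn theList into a HashSet, which drives the cost of each contains check from linear down to expected O(1):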
var numsToExclude = new HashSet<>(theList);
var newSet = theSet.stream()
.filter(Predicate.not(numsToExclude::contains))
.collect(Collectors.toSet());
You can take a look at the removeIf() method from the Collection interface, which lets you write something like:
theSet.removeIf(theList::contains);
It will also return a boolean that indicates whether any elements were removed or not.
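removeIf(theList::contains) still performs a linear List.contains per set element. A sketch of the in-place variant with a HashSet lookup, combining removeIf with the HashSet tip above; the class and method names are hypothetical, and String stands in for MyObject, whose equals and hashCode the lookup relies on in the same way.

```java
import java.util.*;

public class RemoveIfWithLookup {
    // Removes, in place, every element of theSet that also occurs in theList.
    // Returns true if anything was removed, like removeIf itself.
    static <T> boolean removePresent(Set<T> theSet, List<T> theList) {
        Set<T> lookup = new HashSet<>(theList); // one O(M) pass; contains is then expected O(1)
        return theSet.removeIf(lookup::contains);
    }

    public static void main(String[] args) {
        Set<String> theSet = new HashSet<>(Arrays.asList("a", "b", "c"));
        List<String> theList = new LinkedList<>(Arrays.asList("b", "c", "d"));
        boolean removed = removePresent(theSet, theList);
        System.out.println(removed + " " + theSet); // true [a]
    }
}
```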
I doubt a stream is as fast as a simple loop, considering the stream overhead. You will always have to iterate over the entire list, so I would do it like this (this presumes the original set is immutable):
List<Integer> list = List.of(1,2,5,8,9,10);
Set<Integer> set = Set.of(3,4,8,2,1);
Set<Integer> result = new HashSet<>(set);
for(int val : list) {
    result.remove(val);
}
System.out.println("Before: " + set);
System.out.println("After: " + result);
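prints something like the following (Set.of has an unspecified iteration order, so the "Before" line can differ between runs):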
Before: [1, 8, 4, 3, 2]
After: [3, 4]
Since Sets can't hold duplicates, encountering duplicates in the removal list won't affect the outcome. So if you could gather those in a Set rather than a List it might offer some improvement.
Finally, your Object to be removed must override equals and hashCode for the above to work.

Check if some elements are present in a List in Java

I would like to check if multiple elements are present in a list, at the same time.
For example
List<Integer> output = Arrays.asList(1,2,3,4);
Instead of checking for occurrence of 1,2 and 3 in the list as
output.contains(1);
output.contains(2);
output.contains(3);
I would like to know if there is a way to check for all elements in a single line.
if (output.containsAll(Arrays.asList(1,2,3))) {
    // Your Code
}
There is a method in Java for it, called containsAll(). Keep in mind that under the hood it is not faster than calling contains() for each of the elements: when the receiver is a List, the cost is approximately O(n*m), where n and m are the sizes of the two collections.
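If the collection being searched is large, or is checked repeatedly, copying it into a HashSet once makes each lookup expected O(1), for roughly O(n + m) overall. A sketch with a hypothetical helper name; for a four-element list like the one in the question the copy is not worth it.

```java
import java.util.*;

public class ContainsAllExample {
    // Hypothetical helper: true if every element of 'needles' occurs in 'haystack'.
    static boolean containsAllFast(Collection<Integer> haystack, Collection<Integer> needles) {
        // One O(n) pass to build the set, then an expected O(1) lookup per needle.
        return new HashSet<>(haystack).containsAll(needles);
    }

    public static void main(String[] args) {
        List<Integer> output = Arrays.asList(1, 2, 3, 4);
        System.out.println(containsAllFast(output, Arrays.asList(1, 2, 3))); // true
        System.out.println(containsAllFast(output, Arrays.asList(1, 5)));    // false
    }
}
```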
Create a new list for the elements you wish to check and then do:
List<Integer> output = Arrays.asList(1,2,3,4);
List<Integer> results = Arrays.asList(1,2,3);
if (output.containsAll(results)) {
    //do stuff
}

How to retrieve elements from sorted TreeSet using Binary Search?

I am trying to merge multiple sorted lists into one TreeSet, and then I am thinking of applying a binary search algorithm on that TreeSet to retrieve an element in O(log n) time.
Below is my code, in which I pass a List of Lists to one of my methods and combine them into a TreeSet to remove duplicates. All the lists inside inputs are sorted:
private TreeSet<Integer> tree = new TreeSet<Integer>();
public void mergeMultipleLists(final List<List<Integer>> inputs) {
    tree = new TreeSet<Integer>();
    for (List<Integer> input : inputs) {
        for(Integer ii : input) {
            tree.add(ii);
        }
    }
}
public List<Integer> getItem(final Integer x) {
    // extract elements from TreeSet in O(log n)
}
First of all, is this the right way to merge multiple sorted lists into a TreeSet? Is there a direct way to merge multiple sorted lists into a TreeSet efficiently?
Secondly, how would I extract an element from that TreeSet in O(log n) time? I would like to find an element x in that TreeSet: if it is there, return it; if it is not, return the next largest value from the TreeSet.
Or would I be better off with a different data structure than the one I am currently using?
UPDATED CODE:-
private TreeSet tree = new TreeSet();
public SearchItem(final List<List<Integer>> inputs) {
    tree = new TreeSet<Integer>();
    for (List<Integer> input : inputs) {
        tree.addAll(input);
    }
}
public Integer getItem(final Integer x) {
    if(tree.contains(x)) {
        return x;
    } else {
        // now how do I extract next largest
        // element from it if x is not present
    }
}
TreeSet is backed by a NavigableMap, a TreeMap specifically. Calling contains() on a TreeSet delegates to TreeMap.containsKey(), which walks the red-black tree like a binary search, in O(log n).
You can check if an object is contained in the set by using TreeSet.contains(), but you have to have the object first. If you want to be able to look up and retrieve an object, then a Map implementation will be better.
You could use TreeSet.floor(), which according to the docs
Returns the greatest element in this set less than or equal to the given element, or null if there is no such element.
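If "next largest" means the smallest element greater than or equal to x, NavigableSet.ceiling() (implemented by TreeSet) returns exactly that in O(log n), or null when no such element exists, so the separate contains() check is unnecessary; floor() is the mirror image and returns the next smaller element instead. A sketch reusing the question's getItem name; the class name, floorItem, and sample data are hypothetical.

```java
import java.util.*;

public class CeilingExample {
    private final TreeSet<Integer> tree = new TreeSet<>();

    public CeilingExample(List<List<Integer>> inputs) {
        for (List<Integer> input : inputs) {
            tree.addAll(input); // duplicates across the lists collapse into one element
        }
    }

    // Returns x if present, otherwise the smallest element greater than x,
    // or null if every element is smaller than x. O(log n).
    public Integer getItem(int x) {
        return tree.ceiling(x);
    }

    // Mirror image: x if present, otherwise the largest element smaller than x.
    public Integer floorItem(int x) {
        return tree.floor(x);
    }

    public static void main(String[] args) {
        CeilingExample s = new CeilingExample(Arrays.asList(
                Arrays.asList(1, 4, 9), Arrays.asList(2, 4, 7)));
        System.out.println(s.getItem(4));   // 4
        System.out.println(s.getItem(5));   // 7
        System.out.println(s.getItem(10));  // null
        System.out.println(s.floorItem(5)); // 4
    }
}
```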
TreeSet is by nature a sorted set and uses a red-black tree (via TreeMap) as its backing store.
Basically: TreeSet.add(e) -> TreeMap.put(e, PRESENT), where PRESENT is a shared dummy value.
As it is already a sorted binary tree structure, any lookup such as contains will be an O(log n) operation.
Your code and your question don't line up, though.
You're flattening a List<List<Integer>> and just putting them all in to get all unique elements (or, at least, that's what this code will do).
But then your following method says "given this integer, give me a List<Integer>", which isn't achievable with the above code.
So, let me answer your questions in order:
Yes.
No. You misunderstand Sets: by design, you can't extract from them. If you can call Set.contains(e), then you already HAVE the element and need not extract anything.
If you need something like a "Set extraction", use a TreeMap, or turn your set back into a list and do myList.get(Collections.binarySearch(myList, myElement)); (this only works when the element is present, because binarySearch returns a negative value otherwise).
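Collections.binarySearch returns a non-negative index only when the key is present; on a miss it returns -(insertion point) - 1. A sketch (hypothetical class and helper names) that decodes the insertion point to return the key itself or the next larger element from a sorted, duplicate-free list; the O(log n) bound assumes a random-access list such as ArrayList.

```java
import java.util.*;

public class BinarySearchExample {
    // Returns key if present in the sorted list, otherwise the next larger element,
    // or null if key is larger than every element.
    static Integer ceilingOf(List<Integer> sorted, int key) {
        int idx = Collections.binarySearch(sorted, key);
        if (idx < 0) {
            idx = -idx - 1; // a miss returns -(insertion point) - 1
        }
        return idx < sorted.size() ? sorted.get(idx) : null;
    }

    public static void main(String[] args) {
        // A TreeSet removes duplicates and sorts; copy it into an ArrayList for indexing.
        List<Integer> sorted = new ArrayList<>(new TreeSet<>(Arrays.asList(9, 1, 4, 7, 2, 4)));
        System.out.println(sorted);                // [1, 2, 4, 7, 9]
        System.out.println(ceilingOf(sorted, 5));  // 7
        System.out.println(ceilingOf(sorted, 4));  // 4
        System.out.println(ceilingOf(sorted, 10)); // null
    }
}
```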

How to efficiently calculate how many elements in a list are the same?

I need to do the following task:
I have a list with items.
Each of the items also has a List of strings like "gkejgueieriug".
Now I need to run through the list and check, for each item, how many of the strings in its list are also in the current element's list.
Here is some small pseudocode:
OneItem;
List AllItems;
for Item in AllItems:
    int count = number strings in Item.Values which are also in OneItem.Values
Because the data is very big, I need help making an efficient implementation.
How to do this? Should I use a hashmap? how to count the overlap?
Your question doesn't provide detailed information about the types you want to compare, so I assume you have a List<Item>, where each Item has a String and its own List<Item>.
So first I would create a HashSet of the Strings of the Items in your AllItems-List. Iterate the AllList and add the String of each Item to the HashSet.
Then in the second step iterate the AllList again and iterate the List in the Items and check each String here if it is in the HashSet which was created before.
If you have to check this several times you can keep the HashSet as a cache which you refresh when the AllList gets changed.
// Step 1: Create Set of Strings
Set<String> allStrings = new HashSet<String>();
for (Item item : allList) {
    allStrings.add(item.getString());
}
// Step 2: Calculate occurrences
for (Item item : allList) {
    for (Item internalItem : item.getItems()) {
        if (allStrings.contains(internalItem.getString())) {
            // Count one up for this String
            // This might be done by replacing the HashSet by a HashMap and using its values for counting
        }
    }
}
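Step 2 leaves the counting itself as a comment. A sketch of the HashMap variant it mentions, with a minimal hypothetical Item class standing in for the types the answer assumes (one String plus a nested List<Item>); Map.computeIfPresent (Java 8+) increments only strings that were registered in step 1.

```java
import java.util.*;

public class CountingExample {
    // Hypothetical Item: one String plus its own nested list of Items.
    static class Item {
        final String string;
        final List<Item> items;

        Item(String string, List<Item> items) {
            this.string = string;
            this.items = items;
        }
    }

    // Counts how often each top-level string appears among the nested items.
    static Map<String, Integer> countOccurrences(List<Item> allList) {
        // Step 1: register every top-level string with a count of 0.
        Map<String, Integer> counts = new HashMap<>();
        for (Item item : allList) {
            counts.put(item.string, 0);
        }
        // Step 2: bump the count of nested strings that were registered in step 1.
        for (Item item : allList) {
            for (Item internalItem : item.items) {
                counts.computeIfPresent(internalItem.string, (key, n) -> n + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Item> none = Collections.emptyList();
        List<Item> allList = Arrays.asList(
                new Item("a", Arrays.asList(new Item("b", none), new Item("x", none))),
                new Item("b", Arrays.asList(new Item("a", none), new Item("b", none))));
        Map<String, Integer> counts = countOccurrences(allList);
        System.out.println(counts.get("a") + " " + counts.get("b")); // 1 2
    }
}
```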
Make Item.Values a Set rather than a List. A decent Set implementation, like HashSet, runs the contains() operation in (expected) constant time. Then iterate over one set and increment a count each time the other set contains the element.
An optimization is to always iterate over the smaller set. That way the counting operation is O(n) where n is the size of the smaller set.
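A sketch of that counting loop, assuming each item's values are already stored in a HashSet; the class name, helper name, and sample strings are hypothetical. Choosing the smaller set to iterate keeps the cost at expected O(min(|a|, |b|)) per pair of items.

```java
import java.util.*;

public class OverlapExample {
    // Counts the strings present in both sets, iterating over the smaller one.
    static int overlap(Set<String> a, Set<String> b) {
        Set<String> small = a.size() <= b.size() ? a : b;
        Set<String> large = (small == a) ? b : a;
        int count = 0;
        for (String s : small) {
            if (large.contains(s)) { // expected O(1) for a HashSet
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Set<String> oneItemValues = new HashSet<>(Arrays.asList("gkejgueieriug", "abc", "xyz"));
        List<Set<String>> allItemValues = Arrays.asList(
                new HashSet<>(Arrays.asList("abc", "xyz", "qqq")),
                new HashSet<>(Arrays.asList("zzz")));
        for (Set<String> values : allItemValues) {
            System.out.println(overlap(values, oneItemValues)); // 2, then 0
        }
    }
}
```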
If the comparison is only one way (i.e. only counting strings in one list that are also in another but NOT the other way around) then the best way of doing it would probably be to put both lists in a Set instead:
Set<String> firstSet = ...;
Set<String> secondSet = ...;
for (String value : firstSet) {
    if (secondSet.contains(value)) {
        // Do what you want with the value.
        // Suggestion: add the value to a separate set
        // so you can track duplicates etc.
    }
}
With this code you create a list of Maps, one per item, mapping each of the item's string values to its number of matches in OneItem.Values. Note that Collections.frequency scans all of OneItem.Values on every call.
List<Map<String, Integer>> matches = new ArrayList<>();
for (Item i : AllItems) {
    Map<String, Integer> map = new HashMap<>();
    for (String s : i.Values) {
        map.put(s, Collections.frequency(OneItem.Values, s));
    }
    matches.add(map);
}

Looking for an efficient way to find a sorted order from 2 lists

I have two sets of unsorted integers: set A and set B. We don't know in advance how many items are in setB.
I need to:
while setA and setB are not empty:
    pop the smallest number from setA
    move an int from setB to setA
What is the most efficient way to do that in Java?
I am thinking of the following:
create an ArrayList for setA and a LinkedList for setB
while (setA and setB are not empty)
    sort(setA)
    pop setA
    remove an integer from setB and insert it in setA
Is there a better way to do this in Java? I would like to remove the 'sort in the while loop' if possible.
TreeSet<Integer> setA = new TreeSet<Integer>(listA);
TreeSet<Integer> setB = new TreeSet<Integer>(listB);
while (!setA.isEmpty()) {
    setA.remove(setA.first());
    if (!setB.isEmpty()) {
        Integer first = setB.first();
        setB.remove(first);
        setA.add(first);
    }
}
Explanation: the TreeSet class maintains the set in a red-black tree that is ordered on the natural ordering of the set elements; i.e. the Integer.compareTo() method in this case. Adding or removing an element finds the appropriate place in the tree for the element, and then adds or removes it without the need to sort.
The isEmpty method is O(1), and the first, add and remove methods are all O(log N), where each has to be called O(N) times. Creating the initial treesets is also O(N log N). So the overall complexity is going to be O(N log N) where N is the total list size.
Here's what I understood from the question: we need one sorted collection that contains all elements from both setA and setB. Because setA and setB may contain equal items, we should use a list (which preserves duplicates). If we don't want duplicates, just replace the ArrayList with a TreeSet and remove the extra sorting.
I assume that both sets contain the same type of elements; if not, we can still use Collections.sort, but we have to pass it a Comparator that is capable of comparing the different types of objects.
private Collection<Integer> combineAndSort(Set<Integer>... sets) {
    List<Integer> sorted = new ArrayList<Integer>();
    // Collection<Integer> sorted = new TreeSet<Integer>(); // alternative without duplicates
    for (Set<Integer> set : sets) {
        sorted.addAll(set);
    }
    Collections.sort(sorted); // obsolete if using TreeSet
    return sorted;
}
If you are using Java 5 onwards, consider java.util.PriorityQueue:
Collection<Integer> setA = ...;
Collection<Integer> setB = ...;
if (setA.isEmpty()) {
    // if A is empty, no need to go on.
    return;
}
PriorityQueue<Integer> pq = new PriorityQueue<Integer>(setA);
Iterator<Integer> iterB = new LinkedList<Integer>(setB).iterator();
// no need to check if A is empty anymore: starting off non-empty,
// and for each element we remove, we move one over from B.
while (iterB.hasNext()) {
    int smallest = pq.poll();
    doStuffWithSmallest(smallest);
    pq.add(iterB.next());
    iterB.remove();
}
Note that in the above code, I first copy setB into a LinkedList so that removal through its iterator is efficient. If your setB already supports efficient removal, you don't need to copy it.
Also note that since it doesn't matter which element of setB gets moved, an ArrayList supports efficient removal too, as long as you always take the last element:
private <T> T removeOne(ArrayList<T> array) throws IndexOutOfBoundsException {
return array.remove(array.size() - 1);
}
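A self-contained sketch combining the PriorityQueue approach with removal from the end of an ArrayList. The class name and run are hypothetical, and the returned list of popped values stands in for the answer's doStuffWithSmallest; the loop checks both collections for emptiness, as in the question's pseudocode.

```java
import java.util.*;

public class MoveSmallest {
    // While both collections are non-empty: pop the smallest element of A,
    // then move one element of B into A. Returns the popped values in order.
    static List<Integer> run(Collection<Integer> setA, Collection<Integer> setB) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(setA); // binary heap: poll() is O(log n)
        List<Integer> b = new ArrayList<>(setB);
        List<Integer> popped = new ArrayList<>();
        while (!pq.isEmpty() && !b.isEmpty()) {
            popped.add(pq.poll());
            // Removing the last element of an ArrayList is O(1): nothing is shifted.
            pq.add(b.remove(b.size() - 1));
        }
        return popped;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(5, 3), Arrays.asList(4, 1))); // [3, 1]
    }
}
```

Because the moved element can be smaller than everything left in A, the popped values are not necessarily in ascending order; that matches the question's loop rather than a full merge-sort.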
