Finding and printing duplicates in a string Arraylist [duplicate]

This question already has answers here:
Java: Detect duplicates in ArrayList?
(17 answers)
Closed 6 years ago.
Would anyone know the most efficient way to look for duplicates in a String ArrayList and print out the duplicates?
For example I have an ArrayList containing the following:
ArrayList<String> carLIST = new ArrayList<>(Arrays.asList("Car1", "Car2", "Car3", "Car1", "Car2", "Car2"));
Basically, I want to find anything that appears in the list more than once (which I think I've done below) and print it with System.out.println() so that it shows the following:
Car1 : count=2
Car2 : count=3
Map<String, Integer> repeatationMap = new HashMap<String, Integer>();
for (String str : carLIST) {
    if (repeatationMap.containsKey(str)) {
        repeatationMap.put(str, repeatationMap.get(str) + 1);
    }
    else {
        repeatationMap.put(str, 1);
    }
    // if (repeatationMap.get(str) > 1) {
    //     System.out.println(repeatationMap.entrySet());
    // }
}
The code commented out is what I thought it would be to print out the duplicates but I'm seriously wrong! Have no idea how to print out the duplicate cars in the list and show its count.

Once you're done populating the map, you can iterate over it and print only the entries with values greater than 1:
for (Map.Entry<String, Integer> e : repeatationMap.entrySet()) {
    if (e.getValue() > 1) {
        System.out.println(e.getKey() + " : count=" + e.getValue());
    }
}
Note, BTW, that Java 8 allows you to do the entire counting and reduction flow in a single statement in a relatively elegant fashion:
List<String> duplicateCars =
carLIST.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.entrySet()
.stream()
.filter(e -> e.getValue() > 1)
.map(Map.Entry::getKey)
.collect(Collectors.toList());
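For the exact output format asked for in the question, a minimal sketch might look like the following. It assumes Java 8 or later; the class name DuplicateCounter and the method countDuplicates are made up for illustration. It counts with Map.merge and uses a LinkedHashMap so that entries come out in first-seen order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateCounter {

    // Returns only the values that occur more than once, with their counts,
    // in the order in which they first appear in the input.
    static Map<String, Integer> countDuplicates(List<String> items) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String item : items) {
            counts.merge(item, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        counts.values().removeIf(count -> count < 2); // drop values seen only once
        return counts;
    }

    public static void main(String[] args) {
        List<String> carList = new ArrayList<>(
                Arrays.asList("Car1", "Car2", "Car3", "Car1", "Car2", "Car2"));
        countDuplicates(carList).forEach(
                (car, count) -> System.out.println(car + " : count=" + count));
    }
}
```

With the sample list from the question this prints "Car1 : count=2" followed by "Car2 : count=3".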

The commented-out section prints the whole entry set every time a count goes above 1, not just the current entry. To print only the current count, use:
System.out.println(repeatationMap.get(str));
Also, to avoid printing the same string several times, keep a Set of the strings that have already been printed. That way you avoid looping through the map again.
If you want to show the final number of occurrences, though, you will need a second loop anyway.

Related

Iterate over the "entrySet" instead of "keySet" [duplicate]

This question already has answers here:
Iterating over Key Set vs Iterating over Entry Set
(2 answers)
FindBugs warning: Inefficient use of keySet iterator instead of entrySet iterator
(5 answers)
Closed 28 days ago.
I'm scanning my code through SonarQube and it shows this code smell "Iterate over the "entrySet" instead of "keySet"". I tried, but I can't figure it out.
Sample code:
public Set<Date> getAccountShiftDate(Map<String, Set<String>> shiftDatesMap, List<Groups> shiftSchedule) {
// set that has the account shift dates
Set<Date> accountShiftDatesTemplate = new HashSet<>();
// iterate on accounts
for (String accounts : shiftDatesMap.keySet()) {
//get group of an account
Optional <Groups> shiftOptional = shiftList
.stream()
.filter(g -> StringUtils.equalsIgnoreCase(accounts,g.getLongName()))
.findFirst();
// ...
Can someone give me a reference to understand this?

If you iterate over shiftDatesMap.keySet() and then call shiftDatesMap.get(accounts) inside the loop, you perform an unnecessary lookup for each entry.
Try to read and understand the description of the mentioned code smell in SonarQube.
Instead, use shiftDatesMap.entrySet(), which gives you an Entry, i.e. a pair of the key and the value. If you later need the value for the current accounts key inside the loop, you read it from the entry, which is cheaper than calling get on the map for every key.
Before:
for (String accounts : shiftDatesMap.keySet()) {
Set<String> shiftDates = shiftDatesMap.get(accounts); // unnecessary operation
}
After:
for (Map.Entry<String, Set<String>> entry: shiftDatesMap.entrySet()) {
String accounts = entry.getKey();
Set<String> shiftDates = entry.getValue(); // access value for free without 'get' on map
}
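As a self-contained illustration of the "After" form, here is a small sketch. The class EntrySetDemo, the method summarize, and the sample account names and dates are all invented for the example; only the Map<String, Set<String>> shape comes from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class EntrySetDemo {

    // Builds one summary line per account, reading key and value from the
    // same Map.Entry instead of calling map.get(key) inside the loop.
    static List<String> summarize(Map<String, Set<String>> shiftDatesMap) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : shiftDatesMap.entrySet()) {
            String account = entry.getKey();
            Set<String> shiftDates = entry.getValue();
            lines.add(account + " -> " + shiftDates.size() + " shift date(s)");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        Map<String, Set<String>> shiftDatesMap = new LinkedHashMap<>();
        shiftDatesMap.put("acct-1", new LinkedHashSet<>(Arrays.asList("2020-01-01", "2020-01-02")));
        shiftDatesMap.put("acct-2", new LinkedHashSet<>(Arrays.asList("2020-02-01")));
        summarize(shiftDatesMap).forEach(System.out::println);
    }
}
```

If the loop body only ever needs the key, as in the truncated snippet from the question, iterating over keySet() is fine; the warning is about the key-then-get pattern.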

Java 8 nested streams - convert chained for loops

I'm currently playing around with Java 8 features.
I have the following piece of code, and tried out multiple ways to use Streams, but without success.
for (CheckBox checkBox : checkBoxList) {
for (String buttonFunction : buttonFunctionsList) {
if (checkBox.getId().equals(buttonFunction)) {
associatedCheckBoxList.add(checkBox);
}
}
}
I tried the following, but I am not sure whether it is correct:
checkBoxList.forEach(checkBox -> {
buttonFunctionsList.forEach(buttonFunction -> {
if (checkBox.getId().equals(buttonFunction))
associatedCheckBoxList.add(checkBox);
});
});
Thanks!

Eran's answer is probably fine; but since buttonFunctionsList is (presumably) a List, it may contain duplicate elements, in which case the original code would add the same checkbox to the associated list multiple times.
So here is an alternative approach: you are adding each checkbox to the list as many times as its id occurs in the other list.
As such, you can write the inner loop as:
int n = Collections.frequency(buttonFunctionsList, checkBox.getId());
associatedCheckBoxList.addAll(Collections.nCopies(n, checkBox));
Thus, you can write this as:
List<CheckBox> associatedCheckBoxList =
    checkBoxList.stream()
        .flatMap(cb -> nCopies(frequency(buttonFunctionsList, cb.getId()), cb).stream())
        .collect(toList());
(Using static imports for brevity)
If either checkBoxList or buttonFunctionsList is large, you might want to consider computing the frequencies once:
Map<String, Long> frequencies = buttonFunctionsList.stream().collect(groupingBy(k -> k, counting()));
Then you can just use this in the lambda as the n parameter of nCopies:
frequencies.getOrDefault(cb.getId(), 0L).intValue()
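Putting the pieces together, a runnable sketch could look like this. The Box class is a hypothetical stand-in for CheckBox (only getId() is assumed), and the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FrequencyJoin {

    // Hypothetical stand-in for the CheckBox type in the question.
    static final class Box {
        private final String id;
        Box(String id) { this.id = id; }
        String getId() { return id; }
    }

    // Repeats each box once per occurrence of its id in buttonFunctions,
    // which matches what the original nested loops produce.
    static List<Box> associated(List<Box> boxes, List<String> buttonFunctions) {
        // Count every id once up front instead of rescanning the list per box.
        Map<String, Long> frequencies = buttonFunctions.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return boxes.stream()
                .flatMap(box -> Collections.nCopies(
                        frequencies.getOrDefault(box.getId(), 0L).intValue(), box).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Box> boxes = Arrays.asList(new Box("save"), new Box("load"), new Box("quit"));
        List<String> buttonFunctions = Arrays.asList("save", "quit", "save");
        associated(boxes, buttonFunctions).forEach(box -> System.out.println(box.getId()));
    }
}
```

For this sample input the result contains "save" twice and "quit" once, in that order, because "save" appears twice in the button function list.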

You should prefer collect over forEach when your goal is to produce some output Collection:
List<CheckBox> associatedCheckBoxList =
checkBoxList.stream()
.filter(cb -> buttonFunctionsList.stream().anyMatch(bf -> cb.getId().equals(bf)))
.collect(Collectors.toList());
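If buttonFunctionsList is large, the anyMatch scan runs once per checkbox. A possible variation, sketched here under the assumption that each checkbox should appear at most once in the result, copies the ids into a HashSet first. Box is again a hypothetical stand-in for CheckBox, and the other names are invented.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SetFilter {

    // Hypothetical stand-in for the CheckBox type in the question.
    static final class Box {
        private final String id;
        Box(String id) { this.id = id; }
        String getId() { return id; }
    }

    // Keeps every box whose id occurs in buttonFunctions, each box at most once.
    static List<Box> associated(List<Box> boxes, List<String> buttonFunctions) {
        Set<String> functionIds = new HashSet<>(buttonFunctions); // constant-time lookups
        return boxes.stream()
                .filter(box -> functionIds.contains(box.getId()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Box> boxes = Arrays.asList(new Box("save"), new Box("load"), new Box("quit"));
        List<String> buttonFunctions = Arrays.asList("save", "quit", "save");
        associated(boxes, buttonFunctions).forEach(box -> System.out.println(box.getId()));
    }
}
```

Unlike the nested loops in the question, this adds "save" only once even though it is listed twice.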

java 8, most efficient method to return duplicates from a list (not remove them)? [duplicate]

This question already has answers here:
How to select duplicate values from a list in java?
(13 answers)
Closed 5 years ago.
I have an ArrayList of Strings, and I want to find and return all values which exist more than once in the list. Most cases are looking for the opposite (removing the duplicate items like distinct()), and so example code is hard to come by.
I was able to come up with this:
public synchronized List<String> listMatching(List<String> allStrings) {
long startTime = System.currentTimeMillis();
List<String> duplicates = allStrings.stream().filter(string -> Collections.frequency(allStrings, string) > 1)
.collect(Collectors.toList());
long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
LOG.info("Time for Collections.frequency(): "+ elapsedTime);
return duplicates;
}
But this uses Collections.frequency, which loops through the whole list for each item and counts every occurrence. This takes about 150ms to run on my current list of about 4,000 strings.
This is a bit slow for me and will only get worse as the list size increases. I took the frequency method and rewrote it to return immediately on the 2nd occurrence:
protected boolean moreThanOne(Collection<?> c, Object o) {
boolean found = false;
if (o != null) {
for (Object e : c) {
if (o.equals(e)) {
if (found) {
return found;
} else {
found = true;
}
}
}
}
return found;
}
and changed my method to use it:
public synchronized List<String> listMatching(List<String> allStrings) {
long startTime = System.currentTimeMillis();
List<String> duplicates = allStrings.stream().filter(string -> moreThanOne(allStrings, string))
.collect(Collectors.toList());
long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
LOG.info("Time for moreThanOne(): "+ elapsedTime);
return duplicates;
}
This seems to work as expected, but does not really increase the speed as much as I was hoping, clocking in at about 120ms. This is probably due to it also needing to loop through the whole list for each item, but I am not sure how to avoid that and still accomplish the task.
I know this might seem like premature optimization, but my List can easily be 1mil+, and this method is a critical piece of my app that influences the timing of everything else.
Do you see any way that I could further optimize this code? Perhaps using some sort of fancy Predicate? An entirely different approach?
EDIT:
Thanks to all your suggestions, I was able to come up with something significantly faster:
public synchronized Set<String> listMatching(List<String> allStrings) {
Set<String> allItems = new HashSet<>();
Set<String> duplicates = allStrings.stream()
.filter(string -> !allItems.add(string))
.collect(Collectors.toSet());
return duplicates;
}
Running under the same conditions, this is able to go through my list in <5ms.
All the HashMap suggestions would have been great though, if I had needed to know the counts. Not sure why the Collections.frequency() method doesn't use that technique.

An easy way to find duplicates is to iterate over the list and add each item to a temporary set with add(). It returns false if the item already exists in the set.
public synchronized Set<String> listMatching(List<String> allStrings) {
    Set<String> tempSet = new HashSet<>();
    Set<String> duplicates = new HashSet<>();
    allStrings.forEach(item -> {
        if (!tempSet.add(item)) duplicates.add(item);
    });
    return duplicates;
}

A good way to make this really scalable is to build a Map that contains the count of each string. To build the map, you will look up each string in your list. If the string is not yet in the map, put the string and a count of one in the map. If the string is found in the map, increment the count.
You probably want to use some type that allows you to increment the count in-place, rather than having to put() the updated count each time. For example, you can use an int[] with one element.
The other advantage of not re-putting counts is that it is easy to execute in parallel, because you can synchronize on the object that contains your count when you want to read/write the count.
The non-parallel code might look something like this:
Map<String, int[]> map = new HashMap<>(listOfStrings.size());
for (String s: listOfStrings) {
int[] curCount = map.get(s);
if (curCount == null) {
curCount = new int[1];
curCount[0] = 1;
map.put(s, curCount);
} else {
curCount[0]++;
}
}
Then you can iterate over the map entries and do the right thing based on the count of each string.
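The last step, reading the duplicates back out of the map, could be sketched as follows. The class and method names are invented; computeIfAbsent is used as a compact Java 8 equivalent of the null check above, and a LinkedHashMap is an assumption made here so that results come out in first-seen order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CounterMapDuplicates {

    // Returns each string that occurs more than once, in first-seen order.
    static List<String> duplicates(List<String> listOfStrings) {
        // A one-element int[] serves as a mutable counter stored in the map.
        Map<String, int[]> counts = new LinkedHashMap<>();
        for (String s : listOfStrings) {
            counts.computeIfAbsent(s, key -> new int[1])[0]++;
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, int[]> entry : counts.entrySet()) {
            if (entry.getValue()[0] > 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(duplicates(Arrays.asList("a", "b", "a", "c", "b", "a")));
    }
}
```

Both passes are linear, so the whole thing stays O(n) in the length of the list.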

The best data structure for this is a Set<String>.
Add all elements from the list to the set.
Then traverse the list and remove each element from the set.
If an element is no longer in the set, it is a duplicate in the list (because it was already removed).
This takes O(n) + O(n).
Code:
List<String> list = new ArrayList<>();
List<String> duplicates = new ArrayList<>();
list.add("luna");
list.add("mirana");
list.add("mirana");
list.add("mirana");
Set<String> set = new HashSet<>();
set.addAll(list);
for(String a:list){
if(set.contains(a)){
set.remove(a);
}else{
duplicates.add(a);
}
}
System.out.println(duplicates);
Output
[mirana, mirana]

Iterating and comparing big data set

Basically, I receive two big lists of data from two different databases. The lists look like this:
List 1:
=============
A000001
A000002
A000003
.
.
A999999
List 2:
=============
121111
000111
000003
000001
.
.
I need to compare the two lists and find every item in List 1 that is also in List 2 (after prepending a standard key to the List 2 value), and if it is, put it in a third list for further manipulation. For example, A000001 is in List 1 and also in List 2 (as 000001, once the key is prepended), so I need to put it in the third list.
This is the code I have: for each row in List 1, I iterate through all the data in List 2 and compare. (Both are ArrayLists.)
List<String> list1 = //Data of list 1 from db
List<String> list2 = //Data of list 2 from db
for(String list1Item:list1) {
for(String list2Item:list2) {
String list2ItemAfterAppend = "A" + list2Item;
if(list1Item.equalsIgnoreCase(list2ItemAfterAppend)) {
//Add it to 3rd list
}
}
}
Yes, this logic works fine, but I feel this is not an efficient way to iterate over the lists. After adding timers, it takes 13444 milliseconds on average for 2000x5000 lists of data. Is there any other logic you can think of or suggest to improve the performance of this code?
I hope I'm clear; if not, please let me know how I can improve the question.

You can sort both lists and then walk through both with a single loop, advancing whichever index points at the smaller value. Something like:
Collections.sort(list1);
Collections.sort(list2);
int index1 = 0;
int index2 = 0;
while (index1 < list1.size() && index2 < list2.size()) {
    String val1 = list1.get(index1);
    String val2 = "A" + list2.get(index2);
    int compare = val1.compareTo(val2);
    if (compare == 0) {
        list3.add(val1);
        index1++;
        index2++;
    } else if (compare > 0) {
        index2++; // val2 is smaller, so advance list2
    } else { // compare < 0
        index1++; // val1 is smaller, so advance list1
    }
}
Be careful about what kind of List you're using. get(int i) is expensive on a LinkedList, whereas it is cheap on an ArrayList. Also, you might want to store list1.size() and list2.size() in local variables; I don't think the size is recomputed on every call, but check it. I'm not sure it's really useful, but you can create list3 with an initial capacity equal to the size of the smaller list, so list3 doesn't have to resize as it grows.
The code above is not tested, but you get the idea. I believe it's faster than yours, because the merge pass is O(n+m) rather than O(n*m), plus the cost of the two sorts, O(n log n + m log m); I'll let you measure it (both sort() and compareTo() add some time compared to your method, but normally it shouldn't be too much). If you can, have your RDBMS sort both lists when you fetch them, so you don't have to do it in the Java code.
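A self-contained version of the sorted two-pointer walk might look like this. The names are invented, the prefix is passed in as a parameter, and the sketch assumes a case-sensitive comparison (unlike equalsIgnoreCase in the question) and no repeated values within either list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortedMergeMatch {

    // Returns the items of list1 that equal prefix + some item of list2.
    static List<String> matches(List<String> list1, List<String> list2, String prefix) {
        // Sort copies so the caller's lists are left untouched.
        List<String> sorted1 = new ArrayList<>(list1);
        List<String> sorted2 = new ArrayList<>(list2);
        Collections.sort(sorted1);
        Collections.sort(sorted2); // a shared prefix does not change the relative order

        List<String> result = new ArrayList<>(Math.min(sorted1.size(), sorted2.size()));
        int i = 0;
        int j = 0;
        while (i < sorted1.size() && j < sorted2.size()) {
            String val1 = sorted1.get(i);
            String val2 = prefix + sorted2.get(j);
            int compare = val1.compareTo(val2);
            if (compare == 0) {
                result.add(val1);
                i++;
                j++;
            } else if (compare > 0) {
                j++; // val2 is smaller, advance the second list
            } else {
                i++; // val1 is smaller, advance the first list
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("A000001", "A000002", "A000003");
        List<String> list2 = Arrays.asList("121111", "000111", "000003", "000001");
        System.out.println(matches(list1, list2, "A"));
    }
}
```

With the sample values from the question, the matches are A000001 and A000003.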

I think it depends on how big the lists are and how much memory you have.
For under 1 million records, I would use a HashSet to make it faster.
The code might look like this:
Set<String> set1 = //Data of list 1 from db; build a Set instead of a List when you fetch it. A HashSet is enough.
List<String> list2 = //Data of list 2 from db
Then you just need to:
for (String list2Item : list2) {
    if (set1.contains("A" + list2Item)) {
        // Add it to 3rd list
    }
}
Hope this helps.
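As a runnable sketch of the same idea, with invented class and method names and the sample values from the question (note that HashSet lookups are case-sensitive, unlike equalsIgnoreCase):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashSetMatch {

    // Returns prefix + item for every item of list2 whose prefixed form is in list1.
    static List<String> matches(List<String> list1, List<String> list2, String prefix) {
        Set<String> set1 = new HashSet<>(list1); // one pass to build, then O(1) lookups
        List<String> result = new ArrayList<>();
        for (String item : list2) {
            String candidate = prefix + item;
            if (set1.contains(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("A000001", "A000002", "A000003");
        List<String> list2 = Arrays.asList("121111", "000111", "000003", "000001");
        System.out.println(matches(list1, list2, "A"));
    }
}
```

The result follows the order of list2, so here it contains A000003 and then A000001.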

You can use the intersection method from Apache Commons Collections. Example:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import org.apache.commons.collections4.CollectionUtils;
public class NewClass {
public static void main(String[] args) {
List<String> list1 = Arrays.asList("A000001","A000002","A000003");
List<String> list2 = Arrays.asList("121111","000111","000001");
List<String> list3 = new ArrayList<>();
list2.stream().forEach((s) -> {list3.add("A"+s);});
Collection<String> common = CollectionUtils.intersection(list1, list3);
}
}

You could try the Stream API for this. The code to create the new list with streams is concise and straightforward, and probably performs about the same as the nested loops:
List<String> list3 = list2.stream()
.map(s->"A"+s)
.filter(list1::contains)
.collect(Collectors.toList());
If the lists are big, you could try to process them in parallel with multiple threads. This may or may not improve performance, so it is important to measure whether parallel processing actually helps.
To process the stream in parallel, you only need to call the method parallel on the stream:
List<String> list3 = list2.stream()
.parallel()
.map(s->"A"+s)
.filter(list1::contains)
.collect(Collectors.toList());

Your code does a lot of String manipulation: equalsIgnoreCase converts characters to upper/lower case as it compares. This runs in your inner loop, and with lists of 5000 and 2000 items the comparison is executed millions of times.
Ideally, get your Strings from the database in a single case, upper or lower, and avoid the conversion inside the inner loop. If this is not possible, converting the case of the Strings once at the beginning will probably improve performance.
Then you could create a new list from one of the lists and retain only the elements present in the other. With the uppercase conversion, the code could be:
list1.replaceAll(String::toUpperCase);
List<String> list3 = new ArrayList<>(list2);
list3.replaceAll(s->"A"+s.toUpperCase());
list3.retainAll(list1);
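One possible refinement, sketched here as an assumption rather than as part of the answer above: if the normalized first list is held in a HashSet, retainAll checks each element with a constant-time lookup instead of scanning a list. The names are invented, and uppercasing with Locale.ROOT is assumed to be an acceptable substitute for equalsIgnoreCase for plain ASCII keys like the ones in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CaseInsensitiveMatch {

    // Returns the uppercased prefix + item values of list2 that also occur,
    // ignoring case, in list1. The input lists are not modified.
    static List<String> matches(List<String> list1, List<String> list2, String prefix) {
        Set<String> normalized1 = new HashSet<>();
        for (String item : list1) {
            normalized1.add(item.toUpperCase(Locale.ROOT)); // normalize case once
        }
        List<String> list3 = new ArrayList<>(list2.size());
        for (String item : list2) {
            list3.add((prefix + item).toUpperCase(Locale.ROOT));
        }
        list3.retainAll(normalized1); // each membership check is a hash lookup
        return list3;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("a000001", "A000002");
        List<String> list2 = Arrays.asList("000002", "000001", "999999");
        System.out.println(matches(list1, list2, "A"));
    }
}
```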

How to delete from a list, while modifying the list [duplicate]

This question already has answers here:
Calling remove in foreach loop in Java [duplicate]
(11 answers)
Closed 9 years ago.
I am trying to create a Huffman tree, and am in the middle of attempting to merge two trees. I cannot figure out how to remove a tree in my program without getting a ConcurrentModificationException, because I am iterating over a list and attempting to remove from the list at the same time.
BinaryTree<Character, Integer> t1 = null;
BinaryTree<Character, Integer> t2 = null;
BinaryTree<Character, Integer> tFinal = null;
int treeSize = TREES.size();
for (int i = 0; i < treeSize; i++) {
for (BinaryTree<Character, Integer> t : TREES) {
System.out.println("treeSize " + treeSize);
System.out.println(t.getRoot().getElement()
+ " t.getRoot().getElement()");
// here I edited the merge function in Binary Tree to set
// the new root
// to have null value for value, and itemTwo for weight
System.out.println(t.getRoot().getValue() + " weight of tree \n");
t1 = t;
TREES.remove(t);
}
for (BinaryTree<Character, Integer> t : TREES){
t2 = t;
System.out.println(t);
}
int weight = t1.getRoot().getElement() + t2.getRoot().getElement();
tFinal.merge(null, weight, t1, t2);
}

Java does not let you modify a collection directly while a for-each loop is iterating over it. You will need to use an iterator and remove through it.

If you want to modify the list while iterating over it, you need to use an Iterator.
Below are some SO questions answering this:
Iterating through a Collection, avoiding ConcurrentModificationException when removing in loop
How to modify a Collection while iterating using for-each loop without ConcurrentModificationException?

Your code doesn't compile, so we're limited in the way we can help you. But in general, the way you resolve this issue is to use an Iterator instead of a foreach loop.
For example, this gives a concurrent modification exception:
List<String> l = new ArrayList<String>(asList("a", "b", "c"));
for (String s : l) {
l.remove(s);
}
But this doesn't, and it gives you the result you'd want:
List<String> l = new ArrayList<String>(asList("a", "b", "c"));
for (Iterator<String> iterator = l.iterator(); iterator.hasNext(); ) {
String s = iterator.next();
iterator.remove();
}
System.out.println(l.size());
The latter will output "0".

A solution is to store in another list the items you want to remove, and then, after iterating, remove them.
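A minimal sketch of that approach, next to the Java 8 Collection.removeIf shortcut, might look like this. The class and method names and the "remove short words" condition are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SafeRemoval {

    // Option 1: remember what to delete while iterating, delete afterwards.
    static void removeShortWithBuffer(List<String> words, int minLength) {
        List<String> toRemove = new ArrayList<>();
        for (String word : words) {
            if (word.length() < minLength) {
                toRemove.add(word); // do not touch 'words' inside the loop
            }
        }
        words.removeAll(toRemove);
    }

    // Option 2 (Java 8+): let the collection do the removal itself.
    static void removeShortWithRemoveIf(List<String> words, int minLength) {
        words.removeIf(word -> word.length() < minLength);
    }

    public static void main(String[] args) {
        List<String> first = new ArrayList<>(Arrays.asList("a", "bbb", "cc", "dddd"));
        removeShortWithBuffer(first, 3);
        System.out.println(first);

        List<String> second = new ArrayList<>(Arrays.asList("a", "bbb", "cc", "dddd"));
        removeShortWithRemoveIf(second, 3);
        System.out.println(second);
    }
}
```

Both calls leave only "bbb" and "dddd" in the list. The lists must be modifiable; a list returned directly by Arrays.asList would throw UnsupportedOperationException on removal.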
