Removing duplicate elements & count repetitions in ArrayList

This is more difficult than I expected. I have a sorted ArrayList of Strings (words), and my task is to remove the repetitions and print out a list of each word, followed by the number of times it is repeated. After trying different things, I decided to use a HashMap to store each word as the key and its number of repetitions as the value.
This is the code. dictionary is the sorted ArrayList and repetitions is the HashMap.
public void countElements ()
{
String word=dictionary.get(0);
int wordCount=1;
int count=dictionary.size();
for (int i=0;i<count;i++)
{
word=dictionary.get(i);
for (int j=i+1; j<count;j++)
{
if(word.equals(dictionary.get(j)))
{
wordCount=wordCount+1;
repetitions.put(word, wordCount);
dictionary.remove(j--);
count--;
}
}
}
For some reason that I do not understand (I'm a beginner), after I call the dictionary.remove(j--) method, variable j decrements by 1, even though it should be i+1. What am I missing? Any ideas on how to do this properly would be appreciated. I know that it would be best to use an iterator, but that can become even more confusing.
Many thanks.

A version which uses streams:
final Map<String, Long> countMap = dictionary.stream().collect(
Collectors.groupingBy(word -> word, LinkedHashMap::new, Collectors.counting()));
System.out.println("Counts follow");
System.out.println(countMap);
System.out.println("Duplicate-free list follows");
System.out.println(countMap.keySet());
Here we group the elements of the list (using Collectors.groupingBy), with each element (i.e. each word) as a key in the resulting map, and count each word's occurrences (using Collectors.counting()).
The outer collector (groupingBy) uses the counting collector as a downstream collector that collects (here, counts) all the occurrences of a single word.
We use a LinkedHashMap to build the map because it maintains the order in which key-value pairs were added, and we want to keep the order the words had in your initial list.
And one more thing: countMap.keySet() is not a List. If you want a List in the end, it's just new ArrayList<>(countMap.keySet()).
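For reference, here is a minimal self-contained sketch of the same approach. The class name WordCountStreams, the helper countWords and the sample words are made up for illustration; only the collector calls come from the answer above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class WordCountStreams {
    // Counts each word; the LinkedHashMap keeps the words in first-seen order.
    static Map<String, Long> countWords(List<String> dictionary) {
        return dictionary.stream().collect(
                Collectors.groupingBy(word -> word, LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical sample data, already sorted like the list in the question.
        List<String> dictionary = List.of("asd", "hello", "hello", "qwet", "qwet");
        Map<String, Long> countMap = countWords(dictionary);
        System.out.println(countMap); // {asd=1, hello=2, qwet=2}

        // Duplicate-free list built from the key set.
        List<String> unique = new ArrayList<>(countMap.keySet());
        System.out.println(unique); // [asd, hello, qwet]
    }
}
```

Note that the counts are Long values, because that is what Collectors.counting() produces.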

This code will serve your purpose. Afterwards, dictionary contains only the unique words and the HashMap contains the frequency count of each word.
public class newq {
public static void main(String[] args)
{
ArrayList<String> dictionary=new ArrayList<String>();
dictionary.add("hello");
dictionary.add("hello");
dictionary.add("asd");
dictionary.add("qwet");
dictionary.add("qwet");
HashMap<String,Integer> hs=new HashMap<String,Integer>();
int i=0;
while(i<dictionary.size())
{
String word=dictionary.get(i);
if(hs.containsKey(word)) // check if word repeated
{
hs.put(word, hs.get(word)+1); //if repeated increase the count
dictionary.remove(i); // remove the word
}
else
{
hs.put(word, 1); //not repeated
i++;
}
}
for (HashMap.Entry<String, Integer> pair : hs.entrySet()) // typed entries, so no casts are needed
{
System.out.println(pair.getKey() + " = " + pair.getValue()); // print without emptying the map
}
for(String word: dictionary)
{
System.out.println(word);
}
}
}

If you don't want 'j' to change, use the expression j-1 instead: it computes a value without modifying the variable.
Using j--, --j, j++, or ++j changes the value of the variable itself.
This link has a good explanation and simple examples of post- and pre-incrementing.
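To make the difference concrete, here is a small sketch that uses both forms on a sorted list. The class and method names are invented for this example, and it only removes adjacent duplicates (it does not count them). list.get(j - 1) reads the previous element without touching j, while list.remove(j--) removes index j and then steps j back, so the element that slides into that position is checked on the next pass.

```java
import java.util.ArrayList;
import java.util.List;

class PostDecrementDemo {
    // Removes adjacent duplicates; intended for a list that is already sorted.
    static List<String> removeAdjacentDuplicates(List<String> sorted) {
        List<String> list = new ArrayList<>(sorted); // work on a mutable copy
        for (int j = 1; j < list.size(); j++) {
            // j - 1 is just an expression: j itself is not modified here.
            if (list.get(j).equals(list.get(j - 1))) {
                // j-- passes the old value of j to remove(), then decrements j,
                // so the loop's j++ lands on the same index again.
                list.remove(j--);
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeAdjacentDuplicates(List.of("a", "a", "a", "b"))); // [a, b]
    }
}
```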

Related

ConcurrentModificationException during putting new element into HashMap

I have some code:
Map<String, Integer> letters = new HashMap<String, Integer>();
letters.put(String.valueOf(input.charAt(0)),
numberOfLettersInWord(input,input.charAt(0)));
for (int i = 0; i < input.length(); i++) {
for (String key : letters.keySet()) {
if (!letters.containsKey(String.valueOf(input.charAt(i)))) {
letters.put(String.valueOf(input.charAt(i)),
numberOfLettersInWord(input,input.charAt(i)));
} else continue;
System.out.println(letters);
}
System.out.println(1);
}
System.out.println(2);
The main idea of the code: there is some word in String input (not empty, not null, with no non-letter symbols), and I need to count how many times each letter occurs in it. The counting works fine (in the numberOfLettersInWord method), but when I try to add the letters and their counts to a HashMap<String, Integer>, a problem occurs: it adds all letters and their counts correctly, but then an error pops up. For this code the output is:
1
1
{a=4, b=4}
1
1
1
1
{a=4, b=4, c=3}
Exception in thread "main" java.util.ConcurrentModificationException
at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1579)
at java.base/java.util.HashMap$KeyIterator.next(HashMap.java:1602)
at LetterCounter.count(LetterCounter.java:25)
at LetterCounter.main(LetterCounter.java:11)
Process finished with exit code 1
From what I can see, something goes wrong when there are no new letters to be added. Can you explain why this happens and how to solve it?
There should be some more digit outputs after {a=4, b=4, c=3} is shown, but the run ends with the exception (the digits are not really necessary, just an indicator of where it stops working).
The word used in this run was String input = "aabbabcccba";
numberOfLettersInWord returns an Integer value of how many times the letter input.charAt(i) was found in the word input (this works fine).
Line 2 of the code fragment is there just to make the HashMap contain at least one entry (null and empty checks are already done by this point and work well).
I saw that people had problems with hashmap.remove() in "Why is a ConcurrentModificationException thrown and how to debug it", but I am not sure this is the same thing and can be solved with that answer.
I am also not sure whether the answer in "ConcurrentModificationException - HashMap" is applicable to my case.

ok, i think i solved it:
Map<String, Integer> letters = new HashMap<String, Integer>();
letters.put(String.valueOf(input.charAt(0)),numberOfLettersInWord(input,input.charAt(0)));
for(int i = 0; i < input.length(); i++) {
letters.putIfAbsent(String.valueOf(input.charAt(i)),numberOfLettersInWord(input,input.charAt(i)));
}
I removed some extra code and it started working; all tests pass now.

Why the ConcurrentModificationException?
You're getting a ConcurrentModificationException because you are structurally modifying the map while iterating its key set.
Documentation
Here's what the documentation of HashMap says on the subject:
The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Those "collection view methods" it mentions are the following:
HashMap#keySet(), which returns a Set<K>.
HashMap#values(), which returns a Collection<V>.
HashMap#entrySet(), which returns a Set<Map.Entry<K, V>>.
For-Each Loops
If you aren't aware, a for-each loop uses an Iterator behind the scenes. In other words, something like this:
List<String> list = List.of("one", "two", "three");
for (String element : list) {
System.out.println(element);
}
Is compiled down to:
List<String> list = List.of("one", "two", "three");
for (Iterator<String> iterator = list.iterator(); iterator.hasNext(); ) {
String element = iterator.next();
System.out.println(element);
}
Your Code
You have a for-each loop iterating over the key set of your map. Inside this for-each loop you have a call to put, which is a structurally-modifying operation, on the same map.
for (String key : letters.keySet()) {
if (!letters.containsKey(String.valueOf(input.charAt(i)))) {
letters.put(String.valueOf(input.charAt(i)),
numberOfLettersInWord(input,input.charAt(i)));
} else continue;
System.out.println(letters);
}
Thus, a ConcurrentModificationException is likely to be thrown. In your case it's all but guaranteed.
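A stripped-down reproduction may help. This is only a sketch: the class and method names are invented, and the map contents are arbitrary. The map starts with two keys so that the iterator still has an element left to fetch after the put.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class FailFastDemo {
    // Returns true if adding a key during key-set iteration trips the fail-fast check.
    static boolean putWhileIterating() {
        Map<String, Integer> letters = new HashMap<>();
        letters.put("a", 1);
        letters.put("b", 2);
        try {
            for (String key : letters.keySet()) {
                // Structural modification: a new key is added mid-iteration.
                letters.put("c", 3);
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(putWhileIterating()); // true
    }
}
```

The check runs inside the iterator's next(), which matches the stack trace in the question: the put itself succeeds, and the exception appears when the loop tries to advance to the next key.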
Solution
You are apparently trying to count the frequencies of each letter in a string. This does not require you to loop over the key set of the map. The fact you don't actually use the key variable anywhere inside the for-each loop is a good indicator of this. This means you can simply get rid of the for-each loop and your code should work just fine.
Map<String, Integer> letters = new HashMap<String, Integer>();
letters.put(String.valueOf(input.charAt(0)), numberOfLettersInWord(input,input.charAt(0)));
for (int i = 0; i < input.length(); i++) {
if (!letters.containsKey(String.valueOf(input.charAt(i)))) {
letters.put(String.valueOf(input.charAt(i)), numberOfLettersInWord(input,input.charAt(i)));
}
}
Note that the call to put when the map does not already contain the key could be replaced with a call to computeIfAbsent. That method takes the key and a Function that computes the value if the key is not already contained in the map (or if the key is currently mapped to null). It would look something like this:
Map<String, Integer> letters = new HashMap<String, Integer>();
letters.put(String.valueOf(input.charAt(0)), numberOfLettersInWord(input,input.charAt(0)));
for (int i = 0; i < input.length(); i++) {
// the key is a one-character String, so pass its first char to numberOfLettersInWord
letters.computeIfAbsent(String.valueOf(input.charAt(i)), key -> numberOfLettersInWord(input, key.charAt(0)));
}
Note: The second argument of the above computeIfAbsent call is a Function implemented via a lambda expression.
Potential Improvements
There may be a couple of improvements you could make to your code.
Change Key Type to Character
Given you're counting the frequency of characters, it would make sense to represent that in the code by using a Map<Character, Integer> instead of a Map<String, Integer>.
Count as You Go
I can only assume that numberOfLettersInWord loops over the input string and counts how many times the given character occurs in said string. This means you loop over the string for each character in the string, resulting in an inefficient algorithm. Though you do have an optimization where you only compute the frequency of a character if you haven't already done so for that character, so that improves things a little.
However, you're already looping over all the characters in the input string. You might as well count the frequency of each character as you go. It could look something like:
String input = ...;
Map<Character, Integer> frequencies = new HashMap<>();
for (int i = 0; i < input.length(); i++) {
Character key = input.charAt(i);
Integer value = frequencies.get(key);
if (value == null) {
frequencies.put(key, 1);
} else {
frequencies.put(key, value + 1);
}
}
Use compute to Count
The body of that for loop can be replaced with a call to compute:
String input = ...;
Map<Character, Integer> frequencies = new HashMap<>();
for (int i = 0; i < input.length(); i++) {
frequencies.compute(input.charAt(i), (key, value) -> {
if (value == null) {
return 1;
} else {
return value + 1;
}
});
}
And that lambda expression (implementing a BiFunction) can be "simplified" even more:
(key, value) -> value == null ? 1 : value + 1
Use merge to Count
Another option is to use merge:
frequencies.merge(input.charAt(i), 1, Integer::sum);
Note: The Integer::sum is a method reference implementing a BiFunction.
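Put together, the merge variant could look like the sketch below. The class name LetterFrequencies and the method count are made up, and a TreeMap is used here only so the printed order is predictable; a HashMap works the same way for counting.

```java
import java.util.Map;
import java.util.TreeMap;

class LetterFrequencies {
    // Counts how often each character occurs in a single pass over the input.
    static Map<Character, Integer> count(String input) {
        Map<Character, Integer> frequencies = new TreeMap<>(); // sorted keys, for readable output
        for (int i = 0; i < input.length(); i++) {
            // Inserts 1 for a new character, otherwise adds 1 to the existing count.
            frequencies.merge(input.charAt(i), 1, Integer::sum);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        System.out.println(count("aabbabcccba")); // {a=4, b=4, c=3}
    }
}
```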

letters.keySet() is returning a set which is backed by the keys of the HashMap, which you then alter by calling put(). So the conflict here is between the keySet and the keys of the map. You would need to use an iterator, or extract the keys into a separate collection, by copying the keySet each time through the outer loop. Honestly, the algorithm is sounding kind of expensive, though I haven't really tried to work out a better approach...
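If you really do need to add entries while walking over the existing keys, copying the key set first could look roughly like this. The class, the method and what it does with the keys (adding an upper-case twin for each one) are invented purely to have something to put into the map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SnapshotIteration {
    // Adds an upper-case copy of every key while iterating over a snapshot of the keys.
    static Map<String, Integer> addUpperCaseKeys(Map<String, Integer> source) {
        Map<String, Integer> map = new HashMap<>(source);
        List<String> snapshot = new ArrayList<>(map.keySet()); // independent copy, not a view
        for (String key : snapshot) {
            // Safe: the loop iterates over the copy, not over the map's own key set.
            map.put(key.toUpperCase(), map.get(key));
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> result = addUpperCaseKeys(Map.of("a", 1, "b", 2));
        System.out.println(new TreeMap<>(result)); // {A=1, B=2, a=1, b=2}
    }
}
```

The copy costs extra time and memory on every pass, which is part of why this approach gets expensive when done inside another loop.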

How to check multiple contains operations faster?

I have a String list as below. I want to do some calculations depending on whether this list has multiple elements with the same value.
I have nearly 120k elements, and this code runs too slowly. Is there a faster approach than the contains method?
List<String> words= getWordsFromDB(); //words list has nearly 120k elements
List<String> tempWordsList = new LinkedList<String>(); //empty list
String[] keys = getKeysFromDB();
List<String> tempKeysList = new LinkedList<String>();
for (int x = 0; x < words.size(); x++) {
if (!tempWordsList.contains(words.get(x))) {
tempWordsList.add(words.get(x));
String key= keys[x];
tempKeysList.add(key);
} else {
int index = tempWordsList.indexOf(words.get(x));
String m = tempKeysList.get(index);
String n = keys[x];
if (!m.contains(n)) {
String newWord = m + ", " + n;
tempKeysList.set(index, newWord);
}
}
}
EDIT: The words list comes from a database, and the problem is that there is a service continuously updating and inserting data into this table. I don't have any access to this service, and there are other applications that use the same table.
EDIT2: I have updated the question with the full code.

You are searching the list twice per word: once for contains() and once for indexOf(). You could replace contains() with indexOf() and test the result for -1, reusing the result instead of calling indexOf() again. But you are certainly using the wrong data structure. What exactly do you need a list for? Do you need one at all? I would use a HashSet, or a HashMap if you need to associate other data with each word.
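As a rough sketch of the HashMap idea applied to the code in the question: each word maps to the set of keys seen for it, so there is no contains() or indexOf() scan. The class and method names are invented, the sample data is hypothetical, and it assumes words.get(x) is cheap (for example an ArrayList). It also compares whole keys rather than using the substring check (m.contains(n)) from the question.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GroupKeysByWord {
    // Maps each word to the distinct keys found at the same positions, in first-seen order.
    static Map<String, Set<String>> group(List<String> words, String[] keys) {
        Map<String, Set<String>> grouped = new LinkedHashMap<>();
        for (int x = 0; x < words.size(); x++) {
            // Creates the set on first sight of a word; the set ignores repeated keys.
            grouped.computeIfAbsent(words.get(x), w -> new LinkedHashSet<>()).add(keys[x]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "pear", "apple", "apple"); // hypothetical data
        String[] keys = {"k1", "k2", "k3", "k1"};
        System.out.println(group(words, keys)); // {apple=[k1, k3], pear=[k2]}
    }
}
```

If the comma-separated form is needed afterwards, String.join(", ", set) turns each set into one string.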

//1) if you can avoid using linked list use below solution
List<String> words= getWordsFromDB(); //words list has nearly 120k elements
//if you can avoid using linked list, use set instead
Set<String> set=new HashSet<>();
for (String s:words) {
if (!set.add(s)) {
//do some calculations
}
}
//2) if you can't avoid using linked list use below code
List<String> words= getWordsFromDB(); //words list has nearly 120k elements
List<String> tempList = new LinkedList<String>(); //empty list
//if you can't avoid the LinkedList (tempList), you need to use a set alongside it
Set<String> set=new HashSet<>();
for (String s:words) {
if (set.add(s)) {
tempList.add(s);
} else {
int a = tempList.indexOf(s);
//do some calculations
}
}

LinkedList.get() runs in O(N) time. Either use ArrayList with O(1) lookup time, or avoid indexed lookups altogether by using an iterator:
for (String word : words) {
if (!tempList.contains(word)) {
tempList.add(word);
} else {
int firstIndex = tempList.indexOf(word);
//do some calculations
}
}
Disclaimer: The above was written under the questionable assumption that words is a LinkedList. I would still recommend the enhanced-for loop, since it's more conventional and its time complexity is not implementation-dependent. Either way, the suggestion below still stands.
You can further improve by replacing tempList with a HashMap. This will avoid the O(N) cost of contains() and indexOf():
Map<String, Integer> indexes = new HashMap<>();
int index = 0;
for (String word : words) {
Integer firstIndex = indexes.putIfAbsent(word, index++);
if (firstIndex != null) {
//do some calculations
}
}
Based on your latest update, it looks like you're trying to group "keys" by their corresponding "word". If so, you might give streams a spin:
List<String> words = getWordsFromDB();
String[] keys = getKeysFromDB();
Collection<String> groupedKeys = IntStream.range(0, words.size())
.boxed()
.collect(Collectors.groupingBy(
words::get,
LinkedHashMap::new, // if word order is significant
Collectors.mapping(
i -> keys[i],
Collectors.joining(", "))))
.values();
However, as mentioned in the comments, it would probably be best to move this logic into your database query.

Actually, tempList uses methods with linear time complexity:
if (!tempList.contains(words.get(x))) {
and
int a = tempList.indexOf(words.get(x));
This means that on each invocation, on average half of the list is traversed.
Besides, the two calls are redundant.
Calling indexOf() alone is enough:
for (int x = 0; x < words.size(); x++) {
int indexWord = tempList.indexOf(words.get(x));
if (indexWord == -1) { // not in tempList yet
tempList.add(words.get(x));
} else {
//do some calculations by using indexWord
}
}
But to improve all accesses, you should change your structure: wrap or replace the LinkedList with a LinkedHashSet.
A LinkedHashSet would keep the current behavior because, like a List, it defines the iteration order, which is the order in which elements were inserted into the set, but it also uses hashing to speed up access to its elements.

Java: matching ArrayList strings to an iterator, and incrementing the integers of a different ArrayList at the same index

noob here, so sorry if I say anything dumb.
I'm comparing strings in an ArrayList to an iterator of strings in an iterator of Sets. When I find a match, I want to grab the index of the matched string in the ArrayList and increment that same index in a different ArrayList of integers. I have something that looks (to me) like it should work, but after this code runs, my integer ArrayList contains mostly -1 with a few 2, 1, and 0.
I'm interested in fixing my code first, but I'd also be interested in different approaches, so here's the larger picture: I have a map where the keys are usernames in a social network, and the values are sets of usernames of the people they follow. I need to return a list of all usernames in descending order of followers. In the code below I'm only trying to make an ArrayList of strings (that contains ALL the usernames in the map) that corresponds with a different ArrayList of integers, like:
usernamesList ... numberOfFollowers
theRealJoe ... 7
javaNovice ... 3
FakeTinaFey ... 3
etc
Map<String, Set<String>> map = new HashMap<String, Set<String>>();
//edit: this map is populated. It's a parameter of the method I'm trying to write.
List<String> usernamesList = new ArrayList<String>();
//populate usernamesList with all strings in map
Iterator<Set<String>> setIter = map.values().iterator();
Iterator<String> strIter;
int strIterIndex = 0;
int w = 0;
List<Integer> numOfFollowers = new ArrayList<Integer>();
//initialize all elements to 0. not sure if necessary
for (int i = 0; i < usernamesList.size(); i++) {
numOfFollowers.add(0);
}
while (setIter.hasNext()) {
Set<String> currentSetIter = setIter.next();
strIter = currentSetIter.iterator();
while (strIter.hasNext()) {
String currentstrIter = strIter.next();
if (usernamesList.contains(currentstrIter)) {
strIterIndex = usernamesList.indexOf(currentstrIter);
numOfFollowers.set(strIterIndex, numOfFollowers.indexOf(strIterIndex) +1);
w++;
System.out.println("if statement has run " + w + " times." );
} else {
throw new RuntimeException("Should always return true. all usernames from guessFollowsGraph should be in usernamesList");
}
}
}

I think everything looks OK, except this line:
numOfFollowers.set(strIterIndex, numOfFollowers.indexOf(strIterIndex) +1);
When you call numOfFollowers.indexOf, you are looking for the index of an element whose value is strIterIndex. What you want is the value (follower count) of the element at index strIterIndex:
numOfFollowers.set(strIterIndex, numOfFollowers.get(strIterIndex) +1);
I would also suggest using an int[] (array) instead of a list for the counts. It would be faster and more straightforward.
Oh, one more thing: please correct the "fake" constructors; they won't work since there is no "new" keyword after the assignment...
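Here is roughly what the int[] version might look like. The class and method names are invented, and the sample usernames are borrowed from the question with made-up follow relationships; treat it as a sketch, not a drop-in replacement.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FollowerCounts {
    // counts[i] is the number of followers of usernames.get(i).
    static int[] countFollowers(Map<String, Set<String>> follows, List<String> usernames) {
        int[] counts = new int[usernames.size()]; // int arrays start out filled with zeros
        for (Set<String> followed : follows.values()) {
            for (String name : followed) {
                int index = usernames.indexOf(name);
                if (index == -1) {
                    throw new IllegalArgumentException("Unknown username: " + name);
                }
                counts[index]++; // read-and-increment by index, no indexOf on the counts
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> follows = Map.of(
                "theRealJoe", Set.of("javaNovice"),
                "javaNovice", Set.of("theRealJoe"),
                "FakeTinaFey", Set.of("theRealJoe", "javaNovice"));
        List<String> usernames = List.of("theRealJoe", "javaNovice", "FakeTinaFey");
        System.out.println(Arrays.toString(countFollowers(follows, usernames))); // [2, 2, 0]
    }
}
```

usernames.indexOf is still a linear scan per name; for a large network a Map from username to count would avoid that.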

Fastest way to find substring in JAVA

Let's say I have a list of names.
ArrayList<String> nameslist = new ArrayList<String>();
nameslist.add("jon");
nameslist.add("david");
nameslist.add("davis");
nameslist.add("jonson");
and this list contains a few thousand names. What is the fastest way to find out how many names in this list start with a given name?
String name = "jon";
The result should be 2.
I have tried comparing every element of the list using the substring function (it works), but it is very slow, especially when the list is huge.
Thanks in advance.

You could use a TreeSet for O(log n) access and write something like:
TreeSet<String> set = new TreeSet<String>();
set.add("jon");
set.add("david");
set.add("davis");
set.add("jonson");
set.add("henry");
Set<String> subset = set.tailSet("jon");
int count = 0;
for (String s : subset) {
if (s.startsWith("jon")) count++;
else break;
}
System.out.println("count = " + count);
which prints 2 as you expect.
Alternatively, you could use Set<String> subset = set.subSet("jon", "joo"); to get the set of all names that start with "jon", but you need to give the first invalid entry that follows the jons (in this case: "joo").
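One way to avoid working out the upper bound by hand is to append the largest char value to the prefix. This is a sketch under the assumption that no name contains the character '\uffff'; the class and method names are made up. Also keep in mind that a TreeSet drops duplicate names, so the count is of distinct names.

```java
import java.util.List;
import java.util.TreeSet;

class PrefixCountTreeSet {
    // Counts the distinct names that start with the given prefix.
    static int countWithPrefix(TreeSet<String> names, String prefix) {
        // Everything from prefix (inclusive) up to prefix + '\uffff' (exclusive) starts with prefix,
        // assuming '\uffff' itself never appears in a name.
        String upperBound = prefix + Character.MAX_VALUE;
        return names.subSet(prefix, true, upperBound, false).size();
    }

    public static void main(String[] args) {
        TreeSet<String> set = new TreeSet<>(List.of("jon", "david", "davis", "jonson", "henry"));
        System.out.println(countWithPrefix(set, "jon")); // 2
    }
}
```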

Have a look at Trie. It's a data structure aimed to perform fast searches according to word prefixes. You may need to manipulate it a bit in order to get the number of leafs in the subtree, but in any case you do not traverse the entire list.
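A bare-bones sketch of such a trie follows. Instead of counting leaves in the subtree at query time, this variant stores a counter in every node while inserting, which is one possible way to do the "manipulation" mentioned above. All names here are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int wordsThroughHere; // how many inserted words pass through this node
    }

    private final Node root = new Node();

    // Adds a word and bumps the counter of every node along its path.
    void insert(String word) {
        Node node = root;
        node.wordsThroughHere++;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
            node.wordsThroughHere++;
        }
    }

    // Walks down the prefix; the counter at the last node is the answer.
    int countWithPrefix(String prefix) {
        Node node = root;
        for (int i = 0; i < prefix.length(); i++) {
            node = node.children.get(prefix.charAt(i));
            if (node == null) {
                return 0; // no inserted word has this prefix
            }
        }
        return node.wordsThroughHere;
    }

    public static void main(String[] args) {
        PrefixTrie trie = new PrefixTrie();
        for (String name : new String[] {"jon", "david", "davis", "jonson"}) {
            trie.insert(name);
        }
        System.out.println(trie.countWithPrefix("jon")); // 2
    }
}
```

A lookup then costs time proportional to the length of the prefix, independent of how many names are stored. Unlike a set, this counts duplicate names separately.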

The complexity of searching in ArrayList (or linear array) is O(n), where n is number of elements in array.
For best performance you can see Trie

Iterate on the ArrayList, for each element, check if it begins with jon. Time complexity is O(n).

What exactly does "very slow" mean?
Really the only way to do this is to loop through the list and check every element:
int count = 0;
for (String name : nameslist) {
if (name.startsWith("jon")) {
count++;
}
}
System.out.println("Found: " + count);

If the strings in your list are not too long, you can use this cheat: store all prefixes in a HashSet, and a lookup will be ~O(1):
// Preprocessing
List<String> list = Arrays.asList("hello", "world"); // Your list
Set<String> set = new HashSet<>();
for (String s : list) {
for (int i = 1; i <= s.length(); i++) {
set.add(s.substring(0, i));
}
}
// Now you want to test
assert set.contains("wor");
If they are too long for that, you can use a full-text search engine such as Apache Lucene.

I'd suggest you to create a Runnable for processing the list elements. Then you create an ExecutorService with fixed pool size, which processes the elements concurrently.
Rough example:
ExecutorService executor = Executors.newFixedThreadPool(5);
for (String str : coll){
Runnable r = new StringProcessor(str);
executor.execute(r);
}

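I suggest a TreeSet.
Access every element in a similar way and increment the count. Because a TreeSet iterates in sorted order, you can improve performance algorithmically by stopping early.
int count = 0;
Iterator<String> iter = list.iterator(); // list must iterate in sorted order, e.g. a TreeSet
String name;
while(iter.hasNext()) {
name = iter.next();
if (name.startsWith("jon")) {
count++;
}
if(name.compareTo("k") >= 0) break; // everything from "k" onwards sorts after the "jon" names
}
This break skips the remaining string comparisons.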

You can consider Boyer–Moore string search algorithm.
complexity O(n+m) worst case.

You need to iterate over each name and check whether it contains the given name.
String name = "jon";
int count=0;
for(String n:nameslist){
if(n.contains(name)){
count++;
}
}

Counting occurrences of words in an array

I've been working on something which takes a stream of characters, forms words, makes an array of the words, then creates a vector which contains each unique words and the number of times it occurs (basically a word counter).
Anyway I've not used Java in a long time, or much programming to be honest and I'm not happy with how this currently looks. The part I have which makes the vector looks ugly to me and I wanted to know if I could make it less messy.
int counter = 1;
Vector<Pair<String, Integer>> finalList = new Vector<Pair<String, Integer>>();
Pair<String, Integer> wordAndCount = new Pair<String, Integer>(wordList.get(1), counter); // wordList contains " " as first word, starting at wordList.get(1) skips it.
for(int i= 1; i<wordList.size();i++){
if(wordAndCount.getLeft().equals(wordList.get(i))){
wordAndCount = new Pair<String, Integer>(wordList.get(i), counter++);
}
else if(!wordAndCount.getLeft().equals(wordList.get(i))){
finalList.add(wordAndCount);
wordAndCount = new Pair<String, Integer>(wordList.get(i), counter=1);
}
}
finalList.add(wordAndCount); //UGLY!!
As a secondary question, this gives me a vector with all the words in alphabetical order (as in the array). I want to have it sorted by number of occurrences, then alphabetically within that.
Would the best option be:
Iterate down the vector, testing each occurrence int with the one above, using Collections.swap() if it was higher, then checking the next one above (as its now moved up 1) and so on until it's no longer larger than anything above it. Any occurrence of 1 could be skipped.
Iterate down the vector again, testing each element against the first element of the vector and then iterating downwards until the number of occurrences is lower and inserting it above that element. All occurrences of 1 would once again be skipped.
The first method would do more in terms of iterating over the elements, but the second one requires you to add and remove components of the vector (I think?), so I don't know which is more efficient, or whether it's worth considering.

Why not use a Map to solve your problem?
String[] words = ...; // your incoming array of words
Map<String, Integer> wordMap = new HashMap<String, Integer>();
for(String word : words) {
if(!wordMap.containsKey(word))
wordMap.put(word, 1);
else
wordMap.put(word, wordMap.get(word) + 1);
}
Sorting can be done using Java's sorted collections:
SortedMap<Integer, SortedSet<String>> sortedMap = new TreeMap<Integer, SortedSet<String>>();
for(Entry<String, Integer> entry : wordMap.entrySet()) {
if(!sortedMap.containsKey(entry.getValue()))
sortedMap.put(entry.getValue(), new TreeSet<String>());
sortedMap.get(entry.getValue()).add(entry.getKey());
}
Nowadays you should leave the sorting to the language's libraries. They have been proven correct over the years.
Note that the code may use a lot of memory because of all the data structures involved, but that is the price we pay for higher-level programming (and memory is getting cheaper every second).
I didn't run the code to see that it works, but it does compile (copied it directly from Eclipse).

re: sorting, one option is to write a custom Comparator which first examines the number of times each word appears, then (if equal) compares the words alphabetically.
private final class PairComparator implements Comparator<Pair<String, Integer>> {
public int compare(Pair<String, Integer> p1, Pair<String, Integer> p2) {
/* compare by Integer */
/* compare by String, if necessary */
/* return a negative number, a positive number, or 0 as appropriate */
}
}
You'd then sort finalList by calling Collections.sort(finalList, new PairComparator());
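Since the Pair class from the question isn't shown, here is a sketch of the same idea using Map.Entry<String, Integer> instead. It assumes "sorted by occurrence" means highest count first; the class, field and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class SortByCount {
    // Higher counts first (an assumption); ties are broken alphabetically by word.
    static final Comparator<Map.Entry<String, Integer>> BY_COUNT_THEN_WORD = (a, b) -> {
        int byCount = Integer.compare(b.getValue(), a.getValue());
        return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    };

    // Copies the entries into a list and sorts it with the comparator above.
    static List<Map.Entry<String, Integer>> sorted(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(BY_COUNT_THEN_WORD);
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("pear", 1, "apple", 3, "fig", 3, "kiwi", 1);
        for (Map.Entry<String, Integer> entry : sorted(counts)) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
        // prints: apple 3, fig 3, kiwi 1, pear 1 (one per line)
    }
}
```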

How about using Google's Guava library?
Multiset<String> multiset = HashMultiset.create();
for (String word : words) {
multiset.add(word);
}
int countFoo = multiset.count("foo");
From their javadocs:
A collection that supports order-independent equality, like Set, but may have duplicate elements. A multiset is also sometimes called a bag.
Simple enough?
