Print only repeated words in Java

I want to display only the words that appear more than once in a string; a word that appears only once should not be printed. I also want to print only words longer than 2 characters (to eliminate is, was, the, etc.).
The code I tried prints every word together with its number of occurrences.
Code:
public static void main(String args[])
{
Map<String, Integer> wordcheck = new TreeMap<String, Integer>();
String string1="world world is new world of kingdom of palace of kings palace";
String string2[]=string1.split(" ");
for (int i=0; i<string2.length; i++)
{
String string=string2[i];
wordcheck.put(string,(wordcheck.get(string) == null?1: (wordcheck.get(string)+1)));
}
System.out.println(wordcheck);
}
Output:
{is=1, kingdom=1, kings=1, new=1, of=3, palace=2, world=3}
Words that appear only once should not be printed, and only words longer than 2 characters should be shown (to eliminate is, was, the, etc.).

Use this:
for (String key : wordcheck.keySet()) {
if(wordcheck.get(key)>1)
System.out.println(key + " " + wordcheck.get(key));
}
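The loop above filters by count only. A minimal sketch that also applies the length filter from the question, assuming words are separated by whitespace; the class name RepeatedWords and the method repeated are hypothetical names used for illustration:

```java
import java.util.Map;
import java.util.TreeMap;

class RepeatedWords {
    // Counts words longer than minLength characters, then drops every word seen only once.
    static Map<String, Integer> repeated(String text, int minLength) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.length() > minLength) {
                counts.merge(word, 1, Integer::sum); // insert 1 or add 1 to the existing count
            }
        }
        counts.values().removeIf(count -> count < 2); // keep repeated words only
        return counts;
    }

    public static void main(String[] args) {
        String text = "world world is new world of kingdom of palace of kings palace";
        System.out.println(repeated(text, 2)); // {palace=2, world=3}
    }
}
```

Note that "of" occurs three times in the sample string but is dropped because it is only 2 characters long.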

Keeping track of the number of occurrences in a map will allow you to do this.
import java.util.HashMap;
import java.util.Map.Entry;
import java.util.Set;
public class Test1
{
public static void main(String[] args)
{
String string1="world world is new world of kingdom of palace of kings palace";
String string2[]=string1.split(" ");
HashMap<String, Integer> uniques = new HashMap<String, Integer>();
for (String word : string2)
{
// ignore words 2 or less characters long
if (word.length() <= 2)
{
continue;
}
// add or update the word occurrence count
Integer existingCount = uniques.get(word);
uniques.put(word, (existingCount == null ? 1 : (existingCount + 1)));
}
Set<Entry<String, Integer>> uniqueSet = uniques.entrySet();
boolean first = true;
for (Entry<String, Integer> entry : uniqueSet)
{
if (entry.getValue() > 1)
{
System.out.print((first ? "" : ", ") + entry.getKey() + "=" + entry.getValue());
first = false;
}
}
}
}

To get only the words occurring more than once, you have to filter your map.
Depending on your Java version you can use either this:
List<String> wordsOccuringMultipleTimes = new LinkedList<String>();
for (Map.Entry<String, Integer> singleWord : wordcheck.entrySet()) {
if (singleWord.getValue() > 1) {
wordsOccuringMultipleTimes.add(singleWord.getKey());
}
}
or, starting with Java 8, this equivalent stream pipeline with lambda expressions:
List<String> wordsOccuringMultipleTimes = wordcheck.entrySet().stream()
.filter((entry) -> entry.getValue() > 1)
.map((entry) -> entry.getKey())
.collect(Collectors.toList());
To print the result nicely, you have to do something similar while iterating over the resulting list.

public static void main(String args[])
{
Map<String, Integer> wordcheck = new TreeMap<String, Integer>();
String string1="world world is new world of kingdom of palace of kings palace";
String string2[]=string1.split(" ");
HashSet<String> set = new HashSet<String>();
for (int i=0; i<string2.length; i++)
{
String data=string2[i];
for(int j=0;j<string2.length;j++)
{
if(i != j)
{
if(data.equalsIgnoreCase(string2[j]))
{
set.add(data);
}
}
}
}
System.out.println("Duplicate word size :"+set.size());
System.out.println("Duplicate words :"+set);
}

TreeMap.toString() is inherited from AbstractMap and the documentation states that
Returns a string representation of this map. The string representation consists of a list of key-value mappings in the order returned by the map's entrySet view's iterator, enclosed in braces ("{}"). Adjacent mappings are separated by the characters ", " (comma and space). Each key-value mapping is rendered as the key followed by an equals sign ("=") followed by the associated value. Keys and values are converted to strings as by String.valueOf(Object).
So it is better to write your own method that prints the TreeMap the way you want.
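As a rough sketch of such a method, the following renders each entry as "word (count)" instead of the default {key=value} form. The class name MapPrinter, the method format, and the chosen output layout are assumptions made for illustration, not something the question specifies:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MapPrinter {
    // Builds "key (value)" for each entry, in the map's iteration order, separated by "; ".
    static String format(Map<String, Integer> counts) {
        return counts.entrySet().stream()
                .map(e -> e.getKey() + " (" + e.getValue() + ")")
                .collect(Collectors.joining("; "));
    }

    public static void main(String[] args) {
        Map<String, Integer> wordcheck = new TreeMap<>();
        wordcheck.put("world", 3);
        wordcheck.put("palace", 2);
        System.out.println(format(wordcheck)); // palace (2); world (3)
    }
}
```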

Related

Count and print words with the same number of occurrences in a String in Java

I need help sorting and efficiently printing the words that share the same number of occurrences in the string below.
Here is the occurrence for the string: {java=2, occurences=1, program=3, sample=1, the=2}
Expected output:
java=2,occurences=1,sample=1,the=2
String str = "sample program java program the occurences the java program";
String[] inputstr = str.split(" ");
TreeMap<String,Integer> map = new TreeMap<>();
for(String input: inputstr) {
if(map.containsKey(input)) {
int value = map.get(input);
map.put(input,value+1);
} else {
map.put(input,1);
}
}
You can reduce the above code to a single statement using Java 8 streams:
Map<String, Long> countMap = Arrays.stream(inputstr)
.collect(Collectors.groupingBy(Object::toString, Collectors.counting()));
EDIT:
We need to find the words whose occurrence count is shared by more than one word. This can be done with the following code:
// invert the map using Guava's Multimap (google-guava.jar)
Multimap<Long, String> inverseMap = HashMultimap.create();
for (Entry<String, Long> entry : countMap.entrySet()) {
inverseMap.put(entry.getValue(), entry.getKey());
}
for (Entry<Long, Collection<String>> entry : inverseMap.asMap().entrySet()) {
// split the values into an array
Object[] split = entry.getValue().stream().toArray();
if (split != null && split.length > 1) {
for (int j = 0; j < split.length; j++) {
System.out.println(String.valueOf(split[j]) + " : "
+ countMap.get(String.valueOf(split[j])));
}
}
}
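If adding Guava is not an option, the same inversion idea can be sketched with the JDK alone: count the words, count how many words share each count, and drop the words whose count is unique. The class name SharedCounts and the method sharedCounts are hypothetical:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class SharedCounts {
    // Returns word -> count for every word whose count is shared with at least one other word.
    static Map<String, Long> sharedCounts(String text) {
        // word -> number of occurrences, sorted by word
        Map<String, Long> countMap = Arrays.stream(text.split("\\s+"))
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
        // occurrence count -> number of distinct words having that count
        Map<Long, Long> wordsPerCount = countMap.values().stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        countMap.values().removeIf(count -> wordsPerCount.get(count) < 2);
        return countMap;
    }

    public static void main(String[] args) {
        String str = "sample program java program the occurences the java program";
        System.out.println(sharedCounts(str).entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(",")));
        // prints: java=2,occurences=1,sample=1,the=2
    }
}
```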

Sorting string occurrences from text file

I have stored strings from a file into an ArrayList, and used a HashSet to count the number of occurrences of each string.
I am looking to list the top 5 words and their number of occurrences. I should be able to accomplish this w/o implementing a hashtable, treemap, etc. How can I go about achieving this?
Here is my ArrayList:
List<String> word_list = new ArrayList<String>();
while (INPUT_TEXT1.hasNext()) {
String input_word = INPUT_TEXT1.next();
word_list.add(input_word);
}
INPUT_TEXT1.close();
int word_list_length = word_list.size();
System.out.println("There are " + word_list_length + " words in the .txt file");
System.out.println("\n\n");
System.out.println("word_list's elements are: ");
for (int i = 0; i<word_list.size(); i++) {
System.out.print(word_list.get(i) + " ");
}
System.out.println("\n\n");
Here is my HashSet:
Set<String> unique_word = new HashSet<String>(word_list);
int number_of_unique = unique_word.size();
System.out.println("unique words are: ");
for (String e : unique_word) {
System.out.print(e + " ");
}
System.out.println("\n\n");
String [] word = new String[number_of_unique];
int [] freq = new int[number_of_unique];
int count = 0;
System.out.println("Frequency counts : ");
for (String e : unique_word) {
word[count] = e;
freq[count] = Collections.frequency(word_list, e);
System.out.println(word[count] + " : "+ freq[count] + " time(s)");
count++;
}
Could it be that I am overthinking a step? Thanks in advance
You can do this using a HashMap (with each unique word as the key and its frequency as the value) and then sorting the entries by value in reverse order, as explained in the steps below:
(1) Load the word_list with the words
(2) Find the unique words from word_list
(3) Store the unique words in a HashMap with the unique word as key and its frequency as value
(4) Sort the HashMap entries by value (frequency)
You can refer to the code below:
public static void main(String[] args) {
List<String> word_list = new ArrayList<>();
//Load your words to the word_list here
//Find the unique words now from list
String[] uniqueWords = word_list.stream().distinct().
toArray(size -> new String[size]);
Map<String, Integer> wordsMap = new HashMap<>();
int frequency = 0;
//Load the words to Map with each uniqueword as Key and frequency as Value
for (String uniqueWord : uniqueWords) {
frequency = Collections.frequency(word_list, uniqueWord);
System.out.println(uniqueWord+" occurred "+frequency+" times");
wordsMap.put(uniqueWord, frequency);
}
//Now, Sort the words with the reverse order of frequency(value of HashMap)
Stream<Entry<String, Integer>> topWords = wordsMap.entrySet().stream().
sorted(Map.Entry.<String,Integer>comparingByValue().reversed()).limit(5);
//Now print the Top 5 words to console
System.out.println("Top 5 Words:::");
topWords.forEach(System.out::println);
}
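Using Java 8, everything fits in one statement. Here words is the list of words, and groupingBy, counting, identity and reverseOrder are statically imported (for example from Collectors, Function and Comparator):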
Stream<Map.Entry<String,Long>> topWords =
words.stream()
.map(String::toLowerCase)
.collect(groupingBy(identity(), counting()))
.entrySet().stream()
.sorted(Map.Entry.<String, Long> comparingByValue(reverseOrder())
.thenComparing(Map.Entry.comparingByKey()))
.limit(5);
Then iterate over the stream:
topWords.forEach(m -> {
System.out.println(m.getKey() + " : " + m.getValue() + " time(s)");
});
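The question asks for the top 5 without a hash table or tree map. A minimal sketch under that constraint, assuming the parallel word and freq arrays built in the question; it sorts an array of positions rather than the arrays themselves. The class name TopWords and the method top are hypothetical:

```java
import java.util.Arrays;

class TopWords {
    // Returns up to n lines of the form "word : count", highest count first.
    static String[] top(String[] word, int[] freq, int n) {
        Integer[] order = new Integer[word.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i; // positions into the parallel arrays
        }
        // Sort positions by descending frequency; ties are broken alphabetically.
        Arrays.sort(order, (a, b) -> freq[a] != freq[b]
                ? Integer.compare(freq[b], freq[a])
                : word[a].compareTo(word[b]));
        int limit = Math.min(n, order.length);
        String[] result = new String[limit];
        for (int i = 0; i < limit; i++) {
            result[i] = word[order[i]] + " : " + freq[order[i]];
        }
        return result;
    }

    public static void main(String[] args) {
        String[] word = {"cat", "the", "hat", "in"};
        int[] freq = {1, 2, 1, 1};
        for (String line : top(word, freq, 5)) {
            System.out.println(line);
        }
    }
}
```

Because only the position array is reordered, word and freq stay aligned and can still be used afterwards.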

To print the first biggest and second biggest elements in a string

Below is the code I have implemented. When I try to print the biggest and second-biggest values in the string, the output comes out in the order [second biggest, first biggest].
Here is the output of what I got for the below code:
The output of the map is: real--6
The output of the map is: to--2
The output of the map is: world--1
The output of the map is: hello--0
The list after insertion is: [to, real]
The list inserted as [biggest,secondBiggest] after calling main is: [to, real]
......
But I want the list after insertion to be: [real, to].
public class ReadString {
static String input = "This is a real project with real code to do real things to solve real problems in real world real";
public static void main(String[] args) {
List<String> lst = ReadString.RepeatedString("This is a real project with real "
+ "code to do real things to solve real " + "problems in real world real");
System.out.println("The list inserted as [biggest,secondBiggest] after calling main is: " + lst);
}
public static List<String> RepeatedString(String s) {
String[] s2 = input.split(" ");
String[] key = { "real", "to", "world", "hello" };
int count = 0;
Integer biggest = 0;
Integer secondBiggest = 1;
Map<String, Integer> map = new HashMap<String, Integer>();
for (int j = 0; j < key.length; j++) {
count = 0;
for (int i = 0; i < s2.length; i++) {
if (s2[i].equals(key[j])) {
count++;
}
}
map.put(key[j], count);
System.out.println("The output of the map is: " +key[j] + "--" + count);
}
/*
* To find the top two most repeated values.
*/
List<Integer> values = new ArrayList<Integer>(map.values());
Collections.sort(values);
for (int n : map.values()) {
if (biggest < n) {
secondBiggest = biggest;
biggest = n;
} else if (secondBiggest < n)
secondBiggest = n;
}
/* To get the top most repeated strings. */
List<String> list = new ArrayList<String>();
for (String s1 : map.keySet()) {
if (map.get(s1).equals(biggest))
list.add(s1);
else if (map.get(s1).equals(secondBiggest))
list.add(s1);
}
System.out.println("The list after insertion is: " +list);
return list;
}
}
The problem appears to be when you are adding items to the list. As you are iterating through the map.keySet(), there is no guarantee that you will get the biggest item first. The smallest change I would make would be to add the biggest item first in the list.
for (String s1 : map.keySet()) {
if (map.get(s1).equals(biggest))
list.add(0, s1);
else if (map.get(s1).equals(secondBiggest))
list.add(s1);
}
This way, if secondBiggest is added first, biggest will be at the top of the list.
We can simplify your approach quite a bit if we extract the word and count into a simple POJO. Something like,
static class WordCount implements Comparable<WordCount> {
String word;
int count;
WordCount(String word, int count) {
this.word = word;
this.count = count;
}
@Override
public int compareTo(WordCount o) {
return Integer.compare(count, o.count);
}
}
Then we can use that in repeatedString. First, count the words in the String; then build a List of WordCount(s). Sort it (since it's Comparable it has natural ordering). Then build the List to return by iterating the sorted List of WordCount(s) in reverse (for two items). Like,
static List<String> repeatedString(String s) {
Map<String, Integer> map = new HashMap<>();
for (String word : s.split("\\s+")) {
map.put(word, !map.containsKey(word) ? 1 : 1 + map.get(word));
}
List<WordCount> al = new ArrayList<>();
for (Map.Entry<String, Integer> entry : map.entrySet()) {
al.add(new WordCount(entry.getKey(), entry.getValue()));
}
Collections.sort(al);
List<String> ret = new ArrayList<>();
for (int i = al.size() - 1; i >= Math.max(0, al.size() - 2); i--) { // guard against fewer than two distinct words
ret.add(al.get(i).word);
}
return ret;
}
Finally, your main method should use your static input (or static input should be removed)
static String input = "This is a real project with real code to do "
+ "real things to solve real problems in real world real";
public static void main(String[] args) {
List<String> lst = repeatedString(input);
System.out.println("The list inserted as [biggest,"
+ "secondBiggest] after calling main is: " + lst);
}
And I get (as requested)
The list inserted as [biggest,secondBiggest] after calling main is: [real, to]
If you are only concerned with the biggest and second biggest, you can refer to the code below.
Instead of building the list directly, I create an array, put the required elements at fixed positions (which is more readable), and finally convert the array to a list.
/* To get the top most repeated strings. */
String[] resultArray = new String[2];
for (String s1 : map.keySet()) {
if (map.get(s1).equals(biggest))
resultArray[0]=s1;
else if (map.get(s1).equals(secondBiggest))
resultArray[1]=s1;
}
List<String> list = Arrays.asList(resultArray);
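Another option is to track the two words themselves in a single pass over the map, so that the result order does not depend on the map's iteration order. This is only a sketch: the class name TopTwo and the method topTwo are hypothetical, and ties between equal counts are resolved arbitrarily:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopTwo {
    // Returns the most frequent key followed by the second most frequent key.
    static List<String> topTwo(Map<String, Integer> counts) {
        String first = null;
        String second = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (first == null || e.getValue() > counts.get(first)) {
                second = first;      // the old leader becomes the runner-up
                first = e.getKey();
            } else if (second == null || e.getValue() > counts.get(second)) {
                second = e.getKey();
            }
        }
        List<String> result = new ArrayList<>();
        if (first != null) result.add(first);
        if (second != null) result.add(second);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("hello", 0);
        map.put("world", 1);
        map.put("to", 2);
        map.put("real", 6);
        System.out.println(topTwo(map)); // [real, to]
    }
}
```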

Sort the words and letters in Java

The code below counts how many times the words and letters appeared in the string. How do I sort the output from highest to lowest? The output should be like:
the - 2
quick - 1
brown - 1
fox - 1
t - 2
h - 2
e - 2
b - 1
My code:
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;
public class Tokenizer {
public static void main(String[] args) {
int index = 0;
int tokenCount;
int i = 0;
Map<String, Integer> wordCount = new HashMap<String, Integer>();
Map<Integer, Integer> letterCount = new HashMap<Integer, Integer>();
String message = "The Quick brown fox the";
StringTokenizer string = new StringTokenizer(message);
tokenCount = string.countTokens();
System.out.println("Number of tokens = " + tokenCount);
while (string.hasMoreTokens()) {
String word = string.nextToken().toLowerCase();
Integer count = wordCount.get(word);
Integer lettercount = letterCount.get(word);
if (count == null) {
// this means the word was encountered the first time
wordCount.put(word, 1);
} else {
// word was already encountered we need to increment the count
wordCount.put(word, count + 1);
}
}
for (String words : wordCount.keySet()) {
System.out.println("Word : " + words + " has count :" + wordCount.get(words));
}
for (i = 0; i < message.length(); i++) {
char c = message.charAt(i);
if (c != ' ') {
int value = letterCount.getOrDefault((int) c, 0);
letterCount.put((int) c, value + 1);
}
}
for (int key : letterCount.keySet()) {
System.out.println((char) key + ": " + letterCount.get(key));
}
}
}
You have a Map<String, Integer>; I'd suggest something along the lines of another LinkedHashMap<String, Integer> which is populated by inserting keys that are sorted by value.
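A minimal sketch of that idea, assuming the counts are already in a Map<String, Integer>: sort the entries by value in descending order and copy them into a LinkedHashMap, which preserves insertion order. The class name SortByCount and the method sortedByCountDesc are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SortByCount {
    // Copies the entries into a LinkedHashMap in descending order of count.
    static Map<String, Integer> sortedByCountDesc(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so entries with equal counts keep their original relative order.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        Map<String, Integer> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordCount = new LinkedHashMap<>();
        wordCount.put("quick", 1);
        wordCount.put("the", 2);
        wordCount.put("brown", 1);
        System.out.println(sortedByCountDesc(wordCount)); // {the=2, quick=1, brown=1}
    }
}
```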
It seems that you want to sort the Map by its value (i.e., count). General solutions for sorting a map by value exist.
Specifically for your case, a simple solution might be:
(1) Use a TreeSet<Integer> to store all the distinct count values in the HashMap.
(2) Iterate over the TreeSet from high to low.
(3) Inside the iteration from step (2), loop over the map and output every word-count pair whose count equals the current count.
Please see if this may help.
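The TreeSet steps above can be sketched roughly as follows. The class name PrintByCount and the method linesByCountDesc are hypothetical, and the "word - count" layout is taken from the expected output in the question:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class PrintByCount {
    static List<String> linesByCountDesc(Map<String, Integer> wordCount) {
        // Collect the distinct counts; TreeSet keeps them sorted.
        TreeSet<Integer> counts = new TreeSet<>(wordCount.values());
        List<String> lines = new ArrayList<>();
        // Walk the counts from high to low.
        for (int count : counts.descendingSet()) {
            // Emit every word whose count equals the current one.
            for (Map.Entry<String, Integer> e : wordCount.entrySet()) {
                if (e.getValue() == count) {
                    lines.add(e.getKey() + " - " + count);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordCount = new LinkedHashMap<>();
        wordCount.put("the", 2);
        wordCount.put("quick", 1);
        wordCount.put("brown", 1);
        wordCount.put("fox", 1);
        linesByCountDesc(wordCount).forEach(System.out::println);
    }
}
```

This rescans the whole map once per distinct count, which is fine for small inputs; sorting the entries once scales better when there are many distinct counts.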
Alternatively, just add all your data to a list and then sort that list.

Using HashMap to count instances

I have the following code to count the instances of different strings in an array;
String words[] = {"the","cat","in","the","hat"};
HashMap<String,Integer> wordCounts = new HashMap<String,Integer>(50,10);
for(String w : words) {
Integer i = wordCounts.get(w);
if(i == null) wordCounts.put(w, 1);
else wordCounts.put(w, i + 1);
}
Is this a correct way of doing it? It seems a bit long-winded for a simple task. The HashMap result is useful to me because I will be indexing it by the string.
I am worried that the line
else wordCounts.put(w, i + 1);
could be inserting a second key-value pair due to the fact that
new Integer(i).equals(new Integer(i + 1));
would be false, so two Integers would end up under the same String key bucket, right? Or have I just over-thought myself into a corner?
Your code will work - but it would be simpler to use HashMultiset from Guava.
// Note: prefer the below over "String words[]"
String[] words = {"the","cat","in","the","hat"};
Multiset<String> set = HashMultiset.create(Arrays.asList(words));
// Write out the counts...
for (Multiset.Entry<String> entry : set.entrySet()) {
System.out.println(entry.getElement() + ": " + entry.getCount());
}
Yes, you are doing it the correct way. HashMap replaces the value if the same key is provided.
From the Javadoc of HashMap#put:
Associates the specified value with the specified key in this map. If the map previously contained a mapping for the key, the old value is replaced.
Your code is perfectly fine. You map strings to integers. Nothing is duplicated.
HashMap doesn't allow duplicate keys, so there is no way to have more than one key-value pair with the same key in your map.
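A small sketch that makes the replacement behaviour visible, reusing the counting loop from the question; the class name PutReplacesDemo and the method count are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

class PutReplacesDemo {
    // Same counting loop as in the question, returning the finished map.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> wordCounts = new HashMap<>();
        for (String w : words) {
            Integer i = wordCounts.get(w);
            if (i == null) wordCounts.put(w, 1);
            else wordCounts.put(w, i + 1); // replaces the old value stored under key w
        }
        return wordCounts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(new String[] {"the", "cat", "in", "the", "hat"});
        System.out.println(counts.size());     // 4 (one entry per distinct word)
        System.out.println(counts.get("the")); // 2

        // put returns the value it replaced, or null when the key was new.
        Map<String, Integer> m = new HashMap<>();
        System.out.println(m.put("the", 1)); // null
        System.out.println(m.put("the", 2)); // 1
        System.out.println(m);               // {the=2}
    }
}
```

Key lookup depends only on the key's equals and hashCode, so whether the old and new Integer values are equal to each other plays no role.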
Here is a String-specific counter that should be genericized and have a sort by value option for toString(), but is an object-oriented wrapper to the problem, since I can't find anything similar:
package com.phogit.util;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
public class HashCount {
private final Map<String, Integer> map = new HashMap<>();
public void add(String s) {
if (s == null) {
return;
}
Integer i = map.get(s);
if (i == null) {
map.put(s, 1);
} else {
map.put(s, i+1);
}
}
public int getCount(String s) {
if (s == null) {
return -1;
}
Integer i = map.get(s);
if (i == null) {
return -1;
}
return i;
}
public String toString() {
if (map.size() == 0) {
return null;
}
StringBuilder sb = new StringBuilder();
// sort by key for now
Map<String, Integer> m = new TreeMap<String, Integer>(map);
for (Map.Entry<String, Integer> pair : m.entrySet()) {
sb.append("\t")
.append(pair.getKey())
.append(": ")
.append(pair.getValue())
.append("\n");
}
return sb.toString();
}
public void clear() {
map.clear();
}
}
Your code looks fine to me and there is no issue with it. Thanks to Java 8 features it can be simplified to:
String words[] = {"the","cat","in","the","hat"};
HashMap<String,Integer> wordCounts = new HashMap<String,Integer>(50,10);
for(String w : words) {
wordCounts.merge(w, 1, (a, b) -> a + b);
}
The following code
System.out.println("HASH MAP DUMP: " + wordCounts.toString());
would print something like this (HashMap iteration order is not guaranteed):
HASH MAP DUMP: {cat=1, hat=1, in=1, the=2}
