Sorting string occurrences from text file

I have stored the strings from a file in an ArrayList, used a HashSet to find the unique strings, and counted the number of occurrences of each one.
I am looking to list the top 5 words and their number of occurrences. I should be able to accomplish this without implementing a hashtable, treemap, etc. How can I go about achieving this?
Here is my ArrayList:
List<String> word_list = new ArrayList<String>();
while (INPUT_TEXT1.hasNext()) {
String input_word = INPUT_TEXT1.next();
word_list.add(input_word);
}
INPUT_TEXT1.close();
int word_list_length = word_list.size();
System.out.println("There are " + word_list_length + " words in the .txt file");
System.out.println("\n\n");
System.out.println("word_list's elements are: ");
for (int i = 0; i<word_list.size(); i++) {
System.out.print(word_list.get(i) + " ");
}
System.out.println("\n\n");
Here is my HashSet:
Set<String> unique_word = new HashSet<String>(word_list);
int number_of_unique = unique_word.size();
System.out.println("unique words are: ");
for (String e : unique_word) {
System.out.print(e + " ");
}
System.out.println("\n\n");
String [] word = new String[number_of_unique];
int [] freq = new int[number_of_unique];
int count = 0;
System.out.println("Frequency counts : ");
for (String e : unique_word) {
word[count] = e;
freq[count] = Collections.frequency(word_list, e);
System.out.println(word[count] + " : "+ freq[count] + " time(s)");
count++;
}
Could it be that I am overthinking a step? Thanks in advance

You can do this with a HashMap (each unique word as key, its frequency as value) and then sort the entries by value in descending order, as outlined in the steps below:
(1) Load the word_list with the words
(2) Find the unique words from word_list
(3) Store the unique words into a HashMap with the unique word as key and its frequency as value
(4) Sort the HashMap entries by value (frequency)
You can refer to the code below:
public static void main(String[] args) {
List<String> word_list = new ArrayList<>();
//Load your words to the word_list here
//Find the unique words now from list
String[] uniqueWords = word_list.stream().distinct().
toArray(size -> new String[size]);
Map<String, Integer> wordsMap = new HashMap<>();
int frequency = 0;
//Load the words to Map with each uniqueword as Key and frequency as Value
for (String uniqueWord : uniqueWords) {
frequency = Collections.frequency(word_list, uniqueWord);
System.out.println(uniqueWord+" occurred "+frequency+" times");
wordsMap.put(uniqueWord, frequency);
}
//Now, Sort the words with the reverse order of frequency(value of HashMap)
Stream<Entry<String, Integer>> topWords = wordsMap.entrySet().stream().
sorted(Map.Entry.<String,Integer>comparingByValue().reversed()).limit(5);
//Now print the Top 5 words to console
System.out.println("Top 5 Words:::");
topWords.forEach(System.out::println);
}
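Since the question asks for a solution without a hash table or tree map, here is a minimal sketch that stays with the question's own idea of parallel word and frequency data and simply sorts the positions by frequency. The class name TopWordsNoMap and the helper topN are hypothetical, and the approach rescans the list for every unique word, so it is only intended for small inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

class TopWordsNoMap {
    // Hypothetical helper: returns up to n "word : count" lines, most frequent first.
    static List<String> topN(List<String> wordList, int n) {
        // LinkedHashSet keeps first-seen order, so ties are resolved predictably.
        List<String> unique = new ArrayList<>(new LinkedHashSet<>(wordList));
        int[] freq = new int[unique.size()];
        Integer[] order = new Integer[unique.size()];
        for (int i = 0; i < freq.length; i++) {
            freq[i] = Collections.frequency(wordList, unique.get(i));
            order[i] = i;
        }
        // Sort the positions by descending frequency; the object sort is stable.
        Arrays.sort(order, (a, b) -> Integer.compare(freq[b], freq[a]));
        List<String> result = new ArrayList<>();
        for (int k = 0; k < Math.min(n, order.length); k++) {
            result.add(unique.get(order[k]) + " : " + freq[order[k]]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
        topN(words, 5).forEach(System.out::println);
    }
}
```

Sorting an index array rather than the words themselves keeps each word paired with its count without needing a key-value structure.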

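Using Java 8 and putting all the code in one block (this assumes words is a List<String> and that Collectors.groupingBy, Collectors.counting, Function.identity and Comparator.reverseOrder are statically imported):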
Stream<Map.Entry<String,Long>> topWords =
words.stream()
.map(String::toLowerCase)
.collect(groupingBy(identity(), counting()))
.entrySet().stream()
.sorted(Map.Entry.<String, Long> comparingByValue(reverseOrder())
.thenComparing(Map.Entry.comparingByKey()))
.limit(5);
Then iterate over the stream to print the result:
topWords.forEach(m -> {
System.out.println(m.getKey() + " : " + m.getValue() + " time(s)");
});
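For reference, the same pipeline can be written out as a complete, compilable unit. This is only a sketch: the class name TopWordsStream and the method topFive are hypothetical, and it returns formatted lines instead of a Stream so the result can be reused after printing.

```java
import static java.util.Comparator.reverseOrder;
import static java.util.function.Function.identity;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TopWordsStream {
    // Hypothetical helper: counts words case-insensitively and formats the top five.
    static List<String> topFive(List<String> words) {
        return words.stream()
                .map(String::toLowerCase)
                .collect(groupingBy(identity(), counting()))
                .entrySet().stream()
                // Highest count first; equal counts fall back to alphabetical order.
                .sorted(Map.Entry.<String, Long>comparingByValue(reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(5)
                .map(e -> e.getKey() + " : " + e.getValue() + " time(s)")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        topFive(Arrays.asList("b", "a", "B", "c", "a")).forEach(System.out::println);
    }
}
```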

Related

To print the first biggest and second biggest elements in a string

Below is the code I have implemented. My question is: when I try to print the biggest and second-biggest values in the string, the output comes out in the order [second biggest, first biggest].
Here is the output of what I got for the below code:
The output of the map is: real--6
The output of the map is: to--2
The output of the map is: world--1
The output of the map is: hello--0
The list after insertion is: [to, real]
The list inserted as [biggest,secondBiggest] after calling main is: [to, real]
......
But I want the list after insertion to be: [real, to].
public class ReadString {
static String input = "This is a real project with real code to do real things to solve real problems in real world real";
public static void main(String[] args) {
List<String> lst = ReadString.RepeatedString("This is a real project with real "
+ "code to do real things to solve real " + "problems in real world real");
System.out.println("The list inserted as [biggest,secondBiggest] after calling main is: " + lst);
}
public static List<String> RepeatedString(String s) {
String[] s2 = input.split(" ");
String[] key = { "real", "to", "world", "hello" };
int count = 0;
Integer biggest = 0;
Integer secondBiggest = 1;
Map<String, Integer> map = new HashMap<String, Integer>();
for (int j = 0; j < key.length; j++) {
count = 0;
for (int i = 0; i < s2.length; i++) {
if (s2[i].equals(key[j])) {
count++;
}
}
map.put(key[j], count);
System.out.println("The output of the map is: " +key[j] + "--" + count);
}
/*
* To find the top two most repeated values.
*/
List<Integer> values = new ArrayList<Integer>(map.values());
Collections.sort(values);
for (int n : map.values()) {
if (biggest < n) {
secondBiggest = biggest;
biggest = n;
} else if (secondBiggest < n)
secondBiggest = n;
}
/* To get the top most repeated strings. */
List<String> list = new ArrayList<String>();
for (String s1 : map.keySet()) {
if (map.get(s1).equals(biggest))
list.add(s1);
else if (map.get(s1).equals(secondBiggest))
list.add(s1);
}
System.out.println("The list after insertion is: " +list);
return list;
}
}

The problem appears to be when you are adding items to the list. As you are iterating through the map.keySet(), there is no guarantee that you will get the biggest item first. The smallest change I would make would be to add the biggest item first in the list.
for (String s1 : map.keySet()) {
if (map.get(s1).equals(biggest))
list.add(0, s1);
else if (map.get(s1).equals(secondBiggest))
list.add(s1);
}
This way, if secondBiggest is added first, biggest will be at the top of the list.
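Matching keys by their count can still misbehave when two words share the same count, because more than two keys may then be added to the list. A minimal sketch of an alternative, assuming the counts are already in a map, is to remember the two leading keys themselves while scanning. The class name TopTwo and the method topTwo are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopTwo {
    // Hypothetical helper: returns the keys with the largest and second-largest counts.
    static List<String> topTwo(Map<String, Integer> counts) {
        String first = null;
        String second = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (first == null || e.getValue() > counts.get(first)) {
                second = first;          // the old leader drops to second place
                first = e.getKey();
            } else if (second == null || e.getValue() > counts.get(second)) {
                second = e.getKey();
            }
        }
        List<String> result = new ArrayList<>();
        if (first != null) result.add(first);
        if (second != null) result.add(second);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("to", 2);
        counts.put("real", 6);
        counts.put("world", 1);
        counts.put("hello", 0);
        System.out.println(topTwo(counts)); // [real, to]
    }
}
```

The result never holds more than two entries, and its order does not depend on the iteration order of the map.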

We can simplify your approach quite a bit if we extract the word and count into a simple POJO. Something like,
static class WordCount implements Comparable<WordCount> {
String word;
int count;
WordCount(String word, int count) {
this.word = word;
this.count = count;
}
@Override
public int compareTo(WordCount o) {
return Integer.compare(count, o.count);
}
}
Then we can use that in repeatedString. First, count the words in the String; then build a List of WordCount(s). Sort it (since it's Comparable it has natural ordering). Then build the List to return by iterating the sorted List of WordCount(s) in reverse (for two items). Like,
static List<String> repeatedString(String s) {
Map<String, Integer> map = new HashMap<>();
for (String word : s.split("\\s+")) {
map.put(word, !map.containsKey(word) ? 1 : 1 + map.get(word));
}
List<WordCount> al = new ArrayList<>();
for (Map.Entry<String, Integer> entry : map.entrySet()) {
al.add(new WordCount(entry.getKey(), entry.getValue()));
}
Collections.sort(al);
List<String> ret = new ArrayList<>();
for (int i = al.size() - 1; i >= 0 && i >= al.size() - 2; i--) {
ret.add(al.get(i).word);
}
return ret;
}
Finally, your main method should use your static input (or the static input should be removed):
static String input = "This is a real project with real code to do "
+ "real things to solve real problems in real world real";
public static void main(String[] args) {
List<String> lst = repeatedString(input);
System.out.println("The list inserted as [biggest,"
+ "secondBiggest] after calling main is: " + lst);
}
And I get (as requested)
The list inserted as [biggest,secondBiggest] after calling main is: [real, to]

If you are only concerned with the biggest and second biggest, you can refer to the code below. Instead of building the list directly, I created an array, placed the required elements at fixed positions (which makes the intent more readable), and finally converted the array to a list.
/* To get the top most repeated strings. */
String[] resultArray = new String[2];
for (String s1 : map.keySet()) {
if (map.get(s1).equals(biggest))
resultArray[0]=s1;
else if (map.get(s1).equals(secondBiggest))
resultArray[1]=s1;
}
List<String> list = Arrays.asList(resultArray);

Finding Unique Words In A Text File Using ArrayList

I'm working on a project where I enter a URL, the file is read, and the number of lines, characters, and words is written to a text file. I'm not having an issue with that. Code below will be pretty long, sorry in advance.
I also have to write to the same text file every word in the file and the number of times each word appears. I've been working on it for a while and have gotten to the point where all the lines/characters/words are written to the text file, but I can't figure out how to display the actual words and the number of times each appears in the file.
String[] wordSubstrings = line.replaceAll("\\s+", " ").split(" ");
List<String> uniqueWords = new ArrayList<String>();
for (int i = 0; i < wordSubstrings.length; i++) {
if (!(uniqueWords.contains(wordSubstrings[i]))) {
uniqueWords.add(wordSubstrings[i]);
}
}

You could use a Multiset from Google Guava (com.google.common.collect):
Multiset<String> words = HashMultiset.create();
for (String word : wordList)
words.add(word);
for (String word : words.elementSet())
System.out.println(word + ": " + words.count(word));

I've tested something with a HashMap which seems to work pretty well.
Here is the code I used to test it; I hope it helps:
String[] wordSubstrings = new String[]{"test","stuff","test","thing","test","test","stuff"};
HashMap<String,Integer> uniqueWords = new HashMap<>();
for ( int i = 0; i < wordSubstrings.length; i++)
{
if(!(uniqueWords.containsKey(wordSubstrings[i])))
{
uniqueWords.put(wordSubstrings[i], 1);
}
else
{
int number = uniqueWords.get(wordSubstrings[i]);
uniqueWords.put(wordSubstrings[i],number + 1);
}
}
for (Map.Entry<String, Integer> entry : uniqueWords.entrySet()) {
String key = entry.getKey();
int value = entry.getValue();
//Do Something with the key and value
}
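On Java 8 or later, the containsKey/put branching above can be collapsed with Map.merge. A minimal sketch follows; the class name MergeCount and the method count are hypothetical, and a TreeMap is used here only so the printed order is predictable.

```java
import java.util.Map;
import java.util.TreeMap;

class MergeCount {
    // Hypothetical helper: counts how often each word occurs.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            // Inserts 1 for a new word, otherwise adds 1 to the stored count.
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"test", "stuff", "test", "thing", "test", "test", "stuff"};
        System.out.println(count(words)); // {stuff=2, test=4, thing=1}
    }
}
```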

You can use an ArrayList of a small class that holds the word and its count as member variables.
List<MyClass> uniqueWords = new ArrayList<MyClass>();
class MyClass
{
String uniqueword;
int count;
}
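A minimal sketch of how such a list could be filled, assuming a plain linear search is acceptable for the input size. The names WordTally, Entry and tally are hypothetical stand-ins for MyClass and the surrounding code.

```java
import java.util.ArrayList;
import java.util.List;

class WordTally {
    // Holds one distinct word together with its running count.
    static class Entry {
        final String word;
        int count;

        Entry(String word) {
            this.word = word;
            this.count = 1;
        }
    }

    // Hypothetical helper: builds the list of distinct words with counts.
    static List<Entry> tally(String[] words) {
        List<Entry> entries = new ArrayList<>();
        for (String w : words) {
            Entry found = null;
            for (Entry e : entries) {        // linear search for an existing entry
                if (e.word.equals(w)) {
                    found = e;
                    break;
                }
            }
            if (found == null) {
                entries.add(new Entry(w));   // first time this word is seen
            } else {
                found.count++;
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        for (Entry e : tally(new String[]{"a", "b", "a"})) {
            System.out.println(e.word + ": " + e.count);
        }
    }
}
```

The inner search makes this quadratic in the number of distinct words, which is the price of avoiding a map.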

How to take for loop output to separate array in Java?

I need to store the output of the following code in a separate array for use in another calculation. In this example the output is:
z count is= 1
y count is= 3
x count is= 2.
I need to put 1, 3, 2 into a separate array. How do I do it? I am new to Java.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
public class ForArrays {
public static void main(String[] args) {
ArrayList<String> name = new ArrayList<String>();
HashMap<String, Integer> termFreqMap = new HashMap<String, Integer>();
ArrayList<String> words = new ArrayList<String>();
HashMap<String, Integer> wordFreqMap = new HashMap<String, Integer>();
name.add("x");
name.add("y x");
name.add("y z y");
// count words
for (int j = 0; j < name.size(); j++) {
String tempname = name.get(j);
String[] result = tempname.split(" ");
for (String s : result) {
Integer twf = wordFreqMap.get(s);
if (twf == null)
wordFreqMap.put(s, new Integer(1));
else {
wordFreqMap.put(s, new Integer(++twf));
}
}
}
for (Map.Entry<String, Integer> entry : wordFreqMap.entrySet()) {
String tempWord = entry.getKey();
Integer wf = entry.getValue();
System.out.println(tempWord + " " + "count is= " + wf);
}
}
}

Create an int array with the same size as the HashMap, and store each count in the array while iterating over the map.
The catch is that a HashMap does not preserve insertion order, so the array will not be in insertion order and you may not know which key each count belongs to. To handle that, you can create a second array for the keys and fill it in the same loop, so that the keys array and the count array line up one to one.
int countArray[] = new int[wordFreqMap.size()];
int i=0;
for (Map.Entry<String, Integer> entry : wordFreqMap.entrySet()) {
String tempWord = entry.getKey();
Integer wf = entry.getValue();
System.out.println(tempWord + " " + "count is= " + wf);
countArray[i++] = wf;
}
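A minimal sketch of the two parallel arrays, assuming the counts live in a map that is not modified between the two calls. The class name CountsToArrays and its methods are hypothetical; the example uses a LinkedHashMap so the arrays come out in insertion order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class CountsToArrays {
    // Hypothetical helper: the keys in the map's iteration order.
    static String[] keys(Map<String, Integer> m) {
        return m.keySet().toArray(new String[0]);
    }

    // Hypothetical helper: the counts in the same iteration order as keys().
    static int[] counts(Map<String, Integer> m) {
        return m.values().stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        Map<String, Integer> wordFreqMap = new LinkedHashMap<>();
        for (String line : new String[]{"x", "y x", "y z y"}) {
            for (String s : line.split(" ")) {
                wordFreqMap.merge(s, 1, Integer::sum);
            }
        }
        System.out.println(Arrays.toString(keys(wordFreqMap)));   // [x, y, z]
        System.out.println(Arrays.toString(counts(wordFreqMap))); // [2, 3, 1]
    }
}
```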

Sort the words and letters in Java

The code below counts how many times the words and letters appeared in the string. How do I sort the output from highest to lowest? The output should be like:
the - 2
quick - 1
brown - 1
fox - 1
t - 2
h - 2
e - 2
b - 1
My code:
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;
public class Tokenizer {
public static void main(String[] args) {
int index = 0;
int tokenCount;
int i = 0;
Map<String, Integer> wordCount = new HashMap<String, Integer>();
Map<Integer, Integer> letterCount = new HashMap<Integer, Integer>();
String message = "The Quick brown fox the";
StringTokenizer string = new StringTokenizer(message);
tokenCount = string.countTokens();
System.out.println("Number of tokens = " + tokenCount);
while (string.hasMoreTokens()) {
String word = string.nextToken().toLowerCase();
Integer count = wordCount.get(word);
Integer lettercount = letterCount.get(word);
if (count == null) {
// this means the word was encountered the first time
wordCount.put(word, 1);
} else {
// word was already encountered we need to increment the count
wordCount.put(word, count + 1);
}
}
for (String words : wordCount.keySet()) {
System.out.println("Word : " + words + " has count :" + wordCount.get(words));
}
for (i = 0; i < message.length(); i++) {
char c = message.charAt(i);
if (c != ' ') {
int value = letterCount.getOrDefault((int) c, 0);
letterCount.put((int) c, value + 1);
}
}
for (int key : letterCount.keySet()) {
System.out.println((char) key + ": " + letterCount.get(key));
}
}
}

You have a Map<String, Integer>; I'd suggest something along the lines of another LinkedHashMap<String, Integer> which is populated by inserting keys that are sorted by value.
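One way this could look, as a sketch only: copy the entries into a list, sort them by count in descending order, and insert them into a LinkedHashMap, which then iterates in that order. The class name SortByValue and the method sortedByCountDesc are hypothetical, and breaking ties alphabetically is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SortByValue {
    // Hypothetical helper: returns a copy of the map ordered by descending count.
    static Map<String, Integer> sortedByCountDesc(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        Map<String, Integer> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            sorted.put(e.getKey(), e.getValue()); // insertion order is the sorted order
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordCount = new HashMap<>();
        wordCount.put("the", 2);
        wordCount.put("quick", 1);
        wordCount.put("brown", 1);
        wordCount.put("fox", 1);
        System.out.println(sortedByCountDesc(wordCount)); // {the=2, brown=1, fox=1, quick=1}
    }
}
```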

It seems that you want to sort the Map by its value (i.e., count). Here are some general solutions.
Specifically for your case, a simple solution might be:
(1) Use a TreeSet<Integer> to save all distinct count values found in the HashMap.
(2) Iterate the TreeSet from high to low.
(3) Inside the iteration from step (2), use a loop to output all word-count pairs whose count equals the count currently being iterated.
Please see if this may help.
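The steps above can be sketched as follows. The class name TreeSetOrder and the method linesByCountDesc are hypothetical, and the "word - count" line format is taken from the desired output in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class TreeSetOrder {
    // Hypothetical helper: returns "word - count" lines, highest count first.
    static List<String> linesByCountDesc(Map<String, Integer> counts) {
        // Step 1: collect the distinct count values in sorted order.
        TreeSet<Integer> distinct = new TreeSet<>(counts.values());
        List<String> lines = new ArrayList<>();
        // Step 2: walk the counts from high to low.
        for (int c : distinct.descendingSet()) {
            // Step 3: emit every word whose count equals the current one.
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (e.getValue() == c) {
                    lines.add(e.getKey() + " - " + c);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordCount = new LinkedHashMap<>();
        for (String w : "The Quick brown fox the".toLowerCase().split(" ")) {
            wordCount.merge(w, 1, Integer::sum);
        }
        linesByCountDesc(wordCount).forEach(System.out::println);
    }
}
```

Each distinct count triggers a full pass over the map, so this works best when there are few distinct counts.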

Just add all your word-count pairs to a List and then sort the list by count.

print only repeated words in java

I want to display only the words that appear more than once in a string; a word that appears only once should not be printed. I also want to print only words whose length is more than 2 (to eliminate is, was, the, etc.).
The code I tried prints all the words along with their occurrence counts.
Code:
public static void main(String args[])
{
Map<String, Integer> wordcheck = new TreeMap<String, Integer>();
String string1="world world is new world of kingdom of palace of kings palace";
String string2[]=string1.split(" ");
for (int i=0; i<string2.length; i++)
{
String string=string2[i];
wordcheck.put(string,(wordcheck.get(string) == null?1: (wordcheck.get(string)+1)));
}
System.out.println(wordcheck);
}
Output:
{is=1, kingdom=1, kings=1, new=1, of=3, palace=2, world=3}
Words that appear only once should not be printed.
I also want to print only words whose length is more than 2 (to eliminate is, was, the, etc.).

Use this:
for (String key : wordcheck.keySet()) {
if(wordcheck.get(key)>1)
System.out.println(key + " " + wordcheck.get(key));
}
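The loop above only checks the count. A minimal sketch that also applies the length requirement from the question might look like this; the class name RepeatedWords and the method repeated are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RepeatedWords {
    // Hypothetical helper: words longer than 2 characters that occur more than once.
    static List<String> repeated(String text) {
        Map<String, Integer> wordcheck = new TreeMap<>();
        for (String w : text.split("\\s+")) {
            wordcheck.merge(w, 1, Integer::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : wordcheck.entrySet()) {
            // Keep only repeated words that are longer than two characters.
            if (e.getValue() > 1 && e.getKey().length() > 2) {
                out.add(e.getKey() + " " + e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String string1 = "world world is new world of kingdom of palace of kings palace";
        repeated(string1).forEach(System.out::println);
    }
}
```

For the sample sentence this prints "palace 2" and "world 3"; "of" is dropped because it has only two characters.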

Keeping track of the number of occurrences in a map will allow you to do this.
import java.util.HashMap;
import java.util.Map.Entry;
import java.util.Set;
public class Test1
{
public static void main(String[] args)
{
String string1="world world is new world of kingdom of palace of kings palace";
String string2[]=string1.split(" ");
HashMap<String, Integer> uniques = new HashMap<String, Integer>();
for (String word : string2)
{
// ignore words 2 or less characters long
if (word.length() <= 2)
{
continue;
}
// add or update the word occurrence count
Integer existingCount = uniques.get(word);
uniques.put(word, (existingCount == null ? 1 : (existingCount + 1)));
}
Set<Entry<String, Integer>> uniqueSet = uniques.entrySet();
boolean first = true;
for (Entry<String, Integer> entry : uniqueSet)
{
if (entry.getValue() > 1)
{
System.out.print((first ? "" : ", ") + entry.getKey() + "=" + entry.getValue());
first = false;
}
}
}
}

To get only the words occurring more than once, you have to filter your map.
Depending on your Java version you can use either this:
List<String> wordsOccuringMultipleTimes = new LinkedList<String>();
for (Map.Entry<String, Integer> singleWord : wordcheck.entrySet()) {
if (singleWord.getValue() > 1) {
wordsOccuringMultipleTimes.add(singleWord.getKey());
}
}
or, starting with Java 8, this equivalent stream pipeline using lambda expressions:
List<String> wordsOccuringMultipleTimes = wordcheck.entrySet().stream()
.filter((entry) -> entry.getValue() > 1)
.map((entry) -> entry.getKey())
.collect(Collectors.toList());
Regarding the nice printing, you have to do something similar while iterating over your result.
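As a sketch of what that printing could look like, the filtered entries can be joined into the same brace-and-comma layout that the map's own toString() uses. The class name PrintRepeated and the method format are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class PrintRepeated {
    // Hypothetical helper: renders only the entries that occur more than once.
    static String format(Map<String, Integer> wordcheck) {
        return wordcheck.entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(", ", "{", "}"));
    }

    public static void main(String[] args) {
        Map<String, Integer> wordcheck = new TreeMap<>();
        String string1 = "world world is new world of kingdom of palace of kings palace";
        for (String w : string1.split(" ")) {
            wordcheck.merge(w, 1, Integer::sum);
        }
        System.out.println(format(wordcheck)); // {of=3, palace=2, world=3}
    }
}
```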

public static void main(String args[])
{
Map<String, Integer> wordcheck = new TreeMap<String, Integer>();
String string1="world world is new world of kingdom of palace of kings palace";
String string2[]=string1.split(" ");
HashSet<String> set = new HashSet<String>();
for (int i=0; i<string2.length; i++)
{
String data=string2[i];
for(int j=0;j<string2.length;j++)
{
if(i != j)
{
if(data.equalsIgnoreCase(string2[j]))
{
set.add(data);
}
}
}
}
System.out.println("Duplicate word size :"+set.size());
System.out.println("Duplicate words :"+set);
}

TreeMap.toString() is inherited from AbstractMap and the documentation states that
Returns a string representation of this map. The string representation consists of a list of key-value mappings in the order returned by the map's entrySet view's iterator, enclosed in braces ("{}"). Adjacent mappings are separated by the characters ", " (comma and space). Each key-value mapping is rendered as the key followed by an equals sign ("=") followed by the associated value. Keys and values are converted to strings as by String.valueOf(Object).
So it is better to write your own method that prints the TreeMap the way you want.
