Stop words removal method in java not working


I am trying to write Java code that reads all the documents in the Cranfield collection (a standard test collection in Information Retrieval) in order to tokenize them, count the total number of tokens, find the 50 most frequent words, and remove the predefined stop words.
It works well except for the StopWordsRemoval method (the last one in the code):
it does not change the output as required; the output before and after calling this method is the same!
Could you help me figure out what the problem is?
This is my first Java program :(
import java.io.*;
import java.util.*;
public class Information_Retrieval_Hw1 {
//Global variables
public static BufferedReader buffer;
public static Hashtable<String, Integer> wordList = new Hashtable<String, Integer>();
public static ArrayList<Hashtable <String,Integer>> fileMap = new ArrayList<Hashtable<String,Integer>>();
public static Set<String> tagNames = new HashSet<String>();
//public static ArrayList<Map.Entry<String, Integer>> list;
public static int documentsCount = 0;
public static int totalTokens = 0;
public static int uniqueWords = 0;
public static int tagCount = 0;
public static int singleOccureneWords = 0;
public static ArrayList<Map.Entry<String, Integer>> sortedList;
public Information_Retrieval_Hw1() {
// TODO Auto-generated constructor stub
}
public static void main(String[] args) throws IOException {
String cranfield = "/Users/Manal/Desktop/semster1/IR/assigenment 1/cranfieldDocs";
File cranfieldFiles = new File(cranfield);
ReadFile(cranfieldFiles);
System.out.println("Total number of documents: " + fileMap.size());
//Calculate total number of tokens
totalTokens = CalculateNumberOfTokens(wordList);
System.out.println("Total number Of words = " + totalTokens);
//Calculate number of unique words
uniqueWords = CalculateUniqueWords(wordList);
System.out.println("Total number Of distinct words = " + uniqueWords);
//Calculate number of words that occur only once
singleOccureneWords = CalculateSingleOccurenceWords(wordList);
System.out.println("Total number Of words that occur only once = " + singleOccureneWords);
//Find the 50 most frequent words
FindFiftyMostFrequentWords(wordList);
StopWordsRemoval (cranfieldFiles,wordList);
//reprint all information after removing stopword;
System.out.println("\n***********************************\nAfter removing stop words \n***********************************\n");
//Calculate total number of tokens
totalTokens = CalculateNumberOfTokens(wordList);
System.out.println("Total number Of words = " + totalTokens);
//Calculate number of unique words
uniqueWords = CalculateUniqueWords(wordList);
System.out.println("Total number Of distinct words = " + uniqueWords);
//Calculate number of words that occur only once
singleOccureneWords = CalculateSingleOccurenceWords(wordList);
System.out.println("Total number Of words that occur only once = " + singleOccureneWords);
//Find the 50 most frequent words
FindFiftyMostFrequentWords(wordList);
}
public static void ReadFile(File cranfieldFiles) throws IOException{
for (File file: cranfieldFiles.listFiles())
{
//read files recursively if path contains folder
if(file.isDirectory())
{
ReadFile(file);
}
else
{
documentsCount++;
try
{
buffer = new BufferedReader(new FileReader(file));
}
catch (FileNotFoundException e)
{
System.out.println("File not Found");
}
//find the tags and their count
tagCount = tagCount + TagHandler(file, tagNames);
//find words in the cranfield
TokenHandler(file, tagNames);
}
}
}
public static int TagHandler(File file, Set<String> tagNames) throws IOException
{
String line;
int tag_count = 0;
buffer = new BufferedReader(new FileReader(file));
while((line = buffer.readLine()) != null)
{
/*
* If the line contains a '<', it is considered a tag and tag_count is incremented.
*/
if(line.contains("<"))
{
tag_count++;
String b = line.replaceAll("[<*>/]", "");
tagNames.add(b);
}
}
tag_count/=2; //Since each tag represent the beginning and the end, we divide it by two to get the actual count.
return tag_count;
}
public static void TokenHandler(File file, Set<String> tagNames) throws IOException
{
String line;
String words[];
buffer = new BufferedReader(new FileReader(file));
Hashtable<String, Integer> tempMap = new Hashtable<String, Integer>();
while((line = buffer.readLine()) != null)
{
String s1 = line.replaceAll("[^a-zA-Z.]+"," "); //Replace everything that is not an alphabet with a blank space.
String s2 = s1.replaceAll("[.]", "");//Replace words with . (eg U.S) as 1 word
words = s2.split(" ");
for(String word : words)
{
//Handle the tags properly
if(!tagNames.contains(word) && !word.equals(""))
{
word = word.toLowerCase(); // Converts all words to lower case.
//add word if it isn't added already
if(!wordList.containsKey(word))
{
//first occurance of this word
wordList.put(word, 1);
//Following is to compute the unique words in each document
if(!tempMap.containsKey(word))
{
tempMap.put(word,1);
}
else
{
tempMap.put(word, tempMap.get(word)+ 1);
}
}
else
{
//Increament the count of that word
wordList.put(word, wordList.get(word) + 1);
if(!tempMap.containsKey(word))
{
tempMap.put(word,1);
}
else
{
tempMap.put(word, tempMap.get(word)+ 1);
}
}
}
}
}
//Add count to file map and after reading every file
fileMap.add(tempMap);
}
//Function to find the total number of tokens in the cranfield database
public static int CalculateNumberOfTokens(Hashtable<String, Integer> myWordList)
{
int noOfTokens = 0;
for (Integer value: myWordList.values())
{
noOfTokens = noOfTokens + value;
}
return noOfTokens;
}
public static int CalculateUniqueWords(Hashtable<String, Integer> myWordList)
{
return myWordList.size();
}
public static int CalculateSingleOccurenceWords(Hashtable<String, Integer> myWordList)
{
int count = 0;
for (Integer value: myWordList.values())
{
if(value == 1)
{
count++;
}
}
return count;
}
//Sorting the hashTable
public static ArrayList<Map.Entry<String, Integer>> SortHashTable(Hashtable<String, Integer> myWordList)
{
ArrayList<Map.Entry<String, Integer>> list = new ArrayList<Map.Entry<String, Integer>>(myWordList.entrySet());
Collections.sort(list, new Comparator<Map.Entry<String, Integer>>(){
public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
return o2.getValue().compareTo(o1.getValue());
}});
return list;
}
public static void FindFiftyMostFrequentWords(Hashtable<String, Integer> myWordList)
{
//Sort the hashtable based on value
sortedList = SortHashTable(myWordList);
System.out.println("The 50 most frequent words are: ");
for(int i=0;i<50;i++)
{
System.out.println("\t" + (i+1) + "." + " " + sortedList.get(i));
}
}
public static Hashtable<String, Integer> StopWordsRemoval (File file, Hashtable<String, Integer> wordList) throws IOException {
int k=0,j;
String sCurrentLine;
String[] stopwords = new String[2000];
try
{
FileReader fr=new FileReader("/Users/Manal/Desktop/semster1/IR/assigenment 1/xid-10624858_1.txt");
BufferedReader br= new BufferedReader(fr);
while ((sCurrentLine = br.readLine()) != null){
stopwords[k]=sCurrentLine;
k++;
}
Set<String> keys = wordList.keySet();
for(String key: keys)
{
for(j = 0; j < k; j++)
{
if(wordList.keySet().equals(stopwords[j]))
wordList.remove(key);
}
}
}
catch(Exception ex)
{System.out.println(ex);}
return wordList;
}
}

I think this is the issue in the code
if(wordList.keySet().equals(stopwords[j]))
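What you're doing there is checking whether the key set itself is equal to the word (keySet() returns a Set, which is never equal to a String), rather than whether the key set contains the word. Try this instead:
if(wordList.keySet().contains(stopwords[j]))
Note that the line after it should then remove stopwords[j] rather than key, and that removing entries while you iterate over the key set can throw a ConcurrentModificationException, so it is safer to do the removal outside the for-each loop.
Let me know if that fixes your problem.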
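One way to avoid both the comparison problem and removal during iteration is to hold the stop words in a Set and remove them from the map's key set in a single call. The following is a minimal sketch, not the asker's code: the class name StopWordFilter, the method removeStopWords, and the sample words are made up for illustration, and the stop words are hard-coded here instead of being read from the file.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Map;
import java.util.Set;

class StopWordFilter {
    // Removes every key of wordCounts that is a stop word and
    // returns how many entries were removed.
    static int removeStopWords(Map<String, Integer> wordCounts, Set<String> stopWords) {
        int before = wordCounts.size();
        // keySet() is a live view of the map: removing keys from it
        // removes the corresponding entries from the map itself.
        wordCounts.keySet().removeAll(stopWords);
        return before - wordCounts.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> wordCounts = new Hashtable<>();
        wordCounts.put("the", 120);
        wordCounts.put("flow", 7);
        wordCounts.put("of", 95);
        wordCounts.put("boundary", 4);

        // Hypothetical stop words; the question loads them from a file.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "of", "a"));

        int removed = removeStopWords(wordCounts, stopWords);
        System.out.println(removed + " removed, " + wordCounts.size() + " left");
    }
}
```

Because the removal goes through the key set view, no explicit loop over the keys is needed, so nothing is modified while it is being iterated.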

Related

Problems with sorting the characters of a word

I have a problem that I have been struggling with for some time.
I am given a word consisting of lowercase or uppercase letters of the English alphabet, and I need to sort its characters so that the characters that appear most often come first; characters that appear the same number of times are sorted lexicographically.
Such as:
input:
Instructions
output:
iinnssttcoru
So far I have written this, but from here I do not know how to sort the characters and display them properly. Any tips?
public class Main {
public static void main(String[] args) throws IOException {
String testString = " ";
BufferedReader rd = new BufferedReader(new InputStreamReader(System.in));
testString = rd.readLine();
Map<Character, List<Character>> map = new HashMap<>();
for (int i = 0; i < testString.length(); i++) {
char someChar = testString.charAt(i);
if (someChar == ' ') {
continue;
}
char ch = testString.charAt(i);
List<Character> characters = map.getOrDefault(Character.toLowerCase(ch), new ArrayList<>());
characters.add(ch);
map.put(Character.toLowerCase(ch), characters);
}
List<Map.Entry<Character, List<Character>>> list = new ArrayList<>(map.entrySet());
}
}

You can add a TreeMap, counterAppear, whose key is the number of repetitions of a character and whose value is the list of characters that have that number of repetitions. Each list needs to be sorted before printing to get the required order. Using a TreeMap ensures the map is sorted by key (the number of repetitions).
public static void main(String[] args) throws IOException {
String testString = " ";
BufferedReader rd = new BufferedReader(new InputStreamReader(System.in));
testString = rd.readLine();
Map<Character, List<Character>> map = new HashMap<>();
for (int i = 0; i < testString.length(); i++) {
char someChar = testString.charAt(i);
if (someChar == ' ') {
continue;
}
char ch = testString.charAt(i);
//Change to Optimize Code
Character keyCharacter = Character.toLowerCase(ch);
if (map.get(keyCharacter) == null) {
map.put(keyCharacter, new ArrayList<>());
}
List<Character> characters = map.get(keyCharacter);
characters.add(ch);
}
TreeMap<Integer, List<Character>> counterAppear = new TreeMap<>();
for (Map.Entry<Character, List<Character>> entry : map.entrySet()) {
Character character = entry.getKey();
int repeatCharTime = entry.getValue().size();
if (counterAppear.get(repeatCharTime) == null) {
counterAppear.put(repeatCharTime, new ArrayList<>());
}
List<Character> characters = counterAppear.get(repeatCharTime);
characters.add(character);
}
for (Integer repeatCharTime : counterAppear.descendingKeySet()) {
List<Character> keyCharacters = counterAppear.get(repeatCharTime);
Collections.sort(keyCharacters);
for (Character character : keyCharacters) {
for (int i = 0; i < repeatCharTime; i++) {
System.out.print(character);
}
}
}
}

Here's my solution:
import java.util.*;
public class Test
{
static void process(String s)
{
HashMap<Character,Integer> map = new HashMap<Character,Integer>();
for(Character c : s.toLowerCase().toCharArray())
{
Integer nb = map.get(c);
map.put(c, nb==null ? 1 : nb+1);
}
ArrayList<Map.Entry<Character,Integer>> list = new ArrayList<>(map.entrySet());
Collections.sort(list, (a,b) ->
{
int res = b.getValue().compareTo(a.getValue());
if(res!=0)
return res;
return a.getKey().compareTo(b.getKey());
});
for(Map.Entry<Character,Integer> e : list)
{
for(int i=0;i<e.getValue();i++)
System.out.print(e.getKey());
}
}
public static void main(String[] args)
{
process("Instructions");
}
}

How to sort two list in java concurrently?

I have two lists:
1. with words
2. with respective frequency counts
Now I want to sort both lists in descending order of frequency, so that the index of a word in the first list still matches the index of its frequency count in the second list.
Adding a function:
public String[] process() throws Exception
{
String[] ret = new String[20];
int c=0;
BufferedReader br = new BufferedReader(new FileReader(inputFileName));
String line = br.readLine();
List<String> result = new ArrayList<String>();
List<Integer> sorted = new ArrayList<Integer>();
List<String> key= new ArrayList<String>();
List<String> new_list = new ArrayList<String>();
int x=0;
while(line!=null){
StringTokenizer st = new StringTokenizer(line,delimiters);
String token = "";
while (st.hasMoreTokens()) {
token = st.nextToken();
//System.out.println(token);
if(token!=null)
{
//System.out.println(token);
result.add( x,token.toLowerCase());
//System.out.println("Key is" + x + "\t" + result.get(x));
x++;
}
}
line=br.readLine();
}
for(int w =0;w<x;w++){
c=0;
String copy=result.get(w);
int i;
for(i =0;i<stopWordsArray.length;i++){
if(copy.compareTo(stopWordsArray[i])==0){
c=1;
break;
}
}
if(c==0){
new_list.add(copy);
}
}
if(c==0){
Map<String, Integer> map = new HashMap<String, Integer>();
for (String temp : new_list) {
Integer count = map.get(temp);
map.put(temp, (count == null) ? 1 : count + 1);
}
int i=0;
int sort = 0;
String key1 = "";
for (Map.Entry<String, Integer> entry : map.entrySet()) {
sort = entry.getValue();
key1 = entry.getKey();
sorted.add(i,sort);
key.add(i,key1);
i++;
}
Integer maxi= Collections.max(sorted);
System.out.println(maxi);
Integer value = sorted.indexOf(maxi);
System.out.println(value);
System.out.println("Word is:" + key.get(value));
}
return ret; }
Here sorted is a list that contains the word frequencies and key is a list that contains the words.

One option is to create a class with two members: word and frequency. Create a Comparator or implement Comparable to sort based on the frequency, then implement toString() to print it however you like.

I don't completely understand the situation, but throwing this out there.
You could use a Map<String,Integer> to store your data with the mapping word -> frequency. If you use a TreeMap, it automatically sorts by key (the words, in your case). If you want to sort by value (frequency) instead, see the Stack Overflow post "TreeMap sort by value".
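Sorting by value can be sketched by copying the map's entries into a list and sorting that list with a comparator. This is only an illustration under assumptions: the class name SortByFrequency, the method sortedByCount, and the sample counts are invented here, and breaking ties alphabetically is a choice the question does not ask for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortByFrequency {
    // Returns the entries of counts ordered by count, highest first.
    // Ties are broken alphabetically (an assumption for this sketch).
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical word counts.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("hello", 2);
        counts.put("world", 5);
        counts.put("java", 5);

        for (Map.Entry<String, Integer> e : sortedByCount(counts)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Each entry keeps its word and its count together, so there is no need to keep two parallel lists in step.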

A Map will not work if there are duplicate words: the last value put for a key replaces the earlier one.
So the solution that comes to my mind is as follows:
SortingWordAndCounts.java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
public class SortingWordAndCounts {
public static void main(String args[]) {
ArrayList<WordFreq> wordFreqList = new ArrayList<WordFreq>();
for (int i = 10; i >= 0; i--) {
WordFreq wf = new WordFreq();
wf.setWord("Word" + (i + 1));
wf.setFrequency(i + 10);
wordFreqList.add(wf);
}
System.out.println("===== Unsorted Result=====");
for (WordFreq wf : wordFreqList) {
System.out.println(wf.word + "=" + wf.frequency);
}
System.out.println("===== sort by Word=====");
// Now Sort list and print
for (WordFreq wf : new SortingWordAndCounts().sortByWord(wordFreqList,"DESC")) {
System.out.println(wf.word + "=" + wf.frequency);
}
System.out.println("===== sort by Frequency=====");
// Now Sort list and print
for (WordFreq wf : new SortingWordAndCounts().sortByFrequency(wordFreqList,"DESC")) {
System.out.println(wf.word + "=" + wf.frequency);
}
}
public ArrayList<WordFreq> sortByWord(ArrayList<WordFreq> wordFreqList, String sortOrder) {
Comparator<WordFreq> comparator = new Comparator<WordFreq>() {
@Override
public int compare(WordFreq o1, WordFreq o2) {
if (sortOrder.equalsIgnoreCase("DESC"))
return o2.word.compareTo(o1.word);
else
return o1.word.compareTo(o2.word);
}
};
Collections.sort(wordFreqList, comparator);
return wordFreqList;
}
public ArrayList<WordFreq> sortByFrequency(ArrayList<WordFreq> wordFreqList, String sortOrder) {
Comparator<WordFreq> comparator = new Comparator<WordFreq>() {
@Override
public int compare(WordFreq o1, WordFreq o2) {
if (sortOrder.equalsIgnoreCase("DESC"))
return o2.frequency - o1.frequency;
else
return o1.frequency - o2.frequency;
}
};
Collections.sort(wordFreqList, comparator);
return wordFreqList;
}
}
Create the pojo:
WordFreq.java
public class WordFreq {
String word;
int frequency;
public String getWord() {
return word;
}
public void setWord(String word) {
this.word = word;
}
public int getFrequency() {
return frequency;
}
public void setFrequency(int frequency) {
this.frequency = frequency;
}
}
Hope it helps.

Java - Hashmapping a text file

Please excuse my ignorance; I have been puzzling over this for a while.
I have a huge .txt file containing mostly letters. I need to create HashMaps to store word length, word characters and word count. I have to print the longest word that occurred more than three times and show how many times it occurred.
I'm thinking of something like this:
private void readWords(){
BufferedReader in = new BufferedReader(new FileReader("text.txt"));
Map<Integer, Map<String, Integer>>
}
The problem is that I don't quite know how to store the data in a HashMap. Can anybody help, please?
Thank you!

import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
public class HashMapExample {
static String fileName = "text.txt";
private static Scanner input;
public static void main(String[] args) throws FileNotFoundException {
input = new Scanner(new File(fileName));
Map<String, Integer> map = new HashMap<String, Integer>();
while (input.hasNext()) {
String word = input.next();
if (map.containsKey(word)) {
int temp = map.get(word) + 1;
map.put(word, temp);
} else {
map.put(word, 1);
}
}
System.out.println("printing longest word(s) with word count < 3");
System.out.println("");
// iterate through the key set and display word, word length and values
System.out.printf("%-25s\t%-25s\t%s\n", "Word", "Word Length", "Count");
String longest = getLongest(map);
int valueOfLongest = 0;
if (!longest.equals("")) {
valueOfLongest = longest.length();
System.out.printf("%-25s\t%-25s\t%s\n", longest, longest.length(), map.get(longest));
map.remove(longest);
}
boolean isAllRemoved = false;
while (!isAllRemoved) {
isAllRemoved = false;
longest = getLongest(map);
if (!longest.equals("") && longest.length() == valueOfLongest){
System.out.printf("%-25s\t%-25s\t%s\n", longest, longest.length(), map.get(longest));
map.remove(longest);
} else
isAllRemoved = true;
}
System.out.println("");
System.out.println("printing next longest word(s) with word count > = 3");
System.out.println("");
// iterate through the key set and display word, word length and values
System.out.printf("%-25s\t%-25s\t%s\n", "Word", "Word Length", "Count");
String nextLongest = getNextLongest(map, valueOfLongest);
int valueOfNextLongest = 0;
if (!nextLongest.equals("")) {
valueOfNextLongest = nextLongest.length();
System.out.printf("%-25s\t%-25s\t%s\n", nextLongest, nextLongest.length(), map.get(nextLongest));
map.remove(nextLongest);
}
boolean isNextLongest = false;
while (!isNextLongest) {
isNextLongest = true;
nextLongest = getNextLongest(map, valueOfLongest);
if (!(nextLongest.equals("")) && nextLongest.length() == valueOfNextLongest) {
System.out.printf("%-25s\t%-25s\t%s\n", nextLongest, nextLongest.length(), map.get(nextLongest));
map.remove(nextLongest);
isNextLongest = false;
}
}
}
public static String getLongest(Map<String, Integer> map) {
String longest = "";
for (Map.Entry<String, Integer> entry : map.entrySet()) {
String key = (String) entry.getKey();
if (longest.length() < key.length() && map.get(key) < 3) {
longest = key;
}
}
return longest;
}
public static String getNextLongest(Map<String, Integer> map,
int valueOfLongest) {
String nextLongest = "";
for (Map.Entry<String, Integer> entry : map.entrySet()) {
String key = (String) entry.getKey();
if (valueOfLongest > key.length() && nextLongest.length() < key.length() && map.get(key) >= 3) {
nextLongest = key;
}
}
return nextLongest;
}
}

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
public class CountWord {
public static void main(String args[]) throws IOException {
FileReader fr = new FileReader("c:/a.txt");
BufferedReader br = new BufferedReader(fr);
// init the longest size 0
int longestSize = 0;
String s = null;
// may be some word have the same length
Set<String> finalAnswerSet = new HashSet<String>();
Multiset<String> everyWordSet = HashMultiset.create();
while (br != null && (s = br.readLine()) != null) {
// put every word into the everyWordSet
everyWordSet.add(s);
// we care about the word appear 3+ times
if (everyWordSet.count(s) > 3) {
if (s.length() > longestSize) {
//if s'length is the longest,clear the finalAnswerSet and put s into it
longestSize = s.length();
finalAnswerSet.clear();
finalAnswerSet.add(s);
} else if (s.length() == longestSize) {
// finalAnswerSet may contains multi values
finalAnswerSet.add(s);
}
}
}
// and now we have the longestSize,and finalAnswerSet contains the answers,let's check it
System.out.println("The longest size is:" + longestSize);
for (String answer : finalAnswerSet) {
System.out.println("The word is :" + answer);
System.out.println("The word appears time is:" + everyWordSet.count(answer));
}
//don't forget to close the resource
br.close();
fr.close();
}
}
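For comparison, the task from the question (the longest word that occurs more than three times) can also be sketched with only the standard library. This is a minimal sketch under assumptions, not the code from either answer: the class name LongestFrequentWord and its methods are made up for illustration, the text is passed in as a String rather than read from text.txt, and a word is assumed to be a run of the letters a-z after lower-casing.

```java
import java.util.HashMap;
import java.util.Map;

class LongestFrequentWord {
    // Counts how often each word occurs in the text.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("[^a-z]+")) {
            if (!word.isEmpty()) {
                // merge() stores 1 for a new word, otherwise adds 1 to the old count.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the longest word whose count is greater than minCount,
    // or an empty string if there is none. If several words share the
    // maximum length, which one is returned is unspecified.
    static String longestAbove(Map<String, Integer> counts, int minCount) {
        String best = "";
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > minCount && e.getKey().length() > best.length()) {
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical sample text standing in for the contents of the file.
        String text = "cat cat cat cat elephant elephant hippopotamus a a a a a";
        Map<String, Integer> counts = countWords(text);
        String longest = longestAbove(counts, 3);
        System.out.println(longest + " occurs " + counts.get(longest) + " times");
    }
}
```

A single Map<String, Integer> is enough here: the word length does not need its own map because it can be read from the key with length().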

How to count words in array of strings in java?

I am learning about arrays and I want to write a program that counts words. Given String[] myWords = {"soon; hi; also; soon; job; also"}; I have to create a method like countWords(myWords).
The printed result should be the words in alphabetical order, the number of unique words, and the total number of words.
Here is my code:
public class Words {
public static void main(String[] args){
String[] myWords = {"soon; hi; also; soon; job; mother; job; also; soon; later"};
Words myW= new Words();
myW.countWords();
System.out.println("\tWords \tFreq");
}
public static String[] countWords(myWords){
for (int i=0; i<myWords.length; i++){
String temp = myWords[i];
//System.out.println(temp + " ");
for(int j=i+1; j<myWords.length; j++){
String temp2= myWords[j];
System.out.println("No. of unique words: " );
}
}
}
}
What should I do next?

import java.io.*;
import java.util.*;
public class Count_Words_Scan
{
public static void main(String[] args) throws IOException
{
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
System.out.println("ENTER A STRING ");
String str = br.readLine();
str= str.toLowerCase();
int c=0;
Scanner sc = new Scanner(str);
while(sc.hasNext())
{
sc.next();
c++;
}
System.out.println("NO.OF WORDS = "+c);
}
}
Input: the word counter
Output: NO.OF WORDS = 3

I would suggest you take a look at split, trim and HashSet.
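As a rough illustration of that suggestion, the sketch below splits the string on ";", trims each piece, and collects the pieces in a TreeSet, which is used here instead of a HashSet so that the words also come out in alphabetical order. The class name UniqueWords and its methods are made up for this example.

```java
import java.util.SortedSet;
import java.util.TreeSet;

class UniqueWords {
    // Returns the distinct words of a ';'-separated string in alphabetical order.
    static SortedSet<String> uniqueSorted(String joined) {
        SortedSet<String> unique = new TreeSet<>();
        for (String part : joined.split(";")) {
            String word = part.trim();
            if (!word.isEmpty()) {
                unique.add(word); // a set silently ignores duplicates
            }
        }
        return unique;
    }

    // Returns the total number of words, duplicates included.
    static int totalWords(String joined) {
        int total = 0;
        for (String part : joined.split(";")) {
            if (!part.trim().isEmpty()) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String myWords = "soon; hi; also; soon; job; mother; job; also; soon; later";
        SortedSet<String> unique = uniqueSorted(myWords);
        System.out.println("Words: " + unique);
        System.out.println("No. of unique words: " + unique.size());
        System.out.println("Total words: " + totalWords(myWords));
    }
}
```

For the sample string this gives six unique words out of ten in total.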

I am assuming you want to count the words in a string.
String: "soon hi also soon job mother job also soon later"
public class Words {
Map<String , Integer> dictionary=new HashMap<String,Integer>();
public static void main(String[] args) {
String myWords = "soon hi also soon job mother job also soon later";
Words myW = new Words();
String[] array=myWords.split("\\s+");
myW.countWords(array);
System.out.println(myW.dictionary);
}
private void countWords(String[] myWords) {
for(String s:myWords){
if(dictionary.containsKey(s))
dictionary.put(s, dictionary.get(s)+1);
else
dictionary.put(s, 1);
}
}
}
O/P : {mother=1, later=1, job=2, hi=1, also=2, soon=3}

First you need to split your String, presumably on ";". Then you can put the pieces into a TreeSet to sort them and make the words unique. Add a counter to count the total words. You could also use a TreeMap to keep a count of each word, overriding the put method on the map to aggregate as you go...
final String myString = "soon; hi; also; soon; job; mother; job; also; soon; later";
final String[] myStrings = myString.split(";");
final Map<String, Integer> myStringMap = new TreeMap<String, Integer>(){
@Override
public Integer put(final String key, final Integer value) {
if(containsKey(key)) {
return super.put(key, get(key) + 1);
} else {
return super.put(key, 1);
}
}
};
for(final String string : myStrings) {
myStringMap.put(string.trim(), 1);
}
Now myStringMap.size() is the number of unique words, myStringMap.keySet() is an alphabetically sorted Set of all unique words, and if you want the total you just need to add up the values:
int totalWords = 0;
for(final Integer count : myStringMap.values()) {
totalWords += count;
}

Finding repeated words on a string and counting the repetitions

I need to find repeated words on a string, and then count how many times they were repeated. So basically, if the input string is this:
String s = "House, House, House, Dog, Dog, Dog, Dog";
I need to create a new string list without repetitions and save somewhere else the amount of repetitions for each word, like such:
New String: "House, Dog"
New Int Array: [3, 4]
Is there a way to do this easily with Java? I've managed to separate the string using s.split() but then how do I count repetitions and eliminate them on the new string? Thanks!

You've got the hard work done. Now you can just use a Map to count the occurrences:
Map<String, Integer> occurrences = new HashMap<String, Integer>();
for ( String word : splitWords ) {
Integer oldCount = occurrences.get(word);
if ( oldCount == null ) {
oldCount = 0;
}
occurrences.put(word, oldCount + 1);
}
Using occurrences.get(word) will tell you how many times a word occurred. You can construct a new list by iterating through occurrences.keySet():
for ( String word : occurrences.keySet() ) {
//do something with word
}
Note that the order of what you get out of keySet is arbitrary. If you need the words to be sorted by when they first appear in your input String, you should use a LinkedHashMap instead.

Try this,
public class DuplicateWordSearcher {
@SuppressWarnings("unchecked")
public static void main(String[] args) {
String text = "a r b k c d se f g a d f s s f d s ft gh f ws w f v x s g h d h j j k f sd j e wed a d f";
List<String> list = Arrays.asList(text.split(" "));
Set<String> uniqueWords = new HashSet<String>(list);
for (String word : uniqueWords) {
System.out.println(word + ": " + Collections.frequency(list, word));
}
}
}

public class StringsCount{
public static void main(String args[]) {
String value = "This is testing Program testing Program";
String item[] = value.split(" ");
HashMap<String, Integer> map = new HashMap<>();
for (String t : item) {
if (map.containsKey(t)) {
map.put(t, map.get(t) + 1);
} else {
map.put(t, 1);
}
}
Set<String> keys = map.keySet();
for (String key : keys) {
System.out.println(key);
System.out.println(map.get(key));
}
}
}

As others have mentioned, use String::split(), followed by some map (HashMap or LinkedHashMap), and then merge your results. For completeness' sake, here is the code.
import java.util.*;
public class Genric<E>
{
public static void main(String[] args)
{
Map<String, Integer> unique = new LinkedHashMap<String, Integer>();
for (String string : "House, House, House, Dog, Dog, Dog, Dog".split(", ")) {
if(unique.get(string) == null)
unique.put(string, 1);
else
unique.put(string, unique.get(string) + 1);
}
String uniqueString = join(unique.keySet(), ", ");
List<Integer> value = new ArrayList<Integer>(unique.values());
System.out.println("Output = " + uniqueString);
System.out.println("Values = " + value);
}
public static String join(Collection<String> s, String delimiter) {
StringBuffer buffer = new StringBuffer();
Iterator<String> iter = s.iterator();
while (iter.hasNext()) {
buffer.append(iter.next());
if (iter.hasNext()) {
buffer.append(delimiter);
}
}
return buffer.toString();
}
}
New String is Output = House, Dog
Int array (or rather list) Values = [3, 4] (you can use List::toArray) for getting an array.

Using java8
private static void findWords(String s, List<String> output, List<Integer> count){
String[] words = s.split(", ");
Map<String, Integer> map = new LinkedHashMap<>();
Arrays.stream(words).forEach(e->map.put(e, map.getOrDefault(e, 0) + 1));
map.forEach((k,v)->{
output.add(k);
count.add(v);
});
}
Also, use a LinkedHashMap if you want to preserve the order of insertion
private static void findWords(){
String s = "House, House, House, Dog, Dog, Dog, Dog";
List<String> output = new ArrayList<>();
List<Integer> count = new ArrayList<>();
findWords(s, output, count);
System.out.println(output);
System.out.println(count);
}
Output
[House, Dog]
[3, 4]

If this is homework, then all I can say is: use String.split() and HashMap<String,Integer>.
(I see you've found split() already. You're along the right lines then.)

It may help you somehow.
String st="I am am not the one who is thinking I one thing at time";
String []ar = st.split("\\s");
Map<String, Integer> mp= new HashMap<String, Integer>();
int count=0;
for(int i=0;i<ar.length;i++){
count=0;
for(int j=0;j<ar.length;j++){
if(ar[i].equals(ar[j])){
count++;
}
}
mp.put(ar[i], count);
}
System.out.println(mp);

Once you have got the words from the string it is easy.
From Java 10 onwards you can try the following code:
import java.util.Arrays;
import java.util.stream.Collectors;
public class StringFrequencyMap {
public static void main(String... args) {
String[] wordArray = {"House", "House", "House", "Dog", "Dog", "Dog", "Dog"};
var freq = Arrays.stream(wordArray)
.collect(Collectors.groupingBy(x -> x, Collectors.counting()));
System.out.println(freq);
}
}
Output (the order of the entries is not guaranteed):
{House=3, Dog=4}

You can use Prefix tree (trie) data structure to store words and keep track of count of words within Prefix Tree Node.
#define ALPHABET_SIZE 26
// Structure of each node of prefix tree
struct prefix_tree_node {
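prefix_tree_node() : count(0) {
// Start with no children so the NULL checks in the insert function are valid
for(int i=0;i<ALPHABET_SIZE;++i)
child[i] = NULL;
}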
int count;
prefix_tree_node *child[ALPHABET_SIZE];
};
void insert_string_in_prefix_tree(string word)
{
prefix_tree_node *current = root;
for(unsigned int i=0;i<word.size();++i){
// Assuming it has only alphabetic lowercase characters
// Note ::::: Change this check or convert into lower case
const unsigned int letter = static_cast<int>(word[i] - 'a');
// Invalid alphabetic character: reject it
// Note :::: Change this condition depending on the scenario
if(letter >= ALPHABET_SIZE)
throw runtime_error("Invalid alphabetic character");
if(current->child[letter] == NULL)
current->child[letter] = new prefix_tree_node();
current = current->child[letter];
}
current->count++;
// Insert this string into Max Heap and sort them by counts
}
// Data structure for storing in Heap will be something like this
struct MaxHeapNode {
int count;
string word;
};
After inserting all words, you have to print word and count by iterating Maxheap.
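Since the question is about Java, here is a rough Java sketch of the counting part of the same idea. It is an illustration under assumptions rather than a port of the code above: the class name WordTrie and its methods are invented here, words are lower-cased and restricted to the letters a-z, and the max-heap step is left out.

```java
import java.util.Locale;

class WordTrie {
    private static final int ALPHABET_SIZE = 26;

    // One node per prefix; count is the number of inserted words ending here.
    private static final class Node {
        int count;
        final Node[] child = new Node[ALPHABET_SIZE];
    }

    private final Node root = new Node();

    // Inserts a word, creating missing nodes, and increments its count.
    void insert(String word) {
        Node current = root;
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            int letter = c - 'a';
            if (letter < 0 || letter >= ALPHABET_SIZE) {
                throw new IllegalArgumentException("Not a letter a-z: " + c);
            }
            if (current.child[letter] == null) {
                current.child[letter] = new Node();
            }
            current = current.child[letter];
        }
        current.count++;
    }

    // Returns how many times the word was inserted (0 if never).
    int count(String word) {
        Node current = root;
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            int letter = c - 'a';
            if (letter < 0 || letter >= ALPHABET_SIZE) {
                return 0;
            }
            current = current.child[letter];
            if (current == null) {
                return 0;
            }
        }
        return current.count;
    }

    public static void main(String[] args) {
        WordTrie trie = new WordTrie();
        for (String word : "House, House, House, Dog, Dog, Dog, Dog".split(", ")) {
            trie.insert(word);
        }
        System.out.println("house: " + trie.count("house"));
        System.out.println("dog: " + trie.count("dog"));
    }
}
```

For word lists of ordinary size a HashMap, as in the other answers, is simpler; a trie mainly pays off when many words share prefixes or when prefix queries are needed.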

//program to find number of repeating characters in a string
//Developed by Subash<subash_senapati#ymail.com>
import java.util.Scanner;
public class NoOfRepeatedChar
{
public static void main(String []args)
{
//input through key board
Scanner sc = new Scanner(System.in);
System.out.println("Enter a string :");
String s1= sc.nextLine();
//formatting String to char array
String s2=s1.replace(" ","");
char [] ch=s2.toCharArray();
int counter=0;
//for-loop to compare each character with the whole character array
for(int i=0;i<ch.length;i++)
{
int count=0;
for(int j=0;j<ch.length;j++)
{
if(ch[i]==ch[j])
count++; //if character is matching with others
}
if(count>1)
{
boolean flag=false;
//for-loop to check whether the character is already referenced or not
for (int k=i-1;k>=0 ;k-- )
{
if(ch[i] == ch[k] ) //if the character is already referenced
flag=true;
}
if( !flag ) //if(flag==false)
counter=counter+1;
}
}
if(counter > 0) //if there is/are any repeating characters
System.out.println("Number of repeating characters in the given string is/are " +counter);
else
System.out.println("Sorry there is/are no repeating characters in the given string");
}
}

public static void main(String[] args) {
String s="sdf sdfsdfsd sdfsdfsd sdfsdfsd sdf sdf sdf ";
String st[]=s.split(" ");
System.out.println(st.length);
Map<String, Integer> mp= new TreeMap<String, Integer>();
for(int i=0;i<st.length;i++){
Integer count=mp.get(st[i]);
if(count == null){
count=0;
}
mp.put(st[i],++count);
}
System.out.println(mp.size());
System.out.println(mp.get("sdfsdfsd"));
}

If you pass a String argument, it will count the repetitions of each word
/**
* @param str the text to scan
* @return a map from each word to its number of repetitions
*/
public Map<String, Integer> findDuplicateString(String str) {
String[] stringArrays = str.split(" ");
Map<String, Integer> map = new HashMap<String, Integer>();
Set<String> words = new HashSet<String>(Arrays.asList(stringArrays));
int count = 0;
for (String word : words) {
for (String temp : stringArrays) {
if (word.equals(temp)) {
++count;
}
}
map.put(word, count);
count = 0;
}
return map;
}
output:
word1=2, word2=4, word3=1, ...
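The nested loops above rescan the whole array once per distinct word. As an alternative, a single pass with `Map.merge` should give the same counts. This is only a sketch: the class name `WordCountSketch` and the method name `countWords` are made up for illustration, and splitting on runs of whitespace (`\\s+`) is an assumption about the input.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of the answer above.
class WordCountSketch {
    // Counts each whitespace-separated word in a single pass.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // a blank input still yields one empty token
            }
            counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {this=2, is=3, what=2, it=2, can=1, be=1}
        System.out.println(countWords("this is what it is this is what it can be"));
    }
}
```

A `LinkedHashMap` is used here so the words print in the order they first appear; a plain `HashMap` would count just as well.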

import java.util.HashMap;
import java.util.LinkedHashMap;
public class CountRepeatedWords {
public static void main(String[] args) {
countRepeatedWords("Note that the order of what you get out of keySet is arbitrary. If you need the words to be sorted by when they first appear in your input String, you should use a LinkedHashMap instead.");
}
public static void countRepeatedWords(String wordToFind) {
String[] words = wordToFind.split(" ");
HashMap<String, Integer> wordMap = new LinkedHashMap<String, Integer>();
for (String word : words) {
wordMap.put(word,
(wordMap.get(word) == null ? 1 : (wordMap.get(word) + 1)));
}
System.out.println(wordMap);
}
}

I hope this will help you
public void countInPara(String str) {
Map<Integer,String> strMap = new HashMap<Integer,String>();
List<String> paraWords = Arrays.asList(str.split(" "));
Set<String> strSet = new LinkedHashSet<>(paraWords);
int count;
for(String word : strSet) {
count = Collections.frequency(paraWords, word);
strMap.put(count, strMap.get(count)==null ? word : strMap.get(count).concat(","+word));
}
for(Map.Entry<Integer,String> entry : strMap.entrySet())
System.out.println(entry.getKey() +" :: "+ entry.getValue());
}

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
public class DuplicateWord {
public static void main(String[] args) {
String para = "this is what it is this is what it can be";
List < String > paraList = new ArrayList < String > ();
paraList = Arrays.asList(para.split(" "));
System.out.println(paraList);
int size = paraList.size();
int i = 0;
Map < String, Integer > duplicatCountMap = new HashMap < String, Integer > ();
for (int j = 0; size > j; j++) {
int count = 0;
for (i = 0; size > i; i++) {
if (paraList.get(j).equals(paraList.get(i))) {
count++;
duplicatCountMap.put(paraList.get(j), count);
}
}
}
System.out.println(duplicatCountMap);
List < Integer > myCountList = new ArrayList < > ();
Set < String > myValueSet = new HashSet < > ();
for (Map.Entry < String, Integer > entry: duplicatCountMap.entrySet()) {
myCountList.add(entry.getValue());
myValueSet.add(entry.getKey());
}
System.out.println(myCountList);
System.out.println(myValueSet);
}
}
Input: this is what it is this is what it can be
Output:
[this, is, what, it, is, this, is, what, it, can, be]
{can=1, what=2, be=1, this=2, is=3, it=2}
[1, 2, 1, 2, 3, 2]
[can, what, be, this, is, it]

import java.util.HashMap;
import java.util.Scanner;
public class class1 {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
String inpStr = in.nextLine();
int key;
HashMap<String,Integer> hm = new HashMap<String,Integer>();
String[] strArr = inpStr.split(" ");
for(int i=0;i<strArr.length;i++){
if(hm.containsKey(strArr[i])){
key = hm.get(strArr[i]);
hm.put(strArr[i],key+1);
}
else{
hm.put(strArr[i],1);
}
}
System.out.println(hm);
}
}

Please try the code below. It is the simplest approach I found; I hope you like it:
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;
public class MostRepeatingWord {
String mostRepeatedWord(String s){
String[] splitted = s.split(" ");
List<String> listString = Arrays.asList(splitted);
Set<String> setString = new HashSet<String>(listString);
int count = 0;
int maxCount = 1;
String maxRepeated = null;
for(String inp: setString){
count = Collections.frequency(listString, inp);
if(count > maxCount){
maxCount = count;
maxRepeated = inp;
}
}
return maxRepeated;
}
public static void main(String[] args)
{
System.out.println("Enter The Sentence: ");
Scanner s = new Scanner(System.in);
String input = s.nextLine();
MostRepeatingWord mrw = new MostRepeatingWord();
System.out.println("Most repeated word is: " + mrw.mostRepeatedWord(input));
}
}

package day2;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
public class DuplicateWords {
public static void main(String[] args) {
String S1 = "House, House, House, Dog, Dog, Dog, Dog";
String S2 = S1.toLowerCase();
// split on a comma followed by optional whitespace so the tokens carry no commas
String[] S3 = S2.split(",\\s*");
List<String> a1 = new ArrayList<String>();
HashMap<String, Integer> hm = new HashMap<>();
// iterate over every token, including the last one
for (int i = 0; i < S3.length; i++) {
if(!a1.contains(S3[i]))
{
a1.add(S3[i]);
}
else
{
continue;
}
int Count = 0;
for (int j = 0; j < S3.length; j++)
{
if(S3[j].equals(S3[i]))
{
Count++;
}
}
hm.put(S3[i], Count);
}
System.out.println("Duplicate Words and their number of occurrences in String S1 : " + hm);
}
}

public class Counter {
private static final int COMMA_AND_SPACE_PLACE = 2;
private String mTextToCount;
private ArrayList<String> mSeparateWordsList;
public Counter(String mTextToCount) {
this.mTextToCount = mTextToCount;
mSeparateWordsList = cutStringIntoSeparateWords(mTextToCount);
}
private ArrayList<String> cutStringIntoSeparateWords(String text)
{
ArrayList<String> returnedArrayList = new ArrayList<>();
if(text.indexOf(',') == -1)
{
returnedArrayList.add(text);
return returnedArrayList;
}
int position1 = 0;
int position2 = 0;
while(position2 < text.length())
{
char c = ',';
if(text.toCharArray()[position2] == c)
{
String tmp = text.substring(position1, position2);
position1 += tmp.length() + COMMA_AND_SPACE_PLACE;
returnedArrayList.add(tmp);
}
position2++;
}
if(position1 < position2)
{
returnedArrayList.add(text.substring(position1, position2));
}
return returnedArrayList;
}
public int[] countWords()
{
if(mSeparateWordsList == null) return null;
HashMap<String, Integer> wordsMap = new HashMap<>();
for(String s: mSeparateWordsList)
{
int cnt;
if(wordsMap.containsKey(s))
{
cnt = wordsMap.get(s);
cnt++;
} else {
cnt = 1;
}
wordsMap.put(s, cnt);
}
return printCounterResults(wordsMap);
}
private int[] printCounterResults(HashMap<String, Integer> m)
{
int index = 0;
int[] returnedIntArray = new int[m.size()];
for(int i: m.values())
{
returnedIntArray[index] = i;
index++;
}
return returnedIntArray;
}
}

/* Count the number of occurrences of each word using a TreeMap. A HashMap also works, but the words will not be displayed in sorted order. */
import java.util.*;
public class Genric3
{
public static void main(String[] args)
{
Map<String, Integer> unique = new TreeMap<String, Integer>();
String string1="Ram:Ram: Dog: Dog: Dog: Dog:leela:leela:house:house:shayam";
String string2[]=string1.split(":");
for (int i=0; i<string2.length; i++)
{
String string=string2[i];
unique.put(string,(unique.get(string) == null?1:(unique.get(string)+1)));
}
System.out.println(unique);
}
}
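The comment at the top of that program says the choice of map decides the print order. A small sketch of the difference, assuming a space-separated input; the class name `MapOrderSketch` and the method `count` are hypothetical names used only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class for illustration only.
class MapOrderSketch {
    // Fills the given map with word counts; the map implementation decides the iteration order.
    static Map<String, Integer> count(String[] words, Map<String, Integer> target) {
        for (String word : words) {
            target.merge(word, 1, Integer::sum);
        }
        return target;
    }

    public static void main(String[] args) {
        String[] words = "house dog house cat dog house".split(" ");
        // TreeMap iterates in sorted key order: {cat=1, dog=2, house=3}
        System.out.println(count(words, new TreeMap<String, Integer>()));
        // LinkedHashMap iterates in first-seen order: {house=3, dog=2, cat=1}
        System.out.println(count(words, new LinkedHashMap<String, Integer>()));
    }
}
```

A plain `HashMap` gives the same counts but makes no promise about iteration order.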

//program to count how often each word occurs in a string
//Developed by Rahul Lakhmara
import java.util.*;
public class CountWordsInString {
public static void main(String[] args) {
String original = "I am rahul am i sunil so i can say am i";
// making String type of array
String[] originalSplit = original.split(" ");
// if word has only one occurrence
int count = 1;
// LinkedHashMap will store the word as key and number of occurrence as
// value
Map<String, Integer> wordMap = new LinkedHashMap<String, Integer>();
// visit every word, including the last one (a unique last word would otherwise be skipped)
for (int i = 0; i < originalSplit.length; i++) {
for (int j = i + 1; j < originalSplit.length; j++) {
if (originalSplit[i].equals(originalSplit[j])) {
// Increment in count, it will count how many time word
// occurred
count++;
}
}
// if word is already present so we will not add in Map
if (wordMap.containsKey(originalSplit[i])) {
count = 1;
} else {
wordMap.put(originalSplit[i], count);
count = 1;
}
}
Set<Map.Entry<String, Integer>> word = wordMap.entrySet();
Iterator<Map.Entry<String, Integer>> itr = word.iterator();
while (itr.hasNext()) {
Map.Entry<String, Integer> map = itr.next();
// Printing
System.out.println(map.getKey() + " " + map.getValue());
}
}
}

public static void main(String[] args){
String string = "elamparuthi, elam, elamparuthi";
String[] s = string.replace(" ", "").split(",");
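// A LinkedHashSet drops duplicates while keeping first-seen order.
// (A substring check such as ops.contains("elam") would wrongly match "elamparuthi".)
java.util.Set<String> unique = new java.util.LinkedHashSet<>(java.util.Arrays.asList(s));
System.out.println(String.join(", ", unique));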
}

For strings with no spaces, we can use the code below:
private static void findRecurrence(String input) {
final Map<String, Integer> map = new LinkedHashMap<>();
for(int i=0; i<input.length(); ) {
int pointer = i;
int startPointer = i;
boolean pointerHasIncreased = false;
for(int j=0; j<startPointer; j++){
if(pointer<input.length() && input.charAt(j)==input.charAt(pointer) && input.charAt(j)!=32){
pointer++;
pointerHasIncreased = true;
}else{
if(pointerHasIncreased){
break;
}
}
}
if(pointer - startPointer >= 2) {
String word = input.substring(startPointer, pointer);
if(map.containsKey(word)){
map.put(word, map.get(word)+1);
}else{
map.put(word, 1);
}
i=pointer;
}else{
i++;
}
}
for(Map.Entry<String, Integer> entry : map.entrySet()){
System.out.println(entry.getKey() + " = " + (entry.getValue()+1));
}
}
Passing inputs such as "hahaha", "ba na na" or "xxxyyyzzzxxxzzz" gives the desired output.

Hope this helps:
public static int countOfStringInAText(String stringToBeSearched, String masterString){
int count = 0;
while (masterString.indexOf(stringToBeSearched)>=0){
count = count + 1;
masterString = masterString.substring(masterString.indexOf(stringToBeSearched)+1);
}
return count;
}
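Note that the method above moves forward by one character after each match, so overlapping matches are counted: "aa" is found 3 times in "aaaa". If non-overlapping matches are wanted instead, one option is to advance by the length of the search string using `String.indexOf(String, int)`. A minimal sketch; the class name, the method name and the `allowOverlap` flag are hypothetical, and returning 0 for an empty search string is an assumption.

```java
// Hypothetical helper class for illustration only.
class OccurrenceCountSketch {
    // Counts occurrences of needle in haystack; overlapping matches count only if allowOverlap is true.
    static int countOccurrences(String needle, String haystack, boolean allowOverlap) {
        if (needle.isEmpty()) {
            return 0; // assumption: an empty search string matches nothing
        }
        int step = allowOverlap ? 1 : needle.length();
        int count = 0;
        int from = haystack.indexOf(needle);
        while (from >= 0) {
            count++;
            from = haystack.indexOf(needle, from + step); // continue searching after this match
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("aa", "aaaa", true));  // prints 3
        System.out.println(countOccurrences("aa", "aaaa", false)); // prints 2
    }
}
```

Searching from an index avoids creating a new substring on every iteration.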

package string;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
public class DublicatewordinanArray {
public static void main(String[] args) {
String str = "This is Dileep Dileep Kumar Verma Verma";
DuplicateString(str);
}
public static void DuplicateString(String str) {
String word[] = str.split(" ");
Map < String, Integer > map = new HashMap < String, Integer > ();
for (String w: word)
if (!map.containsKey(w)) {
map.put(w, 1);
}
else {
map.put(w, map.get(w) + 1);
}
Set < Map.Entry < String, Integer >> entrySet = map.entrySet();
for (Map.Entry < String, Integer > entry: entrySet)
if (entry.getValue() > 1) {
System.out.printf("%s : %d %n", entry.getKey(), entry.getValue());
}
}
}

Using Java 8 streams collectors:
public static Map<String, Integer> countRepetitions(String str) {
return Arrays.stream(str.split(", "))
.collect(Collectors.toMap(s -> s, s -> 1, Integer::sum));
}
Input: "House, House, House, Dog, Dog, Dog, Dog, Cat"
Output: {Cat=1, House=3, Dog=4}

Please try this; it may help you.
public static void main(String[] args) {
String str1="House, House, House, Dog, Dog, Dog, Dog";
String str2=str1.replace(",", "");
Map<String,Integer> map=findFrquenciesInString(str2);
Set<String> keys=map.keySet();
Collection<Integer> vals=map.values();
System.out.println(keys);
System.out.println(vals);
}
private static Map<String,Integer> findFrquenciesInString(String str1) {
String[] strArr=str1.split(" ");
Map<String,Integer> map=new HashMap<>();
for(int i=0;i<strArr.length;i++) {
int count=1;
for(int j=i+1;j<strArr.length;j++) {
if(strArr[i].equals(strArr[j]) && !"-1".equals(strArr[i])) {
strArr[j]="-1";
count++;
}
}
if(count>1 && !"-1".equals(strArr[i])) {
map.put(strArr[i], count);
strArr[i]="-1";
}
}
return map;
}

The introduction of streams has changed the way we code, so I would like to add a few ways of doing this with them:
String[] strArray = str.split(" ");
//1. All string value with their occurrences
Map<String, Long> counterMap =
Arrays.stream(strArray).collect(Collectors.groupingBy(e->e, Collectors.counting()));
//2. only duplicating Strings
Map<String, Long> temp = counterMap.entrySet().stream().filter(map->map.getValue() > 1).collect(Collectors.toMap(map -> map.getKey(), map -> map.getValue()));
System.out.println("test : "+temp);
//3. List of Duplicating Strings
List<String> masterStrings = Arrays.asList(strArray);
Set<String> duplicatingStrings =
masterStrings.stream().filter(i -> Collections.frequency(masterStrings, i) > 1).collect(Collectors.toSet());
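If the goal is the most frequent words rather than the full map, the counts from `groupingBy` can be sorted by value and truncated. The following is a sketch under assumptions: the class name `TopWordsSketch`, the method `topWords` and the alphabetical tie-break are choices made here for illustration, not something the answers above prescribe.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration only.
class TopWordsSketch {
    // Returns up to n words ordered by descending count; ties are broken alphabetically.
    static List<String> topWords(String text, int n) {
        Map<String, Long> counts = Arrays.stream(text.split("\\s+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        Comparator<Map.Entry<String, Long>> order = (a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue()); // higher count first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey()); // then alphabetical
        };
        return counts.entrySet().stream()
                .sorted(order)
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [is, it, this]
        System.out.println(topWords("this is what it is this is what it can be", 3));
    }
}
```

The explicit tie-break keeps the result deterministic, since a `HashMap` does not guarantee any iteration order.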

Use Function.identity() inside Collectors.groupingBy and store everything in a Map.
String a = "Gini Gina Gina Gina Gina Protijayi Protijayi ";
Map<String, Long> map11 = Arrays.stream(a.split(" ")).collect(Collectors
.groupingBy(Function.identity(),Collectors.counting()));
System.out.println(map11);
// output => {Gina=4, Gini=1, Protijayi=2}
In Python we can use collections.Counter():
import collections
a = "Roopa Roopi loves green color Roopa Roopi"
words = a.split()
wordsCount = collections.Counter(words)
for word,count in sorted(wordsCount.items()):
print('"%s" is repeated %d time%s.' % (word,count,"s" if count > 1 else "" ))
Output :
"Roopa" is repeated 2 times.
"Roopi" is repeated 2 times.
"color" is repeated 1 time.
"green" is repeated 1 time.
"loves" is repeated 1 time.
