I want to list every unique word in a text file and how many times each word occurs in it.
I tried using a nested for loop, but I'm not sure how to skip words that have already been counted.
for (int i = 0; i < words.size(); i++) {
count = 1;
//Count each word in the file and store it in variable count
for (int j = i + 1; j < words.size(); j++) {
if (words.get(i).equals(words.get(j))) {
count++;
}
}
System.out.println("The word " + words.get(i) + " can be found " + count + " times in the file.");
}
The contents of the text file are "Hello world. Hello world.", and the program prints the following:
The word Hello can be found 2 times in the file.
The word world can be found 2 times in the file.
The word Hello can be found 1 times in the file.
The word world can be found 1 times in the file.
I would suggest using a HashMap to solve this problem. Simply put, a HashMap stores key-value pairs, hashes the keys, and has an average lookup complexity of O(1).
Iterate over the list of words only once and store each encountered word in the HashMap. When you encounter a word, check whether it already exists in the HashMap. If it does not, add it to the map with the word itself as the key and 1 as the value.
If the word already exists, increase its value by 1.
After completing the iteration, the HashMap will contain key-value pairs of unique words and their counts.
In case you are not familiar with maps in Java: https://www.javatpoint.com/java-hashmap
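The steps described above can be sketched roughly as follows. This is a minimal illustration, not the only way to do it: the class name WordCountSketch and the method countWords are made up for this example, and it uses LinkedHashMap (a HashMap subclass) only so that the words print in first-seen order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name, used only for this illustration.
class WordCountSketch {

    // Counts every word in a single pass over the list.
    static Map<String, Integer> countWords(List<String> words) {
        // LinkedHashMap is an assumption made here to keep first-seen order;
        // a plain HashMap counts the same way but iterates in arbitrary order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : words) {
            if (counts.containsKey(word)) {
                counts.put(word, counts.get(word) + 1); // seen before: add 1
            } else {
                counts.put(word, 1);                    // first occurrence
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumes the file has already been split into words without punctuation.
        List<String> words = Arrays.asList("Hello", "world", "Hello", "world");
        for (Map.Entry<String, Integer> entry : countWords(words).entrySet()) {
            System.out.println("The word " + entry.getKey() + " can be found "
                    + entry.getValue() + " times in the file.");
        }
    }
}
```

With the four words above, each unique word is reported exactly once, with a count of 2.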
You need to use an ArrayList to store the already found words, and after that, you need to check every word in the file, whether it is present within the ArrayList or not. If the word is present inside the ArrayList, you need to ignore that word. Otherwise, add that word to the ArrayList.
Sample code:
static ArrayList<String> found_words=new ArrayList<String>(); // static so that main and is_present can access it
public static void main(String arguments[])
{
String data=""; //data from your file
String[] words=data.split("\\s"); //split the string into individual words
for(int i=0;i<words.length;i++)
{
String current_word=words[i];
if(!is_present(current_word))
{
found_words.add(current_word);
int count=1;
for(int j=i+1;j<words.length;j++)
{
if(words[j].equals(words[i]))
++count;
}
System.out.println("The word "+current_word+" can be found "+count+" times in the file.");
}
}
}
static boolean is_present(String word)
{
for(int i=0;i<found_words.size();i++)
{
if(found_words.get(i).equals(word))
return true;
}
return false;
}
You could do this :
public void printWordOccurence(String filePath) throws FileNotFoundException {
if (filePath.isEmpty())
return;
File file = new File(filePath);
Scanner input = new Scanner(file);
HashMap<String, Integer> wordOccurence = new HashMap<>();
while (input.hasNext()) {
wordOccurence.merge(input.next(), 1, Integer::sum);
}
for (String word : wordOccurence.keySet()) {
System.out.println(word + " appears " + wordOccurence.get(word) + " times");
}
}
I'm working on a project where I enter a URL, the file is read, and the number of lines, characters, and words is written to a text file. I'm not having an issue with that. Code below will be pretty long, sorry in advance.
I also have to write to the same text file all of the words in the file and the number of times each word appears. I've been working on it for a while and I've gotten to the point where the line, character, and word counts are written to the text file, but I can't figure out how to output the actual words and the number of times each appears in the file.
String[] wordSubstrings = line.replaceAll("\\s+", " ").split(" ");
List<String> uniqueWords = new ArrayList<String>();
for (int i = 0; i < wordSubstrings.length; i++) {
if (!(uniqueWords.contains(wordSubstrings[i]))) {
uniqueWords.add(wordSubstrings[i]);
}
}
You could use a Guava Multiset:
Multiset<String> words = HashMultiset.create();
for (String word : wordList)
words.add(word);
for (String word : words.elementSet())
System.out.println(word + ": " + words.count(word));
I've tested something with a HashMap which seems to work pretty well.
Here is my code that I used to test it, I hope it helps:
String[] wordSubstrings = new String[]{"test","stuff","test","thing","test","test","stuff"};
HashMap<String,Integer> uniqueWords = new HashMap<>();
for ( int i = 0; i < wordSubstrings.length; i++)
{
if(!(uniqueWords.containsKey(wordSubstrings[i])))
{
uniqueWords.put(wordSubstrings[i], 1);
}
else
{
int number = uniqueWords.get(wordSubstrings[i]);
uniqueWords.put(wordSubstrings[i],number + 1);
}
}
for (Map.Entry<String, Integer> entry : uniqueWords.entrySet()) {
String key = entry.getKey();
int value = entry.getValue();
//Do Something with the key and value
}
You can use an ArrayList of a class that holds the word and its count as member variables.
List<MyClass> uniqueWords = new ArrayList<MyClass>();
class MyClass
{
String uniqueword;
int count;
}
I am trying to write a program that reads a text file, counts the total number of words and determines which words are repeated in the text, and how many times they occur (for simplicity, the text file contains no punctuation).
I have the following code for finding the repeated words, and how many times they occur.
ArrayList<String> words = new ArrayList<String>();
String myString;
String[] line;
// Read words from file, populate array
while ((myString=br.readLine()) != null) {
line = myString.split(" ");
for (String word : line) {
words.add(word.toLowerCase()); // Ignore case
}
}
The above part reads the text file, and adds every word into an ArrayList, words. The following part uses HashSet to determine which words in the ArrayList words occur more than once. It then prints out these words, followed by a counter indicating the number of occurrences.
// Count the occurrences of each word
Set<String> unique = new HashSet<String>(words);
for (String key : unique) {
if (Collections.frequency(words, key) > 1) {
System.out.println(key + ": " + Collections.frequency(words, key));
}
}
Is there a way to do this WITHOUT using HashSet? For example by using two arrays, and comparing them simultaneously? I have tried the following:
numWords = words.size();
String[] wordArray = new String[numWords];
String[] newWordArray = new String[wordArray.length];
String compWord;
// Find duplicates
for(int i = 0; i < newWordArray.length; i++) {
for(int j = 0; j < newWordArray.length; j++) {
if( i != j && newWordArray[i].equals(newWordArray[j])) {
compWord = newWordArray[i];
System.out.println(compWord);
}
}
}
This only prints out the words that occur more than once, as they are read from the file. Which means that it detects repeated words. However, is there a way to get these words in the form "[WORD : timesRepeated]"?
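One way to get that output without a HashSet is to sort a copy of the array so that equal words sit next to each other, then count the length of each run. The sketch below is only an illustration under that assumption; the class name RepeatedWordsSketch and the method repeatedWords are invented for this example. The ArrayList from the question could be converted with words.toArray(new String[0]).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for this illustration.
class RepeatedWordsSketch {

    // Returns one entry such as "[the : 3]" for every word that occurs more than once.
    static List<String> repeatedWords(String[] words) {
        String[] sorted = words.clone();
        Arrays.sort(sorted); // after sorting, equal words are adjacent

        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < sorted.length) {
            int j = i;
            // Advance j to the end of the run of words equal to sorted[i].
            while (j < sorted.length && sorted[j].equals(sorted[i])) {
                j++;
            }
            int count = j - i;
            if (count > 1) {
                result.add("[" + sorted[i] + " : " + count + "]");
            }
            i = j; // continue with the next distinct word
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"the", "cat", "and", "the", "dog", "and", "the", "bird"};
        for (String entry : repeatedWords(words)) {
            System.out.println(entry);
        }
        // prints:
        // [and : 2]
        // [the : 3]
    }
}
```

Sorting changes the order in which the words are reported (alphabetical rather than order of appearance), which is the trade-off for avoiding a set or map.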
I'm a newbie Java Developer. I want to write code to count the number of palindrome words in the paragraph using Java.
The assumptions are: the user can enter a paragraph containing any number of sentences. Words are separated by whitespace, and sentences are separated by a period. Punctuation directly before or after a word is ignored, while punctuation inside a word is counted.
Sample Input : Otto goes to school. Otto sees a lot of animals at the pets store.
Sample output : Otto = 2 a = 1 Sees = 1
Read the file into your program, split the entries at every space, and put them into an ArrayList. Afterwards, apply your palindrome algorithm to each value in the ArrayList and keep track of the words that are palindromes and their occurrences (for example in a 2D array, or an ArrayList of objects that hold both values).
When you've followed these steps, you should pretty much be there. More specific help will probably be given once you've shown attempts of your own.
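As a rough sketch of the steps above, under the question's stated assumptions (words separated by whitespace, punctuation before or after a word ignored, punctuation inside a word kept): the class and method names are invented for this example, a Map is used instead of a 2D array to track the counts, and the palindrome check ignores case so that "Otto" qualifies, which is an assumption based on the sample output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class name, used only for this illustration.
class PalindromeWordsSketch {

    // Case-insensitive check; an empty string is not treated as a palindrome.
    static boolean isPalindrome(String word) {
        String w = word.toLowerCase();
        int left = 0;
        int right = w.length() - 1;
        while (left < right) {
            if (w.charAt(left) != w.charAt(right)) {
                return false;
            }
            left++;
            right--;
        }
        return !w.isEmpty();
    }

    // Maps each palindrome word (as typed) to the number of times it occurs.
    static Map<String, Integer> countPalindromes(String paragraph) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : paragraph.trim().split("\\s+")) {
            // Strip punctuation before and after the word; punctuation inside is kept.
            String word = token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (isPalindrome(word)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "Otto goes to school. Otto sees a lot of animals at the pets store.";
        System.out.println(countPalindromes(input)); // prints {Otto=2, sees=1, a=1}
    }
}
```

Note that the keys keep their original spelling, so "Otto" and "otto" would be counted separately; lower-casing the key before calling merge would combine them.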
Using the Java Collections framework will reduce the programming effort.
Algorithm:
1. Read the paragraph into a String variable.
2. Split the String with StringTokenizer, using a space (" ") as the delimiter, and add each word to an ArrayList (a Set won't allow duplicates).
3. Write a method that returns a boolean (true/false) depending on whether a given String is a palindrome.
4. Define a Map to hold each palindrome String and the number of times it is repeated.
5. For each word, check whether it is a palindrome. If yes, add the String to the Map with the palindrome String as key and the number of times as value; otherwise, don't add it to the Map.
6. Repeat the same logic until all the words have been processed.
Sample Code:
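import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
public class StringPalindromeCalculator {
private Map<String, Integer> wordsMap = new HashMap<>();
private List<String> wordsList = new ArrayList<>();
private boolean isPalindrome(String inputString) {
// compare the word with its reverse, ignoring case
String reversed = new StringBuilder(inputString).reverse().toString();
return inputString.equalsIgnoreCase(reversed);
}
public Map<String, Integer> findPalindromeWords(String completeString) {
StringTokenizer wordTokenizer = new StringTokenizer(completeString, " ");
while(wordTokenizer.hasMoreTokens()) {
wordsList.add(wordTokenizer.nextToken());
}
for(String word : wordsList) {
if(isPalindrome(word)) {
if(wordsMap.containsKey(word)) {
// increment the value of word
wordsMap.put(word, wordsMap.get(word) + 1);
} else {
// put the word into the Map with an initial count of 1
wordsMap.put(word, 1);
}
}
}
return wordsMap;
}
}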
Hope this Helps :)
public class Palindrome {
int count = 0;
public static void main(String[] args) {
String a = "malayalammadyoydaraarasdasdkfjasdsjhtj";
Palindrome palindrome = new Palindrome();
palindrome.countPalin(a);
}
private int countPalin(String str) {
for (int i = 0; i < str.length() - 1; i++) {
char start = str.charAt(i);
String st = "";
st += start;
for (int j = i + 1; j < str.length(); j++) {
st += str.charAt(j);
StringBuffer rev = new StringBuffer(st).reverse();
if (st.equals(rev.toString()) && st.length() > 1) {
System.out.println(st.toString());
count++;
}
}
st = "";
}
System.out.println("Total Count : " + count);
return count;
}
}
Hi all, I wrote a merge sort program for a String array that reads in .txt files from the user. What I want to do now is compare both files and print out the words that are in file one but not in file two; for example, apple is in file 1 but not in file 2. I tried storing them in a String array again and then printing that out at the end, but I just can't seem to implement it.
Here is what I have,
FileIO reader = new FileIO();
String words[] = reader.load("C:\\list1.txt");
String list[] = reader.load("C:\\list2.txt");
mergeSort(words);
mergeSort(list);
String x = null ;
for(int i = 0; i<words.length; i++)
{
for(int j = 0; j<list.length; j++)
{
if(!words[i].equals(list[j]))
{
x = words[i];
}
}
}
System.out.println(x);
Any help or suggestions would be appreciated!
If you want to find the words that are in the first array but do not exist in the second, you can do it like this:
boolean notEqual = true;
for(int i = 0; i<words.length; i++)
{
for(int j = 0; j<list.length && notEqual; j++)
{
if(words[i].equals(list[j])) // If the word from file one exists in
{ // file two, we set notEqual to false
notEqual = false; // and terminate the inner loop
}
}
if(notEqual) // If notEqual remained true,
System.out.println(words[i]); // we print the element of file one
// that does not exist in the second file
notEqual = true; // reset the variable to true before checking
} // the other words of file one.
Basically, you take a word from the first file (a string from the array) and check whether there is an equal word in file two. If you find one, you set the control variable notEqual to false, which exits the inner for loop, and the word is not printed. Otherwise, if no word in file two matches the word from file one, notEqual is still true, so the element is printed after the inner for loop.
You can replace the print statement with one that stores the unique word in an extra array, if you wish.
Another solution, although slower than the first one:
List <String> file1Words = Arrays.asList(words);
List <String> file2Words = Arrays.asList(list);
for(String s : file1Words)
if(!file2Words.contains(s))
System.out.println(s);
You convert your arrays to a List using the method Arrays.asList, and use the method contains to verify if the word of the first file is on the second file.
Why not just convert the arrays to Sets? Then you can simply do
wordsSet.removeAll(listSet);
Note that removeAll returns a boolean and modifies wordsSet in place, so afterwards wordsSet will contain all the words that do not exist in list2.txt.
Also keep in mind that a Set removes duplicates ;)
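A minimal sketch of the Set approach, assuming the two word arrays have already been loaded: the class name SetDifferenceSketch and the method onlyInFirst are invented for this example, and TreeSet is chosen here only so the result comes out sorted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class name, used only for this illustration.
class SetDifferenceSketch {

    // Returns the words that occur in first but not in second, without duplicates.
    static Set<String> onlyInFirst(String[] first, String[] second) {
        Set<String> result = new TreeSet<>(Arrays.asList(first));   // sorted copy of file one
        Set<String> toRemove = new HashSet<>(Arrays.asList(second)); // fast lookups for file two
        result.removeAll(toRemove); // modifies result in place; the boolean return value is ignored
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"apple", "pear", "apple", "kiwi"};
        String[] list = {"pear"};
        System.out.println(onlyInFirst(words, list)); // prints [apple, kiwi]
    }
}
```

Because the input arrays are copied into new sets, the original arrays are left untouched.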
You can also just go through the loop and add the word once you have reached list.length-1 without a match.
If it matches, you can break out of the inner loop.
FileIO reader = new FileIO();
String words[] = reader.load("C:\\list1.txt");
String list[] = reader.load("C:\\list2.txt");
mergeSort(words);
mergeSort(list);
//never ever null
String x = "" ;
for(int i = 0; i<words.length; i++)
{
for(int j = 0; j<list.length; j++)
{
if(words[i].equals(list[j]))
break;
if(j == list.length-1)
x += words[i] + " ";
}
}
System.out.println(x);
Here is a version (though it does not use sorting)
String[] file1 = {"word1", "word2", "word3", "word4"};
String[] file2 = {"word2", "word3"};
List<String> l1 = new ArrayList<>(Arrays.asList(file1));
List<String> l2 = Arrays.asList(file2);
l1.removeAll(l2);
System.out.println("Not in file2 " + l1);
it prints
Not in file2 [word1, word4]
This looks kind of close. What you're doing is comparing every string in words to every string in list, so x gets set as soon as even one string in list differs from the current word.
What I'd suggest is changing if(!words[i].equals(list[j])) to if(words[i].equals(list[j])). Then you know that the string in words appears in list, so you don't need to display it. If you cycle completely through list without seeing the word, then you know you need to display it. So something like this:
for(int i = 0; i<words.length; i++)
{
boolean wordFoundInList = false;
for(int j = 0; j<list.length; j++)
{
if(words[i].equals(list[j]))
{
wordFoundInList = true;
break;
}
}
if (!wordFoundInList) {
System.out.println(words[i]);
}
}
OK, so I am writing an NLP program. It uses a function eliminateStopWords(). This function reads from a 2D array "sentTokens" (of detected tokens). In the code below, index i is the sentence number and j indexes each token in the ith sentence.
Now, what my eliminateStopWords() does is this:
1. It reads stop words from a text file and stores them in a TreeSet.
2. It reads tokens from the sentTokens array and checks them for stop words. If they are collocations, they should not be checked for stop words; they are just dumped into a finalTokens array. If they are not a collocation, they are individually checked for stop words and are added to the finalTokens array only if they are not stop words.
The problem comes in the loop of step 2. Here is some of the code (I have marked // HERE at the location where the error actually occurs; it's near the end):
private void eliminateStopWords() {
try {
// Loading TreeSet for stopwords from the file.
stopWords = new TreeSet<String> ();
fin = new File("stopwords.txt");
fScan = new Scanner(fin);
while (fScan.hasNextLine())
stopWords.add(fScan.nextLine());
fScan.close();
/* Test code to print all read stopwords
iter2 = stopWords.iterator();
while (iter2.hasNext())
System.out.println(iter2.next()); */
int k=0,m=0; // additional indices for finalTokens array
System.out.println(NO_OF_SENTENCES);
newSentence: for(i=0; i < NO_OF_SENTENCES; i++)
{
System.out.println("i = " + i);
for (j=0; j < sentTokens[i].length; j+=2)
{
System.out.println("j = " + j);
// otherwise, get two successive tokens
String currToken = sentTokens[i][j];
String nextToken = sentTokens[i][j+1];
System.out.println("i = " + i);
System.out.println(currToken + " " + nextToken);
if ( isCollocation(currToken, nextToken) ) {
// if the current and next tokens form a bigram collocation, they are not checked for stop words
// but are directly dumped into finalTokens array
finalTokens[k][m] = currToken; m++;
finalTokens[k][m] = nextToken; m++;
}
if ( !stopWords.contains(currToken) )
{ finalTokens[k][m] = currToken; m++; }
if ( !stopWords.contains(nextToken) )
{ finalTokens[k][m] = nextToken; m++; }
// if current token is the last in the sentence, do not check for collocations, only check for stop words
// this is done to avoid ArrayIndexOutOfBounds Exception in sentences with odd number of tokens
// HERE
System.out.println("i = " + i);
if ( j==sentTokens[i].length - 2) {
String lastToken = sentTokens [i][++j];
if (!stopWords.contains(lastToken))
{ finalTokens[k][m] = lastToken; m++; }
// after analyzing last token, move to analyzing the next sentence
continue newSentence;
}
}
k++; // next sentence in finalTokens array
}
// Test code to print finalTokens array
for(i=0; i < NO_OF_SENTENCES; i++) {
for (j=0; j < finalTokens[i].length; j++)
System.out.print( finalTokens[i][j] + " " );
System.out.println();
}
}
catch (Exception e) {
e.printStackTrace();
}
}
I have printed the indices i & j at the entry of their respective for loops...it all works fine for the first iteration of the loop, but when the loop is about to reach its end... I have printed again the value of 'i'. This time it comes out as 14.
it starts the first iteration with 0...
does not get manipulated anywhere in the loop...
and just by the end of (only) first iteration, it prints the value as 14
I mean this is seriously the WEIRDEST error I have come across ever while working with Java. It throws up an ArrayIndexOutOfBoundsException just before the final if block. It's like MAGIC. You do nothing on the variable in the code, still the value changes. HOW CAN THIS HAPPEN?
You never declared i or j in your code, which leads me to believe that they are fields.
I'm pretty sure that some of your other methods re-use those variables and thus mess with your result. isCollocation looks like a candidate for that.
The counters in for loops should always be local variables, ideally declared inside the for statement itself (for minimal scope). Everything else is just asking for trouble (as you see).
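To illustrate the suspected cause, here is a small sketch in which a helper method reuses a field as its loop counter. Everything in it is hypothetical: the class LoopCounterSketch and the method helper only stand in for whatever method in the real program touches i, and the bound 14 is chosen merely to mirror the value reported in the question.

```java
// Hypothetical class, used only to illustrate the effect described above.
class LoopCounterSketch {
    private int i; // shared field, as the question's code appears to use

    // Stand-in for a method such as isCollocation that reuses the field i.
    private boolean helper() {
        for (i = 0; i < 14; i++) {
            // some unrelated work
        }
        return false; // i is left at 14 here
    }

    // Outer loop uses the field: helper() overwrites i during the first pass.
    int brokenLoop() {
        int iterations = 0;
        for (i = 0; i < 3; i++) {
            helper();      // i is now 14, so the loop condition fails next time
            iterations++;
        }
        return iterations; // returns 1, not 3
    }

    // Outer loop declares its own local counter, which shadows the field.
    int fixedLoop() {
        int iterations = 0;
        for (int i = 0; i < 3; i++) {
            helper();      // only the field changes; the local i is unaffected
            iterations++;
        }
        return iterations; // returns 3
    }

    public static void main(String[] args) {
        LoopCounterSketch sketch = new LoopCounterSketch();
        System.out.println("field counter: " + sketch.brokenLoop() + " iteration(s)");
        System.out.println("local counter: " + sketch.fixedLoop() + " iteration(s)");
    }
}
```

In the real code the helper should use a local counter as well; the sketch leaves it on the field only to show that a local counter in the outer loop is already enough to protect that loop.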