Want to count occurrences of Strings in Java - java

So I have a .txt file which I am loading using
String[] data = loadStrings("data/data.txt");
The file is already sorted and essentially looks like:
Animal
Animal
Cat
Cat
Cat
Dog
I am looking to create an algorithm to count the words in the sorted list in Java, without using any libraries like Multisets and without the use of Maps/HashMaps. So far I have managed to get it to print out the top occurring word like so:
ArrayList<String> words = new ArrayList();
int[] occurrence = new int[2000];
Arrays.sort(data);
for (int i = 0; i < data.length; i ++ ) {
words.add(data[i]); //Put each word into the words ArrayList
}
for(int i =0; i<data.length; i++) {
occurrence[i] =0;
for(int j=i+1; j<data.length; j++) {
if(data[i].equals(data[j])) {
occurrence[i] = occurrence[i]+1;
}
}
}
int max = 0;
String most_talked ="";
for(int i =0;i<data.length;i++) {
if(occurrence[i]>max) {
max = occurrence[i];
most_talked = data[i];
}
}
println("The most talked keyword is " + most_talked + " occurring " + max + " times.");
Rather than just the highest occurring word, I would like to get perhaps the top 5 or top 10.
Hope that was clear enough. Thanks for reading.

Since you said you don't want to use any other kind of data structure, I think you can do something like this, although it is not very performant.
I usually prefer to store indices rather than values.
ArrayList<String> words = new ArrayList<String>();
int[] occurrence = new int[data.length]; //there can never be more distinct words than lines
Arrays.sort(data);
int nwords = 0;
occurrence[nwords]=1;
words.add(data[0]);
for (int i = 1; i < data.length; i ++ ) {
if(!data[i].equals(data[i-1])){ //if a new word is found
words.add(data[i]); //put it into the words ArrayList
nwords++; //increment the index
occurrence[nwords]=0; //initialize its occurrence counter
}
occurrence[nwords]++; //increment the occurrence counter
}
int max;
for(int k=0; k<5; k++){ //repeat 5 times to find the 5 most talked words
max = 0; //index of the most talked word
for(int i = 1; i<words.size(); i++) { //for every word
if(occurrence[i]>occurrence[max]) { //if it is more talked than max
max = i; //then it is the new most talked
}
}
println("The most talked keyword is " + words.get(max) + " occurring " + occurrence[max] + " times.");
occurrence[max]=0; //reset it so the next pass finds the next most talked word
}
Every time I find the word with the highest occurrence count, I set its counter to 0 and scan the array again; this is repeated 5 times.
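The same idea can be written as a self-contained sketch in plain Java (arrays only, no Map). The class name TopWords and the method topN are made up for illustration, and System.out.println stands in for Processing's println. Ties are broken in favour of the alphabetically first word, which is an assumption, not something the question specifies.

```java
import java.util.Arrays;

class TopWords {
    /** Returns up to n lines of the form "word count", most frequent first. */
    static String[] topN(String[] data, int n) {
        if (data.length == 0 || n <= 0) {
            return new String[0];
        }
        String[] sorted = data.clone();
        Arrays.sort(sorted);              // equal words become adjacent

        // One slot per distinct word; there are at most sorted.length of them.
        String[] words = new String[sorted.length];
        int[] counts = new int[sorted.length];
        int distinct = 0;
        for (String s : sorted) {
            if (distinct == 0 || !s.equals(words[distinct - 1])) {
                words[distinct++] = s;    // a new run of equal words starts here
            }
            counts[distinct - 1]++;
        }

        int limit = Math.min(n, distinct);
        String[] result = new String[limit];
        for (int k = 0; k < limit; k++) {
            int max = 0;
            for (int i = 1; i < distinct; i++) {
                if (counts[i] > counts[max]) {
                    max = i;
                }
            }
            result[k] = words[max] + " " + counts[max];
            counts[max] = -1;             // exclude this word from later rounds
        }
        return result;
    }

    public static void main(String[] args) {
        String[] data = {"Animal", "Animal", "Cat", "Cat", "Cat", "Dog"};
        for (String line : topN(data, 2)) {
            System.out.println(line);     // prints "Cat 3", then "Animal 2"
        }
    }
}
```

Capping the loop at the number of distinct words avoids printing leftover entries with a count of 0 when the file contains fewer than n different words.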

If you cannot use Guava's Multiset, then you can implement an equivalent yourself. Basically, you just need to create a Map<String, Integer>, which keeps track of counts (value) per each word (key). This means changing this
ArrayList<String> words = new ArrayList<String>();
// ...
for (int i = 0; i < data.length; i ++ ) {
words.add(data[i]); //Put each word into the words ArrayList
}
into this:
Map<String, Integer> words = new HashMap<String, Integer>();
// ...
for (String word : data) {
Integer count = words.get(word);
words.put(word, (count != null ? count.intValue() + 1 : 1));
}
After you've filled the map, just sort it by the values.
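A map has no order of its own, so "sort it by the values" in practice means copying the entries into a list and sorting that. A minimal sketch, assuming Java 8 or later (for Map.merge and lambdas); the class name MapTopWords and the method byCountDescending are hypothetical, and breaking ties alphabetically is my assumption:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapTopWords {
    /** Counts the words and returns the entries ordered by count, highest first. */
    static List<Map.Entry<String, Integer>> byCountDescending(String[] data) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : data) {
            counts.merge(word, 1, Integer::sum);   // insert 1 or add 1 to the old count
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            // Assumption: equal counts are ordered alphabetically by word.
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        String[] data = {"Animal", "Animal", "Cat", "Cat", "Cat", "Dog"};
        List<Map.Entry<String, Integer>> sorted = byCountDescending(data);
        for (int i = 0; i < Math.min(5, sorted.size()); i++) {
            System.out.println(sorted.get(i).getKey() + ": " + sorted.get(i).getValue());
        }
    }
}
```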
If you cannot use a Map either, you can do the following:
First, create a wrapper class for your word counts:
public class WordCount implements Comparable<WordCount> {
private String word;
private int count;
public WordCount(String w, int c) {
this.word = w;
this.count = c;
}
public String getWord() {
return word;
}
public int getCount() {
return count;
}
public void incrementCount() {
count++;
}
@Override
public int compareTo(WordCount other) {
return Integer.compare(this.count, other.count);
}
}
Then, change your code to store WordCount instances in your list (instead of Strings):
ArrayList<WordCount> words = new ArrayList<WordCount>();
// ...
for (String word : data) {
WordCount wc = new WordCount(word, 1);
boolean wordFound = false;
for (WordCount existing : words) {
if (existing.getWord().equals(wc.getWord())) {
existing.incrementCount();
wordFound = true;
break;
}
}
if (!wordFound) {
words.add(wc);
}
}
Finally, after populating the List, simply sort it using Collections.sort(). This is easy because the value objects implement Comparable:
Collections.sort(words, Collections.reverseOrder());

You could try something simple like this:
for( int i = 0; i < words.size(); i++ ){
int count = 0; // reset the counter for each word
System.out.printf("%s: ", words.get( i ));
for( int j = 0; j < words.size(); j++ ) {
if( words.get( i ).equals( words.get( j ) ) )
count++;
}
System.out.printf( "%d\n", count );
}

Related

Using an array to input strings, and another one to output word frequency

So I am trying to complete this code. The goal is to input an array of strings, then count the frequency of how often the words are found. For example:
input:
joe
jim
jack
jim
joe
output:
joe 2
jim 2
jack 1
jim 2
joe 2
One array must be used for the strings, and another array must be used for the word frequencies.
I am stuck trying to implement this. The string input is set up, but how do I count the frequency of each word and store those values in an array, then print both side by side? I do know that once the integer array is filled, a simple for loop can print the values together, e.g. System.out.println(wordsList[i] + " " + countArray[i]);
My code so far:
import java.util.Scanner;
public class LabClass {
public static int getFrequencyOfWord(String[] wordsList, int listSize, String currWord) {
int freq = 0;
for (int i = 0; i < listSize; i++) {
if (wordsList[i].compareTo(currWord) == 0) {
freq++;
}
}
return freq;
}
public static void main(String[] args) {
LabClass scall = new LabClass();
Scanner scnr = new Scanner(System.in);
// assignments
int listSize = 0;
System.out.println("Enter list Amount");
listSize = scnr.nextInt();
scnr.nextLine(); // consume the rest of the line so the first word is not read as an empty string
int size = listSize; // array length
// end of assignments
String[] wordsList = new String[size]; // string array
for (int i = 0; i < wordsList.length; i++) { //gathers string input
wordsList[i] = scnr.nextLine();
}
for (int i = 0; i < listSize; i++) {
String currWord = wordsList[i];
int freqCount = getFrequencyOfWord(wordsList, listSize, currWord);
System.out.println(currWord + " " + freqCount);
}
}
}
static int some_method(String[] arr, String word) {
int count = 0;
for (int i=0; i<arr.length; i++) {
if (arr[i].equals(word)) count++;
}
return count;
}
Then in the main method:
String[] array = {"joe", "jake", "jim", "joe"}; //or take from user input
int[] countArray = new int[array.length];
for (int i=0; i<array.length; i++) {
countArray[i] = some_method(array, array[i]);
}
System.out.println(array[0] + " " + countArray[0]);
Output:
joe 2

Remove duplicate elements from an array recursively

I have this code that removes duplicate elements from an array. It works, but I don't know how to make it recursive.
public static void main(String[] args) {
dupe cadena= new dupe();
String arraycar[]={"h","i","e","l","o","i","s","e"};
System.out.println(arraycar);
for(int i=0; i<arraycar.length; i++){
for (int j=0; j<arraycar.length-1; j++){
if (i!=j){
if (arraycar[i]==arraycar[j]){
arraycar [i]="";
}
}
}
}
int n= arraycar.length;
for (int k=0; k<=n-1; k++){
if (arraycar[k]!=""){
System.out.println(arraycar[k]);
}
}
}
In the current implementation there are nested loops, and the inner loop replaces duplicate elements with an empty string.
A recursive implementation may replace the outer loop: the recursive method takes an index argument, and the exit condition is met when the index reaches the length of the input array.
Example implementation:
public static void removeDuplicates(String[] arr) {
removeDuplicates(1, arr);
}
public static void removeDuplicates(int start, String[] arr) {
if (start >= arr.length) {
return;
}
String val = arr[start - 1];
int nonDup = -1; // the index of the first non-empty string to use as start
for (int i = start; i < arr.length; i++) {
if (val.equals(arr[i])) {
arr[i] = "";
}
else if (nonDup == -1) {
nonDup = i;
}
}
if (nonDup >= start) { // non-duplicate candidates are available
removeDuplicates(nonDup + 1, arr);
}
}
Test
String arraycar[]={"h","h", "i","e","l","o","i","s","e", "s"};
Arrays.stream(arraycar).filter(Predicate.not(""::equals)).forEach(s -> System.out.print(s + " "));
removeDuplicates(arraycar);
System.out.println("\nduplicates removed:");
Arrays.stream(arraycar).filter(Predicate.not(""::equals)).forEach(s -> System.out.print(s + " "));
Output
h h i e l o i s e s
duplicates removed:
h i e l o s
However, one of the simplest ways would be to use Stream::distinct to get rid of duplicates (no recursion needed):
String[] noDups = Arrays.stream(arraycar).distinct().toArray(String[]::new);
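If the exercise really requires recursion throughout, the inner loop can be replaced by a second recursive method as well. The following is only a sketch of that variant, not the implementation shown earlier; the class name RecursiveDedup and the helper blankLaterCopies are hypothetical. Like the original code, it marks duplicates with an empty string instead of shrinking the array.

```java
import java.util.Arrays;

class RecursiveDedup {
    /** Blanks every later copy of arr[index], then recurses on index + 1. */
    static void removeDuplicates(String[] arr, int index) {
        if (index >= arr.length) {
            return;                       // base case: past the last element
        }
        if (!arr[index].isEmpty()) {      // already blanked entries need no work
            blankLaterCopies(arr, arr[index], index + 1);
        }
        removeDuplicates(arr, index + 1);
    }

    /** Recursive stand-in for the inner loop: blanks copies of value from position from onwards. */
    private static void blankLaterCopies(String[] arr, String value, int from) {
        if (from >= arr.length) {
            return;
        }
        if (value.equals(arr[from])) {
            arr[from] = "";
        }
        blankLaterCopies(arr, value, from + 1);
    }

    public static void main(String[] args) {
        String[] letters = {"h", "h", "i", "e", "l", "o", "i", "s", "e", "s"};
        removeDuplicates(letters, 0);
        System.out.println(Arrays.toString(letters));
    }
}
```

The recursion depth grows with the array length, so for very large arrays the loop or stream versions are the safer choice.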

Maximum repeated String in an array

The problem is:
How do I get the most repeated String in an array using only array operations in Java?
I ran into this question in a test and couldn't figure it out.
Let's suppose we have these arrays of strings:
String[] str1 = { "abbey", "bob", "caley", "caley", "zeeman", "abbey", "bob", "abbey" };
String[] str2 = { "abbey", "bob", "caley", "caley", "zeeman", "abbey", "bob", "abbey", "caley" };
In str1, abbey is repeated the most, so abbey should be returned.
In str2, abbey and caley have the same number of repetitions, so the alphabetically greater one is the winner and is returned (caley here, since c > a).
This is what I have tried so far:
import java.util.*;
public class test {
static String highestRepeated(String[] str) {
int n = str.length, num = 0;
String temp;
String str2[] = new String[n / 2];
for (int k = 0;k < n; k++) { // outer comparision
for (int l = k + 1; l < n; l++) { // inner comparision
if (str[k].equals(str[l])) {
// if matched, increase count
num++;
}
}
// I'm stuck here
}
return result;
}
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
System.out.println("enter how many votes");
int n = sc.nextInt();
String[] str = new String[n];
for (int i = 0; i < n; i++) {
str[i] = sc.nextLine();
}
String res = highestRepeated(str);
System.out.println(res + " is the winner");
}
}
So, how should I keep the count of occurrences of each string and attach it to the string itself?
All this without using a map or any hashing, just arrays.
Here is an (unpolished) solution:
static String highestRepeated(String[] str) {
String[] sorted = Arrays.copyOf(str, str.length);
Arrays.sort(sorted, 0, sorted.length, Comparator.reverseOrder());
String currentString = sorted[0];
String bestString = sorted[0];
int maxCount = 1;
int currentCount = 1;
for (int i = 1 ; i < sorted.length ; i++) {
if (currentString.equals(sorted[i])) {
currentCount++;
} else {
if (maxCount < currentCount) {
maxCount = currentCount;
bestString = currentString;
}
currentString = sorted[i];
currentCount = 1;
}
}
if (currentCount > maxCount) {
return currentString;
}
return bestString;
}
Explanation:
Sort the array from highest to lowest lexicographically. That's what Arrays.sort(sorted, 0, sorted.length, Comparator.reverseOrder()); does. We sort in this order because you want the largest string if there are multiple strings with the same number of repeats.
Now we can just count the strings by looping through the array. We don't need a hash map or anything because we know that there will be no more of a string in the rest of the array once we encounter a different string.
currentString is the string whose repeats we are currently counting, using currentCount. maxCount is the number of occurrences of the most repeated string counted so far, bestString.
The if statement is pretty self-explanatory: if it is the same string, count it; otherwise check whether the string we just finished counting (currentCount) appears more times than the current max.
At the end, I check whether the count of the last string is greater than max. If the last string in the array happens to be the most repeated one, bestString won't have been assigned to it, because bestString is only assigned when a different string is encountered.
Note that this algorithm does not handle an empty array (sorted[0] would throw an exception); a one-element array works as is. I'm sure you will figure that out yourself.
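For what it's worth, here is one way the empty-array case could be guarded, as a sketch rather than a definitive version. The class name Votes is made up, and returning null when there are no votes is an assumption (throwing an exception would be just as reasonable). It also folds the final check into the loop by treating the end of the array as the end of a run.

```java
import java.util.Arrays;
import java.util.Comparator;

class Votes {
    /** Most repeated string; on ties the lexicographically largest one; null if there are none. */
    static String highestRepeated(String[] votes) {
        if (votes == null || votes.length == 0) {
            return null;                       // assumption: no votes means no winner
        }
        String[] sorted = Arrays.copyOf(votes, votes.length);
        Arrays.sort(sorted, Comparator.reverseOrder());

        String best = sorted[0];
        int bestCount = 0;
        int runStart = 0;
        for (int i = 1; i <= sorted.length; i++) {
            // A run ends at the end of the array or where the value changes.
            if (i == sorted.length || !sorted[i].equals(sorted[runStart])) {
                int runLength = i - runStart;
                if (runLength > bestCount) {   // strict: the earlier (larger) string wins ties
                    bestCount = runLength;
                    best = sorted[runStart];
                }
                runStart = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] str1 = {"abbey", "bob", "caley", "caley", "zeeman", "abbey", "bob", "abbey"};
        String[] str2 = {"abbey", "bob", "caley", "caley", "zeeman", "abbey", "bob", "abbey", "caley"};
        System.out.println(highestRepeated(str1) + " is the winner");   // abbey
        System.out.println(highestRepeated(str2) + " is the winner");   // caley
    }
}
```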
Another version:
static String lastMostFrequent(String ... strs) {
if (strs.length == 0) return null;
Arrays.sort(strs);
String str = strs[0];
for (int longest=0, l=1, i=1; i<strs.length; i++) {
if (!strs[i-1].equals(strs[i])) { l=1; continue; }
if (++l < longest) continue;
longest = l;
str = strs[i];
}
return str;
}
Change the comparison to
if (++l <= longest) continue;
for firstMostFrequent.
You can't use == to check if two strings are the same.
Try using this instead:
if (str[k].equals(str[l])) {
// if matched, increase count
num++;
}

deleting an object from an array of objects in java

import java.util.StringTokenizer;
class Count {
int count;
String name;
void SetCount(int c, String n) {
this.count = c;
this.name = n;
}
void Show() {
System.out.print("Word= " + name);
System.out.print(" Count= " + count);
System.out.println();
}
}
class Contains2 extends Count {
public static void main(String args[]) {
String s = "Hello this program will repeat itself for this useless purpose and will not end until it repeats itself again and again and again so watch out";
int i, c2, j;
StringTokenizer st = new StringTokenizer(s, " ");
c2 = st.countTokens();
String[] test = new String[c2];
Count[] c = new Count[c2];
for (i = 0; i < c2; i++) {
c[i] = new Count();
}
i = 0;
while (st.hasMoreTokens()) {
String token = st.nextToken();
test[i] = token;
c[i].SetCount(0, test[i]);
i++;
}
for (i = 0; i < c2; i++) {
for (j = 0; j < c2; j++) {
if (c[i].name.equals(test[j]))
c[i].count += 1;
}
}
for (i = 0; i < c2; i++) {
c[i].Show();
}
}
}
So I made this small program to count the number of times each word is repeated in a paragraph. It's working as planned, but now I am getting duplicates of every word, since I made a separate object for each word and print them all. Is there any way I could delete the duplicate words, i.e. delete those objects based on their names? I can set them to null, but it would still print them, so I just want to get rid of them or skip them somehow.
You cannot adjust the size of an array once it's created. You can only create a new array with a different size and copy the non-null elements.
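Copying the non-null elements into a new, smaller array could look roughly like this. The sketch uses a String[] to stay self-contained; the same two passes would work for the Count[] array from the question. The class name Compact and the method withoutNulls are hypothetical.

```java
import java.util.Arrays;

class Compact {
    /** Returns a new array holding only the non-null elements of source, in order. */
    static String[] withoutNulls(String[] source) {
        int kept = 0;
        for (String s : source) {         // first pass: how many elements survive?
            if (s != null) {
                kept++;
            }
        }
        String[] result = new String[kept];
        int next = 0;
        for (String s : source) {         // second pass: copy the survivors
            if (s != null) {
                result[next++] = s;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"again", null, "and", null, null, "itself"};
        System.out.println(Arrays.toString(withoutNulls(words)));   // [again, and, itself]
    }
}
```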
You could also set elements to null and just ignore those elements for printing...
for (i = 0; i < c2; i++) {
if (c[i] != null)
c[i].Show();
}
An alternative would be using a List, which allows you to remove elements.
Alternatively a Map<String, Integer> mapping from a word to the count could be used. In this case you don't even need the Count class:
String s = "Hello this program will repeat itself for this useless purpose and will not end until it repeats itself again and again and again so watch out";
StringTokenizer st = new StringTokenizer(s, " ");
Map<String, Integer> map = new HashMap<>();
while (st.hasMoreTokens()) {
map.merge(st.nextToken(), 1, Integer::sum);
}
for (Map.Entry<String, Integer> e : map.entrySet()) {
System.out.print("Word= " + e.getKey());
System.out.print(" Count= " + e.getValue());
System.out.println();
}

Extract words from an array of Strings in java based on conditions

I am trying to do an assignment that works with Arrays and Strings. The code is almost complete, but I've run into a hitch. Every time the code runs, it replaces the value in the index of the output array instead of putting the new value in a different index. For example, if I was trying to search for the words containing a prefix "b" in the array of strings, the intended output is "bat" and "brewers" but instead, the output comes out as "brewers" and "brewers". Any suggestions? (ps. The static main method is there for testing purposes.)
--
public static void main(String[] args) {
String[] words = {"aardvark", "bat", "brewers", "cadmium", "wolf", "dastardly", "enigmatic", "frenetic",
"sycophant", "rattle", "zinc", "alloy", "tunnel", "nitrate", "sample", "yellow", "mauve", "abbey",
"thinker", "junk"};
String prefix = "b";
String[] output = new String[wordsStartingWith(words, prefix).length];
output = wordsStartingWith(words, prefix);
for (int i = 0; i < output.length; i++) {
System.out.println("Words: " + i + " " + output[i]);
}
}
public static String[] wordsStartingWith(String[] words, String prefix) {
// method that finds and returns all strings that start with the prefix
String[] returnWords;
int countWords = 0;
for (int i = 0; i < words.length; i++) {
// loop to count the number of words that actually have the prefix
if (words[i].substring(0, prefix.length()).equalsIgnoreCase(prefix)) {
countWords++;
}
}
// assign length of array based on number of words containing prefix
returnWords = new String[countWords];
for (int i = 0; i < words.length; i++) {
// loop to put strings containing prefix into new array
for (int j = 0; j < returnWords.length; j++) {
if (words[i].substring(0, prefix.length()).equalsIgnoreCase(prefix)) {
returnWords[j] = words[i];
}
}
}
return returnWords;
}
--
Thank You
Soul
Don't reinvent the wheel. Your code can be replaced by this single, easy to read, bug free, line:
String[] output = Arrays.stream(words)
.filter(w -> w.startsWith(prefix))
.toArray(String[]::new);
Or if you just want to print the matching words:
Arrays.stream(words)
.filter(w -> w.startsWith(prefix))
.forEach(System.out::println);
It's because of the code you have written. If you had thought it through properly, you would have realized your mistake.
The culprit code:
for (int j = 0; j < returnWords.length; j++) {
if (words[i].substring(0, prefix.length()).equalsIgnoreCase(prefix)) {
returnWords[j] = words[i];
}
}
When you get a matching word, you set the whole output array to that word. This means the last word that satisfies the condition replaces all the previous words in the array.
All elements of the array returnWords are first set to "bat", and then each element is replaced by "brewers".
The corrected code looks like this:
int j = 0;
for (int i = 0; i < words.length; i++) {
if (words[i].substring(0, prefix.length()).equalsIgnoreCase(prefix)) {
returnWords[j] = words[i];
j++;
}
}
Also, you are doing multiple iterations that are not needed.
For example, these statements
String[] output = new String[wordsStartingWith(words, prefix).length];
output = wordsStartingWith(words, prefix);
can be reduced to a single, simpler statement
String[] output = wordsStartingWith(words, prefix);
The way you're doing this is looping through the same array multiple times.
You only need to check the values once:
public static void main(String[] args) {
String[] words = {"aardvark", "bat", "brewers", "cadmium", "wolf", "dastardly", "enigmatic", "frenetic",
"sycophant", "rattle", "zinc", "alloy", "tunnel", "nitrate", "sample", "yellow", "mauve", "abbey",
"thinker", "junk"};
String prefix = "b";
for (int i = 0; i < words.length; i++) {
if (words[i].toLowerCase().startsWith(prefix.toLowerCase())) {
System.out.println("Words: " + i + " " + words[i]);
}
}
}
Instead of doing two separate loops, try just having one:
String[] returnWords;
String[] foundWords = new String[words.length];
int countWords = 0;
for (int i = 0; i < words.length; i++) {
// loop to collect the words that actually have the prefix
if (words[i].substring(0, prefix.length()).equalsIgnoreCase(prefix)) {
foundWords[countWords] = words[i];
countWords++;
}
}
// assign length of array based on number of words containing prefix
returnWords = new String[countWords];
for (int i = 0; i < countWords; i++) {
returnWords[i] = foundWords[i];
}
My method uses another array (foundWords) for all the words found during the first loop; it has the same size as words in case every single word starts with the prefix. countWords keeps track of where to place the next found word in foundWords. Lastly, you just copy the first countWords elements into returnWords.
Not only will this fix your code, it will also make it run slightly faster, because the prefix check is done only once per word; the bigger the word bank, the bigger the saving.
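As a complete, compilable sketch of the single-pass idea: the version below keeps the method name from the question but swaps in two things that are my own choices, not part of the answers above. It uses String.regionMatches for the case-insensitive prefix test, which simply returns false when a word is shorter than the prefix (substring would throw in that case), and Arrays.copyOf to trim the result. The class name PrefixFilter is hypothetical.

```java
import java.util.Arrays;

class PrefixFilter {
    /** Returns all words that start with prefix, ignoring case, in their original order. */
    static String[] wordsStartingWith(String[] words, String prefix) {
        String[] found = new String[words.length];    // worst case: every word matches
        int count = 0;
        for (String word : words) {
            // regionMatches returns false instead of throwing when word is too short
            if (word.regionMatches(true, 0, prefix, 0, prefix.length())) {
                found[count++] = word;
            }
        }
        return Arrays.copyOf(found, count);           // trim to the number of matches
    }

    public static void main(String[] args) {
        String[] words = {"aardvark", "bat", "brewers", "cadmium", "wolf"};
        String[] output = wordsStartingWith(words, "b");
        for (int i = 0; i < output.length; i++) {
            System.out.println("Words: " + i + " " + output[i]);
        }
    }
}
```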
