Search for matching words in text WITHOUT using HashSet - java

I am trying to write a program that reads a text file, counts the total number of words and determines which words are repeated in the text, and how many times they occur (for simplicity, the text file contains no punctuation).
I have the following code for finding the repeated words, and how many times they occur.
ArrayList<String> words = new ArrayList<String>();
String myString;
String[] line;
// Read words from file, populate array
while ((myString=br.readLine()) != null) {
line = myString.split(" ");
for (String word : line) {
words.add(word.toLowerCase()); // Ignore case
}
}
The above part reads the text file, and adds every word into an ArrayList, words. The following part uses HashSet to determine which words in the ArrayList words occur more than once. It then prints out these words, followed by a counter indicating the number of occurrences.
// Count the occurrences of each word
Set<String> unique = new HashSet<String>(words);
for (String key : unique) {
if (Collections.frequency(words, key) > 1) {
System.out.println(key + ": " + Collections.frequency(words, key));
}
}
Is there a way to do this WITHOUT using HashSet? For example by using two arrays, and comparing them simultaneously? I have tried the following:
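int numWords = words.size();
String[] wordArray = words.toArray(new String[numWords]); // copy the words into an array
String[] newWordArray = Arrays.copyOf(wordArray, wordArray.length); // second array to compare against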
String compWord;
// Find duplicates
for(int i = 0; i < newWordArray.length; i++) {
for(int j = 0; j < newWordArray.length; j++) {
if( i != j && newWordArray[i].equals(newWordArray[j])) {
compWord = newWordArray[i];
System.out.println(compWord);
}
}
}
This only prints out the words that occur more than once, as they are read from the file. Which means that it detects repeated words. However, is there a way to get these words in the form "[WORD : timesRepeated]"?
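One way to get the "[WORD : timesRepeated]" form without a HashSet is to sort a copy of the words so that equal words sit next to each other, then count the length of each run. This is only a sketch: the class name RepeatedWords and the method repeated are made up for illustration, and the sample words in main are not from the original file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RepeatedWords {
    // Hypothetical helper: returns "word : count" for every word occurring more than once.
    static List<String> repeated(List<String> words) {
        String[] sorted = words.toArray(new String[0]);
        Arrays.sort(sorted); // after sorting, equal words are adjacent
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < sorted.length) {
            int j = i;
            // advance j to the end of the run of words equal to sorted[i]
            while (j < sorted.length && sorted[j].equals(sorted[i])) {
                j++;
            }
            int count = j - i;
            if (count > 1) {
                result.add(sorted[i] + " : " + count);
            }
            i = j; // continue with the next distinct word
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data, assumed for the example
        List<String> words = Arrays.asList("the", "cat", "and", "the", "dog", "and", "the", "bird");
        for (String entry : repeated(words)) {
            System.out.println(entry);
        }
    }
}
```

Sorting costs O(n log n), which is usually cheaper than comparing every pair of words as the nested loops do.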

Related

how to find the most common phrases in a list of strings

I have a list of sentences. I split each sentence, filtered out the unwanted words and punctuation, and stored the results in
ArrayList<ArrayList<String>> sentence
Then I used a HashMap to find the most common word. How could I modify the following HashMap code so that I can also find the most common consecutive pairs of words (N-grams for phrases)?
HashMap<String, Integer> hashMap = new HashMap<>();
// Splitting the words of string
// and storing them in the array.
for(int i =0; i < sentence.size(); i++){
ArrayList<String> words = new ArrayList<String>(sentence.get(i));
for (String word : words) {
//Asking whether the HashMap contains the
//key or not. Will return null if not.
Integer integer = hashMap.get(word);
if (integer == null)
// Storing the word as key and its
// occurrence as value in the HashMap.
hashMap.put(word, 1);
else {
// Incrementing the value if the word
// is already present in the HashMap.
hashMap.put(word, integer + 1);
}
}
}
I don't know where to start. Should I adjust the way I split, or should I not split at all in the first place?
To find the most common consecutive pairs of words (N-grams for phrases), you can modify the above code by looping through the sentence arraylist and creating a new hashmap with the pairs of words as the keys and the number of times they appear as the values. Then, you can iterate through the new hashmap and find the pair of words with the highest value.
public static String getMostCommonNGram(ArrayList<ArrayList<String>> sentence) {
HashMap<String, Integer> nGramMap = new HashMap<>();
// loop through the sentences
for (ArrayList<String> words : sentence) {
// loop through the words and create pairs of words
for (int i = 0; i < words.size() - 1; i++) {
String nGram = words.get(i) + " " + words.get(i + 1);
// check if the n-gram already exists in the map
Integer count = nGramMap.get(nGram);
// if not, add it to the map with count = 1
if (count == null) {
nGramMap.put(nGram, 1);
} else {
// if yes, increment the count
nGramMap.put(nGram, count + 1);
}
}
}
// find the n-gram with the highest count
String mostCommonNGram = "";
int maxCount = 0;
for (String nGram : nGramMap.keySet()) {
int count = nGramMap.get(nGram);
if (count > maxCount) {
maxCount = count;
mostCommonNGram = nGram;
}
}
return mostCommonNGram;
}
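The method above is fixed to pairs of words. If longer phrases are needed, the same idea might be generalized to any n, roughly as sketched below. The names NGramCounter, countNGrams and mostCommon are assumptions made for this example, and ties between equally frequent phrases are resolved arbitrarily.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NGramCounter {
    // Counts every run of n consecutive words within each sentence.
    static Map<String, Integer> countNGrams(List<? extends List<String>> sentences, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> words : sentences) {
            // i + n <= size keeps the window inside the sentence
            for (int i = 0; i + n <= words.size(); i++) {
                String nGram = String.join(" ", words.subList(i, i + n));
                counts.merge(nGram, 1, Integer::sum); // insert 1 or add 1
            }
        }
        return counts;
    }

    // Returns the key with the highest count, or "" for an empty map.
    static String mostCommon(Map<String, Integer> counts) {
        String best = "";
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                bestCount = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Sample sentences, assumed for the example
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("new", "york", "city"),
                Arrays.asList("i", "love", "new", "york"));
        System.out.println(mostCommon(countNGrams(sentences, 2)));
    }
}
```

The parameter type List<? extends List<String>> is chosen so that an ArrayList<ArrayList<String>> like the one in the question can be passed in directly.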

Printing Individual Words Without Repeating?

I want to list every unique word in a text file and how many times every word is found in it.
I tried using nested for loops with an if check, but I'm not sure how to skip the already listed words once they have been counted.
for (int i = 0; i < words.size(); i++) {
count = 1;
//Count each word in the file and store it in variable count
for (int j = i + 1; j < words.size(); j++) {
if (words.get(i).equals(words.get(j))) {
count++;
}
}
System.out.println("The word " + words.get(i) + " can be found " + count + " times in the file.");
}
The contents of the text file are "Hello world. Hello world.", and the program prints the following:
The word Hello can be found 2 times in the file.
The word world can be found 2 times in the file.
The word Hello can be found 1 times in the file.
The word world can be found 1 times in the file.
I would suggest using a HashMap to solve this problem. Simply put, a HashMap stores key-value pairs, hashes the keys, and has an average lookup complexity of O(1).
Iterate over the list of words only once and store each encountered word in the HashMap. When you encounter a word, check whether it already exists in the HashMap. If it does not exist, add it to the map with the word itself as the key and 1 as the value.
If the word already exists, increase the value by 1.
After completing the iteration, the HashMap will contain key-value pairs of the unique words and their counts.
In case you are not familiar with maps in Java: https://www.javatpoint.com/java-hashmap
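A minimal sketch of those steps might look like this. The class name WordCounter and the method countWords are made up for the example, and the word list in main stands in for whatever was read from the file.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordCounter {
    // Single pass over the words: insert with 1 on first sight, otherwise increment.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            if (counts.containsKey(word)) {
                counts.put(word, counts.get(word) + 1);
            } else {
                counts.put(word, 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample data, assumed for the example
        List<String> words = Arrays.asList("Hello", "world", "Hello", "world");
        for (Map.Entry<String, Integer> entry : countWords(words).entrySet()) {
            System.out.println("The word " + entry.getKey() + " can be found "
                    + entry.getValue() + " times in the file.");
        }
    }
}
```

Note that a HashMap does not guarantee any iteration order; a LinkedHashMap could be swapped in if the words should be printed in the order they first appear.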
You need to use an ArrayList to store the already found words, and after that, you need to check every word in the file, whether it is present within the ArrayList or not. If the word is present inside the ArrayList, you need to ignore that word. Otherwise, add that word to the ArrayList.
A sample code for you:
static ArrayList<String> found_words=new ArrayList<String>(); // static so that main and is_present can use it
public static void main(String arguments[])
{
String data=""; //data from your file
String[] words=data.split("\\s"); //split the string into individual words
for(int i=0;i<words.length;i++)
{
String current_word=words[i];
if(!is_present(current_word))
{
found_words.add(current_word);
int count=1;
for(int j=i+1;j<words.length;j++)
{
if(words[j].equals(words[i]))
++count;
}
System.out.println("The word "+current_word+" can be found "+count+" times in the file.");
}
}
}
static boolean is_present(String word)
{
for(int i=0;i<found_words.size();i++)
{
if(found_words.get(i).equals(word))
return true;
}
return false;
}
You could do this :
public void printWordOccurence(String filePath) throws FileNotFoundException {
if (filePath.isEmpty())
return;
File file = new File(filePath);
Scanner input = new Scanner(file);
HashMap<String, Integer> wordOccurence = new HashMap<>();
while (input.hasNext()) {
wordOccurence.merge(input.next(), 1, Integer::sum);
}
for (String word : wordOccurence.keySet()) {
System.out.println(word + " appears " + wordOccurence.get(word) + " times");
}
}

Compare two arrayList and get longest matching String

So what I'm trying to do is take two text files and return the longest matching string found in both. I put both text files into ArrayLists, separated word by word. This is my code so far, but I'm wondering how I would return the longest string and not just the first one found.
for(int i = 0; i < file1Words.size(); i++)
{
for(int j = 0; j < file2Words.size(); j++)
{
if(file1Words.get(i).equals(file2Words.get(j)))
{
matchingString += file1Words.get(i) + " ";
}
}
}
String longest = "";
for (String s1: file1Words)
for (String s2: file2Words)
if (s1.length() > longest.length() && s1.equals(s2)) longest = s1;
If you are looking for better time and space performance than the replies above, you can use the code below.
System.out.println("Start time :"+System.currentTimeMillis());
String longestMatch="";
for(int i = 0; i < file1Words.size(); i++) {
if(file1Words.get(i).length()>longestMatch.length()){
for(int j = 0; j < file2Words.size(); j++) {
String w = file1Words.get(i);
if (w.length() > longestMatch.length() && w.equals(file2Words.get(j)))
longestMatch = w;
}
}
}
System.out.println("End time :"+System.currentTimeMillis());
I'm not going to give you the code, but I'll help you with the main ideas...
You will need a new String variable, "curLargestString", to keep track of the largest matching string found so far. Declare it outside of your for loops. Every time you find two matching words, compare the length of the matching word to the length of the word in "curLargestString". If the new matching word is longer, then set "curLargestString" to the new word. After your for loops have run, return curLargestString.
One more note: be sure to initialize curLargestString with an empty string. This prevents an error when you call length() on it after you get your first matching word.
Assuming your files are small enough to fit in memory, sort them both with a custom comparator that puts longer strings before shorter ones and otherwise sorts lexicographically.
Then go through both lists in order, advancing only one index at a time (the one pointing to the "smaller" of the two entries, i.e. the one that comes first in the sort order), and return the first match.
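The sort-and-walk idea could be sketched as follows. The class name LongestCommonWord, the comparator LONGEST_FIRST and the sample word lists are all assumptions made for illustration; the method returns an empty string when the lists share no word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class LongestCommonWord {
    // Longer strings come first; strings of equal length are ordered lexicographically.
    static final Comparator<String> LONGEST_FIRST = (a, b) -> {
        if (a.length() != b.length()) {
            return Integer.compare(b.length(), a.length());
        }
        return a.compareTo(b);
    };

    static String longestCommon(List<String> first, List<String> second) {
        // Sort copies so the caller's lists are left untouched
        List<String> a = new ArrayList<>(first);
        List<String> b = new ArrayList<>(second);
        a.sort(LONGEST_FIRST);
        b.sort(LONGEST_FIRST);
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = LONGEST_FIRST.compare(a.get(i), b.get(j));
            if (cmp == 0) {
                return a.get(i); // first match in this order is the longest one
            }
            if (cmp < 0) {
                i++; // a's entry sorts earlier, so it cannot appear later in b
            } else {
                j++; // b's entry sorts earlier, so it cannot appear later in a
            }
        }
        return ""; // no common word
    }

    public static void main(String[] args) {
        // Sample data, assumed for the example
        List<String> file1Words = Arrays.asList("apple", "banana", "kiwi", "fig");
        List<String> file2Words = Arrays.asList("kiwi", "cherry", "banana", "plum");
        System.out.println(longestCommon(file1Words, file2Words));
    }
}
```

After the two sorts, the walk itself is linear in the combined size of the lists, compared with the quadratic cost of the nested loops.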
You can use following code:
String matchingString = "";
Set<String> intersection = new HashSet<String>(file1Words);
intersection.retainAll(file2Words); // keep only the words present in both lists
for(String word: intersection)
if(word.length() > matchingString.length())
matchingString = word;
private String getLongestString(List<String> list1, List<String> list2) {
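String longestString = ""; // start with an empty string so length() is safe to call
for (String list1String : list1) {
if (list1String.length() > longestString.length()) {
for (String list2String : list2) {
if (list1String.equals(list2String)) {
longestString = list1String;
break; // match found, no need to scan the rest of list2
}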
}
}
}
return longestString;
}

how to count the number of words of the same(PALINDROME) in java

I'm a new Java developer. I want to write code that counts the palindrome words in a paragraph using Java.
The assumptions are: the user can enter a paragraph containing any number of sentences. Each word is separated by whitespace, and each sentence is separated by a period. Punctuation immediately before or after a word is ignored, while punctuation inside a word is counted.
Sample Input : Otto goes to school. Otto sees a lot of animals at the pets store.
Sample output : Otto = 2 a = 1 Sees = 1
Read the file into your program, split the entries at every space and put them into an ArrayList. Afterwards, apply your palindrome algorithm to each value in your ArrayList and keep track of the words that were palindromes and their occurrences (for example with a 2D array, or an ArrayList of objects that hold both values).
When you've followed these steps, you should pretty much be there. More specific help will probably be given once you've shown attempts of your own.
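For reference, the steps above might be sketched roughly like this. It uses a Map for the bookkeeping instead of a 2D array, strips punctuation only at the edges of each word as the question requires, and lowercases words so that "Otto" counts as a palindrome; the lowercasing, the class name PalindromeWordCounter and the method names are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PalindromeWordCounter {
    // Two-pointer check: compare characters from both ends moving inward.
    static boolean isPalindrome(String word) {
        int left = 0;
        int right = word.length() - 1;
        while (left < right) {
            if (word.charAt(left) != word.charAt(right)) {
                return false;
            }
            left++;
            right--;
        }
        return true;
    }

    static Map<String, Integer> countPalindromes(String paragraph) {
        // LinkedHashMap keeps the words in order of first appearance
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : paragraph.split("\\s+")) {
            // Remove punctuation at the start and end only; inner punctuation stays
            String word = token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "").toLowerCase();
            if (!word.isEmpty() && isPalindrome(word)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "Otto goes to school. Otto sees a lot of animals at the pets store.";
        for (Map.Entry<String, Integer> entry : countPalindromes(input).entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Single-letter words such as "a" are treated as palindromes here, which matches the sample output in the question.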
Using the Collections framework in Java will reduce the programming effort.
Algorithm:
Read the paragraph into a String variable.
Split the String with a StringTokenizer, using ' ' (space) as the delimiter, and add each word to an ArrayList (a Set would not keep duplicates).
Write a method that returns a boolean (true/false) indicating whether a given String is a palindrome.
Define a Map that holds each palindrome String and the number of times it is repeated.
For each word, check whether it is a palindrome.
If yes,
add the String to the Map, with the palindrome String as the key and the number of occurrences as the value;
else
don't add the String to the Map.
Repeat the same logic until all the words have been processed.
Sample Code:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
public class StringPalindromeCalculator {
private Map<String, Integer> wordsMap = new HashMap<>();
private List<String> wordsList = new ArrayList<>();
private boolean isPalindrome(String inputString) {
// A word is a palindrome if it reads the same when reversed (case is ignored here)
String lower = inputString.toLowerCase();
return new StringBuilder(lower).reverse().toString().equals(lower);
}
public Map<String, Integer> findPalindromeWords(String completeString) {
StringTokenizer wordTokenizer = new StringTokenizer(completeString, " ");
while(wordTokenizer.hasMoreTokens()) {
wordsList.add(wordTokenizer.nextToken());
}
for(String word : wordsList) {
if(isPalindrome(word)) {
// put the word into the Map with a count of 1, or increment its existing count
wordsMap.merge(word, 1, Integer::sum);
}
}
return wordsMap;
}
}
Hope this Helps :)
public class Palindrome {
int count = 0;
public static void main(String[] args) {
String a = "malayalammadyoydaraarasdasdkfjasdsjhtj";
Palindrome palindrome = new Palindrome();
palindrome.countPalin(a);
}
private int countPalin(String str) {
for (int i = 0; i < str.length() - 1; i++) {
char start = str.charAt(i);
String st = "";
st += start;
for (int j = i + 1; j < str.length(); j++) {
st += str.charAt(j);
StringBuffer rev = new StringBuffer(st).reverse();
if (st.equals(rev.toString()) && st.length() > 1) {
System.out.println(st.toString());
count++;
}
}
st = "";
}
System.out.println("Total Count : " + count);
return count;
}
}

Java String Array Mergesort

Hi all, I wrote a mergesort program for a string array that reads in .txt files from the user. What I want to do now is compare both files and print out the words that are in file one but not in file two. For example, apple is in file 1 but not in file 2. I tried storing the words in a string array again and printing that out at the end, but I just can't seem to implement it.
Here is what I have,
FileIO reader = new FileIO();
String words[] = reader.load("C:\\list1.txt");
String list[] = reader.load("C:\\list2.txt");
mergeSort(words);
mergeSort(list);
String x = null ;
for(int i = 0; i<words.length; i++)
{
for(int j = 0; j<list.length; j++)
{
if(!words[i].equals(list[j]))
{
x = words[i];
}
}
}
System.out.println(x);
Any help or suggestions would be appreciated!
If you want to find the words that are in the first array but do not exist in the second, you can do it like this:
boolean notEqual = true;
for(int i = 0; i<words.length; i++)
{
for(int j = 0; j<list.length && notEqual; j++)
{
if(words[i].equals(list[j])) // If the word from file one exists in
{ // file two, we set notEqual to false
notEqual = false; // and terminate the inner loop
}
}
if(notEqual) // If notEqual remained true,
System.out.println(words[i]); // we print the element of file one
// that does not exist in the second file
notEqual = true; // reset the variable to true for checking
} // the other words of file one.
Basically, you take a word from the first file (a string from the array) and check whether there is an equal word in file two. If you find one, you set the control variable notEqual to false, which ends the inner for loop, and the word is not printed. Otherwise, if no word in file two matches the word from file one, the control variable notEqual remains true, so the element is printed after the inner for loop.
You can replace the print statement with one that stores the unique word in an extra array, if you wish.
Another solution, although slower than the first one:
List <String> file1Words = Arrays.asList(words);
List <String> file2Words = Arrays.asList(list);
for(String s : file1Words)
if(!file2Words.contains(s))
System.out.println(s);
You convert your arrays to Lists using the method Arrays.asList, and use the method contains to check whether each word of the first file is in the second file.
Why not just convert the arrays to Sets? Then you can simply do
wordsSet.removeAll(listSet);
Afterwards, wordsSet will contain all the words that do not exist in list2.txt (removeAll modifies the set in place and returns a boolean, not a new set).
Also keep in mind that the set will remove duplicates ;)
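A short sketch of the Set approach, assuming the two String arrays have already been loaded. The class name WordsOnlyInFirst, the method onlyInFirst and the sample arrays are made up for the example; a LinkedHashSet is used so the remaining words keep the order in which they first appeared.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class WordsOnlyInFirst {
    // Returns the distinct words of the first array that do not occur in the second.
    static Set<String> onlyInFirst(String[] words, String[] list) {
        Set<String> wordsSet = new LinkedHashSet<>(Arrays.asList(words));
        Set<String> listSet = new HashSet<>(Arrays.asList(list));
        wordsSet.removeAll(listSet); // modifies wordsSet in place
        return wordsSet;
    }

    public static void main(String[] args) {
        // Sample data, assumed for the example
        String[] words = {"apple", "pear", "apple", "fig"};
        String[] list = {"pear", "plum"};
        System.out.println(onlyInFirst(words, list));
    }
}
```

Because both collections are hash-based, each lookup takes constant time on average, so this avoids the nested loop over both arrays.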
You can also just go through the loop and add the word once you have reached list.length-1 without finding a match.
If it matches, you can break out of the inner loop.
FileIO reader = new FileIO();
String words[] = reader.load("C:\\list1.txt");
String list[] = reader.load("C:\\list2.txt");
mergeSort(words);
mergeSort(list);
//never ever null
String x = "" ;
for(int i = 0; i<words.length; i++)
{
for(int j = 0; j<list.length; j++)
{
if(words[i].equals(list[j]))
break;
if(j == list.length-1)
x += words[i] + " ";
}
}
System.out.println(x);
Here is a version (though it does not use sorting)
String[] file1 = {"word1", "word2", "word3", "word4"};
String[] file2 = {"word2", "word3"};
List<String> l1 = new ArrayList<>(Arrays.asList(file1));
List<String> l2 = Arrays.asList(file2);
l1.removeAll(l2);
System.out.println("Not in file2 " + l1);
it prints
Not in file2 [word1, word4]
This looks kind of close. What you're doing is, for every string in words, comparing it to every word in list, so if even one string in list differs from the current word, x gets set.
What I'd suggest is changing if(!words[i].equals(list[j])) to if(words[i].equals(list[j])). Then you know that the string in words appears in list, so you don't need to display it. If you cycle completely through list without seeing the word, then you know you need to display it. So something like this:
for(int i = 0; i<words.length; i++)
{
boolean wordFoundInList = false;
for(int j = 0; j<list.length; j++)
{
if(words[i].equals(list[j]))
{
wordFoundInList = true;
break;
}
}
if (!wordFoundInList) {
System.out.println(words[i]);
}
}
