I am writing an NLP program that uses a function eliminateStopWords(). This function reads from a 2D array "sentTokens" of detected tokens. In the code below, index i is the sentence number and j indexes each token in the ith sentence.
Here is what eliminateStopWords() does:
1. It reads stop words from a text file and stores them in a TreeSet.
2. It reads tokens from the sentTokens array and checks them for stop words. If two tokens form a collocation, they are not checked for stop words; they are just dumped into the finalTokens array. If they are not a collocation, they are checked individually and added to finalTokens only if they are not stop words.
The problem occurs in the loop of step 2. Here is the code (I have marked // HERE at the location where the error actually occurs, near the end):
private void eliminateStopWords() {
try {
// Loading TreeSet for stopwords from the file.
stopWords = new TreeSet<String> ();
fin = new File("stopwords.txt");
fScan = new Scanner(fin);
while (fScan.hasNextLine())
stopWords.add(fScan.nextLine());
fScan.close();
/* Test code to print all read stopwords
iter2 = stopWords.iterator();
while (iter2.hasNext())
System.out.println(iter2.next()); */
int k=0,m=0; // additional indices for finalTokens array
System.out.println(NO_OF_SENTENCES);
newSentence: for(i=0; i < NO_OF_SENTENCES; i++)
{
System.out.println("i = " + i);
for (j=0; j < sentTokens[i].length; j+=2)
{
System.out.println("j = " + j);
// otherwise, get two successive tokens
String currToken = sentTokens[i][j];
String nextToken = sentTokens[i][j+1];
System.out.println("i = " + i);
System.out.println(currToken + " " + nextToken);
if ( isCollocation(currToken, nextToken) ) {
// if the current and next tokens form a bigram collocation, they are not checked for stop words
// but are directly dumped into finalTokens array
finalTokens[k][m] = currToken; m++;
finalTokens[k][m] = nextToken; m++;
}
if ( !stopWords.contains(currToken) )
{ finalTokens[k][m] = currToken; m++; }
if ( !stopWords.contains(nextToken) )
{ finalTokens[k][m] = nextToken; m++; }
// if current token is the last in the sentence, do not check for collocations, only check for stop words
// this is done to avoid ArrayIndexOutOfBounds Exception in sentences with odd number of tokens
// HERE
System.out.println("i = " + i);
if ( j==sentTokens[i].length - 2) {
String lastToken = sentTokens [i][++j];
if (!stopWords.contains(lastToken))
{ finalTokens[k][m] = lastToken; m++; }
// after analyzing last token, move to analyzing the next sentence
continue newSentence;
}
}
k++; // next sentence in finalTokens array
}
// Test code to print finalTokens array
for(i=0; i < NO_OF_SENTENCES; i++) {
for (j=0; j < finalTokens[i].length; j++)
System.out.print( finalTokens[i][j] + " " );
System.out.println();
}
}
catch (Exception e) {
e.printStackTrace();
}
}
I print the indices i and j at the entry of their respective for loops. Everything works fine at the start of the first iteration, but when I print i again near the end of that iteration, it comes out as 14. So i:
starts the first iteration at 0,
does not get manipulated anywhere in the loop,
and by the end of (only) the first iteration prints as 14.
This is seriously the weirdest error I have ever come across while working with Java. It throws an ArrayIndexOutOfBoundsException just before the final if block. It's like magic: you do nothing to the variable in the code, and still its value changes. How can this happen?
You never declared i or j in your code, which leads me to believe that they are fields.
I'm pretty sure that some of your other methods re-use those variables and thus mess with your result. isCollocation looks like a candidate for that.
The counters in for loops should always be local variables, ideally declared inside the for statement itself (for minimal scope). Everything else is just asking for trouble (as you see).
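To illustrate the point about field counters, here is a minimal sketch. The class LoopCounterDemo and the helper helperUsingField() are hypothetical stand-ins (the real isCollocation is not shown in the question), so this only demonstrates the suspected mechanism, not the asker's actual code.

```java
public class LoopCounterDemo {
    // Hypothetical field used as a loop counter, as the question's code appears to do.
    private int i;

    // Hypothetical helper that reuses the same field for its own loop.
    private boolean helperUsingField() {
        for (i = 0; i < 14; i++) {
            // unrelated work would go here
        }
        return false; // the field i is now 14
    }

    // Buggy variant: the outer loop shares the field i with the helper.
    public int iterationsWithField(int limit) {
        int iterations = 0;
        for (i = 0; i < limit; i++) {
            helperUsingField(); // silently overwrites the outer counter
            iterations++;
        }
        return iterations;
    }

    // Fixed variant: the counter is declared inside the for statement.
    public int iterationsWithLocal(int limit) {
        int iterations = 0;
        for (int i = 0; i < limit; i++) {
            helperUsingField(); // changes only the field, not this local i
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        LoopCounterDemo demo = new LoopCounterDemo();
        System.out.println(demo.iterationsWithField(5)); // prints 1
        System.out.println(demo.iterationsWithLocal(5)); // prints 5
    }
}
```

With the field, the outer loop ends after a single pass because the helper leaves i at 14; with a local counter, the helper cannot touch it.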
public static void CountWordFrequency(ArrayList<String> UserString) {
//creating an array list to store every word
//each element in the UserString is one line
ArrayList<String> words_storage = new ArrayList<String>();
String words[]= {};
for(int i=0;i<UserString.size();i++) {//this is outer loop to access every line of the ArrayList
//we need to split the line and put them inside the array String
words = UserString.get(i).split("\\s");
//we still need to work with the "\'" , the upper case, and the dot and comma
for(int j=0;j<words.length;j++) {
for(int k=0;k<words[j].length();k++) {//access every character of one word
if(Character.isUpperCase(words[j].charAt(k))) {//first I want to convert them to Lower Case first
words[j]=words[j].toLowerCase();
}
if(!Character.isLetterOrDigit(words[j].charAt(k)) && words[j].charAt(k)!=',' && words[j].charAt(k)!= '.') {
//I am separating the comma and dot situations with the ' \' '
//need more work on this
if(words[j].compareTo("can't")==0) {
words[j]=words[j].replace(words[j].charAt(k), '\0');
words[j]=words[j].replace(words[j].charAt(k+1), '\0');
words[j] = "can";
words_storage.add("not");
}
else {
words[j]=words[j].replace(words[j].charAt(k), '\0');
words_storage.add("is");
}
}
//now if the that character is comma or dot
if(words[j].charAt(k)==',' ||words[j].charAt(k)=='.') {
words[j]=words[j].replace(words[j].charAt(k), '\0');
}
}//done with one-word loop
}
//now we need to store every element of the String Array inside the array list
for(int j=0;j<words.length;j++) {
words_storage.add(words[j]);
}
}//this is the end of the outer loop
//since it's harder to change the content of element in array list compared to array
//we need to store elements in another array
String[] array = new String[words_storage.size()];
for(int a =0;a<words_storage.size();a++) {
array[a] = words_storage.get(a);
}
//now when we are done with storing elements, we need to sort alphabetically
for(int a=0;a<array.length;a++) {
for(int b = a+1;b<array.length;b++) {
if(array[a].compareTo(array[b])>0) {
String temp = array[a];
array[a] = array[b];
array[b] = temp;
}
}
}
//now we count the frequency of each element in the Array array
int marker = 0;//marker will help me skip the word that already counted in the frequency
for(int x =0;x<array.length;x=marker) {
int counter = 1;
for(int y =x+1; y< array.length;y++) {
if(array[x].compareTo(array[y])==0) {//if they have the same content then we increase the counter and mark the y
counter++;
marker = y+1;
}
}
if(counter==1) {//if we did not find any similar word, we need to increase the marker by one to check on the next word
marker++;
}
System.out.println(array[x]+":"+counter); //now just print it out
}
}
Hey guys,
I am trying to count word frequency in a given input that has many lines. I stored the lines in an ArrayList and pass it in as a parameter.
First, I sort the words alphabetically.
Right now, I am trying to remove the character ' from the word can't, but it doesn't seem to work. I tried the replace method, but it leaves a blank when I replace the character with '\0'.
Hopefully someone has a solution. Thanks in advance.
Just use compareTo() or compareToIgnoreCase() method to find the word.
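On the cleanup step from the question: replacing a character with '\0' does not delete it. It swaps in a NUL character, so the string keeps its length and prints with a gap. Replacing with the empty string removes the character. The sketch below is only an illustration; the class TokenCleaner and the method clean are made-up names, and it simply strips the apostrophe, comma, and period rather than expanding can't into can and not.

```java
import java.util.Locale;

public class TokenCleaner {
    // Lowercases a token and removes apostrophes, commas, and periods.
    public static String clean(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        // String.replace(CharSequence, CharSequence) with "" deletes every match.
        return lower.replace("'", "").replace(",", "").replace(".", "");
    }

    public static void main(String[] args) {
        System.out.println(clean("Can't,"));  // prints cant
        // For comparison: '\0' keeps the length at 5 instead of shrinking to 4.
        System.out.println("can't".replace('\'', '\0').length()); // prints 5
    }
}
```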
I want to list every unique word in a text file and how many times each word is found in it.
I tried using nested loops with an if check, but I'm not sure how to skip words that have already been listed and counted.
for (int i = 0; i < words.size(); i++) {
count = 1;
//Count each word in the file and store it in variable count
for (int j = i + 1; j < words.size(); j++) {
if (words.get(i).equals(words.get(j))) {
count++;
}
}
System.out.println("The word " + words.get(i) + " can be found " + count + " times in the file.");
}
The contents of the text file is "Hello world. Hello world.", and the program will print the following:
The word Hello can be found 2 times in the file.
The word world can be found 2 times in the file.
The word Hello can be found 1 times in the file.
The word world can be found 1 times in the file.
I would suggest using a HashMap to solve this problem. Simply put, a HashMap stores key-value pairs, hashes the keys, and has an average lookup complexity of O(1).
Iterate over the list of words only once, storing each encountered word in the HashMap. When you encounter a word, check whether it already exists in the HashMap. If it does not exist, add it to the map with the word itself as the key and 1 as the value.
If the word already exists, increase its value by 1.
After completing the iteration, the HashMap will contain the unique words as keys and their counts as values.
In case you are not familiar with maps in Java: https://www.javatpoint.com/java-hashmap
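The steps described above could be sketched roughly as follows. The class name WordCounter and the method countWords are assumptions for illustration, and the input is taken to be a list of words that has already been split.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCounter {
    // Counts how often each word occurs, visiting the list exactly once.
    public static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            if (counts.containsKey(word)) {
                counts.put(word, counts.get(word) + 1); // seen before: increment
            } else {
                counts.put(word, 1);                    // first occurrence
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Hello", "world", "Hello", "world");
        for (Map.Entry<String, Integer> entry : countWords(words).entrySet()) {
            System.out.println("The word " + entry.getKey()
                    + " can be found " + entry.getValue() + " times in the file.");
        }
    }
}
```

A HashMap does not guarantee any iteration order; if the words should be printed in first-seen order, a LinkedHashMap could be used instead.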
You need to use an ArrayList to store the already found words, and after that, you need to check every word in the file, whether it is present within the ArrayList or not. If the word is present inside the ArrayList, you need to ignore that word. Otherwise, add that word to the ArrayList.
A sample code for you:
static ArrayList<String> found_words=new ArrayList<String>(); // static, because it is used from static methods
public static void main(String arguments[])
{
String data=""; //data from your file
String[] words=data.split("\\s"); //split the string into individual words
for(int i=0;i<words.length;i++)
{
String current_word=words[i];
if(!is_present(current_word))
{
found_words.add(current_word);
int count=1;
for(int j=i+1;j<words.length;j++)
{
if(words[j].equals(words[i]))
++count;
}
System.out.println("The word "+current_word+" can be found "+count+" times in the file.");
}
}
}
static boolean is_present(String word)
{
for(int i=0;i<found_words.size();i++)
{
if(found_words.get(i).equals(word))
return true;
}
return false;
}
You could do this :
public void printWordOccurence(String filePath) throws FileNotFoundException {
if (filePath.isEmpty())
return;
File file = new File(filePath);
Scanner input = new Scanner(file);
HashMap<String, Integer> wordOccurence = new HashMap<>();
while (input.hasNext()) {
wordOccurence.merge(input.next(), 1, Integer::sum);
}
for (String word : wordOccurence.keySet()) {
System.out.println(word + " appears " + wordOccurence.get(word) + " times");
}
}
package Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
public class Stringchar {
public static void main(String[] args) {
int count =0;
String s = "mmamma";
//System.out.println(s.length());
LinkedHashSet<Character> ch = new LinkedHashSet<Character>();
for (int i=0; i<s.length(); i++){
ch.add(s.charAt(i));
}
Iterator<Character> iterator = ch.iterator();
while(iterator.hasNext()){
Character st = (Character) iterator.next();
for (int k=0; k<s.length() ; k++){
if(charAt(k)== st){ // Why this charAt method is not working?
count = count+1;
}
if(count>1) {
System.out.println("Occurance of "+ st + "is" + count);
}
}
}
}
}
I am new to coding, so this may be a silly question. I have written code that tries to print each character that occurs more than once in a string, together with its number of occurrences, using sets, but I am running into issues. Please help.
Here:
charAt(k);
is basically the same as
this.charAt(k);
In other words: you are trying to invoke a method on the class this code sits in.
I assume you intended to do someStringVariable.charAt(k) instead! (Sure, you meant s.charAt(), but s is a terrible name for a variable because it tells the reader nothing. Your variables are your pets; give them names that mean something!)
The method charAt is not static and needs to be called on a given String; otherwise, how would it know where to look for the character at that index?
str.charAt(index);
Also, the print statement would be better placed after the for loop that counts the occurrences; otherwise you'll get a print at each occurrence.
for (int k=0; k<s.length() ; k++){
if(s.charAt(k) == st){
count = count+1;
}
}
if(count>1) {
System.out.println("Occurance of "+ st + "is" + count);
}
I suppose you want to check, how often the Character appears in your string (String s = "mmamma";).
The charAt() method has to be applied on a String object, so you have to change the if condition from this:
if(charAt(k) == st)
To this:
if(s.charAt(k) == st)
The problem is that charAt(k) is not being called on the string. The variable st is a single Character, so there is nothing to index into there; the call has to be made on s. Additionally, the LinkedHashSet alone cannot give you the counts, because a set stores each character only once. You also need a collection that keeps the duplicates, such as an ArrayList.
This is probably not the most efficient solution, but it will accomplish what you are trying to do with the HashSet:
String s = "mmamma";
List<Character> characterList = new ArrayList<>();
LinkedHashSet<Character> characterLinkedHashSet = new LinkedHashSet<>();
for(char c : s.toCharArray()) {
characterLinkedHashSet.add(c);
characterList.add(c);
}
for (Character character : characterLinkedHashSet) {
int frequency = Collections.frequency(characterList, character);
System.out.println("The frequency of char " + character + " is " + frequency);
}
This creates your LinkedHashSet as well as an ArrayList. The ArrayList stores all of the characters, and the LinkedHashSet stores only one instance of each Character. We can then loop over the set and look up each character's frequency in the ArrayList.
You have to correct your code like so,
while (iterator.hasNext()) {
int count = 0;
Character st = (Character) iterator.next();
for (int k = 0; k < s.length(); k++) {
if (s.charAt(k) == st) { // charAt is now called on the String s
count++;
}
}
if (count > 1) {
System.out.println("Occurance of " + st + " is: " + count);
}
}
The charAt method is defined in the String class, so you have to call it on a String reference. I have made a few more improvements to the code too. The count variable is now declared inside the while loop, which is less error-prone and resets the count for each character. Finally, notice that I have moved the if statement out of the for loop, since it prints spurious intermediate results if it is kept inside.
I'm writing a method that counts how many times a String element shows up in a LinkedList of Strings. My code, shown below, does not work. I keep getting an index out of bounds exception on the line I commented on below, and I can't seem to find the bug.
public int findDuplicate (LinkedList<String> e) {
int j = 1;
LinkedList<String> test = e;
while (!test.isEmpty()){
test = e;
String value = test.pop();
//Screws up here when i = 6
for(int i =0; i<=test.size() && test.get(i)!=null; i++){
String value3 = test.get(i);
if(e.get(i).equals(value) && i<=test.size()){
String value2 = test.get(i);
j++;
String Duplicate = e.get(i);
e.remove(i);
}
}
System.out.println(value + " is listed " + j + " times");
}
return j;
}
Using HashMaps; this still doesn't work:
public void findDuplicate (LinkedList e) {
Map<String,Integer> counts = new HashMap<String,Integer>();
while(!e.isEmpty()){
String value = e.pop();
for(int i =0; i<e.size(); i++){
counts.put(value, i);
}
}
System.out.println(counts.toString());
}
My code should go through the linked list, find out how many times each element appears, and delete duplicates from the list at the same time. It should then print each element and the number of times it appears in the list. I posted about this last night but didn't get a response yet. Sorry for the repost.
You are running off the end of the list. Change
for(int i =0; i<=test.size() && test.get(i)!=null; i++){
to
for(int i =0; i< test.size() && test.get(i)!=null; i++){
Valid indexes for a List (or an array) are 0 through size() - 1.
Regarding your hashmap example to count the duplicates:
@Test
public void countOccurrences() {
LinkedList<String> strings = new LinkedList<String>(){{
add("Fred");
add("Fred");
add("Joe");
add("Mary");
add("Mary");
add("Mary");
}};
Map<String,Integer> count = count(strings,new HashMap<String,Integer>());
System.out.println("count = " + count);
}
private Map<String, Integer> count(List<String> strings, Map<String, Integer> runningCount) {
if(strings.isEmpty()) {
return runningCount;
}
String current = strings.get(0);
int startingSize = strings.size();
while(strings.contains(current)) {
strings.remove(current);
}
runningCount.put(current, startingSize - strings.size());
return count(strings,runningCount);
}
If you want the original strings list preserved you could do
Map<String,Integer> count = count(new LinkedList<String>(strings),new HashMap<String,Integer>());
System.out.println("strings = " + strings);
System.out.println("count = " + count);
Check out Google's Guava collections, which has a perfect class for maintaining a map and getting a count:
https://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained#BiMap
Multiset<String> wordsMultiset = HashMultiset.create();
wordsMultiset.addAll(words);
// now we can use wordsMultiset.count(String) to find the count of a word
I hope you realize what the test = e statement is doing. After this statement executes, both test and e refer to the same object.
If either of them modifies the list, the other sees the change, as they are both looking at the same object.
If this is not intended, you need to clone the list before assigning it to another list reference.
This doesn't affect your out of bounds issue, but you are removing elements from your list while still iterating over it. If you remove an element, you should call i-- afterwards, or you will skip the next element (which is re-indexed) during evaluation.
Also of note regarding your code: I see you are trying to make a copy of your list, but plain assignment means test and e both point to the same instance. You need to make an actual copy, for example with the copy constructor new LinkedList<String>(e) or with Collections.copy().
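As a rough sketch of the two points above (working on a real copy, and removing safely during iteration), the following uses the LinkedList copy constructor and Iterator.remove(). The class DuplicateCounter and the method countOccurrences are made-up names, and the list is assumed to contain no null elements.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.Map;

public class DuplicateCounter {
    // Counts each distinct string without modifying the caller's list.
    public static Map<String, Integer> countOccurrences(LinkedList<String> original) {
        // The copy constructor creates an independent list; "work = original" would not.
        LinkedList<String> work = new LinkedList<>(original);
        Map<String, Integer> counts = new LinkedHashMap<>();
        while (!work.isEmpty()) {
            String value = work.pop();
            int count = 1;
            Iterator<String> it = work.iterator();
            while (it.hasNext()) {
                if (it.next().equals(value)) {
                    it.remove(); // safe removal: no index bookkeeping, no skipped elements
                    count++;
                }
            }
            counts.put(value, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        LinkedList<String> names = new LinkedList<>(
                Arrays.asList("Fred", "Fred", "Joe", "Mary", "Mary", "Mary"));
        System.out.println(countOccurrences(names)); // prints {Fred=2, Joe=1, Mary=3}
        System.out.println(names.size());            // prints 6
    }
}
```

Using the iterator's own remove() avoids both the i-- adjustment and the off-by-one bound check, since no indexes are involved.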
Hi all, I wrote a mergesort program for a String array that reads in .txt files supplied by the user. What I want to do now is compare both files and print out the words that are in file one but not in file two; for example, apple is in file 1 but not in file 2. I tried storing the result in a String array and printing that out at the end, but I just can't seem to implement it.
Here is what I have,
FileIO reader = new FileIO();
String words[] = reader.load("C:\\list1.txt");
String list[] = reader.load("C:\\list2.txt");
mergeSort(words);
mergeSort(list);
String x = null ;
for(int i = 0; i<words.length; i++)
{
for(int j = 0; j<list.length; j++)
{
if(!words[i].equals(list[j]))
{
x = words[i];
}
}
}
System.out.println(x);
Any help or suggestions would be appreciated!
If you want to find the words that are in the first array but do not exist in the second, you can do it like this:
boolean notEqual = true;
for(int i = 0; i<words.length; i++)
{
for(int j = 0; j<list.length && notEqual; j++)
{
if(words[i].equals(list[j])) // If the word from file one exists in
{ // file two, we set notEqual to false
notEqual = false; // and terminate the inner loop
}
}
if(notEqual) // If notEqual remained true,
System.out.println(words[i]); // we print the element of file one
// that does not exist in the second file
notEqual = true; // reset the variable to true to check
} // the other words of file one.
Basically, you take a word from the first file (a string from the array) and check whether there is an equal word in file two. If you find one, you set the control variable notEqual to false, which ends the inner for loop, and the word is not printed. Otherwise, if no word in file two matches the word from file one, notEqual is still true, so the element is printed after the inner loop.
You can replace the print statement with one that stores the unique word in an extra array, if you wish.
Another solution, although slower than the first one:
List <String> file1Words = Arrays.asList(words);
List <String> file2Words = Arrays.asList(list);
for(String s : file1Words)
if(!file2Words.contains(s))
System.out.println(s);
You convert your arrays to Lists using the method Arrays.asList, and use the method contains to check whether a word from the first file is in the second file.
Why not just convert the arrays to Sets? Then you can simply do
wordsSet.removeAll(listSet);
Afterwards, wordsSet will contain all the words that do not exist in list2.txt. (Note that removeAll returns a boolean and modifies wordsSet in place, so there is no separate result set to assign.)
Also keep in mind that a set will remove duplicates ;)
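A minimal sketch of the set-based approach, assuming the words are already loaded into String arrays. The class SetDifference and the method onlyInFirst are illustrative names. Note that removeAll modifies the set it is called on and returns only a boolean, so the difference is read from the set itself.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class SetDifference {
    // Returns the words that occur in first but not in second, sorted and without duplicates.
    public static Set<String> onlyInFirst(String[] first, String[] second) {
        Set<String> result = new TreeSet<>(Arrays.asList(first)); // sorted copy of first
        result.removeAll(new HashSet<>(Arrays.asList(second)));   // drop everything found in second
        return result;
    }

    public static void main(String[] args) {
        String[] file1 = {"word1", "word2", "word3", "word4"};
        String[] file2 = {"word2", "word3"};
        System.out.println("Not in file2 " + onlyInFirst(file1, file2));
        // prints: Not in file2 [word1, word4]
    }
}
```

A TreeSet keeps the result in sorted order, so a separate mergeSort call is not needed for this variant.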
You can also just go through the inner loop and add the word once you have reached list.length-1 without a match,
and break out of the inner loop as soon as it matches:
FileIO reader = new FileIO();
String words[] = reader.load("C:\\list1.txt");
String list[] = reader.load("C:\\list2.txt");
mergeSort(words);
mergeSort(list);
// start with an empty string, never null
String x = "" ;
for(int i = 0; i<words.length; i++)
{
for(int j = 0; j<list.length; j++)
{
if(words[i].equals(list[j]))
break;
if(j == list.length-1)
x += words[i] + " ";
}
}
System.out.println(x);
Here is a version (though it does not use sorting)
String[] file1 = {"word1", "word2", "word3", "word4"};
String[] file2 = {"word2", "word3"};
List<String> l1 = new ArrayList<>(Arrays.asList(file1));
List<String> l2 = Arrays.asList(file2);
l1.removeAll(l2);
System.out.println("Not in file2 " + l1);
it prints
Not in file2 [word1, word4]
This looks kind of close. For every string in words, you compare it to every string in list, so if even one string in list differs from the current word, x gets set.
What I'd suggest is changing if(!words[i].equals(list[j])) to if(words[i].equals(list[j])). When that condition is true, you know that the string in words appears in list, so you don't need to display it. If you cycle completely through list without seeing the word, then you know you need to display it. So something like this:
for(int i = 0; i<words.length; i++)
{
boolean wordFoundInList = false;
for(int j = 0; j<list.length; j++)
{
if(words[i].equals(list[j]))
{
wordFoundInList = true;
break;
}
}
if (!wordFoundInList) {
System.out.println(words[i]);
}
}