I'm looking for some assistance. I've written a program that uses two classes that I've also made. The first, CollectionOfWords, reads in text files and stores the words they contain in a HashMap. The second, WordFrequencies, uses an object (the collection) created from the CollectionOfWords class; it reads in another document and checks whether the document's contents are in the collection. It then outputs an ArrayList with the frequencies counted in the document.
While this works and returns the frequencies of the words found in both the collection and the document, I'd like it to also produce zero values for the words that are in the collection but not in the document, if that makes sense. For example, test3 currently returns [1, 1, 1], but I'd like it to return [1, 0, 0, 0, 1, 0, 1], where the zeroes represent words that are in the collection but not found in test3.
The test text files I use can be found here:
https://drive.google.com/open?id=1B1cDpjmZZo01HizxJUSWSVIlHcQke2mU
Cheers
WordFrequencies
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Scanner;

public class WordFrequencies {

    static HashMap<String, Integer> collection = new HashMap<>();

    private static ArrayList<Integer> processDocument(String inFileName) throws IOException {
        // Resets the collection's frequency values to zero
        collection.clear();
        // Reads in the new document file to an ArrayList
        Scanner textFile = new Scanner(new File(inFileName));
        ArrayList<String> file = new ArrayList<String>();
        while (textFile.hasNext()) {
            file.add(textFile.next().trim().toLowerCase());
        }
        /* Iterates the ArrayList of words and updates collection with the
           frequency of words in the document */
        for (String word : file) {
            Integer dict = collection.get(word);
            if (!collection.containsKey(word)) {
                collection.put(word, 1);
            } else {
                collection.put(word, dict + 1);
            }
        }
        textFile.close();
        // Stores the frequency values in an ArrayList
        ArrayList<Integer> values = new ArrayList<>(collection.values());
        return values;
    }

    public static void main(String[] args) {
        // Stores text files for the dictionary (collection of words)
        List<String> textFileList = Arrays.asList("Test.txt", "Test2.txt");
        // Declares empty ArrayLists for the output of the processDocument function
        ArrayList<Integer> test3 = new ArrayList<Integer>();
        ArrayList<Integer> test4 = new ArrayList<Integer>();
        // Creates a new CollectionOfWords object called dictionary
        CollectionOfWords dictionary = new CollectionOfWords(collection);
        // Reads in the listed text files and processes them
        for (String text : textFileList) {
            dictionary.scanFile(text);
        }
        try {
            test3 = processDocument("test3.txt");
            test4 = processDocument("test4.txt");
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(test3);
        System.out.println(test4);
    }
}
CollectionOfWords
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Scanner;

public class CollectionOfWords {

    // Declare the map in a higher scope (making it a property of the object)
    private HashMap<String, Integer> collection = new HashMap<String, Integer>();

    // Assigns the value of the parameter to the field of the same name
    public CollectionOfWords(HashMap<String, Integer> collection) {
        this.collection = collection;
    }

    // Gets the input text file, removes whitespace and adds the words to the dictionary object
    public void scanFile(String textFileName) {
        try {
            Scanner textFile = new Scanner(new File(textFileName));
            while (textFile.hasNext()) {
                collection.put(textFile.next().trim(), 0);
            }
            textFile.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    public void printDict(HashMap<String, Integer> dictionary) {
        System.out.println(dictionary.keySet());
    }
}
I didn't go through the trouble of figuring out your entire code, so sorry if this answer is stupid.
As a solution to your problem, you could initialise the map with every word in the dictionary mapping to zero. Right now you use the clear method on the HashMap; this does not set every value to zero, it removes all of the mappings.
The following code should work; use it instead of collection.clear():
for (Map.Entry<String, Integer> entry : collection.entrySet()) {
    entry.setValue(0);
}
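For reference, a minimal sketch of how processDocument might then look (the assumption that only words already in the collection should be counted is mine, so that the values line up with the collection's keys):

private static ArrayList<Integer> processDocument(String inFileName) throws IOException {
    // Reset every existing frequency to zero instead of clearing the map
    for (Map.Entry<String, Integer> entry : collection.entrySet()) {
        entry.setValue(0);
    }
    Scanner textFile = new Scanner(new File(inFileName));
    while (textFile.hasNext()) {
        String word = textFile.next().trim().toLowerCase();
        // Only count words that are present in the collection; other words are ignored
        if (collection.containsKey(word)) {
            collection.put(word, collection.get(word) + 1);
        }
    }
    textFile.close();
    // The values now include zeroes for collection words not seen in the document
    return new ArrayList<>(collection.values());
}

Bear in mind that a HashMap does not guarantee any particular ordering, so the order of the zeroes depends on the map's iteration order.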
Related
I have a method which creates a stopword list containing the 10% most frequent words from the "lemmas" key in my JSON file, which looks like this:
{ ..
  "lemmas": {
    "doc41": "the dynamically expand when there too many collision i e have distinct hash code but fall into same slot modulo size expect average effect",
    "doc40": "retrieval operation include get generally do block so may overlap update operation include put remove retrieval reflect result any non null k new longadder increment",
    "doc42": "a set projection" ..
  }
}
private static List<String> StopWordsFile(ConcurrentHashMap<String, String> lemmas) {
    // ConcurrentHashMap stores each word and its frequency
    ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<String, Integer>();
    // ArrayList for all the individual words
    ArrayList<String> corpus = new ArrayList<String>();
    for (Entry<String, String> entry : lemmas.entrySet()) {
        String line = entry.getValue().toLowerCase();
        line = line.replaceAll("\\p{Punct}", " ");
        line = line.replaceAll("\\d+", " ");
        line = line.replaceAll("\\s+", " ");
        line = line.trim();
        String[] value = line.split(" ");
        List<String> words = new ArrayList<String>(Arrays.asList(value));
        corpus.addAll(words);
    }
    // Count all the words in the corpus and store each word with its frequency in counts
    for (String word : corpus) {
        if (counts.keySet().contains(word)) {
            counts.put(word, counts.get(word) + 1);
        } else {
            counts.put(word, 1);
        }
    }
    // Create a list to store all the words with their frequency and sort it by values.
    List<Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
    list.sort((e2, e1) -> e1.getValue().compareTo(e2.getValue()));
    List<Entry<String, Integer>> stopwordslist = new ArrayList<>(list.subList(0, (int) (0.10 * list.size())));
    // Create the stopwords list with the 10% most frequent words
    List<String> stopwords = new ArrayList<>();
    // for (Map.Entry<String, Integer> e : sublist) {
    for (ConcurrentHashMap.Entry<String, Integer> e : stopwordslist) {
        stopwords.add(e.getKey());
    }
    System.out.println(stopwords);
    return stopwords;
}
It outputs these words:
[the, of, value, v, key, to, given, a, k, map, in, for, this, returns, if, is, super, null, ... that, none]
I want to add single digits to it, such as '1,2,3,4,5,6,7,8,9', and/or another stopwords.txt file containing digits.
How can I do that?
Also, how can I output this stopwords list to a CSV file? Can someone point me in the right direction?
I'm new to Java.
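A minimal sketch of one way to do both, assuming the stopwords list produced by the method above and a hypothetical output file named stopwords.csv (requires java.io.FileWriter, java.io.IOException and java.io.PrintWriter):

// Append the single digits 1-9 to the existing stopwords list
for (int digit = 1; digit <= 9; digit++) {
    stopwords.add(String.valueOf(digit));
}
// Write the list to a CSV file as one comma-separated row
try (PrintWriter out = new PrintWriter(new FileWriter("stopwords.csv"))) {
    out.println(String.join(",", stopwords));
} catch (IOException e) {
    e.printStackTrace();
}

A separate stopwords.txt file could be read with a Scanner and its lines added to the list in the same way.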
I am creating a program that takes two .txt files and prints out the words that appear in both texts, along with the number of times each shared word appears in each text. I declared two File objects with valid paths. However, when I try to create two Scanner objects that use the two .txt files, I get FileNotFoundException compiler errors on both lines that declare the new Scanner objects.
FYI, I use scannerObject.hasNext() in a while loop that adds each word from scannerObject.next() as a new key in a HashMap variable with a value of 1 or, if the word is already a key in the HashMap, increases the value (number of occurrences) by 1.
I have tried running the following with both file paths and the simple program below runs without error and outputs "It worked! Hehehe":
import java.io.*;
import java.util.*;

public class readingFilesPractice {
    public static void main(String[] args) {
        try {
            File x = new File("C:\\Users\\aravd.000\\Desktop\\Book1.txt");
            Scanner sc = new Scanner(x);
            while (sc.hasNext()) {
                System.out.println(sc.next());
            }
            sc.close();
            System.out.println("It worked! Hehehe");
        } catch (Exception e) {
            System.out.println("Error!");
        }
    }
}
By the way, the .txt files have areas with multiple spaces in succession and strings like "1.".
The code below runs into two FileNotFoundExceptions (without the try and catch blocks), and in Visual Studio, new Scanner(book1) and new Scanner(book2) have a red squiggly line that shows "Unhandled exception type FileNotFoundExceptionJava(16777384)" when I hover over them. My complete code is below for reference.
import java.io.*;
import java.util.*;

public class program1 {
    public static void main(String[] args) {
        try {
            File book1 = new File("C:\\Users\\aravd.000\\Desktop\\Book1.txt");
            File book2 = new File("C:\\Users\\aravd.000\\Desktop\\Book2.txt");
            // Counting the number of occurrences of each word in book1
            Scanner readBook1 = new Scanner(book1);
            HashMap<String, Integer> wordsInBook1 = new HashMap<String, Integer>();
            while (readBook1.hasNext()) {
                String word = readBook1.next();
                if (wordsInBook1.containsKey(word)) {
                    int occurrences = wordsInBook1.get(word) + 1;
                    wordsInBook1.put(word, occurrences);
                } else {
                    wordsInBook1.put(word, 1);
                }
            }
            readBook1.close();
            // Counting the number of occurrences of each word in book2
            Scanner readBook2 = new Scanner(book2);
            HashMap<String, Integer> wordsInBook2 = new HashMap<String, Integer>();
            while (readBook2.hasNext()) {
                String word = readBook2.next();
                if (wordsInBook2.containsKey(word)) {
                    int occurrences = wordsInBook2.get(word) + 1;
                    wordsInBook2.put(word, occurrences);
                } else {
                    wordsInBook2.put(word, 1);
                }
            }
            readBook2.close();
            // Creating two iterators, one for each HashMap
            Iterator wordsInB1Iter = wordsInBook1.entrySet().iterator();
            Iterator wordsInB2Iter = wordsInBook2.entrySet().iterator();
            // Running the wordsInB1Iter iterator to find and delete unique keys in
            // wordsInBook1
            while (wordsInB1Iter.hasNext()) {
                Map.Entry pair = (Map.Entry) wordsInB1Iter.next();
                if (!wordsInBook2.containsKey(pair.getKey())) {
                    wordsInBook1.remove(pair.getKey());
                }
            }
            // Running the wordsInB2Iter iterator to find and delete unique keys
            while (wordsInB2Iter.hasNext()) {
                Map.Entry pair = (Map.Entry) wordsInB2Iter.next();
                if (!wordsInBook1.containsKey(pair.getKey())) {
                    wordsInBook2.remove(pair.getKey());
                }
            }
            System.out.println(wordsInBook1);
            System.out.println(wordsInBook2);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
If the other parts of the code are broken, I wouldn't know, because I haven't debugged that far yet. If you find an error elsewhere, feel free to let me know. Thank you for your effort, and please let me know if anything needs further clarification!
UPDATE: When I changed my catch block to catch Exception e and used e.printStackTrace(), my code output the following:
java.util.ConcurrentModificationException
at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1493)
at java.base/java.util.HashMap$EntryIterator.next(HashMap.java:1526)
at java.base/java.util.HashMap$EntryIterator.next(HashMap.java:1524)
at program1.main(program1.java:50)
(A screenshot of the error descriptions in the "PROBLEMS" tab in Visual Studio gave more details about the issues with my iterators and HashMaps.)
The same answer as @Pedro Borges, but:
Please use generics! Your code is full of casts when it should not need any.
Use Iterator.remove() to remove the current value instead of removing from the source collection. This is the reason you are getting a ConcurrentModificationException.
If you don't need the Map.Entry, you may use keySet() instead (see the sketch after the code below).
You are using Java > 8. If this is Java 11, you may also use var.
Your code:
Iterator<Map.Entry<String, Integer>> wordsInB1Iter = wordsInBook1.entrySet().iterator();
Iterator<Map.Entry<String, Integer>> wordsInB2Iter = wordsInBook2.entrySet().iterator();
// Running the wordsInB1Iter iterator to find and delete unique keys in
// wordsInBook1
while (wordsInB1Iter.hasNext()) {
    Map.Entry<String, Integer> pair = wordsInB1Iter.next();
    if (!wordsInBook2.containsKey(pair.getKey())) {
        wordsInB1Iter.remove();
    }
}
// Running the wordsInB2Iter iterator to find and delete unique keys
while (wordsInB2Iter.hasNext()) {
    Map.Entry<String, Integer> pair = wordsInB2Iter.next();
    if (!wordsInBook1.containsKey(pair.getKey())) {
        wordsInB2Iter.remove();
    }
}
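Building on the keySet() suggestion above, the two loops could also be collapsed into two removeIf calls (a sketch, assuming the same wordsInBook1 and wordsInBook2 maps; removeIf requires Java 8+):

// Remove from each map the words that do not appear in the other map
wordsInBook1.keySet().removeIf(word -> !wordsInBook2.containsKey(word));
wordsInBook2.keySet().removeIf(word -> !wordsInBook1.containsKey(word));

The key-set view is backed by the map, so removing a key removes its entry without any explicit iterator.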
And while I'm at it, you may also consider refactoring how you read words:
By using a method instead of duplicating the code
By using try-with-resources (Java 7+)
By using Map.merge (Java 8+)
As in:
Map<String, Integer> words(File file) throws FileNotFoundException {
    try (Scanner scanner = new Scanner(file)) {
        var result = new HashMap<String, Integer>();
        while (scanner.hasNext()) {
            var word = scanner.next();
            result.merge(word, 1, Integer::sum); // or (a, b) -> a + b
        }
        return result;
    }
}
You may (should?) use a MutableInt (from commons-lang3) to avoid boxing and unboxing between Integer and int, for performance reasons.
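For example, a sketch of that idea, assuming org.apache.commons.lang3.mutable.MutableInt is on the classpath:

// MutableInt starts at 0; increment() mutates it in place, so no re-boxing per word
Map<String, MutableInt> counts = new HashMap<>();
while (scanner.hasNext()) {
    counts.computeIfAbsent(scanner.next(), k -> new MutableInt()).increment();
}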
The ConcurrentModificationException comes from the fact that you are removing elements from a Set while you're iterating over it. That happens because, under the hood, the iterator is backed by the set; it's not a copy of it.
One way to get around it, although not tremendously elegant, is to iterate over a copy of the Set.
If you replace
Iterator wordsInB1Iter = wordsInBook1.entrySet().iterator();
Iterator wordsInB2Iter = wordsInBook2.entrySet().iterator();
with
Iterator wordsInB1Iter = new HashSet<>(wordsInBook1.entrySet()).iterator();
Iterator wordsInB2Iter = new HashSet<>(wordsInBook2.entrySet()).iterator();
you will no longer have concurrent modification.
I am implementing a Java-based synonym finder which stores a thesaurus of 250k words in a map; each associated googleWord from the txt file (1000 words in total) is assigned as a value for a thesaurus word if it is a synonym of that word.
To do that, I iterate over the thesaurus word list, look up each word's synonyms using the WordNet library, and if a google word is one of those synonyms I assign that value to the thesaurus map. The code block is provided below:
#SuppressWarnings("rawtypes")
public TreeMap fetchMap() throws IOException {
generateThesaurusList();
generateGoogleList();
/** loop through the array of Thesaurus Words..*/
for (int i=0; i<thesaurusList.size(); i++) {
SynonymFinder sf = new SynonymFinder();
// find the
ArrayList synonymList = sf.getSynonym(thesaurusList.get(i).toString().trim());
for (int j=0; j<synonymList.size(); j++) {
if (googleList.contains(synonymList.get(j)));
hm.put(thesaurusList.get(i).toString().trim(), synonymList.get(j).toString().trim());
}
}
return hm;
}
But the iteration over the list and the insertion are taking a huge amount of time. Can someone suggest something to make it faster?
I have used a HashMap for the same, but it was also slow.
Note: I have to use some sort of map for storing the data.
Here is my change after the suggestions, but nothing helped:
#SuppressWarnings("rawtypes")
public TreeMap fetchMap() throws IOException {
generateThesaurusList();
generateGoogleList();
Set<String> gWords = new HashSet<>(googleList);
int record =1;
int loopcount=0;
ArrayList thesaurusListing = removeDuplicates(thesaurusList);
Map<String, Set<String>> tWordsWithSynonymsMatchingGoogleWords = new TreeMap<>();
/** loop through the array of Google Words..*/
for (int i=0; i<thesaurusListing.size(); i++) {
SynonymFinder sf = new SynonymFinder();
System.out.println(record);
// find the
ArrayList synonymList = sf.getSynonym(thesaurusListing.get(i).toString().trim());
for (int j=0; j<synonymList.size(); j++) {
if (googleList.contains(synonymList.get(j))) {
/**to avoid duplicate keys*/
tWords.put(thesaurusListing.get(i).toString().trim(), new HashSet<>(synonymList));
}
}
for (String tWord : tWords.keySet()) {
tWords.get(tWord).retainAll(gWords);
tWordsWithSynonymsMatchingGoogleWords.put(tWord, tWords.get(tWord));
}
record++;
}
return (TreeMap) tWordsWithSynonymsMatchingGoogleWords;
}
Your code was missing part of the creation of the entry, which should consist of {key, set} but was {key, value}. Based on what you want to achieve, you need to intersect two sets. Here is an example of how you can approach that:
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Example class wrapping the two methods so the snippet compiles on its own
public class SynonymIntersection {

    public static Map<String, Set<String>> getThesaurusWordsWithSynonymsMatchingGoogleWords(
            Map<String, Set<String>> tWordsWithSynonyms, Set<String> gWords) {
        Map<String, Set<String>> tWordsWithSynonymsMatchingGoogleWords = new TreeMap<>();
        for (String tWord : tWordsWithSynonyms.keySet()) {
            // Keep only the synonyms that also appear in the google word set
            tWordsWithSynonyms.get(tWord).retainAll(gWords);
            tWordsWithSynonymsMatchingGoogleWords.put(tWord, tWordsWithSynonyms.get(tWord));
        }
        return tWordsWithSynonymsMatchingGoogleWords;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> tWords = new HashMap<>();
        tWords.put("B", new HashSet<>(Arrays.asList("d")));
        tWords.put("A", new HashSet<>(Arrays.asList("a", "b", "c")));
        tWords.put("C", new HashSet<>(Arrays.asList("e")));
        Set<String> gWords = new HashSet<>(Arrays.asList("a", "b", "e"));
        System.out.println("Input -> thesaurusWordsWithSynonyms:");
        System.out.println(tWords);
        System.out.println("Input -> googleWords:");
        System.out.println(gWords);
        Map<String, Set<String>> result = getThesaurusWordsWithSynonymsMatchingGoogleWords(tWords, gWords);
        System.out.println("Output -> thesaurusWordsWithSynonymsMatchingGoogleWords:");
        System.out.println(result);
    }
}
To make everything work, you should first trim your thesaurus words and then find their matching synonyms.
I have to write a piece of code for a class that counts the occurrences of characters within an input file and then sorts them by that count. I chose to do this by creating an ArrayList where each Object[] has two elements: the character and the number of occurrences.
I was trying to increment the integer representing the number of occurrences and I just couldn't get that to work.
My current attempt looks like this:
for (int i = 0; i <= text.length(); i++) {
    if (freqlist.contains(text.charAt(i))) {
        freqlist.indexOf(text.charAt(i))[1] = freqlist.get(freqlist.indexOf(text.charAt(i)))[1] + 1;
    }
}
text is just a String containing all of the input file.
freqlist is declared earlier as:
List<Object[]> freqlist = new ArrayList<Object[]>();
So, I was wondering how one could increment or modify an element of an array that is inside an ArrayList.
In general, there are three mistakes in your program which prevent it from working. It cannot work because the for loop has i <= text.length() when it should be i < text.length(); otherwise you will get an exception. The second mistake is that you use freqlist.contains(...), which assumes a character is equal to one of the Object arrays (or, in other words, that the arrays are equal), which is a wrong assumption. The third mistake is using freqlist.indexOf(...), which relies on array equality again. I made the example work, although this data structure, List<Object[]>, is inefficient for the task; it is best to use a Map<Character,Integer> (see the sketch after the example below).
Here it is:
import java.util.ArrayList;
import java.util.List;

class Scratch {
    public static void main(String[] args) {
        String text = "abcdacd";
        List<Object[]> freqlist = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            Object[] objects = find(freqlist, text.charAt(i));
            if (objects != null) {
                objects[1] = (Integer) objects[1] + 1;
            } else {
                freqlist.add(new Object[]{text.charAt(i), 1});
            }
        }
        for (Object[] objects : freqlist) {
            System.out.println(String.format(" %s => %d", objects[0], objects[1]));
        }
    }

    private static Object[] find(List<Object[]> freqlist, Character charAt) {
        for (Object[] objects : freqlist) {
            if (charAt.equals(objects[0])) {
                return objects;
            }
        }
        return null;
    }
}
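As mentioned above, a Map<Character,Integer> is the better fit for this task; here is a sketch of the same counting done that way (same sample text, hypothetical class name):

import java.util.LinkedHashMap;
import java.util.Map;

class ScratchWithMap {
    public static void main(String[] args) {
        String text = "abcdacd";
        // LinkedHashMap keeps first-seen order; merge() inserts 1 or adds 1 to the existing count
        Map<Character, Integer> freq = new LinkedHashMap<>();
        for (int i = 0; i < text.length(); i++) {
            freq.merge(text.charAt(i), 1, Integer::sum);
        }
        freq.forEach((ch, count) -> System.out.println(String.format(" %s => %d", ch, count)));
    }
}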
The way I would do this is first parse the file and convert it to an array of characters. This would then be sent to the charCounter() method which would count the number of times a letter occurs in the file.
/**
 * Calculate the number of times a character is present in a character array
 *
 * @param myChars An array of characters from an input file; this should be parsed and formatted properly
 *                before sending to this method
 * @return A HashMap of all characters with their number of occurrences; if a
 *         letter is not in myChars it is not added to the HashMap
 */
public HashMap<Character, Integer> charCounter(char[] myChars) {
    HashMap<Character, Integer> myCharCount = new HashMap<>();
    if (myChars.length == 0) System.exit(1);
    for (char c : myChars) {
        if (myCharCount.containsKey(c)) {
            // Get the current number for the letter
            int currentNum = myCharCount.get(c);
            // Place the new number plus one into the HashMap
            myCharCount.put(c, (currentNum + 1));
        } else {
            // Place the character in the HashMap with 1 occurrence
            myCharCount.put(c, 1);
        }
    }
    return myCharCount;
}
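A hypothetical usage sketch (the file name input.txt and the whitespace stripping are my assumptions; it needs java.nio.file.Files, java.nio.file.Paths and java.nio.charset.StandardCharsets, and assumes charCounter is reachable from the calling code):

// Read the whole file into a String, drop whitespace, and count the remaining characters
String text = new String(Files.readAllBytes(Paths.get("input.txt")), StandardCharsets.UTF_8);
char[] myChars = text.replaceAll("\\s+", "").toCharArray();
HashMap<Character, Integer> counts = charCounter(myChars);
System.out.println(counts);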
You could use some Stream magic for the grouping, if you are using Java 8:
Map<String, Long> map = dummyString.chars()  // Turn the String into an IntStream
        .boxed()                             // Turn int into Integer to use Collectors.groupingBy
        .collect(Collectors.groupingBy(
                c -> String.valueOf((char) c.intValue()), // Use the character as the key of the map
                Collectors.counting()));                  // Count the occurrences
Now you could sort the result.
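For instance, a sketch of printing that map sorted by count, highest first (the descending order via Comparator.reverseOrder() is my choice here):

// Sort entries by their count in descending order and print them
Comparator<Map.Entry<String, Long>> byCountDesc =
        Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder());
map.entrySet().stream()
        .sorted(byCountDesc)
        .forEach(e -> System.out.println(e.getKey() + " => " + e.getValue()));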
I have a program that takes tracks and how many times each was played and outputs them; simple. But I couldn't make the counts print in descending order. My second problem is that if there are multiple tracks with the same count, it should look at the tracks' names and print them in alphabetical order. I reached the point where I can print everything as it should be, just without the ordering, because I am using maps, and whenever I use a list to sort it out, it gets sorted in ascending order.
Here is my code and output
import java.util.*;
import java.io.*;
import java.lang.*;
import lab.itunes.*;

public class Music {
    public static void main(String[] args) throws Exception {
        try {
            Scanner input = new Scanner(System.in);
            PrintStream output = new PrintStream(System.out);
            Map<String, Integer> mapp = new HashMap<String, Integer>();
            List<Integer> list1 = new ArrayList<Integer>();
            output.print("Enter the name of the iTunes library XML file:");
            String entry = input.nextLine();
            Scanner fileInput = new Scanner(new File(entry));
            Library music = new Library(entry); // this class was given to us.
            Iterator<Track> itr = music.iterator(); // scan through it
            while (itr.hasNext()) {
                Track token = itr.next(); // get the tracks
                mapp.put(token.getName(), token.getPlayCount()); // fill our map
                list1.add(token.getPlayCount()); // fill our list too
            }
            for (Map.Entry<String, Integer> testo : mapp.entrySet()) {
                String keys = testo.getKey();
                Integer values = testo.getValue();
                output.printf("%d\t%s%n", values, keys); // printing the keys and values in random order.
            }
        } catch (FileNotFoundException E) {
            System.out.print("That file does not exist");
        }
    }
}
The output is this:
Enter the name of the iTunes library XML file:library.txt
87 Hotel California
54 Like a Rolling Stone
19 Billie Jean
75 Respect
26 Imagine
19 In the Ghetto
74 Macarena
27 Hey Jude
67 I Gotta Feeling
99 The Twist
Can you please give me a hint for this? I've worked for at least 4 hours to get this far. Thanks.
Does the Library class have a sort() method? If not, you could add one and call sort() on the Library music just before you ask it for its iterator().
public class Library
{
    // ... existing code ...

    public void sort()
    {
        class TrackPlayCountComparator implements Comparator<Track>
        {
            @Override
            public int compare(Track t1, Track t2) {
                int compare = t2.getPlayCount() - t1.getPlayCount();
                if (compare == 0) {
                    return t1.getName().compareTo(t2.getName());
                }
                return compare;
            }
        }
        Collections.sort(this.tracks, new TrackPlayCountComparator());
    }
}
This simplifies your code to:
public static void main(String[] args)
{
    Scanner input = new Scanner(System.in);
    System.out.print("Enter the name of the iTunes library XML file: ");
    String entry = input.nextLine();
    try {
        // Open and close the file so a missing file is caught here
        input = new Scanner(new File(entry));
        input.close();
        Library music = new Library(entry); // this class was given to us.
        music.sort(); // sort the tracks
        PrintStream output = new PrintStream(System.out);
        for (Iterator<Track> itr = music.iterator(); itr.hasNext(); ) {
            Track track = itr.next();
            output.printf("%d\t%s%n", track.getPlayCount(), track.getName());
        }
    } catch (FileNotFoundException E) {
        System.out.print("That file does not exist");
    }
}
I'm assuming your question is: how can I sort a map on the values, rather than the keys?
If so, here is some sample code to get you started:
map.entrySet().stream()
   .sorted(Map.Entry.comparingByValue())
   .map(entry -> entry.getKey() + "\t" + entry.getValue())
   .forEach(output::println);
If you need to sort in reverse order then just change the comparingByValue comparator:
.sorted(Map.Entry.comparingByValue((val1, val2) -> val2 - val1))
To sort by value and then alphabetically:
.sorted((entry1, entry2) -> entry1.getValue().equals(entry2.getValue()) ? entry1.getKey().compareTo(entry2.getKey()) : entry2.getValue() - entry1.getValue())
You could make that a bit neater by putting the comparator in a separate method.
private Comparator<Map.Entry<String, Integer>> songComparator() {
    return (entry1, entry2) -> {
        int difference = entry2.getValue() - entry1.getValue();
        if (difference == 0) {
            return entry1.getKey().compareTo(entry2.getKey());
        } else {
            return difference;
        }
    };
}
You would then use songComparator() to generate the comparator passed to sorted().
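For example, with the variables from your main method (a sketch, assuming the mapp and output variables declared there and that songComparator() is reachable from that context):

mapp.entrySet().stream()
        .sorted(songComparator())
        .forEach(entry -> output.printf("%d\t%s%n", entry.getValue(), entry.getKey()));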
Use Collections.sort() to sort a collection by its natural order, or define a Comparator and pass it as the second argument.
First you must change your List to take the 'Track' type, and you no longer need a Map:
// the list will store every track
List<Track> tracks = new ArrayList<Track>();
String entry = input.nextLine();
Scanner fileInput = new Scanner(new File(entry));
Library music = new Library(entry); // this class was given to us.
Iterator<Track> itr = music.iterator(); // scan through it
while (itr.hasNext()) {
    tracks.add(itr.next()); // add each track
}
// you can define classes anonymously:
Collections.sort(tracks, new Comparator<Track>()
{
    @Override
    public int compare(Track t1, Track t2) {
        int diff = t2.getPlayCount() - t1.getPlayCount();
        // if there is no difference in play count, return name comparison
        return (diff == 0 ? t1.getName().compareTo(t2.getName()) : diff);
    }
});
See Anonymous Classes for more information.
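On Java 8+, the same ordering can also be written without an anonymous class (a sketch using the same Track getters as above):

// Sort by play count descending, then by name ascending for ties
tracks.sort(Comparator.comparingInt(Track::getPlayCount).reversed()
        .thenComparing(Track::getName));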