Map of Map - word pairs in java - stuck - java

I am using a Windows (MS-DOS) command prompt to pipe in a file. It is a regular text file of words (not like abc, def, ghi, etc.).
I am trying to write a program that counts how many times each word pair appears in a text file. A word pair consists of two consecutive words (i.e. a word and the word that directly follows it). In the first sentence of this paragraph, the words “counts” and “how” are a word pair.
What I want the program to do is take this input:
abc def abc ghi abc def ghi jkl abc xyz abc abc abc ---
Should produce this output:
abc:
abc, 2
def, 2
ghi, 1
xyz, 1
def:
abc, 1
ghi, 1
ghi:
abc, 1
jkl, 1
jkl:
abc, 1
xyz:
abc, 1
My input is not going to look like that, though. It will be more like:
"seattle amazoncom is expected to report"
so would I even need to test for "abc"?
My biggest issue is adding entries to the map.
I think I need to use a map of maps, but I am not sure how to do this:
Map<String, Map<String, Integer>> uniqueWords = new HashMap<String, Map<String, Integer>>();
I think the map would produce this output for me, which is exactly what I want:
Key | Value         | Number of times
-------------------------------------
abc | def, ghi, jkl | 3
def | jkl, mno      | 2
If that map is correct, how would I add to it from the file in my situation?
I have tried:
if(words.contain("abc")) // would i even need to test for abc?????
{
uniqueWords.put("abc", words, ?) // not sure what to do about this?
}
This is what I have so far:
import java.util.Scanner;
import java.util.ArrayList;
import java.util.TreeSet;
import java.util.Iterator;
import java.util.HashSet;
public class Project1
{
public static void main(String[] args)
{
Scanner sc = new Scanner(System.in);
String word;
String grab;
int number;
// ArrayList<String> a = new ArrayList<String>();
// TreeSet<String> words = new TreeSet<String>();
Map<String, Map<String, Integer>> uniqueWords = new HashMap<String, Map<String, Integer>>();
System.out.println("project 1\n");
while (sc.hasNext())
{
word = sc.next();
word = word.toLowerCase();
if (word.matches("abc")) // would i even need to test for abc?????
{
uniqueWords.put("abc", word); // syntax incorrect i still need an int!
}
if (word.equals("---"))
{
break;
}
}
System.out.println("size");
System.out.println(uniqueWords.size());
System.out.println("unique words");
System.out.println(uniqueWords.size());
System.out.println("\nbye...");
}
}
I hope someone can help me, because I have been banging my head against this and not learning anything for weeks now. Thank you.

I came up with this solution. I think your idea with the Map may be more elegant, but run this and let's see if we can refine it:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
public class Main {
private static List<String> inputWords = new ArrayList<String>();
private static Map<String, List<String>> result = new HashMap<String, List<String>>();
public static void main(String[] args) {
collectInput();
process();
generateOutput();
}
/*
* Modify this method to collect the input
* however you require it
*/
private static void collectInput(){
// test code
inputWords.add("abc");
inputWords.add("def");
inputWords.add("abc");
inputWords.add("ghi");
inputWords.add("abc");
inputWords.add("def");
inputWords.add("abc");
}
private static void process(){
// Iterate through every word in our input list
for(int i = 0; i < inputWords.size() - 1; i++){
// Create references to this word and next word:
String thisWord = inputWords.get(i);
String nextWord = inputWords.get(i+1);
// If this word is not in the result Map yet,
// then add it and create a new empty list for it.
if(!result.containsKey(thisWord)){
result.put(thisWord, new ArrayList<String>());
}
// Add nextWord to the list of adjacent words to thisWord:
result.get(thisWord).add(nextWord);
}
}
/*
* Rework this method to output results as you need them:
*/
private static void generateOutput(){
for(Entry<String, List<String>> e : result.entrySet()){
System.out.println("Symbol: " + e.getKey());
// Count the number of unique instances in the list:
Map<String, Integer> count = new HashMap<String, Integer>();
List<String> words = e.getValue();
for(String s : words){
if(!count.containsKey(s)){
count.put(s, 1);
}
else{
count.put(s, count.get(s) + 1);
}
}
// Print the occurrences of following symbols:
for(Entry<String, Integer> f : count.entrySet()){
System.out.println("\t following symbol: " + f.getKey() + " : " + f.getValue());
}
}
System.out.println();
}
}

In your table, you have Key | Value | Number of times. Is the "number of times" specific to each of the second words? This may work.
My suggestion in your last question was to use a map of Lists. Each unique word would have an associated List (empty to begin with). At the end of processing you would count up all identical words in the list to get a total:
Key | List of following words
abc | def def ghi mno ghi
Now, you could count identical items in your list to find out that:
abc --> def = 2
abc --> ghi = 2
abc --> mno = 1
I think either this approach or yours would work well. I'll put some code together and update this post if nobody else responds.

You have initialized uniqueWords as a Map of Maps, not a Map of Strings as you are trying to populate it. For your design to work, you need to put a Map<String, Integer> as the value for the "abc" key.
....
Map<String, Map<String, Integer>> uniqueWords = new HashMap<String, Map<String, Integer>>();
System.out.println("project 1\n");
while (sc.hasNext())
{
word = sc.next();
word = word.toLowerCase();
if (word.matches("abc")) // would i even need to test for abc?????
// no, just use the word
{
uniqueWords.put("abc", word); // <-- here you are putting a String value, instead of a Map<String, Integer>
}
if (word.equals("---"))
{
break;
}
}
Instead, you could do something akin to the following brute-force approach:
Map<String, Integer> followingWordsAndCnts = uniqueWords.get(word);
if (followingWordsAndCnts == null) {
followingWordsAndCnts = new HashMap<String,Integer>();
uniqueWords.put(word, followingWordsAndCnts);
}
if (sc.hasNext()) {
word = sc.next().toLowerCase();
Integer cnt = followingWordsAndCnts.get(word);
followingWordsAndCnts.put(word, cnt == null? 1 : cnt + 1);
}
You could make this a recursive method to ensure that each word gets its turn as the following word and the word that is being followed.
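As an alternative to recursion, one way to give every word a turn both as the word being followed and as the following word is to remember the previous word while looping. The following is only a minimal sketch under that assumption, not the asker's or the answerer's code: the class name WordPairCounter and the method countPairs are made up for illustration, and it relies on Map.computeIfAbsent and Map.merge from Java 8.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class WordPairCounter {

    // Counts how often each word is directly followed by each other word.
    // (Hypothetical helper; the name is not from the original post.)
    static Map<String, Map<String, Integer>> countPairs(Scanner sc) {
        Map<String, Map<String, Integer>> pairs = new HashMap<>();
        String previous = null; // no pair exists until two words have been read
        while (sc.hasNext()) {
            String word = sc.next().toLowerCase();
            if (word.equals("---")) {
                break; // end-of-input marker used in the question
            }
            if (previous != null) {
                // Create the inner map on first use, then bump the pair's counter.
                pairs.computeIfAbsent(previous, k -> new HashMap<>())
                     .merge(word, 1, Integer::sum);
            }
            previous = word; // the current word becomes the first half of the next pair
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> pairs = countPairs(new Scanner(System.in));
        for (Map.Entry<String, Map<String, Integer>> outer : pairs.entrySet()) {
            System.out.println(outer.getKey() + ":");
            for (Map.Entry<String, Integer> inner : outer.getValue().entrySet()) {
                System.out.println(inner.getKey() + ", " + inner.getValue());
            }
        }
    }
}
```

Because a HashMap is used, the order in which the words are printed is not guaranteed.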

For each key (e.g. "abc") you want to store another string (e.g. "def", "abc") paired with an integer (1, 2).
I would download Google Collections (now part of Guava) and use a Map<String, Multiset<String>>:
Map<String, Multiset<String>> myMap = new HashMap<String, Multiset<String>>();
...
void addPair(String word1, String word2) {
Multiset<String> set = myMap.get(word1);
if(set==null) {
set = HashMultiset.create();
myMap.put(word1,set);
}
set.add(word2);
}
int getOccurs(String word1, String word2) {
if(myMap.containsKey(word1))
return myMap.get(word1).count(word2);
return 0;
}
If you don't want to use a Multiset, you can create the logical equivalents(for your purposes, not general purpose):
Multiset<String> === Map<String,Integer>
Map<String, Multiset<String>> === Map<String, Map<String,Integer>>

To print your answer in alphabetical order, simply change every HashMap into a TreeMap. For example, change:
new HashMap<String, Map<String, Integer>>();
into
new TreeMap<String, Map<String, Integer>>();
and don't forget to add import java.util.TreeMap;
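To illustrate the effect of that swap, here is a small sketch, assuming the same map-of-maps shape as in the question. The class name SortedPairs and the method countPairs are hypothetical, and the input is hard-coded instead of being read from the piped file.

```java
import java.util.Map;
import java.util.TreeMap;

class SortedPairs {

    // Map-of-maps word pair counter; TreeMap keeps the keys of both
    // the outer and the inner maps in natural (alphabetical) order.
    static Map<String, Map<String, Integer>> countPairs(String[] words) {
        Map<String, Map<String, Integer>> pairs = new TreeMap<>();
        for (int i = 0; i + 1 < words.length; i++) {
            pairs.computeIfAbsent(words[i], k -> new TreeMap<>())
                 .merge(words[i + 1], 1, Integer::sum);
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] words = "abc def abc ghi abc def ghi jkl abc xyz abc abc abc".split(" ");
        // prints {abc={abc=2, def=2, ghi=1, xyz=1}, def={abc=1, ghi=1}, ghi={abc=1, jkl=1}, jkl={abc=1}, xyz={abc=1}}
        System.out.println(countPairs(words));
    }
}
```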

Related

How to add digits to a created stopwords list in Java?

I have a method which creates a stopword list with the 10% of most frequent words from the lemmas key in my JSON file – which looks like this:
{..
,"lemmas":{
"doc41":"the dynamically expand when there too many collision i e have distinct hash code but fall into same slot modulo size expect average effect"
,"doc40":"retrieval operation include get generally do block so may overlap update operation include put remove retrieval reflect result any non null k new longadder increment"
,"doc42":"a set projection"..
}
}
private static List<String> StopWordsFile(ConcurrentHashMap<String, String> lemmas) {
// ConcurrentHashMap stores each word and its frequency
ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<String, Integer>();
// Array List for all the individual words
ArrayList<String> corpus = new ArrayList<String>();
for (Entry<String, String> entry : lemmas.entrySet()) {
String line = entry.getValue().toLowerCase();
line = line.replaceAll("\\p{Punct}", " ");
line = line.replaceAll("\\d+"," ");
line = line.replaceAll("\\s+", " ");
line = line.trim();
String[] value = line.split(" ");
List<String> words = new ArrayList<String>(Arrays.asList(value));
corpus.addAll(words);
}
// count all the words in the corpus and store the words with each frequency in
// the counts
for (String word : corpus) {
if (counts.keySet().contains(word)) {
counts.put(word, counts.get(word) + 1);
} else {
counts.put(word, 1);
}
}
// Create a list to store all the words with their frequency and sort it by values.
List<Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
list.sort((e2, e1) -> e1.getValue().compareTo(e2.getValue()));
List<Entry<String, Integer>> stopwordslist = new ArrayList<>(list.subList(0, (int) (0.10 * list.size())));
// Create the stopwords list with the 10% most frequent words
List<String> stopwords = new ArrayList<>();
// for (Map.Entry<String, Integer> e : sublist) {
for (ConcurrentHashMap.Entry<String, Integer> e : stopwordslist) {
stopwords.add(e.getKey());
}
System.out.println(stopwords);
return stopwords;
}
It outputs these words:
[the, of, value, v, key, to, given, a, k, map, in, for, this, returns, if, is, super, null, ... that, none]
I want to add single digits to it, such as '1,2,3,4,5,6,7,8,9', and/or the contents of another stopwords.txt file containing digits.
How can I do that?
Also, how can I output this stopwords list to a CSV file? Can someone point me in the right direction?
I'm new to Java.
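One possible direction, offered only as a rough sketch: append the digits to the finished list and write the list out with java.nio.file.Files. The class name StopwordsExport, the helpers withDigits and writeCsv, and the file name stopwords.csv are all made up for illustration. The sketch also assumes that no stopword contains a comma or a quote, so no CSV escaping is done.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class StopwordsExport {

    // Returns a copy of the list with the single digits "1".."9" appended,
    // skipping any digit that is already present.
    static List<String> withDigits(List<String> stopwords) {
        List<String> result = new ArrayList<>(stopwords);
        for (int d = 1; d <= 9; d++) {
            String digit = String.valueOf(d);
            if (!result.contains(digit)) {
                result.add(digit);
            }
        }
        return result;
    }

    // Writes all entries as one comma-separated line (UTF-8).
    // Assumption: no entry contains a comma or a quote.
    static void writeCsv(List<String> stopwords, Path target) throws IOException {
        Files.write(target, Collections.singletonList(String.join(",", stopwords)));
    }

    public static void main(String[] args) throws IOException {
        List<String> stopwords = withDigits(Arrays.asList("the", "of", "value"));
        writeCsv(stopwords, Paths.get("stopwords.csv")); // hypothetical output file name
        System.out.println(stopwords);
    }
}
```

If the extra words live in a file with one word per line, Files.readAllLines(path) returns them as a List<String> that could be merged in with addAll before writing.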

How to find the max number of an unique string element in a alphanumeric Array list in java

I have a list that has alphanumeric elements. I want to find the maximum number for each element name individually.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
public class Collect {
public static void main(String[] args) {
List<String> alphaNumericList = new ArrayList<String>();
alphaNumericList.add("Demo.23");
alphaNumericList.add("Demo.1000");
alphaNumericList.add("Demo.12");
alphaNumericList.add("Demo.12");
alphaNumericList.add("Test.01");
alphaNumericList.add("Test.02");
alphaNumericList.add("Test.100");
alphaNumericList.add("Test.99");
Collections.sort(alphaNumericList);
System.out.println("Output "+Arrays.asList(alphaNumericList));
}
}
I need to keep only the values below. I am sorting the list for that, but it sorts the elements as strings rather than by their int value. I would like to achieve this in an efficient way. Please advise.
Demo.1000
Test.100
My current output is:
Output [[Demo.1000, Demo.12, Demo.12, Demo.23, Test.01, Test.02, Test.100, Test.99]]
You can either create a special AlphaNumericList type, wrapping the array list or whatever collection(s) you want to use internally, giving it a nice public interface to work with, or for the simplest case if you want to stick to the ArrayList<String>, just use a Comparator for sort(..):
package de.scrum_master.stackoverflow.q60482676;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import static java.lang.Integer.parseInt;
public class Collect {
public static void main(String[] args) {
List<String> alphaNumericList = Arrays.asList(
"Demo.23", "Demo.1000", "Demo.12", "Demo.12",
"Test.01", "Test.02", "Test.100", "Test.99"
);
Collections.sort(
alphaNumericList,
(o1, o2) ->
((Integer) parseInt(o1.split("[.]")[1])).compareTo(parseInt(o2.split("[.]")[1]))
);
System.out.println("Output " + alphaNumericList);
}
}
This will yield the following console log:
Output [Test.01, Test.02, Demo.12, Demo.12, Demo.23, Test.99, Test.100, Demo.1000]
Please let me know if you don't understand lambda syntax. You can also use an anonymous class instead like in pre-8 versions of Java.
Update 1: If you want to refactor the one-line lambda for better readability, maybe you prefer this:
Collections.sort(
alphaNumericList,
(text1, text2) -> {
Integer number1 = parseInt(text1.split("[.]")[1]);
int number2 = parseInt(text2.split("[.]")[1]);
return number1.compareTo(number2);
}
);
Update 2: If more than one dot "." character can occur in your strings, you need to get the numeric substring in a different way via regex match, still not complicated:
Collections.sort(
alphaNumericList,
(text1, text2) -> {
Integer number1 = parseInt(text1.replaceFirst(".*[.]", ""));
int number2 = parseInt(text2.replaceFirst(".*[.]", ""));
return number1.compareTo(number2);
}
);
Update 3: I just noticed that for some weird reason you put the sorted list into another list via Arrays.asList(alphaNumericList) when printing. I have replaced that by just alphaNumericList in the code above and also updated the console log. Before the output was like [[foo, bar, zot]], i.e. a nested list with one element.
Check below answer:
public static void main(String[] args) {
List<String> alphaNumericList = new ArrayList<String>();
alphaNumericList.add("Demo.23");
alphaNumericList.add("Demo.1000");
alphaNumericList.add("Demo.12");
alphaNumericList.add("Demo.12");
alphaNumericList.add("Test.01");
alphaNumericList.add("Test.02");
alphaNumericList.add("Test.100");
alphaNumericList.add("Test.99");
Map<String, List<Integer>> map = new HashMap<>();
for (String val : alphaNumericList) {
String key = val.split("\\.")[0];
Integer value = Integer.valueOf(val.split("\\.")[1]);
if (map.containsKey(key)) {
map.get(key).add(value);
} else {
List<Integer> intList = new ArrayList<>();
intList.add(value);
map.put(key, intList);
}
}
for (Map.Entry<String, List<Integer>> entry : map.entrySet()) {
List<Integer> valueList = entry.getValue();
Collections.sort(valueList, Collections.reverseOrder());
System.out.print(entry.getKey() + "." + valueList.get(0) + " ");
}
}
Using a stream and the toMap() collector (toMap is Collectors.toMap and maxBy is BinaryOperator.maxBy, both statically imported):
Map<String, Long> result = alphaNumericList.stream().collect(
toMap(k -> k.split("\\.")[0], v -> Long.parseLong(v.split("\\.")[1]), maxBy(Long::compare)));
The result map will contain the word part as the key and the maximum number as the value (in your example the map will contain {Demo=1000, Test=100}).
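For anyone who wants to run this, here is a self-contained sketch of the same toMap idea with the imports spelled out. The class name MaxPerPrefix and the method maxPerPrefix are hypothetical. The merge function has to be a BinaryOperator, so BinaryOperator.maxBy is used (Collectors.maxBy would not fit that parameter). A TreeMap is supplied as the fourth argument only to make the printed order predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

class MaxPerPrefix {

    // Maps each name before the dot to the largest number found after the dot.
    static Map<String, Long> maxPerPrefix(List<String> values) {
        return values.stream().collect(Collectors.toMap(
                v -> v.split("\\.")[0],                 // key: part before the dot
                v -> Long.parseLong(v.split("\\.")[1]), // value: number after the dot
                BinaryOperator.maxBy(Long::compare),    // on duplicate keys keep the larger value
                TreeMap::new));                         // sorted map, for stable output
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList(
                "Demo.23", "Demo.1000", "Demo.12", "Demo.12",
                "Test.01", "Test.02", "Test.100", "Test.99");
        System.out.println(maxPerPrefix(list)); // prints {Demo=1000, Test=100}
    }
}
```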
a. Assuming there are strings of the types Demo. and Test. in your ArrayList.
b. It should be trivial to filter out the elements starting with the string Demo. and then extract the max integer among them.
c. The same applies to extracting the max number associated with Test.
Please check the following snippet of code to achieve this.
Set<String> uniqueString = alphaNumericList.stream().map(c->c.replaceAll("\\.[0-9]*","")).collect(Collectors.toSet());
Map<String,Integer> map = new HashMap<>();
for(String s:uniqueString){
int max= alphaNumericList.stream().filter(c -> c.startsWith(s+".")).map(c -> c.replaceAll(s+"\\.","")).map(c-> Integer.parseInt(c)).max(Integer::compare).get();
map.put(s,max);
}

String sort in which each word in the string contains a number indicating its position in the result

I have to sort a string in which each word contains a number that gives the position of that word in the resulting string. The numbers can be from 1 to 9. The words in the string contain only valid consecutive numbers.
Eg: "is2 this1 test4 3a"
What is the most efficient way to solve this? After splitting the string on spaces, how can I compare and arrange the words using a minimum number of loops?
Try this:
private final static String testString = "is2 this1 test4 3a";
public static void main(String[] args){
String[] splittedString = testString.split(" ");
Map<Integer, String> map = new TreeMap<Integer, String>();
for(String position: splittedString) {
map.put(Integer.parseInt(position.replaceAll("\\D" , "")), position) ; // strip every non-digit character
}
Test the logic:
for(Map.Entry<Integer, String> entry : map.entrySet())
System.out.println(entry.getKey() + " " + entry.getValue());
...
The key here is using a TreeMap (Java's standard sorted Map implementation), which takes care of the ordering.
The rest is fairly obvious, I suppose. The hardest part is the regex that "cleans" each string down to its pure number value.
Last step:
String[] result = map.values().toArray(new String[map.size()]);
System.out.println(Arrays.toString(result).replace(",",""));
Hope it helps!
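Put together in one piece, the TreeMap idea could look like the following sketch. The class name PositionSort and the method sortByEmbeddedDigit are hypothetical, and the code assumes, as the question states, that every word contains exactly one digit from 1 to 9.

```java
import java.util.Map;
import java.util.TreeMap;

class PositionSort {

    // Orders the words by the digit embedded in each of them.
    static String sortByEmbeddedDigit(String input) {
        Map<Integer, String> byPosition = new TreeMap<>();
        for (String word : input.trim().split("\\s+")) {
            // Strip every non-digit character; what remains is the position.
            int position = Integer.parseInt(word.replaceAll("\\D", ""));
            byPosition.put(position, word);
        }
        // TreeMap iterates in ascending key order, so values() is already sorted.
        return String.join(" ", byPosition.values());
    }

    public static void main(String[] args) {
        System.out.println(sortByEmbeddedDigit("is2 this1 test4 3a")); // prints this1 is2 3a test4
    }
}
```

This needs a single loop over the words. The sorting work is done by the TreeMap on insertion.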

How to use keySet() to retrieve a set of keys within a HashMap, loop over it and find its count for each key?

I am nearing the end of my assignment, and one of the last things I have been instructed to do is:
Use keySet() to retrieve the set of keys (the String part of the mapping). Loop over this set and print out the word and its count.
I have used the keySet() method to print out the number of keys within my HashMap, but what I have not done yet is find the length of each word within the HashMap and print out the number of characters in each word. I am currently unsure how I should do this. I'm assuming I would use some type of for loop to iterate through the keySet(), which I have already done, and then use something like the .length() method to find the length of each word and print it out somehow?
Here is my relevant code so far:
Main class
package QuestionEditor;
import java.util.Set;
public class Main{
public static void main (String[] args) {
WordGroup secondWordGroup = new WordGroup ("When-you-play-play-hard-when-you-work-dont-play-at-all");
Set<String> set = secondWordGroup.getWordCountsMap().keySet();
System.out.println("set : " + set + "\n");
for(String key : set)
{
System.out.println(key);
}
}
}
WordGroup class
package QuestionEditor;
import java.util.HashMap;
public class WordGroup {
String word;
// Creates constructor which stores a string value in variable "word" and converts this into lower case using
// the lower case method.
public WordGroup(String aString) {
this.word = aString.toLowerCase();
}
public String[] getWordArray() {
String[] wordArray = word.split("-");
return wordArray;
}
public HashMap<String, Integer> getWordCountsMap() {
HashMap<String, Integer> myHashMap = new HashMap<String, Integer>();
for (String word : this.getWordArray()) {
if (myHashMap.keySet().contains(word)) {
myHashMap.put(word, myHashMap.get(word) + 1);
} else {
myHashMap.put(word, 1);
}
}
return myHashMap;
}
}
Any help on how to do this would be greatly appreciated, thanks.
UPDATE
So when my code compiles, I am getting the output:
Key: play has 3 counter
Key: all has 1 counter
Key: at has 1 counter
Key: work has 1 counter
Key: hard has 1 counter
Key: when has 2 counter
Key: you has 2 counter
Key: dont has 1 counter
But what I actually want it to do is to print out the amount of characters within each word. So for example, play would count 4 times, all would count 3 times, at would count 2 times etc. Any ideas on how to implement this?
The part that you are probably missing is that you can use the keys to access your map values, like this:
Map<String, Integer> whateverMap = ...; // coming from somewhere
for (String key : whateverMap.keySet()) {
Integer countFromMap = whateverMap.get(key);
System.out.println("Key: " + key + " has " + countFromMap + " counter");
}
The above is meant as an example to get you going; I didn't run it through a compiler.
My point here: there are various ways to iterate the elements you stored within a Map. You can use entrySet() to retrieve Entry objects; or you iterate the keys, and lookup values using each key.
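Here is a small runnable sketch of both iteration styles, which also prints the number of characters in each word, since that is what the update to the question asks for. The class name KeySetDemo and the helper describe are hypothetical. The counts are built with Map.merge rather than through the WordGroup class, just to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

class KeySetDemo {

    // Describes one key: its character count plus the occurrence count stored in the map.
    static String describe(Map<String, Integer> counts, String key) {
        return key + " has " + key.length() + " characters and occurs " + counts.get(key) + " time(s)";
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : "when-you-play-play-hard-when-you-work-dont-play-at-all".split("-")) {
            counts.merge(word, 1, Integer::sum);
        }
        // Variant 1: iterate the keys and look each value up with get().
        for (String key : counts.keySet()) {
            System.out.println(describe(counts, key));
        }
        // Variant 2: iterate the entries, which carry key and value together.
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```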
You can use the stream API from Java 8 to create a Map<String, Integer> from each word to its length:
Map<String, Integer> stringIntegerMap = set.stream().collect(HashMap::new,
(hashMap, s) -> hashMap.put(s, s.length()), HashMap::putAll);
stringIntegerMap.forEach((key, value) -> System.out.println(key + " has length: " + value));
The second parameter to the collect function is an accumulator. Here it accumulates a HashMap that maps each string from the key set to its length.

How to find all error messages and display them in descending order

Hi, I am trying to extract the error messages from a user-supplied input file and list them in descending order of occurrence.
input_file.txt
23545 debug code_to_debug
43535 error check your code
34243 error check values
32442 run program execute
24525 error check your code
I want to get output as
error check your code
error check values
My code currently:
import java.io.*;
import java.util.*;
public class Sort {
public static void main(String[] args) throws Exception {
BufferedReader reader = new BufferedReader(new FileReader("fileToRead"));
Map<String, String> map=new TreeMap<String, String>();
String line="";
while((line=reader.readLine())!=null){
map.put(getField(line),line);
}
reader.close();
FileWriter writer = new FileWriter("fileToWrite");
for(String val : map.values()){
writer.write(val);
writer.write('\n');
}
writer.close();
}
private static String getField(String line) {
return line.split(" ")[0];//extract value you want to sort on
}
}
Change your mapping from <String, String> to <Integer, String>. Then, use a custom Comparator to compare the Integers from least to greatest.
It appears that your error messages are ranked by an integer value from most severe to least severe. This should allow you to use that fact.
Rather than having a Map<String, String> where the key is the integer value, you could use the error message as the key and have the value hold a list of the integer values. You can also pass a Comparator to the map to order the keys. Reading the file would then become something like:
Map<String, List<String>> map = new TreeMap<String, List<String>>(new Comparator<String>()
{
@Override
public int compare(String s1, String s2)
{
// Implement a comparison that gives the string order you want;
// natural ordering is used here only as a placeholder
return s1.compareTo(s2);
}
}
);
String line = "";
while((line = reader.readLine()) != null)
{
String lineStr = line.split(" ", 2)[1]; // get the message (everything after the number)
List<String> vals = map.get(lineStr); // get the existing list
if( vals == null)
vals = new ArrayList<String>(); // create a new list if there isn't one
vals.add(getField(line)); // add the int value to the list
map.put(lineStr,vals); // add to map
}
You could then sort each list into numeric order if you wanted. Printing the map would also require a bit more work, but that depends on the format you need.
If all you want to do is reorder the input so all the error messages appear at the top, a very simple way to do it is like the following:
static String[] errorsToTop(String[] input) {
String[] output = new String[input.length];
int i = 0;
for(String line : input) {
if(line.contains("error"))
output[i++] = line;
}
for(String line : input) {
if(!line.contains("error"))
output[i++] = line;
}
return output;
}
That just copies the array, starting with all error messages, followed by all non-error messages.
It's also possible to merge those two loops into a nested loop, though the logic is less obvious.
static String[] errorsToTop(String[] input) {
String[] output = new String[input.length];
int i = 0;
boolean not = false;
do {
for(String line : input) {
if(line.contains("error") ^ not)
output[i++] = line;
}
} while(not = !not);
return output;
}
It's unclear to me whether the numbers appear in your input text file or not. If they don't, you can use startsWith instead of contains:
if(line.startsWith("error"))
You could also use matches with a regex like:
if(line.matches("^\\d+ error[\\s\\S]*"))
which says "starts with any integer followed by a space followed by error followed by anything or nothing".
Since no answer has been marked, I'll add my 2 cents.
The code below works for exactly what you posted (and maybe nothing else). It assumes that errors have higher numbers than non-errors, and that you are grabbing the top N lines based on a time slice or something similar.
import java.util.NavigableMap;
import java.util.TreeMap;
public class SortDesc {
public static void main(String[] args) {
NavigableMap<Integer, String> descendingMap = new TreeMap<Integer, String>().descendingMap();
descendingMap.put(23545, "debug code_to_debug");
descendingMap.put(43535, "error check your code");
descendingMap.put(34243, "error check values");
descendingMap.put(32442, "run program execute");
descendingMap.put(24525, "error check your code");
System.out.println(descendingMap);
}
}
results look like this
{43535=error check your code, 34243=error check values, 32442=run program execute, 24525=error check your code, 23545=debug code_to_debug}
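If "descending order of occurrence" means that the most frequent error message should come first, one option is to count each distinct message and then sort by that count. This is only a sketch under that reading of the question. The class name ErrorFrequency and the method errorsByFrequency are made up, the sample lines are hard-coded instead of read from the file, and a message is treated as an error when the text after the leading number starts with "error".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ErrorFrequency {

    // Returns the distinct error messages, most frequent first.
    static List<String> errorsByFrequency(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+", 2); // [number, rest of the line]
            if (parts.length == 2 && parts[1].startsWith("error")) {
                counts.merge(parts[1], 1, Integer::sum);
            }
        }
        List<String> messages = new ArrayList<>(counts.keySet());
        // Higher count first; ties are broken alphabetically so the order is deterministic.
        messages.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return messages;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "23545 debug code_to_debug",
                "43535 error check your code",
                "34243 error check values",
                "32442 run program execute",
                "24525 error check your code");
        // prints "error check your code" and then "error check values"
        errorsByFrequency(lines).forEach(System.out::println);
    }
}
```

To work on a real file, the lines could be loaded with Files.readAllLines before calling the method.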
