Trying to print all words in a trie in Java

I'm using a trie structure called DictionaryTree, and I want to print all the words it contains. When I insert a word and reach its last letter, I store the completed word in the DictionaryTree node.
private Map<Character, DictionaryTree> children = new LinkedHashMap<>();
private String completeWord;
void insertionHelper(String currentPortion, String fullWord){
if(currentPortion.length() == 1){
if(children.containsKey(currentPortion.charAt(0))){
// do nothing
}else{
this.children.put(currentPortion.charAt(0), new DictionaryTree());
}
this.completeWord = fullWord;
}else{
if(children.containsKey(currentPortion.charAt(0))){
children.get(currentPortion.charAt(0)).insertionHelper(currentPortion.substring(1), fullWord);
}else{
DictionaryTree a = new DictionaryTree();
a.insertionHelper(currentPortion.substring(1), fullWord);
children.put(currentPortion.charAt(0), a);
}
}
}
After this, when looking for all words, I traverse every node and try to add the words to a static ArrayList. However, for some reason many of the words are duplicated and others are missing.
String allWordHelper(){
String holder = " ";
for (Map.Entry<Character, DictionaryTree> child : children.entrySet()) {
if(completeWord != null){
//holder += completeWord + child.getValue().allWordHelper();
Word_Holder.allWords.add(completeWord);
}else{
holder += child.getValue().allWordHelper();
}
}
return holder;
}
I can't figure out why.

I have no idea what a DictionaryTree is or what your input data looks like, but if you do
children.put(currentPortion.charAt(0), a);
doesn't that mean that whenever you get a word that starts with the same character as a previous word, the old word might be replaced by the new one?
It's quite hard to fully understand your code with an unknown data type and all the recursive calls.
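A minimal sketch of how the insertion and traversal could be restructured, using a hypothetical class TrieSketch as a stand-in for DictionaryTree (the class and method names here are assumptions, not the asker's code). In the question's insertionHelper, the completed word is stored on the parent of the last letter's node, so words such as "cat" and "car" overwrite each other. In allWordHelper, completeWord is added once per child and the traversal never descends below a node that holds a word. The sketch stores the word on the node of its last letter, adds it once per node, and always recurses into the children.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the DictionaryTree class in the question.
class TrieSketch {
    private final Map<Character, TrieSketch> children = new LinkedHashMap<>();
    private String completeWord; // non-null only on the node that ends a word

    void insert(String word) {
        TrieSketch node = this;
        for (char c : word.toCharArray()) {
            // computeIfAbsent creates a child only when none exists yet,
            // so an existing subtree is never replaced
            node = node.children.computeIfAbsent(c, k -> new TrieSketch());
        }
        node.completeWord = word; // stored on the node of the last letter
    }

    List<String> allWords() {
        List<String> out = new ArrayList<>();
        collect(out);
        return out;
    }

    private void collect(List<String> out) {
        if (completeWord != null) {
            out.add(completeWord); // added once per node, outside the child loop
        }
        for (TrieSketch child : children.values()) {
            child.collect(out); // always descend, even below a complete word
        }
    }
}
```

Returning a list from allWords() instead of writing to a static field keeps repeated calls from accumulating old results.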

Related

Word Count Program using HashMaps

import java.io.*;
import java.util.*;
public class ListSetMap2
{
public static void main(String[] args)
{
Map<String, Integer> my_collection = new HashMap<String, Integer>();
Scanner keyboard = new Scanner(System.in);
System.out.println("Enter a file name");
String filenameString = keyboard.nextLine();
File filename = new File(filenameString);
int word_position = 1;
int word_num = 1;
try
{
Scanner data_store = new Scanner(filename);
System.out.println("Opening " + filenameString);
while(data_store.hasNext())
{
String word = data_store.next();
if(word.length() > 5)
{
if(my_collection.containsKey(word))
{
my_collection.get(my_collection.containsKey(word));
Integer p = (Integer) my_collection.get(word_num++);
my_collection.put(word, p);
}
else
{
Integer i = (Integer) my_collection.get(word_num);
my_collection.put(word, i);
}
}
}
}
catch (FileNotFoundException e)
{
System.out.println("Nope!");
}
}
}
I'm trying to write a program that scans a file, logs the words in a HashMap, and counts the number of times each word occurs in the document, with only words over 5 characters being counted.
It's a bit of a mess in the middle, but I'm running into issues with how to count the number of times a word occurs while keeping an individual count for each word. I'm sure there is a simple solution here and I'm just missing it. Please help!
Your logic of setting the frequency of word is wrong. Here is a simple approach that should work for you:
// if the word is already present in the hashmap
if (my_collection.containsKey(word)) {
// just increment the current frequency of the word
// this overrides the existing frequency
my_collection.put(word, my_collection.get(word) + 1);
} else {
// since the word is not there just put it with a frequency 1
my_collection.put(word, 1);
}
(Only giving hints, since this seems to be homework.) my_collection is (correctly) a HashMap that maps String keys to Integer values; in your situation, a key is supposed to be a word, and the corresponding value is supposed to be the number of times you have seen that word (frequency). Each time you call my_collection.get(x), the parameter x needs to be a String, namely the word whose frequency you want to know (unfortunately, HashMap doesn't enforce this). Each time you call my_collection.put(x, y), x needs to be a String, and y needs to be an Integer or int, namely the frequency for that word.
Given this, give some more thought to what you're using as parameters, and the sequence in which you need to make the calls and how you need to manipulate the values. For example, if you've already determined that my_collection doesn't contain the word, does it make sense to ask my_collection for the word's frequency? If it does contain the word, how do you need to change the frequency before putting the new value into my_collection?
(Also, please choose a more descriptive name for my_collection, e.g. frequencies.)
Try it this way:
while(data_store.hasNext()) {
String word = data_store.next();
if(word.length() > 5){
// put needs both the key (the word) and the value (its frequency)
if(my_collection.get(word)==null) my_collection.put(word, 1);
else{
my_collection.put(word, my_collection.get(word)+1);
}
}
}
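The same counting logic can also be written with Map.merge, which is available from Java 8 onward. The following is a minimal sketch; the class name WordFrequencySketch, the method name countLongWords, and the minLength parameter are made up for illustration and do not appear in the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical helper class; not part of the original program.
class WordFrequencySketch {
    // Counts every whitespace-separated word longer than minLength characters.
    static Map<String, Integer> countLongWords(Scanner in, int minLength) {
        Map<String, Integer> frequencies = new HashMap<>();
        while (in.hasNext()) {
            String word = in.next();
            if (word.length() > minLength) {
                // merge stores 1 for a new key, otherwise adds 1 to the old value
                frequencies.merge(word, 1, Integer::sum);
            }
        }
        return frequencies;
    }
}
```

With the program in the question, this would be called as countLongWords(new Scanner(filename), 5), keeping the existing FileNotFoundException handling around it.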

Most efficient data structure for storing an alphabetically ordered word list

My program will read in a paragraph of words (stored in a text file). It will then need to do the following:
Print out a list of all the words (alphabetical order). For each word, print the frequency count (how many times the word appears in the entire paragraph) and the line numbers in which the word appears on (does not need to be ordered). If a word appears on a line multiple times, the line number does not need to be stored twice (the frequency count of this word will still be updated)
Display a list of words ordered from most frequent to least frequent.
The user will input a specific word. If the word is found, print out its frequency count.
Limitations: I cannot use the Collections class and I cannot store data multiple times. (e.g. Reading in words from the paragraph and storing them into a Set and an ArrayList)
Coding this won't be hard, but I can't figure out what would be the most efficient implementation since the data size could be a few paragraphs from a Wikipedia article or something. Here's my idea for now:
Have a Word class. This Word class will contain methods to return the word's frequency count and the lines in which the word appears on (and other relevant data).
The paragraph will be stored in a text file. The program will read the data line by line. Split the line into an array and read in words one by one.
As words are being read in from the text file, put the words into some sort of structure. If the structure does not contain the word, create a new word object.
If the structure already contains the word, update the frequency counter for that word.
I will also have a int to record down the line number. These line numbers will be updated accordingly.
This is somewhat incomplete, but it is what I'm thinking for now. The whole 'Word' class may probably be completely unnecessary, too.
First, you could create a class that holds the data for the occurrences and the row numbers (along with the word). This class could implement the Comparable interface, providing easy comparisons based on the word frequencies:
public class WordOccurrence implements Comparable<WordOccurrence> {
private final String word;
private int totalCount = 0;
private Set<Integer> lineNumbers = new TreeSet<>();
public WordOccurrence(String word, int firstLineNumber) {
this.word = word;
addOccurrence(firstLineNumber);
}
public final void addOccurrence(int lineNumber) {
totalCount++;
lineNumbers.add(lineNumber);
}
@Override
public int compareTo(WordOccurrence o) {
return totalCount - o.totalCount;
}
@Override
public String toString() {
StringBuilder lineNumberInfo = new StringBuilder("[");
for (int line : lineNumbers) {
if (lineNumberInfo.length() > 1) {
lineNumberInfo.append(", ");
}
lineNumberInfo.append(line);
}
lineNumberInfo.append("]");
return word + ", occurrences: " + totalCount + ", on rows "
+ lineNumberInfo.toString();
}
}
When reading the words from the file, it's useful to return the data in a Map<String, WordOccurrence>, mapping words into WordOccurrences. Using a TreeMap, you'll get alphabetical ordering "for free". Also, you may want to remove punctuation from the lines (e.g. using a regexp like \\p{P}) and ignore the case of the words:
public TreeMap<String, WordOccurrence> countOccurrences(String filePath)
throws IOException {
TreeMap<String, WordOccurrence> words = new TreeMap<>();
File file = new File(filePath);
BufferedReader reader = new BufferedReader(new InputStreamReader(
new FileInputStream(file)));
String line = null;
int lineNumber = 0;
while ((line = reader.readLine()) != null) {
// remove punctuation and normalize to lower-case
line = line.replaceAll("\\p{P}", "").toLowerCase();
lineNumber++;
String[] tokens = line.split("\\s+");
for (String token : tokens) {
if (words.containsKey(token)) {
words.get(token).addOccurrence(lineNumber);
} else {
words.put(token, new WordOccurrence(token, lineNumber));
}
}
}
return words;
}
Displaying the occurrences in alphabetical order using the above code is as simple as
for (Map.Entry<String, WordOccurrence> entry :
countOccurrences("path/to/file").entrySet()) {
System.out.println(entry.getValue());
}
If you cannot use Collections.sort() (and a Comparator<WordOccurrence>) for sorting by occurrences, you need to write the sorting yourself. Something like this should do it:
public static void displayInOrderOfOccurrence(
Map<String, WordOccurrence> words) {
List<WordOccurrence> orderedByOccurrence = new ArrayList<>();
// sort
for (Map.Entry<String, WordOccurrence> entry : words.entrySet()) {
WordOccurrence wo = entry.getValue();
// initialize the list on the first round
if (orderedByOccurrence.isEmpty()) {
orderedByOccurrence.add(wo);
} else {
for (int i = 0; i < orderedByOccurrence.size(); i++) {
if (wo.compareTo(orderedByOccurrence.get(i)) > 0) {
orderedByOccurrence.add(i, wo);
break;
} else if (i == orderedByOccurrence.size() - 1) {
orderedByOccurrence.add(wo);
break;
}
}
}
}
// display
for (WordOccurrence wo : orderedByOccurrence) {
System.out.println(wo);
}
}
Running the above code using the following test data:
Potato; orange.
Banana; apple, apple; potato.
Potato.
will produce this output:
apple, occurrences: 2, on rows [2]
banana, occurrences: 1, on rows [2]
orange, occurrences: 1, on rows [1]
potato, occurrences: 3, on rows [1, 2, 3]
potato, occurrences: 3, on rows [1, 2, 3]
apple, occurrences: 2, on rows [2]
banana, occurrences: 1, on rows [2]
orange, occurrences: 1, on rows [1]
You can use a simple TreeMap<String, Integer> for frequency lookups.
Lookups in a TreeMap are O(log n) in the number of distinct words, which is fast in practice given that the words are short (i.e. what you would find in a normal text). If you expect lots of unsuccessful lookups (lots of searches for words that don't exist), you could prefilter using a Bloom filter.
I'd start with a straightforward implementation, and optimize further if needed (parse the stream directly, instead of splitting each line with a separator and reiterating).
You can use a TreeMap, which is well suited to keeping the data ordered. Use the word as the key and its frequency as the value. For example, suppose the following is your paragraph:
Java is good language Java is object oriented
I would do the following to store each word and its frequency:
String s = "Java is good language Java is object oriented" ;
String strArr [] = s.split(" ") ;
TreeMap<String, Integer> tm = new TreeMap<String, Integer>();
for(String str : strArr){
if(tm.get(str) == null){
tm.put(str, 1) ;
}else{
int count = tm.get(str) ;
// write the incremented count back into the map
tm.put(str, count + 1) ;
}
}
Hopefully this will help you.
You can have a structure like this one:
https://gist.github.com/jeorfevre/946ede55ad93cc811cf8
/**
*
* @author Jean-Emmanuel je@Rizze.com
*
*/
public class WordsIndex{
HashMap<String, Word> words = new HashMap<String, Word>();
public void put(String word, int line, int paragraph){
word=word.toLowerCase();
if(words.containsKey(word)){
Word w=words.get(word);
w.count++;
}else{
//new word
Word w = new Word();
w.count=1;
w.line=line;
w.paragraph=paragraph;
w.word=word;
words.put(word, w);
}
}
}
public class Word{
String word;
int count;
int line;
int paragraph;
}
enjoy

Find the length of longest chain formed using given words in String

As programmers we love getting involved in logic building, but sometimes we go blank on a certain type of puzzle, like the one below. Let me state up front that this is not homework or job-related; it is simply a logic and performance practice puzzle. The puzzle gives a string of comma-separated words, like
String S = "peas,sugar,rice,soup";
The crux is to find the length of the longest chain of words, where the last character of each word must be the first character of the next word, and so on, and finally to report the length of that chain.
So far I have tried to work out a solution along these lines:
split the string with comma
add them in list
sort that list
etc
but I don't know how to develop the logic further, as I'm not strong at this. Help is appreciated, and if the partial approach above is not suitable, what would be a simple and correct way to get the length of the longest chain of words?
Summary
input: String S = "peas,sugar,rice,soup";
output: 4, the number of words in the chain (peas->sugar->rice->soup) or (soup->peas->sugar->rice) etc.
Once you have a list (or array), you can iterate over it, checking your condition (that the last letter of the n-th word equals the first letter of the next word) and increasing a counter each time. As soon as the condition is false, break out of the loop. Your counter will then hold the value you need.
OK friends, here is the logic and core part that I wrote, which solved my puzzle:
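import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Stack;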
public class CandidateCode
{
public static int chainLength=0;
public static void main(String[] args) {
String s= "peas,sugar,rice,soup";
int chainLengthfinal=wordChain(s);
System.out.println("final length:"+chainLengthfinal);
}
public static int wordChain(String input1)
{
List<String> stringList = new ArrayList<String>();
stringList= Arrays.asList(input1.split(","));
boolean ischain = new CandidateCode().hasChain(stringList);
if (ischain) {
return chainLength;
}
return 0;
}
Map<Character, List<String>> startsWith = new HashMap<Character, List<String>>();
Map<Character, List<String>> endsWith = new HashMap<Character, List<String>>();
private Character getFirstChar(String str) {
return str.charAt(0);
}
private Character getLastChar(String str) {
return str.charAt(str.length() - 1);
}
boolean hasChain(List<String> stringList) {
for (String str : stringList) {
Character start = getFirstChar(str);
Character end = getLastChar(str);
List<String> startsWithList;
List<String> endsWithList;
if (startsWith.containsKey(start)) {
startsWithList = startsWith.get(start);
} else {
startsWithList = new ArrayList<String>();
startsWith.put(start, startsWithList);
}
if (endsWith.containsKey(end)) {
endsWithList = endsWith.get(end);
} else {
endsWithList = new ArrayList<String>();
endsWith.put(end, endsWithList);
}
startsWithList.add(str);
endsWithList.add(str);
}
Stack<String> stringStack = new Stack<String>();
for (String str : stringList) {
if (hasChain(stringList.size(), str, stringStack)) {
System.out.println(stringStack);
System.out.println("size "+stringStack.size());
chainLength= stringStack.size();
return true;
}
}
return false;
}
private boolean hasChain(int size, String startString, Stack<String> stringStack) {
if (size == stringStack.size()) return true;
Character last = getLastChar(startString);
if (startsWith.containsKey(last)) {
List<String> stringList = startsWith.get(last);
for (int i = 0; i < stringList.size(); i++) {
String candidate = stringList.remove(i--);
stringStack.push(candidate);
if (hasChain(size, candidate, stringStack)) {
return true;
}
stringStack.pop();
stringList.add(++i, candidate);
}
}
return false;
}
}
The output of the above program will be
[soup, peas, sugar, rice]
size 4
final length:4
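Note that the program above reports a length only when every word fits into a single chain, and returns 0 otherwise. A minimal backtracking sketch that instead returns the longest chain found among any subset of the words could look like this. The class name LongestChainSketch and its method names are made up for illustration, every word is assumed to be non-empty, and the search is exponential in the worst case, so it only suits short inputs.

```java
// Hypothetical helper class; an alternative to the posted solution, not a copy of it.
class LongestChainSketch {
    // Returns the number of words in the longest chain in which each word
    // starts with the last letter of the word before it.
    static int longestChain(String commaSeparated) {
        String[] words = commaSeparated.split(",");
        boolean[] used = new boolean[words.length];
        int best = 0;
        for (int i = 0; i < words.length; i++) {
            used[i] = true; // try every word as the start of a chain
            best = Math.max(best, 1 + extend(words, used, words[i]));
            used[i] = false;
        }
        return best;
    }

    // Longest number of additional words that can follow 'previous'.
    private static int extend(String[] words, boolean[] used, String previous) {
        char last = previous.charAt(previous.length() - 1);
        int best = 0;
        for (int i = 0; i < words.length; i++) {
            if (!used[i] && words[i].charAt(0) == last) {
                used[i] = true;
                best = Math.max(best, 1 + extend(words, used, words[i]));
                used[i] = false; // backtrack so other orders can reuse the word
            }
        }
        return best;
    }
}
```

For the sample input "peas,sugar,rice,soup" this returns 4, matching the chain soup->peas->sugar->rice.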
initialize a " " string named last(String last=" ")
get the first string by splitting with comma
substring the last char of the string and store it to last
boolean brokenchain=false;
length=0;
while(more string to split with comma)&&(!brokenchain){
split string with comma
substring to get first char
if(first char!=last){
brokenchain=true;
}else{
length++;
get last char of this string with substring and store it to last
}
}
If the input contains a sequence of length 5 that then breaks, followed by a sequence of length 6 that you also want to count and print, you have to store each count in a map, for example as a key associated with the sequence so far. Then you continue the loop (resetting brokenchain to false) until the input string ends. Finally, you take the largest key from your map and print it with its associated value (the longest sequence).
I think you need to find the largest and smallest number.
split the string with comma
add them as list_item
compare list_item1 and list_item2, the largest value becomes list_item_X
compare list_item3 and list_item4, the largest value becomes list_item_Y
Now compare list_item_X and list_item_Y, the largest value becomes list_item_Z
So the largest value is list_item_Z; here is an implementation in code.
$s = 'peas,sugar,rice,soup';
$list_items = explode(',', $s);
$lengths = array_map('strlen', $list_items);
echo "The shortest is " . min($lengths) .
". The longest is " . max($lengths);

Using a user inputted string of characters find the longest word that can be made

Basically I want to create a program which simulates the 'Countdown' game on Channel 4. In effect, a user must input 9 letters and the program will search for the longest word in the dictionary that can be made from these letters. I think a tree structure would be a better choice than hash tables. I already have a file which contains the words in the dictionary and will be using file I/O.
This is my file io class:
public static void main(String[] args){
FileIO reader = new FileIO();
String[] contents = reader.load("dictionary.txt");
}
This is what I have so far in my Countdown class
public static void main(String[] args) throws IOException{
Scanner scan = new Scanner(System.in);
String letters = scan.nextLine();
}
I get totally lost from here. I know this is only the start, but I'm not looking for answers. I just want a small bit of help and maybe a pointer in the right direction. I'm new to Java, found this question in an interview book, and thought I should give it a try.
Thanks in advance
Welcome to the world of Java :)
The first thing I see is that you have two main methods; you don't actually need that. In most cases your program will have a single entry point, which then does all its logic and handles user input and everything else.
You're thinking of a tree structure which is good, though there might be a better idea to store this. Try this: http://en.wikipedia.org/wiki/Trie
What your program has to do is read all the words from the file line by line, and in this process build your data structure, the tree. When that's done you can ask the user for input and after the input is entered you can search the tree.
Since you asked specifically not to provide answers I won't put code here, but feel free to ask if you're unclear about something
There are only about 800,000 words in the English language, so an efficient solution would be to store those 800,000 words as 800,000 arrays of 26 1-byte integers that count how many times each letter is used in the word, and then for an input 9 characters you convert to similar 26 integer count format for the query, and then a word can be formed from the query letters if the query vector is greater than or equal to the word-vector component-wise. You could easily process on the order of 100 queries per second this way.
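A minimal sketch of the letter-count comparison described above, with assumptions called out: the class name LetterCountSketch and its method names are made up, dictionary words are assumed to contain only the letters a to z, and for brevity the sketch recounts each dictionary word per query instead of precomputing and storing the 26-entry vectors as the answer suggests.

```java
import java.util.Locale;

// Hypothetical helper class illustrating the 26-counter comparison.
class LetterCountSketch {
    // Counts occurrences of each letter a-z; other characters are ignored.
    static int[] letterCounts(String s) {
        int[] counts = new int[26];
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++;
            }
        }
        return counts;
    }

    // A word can be formed if it needs no letter more often than is available.
    static boolean canForm(int[] available, String word) {
        int[] needed = letterCounts(word);
        for (int i = 0; i < 26; i++) {
            if (needed[i] > available[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the first longest formable word, or "" if none can be formed.
    static String longestWord(String letters, Iterable<String> dictionary) {
        int[] available = letterCounts(letters);
        String best = "";
        for (String word : dictionary) {
            if (word.length() > best.length() && canForm(available, word)) {
                best = word;
            }
        }
        return best;
    }
}
```

Checking the length before calling canForm skips the counting work for words that could not improve on the current best.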
I would write a program that starts with all the two-letter words, then does the three-letter words, the four-letter words and so on.
When you do the two-letter words, you'll want some way of picking the first letter, then picking the second letter from what remains. You'll probably want to use recursion for this part. Lastly, you'll check it against the dictionary. Try to write it in a way that means you can re-use the same code for the three-letter words.
I believe, the power of Regular Expressions would come in handy in your case:
1) Create a regular expression string with a symbol class like: /^[abcdefghi]*$/ with your letters inside instead of "abcdefghi".
2) Use that regular expression as a filter to get a strings array from your text file.
3) Sort it by length. The longest word is what you need!
Check the Regular Expressions Reference for more information.
UPD: Here is a good Java Regex Tutorial.
A first approach could be using a tree with all the letters present in the wordlist.
If one node is the end of a word, then is marked as an end-of-word node.
In the picture above, the longest word is banana. But there are other words, like ball, ban, or banal.
So, a node must have:
A character
If it is the end of a word
A list of children. (max 26)
The insertion algorithm is very simple: In each step we "cut" the first character of the word until the word has no more characters.
public class TreeNode {
public char c;
private boolean isEndOfWord = false;
private TreeNode[] children = new TreeNode[26];
public TreeNode(char c) {
this.c = c;
}
public void put(String s) {
if (s.isEmpty())
{
this.isEndOfWord = true;
return;
}
char first = s.charAt(0);
int pos = position(first);
if (this.children[pos] == null)
this.children[pos] = new TreeNode(first);
this.children[pos].put(s.substring(1));
}
public String search(char[] letters) {
String word = "";
String w = "";
for (int i = 0; i < letters.length; i++)
{
TreeNode child = children[position(letters[i])];
if (child != null)
w = child.search(letters);
//this is not efficient. It should be optimized.
if (w.contains("%")
&& w.substring(0, w.lastIndexOf("%")).length() > word
.length())
word = w;
}
// if a node its end-of-word we add the special char '%'
return c + (this.isEndOfWord ? "%" : "") + word;
}
//if 'a' returns 0, if 'b' returns 1...etc
public static int position(char c) {
return ((byte) c) - 97;
}
}
Example:
public static void main(String[] args) {
//root
TreeNode t = new TreeNode('R');
//for skipping words with "'" in the wordlist
Pattern p = Pattern.compile(".*\\W+.*");
int nw = 0;
try (BufferedReader br = new BufferedReader(new FileReader(
"files/wordsEn.txt")))
{
for (String line; (line = br.readLine()) != null;)
{
if (p.matcher(line).find())
continue;
t.put(line);
nw++;
}
// line is not visible here.
br.close();
System.out.println("number of words : " + nw);
String res = null;
// substring (1) because of the root
res = t.search("vuetsrcanoli".toCharArray()).substring(1);
System.out.println(res.replace("%", ""));
}
catch (Exception e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
}
Output:
number of words : 109563
counterrevolutionaries
Notes:
The wordlist is taken from here
the reading part is based on another SO question : How to read a large text file line by line using Java?

How do you check to compare a string value to each element in an array?

So I have a String Array (sConsonantArray) and have all of the consonants stored in it.
String[] sConsonantArray = new String[] {"q","w","r","t","p","s","d","f","g","h","j","k","l","z","x","c","v","b","n","m"};
I need to check if the second last value of a word (sWord) equals a value in the array and I don't know how to call each value in the array to compare the letters other than doing sConsonantArray[5] (checking them each one at a time). I am looking for an easier way to call them, thanks for your help. Also, it doesn't appear that the (&&) operator will work, other suggestions would be appreciated.
else if (sWord.substring(sWord.length()-2,sWord.length()-1).equals(sConsonantArray I DONT KNOW WHAT TO PUT HERE)) && (sWord.substring(sWord.length()-1,sWord.length()).equalsIgnoreCase("o"))
{
System.out.println("The plural of " + sWord + " is " + (sWord + "es"));
}
It seems to me that it would be simpler to have the consonants as a string and then use charAt:
private static final String CONSONANTS = "bcdfgh...z";
if (CONSONANTS.indexOf(word.charAt(word.length() - 2)) >= 0) {
...
}
If you really want to use an array, you could change your array to be in order and then call Arrays.binarySearch. Another alternative would be to create a HashSet<String> of the consonants and use contains on that.
Try something like
else if (Arrays.asList(sConsonantArray).contains(
sWord.substring(sWord.length()-2,sWord.length()-1))
&& (sWord.substring(sWord.length()-1,sWord.length()).equalsIgnoreCase("o"))) {
// do something
}
Or write a small utility method:
public static boolean isInConstants(String yourString){
String[] sConsonantArray = new String[] {"q","w...}
for (String item : sConsonantArray) {
if (yourString.equalsIgnoreCase(item)) {
return true;
}
}
return false;
}
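Putting the pieces together, the whole "consonant followed by o" check could be sketched as follows. The class name PluralSketch and the method name endsWithConsonantThenO are made up for illustration. The consonant string holds the same 20 letters as sConsonantArray in the question, and, as an assumption beyond the original code, both characters are lower-cased before comparing.

```java
// Hypothetical helper class; not taken from the question or the answers.
class PluralSketch {
    // Same 20 letters as sConsonantArray in the question (no 'y').
    private static final String CONSONANTS = "qwrtpsdfghjklzxcvbnm";

    // True when the word ends in 'o' and the letter before it is a consonant.
    static boolean endsWithConsonantThenO(String word) {
        if (word.length() < 2) {
            return false; // too short to have a second-to-last letter
        }
        char secondLast = Character.toLowerCase(word.charAt(word.length() - 2));
        char last = Character.toLowerCase(word.charAt(word.length() - 1));
        // indexOf returns -1 when the character is not in the string
        return last == 'o' && CONSONANTS.indexOf(secondLast) >= 0;
    }
}
```

The else-if in the question would then reduce to a single call such as else if (PluralSketch.endsWithConsonantThenO(sWord)).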
