Find all words in dictionary given a string of words


I am attempting to write a program that, given a string, finds all the words that can be constructed from its letters, using a dictionary that has been loaded from a file into an ArrayList. sowpodsList is the dictionary stored as an ArrayList. I want to iterate through each word in the dictionary and then compare it to the string. Since the string is just a random collection of letters, how do I go about achieving this?
Input: asdm
Output: a, mad, sad, ... (any word in the dictionary that can be built from these letters)
for (int i = 0; i < sowpodsList.size(); i++) {
    String word = sowpodsList.get(i);
    for (int j = 0; j < word.length(); j++) {
        char c = word.charAt(j);
        // TODO: compare c against the letters of the input string
    }
}

You can check, for each word in the dictionary, whether the count of each of its characters does not exceed that character's count in the input.
ArrayList<String> matches = new ArrayList<String>();
// for each word in dict
for (String word : sowpodsList) {
    // match flag: stays true while every character count fits
    boolean match = true;
    // for each character of dict word
    for (char chW : word.toCharArray()) {
        String w = Character.toString(chW);
        // occurrences of chW = how much shorter the string gets without it
        int inWord = word.length() - word.replace(w, "").length();
        int inInput = input.length() - input.replace(w, "").length();
        // the word needs more copies of chW than the input has: no match
        if (inWord > inInput) {
            match = false;
            break;
        }
    }
    if (match) {
        matches.add(word);
    }
}
System.out.println(matches);
Sample output (the dictionary file I used is here: https://docs.oracle.com/javase/tutorial/collections/interfaces/examples/dictionary.txt):
Input: asdm
Matches: [ad, ads, am, as, dam, dams, ma, mad, mads, mas, sad]

If I were you I'd change the way you store your dictionary.
Given that the input string contains random letters, what I'd do here is store all the words of your dictionary in a SortedMap<String, char[]> (a TreeMap, to be precise), where the keys are the words in your dictionary and the values are the characters of each word, sorted.
Then I'd sort the characters in the input string as well and check each word against them (not tested):
// Assumes a field `dictionary` of type SortedMap<String, char[]> whose values
// are each word's characters in sorted order.
// Needs java.util.{Arrays, HashSet, Map, Set}.
public Set<String> getMatchingWords(final String input)
{
    final char[] contents = input.toCharArray();
    Arrays.sort(contents);
    final Set<String> matchedWords = new HashSet<>();
    for (final Map.Entry<String, char[]> entry : dictionary.entrySet()) {
        final char[] candidate = entry.getValue();
        // If the word has a greater length than the input,
        // go for the next word
        if (candidate.length > contents.length)
            continue;
        // We only add a match if every character of the candidate
        // can be paired with a distinct character of the input
        if (isContainedIn(candidate, contents))
            matchedWords.add(entry.getKey());
    }
    return matchedWords;
}

// Both arrays must be sorted. Walks the input once, consuming the next
// candidate character each time it is found; returns true if the whole
// candidate was consumed (i.e. it is a subsequence of the input).
private static boolean isContainedIn(final char[] candidate, final char[] input)
{
    int matched = 0;
    for (final char c : input) {
        if (matched == candidate.length)
            break;
        if (c == candidate[matched])
            matched++;
        else if (c > candidate[matched])
            return false; // sorted input: candidate[matched] cannot appear later
    }
    return matched == candidate.length;
}
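The answer doesn't show how the SortedMap<String, char[]> gets filled. Here is a minimal sketch, assuming the words are already in a List<String> such as the question's sowpodsList; the class and method names (DictionaryBuilder, buildDictionary) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class DictionaryBuilder {
    // Maps each word to its characters in sorted order, e.g. "sad" -> [a, d, s].
    // A TreeMap keeps the keys (the words) in alphabetical order.
    static SortedMap<String, char[]> buildDictionary(List<String> words) {
        SortedMap<String, char[]> dictionary = new TreeMap<>();
        for (String word : words) {
            char[] letters = word.toCharArray();
            Arrays.sort(letters);
            dictionary.put(word, letters);
        }
        return dictionary;
    }

    public static void main(String[] args) {
        SortedMap<String, char[]> dictionary =
                buildDictionary(Arrays.asList("sad", "mad", "dams"));
        for (Map.Entry<String, char[]> entry : dictionary.entrySet()) {
            // char[] has no useful toString(), so wrap it in a String
            System.out.println(entry.getKey() + " -> " + new String(entry.getValue()));
        }
    }
}
```

Sorting each word once at load time means every later query only has to sort the (short) input.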
With a trie, that would also be possible; whether it is practical, however, is another question, and it depends on the size of the dictionary.
But the basic principle would be the same: you'd need the sorted character array of each word in your dictionary, and you'd add them to the trie one by one (use a builder).
A trie node would have two elements:
a map where the keys are the set of characters which can be matched next, and the values are the matching trie nodes;
a set of words which can match at that node exactly.
You can base your trie implementation off this one if you want.
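A minimal sketch of the node layout described above, assuming words and input use the same character set; SortedLetterTrie, add and wordsFrom are hypothetical names, and for brevity it adds words directly instead of using a builder. Each word is stored at the end of the path spelled by its sorted letters, and the search only descends into children whose letter is still available in the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class SortedLetterTrie {
    // A node: the characters that can be matched next -> child node,
    // plus the words whose sorted letters end exactly at this node.
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        final Set<String> words = new HashSet<>();
    }

    private final Node root = new Node();

    // Inserts word along the path of its sorted letters ("sad" -> a, d, s).
    public void add(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        Node node = root;
        for (char c : letters) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.words.add(word);
    }

    // Returns every stored word that can be built from the letters of input,
    // using each input letter at most once.
    public List<String> wordsFrom(String input) {
        Map<Character, Integer> available = new TreeMap<>();
        for (char c : input.toCharArray()) {
            available.merge(c, 1, Integer::sum);
        }
        List<String> result = new ArrayList<>();
        collect(root, available, result);
        return result;
    }

    private void collect(Node node, Map<Character, Integer> available, List<String> result) {
        result.addAll(node.words);
        for (Map.Entry<Character, Node> child : node.children.entrySet()) {
            char c = child.getKey();
            int left = available.getOrDefault(c, 0);
            if (left == 0) {
                continue; // this letter is not (or no longer) available
            }
            available.put(c, left - 1);   // consume one copy of the letter
            collect(child.getValue(), available, result);
            available.put(c, left);       // give it back for sibling branches
        }
    }
}
```

Because a word's path is fixed by its sorted letters, each node is visited at most once per query, so no duplicate results are produced.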

Go for a trie implementation.
A trie provides a fast way to search a large collection of words.
https://en.wikipedia.org/wiki/Trie
What you need to do is insert all the words into the trie data structure.
Then you just call the trie's search function to get a boolean telling you whether a given string is a word.
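A minimal sketch of the insert/search pair this answer refers to, assuming a plain word trie (one node per character); the class name Trie and its methods are hypothetical, not from a library.

```java
import java.util.HashMap;
import java.util.Map;

public class Trie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if an inserted word ends at this node
    }

    private final Node root = new Node();

    // Adds word to the trie, creating nodes as needed.
    public void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Returns true only if word was inserted (not merely a prefix of one).
    public boolean search(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false;
            }
        }
        return node.isWord;
    }
}
```

For this question you would still need candidate strings (for example, arrangements of the input letters) to pass to search.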

There are two ways to do it. The best way depends on the relative size of the data structures.
If the dictionary is long and the list of letters is short, it may be best to sort the dictionary (if it is not already sorted), then construct all possible words by permuting the letters, including permutations of subsets of them (removing duplicates). Then do a binary search using string comparison for each combination of letters to see whether it is a word in the dictionary. The tricky part is ensuring that each duplicate letter is used no more times than it appears in the input.
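A sketch of this first approach, assuming the dictionary is a List<String> already sorted in natural String order; PermutationLookup, arrangements and findWords are hypothetical names. Each letter position is used at most once, which handles repeated letters, and collecting candidates in a Set removes the duplicate arrangements they would otherwise produce. The number of candidates grows factorially, so this only suits short inputs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class PermutationLookup {
    // Builds every non-empty arrangement of every subset of the letters.
    // The Set discards duplicates caused by repeated letters.
    static Set<String> arrangements(String letters) {
        Set<String> out = new TreeSet<>();
        build("", letters, out);
        return out;
    }

    private static void build(String prefix, String remaining, Set<String> out) {
        if (!prefix.isEmpty()) {
            out.add(prefix);
        }
        for (int i = 0; i < remaining.length(); i++) {
            // Use letter i next, then recurse on the letters not used yet.
            String rest = remaining.substring(0, i) + remaining.substring(i + 1);
            build(prefix + remaining.charAt(i), rest, out);
        }
    }

    // sortedDictionary must be sorted, or binarySearch gives undefined results.
    static List<String> findWords(String letters, List<String> sortedDictionary) {
        List<String> found = new ArrayList<>();
        for (String candidate : arrangements(letters)) {
            if (Collections.binarySearch(sortedDictionary, candidate) >= 0) {
                found.add(candidate);
            }
        }
        return found;
    }
}
```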
If the list of letters is long and the dictionary is short, another way would be simply to count the number of letters in the input string: two a's, one s, one m, etc. Then for each dictionary word, if the number of each individual letter in the dictionary word does not exceed those in the input string, the word is valid.
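A sketch of the counting approach, assuming input and dictionary words contain only lowercase ASCII letters a-z (any other character would index outside the count array); LetterCountMatch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterCountMatch {
    // Counts how often each letter a-z occurs in s (lowercase ASCII only).
    static int[] letterCounts(String s) {
        int[] counts = new int[26];
        for (char c : s.toCharArray()) {
            counts[c - 'a']++;
        }
        return counts;
    }

    // A word is valid if no letter occurs in it more often than in the input.
    static boolean canBuild(String word, int[] available) {
        int[] needed = letterCounts(word);
        for (int i = 0; i < 26; i++) {
            if (needed[i] > available[i]) {
                return false;
            }
        }
        return true;
    }

    static List<String> findWords(String letters, List<String> dictionary) {
        int[] available = letterCounts(letters); // counted once, reused per word
        List<String> found = new ArrayList<>();
        for (String word : dictionary) {
            if (canBuild(word, available)) {
                found.add(word);
            }
        }
        return found;
    }
}
```

This does one pass over the dictionary regardless of how many letters the input has, which is why it suits long inputs.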
Either way, add all words found to the output array.

Related

What can replace SparseArray so that I can add the same key more than once?

I'm writing an application for Android. I have many words (~50,000), and given a specified letter I have to pick any one word that begins with that letter and then remove it. I read the words from a file and store them all in a SparseArray:
sparseArray = new SparseArray<String>();
String str = "";
char c;
while ((str = stream.readLine()) != null) {
    c = str.charAt(0);
    sparseArray.put(c, str);
}
where the key is the first letter of the word and the value is the word.
When I receive a letter, I select any word with the same first letter:
char receivedLetter;
...
String word = sparseArray.get(receivedLetter);
sparseArray.removeAt(sparseArray.indexOfValue(word));
Log.d("myLogs", "word: " + word);
But the SparseArray ends up storing only 26 elements, because words with the same first letter (the same key) overwrite each other and only the last word for each letter remains. A HashMap doesn't solve the problem either. What should I use to solve this problem?

There are several ways to do this. For example, instead of keying on the first letter, you can use a sorted, navigable collection such as a TreeSet.
TreeSet<String> words = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
words.add("hello");
words.add("beta");
words.add("beat");
words.add("couch");
words.add("alpha");
words.add("Bad");
Now you can do
NavigableSet<String> bWords = words.subSet("b", true, "c", false);
System.out.println(bWords); // prints [Bad, beat, beta]
And you're given the subset of words that are >= b && < c. You can then do
String removedWord = bWords.pollFirst(); // Bad
System.out.println(bWords); // prints [beat, beta]
// sub-sets affect their origin, they are "views on the original collection"
System.out.println(words); // prints [alpha, beat, beta, couch, hello]
And you've effectively removed a word starting with "b". A TreeSet has the advantage that you can navigate and search your data in many ways.
Given a char, the one line of code that removes such an element is:
String removed = words.subSet(Character.toString(receivedLetter), true,
Character.toString((char) (receivedLetter + 1)), false)
.pollFirst();
The other alternative is a collection of collections, for example a SparseArray<List<String>>:
SparseArray<List<String>> sparseArray = new SparseArray<List<String>>();
String str;
while ((str = stream.readLine()) != null) {
    char c = str.charAt(0);
    // get or create list stored at letter c
    List<String> list = sparseArray.get(c);
    if (list == null) {
        list = new ArrayList<String>();
        sparseArray.put(c, list);
    }
    // add word to list
    list.add(str);
}
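To remove a word, get the list for that letter; if it is not null and not empty, remove an element from it:
char receivedLetter;
...
List<String> words = sparseArray.get(receivedLetter);
if (words != null && !words.isEmpty()) {
    // remove(int) returns the removed word; taking the last one avoids shifting the list
    String word = words.remove(words.size() - 1);
}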
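SparseArray is Android-specific. Outside Android, the same collection-of-collections idea can be sketched with a plain HashMap; WordsByFirstLetter and its methods are hypothetical names, and computeIfAbsent takes the place of the get-or-create step above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordsByFirstLetter {
    private final Map<Character, List<String>> byLetter = new HashMap<>();

    // Files the word under its first character (assumes a non-empty word).
    public void add(String word) {
        byLetter.computeIfAbsent(word.charAt(0), k -> new ArrayList<>()).add(word);
    }

    // Removes and returns some word starting with letter, or null if none is left.
    public String removeWordStartingWith(char letter) {
        List<String> words = byLetter.get(letter);
        if (words == null || words.isEmpty()) {
            return null;
        }
        // Removing the last element of an ArrayList avoids shifting the others.
        return words.remove(words.size() - 1);
    }
}
```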

Converting String to char?

I need to take a String and then work out how many times each letter appears in the string. I thought of maybe converting each letter of the string into an individual char. Is there any way to do this?

You can use
myString.toCharArray();
to get a char[] out of it, or just use
myString.charAt(i);

Check out the
String#toCharArray() method in the String API. This method returns a char[] array containing all the characters of the String. You can then get each character from the array by index: array[i] returns the character of the string at index i. Note that array indices start at 0.

Use yourStringVariable.toCharArray()
char[] array = yourStringVariable.toCharArray();

Map<Character, Integer> counts = new HashMap<>();
for (int i = 0; i < str.length(); i++) {
    Integer count = counts.get(str.charAt(i));
    if (count == null) {
        count = 1;
    } else {
        count = count + 1;
    }
    counts.put(str.charAt(i), count);
}
The end result is a map containing all of the characters within the string and a count of how many of each one was found.
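The same count can be written more compactly with Map.merge (Java 8+), iterating over toCharArray() as suggested in the earlier answers. This is a sketch; CharCounts is a hypothetical name, and a TreeMap is used only so the printed map is sorted by character.

```java
import java.util.Map;
import java.util.TreeMap;

public class CharCounts {
    // Counts each character of str; TreeMap keeps the keys in sorted order.
    static Map<Character, Integer> count(String str) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : str.toCharArray()) {
            // merge stores 1 for a new key, otherwise adds 1 to the old value
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello")); // prints {e=1, h=1, l=2, o=1}
    }
}
```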

How to find greatest frequency of substring in string with no spaces

Let's suppose the string is "aababcabcdabcdeabcdefabcdefg". How do I find the frequency of the largest (all possible) substrings?
PS: there are no spaces in the string.

String[] stArray = s.split("\\s+");
int large = 0;
String largeSt = "";
for (int i = 0; i < stArray.length; i++)
{
    if (stArray[i].length() > large)
    {
        largeSt = stArray[i];
        large = stArray[i].length();
    }
}
System.out.println("very large :" + largeSt);
If you want to find the frequency, just count the matching elements of the array.
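Because the question's string has no spaces, splitting on whitespace returns the whole string as a single element. One possible reading of the question is "how often does each substring occur, and which is the longest one that repeats"; the sketch below follows that reading, which is an assumption. SubstringFrequency and its methods are hypothetical names, and the approach is brute force (about n*n/2 substrings), so it only suits short strings.

```java
import java.util.HashMap;
import java.util.Map;

public class SubstringFrequency {
    // Counts every occurrence (overlapping ones included) of every substring of s.
    static Map<String, Integer> substringCounts(String s) {
        Map<String, Integer> counts = new HashMap<>();
        for (int start = 0; start < s.length(); start++) {
            for (int end = start + 1; end <= s.length(); end++) {
                counts.merge(s.substring(start, end), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the longest substring occurring at least minCount times, or null
    // if there is none. Ties between equally long substrings are broken arbitrarily.
    static String longestOccurringAtLeast(String s, int minCount) {
        String best = null;
        for (Map.Entry<String, Integer> e : substringCounts(s).entrySet()) {
            if (e.getValue() >= minCount
                    && (best == null || e.getKey().length() > best.length())) {
                best = e.getKey();
            }
        }
        return best;
    }
}
```

For "aababcabcdabcdeabcdefabcdefg", "abc" occurs 5 times and the longest substring occurring at least twice is "abcdef".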

Inserting a char (a to z) in between each character in a word

I am working on a spell checker. I have implemented the hash table that takes in the word list, but now I have to write five techniques used to generate possible word suggestions. Among them are:
Swapping adjacent characters
Inserting a character in between each character
For example:
For the word "bob", I want to be able to insert a letter (a-z) at every position, (a-z)b(a-z)o(a-z)b(a-z), to see whether I get a new word that could be a suggestion for the misspelled word.
This is what I have so far, but it doesn't work:
public static void main(String[] args) {
    String word = "evelina";
    char[] wordCharArr = word.toCharArray();
    for(int i=0; i < wordCharArr.length ; i++) {
        //char temp1 = wordCharArr[i];
        for(char j = 'A'; j <= 'Z' ; j++) {
            word.substring(j);
        }
    }
}

What I did was add some code that creates an ArrayList (essentially a resizable array) and then fills it with all of the possibilities produced by inserting one letter. It also prints out each one, but you can remove that.
The only changes I made were:
1. Adding the ArrayList
2. Fixing your outer loop: it is one iteration short (bob requires 4 iterations, not 3)
3. Adding in additional substring segments to account for the rest of the word.
Elements can be retrieved with words.get(a), where a is an int within the bounds of the ArrayList. Don't forget the import statement, import java.util.*;
This would also be more efficient than Jeff's solution: instead of having to check the entire dictionary and then remove every non-matching element from it, as suggested below, it only has to look up the ~100 possibilities in the dictionary. Because a dictionary is in alphabetical order, it can be searched very quickly, whereas removing entries one by one (nearly 100,000 words) would be less efficient.
import java.util.*;

public class SpellCheck {
    public static void main(String[] args) {
        String word = "evelina";
        List<String> words = new ArrayList<String>();
        // i runs from 0 to word.length() inclusive: one more position than
        // there are letters (before, between and after them)
        for (int i = 0; i <= word.length(); i++) {
            for (char j = 'a'; j <= 'z'; j++) {
                // prefix + inserted letter + rest of the word
                String candidate = word.substring(0, i) + j + word.substring(i);
                words.add(candidate);
                System.out.println(candidate);
            }
        }
    }
}

Your problem is in your for loop: instead of looping only over the positions around the letters of the word you're checking, you're looping over every letter, including the original. The outer for loop should be slightly different.

There is another option, but I'm not sure whether it's easier or harder to implement (it sounds easier to me, at least). Rather than looping over every character and inserting it into the word, you could build a simple query mechanism like so:
Input: b?ob
So your algorithm would be something like:
1) Start with your entire word list
2) Remove all words that are not exactly 4 letters long
3) Remove all words that don't start with b
4) You can "ignore" the ? (any letter is allowed in the 2nd position)
5) Remove all words that don't have an 'o' in the 3rd position
6) Remove all words that don't have a 'b' in the 4th position
7) Return the results
Then you go through each of the other options:
Input 2: bo?b
Input 3: bob?
Input 4: ?bob
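A minimal sketch of this query mechanism, assuming '?' stands for exactly one arbitrary character and the dictionary is a List<String>; WildcardQuery, matches and suggestions are hypothetical names. The length check in matches covers the "exactly 4 letters" filter, and suggestions tries the '?' in every position.

```java
import java.util.ArrayList;
import java.util.List;

public class WildcardQuery {
    // True if word matches pattern, where '?' stands for exactly one character
    // and every other position must match literally.
    static boolean matches(String pattern, String word) {
        if (word.length() != pattern.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '?' && p != word.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Tries the '?' in every position of word (?bob, b?ob, bo?b, bob?) and
    // collects the dictionary words matching any of those patterns, once each.
    static List<String> suggestions(String word, List<String> dictionary) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i <= word.length(); i++) {
            String pattern = word.substring(0, i) + '?' + word.substring(i);
            for (String candidate : dictionary) {
                if (matches(pattern, candidate) && !result.contains(candidate)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }
}
```

With a dictionary of "boob", "bobs", "bomb", "blob", "bob" and "bobby", suggestions("bob", ...) returns [boob, blob, bomb, bobs].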

Count no. of words using Regular expressions in java

How do I count the number of times each word appears in a String in Java using regular expressions?

I don't think a regex can solve your problem completely.
You want to
split a string into words; a regular expression can do this for a very simple definition of a word, "parts of a string separated by whitespace or punctuation", which is not a very good definition even if you stick to English text
count the number of occurrences of each word derived from step 1. To do that you must store some kind of mapping, and regexes neither store nor count.
A workable approach could be to
split the input string (by regex or other means) into an array of word strings
iterate over the array, building a Map that keeps count of each word
iterate over the map to output a list of words and their number of occurrences.
Even if your input is limited to English, you still have to consider how you want your algorithm to behave with things like "they're" vs. "they are", and with compound words. Add other languages to the mix for additional kinds of headaches (different ways of writing the same word, words split into parts, spelling that changes depending on where in a sentence the word occurs, etc.).
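A sketch of the three steps above. It assumes a word is a run of Unicode letters or apostrophes ([\p{L}']+) and that case should be ignored; both are assumptions to adjust to your own definition of a word. Instead of split, it pulls words out directly with Matcher.find. WordCounter and countWords are hypothetical names.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCounter {
    // Step 1: what counts as a word. Here: letters (any script) and apostrophes.
    private static final Pattern WORD = Pattern.compile("[\\p{L}']+");

    // Step 2: count each word, ignoring case; TreeMap keeps the output alphabetical.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            counts.merge(m.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    // Step 3: output each word with its number of occurrences.
    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("They're good; they are GOOD.");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Note that "they're" is counted separately from "they" and "are", which is exactly the kind of case the answer warns about.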

I would split your task into a) identifying words and b) counting the occurrences of each unique word in the text.
a) could be solved with splitting the text with a regex.
b) could be solved by building a map with the result from a).
String text = "I like good mules. Mules are good :)";
String[] words = text.split("([\\W\\s]+)");
Map<String, Integer> counts = new HashMap<String, Integer>();
for (String word : words) {
    if (counts.containsKey(word)) {
        counts.put(word, counts.get(word) + 1);
    } else {
        counts.put(word, 1);
    }
}
Result (a HashMap's iteration order is not guaranteed, so yours may be ordered differently): {Mules=1, are=1, good=2, mules=1, like=1, I=1}

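import java.util.regex.Matcher;
import java.util.regex.Pattern;

// \b marks word boundaries, so the "abba" inside "abbabba" is not counted
Pattern p = Pattern.compile("\\babba\\b");
Matcher m = p.matcher("abba is abba with abbabba and abba doing abba");
int count = 0;
while (m.find()) {
    count++;
}
System.out.println(count); // 4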

Using Guava, this is a one-liner:
Multiset<String> countOfEachWord =
HashMultiset.create(Splitter.on(" ").omitEmptyStrings().split(myString));
Then, to get the count of "dog", for example, you would write:
countOfEachWord.count("dog")

Must you use a regex? If not, this might help:
public static int count(final String string, final String substring)
{
    int count = 0;
    int idx = 0;
    // advance by one after each hit, so overlapping occurrences are counted
    while ((idx = string.indexOf(substring, idx)) != -1)
    {
        idx++;
        count++;
    }
    return count;
}

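// Counts English words (runs of letters, digits and hyphens, so chemical names
// such as "2-methylpropane" count once) plus one per Chinese (Han) character.
// Needs java.util.regex.Pattern; Matcher.results() requires Java 9+.
int countWords(String t) {
    return (int) Pattern.compile("[A-Za-z0-9-]+|\\p{IsHan}").matcher(t).results().count();
}
This counts English words (including chemical names) plus Chinese words, treating each Chinese character as one word.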

Develop Reference
