Number to words mapping, awe-inspiring memory whim?

I need help completing this little project.
The program takes a phone number as input and converts it into a proper English word.
Explanation:
The first ten lines of a text file map each digit from 0-9 to some letters, something like:
1 akl
2 dgh
3 qnm
4 rtu
5 zx
6 cvf
7 eip
8 wjs
9 yb
0 o
Line 11 holds the total number of words, i.e. 50000.
After that, starting at line 12, come all 50000 words, one word per line.
The program then reads numbers from the user until the user enters -1,
and for each number produces the matching English word(s) from this text file. Each letter of a word stands for a digit from the list.
For example, if the user enters
6182703
the output will be:
Fashion
If more than one word matches, the program lists all of them separated by hyphens ('-').
How should I start this, and what approach should I use?
Pseudocode or hints would be really great.

I would take a dictionary of words, compute the number for each word, and store the pairs in a file sorted to suit your search.
e.g.:
apple = 17717
cherry = 627449
Then go through the file with a search algorithm.
EDIT: or you could store the data in a relational DB (http://hsqldb.org/ is simple) to avoid a large memory footprint. If you like that solution you could also look into key/value stores.
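The precomputed-number idea can be sketched in memory as follows. This is only an illustration under assumptions: the class name `DigitIndex` and its methods are made up for the example, the mapping lines are assumed to have the `digit letters` form shown in the question, and words containing an unmapped character are simply skipped.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: encodes every word once, then looks numbers up by key.
class DigitIndex {
    private final Map<Character, Character> letterToDigit = new HashMap<>();
    private final Map<String, List<String>> wordsByNumber = new HashMap<>();

    // mappingLines are lines such as "1 akl" (digit, whitespace, letters).
    DigitIndex(List<String> mappingLines, List<String> words) {
        for (String line : mappingLines) {
            String[] parts = line.trim().split("\\s+");
            char digit = parts[0].charAt(0);
            for (char letter : parts[1].toCharArray()) {
                letterToDigit.put(letter, digit);
            }
        }
        for (String word : words) {
            String key = encode(word);
            if (key != null) {
                wordsByNumber.computeIfAbsent(key, k -> new ArrayList<>()).add(word);
            }
        }
    }

    // Returns the digit string for a word, or null if a letter has no digit.
    String encode(String word) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            Character digit = letterToDigit.get(c);
            if (digit == null) {
                return null;
            }
            sb.append(digit);
        }
        return sb.toString();
    }

    // Returns all words stored under the given number (possibly none).
    List<String> lookup(String number) {
        return wordsByNumber.getOrDefault(number, Collections.emptyList());
    }
}
```

Building the index costs one pass over the dictionary; after that each number entered by the user is a single map lookup, at the price of keeping all words in memory.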

A lot of the detail in your question relates to the input spec, which is all pretty trivial.
After parsing your input, you're going to have a list of "candidate" words (all the words), and a mapping from each digit to the set of characters that can represent it.
List<String> words;
Map<Character, Set<Character>> digitMapping;
The simplest way of generating the word for a number is probably this: sequentially filter the list of candidates, testing if they match the input digits, and removing them otherwise. Something like this might do the trick (consider this pseudocode - I haven't tried compiling it):
List<String> getMatches(String inputDigits) {
    // Take a copy of the word list. You don't want to ruin the list for the next caller
    List<String> candidates = new ArrayList<String>(words);
    for (Iterator<String> it = candidates.iterator(); it.hasNext(); ) {
        String candidate = it.next();
        // A word of a different length can never match the number
        if (candidate.length() != inputDigits.length()) {
            it.remove();
            continue;
        }
        for (int i = 0; i < inputDigits.length(); ++i) {
            Set<Character> letters = digitMapping.get(inputDigits.charAt(i));
            if (letters == null || !letters.contains(candidate.charAt(i))) {
                it.remove();
                break; // remove() may only be called once per next()
            }
        }
    }
    return candidates;
}
It will return all the words that match, so in your example, "555" will likely return an empty list. "6182703" might only return a single word, "fashion", while "202" might return several words in a list ("dog", "hog", "god"). You'll need to decide how you want to handle the zero and multiple cases.
Edit: Details on populating digitMapping:
The digitMapping will be something like:
Map<Character, Set<Character>> digitMapping = new HashMap<Character, Set<Character>>();
Then you'll need to grab a char and a String from the input. For the input line "1 akl", your char will be '1', while your String will be "akl". You're mapping from the character to the set of characters in the string, so will need to construct an empty set, put it into the map, then populate the set. Something like (again, I haven't even tried compiling this, so take it with a grain of salt):
private void addDigitToMap(char digit, String chars) {
    Set<Character> set = new HashSet<Character>();
    // Map the digit to its (initially empty) set of letters
    digitMapping.put(digit, set);
    for (char c : chars.toCharArray()) {
        set.add(c);
    }
}
So now the map will have an entry that points to a set of the characters it can be represented by.
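For completeness, here is a small sketch of turning the ten mapping lines into that map, and of printing several matches the way the question asks (hyphen-separated). The class name `MappingLoader` and both method names are invented for this example, and the lines are assumed to be well formed (`digit`, whitespace, `letters`).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for the input format described in the question.
class MappingLoader {
    // Builds digit -> letters from lines such as "1 akl".
    static Map<Character, Set<Character>> parse(List<String> mappingLines) {
        Map<Character, Set<Character>> digitMapping = new HashMap<>();
        for (String line : mappingLines) {
            String[] parts = line.trim().split("\\s+");
            Set<Character> letters = new HashSet<>();
            for (char c : parts[1].toCharArray()) {
                letters.add(c);
            }
            digitMapping.put(parts[0].charAt(0), letters);
        }
        return digitMapping;
    }

    // Joins the matches with '-'; an empty list gives an empty string.
    static String format(List<String> matches) {
        return String.join("-", matches);
    }
}
```

How to report the empty case (no matching word) is left open here, as it is in the answer.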

Related

Count the occurrences of words in a list containing sentences

I am having a problem with some Java code that uses a List. Basically, I am trying to count the occurrences of each word in each sentence, given a list containing several sentences.
The code that builds the list of sentences is below:
List<List<String>> sort = new ArrayList<>();
for (String sentence : complete.split("[.?!]\\s*"))
{
sort.add(Arrays.asList(sentence.split("[ ,;:]+"))); //put each sentences in list
}
The output from the list is as follows:
[hurricane, gilbert, head, dominican, coast]
[hurricane, gilbert, sweep, dominican, republic, sunday, civil, defense, alert, heavily, populate, south, coast, prepare, high, wind]
[storm, approach, southeast, sustain, wind, mph, mph]
[there, alarm, civil, defense, director, a, television, alert, shortly]
The desired output should look as follows (only an example). It lists every unique word in the list and counts its occurrences per sentence.
Word: hurricane
Sentence 1: 1 times
Sentence 2: 1 times
Sentence 3: 0 times
Sentence 4: 0 times
Word: gilbert
Sentence 1: 0 times
Sentence 2: 2 times
Sentence 3: 1 times
Sentence 4: 0 times
Word: head
Sentence 1: 3 times
Sentence 2: 2 times
Sentence 3: 0 times
Sentence 4: 0 times
and so on....
In the example above, the word 'hurricane' occurs 1 time in the first sentence, 1 time in the second sentence, and not at all in the third and fourth sentences.
How do I achieve this output? I was thinking of building a 2D matrix. Any help will be appreciated. Thanks!

This is a working solution. I did not take care of the printing. The result is a Map from Word to Array, where Array contains the count of Word in each sentence, indexed from 0. Runs in O(N) time. Play here: https://repl.it/Bg6D
List<List<String>> sort = new ArrayList<>();
Map<String, ArrayList<Integer>> res = new HashMap<>();
// split by sentence
for (String sentence : someText.split("[.?!]\\s*")) {
sort.add(Arrays.asList(sentence.split("[ ,;:]+"))); //put each sentences in list
}
// put all word in a hashmap with 0 count initialized
final int sentenceCount = sort.size();
sort.stream().forEach(sentence -> sentence.stream().forEach(s -> res.put(s, new ArrayList<Integer>(Collections.nCopies(sentenceCount, 0)))));
int index = 0;
// count the occurrences of each word for each sentence.
for (List<String> sentence: sort) {
for (String s : sentence) {
res.get(s).set(index, res.get(s).get(index) + 1);
}
index++;
}
EDIT:
In answer to your comment.
List<Integer> getSentence(int sentence, Map<String, ArrayList<Integer>> map) {
return map.entrySet().stream().map(e -> e.getValue().get(sentence)).collect(Collectors.toList());
}
Then you can call
List<Integer> sentence0List = getSentence(0, res);
However, be aware that this approach is not optimal, since each call walks every entry of the map and so runs in O(W) time, with W being the number of distinct words. For a small vocabulary it is totally fine, but it does not scale. You have to decide for yourself what you will do with the result. If you need to call getSentence many times, this is not the right approach; in that case you will need the data structured differently. Something like
Sentences = [
{'word1': N, 'word2': N},... // sentence 1
{'word1': N, 'word2': N},... // sentence 2
]
So you are able to easily access the word count per each sentence.
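In Java, that per-sentence structure could be a `List<Map<String, Integer>>`. The sketch below is one way to build it; the class name `SentenceCounts` is made up for the example, and the input is assumed to be the already split `List<List<String>>` from the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: one word-count map per sentence.
class SentenceCounts {
    static List<Map<String, Integer>> count(List<List<String>> sentences) {
        List<Map<String, Integer>> result = new ArrayList<>();
        for (List<String> sentence : sentences) {
            Map<String, Integer> counts = new HashMap<>();
            for (String word : sentence) {
                // Start at 1 for a new word, otherwise add 1 to the old count
                counts.merge(word, 1, Integer::sum);
            }
            result.add(counts);
        }
        return result;
    }
}
```

With this layout, `count(sort).get(i).getOrDefault(word, 0)` gives the count of `word` in sentence `i` directly, without scanning the whole vocabulary.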
EDIT 2:
Call this method:
Map<String, Float> getFrequency(Map<String, ArrayList<Integer>> stringMap) {
Map<String, Float> res = new HashMap<>();
stringMap.entrySet().stream().forEach(e -> res.put(e.getKey()
, e.getValue().stream().mapToInt(Integer::intValue).sum() / (float)e.getValue().size()));
return res;
}
Will return something like this:
{standard=0.25, but=0.25, industry's=0.25, been=0.25, 1500s=0.25, software=0.25, release=0.25, type=0.5, when=0.25, dummy=0.5, Aldus=0.25, only=0.25, passages=0.25, text=0.5, has=0.5, 1960s=0.25, Ipsum=1.0, five=0.25, publishing=0.25, took=0.25, centuries=0.25, including=0.25, in=0.25, like=0.25, containing=0.25, printer=0.25, is=0.25, t

You could solve your problem by first creating an index of all words. You could use a HashMap and just put on it every single word you find in your text (so you would have no need to check for duplicates).
Then you can iterate the HashMap and check for every Word in every sentence. You can count occurrences by using the indexOf method of your list. As long as it returns a value greater than -1 you can count up the occurrence in the sentence. This method does only return the first occurrence so you
Some Pseudocode would be like:
Array sentences = text.split(sentence delimiter)
for each word in text
put word on hashmap
for each entry in hashmap
for each sentence
int count = 0
while subList(count, sentence.length) indexOf(entry) > -1
count for entry ++
Note that this is very greedy and not performance oriented at all. Oh yea, and also note, that there are some java nlp libraries out there which may have already solved your problem in a performance oriented and reusable way.
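A compact Java rendering of this pseudocode might look like the following. It is a sketch, not the answerer's code: it swaps the repeated `indexOf` on sublists for `Collections.frequency`, which counts every occurrence in one call, and the class name `WordIndex` is invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: word -> list of its counts in sentence 0, 1, 2, ...
class WordIndex {
    static Map<String, List<Integer>> build(List<List<String>> sentences) {
        // First pass: register every distinct word, in order of appearance
        Map<String, List<Integer>> index = new LinkedHashMap<>();
        for (List<String> sentence : sentences) {
            for (String word : sentence) {
                index.putIfAbsent(word, new ArrayList<>());
            }
        }
        // Second pass: for each word, count it in every sentence
        for (Map.Entry<String, List<Integer>> entry : index.entrySet()) {
            for (List<String> sentence : sentences) {
                entry.getValue().add(Collections.frequency(sentence, entry.getKey()));
            }
        }
        return index;
    }
}
```

Like the pseudocode, this checks every word against every sentence, so it is simple rather than fast.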

First you can segment your sentences and then tokenize them using a text segmentor such as NLTK or Stanford tokenizer. Splitting the string (containing sentences) around "[.?!]" is not a good idea. What happens to an "etc." or "e.g." that occurs in the middle of the sentence? Splitting a sentence around "[ ,;:]" is also not a good idea. You can have plenty of other symbols in a sentence such as quotation marks, dash and so on.
After segmentation and tokenization you can split your sentences around space and store them in a List<List<String>>:
List<List<String>> sentenceList = new ArrayList<>();
Then for your index you can create a HashMap<String,List<Integer>>:
HashMap<String,List<Integer>> words = new HashMap<>();
The keys are all the words in all sentences. Assuming each value list already holds one zero per sentence, you can update the values as follows:
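for (int i = 0; i < sentenceList.size(); i++) {
    List<String> sentence = sentenceList.get(i);
    for (String w : words.keySet()) {
        if (sentence.contains(w)) {
            // The list is modified in place, so no put() is needed
            List<Integer> tmp = words.get(w);
            tmp.set(i, tmp.get(i) + 1);
        }
    }
}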
This solution has the time complexity of O(number_of_sentences*number_of_words) which is equivalent to O(n^2). An optimized solution is:
for (int i = 0; i < sentenceList.size(); i++) {
    for (String w : sentenceList.get(i)) {
        // Add one for every occurrence of w in sentence i
        List<Integer> tmp = words.get(w);
        tmp.set(i, tmp.get(i) + 1);
    }
}
This has the time complexity of O(number_of_sentences*average_length_of_sentences). Since average_length_of_sentences is usually small this is equivalent to O(n).

Comparing parts of Arrays against each other?

I'm really really really not sure what is the best way to approach this. I've gotten as far as I can, but I basically want to scan a user response with an array of words and search for matches so that my AI can tell what mood someone is in based off the words they used. However, I've yet to find a clear or helpful answer. My code is pretty cluttered too because of how many different methods I've tried to use. I either need a way to compare sections of arrays to each other or portions of strings. I've found things for finding a part of an array. Like finding eggs in green eggs and ham, but I've found nothing that finds a section of an array in a section of another array.
public class MoodCompare extends Mood1 {
public static void MoodCompare(String inputMood){
int inputMoodLength = inputMood.length();
int HappyLength = Arrays.toString(Happy).length();
boolean itWorks = false;
String[] inputMoodArray = inputMood.split(" ");
if(Arrays.toString(Happy).contains(Arrays.toString(inputMoodArray)) == true)
System.out.println("Success!");
inputMood is the text the user has entered, which should contain keywords hinting at their mood. Happy is an array from the class Mood1, which this class extends. This is only a small piece of the class, much less the program, but it should be all I need to make a valid comparison to complete the class.
If anyone can help me with this, you will save me hours of work. So THANK YOU!!!

Manipulating strings is nicer when you do not use relatively primitive arrays, which you have to walk through yourself, and so on. As a Dutch proverb has it: not seeing the wood for the trees.
In this case it seems you check words of the input against a set of words for some mood.
Let's use Java collections:
Turning an input string into a list of words:
String input = "...";
// Arrays.asList alone returns a fixed-size list, so copy it before removing
List<String> sentence = new ArrayList<>(Arrays.asList(input.split("\\W+")));
sentence.remove("");
\\W+ is a sequence of one or more non-word characters. Mind that "word" characters here means A-Za-z0-9_.
Now a mood would be a set of unique words:
Set<String> moodWords = new HashSet<>();
Collections.addAll(moodWords, "happy", "wow", "hurray", "great");
Evaluation could be:
int matches = 0;
for (String word : sentence) {
if (moodWords.contains(word)) {
++matches;
}
}
int percent = sentence.isEmpty() ? 0 : matches * 100 / sentence.size();
System.out.printf("Happiness: %d %%%n", percent);
In Java 8 it is even more compact (count() returns a long, hence the cast):
int matches = (int) sentence.stream().filter(moodWords::contains).count();
Explanation:
The for-each loop takes every word in the sentence. For every word it checks whether it is contained in moodWords, the set of all mood words.
The percentage is the share of words in the sentence that are mood words. The boundary condition of an empty sentence is handled by the conditional expression ... ? ... : ... - an empty sentence is given the arbitrary percentage 0%.
The printf format used %d for the integer, %% for the percent sign % (self-escaped) and %n for the line break character(s).
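Since the question is about telling which mood the user is in, the same idea can be extended to several mood sets at once. The sketch below is an assumption-laden illustration: the class name `MoodScorer`, the mood names and the word lists are all made up, and the input is lower-cased so that "Wow" matches "wow".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: scores an input sentence against several mood word sets.
class MoodScorer {
    // Returns, per mood name, how many input words appear in that mood's set.
    static Map<String, Long> score(String input, Map<String, Set<String>> moods) {
        String[] words = input.toLowerCase(Locale.ROOT).split("\\W+");
        Map<String, Long> scores = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> mood : moods.entrySet()) {
            long matches = Arrays.stream(words)
                    .filter(mood.getValue()::contains)
                    .count();
            scores.put(mood.getKey(), matches);
        }
        return scores;
    }
}
```

The mood with the highest score is then a reasonable first guess; ties and all-zero scores need a policy of their own.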

If I'm understanding your question correctly, you mean something like this?
String words[] = {"green", "eggs", "and", "ham"};
String response = "eggs or ham";
Mood mood = new Mood();
for(String foo : words)
{
if(response.contains(foo))
{
//Check if happy etc...
if(response.equals("green"))
mood.sad++;
...
}
}
System.out.println("Success");
...
//CheckMood() etc... other methods.

Try to use tokens.
Every time that the program needs to compare the contents of a row from one array to the other array, just tokenize the contents in parallel and compare them.
Visit the following Java Doc page for further reference: http://docs.oracle.com/javase/7/docs/api/java/util/StringTokenizer.html
or even view the following web pages:
http://introcs.cs.princeton.edu/java/72regular/Tokenizer.java.html

string manipulation : insert words at certain indexes in string, simultaneously

I have searched a lot but couldn't find anything that will allow me to insert words at certain indexes simultaneously. For example :
I have a string :
rock climbing is fun, I love rock climbing.
I have a hashmap for certain words which indicate their index in the string :
e.g. :
rock -> 0,29
climbing -> 5,34
fun -> 17
Now my question is :
I want to put [start] tag at the start of all these words and [end] tag at the end of them, in the string. I can't do this one by one since in that case once I insert [start] at index 0, then all the other indexes will be modified and I'll have to recalculate them.
Is there a way in which I can insert all of the tags at once or something? Can somebody suggest some other solution to this problem?
I can't use regular expressions(replaceall method), since sometimes I'll have a sentence like :
rocks are hard.
and hashmap will be :
rock -> 0
I am looking for faster solutions here.
edit :
for the sentence :
rocks are hard but frocks are beautiful.
rocks -> 0
Here I don't want to replace frocks with the tags.

This does not do exactly what you want, but consider it an alternative that tries to reach your goal by different means.
I will consider two cases. First, suppose there exists a List<String> words, which contains the words you want to replace.
Then the code will be:
public String insertTags(final String input) {
    String result = input;
    for (String word : words) {
        // Strings are immutable: keep the value that replace() returns
        result = result.replace(word, "[start]" + word + "[end]");
    }
    return result;
}
Second case, closer to your example but not using the indices: suppose there exists a Map<String, List<Integer>> words, which maps each word to the list of indices at which it should be replaced.
Then the code would be:
public String insertTags(final String input) {
    String result = input;
    for (Map.Entry<String, List<Integer>> entry : words.entrySet()) {
        String word = entry.getKey();
        result = result.replace(word, "[start]" + word + "[end]");
    }
    return result;
}
The latter is definitely more complex and does not even use the indices, so preferably you should rewrite it to the former.
Hope this helps you without having to worry about the indices.
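If the indices do have to be honoured (for instance so that "frocks" is left alone), one common trick is to apply the insertions from the end of the string towards the start, so that positions not yet processed never shift. The following is a sketch under assumptions: the class name `TagInserter` is invented, the map has the word-to-start-indices shape from the question, and the listed occurrences are assumed not to overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: wraps the listed occurrences in [start]...[end].
class TagInserter {
    static String insertTags(String input, Map<String, List<Integer>> positions) {
        // Collect {startIndex, length} for every listed occurrence
        List<int[]> spans = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : positions.entrySet()) {
            for (int start : entry.getValue()) {
                spans.add(new int[] {start, entry.getKey().length()});
            }
        }
        // Highest start index first, so earlier indices stay valid
        spans.sort((a, b) -> Integer.compare(b[0], a[0]));
        StringBuilder sb = new StringBuilder(input);
        for (int[] span : spans) {
            sb.insert(span[0] + span[1], "[end]");
            sb.insert(span[0], "[start]");
        }
        return sb.toString();
    }
}
```

Because only the given positions are touched, a word that merely contains a listed word elsewhere in the text is not tagged.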

Create String[] containing only certain characters

I am trying to create a String[] which contains only words that consist of certain characters. For example, I have a dictionary containing a number of words like so:
arm
army
art
as
at
attack
attempt
attention
attraction
authority
automatic
awake
baby
back
bad
bag
balance
I want to narrow the list down so that it only contains words with the characters a, b and g. Therefore the list should only contain the word 'bag' in this example.
Currently I am trying to do this using regexes but having never used them before I can't seem to get it to work.
Here is my code:
public class LetterJugglingMain {
public static void main(String[] args) {
String dictFile = "/Users/simonrhillary/Desktop/Dictionary(3).txt";
fileReader fr = new fileReader();
fr.openFile(dictFile);
String[] dictionary = fr.fileToArray();
String regx = "able";
String[] newDict = createListOfValidWords(dictionary, regx);
printArray(newDict);
}
public static String[] createListOfValidWords(String[] d, String regex){
List<String> narrowed = new ArrayList<String>();
for(int i = 0; i<d.length; i++){
if(d[i].matches(regex)){
narrowed.add(d[i]);
System.out.println("added " + d[i]);
}
}
String[] narrowArray = narrowed.toArray(new String[0]);
return narrowArray;
}
however the array returned is always empty unless the String regex is the exact word! Any ideas? I can post more code if needed...I think I must be trying to initialise the regex wrong.
The narrowed down list must contain ONLY the characters from the regex.

Frankly, I'm not an expert in regexes, but I don't think it's the best tool to do what you want. I would use a method like the following:
public boolean containsAll(String s, Set<Character> chars) {
Set<Character> copy = new HashSet<Character>();
for (int i = 0; i < s.length() && copy.size() < chars.size(); i++) {
char c = s.charAt(i);
if (chars.contains(c)) {
copy.add(c);
}
}
return copy.size() == chars.size();
}

The regex able will match only the string "able". However, if you want a regular expression that matches any one of the characters a, b, l or e, the regex you're looking for is [able] (in brackets). If you want words made up of several such characters, add a + to repeat the pattern: [able]+.

The OP wants words that contain every character. Not just one of them.
And other characters are not a problem.
If this is the case, I think the simplest way would be to loop through the entire string, character by character, and check to see if it contains all of the characters you want. Keep flags to check and see if every character has been found.
If this isn't the case.... :
Try using the regex:
^[able]+$
Here's what it does:
^ matches the beginning of the string and $ matches the end of the string. This makes sure that you're not getting a partial match.
[able] matches the characters you want the string to consist of, in this case a, b, l, and e. The + makes sure that there are 1 or more of these characters in the string.
Note: This regex will match only strings made up entirely of these 4 letters, in any order and number. For example, it will match:
able, albe, aeble, aaaabbblllleeee
and will not match
qable, treatable, and abled.

A sample regex that selects words containing at least one occurrence of every character in a set. This will match any English word (case-insensitive) that contains at least one occurrence of each of the characters a, b, g:
(?i)(?=.*a)(?=.*b)(?=.*g)[a-z]+
Example of strings that match would be bag, baggy, grab.
Example of strings that don't match would be big, argument, nothing.
The (?i) turns on the case-insensitive flag.
You need to append one (?=.*<character>) for each character in the set.
I assume a word only contains letters of the English alphabet, so I specify [a-z]. Specify more if you need space, hyphen, etc.
I assume you use the matches(String regex) method of the String class, which has to match the whole string, so I omitted the ^ and $.
The performance may be bad, since in the worst case (the characters are found at the end of the words), I think that the regex engine may go through the string for around n times where n is the number of characters in the set. It may not be an actual concern at all, since the words are very short, but if it turns out that this is a bottleneck, you may consider doing simple looping.
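To show how such a pattern could be assembled and applied to a word list, here is a small sketch. The class name `RequiredLetters` and its methods are invented for the example, and the required letters are assumed to be plain a-z characters (no regex metacharacters, so no escaping is done).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps words that contain every required letter.
class RequiredLetters {
    // For "abg" this builds (?i)(?=.*a)(?=.*b)(?=.*g)[a-z]+
    static String buildRegex(String letters) {
        StringBuilder sb = new StringBuilder("(?i)");
        for (char c : letters.toCharArray()) {
            sb.append("(?=.*").append(c).append(")");
        }
        return sb.append("[a-z]+").toString();
    }

    static List<String> filter(List<String> words, String letters) {
        String regex = buildRegex(letters);
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            // String.matches() must match the entire word
            if (word.matches(regex)) {
                kept.add(word);
            }
        }
        return kept;
    }
}
```

If the words must additionally contain nothing but the required letters, replacing the trailing `[a-z]+` with a class of just those letters (for example `[abg]+`) should give that stricter behaviour.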

Select words with at least two different letters

I am using this code
Matcher m2 = Pattern.compile("\\b[ABE]+\\b").matcher(key);
to get only those keys from a HashMap that consist of the letters A, B or E.
However, I am not interested in words such as AAAAAA or EEEEE; I need words with at least two different letters (in the best case, three).
Is there a way to modify the regex? Can anyone offer insight on this?

Replace everything except your letters, make a Set of the result, test the Set for size.
public static void main (String args[])
{
String alphabet = "ABC";
String totest = "BBA";
if (args.length == 2)
{
alphabet = args[0];
totest = args[1];
}
String cleared = totest.replaceAll ("[^" + alphabet + "]", "");
char[] ca = cleared.toCharArray ();
Set <Character> unique = new HashSet <Character> ();
for (char c: ca)
unique.add (c);
System.out.println ("Result: " + (unique.size () > 1));
}
Example implementation

You could use a more complicated regex to do it e.g.
(.*A.*[BE].*|.*[BE].*A.*)|(.*B.*[AE].*|.*[AE].*B.*)|(.*E.*[BA].*|.*[BA].*E.*)
But it is probably going to be easier to understand if you do some kind of replacement, for instance a loop that replaces one letter at a time with '' and checks the length of the new string each time: if the length changes twice, then you've got two of your desired characters. EDIT: actually, if you know the set of desired characters at runtime before you do the check, NullUserException had it right in his comment: indexOf or contains will be more efficient and probably more readable than this.
Note that if your set of desired characters is unknown at compile time (or at least pre-string-checking at runtime), the second option is preferable: if you're looking for any characters, just replace all occurrences of the first character in a while (str.length() > 0) loop. The number of times it goes through the loop is the number of different characters you've got.
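The indexOf approach mentioned in the EDIT can be sketched as below. The class name `DistinctLetters`, the method name and its parameters are made up for the example, and the alphabet string is assumed to contain no repeated letters.

```java
// Hypothetical helper: checks a key against an alphabet without regex.
class DistinctLetters {
    // True if key uses only letters from alphabet and at least
    // minDistinct different ones.
    static boolean accepts(String key, String alphabet, int minDistinct) {
        if (key.isEmpty()) {
            return false;
        }
        for (char c : key.toCharArray()) {
            if (alphabet.indexOf(c) < 0) {
                return false; // a letter outside the alphabet
            }
        }
        int distinct = 0;
        for (char c : alphabet.toCharArray()) {
            if (key.indexOf(c) >= 0) {
                distinct++;
            }
        }
        return distinct >= minDistinct;
    }
}
```

For the question's case, `accepts(key, "ABE", 2)` would reject AAAAAA and EEEEE while keeping keys such as ABBA, and passing 3 asks for the "best case" of all three letters.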

Explicitly limit the repetition of the desired letters.
It would look like this:
\b[ABE]{1,3}\b
It matches AAE, EEE, AEE but not AAAA, AAEE
