How to build a DAWG graph in Java - java

I need to create a DAWG for my Scrabble AI.
After a lot of searching, I found two or three pages that explain how to create a DAWG:
https://progaide.com/question/12331755-algorithme-de-cr-ation-de-dawg-facile
How to create a DAWG?
https://codes-sources.commentcamarche.net/faq/10903-compression-d-un-dictionnaire-sous-forme-de-dawg
However, I don't really understand them.
I need to build this graph from a dictionary in a .txt file (approximately 400,000 words, a French dictionary) to speed up the search for words that exist in the French language.
Right now I do a simple search through my .txt file, but it is really slow, and I think the right way to implement this is a DAWG.
My AI can place the first word using player 1's letters, but on later turns I need to analyze words with 8 letters instead of 7, and I think the best option before going further is to optimize my search. For me, the best solution is a DAWG, but if you have any other solution, I am open-minded.
Thanks for reading; I hope my English is understandable.
PS: If anyone needs the .txt, I can provide it, no problem (it is a complete French dictionary for Scrabble).
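For illustration, here is a minimal sketch of one way to get a DAWG in Java: build an ordinary trie first, then merge identical subtrees bottom-up. This is not the incremental construction described on the linked pages, and the names Dawg, Node, build, contains and nodeCount are hypothetical. It assumes the word list fits in memory as a trie before minimization.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: trie construction followed by bottom-up merging of equal subtrees.
class Dawg {
    static final class Node {
        final TreeMap<Character, Node> next = new TreeMap<>(); // sorted, so signatures are stable
        boolean terminal;                                       // true if a word ends here
        int id = -1;                                            // set once the node is canonical
    }

    private final Node root;
    private final int nodeCount;

    private Dawg(Node root, int nodeCount) {
        this.root = root;
        this.nodeCount = nodeCount;
    }

    static Dawg build(Iterable<String> words) {
        // Step 1: plain trie, one node per distinct prefix.
        Node root = new Node();
        for (String word : words) {
            Node n = root;
            for (char c : word.toCharArray()) {
                n = n.next.computeIfAbsent(c, k -> new Node());
            }
            n.terminal = true;
        }
        // Step 2: merge nodes that accept exactly the same set of suffixes.
        Map<String, Node> registry = new HashMap<>();
        Node canonicalRoot = minimize(root, registry);
        return new Dawg(canonicalRoot, registry.size());
    }

    // Post-order: children are made canonical before the parent is compared.
    private static Node minimize(Node node, Map<String, Node> registry) {
        node.next.replaceAll((c, child) -> minimize(child, registry));
        StringBuilder signature = new StringBuilder(node.terminal ? "1" : "0");
        for (Map.Entry<Character, Node> e : node.next.entrySet()) {
            signature.append(e.getKey()).append(e.getValue().id).append(';');
        }
        Node existing = registry.putIfAbsent(signature.toString(), node);
        if (existing != null) {
            return existing;          // an equivalent node already exists, share it
        }
        node.id = registry.size() - 1; // first node with this signature becomes canonical
        return node;
    }

    boolean contains(String word) {
        Node n = root;
        for (char c : word.toCharArray()) {
            n = n.next.get(c);
            if (n == null) {
                return false;
            }
        }
        return n.terminal;
    }

    int nodeCount() {
        return nodeCount;
    }
}
```

With the four words chat, chats, rat and rats, the trie has 10 nodes and the merged graph has 6, because the shared endings are stored once. A lookup costs one map access per letter, independent of the dictionary size.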

Related

Speech recognition / how to create TTS?

I have an idea to build a program that can interact with the user's voice in Arabic. I started with sphinx-4 a year ago, but I need to make an Arabic acoustic model, grammar, and dictionary,
but I can't find the way. Could you describe in detail how to create those things,
and the IDE or program needed?
Please help me.
Ok, let me start at the very beginning, because I think you are not aware of the dimensions of your project, and you are mixing up things (ASR and TTS). First, I would like to explain what the different things are that you were talking about:
Acoustic Model: Every speech recognition system requires an acoustic model. Language, and words in particular, is made up of phonemes. Phonemes describe how something sounds. To give you an example, the letter a is not always pronounced the same way, as you can see from the two words below:
to bark <=> to take
Now your ASR system needs to detect these phonemes. To do this, it performs a spectral analysis of many short frames of the audio signal and computes features, like MFCCs. What to do with these features? It puts them into a classifier (I could write a new chapter about the classifier here, but this will be too much information). Your classifier has to learn how to actually perform classification. What it does in simple words is it maps a set of features to a phoneme.
Dictionary: In your dictionary, you define every word that can be recognized by your ASR system. It tells the ASR the phoneme composition of a word. A short example for this is:
hello H EH L OW
world W ER L D
With this small dictionary, your system would be able to recognize the words hello and world.
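As a rough illustration of the format shown above (a word followed by its phonemes, separated by whitespace), such a dictionary could be read into a map like this. The class and method names are hypothetical, and this is not how any particular ASR toolkit loads its dictionary.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: parses "word PH1 PH2 ..." lines into word -> phoneme list.
class PronunciationDictionary {
    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> entries = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) {
                continue; // skip blank lines and words without phonemes
            }
            entries.put(parts[0], Arrays.asList(parts).subList(1, parts.length));
        }
        return entries;
    }
}
```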
Language Model (or Grammar): The language model holds information about the assembly of words for a given language. What does this mean? Think of the virtual keyboard of your smartphone. When you type in the words 'Will you marry', your keyboard might guess the next word to be 'me'. That is no magic. The model was learned from huge amounts of text files. Your LM does the same. It adds the knowledge about meaningful word compositions (what everybody calls a sentence) into the ASR system to further improve detection.
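To make the keyboard example concrete, here is a toy sketch of the simplest kind of language model, a bigram counter that remembers which word most often follows another. Real language models are trained on far more text and use smoothing; the names BigramModel, train and predictNext are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: counts word pairs and predicts the most frequent follower.
class BigramModel {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    void train(String sentence) {
        String[] words = sentence.toLowerCase().trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            counts.computeIfAbsent(words[i], k -> new HashMap<>())
                  .merge(words[i + 1], 1, Integer::sum);
        }
    }

    // Returns the word seen most often after 'previous', or null if it was never seen.
    String predictNext(String previous) {
        Map<String, Integer> followers = counts.get(previous.toLowerCase());
        if (followers == null) {
            return null;
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : followers.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

After training on "will you marry me" twice and "will you marry him" once, predictNext("marry") returns "me".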
Now back to your problem: You need transcribed audio data for the following reasons:
You want to train your acoustic model if you have none.
You want to create a large enough dictionary.
You want to generate a language model from the text.
Long story short: You are wrong if you think you could solve all these tasks on your own. A reliable transcription alone is already a large amount of work. You should seriously rethink your idea.

converting audio file into text file using java

I am developing a desktop application in Java. The application is for teaching English to school kids: the user can upload English audio in any format, which needs to be converted into a text file that they can read.
I've found some APIs, but I am not sure about them.
http://cmusphinx.sourceforge.net/wiki/
I've seen many questions on Stack Overflow about this, but none was helpful. I would be very grateful if someone could help with this.
Thank you.
There are many technologies and services available to perform speech recognition. For an intro to some of the choices see https://stackoverflow.com/a/6351055/90236.
I'm not sure that the results will be acceptable for teaching children English as a second language, but it is worth trying.
What you seek is currently bleeding-edge technology. Tools like cmusphinx can detect words from a dedicated, limited dictionary (so you can teach it to understand, say, 15 words and that's it - you can't teach it to understand English).
Basically, those tools try to find patterns in the sound waves that you feed them. They don't understand anything, they just use the same algorithm on anything and then try to find the closest match. This works well for small sets of words, but as the number of words increases, the difference between them shrinks and the job gets ever harder (without even starting with words like whether and weather or C and see).
What you might consider is "repeat after me" software. Here, you need to record all words for the test as templates. Then you can record the words from the pupils and then compute the difference. If the difference is not too large, the word is correct. But again: This is simple repetition to improve pronunciation - not English.
There is desktop software which can understand a lot of English (for example the products from Nuance, Dragon NaturallySpeaking being one of the most prominent). They do offer server solutions, but that software isn't free or cheap if you're on a tight budget.

Optimize boggle algorithm

I'm implementing the following boggle algorithm:
I want to optimize it because it takes about two and a half minutes to find all words. Do you have any ideas on optimization techniques?
I've written some Boggle-solving algorithms by creating letter trees which can be traversed to assemble and verify words. You save loads of space by using a tree-based structure in which words share similar letters, meaning you won't have to keep individual copies of each word.
If you didn't write the program on the website you provided, keep in mind that we won't do your work for you. You have to show us that you've spent considerable time on the problem instead of giving us a program and asking us to optimize it for you. A good first step would be to study the algorithm on the page and fully understand how it works. Or even better, try writing your own Boggle program from scratch to learn which techniques work best for you.
You really want to have a look at this question (and answers) here: How to find list of possible words from a letter matrix [Boggle Solver]
There are solutions in Python, Perl, VB.NET and PHP. Most use tries and optionally prefilter the dictionary using regexes.
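The trie idea mentioned in the answers above can be sketched in Java roughly as follows. It assumes a non-empty rectangular board of lowercase letters, allows moves to all eight neighbours, and does not enforce a minimum word length; BoggleSolver and its methods are hypothetical names, not the code from the linked page.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: depth-first search over the board, pruned by a trie of dictionary words.
class BoggleSolver {
    static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean word; // true if the path from the root spells a dictionary word
    }

    private final Node root = new Node();

    BoggleSolver(Iterable<String> dictionary) {
        for (String w : dictionary) {
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.next.computeIfAbsent(c, k -> new Node());
            }
            n.word = true;
        }
    }

    Set<String> solve(char[][] board) {
        Set<String> found = new TreeSet<>();
        boolean[][] used = new boolean[board.length][board[0].length];
        for (int r = 0; r < board.length; r++) {
            for (int c = 0; c < board[r].length; c++) {
                search(board, r, c, root, new StringBuilder(), used, found);
            }
        }
        return found;
    }

    private void search(char[][] board, int r, int c, Node node,
                        StringBuilder path, boolean[][] used, Set<String> found) {
        if (r < 0 || c < 0 || r >= board.length || c >= board[r].length || used[r][c]) {
            return;
        }
        Node child = node.next.get(board[r][c]);
        if (child == null) {
            return; // no dictionary word starts with this prefix, stop here
        }
        used[r][c] = true;
        path.append(board[r][c]);
        if (child.word) {
            found.add(path.toString());
        }
        for (int dr = -1; dr <= 1; dr++) {
            for (int dc = -1; dc <= 1; dc++) {
                if (dr != 0 || dc != 0) {
                    search(board, r + dr, c + dc, child, path, used, found);
                }
            }
        }
        path.setLength(path.length() - 1); // backtrack
        used[r][c] = false;
    }
}
```

The speed-up comes from the early return: a path is abandoned as soon as its letters are not a prefix of any dictionary word, instead of generating every path and checking it against the word list afterwards.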

Java scanner class

I am trying to create a small search engine that uses the Java Scanner class to read a file, matching user queries to the queried keyword in the file.
However, I have a problem: I need to rank these keywords. If I search for "computer" and the file being searched contains 4 instances of "computer", they will all be displayed on one line, because they are the same.
However, if it returns "the computer shop", this should be ranked lower than just "computer", because I did not search for "the computer shop".
I hope you understand. How can I do this?
Thanks
As far as I understand, your problem is in the search engine logic. In this case the Scanner class is irrelevant; it is just a convenient utility for reading data from a stream.
As for the search engine, please define your input and required output more precisely. Generally, you should look for the best match between your query and the target text. What does that mean? It is very complicated: probably a longer matching character sequence, probably more matching words, etc. People have written hundreds of PhD theses about this and founded thousands of companies (have you heard of Google? :)).
So, unless it is homework, try to use tools like Solr or Lucene. Otherwise, think about the strategies I mentioned above.
Good luck.
A better approach might be to create an inverted index. Instead of going from a file to the words in the file, you do the opposite.
A simple implementation in Java might just be done using Map<String,List<File>>, where the string is the word and the list of files represents the files that contain that word.
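A minimal sketch of that idea, assuming the file contents have already been read into a string and that words are split on non-word characters and compared in lowercase. The InvertedIndex class and its add and lookup methods are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: maps each word to the files that contain it.
class InvertedIndex {
    private final Map<String, List<File>> index = new HashMap<>();

    void add(File file, String contents) {
        for (String token : contents.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<File> files = index.computeIfAbsent(token, k -> new ArrayList<>());
            if (!files.contains(file)) {
                files.add(file); // record each file once per word
            }
        }
    }

    List<File> lookup(String word) {
        return index.getOrDefault(word.toLowerCase(), Collections.emptyList());
    }
}
```

A query then becomes a single map lookup instead of a scan over every file.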

When programming for Android, how can I get a list of almost every English word?

I'm programming something for the Android phone, and I need to have a list of as many English words as possible in the code. I question the practicality of going through a dictionary in real life and punching in the words manually, but when I try to Google for another solution, I really don't see much coming up. How might I be able to get a dictionary in my code (just the words, not definitions)? Thanks!
A quick search led me to this. Download the XML-files, parse and add each word to a SQLite database and you're good to go.
Here is an American wordlist that you can parse into Android
ftp://ftp.ox.ac.uk/pub/wordlists/american/
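Assuming the downloaded list is plain text with one word per line (an assumption; check the actual file format first), loading it could look roughly like this. WordList and load are hypothetical names, and the Reader can wrap whatever stream the file is opened from.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Sketch only: reads a one-word-per-line list into a set for fast membership checks.
class WordList {
    static Set<String> load(Reader source) {
        Set<String> words = new HashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim().toLowerCase();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }
}
```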
Here is another nice resource:
http://wordlist.sourceforge.net/
It seems like you are looking for an extract of a text corpus, such as this: Word frequency lists and dictionary from the Corpus of Contemporary American English
