I'm implementing the following Boggle algorithm:
I want to optimize it because it takes about two and a half minutes to find all the words. Do you have any ideas on optimization techniques?
I've written some Boggle-solving algorithms by creating letter trees which can be traversed to assemble and verify words. You save loads of space by using a tree-based structure in which words share similar letters, meaning you won't have to keep individual copies of each word.
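If it helps, here is a minimal sketch in Java of such a letter tree (a trie); the class and method names are just illustrative, not taken from your code:

    import java.util.HashMap;
    import java.util.Map;

    // A letter-tree (trie) node: words that share a prefix share nodes, so the
    // dictionary takes far less space than storing every word separately.
    class TrieNode {
        private final Map<Character, TrieNode> children = new HashMap<>();
        private boolean isWord;

        void insert(String word) {
            TrieNode node = this;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new TrieNode());
            }
            node.isWord = true;
        }

        // Returns the child for the next letter, or null if no dictionary word
        // continues this way -- lets a Boggle search prune dead branches early.
        TrieNode step(char c) {
            return children.get(c);
        }

        boolean isWord() {
            return isWord;
        }
    }

During the board traversal you carry the current TrieNode along; as soon as step() returns null you can abandon that path entirely, which is usually where the big speedup comes from.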
If you didn't write the program on the website you provided, keep in mind that we won't do your work for you. You have to show us that you've spent considerable time on the problem instead of giving us a program and asking us to optimize it for you. A good first step would be to study the algorithm on the page and fully understand how it works. Or even better, try writing your own Boggle program from scratch to learn which techniques work best for you.
You really want to have a look at this question (and answers) here: How to find list of possible words from a letter matrix [Boggle Solver]
There are solutions in Python, Perl, VB.NET and PHP. Most use tries and optionally prefilter the dictionary using regexes.
Let's say you have a dictionary with 5 strings in it, and you also have multiple files. I want to iterate through those files and see how many times the strings in my dictionary appear in them. How can I do this most efficiently?
I would like this to scale as well, so more than 5 strings and more than a few documents. I'm pretty open about which language I use. Preferably Java or C#, but once again, I can work in another language.
"Most efficient" is always a trade-off between the time you want to put into it and the results you want (or need).
One easy approach that works well is to use a regular expression. With only five strings this is simple to set up and fairly efficient. If that isn't good enough for you, well... you can certainly find a better approach.
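For example, here is a rough Java sketch of the regex idea; the dictionary words and the use of command-line file names are placeholders, not part of your setup:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import java.util.stream.Collectors;

    public class DictionaryCounter {
        public static void main(String[] args) throws IOException {
            // Placeholder dictionary; swap in your own five (or more) strings.
            List<String> dictionary = List.of("alpha", "beta", "gamma", "delta", "epsilon");

            // One alternation over all dictionary strings, quoted so that special
            // characters are matched literally.
            Pattern pattern = Pattern.compile(
                    dictionary.stream().map(Pattern::quote).collect(Collectors.joining("|")));

            Map<String, Integer> counts = new HashMap<>();
            for (String fileName : args) {                    // files passed on the command line
                String text = Files.readString(Paths.get(fileName));
                Matcher m = pattern.matcher(text);
                while (m.find()) {
                    counts.merge(m.group(), 1, Integer::sum); // tally each occurrence
                }
            }
            System.out.println(counts);
        }
    }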
This is a pattern-matching problem. A classic algorithm for this kind of problem is the Knuth-Morris-Pratt algorithm. It is a famous algorithm, so you will find descriptions of it everywhere; it is also covered in the Introduction to Algorithms book.
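For reference, here is a compact Java sketch of Knuth-Morris-Pratt. It finds every occurrence of one pattern in one text; you would still wrap it in a loop over your dictionary strings and files:

    import java.util.ArrayList;
    import java.util.List;

    public class Kmp {
        // Build the failure (prefix) table for the pattern.
        private static int[] failureTable(String pattern) {
            int[] fail = new int[pattern.length()];
            int k = 0;
            for (int i = 1; i < pattern.length(); i++) {
                while (k > 0 && pattern.charAt(k) != pattern.charAt(i)) {
                    k = fail[k - 1];
                }
                if (pattern.charAt(k) == pattern.charAt(i)) {
                    k++;
                }
                fail[i] = k;
            }
            return fail;
        }

        // Return the starting index of every occurrence of pattern in text.
        public static List<Integer> search(String text, String pattern) {
            List<Integer> matches = new ArrayList<>();
            if (pattern.isEmpty()) return matches;
            int[] fail = failureTable(pattern);
            int k = 0;
            for (int i = 0; i < text.length(); i++) {
                while (k > 0 && pattern.charAt(k) != text.charAt(i)) {
                    k = fail[k - 1];
                }
                if (pattern.charAt(k) == text.charAt(i)) {
                    k++;
                }
                if (k == pattern.length()) {
                    matches.add(i - pattern.length() + 1);
                    k = fail[k - 1];
                }
            }
            return matches;
        }

        public static void main(String[] args) {
            System.out.println(search("ababcababcabc", "abc")); // [2, 7, 10]
        }
    }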
I was looking for a lightweight library that would allow me to feed it a bunch of words and then ask it whether a given word has any close matches.
I'm not particularly concerned with the underlying algorithm (I reckon a simple Hamming distance algorithm would probably suffice, were I to undertake the task myself).
I'm developing a small language and found it nifty to make suggestions to the user when an "Undefined class" error is detected (a lot of the time it's just a misspelled word). I don't want to spend much time on the issue, though.
Thanks
Levenshtein distance is a common way of handling it. Just add all the words to a list, brute-force iterate over it, and return the word with the smallest distance. Here's one library with a Levenshtein function: http://commons.apache.org/lang/api-2.4/org/apache/commons/lang/StringUtils.html
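A rough sketch of that brute-force scan, assuming the linked Commons Lang StringUtils class is on the classpath:

    import java.util.List;
    import org.apache.commons.lang.StringUtils;

    public class ClosestWord {
        // Brute-force scan: return the dictionary word with the smallest
        // Levenshtein distance to the misspelled input.
        public static String closest(String input, List<String> dictionary) {
            String best = null;
            int bestDistance = Integer.MAX_VALUE;
            for (String candidate : dictionary) {
                int d = StringUtils.getLevenshteinDistance(input, candidate);
                if (d < bestDistance) {
                    bestDistance = d;
                    best = candidate;
                }
            }
            return best;
        }
    }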
If you have a large number of words and you want it to run fast, then you'd have to use n-grams. Split each word into bigrams and add each (bigram, word) pair to a map. Use the map to look up the bigrams of the target word, and then iterate through the candidates. That's probably more work than you want to do, though.
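If you do end up needing the n-gram route, a minimal bigram index might look like the sketch below (class and method names are just illustrative); you would still run an edit-distance check on the returned candidates.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public class BigramIndex {
        private final Map<String, Set<String>> index = new HashMap<>();

        // Split a word into bigrams and map each bigram back to the word.
        public void add(String word) {
            for (String bigram : bigrams(word)) {
                index.computeIfAbsent(bigram, k -> new HashSet<>()).add(word);
            }
        }

        // Candidate words share at least one bigram with the target; only these
        // need the expensive edit-distance check.
        public Set<String> candidates(String target) {
            Set<String> result = new HashSet<>();
            for (String bigram : bigrams(target)) {
                result.addAll(index.getOrDefault(bigram, Set.of()));
            }
            return result;
        }

        private static List<String> bigrams(String word) {
            List<String> result = new ArrayList<>();
            for (int i = 0; i + 2 <= word.length(); i++) {
                result.add(word.substring(i, i + 2));
            }
            return result;
        }
    }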
Not necessarily a library, but I think this article may be really helpful. It mostly describes the general workings of a spelling corrector in Python, but it also links to a Java implementation, which you may use if that is what you are specifically looking for (note that I haven't used the Java one myself).
I am developing a desktop application using Java. The application is for teaching English to school kids: the user can upload some English audio, which can be in any format and needs to be converted into a text file so that they can read the text.
I've found some APIs, but I am not sure about them.
http://cmusphinx.sourceforge.net/wiki/
I've seen many questions on Stack Overflow regarding this, but none were helpful. If someone can help with this, I will be very grateful.
Thank you.
There are many technologies and services available to perform speech recognition. For an intro to some of the choices see https://stackoverflow.com/a/6351055/90236.
I'm not sure that the results will be acceptable for teaching children English as a second language, but it is worth trying.
What you seek is currently bleeding-edge technology. Tools like CMU Sphinx can detect words from a dedicated, limited dictionary (so you can teach it to understand, say, 15 words and that's it; you can't teach it to understand all of English).
Basically, those tools try to find patterns in the sound waves that you feed them. They don't understand anything; they just run the same algorithm on everything and try to find the closest match. This works well for small sets of words, but as the number of words increases, the difference between them shrinks and the job gets ever harder (without even getting started on words like whether and weather, or C and see).
What you might consider is "repeat after me" software. Here, you need to record all words for the test as templates. Then you can record the words from the pupils and then compute the difference. If the difference is not too large, the word is correct. But again: This is simple repetition to improve pronunciation - not English.
There is desktop software which can understand a lot of English (for example the products from Nuance, Dragon Naturally Speaking being one of the most prominent). They do offer server solutions but that software isn't free or cheap if you're on a tight budget.
I need some sort of solution in Java for the following requirements:
Search a text for certain terms (each term can be 1-3 words). For example: {"hello world", "hello"}. The matches need to be exact.
There are about 500 term groups, each containing about 30 terms.
Each text might contain up to 4,000 words.
Performance is an important issue.
Thanks,
Rod
I have done something similar for a bespoke spam filter.
A technique I found to be both simple and fast is:
Split the input file into words first.
Call intern() on each word, to simplify the comparisons in step 3.
Create a Term class, encapsulating an array of up to three strings. Its equals() method can do pointer comparison on the strings, rather than calling String.equals(). Create a Term instance for each group of 2 or 3 consecutive words in the input.
Use a Multimap (from Google Collections) to map each term to the set of files in which it appears (see the sketch after this list).
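A rough sketch of steps 2-4, assuming the Google Collections Multimap is on the classpath; the class names and the choice to index runs of 1-3 consecutive words (the question allows single-word terms) are my own reading:

    import java.util.Arrays;
    import com.google.common.collect.HashMultimap;
    import com.google.common.collect.Multimap;

    // A term of up to three interned words; equals() can use == because every
    // word has been interned, so identical strings share one instance.
    final class Term {
        private final String[] words;

        Term(String... words) {
            this.words = words;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Term)) return false;
            Term other = (Term) o;
            if (words.length != other.words.length) return false;
            for (int i = 0; i < words.length; i++) {
                if (words[i] != other.words[i]) return false;  // pointer comparison
            }
            return true;
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(words);
        }
    }

    class TermIndex {
        // Maps each term to the files in which it occurs.
        private final Multimap<Term, String> occurrences = HashMultimap.create();

        void record(String fileName, String[] tokens) {
            for (int i = 0; i < tokens.length; i++) {
                tokens[i] = tokens[i].intern();                // step 2: intern each word
            }
            for (int i = 0; i < tokens.length; i++) {
                for (int len = 1; len <= 3 && i + len <= tokens.length; len++) {
                    occurrences.put(new Term(Arrays.copyOfRange(tokens, i, i + len)), fileName);
                }
            }
        }
    }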
Use regular expressions. See: http://java.sun.com/docs/books/tutorial/essential/regex/
There seem to be two parts to this: figuring out a decent algorithm, and implementing it in Java. (For the moment let's put aside the idea that surely someone "out there" has already implemented this, and you can probably find some ideas.)
It seems like we want to avoid repeating expensive work, but it's not clear where the costs would be. So I guess you'll need to be prepared to benchmark a few candidate approaches. Also keep in mind what is "good enough".
Start with the simplest thing you can think of that works. Measure it. You might get the surprising result that it's good enough. Stop right there! For example, this is really dumb:
read text into String (4k, that's not too big)
for each term
use regexp to find matches in text
but it might well give a sub-second response time. Would your users really care if you took a 200ms response down to 100ms? How much would they pay for that?
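Spelled out, a minimal Java sketch of that dumb approach; the \b word boundaries are an assumption about what "exact match" means here:

    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class DumbSearch {
        // The whole text (up to ~4,000 words) easily fits in one String.
        public static int countMatches(String text, List<String> terms) {
            int total = 0;
            for (String term : terms) {
                // Quote the term so it matches literally; \b anchors give whole-word matches.
                Matcher m = Pattern.compile("\\b" + Pattern.quote(term) + "\\b").matcher(text);
                while (m.find()) {
                    total++;
                }
            }
            return total;
        }
    }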
Another approach. I wonder if this is faster?
prepare a collection of terms keyed by first word
tokenize the text
for each token
find terms that match
check for match (using look ahead for multi-word terms)
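A rough Java sketch of that second approach, with each term pre-split into a word array (names are illustrative):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class TokenSearch {
        // Terms grouped by their first word, so each token triggers at most a
        // handful of look-ahead checks.
        public static int countMatches(String[] tokens, List<String[]> terms) {
            Map<String, List<String[]>> byFirstWord = new HashMap<>();
            for (String[] term : terms) {
                byFirstWord.computeIfAbsent(term[0], k -> new ArrayList<>()).add(term);
            }
            int total = 0;
            for (int i = 0; i < tokens.length; i++) {
                for (String[] term : byFirstWord.getOrDefault(tokens[i], List.of())) {
                    if (matchesAt(tokens, i, term)) {
                        total++;
                    }
                }
            }
            return total;
        }

        // Look ahead from position i to see whether the multi-word term matches.
        private static boolean matchesAt(String[] tokens, int i, String[] term) {
            if (i + term.length > tokens.length) return false;
            for (int j = 0; j < term.length; j++) {
                if (!tokens[i + j].equals(term[j])) return false;
            }
            return true;
        }
    }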
As for implementing it in Java: that's a separate problem; ask specific questions if you need to.
Started solving UVa problems again as a way to pass time (I'm going to the army in 6 weeks). I love writing Java, but I end up using C/C++. It's not because of faster I/O, not having to box data, more memory, or the use of unsigned types; it's algorithm efficiency that counts.
In short, I am slowly putting together a how-to/article/code base for different categories of efficient algorithms, and dynamic programming is next.
Quoting Mark Twain: It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so.
I'd appreciate assistance in building a priority list of must-have efficient algorithms.
This MIT lecture is a good introduction to dynamic programming if you're already familiar with algorithms.
The wikipedia article on Dynamic Programming has a section entitled "Algorithms that use dynamic programming" with many examples.
Here is another good list of practice problems in dynamic programming.
Since you referenced the UVa problem list, you should definitely take a look at Problem 103 - Stacking Boxes. The problem lends itself well to a solution using a Longest Increasing Subsequence algorithm.
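Stacking Boxes essentially reduces to a longest increasing subsequence over a suitable ordering of the boxes; as a warm-up, here is a minimal one-dimensional LIS dynamic program in Java:

    import java.util.Arrays;

    public class LongestIncreasingSubsequence {
        // Classic O(n^2) dynamic program: dp[i] is the length of the longest
        // increasing subsequence that ends at index i.
        public static int lengthOfLis(int[] values) {
            if (values.length == 0) return 0;
            int[] dp = new int[values.length];
            Arrays.fill(dp, 1);
            int best = 1;
            for (int i = 1; i < values.length; i++) {
                for (int j = 0; j < i; j++) {
                    if (values[j] < values[i]) {
                        dp[i] = Math.max(dp[i], dp[j] + 1);
                    }
                }
                best = Math.max(best, dp[i]);
            }
            return best;
        }

        public static void main(String[] args) {
            System.out.println(lengthOfLis(new int[]{3, 1, 4, 1, 5, 9, 2, 6})); // 4
        }
    }

For the actual problem you would sort the boxes first and compare them dimension by dimension, but the dp recurrence is the same shape.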