checking words in a dictionary [closed] - java

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I need to determine whether an unknown 5 or 6 letter string is a valid word, i.e. is in the dictionary. I could submit the string to an online dictionary, but the string will be different each time and I need to check about 100 to 150 of them, which seems rather time consuming.
My next thought is to get a dictionary program of my own. It would need to be in Java, as my program is written in Java. Does the Java API already have a class for doing this? Can I get a decent one that someone has already coded, so that all I have to do is submit the string to it?
My program is not being used for spell checking. I want to write a program for unscrambling the Jumbled Word Puzzles when I get stuck on a scrambled word. Thanks for your suggestions.

You could use one of the open source dictionaries and load it into a database: ftp://ftp.cerias.purdue.edu/pub/dict/ and ftp://ftp.ox.ac.uk/pub/wordlists/

For scrambled words, you might want to look at the Jumble algorithm, an implementation of which is seen here.
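One common way to unscramble a word (not necessarily what the linked implementation does) is to index every dictionary word by its letters in sorted order; a scrambled word then shares its key with every word it can be rearranged into. A minimal sketch, where the class name AnagramIndex and its methods are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps "sorted letters" -> all dictionary words with those letters.
class AnagramIndex {
    private final Map<String, List<String>> bySortedLetters = new HashMap<>();

    // "silent" and "listen" both produce the key "eilnst".
    private static String key(String word) {
        char[] letters = word.toLowerCase(Locale.ROOT).toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Call once for every word in the word list.
    void add(String word) {
        bySortedLetters.computeIfAbsent(key(word), k -> new ArrayList<>()).add(word);
    }

    // Returns every known word that uses exactly the letters of the scrambled input.
    List<String> unscramble(String scrambled) {
        return bySortedLetters.getOrDefault(key(scrambled), Collections.emptyList());
    }
}
```

After adding each word from your list with add, a call such as unscramble("tsilen") returns the candidate words, or an empty list if there are none.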

If you don't need spell checking this would be really easy. Just load all your words into a HashSet and then check to see if that set contains the word you want to test. There are tons of word lists available.
If you do need a spell checker, then check out aspell or other free APIs.
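A minimal sketch of the HashSet idea, assuming a plain-text word list with one word per line; the class name WordList and the file you pass in are placeholders, not part of any library:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical wrapper around a HashSet of lower-cased words.
class WordList {
    private final Set<String> words = new HashSet<>();

    // Normalise so that "Apple" and "apple " are treated as the same word.
    private static String normalise(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    void add(String word) {
        words.add(normalise(word));
    }

    // Reads one word per line from the given file (assumed to be UTF-8).
    static WordList fromFile(Path file) throws IOException {
        WordList list = new WordList();
        try (Stream<String> lines = Files.lines(file)) {
            lines.forEach(list::add);
        }
        return list;
    }

    // Average constant-time membership test.
    boolean contains(String candidate) {
        return words.contains(normalise(candidate));
    }
}
```

Load the list once at startup and reuse it; 100 to 150 contains calls against it are effectively instant.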

aspell and its associated word lists and dictionaries might be the answer.

I think aspell has a Java version.
edit: actually it looks like you might do better with this aspell spinoff called Jazzy.

Maybe you can check some wordlist:
http://wordlist.sourceforge.net/
This page has some word lists in plain-text format, so you can process them in Java yourself, most easily using a HashSet. If efficiency becomes important (for example, with very large lists or tight memory), you may need a more specialised data structure.

Maybe you could try Peter Norvig's spelling checker. I think it's an elegant way to get 80-90% accuracy.

Related

JAVA get Plural/Singular for a given String/Word [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm trying to match a word against some hard-coded values. Let's say I have the word
'revenue', but 'revenues'
should also be a match. In the same way,
'liability' should match 'liabilities'.
What approach should we take here? Thanks in advance.
I have tried using my own algorithm, but it is very difficult to maintain a word library with the respective plural or singular of each word.
If you don't want to maintain a full dictionary, you might try to implement some general rules plus a dictionary of exceptions to those rules.
But these are all quick and hacky solutions. Depending on how good it must be, other approaches are also available, such as machine learning or language services offered by cloud providers like AWS or Azure.
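A rough sketch of the "general rules plus exceptions" idea. The rules and the exception entries below are illustrative assumptions only, nowhere near complete for English, and the class name Singularizer is made up:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: reduces a word to a singular form, then compares.
class Singularizer {
    // Words the rules below would get wrong; extend as needed.
    private static final Map<String, String> EXCEPTIONS = new HashMap<>();
    static {
        EXCEPTIONS.put("children", "child");
        EXCEPTIONS.put("people", "person");
        EXCEPTIONS.put("series", "series");
    }

    static String singular(String word) {
        String w = word.trim().toLowerCase(Locale.ROOT);
        String exception = EXCEPTIONS.get(w);
        if (exception != null) {
            return exception;
        }
        if (w.length() > 3 && w.endsWith("ies")) {
            return w.substring(0, w.length() - 3) + "y";   // liabilities -> liability
        }
        if (w.endsWith("sses") || w.endsWith("xes") || w.endsWith("ches") || w.endsWith("shes")) {
            return w.substring(0, w.length() - 2);         // boxes -> box
        }
        if (w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);         // revenues -> revenue
        }
        return w;                                          // already singular (as far as we can tell)
    }

    // Two words match if they reduce to the same singular form.
    static boolean sameWord(String a, String b) {
        return singular(a).equals(singular(b));
    }
}
```

With this, sameWord("revenue", "revenues") and sameWord("liability", "liabilities") are both true; anything the rules mishandle goes into the exceptions map.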
You might want to look at PorterStemmer of lucene. The idea is to compare the stems of both the words instead of comparing singulars and plurals. You can read more about it here.
Here is the maven dependency and below is an example:
PorterStemmer stemmer = new PorterStemmer();
stemmer.setCurrent("liability");
stemmer.stem();
System.out.println(stemmer.getCurrent());
stemmer.setCurrent("liabilities");
stemmer.stem();
System.out.println(stemmer.getCurrent());
The above prints the same stem for both words.

Algorithm for finding a sequence of characters [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'm trying to make a program that:
Accepts a string of characters (the initial string, input 1), i.e. a DNA sequence. The number of characters should be given as an input before the string itself.
Accepts a second number, then receives a second string with that number of characters (input 2).
Finds all the possible 'mutations' that could occur from input 2, and checks whether there are occurrences of these mutations in input 1.
Mutations are the strings that can be made by rearranging input 2. For example, with AGGT, the mutations include GAGT, GGAT, TGGA, AGGT, ATGG, and AGTG.
So a sample run of the program would be like
2
6 4
ATGGAT
AGGT
3
Usually, when you ask a question on this site, you should show us the effort you put into your work and tell us exactly where you think what you did does not work.
I am not giving you an answer, but I'm giving you something better: the opportunity to learn more on the subject. You can learn about what we call "regular expressions" in Java here: http://docs.oracle.com/javase/tutorial/essential/regex/
If you update your question to show us what you did, people might be more inclined to help you. Stack Overflow has a great community of programmers who are willing to help, as long as you show them respect by showing where you got stuck rather than directly asking for answers to this type of problem!
Learning about regex will definitely help you solve this problem. If you build a method using regex matching and you have issues with it, post what you did and what the method was intended to do!
Hope it helps you :)
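As an alternative to regular expressions, here is one reading of the sample run in the question: count the positions in input 1 where a window of the same length as input 2 is a rearrangement of input 2. That interpretation is an assumption, as are the class and method names, and the sketch assumes plain ASCII letters such as A, C, G and T:

```java
import java.util.Arrays;

// Hypothetical helper: sliding-window letter counts over the sequence.
class MutationCounter {
    // Counts windows of sequence that contain exactly the letters of pattern.
    static int countMatches(String sequence, String pattern) {
        int n = sequence.length();
        int m = pattern.length();
        if (m == 0 || m > n) {
            return 0;
        }
        int[] target = new int[128];   // letter counts of the pattern (ASCII assumed)
        int[] window = new int[128];   // letter counts of the current window
        for (int i = 0; i < m; i++) {
            target[pattern.charAt(i)]++;
            window[sequence.charAt(i)]++;
        }
        int matches = Arrays.equals(target, window) ? 1 : 0;
        for (int i = m; i < n; i++) {
            window[sequence.charAt(i)]++;       // letter entering the window
            window[sequence.charAt(i - m)]--;   // letter leaving the window
            if (Arrays.equals(target, window)) {
                matches++;
            }
        }
        return matches;
    }
}
```

For the sample, countMatches("ATGGAT", "AGGT") gives 3, because the windows ATGG, TGGA and GGAT are all rearrangements of AGGT.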

Most efficient way to replace many (5000+) strings in a .txt file [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Using a general-purpose programming language like Java, what is the most efficient way to search through a ~20 page document to replace a set of 5000+ strings with some predetermined replacement string? The program should not replace any strings that have already been replaced. What data structure would be optimal to store the 5000+ strings and each of their replacements - two arrays, a dictionary, or something else?
Here are some of the options that I have considered so far:
Iterate through the entire .txt document once per string using string.replace. The problem is that the algorithm must iterate through the entire .txt document an extra time for each string stored.
Iterate through the .txt once, replacing strings as necessary while building a new string by appending replacements. This seems more efficient, but each step would still require checking the entire set of 5000+ strings for any string to replace.
Is there a more optimized means of solving this problem, or is one of the above attempts already optimal?
Also, would it be possible to run this algorithm more efficiently in a lower-level language like C?
You want to replace 5000+ strings and you want to make it optimal. My question to you is: how will you know whether a string has to be replaced if you don't read the text? It's not possible; you have to read everything. The shortest way to do that is to go line by line and replace immediately. Somebody can correct me if I'm wrong, but reading a file is one of the most basic operations there is, so using a library for it beyond what the programming language provides by default seems like total overkill to me. Furthermore, every language has basic I/O, and if it doesn't, then don't use it.
To store the strings, it all depends on what you want to do with them. Different data structures have different purposes, and some are better suited to certain situations than others. If you just need to store them, a simple array is fine. However, if you need more advanced functions, you need to consider your options. Again, it all comes down to what you want to do with them later.
There is also the memory issue: you need to calculate how much memory your 5000+ strings will take, because you might run out of memory. Then you need to think about whether it's worth using all that memory. check this link
Finally, your question about C: of course it can be more efficient. Java runs in a virtual machine, which adds overhead. So basically your Java program runs inside another program (the JVM), and since every operation has a cost, you can see why C will usually be more efficient than Java in terms of performance.
I would use the commons-lang library, which I think has exactly what you are looking for. Basically you create one array with all the strings you want to substitute and another array with the substitutions. See http://commons.apache.org/proper/commons-lang/javadocs/api-release/index.html for details on the StringUtils#replaceEach method.
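If you would rather not add a dependency, a single pass can also be sketched with the standard library: store the pairs in a Map, build one regular expression that matches any of the search strings, and look up the replacement for each match. This is an illustration under my own assumptions (the class name MultiReplace, and the choice to prefer longer search strings over shorter ones), not a drop-in equivalent of replaceEach:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces many strings in one left-to-right pass.
class MultiReplace {
    static String replaceAll(String text, Map<String, String> replacements) {
        if (replacements.isEmpty()) {
            return text;
        }
        // Longest search strings first, so "abc" wins over its prefix "ab".
        List<String> keys = new ArrayList<>(replacements.keySet());
        keys.sort(Comparator.comparingInt(String::length).reversed());

        StringBuilder alternation = new StringBuilder();
        for (String key : keys) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(key));   // treat search strings literally
        }

        Matcher matcher = Pattern.compile(alternation.toString()).matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String replacement = replacements.get(matcher.group());
            // The matcher continues after the match, so replaced text is never rescanned.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Because the output is built separately from the input, a replacement that happens to contain another search string is not replaced again, which covers the "do not replace already replaced strings" requirement. For example, mapping cat to dog and dog to cat turns "cat and dog" into "dog and cat".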

Why soooo many ways to write a java binary tree? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I'm trying to come up with the simplest way to create a binary tree, so I read a lot of code online from different sources.
How is it that there are so many different ways to create a working binary tree? This is what I don't understand about programming syntax/language. The logic I get, but I believe there should be a set, uniform foundation for creating the same things in the simplest way. Am I wrong?
create tree
create node
create node root
OK, I will try to answer your question.
Let's take a big box. You want to put some things inside it. A few days later, you will have to find something and use it. Someday, maybe you will want to throw something away.
So, what is your approach if you are sure that you will never use any of those objects again? You probably will not care where and how you put each object. That is fast and easy to do.
What if someday you do want to use some of the objects? Will you keep going this way and lose a lot of time searching, or will you try to keep some order inside so you can easily find an object?
It is exactly the same story with binary trees.
Each algorithm has its own properties; for example, a lot of them keep your tree balanced. Before you choose an implementation for your code, think about how you will use it and pick a suitable one.
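To make the "keep some order inside" idea concrete, here is about the simplest ordered variant I can think of: an unbalanced binary search tree of ints. It is only one of the many possible designs, and the class name SimpleTree is made up; balanced variants add rebalancing work on top of this:

```java
// Hypothetical minimal binary search tree: smaller values go left, larger go right.
class SimpleTree {
    private static final class Node {
        final int value;
        Node left;
        Node right;

        Node(int value) {
            this.value = value;
        }
    }

    private Node root;

    // Walks down from the root and attaches the new node at the first free spot.
    void insert(int value) {
        if (root == null) {
            root = new Node(value);
            return;
        }
        Node current = root;
        while (true) {
            if (value < current.value) {
                if (current.left == null) {
                    current.left = new Node(value);
                    return;
                }
                current = current.left;
            } else if (value > current.value) {
                if (current.right == null) {
                    current.right = new Node(value);
                    return;
                }
                current = current.right;
            } else {
                return;   // duplicates are ignored in this sketch
            }
        }
    }

    // The ordering lets each step discard one whole subtree.
    boolean contains(int value) {
        Node current = root;
        while (current != null) {
            if (value == current.value) {
                return true;
            }
            current = value < current.value ? current.left : current.right;
        }
        return false;
    }
}
```

Nothing here keeps the tree balanced, so inserting already sorted values degrades it into a linked list; that trade-off is exactly why other implementations look different.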

Linked Lists - Efficient usage? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
So I've been given a project for school and was just hoping for some reassurance about the decisions I'm making.
My Java skills are extremely basic and need to improve drastically, and soon! I just seem to be getting nowhere atm :P
Anyway, back on topic.
My first task is to create an interface directory that can do the following
Keep in mind this is assessed work therefore please don't provide answers. I have enough time to complete this since I've started early!
My initial approach to this task is to use a linked list. I don't know what you guys think about that? I may be completely wrong, but based on the topics we have covered in school, LinkedList definitely seems suitable. I can add, get and remove.
Cheers for reading guys!
You are along the right lines. The Java Collections Framework (in the java.util package) will contain most of what you need.
I would actually use an ArrayList rather than a LinkedList as it is faster for random access and sorting.
However note that it says you should be able to find people efficiently and look them up by name.
That suggests using something like a TreeMap structure, mapping name to a class containing information on each person. Store the names as "Surname, Forename" and they will be sorted correctly.
That will only allow lookups based on the complete and correct name though. If you want to search for partial names the map is less useful.
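Since this is assessed work, the following is only a bare illustration of how a TreeMap keeps "Surname, Forename" keys sorted and supports exact lookups. The class name Directory, its methods and the use of a plain String for a person's details are placeholders, not a design for the assignment:

```java
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: names are the keys, so TreeMap keeps them sorted.
class Directory {
    // In a real design the value would be a class holding the person's information.
    private final TreeMap<String, String> entries = new TreeMap<>();

    private static String key(String surname, String forename) {
        return surname + ", " + forename;
    }

    void add(String surname, String forename, String details) {
        entries.put(key(surname, forename), details);
    }

    // Exact lookup in O(log n); returns null if the person is not present.
    String lookup(String surname, String forename) {
        return entries.get(key(surname, forename));
    }

    // Keys come back in sorted order without any explicit sort step.
    Set<String> names() {
        return entries.keySet();
    }
}
```

Iterating over names() visits "Adams, Zoe" before "Smith, John" regardless of insertion order, which is the sorted behaviour described above.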
