My program loads two text files: one with a list of English words, and another with jumbled words (really just random strings, though most unscramble to real words). It then determines which words can be made from the jumbled ones and is supposed to print each word with its jumbled version next to it. The problem is that not all of the words that have jumbled equivalents get printed with the jumbled word beside them. I also need the unscrambled word printed to the right of the jumbled one. Here is some example output (commas separate lines, i.e. if two words share a comma they were printed next to each other):
addej,
ahicryrhe hierarchy,
alvan naval,
annaab banana,
baltoc,
braney nearby,
import java.io.*;
import java.util.*;

public class Lab4 {
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            Error();
        }
        BufferedReader jumbledW = new BufferedReader(new FileReader(args[1]));
        BufferedReader words = new BufferedReader(new FileReader(args[0]));
        List<String> jumbledWList = new ArrayList<>();
        List<String> wList = new ArrayList<>();
        long initialTime = System.currentTimeMillis();
        while (jumbledW.ready()) {
            String jumble = jumbledW.readLine();
            jumbledWList.add(jumble);
        }
        Collections.sort(jumbledWList);
        while (words.ready()) {
            String word = words.readLine();
            wList.add(word);
        }
        Collections.sort(wList);
        for (String jumble : jumbledWList) {
            System.out.print(jumble + " ");
            for (String word : wList) {
                if (toConnical(jumble).equals(toConnical(word)))
                    System.out.print(word);
            }
            System.out.println();
        }
        long finalTime = System.currentTimeMillis();
        long time = finalTime - initialTime;
        System.out.println("The time taken for this program to run is " + time / 1000.0 + " seconds");
    }

    private static void Error() {
        System.out.println("\nError: You have to pass the name of the input files on the command line");
        System.exit(0);
    }

    private static String toConnical(String word) {
        char[] arr = word.toCharArray();
        Arrays.sort(arr);
        String connical = new String(arr);
        return connical;
    }
}
I came across this particular question while skimming through old unanswered posts, and frankly it is somewhat unclear what the actual problem is. The way I read the post is this:
A file name is passed to this application on the command line; that file consists of several jumbled character strings. It is not certain whether each line of the file contains a single jumbled word or multiple jumbled words separated by one or more whitespace characters (or perhaps tabs), so either scenario should be covered.
Another file name is also passed on the command line; that file contains valid English words. This file is the Word List: any single jumbled word may correspond to one (or more) of the words in that list once unscrambled (an entry in the jumbled list might not even be jumbled at all).
As the jumbled list is processed, every Word List word whose sorted characters match the sorted characters of the jumbled word is printed to the console window.
The required console output is a comma-delimited string consisting of each jumbled word followed by space-delimited matching words from the Word List:
addej jaded, ahicryrhe hierarchy, alvan alvan naval, annaab banana,...etc
This, however, appears to contradict your comment:
"the program is given a txt file of a great deal of english words youd
find in a dictionary and another txt file with jumbled words such as
cra which could make car or rat. The output i desire is the in
reference to the example of cra would be "car cra" (on one line)."
In that format, the space-delimited Word List words come first and then the jumbled word, with each jumbled word's result on its own console line. Which format is desired? By the way, rat cannot be made from cra.
In reality your code works as expected; however, since you are using a BufferedReader object and a FileReader object, the code needs to be in a try/catch block to handle exceptions such as FileNotFoundException and IOException. This is a requirement and cannot be excluded.
Below is your code slightly modified to accommodate your first desired output format:
try {
    BufferedReader jumbledW = new BufferedReader(new FileReader(args[1]));
    BufferedReader words = new BufferedReader(new FileReader(args[0]));
    List<String> jumbledWList = new ArrayList<>();
    List<String> wList = new ArrayList<>();
    long initialTime = System.currentTimeMillis();
    while (jumbledW.ready()) {
        String jumble = jumbledW.readLine();
        jumbledWList.add(jumble);
    }
    Collections.sort(jumbledWList);
    while (words.ready()) {
        String word = words.readLine();
        wList.add(word);
    }
    Collections.sort(wList);
    String resultString = "";
    for (int i = 0; i < jumbledWList.size(); i++) {
        String jumble = jumbledWList.get(i);
        resultString += jumble + ":";
        for (String word : wList) {
            if (toConnical(jumble).equals(toConnical(word))) {
                resultString += " " + word;
            }
        }
        if (i != (jumbledWList.size() - 1)) { resultString += ", "; }
    }
    System.out.println(resultString);
    long finalTime = System.currentTimeMillis();
    long time = finalTime - initialTime;
    System.out.println("The time taken for this program to run is " + time / 1000.0 + " seconds");
}
catch (FileNotFoundException ex) { ex.printStackTrace(); }
catch (IOException ex) { ex.printStackTrace(); }
The jumbled word list file contained the following jumbled strings:
addej
ahicryrhe
alvan
annaab
baltoc
braney
cra
htis
The console output looked like this after running the above list of jumbled words against my 370,101-word Word List file. It took 0.740 seconds to process on my system:
addej: jaded, ahicryrhe: hierarchy, alvan: alvan naval, annaab: banana, baltoc: cobalt, braney: barney nearby, cra: arc car, htis: hist hits isth shit sith this tshi
All words shown above were in my Word List file.
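As a side note, the inner loop over the entire Word List can be avoided by first grouping words under their canonical (sorted-character) form; a minimal sketch of that idea, reusing the toConnical helper shown above:
    // Sketch only: index the Word List by canonical form once,
    // then each jumbled word becomes a single map lookup.
    Map<String, List<String>> byCanonical = new HashMap<>();
    for (String word : wList) {
        byCanonical.computeIfAbsent(toConnical(word), k -> new ArrayList<>()).add(word);
    }
    for (String jumble : jumbledWList) {
        List<String> matches = byCanonical.getOrDefault(toConnical(jumble), Collections.emptyList());
        System.out.println(jumble + ": " + String.join(" ", matches));
    }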
I have a scrambled String as follows: "artearardreardac".
I have a text file which contains close to 300,000 English dictionary words. I need to find English words in the scrambled String and be able to form a word square as follows:
C A R D
A R E A
R E A R
D A R T
My intention was to initially loop through the scrambled String, querying the text file each time and trying to match 4 characters at a time to see if they form a valid word.
The problem with this is checking against 300,000 words on every loop iteration: it is going to take ages. I looped through only the first letter 16 times and even that took a significant amount of time. The number of possibilities coming from this method seems endless. Even if I dismiss the efficiency concerns for now, I could end up finding English words that do not fit into the square.
My guess is that I have to find the words while maintaining the letter arrangement correctly from the start somehow? I have been at it for hours and gone from fun to frustration. Can I just get some guidance, please? I looked for similar questions but found none.
Note: This is an example and I am trying to keep it open for a longer string or a square of different size. (The example is 4x4. The user can decide to go with a 5x5 square with a string of length 25).
My Code
public static void main(String[] args) {
    String result = wordSquareCreator(4, "artearardreardac");
    System.out.println(result);
}

static String wordSquareCreator(int dimension, String letter) {
    String sortedWord = "";
    String temp;
    int front = 0;
    int firstLetterFront = 0;
    int back = dimension;
    // Looping through the first 4 letters and only changing the first letter 16 times to try a match.
    for (int j = 0; j < letter.length(); j++) {
        String a = letter.substring(firstLetterFront, j + 1) + letter.substring(front + 1, back);
        temp = readFile(dimension, a);
        if (temp != null) {
            sortedWord += temp;
        }
        firstLetterFront++;
    }
    return sortedWord;
}

static String readFile(int dimension, String word) {
    // dict.txt contains about 300,000 English words
    File file = new File("dict.txt");
    BufferedReader reader = null;
    try {
        reader = new BufferedReader(new FileReader(file));
        String text;
        while ((text = reader.readLine()) != null) {
            if (text.length() == dimension) {
                if (text.equals(word)) {
                    // found a valid English word
                    return text;
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            if (reader != null)
                reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return null;
}
You can greatly cut down your search space if you organize your dictionary properly. (Which can be done as you read it in, you don't need to modify the file on disk.)
Break it up into one list per word length, then sort each list.
Now, to reduce your search space: note that singletons (letters with an odd count) can only occur on the diagonal from the top left to the bottom right. You have an odd number of C, T, R and A, so those 4 letters make up this diagonal. (Note that you will not always be able to do this, as they aren't guaranteed to be unique.) Your search space is now one set of 4 with 4! orderings (24 options) and one set of 6 with 6! orderings (720 options, except there are duplicates that cut this down). That is about 17k possible boards and under 1k words (edit: I originally said 5k, but you can restrict the space to words starting with the correct letter, and since it's a sorted list you don't need to consider the others at all) to try, so you're already under 20 million possibilities to examine. You can cut this considerably by first filtering your word list to those that contain only the letters that are used.
At this point an exhaustive search isn't going to be prohibitive.
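A minimal sketch of that organization step (using the dict.txt file name from the question; exception handling omitted):
    // Sketch: read the dictionary once and bucket the words by length.
    Map<Integer, List<String>> byLength = new TreeMap<>();
    try (BufferedReader br = new BufferedReader(new FileReader("dict.txt"))) {
        for (String line; (line = br.readLine()) != null; ) {
            byLength.computeIfAbsent(line.length(), k -> new ArrayList<>()).add(line);
        }
    }
    for (List<String> bucket : byLength.values()) {
        Collections.sort(bucket);   // sorted lists allow prefix/binary search later
    }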
Since it seems that you want to create a word square out of the letters that you take in as a parameter to your function, you know that the word length in your square is sqrt(amountOfLetters). In your example code that would be sqrt(16) = 4. You can also disqualify a lot of words directly from your dictionary:
discard a word if it does not start with a letter in your "alphabet" (i.e. "A", "C", "D", "E", "R", "T")
discard a word if it is not equal to your wordlength (i.e. 4)
discard a word if it has a letter not in your alphabet
The number of words that you want to "write" in your square is wordlength * 2 (since words can only start from the top row or from the left column).
You could actually start by going through your dictionary and copying only the valid words into a new file. Then check your square against this new, much shorter dictionary.
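A rough sketch of that pre-filtering step, assuming the dict.txt file and the example alphabet from the question (exception handling omitted):
    // Sketch: keep only words of the right length whose letters all come from the puzzle's alphabet.
    Set<Character> alphabet = new HashSet<>(Arrays.asList('a', 'c', 'd', 'e', 'r', 't'));
    int wordLength = 4;
    List<String> candidates = new ArrayList<>();
    try (BufferedReader br = new BufferedReader(new FileReader("dict.txt"))) {
        for (String line; (line = br.readLine()) != null; ) {
            if (line.length() != wordLength) continue;
            boolean ok = true;
            for (char ch : line.toCharArray()) {
                if (!alphabet.contains(ch)) { ok = false; break; }
            }
            if (ok) candidates.add(line);
        }
    }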
For building up the square, I think there are 2 possibilities to choose between.
The first is to randomly arrange the letters in the square and then check whether they form correct words.
The second is to choose "correct" words from the dictionary and write them into your square, then check whether the words use the right number and arrangement of letters.
I am working on a Comp Sci assignment. In the end, the program will determine whether a file is written in English or French. Right now, I'm struggling with the method that counts the frequency of words that appears in a .txt file.
I have a set of text files in both English and French, in their respective folders, labeled 1-20. The method asks for a directory (which in this case is "docs/train/eng/" or "docs/train/fre/") and for how many files the program should go through (there are 20 files in each folder). It then reads each file, splits all the words apart (I don't need to worry about capitalization or punctuation), and puts every word in a HashMap along with how many times it appeared in the files. (Key = word, Value = frequency).
This is the code I came up with for the method:
public static HashMap<String, Integer> countWords(String directory, int nFiles) {
    // Declare the HashMap
    HashMap<String, Integer> wordCount = new HashMap();
    // This large 'for' loop will go through each file in the specified directory.
    for (int k = 1; k < nFiles; k++) {
        // Puts together the string that the FileReader will refer to.
        String learn = directory + k + ".txt";
        try {
            FileReader reader = new FileReader(learn);
            BufferedReader br = new BufferedReader(reader);
            // The BufferedReader reads the lines
            String line = br.readLine();
            // Split the line into a String array to loop through
            String[] words = line.split(" ");
            int freq = 0;
            // for loop goes through every word
            for (int i = 0; i < words.length; i++) {
                // Case if the HashMap already contains the key.
                // If so, just increments the value
                if (wordCount.containsKey(words[i])) {
                    wordCount.put(words[i], freq++);
                }
                // Otherwise, puts the word into the HashMap
                else {
                    wordCount.put(words[i], freq++);
                }
            }
            // Catching the file not found error
            // and any other errors
        }
        catch (FileNotFoundException fnfe) {
            System.err.println("File not found.");
        }
        catch (Exception e) {
            System.err.print(e);
        }
    }
    return wordCount;
}
The code compiles. Unfortunately, when I asked it to print the results of all the word counts for the 20 files, it printed this. It's complete gibberish (though the words are definitely there) and is not at all what I need the method to do.
If anyone could help me debug my code, I would greatly appreciate it. I've been at it for ages, conducting test after test and I'm ready to give up.
Let me combine all the good answers here.
1) Split up your methods to handle one thing each. One to read the files into String[], one to process the String[], and one to call the first two.
2) When you split, think carefully about how you want to split. As @m0skit0 suggests, you should likely split on \b for this problem.
3) As @jas suggested, you should first check if your map already has the word. If it does, increment the count; if not, add the word to the map and set its count to 1.
4) To print out the map in the way you likely expect, take a look at the below:
Map<String, Integer> test = new HashMap<>();
for (Map.Entry<String, Integer> entry : test.entrySet()) {
    System.out.println(entry.getKey() + " " + entry.getValue());
}
I would have expected something more like this. Does it make sense?
if (wordCount.containsKey(words[i])) {
    int n = wordCount.get(words[i]);
    wordCount.put(words[i], ++n);
}
// Otherwise, puts the word into the HashMap
else {
    wordCount.put(words[i], 1);
}
If the word is already in the hashmap, we want to get the current count, add 1 to that and replace the word with the new count in the hashmap.
If the word is not yet in the hashmap, we simply put it in the map with a count of 1 to start with. The next time we see the same word we'll up the count to 2, etc.
If you split by space only, then other signs (parenthesis, punctuation marks, etc...) will be included in the words. For example: "This phrase, contains... funny stuff", if you split it by space you get: "This" "phrase," "contains..." "funny" and "stuff".
You can avoid this by splitting by word boundary (\b) instead.
line.split("\\b");
Btw your if and else parts are identical. You're always incrementing freq by one, which doesn't make much sense. If the word is already in the map, you want to get the current frequency, add 1 to it, and update the frequency in the map. If not, you put it in the map with a value of 1.
And pro tip: always print/log the full stacktrace for the exceptions.
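Putting these suggestions together, a rough sketch of what countWords could look like, reading every line of each file, splitting on word boundaries, and merging counts (directory/file naming as in the question):
    public static HashMap<String, Integer> countWords(String directory, int nFiles) {
        HashMap<String, Integer> wordCount = new HashMap<>();
        for (int k = 1; k <= nFiles; k++) {
            String learn = directory + k + ".txt";
            try (BufferedReader br = new BufferedReader(new FileReader(learn))) {
                String line;
                while ((line = br.readLine()) != null) {           // read every line, not just the first
                    for (String word : line.split("\\b")) {        // split on word boundaries
                        if (!word.matches("\\w+")) continue;       // keep only actual word tokens
                        Integer current = wordCount.get(word);
                        wordCount.put(word, current == null ? 1 : current + 1);
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();                               // log the full stack trace
            }
        }
        return wordCount;
    }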
In Java, I have a method that reads in a text file that has all the words in the dictionary, each on their own line.
It reads each line by using a for loop and adds each word to an ArrayList.
I want to get the length of the longest word (String) in the ArrayList. In addition, I want to get the average length of the words in the dictionary file. It would probably be easier to split this into several methods, but I don't know the syntax.
So far, the code I have is:
public class spellCheck {
    static ArrayList<String> dictionary; // the dictionary file

    /**
     * load file
     * @param fileName the file containing the dictionary
     * @throws FileNotFoundException
     */
    public static void loadDictionary(String fileName) throws FileNotFoundException {
        Scanner in = new Scanner(new File(fileName));
        while (in.hasNext())
        {
            for (int i = 0; i < fileName.length(); ++i)
            {
                String dictionaryword = in.nextLine();
                dictionary.add(dictionaryword);
            }
        }
Assuming that each word is on its own line, you should be reading the file more like...
try (Scanner in = new Scanner(new File(fileName))) {
    while (in.hasNextLine()) {
        String dictionaryword = in.nextLine();
        dictionary.add(dictionaryword);
    }
}
Remember, if you open a resource, you are responsible for closing it. See The try-with-resources Statement for more details...
Calculating the metrics can be done after reading the file, but since you're here, you could do something like...
int totalWordLength = 0;
String longest = "";
while (in.hasNextLine()) {
    String dictionaryword = in.nextLine();
    totalWordLength += dictionaryword.length();
    dictionary.add(dictionaryword);
    if (dictionaryword.length() > longest.length()) {
        longest = dictionaryword;
    }
}
int averageLength = Math.round(totalWordLength / (float) dictionary.size());
But you could just as easily loop through the dictionary and use the same idea
(nb- I've used local variables, so you will either want to make them class fields or return them wrapped in some kind of "metrics" class - your choice)
Set up two counters and a variable that holds the current longest word found before you start reading with your while loop. To find the average, have one counter be incremented by one each time a line is read and have the second counter add up the number of characters in each word (the total number of characters, divided by the total number of words read, as given by the total number of lines, is the average word length).
As for the longest word, set it to the empty string or some dummy value like a single character to start. Each time you read a line, compare the current word with the previously found longest word (using the String's .length() method to find its length) and, if it is longer, make it the new longest word found.
Also, if you have all this in a file, I'd use a BufferedReader to read in your input data.
Maybe this could help:
String words = "Rookie never dissappoints, dont trust any Rookie";
// Read your file into a string; once you have that string you can use the code below on it.
String ss[] = words.split(" ");
List<String> list = Arrays.asList(ss);
Map<Integer, String> set = new Hashtable<Integer, String>();
int i = 0;
for (String str : list)
{
    set.put(str.length(), str);
    System.out.println(list.get(i));
    i++;
}
Set<Integer> keys = set.keySet();
System.out.println(keys);
System.out.println(set);
Object j[] = keys.toArray();
Arrays.sort(j);
Object max = j[j.length - 1];
set.get(max);
System.out.println("The longest word is " + set.get(max));
System.out.println("Length is " + max);
Basically I want to create a program which simulates the 'Countdown' game on Channel 4. In effect, a user must input 9 letters and the program will search for the largest word in the dictionary that can be made from those letters. I think a tree structure would be better to go with rather than hash tables. I already have a file which contains the words in the dictionary and will be using file I/O.
This is my file I/O class:
public static void main(String[] args) {
    FileIO reader = new FileIO();
    String[] contents = reader.load("dictionary.txt");
}
This is what I have so far in my Countdown class
public static void main(String[] args) throws IOException {
    Scanner scan = new Scanner(System.in);
    String letters = scan.nextLine();
}
I get totally lost from here. I know this is only the start, but I'm not looking for answers. I just want a small bit of help and maybe a pointer in the right direction. I'm new to Java and found this question in an interview book, so I thought I should give it a go.
Thanks in advance
Welcome to the world of Java :)
The first thing I see is that you have two main methods; you don't actually need that. Your program will have a single entry point in most cases, and from there it does all its logic and handles user input and everything.
You're thinking of a tree structure, which is good, though there might be a better way to store this. Try this: http://en.wikipedia.org/wiki/Trie
What your program has to do is read all the words from the file line by line, and in this process build your data structure, the tree. When that's done you can ask the user for input and after the input is entered you can search the tree.
Since you asked specifically not to provide answers I won't put code here, but feel free to ask if you're unclear about something
There are only about 800,000 words in the English language, so an efficient solution is to store those 800,000 words as 800,000 arrays of 26 one-byte integers that count how many times each letter occurs in the word. For an input of 9 characters, convert it to the same 26-integer count format; a word can then be formed from the query letters if the query vector is greater than or equal to the word vector component-wise. You could easily process on the order of 100 queries per second this way.
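A small sketch of that letter-count comparison, assuming lower-case a-z input:
    // Sketch: count letters into a 26-element vector; a word fits the rack
    // if its counts never exceed the rack's counts.
    static int[] letterCounts(String s) {
        int[] counts = new int[26];
        for (char c : s.toCharArray()) {
            counts[c - 'a']++;
        }
        return counts;
    }

    static boolean canForm(int[] wordCounts, int[] rackCounts) {
        for (int i = 0; i < 26; i++) {
            if (wordCounts[i] > rackCounts[i]) return false;
        }
        return true;
    }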
I would write a program that starts with all the two-letter words, then does the three-letter words, the four-letter words and so on.
When you do the two-letter words, you'll want some way of picking the first letter, then picking the second letter from what remains. You'll probably want to use recursion for this part. Lastly, you'll check it against the dictionary. Try to write it in a way that means you can re-use the same code for the three-letter words.
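For illustration only, a rough sketch of that recursive picking (the dictionary Set here is assumed to have been loaded elsewhere, e.g. from dictionary.txt):
    // Sketch: recursively pick letters from what remains, checking each
    // complete candidate of the target length against the dictionary.
    static void findWords(String chosen, String remaining, int targetLength,
                          Set<String> dictionary, Set<String> results) {
        if (chosen.length() == targetLength) {
            if (dictionary.contains(chosen)) {
                results.add(chosen);
            }
            return;
        }
        for (int i = 0; i < remaining.length(); i++) {
            findWords(chosen + remaining.charAt(i),
                      remaining.substring(0, i) + remaining.substring(i + 1),
                      targetLength, dictionary, results);
        }
    }
Calling it with target lengths 2, 3, 4, and so on up to 9 reuses the same code for every word length.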
I believe the power of regular expressions would come in handy in your case:
1) Create a regular expression with a character class like ^[abcdefghi]*$ with your letters inside instead of "abcdefghi".
2) Use that regular expression as a filter to get a strings array from your text file.
3) Sort it by length. The longest word is what you need!
Check the Regular Expressions Reference for more information.
UPD: Here is a good Java Regex Tutorial.
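For illustration, a brief sketch of steps 1-3 in Java (note that a character class alone lets a letter be reused more often than it was entered, so an extra letter-count check may still be needed; dictionary.txt is the file from the question):
    // Sketch: keep dictionary words built only from the entered letters, longest first.
    Pattern allowed = Pattern.compile("^[abcdefghi]*$");   // replace with the user's 9 letters
    List<String> matches = new ArrayList<>();
    try (BufferedReader br = new BufferedReader(new FileReader("dictionary.txt"))) {
        for (String line; (line = br.readLine()) != null; ) {
            if (allowed.matcher(line).matches()) {
                matches.add(line);
            }
        }
    }
    matches.sort(Comparator.comparingInt(String::length).reversed());
    // if non-empty, matches.get(0) is (one of) the longest candidate word(s)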
A first approach could be using a tree with all the letters present in the wordlist.
If one node is the end of a word, then it is marked as an end-of-word node.
In an example tree built from the words ball, ban, banal, and banana, the longest word is banana, but the other words (ball, ban, banal) share the same branches.
So, a node must have:
A character
Whether it is the end of a word
A list of children. (max 26)
The insertion algorithm is very simple: In each step we "cut" the first character of the word until the word has no more characters.
public class TreeNode {

    public char c;
    private boolean isEndOfWord = false;
    private TreeNode[] children = new TreeNode[26];

    public TreeNode(char c) {
        this.c = c;
    }

    public void put(String s) {
        if (s.isEmpty())
        {
            this.isEndOfWord = true;
            return;
        }
        char first = s.charAt(0);
        int pos = position(first);
        if (this.children[pos] == null)
            this.children[pos] = new TreeNode(first);
        this.children[pos].put(s.substring(1));
    }

    public String search(char[] letters) {
        String word = "";
        String w = "";
        for (int i = 0; i < letters.length; i++)
        {
            TreeNode child = children[position(letters[i])];
            if (child != null)
                w = child.search(letters);
            // this is not efficient. It should be optimized.
            if (w.contains("%")
                    && w.substring(0, w.lastIndexOf("%")).length() > word.length())
                word = w;
        }
        // if a node is end-of-word we add the special char '%'
        return c + (this.isEndOfWord ? "%" : "") + word;
    }

    // if 'a' returns 0, if 'b' returns 1... etc
    public static int position(char c) {
        return ((byte) c) - 97;
    }
}
Example:
public static void main(String[] args) {
    // root
    TreeNode t = new TreeNode('R');
    // for skipping words with "'" in the wordlist
    Pattern p = Pattern.compile(".*\\W+.*");
    int nw = 0;
    try (BufferedReader br = new BufferedReader(new FileReader(
            "files/wordsEn.txt")))
    {
        for (String line; (line = br.readLine()) != null;)
        {
            if (p.matcher(line).find())
                continue;
            t.put(line);
            nw++;
        }
        // line is not visible here.
        br.close();
        System.out.println("number of words : " + nw);
        String res = null;
        // substring(1) because of the root
        res = t.search("vuetsrcanoli".toCharArray()).substring(1);
        System.out.println(res.replace("%", ""));
    }
    catch (Exception e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
Output:
number of words : 109563
counterrevolutionaries
Notes:
The wordlist is taken from here
the reading part is based on another SO question : How to read a large text file line by line using Java?
I'm having some difficulty getting this code to generate a number of permutations (orderings) of a string of words separated by commas. I can handle a regular string and generate permutations of just its letters, but it is a bit more difficult when the items are words separated by commas.
To have the program recognize the commas I used a StringTokenizer and put the tokens into an ArrayList, but that is really as far as I have gotten. The problem, again, is that I'm having trouble permuting the words. I'll give an example below, and then my code below that. Thank you for your help, everyone! (By permutations I mean orderings of the words separated by the commas.)
For example, if the input coming in on the BufferedReader looked like:
red,yellow
one,two,three
the output on the PrintWriter should look like:
red,yellow
yellow,red
one,two,three
one,three,two
two,one,three
two,three,one
three,one,two
three,two,one
Note that the input had 3 lines total, including the blank line after "one,two,three" while the output had 11 lines total, including one blank line after "yellow,red" and two blank lines after "three,two,one". It is vital that you get the format exactly correct as the testing will be automated and will require this format. Also note that the order of the output lines for each problem does not matter. This means the first two lines of the output could also have been:
yellow,red
red,yellow
Here is the code I have so far. I have commented some stuff out, so don't worry about those parts.
import java.io.*;
import java.util.*;

public class Solution
{
    public static void run(BufferedReader in, PrintWriter out)
            throws IOException
    {
        String str = new String(in.readLine());
        while (!str.equalsIgnoreCase(""))
        {
            PermutationGenerator generator = new PermutationGenerator(str);
            ArrayList<String> permutations = generator.getPermutations();
            for (String str : permutations)
            {
                out.println(in.readLine());
            }
            out.println();
            out.println();
        }
        out.flush();
    }

    public class PermutationGenerator
    {
        private String word;

        public PermutationGenerator(String aWord)
        {
            word = aWord;
        }

        public ArrayList<String> getPermutations()
        {
            ArrayList<String> permutations = new ArrayList<String>();
            //if(word.length() == 0)
            //{
            //permutations.add(word);
            //return permutations;
            //}
            StringTokenizer tokenizer = new StringTokenizer(word, ",");
            while (tokenizer.hasMoreTokens())
            {
                permutations.add(word);
                tokenizer.nextToken();
            }
            /*
            for(int i = 0; i < word.length(); i++)
            {
                //String shorterWord = word.substring(0,i) + word.substring(i + 1);
                PermutationGenerator shorterPermutationGenerator = new PermutationGenerator(word);
                ArrayList<String> shorterWordPermutations =
                    shorterPermutationGenerator.getPermutations();
                for(String s: shorterWordPermutations)
                {
                    permutations.add(word.readLine(i)+ s);
                }
            }*/
            //return permutations;
        }
    }
}
You can use String.split() ( http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#split(java.lang.String) ) to get the individual words into an array. You can separately generate all the permutations of the integers {1..N}, where N is the size of the word array. Then just walk the word array using the numeric permutations as indices.
Parse your input line (which is a comma-separated String of words) into an array of Strings (String[] words).
Use some permutation generator that works on an array; you can easily find such a generator using Google. You want a generator that can be initialized with Object[] and has a method like Object[] nextPermutation().
Put it together into your solution.
PS: You can also use an Integer permutation generator and generate all permutations of 0 to (words.length - 1); each such permutation gives you an array of indexes into words[] to be printed out.
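For illustration, a minimal sketch of one such generator that permutes the split words in place and prints each ordering re-joined with commas (the method names here are illustrative, not from the question):
    // Sketch: permute the words of one comma-separated line and print each ordering.
    static void permuteLine(String line, PrintWriter out) {
        permute(line.split(","), 0, out);
    }

    static void permute(String[] words, int start, PrintWriter out) {
        if (start == words.length - 1) {
            out.println(String.join(",", words));   // one complete ordering
            return;
        }
        for (int i = start; i < words.length; i++) {
            swap(words, start, i);                  // fix one word in this position
            permute(words, start + 1, out);         // permute the rest
            swap(words, start, i);                  // undo for the next choice
        }
    }

    static void swap(String[] a, int i, int j) {
        String tmp = a[i]; a[i] = a[j]; a[j] = tmp;
    }
After each input line is processed this way, the required blank lines can simply be printed with out.println().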