Searching through a Hashset for words within a file - java

I've got a HashSet with my dictionary of words in it.
What I'm trying to do is to individually scan words from the file checkMe to see whether they exist in my HashSet.
When a word doesn't exist, I need to trigger a number of actions (which I won't get into).
For now, I'd like some advice as to how I take words from my scanned file and check them against my HashSet.
Something like:
if (dicSet does not contain a word in checkMe) {
da da da
}
Also, I want to be able to loop through checkMe to make sure each word is checked through dicSet until I reach an error.
My code so far:
import java.util.*;
import java.io.*;
public class spelling{
public static void main(String args[]) throws FileNotFoundException {
//read the dictionary file
Scanner dicIN = new Scanner(new File("dictionary.txt"));
//read the spell check file
Scanner spellCheckFile = new Scanner(new File("checkMe.txt"));
//create Hashset
Set <String> dicSet = new HashSet<String>();
//Scan from spell check file
Scanner checkMe = new Scanner(spellCheckFile);
//Loop through dictionary and store them into set. set all chars to lower case just in case because java is case sensitive
while(dicIN.hasNext())
{
String dicWord = dicIN.next();
dicSet.add(dicWord.toLowerCase());
}
//make comparisons for words in spell check file with dictionary
if(dicSet){
}
// System.out.println(dicSet);
}
}

while(checkMe.hasNext())
{
String checkWord = checkMe.next();
if (!dicSet.contains(checkWord.toLowerCase())) {
// found a word that is not in the dictionary
}
}
That's the basic idea, at least. For real use, you'd have to add a lot of error checking and edge-case handling (what if your input contains numbers? What about punctuation such as '.', '-', etc.?)
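One way to handle the punctuation and number cases mentioned above is to normalize each token before the lookup. The following is only a sketch under assumptions: the class name SpellCheckSketch and its helper methods are hypothetical, the scanners read from in-memory strings instead of dictionary.txt and checkMe.txt so the example stands alone, and the rule "keep only lowercase letters and apostrophes" is just one possible choice.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

// Hypothetical helper class, not part of the original question.
class SpellCheckSketch {

    // Lower-cases a token and strips everything except a-z and apostrophes.
    static String normalize(String token) {
        return token.toLowerCase().replaceAll("[^a-z']", "");
    }

    // Loads whitespace-separated dictionary words into a set, lower-cased.
    static Set<String> loadDictionary(Scanner in) {
        Set<String> words = new HashSet<>();
        while (in.hasNext()) {
            words.add(in.next().toLowerCase());
        }
        return words;
    }

    // Returns every word of the text that is missing from the dictionary,
    // in order of appearance.
    static List<String> findUnknownWords(Set<String> dictionary, Scanner text) {
        List<String> unknown = new ArrayList<>();
        while (text.hasNext()) {
            String word = normalize(text.next());
            if (word.isEmpty()) {
                continue; // the token was only digits or punctuation
            }
            if (!dictionary.contains(word)) {
                unknown.add(word);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        // In-memory stand-ins for dictionary.txt and checkMe.txt.
        Set<String> dictionary = loadDictionary(new Scanner("hello world this is a test"));
        Scanner text = new Scanner("Hello, wrold! This is a tset. 42");
        System.out.println(findUnknownWords(dictionary, text)); // prints [wrold, tset]
    }
}
```

To stop at the first unknown word instead of collecting all of them, the loop could return (or break) as soon as contains() reports a miss.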

Related

Converting a string that contains multiple words to a vector of words

I have an InputStream file, I have to put all the words from that file into a vector of strings.
I tried multiple things to convert the InputStream file to where I can read all the words in it, but no matter what I always end up with a long string with all the words.
How can I separate all the words in the file so that I can put them in a vector of strings?
Here is my code for the conversion from the InputStream file to a string:
public static InputStream vocabDoc = Librarian.class.getClassLoader().getResourceAsStream("Vocabulary.txt");
String str = new Scanner(vocabDoc,"UTF-8").useDelimiter("\\A").next();
System.out.println(str);
This is what the file "vocabDoc" contains (exactly):
file
vocabulary
test
is
one
this
for
If I try to put it in a vector, it always comes back as:
[file
vocabulary
test
is
one
this
for
]
If I take out the "\n", it comes out as [filevocabularytestisonethisfor]. My goal is to have something like [file, vocabulary, test, is, one, this, for] instead.
I'm not sure where to go from here and would really appreciate some help.
For the expected output, simply read the file without setting any explicit delimiter; by default, Scanner splits its input on whitespace. Using Scanner#hasNext, you can test whether the file has more words to read.
Demo:
import java.io.InputStream;
import java.util.Scanner;
import java.util.Vector;
public class Main {
public static void main(String[] args) {
InputStream vocabDoc = Main.class.getClassLoader().getResourceAsStream("Vocabulary.txt");
Scanner scanner = new Scanner(vocabDoc);
Vector<String> vector = new Vector<>();
while (scanner.hasNext()) {
vector.add(scanner.next());
}
scanner.close();
System.out.println(vector);
}
}
Output:
[file, vocabulary, test, is, one, this, for]

Print Total Number of Different Words (case sensitive) from a file

Edit after reviewing Tormod's answer and implementing his advice:
As the title states I'm attempting to print the total number of different words after receiving a file name from command line input. I receive the following message after attempting to compile the program:
Note: Project.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Here is my code. Any help is greatly appreciated:
import java.lang.*;
import java.util.*;
import java.io.*;
public class Project {
public static void main(String[] args) throws IOException {
File file = new File(args[0]);
Scanner s = new Scanner(file);
HashSet lib = new HashSet<>();
try (Scanner sc = new Scanner(new FileInputStream(file))) {
int count = 0;
while(sc.hasNext()) {
sc.next();
count++;
}
System.out.println("The total number of word in the file is: " + count);
}
while (s.hasNext()) {
String data = s.nextLine();
String[] pieces = data.split("\\s+");
for (int count = 0; count < pieces.length; count++)
{
if(!lib.contains(pieces[count])) {
lib.add(pieces[count]);
}
}
}
System.out.print(lib.size());
}
}
I would implement it using a HashSet: add all the words, then read out the size. If you want to make it case-insensitive, just convert all the words to uppercase (or lowercase) before adding them. This uses some memory, but...
One problem with your algorithm is that you only have a single "words", which only holds the words from the current line, so you only detect repeated words within the same line.
A HashSet locates its elements by their hash value and compares them with equals(), so it stores each word only once.
Construction: HashSet<String> lib = new HashSet<>();
Inside the loop: if(!lib.contains(word)){lib.add(word);}
Check the word count: lib.size()
for(String s : words) {
if(s.equals(word))
count++;
}
You are comparing the words to an empty String; since each of them is a word, the comparison is always going to be false.
Like Tormod said, the best approach would be to store the words in a HashSet, as it won't keep duplicates. Then just read out its size.
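The HashSet approach described in these answers can be sketched as follows. The "unchecked or unsafe operations" note in the question comes from declaring the set with the raw type HashSet; giving it a type parameter, as below, avoids the warning. The class and method names are hypothetical, and the scanner reads from a string here instead of the file named in args[0].

```java
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

// Hypothetical class name; a sketch rather than the asker's final program.
class DistinctWordsSketch {

    // Counts the distinct whitespace-separated words (case-sensitive).
    static int countDistinct(Scanner in) {
        Set<String> seen = new HashSet<>(); // typed set: no unchecked warning
        while (in.hasNext()) {
            seen.add(in.next()); // add() ignores words that are already present
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // "the" and "The" count as different words because the comparison is case-sensitive.
        System.out.println(countDistinct(new Scanner("the cat saw The cat\nthe end"))); // prints 5
    }
}
```

Because add() already skips duplicates, the contains() check before it is optional.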

Java - Parsing CSV into ArrayList (Need to recognize line breaks)

Preface: This is for an assignment in one of my classes.
I need to parse through a CSV file and add each string to an ArrayList so I can interact with each string individually with pre-coded functions.
My problem is that the final string in each line (which doesn't end with a comma) is combined with the first string in the next line and recognized as being at the same index in the ArrayList. I need to learn how to either add a line break or do something else that will stop my loop at the end of each line and read the next line separately. Perhaps there is a built-in method in the scanner class that I'm unaware of that does this for me? Help is appreciated!
Here is the information in the CSV file:
Fname,Lname,CompanyName,Balance,InterestRate,AccountInception
Sally,Sellers,Seashells Down By The Seashore,100.36,3,7/16/2002
Michael,Jordan,Jordan Inc.,1000000000,3,6/12/1998
Ignatio,Freely,Penultimate Designs,2300.76,2.4,3/13/1991
Here is my code so far
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Scanner;
public class InterestCalculator {
public static void main(String[] args) throws IOException {
Scanner scanner = new Scanner(new File("smalltestdata-sallysellers.csv"));
// Chomp off at each new line, then add to array or arraylist
scanner.useDelimiter("\n");
ArrayList<String> data = new ArrayList<String>();
while (scanner.hasNext()) {
// Grab data between commas to add to ArrayList
scanner.useDelimiter(",");
// Add grabbed data to ArrayList
data.add(scanner.next());
}
System.out.println(data.get(10));
scanner.close();
}
}
And here is the output
7/16/2002
Michael
It seems like you just need to do...
String s[] = scanner.nextLine().split(",");
Collections.addAll(data, s);
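Put into a full loop, the suggestion above might look like the sketch below. It assumes the loop condition also changes from hasNext() to hasNextLine() and that no delimiter is set on the scanner. The class name CsvLinesSketch is hypothetical, the data is read from a string instead of the CSV file, and a plain split(",") does not handle quoted fields that contain commas.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;

// Hypothetical class name; a sketch of the nextLine()/split(",") idea.
class CsvLinesSketch {

    // Reads the input line by line and appends every comma-separated field to one flat list.
    static List<String> readFields(Scanner in) {
        List<String> data = new ArrayList<>();
        while (in.hasNextLine()) {
            String line = in.nextLine();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            Collections.addAll(data, line.split(","));
        }
        return data;
    }

    public static void main(String[] args) {
        String csv = "Fname,Lname,CompanyName,Balance,InterestRate,AccountInception\n"
                + "Sally,Sellers,Seashells Down By The Seashore,100.36,3,7/16/2002\n"
                + "Michael,Jordan,Jordan Inc.,1000000000,3,6/12/1998\n";
        List<String> data = readFields(new Scanner(csv));
        System.out.println(data.get(11)); // prints 7/16/2002
        System.out.println(data.get(12)); // prints Michael
    }
}
```

With six fields per row, the last field of one line and the first field of the next now land at separate indexes.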

Manipulating a text file in Java?

I'm trying to make a program that reads in an external .txt file and manipulates it. The file has 5 different groups of data, 4 lines each (2 are int, 2 are String). I need to read in the file using the Scanner class and make an object to hold each group of data (that is, write a class which stores the data group as a single object; let's call it ProgramData). Then I need to create a ProgramData object and put that into an ArrayList, and repeat for each of the 5 groups.
I have a text file, and I read it in with the Scanner (I confirmed that I did this right through printing on the command line). I'm completely lost from there. Any help at all would be greatly appreciated.
Not like this will help, but here's my code so far:
import java.io.*;
import java.util.ArrayList;
import java.util.Scanner;
public class Project1
{
public static void main (String[] args) throws IOException
{
File dataFile = new File("C:\\Users/data.txt");
Scanner fileReader = new Scanner(dataFile);
int firstLine = fileReader.nextInt();
int secondLine = fileReader.nextInt();
String whiteSpace = fileReader.nextLine();
String thirdLine = fileReader.nextLine();
String fourthLine = fileReader.nextLine();
ArrayList<String> newArray = new ArrayList<String>();
}
}
Make sure that when you read the input file, you use the Scanner class's hasNext() method. It reports whether there is still another token left to read, so you don't run past the end of the file. Use it like so:
// get file input
// this will make sure there are still lines left within the
// file and that you have not reached the end of the file
while(fileReader.hasNext()) {
int firstLine = fileReader.nextInt();
int secondLine = fileReader.nextInt();
String whiteSpace = fileReader.nextLine();
String thirdLine = fileReader.nextLine();
String fourthLine = fileReader.nextLine();
}
You need to build on the code provided above to do the operations you are looking for.
Here are the steps you can follow:
1. Create a class named ProgramData.
2. Make a constructor which will accept your group data. --> What is a constructor
3. Now, in the Project1 class, read the file properly. --> Scanner Tutorial and Reading a txt file using scanner java
4. Once you have all of the first group's data from the file, pass it to the ProgramData class and create an instance, something like
ProgramData pd1 = new ProgramData (/* list of parameter */);
5. Add that ProgramData instance to an ArrayList like below
// Define ArrayList
ArrayList<ProgramData> list = new ArrayList<ProgramData>();
// Do some operation like reading or collecting the data and creating object
// shown in step 4
list.add(pd1); // add new object of group to list.
I hope this will help you to achieve your goal. If you have any question just ask. Good luck
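The steps in this answer could be combined roughly as in the sketch below. The field names (first, second, third, fourth), the readGroups helper and the outer class name are all hypothetical, and the layout of each group (two ints, then two lines of text) is taken from the question's code. The scanner reads from a string here instead of data.txt so the sketch stands alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical outer class; ProgramData would normally live in its own file.
class ProgramDataSketch {

    // Holds one group of data: two ints followed by two lines of text.
    static class ProgramData {
        final int first;
        final int second;
        final String third;
        final String fourth;

        ProgramData(int first, int second, String third, String fourth) {
            this.first = first;
            this.second = second;
            this.third = third;
            this.fourth = fourth;
        }
    }

    // Reads groups until no further int (the start of a new group) is available.
    static List<ProgramData> readGroups(Scanner in) {
        List<ProgramData> list = new ArrayList<>();
        while (in.hasNextInt()) {
            int first = in.nextInt();
            int second = in.nextInt();
            in.nextLine(); // consume the rest of the line after the second int
            String third = in.nextLine();
            String fourth = in.nextLine();
            list.add(new ProgramData(first, second, third, fourth));
        }
        return list;
    }

    public static void main(String[] args) {
        String sample = "1\n2\nalpha\nbeta\n3\n4\ngamma\ndelta\n";
        List<ProgramData> groups = readGroups(new Scanner(sample));
        // prints: 2 groups; last text line: delta
        System.out.println(groups.size() + " groups; last text line: " + groups.get(1).fourth);
    }
}
```

Using hasNextInt() as the loop condition stops cleanly on a trailing blank line, whereas hasNext() would only check that some token remains.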

Need some help on managing Strings

Don't worry about the hash table, guys; just give me some idea of how to manage Strings.
I need to spell-check a word entered by the user against a dictionary using hash tables. I have a method named checkDictionary() from the hash table that checks whether the given word is present in the dictionary. It returns true if the word is present and false if not.
When the word is misspelled, I want to generate some possible corrections and look each of them up in the dictionary.
Possible corrections:
Change one letter: For example, if the misspelled word is “kest”, I want to try all possibilities of changing one character at a time, and look the modified word up in the dictionary. The possibilities will be “aest”, “best”, ..., “zest”, “kast”, ..., “kzst”, etc.
---How can I change a single character at a time, trying each letter from a to z?
Exchange adjacent letters: For example, if the misspelled word is “ebst”, try “best”, “esbt” and “ebts”.
---How can I exchange the adjacent letters? Do I need to swap them or something?
Remove one letter: For example, if the misspelled word is “tbird”, try all possibilities of removing one letter at a time, and look the modified word up in the dictionary, which are: “bird”, “tird”, “tbrd”, “tbid”, and “tbir”.
---How can I remove each letter in turn?
Please remember that the word entered may be of any length.
I need to return these suggestions to the user after checking the words in the dictionary.
Are there any methods in String that I can use to implement these functions?
Please help in implementing the above methods: Change, Exchange and Remove.
import java.util.*;
import java.io.*;
public class HashTableDemo
{
public static void main(String [] args)
{
// constructs a new empty hashtable with default initial capacity
HashTable hashtable = new HashTable();
Scanner keyboard = null;
Scanner input=null;
try
{
System.out.println("Enter a word to check in dictionary");
keyboard = new Scanner(System.in);
String word = (keyboard.nextLine().toUpperCase());
//Adding all dictionary words from a text file to hash table.
input=new Scanner(new FileInputStream("TWL.txt"));
int i=1;
// adding value into hashtable
while(input.hasNextLine())
{
String hello = input.nextLine();
hashtable.put( hello, new Integer(i) );
i++;
}
if(hashtable.checkDictionary(word))
System.out.println("The word "+word+" is there in the dictionary.");
else
System.out.println("The word "+word+" is not there in the dictionary.");
}//try
//Here I need to implement the required methods if the word is not in dictionary and misspelled.
catch(FileNotFoundException e)
{
System.out.println("Cannot open file");
System.exit(0);
}//end catch
}//end main
}//end class
There is no simple solution for what you're trying to accomplish. A good mathematical concept you could use for spell-checking is called edit distance; you should definitely read a bit of the theory before attempting to write any code.
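That said, the three operations listed in the question can each be written with substring() and a char array. Changing or removing one letter produces words at edit distance 1 from the input, and an adjacent swap counts as distance 1 when transpositions are allowed. The sketch below is only an illustration: the class and method names are hypothetical, and it assumes lowercase words. The question's code upper-cases its input, so there the letter range would be 'A' to 'Z'. Each candidate would then be passed to checkDictionary().

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical class; generates correction candidates for a misspelled word.
class SpellingCandidatesSketch {

    // Change one letter: replace each position in turn with every other letter a-z.
    static Set<String> changeOneLetter(String word) {
        Set<String> out = new LinkedHashSet<>();
        for (int i = 0; i < word.length(); i++) {
            for (char c = 'a'; c <= 'z'; c++) {
                if (c != word.charAt(i)) {
                    out.add(word.substring(0, i) + c + word.substring(i + 1));
                }
            }
        }
        return out;
    }

    // Exchange adjacent letters: swap positions i and i + 1 for every i.
    static Set<String> swapAdjacent(String word) {
        Set<String> out = new LinkedHashSet<>();
        for (int i = 0; i + 1 < word.length(); i++) {
            char[] chars = word.toCharArray();
            char tmp = chars[i];
            chars[i] = chars[i + 1];
            chars[i + 1] = tmp;
            String swapped = new String(chars);
            if (!swapped.equals(word)) { // swapping two equal letters changes nothing
                out.add(swapped);
            }
        }
        return out;
    }

    // Remove one letter: drop each position in turn.
    static Set<String> removeOneLetter(String word) {
        Set<String> out = new LinkedHashSet<>();
        for (int i = 0; i < word.length(); i++) {
            out.add(word.substring(0, i) + word.substring(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(swapAdjacent("ebst"));     // prints [best, esbt, ebts]
        System.out.println(removeOneLetter("tbird")); // prints [bird, tird, tbrd, tbid, tbir]
        System.out.println(changeOneLetter("kest").contains("best")); // prints true
    }
}
```

A LinkedHashSet keeps the candidates in generation order and drops duplicates, which can occur when a word contains repeated letters.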
