Java - find matching string from .text file


I want to find a matching string in a .text file.
With this code I can only get a match on the first line of the text file; it does not seem to reach the other lines.
Scanner scanner = new Scanner(file);
while (scanner.hasNextLine()) {
System.out.println("Inside next line");
String line = scanner.nextLine();
System.out.println(line);
String[] tokenarray = line.split(":");
if (tokenarray[0].equals(id)) {
System.out.println("match found");
System.out.println(tokenarray[0]);
customer = new Customer(tokenarray[0], tokenarray[1],
tokenarray[2], tokenarray[3], tokenarray[4],
tokenarray[5], tokenarray[6], tokenarray[7],
tokenarray[8]);
break;
}
}
This code works only when the id I enter equals tokenarray[0] of the first line of the document. I want to search the whole text document, not only the first line.

It seems that removing the break will solve your problem.

String line = "hello:world:hello:up";
String[] tokenarray = line.split(":");
for (String s : tokenarray) {
System.out.print((s.contains("hello") ? "match" : "no match"));
System.out.print(", ");
}
output:
match, no match, match, no match,
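For comparison, here is a minimal sketch of searching every line for an id, assuming the colon-separated layout from the question. The class name `IdSearch`, the method `findRecord`, and the sample data are made up for illustration, and the method returns the split fields instead of building a `Customer`, whose constructor is not shown in the question. The first field is trimmed before comparing, on the assumption that stray whitespace around the id is one possible reason `equals` fails on later lines.

```java
import java.util.Arrays;
import java.util.Scanner;

class IdSearch {
    // Returns the fields of the first line whose first field equals id,
    // or null when no line matches. All names here are illustrative only.
    static String[] findRecord(Scanner scanner, String id) {
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            String[] fields = line.split(":");
            // Trim in case the id is surrounded by stray whitespace.
            if (fields[0].trim().equals(id)) {
                return fields; // stop at the first match
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the text file.
        String data = "101:Ann:Oslo\n102:Bob:Rome\n103:Cy:Lima\n";
        String[] match = findRecord(new Scanner(data), "102");
        System.out.println(match == null ? "no match" : Arrays.toString(match));
    }
}
```

To read from a real file, the same method can be called with `new Scanner(file)` as in the question.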

Related

Remove stop words from file - going over it multiple times causes content duplication and does not remove the words

I am trying to go over a bunch of files, read each of them, and remove all stop words that appear in a given list. The result is a disaster: the content of the whole file is copied over and over again.
What I tried:
- Saving the file as a String and searching it with a regex
- Saving the file as a String, going over it line by line, and comparing tokens to the stop words, which are stored in a LinkedHashSet (I can also store them in a file)
- Twisting the logic below in multiple ways, getting more and more ridiculous output
- Looking into the text / line with the .contains() method, but no luck
My general logic is as follows:
for every word in the stopwords set:
while(file has more lines):
save current line into String
while (current line has more tokens):
assign current token into String
compare token with current stopword:
if(token equals stopword):
write in the output file "" + " "
else: write in the output file the token as is
Tried what's in this question and many other SO questions, but just can't achieve what I need.
Real code below:
private static void removeStopWords(File fileIn) throws IOException {
File stopWordsTXT = new File("stopwords.txt");
System.out.println("[Removing StopWords...] FILE: " + fileIn.getName() + "\n");
// create file reader and go over it to save the stopwords into the Set data structure
BufferedReader readerSW = new BufferedReader(new FileReader(stopWordsTXT));
Set<String> stopWords = new LinkedHashSet<String>();
for (String line; (line = readerSW.readLine()) != null; readerSW.readLine()) {
// trim() eliminates leading and trailing spaces
stopWords.add(line.trim());
}
File outp = new File(fileIn.getPath().substring(0, fileIn.getPath().lastIndexOf('.')) + "_NoStopWords.txt");
FileWriter fOut = new FileWriter(outp);
Scanner readerTxt = new Scanner(new FileInputStream(fileIn), "UTF-8");
while(readerTxt.hasNextLine()) {
String line = readerTxt.nextLine();
System.out.println(line);
Scanner lineReader = new Scanner(line);
for (String curSW : stopWords) {
while(lineReader.hasNext()) {
String token = lineReader.next();
if(token.equals(curSW)) {
System.out.println("---> Removing SW: " + curSW);
fOut.write("" + " ");
} else {
fOut.write(token + " ");
}
}
}
fOut.write("\n");
}
fOut.close();
}
What happens most often is that it looks for the first word from the stopWords set and that's it. The output contains all the other words even if I manage to remove the first one. And the first will be there in the next appended output in the end.
Part of my stopword list
about
above
after
again
against
all
am
and
any
are
as
at
By tokens I mean words, i.e. getting every word from the line and comparing it to the current stop word.

After a while of debugging, I believe I have found the solution. The problem is tricky because several scanners and file readers are involved. Here is what I did:
I changed how the stop words are added to the set, because they weren't being added correctly (the extra readLine() in the update clause of your for loop skips every other line). I use a BufferedReader to read each line, then a Scanner to read each word, and add each word to the set.
For the comparison, I got rid of one of your loops, since the set's contains() method can check whether a word is a stop word.
I left the part that writes the file without the stop words to you, as I'm sure you can figure that out now that everything else is working.
My sample stop words txt file:
Stop words
Words
My sample input file was exactly the same, so it should catch all three words.
The code:
// create file reader and go over it to save the stopwords into the Set data structure
BufferedReader readerSW = new BufferedReader(new FileReader("stopWords.txt"));
Set<String> stopWords = new LinkedHashSet<String>();
String stopWordsLine = readerSW.readLine();
while (stopWordsLine != null) {
// Scanner splits the line into whitespace-separated words
Scanner words = new Scanner(stopWordsLine);
while (words.hasNext()) { // checking first also makes blank lines safe
stopWords.add(words.next()); // add each stop word to the set
}
words.close();
stopWordsLine = readerSW.readLine();
}
readerSW.close();
BufferedReader outp = new BufferedReader(new FileReader("Words.txt"));
String line = outp.readLine();
while (line != null) {
Scanner lineReader = new Scanner(line);
while (lineReader.hasNext()) { // loop over the words of this line
String word = lineReader.next();
if (stopWords.contains(word)) {
System.out.println("removing " + word);
}
}
lineReader.close();
line = outp.readLine();
}
outp.close();
Output:
removing Stop
removing words
removing Words
Let me know if I can elaborate any more on my code or why I did something!
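A minimal sketch of the remaining step, rebuilding a line without its stop words, could look like the following. It works on strings so it is easy to try out; `StopWordFilter` and `removeStopWords` are hypothetical names, the comparison is case-sensitive (as `Set.contains` is), and tokens are split on whitespace only, so punctuation attached to a word is not handled.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.StringJoiner;

class StopWordFilter {
    // Returns the line with every whitespace-separated token that is in
    // stopWords dropped; the remaining tokens are joined by single spaces.
    static String removeStopWords(String line, Set<String> stopWords) {
        StringJoiner kept = new StringJoiner(" ");
        for (String token : line.trim().split("\\s+")) {
            // An empty line yields one empty token, which is ignored here.
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // A few entries from the stop word list in the question.
        Set<String> stopWords = new LinkedHashSet<>(Arrays.asList("about", "all", "and", "at"));
        System.out.println(removeStopWords("cats and dogs at all hours", stopWords));
    }
}
```

Each line read from the input file could be passed through this method and the result written to the output file, so every line is written exactly once instead of once per stop word.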

How to skip reading a line with scanner

I have read in a text file and am scanning it. How would I skip over lines that start with certain characters (in my case, lines that start with "//" or with whitespace)?
Here is my code at the moment. Can someone point me in the right direction?
File dataFile = new File(filename);
Scanner scanner = new Scanner(dataFile);
while(scanner.hasNext())
{
String lineOfText = scanner.nextLine();
if (lineOfText.startsWith("//")) {
System.out.println(); // not sure what to put here
}
System.out.println(lineOfText);
}
scanner.close();

You only want to execute the code within the while loop if the line doesn't start with // or whitespace. You can filter these lines out as shown below:
while(scanner.hasNext()) {
String lineOfText = scanner.nextLine();
if (lineOfText.startsWith("//") || lineOfText.startsWith(" ")) {
continue; // skip this iteration if the line starts with a space or //
}
System.out.println(lineOfText);
}

As you are iterating over the lines of text in the file, use String's startsWith() method to check if the line starts with the sequences you are trying to avoid.
If it does, continue to the next line. Otherwise, print it.
while (scanner.hasNext()) {
String lineOfText = scanner.nextLine();
if (lineOfText.startsWith("//") || lineOfText.startsWith(" ") ) {
continue;
}
System.out.println(lineOfText);
}
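A variation on the loop above, sketched under the assumption that "whitespace" should also cover tabs and that empty lines should be skipped too. The names `LineFilter`, `shouldSkip`, and `keptLines` are invented for this example, and the input is a string rather than a file so the sketch is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LineFilter {
    // True for lines that should be skipped: empty lines, lines that begin
    // with any whitespace character, and lines that begin with "//".
    static boolean shouldSkip(String line) {
        return line.isEmpty()
                || Character.isWhitespace(line.charAt(0))
                || line.startsWith("//");
    }

    // Collects the lines that are not skipped.
    static List<String> keptLines(Scanner scanner) {
        List<String> kept = new ArrayList<>();
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            if (shouldSkip(line)) {
                continue; // move on to the next line
            }
            kept.add(line);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical sample input standing in for the data file.
        String text = "// comment\nfirst\n indented\n\nsecond\n";
        for (String line : keptLines(new Scanner(text))) {
            System.out.println(line);
        }
    }
}
```

Keeping the test in its own method means the skip rule can be changed in one place without touching the reading loop.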

Just use continue, like this:
if (lineOfText.startsWith("//")) {
continue; //would skip the loop to next iteration from here
}
Details: What is the "continue" keyword and how does it work in Java?

If you just want to skip the lines that begin with "//", you can use the continue keyword in Java.
String lineOfText = scanner.nextLine();
if (lineOfText.startsWith("//")) {
continue;
}
See this post for more information regarding the "continue" keyword.

You can just insert "else" in your code like:
public static void main(String[] args) throws FileNotFoundException {
File dataFile = new File("testfile.txt");
Scanner scanner = new Scanner(dataFile);
while(scanner.hasNext())
{
String lineOfText = scanner.nextLine();
if (lineOfText.startsWith("//")) {
System.out.println();
}
else {
System.out.println(lineOfText);
}
}
scanner.close();
}

Printing the string after a string in a string in java

I have a txt file containing words and their abbreviations that looks like this
one,1
two,2
you,u
probably,prob
...
I have read the txt file into a String, replacing the commas with spaces, like so:
public String shortenWord( String inWord ) {
word = inWord;
String text = "";
try {
Scanner sc = new Scanner(new File("abbreviations.txt"));
while (sc.hasNextLine()) {
text = text + sc.next().replace(",", " ") + " ";
}
// System.out.println(text);
//System.out.println(word);
if (text.contains(word)) {
System.out.println(word);
}
else {
System.out.println("nope");
}
}
catch ( FileNotFoundException e ) {
System.out.println( e );
}
return text;
}
The user must input a word that they want abbreviated and it will return the abbreviated version of the word.
class testit{
public static void main(String[] args){
Shortener sh = new Shortener();
sh.shortenWord("you");
}
}
It currently prints the word that was entered if it is found, but I want it to return the word next to it in the file, which is the abbreviated version.
E.g., the printed string 'text' looks like:
one 1 two 2 three 3 you u probably prob hello lo
I want the user to be able to enter 'you', have the program find 'you', and then print 'u', which is the next string over, separated by a space.

Removing the comma achieves nothing, so don't do it.
I would first split:
String[] text = sc.next().split(",");
Then compare with the first part of the split:
if (text[0].equals(word))
and if true, return the second part of the split:
return text[1];

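Part of the logic could look like this:
Map<String,String> myShortHand = new HashMap<String, String>();
Scanner sc = new Scanner(new File("abbreviations.txt"));
while (sc.hasNextLine()) {
// read the whole line so that hasNextLine() and the read stay in step
String[] text = sc.nextLine().split(",");
if (text.length == 2) { // skip blank or malformed lines
myShortHand.put(text[0], text[1]);
}
}
Now look up the abbreviation in the map, for example: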
myShortHand.get("one");
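Put together, the map-based lookup might be sketched as below. The class `Abbreviations` and the method `load` are hypothetical names, the input is a string holding the sample lines from the question instead of abbreviations.txt, and falling back to the original word when no abbreviation exists is an assumption about the desired behaviour.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class Abbreviations {
    // Builds a word -> abbreviation map from lines of the form "word,abbr".
    // Lines without exactly two comma-separated parts are ignored.
    static Map<String, String> load(Scanner sc) {
        Map<String, String> map = new HashMap<>();
        while (sc.hasNextLine()) {
            String[] parts = sc.nextLine().trim().split(",");
            if (parts.length == 2) {
                map.put(parts[0], parts[1]);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = load(new Scanner("one,1\ntwo,2\nyou,u\nprobably,prob\n"));
        // Fall back to the word itself when no abbreviation is known.
        System.out.println(map.getOrDefault("you", "you"));
        System.out.println(map.getOrDefault("hello", "hello"));
    }
}
```

Loading the map once and reusing it avoids re-reading the file for every word the user enters.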

Split string with three words

What is the best way to split a string containing three words?
My code looks like this right now (see below for updated code):
BufferedReader infile = new BufferedReader(new FileReader("file.txt"));
String line;
int i = 0;
while ((line = infile.readLine()) != null) {
String first, second, last;
//Split line into first, second and last (word)
//Do something with words (no help needed)
i++;
}
Here is the full file.txt:
Allegrettho Albert 0111-27543
Brio Britta 0113-45771
Cresendo Crister 0111-27440
Dacapo Dan 0111-90519
Dolce Dolly 0116-31418
Espressivo Eskil 0116-19042
Fortissimo Folke 0118-37547
Galanto Gunnel 0112-61805
Glissando Gloria 0112-43918
Grazioso Grace 0112-43509
Hysterico Hilding 0119-71296
Interludio Inga 0116-22709
Jubilato Johan 0111-47678
Kverulando Kajsa 0119-34995
Legato Lasse 0116-26995
Majestoso Maja 0116-80308
Marcato Maria 0113-25788
Molto Maja 0117-91490
Nontroppo Maistro 0119-12663
Obligato Osvald 0112-75541
Parlando Palle 0112-84460
Piano Pia 0111-10729
Portato Putte 0112-61412
Presto Pelle 0113-54895
Ritardando Rita 0117-20295
Staccato Stina 0112-12107
Subito Sune 0111-37574
Tempo Kalle 0114-95968
Unisono Uno 0113-16714
Virtuoso Vilhelm 0114-10931
Xelerando Axel 0113-89124
New code as @Pshemo suggested:
public String load() {
try {
Scanner scanner = new Scanner(new File("reg.txt"));
while (scanner.hasNextLine()) {
String firstname = scanner.next();
String lastname = scanner.next();
String number = scanner.next();
list.add(new Entry(firstname, lastname, number));
}
msg = "The file reg.txt has been opened";
return msg;
} catch (NumberFormatException ne) {
msg = ("Can't find reg.txt");
return msg;
} catch (IOException ie) {
msg = ("Can't find reg.txt");
return msg;
}
}
I receive multiple errors, what's wrong?

Assuming that each line always contains exactly three words, instead of split you can simply call Scanner's next() method three times for each line.
Scanner scanner = new Scanner(new File("file.txt"));
int i = 0;
while (scanner.hasNext()) { // hasNext() is false once only whitespace is left, so a trailing newline is safe
String first = scanner.next();
String second = scanner.next();
String last = scanner.next();
//System.out.println(first+": "+second+": "+last);
i++;
}

line.split("\\s+"); // don't use " ". use "\\s+" for more than one whitespace

Assuming the line has at least three words, use the split(delimiter) method:
String line = ...;
String[] parts = line.split("\\s+"); // assumes words are separated by whitespace; use another delimiter if required
Then you can access the first, second, and last words respectively:
String first = parts[0];
String second = parts[1];
String last = parts[parts.length - 1]; // length is a field on arrays, not a method
Remember that indexes start at 0.

String []parts=line.split("\\s+");
System.out.println(parts[0]);
System.out.println(parts[1]);
System.out.println(parts[parts.length-1]);
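If a line might not contain exactly three words, a small guard avoids an ArrayIndexOutOfBoundsException. The following is only a sketch; `ThreeWords` and `splitThree` are made-up names, and returning null for malformed lines is one choice among several.

```java
class ThreeWords {
    // Splits a line into whitespace-separated words; returns null unless
    // the line contains exactly three of them.
    static String[] splitThree(String line) {
        String[] parts = line.trim().split("\\s+");
        return parts.length == 3 ? parts : null;
    }

    public static void main(String[] args) {
        // First line of file.txt from the question.
        String[] parts = splitThree("Allegrettho Albert 0111-27543");
        if (parts != null) {
            System.out.println(parts[0] + " / " + parts[1] + " / " + parts[2]);
        }
    }
}
```

The trim() call matters because split("\\s+") would otherwise produce an empty first element for a line with leading whitespace.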

Read file from txt and split string using comma

Why does it say that no split method is found? I want to split one line into several parts, but I get an error. Why is that?
try {
Scanner a = new Scanner (new FileInputStream ("product.txt"));
while (a.hasNext()){
System.out.println(a.nextLine()); //this works correctly, all the lines are displayed
String[] temp = a.split(",");
}
a.close();
}catch (FileNotFoundException e){
System.out.println("File not found");
}

split() is not defined for Scanner but for String.
Here's a quick fix:
String line = a.nextLine();
System.out.println(line); //this works correctly, all the lines are displayed
String[] temp = line.split(",");

The split method works on a String, not on a Scanner. So store the result of
a.nextLine()
in a String like this
String line = a.nextLine();
and then use the split method on this String
String[] temp = line.split(",");
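As a fuller illustration, the loop could collect the split parts of every line. This is a sketch only: `CommaSplit` and `readRows` are hypothetical names, the sample rows are invented because the contents of product.txt are not shown, and the pattern `\\s*,\\s*` is an assumption that spaces around the commas should be dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CommaSplit {
    // Reads every line and splits it on commas, discarding any
    // whitespace directly before or after each comma.
    static List<String[]> readRows(Scanner scanner) {
        List<String[]> rows = new ArrayList<>();
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine(); // nextLine() returns a String
            rows.add(line.trim().split("\\s*,\\s*")); // split() is a String method
        }
        return rows;
    }

    public static void main(String[] args) {
        // Invented sample data standing in for product.txt.
        String data = "pen, 1.50, 10\nbook,12.00,3\n";
        for (String[] row : readRows(new Scanner(data))) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```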
