Recognizing empty lines in a text? - java

I'm taking input from a separate file, which currently has one paragraph. I'm storing every word of the paragraph in a list and then iterating over the words using this:
for (String word: words)
However, this loop goes over each word. If my input file has two paragraphs separated by an empty line, how do I recognize that empty line inside this for-loop? My thinking is that iterating over words is obviously different from iterating over lines, so I'm not sure.

An empty line follows the pattern:
\n\n
\r\n\r\n
\n -whitespace- \n
etc
A word follows the pattern:
-whitespace-nonwhitespace-whitespace-
These are very different patterns, so looping over something built from the definition of a word will never find an empty line.
You can use Java's Scanner to read the input line by line:
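import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LineScanner {
    // Returns every line of the input that is not empty or whitespace-only.
    public List<String> eliminateEmptyLines(String input) {
        Scanner scanner = new Scanner(input);
        List<String> output = new ArrayList<>();
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            boolean isEmpty = line.matches("^\\s*$");
            if (!isEmpty) {
                output.add(line);
            }
        }
        scanner.close();
        return output;
    }
}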
Here's how the regex in String.matches works: How to check if a line is blank using regex
Here's the javadoc on Scanner: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Scanner.html
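To tie this back to the question, here is a minimal sketch of collecting words paragraph by paragraph, assuming paragraphs are separated by one or more blank (or whitespace-only) lines. The class name ParagraphWords and the method wordsByParagraph are made up for illustration and are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

public class ParagraphWords {
    // Hypothetical helper: returns one list of words per paragraph.
    // Assumption: a blank or whitespace-only line ends the current paragraph.
    public static List<List<String>> wordsByParagraph(String input) {
        List<List<String>> paragraphs = new ArrayList<>();
        List<String> current = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                if (line.trim().isEmpty()) {
                    // Blank line: close the current paragraph if it has any words.
                    if (!current.isEmpty()) {
                        paragraphs.add(current);
                        current = new ArrayList<>();
                    }
                } else {
                    // Non-blank line: split on runs of whitespace and keep the words.
                    current.addAll(Arrays.asList(line.trim().split("\\s+")));
                }
            }
        }
        // The last paragraph usually has no blank line after it.
        if (!current.isEmpty()) {
            paragraphs.add(current);
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        String text = "first paragraph here\nstill first\n\n  \nsecond paragraph";
        for (List<String> paragraph : wordsByParagraph(text)) {
            System.out.println(paragraph);
        }
    }
}
```

The existing for (String word : words) loop can then run once per inner list, and the boundary between two inner lists is exactly where the empty line was.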

Related

Split String between lines with regex

I want to split the file in the picture, but I can't because of the line breaks: the output I get is all run together. What is the right regex to split on, even when there is a new line?
You are reading the file using a Scanner to get each line.
while(sc.hasNextLine())
You append each line to the StringBuffer (without a separator).
sb.append(sc.nextLine())
So at that point, a file containing:
foo
bar
would give a StringBuffer holding:
sb.toString(); //foobar
So there is no line separator left to split on. Instead, process each line directly as the Scanner returns it:
while(sc.hasNextLine()){
String s = sc.nextLine();
System.out.println(s);
sb.append(s);
}
Last thing, please close every Scanner you open with sc.close() when you are done with it (that's a good practice to have).
EDIT:
If you want to keep that logic, simply append a line separator after each line:
while(sc.hasNextLine()){
sb.append(sc.nextLine()).append("\n");
}
That way, you will always have a line separator, and the split will work.
A quick example of what this would look like:
StringBuffer sb = new StringBuffer();
sb.append("foo").append("\n")
.append("bar").append("\n"); //fake "file reading"
String[] result = sb.toString().split("\n");
for(String s : result){
System.out.println(s);
}
foo
bar
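This should do:
String[] protasi = string.split("\\r?\\n");
split("\n") also works when the text uses bare \n line endings: in Java source, both "\n" and "\\n" end up as a regex that matches a line feed. The optional \\r makes the pattern handle Windows-style \r\n line endings as well, so no stray carriage return is left at the end of each line.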
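As a small illustration of the difference, the sketch below splits the same two lines with Unix and Windows line endings. The class LineSplit and the method splitLines are hypothetical names used only for this example.

```java
import java.util.Arrays;

public class LineSplit {
    // Splits on either "\n" or "\r\n".
    public static String[] splitLines(String text) {
        return text.split("\\r?\\n");
    }

    public static void main(String[] args) {
        String unix = "foo\nbar";
        String windows = "foo\r\nbar";

        System.out.println(Arrays.toString(splitLines(unix)));
        System.out.println(Arrays.toString(splitLines(windows)));

        // For comparison: splitting Windows text on "\n" alone keeps the "\r",
        // so the first element has four characters instead of three.
        System.out.println(windows.split("\n")[0].length());
    }
}
```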

Split a sentence ignoring characters in Java

I want to write a program that reads one line of input text and breaks it up into words.
The words should be output one per line. A word is defined to be a sequence of letters.
Any characters in the input that are not letters should be discarded.
For example, if the user inputs the line:
He said, "That’s not a good idea."
then the output of the program should be:
He
said
That
‘s
not
a
good
idea
Simply use a regex
Pattern pattern = Pattern.compile("[\\w'’]+");
Matcher matcher = pattern.matcher("He said, \"That’s not a good idea.\"");
while (matcher.find())
System.out.println(matcher.group());
Try this:
import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner stdIn = new Scanner(System.in); // user input
        String line = stdIn.nextLine(); // read line
        String[] words = line.split("[^a-zA-Z]+"); // split by all non-alphabetic characters (a regex)
        for (String word : words) { // iterate through the words
            System.out.println(word); // print word with a newline
        }
    }
}
It won't include the apostrophe in the token 's, but I don't know why you included that. It's not a letter, after all, and I read your first bold sentence. I hope the comments help explain how it works. If the line starts with a non-letter (such as an opening quote), the first element of the array will be an empty string, so an empty line is printed first; that should be easy for you to fix if you really need to.
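One way to avoid printing empty tokens is to skip them after splitting. This is a sketch only; the class LetterWords and the method letterWords are illustrative names, and it treats only the ASCII letters a-z and A-Z as letters, as the split pattern above does.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterWords {
    // Returns the runs of ASCII letters in the line, skipping empty tokens.
    public static List<String> letterWords(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.split("[^a-zA-Z]+")) {
            // split() yields a leading "" when the line starts with a non-letter.
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        for (String word : letterWords("\"That's not a good idea.\"")) {
            System.out.println(word);
        }
    }
}
```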

Getting scanner to read text file

I am trying to use a Scanner to read a text file selected with JFileChooser. The wordCount is working correctly, so I know the file is being read. However, I cannot get it to count instances of the word the user entered.
public static void main(String[] args) throws FileNotFoundException {
String input = JOptionPane.showInputDialog("Enter a word");
JFileChooser fileChooser = new JFileChooser();
fileChooser.showOpenDialog(null);
File fileSelection = fileChooser.getSelectedFile();
int wordCount = 0;
int inputCount = 0;
Scanner s = new Scanner (fileSelection);
while (s.hasNext()) {
String word = s.next();
if (word.equals(input)) {
inputCount++;
}
wordCount++;
}
You'll have to account for punctuation such as
, ; . ! ? etc.
in each word. The next() method returns everything up to the next whitespace.
It will split "hi, how are you?" into the tokens "hi,", "how", "are", "you?".
You can use the method indexOf(String) to find these characters. You can also use replaceAll(String regex, String replacement) to remove them. You can remove each character individually, or you can use a single regex, though those are usually harder to understand.
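//this replaces each listed character with an empty string
//replaceAll takes a regex, so the dot must be escaped: an unescaped "." matches every character
word = word.replaceAll("\\.","");
word = word.replaceAll(",","");
word = word.replaceAll("!","");
//etc.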
Read more about this method:
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#replaceAll%28java.lang.String,%20java.lang.String%29
Here's a Regex example:
//NOTE: This example will not work for you. It's just a simple example for seeing a Regex.
//Removes whitespace between a word character and . or ,
String pattern = "(\\w)(\\s+)([\\.,])";
word = word.replaceAll(pattern, "$1$3");
Source:
http://www.vogella.com/articles/JavaRegularExpressions/article.html
Here are some related questions with good regex examples that may help you:
Regex for special characters in java
Parse and remove special characters in java regex
Remove all non-"word characters" from a String in Java, leaving accented characters?
If the user's input may differ in case from the text in the file, you should try using equalsIgnoreCase().
In addition to blackpanther's answer, you should also use trim() to account for surrounding whitespace, since
"abc" is not equal to "abc ".
You should take a look at matches().
equals() will not help you, since next() doesn't return the file word by word,
but rather token by token, where tokens are separated by whitespace (not by commas, semicolons, etc.), as others mentioned.
Here is the Javadoc: String#matches(java.lang.String)
...and a little example:
input = ".*" + input + ".*";
...
boolean foundWord = word.matches(input);
. is the regex wildcard and stands for any character. .* stands for zero or more arbitrary characters. So you get a match if input occurs anywhere in word.
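Putting several of these suggestions together, the following sketch counts occurrences of a word while ignoring case, surrounding whitespace in the search term, and ASCII punctuation attached to tokens. WordCounter and countOccurrences are hypothetical names, and the text is passed in as a String here instead of a File to keep the example self-contained. Note that \p{Punct} also removes apostrophes, so "don't" is compared as "dont".

```java
import java.util.Scanner;

public class WordCounter {
    // Counts tokens equal to target, ignoring case and ASCII punctuation.
    public static int countOccurrences(String text, String target) {
        String wanted = target.trim(); // the dialog input may carry stray spaces
        int count = 0;
        try (Scanner s = new Scanner(text)) {
            while (s.hasNext()) {
                // Strip punctuation so that "you?" is compared as "you".
                String word = s.next().replaceAll("\\p{Punct}", "");
                if (word.equalsIgnoreCase(wanted)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Hi, how are you? Are you sure?";
        System.out.println(countOccurrences(text, " you "));
        System.out.println(countOccurrences(text, "ARE"));
    }
}
```

Unlike the matches() approach, this compares whole tokens, so searching for "are" does not also count words such as "area".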

Splitting strings with spaces

So I take in a line from a .txt file and turn it into a string. I would like to split the string on |, but there are also spaces before and after it that are messing with the code. Here is what I have so far:
File file = new File(fileLocation);
Scanner sc = new Scanner(file);
String line;
String[] words;
while(sc.hasNext()){
line = sc.next();
words = line.split("\\|");
this.german.add(words[0]);
this.english.add(words[1]);
}
An example line would be something like: in blue|in blau
I would also like to keep the spaces.
The .txt file would be:
in Rot|in red
in Blau|in blue
in Grun|in green
in Gelb|in Yellow
It would add all the items on the left of the | to the german list, and all of the ones on the right to the english list.
Ah, figured it out: sc.next() returns the next token, not the next line. I replaced it with sc.nextLine() and everything worked, thanks.
Call
line = line.replaceAll(" ", "");
beforehand; this will get rid of all the spaces (strings are immutable, so the result must be assigned back). If you only want leading and trailing spaces removed from the split strings, use
words[i].trim()
instead.
Use the following pattern, which also consumes any whitespace around the |:
words = line.split("\\s*\\|\\s*");
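A minimal sketch of the whole loop, reading line by line and splitting on a | with optional surrounding whitespace, might look like this. The class PipePairs and its load method are illustrative names; the file contents are passed in as a String so the example runs without a file, and lines without a | are skipped as an assumption about how malformed input should be handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class PipePairs {
    public final List<String> german = new ArrayList<>();
    public final List<String> english = new ArrayList<>();

    // Reads "left|right" lines; spaces inside each phrase are kept.
    public void load(String contents) {
        try (Scanner sc = new Scanner(contents)) {
            while (sc.hasNextLine()) {
                String line = sc.nextLine().trim();
                if (line.isEmpty()) {
                    continue; // ignore blank lines
                }
                // \\s* allows zero or more whitespace characters around the |.
                String[] words = line.split("\\s*\\|\\s*");
                if (words.length < 2) {
                    continue; // assumption: skip lines without two parts
                }
                german.add(words[0]);
                english.add(words[1]);
            }
        }
    }

    public static void main(String[] args) {
        PipePairs pairs = new PipePairs();
        pairs.load("in Rot | in red\nin Blau|in blue\n");
        System.out.println(pairs.german);
        System.out.println(pairs.english);
    }
}
```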

How to scan for words in Java excluding punctuation

I'm trying to use the Scanner class to parse all the words in a file. The file contains ordinary text, but I only want the words, excluding all the punctuation.
The solution I have so far is not complete, but it is already giving me a problem:
Scanner fileScan= new Scanner(file);
String word;
while(fileScan.hasNext("[^ ,!?.]+")){
word= fileScan.next();
this.addToIndex(word, filename);
}
Now if I use this on a sentence like "hi my name is mario!" it returns just "hi", "my", "name" and "is". It's not matching "mario!" (obviously), but it's not matching "mario" either, as I think it should.
Can you explain why that is, and help me find a better solution if you have one?
Thank you
This works:
import java.util.*;
class S {
public static void main(String[] args) {
Scanner fileScan= new Scanner("hi my name is mario!").useDelimiter("[ ,!?.]+");
String word;
while(fileScan.hasNext()){
word= fileScan.next();
System.out.println(word);
}
} // end of main()
}
javac -g S.java && java S
hi
my
name
is
mario
Since you want to get rid of the punctuation, you can simply remove all punctuation marks before adding to the index:
word = word.replaceAll("\\p{Punct}", "");
In the case of hyphens or other isolated punctuation marks, just check word.isEmpty() before adding.
Of course, you'd have to get rid of your custom delimiter.
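A short sketch of this second approach, using the default whitespace delimiter and stripping punctuation from each token, could look as follows. PunctuationFreeWords and its words method are hypothetical names, and the result is collected into a list here in place of the question's addToIndex call. \p{Punct} covers ASCII punctuation only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class PunctuationFreeWords {
    // Returns the whitespace-separated tokens with ASCII punctuation removed.
    public static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scan = new Scanner(text)) { // default delimiter: whitespace
            while (scan.hasNext()) {
                String word = scan.next().replaceAll("\\p{Punct}", "");
                // Tokens such as "-" become empty after stripping; skip them.
                if (!word.isEmpty()) {
                    result.add(word);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String word : words("hi - my name is mario!")) {
            System.out.println(word);
        }
    }
}
```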
