Java Scanner: Split Strings by Sentences

I am trying to split a paragraph of text into separate sentences based on the punctuation marks [.?!]. However, the Scanner also splits the text at the end of each line, even though I've specified a particular pattern. How do I resolve this? Thanks!
this is a text file. yes the
deliminator works
no it does not. why not?
Scanner scanner = new Scanner(fileInputStream);
scanner.useDelimiter("[.?!]");
while (scanner.hasNext()) {
String line = scanner.next();
System.out.println(line);
}

I don't believe the Scanner splits on line breaks. Your "line" values simply contain line breaks, and that is why you get that output. You can, for example, replace those line breaks with spaces:
(I am reading the same input text you supplied from a file, so there is some extra file-reading code, but you'll get the picture.)
try {
File file = new File("assets/test.txt");
Scanner scanner = new Scanner(file);
scanner.useDelimiter("[.?!]");
while (scanner.hasNext()) {
String sentence = scanner.next();
sentence = sentence.replaceAll("\\r?\\n", " ");
// uncomment to remove the leading space left after each delimiter
//sentence = sentence.trim();
System.out.println(sentence);
}
scanner.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
}
This is the result (the second and third sentences still start with the space that followed the delimiter):
this is a text file
 yes the deliminator works no it does not
 why not
And if I uncomment the trim() line, the leading spaces are gone, which is a bit nicer:
this is a text file
yes the deliminator works no it does not
why not
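If the goal is one clean sentence per line, a variant of the approach above is to let the delimiter also swallow the whitespace that follows each punctuation mark, so the leading spaces never appear. A minimal sketch, assuming the text is already in a String; the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class SentenceSplitter {
    // Splits text into sentences. The delimiter consumes one punctuation
    // mark plus any whitespace (including line breaks) that follows it.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("[.?!]\\s*");
            while (scanner.hasNext()) {
                // Line breaks inside a sentence become single spaces.
                String sentence = scanner.next().replaceAll("\\s+", " ").trim();
                // Adjacent delimiters (e.g. "...") can yield empty tokens.
                if (!sentence.isEmpty()) {
                    result.add(sentence);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "this is a text file. yes the\ndeliminator works\nno it does not. why not?\n";
        for (String s : sentences(text)) {
            System.out.println(s);
        }
    }
}
```

For a file, the same loop works with a Scanner built from the file instead of the String.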

Related

Split comma separated values in java, int and String

I have the following in a text file to import into an ArrayList:
Australia,2
Ghana,4
China,3
Spain,1
My ArrayList is made up of objects from another class, Team, which has the fields TeamName and ranking. The following code imports the String and the int together into the team name, but I can't separate out the number, which is supposed to be the team's ranking:
public void fileReader()
{
try
{
String filename = "teams.txt";
FileReader inputFile = new FileReader(filename);
Scanner parser = new Scanner(inputFile);
for (Team teams : teams)
{
teams.setTeamName(parser.next());
teams.setRanking(parser.next()); //this doesn't work
}
}
catch (IOException e)
{
System.out.println("Cannot find file");
}
}
I'm guessing I have to use a split somewhere along the line, or convert a String to an integer??
Check out opencsv. It's 2018 and you shouldn't have to parse a text file yourself :).
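By default, Scanner uses whitespace as its delimiter.
Override this by calling the useDelimiter method. In your case the fields are separated by commas and the records by line breaks, so use parser.useDelimiter(",|\\R"); (useDelimiter takes a String or a Pattern, not a char).
Then, to read the ranking as an int, call parser.nextInt().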
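A minimal runnable sketch of that useDelimiter approach, assuming the data comes from any Scanner source (a String is used here so it runs without a file). The Team class below is a hypothetical stand-in for the question's class, and the \R (any line break) regex construct needs Java 8 or later:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TeamReader {
    // Hypothetical stand-in for the question's Team class.
    static class Team {
        final String name;
        final int ranking;

        Team(String name, int ranking) {
            this.name = name;
            this.ranking = ranking;
        }

        @Override
        public String toString() {
            return name + " (" + ranking + ")";
        }
    }

    // Reads "name,ranking" records: commas separate fields and
    // line breaks (\R matches \n, \r\n, ...) separate records.
    static List<Team> readTeams(Scanner parser) {
        parser.useDelimiter(",|\\R");
        List<Team> teams = new ArrayList<>();
        while (parser.hasNext()) {
            String name = parser.next();
            int ranking = parser.nextInt(); // InputMismatchException if not an int
            teams.add(new Team(name, ranking));
        }
        return teams;
    }

    public static void main(String[] args) {
        String data = "Australia,2\nGhana,4\nChina,3\nSpain,1\n";
        try (Scanner parser = new Scanner(data)) {
            System.out.println(readTeams(parser));
        }
    }
}
```

With a file, `new Scanner(new File("teams.txt"))` could be passed to readTeams instead.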
You can write something like the code below to suit your purpose.
Your input has two delimiters, the comma (,) and the newline (\n). As a result, next() can't be used in a straightforward way.
I go over each line, split it on the comma, and then read the resulting tokens.
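try
{
String filename = "teams.txt";
FileReader inputFile = new FileReader(filename);
Scanner parser = new Scanner(inputFile);
for (Team teams : teams)
{
String[] splitLine = parser.nextLine().split(","); // comma-delimited array
teams.setTeamName(splitLine[0].trim());
// convert the ranking text to an int (assuming setRanking takes an int)
teams.setRanking(Integer.parseInt(splitLine[1].trim()));
}
parser.close();
}
catch (IOException e)
{
System.out.println("Cannot find file");
}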
Scanner.next() reads the next token from the input stream and returns it as a String.
If you want to read the next integer, you should use nextInt() instead:
teams.setRanking(parser.nextInt());
Edit
You got an InputMismatchException because, by default, Scanner uses Java whitespace as its delimiter:
WHITESPACE_PATTERN = Pattern.compile("\\p{javaWhitespace}+")
In your case, the delimiters are the comma (,) and the newline (\n), so you should configure the delimiter for your scanner:
Scanner parser = new Scanner(inputFile);
parser.useDelimiter(",|\\r?\\n"); // the optional \r also covers Windows line endings
Another workaround is to read the whole line and parse it:
String line = parser.nextLine();
String[] parts = line.split(",");
team.setTeamName(parts[0]);
team.setRanking(Integer.parseInt(parts[1]));
Either of the two solutions above will work.
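The read-the-whole-line alternative can be sketched the same way. This assumes each line holds exactly one name and one integer ranking; the class names are hypothetical, blank lines are skipped, and a line without a comma would throw ArrayIndexOutOfBoundsException:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TeamLineParser {
    // Hypothetical stand-in for the question's Team class.
    static class Team {
        final String teamName;
        final int ranking;

        Team(String teamName, int ranking) {
            this.teamName = teamName;
            this.ranking = ranking;
        }
    }

    // Reads one "name,ranking" record per line.
    static List<Team> parse(Scanner parser) {
        List<Team> teams = new ArrayList<>();
        while (parser.hasNextLine()) {
            String line = parser.nextLine().trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // Limit 2: split at the first comma only.
            String[] parts = line.split(",", 2);
            teams.add(new Team(parts[0].trim(), Integer.parseInt(parts[1].trim())));
        }
        return teams;
    }

    public static void main(String[] args) {
        try (Scanner parser = new Scanner("Australia,2\nGhana,4\nChina,3\nSpain,1\n")) {
            for (Team team : parse(parser)) {
                System.out.println(team.teamName + " -> " + team.ranking);
            }
        }
    }
}
```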

change specific text in text file with scanner class (java)

I wrote this code, which can search for some specific text (such as a word) in a text file using the Scanner class, but I also want to replace the old text with the new text at the same location in the file.
I found on the internet that I must use the replaceAll method (replaceAll(old, new);),
but it doesn't work with the Scanner class.
This is my code. It only searches, and if the text exists, it writes the new text on a new line without changing the old one.
Do I need to switch the way I read the data from Scanner to FileReader?
File file = new File("C:\\Users....file.txt");
Scanner input = new Scanner(System.in);
System.out.println("Enter the content you want to change:");
String Uinput = input.nextLine();
System.out.println("You want to change it to:");
String Uinput2 = input.nextLine();
Scanner scanner = new Scanner(file).useDelimiter(",");
BufferedWriter writer = new BufferedWriter(new FileWriter(file, true));
while (scanner.hasNextLine()) {
String lineFromFile = scanner.next();
if (lineFromFile.contains(Uinput)) {
lineFromFile = Uinput2;
writer.write(lineFromFile);
writer.close();
System.out.println("changed " + Uinput + " tO " + Uinput2);
break;
}
else if (!lineFromFile.contains(Uinput)){
System.out.println("Don't found " + Uinput);
break;
}
}
You cannot read from a file and write to that same file at the same time. You need two different files.
while (read line from input file) {
if (NOT matches your search pattern)
write line to output file.
else { // matches
write the start of the line, up to your search pattern.
write your replace string
write from the end of the search pattern to the end of the line.
}
}
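A runnable version of that two-file approach, as a sketch: it assumes UTF-8 text, writes to a temporary file in the same directory, and then moves the temporary file over the original. It uses String.replace, which replaces every occurrence on a line rather than just the first; the class and method names are illustrative:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Scanner;

public class ReplaceInFile {
    // Copies `file` line by line into a temporary file, replacing every
    // occurrence of `search` with `replacement`, then moves the temporary
    // file over the original. Returns the number of lines that changed.
    static int replaceInFile(Path file, String search, String replacement) throws IOException {
        Path temp = Files.createTempFile(file.toAbsolutePath().getParent(), "replace", ".tmp");
        int changed = 0;
        try (Scanner in = new Scanner(file, "UTF-8");
             PrintWriter out = new PrintWriter(Files.newBufferedWriter(temp, StandardCharsets.UTF_8))) {
            while (in.hasNextLine()) {
                String line = in.nextLine();
                String updated = line.replace(search, replacement);
                if (!updated.equals(line)) {
                    changed++;
                }
                // println uses the platform line separator for every line.
                // Note: PrintWriter reports write errors via checkError()
                // rather than throwing.
                out.println(updated);
            }
        }
        // Both files are closed here, so the move can replace the original.
        Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
        return changed;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ReplaceInFile <file> <search> <replacement>
        int changed = replaceInFile(Paths.get(args[0]), args[1], args[2]);
        System.out.println("Lines changed: " + changed);
    }
}
```

If an exception occurs before the move, the temporary file is left behind and the original is untouched.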
Unless your replace string is the same size as your search string, yes, you'll have to use 2 files. Consider the file:
Blah
Blah
Blah
Now replace the letter 'a' with "The quick Brown Fox". If you make that replacement in place on the first line, the longer text overwrites the rest of the file, and you can no longer read the second line. So YES, you'll have to use two files.
Here's another answer, based on @Sedrick's comment and your code.
I've added pseudocode to your code.
File file = new File("C:\\Users....file.txt");
Scanner input = new Scanner(System.in);
System.out.println("Enter the content you want to change:");
String Uinput = input.nextLine();
System.out.println("You want to change it to:");
String Uinput2 = input.nextLine();
Scanner scanner = new Scanner(file); // read whole lines so commas and line breaks are preserved
java.util.List<String> tempStorage = new java.util.ArrayList<>();
while (scanner.hasNextLine()) {
String lineFromFile = scanner.nextLine();
tempStorage.add(lineFromFile);
}
// close input file here
// Open your write file here (same file = overwrite).
// now loop through temp storage searching for input string.
for (String currentLine : tempStorage ) {
if (currentLine.contains(Uinput)) {
String temp = currentLine.replace(Uinput, Uinput2);
// write a line using the temp variable
} else { // not replaced
// write a line using currentLine
}
}
// close write file here
By the way, you'll have to wrap the reads and writes in try/catch blocks to catch IOExceptions. That's how I knew it was pseudocode. There are plenty of examples of reading and writing a file on this website; they're easy to find.
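For comparison, the read-everything-then-overwrite approach from this answer might look like this when filled in. It is a sketch assuming the whole file fits in memory and is UTF-8; the names are illustrative. Because replace() returns the line unchanged when the search text is absent, the if/else from the pseudocode collapses into one call:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ReplaceInMemory {
    static void replaceInFile(Path file, String search, String replacement) throws IOException {
        // Read every line into memory; try-with-resources closes the input file.
        List<String> tempStorage = new ArrayList<>();
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            while (scanner.hasNextLine()) {
                tempStorage.add(scanner.nextLine());
            }
        }
        // Reopen the same file for writing (this truncates it) and write each
        // stored line back. PrintWriter reports write errors via checkError()
        // rather than throwing.
        try (PrintWriter writer = new PrintWriter(
                Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
            for (String currentLine : tempStorage) {
                writer.println(currentLine.replace(search, replacement));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ReplaceInMemory <file> <search> <replacement>
        replaceInFile(Paths.get(args[0]), args[1], args[2]);
    }
}
```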

java.util.Scanner doesn't read empty line

I wrote a .txt file in which each line has a meaning, even an empty one. Scanner's methods next() and nextLine() do not recognize the empty line and jump right to the line with text. I'm wondering if there is a way for the scanner to consider all lines of text regardless of their content.
I don't want to use BufferedReader because I'm working with very small tokens each time.
static final String fileName = "temp.txt";
try {
//System.out.println(Jsoup.connect(url).get());
Document document = Jsoup.connect(url).get();
FileWriter fileWriter = new FileWriter(fileName);
BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
Elements names = document.select("[id^=CZ]");
for (Element name : names) {
bufferedWriter.write(name.text());
bufferedWriter.write(System.lineSeparator() + System.lineSeparator());
System.out.println(name.text() + '\n');
}
bufferedWriter.close();
Scanner in = new Scanner(new File(fileName));
in.next();
String s = names.first().text();
String h = in.next();
...
At this point Strings s & h should be equal.
The document the scanner is reading starts with an empty line and goes like this:
asdkjasjkdajkdahkdjahdjadhkahdajkdajkdsasdkjasjkdajkdahkdjahdjadhkahdajkdajkdsasdkjasjkdajkdahkdjahdjadhkahdajkdajkdsasdkjasjkdajkdahkdjahdjadhkahdajkdajkdsasdkjasjkdajkdahkdjahdjadhkahdajkdajkdsasdkjasjkdajkdahkdjahdjadhkahdajkdajkdsasdkjasjkdajkdahkdjahdjadhkahdajkdajkdsasdkjasjkdajkdahkdjahdjadhkahdajkdajkdsasdkjasjkdajkdahkdjahdjadhkahdajkdajkdsasdkjasjkdajkdahkdjahdjadhkahdajkdajkdsasdkjasjkdajkdahkdjahdjadhkahdajkdajkdsasdkjasjkdajkdahkdjahdjadhkahdajkdajkds
Again, I have a dynamic file whose first line might be empty, and when I compare String s with String h they are NOT equal. nextLine() and next() skip over the first line even though it is still a valid element.
nextLine() is the method that you need. Unlike next(), it does not skip ahead through newlines and whitespace.
Run this example
Scanner sc = new Scanner(System.in);
while (sc.hasNextLine()) {
String s = sc.nextLine();
System.out.println("'"+s+"'");
}
on input with empty lines to see that these lines are preserved:
'quick brown'
''
'fox jumps'
'over'
''
'the'
''
'lazy dog'
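The difference matters for the code in the question, which calls in.next(). A small sketch (the class and method names are hypothetical) showing what each method returns for input that starts with an empty line:

```java
import java.util.Scanner;

public class EmptyLineDemo {
    // next() skips any leading whitespace, including line breaks,
    // before returning the first token.
    static String firstToken(String text) {
        try (Scanner in = new Scanner(text)) {
            return in.next();
        }
    }

    // nextLine() returns everything up to the next line separator,
    // so an empty first line comes back as an empty string.
    static String firstLine(String text) {
        try (Scanner in = new Scanner(text)) {
            return in.nextLine();
        }
    }

    public static void main(String[] args) {
        // Starts with an empty line, like the file in the question.
        String text = "\nfirst line\nsecond line\n";
        System.out.println("next():     '" + firstToken(text) + "'");
        System.out.println("nextLine(): '" + firstLine(text) + "'");
    }
}
```

With this input, next() returns "first" while nextLine() returns the empty first line as "".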
The next() method reads tokens separated by whitespace (including newline characters); nextLine(), on the other hand, reads lines separated by newline characters.
You can try this:
Scanner scan = new Scanner(file);
while(scan.hasNextLine()){
System.out.println(scan.nextLine());
}

how to read the \n in a txt file for using in split

I have this code here
java.io.File file=new java.io.File("deneme2.txt");
try{
Scanner input=new Scanner(file);
while(input.hasNext()){
String inputFile= input.nextLine();
String[] sequences =inputFile.split(" ");
It reads the file, but I have to edit each file first, because I cannot read the .txt when the input looks like this:
ATGAGATACG
AGTCTCTAG
but it works when I change it to
ATGAGATACG AGTCTCTAG
I tried splitting on \n and similar things, but I couldn't get it to work.
Can you help me?
I know for sure there is a very simple solution :) one that I'm just not aware of.
Edit: in the first example the two sequences are separated by a line break (Shift+Enter), but in the second they are separated by a single space.
It sounds like you want to make the code that reads the file independent of the file format. To some extent, that's not possible. Any program has to assume some kind of pattern to the input -- be it XML, delimited text etc. So that breaks it up into two approaches: Either make the file fit the code or make the code fit the file.
From your description, I'm guessing you want to be able to read a sequence of characters that is delimited by whitespace -- any whitespace (' ', '\n', '\t'), yes? If that's true, don't limit yourself to reading by line. Just read each token. This, of course, assumes each token is what you want.
I created a test file with the content
abcd efg h
ijklm op
qrs
That has newlines, spaces and tabs. Then I fed it to the following code:
public static void main(String[] args){
try{
Scanner scanner = new Scanner(new File("testFile.txt"));
List<String> list = new ArrayList<String>();
while(scanner.hasNext()){
String s = scanner.next();
list.add(s);
}
scanner.close();
System.out.println(list);
}catch(FileNotFoundException e){
e.printStackTrace();
}
}
Which gives the output
[abcd, efg, h, ijklm, op, qrs]
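Since the question title asks about split, another option is to read the whole file into one String and split it on any run of whitespace, so both file layouts produce the same array. A sketch using the "\\A" delimiter idiom to read everything at once; the class and method names are illustrative:

```java
import java.util.Arrays;
import java.util.Scanner;

public class SequenceReader {
    // "\\A" matches only at the beginning of the input, so the single
    // token returned by next() is the entire remaining input. That text
    // is then split on any run of spaces, tabs, or line breaks.
    static String[] readSequences(Scanner input) {
        input.useDelimiter("\\A");
        String content = input.hasNext() ? input.next() : "";
        String trimmed = content.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Either layout from the question gives the same result.
        String fileContents = "ATGAGATACG\nAGTCTCTAG\n";
        try (Scanner input = new Scanner(fileContents)) {
            System.out.println(Arrays.toString(readSequences(input)));
        }
    }
}
```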
Is it possible you want to create an array of sequences? Like you want this file
ATGAGATACG <-- each of these being a sequence
AGTCTCTAG
to become an array like this
String[] sequences = {"ATGAGATACG", "AGTCTCTAG"};
If that's the case, you can just do something like this
List<String> sequences = new ArrayList<String>(); // create a list
java.io.File file=new java.io.File("deneme2.txt");
try{
Scanner input = new Scanner(file);
while(input.hasNextLine()){
sequences.add(input.nextLine().trim()); // add each line to the list
}
input.close();
} catch (java.io.FileNotFoundException e) {
e.printStackTrace();
}
Edit
If it's only two lines, why not just do this and forget the loop:
String s1 = "";
String s2 = "";
try {
Scanner input = new Scanner(file);
s1 = input.nextLine().trim();
s2 = input.nextLine().trim();
input.close();
} catch (java.io.FileNotFoundException e) {
e.printStackTrace();
}
// do something with s1
// do something with s2

Scanner nextline() only printing new lines

I'm trying to use Scanner to print lines from a text file, but it prints only the first line and then nothing but blank lines until the while loop reaches the end of the file.
String line;
File input = new File("text.txt");
Scanner scan = new Scanner(input);
while (scan.hasNext()) //also does not work with hasNextLine(), but additional error
{
line = scan.nextLine();
System.out.println(line);
//other code can see what is in the string line, but output from System.out.println(line); is just a new line
}
How can I get System.out.println() to work with this code?
This is the Javadoc for nextLine()
Advances this scanner past the current line and returns the input that was skipped. This method returns the rest of the current line, excluding any line separator at the end. The position is set to the beginning of the next line.
You want next() instead:
Finds and returns the next complete token from this scanner. A complete token is preceded and followed by input that matches the delimiter pattern. This method may block while waiting for input to scan, even if a previous invocation of hasNext() returned true.
Your code becomes:
while (scan.hasNext())
{
line = scan.next();
System.out.println(line);
}
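The question's loop pairs hasNext() with nextLine(), which mixes the token-based and line-based families of Scanner methods. As a sketch (names are illustrative, input passed as a String), here are the matched pairs side by side; the line-based pair keeps empty lines, the token-based pair does not:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerPairs {
    // hasNextLine() + nextLine(): one element per line, empty lines included.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNextLine()) {
                lines.add(scan.nextLine());
            }
        }
        return lines;
    }

    // hasNext() + next(): one element per whitespace-separated token.
    static List<String> readTokens(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNext()) {
                tokens.add(scan.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "one two\n\nthree";
        System.out.println(readLines(text));
        System.out.println(readTokens(text));
    }
}
```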
You can also use the next() method:
String line;
File input = new File("text.txt");
Scanner scan = new Scanner(input);
while (scan.hasNext()) //also does not work with hasNextLine(), but additional error
{
line = scan.next();
System.out.println(line);
}
