Split string with three words

What is the best way to split a string containing three words?
My code looks like this right now (see below for updated code):
BufferedReader infile = new BufferedReader(new FileReader("file.txt"));
String line;
int i = 0;
while ((line = infile.readLine()) != null) {
String first, second, last;
//Split line into first, second and last (word)
//Do something with words (no help needed)
i++;
}
Here is the full file.txt:
Allegrettho Albert 0111-27543
Brio Britta 0113-45771
Cresendo Crister 0111-27440
Dacapo Dan 0111-90519
Dolce Dolly 0116-31418
Espressivo Eskil 0116-19042
Fortissimo Folke 0118-37547
Galanto Gunnel 0112-61805
Glissando Gloria 0112-43918
Grazioso Grace 0112-43509
Hysterico Hilding 0119-71296
Interludio Inga 0116-22709
Jubilato Johan 0111-47678
Kverulando Kajsa 0119-34995
Legato Lasse 0116-26995
Majestoso Maja 0116-80308
Marcato Maria 0113-25788
Molto Maja 0117-91490
Nontroppo Maistro 0119-12663
Obligato Osvald 0112-75541
Parlando Palle 0112-84460
Piano Pia 0111-10729
Portato Putte 0112-61412
Presto Pelle 0113-54895
Ritardando Rita 0117-20295
Staccato Stina 0112-12107
Subito Sune 0111-37574
Tempo Kalle 0114-95968
Unisono Uno 0113-16714
Virtuoso Vilhelm 0114-10931
Xelerando Axel 0113-89124
New code, as @Pshemo suggested:
public String load() {
try {
Scanner scanner = new Scanner(new File("reg.txt"));
while (scanner.hasNextLine()) {
String firstname = scanner.next();
String lastname = scanner.next();
String number = scanner.next();
list.add(new Entry(firstname, lastname, number));
}
msg = "The file reg.txt has been opened";
return msg;
} catch (NumberFormatException ne) {
msg = ("Can't find reg.txt");
return msg;
} catch (IOException ie) {
msg = ("Can't find reg.txt");
return msg;
}
}
I receive multiple errors, what's wrong?

Assuming that each line always contains exactly three words, you don't need split at all: you can simply call Scanner's next() method three times for each line.
Scanner scanner = new Scanner(new File("file.txt"));
int i = 0;
while (scanner.hasNext()) { // hasNext() avoids a NoSuchElementException when the file ends with a newline
String first = scanner.next();
String second = scanner.next();
String last = scanner.next();
//System.out.println(first+": "+second+": "+last);
i++;
}
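As a minimal sketch of the token-by-token approach, the class below reads from a String instead of a file so it can be run on its own. The class and method names (ScannerTriples, readTriples) are made up for illustration, and it assumes every record has exactly three whitespace-separated words. It checks hasNext() rather than hasNextLine(), so a trailing newline at the end of the input does not trigger another read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerTriples {
    // Reads whitespace-separated tokens three at a time.
    // Assumes the input holds a multiple of three tokens.
    static List<String[]> readTriples(String text) {
        List<String[]> rows = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                String first = scanner.next();
                String second = scanner.next();
                String last = scanner.next();
                rows.add(new String[] { first, second, last });
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String sample = "Brio Britta 0113-45771\nPiano Pia 0111-10729\n";
        for (String[] row : readTriples(sample)) {
            System.out.println(row[0] + ": " + row[1] + ": " + row[2]);
        }
    }
}
```

To read from a file instead, new Scanner(new File("file.txt")) can be used in place of new Scanner(text).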

line.split("\\s+"); // don't use " ". use "\\s+" for more than one whitespace

Assuming the line has 3+ words, use the split(delimiter) method:
String line = ...;
String[] parts = line.split("\\s+"); // Assuming words are separated by whitespaces, use another if required
Then you can access the first, second and last words respectively:
String first = parts[0];
String second = parts[1];
String last = parts[parts.length - 1]; // length is a field on arrays, not a method
Remember that indexes start at 0.
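Putting the split-based approach together, a small self-contained sketch might look like the following. The class and method names are hypothetical, and the trim() call and the length check are additions that assume you would rather fail loudly on a short line than hit an ArrayIndexOutOfBoundsException.

```java
import java.util.Arrays;

class ThreeWordSplit {
    // Splits a line on runs of whitespace and returns {first, second, last}.
    static String[] firstSecondLast(String line) {
        // trim() first: leading whitespace would otherwise yield an empty first element
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 3) {
            throw new IllegalArgumentException("Expected at least three words: " + line);
        }
        return new String[] { parts[0], parts[1], parts[parts.length - 1] };
    }

    public static void main(String[] args) {
        String[] words = firstSecondLast("Brio   Britta\t0113-45771");
        System.out.println(Arrays.toString(words)); // prints [Brio, Britta, 0113-45771]
    }
}
```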

String []parts=line.split("\\s+");
System.out.println(parts[0]);
System.out.println(parts[1]);
System.out.println(parts[parts.length-1]);

Related

How can I take a String with a name and two values and separate it into a String containing the name and 2 doubles containing the values?

I've written a class for a program designed to help manage a volleyball team's roster. The roster is contained in a .dat file and the players are written as follows:
Rachael Adams 3.36 1.93
My issue arises when I try to separate this string into the proper data types (the name being a string, then the first and second values being doubles for the stats).
public Roster(String filename) {
players = new ArrayList<Player>();
try {
FileReader fr = new FileReader(filename);
BufferedReader inFile = new BufferedReader (fr);
String line = inFile.readLine();
Scanner scan = new Scanner(line);
while(line != null) {
String firstName = scan.next();
String lastName = scan.next();
double attackStat = scan.nextDouble();
double blockStat = scan.nextDouble();
String name = firstName + " " + lastName;
Player newPlayer = new Player(name, attackStat, blockStat);
players.add(newPlayer);
line = inFile.readLine();
}
scan.close();
inFile.close();
} catch (IOException e) {
System.out.println(e);
}
}
The program throws this exception when a Roster object is created:
Exception in thread "main" java.util.InputMismatchException
at java.base/java.util.Scanner.throwFor(Scanner.java:939)
at java.base/java.util.Scanner.next(Scanner.java:1594)
at java.base/java.util.Scanner.nextDouble(Scanner.java:2564)
at Roster.<init>(Roster.java:30)
at Assignment08.openRosterFile(Assignment08.java:59)
at Assignment08.main(Assignment08.java:18)
I am new to Java and still on a learning curve, so if more information is needed, please let me know.
If at all possible, I would greatly appreciate an explanation as to what I did wrong rather than just a solution. Thank you very much.

I always find it easier to split the line:
String[] columns = line.split(" (?=\\d)");
String name = columns[0];
double attackStat = Double.parseDouble(columns[1]);
double blockStat = Double.parseDouble(columns[2]);
This works by splitting on a space, but only when the next char is a digit via the look ahead (?=\d).
This automatically caters for any number of words in the name.
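Here is a runnable sketch of the look-ahead split applied to the sample roster line. The class name PlayerLine is made up for illustration. It assumes the stats are non-negative and that names contain no digits directly after a space; a stat such as -1.5 or a name like "Agent 47" would need a different pattern.

```java
class PlayerLine {
    // Splits on a space only when the next character is a digit,
    // so a multi-word name stays together in columns[0].
    static String[] columns(String line) {
        return line.split(" (?=\\d)");
    }

    public static void main(String[] args) {
        String[] c = columns("Rachael Adams 3.36 1.93");
        String name = c[0];
        double attackStat = Double.parseDouble(c[1]);
        double blockStat = Double.parseDouble(c[2]);
        System.out.println(name + " | " + attackStat + " | " + blockStat);
        // prints Rachael Adams | 3.36 | 1.93
    }
}
```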

Why doesn't my program recognize the last names properly?

The scanner reads the wrong data. The text file format is:
111，Smith，Sam, 40，10.50
330，Jones，Jennifer，30，10.00
The program is:
public class P3 {
public static void main(String[] args) {
String file=args[0];
File fileName = new File(file);
try {
Scanner sc = new Scanner(fileName).useDelimiter(", ");
while (sc.hasNextLine()) {
if (sc.hasNextInt( ) ){ int id = sc.nextInt();}
String lastName = sc.next();
String firstName = sc.next();
if (sc.hasNextInt( ) ){ int hours = sc.nextInt(); }
if (sc.hasNextFloat()){ float payRate=sc.nextFloat(); }
System.out.println(firstName);
}
sc.close();
} catch(FileNotFoundException e) {
System.out.println("Can't open file "
+ fileName + " ");
}
}
}
The output is:
40，10.50
330，Jones，Jennifer，30，10.00
It is supposed to be:
Sam
Jennifer
How do I fix it?

The problem is that your data isn't just delimited by commas. It is also delimited by line-endings, and also by Unicode character U+FF0C (FULLWIDTH COMMA).
I took your code, replaced the line
Scanner sc = new Scanner(fileName).useDelimiter(", ");
with
Scanner sc = new Scanner(fileName, "UTF-8").useDelimiter(", |\r\n|\n|\uff0c");
and then ran it. It produced the output it was supposed to.
The text , |\r\n|\n|\uff0c is a regular expression that matches either:
a comma followed by a space,
a carriage-return (\r) followed by a newline (\n),
a newline on its own,
a Unicode full-width comma (\uff0c).
These are the characters we want to delimit the text by. I've specified both types of line-ending as I'm not sure which line-endings your file uses.
I've also set the scanner to use the UTF-8 encoding when reading from the file. I don't know whether that will make a difference for you, but on my system UTF-8 isn't the default encoding so I needed to specify it.
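To see the combined delimiter at work without a file, the sketch below feeds the two sample records to a Scanner from a String. The class and method names are hypothetical, the full-width commas are written as Unicode escapes so they stay visible in the source, and it assumes every record has exactly five fields. Fields are read as plain strings with next(), because nextInt() and nextFloat() depend on the Scanner's locale.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class MixedDelimiters {
    // Collects the third field (the first name) of each five-field record.
    static List<String> firstNames(String data) {
        List<String> names = new ArrayList<>();
        // Delimiters: comma+space, CRLF, LF, or the full-width comma U+FF0C
        try (Scanner sc = new Scanner(data).useDelimiter(", |\r\n|\n|\uff0c")) {
            while (sc.hasNext()) {
                sc.next();            // id
                sc.next();            // last name
                names.add(sc.next()); // first name
                sc.next();            // hours
                sc.next();            // pay rate
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String data = "111\uff0cSmith\uff0cSam, 40\uff0c10.50\n"
                    + "330\uff0cJones\uff0cJennifer\uff0c30\uff0c10.00\n";
        System.out.println(firstNames(data)); // prints [Sam, Jennifer]
    }
}
```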

First, please swap the names fileName and file. Next, I suggest you use a try-with-resources statement. Your variables need to be declared in a common scope if you intend to use them later. Finally, when using hasNextLine() I would then call nextLine(), and you can split each line on a comma with optional surrounding white space. That could look something like
String fileName = // ...
File file = new File(fileName);
try (Scanner sc = new Scanner(file)) {
while (sc.hasNextLine()) {
String line = sc.nextLine();
String[] arr = line.split("\\s*,\\s*");
int id = Integer.parseInt(arr[0]);
String lastName = arr[1];
String firstName = arr[2];
int hours = Integer.parseInt(arr[3]);
float payRate = Float.parseFloat(arr[4]);
System.out.println(firstName);
}
} catch (FileNotFoundException e) {
System.out.println("Can't open file " + fileName + " ");
e.printStackTrace();
}

Java - take name from string

I'm developing a Java application that computes some statistics.
The application takes all of its data from a .txt file supplied by the user.
The first line of that file contains the names of the data sets that follow, like this:
velx,vely,velz
//various data
I need to analyze that first line and retrieve the three variable names. I correctly get the first two, but I'm not able to get the last one.
Here is the code that gets the names:
public ArrayList<String> getTitle(){
// the ArrayList is originally declared in the class header, not here;
// I copied it here to make the code easier to understand
ArrayList<String> title = new ArrayList<String>();
try {
InputStreamReader isr = new InputStreamReader(in);
BufferedReader br = new BufferedReader(isr);
StringBuilder sb = new StringBuilder();
int titleN = 0;
String line = br.readLine(); //read the first line of file
String temp;
System.out.println(ManageTable.class.getName() + " Line: " + line);
int c = line.length();
for(int i = 0; i <c; i++){
if((line.charAt(i) == ',') || **ANOTHER CONDITION** ){
temp = sb.toString();
System.out.println(ManageTable.class.getName() +" Temp is: " + temp);
title.add(temp);
System.out.println(ManageTable.class.getName() + " Title added");
sb.delete(0, sb.length());
}else{
sb.append(line.charAt(i));
}
}
} catch (IOException ex) {
Logger.getLogger(ManageTable.class.getName()).log(Level.SEVERE, null, ex);
}
return title;
}
I need to add a second condition to the if statement in order to detect when the line has ended and save the last name, even though it is not followed by ','.
I tried using:
if((line.charAt(i) == ',') || (i==c))
but the name I get is always missing a character.
How can I check for the end of the line and so get the full name?

If line contains just three names separated by comma, you can do
String[] names = line.split(",");

No need for all this looping. You can just split the line around the comma to get an array:
String[] names = line.split(",");
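For completeness, a small runnable sketch of the split-based version follows. The class and method names are invented for illustration. As for the original loop: i runs from 0 to c - 1, so a test for i == c is never true inside it, and the last name stays in the StringBuilder unless it is added after the loop ends. Splitting avoids that bookkeeping. The pattern used here also tolerates spaces around the commas, which is an assumption about the file rather than something the question states.

```java
import java.util.Arrays;
import java.util.List;

class HeaderNames {
    // Splits a header line such as "velx,vely,velz" into its column names.
    static List<String> titles(String headerLine) {
        return Arrays.asList(headerLine.trim().split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        System.out.println(titles("velx,vely,velz")); // prints [velx, vely, velz]
    }
}
```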

removing stop words after tokenizing string in java

I want to remove stop words after tokenizing a string. I have an external .txt file of stop words, which I read and then compare with the tokenized string. If a token is equal to a stop word, it should be removed.
Here is the code for tokenizing:
try{
while ((msg =readBufferData.readLine()) != null) {
int numberOfTokens;
System.out.println("Before: "+msg);
StringTokenizer tokens = new StringTokenizer(msg);
numberOfTokens = tokens.countTokens();
System.out.println("Tokens: "+numberOfTokens);
System.out.print("After : ");
while (tokens.hasMoreTokens()) {
msg = tokens.nextToken();
String msgLower = msg.toLowerCase();
String punctuationremove = punctuationRemover(msgLower);
// buffWriter.write(punctuationremove+" "); --> write into file .txt
System.out.print(punctuationremove+" ");
removingStopWord(punctuationremove, readStopWordsFile());
numberOfTotalTokens++;
}
// buffWriter.newLine(); make a new line after tokening new message
System.out.println("\n");
numberOfMessages++;
}
// write close buffWriter.close();
System.out.println("Total Tokens: "+numberOfTotalTokens);
System.out.println("Total Messages: "+numberOfMessages);
}
catch (Exception e){
System.out.println("Error Exception: "+e.getMessage());
}
I also have code for reading the stop-word file:
public static Set<String> readStopWordsFile() throws FileNotFoundException, IOException{
String fileStopWords = "\\stopWords.txt";
Set<String> stopWords = new LinkedHashSet<String>();
FileReader readFileStopWord = new FileReader(fileStopWords);
BufferedReader stopWordsFile = new BufferedReader(readFileStopWord);
String line;
while((line = stopWordsFile.readLine())!=null){
line = line.trim();
stopWords.add(line);
}
stopWordsFile.close();
return stopWords;
}
How can I compare each token with the set of stop words and delete the tokens that match a stop word? Can you help me? Thank you.

You can simply read the stop words first and then check whether your token is a stopword.
Set<String> stopWords = readStopWordsFile();
// some file reading logic
while (tokens.hasMoreTokens()) {
msg = tokens.nextToken();
if(stopWords.contains(msg)){
continue; // skip over a stopword token
}
}
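A self-contained sketch of the same idea is shown below. The class name, method name and the tiny stop-word set are made up for illustration; in the question's program the set would come from readStopWordsFile(), loaded once before the loop rather than once per token. Tokens are lower-cased before the lookup, which assumes the stop-word file is lower-case too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.StringTokenizer;

class StopWordFilter {
    // Returns the lower-cased tokens of msg that are not in stopWords.
    static List<String> removeStopWords(String msg, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(msg);
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken().toLowerCase(Locale.ROOT);
            if (stopWords.contains(token)) {
                continue; // skip over a stop-word token
            }
            kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "is", "a"));
        System.out.println(removeStopWords("The cat is on a mat", stopWords));
        // prints [cat, on, mat]
    }
}
```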

Read multiline text with values separated by whitespaces

I have the following test file:
Jon Smith 1980-01-01
Matt Walker 1990-05-12
What is the best way to parse each line of this file, creating an object with (name, surname, birthdate)? Of course this is just a sample; the real file has many records.

import java.io.*;
class Record
{
String first;
String last;
String date;
public Record(String first, String last, String date){
this.first = first;
this.last = last;
this.date = date;
}
public static void main(String args[]){
try{
FileInputStream fstream = new FileInputStream("textfile.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
while ((strLine = br.readLine()) != null) {
String[] tokens = strLine.split(" ");
Record record = new Record(tokens[0],tokens[1],tokens[2]);//process record , etc
}
in.close();
} catch (Exception e){
System.err.println("Error: " + e.getMessage());
}
}
}

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class ScannerReadFile {
public static void main(String[] args) {
//
// Create an instance of File for the input file.
//
File file = new File("testfile.txt");
try {
//
// Create a new Scanner object which will read the data from the
// file passed in. To check whether there are more lines to read,
// we call the scanner.hasNextLine() method. We then read the
// lines one by one until all of them have been read.
//
Scanner scanner = new Scanner(file);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
System.out.println(line);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
}
This:
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
could also be changed to
while (scanner.hasNext()) {
String line = scanner.next();
which reads one whitespace-delimited token at a time instead of a whole line.
You could also do
Scanner scanner = new Scanner(file).useDelimiter(",");
to set a custom delimiter.
As of this post, you have three different ways to do this. You just need to parse out the data you need: either read each line and split it, or read the tokens one by one, where every third token completes a line (a new person).

At first glance, I would suggest that StringTokenizer would be your friend here. But having some experience doing this for real in business applications, what you probably cannot guarantee is that the surname is a single word (someone with a double-barrelled surname that is not hyphenated would cause you problems).
If you can guarantee the integrity of the data, then your code would be
BufferedReader read = new BufferedReader(new FileReader("yourfile.txt"));
String line = null;
while( (line = read.readLine()) != null) {
StringTokenizer tokens = new StringTokenizer(line);
String firstname = tokens.nextToken();
...etc etc
}
If you cannot guarantee the integrity of your data, then you would need to find the first space and take all characters before it as the first name, find the last space and take all characters after it as the DOB, and treat everything in between as the surname.
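The first-space / last-space idea can be sketched with indexOf and lastIndexOf. The class name, the method name and the sample record with a multi-word surname are hypothetical, and the sketch assumes the first name is always a single word and the date never contains a space.

```java
class NameDateParser {
    // Returns {firstName, surname, dateOfBirth}; the surname may contain spaces.
    static String[] parse(String line) {
        String s = line.trim();
        int firstSpace = s.indexOf(' ');
        int lastSpace = s.lastIndexOf(' ');
        if (firstSpace < 0 || firstSpace == lastSpace) {
            throw new IllegalArgumentException("Expected at least three fields: " + line);
        }
        String firstName = s.substring(0, firstSpace);
        String surname = s.substring(firstSpace + 1, lastSpace).trim();
        String dateOfBirth = s.substring(lastSpace + 1);
        return new String[] { firstName, surname, dateOfBirth };
    }

    public static void main(String[] args) {
        String[] r = parse("Anna Van Der Berg 1975-03-09"); // made-up record
        System.out.println(r[0] + " | " + r[1] + " | " + r[2]);
        // prints Anna | Van Der Berg | 1975-03-09
    }
}
```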

Use a FileReader to read characters from the file, and wrap it in a BufferedReader to buffer those characters so you can read them as lines. Then you have a choice. Personally I'd use String.split() to split on whitespace, giving you a nice String array; you could also tokenize the string.
Of course, you'd have to think about what should happen if someone has a middle name and so on.

Look at the BufferedReader class. It has a readLine() method. You can then split each line on spaces to get each individual field.
