Array IndexOutOfBoundsException on Textfile Parse - java

I have a simple textfile:
John Jobs 225 Louis Lane Road
Amy Jones 445 Road Street
Corey Dol 556 The Road
Each line holds a person's first name, last name, and address.
I'm trying to parse the lines like this:
public void parseText() {
try {
File file = new File("test.txt");
String[] splitted;
Scanner sc = new Scanner(file);
while (sc.hasNextLine()) {
String s = sc.nextLine();
splitted = s.split("\\s+");
System.out.println(splitted[0]);
}
sc.close();
} catch (FileNotFoundException e) {
System.out.println("Error"); }
}
splitted[0] works fine and prints out the first names of the people.
splitted[1] prints out the last names, but then gives me an IndexOutOfBoundsException.
splitted[2] prints out the first integer value of each address, but again gives me an exception.
So then I tried doing this:
String[] splitted = new String[4];
and once again tried accessing any index greater than 0, but still got that problem.
What am I doing wrong?

This is your file's content :
John Jobs 225 Louis Lane Road
Amy Jones 445 Road Street
Corey Dol 556 The Road
When each line is read and split, splitted contains 6 elements for the first line and 5 for each of the following lines. Any index you use has to be valid for the current line, so check it against splitted.length; otherwise you will get an IndexOutOfBoundsException.
A better approach is to use a for-each loop:
while (sc.hasNextLine()) {
String s = sc.nextLine();
splitted = s.split("\\s+");
//System.out.println(Arrays.toString(splitted));
for (String string : splitted) {
System.out.print(string+" ");
}
System.out.println();
.....rest of code
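If a fixed index is still needed, checking the array length before indexing avoids the exception. The sketch below is only an illustration: the helper name tokenAt and the fallback value are made up for this example, and the blank line in the sample data is an assumption about the kind of line that can trigger the error (the question does not confirm it).

```java
public class SafeSplit {
    // Hypothetical helper: returns the token at the given index,
    // or the fallback if the line is blank or has too few tokens.
    static String tokenAt(String line, int index, String fallback) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return fallback; // a blank line has no usable tokens
        }
        String[] parts = trimmed.split("\\s+");
        return index < parts.length ? parts[index] : fallback;
    }

    public static void main(String[] args) {
        // The empty string stands in for a blank line in the file (an assumption).
        String[] lines = {"John Jobs 225 Louis Lane Road", "", "Amy Jones 445 Road Street"};
        for (String line : lines) {
            System.out.println(tokenAt(line, 1, "<missing>"));
        }
    }
}
```

With this guard, a line that is too short yields the fallback instead of stopping the program.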

Related

Java - String splitting

I read a text file with data in the following format: Name Address Hobbies
Example: Bob Smith ABC Street Swimming
I assigned the line to a String z.
Then I used z.split with " " (a space) as the delimiter to separate the fields, but it split Bob Smith into two different strings when it should be one field. The same happens with the address. Is there a method I can use to get the fields in the format I want?
P.S Apologies if I explained it vaguely, English isn't my first language.
String z;
try {
BufferedReader br = new BufferedReader(new FileReader("desc.txt"));
z = br.readLine();
} catch(IOException io) {
io.printStackTrace();
}
String[] temp = z.split(" ");
If the format of name and address parts is fixed to consist of two parts, you could just join them:
String z = ""; // z must be initialized
// use try-with-resources to ensure the reader is closed properly
try (BufferedReader br = new BufferedReader(new FileReader("desc.txt"))) {
z = br.readLine();
} catch(IOException io) {
io.printStackTrace();
}
String[] temp = z.split(" ");
String name = String.join(" ", temp[0], temp[1]);
String address = String.join(" ", temp[2], temp[3]);
String hobby = temp[4];
Another option could be to create a format string as a regular expression and use it to parse the input line using named groups (?<group_name>capturing text):
// use named groups to define parts of the line
Pattern format = Pattern.compile("(?<name>\\w+\\s\\w+)\\s(?<address>\\w+\\s\\w+)\\s(?<hobby>\\w+)");
Matcher match = format.matcher(z);
if (match.matches()) {
String name = match.group("name");
String address = match.group("address");
String hobby = match.group("hobby");
System.out.printf("Input line matched: name=%s address=%s hobby=%s%n", name, address, hobby);
} else {
System.out.println("Input line not matching: " + z);
}
I can think of three solutions.
In order from best to worst:
1. Use a different delimiter.
2. Enforce the format to always have two names, two address parts and one hobby.
3. Have a dictionary with names and hobbies, check each word to determine which type it is, and then group them together as needed.
(The 3rd option is not meant as a serious alternative.)
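As a rough sketch of the first option, suppose the file could be written with a semicolon between fields. The semicolon and the sample line are assumptions for illustration only; any character that never occurs inside the data would do.

```java
public class DelimitedLine {
    // Splits a line such as "Bob Smith;ABC Street;Swimming" into its fields.
    // The limit -1 keeps trailing empty fields instead of dropping them.
    static String[] parse(String line) {
        return line.split(";", -1);
    }

    public static void main(String[] args) {
        String[] fields = parse("Bob Smith;ABC Street;Swimming");
        System.out.println("name=" + fields[0]);
        System.out.println("address=" + fields[1]);
        System.out.println("hobby=" + fields[2]);
    }
}
```

Because spaces are no longer delimiters, names and addresses of any length stay in one field.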
As others have mentioned, using spaces as both field delimiter and inside fields is problematic. You could use a regex pattern to split the line (paste (\w+ \w+) (\w+ \w+) (.+) in Regex101 for an explanation):
Pattern pattern = Pattern.compile("(\\w+ \\w+) (\\w+ \\w+) (.+)");
Matcher matcher = pattern.matcher("Bob Smith ABC Street Bowling Fishing Rollerblading");
System.out.println("matcher.matches() = " + matcher.matches());
for (int i = 0; i <= matcher.groupCount(); i++) {
System.out.println("matcher.group(" + i + ") = " + matcher.group(i));
}
This would give the following output:
matcher.matches() = true
matcher.group(0) = Bob Smith ABC Street Bowling Fishing Rollerblading
matcher.group(1) = Bob Smith
matcher.group(2) = ABC Street
matcher.group(3) = Bowling Fishing Rollerblading
However this only works for this exact format. If you get a line with three name parts for example:
John B Smith ABC Street Swimming
This will get split into John B as the name, Smith ABC as the address and Street Swimming as hobbies.
So either make 100% sure your input will always match this format or use a different delimiter.
The split() method mainly works with two things:
the delimiter and
the String object.
Sometimes it also takes a limit.
Whatever limit you provide, split() will cut the string accordingly.
It has no notion of whether the substring on the left is a name or not, and the same goes for the substring on the right.
Have a look at this code snippet:
String assets = "Gold:Stocks:Fixed Income:Commodity:Interest Rates";
String[] splits = assets.split(":");
System.out.println("splits.size: " + splits.length);
for (String asset : splits) {
System.out.println(asset); // print the current element, not the whole string
}
Output
splits.size: 5
Gold
Stocks
Fixed Income // with space
Commodity
Interest Rates // with space
The output keeps the spaces because I provided : as the delimiter, so only the colons separate the fields.
Hopefully this helps you find your answer.
Find Detailed Information on Split():
Top 5 Use cases of Split()
Java Docs : Split()
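To illustrate the limit parameter mentioned above, here is a small sketch using the sample line from the question. The class and method names are made up for this example. A positive limit n means the result has at most n elements, and the last element holds the rest of the string unsplit.

```java
import java.util.Arrays;

public class SplitLimit {
    // Thin wrapper around String.split(regex, limit) for demonstration.
    static String[] splitWithLimit(String s, int limit) {
        return s.split(" ", limit);
    }

    public static void main(String[] args) {
        String z = "Bob Smith ABC Street Swimming";
        // prints [Bob, Smith, ABC Street Swimming]
        System.out.println(Arrays.toString(splitWithLimit(z, 3)));
    }
}
```

The limit controls how many pieces are produced, but it still cannot tell a name from an address.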
It depends on the data you're dealing with. Will the name always consist of a first and last name? Then you can simply combine the first two elements from the resulting array into a new string.
Otherwise, you might have to find a different way to separate out the different pieces within the txt file. Possibly a comma? Some character that you know won't ever be used in your normal data.
Assuming that every line follows the format
Bob Smith ABC Street Swimming
i.e. first name, then surname, then the rest, this code merges the first two tokens into one field:
String[] temp = z.split(" ");
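String[] temp2 = new String[temp.length - 1];
// join first name and surname into a single field
temp2[0] = temp[0] + " " + temp[1];
// shift the remaining tokens one position to the left
for (int i = 2; i < temp.length; i++) {
temp2[i - 1] = temp[i];
}
temp = temp2;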

How to access each element after a split

I am trying to read from a text file and split it into three separate categories. ID, address, and weight. However, whenever I try to access the address and weight I have an error. Does anyone see the problem?
import java.io.*;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.*;
class Project1
{
public static void main(String[] args)throws Exception
{
List<String> list = new ArrayList<String>();
List<String> packages = new ArrayList<String>();
List<String> addresses = new ArrayList<String>();
List<String> weights = new ArrayList<String>();
//Provide the file path
File file = new File(args[0]);
//Reads the file
BufferedReader br = new BufferedReader(new FileReader(file));
String str;
while((str = br.readLine()) != null)
{
if(str.trim().length() > 0)
{
//System.out.println(str);
//Splits the string by commas and trims whitespace
String[] result = str.trim().split("\\s*,\\s*", 3);
packages.add(result[0]);
//ERROR: Doesn't know what result[1] or result[2] is.
//addresses.add(result[1]);
//weights.add(result[2]);
System.out.println(result[0]);
//System.out.println(result[1]);
//System.out.println(result[2]);
}
}
for(int i = 0; i < packages.size(); i++)
{
System.out.println(packages.get(i));
}
}
}
Here is the text file (The format is intentional):
,123-ABC-4567, 15 W. 15th St., 50.1
456-BgT-79876, 22 Broadway, 24
QAZ-456-QWER, 100 East 20th Street, 50
Q2Z-457-QWER, 200 East 20th Street, 49
678-FGH-9845 ,, 45 5th Ave,, 12.2,
678-FGH-9846,45 5th Ave,12.2
123-A BC-9999, 46 Foo Bar, 220.0
347-poy-3465, 101 B'way,24
,123-FBC-4567, 15 West 15th St., 50.1
678-FGH-8465 45 5th Ave 12.2
Your data is inconsistent: some lines start with an unneeded comma, some use several commas as a delimiter, and one line has no commas at all and uses spaces instead. You will need a regex that handles all of these cases. This one captures the three fields for your data:
([\w- ]+?)[ ,]+([\w .']+)[ ,]+([\d.]+)
Here is the explanation for above regex,
([\w- ]+?) - Captures ID data which consists of word characters hyphen and space and places it in group1
[ ,]+ - This acts as a delimiter where it can be one or more space or comma
([\w .']+) - This captures address data which consists of word characters, space, . and ' and places it in group2
[ ,]+ - Again the delimiter as described above
([\d.]+) - This captures the weight data which consists of numbers and . and places it in group3
Here is the modified Java code. I've removed some of your variable declarations; add them back as needed. The code uses a Matcher object to capture each field and then prints the information.
Pattern p = Pattern.compile("([\\w- ]+?)[ ,]+([\\w .']+)[ ,]+([\\d.]+)");
// Reads the file
try (BufferedReader br = new BufferedReader(new FileReader("data1.txt"))) {
String str;
while ((str = br.readLine()) != null) {
Matcher m = p.matcher(str);
if (m.matches()) {
System.out.println(String.format("Id: %s, Address: %s, Weight: %s",
new Object[] { m.group(1), m.group(2), m.group(3) }));
}
}
}
Prints,
Id: 456-BgT-79876, Address: 22 Broadway, Weight: 24
Id: QAZ-456-QWER, Address: 100 East 20th Street, Weight: 50
Id: Q2Z-457-QWER, Address: 200 East 20th Street, Weight: 49
Id: 678-FGH-9845, Address: 45 5th Ave, Weight: 12.2
Id: 678-FGH-9846, Address: 45 5th Ave, Weight: 12.2
Id: 123-A BC-9999, Address: 46 Foo Bar, Weight: 220.0
Id: 347-poy-3465, Address: 101 B'way, Weight: 24
Id: 678-FGH-8465, Address: 45 5th Ave, Weight: 12.2
Let me know if this works for you and if you have any further questions.
The last line of your file contains no commas, so it is a single token. split therefore returns an array with only one element, and result[1] does not exist.
A minimal reproducing example:
import java.io.*;
class Project1 {
public static void main(String[] args) throws Exception {
//Provide the file path
File file = new File(args[0]);
//Reads the file
BufferedReader br = new BufferedReader(new FileReader(file));
String str;
while ((str = br.readLine()) != null) {
if (str.trim().length() > 0) {
String[] result = str.trim().split("\\s*,\\s*", 3);
System.out.println(result[1]);
}
}
}
}
With this input file:
678-FGH-8465 45 5th Ave 12.2
The output looks like this:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at Project1.main(a.java:22)
Process finished with exit code 1
So you will have to decide what your program should do in such cases. You might ignore those lines, print an error, or only add the first token to one of your lists.
You can add the following checks to your code:
if (result.length > 0) {
packages.add(result[0]);
}
if (result.length > 1) {
addresses.add(result[1]);
}
if (result.length > 2) {
weights.add(result[2]);
}
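One thing to keep in mind with independent length checks is that the three lists can end up with different sizes, so index i no longer refers to the same package in each list. A possible alternative, sketched here under the assumption that incomplete lines should simply be skipped, is to add a record only when all three fields are present. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class PackageLines {
    // Adds the three fields only if the line really has three of them,
    // so the lists stay the same length. Returns false for skipped lines.
    static boolean addRecord(String line, List<String> packages,
                             List<String> addresses, List<String> weights) {
        String[] result = line.trim().split("\\s*,\\s*", 3);
        if (result.length < 3) {
            return false;
        }
        packages.add(result[0]);
        addresses.add(result[1]);
        weights.add(result[2]);
        return true;
    }

    public static void main(String[] args) {
        List<String> packages = new ArrayList<>();
        List<String> addresses = new ArrayList<>();
        List<String> weights = new ArrayList<>();
        String[] lines = {"678-FGH-9846,45 5th Ave,12.2", "678-FGH-8465 45 5th Ave 12.2"};
        for (String line : lines) {
            if (!addRecord(line, packages, addresses, weights)) {
                System.out.println("Skipped malformed line: " + line);
            }
        }
        System.out.println(packages + " " + addresses + " " + weights);
    }
}
```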

java: splitting one array into two separate arrays based on even and odd positions of the array

I'm new to Java and I'm having difficulties. I have an assignment that requires me to load a text file in which each state name is followed by its capital, and to read the state names into one array and the capital names into another. The way I tackled this was to load the text file into one array called total and keep a count. I wanted to move the entries at even positions into a separate array called capitals and those at odd positions into an array called states, but I'm not sure how exactly to put that into code. This is what I have so far.
Sample of Text File:
Alabama
Montgomery
Alaska
Juneau
Arizona
Phoenix
Arkansas
Little Rock
California
Sacramento
Colorado
Denver
Connecticut
Hartford
Delaware
Dover
Florida
Tallahassee
Georgia
Atlanta
Hawaii
Honolulu
And my code so far
public class StateCapitals
{
/**
* @param args the command line arguments
* @throws java.io.FileNotFoundException
*/
public static void main(String[] args) throws FileNotFoundException
{
File inputfile;
File outputfile;
inputfile = new File("capitals.txt");
outputfile = new File ("InOrder.txt");
String stateandcity;
int count;
count = 1;
PrintWriter pw;
Scanner kb;
kb = new Scanner(inputfile);
String [] total;
total = new String[100];
String [] capitals;
capitals = new String[50];
String [] states;
states = new String [50];
while (kb.hasNextLine())
{
stateandcity = kb.nextLine();
System.out.println("Count: " +count + " " + stateandcity);
total[count-1] = stateandcity;
count ++;
}
if (count % 2 == 0)
states = new String [50]; //where i need help
}}
The algorithm will be like this:
Read everything into total like you have already thought of.
Use a for loop starting at i=0 and running while i is less than the number of items read, incrementing by 2 each time.
Assign total[i] to states[i / 2], because with zero-based indexes the state names sit at the even positions.
Assign total[i + 1] to capitals[i / 2].
It is as simple as that! Try doing it yourself first. If you are having difficulties, just leave a comment!
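For reference, the steps above can be sketched as follows, assuming (as in the sample file) that each state name comes first and its capital follows on the next line. The class and method names are made up for this example, and n stands for the number of lines actually read.

```java
import java.util.Arrays;

public class SplitEvenOdd {
    // Returns {states, capitals} built from the first n entries of total.
    static String[][] split(String[] total, int n) {
        String[] states = new String[n / 2];
        String[] capitals = new String[n / 2];
        // i + 1 < n guards against an odd number of lines
        for (int i = 0; i + 1 < n; i += 2) {
            states[i / 2] = total[i];       // even index: state name
            capitals[i / 2] = total[i + 1]; // odd index: its capital
        }
        return new String[][] {states, capitals};
    }

    public static void main(String[] args) {
        String[] total = {"Alabama", "Montgomery", "Alaska", "Juneau"};
        String[][] result = split(total, total.length);
        System.out.println(Arrays.toString(result[0])); // prints [Alabama, Alaska]
        System.out.println(Arrays.toString(result[1])); // prints [Montgomery, Juneau]
    }
}
```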
I would separate them while reading them like this. (Save yourself a loop)
while (kb.hasNextLine())
{
// count must start at 0 here (the question's code starts it at 1)
states[count] = kb.nextLine();
capitals[count] = kb.nextLine();
System.out.println("Count: " + count + " " +
states[count] + "," +
capitals[count]);
count ++;
}

Java Scanner Class useDelimiter Method

I have to read from a text file containing all the NCAA Division 1 championship games since 1933.
The file is in this format:
1939:Villanova:42:Brown:30
1945:New York University:70:Ohio State:65
The fact that some university names contain white space is giving me a lot of trouble, because we are only supposed to read the school names and discard the year, the points and the colons. I do not know if I have to use a delimiter that discards white space, but the bottom line is that I am very lost.
I am slightly familiar with the useDelimiter method, but I have read that .split("") might be useful. I am having a great deal of trouble due to my lack of knowledge of patterns.
THIS IS WHAT I HAVE SO FAR:
class NCAATeamTester
{
public static void main(String[]args)throws IOException
{
NCAATeamList myList = new NCAATeamList(); //ArrayList containing teams
Scanner in = new Scanner(new File("ncaa2012.data"));
in.useDelimiter("[A-Za-z]+"); //String Delimeter excluding non alphabetic chars or ints
while(in.hasNextLine()){
String line = in.nextLine();
String name = in.next(line);
String losingTeam = in.next(line);
//Creating team object with winning team
NCAATeamStats win = new NCAATeamStats(name);
myList.addToList(win); //Adds to List
//Creating team object with losing team
NCAATeamStats lose = new NCAATeamStats(losingTeam);
myList.addToList(lose)
}
}
}
What about
String[] spl = line.split(":"); // split takes a String regex, not a char
String name1 = spl[1];
String name2 = spl[3];
?
Or, if there are several records on the same line, use regular expressions:
String line = "1939:Villanova:42:Brown:30 1945:New York University:70:Ohio State:65";
Pattern p = Pattern.compile("(.*?:){4}[0-9]+");
Matcher m = p.matcher(line);
while (m.find())
{
String[] spl = m.group().split(":");
String name = spl[1];
String name2 = spl[3];
}
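Since the question asks about useDelimiter specifically, here is one possible sketch with Scanner. It assumes every record has exactly the five fields shown in the question (year, winner, score, loser, score); a malformed record would make next() throw NoSuchElementException. The class and method names are made up for this example, and the data is passed as a String instead of a File to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TeamNames {
    // Collects winner and loser names, skipping years and scores.
    static List<String> teamNames(String data) {
        List<String> names = new ArrayList<>();
        try (Scanner in = new Scanner(data)) {
            // Colons and line breaks separate tokens; spaces stay inside names.
            in.useDelimiter("[:\\r\\n]+");
            while (in.hasNext()) {
                in.next();            // year, discarded
                names.add(in.next()); // winning team
                in.next();            // winner's points, discarded
                names.add(in.next()); // losing team
                in.next();            // loser's points, discarded
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String data = "1939:Villanova:42:Brown:30\n"
                + "1945:New York University:70:Ohio State:65\n";
        // prints [Villanova, Brown, New York University, Ohio State]
        System.out.println(teamNames(data));
    }
}
```

The delimiter lists the characters to throw away, which is the opposite of the pattern in the question, where the letters themselves were used as the delimiter.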

InputMismatchException Error in Java

public static void readStaffsFromFile() {
String inFileName = "startup.txt";
int numStaff, staffID;
String name, address;
Staff newStaff;
boolean fileExists;
Scanner inFile = null;
File databaseFile = new File(inFileName);
fileExists = databaseFile.exists();
if (fileExists) {
try {
inFile = new Scanner(databaseFile);
} catch (FileNotFoundException fnfe) {
JOptionPane.showMessageDialog(null, "The file startup.txt has just now been deleted.");
return; // cannot do anything more.
}
numStaff = inFile.nextInt();
inFile.nextLine();
for (int i = 0; i < numStaff; i++) {
staffID = inFile.nextInt();
name = inFile.nextLine();
address = inFile.nextLine();
// try{
newStaff = new Staff(staffID, name, address);
addStaff(newStaff);
// } catch (StaffException se)
// {
// System.out.println("Unable to add staff: " + name +
// " to the system.");
// }
}
}
JOptionPane.showMessageDialog(null, "System has been set up with default data from startup.txt.");
}
I have this method and when I try to call this method from main, it gives me this error.
Exception in thread "main" java.util.InputMismatchException
at java.util.Scanner.throwFor(Scanner.java:909)
at java.util.Scanner.next(Scanner.java:1530)
at java.util.Scanner.nextInt(Scanner.java:2160)
at java.util.Scanner.nextInt(Scanner.java:2119)
at SystemStartUp.readStaffsFromFile(SystemStartUp.java:195)
at SystemStartUp.loadFromFile(SystemStartUp.java:160)
at StartUp.main(StartUp.java:9)
The stack trace says that the error starts at the line "staffID = inFile.nextInt();".
The input file looks like this.
13
11111111
Chris Ling
999 Dandenong Road
22222222
Des Casey
100 Silly Drive
33333333
Maria Indrawan
90 Programming Road
44444444
Campbell Wilson
2/5 Database Street
55555555
Janet Fraser
21 Web Drive
66666666
Judy Sheard
34 Hos Road
77777777
Ngoc Minh
24 Message Street
88888888
Martin Atchinson
45 Martine Street
99999999
Aleisha Matthews
1/6 Admin Road
10101010
Denyse Cove
100 Reception Street
12121212
Cornelia Liou
232 Reception Road
23232323
Trudi Robinson
111 Manager Street
34343434
Henry Linger
2/4 HDR Street
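Probably the staffID is not always numeric. Please check the input file.
After reading staffID with nextInt(), you have to call inFile.nextLine(); to consume the newline character that follows the number. Otherwise name receives the empty remainder of the ID line, address receives the name, and in a later iteration nextInt() is called on non-numeric text and throws the InputMismatchException.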
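A minimal sketch of the corrected read loop is shown below. The Staff class from the question is not available here, so the sketch stores each record as a formatted string instead; the class name, method name and the " | " separator are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class StaffReader {
    // Reads a count, then that many (id, name, address) triples.
    static List<String> readStaff(Scanner in) {
        List<String> records = new ArrayList<>();
        int numStaff = in.nextInt();
        in.nextLine(); // consume the rest of the count line
        for (int i = 0; i < numStaff; i++) {
            int staffID = in.nextInt();
            in.nextLine(); // consume the newline after the ID
            String name = in.nextLine();
            String address = in.nextLine();
            records.add(staffID + " | " + name + " | " + address);
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "2\n11111111\nChris Ling\n999 Dandenong Road\n"
                + "22222222\nDes Casey\n100 Silly Drive\n";
        for (String record : readStaff(new Scanner(data))) {
            System.out.println(record);
        }
    }
}
```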
