I am trying to read from a text file and split each line into three separate fields: ID, address, and weight. However, whenever I try to access the address and weight, I get an error. Does anyone see the problem?
import java.io.*;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.*;
class Project1
{
    public static void main(String[] args) throws Exception
    {
        List<String> list = new ArrayList<String>();
        List<String> packages = new ArrayList<String>();
        List<String> addresses = new ArrayList<String>();
        List<String> weights = new ArrayList<String>();
        //Provide the file path
        File file = new File(args[0]);
        //Reads the file
        BufferedReader br = new BufferedReader(new FileReader(file));
        String str;
        while((str = br.readLine()) != null)
        {
            if(str.trim().length() > 0)
            {
                //System.out.println(str);
                //Splits the string by commas and trims whitespace
                String[] result = str.trim().split("\\s*,\\s*", 3);
                packages.add(result[0]);
                //ERROR: Doesn't know what result[1] or result[2] is.
                //addresses.add(result[1]);
                //weights.add(result[2]);
                System.out.println(result[0]);
                //System.out.println(result[1]);
                //System.out.println(result[2]);
            }
        }
        for(int i = 0; i < packages.size(); i++)
        {
            System.out.println(packages.get(i));
        }
    }
}
Here is the text file (the format is intentional):
,123-ABC-4567, 15 W. 15th St., 50.1
456-BgT-79876, 22 Broadway, 24
QAZ-456-QWER, 100 East 20th Street, 50
Q2Z-457-QWER, 200 East 20th Street, 49
678-FGH-9845 ,, 45 5th Ave,, 12.2,
678-FGH-9846,45 5th Ave,12.2
123-A BC-9999, 46 Foo Bar, 220.0
347-poy-3465, 101 B'way,24
,123-FBC-4567, 15 West 15th St., 50.1
678-FGH-8465 45 5th Ave 12.2
Looking at your data, some lines start with an unneeded comma, some use multiple commas as a delimiter, and one line has no commas at all and uses spaces as the delimiter instead. You need a regex that handles these variations. You can use this regex, which captures each field:
([\w- ]+?)[ ,]+([\w .']+)[ ,]+([\d.]+)
Here is an explanation of the regex above:
([\w- ]+?) - Captures the ID, which consists of word characters, hyphens, and spaces, into group 1
[ ,]+ - The delimiter: one or more spaces or commas
([\w .']+) - Captures the address, which consists of word characters, spaces, periods, and apostrophes, into group 2
[ ,]+ - The same delimiter as above
([\d.]+) - Captures the weight, which consists of digits and periods, into group 3
Here is modified Java code you can use. I've removed some of your variable declarations; add them back as needed. The code uses a Matcher to capture the fields and prints them the way you wanted.
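import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Project1 {
    public static void main(String[] args) throws IOException {
        Pattern p = Pattern.compile("([\\w- ]+?)[ ,]+([\\w .']+)[ ,]+([\\d.]+)");
        // Reads the file
        try (BufferedReader br = new BufferedReader(new FileReader("data1.txt"))) {
            String str;
            while ((str = br.readLine()) != null) {
                Matcher m = p.matcher(str);
                if (m.matches()) {
                    System.out.println(String.format("Id: %s, Address: %s, Weight: %s",
                            m.group(1), m.group(2), m.group(3)));
                }
            }
        }
    }
}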
Prints:
Id: 456-BgT-79876, Address: 22 Broadway, Weight: 24
Id: QAZ-456-QWER, Address: 100 East 20th Street, Weight: 50
Id: Q2Z-457-QWER, Address: 200 East 20th Street, Weight: 49
Id: 678-FGH-9845, Address: 45 5th Ave, Weight: 12.2
Id: 678-FGH-9846, Address: 45 5th Ave, Weight: 12.2
Id: 123-A BC-9999, Address: 46 Foo Bar, Weight: 220.0
Id: 347-poy-3465, Address: 101 B'way, Weight: 24
Id: 678-FGH-8465, Address: 45 5th Ave, Weight: 12.2
Let me know if this works for you or if you have any further questions.
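Lines that begin with a comma, such as ,123-ABC-4567, 15 W. 15th St., 50.1, cannot match the whole pattern, which is why they are missing from the output above. Here is a minimal sketch that strips leading and trailing spaces and commas before matching, assuming those characters are never part of a real field. The class and method names (PackageLineParser, parse) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PackageLineParser {
    // Same regex as in the answer above
    private static final Pattern LINE =
            Pattern.compile("([\\w- ]+?)[ ,]+([\\w .']+)[ ,]+([\\d.]+)");

    // Returns {id, address, weight}, or null if the line still doesn't fit the pattern
    static String[] parse(String line) {
        // Assumption: leading/trailing spaces and commas never belong to a field
        String cleaned = line.replaceAll("^[ ,]+|[ ,]+$", "");
        Matcher m = LINE.matcher(cleaned);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] samples = {
            ",123-ABC-4567, 15 W. 15th St., 50.1",
            "678-FGH-9845 ,, 45 5th Ave,, 12.2,",
            "678-FGH-8465 45 5th Ave 12.2"
        };
        for (String s : samples) {
            String[] f = parse(s);
            if (f == null) {
                System.out.println("No match: " + s);
            } else {
                System.out.printf("Id: %s, Address: %s, Weight: %s%n", f[0], f[1], f[2]);
            }
        }
    }
}
```

Because parse returns null for a line that still doesn't fit, the caller decides whether to skip it or report it.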
The last line has no commas, so it is a single token, and split returns an array with only one element.
A minimal reproducing example:
import java.io.*;
class Project1 {
    public static void main(String[] args) throws Exception {
        //Provide the file path
        File file = new File(args[0]);
        //Reads the file
        BufferedReader br = new BufferedReader(new FileReader(file));
        String str;
        while ((str = br.readLine()) != null) {
            if (str.trim().length() > 0) {
                String[] result = str.trim().split("\\s*,\\s*", 3);
                System.out.println(result[1]);
            }
        }
    }
}
With this input file:
678-FGH-8465 45 5th Ave 12.2
The output looks like this:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at Project1.main(a.java:22)
Process finished with exit code 1
So you have to decide what your program should do in such cases: ignore those lines, print an error, or add only the first token to one of your lists.
You can add the following checks to your code:
if (result.length > 0) {
    packages.add(result[0]);
}
if (result.length > 1) {
    addresses.add(result[1]);
}
if (result.length > 2) {
    weights.add(result[2]);
}
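If you go with the "print an error and skip" option, a sketch could look like this. It keeps your split call and adds a line to the three lists only when all three fields are present. SplitWithCheck and addLine are hypothetical names, and lines with stray leading or doubled commas still need the regex approach from the other answer.

```java
import java.util.ArrayList;
import java.util.List;

class SplitWithCheck {
    final List<String> packages = new ArrayList<>();
    final List<String> addresses = new ArrayList<>();
    final List<String> weights = new ArrayList<>();

    // Stores the three fields, or reports the line and stores nothing if it has fewer than three
    boolean addLine(String line) {
        String[] result = line.trim().split("\\s*,\\s*", 3);
        if (result.length < 3) {
            System.err.println("Skipping line without three comma-separated fields: " + line);
            return false;
        }
        packages.add(result[0]);
        addresses.add(result[1]);
        weights.add(result[2]);
        return true;
    }

    public static void main(String[] args) {
        SplitWithCheck s = new SplitWithCheck();
        s.addLine("678-FGH-9846,45 5th Ave,12.2");
        s.addLine("678-FGH-8465 45 5th Ave 12.2"); // no commas: reported and skipped
        System.out.println(s.packages + " " + s.addresses + " " + s.weights);
    }
}
```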
I have a file with records like the ones below, and I am trying to split each record on whitespace and join the fields with commas.
file:
a 3w 12 98 header P6124
e 4t 2 100 header I803
c 12L 11 437 M12
BufferedReader reader = new BufferedReader(new FileReader("/myfile.txt"));
String line = reader.readLine();
while (line != null) {
    System.out.println(line);
    String[] splitLine = line.split("\\s+");
    line = reader.readLine();
}
If the data is separated by multiple whitespace characters, I usually split with a regex: split("\\s+") or split(" +").
But in the case above, record c doesn't have the header field. The regex "\s+" or " +" simply skips over the gap, so I get c,12L,11,437,M12 instead of c,12L,11,437,,M12.
How do I split the lines properly in this case so that I get data in the following format:
a,3w,12,98,header,P6124
e,4t,2,100,header,I803
c,12L,11,437,,M12
Could anyone tell me how to achieve this?
Maybe you can try a more involved approach: a regex that matches exactly six fields on each line and explicitly handles a missing fifth value.
I rewrote your example, adding some console output to illustrate my suggestion:
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    private static final String Input = "a 3w 12 98 header P6124\n" +
            "e 4t 2 100 header I803\n" +
            "c 12L 11 437 M12";

    public static void main(String[] args) throws Exception {
        BufferedReader reader = new BufferedReader(new StringReader(Input));
        String line = null;
        Pattern pattern = Pattern.compile("^([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+)? +([^ ]+)$");
        do {
            line = reader.readLine();
            System.out.println(line);
            if (line != null) {
                String[] splitLine = line.split("\\s+");
                System.out.println(splitLine.length);
                System.out.println("Line: " + line);
                Matcher matcher = pattern.matcher(line);
                boolean matches = matcher.matches();
                System.out.println("matches: " + matches);
                System.out.println("groups: " + matcher.groupCount());
                // group(i) throws IllegalStateException if the line didn't match
                if (matches) {
                    for (int i = 1; i <= matcher.groupCount(); i++) {
                        System.out.printf(" Group %d has value '%s'%n", i, matcher.group(i));
                    }
                }
            }
        } while (line != null);
    }
}
The key is that the pattern used to match each line requires a sequence of six fields:
for each field, the value is described as [^ ]+
separators between fields are described as " +" (one or more spaces)
the fifth (optional) field is described as ([^ ]+)?, so it may be absent
each value is captured as a group using parentheses: ( ... )
start (^) and end ($) of each line are marked explicitly
Then each line is matched against the pattern, producing six groups. You can access each group with matcher.group(index), where index is 1-based because group(0) returns the full match. When the fifth field is missing, matcher.group(5) returns null.
This approach is more complex, but I think it can solve your problem.
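Building on the pattern above, here is a sketch that turns each matched line directly into the comma-separated form you asked for, writing an empty string for a missing fifth field. It assumes the original file pads the missing column with several spaces: the pattern needs at least two spaces where the field is absent, so a line with single spaces throughout and only five tokens does not match and toCsv returns null. SixFieldCsv and toCsv are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SixFieldCsv {
    // The six-field pattern from the answer above; group 5 is optional
    private static final Pattern SIX_FIELDS = Pattern.compile(
            "^([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+)? +([^ ]+)$");

    // Returns the line as comma-separated values, or null if it doesn't have the expected shape
    static String toCsv(String line) {
        Matcher m = SIX_FIELDS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= 6; i++) {
            if (i > 1) {
                sb.append(',');
            }
            String value = m.group(i);
            sb.append(value == null ? "" : value); // missing field -> empty value
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] lines = {
            "a 3w 12 98 header P6124",
            "e 4t 2 100 header I803",
            "c 12L 11 437        M12" // assumed padding where "header" is missing
        };
        for (String line : lines) {
            System.out.println(toCsv(line));
        }
    }
}
```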
Put a limit on the number of whitespace chars that may be used to split the input.
In the case of your example data, a maximum of 5 works:
String[] splitLine = line.split("\\s{1,5}");
Are you just trying to switch your delimiters from spaces to commas?
In that case:
cat myFile.txt | sed 's/   */  /g' | sed 's/ /,/g'
Edit: added a first stage that collapses every run of two or more spaces into exactly two spaces, so the gap left by a missing field still turns into a double comma.
This question already has answers here:
What causes a java.lang.ArrayIndexOutOfBoundsException and how do I prevent it?
(26 answers)
Closed 3 years ago.
I am trying to read a text file that has 20 lines, store the records in an array, and give each one a first name, last name, and grade. Because I have to output them as last name, first name, and grade, I decided to split each line into tokens, but I get this error: java.lang.ArrayIndexOutOfBoundsException: 1
public static void main(String[] args) throws IOException {
    int numberOfLines = 20;
    studentClass[] studentObject = new studentClass[numberOfLines];
    readStudentData(studentObject);
}
public static void readStudentData(studentClass[] studentObject) throws IOException {
    //create FileReader and BufferedReader to read and store data
    FileReader fr = new FileReader("/Volumes/PERS/Data.txt");
    BufferedReader br = new BufferedReader(fr);
    String line = null;
    int i = 0;
    //create array to store data for firstname, lastname, and score
    while ((line = br.readLine()) != null) {
        String[] stuArray = line.split(" ");
        String stuFName = stuArray[0];
        String stuLName = stuArray[1];
        int score = Integer.parseInt(stuArray[2]);
        studentObject[i] = new studentClass(stuFName, stuLName, score);
        i++;
    }
    br.close();
    for (i = 0; i < studentObject.length; i++) {
        System.out.print(studentObject[i].getStudentFName());
    }
}
The error occurs on this line:
String stuLName = stuArray[1];
Here is the text file:
Duckey Donald 85
Goof Goofy 89
Brave Balto 93
Snow Smitn 93
Alice Wonderful 89
Samina Akthar 85
Simba Green 95
Donald Egger 90
Brown Deer 86
Johny Jackson 95
Greg Gupta 75
Samuel Happy 80
Danny Arora 80
Sleepy June 70
Amy Cheng 83
Shelly Malik 95
Chelsea Tomek 95
Angela Clodfelter 95
Allison Nields 95
Lance Norman 88
I think the last line of your file is empty or contains only whitespace. Make sure the file doesn't end with an extra line that is blank or holds only spaces or tabs.
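A sketch of a reader that tolerates such a line: it trims each line, skips blank ones, splits on runs of whitespace with "\\s+" rather than a single space, and reports any line with fewer than three tokens instead of crashing. Student is a hypothetical stand-in for the asker's studentClass, which isn't shown, and a List replaces the fixed array of 20 so the line count doesn't have to be known in advance.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class StudentReader {
    // Hypothetical stand-in for the asker's studentClass, which isn't shown
    static final class Student {
        final String firstName;
        final String lastName;
        final int score;

        Student(String firstName, String lastName, int score) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.score = score;
        }
    }

    // Returns null for blank, whitespace-only, or too-short lines instead of throwing
    static Student parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        String[] parts = trimmed.split("\\s+"); // any run of spaces or tabs
        if (parts.length < 3) {
            return null;
        }
        return new Student(parts[0], parts[1], Integer.parseInt(parts[2]));
    }

    static List<Student> read(BufferedReader br) throws IOException {
        List<Student> students = new ArrayList<>();
        String line;
        int lineNo = 0;
        while ((line = br.readLine()) != null) {
            lineNo++;
            Student s = parseLine(line);
            if (s == null) {
                System.err.println("Skipping line " + lineNo + ": [" + line + "]");
            } else {
                students.add(s);
            }
        }
        return students;
    }

    public static void main(String[] args) throws IOException {
        // Sample data whose last line contains only a tab
        String data = "Duckey Donald 85\nGoof Goofy 89\n\t\n";
        for (Student s : read(new BufferedReader(new StringReader(data)))) {
            System.out.println(s.lastName + ", " + s.firstName + " " + s.score);
        }
    }
}
```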
First, next time please include the imports and the output with your code so it's easier for us to fix. Also, the class name should be StudentClass, not studentClass: by Java convention, class names start with an uppercase letter, which distinguishes them from methods.
Second, I can't test your code without your studentClass, so I can only guess:
Possibility 1: the text file has one extra line containing a single space. That can't produce this exact error, because " ".split(" ") returns an empty array, so stuArray[0] would already throw ArrayIndexOutOfBoundsException: 0.
Possibility 2: your text file contains a malformed line. To check, add
System.out.println(line + ".") right after while ((line = br.readLine()) != null){
and run the program again. You will most likely see that the last line printed before the exception is the one causing the problem.
I have a simple textfile:
John Jobs 225 Louis Lane Road
Amy Jones 445 Road Street
Corey Dol 556 The Road
Each line holds a person's first name, last name, and address.
I'm trying to parse them like this:
public void parseText() {
    try {
        File file = new File("test.txt");
        String[] splitted;
        Scanner sc = new Scanner(file);
        while (sc.hasNextLine()) {
            String s = sc.nextLine();
            splitted = s.split("\\s+");
            System.out.println(splitted[0]);
        }
        sc.close();
    } catch (FileNotFoundException e) {
        System.out.println("Error");
    }
}
splitted[0] works fine and prints the first names of the people.
splitted[1] prints the last names, but then gives me an IndexOutOfBoundsException.
splitted[2] prints the first number of each address, but again gives me an exception.
So then I tried this:
String[] splitted = new String[4];
and again tried accessing indexes greater than 0, but I still got the same problem.
What am I doing wrong?
This is your file's content:
John Jobs 225 Louis Lane Road
Amy Jones 445 Road Street
Corey Dol 556 The Road
When each line is split, splitted contains 6 elements for the first line and 5 for the following ones, so if you don't use indexes carefully, you'll get an IndexOutOfBoundsException.
A better approach is to use a for-each loop:
while (sc.hasNextLine()) {
    String s = sc.nextLine();
    splitted = s.split("\\s+");
    //System.out.println(Arrays.toString(splitted));
    for (String string : splitted) {
        System.out.print(string + " ");
    }
    System.out.println();
    // ... rest of code
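If the goal is first name, last name, and address, another option is split with a limit of 3, so the third element keeps the rest of the line. The file as shown yields at least five tokens per line, so one likely (but unconfirmed) cause of the exception is a blank line, for example at the end of the file, which this sketch skips. PersonLineParser and parse are hypothetical names.

```java
class PersonLineParser {
    // Returns {firstName, lastName, address}, or null for blank or too-short lines
    static String[] parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null; // blank lines (e.g. a trailing empty line) have no fields
        }
        // Limit 3: the third element keeps the rest of the line, spaces included
        String[] parts = trimmed.split("\\s+", 3);
        return parts.length == 3 ? parts : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "John Jobs 225 Louis Lane Road",
            "Amy Jones 445 Road Street",
            "Corey Dol 556 The Road",
            ""
        };
        for (String line : lines) {
            String[] p = parse(line);
            if (p != null) {
                System.out.println("First: " + p[0] + ", Last: " + p[1] + ", Address: " + p[2]);
            }
        }
    }
}
```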
I want to parse the lines of a file using parsingMethod:
test.csv
Frank George,Henry,Mary / New York,123456
,Beta Charli,"Delta,Delta Echo
", 25/11/1964, 15/12/1964,"40,000,000.00",0.0975,2,"King, Lincoln ",Alpha
This is how I read each line:
    public static void main(String[] args) throws Exception {
        File file = new File("C:\\Users\\test.csv");
        BufferedReader reader = new BufferedReader(new FileReader(file));
        String line2;
        while ((line2 = reader.readLine()) != null) {
            String[] tab = parsingMethod(line2, ",");
            for (String i : tab) {
                System.out.println(i);
            }
        }
    }

    public static String[] parsingMethod(String line, String parser) {
        List<String> liste = new LinkedList<String>();
        String patternString = "(([^\"][^" + parser + "]*)|\"([^\"]*)\")" + parser + "?";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(line);
        while (matcher.find()) {
            if (matcher.group(2) != null) {
                liste.add(matcher.group(2).replace("\n", "").trim());
            } else if (matcher.group(3) != null) {
                liste.add(matcher.group(3).replace("\n", "").trim());
            }
        }
        String[] result = new String[liste.size()];
        return liste.toArray(result);
    }
}
Output :
Frank George
Henry
Mary / New York
123456
Beta Charli
Delta
Delta Echo
"
25/11/1964
15/12/1964
40,000,000.00
0.0975
2
King
Lincoln
"
Alpha
Delta
Delta Echo
I want to remove these stray " entries.
Can anyone help me improve my pattern?
Expected output:
Frank George
Henry
Mary / New York
123456
Beta Charli
Delta
Delta Echo
25/11/1964
15/12/1964
40,000,000.00
0.0975
2
King
Lincoln
Alpha
Delta
Delta Echo
Output for line 3
25/11/1964
15/12/1964
40
000
000.00
0.0975
2
King
Lincoln
Your code didn't compile as posted, because some of the " characters weren't escaped.
But this should do the trick:
String patternString = "(?:^.,|)([^\"]*?|\".*?\")(?:,|$)";
Pattern pattern = Pattern.compile(patternString, Pattern.MULTILINE);
(?:^.,|) is a non-capturing group that matches either a single character followed by a comma at the start of a line, or nothing
([^\"]*?|\".*?\") is a capturing group that matches either a run of characters other than ", or anything between a pair of " characters
(?:,|$) is a non-capturing group that matches either a comma or the end of a line
Note: ^ and $ match at the start and end of each line only when the pattern is compiled with the Pattern.MULTILINE flag; otherwise they match only at the start and end of the whole input.
I can't reproduce your result, but maybe you want to leave the quotes out of the second captured group, like this:
"(([^\"][^"+parser+ "]*)|\"([^\"]*))\"" +parser+"?"
Edit: Sorry, this won't work. Maybe you also want to allow any number of characters other than " in the first group, like this: (([^,\"]*)|\"([^\"]*)\"),?
As far as I can see, the lines belong together (a quoted field continues onto the next line), so try joining them before parsing:
public static void main(String[] args) throws Exception {
    File file = new File("C:\\Users\\test.csv");
    BufferedReader reader = new BufferedReader(new FileReader(file));
    StringBuilder line = new StringBuilder();
    String lineRead;
    while ((lineRead = reader.readLine()) != null) {
        line.append(lineRead);
    }
    String[] tab = parsingMethod(line.toString());
    for (String i : tab) {
        System.out.println(i);
    }
}

public static String[] parsingMethod(String line) {
    List<String> liste = new LinkedList<String>();
    String patternString = "(([^\"][^,]*)|\"([^\"]*)\"),?";
    Pattern pattern = Pattern.compile(patternString);
    Matcher matcher = pattern.matcher(line);
    while (matcher.find()) {
        if (matcher.group(2) != null) {
            liste.add(matcher.group(2).replace("\n", "").trim());
        } else if (matcher.group(3) != null) {
            liste.add(matcher.group(3).replace("\n", "").trim());
        }
    }
    String[] result = new String[liste.size()];
    return liste.toArray(result);
}
Output:
Frank George
Henry
Mary / New York
123456
Beta Charli
Delta,Delta Echo
25/11/1964
15/12/1964
40,000,000.00
0.0975
2
King, Lincoln
Alpha
Because "Delta,Delta Echo" is quoted, it should appear on a single line, just like "King, Lincoln".
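A regex is hard to get right for quoted fields that contain commas and line breaks. As an alternative technique (not what the answers above use), here is a sketch of a small character-by-character splitter that treats a comma as a delimiter only when it is outside double quotes. Like the answer above, it parses the whole file content as one string; QuoteAwareSplitter and split are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplitter {
    // Splits on commas outside double quotes; quote characters and line breaks are dropped, fields trimmed
    static List<String> split(String text) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;                  // enter or leave a quoted section
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString().trim()); // field boundary
                current.setLength(0);
            } else if (c != '\n' && c != '\r') {
                current.append(c);                     // keep everything except line breaks
            }
        }
        fields.add(current.toString().trim());         // last field
        return fields;
    }

    public static void main(String[] args) {
        String content = "Frank George,Henry,Mary / New York,123456\n"
                + ",Beta Charli,\"Delta,Delta Echo\n"
                + "\", 25/11/1964, 15/12/1964,\"40,000,000.00\",0.0975,2,\"King, Lincoln \",Alpha";
        for (String field : split(content)) {
            System.out.println(field);
        }
    }
}
```

It does not handle escaped quotes written as "" inside a quoted field (the RFC 4180 convention); for real-world CSV files, a dedicated CSV library is a safer choice.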
I have a flat file like:
A 10
S 20
W A 20 10
S A 45 10
S W S 20 20 20 30
W A S 22 50 20 55
I want to make sure each line is well formed (fields separated by a single space " "),
allowing only lines that match a regular expression like:
anyword* then " " then (word*|numbers*)*
where * means any number of words.
But there is one more constraint:
if there is only one word or character, there must be exactly one number
if there are 2 words or characters separated by " ", there must be 2 numbers separated by " "
if there are 3 words or characters separated by " ", there must be 4 numbers separated by " "
I was doing something like this, but I don't know where to incorporate the line validation:
try {
    input = new BufferedReader(new FileReader(new File(filename)));
    String line = null;
    while ((line = input.readLine()) != null) {
        String[] words = line.split(" ");
        if (words.length == 2) {
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}
This regex should do it:
^[a-z]+ (?:\d+|[a-z]+(?: \d+ \d+| [a-z]+(?: \d+){4}))$
I tried to make it as short as possible, but it may be possible to condense it a bit more. Use it with case-insensitive matching enabled (for example, Pattern.CASE_INSENSITIVE in Java), or change every [a-z] to [a-zA-Z].
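To show where the validation fits into the question's loop, here is a sketch that compiles the regex above once with Pattern.CASE_INSENSITIVE, so [a-z] also matches the uppercase letters in the sample file, and checks each line with matches(). LineValidator and isWellFormed are hypothetical names.

```java
import java.util.regex.Pattern;

class LineValidator {
    // The regex from the answer above, compiled case-insensitively
    private static final Pattern WELL_FORMED = Pattern.compile(
            "^[a-z]+ (?:\\d+|[a-z]+(?: \\d+ \\d+| [a-z]+(?: \\d+){4}))$",
            Pattern.CASE_INSENSITIVE);

    static boolean isWellFormed(String line) {
        return WELL_FORMED.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] lines = {
            "A 10", "S 20", "W A 20 10", "S A 45 10",
            "S W S 20 20 20 30", "W A S 22 50 20 55",
            "W A 20" // two letters but only one number
        };
        for (String line : lines) {
            System.out.println(line + " -> " + isWellFormed(line));
        }
    }
}
```

In the question's loop, you would call isWellFormed(line) in place of the words.length check.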