Disclaimer:
The parsing problem described here is very simple. This question does not merely ask for a way to do the parsing (that is almost straightforward); instead, it asks for an elegant way. That elegant way would probably be one that does not first read line by line and then parse each line on its own, as this is obviously not necessary. However, is such an elegant way possible with ready-to-use standard classes?
Question:
I have to parse text of the following form in Java (there are more records than the ones shown here, and records can have many more lines than these examples):
5
Dominik 3
Markus 3 2
Reiner 1 2
Samantha 4
Thomas 3
4
Babette 1 4
Diana 3 4
Magan 2
Thomas 2 4
The first number n is the number of lines in the record directly following it. Each line of a record consists of a name followed by 0 to n integers.
I thought java.util.Scanner would be a natural choice, but it leads to a nasty problem: when I use hasNextInt() and hasNext() to determine whether a new line has started, I can't tell whether a number I just read is the header of the next record or the last number after the last name of the previous record. Example from above:
...
Thomas 3
4
...
Here, I don't know how to tell whether the 3 and the 4 are headers or belong to Thomas's line.
Sure, I could first read line by line, put each line into another Scanner, and then parse it again, but this effectively parses the whole data twice, which looks ugly to me. Is there a better way?
I would need something like a flag which tells me if a line break was encountered during the last delimiter skipping operation.
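For what it's worth, the standard Scanner class can provide something close to that flag: Scanner.findInLine(Pattern) only searches up to the next line separator and returns null when nothing matches there. A minimal sketch, assuming well-formed input where names contain no digits; the class name FindInLineParser and the "name=[numbers]" output format are just for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class FindInLineParser {
    // One integer token; findInLine never looks past the current line
    private static final Pattern INT = Pattern.compile("\\d+");

    // Returns one "name=[ints]" string per record line
    static List<String> parse(Scanner sc) {
        List<String> out = new ArrayList<>();
        while (sc.hasNextInt()) {
            int n = sc.nextInt();                 // record header
            for (int i = 0; i < n; i++) {
                String name = sc.next();          // next() skips the preceding line break
                List<Integer> numbers = new ArrayList<>();
                String token;
                // findInLine returns null once the current line has no more integers,
                // so the next record's header is never mistaken for a trailing number
                while ((token = sc.findInLine(INT)) != null) {
                    numbers.add(Integer.parseInt(token));
                }
                out.add(name + "=" + numbers);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "5\nDominik 3\nMarkus 3 2\nReiner 1 2\nSamantha 4\nThomas 3\n"
                    + "4\nBabette 1 4\nDiana 3 4\nMagan 2\nThomas 2 4\n";
        parse(new Scanner(text)).forEach(System.out::println);
    }
}
```

Because next() and hasNextInt() skip over the line break on their own, no extra nextLine() call is needed between record lines.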
Read the file using FileReader and BufferedReader, then check each line:
outer loop --> while readLine() is not null
if the line matches \\d+ --> parse the number and store it in count
from 0 to count, do what you want to do with each record line // inner loop
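The pseudocode above might translate to Java roughly like this. It is a sketch assuming every record line starts with a name; the class name HeaderThenRecords is hypothetical, and a StringReader stands in for new FileReader("...") so the example runs without a file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class HeaderThenRecords {
    // Outer loop: one line at a time; a line made only of digits is a record header
    static List<String[]> read(BufferedReader reader) throws IOException {
        List<String[]> records = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (!trimmed.matches("\\d+")) {
                continue; // not a header (e.g. a blank line): skip it
            }
            int count = Integer.parseInt(trimmed);
            // Inner loop: the next `count` lines belong to this record
            for (int i = 0; i < count; i++) {
                String recordLine = reader.readLine();
                if (recordLine == null) {
                    throw new IllegalStateException("input ended inside a record");
                }
                // tokens[0] is the name, tokens[1..] are the integers
                records.add(recordLine.trim().split("\\s+"));
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String text = "5\nDominik 3\nMarkus 3 2\nReiner 1 2\nSamantha 4\nThomas 3\n"
                    + "4\nBabette 1 4\nDiana 3 4\nMagan 2\nThomas 2 4\n";
        for (String[] tokens : read(new BufferedReader(new StringReader(text)))) {
            System.out.println(String.join(" ", tokens));
        }
    }
}
```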
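Instead of reading each line into a separate Scanner, you can read to the end of the line and use String.split, like this:
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
// ... inside a method, with a Scanner named scanner over the input:
while (scanner.hasNextInt()) {
    int count = scanner.nextInt();
    for (int i = 0 ; i != count ; i++) {
        if (!scanner.hasNext()) throw new IllegalStateException("expected a name");
        String name = scanner.next();
        List<Integer> numbers = new ArrayList<Integer>();
        // Scanner has no readLine(); nextLine() returns the rest of the current line
        String rest = scanner.nextLine().trim();
        if (!rest.isEmpty()) { // a name may be followed by zero integers
            for (String numStr : rest.split("\\s+")) {
                numbers.add(Integer.parseInt(numStr));
            }
        }
        ... // Do something with name and numbers
    }
}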
This approach avoids having to distinguish the last int on a line from the first int on the next line: after reading a name, i.e. in the middle of a line, it calls nextLine() to grab the rest of that line.
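// Needs: import java.io.*; and an enclosing method declared "throws IOException"
File file = new File("records.txt");
// try-with-resources closes the reader even if parsing fails
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
    String line;
    /* Read the header line of each record */
    while ((line = reader.readLine()) != null) {
        int noOfRecords = Integer.parseInt(line.trim());
        /* Read the next n lines in a loop */
        while (noOfRecords != 0) {
            line = reader.readLine();
            if (line == null) throw new IllegalStateException("file ended inside a record");
            String[] tokens = line.trim().split("\\s+");
            noOfRecords--;
            // tokens[0] is the name, tokens[1..] are the numbers
        }
    }
}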
Here we're reading one line at a time, so the first line we read will be an int (call it n); from there, we read the next n lines in an inner loop. Once the inner loop is done, control comes back out, and the next line we read is definitely another int or EOF (as long as the input is well-formed). That way you don't have to deal with integer parsing exceptions, and each line is read only once :)
Related
Okay, so I'm having a slight problem with Scanner advancing an extra line. I have a file with many lines, each containing integers separated by single spaces. Somewhere in the file there is a line with no integers, just the word "done".
When "done" is found, we exit the loop. For each integer on each line before that, we print the largest prime that is less than it (if the integer is already prime, we leave it as is). We do this all the way up to the line with "done".
My problem: let's say the file contains 6 lines and the 6th line is the word "done". My output skips lines 1, 3 and 5; it only returns the correct values for lines 2 and 4.
Here's a snippet of code where I read the values in:
Scanner in = new Scanner(
new InputStreamReader(socket.getInputStream()));
PrintStream out = new PrintStream(socket.getOutputStream());
while(in.nextLine() != "done"){
String[] arr = in.nextLine().split(" ");
Now I sense the problem is that the nextLine() call in my loop condition advances the line, and then the nextLine().split() call advances it again. Thus, all odd-numbered lines are lost. Is there another way to check for "done" without advancing a line, or is there a command I could call to somehow reset the scanner back to the start of the loop?
The problem is that you have two calls to nextLine(). Try something like this:
String line = in.nextLine();
while (!"done".equals(line)) {
    String[] arr = line.split(" ");
    // Process the line
    if (!in.hasNextLine()) {
        // Error: reached end of file without finding "done"
        break;
    }
    line = in.nextLine();
}
Also note that I fixed the check for "done": you should be using equals(), because != compares object references, not the characters of the strings.
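A tiny illustration of why != fails here: a line read at runtime is a different String object from the literal "done", even when it has the same characters. The sketch simulates that with new String(...):

```java
public class StringCompareDemo {
    public static void main(String[] args) {
        // A string built at runtime (like one returned by nextLine()) is a new object
        String fromInput = new String("done");
        System.out.println(fromInput != "done");       // true: compares references, not text
        System.out.println("done".equals(fromInput));  // true: compares the characters
    }
}
```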
I think you are looking for this:
while (in.hasNextLine()) {
    String str = in.nextLine();
    if (str.trim().equals("done")) {
        break;
    } else {
        String[] arr = str.trim().split("\\s+");
        // then do whatever you want to do
    }
}
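Putting the single-nextLine() loop together with the task from the question gives something like the sketch below. The helper names largestPrimeAtMost and isPrime are hypothetical, trial division is just the simplest correct choice, and returning -1 for integers below 2 (which have no prime at or below them) is an assumption:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class PrimeLines {
    // Largest prime <= n, or -1 if there is none (assumption for n < 2)
    static int largestPrimeAtMost(int n) {
        for (int k = n; k >= 2; k--) {
            if (isPrime(k)) return k;
        }
        return -1;
    }

    // Trial division up to sqrt(k)
    static boolean isPrime(int k) {
        if (k < 2) return false;
        for (int d = 2; (long) d * d <= k; d++) {
            if (k % d == 0) return false;
        }
        return true;
    }

    // Reads lines until "done" (or end of input), calling nextLine() once per iteration
    static List<List<Integer>> process(Scanner in) {
        List<List<Integer>> result = new ArrayList<>();
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();
            if (line.equals("done")) break;
            if (line.isEmpty()) continue;
            List<Integer> primes = new ArrayList<>();
            for (String tok : line.split("\\s+")) {
                primes.add(largestPrimeAtMost(Integer.parseInt(tok)));
            }
            result.add(primes);
        }
        return result;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("10 13 4\n20 2\ndone\n100\n");
        System.out.println(process(in)); // [[7, 13, 3], [19, 2]]
    }
}
```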
I have the following issue: I am trying to parse a .csv file in Java and store three specific columns of it in a two-dimensional array. The code for the method looks like this:
public static void parseFile(String filename) throws IOException{
FileReader readFile = new FileReader(filename);
BufferedReader buffer = new BufferedReader(readFile);
String line;
String[][] result = new String[10000][3];
String[] b = new String[6];
for(int i = 0; i<10000; i++){
while((line = buffer.readLine()) != null){
b = line.split(";",6);
System.out.println("ID: "+b[0]+" Title: "+b[3]+ "Description: "+b[4]); // Here is where the outofbounds exception occurs...
result[i][0] = b[0];
result[i][1] = b[3];
result[i][2] = b[4];
}
}
buffer.close();
}
I feel I have to point this out: the .csv file is HUGE. It has 32 columns and (almost) 10,000 entries (!).
When parsing, I keep getting the following:
XXXXX CHUNKS OF SUCCESSFULLY EXTRACTED CODE
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException:3
at ParseCSV.parseFile(ParseCSV.java:24)
at ParseCSV.main(ParseCSV.java:41)
However, I noticed that SOME of the content in the file has a strange format: for example, some of the text fields seem to contain line breaks, even though no newline character is involved in any way. If I delete those blank lines manually, the output generated (before the error message appears) adds the entries to the array up until the next blank line ...
Does anyone have an idea how to fix this? Any help would be greatly appreciated...
Your first problem is that you probably have at least one blank line in your csv file. You need to replace:
b = line.split(";", 6);
with
b = line.split(";");
if (b.length < 5) {
    System.err.println("Warning, line has only " + b.length +
        " entries, so skipping it:\n" + line);
    continue;
}
If your input can legitimately have new lines or embedded semi-colons within your entries, that is a more complex parsing problem, and you are probably better off using a third-party parsing library, as there are several very good ones.
If your input is not supposed to have newlines in it, the problem is probably \r. Windows uses \r\n to represent a newline, while most other systems use just \n. If multiple people or programs edited your text file, it is entirely possible to end up with stray \r characters on their own, which most parsers do not handle easily.
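A quick way to check whether that's your problem: note that BufferedReader.readLine() itself treats a lone \r as a line terminator, so the lines it returns never contain one, and calling line = line.replace("\r", "") before splitting will not reveal anything. Instead, inspect the raw file contents (for example in a hex viewer) for \r characters that are not followed by \n.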
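A minimal sketch of checking for stray \r characters, run on an in-memory sample. For a real file you could load the contents with new String(Files.readAllBytes(Paths.get("yourfile.csv")), StandardCharsets.UTF_8), where the file name and the UTF-8 charset are assumptions. It counts \r characters not followed by \n and shows that BufferedReader.readLine() treats a lone \r as a line break:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class StrayCarriageReturns {
    // Counts '\r' characters that are not immediately followed by '\n'
    static int countStrayCr(String content) {
        int count = 0;
        for (int i = 0; i < content.length(); i++) {
            boolean followedByLf = i + 1 < content.length() && content.charAt(i + 1) == '\n';
            if (content.charAt(i) == '\r' && !followedByLf) {
                count++;
            }
        }
        return count;
    }

    // Collects the lines exactly as BufferedReader.readLine() returns them
    static List<String> readLines(String content) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // One record whose description contains a stray '\r', then a normal record
        String csv = "1;a;b;Title;Desc part 1\rDesc part 2;x\r\n2;a;b;Title2;Desc2;y\r\n";
        System.out.println(countStrayCr(csv));     // 1
        System.out.println(readLines(csv).size()); // 3: the lone '\r' also ends a line
    }
}
```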
If this is a process you repeat many times, you might consider using a Scanner (or a library) instead for more robust text processing. Otherwise, you can make do with this.
When you have newlines in your CSV file, after this line
while((line = buffer.readLine()) != null){
the variable line will not hold a complete CSV record, just a fragment of text without ;
For example, if your file contains
column1;column2;column
3 value
after the first iteration, line will contain
column1;column2;column
after the second iteration, it will contain
3 value
When you call "3 value".split(";",6), it returns an array with one element, so when you later access b[3], it throws an ArrayIndexOutOfBoundsException.
The CSV format has many small details that take a lot of time to implement correctly. This is a good article covering the possible CSV cases:
http://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules_and_examples
I would recommend using a ready-made CSV parser, such as this one:
https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVParser.html
String's split(pattern, limit) method returns an array sized to the number of tokens found, up to the number specified by the limit parameter. The limit is the maximum, not the minimum, number of array elements returned.
"1,2,3" split with (",", 6) will return an array of 3 elements: "1", "2" and "3".
"1,2,3,4,5,6,7" will return 6 elements: "1", "2", "3", "4", "5" and "6,7". The last element is goofy because the split method stopped splitting after 5 and returned the rest of the source string as the sixth element.
An empty line is represented as an empty string (""). Splitting "" will return an array of 1 element, the empty string.
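These cases are easy to check directly; the commented values are what each println prints:

```java
import java.util.Arrays;

public class SplitLimitDemo {
    public static void main(String[] args) {
        // The limit caps the array length; it does not pad short results
        System.out.println(Arrays.toString("1,2,3".split(",", 6)));         // [1, 2, 3]
        // With limit 6, the 6th element holds the unsplit remainder
        System.out.println(Arrays.toString("1,2,3,4,5,6,7".split(",", 6))); // [1, 2, 3, 4, 5, 6,7]
        // An empty line yields a one-element array containing ""
        System.out.println("".split(";", 6).length);                         // 1
        // A fragment without ';' also yields one element, so b[3] would throw
        System.out.println("3 value".split(";", 6).length);                  // 1
    }
}
```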
In your case, the string array created here
String[] b = new String[6];
and assigned to b is replaced by the array returned by
b = line.split(";",6);
and meets its ultimate fate at the hands of the garbage collector, unseen and unloved.
Worse, in the case of the empty lines, it's replaced by a one-element array, so
System.out.println("ID: "+b[0]+" Title: "+b[3]+ "Description: "+b[4]);
blows up when trying to access b[3].
The suggested solution is either
while((line = buffer.readLine()) != null){
    if (line.length() != 0)
    {
        b = line.split(";",6);
        System.out.println("ID: "+b[0]+" Title: "+b[3]+ "Description: "+b[4]);
        ...
    }
}
or (better, because the previous version could still trip over a malformed line)
while((line = buffer.readLine()) != null){
    b = line.split(";",6);
    if (b.length == 6)
    {
        System.out.println("ID: "+b[0]+" Title: "+b[3]+ "Description: "+b[4]);
        ...
    }
}
You might also want to think about the for loop around the while. I don't think it's doing you any good.
while((line = buffer.readLine()) != null)
is going to read every line in the file, so
for(int i = 0; i<10000; i++){
while((line = buffer.readLine()) != null){
is going to read every line in the file on the first pass. Then it is going to make 9999 more attempts to read the file, find nothing new each time, and exit the while loop. Also, because i never changes inside the while loop, every line is written to result[0], overwriting the previous one.
Nor are you protected from reading more than 10000 elements: the while loop keeps reading regardless of the array size, so once you index the array by a line counter, a 10001st line would overrun it. Look into replacing the big array with an ArrayList or Vector, as they will grow to fit your file.
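A sketch of parseFile along those lines, assuming ';'-separated lines where columns 0, 3 and 4 are the wanted ones, as in the question. It uses an ArrayList instead of a fixed 10000-row array, drops the outer for loop, and skips lines too short to index. The class name ParseCsvSketch is hypothetical, and a StringReader stands in for new FileReader(filename) so the demo runs without a file. This still does not handle fields that legitimately contain line breaks; that needs a real CSV parser:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ParseCsvSketch {
    // Keeps columns 0, 3 and 4 of each line; skips lines with fewer than 5 fields
    static List<String[]> parse(Reader source) throws IOException {
        List<String[]> result = new ArrayList<>();
        try (BufferedReader buffer = new BufferedReader(source)) {
            String line;
            while ((line = buffer.readLine()) != null) {
                String[] b = line.split(";", 6);
                if (b.length < 5) {
                    System.err.println("Skipping malformed line: " + line);
                    continue;
                }
                result.add(new String[] { b[0], b[3], b[4] });
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String csv = "1;a;b;Title1;Desc1;x;y\n\n2;a;b;Title2;Desc2\n";
        for (String[] row : parse(new StringReader(csv))) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```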
Please check b.length before accessing b[]: reading b[4] requires b.length > 4.
How can I start reading from the third line of a text file in Java?
I want to store 12 in a 'nodes' variable and 14 in an 'edges' variable,
12334 in a different variable, and so on.
My input text file consisting of integers goes like this:
12
14
12334 12214 25
32151 32151 85
21514 51454 20
.
.
.
.
.
try
{
for(i=0;i<2;i++)
array[i] = inputFile.nextInt();
nodes=array[0];
edges=array[1];
break;
for(i=2;i<5;i++)
{
array1[i] = inputFile.nextInt();
System.out.println(array1[i]);
}
}
Using Scanner:
Scanner sc = new Scanner(myFile);
int lineIndex = 0;
while(sc.hasNextLine()) {
    String line = sc.nextLine();
    if(++lineIndex >= 3) {
        // lines 3, 4, 5, ...: do something
    }
}
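Note: because your first for loop has no braces, the break is not inside it; it will terminate an enclosing loop if there is one, and otherwise the code won't compile.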
Suggestions on how to solve this:
1. Use either the BufferedReader or the Scanner class.
2. Have a counter variable set to zero.
3. Keep reading lines, incrementing the counter, and check whether it has reached 3 yet.
4. Continue reading lines, but once the counter is equal to 3, save each line in either a variable or an array.
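As an alternative to counting lines, a Scanner can read the integers directly with nextInt(), since it does not care where the line breaks fall. A sketch assuming every line after the second holds exactly three integers, as in the sample; GraphFileReader and its rows list are hypothetical names, and for a real file you would construct the Scanner with new Scanner(new File("...")), which throws FileNotFoundException:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class GraphFileReader {
    int nodes;
    int edges;
    // Each remaining line is assumed to hold exactly three integers
    final List<int[]> rows = new ArrayList<>();

    static GraphFileReader read(Scanner in) {
        GraphFileReader g = new GraphFileReader();
        g.nodes = in.nextInt();   // line 1
        g.edges = in.nextInt();   // line 2
        // From line 3 on: read the integers three at a time
        while (in.hasNextInt()) {
            int a = in.nextInt();
            int b = in.nextInt();
            int c = in.nextInt();
            g.rows.add(new int[] { a, b, c });
        }
        return g;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("12\n14\n12334 12214 25\n32151 32151 85\n21514 51454 20\n");
        GraphFileReader g = read(in);
        System.out.println(g.nodes + " " + g.edges + " " + g.rows.size()); // 12 14 3
    }
}
```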
Differences between BufferedReader and Scanner
1. BufferedReader has a significantly larger buffer than Scanner. Use BufferedReader if you want to read long strings from a stream, and use Scanner if you want to parse specific types of tokens from a stream.
2. Scanner can tokenize the input using a custom delimiter and parse it into primitive types, while BufferedReader can only read and return Strings.
3. BufferedReader is synchronized (thread-safe), while Scanner is not. Use BufferedReader if you're working with multiple threads.
My question is quite simple: I want to read a text file, store the first line of the file in an integer, and store every other line of the file in a multi-dimensional array. The way I was thinking of doing this is to use an if-statement with a counter integer, and when that counter is 0, store the line in the integer variable. However, this seems clumsy, and there must be a simpler way.
For example, if the contents of the text file were:
4
1 2 3 4
4 3 2 1
2 4 1 3
3 1 4 2
The first line, "4", would be stored in an integer, and every other line would go into the multi-dimensional array.
public void processFile(String fileName){
int temp = 0;
int firstLine;
int[][] array;
try{
BufferedReader input = new BufferedReader(new FileReader(fileName));
String inputLine = null;
while((inputLine = input.readLine()) != null){
if(temp == 0){
firstLine = Integer.parseInt(inputLine);
}else{
// Rest goes into array;
}
temp++;
}
}catch (IOException e){
System.out.print("Error: " + e);
}
}
I'm intentionally not writing the solution for you. Try something with:
String.split
A line that says something like array[temp-1] = new int[firstLine]; (after allocating the outer array with array = new int[firstLine][];)
An inner for loop with another Integer.parseInt line
That should be enough to get you the rest of the way
Instead, you could store the first line of the file as an integer, and then enter a for loop where you loop over the rest of the lines of the file, storing them in arrays. This doesn't require an if, because you know that the first case happens first, and the other cases (array) happen after.
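That approach might look like this, assuming the first line gives the number of rows that follow (as in the 4x4 example) and that the input is well-formed; a missing row would make readLine() return null and cause a NullPointerException. The class name FirstLineThenMatrix is hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;

public class FirstLineThenMatrix {
    // Line 1 holds n; the next n lines are rows of whitespace-separated integers
    static int[][] read(BufferedReader input) throws IOException {
        int n = Integer.parseInt(input.readLine().trim()); // first line, read once before the loop
        int[][] array = new int[n][];
        for (int row = 0; row < n; row++) {                  // no "is this the first line?" check
            String[] parts = input.readLine().trim().split("\\s+");
            array[row] = new int[parts.length];
            for (int col = 0; col < parts.length; col++) {
                array[row][col] = Integer.parseInt(parts[col]);
            }
        }
        return array;
    }

    public static void main(String[] args) throws IOException {
        String text = "4\n1 2 3 4\n4 3 2 1\n2 4 1 3\n3 1 4 2\n";
        try (BufferedReader input = new BufferedReader(new StringReader(text))) {
            System.out.println(Arrays.deepToString(read(input)));
        }
    }
}
```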
I'm going to assume that you know how to use file IO.
I'm not extremely experienced, but this is how I would think about it:
while (inputFile.hasNextLine())
{
    //Read the number
    String number = inputFile.nextLine();
    if (!number.trim().isEmpty()) {
        //Do what you need to do with the line here (Ex: Store into an array)
    } else {
        //Skip blank lines and continue reading the document
    }
}
Good Luck.
I'm pretty new to programming and I'm getting an error which I'm sure is an easy fix for more experienced people.
Here is what I have:
import java.io.*;
import java.util.Scanner;
public class ReadNamesFile
{
public static void main(String[] args) throws IOException {
// make the names.csv comma-separated-values file available for reading
FileReader f = new FileReader("names.csv");
BufferedReader r = new BufferedReader(f);
//
String lastName="unknown", firstName="unknown", office="unknown";
// get first line
String line = r.readLine();
// process lines until end-of-file occurs
while ( line != null )
{
// get the last name on the line
//
// position of first comma
int positionOfComma = line.indexOf(",");
// extract the last name as a substring
lastName = line.substring(0,positionOfComma);
// truncate the line removing the name and comma
line = line.substring(positionOfComma+1);
// extract the first name as a substring
firstName = line.substring(0,positionOfComma);
// truncate the line removing the name and comma
line = line.substring(positionOfComma+1);
// extract the office number as a substring
office = line.substring(0,positionOfComma);
// truncate the line removing the name and comma
line = line.substring(positionOfComma+2);
//
//
//
// display the information about each person
System.out.print("\nlast name = "+lastName);
System.out.print("\t first name = "+firstName);
System.out.print("\t office = "+office);
System.out.println();
//
// get the next line
line = r.readLine();
}
}
}
Basically, it finds the last name, first name and office number in a .csv file and prints them out.
When I compile I don't get any errors but when I run it I get:
java.lang.StringIndexOutOfBoundsException: String index out of range: 7
at java.lang.String.substring(String.java:1955)
at ReadNamesFile.main(ReadNamesFile.java:34)
Before trying to do the office number part, the first two (last and first name) printed out fine but the office number doesn't seem to work.
Any ideas?
Edit: Thanks for all the posts guys, I still can't really figure it out though. Can someone post something really dumbed down? I've been trying to fix this for an hour now and I can't get it.
Let's work through an example to see what issues your code has.
E.g. line: Overflow,stack
{ length: 14 }
Taking your program statements line by line:
int positionOfComma = line.indexOf(","); // returns 8
lastName = line.substring(0,positionOfComma); // fine: the end index is exclusive, so this is "Overflow"
Now lastName has Overflow, and positionOfComma has 8.
line = line.substring(positionOfComma+1);
Now line has stack.
firstName = line.substring(0,positionOfComma);
This asks for the substring from 0 to 8, but stack only has length 5, so it throws a StringIndexOutOfBoundsException. I hope this shows where you are going wrong.
From JavaDoc:
(StringIndexOutOfBoundsException) - Thrown by String methods to
indicate that an index is either negative or greater than the size of
the string.
In your case, one of your calls to .substring is being given an index that is greater than the length of the string (or a begin index greater than the end index). If line #34 is a comment, then it's the line above #34.
You need to:
a) Make sure you handle the case if you DON'T find a comma (i.e. if you cannot find and extract a lastName and/or firstName string)
b) Make sure the value of "positionOfComma + N" never exceeds the length of the string.
A couple of "if" blocks and/or "continue" statements will do the trick nicely ;-)
You correctly find positionOfComma, but then that logic applies to the original value of line. When you remove the last name and comma, positionOfComma is no longer correct as it applies to the old value of line.
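One way to avoid the stale index is to search the original line again with indexOf(int ch, int fromIndex) instead of truncating it, and to bail out when a comma is missing. A sketch assuming each line looks like lastName,firstName,office (the sample lines and the class name NameLineParser are made up for illustration):

```java
public class NameLineParser {
    // Returns {lastName, firstName, office}, or null if the line has fewer than two commas
    static String[] parse(String line) {
        int firstComma = line.indexOf(',');
        if (firstComma < 0) return null;                      // no comma at all
        int secondComma = line.indexOf(',', firstComma + 1);  // search the SAME string again
        if (secondComma < 0) return null;                     // only one comma
        int thirdComma = line.indexOf(',', secondComma + 1);  // -1 if office is the last field
        int officeEnd = (thirdComma < 0) ? line.length() : thirdComma;
        return new String[] {
            line.substring(0, firstComma).trim(),
            line.substring(firstComma + 1, secondComma).trim(),
            line.substring(secondComma + 1, officeEnd).trim()
        };
    }

    public static void main(String[] args) {
        for (String line : new String[] { "Smith,John,B204", "Overflow,stack", "Doe,Jane,C101,extra" }) {
            String[] p = parse(line);
            if (p == null) {
                System.out.println("skipping malformed line: " + line);
                continue;
            }
            System.out.println("last name = " + p[0] + "\t first name = " + p[1] + "\t office = " + p[2]);
        }
    }
}
```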
int positionOfComma = line.indexOf(",");
This line of code might not find a comma, in which case positionOfComma will be -1. Next you call substring with (0,-1), and, eek, no wonder it throws a StringIndexOutOfBoundsException. Use something like:
int positionOfComma = 0;
if(line.indexOf(",")!=-1)
{
positionOfComma = line.indexOf(",");
}
You do have to do lots of checking of things sometimes especially when the data is whacked :(
http://download.oracle.com/javase/1.4.2/docs/api/java/lang/String.html#indexOf(java.lang.String)
PS I'm sure someone clever can make my coding look shabby but you get the point I hope :)