Java split() method strips empty strings at the end? [duplicate] - java

Check out the below program.
try {
    for (String data : Files.readAllLines(Paths.get("D:/sample.txt"))) {
        String[] de = data.split(";");
        System.out.println("Length = " + de.length);
    }
} catch (IOException e) {
    e.printStackTrace();
}
Sample.txt:
1;2;3;4
A;B;;
a;b;c;
Output:
Length = 4
Length = 2
Length = 3
Why do the second and third lines give 2 and 3 instead of 4? In sample.txt, the 2nd and 3rd lines end with a newline (\n or Enter) immediately after the delimiter for the third field. How can I get a length of 4 for the 2nd and 3rd lines without changing sample.txt, and how can I print the value of de[2] (which currently throws ArrayIndexOutOfBoundsException)?

Pass a negative limit so that the pattern is applied as often as possible and trailing empty strings are kept:
String[] de = data.split(";", -1);
See the Javadoc for the split method taking two arguments for details.

Have a look at the docs; here is the important quote:
[...] the array can have any length, and trailing empty strings will be discarded.
If you don't like that, have a look at Fabian's comment. Calling String.split(String) is the same as calling String.split(String, 0), which discards trailing empty strings (as the docs say). Calling String.split(String, n) with n < 0 won't discard anything.
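To make the difference concrete, here is a minimal sketch that runs the three sample lines through split with a negative limit. The class name SplitLimitSketch and the helper fields are made up for illustration; only String.split(String, int) comes from the answers above.

```java
import java.util.Arrays;

final class SplitLimitSketch {
    // Splits on ';' with a negative limit, so trailing empty fields are kept.
    static String[] fields(String line) {
        return line.split(";", -1);
    }

    public static void main(String[] args) {
        // The three lines from sample.txt in the question.
        for (String line : new String[] { "1;2;3;4", "A;B;;", "a;b;c;" }) {
            String[] de = fields(line);
            System.out.println("Length = " + de.length + " " + Arrays.toString(de));
        }
    }
}
```

With the negative limit, all three lines yield four elements, and de[2] for the second line is an empty string rather than an out-of-bounds access.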


question on transferring codes, what method should I use?

Question explanation: As some of the comments suggested, I will try my best to make this question clearer. The input comes from a file, and the sample below is just one example; the code should work for any input in this format. I understand that I need to use Scanner to read the file. My question is what code I should use to produce the output.
Input Specification:
The first line of input contains the number N, which is the number of lines that follow. The next
N lines will contain at least one and at most 80 characters, none of which are spaces.
Output Specification:
Output will be N lines. Line i of the output will be the encoding of the line i + 1 of the input.
The encoding of a line will be a sequence of pairs, separated by a space, where each pair is an
integer (representing the number of times the character appears consecutively) followed by a space,
followed by the character.
Sample Input
4
+++===!!!!
777777......TTTTTTTTTTTT
(AABBC)
3.1415555
Output for Sample Input
3 + 3 = 4 !
6 7 6 . 12 T
1 ( 2 A 2 B 1 C 1 )
1 3 1 . 1 1 1 4 1 1 4 5
I have only posted two questions so far, and I don't quite understand the standard of a "good" question and a "bad" question? Can someone explain why this is a bad question? Appreciate it!
Here is complete working code; try it.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
public class CharTask {
    public static void main(String[] args) {
        List<String> lines = new ArrayList<String>();
        // try-with-resources closes the scanner even if reading fails
        try (Scanner scanner = new Scanner(new File("inp.txt"))) {
            // hasNextLine() is the check that pairs with nextLine()
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            return;
        }
        List<String> output = processInput(lines);
        // start at 1 to skip the first line, which only holds the count N
        for (int i = 1; i < output.size(); i++) {
            System.out.println(output.get(i));
        }
    }
    private static List<String> processInput(List<String> lines) {
        List<String> output = new ArrayList<String>();
        for (String line : lines) {
            output.add(getProcessLine(line));
        }
        return output;
    }
    private static String getProcessLine(String line) {
        if (line.length() == 0) {
            return "";
        }
        StringBuilder output = new StringBuilder();
        char prev = line.charAt(0);
        int count = 1;
        for (int i = 1; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == prev) {
                count++;
            } else {
                // the run ended: emit "count char" and start a new run
                output.append(count).append(' ').append(prev).append(' ');
                prev = c;
                count = 1;
            }
        }
        // emit the final run, without a trailing space
        output.append(count).append(' ').append(prev);
        return output.toString();
    }
}
Input
(inp.txt)
4
+++===!!!!
777777......TTTTTTTTTTTT
(AABBC)
3.1415555
Output
3 + 3 = 4 !
6 7 6 . 12 T
1 ( 2 A 2 B 1 C 1 )
1 3 1 . 1 1 1 4 1 1 4 5
There are two different problems you need to address, and I think it will help to address them separately. The first is reading the input. It's not clear to me whether you are going to prompt for it, or whether it is coming from the console or a file. Either way, initialize a Scanner, use nextInt() to get the number of lines, call nextLine() to clear the rest of that line, and then run a for loop from 0 up to the number of lines, reading each line (using nextLine()) into a String variable. To make sure that works, I would suggest printing out the unaltered string to check that what comes out is what went in.
The other task is to convert a given input String into the desired output String. You can work on that independently and pull things together later. You will want a method that takes a String and returns a String; you can test it by passing the sample strings and checking that it gives back the desired output. Start with result = "". Loop over the characters of the String using charAt, keeping variables for the currentCharacter and currentCount. When the character changes or the end of the string is reached, concatenate the count and character onto the result and reset the count and current character as needed. After the loop, return the result.
Once the two tasks are solved, pull them together by printing out what the method returns for the input string as opposed to the input string itself.
I think that gives you direction on the method to use. It's not a full-blown solution, but that's not what you requested or needed.
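For readers who want to see the two tasks side by side, here is a rough sketch of that plan, not a definitive solution. The class name RunLengthSketch and the method names encode and encodeAll are invented for illustration, and the input is read from an in-memory string rather than a file so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class RunLengthSketch {
    // Task 2: turn one input line into "count char" pairs separated by spaces.
    static String encode(String line) {
        StringBuilder result = new StringBuilder();
        int i = 0;
        while (i < line.length()) {
            char currentCharacter = line.charAt(i);
            int currentCount = 0;
            // Advance over the whole run of identical characters.
            while (i < line.length() && line.charAt(i) == currentCharacter) {
                currentCount++;
                i++;
            }
            if (result.length() > 0) {
                result.append(' ');
            }
            result.append(currentCount).append(' ').append(currentCharacter);
        }
        return result.toString();
    }

    // Task 1: read N with nextInt(), clear the rest of that line, then read N lines.
    static List<String> encodeAll(Scanner in) {
        int n = in.nextInt();
        in.nextLine();
        List<String> output = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            output.add(encode(in.nextLine()));
        }
        return output;
    }

    public static void main(String[] args) {
        // Assumption: a string stands in for the input file here.
        Scanner in = new Scanner("2\n+++===!!!!\n(AABBC)\n");
        for (String line : encodeAll(in)) {
            System.out.println(line);
        }
    }
}
```

Keeping encode separate from the reading code means it can be checked against the sample strings on its own before any file handling is involved.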

What's an elegant way to parse this text in java?

Disclaimer:
The parsing problem described here is very simple. This question does not simply ask for a way to do the parsing, which is almost straightforward. Instead, it asks for an elegant way. That elegant way would probably be one that does not first read line by line and then parse each line on its own, as this should not be necessary. However, is this elegant way possible with ready-to-use standard classes?
Question:
I have to parse text of the following form in java (there is more than these 3 records; records can have way more lines than these examples):
5
Dominik 3
Markus 3 2
Reiner 1 2
Samantha 4
Thomas 3
4
Babette 1 4
Diana 3 4
Magan 2
Thomas 2 4
The first number n is the number of lines in the record directly following. Each record consists of a name and then 0 to n integers.
I thought that using java.util.Scanner is a natural choice, but it leads to the nastiness that when using hasNextInt() and hasNext() to determine if a line is started, I can't distinguish if a read number is the header of the next record or it's the last number behind the last name of the previous record. Example from above:
...
Thomas 3
4
...
Here, I don't know how to tell whether the 3 or the 4 is a header or belongs to the current line of Thomas.
Sure I can first read line by line, put them into another Scanner, and then read them again, but this effectively parses the whole data twice, which looks ugly to me. Is there a better way?
I would need something like a flag which tells me if a line break was encountered during the last delimiter skipping operation.
Read the file using FileReader and BufferedReader, then check each line:
outer loop: while readLine() is not null
if the line matches \d+, parse the number and put it into count
inner loop: from 0 to count, read the next line and do what you want to do with it
Instead of reading into a separate scanner, you can read to the end of the line and use String.split, like this:
while (scanner.hasNextInt()) {
    int count = scanner.nextInt();
    for (int i = 0; i != count; i++) {
        if (!scanner.hasNext()) throw new IllegalStateException("expected a name");
        String name = scanner.next();
        List<Integer> numbers = new ArrayList<Integer>();
        // nextLine() returns the rest of the current line; trim it so that the
        // leading space, or a line with no numbers, produces no empty tokens
        String rest = scanner.nextLine().trim();
        if (!rest.isEmpty()) {
            for (String numStr : rest.split("\\s+")) {
                numbers.add(Integer.parseInt(numStr));
            }
        }
        ... // Do something with name and numbers
    }
}
This approach avoids the need to tell the last int on a line from the first int on the next line by calling nextLine() after reading a name, i.e. in the middle of reading a line.
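As a self-contained illustration of reading the name with next() and the remainder of the line in one call, the following sketch parses two small records from an in-memory string. The class name RecordParserSketch, the parse method, and the choice of a list of maps as the result type are assumptions made for the example, not part of the answer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

final class RecordParserSketch {
    // Each record becomes a map from name to the integers that follow it on the same line.
    static List<Map<String, List<Integer>>> parse(Scanner scanner) {
        List<Map<String, List<Integer>>> records = new ArrayList<>();
        while (scanner.hasNextInt()) {
            int count = scanner.nextInt(); // header: number of lines in this record
            Map<String, List<Integer>> record = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                String name = scanner.next();
                // Rest of the current line; empty if the name is the last token of the input.
                String rest = scanner.hasNextLine() ? scanner.nextLine().trim() : "";
                List<Integer> numbers = new ArrayList<>();
                if (!rest.isEmpty()) {
                    for (String token : rest.split("\\s+")) {
                        numbers.add(Integer.parseInt(token));
                    }
                }
                record.put(name, numbers);
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "2\nDominik 3\nMarkus 3 2\n1\nMagan\n";
        System.out.println(parse(new Scanner(input)));
    }
}
```

Because the remainder of each line is consumed as a unit, the next token the scanner sees is always either a name or the header of the following record.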
File file = new File("records.txt");
BufferedReader reader = new BufferedReader(new FileReader(file));
String line = null;
/* Read file one line at a time */
while ((line = reader.readLine()) != null) {
    int noOfRecords = Integer.parseInt(line);
    /* read the next n lines in a loop */
    while (noOfRecords != 0) {
        line = reader.readLine();
        String[] tokens = line.split(" ");
        noOfRecords--;
        // do what you need to do with names and numbers
    }
}
Here we're reading one line at a time, so the first line we read will be an int (call it n); from there we read the next n lines in an inner loop. Once the inner loop is done, control returns to the outer loop, and the next line read is definitely another int or EOF. That way you don't have to deal with integer parsing exceptions, and all the lines are read only once :)

Looking for best way to do this program [closed]

I have two .txt files, first one looks like this :
XXXXXXX
XX0X0XX
XX000XX
XXXX0XX
XXXXXXX
and second like this :
.1..
.111
...1
....
The first file needs to be seen as a hole made out of zeros and the second as a figure made out of ones. I need to write an algorithm that reads both files and checks whether the "figure" from the second file fits into the "hole" from the first one. What do you think is the most efficient way to do that?
I think the best way is to read both files into arrays and then compare the arrays, but this is just my first thought.
Also final file should look like this :
XXXXXXX
XX1X0XX
XX111XX
XXXX1XX
XXXXXXX
One way could be to:
Load the first file in one array
Iterate over the second file and compare what you have in the array with what you have read in the file.
You can read both files line by line. Pass the nth line of each file to the following method:
public static boolean isFit(String a, String b) {
    return a.replace('X', '.').replace('0', '1').equals(b);
}
If it returns false, it is a mismatch; if it never returns false, you can say at the end that it is a match.
Here's a small method I threw together that determines whether or not a particular line of the figure matches a particular line in the hole.
// requires: import java.util.Arrays;
public static int figureLineFits(char[] figure, char[] hole) {
    // Since figure max length per line is 4 and hole is 5
    // we have to try to match it on either one end or the other.
    char[] hole1 = Arrays.copyOfRange(hole, 0, hole.length - 1);
    char[] hole2 = Arrays.copyOfRange(hole, 1, hole.length);
    // We get rid of the extra holes in the hole array.
    for (int i = 0; i < 4; i++) {
        if (figure[i] == '.') {
            if (hole1[i] == '0') hole1[i] = 'X';
            if (hole2[i] == '0') hole2[i] = 'X';
        }
    }
    // Convert the arrays to Strings. new String(char[]) copies the characters;
    // calling toString() on a char[] would not give its contents.
    String compFigure = new String(figure);
    // Replace the 0s with 1s and Xs with .s in the hole strings.
    // Strings are immutable, so the result of replace() has to be kept.
    String compHole1 = new String(hole1).replace('0', '1').replace('X', '.');
    String compHole2 = new String(hole2).replace('0', '1').replace('X', '.');
    // Set up some comparison booleans.
    boolean leftComparison = compFigure.equals(compHole1);
    boolean rightComparison = compFigure.equals(compHole2);
    // Do some boolean logic to determine how the figure matches the hole.
    // Will return 3 if figure can be matched on both sides.
    // Will return 1 if figure can be matched on left side.
    // Will return 2 if figure can be matched on right side.
    // Will return 0 if figure doesn't match on either side.
    if (leftComparison && rightComparison) return 3;
    if (leftComparison) return 1;
    if (rightComparison) return 2;
    return 0;
}
Then you read the first line of the figure and try to match it with the lines of the hole.
If you can match it (the figureLineFits function doesn't return 0), you can try to match the second line of the figure to the next line of the hole.
If that comparison doesn't return 0, you have to check whether the match is adequate. For example, if the first line returned 1 and the next one returned 2, the figure doesn't match. If the first line returned 3 and the second line returned either 1 or 2, the match is adequate, since the "3" means that it matches on both sides.
If the match is not adequate, go back to the first line of the figure and try to match it against the hole line after the one where it last matched, rather than continuing with the following figure lines, since the first figure line might also match the second hole line even though the second figure line did not.
Hopefully this will get your head going in the right direction.
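A different and simpler technique than the line-by-line matching above is brute force: try every offset of the figure over the hole and accept the first one where every 1 lands on a 0. The sketch below does that under some stated assumptions: both files have already been read into String arrays, the figure may not be rotated, and leftover 0 cells are allowed (as in the expected final file). The names FigureFitSketch, fitsAt, place, and stamp are invented for the example.

```java
final class FigureFitSketch {
    // True if every '1' of the figure, shifted by (dr, dc), lands on a '0' of the hole.
    static boolean fitsAt(String[] hole, String[] figure, int dr, int dc) {
        for (int r = 0; r < figure.length; r++) {
            for (int c = 0; c < figure[r].length(); c++) {
                if (figure[r].charAt(c) != '1') continue;
                int hr = r + dr, hc = c + dc;
                boolean inside = hr >= 0 && hr < hole.length && hc >= 0 && hc < hole[hr].length();
                if (!inside || hole[hr].charAt(hc) != '0') return false;
            }
        }
        return true;
    }

    // Copies the hole and writes '1' wherever the shifted figure has a '1'.
    static String[] stamp(String[] hole, String[] figure, int dr, int dc) {
        String[] result = new String[hole.length];
        for (int r = 0; r < hole.length; r++) {
            char[] row = hole[r].toCharArray();
            int fr = r - dr; // figure row that covers this hole row, if any
            if (fr >= 0 && fr < figure.length) {
                for (int c = 0; c < figure[fr].length(); c++) {
                    if (figure[fr].charAt(c) == '1') row[c + dc] = '1';
                }
            }
            result[r] = new String(row);
        }
        return result;
    }

    // Returns the filled hole for the first offset that fits, or null if none does.
    static String[] place(String[] hole, String[] figure) {
        int holeWidth = 0, figureWidth = 0;
        for (String row : hole) holeWidth = Math.max(holeWidth, row.length());
        for (String row : figure) figureWidth = Math.max(figureWidth, row.length());
        for (int dr = 1 - figure.length; dr < hole.length; dr++) {
            for (int dc = 1 - figureWidth; dc < holeWidth; dc++) {
                if (fitsAt(hole, figure, dr, dc)) return stamp(hole, figure, dr, dc);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] hole = { "XXXXXXX", "XX0X0XX", "XX000XX", "XXXX0XX", "XXXXXXX" };
        String[] figure = { ".1..", ".111", "...1", "...." };
        String[] placed = place(hole, figure);
        if (placed == null) {
            System.out.println("figure does not fit");
        } else {
            for (String row : placed) System.out.println(row);
        }
    }
}
```

Negative offsets are tried as well, because the figure file can have empty rows or columns around the actual shape. For grids of this size the brute-force search is cheap; for large inputs a smarter search would be worth considering.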

Verifying unexpected empty lines in a file

Aside: I am using the penn.txt file for the problem. The link here is to my Dropbox but it is also available in other places such as here. However, I've not checked whether they are exactly the same.
Problem statement: I would like to do some word processing on each line of the penn.txt file which contains some words and syntactic categories. The details are not relevant.
Actual "problem" faced: I suspect that the file has some consecutive blank lines (which should ideally not be present), which I think the code verifies but I have not verified it by eye, because the number of lines is somewhat large (~1,300,000). So I would like my Java code and conclusions checked for correctness.
I've used a slightly modified version of the code for converting a file to a String and counting the number of lines in a string. I'm not sure about the efficiency of splitting, but it works well enough for this case.
File file = new File("final_project/penn.txt"); //location
System.out.println(file.exists());
//converting file to String
byte[] encoded = null;
try {
    encoded = Files.readAllBytes(Paths.get("final_project/penn.txt"));
} catch (IOException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}
String mystr = new String(encoded, StandardCharsets.UTF_8);
//splitting and checking "consecutiveness" of \n
for(int j=1; ; j++){
    String split = new String();
    for(int i=0; i<j; i++){
        split = split + "\n";
    }
    if(mystr.split(split).length==1) break;
    System.out.print("("+mystr.split(split).length + "," + j + ") ");
}
//counting using Scanner
int count=0;
try {
    Scanner reader = new Scanner(new FileInputStream(file));
    while(reader.hasNext()){
        count++;
        String entry = reader.next();
        //some word processing here
    }
    reader.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
System.out.println(count);
The number of lines in Gedit--if I understand correctly--matched the number of \n characters found at 1,283,169. I have verified (separately) that the number of \r and \r\n (combined) characters is 0 using the same splitting idea. The total splitting output is shown below:
(1283169,1) (176,2) (18,3) (13,4) (11,5) (9,6) (8,7) (7,8) (6,9) (6,10) (5,11) (5,12) (4,13) (4,14) (4,15) (4,16) (3,17) (3,18) (3,19) (3,20) (3,21) (3,22) (3,23) (3,24) (3,25) (2,26) (2,27) (2,28) (2,29) (2,30) (2,31) (2,32) (2,33) (2,34) (2,35) (2,36) (2,37) (2,38) (2,39) (2,40) (2,41) (2,42) (2,43) (2,44) (2,45) (2,46) (2,47) (2,48) (2,49) (2,50)
Please answer whether the following statements are correct or not:
From this, what I understand is that there is one instance of 50 consecutive \n characters and because of that there are exactly two instances of 25 consecutive \n characters and so on.
The last count (using Scanner) reading gives 1,282,969 which is an exact difference of 200. In my opinion, what this means is that there are exactly 200 (or 199?) empty lines floating about somewhere in the file.
Is there any way to separately verify this "discrepancy" of 200? (something like a set-theoretic counting of intersections maybe)
A partial answer to the last part of the question is as follows:
(Assuming the two statements in the question are true)
If, instead of printing the number of split parts, you print the number of occurrences of j consecutive \n characters, you get (simply subtracting 1):
(1283168,1) (175,2) (17,3) (12,4) (10,5) (8,6) (7,7) (6,8) (5,9) (5,10) (4,11) (4,12) (3,13) (3,14) (3,15) (3,16) (2,17) (2,18) (2,19) (2,20) (2,21) (2,22) (2,23) (2,24) (2,25) (1,26) (1,27) (1,28) (1,29) (1,30) (1,31) (1,32) (1,33) (1,34) (1,35) (1,36) (1,37) (1,38) (1,39) (1,40) (1,41) (1,42) (1,43) (1,44) (1,45) (1,46) (1,47) (1,48) (1,49) (1,50)
Note that for j>3, product of both numbers is <=50, which is your maximum. What this means is that there is a place with 50 consecutive \n characters and all the hits you are getting from 4 to 49 are actually part of the same.
However for 3, the maximum multiple of 3 less than 50 is 48 which gives 16 while you have 17 occurrences here. So there is an extra \n\n\n somewhere with non-\n character on both its 'sides'.
Now for 2 (\n\n), we can subtract 25 (coming from the 50 \ns) and 1 (coming from the separate \n\n\n) to obtain 175-26 = 149.
Accounting for the discrepancy, we should sum (2-1)*149 + (3-1)*1 + (50-1)*1, where the -1 is there because the first \n in each of these runs is already accounted for in the Scanner count. This sum is 200.
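One way to check this kind of reasoning independently of split is to scan the text once and build a histogram of run lengths of consecutive \n characters. The sketch below is an alternative technique, not the method used in the question; the class name NewlineRunSketch and both method names are made up, and it assumes the text does not start with a newline.

```java
import java.util.Map;
import java.util.TreeMap;

final class NewlineRunSketch {
    // Maps each run length (number of consecutive '\n') to how many such runs occur.
    static Map<Integer, Integer> runLengths(String text) {
        Map<Integer, Integer> histogram = new TreeMap<>();
        int run = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                run++;
            } else if (run > 0) {
                histogram.merge(run, 1, Integer::sum);
                run = 0;
            }
        }
        if (run > 0) {
            histogram.merge(run, 1, Integer::sum); // run at the very end of the text
        }
        return histogram;
    }

    // A run of k newlines following some text contains k - 1 empty lines.
    // Assumption: the text does not begin with a newline.
    static int emptyLines(Map<Integer, Integer> histogram) {
        int total = 0;
        for (Map.Entry<Integer, Integer> entry : histogram.entrySet()) {
            total += (entry.getKey() - 1) * entry.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> histogram = runLengths("a\nb\n\nc\n\n\nd\n");
        System.out.println(histogram + " empty lines: " + emptyLines(histogram));
    }
}
```

Applied to the contents of penn.txt, a histogram with 149 runs of length 2, one run of length 3, and one run of length 50 would give 149 + 2 + 49 = 200 empty lines, which is the figure derived above.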

Novice programmer needs advice: "String index out of range" - Java

I'm pretty new to programming and I'm getting an error which I'm sure is an easy fix for more experienced people.
Here is what I have:
import java.io.*;
import java.util.Scanner;
public class ReadNamesFile
{
    public static void main(String[] args) throws IOException {
        // make the names.csv comma-separated-values file available for reading
        FileReader f = new FileReader("names.csv");
        BufferedReader r = new BufferedReader(f);
        //
        String lastName="unknown", firstName="unknown", office="unknown";
        // get first line
        String line = r.readLine();
        // process lines until end-of-file occurs
        while ( line != null )
        {
            // get the last name on the line
            //
            // position of first comma
            int positionOfComma = line.indexOf(",");
            // extract the last name as a substring
            lastName = line.substring(0,positionOfComma);
            // truncate the line removing the name and comma
            line = line.substring(positionOfComma+1);
            // extract the first name as a substring
            firstName = line.substring(0,positionOfComma);
            // truncate the line removing the name and comma
            line = line.substring(positionOfComma+1);
            // extract the office number as a substring
            office = line.substring(0,positionOfComma);
            // truncate the line removing the name and comma
            line = line.substring(positionOfComma+2);
            //
            //
            //
            // display the information about each person
            System.out.print("\nlast name = "+lastName);
            System.out.print("\t first name = "+firstName);
            System.out.print("\t office = "+office);
            System.out.println();
            //
            // get the next line
            line = r.readLine();
        }
    }
}
Basically, it finds the last name, first name and office number in a .csv file and prints them out.
When I compile I don't get any errors but when I run it I get:
java.lang.StringIndexOutOfBoundsException: String index out of range: 7
at java.lang.String.substring(String.java:1955)
at ReadNamesFile.main(ReadNamesFile.java:34)
Before trying to do the office number part, the first two (last and first name) printed out fine but the office number doesn't seem to work.
Any ideas?
Edit: Thanks for all the posts guys, I still can't really figure it out though. Can someone post something really dumbed down? I've been trying to fix this for an hour now and I can't get it.
Let's work through an example to see what is wrong with your code.
E.g. line: Overflow,stack
{ length: 14 }
Taking your program statements line by line:
int positionOfComma = line.indexOf(","); // returns 8
lastName = line.substring(0,positionOfComma); // the end index is exclusive, so this gives Overflow
Now lastName is Overflow and positionOfComma is 8.
line = line.substring(positionOfComma+1);
Now line is stack.
firstName = line.substring(0,positionOfComma);
This asks for the substring from 0 to 8, but stack has a length of only 5, so it throws a "String index out of range" exception. I hope this shows where the code goes wrong.
From JavaDoc:
(StringIndexOutOfBoundsException) - Thrown by String methods to
indicate that an index is either negative or greater than the size of
the string.
In your case, one of your calls to .substring is being given an index that is greater than the length of the string. If line #34 is a comment, then it's the line above #34.
You need to:
a) Make sure you handle the case if you DON'T find a comma (i.e. if you cannot find and extract a lastName and/or firstName string)
b) Make sure the value of "positionOfComma + N" never exceeds the length of the string.
A couple of "if" blocks and/or "continue" statements will do the trick nicely ;-)
You correctly find positionOfComma, but then that logic applies to the original value of line. When you remove the last name and comma, positionOfComma is no longer correct as it applies to the old value of line.
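One way to avoid the stale index is to look up each comma relative to the previous one instead of truncating the line. The sketch below assumes each line has exactly three fields (last name, first name, office), which may not match the real names.csv; the class name NameLineSketch and the parse method are hypothetical.

```java
import java.util.Arrays;

final class NameLineSketch {
    // Returns {lastName, firstName, office}, or null if the line has fewer than two commas.
    static String[] parse(String line) {
        int first = line.indexOf(',');
        // Search for the second comma starting just after the first one.
        int second = (first < 0) ? -1 : line.indexOf(',', first + 1);
        if (second < 0) {
            return null;
        }
        String lastName = line.substring(0, first);
        String firstName = line.substring(first + 1, second);
        String office = line.substring(second + 1); // everything after the second comma
        return new String[] { lastName, firstName, office };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("Overflow,Stack,101")));
        System.out.println(Arrays.toString(parse("Overflow,stack")));
    }
}
```

Because every index refers to the original, unmodified line, no position ever goes stale, and a line with too few commas is reported through the null return instead of an exception.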
int positionOfComma = line.indexOf(",");
This line of code might not find a comma, and then positionOfComma will be -1. Next you call substring with (0,-1). Eeek, no wonder it gives a StringIndexOutOfBoundsException. Use something like:
int positionOfComma = line.indexOf(",");
if (positionOfComma == -1)
{
    positionOfComma = 0; // no comma found, fall back to 0
}
You do have to do lots of checking of things sometimes, especially when the data is whacked :(
http://download.oracle.com/javase/1.4.2/docs/api/java/lang/String.html#indexOf(java.lang.String)
PS I'm sure someone clever can make my coding look shabby but you get the point I hope :)
