I Am Not Getting the Result I Expect Using readLine() in Java

I am using the code snippet below; however, it's not working quite as I understand it should.
public static void main(String[] args) {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String line;
try {
line = br.readLine();
while(line != null) {
System.out.println(line);
line = br.readLine();
}
} catch (IOException e) {
e.printStackTrace();
}
}
From reading the Javadoc about readLine() it says:
Reads a line of text. A line is considered to be terminated by any one of a line feed (\n), a carriage return (\r), or a carriage return followed immediately by a linefeed.
Returns:
A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
Throws:
IOException - If an I/O error occurs
From my understanding of this, readLine should return null the first time no input is entered other than a line termination, like \r. However, this code just ends up looping infinitely. After debugging, I have found that instead of null being returned when just a termination character is entered, it actually returns an empty string (""). This doesn't make sense to me. What am I not understanding correctly?

From my understanding of this, readLine should return null the first time no input is entered other than a line termination, like '\r'.
That is not correct. readLine will return null if the end of the stream is reached. That is, for example, if you are reading a file and the file ends, or if you're reading from a socket and the socket closes.
But if you're simply reading the console input, hitting the return key on your keyboard does not constitute an end of stream. It's simply a character that is returned (\n or \r\n depending on your OS).
So, if you want to break on both the empty string and the end of line, you should do:
while (line != null && !line.equals(""))
Also, your current program should work as expected if you pipe some file directly into it, like so:
java -cp . Echo < test.txt
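For completeness, a minimal sketch of the full program with both checks; the class name Echo matches the pipe example above:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class Echo {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        String line;
        // Stop on end of stream (null) or on an empty line (user just pressed Enter).
        while ((line = br.readLine()) != null && !line.isEmpty()) {
            System.out.println(line);
        }
    }
}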

No input is not the same as the end of the stream. You can usually simulate the end of the stream in a console by pressing Ctrl+D (AFAIK some systems use Ctrl+Z instead). But I guess this is not what you want, so it's better to test for empty strings in addition to null.

There's a nice Apache Commons Lang library which has a good API for common :) actions. You could statically import StringUtils and use its isNotEmpty(String) method to get:
while(isNotEmpty(line)) {
System.out.println(line);
line = br.readLine();
}
It might be useful someday :) There are also other useful classes in this library.
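For example, a small self-contained sketch with the static import (this assumes commons-lang3 is on the classpath; in older versions the package is org.apache.commons.lang):

import static org.apache.commons.lang3.StringUtils.isNotEmpty;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class EchoWithStringUtils {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        String line = br.readLine();
        // isNotEmpty(null) is false, so this also covers the end-of-stream case.
        while (isNotEmpty(line)) {
            System.out.println(line);
            line = br.readLine();
        }
    }
}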

Related

LineIterator.hasNext() throws exception if last line of the file is not empty

I have a piece of code that looks like this (in Java), which uses org.apache.commons.io.IOUtils.lineIterator:
LineIterator iterator = lineIterator(someFile, defaultCharset());
....
while (iterator.hasNext()) {
    process(iterator);
}
private String process(LineIterator iterator, SomeComplexClass someClass) {
    while (iterator.hasNext()) {
        // do something using iterator.nextLine()
    }
}
This works perfectly if my input file has an empty last row.
Input file:
Line1
Line2
(nothing here. just empty line)
But if I have my input file without an empty line, then lineIterator.hasNext() throws exception saying "java.lang.IllegalStateException: Stream already closed"
Input file:
Line1
Line2 //there is no empty line after this
Is this a known behaviour of LineIterator?
Please help me understand why it is happening like this.
Update: I don't see close() or LineIterator.closeQuietly(iterator) in the code. Can that cause the issue?
Open your file in a text editor which can show line numbers, like Notepad++. Then, if the last line is empty, delete that last line.
A blank line does not mean there is no line.
If you look at the source of LineIterator.hasNext(), it seems to have a bug. To work correctly, it should set cachedLine to null when it reaches the end of the file, but instead it only sets a "finished" flag. Unfortunately, at the start of the method, cachedLine is checked before the finished flag, so you get true even if the end of the file has been reached.
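Regarding the update about close(): it is worth closing the iterator regardless. A minimal sketch of the usual pattern, assuming commons-io's FileUtils.lineIterator and a UTF-8 encoding (both assumptions, not taken from the original code):

import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

public class LineIteratorExample {
    public static void process(File someFile) throws IOException {
        LineIterator iterator = FileUtils.lineIterator(someFile, "UTF-8");
        try {
            while (iterator.hasNext()) {
                String line = iterator.nextLine();
                // do something with line
            }
        } finally {
            // Always release the underlying reader, even if processing fails.
            LineIterator.closeQuietly(iterator);
        }
    }
}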

Java BufferedWriter Creating Null Characters

I've been using Java's BufferedWriter to write to a file to parse out some input. When I open the file afterwards, however, there seem to be added null characters. I tried specifying the encoding as "US-ASCII" and "UTF8", but I get the same result. Here's my code snippet:
Scanner fileScanner = new Scanner(original);
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), "US-ASCII"));
while(fileScanner.hasNextLine())
{
String next = fileScanner.nextLine();
next = next.replaceAll(".*\\x0C", ""); //remove up to ^L
out.write(next);
out.newLine();
}
out.flush();
out.close();
Maybe the issue isn't even with the BufferedWriter?
I've narrowed it down to this code block because if I comment it out, there are no null-characters in the output file. If I do a regex replace in VIM the file is null-character free (:%s/.*^L//g).
Let me know if you need more information.
Thanks!
EDIT:
hexdump of a normal line looks like:
0000000 5349 2a41 3030 202a
But when this code is run the hexdump looks like:
0000000 5330 2a49 4130 202a
I'm not sure why things are getting mixed up.
EDIT:
Also, even if the file doesn't match the regex and runs through that block of code, it comes out with null characters.
EDIT:
Here's a hexdump of the first few lines of a diff:
http://pastie.org/pastes/8964701/text
command was: diff -y testfile.hexdump expectedoutput.hexdump
The rest of the lines are different like the last two.
EDIT: Looking at the hexdump diff you gave, the only difference is that one has LF line endings (0A) and the other has CRLF line endings (0D 0A). All the other data in your diff is shifted ahead to accommodate the extra byte.
The CRLF is the default line ending on the OS you're using. If you want a specific line ending in your output, write the string "\n" or "\r\n".
Previously I noted that the Scanner doesn't specify a charset. It should specify the appropriate one that the input is known to be encoded in. However, this isn't the source of the unexpected output.
Scanner.nextLine() is eating the existing line endings.
The javadoc for nextLine states:
This method returns the rest of the current line, excluding any line separator at the end.
The javadoc for BufferedWriter.newLine explains:
Writes a line separator. The line separator string is defined by the system property line.separator, and is not necessarily a single newline ('\n') character.
In your case your system's default newline separator is "\n". The EDI file you are parsing uses "\r\n".
Using the system-defined newline separator isn't the appropriate thing to do in this case. The newline separator to use is dictated by the file format and should be put in a format-specific static constant somewhere.
Change "out.newLine();" to "out.write("\r\n");"
I think what is going on is the following:
All lines that contain ^L (FF) get modified to remove everything before the ^L, but in addition you have the side effect from observation 1 below that all \r (CR) characters also get removed. However, if a CR appears before a ^L, nextLine() treats that as a line too. Note how the number of CR + NL characters in the input file is 6, and the number in the output file is also 6, but in the output they're all NL, so the line with c gets preserved because it's treated as being on a different line than the ^L. Probably not what you want. See below.
Some observations
1. The source file is being generated on a system that uses \r\n to define a new line, and your program is being run on a system that does not. Because of this, all occurrences of 0x0D are going to be removed. This will make the two files different sizes even if there are no ^L characters.
2. But you probably overlooked #1 because vim will operate in DOS mode (recognize \r\n as a newline separator) or non-DOS mode (only \n) depending on what it reads when it opens the file, and it hides the fact from the user if it can. In fact, to test, I had to brute-force in a \r using ^v^m because I was editing on Linux using vim (more here).
3. Your means of testing is probably od -x (for hex, right)? But that outputs shorts, which is not what you want. Consider the following input file and output file, after your program runs, as viewed in vi.
Input file
a
b^M
c^M^M ^L
d^L
Output file
a
b
c
Well, maybe that's right; let's see what od has to say.
od -x of input File
0a61 0d62 630a 0d0d 0c20 640a 0a0c
od -x of output File
0a61 0a62 0a63 0a0a 000a
Huh, where did that null come from? But wait, from the man page of od:
-t type Specify the output format. type is a string containing one or more of the following kinds of type specifiers:
a Named characters (ASCII). Control characters are displayed using the following names:
-h, -x Output hexadecimal shorts. Equivalent to -t x2.
-a Output named characters. Equivalent to -t a.
Oh, OK, so instead use the -a option:
od -a of input
a nl b cr nl c cr cr sp ff nl d ff nl
od -a of output
a nl b nl c nl nl nl nl
Forcing Java to ignore \r
And finally, all that being said, you really have to overcome Java's implicit understanding that \r delimits a line, even contrary to the documentation. Even when explicitly setting the Scanner to use a \r-ignoring pattern, it still operates contrary to the documentation, and you must override that again by setting the delimiter (see below). I've found the following will probably do what you want by insisting on Unix line semantics. I also added some logic to not output a blank line.
public static void repl(File original,File file) throws IOException
{
Scanner fileScanner = new Scanner(original);
Pattern pattern1 = Pattern.compile("(?d).*");
fileScanner.useDelimiter("(?d)\\n");
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), "UTF8"));
while(fileScanner.hasNext(pattern1))
{
String next = fileScanner.next(pattern1);
next = next.replaceAll("(?d)(.*\\x0C)|(\\x0D)","");
if(next.length() != 0)
{
out.write(next);
out.newLine();
}
}
out.flush();
out.close();
}
With this change, the output above changes to:
od -a of input
a nl b cr nl c cr cr sp ff nl d ff nl
od -a of output
a nl b nl
Stuart Caie provided the answer; this is for anyone looking for code to avoid these characters.
The basic issue is that the original file uses one line separator character and the new file uses a different one.
One easy way is to find the original file's separator character and use the same one in the new file.
try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file)));
     Scanner fileScanner = new Scanner(original)) {
    String lineSep = null;
    boolean lineSepFound = false;
    while (fileScanner.hasNextLine()) {
        if (!lineSepFound) {
            MatchResult matchResult = fileScanner.match();
            if (matchResult != null) {
                lineSep = matchResult.group(1);
                if (lineSep != null) {
                    lineSepFound = true;
                }
            }
        } else {
            out.write(lineSep);
        }
        String next = fileScanner.nextLine();
        next = next.replaceAll(".*\\x0C", ""); //remove up to ^L
        out.write(next);
    }
} catch (IOException e) {
    e.printStackTrace();
}
Note: MatchResult matchResult = fileScanner.match(); provides the MatchResult for the last match performed. In our case we have used hasNextLine(), and Scanner uses its linePattern to find the next line; the Scanner.hasNextLine() source code finds the line separator, but unfortunately there is no way to get that separator back directly. So I have used their approach to get the lineSep only once, and used that lineSep for creating the new file.
Also, per your code, you would have an extra line separator at the end of the file. That is corrected here.
Let me know if that works.

\n Not working when reading from File to List and then to output. Java

I have a minor problem: the \n's in my file aren't working in my output. I tried two methods:
PLEASE NOTE:
*The text in the file here is a much simplified example. That is why I do not just use output.append("\n\n"); in the second method. Also, the \n's in the file are not always at the END of a line, i.e. a line in the file could be Stipulation 1.1\nUnder this Stipulation...etc.*
The \n's in the file need to work. Also, both JOptionPane.showMessageDialog(null, rules); and System.out.println(rules); give the same formatted output.
Text in File:
A\n
B\n
C\n
D\n
Method 1:
private static void setGameRules(File f) throws FileNotFoundException, IOException
{
rules = Files.readAllLines(f.toPath(), Charset.defaultCharset());
JOptionPane.showMessageDialog(null,rules);
}
Output 1:
A\nB\nC\nD\n
Method 2:
private static void setGameRules(File f) throws FileNotFoundException, IOException
{
rules = Files.readAllLines(f.toPath(), Charset.defaultCharset());
StringBuilder output = new StringBuilder();
for (String s : rules)
{
output.append(s);
output.append("\n\n");//these \n work but the ones in my file do not
}
System.out.println(output);
}
Output 2:
A\n
B\n
C\n
D\n
The character sequence \n is simply a human readable representation of an unprintable character.
When reading it from a file, you get two characters a '\' and an 'n', not the line break character.
As such, you'll need to replace the placeholders in your file with a 'real' line break character.
Using the method I mentioned earlier: s = s.replaceAll( "\\\\n", System.lineSeparator() ); is one way, I'm sure there are others.
Perhaps in readAllLines you can add the above line of code to do the replacement before, or as, you stick the line in the rules array.
Edit:
The reason this doesn't work the way you expect is because you're reading it from a file. If it was hardcoded into your class, the compiler would see the '\n' sequence and say "Oh boy! A line separator! I'll just replace that with (char)0x0A".
What do you mean by "it is not working"? In what way is it not working? Do you expect to see a line break? I am not sure whether you actually have the two characters '\' and 'n' at the end of each line, or the line feed character (0x0A). The reason your '\n' works in the Java source is that it is a way to escape the line feed character. Tell us a little about your input file: how is it generated?
The second thing I notice is that you print the text to the console in the second method. I am not certain that the JOptionPane will even display line breaks this way. I think it uses a JLabel; see Java: Linebreaks in JLabels? for that. The console does interpret \n as a line break.
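If the JOptionPane/JLabel rendering turns out to be the culprit, one commonly used workaround (just a sketch, not verified against your data) is to wrap the message in basic HTML, which Swing label components render:

import javax.swing.JOptionPane;

public class HtmlMessageExample {
    public static void main(String[] args) {
        // Swing labels render basic HTML, so <br> produces a visible line break.
        String message = "<html>Stipulation 1.1<br>Under this Stipulation...</html>";
        JOptionPane.showMessageDialog(null, message);
    }
}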
The final Answer looks like this:
private static void setGameRules(File f) throws FileNotFoundException, IOException {
rules = Files.readAllLines(f.toPath(), Charset.defaultCharset());
for(int i =0;i!=rules.size();i++){
rules.set(i, rules.get(i).replaceAll( "\\\\n","\n"));
}
}
As #Ray said, the \n in the file was just being read as the chars \ and n, not as the line separator \n.
I just added a for-loop to run through the list and replace them using:
rules.set(i, rules.get(i).replaceAll("\\\\n", "\n"));

Scanner's nextLine(), Only fetching partial

So, using something like:
for (int i = 0; i < files.length; i++) {
if (!files[i].isDirectory() && files[i].canRead()) {
try {
Scanner scan = new Scanner(files[i]);
System.out.println("Generating Categories for " + files[i].toPath());
while (scan.hasNextLine()) {
count++;
String line = scan.nextLine();
System.out.println(" ->" + line);
line = line.split("\t", 2)[1];
System.out.println("!- " + line);
JsonParser parser = new JsonParser();
JsonObject object = parser.parse(line).getAsJsonObject();
Set<Entry<String, JsonElement>> entrySet = object.entrySet();
exploreSet(entrySet);
}
scan.close();
// System.out.println(keyset);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
}
as one goes over a Hadoop output file, one of the JSON objects in the middle is breaking... because scan.nextLine() is not fetching the whole line before it brings it to split. I.e., the output is:
->0 {"Flags":"0","transactions":{"totalTransactionAmount":"0","totalQuantitySold":"0"},"listingStatus":"NULL","conditionRollupId":"0","photoDisplayType":"0","title":"NULL","quantityAvailable":"0","viewItemCount":"0","visitCount":"0","itemCountryId":"0","itemAspects":{ ... "sellerSiteId":"0","siteId":"0","pictureUrl":"http://somewhere.com/45/x/AlphaNumeric/$(KGrHqR,!rgF!6n5wJSTBQO-G4k(Ww~~
!- {"Flags":"0","transactions":{"totalTransactionAmount":"0","totalQuantitySold":"0"},"listingStatus":"NULL","conditionRollupId":"0","photoDisplayType":"0","title":"NULL","quantityAvailable":"0","viewItemCount":"0","visitCount":"0","itemCountryId":"0","itemAspects":{ ... "sellerSiteId":"0","siteId":"0","pictureUrl":"http://somewhere.com/45/x/AlphaNumeric/$(KGrHqR,!rgF!6n5wJSTBQO-G4k(Ww~~
Most of the above data has been sanitized (not the URL (for the most part) however... )
and the URL continues as:
$(KGrHqZHJCgFBsO4dC3MBQdC2)Y4Tg~~60_1.JPG?set_id=8800005007
in the file....
So it's slightly miffing.
This also is entry #112, and I have had other files parse without errors... but this one is screwing with my mind, mostly because I don't see how scan.nextLine() isn't working...
By debug output, the JSON error is caused by the string not being split properly.
And almost forgot, it also works JUST FINE if I attempt to put the offending line in its own file and parse just that.
EDIT:
Also blows up if I remove the offending line in about the same place.
Attempted with JVM 1.6 and 1.7
Workaround Solution:
BufferedReader scan = new BufferedReader(new FileReader(files[i]));
instead of scanner....
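For reference, a sketch of what that workaround might look like in context (the class and method names here are illustrative, and reading stops at the tab-split/JSON step, which is unchanged):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class LineDump {
    // BufferedReader.readLine() only treats \n, \r and \r\n as line breaks,
    // so characters such as U+0085, U+2028 or U+2029 in the data no longer split a line.
    static void dumpLines(File file) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(file));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(" ->" + line);
                // split on the first tab and hand the JSON part to the parser, as before
            }
        } finally {
            reader.close();
        }
    }
}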
Based on your code, the best explanation I can come up with is that the line really does end after the "~~" according to the criteria used by Scanner.nextLine().
The criteria for an end-of-line are:
Something that matches this regex: "\r\n|[\n\r\u2028\u2029\u0085]" or
The end of the input stream
You say that the file continues after the "~~", so let's put EOF aside and look at the regex. That will match any of the following:
The usual line separators:
<CR>
<NL>
<CR><NL>
... and three unusual forms of line separator that Scanner also recognizes.
0x0085 is the <NEL> or "next line" control code in the "ISO C1 Control" group
0x2028 is the Unicode "line separator" character
0x2029 is the Unicode "paragraph separator" character
My theory is that you've got one of the "unusual" forms in your input file, and this is not showing up in .... whatever tool it is that you are using to examine the files.
I suggest that you examine the input file using a tool that can show you the actual bytes of the file; e.g. the od utility on a Linux / Unix system. Also, check that this isn't caused by some kind of character encoding mismatch ... or trying to read or write binary data as text.
If these don't help, then the next step should be to run your application using your IDE's Java debugger, and single-step it through the Scanner.hasNextLine() and nextLine() calls to find out what the code is actually doing.
And almost forgot, it also works JUST FINE if I attempt to put the offending line in its own file and parse just that.
That's interesting. But if the tool you are using to extract the line is the same one that is not showing the (hypothesized) unusual line separator, then this evidence is not reliable. The process of extraction may be altering the "stuff" that is causing the problems.
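If you would rather check for those unusual separators from Java instead of od, here is a small sketch (it reads with the platform default charset, which is an assumption; adjust to the file's actual encoding):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class FindOddSeparators {
    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
            int ch;
            long pos = 0;
            while ((ch = reader.read()) != -1) {
                // Report NEL, LINE SEPARATOR and PARAGRAPH SEPARATOR, which Scanner
                // treats as line breaks but most editors do not show.
                if (ch == 0x0085 || ch == 0x2028 || ch == 0x2029) {
                    System.out.printf("U+%04X at char offset %d%n", ch, pos);
                }
                pos++;
            }
        }
    }
}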

How to skip rest of line using BufferedReader

In my program users frequently search a .txt file for certain information. To know if the right bit of data has been found I first check each line to see if it starts with a special character signalling the start of a group of data, something like this:
//one character has so far been read
if(character == '#'){
//continue to examine data
}else{
//skip the rest of the line
}
The problem I'm having is how to actually "skip the rest of the line", if the line did not start with my special character of choice.
As per complaints about insufficient information: I am indeed using a while loop to read each line
You can just do the action inside the if:
BufferedReader csvFile = new BufferedReader(
        new InputStreamReader(inputStream));
String csvLine;
while ((csvLine = csvFile.readLine()) != null) {
    if (csvLine.startsWith("#")) { // unlike charAt(0), this is also safe for empty lines
        // do # data action here
    }
}
Use the Scanner class and its nextLine() method; it will help you a lot.
If that seems a bit difficult, then read the file line by line and use a regex pattern to check for your required pattern on each line of the file.
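A sketch of that second suggestion, reading line by line with Scanner and checking each line against a regex (the '#' pattern mirrors the question; the class name is illustrative):

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Pattern;

public class GroupScanner {
    public static void main(String[] args) throws FileNotFoundException {
        // Lines of interest start with '#'; everything else is skipped implicitly,
        // because nextLine() has already consumed the rest of the line.
        Pattern groupStart = Pattern.compile("^#.*");
        try (Scanner scanner = new Scanner(new File(args[0]))) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                if (groupStart.matcher(line).matches()) {
                    // continue to examine data
                    System.out.println(line);
                }
            }
        }
    }
}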
