How to read CR LF line with BufferedReader?

I am trying to create a simple java server application, but I am struggling to read the user's input with BufferedReader correctly.
The problem is that the only line delimiter is CR LF ("\r\n"), so I can't use the standard readLine() method, which also breaks on a lone "\r" or "\n". Therefore, I tried to implement my own method:
private String readCRLFLine(BufferedReader in) {
    StringBuilder result = new StringBuilder();
    char cr = 'a'; // initialize cr
    char lf;
    try {
        while (((lf = (char) in.read()) != '\n') && (cr != '\r')) {
            cr = lf;
            result.append(lf);
        }
        result.deleteCharAt(result.length() - 1); // delete \r from the result
    } catch (IOException ex) {
        // handle the exception here
    }
    return result.toString();
}
Now, when I try to get and print the result:
while ((userMessage = readCRLFLine(inputStream)) != null) {
    System.out.println(userMessage);
    break;
}
...it only prints the sequence of characters up to the first line-break character ("\r" or "\n"), and the rest of the input is read by the second call to readCRLFLine(inputStream), which should already be reading the next input.
I want to be able to handle inputs like:
"abc\rabc\n\r\abc\r\n"
In this case userMessage should be:
"abc\rabc\n\r\abc"
I can't use Scanner and its useDelimiter("\r\n") method, because I need to set a timeout with clientSocket.setSoTimeout(TIMEOUT) and, from what I know, this is not possible with Scanner.
What is wrong with my "readCRLFLine" method? Is there any other solution to this problem?
Any help would be appreciated.

Your method is incorrect because the while condition is not what you actually need.
It says "As long as the currently read character is different than \n and the previous character is different than \r, continue reading".
But that means that when you read a \n, that condition is false, because lf != '\n' is false. So it will stop on the first \n. It will also stop on any character that follows a \r, because then cr != '\r' is going to be false making the combined condition false.
You want to stop the loop when the current character is \n and the previous character is \r. That means your while condition is supposed to be !(lf == '\n' && cr == '\r').
Now pay attention to De Morgan's law: "not (A and B)" is equivalent to "(not A) or (not B)". You mistakenly used AND instead of OR. Change your condition to:
while (((lf = (char) in.read()) != '\n') || (cr != '\r')) {
...
}
By the way, you should probably check that the value you read from in is not negative before you convert it to char and compare it. A negative value indicates that the client closed the connection, and you'll get into an endless loop if that happens.
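Putting both points together, here is a minimal sketch of the method with the corrected stop condition and an end-of-stream check. The class name CrLfLines and the choice to return null when the stream ends with nothing read are assumptions, not part of the original code. IOException is left to propagate, so a SocketTimeoutException (which is an IOException) reaches the caller.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

final class CrLfLines {

    /**
     * Reads characters up to the next "\r\n" pair and returns them without the pair.
     * A lone '\r' or '\n' is kept as ordinary content.
     * Returns null if the stream ends before any character was read.
     */
    static String readCRLFLine(BufferedReader in) throws IOException {
        StringBuilder result = new StringBuilder();
        int prev = -1; // previously read character, -1 while there is none
        int cur;
        while ((cur = in.read()) != -1) { // -1 means end of stream
            if (prev == '\r' && cur == '\n') {
                result.setLength(result.length() - 1); // drop the '\r' appended earlier
                return result.toString();
            }
            result.append((char) cur);
            prev = cur;
        }
        // Stream ended without a final "\r\n": return what was read, or null if nothing.
        return result.length() == 0 ? null : result.toString();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("abc\rabc\n\rabc\r\nnext\r\n"));
        String line;
        while ((line = readCRLFLine(in)) != null) {
            System.out.println(line.length() + " characters read");
        }
    }
}
```

Storing the result of read() in an int and comparing it with -1 before any cast is what prevents the endless loop when the client disconnects.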

Related

How to ignore some of Java's Files.lines end-of-line delimiters

Java's Files.lines method reads all lines from a file as a Stream, breaking the file into lines at the following delimiters:
\u000D followed by \u000A, CARRIAGE RETURN followed by LINE FEED
\u000A, LINE FEED
\u000D, CARRIAGE RETURN
I have files that contain the odd occurrence of \u000D, CARRIAGE RETURN which I do not want to treat as a new line, to be consistent with the way that grep (Windows) doesn't treat just a single \u000D as a newline marker. I want to process the lines in the file as a stream, but is there a way I can get a stream that doesn't use a single \u000D as a newline marker, using just CR/LF or LF? I have to use Java 8.
My problem is that I am getting grep to return the line number with its matches, but because of the difference in EOL delimiters, Files.lines.skip(numLines) doesn't then align with the same line if I try to skip to the line number returned by grep.

Let's assume that you are doing byte-wise input ...
A scalable / efficient solution avoids holding the entire file in memory, and / or creating a string object for each line of input that you skip. This is one way to do it.
File f = ...
InputStream is = new BufferedInputStream(new FileInputStream(f));
int lineCounter = 1;
int wantedLine = 42;
int b = 0;
// Advance until we are positioned at the start of the wanted line (or hit end of file).
while (lineCounter < wantedLine && b != -1) {
    do {
        b = is.read();
        if (b == '\n') {
            lineCounter++; // only LF ends a line; a lone CR is ignored
        }
    } while (b != -1 && b != '\n');
}
if (lineCounter == wantedLine) {
    // do stuff
}
Notes:
I know this is a bit clunky. And it would be possible to do away with the nested loop ... but this code is intended to be "illustrative" of an approach.
You could possibly get better performance by using ByteBuffer, but it makes the code more complicated, especially if you are unfamiliar with the Buffer APIs.
You could do something similar with a BufferedReader.
For production quality code, you should use try with resources to manage the InputStream resource.
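As a rough illustration of the Reader-based variant mentioned in the notes, the sketch below ends a line only at LF, strips a CR that directly precedes it, and leaves any other CR inside the line. The class and method names (LfLines, readLine, lineAt) are made up for this example, and the code uses only Java 8 APIs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class LfLines {

    /**
     * Reads one line terminated by "\n" or "\r\n". A lone '\r' does not end the line.
     * Returns null when the reader is already at end of input.
     */
    static String readLine(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        boolean sawAny = false;
        int c;
        while ((c = in.read()) != -1) {
            sawAny = true;
            if (c == '\n') {
                int len = sb.length();
                if (len > 0 && sb.charAt(len - 1) == '\r') {
                    sb.setLength(len - 1); // treat "\r\n" as a single terminator
                }
                return sb.toString();
            }
            sb.append((char) c);
        }
        return sawAny ? sb.toString() : null; // last line without a terminator
    }

    /** Returns the line with the given 1-based number, or null if the input is shorter. */
    static String lineAt(Reader in, int wantedLine) throws IOException {
        String line = null;
        for (int n = 1; n <= wantedLine; n++) {
            line = readLine(in);
            if (line == null) {
                return null;
            }
        }
        return line;
    }

    public static void main(String[] args) throws IOException {
        // Wrap file readers in a BufferedReader so single-character reads stay cheap.
        Reader in = new BufferedReader(new StringReader("one\rstill one\r\ntwo\nthree"));
        System.out.println(lineAt(in, 2));
    }
}
```

Because only LF advances the count, line numbers computed this way should line up with a tool that counts LF characters, which is the behaviour the question describes for grep.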

Try this (note that Files.readString requires Java 11 or later):
Stream.of(Files.readString(path).split("\r?\n"))
.filter(...

Enter a newline character in a text box to be passed into `Matcher.replaceAll()` [duplicate]

I am working on adding search / replace functionality to an android application.
I would like the user to be able to search and replace using regular expressions. The search functionality works correctly, however a newline character in the replacement string \n is interpreted as a literal 'n'.
Here is what my code is doing:
Pattern.compile(search.getText().toString()).matcher(text).replaceAll(replace.getText().toString())
Given
text is a CharSequence with contents A \n\n\nB (note the trailing space after 'A')
search being a textbox with the contents \s+\n
replace being a text box with contents \n.
I expect the result of the code above to be text = A\n\n\nB (trailing spaces removed).
Instead the result is text = An\n\nB. i.e. the \n in the replace text box is interpreted as a literal 'n'.
I would like to know what I can do in order to read the contents of replace, such that the \n is interpreted as a newline.
Note that I can achieve the desired result in the example by capturing the newline like with search = \s+(\n) and replace = $1. This is not, however, the issue.
For the purposes of this discussion I am only considering Unix line endings.
Edit:
using replace contents = \\n results in a literal '\n' being inserted.
i.e.
A
B
is transformed to
A\n
B

The approach suggested by Wiktor Stribiżew, found at stackoverflow.com/a/4298836, works for me.
Essentially, we need to parse the string and replace each escape sequence with the character it represents.
Here is the code I used:
private String unescape(final String input) {
    final StringBuilder builder = new StringBuilder();
    boolean isEscaped = false;
    for (int i = 0; i < input.length(); i++) {
        char current = input.charAt(i);
        if (isEscaped) {
            if (current == 't') {
                builder.append('\t');
            } else if (current == 'b') {
                builder.append('\b');
            } else if (current == 'r') {
                builder.append('\r');
            } else if (current == 'n') {
                builder.append('\n');
            } else if (current == 'f') {
                builder.append('\f');
            } else if (current == '\\' || current == '\'' || current == '"') {
                builder.append(current);
            } else {
                throw new IllegalArgumentException("Illegal escape sequence.");
            }
            isEscaped = false;
        } else if (current == '\\') {
            isEscaped = true;
        } else {
            builder.append(current);
        }
    }
    return builder.toString();
}
It isn't as complete or as correct as the solution in the answer linked above, but it appears to work correctly for my purposes.
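To see why unescaping is needed at all, here is a small sketch of how replaceAll() treats the raw text-box contents. In a replacement string, a backslash makes the following character literal, so the two characters backslash and n produce a plain 'n'. The class ReplacementDemo and the helper unescapeNewlines are hypothetical; the helper handles only \n and stands in for a full unescape routine like the one above.

```java
import java.util.regex.Pattern;

final class ReplacementDemo {

    /** Hypothetical, deliberately minimal stand-in: turns the two characters \ and n into a newline. */
    static String unescapeNewlines(String raw) {
        return raw.replace("\\n", "\n");
    }

    static String replace(String text, String search, String replacement) {
        return Pattern.compile(search).matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String text = "A \nB";    // 'A', a space, a real newline, 'B'
        String search = " +\\n";  // regex: one or more spaces followed by a newline
        String typed = "\\n";     // what the text box holds: a backslash followed by 'n'

        // Raw replacement: the backslash escapes 'n', so a literal 'n' is inserted.
        System.out.println(replace(text, search, typed).equals("AnB"));

        // Unescaped replacement: a real newline character is inserted.
        System.out.println(replace(text, search, unescapeNewlines(typed)).equals("A\nB"));
    }
}
```

Unescaping only the sequences you want to support keeps group references such as $1 working in the replacement, whereas Matcher.quoteReplacement() would turn them into literal text.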

The issue here seems to be that '\' is itself a special character. You have to double it for the regex engine to treat it as a literal character. Doubling every '\' should prove useful, like this:
yourString = yourString.replace("\\", "\\\\");

Trying to parse words into a HashMap from an input stream (character by character), however, spaces keep coming in?

I'm trying to parse words from an input file into a HashMap where every word maps to the number of times it occurs in the file. I must do this via a character stream (i.e., I have to traverse the file character by character). This mostly works; however, my parser sometimes seems to skip whitespace and store two words as a single string (e.g., themiddle, helloworld, etc.). Can anyone point out what I'm doing wrong? Also, is there any way to include words of the form (letter/digit).(letter/digit)., that is, a letter, then a dot, then a letter, repeated any number of times (in other words, abbreviations such as I.B.M)?
Here is a snippet of my code
int i;
while ((i=f.read()) != -1) {
    if (Character.isLetterOrDigit(i)) {
        st += (char)i;
    }
    else {
        st = st.toLowerCase();
        if (tokens.containsKey(st)) {
            int temp = tokens.get(st);
            tokens.put(st, temp+=1);
        }
        else {
            tokens.put(st, 1);
            st = "";
        }
    }
}
tokens.remove("");
return tokens;
}
Any help would be appreciated. The input is a FileReader object, by the way.

This is what sets your string back to empty:
st = "";
So it should be in the outer else, not inner, otherwise you're only setting it as empty when you find a new (not repeated) word. This will make the next word concatenate with the previous.
As for the second part, you could do something like
if (Character.isLetterOrDigit(i) || (!st.isEmpty() && (char) i == '.')) {
    st += (char) i;
}
Edit:
And then, to remove the last period, just check if the last character of st is a period when it gets to the else.
Another edit:
If you want exactly one letter before each period, you can check the string st backwards in the if. Or just process it after the split, in the else. Or even use Regular Expressions.
How you implement this is up to you.
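For reference, here is one way the pieces of this answer could fit together: the buffer is reset after every word, a period is accepted only inside a word, and trailing periods are stripped before counting. This is a sketch under those assumptions rather than the asker's actual method. The names WordCounter, count and flush are invented, and it does not enforce exactly one letter before each period.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

final class WordCounter {

    /** Counts lower-cased words read character by character from the given reader. */
    static Map<String, Integer> count(Reader in) throws IOException {
        Map<String, Integer> tokens = new HashMap<>();
        StringBuilder word = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            // A period is only part of a word if the word has already started.
            if (Character.isLetterOrDigit(c) || (c == '.' && word.length() > 0)) {
                word.append((char) c);
            } else {
                flush(word, tokens);
            }
        }
        flush(word, tokens); // the input may end without a trailing separator
        return tokens;
    }

    /** Records the buffered word (if any) and always clears the buffer afterwards. */
    private static void flush(StringBuilder word, Map<String, Integer> tokens) {
        int end = word.length();
        while (end > 0 && word.charAt(end - 1) == '.') {
            end--; // drop sentence-ending periods such as in "building."
        }
        if (end > 0) {
            tokens.merge(word.substring(0, end).toLowerCase(), 1, Integer::sum);
        }
        word.setLength(0);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(count(new StringReader("The middle of the I.B.M. building. Hello world")));
    }
}
```

Clearing the buffer unconditionally in flush is the fix for the concatenated words; the period handling is optional and can be dropped if abbreviations are not needed.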

Java Reader that reads line with line terminator [duplicate]

I'd like to iterate through a text file one line at a time, operate on the contents, and stream the result to a separate file. Textbook case for BufferedReader.readLine().
But: I need to glue my lines together with newlines, and what if the original file didn't have the "right" newlines for my platform (DOS files on Linux or vice versa)? I guess I could read ahead a bit in the stream and see what kind of line endings I find, even though that's really hacky.
But: suppose my input file doesn't have a trailing newline. I'd like to keep things how they were. Now I need to peek ahead to the next line ending before reading every line. At this point why am I using a class that gives me readLine() at all?
This seems like it should be a solved problem. Is there a library (or even better, core Java7 class!) that will just let me call a method similar to readLine() that returns one line of text from a stream, with the EOL character(s) intact?

Here's an implementation that reads char by char until it finds a line terminator. The reader passed in must support mark(), so if yours doesn't, wrap it in a BufferedReader.
public static String readLineWithTerm(Reader reader) throws IOException {
    if (! reader.markSupported()) {
        throw new IllegalArgumentException("reader must support mark()");
    }
    int code;
    StringBuilder line = new StringBuilder();
    while ((code = reader.read()) != -1) {
        char ch = (char) code;
        line.append(ch);
        if (ch == '\n') {
            break;
        } else if (ch == '\r') {
            reader.mark(1);
            ch = (char) reader.read();
            if (ch == '\n') {
                line.append(ch);
            } else {
                reader.reset();
            }
            break;
        }
    }
    return (line.length() == 0 ? null : line.toString());
}
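A short usage sketch may help show what this buys you: each line is transformed and written back with whatever terminator it originally had, including a missing one on the last line. The block repeats a condensed copy of the method so that it is self-contained. The class name KeepTerminators, the method copyUpperCased and the upper-casing step are illustrative assumptions only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Locale;

final class KeepTerminators {

    /** Condensed variant of the method above: returns one line including its terminator. */
    static String readLineWithTerm(Reader reader) throws IOException {
        StringBuilder line = new StringBuilder();
        int code;
        while ((code = reader.read()) != -1) {
            line.append((char) code);
            if (code == '\n') {
                break;
            }
            if (code == '\r') {
                reader.mark(1);            // remember the position right after '\r'
                int next = reader.read();
                if (next == '\n') {
                    line.append('\n');     // "\r\n" counts as one terminator
                } else if (next != -1) {
                    reader.reset();        // lone '\r': give the character back
                }
                break;
            }
        }
        return line.length() == 0 ? null : line.toString();
    }

    /** Upper-cases the text line by line while leaving every line ending untouched. */
    static void copyUpperCased(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in); // guarantees mark() support
        String line;
        while ((line = readLineWithTerm(reader)) != null) {
            out.write(line.toUpperCase(Locale.ROOT));
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        copyUpperCased(new StringReader("dos\r\nold mac\runix\nno newline"), out);
        System.out.println(out.toString().equals("DOS\r\nOLD MAC\rUNIX\nNO NEWLINE"));
    }
}
```

Since the terminator travels with the line, the caller never needs to guess which newline style the file uses or whether the last line ended with one.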

Update:
"But: I need to glue my lines together with newlines, and what if the original file didn't have the "right" newlines for my platform (DOS files on Linux or vice versa)? I guess I could read ahead a bit in the stream and see what kind of line endings I find, even though that's really hacky."
You can create a BufferedReader with a specified charset. So if the file is wacky, you'll have to supply the file's charset. Files.newBufferedReader(Path p, Charset cs)
"Is there a library (or even better, core Java7 class!) that will just let me call a method similar to readLine() that returns one line of text from a stream, with the EOL character(s) intact?"
If you're going to read a file, you have to know what charset it is. If you know what charset it is, then you don't need the EOL character to be "intact" since you can just add it on yourself.
From BufferedReader.readLine:
Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
Returns:
A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
So BufferedReader.readLine does not return any line-termination characters. If you want to preserve these characters, you can use the read method instead.
int size = 1000; // size of file
BufferedReader br = new BufferedReader(new FileReader("file.txt"));
char[] buf = new char[size];
br.read(buf, 0, size);
That is just a simple example, but if the file has line termination then it will show up in the buffer.
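One caveat with a single read(buf, 0, size) call is that it may return fewer characters than requested, and the file size in bytes need not match the number of characters. A hedged sketch of the usual loop is shown below. The names ReadAll and readAll are invented for this example, and the 4096-character buffer size is arbitrary.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class ReadAll {

    /** Reads everything from the reader; line terminators are kept exactly as they appear. */
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) { // n is the number of characters actually read
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = readAll(new StringReader("first\r\nsecond\nthird"));
        System.out.println(text.contains("\r\n"));
    }
}
```

This holds the whole input in memory, so it suits small files better than large ones.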

You should be using StreamTokenizer to get more detailed control over input parsing.
http://docs.oracle.com/javase/7/docs/api/java/io/StreamTokenizer.html
