Do any Java stream-input libraries preserve line ending characters? - java

I'd like to iterate through a text file one line at a time, operate on the contents, and stream the result to a separate file. Textbook case for BufferedReader.readLine().
But: I need to glue my lines together with newlines, and what if the original file didn't have the "right" newlines for my platform (DOS files on Linux or vice versa)? I guess I could read ahead a bit in the stream and see what kind of line endings I find, even though that's really hacky.
But: suppose my input file doesn't have a trailing newline. I'd like to keep things how they were. Now I need to peek ahead to the next line ending before reading every line. At this point why am I using a class that gives me readLine() at all?
This seems like it should be a solved problem. Is there a library (or even better, core Java7 class!) that will just let me call a method similar to readLine() that returns one line of text from a stream, with the EOL character(s) intact?

Here's an implementation that reads char by char until it finds a line terminator. The reader passed in must support mark(), so if yours doesn't, wrap it in a BufferedReader.
public static String readLineWithTerm(Reader reader) throws IOException {
    if (!reader.markSupported()) {
        throw new IllegalArgumentException("reader must support mark()");
    }
    int code;
    StringBuilder line = new StringBuilder();
    while ((code = reader.read()) != -1) {
        char ch = (char) code;
        line.append(ch);
        if (ch == '\n') {
            break;
        } else if (ch == '\r') {
            // peek at the next char: "\r\n" is one terminator, a lone "\r" is another
            reader.mark(1);
            ch = (char) reader.read();
            if (ch == '\n') {
                line.append(ch);
            } else {
                reader.reset();
            }
            break;
        }
    }
    return (line.length() == 0 ? null : line.toString());
}
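As a rough usage sketch, a method like this lets a copy loop change each line's content while writing the original terminator back untouched, including the case where the last line has no terminator at all. The class name `TerminatorPreservingCopy`, the helpers `copyTransformed` and `transform`, and the upper-casing step are made up for illustration; the reading method repeats the logic of the answer above so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Locale;

final class TerminatorPreservingCopy {

    // Same logic as readLineWithTerm above: returns the next line including
    // its terminator ("\n", "\r" or "\r\n"), or null at end of stream.
    static String readLineWithTerm(Reader reader) throws IOException {
        if (!reader.markSupported()) {
            throw new IllegalArgumentException("reader must support mark()");
        }
        StringBuilder line = new StringBuilder();
        int code;
        while ((code = reader.read()) != -1) {
            line.append((char) code);
            if (code == '\n') {
                break;
            }
            if (code == '\r') {
                reader.mark(1);
                if (reader.read() == '\n') {
                    line.append('\n');
                } else {
                    reader.reset();
                }
                break;
            }
        }
        return line.length() == 0 ? null : line.toString();
    }

    // Upper-cases the content of every line (a placeholder operation) and
    // writes it back followed by whatever terminator the input line had.
    static void copyTransformed(Reader in, Writer out) throws IOException {
        String raw;
        while ((raw = readLineWithTerm(in)) != null) {
            int end = raw.length();
            while (end > 0 && (raw.charAt(end - 1) == '\n' || raw.charAt(end - 1) == '\r')) {
                end--;
            }
            out.write(raw.substring(0, end).toUpperCase(Locale.ROOT));
            out.write(raw.substring(end)); // original terminator, possibly empty
        }
    }

    // Convenience wrapper for in-memory text (hypothetical helper).
    static String transform(String text) throws IOException {
        StringWriter out = new StringWriter();
        copyTransformed(new BufferedReader(new StringReader(text)), out);
        return out.toString();
    }
}
```

For a real file, the same loop would presumably run over a `BufferedReader` and a `Writer` opened with the file's charset inside a try-with-resources block.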

Update:
But: I need to glue my lines together with newlines, and what if the original file didn't have the "right" newlines for my platform (DOS files on Linux or vice versa)? I guess I could read ahead a bit in the stream and see what kind of line endings I find, even though that's really hacky.
You can create a BufferedReader with a specified charset. So if the file is wacky, you'll have to supply the file's charset. Files.newBufferedReader(Path p, Charset cs)
Is there a library (or even better, core Java7 class!) that will just
let me call a method similar to readLine() that returns one line of
text from a stream, with the EOL character(s) intact?
If you're going to read a file, you have to know what charset it is. If you know what charset it is, then you don't need the EOL character to be "intact" since you can just add it on yourself.
From BufferedReader.readLine:
Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
Returns:
A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
So BufferedReader.readLine does not return any line-termination characters. If you want to preserve these characters, you can use the read method instead.
int size = 1000; // size of file
BufferedReader br = new BufferedReader(new FileReader("file.txt"));
char[] buf = new char[size];
br.read(buf, 0, size);
That is just a simple example, but if the file has line termination then it will show up in the buffer.
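One caveat with a single `read(buf, 0, size)` call is that it may return fewer characters than requested, and the file size is rarely known up front. A minimal sketch of reading in a loop instead, under the assumption that the whole text fits in memory; the class and method names (`RawText`, `readAll`) are invented for this example:

```java
import java.io.IOException;
import java.io.Reader;

final class RawText {

    // Reads everything from the reader in chunks. No character, including
    // '\r' and '\n', is dropped or translated along the way.
    static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        // read() returns the number of chars actually read, or -1 at end of stream
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            text.append(buf, 0, n);
        }
        return text.toString();
    }
}
```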

You should be using the StreamTokenizer to get more detailed control over input parsing.
http://docs.oracle.com/javase/7/docs/api/java/io/StreamTokenizer.html

Related

How to ignore some of Java's Files.lines end-of-line delimiters

Java's Files.lines method reads all lines from a file as a Stream, breaking the file into lines at the following delimiters:
\u000D followed by \u000A, CARRIAGE RETURN followed by LINE FEED
\u000A, LINE FEED
\u000D, CARRIAGE RETURN
I have files that contain the odd occurrence of \u000D, CARRIAGE RETURN which I do not want to treat as a new line, to be consistent with the way that grep (Windows) doesn't treat just a single \u000D as a newline marker. I want to process the lines in the file as a stream, but is there a way I can get a stream that doesn't use a single \u000D as a newline marker, using just CR/LF or LF? I have to use Java 8.
My problem is that I am getting grep to return the line number with its matches, but because of the difference in EOL delimiters, Files.lines.skip(numLines) doesn't then align with the same line if I try to skip to the line number returned by grep.
Let's assume that you are doing byte-wise input ...
A scalable / efficient solution avoids holding the entire file in memory, and / or creating a string object for each line of input that you skip. This is one way to do it.
File f = ...
InputStream is = new BufferedInputStream(new FileInputStream(f));
int lineCounter = 1;   // number of the line we are currently positioned on
int wantedLine = 42;
int b = 0;
while (lineCounter < wantedLine && b != -1) {
    // consume bytes up to and including the next LF; a lone CR is ignored
    do {
        b = is.read();
        if (b == '\n') {
            lineCounter++;
        }
    } while (b != -1 && b != '\n');
}
if (lineCounter == wantedLine) {
    // do stuff
}
Notes:
I know this is a bit clunky. And it would be possible to do away with the nested loop ... but this code is intended to be "illustrative" of an approach.
You could possibly get better performance by using ByteBuffer, but it makes the code more complicated. (If you are unfamiliar with the Buffer APIs.)
You could do something similar with a BufferedReader.
For production quality code, you should use try with resources to manage the InputStream resource.
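The Reader-based variant mentioned in the notes could look roughly like the sketch below. It treats only LF and CR LF as terminators, so a lone CR stays inside the line, which is the behaviour the question attributes to grep. The names `GrepStyleLines`, `readLine` and `lineAt` are hypothetical, and unlike the byte loop above this version does build a String for every skipped line, trading some efficiency for simplicity.

```java
import java.io.IOException;
import java.io.Reader;

final class GrepStyleLines {

    // Returns the next line, treating only "\n" and "\r\n" as terminators.
    // A lone '\r' is kept as ordinary content. Returns null at end of input.
    static String readLine(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        boolean sawAny = false;
        int c;
        while ((c = in.read()) != -1) {
            sawAny = true;
            if (c == '\n') {
                int len = sb.length();
                if (len > 0 && sb.charAt(len - 1) == '\r') {
                    sb.setLength(len - 1); // drop the CR of a CR LF pair
                }
                return sb.toString();
            }
            sb.append((char) c);
        }
        return sawAny ? sb.toString() : null;
    }

    // Returns the line with the given 1-based number, or null if the input
    // has fewer lines (or wantedLine is less than 1).
    static String lineAt(Reader in, int wantedLine) throws IOException {
        String line = null;
        for (int i = 1; i <= wantedLine; i++) {
            line = readLine(in);
            if (line == null) {
                return null;
            }
        }
        return line;
    }
}
```

In practice the reader would probably come from `Files.newBufferedReader(path, charset)` inside a try-with-resources block, since reading one char at a time from an unbuffered reader is slow.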
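Try this. Note that Files.readString requires Java 11 or later; on Java 8, new String(Files.readAllBytes(path), StandardCharsets.UTF_8) produces the same string for a UTF-8 file.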
Stream.of(Files.readString(path).split("\r?\n"))
.filter(...

How to read CR LF line with BufferedReader?

I am trying to create a simple java server application, but I am struggling to read the user's input with BufferedReader correctly.
The problem is that the line delimiter is CR LF ("\r\n") only, so I can't use the standard readLine() method. Therefore, I tried to implement my own method:
private String readCRLFLine(BufferedReader in) {
    StringBuilder result = new StringBuilder();
    char cr = 'a'; // initialize cr
    char lf;
    try {
        while (((lf = (char) in.read()) != '\n') && (cr != '\r')) {
            cr = lf;
            result.append(lf);
        }
        result.deleteCharAt(result.length() - 1); // delete \r from the result
    } catch (IOException ex) {
        // handle the exception here
    }
    return result.toString();
}
Now, when I try to get and print the result:
while ((userMessage = readCRLFLine(inputStream)) != null) {
System.out.println(userMessage);
break;
}
...it only prints the characters up to the first line-break character ("\r" or "\n"); the rest of the input is read by the second call to readCRLFLine(inputStream), which should already be reading the next input.
I want to be able to handle inputs like:
"abc\rabc\n\r\abc\r\n"
In this case userMessage should be:
"abc\rabc\n\r\abc"
I can't use Scanner and its useDelimiter("\r\n") method, because I need to set a timeout with clientSocket.setSoTimeout(TIMEOUT) and, from what I know, this is not possible with Scanner.
What is wrong with my "readCRLFLine" method? Is there any other solution to this problem?
Any help would be appreciated.
Your method is incorrect because the while condition is not what you actually need.
It says "As long as the currently read character is different than \n and the previous character is different than \r, continue reading".
But that means that when you read a \n, that condition is false, because lf != '\n' is false. So it will stop on the first \n. It will also stop on any character that follows a \r, because then cr != '\r' is going to be false making the combined condition false.
You want to stop the loop when the current character is \n and the previous character is \r. That means your while condition is supposed to be !(lf == '\n' && cr == '\r').
Now pay attention to De Morgan's law: "not (A and B)" is equivalent to "(not A) or (not B)". You mistakenly used AND instead of OR. Change your condition to:
while (((lf = (char) in.read()) != '\n') || (cr != '\r')) {
...
}
By the way, you should probably check that the value you read from in is not negative before you convert it to char and compare it. A negative value indicates that the client closed the connection, and you'll get into an endless loop if that happens.
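Putting both points together, a corrected reader might look like the following sketch. It keeps the value returned by `read()` as an `int` so that end of stream (-1) can be detected before any cast, and it stops only when LF directly follows CR. The class and method names are illustrative; returning null at end of stream is an assumption chosen to match the `!= null` loop in the question.

```java
import java.io.IOException;
import java.io.Reader;

final class CrLfLineReader {

    // Reads up to the next "\r\n" and returns the text before it.
    // Lone '\r' or '\n' characters are kept as part of the line.
    // Returns null when the stream ends before any character was read.
    static String readCrLfLine(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int prev = -1;
        int cur;
        while ((cur = in.read()) != -1) {
            if (cur == '\n' && prev == '\r') {
                sb.setLength(sb.length() - 1); // drop the '\r' appended last round
                return sb.toString();
            }
            sb.append((char) cur);
            prev = cur;
        }
        // Stream ended without a final CR LF: hand back what was read, if anything.
        return sb.length() == 0 ? null : sb.toString();
    }
}
```

Because the method lets IOException propagate, a `SocketTimeoutException` caused by `setSoTimeout` reaches the caller instead of being swallowed.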

reading line in bufferedReader

From the javadoc
public String readLine()
throws IOException
Read a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
I have following kind of text :
Now the earth was formless and empty. Darkness was on the surface
of the deep. God's Spirit was hovering over the surface
of the waters.
I am reading lines as:
while(buffer.readLine() != null){
}
But the problem is that it treats everything up to the newline as a line. I would like a line to end at a period (.) instead. How would I do it?
You can use a Scanner and set your own delimiter using useDelimiter(Pattern).
Note that the input delimiter is a regex, so you will need to provide the regex \. (you need to break the special meaning of the character . in regex)
You can read a character at a time, and copy the data to a StringBuilder
Reader reader = ...;
StringBuilder sb = new StringBuilder();
int ch;
while ((ch = reader.read()) >= 0) {
    if (ch == '.') break;
    sb.append((char) ch);
}
Use a java.util.Scanner instead of a buffered reader, and set the delimiter to "\\." with Scanner.useDelimiter().
(but be aware that the delimiter is consumed, so you'll have to add it again!)
or read the raw string and split it on each .
You could split the whole text by every .:
String text = "Your test.";
String[] lines = text.split("\\.");
After you split the text you get an array of lines. You could also use a regex if you want more control, e.g. to split the text also by : or ;. Just google it.
PS.: Perhaps you have to remove the new line characters first with something like:
text = text.replaceAll("\n", "");
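The split-based suggestions can be combined into one small helper. This is only a sketch: it first turns every line break into a space, then splits after each period using a lookbehind so the period stays attached to its sentence. `SentenceSplitter` and `sentences` are made-up names, and `\R` needs Java 8 or later.

```java
import java.util.Arrays;
import java.util.List;

final class SentenceSplitter {

    // Splits text into period-terminated sentences, ignoring line breaks.
    static List<String> sentences(String text) {
        // Replace each line break (\r\n, \n, \r, ...) with a single space.
        String oneLine = text.replaceAll("\\R", " ").trim();
        // Split right after every '.', swallowing whitespace that follows it.
        return Arrays.asList(oneLine.split("(?<=\\.)\\s*"));
    }
}
```

Applied to the sample text from the question, this yields three strings, each ending in a period.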

How to remove line breaks from a file in Java?

How can I replace all line breaks from a string in Java in such a way that will work on Windows and Linux (ie no OS specific problems of carriage return/line feed/new line etc.)?
I've tried (note readFileAsString is a function that reads a text file into a String):
String text = readFileAsString("textfile.txt");
text.replace("\n", "");
but this doesn't seem to work.
How can this be done?
You need to set text to the results of text.replace():
String text = readFileAsString("textfile.txt");
text = text.replace("\n", "").replace("\r", "");
This is necessary because Strings are immutable -- calling replace doesn't change the original String, it returns a new one that's been changed. If you don't assign the result to text, then that new String is lost and garbage collected.
As for getting the newline String for any environment -- that is available by calling System.getProperty("line.separator").
As noted in other answers, your code is not working primarily because String.replace(...) does not change the target String. (It can't - Java strings are immutable!) What replace actually does is to create and return a new String object with the characters changed as required. But your code then throws away that String ...
Here are some possible solutions. Which one is most correct depends on what exactly you are trying to do.
// #1
text = text.replace("\n", "");
Simply removes all the newline characters. This does not cope with Windows or Mac line terminations.
// #2
text = text.replace(System.getProperty("line.separator"), "");
Removes all line terminators for the current platform. This does not cope with the case where you are trying to process (for example) a UNIX file on Windows, or vice versa.
// #3
text = text.replaceAll("\\r|\\n", "");
Removes all Windows, UNIX or Mac line terminators. However, if the input file is text, this will concatenate words; e.g.
Goodbye cruel
world.
becomes
Goodbye cruelworld.
So you might actually want to do this:
// #4
text = text.replaceAll("\\r\\n|\\r|\\n", " ");
which replaces each line terminator with a space [1]. Since Java 8 you can also do this:
// #5
text = text.replaceAll("\\R", " ");
And if you want to replace a run of consecutive line terminators with a single space:
// #6
text = text.replaceAll("\\R+", " ");
[1] Note there is a subtle difference between #3 and #4. The sequence \r\n represents a single (Windows) line terminator, so we need to be careful not to replace it with two spaces.
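To make the differences between these variants concrete, here is a small comparison on one mixed-terminator string. The class and method names are invented for the example; `naiveSpaces` is not one of the numbered solutions but shows the two-spaces pitfall the footnote warns about.

```java
final class LineBreakDemo {

    // #3: delete every CR and LF
    static String removeAll(String s) {
        return s.replaceAll("\\r|\\n", "");
    }

    // Pitfall: CR and LF replaced separately, so "\r\n" becomes two spaces
    static String naiveSpaces(String s) {
        return s.replaceAll("\\r|\\n", " ");
    }

    // #4: "\r\n" is tried first, so each terminator becomes exactly one space
    static String oneSpaceEach(String s) {
        return s.replaceAll("\\r\\n|\\r|\\n", " ");
    }

    // #5: \R (Java 8+) matches any single line break, "\r\n" included
    static String anyBreak(String s) {
        return s.replaceAll("\\R", " ");
    }

    // #6: a run of line breaks collapses into one space
    static String collapseRuns(String s) {
        return s.replaceAll("\\R+", " ");
    }
}
```

For the input `"a\r\nb\n\nc\rd"`, `removeAll` gives `"abcd"`, `naiveSpaces` gives `"a  b  c d"`, `oneSpaceEach` and `anyBreak` both give `"a b  c d"`, and `collapseRuns` gives `"a b c d"`.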
This function normalizes down all whitespace, including line breaks, to single spaces. Not exactly what the original question asked for, but likely to do exactly what is needed in many cases:
import org.apache.commons.lang3.StringUtils;
final String cleansedString = StringUtils.normalizeSpace(rawString);
If you want to remove only line terminators that are valid on the current OS, you could do this:
text = text.replaceAll(System.getProperty("line.separator"), "");
If you want to make sure you remove any line separators, you can do it like this:
text = text.replaceAll("\\r|\\n", "");
Or, slightly more verbose, but less regexy:
text = text.replaceAll("\\r", "").replaceAll("\\n", "");
str = str.replaceAll("\\r\\n|\\r|\\n", " ");
Worked perfectly for me after searching a lot, having failed with every other line.
This would be efficient I guess
String s;
s = "try this\n try me.";
s = s.replaceAll("[\\r\\n]+", "");
Line breaks are not the same under Windows/Linux/Mac. You should use System.getProperty("line.separator") to get the separator for the current platform.
String text = readFileAsString("textfile.txt").replaceAll("\n", "");
Even though the Oracle documentation defines trim() as
"Returns a copy of the string, with leading and trailing whitespace omitted."
it does not spell out that leading and trailing newline characters count as whitespace and are removed too.
In short
String text = readFileAsString("textfile.txt").trim(); will also work if the only line breaks are at the start or end of the text; line breaks in the middle are left in place.
(Checked with Java 6)
In Kotlin, and also since Java 11, String has a lines() method that splits a multi-line string into its lines (a List<String> in Kotlin, a Stream<String> in Java).
You can get all the lines and then merge them into a single string.
With Kotlin it will be as simple as
str.lines().joinToString("")
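A Java counterpart of the Kotlin one-liner might look like this sketch, assuming Java 11 or later for `String.lines()`. The class and method names are placeholders.

```java
import java.util.stream.Collectors;

final class JoinWithLines {

    // Splits on \n, \r and \r\n via String.lines() and glues the pieces
    // back together with nothing in between.
    static String removeLineBreaks(String str) {
        return str.lines().collect(Collectors.joining());
    }
}
```

Passing a separator such as `Collectors.joining(" ")` instead would keep words on adjacent lines from running together.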
String text = readFileAsString("textfile.txt").replace("\n","");
.replace returns a new string; strings in Java are immutable.
You may want to read your file with a BufferedReader. This class can break input into individual lines, which you can assemble at will. The way BufferedReader operates recognizes line ending conventions of the Linux, Windows and MacOS worlds automatically, regardless of the current platform.
Hence:
BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream("textfile.txt")));
StringBuilder sb = new StringBuilder();
for (;;) {
    String line = br.readLine();
    if (line == null)
        break;
    sb.append(line);
    sb.append(' '); // SEE BELOW
}
String text = sb.toString();
Note that readLine() does not include the line terminator in the returned string. The code above appends a space to avoid gluing together the last word of a line and the first word of the next line.
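On Java 8 and later the same idea can be written more compactly with `BufferedReader.lines()`. The following is a sketch under that assumption, with invented names (`JoinLines`, `joinWithSpaces`); it differs from the loop above in that no trailing space is appended after the last line, and it closes the reader it is given.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.stream.Collectors;

final class JoinLines {

    // Reads all lines (any of \n, \r, \r\n ends a line) and joins them
    // with single spaces. The source reader is closed when done.
    static String joinWithSpaces(Reader source) throws IOException {
        try (BufferedReader br = new BufferedReader(source)) {
            return br.lines().collect(Collectors.joining(" "));
        } catch (UncheckedIOException e) {
            // lines() wraps read failures; unwrap to the original IOException
            throw e.getCause();
        }
    }
}
```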
I find it odd that (Apache) StringUtils wasn't covered here yet.
You can remove all newlines (or any other occurrences of a substring, for that matter) from a string using the .replace method
StringUtils.replace(myString, "\n", "");
This line will replace all newlines with the empty string.
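Because a newline is technically a character, you can optionally use the .replaceChars method, which replaces characters. Java has no empty char literal, so use the String overload; an empty replacement string deletes the matched characters:
StringUtils.replaceChars(myString, "\n", "");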
FYI, if you want to replace runs of consecutive line breaks with a single line break, you can use
myString.trim().replaceAll("[\n]{2,}", "\n")
Or replace with a single space
myString.trim().replaceAll("[\n]{2,}", " ")
You can use apache commons IOUtils to iterate through the line and append each line to StringBuilder. And don't forget to close the InputStream
StringBuilder sb = new StringBuilder();
FileInputStream fin = new FileInputStream("textfile.txt");
LineIterator lt = IOUtils.lineIterator(fin, "utf-8");
while (lt.hasNext()) {
    sb.append(lt.nextLine());
}
String text = sb.toString();
IOUtils.closeQuietly(fin);
You can use generic methods to replace any char with any char.
public static String removeWithAnyChar(String str, char replaceChar,
        char replaceWith) {
    char[] chrs = str.toCharArray();
    int i = 0;
    while (i < chrs.length) {
        if (chrs[i] == replaceChar) {
            chrs[i] = replaceWith;
        }
        i++;
    }
    // Strings are immutable, so the modified copy has to be returned
    return new String(chrs);
}
org.apache.commons.lang.StringUtils#chopNewline
Try doing this:
textValue= textValue.replaceAll("\n", "");
textValue= textValue.replaceAll("\t", "");
textValue= textValue.replaceAll("\\n", "");
textValue= textValue.replaceAll("\\t", "");
textValue= textValue.replaceAll("\r", "");
textValue= textValue.replaceAll("\\r", "");
textValue= textValue.replaceAll("\r\n", "");
textValue= textValue.replaceAll("\\r\\n", "");
