From the javadoc
public String readLine()
throws IOException
Read a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
I have the following kind of text:
Now the earth was formless and empty. Darkness was on the surface
of the deep. God's Spirit was hovering over the surface
of the waters.
I am reading lines as:
while (buffer.readLine() != null) {
}
But the problem is that it treats everything up to a newline as one line. I would like a line to be the text up to and including a period (.). How would I do that?
You can use a Scanner and set your own delimiter using useDelimiter(Pattern).
Note that the input delimiter is a regex, so you will need to provide the regex \. (you need to break the special meaning of the character . in regex)
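A minimal sketch of the Scanner approach, assuming the text is already in a String (the class and method names are made up for illustration). The Scanner consumes the delimiter, so the period is appended again, and wrapped lines inside a sentence are collapsed to single spaces:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SentenceScanner {
    // Hypothetical helper: splits text on '.', re-appending the period that the Scanner consumes.
    // Note: a final fragment without a period also gets one appended.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("\\."); // the delimiter is a regex, so '.' must be escaped
            while (scanner.hasNext()) {
                // Collapse embedded line breaks and other whitespace runs into single spaces.
                String sentence = scanner.next().replaceAll("\\s+", " ").trim();
                if (!sentence.isEmpty()) {
                    result.add(sentence + ".");
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Now the earth was formless and empty. Darkness was on the surface\n"
                + "of the deep. God's Spirit was hovering over the surface\n"
                + "of the waters.";
        sentences(text).forEach(System.out::println);
    }
}
```

For file input, the same Scanner could presumably be constructed from a File or Reader instead of a String.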
You can read a character at a time, and copy the data to a StringBuilder
Reader reader = ...;
StringBuilder sb = new StringBuilder();
int ch;
while((ch = reader.read()) >= 0) {
if(ch == '.') break;
sb.append((char) ch);
}
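The loop above stops at the first period. One way to wrap it into something that can be called repeatedly, as a sketch only: readSentence is a made-up name, it keeps the period, turns line breaks into spaces, and returns null at end of stream (mirroring readLine()).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class SentenceReader {
    // Hypothetical helper: returns the next sentence including its '.', or null at end of stream.
    static String readSentence(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            int ch;
            while ((ch = reader.read()) >= 0) {
                // Treat line breaks inside a sentence as plain spaces.
                sb.append(ch == '\n' || ch == '\r' ? ' ' : (char) ch);
                if (ch == '.') {
                    break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String sentence = sb.toString().trim();
        return sentence.isEmpty() ? null : sentence;
    }

    public static void main(String[] args) {
        Reader reader = new StringReader("Darkness was on the surface\nof the deep. God's Spirit was hovering.");
        String sentence;
        while ((sentence = readSentence(reader)) != null) {
            System.out.println(sentence);
        }
    }
}
```

Reading one char at a time is slow on an unbuffered reader, so wrapping a file reader in a BufferedReader first is probably worthwhile.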
Use a java.util.Scanner instead of a buffered reader, and set the delimiter to "\\." with Scanner.useDelimiter().
(but be aware that the delimiter is consumed, so you'll have to add it again!)
or read the raw string and split it on each .
You could split the whole text by every .:
String text = "Your test.";
String[] lines = text.split("\\.");
After you split the text you get an array of lines. You could also use a regex if you want more control, e.g. to split the text also by : or ;. Just google it.
PS.: Perhaps you have to remove the new line characters first with something like:
text = text.replaceAll("\n", "");
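If the period should stay attached to each piece, a lookbehind can be used so that split() cuts after the period instead of consuming it. This is only a sketch; the regex and the class name are my own choices, not something from the answers above:

```java
import java.util.Arrays;

class PeriodSplit {
    // Hypothetical helper: joins wrapped lines, then splits after each period, keeping it.
    static String[] sentences(String text) {
        // Replace line breaks with spaces so words on adjacent lines do not get glued together.
        String joined = text.replaceAll("\\r?\\n", " ");
        // (?<=\\.) matches the position right after a '.', \\s* swallows following whitespace.
        return joined.split("(?<=\\.)\\s*");
    }

    public static void main(String[] args) {
        String text = "Darkness was on the surface\nof the deep. God's Spirit was hovering.";
        System.out.println(Arrays.toString(sentences(text)));
    }
}
```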
I'd like to iterate through a text file one line at a time, operate on the contents, and stream the result to a separate file. Textbook case for BufferedReader.readLine().
But: I need to glue my lines together with newlines, and what if the original file didn't have the "right" newlines for my platform (DOS files on Linux or vice versa)? I guess I could read ahead a bit in the stream and see what kind of line endings I find, even though that's really hacky.
But: suppose my input file doesn't have a trailing newline. I'd like to keep things how they were. Now I need to peek ahead to the next line ending before reading every line. At this point why am I using a class that gives me readLine() at all?
This seems like it should be a solved problem. Is there a library (or even better, core Java7 class!) that will just let me call a method similar to readLine() that returns one line of text from a stream, with the EOL character(s) intact?
Here's an implementation that reads char by char until it finds a line terminator. The reader passed in must support mark(), so if yours doesn't, wrap it in a BufferedReader.
public static String readLineWithTerm(Reader reader) throws IOException {
if (! reader.markSupported()) {
throw new IllegalArgumentException("reader must support mark()");
}
int code;
StringBuilder line = new StringBuilder();
while ((code = reader.read()) != -1) {
char ch = (char) code;
line.append(ch);
if (ch == '\n') {
break;
} else if (ch == '\r') {
reader.mark(1);
ch = (char) reader.read();
if (ch == '\n') {
line.append(ch);
} else {
reader.reset();
}
break;
}
}
return (line.length() == 0 ? null : line.toString());
}
Update:
But: I need to glue my lines together with newlines, and what if the original file didn't have the "right" newlines for my platform (DOS files on Linux or vice versa)? I guess I could read ahead a bit in the stream and see what kind of line endings I find, even though that's really hacky.
You can create a BufferedReader with a specified charset. So if the file is wacky, you'll have to supply the file's charset. Files.newBufferedReader(Path p, Charset cs)
Is there a library (or even better, core Java7 class!) that will just
let me call a method similar to readLine() that returns one line of
text from a stream, with the EOL character(s) intact?
If you're going to read a file, you have to know what charset it is. If you know what charset it is, then you don't need the EOL character to be "intact" since you can just add it on yourself.
From BufferedReader.readLine:
Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
Returns:
A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
So BufferedReader.readLine does not return any line-termination characters. If you want to preserve these characters, you can use the read method instead.
int size = 1000; // size of file
BufferedReader br = new BufferedReader(new FileReader("file.txt"));
char[] buf = new char[size];
br.read(buf, 0, size);
That is just a simple example, but if the file has line termination then it will show up in the buffer.
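A single read() call is not guaranteed to fill the buffer, so a more robust variant loops until end of stream. This is a sketch under the assumption that the whole content fits in memory; a StringReader stands in for the file, and the class name is made up:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReadKeepingTerminators {
    // Drains the reader into a String; line terminators are copied through untouched.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            // read() may return fewer chars than requested, so loop until it reports -1.
            while ((n = reader.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The StringReader stands in for a FileReader or Files.newBufferedReader(...).
        String text = readAll(new StringReader("dos line\r\nunix line\nno trailing newline"));
        System.out.println(text.contains("\r\n")); // true: the DOS terminator survived
    }
}
```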
You should be using the StreamTokenizer to get more detailed control over input parsing.
http://docs.oracle.com/javase/7/docs/api/java/io/StreamTokenizer.html
I want to split the file in the picture and I can't because of the line breaks. The output I get is all run together. What is the right regex to split even when there is a new line?
You are reading the file using a Scanner to get each line.
while(sc.hasNextLine())
You append each line to the StringBuffer (without any separator).
sb.append(sc.nextLine())
So at that point, a file with :
foo
bar
Would give a StringBuffer with
sb.toString(); //foobar
So there is no line separator left. Instead, process each line as the Scanner returns it:
while(sc.hasNextLine()){
String s = sc.nextLine();
System.out.println(s);
sb.append(s);
}
Last thing, please close every Scanner you open with sc.close() when you are done with them (that's a good practice to have).
EDIT:
If you want to keep that logic, simply append a line separator after each line:
while(sc.hasNextLine()){
sb.append(sc.nextLine()).append("\n");
}
That way, you will always have a line separator and the split will work
Quick example of what this would look like :
StringBuffer sb = new StringBuffer();
sb.append("foo").append("\n")
.append("bar").append("\n"); //fake "file reading"
String[] result = sb.toString().split("\n");
for(String s : result){
System.out.println(s);
}
foo
bar
This should do:
protasi[] = string.split("\\r?\\n");
split("\n") works as well: the compiler turns "\n" into a real line feed, while "\\n" reaches the regex engine as the escape \n, and both match the same character. The useful addition is the optional \\r, which also handles Windows-style \r\n line endings.
I'm trying to split text in a JTextArea using a regex to split the String by \n. However, this does not work. I also tried \r\n|\r|\n and many other combinations of regexes.
Code:
public void insertUpdate(DocumentEvent e) {
String split[], docStr = null;
Document textAreaDoc = (Document)e.getDocument();
try {
docStr = textAreaDoc.getText(textAreaDoc.getStartPosition().getOffset(), textAreaDoc.getEndPosition().getOffset());
} catch (BadLocationException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
split = docStr.split("\\n");
}
This should cover you:
String lines[] = string.split("\\r?\\n");
There are really only two newline conventions (UNIX and Windows) that you need to worry about.
The String#split(String regex) method takes a regular expression. Since Java 8, regex supports \R, which represents (from the documentation of the Pattern class):
Linebreak matcher
\R Any Unicode linebreak sequence, is equivalent to
\u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]
So we can use it to match:
\u000D\u000A -> \r\n pair
\u000A -> line feed (\n)
\u000B -> line tabulation (DO NOT confuse with character tabulation \t which is \u0009)
\u000C -> form feed (\f)
\u000D -> carriage return (\r)
\u0085 -> next line (NEL)
\u2028 -> line separator
\u2029 -> paragraph separator
As you can see, \r\n is placed at the start of the regex, which ensures that the regex tries to match this pair first and falls back to the single-character line separators only if that match fails.
So if you want to split on line separators, use split("\\R").
If you don't want trailing empty strings "" removed from the resulting array, use split(regex, limit) with a negative limit parameter, like split("\\R", -1).
If you want to treat one or more consecutive line separators as a single delimiter, use split("\\R+").
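The three variants side by side, as a small sketch (Java 8 or newer; the class and method names are made up for illustration):

```java
import java.util.Arrays;

class LinebreakSplit {
    // Splits on any single line break; trailing empty strings are dropped.
    static String[] lines(String text) {
        return text.split("\\R");
    }

    // Same, but a negative limit keeps trailing empty strings.
    static String[] linesKeepingTrailing(String text) {
        return text.split("\\R", -1);
    }

    // Treats a run of line breaks as one delimiter, so empty lines disappear.
    static String[] nonEmptyLines(String text) {
        return text.split("\\R+");
    }

    public static void main(String[] args) {
        String text = "one\r\ntwo\n\nthree\n\n";
        System.out.println(Arrays.toString(lines(text)));                // [one, two, , three]
        System.out.println(Arrays.toString(linesKeepingTrailing(text))); // [one, two, , three, , ]
        System.out.println(Arrays.toString(nonEmptyLines(text)));        // [one, two, three]
    }
}
```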
If you don’t want empty lines:
String.split("[\\r\\n]+")
String.split(System.lineSeparator());
This splits on the separator of the platform the code runs on, so it only works when the text uses that same line-ending convention.
A new method lines has been introduced to String class in java-11, which returns Stream<String>
Returns a stream of substrings extracted from this string partitioned
by line terminators.
Line terminators recognized are line feed "\n" (U+000A), carriage
return "\r" (U+000D) and a carriage return followed immediately by a
line feed "\r\n" (U+000D U+000A).
Here are a few examples:
jshell> "lorem \n ipusm \n sit".lines().forEach(System.out::println)
lorem
ipusm
sit
jshell> "lorem \n ipusm \r sit".lines().forEach(System.out::println)
lorem
ipusm
sit
jshell> "lorem \n ipusm \r\n sit".lines().forEach(System.out::println)
lorem
ipusm
sit
String#lines()
In JDK11 the String class has a lines() method:
Returning a stream of lines extracted from this string, separated by
line terminators.
Further, the documentation goes on to say:
A line terminator is one of the following: a line feed character "\n"
(U+000A), a carriage return character "\r" (U+000D), or a carriage
return followed immediately by a line feed "\r\n" (U+000D U+000A). A
line is either a sequence of zero or more characters followed by a
line terminator, or it is a sequence of one or more characters
followed by the end of the string. A line does not include the line
terminator.
With this one can simply do:
Stream<String> stream = str.lines();
then if you want an array:
String[] array = str.lines().toArray(String[]::new);
Given that this method returns a Stream, it opens up a lot of options, as it enables concise, declarative expression of possibly-parallel operations.
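As one illustration of what the Stream makes easy, here is a sketch that trims each line and drops the empty ones (requires Java 11 for String#lines(); the class and method names are my own):

```java
import java.util.List;
import java.util.stream.Collectors;

class LinesStream {
    // Hypothetical helper: returns the trimmed, non-empty lines of the text.
    static List<String> cleanLines(String text) {
        return text.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(cleanLines(" lorem \r\n\n ipsum \r sit"));
    }
}
```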
You don't have to double escape characters in character groups.
For all non empty lines use:
String.split("[\r\n]+")
None of the answers given here actually respect Java's definition of new lines as given in e.g. BufferedReader#readLine. Java accepts \n, \r and \r\n as a new line. Some of the answers match multiple empty lines or malformed files. E.g. <sometext>\n\r\n<someothertext> would result in two lines when using [\r\n]+.
String lines[] = string.split("(\r\n|\r|\n)", -1);
In contrast, the answer above has the following properties:
it complies with Java's definition of a new line, as used by e.g. BufferedReader
it does not match multiple new lines
it does not remove trailing empty lines
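The difference between the two patterns can be checked with a small sketch (the class and method names are made up; the input is the malformed case mentioned above plus a trailing newline):

```java
import java.util.Arrays;

class StrictVsCollapsing {
    // Collapses any run of \r and \n into one delimiter and drops trailing empties.
    static String[] collapsing(String text) {
        return text.split("[\r\n]+");
    }

    // Matches exactly one terminator at a time and keeps trailing empty strings.
    static String[] strict(String text) {
        return text.split("(\r\n|\r|\n)", -1);
    }

    public static void main(String[] args) {
        String text = "a\n\r\nb\n";
        System.out.println(Arrays.toString(collapsing(text))); // [a, b]
        System.out.println(Arrays.toString(strict(text)));     // [a, , b, ]
    }
}
```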
If, for some reason, you don't want to use String.split (for example, because of regular expressions) and you want to use functional programming on Java 8 or newer:
List<String> lines = new BufferedReader(new StringReader(string))
.lines()
.collect(Collectors.toList());
Maybe this would work:
Remove the double backslashes from the parameter of the split method:
split = docStr.split("\n");
For preserving empty lines from getting squashed use:
String lines[] = String.split("\\r?\\n", -1);
The above answers did not help me on Android; Pshemo's response is the one that worked for me there. I will leave part of Pshemo's answer here:
split("\\\\n")
The above code doesn't actually do anything visible: it just calculates the result and then discards it. Is it the code you used, or just an example for this question?
try doing textAreaDoc.insertString(int, String, AttributeSet) at the end?
There is a new kid in town, so you no longer need to deal with all of the above complexity.
From JDK 11 onward, a single line of code is enough: it splits the lines and returns a Stream of String.
import java.util.stream.Stream;

public class MyClass {
    public static void main(String[] args) {
        Stream<String> lines = "foo \n bar \n baz".lines();
        // Do whatever you want to do with lines
    }
}
Some references.
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#lines()
https://www.azul.com/90-new-features-and-apis-in-jdk-11/
I hope this will be helpful to someone. Happy coding.
Sadly, Java lacks a both simple and efficient method for splitting a string by a fixed string. Both String::split and the stream API are complex and relatively slow. Also, they can produce different results.
String::split examines its input, then compiles to java.util.regex.Pattern every time (except if the input contains only a single char that's safe).
However, Pattern is very fast once it has been compiled. So the best solution is to precompile the pattern:
private static final Pattern LINE_SEP_PATTERN = Pattern.compile("\\R");
Then use it like this:
String[] lines = LINE_SEP_PATTERN.split(input);
From Java 8, \R matches any line break specified by Unicode. Prior to Java 8 you could use something like this:
Pattern.compile(Pattern.quote(System.lineSeparator()))
String lines[] =String.split( System.lineSeparator())
After failed attempts with all of the given solutions, I replaced \n with a special word and then split on that word. For me the following did the trick:
article = "Alice phoned\n bob.";
article = article.replace("\\n", " NEWLINE ");
String sen [] = article.split(" NEWLINE ");
I couldn't replicate the example given in the question. But, I guess this logic can be applied.
As an alternative to the previous answers, guava's Splitter API can be used if other operations are to be applied to the resulting lines, like trimming lines or filtering empty lines :
import com.google.common.base.Splitter;
Iterable<String> split = Splitter.onPattern("\r?\n").trimResults().omitEmptyStrings().split(docStr);
Note that the result is an Iterable and not an array.
There are three different conventions (it could be said that those are de facto standards) to set and display a line break:
carriage return + line feed
line feed
carriage return
In some text editors, it is possible to exchange one for the other.
The simplest thing is to normalize to line feed and then split.
final String[] lines = contents.replace("\r\n", "\n")
.replace("\r", "\n")
.split("\n", -1);
Try this; I hope it is helpful for you:
String split[], docStr = null;
Document textAreaDoc = (Document)e.getDocument();
try {
docStr = textAreaDoc.getText(textAreaDoc.getStartPosition().getOffset(), textAreaDoc.getEndPosition().getOffset());
} catch (BadLocationException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
split = docStr.split("\n");
package in.javadomain;
public class JavaSplit {
public static void main(String[] args) {
String input = "chennai\nvellore\ncoimbatore\nbangalore\narcot";
System.out.println("Before split:\n");
System.out.println(input);
String[] inputSplitNewLine = input.split("\\n");
System.out.println("\n After split:\n");
for(int i=0; i<inputSplitNewLine.length; i++){
System.out.println(inputSplitNewLine[i]);
}
}
}
How can I replace all line breaks from a string in Java in such a way that will work on Windows and Linux (ie no OS specific problems of carriage return/line feed/new line etc.)?
I've tried (note readFileAsString is a function that reads a text file into a String):
String text = readFileAsString("textfile.txt");
text.replace("\n", "");
but this doesn't seem to work.
How can this be done?
You need to set text to the results of text.replace():
String text = readFileAsString("textfile.txt");
text = text.replace("\n", "").replace("\r", "");
This is necessary because Strings are immutable -- calling replace doesn't change the original String, it returns a new one that's been changed. If you don't assign the result to text, then that new String is lost and garbage collected.
As for getting the newline String for any environment -- that is available by calling System.getProperty("line.separator").
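A short sketch that makes the immutability point concrete (the class and helper names are made up; the input string stands in for the file content):

```java
class ReplaceReturnsNewString {
    // Removes both \r and \n, returning the new String that replace() produces.
    static String stripLineBreaks(String text) {
        return text.replace("\r", "").replace("\n", "");
    }

    public static void main(String[] args) {
        String text = "first\r\nsecond\n";

        text.replace("\n", "");                  // result is discarded, text is unchanged
        System.out.println(text.contains("\n")); // true

        text = stripLineBreaks(text);            // assign the result back
        System.out.println(text);                // firstsecond
    }
}
```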
As noted in other answers, your code is not working primarily because String.replace(...) does not change the target String. (It can't - Java strings are immutable!) What replace actually does is to create and return a new String object with the characters changed as required. But your code then throws away that String ...
Here are some possible solutions. Which one is most correct depends on what exactly you are trying to do.
// #1
text = text.replace("\n", "");
Simply removes all the newline characters. This does not cope with Windows or Mac line terminations.
// #2
text = text.replace(System.getProperty("line.separator"), "");
Removes all line terminators for the current platform. This does not cope with the case where you are trying to process (for example) a UNIX file on Windows, or vice versa.
// #3
text = text.replaceAll("\\r|\\n", "");
Removes all Windows, UNIX or Mac line terminators. However, if the input file is text, this will concatenate words; e.g.
Goodbye cruel
world.
becomes
Goodbye cruelworld.
So you might actually want to do this:
// #4
text = text.replaceAll("\\r\\n|\\r|\\n", " ");
which replaces each line terminator with a space (see note 1 below). Since Java 8 you can also do this:
// #5
text = text.replaceAll("\\R", " ");
And if you want to replace a run of multiple line terminators with one space:
// #6
text = text.replaceAll("\\R+", " ");
1 - Note there is a subtle difference between #3 and #4. The sequence \r\n represents a single (Windows) line terminator, so we need to be careful not to replace it with two spaces.
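To see why the order of alternatives in #4 matters, here is a sketch that uses the pattern from #3 with a space as the replacement and compares it with #4 on a Windows-style input (the class and method names are made up):

```java
class TerminatorReplace {
    // Pattern from #3, but replacing with a space: \r and \n are handled separately.
    static String eachCharToSpace(String text) {
        return text.replaceAll("\\r|\\n", " ");
    }

    // Pattern from #4: \r\n is tried first, so it counts as one terminator.
    static String eachTerminatorToSpace(String text) {
        return text.replaceAll("\\r\\n|\\r|\\n", " ");
    }

    public static void main(String[] args) {
        String text = "Goodbye cruel\r\nworld.";
        System.out.println(eachCharToSpace(text));       // Goodbye cruel  world. (two spaces)
        System.out.println(eachTerminatorToSpace(text)); // Goodbye cruel world.
    }
}
```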
This function normalizes down all whitespace, including line breaks, to single spaces. Not exactly what the original question asked for, but likely to do exactly what is needed in many cases:
import org.apache.commons.lang3.StringUtils;
final String cleansedString = StringUtils.normalizeSpace(rawString);
If you want to remove only line terminators that are valid on the current OS, you could do this:
text = text.replaceAll(System.getProperty("line.separator"), "");
If you want to make sure you remove any line separators, you can do it like this:
text = text.replaceAll("\\r|\\n", "");
Or, slightly more verbose, but less regexy:
text = text.replaceAll("\\r", "").replaceAll("\\n", "");
str = str.replaceAll("\\r\\n|\\r|\\n", " ");
Worked perfectly for me after searching a lot, having failed with every other line.
This would be efficient I guess
String s;
s = "try this\n try me.";
s = s.replaceAll("[\\r\\n]+", "");
Line breaks are not the same under Windows/Linux/Mac. You should use System.getProperty("line.separator") to get the separator of the current platform.
String text = readFileAsString("textfile.txt").replaceAll("\n", "");
The Oracle documentation defines trim() as
"Returns a copy of the string, with leading and trailing whitespace omitted."
What the documentation does not spell out is that newline characters count as whitespace here, so leading and trailing line breaks are removed as well. Line breaks in the middle of the string are left alone.
In short
String text = readFileAsString("textfile.txt").trim(); will work if the only line breaks you need to drop are at the start or end of the text.
(Checked with Java 6)
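It is worth checking what trim() does with line breaks that are not at the ends. A minimal sketch (the class and method names are made up):

```java
class TrimDemo {
    // trim() strips leading and trailing whitespace, which includes \r and \n.
    static String trimmed(String text) {
        return text.trim();
    }

    public static void main(String[] args) {
        String text = " \nline one\nline two\r\n";
        String result = trimmed(text);
        System.out.println(result.equals("line one\nline two")); // true
        System.out.println(result.contains("\n"));               // true: the interior break remains
    }
}
```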
In Kotlin, and also since Java 11, String has a lines() method, which returns the lines of a multi-line string (a List in Kotlin, a Stream<String> in Java).
You can get all the lines and then merge them into a single string.
With Kotlin it will be as simple as
str.lines().joinToString("")
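A rough Java 11 counterpart of the Kotlin one-liner, as a sketch (the class and method names are my own; Collectors.joining does the merging):

```java
import java.util.stream.Collectors;

class JoinLines {
    // Splits on \n, \r and \r\n via String#lines(), then glues the lines back together.
    static String joinLines(String text, String separator) {
        return text.lines().collect(Collectors.joining(separator));
    }

    public static void main(String[] args) {
        String text = "Goodbye cruel\r\nworld.\n";
        System.out.println(joinLines(text, ""));  // Goodbye cruelworld.
        System.out.println(joinLines(text, " ")); // Goodbye cruel world.
    }
}
```

Passing a space as the separator avoids gluing the last word of one line to the first word of the next.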
String text = readFileAsString("textfile.txt").replace("\n","");
.replace returns a new string; strings in Java are immutable.
You may want to read your file with a BufferedReader. This class can break input into individual lines, which you can assemble at will. The way BufferedReader operates recognizes line ending conventions of the Linux, Windows and MacOS worlds automatically, regardless of the current platform.
Hence:
// InputStreamReader needs an InputStream, not a file name, so use FileReader here.
BufferedReader br = new BufferedReader(
new FileReader("textfile.txt"));
StringBuilder sb = new StringBuilder();
for (;;) {
String line = br.readLine();
if (line == null)
break;
sb.append(line);
sb.append(' '); // SEE BELOW
}
String text = sb.toString();
Note that readLine() does not include the line terminator in the returned string. The code above appends a space to avoid gluing together the last word of a line and the first word of the next line.
I find it odd that (Apache) StringUtils wasn't covered here yet.
You can remove all newlines (or any other occurrences of a substring, for that matter) from a string using the replace method:
StringUtils.replace(myString, "\n", "");
This line will replace all newlines with the empty string.
Because a newline is technically a character, you can alternatively use the replaceChars method, which deletes the listed characters when the replacement string is empty:
StringUtils.replaceChars(myString, "\n", "");
FYI, if you want to replace runs of consecutive line breaks with a single line break, you can use
myString.trim().replaceAll("[\n]{2,}", "\n")
Or replace with a single space
myString.trim().replaceAll("[\n]{2,}", " ")
You can use Apache Commons IO's IOUtils to iterate through the lines and append each line to a StringBuilder. And don't forget to close the InputStream.
StringBuilder sb = new StringBuilder();
FileInputStream fin=new FileInputStream("textfile.txt");
LineIterator lt=IOUtils.lineIterator(fin, "utf-8");
while(lt.hasNext())
{
sb.append(lt.nextLine());
}
String text = sb.toString();
IOUtils.closeQuietly(fin);
You can use generic methods to replace any char with any char.
public static String removeWithAnyChar(String str, char replaceChar,
        char replaceWith) {
    char[] chrs = str.toCharArray();
    for (int i = 0; i < chrs.length; i++) {
        if (chrs[i] == replaceChar) {
            chrs[i] = replaceWith;
        }
    }
    // Strings are immutable, so return a new String built from the modified copy.
    return new String(chrs);
}
org.apache.commons.lang.StringUtils#chopNewline
Try doing this:
textValue= textValue.replaceAll("\n", "");
textValue= textValue.replaceAll("\t", "");
textValue= textValue.replaceAll("\\n", "");
textValue= textValue.replaceAll("\\t", "");
textValue= textValue.replaceAll("\r", "");
textValue= textValue.replaceAll("\\r", "");
textValue= textValue.replaceAll("\r\n", "");
textValue= textValue.replaceAll("\\r\\n", "");