How to remove line breaks from a file in Java?

How can I replace all line breaks in a string in Java in a way that works on both Windows and Linux (i.e. no OS-specific problems with carriage return/line feed/newline, etc.)?
I've tried (note readFileAsString is a function that reads a text file into a String):
String text = readFileAsString("textfile.txt");
text.replace("\n", "");
but this doesn't seem to work.
How can this be done?

You need to set text to the results of text.replace():
String text = readFileAsString("textfile.txt");
text = text.replace("\n", "").replace("\r", "");
This is necessary because Strings are immutable -- calling replace doesn't change the original String, it returns a new one that's been changed. If you don't assign the result to text, then that new String is lost and garbage collected.
As for getting the newline String for any environment -- that is available by calling System.getProperty("line.separator").

As noted in other answers, your code is not working primarily because String.replace(...) does not change the target String. (It can't - Java strings are immutable!) What replace actually does is to create and return a new String object with the characters changed as required. But your code then throws away that String ...
Here are some possible solutions. Which one is most correct depends on what exactly you are trying to do.
// #1
text = text.replace("\n", "");
Simply removes all the newline characters. This does not cope with Windows or Mac line terminations.
// #2
text = text.replace(System.getProperty("line.separator"), "");
Removes all line terminators for the current platform. This does not cope with the case where you are trying to process (for example) a UNIX file on Windows, or vice versa.
// #3
text = text.replaceAll("\\r|\\n", "");
Removes all Windows, UNIX or Mac line terminators. However, if the input file is text, this will concatenate words; e.g.
Goodbye cruel
world.
becomes
Goodbye cruelworld.
So you might actually want to do this:
// #4
text = text.replaceAll("\\r\\n|\\r|\\n", " ");
which replaces each line terminator with a space.[1] Since Java 8 you can also do this:
// #5
text = text.replaceAll("\\R", " ");
And if you want to replace a run of consecutive line terminators with a single space:
// #6
text = text.replaceAll("\\R+", " ");
[1] Note there is a subtle difference between #3 and #4. The sequence \r\n represents a single (Windows) line terminator, so we need to be careful not to replace it with two spaces.
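A small self-contained sketch comparing #3 to #6 on a Windows-style (\r\n) string. The class and method names are made up for illustration, and \R requires Java 8 or later.

```java
public class LineTerminatorReplace {

    // #3: delete every CR and LF character (words on adjacent lines get glued)
    static String removeAll(String text) {
        return text.replaceAll("\\r|\\n", "");
    }

    // #4: each terminator becomes one space; \r\n is tried first, so it counts once
    static String toSpaces(String text) {
        return text.replaceAll("\\r\\n|\\r|\\n", " ");
    }

    // #5 (Java 8+): \R matches any Unicode line break, treating \r\n as one unit
    static String toSpacesR(String text) {
        return text.replaceAll("\\R", " ");
    }

    // #6 (Java 8+): a whole run of line breaks becomes a single space
    static String collapseR(String text) {
        return text.replaceAll("\\R+", " ");
    }

    public static void main(String[] args) {
        String windows = "Goodbye cruel\r\nworld.";
        System.out.println(removeAll(windows));      // Goodbye cruelworld.
        System.out.println(toSpaces(windows));       // Goodbye cruel world.
        System.out.println(toSpacesR(windows));      // Goodbye cruel world.
        System.out.println(collapseR("a\n\n\r\nb")); // a b
    }
}
```

The alternation order in #4 matters: because \r\n is listed before \r, a Windows terminator is consumed as a pair instead of producing two spaces.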

This function normalizes all whitespace, including line breaks, to single spaces. It is not exactly what the original question asked for, but it is likely to do exactly what is needed in many cases:
import org.apache.commons.lang3.StringUtils;
final String cleansedString = StringUtils.normalizeSpace(rawString);

If you want to remove only line terminators that are valid on the current OS, you could do this:
text = text.replaceAll(System.getProperty("line.separator"), "");
If you want to make sure you remove any line separators, you can do it like this:
text = text.replaceAll("\\r|\\n", "");
Or, slightly more verbose, but less regexy:
text = text.replaceAll("\\r", "").replaceAll("\\n", "");

str = str.replaceAll("\\r\\n|\\r|\\n", " ");
This worked perfectly for me after a lot of searching, when everything else I tried had failed.

This would be efficient, I guess:
String s = "try this\n try me.";
s = s.replaceAll("[\\r\\n]+", "");

Line breaks are not the same on Windows, Linux and Mac. You should use System.getProperty("line.separator").

String text = readFileAsString("textfile.txt").replaceAll("\n", "");
The Oracle documentation summarizes trim() as:
"Returns a copy of the string, with leading and trailing whitespace omitted."
What that summary does not spell out is that leading and trailing newline characters are removed as well: trim() strips every character with a code point up to U+0020, which includes \n and \r.
So, if the only line breaks are at the start or end of the text,
String text = readFileAsString("textfile.txt").trim(); will also work for you. It does not touch line breaks in the middle of the text.
(Checked with Java 6)

In Kotlin, String has a lines() method that returns a List of the lines in a multi-line string. Since Java 11, Java's String also has lines(), which returns a Stream<String>.
You can get all the lines and then merge them into a single string.
With Kotlin it is as simple as
str.lines().joinToString("")
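In plain Java, a rough equivalent (a sketch assuming Java 11+) is to collect the stream returned by String.lines(); the class and method names here are only for illustration.

```java
import java.util.stream.Collectors;

public class JoinLines {

    // String.lines() splits on \n, \r and \r\n and does not include the terminators,
    // so joining with "" removes every line break.
    static String joinLines(String str) {
        return str.lines().collect(Collectors.joining(""));
    }

    public static void main(String[] args) {
        System.out.println(joinLines("Goodbye cruel\r\nworld.\n")); // Goodbye cruelworld.
    }
}
```

Use Collectors.joining(" ") instead if words on adjacent lines should stay separated.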

String text = readFileAsString("textfile.txt").replace("\n", "");
.replace returns a new string; strings in Java are immutable.

You may want to read your file with a BufferedReader. This class can break input into individual lines, which you can assemble at will. The way BufferedReader operates recognizes line ending conventions of the Linux, Windows and MacOS worlds automatically, regardless of the current platform.
Hence:
// needs java.io.BufferedReader and java.io.FileReader; try-with-resources closes the file
StringBuilder sb = new StringBuilder();
try (BufferedReader br = new BufferedReader(new FileReader("textfile.txt"))) {
    for (;;) {
        String line = br.readLine();
        if (line == null)
            break;
        sb.append(line);
        sb.append(' '); // SEE BELOW
    }
}
String text = sb.toString();
Note that readLine() does not include the line terminator in the returned string. The code above appends a space to avoid gluing together the last word of a line and the first word of the next line.
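To see that readLine() handles all three conventions in one input, here is a self-contained sketch that reads from a StringReader instead of a file (an assumption made only so it runs without a file). It also puts the space between lines rather than after each one, so there is no trailing space. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadLinesDemo {

    // Joins the lines of 'input' with single spaces. readLine() strips the terminator
    // and accepts \n, \r and \r\n regardless of the platform the code runs on.
    static String joinWithSpaces(String input) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new StringReader(input))) {
            boolean first = true;
            String line;
            while ((line = br.readLine()) != null) {
                if (!first) {
                    sb.append(' '); // separator between lines only
                }
                sb.append(line);
                first = false;
            }
        } catch (IOException e) {
            // readLine() declares IOException even though a StringReader will not fail
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UNIX, Windows and old Mac terminators mixed in one string
        System.out.println(joinWithSpaces("one\ntwo\r\nthree\rfour")); // one two three four
    }
}
```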

I find it odd that (Apache) StringUtils hasn't been covered here yet.
You can remove all newlines (or any other occurrences of a substring, for that matter) from a string using the replace method:
StringUtils.replace(myString, "\n", "");
This line replaces all newlines with the empty string.
Because a newline is technically a character, you can alternatively use the replaceChars method, which replaces characters (and deletes them when the replacement string is shorter than the search string):
StringUtils.replaceChars(myString, "\n", "");

FYI, if you want to replace runs of multiple line breaks with a single line break, you can use
myString.trim().replaceAll("[\n]{2,}", "\n")
Or replace them with a single space:
myString.trim().replaceAll("[\n]{2,}", " ")

You can use Apache Commons IOUtils to iterate through the lines and append each line to a StringBuilder. Don't forget to close the InputStream; the try-with-resources block below does that for you:
StringBuilder sb = new StringBuilder();
try (FileInputStream fin = new FileInputStream("textfile.txt")) {
    LineIterator lt = IOUtils.lineIterator(fin, "UTF-8");
    while (lt.hasNext()) {
        sb.append(lt.nextLine());
    }
}
String text = sb.toString();

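You can use a generic method to replace any char with any other char (this is what the built-in String.replace(char, char) already does; note that it can only replace characters, not remove them):
public static String replaceAnyChar(String str, char replaceChar, char replaceWith) {
    char[] chrs = str.toCharArray();
    for (int i = 0; i < chrs.length; i++) {
        if (chrs[i] == replaceChar) {
            chrs[i] = replaceWith;
        }
    }
    return new String(chrs); // build the result from the modified array
}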

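org.apache.commons.lang.StringUtils#chopNewline (deprecated in Commons Lang 2 in favor of StringUtils#chomp). Note that it only removes a newline at the end of the string, not line breaks in the middle.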

Try doing this:
textValue = textValue.replaceAll("[\\r\\n\\t]", "");
This removes every \r, \n and tab character. Writing "\n" or "\\n" in the pattern makes no difference (the first passes a literal line feed to the regex, the second the regex escape for one), and \r\n needs no separate pattern once \r and \n are removed individually.

Related

Java: how to split text by just any possible newline/linebreak characters [duplicate]

I'm trying to split text in a JTextArea using a regex that splits the String on \n. However, this does not work, and I also tried \r\n|\r|\n and many other combinations of regexes.
Code:
public void insertUpdate(DocumentEvent e) {
String split[], docStr = null;
Document textAreaDoc = (Document)e.getDocument();
try {
docStr = textAreaDoc.getText(textAreaDoc.getStartPosition().getOffset(), textAreaDoc.getEndPosition().getOffset());
} catch (BadLocationException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
split = docStr.split("\\n");
}
This should cover you:
String lines[] = string.split("\\r?\\n");
There are really only two newlines (UNIX and Windows) that you need to worry about.
The String#split(String regex) method uses regular expressions. Since Java 8, regex supports \R, which represents (from the documentation of the Pattern class):
Linebreak matcher
\R    Any Unicode linebreak sequence, is equivalent to
\u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]
So we can use it to match:
\u000D\u000A -> \r\n pair
\u000A -> line feed (\n)
\u000B -> line tabulation (DO NOT confuse with character tabulation \t which is \u0009)
\u000C -> form feed (\f)
\u000D -> carriage return (\r)
\u0085 -> next line (NEL)
\u2028 -> line separator
\u2029 -> paragraph separator
As you see \r\n is placed at start of regex which ensures that regex will try to match this pair first, and only if that match fails it will try to match single character line separators.
So if you want to split on line separators, use split("\\R").
If you want to keep trailing empty strings "" in the resulting array, use split(regex, limit) with a negative limit, like split("\\R", -1).
If you want to treat one or more consecutive empty lines as a single delimiter, use split("\\R+").
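A quick sketch of the three variants on one input (Java 8+; the class and method names are only for illustration):

```java
import java.util.Arrays;

public class SplitOnLineBreaks {

    // One element per line; trailing empty strings are dropped by split(regex)
    static String[] lines(String text) {
        return text.split("\\R");
    }

    // A negative limit keeps trailing empty strings
    static String[] linesKeepTrailing(String text) {
        return text.split("\\R", -1);
    }

    // Runs of line breaks act as one delimiter, so empty lines disappear
    static String[] nonEmptyLines(String text) {
        return text.split("\\R+");
    }

    public static void main(String[] args) {
        String text = "a\r\nb\n\nc\n\n";
        System.out.println(Arrays.toString(lines(text)));             // [a, b, , c]
        System.out.println(Arrays.toString(linesKeepTrailing(text))); // [a, b, , c, , ]
        System.out.println(Arrays.toString(nonEmptyLines(text)));     // [a, b, c]
    }
}
```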
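If you don't want empty lines:
string.split("[\\r\\n]+")
string.split(System.lineSeparator());
This uses the current platform's separator, so it works for text produced on the same platform, but not, for example, for a Windows file read on Linux.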
A new method, lines, was introduced to the String class in Java 11; it returns a Stream<String>:
Returns a stream of substrings extracted from this string partitioned
by line terminators.
Line terminators recognized are line feed "\n" (U+000A), carriage
return "\r" (U+000D) and a carriage return followed immediately by a
line feed "\r\n" (U+000D U+000A).
Here are a few examples:
jshell> "lorem \n ipusm \n sit".lines().forEach(System.out::println)
lorem
ipusm
sit
jshell> "lorem \n ipusm \r sit".lines().forEach(System.out::println)
lorem
ipusm
sit
jshell> "lorem \n ipusm \r\n sit".lines().forEach(System.out::println)
lorem
ipusm
sit
String#lines()
In JDK11 the String class has a lines() method:
Returning a stream of lines extracted from this string, separated by
line terminators.
Further, the documentation goes on to say:
A line terminator is one of the following: a line feed character "\n"
(U+000A), a carriage return character "\r" (U+000D), or a carriage
return followed immediately by a line feed "\r\n" (U+000D U+000A). A
line is either a sequence of zero or more characters followed by a
line terminator, or it is a sequence of one or more characters
followed by the end of the string. A line does not include the line
terminator.
With this one can simply do:
Stream<String> stream = str.lines();
then if you want an array:
String[] array = str.lines().toArray(String[]::new);
Given that this method returns a Stream, it opens up a lot of options, as it lets you write concise, declarative expressions of possibly parallel operations.
You don't have to double-escape characters in character classes.
For all non-empty lines use:
string.split("[\r\n]+")
All answers given here actually do not respect Java's definition of a new line as given in, e.g., BufferedReader#readLine. Java accepts \n, \r and \r\n as new lines. Some of the answers match multiple empty lines or malformed files. E.g. <sometext>\n\r\n<someothertext> would result in two lines when using [\r\n]+.
String lines[] = string.split("(\r\n|\r|\n)", -1);
In contrast, the answer above has the following properties:
it complies with Java's definition of a new line, as used by, e.g., BufferedReader
it does not match multiple new lines
it does not remove trailing empty lines
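A sketch contrasting the two patterns on the malformed input mentioned above (the class and method names are hypothetical):

```java
import java.util.Arrays;

public class JavaLineDefinition {

    // Like BufferedReader.readLine(): \r\n, \r and \n each end exactly one line;
    // limit -1 keeps trailing empty lines.
    static String[] splitLikeReadLine(String s) {
        return s.split("(\r\n|\r|\n)", -1);
    }

    // Character-class variant: any run of \r/\n is a single delimiter,
    // so empty lines in between are lost.
    static String[] splitCharClass(String s) {
        return s.split("[\r\n]+");
    }

    public static void main(String[] args) {
        String s = "first\n\r\nsecond";
        System.out.println(Arrays.toString(splitLikeReadLine(s))); // [first, , second]
        System.out.println(Arrays.toString(splitCharClass(s)));    // [first, second]
    }
}
```

Here \n\r\n is read as a \n terminator followed by a \r\n terminator, i.e. an empty line in between, which is the same result readLine() would give.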
If, for some reason, you don't want to use String.split (for example, because of regular expressions) and you want to use functional programming on Java 8 or newer:
List<String> lines = new BufferedReader(new StringReader(string))
.lines()
.collect(Collectors.toList());
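Maybe this would work:
Remove the double backslashes from the parameter of the split method:
split = docStr.split("\n");
(Note that in a regex, "\n" and "\\n" both match a line feed, so this change on its own should not alter the result.)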
To keep trailing empty lines from being dropped, use:
String lines[] = string.split("\\r?\\n", -1);
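The answers above did not help me on Android; Pshemo's response is what worked for me there. I will leave part of Pshemo's answer here:
split("\\\\n")
(This pattern matches a literal backslash followed by n, so it only applies when the text contains escaped "\n" sequences rather than real line breaks.)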
The code above doesn't actually do anything visible: it just calculates the split and then discards the result. Is it the code you used, or just an example for this question?
try doing textAreaDoc.insertString(int, String, AttributeSet) at the end?
There is a new kid in town, so you don't need to deal with all the complexity above.
From JDK 11 onward, you just need a single line of code: it splits the lines and returns a Stream of Strings.
import java.util.stream.Stream;
public class MyClass {
    public static void main(String args[]) {
        Stream<String> lines = "foo \n bar \n baz".lines();
        // Do whatever you want to do with lines
    }
}
Some references.
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#lines()
https://www.azul.com/90-new-features-and-apis-in-jdk-11/
I hope this will be helpful to someone. Happy coding.
Sadly, Java lacks a method for splitting a string by a fixed string that is both simple and efficient. Both String::split and the stream API are complex and relatively slow, and they can produce different results.
String::split examines its regex argument and then compiles it to a java.util.regex.Pattern every time (except when the argument is a single character that is safe to treat literally).
However, a Pattern is very fast once it has been compiled. So the best solution is to precompile the pattern:
private static final Pattern LINE_SEP_PATTERN = Pattern.compile("\\R");
Then use it like this:
String[] lines = LINE_SEP_PATTERN.split(input);
From Java 8, \R matches any line break defined by Unicode. Prior to Java 8 you could use something like this (which only matches the current platform's separator):
Pattern.compile(Pattern.quote(System.lineSeparator()))
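Putting the precompiled-pattern idea into a complete class (a sketch for Java 8+; the class name LineSplitter and method splitLines are hypothetical, and I have not benchmarked it against String::split):

```java
import java.util.regex.Pattern;

public class LineSplitter {

    // Compiled once and reused for every call; \R matches \r\n, \n, \r and the
    // other Unicode line breaks listed in the Pattern documentation.
    private static final Pattern LINE_SEP_PATTERN = Pattern.compile("\\R");

    static String[] splitLines(String input) {
        return LINE_SEP_PATTERN.split(input);
    }

    public static void main(String[] args) {
        for (String line : splitLines("alpha\r\nbeta\rgamma\ndelta")) {
            System.out.println(line);
        }
    }
}
```

Pattern.split(CharSequence) behaves like String.split(String) with a limit of 0, so trailing empty strings are removed here too.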
String lines[] = string.split(System.lineSeparator());
After failed attempts with all the given solutions, I replaced \n with a special word and then split. For me the following did the trick:
article = "Alice phoned\n bob.";
article = article.replace("\\n", " NEWLINE ");
String sen [] = article.split(" NEWLINE ");
I couldn't replicate the example given in the question, but I guess this logic can be applied. Note that replace("\\n", ...) matches a literal backslash followed by n (escaped text), not a real line break; for real line breaks use replace("\n", ...).
As an alternative to the previous answers, Guava's Splitter API can be used if other operations are to be applied to the resulting lines, like trimming lines or filtering out empty lines:
import com.google.common.base.Splitter;
Iterable<String> split = Splitter.onPattern("\r?\n").trimResults().omitEmptyStrings().split(docStr);
Note that the result is an Iterable and not an array.
There are three different conventions (it could be said that those are de facto standards) to set and display a line break:
carriage return + line feed
line feed
carriage return
In some text editors, it is possible to convert from one convention to another.
The simplest thing is to normalize to line feed and then split.
final String[] lines = contents.replace("\r\n", "\n")
.replace("\r", "\n")
.split("\n", -1);
Try this; I hope it helps:
String split[], docStr = null;
Document textAreaDoc = (Document)e.getDocument();
try {
docStr = textAreaDoc.getText(textAreaDoc.getStartPosition().getOffset(), textAreaDoc.getEndPosition().getOffset());
} catch (BadLocationException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
split = docStr.split("\n");
package in.javadomain;
public class JavaSplit {
public static void main(String[] args) {
String input = "chennai\nvellore\ncoimbatore\nbangalore\narcot";
System.out.println("Before split:\n");
System.out.println(input);
String[] inputSplitNewLine = input.split("\\n");
System.out.println("\n After split:\n");
for(int i=0; i<inputSplitNewLine.length; i++){
System.out.println(inputSplitNewLine[i]);
}
}
}

How to extract words from a string in Java

I've imported a file and turned it into a String called readFile. The file contains two lines:
qwertyuiop00%
qwertyuiop
I have already extracted the "00" from the string using:
String number = readFile.substring(11, 13);
I now want to extract the "ert" and the "uio" in "qwertyuiop"
When I try to use the same method as the first, like so:
String e = readFile.substring(16, 19);
String u = readFile.substring(20, 23);
and try to use:
System.out.println(e + "and" + u);
It says string index out of range.
How do I go about this?
Is it because the next two words I want to extract from the string are on the second line?
If so, how do I extract only the second line?
I want to keep it basic, thanks.
UPDATE:
It turns out only the first line of the file is being read. Does anyone know how to make it read both lines?
If you count the total number of characters in each string, you'll see that the indexes you're entering go past the end of the string.
qwertyuiop00% is 13 characters. Call .length() method on the string to verify the length is the one you expect.
I would debug by adding the following first:
System.out.println(readFile);
System.out.println(readFile.length());
Note:
qwertyuiop00% qwertyuiop is 24 characters, since the space counts as a character. Unless, of course, there is no space, in which case it's 23 characters and your indexes are 0 to 22.
Note2:
I asked for the parser code since I suspect you're using the usual code, which is something like:
while ((line = reader.readLine()) != null)
You need to concatenate those lines into one String (though it's not the best approach).
see: How do I create a Java string from the contents of a file?
First, split your string into lines; you could do this using
String[] lines = readFile.split("[\r\n]+");
You may want to read the content directly into a List<String> using Files#readAllLines instead.
Second, do not use hard-coded indexes; use String#indexOf to find them. If a substring does not occur in your original string, the method returns -1, so always check for that value and call substring only when the return value is not -1 (0 or greater).
if(lines.length > 1) {
int startIndex = lines[1].indexOf("ert");
if(startIndex != -1) {
// do what you want
}
}
Btw, there is no point in extracting already known substring from a string
System.out.println(e + "and" + u);
is equivalent to
System.out.println("ertanduio");
Knowing the start and end position of a fixed substring only makes sense if you want to do something with the rest of the original string, for example removing the substrings.
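Combining the two steps above (split into lines, then indexOf with a -1 check) might look like the sketch below. It hard-codes the two-line sample from the question instead of reading the file, and the class and method names are hypothetical.

```java
public class SecondLineIndex {

    // Index of 'word' within the second line of 'text', or -1 if there is
    // no second line or the word does not occur in it.
    static int indexInSecondLine(String text, String word) {
        String[] lines = text.split("[\r\n]+");
        if (lines.length < 2) {
            return -1;
        }
        return lines[1].indexOf(word); // indexOf itself returns -1 when absent
    }

    public static void main(String[] args) {
        String readFile = "qwertyuiop00%\nqwertyuiop";
        String second = readFile.split("[\r\n]+")[1];
        int e = indexInSecondLine(readFile, "ert");
        int u = indexInSecondLine(readFile, "uio");
        if (e != -1 && u != -1) {
            // substring's end index is exclusive, so start + 3 covers three characters
            System.out.println(second.substring(e, e + 3) + " and " + second.substring(u, u + 3));
        }
    }
}
```

The indexes are now relative to the second line ("ert" starts at 2, "uio" at 6), which avoids counting the first line and its terminator by hand.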
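You may give this a try:
// needs java.io.File and java.util.Scanner; the path is a placeholder for your file
Scanner sc = new Scanner(new File("path/to/readFile.txt"));
String st = "";
while (sc.hasNext()) {
    st = sc.next(); // keeps only the last token, i.e. the second line "qwertyuiop"
}
sc.close();
System.out.println(st.substring(2, 5) + " and " + st.substring(6, 9));
Check whether it works.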

Using String.split on input file

So I have an input file of the form:
AdjGraphHeader
11
20
30
.
.
.
and I need to create a string array that holds each line separately. So I read the file using:
String s = FileUtils.readFileToString(f);
Words w = new Words(s, (long)s.length());
and then the words constructor did the following:
public Words(String str, long n_) {
strings = str.split("\\n");
string = str;
n = n_;
m = strings.length;
}
The issue seems to be that there is an extra line at the end of adjGraphHeader, but I have no idea how to get rid of it. Any help would be greatly appreciated.
I suspect that your lines are separated not by \n but by \r\n, so if you split on \n you still have a \r after each word, and when printing it you will see an empty line. One solution could be splitting using this regular expression:
"\r?\n|\r" - ? makes the element before it optional, | represents the "or" operator
which will handle line separators in the forms \r, \r\n and \n.
You can also use something IMHO simpler like
List<String> lines = Files.readAllLines(Paths.get("locationOfFile"));
or if you already have File f
List<String> lines = Files.readAllLines(f.toPath());
and store each line in list (you can later convert this list to array if you really need it)
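A sketch of the suspected problem, using an in-memory string with Windows line endings in place of the real file (an assumption; the class and method names are hypothetical):

```java
public class CrlfSplit {

    // Splitting only on \n leaves a trailing \r on every line of a Windows file
    static String[] splitOnLf(String s) {
        return s.split("\\n");
    }

    // "\r?\n|\r" handles \r\n, \n and \r
    static String[] splitOnAnyTerminator(String s) {
        return s.split("\r?\n|\r");
    }

    public static void main(String[] args) {
        String s = "AdjGraphHeader\r\n11\r\n20\r\n30";
        System.out.println(splitOnLf(s)[0].length());            // 15: the header plus a '\r'
        System.out.println(splitOnAnyTerminator(s)[0].length()); // 14
    }
}
```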

Replace new line/return with space using regex

Pretty basic question for someone who knows.
Instead of getting from
"This is my text.
And here is a new line"
To:
"This is my text. And here is a new line"
I get:
"This is my text.And here is a new line.
Any idea why?
L.replaceAll("[\\\t|\\\n|\\\r]","\\\s");
I think I found the culprit.
On the next line I do the following:
L.replaceAll( "[^a-zA-Z0-9|^!|^?|^.|^\\s]", "");
And this seems to be causing my issue.
Any idea why?
I am obviously trying to do the following: remove all non-chars, and remove all new lines.
\s is a shortcut for whitespace characters in regex. It has no meaning in a string. ==> You can't use it in your replacement string. There you need to put exactly the character(s) that you want to insert. If this is a space just use " " as replacement.
The other thing is: Why do you use 3 backslashes as escape sequence? Two are enough in Java. And you don't need a | (alternation operator) in a character class.
L.replaceAll("[\\t\\n\\r]+"," ");
Remark
L is not changed. If you want to have a result you need to do
String result = L.replaceAll("[\\t\\n\\r]+"," ");
Test code:
String in = "This is my text.\n\nAnd here is a new line";
System.out.println(in);
String out = in.replaceAll("[\\t\\n\\r]+"," ");
System.out.println(out);
The newline separator differs between operating systems: \r\n on Windows and \n on Linux.
To be safe, you can use regex pattern \R - the linebreak matcher introduced with Java 8:
String inlinedText = text.replaceAll("\\R", " ");
Try
L.replaceAll("(\\t|\\r?\\n)+", " ");
Depending on the system, a line break is either \r\n or just \n.
I found this.
String newString = string.replaceAll("\n", " ");
Although, as you have a double line, you will get a double space. I guess you could then do another replace all to replace double spaces with a single one.
If that doesn't work try doing:
string.replaceAll(System.getProperty("line.separator"), " ");
If I create lines in "string" by using "\n" I had to use "\n" in the regex. If I used System.getProperty() I had to use that.
Your regex is good, although I would replace the matches with the empty string:
String resultString = subjectString.replaceAll("[\t\n\r]", "");
You expect a space between "text." and "And" right?
I get that space when I try the regex by copying your sample
"This is my text. "
So all is well here. Maybe if you just replace it with the empty string it will work. I don't know why you replace it with \s. And the alternation | is not necessary in a character class.
You can first split the string and then rejoin it using spaces.
It will work for sure.
String[] Larray = L.split("[\\n]+");
L = "";
for (int i = 0; i < Larray.length; i++) {
    if (i > 0) {
        L = L + " "; // separator only between pieces, so no leading space
    }
    L = L + Larray[i];
}
This should take care of spaces, tabs and newlines:
data = data.replaceAll("[ \t\n\r]+", " ");
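Why the quantifier has to be + rather than *: a pattern that can match the empty string also matches between every pair of characters. A small sketch (class and method names are hypothetical):

```java
public class StarVersusPlus {

    // [ \t\n\r]* also matches the empty string, so a space is inserted at every position
    static String withStar(String data) {
        return data.replaceAll("[ \t\n\r]*", " ");
    }

    // [ \t\n\r]+ only matches actual whitespace, collapsing each run into one space
    static String withPlus(String data) {
        return data.replaceAll("[ \t\n\r]+", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + withStar("ab") + "]");                 // [ a b ]
        System.out.println("[" + withPlus("This is\r\nmy\ttext") + "]"); // [This is my text]
    }
}
```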

reading line in bufferedReader

From the javadoc
public String readLine()
throws IOException
Read a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
I have the following kind of text:
Now the earth was formless and empty. Darkness was on the surface
of the deep. God's Spirit was hovering over the surface
of the waters.
I am reading lines as:
while(buffer.readLine() != null){
}
But the problem is that it treats everything up to the newline as a line. I would like a line to end where the text ends with a period (.). How would I do that?
You can use a Scanner and set your own delimiter using useDelimiter(Pattern).
Note that the delimiter is a regex, so you will need to provide the regex \. (you need to escape the special meaning of the character . in regex).
You can read a character at a time, and copy the data to a StringBuilder
Reader reader = ...;
StringBuilder sb = new StringBuilder();
int ch;
while((ch = reader.read()) >= 0) {
if(ch == '.') break;
sb.append((char) ch);
}
Use a java.util.Scanner instead of a buffered reader, and set the delimiter to "\\." with Scanner.useDelimiter().
(but be aware that the delimiter is consumed, so you'll have to add it again!)
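A sketch of the Scanner approach on the sample text from the question, read from a String rather than a file (an assumption to keep it self-contained). It also turns the line breaks inside a sentence into spaces and adds the consumed period back; the class and method names are hypothetical, and \R needs Java 8+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class SentenceScanner {

    // Splits 'text' into sentences ending in '.', ignoring line breaks inside them.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            sc.useDelimiter("\\."); // regex: an escaped dot is a literal period
            while (sc.hasNext()) {
                // replace wrapped line breaks with spaces, then trim leftover whitespace
                String sentence = sc.next().replaceAll("\\R", " ").trim();
                if (!sentence.isEmpty()) {
                    result.add(sentence + "."); // the delimiter was consumed, so add it back
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Now the earth was formless and empty. Darkness was on the surface\n"
                + "of the deep. God's Spirit was hovering over the surface\n"
                + "of the waters.";
        for (String s : sentences(text)) {
            System.out.println(s);
        }
    }
}
```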
Or read the raw string and split it on each period.
You could split the whole text by every .:
String text = "Your test.";
String[] lines = text.split("\\.");
After you split the text you get an array of lines. You could also use a regex if you want more control, e.g. to split the text also by : or ;. Just google it.
PS: You may have to deal with the newline characters first, for example by replacing them with spaces so that words on adjacent lines are not glued together:
text = text.replaceAll("\n", " ");
