I'm trying to split text in a JTextArea using a regex to split the String by \n. However, this does not work, and I have also tried \r\n|\r|\n and many other combinations of regexes.
Code:
public void insertUpdate(DocumentEvent e) {
    String split[], docStr = null;
    Document textAreaDoc = (Document)e.getDocument();
    try {
        docStr = textAreaDoc.getText(textAreaDoc.getStartPosition().getOffset(), textAreaDoc.getEndPosition().getOffset());
    } catch (BadLocationException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    split = docStr.split("\\n");
}
This should cover you:
String lines[] = string.split("\\r?\\n");
There are really only two newline conventions (UNIX and Windows) that you need to worry about.
The String#split(String regex) method uses regular expressions. Since Java 8, regex supports \R, which represents (from the documentation of the Pattern class):
Linebreak matcher
\R Any Unicode linebreak sequence, is equivalent to
\u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]
So we can use it to match:
\u000D\u000A -> \r\n pair
\u000A -> line feed (\n)
\u000B -> line tabulation (DO NOT confuse with character tabulation \t which is \u0009)
\u000C -> form feed (\f)
\u000D -> carriage return (\r)
\u0085 -> next line (NEL)
\u2028 -> line separator
\u2029 -> paragraph separator
As you can see, \r\n is placed at the start of the regex, which ensures that the regex will try to match this pair first; only if that match fails will it try to match single-character line separators.
So if you want to split on line separators, use split("\\R").
If you don't want trailing empty strings "" to be removed from the resulting array, use split(regex, limit) with a negative limit, like split("\\R", -1).
If you want to treat one or more consecutive line breaks (i.e. empty lines) as a single delimiter, use split("\\R+").
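A minimal sketch of the three variants above, assuming Java 8 or newer; the class name SplitOnLinebreak and the sample string are illustrative only:

```java
import java.util.Arrays;

// Illustrative class name, not from the answer above.
final class SplitOnLinebreak {

    // Split on any Unicode linebreak; trailing empty strings are dropped.
    static String[] plain(String s) {
        return s.split("\\R");
    }

    // Negative limit: trailing empty strings are kept.
    static String[] keepTrailing(String s) {
        return s.split("\\R", -1);
    }

    // One or more consecutive line breaks count as a single delimiter.
    static String[] collapse(String s) {
        return s.split("\\R+");
    }

    public static void main(String[] args) {
        String s = "a\r\nb\n\nc\n"; // CRLF, an empty line, and a trailing LF
        System.out.println(Arrays.toString(plain(s)));        // [a, b, , c]
        System.out.println(Arrays.toString(keepTrailing(s))); // [a, b, , c, ]
        System.out.println(Arrays.toString(collapse(s)));     // [a, b, c]
    }
}
```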
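If you don't want empty lines:
string.split("[\\r\\n]+")
string.split(System.lineSeparator());
This avoids hard-coding a separator, since it uses the line separator of the system the code runs on; note, however, that it only splits text that follows that platform's convention (a file with \r\n endings read on Linux, for instance, would not be split).
A new method, lines(), was introduced to the String class in Java 11; it returns a Stream<String>: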
Returns a stream of substrings extracted from this string partitioned
by line terminators.
Line terminators recognized are line feed "\n" (U+000A), carriage
return "\r" (U+000D) and a carriage return followed immediately by a
line feed "\r\n" (U+000D U+000A).
Here are a few examples:
jshell> "lorem \n ipsum \n sit".lines().forEach(System.out::println)
lorem
 ipsum
 sit
jshell> "lorem \n ipsum \r sit".lines().forEach(System.out::println)
lorem
 ipsum
 sit
jshell> "lorem \n ipsum \r\n sit".lines().forEach(System.out::println)
lorem
 ipsum
 sit
String#lines()
In JDK 11 the String class has a lines() method:
Returns a stream of lines extracted from this string, separated by
line terminators.
Further, the documentation goes on to say:
A line terminator is one of the following: a line feed character "\n"
(U+000A), a carriage return character "\r" (U+000D), or a carriage
return followed immediately by a line feed "\r\n" (U+000D U+000A). A
line is either a sequence of zero or more characters followed by a
line terminator, or it is a sequence of one or more characters
followed by the end of the string. A line does not include the line
terminator.
With this, one can simply do:
Stream<String> stream = str.lines();
Then, if you want an array:
String[] array = str.lines().toArray(String[]::new);
Since this method returns a Stream, it opens up a lot of options for you, as it lets you write concise, declarative expressions of possibly parallel operations.
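As a sketch of those options (assuming Java 11+, where String.lines() and String.strip() exist; the class and method names are made up for illustration), the stream can be trimmed, filtered and collected in one pipeline:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration only.
final class LinesStreamDemo {

    // Non-empty lines with surrounding whitespace removed.
    static String[] nonBlankStripped(String text) {
        return text.lines()
                .map(String::strip)              // drop leading/trailing whitespace
                .filter(line -> !line.isEmpty()) // drop empty lines
                .toArray(String[]::new);
    }

    // Rejoins the lines with a visible separator.
    static String joined(String text) {
        return text.lines().collect(Collectors.joining(" | "));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(nonBlankStripped(" foo \n\n bar \r\n baz\n"))); // [foo, bar, baz]
        System.out.println(joined("a\nb\r\nc")); // a | b | c
    }
}
```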
You don't have to double-escape \r and \n in character classes: the Java literals "\r" and "\n" already contain the carriage-return and line-feed characters, which the regex matches literally.
For all non-empty lines, use:
string.split("[\r\n]+")
None of the answers given here actually respects Java's definition of a new line as given in e.g. BufferedReader#readLine. Java accepts \n, \r and \r\n as a new line. Some of the answers match multiple empty lines or malformed files. E.g. <sometext>\n\r\n<someothertext>, when split with [\r\n]+, would result in two lines instead of the three that readLine produces (the middle one empty).
String lines[] = string.split("(\r\n|\r|\n)", -1);
In contrast, the answer above has the following properties:
it complies with Java's definition of a new line, as used e.g. by BufferedReader
it does not match multiple new lines
it does not remove trailing empty lines
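To illustrate the difference on the malformed example above, here is a small sketch (Java 8+; the class name ReadLineCompatibleSplit is hypothetical) comparing the alternation-based split, the [\r\n]+ split and what BufferedReader itself returns:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class name for illustration.
final class ReadLineCompatibleSplit {

    // \r\n is listed first so a CRLF pair is consumed as one terminator;
    // the negative limit keeps trailing empty strings.
    static String[] splitLikeReadLine(String s) {
        return s.split("(\r\n|\r|\n)", -1);
    }

    // A character class with +: runs of CR/LF collapse into one delimiter.
    static String[] splitCollapsing(String s) {
        return s.split("[\r\n]+");
    }

    // What BufferedReader itself produces (BufferedReader.lines() is Java 8+).
    static List<String> readerLines(String s) {
        return new BufferedReader(new StringReader(s)).lines()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String s = "sometext\n\r\nsomeothertext";
        System.out.println(splitLikeReadLine(s).length); // 3
        System.out.println(splitCollapsing(s).length);   // 2
        System.out.println(readerLines(s));              // [sometext, , someothertext]
    }
}
```

One difference remains: for text that ends with a terminator, such as "a\n", the -1 split yields a trailing "" that readLine does not return.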
If, for some reason, you don't want to use String.split (for example, because of regular expressions) and you want to use functional programming on Java 8 or newer:
List<String> lines = new BufferedReader(new StringReader(string))
.lines()
.collect(Collectors.toList());
Maybe this would work:
Remove the double backslashes from the parameter of the split method:
split = docStr.split("\n");
To keep trailing empty lines from being dropped, use:
String[] lines = string.split("\\r?\\n", -1);
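The above answers did not help me on Android; Pshemo's response is what worked for me there. I will leave part of Pshemo's answer here:
split("\\\\n")
(Note that "\\\\n" is the regex \\n, which matches a literal backslash followed by n rather than a line break, so this only helps when the text contains escaped "\n" sequences instead of real newlines.)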
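A small sketch showing what that regex does and does not match (the class name is illustrative):

```java
// Illustrative class name.
final class LiteralBackslashN {

    // "\\\\n" in Java source is the regex \\n: a literal backslash followed by 'n'.
    static String[] splitOnEscapedNewline(String s) {
        return s.split("\\\\n");
    }

    public static void main(String[] args) {
        String escaped = "first\\nsecond"; // contains the two characters '\' and 'n'
        String real = "first\nsecond";     // contains an actual line feed
        System.out.println(splitOnEscapedNewline(escaped).length); // 2
        System.out.println(splitOnEscapedNewline(real).length);    // 1
    }
}
```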
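The above code doesn't actually do anything visible: it just calculates the split and then discards the result. Is it the code you used, or just an example for this question?
Try calling textAreaDoc.insertString(int, String, AttributeSet) at the end? (Note that a document cannot be modified from inside its own DocumentListener callback; AbstractDocument throws an IllegalStateException, so such a call would have to be deferred, e.g. with SwingUtilities.invokeLater.)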
There's a new kid on the block, so you don't need to deal with all of the complexities above.
From JDK 11 onward, you just need to write a single line of code: it splits the string into lines and returns a Stream of String.
import java.util.stream.Stream;

public class MyClass {
    public static void main(String[] args) {
        Stream<String> lines = "foo \n bar \n baz".lines();
        // Do whatever you want to do with lines
    }
}
Some references.
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#lines()
https://www.azul.com/90-new-features-and-apis-in-jdk-11/
I hope this will be helpful to someone. Happy coding.
Sadly, Java lacks a method for splitting a string by a fixed string that is both simple and efficient. Both String::split and the stream API are complex and relatively slow. Also, they can produce different results.
String::split examines its regex argument, then compiles it to a java.util.regex.Pattern every time (except for some simple cases, such as a single character that is not a regex metacharacter, which take a fast path).
However, a Pattern is very fast once it has been compiled. So the best solution is to precompile the pattern:
private static final Pattern LINE_SEP_PATTERN = Pattern.compile("\\R");
Then use it like this:
String[] lines = LINE_SEP_PATTERN.split(input);
From Java 8 on, \R matches any line break specified by Unicode. Prior to Java 8, you could use something like this:
Pattern.compile(Pattern.quote(System.lineSeparator()))
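Putting the precompiled pattern into a self-contained sketch (class name hypothetical; the comparison lines call String.lines(), so they need Java 11+) also shows how Pattern.split and the stream API can give different results:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical class name for illustration.
final class PrecompiledLineSplitter {

    // Compiled once and reused for every call (\R needs Java 8+).
    private static final Pattern LINE_SEP_PATTERN = Pattern.compile("\\R");

    static String[] splitLines(String input) {
        return LINE_SEP_PATTERN.split(input);
    }

    public static void main(String[] args) {
        // \R also matches U+2028 LINE SEPARATOR; String.lines() does not.
        System.out.println(splitLines("a\u2028b").length); // 2
        System.out.println("a\u2028b".lines().count());    // 1

        // Pattern.split drops trailing empty strings; lines() counts the empty line.
        System.out.println(Arrays.toString(splitLines("a\n\n"))); // [a]
        System.out.println("a\n\n".lines().count());             // 2
    }
}
```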
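String[] lines = string.split(System.lineSeparator());
After failed attempts with all of the given solutions, I replaced \n with a special word and then split on it. For me, the following did the trick:
article = "Alice phoned\\n bob.";
article = article.replace("\\n", " NEWLINE ");
String sen [] = article.split(" NEWLINE ");
I couldn't replicate the example given in the question, but I guess this logic can be applied. Note that replace("\\n", ...) looks for a literal backslash followed by n, so this only works when the text contains escaped "\n" sequences (as in the first line above); for real line breaks, use "\n" instead.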
As an alternative to the previous answers, Guava's Splitter API can be used if other operations are to be applied to the resulting lines, like trimming lines or filtering out empty lines:
import com.google.common.base.Splitter;
Iterable<String> split = Splitter.onPattern("\r?\n").trimResults().omitEmptyStrings().split(docStr);
Note that the result is an Iterable and not an array.
There are three different conventions (it could be said that those are de facto standards) to set and display a line break:
carriage return + line feed
line feed
carriage return
In some text editors, it is possible to switch from one convention to another.
The simplest thing is to normalize to line feed and then split.
final String[] lines = contents.replace("\r\n", "\n")
.replace("\r", "\n")
.split("\n", -1);
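The same normalization wrapped in a helper, as a sketch (the class and method names are illustrative). The order of the two replace calls matters:

```java
import java.util.Arrays;

// Illustrative class name.
final class NormalizeThenSplit {

    // Replace CRLF first: doing "\r" -> "\n" first would turn each "\r\n" into "\n\n".
    static String[] splitNormalized(String contents) {
        return contents.replace("\r\n", "\n")
                       .replace("\r", "\n")
                       .split("\n", -1); // -1 keeps trailing empty strings
    }

    public static void main(String[] args) {
        // Windows (\r\n), classic Mac (\r) and Unix (\n) endings mixed together
        System.out.println(Arrays.toString(splitNormalized("a\r\nb\rc\nd"))); // [a, b, c, d]
    }
}
```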
Try this; I hope it helps:
String split[], docStr = null;
Document textAreaDoc = (Document)e.getDocument();
try {
    docStr = textAreaDoc.getText(textAreaDoc.getStartPosition().getOffset(), textAreaDoc.getEndPosition().getOffset());
} catch (BadLocationException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}
split = docStr.split("\n");
package in.javadomain;

public class JavaSplit {
    public static void main(String[] args) {
        String input = "chennai\nvellore\ncoimbatore\nbangalore\narcot";
        System.out.println("Before split:\n");
        System.out.println(input);
        String[] inputSplitNewLine = input.split("\\n");
        System.out.println("\n After split:\n");
        for (int i = 0; i < inputSplitNewLine.length; i++) {
            System.out.println(inputSplitNewLine[i]);
        }
    }
}
I've imported a file and turned it into a String called readFile. The file contains two lines:
qwertyuiop00%
qwertyuiop
I have already extracted the "00" from the string using:
String number = readFile.substring(11, 13);
I now want to extract the "ert" and the "uio" in "qwertyuiop"
When I try to use the same method as the first, like so:
String e = readFile.substring(16, 19);
String u = readFile.substring(20, 23);
and try to use:
System.out.println(e + "and" + u);
It says string index out of range.
How do I go about this?
Is it because the next two words I want to extract from the string are on the second line?
If so, how do I extract only the second line?
I want to keep it basic, thanks.
UPDATE:
It turns out only the first line of the file is being read. Does anyone know how to make it read both lines?
If you count the total number of characters in each string, you'll see that the indexes you're entering are larger than that.
qwertyuiop00% is 13 characters. Call the .length() method on the string to verify that the length is the one you expect.
I would debug by adding the following beforehand:
System.out.println(readFile);
System.out.println(readFile.length());
Note:
qwertyuiop00% qwertyuiop is 24 characters, since the space counts as a character. Unless, of course, you don't have the space, in which case it's 23 characters and your indexes run from 0 to 22.
Note 2:
I asked for the parser code since I suspect you're using the usual code, which is something like:
while ((line = reader.readLine()) != null)
You need to concatenate those lines into one String (though it's not the best approach).
see: How do I create a Java string from the contents of a file?
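A sketch of that concatenation (the names are hypothetical, and a StringReader stands in for the reader of your real file), re-inserting a '\n' between lines because readLine() strips the terminators:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class for illustration.
final class ReadAllLinesIntoString {

    // Reads every line and joins them with '\n'. readLine() strips the line
    // terminators, so one '\n' is put back between consecutive lines.
    static String readAll(Reader source) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    sb.append('\n');
                }
                sb.append(line);
                first = false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A StringReader stands in for the FileReader you would use for a real file
        String readFile = readAll(new StringReader("qwertyuiop00%\r\nqwertyuiop\n"));
        System.out.println(readFile.length()); // 24: 13 + 1 ('\n') + 10
    }
}
```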
First, split your string into lines. You could do this using:
String[] lines = readFile.split("[\r\n]+");
You may want to read the content directly into a List<String> using Files#readAllLines instead.
Second, do not use hard-coded indexes; use String#indexOf to find them. If a substring does not occur in your original string, the method returns -1, so always check for that value and call substring only when the return value is not -1 (i.e. 0 or greater).
if(lines.length > 1) {
int startIndex = lines[1].indexOf("ert");
if(startIndex != -1) {
// do what you want
}
}
By the way, there is no point in extracting an already known substring from a string:
System.out.println(e + "and" + u);
is equivalent to
System.out.println("ertanduio");
Knowing the start and end positions of a fixed substring only makes sense if you want to do something with the rest of the original string, for example removing the substrings.
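A sketch of the indexOf-based approach (hypothetical helper names), including the one case where the positions are useful: removing a substring from the rest of the line:

```java
// Hypothetical helper class for illustration.
final class SecondLineSearch {

    // Index of 'needle' on the given (0-based) line, or -1 if that line
    // does not exist or does not contain it.
    static int indexOnLine(String text, int lineNo, String needle) {
        String[] lines = text.split("[\r\n]+");
        if (lineNo < 0 || lineNo >= lines.length) {
            return -1;
        }
        return lines[lineNo].indexOf(needle);
    }

    // Removes the first occurrence of 'needle' from 'line'; unchanged if absent.
    static String removeFirst(String line, String needle) {
        int start = line.indexOf(needle);
        if (start == -1) {
            return line; // never call substring with -1
        }
        return line.substring(0, start) + line.substring(start + needle.length());
    }

    public static void main(String[] args) {
        String readFile = "qwertyuiop00%\nqwertyuiop";
        System.out.println(indexOnLine(readFile, 1, "ert"));  // 2
        System.out.println(removeFirst("qwertyuiop", "ert")); // qwyuiop
    }
}
```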
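You may give this a try:
import java.io.File;
import java.io.FileReader;
import java.util.Scanner;
Scanner sc = new Scanner(new FileReader(new File("readFile.txt"))); // placeholder: use the file path for readFile.txt
String st = "";
while (sc.hasNext()) {
    st = sc.next(); // keeps only the last whitespace-separated token, i.e. "qwertyuiop"
}
System.out.println(st.substring(2, 5) + " and " + st.substring(6, 9));
Check whether it works.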
So I have an input file of the form:
AdjGraphHeader
11
20
30
.
.
.
and I need to create a string array that holds each line separately. So I read the file using:
String s = FileUtils.readFileToString(f);
Words w = new Words(s, (long)s.length());
and then the Words constructor does the following:
public Words(String str, long n_) {
    strings = str.split("\\n");
    string = str;
    n = n_;
    m = strings.length;
}
The issue seems to be that there is an extra line at the end of adjGraphHeader, but I have no idea how to get rid of it. Any help would be greatly appreciated.
I suspect that your lines are separated not by \n but by \r\n, so if you split on \n you still have \r after each word, and when printing it you will see an empty line. One solution could be splitting using this regular expression:
"\r?\n|\r", where ? makes the preceding element optional and | represents the "or" operator,
which handles line separators of the forms \r, \r\n and \n.
You can also use something IMHO simpler, like:
List<String> lines = Files.readAllLines(Paths.get("locationOfFile"));
or if you already have File f
List<String> lines = Files.readAllLines(f.toPath());
and store each line in a list (you can later convert this list to an array if you really need one).
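A sketch that reproduces the suspected problem (the sample text with \r\n endings is an assumption about the input file; the class name is illustrative):

```java
// Illustrative class name; the CRLF sample input is an assumption.
final class TrailingCarriageReturn {

    // Splitting CRLF text on "\n" alone leaves a '\r' at the end of each piece.
    static String[] splitOnLfOnly(String s) {
        return s.split("\\n");
    }

    // "\r?\n|\r" accepts \r\n, \n and \r as line separators.
    static String[] splitAnyConvention(String s) {
        return s.split("\r?\n|\r");
    }

    public static void main(String[] args) {
        String s = "AdjGraphHeader\r\n11\r\n20\r\n30"; // Windows-style line endings
        System.out.println(splitOnLfOnly(s)[0].endsWith("\r"));      // true
        System.out.println(splitAnyConvention(s)[0].endsWith("\r")); // false
    }
}
```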
Pretty basic question for someone who knows.
Instead of getting from
"This is my text.
And here is a new line"
To:
"This is my text. And here is a new line"
I get:
"This is my text.And here is a new line"
Any idea why?
L.replaceAll("[\\\t|\\\n|\\\r]","\\\s");
I think I found the culprit.
On the next line I do the following:
L.replaceAll( "[^a-zA-Z0-9|^!|^?|^.|^\\s]", "");
And this seems to be causing my issue.
Any idea why?
I am obviously trying to do the following: remove all non-chars, and remove all new lines.
\s is a shortcut for whitespace characters in a regex. It has no such meaning in a replacement string, so you can't use it there. In the replacement you need to put exactly the character(s) that you want to insert. If this is a space, just use " " as the replacement.
The other thing is: why do you use three backslashes as an escape sequence? Two are enough in Java. And you don't need | (the alternation operator) in a character class; inside [...] it just matches a literal | character.
L.replaceAll("[\\t\\n\\r]+"," ");
Remark
L is not changed (Java strings are immutable). If you want the result, you need to do:
String result = L.replaceAll("[\\t\\n\\r]+"," ");
Test code:
String in = "This is my text.\n\nAnd here is a new line";
System.out.println(in);
String out = in.replaceAll("[\\t\\n\\r]+"," ");
System.out.println(out);
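Since the question also strips unwanted characters in a second step, here is a sketch chaining both calls and keeping the result. The allowed characters (letters, digits, !, ?, . and whitespace) are my reading of the question's character class, so treat them as an assumption:

```java
// Illustrative class name.
final class CleanUpText {

    // 1) collapse runs of tabs/newlines into one space,
    // 2) drop everything except letters, digits, '!', '?', '.' and whitespace
    //    (this allowed set is an assumption based on the question's regex).
    static String clean(String in) {
        return in.replaceAll("[\\t\\n\\r]+", " ")
                 .replaceAll("[^a-zA-Z0-9!?.\\s]", "");
    }

    public static void main(String[] args) {
        String in = "This is my text.\n\nAnd here is a new line";
        String out = clean(in); // the result must be assigned: 'in' itself is unchanged
        System.out.println(out); // This is my text. And here is a new line
    }
}
```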
The newline separator differs between operating systems: "\r\n" on Windows and "\n" on Linux.
To be safe, you can use the regex pattern \R, the linebreak matcher introduced in Java 8:
String inlinedText = text.replaceAll("\\R", " ");
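A short sketch (class name illustrative) of the difference between \R and \R+ when the text contains an empty line:

```java
// Illustrative class name.
final class InlineText {

    // Every line break becomes one space, so an empty line yields two spaces.
    static String inlineEachBreak(String text) {
        return text.replaceAll("\\R", " ");
    }

    // A run of consecutive line breaks becomes a single space.
    static String inlineRuns(String text) {
        return text.replaceAll("\\R+", " ");
    }

    public static void main(String[] args) {
        String text = "This is my text.\r\n\r\nAnd here is a new line";
        System.out.println(inlineEachBreak(text)); // This is my text.  And here is a new line (two spaces)
        System.out.println(inlineRuns(text));      // This is my text. And here is a new line
    }
}
```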
Try
L.replaceAll("(\\t|\\r?\\n)+", " ");
Depending on the system, a line break is either \r\n or just \n.
I found this.
String newString = string.replaceAll("\n", " ");
However, since you have a blank line (two consecutive newlines), you will get a double space. I guess you could then do another replaceAll to replace double spaces with a single one.
If that doesn't work, try:
string.replaceAll(System.getProperty("line.separator"), " ");
If I created the lines in "string" using "\n", I had to use "\n" in the regex; if I used System.getProperty("line.separator"), I had to use that.
Your regex is good, although I would replace the matches with the empty string:
String resultString = subjectString.replaceAll("[\t\n\r]", "");
You expect a space between "text." and "And", right?
I get that space when I try the regex by copying your sample
"This is my text. "
So all is well here. Maybe if you just replace it with the empty string it will work. I don't know why you replace it with \s. And the alternation | is not necessary in a character class.
You may first split and then rejoin using a space.
It will work, for sure.
String[] Larray = L.split("[\\n]+");
// Rejoin the pieces with single spaces (String.join is available since Java 8)
L = String.join(" ", Larray);
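This should take care of space, tab and newline:
data = data.replaceAll("[ \t\n\r]+", " ");
(Use + rather than *: a pattern ending in * also matches the empty string, so replaceAll would insert a space between every pair of characters and at both ends.)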
From the Javadoc of BufferedReader:
public String readLine() throws IOException
Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
I have the following kind of text:
Now the earth was formless and empty. Darkness was on the surface
of the deep. God's Spirit was hovering over the surface
of the waters.
I am reading lines as:
while (buffer.readLine() != null) {
}
But the problem is that it treats everything up to the newline as a line, whereas I would like a line to end where the text ends with a period (.). How would I do that?
You can use a Scanner and set your own delimiter using useDelimiter(Pattern).
Note that the input delimiter is a regex, so you will need to provide the regex \. (to escape the special meaning of the character . in a regex); in a Java string literal, that is "\\.".
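A sketch of that approach, assuming the text is already in a String (for a file you could pass a File to the Scanner constructor instead); the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class for illustration.
final class SentenceScanner {

    // Splits the text into sentences at every '.', ignoring where the line breaks are.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        // The delimiter is a regex: "\\." in Java source is the regex \. (a literal period)
        try (Scanner sc = new Scanner(text).useDelimiter("\\.")) {
            while (sc.hasNext()) {
                // Collapse line breaks and other whitespace runs into single spaces
                String sentence = sc.next().replaceAll("\\s+", " ").trim();
                if (!sentence.isEmpty()) {
                    result.add(sentence);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Now the earth was formless and empty. Darkness was on the surface\n"
                + "of the deep. God's Spirit was hovering over the surface\n"
                + "of the waters.";
        for (String sentence : sentences(text)) {
            System.out.println(sentence + "."); // the '.' delimiter is not part of the token
        }
    }
}
```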
You can read one character at a time and copy the data to a StringBuilder:
Reader reader = ...;
StringBuilder sb = new StringBuilder();
int ch;
while ((ch = reader.read()) >= 0) {
    if (ch == '.') break; // stop at the end of the sentence
    sb.append((char) ch);
}
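Extending the loop above into a sketch that collects every sentence instead of stopping at the first period (the helper name is hypothetical, and a StringReader stands in for the elided reader):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration.
final class CharByCharSentences {

    // Collects every '.'-terminated chunk (keeping the '.'), plus any trailing text.
    static List<String> readSentences(Reader reader) {
        List<String> result = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        try {
            int ch;
            while ((ch = reader.read()) >= 0) {
                sb.append((char) ch);
                if (ch == '.') {
                    result.add(sb.toString().trim());
                    sb.setLength(0); // start collecting the next sentence
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String rest = sb.toString().trim();
        if (!rest.isEmpty()) {
            result.add(rest); // text after the last '.'
        }
        return result;
    }

    public static void main(String[] args) {
        // StringReader stands in for whatever Reader you actually have
        Reader reader = new StringReader("Now the earth was formless and empty. Darkness was on the surface\nof the deep.");
        for (String sentence : readSentences(reader)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```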
Use a java.util.Scanner instead of a BufferedReader, and set the delimiter to "\\." with Scanner.useDelimiter().
(But be aware that the delimiter is consumed and not returned as part of the token, so you'll have to add the period back yourself!)
Or read the raw string and split it on each period.
You could split the whole text on every period:
String text = "Your test.";
String[] lines = text.split("\\.");
After you split the text, you get an array of sentences. You can also use a regex if you want more control, e.g. text.split("[.:;]") to split the text on : and ; as well.
PS: You may have to handle the newline characters first, with something like the following (replacing them with a space rather than removing them, so that the words on either side don't run together):
text = text.replaceAll("\n", " ");