I'm trying to split the text in a JTextArea using a regex to split the String by \n. However, this does not work. I also tried \r\n|\r|n and many other combinations of regexes.
Code:
public void insertUpdate(DocumentEvent e) {
String split[], docStr = null;
Document textAreaDoc = (Document)e.getDocument();
try {
docStr = textAreaDoc.getText(textAreaDoc.getStartPosition().getOffset(), textAreaDoc.getEndPosition().getOffset());
} catch (BadLocationException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
split = docStr.split("\\n");
}
This should cover you:
String lines[] = string.split("\\r?\\n");
There's only really two newlines (UNIX and Windows) that you need to worry about.
The String#split(String regex) method takes a regular expression. Since Java 8, regex supports \R, which represents (from the documentation of the Pattern class):
Linebreak matcher
\R Any Unicode linebreak sequence, is equivalent to
\u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]
So we can use it to match:
\u000D\u000A -> \r\n pair
\u000A -> line feed (\n)
\u000B -> line tabulation (DO NOT confuse with character tabulation \t which is \u0009)
\u000C -> form feed (\f)
\u000D -> carriage return (\r)
\u0085 -> next line (NEL)
\u2028 -> line separator
\u2029 -> paragraph separator
As you can see, \r\n is placed at the start of the regex, which ensures that the regex tries to match this pair first; only if that match fails does it try to match single-character line separators.
So if you want to split on line separators, use split("\\R").
If you don't want trailing empty strings "" removed from the resulting array, use split(regex, limit) with a negative limit parameter, like split("\\R", -1).
If you want to treat one or more consecutive empty lines as a single delimiter, use split("\\R+").
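To make the three variants concrete, here is a minimal sketch that applies each one to the same input. The class name, method names and sample text are made up for illustration, and it assumes Java 8 or later, since \R is not available before that.

```java
import java.util.Arrays;

public class LineBreakSplitDemo {

    // Splits on any single Unicode line break; trailing empty strings are dropped.
    static String[] splitLines(String text) {
        return text.split("\\R");
    }

    // Same delimiter, but the negative limit keeps trailing empty strings.
    static String[] splitKeepTrailing(String text) {
        return text.split("\\R", -1);
    }

    // Treats a run of consecutive line breaks as one delimiter.
    static String[] splitSkipEmpty(String text) {
        return text.split("\\R+");
    }

    public static void main(String[] args) {
        String text = "a\r\nb\n\nc\n";
        System.out.println(Arrays.toString(splitLines(text)));        // [a, b, , c]
        System.out.println(Arrays.toString(splitKeepTrailing(text))); // [a, b, , c, ]
        System.out.println(Arrays.toString(splitSkipEmpty(text)));    // [a, b, c]
    }
}
```

Note that the \r\n pair counts as one line break in all three cases, so Windows-style input does not produce spurious empty entries.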
If you don’t want empty lines:
String.split("[\\r\\n]+")
String.split(System.lineSeparator());
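This splits on the current platform's line separator, so it only works reliably when the text uses the same line-ending convention as the platform the code runs on.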
A new method, lines(), was introduced to the String class in Java 11; it returns a Stream<String>:
Returns a stream of substrings extracted from this string partitioned
by line terminators.
Line terminators recognized are line feed "\n" (U+000A), carriage
return "\r" (U+000D) and a carriage return followed immediately by a
line feed "\r\n" (U+000D U+000A).
Here are a few examples:
jshell> "lorem \n ipusm \n sit".lines().forEach(System.out::println)
lorem
ipusm
sit
jshell> "lorem \n ipusm \r sit".lines().forEach(System.out::println)
lorem
ipusm
sit
jshell> "lorem \n ipusm \r\n sit".lines().forEach(System.out::println)
lorem
ipusm
sit
String#lines()
In JDK11 the String class has a lines() method:
Returning a stream of lines extracted from this string, separated by
line terminators.
Further, the documentation goes on to say:
A line terminator is one of the following: a line feed character "\n"
(U+000A), a carriage return character "\r" (U+000D), or a carriage
return followed immediately by a line feed "\r\n" (U+000D U+000A). A
line is either a sequence of zero or more characters followed by a
line terminator, or it is a sequence of one or more characters
followed by the end of the string. A line does not include the line
terminator.
With this one can simply do:
Stream<String> stream = str.lines();
then if you want an array:
String[] array = str.lines().toArray(String[]::new);
Given that this method returns a Stream, it opens up a lot of options, as it enables one to write concise and declarative expressions of possibly-parallel operations.
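As a small illustration of what a Stream of lines makes possible, the sketch below strips each line and drops blank ones in a single pipeline. The class and method names are hypothetical, and it assumes Java 11 or later (both String#lines and String#strip were added in that release).

```java
import java.util.List;
import java.util.stream.Collectors;

public class StringLinesDemo {

    // Returns the non-blank lines of the text, with surrounding whitespace removed.
    static List<String> nonBlankLines(String text) {
        return text.lines()                          // splits on \n, \r and \r\n
                .map(String::strip)                  // removes leading/trailing whitespace
                .filter(line -> !line.isEmpty())     // drops lines that are now empty
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(nonBlankLines("lorem \n\n ipsum \r\n sit\r")); // [lorem, ipsum, sit]
    }
}
```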
You don't have to double escape characters in character groups.
For all non empty lines use:
String.split("[\r\n]+")
All answers given here actually do not respect Java's definition of new lines as given in, e.g., BufferedReader#readLine. Java accepts \n, \r and \r\n as a new line. Some of the answers match multiple empty lines or malformed files. E.g., <sometext>\n\r\n<someothertext> would result in two lines when using [\r\n]+.
String lines[] = string.split("(\r\n|\r|\n)", -1);
In contrast, the answer above has the following properties:
it complies with Java's definition of a new line, as used by e.g. BufferedReader
it does not match multiple new lines
it does not remove trailing empty lines
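The difference between the two approaches can be seen on the mixed-terminator input mentioned in this answer. The following is only a sketch with made-up class and method names:

```java
import java.util.Arrays;

public class NewlineDefinitionDemo {

    // Collapses any run of \r and \n characters into a single delimiter.
    static String[] splitCollapsing(String text) {
        return text.split("[\r\n]+");
    }

    // Splits on exactly one line terminator (\r\n, \r or \n) and keeps empty lines.
    static String[] splitPerTerminator(String text) {
        return text.split("(\r\n|\r|\n)", -1);
    }

    public static void main(String[] args) {
        String text = "first\n\r\nsecond"; // a \n followed by a \r\n: one empty line in between
        System.out.println(Arrays.toString(splitCollapsing(text)));    // [first, second]
        System.out.println(Arrays.toString(splitPerTerminator(text))); // [first, , second]
    }
}
```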
If, for some reason, you don't want to use String.split (for example, because of regular expressions) and you want to use functional programming on Java 8 or newer:
List<String> lines = new BufferedReader(new StringReader(string))
.lines()
.collect(Collectors.toList());
Maybe this would work:
Remove the double backslashes from the parameter of the split method:
split = docStr.split("\n");
For preserving empty lines from getting squashed use:
String lines[] = string.split("\\r?\\n", -1);
The above answers did not help me on Android; Pshemo's response is what worked for me there. I will leave part of Pshemo's answer here:
split("\\\\n")
The above code doesn't actually do anything visible: it just calculates the split and then discards the result. Is it the code you used, or just an example for this question?
try doing textAreaDoc.insertString(int, String, AttributeSet) at the end?
There is a new kid in town, so you need not deal with all of the above complexities.
From JDK 11 onward, you just need to write a single line of code; it splits the lines and returns a Stream of String.
import java.util.stream.Stream;

public class MyClass {
    public static void main(String args[]) {
        Stream<String> lines = "foo \n bar \n baz".lines();
        // Do whatever you want to do with lines
    }
}
Some references.
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#lines()
https://www.azul.com/90-new-features-and-apis-in-jdk-11/
I hope this will be helpful to someone. Happy coding.
Sadly, Java lacks a method for splitting a string by a fixed string that is both simple and efficient. Both String::split and the stream API are complex and relatively slow. Also, they can produce different results.
String::split examines its input, then compiles it to a java.util.regex.Pattern every time (except if the input contains only a single char that's safe).
However, Pattern is very fast once it has been compiled. So the best solution is to precompile the pattern:
private static final Pattern LINE_SEP_PATTERN = Pattern.compile("\\R");
Then use it like this:
String[] lines = LINE_SEP_PATTERN.split(input);
From Java 8, \R matches any line break specified by Unicode. Prior to Java 8 you could use something like this:
Pattern.compile(Pattern.quote(System.lineSeparator()))
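Put together, the precompiled-pattern approach might look like the sketch below. The class name and the second helper method are assumptions added for illustration; only LINE_SEP_PATTERN comes from the answer itself.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PrecompiledLineSplitter {

    // Compiled once and reused; Pattern instances are immutable and safe to share.
    private static final Pattern LINE_SEP_PATTERN = Pattern.compile("\\R");

    // Splits on any Unicode line break, dropping trailing empty strings.
    static String[] splitLines(String input) {
        return LINE_SEP_PATTERN.split(input);
    }

    // Same split, but the negative limit keeps trailing empty strings.
    static String[] splitLinesKeepEmpty(String input) {
        return LINE_SEP_PATTERN.split(input, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLines("one\r\ntwo\nthree\n")));          // [one, two, three]
        System.out.println(Arrays.toString(splitLinesKeepEmpty("one\r\ntwo\nthree\n"))); // [one, two, three, ]
    }
}
```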
String[] lines = string.split(System.lineSeparator());
After failed attempts with all of the given solutions, I replaced \n with a special word and then split on that word. For me, the following did the trick:
article = "Alice phoned\n bob.";
article = article.replace("\\n", " NEWLINE "); // "\\n" targets a literal backslash followed by 'n'; use "\n" for a real newline character
String sen [] = article.split(" NEWLINE ");
I couldn't replicate the example given in the question. But, I guess this logic can be applied.
As an alternative to the previous answers, Guava's Splitter API can be used if other operations are to be applied to the resulting lines, like trimming lines or filtering empty lines:
import com.google.common.base.Splitter;
Iterable<String> split = Splitter.onPattern("\r?\n").trimResults().omitEmptyStrings().split(docStr);
Note that the result is an Iterable and not an array.
There are three different conventions (it could be said that those are de facto standards) to set and display a line break:
carriage return + line feed
line feed
carriage return
In some text editors, it is possible to exchange one for the other.
The simplest thing is to normalize to line feed and then split.
final String[] lines = contents.replace("\r\n", "\n")
.replace("\r", "\n")
.split("\n", -1);
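Wrapped in a method, the normalize-then-split idea can be tried out as follows. This is a sketch with a hypothetical class and method name. The order of the two replace calls matters: \r\n has to be handled first, otherwise each Windows line break would turn into two line feeds.

```java
import java.util.Arrays;

public class NormalizeThenSplit {

    // Converts \r\n and lone \r to \n, then splits on \n, keeping empty lines.
    static String[] splitNormalized(String contents) {
        return contents.replace("\r\n", "\n") // Windows -> line feed (must come first)
                .replace("\r", "\n")          // lone carriage return -> line feed
                .split("\n", -1);             // negative limit keeps trailing empty strings
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitNormalized("one\r\ntwo\rthree\n"))); // [one, two, three, ]
    }
}
```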
Try this; I hope it is helpful:
String split[], docStr = null;
Document textAreaDoc = (Document)e.getDocument();
try {
docStr = textAreaDoc.getText(textAreaDoc.getStartPosition().getOffset(), textAreaDoc.getEndPosition().getOffset());
} catch (BadLocationException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
split = docStr.split("\n");
package in.javadomain;
public class JavaSplit {
public static void main(String[] args) {
String input = "chennai\nvellore\ncoimbatore\nbangalore\narcot";
System.out.println("Before split:\n");
System.out.println(input);
String[] inputSplitNewLine = input.split("\\n");
System.out.println("\n After split:\n");
for(int i=0; i<inputSplitNewLine.length; i++){
System.out.println(inputSplitNewLine[i]);
}
}
}
I have a string that is read in pairs, separated by a comma. However, I do not always want to split at the comma, because there is not always exactly one comma in the input. For example, the string
(http://www.wolframalpha.com/input/?i=103%2F30+%3D+4a-3b,+71%2F60+%3D+a+%2B+b
,http://www.wolframalpha.com/input/?i=x%5E2%2B5x%2B6,file:///tmp/foo/bar/p,d,f.pdf)
is read in as one line. In this case, I only want to split at the ,h and nowhere else in the string. Essentially, after the split, the strings should be:
http://www.wolframalpha.com/input/?i=103%2F30+%3D+4a-3b,+71%2F60+%3D+a+%2B+b
http://www.wolframalpha.com/input/?i=x%5E2%2B5x%2B6
file:///tmp/foo/bar/p,d,f.pdf
Maintaining the order of the comma in the first string. (I will get rid of parenthesis). I have looked at this stack overflow question, and while helpful, does not correctly split this string. This is in Java. Any help is appreciated.
You can use regex to do the split. Please see below code snippet.
String str = "(http://www.wolframalpha.com/input/?i=103%2F30+%3D+4a-3b,+71%2F60+%3D+a+%2B+b,http://www.wolframalpha.com/input/?i=x%5E2%2B5x%2B6)";
String[] strArr = str.split("(,(?=http))");
This gives you an array in which each URL is a separate element, as required.
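The question's input also contains a file: URL, which a lookahead for http alone would not split off. One possible extension, shown here only as a sketch, is to look ahead for a short list of schemes. The scheme list, the class name, the method name and the example.com URLs are all assumptions made for illustration.

```java
import java.util.Arrays;

public class UrlListSplitDemo {

    // Splits at commas that are immediately followed by a known URL scheme.
    static String[] splitUrls(String raw) {
        String inner = raw.trim();
        // Drop the surrounding parentheses, if present.
        if (inner.startsWith("(") && inner.endsWith(")")) {
            inner = inner.substring(1, inner.length() - 1);
        }
        // The lookahead (?=...) checks what follows the comma without consuming it.
        return inner.split(",(?=(?:https?|file):)");
    }

    public static void main(String[] args) {
        String raw = "(http://example.com/?i=a,+b,http://example.com/?i=c,file:///tmp/foo/bar/p,d,f.pdf)";
        for (String url : splitUrls(raw)) {
            System.out.println(url);
        }
    }
}
```

Commas inside a URL are left alone because they are not followed by one of the listed schemes and a colon.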
Split on 'http' then re-add it.
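In Java (assuming java.util.ArrayList and java.util.List are imported):
String input = "http://www.wolframalpha.com/input/?i=103%2F30+%3D+4a-3b,+71%2F60+%3D+a+%2B+b"
        + ",http://www.wolframalpha.com/input/?i=x%5E2%2B5x%2B6";
String[] split = input.split("http");
List<String> finalList = new ArrayList<String>();
for (String fixup : split) {
    if (fixup.isEmpty()) {
        continue; // split() yields an empty string before the first "http"
    }
    if (fixup.endsWith(",")) {
        fixup = fixup.substring(0, fixup.length() - 1); // drop the separating comma
    }
    finalList.add("http" + fixup);
}
finalList should then contain the two URLs.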
{
ArrayList<String> node_array = new ArrayList<String>();
String allValues[] = node.split("[(,)]");
for(String value : allValues){
node_array.add(value);
}
node is a string, for example: (3,4,5,6,3)
For some reason, when I verify the content of the ArrayList, the split seems to leave blank elements behind, specifically where ( and ) are supposed to be. What am I doing wrong?
You're asking split() to split at parentheses and commas. In your string, there is a blank substring right before the first separator, the opening parenthesis. split() is keeping that blank substring and returning it at the zeroth element of the resulting array.
There are plenty of examples in the documentation that illustrate how the function works.
To work around this, you can either ignore the empty strings, or flip the regex on its head and match the numbers instead of splitting at the punctuation characters.
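The second workaround, matching the numbers instead of splitting at the punctuation, could look roughly like this. The class and method names are made up, and the sketch assumes the values are non-negative integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchNumbersDemo {

    // One or more digits; everything else (parentheses, commas) is simply skipped.
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    static List<String> extractNumbers(String node) {
        List<String> result = new ArrayList<>();
        Matcher matcher = NUMBER.matcher(node);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extractNumbers("(3,4,5,6,3)")); // [3, 4, 5, 6, 3]
    }
}
```

Because nothing is split, no empty string can appear in the result.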
You have defined a separator to be one of the characters that is the first character in your String, so an empty string "" will show up in your ArrayList, because that is what occurs before the first separator. However, for your application you can easily fix it like this:
ArrayList<String> node_array = new ArrayList<String>();
String allValues[] = node.split("[(,)]");
for(String value : allValues){
if(!value.equals("")) node_array.add(value);
}
return node_array;
node.replace("(","").replace(")","").split(",");
or
node.substring(1,node.length()-1).split(",");
I am trying to break apart a very simple collection of strings that come in the forms of
0|0
10|15
30|55
etc. Essentially, numbers that are separated by pipes.
When I use Java's String split function with .split("|"), I get somewhat unpredictable results: white space in the first slot, and sometimes the number itself isn't where I thought it should be.
Can anybody please help and give me advice on how I can use a reg exp to keep ONLY the integers?
I was asked to give the code trying to do the actual split. So allow me to do that in hopes to clarify further my problem :)
String temp = "0|0";
String[] splitString = temp.split("|");
results
\n
0
|
0
I am trying to get
0
0
only. Forever grateful for any help ahead of time :)
I still suggest using split(); it drops trailing empty tokens by default. If you want to get rid of the non-numeric characters in the string and keep only pipes and numbers, you can then easily use split() to get what you want. Or you can pass multiple delimiters to split() (in the form of a regex), and this should work:
String[] splited = yourString.split("[\\|\\s]+");
Or, with a regex that matches the numbers directly:
import java.util.regex.*;
Pattern pattern = Pattern.compile("\\d+"); // a lookahead requiring a delimiter after the digits would miss the last number
Matcher matcher = pattern.matcher(yourString);
while (matcher.find()) {
System.out.println(matcher.group());
}
The pipe symbol is special in a regexp (it marks alternatives), you need to escape it. Depending on the java version you are using this could well explain your unpredictable results.
class t {
public static void main(String[] args)
{
String temp = "0|0";
String[] splitString = temp.split("\\|");
for (int i=0; i<splitString.length; i++)
System.out.println("splitString["+i+"] is " + splitString[i]);
}
}
outputs
splitString[0] is 0
splitString[1] is 0
Note that one backslash is the regexp escape character, but because a backslash is also the escape character in java source you need two of them to push the backslash into the regexp.
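If the double backslash feels error-prone, Pattern.quote builds the escaped form for you. The following sketch (hypothetical class and method names) uses it to parse one pair of pipe-separated integers:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PipePairDemo {

    // Splits on a literal '|' and converts each part to an int.
    static int[] parsePair(String pair) {
        // Pattern.quote("|") makes the pipe a literal; "\\|" would be equivalent here.
        String[] parts = pair.split(Pattern.quote("|"));
        int[] numbers = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i].trim());
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parsePair("10|15"))); // [10, 15]
    }
}
```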
You can replace the white space with pipes and then split:
String test = "0|0 10|15 30|55";
test = test.replace(" ", "|");
String[] result = test.split("\\|");
Hope this helps.
You can use StringTokenizer.
String test = "0|0";
StringTokenizer st = new StringTokenizer(test, "|"); // use '|' as the delimiter (the default is whitespace)
int firstNumber = Integer.parseInt(st.nextToken()); //will parse out the first number
int secondNumber = Integer.parseInt(st.nextToken()); //will parse out the second number
Of course you can always nest this inside of a while loop if you have multiple strings.
Also, you need to import java.util.* for this to work.
The pipe ('|') is a special character in regular expressions. It needs to be "escaped" with a '\' character if you want to use it as a regular character. Unfortunately, '\' is also a special character in Java string literals, so you need a kind of double escape, e.g.
String temp = "0|0";
String[] splitStrings = temp.split("\\|");
The Guava library has a nice class, Splitter, which is a much more convenient alternative to String.split(). The advantages are that you can choose to split the string on specific characters (like '|'), on specific strings, or with regexps, and you can choose what to do with the resulting parts (trim them, throw away empty parts, etc.).
For example you can call
Iterable<String> parts = Splitter.on('|').trimResults().omitEmptyStrings().split("0|0");
This should work for you:
([0-9]+)
Consider a scenario where we have read a line from a CSV or XLS file as a string and need to separate the columns into an array of strings, depending on the delimiters.
The code snippet below does this:
{ ...
....
String line = new BufferedReader(new FileReader("your file")).readLine();
String[] splittedString = StringSplitToArray(line, "\"");
...
....
}
public static String[] StringSplitToArray(String stringToSplit, String delimiter)
{
StringBuffer token = new StringBuffer();
Vector tokens = new Vector();
char[] chars = stringToSplit.toCharArray();
for (int i = 0; i < chars.length; i++) {
if (delimiter.indexOf(chars[i]) != -1) {
if (token.length() > 0) {
tokens.addElement(token.toString());
token.setLength(0);
i++;
}
} else {
token.append(chars[i]);
}
}
if (token.length() > 0) {
tokens.addElement(token.toString());
}
// convert the vector into an array
String[] preparedArray = new String[tokens.size()];
for (int i=0; i < preparedArray.length; i++) {
preparedArray[i] = (String)tokens.elementAt(i);
}
return preparedArray;
}
The code snippet above contains a call to the StringSplitToArray method, which converts the line into a string array by splitting it on the delimiter passed to the method. The delimiter can be a comma (,) or a double quote (").
For more on this, follow this link : http://scrapillars.blogspot.in
How can I replace all line breaks in a string in Java in a way that works on both Windows and Linux (i.e. no OS-specific problems with carriage return/line feed/new line etc.)?
I've tried (note readFileAsString is a function that reads a text file into a String):
String text = readFileAsString("textfile.txt");
text.replace("\n", "");
but this doesn't seem to work.
How can this be done?
You need to set text to the results of text.replace():
String text = readFileAsString("textfile.txt");
text = text.replace("\n", "").replace("\r", "");
This is necessary because Strings are immutable -- calling replace doesn't change the original String, it returns a new one that's been changed. If you don't assign the result to text, then that new String is lost and garbage collected.
As for getting the newline String for any environment -- that is available by calling System.getProperty("line.separator").
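The effect of immutability is easy to observe directly. The sketch below (class and method names are made up) first calls replace without using the result, then assigns it:

```java
public class ImmutableReplaceDemo {

    // Returns a copy of the text with all \n and \r characters removed.
    static String stripLineBreaks(String text) {
        return text.replace("\n", "").replace("\r", "");
    }

    public static void main(String[] args) {
        String text = "one\r\ntwo\n";

        text.replace("\n", "");                  // result is discarded; text is unchanged
        System.out.println(text.contains("\n")); // true

        text = stripLineBreaks(text);            // keep the returned String this time
        System.out.println(text);                // onetwo
    }
}
```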
As noted in other answers, your code is not working primarily because String.replace(...) does not change the target String. (It can't - Java strings are immutable!) What replace actually does is to create and return a new String object with the characters changed as required. But your code then throws away that String ...
Here are some possible solutions. Which one is most correct depends on what exactly you are trying to do.
// #1
text = text.replace("\n", "");
Simply removes all the newline characters. This does not cope with Windows or Mac line terminations.
// #2
text = text.replace(System.getProperty("line.separator"), "");
Removes all line terminators for the current platform. This does not cope with the case where you are trying to process (for example) a UNIX file on Windows, or vice versa.
// #3
text = text.replaceAll("\\r|\\n", "");
Removes all Windows, UNIX or Mac line terminators. However, if the input file is text, this will concatenate words; e.g.
Goodbye cruel
world.
becomes
Goodbye cruelworld.
So you might actually want to do this:
// #4
text = text.replaceAll("\\r\\n|\\r|\\n", " ");
which replaces each line terminator with a space (see note 1 below). Since Java 8 you can also do this:
// #5
text = text.replaceAll("\\R", " ");
And if you want to replace multiple line terminators with one space:
// #6
text = text.replaceAll("\\R+", " ");
1 - Note there is a subtle difference between #3 and #4. The sequence \r\n represents a single (Windows) line terminator, so we need to be careful not to replace it with two spaces.
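The subtlety in the note becomes visible once a space is used as the replacement in both patterns. This is a sketch with hypothetical class and method names; perCharacter uses the pattern from #3 but with " " as the replacement, purely to show the double space.

```java
public class LineTerminatorToSpaceDemo {

    // Pattern from #3: \r and \n are replaced independently.
    static String perCharacter(String text) {
        return text.replaceAll("\\r|\\n", " ");
    }

    // Pattern from #4: \r\n is tried first and replaced as a single terminator.
    static String perTerminator(String text) {
        return text.replaceAll("\\r\\n|\\r|\\n", " ");
    }

    // Pattern from #6 (Java 8+): a run of line breaks becomes one space.
    static String collapseRuns(String text) {
        return text.replaceAll("\\R+", " ");
    }

    public static void main(String[] args) {
        String text = "Goodbye cruel\r\nworld.";
        System.out.println("[" + perCharacter(text) + "]");  // [Goodbye cruel  world.] (two spaces)
        System.out.println("[" + perTerminator(text) + "]"); // [Goodbye cruel world.]
        System.out.println("[" + collapseRuns("a\n\n\nb") + "]"); // [a b]
    }
}
```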
This function normalizes down all whitespace, including line breaks, to single spaces. Not exactly what the original question asked for, but likely to do exactly what is needed in many cases:
import org.apache.commons.lang3.StringUtils;
final String cleansedString = StringUtils.normalizeSpace(rawString);
If you want to remove only line terminators that are valid on the current OS, you could do this:
text = text.replaceAll(System.getProperty("line.separator"), "");
If you want to make sure you remove any line separators, you can do it like this:
text = text.replaceAll("\\r|\\n", "");
Or, slightly more verbose, but less regexy:
text = text.replaceAll("\\r", "").replaceAll("\\n", "");
str = str.replaceAll("\\r\\n|\\r|\\n", " ");
Worked perfectly for me after searching a lot, having failed with every other line.
This should be efficient, I guess:
String s;
s = "try this\n try me.";
s = s.replaceAll("[\\r\\n]+", "");
Line breaks are not the same under Windows/Linux/Mac. You should use System.getProperty with the key line.separator.
String text = readFileAsString("textfile.txt").replaceAll("\n", "");
Even though the definition of trim() in oracle website is
"Returns a copy of the string, with leading and trailing whitespace omitted."
the documentation omits to say that new line characters (leading and trailing) will also be removed.
In short
String text = readFileAsString("textfile.txt").trim(); will also work for you.
(Checked with Java 6)
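A quick check of what trim() does and does not remove may be useful here. In this sketch (hypothetical class and method names), line breaks at the ends disappear, while the one in the middle of the text is kept:

```java
public class TrimNewlinesDemo {

    // trim() strips leading and trailing characters up to U+0020, which includes \r and \n.
    static String trimEnds(String text) {
        return text.trim();
    }

    public static void main(String[] args) {
        String trimmed = trimEnds("\r\nfirst line\nsecond line\n\n");
        System.out.println(trimmed.startsWith("first")); // true
        System.out.println(trimmed.endsWith("line"));    // true
        System.out.println(trimmed.contains("\n"));      // true: the interior line break remains
    }
}
```

So trim() is only sufficient when the unwanted line breaks sit at the start or end of the text.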
In Kotlin, and also since Java 11, String has a lines() method, which returns the lines of a multi-line string (as a List in Kotlin, as a Stream in Java).
You can get all the lines and then merge them into a single string.
With Kotlin it will be as simple as
str.lines().joinToString("")
String text = readFileAsString("textfile.txt").replace("\n","");
.replace returns a new string, strings in Java are Immutable.
You may want to read your file with a BufferedReader. This class can break input into individual lines, which you can assemble at will. The way BufferedReader operates recognizes line ending conventions of the Linux, Windows and MacOS worlds automatically, regardless of the current platform.
Hence:
BufferedReader br = new BufferedReader(
new FileReader("textfile.txt"));
StringBuilder sb = new StringBuilder();
for (;;) {
String line = br.readLine();
if (line == null)
break;
sb.append(line);
sb.append(' '); // SEE BELOW
}
String text = sb.toString();
Note that readLine() does not include the line terminator in the returned string. The code above appends a space to avoid gluing together the last word of a line and the first word of the next line.
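The same readLine() loop can be tried without a file by reading from a String. The sketch below is an assumption-laden variant (hypothetical class and method names, StringReader instead of a file, and the space is inserted between lines rather than after each one, so there is no trailing space):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class JoinLinesDemo {

    // Joins all lines of the text with single spaces, whatever line terminators it uses.
    static String joinLines(String raw) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new StringReader(raw))) {
            String line;
            while ((line = br.readLine()) != null) { // readLine() strips the terminator
                if (sb.length() > 0) {
                    sb.append(' ');                  // separator between lines only
                }
                sb.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinLines("Goodbye cruel\r\nworld.\n")); // Goodbye cruel world.
    }
}
```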
I find it odd that (Apache) StringUtils wasn't covered here yet.
You can remove all newlines (or any other occurrences of a substring, for that matter) from a string using the .replace method:
StringUtils.replace(myString, "\n", "");
This line will replace all newlines with the empty string.
Because a newline is technically a character, you can optionally use the .replaceChars method, which replaces characters (or deletes them, when given an empty replacement):
StringUtils.replaceChars(myString, "\n", "");
FYI, if you want to replace multiple consecutive line breaks with a single line break, you can use
myString.trim().replaceAll("[\n]{2,}", "\n")
Or replace with a single space
myString.trim().replaceAll("[\n]{2,}", " ")
You can use Apache Commons IO's IOUtils to iterate through the lines and append each line to a StringBuilder. And don't forget to close the InputStream:
StringBuilder sb = new StringBuilder();
FileInputStream fin=new FileInputStream("textfile.txt");
LineIterator lt=IOUtils.lineIterator(fin, "utf-8");
while(lt.hasNext())
{
sb.append(lt.nextLine());
}
String text = sb.toString();
IOUtils.closeQuietly(fin);
You can use a generic method to replace any char with any other char.
public static String removeWithAnyChar(String str, char replceChar,
char replaceWith) {
char chrs[] = str.toCharArray();
int i = 0;
while (i < chrs.length) {
if (chrs[i] == replceChar) {
chrs[i] = replaceWith;
}
i++;
}
return new String(chrs); // Strings are immutable, so return the modified copy
}
org.apache.commons.lang.StringUtils#chopNewline
Try doing this:
textValue= textValue.replaceAll("\n", "");
textValue= textValue.replaceAll("\t", "");
textValue= textValue.replaceAll("\\n", "");
textValue= textValue.replaceAll("\\t", "");
textValue= textValue.replaceAll("\r", "");
textValue= textValue.replaceAll("\\r", "");
textValue= textValue.replaceAll("\r\n", "");
textValue= textValue.replaceAll("\\r\\n", "");