Split string containing newline characters Java - java

Say I have the following string str:
GTM =0.2
Test =100
[DLM]
ABCDEF =5
(yes, it contains newline characters) I am trying to split it on the delimiter substring [DLM] like this:
String[] strArr = str.split("[DLM]");
Why is it that when I do:
System.out.print(strArr[0]);
I get this output: GT
and when I do
System.out.print(strArr[1]);
I get =0.2
Does this make any sense at all?

str.split("[DLM]"); should be str.split("\\[DLM\\]");
Why?
[ and ] are special characters in a regex (they delimit a character class), and String#split takes a regex.
A solution that I like more is using Pattern#quote:
str.split(Pattern.quote("[DLM]"));
quote returns a literal pattern String for the given String, so every character in it is matched literally.
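Here is a minimal sketch of the Pattern.quote approach applied to the string from the question. The class name QuoteSplitDemo and the helper splitLiteral are made up for illustration; only String#split and Pattern#quote come from the answer.

```java
import java.util.regex.Pattern;

class QuoteSplitDemo {
    // Hypothetical helper: splits on the literal text of the delimiter,
    // with no regex interpretation of its characters.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String str = "GTM =0.2\nTest =100\n[DLM]\nABCDEF =5";
        String[] parts = splitLiteral(str, "[DLM]");
        // trim() removes the newlines left next to the delimiter.
        for (String part : parts) {
            System.out.println("<" + part.trim() + ">");
        }
    }
}
```

With this input the array should have two elements: everything before [DLM] and everything after it.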

Yes, you're giving a regex which says "split with either D, or L, or M".
You should escape the brackets like this: str.split("\\[DLM\\]");
It's being split at the first M.

Escape the brackets
("\\[DLM\\]")
When you use unescaped brackets inside the " ", the regex engine reads them as a character class: each character inside the brackets is a delimiter. So in your case, M was a delimiter.
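To see the character-class behaviour in action, here is a small sketch that compares the two patterns on the string from the question. The class name and the countPieces helper are made up for the example.

```java
class CharClassSplitDemo {
    // Hypothetical helper: number of pieces produced by splitting on the regex.
    static int countPieces(String input, String regex) {
        return input.split(regex).length;
    }

    public static void main(String[] args) {
        String str = "GTM =0.2\nTest =100\n[DLM]\nABCDEF =5";

        // Unescaped: [DLM] is a character class, so D, L and M each act as a delimiter.
        String[] wrong = str.split("[DLM]");
        System.out.println(wrong.length + " pieces, first: " + wrong[0]);
        // prints "6 pieces, first: GT"

        // Escaped: the brackets are literal, so only the text "[DLM]" is a delimiter.
        String[] right = str.split("\\[DLM\\]");
        System.out.println(right.length + " pieces, last: " + right[1].trim());
        // prints "2 pieces, last: ABCDEF =5"
    }
}
```

The extra pieces in the first case come from the M in GTM, the three letters inside [DLM], and the D in ABCDEF.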

Use
String[] strArr = str.split("\\[DLM\\]");
instead of
String[] strArr = str.split("[DLM]");
Otherwise it will split on either D, or L, or M.

Related

Regex to add square brackets into beginning/end sentence and also each word separate by comma?

What kind of regex do I need to use to add square brackets into beginning/end sentence and also each word separate by comma?
I have a sentence like this:
qqqqqqq\
asadsds\
dsdadad\
sadadad\
asdsada\
dsadadd";
I am expecting to get a result like this:
[qqqqqqq, asadsds, dsdadad, sadadad, asdsada, dsadadd]
I have tried many things, such as:
String regex1 = "(^[a-zA-z_0-9])(\\s)([a-zA-z_0-9]$)";
It seems that you need to split the input string into "words", treating every run of non-word (or non-letter) characters as a delimiter: String.split("\\W+") or String.split("[^a-zA-Z]+").
Then re-join the cleaned words with commas and surround them with brackets; Arrays.toString does both.
This can be implemented simply:
String input = "qqqqqqq\\\nasadsds\\\ndsdadad\\\nsadadad\\\nasdsada\\\ndsadadd\";";
System.out.println(input);
System.out.println("------------");
String result = Arrays.toString(input.split("\\W+"));
System.out.println(result);
Output:
qqqqqqq\
asadsds\
dsdadad\
sadadad\
asdsada\
dsadadd";
------------
[qqqqqqq, asadsds, dsdadad, sadadad, asdsada, dsadadd]
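One thing to watch, as far as I know: if the input starts with a delimiter, split returns an empty first element and Arrays.toString prints it as a blank entry. A possible variant, sketched here with made-up names (BracketedWordsDemo, toBracketedList) and a made-up input, filters out empty tokens and builds the brackets with Collectors.joining instead:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class BracketedWordsDemo {
    // Splits on runs of non-word characters, drops empty tokens,
    // and joins the rest as "[a, b, c]".
    static String toBracketedList(String input) {
        return Arrays.stream(input.split("\\W+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        // Hypothetical input that starts with a delimiter (backslash + newline).
        System.out.println(toBracketedList("\\\nqqqqqqq\\\nasadsds\";"));
        // prints "[qqqqqqq, asadsds]"
    }
}
```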

Extracting numbers into a string array

I have a string which is of the form
String str = "124333 is the otp of candidate number 9912111242. Please refer txn id 12323335465645 while referring blah blah.";
I need 124333, 9912111242 and 12323335465645 in a string array. I have tried this with
while (Character.isDigit(sms.charAt(i)))
I feel that running the above said method on every character is inefficient. Is there a way I can get a string array of all the numbers?
Use a regex (see Pattern and Matcher):
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(str); // str is the input string
while (m.find()) {
    // m.group() holds the current run of digits
}
You can easily build an ArrayList that contains each match you find.
Or, as others suggested, you can split on non-digit characters (\D):
"blabla 123 blabla 345".split("\\D+")
Note that \ has to be escaped in a Java string literal, hence the need for \\.
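The Matcher loop from this answer could be completed roughly like this. It is only a sketch; the class name DigitRunsDemo and the method extractNumbers are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitRunsDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Collects every maximal run of digits, in order of appearance.
    static String[] extractNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String str = "124333 is the otp of candidate number 9912111242. "
                + "Please refer txn id 12323335465645 while referring blah blah.";
        System.out.println(Arrays.toString(extractNumbers(str)));
        // prints "[124333, 9912111242, 12323335465645]"
    }
}
```

Unlike split, this never produces empty strings, because only the matches themselves are collected.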
You can use String.split():
String[] nbs = str.split("[^0-9]+");
This will split the String on any run of non-digit characters.
And this works perfectly for your input.
String str = "124333 is the otp of candidate number 9912111242. Please refer txn id 12323335465645 while referring blah blah.";
System.out.println(Arrays.toString(str.split("\\D+")));
Output:
[124333, 9912111242, 12323335465645]
\\D+ Matches one or more non-digit characters. Splitting the input according to one or more non-digit characters will give you the desired output.
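A caveat worth knowing: this works cleanly here because the input starts with a digit. If the text starts with non-digits, split returns an empty first element. Below is a small sketch with a made-up input and made-up names (NonDigitSplitDemo, digitRuns) that shows the effect and one way to filter it out:

```java
import java.util.Arrays;

class NonDigitSplitDemo {
    // Splits on runs of non-digits and drops any empty tokens.
    static String[] digitRuns(String text) {
        return Arrays.stream(text.split("\\D+"))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String text = "txn id 42 and 7"; // hypothetical input starting with letters
        System.out.println(Arrays.toString(text.split("\\D+"))); // prints "[, 42, 7]"
        System.out.println(Arrays.toString(digitRuns(text)));    // prints "[42, 7]"
    }
}
```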
Java 8 style:
long[] numbers = Pattern.compile("\\D+")
.splitAsStream(str)
.mapToLong(Long::parseLong)
.toArray();
Ah, if you only need a String array, then you can just use String.split as the other answers suggest.
Alternatively, you can try this:
String str = "124333 is the otp of candidate number 9912111242. Please refer txn id 12323335465645 while referring blah blah.";
str = str.replaceAll("\\D+", ",");
System.out.println(Arrays.asList(str.split(",")));
\\D+ matches one or more non digits
Output
[124333, 9912111242, 12323335465645]
The first thing that came to my mind was filter and split; then I realized that it can be done with
String[] result = str.split("\\D+");
\D matches any non-digit character, + says that one or more of these are needed, and the leading \ escapes the other \ in the Java string literal, since "\D" on its own is an invalid escape sequence and would not compile.

How can I split a string except when the delimiter is protected by quotes or brackets?

I asked How to split a string with conditions. Now I know how to ignore the delimiter if it is between two characters.
How can I check multiple groups of two characters instead of one?
I found Regex for splitting a string using space when not surrounded by single or double quotes, but I don't understand where to change '' to []. Also, it works with two groups only.
Is there a regex that will split using , but ignore the delimiter if it is between "" or [] or {}?
For instance:
// Input
"text1":"text2","text3":"text,4","text,5":["text6","text,7"],"text8":"text9","text10":{"text11":"text,12","text13":"text14","text,15":["text,16","text17"],"text,18":"text19"}
// Output
"text1":"text2"
"text3":"text,4"
"text,5":["text6","text,7"]
"text8":"text9"
"text10":{"text11":"text,12","text13":"text14","text,15":["text,16","text17"],"text,18":"text19"}
You can use:
text = "\"text1\":\"text2\",\"text3\":\"text,4\",\"text,5\":[\"text6\",\"text,7\"],\"text8\":\"text9\",\"text10\":{\"text11\":\"text,12\",\"text13\":\"text14\",\"text,15\":[\"text,16\",\"text17\"],\"text,18\":\"text19\"}";
String[] toks = text.split("(?=(?:(?:[^\"]*\"){2})*[^\"]*$)(?![^{]*})(?![^\\[]*\\]),+");
for (String tok: toks)
System.out.printf("%s%n", tok);
OUTPUT:
"text1":"text2"
"text3":"text,4"
"text,5":["text6","text,7"]
"text8":"text9"
"text10":{"text11":"text,12","text13":"text14","text,15":["text,16","text17"],"text,18":"text19"}

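If the regex gets hard to maintain, an alternative is a small hand-written scanner instead of split. This is a different technique from the answer above, sketched under the assumptions that the input is well formed (balanced quotes and brackets) and that a backslash inside quotes escapes the next character. The names TopLevelSplitDemo and splitTopLevel are made up.

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelSplitDemo {
    // Splits on commas that are outside double quotes and outside [] or {}.
    static List<String> splitTopLevel(String text) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // nesting level of [] and {}
        boolean inQuotes = false; // true while inside a "..." string
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '\\' && i + 1 < text.length()) {
                    current.append(c);
                    c = text.charAt(++i); // keep the escaped character verbatim
                } else if (c == '"') {
                    inQuotes = false;
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == '[' || c == '{') {
                depth++;
            } else if (c == ']' || c == '}') {
                depth--;
            } else if (c == ',' && depth == 0) {
                parts.add(current.toString()); // top-level comma ends the token
                current.setLength(0);
                continue;
            }
            current.append(c);
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        String text = "\"a\":\"b,c\",\"d\":[\"e\",\"f,g\"],\"h\":{\"i\":\"j,k\"}";
        splitTopLevel(text).forEach(System.out::println);
    }
}
```

Because it counts nesting depth, this also copes with brackets nested inside braces. If the input is actually JSON, a real JSON parser is probably the better tool.
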
Java string split giving unexpected result

I have this string
String x = "2013-04-17T08:00:00.001,41.14806,-9.58972,-13.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-22.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-31.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-40.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-49.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-58.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-64.0,0.0,0.0,-20.0,4";
If I split it like this: String vec2[] = x.split(","); the output is:
2013-04-17T08:00:00.001
41.14806
-9.58972
-13.0
0.0
0.0
-20.0
and so on.
If I split it like this: String vec2[] = x.split("|"); the output is:
2
0
1
3
-
0
4
-
1
7
T
0
8
:
0
0
:
and so on.
And I would expect something similar to this:
2013-04-17T08:00:00.001,41.14806,-9.58972,-13.0,0.0,0.0,-20.0,4
2013-04-17T08:00:00.001,41.14806,-9.58972,-22.0,0.0,0.0,-20.0,4
and so on
Any idea what's wrong?
You need to escape the |:
String vec2[] = x.split("\\|");
That's because the argument to split() is a regex, not a plain string.
In regexes, some characters have special meanings.
The vertical bar | represents alternation. So if you want to split on |, you need to write \\|, which is like saying: "Don't treat | as a special character; treat it as the literal symbol |".
The argument to split is a regular expression, and the "|" character has a special meaning. Try escaping it: \\|.
String.split(String) splits on a regular expression, not on a character. As you can see in the summary of Java regular expression constructs, the | functions as an or construct.
If you want to split on the | character, you might need to escape it using \|. Note that to escape it in a Java String, you'll need to escape the backslash as well: \\|.
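As an alternative to escaping by hand, Pattern.quote can build the literal pattern. Here is a short sketch with a shortened, made-up input; the class name PipeSplitDemo and the helper splitOnPipe are assumptions for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PipeSplitDemo {
    // Splits on the literal pipe character; equivalent to split("\\|").
    static String[] splitOnPipe(String input) {
        return input.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String x = "a,1|b,2|c,3"; // shortened stand-in for the string in the question

        // Unescaped "|" is an alternation of two empty patterns. It matches the
        // empty string at every position, so the result is one piece per character.
        System.out.println(Arrays.toString(x.split("|")));

        // Quoted or escaped, the pipe is a literal delimiter.
        System.out.println(Arrays.toString(splitOnPipe(x))); // prints "[a,1, b,2, c,3]"
    }
}
```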
The problem is that the split(String regex) takes a regular expression as argument. The pipe (|) is a special character in regex and must thus be escaped:
String x = "2013-04-17T08:00:00.001,41.14806,-9.58972,-13.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-22.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-31.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-40.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-49.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-58.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-64.0,0.0,0.0,-20.0,4";
String[] arr = x.split("\\|");
for(String str : arr)
{
System.out.println(str);
}
Yields:
2013-04-17T08:00:00.001,41.14806,-9.58972,-13.0,0.0,0.0,-20.0,4
2013-04-17T08:00:00.001,41.14806,-9.58972,-22.0,0.0,0.0,-20.0,4
2013-04-17T08:00:00.001,41.14806,-9.58972,-31.0,0.0,0.0,-20.0,4
2013-04-17T08:00:00.001,41.14806,-9.58972,-40.0,0.0,0.0,-20.0,4
2013-04-17T08:00:00.001,41.14806,-9.58972,-49.0,0.0,0.0,-20.0,4
2013-04-17T08:00:00.001,41.14806,-9.58972,-58.0,0.0,0.0,-20.0,4
2013-04-17T08:00:00.001,41.14806,-9.58972,-64.0,0.0,0.0,-20.0,4
Try this
String vec2[] = x.split("\\|");
You need to escape the | character, since it is the regex OR operator.
String x = "2013-04-17T08:00:00.001,41.14806,-9.58972,-13.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-22.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-31.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-40.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-49.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-58.0,0.0,0.0,-20.0,4|2013-04-17T08:00:00.001,41.14806,-9.58972,-64.0,0.0,0.0,-20.0,4";
String[] arr = x.split("\\|");
for(String s: arr){
System.out.println(s);
}
Did you try escaping the character, like this?
x.split("\\|");

split strings with uppercase

I have some strings that I want to split word by word. They are in different formats like:
THIS-IS-MY-STRING
ThisIsMyString
This_Is_My_String
This is my string
I use:
String[] x = str1.split("(?=[A-Z])|[_]|[-]|[ ]");
But there are some problems:
some elements in x array will be empty
for the first string I want “THIS” but the result of split is “T”, “H”, “I”, “S”
How should I change split to reach my purpose? Could you please help me?
You need to include look-behind as well, here you go:
String[] x = str1.split("([-_ ]|(?<=[^-_ A-Z])(?=[A-Z]))");
[-_ ] means - or _ or space.
(?<=[^-_ A-Z]) means the previous character isn't a -, _, space, or A-Z.
(?=[A-Z]) means the next character is A-Z.
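For a quick check, this sketch runs the pattern above against the four sample formats from the question. The class name WordSplitDemo and the helper splitWords are made up.

```java
import java.util.Arrays;

class WordSplitDemo {
    // Pattern from the answer: a separator character, or a boundary between
    // a character that is not a separator or capital and a following capital.
    private static final String WORD_BOUNDARY = "([-_ ]|(?<=[^-_ A-Z])(?=[A-Z]))";

    static String[] splitWords(String s) {
        return s.split(WORD_BOUNDARY);
    }

    public static void main(String[] args) {
        String[] samples = {
            "THIS-IS-MY-STRING", "ThisIsMyString", "This_Is_My_String", "This is my string"
        };
        for (String sample : samples) {
            System.out.println(Arrays.toString(splitWords(sample)));
        }
        // prints:
        // [THIS, IS, MY, STRING]
        // [This, Is, My, String]
        // [This, Is, My, String]
        // [This, is, my, string]
    }
}
```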
EDIT:
Unfortunately, there is no way (that I know of) to use split on _CITY_ABC without getting either _CITY or an empty string.
You can, however, process the first and last strings only if they are not empty, but this is not ideal.
For this I suggest Matcher:
String str1 = "_CityCITY_";
Pattern p = Pattern.compile("[A-Z][a-z]+(?=[A-Z]|$)|[A-Za-z]+(?=[-_ ]|$)");
Matcher m = p.matcher(str1);
while (m.find())
System.out.println(m.group());
Try Regex.Split(). The first param is the string to split and the second would be your regular expression. (Note that Regex.Split is the .NET API; in Java the counterparts are String.split and Pattern.split.) Hope this helps.
