Splitting a string on space except for single space - java

I was splitting a string on whitespace using the following:
myString.split("\\s+");
How do I make an exception for a single space, i.e. split on whitespace except where it is just a single space?

Like this:
myString.split("\\s{2,}");
or like this,
myString.split(" \\s+"); // notice the blank at the beginning.
It depends on what you really want, which is not clear from the question.
You can check the quantifier syntax in the documentation of the Pattern class.

You can use a pattern like
myString.split("\\s\\s+");
This only matches if a whitespace character is followed by further whitespace characters.
Please note that a whitespace character is more than a simple blank; \s also matches tabs and line breaks.

"Your String".split("\\s{2,}");
will do the job.
For example:
String str = "I am  a  String"; // two spaces after "am" and after "a"
String[] strArr = str.split("\\s{2,}");
This will return an array with length 3.
The following would be the output.
strArr[0] = "I am"
strArr[1] = "a"
strArr[2] = "String"
I hope this answers your question.

If you literally want to exclude a single space, as opposed to other types of whitespace, then you'll need the following:
s.split("\\s{2,}|[\\s&&[^ ]]")
This constructs a character class by subtracting the space from the \s built-in character class.
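To make the difference between the patterns above concrete, here is a minimal sketch. The class name, method names and sample string are made up for illustration; only the two regexes come from the answers.

```java
import java.util.Arrays;

class WhitespaceSplitDemo {
    // Splits on runs of two or more whitespace characters of any kind.
    // A lone space, tab or newline is left inside the token.
    static String[] splitOnRuns(String s) {
        return s.split("\\s{2,}");
    }

    // Splits on runs of two or more whitespace characters, and also on any
    // single whitespace character that is not the plain space (tab, newline, ...).
    static String[] splitExceptSingleSpace(String s) {
        return s.split("\\s{2,}|[\\s&&[^ ]]");
    }

    public static void main(String[] args) {
        // Hypothetical input: single spaces, one tab, and one double space.
        String sample = "red apple\tgreen pear  blue plum";
        System.out.println(Arrays.toString(splitOnRuns(sample)));
        System.out.println(Arrays.toString(splitExceptSingleSpace(sample)));
    }
}
```

With this input the first method keeps the tab inside its first token, while the second one also splits at the tab, so it returns three tokens instead of two.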

Related

String.split() not working as intended

I'm trying to split a string, however, I'm not getting the expected output.
String one = "hello 0xA0xAgoodbye";
String two[] = one.split(" |0xA");
System.out.println(Arrays.toString(two));
Expected output: [hello, goodbye]
What I got: [hello, , , goodbye]
Why is this happening and how can I fix it?
Thanks in advance! ^-^
If you'd like to treat consecutive delimiters as one, you could modify your regex as follows:
"( |0xA)+"
This means "a space or the string "0xA", repeated one or more times".
(\\s|0xA)+ This will match a run of one or more whitespace characters or 0xA sequences in the text and split on the whole run.
This result is caused by multiple consecutive matches in the string. You may wrap the pattern with a grouping construct and apply a + quantifier to it to match multiple matches:
String one = "hello 0xA0xAgoodbye";
String two[] = one.split("(?:\\s|0xA)+");
System.out.println(Arrays.toString(two));
A (?:\s|0xA)+ regex matches one or more whitespace characters or literal 0xA character sequences.
See the Java online demo.
However, you will still get an empty value as the first item in the resulting array if the 0xA or whitespaces appear at the start of the string. Then, you will have to remove them first:
String two[] = one.replaceFirst("^(?:\\s|0xA)+", "").split("(?:\\s+|0xA)+");
See another Java demo.
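The two steps (strip a leading delimiter run, then split on delimiter runs) can be wrapped in a small helper. This is only a sketch: the class name, the constant and the tokens method are invented for the example, and the second input string is made up.

```java
import java.util.Arrays;

class DelimiterRunDemo {
    // One delimiter = a run of whitespace characters and/or literal "0xA" sequences.
    private static final String DELIMITERS = "(?:\\s|0xA)+";

    static String[] tokens(String input) {
        // Remove a delimiter run at the very start first, otherwise split()
        // would return an empty string as the first element.
        return input.replaceFirst("^" + DELIMITERS, "").split(DELIMITERS);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("hello 0xA0xAgoodbye")));
        System.out.println(Arrays.toString(tokens(" 0xAhello 0xAgoodbye 0xA")));
    }
}
```

Trailing delimiters need no special handling here, because split() without a limit argument already discards trailing empty strings.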

Remove unwanted characters from string by regex in Java

I have a string here:
javax.swing.JLabel[,380,30,150x25,alignmentX=0.0,alignmentY=0.0]: Hello
I want to remove everything before the ":", including the ":" itself. This would leave only "Hello". I read about regex, but no combination I tried worked. Can someone tell me how to do it. Thanks in advance!
You need to use replaceAll method or replaceFirst.
string.replaceFirst(".*:\\s*", "");
or
string.replaceAll(".*:\\s*", "");
This would give you only Hello. If you remove the \\s* part of the pattern, it would give you <space>Hello instead.
.* Matches any character (except line terminators) zero or more times, greedily.
: Up to and including the last colon.
\\s* Matches zero or more whitespace characters.
You could also just split the string by : and take the second string. Like this
String sample = "javax.swing.JLabel[,380,30,150x25,alignmentX=0.0,alignmentY=0.0]: Hello";
System.out.println(sample.split(":", -1)[1]);
This will output
<space>Hello
If you want to get rid of that leading space just trim it off like
System.out.println(sample.split(":", -1)[1].trim());
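Note that the two approaches only agree when the string contains exactly one colon. The sketch below shows the difference; the class name, method names and the second sample string are assumptions made for the example.

```java
class ColonPrefixDemo {
    // Greedy .* runs to the LAST colon, so everything up to it is removed.
    static String afterLastColon(String s) {
        return s.replaceFirst(".*:\\s*", "");
    }

    // split(":") cuts at EVERY colon; index 1 is the text between the
    // first and the second colon (or the rest, if there is only one colon).
    static String secondField(String s) {
        return s.split(":", -1)[1].trim();
    }

    public static void main(String[] args) {
        String single = "javax.swing.JLabel[,380,30,150x25,alignmentX=0.0,alignmentY=0.0]: Hello";
        String multiple = "label: 12:30 Hello"; // hypothetical input with two colons

        System.out.println(afterLastColon(single));   // Hello
        System.out.println(secondField(single));      // Hello
        System.out.println(afterLastColon(multiple)); // 30 Hello
        System.out.println(secondField(multiple));    // 12
    }
}
```

If the text after the colon may itself contain colons, neither version is quite right as written; which one to adapt depends on whether you want to cut at the first or the last colon.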

Split a String on an Integer followed by a space

I have a rather large String that I need to split so I can put it into an array. As it is, there will be a semicolon followed by an integer, followed by a space, and this is where I need to split it.
Say for instance, I have a String:
first aaa;0 second bbb;1 third ccc;2
I need to split it so that it becomes:
first aaa;0
second bbb;1
third ccc;2
I assume I can use something like:
Pattern pattern = Pattern.compile(^([0-9]*\s");
myArray = pattern.split(string_to_split);
I just don't understand RegEx that well yet.
Thanks to anyone taking a look
Also, the pattern where it should be split will always be a semicolon, followed by only one digit and then the space.
Just split your input string according to the below regex.
(?<=;\\d)\\s
Code:
String s = "first aaa;0 second bbb;1 third ccc;2";
String[] tok = s.split("(?<=;\\d)\\s");
System.out.println(Arrays.toString(tok));
Output:
[first aaa;0, second bbb;1, third ccc;2]
Explanation:
(?<=;\d) A positive lookbehind is used here. It sets the match position just after ;<digit>. That is, it asserts that the space character must be preceded by a semicolon and a digit.
(?<=;\d)\s Now it matches the following space character.
Splitting your input string according to that matched space will give you the desired output.
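Since the question started from Pattern.compile, the same regex can also be precompiled and reused, which may be worthwhile for a large input that is split repeatedly. A minimal sketch, with the class, constant and method names chosen only for this example:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LookbehindSplitDemo {
    // A whitespace character that is immediately preceded by ";" and one digit.
    private static final Pattern AFTER_SEMICOLON_DIGIT = Pattern.compile("(?<=;\\d)\\s");

    static String[] records(String input) {
        return AFTER_SEMICOLON_DIGIT.split(input);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(records("first aaa;0 second bbb;1 third ccc;2")));
        // The lookbehind needs ";" directly followed by ONE digit before the space,
        // so the space after ";12" in this made-up input is not a split point.
        System.out.println(Arrays.toString(records("x;12 y;3 z")));
    }
}
```

The second call illustrates the "only one digit" assumption stated in the question: a two-digit number would need a different lookbehind, for example one with a bounded quantifier.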

What's wrong with my split() and its regex?

I ran into this problem in part of my application. The String variable line contains 12.2 Andrew and I'm trying to split it into its two parts, but it doesn't work and fails with a NumberFormatException. Could you help me with that, please?
String line = "12.2 Andrew";
String[] data = line.split("(?<=\\d)(?=[a-zA-Z])");
System.out.println(Double.valueOf(data[0]));
Did you look at your data variable? It didn't split anything at all, since the condition never matches. You are looking for a place in the input immediately after a number and before a letter, and since there is a space in between this doesn't exist.
Try adding a space in the middle, that should fix it:
String[] data = line.split("(?<=\\d) (?=[a-zA-Z])");
Your split is not working, and not splitting the String.
Therefore Double.valueOf is parsing the whole input, which is not a valid number.
Try the following:
String line = "12.2 Andrew";
String[] data = line.split("(?<=\\d)(?=[a-zA-Z])");
System.out.println(Arrays.toString(data));
// System.out.println(Double.valueOf(data[0]));
// fixed
data = line.split("(?<=\\d).(?=[a-zA-Z])");
System.out.println(Arrays.toString(data));
System.out.println(Double.valueOf(data[0]));
Output
[12.2 Andrew]
[12.2, Andrew]
12.2
If you print the contents of data[0], you will notice that it still contains 12.2 Andrew, so nothing was actually split. That is because your regex says:
split at a position that has a digit before it and a letter after it
which for data like
123foo345bar 123 baz
can effectively only split at the places marked with |
123|foo345|bar 123 baz
It will not split `123 baz` as
`123| baz` because the digit is followed by a space (not a letter), nor as
`123 |baz` because the letter is preceded by a space (not a digit),
so the regex can't match there.
What you need is to "split on whitespace that has a digit before it and a letter after it", so use
String[] data = line.split("(?<=\\d)\\s+(?=[a-zA-Z])");
//                                  ^^^^ - this represents one or more whitespace characters
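Putting the corrected pattern together with the number parsing from the question, a short sketch could look like this. The class and method names are invented for the example, and it assumes the line always has the shape "number, whitespace, name".

```java
import java.util.Arrays;

class NumberNameDemo {
    // Splits on whitespace that has a digit before it and a letter after it.
    static String[] splitNumberAndName(String line) {
        return line.split("(?<=\\d)\\s+(?=[a-zA-Z])");
    }

    public static void main(String[] args) {
        String[] data = splitNumberAndName("12.2 Andrew");
        System.out.println(Arrays.toString(data));

        // data[0] now holds only the numeric part, so parsing succeeds.
        double value = Double.parseDouble(data[0]);
        System.out.println(value);
    }
}
```

Printing the array before parsing, as done here, is a quick way to check whether the split did what you expected.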

String splitting with different character

I think this is a weird question. Here is my split:
String s = "asd#asd";
String[] raw1 = s.split("#"); // this has a size of two: raw1[0] = raw1[1] = "asd"
However,
String s = "asd$asd";
String[] raw2 = s.split("$"); // this has size of ONE
raw2 is not split. Does anyone know why?
Because split() takes a regexp, and $ is a metacharacter that matches the end of the input (or the end of a line in multiline mode). If you need to split on a character that is actually a regexp metacharacter, then you'll need to escape it.
See Pattern for the regexp metacharacters.
You may find that StringTokenizer is more appropriate for your needs. It takes a list of characters to split on and doesn't interpret them as regular expression metacharacters. However, it's a little more verbose and unwieldy to use. As Nandkumar notes below, the latest docs state that its use is discouraged in new code.
Because split() takes a regex and $ matches the end of a line.
You have to escape it:
s.split("\\$");
See Pattern documentation for more information on regexes.
You have to escape it:
String s = "asd$asd";
String[] raw2 = s.split("\\$"); // this has size of TWO
You need to escape the special character, so make it
s.split("\\$");
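If the delimiter is not known in advance, escaping by hand gets error-prone. One option, not mentioned in the answers above, is Pattern.quote, which turns any string into a regex that matches it literally. A minimal sketch (the class and method names are made up for the example):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplitDemo {
    // Treats the delimiter as plain text, whatever characters it contains.
    static String[] splitLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // Unescaped "$" only matches at the end of the input, so nothing is split.
        System.out.println("asd$asd".split("$").length);

        // Quoted, "$" is an ordinary character.
        System.out.println(Arrays.toString(splitLiteral("asd$asd", "$")));

        // The same helper works for other metacharacters such as ".".
        System.out.println(Arrays.toString(splitLiteral("a.b|c", ".")));
    }
}
```

For a fixed, single-character delimiter, writing "\\$" directly is just as good; the helper mainly pays off when the delimiter comes from a variable.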
