Split a String on an Integer followed by a space - java

I have a rather large String that I need to split so I can put it into an array. Each entry ends with a semicolon followed by an Integer and then a space, and that space is where I need to split it.
Say for instance, I have a String:
first aaa;0 second bbb;1 third ccc;2
I need to split it so that it becomes:
first aaa;0
second bbb;1
third ccc;2
I assume I can use something like:
Pattern pattern = Pattern.compile("^([0-9]*)\\s");
myArray = pattern.split(string_to_split);
I just don't understand RegEx that well yet.
Thanks to anyone taking a look
Also, the split point will always be a semicolon followed by exactly one digit and then a space.

Just split your input string according to the below regex.
(?<=;\\d)\\s
Code:
import java.util.Arrays; // for Arrays.toString
String s = "first aaa;0 second bbb;1 third ccc;2";
String[] tok = s.split("(?<=;\\d)\\s");
System.out.println(Arrays.toString(tok));
Output:
[first aaa;0, second bbb;1, third ccc;2]
Explanation:
(?<=;\d) A positive lookbehind is used here. It asserts that the current position is immediately preceded by ;<digit> without consuming those characters. That is, whatever precedes the space character must be a semicolon followed by a digit.
(?<=;\d)\s Now it matches the following space character.
Splitting your input string according to that matched space will give you the desired output.
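The question started from Pattern.split, so here is a minimal sketch of the same lookbehind compiled once with Pattern.compile and reused for every input. The class name SemicolonDigitSplit and its split method are illustrative, not part of the answer.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SemicolonDigitSplit {
    // Matches a whitespace character only when it is immediately preceded by
    // ';' and one digit. The lookbehind is zero-width, so ";0" stays at the
    // end of the left-hand token and only the space is consumed.
    private static final Pattern SEPARATOR = Pattern.compile("(?<=;\\d)\\s");

    static String[] split(String input) {
        return SEPARATOR.split(input);
    }

    public static void main(String[] args) {
        String s = "first aaa;0 second bbb;1 third ccc;2";
        // The space inside "first aaa" is not preceded by ";<digit>", so it is kept.
        System.out.println(Arrays.toString(split(s))); // [first aaa;0, second bbb;1, third ccc;2]
    }
}
```

Precompiling only matters if you split many strings; for a single call, String.split as shown in the answer is equivalent.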

Related

Java regex negative lookahead to replace non-triple characters

I'm trying to take a number, convert it into a string and replace all characters that are not a triple.
E.g. if I pass in 1222331, my replace method should return 222. I can detect that this pattern exists, but I need to get the value and save it into a string for additional logic. I don't want to use a for loop to iterate through the string.
I have the following code:
String first = Integer.toString(num1);
String x = first.replaceAll("^((?!([0-9])\\3{2})).*$","");
But it's replacing the triple digits also. I only need it to replace the rest of the characters. Is my approach wrong?
You can use
first = first.replaceAll("((\\d)\\2{2})|\\d", "$1");
The regex ((\d)\2{2})|\d matches either a digit that repeats three times in a row (capturing the whole triple into Group 1), or any other single digit. The replacement $1 puts the captured triple back into the result; every other digit is replaced with nothing.
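A minimal, self-contained sketch of the answer's replaceAll applied to the question's example. The wrapper name keepTriples is hypothetical. It relies on Java substituting an empty string for $1 when group 1 did not take part in the match (the \d alternative).

```java
class KeepTriples {
    // Group 1 captures a digit followed by two more copies of itself.
    // Any other digit matches the second alternative, leaving group 1 unset,
    // so "$1" contributes nothing and that digit is dropped.
    static String keepTriples(int num) {
        String digits = Integer.toString(num);
        return digits.replaceAll("((\\d)\\2{2})|\\d", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepTriples(1222331)); // 222
        System.out.println(keepTriples(111222));  // 111222
    }
}
```

Note that a run of four equal digits such as 1111 keeps only its first three characters, because the fourth digit no longer has two copies after it.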

regex to strip leading zeros treated as string

I have numbers like these that need their leading zeros removed.
Here is what I need:
00000004334300343 -> 4334300343
0003030435243 -> 3030435243
I can't figure this out as I'm new to regular expressions. This does not work:
(^0)
You're almost there. You just need a quantifier:
str = str.replaceAll("^0+", "");
It replaces one or more occurrences of 0 at the beginning of the string (that's what the caret ^ anchors) with an empty string. The + quantifier means one or more; similarly, the * quantifier means zero or more.
The accepted solution fails if you need to get "0" from "00". This version handles that case:
str = str.replaceAll("^0+(?!$)", "");
^0+(?!$) means match one or more zeros if it is not followed by end of string.
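A small sketch comparing the two patterns side by side on the question's inputs plus the all-zero edge cases. The class and method names are illustrative only.

```java
class StripLeadingZeros {
    // One or more leading zeros, but (?!$) refuses to let the match end at
    // the end of the input, so an all-zero string keeps its final "0".
    static String strip(String s) {
        return s.replaceAll("^0+(?!$)", "");
    }

    // The plain "^0+" version, for comparison: an all-zero string becomes "".
    static String stripAll(String s) {
        return s.replaceAll("^0+", "");
    }

    public static void main(String[] args) {
        String[] inputs = {"00000004334300343", "0003030435243", "00", "0", "7"};
        for (String s : inputs) {
            System.out.println(s + " -> \"" + strip(s) + "\" vs \"" + stripAll(s) + "\"");
        }
    }
}
```

For "00" the regex engine first tries to match both zeros, the lookahead rejects that because the end of input follows, and it backtracks to match only the first zero.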
If you know the input strings contain only digits, you can do:
String s = "00000004334300343";
System.out.println(Long.valueOf(s));
// 4334300343
Converting to a Long strips all leading zeros automatically, as long as the value fits in a long (at most 9223372036854775807).
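If the digit strings can be longer than a long allows, one option (swapped in here, not part of the answer) is java.math.BigInteger, which parses integers of any length and whose toString() never emits leading zeros. A minimal sketch; the class name is hypothetical.

```java
import java.math.BigInteger;

class NumericStrip {
    // Parses s as an arbitrary-precision integer and prints it back.
    // Throws NumberFormatException if s is not a valid integer.
    static String strip(String s) {
        return new BigInteger(s).toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("00000004334300343"));            // 4334300343
        System.out.println(strip("000"));                          // 0
        System.out.println(strip("0000123456789012345678901234")); // 123456789012345678901234
    }
}
```

Like the Long approach, this also accepts a leading sign, so it is only a drop-in replacement when the input is known to be numeric.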
Another solution (might be more intuitive to read)
str = str.replaceFirst("^0+", "");
^ - match the beginning of the input
0+ - match the zero digit character one or more times
You can find an exhaustive list of constructs in the documentation of the Pattern class.
\b0+\B will do the job (written "\\b0+\\B" in a Java string literal). \b anchors the match to a word boundary, 0+ matches a sequence of one or more zeros, and \B requires the match to end at a position that is not a word boundary, so the last 0 is kept when the input is only 00...000.
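A quick sketch of that pattern in Java. Because it is not anchored with ^, it strips leading zeros from every number in a string, which may or may not be what you want. The class name is illustrative.

```java
class WordBoundaryStrip {
    // \b: the zeros must start at a word boundary.
    // \B: the match must end where the next character is still a word
    // character, so a trailing lone "0" is never consumed.
    static String strip(String s) {
        return s.replaceAll("\\b0+\\B", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("00000004334300343")); // 4334300343
        System.out.println(strip("000"));               // 0
        System.out.println(strip("007 0042"));          // 7 42
    }
}
```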
The correct regex to strip leading zeros is
str = str.replaceAll("^0+", "");
This regex will match 0 character in quantity of one and more at the string beginning.
There is no reason to worry about using replaceAll here: the ^ (beginning of input) anchor ensures the pattern can match at most once.
Alternatively, you can use Java's built-in parsing to do the same:
String str = "00000004334300343";
long number = Long.parseLong(str);
System.out.println(number); // 4334300343
The leading zeros will be stripped for you automatically.
I know this is an old question, but I think the best way to do this is actually
str = str.replaceAll("(^0+)?(\\d+)", "$2");
The reason I suggest this is that it splits the string into two groups. The second group is at least one digit. The first group matches one or more zeros at the start of the input. However, the first group is optional, meaning that if there are no leading zeros, you just get all of the digits. And if str is only zeros, you get exactly one zero (because the second group must match at least one digit).
So if it's any number of 0s, you get back exactly one zero. If it starts with any number of 0s followed by any other digit, you get no leading zeros. If it starts with any other digit, you get back exactly what you had in the first place.
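If you are working in JavaScript rather than Java, the equivalent is:
str = str.replaceAll(/^0+/g, "");
There, the global flag g is required: JavaScript's String.prototype.replaceAll throws a TypeError when given a regex without it. Java's String.replaceAll takes a pattern string and has no such flag.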

What's wrong with my split() and its regex?

In part of my application I ran into this problem. The String variable line contains "12.2 Andrew" and I'm trying to split it into its two parts, but it doesn't work and throws a NumberFormatException. Could you help me with that, please?
String line = "12.2 Andrew";
String[] data = line.split("(?<=\\d)(?=[a-zA-Z])");
System.out.println(Double.valueOf(data[0]));
Did you look at your data variable? It didn't split anything at all, since the condition never matches. You are looking for a place in the input immediately after a number and before a letter, and since there is a space in between this doesn't exist.
Try adding a space in the middle, that should fix it:
String[] data = line.split("(?<=\\d) (?=[a-zA-Z])");
Your split is not working: it does not split the String at all.
Therefore Double.valueOf is parsing the whole input "12.2 Andrew", which throws the NumberFormatException.
Try the following:
import java.util.Arrays; // for Arrays.toString
String line = "12.2 Andrew";
String[] data = line.split("(?<=\\d)(?=[a-zA-Z])");
System.out.println(Arrays.toString(data));
// System.out.println(Double.valueOf(data[0]));
// fixed
data = line.split("(?<=\\d).(?=[a-zA-Z])");
System.out.println(Arrays.toString(data));
System.out.println(Double.valueOf(data[0]));
Output
[12.2 Andrew]
[12.2, Andrew]
12.2
If you print the content of data[0], you will notice that it still contains "12.2 Andrew", so you actually didn't split anything. That is because your regex says:
split at a position that has a digit before it and a letter after it
For data like
123foo345bar 123 baz
it can only split at the places marked with |
123|foo345|bar 123 baz
It will not split `123 baz` as
`123| baz` because the character after the digit is a space (not a letter), or as
`123 |baz` because the character before the letter is a space (not a digit),
so the regex can't match there.
What you need is to "split on space which has digit before and letter after it" so use
String[] data = line.split("(?<=\\d)\\s+(?=[a-zA-Z])");
// \\s+ matches one or more whitespace characters between the digit and the letter
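Putting the last answer's regex into a complete program, as a sketch: split the line, then parse the first token as a double. The class name NameValueSplit is hypothetical.

```java
import java.util.Arrays;

class NameValueSplit {
    // Consume the whitespace between a digit and a letter; the lookarounds
    // keep the digit and the letter in their respective tokens.
    static String[] split(String line) {
        return line.split("(?<=\\d)\\s+(?=[a-zA-Z])");
    }

    public static void main(String[] args) {
        String[] data = split("12.2 Andrew");
        System.out.println(Arrays.toString(data)); // [12.2, Andrew]

        double value = Double.parseDouble(data[0]); // no NumberFormatException now
        String name = data[1];
        System.out.println(value + " " + name);     // 12.2 Andrew
    }
}
```

On the diagram's example "123foo345bar 123 baz" this pattern splits only at the space between "123" and "baz", because that is the only whitespace with a digit before it and a letter after it.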

Splitting a string on space except for single space

I was splitting a string on white spaces using the following
myString.split("\\s+");
How do I make an exception for a single space, i.e. split on whitespace except where it is just a single space?
Like this:
myString.split("\\s{2,}");
or like this,
myString.split(" \\s+"); // notice the blank at the beginning.
It depends on what you really want, which is not clear by reading the question.
You can check the quantifier syntax in the Pattern class.
You can use a pattern like
myString.split("\\s\\s+");
This only matches if a whitespace character is followed by at least one more whitespace character.
Note that \s matches more than the plain space: it also matches tabs, newlines, and other whitespace characters.
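Since the right choice depends on which whitespace should count, here is a minimal sketch contrasting "\\s{2,}" (equivalent to "\\s\\s+") with " \\s+" on an input that contains a tab run. Class and method names are illustrative.

```java
import java.util.Arrays;

class WhitespaceRuns {
    // Two or more consecutive whitespace characters of any kind.
    static String[] splitOnRuns(String s) {
        return s.split("\\s{2,}");
    }

    // A literal space followed by one or more whitespace characters;
    // a run that starts with a tab is not treated as a separator.
    static String[] splitAfterSpace(String s) {
        return s.split(" \\s+");
    }

    public static void main(String[] args) {
        String s = "a b\t\tc  d";
        System.out.println(Arrays.toString(splitOnRuns(s)));     // [a b, c, d]
        System.out.println(Arrays.toString(splitAfterSpace(s))); // the tab run stays inside the first token
    }
}
```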
"Your String".split("\\s{2,}");
will do the job.
For example:
String str = "I am  a  String";
String[] strArr = str.split("\\s{2,}");
This will return an array with length 3.
The following would be the output.
strArr[0] = "I am"
strArr[1] = "a"
strArr[2] = "String"
I hope this answers your question.
If you literally want to exclude a single space, as opposed to other types of whitespace, then you'll need the following:
s.split("\\s{2,}|[\\s&&[^ ]]")
This constructs a character class by subtracting the space from the \s built-in character class.
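A short sketch of the character-class subtraction in action: a single tab counts as a separator, a single space does not, and any run of two or more whitespace characters does. The class name is hypothetical.

```java
import java.util.Arrays;

class SingleSpaceOnly {
    // Alternative 1: any run of two or more whitespace characters.
    // Alternative 2: [\s&&[^ ]] is "whitespace AND not a space", i.e. a single
    // tab, newline, etc.
    static String[] split(String s) {
        return s.split("\\s{2,}|[\\s&&[^ ]]");
    }

    public static void main(String[] args) {
        String s = "a b\tc  d";
        System.out.println(Arrays.toString(split(s)));                 // [a b, c, d]
        System.out.println(Arrays.toString(s.split("\\s{2,}")).length()); // plain \s{2,} keeps "a b\tc" together
    }
}
```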

java split string with regex

I want to split a string using every non-alphabetic character as a separator.
String[] word_list = line.split("[^a-zA-Z]");
But with the following input
11:11 Hello World
word_list contains many empty strings before "Hello" and "World".
Please kindly tell me why. Thank You.
Because your regular expression matches each individual non-alpha character. It would be like separating
",,,,,,Hello,World"
on commas.
You will want an expression that matches an entire sequence of non-alpha characters at once such as:
line.split("[^a-zA-Z][^a-zA-Z]*")
You will still get one leading empty string with your example, since it would be like separating ",Hello,World" with comma as the separator.
Here's your string, where each ^ character shows a match for [^a-zA-Z]:
11:11 Hello World
^^^^^^     ^
The split method finds each of these matches and returns the substrings between them. Since there are six matches before any useful data, you end up with six empty substrings (one before the first match plus five between consecutive matches) before you get the string "Hello".
To prevent this, you can manually filter the result to ignore any empty strings.
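One way to do that filtering, sketched with the Stream API and combined with the "one or more non-letters" pattern from the first answer. The method name words is illustrative.

```java
import java.util.Arrays;

class WordSplit {
    // Split on runs of non-letters, then drop empty strings. The filter is
    // needed because String.split keeps a leading empty string when the input
    // starts with a separator (here "11:11 ").
    static String[] words(String line) {
        return Arrays.stream(line.split("[^a-zA-Z]+"))
                .filter(w -> !w.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String line = "11:11 Hello World";
        // One match per non-letter character: six empty strings, then "Hello" and "World".
        System.out.println(line.split("[^a-zA-Z]").length); // 8
        System.out.println(Arrays.toString(words(line)));   // [Hello, World]
    }
}
```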
Will the following do?
String[] word_list = line.replaceAll("[^a-zA-Z ]","").replaceAll(" +", " ").trim().split("[^a-zA-Z]");
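What I am doing here is removing every character that is neither a letter nor a space before doing the split, then collapsing multiple spaces into a single space and trimming. Note that this joins words separated only by punctuation, e.g. "Hello,World" becomes "HelloWorld".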
