split strings with uppercase - java

I have some strings that I want to split word by word. They come in different formats, like:
THIS-IS-MY-STRING
ThisIsMyString
This_Is_My_String
This is my string
I use:
String[] x = str1.split("(?=[A-Z])|[_]|[-]|[ ]");
But there are some problems:
some elements in the x array are empty
for the first string I want "THIS", but the split result is "T", "H", "I", "S"
How should I change the split regex to achieve this? Could you please help me?

You need to include a lookbehind as well. Here you go:
String[] x = str1.split("([-_ ]|(?<=[^-_ A-Z])(?=[A-Z]))");
[-_ ] means - or _ or space.
(?<=[^-_ A-Z]) means the previous character isn't a -, _, space, or A-Z.
(?=[A-Z]) means the next character is A-Z.
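A minimal runnable sketch of the split-based answer, applying the same regex to the four formats from the question. The class and method names (WordSplitter, splitWords) are hypothetical and only there to make the example self-contained.

```java
import java.util.Arrays;

// Hypothetical helper class wrapping the split regex from the answer.
final class WordSplitter {
    // Split on '-', '_' or ' ', or at the empty position before an upper-case
    // letter whose left neighbour is neither a separator nor upper-case.
    private static final String WORD_BOUNDARY = "([-_ ]|(?<=[^-_ A-Z])(?=[A-Z]))";

    static String[] splitWords(String s) {
        return s.split(WORD_BOUNDARY);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "THIS-IS-MY-STRING", "ThisIsMyString", "This_Is_My_String", "This is my string"
        };
        for (String input : inputs) {
            System.out.println(Arrays.toString(splitWords(input)));
        }
    }
}
```

Each input yields four words. An input that starts with a separator, such as "_CITY_ABC", still produces ["", "CITY", "ABC"]: that leading empty string is the limitation discussed in the EDIT.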
EDIT:
Unfortunately, there is no way (that I know of) to use split on a string like _CITY_ABC without getting either _CITY or an empty string.
You could handle the first and last elements only when they are not empty, but this is not ideal.
For this I suggest a Matcher:
String str1 = "_CityCITY_";
Pattern p = Pattern.compile("[A-Z][a-z]+(?=[A-Z]|$)|[A-Za-z]+(?=[-_ ]|$)");
Matcher m = p.matcher(str1);
while (m.find())
System.out.println(m.group());
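The Matcher approach can be wrapped so that it returns the words instead of printing them. This is a sketch assuming a List is an acceptable result type; the names WordMatcher and words are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the Matcher-based answer.
final class WordMatcher {
    // Either a capitalised word followed by another capital or the end of input,
    // or a run of letters followed by a separator or the end of input.
    private static final Pattern WORD =
            Pattern.compile("[A-Z][a-z]+(?=[A-Z]|$)|[A-Za-z]+(?=[-_ ]|$)");

    static List<String> words(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(s);
        while (m.find()) {
            // Separators are never part of a match, so no empty strings are produced.
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("_CityCITY_"));
        System.out.println(words("_CITY_ABC"));
    }
}
```

Because find() only reports actual matches, leading or trailing separators simply produce nothing, which is why this avoids the empty first element that split() returns.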

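Try Regex.Split(). The first parameter is the string to split and the second is your regular expression. (Note that Regex.Split() is the .NET API; in Java, use String.split(regex) or Pattern.compile(regex).split(input) instead.) Hope this helps.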

Related

split on integer values but not floating point values

I have a Java program where I need to split on integer values but not on floating-point values,
i.e. "1/\\2" should produce: [1,/\\,2]
but "1.0/\\2.0" should produce: [1.0,/\\,2.0]
Does anybody have any ideas?
Or could anybody point me in the direction of how to split on the specific strings "\\/" and "/\\"?
UPDATE: Sorry, one more case! For the string "100 /\ 3.4e+45" I need to split it into:
[100,/\,3.4,e,+,45]
My current regex (admittedly quite ugly) is:
line.split("\\s+|(?<=[-+])|(?=[-+])|(?:(?<=[0-9])(?![0-9.]|$))|(?:(?<![0-9.]|^)(?=[0-9]))|(?<=[-+()])|(?=[-+()])|(?<=e)|(?=e)");
and for the string "100 /\ 3.4e+45" it gives me:
[100,/\,3.4,+,45]
This regex should do it:
(?:(?<=[0-9])(?![0-9.]|$))|(?:(?<![0-9.]|^)(?=[0-9]))
It's two checks, basically matching:
The position after a digit that is not followed by another digit, a decimal point, or the end of the text.
The position before a digit that is not preceded by another digit, a decimal point, or the start of the text.
Both checks match empty (zero-width) positions, so no characters are consumed and you can use this regex in split().
See regex101 for a demo.
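A small runnable sketch applying the boundary regex with String.split() to the two inputs from the question. The class name NumberBoundarySplit is hypothetical; the Java literals "1/\\2" and "1.0/\\2.0" are the texts 1/\2 and 1.0/\2.0.

```java
import java.util.Arrays;

// Hypothetical helper class for the zero-width "number boundary" split.
final class NumberBoundarySplit {
    // Split points (no characters are consumed):
    //  1) after a digit not followed by a digit, '.', or the end of input
    //  2) before a digit not preceded by a digit, '.', or the start of input
    private static final String BOUNDARY =
            "(?:(?<=[0-9])(?![0-9.]|$))|(?:(?<![0-9.]|^)(?=[0-9]))";

    static String[] split(String s) {
        return s.split(BOUNDARY);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("1/\\2")));     // [1, /\, 2]
        System.out.println(Arrays.toString(split("1.0/\\2.0"))); // [1.0, /\, 2.0]
    }
}
```

Since "." is excluded on both sides, the positions inside 1.0 are never treated as boundaries, which is what keeps floating-point values intact.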
Follow-up
could anybody point me in the direction of how to split on the specific strings "\/" and "/\"
If you want to split before a specific pattern, use a positive lookahead: (?=xxx). If you want to split after a specific pattern, use a positive lookbehind: (?<=xxx). To split both before and after, combine them with |:
(?<=xxx)|(?=xxx)
where xxx is the text \/ or /\, i.e. the regex \\/|/\\, with every backslash doubled again for the Java string literal:
"(?<=\\\\/|/\\\\)|(?=\\\\/|/\\\\)"
See regex101 for a demo.
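A minimal sketch using that lookbehind/lookahead regex with String.split(). The class name SlashSplit and the sample inputs are assumptions for illustration; "a/\\b\\/c" is the text a/\b\/c.

```java
import java.util.Arrays;

// Hypothetical helper class: split right before and right after \/ or /\.
final class SlashSplit {
    // Regex text: (?<=\\/|/\\)|(?=\\/|/\\), with backslashes doubled again for Java.
    private static final String AROUND_SLASHES = "(?<=\\\\/|/\\\\)|(?=\\\\/|/\\\\)";

    static String[] split(String s) {
        return s.split(AROUND_SLASHES);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a/\\b\\/c")));  // [a, /\, b, \/, c]
        System.out.println(Arrays.toString(split("1.0/\\2.0"))); // [1.0, /\, 2.0]
    }
}
```

Because only the two operator sequences are used as split points, decimal points are left alone without any extra handling.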
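You could try something like this:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// "\\." matches a literal decimal point; an unescaped "." would match any character
String regex = "\\d+(\\.\\d+)?", str = "1//2";
Matcher m = Pattern.compile(regex).matcher(str);
List<String> list = new ArrayList<>();
int index = 0;
for (; m.find(); index = m.end()) {
    // add the text between the previous number and this one, if any
    if (index != m.start()) list.add(str.substring(index, m.start()));
    list.add(m.group()); // the number itself
}
// add any text after the last number
if (index < str.length()) list.add(str.substring(index));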
The idea is to find the numbers with a regex and a Matcher, and also add the strings in between them.
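The same find-based idea can cover the UPDATE case ("100 /\ 3.4e+45") by matching tokens directly instead of splitting. This is a sketch under an assumed token set (integers or decimals, the operators /\ and \/, and the single characters -, + and e); the class name ExpressionTokens is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer; the token set below is an assumption, not from the question.
final class ExpressionTokens {
    // A number with an optional fractional part, /\, \/, or one of '-', '+', 'e'.
    private static final Pattern TOKEN =
            Pattern.compile("\\d+(?:\\.\\d+)?|/\\\\|\\\\/|[-+e]");

    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            // Characters that match no token (such as spaces) are silently skipped.
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("100 /\\ 3.4e+45")); // [100, /\, 3.4, e, +, 45]
    }
}
```

The trade-off is that unexpected characters are dropped rather than reported, so this only suits input whose token set is known in advance.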

Regex with single letter is not identifying

I have a regex:
[\w\W][^\s]+
My intention is to match a word that:
may contain only special characters
may contain only word characters
may contain both special characters and word characters
can be a single letter
should not match spaces or tabs
The first three conditions work, but the fourth one does not. Can anyone please help?
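[^\s]+
This should do it for you. Your original pattern needs at least two characters, because [\w\W] consumes one character and [^\s]+ requires at least one more, so a single letter can never match.
In Java, \s matches any whitespace character: [ \t\n\x0B\f\r]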
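A quick runnable comparison of the original pattern and [^\s]+ on a sample string. The class name TokenCompare and the sample text are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper to compare what two patterns find in the same text.
final class TokenCompare {
    static List<String> findAll(String regex, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "a bc !? x";
        // Original: [\w\W] matches ANY character (a space too) and [^\s]+ needs
        // at least one more, so "a" is missed and the spaces are swallowed.
        System.out.println(findAll("[\\w\\W][^\\s]+", text)); // [ bc,  !?,  x]
        // Suggested: one or more non-whitespace characters.
        System.out.println(findAll("[^\\s]+", text));         // [a, bc, !?, x]
    }
}
```

The first result also shows a second problem with the original pattern: because \W includes whitespace, each match starts with the preceding space.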
You can use a negated character class that matches one or more characters other than a tab or a space (\p{Zs} covers all Unicode space separators):
[^\t\p{Zs}]+
See the IDEONE demo:
String str = "Your string here";
String rx = "[^\t\\p{Zs}]+";
Pattern ptrn = Pattern.compile(rx);
Matcher m = ptrn.matcher(str);
while (m.find()) {
System.out.println(m.group(0));
}
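Just use [^\s\t\n]+: it matches any run of one or more characters, including a single character, that contains no whitespace (\s), tab (\t), or newline (\n). Since \s already includes \t and \n, this is equivalent to [^\s]+.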

Extracting numbers into a string array

I have a string of the form:
String str = "124333 is the otp of candidate number 9912111242. Please refer txn id 12323335465645 while referring blah blah.";
I need 124333, 9912111242 and 12323335465645 in a string array. I have tried this with
while (Character.isDigit(sms.charAt(i)))
I feel that calling this method on every character is inefficient. Is there a way to get a String array of all the numbers?
Use a regex (see Pattern and Matcher):
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(str);
while (m.find()) {
    // m.group() contains the digits you want
}
You can easily build an ArrayList containing each match you find.
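A sketch of that idea: collect every match into a list and convert it to the String[] the question asks for. The class and method names (NumberExtractor, extract) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper collecting every run of digits.
final class NumberExtractor {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static String[] extract(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(s);
        while (m.find()) {
            found.add(m.group()); // one run of consecutive digits
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String str = "124333 is the otp of candidate number 9912111242. "
                + "Please refer txn id 12323335465645 while referring blah blah.";
        System.out.println(String.join(", ", extract(str)));
    }
}
```

On Java 9 or later, Matcher.results() offers a stream-based alternative (mapping each MatchResult to its group()). Unlike split, this approach never produces empty elements, whatever the string starts or ends with.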
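Or, as others suggested, you can split on non-digit characters (\D):
"blabla 123 blabla 345".split("\\D+")
Note that \ has to be escaped in Java string literals, hence the \\. Also note that split keeps a leading empty string when the input starts with a non-digit, so this example returns ["", "123", "345"].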
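One way to avoid that leading empty element is to strip any leading non-digits before splitting. This is a sketch; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: split on non-digits without an empty first element.
final class SplitOnNonDigits {
    static String[] digitsOnly(String s) {
        // Remove a leading run of non-digits, then split on the remaining runs.
        return s.replaceFirst("^\\D+", "").split("\\D+");
    }

    public static void main(String[] args) {
        String s = "blabla 123 blabla 345";
        System.out.println(Arrays.toString(s.split("\\D+"))); // [, 123, 345]
        System.out.println(Arrays.toString(digitsOnly(s)));   // [123, 345]
    }
}
```

Trailing non-digits need no special handling, because split already discards trailing empty strings. If the input contains no digits at all, the result is a single empty string rather than an empty array.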
You can use String.split():
String[] nbs = str.split("[^0-9]+");
This splits the string on every run of non-digit characters.
And this works perfectly for your input.
String str = "124333 is the otp of candidate number 9912111242. Please refer txn id 12323335465645 while referring blah blah.";
System.out.println(Arrays.toString(str.split("\\D+")));
Output:
[124333, 9912111242, 12323335465645]
\\D+ matches one or more non-digit characters. Splitting the input on runs of non-digit characters gives you the desired output.
Java 8 style:
long[] numbers = Pattern.compile("\\D+")
.splitAsStream(str)
.mapToLong(Long::parseLong)
.toArray();
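The stream version calls Long.parseLong on every piece, so an input that starts with a non-digit would yield an empty first piece and a NumberFormatException. A sketch that filters out empty pieces first; the class name NumbersAsLongs is hypothetical, and numbers longer than a long can hold would still fail to parse.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper: parse every run of digits as a long.
final class NumbersAsLongs {
    private static final Pattern NON_DIGITS = Pattern.compile("\\D+");

    static long[] parse(String s) {
        return NON_DIGITS.splitAsStream(s)
                .filter(part -> !part.isEmpty()) // drops the empty piece before a leading non-digit run
                .mapToLong(Long::parseLong)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("otp 124333 and txn 12323335465645")));
    }
}
```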
Ah, if you only need a String array, you can just use String.split as the other answers suggest.
Alternatively, you can try this:
String str = "124333 is the otp of candidate number 9912111242. Please refer txn id 12323335465645 while referring blah blah.";
str = str.replaceAll("\\D+", ",");
System.out.println(Arrays.asList(str.split(",")));
\\D+ matches one or more non-digits
Output
[124333, 9912111242, 12323335465645]
The first thing that came to mind was to filter and then split, but then I realized it can be done with:
String[] result =str.split("\\D+");
\D matches any non-digit character and + means one or more of them. The leading \ escapes the second \, because "\D" inside a Java string literal is an invalid escape sequence and would not compile.

How to remove empty results after splitting with regex in Java?

I want to find all numbers in a given string (the numbers are mixed with letters but separated by spaces). I tried to split the input string, but when I check the result array I find a lot of empty strings. How can I change my split regex to get rid of these empty strings?
Pattern reg = Pattern.compile("\\D0*");
String[] numbers = reg.split("asd0085 sa223 9349x");
for(String s:numbers){
System.out.println(s);
}
And the result:
85
223
9349
I know that I can iterate over the array and remove the empty results, but how can I do it with the regex alone?
If you are using Java 8, you can do it in one statement like this:
String[] array = Arrays.stream(reg.split("asd0085 sa223 9349x")).filter(s -> !s.isEmpty()).toArray(String[]::new);
Don't use split. Use Matcher's find method, which lets you iterate over all matching substrings. You can do it like this:
Pattern reg = Pattern.compile("\\d+");
Matcher m = reg.matcher("asd0085 sa223 9349x");
while (m.find())
System.out.println(m.group());
which will print
0085
223
9349
Based on your regex, it seems that your goal is also to remove leading zeros, as in 0085. If so, you can use a regex like 0*(\\d+) and take the part matched by group 1 (the one in parentheses), letting the leading zeros be matched outside that group.
Pattern reg = Pattern.compile("0*(\\d+)");
Matcher m = reg.matcher("asd0085 sa223 9349x");
while (m.find())
System.out.println(m.group(1));
Output:
85
223
9349
But if you really want to use split, change "\\D0*" to "\\D+0*" so that you split on one or more non-digits (\\D+) rather than on a single non-digit (\\D). With this solution you may still need to ignore an empty first element in the result array (depending on whether the string starts with something that gets split on).
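A sketch of the split-based variant that drops the empty first element when the input starts with a delimiter. The class and method names are hypothetical. One assumed limitation: zeros at the very start of the input are not stripped, since they are not preceded by a non-digit, and a number made only of zeros after a letter (for example "a00") is swallowed by the delimiter.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper using split with "\\D+0*".
final class SplitStrippingZeros {
    // One or more non-digits, plus any zeros immediately after them.
    private static final Pattern DELIM = Pattern.compile("\\D+0*");

    static String[] numbers(String s) {
        String[] parts = DELIM.split(s);
        // A delimiter at the start of the input yields an empty first element; drop it.
        if (parts.length > 0 && parts[0].isEmpty()) {
            return Arrays.copyOfRange(parts, 1, parts.length);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(numbers("asd0085 sa223 9349x"))); // [85, 223, 9349]
    }
}
```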
You can try with Pattern and Matcher as well.
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("asd0085 sa223 9349x");
while (m.find()) {
System.out.println(m.group());
}
The approach I would use to solve this problem is:
String urStr = "asd0085 sa223 9349x";
urStr = urStr.replaceAll("[a-zA-Z]", "");
String[] urStrAry = urStr.split("\\s");
Remove all letters from the string.
Then split it on whitespace (\\s).
Pattern reg = Pattern.compile("\\D+");
// ...
results in the following (preceded by an empty first element, because the input starts with non-digits):
0085
223
9349
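You may try this:
reg.split("asd0085 sa223 9349x").replace("^/", "")
Note that this does not compile as written: split() returns a String[], and arrays have no replace method, so the empty elements still have to be filtered out of the array.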
Using String.split(), you get an empty string as an array element whenever two delimiters appear back to back in the string you are splitting.
For example, if you split xyyz on y, the second element will be an empty string. To avoid that, you can add a quantifier to the delimiter, y+, so that the split happens on one or more consecutive occurrences.
In your case it happens because \\D0* matches each non-digit character individually and you split on each of them, so you get back-to-back delimiters. You could of course wrap it in a quantifier:
Pattern reg = Pattern.compile("(\\D0*)+");
But what you really need there is \\D+0*.
However, if all you want is the numeric sequences in your string, I would use the Matcher#find() method instead, with \\d+ as the regex.
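A runnable illustration of the back-to-back delimiter effect, using the xyyz example and the question's own input. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper to compare split results for different delimiter regexes.
final class BackToBackDelimiters {
    static List<String> pieces(String input, String delimiterRegex) {
        return Arrays.asList(input.split(delimiterRegex));
    }

    public static void main(String[] args) {
        System.out.println(pieces("xyyz", "y"));   // [x, , z]  adjacent delimiters leave ""
        System.out.println(pieces("xyyz", "y+"));  // [x, z]    "yy" is one delimiter
        String q = "asd0085 sa223 9349x";
        System.out.println(pieces(q, "\\D0*"));    // many empty strings
        System.out.println(pieces(q, "\\D+0*"));   // only the leading empty string remains
    }
}
```

With \\D+0* the only empty element left is the first one, caused by the input starting with non-digits.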

Java (Regex?) split string between number/letter combination

I've been looking through pages and pages of Google results but haven't come across anything that could help me.
What I'm trying to do is split a string like Bananas22Apples496Pears3, and break it down into some kind of readable format. Since String.split() cannot do this, I was wondering if anyone could point me to a regex snippet that could accomplish this.
Expanding a bit: the above string would be split into (String[] for simplicity's sake):
{"Bananas:22", "Apples:496", "Pears:3"}
Try this:
String s = "Bananas22Apples496Pears3";
String[] res = s.replaceAll("(?<=\\p{L})(?=\\d)", ":").split("(?<=\\d)(?=\\p{L})");
for (String t : res) {
System.out.println(t);
}
The first step inserts a ":" at every empty position that has a letter on its left (checked with the lookbehind assertion (?<=\\p{L})) and a digit on its right (checked with the lookahead assertion (?=\\d)).
Then the result is split wherever a digit is on the left and a letter is on the right.
\\p{L} is a Unicode property that matches every letter in every language.
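As an alternative to replace-then-split, capturing groups can build each "Name:count" pair in a single pass. This sketch assumes every run of letters is followed by digits (letters without a trailing number are skipped); the class name LetterNumberPairs is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical one-pass alternative using capturing groups.
final class LetterNumberPairs {
    // Group 1: a run of letters (any script, \p{L}); group 2: the digits after it.
    private static final Pattern PAIR = Pattern.compile("(\\p{L}+)(\\d+)");

    static String[] pairs(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(s);
        while (m.find()) {
            out.add(m.group(1) + ":" + m.group(2));
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", pairs("Bananas22Apples496Pears3")));
    }
}
```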
You need to replace and then split the string; you can't do it with split alone.
1> Replace all matches of the following regex
(\\w+?)(\\d+)
with
$1:$2
2> Now split the result with this regex:
(?<=\\d)(?=[a-zA-Z])
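The two steps chained together, as a minimal sketch (the class name ReplaceThenSplit is hypothetical). Note that \w also matches digits and underscores; the lazy +? is what stops it at the first digit run here, and [a-zA-Z]+ would be a stricter choice.

```java
// Hypothetical helper chaining the two steps.
final class ReplaceThenSplit {
    static String[] pairs(String s) {
        // Step 1: insert ':' between each word part and the digits that follow it.
        String withColons = s.replaceAll("(\\w+?)(\\d+)", "$1:$2");
        // Step 2: split wherever a digit is immediately followed by a letter.
        return withColons.split("(?<=\\d)(?=[a-zA-Z])");
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", pairs("Bananas22Apples496Pears3")));
    }
}
```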
This should do what you want:
import java.util.regex.*;
String d = "Bananas22Apples496Pears3";
Pattern p = Pattern.compile("[A-Za-z]+|[0-9]+");
Matcher m = p.matcher(d);
while (m.find()) {
System.out.println(m.group());
}
// Bananas
// 22
// Apples
// 496
// Pears
// 3
String myText = "Bananas22Apples496Pears3";
System.out.println(myText.replaceAll("([A-Za-z]+)([0-9]+)", "$1:$2,"));
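Replace \d+ with :$0, and then split at (?<=\d)(?=[a-zA-Z]+:\d+). The lookbehind is needed: without it, the lookahead also matches inside each word (for example before "ananas:22"), breaking the words into single letters.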
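A runnable sketch of the replace-then-split recipe with the digit lookbehind in place. The class name ColonThenSplit is hypothetical. Without (?<=\d), the lookahead (?=[a-zA-Z]+:\d+) would also succeed at every letter inside a word, such as before "ananas:22".

```java
// Hypothetical helper for the ":$0" replace followed by a split.
final class ColonThenSplit {
    static String[] pairs(String s) {
        // Prefix every run of digits with ':' ($0 is the whole match):
        // "Bananas22Apples496Pears3" becomes "Bananas:22Apples:496Pears:3".
        String marked = s.replaceAll("\\d+", ":$0");
        // Split only where a digit is followed by the next "letters:digits" pair.
        return marked.split("(?<=\\d)(?=[a-zA-Z]+:\\d+)");
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", pairs("Bananas22Apples496Pears3")));
    }
}
```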
