How to remove empty results after splitting with regex in Java? - java

I want to find all the numbers in a given string (the numbers are mixed with letters, but the tokens are separated by spaces). I tried splitting the input String, but when I check the resulting array I find a lot of empty Strings. How can I change my split regex to get rid of these empty strings?
Pattern reg = Pattern.compile("\\D0*");
String[] numbers = reg.split("asd0085 sa223 9349x");
for(String s:numbers){
System.out.println(s);
}
And the result:
85
223
9349
I know that I can iterate over the array and remove the empty results. But how can I do it with the regex alone?

If you are using Java 8, you can do it in one statement like this:
String[] array = Arrays.stream(s1.split("[,]")).filter(str -> !str.isEmpty()).toArray(String[]::new);

Don't use split. Use Matcher#find(), which locates each matching substring in turn. You can do it like this:
Pattern reg = Pattern.compile("\\d+");
Matcher m = reg.matcher("asd0085 sa223 9349x");
while (m.find())
System.out.println(m.group());
which will print
0085
223
9349
Based on your regex, it seems that you also want to remove leading zeroes, as in 0085. If so, you can use a regex like 0*(\\d+) and take the part matched by group 1 (the one in parentheses), letting the leading zeroes be matched outside that group.
Pattern reg = Pattern.compile("0*(\\d+)");
Matcher m = reg.matcher("asd0085 sa223 9349x");
while (m.find())
System.out.println(m.group(1));
Output:
85
223
9349
But if you really want to use split, change "\\D0*" to "\\D+0*" so that you split on one or more non-digits (\\D+) rather than on a single non-digit (\\D). With this solution you may still need to ignore an empty first element in the result array, which appears when the string starts with a delimiter.
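A minimal sketch of that split-based variant, assuming Java 8 or later. The class name SplitNumbers and the method extract are made up for illustration; the filter simply drops the empty first element instead of special-casing it.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class SplitNumbers {
    // One or more non-digits, followed by any zeros that should be stripped.
    private static final Pattern DELIMITER = Pattern.compile("\\D+0*");

    // Splits the input and discards empty pieces (e.g. the leading one).
    static String[] extract(String input) {
        return DELIMITER.splitAsStream(input)
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // prints [85, 223, 9349]
        System.out.println(Arrays.toString(extract("asd0085 sa223 9349x")));
    }
}
```

One caveat of this pattern: a token whose digits are all zeros (for example a0) is swallowed entirely by the delimiter, so the Matcher approach with group 1 is probably the safer choice when that can occur.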

You can try with Pattern and Matcher as well.
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("asd0085 sa223 9349x");
while (m.find()) {
System.out.println(m.group());
}

One way I can think of to solve this problem is:
String urStr = "asd0085 sa223 9349x";
urStr = urStr.replaceAll("[a-zA-Z]", "");
String[] urStrAry = urStr.split("\\s");
First remove all letters from the string.
Then split it on whitespace (\\s).

Pattern reg = Pattern.compile("\\D+");
// ...
results in:
0085
223
9349

You may try this:
reg.split("asd0085 sa223 9349x").replace("^/", "")

With String.split(), you get an empty string as an array element whenever the string you are splitting contains back-to-back delimiters.
For example, if you split xyyz on y, the second element will be an empty string. To avoid that, add a quantifier to the delimiter (y+) so that the split happens on one or more repetitions.
In your case it happens because \\D0* matches each non-digit character separately and splits on every one of them, so you get back-to-back delimiters. You could of course wrap the whole pattern in a quantifier:
Pattern reg = Pattern.compile("(\\D0*)+");
But what you really need is \\D+0*.
However, if all you want is the numeric sequences from your string, I would use the Matcher#find() method instead, with \\d+ as the regex.
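To make the back-to-back delimiter effect concrete, here is a small sketch that runs the patterns discussed above. The class DelimiterQuantifierDemo and its helper splitOn are hypothetical names, not part of any library.

```java
import java.util.Arrays;

final class DelimiterQuantifierDemo {
    // Thin wrapper around String.split so the results are easy to compare.
    static String[] splitOn(String input, String delimiterRegex) {
        return input.split(delimiterRegex);
    }

    public static void main(String[] args) {
        String input = "asd0085 sa223 9349x";

        // prints [x, , z]
        System.out.println(Arrays.toString(splitOn("xyyz", "y")));
        // prints [x, z]
        System.out.println(Arrays.toString(splitOn("xyyz", "y+")));
        // prints [, , , 85, , , 223, 9349]
        System.out.println(Arrays.toString(splitOn(input, "\\D0*")));
        // prints [, 85, 223, 9349]
        System.out.println(Arrays.toString(splitOn(input, "\\D+0*")));
    }
}
```

Even with the quantifier, the first element is still empty because the input starts with a delimiter; trailing empty strings are dropped by split itself.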

Related

Matching three or more identical characters - Java program [duplicate]

I have this small piece of code
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]"))
{
System.out.println(s);
}
}
Supposed to print
dkoe
but it prints nothing!!
Welcome to Java's misnamed .matches() method... It tries to match the ENTIRE input. Unfortunately, other languages have followed suit :(
If you want to see whether the regex matches anywhere in an input text, use a Pattern, a Matcher and the .find() method of the matcher:
Pattern p = Pattern.compile("[a-z]");
Matcher m = p.matcher(inputstring);
if (m.find())
// match
If what you want is indeed to see if an input only has lowercase letters, you can use .matches(), but you need to match one or more characters: append a + to your character class, as in [a-z]+. Or use ^[a-z]+$ and .find().
[a-z] matches a single char between a and z. So, if your string was just "d", for example, then it would have matched and been printed out.
You need to change your regex to [a-z]+ to match one or more chars.
String.matches returns whether the whole string matches the regex, not just any substring.
Java's String.matches() tries to match the whole string.
That's different from Perl, where a match succeeds if any part of the string matches.
If you want to test for a string with nothing but lowercase characters, use the pattern [a-z]+.
If you want to test for a string containing at least one lowercase character, use the pattern .*[a-z].*.
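The difference between whole-string and substring matching can be sketched like this. The class MatchesVsFind and both method names are invented for the example; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Pattern;

final class MatchesVsFind {
    private static final Pattern LOWER = Pattern.compile("[a-z]+");

    // True only if the entire input consists of lowercase letters.
    static boolean isAllLowercase(String s) {
        return LOWER.matcher(s).matches();
    }

    // True if at least one lowercase letter occurs anywhere in the input.
    static boolean containsLowercase(String s) {
        return LOWER.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"{apf", "hum_", "dkoe", "12f"}) {
            System.out.println(s + " all=" + isAllLowercase(s)
                    + " any=" + containsLowercase(s));
        }
    }
}
```

For the four sample words, only dkoe passes isAllLowercase, while all four pass containsLowercase.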
I used:
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]+"))
{
System.out.println(s);
}
}
I have faced the same problem once:
Pattern ptr = Pattern.compile("^[a-zA-Z][\\']?[a-zA-Z\\s]+$");
The above failed!
Pattern ptr = Pattern.compile("(^[a-zA-Z][\\']?[a-zA-Z\\s]+$)");
The above worked with pattern within ( and ).
Your regular expression [a-z] doesn't match dkoe since it only matches Strings of length 1. Use something like [a-z]+.
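You can wrap the pattern in a capture group () with anchors, although matches() already tests the whole string, so the part that actually fixes it is the + quantifier. A corrected pattern looks like this: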
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("(^[a-z]+$)"))
{
System.out.println(s);
}
}
You can make your pattern case insensitive by doing:
Pattern p = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);

Extracting numbers into a string array

I have a string which is of the form
String str = "124333 is the otp of candidate number 9912111242. Please refer txn id 12323335465645 while referring blah blah.";
I need 124333, 9912111242 and 12323335465645 in a string array. I have tried this with
while (Character.isDigit(sms.charAt(i)))
I feel that running the above check on every character is inefficient. Is there a way I can get a string array of all the numbers?
Use a regex (see Pattern and matcher):
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(str); // str is your input string
while (m.find()) {
// m.group() contains the digits you want
}
You can easily build an ArrayList that contains each match you find.
Or, as others suggested, you can split on non-digit characters (\D):
"blabla 123 blabla 345".split("\\D+")
Note that \ has to be escaped in a Java string literal, hence the \\.
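The Matcher loop described above could collect its matches as follows. This is only a sketch; DigitRuns and extract are illustrative names. It also shows one difference from the split approach: when the text starts with a non-digit, split produces a leading empty string, while the Matcher version does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DigitRuns {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Collects every run of digits in the text, in order of appearance.
    static String[] extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "blabla 123 blabla 345";
        // prints [123, 345]
        System.out.println(Arrays.toString(extract(text)));
        // prints [, 123, 345] because the text begins with a delimiter
        System.out.println(Arrays.toString(text.split("\\D+")));
    }
}
```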
You can use String.split():
String[] nbs = str.split("[^0-9]+");
This will split the String on any run of non-digit characters.
And this works perfectly for your input.
String str = "124333 is the otp of candidate number 9912111242. Please refer txn id 12323335465645 while referring blah blah.";
System.out.println(Arrays.toString(str.split("\\D+")));
Output:
[124333, 9912111242, 12323335465645]
\\D+ Matches one or more non-digit characters. Splitting the input according to one or more non-digit characters will give you the desired output.
Java 8 style:
long[] numbers = Pattern.compile("\\D+")
.splitAsStream(str)
.mapToLong(Long::parseLong)
.toArray();
Ah, if you only need a String array, then you can just use String.split as the other answers suggest.
Alternatively, you can try this:
String str = "124333 is the otp of candidate number 9912111242. Please refer txn id 12323335465645 while referring blah blah.";
str = str.replaceAll("\\D+", ",");
System.out.println(Arrays.asList(str.split(",")));
\\D+ matches one or more non digits
Output
[124333, 9912111242, 12323335465645]
The first thing that came to my mind was to filter and split, then I realized that it can be done with
String[] result =str.split("\\D+");
\D matches any non-digit character, + says that one or more of these are needed, and the leading \ escapes the other \, since in a Java string literal \D on its own would be parsed as the escape sequence 'D', which is invalid.

Regex to match 2 or more commas

I'm trying to write a regex that will identify whether a string has 2 or more consecutive commas. For example:
hello,,457
,,,,,
dog,,,elephant,,,,,
Can anyone help on what a valid regex would be?
String str ="hello,,,457";
Pattern pat = Pattern.compile("[,]{2,}");
Matcher matcher = pat.matcher(str);
if(matcher.find()){
System.out.println("contains 2 or more commas");
}
The regex below matches strings which have two or more consecutive commas:
^.*?,,+.*$
You don't need to include the start and end anchors when using the regex with the matches method.
System.out.println("dog,,,elephant,,,,,".matches(".*?,,+.*"));
Output:
true
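Both answers above can be checked against the sample inputs from the question. The sketch below uses find() with ,{2,} (equivalent to the [,]{2,} pattern above); the class ConsecutiveCommas and its method are hypothetical names, and a,b,c is an extra negative case that is not in the question.

```java
import java.util.regex.Pattern;

final class ConsecutiveCommas {
    private static final Pattern TWO_OR_MORE = Pattern.compile(",{2,}");

    // True if the input contains a run of at least two commas.
    static boolean hasConsecutiveCommas(String s) {
        return TWO_OR_MORE.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"hello,,457", ",,,,,", "dog,,,elephant,,,,,", "a,b,c"};
        for (String s : samples) {
            // The first three samples print true, the last one prints false.
            System.out.println(s + " -> " + hasConsecutiveCommas(s));
        }
    }
}
```

With find() no leading or trailing .* is needed, since the matcher scans for the run of commas anywhere in the input.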
Try:
int occurance = StringUtils.countOccurrencesOf("dog,,,elephant,,,,,", ",,");
or
int count = StringUtils.countMatches("dog,,,elephant,,,,,", ",,");
depending on which library you use (countOccurrencesOf is in Spring's StringUtils, countMatches is in Apache Commons Lang's StringUtils).
See also: Java: How do I count the number of occurrences of a char in a String?

split strings with uppercase

I have some strings that I want to split word by word. They come in different formats, like:
THIS-IS-MY-STRING
ThisIsMyString
This_Is_My_String
This is my string
I use:
String[] x = str1.split("(?=[A-Z])|[_]|[-]|[ ]");
But there are some problems:
some elements in the x array will be empty
for the first string I want “THIS”, but the result of the split is “T”, “H”, “I”, “S”
How should I change the split to achieve this? Could you please help me?
You need to include a look-behind as well. Here you go:
String[] x = str1.split("([-_ ]|(?<=[^-_ A-Z])(?=[A-Z]))");
[-_ ] means - or _ or space.
(?<=[^-_ A-Z]) means the previous character isn't a -, _, space, or A-Z.
(?=[A-Z]) means the next character is A-Z.
Reference.
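As a quick check, the look-behind pattern above can be run against the four formats from the question. This is a sketch only; WordSplitter and words are made-up names, and the outer parentheses of the original pattern are omitted because they do not affect split.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class WordSplitter {
    // A separator character, or the zero-width position between a character
    // that is not a separator or uppercase letter and a following uppercase letter.
    private static final Pattern BOUNDARY =
            Pattern.compile("[-_ ]|(?<=[^-_ A-Z])(?=[A-Z])");

    static String[] words(String s) {
        return BOUNDARY.split(s);
    }

    public static void main(String[] args) {
        // prints [THIS, IS, MY, STRING]
        System.out.println(Arrays.toString(words("THIS-IS-MY-STRING")));
        // prints [This, Is, My, String]
        System.out.println(Arrays.toString(words("ThisIsMyString")));
        // prints [This, Is, My, String]
        System.out.println(Arrays.toString(words("This_Is_My_String")));
        // prints [This, is, my, string]
        System.out.println(Arrays.toString(words("This is my string")));
    }
}
```

An input that begins with a separator, such as _CITY_ABC, still yields an empty first element with this pattern.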
EDIT:
Unfortunately, there is no way (that I know of) to use split on _CITY_ABC without getting either _CITY or an empty string.
You could instead check the first and last strings and drop them if they are empty, but this is not ideal.
For this I suggest a Matcher:
String str1 = "_CityCITY_";
Pattern p = Pattern.compile("[A-Z][a-z]+(?=[A-Z]|$)|[A-Za-z]+(?=[-_ ]|$)");
Matcher m = p.matcher(str1);
while (m.find())
System.out.println(m.group());
Try Regex.Split(). The first parameter is the string to split and the second is your regular expression. (Note that Regex.Split() is the .NET API; in Java the counterparts are String.split() and Pattern.split().) Hope this helps.
