I'm trying to write a regex that will identify whether a string has 2 or more consecutive commas. For example:
hello,,457
,,,,,
dog,,,elephant,,,,,
Can anyone help on what a valid regex would be?
String str = "hello,,,457";
Pattern pat = Pattern.compile("[,]{2,}");
Matcher matcher = pat.matcher(str);
if (matcher.find()) {
    System.out.println("contains 2 or more commas");
}
The regex below matches strings that contain two or more consecutive commas:
^.*?,,+.*$
You don't need to include the start and end anchors when using the regex with the matches method, because matches already requires the whole string to match.
System.out.println("dog,,,elephant,,,,,".matches(".*?,,+.*"));
Output:
true
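To compare the two approaches above side by side, here is a minimal sketch that runs both find() and matches() over the sample strings from the question. The class and method names (CommaCheck, hasConsecutiveCommas) and the extra sample "a,b,c" are made up for illustration.

```java
import java.util.regex.Pattern;

class CommaCheck {
    // Compiled once and reused; ",{2,}" means "two or more commas in a row".
    private static final Pattern TWO_OR_MORE_COMMAS = Pattern.compile(",{2,}");

    // find() looks for the pattern anywhere in the input.
    static boolean hasConsecutiveCommas(String s) {
        return TWO_OR_MORE_COMMAS.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"hello,,457", ",,,,,", "dog,,,elephant,,,,,", "a,b,c"};
        for (String s : samples) {
            // matches() must cover the whole input, hence the surrounding ".*".
            boolean viaMatches = s.matches(".*,,+.*");
            System.out.println(s + " -> find: " + hasConsecutiveCommas(s)
                    + ", matches: " + viaMatches);
        }
    }
}
```

Both checks should agree on every sample: true for the three strings from the question and false for "a,b,c". Note that "." does not match line terminators by default, so the matches() variant assumes single-line input.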
Try:
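int occurrences = StringUtils.countOccurrencesOf("dog,,,elephant,,,,,", ",,");
or
int count = StringUtils.countMatches("dog,,,elephant,,,,,", ",,");
depending on which library you use (countOccurrencesOf is in Spring's StringUtils, countMatches in Apache Commons Lang's StringUtils). A result greater than 0 means the string contains at least two consecutive commas.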
Check the solution here: Java: How do I count the number of occurrences of a char in a String?
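If neither library is on your classpath, the same count can be sketched in plain Java with indexOf. This is an illustration only: the names CommaCount and countOccurrences are hypothetical, and the sketch assumes you want non-overlapping occurrences counted from left to right.

```java
class CommaCount {
    // Counts non-overlapping occurrences of sub in text, scanning left to right.
    static int countOccurrences(String text, String sub) {
        if (sub.isEmpty()) {
            return 0; // avoid an endless loop on an empty search string
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(sub, from)) != -1) {
            count++;
            from += sub.length(); // skip past this occurrence
        }
        return count;
    }

    public static void main(String[] args) {
        int count = countOccurrences("dog,,,elephant,,,,,", ",,");
        System.out.println(count);      // prints 3
        System.out.println(count > 0);  // prints true: consecutive commas are present
    }
}
```

For a plain yes/no answer, "dog,,,elephant,,,,,".contains(",,") is enough and needs no counting at all.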
I am far from mastering regular expressions but I would like to split a string on first and last underscore e.g.
"hello_5_9_2018_world"
to
"hello"
"5_9_2018"
"world"
I can split it on the last underscore with
String[] splitArray = subjectString.split("_(?=[^_]*$)");
but I am not able to figure out how to split on first underscore.
Could anyone show me how I can do this?
Thanks
David
You can achieve this without regex by finding the first and last index of _ and taking substrings based on them.
String s = "hello_5_9_2018_world";
int firstIndex = s.indexOf("_");
int lastIndex = s.lastIndexOf("_");
System.out.println(s.substring(0, firstIndex));
System.out.println(s.substring(firstIndex + 1, lastIndex));
System.out.println(s.substring(lastIndex + 1));
The above prints
hello
5_9_2018
world
Note:
If the string does not have two _ you will get a StringIndexOutOfBoundsException.
To safeguard against it, you can check if the extracted indices are valid.
If firstIndex == lastIndex == -1 then it means the string does
not have any underscores.
If firstIndex == lastIndex then the string has just one underscore.
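The index checks described in the note can be sketched as follows. The class and method names (UnderscoreSplit, splitOnFirstAndLast) are hypothetical, and returning null for invalid input is just one possible design; throwing an exception or returning an Optional would work as well.

```java
import java.util.Arrays;

class UnderscoreSplit {
    // Returns {first, middle, last}, or null when the input has fewer than two underscores.
    static String[] splitOnFirstAndLast(String s) {
        int firstIndex = s.indexOf('_');
        int lastIndex = s.lastIndexOf('_');
        // firstIndex == -1: no underscore at all; firstIndex == lastIndex: exactly one.
        if (firstIndex == -1 || firstIndex == lastIndex) {
            return null;
        }
        return new String[] {
            s.substring(0, firstIndex),
            s.substring(firstIndex + 1, lastIndex),
            s.substring(lastIndex + 1)
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnFirstAndLast("hello_5_9_2018_world")));
        // prints [hello, 5_9_2018, world]
        System.out.println(Arrays.toString(splitOnFirstAndLast("a_b")));  // prints null
        System.out.println(Arrays.toString(splitOnFirstAndLast("abc")));  // prints null
    }
}
```

With exactly two underscores, as in "a_b_c", the middle part is simply "b"; with two adjacent underscores, as in "a__c", it is an empty string.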
If you always have three parts as above, you can use
([^_]*)_(.*)_([^_]*)
and read the individual elements as groups 1, 2 and 3.
Regular Expression
(?<first>[^_]+)_(?<middle>.+)_(?<last>[^_]+)
Java Code
final String str = "hello_5_9_2018_world";
// The middle group is greedy, so it absorbs every underscore except the first and the last.
Pattern pattern = Pattern.compile("(?<first>[^_]+)_(?<middle>.+)_(?<last>[^_]+)");
Matcher matcher = pattern.matcher(str);
if (matcher.matches()) {
    String first = matcher.group("first");    // "hello"
    String middle = matcher.group("middle");  // "5_9_2018"
    String last = matcher.group("last");      // "world"
}
I see that a lot of guys provided their solution, but I have another regex pattern for your question
You can achieve your goal with this pattern:
"([a-zA-Z]+)_(.*)_([a-zA-Z]+)"
The whole code looks like this:
String subjectString= "hello_5_9_2018_world";
Pattern pattern = Pattern.compile("([a-zA-Z]+)_(.*)_([a-zA-Z]+)");
Matcher matcher = pattern.matcher(subjectString);
if(matcher.matches()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
}
It outputs:
hello
5_9_2018
world
While the other answers are actually nicer and better, if you really want to use split, this is the way to go:
"hello_5_9_2018_world".split("((?<=^[^_]*)_)|(_(?=[^_]*$))")
==> String[3] { "hello", "5_9_2018", "world" }
This is a combination of your lookahead pattern (_(?=[^_]*$))
and the symmetrical lookbehind pattern ((?<=^[^_]*)_),
which matches the _ preceded by ^ (start of the string) and [^_]* (zero or more non-underscore characters).
I have this small piece of code
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]"))
{
System.out.println(s);
}
}
Supposed to print
dkoe
but it prints nothing!!
Welcome to Java's misnamed .matches() method... It tries and matches ALL the input. Unfortunately, other languages have followed suit :(
If you want to see if the regex matches an input text, use a Pattern, a Matcher and the .find() method of the matcher:
Pattern p = Pattern.compile("[a-z]");
Matcher m = p.matcher(inputstring);
if (m.find()) {
    // match
}
If what you want is indeed to see if an input only has lowercase letters, you can use .matches(), but you need to match one or more characters: append a + to your character class, as in [a-z]+. Or use ^[a-z]+$ and .find().
[a-z] matches a single char between a and z. So, if your string was just "d", for example, then it would have matched and been printed out.
You need to change your regex to [a-z]+ to match one or more chars.
String.matches returns whether the whole string matches the regex, not just any substring.
Java's matches() method tries to match the whole string.
That's different from Perl's regex matching, which looks for a matching part.
If you want to find a string with nothing but lowercase characters, use the pattern [a-z]+
If you want to find a string containing at least one lowercase character, use the pattern .*[a-z].*
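To see the difference between whole-string matching and substring search on the words from the question, here is a small sketch. The class name MatchesVsFind and the helper containsLowercase are made up for illustration.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    private static final Pattern ONE_LOWERCASE = Pattern.compile("[a-z]");

    // find() succeeds as soon as any part of the input matches.
    static boolean containsLowercase(String s) {
        return ONE_LOWERCASE.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] words = {"{apf", "hum_", "dkoe", "12f", "d"};
        for (String s : words) {
            System.out.println(s
                    + "  matches [a-z]: " + s.matches("[a-z]")     // true only for "d"
                    + "  matches [a-z]+: " + s.matches("[a-z]+")   // true for "dkoe" and "d"
                    + "  find [a-z]: " + containsLowercase(s));    // true for all five
        }
    }
}
```

Only the single-character word "d" passes matches("[a-z]"), which is why the original loop printed nothing for "dkoe".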
Use:
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]+"))
{
System.out.println(s);
}
}
I have faced the same problem once:
Pattern ptr = Pattern.compile("^[a-zA-Z][\\']?[a-zA-Z\\s]+$");
The above failed!
Pattern ptr = Pattern.compile("(^[a-zA-Z][\\']?[a-zA-Z\\s]+$)");
The above worked with pattern within ( and ).
Your regular expression [a-z] doesn't match dkoe since it only matches Strings of length 1. Use something like [a-z]+.
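With matches(), neither a capture group () nor the ^ and $ anchors are required, because the whole input has to match anyway. The essential fix is the + quantifier, so a pattern like this works as well: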
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("(^[a-z]+$)"))
{
System.out.println(s);
}
}
You can make your pattern case insensitive by doing:
Pattern p = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);
I want to find all numbers in a given string (the numbers are mixed with letters but separated by spaces). I tried to split the input String, but when I check the resulting array I find that it contains a lot of empty Strings. How can I change my split regex to get rid of these empty Strings?
Pattern reg = Pattern.compile("\\D0*");
String[] numbers = reg.split("asd0085 sa223 9349x");
for(String s:numbers){
System.out.println(s);
}
And the result:
85
223
9349
I know that I can iterate over the array and to remove empty results. But how to do it only with regex?
If you are using Java 8, you can filter out the empty strings in one statement like this:
String[] array = Arrays.stream(s1.split("[,]")).filter(str -> !str.isEmpty()).toArray(String[]::new);
Don't use split. Use the find method, which locates each matching substring in turn. You can do it like this:
Pattern reg = Pattern.compile("\\d+");
Matcher m = reg.matcher("asd0085 sa223 9349x");
while (m.find())
System.out.println(m.group());
which will print
0085
223
9349
Based on your regex, it seems that your goal is also to remove leading zeroes, as in the case of 0085. If so, you can use a regex like 0*(\\d+) and take the part matched by group 1 (the one in parentheses), letting the leading zeroes be matched outside of that group.
Pattern reg = Pattern.compile("0*(\\d+)");
Matcher m = reg.matcher("asd0085 sa223 9349x");
while (m.find())
System.out.println(m.group(1));
Output:
85
223
9349
But if you really want to use split, change "\\D0*" to "\\D+0*" so that you split on one or more non-digits (\\D+) rather than on a single non-digit (\\D). With this solution you may still need to ignore the first, empty element of the result array, depending on whether the string starts with something the regex splits on.
You can try with Pattern and Matcher as well.
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("asd0085 sa223 9349x");
while (m.find()) {
System.out.println(m.group());
}
One way I can think of to solve this problem:
String urStr = "asd0085 sa223 9349x";
urStr = urStr.replaceAll("[a-zA-Z]", "");
String[] urStrAry = urStr.split("\\s");
First remove all letters from the string.
Then split it on whitespace (\\s).
Pattern reg = Pattern.compile("\\D+");
// ...
results in:
0085
223
9349
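You may try stripping the leading non-digits before splitting, so that the result does not start with an empty element:
String[] numbers = "asd0085 sa223 9349x".replaceFirst("^\\D+", "").split("\\D+");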
With String.split(), you get an empty string as an array element whenever the string you're splitting contains back-to-back delimiters.
For example, if you split xyyz on y, the 2nd element will be an empty string. To avoid that, you can add a quantifier to the delimiter, y+, so that the split happens on one or more repetitions.
In your case it happens because you've used \\D0*, which matches each non-digit character individually and splits on it, so consecutive letters act as back-to-back delimiters. You can of course wrap the whole pattern in a quantifier:
Pattern reg = Pattern.compile("(\\D0*)+");
But what you really need there is \\D+0*.
However, if all you want is the numeric sequences from your string, I would use the Matcher#find() method instead, with \\d+ as the regex.
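The effect of each delimiter pattern is easier to see when the resulting arrays are printed with Arrays.toString, which makes the empty strings visible. The sketch below is an illustration under that assumption; the class SplitDemo and its helpers splitWith and findNumbers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitDemo {
    // Splits input on the given delimiter regex (trailing empty strings are dropped by split).
    static String[] splitWith(String regex, String input) {
        return Pattern.compile(regex).split(input);
    }

    // Collects every run of digits using find() instead of split().
    static List<String> findNumbers(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher(input);
        while (m.find()) {
            numbers.add(m.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        String input = "asd0085 sa223 9349x";
        // Each letter is its own delimiter, so empty strings appear between them.
        System.out.println(Arrays.toString(splitWith("\\D0*", input)));
        // prints [, , , 85, , , 223, 9349]
        // One or more non-digits form a single delimiter; only the leading empty string remains.
        System.out.println(Arrays.toString(splitWith("\\D+0*", input)));
        // prints [, 85, 223, 9349]
        System.out.println(Arrays.toString(splitWith("(\\D0*)+", input)));
        // prints [, 85, 223, 9349]
        System.out.println(findNumbers(input));
        // prints [0085, 223, 9349]
    }
}
```

The leading empty element survives in the split variants because the input starts with a delimiter, which is one more reason to prefer find() when only the numbers are needed.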