Match a character exactly once with regex - java

How can I match a character exactly once with a regex in Java? Say I want to find strings that contain the digit 3 exactly once, no matter where it is.
I tried ".*3{1}.*", but this also matches "330", because the period accepts any character, including another 3. How can I fix this?

^[^3]*3[^3]*$
Match (not three), then three, then (not three).
Edit: Added ^ and $ at the beginning and end. This forces the regex to match the whole line. Thanks @Bobbyrogers and @Mindastic.
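A minimal sketch of how the pattern above might be used from Java. The class and method names are hypothetical, chosen only for this example. Since Matcher.matches() already requires the whole input to match, the anchors are redundant here, but they are kept so the pattern behaves the same with find().

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are made up for this sketch.
class ExactlyOneThree {
    // Any non-3 characters, a single 3, then any non-3 characters.
    private static final Pattern ONE_THREE = Pattern.compile("^[^3]*3[^3]*$");

    static boolean hasExactlyOneThree(String s) {
        return ONE_THREE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasExactlyOneThree("130")); // true
        System.out.println(hasExactlyOneThree("330")); // false
        System.out.println(hasExactlyOneThree("120")); // false
    }
}
```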

A non-regex solution:
int index = s.indexOf('3');
boolean unique = index != -1 && index == s.lastIndexOf('3');
Basically the character is unique if the first and last occurrences are at the same place and exist in the string (not -1).
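The same idea can be wrapped in a small method that works for any character. This is only a sketch; the class and method names are hypothetical.

```java
// Hypothetical helper; generalizes the indexOf/lastIndexOf check to any character.
class UniqueChar {
    static boolean occursExactlyOnce(String s, char c) {
        int first = s.indexOf(c);
        // Present at least once, and the first and last occurrences coincide.
        return first != -1 && first == s.lastIndexOf(c);
    }

    public static void main(String[] args) {
        System.out.println(occursExactlyOnce("130", '3')); // true
        System.out.println(occursExactlyOnce("330", '3')); // false
        System.out.println(occursExactlyOnce("120", '3')); // false
    }
}
```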

Please justify the output in Regex Java program

I came across a Java regex program.
Here is the code:
import java.util.regex.*;

public class Regex_demo01 {
    public static void main(String[] args) {
        boolean b = true;
        Pattern p = Pattern.compile("\\d*");
        Matcher m = p.matcher("ab34ef");
        while (b = m.find()) {
            System.out.println(b);
            System.out.println(">" + m.start() + "\t" + m.group() + "<");
        }
    }
}
Output :
true
>0 <
true
>1 <
true
>2 34<
true
>4 <
true
>5 <
true
>6 <
Doubt: As we know, find() returns true if it finds a match and remembers the start position of that match. If find() returns true, you can call start() to get the starting position of the match and group() to get the part of the input that was actually matched.
My question is: how come ">6 <" is present in the output when the string is only indexed up to 5?
The answer is simple: x* matches any number of x, even 0.
Replace * with +, which matches one or more of the element to its left.
My question is: how come >6 < is present in the output when the string is only indexed up to 5?
That behavior is due to your regex i.e. \\d* which matches 0 or more digits.
As you can see it is showing start position 0 as well when there is no digit at the start.
Similarly 6 is last index +1 because there is an empty match past the last character as well.
You should use \\d+ as your regex.
The star quantifier (*) is defined as "zero or more times". That said, your pattern matches zero digits most of the time.
What you actually want is probably the plus quantifier (+), which means "one or more times".
Source: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
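To illustrate the difference, here is a small sketch that runs \\d+ over the same input and records each match as start:text. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; collects every run of one or more digits.
class DigitRuns {
    static List<String> findDigitRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher(input);
        while (m.find()) {
            // With +, an empty match is impossible, so only real digit runs appear.
            runs.add(m.start() + ":" + m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findDigitRuns("ab34ef")); // [2:34]
    }
}
```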
Why is there a match at index 6?
RegEx doesn't work on a per-character basis, but rather on the positions in between characters. When matching an empty string, it will look before and after every character. Duplicate findings are omitted, of course, so an empty string after the first char and before the second char will yield one match instead of two. By default the algorithm is greedy, which means it will match as many characters as possible.
Consider this example:
Input string is 1
RegEx is \\d*
In this case the RegEx engine starts before the first character and tries to match zero, one or more digits. Since it's greedy, it doesn't stop after the empty string it finds at the beginning. It finds a '1' with no digits following. This is the first match. Then it continues the search after the match. It finds an empty string and matches it too, since that equals zero digits.
For RegEx the string '1' looks rather like this:
"" + "1" + ""
The first two units (empty string and the "1") match the pattern, the third, empty string does, too.
In-depth article about this: http://www.regular-expressions.info/zerolength.html
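The walk-through above can be reproduced with a short sketch that lists every match as start:[text], so empty matches show up as []. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; makes zero-length matches visible.
class ZeroLengthMatches {
    static List<String> describeMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.start() + ":[" + m.group() + "]");
        }
        return found;
    }

    public static void main(String[] args) {
        // The digit itself, then the empty string after it.
        System.out.println(describeMatches("\\d*", "1"));      // [0:[1], 1:[]]
        // Same input as in the question; the last entry is the empty match at index 6.
        System.out.println(describeMatches("\\d*", "ab34ef"));
    }
}
```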

How do I implement this regex function in Java?

The string should be at most 15 characters long, consist of digits only, and contain at least two single whitespace characters anywhere in the string.
It is easy to find a solution for the numeric part, but I'm stuck on adding the condition for the whitespace.
I searched the most frequently asked regex questions but couldn't find anything similar.
EDIT:
Additional conditions
whitespaces cannot be next to each other
they must not be the first or last character
I suppose for your demands, something like this would work:
\d+(\s\d+){2,}
But you'll need to check the length separately (e.g. input.length() <= 15).
This expression says:
Digits at the beginning.
Then a single space followed by digits, with at least two such combinations.
This ensures that no two spaces are adjacent and that there are at least two of them. It also prevents a space at the beginning or the end, and allows more than two spaces.
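A minimal sketch combining the pattern with the separate length check. The class and method names are hypothetical. Note that \s also accepts tabs and line breaks; if only the plain space character should be allowed, a literal space could be used in its place.

```java
// Hypothetical validator; combines the regex with the length check.
class SpacedDigits {
    static boolean isValid(String input) {
        // Digits, then at least two groups of "one whitespace + digits".
        return input.length() <= 15 && input.matches("\\d+(\\s\\d+){2,}");
    }

    public static void main(String[] args) {
        System.out.println(isValid("12 34 56"));          // true
        System.out.println(isValid("12 34"));             // false: only one space
        System.out.println(isValid("12  34 56"));         // false: adjacent spaces
        System.out.println(isValid(" 12 34 56"));         // false: leading space
        System.out.println(isValid("1234567 1234567 1")); // false: 17 characters
    }
}
```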
You can use this regex: [0-9\s]{2,15}
And in your Java code you check if there are three parts separated by a whitespace:
String input =...;
if (input.matches("[0-9\\s]{2,15}") && (input.split(" ").length == 3)) {
    System.out.println("valid input");
}
Edited: Leading and ending whitespaces, connected whitespaces are not allowed

regex to strip leading zeros treated as string

I have numbers like this that need their leading zeros removed.
Here is what I need:
00000004334300343 -> 4334300343
0003030435243 -> 3030435243
I can't figure this out as I'm new to regular expressions. This does not work:
(^0)
You're almost there. You just need a quantifier:
str = str.replaceAll("^0+", "");
It replaces one or more occurrences of 0 at the beginning of the string with an empty string. The + quantifier means "one or more" (similarly, the * quantifier means "zero or more"), and the caret ^ anchors the match to the beginning of the string.
Accepted solution will fail if you need to get "0" from "00". This is the right one:
str = str.replaceAll("^0+(?!$)", "");
^0+(?!$) means match one or more zeros if it is not followed by end of string.
Thank you to the commenter - I have updated the formula to match the description from the author.
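A short sketch comparing the two patterns on an all-zero input. The class and method names are hypothetical.

```java
// Hypothetical demo class; contrasts "^0+" with "^0+(?!$)".
class LeadingZeros {
    // Removes every leading zero, even if nothing is left afterwards.
    static String stripAll(String s) {
        return s.replaceAll("^0+", "");
    }

    // Removes leading zeros but never the final character of the string.
    static String stripKeepLast(String s) {
        return s.replaceAll("^0+(?!$)", "");
    }

    public static void main(String[] args) {
        System.out.println("[" + stripAll("00") + "]");      // []
        System.out.println("[" + stripKeepLast("00") + "]"); // [0]
        System.out.println(stripKeepLast("0003030435243"));  // 3030435243
    }
}
```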
If you know the input strings contain only digits (and the value fits in a long), then you can do:
String s = "00000004334300343";
System.out.println(Long.valueOf(s));
// 4334300343
By converting to Long it will automatically strip off all leading zeroes.
Another solution (might be more intuitive to read)
str = str.replaceFirst("^0+", "");
^ - match the beginning of a line
0+ - match the zero digit character one or more times
An exhaustive list of pattern constructs can be found in the Pattern documentation.
\b0+\B will do the job. \b anchors the match to a word boundary, 0+ matches a sequence of one or more zeros, and \B requires the match to end at a position that is not a word boundary (so the last 0 is kept when the input is only 00...000).
The correct regex to strip leading zeros is
str = str.replaceAll("^0+", "");
This regex matches one or more 0 characters at the beginning of the string.
There is no reason to worry about the replaceAll method: the ^ (beginning of input) anchor ensures that the replacement happens only once.
Alternatively, you can use a built-in Java feature to do the same:
String str = "00000004334300343";
long number = Long.parseLong(str);
// outputs 4334300343
The leading zeros will be stripped for you automatically.
I know this is an old question, but I think the best way to do this is actually
str = str.replaceAll("(^0+)?(\\d+)", "$2");
The reason I suggest this is because it splits the string into two groups. The second group is at least one digit. The first group matches 1 or more zeros at the start of the line. However, the first group is optional, meaning that if there are no leading zeros, you just get all of the digits. And, if str is only a zero, you get exactly one zero (because the second group must match at least one digit).
So if it's any number of 0s, you get back exactly one zero. If it starts with any number of 0s followed by any other digit, you get no leading zeros. If it starts with any other digit, you get back exactly what you had in the first place.
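The three cases just described can be checked with a short sketch. The class and method names are hypothetical, and the pattern is written with a doubled backslash, as a Java string literal requires.

```java
// Hypothetical demo class for the two-group replacement.
class OptionalLeadingZeros {
    static String normalize(String s) {
        // Group 1: optional run of leading zeros (dropped).
        // Group 2: at least one digit (kept via $2).
        return s.replaceAll("(^0+)?(\\d+)", "$2");
    }

    public static void main(String[] args) {
        System.out.println(normalize("0000"));          // 0
        System.out.println(normalize("0003030435243")); // 3030435243
        System.out.println(normalize("4334300343"));    // 4334300343
    }
}
```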
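Here is a simple solution in JavaScript syntax:
str = str.replaceAll(/^0+/g, "");
The global flag g is required when using replaceAll with a regex in JavaScript. Java has no regex literals or g flag; the Java equivalent is str.replaceAll("^0+", "").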

How to negate a vowel condition using Regex in java

I'm trying to construct a Regex for a string which should have these following conditions:
It must contain at least one vowel.
It cannot contain three consecutive vowels or three consecutive consonants.
It cannot contain two consecutive occurrences of the same letter, except for 'ee' or 'oo'.
I'm not able to construct regex for 2nd and 3rd conditions.
e.g:
bower - accepted,
appple - not accepted,
miiixer - not accepted,
hedding - not accepted,
feeding - accepted
Thanks in advance!
Edited:
My code:
Pattern ptn = Pattern.compile("((.*[A-Za-z0-9]*)(.*[aeiou|AEIOU]+)(.*[##$%]).*)(.*[^a]{3}.*)");
Matcher mtch = ptn.matcher("zoggax");
if (mtch.find()) {
    return true;
}
else
    return false;
The following one should suit your needs:
(?=.*[aeiouy])(?!.*[aeiouy]{3})(?!.*[a-z&&[^aeiouy]]{3})(?!.*([a-z&&[^eo]])\1).*
In Java:
String regex = "(?=.*[aeiouy])(?!.*[aeiouy]{3})(?!.*[a-z&&[^aeiouy]]{3})(?!.*([a-z&&[^eo]])\\1).*";
System.out.println("bower".matches(regex));
System.out.println("appple".matches(regex));
System.out.println("miiixer".matches(regex));
System.out.println("hedding".matches(regex));
System.out.println("feeding".matches(regex));
Prints:
true
false
false
false
true
Explanation:
(?=.*[aeiouy]): contains at least one vowel
(?!.*[aeiouy]{3}): does not contain 3 consecutive vowels
(?!.*[a-z&&[^aeiouy]]{3}): does not contain 3 consecutive consonants
[a-z&&[^aeiouy]]: any letter between a and z but none of aeiouy
(?!.*([a-z&&[^eo]])\1): does not contain 2 consecutive letters, except e and o
[a-z&&[^eo]]: any letter between a and z, but none of eo
See http://www.regular-expressions.info/charclassintersect.html.
This should work for English under the assumption that 'y' is a non-vowel:
^(?!.*[aeiou]{3})(?!.*[bcdfghjklmnpqrstvwxyz]{3})(?!.*([^eo])\1).*[aeiou]
Explanation:
^ fixes the match to the beginning of the string.
(?!.*[aeiou]{3}) checks that you cannot find 3 consecutive vowels at any point after the current position in the string. (Since this is immediately after the ^, it checks the entire string.) It also does not advance the cursor.
Non-vowels are tested similarly. This can be written more compactly with character class intersection, which Java supports, e.g. [a-z&&[^aeiou]].
(?!.*([^eo])\1) checks that there is no occurrence of a single-character capture group, of a character other than e or o, which is followed by a copy of itself. I.e. no character other than e and o is repeated twice in a row.
.*[aeiou] looks for a vowel at some point in the string.
This regexp also assumes that the case-insensitive flag is set. That is not the default in Java; pass Pattern.CASE_INSENSITIVE to Pattern.compile or prefix the pattern with (?i).
It is also a regexp that will find a match in a string satisfying your criteria; it will not necessarily match the whole string. If that is needed, add .*$ to the end of the regexp.
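A minimal sketch of this pattern in Java, with the case-insensitive flag passed explicitly and find() used because the pattern is not meant to match the whole string. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; 'y' is treated as a consonant, as in the answer.
class VowelRules {
    private static final String REGEX =
        "^(?!.*[aeiou]{3})(?!.*[bcdfghjklmnpqrstvwxyz]{3})(?!.*([^eo])\\1).*[aeiou]";

    private static final Pattern WITH_FLAG =
        Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);
    private static final Pattern WITHOUT_FLAG = Pattern.compile(REGEX);

    static boolean accepts(String word) {
        return WITH_FLAG.matcher(word).find();
    }

    // Same pattern without the flag, to show why the flag matters.
    static boolean acceptsCaseSensitive(String word) {
        return WITHOUT_FLAG.matcher(word).find();
    }

    public static void main(String[] args) {
        System.out.println(accepts("bower"));              // true
        System.out.println(accepts("appple"));             // false
        System.out.println(accepts("hedding"));            // false
        System.out.println(accepts("BOWER"));              // true
        System.out.println(acceptsCaseSensitive("BOWER")); // false: no lowercase vowel
    }
}
```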
If my hunch is correct that you meant to say "three consecutive occurrences of the same letter" (looking at your examples) then you can simply say "e and o may not occur thrice, everything else may not occur twice", like so:
^(?=.*[aeiouy].*)(?!.*([eo])\1\1.*)(?!.*([a-df-np-z])\2.*).*$
The key is that a letter occurring thrice is also occurring twice.

Java: substring.matches numbers

I'm trying to check whether a substring of my teleInput string contains numbers, but I can't seem to get it to return true. The result always ends up false. What am I doing wrong?
String teleInput = "(555)555-5555";
boolean returntT = teleInput.substring(1,3).matches(".*[0-9].*");
I'm an extreme beginner so I don't know if I'm missing something obvious, like mixing methods or maybe something is wrong with the rest of my code, either way, I would really appreciate the help!
Your problem is your substring: you are only getting two characters, because the second parameter is the end index, which is exclusive. To get 3 characters, you need:
teleInput.substring(1,4)
So try this (notice that matches only needs to check that it's "all digits", because the length is already known to be 3):
teleInput.substring(1,4).matches("\\d*");
Or just forget substring and use matches alone:
teleInput.matches(".\\d{3}.*");
Which is the regex for "any character then 3 digits then anything".
But you can validate the entire input in one line:
teleInput.matches("(.\\d{3}){3}\\d");
You might have to read up on regex to understand this pattern, but it works!
Note: matches() must match the entire string to be true.
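The substring indices and the whole-input pattern can be tried out with a small sketch. The class and method names are hypothetical, and the length guard is an addition to avoid an exception on short inputs.

```java
// Hypothetical demo class for the phone-number checks discussed above.
class PhoneChecks {
    // Checks that characters 1..3 (end index 4 is exclusive) are all digits.
    static boolean areaCodeIsNumeric(String tele) {
        return tele.length() >= 4 && tele.substring(1, 4).matches("\\d{3}");
    }

    // Three groups of "any character + 3 digits", then one final digit.
    static boolean looksLikePhone(String tele) {
        return tele.matches("(.\\d{3}){3}\\d");
    }

    public static void main(String[] args) {
        String teleInput = "(555)555-5555";
        System.out.println(teleInput.substring(1, 3));    // 55
        System.out.println(teleInput.substring(1, 4));    // 555
        System.out.println(areaCodeIsNumeric(teleInput)); // true
        System.out.println(looksLikePhone(teleInput));    // true
    }
}
```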
If you want to check if the 2nd character in your string is number, you can do it like this:
String teleInput = "(555)555-5555";
boolean returntT = teleInput.substring(1,2).matches("[0-9]");
Instead of this line:
boolean returntT = teleInput.substring(1,3).matches(".*[0-9].*");
You should be doing:
boolean returntT = teleInput.substring(1,3).matches("[0-9]+");
OR this:
boolean returntT = teleInput.substring(1,3).matches("\\d{2}");
The regex [0-9]+ makes sure that the given input consists only of one or more digits. For a good tutorial on regex, read: http://www.regular-expressions.info/
