Matching only one occurrence of a character from a given set - java

I need to validate an input string such that validation returns true only if the string contains one of the special characters # # $ %, only one, and one time at the most. Letters and numbers can be anywhere and can be repeated any number of times, but at least one number or letter should be present
For example:
a# : true
#a : true
a#$: false
a#n01 : true
an01 : false
a : false
# : false
I tried
[0-9A-Za-z]*[##%$]{1}[0-9A-Za-z]*
I was hoping this would match one occurrence of any of the special characters. But, no. I need only one occurrence of any one in the set.
I also tried alternation but could not solve it.

Vivek, your regex was really close. Here is the one-line regex you are looking for.
^(?=.*?[0-9a-zA-Z])[0-9a-zA-Z]*[##$%][0-9a-zA-Z]*$
See demo
How does it work?
The ^ and $ anchors ensure that whatever we are matching is the whole string, avoiding partial matches with forbidden characters later.
The (?=.*?[0-9a-zA-Z]) lookahead ensures that we have at least one number or letter.
The [0-9a-zA-Z]*[##$%][0-9a-zA-Z]* matches zero or more letters or digits, followed by exactly one character that is either a #, #, $ or %, followed by zero or more letters or digits—ensuring that we have one special character but no more.
Implementation
I am sure you know how to implement this in Java, but to test whether the string matches, you could use something like this:
boolean foundMatch = subjectString.matches("^(?=.*?[0-9a-zA-Z])[0-9a-zA-Z]*[##$%][0-9a-zA-Z]*$");
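To see the regex run against the examples from the question, here is a minimal sketch. The class name SpecialCharValidator, the method isValid and the SPECIALS constant are made up for illustration, and the special-character set is an assumption that you should adjust to the symbols you actually allow.

```java
import java.util.regex.Pattern;

class SpecialCharValidator {
    // Assumed special-character set; adjust to the symbols you actually allow.
    private static final String SPECIALS = "#$%";

    // Same shape as the regex above: a lookahead for at least one letter or digit,
    // then letters/digits, exactly one special character, letters/digits.
    private static final Pattern VALID = Pattern.compile(
            "^(?=.*?[0-9a-zA-Z])[0-9a-zA-Z]*[" + SPECIALS + "][0-9a-zA-Z]*$");

    static boolean isValid(String input) {
        return VALID.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "a#", "#a", "a#$", "a#n01", "an01", "a", "#" };
        for (String s : samples) {
            System.out.println(s + " : " + isValid(s));
        }
    }
}
```

Compiling the pattern once and reusing it avoids recompiling on every call, which String.matches() would do.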
What was wrong with my regex?
Actually, your regex was nearly there. Here is what was missing.
Because you didn't have the ^ and $ anchors, the regex was able to match a subset of the string, for instance a# in a##%%, which means that special characters could appear in the string, but outside of the match. Not what you want: we need to validate the whole string by anchoring it.
You needed something to ensure that at least one letter or digit was present. You could definitely have done it with an alternation, but in this case a lookahead is more compact.
Alternative with Alternation
Since you tried alternations, for the record, here is one way to do it:
^(?:[0-9a-zA-Z]+[##$%][0-9a-zA-Z]*|[0-9a-zA-Z]*[##$%][0-9a-zA-Z]+)$
See demo.
Let me know if you have any questions.

I hope this answer will be useful for you, if not, it might be for future readers. I am going to make two assumptions here up front: 1) You do not need regex per se, you are programming in Java. 2) You have access to Java 8.
This could be done the following way:
private boolean stringMatchesChars(final String str, final List<Character> characters) {
    return (str.chars()
            .filter(ch -> characters.contains((char) ch))
            .count() == 1);
}
Here I am:
Taking as input a String and a List<Character> of the characters to look for.
Obtaining an IntStream (consisting of chars) from the String.
Filtering the stream so that a char remains only if it is in the List<Character>.
Returning true only if count() == 1, that is, the characters in the List<Character> occur exactly once in total.
The code can be used as:
String str1 = "a";
String str2 = "a#";
String str3 = "a##a";
String str4 = "a##a";
List<Character> characters = Arrays.asList('#', '#', '$', '%');
System.out.println("stringMatchesChars(str1, characters) = " + stringMatchesChars(str1, characters));
System.out.println("stringMatchesChars(str2, characters) = " + stringMatchesChars(str2, characters));
System.out.println("stringMatchesChars(str3, characters) = " + stringMatchesChars(str3, characters));
System.out.println("stringMatchesChars(str4, characters) = " + stringMatchesChars(str4, characters));
Resulting in false, true, false, false.

Related

Please justify the output in Regex Java program

I came across a Java program that uses regex.
Below is the program code:
import java.util.regex.*;

public class Regex_demo01 {
    public static void main(String[] args) {
        boolean b = true;
        Pattern p = Pattern.compile("\\d*");
        Matcher m = p.matcher("ab34ef");
        while (b = m.find()) {
            System.out.println(b);
            System.out.println(">" + m.start() + "\t" + m.group() + "<");
        }
    }
}
Output :
true
>0 <
true
>1 <
true
>2 34<
true
>4 <
true
>5 <
true
>6 <
Doubt: As we all know, the find() method returns true if it finds a match and remembers the start position of the match. If find() returns true, you can call the start() method to get the starting position of the match, and you can call the group() method to get the string that represents the actual bit of source data that was matched.
My question is: how come ">6 <" is present in the output when the string is only indexed up to 5?
The answer is simple: x* matches any number of x, even 0.
Replace * with +, which matches one or more of the element to its left.
My question is how come >6 < is present in the output when the string is only indexed up to 5?
That behavior is due to your regex i.e. \\d* which matches 0 or more digits.
As you can see it is showing start position 0 as well when there is no digit at the start.
Similarly 6 is last index +1 because there is an empty match past the last character as well.
You should use \\d+ as your regex.
The star quantifier (*) is defined as "zero or more times". That said, your pattern matches zero digits most of the time.
What you actually want is probably the plus quantifier (+), which means "one or more times".
Source: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Why is there a match at index 6?
RegEx doesn't work on a per-character basis, but rather at the positions in between characters. When matching an empty string, it will look before and after every character. Duplicate findings are omitted, of course, so an empty string after the first char and before the second char will yield one match instead of two. By default the algorithm is greedy, which means it will match as many characters as possible.
Consider this example:
Input string is 1
RegEx is \\d*
In this case the RegEx engine starts before the first character and tries to match zero, one or more digits. Since it's greedy, it doesn't stop after the empty string it finds at the beginning. It finds a '1' with no digits following. This is the first match. Then it continues the search after the match. It finds an empty string and matches it too, since that equals zero digits.
For RegEx the string '1' looks rather like this:
"" + "1" + ""
The first two units (empty string and the "1") match the pattern, the third, empty string does, too.
In-depth article about this: http://www.regular-expressions.info/zerolength.html
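The difference between the two quantifiers can be checked with a small sketch. The class ZeroLengthMatchDemo and the helper findAll are names invented here; the helper simply records each match as "start:text".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroLengthMatchDemo {
    // Collects every match that find() reports, formatted as "start:text".
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.start() + ":" + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // \d* also matches the empty string, including the position after the last char.
        System.out.println(findAll("\\d*", "ab34ef")); // [0:, 1:, 2:34, 4:, 5:, 6:]
        // \d+ needs at least one digit, so only the real run of digits is reported.
        System.out.println(findAll("\\d+", "ab34ef")); // [2:34]
    }
}
```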

Regex to validate 4 different characters are in a string

I would like to enforce that 4 different characters will be in a string.
Valid examples:
"1q2w3e4r5t"
"abcd"
Invalid examples:
"good"
"1ab1"
Ideas for a pattern?
You should consider using a non-regex solution. I only write this answer to show a simpler regex solution for this problem.
Initial solution
Here is a simpler regex solution, which asserts that there are at least 4 distinct characters in the string:
(.).*?((?!\1).).*?((?!\1|\2).).*?((?!\1|\2|\3).).*
Demo on regex101 (PCRE and Java have the same behavior for this regex)
.*?((?!\1).), .*?((?!\1|\2).), ... searches for the next character which has not appeared before; this is implemented by checking that the character is not the same as whatever was captured in the previous capturing groups.
Logically, the laziness/greediness of the quantifier doesn't matter here. The lazy quantifier .*? is used to make the search start from the closest character which has not appeared before, rather than from the furthest character. It should slightly improve the performance in matching case, since less backtracking is done.
Used with String.matches(), which asserts that the whole string matches the regex:
input.matches("(.).*?((?!\\1).).*?((?!\\1|\\2).).*?((?!\\1|\\2|\\3).).*")
Improved solution
If you are concerned about performance:
(.)(?>.*?((?!\1).))(?>.*?((?!\1|\2).))(?>.*?((?!\1|\2|\3).)).*
Demo on regex101
With String.matches():
input.matches("(.)(?>.*?((?!\\1).))(?>.*?((?!\\1|\\2).))(?>.*?((?!\\1|\\2|\\3).)).*")
The (?>pattern) construct prevents backtracking into the group once you exit from the pattern inside. This is used to "lock" the capturing groups to the first appearance of each of the distinct characters, since the result is the same even if you pick a different character later in the string.
This regex behaves the same as a normal program which loops from left-to-right, checks the current character against a set of distinct characters and adds it to the set if the current character is not in the set.
For this reason, the lazy quantifier .*? becomes significant, since it searches for the closest character which has not appeared so far.
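The "normal program" that the regex is compared to can be sketched roughly as follows. The class DistinctChars and the method hasAtLeastDistinct are hypothetical names, and the early exit is a design choice of this sketch rather than something the regex answer prescribes.

```java
import java.util.HashSet;
import java.util.Set;

class DistinctChars {
    // Scans left to right, remembers every character seen so far,
    // and stops as soon as enough distinct characters have been found.
    static boolean hasAtLeastDistinct(String input, int required) {
        Set<Character> seen = new HashSet<>();
        for (int i = 0; i < input.length(); i++) {
            seen.add(input.charAt(i));
            if (seen.size() >= required) {
                return true;
            }
        }
        return seen.size() >= required;
    }

    public static void main(String[] args) {
        System.out.println(hasAtLeastDistinct("1q2w3e4r5t", 4));
        System.out.println(hasAtLeastDistinct("good", 4));
    }
}
```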
You can use a regular expression to validate this, with negative look-aheads checking that each newly captured character differs from the ones captured before it, until 4 distinct characters are found.
I'd say it is very ugly, but it works:
String rx = "^(.).*?((?!\\1).).*?((?!\\1|\\2).).*?((?!\\1|\\2|\\3).).*?$";
See demo
IDEONE Demo
String re = "^(.).*?((?!\\1).).*?((?!\\1|\\2).).*?((?!\\1|\\2|\\3).).*?$";
// Good
System.out.println("1q2w3e4r5t".matches(re));
System.out.println("goody".matches(re));
System.out.println("gggoooggoofr".matches(re));
// Bad
System.out.println("good".matches(re));
System.out.println("1ab1".matches(re));
Output:
true
true
true
false
false
You can count the number of distinct chars like this:
String s = "abcdefaa";
long numDistinctChars = s.chars().distinct().count();
Or if not on Java 8 (I couldn't come up with something better):
Set<Character> set = new HashSet<>();
char[] charArray = s.toCharArray();
for (char c : charArray) {
set.add(Character.valueOf(c));
}
int numDistinctChars = set.size();

How to negate a vowel condition using Regex in java

I'm trying to construct a regex for a string which should satisfy the following conditions:
It must contain at least one vowel.
It cannot contain three consecutive vowels or three consecutive consonants.
It cannot contain two consecutive occurrences of the same letter, except for 'ee' or 'oo'.
I'm not able to construct regex for 2nd and 3rd conditions.
e.g:
bower - accepted,
appple - not accepted,
miiixer - not accepted,
hedding - not accepted,
feeding - accepted
Thanks in advance!
Edited:
My code:
Pattern ptn = Pattern.compile("((.*[A-Za-z0-9]*)(.*[aeiou|AEIOU]+)(.*[##$%]).*)(.*[^a]{3}.*)");
Matcher mtch = ptn.matcher("zoggax");
if (mtch.find()) {
return true;
}
else
return false;
The following one should suit your needs:
(?=.*[aeiouy])(?!.*[aeiouy]{3})(?!.*[a-z&&[^aeiouy]]{3})(?!.*([a-z&&[^eo]])\1).*
In Java:
String regex = "(?=.*[aeiouy])(?!.*[aeiouy]{3})(?!.*[a-z&&[^aeiouy]]{3})(?!.*([a-z&&[^eo]])\\1).*";
System.out.println("bower".matches(regex));
System.out.println("appple".matches(regex));
System.out.println("miiixer".matches(regex));
System.out.println("hedding".matches(regex));
System.out.println("feeding".matches(regex));
Prints:
true
false
false
false
true
Explanation:
(?=.*[aeiouy]): contains at least one vowel
(?!.*[aeiouy]{3}): does not contain 3 consecutive vowels
(?!.*[a-z&&[^aeiouy]]{3}): does not contain 3 consecutive consonants
[a-z&&[^aeiouy]]: any letter between a and z but none of aeiouy
(?!.*([a-z&&[^eo]])\1): does not contain the same letter twice in a row, except for e and o
[a-z&&[^eo]]: any letter between a and z, but none of eo
See http://www.regular-expressions.info/charclassintersect.html.
This should work for English under the assumption that 'y' is a consonant:
^(?!.*[aeiou]{3})(?!.*[bcdfghjklmnpqrstvwxyz]{3})(?!.*([^eo])\1).*[aeiou]
Explanation:
^ fixes the match to the beginning of the string.
(?!.*[aeiou]{3}) checks that you cannot find 3 consecutive vowels at any point after the current position in the string. (Since this is immediately after the ^, it checks the entire string.) It also does not advance the cursor.
Consonants are tested similarly. This can be written more compactly in flavors that support character class intersection or subtraction; Java does, for example [a-z&&[^aeiou]].
(?!.*([^eo])\1) checks that there is no occurrence of a single-character capture group, of a character other than e or o, which is followed by a copy of itself. I.e., no character other than e and o appears twice in a row.
.*[aeiou] looks for a vowel at some point in the string.
This regexp also assumes that the case-insensitive flag is set. This is not the default in Java: pass Pattern.CASE_INSENSITIVE to Pattern.compile or prefix the pattern with (?i).
It is also a regexp that will find a match in a string satisfying your criteria. It will not necessarily match the whole string. If that is needed, add .*$ to the end of the regexp.
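As a sketch of how this regexp could be wired up in Java: java.util.regex is case-sensitive unless told otherwise, so the flag is passed explicitly here, and .*$ is appended so that matches() can cover the whole string. The class VowelRules and the method isAccepted are names invented for this example.

```java
import java.util.regex.Pattern;

class VowelRules {
    // The regexp from above with ".*$" appended, compiled case-insensitively.
    private static final Pattern RULES = Pattern.compile(
            "^(?!.*[aeiou]{3})(?!.*[bcdfghjklmnpqrstvwxyz]{3})(?!.*([^eo])\\1).*[aeiou].*$",
            Pattern.CASE_INSENSITIVE);

    static boolean isAccepted(String word) {
        return RULES.matcher(word).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "bower", "appple", "miiixer", "hedding", "feeding" };
        for (String s : samples) {
            System.out.println(s + " - " + (isAccepted(s) ? "accepted" : "not accepted"));
        }
    }
}
```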
If my hunch is correct that you meant to say "three consecutive occurrences of the same letter" (looking at your examples) then you can simply say "e and o may not occur thrice, everything else may not occur twice", like so:
^(?=.*[aeiouy].*)(?!.*([eo])\1\1.*)(?!.*([a-df-np-z])\2.*).*$
Debuggex Demo. The key is that a letter occurring thrice is also occurring twice.

How to match any uppercase letter followed by the corresponding lower case letter?

I have a requirement that says a name must not start with 3 identical letters ignoring their case. A name starts with an upper case letter followed by lower case letters.
Basically I could convert the whole name to upper case and then match with a regex like (\p{Lu})\1{3,}.*.
But I was wondering if there exists a regex that matches the above requirements and does not need any preprocessing of the string to be matched. So what regex can I use to match strings like Aa, Dd or Uu without explicitly specifying every possible combination?
EDIT:
I accepted Marko's answer. I just needed to fix it to work with names of length one and two and anchor it at the beginning. So the actual regex for my use case is ^(\p{Lu})(\p{Ll}?$|(?=\p{Ll}{2})(?i)(?!(\1){2})).
I also upvoted the answers of Evgeniy and sp00m for helping me to learn a lesson in regexes.
Thanks for your efforts.
I admit to standing on the shoulders of giants (the other posters here), but this solution actually works for your use case:
final String[] strings = { "Aba", "ABa", "aba", "aBa", "Aaa", "Aab" };
final Pattern p = Pattern.compile("(\\p{Lu})(?=\\p{Ll}{2})(?i)(?!(\\1){2})");
for (String s : strings) System.out.println(s + ": " + p.matcher(s).find());
Now we have:
a match for one uppercase char at the front;
a lookahead assertion of two lowercase chars following;
another lookahead that asserts these two chars are not both the same (ignoring case) as the first one.
Output:
Aba: true
ABa: false
aba: false
aBa: false
Aaa: false
Aab: true
Try:
String regex = "(?i)(.)(?=\\p{javaLowerCase})(?<=\\p{javaUpperCase})\\1";
System.out.println("dD".matches(regex));
System.out.println("dd".matches(regex));
System.out.println("DD".matches(regex));
System.out.println("Dd".matches(regex));
Output:
false
false
false
true
This matches any uppercased letter followed by the same letter, uppercased or not:
([A-Z])(?i)\1
This matches any uppercased letter followed by the same letter, but necessarily lowercased:
([A-Z])(?!\1)(?i)\1
For example in Java,
String pattern = "([A-Z])(?!\\1)(?i)\\1";
System.out.println("AA".matches(pattern));
System.out.println("aa".matches(pattern));
System.out.println("aA".matches(pattern));
System.out.println("Aa".matches(pattern));
Prints
false
false
false
true
Evgeniy Dorofeev's solution works (+1), but it can be done more simply, using only a lookahead:
(\\p{Lu})(?=\\p{Ll})(?i)\\1
(\\p{Lu}) matches an uppercase character and stores it in \\1
(?=\\p{Ll}) is a positive lookahead assertion ensuring that the next character is a lowercase letter.
(?i) is an inline modifier, enabling case independent matching.
\\1 matches the uppercase letter from the first part (but now case independent because of the modifier in front).
Test it:
String[] TestInput = { "foobar", "Aal", "TTest" };
Pattern p = Pattern.compile("(\\p{Lu})(?=\\p{Ll})(?i)\\1");
for (String t : TestInput) {
Matcher m = p.matcher(t);
if (m.find()) {
System.out.println(t + " ==> " + true);
} else {
System.out.println(t + " ==> " + false);
}
}
Output:
foobar ==> false
Aal ==> true
TTest ==> false
I have a requirement that says a name must not start with 3 identical letters ignoring their case.
You should use the case-insensitive option: (?i)
and the "catch-all" \w e.g.: (?i)(\w)\1{2,}.*
or just [a-z] e.g.: (?i)([a-z])\1{2,}.*
It might make sense here to use separate checks for the different requirements, especially since requirement lists tend to grow over time.
Your requirements as described are:
A name must not start with 3 identical letters ignoring their case
and
A name starts with an upper case letter followed by lower case letters.
Performing a separate check for each (as described in the other posts) also allows you to give the user proper error messages describing what is actually wrong. And it's certainly more readable.
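A rough sketch of what such separate checks might look like is below. The class NameChecks, its method names and the error texts are all invented for illustration, and the first check compares characters directly instead of using a regex.

```java
import java.util.ArrayList;
import java.util.List;

class NameChecks {
    // Requirement 1: true if the first three characters are the same, ignoring case.
    static boolean startsWithThreeIdentical(String name) {
        if (name.length() < 3) {
            return false;
        }
        char first = Character.toLowerCase(name.charAt(0));
        return Character.toLowerCase(name.charAt(1)) == first
                && Character.toLowerCase(name.charAt(2)) == first;
    }

    // Requirement 2: one uppercase letter followed only by lowercase letters.
    static boolean hasNameShape(String name) {
        return name.matches("\\p{Lu}\\p{Ll}*");
    }

    // Runs each check on its own so that every violation gets its own message.
    static List<String> validate(String name) {
        List<String> errors = new ArrayList<>();
        if (startsWithThreeIdentical(name)) {
            errors.add("Name must not start with 3 identical letters.");
        }
        if (!hasNameShape(name)) {
            errors.add("Name must be an upper case letter followed by lower case letters.");
        }
        return errors;
    }

    public static void main(String[] args) {
        for (String s : new String[] { "Aab", "Aaa", "aba", "AAa" }) {
            System.out.println(s + ": " + validate(s));
        }
    }
}
```

A new requirement then becomes one more method and one more if, without touching the existing checks.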

How to concatenate several strings with different format and then split them

Hi all.
I want to concatenate several strings that have no fixed format in Java. For example, I want to concatenate multiple objects such as a signature, a BigInteger and a string, all of which are converted to strings. I cannot use a fixed delimiter, because any delimiter may occur in these strings. How can I concatenate these strings and then split them again?
thanks all.
Use a well-defined format, like XML or JSON. Or choose a delimiter and escape every instance of this delimiter in each of the Strings. Or prepend the length of each part in the message. For example:
10/7/14-<10 chars of signature><7 chars of BigInteger><14 chars of string>
or
10-<10 chars of signature>7-<7 chars of BigInteger>14-<14 chars of string>
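The second layout (each part prefixed by its own length) could be sketched like this. The class LengthPrefixCodec and its encode/decode methods are hypothetical names, and the sketch assumes the input to decode was produced by encode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LengthPrefixCodec {
    // Writes every part as "<length>-<part>", e.g. "2-ab1-c".
    static String encode(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part.length()).append('-').append(part);
        }
        return sb.toString();
    }

    // Reads a length up to the next '-', then takes exactly that many characters.
    static List<String> decode(String encoded) {
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (pos < encoded.length()) {
            int dash = encoded.indexOf('-', pos);
            if (dash < 0) {
                throw new IllegalArgumentException("Missing length prefix at " + pos);
            }
            int length = Integer.parseInt(encoded.substring(pos, dash));
            int start = dash + 1;
            parts.add(encoded.substring(start, start + length));
            pos = start + length;
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("abc;def", "12345:", "99;red:balloons");
        String encoded = encode(parts);
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```

Because the decoder never searches the payload for a delimiter, the parts may contain any characters, including '-' and digits.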
You can escape the delimiter in your string. For example, let's say you have the following strings:
String a = "abc;def";
String b = "12345:";
String c = "99;red:balloons";
You want to be able to do something like this
String concat = a + delim + b + delim + c;
String[] tokens = concat.split(delim);
But if our delim is ";" then quite clearly this will not suffice, as we will have 5 tokens, and not 3. We could use a set of possible delimiters, search the strings for those delimiters, and then use the first one that isn't in the target strings, but this has two problems. First, how do we know which delimiter was used? Second, what if all delimiters exist in the strings? That's not a valid solution, and it's certainly not robust.
We can get around this by using an escape delimiter. Let us use ":" as our escape delimiter. We can use it to say "The next character is just a regular old character, it doesn't mean anything important."
So if we did this:
String aEscaped = a.replace(";",":;");
String bEscaped = b.replace(";",":;");
String cEscaped = c.replace(";",":;");
Then, we can split the concat'd string like
String[] tokens = concat.split("(?<!:);");
But there is one problem: What if our text actually contains ":;" or ends with ":"? Either way, these will produce false positives. In this case, we must also escape our escape character. It basically says the same thing as before: "The next character does nothing special."
So now our escaped strings become:
// note we escape our escape token first, otherwise we'll escape
// real usages of the token
String aEscaped = a.replace(":","::").replace(";",":;");
String bEscaped = b.replace(":","::").replace(";",":;");
String cEscaped = c.replace(":","::").replace(";",":;");
And now, we must account for this in the regex. If someone knows a regex that works for this, they can feel free to edit it in. What occurs to me is something like concat.split("(::;|[^:];)") but it doesn't seem to get the job done. The job of parsing it would be pretty easy. I threw together a small test driver for it, and it seems to work just fine.
Code found at http://ideone.com/wUlyz
Result:
abc;def becomes abc:;def
ja:3fr becomes ja::3fr
; becomes :;
becomes
: becomes ::
83;:;:;;;; becomes 83:;:::;:::;:;:;:;
:; becomes :::;
Final product:
abc:;def;ja::3fr;:;;;::;83:;:::;:::;:;:;:;;:::;
Expected 'abc;def', Actual 'abc;def', Matches true
Expected 'ja:3fr', Actual 'ja:3fr', Matches true
Expected ';', Actual ';', Matches true
Expected '', Actual '', Matches true
Expected ':', Actual ':', Matches true
Expected '83;:;:;;;;', Actual '83;:;:;;;;', Matches true
Expected ':;', Actual ':;', Matches true
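Since the linked code is not reproduced here, the following is only a sketch of how the escaping and the hand-written parsing could look, not the original test driver. The class EscapedJoiner and its methods are invented names; ';' is the delimiter and ':' the escape character, as in the text above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EscapedJoiner {
    private static final char DELIM = ';';
    private static final char ESCAPE = ':';

    // Escapes the escape character first, then the delimiter.
    static String escape(String s) {
        return s.replace(":", "::").replace(";", ":;");
    }

    static String join(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(DELIM);
            }
            sb.append(escape(parts.get(i)));
        }
        return sb.toString();
    }

    // Walks the string once: an escape character makes the next char literal,
    // an unescaped delimiter ends the current token.
    static List<String> split(String joined) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < joined.length(); i++) {
            char c = joined.charAt(i);
            if (c == ESCAPE) {
                if (i + 1 >= joined.length()) {
                    throw new IllegalArgumentException("Dangling escape character");
                }
                current.append(joined.charAt(++i));
            } else if (c == DELIM) {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("abc;def", "ja:3fr", ";", "", ":", ":;");
        String joined = join(parts);
        System.out.println(joined);
        System.out.println(split(joined).equals(parts));
    }
}
```

One limitation of this sketch: an empty list and a list holding a single empty string both join to the empty string, so split cannot tell them apart.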
You concatenate using the concatenation operator(+) as below:
String str1 = "str1";
String str2 = "str2";
int inte = 2;
String result = str1+str2+inte;
But to split them back again you need some special character as a delimiter, because the split function in String works on a delimiter.
