Regular expression not working as intended - java

I am using the following regular expression.
(?=.+[a-z])(?=.+[A-Z])(?=.+[^a-zA-Z]).{8,}
my goal is to have a password that has 3 of the 4 properties below
upper case character, lower case character, number, special character
I am using http://rubular.com/r/i26nwsTcaU and http://regexlib.com/RETester.aspx to test the expression with the following inputs
P#55w0rd
password1P
Password1
paSSw0rd
all of these should pass but only the second and fourth are passing at http://rubular.com/r/i26nwsTcaU and all of them pass at http://regexlib.com/RETester.aspx.
I also have the following code that I am using to validate
private void doValidate(String inputStr,String regex) {
Pattern pattern = Pattern.compile(regex);
if(!pattern.matcher(inputStr).matches()){
String errMessage = "";
throw new UniquenessConstraintViolatedException(errMessage);
}
}
this code fails to validate "Password1" which should pass.
as far as the expression goes I understand it like this
must have lower (?=.+[a-z])
must have upper (?=.+[A-Z])
must have non alpha (?=.+[^a-zA-Z])
must be eight characters long .{8,}
can anyone tell me what it is I'm doing wrong.
Thanks in advance.

Essentially, the .+ subexpressions are to blame; they should be .*. As written, each lookahead searches for a lower case, upper case or non-alpha character, but a character of the required type does not count if it is the first one in the string, because .+ must consume at least one character before it. So you are validating not the password, but the password with its first character cut off. @Cfreak is not quite right, but he is close: what you are doing would not be possible with a plain regex without lookaheads, and then you would have to use what he suggests. With the lookahead groups, (?=...), it is possible to do what you need. Still, personally I would rather code it the way @Cfreak suggests: it is more readable and your intentions are clearer from the code. Complex regular expressions tend to be hard to write and close to impossible to read, debug, or improve after some time.
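A minimal sketch of the difference described above, assuming the only change is .+ to .* inside each lookahead. The class name LookaheadFix and the constant names are made up for illustration and do not come from the question.

```java
import java.util.regex.Pattern;

class LookaheadFix {
    // Pattern from the question: ".+" must consume at least one character
    // before the class, so a matching character at index 0 never counts.
    static final Pattern ORIGINAL =
        Pattern.compile("(?=.+[a-z])(?=.+[A-Z])(?=.+[^a-zA-Z]).{8,}");

    // Same pattern with ".*", which also accepts a character at index 0.
    static final Pattern FIXED =
        Pattern.compile("(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z]).{8,}");

    static boolean original(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    static boolean fixed(String s) {
        return FIXED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"P#55w0rd", "password1P", "Password1", "paSSw0rd"};
        for (String pw : samples) {
            System.out.println(pw + " original=" + original(pw) + " fixed=" + fixed(pw));
        }
    }
}
```

Note that even the corrected pattern demands lower case, upper case and a non-letter all at once; it does not express "3 of the 4 properties".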

Your regex right now says you must have 1 or more lowercase characters, followed by 1 or more upper case characters followed by 1 or more upper or lowercase characters, followed by 8 characters or more.
Regex can't do AND unless you specify where a particular character appears. You basically need to split each part of your regex into its own regex and check each one. You can check the length with whatever string length method Java has (sorry, I'm not a Java dev, so I don't know what it is off the top of my head).
Pseudo code:
if( regexForLower && regexForUpper && regexForSpecial && string.length >= 8 ) {
// pass
}
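The pseudo code could look roughly like this in Java. This is only a sketch: it assumes the "3 of the 4 properties" rule and the minimum length of 8 from the question, treats anything outside a-z, A-Z and 0-9 as a special character, and the names SeparateChecks and isAcceptable are invented for the example.

```java
import java.util.regex.Pattern;

class SeparateChecks {
    // One small pattern per property; find() looks for it anywhere in the input.
    static final Pattern LOWER = Pattern.compile("[a-z]");
    static final Pattern UPPER = Pattern.compile("[A-Z]");
    static final Pattern DIGIT = Pattern.compile("[0-9]");
    static final Pattern SPECIAL = Pattern.compile("[^a-zA-Z0-9]"); // assumption: everything else is "special"

    static boolean isAcceptable(String pw) {
        int met = 0;
        for (Pattern p : new Pattern[] {LOWER, UPPER, DIGIT, SPECIAL}) {
            if (p.matcher(pw).find()) {
                met++;
            }
        }
        // String.length() is the length method the pseudo code refers to.
        return pw.length() >= 8 && met >= 3;
    }

    public static void main(String[] args) {
        for (String pw : new String[] {"P#55w0rd", "Password1", "onlylower"}) {
            System.out.println(pw + " => " + isAcceptable(pw));
        }
    }
}
```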

As I said in a comment, position-0 capital letters are being ignored.
Here's a regex to which all four passwords match.
(?=.+\\d)(?=.+[a-z])(?=\\w*[A-Z]).{8,}

I wouldn't use such a regex.
it is hard to understand
hard to debug
hard to extend
you can't do much with its result
If you want to tell the client what is wrong with his password, you have to investigate the password again. In real-world environments you might also want to support characters from foreign locales.
import java.util.*;
/**
Pwtest
@author Stefan Wagner
@date Fr 11. Mai 20:55:38 CEST 2012
*/
public class Pwtest
{
public int boolcount (boolean [] arr) {
int sum = 0;
for (boolean b : arr)
if (b)
++sum;
return sum;
}
public boolean [] rulesMatch (String [] groups, String password) {
int idx = 0;
boolean [] matches = new boolean [groups.length];
for (String g: groups) {
matches[idx] = (password.matches (".*" + g + ".*"));
++idx;
}
return matches;
}
public Pwtest ()
{
String [] groups = new String [] {"[a-z]", "[A-Z]", "[0-9]", "[^a-zA-Z0-9]"};
String [] pwl = new String [] {"P#55w0rd", "password1P", "Password1", "paSSw0rd", "onlylower", "ONLYUPPER", "1234", ",:?!", "Übergrößen345", "345ÄÜö"};
List <boolean[]> lii = new ArrayList <boolean[]> ();
for (String password: pwl) {
lii.add (rulesMatch (groups, password));
}
for (int i = 0 ; i < lii.size (); ++i) {
boolean [] matches = lii.get (i);
String pw = pwl[i];
if (boolcount (matches) < 3) {
System.out.print ("Password:\t" + pw + " violates rule (s): ");
int idx = 0;
for (boolean b: matches) {
if (! b)
System.out.print (groups[idx] + " ");
++idx;
}
System.out.println ();
}
else System.out.println ("Password:\t" + pw + " fine ");
}
}
public static void main (String args[])
{
new Pwtest ();
}
}
Output:
Password: P#55w0rd fine
Password: password1P fine
Password: Password1 fine
Password: paSSw0rd fine
Password: onlylower violates rule (s): [A-Z] [0-9] [^a-zA-Z0-9]
Password: ONLYUPPER violates rule (s): [a-z] [0-9] [^a-zA-Z0-9]
Password: 1234 violates rule (s): [a-z] [A-Z] [^a-zA-Z0-9]
Password: ,:?! violates rule (s): [a-z] [A-Z] [0-9]
Password: Übergrößen345 fine
Password: 345ÄÜö violates rule (s): [a-z] [A-Z]

Related

validate a user input

Hello I'm new to programming and I'm having a trouble understanding my assignment. I know that this might be a really simple problem for you guys and I'm sorry for that. Is it possible that she's just asking me to write a method that will perform the given instructions?
Write a program to find if the user input is valid based on the instructions:
a string must have at least nine characters
a string consists of letters and numbers only.
a string must contain at least two digits.
You can simply use the regex, ^(?=(?:\D*\d){2})[a-zA-Z\d]{9,}$ which can be explained as follows:
^ : asserts position at start of a line
Positive Lookahead (?=(?:\D*\d){2})
Non-capturing group (?:\D*\d){2}
{2} matches the previous token exactly 2 times
\D matches any character that's not a digit (equivalent to [^0-9])
* matches the previous token zero or more times (greedy)
\d matches a digit (equivalent to [0-9])
The pattern, [a-zA-Z\d]{9,} :
{9,} matches the previous token 9 or more times (greedy)
a-z matches a single character in a-z
A-Z matches a single character in A-Z
\d matches a digit (equivalent to [0-9])
$ : asserts position at the end of a line
Demo:
import java.util.stream.Stream;
public class Main {
public static void main(String args[]) {
//Test
Stream.of(
"helloworld",
"hello",
"hello12world",
"12helloworld",
"helloworld12",
"123456789",
"hello1234",
"1234hello",
"12345hello",
"hello12345"
).forEach(s -> System.out.println(s + " => " + isValid(s)));
}
static boolean isValid(String s) {
return s.matches("^(?=(?:\\D*\\d){2})[a-zA-Z\\d]{9,}$");
}
}
Output:
helloworld => false
hello => false
hello12world => true
12helloworld => true
helloworld12 => true
123456789 => true
hello1234 => true
1234hello => true
12345hello => true
hello12345 => true
Requirement #1: a string must have at least nine characters
This is solved by checking whether the length of the String is at least 9, with s.length() >= 9
Requirement #2: a string consists of letters and numbers (whole numbers) only.
Use the regex [a-zA-Z0-9]+, which matches only Latin letters and digits.
Requirement #3: a string must contain at least two digits.
I've written a method that loops through every character and uses Character.isDigit() to count the digits.
Check it out:
public static boolean verify(String s) {
final String regex = "[a-zA-Z0-9]+";
// at least nine characters, only letters and digits, at least two digits
return s.length() >= 9 && s.matches(regex) && numOfDigits(s) >= 2;
}
public static int numOfDigits(String s) {
int a = 0;
int b = s.length();
for (int i = 0; i < b; i++) {
a += (Character.isDigit(s.charAt(i)) ? 1 : 0);
}
return a;
}

How do I match non-word characters anywhere in the string?

This is a simple question, but please hear me out: part of a homework assignment for Java is a password validator method. The requirements are simple: the password must be 6 to 10 characters long, must consist only of digits or letters, and has to contain at least 2 digits to be valid. I wrote this with if statements and regex, but for some reason I cannot get the non-word character regex to match, even though every online regex checker, and even the JetBrains regex check plugin, shows it should work. (I also understand this could be done with a one-liner and none of the if statements, but this way is simpler for me.)
Example input: MyPass123 should be valid, while MyPass123# should match the non-word character check and print "Password must consist only of letters and digits". Instead, this never happens. I am a beginner in Java, so most likely I am not doing something right even though it is such a simple problem. Thank you in advance!
Method code below:
public static void passswordCheck(String password)
{
if(password.length()<6 || password.length()>10)
{
System.out.printf("Password needs to be between 6 and 10 characters");
}
else if(password.matches("\\W")) // This should match when input is MyPass123#
{
System.out.printf("Password must consist only of letters and digits");
}
else if(!password.matches("/(\\d).*(\\d)/"))
{
System.out.printf("Password needs to have at least 2 digits");
}
else
{
System.out.printf("Password is valid");
}
}
You're only matching if the string consists of a single character which is non alphanumeric (= [^a-zA-Z0-9_]).
If you want any string which contains at least one non alphanumeric character: .*?\W.*
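A small sketch of the two patterns side by side, assuming nothing beyond String#matches itself. The class and method names are made up for illustration.

```java
class WholeMatchDemo {
    // matches() must cover the whole string, so "\\W" only accepts
    // an input that is exactly one non-word character.
    static boolean isSingleNonWord(String s) {
        return s.matches("\\W");
    }

    // Padding the pattern on both sides lets the non-word character
    // sit anywhere in the string.
    static boolean containsNonWord(String s) {
        return s.matches(".*?\\W.*");
    }

    public static void main(String[] args) {
        System.out.println(isSingleNonWord("MyPass123#"));
        System.out.println(containsNonWord("MyPass123#"));
        System.out.println(containsNonWord("MyPass123"));
    }
}
```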
String#matches always performs a whole match, i.e. the match needs to span the whole string from the first to the last character. To search for a match anywhere in the string, you need to use the find method of a Matcher object instead:
final Pattern nonWordChar = Pattern.compile("\\W");
if (nonWordChar.matcher(password).find()) {
System.out.printf("Password must consist only of letters and digits");
}
…
You will need to do the same with your other regular expressions.
I have tested the below code. Two options are possible, try using the find method as mentioned by Konrad in one of the comments above or handle it in the regex to match a character anywhere in the string.
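\\w{6,10} - Matches only passwords made of 6 to 10 word characters (A-Za-z0-9, plus the underscore, which \w also accepts)
.*?\\d{2,}.*? - Looks for 2 or more consecutive digits, so a password whose two digits are not adjacent, such as a1b2cd, is rejected; .*\\d.*\\d.* would accept two digits anywhere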
I have changed it to use Pattern.matches.
import java.util.regex.*;
public class test {
public static void passswordCheck(String password)
{
if(password.length()<6 || password.length()>10)
{
System.out.println("Password needs to be between 6 and 10 characters");
}
else if(!Pattern.matches("\\w{6,10}",password)) //This should match when input is MyPass123#
{
System.out.println("Password must consist only of letters and digits");
}
else if(!Pattern.matches(".*?\\d{2,}.*?",password))
{
System.out.println("Password needs to have at least 2 digits");
}
else
{
System.out.println("Password is valid");
}
}
public static void main(String[] args)
{
passswordCheck("Mypass2");
}
}
Problems in your code:
You have used \W (i.e. [^\w]), which matches a non-word character, but note that \w matches not only digits and letters but also _, which you do not want in a valid password. Therefore, you need to use \p{Alnum}. Alternatively, you can use [A-Za-z0-9]. Also, in order to consider the whole string, the quantifier + is required.
The pattern \d.*\d matches a substring bounded by two digits, which can be at any place in the password (i.e. not just at the beginning and at the end), and therefore you need to search for a match anywhere, not match the whole string. String#matches will succeed only when the password both starts and ends with a digit. Therefore, you need to use Matcher#find instead of String#matches. Note that you do not need parentheses ( ) around \d in your regex pattern. Parentheses specify a capturing group, which you do not need for your requirement.
Given below is a demo code addressing all these issues:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
// Test strings
String[] arr = { "abcdefghjk", "abc1def2gh", "Ab1Cd2EfGh", "Ab1CdEfGhI", "Ab1Cd2E3Gh", "Ab_ed2EFG3" };
for (String s : arr) {
System.out.println("Validating the password, " + s);
passswordCheck(s);
}
}
public static void passswordCheck(String password) {
if (password.length() < 6 || password.length() > 10) {
System.out.println("Password needs to be between 6 and 10 characters.\n");
} else if (!password.matches("\\p{Alnum}+")) {
System.out.println("Password must consist only of letters and digits.\n");
} else {
Pattern pattern = Pattern.compile("\\d.*\\d");
Matcher matcher = pattern.matcher(password);
if (!matcher.find()) {
System.out.println("Password needs to have at least 2 digits.\n");
} else {
System.out.println("Password is valid\n");
}
}
}
}
Output:
Validating the password, abcdefghjk
Password needs to have at least 2 digits.
Validating the password, abc1def2gh
Password is valid
Validating the password, Ab1Cd2EfGh
Password is valid
Validating the password, Ab1CdEfGhI
Password needs to have at least 2 digits.
Validating the password, Ab1Cd2E3Gh
Password is valid
Validating the password, Ab_ed2EFG3
Password must consist only of letters and digits.

3 out of 4 conditions in regex java [duplicate]

I am trying to create a Java program that checks the strength of a password with regex. The password must pass 3 out of 4 conditions:
lowercase
uppercase
contains digits
has special characters
The code looks like below:
import java.util.*;
public class StringPasswordStrength {
public static void main(String[] args) {
Scanner input = new Scanner(System.in);
System.out.print("Enter password: ");
String password = input.nextLine();
boolean test = checkStrength(password);
if (test) {
System.out.print("OK");
}
else {
System.out.print("Not strong enough!");
}
}
public static boolean checkStrength(String password) {
if (password.matches("^(?=.*[a-zA-Z][0-9][!@#$%^&*])(?=.{8,})")){
return true;
}
else {
return false;
}
}
}
However when the password is Passw0rd it doesn't accept. How can I change the conditions in regex that the program would accept Passw0rd because it passes 3 out of 4 conditions: uppercase, lowercase and digit?
I would suggest avoiding a potentially cryptic regular expression for this, and instead to provide something easier to read, understand and maintain (albeit more verbose).
Something like this (depending on what your conditions are, of course). For example, a length test should be mandatory:
public boolean isValid(String password) {
// password must contain 3 out of 4 of lowercase, uppercase, digits,
// and others (punctuation, symbols, spaces, etc.).
if (password == null) {
return false;
}
if (password.length() < 8) {
return false;
}
char[] chars = password.toCharArray();
int lowers = 0;
int uppers = 0;
int digits = 0;
int others = 0;
for (Character c : chars) {
if (Character.isLowerCase(c)) {
lowers = 1;
} else if (Character.isUpperCase(c)) {
uppers = 1;
} else if (Character.isDigit(c)) {
digits = 1;
} else {
others = 1;
}
}
// at least 3 out of 4 tests must pass:
return (lowers + uppers + digits + others >= 3);
}
I understand this is not a direct answer to your question, and so may not meet your needs.
I am also deliberately avoiding the discussion about char[] vs. String for password handling in Java - for example, see this post.
EDIT: Removed wording relating to password length, and changed related code to reflect the question.
You could define a set of rules (regexes), count how many of them a given password complies with, and compare that count with the minimum you require. A possible implementation could be:
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;
/**
* Patterns to be tested in your passwords. If you want some of them
* mandatory, you can define them in a "mandatoryPatterns" list and
* check those ones always.
*/
static List<String> patterns = Arrays.asList(
".*[A-Z]+.*",
".*[a-z]+.*"
);
/** Number of required patterns. */
static long requiredPatterns = 1;
/** This functions counts the number of patterns that a password matches. */
static long passwordStrength(String password) {
return patterns.stream().filter(password::matches).count();
}
static boolean checkStrength(String password) {
return passwordStrength(password) >= requiredPatterns;
}
Stream.of("", "foo", "BAR", "FooBar").forEach(pass -> {
System.out.println(pass);
System.out.println(passwordStrength(pass));
System.out.println(checkStrength(pass));
});
Your issue has been pointed out by another user, along with a solution. This is an alternative solution.
Have 4 Pattern objects, one for each requirement
Pattern uppercase = Pattern.compile("[A-Z]");
Pattern number = Pattern.compile("\\d+");
Pattern symbol = Pattern.compile("[+&$%!#]");
Pattern other = ...;
String#matches "compiles" the regex every time it is called, which can be time consuming. By using Pattern objects, you'll be using already-compiled regex patterns.
Add the requirements to a list
List<Pattern> requirements = Arrays.asList(uppercase, number, symbol, other);
Loop over the list of requirements. For each requirement, check whether the password contains a match for it. If it does, increase a counter which tracks how many requirements have been met so far.
If the counter reaches 3, return true. If the loop runs to completion, fewer than 3 requirements were met, so return false.
public boolean isStrong(String password) {
int requirementsMet = 0;
for(Pattern req : requirements) {
if(req.matcher(password).find()) // find(), since matches() would require the whole password to match
requirementsMet++;
if(requirementsMet >= 3)
return true;
}
return false;
}
I assume the four requirements, of which at least three must be met, are as follows. The string must contain:
a letter
a digit
a character in the string "!@#$%^&*"
at least 8 characters
Is the use of a regular expression the best way to determine if a password meets three of the four requirements? That may be a valid question but it's not the one being asked or the one that I will attempt to answer. The OP may just be curious: can this problem be solved using a regular expression? Moreover, even if there are better ways to address the problem there is educational value in answers to the specific question that's been posed.
I am not familiar with Java, but I can suggest a regular expression that uses Ruby syntax. Readers unfamiliar with Ruby should be able to understand the expression, and its translation to Java should be straightforward. (If a reader can perform that translation, I would be grateful to see an edit to this answer that provides the Java equivalent at the end.)
r = /
((?=.*[a-z])){0}        # match a lowercase letter in the string in
                        # a positive lookahead in cap grp 1, 0 times
((?=.*\d)){0}           # match a digit in the string in a positive
                        # lookahead in cap grp 2, 0 times
((?=.*[!@#$%^&*])){0}   # match a special character in the string
                        # in a positive lookahead in cap grp 3, 0 times
(.{8,}){0}              # match at least 8 characters in cap grp 4, 0 times
\g<1>\g<2>\g<3>         # match conditions 1, 2 and 3
|                       # or
\g<1>\g<2>\g<4>         # match conditions 1, 2 and 4
|                       # or
\g<1>\g<3>\g<4>         # match conditions 1, 3 and 4
|                       # or
\g<2>\g<3>\g<4>         # match conditions 2, 3 and 4
/xi                     # case indiff & free-spacing regex def modes
\g<2>, for example, is replaced by the sub-expression contained in capture group 2, (?=.*\d). The first four lines each contain an expression in a capture group, with the capture group repeated zero times ({0}). This is just a device to define the subexpressions in the capture groups for retrieval later.
Let's test some strings.
"Passw0rd".match? r #=> true (a letter, digit and >= 8)
"ab2c#45d".match? r #=> true (all 4 conditions satisfied)
"ab2c#5d".match? r #=> true (a letter, digit and special char)
"ab2c345d".match? r #=> true (a letter, digit and >= 8)
"ab#c?def".match? r #=> true (a letter, special char and >= 8)
"21#6?512".match? r #=> true (a digit, special char and >= 8)
"ab26c4".match? r #=> false (only letter and digit)
"a$b#c".match? r #=> false (only letter and special char)
"abc ef h".match? r #=> false (only letter and >= 8)
"12 45 78".match? r #=> false (only digit and >=8)
"########".match? r #=> false (only special char and >= 8)
"".match? r #=> false (no condition matched)
To use named capture groups, ((?=.*[a-z])) would be replaced with, say,
(?<letter>(?=.*[a-z]))
and \g<1>\g<2>\g<3> would be replaced by something like
\g<letter>\g<digit>\g<spec_char>
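For readers who want a Java version: java.util.regex has no \g<n> subexpression calls, so one possible translation is to keep the four conditions in string constants and concatenate the four combinations by hand. The sketch below assumes the same four conditions as above, writes the length condition as a lookahead as well, and uses invented names (ThreeOfFour, ok).

```java
import java.util.regex.Pattern;

class ThreeOfFour {
    // Each condition is a zero-width lookahead, so they can be chained freely.
    static final String LETTER = "(?=.*[a-z])";
    static final String DIGIT = "(?=.*\\d)";
    static final String SPECIAL = "(?=.*[!@#$%^&*])";
    static final String LENGTH = "(?=.{8,})";

    // The four ways of choosing three conditions out of four.
    static final Pattern THREE_OF_FOUR = Pattern.compile(
        "^(?:" + LETTER + DIGIT + SPECIAL
        + "|" + LETTER + DIGIT + LENGTH
        + "|" + LETTER + SPECIAL + LENGTH
        + "|" + DIGIT + SPECIAL + LENGTH + ")",
        Pattern.CASE_INSENSITIVE);

    static boolean ok(String s) {
        // find() rather than matches(): the pattern consumes no characters.
        return THREE_OF_FOUR.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Passw0rd", "ab2c#5d", "ab26c4", "########"}) {
            System.out.println(s + " => " + ok(s));
        }
    }
}
```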
To answer your question: your lookahead (?=.*[a-zA-Z][0-9][!@#$%^&*]) lists the three character classes one after another, so it requires them in exactly this sequence, as three adjacent characters:
1st: [a-zA-Z] a letter
2nd: [0-9] a digit
3rd: [!@#$%^&*] a special character
The occurrence of the characters in this sequence is a must.
A password only satisfies this lookahead if it contains a run such as d4# somewhere; Passw0rd has no special character right after its digit, so it fails.
In addition, your pattern consists only of lookaheads and consumes no characters, so String#matches, which has to match the whole input, cannot succeed for any non-empty password.
Use a separate positive lookahead for each character class instead.
Try this instead(This is my work after several lookups):
^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%&*_])(?!.*[`~^=+/?<>():;-])(?=\S+$).{8,20}$
In this case: any of the provided chars or literals can appear anywhere.
(?=.*[0-9])
(?=.*[a-z])
(?=.*[A-Z])
(?=.*[!@#$%&*_])
(?!.*[`~^=+/?<>():;-])
(?=\S+$)
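A quick way to try this pattern from Java, as a sketch only. It assumes the special-character class is [!@#$%&*_]; the class name StrictPolicy and the method name are invented. Keep in mind that this pattern demands all four character classes, so it is stricter than the "3 out of 4" rule in the question, and Passw0rd is still rejected.

```java
import java.util.regex.Pattern;

class StrictPolicy {
    // One lookahead per rule, then the length check on the consumed text.
    static final Pattern POLICY = Pattern.compile(
        "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%&*_])"
        + "(?!.*[`~^=+/?<>():;-])(?=\\S+$).{8,20}$");

    static boolean accepts(String password) {
        return POLICY.matcher(password).matches();
    }

    public static void main(String[] args) {
        for (String pw : new String[] {"Abcd1234#A", "Passw0rd", "Abcd 234#"}) {
            System.out.println(pw + " => " + accepts(pw));
        }
    }
}
```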

Java regex: Replace all characters with `+` except instances of a given string

I have the following problem which states
Replace all characters in a string with + symbol except instances of the given string in the method
so for example if the string given was abc123efg and they want me to replace every character except every instance of 123 then it would become +++123+++.
I figured a regular expression is probably the best for this and I came up with this.
str.replaceAll("[^str]","+")
where str is a variable, but it's not letting me use the method without putting it in quotation marks. If I just want to use the contents of the variable str, how can I do that? I ran it with the string typed in manually and the method worked, but can I pass in a variable instead?
As of right now I believe it's looking for the literal string "str" and not the contents of the variable.
Here are the test cases; my output is right for all of them except two :(
List of open test cases:
plusOut("12xy34", "xy") → "++xy++"
plusOut("12xy34", "1") → "1+++++"
plusOut("12xy34xyabcxy", "xy") → "++xy++xy+++xy"
plusOut("abXYabcXYZ", "ab") → "ab++ab++++"
plusOut("abXYabcXYZ", "abc") → "++++abc+++"
plusOut("abXYabcXYZ", "XY") → "++XY+++XY+"
plusOut("abXYxyzXYZ", "XYZ") → "+++++++XYZ"
plusOut("--++ab", "++") → "++++++"
plusOut("aaxxxxbb", "xx") → "++xxxx++"
plusOut("123123", "3") → "++3++3"
Looks like this is the plusOut problem on CodingBat.
I had 3 solutions to this problem, and wrote a new streaming solution just for fun.
Solution 1: Loop and check
Create a StringBuilder out of the input string, and check for the word at every position. Replace the character if the word doesn't match there, and skip ahead by the length of the word if it does.
public String plusOut(String str, String word) {
StringBuilder out = new StringBuilder(str);
for (int i = 0; i < out.length(); ) {
if (!str.startsWith(word, i))
out.setCharAt(i++, '+');
else
i += word.length();
}
return out.toString();
}
This is probably the expected answer for a beginner programmer, though there is an assumption that the string doesn't contain any astral plane character, which would be represented by 2 char instead of 1.
Solution 2: Replace the word with a marker, replace the rest, then restore the word
public String plusOut(String str, String word) {
return str.replaceAll(java.util.regex.Pattern.quote(word), "#").replaceAll("[^#]", "+").replaceAll("#", word);
}
Not a proper solution since it assumes that a certain character or sequence of character doesn't appear in the string.
Note the use of Pattern.quote to prevent the word being interpreted as regex syntax by replaceAll method.
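To see why the quoting matters, here is a small sketch using the "++" test case from the question. The helper names compiles and mark are hypothetical; only Pattern.quote, Pattern.compile and String#replaceAll are real API.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class QuoteDemo {
    // Reports whether a string is a syntactically valid regex.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // First step of solution 2: replace each literal occurrence of word with '#'.
    static String mark(String str, String word) {
        return str.replaceAll(Pattern.quote(word), "#");
    }

    public static void main(String[] args) {
        System.out.println(compiles("++"));          // "++" alone is not a valid regex
        System.out.println(Pattern.quote("++"));     // the quoted form is
        System.out.println(mark("--++ab", "++"));
        System.out.println(mark("1x5 1.5", "1.5"));  // '.' is taken literally
    }
}
```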
Solution 3: Regex with \G
public String plusOut(String str, String word) {
word = java.util.regex.Pattern.quote(word);
return str.replaceAll("\\G((?:" + word + ")*+).", "$1+");
}
Construct regex \G((?:word)*+)., which does more or less what solution 1 is doing:
\G makes sure the match starts from where the previous match leaves off
((?:word)*+) picks out 0 or more instance of word - if any, so that we can keep them in the replacement with $1. The key here is the possessive quantifier *+, which forces the regex to keep any instance of the word it finds. Otherwise, the regex will not work correctly when the word appear at the end of the string, as the regex backtracks to match .
. will not be part of any word, since the previous part already picks out all consecutive appearances of word and disallow backtrack. We will replace this with +
Solution 4: Streaming
public String plusOut(String str, String word) {
return String.join(word,
Arrays.stream(str.split(java.util.regex.Pattern.quote(word), -1))
.map((String s) -> s.replaceAll("(?s:.)", "+"))
.collect(Collectors.toList()));
}
The idea is to split the string by word, do the replacement on the rest, and join them back with word using String.join method.
Same as above, we need Pattern.quote to avoid split interpreting the word as regex. Since split by default removes empty string at the end of the array, we need to use -1 in the second parameter to make split leave those empty strings alone.
Then we create a stream out of the array and replace the rest as strings of +. In Java 11, we can use s -> "+".repeat(s.length()) instead.
The rest is just converting the Stream to an Iterable (List in this case) and joining them for the result
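A sketch of the Java 11 variant of solution 4, assuming String#repeat is available. It also swaps the intermediate List for Collectors.joining, which is a substitution of mine and not part of the original solution; the class name is made up.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PlusOutRepeat {
    static String plusOut(String str, String word) {
        // Split on the literal word, keeping trailing empty strings (-1),
        // turn every remaining piece into '+' signs of the same length,
        // and glue the pieces back together with the word.
        return Arrays.stream(str.split(Pattern.quote(word), -1))
                .map(s -> "+".repeat(s.length()))
                .collect(Collectors.joining(word));
    }

    public static void main(String[] args) {
        System.out.println(plusOut("12xy34", "xy"));
        System.out.println(plusOut("aaxxxxbb", "xx"));
        System.out.println(plusOut("--++ab", "++"));
    }
}
```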
This is a bit trickier than you might initially think because you don't just need to match characters, but the absence of specific phrase - a negated character set is not enough. If the string is 123, you would need:
(?<=^|123)(?!123).*?(?=123|$)
https://regex101.com/r/EZWMqM/1/
That is - lookbehind for the start of the string or "123", make sure the current position is not followed by 123, then lazy-repeat any character until lookahead matches "123" or the end of the string. This will match all characters which are not in a "123" substring. Then, you need to replace each character with a +, after which you can use appendReplacement and a StringBuffer to create the result string:
String inputPhrase = "123";
String inputStr = "abc123efg123123hij";
StringBuffer resultString = new StringBuffer();
Pattern regex = Pattern.compile("(?<=^|" + inputPhrase + ")(?!" + inputPhrase + ").*?(?=" + inputPhrase + "|$)");
Matcher m = regex.matcher(inputStr);
while (m.find()) {
String replacement = m.group(0).replaceAll(".", "+");
m.appendReplacement(resultString, replacement);
}
m.appendTail(resultString);
System.out.println(resultString.toString());
Output:
+++123+++123123+++
Note that if the inputPhrase can contain character with a special meaning in a regular expression, you'll have to escape them first before concatenating into the pattern.
You can do it in one line:
input = input.replaceAll("((?:" + str + ")+)?(?!" + str + ").((?:" + str + ")+)?", "$1+$2");
This optionally captures "123" on either side of each character and puts it back (a blank if there's no "123").
So instead of coming up with a regular expression that matches the absence of a string, we might as well just match the selected phrase and append one + for each skipped character.
StringBuilder sb = new StringBuilder();
Matcher m = Pattern.compile(Pattern.quote(str)).matcher(input);
while (m.find()) {
for (int i = sb.length(); i < m.start(); i++) sb.append('+'); // pad only the gap since the previous match
sb.append(str);
}
int remaining = input.length() - sb.length();
for (int i = 0; i < remaining; i++) {
sb.append('+');
}
Absolutely just for the fun of it, a solution using CharBuffer (unexpectedly, it took a lot more than I initially hoped for):
private static String plusOutCharBuffer(String input, String match) {
int size = match.length();
CharBuffer cb = CharBuffer.wrap(input.toCharArray());
CharBuffer word = CharBuffer.wrap(match);
int x = 0;
for (; cb.remaining() > 0;) {
if (!cb.subSequence(0, size < cb.remaining() ? size : cb.remaining()).equals(word)) {
cb.put(x, '+');
cb.clear().position(++x);
} else {
cb.clear().position(x = x + size);
}
}
return cb.clear().toString();
}
To make this work you need a beast of a pattern. Let's say you are operating on the following test case as an example:
plusOut("abXYxyzXYZ", "XYZ") → "+++++++XYZ"
What you need to do is build a series of clauses in your pattern to match a single character at a time:
Any character that is NOT "X", "Y" or "Z" -- [^XYZ]
Any "X" not followed by "YZ" -- X(?!YZ)
Any "Y" not preceded by "X" -- (?<!X)Y
Any "Y" not followed by "Z" -- Y(?!Z)
Any "Z" not preceded by "XY" -- (?<!XY)Z
An example of this replacement can be found here: https://regex101.com/r/jK5wU3/4
Here is an example of how this might work (most certainly not optimized, but it works):
import java.util.regex.Pattern;
public class Test {
public static void plusOut(String text, String exclude) {
StringBuilder pattern = new StringBuilder("");
for (int i=0; i<exclude.length(); i++) {
Character target = exclude.charAt(i);
String prefix = (i > 0) ? exclude.substring(0, i) : "";
String postfix = (i < exclude.length() - 1) ? exclude.substring(i+1) : "";
// add the look-behind (?<!X)Y
if (!prefix.isEmpty()) {
pattern.append("(?<!").append(Pattern.quote(prefix)).append(")")
.append(Pattern.quote(target.toString())).append("|");
}
// add the look-ahead X(?!YZ)
if (!postfix.isEmpty()) {
pattern.append(Pattern.quote(target.toString()))
.append("(?!").append(Pattern.quote(postfix)).append(")|");
}
}
// add in the other character exclusion
pattern.append("[^" + Pattern.quote(exclude) + "]");
System.out.println(text.replaceAll(pattern.toString(), "+"));
}
public static void main(String [] args) {
plusOut("12xy34", "xy");
plusOut("12xy34", "1");
plusOut("12xy34xyabcxy", "xy");
plusOut("abXYabcXYZ", "ab");
plusOut("abXYabcXYZ", "abc");
plusOut("abXYabcXYZ", "XY");
plusOut("abXYxyzXYZ", "XYZ");
plusOut("--++ab", "++");
plusOut("aaxxxxbb", "xx");
plusOut("123123", "3");
}
}
UPDATE: Even this doesn't quite work because it can't deal with exclusions that are just repeated characters, like "xx". Regular expressions are most definitely not the right tool for this, but I thought it might be possible. After poking around, I'm not so sure a pattern even exists that might make this work.
The problem in your solution is that str.replaceAll("[^str]","+") uses a negated character set: it excludes each of the individual characters s, t and r from replacement, not the contents of the variable str, and that will not solve your problem.
EX: when you try str.replaceAll("[^XYZ]","+"), every single X, Y or Z is excluded from the replacement, in any combination, so for abXYxyzXYZ you will get "++XY+++XYZ".
What you actually need is to exclude a sequence of characters in str.replaceAll.
You can do it with a group of characters like (XYZ) combined with a negative lookahead, which matches a string that does not contain the character sequence: ^((?!XYZ).)*$
Be aware that it may be complicated to find a regular expression that does the whole replacement directly.
I have found two simple solutions for this problem :
Solution 1:
You can implement a method to replace all characters with '+' except the instance of given string:
String exWord = "XYZ";
String str = "abXYxyzXYZ";
for(int i = 0; i < str.length(); i++){
// skip over any instance of exWord so that it is not replaced
if(str.startsWith(exWord, i)){
i = i + exWord.length()-1;
}
else{
str = str.substring(0,i) + "+" + str.substring(i+1);//replace each character with '+' symbol
}
}
Note : the check str.startsWith(exWord, i) excludes every instance of exWord from the replacing process in str.
Output:
+++++++XYZ
Solution 2:
You can try this Approach using ReplaceAll method and it doesn't need any complex regular expression:
String exWord = "XYZ";
String str = "abXYxyzXYZ";
str = str.replaceAll(exWord,"*"); // replace instance string with * symbol
str = str.replaceAll("[^*]","+"); // replace all characters with + symbol except *
str = str.replaceAll("\\*",exWord); // replace * symbol with instance string
Note: This solution works only if your input string str doesn't contain any * symbol.
Also, replaceAll treats its first argument as a regular expression, so if exWord contains characters with a special meaning in a regular expression (for example exWord = "++"), you must escape them, e.g. with Pattern.quote(exWord).
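If you want to avoid both limitations (the reserved * symbol and the manual escaping), one possible variation is a single pass over the string with a Matcher. This is only a sketch: the class and method names are made up, and it assumes exWord is not empty.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MaskExceptWord {
    // Replaces every character of str with '+' except whole occurrences of exWord.
    static String mask(String str, String exWord) {
        if (exWord.isEmpty()) {
            throw new IllegalArgumentException("exWord must not be empty");
        }
        // Pattern.quote makes exWord a literal, so words like "++" or "a.b" are safe.
        // The alternation tries exWord first and falls back to any single character.
        Pattern p = Pattern.compile(Pattern.quote(exWord) + "|.", Pattern.DOTALL);
        Matcher m = p.matcher(str);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Keep the match if it is the excluded word, otherwise emit one '+'.
            out.append(m.group().equals(exWord) ? exWord : "+");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("abXYxyzXYZ", "XYZ"));
    }
}
```

Because the placeholder step is gone, a * in the input is handled like any other character.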

pattern matching using regular expressions replace by digits

My program takes a long string from the user, such as aaaabaaaaaba.
The output should replace each aaa with 0 and each aba with 1.
The input should be read as consecutive, non-overlapping groups of three characters: for example, aaaabaaabaaaaba splits into aaa-aba-aab-aaa-aba, and no group may overlap another while matching.
Please help me get this program working.
Example: for the input aaaabaaaaaba, the output is 0101.
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Pattern1 {
Scanner sc =new Scanner(System.in);
public void m1()
{ String s;
System.out.println("enter a string");
s=sc.nextLine();
assertTrue(s!=null);
Pattern p = Pattern.compile(s);
Matcher m =p.matcher(".(aaa");
Matcher m1 =p.matcher("aba");
while(m.find())
{
s.replaceAll(s, "1");
}
while(m1.find())
{
s.replaceAll(s, "0");
}
System.out.println(s);
}
private boolean assertTrue(boolean b) {
return b;
// TODO Auto-generated method stub
}
public static void main(String[] args) {
Pattern1 p = new Pattern1();
p.m1();
}
}
With regex and find you can search for each successive match and then add a 0 or 1 depending on the characters to the output.
String test = "aaaabaaaaabaaaa";
Pattern compile = Pattern.compile("(?<triplet>(aaa)|(aba))");
Matcher matcher = compile.matcher(test);
StringBuilder out = new StringBuilder();
int start = 0;
while (matcher.find(start)) {
String triplet = matcher.group("triplet");
switch (triplet) {
case "aaa":
out.append("0");
break;
case "aba":
out.append("1");
break;
}
start = matcher.end();
}
System.out.println(out.toString());
If you have "aaaaaba" as input (one "a" too many after the first triplet), it will skip the extra "a" and output "01". So any invalid characters between valid triplets will be ignored.
If you want to go through the string in blocks of 3, you can use a for-loop and the substring() function like this:
String test = "aaaabaaaaabaaaa";
StringBuilder out = new StringBuilder();
for (int i = 0; i < test.length() - 2; i += 3) {
String triplet = test.substring(i, i + 3);
switch (triplet) {
case "aaa":
out.append("0");
break;
case "aba":
out.append("1");
break;
}
}
System.out.println(out.toString());
In this case, if a triplet is invalid, it will just be ignored and neither a "0" nor a "1" will be added to the output. If you want to do something in this case, just add a default clause to the switch statement.
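As a rough sketch of what such a default clause could look like, the version below rejects invalid triplets with an exception. Throwing is just one possible policy, and the class and method names are made up for illustration. As in the loop above, leftover characters at the end (when the length is not a multiple of 3) are silently ignored.

```java
class TripletDecoder {
    // Decodes the input in blocks of 3: "aaa" -> 0, "aba" -> 1.
    static String decode(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length() - 2; i += 3) {
            String triplet = input.substring(i, i + 3);
            switch (triplet) {
                case "aaa":
                    out.append('0');
                    break;
                case "aba":
                    out.append('1');
                    break;
                default:
                    // Assumed policy: fail loudly instead of skipping the block.
                    throw new IllegalArgumentException(
                            "Invalid triplet \"" + triplet + "\" at index " + i);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("aaaabaaaaaba"));
    }
}
```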
Here's what I understand from your question:
The user string will be some sequence of the tokens "aaa" and "aba"
There will be no other combinations of 'a' and 'b'. For example, you will not get "aaabaa" as an input string, as "baa" is invalid.
For each consecutive 3 character string, replace "aaa" with 0 and "aba" with 1.
I'm guessing that this is a homework assignment designed to teach you about the dangers of catastrophic backtracking and how to carefully use quantifiers.
My suggestion would be to do this in two parts:
Identify and replace each 3-letter segment with a single character.
Replace those characters with the appropriate value. ('1' or '0')
For example, first construct a pattern like a([ab])a to capture the character ('a' or 'b') between two 'a's. Then, use the Matcher class's replaceAll method to replace each match with the captured character. So, for input "aaaabaaaaaba" you get "abab" as a result. Finally, replace all 'a' with '0' and all 'b' with '1'.
In Java:
// Create the matcher to identify triplets in the form "aaa" or "aba"
Matcher tripletMatcher = Pattern.compile("a([ab])a").matcher(inputString);
// Replace each triplet with the middle letter, then replace 'a' and 'b' properly.
String result = tripletMatcher.replaceAll("$1").replace('a', '0').replace('b', '1');
There are better ways of doing this, of course, but this should work. I've left the code intentionally dense and hard to read quickly. So, if this is a homework assignment, make sure you understand it fully and then rewrite it yourself.
Also, keep in mind that this will not work if the input string isn't a sequence of "aaa" and "aba". Any other combination, such as "baa" or "abb", will not throw an exception; the unmatched characters are simply left in place and then converted as well. For example, ababaa, aababa, and aaabab will all produce unexpected and potentially incorrect results.
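If you want to guard against such input, one option is to check the whole string first and only then do the replacement. The following is a sketch under that assumption; the class name, method name, and the choice to throw an exception are mine, not part of the approach above.

```java
import java.util.regex.Pattern;

class SafeTripletReplace {
    // The whole input must be zero or more repetitions of "aaa" or "aba".
    private static final Pattern VALID = Pattern.compile("(?:aaa|aba)*");
    // Captures the middle letter of each triplet.
    private static final Pattern TRIPLET = Pattern.compile("a([ab])a");

    static String convert(String input) {
        if (!VALID.matcher(input).matches()) {
            throw new IllegalArgumentException(
                    "Input is not a sequence of \"aaa\" and \"aba\": " + input);
        }
        // Replace each triplet with its middle letter, then map 'a' -> '0' and 'b' -> '1'.
        return TRIPLET.matcher(input).replaceAll("$1")
                .replace('a', '0')
                .replace('b', '1');
    }

    public static void main(String[] args) {
        System.out.println(convert("aaaabaaaaaba"));
    }
}
```

Since both tokens are exactly three characters long, any string accepted by the validation pattern splits cleanly into triplets, so the replacement step always sees aligned matches.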
