Java regex: find all whitespace in a string [duplicate]

This question already has answers here:
Whitespace Matching Regex - Java
(11 answers)
Regexp Java for password validation
(17 answers)
Closed 5 years ago.
I have seen numerous suggestions for a regex to find whitespace in a string, none of which have worked so far. Yes, looping through the string with a for loop would work, but I would really like to learn how to do this with a regex and Pattern/Matcher. My question: what do I need to add to my regex string, and where, so that find() returns false when the input contains whitespace? In the code below I have tried numerous variations of (\\s) to no avail. I do not want to remove the whitespace.
I tested the code from the suggested duplicate (see the link in the comments) and it does not work.
String tstr = "^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[$#$!%*?&])[A-Za-z\\d$#$!%*?&]";
String astr = etPW.getText().toString().trim();
Pattern regex = Pattern.compile(tstr);
Matcher regexMatcher = regex.matcher(astr);
boolean foundMatch = regexMatcher.find();
if(foundMatch == false){
Toast.makeText( MainActivity.this, "Password must have one Numeric Value\n"
+ "\nOne Upper & Lower Case Letters\n"
+ "\nOne Special Character $ # ! % * ? &", Toast.LENGTH_LONG ).show();
//etPW.setText("");
//etCPW.setText("");
// Two lines of code above are optional
// Also by design these fields can be set to input type Password in the XML file
etPW.requestFocus();
return ;
}

You can use negative lookahead to check for spaces:
^(?!.* )
^ - Start matching at the beginning of the string.
(?! - Begin a negative lookahead group (the pattern inside the parentheses must not come next).
.* and a space - Any number of non-newline characters, followed by a space.
) - Close the negative lookahead group.
Combined with the full regex pattern (also cleaned up a bit to remove redundancy):
^(?!.* )(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[!#$%&*?])[A-Za-z\\d!#$%&*?]+
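A minimal sketch of how the combined pattern might be used follows. The class and method names are hypothetical, and two things differ from the answer by assumption: `\\s` replaces the literal space so that tabs and newlines are rejected too, and `matches()` is used instead of `find()` so that every character of the password has to belong to the allowed set.

```java
import java.util.regex.Pattern;

final class PasswordCheck {
    // Assumption: same character rules as the answer, but \s instead of a literal space.
    private static final Pattern PASSWORD = Pattern.compile(
            "(?!.*\\s)(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[!#$%&*?])[A-Za-z\\d!#$%&*?]+");

    static boolean isValid(String candidate) {
        // matches() requires the whole input to match, so ^ and $ are not needed.
        return PASSWORD.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Abc1$xyz"));   // all rules satisfied
        System.out.println(isValid("Abc1$ xyz"));  // contains a space
        System.out.println(isValid("abc1$xyz"));   // no upper-case letter
    }
}
```

With `matches()`, the character class alone already rejects whitespace; the lookahead is kept only to mirror the answer and matters mainly when the pattern is used with `find()`.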

Related

Why \p{Digit} Pattern is not Working in Java? [duplicate]

This question already has answers here:
Java String - See if a string contains only numbers and not letters
(23 answers)
Closed 3 years ago.
I am using \p{Digit} to validate a String. However, when I use "101ᶁ1" the result is true. This happens with some symbols, such as ᶁ and ﻹ.
Pattern p = Pattern.compile("[\\p{Digit}]");
boolean result = p.matcher(value).find();
I didn't find the characters that are validated in the documentation.
I believe you misunderstood the usage of find(). It searches for the first occurrence of the regular expression anywhere in the searched text. (Matcher.start() returns the position where the expression was found.)
The expression "[\\p{Digit}]" - the [] do nothing here - matches just ONE digit. Since the searched text contains a digit, the result of find() is true.
To match the whole text, the expression must start with ^ to match the beginning of the text and end with $ to match the end of the text. It must also allow more than one digit, so it needs a + (one or more), resulting in
Pattern p = Pattern.compile("^\\p{Digit}+$");
boolean result = p.matcher(value).find();
matches() can be used to test against the whole text, so ^ and $ are not needed - still needs a + to allow more than one digit:
Pattern p = Pattern.compile("\\p{Digit}+");
boolean result = p.matcher(value).matches();
Note: this can be written as:
boolean result = value.matches("\\p{Digit}+");
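The difference between the two calls can be sketched as below. The class and method names are made up for illustration; the input is the string from the question, with the non-digit character written as a Unicode escape.

```java
import java.util.regex.Pattern;

final class DigitCheck {
    private static final Pattern DIGITS = Pattern.compile("\\p{Digit}+");

    // find(): true as soon as some substring of the input matches.
    static boolean containsDigit(String value) {
        return DIGITS.matcher(value).find();
    }

    // matches(): true only if the entire input matches.
    static boolean isAllDigits(String value) {
        return DIGITS.matcher(value).matches();
    }

    public static void main(String[] args) {
        String value = "101" + "\u1D81" + "1"; // digits with a non-digit letter in between
        System.out.println(containsDigit(value)); // true: "101" is found
        System.out.println(isAllDigits(value));   // false: the letter breaks the match
        System.out.println(isAllDigits("10111")); // true
    }
}
```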

Negative Look-Ahead assertion for multiline text [duplicate]

This question already has answers here:
How to use java regex to match a line
(2 answers)
Closed 4 years ago.
I'm looking for a way to check whether a multiline string (from a PDF) contains a certain letter combination that must not start with a specific prefix. Specifically, I'm trying to find strings that contain ARC but don't contain NON-ARC.
I found the great example "Regular expression for a string that does not start with a sequence", but it does not seem to work for my problem. With my pattern ^(?!NON\\-)ARC.* I get the expected result in a single-line test, but with real input the pattern fails to match a line that it should match. Here is what I did:
@Test
public void testRegexLookAhead() {
String strTestSimplePos = "ARC 0.1-1";
String strTestSimpleNeg = "NON-ARC 3.4-1";
String strTestRealPos = "HEADLINE\r\n" + "Subheader Author\r\n" + "ARC 0.1-1\r\n" + "20190211";
String strTestRealNeg = "HEADLINE\r\n" + "Subheader Author\r\n" + "NON-ARC 0.1-1\r\n" + "20190211";
//based on https://stackoverflow.com/questions/899422/regular-expression-for-a-string-that-does-not-start-with-a-sequence
String regexNoNON = "^(?!NON\\-)ARC.*";
Pattern noNONPatter = Pattern.compile(regexNoNON);
System.out.println(noNONPatter.matcher(strTestSimplePos).find()); //true OK
System.out.println(noNONPatter.matcher(strTestSimpleNeg).find()); //false OK
System.out.println(noNONPatter.matcher(strTestRealPos).find()); //false but should be true -> does not work as intended
System.out.println(noNONPatter.matcher(strTestRealNeg).find()); //false OK
}
Would be glad if anyone can point out what went wrong...
Edit: This was marked as a duplicate of "How to use java regex to match a line"; however, I didn't try to use a regex to match a line at all. I just needed a way to find a specific sequence (with negative look-ahead) in a multiline text input. One approach to solving the other question is also the solution to this one (compile the pattern with java.util.regex.Pattern.MULTILINE), but the questions are at best related.
Your input strings have multiple lines and you're using the caret, so you need to add the multi-line flag:
Pattern.compile(regexNoNON, java.util.regex.Pattern.MULTILINE);
About MULTILINE:
Enables multiline mode.
In multiline mode the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence.
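The effect of the flag can be checked with a small sketch that reuses the test strings from the question. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

final class MultilineDemo {
    static final String REGEX = "^(?!NON-)ARC.*";

    // flags is 0 for the default mode or Pattern.MULTILINE.
    static boolean hasArcLine(String text, int flags) {
        return Pattern.compile(REGEX, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String pos = "HEADLINE\r\nSubheader Author\r\nARC 0.1-1\r\n20190211";
        String neg = "HEADLINE\r\nSubheader Author\r\nNON-ARC 0.1-1\r\n20190211";

        System.out.println(hasArcLine(pos, 0));                 // false: ^ only matches at index 0
        System.out.println(hasArcLine(pos, Pattern.MULTILINE)); // true: ^ matches at the ARC line
        System.out.println(hasArcLine(neg, Pattern.MULTILINE)); // false: that line starts with NON-
    }
}
```

Because ^ already pins ARC to the start of a line, the lookahead is redundant in this exact pattern; it only starts to matter if something is allowed between ^ and ARC.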
Try this Regex:
HEADLINE(?:(?!HEADLINE)[\s\S])*(?<!NON-)ARC(?:(?!HEADLINE)[\s\S])*
Explanation:
HEADLINE - matches the word HEADLINE
(?:(?!HEADLINE)[\s\S])* - matches 0+ occurrences of any character (including line breaks) that is not the start of the word HEADLINE
(?<!NON-)ARC - matches the word ARC if it is not immediately preceded by NON-
(?:(?!HEADLINE)[\s\S])* - matches 0+ occurrences of any character (including line breaks) that is not the start of the word HEADLINE
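As a rough sketch, the pattern above can be written as a Java string (with the backslashes doubled) and run against the strings from the question. The class and method names are hypothetical, and the shorter lookbehind-only variant is an addition for comparison, not part of the answer.

```java
import java.util.regex.Pattern;

final class LookbehindDemo {
    // The regex from the answer, escaped for a Java string literal.
    static final Pattern BLOCK = Pattern.compile(
            "HEADLINE(?:(?!HEADLINE)[\\s\\S])*(?<!NON-)ARC(?:(?!HEADLINE)[\\s\\S])*");

    // Assumption: if the HEADLINE framing is not needed, the lookbehind alone is enough.
    static final Pattern ARC_ONLY = Pattern.compile("(?<!NON-)ARC");

    static boolean blockHasArc(String text) {
        return BLOCK.matcher(text).find();
    }

    static boolean hasPlainArc(String text) {
        return ARC_ONLY.matcher(text).find();
    }

    public static void main(String[] args) {
        String pos = "HEADLINE\r\nSubheader Author\r\nARC 0.1-1\r\n20190211";
        String neg = "HEADLINE\r\nSubheader Author\r\nNON-ARC 0.1-1\r\n20190211";
        System.out.println(blockHasArc(pos)); // true
        System.out.println(blockHasArc(neg)); // false
        System.out.println(hasPlainArc(pos)); // true
        System.out.println(hasPlainArc(neg)); // false
    }
}
```

Neither variant needs the MULTILINE flag, because no ^ or $ anchor is involved.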

Finding the character "^" using regex [duplicate]

This question already has an answer here:
Escaping special characters in java regex (not quoting)
(1 answer)
Closed 8 years ago.
I'm trying to define a regex pattern that searches for a caret character, but since ^ is used for negation, I'm not sure how to define the pattern. I'm trying to make the program find a string that is a letter then a caret then a number (as you may have guessed, this is a mathematical term), such as "x^23". This is the line I tried:
String caseFour = "[a-zA-Z]" + "^" + "\\d+";
It's not working. Can anyone help me out?
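You need to escape that character too, since it has a special meaning: outside a character class, ^ is the start-of-input anchor, so your pattern asks for a letter followed by the beginning of the input, which can never match.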
String regex = "[a-zA-Z]\\^\\d+";
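A short sketch comparing the unescaped pattern from the question with the escaped one follows. The class name is hypothetical, and the `Pattern.quote` variant is an extra option not mentioned in the answer.

```java
import java.util.regex.Pattern;

final class CaretDemo {
    // Pattern from the question: ^ acts as a start-of-input anchor here.
    static final Pattern UNESCAPED = Pattern.compile("[a-zA-Z]" + "^" + "\\d+");
    // Pattern from the answer: \\^ matches a literal caret.
    static final Pattern ESCAPED = Pattern.compile("[a-zA-Z]\\^\\d+");
    // Alternative: let Pattern.quote produce the literal.
    static final Pattern QUOTED = Pattern.compile("[a-zA-Z]" + Pattern.quote("^") + "\\d+");

    public static void main(String[] args) {
        String term = "x^23";
        System.out.println(UNESCAPED.matcher(term).find());  // false
        System.out.println(ESCAPED.matcher(term).matches()); // true
        System.out.println(QUOTED.matcher(term).matches());  // true
    }
}
```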

Writing a regex to detect repeat-characters [duplicate]

This question already has answers here:
Regex to match the longest repeating substring
(5 answers)
Closed 9 years ago.
I need to write a regex that identifies a word that has a repeating character set at the end. In the following code fragment, the repeating character set is An. I need to write a regex so that this is spotted and displayed.
In the following code, \\w matches any word character (letters, digits, and the underscore), but I only want to identify English characters.
String stringToMatch = "IranAnAn";
Pattern p = Pattern.compile("(\\w)\\1+");
Matcher m = p.matcher(stringToMatch);
if (m.find())
{
System.out.println("Word contains duplicate characters " + m.group(1));
}
UPDATE
Word contains duplicate characters a
Word contains duplicate characters a
Word contains duplicate characters An
You want to catch as many characters in your set as possible, so instead of (\\w) you should use (\\w+) and you want the sequence to be at the end, so you need to add $ (and I have removed the + after \\1 which is not useful to detect repetition: only one repetition is needed):
Pattern p = Pattern.compile("(\\w+)\\1$");
Your program then outputs An as expected.
Finally, if you only want to capture ascii characters, you can use [a-zA-Z] instead of \\w:
Pattern p = Pattern.compile("([a-zA-Z]+)\\1$");
And if you want the character set to be at least 2 characters:
Pattern p = Pattern.compile("([a-zA-Z]{2,})\\1$");
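The ASCII-only variant can be wrapped in a small helper, sketched below. The class and method names are hypothetical; the pattern is the one shown above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepeatedSuffix {
    // One or more ASCII letters, immediately repeated once, at the end of the input.
    private static final Pattern REPEATED = Pattern.compile("([a-zA-Z]+)\\1$");

    // Returns the repeated unit, or an empty Optional if the word has no such ending.
    static Optional<String> repeatedSuffix(String word) {
        Matcher m = REPEATED.matcher(word);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(repeatedSuffix("IranAnAn")); // Optional[An]
        System.out.println(repeatedSuffix("Hello"));    // Optional.empty
    }
}
```

The back-reference is case-sensitive, so "anAn" does not count as a repetition; compiling with `Pattern.CASE_INSENSITIVE` would change that.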
If by "only English characters" you mean A-Z and a-z, the following regex will work:
".*([A-Za-z]{2,})\\1$"

How to write a regex that prevents partial matching [duplicate]

This question already has answers here:
Regex whitespace word boundary
(3 answers)
Closed 2 years ago.
How do I build a regex pattern that searches a text T for a search string S?
There are 2 requirements:
S could be made of any character.
S could be anywhere in the string but can't be part of a word.
I know that in order to escape special regex characters I put the search string between \Q and \E, like this:
\QMySearch_String\E
How do I prevent finding partial matching of S in T?
You can do it like this if "can't be part of a word" is interpreted as "preceded by start-of-string or a space and followed by end-of-string or a space":
String s = "3894$75\\/^()";
String text = "fdsfsd3894$75\\/^()dasdasd 22348 3894$75\\/^()";
Matcher m = Pattern.compile("(?<=^|\\s)\\Q" + s + "\\E(?=\\s|$)").matcher(text);
while (m.find()) {
System.out.println("Found match! :'" + m.group() + "'");
}
This prints only one match:
Found match! :'3894$75\/^()'
I think what you're trying to find can be easily solved with lookaheads and lookbehinds. Take a look at this for a good explanation.
Then there's a bit of flip-flopping booleans, but you're looking ahead and behind for NOT Non-Space characters (\S). You don't want to look for space characters only because S might be at the start or end of the string. Like so:
(?<!\S)S(?!\S)
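A minimal sketch of this approach follows, with hypothetical class and method names. It uses `Pattern.quote` rather than hand-written \Q and \E, on the assumption that the search string may itself contain \E.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WholeTokenSearch {
    // Returns the start index of every occurrence of search that is not
    // directly preceded or followed by a non-whitespace character.
    static List<Integer> findStandalone(String text, String search) {
        Pattern p = Pattern.compile("(?<!\\S)" + Pattern.quote(search) + "(?!\\S)");
        Matcher m = p.matcher(text);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String s = "3894$75\\/^()";
        String text = "fdsfsd3894$75\\/^()dasdasd 22348 3894$75\\/^()";
        // Only the last occurrence stands alone; the first is glued to other characters.
        System.out.println(findStandalone(text, s));
    }
}
```

The negative lookarounds succeed at the start and end of the input as well as next to whitespace, so no separate ^ or $ alternative is needed.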
