Why p{Digit} Pattern is not Working in Java? [duplicate] - java

This question already has answers here:
Java String - See if a string contains only numbers and not letters
(23 answers)
Closed 3 years ago.
I am using \p{Digit} to validate a String. However, when I use "101ᶁ1" the result is true. This happens with some symbols, such as ᶁ and ﻹ.
Pattern p = Pattern.compile("[\\p{Digit}]");
boolean result = p.matcher(value).find();
I didn't find the characters that are validated in the documentation.

I believe you misunderstood the usage of find(). It searches for the first occurrence of the regular expression anywhere in the searched text. (Matcher.start() returns the position where the match was found.)
The expression "[\\p{Digit}]" - the [] do nothing here - is just matching ONE digit. Since the searched text has a digit, the result of find() is true.
To match the whole text, the expression must start with ^ to match the beginning of the text and end with $ to match the end of the text. It must also allow more than one digit, so it needs a + (one or more), resulting in:
Pattern p = Pattern.compile("^\\p{Digit}+$");
boolean result = p.matcher(value).find();
matches() can be used to test against the whole text, so ^ and $ are not needed - still needs a + to allow more than one digit:
Pattern p = Pattern.compile("\\p{Digit}+");
boolean result = p.matcher(value).matches();
Note: this can also be written as:
boolean result = value.matches("\\p{Digit}+");
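To make the difference concrete, here is a minimal sketch that runs both checks against the string from the question. The class and method names (DigitCheck, containsDigit, isAllDigits) are made up for illustration; only Pattern, Matcher.find() and Matcher.matches() come from java.util.regex.

```java
import java.util.regex.Pattern;

class DigitCheck {
    // Without the UNICODE_CHARACTER_CLASS flag, \p{Digit} is the POSIX class for ASCII 0-9.
    private static final Pattern ANY_DIGIT = Pattern.compile("\\p{Digit}");
    private static final Pattern ALL_DIGITS = Pattern.compile("\\p{Digit}+");

    // True if at least one digit occurs somewhere in the value (what the question's code tested).
    static boolean containsDigit(String value) {
        return ANY_DIGIT.matcher(value).find();
    }

    // True only if the whole value consists of one or more digits.
    static boolean isAllDigits(String value) {
        return ALL_DIGITS.matcher(value).matches();
    }

    public static void main(String[] args) {
        // U+1D81 is the hooked d from the question, written as an escape to avoid encoding issues.
        String value = "101" + "\u1D81" + "1";
        System.out.println(containsDigit(value)); // true: find() stops at the first digit
        System.out.println(isAllDigits(value));   // false: the letter breaks the full match
        System.out.println(isAllDigits("10111")); // true
    }
}
```

So the odd symbols were never accepted as digits; find() simply succeeded on the first "1".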

Related

How to extract sub string by matching the known set of keyword(s) [duplicate]

This question already has answers here:
Regex match one of two words
(2 answers)
Closed 2 years ago.
I am trying to extract the substring after a particular code, for example:
String sample1 = "/ASDF/096/GHJKL/WER/WER/dv/7906/CODEM/TEAR1331927498xxxxxx/YUII/OPL";
String sample2 = "/CODEM/TEAR1331927498xxxxxx";
String regExpresssion = "[/CODEM/]{6}(^[a-zA-Z0-9|\\s])?";
final Pattern pattern = Pattern.compile(regExpresssion);
final Matcher matcher = pattern.matcher(sample1);
if (matcher.find()) {
String subStringOut = sample1.substring(matcher.end());
}
subStringOut for sample 1 > TEAR1331927498xxxxxx/YUII/OPL
subStringOut for sample 2 > TEAR1331927498xxxxxx
The above code works fine, but now I need to add one more identifier, '/CODER/', to the regex for the sample below:
String sample3 = "/ASDF/096/GHJKL/WER/WER/dv/7906/CODER/TEAR1331927498xxxxxx/YUII/OPL";
I have tried
String regExpresssion = "[/CODEM/|/CODER/]{6}(^[a-zA-Z0-9|\\s])?";
but it is not working. Any suggestions guys?
Thanks!!
Try replacing [/CODEM/|/CODER/]{6} with /CODE[RM]/
I think you meant to match the entire phrase /CODEM/ or /CODER/, but as written the pattern accepts any 6-character sequence made up of those characters. I'm not entirely sure, though. Square brackets denote a "character class", which matches only a single character; to match a sequence of characters, write them in order and use parentheses to group alternatives. Also, the second part does not make sense to me because the caret (^) sits in the middle of the pattern, and outside a character class it matches the beginning of the input (or of a line in multiline mode).
You just need a single lookbehind assertion.
Try (?<=/CODE[MR]/).*
The demo was made with PCRE, but the pattern works in Java in this case.
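Both suggestions can be tried side by side. This is only a sketch: CodeExtractor and its two methods are hypothetical names, and the sample strings are the ones from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CodeExtractor {
    // Literal /CODE, then a character class that accepts either M or R, then a slash.
    private static final Pattern MARKER = Pattern.compile("/CODE[MR]/");
    // Lookbehind variant: matches everything that follows the marker.
    private static final Pattern AFTER_MARKER = Pattern.compile("(?<=/CODE[MR]/).*");

    // Returns the text after the first marker, or null if no marker is present.
    static String afterMarker(String input) {
        Matcher m = MARKER.matcher(input);
        return m.find() ? input.substring(m.end()) : null;
    }

    // Same result, but the regex itself captures the remainder.
    static String afterMarkerLookbehind(String input) {
        Matcher m = AFTER_MARKER.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String sample1 = "/ASDF/096/GHJKL/WER/WER/dv/7906/CODEM/TEAR1331927498xxxxxx/YUII/OPL";
        String sample3 = "/ASDF/096/GHJKL/WER/WER/dv/7906/CODER/TEAR1331927498xxxxxx/YUII/OPL";
        System.out.println(afterMarker(sample1));           // TEAR1331927498xxxxxx/YUII/OPL
        System.out.println(afterMarkerLookbehind(sample3)); // TEAR1331927498xxxxxx/YUII/OPL
    }
}
```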

Negative Look-Ahead assertion for multiline text [duplicate]

This question already has answers here:
How to use java regex to match a line
(2 answers)
Closed 4 years ago.
I'm looking for a way to check whether a multiline string (from a PDF) contains a certain letter combination that must not start with a specific prefix. Specifically, I'm trying to find strings that contain ARC but don't contain NON-ARC.
I found this great example, "Regular expression for a string that does not start with a sequence", but it does not seem to work for my problem. With my pattern ^(?!NON\\-)ARC.* I get the expected result in a single-line test, but with real input the pattern fails to find a match where it should. Here is what I did:
@Test
public void testRegexLookAhead() {
String strTestSimplePos = "ARC 0.1-1";
String strTestSimpleNeg = "NON-ARC 3.4-1";
String strTestRealPos = "HEADLINE\r\n" + "Subheader Author\r\n" + "ARC 0.1-1\r\n" + "20190211";
String strTestRealNeg = "HEADLINE\r\n" + "Subheader Author\r\n" + "NON-ARC 0.1-1\r\n" + "20190211";
//based on https://stackoverflow.com/questions/899422/regular-expression-for-a-string-that-does-not-start-with-a-sequence
String regexNoNON = "^(?!NON\\-)ARC.*";
Pattern noNONPatter = Pattern.compile(regexNoNON);
System.out.println(noNONPatter.matcher(strTestSimplePos).find()); //true OK
System.out.println(noNONPatter.matcher(strTestSimpleNeg).find()); //false OK
System.out.println(noNONPatter.matcher(strTestRealPos).find()); //false but should be true -> does not work as intended
System.out.println(noNONPatter.matcher(strTestRealNeg).find()); //false OK
}
Would be glad if anyone can point out what went wrong...
Edit: This was marked as a duplicate of "How to use java regex to match a line"; however, I didn't try to use a regex to match a line at all. I just needed a way to find a specific sequence (with negative look-ahead) in a multiline text input. One approach to solving the other question is also the solution to this one (compile the pattern with java.util.regex.Pattern.MULTILINE), but the questions are at best related.
Your input strings have multiple lines and you're using the caret, you need to add the multi-line flag:
Pattern.compile(regexNoNON, java.util.regex.Pattern.MULTILINE);
About MULTILINE:
Enables multiline mode.
In multiline mode the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence.
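As a rough check of the flag's effect, the sketch below compiles the question's pattern with Pattern.MULTILINE and runs it against the four test strings. ArcFinder and containsArcLine are illustrative names, not part of the original test.

```java
import java.util.regex.Pattern;

class ArcFinder {
    // With MULTILINE, ^ also matches right after each line terminator, not just at the input start.
    private static final Pattern ARC_LINE =
            Pattern.compile("^(?!NON\\-)ARC.*", Pattern.MULTILINE);

    // True if any line of the text starts with ARC.
    static boolean containsArcLine(String text) {
        return ARC_LINE.matcher(text).find();
    }

    public static void main(String[] args) {
        String realPos = "HEADLINE\r\n" + "Subheader Author\r\n" + "ARC 0.1-1\r\n" + "20190211";
        String realNeg = "HEADLINE\r\n" + "Subheader Author\r\n" + "NON-ARC 0.1-1\r\n" + "20190211";
        System.out.println(containsArcLine("ARC 0.1-1"));     // true
        System.out.println(containsArcLine("NON-ARC 3.4-1")); // false
        System.out.println(containsArcLine(realPos));         // true
        System.out.println(containsArcLine(realNeg));         // false
    }
}
```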
Try this Regex:
HEADLINE(?:(?!HEADLINE)[\s\S])*(?<!NON-)ARC(?:(?!HEADLINE)[\s\S])*
Explanation:
HEADLINE - matches the word HEADLINE
(?:(?!HEADLINE)[\s\S])* - matches 0+ characters (including newlines), stopping before any position where the word HEADLINE begins
(?<!NON-)ARC - matches the word ARC if it is not immediately preceded by NON-
(?:(?!HEADLINE)[\s\S])* - again matches 0+ characters (including newlines), stopping before any position where the word HEADLINE begins

Find phoneNumbers in text with regex [duplicate]

This question already has answers here:
What do ^ and $ mean in a regular expression?
(2 answers)
Closed 4 years ago.
I have a whole text in a string and I want to find all Belgian cell phone numbers.
So I wrote this piece of code:
Pattern cellPhoneRegex = Pattern.compile("^((\\+|00)32\\s?|0)4(60|[789]\\d)(\\s?\\d{2}){3}$");
List<String> cellPhoneList = new ArrayList<>();
Matcher cellPhoneMatches = cellPhoneRegex.matcher("+32495715511");
while (cellPhoneMatches.find()) {
cellPhoneList.add(cellPhoneMatches.group());
}
System.out.println(cellPhoneList);
Now the thing is that when you run this it matches the phone number.
But when the same number is in a huge text it doesn't find anything.
For this string "Tel: +32495715511" there are no matches.
I don't see why it's not matching.
Exactly what @Thefourthbird said. Your regex is looking for an exact match: the text has to start with (^ means "starts with" in this example) and end with ($ means "ends with" in this example) the phone number matching the regex.
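A minimal sketch of that fix, assuming it is enough to drop the ^ and $ anchors and keep the rest of the question's expression unchanged (PhoneFinder and findAll are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneFinder {
    // Same expression as in the question, minus the ^ and $ anchors.
    private static final Pattern CELL_PHONE =
            Pattern.compile("((\\+|00)32\\s?|0)4(60|[789]\\d)(\\s?\\d{2}){3}");

    // Collects every substring of the text that matches the phone pattern.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = CELL_PHONE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("Tel: +32495715511"));
        System.out.println(findAll("Tel: +32495715511, mobile 0478 12 34 56"));
    }
}
```

Without anchors the pattern can also match inside a longer run of digits, so real input may need extra guards around it.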
Try using this
var telephone = /\(?s?+?32s?\)?s?[789]d{8,}/;
I’ve not tried it before.

java regex find all whitespace in a string [duplicate]

This question already has answers here:
Whitespace Matching Regex - Java
(11 answers)
Regexp Java for password validation
(17 answers)
Closed 5 years ago.
I have seen numerous suggestions for a regex to find whitespace in a string, none of which have worked so far. Yes, looping through the string with a for loop would work, but I would really like to learn how to do this with regex and Pattern/Matcher. My question is: what do I need to add to my regex string, and where, so that it returns FALSE when the input contains whitespace? In the code below I have tried adding numerous incarnations of (\\s) to no avail. I do not want to remove the whitespace.
I tested the code suggested as a duplicate and it does not work; see the link suggested in the comments.
String tstr = "^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[$#$!%*?&])[A-Za-z\\d$#$!%*?&]";
String astr = etPW.getText().toString().trim();
Pattern regex = Pattern.compile(tstr);
Matcher regexMatcher = regex.matcher(astr);
boolean foundMatch = regexMatcher.find();
if(foundMatch == false){
Toast.makeText( MainActivity.this, "Password must have one Numeric Value\n"
+ "\nOne Upper & Lower Case Letters\n"
+ "\nOne Special Character $ # ! % * ? &", Toast.LENGTH_LONG ).show();
//etPW.setText("");
//etCPW.setText("");
// Two lines of code above are optional
// Also by design these fields can be set to input type Password in the XML file
etPW.requestFocus();
return ;
}
You can use negative lookahead to check for spaces:
^(?!.* )
^ - Start matching at the beginning of the string.
(?! - Begin a negative lookahead group (the pattern inside the parentheses must not come next).
.* followed by a space - Any non-newline character any number of times, followed by a space.
) - Close the negative lookahead group.
Combined with the full regex pattern (also cleaned up a bit to remove redundancy):
^(?!.* )(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[!#$%&*?])[A-Za-z\\d!#$%&*?]+
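Here is a small sketch of how the combined pattern behaves with find(), as in the question's code. PasswordCheck and isValid are illustrative names, and the sample passwords are invented.

```java
import java.util.regex.Pattern;

class PasswordCheck {
    // Combined pattern from the answer: no space anywhere, plus one lower-case letter,
    // one upper-case letter, one digit and one special character.
    private static final Pattern PASSWORD = Pattern.compile(
            "^(?!.* )(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[!#$%&*?])[A-Za-z\\d!#$%&*?]+");

    // True if the pattern matches at the start of the candidate.
    static boolean isValid(String candidate) {
        return PASSWORD.matcher(candidate).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Abc1!def"));  // true
        System.out.println(isValid("Abc 1!def")); // false: contains a space
        System.out.println(isValid("abc1!def"));  // false: no upper-case letter
    }
}
```

To reject tabs and other whitespace as well, the lookahead could presumably be written as (?!.*\\s) instead of using a literal space.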

Why does Java regex "matches" vs "find" get a different match when using non-greedy pattern? [duplicate]

This question already has answers here:
Difference between matches() and find() in Java Regex
(5 answers)
Closed 1 year ago.
So I ran into a bug caused by expecting the matches() method to find exactly the same match as find(). Normally this is the case, but it appears that if a non-greedy pattern can be stretched to accept the whole string, that is allowed. This seems like a bug in Java. Am I wrong? I don't see anything in the docs that indicates this behavior.
Pattern stringPattern = Pattern.compile("'.*?'");
String nonSingleString = "'START'===stageType?'active':''";
Matcher m1 = stringPattern.matcher(nonSingleString);
boolean matchesCompleteString = m1.matches();
System.out.println("Matches complete string? " + matchesCompleteString);
System.out.println("What was the match? " + m1.group()); //group() gets the string that matched
Matcher m2 = stringPattern.matcher(nonSingleString);
boolean foundMatch = m2.find(); //this looks for the next match
System.out.println("Found a match in at least part of the string? " + foundMatch);
System.out.println("What was the match? " + m2.group());
Outputs
Matches complete string? true
What was the match? 'START'===stageType?'active':''
Found a match in at least part of the string? true
What was the match? 'START'
This makes perfect sense.
The matches(...) method must attempt to consume the whole string, so it does, even with a non-greedy pattern.
The find(...) method may find a substring, so it stops as soon as it finds any matching substring.
They are supposed to be different. Matcher#matches attempts to match the complete input string using the implicit anchors ^ and $ around your regex, whereas Matcher#find matches whatever your regex can match.
As per Javadoc:
public boolean matches()
Attempts to match the entire region
against the pattern. If the match succeeds then more information can
be obtained via the start, end, and group methods.
and
public boolean find()
Attempts to find the next subsequence of the input sequence that matches the pattern.
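If the goal was to test whether the input is one single quoted string, one possible approach (an assumption about the intent, not something stated in the question) is to forbid quotes inside the literal instead of relying on laziness. QuotedString and its methods are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedString {
    // Lazy pattern from the question.
    private static final Pattern LAZY = Pattern.compile("'.*?'");
    // Alternative: the body may not contain a quote, so it cannot stretch over several literals.
    private static final Pattern NO_INNER_QUOTE = Pattern.compile("'[^']*'");

    // matches() forces the lazy quantifier to expand until the whole input is consumed.
    static boolean lazyMatchesWhole(String s) {
        return LAZY.matcher(s).matches();
    }

    // find() returns the first (shortest) match, or null if there is none.
    static String firstMatch(String s) {
        Matcher m = LAZY.matcher(s);
        return m.find() ? m.group() : null;
    }

    // True only if the whole input is exactly one quoted string.
    static boolean isSingleQuoted(String s) {
        return NO_INNER_QUOTE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String input = "'START'===stageType?'active':''";
        System.out.println(lazyMatchesWhole(input));   // true
        System.out.println(firstMatch(input));         // 'START'
        System.out.println(isSingleQuoted(input));     // false
        System.out.println(isSingleQuoted("'START'")); // true
    }
}
```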
