Java regular expression match two same number - java

I want to use RE to match the file paths like below:
../90804/90804_0.jpg
../89246/89246_8.jpg
../89247/89247_14.jpg
Currently, I use the code as below to match:
Pattern r = Pattern.compile("^(.*?)[/](\\d+?)[/](\\d+?)[_](\\d+?).jpg$");
Matcher m = r.matcher(file_path);
But I found that it also produces an unexpected match for paths like:
../90804/89246_0.jpg
Is it possible in a regex to require the same number to appear twice?

You may use a \2 backreference instead of the second \d+ here:
s.matches("(.*?)/(\\d+)/(\\2)_(\\d+)\\.jpg")
See the regex demo. Note that if you use the matches method, you don't need the ^ and $ anchors, because matches requires the whole string to match.
Details
(.*?) - Group 1: any 0+ chars other than line break chars as few as possible
/ - a slash
(\\d+) - Group 2: one or more digits
/ - a slash
(\\2) - Group 3: the same value as in Group 2
_ - an underscore
(\\d+) - Group 4: one or more digits
\\.jpg - a literal .jpg (the dot is escaped so it does not match any char).
Java demo:
Pattern r = Pattern.compile("(.*?)/(\\d+)/(\\2)_(\\d+)\\.jpg");
Matcher m = r.matcher(file_path);
if (m.matches()) {
System.out.println("Match found");
System.out.println(m.group(1));
System.out.println(m.group(2));
System.out.println(m.group(3));
System.out.println(m.group(4));
}
Output:
Match found
..
90804
90804
0
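As a quick check that the backreference also rejects the mismatched path from the question, here is a minimal sketch. The class and method names (SameNumberPath, isConsistent) are made up for illustration; only the pattern comes from the answer above.

```java
import java.util.regex.Pattern;

class SameNumberPath {
    // Pattern from the answer: \2 must repeat exactly what group 2 captured.
    private static final Pattern PATH =
            Pattern.compile("(.*?)/(\\d+)/(\\2)_(\\d+)\\.jpg");

    // Hypothetical helper: true when the directory number and the
    // file-name prefix are the same number.
    static boolean isConsistent(String filePath) {
        // matches() requires the whole string to match, so no anchors are needed.
        return PATH.matcher(filePath).matches();
    }

    public static void main(String[] args) {
        System.out.println(isConsistent("../90804/90804_0.jpg"));  // same number twice
        System.out.println(isConsistent("../89247/89247_14.jpg")); // same number twice
        System.out.println(isConsistent("../90804/89246_0.jpg"));  // numbers differ
    }
}
```

The first two calls should report a match and the last one should not, since 89246 is not a repeat of 90804.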

You can use this regex with a capture group and back-reference of the same:
(\d+)/\1
Equivalent Java regex string will be:
final String regex = "(\\d+)/\\1";
Details:
(\d+): Match 1+ digits and capture it in group #1
/: Match literal /
\1: Using back-reference #1, match same number as in group #1
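One caveat worth sketching: used with find(), the short pattern is unanchored, so it can also match a partial number (the 34 inside 1234 followed by /34). The guarded variant below is an assumption of mine, not part of the answer: it adds a lookbehind so the number cannot start in the middle of a digit run, and a trailing _ so the repeated number cannot be just a prefix. The class name BackrefFind is hypothetical.

```java
import java.util.regex.Pattern;

class BackrefFind {
    // Unanchored pattern from the answer above.
    static final Pattern PLAIN = Pattern.compile("(\\d+)/\\1");

    // Assumed hardening: the captured number must not be preceded by a digit,
    // and the repeated number must be followed by an underscore.
    static final Pattern GUARDED = Pattern.compile("(?<!\\d)(\\d+)/\\1_");

    public static void main(String[] args) {
        String good = "../90804/90804_0.jpg";
        String partial = "../1234/34_0.jpg";

        System.out.println(PLAIN.matcher(good).find());      // full number repeated
        System.out.println(PLAIN.matcher(partial).find());   // matches "34/34" inside the path
        System.out.println(GUARDED.matcher(good).find());    // still found
        System.out.println(GUARDED.matcher(partial).find()); // partial overlap rejected
    }
}
```

If the whole path is validated with matches(), as in the earlier answer, this issue does not arise.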

This regex, ^(.*)\/(\d+?)\/(\d+?)_(\d+?)\.jpg$, matches strings like these:
../90804/90804_0.jpg
../89246/89246_8.jpg
../89247/89247_14.jpg
and splits each one into 4 groups. Note that it does not check that the two numbers are equal.

Related

String replacement when regex reverse group is null in java

I want to convert a software version number into a github tag name by regular expression.
For example, the version of ognl is usually 3.2.1. What I want is the tag name OGNL_3_2_1
So we can use the String::replaceAll(String regex, String replacement) method like this:
"3.2.1".replaceAll("(\\d+).(\\d+).(\\d+)", "OGNL_$1_$2_$3")
And we can get the tag name OGNL_3_2_1 easily.
But when it comes to 3.2, I want the regex to keep working, so I changed it to (\d+).(\d+)(?:.(\d+))?.
When I run the code again, I get OGNL_3_2_ rather than OGNL_3_2. The trailing underscore _ is not what I want. It appears because group $3 did not take part in the match, so it is replaced with an empty string.
So how can I write a suitable replacement to handle this case?
When the group for $3 is null, the underscore _ should disappear.
Thanks for your help !!!
You can make the last . + digits part optional by enclosing it in an optional non-capturing group, and pass a lambda as the replacement argument to Matcher.replaceAll (available in Java 9 and later):
String regex = "(\\d+)\\.(\\d+)(?:\\.(\\d+))?";
Pattern p = Pattern.compile(regex);
String s="3.2.1";
Matcher m = p.matcher(s);
String result = m.replaceAll(x ->
x.group(3) != null ? "OGNL_" + x.group(1) + "_" + x.group(2) + "_" + x.group(3) :
"OGNL_" + x.group(1) + "_" + x.group(2) );
System.out.println(result);
See the Java demo.
The (\d+)\.(\d+)(?:\.(\d+))? pattern (note that literal . are escaped) matches and captures into Group 1 any one or more digits, then matches a dot, then captures one or more digits into Group 2 and then optionally matches a dot and digits (captured into Group 3). If Group 3 is not null, add the _ and Group 3 value, else, omit this part when building the final replacement value.
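The same idea can be wrapped in a small helper and tried on both version shapes. This is only a sketch: the class and method names (VersionTag, toTag) are invented here, and it assumes Java 9+ for Matcher.replaceAll with a function argument.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionTag {
    // Third number is optional; group 3 is null when it is absent.
    private static final Pattern VERSION =
            Pattern.compile("(\\d+)\\.(\\d+)(?:\\.(\\d+))?");

    // Hypothetical helper: "3.2.1" -> "OGNL_3_2_1", "3.2" -> "OGNL_3_2".
    static String toTag(String version) {
        Matcher m = VERSION.matcher(version);
        // Requires Java 9+: replaceAll(Function<MatchResult, String>).
        return m.replaceAll(r -> {
            String tag = "OGNL_" + r.group(1) + "_" + r.group(2);
            // Append the third part only when the optional group matched.
            return r.group(3) != null ? tag + "_" + r.group(3) : tag;
        });
    }

    public static void main(String[] args) {
        System.out.println(toTag("3.2.1")); // OGNL_3_2_1
        System.out.println(toTag("3.2"));   // OGNL_3_2
    }
}
```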

Reg-ex to match statsD Format

I am using the following reg-ex to match StatsD data format -
^[\w.]+:.+\|.\|#(?:[\w.]+:[^,\n]+(?:,|$))*$
This satisfies any of the following formats -
performance.os.disk:1099511627776|g|#region:us-west-1,datacenter:us-west-1a
or
performance.os.disk:1099511627776|g|#
or
performance.os.disk:1099511627776|g|#region:us-west-1
But I am unable to match it against -
datastore.reads:9876|ms
Any help?
RegEx 101 to try - https://regex101.com/r/H8vQTa/1/
You may use
^[\w.]+:[^|]+\|[^|]+(?:\|#(?:[\w.]+:[^,\n]+(?:,|$))*)?$
               ^^^^^^^^                             ^^
See the regex demo
The point is that your . matches exactly one character between the two |s. I suggest matching one or more characters other than | there, and making the rest optional by wrapping \|#(?:[\w.]+:[^,\n]+(?:,|$))* in an optional non-capturing group, (?:...)?.
Details
^ - start of string
[\w.]+ - 1+ word or . chars
: - a colon
[^|]+ - a negated character class matching 1+ non-| chars
\| - a | char
[^|]+ - 1+ chars other than |
(?:\|#(?:[\w.]+:[^,\n]+(?:,|$))*)? - an optional non-capturing group matching 1 or 0 occurrences of
\|# - |# substring
(?:[\w.]+:[^,\n]+(?:,|$))* - 0 or more consecutive repetitions of
[\w.]+: - 1+ word or . chars and then :
[^,\n]+ - 1+ chars other than LF (I guess it is used for debug purposes here) and ,
(?:,|$) - , or end of string
$ - end of string.
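For readers who want to try the pattern from Java rather than regex101, here is a minimal sketch that runs it against the sample lines from the question. The class and method names (StatsdLine, isValid) are hypothetical; the pattern is the one above with backslashes doubled for a Java string literal.

```java
import java.util.regex.Pattern;

class StatsdLine {
    // Same pattern as above; the tag section after the type is optional.
    private static final Pattern LINE = Pattern.compile(
            "^[\\w.]+:[^|]+\\|[^|]+(?:\\|#(?:[\\w.]+:[^,\\n]+(?:,|$))*)?$");

    // Hypothetical helper: true when the whole line has the expected shape.
    static boolean isValid(String line) {
        return LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "performance.os.disk:1099511627776|g|#region:us-west-1,datacenter:us-west-1a",
            "performance.os.disk:1099511627776|g|#",
            "performance.os.disk:1099511627776|g|#region:us-west-1",
            "datastore.reads:9876|ms",
            "datastore.reads" // no value or type, should be rejected
        };
        for (String s : samples) {
            System.out.println(isValid(s) + "  " + s);
        }
    }
}
```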

Regex: Match any word that is not the one defined by regex

I want to extract the words between the two bracket "blocks" and also the word in first brackets (RUNNING or STOPPED).
Example (extract the bolded part):
[ **RUNNING** ] **My First Application** [Pid: 4194]
[ **RUNNING** ] **Second app (some data)** [Pid: 5248]
[ **STOPPED** ] **Logger App**
So, as you can see, the [Pid: X] part is optional. I can write the regex as follows:
\[\s+(RUNNING|STOPPED)\s+\]\s+([^\[]+).*
and it will work. But this fails if the app name contains the '[' character. I tried the following, but it does not work:
\[\s+(RUNNING|STOPPED)\s+\]\s+(?!\[Pid)+.*
My idea was to match any words/characters that are not starting with "[Pid", but I guess this would match any words that are not followed by "[Pid".
Is there any way to do exactly that: match any text that is not "[Pid", i.e. match everything up to the first occurrence of the "[Pid" substring?
You may use
\[\s+(RUNNING|STOPPED)\s+\]\s+([^\[]*(?:\[(?!Pid:)[^\[]*)*)
See the regex demo
Details:
\[ - a literal [
\s+ - 1+ whitespaces
(RUNNING|STOPPED) - Group 1 capturing either RUNNING or STOPPED
\s+ - 1+ whitespaces
\] - a literal ]
\s+ - 1 or more whitespaces
([^\[]*(?:\[(?!Pid:)[^\[]*)*) - Group 2 capturing:
[^\[]* - zero or more chars other than [
(?:\[(?!Pid:)[^\[]*)* - zero or more sequences of:
\[(?!Pid:) - a [ not followed with Pid:
[^\[]* - zero or more chars other than [.
Java code:
String rx = "\\[\\s+(RUNNING|STOPPED)\\s+\\]\\s+([^\\[]*(?:\\[(?!Pid:)[^\\[]*)*)";
Pattern p = Pattern.compile(rx);
Matcher m = p.matcher("[ RUNNING ] My First Application");
if (m.find()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
}
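To see how the pattern behaves when the app name itself contains a '[', here is a small sketch built on the same regex. The names AppStatusLine and parse are invented for illustration, and the trim() call is my addition, since group 2 keeps the space that precedes [Pid.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppStatusLine {
    private static final Pattern LINE = Pattern.compile(
            "\\[\\s+(RUNNING|STOPPED)\\s+\\]\\s+([^\\[]*(?:\\[(?!Pid:)[^\\[]*)*)");

    // Hypothetical helper: returns {status, appName}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        // Group 2 stops right before "[Pid:", so it may end with whitespace.
        return new String[] { m.group(1), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String[] lines = {
            "[ RUNNING ] Second app (some data) [Pid: 5248]",
            "[ RUNNING ] App [beta] v2 [Pid: 1]",
            "[ STOPPED ] Logger App"
        };
        for (String line : lines) {
            String[] parts = parse(line);
            System.out.println(parts[0] + " | " + parts[1]);
        }
    }
}
```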
You can end the match at [Pid or at the end of the line by making the name group lazy (a greedy .* would run to the end of the line and swallow the [Pid part):
\[\s+(RUNNING|STOPPED)\s+\]\s+(.*?)(\[Pid|$)
You could achieve it with (in free-spacing mode):
\[\ (RUNNING|STOPPED)\ \] # RUNNING or STOPPED -> group 1
(.+?) # everything afterwards in the same line, lazily -> group 2
(?:\[Pid:\ (\d+)\]|$) # optional [Pid: N] with the number in group 3, else end of line
See it working on regex101.com.

Weird password check matching using regex in Java

I'm trying to check a password with the following constraint:
at least 9 characters
at least 1 upper case
at least 1 lower case
at least 1 special character into the following list:
~ ! @ # $ % ^ & * ( ) _ - + = { } [ ] | : ; " ' < > , . ?
no accentuated letter
Here's the code I wrote:
Pattern pattern = Pattern.compile(
"(?!.*[âêôûÄéÆÇàèÊùÌÍÎÏÐîÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ€£])"
+ "(?=.*\\d)"
+ "(?=.*[a-z])"
+ "(?=.*[A-Z])"
+ "(?=.*[`~!@#$%^&*()_\\-+={}\\[\\]\\\\|:;\"'<>,.?/])"
+ ".{9,}");
Matcher matcher = pattern.matcher(myNewPassword);
if (matcher.matches()) {
//do what you've got to do when you
}
The issue is that some characters, like € or £, don't make the password invalid.
I don't understand why this is working that way since I explicitly exclude € and £ from the authorized list.
Rather than trying to disallow those non-ASCII characters, why not make your regex accept only ASCII characters, like this:
Pattern pattern = Pattern.compile(
"(?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\\p{Punct})\\p{ASCII}{9,}");
Also note the use of \p{Punct} (ASCII punctuation) instead of the big character class of special characters. I believe that would suffice for you.
Check Javadoc for more details
This allows only printable ASCII. Note that it also allows the space character; to disallow spaces, start the range at \x21 instead of \x20.
Edit - I didn't see a number in the requirement, saw it in your regex, but wasn't sure.
# "^(?=.*[A-Z])(?=.*[a-z])(?=.*[`~!@#$%^&*()_\\-+={}\\[\\]|:;\"'<>,.?])[\\x20-\\x7E]{9,}$"
^
(?= .* [A-Z] )
(?= .* [a-z] )
(?= .* [`~!@#$%^&*()_\-+={}\[\]|:;"'<>,.?] )
[\x20-\x7E]{9,}
$
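A minimal runnable sketch of the whitelist approach follows. It is not the asker's exact rule set: the class and method names (PasswordCheck, isValid) are hypothetical, \p{Punct} stands in for the explicit special-character list (it covers all ASCII punctuation, which is slightly broader), and no digit is required because the written requirements do not list one.

```java
import java.util.regex.Pattern;

class PasswordCheck {
    // Lookaheads require one upper-case letter, one lower-case letter and one
    // ASCII punctuation char; the body allows only printable ASCII, 9+ chars.
    private static final Pattern VALID = Pattern.compile(
            "(?=.*[A-Z])(?=.*[a-z])(?=.*\\p{Punct})[\\x20-\\x7E]{9,}");

    // Hypothetical helper; matches() checks the whole password.
    static boolean isValid(String password) {
        return VALID.matcher(password).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Passw0rd!x"));       // meets every rule
        System.out.println(isValid("Passw0rd\u20AC!"));  // contains the euro sign
        System.out.println(isValid("password!!"));       // no upper-case letter
        System.out.println(isValid("Short!Aa"));         // only 8 characters
    }
}
```

Because the body of the pattern is a whitelist, any character outside printable ASCII (accented letters, €, £) fails the match without having to be listed.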

regular expressions in java

How can I match only single (isolated) dot characters in a string?
For example, if I have the string "trjb....fsf..ib.bi." then only the dots at index 15 and 18 should be returned. If I use Pattern p=Pattern.compile("(\\.)+"); I get
4 ....
11 ..
15 .
18 .
This seems to do the trick:
String input = "trjb....fsf..ib.bi.";
Pattern pattern = Pattern.compile("[^\\.]\\.([^\\.]|$)");
Matcher matcher = pattern.matcher(" " + input);
while (matcher.find()) {
System.out.println(matcher.start());
}
The extra space in front of the input does two things:
Allows for a . to be detected as the first character of the input string
Offsets the matcher.start() by one to account for the character in front of the matched .
Result is:
15
18
Add a blank at the beginning and at the end of the string, and then use the pattern
"[^\\.]\\.[^\\.]"
You need to use negative lookarounds.
Something like Pattern.compile("(?<!\\.)\\.(?!\\.)");
Try
Pattern.compile("(?<=[^\\.])\\.(?=[^\\.])")
or even better, since it also handles a dot at the very start or end of the string:
Pattern.compile("(?<![\\.])\\.(?![\\.])")
This uses negative lookaround.
(?<![\\.]) => not preceded by a .
\\. => a .
(?![\\.]) => not followed by a .
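Putting the negative-lookaround pattern into a complete program, a minimal sketch could look like this. The class and method names (LoneDots, indexes) are made up for the example; the pattern is the one suggested above, and no padding of the input is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LoneDots {
    // A dot that is neither preceded nor followed by another dot.
    private static final Pattern LONE_DOT = Pattern.compile("(?<!\\.)\\.(?!\\.)");

    // Hypothetical helper: indexes of all isolated dots in the input.
    static List<Integer> indexes(String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = LONE_DOT.matcher(input);
        while (m.find()) {
            // Lookarounds consume nothing, so start() is the dot's own index.
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(indexes("trjb....fsf..ib.bi.")); // [15, 18]
    }
}
```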
