Java regex to detect semver strings is failing without qualifiers - java

I am trying to get a Java method to validate whether or not a String argument is a properly-formatted "semver" (semantic versioning) version string.
In my app, semver strings must be of the form:
<major>.<minor>.<patch>-<qualifier>
Where:
<major> is a positive integer (1+)
<minor> and <patch> are both non-negative integers (0+)
<qualifier> is an alphanumeric string ([0-9a-zA-Z]+)
Valid examples:
1.2.40
1.0.0-SNAPSHOT
2.0.45-RC
3.10.0
My best attempt thus far:
public boolean isSemVer(String version) {
Pattern versionPattern = Pattern.compile("^[a-zA-Z-]+\\d+\\.\\d+\\.\\d+");
Matcher matcher = versionPattern.matcher(version);
return matcher.matches();
}
It returns false for the first valid example, 1.2.40. Can anyone tell me where I'm going awry and what I need to tweak in my regex to get it to accept my use cases? Thanks in advance!

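Your valid strings start with digits, not letters, but [a-zA-Z-]+ at the start of your pattern requires at least one letter or hyphen there, so none of them can match. Your pattern also has no part for the optional -<qualifier>.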
Use
^[1-9]\d*\.\d+\.\d+(?:-[a-zA-Z0-9]+)?$
Details
^ - start of string
[1-9]\d* - a digit from 1 to 9 and then 0 or more digits
\.\d+\.\d+ - two occurrences of . and 1+ digits (can be written as (?:\.\d+){2})
(?:-[a-zA-Z0-9]+)? - an optional occurrence of - and 1+ alphanumeric chars ([a-zA-Z0-9] can be written as \p{Alnum})
$ - end of string.
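In Java, use it with .matches(), which already requires the whole string to match, so ^ and $ can be dropped:
import java.util.regex.Pattern;

// Compile once and reuse; matches() anchors the pattern to the whole input
private static final Pattern VERSION_PATTERN =
        Pattern.compile("[1-9]\\d*\\.\\d+\\.\\d+(?:-[a-zA-Z0-9]+)?");

public boolean isSemVer(String version) {
    return VERSION_PATTERN.matcher(version).matches();
}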
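To check the pattern against the question's examples, here is a small self-contained harness. The class name SemVerCheck and the extra invalid inputs (0.1.0, 1.2, 1.2.3-, 1.2.3-rc.1) are illustrative assumptions, not part of the original answer.

```java
import java.util.regex.Pattern;

public class SemVerCheck {
    // Same pattern as the answer; matches() requires the whole string to match.
    private static final Pattern SEMVER =
            Pattern.compile("[1-9]\\d*\\.\\d+\\.\\d+(?:-[a-zA-Z0-9]+)?");

    public static boolean isSemVer(String version) {
        return SEMVER.matcher(version).matches();
    }

    public static void main(String[] args) {
        // The question's valid examples, followed by a few strings that should fail
        String[] inputs = {"1.2.40", "1.0.0-SNAPSHOT", "2.0.45-RC", "3.10.0",
                           "0.1.0", "1.2", "1.2.3-", "1.2.3-rc.1"};
        for (String v : inputs) {
            System.out.println(v + " -> " + isSemVer(v));
        }
    }
}
```

The first four inputs print true. 0.1.0 fails because the major version must start with 1-9, and 1.2.3-rc.1 fails because a dot is not allowed inside the qualifier.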

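You can try the regex suggested on semver.org. Note that it is broader than the rules in the question: it allows a major version of 0, dot-separated pre-release identifiers and +build metadata. The trailing gm flags in the demo are JavaScript syntax and are not needed with Java's matches():
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$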

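To use the semver.org regex from Java, every backslash must be doubled inside the string literal. The sketch below does that and reads the capture groups; the class name OfficialSemVer and the sample inputs are assumptions for illustration. Groups 1 to 3 are major, minor and patch, group 4 is the pre-release part and group 5 is the build metadata.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OfficialSemVer {
    // semver.org regex with Java string escaping
    private static final Pattern SEMVER = Pattern.compile(
            "^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)"
            + "(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)"
            + "(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
            + "(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$");

    public static boolean isSemVer(String v) {
        return SEMVER.matcher(v).matches();
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.2.40", "1.0.0-SNAPSHOT", "2.0.45-RC", "3.10.0", "0.1.0", "01.2.3"}) {
            System.out.println(v + " -> " + isSemVer(v));
        }
        Matcher m = SEMVER.matcher("1.0.0-rc.1+build.5");
        if (m.matches()) {
            System.out.println(m.group(4) + " / " + m.group(5)); // rc.1 / build.5
        }
    }
}
```

Unlike the question's rules, this accepts 0.1.0; leading zeros such as 01.2.3 are still rejected.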
Related

JAVA REGEX: Match until the specific character

I have this Java code
String cookies = TextUtils.join(";", LoginActivity.msCookieManager.getCookieStore().getCookies());
Log.d("TheCookies", cookies);
Pattern csrf_pattern = Pattern.compile("csrf_cookie=(.+)(?=;)");
Matcher csrf_matcher = csrf_pattern.matcher(cookies);
while (csrf_matcher.find()) {
json.put("csrf_key", csrf_matcher.group(1));
Log.d("CSRF KEY", csrf_matcher.group(1));
}
The String contains something like this:
SessionID=sessiontest;csrf_cookie=e18d027da2fb95e888ebede711f1bc39;ci_session=3f4675b5b56bfd0ba4dae46249de0df7994ee21e
I'm trying to get the csrf_cookie data by using this regular expression:
csrf_cookie=(.+)(?=;)
I expect a result like this in the code:
csrf_matcher.group(1);
e18d027da2fb95e888ebede711f1bc39
Instead I get:
3492f8670f4b09a6b3c3cbdfcc59e512;ci_session=8d823b309a361587fac5d67ad4706359b40d7bd0
What is a possible workaround for this problem?
Here is a one-liner using String#replaceAll:
String input = "SessionID=sessiontest;csrf_cookie=e18d027da2fb95e888ebede711f1bc39;ci_session=3f4675b5b56bfd0ba4dae46249de0df7994ee21e";
String cookie = input.replaceAll(".*csrf_cookie=([^;]*).*", "$1");
System.out.println(cookie); // prints e18d027da2fb95e888ebede711f1bc39
Note: We could have used an explicit Pattern/Matcher instead, and in fact you may want to if you run this search often in your code, since the pattern can then be compiled once.
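One practical difference: if csrf_cookie is missing, the replaceAll one-liner returns the whole input unchanged. A Pattern/Matcher version can report "not found" explicitly. This is a sketch; the class name CsrfCookie, the method csrfToken and the use of Optional are assumptions, and it borrows the [^;]* idea from the next answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CsrfCookie {
    // Capture everything after "csrf_cookie=" up to (not including) the next ';' or the end
    private static final Pattern CSRF = Pattern.compile("csrf_cookie=([^;]*)");

    public static Optional<String> csrfToken(String cookies) {
        Matcher m = CSRF.matcher(cookies);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String input = "SessionID=sessiontest;csrf_cookie=e18d027da2fb95e888ebede711f1bc39;"
                + "ci_session=3f4675b5b56bfd0ba4dae46249de0df7994ee21e";
        System.out.println(csrfToken(input).orElse("<none>"));                  // e18d027da2fb95e888ebede711f1bc39
        System.out.println(csrfToken("SessionID=sessiontest").orElse("<none>")); // <none>
    }
}
```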
You are getting more data than expected because you are using a greedy + (it matches as much as it can).
For example, the pattern a+ could match a, aa or aaa in the string aaa; a greedy quantifier prefers the longest one, aaa.
So you are matching
csrf_cookie=e18d027da2fb95e888ebede711f1bc39;ci_session=3f4675b5b56bfd0ba4dae46249de0df7994ee21e;
because the match only has to be followed by a ';'. The .+ runs past the first ';', and the positive lookahead is satisfied by the last one.
To make a pattern lazy (non-greedy), use +? instead of + (so with find(), a+? matches a single a, three times in a row, on the string aaa).
So try:
csrf_cookie=(.+?);
or just match anything that is not a ';'
csrf_cookie=([^;]*);
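That way you don't need a lazy quantifier. If you also drop the trailing ';', i.e. csrf_cookie=([^;]*), the pattern still works when csrf_cookie is the last cookie in the string.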
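The three variants can be compared side by side. This sketch assumes, as the question's output suggests, that the real cookie string has another ';' after ci_session; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    public static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Trailing ';' after ci_session assumed
        String cookies = "SessionID=sessiontest;csrf_cookie=e18d027da2fb95e888ebede711f1bc39;"
                + "ci_session=3f4675b5b56bfd0ba4dae46249de0df7994ee21e;";
        System.out.println(firstGroup("csrf_cookie=(.+)(?=;)", cookies)); // greedy: runs to the last ';'
        System.out.println(firstGroup("csrf_cookie=(.+?);", cookies));    // lazy: stops at the first ';'
        System.out.println(firstGroup("csrf_cookie=([^;]*)", cookies));   // negated class: cannot cross ';'
    }
}
```

The first line prints the token followed by ;ci_session=..., the other two print only e18d027da2fb95e888ebede711f1bc39.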

Regex starts with "ATG", ends with "TAG, TAA or TGA" but does not contain "ATG" and "TAG, TAA or TGA" in between

I'm searching for patterns in a String starting with ATG, ending with TAG, TAA or TGA and length = multiple of 3. ATG and TAG, TAA or TGA can only appear at respectively beginning or end. Which means:
From ATGTTGTGATGT extract ATGTTGTGA
From ATGATGTTGTGATGT extract ATGTTGTGA
Currently I'm using regex (ATG)([ATG]{3})+?(TAG|TAA|TGA).
For ATGATGTTGTGATGT this gets me the wrong result ATGATGTTGTGA.
I've tried:
(^ATG)(!?=.*ATG)([ATG]{3})+?(TAG|TAA|TGA)
(^ATG)(!?=(ATG)+)([ATG]{3})+?(TAG|TAA|TGA)
How do I tell it to allow ATG only once, at the beginning, and nowhere after that?
You may use
ATG(?:(?!ATG)[ATG]{3})*?(?:TAG|TAA|TGA)
Details
ATG - an ATG substring
(?:(?!ATG)[ATG]{3})*? - a tempered greedy token (here with a lazy quantifier, so it stops at the first stop codon): zero or more 3-char chunks from the [ATG] character set, where the negative lookahead (?!ATG) rejects any chunk equal to ATG
(?:TAG|TAA|TGA) - either of the three alternatives defined in the non-capturing group: TAG, TAA or TGA.
Java demo:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrfDemo {
    public static void main(String[] args) {
        String rx = "ATG(?:(?!ATG)[ATG]{3})*?(?:TAG|TAA|TGA)";
        String s = "ATGTTGTGATGT, ATGATGTTGTGATGT, ATGATGTTGTGATGT";
        Matcher matcher = Pattern.compile(rx).matcher(s);
        while (matcher.find()) {
            System.out.println(matcher.group(0)); // the whole match
        }
    }
}
Result:
ATGTTGTGA
ATGTTGTGA
ATGTTGTGA
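To see why the question's pattern went wrong, both patterns can be run on ATGATGTTGTGATGT. In the question's version the lazy +? still has to take at least one chunk, and nothing stops that chunk from being ATG. The class and method names here are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartCodonCompare {
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String seq = "ATGATGTTGTGATGT";
        // Question's pattern: ATG is accepted as an inner codon
        System.out.println(firstMatch("(ATG)([ATG]{3})+?(TAG|TAA|TGA)", seq));      // ATGATGTTGTGA
        // Tempered token: (?!ATG) rejects any inner codon equal to ATG
        System.out.println(firstMatch("ATG(?:(?!ATG)[ATG]{3})*?(?:TAG|TAA|TGA)", seq)); // ATGTTGTGA
    }
}
```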

Java pattern match for Cisco configuration line

I have a Cisco configuration as text.
The hostname line I should match is "125-hostname billdevice".
I am using the pattern below, but it does not match.
Pattern ciscohostname = Pattern.compile("^[0-9999999]-hostname");
Matcher matcherx = ciscohostname.matcher(BlockIndexList.get(k).toString());
How can I match this line?
What you want is
"^[0-9]+-hostname"
This means:
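Match if the string starts with one or more digits ([0-9]) followed by the text "-hostname". Because the line continues with " billdevice", use matcher.find() (or lookingAt()) rather than matches(), which would require the pattern to cover the whole line.
Note that [0-9999999] is a character class, not a numeric range: it matches exactly one digit (the repeated 9s are redundant), which is why 125-hostname did not match. If you meant to allow numbers up to 9999999, you can limit the digit count instead:
^[0-9]{1,7}-hostname
This matches only numbers of 1 to 7 digits; a number with 8 or more digits is rejected.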
0-hostname billdevice //match
9999999-hostname billdevice //match
10000000-hostname billdevice //no match
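The cases listed above can be reproduced with find(), and the last line shows that matches() fails on the same input because " billdevice" is left over. This is a sketch; the class name CiscoHostname and the method isHostnameLine are hypothetical.

```java
import java.util.regex.Pattern;

public class CiscoHostname {
    // Anchored at the start: 1 to 7 digits followed by "-hostname"
    private static final Pattern HOSTNAME = Pattern.compile("^[0-9]{1,7}-hostname");

    public static boolean isHostnameLine(String line) {
        // find() accepts a match at the start of the line; matches() would also
        // require the trailing " billdevice" to be covered by the pattern
        return HOSTNAME.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {"125-hostname billdevice", "0-hostname billdevice",
                          "9999999-hostname billdevice", "10000000-hostname billdevice"};
        for (String line : lines) {
            System.out.println(line + " -> " + isHostnameLine(line));
        }
        System.out.println(HOSTNAME.matcher("125-hostname billdevice").matches()); // false
    }
}
```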

Change group using regex java

I need help with a regular expression in Java.
I need to replace a group in a string:
Example:
Input:
=sum($var1;2) or =if($result<10;"little";"big") ...
Need Output:
=sum(teste;2) or =if(teste<10;"little";"big") ...
Code I have:
Pattern p = Pattern.compile("(\\.*)(\\$\\w)(\\.*)");
Matcher m = p.matcher(total);
if (m.find()) {
System.out.println(m.replaceAll("$2teste"));
}
Output I have:
=sum($vtestear1;2)
=if($r testeesultado<5;"maior";"menor")
Why match everything when all you need is to match variable tokens?
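Pattern p = Pattern.compile("\\$[a-z0-9]+\\b"); // no \b before "$": "$" is not a word char, so \b there would fail after "("
String result = p.matcher(total).replaceAll("teste"); // Strings are immutable, so keep the returned value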
Change the [a-z0-9] part if you can have more than lowercase ASCII letters and digits.
Also, you don't need to call .find() first when you use .replaceAll(): if nothing matches, nothing is replaced.
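A runnable check on the question's inputs, using \$[a-z0-9]+\b without a \b in front of the $. The last line shows that a leading \b finds nothing here, because both "(" and "$" are non-word characters. The class and method names are assumptions.

```java
import java.util.regex.Pattern;

public class VarReplace {
    // The trailing \b makes the name end at a non-word character such as ';' or '<'
    private static final Pattern VAR = Pattern.compile("\\$[a-z0-9]+\\b");

    public static String replaceVars(String formula) {
        // replaceAll returns a new String; the input is not modified
        return VAR.matcher(formula).replaceAll("teste");
    }

    public static void main(String[] args) {
        System.out.println(replaceVars("=sum($var1;2)"));                      // =sum(teste;2)
        System.out.println(replaceVars("=if($result<10;\"little\";\"big\")"));  // =if(teste<10;"little";"big")
        System.out.println(Pattern.compile("\\b\\$[a-z0-9]+\\b").matcher("=sum($var1;2)").find()); // false
    }
}
```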

regular expressions for random letters/symbols

I have searched for hours using regular expression generators and checkers, but I can't seem to get it to work...
I have this string: hdr("");cr(92);cl(3,"",4,"420720250","random message here");etr();
and so far my code is:
private void strchecker() {
Pattern pattern = Pattern.compile("(\\d{9})");
Matcher matcher = pattern.matcher(strLine);
if (matcher.find()) {
System.out.println(matcher.group(0)); //prints /{item}/
} else {
//System.out.println("Match not found");
}
}
This code works and finds the 9-digit number in the string. What I'm trying to do is find a regex that searches for "cl(3" or "cl(2" and, if it exists, stores the 9-digit number in a variable. I just don't know how to match that cl(3 or cl(2.
Any advice?
Thanks
Matt
/cl\([23].*(\d{9})/
The final parentheses will capture the 9 digits in group 1.
In a Java string literal the backslashes must be doubled. A lazy .*? captures the first 9-digit run after cl(2 or cl(3, and you read it from capture group 1 rather than from the full match:
cl\\([23].*?(\\d{9})
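Plugged into the question's strchecker approach, the pattern might look like this. It is a sketch: the class name ClChecker and the method extract (returning null when there is no cl(2/cl(3 entry) are hypothetical choices, and the input is the string from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClChecker {
    // "cl(" followed by 2 or 3, then lazily up to the first run of 9 digits
    private static final Pattern CL_NINE_DIGITS = Pattern.compile("cl\\([23].*?(\\d{9})");

    /** Returns the 9-digit number after cl(2 or cl(3, or null if there is none. */
    public static String extract(String line) {
        Matcher m = CL_NINE_DIGITS.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String strLine = "hdr(\"\");cr(92);cl(3,\"\",4,\"420720250\",\"random message here\");etr();";
        System.out.println(extract(strLine));                           // 420720250
        System.out.println(extract(strLine.replace("cl(3", "cl(5")));   // null
    }
}
```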
