Regex: How to find exact value length?


I have the following input strings:
key1=12345
key1=12345&key2=12345
key1=12345123456789
key1=12345123456789&key2=123456789
Using this pattern: (key1)=([^&]{5})(|$)).
The expected results are:
12345
12345, 12345
nothing
nothing
And while running, the results were:
12345
12345, 12345
12345
12345, 12345
This means that {5} simply takes the first 5 characters of a longer value instead of requiring the value to be exactly 5 characters long.
How can I make it match only values of exactly 5 characters, rather than truncating longer ones?

This pattern will do it:
=([^&]{5})(?:&|$)
It finds =, followed by 5 captured characters that are not &, immediately followed by either & or the end of the string.
Test
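import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Class name chosen for illustration only.
public class ExactLengthTest {
    public static void main(String[] args) {
        test("key1=12345");
        test("key1=12345&key2=12345");
        test("key1=12345123456789");
        test("key1=12345123456789&key2=123456789");
    }

    private static void test(String input) {
        // Group 1 is the value; (?:&|$) requires it to end right after 5 characters.
        Matcher m = Pattern.compile("=([^&]{5})(?:&|$)").matcher(input);
        List<String> list = new ArrayList<>();
        while (m.find())
            list.add(m.group(1));
        System.out.println(list);
    }
}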
Output
[12345]
[12345, 12345]
[]
[]

([^=&]+)=(?<![^=&])([^=&]{5})(?![^=&])
key, then =, then five characters that are not = or &, not surrounded by more characters that are not = or &. (\b is useful in limited circumstances where your values are guaranteed to only consist of letters, numbers and underscores; negative lookaround is much more general).
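As a rough sketch, the lookaround pattern above can be run against the four inputs from the question like this. The class name LookaroundLengthDemo and the helper values are illustrative names, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundLengthDemo {
    // Pattern from the answer: key, '=', then exactly five value characters
    // that are not followed by another value character.
    private static final Pattern EXACT_FIVE =
            Pattern.compile("([^=&]+)=(?<![^=&])([^=&]{5})(?![^=&])");

    // Hypothetical helper: collects every value (group 2) that is exactly five characters long.
    static List<String> values(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = EXACT_FIVE.matcher(input);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(values("key1=12345"));                         // [12345]
        System.out.println(values("key1=12345&key2=12345"));              // [12345, 12345]
        System.out.println(values("key1=12345123456789"));                // []
        System.out.println(values("key1=12345123456789&key2=123456789")); // []
    }
}
```

Group 1 holds the key, so the same matcher could also be used to report which key each value belongs to.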

Basically, you must add boundaries somehow. One option is to use anchors (such as \b for "word boundary", or ^ and $ for string/line boundaries). Another is to match up to a delimiter, e.g. [^&\n]+ (everything up to the next & or newline), and then check the length programmatically in Java.
I came up with a regex of my own:
(?:=)(\d{5})(?:[&|\n])
First of all I look for a = sign, but do not capture it.
Then I look for 5 digits...
... followed either by & sign or a newline (which I do not capture either)
If you need to look only for key1, key2 values, just use (?:key\d=) instead of (?:=).
Then the only matches found are of the given length.
@Stribizhev's regex might be more robust and better protected against false positives, though.
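The "match up to the delimiter, then check the length in Java" option mentioned above could look roughly like the following. This is only a sketch: it assumes the input is a plain key=value&key=value string, and the names SplitAndCheckLength and valuesOfLength are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class SplitAndCheckLength {
    // Hypothetical helper: splits "k=v&k=v" input and keeps the values
    // whose length is exactly `length`.
    static List<String> valuesOfLength(String query, int length) {
        List<String> result = new ArrayList<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // not a key=value pair, skip it
            }
            String value = pair.substring(eq + 1);
            if (value.length() == length) {
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(valuesOfLength("key1=12345&key2=12345", 5));
        System.out.println(valuesOfLength("key1=12345123456789&key2=123456789", 5));
    }
}
```

The length rule lives in ordinary code here, so changing it does not require touching a regex at all.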

Related

Java regular expression for UserName with length range

I am writing a regular expression to validate UserName.
Here is the rule:
Length: 6 - 20 characters
Must start with letter a-zA-Z
Can contain a-zA-Z0-9 and dot(.)
Can't have 2 consecutive dots
Here is what I tried:
public class TestUserName {
private static String USERNAME_PATTERN = "[a-z](\\.?[a-z\\d]+)+";
private static Pattern pattern = Pattern.compile(USERNAME_PATTERN, CASE_INSENSITIVE);
public static void main(String[] args) {
System.out.println(pattern.matcher("user.name").matches()); // true
System.out.println(pattern.matcher("user.name2").matches()); // true
System.out.println(pattern.matcher("user2.name").matches()); // true
System.out.println(pattern.matcher("user..name").matches()); // false
System.out.println(pattern.matcher("1user.name").matches()); // false
}
}
The pattern works, but it has no length constraint.
I tried appending a {6,20} quantifier to the pattern, but it failed:
"[a-z](\\.?[a-z\\d]+)+{6,20}" // failed pattern to validate length
Does anyone have any ideas?
Thanks!

You can use a lookahead regex for all the checks:
^[a-zA-Z](?!.*\.\.)[a-zA-Z.\d]{5,19}$
[a-zA-Z.\d]{5,19} is used because one character, [a-zA-Z], has already been matched at the start, thus making the total length fall in the range {6,20}.
Negative lookahead (?!.*\.\.) will assert failure if there are 2 consecutive dots
Equivalent Java pattern will be:
Pattern p = Pattern.compile("^[a-zA-Z](?!.*\\.\\.)[a-zA-Z.\\d]{5,19}$");

Use a negative look ahead to prevent double dots:
"^(?!.*\\.\\.)(?i)[a-z][a-z\\d.]{5,19}$"
(?i) means case insensitive (so [a-z] means [a-zA-Z])
(?!.*\\.\\.) means there aren't two consecutive dots anywhere in it
The rest is obvious.

I would use the following regex :
^(?=.{6,20}$)(?!.*\.\.)[a-zA-Z][a-zA-Z0-9.]+$
The (?=.{6,20}$) positive lookahead makes sure the text will contain 6 to 20 characters, while the (?!.*\.\.) negative lookahead makes sure the text will not contain .. at any point.
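A small sketch of how this lookahead-based pattern behaves on the inputs from the question plus a few length edge cases. The class name UserNameCheck and the method isValid are illustrative only.

```java
import java.util.regex.Pattern;

class UserNameCheck {
    // Length check and double-dot check are lookaheads; the rest consumes the name.
    private static final Pattern USER_NAME =
            Pattern.compile("^(?=.{6,20}$)(?!.*\\.\\.)[a-zA-Z][a-zA-Z0-9.]+$");

    static boolean isValid(String name) {
        return USER_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("user.name"));   // true
        System.out.println(isValid("user..name"));  // false: consecutive dots
        System.out.println(isValid("1user.name"));  // false: starts with a digit
        System.out.println(isValid("user"));        // false: shorter than 6
    }
}
```

Each rule sits in its own lookahead, so the length range can be adjusted without recomputing the quantifier on the character class.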

This will also suffice (for matching only):
(?=^.{6,20}$)(?=^[A-Za-z])(?!.*\.\.)
To capture the matched text, you can use:
(?=^.{6,20}$)(?=^[A-Za-z])(?!.*\.\.)(^.*$)
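A hedged sketch of how the lookahead-only pattern might be used from Java. Because it consumes no characters, it has to be applied with find() rather than matches(). Note also that, as written, these lookaheads check the length, the first character and the double dots, but not the allowed character set, so something like user_name still passes. The class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class LookaheadOnlyCheck {
    // Zero-width pattern: it only asserts, it never consumes input.
    private static final Pattern CHECKS =
            Pattern.compile("(?=^.{6,20}$)(?=^[A-Za-z])(?!.*\\.\\.)");

    static boolean passes(String name) {
        return CHECKS.matcher(name).find();
    }

    static boolean passesWithMatches(String name) {
        // matches() needs the pattern to span the whole input, which a
        // zero-width pattern cannot do for a non-empty string.
        return CHECKS.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(passes("user.name"));            // true
        System.out.println(passesWithMatches("user.name")); // false
        System.out.println(passes("user..name"));           // false
        System.out.println(passes("user_name"));            // true: character set is not checked
    }
}
```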

Validate if input string is a number between 0-255 using regex

I am facing a problem matching an input string with a regex. I want to validate that the input is a number between 0 and 255 and at most 3 characters long. The code works fine, but when I input a run of zeros of any length (e.g. 000000), it returns true instead of false.
Here is my code :-
String IP = "000000000000000";
System.out.println(IP.matches("(0*(?:[0-9][0-9]?|[0-2][0-5][0-5]))"));

Tested this:
static String pattern = "^(([0-1]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])\\.){3}([0-1]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5]){1}$"; // last digit of each octet is mandatory, so empty octets ("...") are rejected
It works for the following:
IP Addresses xxx.xxx.xxx.xxx / xx.xx.xx.xx / x.x.x.x / mix of these.
Leading zeros are allowed.
Range 0-255 / maximum 3 digits.

You may use this regex:
boolean valid = IP.matches("^(?:1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])$");
// since .matches expects complete match you can omit anchors
boolean valid = IP.matches("(?:1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])");

You can use this pattern which matches "0", "1", ... "255":
"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"
Demo on Ideone
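One way to gain some confidence in a range pattern like this is to compare it with a plain numeric check over every candidate. The following is a sketch of that idea; the class name ZeroTo255Check and its methods are hypothetical.

```java
class ZeroTo255Check {
    private static final String ZERO_TO_255 =
            "([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])";

    static boolean matchesPattern(String s) {
        return s.matches(ZERO_TO_255);
    }

    // Compares the regex with a numeric range test for every integer 0..999.
    static boolean agreesWithNumericCheck() {
        for (int i = 0; i <= 999; i++) {
            boolean expected = i <= 255;
            if (matchesPattern(String.valueOf(i)) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreesWithNumericCheck());              // true
        System.out.println(matchesPattern("000000000000000"));     // false
        System.out.println(matchesPattern("01"));                  // false: no leading zeros
    }
}
```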

Using word boundaries to ensure that only numbers from 0 to 255 are matched, the optimized pattern I can offer is:
\b(?:1\d{2}|2[0-4]\d|[1-9]?\d|25[0-5])\b
Pattern Demo (in PHP/PCRE to show step count)
4010 steps when checking a list from 0 to 256.
This pattern will not match 01 or 001. (no match on one or more leading zeros)
Considerations:
Use quantifiers on consecutive duplicate characters.
Organize the alternatives not in sequential order (single-digit, double-digit, 100's, under-249, 250-255) but with quickest mis-matches first.
Avoid non-essential capture (or non-capture) groups. (despite seeming logical to condense the "two hundreds" portion of the pattern)

Please try this
"^(((\d|0\d|00\d|\d{2}|0\d{2}|1\d{2}|2[0-4]\d|2[0-5]{2})\.){3})(\d|0\d|00\d|\d{2}|0\d{2}|1\d{2}|2[0-4]\d|2[0-5]{2})$"
It works also with leading zeroes

boolean valid = IP.matches("(0?[0-9]{1,2}|1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])");

Complete IPv4 match:
JS
/(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])/g.exec(myIp);
https://regex101.com/r/tU3gC3/12
Minified:
/((1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.){3}(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])/g.exec(myIp);
https://regex101.com/r/tU3gC3/13

This will work for following pattern and ip containing initial zeros
e.g: 023.45.12.56
pattern=(\\d{1,2}|(0|1)\\d{2}|2[0-4]\\d|25[0-5]);

If you need leading zeros, try this:
"((\\d{1,2}|[01]\\d{1,2}|[0-2][0-4]\\d|25[0-5])\\.){3}(\\d{1,2}|[01]\\d{1,2}|[0-2][0-4]\\d|25[0-5])"
It satisfies following conditions: IP address is a string in the form "A.B.C.D", where the value of A, B, C, and D may range from 0 to 255. Leading zeros are allowed. The length of A, B, C, or D can't be greater than 3.
Maybe somebody can help with additional simplifying?

If you want to validate an IPv4 address with a mask ('ip/mask'), the regex looks like this:
^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\/([0-9]|[1-2][0-9]|3[0-2]))$
Just the IP:
^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
JS code to test it
function isMatchByRegexp(stringToValidate, regexp) {
var re = new RegExp(regexp);
return re.test(stringToValidate);
}
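Since the question is about Java, here is a rough Java port of the same two checks. It is only a sketch: the class name CidrCheck and the method names are made up, and the slash is written unescaped because Java does not need it escaped.

```java
import java.util.regex.Pattern;

class CidrCheck {
    private static final String OCTET = "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
    private static final String IP = "(" + OCTET + "\\.){3}" + OCTET;

    // Plain address, and address followed by a /0../32 mask.
    private static final Pattern IP_ONLY = Pattern.compile("^" + IP + "$");
    private static final Pattern IP_WITH_MASK =
            Pattern.compile("^" + IP + "(/([0-9]|[1-2][0-9]|3[0-2]))$");

    static boolean isIp(String s) {
        return IP_ONLY.matcher(s).matches();
    }

    static boolean isIpWithMask(String s) {
        return IP_WITH_MASK.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIpWithMask("192.168.1.0/24")); // true
        System.out.println(isIpWithMask("10.0.0.0/33"));    // false: mask out of range
        System.out.println(isIpWithMask("10.0.0.0"));       // false: mask is required
        System.out.println(isIp("10.0.0.0"));               // true
    }
}
```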

(2[0-4][0-9])|([0-1]?[0-9]?[0-9])
To match 0 to 249 specifically

You can simplify it by thinking of the four cases that can occur:
String zeroTo255 = "((0|1)\\d{2}|2[0-4]\\d|25[0-5]|\\d{1,2})";
String validIpPattern = zeroTo255 + "\\." + zeroTo255 + "\\." + zeroTo255 + "\\." + zeroTo255;
(0|1)\d{2} catches any three digit number starting with 0 or 1.
2[0-4]\d catches numbers between 200 and 249.
25[0-5] catches numbers between 250 and 255.
\d{1,2} catches any one or two digit number
You can test it at https://regexr.com/
To test it there, you need to remove one backslash from each escape
(\\d --> \d):
((0|1)\d{2}|2[0-4]\d|25[0-5]|\d{1,2})
Note that \d represents digits in regular expressions, same as [0-9]
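Put together as a runnable check, the composed pattern could be exercised like this. This is a sketch; the class name Ipv4Check and the method isValid are illustrative, and String.matches is assumed so that no explicit anchors are needed.

```java
class Ipv4Check {
    private static final String ZERO_TO_255 = "((0|1)\\d{2}|2[0-4]\\d|25[0-5]|\\d{1,2})";
    private static final String VALID_IP =
            ZERO_TO_255 + "\\." + ZERO_TO_255 + "\\." + ZERO_TO_255 + "\\." + ZERO_TO_255;

    static boolean isValid(String ip) {
        // String.matches requires the whole input to match the pattern.
        return ip.matches(VALID_IP);
    }

    public static void main(String[] args) {
        System.out.println(isValid("192.168.0.1"));   // true
        System.out.println(isValid("023.45.12.56"));  // true: leading zero allowed
        System.out.println(isValid("256.1.1.1"));     // false
        System.out.println(isValid("1.2.3"));         // false: only three parts
    }
}
```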

regex can't get special constructs (?=x) to work

I'm trying to get a valid regex to use in java (java.util.regex) that validates the following format:
a number that has at most 15 digits, of which at most 3 may be decimal digits, preceded by a separator (,)
So, valid would be:
123456789012345 (15 digits, ok)
12345678901234,1
[EDIT], these should also be valid:
1234567890123,45
123456789012,345
So far I've come up with the following regex pattern:
Pattern = [0-9]{1,15}(\,[0-9]{1,3})?
This checks for a range of 1 to 15 digits, followed by an optional separator followed by up to 3 more digits. However, this doesn't check the total length of the input. With this regex, a value of 123456789012345,123 would be valid.
So I thought I'd add an extra expression that checks the total length, and use the "(?=" construct to simulate logical AND behaviour.
So I started by wrapping my existing regex in a lookahead as follows:
Pattern = (?= [0-9]{1,15}(\,[0-9]{1,3})?)
This, however, results in basically everything I throw at it failing, and I can't get it to work. I don't see what I'm doing wrong here. Once this works, I'd add another expression to check the total length, something like (?=.{16}), I think.

[EDIT]
Realised you wanted to accept total length of 16 if there is a ,, and also that you don't really need to use lookaround here, since you only have two cases. This works just fine:
public static boolean isValid(String input) {
return input.matches("^(\\d{0,15}|\\d{1,12},\\d{1,3})$");
}
This returns valid if one of these is true
input consists of 0-15 numbers or
input consists of 1-12 numbers, followed by a ,, followed by 1-3 numbers
[EDIT2]
Ok, new try:
public static boolean isValid(String input) {
return input.matches("^(\\d{0,15}|(?=.{3,16}$)\\d+,\\d{1,3})$"); // lookahead anchored with $ so it caps the total length
}
This returns valid if one of these is true
input consists of 0-15 numbers or
input consists of 3-16 characters, consisting of at least one digit, followed by a ,, followed by 1-3 numbers
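A sketch that exercises this rule on the examples from the question. The sketch anchors the length lookahead with $, i.e. (?=.{3,16}$); an unanchored (?=.{3,16}) would only require at least three characters and would not cap the total length. The class name DecimalLengthCheck is made up for the example.

```java
class DecimalLengthCheck {
    static boolean isValid(String input) {
        // Either up to 15 digits, or 3 to 16 characters in total made of
        // digits, a comma, and 1 to 3 decimal digits.
        return input.matches("^(\\d{0,15}|(?=.{3,16}$)\\d+,\\d{1,3})$");
    }

    public static void main(String[] args) {
        System.out.println(isValid("123456789012345"));     // true
        System.out.println(isValid("12345678901234,1"));    // true
        System.out.println(isValid("1234567890123,45"));    // true
        System.out.println(isValid("123456789012,345"));    // true
        System.out.println(isValid("123456789012345,123")); // false: 18 digits in total
    }
}
```

Because of \d{0,15}, the empty string is also accepted; using \d{1,15} instead would require at least one digit.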

What about this one?
\d{12,15},?\d{3}

this worked for me.
boolean a = Pattern.matches("\\d{15}|\\d{12},\\d{3}", "123456789012345");
System.out.println(a);//true
boolean b = Pattern.matches("\\d{15}|\\d{12},\\d{3}", "123456789012,345");
System.out.println(b);//true
boolean c = Pattern.matches("\\d{15}|\\d{12},\\d{3}", "1234567890123,456");
System.out.println(c);//false
so your regEx is:
\d{15}|\d{12},\d{3}

Try this regex:
^\d{1,12}(?:\d{0,3}|\d{0,2},\d|\d{0,1},\d\d|,\d\d\d)$

Java: Change an var's value (String) according to the value of an regex

I would like to know if it is possible (and, if so, how I can implement it) to manipulate a String value (Java) using one regex.
For example:
String content = "111122223333";
String regex = "?";
Expected result: "1111 2222 3333 ##";

With one regex only, I don't think it is possible. But you can:
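first, replace (?<=(.))(?!\1|$) with a space (the |$ keeps the zero-width match at the very end of the input from adding a trailing space);
then, use a string append to append " ##".
i.e.:
Pattern p = Pattern.compile("(?<=(.))(?!\\1|$)");
String ret = p.matcher(input).replaceAll(" ") + " ##";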
If what you meant was to separate all groups, then drop the second operation.
Explanation: (?<=...) is a positive lookbehind, and (?!...) a negative lookahead. Here, you are telling that you want to find a position where there is one character behind, which is captured, and where the same character should not follow. And if so, replace with a space. Lookaheads and lookbehinds are anchors, and like all anchors (including ^, $, \A, etc), they do not consume characters, this is why it works.
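As a runnable sketch of this approach: the pattern below includes an end-of-input guard (|$) inside the negative lookahead, because without it the zero-width match at the very end of the input would also be replaced, leaving a trailing space before " ##". The class name GroupRuns and the method format are illustrative.

```java
import java.util.regex.Pattern;

class GroupRuns {
    // Position after a character (captured) that is followed neither by the
    // same character nor by the end of the input.
    private static final Pattern RUN_BOUNDARY = Pattern.compile("(?<=(.))(?!\\1|$)");

    static String format(String input) {
        return RUN_BOUNDARY.matcher(input).replaceAll(" ") + " ##";
    }

    public static void main(String[] args) {
        System.out.println(format("111122223333")); // 1111 2222 3333 ##
        System.out.println(format("aabbb"));        // aa bbb ##
    }
}
```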

OK, since the OP has redefined the problem (i.e., a group of 12 digits which should be separated into 3 groups of 4, then followed by ##), the solution becomes this:
Pattern p = Pattern.compile("(?<=\\d)(?=(?:\\d{4})+$)");
String ret = p.matcher(input).replaceAll(" ") + " ##";
The regex changes quite a bit:
(?<=\d): there should be one digit behind;
(?=(?:\d{4})+$): there should be one or more groups of 4 digits afterwards, until the end of line (the (?:...) is a non capturing grouping -- not sure it really makes a difference for Java).
Validating that the input is 12 digits long can easily be done with methods which are not regex-related at all. And this validation is, in fact, necessary: unfortunately, this regex will also turn 12345 into 1 2345, but there is no way around that, for the reason that lookbehinds cannot match arbitrary length regexes... Except with the .NET languages. With them, you could have written:
(?<=^(?:\d{4})+)(?=(?:\d{4})+$)
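In Java, the grouping regex combined with a separate, non-regex validation step might look like the sketch below. The class name GroupsOfFour, the helper isTwelveDigits and the choice to throw IllegalArgumentException are assumptions made for the example.

```java
import java.util.regex.Pattern;

class GroupsOfFour {
    private static final Pattern GROUP_BOUNDARY = Pattern.compile("(?<=\\d)(?=(?:\\d{4})+$)");

    // Non-regex validation: exactly 12 ASCII digits.
    static boolean isTwelveDigits(String s) {
        if (s.length() != 12) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    static String format(String input) {
        if (!isTwelveDigits(input)) {
            throw new IllegalArgumentException("expected exactly 12 digits: " + input);
        }
        return GROUP_BOUNDARY.matcher(input).replaceAll(" ") + " ##";
    }

    public static void main(String[] args) {
        System.out.println(format("111122223333")); // 1111 2222 3333 ##
    }
}
```

With the validation in front, inputs such as 12345 are rejected instead of being turned into 1 2345.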

Regex split into overlapping strings

I'm exploring the power of regular expressions, so I'm just wondering if something like this is possible:
public class StringSplit {
public static void main(String args[]) {
System.out.println(
java.util.Arrays.deepToString(
"12345".split(INSERT_REGEX_HERE)
)
); // prints "[12, 23, 34, 45]"
}
}
If possible, then simply provide the regex (and preemptively some explanation on how it works).
If it's only possible in some regex flavors other than Java, then feel free to provide those as well.
If it's not possible, then please explain why.
BONUS QUESTION
Same question, but with a find() loop instead of split:
Matcher m = Pattern.compile(BONUS_REGEX).matcher("12345");
while (m.find()) {
System.out.println(m.group());
} // prints "12", "23", "34", "45"
Please note that it's not so much that I have a concrete task to accomplish one way or another, but rather I want to understand regular expressions. I don't need code that does what I want; I want regexes, if they exist, that I can use in the above code to accomplish the task (or regexes in other flavors that work with a "direct translation" of the code into another language).
And if they don't exist, I'd like a good solid explanation why.

I don't think this is possible with split(), but with find() it's pretty simple. Just use a lookahead with a capturing group inside:
Matcher m = Pattern.compile("(?=(\\d\\d)).").matcher("12345");
while (m.find())
{
System.out.println(m.group(1));
}
Many people don't realize that text captured inside a lookahead or lookbehind can be referenced after the match just like any other capture. It's especially counter-intuitive in this case, where the capture is a superset of the "whole" match.
As a matter of fact, it works even if the regex as a whole matches nothing. Remove the dot from the regex above ("(?=(\\d\\d))") and you'll get the same result. This is because, whenever a successful match consumes no characters, the regex engine automatically bumps ahead one position before trying to match again, to prevent infinite loops.
There's no split() equivalent for this technique, though, at least not in Java. Although you can split on lookarounds and other zero-width assertions, there's no way to get the same character to appear in more than one of the resulting substrings.
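The same capture-inside-a-lookahead idea can be generalized to windows of any size. The following is a sketch; the class name OverlappingWindows and the method windows are invented here, and it assumes size is positive and the input contains no line terminators (which . does not match by default).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingWindows {
    // Returns every substring of `size` consecutive characters, overlapping.
    static List<String> windows(String input, int size) {
        List<String> result = new ArrayList<>();
        // The whole match is empty; the window is captured inside the lookahead.
        Matcher m = Pattern.compile("(?=(.{" + size + "}))").matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(windows("12345", 2)); // [12, 23, 34, 45]
        System.out.println(windows("12345", 3)); // [123, 234, 345]
    }
}
```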

This somewhat heavy implementation using Matcher.find instead of split will also work, although by the time you have to code a for loop for such a trivial task you might as well drop the regular expressions altogether and use substrings (for similar coding complexity minus the CPU cycles):
import java.util.*;
import java.util.regex.*;
public class StringSplit {
public static void main(String args[]) {
ArrayList<String> result = new ArrayList<String>();
for (Matcher m = Pattern.compile("..").matcher("12345"); m.find(result.isEmpty() ? 0 : m.start() + 1); result.add(m.group()));
System.out.println( result.toString() ); // prints "[12, 23, 34, 45]"
}
}
EDIT1
match(): the reason why nobody so far has been able to concoct a regular expression like your BONUS_REGEX lies within Matcher, which will resume looking for the next group where the previous group ended (i.e. no overlap), as opposed to after where the previous group started -- that is, short of explicitly respecifying the start search position (above). A good candidate for BONUS_REGEX would have been "(.\\G.|^..)" but, unfortunately, the \G-anchor-in-the-middle trick doesn't work with Java's Matcher (but works just fine in Perl):
perl -e 'while ("12345"=~/(^..|.\G.)/g) { print "$1\n" }'
12
23
34
45
split(): as for INSERT_REGEX_HERE, a good candidate would have been (?<=..)(?=..) (the split point is the zero-width position with two characters to its right and two to its left), but again, because split knows nothing of overlap, you end up with [12, 3, 45] (which is close, but no cigar).
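A quick sketch that reproduces that split result. The class and method names are hypothetical.

```java
import java.util.Arrays;

class SplitOverlapCheck {
    // Splits at every position that has two characters on each side.
    static String[] splitAtInnerBoundaries(String input) {
        return input.split("(?<=..)(?=..)");
    }

    public static void main(String[] args) {
        // The two split points are after "12" and after "123",
        // so the middle piece is just "3".
        System.out.println(Arrays.toString(splitAtInnerBoundaries("12345"))); // [12, 3, 45]
    }
}
```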
EDIT2
For fun, you can trick split() into doing what you want by first doubling non-boundary characters (here you need a reserved character value to split around):
Pattern.compile("((?<=.).(?=.))").matcher("12345").replaceAll("$1#$1").split("#")
We can be smart and eliminate the need for a reserved character by taking advantage of the fact that zero-width look-ahead assertions (unlike look-behind) can have an unbounded length; we can therefore split around all points which are an even number of characters away from the end of the doubled string (and at least two characters away from its beginning), producing the same result as above:
Pattern.compile("((?<=.).(?=.))").matcher("12345").replaceAll("$1$1").split("(?<=..)(?=(..)*$)")
Alternatively tricking match() in a similar way (but without the need for a reserved character value):
Matcher m = Pattern.compile("..").matcher(
Pattern.compile("((?<=.).(?=.))").matcher("12345").replaceAll("$1$1")
);
while (m.find()) {
System.out.println(m.group());
} // prints "12", "23", "34", "45"

Split chops a string into multiple pieces, but that doesn't allow for overlap. You'd need to use a loop to get overlapping pieces.
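Such a loop could be as simple as the following sketch, which uses substring instead of any regex. The class name PairLoop and the method pairs are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class PairLoop {
    // Collects every pair of adjacent characters, overlapping by one.
    static List<String> pairs(String s) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 2 <= s.length(); i++) {
            result.add(s.substring(i, i + 2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("12345")); // [12, 23, 34, 45]
    }
}
```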

I don't think you can do this with split() because it throws away the part that matches the regular expression.
In Perl this works:
my $string = '12345';
my @array = ();
while ( $string =~ s/(\d(\d))/$2/ ) {
push(@array, $1);
}
print join(" ", @array);
# prints: 12 23 34 45
The find-and-replace expression says: match the first two adjacent digits and replace them in the string with just the second of the two digits.

Alternative, using plain matching with Perl. Should work anywhere where lookaheads do. And no need for loops here.
$_ = '12345';
@list = /(?=(..))./g;
print "@list";
# Output:
# 12 23 34 45
But this one, as posted earlier, is nicer if the \G trick works:
$_ = '12345';
@list = /^..|.\G./g;
print "@list";
# Output:
# 12 23 34 45
Edit: Sorry, didn't see that all of this was posted already.

Creating overlapping matches with String#split isn't possible, as the other answers have already stated. It is however possible to add a regex-replace before it to prepare the String, and then use the split to create regular pairs:
"12345".replaceAll(".(?=(.).)","$0$1")
.split("(?<=\\G..)")
The .replaceAll(".(?=(.).)","$0$1") will transform "12345" into "12233445". It basically replaces every 123 substring to 1223, then every 234 to 2334 (note that it's overlapping), etc. In other words, it'll duplicate every character, except for the first and last.
.(?=(.).) # Replace-regex:
. # A single character
(?= ) # followed by (using a positive lookahead):
. . # two more characters
( ) # of which the first is saved in capture group 1
$0$1 # Replacement-regex:
$0 # The entire match, which is the character itself since everything
# else was inside a lookahead
$1 # followed by capture group 1
After that, .split("(?<=\\G..)") will split this new String into pairs:
(?<=\G..) # Split-regex:
(?<= ) # A positive lookbehind:
\G # Matching the end of the previous match
# (or the start of the string initially)
.. # followed by two characters
