Alphanumeric and Special characters RegEx 8-32 digit

[a-zA-Z0-9\#\#\$\%\&\*\(\)\-\_\+\]\[\'\;\:\?\.\,\!\^]+$
The output that is valid is : reahb543)(*&&!##$%^kshABmhbahdxb!#$##%6813741646
This is the expression I have. But I need the value to be 8 to 32 digits only.
So a valid string would be:
8 to 32 characters long
including digits,alphabets,special characters

Description
There are a couple of things I would change in your expression:
most of the escaping of characters in the character class is unnecessary
move the dash inside the character class to the end, as this character has a special meaning inside a character class unless it is at the beginning or end of the class
add a lookahead to force the required number of digits in the string
add a start-of-string anchor to prevent the expression from matching inside longer strings that may contain more digits than allowed
This expression will:
require the string to contain between 8 and 32 digits, any more or less will not be allowed
allow any number of characters from your character set (providing the other rules here are also true)
allow numbers to appear as the first or last character of the string
^(?=(?:\D*?\d){8,32}(?!.*?\d))[a-zA-Z0-9#\#$%&*()_+\]\[';:?.,!^-]+$
NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
(?: group, but do not capture (between 8 and
32 times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\D*? non-digits (all but 0-9) (0 or more
times (matching the least amount
possible))
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
){8,32} end of grouping
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more
times (matching the least amount
possible))
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[a-zA-Z0-9#\#$%&*()_+\]\[';:?.,!^-]+ any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9', '#', '\#', '$', '%', '&', '*',
'(', ')', '_', '+', '\]', '\[', ''', ';',
':', '?', '.', ',', '!', '^', '-' (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
Example
Samples
reahb)(*&&!##$%^kshABmhbahdxb!#$##%1234567 = bad
reahb)(*&&!##$%^kshABmhbahdxb!#$##%12345678 = good
1234reahb)(*&&!##$%^kshABmhbahdxb!#$##%5678 = good
1234reahb)(*&&!##$%^kshABmhbahdxb!#$##%5678901234567890123456789012 = good
1234reahb)(*&&!##$%^kshABmhbahdxb!#$##%56789012345678901234567890123 = bad
reahb)(*&&!#12345678901234567890123456789012#$%^kshABmhbahdxb!#$##% = good
reahb)(*&&!#123456789012345678901234567890123#$%^kshABmhbahdxb!#$##% = bad
Or
If you're looking to allow only 8-32 characters of any type from your character class, then this will work:
^[a-zA-Z0-9#\#$%&*()_+\]\[';:?.,!^-]{8,32}$
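To see the difference between the two expressions above, here is a minimal Java sketch. The class name, method names, and the short sample strings in main are made up for illustration; only the two patterns come from the answer. Note that every backslash has to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

final class DigitCountDemo {
    // Allowed characters, copied from the expressions above.
    private static final String ALLOWED = "[a-zA-Z0-9#\\#$%&*()_+\\]\\[';:?.,!^-]";

    // 8 to 32 digits anywhere in the string, plus any number of other allowed characters.
    static final Pattern DIGIT_COUNT = Pattern.compile(
            "^(?=(?:\\D*?\\d){8,32}(?!.*?\\d))" + ALLOWED + "+$");

    // 8 to 32 characters in total, each taken from the allowed set.
    static final Pattern TOTAL_LENGTH = Pattern.compile("^" + ALLOWED + "{8,32}$");

    static boolean hasValidDigitCount(String s) {
        return DIGIT_COUNT.matcher(s).matches();
    }

    static boolean hasValidLength(String s) {
        return TOTAL_LENGTH.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Hypothetical inputs, chosen so the two rules disagree on some of them.
        String[] samples = { "abc!1234567", "abc!12345678", "Pass#12", "Pass#123" };
        for (String s : samples) {
            System.out.println(s + " digits=" + hasValidDigitCount(s)
                    + " length=" + hasValidLength(s));
        }
    }
}
```

Which of the two you want depends on whether "8 to 32" is meant to count digits or total characters; the question can be read either way.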

Related

Java String replace all using regex with lookahead

I am trying to get a normalized URI from the incoming HTTP Request to print in the logs. This will help us to compute stats & other data by this normalized URI.
To normalize, I'm trying to do String replace using regex on the requestURI with x for all numeric & alphanumeric strings except version (eg. v1):
String str = "/v1/profile/abc13abc/13abc/cDe12/abc-bla/text_tw/HELLO/test/random/2234";
str.replaceAll("/([a-zA-Z]*[\\d|\\-|_]+[a-zA-Z]*)|([0-9]+)","/x");
This results in
/x/profile/x/x/x/x/x/HELLO/test/random/x
I want to get the result as (do not replace v1)
/v1/profile/x/x/x/x/x/HELLO/test/random/x
I tried using skip look ahead
String.replaceAll("/(?!v1)([a-zA-Z]*[\d|\-|_]+[a-zA-Z]*)|([0-9]+)","/x");
But not helping. Any clue is appreciated.
Thanks

Use
/(?:(?!v[1-4])[a-zA-Z]*[0-9_-]+[a-zA-Z]*|[0-9]+)
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
v 'v'
--------------------------------------------------------------------------------
[1-4] any character of: '1' to '4'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[a-zA-Z]* any character of: 'a' to 'z', 'A' to 'Z'
(0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
[0-9_-]+ any character of: '0' to '9', '_', '-'
(1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
[a-zA-Z]* any character of: 'a' to 'z', 'A' to 'Z'
(0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of grouping
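As a quick check, the pattern can be dropped into replaceAll like this. This is only a sketch: the class and method names are invented, and v[1-4] is taken as-is from the pattern above, so any other version prefix would still be replaced.

```java
final class UriNormalizer {
    // Pattern from the answer above; v[1-4] are the version prefixes it protects.
    private static final String SEGMENT =
            "/(?:(?!v[1-4])[a-zA-Z]*[0-9_-]+[a-zA-Z]*|[0-9]+)";

    static String normalize(String uri) {
        // Strings are immutable, so the result of replaceAll has to be used.
        return uri.replaceAll(SEGMENT, "/x");
    }

    public static void main(String[] args) {
        String str = "/v1/profile/abc13abc/13abc/cDe12/abc-bla/text_tw/HELLO/test/random/2234";
        // prints /v1/profile/x/x/x/x/x/HELLO/test/random/x
        System.out.println(normalize(str));
    }
}
```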

With the added explanation, here is how I would approach it.
Create a list of the path elements by splitting on / after dropping the leading slash.
Initialize a StringBuilder with the first element (the version).
Then iterate over a sublist starting with the second element, using String.matches to decide whether to replace each element with an x.
List<String> pathElements = Arrays.asList(str.substring(1).split("/"));
StringBuilder sb = new StringBuilder("/" + pathElements.get(0));
for (String pe : pathElements.subList(1, pathElements.size())) {
    // Replace any element that contains a digit, underscore, or hyphen.
    sb.append("/").append(pe.matches(".*[\\d_-].*") ? "x" : pe);
}
System.out.println(sb);
prints
/v1/profile/x/x/x/x/x/HELLO/test/random/x

Rather than using one large regular expression that will be quite difficult for people to understand and maintain in the future (including yourself, probably), I would opt for using a few lines, which make your logic more apparent:
List<String> parts = Arrays.asList(path.split("/"));
parts.replaceAll(
p -> !p.matches("v\\d+") && p.matches(".*[-_\\d].*") ? "x" : p);
path = String.join("/", parts);

How do I get a regex expression to contain only uppercase letters or numbers?

Regex expression: [A-Z]([^0-9]|[^A-Z])+[A-Z]
The requirements are that the string should start and end with a capital letter A-Z, and contain at least one number in between. It should not have anything else besides capital letters on the inside. However, it's accepting spaces and punctuation too.
My expression fails the following test case A65AJ3L 3F,D due to the comma and whitespace.
Why does this happen when I explicitly said only numbers and uppercase letters can be in the string?

Starting the character class with [^ makes it a negated character class.
Using ([^0-9]|[^A-Z])+ matches any char except a digit (but does match A-Z), or any char except A-Z (but does match a digit).
This way it can match any character.
If you would turn it into [A-Z]([0-9]|[A-Z])+[A-Z] it still does not make it mandatory to match at least a single digit on the inside due to the alternation | and it can still match AAA for example.
You might use:
^[A-Z]+[0-9][A-Z0-9]*[A-Z]$
The pattern matches:
^ Start of string
[A-Z]+ Match 1+ times A-Z
[0-9] Match a single digit
[A-Z0-9]* Optionally match either A-Z or 0-9
[A-Z] Match a single char A-Z
$ End of string
Regex demo
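A small Java sketch to try the pattern against the failing test case from the question. The class name, method name, and the extra sample strings are hypothetical.

```java
import java.util.regex.Pattern;

final class UppercaseIdCheck {
    // Starts and ends with A-Z, contains at least one digit, nothing but A-Z and 0-9.
    static final Pattern ID = Pattern.compile("^[A-Z]+[0-9][A-Z0-9]*[A-Z]$");

    static boolean isValid(String s) {
        return ID.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "A65AJ3L 3F,D", "A65AJ3L3FD", "AAA", "A1B" };
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + isValid(s));
        }
    }
}
```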

Use
^(?=\D*\d\D*$)[A-Z][A-Z\d]*[A-Z]$
See regex proof.
(?=\D*\d\D*$) requires exactly one digit in the string, no more and no fewer.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
\D* non-digits (all but 0-9) (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
\D* non-digits (all but 0-9) (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
the string
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[A-Z] any character of: 'A' to 'Z'
--------------------------------------------------------------------------------
[A-Z\d]* any character of: 'A' to 'Z', digits (0-9)
(0 or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
[A-Z] any character of: 'A' to 'Z'
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
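Keep in mind that this lookahead allows exactly one digit, which is stricter than the "at least one number" wording in the question. A short Java sketch (class name, method name, and sample strings are assumptions) shows the effect:

```java
import java.util.regex.Pattern;

final class SingleDigitIdCheck {
    // The lookahead allows exactly one digit in the whole string.
    static final Pattern ID = Pattern.compile("^(?=\\D*\\d\\D*$)[A-Z][A-Z\\d]*[A-Z]$");

    static boolean isValid(String s) {
        return ID.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "A1B", "AB7CD", "A65AJ3L3FD", "AAA" };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

If several digits should be accepted, the lookahead could be relaxed to (?=\D*\d), which only demands that at least one digit is present.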

Java/Groovy regex parse Key-Value pairs without delimiters

I have trouble fetching Key Value pairs with my regex
Code so far:
String raw = '''
MA1
 D. Mueller Gießer MA2 Peter Mustermann 2. Mann  MA3 Ulrike Mastorius Schmelzer MA4 Heiner Becker s 3.Mann
 MA5 Rudolf Peters Gießer '''
Map map = [:]
ArrayList<String> split = raw.findAll("(MA\\d)+(.*)"){ full, name, value -> map[name] = value }
println map
Output is:
[MA1:, MA2: Peter, MA3: Ulrike Mastorius Schmelzer, MA4: Heiner Becker, MA5: Rudolf Peters]
In my case the keys are:
MA1, MA2, MA3, MA\d (so MA with any 1 digit Number)
The value is absolutely everything until the next key comes up (including line breaks, tab, spaces etc...)
Does anybody have a clue how to do this?
Thanks in advance,
Sebastian

You can capture in the second group all that follows after the key and all the lines that do not start with the key
^(MA\d+)(.*(?:\R(?!MA\d).*)*)
The pattern matches
^ Start of the string (or of each line when the multiline flag is enabled, which is needed here to find every key)
(MA\d+) Capture group 1 matching MA and 1+ digits
( Capture group 2
.* Match the rest of the line
(?:\R(?!MA\d).*)* Match all lines that do not start with MA followed by a digit, where \R matches any unicode newline sequence
) Close group 2
Regex demo
In Java with the doubled escaped backslashes
final String regex = "^(MA\\d+)(.*(?:\\R(?!MA\\d).*)*)";
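Put together, a plain Java version could look roughly like this. It assumes every key starts a line, so the pattern is compiled with Pattern.MULTILINE; the class name, the parse method, and the sample input are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyValueParser {
    // MULTILINE lets ^ match at the start of every line, not only at the start of the input.
    private static final Pattern PAIR =
            Pattern.compile("^(MA\\d+)(.*(?:\\R(?!MA\\d).*)*)", Pattern.MULTILINE);

    static Map<String, String> parse(String raw) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(raw);
        while (m.find()) {
            // Group 2 is kept untrimmed: it holds everything up to the next key line.
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input: each key starts a line, values may span several lines.
        String raw = "MA1\n  D. Mueller\nMA2 Peter Mustermann\n\nMA3 Ulrike Mastorius";
        parse(raw).forEach((k, v) -> System.out.println(k + " -> [" + v.trim() + "]"));
    }
}
```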

Use
(?ms)^(MA\d+)(.*?)(?=\nMA\d|\z)
See proof.
Explanation
EXPLANATION
--------------------------------------------------------------------------------
(?ms) set flags for this block (with ^ and $
matching start and end of line) (with .
matching \n) (case-sensitive) (matching
whitespace and # normally)
--------------------------------------------------------------------------------
^ the beginning of a "line"
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
MA 'MA'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
.*? any character (0 or more times (matching
the least amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
MA 'MA'
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\z the end of the string
--------------------------------------------------------------------------------
) end of look-ahead
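For completeness, a hedged Java sketch using this expression. The class name, method name, and sample input are invented; the flags are set inline via (?ms), so no extra Pattern constants are needed. Since the lookahead checks for \n specifically, input with \r\n line endings would leave a trailing \r in the values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LazyKeyValueParser {
    // (?m): ^ matches at each line start; (?s): . also matches line breaks.
    private static final Pattern PAIR =
            Pattern.compile("(?ms)^(MA\\d+)(.*?)(?=\\nMA\\d|\\z)");

    static Map<String, String> parse(String raw) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(raw);
        while (m.find()) {
            // The lazy group stops right before the next key line or at the end of input.
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input with keys at the start of lines.
        String raw = "MA1\n  D. Mueller\nMA2 Peter Mustermann\n\nMA3 Ulrike Mastorius";
        parse(raw).forEach((k, v) -> System.out.println(k + " -> [" + v.trim() + "]"));
    }
}
```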

Minimum & Maximum character amounts Regex

I'm still new to regex, and I'm trying to create a regex for verifying ids for an app I am creating.
The id constraints are as follows -
Can only begin with either A-Z, a-z, ,, ', -.
Can contain all of the above and also ., just not at the beginning.
Must have at least two A-Z | a-z characters
And characters can only appear once. (,, shouldn't match, only ,)
EDIT: I was unclear about the fourth point, it should only disallow consecutive symbols, but not consecutive letters.
So far all I have is
^(A-Za-z',-)(A-Za-z',-\\.)+$ // I'm using java hence the reason for the `\\.`
I don't know how to match a specific amount of things within my regex. I would imagine it is something simple, but any help would be very useful.
I'm very new to regex and I'm really lost as to how to do this.
Edit: final regex is as follows
^(?=.*[A-Za-z].*[A-Za-z].*)(?!.*(,|'|\-|\.)\1.*)[A-Za-z,'\-][A-Za-z,'\-\.]*
Thanks to Ro Yo Mi and RebelWitoutAPulse!

Description
^(?!\.)(?=(?:.*?[A-Za-z]){2})(?:([a-zA-Z,'.-])(?!.*?\1))+$
This regular expression will do the following:
(?!\.)
validates the string does not start with a .
(?=(?:.*?[A-Za-z]){2})
validates the string has at least two A-Z | a-z characters
(?:([a-zA-Z,'.-])(?!.*?\1))+
allows the string to only contain a-z, A-Z, ,, ', ., -
Allows characters to only appear once. (,, shouldn't match, only ,)
Example
Live Demo
https://regex101.com/r/hO2mU1/1
Sample text
-abced
aabdefsa
abcdefs
.abded
ac.dC
ab
a.b
Sample Matches
-abced
abcdefs
ac.dC
ab
a.b
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
(?: group, but do not capture (2 times):
----------------------------------------------------------------------
.*? any character (0 or more times
(matching the least amount possible))
----------------------------------------------------------------------
[A-Za-z] any character of: 'A' to 'Z', 'a' to
'z'
----------------------------------------------------------------------
){2} end of grouping
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?: group, but do not capture (1 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[a-zA-Z,'.-] any character of: 'a' to 'z', 'A' to
'Z', ',', ''', '.', '-'
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.*? any character (0 or more times
(matching the least amount possible))
----------------------------------------------------------------------
\1 what was matched by capture \1
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
)+ end of grouping
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
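The sample text above can be run through the expression in Java with a sketch like the following; the class and method names are hypothetical, and the backslashes are doubled for the string literal.

```java
import java.util.regex.Pattern;

final class UniqueCharIdCheck {
    // No leading '.', at least two letters, and no character may occur twice.
    static final Pattern ID = Pattern.compile(
            "^(?!\\.)(?=(?:.*?[A-Za-z]){2})(?:([a-zA-Z,'.-])(?!.*?\\1))+$");

    static boolean isValid(String id) {
        return ID.matcher(id).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "-abced", "aabdefsa", "abcdefs", ".abded", "ac.dC", "ab", "a.b" };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```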

You could use positive and negative lookaheads. Roughly, when the regex engine reaches a lookahead it checks whether the subpattern inside it matches (or, for a negative lookahead, does not match) at the current position, without consuming any characters, and only then continues with the rest of the regex.
The regex might be:
^(?=.*[A-Za-z].*[A-Za-z].*)(?!.*(.)\1.*)[A-Za-z,'\-][A-Za-z,'\-\.]*
Explanation:
^ - beginning of the string
(?=.*[A-Za-z].*[A-Za-z].*) - continue matching only if the string has any number of any characters, followed by a letter from A-Z or a-z, then again any number of any characters, then another letter, then anything. This effectively covers point 3.
(?!.*(.)\1.*) - stop matching if there are duplicate consecutive characters in the string. It skips over anything, remembers a character using a capture group, and checks whether the very next character is the same one. This covers point 4.
Note: if point 4 meant that every character in the string should be unique, then you may add .* between (.) and \1.
Note: if this matches - the regex processing "caret" is back at the beginning of the string.
[A-Za-z,'\-] - the "real" matching begins. Character class matches your requirement from point 1.
[A-Za-z,'\-\.]* - any number of the characters mentioned in point 1 and point 2
Not sure about java regex specifics - quick google search found that this might be possible. Synthetic test works:
Astring # match
,string # match
.string # does not match
a.- # does not match: there are no two characters from [A-Za-z]
doesnotmatch # does not match when .* is added between (.) and \1 (see the note above): 't' occurs twice, non-consecutively
P.S. The regex may be optimised quite a lot if one were to use the defined character classes instead of a . - but this would add quite a lot of visual clutter to the answer.
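Since the Java specifics were left open, here is a small sketch that compiles both variants in Java: the one rejecting only consecutive duplicates, and the one from the note with .* between (.) and \1, which rejects any repeated character. Class, field, and method names are assumptions. matches() is used, so the whole string has to fit even though the expression has no trailing $.

```java
import java.util.regex.Pattern;

final class DuplicateRuleDemo {
    private static final String TWO_LETTERS = "^(?=.*[A-Za-z].*[A-Za-z].*)";
    private static final String BODY = "[A-Za-z,'\\-][A-Za-z,'\\-\\.]*";

    // Rejects the same character twice in a row.
    static final Pattern NO_CONSECUTIVE =
            Pattern.compile(TWO_LETTERS + "(?!.*(.)\\1.*)" + BODY);

    // Variant from the note: rejects any character that occurs twice anywhere.
    static final Pattern ALL_UNIQUE =
            Pattern.compile(TWO_LETTERS + "(?!.*(.).*\\1.*)" + BODY);

    static boolean noConsecutive(String s) {
        return NO_CONSECUTIVE.matcher(s).matches();
    }

    static boolean allUnique(String s) {
        return ALL_UNIQUE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "Astring", ",string", ".string", "a.-", "doesnotmatch", "hello" };
        for (String s : samples) {
            System.out.println(s + " consecutive-rule=" + noConsecutive(s)
                    + " unique-rule=" + allUnique(s));
        }
    }
}
```

Only the second variant rejects doesnotmatch; the first one accepts it because no character is immediately repeated.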

Java - regular expression for get number format

I have this:
110121 NATURAL 95 1570,40
110121 NATURAL 95 1570,40*
41,110 1 x 38,20 CZK)[A] *
' 31,831 261,791 1308,61)
>01572 PRAVO SO 17,00
1,000 ks x 17,00
1570,40
Every line of this output is saved in a List, and I want to get the number 1570,40.
My regular expressions for this type of format look like this:
"([1-9][0-9]*[\\.|,][0-9]{2})[^\\.\\d](.*)"
"^([1-9][0-9]*[\\.|,][0-9]{2})$"
My problem is that 1570,40 on the last line is found (by the second regular expression), and so is 1570,40 on the line ending with 1570,40*, but the one on the first line is not found. Do you know where the problem is?

I'm not sure I fully understand your needs, but I think you could use word boundaries like:
\b([1-9]\d*[.,]\d{2})\b
In order to not match dates, you can use:
(?:^|[^.,\d])(\d+[,.]\d\d)(?:[^.,\d]|$)
explanation:
The regular expression:
(?-imsx:(?:^|[^.,\d])(\d+[,.]\d\d)(?:[^.,\d]|$))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[^.,\d] any character except: '.', ',', digits
(0-9)
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
[,.] any character of: ',', '.'
----------------------------------------------------------------------
\d digits (0-9)
----------------------------------------------------------------------
\d digits (0-9)
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
[^.,\d] any character except: '.', ',', digits
(0-9)
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
$ before an optional \n, and the end of
the string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
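A rough Java sketch of how this expression could be applied line by line with find(), reading the number from capture group 1. The class and method names are made up, and the sample lines are taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AmountExtractor {
    // A number with exactly two decimals, not glued to other digits, dots, or commas.
    private static final Pattern AMOUNT =
            Pattern.compile("(?:^|[^.,\\d])(\\d+[,.]\\d\\d)(?:[^.,\\d]|$)");

    static List<String> extract(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = AMOUNT.matcher(line);
        while (m.find()) {
            found.add(m.group(1)); // group 1 holds the number without its delimiters
        }
        return found;
    }

    public static void main(String[] args) {
        String[] lines = {
            "110121 NATURAL 95 1570,40",
            "110121 NATURAL 95 1570,40*",
            "41,110 1 x 38,20 CZK)[A] *",
            "1570,40"
        };
        for (String line : lines) {
            System.out.println(line + " -> " + extract(line));
        }
    }
}
```

One limitation of this sketch: the delimiter after a match is consumed, so in a line such as 1,00 2,00, where only a single character separates two numbers, only the first one is returned.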

The first expression, "([1-9][0-9]*[\\.|,][0-9]{2})[^\\.\\d](.*)", contains [^\\.\\d], which means it expects one character that is neither a digit nor a dot right after the number. The second line has a * there, which satisfies it. On the first line the number is at the end of the line, so there is nothing left to match. I think you need just one regexp that will catch all the numbers: [^.\\d]*([1-9][0-9]*[.,][0-9]{2})[^.\\d]*. Also, you should use find() instead of matches(), so that you search for a substring rather than matching the whole string. It may also make sense to find all matches in case a line has two such numbers in it; I'm not sure whether that applies to you.
Also, use either [0-9] or \d consistently. At the moment it is confusing: they mean the same thing but look different.

Try this:
String s = "41,110 1 x 38,20 CZK)[A] * ";
Matcher m = Pattern.compile("\\d+,\\d+").matcher(s);
while(m.find()) {
System.out.println(m.group());
}
