Some more regex criteria in existing regex - java

I want to extend the regex below so that it also meets the following criteria:
^[\p{L}\d'][ \p{L}\d'-]*[\p{L}\d'-']$
It should start with a letter (A-Z or a-z) only.
It should also accept a single letter.
It should accept a hyphen (-), space, or dot (.) in the middle or at the end of the string (no other special characters).
It should accept digits in the middle and at the end of the string.
It should also keep the behaviour of the existing regex.
E.g.
Expected -
t, T, test, test123, te12st, te-st, te.st, te st, éééééé, ṪỲɎɆḂɃɀȿȸȺȔȐȳɊÉâÇë, Επίθετο
Not Expected -
12test, 1, .test, -test, , tes*t (no special characters other than hyphen, dot and space are allowed)

To match the expected strings (including a single letter) and reject the unexpected ones, you could match \pL at the start of the string, then repeat 0+ times any of the characters listed in [\d\pL .-], and then assert the end of the string.
Note that not all of your expected strings start with a-zA-Z.
^\pL[\d\pL .-]*$
In Java
String regex = "^\\pL[\\d\\pL .-]*$";
Regex demo | Java demo
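As a quick check of the pattern above, here is a minimal sketch that runs it against some of the sample strings from the question. The class and method names (NameRegexCheck, isValid) are made up for illustration, and the accented and Greek samples are written as Unicode escapes only to avoid source-encoding problems.

```java
import java.util.regex.Pattern;

final class NameRegexCheck {
    // Pattern from the answer: one letter, then any mix of digits, letters, space, dot, hyphen.
    static final Pattern NAME = Pattern.compile("^\\pL[\\d\\pL .-]*$");

    // Hypothetical helper: true if the whole input matches the pattern.
    static boolean isValid(String input) {
        return NAME.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "t", "test123", "te-st", "te.st", "te st",
            "\u00e9\u00e9\u00e9",                              // accented Latin letters
            "\u0395\u03c0\u03af\u03b8\u03b5\u03c4\u03bf",      // the Greek sample from the question
            "12test", "1", ".test", "-test", "", "tes*t"
        };
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + isValid(s));
        }
    }
}
```

Since matches() already requires the whole input to match, the ^ and $ anchors are redundant here but harmless.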

^[A-Za-z]+[\p{L}\d-.\s]*$
This is a possible solution: it 1) accepts one or more of A-Za-z, then 2) zero or more letters, digits, hyphens, whitespace characters, and periods in any combination. However, these test cases conflict with your first requirement, because they do not start with A-Za-z: éééééé, ṪỲɎɆḂɃɀȿȸȺȔȐȳɊÉâÇë, Επίθετο.
If you want it to also accept those three test cases, then this is a possible solution:
^[\p{L}]+[\p{L}\d-.\s]*$

Related

Java Regex to validate group field pattern example - abc.def.gh1

I am writing a piece of Java code where I need to validate a groupId (Maven) passed by the user.
For example: com.fb.test1.
I have written a regex which says the string should not start or end with '.' and can contain alphanumeric characters delimited by '.':
[^\.][[a-zA-Z0-9]+\\.{0,1}]*[a-zA-Z0-9]$
But this regex is not able to detect consecutive '.' characters, for example com..fb.test. I added {0,1} after the dot to limit it to at most one occurrence, but it didn't work.
Any leads would be highly appreciated.
The quantifier {0,1} and the dot should not be inside the character class. There they are only literal characters, so you are repeating a whole character class that allows 0 or more dots as well as the { , } chars.
You can also exclude a dot to the left using a negative lookbehind instead of matching an actual character that is not a dot.
In Java you could write the pattern as
(?<!\\.)[a-zA-Z0-9]+(?:\\.[a-zA-Z0-9]+)+$
Note that the $ makes sure that the match ends at the end of the string.
Regex demo
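The idea of keeping the dot and its quantifier outside the character class can be sketched as below. This is an anchored variant rather than the exact pattern above: it assumes the check is done with matches(), and it uses * on the group so that a single segment without any dot is also accepted (switch to + if at least one dot is required). The names GroupIdCheck and isValid are hypothetical.

```java
import java.util.regex.Pattern;

final class GroupIdCheck {
    // One alphanumeric segment, then zero or more ".segment" repetitions.
    // A dot can only appear between two non-empty segments, so "..", a leading
    // dot and a trailing dot are all rejected.
    static final Pattern GROUP_ID = Pattern.compile("^[a-zA-Z0-9]+(?:\\.[a-zA-Z0-9]+)*$");

    static boolean isValid(String groupId) {
        return GROUP_ID.matcher(groupId).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "com.fb.test1", "com..fb.test", ".com.fb", "com.fb.", "com.fb.t" };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```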

Regex pattern matching with multiple strings

Forgive me, I am not very familiar with regex patterns.
I have created a regex pattern as below.
String regex = Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
+ Pattern.quote(value);
This regex pattern gives different results for 2 different strings.
value = "207e/160";
Use Case 1 -
When channelStr = "207e/160, 149/80"
Then channelStr.matches(regex), returns "true".
Use Case 2 -
When channelStr = "207e/160, 149/80, 11"
Then channelStr.matches(regex), returns "false".
I am not able to figure out why. As far as I can understand, it may be because of the multiple spaces involved when more than 2 strings separated by commas are present.
I am not sure what the correct pattern is for more than 2 strings.
Any help will be appreciated.
If you print your pattern, it is:
\Q207e/160\E, [NnoneOoff0-9\-\+\/]+|[NnoneOoff0-9\-\+\/]+, \Q207e/160\E
It consists of an alternation | whose left and right branches each match exactly one mandatory comma.
matches() has to match the whole string, and that is the case for 207e/160, 149/80, so that is a match.
The string 207e/160, 149/80, 11 contains 2 commas, so you do get a partial match for the first part of the string, but you don't match the whole string, so matches() returns false.
See the matches in this regex demo.
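The difference between a whole-string match and a partial match can be reproduced with a small sketch like the following. It rebuilds the pattern from the question; the class and helper names (MatchesVsFind, buildRegex, firstPartialMatch) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchesVsFind {
    // Same construction as in the question: value on the left or on the right of one comma.
    static String buildRegex(String value) {
        String item = "[NnoneOoff0-9\\-\\+\\/]+";
        return Pattern.quote(value) + ", " + item + "|" + item + ", " + Pattern.quote(value);
    }

    // matches() succeeds only if the pattern covers the entire input.
    static boolean wholeMatch(String regex, String input) {
        return input.matches(regex);
    }

    // find() looks for the pattern anywhere in the input; returns the first hit or null.
    static String firstPartialMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String regex = buildRegex("207e/160");
        String two = "207e/160, 149/80";
        String three = "207e/160, 149/80, 11";
        System.out.println(wholeMatch(regex, two));
        System.out.println(wholeMatch(regex, three));
        System.out.println(firstPartialMatch(regex, three));
    }
}
```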
To match all the values, you can use a repeating pattern:
^[NnoeOf0-9+/-]+(?:,\h*[NnoeOf0-9+/-]+)*$
^ Start of string
[NnoeOf0-9+/-]+ Match 1+ times any of the listed in the character class
(?: Non capture group
,\h* Match a comma and optional horizontal whitespace chars
[NnoeOf0-9+/-]+ Match 1+ times any of the listed in the character class
)* Close the non capture group and optionally repeat it (if there should be at least 1 comma, then the quantifier can be + instead of *)
$ End of string
Regex demo
Example using matches():
String channelStr1 = "207e/160, 149/80";
String channelStr2 = "207e/160, 149/80, 11";
String regex = "^[NnoeOf0-9+/-]+(?:,\\h*[NnoeOf0-9+/-]+)*$";
System.out.println(channelStr1.matches(regex));
System.out.println(channelStr2.matches(regex));
Output
true
true
Note that in the character class you can put - at the end so that it does not have to be escaped, and the + and / do not have to be escaped either.
You can use regex101 to test your regex. It describes everything that's going on, which helps with debugging. There is a quick reference section at the bottom right that you can use to figure out what you can do, with examples.
A few things: you can match a character literally by escaping it with \, so \" matches a literal double quote.
If you want the pattern to be one or more of something, you would use +. These are called quantifiers and can be applied to groups, tokens, etc. The token for a whitespace character is \s. So, one or more whitespace characters would be \s+.
It's difficult to tell exactly what you're trying to do, but hopefully pointing you to regex101 will help. If you want to provide examples of the current RegEx you have, what you want to match and then the strings you're using to test it I'll be happy to provide you with an example.
^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$
^ Start
(?: ... ) Non-capturing group that defines an item and its separator. After each item, except the last, the separator (,) must appear. Spaces (one, several, or none) can appear before and after the comma, which is specified with *. This group can appear one or more times to the end of the string, as specified by the + quantifier after the group's closing parenthesis.
Regex101 Test
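A minimal sketch of how this last pattern behaves with matches(), assuming the same channel strings as in the question. The class name ChannelListCheck and the helper isValidList are hypothetical.

```java
import java.util.regex.Pattern;

final class ChannelListCheck {
    // Each item is followed either by a comma that is not the last thing in the
    // string, or by the end of the string.
    static final Pattern LIST =
            Pattern.compile("^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$");

    static boolean isValidList(String channelStr) {
        return LIST.matcher(channelStr).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "207e/160, 149/80", "207e/160, 149/80, 11", "207e/160,", "207e/160 149/80" };
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + isValidList(s));
        }
    }
}
```

The negative lookahead (?!$) is what rejects a trailing comma: a comma is only accepted when something still follows it.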

Having problems with java regex

I have the following regex:
/[-A-Z]{4}\d{2}/[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}.png
Basically I want to check for strings of the basic type
ABCD12/<here_is_a_random_uuid_as_a_string>.png
The UUID (which is in UPPER CASE) checking works fine, but now let's take a look at a special case. I want to accept strings like this
--CD12/...
AB--12/...
but NOT like this:
A--D12/...
But I cannot get the first part of the regex right. Basically I need to check for either two letters or two - in a row, twice.
As I understand it, [-A-Z]{4} means "either - or something between A - Z with a length of 4". So why doesn't my pattern work?
EDIT:
This answer was posted within the comments and it works:
(?mi)^(?:--[A-Z]{2}|[A-Z]{2}(?:--|[A-Z]{2}))\d{2}/[0-9A-F]{8}(?:-[0-9A-F]{4}){3}-[0-9A-F]{12}\.png$
Can somebody explain to me what (?mi) and (?:...) mean? A plain ? means 0 or 1 times, but what is the : for?
EDIT 2:
Just for those who might have a similar problem and do not want to read all of those regexes ;)
I slightly modified an answer to also accept patterns like ----12. The end result:
"^/(?:--[A-Z]{2}|-{4}|[A-Z]{2}(?:--|[A-Z]{2}))\\d{2}/[0-9A-F]{8}(?:-[0-9A-F]{4}){3}-[0-9A-F]{12}\\.png$"
It works like a charm.
You may use this regex for your cases:
^(?:--[A-Z]{2}|[A-Z]{2}(?:--|[A-Z]{2}))\d{2}/[0-9A-F]{8}(?:-[0-9A-F]{4}){3}-[0-9A-F]{12}\.png$
RegEx Demo
Details about first part:
^: Start
(?:: Start non-capture group
--[A-Z]{2}: Match -- followed by 2 letters
|: OR
[A-Z]{2}: Match 2 letters
(?:--|[A-Z]{2}): Match -- OR 2 letters
): End non-capture group
By the way, (?:...) is a non-capturing group.
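To illustrate the two constructs asked about in the question: in java.util.regex, (?:...) groups without creating a numbered capture, and (?mi) switches on the MULTILINE (m) and CASE_INSENSITIVE (i) flags inline. The sketch below uses shortened toy patterns, not the full UUID regex, and the helper names are made up.

```java
import java.util.regex.Pattern;

final class GroupAndFlagDemo {
    // Number of capturing groups the compiled pattern defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // True if the pattern is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // (...) creates group 1, (?:...) only groups the alternation.
        System.out.println(groupCount("(--|[A-Z]{2})\\d{2}"));
        System.out.println(groupCount("(?:--|[A-Z]{2})\\d{2}"));

        // (?i): letters match regardless of case.
        System.out.println(found("^ab12$", "AB12"));
        System.out.println(found("(?i)^ab12$", "AB12"));

        // (?m): ^ and $ also match at line boundaries inside the input.
        System.out.println(found("^CD34$", "AB12\nCD34"));
        System.out.println(found("(?m)^CD34$", "AB12\nCD34"));
    }
}
```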
Your [-A-Z]{4} matches any four occurrences of an uppercase ASCII letter or -, so it can also match ----, A---, ---B, -B--, etc.
You want to make sure that if there are hyphens, they come after or before two letters:
(?:[A-Z]{2}--|--[A-Z]{2}|[A-Z]{4})
It means:
(?: - start of a non-capturing group:
[A-Z]{2}-- - two uppercase ASCII letters and then --
| - or
--[A-Z]{2} - -- and then any two uppercase ASCII letters
| - or
[A-Z]{4} - any four uppercase ASCII letters
) - end of the non-capturing group.
The full pattern:
(?:[A-Z]{2}--|--[A-Z]{2}|[A-Z]{4})\d{2}/[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}\.png
To force the entire string match, add ^ (start of string) and $ (end of string) anchors:
^(?:[A-Z]{2}--|--[A-Z]{2}|[A-Z]{4})\d{2}/[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}\.png$
See the regex demo
Note that an unescaped . matches any char; to match a literal dot, you should escape it.
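In Java, the anchored pattern could be used roughly as follows. This is a sketch: the class name ImagePathCheck, the helper isValid and the sample UUID are invented for the example.

```java
import java.util.regex.Pattern;

final class ImagePathCheck {
    // Prefix is two letters + "--", or "--" + two letters, or four letters;
    // then two digits, a slash, an upper-case UUID and a literal ".png".
    static final Pattern PATH = Pattern.compile(
            "^(?:[A-Z]{2}--|--[A-Z]{2}|[A-Z]{4})\\d{2}/"
            + "[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}\\.png$");

    static boolean isValid(String path) {
        return PATH.matcher(path).matches();
    }

    public static void main(String[] args) {
        String uuid = "123E4567-E89B-12D3-A456-426614174000"; // sample value, made up
        String[] prefixes = { "ABCD12", "--CD12", "AB--12", "A--D12" };
        for (String prefix : prefixes) {
            System.out.println(prefix + " -> " + isValid(prefix + "/" + uuid + ".png"));
        }
        // With the escaped dot, an arbitrary character before "png" is rejected.
        System.out.println(isValid("ABCD12/" + uuid + "Xpng"));
    }
}
```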

RegEx of underscore delimited string

I have a string with 5 pieces of data delimited by underscores:
AAA_BBB_CCC_DDD_EEE
I want a different regex for each component.
The regex needs to return just the one component.
For example, the first would return just AAA, the second for BBB, etc.
I am able to parse out AAA with the following:
^([^_]*)?
I see that I can do a look-around like this to find:
(?<=[^_]*_).*
BBB_CCC_DDD_EEE
But the following cannot find just BBB:
(?<=[^_]*_)[^_]*(?=_)
Mixing lookbehind and lookahead
^([^_]+)? // 1st
(?<=_)[^_]+ // 2nd
(?<=_)[^_]+(?=_[^_]+_[^_]+$) // 3rd
(?<=_)[^_]+(?=_[^_]+$) // 4th
[^_]+$ // 5th
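The five lookaround patterns above can be tried out in Java with a sketch like this one, which takes the first find() result for each pattern. The class and method names are hypothetical; note that the 2nd pattern relies on BBB being the first match found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaroundComponents {
    // One pattern per component, in the same order as listed in the answer.
    static final String[] PATTERNS = {
        "^([^_]+)?",
        "(?<=_)[^_]+",
        "(?<=_)[^_]+(?=_[^_]+_[^_]+$)",
        "(?<=_)[^_]+(?=_[^_]+$)",
        "[^_]+$"
    };

    // Returns the first match of the pattern in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "AAA_BBB_CCC_DDD_EEE";
        for (String regex : PATTERNS) {
            System.out.println(regex + " -> " + firstMatch(regex, input));
        }
    }
}
```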
If the lengths of the strings between the "_" are known, it can be done like this:
1st match
^([^_]+)?
2nd match
(?<=_)\K[^_]+
3rd match
(?<=_[A-Za-z]{3}_)\K[^_]+
4th match
(?<=_[A-Za-z]{3}_[A-Za-z]{3}_)\K[^_]+
5th match
(?<=_[A-Za-z]{3}_[A-Za-z]{3}_[A-Za-z]{3}_)\K[^_]+
Each {3} expresses the length of the string between the "_".
If your string always uses underscores as delimiters, you might use a single regex that captures the value in a capturing group: repeat the pattern for what comes before it (in this case, one or more non-underscore characters followed by an underscore) using a quantifier such as {3}, which you can change.
This way the quantifier specifies how many times the preceding pattern is repeated before your match is captured. For your example string AAA_BBB_CCC_DDD_EEE you could use {0}, {1}, {2}, {3} or {4}.
^(?:[^_\n]+_){3}([0-9A-Za-z]+)(?:_[^_\n]+)*$
That would match:
^ Assert position at start of the line
(?:[^_\n]+_){3} In a non capturing group (?:, match any character except an underscore or a newline one or more times with [^_\n]+, followed by an underscore, and repeat that n times (in this example n is 3)
([0-9A-Za-z]+) Capture your characters in a group using for example a character class (or use [^_]+ to match anything except an underscore, but that will also match white space characters)
(?:_[^_\n]+)* After your captured value, a non capturing group matches an underscore followed by one or more characters that are not an underscore or a newline; the group is repeated zero or more times to get a full match
$ Assert position at the end of the line
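The quantifier-based approach translates to Java fairly directly, and it avoids \K, which java.util.regex does not support. Below is a minimal sketch in which the repetition count is passed as a zero-based index; the class NthComponent and the method component are made-up names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NthComponent {
    // Skips `index` "segment_" prefixes, captures the next segment in group 1,
    // then consumes the remaining "_segment" parts so the whole line matches.
    static String component(String input, int index) {
        Pattern p = Pattern.compile(
                "^(?:[^_\\n]+_){" + index + "}([0-9A-Za-z]+)(?:_[^_\\n]+)*$");
        Matcher m = p.matcher(input);
        return m.matches() ? m.group(1) : null; // null when there is no such component
    }

    public static void main(String[] args) {
        String input = "AAA_BBB_CCC_DDD_EEE";
        for (int i = 0; i <= 5; i++) {
            System.out.println(i + " -> " + component(input, i));
        }
    }
}
```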

Regex-How to prevent repeated special characters?

I don't have any experience with regular expressions. I need a regular expression that doesn't allow special characters (+-*/& etc.) to be repeated.
The string can contain digits, letters, and special characters.
This should be valid : abc,df
This should be invalid : abc-,df
I would really appreciate it if you could help me! Thanks in advance.
Two solutions presented so far match a string that is not allowed.
But the title is How to prevent..., so I assume that the regex
should match the allowed string. It means that the regex should:
match the whole string if it does not contain 2
consecutive special characters,
not match otherwise.
You can achieve this putting together the following parts:
^ - start of string anchor,
(?!.*[...]{2}) - a negative lookahead for 2 consecutive special
characters (marked here as ...), in any place,
a regex matching the whole (non-empty) string,
$ - end of string anchor.
So the whole regex should be:
^(?!.*[!@#$%^&*()\-_+={}[\]|\\;:'",<.>\/?]{2}).+$
Note that within a char class (between [ and ]) a backslash
must be placed before the following chars to escape them: - (if in
the middle of the sequence), the closing square bracket,
the backslash itself and / (the regex terminator).
Or if you want to apply the regex to individual words (not the whole
string), then the regex should be:
\b(?!\S*[!@#$%^&*()\-_+={}[\]|\\;:'",<.>\/?]{2})\S+
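A Java version of the whole-string check might look like the sketch below. Two things are assumptions on my side: the list of special characters is only an example set, and the opening square bracket is escaped inside the class, because java.util.regex treats an unescaped [ inside a character class as the start of a nested class. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class NoRepeatedSpecials {
    // Negative lookahead rejects any input containing two consecutive special
    // characters; .+ then matches the whole non-empty string.
    static final Pattern ALLOWED = Pattern.compile(
            "^(?!.*[!@#$%^&*()\\-_+={}\\[\\]|\\\\;:'\",<.>/?]{2}).+$");

    static boolean isAllowed(String input) {
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "abc,df", "abc-,df", "a+b-c", "a--b", "" };
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + isAllowed(s));
        }
    }
}
```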
[\,\+\-\*\/\&]{2,} Add more characters inside the square brackets if you want.
Demo https://regex101.com/r/CBrldL/2
Use the following regex to match the invalid string.
[^A-Za-z0-9]{2,}
[^\w!\s]{2,} This would be the shortest version to match any two consecutive special characters (ignoring spaces).
If you want to treat a space as a special character too, please use [^\w]{2,}
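The opposite approach, detecting the invalid input with find(), can be sketched as follows. It uses two of the patterns quoted above to show how the treatment of spaces changes the result; the class and method names are made up for the example.

```java
import java.util.regex.Pattern;

final class RepeatedSpecialsFinder {
    // Any run of two or more non-alphanumeric characters (spaces count as special).
    static final Pattern STRICT = Pattern.compile("[^A-Za-z0-9]{2,}");
    // Same idea, but word characters, '!' and whitespace are not counted.
    static final Pattern IGNORING_SPACE = Pattern.compile("[^\\w!\\s]{2,}");

    // True if the input contains a run that the given pattern flags as invalid.
    static boolean hasRepeated(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = { "abc,df", "abc-,df", "abc, df" };
        for (String s : samples) {
            System.out.println("'" + s + "' strict=" + hasRepeated(STRICT, s)
                    + " ignoringSpace=" + hasRepeated(IGNORING_SPACE, s));
        }
    }
}
```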
