I am working on a regular expression that must accept the following patterns, with or without spaces:
Comma separated list of alphanumeric values : DG1, D3 OR R4,UI2
Comma separated list of alphanumeric and numeric values : D1, 2 OR D1,2
Range of alphanumeric values : DG1 - DG5 OR DG1-DG5
Range of alphanumeric and numeric values : DG1 - 8 OR DG1-8 OR 8-DG11 OR 8 - DG13
A combination of range and comma separated values : DG1, DG3-DG7 OR DG1,DG3-DG6
A combination of range and comma separated alphanumeric and numeric values : DG1, 3-DG7 OR 1,DG3-6
No other special character should be allowed other than comma and hyphen
2 special characters cannot come together
Cannot start or end with a special character
Invalid values include:
,1,DG1
-DG1-5
DG1 - 3 - GP9
1,F4,
RE3-
1,-G3
5,S3,-9
I'd use:
^([A-Z0-9]+(?:-[A-Z0-9]+)?)(?:,(?1))*$
Explanation:
^ : Start of string
( : begin group 1
[A-Z0-9]+ : 1 or more alphanum
(?: : begin NON capture group
- : literally -
[A-Z0-9]+ : 1 or more alphanum
)? : end group, optional
) : end group 1
(?: : begin NON capture group
, : literally ,
(?1) : repeat regex in group 1
)* : end group, repeated 0 or more times
$ : End of string
If your regex flavor doesn't support (?1), you have to duplicate the relevant part:
^[A-Z0-9]+(?:-[A-Z0-9]+)?(?:,[A-Z0-9]+(?:-[A-Z0-9]+)?)*$
If you want to deal with optional spaces:
^\s*[A-Z0-9]+(?:\s*-\s*[A-Z0-9]+)?(?:\s*,\s*[A-Z0-9]+(?:\s*-\s*[A-Z0-9]+)?)*\s*$
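As a rough check of the whitespace-tolerant pattern, here is a minimal Java sketch. It assumes Java as the host language (the question does not name one), and the class and method names are made up for illustration. java.util.regex has no (?1) subroutine call, so the repeated part is built from a shared string constant instead.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class RangeListValidator {
    // One item: an alphanumeric token, optionally followed by "-" and a second token.
    private static final String ITEM = "[A-Z0-9]+(?:\\s*-\\s*[A-Z0-9]+)?";

    // Items separated by commas, with optional whitespace around every separator.
    private static final Pattern LIST =
            Pattern.compile("\\s*" + ITEM + "(?:\\s*,\\s*" + ITEM + ")*\\s*");

    static boolean isValid(String input) {
        // matches() must consume the whole input, so ^ and $ are implied.
        return LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"DG1, D3", "DG1-8", "1,DG3-6", "DG1 - 3 - GP9", "1,F4,", "RE3-"};
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + isValid(s));
        }
    }
}
```

Building the pattern from the ITEM constant keeps the two copies of the item sub-pattern in sync, which is the main thing (?1) provides in flavors that support it.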
I have multiple values in a Rate Column
Value is as below:
20
10
invalidtext
<blank>
NA
n/a
#NA
I have tried using the below pattern
"^0-9" // If input value is not a numeric value (0-9) then replace it with null
But this replaces every text value in the column with null.
I want to replace only the values na, n/a and #na (in upper or lower case) with null using a regular expression, and keep the value "invalidtext" as it is.
Is there a single pattern that performs this specific check? I don't want to use multiple replace statements.
Assuming the text is a multiline text, you may use
s = s.replaceAll("(?mi)^(?!invalidtext$)[^\\d\r\n]*n/?a[^\\d\r\n]*$", "");
See the regex demo
Details
(?mi) - MULTILINE and IGNORECASE flags are on
^ - start of a line
(?!invalidtext$) - the line cannot be equal to invalidtext
[^\d\r\n]* - 0+ chars other than digits, CR and LFs
n/?a - n/a or na
[^\d\r\n]* - 0+ chars other than digits, CR and LFs
$ - end of a line.
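To see what the pattern does with the sample column, here is a small self-contained sketch. It assumes the column values arrive as one newline-separated string, as the answer does; the class and method names are invented for the example.

```java
// Hypothetical wrapper around the replaceAll call shown above.
final class NaCleaner {
    // Clears digit-free lines that contain "na" or "n/a" (any case),
    // except a line that is exactly "invalidtext".
    private static final String NA_LINE =
            "(?mi)^(?!invalidtext$)[^\\d\r\n]*n/?a[^\\d\r\n]*$";

    static String clearNaLines(String text) {
        return text.replaceAll(NA_LINE, "");
    }

    public static void main(String[] args) {
        String column = String.join("\n", "20", "10", "invalidtext", "", "NA", "n/a", "#NA");
        // Print each resulting line in brackets so emptied lines stay visible.
        for (String line : clearNaLines(column).split("\n", -1)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

Note that the pattern clears any digit-free line containing "na", so a value such as "banana" would be emptied as well; only "invalidtext" is explicitly protected by the lookahead.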
I need to replace string 'name' with fullName in the following kind of strings:
software : (publisher:abc and name:oracle)
This needs to be replaced as:
software : (publisher:abc and fullName:xyz)
The part "name:xyz" can appear anywhere inside the parentheses, e.g.
software:(name:xyz)
I am trying to use groups, and the regex I built looks like this:
(\bsoftware\s*?:\s*?\()((.*?)(\s*?(and|or)\s*?))(\bname:.*?\)\s|:.*?\)$)
You may use
\b(software\s*:\s*\([^()]*)\bname:\w+
and replace with $1fullName:xyz. See the regex demo.
Details
\b - word boundary
(software\s*:\s*\([^()]*) - Capturing group 1 ($1 in the replacement pattern is a placeholder for the value captured in this group):
software - a word
\s*:\s* - a : enclosed with 0+ whitespaces
\( - a ( char
[^()]* - 0 or more chars other than ( and )
\bname - whole word name
: - colon
\w+ - 1 or more letters, digits or underscores.
Java sample code:
String result = s.replaceAll("\\b(software\\s*:\\s*\\([^()]*)\\bname:\\w+", "$1fullName:xyz");
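The one-liner can be wrapped into a runnable sketch that takes the new value as a parameter. The class name, method name and newValue parameter are assumptions made for this example, not part of the question.

```java
import java.util.regex.Matcher;

// Hypothetical helper; names are illustrative only.
final class SoftwareQueryRewriter {
    private static final String NAME_IN_SOFTWARE =
            "\\b(software\\s*:\\s*\\([^()]*)\\bname:\\w+";

    static String rewrite(String query, String newValue) {
        // quoteReplacement keeps any '$' or '\' in newValue from being
        // interpreted as a group reference in the replacement string.
        return query.replaceAll(NAME_IN_SOFTWARE,
                "$1fullName:" + Matcher.quoteReplacement(newValue));
    }

    public static void main(String[] args) {
        System.out.println(rewrite("software : (publisher:abc and name:oracle)", "xyz"));
        System.out.println(rewrite("software:(name:xyz)", "xyz"));
    }
}
```

Because of the \b before name, an existing fullName:... term is left alone: there is no word boundary between "full" and "Name".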
I'm reading in a file for a Java application which has data separated by colons in the format:
test : test : 0 : 0
Where the first two segments are names of something and the last two are digits.
The match should fail if the input is not formatted in that exact way above (aside from the data being different)
test : test : 0 : 0 -----> pass
: test: 0 : 0 -----> fail
0 : test : 0 : test -----> fail
test test : 0 : 0 -----> fail
So the match will fail if there are any segments omitted, if the digits and words do not appear where they should, i.e. word : word : digit : digit, and there has to be 3 colons and 4 segments no more no less as above.
This is where I have gotten so far but it's not quite right:
^\D+(?:\s\:\s\w+)*$
You may use a regex like
^[a-zA-Z]+\s*:\s*[a-zA-Z]+(?:\s*:\s*\d+){2}$
Details
^ - start of string (implicit in String#matches)
[a-zA-Z]+ - 1+ ASCII letters
\s*:\s* - a : enclosed with 0+ whitespaces
[a-zA-Z]+ - 1+ ASCII letters
(?:\s*:\s*\d+){2} - two occurrences of : enclosed with 0+ whitespaces and then 1+ digits
$ - end of string (implicit in String#matches)
NOTE: If there must be an obligatory single space between the items, you need to replace \s* with \s. To match 1 or more whitespaces, \s* must be turned into \s+.
In Java, you may write it as
s.matches("[a-zA-Z]+\\s*:\\s*[a-zA-Z]+(?:\\s*:\\s*\\d+){2}")
See the regex demo
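Since the data is read from a file line by line, it may be worth compiling the pattern once instead of calling String#matches for every line, which recompiles it each time. A minimal sketch using the four sample lines from the question (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

// Hypothetical validator; names are illustrative only.
final class RecordLineValidator {
    // word : word : digits : digits, with optional whitespace around each colon.
    private static final Pattern RECORD =
            Pattern.compile("[a-zA-Z]+\\s*:\\s*[a-zA-Z]+(?:\\s*:\\s*\\d+){2}");

    static boolean isValid(String line) {
        // matches() requires the entire line to match the pattern.
        return RECORD.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] lines = {
            "test : test : 0 : 0",
            ": test: 0 : 0",
            "0 : test : 0 : test",
            "test test : 0 : 0"
        };
        for (String line : lines) {
            System.out.println(line + " -----> " + (isValid(line) ? "pass" : "fail"));
        }
    }
}
```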
Here you go (demo at Regex101):
[a-zA-Z]+\s+:\s+[a-zA-Z]+\s+:\s+\d+\s+:\s+\d+
Explanation:
[a-zA-Z]+ stands for 1 or more letters (+ is the quantifier that matches the preceding token at least once)
\s+ stands for 1 or more whitespace characters
: is the : character, literally
\d+ stands for at least one digit (remove the + to match one digit exactly)
Finally, compose those parts according to your needs. You might want to make the regex stricter by replacing \s+ with a single space.
Validate the String using the method String::matches (don't forget to double the backslashes, \\):
boolean isValid = string.matches("[a-zA-Z]+\\s+:\\s+[a-zA-Z]+\\s+:\\s+\\d+\\s+:\\s+\\d+");
I would just use String#matches on each line, with the following pattern:
[a-z]+ : [a-z]+ : [0-9]+ : [0-9]+
For example:
String line = "test : test : 0 : 0";
if (line.matches("[a-z]+ : [a-z]+ : [0-9]+ : [0-9]+")) {
System.out.println("Found a match");
}
I am creating a regex for two comma-separated values (for example, coordinates). I am using a regex like the one below:
^(\-?\d+(\.\d+)?),\s*(\-?\d+(\.\d+)?)$
The above regex requires two comma-separated values, but I want the second value, including the comma, to be optional, so I tried changing the regex like this:
^(\-?\d+(\.\d+)?)(,\s*(\-?\d+(\.\d+)?)$)?
This works and keeps the second value optional, but it also allows a comma without any second value, as below:
3456,
What can be added to the regex to disallow the comma when the second value is not present? Thanks.
You misplaced the quantifier with the anchor.
Use
^(-?\d+(\.\d+)?)(,\s*(-?\d+(\.\d+)?))?$
                                     ^^
See the regex demo.
You may reduce the number of capturing groups in your pattern and convert the optional group into a non-capturing one by adding ?: after the opening (. I'd use it like
^(-?\d+(?:\.\d+)?)(?:,\s*(-?\d+(?:\.\d+)?))?$
See another demo.
Also note you do not need to escape a hyphen outside a character class.
When using it in Java, do not forget to use double backslashes to define a literal backslash in the string literal and omit ^ and $ if you use the pattern with .matches() method:
s.matches("-?\\d+(?:\\.\\d+)?(?:,\\s*-?\\d+(?:\\.\\d+)?)?")
Details:
^ - start of string anchor
(-?\d+(\.\d+)?) - Group 1 matching an optional hyphen, 1+ digits, then an optional sequence (Group 2) of a dot followed with one or more digits
(,\s*(-?\d+(\.\d+)?))? - an optional sequence (Group 3) matching one or zero occurrences of:
, - comma
\s* - zero or more whitespaces
(-?\d+(\.\d+)?) - Group 4 matching
-? - an optional hyphen
\d+ - one or more digits
(\.\d+)? - Group 5 matching an optional sequence of a dot followed with 1 or more digits
$ - end of string
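If the values are needed as numbers and not just validated, the non-capturing variant can be used with a Matcher so that only the two values are captured. This is a sketch under that assumption; the class name, the parse method and the choice to return null for invalid input are all illustrative.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser; names and the null-on-failure convention are illustrative.
final class CoordinateParser {
    // Group 1 is the first number, group 2 the optional second number.
    private static final Pattern COORDS =
            Pattern.compile("(-?\\d+(?:\\.\\d+)?)(?:,\\s*(-?\\d+(?:\\.\\d+)?))?");

    // Returns one or two parsed numbers, or null when the input is not valid.
    static double[] parse(String input) {
        Matcher m = COORDS.matcher(input);
        if (!m.matches()) {
            return null;
        }
        double first = Double.parseDouble(m.group(1));
        if (m.group(2) == null) {
            // The optional comma-plus-value part did not participate in the match.
            return new double[] {first};
        }
        return new double[] {first, Double.parseDouble(m.group(2))};
    }

    public static void main(String[] args) {
        for (String s : new String[] {"3456", "3456,", "12.5, -7.25"}) {
            System.out.println("'" + s + "' -> " + Arrays.toString(parse(s)));
        }
    }
}
```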
I am trying to parse SQL statements with regex and save their parameters to use later.
Let's say I have this SQL statement:
INSERT INTO tablename (id, name, email) VALUES (#id, #name, #email)
The following regex will work just fine:
(#[0-9a-zA-Z$_]+)
However, in this statement I should ignore everything inside ' ' or " " and save only the first parameter:
UPDATE mytable SET id = #id, name = 'myname#id' WHERE id = 1;
According to this answer https://stackoverflow.com/a/307957 "it's not practical to do it in a single regular expression", but I am still trying to do this.
I tried to add regex lookahead and lookbehind, but it's not working:
(?<!\').*(#[0-9a-zA-Z$_]+).*(?!\')
Is there any way to do it using only one regular expression? Should I use lookahead/lookbehind or something else?
You can use: [\=\(\s]\s*\#[0-9+^a-zA-Z_0-9$_]+\s*[\),]
Explanation:
[\=\(\s] match a single character present in the list below
\= matches the character = literally
\( matches the character ( literally
\s match any white space character [\r\n\t\f ]
\s* match any white space character [\r\n\t\f ]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\# matches the character # literally
[0-9+^a-zA-Z_0-9$_]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
0-9 a single character in the range between 0 and 9
+^ a single character in the list +^ literally
a-z a single character in the range between a and z (case insensitive)
A-Z a single character in the range between A and Z (case insensitive)
_ the literal character _
0-9 a single character in the range between 0 and 9
$_ a single character in the list $_ literally
\s* match any white space character [\r\n\t\f ]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
[\),] match a single character present in the list below
\) matches the character ) literally
, the literal character ,
You can simplify your regex. Note that the group you want to capture is always followed by , or ). With that in mind, you get this regex:
(#[0-9a-zA-Z$_]+)(?=[,)])
#[0-9a-zA-Z$_]+ is your value
(?=[,)]) checks if the ) or , character follows.
If describing where your string must not appear is complicated, it is better to describe where it must appear instead.
See how it works at Regex101.
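For illustration, the lookahead version could be applied like this. Java is an assumption here, since the question does not name a language, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor; names are illustrative only.
final class SqlParameterExtractor {
    // A #parameter that is immediately followed by ',' or ')'.
    private static final Pattern PARAM = Pattern.compile("(#[0-9a-zA-Z$_]+)(?=[,)])");

    static List<String> extract(String sql) {
        List<String> params = new ArrayList<>();
        Matcher m = PARAM.matcher(sql);
        while (m.find()) {
            params.add(m.group(1));
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(extract(
                "INSERT INTO tablename (id, name, email) VALUES (#id, #name, #email)"));
        System.out.println(extract(
                "UPDATE mytable SET id = #id, name = 'myname#id' WHERE id = 1;"));
    }
}
```

One limitation worth keeping in mind: a parameter followed by whitespace, a semicolon or the end of the statement (for example WHERE id = #id) is not matched by this pattern, because the lookahead only accepts , or ).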