Java - Regular expression to match specific text

I have multiple values in a Rate column.
The values are as below:
20
10
invalidtext
<blank>
NA
n/a
#NA
I have tried using the pattern below:
"^0-9" // If the input value is not numeric (0-9), replace it with null
But this replaces every text value in the column with null.
I want to replace only the values na, n/a and #na (upper or lower case) with null using a regular expression, and keep the value "invalidtext" as it is.
How can I write a single pattern for this check? I don't want to use multiple replace statements.

Assuming the text is a multiline text, you may use
s = s.replaceAll("(?mi)^(?!invalidtext$)[^\\d\r\n]*n/?a[^\\d\r\n]*$", "");
See the regex demo
Details
(?mi) - MULTILINE and IGNORECASE flags are on
^ - start of a line
(?!invalidtext$) - the line cannot be equal to invalidtext
[^\d\r\n]* - 0+ chars other than digits, CR and LFs
n/?a - n/a or na
[^\d\r\n]* - 0+ chars other than digits, CR and LFs
$ - end of a line.
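
A minimal sketch of how the pattern above might be applied from Java. The class and method names (RateCleaner, blankOutNa) are made up for illustration, and the input is assumed to hold one column value per line with \n line endings.

```java
import java.util.regex.Pattern;

final class RateCleaner {
    // The pattern from the answer: (?mi) turns on MULTILINE and CASE_INSENSITIVE.
    private static final Pattern NA_LINE =
            Pattern.compile("(?mi)^(?!invalidtext$)[^\\d\r\n]*n/?a[^\\d\r\n]*$");

    // Empties every line that holds na, n/a or #na; other lines are left alone.
    static String blankOutNa(String text) {
        return NA_LINE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String column = "20\n10\ninvalidtext\n\nNA\nn/a\n#NA";
        System.out.println(blankOutNa(column));
    }
}
```

Note that the pattern matches any digit-free line that contains na, so a line such as banana would be emptied as well. If each value is handled on its own, a check like value.matches("(?i)#?n/?a") is stricter.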

Related

Allow only left aligned zeros using regex

I am fairly new to using regex. I have a serial number which can take the following forms: VV-XXXXXX-P or VVXXXXXXP
If the hyphen variant is used, then the number of 'X' can be variable. For example 01-162-8 is equivalent to 010001628.
In order to identify the 2 formats, I have created the following regexes:
String HYPHENS = "([0]{1}[13]{1}-[1-9]{1,6}-[0-9]{1})";
String NO_HYPHENS = "([0]{1}[13]{1}[0-9]{6}[0-9]{1})";
However, the issue with the NO_HYPHENS variant is that it allows 0 anywhere in the sequence.
For example, 010010628 should not be a valid sequence because there is a non-leading 0.
Additionally, how would I create a regex that I can use to remove every 0 from the sequence except the first one? I have tried the following, but it also removes the first 0.
String code = "010001234";
code = code.replaceAll("0+", "");
How could I modify the regexes to achieve this?

You can use
String NO_HYPHENS = "0[13](?!0*[1-9]0)[0-9]{6}[0-9]";
code = code.replaceAll("(?!^)0(?!$)", "");
See the regex demo.
The 0[13](?!0*[1-9]0)[0-9]{6}[0-9] regex matches:
0 - zero
[13] - one or three
(?!0*[1-9]0) - a negative lookahead that fails the match if, at this location, there are zero or more zeros followed by a non-zero digit and then a zero
[0-9]{6} - six digits
[0-9] - a digit.
This assumes you use it in Java with .matches(); otherwise, add ^ at the start and $ at the end.
Also, the (?!^)0(?!$) regex matches any zero that is not at the start or the end of the string.
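
A short sketch of how these two expressions might be used from Java. The class and method names are hypothetical. The lookahead only rejects a zero that directly follows the first non-zero digit, so isValidStrict is added as a stricter variant; it is an assumption of this sketch, not part of the answer.

```java
final class SerialCheck {
    // The answer's pattern; String.matches() anchors it at both ends.
    static boolean isValidNoHyphens(String code) {
        return code.matches("0[13](?!0*[1-9]0)[0-9]{6}[0-9]");
    }

    // Stricter variant (an assumption, not from the answer): exactly seven digits
    // after the prefix, made of zero padding, then non-zero digits, then the final digit.
    static boolean isValidStrict(String code) {
        return code.matches("0[13](?=[0-9]{7}$)0*[1-9]+[0-9]");
    }

    // Removes every zero that is not the first or the last character of the string.
    static String stripInnerZeros(String code) {
        return code.replaceAll("(?!^)0(?!$)", "");
    }

    public static void main(String[] args) {
        System.out.println(isValidNoHyphens("010001628")); // true
        System.out.println(isValidNoHyphens("010010628")); // false
        System.out.println(isValidNoHyphens("010012028")); // true, although a zero follows "12"
        System.out.println(isValidStrict("010012028"));    // false
        System.out.println(stripInnerZeros("010001234"));  // 011234
    }
}
```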

^0[13]0*[1-9]*[0-9]$
^ - beginning of string
0 - first character must be zero
[13] - second character must be one or three
0* - sequence of zeros of variable length
[1-9]* - sequence of non-zero digits of variable length
[0-9] - finally one digit (it can also be written as \d)
$ - end of string
This regex has one problem: it doesn't check how many digits are in the XXXXXX section of the serial number. But you can check that with the length() method:
String code = "010000230";
if (code.matches("^0[13]0*[1-9]*[0-9]$") && code.length() == 9) {
    // code is valid
}
// replacement
code = code.replaceAll("^(0[13])0*([1-9]*[0-9])$", "$1$2");
Explanation of the replacement:
(0[13]) - group number 1 (groups are enclosed in parentheses)
0* - zero or more zeros, matched but not captured
([1-9]*[0-9]) - group number 2
This will be replaced with:
$1$2 - group number 1 followed by group number 2 ($1 refers to group number 1)
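
The same capture-group idea could also convert the no-hyphen form into the hyphenated one mentioned in the question. This is only a sketch: the class and method names are hypothetical, and [1-9]+ is used instead of [1-9]* (an assumption) so that the middle part is never empty.

```java
final class SerialFormat {
    // Rewrites VVXXXXXXP as VV-X...-P, dropping the zero padding of the middle part.
    // Returns the input unchanged when it does not have the expected shape.
    static String toHyphenated(String code) {
        if (code.length() != 9) {
            return code;
        }
        return code.replaceAll("^(0[13])0*([1-9]+)([0-9])$", "$1-$2-$3");
    }

    public static void main(String[] args) {
        System.out.println(toHyphenated("010001628")); // 01-162-8
    }
}
```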

REGEX greediness or just wrong syntax

I am trying to delete every [.!?] character that appears inside quotes in a text. As a first step, I want to capture all quotes that contain [.!?] with a regex, so that I can delete those characters afterwards.
My regex doesn't work, maybe because it's greedy. It matches from my first "«" (the character at index 569) to the last character, which is another "»" (the character at index 2730).
My regex was:
Pattern full = Pattern.compile("«.*[.!?].*?»");
Matcher mFull = full.matcher(result);
while (mFull.find()) {
    System.out.println(mFull.start() + " " + mFull.end());
}
So I got:
569 2731
I have the same greediness problem when capturing sentences (beginning with any [A-Z] and ending with any [.!?]).

You may use
s = s.replaceAll("(\\G(?!^)|«)([^«».!?]*)[.!?](?=[^«»]*»)", "$1$2");
See the regex demo
Details
(\G(?!^)|«) - Group 1 (whose value is referred to with $1 from the replacement pattern): either the end of the previous match or «
([^«».!?]*) - Group 2 ($2): any 0+ chars other than «, », !, . and ?
[.!?] - any of the three symbols
(?=[^«»]*») - there must be a » after 0 or more chars other than « and » immediately to the right of the current location.
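
A minimal sketch of the replacement above in a runnable form. The class and method names are hypothetical, and the quote characters are written as the escapes \u00AB and \u00BB so that the source file encoding does not matter.

```java
final class QuoteCleaner {
    // Same pattern as in the answer, with the two quote characters written as escapes.
    private static final String REGEX =
            "(\\G(?!^)|\u00AB)([^\u00AB\u00BB.!?]*)[.!?](?=[^\u00AB\u00BB]*\u00BB)";

    // Removes every . ! ? that sits between an opening and a closing quote.
    static String stripPunctuationInQuotes(String s) {
        return s.replaceAll(REGEX, "$1$2");
    }

    public static void main(String[] args) {
        String text = "He said. \u00ABHello! How are you?\u00BB Fine.";
        System.out.println(stripPunctuationInQuotes(text));
    }
}
```

For the sample text, the punctuation outside the quotes is kept and the result is: He said. «Hello How are you» Fine.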

Regular expression for both length with whitespaces

I am trying to write a regular expression with the following conditions:
Allow empty space at any position in the string.
The first part is 1 to 3 letters.
Next come exactly 6 digits (required).
Next, optionally, 1 to 3 letters.
After that, optionally, 0 to 2 digits.
I tried a lot of things, but nothing works.
^[a-zA-Z]{1,3}[0-9]{6}[a-zA-Z]{0,3}[0-9]{0,2}
This expression works fine for matching all the criteria, but it does not allow empty strings. Thanks in advance.
I just want to validate a string like "AB 123456 ADF 12".
As mentioned in the first point, the string may contain empty space at any position, like "AB 123 456 ADF 12".

You have to wrap your pattern in parentheses and make it optional using ?:
^(?:[a-zA-Z]{1,3}[0-9]{6}[a-zA-Z]{0,3}[0-9]{0,2})?$
^ Assert beginning of string
(?: Start of non-capturing group
[a-zA-Z]{1,3}[0-9]{6}[a-zA-Z]{0,3}[0-9]{0,2} Your pattern
)? End of NCG, optional
$ Assert end of string
If you also want to match strings made only of whitespace characters, add a \s* alternative (written \\s* in a Java string literal) and drop the ? quantifier after the group:
^(?:[a-zA-Z]{1,3}[0-9]{6}[a-zA-Z]{0,3}[0-9]{0,2}|\s*)$
Live demo
Update
Based on comment:
^(?:[a-zA-Z](?:\s*[a-zA-Z]){0,2}\s*\d(?:\s*\d){5}(?:\s*[a-zA-Z](?:\s*[a-zA-Z]){0,2})?\s*(?:\d\s*\d?)?)$
Live demo
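
A sketch of how the updated pattern might be used from Java, next to a simpler alternative. The class and method names are hypothetical, and the whitespace-stripping variant is an assumption of this sketch, not part of the answer.

```java
import java.util.regex.Pattern;

final class CodeValidator {
    // The pattern from the update, with backslashes doubled for a Java string literal.
    private static final Pattern SPACED = Pattern.compile(
            "^(?:[a-zA-Z](?:\\s*[a-zA-Z]){0,2}\\s*\\d(?:\\s*\\d){5}"
            + "(?:\\s*[a-zA-Z](?:\\s*[a-zA-Z]){0,2})?\\s*(?:\\d\\s*\\d?)?)$");

    static boolean isValid(String s) {
        return SPACED.matcher(s).matches();
    }

    // Alternative (an assumption, not from the answer): drop all whitespace first,
    // then apply the asker's original, simpler pattern.
    static boolean isValidAfterStripping(String s) {
        return s.replaceAll("\\s+", "")
                .matches("[a-zA-Z]{1,3}[0-9]{6}[a-zA-Z]{0,3}[0-9]{0,2}");
    }

    public static void main(String[] args) {
        System.out.println(isValid("AB 123 456 ADF 12"));               // true
        System.out.println(isValidAfterStripping("AB 123 456 ADF 12")); // true
        System.out.println(isValid("AB 12345"));                        // false
    }
}
```

Stripping the whitespace first keeps the pattern readable, at the cost of also accepting leading whitespace, which the single regex rejects.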

Regex for only two comma separated values, keeping second value optional

I am creating a regex for two comma-separated values (for example, coordinates). I am using a regex like the one below:
^(\-?\d+(\.\d+)?),\s*(\-?\d+(\.\d+)?)$
The above regex requires two comma-separated values, but I want the second value, including the comma, to be optional, so I tried changing the regex like this:
^(\-?\d+(\.\d+)?)(,\s*(\-?\d+(\.\d+)?)$)?
This works and keeps the second value optional, but it also allows a comma without any second value, like below:
3456,
What can be added to the regex so that the comma is not allowed when the second value is absent? Thanks.

You swapped the quantifier and the anchor: $ must come after the optional group, not inside it.
Use
^(-?\d+(\.\d+)?)(,\s*(-?\d+(\.\d+)?))?$
See the regex demo.
You may reduce the number of capturing groups in your pattern and convert the optional group into a non-capturing one by adding ?: after the opening (. I'd use it like
^(-?\d+(?:\.\d+)?)(?:,\s*(-?\d+(?:\.\d+)?))?$
See another demo.
Also note you do not need to escape a hyphen outside a character class.
When using it in Java, do not forget to use double backslashes to define a literal backslash in the string literal. You can omit ^ and $ if you use the pattern with the .matches() method:
s.matches("-?\\d+(?:\\.\\d+)?(?:,\\s*-?\\d+(?:\\.\\d+)?)?")
Details:
^ - start of string anchor
(-?\d+(\.\d+)?) - Group 1 matching an optional hyphen, 1+ digits, then an optional sequence (Group 2) of a dot followed with one or more digits
(,\s*(-?\d+(\.\d+)?))? - an optional sequence (Group 3) matching one or zero occurrences of:
    , - comma
    \s* - zero or more whitespaces
    (-?\d+(\.\d+)?) - Group 4 matching
        -? - an optional hyphen
        \d+ - one or more digits
        (\.\d+)? - Group 5 matching an optional sequence of a dot followed with 1 or more digits
$ - end of string
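
A small sketch of how the non-capturing variant might be used to pull the values out in Java. The class and method names are hypothetical; only the two numbers are captured, so group 2 is null when the optional part is absent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CoordinateParser {
    // Group 1 is the first number, group 2 the optional second one.
    private static final Pattern VALUES = Pattern.compile(
            "(-?\\d+(?:\\.\\d+)?)(?:,\\s*(-?\\d+(?:\\.\\d+)?))?");

    // Returns one or two values, or an empty list when the input is not valid.
    static List<String> parse(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = VALUES.matcher(input);
        if (m.matches()) {
            values.add(m.group(1));
            if (m.group(2) != null) { // null when the optional part did not participate
                values.add(m.group(2));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("12.5, -7")); // [12.5, -7]
        System.out.println(parse("3456"));     // [3456]
        System.out.println(parse("3456,"));    // []
    }
}
```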

Regex Lookahead and Lookbehind to parse SQL statement

I am trying to parse SQL statements with regex and save their parameters to use later.
Let's say I have this SQL statement:
INSERT INTO tablename (id, name, email) VALUES (#id, #name, #email)
The following regex will work just fine:
(#[0-9a-zA-Z$_]+)
However, in this statement I should ignore everything in ' ' or " " and save only the first parameter:
UPDATE mytable SET id = #id, name = 'myname#id' WHERE id = 1;
According to this answer https://stackoverflow.com/a/307957 "it's not practical to do it in a single regular expression", but I am still trying to do this.
I tried to add regex lookahead and lookbehind, but it's not working:
(?<!\').*(#[0-9a-zA-Z$_]+).*(?!\')
Is there any way to do it using only one regular expression? Should I use lookahead/lookbehind or something else?

You can use: [\=\(\s]\s*\#[0-9+^a-zA-Z_0-9$_]+\s*[\),]
Explanation:
[\=\(\s] match a single character present in the list below
    \= matches the character = literally
    \( matches the character ( literally
    \s match any white space character [\r\n\t\f ]
\s* match any white space character [\r\n\t\f ]
    Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\# matches the character # literally
[0-9+^a-zA-Z_0-9$_]+ match a single character present in the list below
    Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
    0-9 a single character in the range between 0 and 9
    +^ a single character in the list +^ literally
    a-z a single character in the range between a and z (case sensitive)
    A-Z a single character in the range between A and Z (case sensitive)
    _ the literal character _
    0-9 a single character in the range between 0 and 9
    $_ a single character in the list $_ literally
\s* match any white space character [\r\n\t\f ]
    Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
[\),] match a single character present in the list below
    \) matches the character ) literally
    , the literal character ,

You can simplify your regex. Note that the group you want to capture is always followed by , or ). With that in mind, you get this regex:
(#[0-9a-zA-Z$_]+)(?=[,)])
#[0-9a-zA-Z$_]+ is your value
(?=[,)]) checks that a ) or , character follows.
If describing where your string can't be is complicated, it is better to describe where it must be instead.
See how it works at Regex101.
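
A sketch that collects the parameters in Java with the lookahead pattern above. It also shows an alternative that is not from either answer: match quoted strings first so that only parameters outside quotes reach the capturing group. The class and method names are hypothetical, and SQL comments or backslash-escaped quotes are assumed not to occur.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SqlParams {
    // The lookahead pattern from the answer: a parameter directly followed by , or ).
    private static final Pattern FOLLOWED = Pattern.compile("(#[0-9a-zA-Z$_]+)(?=[,)])");

    // Alternative (an assumption, not from the answers): consume quoted strings first,
    // so that only parameters outside quotes end up in group 1.
    private static final Pattern SKIP_QUOTED =
            Pattern.compile("'[^']*'|\"[^\"]*\"|(#[0-9a-zA-Z$_]+)");

    static List<String> byLookahead(String sql) {
        return collect(FOLLOWED, sql);
    }

    static List<String> skippingQuotes(String sql) {
        return collect(SKIP_QUOTED, sql);
    }

    private static List<String> collect(Pattern p, String sql) {
        List<String> params = new ArrayList<>();
        Matcher m = p.matcher(sql);
        while (m.find()) {
            if (m.group(1) != null) { // null when a quoted string was matched
                params.add(m.group(1));
            }
        }
        return params;
    }
}
```

Both methods return [#id] for the UPDATE statement from the question. They differ on input such as WHERE a = #a AND b = 'x#y, z': the lookahead version misses #a (it is followed by a space) and picks up #y inside the quotes, while the quote-skipping version returns only #a.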
