Semi colon separated alphanumeric - java

I need to validate the below string using regular expression in Java:
String alphanumericList ="[\"State\"; \"districtOne\";\"districtTwo\"]";
I have tried the following:
String pattern="^\\[ (\"[\\w]\")\\s+(?:\\s+;\\s+ (\"[\\w]\")+) \\]$";
String alphanumericList ="[\"State1\"; \"district1\";\"district2\"]";
But the validation fails.
Any help is appreciated.

I'll try and mark the possible issues with your expression (issue numbers above the chars):
1 4 2 3 1 4 5 1
"^\\[ (\"[\\w]\")\\s+(?:\\s+;\\s+ (\"[\\w]\")+) \\]$"
As you can see, there are at least 5 issues:
The spaces in your expression are interpreted literally, i.e. if the input doesn't contain them, it would not match. Most probably you want to remove those spaces.
You expect at least one whitespace character after the first group (\\s+), which the input doesn't seem to contain. You probably want to remove that or change the quantifier from + to *.
You expect at least one whitespace character before each semicolon. Together with no. 2 this would make at least two after the first group. The solution would be the same as for no. 2.
Your expression the strings between double quotes seems wrong. (\"[\\w]\")+ means "a double quote, a single word character, a double quote" and all at least once. Besides that, \w is already a character class, you the brackets around that are not needed here (unless you want to add more classes or characters inside). You probably want (\"\\w+\") instead.
Additionally to 4 your non-capturing group that contains the semicolon ((?:\\s+;\\s+ (\"[\\w]\")+)) doesn't have a quantifier, i.e. it would be expected exactly once. You probably want to put the quantifier + or * after that group.
Another point that's not a direct issue is the capturing group around \"[\\w]\". Since you seem to want to match multiple strings after semicolons you'd only be able to capture one of the matching groups. Hence you'd most probably not be able to do what you intended anyways and thus the group is not necessary.
That said the fixed original expression would look like this:
pattern = "^\\[(\"\\w+\")(?:\\s*;\\s+\"\\w+\")+\\]$"

You are looking for this pattern:
String pattern = "\\[\\s*\"[^\"]*\"\\s*(?:;\\s*\"[^\"]*\"\\s*)*+\\]";
No need to add anchors since there are implicit if you use the matches() method since this method is the more appropriate for validation tasks.
pattern details:
\\[ # a literal opening square bracket
\\s* # optional whitespaces
\" # literal quote
[^\"]* # content between quotes: chars that are not a quote (zero or more)
\"
\\s*
(?: # non-capturing group:
; # a literal semi-colon
\\s*
\" # quoted content
[^\"]*
\"
\\s*
)*+ # repeat this group zero or more time (with a possessive quantifier)
\\] # a literal closing square bracket
The possessive quantifier prevent the regex engine to backtrack into repeated non-capturing groups if the closing square bracket is not present. It is a security to prevent uneeded backtracking and to make the pattern fail faster. Not that you can make possessive other quantifiers too before the non-capturing group for the same reason. More about possessive quantifiers.
I decided to describe the content between quotes in this way: \"[^\"]*\", but you can be more restrictive, allowing for example only words characters: \"\\w*\" or more general, allowing escaped quotes: \"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*+\"

Try this
static final String HEAD = "^\\[\\s*";
static final String TAIL = "\\s*\\]$";
static final String SEP = "\\s*;\\s*";
static final String ITEM = "\"[^\"]*\"";
static final String PAT = HEAD + ITEM + "(" + SEP + ITEM + ")*" + TAIL;

Try:
pattern = "^\\[(\"\\w+\";\\s*)*(\"\\w+\")\\]$";

Related

Regex pattern matching with multiple strings

Forgive me. I am not familiarized much with Regex patterns.
I have created a regex pattern as below.
String regex = Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
+ Pattern.quote(value);
This regex pattern is failing with 2 different set of strings.
value = "207e/160";
Use Case 1 -
When channelStr = "207e/160, 149/80"
Then channelStr.matches(regex), returns "true".
Use Case 2 -
When channelStr = "207e/160, 149/80, 11"
Then channelStr.matches(regex), returns "false".
Not able to figure out why? As far I can understand it may be because of the multiple spaces involved when more than 2 strings are present with separated by comma.
Not sure what should be correct pattern I should write for more than 2 strings.
Any help will be appreciated.
If you print your pattern, it is:
\Q207e/160\E, [NnoneOoff0-9\-\+\/]+|[NnoneOoff0-9\-\+\/]+, \Q207e/160\E
It consists of an alternation | matching a mandatory comma as well on the left as on the right side.
Using matches(), should match the whole string and that is the case for 207e/160, 149/80 so that is a match.
Only for this string 207e/160, 149/80, 11 there are 2 comma's, so you do get a partial match for the first part of the string, but you don't match the whole string so matches() returns false.
See the matches in this regex demo.
To match all the values, you can use a repeating pattern:
^[NnoeOf0-9+/-]+(?:,\h*[NnoeOf0-90+/-]+)*$
^ Start of string
[NnoeOf0-9\\+/-]+
(?: Non capture group
,\h* Match a comma and optional horizontal whitespace chars
[NnoeOf0-90-9\\+/-]+ Match 1+ any of the listed in the character class
)* Close the non capture group and optionally repeat it (if there should be at least 1 comma, then the quantifier can be + instead of *)
$ End of string
Regex demo
Example using matches():
String channelStr1 = "207e/160, 149/80";
String channelStr2 = "207e/160, 149/80, 11";
String regex = "^[NnoeOf0-9+/-]+(?:,\\h*[NnoeOf0-90+/-]+)*$";
System.out.println(channelStr1.matches(regex));
System.out.println(channelStr2.matches(regex));
Output
true
true
Note that in the character class you can put - at the end not having to escape it, and the + and / also does not have to be escaped.
You can use regex101 to test your RegEx. it has a description of everything that's going on to help with debugging. They have a quick reference section bottom right that you can use to figure out what you can do with examples and stuff.
A few things, you can add literals with \, so \" for a literal double quote.
If you want the pattern to be one or more of something, you would use +. These are called quantifiers and can be applied to groups, tokens, etc. The token for a whitespace character is \s. So, one or more whitespace characters would be \s+.
It's difficult to tell exactly what you're trying to do, but hopefully pointing you to regex101 will help. If you want to provide examples of the current RegEx you have, what you want to match and then the strings you're using to test it I'll be happy to provide you with an example.
^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$
^ Start
(?: ... ) Non-capturing group that defines an item and its separator. After each item, except the last, the separator (,) must appear. Spaces (one, several, or none) can appear before and after the comma, which is specified with *. This group can appear one or more times to the end of the string, as specified by the + quantifier after the group's closing parenthesis.
Regex101 Test

java regular expression and replace all occurrences

I want to replace one string in a big string, but my regular expression is not proper I guess. So it's not working.
Main string is
Some sql part which is to be replaced
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'
String to find and replace is
Based on some condition sql part to be replaced
hemp.EMPLOYEE_NAME = 'xxx'
I have tried this with
Pattern and Matcher class is used and
Pattern pat1 = Pattern.compile("/^hemp.EMPLOYEE_NAME\\s=\\s\'\\w\'\\s[and|or]*/$", Pattern.CASE_INSENSITIVE);
Matcher mat = pat1.matcher(cond);
while (mat.find()) {
System.out.println("Match: " + mat.group());
cond = mat.replaceFirst("xx "+mat.group()+"x");
mat = pat1.matcher(cond);
}
It's not working, not entering the loop at all. Any help is appreciated.
Obviously not - your regexp pattern doesn't make any sense.
The opening /: In some languages, regexps aren't strings and start with an opening slash. Java is not one of those languages, and it has nothing to do with regexps itself. So, this looks for a literal slash in that SQL, which isn't there, thus, failure.
^ is regexpese for 'start of string'. Your string does not start with hemp.EMPLOYEE_NAME, so that also doesn't work. Get rid of both / and ^ here.
\\s is one whitespace character (there are many whitespace characters - this matches any one of them, exactly one though). Your string doesn't have any spaces. Your intent, surely, was \\s* which matches 0 to many of them, i.e.: \\s* is: "Whitespace is allowed here". \\s is: There must be exactly one whitespace character here. Make all the \\s in your regexp an \\s*.
\\w is exactly one 'word' character (which is more or less a letter or digit), you obviously wanted \\w*.
[and|or] this is regexpese for: "An a, or an n, or a d, or an o, or an r, or a pipe symbol". Clearly you were looking for (and|or) which is regexpese for: Either the sequence "and", or the sequence "or".
* - so you want 0 to many 'and' or 'or', which makes no sense.
closing slash: You don't want this.
closing $: You don't want this - it means 'end of string'. Your string didn't end here.
The code itself:
replaceFirst, itself, also does regexps. You don't want to double apply this stuff. That's not how you replace a found result.
This is what you wanted:
Matcher mat = pat1.matcher(cond);
mat.replaceFirst("replacement goes here");
where replacement can include references to groups in the match if you want to take parts of what you matched (i.e. don't use mat.group(), use those references).
More generally did you read any regexp tutorial, did any testing, or did any reading of the javadoc of Pattern and Matcher?
I've been developing for a few years. It's just personal experience, perhaps, but, reading is pretty fundamental.
Instead of the anchors ^ and $, you can use word boundaries \b to prevent a partial match.
If you want to match spaces on the same line, you can use \h to match horizontal whitespace char, as \s can also match a newline.
You can use replaceFirst on the string using $0 to get the full match, and an inline modifier (?i) for a case insensitive match.
Note that using [and|or] is a character class matching one of the listed chars and escape the dot to match it literally, or else . matches any char except a newline.
(?i)\bhemp\.EMPLOYEE_NAME\h*=\h*'\w+'\h+(?:and|or)\b
See a regex demo or a Java demo
For example
String regex = "\\bhemp\\.EMPLOYEE_NAME\\h*=\\h*'\\w+'\\h+(?:and|or)\\b";
String string = "cond = emp.EMAIL_ID = 'xx#xx.com' AND\n"
+ "emp.PERMANENT_ADDR LIKE('%98n%') \n"
+ "AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
System.out.println(string.replaceFirst(regex, "xx$0x"));
Output
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND xxhemp.EMPLOYEE_NAME = 'xxx' andx is_active='Y'

Java regular expressions for specific name\value format

I'm not familiar yet with java regular expressions. I want to validate a string that has the following format:
String INPUT = "[name1 value1];[name2 value2];[name3 value3];";
namei and valuei are Strings should contain any characters expect white-space.
I tried with this expression:
String REGEX = "([\\S*\\s\\S*];)*";
But if I call matches() I get always false even for a good String.
what's the best regular expression for it?
This does the trick:
(?:\[\w.*?\s\w.*?\];)*
If you want to only match three of these, replace the * at the end with {3}.
Explanation:
(?:: Start of non-capturing group
\[: Escapes the [ sign which is a meta-character in regex. This
allows it to be used for matching.
\w.*?: Lazily matches any word character [a-z][A-Z][0-9]_. Lazy matching means it attempts to match the character as few times possible, in this case meaning that when will stop matching once it finds the following \s.
\s: Matches one whitespace
\]: See \[
;: Matches one semicolon
): End of non-capturing group
*: Matches any number of what is contained in the preceding non-capturing group.
See this link for demonstration
You should escape square brackets. Also, if your aim is to match only three, replace * with {3}
(\[\\S*\\s\\S*\];){3}

Find java comments (multi and single line) using regex

I found the following regex online at http://regexlib.com/
(\/\*(\s*|.*?)*\*\/)|(\/\/.*)
It seems to work well for the following matches:
// Compute the exam average score for the midterm exam
/**
* The HelloWorld program implements an application that
*/
BUT it also tends to match
http://regexr.com/foo.html?q=bar
at least starting at the //
I'm new to regex and a total infant, but I read that if you put a caret at the beginning it forces the match to start at the beginning of the line, however this doesn't seem to work on RegExr.
I'm using the following:
^(\/\*(\s*|.*?)*\*\/)|(\/\/.*)$
The regex you are looking for is one that allows the comment beginning (// or /*) to appear anywhere except in each of the regexps that result in tokens that can contain those substrings inside. If you look at the lexical structure of java language, you'll see that the only lexical element that can contain a // or a /* inside is the string literal, so to match a comment inside a string you have to match all the string (for not having a string literal before your match that happens to begin a string literal --- and contain your comment inside)
So, the string before your comment should be composed of any valid string that don't begin a string literal (without ending) and so, it can be rounded by any number of string literals with any string that doesn't form a string literal in between. If you consider a string literal, it should be matched by the following:
\"()*\"
and the inside of the parenthesis must be filled with something that cannot be a \n, a single ", a single \, and also not a unicode literal \uxxxx that results in a valid " (java forbids to use normal java characters to be encoded as unicode sequences, so this last case doesn't apply) but can be a escaped \\ or a escaped \", so this leads to
\"([^\\\"\n]|\\.)*\"
and this can be repeated any number of times optionaly, and preceded of any character not being a " (that should begin the last part considered):
([^\\"](\"([^\\\"\n]|\\.)*\")?)*
well, the previous part to our valid string should be matched by this string, and then comes the comment string, it can be any of two forms:
\/\/[^\n]*$
or
/\*([^\*]|\*[^\/])*\*\/
(this is, a slash, an asterisk (escaped), and any number of things that can be: either something different than a * or * followed by something not a /, to finally reach a */ sequence)
These can be grouped in an alternative group, as in:
(\/\/[^\n]*\n|\/\*([^\*]|\*[^\/])*\*\/)
finally, our expression shows:
^([^\\"](\"([^\\\"\n]|\\.)*\")?)*(\/\/[^\n]*|\/\*([^\*]|\*[^/])*\*\/)
But you should be careful that your matched comment begins not at the beginning, but in the 4th group (in the mark of the 4th left parenthesis) and the regexp should match the string from the beginning, see demo
Note
Think you are matching not only the comment, but the text before. This makes the result match to be composed of what is before the matching you want and the matched. Also think that if you try this regexp with several comments in sequence, it will match only the last, as we have not covered the case of a /* ... /* .... */ sequence (the comment is also something that can be embedded into a comment, but considering also this case will make you hate regexps forever. The correct way to cope with this problem is to write a lex/flex specification to get the java tokens and you'll only get them, but this is out of scope in this explanation. See an probably valid example here.
You can try this pattern:
(?ms)^[^'"\n]*?(?:(?:"(?:\\.|[^"])*"|'\\?.')[^'"\n]*?)*((?:(?://[^\n]*|/\*.*?\*/)[ \t]*)+)
This captures comments in group 1, but only if the comment is not inside a string. Demo.
Breakdown:
(?ms) multiline flag, makes ^ match at the start of a line
singleline flag makes . match newlines
^ start of line
[^'"\n]*? match anything but " or ' or newline
(?: then, any number strings:
(?:
" start with a quote...
(?: ...followed by any number of...
\\. ...a backslash and the escaped character
| or
[^"] any character other than "
)*
" ...and finally the closing quote
| or...
'\\?.' a single character in single quotes, possibly escaped
)
[^'"\n]*? and everything up to the next string or newline
)*
( finally, capture (any number of) comments:
(?:
(?: either...
//[^\n]* a single line comment
| or
/\*.*?\*/ a multiline comment
)
[ \t]* and any subsequent comments if only separated by whitespace
)+
)

Java regex: Negative lookahead

I'm trying to craft two regular expressions that will match URIs. These URIs are of the format: /foo/someVariableData and /foo/someVariableData/bar/someOtherVariableData
I need two regexes. Each needs to match one but not the other.
The regexes I originally came up with are:
/foo/.+ and /foo/.+/bar/.+ respectively.
I think the second regex is fine. It will only match the second string. The first regex, however, matches both. So, I started playing around (for the first time) with negative lookahead. I designed the regex /foo/.+(?!bar) and set up the following code to test it
public static void main(String[] args) {
String shouldWork = "/foo/abc123doremi";
String shouldntWork = "/foo/abc123doremi/bar/def456fasola";
String regex = "/foo/.+(?!bar)";
System.out.println("ShouldWork: " + shouldWork.matches(regex));
System.out.println("ShouldntWork: " + shouldntWork.matches(regex));
}
And, of course, both of them resolve to true.
Anybody know what I'm doing wrong? I don't need to use Negative lookahead necessarily, I just need to solve the problem, and I think that negative lookahead might be one way to do it.
Thanks,
Try
String regex = "/foo/(?!.*bar).+";
or possibly
String regex = "/foo/(?!.*\\bbar\\b).+";
to avoid failures on paths like /foo/baz/crowbars which I assume you do want that regex to match.
Explanation: (without the double backslashes required by Java strings)
/foo/ # Match "/foo/"
(?! # Assert that it's impossible to match the following regex here:
.* # any number of characters
\b # followed by a word boundary
bar # followed by "bar"
\b # followed by a word boundary.
) # End of lookahead assertion
.+ # Match one or more characters
\b, the "word boundary anchor", matches the empty space between an alphanumeric character and a non-alphanumeric character (or between the start/end of the string and an alnum character). Therefore, it matches before the b or after the r in "bar", but it fails to match between w and b in "crowbar".
Protip: Take a look at http://www.regular-expressions.info - a great regex tutorial.

Categories

Resources