regular expression java for URL parameter string


I am trying to verify whether a string matches a regular expression.
The URL parameter format is: key=value&key=value&...
Either the key or the value can be empty.
My code is:
Pattern patt = Pattern.compile("\\w*=\\w*&(\\w *=\\w*)* ");
Matcher m = patt.matcher(s);
if(m.matches()) return true;
else return false;
When I enter one=1&two=2, it returns false, whereas it should return true.
Any ideas?

The regex you need is
Pattern.compile("(?:\\w+=\\w*|=\\w+)(?:&(?:\\w+=\\w*|=\\w+))*");
See the regex demo. It will match:
(?:\\w+=\\w*|=\\w+) - either 1+ word chars followed by = and then 0+ word chars (obligatory key, optional value), or = followed by 1+ word chars (empty key, obligatory value)
(?:&(?:\\w+=\\w*|=\\w+))* - zero or more sequences of & followed by such a pair.
Java demo:
String s = "one=1&two=2&=3&tr=";
Pattern patt = Pattern.compile("(?:\\w+=\\w*|=\\w+)(?:&(?:\\w+=\\w*|=\\w+))*");
Matcher m = patt.matcher(s);
if (m.matches()) {
    System.out.println("true");
} else {
    System.out.println("false");
}
// => true
To allow whitespace, add \\s* where needed. If you also need to allow non-word chars, use, say, [\\w.-] instead of \\w to match word chars, . and - (keep the - at the end of the character class).
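As a rough sketch of the [\\w.-] variant mentioned above, the same pair structure can be assembled from smaller strings so the repeated part is written only once. The class name QueryStringCheck and the method isQueryString are hypothetical, chosen for illustration. Note that, like the answer's regex, it rejects a pair where both key and value are empty (a lone =).

```java
import java.util.regex.Pattern;

class QueryStringCheck {
    // Characters allowed in a key or value: word chars plus '.' and '-'
    // (the '-' is kept last in the class so it is treated literally).
    private static final String CHARS = "[\\w.-]";
    // One pair: a non-empty key with an optional value,
    // or an empty key with a non-empty value.
    private static final String PAIR = "(?:" + CHARS + "+=" + CHARS + "*|=" + CHARS + "+)";
    // A pair followed by zero or more "&pair" repetitions.
    private static final Pattern QUERY = Pattern.compile(PAIR + "(?:&" + PAIR + ")*");

    static boolean isQueryString(String s) {
        // matches() requires the whole string to match, so no ^/$ anchors are needed.
        return QUERY.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isQueryString("one=1&two=2"));     // true
        System.out.println(isQueryString("a.b=c-d&=3&tr=")); // true
        System.out.println(isQueryString("one=1&&two=2"));   // false
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call.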

Related

How do I make a regex part optional and finish regex at the same point if that optional part contains a string end ($)?

String s1 = "NetworkElement=Test,testWork=1:[456]";
String s2 = "NetworkElement=Test,testWork=1";
String regex = "(.*):\\[(.*)\\]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s1);
if (matcher.find()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
}
Matcher matcher2 = pattern.matcher(s2);
if (matcher2.find()) {
    System.out.println(matcher2.group(1));
    System.out.println(matcher2.group(2));
}
/*
Expected output:
for s1 : NetworkElement=Test,testWork=1
456
for s2 : NetworkElement=Test,testWork=1
0
*/
Problem: this regex works fine for String s1 but not for s2; for s2, matcher2.find() returns false.

You can use
^(.*?)(?::\[(.*?)])?$
^(.*?)(?::\[([^\]\[]*)])?$
See the regex demo.
In Java:
String regex = "^(.*?)(?::\\[(.*?)])?$";
// Or
String regex = "^(.*?)(?::\\[([^\\]\\[]*)])?$";
I added the ^ and $ anchors since you are using matcher.find(). If you switch to matcher.matches(), you can remove the anchors.
Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
(?::\[(.*?)])? - an optional sequence of :[, then Group 2 capturing any zero or more chars other than line break chars (as few as possible), and then a ] char (with the [^\]\[]* variant, Group 2 matches zero or more chars other than square brackets)
$ - end of string.
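A minimal self-contained sketch of the second variant used with matches(), so the anchors are dropped as described above. The class name OptionalSuffix and the parse helper are hypothetical. When the :[...] part is absent, Group 2 is null; the fallback "0" in main is an assumption taken from the question's expected output, not something the regex produces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalSuffix {
    // No ^/$ anchors: matches() already requires a full-string match.
    // Group 1: the part before the optional ":[...]" suffix (lazy).
    // Group 2: the text between the square brackets, if the suffix is present.
    private static final Pattern P = Pattern.compile("(.*?)(?::\\[([^\\]\\[]*)])?");

    // Returns {prefix, bracketContent}; bracketContent is null when there is no ":[...]" part.
    static String[] parse(String s) {
        Matcher m = P.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] inputs = {
            "NetworkElement=Test,testWork=1:[456]",
            "NetworkElement=Test,testWork=1"
        };
        for (String s : inputs) {
            String[] parts = parse(s);
            // Assumption: "0" is the desired fallback, as in the question's expected output.
            String inBrackets = parts[1] != null ? parts[1] : "0";
            System.out.println(parts[0] + " / " + inBrackets);
        }
    }
}
```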

Tokenize Words separated by non-word characters except single quote

I'm trying to implement the following method: it parses the input into “word tokens”, i.e., sequences of word characters separated by non-word characters. However, non-word characters can become part of a token if they are quoted (in single quotes).
I want to use regex but have trouble getting my code just right:
public static List<String> wordTokenize(String input) {
    Pattern pattern = Pattern.compile("\\b(?:(?<=\')[^\']*(?=\')|\\w+)\\b");
    Matcher matcher = pattern.matcher(input);
    ArrayList ans = new ArrayList();
    while (matcher.find()) {
        ans.add(matcher.group());
    }
    return ans;
}
My regex fails to recognize that a quote starting in the middle of a word, without a space, doesn't start a new word. Examples:
The input: this-string 'has only three tokens' // works
The input:
"this*string'has only two#tokens'"
Expected :[this, stringhas only two#tokens]
Actual :[this, string, has only two#tokens]
The input: "one'two''three' '' four 'twenty-one'"
Expected :[onetwothree, , four, twenty-one]
Actual :[one, two, three, four, twenty-one]
How do I fix the spaces?

You want to match one or more occurrences of a word char or a substring between the closest single straight apostrophes, and remove all those apostrophes from the tokens.
Use the following regex and .replace("'", "") on the matches:
(?:\w|'[^']*')+
See the regex demo. Details:
(?: - start of a non-capturing group
\w - a word char
| - or
' - a straight single quotation mark
[^']* - any 0+ chars other than a straight single quotation mark
' - a straight single quotation mark
)+ - end of the group, 1+ occurrences.
See the Java demo:
// String s = "this*string'has only two#tokens'"; // => [this, stringhas only two#tokens]
String s = "one'two''three' '' four 'twenty-one'"; // => [onetwothree, , four, twenty-one]
Pattern pattern = Pattern.compile("(?:\\w|'[^']*')+", Pattern.UNICODE_CHARACTER_CLASS);
Matcher matcher = pattern.matcher(s);
List<String> tokens = new ArrayList<>();
while (matcher.find()) {
    tokens.add(matcher.group(0).replace("'", ""));
}
Note that Pattern.UNICODE_CHARACTER_CLASS is added so that the \w pattern matches all Unicode letters and digits.
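Putting the pieces together, here is a sketch of the complete wordTokenize method with the imports it needs and a typed List<String> instead of the raw ArrayList from the question. The wrapper class name WordTokenizer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordTokenizer {
    // One or more of: a single word char, or a '...' quoted run (which may be empty).
    // UNICODE_CHARACTER_CLASS makes \w cover Unicode letters and digits, not just ASCII.
    private static final Pattern TOKEN =
            Pattern.compile("(?:\\w|'[^']*')+", Pattern.UNICODE_CHARACTER_CLASS);

    public static List<String> wordTokenize(String input) {
        Matcher matcher = TOKEN.matcher(input);
        List<String> tokens = new ArrayList<>();
        while (matcher.find()) {
            // Drop the quote characters; whatever they enclosed stays in the token.
            tokens.add(matcher.group().replace("'", ""));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(wordTokenize("this-string 'has only three tokens'"));
        System.out.println(wordTokenize("this*string'has only two#tokens'"));
        System.out.println(wordTokenize("one'two''three' '' four 'twenty-one'"));
    }
}
```

The empty quoted pair '' still counts as a match, which is why the second example yields an empty-string token.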

Splitting string with parentheses

I have a list of String that follows this pattern:
'Name with space (field1_field2) CONST'
Example :
'flow gavage(ZAB_B2_COCUM) BS'
'flowWithoutSpace (WitoutUnderscore) BS'
I would like to extract :
Name with space
The values inside the brackets
The CONST value after the brackets
For the string inside the parentheses () I am using :
\(.*\)
I'm not sure about the other fields.

You may use
String[] results = s.split("\\s*[()]\\s*");
See the regex demo
Pattern details
\\s* - 0+ whitespaces
[()] - a ) or (
\\s* - 0+ whitespaces
If your strings are always in the format specified (no parentheses, (...), no parentheses), you will have:
Name with space = results[0]
The values inside the brackets = results[1]
The CONST value after the brackets = results[2]
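A small runnable sketch of the split approach on both sample strings, assuming the single quotes shown around the examples in the question are not part of the actual data. The class name ParenSplit and the parts helper are hypothetical.

```java
import java.util.Arrays;

class ParenSplit {
    // Split on "(" or ")" together with any whitespace around it.
    static String[] parts(String s) {
        return s.split("\\s*[()]\\s*");
    }

    public static void main(String[] args) {
        String[] inputs = { "flow gavage(ZAB_B2_COCUM) BS", "flowWithoutSpace (WitoutUnderscore) BS" };
        for (String s : inputs) {
            // Expected layout: [name, values inside the brackets, constant]
            System.out.println(Arrays.toString(parts(s)));
        }
    }
}
```

If a string contained no parentheses at all, split would simply return a one-element array, so check results.length before indexing.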
If you want a more controlled approach use a matching regex:
Pattern.compile("^([^()]*)\\(([^()]*)\\)(.*)$")
See the regex demo
If you use it with Matcher#matches(), you may omit ^ and $ since that method requires a full string match.
Java demo:
String regex = "^([^()]*)\\(([^()]*)\\)(.*)$";
String s = "flow gavage(ZAB_B2_COCUM) BS";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
    System.out.println(matcher.group(1).trim());
    System.out.println(matcher.group(2).trim());
    System.out.println(matcher.group(3).trim());
}
Here, the pattern means:
^ - start of the string (implicit in .matches())
([^()]*) - Capturing group 1: any 0+ chars other than ( and )
\\( - a (
([^()]*) - Capturing group 2: any 0+ chars other than ( and )
\\) - a )
(.*) - Capturing group 3: any 0+ chars, as many as possible, up to the end of the line (use ([^()]*) if you need to restrict ( and ) in this part, too).
$ - end of string (implicit in .matches())
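To see what "more controlled" buys you, here is a sketch that wraps the matching regex in a helper returning null for strings that don't fit the expected shape, such as strings without parentheses or with nested ones. The class ParenParts and its parse method are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenParts {
    // matches() needs the whole string to fit, so ^ and $ are left out here.
    private static final Pattern P = Pattern.compile("([^()]*)\\(([^()]*)\\)(.*)");

    // Hypothetical helper: returns {name, insideParens, constant}, trimmed,
    // or null when the string is not of the form "name (values) CONST".
    static String[] parse(String s) {
        Matcher m = P.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1).trim(), m.group(2).trim(), m.group(3).trim() };
    }

    public static void main(String[] args) {
        String[] p = parse("flow gavage(ZAB_B2_COCUM) BS");
        System.out.println(p[0] + " | " + p[1] + " | " + p[2]);
        // No parentheses at all: the pattern cannot match, so parse returns null.
        System.out.println(parse("no parentheses here"));
    }
}
```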

Use the following:
String line = "'Name with space (field1_field2) CONST'";
Pattern pattern = Pattern.compile("([A-Za-z\\s]+)\\((.*)\\)(.*)\\'");
Matcher matcher = pattern.matcher(line);
String nameWithSpace = "";
String fieldsValuesInBrackets = "";
String constantValue = "";
if (matcher.find()) {
    nameWithSpace = matcher.group(1);
    fieldsValuesInBrackets = matcher.group(2);
    constantValue = matcher.group(3);
}

This expression will generate 3 groups:
(.*?)(\(.*?\))\s*?(.*)
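The first group will match the name, the second one will match the values inside the brackets together with the brackets themselves, and the third one will match the constant. Since \s*? is lazy and (.*) is greedy, the leading space ends up in the third group, so trim the groups as needed.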
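A quick check of this expression, as a sketch: printing each group between square brackets makes the captured whitespace and parentheses visible. The class name LazyGroups and the groups helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyGroups {
    private static final Pattern P = Pattern.compile("(.*?)(\\(.*?\\))\\s*?(.*)");

    // Returns the three raw (untrimmed) groups, or null if there is no match.
    static String[] groups(String s) {
        Matcher m = P.matcher(s);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        for (String g : groups("flow gavage(ZAB_B2_COCUM) BS")) {
            // Brackets make any leading/trailing spaces visible.
            System.out.println("[" + g + "]");
        }
    }
}
```

Group 2 includes the parentheses, and group 3 starts with the space, because the lazy \s*? first tries to match nothing and the greedy (.*) then succeeds.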

RegEx: Matching n-char long sequence of repeating character

I want to split of a text string that might look like this:
(((Hello! --> ((( and Hello!
or
########No? --> ######## and No?
At the beginning there are n repetitions of the same special character, and I want to match the longest possible sequence.
What I have at the moment is this regex:
([^a-zA-Z0-9])\\1+([a-zA-Z].*)
For the first example, this one returns
( (only once) and Hello!
and for the second
# and No?
How do I tell the regex that I want the longest possible repetition of the matching character?
I am using RegEx as part of a Java program in case this matters.

I suggest the following solution with 2 regexes: (?s)(\\W)\\1+\\w.* checks whether the string starts with the same non-word symbol repeated, and if it does, split with a simple (?<=\\W)(?=\\w) pattern (the position between a non-word and a word character); otherwise, just return a list containing the whole string (as if it were not split):
String ptrn = "(?<=\\W)(?=\\w)";
List<String> strs = Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!");
for (String str : strs) {
    if (str.matches("(?s)(\\W)\\1+\\w.*")) {
        System.out.println(Arrays.toString(str.split(ptrn)));
    } else {
        System.out.println(Arrays.asList(str));
    }
}
See IDEONE demo
Result:
[(((, Hello!]
[########, No?]
[$%^&^Hello!]
Also, your original regex can be modified to fit the requirement like this:
String ptrn = "(?s)((\\W)\\2+)(\\w.*)";
Pattern p = Pattern.compile(ptrn); // compile once, outside the loop
List<String> strs = Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!");
for (String str : strs) {
    Matcher m = p.matcher(str);
    if (m.matches()) {
        System.out.println(Arrays.asList(m.group(1), m.group(3)));
    } else {
        System.out.println(Arrays.asList(str));
    }
}
See another IDEONE demo
That regex matches:
(?s) - DOTALL inline modifier (if the string has newline characters, .* will also match them).
((\\W)\\2+) - Capture group 1: a non-word character (captured into Group 2) followed by one or more repetitions of that same character (matched with the backreference \2).
(\\w.*) - Capture group 3: a word character followed by zero or more characters.
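To make the difference concrete, here is a sketch that keeps the question's regex next to the adjusted one. In the original, group 1 wraps only a single character and \\1+ consumes the repeats outside it, which is why only one ( came back. The class RepeatPrefix and the split helper are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatPrefix {
    // Question's regex: group 1 holds only ONE character; \\1+ matches the repeats outside it.
    static final Pattern ORIGINAL = Pattern.compile("([^a-zA-Z0-9])\\1+([a-zA-Z].*)");
    // Adjusted regex: group 1 wraps the character (group 2) together with its repeats.
    static final Pattern FIXED = Pattern.compile("(?s)((\\W)\\2+)(\\w.*)");

    // Returns [prefix, rest] when the string starts with a repeated non-word char,
    // otherwise a one-element list with the whole string.
    static List<String> split(String str) {
        Matcher m = FIXED.matcher(str);
        if (m.matches()) {
            return Arrays.asList(m.group(1), m.group(3));
        }
        return Collections.singletonList(str);
    }

    public static void main(String[] args) {
        Matcher o = ORIGINAL.matcher("(((Hello!");
        if (o.matches()) {
            System.out.println("original group 1: " + o.group(1));
        }
        for (String s : Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!")) {
            System.out.println(split(s));
        }
    }
}
```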

What is wrong with my regexp in Java?

I want to get the word text2, but I get null. Could you please correct it?
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\w+&&(\\w+)'\\)\\)");
Matcher matcher = patter1.matcher(str);
String result = null;
if (matcher.find()) {
    result = matcher.group(1);
}
System.out.println(result);

One way to do it is to spell out everything inside the parentheses explicitly:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR[(]{2}&&\\w+\\s*'&&(\\w+)'[)]{2}");
Matcher matcher = patter1.matcher(str);
String result = "";
if (matcher.find()) {
    result = matcher.group(1);
}
System.out.println(result);
See IDEONE demo
You can also use [^()]* inside the parentheses to just get to the value inside single apostrophes:
Pattern patter1 = Pattern.compile("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}");
See another demo
Let me break down the regex for you:
SETVAR - match SETVAR literally, then...
[(]{2} - match 2 ( literally, then...
[^()]* - match 0 or more characters other than ( or ) up to...
'&& - match a single apostrophe and two & symbols, then...
(\\w+) - match and capture into Group 1 one or more word characters
'[)]{2} - match a single apostrophe and then 2 ) symbols literally.
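The [^()]* version can be wrapped into a reusable helper, sketched below with a precompiled pattern. The class SetvarExtract and the extract method are hypothetical; it returns null when no SETVAR((...)) call is found, mirroring the question's original null default.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SetvarExtract {
    // SETVAR(( then anything except parentheses, then '&&NAME' and )).
    private static final Pattern SETVAR = Pattern.compile("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}");

    // Returns the captured name, or null when the text contains no such SETVAR((...)) call.
    static String extract(String str) {
        Matcher matcher = SETVAR.matcher(str);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("Text SETVAR((&&text1 '&&text2'))"));
    }
}
```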

Your regex doesn't match your string because it doesn't account for the opening parentheses after SETVAR. Also, \\w+ matches only word characters, so it won't match the space or the & characters.
Instead, you can use a negated character class [^']+, which matches one or more of any characters except a single quote:
String str = "Text SETVAR((&&text1 '&&text2'))";
"SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)"
Debuggex Demo
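A side-by-side sketch of the question's pattern and the [^']+ pattern on the same input shows the diagnosis above: after SETVAR, \\w+ needs at least one word character but finds (, so find() fails. The class SetvarCompare and the firstGroup helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SetvarCompare {
    // Returns group 1 of the first match, or null if the regex does not match anywhere.
    static String firstGroup(String regex, String str) {
        Matcher m = Pattern.compile(regex).matcher(str);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "Text SETVAR((&&text1 '&&text2'))";
        // Question's pattern: \\w+ cannot match "((", so the result is null.
        System.out.println(firstGroup("SETVAR\\w+&&(\\w+)'\\)\\)", str));
        // Fixed pattern: "((" is matched explicitly and [^']+ skips "&&text1 ".
        System.out.println(firstGroup("SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)", str));
    }
}
```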
