What is wrong with my regexp in Java?

I want to get the word text2, but it returns null. Could you please correct it?
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\w+&&(\\w+)'\\)\\)");
Matcher matcher = patter1.matcher(str);
String result = null;
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);

One way to do it is to match the whole pattern inside the parentheses explicitly:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR[(]{2}&&\\w+\\s*'&&(\\w+)'[)]{2}");
Matcher matcher = patter1.matcher(str);
String result = "";
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
See IDEONE demo
You can also use [^()]* inside the parentheses to just get to the value inside single apostrophes:
Pattern patter1 = Pattern.compile("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}");
// [^()]* replaces the explicit &&\\w+\\s* part of the first pattern
See another demo
Let me break down the regex for you:
SETVAR - match SETVAR literally, then...
[(]{2} - match 2 ( literally, then...
[^()]* - match 0 or more characters other than ( or ) up to...
'&& - match a single apostrophe and two & symbols, then...
(\\w+) - match and capture into Group 1 one or more word characters
'[)]{2} - match a single apostrophe and then 2 ) symbols literally.
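Putting the pieces of this breakdown together, here is a minimal self-contained sketch. The class name SetvarExtractor and the extract helper are hypothetical; like the question's code, it returns null when the pattern is not found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SetvarExtractor {
    // SETVAR, two '(', any non-parenthesis chars, then '&&<word>' in single quotes, then two ')'
    private static final Pattern SETVAR =
            Pattern.compile("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}");

    // Hypothetical helper: returns Group 1, or null when the pattern is not found
    static String extract(String input) {
        Matcher matcher = SETVAR.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("Text SETVAR((&&text1 '&&text2'))")); // text2
        System.out.println(extract("Text SETVAR(&&text1 '&&text2')"));   // null: only one '('
    }
}
```

Compiling the Pattern once into a static field avoids recompiling it on every call.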

Your regex doesn't match your string because you didn't account for the opening parentheses. Also, \\w+ matches only word characters, so it won't match the space or the & characters.
Instead, you can use a negated character class [^']+, which matches one or more characters other than a single quote:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern pattern = Pattern.compile("SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)");
Debuggex Demo
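To see the difference, this small sketch (class and method names are hypothetical) runs both the question's pattern and this one against the same input. The original fails at \\w+ right after SETVAR, because ( is not a word character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SetvarCompare {
    // The question's pattern: \w+ directly after SETVAR cannot match "((", so find() fails
    static final String ORIGINAL = "SETVAR\\w+&&(\\w+)'\\)\\)";
    // This answer's pattern: literal "((" then [^']+ up to the first single quote
    static final String FIXED = "SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)";

    // Returns Group 1 of the first match, or null if there is none
    static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "Text SETVAR((&&text1 '&&text2'))";
        System.out.println(firstGroup(ORIGINAL, str)); // null
        System.out.println(firstGroup(FIXED, str));    // text2
    }
}
```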

Related

Splitting string with parentheses

I have a list of strings that follow this pattern:
'Name with space (field1_field2) CONST'
Example:
'flow gavage(ZAB_B2_COCUM) BS'
'flowWithoutSpace (WitoutUnderscore) BS'
I would like to extract:
Name with space
The values inside the brackets
The CONST value after the brackets
For the string inside the parentheses () I am using:
\(.*\)
I'm not sure how to get the other fields.
You may use
String[] results = s.split("\\s*[()]\\s*");
See the regex demo
Pattern details
\\s* - 0+ whitespaces
[()] - a ) or (
\\s* - 0+ whitespaces
If your strings always follow the format shown (text without parentheses, then (...), then text without parentheses), you will have:
Name with space = results[0]
The values inside the brackets = results[1]
The CONST value after the brackets = results[2]
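A minimal sketch of the split approach on both sample lines, assuming the single quotes around the examples in the question are only delimiters and not part of the data (the class and method names are hypothetical):

```java
import java.util.Arrays;

class SplitParens {
    // Split on '(' or ')' together with any whitespace around it
    static String[] parts(String s) {
        return s.split("\\s*[()]\\s*");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parts("flow gavage(ZAB_B2_COCUM) BS")));
        // [flow gavage, ZAB_B2_COCUM, BS]
        System.out.println(Arrays.toString(parts("flowWithoutSpace (WitoutUnderscore) BS")));
        // [flowWithoutSpace, WitoutUnderscore, BS]
    }
}
```

Because the whitespace is part of the delimiter, the pieces come out already trimmed.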
If you want a more controlled approach, use a matching regex:
Pattern.compile("^([^()]*)\\(([^()]*)\\)(.*)$")
See the regex demo
If you use it with Matcher#matches(), you may omit ^ and $ since that method requires a full string match.
Java demo:
String regex = "^([^()]*)\\(([^()]*)\\)(.*)$";
String s = "flow gavage(ZAB_B2_COCUM) BS";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
if (matcher.matches()){
System.out.println(matcher.group(1).trim());
System.out.println(matcher.group(2).trim());
System.out.println(matcher.group(3).trim());
}
Here, the pattern means:
^ - start of the string (implicit in .matches())
([^()]*) - Capturing group 1: any 0+ chars other than ( and )
\\( - a (
([^()]*) - Capturing group 2: any 0+ chars other than ( and )
\\) - a )
(.*) - Capturing group 3: any 0+ chars, as many as possible, up to the end of the line (use ([^()]*) if you need to restrict ( and ) in this part, too).
$ - end of string (implicit in .matches())
Use the following:
String line = "'Name with space (field1_field2) CONST'";
Pattern pattern = Pattern.compile("([A-Za-z\\s]+)\\((.*)\\)(.*)\\'");
Matcher matcher = pattern.matcher(line);
String nameWithSpace = "";
String fieldsValuesInBrackets = "";
String constantValue = "";
if (matcher.find()) {
nameWithSpace = matcher.group(1);
fieldsValuesInBrackets = matcher.group(2);
constantValue = matcher.group(3);
}
This expression will generate 3 groups:
(.*?)(\(.*?\))\s*?(.*)
The first group will match the name, the second one will match the values together with the surrounding brackets, and the third one will match the constant (possibly with a leading space, so trim it).
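A quick sketch (hypothetical class and method names) showing what the three groups actually contain. Because \s*? is lazy, it first tries to match nothing, and the greedy (.*) that follows can already reach the end of the string, so the space before the constant ends up in Group 3. Group 2 also keeps the brackets.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyGroups {
    // Group 1: name, Group 2: "(...)" including the brackets, Group 3: the rest
    private static final Pattern P = Pattern.compile("(.*?)(\\(.*?\\))\\s*?(.*)");

    // Returns the three raw (untrimmed) groups, or null if the string does not match
    static String[] groups(String s) {
        Matcher m = P.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] g = groups("flow gavage(ZAB_B2_COCUM) BS");
        System.out.println("[" + g[0] + "] [" + g[1] + "] [" + g[2] + "]");
        // [flow gavage] [(ZAB_B2_COCUM)] [ BS]
        // Strip the brackets and the extra space if only the values are needed
        String inside = g[1].substring(1, g[1].length() - 1);
        System.out.println(inside + " " + g[2].trim()); // ZAB_B2_COCUM BS
    }
}
```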

Parse string using Java Regex Pattern?

I have a Java string in the following format:
String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
Using the Pattern and Matcher classes from the java.util.regex package, I have to get the output string in the following format:
Output: [NYK:1100][CLT:2300][KTY:3540]
Can you suggest a RegEx pattern which can help me get the above output format?
You can use this regex \[name:([A-Z]+)\]\[distance:(\d+)\] with Pattern like this:
String regex = "\\[name:([A-Z]+)\\]\\[distance:(\\d+)\\]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
StringBuilder result = new StringBuilder();
while (matcher.find()) {
result.append("[");
result.append(matcher.group(1));
result.append(":");
result.append(matcher.group(2));
result.append("]");
}
System.out.println(result.toString());
Output
[NYK:1100][CLT:2300][KTY:3540]
regex demo
The regex \[name:([A-Z]+)\]\[distance:(\d+)\] captures two groups: the first is the uppercase letters after [name:, and the second is the number after [distance:.
Another solution, suggested by @tradeJmark, is to use named groups:
String regex = "\\[name:(?<name>[A-Z]+)\\]\\[distance:(?<distance>\\d+)\\]";
So you can easily get the results of each group by the name of group instead of the index like this :
while (matcher.find()) {
result.append("[");
result.append(matcher.group("name"));
//----------------------------^^
result.append(":");
result.append(matcher.group("distance"));
//------------------------------^^
result.append("]");
}
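For completeness, a self-contained sketch of the named-group version (the class and method names are hypothetical; named groups require Java 7 or later):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroups {
    private static final Pattern P =
            Pattern.compile("\\[name:(?<name>[A-Z]+)\\]\\[distance:(?<distance>\\d+)\\]");

    static String format(String s) {
        Matcher matcher = P.matcher(s);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            // Groups are read by name rather than by index
            result.append('[').append(matcher.group("name"))
                  .append(':').append(matcher.group("distance")).append(']');
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
        System.out.println(format(s)); // [NYK:1100][CLT:2300][KTY:3540]
    }
}
```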
If the format of the string is fixed and you always have exactly 3 [...] blocks to deal with, you may define a block pattern that matches [name:...][distance:...], captures the 2 values into separate groups, and then use fairly simple code with .replaceAll:
String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
String matchingBlock = "\\s*\\[name:([A-Z]+)]\\[distance:(\\d+)]";
String res = s.replaceAll(String.format(".*%1$s%1$s%1$s.*", matchingBlock),
"[$1:$2][$3:$4][$5:$6]");
System.out.println(res); // [NYK:1100][CLT:2300][KTY:3540]
See the Java demo and a regex demo.
The block pattern matches:
\\s* - 0+ whitespaces
\\[name: - a literal [name: substring
([A-Z]+) - Group n capturing 1 or more uppercase ASCII chars (\\w+ can also be used)
]\\[distance: - a literal ][distance: substring
(\\d+) - Group m capturing 1 or more digits
] - a ] symbol.
In the .*%1$s%1$s%1$s.* pattern, the block is repeated three times, so the groups are numbered 1 to 6 (referred to with the $1 to $6 backreferences in the replacement pattern), and the leading and trailing .* consume the start and end of the string so that they are removed (add (?s) at the start of the pattern if the string can contain line breaks).
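A sketch of the same replaceAll technique (class and method names are hypothetical) that also shows the caveat of the fixed-format assumption: if the string does not contain exactly three consecutive blocks, the pattern cannot match, and replaceAll returns the input unchanged instead of failing.

```java
class FixedBlocks {
    // One [name:...][distance:...] block with optional leading whitespace
    static final String BLOCK = "\\s*\\[name:([A-Z]+)]\\[distance:(\\d+)]";
    // Exactly three consecutive blocks; the outer .* consume the rest of the string
    static final String REGEX = String.format(".*%1$s%1$s%1$s.*", BLOCK);

    static String reformat(String s) {
        return s.replaceAll(REGEX, "[$1:$2][$3:$4][$5:$6]");
    }

    public static void main(String[] args) {
        System.out.println(reformat(
                "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:"));
        // [NYK:1100][CLT:2300][KTY:3540]
        // Only two blocks: the regex cannot match, so replaceAll returns the input unchanged
        System.out.println(reformat("City: [name:NYK][distance:1100] [name:CLT][distance:2300] Price:"));
    }
}
```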

Regex including date string, email, number

I have this regex expression:
String patt = "(\\w+?)(:|<|>)(\\w+?),";
Pattern pattern = Pattern.compile(patt);
Matcher matcher = pattern.matcher(search + ",");
I am able to match a string like
search = "firstName:Giorgio"
But I'm not able to match string like
search = "email:giorgio.rossi#libero.it"
or
search = "dataregistrazione:27/10/2016"
How should I modify the regex expression in order to match these strings?
You may use
String pat = "(\\w+)[:<>]([^,]+)"; // Add a , at the end if it is necessary
See the regex demo
Details:
(\w+) - Group 1 capturing 1 or more word chars
[:<>] - one of the chars inside the character class, :, <, or >
([^,]+) - Group 2 capturing 1 or more chars other than , (in the demo, I added \n as the demo input text contains newlines).
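A minimal sketch applying this pattern to the three sample strings from the question (the class and method names are hypothetical; note that the operator [:<>] is matched but not captured):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValue {
    // Group 1: the field name; [:<>] the operator (not captured); Group 2: everything up to a comma
    private static final Pattern P = Pattern.compile("(\\w+)[:<>]([^,]+)");

    // Returns {field, value} for the first condition found, or null
    static String[] parse(String search) {
        Matcher m = P.matcher(search);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String[] searches = {"firstName:Giorgio", "email:giorgio.rossi#libero.it", "dataregistrazione:27/10/2016"};
        for (String search : searches) {
            String[] kv = parse(search);
            System.out.println(kv[0] + " -> " + kv[1]);
        }
    }
}
```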
You can use a regex like this:
public static void main(String[] args) {
String[] arr = new String[]{"firstName:Giorgio", "email:giorgio.rossi#libero.it", "dataregistrazione:27/10/2016"};
String pattern = "(\\w+[:<>]\\w+)|(\\w+:\\w+\\.\\w+#\\w+\\.\\w+)|(\\w+:\\d{1,2}/\\d{1,2}/\\d{4})"; // [:<>], not [:|<|>]: a | inside a character class is a literal pipe
for(String str : arr){
if(str.matches(pattern))
System.out.println(str);
}
}
output is:
firstName:Giorgio
email:giorgio.rossi#libero.it
dataregistrazione:27/10/2016
But keep in mind that this regex works only for your data format. To build a universal regex, you should consult the RFC documents and articles about the email format (e.g. here). This question may also be useful.
Hope it helps.
The character class \w matches only [A-Za-z0-9_], so change the regex to (\\w+?)(:|<|>)(.*), to match any characters between the : and the ,.
Or list all the characters you expect, e.g. (\\w+?)(:|<|>)([#.\\w\\/]*),

RegEx: Matching n-char long sequence of repeating character

I want to split a text string that might look like this:
(((Hello! --> ((( and Hello!
or
########No? --> ######## and No?
At the beginning, the same special character is repeated n times, and I want to match the longest possible sequence.
What I have at the moment is this regex:
([^a-zA-Z0-9])\\1+([a-zA-Z].*)
For the first example, this one returns
( (only 1 time) and Hello!
and for the second
# and No?
How do I tell the regex that I want the longest possible repetition of the matching character?
I am using regex as part of a Java program, in case this matters.
I suggest the following solution with 2 regexps: (?s)(\\W)\\1+\\w.* checks whether the string starts with the same repeated non-word symbol. If it does, the string is split with a mere (?<=\\W)(?=\\w) pattern (the position between a non-word and a word character); otherwise, a list containing the whole string is returned (as if it were not split):
String ptrn = "(?<=\\W)(?=\\w)";
List<String> strs = Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!");
for (String str : strs) {
if (str.matches("(?s)(\\W)\\1+\\w.*")) {
System.out.println(Arrays.toString(str.split(ptrn)));
}else { System.out.println(Arrays.asList(str)); }
}
See IDEONE demo
Result:
[(((, Hello!]
[########, No?]
[$%^&^Hello!]
Also, your original regex can be modified to fit the requirement like this:
String ptrn = "(?s)((\\W)\\2+)(\\w.*)";
Pattern p = Pattern.compile(ptrn); // compile once, outside the loop
List<String> strs = Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!");
for (String str : strs) {
Matcher m = p.matcher(str);
if (m.matches()) {
System.out.println(Arrays.asList(m.group(1), m.group(3)));
}
else {
System.out.println(Arrays.asList(str));
}
}
See another IDEONE demo
That regex matches:
(?s) - DOTALL inline modifier (if the string has newline characters, .* will also match them).
((\\W)\\2+) - Capture group 1: a non-word character (captured into Group 2) followed by 1 or more repetitions of that same character (since the backreference \2 is used).
(\\w.*) - Capture group 3: a word character followed by zero or more characters.
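The key change compared with the question's regex is the outer group around the repetition. This sketch (hypothetical class and method names) compares what Group 1 contains for both patterns on the same input: in the original, \1+ does match all the repetitions, but only the single character is captured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatCompare {
    // The question's regex: the repetition \1+ is matched but not captured
    static final String ORIGINAL = "([^a-zA-Z0-9])\\1+([a-zA-Z].*)";
    // The modified regex: an outer group wraps the character and its repetitions
    static final String FIXED = "(?s)((\\W)\\2+)(\\w.*)";

    // Returns the requested group after a full-string match, or null
    static String group(String regex, int index, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        String input = "(((Hello!";
        System.out.println(group(ORIGINAL, 1, input)); // (
        System.out.println(group(FIXED, 1, input));    // (((
        System.out.println(group(FIXED, 3, input));    // Hello!
    }
}
```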

Find string after last underscore before dot extension

I need to find 20140809T0000Z in this string:
PREVIMER_F2-MARS3D-MENOR1200_20140809T0000Z.nc
I tried the following to keep the string before the .nc:
(?<=_)(.*)(?=.nc)
I have the following to start from the last underscore:
/_[^_]*$/
How can I find string after last underscore before dot extension, using a regex?
RegEx is not always the best solution... :)
String pattern="PREVIMER_F2-MARS3D-MENOR1200_20140809T0000Z.nc";
int start=pattern.lastIndexOf("_") + 1;
int end=pattern.lastIndexOf(".");
if(start != 0 && end != -1 && end > start) {
System.out.println(pattern.substring(start, end));
}
You just need lookahead for this requirement.
You can use:
[^._]+(?=[^_]*$)
// matches and returns 20140809T0000Z
RegEx Demo
You could use the following regex:
(?<=_)[^_]*(?=\.nc)
In your pattern, just replace .* with [^_]* so that the match cannot span an underscore, and escape the dot (\.nc) so that it matches a literal period.
DEMO
String s = "PREVIMER_F2-MARS3D-MENOR1200_20140809T0000Z.nc";
Pattern regex = Pattern.compile("(?<=_)[^_]*(?=\\.nc)");
Matcher regexMatcher = regex.matcher(s);
if (regexMatcher.find()) {
String resultString = regexMatcher.group();
System.out.println(resultString);
} //=> 20140809T0000Z
You could use a simpler pattern with a capturing group:
.*_(.*)\.nc
By default, the first .* is "greedy" and consumes as many characters as possible, so the _ matches the last underscore, leaving just the desired string inside (.*).
Demo: http://regex101.com/r/aI2xQ9/1
Java code:
String input = "PREVIMER_F2-MARS3D-MENOR1200_20140809T0000Z.nc";
Pattern pattern = Pattern.compile(".*_(.*)\\.nc");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
String group = matcher.group(1);
// ...
}
So, you need a sequence of non-underscore characters that immediately precede the period character.
Try [^_.]+(?=\.)
Demo: https://regex101.com/r/sLAnVs/2
Thanks to Cary Swoveland for pointing out that "no need to escape a period in a character class".
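To compare the answers in this thread, this sketch (hypothetical class and method names) runs each suggested regex, plus the lastIndexOf approach, against the file name from the question. They agree on this input, but they can give different results on names that contain more dots.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastUnderscore {
    static final String NAME = "PREVIMER_F2-MARS3D-MENOR1200_20140809T0000Z.nc";

    // Returns the given group of the first find() match, or null
    static String first(String regex, int group, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    // The non-regex answer: text between the last '_' and the last '.'
    static String byIndex(String s) {
        int start = s.lastIndexOf('_') + 1;
        int end = s.lastIndexOf('.');
        return (start != 0 && end > start) ? s.substring(start, end) : null;
    }

    public static void main(String[] args) {
        System.out.println(first("[^._]+(?=[^_]*$)", 0, NAME));
        System.out.println(first("(?<=_)[^_]*(?=\\.nc)", 0, NAME));
        System.out.println(first(".*_(.*)\\.nc", 1, NAME));
        System.out.println(first("[^_.]+(?=\\.)", 0, NAME));
        System.out.println(byIndex(NAME));
        // each line prints 20140809T0000Z
    }
}
```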
