I am trying to capture text that is matched by lookbehind.
My code:
private static final String t1="first:\\w*";
private static final String t2="(?<=\\w+)=\".+\"";
private static final String t=t1+'|'+t2;
Pattern p=Pattern.compile(t);
Matcher m=p.matcher("first:second=\"hello\"");
while(m.find())
System.out.println(m.group());
The output:
first:second
="hello"
I expected:
first:second
second="hello"
How can I change my regex to get what I expect?
Thank you
Why don't you just use one regex to match it all?
(first:)(\w+)(=".+")
And then simply use one match, and use the groups 1 and 2 for the first expected row and the groups 2 and 3 for the second expected row.
I modified your example so that it compiles and shows my attempt:
package examples.stackoverflow.q71651411;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Q71651411 {
public static void main(String[] args) {
Pattern p = Pattern.compile("(first:)(\\w+)(=\".+\")");
Matcher m = p.matcher("first:second=\"hello\"");
while (m.find()) {
System.out.println("part 1: " + m.group(1) + m.group(2));
System.out.println("part 2: " + m.group(2) + m.group(3));
}
}
}
From this -> contractor:"Hi, this is \"Paul\", how are you?" client:"Hi ...." <-
I want to get just -> Hi, this is \"Paul\", how are you? <-
I need a regular expression in Java to do that. I have tried, but I am struggling with the inner quotation marks (\"), which are driving me mad.
Thanks for any hint.
Java supports lookbehinds, so vanilla regex:
"(.*?(?<!\\))"
Inside a Java string (see https://stackoverflow.com/a/37329801/1225328):
\"(.*?(?<!\\\\))\"
The actual text will be contained inside the first group of each match.
Demo: https://regex101.com/r/8OXujX/2
For example, in Java:
String regex = "\"(.*?(?<!\\\\))\"";
String input = "contractor:\"Hi, this is \\\"Paul\\\", how are you?\" client:\"Hi ....\"";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
if (matcher.find()) { // or while (matcher.find()) to iterate through all the matches
System.out.println(matcher.group(1));
} else {
System.out.println("No matches");
}
Prints:
Hi, this is \"Paul\", how are you?
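The regexp should be like this: "(?:\\.|[^"\\])*"
It uses a non-capturing group (?:...) that matches either an escaped character (a backslash followed by any character, \\.) or a single character that is neither a double quote nor a backslash ([^"\\]), repeated zero or more times between the two quotes.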
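A minimal Java sketch of how that expression might be applied. The class name QuotedStrings, the helper extract, and the extra capturing group around the body (added so the text can be read without the surrounding quotes) are assumptions for illustration and not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedStrings {
    // Regex seen by the engine: "((?:\\.|[^"\\])*)"
    // The outer capturing group is an addition for this sketch.
    static final Pattern QUOTED = Pattern.compile("\"((?:\\\\.|[^\"\\\\])*)\"");

    // Returns the contents of every double-quoted section, escapes left intact.
    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "contractor:\"Hi, this is \\\"Paul\\\", how are you?\" client:\"Hi ....\"";
        // First element is the contractor text, second is the client text.
        extract(input).forEach(System.out::println);
    }
}
```

Because an escaped quote is consumed as a unit by the first alternative, the match only ends at a quote that is not preceded by a backslash.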
var text1 = "contractor:\"Hi, this is \\\"Paul\\\", how are you?\" client:\"Hi ....\" <-";
var regExWithQuotation = "contractor:(.+\".+\".+) client:";
Pattern p = Pattern.compile(regExWithQuotation);
var m = p.matcher(text1);
if (m.find()) {
var res = m.group(1);
System.out.println(res);
}
var regExWithoutQuotation = "contractor:\"(.+\".+\".+)?\" client:";
p = Pattern.compile(regExWithoutQuotation);
m = p.matcher(text1);
if (m.find()) {
var res = m.group(1);
System.out.println(res);
}
Output is:
"Hi, this is \"Paul\", how are you?"
Hi, this is \"Paul\", how are you?
You can use the regex, (?<=contractor:\").*(?=\" client:)
Description of the regex:
(?<=contractor:\") specifies positive lookbehind for contractor:\"
.* matches any sequence of characters (zero or more)
(?=\" client:) specifies positive lookahead for \" client:
In short, anything preceded by contractor:\" and followed by \" client:
Demo:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
String str = "contractor:\"Hi, this is \\\"Paul\\\", how are you?\" client:\"Hi ....\"";
String regex = "(?<=contractor:\").*(?=\" client:)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}
}
}
Output:
Hi, this is \"Paul\", how are you?
I have this string:
values="[72, 216, 930],[250],[72],[228, 1539],[12]";
I am trying to combine two patterns in order to get the last number from each [] group that holds several numbers, and the only number from each [] group that holds a single one.
pattern="\\, ([0-9]+)\\]|\\[([0-9]+)\\]"
But it outputs null:
930, null, null, 1539, null
How do I solve this problem?
Here, we might not need to bound the match on the left. We can simply anchor on the ] on the right and collect the digits immediately before it, with an expression similar to this:
([0-9]+)\]
If you like, we can also bound it from the left, similar to this expression:
([\[\s,])([0-9]+)(\])
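As a rough Java sketch, both expressions above can be tried against the string from the question. The class name LastNumberPerBracket and the helper collect are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastNumberPerBracket {
    // Bounded on the right only: digits immediately followed by ']'.
    static final Pattern RIGHT_BOUNDED = Pattern.compile("([0-9]+)\\]");
    // Bounded on both sides: '[', whitespace or ',' before the digits, ']' after.
    static final Pattern BOTH_BOUNDED = Pattern.compile("([\\[\\s,])([0-9]+)(\\])");

    // Collects the given capturing group from every match in the input.
    static List<String> collect(Pattern pattern, int group, String input) {
        List<String> numbers = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            numbers.add(m.group(group));
        }
        return numbers;
    }

    public static void main(String[] args) {
        String values = "[72, 216, 930],[250],[72],[228, 1539],[12]";
        System.out.println(collect(RIGHT_BOUNDED, 1, values)); // [930, 250, 72, 1539, 12]
        System.out.println(collect(BOTH_BOUNDED, 2, values));  // [930, 250, 72, 1539, 12]
    }
}
```

In the second expression the digits are in group 2, because group 1 holds the left boundary character.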
Try this.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = ", ([0-9]+)]";
final String string = "[72, 216, 930],[250],[72],[228, 1539],[12]";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
Output:
Full match: , 930]
Group 1: 930
Full match: , 1539]
Group 1: 1539
package Sample;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class StackOverFlow{
final static String regex = "\\d*]";
final static String string = "[72, 216, 930],[250],[72],[228, 1539],[12]";
final static Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final static Matcher matcher = pattern.matcher(string);
public static void main(String[] args) {
while (matcher.find()) {
String val = matcher.group(0).replace("]", "");
System.out.println(val);
}
}
}
Output:
930
250
72
1539
12
To make sure that the data is actually in between square brackets, you could use a capturing group, start the match with [ and end the match with ]
\[(?:\d+,\h+)*(\d+)]
In Java
\\[(?:\\d+,\\h+)*(\\d+)]
\[ Match [
(?:\d+,\h+)* Repeat 0+ times matching 1+ digit, comma and 1+ horizontal whitespace chars
(\d+) Capture in group 1 matching 1+ digit
] Match closing square bracket
For example:
String regex = "\\[(?:\\d+,\\h+)*(\\d+)]";
String string = "[72, 216, 930],[250],[72],[228, 1539],[12]";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Result:
930
250
72
1539
12
It seems like for a structure such as this, it's likely beneficial to parse the whole thing into memory, then index into the elements you're particularly interested in to your heart's content. Should the structure change unexpectedly/dynamically, you won't need to rewrite your regex, just index as needed as many times as you wish:
import java.util.*;
class Main {
public static void main(String[] args) {
String values = "[72, 216, 930],[250],[72],[228, 1539],[12]";
String[] data = values.substring(1, values.length() - 1).split("\\]\\s*,\\s*\\[");
ArrayList<String[]> result = new ArrayList<>();
for (String d : data) {
result.add(d.split("\\s*,\\s*"));
}
System.out.println(result.get(0)[result.get(0).length-1]); // => 930
System.out.println(result.get(1)[0]); // => 250
}
}
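For completeness, the null values in the question come from the alternation itself: each match is produced by only one branch, so the capturing group of the other branch did not participate and group() returns null for it. A minimal sketch, assuming the original two-branch pattern is kept (with the redundant escape before the comma dropped) and the non-null group is picked per match. The class name AlternationGroups and the helper lastNumbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationGroups {
    // Branch 1 captures in group 1, branch 2 captures in group 2.
    static final Pattern P = Pattern.compile(", ([0-9]+)\\]|\\[([0-9]+)\\]");

    // For each match, returns whichever of the two groups took part in it.
    static List<String> lastNumbers(String values) {
        List<String> result = new ArrayList<>();
        Matcher m = P.matcher(values);
        while (m.find()) {
            String fromFirstBranch = m.group(1); // null when branch 2 matched
            result.add(fromFirstBranch != null ? fromFirstBranch : m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String values = "[72, 216, 930],[250],[72],[228, 1539],[12]";
        System.out.println(lastNumbers(values)); // [930, 250, 72, 1539, 12]
    }
}
```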
The problem is that the last character never gets matched.
When I display the match using group(), it shows the whole match except the last character.
It is the same in all cases.
Below is the code and its output.
package mon;
import java.util.*;
import java.util.regex.*;
class HackerRank {
static void Pattern(String text) {
String p="\\d{1,2}|(0|1)\\d{2}|2[0-4]\\d|25[0-5]";
String pattern="(("+p+")\\.){3}"+p;
Pattern pi=Pattern.compile(pattern);
Matcher m=pi.matcher(text);
// System.out.println(m.group());
if(m.find() && m.group().equals(text))
System.out.println(m.group()+"true");
else
System.out.println(m.group()+" false");
}
public static void main(String[] args) {
Scanner sc=new Scanner(System.in);
while(sc.hasNext()) {
Pattern(sc.next());
}
sc.close();
}
}
I/P:000.12.12.034;
O/P:000.12.12.03 false
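You should properly group the alternatives inside the octet pattern. Without a group around p, the | operators in the trailing copy of p split the whole expression into top-level alternatives, so the last octet can only be matched by \d{1,2}: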
String p="(?:\\d{1,2}|[01]\\d{2}|2[0-4]\\d|25[0-5])";
// ^^^ ^
Then build the pattern like this:
String pattern = p + "(?:\\." + p + "){3}";
This also makes it a bit more efficient. Then, use matches() to require a full string match:
if(m.matches()) {...
See a Java demo:
String p="(?:\\d{1,2}|[01]\\d{2}|2[0-4]\\d|25[0-5])";
String pattern = p + "(?:\\." + p + "){3}";
String text = "192.156.34.56";
// System.out.println(pattern); => (?:\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])(?:\.(?:\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])){3}
Pattern pi=Pattern.compile(pattern);
Matcher m=pi.matcher(text);
if(m.matches())
System.out.println(m.group()+" => true"); // => 192.156.34.56 => true
else
System.out.println("False");
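To see the effect of the grouping on the input from the question, here is a small sketch that compares the original construction with the grouped one. The class name OctetGrouping and the helpers firstMatch and fullMatch are assumptions for illustration. The grouped variant simply wraps the question's alternatives in (?:...).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OctetGrouping {
    // Octet alternatives exactly as written in the question, with no enclosing group.
    static final String UNGROUPED = "\\d{1,2}|(0|1)\\d{2}|2[0-4]\\d|25[0-5]";
    // The same alternatives wrapped in a non-capturing group.
    static final String GROUPED = "(?:" + UNGROUPED + ")";

    static final String ORIGINAL = "((" + UNGROUPED + ")\\.){3}" + UNGROUPED;
    static final String FIXED = GROUPED + "(?:\\." + GROUPED + "){3}";

    // First substring found by find(), or null when nothing matches.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    // True only when the whole text matches the regex.
    static boolean fullMatch(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        String text = "000.12.12.034";
        System.out.println(firstMatch(ORIGINAL, text)); // 000.12.12.03
        System.out.println(fullMatch(ORIGINAL, text));  // false
        System.out.println(fullMatch(FIXED, text));     // true
    }
}
```

With the ungrouped pattern, the trailing octet is limited to one or two digits, which is why find() stops one character short.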
I am missing something basic here. I have this regex (.*)=\1 and I am using it to match 100=100, and it is failing. When I remove the back reference from the regex and continue to use the capturing group, it shows that the captured group is '100'. Why does it not work when I try to use the back reference?
package test;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
public static void main(String[] args) {
String eqPattern = "(.*)=\1";
String input[] = {"1=1"};
testAndPrint(eqPattern, input); // this does not work
eqPattern = "(.*)=";
input = new String[]{"1=1"};
testAndPrint(eqPattern, input); // this works when the backreference is removed from the expr
}
static void testAndPrint(String regexPattern, String[] input) {
System.out.println("\n Regex pattern is "+regexPattern);
Pattern p = Pattern.compile(regexPattern, Pattern.CASE_INSENSITIVE);
boolean found = false;
for (String str : input) {
System.out.println("Testing "+str);
Matcher matcher = p.matcher(str);
while (matcher.find()) {
System.out.println("I found the text "+ matcher.group() +" starting at " + "index "+ matcher.start()+" and ending at index "+matcher.end());
found = true;
System.out.println("Group captured "+matcher.group(1));
}
if (!found) {
System.out.println("No match found");
}
}
}
}
When I run this, I get the following output
Regex pattern is (.*)=\1
Testing 100=100
No match found
Regex pattern is (.*)=
Testing 100=100
I found the text 100= starting at index 0 and ending at index 4
Group captured 100 --> If the group contains 100, why doesn't it match when I add \1 above?
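You have to escape the backslash in the pattern string. In a Java string literal, "\1" is an octal escape for the character U+0001, so the regex engine never sees a backreference.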
String eqPattern = "(.*)=\\1";
I think you need to escape the backslash.
String eqPattern = "(.*)=\\1";
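A short sketch that shows the difference between the two literals. The class name BackreferenceEscape and the helper sameOnBothSides are made up for this example. The helper uses a full-string match, since with find() the group can also match the empty string around any "=".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackreferenceEscape {
    // "\\1" in Java source reaches the regex engine as \1, a backreference to group 1.
    static final Pattern EQ = Pattern.compile("(.*)=\\1");

    // True when the whole text has the form X=X.
    static boolean sameOnBothSides(String text) {
        return EQ.matcher(text).matches();
    }

    public static void main(String[] args) {
        // "\1" is an octal escape: a one-character string holding U+0001.
        System.out.println("\1".length());        // 1
        System.out.println((int) "\1".charAt(0)); // 1

        Matcher m = EQ.matcher("100=100");
        if (m.find()) {
            System.out.println(m.group());  // 100=100
            System.out.println(m.group(1)); // 100
        }
        System.out.println(sameOnBothSides("100=200")); // false
    }
}
```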