Java named backreferences not matching - java

I'm writing a simplified SQL parser that's using regexes to match each valid command. I'm stuck on matching the following:
attribute1 type1, attribute2 type2, attribute3 type3, ...
Where attributes are names of table columns, and types can be a CHAR(size), INT, or DEC. This is used in a CREATE TABLE statement:
CREATE TABLE student (id INT, name CHAR(20), gpa DEC);
To debug it, I'm trying to match this:
id INT, name CHAR(20), gpa DEC
with this:
(?<attributepair>[A-Za-z0-9_]+ (INT|(CHAR\([0-9]{1,3}\))|DEC))(, \k<attributepair>)*
I even tried it without naming the backreference:
([A-Za-z0-9_]+ (INT|(CHAR\([0-9]{1,3}\))|DEC))(, \1)*
I tested the latter regex with regexpal and it matched, but neither one matches when I try it in my Java program. Is there something I'm missing? How can I make this work? Perhaps it has something to do with how I'm calling Pattern.compile(), such as a missing flag. I'm using JDK 7.
Update: I've found that although matches() returns false, lookingAt() and find() return true. It's matching each individual attribute. I want to craft my regex so it matches the whole expression rather than each attribute.

There is no "match as many times as possible and join all the groups together" in Java.
You either have to do it yourself using:
while(matcher.find()) {
// ...
}
... or using a regex that already matches everything in a single call to find.
For example, you could try the following regex (as Java String) instead, which will match all your attributes at once.
(?:\\w+ (?:INT|CHAR(?:\\(\\d{1,3}\\))?|DEC)(?:, )?)+
Here is a working example:
final String str = "CREATE TABLE student (id INT, name CHAR(20), gpa DEC);";
final Pattern p = Pattern.compile("(?:\\w+ (?:INT|CHAR(?:\\(\\d{1,3}\\))?|DEC)(?:, )?)+");
final Matcher m = p.matcher(str);
if(m.find()) {
System.out.println(m.group()); // prints "id INT, name CHAR(20), gpa DEC"
}
Output:
id INT, name CHAR(20), gpa DEC

When you write something like ([A-Za-z0-9_]+ (INT|(CHAR\([0-9]{1,3}\))|DEC))(, \1)*, the backreference \1 matches the exact text that the first group captured, not the group's pattern applied again.
I.e., id INT, id INT, name CHAR(20), gpa DEC would work with the backreference in the sense that id INT, id INT would become part of the same match. (If you stick that in regexpal you'll see the difference quite clearly based on the highlights.)
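A minimal sketch of that difference, using the input from the question. The class name BackrefDemo, the PAIR constant, and the two helper methods are made up for illustration. To repeat the pattern rather than the captured text, the sub-pattern has to be written out (here: concatenated) a second time.

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // Hypothetical helper constant: one "name type" pair such as "id INT" or "name CHAR(20)".
    static final String PAIR = "[A-Za-z0-9_]+ (?:INT|CHAR\\([0-9]{1,3}\\)|DEC)";

    // \1 repeats the text captured by group 1, not the pattern of group 1.
    static final Pattern WITH_BACKREF = Pattern.compile("(" + PAIR + ")(, \\1)*");

    // Repeating the sub-pattern itself lets every pair differ.
    static final Pattern REPEATED = Pattern.compile(PAIR + "(?:, " + PAIR + ")*");

    static boolean matchesWithBackref(String s) {
        return WITH_BACKREF.matcher(s).matches();
    }

    static boolean matchesRepeated(String s) {
        return REPEATED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String input = "id INT, name CHAR(20), gpa DEC";
        System.out.println(matchesWithBackref(input));            // false
        System.out.println(matchesRepeated(input));               // true
        System.out.println(matchesWithBackref("id INT, id INT")); // true
    }
}
```

This also explains the update in the question: find() and lookingAt() succeed on the first pair alone, while matches() needs the whole input to fit.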

Related

java.util.regex.PatternSyntaxException: Illegal repetition near index 12

I am new to regex. I need to validate an email address using Java. I have created a regex for email validation by hardcoding the domain name, but the domain name should be dynamic. I pass the domain name as a parameter, but I don't know how to use a parameter in a regex.
When I tried this code, I got the error "java.util.regex.PatternSyntaxException: Illegal repetition near index 12". I have followed some answers, but they didn't help. From those answers I learned about repetition quantifiers. Can you tell me what I am missing here and how to solve this issue?
public static boolean validateEmail(String email, String domainName) {
pattern = Pattern.compile("^([\\w-\\.]+)@{"+ domainName +"}" , Pattern.CASE_INSENSITIVE);
matcher = pattern.matcher(email);
return matcher.matches();
}
{ and } have meaning in regex, namely for specifying how often the character before it can repeat. E.g. a{5} matches aaaaa.
If you want to use curly braces in regex, you should escape them like \\{ and \\}.
But that's not what you need for passing this as a parameter: the braces would just be literal text at that point. If you only want to match that literal domain, you could do Pattern.compile("^([\\w-\\.]+)@" + domainName, Pattern.CASE_INSENSITIVE).
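One caveat with plain concatenation is that any regex metacharacter in the domain (the dots, for instance) keeps its special meaning. A hedged sketch of the same method, wrapping the parameter in Pattern.quote() so it is matched literally. The class name EmailDomainCheck is made up, and the local-part pattern is the simplified one from the question, not a full email validator.

```java
import java.util.regex.Pattern;

class EmailDomainCheck {
    static boolean validateEmail(String email, String domainName) {
        // Pattern.quote() turns the domain into a literal, so "." matches only a dot.
        Pattern pattern = Pattern.compile(
                "^[\\w.-]+@" + Pattern.quote(domainName) + "$",
                Pattern.CASE_INSENSITIVE);
        return pattern.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(validateEmail("alice@example.com", "example.com")); // true
        System.out.println(validateEmail("alice@exampleXcom", "example.com")); // false
        System.out.println(validateEmail("alice@other.org", "example.com"));   // false
    }
}
```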

How to get the particular word from response in jmeter Regular Expression extractor

I am trying to extract the last value from the string in Jmeter Regular Expression extractor.
My string
Server.init("asdfasd4ffffasdf", "http://x.x.x.x:8888/", "asdf-U-Yasdf77asdf99");
I want to get only asdf-U-Yasdf77asdf99.
I tried something like the below, but not correct:
Server.init\(".+", ".+", "([A-Za-z0-9\-]+)"\);
Using JMeter you need to reference your match group.
Reference Name: MYWORD
Regular Expression: Server\.init\("[^"]+", "[^"]+", "([^"]+)"\);
Template: $1$
Your captured match can be accessed by using ${MYWORD}
If you specify using a Match No: above, use the corresponding value to access the match.
That regex, while not very beautiful, should work when used correctly. But you need to look at the result of group 1, not the entire match.
So you need to do something like
Pattern regex = Pattern.compile("Server\\.init\\(\"[^\"]+\", \"[^\"]+\", \"([A-Za-z0-9\\-]+)\"\\);");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
ResultString = regexMatcher.group(1);
}
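For reference, a self-contained version of the same idea that can be run outside JMeter. The class and method names are hypothetical, and the sample response is the string from the question. It uses the [^"]+ form for all three arguments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ServerInitExtract {
    // Group 1 captures only the third quoted argument of Server.init(...).
    private static final Pattern INIT_CALL =
            Pattern.compile("Server\\.init\\(\"[^\"]+\", \"[^\"]+\", \"([^\"]+)\"\\);");

    // Returns the third argument, or null when the call is not found.
    static String thirdArgument(String response) {
        Matcher m = INIT_CALL.matcher(response);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String response =
                "Server.init(\"asdfasd4ffffasdf\", \"http://x.x.x.x:8888/\", \"asdf-U-Yasdf77asdf99\");";
        System.out.println(thirdArgument(response)); // asdf-U-Yasdf77asdf99
    }
}
```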
Regular Expression:
Server.init\("(.+?)",\s"(.+?)",\s"(.+?)"
Matches the following string
Server.init("asdfasd4ffffasdf", "http://x.x.x.x:8888/", "asdf-U-Yasdf77asdf99"
we can extract the following values in jmeter:
$1 values = asdfasd4ffffasdf
$2 values = http://x.x.x.x:8888/
$3 values = asdf-U-Yasdf77asdf99

Regex for parsing string with same type of expression

I am new to regex parsing in Java. I want to parse a string that contains records, but I only want to select a specific part of each record.
\"6\":\"Services Ops\",\"practice_name\":\"Services Ops\",\"7\":\"Management\",
For this, I have written regex expression as
(^\\\"6\\\":\\\"[A-Za-z \s]*)
and above expression gives me result as : \"6\":\"Services Ops\
I want only Services Ops.
Also, there are multiple records like \"5\":\"xxx\" and so on, so if I write the expression to match only Services Ops, entries from other fields are also included in the result.
Is there any way to select a string that starts with some pattern while excluding that pattern from the result?
For example, match a string starting with \"6\":\" but exclude that part and get only Services Ops as the result.
Thank you.
You can use lookarounds which perform only a check but don't match:
lookahead (?=...)
lookbehind (?<=...)
example:
(?<=\\\"6\\\":\\\")[^\"]++(?=\")
Another way is to use a capturing group (...):
\\\"6\\\":\\\"([^\"]++)\"
Then you can extract only the content of the group. Example:
Pattern p = Pattern.compile("\\\"6\\\":\\\"([^\"]++)\"");
Matcher m = p.matcher(yourString);
if (m.find()) { // find(), not matches(): the pattern covers only part of the input
System.out.println(m.group(1));
}
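The lookaround variant can be sketched the same way. This assumes the backslashes in the question are only escaping and that the actual input contains plain double quotes. The class name LookaroundDemo and the method field6 are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // The lookbehind and lookahead check the surrounding text without consuming it,
    // so the whole match is just the value.
    private static final Pattern FIELD_6 = Pattern.compile("(?<=\"6\":\")[^\"]+(?=\")");

    // Returns the value of field "6", or null when it is absent.
    static String field6(String record) {
        Matcher m = FIELD_6.matcher(record);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String record =
                "\"6\":\"Services Ops\",\"practice_name\":\"Services Ops\",\"7\":\"Management\",";
        System.out.println(field6(record)); // Services Ops
    }
}
```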

Java RegEX and group

I need to parse log files and get some values to variable.
The log file will have a string
String logStr = "21:19:03 -[ 8b4]- ERROR - Jhy AlarmOccure::OnAdd - Updated existing alarm: ID [StrValue1:StrValu2|StrValue3], Instance [4053], SetStatus [0], AckStatus [1], SetTime [DateValue4], ClearedTime [DateValue5]";
I need to get StrValue1, StrValue2, StrValue3, DateValue4 and DateValue5 into variables; these fields change whenever there is an error.
First I was trying to at least get StrValue1, but I'm not getting the expected result.
Pattern twsPattern = Pattern.compile(".*?ID ?[([^]:]*):([^]|]*)|([^]]*)]");//.*ID\\s$.([^]:]*.):.([^]|]*.)|.([^]]*.).]
Matcher twsMatcher = twsPattern.matcher(logStr);
if(twsMatcher.find()){
System.out.println(twsMatcher.start());
System.out.println(twsMatcher.group());
System.out.println(twsMatcher.end());
}
I am not able to understand the grouping stuff, in regex.
Try the regex ([a-zA-Z]+) \[([^\]]+)\]
For string 21:19:03 -[ 8b4]- ERROR - Jhy AlarmOccure::OnAdd - Updated existing alarm: ID [StrValue1:StrValu2|StrValue3], Instance [4053], SetStatus [0], AckStatus [1], SetTime [DateValue4], ClearedTime [DateValue5] it returns:
ID and StrValue1:StrValu2|StrValue3
Instance and 4053
SetStatus and 0
AckStatus and 1
SetTime and DateValue4
ClearedTime and DateValue5
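A rough sketch of how that regex could be applied in Java, collecting each name/value pair with repeated find() calls. The class name LogFields and the parse method are illustrative only. The ID value still has to be split on : and | afterwards.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogFields {
    // Group 1: field name, group 2: text between the square brackets.
    private static final Pattern FIELD = Pattern.compile("([a-zA-Z]+) \\[([^\\]]+)\\]");

    // Collects every "Name [value]" pair of the line, in order of appearance.
    static Map<String, String> parse(String logLine) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        Matcher m = FIELD.matcher(logLine);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String logStr = "21:19:03 -[ 8b4]- ERROR - Jhy AlarmOccure::OnAdd - Updated existing alarm: "
                + "ID [StrValue1:StrValu2|StrValue3], Instance [4053], SetStatus [0], AckStatus [1], "
                + "SetTime [DateValue4], ClearedTime [DateValue5]";
        Map<String, String> fields = parse(logStr);
        String[] idParts = fields.get("ID").split("[:|]"); // StrValue1, StrValu2, StrValue3
        System.out.println(idParts[0]);                    // StrValue1
        System.out.println(fields.get("SetTime"));         // DateValue4
        System.out.println(fields.get("ClearedTime"));     // DateValue5
    }
}
```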
Good on you for the attempt! You're actually doing quite well. You need to escape square brackets that you don't mean as character classes, i.e.
.*?ID ?\[
       ^^
And hopefully you are aware that ([^]:]*) means "the longest possible string of characters without a closing square bracket or colon."
You probably also want to escape the |, as that is an alternation operator in regular expressions, i.e.
\|
Long story short, your regex fails to escape some characters, like [ and | (the latter only when it is outside a character class, []).
So when you want to actually match the [ char, you have to use \[ (or \\[ inside the java string). Also, the negation in the group ([^]:]*) is not what it seems. You probably want just ([^:]*), which matches everything until a :.
To make it work, then, you would simply use Matcher#group(int) to retrieve the values. This is the adapted code with the final regex:
String logStr = "21:19:03 -[ 8b4]- ERROR - Jhy AlarmOccure::OnAdd - Updated existing alarm: ID [StrValue1:StrValu2|StrValue3], Instance [4053], SetStatus [0], AckStatus [1], SetTime [DateValue4], ClearedTime [DateValue5]";
Pattern twsPattern = Pattern.compile(".*?ID ?\\[([^:]*):([^|]*)\\|([^\\]]*)\\].*?SetTime ?\\[([^\\]]*)\\][^\\[]+\\[([^\\]]*)\\]");
Matcher twsMatcher = twsPattern.matcher(logStr);
if (twsMatcher.find()){
System.out.println(twsMatcher.group(1)); // StrValue1
System.out.println(twsMatcher.group(2)); // StrValu2
System.out.println(twsMatcher.group(3)); // StrValue3
System.out.println(twsMatcher.group(4)); // DateValue4
System.out.println(twsMatcher.group(5)); // DateValue5
}
I like more general solutions, but here is a very specific pattern you can use if it suits you. It will capture all of the values in a string as long as they follow the same, very specific pattern.
ID (?:\[([^\]:]+):([^\]|]+)\|([^\]]+)\]).*?SetTime \[([^\]]+)\], ClearedTime \[([^\]]+)\]
Here is the result:
1: ID [StrValue1:StrValu2|StrValue3], Instance [4053], SetStatus [0], AckStatus [1], SetTime [DateValue4], ClearedTime [DateValue5]
[1]: StrValue1
[2]: StrValu2
[3]: StrValue3
[4]: DateValue4
[5]: DateValue5
Multiple Matches per line
This version will just match each instance in a string of ID, SetTime, or ClearedTime followed by a bracketed value.
(ID|SetTime|ClearedTime) \[([^\]]+)\]
Results
1: ID [StrValue1:StrValu2|StrValue3]
[1]: ID
[2]: StrValue1:StrValu2|StrValue3
1: SetTime [DateValue4]
[1]: SetTime
[2]: DateValue4
1: ClearedTime [DateValue5]
[1]: ClearedTime
[2]: DateValue5

REGEXP in MySQL Return unwanted value

I have a problem using REGEXP in MySQL.
I have OID values in the database like this:
id -> value
1.3.6.1.4.1 -> Value a
1.3.6.1.4.1.2499.1.1.2.1.1.1.1.1 -> Value b
1.3.6.1.4.1.2499 -> Value c
My objectives are:
1. To get a single oid and value for the specific oid that I put into the SQL statement
2. If there is no exact match, it should strip the oid back number by number until it finds the nearest value
For example
If i use
[select id from tablename where '1.3.6.1.4.1.2499.1.1.2.1.1.1.1.1' REGEXP oid]
it should return only 1.3.6.1.4.1.2499.1.1.2.1.1.1.1.1 but the above sql will return all result
If i use
[select id from tablename where '1.3.6.1.4.1.24999999.5' REGEXP oid]
it should return 1.3.6.1.4.1 only but it return 1.3.6.1.4.1 and 1.3.6.1.4.1.2499
If i use
select id from tablename where '1.3.6.1.4.1.2499.1.1.2.1.1.1.1.100' REGEXP oid
it should return 1.3.6.1.4.1.2499 only but it return all ids
I am not really familiar with REGEXP. Can anyone help me solve this problem?
Thank you
With MySQL, you should use field REGEXP value, like this:
select id from tablename where oid REGEXP '1.3.6.1.4.1.2499.1.1.2.1.1.1.1.1'
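Each . must be escaped, because an unescaped . matches any character. Inside a MySQL string literal (with the default SQL mode) the backslash itself has to be doubled, so write \\.
And to match the entire value, use ^ and $:
select id from tablename where oid REGEXP '^1\\.3\\.6\\.1\\.4\\.1\\.2499\\.1\\.1\\.2\\.1\\.1\\.1\\.1\\.1$'
I don't understand why you use REGEXP when you can select with LIKE, since you aren't actually searching by a regular expression.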
I don't think regex is the right tool for this job.
Instead, I'd loop over the input string, treating it as a period-delimited list.
Match the list against oid. If zero matches, remove the last list element. Repeat.
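The loop described above can be sketched on the application side like this. It is only an illustration: the class name OidPrefixLookup is made up, and the in-memory Set stands in for the database table; in practice each candidate would be checked with an exact-match query instead.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class OidPrefixLookup {
    // Returns the longest known OID that equals the input or is a prefix of it
    // on a "." boundary, or null when nothing matches.
    static String longestKnownPrefix(String oid, Set<String> knownOids) {
        String candidate = oid;
        while (true) {
            if (knownOids.contains(candidate)) {
                return candidate;
            }
            int lastDot = candidate.lastIndexOf('.');
            if (lastDot < 0) {
                return null; // no elements left to remove
            }
            candidate = candidate.substring(0, lastDot); // drop the last list element
        }
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<String>(Arrays.asList(
                "1.3.6.1.4.1",
                "1.3.6.1.4.1.2499.1.1.2.1.1.1.1.1",
                "1.3.6.1.4.1.2499"));
        System.out.println(longestKnownPrefix("1.3.6.1.4.1.2499.1.1.2.1.1.1.1.1", known));
        // 1.3.6.1.4.1.2499.1.1.2.1.1.1.1.1
        System.out.println(longestKnownPrefix("1.3.6.1.4.1.24999999.5", known));
        // 1.3.6.1.4.1
        System.out.println(longestKnownPrefix("1.3.6.1.4.1.2499.1.1.2.1.1.1.1.100", known));
        // 1.3.6.1.4.1.2499
    }
}
```

Because whole elements are removed, 1.3.6.1.4.1.24999999 is never confused with 1.3.6.1.4.1.2499, which is the false positive the REGEXP approach produced.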
