A strange regex result with lookbehind - Java

I wrote a piece of code to fetch the content of a string between ":" (which may be absent) and "#"; the order of the two is guaranteed. For example, from "url:123#my.com" I want "123", and from "123#my.com" I also want "123". I wrote a regular expression to do this, but it does not work. Here is the first version:
Pattern pattern = Pattern.compile("(?<=:?).*?(?=#)");
Matcher matcher = pattern.matcher("sip:+8610086#dmcw.com");
if (matcher.find()) {
Log.d("regex", matcher.group());
} else {
Log.d("regex", "not match");
}
It does not work: in the first case, "url:123#my.com", the result is "url:123", which is obviously not what I want.
So I wrote a second version:
Pattern pattern = Pattern.compile("(?<=:??).*?(?=#)");
But it throws an error; somebody said Java does not support variable-length lookbehind.
So I tried a third version:
Pattern pattern = Pattern.compile("(?<=:).*?(?=#)|.*?(?=#)");
Its result is the same as the first version's. BUT SHOULDN'T THE FIRST ALTERNATIVE BE CONSIDERED FIRST?
It behaves the same as
Pattern pattern = Pattern.compile(".*?(?=#)|(?<=:).*?(?=#)");
The alternatives do not seem to be tried left to right! I thought I understood regular expressions, but now I am confused again. Thanks in advance anyway.

Try this (slightly edited, see comments):
String test = "sip:+8610086#dmcw.com";
String test2 = "8610086#dmcw.com";
Pattern pattern = Pattern.compile("(.+?:)?(.+?)(?=#)");
Matcher matcher = pattern.matcher(test);
if (matcher.find()) {
System.out.println(matcher.group(2));
}
matcher = pattern.matcher(test2);
if (matcher.find()) {
System.out.println(matcher.group(2));
}
Output:
+8610086
8610086
Let me know if you need explanations on the pattern.

You really don't need any lookaheads or lookbehinds here. What you need can be accomplished with a greedy quantifier and some alternation:
.*(?:^|:)([^#]+)
By default, Java regular expression quantifiers (*, +, ?, {n,m}) are all greedy: they match as many characters as possible while still allowing the overall match to succeed. They can be made lazy by adding a question mark after the quantifier, like so: .*?
You will want capture group 1 from this expression; capture group 0 returns the entire match.
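To see the greedy pattern above in action, here is a minimal sketch. The class and method names (GreedyExtract, extract) are made up for illustration, and it assumes the part after # contains no further colon.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyExtract {
    // Greedy .* first runs to the end of the input, then backtracks to the
    // LAST ':' (or to the start of the input if there is no colon).
    // Group 1 then takes everything up to, but not including, '#'.
    private static final Pattern P = Pattern.compile(".*(?:^|:)([^#]+)");

    // Hypothetical helper: returns group 1, or null when nothing matches.
    static String extract(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("url:123#my.com"));        // with a prefix
        System.out.println(extract("123#my.com"));            // without a prefix
        System.out.println(extract("sip:+8610086#dmcw.com")); // the asker's input
    }
}
```

Group 0 (m.group()) would include the prefix and the colon, which is why the helper reads group 1.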

As you said, you can't use a variable-length lookbehind in Java.
However, you don't need a lookbehind or any other lookaround here; you can do something like this:
Regex: :?([^#:]*)#
In this example (ignore the \n, it is only there because of regex101) the first group contains what you need, and you don't have to do anything special. Sometimes the easiest solution is the best.
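As for why the order of the alternatives makes no difference: a backtracking engine tries every alternative at the current start position before it moves one character to the right. At index 0 the lookbehind branch fails, but the plain .*?(?=#) branch succeeds, so a match starting at index 0 is reported before the position after the colon is ever tried. A minimal sketch of this behaviour, with hypothetical class and helper names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeftmostMatch {
    // Hypothetical helper: first overall match (group 0), or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Hypothetical helper: capture group 1 of the first match, or null.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "url:123#my.com";
        // Both orderings match at index 0, because the second/first
        // alternative already succeeds there.
        System.out.println(firstMatch("(?<=:).*?(?=#)|.*?(?=#)", s));
        System.out.println(firstMatch(".*?(?=#)|(?<=:).*?(?=#)", s));
        // The lookbehind alone cannot match at index 0, so the search
        // moves right until it stands just after the colon.
        System.out.println(firstMatch("(?<=:).*?(?=#)", s));
        // The capture-group pattern handles both input shapes.
        System.out.println(firstGroup(":?([^#:]*)#", s));
        System.out.println(firstGroup(":?([^#:]*)#", "123#my.com"));
    }
}
```

The lookbehind-only pattern finds nothing in "123#my.com", which is why the capture-group form is the more convenient one when the colon is optional.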

Related

Regex For All String Except Certain Characters

I am trying to write a regular expression that matches a certain word that is not preceded by two dashes (--) or by a slash and a star (/*). I tried several expressions, but none seems to work.
Below is the text I am testing on:
a_func(some_param);
/* a comment initialization */
init;
I am trying to write a regex that matches the word init only in the last line. What I've tried so far matches init both in initialization and in the last line. I looked for existing answers and found some that used negative lookahead, but they still matched init in initialization. Below are the expressions I tried:
(?!\/\*).*(init)
[^(\-\-|\/\*)].*(init)
(?<!\/\*).*(init)
While reading regex101's quick reference, I found negative lookbehind, which had an example similar to what I need, but I was still not able to get what I want. Should I look into negative lookbehind more, or is this not how to achieve what I want?
My knowledge of regular expressions is not that extensive, so I don't know whether what I want is possible. Is it doable?
Assuming the -- or /* are on the same line as the init, there are some options. As the commenters said, multiline comments will likely require stronger techniques.
The simplest way I know is to actually preprocess the strings to remove the --.*$ and /\*.*$, then look for init (or init\b if you don't want to match initialization):
String input = "if init then foo";
String checkme = input.replaceAll("--.*$", "").replaceAll("/\\*.*$", "");
Pattern pattern = Pattern.compile("init"); // or "init\\b"
Matcher matcher = pattern.matcher(checkme);
System.out.println(matcher.find());
You can also use negative lookbehind as in @olsli's answer.
You can start with:
String input = "/*init";
Pattern pattern = Pattern.compile("^((?!--|\\/\\*).)*init");
Matcher matcher = pattern.matcher(input);
System.out.println(matcher.find());
I have added more parentheses to separate things out. This should also work, tested it in Regexr and IDEONE
Pattern p = Pattern.compile("^(?!((\\/\\*)|(--)))(.*?)init.*$", Pattern.MULTILINE|Pattern.CASE_INSENSITIVE);
String s = "/* Initialisation";
Matcher m = p.matcher(s);
// The negative lookahead only rejects lines that BEGIN with /* or --
boolean found = m.find(); // false here, because the line starts with /*
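The per-character negative lookahead idea can be combined with word boundaries so that initialization is not matched either. The following is a minimal sketch under the same assumption as above (the comment marker sits on the same line as the word); the class and method names are hypothetical, and multi-line comments are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UncommentedInit {
    // ^(?:(?!--|/\*).)*  consumes characters from the start of the line, but
    //                    never steps onto a position where "--" or "/*" begins
    // \binit\b           then requires init as a whole word
    // .*$                takes the rest of the line so group 0 is the full line
    private static final Pattern P =
            Pattern.compile("^(?:(?!--|/\\*).)*\\binit\\b.*$", Pattern.MULTILINE);

    // Hypothetical helper: returns every line that contains an uncommented init.
    static List<String> matchingLines(String text) {
        List<String> lines = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            lines.add(m.group());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "a_func(some_param);\n"
                + "/* a comment initialization */\n"
                + "init;";
        System.out.println(matchingLines(text));
    }
}
```

With Pattern.MULTILINE, ^ and $ anchor at line breaks, so each line is judged on its own.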

Grouping multiple digits prior to a known value

I'm executing this regex code expecting a group value of 11, but I am getting 1. The group seems to contain the correct regex for matching one or more digits before a known value. I'm sure it is simple, but I cannot figure it out.
String mydata = "P0Y0M0W0DT11H0M0S";
Pattern pattern = Pattern.compile("P.*(\\d+)H.*");
Matcher matcher = pattern.matcher(mydata);
if (matcher.find()){
System.out.println(matcher.group(1));
}
Try this
public static void main(String a1[]) {
String mydata = "P0Y0M0W0DT11H0M0S";
Pattern pattern = Pattern.compile("P.*?(\\d+)H.*");
Matcher matcher = pattern.matcher(mydata);
if (matcher.find()){
System.out.println(matcher.group(1));
}
}
Output
11
The problem is that .* will try to consume/match as much as possible before the next part is checked. Thus in your regex P.*(\d+)H.* the first .* will match 0Y0M0W0DT1 since that's as much as can be matched with the group still being able to match a single digit afterwards.
If you make that quantifier lazy/reluctant (i.e. .*?), it will try to match as little as possible so of the possible matches 0Y0M0W0DT1 and 0Y0M0W0DT it will select the shorter one and leave all the digits for the group to match.
Thus the regex P.*?(\d+)H.* should do what you want.
Additional note: since you're using Matcher#find(), you don't need the catch-all .* at the end. Also note that the regex matches any string that contains the character H preceded by at least one digit, with a P somewhere before those digits. If you want to be more restrictive, the regex would need to be tightened.
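The difference between the greedy and the reluctant quantifier can be checked side by side. This is a minimal sketch with hypothetical names; the third pattern, T(\d+)H, is only an assumption about the input, namely that the hours always directly follow the T.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: capture group 1 of the first match, or null.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String data = "P0Y0M0W0DT11H0M0S";
        // Greedy: .* keeps "0Y0M0W0DT1" and leaves a single digit for the group.
        System.out.println(group1("P.*(\\d+)H.*", data));
        // Reluctant: .*? stops as early as possible, leaving both digits.
        System.out.println(group1("P.*?(\\d+)H.*", data));
        // Tighter variant (assumption: hours come right after 'T').
        System.out.println(group1("T(\\d+)H", data));
    }
}
```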

Java conditional regex

I have such text:
120.65UAH Produkti Kvartal
5*14 14:24
Bal. 16603.52UAH
What I want to do:
If this text contains "5*14", I need to extract 16603.52 using a single Java regular expression.
I tried to create conditional regexp like this:
(5*14 ([\d\.*]+)UAH)
(5*14 d{2}:d{2} Bal. ([\d\.*]+))
etc
But no luck, can you please share your th
You can use a regex like this:
(?=5\*14)[\s\S]*?(\d{5}\.\d{2})
Update: you don't even need the lookahead; you can just use:
5\*14[\s\S]*?(\d{5}\.\d{2})
(\d*\.\d\d)(?>\w*)$
captures the last number of the form DDDDD.DD on the line. You will need to take the contents of the first capturing group.
If you have 5*14 before the float number you need to get, you can just use
(?s)\\b5\\*14\\b.*?\\b(\\d+\\.\\d+)
The value will be in group 1. I have also used Java string escaping here.
Note that 5\*14 can also match inside 145*143, which is why I am using word boundaries (\b). With (?s), .*? matches any characters, including newlines, but as few as possible. \d+\.\d+ matches a simple decimal number, regardless of how many digits it has.
IDEONE demo:
String str = "120.65UAH Produkti Kvartal\n5*14 14:24\nBal. 16603.52UAH";
Pattern ptrn = Pattern.compile("(?s)\\b5\\*14\\b.*?\\b(\\d+\\.\\d+)");
Matcher matcher = ptrn.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Result: 16603.52

Why does this regex capture the excluded character?

I have a regex like this:
(?:(\\s| |\\A|^))(?:#)[A-Za-z0-9]{2,}
What I am trying to do is find a pattern that starts with a # and is followed by two or more characters; however, it must not start in the middle of a word.
I'm new to regex, but I was under the impression that ?: matches and then excludes the character. However, my regex seems to match and still include the characters. Ideally I'd like "#test" to return "test" and "test#test" not to match at all.
Can anyone tell me what I've done wrong?
Thanks.
Your understanding is incorrect. The difference between (...) and (?:...) is only that the former also creates a numbered match group which can be referred to with a backreference from within the regex, or as a captured match group from code following the match.
You could change the code to use lookbehinds, but the simple and straightforward fix is to put ([A-Za-z0-9]{2,}) inside regular parentheses, like I have done here, and retrieve the first matched group. (The # doesn't need any parentheses around it in this scenario, but the ones you have are harmless.)
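A minimal sketch of that fix, showing what group 0 and group 1 contain. The class and helper names are hypothetical, and the leading alternation is simplified to (?:\s|^) for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupDemo {
    // (?:\s|^) is non-capturing: it still CONSUMES the whitespace, so that
    // character is part of group 0. Only the parenthesised tail is group 1.
    private static final Pattern P = Pattern.compile("(?:\\s|^)#([A-Za-z0-9]{2,})");

    // Hypothetical helper: returns {group 0, group 1}, or null if no match.
    static String[] groups(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? new String[] { m.group(), m.group(1) } : null;
    }

    public static void main(String[] args) {
        String[] g = groups("say #test");
        System.out.println("[" + g[0] + "] [" + g[1] + "]");
        System.out.println(groups("test#test") == null ? "no match" : "match");
    }
}
```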
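Try this: you can use a boundary assertion to express your condition. Note that \b does not match between the start of the string and #, because # is not a word character, so \B (a non-word-boundary) is needed in front of it.
public static void main(String[] args) {
String s1 = "#test";
String s2 = "test#test";
String pattern = "\\B#\\w{2,}\\b";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(s1);
if (m.find()) {
System.out.println(m.group());
}
}
Output:
#test
In the second case (s2), find() returns false, and calling m.group() without a successful match throws `IllegalStateException`.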
How about:
\W#[\S]{2}[\S]*
The strings matched by this regular expression need to be trimmed and have their first character removed.
I guess you would be better off with the following one:
(?<=(?<!\w)#)\w{2,}
Don't forget to escape the backslashes in Java, since the regex is written as a string literal:
(?<=(?<!\\w)#)\\w{2,}
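A minimal sketch of how the lookbehind version behaves; the class and helper names are hypothetical. Because lookbehinds consume nothing, the match itself (group 0) is already the bare word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // (?<=(?<!\w)#) : the match must be directly preceded by '#', and that
    //                 '#' must not itself be preceded by a word character
    // \w{2,}        : two or more word characters form the actual match
    private static final Pattern P = Pattern.compile("(?<=(?<!\\w)#)\\w{2,}");

    // Hypothetical helper: collects every match in the input.
    static List<String> tags(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tags("#test"));
        System.out.println(tags("test#test"));
        System.out.println(tags("a #bc #d"));
    }
}
```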

Java Regex Question - Ignore Quotations

I am trying to write a program using regex. The format for an identifier, as I might have explained in another question of mine, is that it can only begin with a letter (and the rest of it can contain whatever). I have this part worked out for the most part.
However, anything within quotes cannot count as an identifier either.
Currently I am using Pattern pattern = Pattern.compile("[A-Za-z][_A-Za-z0-9]*"); as my pattern, which requires the first character to be a letter. How can I edit this to check whether the word is surrounded by quotation marks (and EXCLUDE those words)?
Use negative lookaround assertions:
"(?<!\")\\b[A-Za-z][_A-Za-z0-9]*\\b(?!\")"
Example:
Pattern pattern = Pattern.compile("(?<!\")\\b[A-Za-z][_A-Za-z0-9]*\\b(?!\")");
Matcher matcher = pattern.matcher("Foo \"bar\" baz");
while (matcher.find())
{
System.out.println(matcher.group());
}
Output:
Foo
baz
See it working online: ideone.
Use lookarounds.
"(?<![\"A-Za-z])[A-Z...
The (?<![\"A-Za-z]) part means "only match if the previous character is neither a quotation mark nor a letter".
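One limitation of the lookaround approaches is that they only inspect the characters directly next to the word, so in a quoted phrase of three or more words, such as "one two three", the middle word would still be reported. A different technique, not taken from the answers above, is to let the regex consume whole quoted strings first and keep only what the identifier branch captures. A minimal sketch, assuming quotes are never escaped or nested, with hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SkipQuoted {
    // Alternative 1 swallows a complete "..." string (no capture group).
    // Alternative 2 captures an identifier in group 1.
    private static final Pattern P =
            Pattern.compile("\"[^\"]*\"|\\b([A-Za-z][_A-Za-z0-9]*)\\b");

    // Hypothetical helper: identifiers that appear outside quoted strings.
    static List<String> identifiers(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) { // null means a quoted string was matched
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(identifiers("Foo \"one two three\" baz"));
    }
}
```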
