Given the following text
KEYWORD This is a test
We want to match the following groups 1:YES 2:YES 3:YES
I want to match "1:YES", "2:YES" and "3:YES" using
((\d):YES)
If and only if the first word in the complete text is "KEYWORD"
Given this test:
This is a test
We want to match the following groups 1:YES 2:YES 3:YES
No matches should be found
Java (like most regex engines) doesn't support unbounded-length look-behinds, but there is a workaround:
String str = "KEYWORD This is a test\n" +
        "We want to match the following groups 1:YES 2:YES 3:YES";
Matcher matcher = Pattern.compile("(?s)(?<=\\AKEYWORD\\b.{1,99999})(\\d+:YES)").matcher(str);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
Which outputs:
1:YES
2:YES
3:YES
The trick here is the look-behind (?<=\\AKEYWORD\\b.{1,99999}), which has a large (but not unbounded) length. (?s) turns on the DOTALL flag (dot matches newlines too). \A means start of input; unlike ^, it keeps that meaning even when the MULTILINE flag is on (DOTALL itself does not change what ^ matches).
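To see the "if and only if" condition in action, here is a minimal sketch that wraps the bounded look-behind in a helper and runs it on both inputs from the question. The class name KeywordLookbehind and the method findYes are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordLookbehind {
    // Bounded look-behind: everything before a hit must start with KEYWORD.
    private static final Pattern YES_AFTER_KEYWORD =
            Pattern.compile("(?s)(?<=\\AKEYWORD\\b.{1,99999})(\\d+:YES)");

    // Hypothetical helper: collects every n:YES token; empty if KEYWORD is not the first word.
    static List<String> findYes(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = YES_AFTER_KEYWORD.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        String tail = "This is a test\nWe want to match the following groups 1:YES 2:YES 3:YES";
        System.out.println(findYes("KEYWORD " + tail)); // input that starts with KEYWORD
        System.out.println(findYes(tail));              // same text without KEYWORD
    }
}
```

Keeping the compiled Pattern in a constant avoids recompiling the fairly expensive look-behind on every call.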
Without resorting to look-behind tricks in Java, you can capture \d+:YES\b strings using \G. \G makes a match start where the previous match ended; on the first attempt it matches at the beginning of the string, the same as \A.
We only need its first capability, which is why (?!\A) follows it:
(?:\AKEYWORD|\G(?!\A))[\s\S]*?(\d+:YES\b)
Breakdown:
(?:           Start of non-capturing group
  \A          Match beginning of subject string
  KEYWORD     Match keyword
  |           Or
  \G(?!\A)    Continue from where the previous match ended
)             End of non-capturing group
[\s\S]*?      Match anything else un-greedily
(\d+:YES\b)   Match and capture our desired part
Java code:
Pattern p = Pattern.compile("(?:\\AKEYWORD|\\G(?!\\A))[\\s\\S]*?(\\d+:YES\\b)");
Matcher m = p.matcher(string);
while (m.find()) {
    System.out.println(m.group(1));
}
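As a rough check of the \G approach, the sketch below runs the same pattern on the text with and without the leading KEYWORD. The class name KeywordContinuation and the helper findYes are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordContinuation {
    // First match must begin at \AKEYWORD; later matches must continue via \G.
    private static final Pattern CHAINED =
            Pattern.compile("(?:\\AKEYWORD|\\G(?!\\A))[\\s\\S]*?(\\d+:YES\\b)");

    // Hypothetical helper: returns the captured n:YES tokens in order.
    static List<String> findYes(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = CHAINED.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        String tail = "This is a test\nWe want to match the following groups 1:YES 2:YES 3:YES";
        System.out.println(findYes("KEYWORD " + tail)); // chain starts at KEYWORD
        System.out.println(findYes(tail));              // chain can never start
    }
}
```

Because \G(?!\A) cannot match at position 0, the chain can only begin through the \AKEYWORD branch, which is what enforces the condition.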
I'm searching a String for patterns that start with ATG, end with TAG, TAA or TGA, and have a length that is a multiple of 3. ATG may appear only at the beginning, and TAG, TAA or TGA only at the end. That means:
From ATGTTGTGATGT extract ATGTTGTGA
From ATGATGTTGTGATGT extract ATGTTGTGA
Currently I'm using regex (ATG)([ATG]{3})+?(TAG|TAA|TGA).
For ATGATGTTGTGATGT this gets me the wrong result ATGATGTTGTGA.
I've tried:
(^ATG)(!?=.*ATG)([ATG]{3})+?(TAG|TAA|TGA)
(^ATG)(!?=(ATG)+)([ATG]{3})+?(TAG|TAA|TGA)
How to tell it to contain ATG only once in the beginning and no more after that?
You may use
ATG(?:(?!ATG)[ATG]{3})*?(?:TAG|TAA|TGA)
Details
ATG - an ATG substring
(?:(?!ATG)[ATG]{3})*? - a tempered greedy token, repeated lazily, that matches any sequence of 3 chars from the [ATG] character set that is not equal to ATG (the negative lookahead (?!ATG) enforces that restriction)
(?:TAG|TAA|TGA) - either of the three alternatives defined in the non-capturing group: TAG, TAA or TGA.
Java demo:
String rx = "ATG(?:(?!ATG)[ATG]{3})*?(?:TAG|TAA|TGA)";
String s = "ATGTTGTGATGT, ATGATGTTGTGATGT, ATGATGTTGTGATGT";
Pattern pattern = Pattern.compile(rx);
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}
Result:
ATGTTGTGA
ATGTTGTGA
ATGTTGTGA
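The two cases from the question can also be checked one string at a time. This is a small sketch; the class OrfFinder and the method firstOrf are names invented here, and returning null for "no match" is just one possible convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OrfFinder {
    // ATG, then lazily repeated codons that are not ATG, then a stop codon.
    private static final Pattern ORF =
            Pattern.compile("ATG(?:(?!ATG)[ATG]{3})*?(?:TAG|TAA|TGA)");

    // Hypothetical helper: first match in the input, or null if there is none.
    static String firstOrf(String dna) {
        Matcher m = ORF.matcher(dna);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstOrf("ATGTTGTGATGT"));
        System.out.println(firstOrf("ATGATGTTGTGATGT"));
    }
}
```

In the second input the attempt at index 0 fails because the next codon is ATG, so the engine moves on and the match starts at the second ATG.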
I have a Cisco configuration as text.
The hostname line I need to match is "125-hostname billdevice".
I am using the pattern below, but it does not match.
Pattern ciscohostname = Pattern.compile("^[0-9999999]-hostname");
Matcher matcherx = ciscohostname.matcher(BlockIndexList.get(k).toString());
How can I match this line?
What you want is
"^[0-9]+-hostname"
This means:
Match if the string starts with at least one character in the range [0-9] (i.e., digits) followed by the string "-hostname".
Since you specified a range in your code (i.e., 0-9999999), you can use this regex:
^[0-9]{1,7}-hostname
This ensures that only numbers of 1 to 7 digits are matched; anything longer is rejected.
0-hostname billdevice //match
9999999-hostname billdevice //match
10000000-hostname billdevice //no match
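The question does not show which Matcher method is called afterwards, so the sketch below assumes find(), which looks for the pattern inside the line; matches() would require the pattern to cover the whole line, including " billdevice". The class HostnameCheck and its helpers are illustrative names.

```java
import java.util.regex.Pattern;

class HostnameCheck {
    static final Pattern ORIGINAL = Pattern.compile("^[0-9999999]-hostname"); // class = one digit only
    static final Pattern ANY_DIGITS = Pattern.compile("^[0-9]+-hostname");
    static final Pattern UP_TO_7 = Pattern.compile("^[0-9]{1,7}-hostname");

    // find(): the pattern only has to match a prefix of the line (because of ^).
    static boolean startsWithHostname(Pattern p, String line) {
        return p.matcher(line).find();
    }

    // matches(): the pattern has to cover the entire line.
    static boolean wholeLineMatches(Pattern p, String line) {
        return p.matcher(line).matches();
    }

    public static void main(String[] args) {
        String line = "125-hostname billdevice";
        System.out.println(startsWithHostname(ORIGINAL, line));
        System.out.println(startsWithHostname(ANY_DIGITS, line));
        System.out.println(wholeLineMatches(ANY_DIGITS, line));
        System.out.println(startsWithHostname(UP_TO_7, "10000000-hostname billdevice"));
    }
}
```

If matches() is what the surrounding code needs, appending .* to the pattern is one way to let it cover the rest of the line.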
I am trying to use regex to match a file in the following format:
FILTER
<data>
ORDER
<data>
Now, the <data> part is the one that I need to extract, and that would be really simple, except I have the following complications:
1) This pattern can be repeated (no line breaks in between)
2) The <data> parts may be absent.
In particular, this file is OK:
FILTER
test1
ORDER
test2
FILTER
test3
ORDER
FILTER
ORDER
And should give me the following groups:
"test1", "test2", "test3", "", "", ""
The regex that I already tried is: (?:FILTER\n(.*)\nORDER\n(.*))*
Here is the test on regex101.
I am pretty new to regex, any help would be appreciated.
You may use a lazy-dot matching + tempered greedy token based regex:
(?s)FILTER(.*?)ORDER((?:(?!FILTER).)*)
           ^-^       ^--------------^
This regex relies on DOTALL mode, which the inline (?s) modifier turns on. The .*? matches any characters, but as few as possible, and therefore stops at the first ORDER. The (?:(?!FILTER).)* tempered greedy token matches any text that does not contain FILTER. It works like a negated character class for multicharacter sequences.
You can unroll it as follows:
FILTER([^O]*(?:O(?!RDER)[^O]*)*)ORDER([^F]*(?:F(?!ILTER)[^F]*)*)
This regex does not require DOTALL mode.
String s = "FILTER\ntest1\nORDER\ntest2\nFILTER\ntest3\nORDER\nFILTER\nORDER";
Pattern pattern = Pattern.compile("(?s)FILTER(.*?)ORDER((?:(?!FILTER).)*)");
Matcher matcher = pattern.matcher(s);
List<String> results = new ArrayList<>();
while (matcher.find()) {
    if (matcher.group(1) != null) {
        results.add(matcher.group(1).trim());
    }
    if (matcher.group(2) != null) {
        results.add(matcher.group(2).trim());
    }
}
System.out.println(results); // => [test1, test2, test3, , , ]
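The unrolled variant given above can be checked against the tempered one in the same way. The following is a sketch only; FilterOrderUnrolled and extract are names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilterOrderUnrolled {
    static final Pattern TEMPERED =
            Pattern.compile("(?s)FILTER(.*?)ORDER((?:(?!FILTER).)*)");
    static final Pattern UNROLLED = Pattern.compile(
            "FILTER([^O]*(?:O(?!RDER)[^O]*)*)ORDER([^F]*(?:F(?!ILTER)[^F]*)*)");

    // Hypothetical helper: trimmed contents of groups 1 and 2 for every match.
    static List<String> extract(Pattern p, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(s);
        while (m.find()) {
            out.add(m.group(1).trim());
            out.add(m.group(2).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "FILTER\ntest1\nORDER\ntest2\nFILTER\ntest3\nORDER\nFILTER\nORDER";
        System.out.println(extract(TEMPERED, s));
        System.out.println(extract(UNROLLED, s));
    }
}
```

Both groups are mandatory in these two patterns, so they are never null after a successful find(); an absent <data> part shows up as an empty string after trimming.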
If you need to make sure the FILTER and ORDER delimiter strings appear as individual lines, just use ^ and $ around them and add the MULTILINE modifier (so that ^ matches the beginning of a line and $ matches the end of a line):
(?sm)^FILTER$(.*?)^ORDER$((?:(?!^FILTER$).)*)
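To illustrate what the line anchors change, the sketch below feeds the anchored pattern an input whose data line happens to contain the word ORDER. The input string, the class FilterOrderLines and the method extract are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilterOrderLines {
    static final Pattern LOOSE =
            Pattern.compile("(?s)FILTER(.*?)ORDER((?:(?!FILTER).)*)");
    static final Pattern LINE_ANCHORED =
            Pattern.compile("(?sm)^FILTER$(.*?)^ORDER$((?:(?!^FILTER$).)*)");

    // Hypothetical helper: trimmed contents of groups 1 and 2 for every match.
    static List<String> extract(Pattern p, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(s);
        while (m.find()) {
            out.add(m.group(1).trim());
            out.add(m.group(2).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up input: "REORDER me" is data, not a delimiter line.
        String s = "FILTER\nREORDER me\nORDER\nx";
        System.out.println(extract(LOOSE, s));          // splits inside REORDER
        System.out.println(extract(LINE_ANCHORED, s));  // splits only at the ORDER line
    }
}
```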
I would use the following regex:
FILTER(?:\n(?!ORDER)(.*))?\nORDER(?:\n(?!FILTER)(.*))?
You can test it on regex101
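With this regex the capture groups sit inside optional groups, so in Java they come back as null when a <data> line is missing. A minimal sketch that maps null to an empty string follows; the names FilterOrderOptional and extract are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilterOrderOptional {
    static final Pattern P = Pattern.compile(
            "FILTER(?:\\n(?!ORDER)(.*))?\\nORDER(?:\\n(?!FILTER)(.*))?");

    // Hypothetical helper: groups 1 and 2 of every match, with null turned into "".
    static List<String> extract(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            out.add(m.group(1) == null ? "" : m.group(1));
            out.add(m.group(2) == null ? "" : m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "FILTER\ntest1\nORDER\ntest2\nFILTER\ntest3\nORDER\nFILTER\nORDER";
        System.out.println(extract(s));
    }
}
```

No trim() is needed here because (.*) without DOTALL never captures the line breaks.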
I need help with a regular expression in Java.
I need to replace a matched group in a string:
Example:
Input:
=sum($var1;2) or =if($result<10;"little";"big") ...
Need Output:
=sum(teste;2) or =if(teste<10;"little";"big") ...
Code I have:
Pattern p = Pattern.compile("(\\.*)(\\$\\w)(\\.*)");
Matcher m = p.matcher(total);
if (m.find()) {
    System.out.println(m.replaceAll("$2teste"));
}
Output I have:
=sum($vtestear1;2)
=if($r testeesultado<5;"maior";"menor")
Why match everything when all you need is to match variable tokens?
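Pattern p = Pattern.compile("\\$[a-z0-9]+\\b");
total = p.matcher(total).replaceAll("teste");
Note that there is no \b in front of \$: $ is not a word character, so a word boundary there would only match when a letter, digit or underscore comes right before the variable, which is not the case after "(".
Change the [a-z0-9] part if you can have more than lowercase ASCII letters and digits.
Also, you don't need to test for .find() before calling .replaceAll(): no match means nothing will be replaced. Strings are immutable, so keep the returned value.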
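A self-contained sketch of the variable replacement, run on the input from the question. The class VariableReplace and the method replaceVars are illustrative names, and Matcher.quoteReplacement is used here only as a precaution in case the replacement text ever contains $ or a backslash.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VariableReplace {
    // A literal $ followed by lowercase letters/digits, ending at a word boundary.
    // No \b before \$: "(" and "$" are both non-word characters, so there is no boundary there.
    private static final Pattern VAR = Pattern.compile("\\$[a-z0-9]+\\b");

    // Hypothetical helper: replaces every $variable token with the given name.
    static String replaceVars(String formula, String name) {
        return VAR.matcher(formula).replaceAll(Matcher.quoteReplacement(name));
    }

    public static void main(String[] args) {
        String total = "=sum($var1;2) or =if($result<10;\"little\";\"big\")";
        System.out.println(replaceVars(total, "teste"));
    }
}
```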
There's a properties language bundle file:
label.username=Username:
label.tooltip_html=Please enter your username.</center></html>
label.password=Password:
label.tooltip_html=Please enter your password.</center></html>
How can I match all lines that contain both "_html" and "</center></html>", in that order, and replace each with the same line minus the trailing "</center></html>"? For example, the line:
label.tooltip_html=Please enter your username.</center></html>
should become:
label.tooltip_html=Please enter your username.
Note: I would like to do this replacement using an IDE (IntelliJ IDEA, Eclipse, NetBeans...)
Since you clarified that this regex is to be used in the IDE, I tested this in Eclipse and it works:
FIND:
(_html.*)</center></html>
REPLACE WITH:
$1
Make sure you turn on the Regular expressions switch in the Find/Replace dialog. This matches _html followed by .* (which greedily matches any text not containing newlines), followed by </center></html>. The parentheses (…) capture what was matched into group 1, and $1 in the replacement substitutes in what group 1 captured.
This effectively removes </center></html> if that string is preceded by _html in that line.
If there can be multiple </center></html> in a line, and they are all to be removed if there's a _html to their left, then the regex will be more complicated, but it can be done in one regex with the \G continuing anchor if absolutely necessary.
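One possible shape for such a \G-based regex, written for Java rather than for an IDE dialog, is sketched below. The pattern, the class TrailingTagStripper and the method strip are assumptions of this example, not something taken from the answer, and the sample lines are invented.

```java
import java.util.regex.Pattern;

class TrailingTagStripper {
    // A match starts either at _html or right where the previous match ended (\G),
    // then runs up to the next </center></html> on the same line.
    private static final Pattern TAG_AFTER_HTML_KEY = Pattern.compile(
            "((?:_html|\\G(?!\\A))(?:(?!</center></html>).)*)</center></html>");

    // Hypothetical helper: removes every </center></html> that has _html to its left on the line.
    static String strip(String text) {
        return TAG_AFTER_HTML_KEY.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = "label.tooltip_html=A</center></html>B</center></html>\n"
                + "label.other=C</center></html>";
        System.out.println(strip(text));
    }
}
```

The dot does not match line breaks here, so the \G chain cannot carry over from one line to the next.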
Variations
Speaking more generally, you can also match things like this:
(delete)this part only(please)
This now creates 2 capturing groups. You can match strings with this pattern and replace with $1$2, and it will effectively delete this part only, but only if it's preceded by delete and followed by please. These subpatterns can be more complicated, of course.
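In Java the same idea can be tried with String.replaceAll, which uses the same $1$2 group references in the replacement. The class KeepSurroundings, the method dropMiddle and the sample strings are made up for this sketch.

```java
class KeepSurroundings {
    // Hypothetical helper: deletes "this part only" when it sits between "delete" and "please".
    static String dropMiddle(String s) {
        return s.replaceAll("(delete)this part only(please)", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(dropMiddle("deletethis part onlyplease")); // context present
        System.out.println(dropMiddle("keep this part only"));        // context missing
    }
}
```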
if (line.contains("_html=")) {
    line = line.replace("</center></html>", "");
}
No regex needed here ;) as long as all lines of the properties file are well formed.
String s = "label.tooltip_html=Please enter your password.</center></html>";
Pattern p = Pattern.compile("(_html.*)</center></html>");
Matcher m = p.matcher(s);
System.out.println(m.replaceAll("$1"));
Try something like this:
Pattern p = Pattern.compile(".*(_html).*</center></html>");
Matcher m = p.matcher(input_line); // get a matcher object
String output = input_line;
if (m.matches()) {
    // assign to the existing variable; declaring it again here would not compile
    output = input_line.replace("</center></html>", "");
}
/^(.*)<\/center><\/html>/
finds you the
label.tooltip_html=Please enter your username.
part. Then you can just put the string back together.