Regex to match specific file format and empty strings - java

I am trying to use regex to match a file in the following format:
FILTER
<data>
ORDER
<data>
The <data> part is what I need to extract. That would be simple, except for the following complications:
1) This pattern can be repeated (no line breaks in between).
2) The <data> parts may be missing.
In particular, this file is OK:
FILTER
test1
ORDER
test2
FILTER
test3
ORDER
FILTER
ORDER
And should give me the following groups:
"test1", "test2", "test3", "", "", ""
The regex that I already tried is: (?:FILTER\n(.*)\nORDER\n(.*))*
Here is the test on regex101.
I am pretty new to regex, any help would be appreciated.

You may use a lazy-dot matching + tempered greedy token based regex:
(?s)FILTER(.*?)ORDER((?:(?!FILTER).)*)
           ^-^       ^--------------^
Use a DOTALL modifier with this regex. Here is a regex demo. The .*? matches any character, but as few as possible, thus matching up to the first ORDER. The (?:(?!FILTER).)* tempered greedy token matches any text that does not contain FILTER. It works like a negated character class, but for multi-character sequences.
You can unroll it as follows:
FILTER([^O]*(?:O(?!RDER)[^O]*)*)ORDER([^F]*(?:F(?!ILTER)[^F]*)*)
See the regex demo (and this regex does not require a DOTALL mode).
String s = "FILTER\ntest1\nORDER\ntest2\nFILTER\ntest3\nORDER\nFILTER\nORDER";
Pattern pattern = Pattern.compile("(?s)FILTER(.*?)ORDER((?:(?!FILTER).)*)");
Matcher matcher = pattern.matcher(s);
List<String> results = new ArrayList<>();
while (matcher.find()){
if (matcher.group(1) != null) {
results.add(matcher.group(1).trim());
}
if (matcher.group(2) != null) {
results.add(matcher.group(2).trim());
}
}
System.out.println(results); // => [test1, test2, test3, , , ]
See the IDEONE demo
If you need to make sure the FILTER and ORDER delimiter strings appear as individual lines, just use ^ and $ around them and add MULTILINE modifier (so that ^ could match the beginning of a line and $ could match the end of the line):
(?sm)^FILTER$(.*?)^ORDER$((?:(?!^FILTER$).)*)
See another regex.
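To illustrate why the line anchors matter, here is a minimal sketch comparing the two patterns above. The class name `AnchoredDelimiters` and the sample input (a data line that happens to contain the word ORDER) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoredDelimiters {
    // Pattern from the answer: delimiters may appear anywhere in the text.
    static final Pattern LOOSE =
        Pattern.compile("(?s)FILTER(.*?)ORDER((?:(?!FILTER).)*)");

    // Anchored variant: FILTER and ORDER must each occupy a whole line.
    static final Pattern ANCHORED =
        Pattern.compile("(?sm)^FILTER$(.*?)^ORDER$((?:(?!^FILTER$).)*)");

    // Collects both groups of every match, trimmed of surrounding whitespace.
    static List<String> extract(Pattern pattern, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            results.add(m.group(1).trim());
            results.add(m.group(2).trim());
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical input: the first data line contains "ORDER" as a substring.
        String input = "FILTER\nPREORDER items\nORDER\nby name";

        // The loose pattern stops at the ORDER inside PREORDER, so group 1 is "PRE".
        System.out.println(extract(LOOSE, input));

        // The anchored pattern only accepts ORDER on its own line.
        System.out.println(extract(ANCHORED, input)); // [PREORDER items, by name]
    }
}
```

If the data lines can never contain the delimiter words, both patterns give the same result and the simpler one is enough.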

I would use the following regex :
FILTER(?:\n(?!ORDER)(.*))?\nORDER(?:\n(?!FILTER)(.*))?
You can test it on regex101
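A rough sketch of how this regex could be used from Java. Because both data parts sit inside optional groups, `group(1)` and `group(2)` return null when a data line is missing, so the sketch maps null to an empty string. The class and method names (`OptionalGroupDemo`, `extract`) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupDemo {
    // No DOTALL here: '.' must not cross line breaks, so (.*) captures one line.
    private static final Pattern BLOCK = Pattern.compile(
        "FILTER(?:\\n(?!ORDER)(.*))?\\nORDER(?:\\n(?!FILTER)(.*))?");

    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = BLOCK.matcher(input);
        while (m.find()) {
            for (int g = 1; g <= 2; g++) {
                String value = m.group(g);
                // An optional group that did not participate yields null.
                results.add(value == null ? "" : value);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "FILTER\ntest1\nORDER\ntest2\nFILTER\ntest3\nORDER\nFILTER\nORDER";
        System.out.println(extract(s)); // [test1, test2, test3, , , ]
    }
}
```

Note that this variant assumes each data part is a single line and that line breaks are plain \n.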

Related

JAVA REGEX: Match until the specific character

I have this Java code
String cookies = TextUtils.join(";", LoginActivity.msCookieManager.getCookieStore().getCookies());
Log.d("TheCookies", cookies);
Pattern csrf_pattern = Pattern.compile("csrf_cookie=(.+)(?=;)");
Matcher csrf_matcher = csrf_pattern.matcher(cookies);
while (csrf_matcher.find()) {
json.put("csrf_key", csrf_matcher.group(1));
Log.d("CSRF KEY", csrf_matcher.group(1));
}
The String contains something like this:
SessionID=sessiontest;csrf_cookie=e18d027da2fb95e888ebede711f1bc39;ci_session=3f4675b5b56bfd0ba4dae46249de0df7994ee21e
I'm trying to get the csrf_cookie data by using this regular expression:
csrf_cookie=(.+)(?=;)
I expect a result like this in the code:
csrf_matcher.group(1);
e18d027da2fb95e888ebede711f1bc39
Instead I get:
3492f8670f4b09a6b3c3cbdfcc59e512;ci_session=8d823b309a361587fac5d67ad4706359b40d7bd0
What is a possible workaround for this problem?
Here is a one-liner using String#replaceAll:
String input = "SessionID=sessiontest;csrf_cookie=e18d027da2fb95e888ebede711f1bc39;ci_session=3f4675b5b56bfd0ba4dae46249de0df7994ee21e";
String cookie = input.replaceAll(".*csrf_cookie=([^;]*).*", "$1");
System.out.println(cookie);
e18d027da2fb95e888ebede711f1bc39
Demo
Note: We could have used a formal regex pattern matcher, and in fact you may want to do this if you need to perform this search/replacement often in your code.
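For reference, the pattern-matcher variant could look roughly like this. The class and method names (`CsrfCookieExtractor`, `csrfToken`) are hypothetical. One practical difference from the replaceAll one-liner: when the cookie is absent, replaceAll returns the whole input unchanged, while the matcher lets you detect the miss.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsrfCookieExtractor {
    // Compiled once and reused; [^;]* stops at the next ';' or at end of input.
    private static final Pattern CSRF = Pattern.compile("csrf_cookie=([^;]*)");

    static Optional<String> csrfToken(String cookies) {
        Matcher m = CSRF.matcher(cookies);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String cookies = "SessionID=sessiontest;"
            + "csrf_cookie=e18d027da2fb95e888ebede711f1bc39;"
            + "ci_session=3f4675b5b56bfd0ba4dae46249de0df7994ee21e";

        System.out.println(csrfToken(cookies).orElse("<none>"));
        // prints e18d027da2fb95e888ebede711f1bc39

        System.out.println(csrfToken("SessionID=sessiontest").orElse("<none>"));
        // prints <none>
    }
}
```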
You are getting more data than expected because you are using a greedy '+' (it matches as much as it can).
For example, on the string aaa the pattern a+ could match a, aa, or aaa; the latter is preferred when the pattern is greedy.
So you are matching
csrf_cookie=e18d027da2fb95e888ebede711f1bc39;ci_session=3f4675b5b56bfd0ba4dae46249de0df7994ee21e;
as long as it is followed by a ';'. The first ';' is consumed by .+ and the last ';' is found by the positive lookahead.
To make a pattern ungreedy/lazy, use +? instead of + (so a+? would match a single a, three times, on the string aaa).
So try with:
csrf_cookie=(.+?);
or just match anything that is not a ';'
csrf_cookie=([^;]*);
that way you don't need to make it lazy.
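The difference between the three patterns can be seen side by side in a small sketch. The cookie string below is a made-up example with two ';' after the csrf_cookie value, and `GreedyVsLazy` / `firstGroup` are names invented for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Returns group 1 of the first match, or null if the regex does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical cookie string, not the one from the question.
        String cookies = "SessionID=s1;csrf_cookie=abc123;ci_session=xyz;lang=en";

        // Greedy: .+ runs to the end and backtracks only to the LAST ';'.
        System.out.println(firstGroup("csrf_cookie=(.+)(?=;)", cookies));
        // prints abc123;ci_session=xyz

        // Lazy: .+? stops at the FIRST ';'.
        System.out.println(firstGroup("csrf_cookie=(.+?);", cookies));
        // prints abc123

        // Negated class: cannot cross a ';' at all, so no laziness is needed.
        System.out.println(firstGroup("csrf_cookie=([^;]*)", cookies));
        // prints abc123
    }
}
```

The last call drops the trailing ';' from the pattern, so it would also work if csrf_cookie were the final cookie in the string.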

Regex match only if text contains something before

Given the following text
KEYWORD This is a test
We want to match the following groups 1:YES 2:YES 3:YES
I want to match with "1:YES", "2:YES" and "3:YES" using
((\d):YES)
If and only if the first word in the complete text is "KEYWORD"
Given this test:
This is a test
We want to match the following groups 1:YES 2:YES 3:YES
No matches should be found
Java (like most regex engines) doesn't support unbounded-length lookbehinds. However, there is a workaround!
String str = "KEYWORD This is a test\n" +
"We want to match the following groups 1:YES 2:YES 3:YES";
Matcher matcher = Pattern.compile("(?s)(?<=\\AKEYWORD\\b.{1,99999})(\\d+:YES)").matcher(str);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Which outputs:
1:YES
2:YES
3:YES
The trick here is the lookbehind (?<=\\AKEYWORD\\b.{1,99999}), which has a large (but not unbounded) length. (?s) sets the DOTALL flag (dot matches newlines too), and \A means start of input; it is safer than ^ here because ^ also matches at the start of every line when the MULTILINE flag is set.
Without resorting to lookbehind tricks in Java, you can capture \d+:YES\b strings by using \G. \G makes a match start where the previous match ended, or, on the first attempt, at the beginning of the string, the same as \A.
We need its first capability:
(?:\AKEYWORD|\G(?!\A))[\s\S]*?(\d+:YES\b)
Breakdown:
(?:              Start of non-capturing group
    \A           Match beginning of subject string
    KEYWORD      Match keyword
    |            Or
    \G(?!\A)     Continue from where previous match ends
)                End of NCG
[\s\S]*?         Match anything else un-greedily
(\d+:YES\b)      Match and capture our desired part
Live demo
Java code:
Pattern p = Pattern.compile("(?:\\AKEYWORD|\\G(?!\\A))[\\s\\S]*?(\\d+:YES\\b)");
Matcher m = p.matcher(string);
while (m.find()) {
System.out.println(m.group(1));
}
Live demo
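To check both inputs from the question against the \G pattern, here is a small self-contained sketch. The class and method names (`KeywordGuardedMatches`, `findAll`) are made up for illustration; the pattern is the one from the Java code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordGuardedMatches {
    // First match must start at \AKEYWORD; later matches must continue
    // exactly where the previous one ended (\G), never at the input start.
    private static final Pattern P = Pattern.compile(
        "(?:\\AKEYWORD|\\G(?!\\A))[\\s\\S]*?(\\d+:YES\\b)");

    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        String tail = "This is a test\n"
            + "We want to match the following groups 1:YES 2:YES 3:YES";

        System.out.println(findAll("KEYWORD " + tail)); // [1:YES, 2:YES, 3:YES]
        System.out.println(findAll(tail));              // []
    }
}
```

Without the leading KEYWORD the chain never starts, so the later \G branch has nothing to continue from and no match is reported.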

Split and replace Java string

I am trying to read a text file, split the contents as explained below, and append the resulting pieces to a Java List.
The error is in the splitting part.
Existing String:
a1(X1, UniqueVar1), a2(X2, UniqueVar1), a3(UniqueVar1, UniqueVar2)
Expected: split them and append them to a Java list:
a1(X1, UniqueVar1)
a2(X2, UniqueVar1)
a3(UniqueVar1, UniqueVar2)
Code:
subSplit = obj.split("\\), ");
for (String subObj: subSplit)
{
System.out.println(subObj.trim());
}
Result:
a1(X1, UniqueVar1
a2(X2, UniqueVar1
...
Please suggest how to correct this.
Use a positive lookbehind in your regular expression:
String[] subSplit = obj.split("(?<=\\)), ");
This expression matches a comma and space preceded by a ), but because the lookbehind part (?<=\\)) is zero-width (it consumes no characters), the ) is not discarded as part of the split separator.
More information about lookaround assertions and non-capturing groups can be found in the javadoc of the Pattern class.
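Put together with the sample string from the question, the split might look like this. It is only a sketch; `SplitKeepingParen` and `splitCalls` are names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitKeepingParen {
    static List<String> splitCalls(String obj) {
        // ", " is the separator, but only where it directly follows ')'.
        // The lookbehind is zero-width, so the ')' stays in the left piece.
        return new ArrayList<>(Arrays.asList(obj.split("(?<=\\)), ")));
    }

    public static void main(String[] args) {
        String obj = "a1(X1, UniqueVar1), a2(X2, UniqueVar1), a3(UniqueVar1, UniqueVar2)";
        for (String part : splitCalls(obj)) {
            System.out.println(part);
        }
        // a1(X1, UniqueVar1)
        // a2(X2, UniqueVar1)
        // a3(UniqueVar1, UniqueVar2)
    }
}
```

The commas inside the parentheses are left alone because they follow a word character, not a ).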

Change group using regex java

I need help with a regular expression in Java.
I need to replace a group in a string:
Example:
Input:
=sum($var1;2) or =if($result<10;"little";"big") ...
Need Output:
=sum(teste;2) or =if(teste<10;"little";"big") ...
Code I have:
Pattern p = Pattern.compile("(\\.*)(\\$\\w)(\\.*)");
Matcher m = p.matcher(total);
if (m.find()) {
System.out.println(m.replaceAll("$2teste"));
}
Output I have:
=sum($vtestear1;2)
=if($r testeesultado<5;"maior";"menor")
Why match everything when all you need is to match variable tokens?
Pattern p = Pattern.compile("\\$[a-z0-9]+\\b");
String result = p.matcher(total).replaceAll("teste");
Change the [a-z0-9] part if you can have more than lowercase ASCII letters and digits. Note there is no \b before \$: both ( and $ are non-word characters, so a word boundary would never match between them.
Also, you don't need to test for .find() before calling .replaceAll(): no match means nothing will be replaced.
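A minimal runnable sketch of the token replacement, using the input from the question. The names `VariableTokenReplacer` and `replaceVariables` are assumptions, and the pattern here is `\$[a-z0-9]+` with no `\b` in front of the `$`, because a word boundary cannot match between `(` and `$` (both are non-word characters).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VariableTokenReplacer {
    // A '$' followed by lowercase ASCII letters or digits.
    private static final Pattern VARIABLE = Pattern.compile("\\$[a-z0-9]+");

    static String replaceVariables(String formula, String replacement) {
        // quoteReplacement keeps any '$' or '\' in the replacement literal.
        return VARIABLE.matcher(formula)
                       .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String total = "=sum($var1;2) or =if($result<10;\"little\";\"big\")";
        System.out.println(replaceVariables(total, "teste"));
        // =sum(teste;2) or =if(teste<10;"little";"big")

        // No variable token: the input comes back unchanged.
        System.out.println(replaceVariables("=sum(1;2)", "teste"));
        // =sum(1;2)
    }
}
```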

Regex for removing part of a line if it is preceded by some word in Java

There's a properties language bundle file:
label.username=Username:
label.tooltip_html=Please enter your username.</center></html>
label.password=Password:
label.tooltip_html=Please enter your password.</center></html>
How can I match all lines that contain both "_html" and "</center></html>", in that order, and replace each with the same line minus the trailing "</center></html>"? For example, the line:
label.tooltip_html=Please enter your username.</center></html>
should become:
label.tooltip_html=Please enter your username.
Note: I would like to do this replacement using an IDE (IntelliJ IDEA, Eclipse, NetBeans...)
Since you clarified that this regex is to be used in the IDE, I tested this in Eclipse and it works:
FIND:
(_html.*)</center></html>
REPLACE WITH:
$1
Make sure you turn on the Regular expressions switch in the Find/Replace dialog. This will match any string that contains _html.* (where the .* greedily matches any string not containing newlines), followed by </center></html>. It uses (…) brackets to capture what was matched into group 1, and $1 in the replacement substitutes in what group 1 captured.
This effectively removes </center></html> if that string is preceded by _html in that line.
If there can be multiple </center></html> in a line, and they are all to be removed if there's a _html to their left, then the regex will be more complicated, but it can be done in one regex with the \G continuing anchor if absolutely need be.
Variations
Speaking more generally, you can also match things like this:
(delete)this part only(please)
This now creates 2 capturing groups. You can match strings with this pattern and replace with $1$2, and it will effectively delete this part only, but only if it's preceded by delete and followed by please. These subpatterns can be more complicated, of course.
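The same idea expressed in Java rather than in an IDE dialog, as a small sketch. `KeepContextDeleteMiddle` and `stripMiddle` are invented names, and the literal words come from the pattern above.

```java
class KeepContextDeleteMiddle {
    static String stripMiddle(String input) {
        // Group 1 and group 2 are written back; the text between them is dropped.
        return input.replaceAll("(delete)this part only(please)", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(stripMiddle("deletethis part onlyplease"));
        // deleteplease

        // Without the surrounding context nothing matches, so nothing changes.
        System.out.println(stripMiddle("keep this part only"));
        // keep this part only
    }
}
```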
if (line.contains("_html=")) {
line = line.replace("</center></html>", "");
}
No regExp needed here ;) (edit) as long as all lines of the property file are well formed.
String s = "label.tooltip_html=Please enter your password.</center></html>";
Pattern p = Pattern.compile("(_html.*)</center></html>");
Matcher m = p.matcher(s);
System.out.println(m.replaceAll("$1"));
Try something like this:
Pattern p = Pattern.compile(".*(_html).*</center></html>");
Matcher m = p.matcher(input_line); // get a matcher object
String output = input_line;
if (m.matches()) {
output = input_line.replace("</center></html>", ""); // assign to the existing variable; redeclaring it does not compile
}
/^(.*)<\/center><\/html>/
captures the
label.tooltip_html=Please enter your username.
part in group 1. Then you can just put the string together correctly.
