JAVA REGEX: Match until the specific character

I have this Java code:
String cookies = TextUtils.join(";", LoginActivity.msCookieManager.getCookieStore().getCookies());
Log.d("TheCookies", cookies);
Pattern csrf_pattern = Pattern.compile("csrf_cookie=(.+)(?=;)");
Matcher csrf_matcher = csrf_pattern.matcher(cookies);
while (csrf_matcher.find()) {
    json.put("csrf_key", csrf_matcher.group(1));
    Log.d("CSRF KEY", csrf_matcher.group(1));
}
The String contains something like this:
SessionID=sessiontest;csrf_cookie=e18d027da2fb95e888ebede711f1bc39;ci_session=3f4675b5b56bfd0ba4dae46249de0df7994ee21e
I'm trying to get the csrf_cookie value using this regular expression:
csrf_cookie=(.+)(?=;)
I expect a result like this in the code:
csrf_matcher.group(1);
e18d027da2fb95e888ebede711f1bc39
Instead I get:
3492f8670f4b09a6b3c3cbdfcc59e512;ci_session=8d823b309a361587fac5d67ad4706359b40d7bd0
What is a possible workaround for this problem?

Here is a one-liner using String#replaceAll:
String input = "SessionID=sessiontest;csrf_cookie=e18d027da2fb95e888ebede711f1bc39;ci_session=3f4675b5b56bfd0ba4dae46249de0df7994ee21e";
String cookie = input.replaceAll(".*csrf_cookie=([^;]*).*", "$1");
System.out.println(cookie);
e18d027da2fb95e888ebede711f1bc39
Note: We could have used a formal regex Pattern and Matcher here, and in fact you may want to do that if you need to run this search often in your code, since a compiled Pattern can be reused.

You are getting more data than expected because you are using a greedy '+' (it matches as much as it can).
For example, on the string aaa the pattern a+ could match a, aa, or aaa; a greedy pattern prefers the last, longest one.
So you are matching
csrf_cookie=e18d027da2fb95e888ebede711f1bc39;ci_session=3f4675b5b56bfd0ba4dae46249de0df7994ee21e;
as long as it ends with a ';'. The first ';' is consumed by .+ and the last ';' is found by the positive lookahead.
To make a pattern ungreedy (lazy), use +? instead of + (so a+? would match a single a, three times, on the string aaa).
So try:
csrf_cookie=(.+?);
or just match anything that is not a ';':
csrf_cookie=([^;]*);
That way you don't need to make it lazy.
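To see the three variants side by side, here is a minimal sketch. The class name, the helper firstGroup, and the trailing extra=1 cookie are assumptions added for illustration (the extra cookie is only there so that more than one ';' follows csrf_cookie); the last pattern also drops the trailing ';' so that it still works when csrf_cookie is the final cookie.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsrfCookieDemo {
    // Returns group 1 of the first match, or null when the pattern does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Cookie string from the question plus a hypothetical trailing cookie.
        String cookies = "SessionID=sessiontest;"
                + "csrf_cookie=e18d027da2fb95e888ebede711f1bc39;"
                + "ci_session=3f4675b5b56bfd0ba4dae46249de0df7994ee21e;"
                + "extra=1";

        // Greedy: runs on to the last ';', so ci_session is captured too.
        System.out.println(firstGroup("csrf_cookie=(.+)(?=;)", cookies));
        // Lazy: stops at the first ';'.
        System.out.println(firstGroup("csrf_cookie=(.+?);", cookies));
        // Negated class: can never cross a ';', no laziness needed.
        System.out.println(firstGroup("csrf_cookie=([^;]*)", cookies));
    }
}
```

Only the first call should print the value together with the ci_session cookie; the other two print just e18d027da2fb95e888ebede711f1bc39.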

Related

Regex to match specific file format and empty strings

I am trying to use regex to match a file in the following format:
FILTER
<data>
ORDER
<data>
Now, the <data> part is the one that I need to extract, and that would be really simple, except I have the following complications:
1) This pattern can be repeated (no line breaks in between)
2) Either <data> part may be missing.
In particular, this file is OK:
FILTER
test1
ORDER
test2
FILTER
test3
ORDER
FILTER
ORDER
And should give me the following groups:
"test1", "test2", "test3", "", "", ""
The regex that I already tried is: (?:FILTER\n(.*)\nORDER\n(.*))*
Here is the test on regex101.
I am pretty new to regex, any help would be appreciated.

You may use a lazy-dot matching + tempered greedy token based regex:
(?s)FILTER(.*?)ORDER((?:(?!FILTER).)*)
           ^-^       ^--------------^
This regex needs DOTALL mode, which the inline (?s) enables. Here is a regex demo. The .*? matches any character, but as few as possible, thus matching up to the first ORDER. The (?:(?!FILTER).)* tempered greedy token matches any text that does not contain FILTER. It is a kind of negated character class for multi-character sequences.
You can unroll it as follows:
FILTER([^O]*(?:O(?!RDER)[^O]*)*)ORDER([^F]*(?:F(?!ILTER)[^F]*)*)
See the regex demo (and this regex does not require a DOTALL mode).
String s = "FILTER\ntest1\nORDER\ntest2\nFILTER\ntest3\nORDER\nFILTER\nORDER";
Pattern pattern = Pattern.compile("(?s)FILTER(.*?)ORDER((?:(?!FILTER).)*)");
Matcher matcher = pattern.matcher(s);
List<String> results = new ArrayList<>();
while (matcher.find()) {
    if (matcher.group(1) != null) {
        results.add(matcher.group(1).trim());
    }
    if (matcher.group(2) != null) {
        results.add(matcher.group(2).trim());
    }
}
System.out.println(results); // => [test1, test2, test3, , , ]
See the IDEONE demo
If you need to make sure the FILTER and ORDER delimiter strings appear as individual lines, just use ^ and $ around them and add MULTILINE modifier (so that ^ could match the beginning of a line and $ could match the end of the line):
(?sm)^FILTER$(.*?)^ORDER$((?:(?!^FILTER$).)*)
     ^      ^     ^     ^
See another regex demo.

I would use the following regex:
FILTER(?:\n(?!ORDER)(.*))?\nORDER(?:\n(?!FILTER)(.*))?
You can test it on regex101
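One thing to watch with this pattern in Java: both capturing groups sit inside optional groups, so a missing <data> part comes back as null rather than as an empty string. A minimal sketch that maps null to "" follows; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilterOrderDemo {
    // Same regex as above, written as a Java string literal.
    private static final Pattern BLOCK =
            Pattern.compile("FILTER(?:\\n(?!ORDER)(.*))?\\nORDER(?:\\n(?!FILTER)(.*))?");

    // Collects the two data parts of every FILTER/ORDER block, in order.
    static List<String> extract(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            for (int g = 1; g <= 2; g++) {
                String value = m.group(g);
                // An optional group that did not take part in the match yields null.
                out.add(value == null ? "" : value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "FILTER\ntest1\nORDER\ntest2\nFILTER\ntest3\nORDER\nFILTER\nORDER";
        System.out.println(extract(s)); // [test1, test2, test3, , , ]
    }
}
```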

Regular expression for multiple words with * and space

My regular expression has the form "Exit* Order*". When I use it in Java, it does not work as expected.
String pattern = "Exit* Order*";
String ipLine = "Exiting orders";
Match: NO
String pattern = "Exit Order";
String ipLine = "Exit order";
Match: Yes.
Java Code:
Pattern patrn = Pattern.compile(pattern,Pattern.CASE_INSENSITIVE);
Matcher match = patrn.matcher(ipLine);
Can anyone tell me what the pattern should be in such cases?

I believe you are looking for something like:
"Exit.* Order.*"
or maybe something instead of .*, e.g. \S*, \w*, [A-Za-z]*.
Your current regular expression is looking for zero or more t and r on the ends of the words, e.g. it would match
Exi Orde
Exit Orde
Exitt Orde
Exi Order
Exi Orderr
...
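The difference can be checked with a short sketch. The class name and the helper found are assumptions for illustration, and find() is used because the question does not show whether matches() or find() was called.

```java
import java.util.regex.Pattern;

class ExitOrderDemo {
    // True when the case-insensitive pattern occurs anywhere in the line.
    static boolean found(String regex, String line) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(line).find();
    }

    public static void main(String[] args) {
        // '*' applies only to the character before it: "t*" and "r*".
        System.out.println(found("Exit* Order*", "Exiting orders"));        // false
        System.out.println(found("Exit* Order*", "Exi Orde"));              // true
        // ".*" and "\\w*" allow arbitrary trailing characters after each word.
        System.out.println(found("Exit.* Order.*", "Exiting orders"));      // true
        System.out.println(found("Exit\\w* Order\\w*", "Exiting orders"));  // true
    }
}
```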

Exit\\w* Order\\w*
You should use this. .* can match much more than intended. Also use the case-insensitive flag (i, or Pattern.CASE_INSENSITIVE in Java).

It seems like you just want to match "Exit Order" case-insensitively:
Try this:
if (str.matches("(?i)exit order"))
Or, to restrict the match to just your examples, where the "O" of "Order" may be "o", use:
if (str.matches("Exit [Oo]rder"))

Java regular expressions do not use Linux shell wildcard (glob) syntax; see the Pattern documentation:
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Use this:
String pattern = "Exit.* Order.*";

Change group using regex java

I need help with a regular expression in Java.
I need to replace a group in a string:
Example:
Input:
=sum($var1;2) or =if($result<10;"little";"big") ...
Desired output:
=sum(teste;2) or =if(teste<10;"little";"big") ...
Code I have:
Pattern p = Pattern.compile("(\\.*)(\\$\\w)(\\.*)");
Matcher m = p.matcher(total);
if (m.find()) {
System.out.println(m.replaceAll("$2teste"));
}
Output I have:
=sum($vtestear1;2)
=if($r testeesultado<5;"maior";"menor")

Why match everything when all you need is to match the variable tokens?
Pattern p = Pattern.compile("\\$[a-z0-9]+\\b");
String result = p.matcher(total).replaceAll("teste");
Note that there is no \b in front of \$: a word boundary never matches between two non-word characters such as ( and $.
Change the [a-z0-9] part if your variable names can contain more than lowercase ASCII letters and digits.
Also, you don't need to test .find() or anything before calling .replaceAll(): if nothing matches, nothing is replaced.
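As a runnable sketch of the token-only approach, assuming variables are a $ followed by lowercase ASCII letters and digits (the class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VariableReplaceDemo {
    // A '$' followed by one or more lowercase letters or digits.
    private static final Pattern VARIABLE = Pattern.compile("\\$[a-z0-9]+");

    // Replaces every variable token with the given literal text.
    static String replaceVariables(String formula, String replacement) {
        // quoteReplacement keeps '$' and '\' in the replacement from being
        // interpreted as group references.
        return VARIABLE.matcher(formula).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String total = "=sum($var1;2) or =if($result<10;\"little\";\"big\")";
        System.out.println(replaceVariables(total, "teste"));
        // =sum(teste;2) or =if(teste<10;"little";"big")
    }
}
```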

Pattern syntax error

The following regex works in the find dialog of Eclipse but throws an exception in Java.
I can't find why
(?<=(00|\\+))?[\\d]{1}[\\d]*
The syntax error is at runtime when executing:
Pattern.compile("(?<=(00|\\+))?[\\d]{1}[\\d]*")
In the Find dialog I used
(?<=(00|\+))?[\d]{1}[\d]*
I want to match phone numbers with or without the leading + or 00, but that is not the point: I get a syntax error at position 13. I don't get the error if I remove the second "?":
Pattern.compile("(?<=(00|\\+))[\\d]{1}[\\d]*")
Please note that instead of 1 I sometimes need a larger number; in any case, the question is about the syntax error.

If your data looks like 00ddddd or +ddddd, where d is a digit, and you want only the digits, @Bergi's regex (?<=00|\\+)\\d+ will do the trick. But if your data sometimes lacks the prefix you want to ignore, as in ddddd, then you should probably use a capturing group instead:
String[] data = {"+123456", "00123456", "123456"};
Pattern p = Pattern.compile("(?:00|\\+)?(\\d+)");
Matcher m = null;
for (String s : data) {
    m = p.matcher(s);
    if (m.find())
        System.out.println(m.group(1));
}
Output:
123456
123456
123456

Here is an example that works for me:
public static void main(String[] args) {
    Pattern pattern = Pattern.compile("(?<=00|\\+)(\\d+)");
    Matcher matcher = pattern.matcher("+1123456");
    if (matcher.find()) {
        System.out.println(matcher.group(1));
    }
}
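The lookbehind pattern and the optional-prefix pattern from the answers above behave differently when the prefix is missing. A small sketch of the comparison follows; the class, field, and method names are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhonePrefixDemo {
    // Lookbehind: digits match only when "00" or "+" sits directly before them.
    static final Pattern LOOKBEHIND = Pattern.compile("(?<=00|\\+)\\d+");
    // Optional non-capturing prefix: the digits land in group 1 either way.
    static final Pattern OPTIONAL_PREFIX = Pattern.compile("(?:00|\\+)?(\\d+)");

    // Returns the digits found by the pattern, or null if nothing matches.
    static String digits(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        if (!m.find()) {
            return null;
        }
        // Use group 1 when the pattern has one, otherwise the whole match.
        return m.groupCount() > 0 ? m.group(1) : m.group();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"+123456", "00123456", "123456"}) {
            System.out.println(s + " -> lookbehind: " + digits(LOOKBEHIND, s)
                    + ", optional prefix: " + digits(OPTIONAL_PREFIX, s));
        }
    }
}
```

For the unprefixed input 123456 the lookbehind version finds nothing, while the optional-prefix version still returns the digits.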

You might shorten your regex a lot. The character classes are not needed when there is only one class inside - just use \d. And {1} is quite useless as well. Also, you can use + for matching "one or more" (it's short for {1,}). Next the additional grouping in your lookbehind should not be needed.
And last, why is that lookbehind optional (with ?)? Just leave it out if you don't need it. This might even be the source of your pattern syntax error: a lookaround must not be optional.
Try this:
/(?<=00|\+)\d+/
Java:
"(?<=00|\\+)\\d+"

Regex for removing part of a line if it is preceded by some word in Java

There's a properties language bundle file:
label.username=Username:
label.tooltip_html=Please enter your username.</center></html>
label.password=Password:
label.tooltip_html=Please enter your password.</center></html>
How can I match all lines that contain both "_html" and "</center></html>", in that order, and replace each with the same line minus the trailing "</center></html>"? For example, the line:
label.tooltip_html=Please enter your username.</center></html>
should become:
label.tooltip_html=Please enter your username.
Note: I would like to do this replacement using an IDE (IntelliJ IDEA, Eclipse, NetBeans...)

Since you clarified that this regex is to be used in the IDE, I tested this in Eclipse and it works:
FIND:
(_html.*)</center></html>
REPLACE WITH:
$1
Make sure you turn on the Regular expressions switch in the Find/Replace dialog. This will match any string that contains _html.* (where the .* greedily matches any string not containing newlines), followed by </center></html>. It uses (…) brackets to capture what was matched into group 1, and $1 in the replacement substitutes in what group 1 captured.
This effectively removes </center></html> if that string is preceded by _html in that line.
If there can be multiple </center></html> in a line, and they are all to be removed if there's a _html to their left, then the regex will be more complicated, but it can be done in one regex with the \G continuing anchor if absolutely necessary.
Variations
Speaking more generally, you can also match things like this:
(delete)this part only(please)
This now creates 2 capturing groups. You can match strings with this pattern and replace with $1$2, and it will effectively delete this part only, but only if it's preceded by delete and followed by please. These subpatterns can be more complicated, of course.
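The same find/replace pair can also be applied from Java to the whole file contents at once. This is a sketch under the assumption that the file has been read into a single string; the class name, method name, and the label.note line are hypothetical and only there to show that lines without _html are left alone.

```java
class TooltipCleanupDemo {
    // Removes a trailing </center></html> only on lines where _html appears
    // earlier. '.' does not match line breaks by default, so each match
    // stays within a single line.
    static String stripClosingTags(String properties) {
        return properties.replaceAll("(_html.*)</center></html>", "$1");
    }

    public static void main(String[] args) {
        String properties = String.join("\n",
                "label.username=Username:",
                "label.tooltip_html=Please enter your username.</center></html>",
                "label.note=Kept as is.</center></html>"); // hypothetical line without _html
        System.out.println(stripClosingTags(properties));
    }
}
```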

if (line.contains("_html=")) {
    line = line.replace("</center></html>", "");
}
No regexp needed here ;) (Edit: as long as all lines of the properties file are well formed.)

String s = "label.tooltip_html=Please enter your password.</center></html>";
Pattern p = Pattern.compile("(_html.*)</center></html>");
Matcher m = p.matcher(s);
System.out.println(m.replaceAll("$1"));

Try something like this:
Pattern p = Pattern.compile(".*(_html).*</center></html>");
Matcher m = p.matcher(input_line); // get a matcher object
String output = input_line;
if (m.matches()) {
    // Assign to the existing variable; redeclaring it here would not compile.
    output = input_line.replace("</center></html>", "");
}

/^(.*)<\/center><\/html>/
captures the
label.tooltip_html=Please enter your username.
part in group 1. Then you can just put the string together correctly.
