Looking for advices for regex

I'm trying to create a regex that has to match this pattern:
\n 700000000123
I mean "\n" + a white space + 12 digits.
So I tried:
(\\\\)(n)(\\s)(\\d{12})
or something like this:
(\\\\)(n)(\\s)(\\[0-9]{12})
But it still doesn't work. For me, {12} means repeat a digit (\d or [0-9]) 12 times, right?
My idea is Java code that checks whether a string contains this regex:
Boolean result = false;
String string_to_match = "a random string \n 700000000123";
String re1="(\\\\)";
String re2="(n)";
String re3="(\\s)";
String re4="([0-9]{11})";
Pattern p = Pattern.compile(re1+re2+re3+re4, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
if (string_to_match.contains(p.toString())) {
result = true;
}
I tried using http://www.txt2re.com/ to help me.
Do you have any advice on building this regex? I would like to understand why it doesn't work at the moment.

You need to use String#matches instead of String#contains to match a regex.
The following should work:
String re1="(\\n)";
String re2="( )";
String re3="(\\d{12})";
Pattern p = Pattern.compile(re1+re2+re3, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
System.out.println("\n 700000000123".matches(p.pattern())); // true
Or simply:
System.out.println( "\n 700000000123".matches("(\n)( )(\\d{12})") ); // true
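Since the question's string has text before the newline, matches() alone would still return false for it, because it requires the whole input to match. A minimal sketch comparing contains(), matches() and find() on the question's own input (the class name ContainsVsFind is hypothetical):

```java
import java.util.regex.Pattern;

class ContainsVsFind {
    // Simplified pattern from the question: a newline, one space, then 12 digits
    static final Pattern P = Pattern.compile("\n \\d{12}");

    public static void main(String[] args) {
        String input = "a random string \n 700000000123";

        // contains() looks for the pattern's source text, it does not run the regex
        System.out.println(input.contains(P.pattern()));  // false
        // matches() requires the WHOLE input to match, so the prefix makes it fail
        System.out.println(P.matcher(input).matches());   // false
        // find() looks for the pattern anywhere in the input
        System.out.println(P.matcher(input).find());      // true
    }
}
```

So for "does the string contain a match", find() is the call to use.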

You can chain the calls into one expression and use find(), which does not require the pattern to match the entire input.
For instance:
String input = "\n 700000000123";
System.out.println(Pattern.compile("\n\\s\\d{12}").matcher(input).find());
Output
true

^\\\\n\\s+\\d{12}$
I guess this should work for you. See the demo:
https://regex101.com/r/eZ0yP4/33
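Note that in this regex \\\\n (inside a Java string) matches the two characters \ and n, not a newline character. That fits input containing a literal backslash-n (for example text read from a file or log), but not the question's string literal. A quick check (class and method names are hypothetical):

```java
class LiteralBackslashN {
    // Regex from the answer: a literal backslash, the letter n, whitespace, 12 digits
    static final String REGEX = "^\\\\n\\s+\\d{12}$";

    static boolean check(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        // The two characters '\' and 'n' followed by a space and 12 digits
        System.out.println(check("\\n 700000000123")); // true
        // A real newline character, as in the question's string literal
        System.out.println(check("\n 700000000123"));  // false
    }
}
```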

A literal \n matches only a Unix-style (LF) line ending; text with Windows line endings contains \r\n. If your input uses the platform's native line separator, use System.getProperty("line.separator") so the code works on both Linux and Windows.
Use the following:
System.out.println(Pattern.compile(System.getProperty("line.separator") + "\\s\\d{12}").matcher(input).find());
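line.separator describes the machine running the code, not where the text came from. An alternative, assuming Java 8 or later, is the \R linebreak matcher, which accepts \n, \r\n and the other line terminators. A minimal sketch (class and method names are hypothetical):

```java
import java.util.regex.Pattern;

class AnyLineBreak {
    // \R (Java 8+) matches any line break, including the two-character \r\n
    static final Pattern P = Pattern.compile("\\R\\s\\d{12}");

    static boolean hasMatch(String input) {
        return P.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasMatch("text\n 700000000123"));   // true (Unix line ending)
        System.out.println(hasMatch("text\r\n 700000000123")); // true (Windows line ending)
        System.out.println(hasMatch("text 700000000123"));     // false (no line break)
    }
}
```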

Related

How to parse string using regex

I'm pretty new to Java and am trying to find a better way to do this, potentially using a regex.
String text = test.get(i).toString();
// text looks like this in string form:
// EnumOption[enumId=test,id=machine]
String checker = text.replace("[","").replace("]","").split(",")[1].split("=")[1];
// checker becomes machine
My goal is to parse that text string and return just machine, which is what the code above does.
But that looks ugly. What kind of regex could be used here to make this a little better? Or is there another approach?

Use a regex lookbehind:
(?<=\bid=)[^],]*
See Regex101.
(?<= ) // Start matching only after what matches inside
\bid= // Match "\bid=" (= word boundary then "id="),
[^],]* // Match and keep the longest sequence without any ']' or ','
In Java, use it like this:
import java.util.regex.*;
class Main {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("(?<=\\bid=)[^],]*");
Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");
if (matcher.find()) {
System.out.println(matcher.group(0));
}
}
}
This results in
machine

Assuming you’re using the Polarion ALM API, you should use the EnumOption’s getId method instead of converting the value to a string and parsing it back:
String id = test.get(i).getId();

Using the replace and split functions doesn't take the structure of the data into account.
If you want to use a regex, you can use a capturing group without any lookarounds, where enumId can be any value except =, comma and ], and id can be any value except ].
The value of id will be in capture group 1.
\bEnumOption\[enumId=[^=,\]]+,id=([^\]]+)\]
Explanation
\bEnumOption Match EnumOption preceded by a word boundary
\[enumId= Match [enumId=
[^=,\]]+, Match 1+ times any char except = , and ]
id= Match literally
( Capture group 1
[^\]]+ Match 1+ times any char except ]
)\] Close group 1, then match ]
Pattern pattern = Pattern.compile("\\bEnumOption\\[enumId=[^=,\\]]+,id=([^\\]]+)\\]");
Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
machine
If there can be more comma-separated values, you could match only id, using the negated character class [^][]* before and after it to stay inside the square brackets.
\bEnumOption\[[^][]*\bid=([^,\]]+)[^][]*\]
In Java, an unescaped [ inside a character class starts a nested class, so escape both brackets there:
String regex = "\\bEnumOption\\[[^\\[\\]]*\\bid=([^,\\]]+)[^\\[\\]]*\\]";
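A minimal sketch using the Java-escaped version on inputs with extra comma-separated values (the class name, method name and sample inputs are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractId {
    // [^\\[\\]]* stays inside the square brackets; [ and ] are escaped for Java
    static final Pattern P =
            Pattern.compile("\\bEnumOption\\[[^\\[\\]]*\\bid=([^,\\]]+)[^\\[\\]]*\\]");

    static String idOf(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group(1) : null; // null when there is no id= entry
    }

    public static void main(String[] args) {
        System.out.println(idOf("EnumOption[enumId=test,id=machine]"));            // machine
        System.out.println(idOf("EnumOption[enumId=test,name=x,id=machine,y=z]")); // machine
        System.out.println(idOf("EnumOption[enumId=test]"));                      // null
    }
}
```

The match is case-sensitive, so the Id in enumId is not mistaken for id.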

A regex can of course be used, but it is sometimes less performant, less readable and more bug-prone.
I would advise you not to use any regex that you did not come up with yourself, or at least fully understand.
PS: I think your solution is actually quite readable.
Here's another non-regex version:
String text = "EnumOption[enumId=test,id=machine]";
text = text.substring(text.lastIndexOf('=') + 1);
text = text.substring(0, text.length() - 1);
Not doing you a favor, but the downvote hurt, so here you go:
String input = "EnumOption[enumId=test,id=machine]";
Matcher matcher = Pattern.compile("EnumOption\\[enumId=(.+),id=(.+)\\]").matcher(input);
if(!matcher.matches()) {
throw new RuntimeException("unexpected input: " + input);
}
System.out.println("enumId: " + matcher.group(1));
System.out.println("id: " + matcher.group(2));

How to use groups in Regular Expression In java?

I have this code and I want to find both 1234 and 4321, but currently I only get 4321. How can I fix this?
String a = "frankabc123 1234 frankabc frankabc123 4321 frankabc";
String rgx = "frank.* ([0-9]*) frank.*";
Pattern patternObject = Pattern.compile(rgx);
Matcher matcherObject = patternObject.matcher(a);
while (matcherObject.find()) {
System.out.println(matcherObject.group(1));
}

Your regex is too greedy. Make it non-greedy.
String rgx = "frank.*? ([0-9]+) frank";

Your regex is incorrect. The first part, frank.*, matches everything and then backtracks until the rest of the match succeeds. Try this instead:
String rgx = "frank.*? ([0-9]*) frank";
The ? after the quantifier will make it reluctant, matching as few characters as necessary for the rest of the pattern to match. The trailing .* is also causing problems (as nhahtdh pointed out in a comment).
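A small sketch that runs the original greedy pattern and the reluctant one from the first answer side by side on the question's string, to show the difference (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Collects group 1 of every match found in the input
    static List<String> numbers(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String a = "frankabc123 1234 frankabc frankabc123 4321 frankabc";
        // Greedy: the first .* runs to the end and backtracks to the LAST number,
        // then the trailing .* consumes the rest, so there is only one match
        System.out.println(numbers("frank.* ([0-9]*) frank.*", a)); // [4321]
        // Reluctant: .*? stops at the first number, and nothing eats the rest
        System.out.println(numbers("frank.*? ([0-9]+) frank", a));  // [1234, 4321]
    }
}
```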

Extract text from two different types of chars using regex

I'm trying to extract strings from a text that contains two different delimiter characters.
The characters are | and #, and the text comes from an outside source.
Here are some examples:
Input: #hello|#what|whatsup| should return hello| and whatsup.
Input: #hello# should return hello
Input: |ola|1 should return ola
Input: |hello#|what#whatsup#node should return hello# and whatsup

This works for your strings. I'm not sure I completely understood what you need, but I think it can be tuned if necessary:
String s1 = "#hello|#what|whatsup|";
String s2 = "#hello#";
String s3 = "|ola|1";
String s4 = "|hello#|what#whatsup#node";
Pattern pattern = Pattern.compile("((\\w)+)(\\||#)(\\||#)?");
Matcher matcher = pattern.matcher(s4);
while(matcher.find()) {
System.out.println(matcher.group(1) + (matcher.group(4) != null ? matcher.group(4).equals("|")? "#" : "|" : ""));
matcher.find(); //to jump over the next match
}
Update:
I just read the MiddleRecursion example. I'm afraid this doesn't work for it, and I have to leave my computer for a while, so this is just something to get you started.
Updated version that works for all the examples:
String s1 = "#hello|#what|whatsup|";
String s2 = "#hello#";
String s3 = "|ola|1";
String s4 = "|hello#|what#whatsup#node";
String s5 = "#||##||MiddleRecursion||##||#";
Pattern pattern = Pattern.compile("(#|\\|)((#|\\|)*\\w+(#|\\|)*)(#|\\|)");
Matcher matcher = pattern.matcher(s1);
while(matcher.find()) {
System.out.println(matcher.group(2));
}

Since #||##||MiddleRecursion||##||# --> ||##||MiddleRecursion||##||, I'm afraid you have to do bracket matching. In that case there is no general regex solution (you can force one to work if you know the maximum number of consecutive | and # characters). The reason is that this is middle recursion, and regular expressions can only handle left or right recursion.
This is also one of the reasons why HTML can't be parsed with regex.

OK, I'll start.
So you have to match #something# or |something|.
Can you write two separate regexes that do that?
The first thing that will annoy you is that the pipe | is a special character in regex (it means "or"). To match it literally, you have to prefix it with \\ in a Java string, as per the other thread I linked.
When you have those two regexp working, let me know and I'll post more.
(I'm heading out for a few hours...)
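As a starting point for that first step, a sketch of the two separate regexes (class and method names are hypothetical; combining them to handle the nested examples is left open, as in the answer):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoDelimiters {
    // #something# : # is not special, so no escaping is needed
    static final Pattern HASH = Pattern.compile("#([^#]+)#");
    // |something| : | means "or", so it is escaped as \\| in a Java string
    // (inside the character class [^|] it is already literal)
    static final Pattern PIPE = Pattern.compile("\\|([^|]+)\\|");

    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(HASH, "#hello#")); // hello
        System.out.println(firstMatch(PIPE, "|ola|1"));  // ola
    }
}
```

Pattern.quote("|") is another way to get a literal pipe if you build patterns from strings.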

validate Regular Expression using Java

I need to validate a String using a regular expression. The String must look like "createRobot(x,y)", where x and y are digits.
I have something like this:
String ins;
Pattern ptncreate= Pattern.compile("^createRobot(+\\d,\\d)");
Matcher m = ptncreate.matcher(ins);
System.out.println(m.find());
but it doesn't work.
Can you help me?
Thanks.

Parentheses are special characters in regex, so they must be escaped, and the + should be placed after the \d, not after a (:
Pattern.compile("^createRobot\\(\\d+,\\d+\\)$")
Note that if you want to validate input that should consist solely of this "createRobot" string, you might as well do:
boolean success = s.matches("createRobot\\(\\d+,\\d+\\)");
where s is the String you want to validate. But if you want to retrieve the numbers that were matched, you do need to use a Pattern/Matcher:
Pattern p = Pattern.compile("createRobot\\((\\d+),(\\d+)\\)");
Matcher m = p.matcher("createRobot(12,345)");
if(m.matches()) {
System.out.printf("x=%s, y=%s", m.group(1), m.group(2));
}
As you can see, after calling Matcher.matches() (or Matcher.find()), you can retrieve the nth matched group through group(n).

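You must add \ before ( and ), because parentheses are the grouping characters in regex.
The regex pattern is:
^createRobot\(\d+,\d+\)
In a Java string literal, each backslash is doubled: "^createRobot\\(\\d+,\\d+\\)"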
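A quick way to try the escaped pattern on a few inputs, using find() as in the question. A $ anchor is added here (as in the first answer) so trailing text is rejected; the class name and sample inputs are illustrative:

```java
import java.util.regex.Pattern;

class CreateRobotCheck {
    // Escaped parentheses match literal ( and ); \\d+ allows one or more digits
    static final Pattern P = Pattern.compile("^createRobot\\(\\d+,\\d+\\)$");

    static boolean isValid(String ins) {
        return P.matcher(ins).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("createRobot(3,4)"));    // true
        System.out.println(isValid("createRobot(12,345)")); // true
        System.out.println(isValid("createRobot(3,)"));     // false
        System.out.println(isValid("createRobot3,4"));      // false (no parentheses)
        System.out.println(isValid("createRobot(1,2)x"));   // false (rejected by $)
    }
}
```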

Regular expression to match unescaped special characters only

I'm trying to come up with a regular expression that matches only characters not preceded by a special escape sequence in a string.
For instance, in the string Is ? stranded//?, I want to replace the ? that hasn't been escaped with another string, so that the result is: Is Dave stranded?
But for the life of me, I haven't been able to figure out a way. I have only come up with regular expressions that consume all the replaceable characters.
How do you construct a regular expression that matches only characters not preceded by an escape sequence?

Use a negative lookbehind, it's what they were designed to do!
(?<!//)[?]
To break it down:
(
?<! #The negative lookbehind. It checks that the following slashes do not appear immediately before the match.
// #The slashes you are trying to avoid.
)
[\?] #Your special character list.
The rest of the pattern is tried only where // cannot be found immediately before the current position.
I think in Java it needs to be escaped again inside the string, something like:
Pattern p = Pattern.compile("(?<!//)[\\?]");
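A minimal sketch of the full replacement using this lookbehind, assuming the escape sequence is always // directly before the ?. After the unescaped ? characters are replaced, the escapes are turned back into plain ?. Class and method names are hypothetical; Matcher.quoteReplacement keeps a $ or \ in the replacement value from being interpreted as a group reference.

```java
import java.util.regex.Matcher;

class UnescapedReplace {
    // Replace every ? not preceded by //, then turn each escape //? into a plain ?
    static String fill(String input, String value) {
        return input
                .replaceAll("(?<!//)\\?", Matcher.quoteReplacement(value))
                .replace("//?", "?");
    }

    public static void main(String[] args) {
        System.out.println(fill("Is ? stranded//?", "Dave")); // Is Dave stranded?
        System.out.println(fill("? and ?", "Dave"));          // Dave and Dave
    }
}
```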

Try this Java code:
import java.util.regex.*;
String str = "Is ? stranded//?";
Pattern p = Pattern.compile("(?<!//)([?])");
Matcher m = p.matcher(str);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, m.group(1).replace("?", "Dave"));
}
m.appendTail(sb);
String s = sb.toString().replace("//", "");
System.out.println("Output: " + s);
OUTPUT
Output: Is Dave stranded?

I was thinking about this and have a second, simpler solution that avoids regexes. The other answers are probably better, but I thought I might post it anyway.
String input = "Is ? stranded//?";
String output = input
.replace("//?", "a717efbc-84a9-46bf-b1be-8a9fb714fce8")
.replace("?", "Dave")
.replace("a717efbc-84a9-46bf-b1be-8a9fb714fce8", "?");
Just protect "//?" by replacing it with something unique (like a GUID). Then you know any remaining question marks are fair game.

Use grouping. Here's one example:
import java.util.regex.*;
class Test {
public static void main(String[] args) {
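Pattern p = Pattern.compile("([^/][^/])(\\?)");
String s = "Is ? stranded//?";
Matcher m = p.matcher(s);
String result = s;
// find() succeeds if there is at least one match anywhere in s
if (m.find())
// replaceAll() resets the matcher first, so every match is replaced
result = m.replaceAll("$1XXX").replace("//", "");
System.out.println(s + " -> " + result);
}
}
Output:
$ java Test
Is ? stranded//? -> Is XXX stranded?
In this example, I'm:
first replacing any non-escaped ? with "XXX",
then, removing the "//" escape sequences.
EDIT: Use if (m.find()) to ensure that you handle non-matching strings properly (m.matches() would require the whole string to match, so it returns false here).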
This is just a quick-and-dirty example. You need to flesh it out, obviously, to make it more robust. But it gets the general idea across.

Match a character OTHER than the escape sequence, then a regex special character. You could use an inverted character class ([^/]) for the first bit. Special-case an unescaped regex character at the front of the string.
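A minimal sketch of this idea on the question's example, assuming // is the only escape sequence: (^|[^/]) plays the role of the inverted class plus the start-of-string special case, and $1 puts the consumed character back. The class name is hypothetical. One limitation of consuming the preceding character: of two adjacent ? characters only the first is replaced, which a lookbehind such as (?<!/)\\? avoids.

```java
class ConsumingPrefix {
    // (^|[^/]) consumes the character before ? (or matches empty at the start),
    // then the //? escapes are turned back into plain ?
    static String fill(String input) {
        return input.replaceAll("(^|[^/])\\?", "$1Dave").replace("//?", "?");
    }

    public static void main(String[] args) {
        System.out.println(fill("Is ? stranded//?")); // Is Dave stranded?
        System.out.println(fill("? at start"));       // Dave at start
        System.out.println(fill("a??"));              // aDave? (second ? is missed)
    }
}
```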

String aString = "Is ? stranded//?";
String regex = "(?<!//)[^a-z^A-Z^\\s^/]";
System.out.println(aString.replaceAll(regex, "Dave"));
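The part of the regular expression [^a-z^A-Z^\\s^/] matches any character that is not a letter, whitespace or a forward slash (the extra ^ signs inside the class are literal, so ^ itself is excluded too, while digits are still matched).
The (?<!//) part does a negative lookbehind (see the java.util.regex.Pattern documentation for more info).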
This gives the output Is Dave stranded//?

Try matching:
(^|(^.)|(.[^/])|([^/].))[special characters list]

I used this one:
((?:^|[^\\])(?:\\\\)*[ESCAPABLE CHARACTERS HERE])
Demo: https://regex101.com/r/zH1zO3/4
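A related but different technique, sketched here as an alternative rather than a port of the pattern above: instead of checking what precedes the special character, match either a whole backslash escape or a bare ?, and decide the replacement per match. Because escapes are consumed as units, an even run of backslashes is handled naturally and adjacent ? characters are both replaced. This assumes backslash escapes as in this answer (not the question's //); class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeAware {
    // Either a backslash escape (\ plus any char, captured in group 1) or a bare ?
    static final Pattern P = Pattern.compile("\\\\(.)|\\?");

    static String fill(String input, String value) {
        Matcher m = P.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Escaped char: emit it without the backslash; bare ?: emit the value
            String rep = m.group(1) != null ? m.group(1) : value;
            m.appendReplacement(sb, Matcher.quoteReplacement(rep));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fill("Is ? stranded\\?", "Dave")); // Is Dave stranded?
        System.out.println(fill("a\\\\?", "Dave"));           // a\Dave
        System.out.println(fill("??", "Dave"));               // DaveDave
    }
}
```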
