Trying to understand and reproduce Regex Pattern - java

I am currently trying to understand Pattern and Matcher a little bit more and found the following code:
private static final Pattern PATTERN = Pattern.compile(
String.format("addPart%s(?<assembly>%s)\\+(?<amount>%s)%s(?<part>%s)",
InOutputStrings.COMMAND_SEPARATOR,
InOutputStrings.NAME_PATTERN,
InOutputStrings.NUMBER_PATTERN,
InOutputStrings.INNER_SEPARATOR,
InOutputStrings.NAME_PATTERN));
private String assemblyName;
private int amount;
private String partName;
...
assemblyName = matcher.group("assembly");
amount = tryParse(matcher.group("amount"));
partName = matcher.group("part");
whereby
NAME_PATTERN("[a-zA-Z]+"),
NUMBER_PATTERN("(?!(0[0-9]))[0-9]+"),
COMMAND_SEPARATOR(" "),
ARGUMENT_SEPARATOR(";"),
INNER_SEPARATOR(":")
What would be a valid input here?
Could someone show me what this would look like for the input pattern
"add track <startPoint> -> <endPoint>"?
I am working on a Command-line pattern and this would be a good way of implementing the input parsing.
Also, what is the meaning of "?", "\\+" and "<assembly>"...?

What would be a valid input here?
addPart foo+42:bar
...how this would look like for the input-pattern...
add track (?<startPoint>[a-zA-Z]+) -> (?<endPoint>[a-zA-Z]+)
...what is the meaning of...
?! is a Negative Lookahead
?<assembly> is a Named Capturing Group
\\+ is a literal plus sign (not a RegEx operator)
Also note that %s is a format specifier that String.format replaces with its arguments before the pattern is compiled. It is not a RegEx operator either.
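To make the answer above concrete, here is a minimal sketch that rebuilds both patterns and reads the named groups. The class name CommandPatternDemo and the helper methods parseAddPart and parseAddTrack are made up for illustration, and the separators are inlined as plain strings instead of the InOutputStrings enum, which is not shown in full in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommandPatternDemo {
    // Building blocks copied from the question.
    static final String NAME_PATTERN = "[a-zA-Z]+";
    static final String NUMBER_PATTERN = "(?!(0[0-9]))[0-9]+";

    // String.format fills in each %s before the regex is compiled.
    static final Pattern ADD_PART = Pattern.compile(String.format(
            "addPart%s(?<assembly>%s)\\+(?<amount>%s)%s(?<part>%s)",
            " ", NAME_PATTERN, NUMBER_PATTERN, ":", NAME_PATTERN));

    // Same idea for the "add track <startPoint> -> <endPoint>" command.
    static final Pattern ADD_TRACK = Pattern.compile(String.format(
            "add track (?<startPoint>%s) -> (?<endPoint>%s)",
            NAME_PATTERN, NAME_PATTERN));

    // Returns {assembly, amount, part}, or null if the whole input does not match.
    static String[] parseAddPart(String input) {
        Matcher m = ADD_PART.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group("assembly"), m.group("amount"), m.group("part")};
    }

    // Returns {startPoint, endPoint}, or null if the whole input does not match.
    static String[] parseAddTrack(String input) {
        Matcher m = ADD_TRACK.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group("startPoint"), m.group("endPoint")};
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", parseAddPart("addPart foo+42:bar"))); // foo,42,bar
        System.out.println(parseAddPart("addPart foo+042:bar") == null);          // true
        System.out.println(String.join(",", parseAddTrack("add track A -> B")));  // A,B
    }
}
```

The second call is rejected because the negative lookahead in NUMBER_PATTERN forbids a leading zero followed by another digit.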

Related

placeholders for regular expressions

I'm pretty new to regular expressions, and I don't really know how to use them correctly yet.
As input I have a string in which I want to look for a certain pattern, let's say a word enclosed in !, like this: "Hello, my name is !John!". Now I want to replace the substring inside with something different. How do I look for the substring without knowing what is inside?
String str = "I don't !know! how to do this";
str = str.replace("!placeholder!", "X");
Just like that.
str.replaceAll("!.*!", "X") would be a way to do it. Note that replaceAll returns a new string, so assign the result back to str. There are, however, many different "placeholders" and special characters you should be aware of (at least so you can escape them). In this instance I used . to match any character and * to signify that I want any number of those. The expression then reads as "replace an exclamation point, followed by any number of characters, ending in another exclamation point, with the letter X". Be aware that .* is greedy: if the string contains several placeholders, a single match runs from the first exclamation point to the last one. Use "![^!]*!" or the reluctant "!.*?!" to replace each placeholder separately.
That would also replace the exclamation points, so perhaps you want to write str.replaceAll("!.*!", "!X!"). Or maybe you don't want to replace the string "!!", so you'd use "!.+!". But to explore all the possibilities, you should really read a tutorial like this one: https://www.vogella.com/tutorials/JavaRegularExpressions/article.html
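As a small illustration of how much the choice of quantifier matters here, the sketch below compares the greedy "!.*!" with a negated character class on a string that holds two placeholders. The class and method names are invented for this example.

```java
class PlaceholderDemo {
    // Greedy: a single match spans from the first '!' to the last '!'.
    static String replaceGreedy(String s) {
        return s.replaceAll("!.*!", "X");
    }

    // Negated class: each !...! pair is matched and replaced on its own.
    static String replaceEach(String s) {
        return s.replaceAll("![^!]+!", "X");
    }

    public static void main(String[] args) {
        String input = "Hi !John!, meet !Jane! now";
        System.out.println(replaceGreedy(input)); // Hi X now
        System.out.println(replaceEach(input));   // Hi X, meet X now
    }
}
```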
Maybe,
!\w+!\s*
or,
!\w+!
might simply work OK for those examples.
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression{
public static void main(String[] args){
final String regex = "!\\w+!\\s*";
final String string = "I don't !know! how to do this\n"
+ "Hello, my name is !John! ";
final String subst = "something_else ";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll(subst);
System.out.println(result);
}
}
Output
I don't something_else how to do this
Hello, my name is something_else
If you wish to simplify, modify, or explore the expression, regex101.com explains it in its top-right panel and shows how it matches against sample inputs.

Filter and find integers in a String with Regex

I have this long string:
String responseData = "fker.phone.bash,0,0,0"
+ "fker.phone.bash,0,0,0"
+ "fker.phone.bash,2,0,0";
What I want to do is to extract the integers in this string. I have successfully done that with this code:
String pattern = "(\\d+)";
// this pattern finds EVERY integer. I only want the integers after the comma
Pattern pr = Pattern.compile(pattern);
Matcher match = pr.matcher(responseData);
while (match.find()) {
System.out.println(match.group());
}
So far it is working, but I want to make my regex more robust because the response data I get is dynamic. Sometimes I might get an integer in the middle of the string, but I only want the last integers, meaning those after a comma.
I know the regex for "starts with" is ^ and that I have to use the comma character somehow, but I don't know how to piece it all together, and that is why I am asking for help. Thank you.
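String pattern = "(,)(\\d+)";
Then get the second group. (The + has to be inside the parentheses; with (\\d)+ the group would only hold the last digit of each number.)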
You can use positive lookbehind for that:
String pattern = "(?<=,)\\d+";
You don't need to extract any groups to use that solution, because a lookbehind is a zero-length assertion.
You can simply use the following and find by match.group(1):
String pattern = ",(\\d+)";
You can also use word boundaries to get independent numbers:
String pattern = "\\b(\\d+)\\b";
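The lookbehind variant above could be wrapped up roughly as follows. This is only a sketch: the class CommaNumbersDemo and the method numbersAfterComma are hypothetical names, and the sample input is a shortened variant of the question's data with a stray digit added in the middle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommaNumbersDemo {
    // (?<=,) asserts that a comma precedes the digits without consuming it,
    // so group() already holds just the number.
    private static final Pattern AFTER_COMMA = Pattern.compile("(?<=,)\\d+");

    static List<String> numbersAfterComma(String data) {
        List<String> result = new ArrayList<>();
        Matcher m = AFTER_COMMA.matcher(data);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // The 9 inside "fker9" is ignored because no comma precedes it.
        System.out.println(numbersAfterComma("fker9.phone.bash,2,0,10")); // [2, 0, 10]
    }
}
```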

Java regex to match the start of the word?

Objective: for a given term, I want to check whether that term exists at the start of a word. For example, if the term is 't', then in the sentence:
"This is the difficult one Thats it"
I want it to return "true" because of :
This, the, Thats
so consider:
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "/\\b"+term+"[^\\b]*?\\b/gi";
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
}
}
I am getting following Exception:
Exception in thread "main" java.util.regex.PatternSyntaxException:
Illegal/unsupported escape sequence near index 7
/\bt[^\b]*?\b/gi
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.escape(Pattern.java:2416)
at java.util.regex.Pattern.range(Pattern.java:2577)
at java.util.regex.Pattern.clazz(Pattern.java:2507)
at java.util.regex.Pattern.sequence(Pattern.java:2030)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at java.util.regex.Pattern.matches(Pattern.java:1128)
at java.lang.String.matches(String.java:2063)
at HelloWorld.main(HelloWorld.java:8)
Also the following does not work:
import java.util.regex.*;
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "\\b"+term+"gi";
//String regex = ".";
System.out.println(regex);
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
System.out.println(m.find());
}
}
Example:
{ This , one, Two, Those, Thanks }
for words This Two Those Thanks; result should be true.
Thanks
Since you're using the Java regex engine, you need to write the expressions in a way Java understands. That means removing the leading and trailing slashes and adding flags as (?flags) at the beginning of the expression, for example (?i) for case-insensitive matching.
Thus you'd need this instead:
String regex = "(?i)\\b"+term+".*?\\b";
Have a look at regular-expressions.info/java.html for more information. A comparison of supported features can be found here (just as an entry point): regular-expressions.info/refbasic.html
In Java we don't surround regex with / so instead of "/regex/flags" we just write regex. If you want to add flags you can do it with (?flags) syntax and place it in regex at position from which flag should apply, for instance a(?i)a will be able to find aa and aA but not Aa because flag was added after first a.
You can also compile your regex into Pattern like this
Pattern pattern = Pattern.compile(regex, flags);
where regex is a String (again, not enclosed in /) and flags is an integer built from constants in Pattern such as Pattern.DOTALL; when you need several flags you can combine them, as in Pattern.CASE_INSENSITIVE|Pattern.MULTILINE.
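A quick way to see the difference between an inline flag and a compile-time flag is the sketch below. The helper found is a made-up convenience method that only wraps Pattern.compile and Matcher.find.

```java
import java.util.regex.Pattern;

class FlagsDemo {
    // Compiles the regex with the given flags and reports whether it occurs anywhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // Inline flag: case-insensitivity only applies after the position of (?i).
        System.out.println(found("a(?i)a", 0, "aA")); // true
        System.out.println(found("a(?i)a", 0, "Aa")); // false

        // Compile-time flag: applies to the whole expression.
        System.out.println(found("aa", Pattern.CASE_INSENSITIVE, "Aa")); // true

        // Several flags are combined with the bitwise OR operator.
        int flags = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
        System.out.println(found("^b$", flags, "a\nB")); // true
    }
}
```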
The next thing that may confuse you is the matches method. Many people are misled by its name: they assume it checks whether the string contains something the regex can match, but in reality it checks whether the entire string can be matched by the regex.
What you seem to want is a mechanism to test whether some regex can be found at least once in a string. In that case you may either
add .* at the start and end of your regex so that the characters surrounding the element you are looking for can also be matched, but this way matches must process the entire string
use a Matcher object built from a Pattern (representing your regex) and call its find() method, which scans until it finds a match for the regex or reaches the end of the string. I prefer this approach because it does not need to process the entire string; it stops as soon as a match is found.
So your code could look like
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.find());
In case your term could contain some regex special characters but you want regex engine to treat them as normal characters you need to make sure that they will be escaped. To do this you can use Pattern.quote method which will add all necessary escapes for you, so instead of
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
for safety you should use
Pattern pattern = Pattern.compile("\\b"+Pattern.quote(term), Pattern.CASE_INSENSITIVE);
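The two points above (find versus matches, and quoting the term) can be checked with a short sketch. The class WordStartDemo and the method anyWordStartsWith are illustrative names, not part of the original answer.

```java
import java.util.regex.Pattern;

class WordStartDemo {
    // True if any word in text starts with term; term is treated as literal text.
    static boolean anyWordStartsWith(String term, String text) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String str = "This is the difficult one Thats it";

        // matches() requires the whole string to match, so this prints false.
        System.out.println(str.matches("(?i)\\bt"));

        // find() only needs one occurrence, so this prints true.
        System.out.println(anyWordStartsWith("t", str));

        // Thanks to Pattern.quote, the dot in the term is literal: "axb" does not count.
        System.out.println(anyWordStartsWith("a.b", "see axb here")); // false
        System.out.println(anyWordStartsWith("a.b", "see a.b here")); // true
    }
}
```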
String regex = "(?i)\\b"+term;
In Java, the modifiers must be inserted between "(?" and ")" and there is a variant for turning them off again: "(?-" and ")".
For finding all words beginning with "T" or "t", you may want to use Matcher's find method repeatedly. If you just need the offset, Matcher's start method returns the offset.
If you need to match the full word, use
String regex = "(?i)\\b"+term + "\\w*";
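Calling find repeatedly with the full-word expression could look like the following sketch. The names WordListDemo and wordsStartingWith are assumptions for the example, and Pattern.quote is added so that special characters in the term are treated literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordListDemo {
    // Collects every word that begins with term, ignoring case.
    static List<String> wordsStartingWith(String term, String text) {
        Pattern p = Pattern.compile("(?i)\\b" + Pattern.quote(term) + "\\w*");
        Matcher m = p.matcher(text);
        List<String> words = new ArrayList<>();
        while (m.find()) {
            // m.start() would give the offset of the word if only the position is needed.
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsStartingWith("t", "This is the difficult one Thats it"));
        // [This, the, Thats]
    }
}
```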
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("^"+Pattern.quote(term)+".*",Pattern.CASE_INSENSITIVE);
String[] strings = str.split(" ");
for (String s : strings) {
if (pattern.matcher(s).matches()) {
System.out.println(s+"-->"+true);
} else {
System.out.println(s+"-->"+false);
}
}

Extract a number from an amount from a String

I have below method which I use to extract amount from a string.
strAmountString = "$272.94/mo for 24 months Regular Price -$336.9"
public static String fnAmountFromString(String strAmountString) {
String strOutput = "";
Pattern pat = Pattern.compile("\\$(-?\\d+.\\d+)?.*");
Matcher mat = pat.matcher(strAmountString);
while(mat.find())
strOutput = mat.group(1);
return strOutput;
}
Now I have to extract string 272.94 from the string and above function works fine.
But when I have to extract 272.94 from String strAmountString = "272.94", gives me a null.
Also I have to extract the amount -336.9 from string strAmountString = "$272.94/mo for 24 months Regular Price -$336.9"
Your first issue, with trying to use 272.94, comes from the requirements of your regular expression: it requires the String to start with a $.
You could make the $ optional, for example ((\\$)?\\d+\\.\\d+), which will match both 272.94 and $272.94, but won't match -$336.9 in full; it will match the $336.9 part though. (Note the escaped \\. here: an unescaped . matches any character, not just a decimal point.)
So, working off your example, you could use ((-)?(\\$)?\\d+\\.\\d+), which will now match -$336.9 as well...
Personally, I might use ((-)?(\\$)?(-)?\\d+\\.\\d+), which will match -$336.9, $-336.9, -336.9 and 336.9
The next step would be to remove the $ from the result. Yes, you could try using another regular expression, but to be honest, String#replaceAll would be easier (remember to escape the $ as \\$ there, or use String#replace, which takes a literal)...
Note: My regular expression knowledge is pretty basic, so there might be a simpler solution
Updated with example
String value = "$272.94/mo for 24 months Regular Price -$336.9";
String regExp = "((-)?(\\$)?(-)?\\d+\\.\\d+)";
Pattern p = Pattern.compile(regExp);
Matcher matcher = p.matcher(value);
while (matcher.find()) {
System.out.println(matcher.group());
}
Which outputs...
$272.94
-$336.9
The following regex will get you your two amounts (as group 1 and group 3). The middle group is reluctant so that it does not swallow the minus sign of the second amount:
(\\$\\d+\\.\\d+)(.*?)(-?\\$\\d+\\.\\d+)
First, you need to make the dollar sign in your Pattern optional; in other words, it needs to occur zero or one time. Use the ? quantifier (written \\$? in a Java string, since $ must be escaped).
Second, if you're sure that the dollar amount will always be at the beginning of the string, you can use the ^ boundary matcher, which indicates the beginning of the line.
Similarly, if you're sure that the final dollar amount will always be at the end of the line, you can use the $ boundary matcher.
See more details here: http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
Test your patterns here: http://www.regexplanet.com/advanced/java/index.html
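Putting the suggestions from this thread together, a rough sketch for pulling the numeric amounts out of the string might look like this. The class AmountDemo and the method amounts are hypothetical; the pattern assumes an optional minus sign before an optional dollar sign and requires a decimal part, so it does not cover a form such as $-336.9.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmountDemo {
    // Group 1: optional minus sign. Group 2: the digits with a decimal part.
    // The dollar sign is optional and deliberately left outside the groups.
    private static final Pattern AMOUNT = Pattern.compile("(-)?\\$?(\\d+\\.\\d+)");

    static List<String> amounts(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = AMOUNT.matcher(text);
        while (m.find()) {
            String sign = (m.group(1) == null) ? "" : "-";
            result.add(sign + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(amounts("$272.94/mo for 24 months Regular Price -$336.9"));
        // [272.94, -336.9]
        System.out.println(amounts("272.94"));
        // [272.94]
    }
}
```

The plain "24" is skipped because the pattern insists on a decimal point followed by digits.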

Java Regex is including new line in match

I'm trying to match a regular expression to textbook definitions that I get from a website.
The definition always has the word with a new line followed by the definition. For example:
Zither
Definition: An instrument of music used in Austria and Germany It has from thirty to forty wires strung across a shallow sounding board which lies horizontally on a table before the performer who uses both hands in playing on it Not to be confounded with the old lute shaped cittern or cithern
In my attempts to get just the word (in this case "Zither") I keep getting the newline character.
I tried both ^(\w+)\s and ^(\S+)\s without much luck. I thought that maybe ^(\S+)$ would work, but that doesn't seem to successfully match the word at all. I've been testing with rubular, http://rubular.com/r/LPEHCnS0ri; which seems to successfully match all my attempts the way I want, despite the fact that Java doesn't.
Here's my snippet
String str = ...; //Here the string is assigned a word and definition taken from the internet like given in the example above.
Pattern rgx = Pattern.compile("^(\\S+)$");
Matcher mtch = rgx.matcher(str);
if (mtch.find()) {
String result = mtch.group();
terms.add(new SearchTerm(result, System.nanoTime()));
}
This is easily solved by trimming the resulting string, but that seems like it should be unnecessary if I'm already using a regular expression.
All help is greatly appreciated. Thanks in advance!
Try using the Pattern.MULTILINE option
Pattern rgx = Pattern.compile("^(\\S+)$", Pattern.MULTILINE);
This causes the regex to recognise line delimiters in your string, otherwise ^ and $ just match the start and end of the string.
Although it makes no difference for this pattern, the Matcher.group() method returns the entire match, whereas the Matcher.group(int) method returns the match of the particular capture group (...) based on the number you specify. Your pattern specifies one capture group which is what you want captured. If you'd included \s in your Pattern as you wrote you tried, then Matcher.group() would have included that whitespace in its return value.
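Both effects described above can be observed in a small sketch. The class MultilineDemo and its two helper methods are invented for illustration, and the sample text is shortened from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Returns the first line that consists of non-whitespace only, or null if there is none.
    static String firstWordLine(String text, int flags) {
        Matcher m = Pattern.compile("^(\\S+)$", flags).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Returns {group(), group(1)} for the pattern that also consumes one whitespace character.
    static String[] wholeMatchAndGroup(String text) {
        Matcher m = Pattern.compile("^(\\w+)\\s").matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(), m.group(1)};
    }

    public static void main(String[] args) {
        String text = "Zither\nDefinition: An instrument of music";

        // Without MULTILINE, $ cannot match before the line break inside the string.
        System.out.println(firstWordLine(text, 0));                 // null
        System.out.println(firstWordLine(text, Pattern.MULTILINE)); // Zither

        String[] parts = wholeMatchAndGroup(text);
        System.out.println(parts[0].length()); // 7, the word plus the newline
        System.out.println(parts[1].length()); // 6, just the word
    }
}
```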
With regular expressions, group 0 is always the complete match. In your case you want group 1, not group 0.
So changing mtch.group() to mtch.group(1) should do the trick:
String str = ...; //Here the string is assigned a word and definition taken from the internet like given in the example above.
Pattern rgx = Pattern.compile("^(\\w+)\\s");
Matcher mtch = rgx.matcher(str);
if (mtch.find()) {
String result = mtch.group(1);
terms.add(new SearchTerm(result, System.nanoTime()));
}
A late response, but if you are not using Pattern and Matcher directly (for example with String.matches or String.replaceAll), you can use the inline equivalent of DOTALL in your regex string:
(?s)[Your Expression]
Basically, (?s) tells the dot to match all characters, including line breaks.
Detailed information: http://www.vogella.com/tutorials/JavaRegularExpressions/article.html
Just replace:
String result = mtch.group();
By:
String result = mtch.group(1);
This will limit your output to the contents of the capturing group (e.g. (\\w+)) .
Try the next:
/* The regex pattern: ^(\w+)\r?\n(.*)$ */
private static final Pattern REGEX_PATTERN =
Pattern.compile("^(\\w+)\\r?\\n(.*)$");
public static void main(String[] args) {
String input = "Zither\nDefinition: An instrument of music";
System.out.println(
REGEX_PATTERN.matcher(input).matches()
); // prints "true"
System.out.println(
REGEX_PATTERN.matcher(input).replaceFirst("$1 = $2")
); // prints "Zither = Definition: An instrument of music"
System.out.println(
REGEX_PATTERN.matcher(input).replaceFirst("$1")
); // prints "Zither"
}
