How to deal with Java regular expressions? - java

I'm dealing with regular expressions, but I'm not a big fan of them, and I have to use them for my task :(
I have spent hours looking for a solution, but every time I fail to cover all scenarios.
I have to write a regular expression template that supports these patterns:
DYYU-tx-6.7.9.7_6.1.1.0
DYYU-tx-6.7.9.7_60.11.11.09
DYYU-tx-60.70.90.70_6.1.1.0
I feel that this should be very simple to do, so excuse me if it's a stupid question :(
I tried this pattern, but it didn't work:
^.*_.*-.*-([0-9]*)\\..*\\..* $
Any help please.
I will be more than thankful.

There are many patterns in the samples that we can use to design expressions. For instance, we can start with this expression:
^[^-]+-[^-]+-[^_]+_([0-9]+\.){3}[0-9]+$
The expression is explained on the top right panel of this demo, if you wish to explore/simplify/modify it, and in this link, you can watch how it would match against some sample inputs step by step, if you like.
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        final String regex = "^[^-]+-[^-]+-[^_]+_([0-9]+\\.){3}[0-9]+$";
        final String string = "DYYU-tx-6.7.9.7_6.1.1.0\n"
                + "DYYU-tx-6.7.9.7_60.11.11.09\n"
                + "DYYU-tx-60.70.90.70_6.1.1.0";
        // MULTILINE lets ^ and $ match at the start and end of every line
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}

Try this one:
^\w+-\w+-(\d+)(\.\d+)+_(\d+\.)+\d+
Demo
In Java, most likely something like this:
"^\\w+-\\w+-(\\d+)(\\.\\d+)+_(\\d+\\.)+\\d+"
Explanation:
^\w+-\w+- first two parts, e.g. DYYU-tx-
(\d+)(\.\d+)+_ numbers separated with . ending with _, e.g. 6.7.9.7_
(\d+\.)+\d+ numbers separated with ., e.g. 60.11.11.09

Your pattern does not match because the leading .* first matches to the end of the string. The pattern then has to match an _, so it backtracks to the last underscore and tries to match the rest of the pattern.
Since there is only one underscore, the pattern next needs a hyphen after it, but there is no hyphen after the underscore, so there is no match.
Another way to write it is to use a negated character class [^-], which matches any character except a hyphen, instead of .*
^[^-]+-[^-]+-\d+(?:\.\d+){3}_\d+(?:\.\d+){3} $
Explanation
^ Start of string
[^-]+- Match 1+ times any char other than -, then a -
[^-]+- Same as above
\d+(?:\.\d+){3} Match 1+ digits, then repeat 3 times a . followed by 1+ digits
_ Match underscore
\d+(?:\.\d+){3} Match 1+ digits, then repeat 3 times a . followed by 1+ digits
[ ]$ Match a space (shown between brackets for clarity) and assert end of string
In Java
String regex = "^[^-]+-[^-]+-\\d+(?:\\.\\d+){3}_\\d+(?:\\.\\d+){3} $";
Regex demo
Note that in your example data, the strings end with a space, and so there is a space before $
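To see the difference between the two patterns, here is a minimal sketch that runs both against the sample inputs. The class and method names are made up for illustration, and the inputs assume the trailing space mentioned above.

```java
import java.util.regex.Pattern;

final class VersionPatternDemo {
    // The pattern from the question, written as a Java string literal.
    static final Pattern ORIGINAL =
            Pattern.compile("^.*_.*-.*-([0-9]*)\\..*\\..* $");

    // The negated-character-class pattern from this answer.
    static final Pattern FIXED =
            Pattern.compile("^[^-]+-[^-]+-\\d+(?:\\.\\d+){3}_\\d+(?:\\.\\d+){3} $");

    static boolean matchesOriginal(String s) {
        return ORIGINAL.matcher(s).find();
    }

    static boolean matchesFixed(String s) {
        return FIXED.matcher(s).find();
    }

    public static void main(String[] args) {
        // Sample inputs from the question, each ending with a space (assumption).
        String[] samples = {
            "DYYU-tx-6.7.9.7_6.1.1.0 ",
            "DYYU-tx-6.7.9.7_60.11.11.09 ",
            "DYYU-tx-60.70.90.70_6.1.1.0 "
        };
        for (String s : samples) {
            System.out.println("[" + s + "] original=" + matchesOriginal(s)
                    + " fixed=" + matchesFixed(s));
        }
    }
}
```

If your real data has no trailing space, drop the space before $ in the fixed pattern, since the pattern as written rejects input without it.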

DYYU-tx-(?>\d+[._]?){8}
Search for the literal DYYU-tx-
Then match, 8 times in a row, 1 or more digits optionally followed by a . or an _.
I assumed that the string would always start with DYYU-tx-, followed by 4 numbers separated by periods, an underscore, and then 4 more numbers separated by periods.
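A rough sketch of how this pattern behaves in Java, assuming whole-string matching with matches(). The class name and the PLAIN variant (the same pattern with an ordinary non-capturing group) are only there for comparison and are not part of the answer.

```java
import java.util.regex.Pattern;

final class AtomicGroupDemo {
    // Pattern from the answer: the atomic group (?>...) never gives back digits.
    static final Pattern ATOMIC = Pattern.compile("DYYU-tx-(?>\\d+[._]?){8}");

    // Same pattern with a plain non-capturing group, for comparison only.
    static final Pattern PLAIN = Pattern.compile("DYYU-tx-(?:\\d+[._]?){8}");

    static boolean atomic(String s) {
        return ATOMIC.matcher(s).matches();
    }

    static boolean plain(String s) {
        return PLAIN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "DYYU-tx-6.7.9.7_6.1.1.0",      // sample from the question
            "DYYU-tx-6.7.9.7_60.11.11.09",  // sample from the question
            "DYYU-tx-12345678",             // eight digits, no separators
            "DYYU-tx-6_7_9_7.6_1_1_0"       // separators swapped
        };
        for (String s : inputs) {
            System.out.println(s + " atomic=" + atomic(s) + " plain=" + plain(s));
        }
    }
}
```

The atomic group keeps a run of digits from being split across several repetitions, so DYYU-tx-12345678 is rejected, whereas the plain group would accept it. Both variants still accept . and _ in any position, so the swapped-separator input passes; use a stricter pattern if the position of the underscore matters.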

Related

regex match between two characters with or condition

I might be thinking about this wrong, but I'm trying to match everything between "_" characters, and
I also need the last item (the datetime stamp). Here is the string:
StringText_62_590285_20200324082238.xml
Here is the regex I have started with (java):
\_(.*?)\_
but this only matches "_62_"
See here
The result I'm trying to get to is to have 3 matches (62, 590285, 20200324082238)
Now that I'm thinking about this, am I approaching it wrong? The input string is going to be very consistent, so maybe I should just match all substrings that are numbers?
For the example provided, you may use this regex:
(?<=_)[^_.]+
RegEx Demo
RegEx Details:
(?<=_): Lookbehind to assert that we have a _ before current position
[^_.]+: Match 1+ of any character that is not a _ and not a dot
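As a minimal sketch, the lookbehind pattern can be used with Matcher.find() to collect every match. The class and method names here are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookbehindDemo {
    // (?<=_) requires an underscore just before the match;
    // [^_.]+ then consumes everything up to the next underscore or dot.
    private static final Pattern PART = Pattern.compile("(?<=_)[^_.]+");

    static List<String> extract(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = PART.matcher(input);
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(extract("StringText_62_590285_20200324082238.xml"));
        // prints [62, 590285, 20200324082238]
    }
}
```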
You can use the word boundaries with _ excluded:
(?<![^\W_])\d+(?![^\W_])
See the regex demo. Details:
(?<![^\W_]) - immediately on the left, there can't be a letter or digit
\d+ - one or more digits
(?![^\W_]) - immediately on the right, there can't be a letter or digit.
See the Java demo:
String s = "StringText_62_590285_20200324082238.xml";
Pattern pattern = Pattern.compile("(?<![^\\W_])\\d+(?![^\\W_])");
Matcher matcher = pattern.matcher(s);
List<String> results = new ArrayList<>();
while (matcher.find()){
results.add(matcher.group(0));
}
System.out.println(results); // => [62, 590285, 20200324082238]
I actually suggest not using a regex in this case but two splits instead: first split on "_", which gives 4 chunks, and keep the last three; then split the last of those on "." to drop the extension.
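The two-split idea could look roughly like this. It is only a sketch that assumes the file name always has the shape shown in the question; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SplitDemo {
    static List<String> extract(String fileName) {
        // First split: "StringText", "62", "590285", "20200324082238.xml"
        String[] chunks = fileName.split("_");

        // Keep everything after the first chunk.
        List<String> parts =
                new ArrayList<>(Arrays.asList(chunks).subList(1, chunks.length));
        if (parts.isEmpty()) {
            return parts; // no underscore in the input
        }

        // Second split: cut the extension off the last chunk.
        int last = parts.size() - 1;
        parts.set(last, parts.get(last).split("\\.")[0]);
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(extract("StringText_62_590285_20200324082238.xml"));
        // prints [62, 590285, 20200324082238]
    }
}
```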
This regex does the work anyways:
\d[0-9]*
Modifying your example, you can use something like this:
(?<=_)(.*?)(?=_|\.)
This will basically mean:
Match everything that is preceded by _
and followed by _ or .

Trying to match possible tags in string by regex

those are my possible inputs:
"#smoke"
"#smoke,#Functional1" (OR condition)
"#smoke,#Functional1,#Functional2" (OR condition)
"#smoke","#Functional1" (AND condition),
"#smoke","~#Functional1" (SKIP condition),
"~#smoke","~#Functional1" (NOT condition)
(Please note: the string input for the regex stops at the last " character on each line; no space or comma follows it!)
The regex I came up with so far is
"((?:[~#]{1}\w*)+),?"
This matches in capturing groups for the samples 1, 4, 5 and 6 but NOT 2 and 3.
I am not sure how to continue tweaking it further, any suggestions?
I would like to capture the preceding boolean meaning of the tag (eg: ~) as well please.
If you have any suggestions to pre-process the string in Java before regex that would make it simpler, I am open to that possibility as well.
Thanks.
It seems that you want to match an optional ~ followed by a # and get iterative matches for group 1. You could make use of the \G anchor, which matches either at the start of the string or at the end of the previous match.
(?:"(?=.*"$)|\G(?!^))(~?#\w+(?:,~?#\w+)*)"?[,\h]?
Explanation
(?: Non capture group
"(?=.*"$) Match " and assert that the string ends with "
| Or
\G(?!^) Assert the position at the end of the previous match, not at the start
) Close non capture group
( Capture group 1
~?#\w+(?:,~?#\w+)* Match an optional ~, then # and 1+ word characters and repeat 0+ times with a comma prepended
)"? Close group 1 and match an optional "
[,\h]? Match an optional comma or horizontal whitespace char.
Regex demo | Java demo
Example code
String regex = "(?:\"(?=.*\"$)|\\G(?!^))(~?#\\w+(?:,~?#\\w+)*)\"?[,\\h]?";
String string = "\"#smoke\"\n"
+ "\"#smoke,#Functional1\"\n"
+ "\"#smoke,#Functional1,#Functional2\"\n"
+ "\"#smoke\",\"#Functional1\"\n"
+ "\"#smoke\",\"~#Functional1\"\n"
+ "\"~#smoke\",\"~#Functional1\"";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
#smoke
#smoke,#Functional1
#smoke,#Functional1,#Functional2
#smoke
#Functional1
#smoke
~#Functional1
~#smoke
~#Functional1
Edit
If there are no consecutive matches, you could also use:
"(~?#\w+(?:,~?#\w+)*)"
Regex demo
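Since the question also asks about pre-processing in Java and about capturing the ~ prefix, here is one possible sketch built on the simpler pattern. It assumes, as in the question, that tags inside one pair of quotes are OR-ed and that separate quoted groups are combined. The class name and the "NOT " label are illustrative choices, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagParserDemo {
    // One quoted group: "tag" or "tag,tag,...", each tag optionally prefixed by ~
    private static final Pattern QUOTED =
            Pattern.compile("\"(~?#\\w+(?:,~?#\\w+)*)\"");

    // Returns one inner list per quoted group; a leading ~ becomes "NOT ".
    static List<List<String>> parse(String input) {
        List<List<String>> groups = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            List<String> tags = new ArrayList<>();
            for (String tag : m.group(1).split(",")) {
                tags.add(tag.startsWith("~") ? "NOT " + tag.substring(1) : tag);
            }
            groups.add(tags);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(parse("\"#smoke,#Functional1\""));
        System.out.println(parse("\"#smoke\",\"~#Functional1\""));
    }
}
```

Splitting each group on commas after the regex match keeps the pattern itself small, and the nested list preserves which tags shared a pair of quotes.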

Java int to fraction

How can I change 4 -1/4 -5 to 4/1 -1/4 -5/1 using regex?
String str = "4 -1/4 -5";
String regex = "(-?\\d+/\\d+)";
Matcher matcher = Pattern.compile(regex).matcher(str);
My code finds only the fractions, but I want to find the integers that are not part of a fraction.
String result = str.replaceAll("(?<!/)\\b\\d+\\b(?!/)", "$0/1");
This looks for whole numbers (\b\d+\b) that are neither preceded ((?<!/)) nor followed ((?!/)) by a slash, and appends /1 to them.
Try (?<=-| |^)(\d+)(?!\d*\/)
Explanation:
(?<=...) - positive lookbehind, asserts that what precedes matches the pattern inside
-| |^ - match either -, a space, or the beginning of a line ^
(\d+) - match one or more digits and store them in the first capturing group
(?!\d*\/) - negative lookahead, asserts that what follows is not zero or more digits followed by /.
Replace it with \1/1 ($1/1 in Java), that is, the first capturing group followed by /1
Demo
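In Java this pattern could be applied with String.replaceAll, where the group reference in the replacement is written $1. A minimal sketch, with an illustrative class and method name:

```java
final class FractionDemo {
    // Appends "/1" to every integer that is not already part of a fraction.
    static String toFractions(String s) {
        // (?<=-| |^)  preceded by a minus sign, a space, or the start of input
        // (\\d+)      the integer itself, captured as group 1
        // (?!\\d*/)   not followed by optional digits and a slash
        return s.replaceAll("(?<=-| |^)(\\d+)(?!\\d*/)", "$1/1");
    }

    public static void main(String[] args) {
        System.out.println(toFractions("4 -1/4 -5")); // prints 4/1 -1/4 -5/1
    }
}
```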
I'm not sure I understand what you want to do here, but if you want to remove the slashes you can use:
str.replaceAll("\\/", " ");
This will leave you with a string having only the integers.

Java Regular Expression for ICD Codes doesn't work

I have written the following Java code:
public void test(final String myString) {
    final String rule = "^[A-Z]\\d{2}(\\.\\d){0,2}$";
    final Pattern pattern = Pattern.compile(rule);
    final Matcher matcher = pattern.matcher(myString);
    if (!matcher.matches()) {
        System.out.println("Failure, the String " + myString + " is not valid!");
    }
}
The regular expression should accept strings of the following form:
[character are required][number are required][number are required][point is optional][number is optional][number is optional]
It is important that if a point appears in the string, at least one number must follow it!
My solution only works for Strings like J45 or J45.9
Java Strings like these are allowed:
D99
M00.0
M01.6
J98.3
T05.0
M96.81
D68.20
Java Strings like these are not allowed:
9D.0
6G
7H.
M96.811
J234.82
G687.1
GU87.11
How can I solve this problem by using regular expressions in Java?
[point is optional][number is optional][number is optional]
You need to make the dot optional and apply the {0,2} quantifier to the \d pattern only:
^[A-Z]\d{2}\.?\d{0,2}$
See the regex demo
Details:
^ - start of string anchor
[A-Z] - an uppercase ASCII letter
\d{2} - any 2 digits
\.? - an optional dot
\d{0,2} - any 0 to 2 digits
$ - end of string.
Since you are using .matches() that anchors the pattern by default, you may declare it without the ^ and $ anchors as
final String rule = "[A-Z]\\d{2}\\.?\\d{0,2}";
See an online Java test.
Or, if there must be 1 or 2 digits after a dot, or if no dot is present 0 to 2 digits are allowed, you may consider using
^[A-Z]\d{2}(?:\.\d{1,2}|\d{0,2})$
See this regex demo, and use as
final String rule = "[A-Z]\\d{2}(?:\\.\\d{1,2}|\\d{0,2})";
where (?:\\.\\d{1,2}|\\d{0,2}) matches either a . and then any 1 or 2 digits, OR any 0 to 2 digits.
This regular expression:
Requires one uppercase letter followed by 2 digits
Followed by an optional combination of a point and 1-2 digits
^[A-Z]\d{2}(?:\.\d{1,2})?$
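The three patterns suggested in the answers agree on every sample listed in the question but differ on edge cases. The sketch below compares them. The class name, the constant names, and the two extra inputs "M96." and "J234" are assumptions added for illustration.

```java
import java.util.regex.Pattern;

final class IcdCodeDemo {
    // Optional dot, then 0 to 2 digits (also accepts a trailing dot).
    static final Pattern LOOSE = Pattern.compile("[A-Z]\\d{2}\\.?\\d{0,2}");
    // Dot plus 1 or 2 digits, or 0 to 2 digits without a dot.
    static final Pattern EITHER = Pattern.compile("[A-Z]\\d{2}(?:\\.\\d{1,2}|\\d{0,2})");
    // Optional group of a dot followed by 1 or 2 digits.
    static final Pattern STRICT = Pattern.compile("[A-Z]\\d{2}(?:\\.\\d{1,2})?");

    // matches() anchors at both ends, so ^ and $ are not needed.
    static boolean isValid(String code) {
        return STRICT.matcher(code).matches();
    }

    public static void main(String[] args) {
        String[] codes = {"D99", "M96.81", "7H.", "M96.811", "M96.", "J234"};
        for (String code : codes) {
            System.out.println(code
                    + " loose=" + LOOSE.matcher(code).matches()
                    + " either=" + EITHER.matcher(code).matches()
                    + " strict=" + STRICT.matcher(code).matches());
        }
    }
}
```

Only the strict variant rejects both "M96." (a dot with nothing after it) and "J234" (extra digits without a dot), which seems closest to the rule stated in the question.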

Constructing regex pattern to match sentence

I'm trying to write a regex pattern that will match any sentence that begins with one or more tabs and/or whitespace characters.
For example, I want my regex pattern to be able to match " hello there I like regex!",
but I'm scratching my head over how to match the words after "hello". So far I have this:
String REGEX = "(?s)(\\p{Blank}+)([a-z][ ])*";
Pattern PATTERN = Pattern.compile(REGEX);
Matcher m = PATTERN.matcher(" asdsada adf adfah.");
if (m.matches()) {
System.out.println("hurray!");
}
Any help would be appreciated. Thanks.
String regex = "^\\s+[A-Za-z,;'\"\\s]+[.?!]$";
^ means "begins with"
\\s means white space
+ means 1 or more
[A-Za-z,;'"\\s] means any letter, ,, ;, ', ", or whitespace character
$ means "ends with"
An example regex to match sentences by the definition: "A sentence is a series of characters, starting with at least one whitespace character, that ends in one of ., ! or ?" is as follows:
\s+[^.!?]*[.!?]
Note that newline characters will also be included in this match.
A sentence starts with a word boundary (hence \b) and ends with one or more terminators. Thus:
\b[^.!?]+[.!?]+
https://regex101.com/r/7DdyM1/1
This gives pretty accurate results. However, it will not handle fractional numbers. E.g. This sentence will be interpreted as two sentences:
The value of PI is 3.141...
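A small sketch of that pattern in Java, including the limitation just described. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SentenceDemo {
    // Word boundary, then any run of non-terminators, then 1+ terminators.
    private static final Pattern SENTENCE = Pattern.compile("\\b[^.!?]+[.!?]+");

    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sentences("Hello there. How are you?"));
        // prints [Hello there., How are you?]

        // The decimal point is treated as a sentence terminator:
        System.out.println(sentences("The value of PI is 3.141..."));
        // prints [The value of PI is 3., 141...]
    }
}
```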
If you are looking to match all strings starting with whitespace, you can try using the regular expression "^\s+.*".
This tool could help you to test your regular expression efficiently.
http://www.rubular.com/
Based upon what you desire and asked for, the following will work.
String s = " hello there I like regex!";
Pattern p = Pattern.compile("^\\s+[a-zA-Z\\s]+[.?!]$");
Matcher m = p.matcher(s);
if (m.matches()) {
System.out.println("hurray!");
}
See working demo
String regex = "(?<=^|(\\.|!|\\?) |\n|\t|\r|\r\n) *\\(?[A-Z][^.!?]*((\\.|!|\\?)(?! |\n|\r|\r\n)[^.!?]*)*(\\.|!|\\?)(?= |\n|\r|\r\n)";
This matches any sentence following the definition 'a sentence starts with a capital letter and ends with a dot, exclamation mark, or question mark'.
The below regex pattern matches sentences in a paragraph.
Pattern pattern = Pattern.compile("\\b[\\w\\p{Space}“”’\\p{Punct}&&[^.?!]]+[.?!]");
Reference: https://devsought.com/regex-pattern-to-match-sentence
