Regex matches in myregexp.com, but not in Java

This is the regex for finding the session ID: "(?<=( ))([0-9]*)(?=(.*ABC.*DEEP.*[1-9] s))", and this is the output it is run against:
ID TYPE USER IDLE
63494 ABC DEEP 3 s
-> 70403 ABC DEEAP 0 s
82446 ABC DEEOP 52 min 27 s
On myregexp.com/signedJar.html this regex works fine, but when I run it in Java it does not return the ID. Here is the snippet:
FrameworkControls.regularExpressionPattern = Pattern.compile("(?<=( ))([0-9]*)(?=(.*ABC.*DEEP.*[1-9] s))");
String deepak = "\n" +
"\n" +
" ID TYPE USER IDLE\n" +
"\n" +
" 63494 ABC DEEP 3 s\n" +
" -> 70403 ABC DEEAP 0 s\n" +
" 82446 ABC DEEOP 52 min 27 s\n";
FrameworkControls.regularExpressionMatcher = FrameworkControls.regularExpressionPattern.matcher(deepak);
if (FrameworkControls.regularExpressionMatcher.find()) {
String h = FrameworkControls.regularExpressionMatcher.group().trim();
System.err.println(h);
}
FrameworkControls.regularExpressionMatcher.find() returns true, but the variable h is always empty. Can anyone tell me what I am doing wrong?
Expected Output: 63494

I think you're trying to print the ID of the user DEEP. If so, your code would be:
Pattern p = Pattern.compile("(?<= )[0-9]+(?=\\s*ABC\\s*DEEP\\s*[0-9]\\s*s)");
Matcher m = p.matcher(deepak);
while (m.find()) {
System.out.println(m.group());
}
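Why h comes out empty is not spelled out above. One plausible explanation, assuming the real command output pads its columns with several spaces (the pasted sample looks collapsed to single spaces): [0-9]* can match zero digits, so at a space that sits right before another space the lookbehind and lookahead both succeed and find() returns an empty match. The sketch below compares the question's pattern with the same pattern using [0-9]+; the padded sample string and the firstMatch helper are illustrative assumptions, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchDemo {

    // Hypothetical sample: the question's rows, but with multi-space column padding
    static final String PADDED = "\n  ID      TYPE   USER    IDLE\n"
            + "  63494   ABC    DEEP    3 s\n"
            + "  -> 70403   ABC    DEEAP   0 s\n";

    // Returns the text of the first match, or null when find() fails
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // [0-9]* may match zero digits, so an empty match is found between two spaces
        String star = firstMatch("(?<=( ))([0-9]*)(?=(.*ABC.*DEEP.*[1-9] s))", PADDED);
        // [0-9]+ needs at least one digit, so the match has to start on the ID itself
        String plus = firstMatch("(?<=( ))([0-9]+)(?=(.*ABC.*DEEP.*[1-9] s))", PADDED);
        System.out.println("star: \"" + star + "\"");
        System.out.println("plus: \"" + plus + "\"");
    }
}
```

With this input the first line prints an empty string between the quotes and the second prints 63494, which matches the symptom described in the question.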

I would use the following expression:
"^\\s+(\\d+)\\s+(\\w+)\\s+(\\w+).+$"
compiled with Pattern.MULTILINE, so that ^ and $ match at the start and end of every line. Then
group(1) is ID
group(2) is TYPE
group(3) is USER
If you only need the ID, you can drop the last two groups.
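A minimal sketch of the column-based expression applied to the question's sample text. Two details are assumptions worth stating: the pattern is compiled with Pattern.MULTILINE (without it, ^ and $ only match at the very start and end of the whole string), and the trailing $ is written as a plain $, because \$ is not a legal escape in a Java string literal. The class name SessionColumns and the ids helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SessionColumns {

    // The question's sample output, copied as-is
    static final String SAMPLE = "\n" +
            "\n" +
            " ID TYPE USER IDLE\n" +
            "\n" +
            " 63494 ABC DEEP 3 s\n" +
            " -> 70403 ABC DEEAP 0 s\n" +
            " 82446 ABC DEEOP 52 min 27 s\n";

    // MULTILINE makes ^ and $ match at each line boundary, not only at the ends of the input
    static final Pattern ROW =
            Pattern.compile("^\\s+(\\d+)\\s+(\\w+)\\s+(\\w+).+$", Pattern.MULTILINE);

    // Collects group(1), the ID column, from every row the pattern accepts
    static List<String> ids(String output) {
        List<String> result = new ArrayList<>();
        Matcher m = ROW.matcher(output);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher m = ROW.matcher(SAMPLE);
        while (m.find()) {
            System.out.println("ID=" + m.group(1) + " TYPE=" + m.group(2) + " USER=" + m.group(3));
        }
    }
}
```

The header row is skipped because ID is not a number, and the "-> 70403" row is skipped because "->" is not matched by \s+(\d+), so the IDs found are 63494 and 82446. Calling find() once, as in the question, would return 63494.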

Related

How to get only First Name Last Name from LDAP CN when format is last name\, first name

CN=Belzile\, Pierre,OU=LaptopUser,OU=Users,DC=Company,DC=local
I need only "Belzile Pierre" to be returned.
I need help with the regex syntax.
For the regular expression we use Java syntax https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html.
Expected Result:
Belzile Pierre
You can use this regex, which captures the last name in group 1 and the first name in group 2:
CN=([a-zA-Z]+)\\,\s+([a-zA-Z]+)
Java code,
String s = "CN=Belzile\\, Pierre,OU=LaptopUser,OU=Users,DC=Company,DC=local";
Pattern p = Pattern.compile("CN=([a-zA-Z]+)\\\\,\\s+([a-zA-Z]+)");
Matcher m = p.matcher(s);
if(m.find()) {
System.out.println(m.group(1) + " " + m.group(2));
}
This prints your expected output:
Belzile Pierre
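The answer's [a-zA-Z]+ only accepts unaccented ASCII letters, so a CN such as O'Neil\, Mary-Jane would not match at all. If names like that can occur (an assumption, not something the question states), one looser sketch takes everything up to the escaped comma as the last name and everything up to the next plain comma as the first name. CnName and lastFirst are hypothetical names, and the sketch assumes the last name itself contains no escaped comma.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CnName {

    // Regex: CN=([^,\\]+)\\,\s*([^,]+)
    // Group 1: characters up to the escaped comma "\,"; group 2: characters up to the next plain comma
    static final Pattern CN = Pattern.compile("CN=([^,\\\\]+)\\\\,\\s*([^,]+)");

    // Returns "Last First", or null when the DN has no CN of the form "Last\, First"
    static String lastFirst(String dn) {
        Matcher m = CN.matcher(dn);
        return m.find() ? m.group(1) + " " + m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastFirst("CN=Belzile\\, Pierre,OU=LaptopUser,OU=Users,DC=Company,DC=local"));
        System.out.println(lastFirst("CN=O'Neil\\, Mary-Jane,OU=Users,DC=Company,DC=local"));
    }
}
```

This prints Belzile Pierre and O'Neil Mary-Jane. The trade-off is that it accepts any character except commas and backslashes, so it no longer rejects unexpected content in the CN.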

Need help in regex matching

It may be very simple, but I am extremely new to regex. I need to match a string against a regex and extract the number in it. Below is my code with sample input and the required output. I tried to construct the Pattern by referring to https://www.freeformatter.com/java-regex-tester.html, but matches() itself returns false.
Pattern pattern = Pattern.compile(".*/(a-b|c-d|e-f)/([0-9])+(#[0-9]?)");
String str = "foo/bar/Samsung-Galaxy/a-b/1"; // need to extract 1.
String str1 = "foo/bar/Samsung-Galaxy/c-d/1#P2";// need to extract 2.
String str2 = "foo.com/Samsung-Galaxy/9090/c-d/69"; // need to extract 69
System.out.println("result " + pattern.matcher(str).matches());
System.out.println("result " + pattern.matcher(str1).matches());
System.out.println("result " + pattern.matcher(str2).matches());
All of the above println calls print "result false". I am using Java 8. Is there any way to match the pattern and extract the digits in a single statement?
It would be great if somebody could point me to how to debug and develop the regex. Please feel free to let me know if something in my question is not clear.
You may use
Pattern pattern = Pattern.compile(".*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)");
When used with matches(), the pattern above does not require explicit anchors, ^ and $.
Details
.* - any 0+ chars other than line break chars, as many as possible
/ - the rightmost / that is followed by the subsequent subpatterns
(?:a-b|c-d|e-f) - a non-capturing group matching any of the alternatives inside: a-b, c-d or e-f
/ - a / char
[^/]*? - any chars other than /, as few as possible
([0-9]+) - Group 1: one or more digits.
Java demo:
List<String> strs = Arrays.asList("foo/bar/Samsung-Galaxy/a-b/1","foo/bar/Samsung-Galaxy/c-d/1#P2","foo.com/Samsung-Galaxy/9090/c-d/69");
Pattern pattern = Pattern.compile(".*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)");
for (String s : strs) {
Matcher m = pattern.matcher(s);
if (m.matches()) {
System.out.println(s + ": \"" + m.group(1) + "\"");
}
}
A replacement-based approach using the same regex with anchors added:
List<String> strs = Arrays.asList("foo/bar/Samsung-Galaxy/a-b/1","foo/bar/Samsung-Galaxy/c-d/1#P2","foo.com/Samsung-Galaxy/9090/c-d/69");
String pattern = "^.*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)$";
for (String s : strs) {
System.out.println(s + ": \"" + s.replaceFirst(pattern, "$1") + "\"");
}
Output:
foo/bar/Samsung-Galaxy/a-b/1: "1"
foo/bar/Samsung-Galaxy/c-d/1#P2: "2"
foo.com/Samsung-Galaxy/9090/c-d/69: "69"
Because your regex always matches the last number, I would just use replaceAll with this regex: .*?(\d+)$
String regex = ".*?(\\d+)$";
String strResult1 = str.replaceAll(regex, "$1");
System.out.println(strResult1.matches("\\d+") ? "result " + strResult1 : "no result");
String strResult2 = str1.replaceAll(regex, "$1");
System.out.println(strResult2.matches("\\d+") ? "result " + strResult2 : "no result");
String strResult3 = str2.replaceAll(regex, "$1");
System.out.println(strResult3.matches("\\d+") ? "result " + strResult3 : "no result");
If the string does not end with digits, replaceAll finds no match and returns the string unchanged (not empty), which is why the result is checked with matches("\\d+") rather than isEmpty().
Outputs
result 1
result 2
result 69
Here is a one-liner using String#replaceAll:
public String getDigits(String input) {
String number = input.replaceAll(".*/(?:a-b|c-d|e-f)/[^/]*?(\\d+)$", "$1");
return number.matches("\\d+") ? number : "no match";
}
System.out.println(getDigits("foo.com/Samsung-Galaxy/9090/c-d/69"));
System.out.println(getDigits("foo/bar/Samsung-Galaxy/a-b/some other text/1"));
System.out.println(getDigits("foo/bar/Samsung-Galaxy/9090/a-b/69ace"));
69
no match
no match
This works on the sample inputs you provided. Note that I added logic that returns "no match" when no trailing digits fit your pattern: in the case of a non-match, replaceAll leaves the original input string unchanged, and that string would not be all digits.

What Regular Expression Will Get Last Price Listed On Receipt?

I have the following expression:
(?!\d+\s+TOTAL\s+)\$+\d+\.?\d+\s+
It produces the result "$23.00$0.03$23.80" from the following text:
SPEEDWAY 3007906
Wallace NC 28466
TRAM: 1086244
9/17/2017 2:12 pm
Pump 08
Regular Unleaded
8,716 # $2,639/6131
GAS TOTAL $23.00
TAX $0.03
TOTAL $23.80
Uisa
What regular expression will pull just $23.80 in this case? If I add a positive lookahead, so that the expression is "(?!\d+\s+TOTAL\s+)\$+\d+\.?\d+\s+(?=.*\$\d+\.?\d+)", the result is "$23.00$0.03" and not "$23.80".
Please help. Thanks in advance.
Try this:
(?<=^TOTAL)\s*(\$\s*\d+\.?\d*)\s*$
Make sure you compile it with the MULTILINE flag.
The full match includes the whitespace around the value, so read group 1, which leaves out the surrounding spaces, to get just the value.
Example:
String in = "SPEEDWAY 3007906\n" +
"Wallace NC 28466 \n" +
"TRAM: 1086244 \n" +
"9/17/2017 2:12 pm \n" +
"Pump 08 \n" +
"Regular Unleaded \n" +
"8,716 # $2,639/6131 \n" +
"GAS TOTAL $23.00\n" +
"TAX $0.03 \n" +
"TOTAL $23.80\n" +
"Uisa ";
Pattern p = Pattern.compile("(?<=^TOTAL)\\s*(\\$\\s*\\d+\\.?\\d*)\\s*$", Pattern.MULTILINE);
Matcher m = p.matcher(in);
if(m.find()) {
System.out.println(m.group(1));
}
This prints just the matched value, $23.80.
Maybe you could use a negative lookbehind to assert that what is before TOTAL is not GAS and capture your value in group 1.
(?<!GAS )TOTAL\s*(\$\d+\.\d+)
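A minimal Java sketch of the negative-lookbehind pattern, run against an abbreviated copy of the receipt (only the last four lines are kept, which is an assumption made to keep the example short). ReceiptTotal and total are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReceiptTotal {

    // (?<!GAS ) rejects a TOTAL that directly follows "GAS "; group 1 captures the dollar amount
    static final Pattern TOTAL = Pattern.compile("(?<!GAS )TOTAL\\s*(\\$\\d+\\.\\d+)");

    // Returns the first accepted total, or null if none is found
    static String total(String receipt) {
        Matcher m = TOTAL.matcher(receipt);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String in = "GAS TOTAL $23.00\n" +
                "TAX $0.03 \n" +
                "TOTAL $23.80\n" +
                "Uisa ";
        System.out.println(total(in));
    }
}
```

The GAS TOTAL line is rejected by the lookbehind, so this prints $23.80. Unlike the lookbehind-on-^TOTAL answer, it does not need MULTILINE because it uses no ^ or $ anchors.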

Extract a particular number from a string using regex in java

Here is my string
INPUT:
22 TIRES (2 defs)
1 AP(PEAR + ANC)E (CAN anag)
6 CHIC ("SHEIK" hom)
EXPECTED OUTPUT:
22 TIRES
1 APPEARANCE
6 CHIC
ACTUAL OUTPUT :
TIRES
APPEARANCE
CHIC
I tried the code below and got the output above:
String firstnames = a.split(" \\(.*")[0].replace("(", "").replace(")", "").replace(" + ", "");
Any idea how to extract the leading numbers as well? I don't want the numbers inside the parentheses, like the 2 in "22 TIRES (2 defs)"; I need the output "22 TIRES". Any help would be great!
I am doing it a bit differently:
String line = "22 TIRES (2 defs)\n\n1 AP(PEAR + ANC)E (CAN anag)\n\n6 CHIC (\"SHEIK\" hom)";
String pattern = "(\\d+\\s+)(.*)\\(";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while (m.find()) {
String tmp = m.group(1) + m.group(2).replaceAll("[^\\w]", "");
System.out.println(tmp);
}
I would use a single replaceAll function.
str.replaceAll("\\s+\\(.*|\\s*\\+\\s*|[()]", "");
\\s+\\(.* matches one or more spaces, the following ( and all the remaining characters on the line, so the (CAN anag) part in your example is matched.
\\s*\\+\\s* matches + along with the preceding and following spaces.
[()] matches an opening or closing parenthesis.
Finally, all the matched characters are replaced with the empty string.
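To check the single replaceAll against all three sample lines, a small harness could look like the sketch below. It assumes each line is cleaned separately; CrosswordLine and clean are hypothetical names.

```java
class CrosswordLine {

    // Removes a space-led "(...)" tail, a "+" with its surrounding spaces, and any stray parenthesis
    static String clean(String line) {
        return line.replaceAll("\\s+\\(.*|\\s*\\+\\s*|[()]", "");
    }

    public static void main(String[] args) {
        String[] inputs = {
                "22 TIRES (2 defs)",
                "1 AP(PEAR + ANC)E (CAN anag)",
                "6 CHIC (\"SHEIK\" hom)"
        };
        for (String line : inputs) {
            System.out.println(clean(line));
        }
    }
}
```

This prints 22 TIRES, 1 APPEARANCE and 6 CHIC. The ( inside AP(PEAR is not preceded by whitespace, so the first alternative does not fire there and [()] removes it instead, which keeps the leading number intact.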

Certain strings that should be found by a working Regex are missed, and I need help identifying why

I have a set of strings that I cycle through, checking each one against the following set of regexes to try to separate the short first section from the rest of the string. The regex works in almost all cases, but unfortunately I have no idea why it occasionally fails. I've been using Pattern and Matcher to print out the string whenever the pattern is found.
Two example working strings:
98. SORGHUM Moench - Millets Annuals or rhizomatous perennials; inflorescence …
99. MISCANTHUS Andersson - Silver-grasses Rhizomatous perennials; inflorescence …
Two example failed strings:
100. ZEA L. - Maize Annuals; male and female inflorescences separate, the …
26. POA L. (Parodiochloa C.E. Hubb.) - Meadow-grasses Annuals or perennials with or without stolons or rhizomes; sheaths overlapping or some …
Regexes used so far:
Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
Pattern endOfGenusTwo = Pattern.compile("(?<=(^\\d+" + genusNames[l].toUpperCase() + "))");
Pattern endOfGenusThree = Pattern.compile("(?<=(\\d+\\. " + genusNames[l] + "))");
Pattern endOfGenusFour = Pattern.compile("(?<=(\\d+" + genusNames[l] + "))");
Pattern endOfGenusFive = Pattern.compile("(?<=(\\. " + genusNames[l] + "))");
The first of these is the one that's producing reliable results so far.
Example Code
Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
Matcher endOfGenusFinder = endOfGenus.matcher(descriptionPartBits[b]);
if (endOfGenusFinder.find()) {
System.out.print(descriptionPartBits[b] + ":- ");
System.out.print(genusNames[l] + "\n");
String[] genusNameBits = descriptionPartBits[b].split("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
}
Desired output. This is what the strings that work produce; strings that don't work simply don't appear in the output:
98. SORGHUM Moench - Millets Annuals or rhizomatous perennials:- Sorghum
99. MISCANTHUS Andersson - Silver-grasses Rhizomatous perennials:- Miscanthus
From a regex tutorial:
Lookahead and lookbehind, collectively called "lookaround", are
zero-length assertions just like the start and end of line, and start
and end of word anchors explained earlier in this tutorial.
Lookahead and lookbehind only return true or false.
So I changed your code example:
Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. ZEA L))(.+)$");
// Matcher matcher = endOfGenus.matcher("98. SORGHUM Moench - Millets Annuals or rhizomatous perennials; inflorescence …");
Matcher matcher = endOfGenus.matcher("100. ZEA L. - Maize Annuals; male and female inflorescences separate, the …");
while (matcher.find()) {
String group1 = matcher.group(1);
String group2 = matcher.group(2);
System.out.println("group1=" + group1);
System.out.println("group2=" + group2);
}
Group 1 is matched by (^\\d+\\. ZEA L). Group 2 is matched by (.+).
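The answer hard-codes ZEA L in the pattern. As a hedged sketch of how the same capturing-group idea could take the genus name from a variable, the helper below builds the pattern per genus and uses ordinary groups instead of a lookbehind. GenusSplit and split are hypothetical names, and the sketch assumes each description string starts with the number followed by the upper-cased genus name, as in the examples.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GenusSplit {

    // Group 1: "<number>. <GENUS>"; group 2: the rest of the line.
    // Pattern.quote keeps any regex metacharacters in the genus name literal;
    // \b stops "POA" from matching the start of a longer word such as "POACEAE".
    static String[] split(String line, String genus) {
        Pattern p = Pattern.compile(
                "^(\\d+\\.\\s+" + Pattern.quote(genus.toUpperCase(Locale.ROOT)) + ")\\b(.*)$");
        Matcher m = p.matcher(line);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] parts = split("100. ZEA L. - Maize Annuals; male and female inflorescences separate", "Zea");
        System.out.println("group1=" + parts[0]);
        System.out.println("group2=" + parts[1]);
    }
}
```

For the ZEA line this gives group1=100. ZEA and a group2 that starts with " L. - Maize". Upper-casing with Locale.ROOT avoids locale-specific case rules; whether that matters depends on how genusNames is populated.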
