Regular expression woes - java

I have the following data, which I would like to match with a regular expression.
PALMKERNEL OIL Mal/Indo dlrs tonne cif Rotterdam
Dec15/Jan16 890.00
Jan16/Feb16 900.00 +10.00
My code below doesn't seem to work. First, how do I express that after the 890 there could be either nothing, or a +10.00, or any other number in that format? I tried to use ?: but sometimes it completely ignores the month information that I am trying to capture. In this case I do not want to capture the +10.00 or any characters after the price of 890 or 900.
(PALMKERNEL OIL Mal\b)/(Indo dlrs tonne cif Rotterdam\b)\s*([^\s]+)\s*(\d*.?\d*)\s*([^\s]+|[+\d*.?\d*])\s*(\d*.?\d*)\s*([^\s]+|(?:[+\d*.?\d*]))

For the part with the dates and prices, this regular expression handles the two variants in your sample string.
Pattern pat = Pattern.compile(
    "\\w{3}\\d{1,2}/\\w{3}\\d{1,2}" +
    "\\s*(\\d+\\.\\d\\d)(\\s+\\+\\d+\\.\\d\\d)?" );
String s1 = "Dec15/Jan16 890.00";
String s2 = "Jan16/Feb16 900.00 +10.00";
Matcher m1 = pat.matcher( s1 );
if( m1.matches() )
    System.out.println("m1 " + m1.group(1) + ":" + m1.group(2) );
Matcher m2 = pat.matcher( s2 );
if( m2.matches() )
    System.out.println("m2 " + m2.group(1) + ":" + m2.group(2) );
Output:
m1 890.00:null
m2 900.00: +10.00
The question doesn't give enough information, so I can't tell whether there is a third alternative besides a missing change and one of the form +10.00.
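If the goal is to capture only the month pair and the price, the trailing change can sit in an optional non-capturing group, so it is matched but never captured. A minimal sketch, assuming the change is always a signed number with two decimals (allowing a minus sign is my assumption, and the class and method names are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceLine {
    // Group 1: month pair, group 2: price. The optional change is matched
    // by a non-capturing group, so it never appears as a group of its own.
    static final Pattern LINE = Pattern.compile(
        "(\\w{3}\\d{2}/\\w{3}\\d{2})\\s+(\\d+\\.\\d{2})(?:\\s+[+-]\\d+\\.\\d{2})?");

    // Returns {months, price}, or null when the line has a different shape.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] samples = { "Dec15/Jan16 890.00", "Jan16/Feb16 900.00 +10.00" };
        for (String s : samples) {
            String[] parts = parse(s);
            System.out.println(parts == null ? "no match: " + s : parts[0] + " -> " + parts[1]);
        }
    }
}
```

Because matches() has to consume the whole line, a line with anything else after the price is rejected rather than silently truncated.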

Related

Need help in regex matching

It may be very simple, but I am extremely new to regex and have a requirement where I need to do some regex matches in a string and extract the number in it. Below is my code with sample i/p and required o/p. I tried to construct the Pattern by referring to https://www.freeformatter.com/java-regex-tester.html, but my regex match itself is returning false.
Pattern pattern = Pattern.compile(".*/(a-b|c-d|e-f)/([0-9])+(#[0-9]?)");
String str = "foo/bar/Samsung-Galaxy/a-b/1"; // need to extract 1.
String str1 = "foo/bar/Samsung-Galaxy/c-d/1#P2";// need to extract 2.
String str2 = "foo.com/Samsung-Galaxy/9090/c-d/69"; // need to extract 69
System.out.println("result " + pattern.matcher(str).matches());
System.out.println("result " + pattern.matcher(str1).matches());
System.out.println("result " + pattern.matcher(str2).matches());
All of the above SOPs (System.out.println calls) print false. I am using Java 8; is there any way to match the pattern and extract the digit from the string in a single statement?
It would be great if somebody could point me to how to debug/develop the regex. Please feel free to let me know if something is not clear in my question.
You may use
Pattern pattern = Pattern.compile(".*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)");
See the regex demo
When used with matches(), the pattern above does not require explicit anchors, ^ and $.
Details
.* - any 0+ chars other than line break chars, as many as possible
/ - the rightmost / that is followed with the subsequent subpatterns
(?:a-b|c-d|e-f) - a non-capturing group matching any of the alternatives inside: a-b, c-d or e-f
/ - a / char
[^/]*? - any chars other than /, as few as possible
([0-9]+) - Group 1: one or more digits.
Java demo:
List<String> strs = Arrays.asList(
    "foo/bar/Samsung-Galaxy/a-b/1",
    "foo/bar/Samsung-Galaxy/c-d/1#P2",
    "foo.com/Samsung-Galaxy/9090/c-d/69");
Pattern pattern = Pattern.compile(".*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)");
for (String s : strs) {
    Matcher m = pattern.matcher(s);
    if (m.matches()) {
        System.out.println(s + ": \"" + m.group(1) + "\"");
    }
}
A replacing approach using the same regex with anchors added:
List<String> strs = Arrays.asList(
    "foo/bar/Samsung-Galaxy/a-b/1",
    "foo/bar/Samsung-Galaxy/c-d/1#P2",
    "foo.com/Samsung-Galaxy/9090/c-d/69");
String pattern = "^.*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)$";
for (String s : strs) {
    System.out.println(s + ": \"" + s.replaceFirst(pattern, "$1") + "\"");
}
See another Java demo.
Output:
foo/bar/Samsung-Galaxy/a-b/1: "1"
foo/bar/Samsung-Galaxy/c-d/1#P2: "2"
foo.com/Samsung-Galaxy/9090/c-d/69: "69"
Because your regex always matches the last number, I would just use replaceAll with the regex .*?(\d+)$ :
String regex = ".*?(\\d+)$";
String strResult1 = str.replaceAll(regex, "$1");
System.out.println(strResult1.matches("\\d+") ? "result " + strResult1 : "no result");
String strResult2 = str1.replaceAll(regex, "$1");
System.out.println(strResult2.matches("\\d+") ? "result " + strResult2 : "no result");
String strResult3 = str2.replaceAll(regex, "$1");
System.out.println(strResult3.matches("\\d+") ? "result " + strResult3 : "no result");
If the input does not end in a number, replaceAll returns it unchanged, so the result is checked for being all digits rather than for being empty.
Outputs
result 1
result 2
result 69
Here is a one-liner using String#replaceAll:
public String getDigits(String input) {
    String number = input.replaceAll(".*/(?:a-b|c-d|e-f)/[^/]*?(\\d+)$", "$1");
    return number.matches("\\d+") ? number : "no match";
}
System.out.println(getDigits("foo.com/Samsung-Galaxy/9090/c-d/69"));
System.out.println(getDigits("foo/bar/Samsung-Galaxy/a-b/some other text/1"));
System.out.println(getDigits("foo/bar/Samsung-Galaxy/9090/a-b/69ace"));
Output:
69
no match
no match
This works on the sample inputs you provided. Note that I added logic which will display no match for the case where ending digits could not be matched fitting your pattern. In the case of a non-match, we would typically be left with the original input string, which would not be all digits.

regex to only allow a range of doubles

I have a TextField where the input should be limited between 1.00 and 5.00.
I tried "\\d+\\.\\d+", [0-9]{1,5}(\.[0-9]+)? but neither worked, understandably.
Thanks.
As noted in a comment, there are input field types that will do this check for you. Answering your literal question, however, you can use this:
^(?:[1-4]\.[0-9]{2}|5\.00)$
You need to handle the 5.00 end of the range specially.
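If you need a double with exactly two decimal places you can use
^[1-5]\.[0-9]{2}$
Note that this also accepts 5.01 through 5.99, so the upper bound still has to be checked separately.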
^(?:[1-4](\\.\\d{1,2})?|5(\\.0{1,2})?)$
This will validate 1.00 to 5.00 to two decimal places.
^(?:[1-4]\\.[0-9]{2}|5\\.00)$
Example
// String to be scanned to find the pattern.
String line = "1.00";
String pattern = "^(?:[1-4]\\.[0-9]{2}|5\\.00)$";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find()) {
    System.out.println("Found value: " + m.group(0));
} else {
    System.out.println("NO MATCH: " + line);
}
line = "5.0000";
m = r.matcher(line);
if (m.find()) {
    System.out.println("Found value: " + m.group(0));
} else {
    System.out.println("NO MATCH: " + line);
}
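One detail that matters with find(): alternation has the lowest precedence in a regex, so in a pattern of the form ^A|B$ the ^ applies only to A and the $ only to B. A minimal sketch of the difference; the class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Anchors bind to one alternative each: "starts with 1.00-4.99" OR "ends with 5.00".
    static final Pattern LOOSE = Pattern.compile("^[1-4]\\.[0-9]{2}|5\\.00$");
    // The non-capturing group makes both anchors apply to both alternatives.
    static final Pattern STRICT = Pattern.compile("^(?:[1-4]\\.[0-9]{2}|5\\.00)$");

    static boolean looseFind(String s) {
        return LOOSE.matcher(s).find();
    }

    static boolean strictFind(String s) {
        return STRICT.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "1.00", "5.00", "1.234", "15.00", "5.01" }) {
            System.out.println(s + " loose=" + looseFind(s) + " strict=" + strictFind(s));
        }
    }
}
```

With matches() instead of find() the whole input has to match anyway, so the missing group is less visible there.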

Certain strings that should be found by a working Regex are missed, and I need help identifying why

I have a set of strings, which I cycle through, checking those against the following set of regex, to try and separate the first small section from the rest of the string. The regex works in almost all cases, but unfortunately I have no idea why it fails occasionally. I’ve been using Pattern Matcher to print out the string, if the pattern is found.
Two example working strings:
98. SORGHUM Moench - Millets Annuals or rhizomatous perennials; inflorescence …
99. MISCANTHUS Andersson - Silver-grasses Rhizomatous perennials; inflorescence …
Two example failed strings:
100. ZEA L. - Maize Annuals; male and female inflorescences separate, the …
26. POA L. (Parodiochloa C.E. Hubb.) - Meadow-grasses Annuals or perennials with or without stolons or rhizomes; sheaths overlapping or some …
Regex’s used so far:
Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
Pattern endOfGenusTwo = Pattern.compile("(?<=(^\\d+" + genusNames[l].toUpperCase() + "))");
Pattern endOfGenusThree = Pattern.compile("(?<=(\\d+\\. " + genusNames[l] + "))");
Pattern endOfGenusFour = Pattern.compile("(?<=(\\d+" + genusNames[l] + "))");
Pattern endOfGenusFive = Pattern.compile("(?<=(\\. " + genusNames[l] + "))");
The first of these is the one that's producing reliable results so far.
Example Code
Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
Matcher endOfGenusFinder = endOfGenus.matcher(descriptionPartBits[b]);
if (endOfGenusFinder.find()) {
    System.out.print(descriptionPartBits[b] + ":- ");
    System.out.print(genusNames[l] + "\n");
    String[] genusNameBits = descriptionPartBits[b].split("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
}
Desired Output. This is what is produced by strings that work. Strings that don't work simply don't appear in the output:
98. SORGHUM Moench - Millets Annuals or rhizomatous perennials:- Sorghum
99. MISCANTHUS Andersson - Silver-grasses Rhizomatous perennials:- Miscanthus
From regex tutorial:
Lookahead and lookbehind, collectively called "lookaround", are
zero-length assertions just like the start and end of line, and start
and end of word anchors explained earlier in this tutorial.
Lookahead and lookbehind only return true or false.
So I changed your code example:
Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. ZEA L))(.+)$");
// Matcher matcher = endOfGenus.matcher("98. SORGHUM Moench - Millets Annuals or rhizomatous perennials; inflorescence …");
Matcher matcher = endOfGenus.matcher("100. ZEA L. - Maize Annuals; male and female inflorescences separate, the …");
while (matcher.find()) {
    String group1 = matcher.group(1);
    String group2 = matcher.group(2);
    System.out.println("group1=" + group1);
    System.out.println("group2=" + group2);
}
Group 1 is matched by (^\\d+\\. ZEA L). Group 2 is matched by (.+).
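Since a lookbehind only asserts a position, one alternative is to drop it and capture both halves of the line with ordinary groups. A minimal sketch, assuming the genus name is interpolated into the pattern as in the question. Pattern.quote keeps characters such as . in a name from being read as regex syntax. The class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GenusSplit {
    // Returns {prefix, rest}, where prefix is "<number>. <GENUS>",
    // or null when the line does not start that way.
    static String[] splitAtGenus(String line, String genusName) {
        Pattern p = Pattern.compile(
            "^(\\d+\\. " + Pattern.quote(genusName.toUpperCase()) + ")(.*)$");
        Matcher m = p.matcher(line);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] parts = splitAtGenus("100. ZEA L. - Maize Annuals; male and female", "Zea");
        System.out.println(parts == null ? "no match" : parts[0] + " | " + parts[1]);
    }
}
```

A null result for a given line then points at a mismatch between the stored genus name and the text, which can be printed and compared directly.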

REGEX : How to escape []?

I'm working on strings like "[ro.multiboot]: [1]". How do I just select 1(it can also be 0) out of this string?
I am looking for a regex in Java.
Usually, you would do something like (assuming 0 and 1 were the only options):
^.*\[([01])\].*$
If you only wanted the value for ro.multiboot, you could change it to something like:
^.*\[ro\.multiboot\].*\[([01])\].*$
(depending on how complex any of the non-bracketed stuff is allowed to be).
These would both basically only extract the value between square brackets if it were zero or one, and capture it into a capture variable so you could use it.
Of course, regex is not a world-wide standard, nor are the environments in which you use it. That means it depends a lot on your actual environment how you will actually code this up.
For Java, the following sample program may help:
import java.util.regex.*;
class Test {
    public static void main(String args[]) {
        Pattern p = Pattern.compile("^.*\\[ro\\.multiboot\\].*\\[([01])\\].*$");
        String str;
        Matcher m;
        str = "[ro.multiboot]: [0]";
        m = p.matcher(str);
        if (m.find()) {
            System.out.println("str0 has " + m.group(1));
        }
        str = "[ro.multiboot]: [1]";
        m = p.matcher(str);
        if (m.find()) {
            System.out.println("str1 has " + m.group(1));
        }
        // [2] is neither 0 nor 1, so nothing is printed for this one
        str = "[ro.multiboot]: [2]";
        m = p.matcher(str);
        if (m.find()) {
            System.out.println("str2 has " + m.group(1));
        }
    }
}
This results in (as expected):
str0 has 0
str1 has 1
@paxdiablo's regexps are correct, but a complete answer to "How do I just select 1 (it can also be 0) out of this string?" is:
1. very simple solution
String input = "[ro.multiboot]: [1]";
String matched = input.replaceFirst( "^.*\\[ro\\.multiboot\\].*\\[([01])\\].*$", "$1" );
2. same functionality, more complicated but with better performance
String input = "[ro.multiboot]: [1]";
Pattern p = Pattern.compile( "^.*\\[ro\\.multiboot\\].*\\[([01])\\].*$" );
Matcher m = p.matcher( input );
String matched = null;
if ( m.matches() ) matched = m.group( 1 );
Performance is better when the pattern is compiled just once and reused (for example, when you are matching an array of such strings).
Notes:
in both examples, the group is the part of the regexp between ( and ) (if not escaped)
in Java you have to write \\[ in a string literal, because \[ is not a valid escape sequence for a String and does not compile
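Another way to avoid escaping the brackets and the dot by hand is Pattern.quote, which turns a string into a pattern that matches it literally. A minimal sketch, assuming the key and value are separated by a colon and optional whitespace as in the sample line; the class and method names are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultibootValue {
    // Pattern.quote makes "[ro.multiboot]" match literally, brackets and dot included.
    static final Pattern LINE = Pattern.compile(
        Pattern.quote("[ro.multiboot]") + ":\\s*\\[([01])\\]");

    // Returns "0" or "1", or null when the line has a different shape or value.
    static String value(String line) {
        Matcher m = LINE.matcher(line.trim());
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(value("[ro.multiboot]: [1]"));
        System.out.println(value("[ro.multiboot]: [2]"));
    }
}
```

The pattern is compiled once as a constant, in line with the performance note above.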

extract values with java regex

I am just starting with regex and I want to extract values from a String like this
String test="[ABC]Name:User:Date: Adresse ";
I want to extract Name, User, Date and Adresse
I can do the trick with substring and split
String test = "[ABC]Name:User:Date: Adresse ";
String test2 = test.substring(5, test.length());
System.out.println(test2);
String[] chaine = test2.split(":");
for (String s : chaine)
{
    System.out.println("Valeur " + s);
}
but I want to try it with regex. I wrote
pattern = Pattern.compile("^[(ABC)|:].");
but it doesn't work
Can you help me please?
Thanks a lot
String#split is really the best way to accomplish what you are trying to do. Having said that, with regex, the following will give you the same output:
Pattern p = Pattern.compile("^(?:\\[ABC\\])([^:]+):([^:]+):([^:]+):([^:]+)$");
Matcher m = p.matcher(test);
while (m.find()) {
    System.out.println("Valeur " + m.group(1)); // Name
    System.out.println("Valeur " + m.group(2)); // User
    System.out.println("Valeur " + m.group(3)); // Date
    System.out.println("Valeur " + m.group(4)); // Address
}
You have to escape the [ and ]. Here is a working example:
^\[(.*)\](.*):(.*):(.*):(.*)$
Note that your code is probably more easily maintained than regular expressions in cases where the regular expression becomes complex.
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems. - Jamie Zawinski
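A middle ground between the two approaches is to use a small regex only for the bracketed tag and leave the rest to split. A minimal sketch, assuming the tag is always one leading [...] block and that surrounding whitespace should be dropped; the class and method names are made up for illustration:

```java
import java.util.Arrays;

class TagSplit {
    // Strips one leading "[...]" tag, then splits on ":" while
    // discarding whitespace around each field.
    static String[] fields(String input) {
        String body = input.replaceFirst("^\\[[^\\]]*\\]", "").trim();
        return body.split("\\s*:\\s*");
    }

    public static void main(String[] args) {
        // prints [Name, User, Date, Adresse]
        System.out.println(Arrays.toString(fields("[ABC]Name:User:Date: Adresse ")));
    }
}
```

Unlike substring(5, ...), this does not depend on the tag being exactly three characters long.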
