Extract a number from an amount from a String - java

I have the method below, which I use to extract an amount from a string.
strAmountString = "$272.94/mo for 24 months Regular Price -$336.9"
public static String fnAmountFromString(String strAmountString) {
String strOutput = "";
Pattern pat = Pattern.compile("\\$(-?\\d+.\\d+)?.*");
Matcher mat = pat.matcher(strAmountString);
while(mat.find())
strOutput = mat.group(1);
return strOutput;
}
Now I have to extract the string 272.94 from the string above, and the function works fine.
But when I have to extract 272.94 from String strAmountString = "272.94", it gives me null.
I also have to extract the amount -336.9 from strAmountString = "$272.94/mo for 24 months Regular Price -$336.9".

Your first issue, with 272.94, comes from the requirements of your regular expression: it requires the String to be led by a $.
You could make $ part of an optional group, for example ((\\$)?\\d+\\.\\d+), which will match both 272.94 and $272.94. It won't match -$336.9 in full, though it will match the $336.9 part. (Note the escaped dot: an unescaped . matches any character, not just a decimal point.)
So, working off your example, you could use ((-)?(\\$)?\\d+\\.\\d+), which will now match -$336.9 as well...
Personally, I might use ((-)?(\\$)?(-)?\\d+\\.\\d+), which will match -$336.9, $-336.9, -336.9 and 336.9
The next step would be to remove the $ from the result. Yes, you could try using another regular expression, but to be honest, String#replaceAll (with the $ escaped) or String#replace would be easier...
Note: my regular expression knowledge is pretty basic, so there might be a simpler solution.
Updated with example
String value = "$272.94/mo for 24 months Regular Price -$336.9";
String regExp = "((-)?(\\$)?(-)?\\d+\\.\\d+)";
Pattern p = Pattern.compile(regExp);
Matcher matcher = p.matcher(value);
while (matcher.find()) {
System.out.println(matcher.group());
}
Which outputs...
$272.94
-$336.9
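As a rough sketch of that "remove the $" step, the matches can be collected and cleaned in one pass. The class and method names here (AmountExtractor, extractAmounts) are made up for illustration, and the pattern assumes every amount has a decimal part:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative, not from the question.
final class AmountExtractor {
    // Optional minus, optional dollar sign, optional minus, then digits
    // around a literal (escaped) decimal point.
    private static final Pattern AMOUNT = Pattern.compile("-?\\$?-?\\d+\\.\\d+");

    // Returns every amount found in the text with the dollar sign removed.
    static List<String> extractAmounts(String text) {
        List<String> amounts = new ArrayList<>();
        Matcher matcher = AMOUNT.matcher(text);
        while (matcher.find()) {
            // String#replace takes literal text, so "$" needs no escaping here.
            amounts.add(matcher.group().replace("$", ""));
        }
        return amounts;
    }

    public static void main(String[] args) {
        // prints [272.94, -336.9]
        System.out.println(extractAmounts("$272.94/mo for 24 months Regular Price -$336.9"));
        // prints [272.94]
        System.out.println(extractAmounts("272.94"));
    }
}
```

Because the decimal part is required, the bare "24" in "24 months" is skipped.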

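The following regex will get you your two amounts (as group 1 and group 3). The middle group is lazy, (.*?), so that it does not swallow the minus sign in front of the second amount:
(\\$\\d+\\.\\d+)(.*?)(-?\\$\\d+\\.\\d+)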

First, you need to make the dollar sign in your Pattern optional; in other words, it needs to appear zero or one times. Use the ? quantifier (the * quantifier would allow zero or more).
Second, if you're sure that the dollar amount will always be at the beginning of the string, you can use the ^ boundary matcher, which indicates the beginning of the line.
Similarly, if you're sure that the final dollar amount will always be at the end of the line, you can use the $ boundary matcher.
See more details here: http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
Test your patterns here: http://www.regexplanet.com/advanced/java/index.html
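A minimal sketch of the two boundary matchers, assuming the first amount starts the string and the last amount ends it. The class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes amounts always contain a decimal point.
final class AnchoredAmounts {
    // ^ pins the match to the start of the input; the dollar sign is optional.
    private static final Pattern LEADING = Pattern.compile("^\\$?(\\d+\\.\\d+)");
    // The final, unescaped $ pins the match to the end of the input,
    // while \\$ is a literal dollar sign.
    private static final Pattern TRAILING = Pattern.compile("(-?)\\$?(\\d+\\.\\d+)$");

    // Amount at the very start of the text, or null if there is none.
    static String leadingAmount(String text) {
        Matcher m = LEADING.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Amount at the very end of the text (sign kept, $ dropped), or null.
    static String trailingAmount(String text) {
        Matcher m = TRAILING.matcher(text);
        return m.find() ? m.group(1) + m.group(2) : null;
    }
}
```

With the string from the question, leadingAmount gives 272.94 and trailingAmount gives -336.9; with the bare "272.94" both return 272.94.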

Related

What is the Regex for decimal numbers in Java?

I am not quite sure of what is the correct regex for the period in Java. Here are some of my attempts. Sadly, they all meant any character.
String regex = "[0-9]*[.]?[0-9]*";
String regex = "[0-9]*['.']?[0-9]*";
String regex = "[0-9]*["."]?[0-9]*";
String regex = "[0-9]*[\.]?[0-9]*";
String regex = "[0-9]*[\\.]?[0-9]*";
String regex = "[0-9]*.?[0-9]*";
String regex = "[0-9]*\.?[0-9]*";
String regex = "[0-9]*\\.?[0-9]*";
But what I want is the actual "." character itself. Anyone have an idea?
What I'm trying to do actually is to write out the regex for a non-negative real number (decimals allowed). So the possibilities are: 12.2, 3.7, 2., 0.3, .89, 19
String regex = "[0-9]*['.']?[0-9]*";
Pattern pattern = Pattern.compile(regex);
String x = "5p4";
Matcher matcher = pattern.matcher(x);
System.out.println(matcher.find());
The last line is supposed to print false but prints true anyway. I think my regex is wrong though.
Update
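To match a non-negative decimal number that contains a decimal point, you need this regex:
^(\d*\.\d+|\d+\.\d*)$
or in Java syntax: "^(\\d*\\.\\d+|\\d+\\.\\d*)$"
The parentheses matter: without them, ^ applies only to the first alternative and $ only to the second. Note that this pattern requires a dot, so a plain integer such as 19 does not match.
String regex = "^(\\d*\\.\\d+|\\d+\\.\\d*)$";
String string = "123.43253";
if (string.matches(regex))
    System.out.println("true");
else
    System.out.println("false");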
Explanation for your original regex attempts:
[0-9]*\.?[0-9]*
with java escape it becomes :
"[0-9]*\\.?[0-9]*";
If you need to make the dot mandatory, remove the ? mark:
[0-9]*\.[0-9]*
But this will also accept just a dot without any digits. So, if you want the validation to require digits, use + (which means one or more) instead of * (which means zero or more). In that case it becomes:
[0-9]+\.[0-9]+
If you are on Kotlin, you can use an extension function:
fun String.findDecimalDigits() =
Pattern.compile("^[0-9]*\\.?[0-9]*").matcher(this).run { if (find()) group() else "" }!!
Your initial understanding was probably right, but you were being thrown off because matcher.find() looks for the first valid match anywhere within the string, and every one of your patterns can match a zero-length string.
I would suggest "^([0-9]+\\.?[0-9]*|\\.[0-9]+)$"
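To make the find() versus whole-string distinction concrete, here is a small sketch. The class and method names are hypothetical; the anchors are dropped from the suggested pattern because matches() already requires the whole input to fit:

```java
import java.util.regex.Pattern;

// Hypothetical helper contrasting matches() with find().
final class DecimalCheck {
    // Digits with an optional fractional part ("19", "2.", "12.2"),
    // or a leading dot followed by digits (".89").
    private static final Pattern NON_NEGATIVE_DECIMAL =
            Pattern.compile("\\d+\\.?\\d*|\\.\\d+");

    // One of the patterns from the question; every part of it is optional.
    private static final Pattern ALL_OPTIONAL = Pattern.compile("[0-9]*[.]?[0-9]*");

    // matches() succeeds only if the entire input fits the pattern.
    static boolean isNonNegativeDecimal(String input) {
        return NON_NEGATIVE_DECIMAL.matcher(input).matches();
    }

    // find() succeeds if any substring fits, including an empty one.
    static boolean looseFind(String input) {
        return ALL_OPTIONAL.matcher(input).find();
    }
}
```

Here looseFind("5p4") is true (it finds "5"), while isNonNegativeDecimal("5p4") is false and all of 12.2, 3.7, 2., 0.3, .89 and 19 are accepted.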
There are actually two ways to match a literal dot. One is backslash-escaping, as you do with \\. and the other is to enclose it in a character class (square brackets), like [.]. Most special characters, including the dot, become literal inside square brackets. If all you want is to match a literal dot, \\. shows your intention more clearly than [.]. Use [] when you need to match one of several things; for example, the regex [\\d.] means "match a single digit or a literal dot".
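A quick way to see the difference between the three spellings, using String#matches (which tests the whole string). The class and method names are made up for this sketch:

```java
// Hypothetical demo class; each method tests the whole string.
final class LiteralDot {
    // Unescaped dot: matches any single character, so "5p4" is accepted.
    static boolean anyChar(String s) {
        return s.matches("5.4");
    }

    // Backslash-escaped dot: only a literal "." is accepted.
    static boolean escaped(String s) {
        return s.matches("5\\.4");
    }

    // Dot inside a character class: also literal.
    static boolean bracketed(String s) {
        return s.matches("5[.]4");
    }

    // Character class with several members: one digit or one literal dot.
    static boolean digitOrDot(String s) {
        return s.matches("[\\d.]");
    }
}
```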
I have tested all the cases.
public static boolean isDecimal(String input) {
return Pattern.matches("^[-+]?\\d*[.]?\\d+|^[-+]?\\d+[.]?\\d*", input);
}

Filter and find integers in a String with Regex

I have this long string:
String responseData = "fker.phone.bash,0,0,0"
+ "fker.phone.bash,0,0,0"
+ "fker.phone.bash,2,0,0";
What I want to do is to extract the integers in this string. I have successfully done that with this code:
String pattern = "(\\d+)";
// this pattern finds EVERY integer. I only want the integers after the comma
Pattern pr = Pattern.compile(pattern);
Matcher match = pr.matcher(responseData);
while (match.find()) {
System.out.println(match.group());
}
So far it is working, but I want to make my regex more secure because the responsedata I get is dynamic. Sometimes I might get an integer in the middle of the string, but I only want the last integers, meaning after the comma.
I know the regex for "starts with" is ^ and that I have to pass my comma character as an argument, but I don't know how to piece it all together, and that is why I am asking for help. Thank you.
String pattern = "(,)(\\d+)";
Then get the second group.
You can use positive lookbehind for that:
String pattern = "(?<=,)\\d+";
You don't need to extract any groups to use that solution, because lookbehind is a zero-length assertion.
You can simply use the following and find by match.group(1):
String pattern = ",(\\d+)";
You can also use word boundaries to get independent numbers:
String pattern = "\\b(\\d+)\\b";
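For reference, a small sketch of the lookbehind variant collecting the results into a list. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built around the lookbehind pattern.
final class CommaIntegers {
    // (?<=,) asserts that a comma precedes the digits without consuming it,
    // so group() is the number itself.
    private static final Pattern AFTER_COMMA = Pattern.compile("(?<=,)\\d+");

    // Returns every run of digits that directly follows a comma.
    static List<String> afterCommas(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher matcher = AFTER_COMMA.matcher(text);
        while (matcher.find()) {
            numbers.add(matcher.group());
        }
        return numbers;
    }
}
```

A digit in the middle of a name, as in "fker.phone2.bash,2,0,0", is ignored because it is not preceded by a comma; that input yields 2, 0, 0.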

Java Regex is including new line in match

I'm trying to match a regular expression to textbook definitions that I get from a website.
The definition always has the word with a new line followed by the definition. For example:
Zither
Definition: An instrument of music used in Austria and Germany It has from thirty to forty wires strung across a shallow sounding board which lies horizontally on a table before the performer who uses both hands in playing on it Not to be confounded with the old lute shaped cittern or cithern
In my attempts to get just the word (in this case "Zither") I keep getting the newline character.
I tried both ^(\w+)\s and ^(\S+)\s without much luck. I thought that maybe ^(\S+)$ would work, but that doesn't seem to successfully match the word at all. I've been testing with rubular, http://rubular.com/r/LPEHCnS0ri; which seems to successfully match all my attempts the way I want, despite the fact that Java doesn't.
Here's my snippet
String str = ...; //Here the string is assigned a word and definition taken from the internet like given in the example above.
Pattern rgx = Pattern.compile("^(\\S+)$");
Matcher mtch = rgx.matcher(str);
if (mtch.find()) {
String result = mtch.group();
terms.add(new SearchTerm(result, System.nanoTime()));
}
This is easily solved by trimming the resulting string, but that seems like it should be unnecessary if I'm already using a regular expression.
All help is greatly appreciated. Thanks in advance!
Try using the Pattern.MULTILINE option
Pattern rgx = Pattern.compile("^(\\S+)$", Pattern.MULTILINE);
This causes the regex to recognise line delimiters in your string, otherwise ^ and $ just match the start and end of the string.
Although it makes no difference for this pattern, the Matcher.group() method returns the entire match, whereas the Matcher.group(int) method returns the match of the particular capture group (...) based on the number you specify. Your pattern specifies one capture group which is what you want captured. If you'd included \s in your Pattern as you wrote you tried, then Matcher.group() would have included that whitespace in its return value.
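A small sketch of the effect of Pattern.MULTILINE on the pattern from the question. The class and method names are hypothetical, and the input is assumed to have the word on its own first line:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of ^ and $ with and without MULTILINE.
final class FirstLineWord {
    // With MULTILINE, ^ and $ match at the start and end of each line.
    private static final Pattern PER_LINE =
            Pattern.compile("^(\\S+)$", Pattern.MULTILINE);
    // Without it, they match only at the start and end of the whole input.
    private static final Pattern WHOLE_INPUT = Pattern.compile("^(\\S+)$");

    // First line that consists of a single run of non-whitespace, or null.
    static String firstWordLine(String text) {
        Matcher m = PER_LINE.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // True only if the entire input is one run of non-whitespace.
    static boolean findsWithoutMultiline(String text) {
        return WHOLE_INPUT.matcher(text).find();
    }
}
```

For "Zither\nDefinition: An instrument of music", firstWordLine returns "Zither" with no trailing newline, while findsWithoutMultiline is false.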
With regular expressions, group 0 is always the complete matching string. In your case you want group 1, not group 0.
So changing mtch.group() to mtch.group(1) should do the trick:
String str = ...; //Here the string is assigned a word and definition taken from the internet like given in the example above.
Pattern rgx = Pattern.compile("^(\\w+)\\s");
Matcher mtch = rgx.matcher(str);
if (mtch.find()) {
String result = mtch.group(1);
terms.add(new SearchTerm(result, System.nanoTime()));
}
A late response, but if you are not using Pattern and Matcher, you can use this alternative of DOTALL in your regex string
(?s)[Your Expression]
Basically (?s) also tells dot to match all characters, including line breaks
Detailed information: http://www.vogella.com/tutorials/JavaRegularExpressions/article.html
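A minimal illustration of the inline flag with String#matches, where no Pattern object is involved. The class and method names are made up for this sketch:

```java
// Hypothetical demo of the (?s) inline flag.
final class InlineFlags {
    // Without (?s), the dot stops at a line break, so the whole
    // two-line input cannot be matched.
    static boolean withoutDotAll(String s) {
        return s.matches("Zither.*");
    }

    // With (?s), the dot also matches line breaks.
    static boolean withDotAll(String s) {
        return s.matches("(?s)Zither.*");
    }
}
```

For "Zither\nDefinition: An instrument of music", withoutDotAll is false and withDotAll is true.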
Just replace:
String result = mtch.group();
By:
String result = mtch.group(1);
This will limit your output to the contents of the capturing group (e.g. (\\w+)).
Try the following:
/* The regex pattern: ^(\w+)\r?\n(.*)$ */
private static final Pattern REGEX_PATTERN =
Pattern.compile("^(\\w+)\\r?\\n(.*)$");
public static void main(String[] args) {
String input = "Zither\nDefinition: An instrument of music";
System.out.println(
REGEX_PATTERN.matcher(input).matches()
); // prints "true"
System.out.println(
REGEX_PATTERN.matcher(input).replaceFirst("$1 = $2")
); // prints "Zither = Definition: An instrument of music"
System.out.println(
REGEX_PATTERN.matcher(input).replaceFirst("$1")
); // prints "Zither"
}

Matching everything after the first comma in a string

I am using java to do a regular expression match. I am using rubular to verify the match and ideone to test my code.
I got a regex from this SO solution, and it matches the group as I want it to in rubular, but my implementation in Java is not matching. When it prints 'value', it prints the value of commaSeparatedString and not matcher.group(1). I want the captured group (the output of println) to be "v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso".
String commaSeparatedString = "Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso";
//match everything after first comma
String myRegex = ",(.*)";
Pattern pattern = Pattern.compile(myRegex);
Matcher matcher = pattern.matcher(commaSeparatedString);
String value = "";
if (matcher.matches())
value = matcher.group(1);
else
value = commaSeparatedString;
System.out.println(value);
(edit: I left out that commaSeparatedString will not always contain 2 commas. Rather, it will always contain 0 or more commas)
If you don't have to solve it with regex, you can try this:
int size = commaSeparatedString.length();
value = commaSeparatedString.substring(commaSeparatedString.indexOf(",")+1,size);
Namely, the code above returns the substring that starts just after the first comma.
EDIT:
Sorry, I've omitted the simpler version. Thanks to one of the commenters, you can use this single line as well:
value = commaSeparatedString.substring(commaSeparatedString.indexOf(",") + 1);
If the string contains no comma, indexOf returns -1, so the expression returns the whole string.
The definition of the regex is wrong. It should be:
String myRegex = "[^,]*,(.*)";
You are yet another victim of Java's misguided regex method naming.
.matches() automatically anchors the regex at the beginning and end (which is in total contradiction with the very definition of "regex matching"). The method you are looking for is .find().
However, for such a simple problem, it is better to go with @DelShekasteh's solution.
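The difference is easy to check side by side. This sketch keeps the original pattern and only swaps the method; the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo comparing find() and matches() on the same pattern.
final class AfterFirstComma {
    private static final Pattern AFTER_COMMA = Pattern.compile(",(.*)");

    // find() searches for the pattern anywhere in the input.
    static String viaFind(String s) {
        Matcher m = AFTER_COMMA.matcher(s);
        return m.find() ? m.group(1) : s;
    }

    // matches() requires the pattern to cover the whole input, so it only
    // succeeds here when the input starts with the comma.
    static String viaMatches(String s) {
        Matcher m = AFTER_COMMA.matcher(s);
        return m.matches() ? m.group(1) : s;
    }
}
```

With the string from the question, viaFind returns "v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso", while viaMatches falls through to the else branch and returns the input unchanged, which is the behaviour described in the question.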
I would do this like
String commaSeparatedString = "Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso";
System.out.println(commaSeparatedString.substring(commaSeparatedString.indexOf(",")+1));
Here is another approach with limited split
String[] spl = "Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso".split(",", 2);
if (spl.length == 2)
System.out.println(spl[1]);
But IMHO Del's answer is best for your case.
I would use replaceFirst
String commaSeparatedString = "Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso";
System.out.println(commaSeparatedString.replaceFirst(".*?,", ""));
prints
v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso
or you could use the shorter but obtuse
System.out.println(commaSeparatedString.split(",", 2)[1]);
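Since the question says the string may contain no comma at all, indexing [1] directly can throw an ArrayIndexOutOfBoundsException. A guarded sketch of the limited split, with hypothetical names, might look like this:

```java
// Hypothetical helper; falls back to the whole input when there is no comma.
final class SplitAfterComma {
    static String afterFirstComma(String s) {
        // A limit of 2 splits at the first comma only and keeps the rest intact.
        String[] parts = s.split(",", 2);
        return parts.length == 2 ? parts[1] : s;
    }

    public static void main(String[] args) {
        // prints v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso
        System.out.println(afterFirstComma("Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso"));
        // prints Vtest7
        System.out.println(afterFirstComma("Vtest7"));
    }
}
```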

Extract an ISBN with regex

I have an extremely long string that I want to parse for a numeric value that occurs after the substring "ISBN". However, this grouping of 13 digits can be arranged differently via the "-" character. Examples: (these are all valid ISBNs) 123-456-789-123-4, OR 1-2-3-4-5-67891234, OR 12-34-56-78-91-23-4. Essentially, I want to use a regex pattern matcher on the potential ISBN to see if there is a valid 13 digit ISBN. How do I 'ignore' the "-" character so I can just regex for a \d{13} pattern? My function:
public String parseISBN (String sourceCode) {
int location = sourceCode.indexOf("ISBN") + 5;
String ISBN = sourceCode.substring(location); //substring after "ISBN" occurs
int i = 0;
while ( ISBN.charAt(i) != ' ' )
i++;
ISBN = ISBN.substring(0, i); //should contain potential ISBN value
Pattern pattern = Pattern.compile("\\d{13}"); //this clearly will find 13 consecutive numbers, but I need it to ignore the "-" character
Matcher matcher = pattern.matcher(ISBN);
if (matcher.find()) return ISBN;
else return null;
}
Alternative 1:
pattern.matcher(ISBN.replace("-", ""))
Alternative 2: Something like
Pattern.compile("(\\d-?){13}")
Demo of second alternative:
String ISBN = "ISBN: 123-456-789-112-3, ISBN: 1234567891123";
Pattern pattern = Pattern.compile("(\\d-?){13}");
Matcher matcher = pattern.matcher(ISBN);
while (matcher.find())
System.out.println(matcher.group());
Output:
123-456-789-112-3
1234567891123
Try this:
Pattern.compile("\\d(-?\\d){12}")
Use this pattern:
Pattern.compile("(?:\\d-?){13}")
and strip all dashes from the found isbn number
Do it in one step with a pattern that recognizes everything, with optional dashes between digits. No need to fiddle with ISBN offsets and substrings.
ISBN\W*(\d(-?\d){12})
Here \W* skips any separator (such as a space or colon) between "ISBN" and the first digit. If you want the raw number, strip dashes from the first matched subgroup afterwards.
I am not a Java guy so I won't show you code.
If you're going to be calling the method a lot, the best thing you can do is not compile the Pattern inside it. Otherwise, each time you call the method you'll spend more time creating the regex than you will actually searching for it.
But after looking at your code again, I think you have a bigger problem, performance-wise. All that business of locating "ISBN" and then creating substrings to apply the regex to is completely unnecessary. Let the regex do that stuff; it's what they're for. The following regex finds the "ISBN" sentinel and the following thirteen digits, if they're there:
static final Pattern isbnPattern = Pattern.compile(
"\\bISBN[^A-Z0-9]*+(\\d(?:-*+\\d){12})", Pattern.CASE_INSENSITIVE );
The [^A-Z0-9]*+ gobbles up whatever characters may appear between the "ISBN" and the first digit. The possessive quantifier (*+) prevents needless backtracking; if the next character is not a digit, the regex engine immediately quits that match attempt and resumes scanning for another "ISBN" instance.
I used another possessive quantifier for the optional hyphens, plus a non-capturing group ((?:...)) for the repeated portion; that gives another slight performance gain over the capturing groups most of the other responders are using. But I used a capturing group for the whole number, so it can be extracted from the overall match easily. With these changes, your method reduces to this:
public String parseISBN (String source) {
Matcher m = isbnPattern.matcher(source);
return m.find() ? m.group(1) : null;
}
...and it's much more efficient, too. Note that we haven't addressed how the strings are getting into memory. If you're doing the I/O yourself, it's possible there are significant performance gains to be achieved in that area, too.
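Putting the pieces together, a self-contained sketch that also strips the dashes from the captured group might look like the following. The class name and the dash-stripping step are additions for illustration; the pattern is the one shown above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class around the precompiled pattern.
final class IsbnParser {
    // Compiled once and reused across calls.
    private static final Pattern ISBN_PATTERN = Pattern.compile(
            "\\bISBN[^A-Z0-9]*+(\\d(?:-*+\\d){12})", Pattern.CASE_INSENSITIVE);

    // Returns the 13 digits without dashes, or null if no ISBN is found.
    static String parseIsbn(String source) {
        Matcher m = ISBN_PATTERN.matcher(source);
        return m.find() ? m.group(1).replace("-", "") : null;
    }

    public static void main(String[] args) {
        // prints 1234567891234
        System.out.println(parseIsbn("Some title, ISBN 123-456-789-123-4, hardcover"));
        // prints null
        System.out.println(parseIsbn("ISBN 12345"));
    }
}
```

Note that this only checks the shape (thirteen digits with optional dashes); it does not verify the number in any other way.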
You can strip out the dashes with string manipulation, or you could use this:
"\\b(?:\\d-?){13}\\b"
It has the added bonus of making sure the string doesn't start or end with -.
Try stripping the dashes out, and regex the new string
You can try this:
"(?:[0-9]{9}[0-9X]|[0-9]{13}|[0-9][0-9-]{11}[0-9X]|[0-9][0-9-]{15}[0-9])(?![0-9-])"
