Extracting long/float numbers from a string in Java

I have input like this:
2 book at 12.99
4 potato chips at 3.99
I want to extract the numeric values from each line and store them in variables.
For example, from the line "2 book at 12.99" I want to extract Quantity = 2 and Price = 12.99.

You can use:
Pattern p = Pattern.compile("(\\d+)\\D+(\\d+(?:\\.\\d+)?)"); // "\\." matches a literal dot
Matcher mr = p.matcher("4 potato chips at 3.99");
if (mr.find()) {
    System.out.println(mr.group(1) + " :: " + mr.group(2));
}
OUTPUT:
4 :: 3.99
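The question asks to store the values in variables, not just print them. A minimal sketch, assuming every line looks like "<quantity> <item> at <price>": it reuses the pattern above and converts the two groups with Integer.parseInt and Double.parseDouble. LineItemParser, Item and parse are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineItemParser {
    // Group 1: the leading quantity; group 2: the price with an optional fractional part.
    // The dot is escaped: an unescaped "." would match any character.
    private static final Pattern LINE = Pattern.compile("(\\d+)\\D+(\\d+(?:\\.\\d+)?)");

    // Hypothetical holder for the two extracted values.
    static final class Item {
        final int quantity;
        final double price;

        Item(int quantity, double price) {
            this.quantity = quantity;
            this.price = price;
        }
    }

    // Returns null when the line does not contain "<number> ... <number>".
    static Item parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new Item(Integer.parseInt(m.group(1)), Double.parseDouble(m.group(2)));
    }

    public static void main(String[] args) {
        for (String line : new String[] {"2 book at 12.99", "4 potato chips at 3.99"}) {
            Item item = parse(line);
            System.out.println("Quantity = " + item.quantity + ", Price = " + item.price);
        }
    }
}
```

If the price is money, new java.math.BigDecimal(m.group(2)) keeps the exact decimal value instead of a binary double; only the conversion line changes.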

Regex
(\d+)[^\d]+([+-]?[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]{2})?)
Description (Example)
/^(\d+)[^\d]+([+-]?[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]{2})?)$/gm
^ Start of line
1st Capturing group (\d+)
\d 1 to infinite times [greedy] Digit [0-9]
Negated char class [^\d] 1 to infinite times [greedy] matches any character except:
\d Digit [0-9]
2nd Capturing group ([+-]?[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]{2})?)
Char class [+-] 0 to 1 times [greedy] matches:
+- One of the following characters +-
Char class [0-9] 1 to 3 times [greedy] matches:
0-9 A character range between Literal 0 and Literal 9
(?:,?[0-9]{3}) Non-capturing Group 0 to infinite times [greedy]
, 0 to 1 times [greedy] Literal ,
Char class [0-9] 3 times [greedy] matches:
0-9 A character range between Literal 0 and Literal 9
(?:\.[0-9]{2}) Non-capturing Group 0 to 1 times [greedy]
\. Literal .
Char class [0-9] 2 times [greedy] matches:
0-9 A character range between Literal 0 and Literal 9
$ End of line
g modifier: global. All matches (don't return on first match)
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
Capture Group 1: Contains the Quantity
Capture Group 2: Contains the Amount
Java
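// Requires java.util.regex.Matcher, Pattern and PatternSyntaxException
String subjectString = "2 book at 12.99\n4 potato chips at 3.99";
try {
    // Anchored with ^...$ in MULTILINE mode, like the demo above, so that an unformatted
    // value such as 1234.50 is not cut short to 123 (case-insensitive flags are not needed for digits)
    Pattern regex = Pattern.compile("^(\\d+)[^\\d]+([+-]?[0-9]{1,3}(?:,?[0-9]{3})*(?:\\.[0-9]{2})?)$", Pattern.MULTILINE);
    Matcher regexMatcher = regex.matcher(subjectString);
    while (regexMatcher.find()) {
        for (int i = 1; i <= regexMatcher.groupCount(); i++) {
            // matched text: regexMatcher.group(i)
            // match start: regexMatcher.start(i)
            // match end: regexMatcher.end(i)
            System.out.println("Group " + i + ": " + regexMatcher.group(i));
        }
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}
Note: this Java code is just an example; I don't code in Java.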
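Group 2 of this pattern also accepts thousands separators such as 1,299.99, which neither Double.parseDouble nor BigDecimal accepts as-is. A hedged sketch (AmountParser and amount are hypothetical names) uses the anchored, multiline form of the pattern from the demo and strips the commas before converting:

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmountParser {
    // Group 1 = quantity, group 2 = amount; anchored per line as in the demo (/gm).
    private static final Pattern LINE = Pattern.compile(
            "^(\\d+)[^\\d]+([+-]?[0-9]{1,3}(?:,?[0-9]{3})*(?:\\.[0-9]{2})?)$",
            Pattern.MULTILINE);

    // Returns the amount on the first matching line, or null if no line matches.
    static BigDecimal amount(String text) {
        Matcher m = LINE.matcher(text);
        if (!m.find()) {
            return null;
        }
        // BigDecimal does not understand grouping commas, so remove them first.
        return new BigDecimal(m.group(2).replace(",", ""));
    }

    public static void main(String[] args) {
        System.out.println(amount("3 laptops at 1,299.99"));
        System.out.println(amount("2 book at 12.99"));
    }
}
```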

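You can use the MessageFormat class. Here is a working example:
import java.text.MessageFormat;
import java.text.ParseException;
import java.util.Locale;

// Locale.US makes "." the decimal separator regardless of the default locale
MessageFormat f = new MessageFormat("{0,number,#.##} {2} at {1,number,#.##}", Locale.US);
try {
    Object[] result = f.parse("4 potato chips at 3.99");
    // result[0] and result[1] are java.lang.Number instances; result[2] is the item name
    System.out.println(result[0] + ":" + result[1]);
} catch (ParseException ex) {
    // handle parse error
}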
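MessageFormat.parse returns an Object[] indexed by argument number, so the values still have to be converted before they can be stored in typed variables. A sketch, assuming the same pattern as above; MessageFormatParse and parseLine are hypothetical names. Because MessageFormat instances are not synchronized, it creates one per call.

```java
import java.text.MessageFormat;
import java.text.ParseException;
import java.util.Locale;

class MessageFormatParse {
    // Same pattern as above; {2} has no format, so it takes the text up to " at ".
    private static final String PATTERN = "{0,number,#.##} {2} at {1,number,#.##}";

    // Wraps the checked ParseException so callers get an unchecked error instead.
    static Object[] parseLine(String line) {
        try {
            return new MessageFormat(PATTERN, Locale.US).parse(line);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable line: " + line, e);
        }
    }

    public static void main(String[] args) {
        Object[] result = parseLine("2 book at 12.99");
        // Numeric arguments come back as java.lang.Number instances.
        int quantity = ((Number) result[0]).intValue();
        double price = ((Number) result[1]).doubleValue();
        String item = (String) result[2];
        System.out.println(quantity + " x " + item + " at " + price);
    }
}
```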

Related

How can I get values of the same group several times in Matcher regex Java?

After many searches, I'm turning to you.
I made a regex in Java:
^([a-zA-Z][a-zA-Z0-9]{0,10})( \(([0-9]),([a-zA-Z][a-zA-Z0-9]{0,10})\))+$
When I try this: "S1 (4,S5)"
It returns:
Group 1 -> "S1"
Group 2 -> " (4,S5)"
Group 3 -> "(4,S5)"
It works well.
But when I try this: "S1 (4,S5) (2,S3)"
it returns:
Group 1 -> "S1"
Group 2 -> " (2,S3)"
Group 3 -> "(2,S3)"
It doesn't return (4,S5).
How can I get the same group several times?
Thanks for your help!
A repeated capturing group only keeps the text of its last iteration, which is why (4,S5) is lost. Instead, you could use the \G anchor to get contiguous matches with 2 capture groups, one match per (n,Sx) pair.
(?:^([a-zA-Z][a-zA-Z0-9]{0,10})|\G(?!^))\h*(\([0-9]+,[a-zA-Z][a-zA-Z0-9]{0,10}\))
The pattern matches:
(?: Non capture group (for the alternation)
^ Start of string
( Capture group 1
[a-zA-Z][a-zA-Z0-9]{0,10} Match a single char a-zA-Z followed by 0-10 chars a-zA-Z0-9
) Close group 1
| Or
\G(?!^) Assert the position at the end of the previous match (\G also matches at the start of the string). Since the first part of the alternation already handles the start of the string, rule out that position with the negative lookahead (?!^)
) Close non capture group
\h* Match optional horizontal whitespace characters
( Capture group 2
\( Match (
[0-9]+, Match 1+ digits and a comma
[a-zA-Z][a-zA-Z0-9]{0,10} Match a single char a-zA-Z followed by 0-10 chars a-zA-Z0-9
\) Match )
) Close group 2
String regex = "(?:^([a-zA-Z][a-zA-Z0-9]{0,10})|\\G(?!^))\\h*(\\([0-9]+,[a-zA-Z][a-zA-Z0-9]{0,10}\\))";
String string = "S1 (4,S5)\n"
        + "S2 (4,S5) (2,S3)";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    for (int i = 1; i <= matcher.groupCount(); i++) {
        if (matcher.group(i) != null) {
            System.out.println("Group " + i + ": " + matcher.group(i));
        }
    }
}
Output
Group 1: S1
Group 2: (4,S5)
Group 1: S2
Group 2: (4,S5)
Group 2: (2,S3)
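On continuation matches (the \G branch) group 1 is null, so to rebuild "name -> all pairs" you have to remember the last name seen. A sketch of that bookkeeping with the same pattern; RepeatedGroups and collect are hypothetical names, and the LinkedHashMap is only there to keep input order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroups {
    // Same \G-based pattern as above.
    private static final Pattern P = Pattern.compile(
            "(?:^([a-zA-Z][a-zA-Z0-9]{0,10})|\\G(?!^))\\h*(\\([0-9]+,[a-zA-Z][a-zA-Z0-9]{0,10}\\))",
            Pattern.MULTILINE);

    // Maps each leading name to every "(n,Name)" pair that follows it on its line.
    static Map<String, List<String>> collect(String input) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher m = P.matcher(input);
        String current = null;
        while (m.find()) {
            if (m.group(1) != null) {
                // First match on a line: group 1 holds the name.
                current = m.group(1);
            }
            // Continuation matches leave group 1 null, so the last name is reused.
            result.computeIfAbsent(current, k -> new ArrayList<>()).add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(collect("S1 (4,S5)\nS2 (4,S5) (2,S3)"));
    }
}
```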

Using Regex Pattern and Matcher to print out number once only

This piece of code is part of my programme and I am trying to print out the last integer value of the string only whenever the operator and the equals sign are together (e.g. ^=, *=, etc.).
Hence, if I enter 4 4 ^ 4 ^ 4 ^=, I would only want to print out "4". The same counts if the number 4 is directly before the "^=", e.g. 4 4 ^ 4 ^ 4^=.
My code is this:
if ((input.endsWith("^=")) | (input.endsWith("*=")) |
        (input.endsWith("+=")) | (input.endsWith("-=")) |
        (input.endsWith("%=")) | (input.endsWith("/=")))
{
    Pattern p = Pattern.compile("[^\\d]*[\\d]+[^\\d]+([\\d]+)");
    Matcher m = p.matcher(input);
    if (m.find()) {
        System.out.println(m.group(1)); // second matched digits
    }
}
Currently my code prints out the number 4 multiple times, but I would only want to print it once. Any help is appreciated.
Thank you!
You might use:
([0-9]+)\h*[\^+%/*-]=(?!.*[\^+%/*-]=)
([0-9]+) Capture 1+ digits 0-9 in group 1
\h* Match 0+ horizontal whitespace chars
[\^+%/*-]= Match any of the listed operators followed by =
(?!.*[\^+%/*-]=) Negative lookahead, assert what is on the right does not contain an operator followed by an equals sign
In Java
final String regex = "([0-9]+)\\h*[\\^+%/*-]=(?!.*[\\^+%/*-]=)";
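Used with a single find(), this pattern yields exactly one match: the number right before the last "op=" token, so it can only be printed once. A small sketch around that regex; LastOperand and lastOperand are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastOperand {
    // Digits, optional horizontal whitespace, an operator and "=", with no later "op=".
    private static final Pattern P =
            Pattern.compile("([0-9]+)\\h*[\\^+%/*-]=(?!.*[\\^+%/*-]=)");

    // Returns the number directly before the last "op=" token, or null if there is none.
    static String lastOperand(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastOperand("4 4 ^ 4 ^ 4 ^="));
        System.out.println(lastOperand("4 4 ^ 4 ^ 4^="));
    }
}
```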
Try
(\d+)\s*[-+*/^%]=$
Find 1 or more digits and capture them
if they're followed by 0 or more spaces
followed by -, +, *, /, ^ or %
followed by =
followed by the end of the string

Match all occurrences Regex Java

I'd like to recognize all "word-number-word" sequences in a string with the Java regex API.
For example, if I have "ABC-122-JDHFHG-456-MKJD", I'd like the output: [ABC-122-JDHFHG, JDHFHG-456-MKJD].
String test = "ABC-122-JDHFHG-456-MKJD";
Matcher m = Pattern.compile("(([A-Z]+)-([0-9]+)-([A-Z]+))+")
        .matcher(test);
while (m.find()) {
    System.out.println(m.group());
}
The code above returns only "ABC-122-JDHFHG".
Any ideas?
The last ([A-Z]+) matches and consumes JDHFHG, so the regex engine only "sees" -456-MKJD after the first match, and the pattern does not match this string remainder.
You want to get "whole word" overlapping matches.
Use:
String test = "ABC-122-JDHFHG-456-MKJD";
Matcher m = Pattern.compile("(?=\\b([A-Z]+-[0-9]+-[A-Z]+)\\b)")
        .matcher(test);
while (m.find()) {
    System.out.println(m.group(1));
} // => [ ABC-122-JDHFHG, JDHFHG-456-MKJD ]
Pattern details
(?= - start of a positive lookahead that matches a position that is immediately followed with
\\b - a word boundary
( - start of a capturing group (to be able to grab the value you need)
[A-Z]+ - 1+ ASCII uppercase letters
- - a hyphen
[0-9]+ - 1+ digits
- - a hyphen
[A-Z]+ - 1+ ASCII uppercase letters
) - end of the capturing group
\\b - a word boundary
) - end of the lookahead construct.
Here you go: overlap the last word.
Make an array out of capture group 1.
Basically, find three parts (word-number-word) but consume only two (word-number-). This makes the next match position start on the shared last word.
(?=(([A-Z]+-\d+-)[A-Z]+))\2
https://regex101.com/r/Sl5FgT/1
Formatted
(?= # Assert to find
( # (1 start), word,num,word
( # (2 start), word,num
[A-Z]+
-
\d+
-
) # (2 end)
[A-Z]+
) # (1 end)
)
\2 # Consume word,num
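"Make an array out of capture group 1" can look like this in Java: the lookahead captures the whole triple in group 1, and \2 then consumes only "word-number-", so the next find() starts on the shared word. OverlappingTriples and triples are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingTriples {
    // Group 1: word-number-word (inside the lookahead); group 2: word-number-, consumed by \2.
    private static final Pattern P = Pattern.compile("(?=(([A-Z]+-\\d+-)[A-Z]+))\\2");

    static List<String> triples(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(triples("ABC-122-JDHFHG-456-MKJD"));
    }
}
```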

How to check whether a line starts with foo or bar?

In my case I have some words like:
**********menu 1************
gaurav
saurav
amit
avanish
**********menu 2************
gauravqwe
sourav
anit
abhishek
Now I want to check whether the item "gaurav" is in menu 1 or menu 2.
If "gaurav" is in menu 1, return true; otherwise return false.
I tried:
class Regex_Test {
    public void checker(String Regex_Pattern) {
        Scanner Input = new Scanner(System.in);
        String Test_String = Input.nextLine();
        Pattern p = Pattern.compile(Regex_Pattern);
        Matcher m = p.matcher(Test_String);
        System.out.println(m.find());
    }
}
public class CheckHere {
    public static void main(String[] args) {
        Regex_Test tester = new Regex_Test();
        tester.checker("^[gsa][amv]"); // Use \\ instead of using \
    }
}
But it returns true for "gauravqwe".
I need the regular expression string for the above question.
Condition: the regex string must be less than 15 characters.
Use the word-boundary metacharacter \b:
\bgaurav\b
References:
http://www.regular-expressions.info/wordboundaries.html
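In a Java string literal the backslashes must be doubled. A small comparison sketch (WholeWordCheck is a hypothetical name) shows the asker's ^[gsa][amv] accepting "gauravqwe" while the word-boundary version rejects it. Note that \b alone does not check which menu the word is in.

```java
import java.util.regex.Pattern;

class WholeWordCheck {
    // The asker's pattern: only looks at the first two characters.
    static final Pattern PREFIX = Pattern.compile("^[gsa][amv]");
    // Word boundaries on both sides: "gaurav" must be a whole word.
    static final Pattern WHOLE_WORD = Pattern.compile("\\bgaurav\\b");

    public static void main(String[] args) {
        for (String input : new String[] {"gaurav", "gauravqwe"}) {
            System.out.println(input + ": prefix=" + PREFIX.matcher(input).find()
                    + ", wholeWord=" + WHOLE_WORD.matcher(input).find());
        }
    }
}
```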
Less than 15 characters
To do this in less than 15 characters you need a regex that looks like this:
^gaurav(\r|\Z)
This regex is a subset of the following answer.
Description
I'd do it in one pass by building a regex that grabs the menu title of the matching entry. This regex will do the following:
find gaurav in the source string
return the menu name from the matching section
if the string does not contain a match, return nothing
The Regex
[*]{9,}([^*]*?)[*]{9,}(?:(?![*]{9,}).)*^gaurav(?:\r|\Z)
Note this regex uses the following options: case-insensitive, multiline (with ^ and $ matching start and end of line), and dot matches new line (with . matching \n)
Explanation
This construct (?:(?![*]{9,}).)* is where all the magic happens. This forces the searching to move forward through the string, but does not allow the pattern matching to span multiple ********** delimited segments.
The ^ and (?:\r|\Z) constructs force the regex engine to match the full line, not just its initial characters. Example: if you're looking for gaurav, then gauravqwe will not be matched.
NODE EXPLANATION
----------------------------------------------------------------------
[*]{9,} any character of: '*' (at least 9 times
(matching the most amount possible))
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[^*]*? any character except: '*' (0 or more
times (matching the least amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
[*]{9,} any character of: '*' (at least 9 times
(matching the most amount possible))
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
[*]{9,} any character of: '*' (at least 9
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
. any character
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
gaurav 'gaurav'
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
\r '\r' (carriage return)
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
\Z before an optional \n, and the end of
the string
----------------------------------------------------------------------
) end of grouping
Java Code Example
import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Module1 {
    public static void main(String[] args) {
        // (?:\r|\Z) after "gaurav" assumes "\r\n" line endings (or "gaurav" on the last line)
        String sourcestring = String.join("\r\n",
                "**********menu 1************", "gaurav", "saurav", "amit", "avanish",
                "**********menu 2************", "gauravqwe", "sourav", "anit", "abhishek");
        Pattern re = Pattern.compile("[*]{9,}([^*]*?)[*]{9,}(?:(?![*]{9,}).)*^gaurav(?:\\r|\\Z)",
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);
        Matcher m = re.matcher(sourcestring);
        int mIdx = 0;
        while (m.find()) {
            for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
                System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
            }
            mIdx++;
        }
    }
}
Returns
$matches Array:
(
[0] => Array
(
[0] => **********menu 1************
gaurav
)
[1] => Array
(
[0] => menu 1
)
)
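If only the true/false answer from the question is needed, a different technique, plain line-by-line scanning instead of one large regex, avoids the line-ending sensitivity of (?:\r|\Z). A sketch, assuming menu headers always have at least nine asterisks on each side; MenuLookup, SAMPLE and menuOf are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MenuLookup {
    // Header lines such as "**********menu 1************"; group 1 is the menu title.
    private static final Pattern HEADER = Pattern.compile("\\*{9,}(.*?)\\*{9,}");

    static final String SAMPLE = String.join("\n",
            "**********menu 1************", "gaurav", "saurav", "amit", "avanish",
            "**********menu 2************", "gauravqwe", "sourav", "anit", "abhishek");

    // Returns the title of the menu containing a line exactly equal to item, or null.
    // "\\R" splits on \n, \r\n and the other Unicode line breaks.
    static String menuOf(String text, String item) {
        String currentMenu = null;
        for (String line : text.split("\\R")) {
            Matcher header = HEADER.matcher(line);
            if (header.matches()) {
                currentMenu = header.group(1);
            } else if (line.equals(item)) {
                return currentMenu;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("menu 1".equals(menuOf(SAMPLE, "gaurav")));
        System.out.println("menu 1".equals(menuOf(SAMPLE, "gauravqwe")));
    }
}
```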

Extracting numbers from a String in Java by splitting on a regex

I want to extract numbers from Strings like this:
String numbers[] = "M0.286-3.099-0.44c-2.901,-0.436,,,123,0.123,.34".split(PATTERN);
From such String I'd like to extract these numbers:
0.286
-3.099
-0.44
-2.901
-0.436
123
0.123
.34
That is:
There can be garbage characters like "M", "c", ","
The "-" sign is to include in the number, not to split on
A "number" can be anything that Float.parseFloat can parse, so .34 is valid
What I have so far:
String PATTERN = "([^\\d.-]+)|(?=-)";
Which works to some degree, but obviously far from perfect:
Doesn't skip the starting garbage "M" in the example
Doesn't handle consecutive garbage, like the ,,, in the middle
How to fix PATTERN to make it work?
You could use a regex like this:
([-.]?\d+(?:\.\d+)?)
Match Information:
MATCH 1
1. [1-6] `0.286`
MATCH 2
1. [6-12] `-3.099`
MATCH 3
1. [12-17] `-0.44`
MATCH 4
1. [18-24] `-2.901`
MATCH 5
1. [25-31] `-0.436`
MATCH 6
1. [34-37] `123`
MATCH 7
1. [38-43] `0.123`
MATCH 8
1. [44-47] `.34`
Update
Jawee's approach
As Jawee pointed out in his comment, there is a problem with .34.34, so you can use his regex, which fixes this problem. Thanks to Jawee for pointing that out.
(-?(?:\d+)?\.?\d+)
For a graphical view of what happens behind this regex, you can paste it into Debuggex.
Engine explanation:
1st Capturing group (-?(?:\d+)?\.?\d+)
-? -> matches the character - literally, zero or one time
(?:\d+)? -> \d+ matches a digit [0-9] one or more times (in an optional non-capturing group)
\.? -> matches the character . literally, zero or one time
\d+ -> matches a digit [0-9] one or more times
Try this one: (-?(?:\d+)?\.?\d+)
Thanks a lot for nhahtdh's comments. That's true; we can update it as follows:
[-+]?(?:\d+(?:\.\d*)?|\.\d+)
Actually, if we take into account every string format that Float.parseFloat accepts (e.g. Infinity, -Infinity, 00, 0xffp23d, 88F), it gets a bit more complicated. However, we can still implement it with the Java code below:
String sign = "[-+]?";
String hexFloat = "(?>0[xX](((\\p{XDigit}+)\\.?)|((\\p{XDigit}*)\\.(\\p{XDigit}+)))[pP]([-+])?(\\p{Digit}+)[fFdD]?)";
String nan = "(?>NaN)";
String inf = "(?>Infinity)";
String dig = "(?>\\d+(?:\\.\\d*)?|\\.\\d+)";
String exp = "(?:[eE][-+]?\\d+)?";
String suf = "[fFdD]?";
String digFloat = "(?>" + dig + exp + suf + ")";
String wholeFloat = sign + "(?>" + hexFloat + "|" + nan + "|" + inf + "|" + digFloat + ")";
String s = "M0.286-3.099-0.44c-2.901,-0.436,,,123,0.123d,.34d.34.34M24.NaNNaN,Infinity,-Infinity00,0xffp23d,88F";
Pattern floatPattern = Pattern.compile(wholeFloat);
Matcher matcher = floatPattern.matcher(s);
int i = 0;
while (matcher.find()) {
    String f = matcher.group();
    System.out.println(i++ + " : " + f + " --- " + Float.parseFloat(f));
}
Then the output is as below:
0 : 0.286 --- 0.286
1 : -3.099 --- -3.099
2 : -0.44 --- -0.44
3 : -2.901 --- -2.901
4 : -0.436 --- -0.436
5 : 123 --- 123.0
6 : 0.123d --- 0.123
7 : .34d --- 0.34
8 : .34 --- 0.34
9 : .34 --- 0.34
10 : 24. --- 24.0
11 : NaN --- NaN
12 : NaN --- NaN
13 : Infinity --- Infinity
14 : -Infinity --- -Infinity
15 : 00 --- 0.0
16 : 0xffp23d --- 2.13909504E9
17 : 88F --- 88.0
You can do it in one line (and with one step fewer than aioobe's answer!):
String[] numbers = "M0.286-3.099-0.44c-2.901,-0.436,,,123,0.123,.34"
        .replaceAll("^[^.\\d-]+|[^.\\d-]+$", "")    // remove junk from start/end
        .split("[^.\\d-]+|(?<=[.\\d])(?=-)");       // split on junk, or before a "-" that follows a number
Although fewer calls are made, aioobe's answer is easier to read and understand, which makes it the better code.
Using the regex you crafted yourself, you can solve it as follows:
String[] numbers = "M0.286-3.099-0.44c-2.901,-0.436,,,123,0.123,.34"
.replaceAll(PATTERN, " ")
.trim()
.split(" +");
On the other hand, if I were you, I'd do the loop instead:
Matcher m = Pattern.compile("[.-]?\\d+(\\.\\d+)?").matcher(input);
List<String> matches = new ArrayList<>();
while (m.find()) {
    matches.add(m.group());
}
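Since the question defines a number as anything Float.parseFloat can parse, the loop can convert each match directly. A self-contained sketch of the loop above with the question's input; ExtractNumbers and extract are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractNumbers {
    // Same pattern as the loop above: optional "." or "-", digits, optional fraction.
    private static final Pattern NUMBER = Pattern.compile("[.-]?\\d+(\\.\\d+)?");

    static List<Float> extract(String input) {
        List<Float> numbers = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            numbers.add(Float.parseFloat(m.group()));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extract("M0.286-3.099-0.44c-2.901,-0.436,,,123,0.123,.34"));
    }
}
```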
I think this is exactly what you want:
String pattern = "[-+]?[0-9]*\\.?[0-9]+";
String line = "M0.286-3.099-0.44c-2.901,-0.436,,,123,0.123,.34";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
List<String> numbers = new ArrayList<>();
while (m.find()) {
    numbers.add(m.group());
}
It's nice that you put a bounty on this.
Unfortunately, as you probably already know, this can't be done directly with Java's String.split method.
If it can't be done directly, there is no reason to kludge it, since the result would be, well... a kludge.
The reasons are many, some related, some not.
To start off, you need to define a good regex as a base.
This is the only regex I know that will validate and extract a proper form:
# "((?=[+-]?\\d*\\.?\\d)[+-]?\\d*\\.?\\d*)"
( # (1 start)
(?= [+-]? \d* \.? \d )
[+-]? \d* \.? \d*
) # (1 end)
So, looking at this base regex, it's clear that this is the form you want to extract.
With split, however, you don't want to match this form, because whatever split matches is where the breaks go.
As I look at Java's split, I see that whatever it matches is excluded from the resulting array.
So, presuming split usage, the first thing to match (and consume) is everything that is not a number. That part will be something like this:
(?:
(?!
(?= [+-]? \d* \.? \d )
[+-]? \d* \.? \d*
)
.
)+
Since the only thing left is valid decimal numbers, the next break will be somewhere between valid numbers. This part, added to the first part, will be something like this:
(?:
(?!
(?= [+-]? \d* \.? \d )
[+-]? \d* \.? \d*
)
.
)+
| # or,
(?<=
(?= [+-]? \d* \.? \d )
[+-]? \d* \.? \d*
)
(?=
(?= [+-]? \d* \.? \d )
[+-]? \d* \.? \d*
)
And all of a sudden, we have a problem: an unbounded variable-length lookbehind assertion. Java only accepts lookbehinds with a bounded maximum length, and \d* has none.
So, it's game over for the whole thing.
Lastly and unfortunately, Java does not (as far as I can see) have a provision to include capture group contents (matched in the regex) as elements in the resulting array.
Perl does, but I can't find that ability in Java.
If Java had that provision, the break subexpressions could be combined to do a seamless split, like this:
(?:
(?!
(?= [+-]? \d* \.? \d )
[+-]? \d* \.? \d*
)
.
)*
(
(?= [+-]? \d* \.? \d )
[+-]? \d* \.? \d*
)
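Java's split cannot return captured text, but Matcher.find() can: each find() consumes the junk (the first part of the expression above) and leaves the number in group 1. A sketch of that idea, with the free-spacing regex collapsed onto one line; CombinedFind and numbers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CombinedFind {
    // A decimal number: optional sign, digits, optional dot, digits, with at least one digit.
    private static final String NUM = "(?=[+-]?\\d*\\.?\\d)[+-]?\\d*\\.?\\d*";
    // Skip characters that do not start a number, then capture the number in group 1.
    private static final Pattern P = Pattern.compile("(?:(?!" + NUM + ").)*(" + NUM + ")");

    static List<String> numbers(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(numbers("M0.286-3.099-0.44c-2.901,-0.436,,,123,0.123,.34"));
    }
}
```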
