Java: how to parse double from regex

I have a string that looks like "A=1.23;B=2.345;C=3.567"
I am only interested in "C=3.567"
What I have so far is:
Matcher m = Pattern.compile("C=\\d+.\\d+").matcher("A=1.23;B=2.345;C=3.567");
while(m.find()){
double d = Double.parseDouble(m.group());
System.out.println(d);
}
The problem is that it shows the 3 as separate from the 567.
output:
3.0
567.0
I am wondering how I can include the decimal point so it outputs "3.567".
EDIT: I would also like to match C if it does not have a decimal point,
so I would like to capture 3567 as well as 3.567.
Since the C= is built into the pattern as well, how can I strip it out before parsing the double?

I may be mistaken on this part, but the reason it's separating the two is because group() will only match the last-matched subsequence, which is whatever gets matched by each call to find(). Thanks, Mark Byers.
For sure, though, you can solve this by placing the entire part you want inside a "capturing group", which is done by placing it in parentheses. This makes it so that you can group together matched parts of your regular expression into one substring. Your pattern would then look like:
Pattern.compile("C=(\\d+\\.\\d+)")
For the parsing 3567 or 3.567, your pattern would be C=(\\d+(\\.\\d+)?) with group 1 representing the whole number. Also, do note that since you specifically want to match a period, you want to escape your . (period) character so that it's not interpreted as the "any-character" token. For this input, though, it doesn't matter
Then, to get your 3.567, you would call m.group(1) to grab the first (counting from 1) specified group. This would mean that your Double.parseDouble call would essentially become Double.parseDouble("3.567")
As for taking C= out of your pattern, since I'm not that well-versed with RegExp, I might recommend that you split your input string on the semi-colons and then check to see if each of the splits contain the C; then you could apply the pattern (with the capturing groups) to get the 3.567 from your Matcher.
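A minimal sketch of the capturing-group approach described above. The class and method names (CValueParser, parseC) are made up for illustration; the pattern is the one from this answer, so it accepts a value with or without a decimal point.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CValueParser {
    // Group 1 holds the number after "C=", with or without a fractional part.
    private static final Pattern C_VALUE = Pattern.compile("C=(\\d+(\\.\\d+)?)");

    // Returns the value of the first "C=" entry, or empty if there is none.
    static OptionalDouble parseC(String input) {
        Matcher m = C_VALUE.matcher(input);
        if (m.find()) {
            // group(1) excludes the "C=" prefix, so it can be parsed directly.
            return OptionalDouble.of(Double.parseDouble(m.group(1)));
        }
        return OptionalDouble.empty();
    }

    public static void main(String[] args) {
        System.out.println(parseC("A=1.23;B=2.345;C=3.567"));
        System.out.println(parseC("A=1.23;B=2.345;C=3567"));
    }
}
```

Because the "C=" prefix stays outside group 1, there is no need to strip it before calling Double.parseDouble.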
Edit For the more general (and likely more useful!) cases in gawi's comment, please use the following (from http://www.regular-expressions.info/floatingpoint.html)
Pattern.compile("[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?")
This has support for optional sign, either optional integer or optional decimal parts, and optional positive/negative exponents. Insert capturing groups where desired to pick out parts individually. The exponent as a whole is in its own group to make it, as a whole, optional.
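As a rough illustration of the general pattern, the sketch below collects every number it finds in a string. The FloatScanner class and its findAll method are hypothetical names; only the regular expression comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FloatScanner {
    // Optional sign, optional integer part, optional point, digits, optional exponent.
    private static final Pattern FLOAT =
            Pattern.compile("[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?");

    // Parses every substring that matches the pattern, in order of appearance.
    static List<Double> findAll(String input) {
        List<Double> values = new ArrayList<>();
        Matcher m = FLOAT.matcher(input);
        while (m.find()) {
            values.add(Double.parseDouble(m.group()));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(findAll("A=1.23;B=-2.5e3;C=.75;D=42"));
    }
}
```

Note that the pattern is not anchored, so digits embedded in other words would be picked up as well.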

Your regular expression is only matching numeric characters. To match the decimal point as well you will need:
Pattern.compile("\\d+\\.\\d+")
The . is escaped because this would match any character when unescaped.
Note: this will then only match numbers with a decimal point which is what you have in your example.

To match any sequence of digits and dots you can change the regular expression to this:
"(?<=C=)[.\\d]+"
If you want to be certain that there is only a single dot you might want to try something like this:
"(?<=C=)\\d+(?:\\.\\d+)?"
You should also be aware that this pattern can match the 1.2 in ABC=1.2.3;. You should consider if you need to improve the regular expression to correctly handle this situation.
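The sketch below shows the lookbehind pattern next to one possible tightening for the ABC=1.2.3; case. The STRICT pattern is an assumption, not part of the answer: it presumes entries are separated by ';' and that a key starts either at the beginning of the input or right after a ';'. The class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookbehindDemo {
    // Pattern from the answer: digits with an optional fraction, preceded by "C=".
    static final Pattern LOOSE = Pattern.compile("(?<=C=)\\d+(?:\\.\\d+)?");

    // Assumed tightening: "C=" must start the input or follow ';', and the
    // number must be followed by ';' or the end of the input.
    static final Pattern STRICT =
            Pattern.compile("(?<=(?:^|;)C=)\\d+(?:\\.\\d+)?(?=;|$)");

    // Returns the first match of the pattern, if any.
    static Optional<String> first(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(first(LOOSE, "ABC=1.2.3;"));
        System.out.println(first(STRICT, "ABC=1.2.3;"));
        System.out.println(first(STRICT, "A=1.23;B=2.345;C=3.567"));
    }
}
```

Since the lookbehind does not consume "C=", m.group() is already just the number and can go straight to Double.parseDouble.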

if you need to validate decimal with dots, commas, positives and negatives:
Object testObject = "-1.5";
boolean isDecimal = Pattern.matches("^[\\+\\-]{0,1}[0-9]+[\\.\\,][0-9]+$", (CharSequence) testObject);
Good luck.

If you want a regex for an input which might be a double or just an integer without any .0 part, you can use this: Pattern.compile("(-?\\d+\\.?\\d*)")

Related

Which is the right regular expression to use for Numbers and Strings?

I am trying to create simple IDE and coloring my JTextPane based on
Strings (" ")
Comments (// and /* */)
Keywords (public, int ...)
Numbers (integers like 69 and floats like 1.5)
The way i color my source code is by overwritting the insertString and removeString methods inside the StyledDocument.
After much testing, i have completed comments and keywords.
Q1: As for my Strings coloring, I color my strings based on this regular expression:
Pattern strings = Pattern.compile("\"[^\"]*\"");
Matcher matcherS = strings.matcher(text);
while (matcherS.find()) {
setCharacterAttributes(matcherS.start(), matcherS.end() - matcherS.start(), red, false);
}
This works 99% of the time except for when my string contains a specific kind of string where there is a "\ inside the code. This messes up my whole color coding.
Can anyone correct my regular expression to fix my error?
Q2: As for Integers and Decimal coloring, numbers are detected based on this regular expression:
Pattern numbers = Pattern.compile("\\d+");
Matcher matcherN = numbers.matcher(text);
while (matcherN.find()) {
setCharacterAttributes(matcherN.start(), matcherN.end() - matcherN.start(), magenta, false);
}
By using the regular expression "\d+", I am only handling integers and not floats. Also, integers that are part of another string are matched which is not what i want inside an IDE. Which is the correct expression to use for integer color coding?
Thank you for any help in advance!

For the strings, this is probably the fastest regex -
"\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\""
Formatted:
" [^"\\]*
(?: \\ . [^"\\]* )*
"
For integers and decimal numbers, the only foolproof expression I know of is this -
"(?:\\d+(?:\\.\\d*)?|\\.\\d+)"
Formatted:
(?:
\d+
(?: \. \d* )?
| \. \d+
)
As a side note, if you run each pattern independently from the start of the text, you could end up with overlapping highlights.
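One way to avoid overlapping highlights is to scan for strings and numbers in a single pass. The sketch below combines the two expressions from this answer into one alternation with named groups; the TokenScanDemo class, the scan method and the "str:"/"num:" labels are illustrative assumptions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenScanDemo {
    // String literal that allows backslash escapes such as \" inside it.
    private static final String STRING = "\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"";
    // Integer or decimal number, with digits on at least one side of the point.
    private static final String NUMBER = "(?:\\d+(?:\\.\\d*)?|\\.\\d+)";
    // A single left-to-right pass consumes each string literal whole, so
    // digits inside a string are never reported as numbers.
    private static final Pattern TOKEN =
            Pattern.compile("(?<str>" + STRING + ")|(?<num>" + NUMBER + ")");

    // Returns each token as "str:<text>" or "num:<text>" in order of appearance.
    static List<String> scan(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String kind = m.group("str") != null ? "str" : "num";
            tokens.add(kind + ":" + m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The scanned text is: x = 42; s = "a 7 \"b\""; d = 1.5;
        System.out.println(scan("x = 42; s = \"a 7 \\\"b\\\"\"; d = 1.5;"));
    }
}
```

In a highlighter, m.start() and m.end() would be passed to setCharacterAttributes instead of collecting the text.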

Try with:
\\b\\d+(\\.\\d+)?\\b for int, float and double,
"(?<=[{(,=\\s+]+)".+?"(?=[,;)+ }]+)" for Strings,

For Integer go with
(?<!(\\^|\\d|\\.))[+-]?(\\d+(\\.\\d+)?)(?!(x|\\d|\\.))

Match a String ignoring the \" situations
".*?(?<!\\)"
The above will start a match once it sees a " and it will continue matching on anything until it gets to the next " which is not preceded by a \. This is achieved using the lookbehind feature explained very well at http://www.regular-expressions.info/lookaround.html
Match all numbers with & without decimal points
(\d+)(\.\d+)? matches one or more digits, optionally followed by a point and one or more further digits.
The problem of matching numbers inside strings can be handled in 2 ways:
a Modifying the above so that the number has to have whitespace on either side, \s(\d+)(\.\d+)?\s, which I don't think will be satisfactory in mathematical situations (ie 10+10) or at the end of an expression (ie 10;).
b Making this a matter of precedence. If the String colouring is checked after the numbers then that part of the string will be coloured pink at first but then immediately overwritten with red. String colouring takes precedence.

R1: I believe there is no regex-based answer to non-escaped " characters in the middle of an ongoing string. You'd need to actively process the text to eliminate or circumvent the false-positives for characters that are not meant to be matched, based on your specific syntax rules (which you didn't specify).
However:
If you mean to simply ignore escaped ones, \", like java does, then I believe you can simply include the escape+quote pair in the center as a group, and the greedy * will take care of the rest:
\"((\\\\\")|[^\"])*\"
R2: I believe the following regex would work for finding both integers and fractions:
\\d+(\\.\\d+)?
You can expand it to find other kinds of numerals too. For example, \\d+([./]\\d+)? would additionally match numerals like "1/4".

Regular expression non-greedy but still

I have some larger text which in essence looks like this:
abc12..manycharshere...hi - abc23...manyothercharshere...jk
Obviously there are two items, each starting with "abc", the numbers (12 and 23) are interesting as well as the "hi" and "jk" at the end.
I would like to create a regular expression which allows me to parse out the numbers, but only if the two characters at the end match, i.e. I am looking for the number related to "jk", but the following regular expression matches the whole string and thus returns "12", not "23" even when non-greedy matching the area with the following:
abc([0-9]+).*?jk
Is there a way to construct a regular expression which matches text like the one above, i.e. retrieving "23" for items ending in "jk"?
Basically I would need something like "match abc followed by a number, but only if there is "jk" at the end before another instance of "abc followed by a number appears"
Note: the texts/matches are an abstraction here, the actual text is more complicated, especially the things that can appear as "manyothercharshere". I simplified to show the underlying problem more clearly.

Use a regex like this. .*abc([0-9]+).*?jk

I think you want something like this,
abc([0-9]+)(?=(?:(?!jk|abc[0-9]).)*jk)

You need to use negative lookahead here to make it work:
abc(?!.*?abc)([0-9]+).*?jk
Here (?!.*?abc) is a negative lookahead that makes sure abc is matched only where it is NOT followed by another abc, so that the closest string between abc and jk is matched.

Being non-greedy does not change the rule, that the first match is returned. So abc([0-9]+).*?jk will find the first jk after “abcnumber” rather than the last one, but still match the first “abcnumber”.
One way to solve this is to tell that the dot should not match abc([0-9]+):
abc([0-9]+)((?!abc([0-9]+)).)*jk
If it is not important to have the entire pattern being an exact match you can do it simpler:
.*(abc([0-9]+).*?jk)
In this case, it’s group 1 which contains your intended match. The pattern uses a greedy matchall to ensure that the last possible “abcnumber” is matched within the group.
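The difference between the patterns in this answer can be checked with a short sketch on the sample text from the question. The NearestPrefixDemo class and its number helper are hypothetical names; the three patterns are copied from the question and this answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NearestPrefixDemo {
    // Pattern from the question: stops at the first "jk" but starts at the first "abc".
    static final Pattern NAIVE = Pattern.compile("abc([0-9]+).*?jk");
    // The dot may not step onto the start of another "abc<number>".
    static final Pattern TEMPERED = Pattern.compile("abc([0-9]+)((?!abc([0-9]+)).)*jk");
    // A greedy prefix pushes the match to the last possible "abc<number>".
    static final Pattern GREEDY_PREFIX = Pattern.compile(".*(abc([0-9]+).*?jk)");

    // Returns the given group of the first match, or null if nothing matches.
    static String number(Pattern p, int group, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String text = "abc12..manycharshere...hi - abc23...manyothercharshere...jk";
        System.out.println(number(NAIVE, 1, text));
        System.out.println(number(TEMPERED, 1, text));
        // With the greedy prefix the number sits in group 2, inside group 1.
        System.out.println(number(GREEDY_PREFIX, 2, text));
    }
}
```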

Assuming that hyphen separates "items", this regex will capture the numbers from the target item:
abc([0-9]+)[^-]*?jk

Regular expression to identify all numerics, across all localization formats

I'm scanning a text with a Scanner object, let's say lineScanner. Here are the declarations:
String myText= "200,00/100,00/28/65.36/21/458,696/25.125/4.23/6.3/4,2/659845/4524/456,65/45/23.495.254,3";
Scanner lineScanner = new Scanner(myText);
With that Scanner, I would like to find the first BigDecimal, and after the second one, and so on. I declared a BIG_DECIMAL_PATTERN to match any case.
Here are the rules I defined:
Thousands separator is always followed by exactly 3 digits
There is always exactly 1 or 2 digits after the decimal point.
If the thousands separator is the comma, then the decimal point is the dot, and vice versa
The thousands separator is optional, as is the decimal part of the number
String nextBigDecimal = lineScanner.findInLine(BIG_DECIMAL_PATTERN);
Now, here is the BIG_DECIMAL_PATTERN I declared:
private final String BIG_DECIMAL_PATTERN=
"\\d+(\\054\\d{3}+)?(\\056\\d{1,2}+)?|\\d+(\\056\\d{3}+)?(\\054\\d{1,2}+)?";
\\054 is the ASCII octal representation of ","
\\056 is the ASCII octal representation of "."
My problem is that it doesn't work well because when the pattern of the first part is found, the second part (after the |) is not checked and in my example
the first match will be 200 and not 200,00. So I can try this:
private final String BIG_DECIMAL_PATTERN = "\\d+([.,]\\d{3}+)?([,.]\\d{1,2}+)?";
But there is a new problem: comma and dot are not exclusive, I mean if one is the thousands separator, the decimal point should be the other one.
Thanks for helping.

I believe a variant of your 2nd RegEx will work for you. Consider this regex:
^\\d+(?:([.,])\\d{3})*(?:(?!\\1)[.,]\\d{1,2})?$
Live Demo: http://www.rubular.com/r/vHlEdBMhO9
Explanation: It first captures the comma or dot used as the thousands separator in capture group #1. A negative lookahead then makes sure that the same character does not appear again as the decimal point. In other words, if a comma comes first then a dot must come later, and vice versa.
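Because this regex is anchored with ^ and $, it validates one number at a time rather than searching inside a longer line. The sketch below therefore assumes the sample text is first split on "/" and each token is checked separately; the LocalizedNumberCheck class and isValid method are hypothetical names.

```java
import java.util.regex.Pattern;

final class LocalizedNumberCheck {
    // Group 1 remembers the thousands separator; (?!\1) rejects the same
    // character as the decimal point.
    private static final Pattern NUMBER =
            Pattern.compile("^\\d+(?:([.,])\\d{3})*(?:(?!\\1)[.,]\\d{1,2})?$");

    // True if the whole token is a number under the rules from the question.
    static boolean isValid(String token) {
        return NUMBER.matcher(token).matches();
    }

    public static void main(String[] args) {
        String myText = "200,00/100,00/28/65.36/21/458,696/25.125/4.23/6.3/4,2"
                + "/659845/4524/456,65/45/23.495.254,3";
        for (String token : myText.split("/")) {
            System.out.println(token + " -> " + isValid(token));
        }
    }
}
```

A token such as 458,696 is accepted as a number with a thousands separator, which follows from the stated rules but may or may not be the intended reading.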

Could you do an either-or regular expression? E.g. something like:
private final String BIG_DECIMAL_PATTERN
= "\\d+((\\.\\d{3}+)?(,\\d{1,2}+)?|(,\\d{3}+)?(\\.\\d{1,2}+)?)"
Note - I haven't checked whether your regex actually works - and suspect this may not be the best way of achieving what you are trying to do. All I'm doing here to get you up and running is suggesting you could try using (regex1|regex2) where regex1 is dots followed by commas and regex2 is commas followed by dots.

Not finding substring in a group regular expression

I'm trying to find a substring that looks like "3/4" (any digit 0-9 for the numerator and denominator). However, the text that I am trying to parse may contain only a "/4" if it's 1/4. In those cases, I don't care about the "/" and only want the denominator. My current regular expression is "[0-9?\\/0-9]" but it returns '3/4' one character at a time, when I want it grouped instead.
Does anyone have a fix for this?
Thanks so much!

This regex:
[0-9?\/0-9]
is a character class that matches a single digit, question mark, or slash. You want this:
[0-9]?\/[0-9]
or this:
\d?\/\d

I think you want to use the pipe to create an or:
([0-9]\/[0-9])|\/([0-9])
The downside of this approach though would be that you'd need to validate if group 1 or 2 returned a match... so if you had a full fraction you'd get a result in the first group, if you matched a denominator, the denominator would show up in the second group.
Might make more sense to:
([0-9]*\/[0-9])
And then check the length of each match back, and strip out the '/' if you have to.

use a capture group via the ( ) operator
[0-9]?\/([0-9])
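A small sketch of the capture-group suggestion, assuming Java (where "/" needs no escaping inside the pattern). The FractionDemo class and denominators method are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FractionDemo {
    // Optional numerator digit, a slash, and the denominator captured in group 1.
    private static final Pattern FRACTION = Pattern.compile("[0-9]?/([0-9])");

    // Collects the denominator of every fraction found in the text.
    static List<String> denominators(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = FRACTION.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(denominators("3/4 cup, /2 tsp"));
    }
}
```

m.group() would still return the whole fraction, such as "3/4", when that is needed.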

Try this instead:
"([0-9]*\\?[0-9]+)"

Pattern Matching - String Search

I am trying to work out a formula to match a following pattern:
input string example:
'444'/'443'/'434'/'433'/'344'/'334'/'333'
if any of the patterns above exist in a particular input string I want to match it as the same pattern.
also is it possible to do a variable substitution using regex? meaning check for the 3 chars of the string by using each character as a variable and just doing an increment/decrement for each character? so that you dont have to specify the particular number ranges (hardcoding the pattern string ) for different patterns?
Is there any good library one can use for this?? I was working with Pattern class in java.
If you have any link which would be helpful please pass it through :)
Thank you.

Let's first consider this pattern: [34]{3}
The […] is a character class, it matches exactly one of the characters in the set. The {n} is an exact finite repetition.
So, [34]{3} informally means "exactly 3 of either '3' or '4'". Thus, it matches "333", "334", "343", "344", "433", "434", "443", "444", and nothing else.
As a string literal, the pattern is "[34]{3}". If you don't want to hardcode this pattern, then just generate similar-looking strings that follows this template "[…]{n}". Just put the characters that you want to match in the …, and substitute n with the number you want.
Here's an example:
String alpha = "aeiou";
int n = 5;
String pattern = String.format("[%s]{%s}", alpha, n);
System.out.println(pattern);
// [aeiou]{5}
We've now seen that the pattern is not hardcoded, but rather programmatically generated depending on the values of the variables alpha and n. The pattern [aeiou]{5} will match 5 consecutive lowercase vowels, e.g. "ooiae", "ioauu", "eeeee", etc.
It's again not clear if you just want to match these kinds of strings, or if they have to appear like '…'/'…'/'…'/'…'/'…'. If the latter is desired, then simply compose the pattern as desired, using repetition and grouping as necessary. You can also just programmatically copy and paste the pattern 5 times if that's simpler. Here's an example:
String p5 = String.format("'%s'/'%<s'/'%<s'/'%<s'/'%<s'", pattern);
System.out.println(p5);
// '[aeiou]{5}'/'[aeiou]{5}'/'[aeiou]{5}'/'[aeiou]{5}'/'[aeiou]{5}'
This will now match strings like "'aeooi'/'eeiuu'/'uaooo'/'eeeia'/'eieio'".
Caveat
Do be careful about what goes in alpha. Specifically, -, [, ], &&, ^, etc., are special metacharacters in a Java character class definition. If you restrict alpha to contain only digits/letters, then you will probably not run into any problems, but e.g. [^a] does NOT mean "either '^' or 'a'". It in fact means "anything but 'a'". See java.util.regex.Pattern for exact character class syntax.
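If alpha may contain such metacharacters, one option is to escape them while building the class. The sketch below is an assumption layered on top of this answer, not part of it: it puts a backslash in front of every character that is not a letter or digit, which Java's regex syntax allows for non-alphabetic characters. The CharClassBuilder and charClassPattern names are hypothetical, and supplementary (surrogate-pair) characters are not handled.

```java
import java.util.regex.Pattern;

final class CharClassBuilder {
    // Builds "[...]{n}" from the characters in alpha, escaping anything that
    // is not a letter or digit so it is taken literally inside the class.
    // Assumes alpha contains no surrogate pairs.
    static String charClassPattern(String alpha, int n) {
        StringBuilder sb = new StringBuilder("[");
        for (char c : alpha.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append("]{").append(n).append('}').toString();
    }

    public static void main(String[] args) {
        String pattern = charClassPattern("^a-", 2);
        System.out.println(pattern);
        System.out.println(Pattern.matches(pattern, "^a"));
        System.out.println(Pattern.matches(pattern, "ab"));
    }
}
```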

You can use the regex:
('\\d{3}'/){6}'\\d{3}'

Pattern.compile takes a String as its parameter. Though that's probably most often supplied in the form of a string literal, if you have variable upper and lower bounds for your pattern, you can use something like StringBuilder to build your string, then pass the result to Pattern.compile.
