Regex for Android Java to extract numbers and specific symbols

I am working on a calculator. After the whole expression has been typed into the EditText, I want to extract the numbers and the calculator symbols from the field. The numbers can be plain integers or decimal numbers (i.e. float and double values). For example, given the string 2 + 2.7 - 10 x 20.000, the regex should extract the operators +, - and x separately, and the numbers as 2, 2.7, 10, 20.000. Basically, I need a regex for each.

The regex you use with split is different from the one you would use to match the numbers directly. Matching with a Pattern and a Matcher also works, but it takes more code, so here we use String#split(), which only needs a regex for the separators.
Here is your regex:
"\s*([^\d.]+)\s*"
Regex Explanation:
\s* - matches any whitespace before the separator (zero or more characters)
([^\d.]+) - captures one or more characters that are neither a digit nor a dot, as many as possible
\s* - matches any whitespace after the separator (zero or more characters)
Note[1]: Once again, since this regex is used for the split, it must not match the numbers: it matches everything except the numbers, and split removes those matches from the string. If you use it with Matcher#find() instead, you'll get a different output (the separators rather than the numbers).
So this way we can simply:
"2 + 2.7 - 10 x 20.000".split("\\s*([^\\d.]+)\\s*");
Note[2]: A regex inside a Java String literal must have its backslashes escaped, as above. Also, the capture group (...) is unnecessary since we're only splitting; feel free to remove it.
JShell Output:
$1 ==> String[4] { "2", "2.7", "10", "20.000" }
Java snippet:
https://ideone.com/CHjKNB
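For comparison, here is a minimal sketch of the Pattern-based route, which also recovers the operators that split discards. The class name CalcTokenizer, its method names and the token regex are illustrative assumptions, not part of the answer above, and the regex only knows the operators +, - and x from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CalcTokenizer {
    // Group 1: an integer or decimal number; group 2: one operator (+, - or x).
    private static final Pattern TOKEN =
            Pattern.compile("(\\d+(?:\\.\\d+)?)|([+\\-x])");

    /** Returns the tokens captured by the given group, in order of appearance. */
    private static List<String> collect(String input, int group) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group(group) != null) {
                result.add(m.group(group));
            }
        }
        return result;
    }

    static List<String> numbers(String input) {
        return collect(input, 1);
    }

    static List<String> operators(String input) {
        return collect(input, 2);
    }

    public static void main(String[] args) {
        String expr = "2 + 2.7 - 10 x 20.000";
        System.out.println(numbers(expr));   // [2, 2.7, 10, 20.000]
        System.out.println(operators(expr)); // [+, -, x]
    }
}
```

Whitespace needs no special handling here because find() simply skips anything that is neither a number nor an operator.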

Related

Expression to capture only 1 occurrence for a single character but multiple for others

I am trying to use the following regex to capture following values. This is for use in Java.
(\$|£|$|£)([ 0-9.]+)
Example values that I do want captured, and which the regex above handles correctly:
$100
$100.5
$100
$100.6
£200
£200.6
But the following also get captured, which is wrong. I only want to capture values when there is at most one dot in the number, not several.
£200.15.
£200.6.6.6.6
Is there a way to match such that values with multiple periods are rejected?
I can't do something like the following because that would restrict the digits too. Please advise.
(\$|£|$|£)([ 0-9.]{1})

You can use
(\$|£|$|£)(\d+(?:\.\d+)?)\b(?!\.)
In this regex, (\d+(?:\.\d+)?)\b(?!\.) matches
(\d+(?:\.\d+)?) - Group 2: one or more digits, then an optional occurrence of . followed by one or more digits
\b - a word boundary
(?!\.) - a negative lookahead that fails the match if the next char is a .
Another solution for Java (where the regex engine supports possessive quantifiers) will be
(\$|£|$|£)(\d++(?:\.\d+)?+)(?!\.)
\d++ and (?:\.\d+)?+ use the ++ and ?+ possessive quantifiers, which prevent backtracking into the quantified subpatterns.
In Java, do not forget to double the backslashes in the string literals:
String regex = "(\\$|£|$|£)(\\d++(?:\\.\\d+)?+)(?!\\.)";
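As a quick check of the word-boundary variant, here is a small sketch. The class name CurrencyAmount and the method amount are hypothetical, and the currency part is simplified to the character class [$\u00A3] (a dollar or pound sign, written as a Unicode escape to avoid source-encoding issues) instead of the alternation in the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CurrencyAmount {
    // Group 1: currency sign (assumed to be $ or the pound sign); group 2: the amount.
    private static final Pattern AMOUNT =
            Pattern.compile("([$\u00A3])(\\d+(?:\\.\\d+)?)\\b(?!\\.)");

    /** Returns the amount if the text contains a value with at most one dot. */
    static Optional<String> amount(String text) {
        Matcher m = AMOUNT.matcher(text);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(amount("$100.5"));              // Optional[100.5]
        System.out.println(amount("\u00A3200.6.6.6.6"));   // Optional.empty
    }
}
```

The lookahead is what rejects the multi-dot inputs: after backtracking to the integer part, the next character is still a dot, so no shorter match is accepted either.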

You could try this
(\$|£|$|£)([0-9]+(?:\.[0-9]+)?)$
That is: one or more digits, optionally followed by a dot and more digits, and then the end of the string.

Complicated regex and possible simple way to do it [duplicate]

I don't write many regular expressions, so I'm going to need some help on this one.
I need a regular expression that can validate that a string is a comma-delimited list of alphanumeric items.
Examples:
123, 4A67, GGG, 767 would be valid.
12333, 78787&*, GH778 would be invalid
fghkjhfdg8797< would be invalid
This is what I have so far, but isn't quite right: ^(?=.*[a-zA-Z0-9][,]).*$
Any suggestions?

Sounds like you need an expression like this:
^[0-9a-zA-Z]+(,[0-9a-zA-Z]+)*$
POSIX character classes allow a more self-descriptive version:
^[[:alnum:]]+(,[[:alnum:]]+)*$
^[[:alnum:]]+([[:space:]]*,[[:space:]]*[[:alnum:]]+)*$ // allow whitespace
If you're willing to admit underscores, too, search for entire words (\w+):
^\w+(,\w+)*$
^\w+(\s*,\s*\w+)*$ // allow whitespaces around the comma
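The whitespace-tolerant variant can be checked against the question's examples with a short sketch. It is written in Java purely for illustration (the class name AlnumListValidator is an assumption), and it uses explicit ranges with \s because java.util.regex does not support the [[:alnum:]] bracket syntax.

```java
import java.util.regex.Pattern;

final class AlnumListValidator {
    // One alphanumeric item, then any number of ", item" groups with optional whitespace.
    private static final Pattern LIST =
            Pattern.compile("^[0-9a-zA-Z]+(\\s*,\\s*[0-9a-zA-Z]+)*$");

    static boolean isValid(String s) {
        return LIST.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("123, 4A67, GGG, 767"));   // true
        System.out.println(isValid("12333, 78787&*, GH778")); // false
        System.out.println(isValid("fghkjhfdg8797<"));        // false
    }
}
```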

Try this pattern: ^([a-zA-Z0-9]+,?\s*)+$
I tested it with your cases, as well as with just a single number, "123". I don't know whether you will always have a comma or not.
The [a-zA-Z0-9]+ means match 1 or more of these symbols
The ,? means match 0 or 1 commas (basically, the comma is optional)
The \s* handles 0 or more whitespace characters after the comma
and finally the outer + says match 1 or more of the pattern.
This will also match 123 123 abc (no commas), which might be a problem.
This will also match 123, (ends with a comma), which might be a problem.

Try the following expression:
/^([a-z0-9\s]+,)*([a-z0-9\s]+){1}$/i
This will work for:
test
test, test
test123,Test 123,test
I would strongly suggest trimming the whitespaces at the beginning and end of each item in the comma-separated list.

You seem to be lacking repetition. How about:
^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$
I'm not sure how you'd express that in VB.Net, but in Python:
>>> import re
>>> x = ["123, 4A67, GGG, 767", "12333, 78787&*, GH778"]
>>> r = '^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$'
>>> for s in x:
...     print(re.match(r, s))
...
<re.Match object; span=(0, 19), match='123, 4A67, GGG, 767'>
None
>>>
You can use shortcuts instead of listing the [a-zA-Z0-9 ] part, but this is probably easier to understand.
Analyzing the highlights:
[a-zA-Z0-9 ]+ : match one or more (but not zero) characters from the listed ranges, or a space.
(?:[...]+,)* : In non-capturing parentheses, match one or more of the characters, plus a comma at the end. Match such sequences zero or more times. Matching zero times allows for input with no comma.
[...]+ : match at least one of these. This does not include a comma, which ensures that a trailing comma is not accepted. If a trailing comma is acceptable, then the expression is simpler: ^[a-zA-Z0-9 ,]+$

Yes, when you want to catch comma separated things where a comma at the end is not legal, and the things match to $LONGSTUFF, you have to repeat $LONGSTUFF:
$LONGSTUFF(,$LONGSTUFF)*
If $LONGSTUFF is really long and contains comma repeated items itself etc., it might be a good idea to not build the regexp by hand and instead rely on a computer for doing that for you, even if it's just through string concatenation. For example, I just wanted to build a regular expression to validate the CPUID parameter of a XEN configuration file, of the ['1:a=b,c=d','2:e=f,g=h'] type. I... believe this mostly fits the bill: (whitespace notwithstanding!)
import re
xend_fudge_item_re = r"""
e[a-d]x= #register of the call return value to fudge
(
0x[0-9A-F]+ | #either hardcode the reply
[10xks]{32} #or edit the bitfield directly
)
"""
xend_string_item_re = r"""
(0x)?[0-9A-F]+: #leafnum (the contents of EAX before the call)
%s #one fudge
(,%s)* #repeated multiple times
""" % (xend_fudge_item_re, xend_fudge_item_re)
xend_syntax = re.compile(r"""
\[ #a list of
'%s' #string elements
(,'%s')* #repeated multiple times
\]
$ #and nothing else
""" % (xend_string_item_re, xend_string_item_re), re.VERBOSE | re.MULTILINE)

Try ^(?!,)((, *)?([a-zA-Z0-9]+)\b)*$
Step by step description:
Don't match a leading comma (good for the upcoming "loop").
Match an optional comma and spaces.
Match one or more of the characters you like.
Matching a word boundary ensures that a comma is required whenever more items follow in the string.

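Please use: ^((([a-zA-Z0-9\s]){1,45},)+([a-zA-Z0-9\s]){1,45})$
Here I have set the maximum item length to 45, since the longest word in English is 45 characters long; change it as required. Note that this pattern requires at least one comma, so a single item on its own does not match.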

Regex for matching different float formats

I'm looking for a regex in Scala to match several number formats:
9,487,346 -> should match
9.487.356,453 -> should match
38,4 -> should match
-38,4 -> should match
-38.5
-9,487,346.76
-38 -> should match
So basically it should match a number that:
possibly has thousand separators (either comma or dot)
is possibly a decimal, again with either comma or dot as separator
Currently I'm stuck with
val pattern="\\d+((\\.\\d{3}+)?(,\\d{1,2}+)?|(,\\d{3}+)?(\\.\\d{1,2}+)?)"
Edit: I'm mostly concerned with European notation.
Example where the current pattern does not match: 1,052,161
I guess it would be close enough to check that the string contains only digits, a sign, commas and dots.

If, as your edit suggests, you are willing to accept a string that simply "contains numbers, sign, comma and dot" then the task is trivial.
[+-]?\d[\d.,]*
update
After thinking it over, and considering some options, I realize that your original request is possible if you'll allow for 2 different RE patterns, one for US-style numbers (commas before dot) and one for Euro-style numbers (dots before comma).
def isValidNum(num: String): Boolean =
  num.matches("[+-]?\\d{1,3}(,\\d{3})*(\\.\\d+)?") ||
  num.matches("[+-]?\\d{1,3}(\\.\\d{3})*(,\\d+)?")
Note that the thousand separators are not optional, so a number like "1234" is not evaluated as valid. That can be changed by adding more RE patterns: || num.matches("[+-]?\\d+")

Based on your rules, it should match a number that:
possibly has thousand separators (either comma or dot)
is possibly a decimal, again with either comma or dot as separator
Regex:
^[+-]?\d{1,3}(?:[,.]\d{3})*(?:[,.]\d+)?$
[+-]? Allows + or - or nothing at the start
\d{1,3} allows one to 3 digits
(?:[,.]\d{3})* allows . or , as thousands separator followed by 3 digits (* allows any number of such groups)
(?:[,.]\d+)? allows . or , as decimal separator followed by at least one digit.
This matches all of the OP's example cases.
However, one limitation is that it accepts either . or , in both roles and does not check that, if , is the thousands separator, then . must be the decimal separator (and vice versa). As a result, the cases below are incorrectly reported as matches:
201,350,780,88
211.950.266.4
To fix this as well, the previous regex can have 2 alternatives - one to check for a notation that has , as thousands separator and . as decimal, and another one to check vice-versa. Regex:
^[+-]?\d{1,3}(?:(?:(?:\.\d{3})*(?:\,\d+)?)|(?:(?:\,\d{3})*(?:\.\d+)?))$
Hope this helps!
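The stricter two-alternative regex can be exercised with a short sketch. It is written in Java for illustration (Scala on the JVM uses the same java.util.regex engine); the class name NumberFormatValidator is an assumption, and the unnecessary \, escapes are written as plain commas.

```java
import java.util.regex.Pattern;

final class NumberFormatValidator {
    // First alternative: dots as thousands separators, comma as decimal separator.
    // Second alternative: commas as thousands separators, dot as decimal separator.
    private static final Pattern NUMBER = Pattern.compile(
            "^[+-]?\\d{1,3}(?:(?:(?:\\.\\d{3})*(?:,\\d+)?)|(?:(?:,\\d{3})*(?:\\.\\d+)?))$");

    static boolean isValid(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("9.487.356,453"));  // true
        System.out.println(isValid("-9,487,346.76"));  // true
        System.out.println(isValid("201,350,780,88")); // false
        System.out.println(isValid("211.950.266.4"));  // false
    }
}
```

As with the earlier patterns, a plain run of more than three digits such as 1234 is rejected because the thousands separators are mandatory once the leading group is full.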

Java Regular expression for replacing specific strings

I want to replace numbers in a string if they have more than 3 digits (phone numbers should be replaced), but a number should not be replaced if it is followed by $ or if it has a decimal point. I used the expression below.
"\d{3,}+(?!\$/\.)"
The issues I face are these. The regex is not replacing numbers with more than ten digits, which I do want replaced because some of them are IDs. Also, if a number has more than 3 digits after the decimal point, those digits get replaced, although I don't want a number with a decimal point to be replaced at all. Can somebody help?
For example, take the number string "3452678916381914". It has to be replaced, but the regex above does not replace it. Numbers like $1234,45.567 should not be replaced, but the regex above replaces 45.567.

Use lookbehind and lookahead assertions: first assert that the starting word boundary is not preceded by a $ or ., then assert that the ending word boundary is not followed by a $ or .
It works for both examples you provided; you might need to tweak it a little to handle some corner cases.
(?<![\$\.])\b\d{3,}\b(?![\$\.])
In the sample below, it matches the first two lines but not the rest:
3452678916381914 # match
1234 56789 # match
$1234,45.567
$1234
12.345
12345.6678
123$
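Applied to the replacement task itself, the pattern might be used as in the sketch below. The class name NumberMasker and the "###" placeholder are assumptions chosen for illustration; the question does not say what the numbers should be replaced with.

```java
import java.util.regex.Pattern;

final class NumberMasker {
    // Runs of 3+ digits that are neither preceded nor followed by '$' or '.'.
    private static final Pattern LONG_NUMBER =
            Pattern.compile("(?<![$.])\\b\\d{3,}\\b(?![$.])");

    /** Replaces every qualifying number with a fixed placeholder. */
    static String mask(String text) {
        return LONG_NUMBER.matcher(text).replaceAll("###");
    }

    public static void main(String[] args) {
        // prints: ID ###, price $1234,45.567
        System.out.println(mask("ID 3452678916381914, price $1234,45.567"));
    }
}
```

Inside a character class, $ and . are literal, so the backslashes from the original pattern can be dropped without changing its meaning.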

Which is the right regular expression to use for Numbers and Strings?

I am trying to create a simple IDE and color the text in my JTextPane based on
Strings (" ")
Comments (// and /* */)
Keywords (public, int ...)
Numbers (integers like 69 and floats like 1.5)
The way I color my source code is by overriding the insertString and removeString methods inside the StyledDocument.
After much testing, i have completed comments and keywords.
Q1: As for my Strings coloring, I color my strings based on this regular expression:
Pattern strings = Pattern.compile("\"[^\"]*\"");
Matcher matcherS = strings.matcher(text);
while (matcherS.find()) {
    setCharacterAttributes(matcherS.start(), matcherS.end() - matcherS.start(), red, false);
}
This works 99% of the time, except when the string literal contains an escaped quote (\"). That messes up my whole color coding.
Can anyone correct my regular expression to fix my error?
Q2: As for Integers and Decimal coloring, numbers are detected based on this regular expression:
Pattern numbers = Pattern.compile("\\d+");
Matcher matcherN = numbers.matcher(text);
while (matcherN.find()) {
    setCharacterAttributes(matcherN.start(), matcherN.end() - matcherN.start(), magenta, false);
}
By using the regular expression "\d+", I am only handling integers and not floats. Also, integers that are part of another string are matched, which is not what I want inside an IDE. Which is the correct expression to use for integer color coding?
Thank you for any help in advance!

For the strings, this is probably the fastest regex -
"\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\""
Formatted:
" [^"\\]*
(?: \\ . [^"\\]* )*
"
For integers and decimal numbers, the only foolproof expression I know of is this -
"(?:\\d+(?:\\.\\d*)?|\\.\\d+)"
Formatted:
(?:
\d+
(?: \. \d* )?
| \. \d+
)
As a side note, if you run each pattern independently from the start of the text, the highlights can overlap (for example, digits inside a string literal are matched by both patterns).
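To see both expressions at work outside the editor, here is a minimal sketch. The class name SourceScanner and its helper methods are hypothetical; it only collects the matched text and does not touch the StyledDocument or resolve overlapping matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SourceScanner {
    // A quote, then runs of ordinary characters or backslash escapes, then a closing quote.
    private static final Pattern STRING =
            Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");
    // Digits with an optional fraction, or a fraction that starts with a dot.
    private static final Pattern NUMBER =
            Pattern.compile("(?:\\d+(?:\\.\\d*)?|\\.\\d+)");

    /** Collects the text of every match of the pattern, in order. */
    private static List<String> findAll(Pattern pattern, String source) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    static List<String> stringLiterals(String source) {
        return findAll(STRING, source);
    }

    static List<String> numbers(String source) {
        return findAll(NUMBER, source);
    }

    public static void main(String[] args) {
        // The escaped quotes stay inside a single string-literal match.
        System.out.println(stringLiterals("print(\"a \\\"b\\\" c\");"));
        System.out.println(numbers("x = 1.5 + 69 + .25")); // [1.5, 69, .25]
    }
}
```

In the editor, m.start() and m.end() would feed setCharacterAttributes exactly as in the question's loops.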

Try with:
\\b\\d+(\\.\\d+)?\\b for int, float and double,
"(?<=[{(,=\\s+]+)".+?"(?=[,;)+ }]+)" for Strings,

For Integer go with
(?<!(\\^|\\d|\\.))[+-]?(\\d+(\\.\\d+)?)(?!(x|\\d|\\.))

Match a String ignoring the \" situations
".*?(?<!\\)"
The above will start a match once it sees a " and it will continue matching on anything until it gets to the next " which is not preceded by a \. This is achieved using the lookbehind feature explained very well at http://www.regular-expressions.info/lookaround.html
Match all numbers with & without decimal points
(\d+)(\.\d+)? matches at least one digit, optionally followed by a point and one or more further digits.
The question of matching numbers inside strings can be handled in two ways:
a) Modify the above so that the number must have whitespace on either side: \s(\d+)(\.\d+)?\s. I don't think this will be satisfactory in mathematical situations (i.e. 10+10) or at the end of an expression (i.e. 10;).
b) Make this a matter of precedence. If the string colouring is applied after the numbers, then that part of the string will be coloured pink at first but then immediately overwritten with red. String colouring takes precedence.

R1: I believe there is no regex-based answer to non-escaped " characters in the middle of an ongoing string. You'd need to actively process the text to eliminate or circumvent the false-positives for characters that are not meant to be matched, based on your specific syntax rules (which you didn't specify).
However:
If you mean to simply ignore escaped ones, \", like java does, then I believe you can simply include the escape+quote pair in the center as a group, and the greedy * will take care of the rest:
\"((\\\\\")|[^\"])*\"
R2: I believe the following regex would work for finding both integers and fractions:
\\d+(\\.\\d+)?
You can expand it to find other kinds of numerals too. For example, \\d+([./]\\d+)? would additionally match numerals like "1/4".
