I'm trying to validate a simple arithmetic expression to ensure it fits the format operand operator operand: 234.34 + 5. I figured out how to validate this easily enough, but I want users to be able to keep entering more than 2 values, such as: 234.34 + 5 / 6 * 7 - -34. So far my regex is as follows:
[-]*\d+[.\d+[E\d+]*]*[\s+[*+/-]\s+[-]*\d+[.\d+[E\d+]*]*]*
This partially works, but the problem I have is it allows for some strange things I don't want such as -4.34.1 - 34 +
Any suggestions?
Try this. It's ugly as hell but it should work (if you aren't using any parentheses):
-?\d+(?:\.\d+(?:E\d+)?)?(\s*[-+/\*]\s+-?\d+(?:\.\d+(?:E\d+)?)?)+
Explanation
This will match a number followed by an operator and another number, repeated indefinitely
-?\d+(?:\.\d+(?:E\d+)?)? Match a number
(
\s* optional whitespace
[-+/\*] any operator: +, -, *, /
\s+ at least one whitespace (to avoid a --b)
-?\d+(?:\.\d+(?:E\d+)?)? match another number
)+ repeat this block one or more times
And the number expression:
-? optional -
\d+ digits (one or more)
(?: start of optional part
\. dot
\d+ digits
(?: start of optional scientific notation part
E match E char
\d+ match digits
)? close of the optional scientific notation part
)? close optional group
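The pattern explained above can be tried out in Java roughly as follows. This is a minimal sketch: the class and method names are made up for illustration, and the only change to the pattern is the doubled backslashes needed in a Java string literal.

```java
import java.util.regex.Pattern;

class ExpressionRegexDemo {
    // The number sub-pattern from the answer, escaped for a Java string literal.
    private static final String NUMBER = "-?\\d+(?:\\.\\d+(?:E\\d+)?)?";
    // A number followed by one or more "operator number" blocks.
    private static final Pattern EXPRESSION =
            Pattern.compile(NUMBER + "(\\s*[-+/\\*]\\s+" + NUMBER + ")+");

    // matches() requires the whole input to fit the pattern, not just a prefix.
    static boolean isValid(String input) {
        return EXPRESSION.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("234.34 + 5 / 6 * 7 - -34")); // true
        System.out.println(isValid("-4.34.1 - 34 +"));           // false
    }
}
```

Note that a single number such as 5 is rejected, because the trailing group must repeat at least once.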
But I strongly suggest trying to write a proper parser for this; it will also allow you to support parentheses: a + (b + c).
I hate to be "that guy" but why not just write a simple validator that parses the string without using regular expressions? What's the reasoning behind using regular expressions for this? If you were to write your own parser, not only will the solution be easier to understand and maintain but with a little bit more work you would be able to evaluate the expression as well.
It may be best to just write a parser. I know, that sounds scary, but this is actually a second-year homework exercise at college.
See Dijkstra's Shunting-yard algorithm. This will allow you to both verify and evaluate the expression, so if that is where you're going with this project, you're going to have to implement it anyways...
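For a rough idea of what that involves, here is a small sketch of a shunting-yard style evaluator that uses one stack for values and one for operators. It is an illustration under stated assumptions rather than a complete implementation: the class name is made up, only + - * / and parentheses are handled, a minus sign is accepted directly in front of a number but not in front of a parenthesis, and the E notation is not supported. Malformed input ends in an IllegalArgumentException (NumberFormatException is a subclass of it), which is how the sketch doubles as a validator.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ShuntingYardSketch {
    private static int precedence(char op) {
        return (op == '+' || op == '-') ? 1 : 2; // '*' and '/' bind tighter
    }

    // Pops two operands, applies the operator, pushes the result.
    private static void apply(Deque<Double> values, char op) {
        if (values.size() < 2) throw new IllegalArgumentException("missing operand");
        double right = values.pop();
        double left = values.pop();
        switch (op) {
            case '+': values.push(left + right); break;
            case '-': values.push(left - right); break;
            case '*': values.push(left * right); break;
            default:  values.push(left / right); break;
        }
    }

    static double evaluate(String input) {
        Deque<Double> values = new ArrayDeque<>();
        Deque<Character> ops = new ArrayDeque<>();
        boolean expectOperand = true; // true at the start, after an operator and after '('
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (expectOperand && c == '(') {
                ops.push(c);
                i++;
            } else if (expectOperand) {
                // Read a number, allowing a leading minus sign.
                int start = i;
                if (c == '-') i++;
                while (i < input.length()
                        && (Character.isDigit(input.charAt(i)) || input.charAt(i) == '.')) i++;
                values.push(Double.parseDouble(input.substring(start, i)));
                expectOperand = false;
            } else if (c == ')') {
                while (!ops.isEmpty() && ops.peek() != '(') apply(values, ops.pop());
                if (ops.isEmpty()) throw new IllegalArgumentException("unbalanced ')'");
                ops.pop(); // discard the matching '('
                i++;
            } else if ("+-*/".indexOf(c) >= 0) {
                // Left-associative: apply stacked operators of equal or higher precedence first.
                while (!ops.isEmpty() && ops.peek() != '('
                        && precedence(ops.peek()) >= precedence(c)) apply(values, ops.pop());
                ops.push(c);
                expectOperand = true;
                i++;
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at index " + i);
            }
        }
        if (expectOperand) throw new IllegalArgumentException("expression ends early");
        while (!ops.isEmpty()) {
            char op = ops.pop();
            if (op == '(') throw new IllegalArgumentException("unbalanced '('");
            apply(values, op);
        }
        return values.pop();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 * (3 + 4)")); // 14.0
    }
}
```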
I released an expression evaluator based on Dijkstra's Shunting Yard algorithm, under the terms of the Apache License 2.0:
http://projects.congrace.de/exp4j/index.html
Why not use String.split to get each operator and operand by itself? Then you can parse them using a much simpler regex ([\d*.\d*|\d|+|-|*|/]) or just Integer.parseInt (or Double.parseDouble for decimals) for your values.
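A minimal sketch of the split idea, assuming the tokens are always separated by whitespace as in the question's examples. The class and method names are hypothetical, and Double.parseDouble is used here because the operands can be decimals.

```java
class SplitValidatorSketch {
    // Assumes whitespace-separated tokens, e.g. "234.34 + 5 / 6".
    static boolean isValid(String input) {
        String[] tokens = input.trim().split("\\s+");
        // A valid chain has an odd number of tokens: operand (operator operand)*
        if (tokens.length % 2 == 0) return false;
        for (int i = 0; i < tokens.length; i++) {
            if (i % 2 == 0) {
                if (!isNumber(tokens[i])) return false;      // even positions: operands
            } else if (!tokens[i].matches("[-+*/]")) {
                return false;                                 // odd positions: operators
            }
        }
        return true;
    }

    private static boolean isNumber(String token) {
        try {
            Double.parseDouble(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("234.34 + 5 / 6 * 7 - -34")); // true
        System.out.println(isValid("-4.34.1 - 34 +"));           // false
    }
}
```

One caveat: Double.parseDouble is more lenient than the regex in the question (it also accepts inputs such as 1e5 or NaN), so a stricter per-token check may be needed.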
Related
This regex: \p{L}+ matches these characters "ASKJKSDJKDSJÄÖÅüé" of the example string "ASKJKSDJK_-.;,DSJÄÖÅ!”#€%&/()=?`¨’<>üé" which is great but is the exact opposite of what I want. Which leads me to negating regexes.
Goal:
I want to match any and all characters that are neither a letter nor a number, in multiple languages.
Could a negative regex be a natural direction for this?
I should mention one intended use for the regex I'd like to find is to validate passwords for the rule:
that it needs to contain at least one special character, which I
define as not being a number nor a letter.
It would seem defining ranges of special characters should be avoided if possible, because why limit the possibilities? Thus my definition. I assume there could be some problems with such a wide definition, but it is a first step.
If you have suggestions for a better solution than the one I'm giving below, or just have some thoughts on the subject, I'm sure I'm not the only one who would like to learn about it. Thanks.
Note I'm using double \\ in the Java code. Platform is Java 11.
You can shove those \\p things in []. And thus, use the fact that you can negate chargroups. This is all you need:
Pattern p = Pattern.compile("[^\\p{L}]");
Matcher m = p.matcher("ASKJKSDJK_-.;,DSJÄÖÅ!”#€%&/()=?`¨’<>üé");
while (m.find()) System.out.print(m.group(0));
That prints:
_-.;,!”#€%&/()=?`¨’<>
Which is exactly what you're looking for, no?
No need to mess with lookaheads here.
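Since the question also wants digits excluded, the same negated class can list \p{N} next to \p{L}. The following is a small sketch of the password rule under that assumption; the class and method names are invented for illustration.

```java
import java.util.regex.Pattern;

class SpecialCharCheck {
    // Any character that is neither a Unicode letter (\p{L}) nor a Unicode number (\p{N}).
    private static final Pattern SPECIAL = Pattern.compile("[^\\p{L}\\p{N}]");

    // find() succeeds as soon as one such character occurs anywhere in the input.
    static boolean hasSpecialChar(String password) {
        return SPECIAL.matcher(password).find();
    }

    public static void main(String[] args) {
        System.out.println(hasSpecialChar("abc123"));  // false
        System.out.println(hasSpecialChar("abc123!")); // true
    }
}
```

With this definition whitespace also counts as a special character, which may or may not be what a password rule should allow.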
So after having read similar, though not identical questions and some equally great answers, I came up with this solution: (?=\P{L})(?=\P{N}) meaning match both not letters and not numbers. Even if I'm asserting numbers separately I need to negate both to meet the specification of special characters (See question).
This makes use of non-consuming (lookahead) groups, written with parentheses and ?=: the expression in the first group is asserted first, and then the second group is asserted at the same position. Thanks to @Jason Cohen for this detail in the "Regular Expressions: Is there an AND operator?" discussion.
The upper case P in \P{L} and \P{N} expresses "not belonging to a category" in Unicode Categories, i.e. the opposite of a lowercase p.
It's not perfect for a real world solution, but works as a starting point at least. Note I'm using double \\ in the Java code. Platform is Java 11.
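As a sketch of how the two lookaheads behave in Java: on their own they are zero-width, so to actually extract the characters this example appends a dot that consumes whatever passed both checks. That trailing dot and the class and method names are additions for illustration, not part of the pattern above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadSpecialChars {
    // Two lookaheads assert "not a letter" AND "not a number" at the same position;
    // the trailing dot then consumes the character that passed both checks.
    private static final Pattern SPECIAL = Pattern.compile("(?=\\P{L})(?=\\P{N}).");

    static String specialCharsOf(String input) {
        StringBuilder found = new StringBuilder();
        Matcher m = SPECIAL.matcher(input);
        while (m.find()) found.append(m.group());
        return found.toString();
    }

    public static void main(String[] args) {
        System.out.println(specialCharsOf("ab_1-2.c")); // _-.
    }
}
```

By default the dot does not match line terminators, so this variant would skip them unless Pattern.DOTALL is set.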
Could someone help me create a Java regex for the following polynomial expression?
Expression: 9x^2 + 4x + 2
So far I've not been able to map the constant, and I'm not sure if this is the best way, but this is what I have from my best understanding:
(-?\b\d+) [xx] || ^ (-?\d+\b)
Any help is much appreciated thanks :)
Here is a general regex pattern, which should work for a polynomial of any order:
^\d{0,}(?:[a-z](?:\^\d+)?)?(?: [+-] \d{0,}(?:[a-z](?:\^\d+)?)?)*$
I will explain the first portion of the regex, which matches any single polynomial term. The second repeating portion of the regex just recycles this pattern, with a +/- separator in between.
\d{0,} match zero or more digits (the coefficient)
(?: turn off capturing
[a-z] match a single variable letter
(?:\^\d+)? then match an optional exponent term
)? close group
Note that we handle the optional exponent by grouping it with the variable. Then, we make that entire group optional.
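To try the pattern in Java, something like the following should do. It is a sketch with made-up class and method names; the term sub-pattern is factored into a constant only to make the repetition visible.

```java
import java.util.regex.Pattern;

class PolynomialRegexDemo {
    // One term: optional coefficient, optional variable with optional exponent.
    private static final String TERM = "\\d{0,}(?:[a-z](?:\\^\\d+)?)?";
    // First term, then any number of " + term" or " - term" blocks.
    private static final Pattern POLYNOMIAL =
            Pattern.compile("^" + TERM + "(?: [+-] " + TERM + ")*$");

    static boolean isPolynomial(String input) {
        return POLYNOMIAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPolynomial("9x^2 + 4x + 2")); // true
        System.out.println(isPolynomial("9x^^2"));         // false
    }
}
```

Because every part of a term is optional, the empty string also matches; if that matters, the input can be checked for emptiness separately.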
I'm looking to match a string to the format of an expression in formal logic, where two alphabetic characters are operated on by v|^|>|=, where the characters can be preceded by ~|!|?, and where the characters may be surrounded by brackets and preceded again by ~|!|?. At first I thought that the following expression might do it:
s.matches("^[!?~]*[(]*[!?~]*[a-z]{1}\\s[v>=^]{1}\\s[!?~]*[a-z]{1}[)]*$")
However, I have realised that these expressions can be nested inside one another, and I don't know how to account for that in the regex.
Examples of acceptable matches:
~p v q
~?(p ^ ~r)
!p
p v ~(!r ^ t)
~!(p = (~!q ^ t))
It is possible to add as many operators as you want, to create an enormously long expression. How do I account for this with the regex in a general format?
Thanks heaps :)
You can't fully describe that language with a plain regular expression. The problem is that any letter can be replaced by an expression. You need recursive regular expressions, and these aren't supported by Java's java.util.regex package.
This is a feature that, as far as I know, is available in Perl (since 5.10) and in packages that advertise "Perl-Compatible Regular Expressions" (PCRE). It's not part of the standard regex libraries of Java, Python, or C++, and I don't believe the .NET libraries for C#, VB.Net, C++/CLI etc. have it either.
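Without recursive regular expressions, the usual route in Java is a small hand-written recursive validator. The sketch below is one possible reading of the question's format, not a definitive grammar: it assumes single lowercase letters as variables, exactly one space on each side of a binary operator, and any number of ~, ! or ? prefixes before a letter or a bracketed sub-expression. All names are hypothetical.

```java
class LogicValidator {
    private final String s;
    private int pos = 0;

    private LogicValidator(String s) { this.s = s; }

    // Valid only if one expression consumes the entire input.
    static boolean isValid(String input) {
        LogicValidator v = new LogicValidator(input);
        return v.expr() && v.pos == input.length();
    }

    // expr := unary (' ' op ' ' unary)*   where op is one of v ^ > =
    private boolean expr() {
        if (!unary()) return false;
        while (pos + 2 < s.length()
                && s.charAt(pos) == ' '
                && "v^>=".indexOf(s.charAt(pos + 1)) >= 0
                && s.charAt(pos + 2) == ' ') {
            pos += 3;
            if (!unary()) return false;
        }
        return true;
    }

    // unary := [~!?]* (letter | '(' expr ')')
    private boolean unary() {
        while (pos < s.length() && "~!?".indexOf(s.charAt(pos)) >= 0) pos++;
        if (pos >= s.length()) return false;
        char c = s.charAt(pos);
        if (c >= 'a' && c <= 'z') { pos++; return true; }
        if (c == '(') {
            pos++;
            if (!expr()) return false; // recursion handles arbitrary nesting
            if (pos < s.length() && s.charAt(pos) == ')') { pos++; return true; }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isValid("~!(p = (~!q ^ t))")); // true
        System.out.println(isValid("(p v q"));            // false
    }
}
```

The recursive call inside unary() is what a plain java.util.regex pattern cannot express: each bracket opens a fresh expression that must be closed before the outer one continues.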
I am trying to create a simple IDE and am coloring my JTextPane based on
Strings (" ")
Comments (// and /* */)
Keywords (public, int ...)
Numbers (integers like 69 and floats like 1.5)
The way I color my source code is by overriding the insertString and remove methods inside the StyledDocument.
After much testing, I have completed comments and keywords.
Q1: As for my Strings coloring, I color my strings based on this regular expression:
Pattern strings = Pattern.compile("\"[^\"]*\"");
Matcher matcherS = strings.matcher(text);
while (matcherS.find()) {
setCharacterAttributes(matcherS.start(), matcherS.end() - matcherS.start(), red, false);
}
This works 99% of the time, except when my string contains an escaped quote (\"). This messes up my whole color coding.
Can anyone correct my regular expression to fix my error?
Q2: As for Integers and Decimal coloring, numbers are detected based on this regular expression:
Pattern numbers = Pattern.compile("\\d+");
Matcher matcherN = numbers.matcher(text);
while (matcherN.find()) {
setCharacterAttributes(matcherN.start(), matcherN.end() - matcherN.start(), magenta, false);
}
By using the regular expression "\d+", I am only handling integers and not floats. Also, integers that are part of another string are matched, which is not what I want inside an IDE. Which is the correct expression to use for integer color coding?
Below is a screenshot of the output:
Thank you for any help in advance!
For the strings, this is probably the fastest regex -
"\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\""
Formatted:
" [^"\\]*
(?: \\ . [^"\\]* )*
"
For integers and decimal numbers, the only foolproof expression I know of is
this -
"(?:\\d+(?:\\.\\d*)?|\\.\\d+)"
Formatted:
(?:
\d+
(?: \. \d* )?
| \. \d+
)
As a side note, if you're doing each independently from the start of
the string, you could end up with overlapping highlights.
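Both patterns from this answer can be exercised outside the editor with a few lines of Java. This is only a sketch: the class name and the findAll helper are invented here, and in the real document code the start() and end() offsets of each match would be passed to setCharacterAttributes instead of collecting the matched text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HighlightPatternsDemo {
    // String literal: quote, runs of ordinary characters or backslash escapes, closing quote.
    static final Pattern STRINGS = Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");
    // Integer or decimal: "12", "12.", "12.5" or ".5".
    static final Pattern NUMBERS = Pattern.compile("(?:\\d+(?:\\.\\d*)?|\\.\\d+)");

    // Collects the text of every match, in order of appearance.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) matches.add(m.group());
        return matches;
    }

    public static void main(String[] args) {
        // The sample text is: say("a \"b\" c"); n = 42 + 1.5;
        String text = "say(\"a \\\"b\\\" c\"); n = 42 + 1.5;";
        System.out.println(findAll(STRINGS, text));
        System.out.println(findAll(NUMBERS, text));
    }
}
```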
Try with:
\\b\\d+(\\.\\d+)?\\b for int, float and double,
"(?<=[{(,=\\s+]+)".+?"(?=[,;)+ }]+)" for Strings,
For Integer go with
(?<!(\\^|\\d|\\.))[+-]?(\\d+(\\.\\d+)?)(?!(x|\\d|\\.))
Match a String ignoring the \" situations
".*?(?<!\\)"
The above will start a match once it sees a " and it will continue matching on anything until it gets to the next " which is not preceded by a \. This is achieved using the lookbehind feature explained very well at http://www.regular-expressions.info/lookaround.html
Match all numbers with & without decimal points
(\d+)(\.\d+)? matches one or more digits, optionally followed by a point and one or more further digits.
The question of matching numbers inside strings can be achieved in 2 ways :
a) Modifying the above so that the number must have a non-word character on either side, \W(\d+)(\.\d+)?\W, which I don't think will be satisfactory in mathematical situations (e.g. 10+10) or at the end of an expression (e.g. 10;).
b) Making this a matter of precedence. If the String colouring is checked after the numbers, then that part of the string will be coloured pink at first but then immediately overwritten with red. String colouring takes precedence.
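A related way to get the same precedence, swapped in here as an alternative to the overwrite order described in b), is to scan once with a single pattern that has one alternative per token type. The sketch below assumes the escape-aware string pattern and a \b-delimited number pattern; the class name, group names and the "STR:"/"NUM:" labels are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassTokens {
    // A string match consumes everything up to its closing quote, so digits
    // inside it are never offered to the number alternative.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<str>\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")|(?<num>\\b\\d+(?:\\.\\d+)?\\b)");

    // Returns entries such as NUM:10 or STR:"abc" in order of appearance.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            out.add(m.group("str") != null ? "STR:" + m.group("str") : "NUM:" + m.group("num"));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("x1 = 10+10; s = \"abc 123\";"));
    }
}
```

In an editor the matched text would not be collected; m.start() and m.end() of each match would instead drive the colouring, with the colour chosen by which named group is non-null.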
R1: I believe there is no regex-based answer to non-escaped " characters in the middle of an ongoing string. You'd need to actively process the text to eliminate or circumvent the false-positives for characters that are not meant to be matched, based on your specific syntax rules (which you didn't specify).
However:
If you mean to simply ignore escaped ones, \", like java does, then I believe you can simply include the escape+quote pair in the center as a group, and the greedy * will take care of the rest:
\"((\\\\\")|[^\"])*\"
R2: I believe the following regex would work for finding both integers and fractions:
\\d+(\\.\\d+)?
You can expand it to find other kinds of numerals too. For example, \\d+([./]\\d+)? would additionally match numerals like "1/4".
What's a good regular expression for real numbers in Java?
I created the regex ([-]?[\\d]*\\.[\\d]*) and tested it against the following expected outputs:
in                  out       works?
--------------------------------------
qwerty34.34abcd     34.34     yes
qwe-34.34.34abcd    -34.34    no
+17.-9abc           17.0      yes
-.abc0              0.0       no
I need to get exactly what is shown in the out column.
Please provide a regex that passes all of these tests.
Try the following:
((\+|-)?([0-9]+)(\.[0-9]+)?)|((\+|-)?\.?[0-9]+)
This will match real numbers, including integers, with or without a sign (and with or without a number before the decimal point).
The regular expression must satisfy these requirements:
1. There may be an optional sign. Try [+-]?.
2. There may be 0 or more optional digits. Try [0-9]*.
3. There may be a decimal point. Try [.] to avoid backslashitis.
4. There may be 0 or more digits after the decimal point. Try [0-9]*.
5. At least one of #2 and #4 has a digit. This would use [0-9]+, and it can use alternation.
Does this pattern work? [+-]?([0-9]*[.]?[0-9]+|[0-9]+[.]?[0-9]*) I'm worried about catastrophic backtracking, however.
^[+-]?(?:\d+\.?\d*|\d*\.\d+)$
? is for making + or - optional
?: is for non capturing group
\ is an escape char
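A quick way to check that anchored pattern in Java might look like this. The class and method names are hypothetical, and the only change to the pattern is the doubled backslashes for the string literal.

```java
import java.util.regex.Pattern;

class RealNumberCheck {
    // Optional sign, then either digits with an optional fraction, or a fraction with leading dot.
    private static final Pattern REAL =
            Pattern.compile("^[+-]?(?:\\d+\\.?\\d*|\\d*\\.\\d+)$");

    static boolean isReal(String input) {
        return REAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isReal("-34.34")); // true
        System.out.println(isReal("+17."));   // true
        System.out.println(isReal("-."));     // false
    }
}
```

Because of the ^ and $ anchors this validates a whole string; extracting a number from inside a longer string, as in the question's table, would need the anchors removed and find() instead of matches().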
How about the next one:
^-?(([1-9][0-9]*)|(0))(?:\.[0-9]+)?$
or the next one if you want to allow the "+" sign:
^(-|\+)?(([1-9][0-9]*)|(0))(?:\.[0-9]+)?$
BTW, here's a nice website for testing and sharing regexp.
Or this one: [0-9]+\.[0-9]*([eE][\+\-]?[0-9]+)?