Extract variable from a simple mathematical equation (Java, String) - java

I will be handling a bunch of strings that will be of the following format:
"2*salary"
"salary+2"
"2*salary/3"
My goal is to pull out just "salary". I do not however want to eliminate non-characters because I might have something like "2*id3", a mixture of characters and numbers as the variable name (note: it will never be all numbers). I currently use:
Pattern pattern = Pattern.compile("[\\w_]+");
However, for something like "2*salary" this results in "2" and "salary" being found.

You're probably looking for this:
Pattern.compile("[a-zA-Z]\\w+");
... in other words, match the sequence of characters that begins with a letter. That'll match 'salary', but won't match '2' (and '2salary' too).
If you in fact do need to match 2salary, use this:
Pattern.compile("[0-9]*[A-Za-z]\\w+");
(I have replaced [\w_] with just \w, it actually includes underscore).

That is because 2*salary matches twice your "word" character definition \w which is [a-zA-Z0-9_], the first is 2 and the and match is salary
In your case you need something like "[a-zA-Z][\w]*"

Related

Regex: Match any character a, specific character b and then a again

I am trying to implement an algorithm on java and I need a way to match a pattern where I find any character (lets name it a) then the character 'X' and then the same character a from before. Initial thought was regex, although after some time failing to find a way to do that I am thinking of iterating through all characters and checking them one by one...
But before that if anyone could help, I need something so that ( "AXA", "EXE", "RXR", etc) would match while ("AXB", "EXA", "TXX", etc) would not.
Tried using something like ".X." but of course failed as it matched anything before and after 'X'...
Is there a way to match something like that ?
Capture the leading char, and use a back reference:
(.)X\1
See live demo.
Note that in java you need to use 2 slashes to make a literal slash:
"AXA".matches("(.)X\\1") // true

Could I give Java a regular expression when java should not split an string?

Can I give the String.split method a parameter which tells it when it must not split the given string? In my particular case, I have text documents with lots of text and symbols. But in every file there are many different symbols. This is what I want to achieve:
string.split(not(A-Z,ß,ä,ö,ü));
So basically, I want String.split to only split whenever it finds a character that is not part of the German set of characters.
I hope you can help me.
There are three tokens in regular expressions that allow you to do exactly what you want to achieve:
[] creates a character class which contains all characters that are listed inside. In your particular case, you'd want this to be [a-zßäöü] as this character group contains all characters a through z, ß, ä, ö and ü.
^ negates the contents of a character class. So, using the character class from above, you'd use [^a-zßäöü] if you wanted to match any character that is not part of the character group.
Additionally, adding (?i) in front of your regular expression causes it to be case insensitive, which allows your expression to match the uppercase letters as well without having to actually add them to your expression.
So, adding those three tokens together, you get the regular expression (?i)[^a-zßäöü]. Now the only thing left is to put them into your String.split method and you're done:
string.split("(?i)[^a-zßäöü]");
Mr.Human,
If I'm understanding your question correctly, you want to split a string on non-German characters?
So,
abcdöyüp
becomes
a, b, c, dö, yü, p
If that is the case, then unfortunately you need to specify the set of characters that are non-German, e.g. [A-Z] to split on. If you are trying to accomplish something other than this, please clarify and/or provide an example.

Regular expression non-greedy but still

I have some larger text which in essence looks like this:
abc12..manycharshere...hi - abc23...manyothercharshere...jk
Obviously there are two items, each starting with "abc", the numbers (12 and 23) are interesting as well as the "hi" and "jk" at the end.
I would like to create a regular expression which allows me to parse out the numbers, but only if the two characters at the end match, i.e. I am looking for the number related to "jk", but the following regular expression matches the whole string and thus returns "12", not "23" even when non-greedy matching the area with the following:
abc([0-9]+).*?jk
Is there a way to construct a regular expression which matches text like the one above, i.e. retrieving "23" for items ending in "jk"?
Basically I would need something like "match abc followed by a number, but only if there is "jk" at the end before another instance of "abc followed by a number appears"
Note: the texts/matches are an abstraction here, the actual text is more complicated, espially the things that can appear as "manyothercharactershere", I simplified to show the underlying problem more clearly.
Use a regex like this. .*abc([0-9]+).*?jk
demo here
I think you want something like this,
abc([0-9]+)(?=(?:(?!jk|abc[0-9]).)*jk)
DEMO
You need to use negative lookahead here to make it work:
abc(?!.*?abc)([0-9]+).*?jk
RegEx Demo
Here (?!.*?abc) is negative lookahead that makes sure to match abc where it is NOT followed by another abc thus making sure closes string between abc and jk is matched.
Being non-greedy does not change the rule, that the first match is returned. So abc([0-9]+).*?jk will find the first jk after “abcnumber” rather than the last one, but still match the first “abcnumber”.
One way to solve this is to tell that the dot should not match abc([0-9]+):
abc([0-9]+)((?!abc([0-9]+)).)*jk
If it is not important to have the entire pattern being an exact match you can do it simpler:
.*(abc([0-9]+).*?jk)
In this case, it’s group 1 which contains your intended match. The pattern uses a greedy matchall to ensure that the last possible “abcnumber” is matched within the group.
Assuming that hyphen separates "items", this regex will capture the numbers from the target item:
abc([0-9]+)[^-]*?jk
See demo

Split on non arabic characters

I have a String like this
أصبح::ينال::أخذ::حصل (على)::أحضر
And I want to split it on non Arabic characters using java
And here's my code
String s = "أصبح::ينال::أخذ::حصل (على)::أحضر";
String[] arr = s.split("^\\p{InArabic}+");
System.out.println(Arrays.toString(arr));
And the output was
[, ::ينال::أخذ::حصل (على)::أحضر]
But I expect the output to be
[ينال,أخذ,حصل,على,أحضر]
So I don't know what's wrong with this?
You need a negated class, and to do that, you need square brackets [ ... ]. Try to split with this:
"[^\\p{InArabic}]+"
If \\p{InArabic} matches any arabic character, then [^\\p{InArabic}] will match any non-arabic character.
Another option you can consider is an equivalent syntax, using P instead of p to indicate the opposite of the \\p{InArabic} character class like #Pshemo mentioned:
"\\P{InArabic}+"
This works just like \\W is the opposite of \\w.
The only possible advantage you get with the first syntax over the second (again like #Pshemo mentioned), is that if you want to add other characters to the list of characters which shouldn't match, for example, if you want to match all non \\p{InArabic} except periods, the first one is more flexible:
"[^\\p{InArabic}.]+"
^
Otherwise, if you really want to use \\P{InArabic}, you'll need subtraction within classes:
"[\\P{InArabic}&&[^.]]+"
The expression you want is "\\P{InArabic}+"
This means match any (non-zero) number of characters that are not Arabic.

if then condition using regex in java

I have a pattern which goes like this
String1 :"String2",
i have to validate this pattern. here if u see there are two cases, the somestring1 can contain special characters if it is given within double quotes.
eg: "xxxx-xxx" :"yyyyyyyy",--------> is valid
but xxxx-xxx :"yyyyyyyy",--------> is not valid
"xxxx-xxx :"yyyyyyyy",--------> is not valid
So i need to create a regex which will check whether the double quotes is closed properly if it is present in String1.
Short answer: Regex doesn't work like that.
What you can do however, is to use two separate patterns to validate:
\"[^\"]+?\" :.*
To check the one that can contain special characters, and:
[a-zA-Z]+? :.*
To check the one that can't
EDIT:
Thinking some more about it, you could combine the two patterns above like so:
^(\"[^\"]+?\"|[a-zA-Z]+?) :.*$
Which will match something :"something" and "some-thing" :"something" but not "some-thing : "something" or some-thing : "something". Assuming that the string only contains the given text.
If I understand your question right, this simple regex should work
\"string1\" :\"string2\"
Maybe something like this?
(?<normalString>^[a-zA-Z]+$)|(?<specialString>^".*?"$)
This will capture only a-z characters and put them in the "normalString" group, or if there's an string within quotation marks, capture that and put it in the "specialString" group.

Categories

Resources