I have a pattern which goes like this
String1 :"String2",
I have to validate this pattern. There are two cases: String1 can contain special characters only if it is enclosed in double quotes.
eg: "xxxx-xxx" :"yyyyyyyy",--------> is valid
but xxxx-xxx :"yyyyyyyy",--------> is not valid
"xxxx-xxx :"yyyyyyyy",--------> is not valid
So I need to create a regex that checks whether the double quotes are closed properly when they are present in String1.
Short answer: Regex doesn't work like that.
What you can do however, is to use two separate patterns to validate:
\"[^\"]+?\" :.*
To check the one that can contain special characters, and:
[a-zA-Z]+? :.*
To check the one that can't
EDIT:
Thinking some more about it, you could combine the two patterns above like so:
^(\"[^\"]+?\"|[a-zA-Z]+?) :.*$
This matches something :"something" and "some-thing" :"something", but not "some-thing : "something" or some-thing : "something", assuming the string contains nothing but the given text.
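To try the combined pattern against the lines from the question, here is a minimal sketch. The class and method names (QuotedKeyCheck, isValid) are made up for illustration, and it assumes each line is checked on its own with matches():

```java
import java.util.regex.Pattern;

class QuotedKeyCheck {
    // Combined pattern from the answer: either a quoted key (anything but a quote
    // between the quotes) or a letters-only key, followed by " :" and the rest.
    private static final Pattern LINE =
            Pattern.compile("^(\"[^\"]+?\"|[a-zA-Z]+?) :.*$");

    // Hypothetical helper: true when the whole line fits the pattern.
    static boolean isValid(String line) {
        return LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "\"xxxx-xxx\" :\"yyyyyyyy\",",   // quoted key, closed properly
            "xxxx-xxx :\"yyyyyyyy\",",       // special character without quotes
            "\"xxxx-xxx :\"yyyyyyyy\",",     // opening quote never closed
            "name :\"yyyyyyyy\","            // plain letters-only key
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Only the first and the last sample are accepted; the two invalid lines from the question are rejected.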
If I understand your question right, this simple regex should work
\"string1\" :\"string2\"
Maybe something like this?
(?<normalString>^[a-zA-Z]+$)|(?<specialString>^".*?"$)
This will capture a string made up only of a-z characters (upper or lower case) and put it in the "normalString" group, or, if there's a string within quotation marks, capture that and put it in the "specialString" group.
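A rough sketch of how the named groups could be read back in Java, assuming the pattern is applied to String1 alone (the part before the colon). KeyClassifier and classify are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyClassifier {
    // The pattern from the answer, with the quotes escaped for a Java string literal.
    private static final Pattern KEY = Pattern.compile(
            "(?<normalString>^[a-zA-Z]+$)|(?<specialString>^\".*?\"$)");

    // Hypothetical helper: reports which named group matched, if any.
    static String classify(String key) {
        Matcher m = KEY.matcher(key);
        if (!m.matches()) {
            return "invalid";
        }
        // A named group that did not take part in the match returns null.
        return m.group("normalString") != null ? "normal" : "special";
    }

    public static void main(String[] args) {
        System.out.println(classify("abc"));
        System.out.println(classify("\"xxxx-xxx\""));
        System.out.println(classify("xxxx-xxx"));
        System.out.println(classify("\"xxxx-xxx"));
    }
}
```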
Can I give the String.split method a parameter which tells it when it must not split the given string? In my particular case, I have text documents with lots of text and symbols. But in every file there are many different symbols. This is what I want to achieve:
string.split(not(A-Z,ß,ä,ö,ü));
So basically, I want String.split to only split whenever it finds a character that is not part of the German set of characters.
I hope you can help me.
There are three tokens in regular expressions that allow you to do exactly what you want to achieve:
[] creates a character class which contains all characters that are listed inside. In your particular case, you'd want this to be [a-zßäöü] as this character group contains all characters a through z, ß, ä, ö and ü.
^ negates the contents of a character class. So, using the character class from above, you'd use [^a-zßäöü] if you wanted to match any character that is not part of the character group.
Additionally, adding (?iu) in front of your regular expression makes it case-insensitive, so it matches the uppercase letters as well without you having to list them. The u is needed in Java because (?i) on its own only ignores case for ASCII letters, so Ä, Ö and Ü would still be treated as split characters.
So, putting those three tokens together, you get the regular expression (?iu)[^a-zßäöü]. Now the only thing left is to pass it to your String.split method and you're done:
string.split("(?iu)[^a-zßäöü]");
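Here is a small sketch of that split in action. It makes two assumptions that go beyond the answer: it uses the flags (?iu) rather than (?i) alone, because Java's case-insensitive matching only covers ASCII letters unless Unicode case folding is switched on, and it adds a + so that a run of several separators counts as one split point. The class name GermanSplit is made up, and the non-ASCII letters are written as Unicode escapes only to keep the source file plain ASCII:

```java
import java.util.Arrays;

class GermanSplit {
    // Splits on every run of characters outside a-z, sharp s and the three umlauts.
    // (?i) ignores case and (?u) extends that to non-ASCII letters.
    // Escapes used: U+00DF sharp s, U+00E4 a-umlaut, U+00F6 o-umlaut, U+00FC u-umlaut.
    static String[] splitGerman(String text) {
        return text.split("(?iu)[^a-z\u00DF\u00E4\u00F6\u00FC]+");
    }

    public static void main(String[] args) {
        // Three German words with umlauts and sharp s, separated by punctuation.
        String text = "Gr\u00FC\u00DFe, \u00C4rger & \u00D6l!";
        System.out.println(Arrays.toString(splitGerman(text)));
    }
}
```

For the sample text this yields the three words Grüße, Ärger and Öl, with the punctuation and spaces dropped.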
Mr.Human,
If I'm understanding your question correctly, you want to split a string on non-German characters?
So,
abcdöyüp
becomes
a, b, c, dö, yü, p
If that is the case, then unfortunately you need to specify the set of characters that are non-German, e.g. [A-Z] to split on. If you are trying to accomplish something other than this, please clarify and/or provide an example.
Sample Input:
a:b
a.in:b
asds.sdsd:b
a:b___a.sds:bc___ab:bd
Sample Output:
a:replaced
a.in:replaced
asds.sdsd:replaced
a:replaced___a.sds:replaced___ab:replaced
The string that comes after : should be replaced using a custom function.
I have done this without regex, but I feel regex would be a better fit, as we are trying to extract a string from a specific pattern.
For the first three cases it's simple enough to extract the string after :, but I couldn't find a way to deal with the last case, unless I split the string on ___, apply the approach for the first type of pattern to each piece, and concatenate the pieces again.
Just replace the letters that come right after : with the string replaced.
string.replaceAll("(?<=:)[A-Za-z]+", "replaced");
Or, if you also want to deal with digits, add \d inside the character class:
string.replaceAll("(?<=:)[A-Za-z\\d]+", "replaced");
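As a quick check against the sample input from the question, a minimal sketch (ColonReplace and replaceAfterColon are hypothetical names):

```java
class ColonReplace {
    // (?<=:) is a lookbehind: it requires a ':' just before the match
    // without making the ':' part of what gets replaced.
    static String replaceAfterColon(String input) {
        return input.replaceAll("(?<=:)[A-Za-z\\d]+", "replaced");
    }

    public static void main(String[] args) {
        System.out.println(replaceAfterColon("a:b"));
        System.out.println(replaceAfterColon("asds.sdsd:b"));
        System.out.println(replaceAfterColon("a:b___a.sds:bc___ab:bd"));
    }
}
```

Because the underscore is not in the character class, each match stops at ___, so the multi-part line is handled without splitting it first.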
(:)[a-zA-Z]+
You can simply do this with string.replaceAll. Replace with $1replaced. See the demo:
https://regex101.com/r/fX3oF6/18
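Since the question asks for a custom function rather than a fixed string, here is one possible sketch. It assumes Java 9 or later, where Matcher.replaceAll accepts a function, and it adds a second capturing group around the letters so they can be handed to that function. ColonTransform and transform are made-up names:

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColonTransform {
    // Group 1 is the colon, group 2 the letters that follow it.
    private static final Pattern AFTER_COLON = Pattern.compile("(:)([a-zA-Z]+)");

    // Hypothetical helper: applies fn to every value that follows a colon.
    static String transform(String input, Function<String, String> fn) {
        return AFTER_COLON.matcher(input).replaceAll(
                // quoteReplacement keeps any '$' or '\' in fn's result literal.
                m -> m.group(1) + Matcher.quoteReplacement(fn.apply(m.group(2))));
    }

    public static void main(String[] args) {
        System.out.println(transform("a:b___a.sds:bc___ab:bd", String::toUpperCase));
        System.out.println(transform("a.in:b", value -> "replaced"));
    }
}
```

On older Java versions the same effect can be had with a find() loop plus appendReplacement and appendTail.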
I will be handling a bunch of strings that will be of the following format:
"2*salary"
"salary+2"
"2*salary/3"
My goal is to pull out just "salary". However, I do not want to eliminate non-letter characters, because I might have something like "2*id3", where the variable name is a mixture of letters and numbers (note: it will never be all numbers). I currently use:
Pattern pattern = Pattern.compile("[\\w_]+");
However, for something like "2*salary" this results in "2" and "salary" being found.
You're probably looking for this:
Pattern.compile("[a-zA-Z]\\w*");
... in other words, match a sequence of word characters that begins with a letter. That finds 'salary' but not '2'. Note that find() would still pick 'salary' out of '2salary'; only matches() rejects that string as a whole.
If you in fact do need to match 2salary, use this:
Pattern.compile("[0-9]*[A-Za-z]\\w*");
(I have replaced [\w_] with just \w, since \w already includes the underscore, and the * after \w lets one-letter names match too.)
That is because 2*salary contains two matches for your "word" character definition: \w is [a-zA-Z0-9_], so the first match is 2 and the second match is salary.
In your case you need something like "[a-zA-Z][\w]*"
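A short sketch of the letter-first pattern applied to the strings from the question, with the backslash doubled for a Java string literal. VariableFinder and findNames are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VariableFinder {
    // A letter followed by any number of word characters (letters, digits, underscore).
    private static final Pattern NAME = Pattern.compile("[a-zA-Z]\\w*");

    // Hypothetical helper: collects every match found in the expression.
    static List<String> findNames(String expression) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME.matcher(expression);
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(findNames("2*salary"));
        System.out.println(findNames("salary+2"));
        System.out.println(findNames("2*salary/3"));
        System.out.println(findNames("2*id3"));
    }
}
```

The bare numbers are skipped because a match can only start at a letter, while id3 is kept whole.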
I am trying to validate a string in an 'iterative way' and all my attempts have failed!
I find it a bit complicated and I'm guessing you could teach me how to do it right.
I assume most of you will suggest using regex patterns, but I don't really know how. In general, how can a regex be defined for infinite "sets"?
The string i want to validate is
"ANYTHING|NUMBER_ONLY,ANYTHING|NUMBER_ONLY..."
for example: "hello|5,word|10" and "hello|5,word|10," are both valid.
Note: I don't mind whether the string ends with a comma ',' or not.
The Kleene star (*) lets you define "infinite sets" in regular expressions. The following pattern should do the trick:
[^,|]+\|\d+(,[^,|]+\|\d+)*,?
A----------B--------------C-
Part A matches the first element. Part B matches any following elements (notice the star). Part C is the optional comma at the end.
WARNING: Remember to escape backslashes in Java string.
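With the backslashes doubled for Java, the pattern could be used roughly like this (PairListValidator and isValid are made-up names, and matches() is assumed so that the whole string has to fit):

```java
import java.util.regex.Pattern;

class PairListValidator {
    // Part A: first element, part B: zero or more ",element" repeats, part C: optional comma.
    private static final Pattern LIST =
            Pattern.compile("[^,|]+\\|\\d+(,[^,|]+\\|\\d+)*,?");

    // Hypothetical helper: true when the entire input is a list of ANYTHING|NUMBER pairs.
    static boolean isValid(String input) {
        return LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("hello|5,word|10"));
        System.out.println(isValid("hello|5,word|10,"));
        System.out.println(isValid("hello|5,word"));
        System.out.println(isValid("hello|five"));
    }
}
```

The first two inputs are accepted and the last two rejected. Note that this pattern does not allow a comma or | inside the ANYTHING part.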
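I'd suggest splitting your string into an array on the | delimiter (written as "\\|" in Java, because split takes a regex) and validating each part separately. Each part except the first should match the pattern \d+(,.*)?
UPDATED
Split by , and validate each part with .*\|\d+ (the | has to be escaped, otherwise it means alternation).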
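A minimal sketch of the split-by-comma variant, with the | in the per-part pattern escaped so that it is a literal pipe. SplitValidator and isValid are hypothetical names, and the check for an empty array is an extra guard that is not part of the answer:

```java
class SplitValidator {
    // Hypothetical helper: splits on commas and checks each piece on its own.
    static boolean isValid(String input) {
        String[] parts = input.split(",");
        // split drops trailing empty strings, so an input made only of commas gives no parts.
        if (parts.length == 0) {
            return false;
        }
        for (String part : parts) {
            // Anything, then a literal '|', then digits only.
            if (!part.matches(".*\\|\\d+")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("hello|5,word|10"));
        System.out.println(isValid("hello|5,word|10,"));
        System.out.println(isValid("hello|5,word"));
    }
}
```

Because split discards trailing empty strings, a trailing comma is tolerated here without any special handling.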
I am trying to take from a file all the valid words. Valid words are defined as normal characters that can appear like so:
don't won't can't
and I have to ignore commas, periods and exclamation points.
I have gotten the expression to just get characters but now it won't get words like don't and can't or won't.
This is the expression I am using "[^A-Za-z]+" and I have tried "\'[^A-Za-z]+" but this breaks and allows all characters. Does anyone have any idea what I can use to get normal words including don't and won't and can't and such words.
Thank you very much
[^A-Za-z] means anything NOT in those character ranges! Try this instead:
[A-Za-z']+
In Java the single quote needs no escaping, neither in a string literal nor in a regex. If you escape it anyway, you also have to escape the backslash that escapes it, and the result is equivalent:
[A-Za-z\\']+
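To show the positive character class at work, here is a small sketch that pulls the words out of a line. It assumes a + quantifier so that whole words are matched rather than single characters, and WordExtractor and words are made-up names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordExtractor {
    // One or more letters or apostrophes in a row.
    private static final Pattern WORD = Pattern.compile("[A-Za-z']+");

    // Hypothetical helper: returns every word found in the line, in order.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Don't stop, won't you? Can't!"));
    }
}
```

Commas, periods, question marks and exclamation points are simply never part of a match, so they are ignored. If the original expression was being used as a delimiter instead, the negated form [^A-Za-z']+ would play the same role.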
Another way (using abbreviations) is: \b[\w']+
This will match letters from any language and exclude numbers. Note that it also accepts ! and ? in addition to the apostrophe:
\b[\p{L}\!\'\?]+
Here is a very good resource for regular expressions.
http://www.regular-expressions.info/