I'm trying to come up with a pattern for finding every text that is between double or single quotation marks in java source code. This is what I have:
"(.*?)"|'(.*?)'
Debuggex Demo
This works for almost every case I guess except one:
"text\"moretext\"evenmore"
Debuggex Demo
This is a valid String literal, because the inner quotes are escaped, but the pattern does not recognize the inner part moretext.
Any ideas for a pattern that accounts for this case?
You can use this regex to match single- or double-quoted strings while ignoring escaped quotes:
(["'])([^\\]*?(?:\\.[^\\]*?)*)\1
RegEx Demo
RegEx Breakup:
(["']): Match single or double quote and capture it in group #1
(: Start Capturing group #2
[^\\]*?: Match 0 or more characters that are not a \ (lazily)
(?:: Start non-capturing group
\\: Match a \
.: Followed by any character that is escaped
[^\\]*?: Followed by 0 or more of any non-\ characters
)*: End non-capturing group. Match 0 or more of this non-capturing group
): End capturing group #2
\1: Match the closing quote, which must be the same character that was captured in group #1
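To try the breakup above from Java, here is a minimal sketch. The class name QuotedLiteralFinder, the method findQuoted and the sample input are made up for illustration; only the regex comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedLiteralFinder {
    // The regex from the answer as a Java string literal: every backslash
    // is doubled and the double quote inside the character class is escaped.
    static final Pattern QUOTED =
            Pattern.compile("([\"'])([^\\\\]*?(?:\\\\.[^\\\\]*?)*)\\1");

    // Collects the text between the quotes (group #2) of every match.
    static List<String> findQuoted(String source) {
        List<String> contents = new ArrayList<>();
        Matcher m = QUOTED.matcher(source);
        while (m.find()) {
            contents.add(m.group(2));
        }
        return contents;
    }

    public static void main(String[] args) {
        // Hypothetical input text: say "a\"b" and 'c'
        System.out.println(findQuoted("say \"a\\\"b\" and 'c'"));
    }
}
```

Group #2 holds the raw contents, so escape sequences such as \" are returned as written and are not unescaped.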
That should work: "([^"\\]|\\.)*"|'([^'\\]|\\.)*' Regexr test.
Explanation:
" matches ".
[^"\\]|\\. matches either any character other than " and \, or a \ followed by any character (so an escaped \" is consumed as a pair and does not end the string).
* repeats that group 0 or more times.
" matches "
Same for '.
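A small sketch of the alternation variant in Java, returning the full literals including their quotes. AlternationQuoteFinder and findLiterals are illustrative names, and the sample input is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlternationQuoteFinder {
    // "([^"\\]|\\.)*"|'([^'\\]|\\.)*' written as a Java string literal.
    static final Pattern LITERAL =
            Pattern.compile("\"([^\"\\\\]|\\\\.)*\"|'([^'\\\\]|\\\\.)*'");

    // Returns every complete match, i.e. the literal with its quotes.
    static List<String> findLiterals(String source) {
        List<String> literals = new ArrayList<>();
        Matcher m = LITERAL.matcher(source);
        while (m.find()) {
            literals.add(m.group());
        }
        return literals;
    }

    public static void main(String[] args) {
        // Hypothetical input: String s = "text\"moretext\"evenmore"; char c = '\'';
        String src = "String s = \"text\\\"moretext\\\"evenmore\"; char c = '\\'';";
        findLiterals(src).forEach(System.out::println);
    }
}
```

Because the capturing groups sit inside a repetition, they only keep the last repeated character; the sketch therefore reads the whole match with group() instead.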
I get three parameters in a string. Each parameter is written in the form: Quotes, Name, Quotes, Equals sign, Quotes, Text, Quotes. The parameter separator is a space.
Example 1:
"param1"="Peter" "param2"="Harald" "param3"="Marie"
With java.util.regex.Matcher I can find any name and text by the following regex:
"([^"]*)"\s*=\s*"([^"]*)"
Now, however, there may be a quotation mark in the text. It is escaped with a backslash.
Example 2:
"param1"="Peter" "param2"="Har\"ald" "param3"="Marie"
I have built the following regex:
"([^"]*)"\s*=\s*("([^"]*(\\")*[^"]*)*[^\\]")
This works well for example 2, but is not a universal solution.
If the backslash is at the end of a parameter-value, the solution does not work anymore.
Example 3:
"param1"="Peter" "param2"="Harald\" "param3"="Marie"
If the backslash is at the end of the value, the matcher interprets "Harald\" " as the value of parameter 2 instead of "Harald\".
Do you have a universal solution for this problem? Thanks in advance for your input.
Kind regards
Dominik
You may use this regex in Java:
\"([^\"]*)\"\h*=\h*(\"[^\\\"]*(?:\\(?=\"(?:\h|$))|(?:\\.[^\\\"]*))*\")
RegEx Demo
RegEx Details:
\"([^\"]*)\": Match a quoted string as the parameter name and capture the name in group #1
\h*=\h*: Match = surrounded by optional horizontal whitespace
(: Start capture group #2
\": Match opening "
[^\\\"]*: Match 0 or more of non-quote, non-backslash characters
(?:: Start non-capturing group
\\: Match a \
(?=\"(?:\h|$)): Must be followed by a " that has whitespace or the end of the line after it
|: OR
(?:\\.[^\\\"]*))*: Match an escaped character followed by 0 or more of non-quote, non-backslash characters. End non-capturing group and repeat it 0 or more times
\": Match closing "
): End capture group #2
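Here is a rough sketch of how the regex could be used with java.util.regex.Matcher to collect the parameters. ParameterParser and parse are names invented for this example; the inputs in the test below are examples 2 and 3 from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParameterParser {
    // The regex from the answer with all backslashes doubled for Java.
    static final Pattern PARAM = Pattern.compile(
            "\"([^\"]*)\"\\h*=\\h*"
            + "(\"[^\\\\\"]*(?:\\\\(?=\"(?:\\h|$))|(?:\\\\.[^\\\\\"]*))*\")");

    // Maps each parameter name (group #1) to its value (group #2).
    // The value keeps its surrounding quotes, exactly as the regex captures it.
    static Map<String, String> parse(String input) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher m = PARAM.matcher(input);
        while (m.find()) {
            params.put(m.group(1), m.group(2));
        }
        return params;
    }

    public static void main(String[] args) {
        // Example 3 from the question: the value of param2 ends in a backslash.
        String example3 =
                "\"param1\"=\"Peter\" \"param2\"=\"Harald\\\" \"param3\"=\"Marie\"";
        parse(example3).forEach((name, value) ->
                System.out.println(name + " -> " + value));
    }
}
```

Note that \h (horizontal whitespace) needs Java 8 or later.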
I need some help with a Java regexp.
I'm working with a file in a JSON-like format:
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',value:'xxxx'},{},[]],
['zul.wgt.Label','f6DQpf',{sclass:'class notranslate',style:'font-weight: bold;',prolog:' ',value:'xxxx'},
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',label:'xxxx'},{},[]]
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',label:'xxxx'},{},[]]
I need to match any label or value data that is not preceded by a "notranslate" value on the sclass property.
I have an almost-working regexp, but I need a final push so that it matches only what I described above:
((?!.*?notranslate)sclass:'[\w\s]+'.*?)((value|label):'(.*?)')
Right now it matches anything from sclass that is not followed by 'notranslate'.
Thanks for your help
The values of your current regex are in the 4th capturing group
You could also use 1 capturing group instead of 4:
^(?!.*\bsclass:'[^']*\bnotranslate\b[^']*').*\b(?:label|value):'([^']+)'
Regex demo
That would match:
^ Assert start of the string
(?! Negative lookahead to assert that what is on the right is not
.*\bsclass: Match any character 0+ times followed by sclass:
'[^']*\bnotranslate\b[^']*' Match notranslate between single quotes and word boundaries
) Close negative lookahead
.* match any character 0+ times
\b(?:label|value): Match either label or value followed by :
'([^']+)' Match ', capture in a group matching not ' 1+ times and match '
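For illustration, the pattern could be applied line by line roughly like this. TranslatableValueExtractor and extract are hypothetical names, and the sketch assumes each record sits on its own line, as in the sample from the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TranslatableValueExtractor {
    // The single-group regex from the answer, escaped for a Java string.
    static final Pattern TRANSLATABLE = Pattern.compile(
            "^(?!.*\\bsclass:'[^']*\\bnotranslate\\b[^']*')"
            + ".*\\b(?:label|value):'([^']+)'");

    // Returns the label/value text of one line, or empty when the line
    // carries notranslate in its sclass or has no label/value at all.
    static Optional<String> extract(String line) {
        Matcher m = TRANSLATABLE.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String plain = "['zul.wgt.Label','f6DQof',{sclass:'class',"
                + "style:'font-weight: bold;',prolog:' ',value:'xxxx'},{},[]],";
        String skipped = "['zul.wgt.Label','f6DQpf',{sclass:'class notranslate',"
                + "style:'font-weight: bold;',prolog:' ',value:'xxxx'},";
        System.out.println(extract(plain));
        System.out.println(extract(skipped));
    }
}
```

Since .* is greedy, a line with several label or value entries yields only the last one.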
Java demo
So I'm trying to separate the following two groups formatted as:
FIRST - GrouP second.group.txt
The first group can contain any character
The second group is a dot(.) delimited string.
I'm using the following regex to separate these two groups:
([A-Z].+).*?([a-z]+\.[a-z]+)
However, it gives a wrong result:
1: FIRST - GrouP second.grou
2: p.txt
I don't understand, because I'm using the "non-greedy" separator (.*?) instead of the greedy one (.*).
What am I doing wrong here?
Thanks
You can use this regex to match both groups:
\b([A-Z].+?)\s*\b([a-z]+(?:\.[a-z]+)+)\b
RegEx Demo
Breakup:
\b # word boundary
([A-Z].+?) # match [A-Z] followed by 1 or more chars (lazy)
\s* # match 0 or more spaces
\b # word boundary
([a-z]+ # match 1 or more of [a-z] chars
(?:\.[a-z]+)+) # match a group of dot followed by 1 or more [a-z] chars
\b # word boundary
PS: (?:..) is used for non-capturing group.
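A quick way to check the regex above against the sample from the question is a sketch like the following. GroupSplitter and split are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupSplitter {
    // The regex from the answer, escaped for a Java string literal.
    static final Pattern TWO_GROUPS = Pattern.compile(
            "\\b([A-Z].+?)\\s*\\b([a-z]+(?:\\.[a-z]+)+)\\b");

    // Returns {first group, second group}, or null when nothing matches.
    static String[] split(String input) {
        Matcher m = TWO_GROUPS.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("FIRST - GrouP second.group.txt");
        System.out.println("1: " + parts[0]);
        System.out.println("2: " + parts[1]);
    }
}
```

The lazy .+? in the first group stops as early as the rest of the pattern allows, and the word boundary keeps the second group from starting in the middle of GrouP.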
This is one possible solution that should be pretty compact:
(.*?-\s*\S+)|(\S+\.?)+
https://regex101.com/r/iW8mE5/1
It looks for anything followed by a dash, zero or more spaces, and then non-whitespace characters. If it doesn't find that, it looks for runs of non-whitespace characters, each followed by an optional dot.
I want to write a regex in Java to check if a string ends in a double consonant.
My regex is not working.
\\w+[^aeiou]\\1$
Appreciate your help
Thanks a ton.
It doesn't work since \1 references a non-existent subpattern. You need to assign a capturing group. Capturing groups could be used later on in the regular expression as a backreference to what was matched in that captured group.
\\w+([^aeiou])\\1$
Based off the comment above about your regular expression not only matching double consonants, I would consider combining an intersection with negation to make sure the grouped character is an actual letter character.
(?i)\\w+([a-z&&[^aeiou]])\\1$
This might work.
# "(?i)\\w+(?:(?![aeiou])[a-z]){2}$"
(?i) # Case independent
\w+
(?:
(?! [aeiou] ) # Not a vowel ahead
[a-z] # Consonant only
){2}
$
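The two suggestions are not equivalent, which a small comparison can show. In this sketch (class and method names are made up) the backreference version requires the same consonant twice, while the lookahead version accepts any two consonants at the end.

```java
import java.util.regex.Pattern;

final class DoubleConsonantCheck {
    // Same consonant twice at the end (backreference to group #1).
    static final Pattern SAME_CONSONANT =
            Pattern.compile("(?i)\\w+([a-z&&[^aeiou]])\\1$");

    // Any two consonants at the end (negative lookahead before each letter).
    static final Pattern ANY_TWO_CONSONANTS =
            Pattern.compile("(?i)\\w+(?:(?![aeiou])[a-z]){2}$");

    static boolean endsInSameConsonantTwice(String word) {
        return SAME_CONSONANT.matcher(word).find();
    }

    static boolean endsInTwoConsonants(String word) {
        return ANY_TWO_CONSONANTS.matcher(word).find();
    }

    public static void main(String[] args) {
        for (String word : new String[] { "buzz", "test", "tree" }) {
            System.out.println(word
                    + " same=" + endsInSameConsonantTwice(word)
                    + " anyTwo=" + endsInTwoConsonants(word));
        }
    }
}
```

Both patterns start with \w+, so they need at least one character in front of the two consonants.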
I have a problem with the following expression:
String REGEX_Miasto_Dwu_Czlonowe="\D+\s\D+";
Pattern pat_Miasto = Pattern.compile(REGEX_Miasto_Dwu_Czlonowe);
Matcher mat_Miasto_Dwu_Czlonowe = pat_Miasto.matcher(adres);
Because the above pattern matches
"80-227 GDAŃSK DOSTUDZIENKI 666";
"83000 PRUSZCZ GDANSKI UL. TYSIACLECIA 666";
But it should only match this expression: "PRUSZCZ GDANSKI UL. TYSIACLECIA 666";
THX for help.
There are some problems in your regexp.
First, you have to change all backslashes to double backslashes in the Java string literal; since you say you get matches, this could just be a copy-and-paste error in the question.
\D matches non-digits. Was this your intention?
\D+\s\D+
Debuggex Demo
Therefore you match some non digits followed by one space followed by some non digits.
I think it is more or less by accident that your expression matches.
This could be a solution for your regexp:
^\d+\D+\d+$
Debuggex Demo
If you want to match your second line as a literal, so that only this line matches, you could use:
Pattern.quote(String s)
to build this kind of expression from a simple String. All regex metacharacters are escaped for you.
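As a hedged illustration of literal matching: Pattern.quote is the java.util.regex call that turns a plain String into a pattern matching exactly that text, whereas Matcher.quoteReplacement is meant for replacement strings. The class name LiteralPatternDemo and its method are invented for this sketch.

```java
import java.util.regex.Pattern;

final class LiteralPatternDemo {
    // True only when input is exactly the literal text; metacharacters
    // such as the dot in "UL." lose their special meaning.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).matches();
    }

    public static void main(String[] args) {
        String literal = "PRUSZCZ GDANSKI UL. TYSIACLECIA 666";
        String lookalike = "PRUSZCZ GDANSKI ULX TYSIACLECIA 666";
        System.out.println(matchesLiterally(literal, literal));
        System.out.println(matchesLiterally(literal, lookalike));
        // Without quoting, the dot is a wildcard and also accepts the X.
        System.out.println(Pattern.matches(literal, lookalike));
    }
}
```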
Your regex matches one or more non-digits (\D+), a whitespace character (\s), then one or more non-digits again. So it matches the first string:
80-227 GDAŃSK DOSTUDZIENKI 666
// ^__________\D+_________^^____________\D+_________^
// |
// a space
I guess you want:
[^\s\d]+\s[^\s\d]+
One or more non-whitespace, non-digit characters, a whitespace character, then one or more non-whitespace, non-digit characters.
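To see what each pattern actually picks up, here is a small sketch that prints the first match of a regex in an address. AddressPatternDemo and firstMatch are hypothetical helpers; the addresses are the two from the question, with Ń written as \u0143 to avoid source-encoding problems.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AddressPatternDemo {
    // Returns the first substring the regex finds in input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String first = "80-227 GDA\u0143SK DOSTUDZIENKI 666";
        String second = "83000 PRUSZCZ GDANSKI UL. TYSIACLECIA 666";
        // Original pattern: \D also matches spaces, so the match
        // includes the blanks around the two words.
        System.out.println("[" + firstMatch("\\D+\\s\\D+", first) + "]");
        // Tightened pattern: two runs without whitespace or digits.
        System.out.println("[" + firstMatch("[^\\s\\d]+\\s[^\\s\\d]+", first) + "]");
        System.out.println("[" + firstMatch("[^\\s\\d]+\\s[^\\s\\d]+", second) + "]");
    }
}
```

As far as I can tell, the tightened pattern still finds a two-word run in the first address, so telling a two-part city name apart from a city followed by a street would need extra information beyond the regex.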