I am trying to create a single regex that produces the following results on two different example texts:
Example 1
Example text 1: "App Name: Person Name"
Captured group 1: "App Name"
Captured group 2: "Person Name"
Example 2
Example text 2: "App Name (1 factor): Person Name"
Captured group 1: "App Name"
Captured group 2: "Person Name"
The regex I have come up with is: (.*)(\s\(.*\))?:\s(.*)
But it doesn't seem to capture correctly, and I can't see why.
I am trying this in Java on Android (and I am using a double backslash to escape in the string).
I think what you're looking for is something like:
([A-Za-z0-9\s]*)(\s\(.*\))?:\s(.*)
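The (.*) in your first group is greedy: it grabs everything before the colon, including the parenthesized part, so the optional group never gets a chance to match. You have to specify what kind of characters can come before the (. I used regex101.com to test, and it seems to work for your provided cases. Note that the person's name ends up in group 3 here, because the optional parenthesized part is itself group 2.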
You may use
^(.*?)(?:\s*\([^()]*\))?:\s*(.*)$
See the regex demo.
Details
^ - start of string
(.*?) - Capturing group 1: any zero or more chars other than line break chars, as few as possible
(?:\s*\([^()]*\))? - an optional non-capturing group matching 1 or 0 occurrences of
\s* - 0+ whitespaces
\([^()]*\) - a (, zero or more chars other than ( and ) and then )
: - a colon
\s* - 0 or more whitespaces
(.*) - Capturing group 2: any zero or more chars other than line break chars, as many as possible
$ - end of string.
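As a rough illustration of how the pattern above could be used from Java, here is a minimal sketch. The class name AppNameParser and the method parse are made up for this example; only the regex itself comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for illustration only.
class AppNameParser {
    // The pattern from the answer, with backslashes doubled for the Java string literal.
    private static final Pattern LINE =
            Pattern.compile("^(.*?)(?:\\s*\\([^()]*\\))?:\\s*(.*)$");

    /** Returns {group 1, group 2}, or null if the text does not match. */
    static String[] parse(String text) {
        Matcher m = LINE.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "App Name: Person Name",
            "App Name (1 factor): Person Name"
        };
        for (String text : samples) {
            String[] parts = parse(text);
            // Both samples print: App Name | Person Name
            System.out.println(parts[0] + " | " + parts[1]);
        }
    }
}
```

Because the parenthesized part sits in a non-capturing group, the person's name stays in group 2 for both inputs.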
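Just exclude the colon from both capturing groups, like so (note that group 1 will then still include the parenthesized part, e.g. "App Name (1 factor)"):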
Pattern
([^:\n]+)\s*:\s*([^:\n]+)
See the demo here.
Related
Question: How can I first capture a group(s) between two characters, and second match a character within that matched group(s)?
Given Input:
atribute="value1" AND atrribute="*value2"
Problem 1:
I want to capture a group between two characters, unlimited number of times.
Regex solution:
(?<==|!=|>|>=|<|<=|IN|NOT IN).*?(?=AND|OR|$)
Captured groups:
"value1"
"*value2"
Problem 2:
I want to match a character within the captured group(s)
Attempted regex solution 1:
(\*)(?<==|!=|>|>=|<|<=|IN|NOT IN).*?(?=AND|OR|$)
Attempted regex solution 2:
[*](?<==|!=|>|>=|<|<=|IN|NOT IN).*?(?=AND|OR|$)
My issue: neither of the attempted solutions above captures the asterisk in the input string. How can I achieve this?
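Your attempts put the asterisk before the lookbehind, so the lookbehind is checked right after the *, where the preceding character is the * itself rather than an operator, and the pattern can never match. Instead, you can place the capture group after the lookbehind: optionally match a " and then capture the asterisk.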
(?<==|!=|>|>=|<|<=|IN|NOT IN)(?:\"(\*))?.*?(?=AND|OR|$)
Regex demo
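Since lookbehinds with alternatives of bounded length are accepted by java.util.regex, the pattern above can be tried in Java as is. The following is only a sketch: the class name WildcardValues and the method hasLeadingAsterisk are invented for the example, and it reports, for each matched value, whether group 1 captured an asterisk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for illustration only.
class WildcardValues {
    // The pattern from the answer. \" in the Java literal is a plain double quote.
    private static final Pattern VALUE = Pattern.compile(
            "(?<==|!=|>|>=|<|<=|IN|NOT IN)(?:\"(\\*))?.*?(?=AND|OR|$)");

    /** One entry per matched value: true if it starts with "* (group 1 is set). */
    static List<Boolean> hasLeadingAsterisk(String input) {
        List<Boolean> result = new ArrayList<>();
        Matcher m = VALUE.matcher(input);
        while (m.find()) {
            // group(1) is null when the optional (?:"(\*)) part did not take part in the match.
            result.add(m.group(1) != null);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "atribute=\"value1\" AND atrribute=\"*value2\"";
        // Prints [false, true] for the input from the question.
        System.out.println(hasLeadingAsterisk(input));
    }
}
```

The whole value is still available through m.group(), so the asterisk check does not replace the original match.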
I need to replace the string 'name' with fullName in strings of the following kind:
software : (publisher:abc and name:oracle)
This needs to be replaced as:
software : (publisher:abc and fullName:xyz)
Now, basically, the part "name:xyz" can come anywhere inside the parentheses, e.g.
software:(name:xyz)
I am trying to use groups, and the regex I built looks like this:
(\bsoftware\s*?:\s*?\()((.*?)(\s*?(and|or)\s*?))(\bname:.*?\)\s|:.*?\)$)
You may use
\b(software\s*:\s*\([^()]*)\bname:\w+
and replace with $1fullName:xyz. See the regex demo.
Details
\b - word boundary
(software\s*:\s*\([^()]*) - Capturing group 1 ($1 in the replacement pattern is a placeholder for the value captured in this group):
software - a word
\s*:\s* - a : enclosed with 0+ whitespaces
\( - a ( char
[^()]* - 0 or more chars other than ( and )
\bname - whole word name
: - colon
\w+ - 1 or more letters, digits or underscores.
Java sample code:
String result = s.replaceAll("\\b(software\\s*:\\s*\\([^()]*)\\bname:\\w+", "$1fullName:xyz");
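For completeness, here is a small self-contained sketch built around the line above. The class name FullNameRewriter and both method names are hypothetical. The second method is an assumption on my part, not something the answer proposes: in case the original value should be kept rather than overwritten with xyz, a second capturing group can carry it into the replacement.

```java
// Hypothetical helper class; names are assumptions for illustration only.
class FullNameRewriter {
    /** The replacement from the answer: the value after name: is overwritten with xyz. */
    static String rewrite(String s) {
        return s.replaceAll(
                "\\b(software\\s*:\\s*\\([^()]*)\\bname:\\w+",
                "$1fullName:xyz");
    }

    /** Assumed variant: group 2 keeps whatever value followed name:. */
    static String renameKeepingValue(String s) {
        return s.replaceAll(
                "\\b(software\\s*:\\s*\\([^()]*)\\bname:(\\w+)",
                "$1fullName:$2");
    }

    public static void main(String[] args) {
        String s = "software : (publisher:abc and name:oracle)";
        System.out.println(rewrite(s));            // software : (publisher:abc and fullName:xyz)
        System.out.println(renameKeepingValue(s)); // software : (publisher:abc and fullName:oracle)
    }
}
```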
I need some help with a Java regexp.
I'm working with a file that has a JSON-like format:
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',value:'xxxx'},{},[]],
['zul.wgt.Label','f6DQpf',{sclass:'class notranslate',style:'font-weight: bold;',prolog:' ',value:'xxxx'},
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',label:'xxxx'},{},[]]
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',label:'xxxx'},{},[]]
I need to match any label or value data that is not preceded by a "notranslate" value in the sclass property.
I've been working on an almost-working regexp, but I need a final push to match only what I described above:
((?!.*?notranslate)sclass:'[\w\s]+'.*?)((value|label):'(.*?)')
Right now it matches anything from an sclass that is not followed by 'notranslate'.
Thanks for your help
The values of your current regex are in the 4th capturing group
You could also use 1 capturing group instead of 4:
^(?!.*\bsclass:'[^']*\bnotranslate\b[^']*').*\b(?:label|value):'([^']+)'
Regex demo
That would match:
^ Assert start of the string
(?! Negative lookahead to assert that what is on the right does not match
.*\bsclass: Match any character 0+ times followed by sclass:
'[^']*\bnotranslate\b[^']*' Match notranslate between single quotes and word boundaries
) Close negative lookahead
.* Match any character 0+ times
\b(?:label|value): Match either label or value followed by :
'([^']+)' Match ', capture in a group matching not ' 1+ times and match '
Java demo
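A minimal sketch of how the single-group pattern might be applied line by line in Java follows. The class name TranslatableValues and the method extract are made up for this example; it assumes each widget entry sits on its own line, as in the sample file, because ^ only anchors at the start of the input here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for illustration only.
class TranslatableValues {
    // The pattern from the answer, escaped for a Java string literal.
    private static final Pattern LINE = Pattern.compile(
            "^(?!.*\\bsclass:'[^']*\\bnotranslate\\b[^']*').*\\b(?:label|value):'([^']+)'");

    /** Returns the label/value text, or null if the line is marked notranslate or has none. */
    static String extract(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',value:'xxxx'},{},[]],",
            "['zul.wgt.Label','f6DQpf',{sclass:'class notranslate',style:'font-weight: bold;',prolog:' ',value:'xxxx'},"
        };
        for (String line : lines) {
            // First line prints xxxx, second prints null (it carries notranslate).
            System.out.println(extract(line));
        }
    }
}
```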
I have a string that can come in different forms - one example below
%VAR('SERVER','DEFAULT') %VAR('LOC','NYC')Run_ServerRestart.ps1
I want to separate them into individual items, but I am not able to separate out the command at the end as a match. My regex returns 2 matches for the above example, while I am looking to get 3. Any ideas on what I may be doing wrong, how to separate them out as below, and whether there is a more efficient way to achieve this? We need to accommodate spaces anywhere in between as well.
%VAR('SERVER','DEFAULT')
%VAR('LOC','NYC')
Run_ServerRestart.ps1
Here is my regex so far:
/%VAR\(\s*'([^']+)'\s*\)*\,*\s*'([^']+)'\s*\)|/*\\*[\w-]+\s*\.?\S*/g
However, the regex above does not match some of my examples below.
c:\abc\def.txt - should show 1 match
%VAR('SERVER','USA')\C:\batch.bat - should show 2 matches - %VAR('SERVER', 'USA') and \c:\batch.bat
%VAR('SERVER','NYC') - should show 1 match
%VAR('SERVER','NYC') %VAR('APP','NNJ')Run_Command.ps1 - should show 3 matches - %VAR('SERVER','NYC'), %VAR('APP','NNJ') and Run_Command.ps1
%VAR('SERVER','NYC') -File - should show 2 matches - %VAR('SERVER','NYC') and -File
/usr/bin/cat - should show 1 match
%VAR('SERVER','NYC')BATCH1.bat - should show 2 matches - %VAR('SERVER','NYC') and BATCH1.bat
ftp -s:D:\\apps\\scripts\\Intel\\daily_job.ftp - should show 1 match
%VAR('SERVER') - should show 1 match
Regex: %VAR\([^\)]+\)|[\S]+(?:\s[\S]+)?
Details:
[^] Match a single character not present in the list
[] Match a single character present in the list
+ Matches between one and unlimited times
* Matches between zero and unlimited times
| or
(?:) Non capturing group
? Matches between zero and one times
\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
\s matches any whitespace character (equal to [\r\n\t\f\v ])
Regex demo
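To check the pattern against a few of the examples from the question, something along these lines could be used in Java. This is a sketch only: the class name CommandTokenizer and the method tokens are hypothetical, and it simply collects every match of the regex above in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for illustration only.
class CommandTokenizer {
    // The regex from the answer: either a %VAR(...) block, or a run of
    // non-whitespace optionally followed by one more whitespace-separated run.
    private static final Pattern TOKEN =
            Pattern.compile("%VAR\\([^\\)]+\\)|[\\S]+(?:\\s[\\S]+)?");

    /** Returns all matches in the order they occur in the input. */
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String[] samples = {
            "%VAR('SERVER','DEFAULT') %VAR('LOC','NYC')Run_ServerRestart.ps1",
            "%VAR('SERVER','NYC') -File",
            "/usr/bin/cat",
            "%VAR('SERVER')"
        };
        for (String s : samples) {
            List<String> found = tokens(s);
            System.out.println(found.size() + " match(es): " + found);
        }
    }
}
```

The order of the alternatives matters: the %VAR(...) branch is tried first at each position, so a variable block is matched as a unit before the general branch gets a chance.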
Assuming they always look exactly like this, you can just abuse the ) as a separator:
(.*\))\s(.*\))(.*)
Example with explanation here: https://regex101.com/r/CLBFKa/1
I'm trying to split the string below into 3 groups, but it doesn't seem to work as expected with the pattern that I'm using. Namely, when I invoke matcher.group(3), I'm getting a null value instead of *;+g.3gpp.cs-voice;require. What's wrong with the pattern?
String: "*;+g.oma.sip-im;explicit,*;+g.3gpp.cs-voice;require"
Pattern: (\\*;.*)?(\\*;.*?\\+g.oma.sip-im.*?)(,\\*;.*)?
Expected:
Group 1: null,
Group 2: *;+g.oma.sip-im;explicit,
Group 3: ,*;+g.3gpp.cs-voice;require
Actual:
Group 1: null,
Group 2: *;+g.oma.sip-im,
Group 3: null
The result you get does actually match your pattern, in a non-greedy way. Group 2 is expanded to the shortest possible result
*;+g.oma.sip-im
and then the last group is left out because of the question mark at the very end. It appears to me that you are building a far too complicated regex for your purpose.
The thing is that (,\*;.*)? matches nothing because the text you expect is located further along in the string: the lazy .*? in Group 2 stops right after sip-im, where the next character is ; rather than a comma. You need to make the third group obligatory by removing the ? at the end, and wrap .*? together with Group 3 in an optional non-capturing group:
String pat = "(\\*;.*)?(\\*;.*?\\+g\\.oma\\.sip-im)(?:.*?(,\\*;.*))?";
See the regex demo.
Note that literal dots should be escaped in the regex pattern.
Details:
(\\*;.*)? - Group 1 (optional) capturing
\\*; - a *; string
.* - any zero or more chars other than linebreak symbols, as many as possible
(\\*;.*?\\+g\\.oma\\.sip-im) - Group 2 (obligatory) capturing
\\*; - a *; string
.*? - any zero or more chars other than linebreak symbols, as few as possible
\\+g\\.oma\\.sip-im - a literal string +g.oma.sip-im
(?:.*?(,\\*;.*))? - non-capturing group (optional) matching
.*? - any zero or more chars other than linebreak symbols, as few as possible
(,\\*;.*) - Group 3 (obligatory) capturing a comma followed by the same pattern as in Group 1.
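The following sketch runs the corrected pattern against the string from the question. The class name FeatureTagGroups and the method groups are invented for the example. Note that with this pattern Group 2 ends at sip-im, so the ;explicit part falls into the lazy .*? between Group 2 and Group 3 rather than into either group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for illustration only.
class FeatureTagGroups {
    // The corrected pattern from the answer above.
    private static final Pattern PAT = Pattern.compile(
            "(\\*;.*)?(\\*;.*?\\+g\\.oma\\.sip-im)(?:.*?(,\\*;.*))?");

    /** Returns {group 1, group 2, group 3}; unmatched optional groups are null. */
    static String[] groups(String input) {
        Matcher m = PAT.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] g = groups("*;+g.oma.sip-im;explicit,*;+g.3gpp.cs-voice;require");
        System.out.println("Group 1: " + g[0]); // null
        System.out.println("Group 2: " + g[1]); // *;+g.oma.sip-im
        System.out.println("Group 3: " + g[2]); // ,*;+g.3gpp.cs-voice;require
    }
}
```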