I have a string with 5 pieces of data delimited by underscores:
AAA_BBB_CCC_DDD_EEE
I want a different regex for each component.
The regex needs to return just the one component.
For example, the first would return just AAA, the second for BBB, etc.
I am able to parse out AAA with the following:
^([^_]*)?
I see that I can do a look-around like this to find:
(?<=[^_]*_).*
BBB_CCC_DDD_EEE
But the following cannot find just BBB:
(?<=[^_]*_)[^_]*(?=_)
Mixing lookbehind and lookahead
^([^_]+)? // 1st
(?<=_)[^_]+ // 2nd
(?<=_)[^_]+(?=_[^_]+_[^_]+$) // 3rd
(?<=_)[^_]+(?=_[^_]+$) // 4th
[^_]+$ // 5th
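As a rough Java sketch of the five patterns above (the class name `LookaroundFields` and the helper `firstMatch` are made up for illustration), each pattern can be run with `Matcher.find()`, taking only the first match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the patterns are the five listed in the answer above.
class LookaroundFields {
    static final String[] PATTERNS = {
        "^([^_]+)?",                     // 1st
        "(?<=_)[^_]+",                   // 2nd (only the first match is BBB)
        "(?<=_)[^_]+(?=_[^_]+_[^_]+$)",  // 3rd
        "(?<=_)[^_]+(?=_[^_]+$)",        // 4th
        "[^_]+$"                         // 5th
    };

    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String regex : PATTERNS) {
            System.out.println(firstMatch(regex, "AAA_BBB_CCC_DDD_EEE"));
        }
    }
}
```

The 3rd and 4th patterns count fields from the end of the string, so this sketch assumes the input always has exactly five components.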
Only if the lengths of the strings between the "_" are known, it can be done like this:
1st match
^([^_]+)?
2nd match
(?<=_)\K[^_]+
3rd match
(?<=_[A-Za-z]{3}_)\K[^_]+
4th match
(?<=_[A-Za-z]{3}_[A-Za-z]{3}_)\K[^_]+
5th match
(?<=_[A-Za-z]{3}_[A-Za-z]{3}_[A-Za-z]{3}_)\K[^_]+
Each {3} expresses the length of the string between the "_" characters.
If your string always uses underscores as delimiters, you might use a single regex and capture the value in a capturing group by repeating the pattern for what comes before it (in this case, characters that are NOT an underscore, followed by an underscore) with a quantifier such as {3}, which you can change.
This way, the quantifier specifies how many times to repeat the preceding pattern before capturing your match. For your example string AAA_BBB_CCC_DDD_EEE you could use {0}, {1}, {2}, {3} or {4}.
^(?:[^_\n]+_){3}([0-9A-Za-z]+)(?:_[^_\n]+)*$
That would match:
^ Assert position at start of the line
(?:[^_\n]+_){3} In a non-capturing group (?:, match one or more characters that are NOT an underscore or a newline [^_\n]+, followed by an underscore, and repeat that n times (in this example n is 3)
([0-9A-Za-z]+) Capture your characters in a group using, for example, a character class (or use [^_]+ to match anything but an underscore, although that will also match whitespace characters)
(?:_[^_\n]+)* After the captured value, repeat zero or more times a non-capturing group that matches an underscore followed by one or more characters that are NOT an underscore or a newline, so that the whole line is matched
$ Assert position at the end of the line
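A minimal Java sketch of this approach, assuming a 0-based field index that is spliced into the quantifier (the class name `UnderscoreField` and the method `field` are hypothetical, chosen only for this example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the quantifier-based pattern above.
class UnderscoreField {
    // index is 0-based: {0} captures the 1st component, {4} the 5th.
    static String field(String input, int index) {
        Pattern p = Pattern.compile(
            "^(?:[^_\\n]+_){" + index + "}([0-9A-Za-z]+)(?:_[^_\\n]+)*$");
        Matcher m = p.matcher(input);
        // Group 1 holds the captured component; null means no such field.
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println(field("AAA_BBB_CCC_DDD_EEE", i));
        }
    }
}
```

Because only the number inside the braces changes, one pattern template covers all five components, and an index past the last field simply yields no match.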
Related
I want to extend the regex below so that it also meets the following criteria:
^[\p{L}\d'][ \p{L}\d'-]*[\p{L}\d'-']$
Should start with a letter (A-Z or a-z) only.
Can also accept a single letter.
Accept a hyphen (-), space, or dot (.) in the middle or at the end of the string. (No other special characters.)
Accept numbers in the middle and at the end of the string.
It should also keep meeting the criteria that the existing regex already covers.
E.g.
Expected -
t, T, test, test123, te12st, te-st, te.st, te st, éééééé, ṪỲɎɆḂɃɀȿȸȺȔȐȳɊÉâÇë, Επίθετο
Not Expected -
12test, 1, .test, -test, , tes*t (none of the special character except hyphen, dot & space),
To match the expected values (including a single letter) and reject the unexpected ones, you could match \pL at the start of the string, then repeat 0+ times any of the characters listed in [\d\pL .-], and then assert the end of the string.
Note that not all of your expected values start with a-zA-Z.
^\pL[\d\pL .-]*$
In Java
String regex = "^\\pL[\\d\\pL .-]*$";
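A small sketch of how this pattern could be checked against a few of the sample values from the question (the class name `NameCheck` and the method `isValid` are assumptions made for this example, not part of the original answer):

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern from the answer above.
class NameCheck {
    private static final Pattern NAME = Pattern.compile("^\\pL[\\d\\pL .-]*$");

    // True when s starts with a Unicode letter and continues only with
    // letters, digits, spaces, dots, or hyphens.
    static boolean isValid(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The escape sequence in the sixth sample spells two accented e letters.
        String[] samples = {
            "t", "test123", "te-st", "te.st", "te st", "\u00e9\u00e9",
            "12test", "1", ".test", "-test", "", "tes*t"
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```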
^[A-Za-z]+[\p{L}\d-.\s]*$
This is a possible solution: it 1) accepts one or more of A-Za-z, then 2) zero or more letters, numbers, hyphens, whitespace characters, and periods in any combination. However, these test cases conflict with your first requirement: éééééé, ṪỲɎɆḂɃɀȿȸȺȔȐȳɊÉâÇë, Επίθετο.
If you want it to also accept those three test cases, then this is a possible solution:
^[\p{L}]+[\p{L}\d-.\s]*$
I have two lines in an ArrayList which contain numbers:
line1 1234 5694 7487
line2 10/02/1992 or 1992
I used a different regex for each line. The regex ([0-9]{4}//s?)([0-9]{4}//s?)([0-9]{4}//n) gets the first line fine.
But to check line2 I used ([0-9]{2}[/-])?([0-9]{2}[/-])?([0-9]{4}).
Instead of returning the last line, this regex returns the first 4 digits of line1.
As stated in the comments below you are using .matches which returns true if the whole string can be matched.
In your pattern ([0-9]{2}[/-])?([0-9]{2}[/-])?([0-9]{4}) it would also match only 4 digits as the first 2 groups ([0-9]{2}[/-])?([0-9]{2}[/-])? are optional due to the question mark ? leaving the 3rd group ([0-9]{4}) able to match 4 digits.
What you might do instead is use an alternation to match either a date-like format in which both two-digit parts and their delimiters are required, or 3 groups of 4 digits.
.*?(?:(?:[0-9]{2}[/-]){2}[0-9]{4}|[0-9]{4}(?:\h[0-9]{4}){2}).*
Explanation
.*? Match any character except a newline non greedy
(?: Non capturing group
(?:[0-9]{2}[/-]){2} Repeat 2 times matching 2 digits and / or -
[0-9]{4} Match 4 digits
| Or
[0-9]{4} Match 4 digits
(?:\h[0-9]{4}){2} Repeat 2 times matching a horizontal whitespace char and 4 digits
) Close non capturing group
.* Match 0+ times any character except a newline
For example
List<String> list = Arrays.asList(
new String[]{
"10/02/1992 or 1992",
"10/02/1992",
"10/1992",
"02/1992",
"1992",
"1234 5694 7487"
}
);
String regex = ".*?(?:(?:[0-9]{2}[/-]){2}[0-9]{4}|[0-9]{4}(?:\\h[0-9]{4}){2}).*";
for (String str: list) {
if (str.matches(regex)){
System.out.println(str);
}
}
Result
10/02/1992 or 1992
10/02/1992
1234 5694 7487
Note that in your first pattern I think you mean \\s instead of //s.
The \\s will also match a newline. If you want to match a single space you could just match that or use \\h to match a horizontal whitespace character.
I need to parse and extract values from a SQL log similar to the one below.
SQL^^0001^^ABCDEF^^26^^XYZ
SQL^^0002^^ABCDEF^^26^^XYZ
abc
<>()_asc wHERE
SQL^^0003^^ABCDEF^^12^^XYZ
SQL^^0004^^ABCDEF^^28^^XYZ
But the log entries are not always single lines. I have a regex that can capture an entry if it is on a single line. The fields have a fixed length, except for the last element, which can vary in length.
(\w{3})\W{2}(\d{4})\W{2}(\w{6})\W{2}(\d{2})\W{2}(.*)
^^ is the delimiter, but it can be any other value as well.
There is no fixed end-of-line character; I need to capture everything up to the next line that starts a record (SQL in this case).
How can I parse the log and extract the fields if an entry spans multiple lines? I'm trying this in Java; Java or Scala is preferred.
You may leverage the fact that each record starts with exactly 3 word chars followed with ^^. Thus, the last field you match should match any lines that do not start with that pattern. If the ^^ are just an example, you may just use the whole \w{3}\W{2}\d{4}\W{2}\w{6}\W{2}\d{2}\W{2} pattern as the delimiter instead of ^^.
Use
(?m)^(\w{3})\W{2}(\d{4})\W{2}(\w{6})\W{2}(\d{2})\W{2}(.*(?:\r?\n(?!\w{3}\^\^).*)*)
If the ^^ is just a placeholder, as mentioned above, replace (?!\w{3}\^\^) with (?!\w{3}\W{2}\d{4}\W{2}\w{6}\W{2}\d{2}\W{2}). Or, perhaps, a shorter one will do, too: (?!\w{3}\W{2}\d{4}\b).
Details
(?m)^ - start of a line ((?m) is a Pattern.MULTILINE embedded flag option that makes ^ match a line start rather than a string start position)
(\w{3}) - Group 1: three word chars
\W{2} - 2 non-word chars
(\d{4}) - Group 2: four digits
\W{2} - 2 non-word chars
(\w{6}) - Group 3: six word chars
\W{2} - 2 non-word chars
(\d{2}) - Group 4: 2 digits
\W{2} - 2 non-word chars
(.*(?:\r?\n(?!\w{3}\^\^).*)*) - Group 5:
.* - any 0+ chars other than line break chars, as many as possible
(?:\r?\n(?!\w{3}\^\^).*)* - zero or more consecutive occurrences of:
\r?\n(?!\w{3}\^\^) - CRLF or LF line break not followed with 3 word chars and then ^^
.* - the rest of the line
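The pattern above can be applied in Java roughly as follows. This is only a sketch: the class name `SqlLogParser` and the method `parse` are hypothetical, and the sample log is the one from the question joined with LF line breaks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser built around the regex from the answer above.
class SqlLogParser {
    private static final Pattern RECORD = Pattern.compile(
        "(?m)^(\\w{3})\\W{2}(\\d{4})\\W{2}(\\w{6})\\W{2}(\\d{2})\\W{2}"
        + "(.*(?:\\r?\\n(?!\\w{3}\\^\\^).*)*)");

    // Returns one String[5] per record; the last element may span several lines.
    static List<String[]> parse(String log) {
        List<String[]> records = new ArrayList<>();
        Matcher m = RECORD.matcher(log);
        while (m.find()) {
            String[] fields = new String[5];
            for (int i = 0; i < 5; i++) {
                fields[i] = m.group(i + 1);
            }
            records.add(fields);
        }
        return records;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
            "SQL^^0001^^ABCDEF^^26^^XYZ",
            "SQL^^0002^^ABCDEF^^26^^XYZ",
            "abc",
            "<>()_asc wHERE",
            "SQL^^0003^^ABCDEF^^12^^XYZ",
            "SQL^^0004^^ABCDEF^^28^^XYZ");
        for (String[] record : parse(log)) {
            System.out.println(String.join(" | ", record));
        }
    }
}
```

With this input, the second record's last field keeps the two continuation lines, because neither of them starts with three word characters followed by ^^.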
I recently encountered this question in a textbook:
I am supposed to write a method to check if a string has:
at least ten characters
only letters and digits
at least three digits
I am trying to solve it with a regex, rather than iterating through every character; this is what I have so far:
String regx = "[a-z0-9]{10,}";
But this only matches the first two conditions. How should I go about the 3rd condition?
You could use a positive lookahead for the 3rd condition, like this:
^(?=(?:.*\d){3,})[a-z0-9]{10,}$
^ indicates start of string.
(?= ... ) is the positive lookahead, which will search the whole string to match whatever is between (?= and ).
(?:.*\d){3,} matches at least 3 digits anywhere in the string.
.*\d matches a digit preceded by zero or more of any character (without the .*, only consecutive digits would match).
{3,} matches three or more of .*\d.
(?: ... ) is a non-capturing group.
$ indicates end of string.
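A minimal sketch of a method built on this pattern (the class name `StringCheck` and the method `isValid` are made up for illustration). It uses the pattern exactly as given, so it accepts lowercase letters only; if uppercase letters should also count as letters, `[A-Za-z0-9]` would be the assumed adjustment.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the lookahead pattern from the answer above.
class StringCheck {
    private static final Pattern RULE =
        Pattern.compile("^(?=(?:.*\\d){3,})[a-z0-9]{10,}$");

    // True when s has at least ten characters, only lowercase letters and
    // digits, and at least three digits anywhere in the string.
    static boolean isValid(String s) {
        return RULE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abcdefg123", "a1b2c3defg", "abcdefgh12", "abc123", "abcdefg12!"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```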
Hi, I am trying to write a regex in Java. I need to capture the last {n} words. (There may be a variable number of whitespace characters between words.) The requirement is that it has to be done with a regex.
So e.g. in
The man is very tall.
For n = 2, I need to capture
very tall.
So I tried
(\S*\s*){2}$
But this does not match in java because the initial words have to be consumed first. So I tried
^(.*)(\S*\s*){2}$
But .* consumes everything, and the last 2 words are ignored.
I have also tried
^\S?\s?(\S*\s*){2}$
Anyone know a way around this please?
You almost had it in your first attempt.
Just change + to *.
The plus sign means at least one character; because there wasn't any whitespace to match, the match failed.
The asterisk, on the other hand, means zero or more, so it will work.
The resulting pattern: (?:\S*\s*){2}$
Using the replaceAll method, you could try this regex: ((?:\\S*\\s*){2}$)|.
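Both suggestions can be tried in Java along these lines. This is a sketch under assumptions: the class name `LastWords` and both method names are hypothetical, and splicing n into the quantifier is an illustration rather than part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper showing the find() and replaceAll() variants.
class LastWords {
    // find() variant: the word count n is spliced into the quantifier.
    static String lastWords(String text, int n) {
        Matcher m = Pattern.compile("(?:\\S*\\s*){" + n + "}$").matcher(text);
        return m.find() ? m.group() : "";
    }

    // replaceAll() variant: keeps group 1 and drops every other character.
    static String lastTwoViaReplace(String text) {
        return text.replaceAll("((?:\\S*\\s*){2}$)|.", "$1");
    }

    public static void main(String[] args) {
        String text = "The man is very tall.";
        System.out.println(lastWords(text, 2));
        System.out.println(lastTwoViaReplace(text));
    }
}
```

With find(), the engine tries each start position from left to right, so the first position from which exactly n repetitions reach the end of the string is where the last n words begin.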
Your regex contains, as you already mention, a greedy subpattern that eats up the whole string, and since (\S*\s*){2} can match an empty string, it matches an empty location at the end of the input string.
Lazy dot matching (changing .* to .*?) won't do the whole job since the capturing group is quantified, and the Matcher.group(1) will be set to the last captured non-whitespaces with optional whitespaces. You need to set the capturing group around the quantified group.
Since you most likely are using Matcher#matches, you can use
String str = "The man is very tall.";
Pattern ptrn = Pattern.compile("(.*?)((?:\\S*\\s*){2})"); // no need for `^`/`$` with matches()
Matcher matcher = ptrn.matcher(str);
if (matcher.matches()) { // Group 2 contains the last 2 "words"
System.out.println(matcher.group(2)); // => very tall.
}