I have two lines in an ArrayList, each containing numbers:
line1 1234 5694 7487
line2 10/02/1992 or 1992
I used a different regex for each line. With ([0-9]{4}//s?)([0-9]{4}//s?)([0-9]{4}//n) I get the first line fine.
But to check line2 I used ([0-9]{2}[/-])?([0-9]{2}[/-])?([0-9]{4}).
Instead of returning the last line, this regex returns the first 4 digits of line1.
As stated in the comments below you are using .matches which returns true if the whole string can be matched.
In your pattern ([0-9]{2}[/-])?([0-9]{2}[/-])?([0-9]{4}) it would also match only 4 digits as the first 2 groups ([0-9]{2}[/-])?([0-9]{2}[/-])? are optional due to the question mark ? leaving the 3rd group ([0-9]{4}) able to match 4 digits.
What you might do instead is use an alternation that matches either a full date-like format (two 2-digit parts, each followed by a delimiter, then 4 digits) or 3 groups of 4 digits.
.*?(?:(?:[0-9]{2}[/-]){2}[0-9]{4}|[0-9]{4}(?:\h[0-9]{4}){2}).*
Explanation
.*? Match any character except a newline, non-greedily
(?: Non capturing group
(?:[0-9]{2}[/-]){2} Repeat 2 times matching 2 digits and / or -
[0-9]{4} Match 4 digits
| Or
[0-9]{4} Match 4 digits
(?:\h[0-9]{4}){2} Repeat 2 times matching a horizontal whitespace char and 4 digits
) Close non capturing group
.* Match 0+ times any character except a newline
For example
List<String> list = Arrays.asList(
new String[]{
"10/02/1992 or 1992",
"10/02/1992",
"10/1992",
"02/1992",
"1992",
"1234 5694 7487"
}
);
String regex = ".*?(?:(?:[0-9]{2}[/-]){2}[0-9]{4}|[0-9]{4}(?:\\h[0-9]{4}){2}).*";
for (String str: list) {
if (str.matches(regex)){
System.out.println(str);
}
}
Result
10/02/1992 or 1992
10/02/1992
1234 5694 7487
Note that in your first pattern I think you mean \\s and \\n (as written in a Java string) instead of //s and //n, which match two literal slashes followed by a letter.
The \\s will also match a newline. If you want to match a single space you could just match that, or use \\h to match a horizontal whitespace character.
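To make the difference concrete, here is a minimal sketch that runs both patterns with find(). The class name OptionalGroupDemo and the helper firstMatch are made up for illustration, and the alternation is used without the surrounding .*? and .* so that find() returns only the value itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupDemo {
    // Pattern from the question: both leading groups are optional,
    // so four digits on their own are enough for a match.
    static final Pattern LOOSE =
            Pattern.compile("([0-9]{2}[/-])?([0-9]{2}[/-])?([0-9]{4})");

    // Alternation from the answer, minus the leading .*? and trailing .*
    // (an adaptation so that find() yields just the matched value).
    static final Pattern STRICT =
            Pattern.compile("(?:[0-9]{2}[/-]){2}[0-9]{4}|[0-9]{4}(?:\\h[0-9]{4}){2}");

    // Returns the first substring the pattern finds, or null if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(LOOSE, "1234 5694 7487"));      // 1234
        System.out.println(firstMatch(STRICT, "1234 5694 7487"));     // 1234 5694 7487
        System.out.println(firstMatch(STRICT, "10/02/1992 or 1992")); // 10/02/1992
        System.out.println(firstMatch(STRICT, "1992"));               // null
    }
}
```

With the optional groups, the date pattern is satisfied by the first four digits of line1, which is the behaviour described in the question.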
I want to extract URL strings from a log which looks like below:
<13>Mar 27 11:22:38 144.0.116.31 AgentDevice=WindowsDNS AgentLogFile=DNS.log PluginVersion=X.X.X.X Date=3/27/2019 Time=11:22:34 AM Thread ID=11BC Context=PACKET Message= Internal packet identifier=0000007A4843E100 UDP/TCP indicator=UDP Send/Receive indicator=Snd Remote IP=X.X.X.X Xid (hex)=9b01 Query/Response=R Opcode=Q Flags (hex)=8081 Flags (char codes)=DR ResponseCode=NOERROR Question Type=A Question Name=outlook.office365.com
I am looking to extract the Name text when it contains 5 or more digits.
One suggested approach is (\d.*?){5,}, but it does not seem to work. Kindly suggest another way to get the field.
Example of string match:
outlook12.office345.com
outlook.office12345.com
You can look for the following expression:
Name=([^ ]*\d{5,}[^ ]*)
Explanation:
Name= look for the literal text "Name=", then capture if:
[^ ]* any number of characters that are not a space
\d{5,} then 5 or more digits in a row
[^ ]* then again, any characters up to the next space
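As a rough Java sketch of the capture-group approach (the class name NameFieldExtractor and the method extract are hypothetical, and the log lines are shortened versions of the one in the question):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameFieldExtractor {
    // Same expression as above, with backslashes doubled for a Java string.
    private static final Pattern NAME = Pattern.compile("Name=([^ ]*\\d{5,}[^ ]*)");

    // Returns the Name value if it contains 5 or more consecutive digits, else null.
    static String extract(String logLine) {
        Matcher m = NAME.matcher(logLine);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("Question Type=A Question Name=outlook.office12345.com"));
        System.out.println(extract("Question Type=A Question Name=outlook12.office345.com"));
    }
}
```

Note that this only accepts digits in a row, so outlook.office12345.com is returned while outlook12.office345.com is not.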
This regular expression:
(?<=Name=).*\d{5,}.*?(?=\s|$)
would extract strings like outlook.office365666.com (with 5 or more consecutive digits) from your example input.
Demo: https://regex101.com/r/YQ5l2w/1
Try this pattern: (?=\b.*(?:\d[^\d\s]*){5,})\S*
Explanation:
(?=...) - positive lookahead, assures that pattern inside it is matched somewhere ahead :)
\b - word boundary
(?:...) - non-capturing group
\d[^\d\s]* - match digit \d, then match zero or more of any characters other than whitespace \s or digit \d
{5,} - match the preceding pattern 5 or more times
\S* - match zero or more non-whitespace characters, to consume the string if the assertion is true, but I think you just need the assertion :)
If you want only consecutive digits, use the simplified pattern (?=\b.*\d{5,})\S*.
Of course, you have to add a positive lookbehind, (?<=Name=), to assert that the Name= string precedes the match.
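A possible Java sketch of the lookbehind plus lookahead idea, for digits that need not be consecutive. It is a variant, not the exact pattern above: the lookahead is written as (?:[^\d\s]*\d){5} so that it cannot look past whitespace into later fields. The class name DigitCountExtractor and the method extract are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitCountExtractor {
    // (?<=Name=)            the value must directly follow "Name="
    // (?=(?:[^\d\s]*\d){5}) at least 5 digits before the next whitespace
    // \S+                   the value itself
    private static final Pattern NAME_WITH_DIGITS =
            Pattern.compile("(?<=Name=)(?=(?:[^\\d\\s]*\\d){5})\\S+");

    // Returns the Name value when it holds 5 or more digits in total, else null.
    static String extract(String logLine) {
        Matcher m = NAME_WITH_DIGITS.matcher(logLine);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("Question Name=outlook12.office345.com")); // outlook12.office345.com
        System.out.println(extract("Question Name=outlook.office365.com"));   // null
    }
}
```

The restriction to non-whitespace inside the lookahead is a design choice: with .* in the lookahead, digits that appear in a later space-separated field would also count towards the five.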
Try this regex
([a-z0-9]{5,}.[a-z0-9]{5,})+.com
https://regex101.com/r/OzsChv/3
It matches:
outlook.office365.com
outlook12.office345.com
and other URL strings of the same form.
I have a string with 5 pieces of data delimited by underscores:
AAA_BBB_CCC_DDD_EEE
I want a different regex for each component.
The regex needs to return just the one component.
For example, the first would return just AAA, the second for BBB, etc.
I am able to parse out AAA with the following:
^([^_]*)?
I see that I can do a look-around like this to find:
(?<=[^_]*_).*
BBB_CCC_DDD_EEE
But the following cannot find just BBB:
(?<=[^_]*_)[^_]*(?=_)
Mixing lookbehind and lookahead
^([^_]+)? // 1st
(?<=_)[^_]+ // 2nd (take the first match only; later matches are CCC, DDD and EEE)
(?<=_)[^_]+(?=_[^_]+_[^_]+$) // 3rd
(?<=_)[^_]+(?=_[^_]+$) // 4th
[^_]+$ // 5th
If the lengths of the strings between the "_" are known, it can be done like this:
1st match
^([^_]+)?
2nd match
(?<=_)\K[^_]+
3rd match
(?<=_[A-Za-z]{3}_)\K[^_]+
4th match
(?<=_[A-Za-z]{3}_[A-Za-z]{3}_)\K[^_]+
5th match
(?<=_[A-Za-z]{3}_[A-Za-z]{3}_[A-Za-z]{3}_)\K[^_]+
Each {3} expresses the length of the string between the "_" characters.
If your string always uses underscores as the delimiter, you might use a single regex and capture the value in a capturing group. Repeat the pattern for what comes before it (in this case, characters that are not an underscore, followed by an underscore) using a quantifier that you can change, like {3}.
This way the quantifier specifies how many times to repeat the preceding pattern before capturing your match. For your example string AAA_BBB_CCC_DDD_EEE you could use {0}, {1}, {2}, {3} or {4}.
^(?:[^_\n]+_){3}([0-9A-Za-z]+)(?:_[^_\n]+)*$
That would match:
^ Assert position at start of the line
(?:[^_\n]+_){3} In a non-capturing group (?:, match one or more characters that are not an underscore or a newline [^_\n]+ followed by an underscore, and repeat that n times (in this example n is 3)
([0-9A-Za-z]+) Capture your characters in a group using, for example, a character class (or use [^_]+ to match anything but an underscore, though that will also match whitespace characters)
(?:_[^_\n]+)* After your captured value, repeat a non-capturing group that matches an underscore followed by one or more characters that are not an underscore or a newline, zero or more times, to get a full match
$ Assert position at the end of the line
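Since the question does not name a language, the following is only a sketch of how the quantifier approach could be wrapped in Java. The class name UnderscoreFields and the method component are hypothetical; the index is zero-based and is substituted for the {3} in the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnderscoreFields {
    // Returns the component at the given zero-based index, or null when the
    // line does not have that many underscore-separated components.
    static String component(String line, int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must not be negative");
        }
        // The quantifier {index} skips that many "value_" prefixes before capturing.
        Pattern p = Pattern.compile(
                "^(?:[^_\\n]+_){" + index + "}([0-9A-Za-z]+)(?:_[^_\\n]+)*$");
        Matcher m = p.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "AAA_BBB_CCC_DDD_EEE";
        for (int i = 0; i < 5; i++) {
            System.out.println(i + " -> " + component(line, i));
        }
    }
}
```

Compiling a pattern per call keeps the sketch short; if this ran in a hot path, the five patterns could be compiled once and reused.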
I need to parse and extract values from a sql log similar to the one below.
SQL^^0001^^ABCDEF^^26^^XYZ
SQL^^0002^^ABCDEF^^26^^XYZ
abc
<>()_asc wHERE
SQL^^0003^^ABCDEF^^12^^XYZ
SQL^^0004^^ABCDEF^^28^^XYZ
But the log entries are not always single lines. I have a regex that can capture an entry if it is on a single line. The fields are of fixed length except the last one, which can vary in length.
(\w{3})\W{2}(\d{4})\W{2}(\w{6})\W{2}(\d{2})\W{2}(.*)
^^ is the delimiter but can be any other value also.
There is no fixed end-of-line character, but I need to capture everything up to the next line that starts with SQL in this case.
How can I parse the log and extract the fields if an entry spans multiple lines? I'm trying in Java; Java or Scala is preferred.
You may leverage the fact that each record starts with exactly 3 word chars followed with ^^. Thus, the last field you match should match any lines that do not start with that pattern. If the ^^ are just an example, you may just use the whole \w{3}\W{2}\d{4}\W{2}\w{6}\W{2}\d{2}\W{2} pattern as the delimiter instead of ^^.
Use
(?m)^(\w{3})\W{2}(\d{4})\W{2}(\w{6})\W{2}(\d{2})\W{2}(.*(?:\r?\n(?!\w{3}\^\^).*)*)
If the ^^ is just a placeholder, as mentioned above, replace (?!\w{3}\^\^) with (?!\w{3}\W{2}\d{4}\W{2}\w{6}\W{2}\d{2}\W{2}). Or, perhaps, a shorter one will do, too: (?!\w{3}\W{2}\d{4}\b).
Details
(?m)^ - start of a line ((?m) is a Pattern.MULTILINE embedded flag option that makes ^ match a line start rather than a string start position)
(\w{3}) - Group 1: three word chars
\W{2} - 2 non-word chars
(\d{4}) - Group 2: four digits
\W{2} - 2 non-word chars
(\w{6}) - Group 3: six word chars
\W{2} - 2 non-word chars
(\d{2}) - Group 4: 2 digits
\W{2} - 2 non-word chars
(.*(?:\r?\n(?!\w{3}\^\^).*)*) - Group 5:
.* - any 0+ chars other than line break chars, as many as possible
(?:\r?\n(?!\w{3}\^\^).*)* - zero or more consecutive occurrences of:
\r?\n(?!\w{3}\^\^) - CRLF or LF line break not followed with 3 word chars and then ^^
.* - the rest of the line
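A minimal Java sketch of how this pattern could be applied with Matcher.find() is shown below. The class name SqlLogParser, the method parse and the SAMPLE constant are assumptions for illustration; SAMPLE is the log from the question joined with \n and without a trailing line break.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SqlLogParser {
    // Sample log from the question, joined with LF and no trailing newline.
    static final String SAMPLE = String.join("\n",
            "SQL^^0001^^ABCDEF^^26^^XYZ",
            "SQL^^0002^^ABCDEF^^26^^XYZ",
            "abc",
            "<>()_asc wHERE",
            "SQL^^0003^^ABCDEF^^12^^XYZ",
            "SQL^^0004^^ABCDEF^^28^^XYZ");

    private static final Pattern RECORD = Pattern.compile(
            "(?m)^(\\w{3})\\W{2}(\\d{4})\\W{2}(\\w{6})\\W{2}(\\d{2})\\W{2}"
            + "(.*(?:\\r?\\n(?!\\w{3}\\^\\^).*)*)");

    // Returns one String[5] per record; index 4 holds the (possibly multi-line) last field.
    static List<String[]> parse(String log) {
        List<String[]> records = new ArrayList<>();
        Matcher m = RECORD.matcher(log);
        while (m.find()) {
            String[] fields = new String[5];
            for (int i = 0; i < fields.length; i++) {
                fields[i] = m.group(i + 1);
            }
            records.add(fields);
        }
        return records;
    }

    public static void main(String[] args) {
        for (String[] fields : parse(SAMPLE)) {
            System.out.println(String.join(" | ", fields));
        }
    }
}
```

If the input ends with a line break, that final break becomes part of the last record's fifth field, so trimming that field may be worthwhile.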
I recently encountered this question in a textbook:
I am supposed to write a method to check whether a string has:
at least ten characters
only letters and digits
at least three digits
I am trying to solve it with a regex, rather than iterating through every character; this is what I have so far:
String regx = "[a-z0-9]{10,}";
But this only matches the first two conditions. How should I go about the 3rd condition?
You could use a positive lookahead for the 3rd condition, like this:
^(?=(?:.*\d){3,})[a-z0-9]{10,}$
^ indicates start of string.
(?= ... ) is the positive lookahead, which will search the whole string to match whatever is between (?= and ).
(?:.*\d){3,} matches at least 3 digits anywhere in the string.
.*\d matches a digit preceded by zero or more characters (if .* were omitted, only consecutive digits would match).
{3,} matches three or more of .*\d.
(?: ... ) is a non-capturing group.
$ indicates end of string.
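Put into a Java method, the lookahead could be used roughly as follows. The class name StringRuleCheck and the method isValid are made up for this sketch, and the character class is widened to [A-Za-z0-9] on the assumption that "only letters" includes uppercase letters; the pattern above accepts lowercase only.

```java
import java.util.regex.Pattern;

class StringRuleCheck {
    // (?=(?:.*\d){3,})  at least three digits somewhere in the string
    // [A-Za-z0-9]{10,}  ten or more characters, letters and digits only
    private static final Pattern RULE =
            Pattern.compile("^(?=(?:.*\\d){3,})[A-Za-z0-9]{10,}$");

    static boolean isValid(String s) {
        return s != null && RULE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("a1b2c3defg")); // true: 10 chars, 3 digits
        System.out.println(isValid("abcdefgh12")); // false: only 2 digits
        System.out.println(isValid("abc123"));     // false: fewer than 10 chars
    }
}
```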
I want to validate a text field in a Java-based app where I want to allow only comma-separated numbers, and each should be either 10 or 16 digits. I have the regex ^[0-9,;]+$ to allow only numbers, but it does not restrict each number to 10 or 16 digits.
You can use {n,m} to specify length.
So matching one number with either 10 or 16 digits would be
^(\d{10}|\d{16})$
Meaning: match exactly 10 or 16 digits, with ^ anchoring the start of the line and $ the end.
Now add separator:
^((\d{10}|\d{16})[,;])*(\d{10}|\d{16})$
That is: zero or more 10-or-16-digit sequences, each followed by either , or ;, and then one final 10-or-16-digit sequence before end-of-line.
You need to escape the \ characters in Java.
public static void main(String[] args) {
String regex = "^((\\d{10}|\\d{16})[,;])*(\\d{10}|\\d{16})$";
String y = "0123456789,0123456789123456,0123456789";
System.out.println(y.matches(regex)); //Should be true
String n = "0123456789,01234567891234567,0123456789";
System.out.println(n.matches(regex)); //should be false
}
I would probably use this regex:
(\d{10}(?:\d{6})?,?)+
Explanation:
( - Begin capture group
\d{10} - Match exactly 10 digits
(?: - Begin non capture group
\d{6} - Match 6 more digits
)? - End group, mark as optional using ?
,? - optionally match a comma
)+ - End outer capture group, require 1 or more repetitions (maybe change to * for 0 or more)
The following inputs match this regex
1234567890123456,1234567890
1234567890123456
1234567890
these inputs do not match
123,1234567890
12355
123456789012
You need to have both anchors and word boundaries:
/^(?:\b(?:\d{10}|\d{16})\b,?)*$/
The anchors are necessary so you don't get false positives for partial matches and the word boundaries are necessary so you don't get false positives for 20, 26, 30, 32 digit numbers.
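To see the effect of the word boundaries, here is a small Java comparison of the three patterns suggested so far, all tested as whole-string matches. The class name DigitListPatterns and the helper fullMatch are invented for the sketch, and the JavaScript-style slashes are dropped from the boundary pattern.

```java
import java.util.regex.Pattern;

class DigitListPatterns {
    // Optional comma, no boundaries: two 10-digit runs may merge into 20 digits.
    static final Pattern OPTIONAL_COMMA = Pattern.compile("(\\d{10}(?:\\d{6})?,?)+");
    // Anchors plus word boundaries around every number.
    static final Pattern BOUNDED = Pattern.compile("^(?:\\b(?:\\d{10}|\\d{16})\\b,?)*$");
    // Mandatory separator between numbers, final number without separator.
    static final Pattern SEPARATED =
            Pattern.compile("^((\\d{10}|\\d{16})[,;])*(\\d{10}|\\d{16})$");

    static boolean fullMatch(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        String twentyDigits = "12345678901234567890";
        System.out.println(fullMatch(OPTIONAL_COMMA, twentyDigits)); // true (false positive)
        System.out.println(fullMatch(BOUNDED, twentyDigits));        // false
        System.out.println(fullMatch(SEPARATED, twentyDigits));      // false
    }
}
```

One difference worth knowing: the boundary pattern also accepts an empty string and a trailing comma, while the pattern with the mandatory separator accepts neither.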
Here is my version
(?:\d+,){9}\d+|(?:\d+,){15}\d+
Let's review it. First, there is no direct way to say "10 or 16", so I actually have to create 2 expressions with | between them.
Second, the expression itself. Your version just says that you allow digits and commas. However, this is not what you really want because, for example, a string like ,,, will match your regex.
So, the regex should be like (?:\d+,){n}\d+, which means: n sequences of digits, each terminated by a comma, and then one more sequence of digits, e.g. 123,45,678 with n = 2 (where 123,45, matches the first part and 678 matches the second part).
Finally we get regex that I have written in the beginning of my answer:
(?:\d+,){9}\d+|(?:\d+,){15}\d+
And do not forget that when you write a regex in your Java code you have to double the backslashes, like this:
Pattern.compile("(?:\\d+,){9}\\d+|(?:\\d+,){15}\\d+")
EDIT: I have just added non-capturing group (?: ...... )
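A short Java sketch of why the non-capturing group matters here. Keep in mind that this pattern counts comma-separated numbers (10 or 16 of them) and not the digits within each number. The class name NumberCountCheck and the helper numbers are hypothetical.

```java
import java.util.regex.Pattern;

class NumberCountCheck {
    // With the group, {9} repeats "digits followed by a comma".
    static final Pattern GROUPED =
            Pattern.compile("(?:\\d+,){9}\\d+|(?:\\d+,){15}\\d+");
    // Without the group, {9} applies to the comma alone.
    static final Pattern UNGROUPED =
            Pattern.compile("\\d+,{9}\\d+|\\d+,{15}\\d+");

    // Builds "1,2,...,count" as test input.
    static String numbers(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= count; i++) {
            if (i > 1) {
                sb.append(',');
            }
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(GROUPED.matcher(numbers(10)).matches());   // true
        System.out.println(GROUPED.matcher(numbers(11)).matches());   // false
        System.out.println(UNGROUPED.matcher(numbers(10)).matches()); // false
    }
}
```

Without the group, the expression only accepts inputs such as a number, nine commas in a row, and another number.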