I want to extract URL strings from a log which looks like below:
<13>Mar 27 11:22:38 144.0.116.31 AgentDevice=WindowsDNS AgentLogFile=DNS.log PluginVersion=X.X.X.X Date=3/27/2019 Time=11:22:34 AM Thread ID=11BC Context=PACKET Message= Internal packet identifier=0000007A4843E100 UDP/TCP indicator=UDP Send/Receive indicator=Snd Remote IP=X.X.X.X Xid (hex)=9b01 Query/Response=R Opcode=Q Flags (hex)=8081 Flags (char codes)=DR ResponseCode=NOERROR Question Type=A Question Name=outlook.office365.com
I am looking to extract the Name text when it contains more than 5 digits.
A suggested approach is (\d.*?){5,}, but it does not seem to work. Kindly suggest another way to get the field.
Example of string match:
outlook12.office345.com
outlook.office12345.com
You can look for the following expression:
Name=([^ ]*\d{5,}[^ ]*)
Explanation:
Name= look for anything that starts with "Name=", then capture if:
[^ ]* any number of characters that are not a space
\d{5,} then 5 or more digits in a row
[^ ]* then again, any non-space characters up to a white space
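As a rough illustration, the expression above can be tried in Python like this. The function name extract_name and the shortened log lines are made up for the example. Note that it only accepts five digits in a row, so a name with scattered digits is not captured.

```python
import re

# Pattern from the answer above: capture the Name= value only if it
# contains a run of 5 or more consecutive digits.
NAME_WITH_DIGIT_RUN = re.compile(r"Name=([^ ]*\d{5,}[^ ]*)")

def extract_name(line):
    """Return the captured Name value, or None when nothing matches."""
    m = NAME_WITH_DIGIT_RUN.search(line)
    return m.group(1) if m else None

# Shortened, hypothetical log lines.
print(extract_name("Question Type=A Question Name=outlook.office12345.com"))
print(extract_name("Question Type=A Question Name=outlook12.office345.com"))
print(extract_name("Question Type=A Question Name=outlook.office365.com"))
```

The first call prints outlook.office12345.com; the other two print None, because neither name has five consecutive digits.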
This regular expression:
(?<=Name=).*\d{5,}.*?(?=\s|$)
would extract strings like outlook.office365666.com (with 5 or more consecutive digits) from your example input.
Demo: https://regex101.com/r/YQ5l2w/1
Try this pattern: (?=\b.*(?:\d[^\d\s]*){5,})\S*
Explanation:
(?=...) - positive lookahead, assures that pattern inside it is matched somewhere ahead :)
\b - word boundary
(?:...) - non-capturing group
\d[^\d\s]* - match digit \d, then match zero or more of any characters other than whitespace \s or digit \d
{5,} - match preceding pattern 5 or more times
\S* - match zero or more of any characters other than whitespace to match the string if assertion is true, but I think you just need assertion :)
Demo
If you want only consecutive numbers use simplified pattern (?=\b.*\d{5,})\S*.
Another demo
Of course, you have to add a positive lookbehind, (?<=Name=), to assert that the string Name= precedes the match.
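A minimal Python sketch of both variants with the lookbehind prepended. The constant names, the helper find_name and the shortened log lines are assumptions made for the example.

```python
import re

# The lookbehind pins the match to the text right after "Name=".
ANY_FIVE_DIGITS = re.compile(r"(?<=Name=)(?=\b.*(?:\d[^\d\s]*){5,})\S*")
FIVE_IN_A_ROW = re.compile(r"(?<=Name=)(?=\b.*\d{5,})\S*")

def find_name(pattern, line):
    """Return the matched name, or None when the assertion fails."""
    m = pattern.search(line)
    return m.group(0) if m else None

# Shortened, hypothetical log lines.
scattered = "Question Type=A Question Name=outlook12.office345.com"
consecutive = "Question Type=A Question Name=outlook.office12345.com"
too_few = "Question Type=A Question Name=outlook.office365.com"

print(find_name(ANY_FIVE_DIGITS, scattered))    # five digits, not adjacent
print(find_name(FIVE_IN_A_ROW, scattered))      # no run of five
print(find_name(FIVE_IN_A_ROW, consecutive))    # run of five
print(find_name(ANY_FIVE_DIGITS, too_few))      # only three digits
```

One caveat: the .* inside the lookahead can cross spaces, so if Name= were not the last field on the line, digits in a later field could satisfy the assertion.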
Try this regex
([a-z0-9]{5,}.[a-z0-9]{5,})+.com
https://regex101.com/r/OzsChv/3
It groups:
outlook.office365.com
outlook12.office345.com
as well as all other URL strings.
I don't write many regular expressions, so I'm going to need some help on this one.
I need a regular expression that can validate that a string is an alphanumeric comma delimited string.
Examples:
123, 4A67, GGG, 767 would be valid.
12333, 78787&*, GH778 would be invalid
fghkjhfdg8797< would be invalid
This is what I have so far, but isn't quite right: ^(?=.*[a-zA-Z0-9][,]).*$
Any suggestions?
Sounds like you need an expression like this:
^[0-9a-zA-Z]+(,[0-9a-zA-Z]+)*$
POSIX allows for the more self-descriptive version:
^[[:alnum:]]+(,[[:alnum:]]+)*$
^[[:alnum:]]+([[:space:]]*,[[:space:]]*[[:alnum:]]+)*$ // allow whitespace
If you're willing to admit underscores, too, search for entire words (\w+):
^\w+(,\w+)*$
^\w+(\s*,\s*\w+)*$ // allow whitespaces around the comma
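For a quick check against the examples from the question, here is a small Python sketch. It uses the explicit character class, because Python's re module does not support POSIX classes such as [[:alnum:]]. The name is_alnum_list is invented for the example.

```python
import re

# Explicit alphanumeric class; \s* allows whitespace around each comma.
ALNUM_LIST = re.compile(r"^[0-9a-zA-Z]+(\s*,\s*[0-9a-zA-Z]+)*$")

def is_alnum_list(text):
    """True if text is a comma-delimited list of alphanumeric items."""
    return ALNUM_LIST.match(text) is not None

for sample in ["123, 4A67, GGG, 767",
               "12333, 78787&*, GH778",
               "fghkjhfdg8797<"]:
    print(repr(sample), is_alnum_list(sample))
```

Only the first sample is reported as valid.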
Try this pattern: ^([a-zA-Z0-9]+,?\s*)+$
I tested it with your cases, as well as just a single number "123". I don't know if you will always have a comma or not.
The [a-zA-Z0-9]+ means match 1 or more of these symbols
The ,? means match 0 or 1 commas (basically, the comma is optional)
The \s* handles 0 or more whitespace characters after the comma
and finally the outer + says match 1 or more of the pattern.
This will also match 123 123 abc (no commas), which might be a problem.
This will also match 123, (ends with a comma), which might be a problem.
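The two caveats are easy to confirm. A short Python check, assuming the pattern exactly as written in this answer (the name LOOSE is just a label for it):

```python
import re

# Pattern from this answer: the comma is optional, whitespace is optional.
LOOSE = re.compile(r"^([a-zA-Z0-9]+,?\s*)+$")

samples = [
    "123, 4A67, GGG, 767",    # intended valid input
    "123",                    # single item
    "123 123 abc",            # no commas at all
    "123,",                   # trailing comma
    "12333, 78787&*, GH778",  # contains symbols
]
results = {s: bool(LOOSE.match(s)) for s in samples}
for s, ok in results.items():
    print(repr(s), ok)
```

Everything except the last sample matches, including the two inputs flagged as possible problems.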
Try the following expression:
/^([a-z0-9\s]+,)*([a-z0-9\s]+){1}$/i
This will work for:
test
test, test
test123,Test 123,test
I would strongly suggest trimming the whitespace at the beginning and end of each item in the comma-separated list.
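One way to follow the trimming suggestion is to split first and validate each trimmed item. This is only a sketch in Python: parse_items is a made-up name, and unlike the expression above it assumes that spaces inside an item are not allowed.

```python
import re

# A single item: letters and digits only (assumption: no inner spaces).
ITEM = re.compile(r"[a-zA-Z0-9]+")

def parse_items(text):
    """Split on commas, trim each item, and return the list of items,
    or None if any item is empty or contains other characters."""
    items = [part.strip() for part in text.split(",")]
    if all(ITEM.fullmatch(item) for item in items):
        return items
    return None

print(parse_items("123, 4A67, GGG, 767"))
print(parse_items("12333, 78787&*, GH778"))
print(parse_items("123,"))
```

The first call returns the four trimmed items; the other two return None.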
You seem to be lacking repetition. How about:
^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$
I'm not sure how you'd express that in VB.Net, but in Python:
>>> import re
>>> x = ["123, 4A67, GGG, 767", "12333, 78787&*, GH778"]
>>> r = r'^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$'
>>> for s in x:
...     print(re.match(r, s))
...
<re.Match object; span=(0, 19), match='123, 4A67, GGG, 767'>
None
>>>
You can use shortcuts instead of listing the [a-zA-Z0-9 ] part, but this is probably easier to understand.
Analyzing the highlights:
[a-zA-Z0-9 ]+ : match one or more (but not zero) characters from the listed ranges, or a space.
(?:[...]+,)* : in non-capturing parentheses, match one or more of the characters followed by a comma. Match such sequences zero or more times; matching zero times allows for input with no comma.
[...]+ : match at least one of these characters. This does not include a comma, which ensures that a trailing comma is not accepted. If a trailing comma is acceptable, the expression is simpler: ^[a-zA-Z0-9 ,]+$
Yes, when you want to catch comma separated things where a comma at the end is not legal, and the things match to $LONGSTUFF, you have to repeat $LONGSTUFF:
$LONGSTUFF(,$LONGSTUFF)*
If $LONGSTUFF is really long and itself contains comma-separated repeated items, it might be a good idea not to build the regexp by hand and instead let the computer do it for you, even if only through string concatenation. For example, I just wanted to build a regular expression to validate the CPUID parameter of a XEN configuration file, of the ['1:a=b,c=d','2:e=f,g=h'] type. I... believe this mostly fits the bill: (whitespace notwithstanding!)
import re

xend_fudge_item_re = r"""
    e[a-d]x=            #register of the call return value to fudge
    (
        0x[0-9A-F]+ |   #either hardcode the reply
        [10xks]{32}     #or edit the bitfield directly
    )
"""
xend_string_item_re = r"""
    (0x)?[0-9A-F]+:     #leafnum (the contents of EAX before the call)
    %s                  #one fudge
    (,%s)*              #repeated multiple times
""" % (xend_fudge_item_re, xend_fudge_item_re)
xend_syntax = re.compile(r"""
    \[                  #a list of
    '%s'                #string elements
    (,'%s')*            #repeated multiple times
    \]
    $                   #and nothing else
""" % (xend_string_item_re, xend_string_item_re), re.VERBOSE | re.MULTILINE)
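The same idea can be reduced to a tiny helper that generates the ITEM(,ITEM)* shape from a single item pattern, so the item is written only once. This is a sketch: comma_list and the sample key=value pattern are hypothetical and are not part of the Xen example.

```python
import re

def comma_list(item):
    """Build ITEM(,ITEM)* from one item pattern (hypothetical helper).
    Non-capturing groups keep alternations inside the item contained."""
    return r"(?:{0})(?:,(?:{0}))*".format(item)

# Hypothetical item: a lowercase key, "=", and a hex value.
pair = r"[a-z]+=[0-9a-f]+"
pairs_re = re.compile(comma_list(pair))

# The helper nests: a comma-separated list of quoted pair lists.
quoted_lists_re = re.compile(comma_list("'" + comma_list(pair) + "'"))

print(bool(pairs_re.fullmatch("a=1,b=2f")))
print(bool(pairs_re.fullmatch("a=1,")))
print(bool(quoted_lists_re.fullmatch("'a=1,b=2','c=3'")))
```

A trailing comma is rejected because every comma generated by the helper must be followed by another item.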
Try ^(?!,)((, *)?([a-zA-Z0-9]+)\b)*$
Step by step description:
Don't match a beginning comma (good for the upcoming "loop").
Match optional comma and spaces.
Match characters you like.
Matching a word boundary makes sure that a comma is required between items when several are stacked in the string.
Please use: ^((([a-zA-Z0-9\s]){1,45},)+([a-zA-Z0-9\s]){1,45})$
Here I have set the maximum word size to 45, since the longest word in English is 45 characters; this can be changed as required.
Can anyone tell me how to write a regex for a string that consists of one or more alphanumeric characters followed by an even number of digits?
Valid:
a11a1121
bbbb11a1121
Invalid:
a11a1
I have tried ^[a-zA-Z*20-9]*$ but it always returns true.
Can you please help in this regard?
The regex you mentioned matches any number (including zero) of characters from the set a-z, A-Z, *, 2, and 0-9, which is why it accepts all of your examples.
You can break your requirement down into groups and then handle each one accordingly.
You require at least one character, so you start with ^([a-zA-Z]+)$
Then you need digits in multiples of 2, so you add ^([a-zA-Z]+(\d\d)+)$
Now you need any number of combinations of these, so the expression becomes: ^([a-zA-Z]+(\d\d)+)*$
You can use online tools like regex101 for this purpose. The provided regex in action here
You can achieve it with this regexp: ^[a-z0-9]*[a-z]+([0-9]{2})*$
Explanation :
[a-z0-9]*[a-z]+: a string of at least one character, ending in a non-digit
([0-9]{2})*: an even-length sequence of digits (0 or 2*n digits). If the digit sequence must not be empty, use ([0-9]{2})+ instead.
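The two patterns suggested for this question agree on the examples given, but they are not equivalent. A small Python comparison shows this. The constant names and the extra probes ("" and "a1b22") are additions made for illustration.

```python
import re

# Pattern from the first answer: repeated (letters + digit pairs) blocks.
ALTERNATING = re.compile(r"^([a-zA-Z]+(\d\d)+)*$")
# Pattern from the second answer: anything, a letter, then digit pairs.
LETTER_THEN_PAIRS = re.compile(r"^[a-z0-9]*[a-z]+([0-9]{2})*$")

def check(text):
    """Return (first pattern matches, second pattern matches)."""
    return (bool(ALTERNATING.match(text)),
            bool(LETTER_THEN_PAIRS.match(text)))

for sample in ["a11a1121", "bbbb11a1121", "a11a1", "", "a1b22"]:
    print(repr(sample), check(sample))
```

Both accept a11a1121 and bbbb11a1121 and reject a11a1. They differ on the probes: the first pattern accepts the empty string, and the second accepts a1b22, because it only checks the digits after the last letter.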
I want to replace numbers in a string if they have more than 3 digits (phone numbers should be replaced), but a number should not be replaced if it is followed by $ or if it has a decimal point. I used the expression below.
"\d{3,}+(?!\$/\.)"
The issues I face: it is not replacing numbers that have more than ten digits, and I do want to replace those because some of them are IDs with more than 10 digits. Also, if a number has more than 3 digits after the decimal point, those digits are replaced as well. I don't want a number to be replaced if it has a decimal point. Can somebody help?
For example, take the number string "3452678916381914". It should be replaced, but the above regex does not replace it. Numbers like $1234,45.567 should not be replaced, but the above regex replaces 45.567.
Use lookahead and lookbehind assertions: first assert that the starting word boundary is not preceded by a $ or ., then assert that the ending word boundary is not followed by a $ or .
It works for both examples you provided; you might need to tweak it a little to handle some corner cases.
(?<![\$\.])\b\d{3,}\b(?![\$\.])
See the demo: it matches the first 2 lines but not the rest.
3452678916381914 # match
1234 56789 # match
$1234,45.567
$1234
12.345
12345.6678
123$
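To see the replacement itself, here is a small Python sketch built on the expression above. The function name mask_numbers and the placeholder "XXX" are assumptions, since the question does not say what the numbers should be replaced with.

```python
import re

# Expression from the answer: 3+ digits between word boundaries,
# not preceded or followed by "$" or ".".
LONG_NUMBER = re.compile(r"(?<![\$\.])\b\d{3,}\b(?![\$\.])")

def mask_numbers(text, placeholder="XXX"):
    """Replace every qualifying number with the placeholder."""
    return LONG_NUMBER.sub(placeholder, text)

for sample in ["3452678916381914", "1234 56789", "$1234,45.567",
               "$1234", "12.345", "12345.6678", "123$"]:
    print(repr(sample), "->", repr(mask_numbers(sample)))
```

Only the first two samples change; the remaining five come back untouched.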
I need to replace all non-digit characters in the string. For instance:
String: 987sdf09870987=-0\\\`42
Replaced: 987**sdf**09870987**=-**0**\\\`**42
That is, every non-digit character sequence is wrapped in ** characters. How can I do that with String::replaceAll()?
(?![0-9]+$).*
This regex doesn't match what I want. How can I do that?
(\\D+)
You can use this and replace with **$1**. See demo.
https://regex101.com/r/fM9lY3/2
You can use a negated character class for a non-digit and use the 0th group back-reference to avoid overhead with capturing groups (it is minimal here, but still is):
String x = "987sdf09870987=-0\\\\\\`42";
x = x.replaceAll("[^0-9]+", "**$0**");
System.out.println(x);
See demo on IDEONE. Output: 987**sdf**09870987**=-**0**\\\`**42.
Also, in Java regex, character classes look neater than multiple escape symbols, which is why I prefer the [^0-9]+ pattern: it matches 1 or more (+) symbols other than (because of ^) digits from 0 to 9 ([0-9]).
A couple of words about your (?![0-9]+$).* regex. It consists of a negative lookahead (?![0-9]+$) that checks if from the current position onward there are no digits only (if there are only digits up to the end of string, the match fails), and .* matching any characters but a newline. You can see example of what it is doing here. I do not think it can help you since you need to actually match non-numbers, not just check if digits are absent.
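For readers outside Java, the same replacement can be sketched in Python, where \g<0> in the replacement plays the role of $0. The function name wrap_non_digits is made up for the example.

```python
import re

def wrap_non_digits(text):
    """Wrap every run of non-digit characters in ** markers.
    \\g<0> refers to the whole match, like $0 in Java's replaceAll."""
    return re.sub(r"[^0-9]+", r"**\g<0>**", text)

# Raw string: three backslashes followed by a backtick, as in the question.
sample = r"987sdf09870987=-0\\\`42"
print(wrap_non_digits(sample))
```

This prints 987**sdf**09870987**=-**0**\\\`**42, the result asked for in the question.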