Problem coming up with appropriate Regex expression

I need to match text similar to the following text in an if statement.
REG#John Smith#14102245862#7 johns road new york#John Anthony Smith
The expression should match the REG keyword at the beginning of the string, then the username, then an account number made up of digits (with no restriction on how many), then the address, and lastly the name of the individual the address is registered to.
The regex I came up with is not working. It is shown below:
^REG\#\w\#[0-9]\#\w\#\w
Could you kindly show me where I went wrong and how to make it work?
Thank you in advance

The problem is that you don't use quantifiers (* or +) and space is not included within \w which stands for [A-Za-z0-9_]. The character # does not need to be escaped (at least as far as I know in Java). Try the following Regex:
^REG#[\w ]+#\d+#[\w ]+#[\w ]+
^REG matches the literal text REG at the beginning of the string
# matches itself literally
[\w ]+ stands for at least one word character or space
\d+ stands for at least one digit
In Java, don't forget the double escaping:
String regex = "^REG#[\\w ]+#\\d+#[\\w ]+#[\\w ]+";
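A minimal sketch of how this pattern might be used from Java. The capturing groups and the trailing $ are additions for illustration, and the names RegLineParser and parse are hypothetical, not from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegLineParser {
    // Same pattern as above, with a capturing group around each field and a
    // trailing $ so the whole line has to match.
    private static final Pattern REG_LINE =
            Pattern.compile("^REG#([\\w ]+)#(\\d+)#([\\w ]+)#([\\w ]+)$");

    /** Returns {user, account, address, owner}, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = REG_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] fields =
                parse("REG#John Smith#14102245862#7 johns road new york#John Anthony Smith");
        // Prints: John Smith | 14102245862 | 7 johns road new york | John Anthony Smith
        System.out.println(String.join(" | ", fields));
    }
}
```

In an if statement, a null check on the result of parse(...) would be enough to decide whether the text matched.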

Try ^REG\#.*?\#[0-9]*?\#.*?\#.* . The *? operator is a lazy (reluctant) quantifier: it repeats as few times as possible, so each .*? stops at the next part of the expression, in this case \#.
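The difference between a greedy and a lazy quantifier can be seen in a small sketch like the following. The class name LazyVersusGreedy, the helper firstGroup and the shortened sample line are made up for the illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LazyVersusGreedy {
    /** Returns group 1 of the first match of regex in input, or null if there is no match. */
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "REG#John Smith#14102245862#7 johns road";
        // Greedy .* runs to the last # it can reach.
        System.out.println(firstGroup("#(.*)#", line));   // John Smith#14102245862
        // Lazy .*? stops at the first # it can reach.
        System.out.println(firstGroup("#(.*?)#", line));  // John Smith
    }
}
```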

Related

Expression to capture only 1 occurrence for a single character but multiple for others

I am trying to use the following regex to capture following values. This is for use in Java.
(\$|£|$|£)([ 0-9.]+)
Example values which I do want to be captured via above regex which works.
$100
$100.5
$100
$100.6
£200
£200.6
But the following also get captured, which is wrong. I only want to capture values when there is only 1 dot in the text, not several.
£200.15.
£200.6.6.6.6
Is there a way to match so that values with multiple periods are not captured?
I can't do something like the following because that would affect the digits too. Please advise.
(\$|£|$|£)([ 0-9.]{1})

You can use
(\$|£|$|£)(\d+(?:\.\d+)?)\b(?!\.)
See the regex demo.
In this regex, (\d+(?:\.\d+)?)\b(?!\.) matches
(\d+(?:\.\d+)?) - Group 2 of the full pattern (Group 1 is the currency symbol): one or more digits, then an optional occurrence of . and one or more digits
\b - a word boundary
(?!\.) - not immediately followed with a . char.
Another solution for Java (where the regex engine supports possessive quantifiers) will be
(\$|£|$|£)(\d++(?:\.\d+)?+)(?!\.)
See this regex demo. \d++ and (?:\.\d+)?+ contain ++ and ?+ possessive quantifiers that prevent backtracking into the quantified subpatterns.
In Java, do not forget to double the backslashes in the string literals:
String regex = "(\\$|£|$|£)(\\d++(?:\\.\\d+)?+)(?!\\.)";
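A rough sketch of the possessive variant in use. As an assumption made only for this example, the currency prefix is simplified to a character class holding $ and £ (the pound sign is written as a Unicode escape to avoid source-encoding problems), and the names AmountExtractor and firstAmount are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AmountExtractor {
    // Simplified currency prefix (an assumption for this sketch), followed by
    // the possessive numeric part and the negative lookahead from the answer.
    private static final Pattern AMOUNT =
            Pattern.compile("[$\u00A3](\\d++(?:\\.\\d+)?+)(?!\\.)");

    /** Returns the numeric part of the first amount found, or null if there is none. */
    static String firstAmount(String text) {
        Matcher m = AMOUNT.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstAmount("$100.5"));              // 100.5
        System.out.println(firstAmount("\u00A3200.15."));       // null
        System.out.println(firstAmount("\u00A3200.6.6.6.6"));   // null
    }
}
```

Because the quantifiers are possessive, the engine cannot give back the digits it has already consumed in order to satisfy the lookahead, so the values with extra dots are rejected outright.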

You could try this
(\$|£|$|£)([0-9]+(?:\.[0-9]+)?)$
This matches one or more digits, followed by an optional dot and more digits, and then the end of the string.

Regular expression to determine if the String consists of more than 4 numbers

I want to extract URL strings from a log which looks like below:
<13>Mar 27 11:22:38 144.0.116.31 AgentDevice=WindowsDNS AgentLogFile=DNS.log PluginVersion=X.X.X.X Date=3/27/2019 Time=11:22:34 AM Thread ID=11BC Context=PACKET Message= Internal packet identifier=0000007A4843E100 UDP/TCP indicator=UDP Send/Receive indicator=Snd Remote IP=X.X.X.X Xid (hex)=9b01 Query/Response=R Opcode=Q Flags (hex)=8081 Flags (char codes)=DR ResponseCode=NOERROR Question Type=A Question Name=outlook.office365.com
I am looking to extract the Name text when it contains at least 5 digits.
One suggested approach is (\d.*?){5,}, but it does not seem to work. Kindly suggest another way to get the field.
Example of string match:
outlook12.office345.com
outlook.office12345.com

You can look for the following expression:
Name=([^ ]*\d{5,}[^ ]*)
Explanation:
Name= matches the literal text "Name=", then the group captures:
[^ ]* any number of characters that are not a space
\d{5,} then 5 or more digits in a row
[^ ]* then again any number of non-space characters, up to the next space
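A minimal Java sketch of this approach, reading the captured group back with Matcher.find(). The names QuestionNameExtractor and extract, and the shortened log lines, are made up for the example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuestionNameExtractor {
    // Name= followed by a space-free token that contains 5+ consecutive digits.
    private static final Pattern NAME =
            Pattern.compile("Name=([^ ]*\\d{5,}[^ ]*)");

    /** Returns the captured name, or null if no such name is present. */
    static String extract(String logLine) {
        Matcher m = NAME.matcher(logLine);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Prints: outlook.office12345.com
        System.out.println(extract("Question Type=A Question Name=outlook.office12345.com"));
        // Prints: null (only 3 consecutive digits)
        System.out.println(extract("Question Type=A Question Name=outlook.office365.com"));
    }
}
```

Note that this only accepts digits in a row, so a name like outlook12.office345.com would not be captured by it.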

This regular expression:
(?<=Name=).*\d{5,}.*?(?=\s|$)
would extract strings like outlook.office365666.com (with 5 or more consecutive digits) from your example input.
Demo: https://regex101.com/r/YQ5l2w/1

Try this pattern: (?=\b.*(?:\d[^\d\s]*){5,})\S*
Explanation:
(?=...) - positive lookahead, assures that pattern inside it is matched somewhere ahead :)
\b - word boundary
(?:...) - non-capturing group
\d[^\d\s]* - match digit \d, then match zero or more of any characters other than whitespace \s or digit \d
{5,} - match the preceding pattern 5 or more times
\S* - match zero or more non-whitespace characters, to match the string if the assertion is true, but I think you just need the assertion :)
If you want only consecutive numbers use simplified pattern (?=\b.*\d{5,})\S*.
Of course, you have to add a positive lookbehind, (?<=Name=), to assert that the string is preceded by Name=
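Here is a sketch of the lookbehind plus lookahead idea in Java for digits that are not necessarily consecutive. It is an adaptation rather than the exact pattern above: the lookahead uses \S*? instead of .* so that, as far as I can tell, only digits inside the current token are counted and the check cannot run past whitespace. The names DigitCountNameExtractor and extract are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DigitCountNameExtractor {
    // (?<=Name=)        the match must start right after "Name="
    // (?=(?:\S*?\d){5}) the token ahead must contain at least 5 digits
    // \S+               the token itself
    private static final Pattern NAME =
            Pattern.compile("(?<=Name=)(?=(?:\\S*?\\d){5})\\S+");

    /** Returns the name if it contains at least 5 digits, otherwise null. */
    static String extract(String logLine) {
        Matcher m = NAME.matcher(logLine);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("Question Name=outlook12.office345.com")); // outlook12.office345.com
        System.out.println(extract("Question Name=outlook.office365.com"));   // null
    }
}
```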

Try this regex
([a-z0-9]{5,}.[a-z0-9]{5,})+.com
https://regex101.com/r/OzsChv/3
It matches:
outlook.office365.com
outlook12.office345.com
as well as other URL strings of the same shape

Regex-How to prevent repeated special characters?

I don't have any experience with regular expressions. I need a regular expression which doesn't allow special characters (+-*/& etc.) to be repeated.
The string can contain digits, alphanumerics, and special characters.
This should be valid : abc,df
This should be invalid : abc-,df
I would really appreciate it if you could help me! Thanks in advance.

Two solutions presented so far match a string that is not allowed.
But the title is "How to prevent...", so I assume that the regex
should match the allowed string. It means that the regex should:
match the whole string if it does not contain 2
consecutive special characters,
not match otherwise.
You can achieve this putting together the following parts:
^ - start of string anchor,
(?!.*[...]{2}) - a negative lookahead for 2 consecutive special
characters (marked here as ...), in any place,
a regex matching the whole (non-empty) string,
$ - end of string anchor.
So the whole regex should be:
^(?!.*[!@#$%^&*()\-_+={}[\]|\\;:'",<.>\/?]{2}).+$
Note that within a char class (between [ and ]) a backslash
escaping the following char should be placed before - (if in
the middle of the sequence), closing square bracket,
a backslash itself and / (regex terminator).
Or if you want to apply the regex to individual words (not the whole
string), then the regex should be:
\b(?!\S*[!@#$%^&*()\-_+={}[\]|\\;:'",<.>\/?]{2})\S+
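If this is used from Java, a sketch of the whole-string variant could look like the one below. Two things differ from the generic form: java.util.regex treats an unescaped [ inside a character class as the start of a nested class, so it is escaped here, and / needs no escaping because Java patterns have no slash delimiters. The set of special characters is assumed to start with !@#, and the names SpecialCharValidator and isValid are hypothetical:

```java
import java.util.regex.Pattern;

final class SpecialCharValidator {
    // Negative lookahead: fail if two special characters appear side by side
    // anywhere in the string; otherwise accept any non-empty string.
    private static final Pattern NO_REPEATED_SPECIALS = Pattern.compile(
            "^(?!.*[!@#$%^&*()\\-_+={}\\[\\]|\\\\;:'\",<.>/?]{2}).+$");

    /** True if the string is non-empty and has no two consecutive special characters. */
    static boolean isValid(String s) {
        return NO_REPEATED_SPECIALS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("abc,df"));   // true
        System.out.println(isValid("abc-,df"));  // false
    }
}
```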

[\,\+\-\*\/\&]{2,} Add more characters in the square bracket if you want.
Demo https://regex101.com/r/CBrldL/2

Use the following regex to match the invalid string.
[^A-Za-z0-9]{2,}

[^\w!\s]{2,} This would be the shortest version that matches any two consecutive special characters (ignoring spaces).
If you want to consider space, please use [^\w]{2,}

Regex to detect number within String

I'm confronted with a String:
[something] -number OR number [something]
I want to be able to cast the number. I do not know at which position it occurs. I cannot build a substring because there's no obvious separator.
Is there a way I could extract the number from the String by matching a pattern like
[-]?[0..9]+
, where the minus is optional? The String can contain special characters, which actually drives me crazy defining a regex.

-?\b\d+\b
That's broken down by:
-? (optional minus sign)
\b word boundary
\d+ 1 or more digits
[EDIT 2] - nod to Alan Moore
Unfortunately Java doesn't have verbatim strings, so you'll have to escape the regex above as:
String regex = "-?\\b\\d+\\b";
I'd also recommend a site like http://regexlib.com/RETester.aspx or a program like Expresso to help you test and design your regular expressions
[EDIT] - after some good comments
I haven't done something like .*?(-?\d+).* (from @Voo) because I wasn't sure if you wanted to match the entire string, or just the digits. Both versions should tell you if there are digits in the string, and if you want the actual digits, use the first regex and look for group 0 (the whole match). There are clever ways to name groups or use multiple captures, but that would be a complicated answer to a straightforward question...
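For completeness, a small sketch of pulling the number out with the first regex and converting it. The names NumberFinder and firstNumber are invented for the example, and it assumes the number fits in an int (Integer.parseInt throws NumberFormatException otherwise):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberFinder {
    private static final Pattern NUMBER = Pattern.compile("-?\\b\\d+\\b");

    /** Returns the first standalone integer in the text, or null if there is none. */
    static Integer firstNumber(String text) {
        Matcher m = NUMBER.matcher(text);
        if (!m.find()) {
            return null;
        }
        // group() with no argument is group 0, i.e. the whole match.
        return Integer.parseInt(m.group());
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("[something] -42"));  // -42
        System.out.println(firstNumber("17 [something]"));   // 17
        System.out.println(firstNumber("no digits here"));   // null
    }
}
```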

Help with regex

I'm constructing a regex which will accept at least 1 alphanumeric character and any number of spaces.
Right now I've got...[A-Za-z0-9]+[ \t\r\n]* which I understand to be at least 1 alphanumeric OR at least 1 space. How would I fix this?
EDIT: To answer the comments below I want it to accept strings which contain AT LEAST 1 alphanumeric AND any number of (including no) spaces. Right now it will accept JUST a whitespace.
EDIT2: To clarify, I don't want the any number of whitespace (including 0) to be accepted unless there is at least 1 alphanumeric character

\s*\p{Alnum}[\p{Alnum}\s]*
Your regex, [A-Za-z0-9]+[ \t\r\n]*, requires the string to start with a letter or digit (or, more accurately, it doesn't start matching until it sees one). Adding \s* allows the match to start with whitespace, but you still won't match any alphanumerics after the first whitespace character that follows an alphanumeric (for example, it won't match the xyz in abc xyz). Changing the trailing \s* to [\p{Alnum}\s]* fixes that problem.
On a side note, \p{Alnum} is exactly equivalent to [A-Za-z0-9] in Java, which is not the case in all regex flavors. I used \p{Alnum}, not just because it's shorter, but because it gives more protection from typos like [A-z] (which is syntactically valid, but almost certainly not what the author really meant).
EDIT: Performance should be considered, too. I originally included a + after the first \p{Alnum}, but I realized that wasn't a good idea. If this were part of a longer regex, and the regex didn't match right away, it could end up wasting a lot of time trying to match the same groups of characters with \p{Alnum}+ or [\p{Alnum}\s]*. The leading \s* is okay, though, because \s doesn't match any of the characters that \p{Alnum} matches.
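A quick way to check this pattern is to run it through Matcher.matches(), which requires the entire input to match. This is only a sketch, and the names AlnumWithSpaces and isValid are hypothetical:

```java
import java.util.regex.Pattern;

final class AlnumWithSpaces {
    // Optional leading whitespace, one mandatory alphanumeric, then any mix
    // of alphanumerics and whitespace.
    private static final Pattern VALID =
            Pattern.compile("\\s*\\p{Alnum}[\\p{Alnum}\\s]*");

    /** True if the input has at least one alphanumeric and otherwise only alphanumerics and whitespace. */
    static boolean isValid(String input) {
        return VALID.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(" abc xyz "));  // true
        System.out.println(isValid("   "));        // false
        System.out.println(isValid(""));           // false
    }
}
```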

One or more word characters followed by zero or more whitespace characters:
\w+\s*

Hey, try this: ([^\s]+\s*). [^\s] means match everything that is not whitespace, while \s* means that whitespace is optional (if you really want at least one whitespace character, put + instead of *).
Edit: sorry, mine matches everything, not only alphanumerics (use ([a-zA-Z0-9]+\s*) for alphanumerics).

This should do the trick:
\s*\p{Alnum}+\s*
\p{Alnum} is an alphanumeric character: [\p{Alpha}\p{Digit}]
* says "zero or more times"
+ says "at least one" (not "or" as you seem to believe, or is written |)
| means "or"
\s is a whitespace character: [ \t\n\x0B\f\r]
"EDIT: To answer the comments below I want it to accept strings which contain AT LEAST 1 alphanumeric AND any number of (including no) spaces."
The pattern I suggested requires at least one alphanumeric character.
"EDIT2: To clarify, I don't want the any number of whitespace (including 0) to be accepted unless there is at least 1 alphanumeric character"
The pattern I suggested will not accept a string made up only of whitespace characters.
