REGEX Expression to exclude toll-free numbers - java

I have this regex expression written that should extract toll-free numbers but when there is a number like 1-800-343-2432 (when there is a 1 before the 800 stuff) it doesn't work
(?!(\$|#|800|855|866|877|888))\(?[\\s.-]*([0-9]{3})?[\\s.-]*\)?[\\s.-]*[0-9]{3}[\\s.-]*[0-9]{4}
how can i modify this expression to not take numbers like 1-866-343-1232 too ?!

Without checking your full regex you can use this regex to block 1-888:
(?!(?:1-)?(\\$|#|800|855|866|877|888))\(?[\\s.-]*([0-9]{3})?[\\s.-]*\)?[\\s.-]*[0-9]{3}[\\s.-]*[0-9]{4}

Prepend (1-)? to your regex. This will work for optional 1-.

Modifying your regular expression:
(\+)?(1-)?\(?(\\$|#|800|855|866|877|888)\)?[\\s.-]*([0-9]{3})?[\\s.-]*\)?[\\s.-]*[0-9]{3}[\\s.-]*[0-9]{4}
The key differences here are the following:
(\+)? :: A lazy quantifier `?' matches a + character if it takes place prior to the 1. Many numbers display like +1-800-343-2432
(1-)? :: Matches a 1 followed by a - character. The ? is a lazy quantifier that matches the 1- if it exists.
And I also added \(? and \)? which allow you to match on numbers that present in the format +1-(800)-343-2432

Related

Regex match optional string greedy inbetween two random strings

I am looking for a way to match an optional ABC in the following strings.
Both strings should be matched either way, if ABC is there or not:
precedingstringwithundefinedlenghtABCsubsequentstringwithundefinedlength
precedingstringwithundefinedlenghtsubsequentstringwithundefinedlength
I've tried
.*(ABC).*
which doesn't work for an optional ABC but making ABC non greedy doesn't work either as the .* will take all the pride:
.*(ABC)?.*
This is NOT a duplicate to e.g. Regex Match all characters between two strings as I am looking for a certain string inbetween two random string, kind of the other way around.
You can use
.*(ABC).*|.*
This works like this:
.*(ABC).* pattern is searched for first, since it is the leftmost part of an alternation (see "Remember That The Regex Engine Is Eager"), it looks for any zero or more chars other than line break chars as many as possible, then captures ABC into Group 1 and then matches the rest of the line with the right-hand .*
| - or
.* - is searched for if the first alternation part does not match.
Another solution without the need to use alternation:
^(?:.*(ABC))?.*
See this regex demo. Details:
^ - start of string
(?:.*(ABC))? - an optional non-capturing group that matches zero or more chars other than line break chars as many as possible and then captures into Group 1 an ABC char sequence
.* - zero or more chars other than line break chars as many as possible.
I’ve come up with an answer myself:
Using the OR operator seems to work:
(?:(?:.*(ABC))|.*).*
If there’s a better way, feel free to answer and I will accept it.
You could use this regex: .*(ABC){0,1}.*. It means any, optional{min,max}, any. It is easier to read. I can' t say if your solution or mine is faster due to the processing speed.
Options:
{value} = n-times
{min,} = min to infinity
{min,max} = min to max
.+([ABC])?.+ should do the job

Complicated regex and possible simple way to do it [duplicate]

I don't write many regular expressions so I'm going to need some help on the one.
I need a regular expression that can validate that a string is an alphanumeric comma delimited string.
Examples:
123, 4A67, GGG, 767 would be valid.
12333, 78787&*, GH778 would be invalid
fghkjhfdg8797< would be invalid
This is what I have so far, but isn't quite right: ^(?=.*[a-zA-Z0-9][,]).*$
Any suggestions?
Sounds like you need an expression like this:
^[0-9a-zA-Z]+(,[0-9a-zA-Z]+)*$
Posix allows for the more self-descriptive version:
^[[:alnum:]]+(,[[:alnum:]]+)*$
^[[:alnum:]]+([[:space:]]*,[[:space:]]*[[:alnum:]]+)*$ // allow whitespace
If you're willing to admit underscores, too, search for entire words (\w+):
^\w+(,\w+)*$
^\w+(\s*,\s*\w+)*$ // allow whitespaces around the comma
Try this pattern: ^([a-zA-Z0-9]+,?\s*)+$
I tested it with your cases, as well as just a single number "123". I don't know if you will always have a comma or not.
The [a-zA-Z0-9]+ means match 1 or more of these symbols
The ,? means match 0 or 1 commas (basically, the comma is optional)
The \s* handles 1 or more spaces after the comma
and finally the outer + says match 1 or more of the pattern.
This will also match
123 123 abc (no commas) which might be a problem
This will also match 123, (ends with a comma) which might be a problem.
Try the following expression:
/^([a-z0-9\s]+,)*([a-z0-9\s]+){1}$/i
This will work for:
test
test, test
test123,Test 123,test
I would strongly suggest trimming the whitespaces at the beginning and end of each item in the comma-separated list.
You seem to be lacking repetition. How about:
^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$
I'm not sure how you'd express that in VB.Net, but in Python:
>>> import re
>>> x [ "123, $a67, GGG, 767", "12333, 78787&*, GH778" ]
>>> r = '^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$'
>>> for s in x:
... print re.match( r, s )
...
<_sre.SRE_Match object at 0xb75c8218>
None
>>>>
You can use shortcuts instead of listing the [a-zA-Z0-9 ] part, but this is probably easier to understand.
Analyzing the highlights:
[a-zA-Z0-9 ]+ : capture one or more (but not zero) of the listed ranges, and space.
(?:[...]+,)* : In non-capturing parenthesis, match one or more of the characters, plus a comma at the end. Match such sequences zero or more times. Capturing zero times allows for no comma.
[...]+ : capture at least one of these. This does not include a comma. This is to ensure that it does not accept a trailing comma. If a trailing comma is acceptable, then the expression is easier: ^[a-zA-Z0-9 ,]+
Yes, when you want to catch comma separated things where a comma at the end is not legal, and the things match to $LONGSTUFF, you have to repeat $LONGSTUFF:
$LONGSTUFF(,$LONGSTUFF)*
If $LONGSTUFF is really long and contains comma repeated items itself etc., it might be a good idea to not build the regexp by hand and instead rely on a computer for doing that for you, even if it's just through string concatenation. For example, I just wanted to build a regular expression to validate the CPUID parameter of a XEN configuration file, of the ['1:a=b,c=d','2:e=f,g=h'] type. I... believe this mostly fits the bill: (whitespace notwithstanding!)
xend_fudge_item_re = r"""
e[a-d]x= #register of the call return value to fudge
(
0x[0-9A-F]+ | #either hardcode the reply
[10xks]{32} #or edit the bitfield directly
)
"""
xend_string_item_re = r"""
(0x)?[0-9A-F]+: #leafnum (the contents of EAX before the call)
%s #one fudge
(,%s)* #repeated multiple times
""" % (xend_fudge_item_re, xend_fudge_item_re)
xend_syntax = re.compile(r"""
\[ #a list of
'%s' #string elements
(,'%s')* #repeated multiple times
\]
$ #and nothing else
""" % (xend_string_item_re, xend_string_item_re), re.VERBOSE | re.MULTILINE)
Try ^(?!,)((, *)?([a-zA-Z0-9])\b)*$
Step by step description:
Don't match a beginning comma (good for the upcoming "loop").
Match optional comma and spaces.
Match characters you like.
The match of a word boundary make sure that a comma is necessary if more arguments are stacked in string.
Please use - ^((([a-zA-Z0-9\s]){1,45},)+([a-zA-Z0-9\s]){1,45})$
Here, I have set max word size to 45, as longest word in english is 45 characters, can be changed as per requirement

Extract exactly n digits in a sentence using REGEX

Example
The no.s 1234 65
Input: n
For n=4, the output should be 1234
For n=2, the output should be : 65 (not 12)
Tried \d{n} which gives 12 and \d{n,} gives 1234 but i want the exact matching one.
Pattern p = Pattern.compile("//\d{n,}");
you need negative lookaround assertion: (?<!..): negative look behind, and (?!..): negative look ahead : regex101
(?<!\d)\d{4}(?!\d)
however not all regex engine supports them, maybe a work around may match also the preceeding character and following character (contrary to look-around which are 0 width matches), (\D matches all excpet a digit)
(?:^|\D)(\d{4})(?:\D|$)
I think what you meant is the \b character.
Hence, the regex you're looking for would be (for n=2):
\b\d{2}\b
From what I understand, you're looking for a regex that will match a number in a string which has n digits, taking into into account the spacing between the numbers. If that's the case, you're looking for something like this:
\b\d{4}\b
The \b will ensure the match is constrained to the start/end of a 'word' where a word is the boundary between anything matched by \w (which includes digits) and anything matched by the opposite, \W (which includes spaces).
I don't code in java but I can try to answer this using regex in general.
If your number is in the format d1d2d3d4 d5d6 and you want to extract digits d5d6, create 3 groups as r'([0-9]+)("/s")([0-9]+)' – each set of parenthesis () represent one group. Now, extract the third group only in another object which is your required output.

Regular Expression to replace integers with floats

We're trying to replace integer values with float values in a String, for example:
#var1 * #var2/ 1+100 - 2 + 1.5 - .5
The regular expression should match 1, 100 and 2, but not numbers which are already floats, eg 1.5 and .5
I've gotten as far as /[^\w](\d+)/, which finds digits by themselves.
Now, how do I exclude numbers from this regular expression, that are followed by \.?\d+?
The RegEx should work in Java or Actionscript 3.
Regular Expression Test
This will work in Java: /(?<![.\w])\d+(?![.\w])/. It uses both lookahead and lookbehind to stop matching digits that are either preceeded or succeeded by a dot/letter.
Why no trying lookahead? I believe it works with Java, no clue about ActionScript.
[^\w](\d+)(?![.]\d+)/
Would match only those sequences of digits not immediately followed by a dot integer(s).
You can use a negative look ahead for this
(?<!\.\d+)
would exclude this, but you need to combine this with an anchor otherwise you will get a parital match.
/(?<!\B|\.)(\d+)(?!\.\d+)\b/
You should also change the non-word character before the digits. I used here a negative lookbehind assertion (?<!\B|\.). it ensures that there is no dot before the digit or not a non word boundary (double negation to match on a word boundary.)
See it here on Regexr
If your regex syntax supports lookahead, you can use that. Java does; not sure about AS3.
/[^\w.]\d+(?!\.)/
Note that in this case you'd want to use the entire matched string in the replacement.
My suggestion is
\b(?<!\.)\d+(?!\.)\b

Regex for XXYXX

I'm looking for a regular expression for Java to find digits structured like this:
XXYXX or XYYYX
So results could be 66266 or 71117
Thanks for your help!
Try this regex:
input.matches("((\\d)\\1\\d\\1\\1|(\\d)(\\d)\\3\\3\\2)");
It uses back references to handle repeating numbers and the regex "or (A|B)
Note that this regex will match 99999, which is allowable by your definition (ie X and Y may be the same digit).
Also note the escaped back slashes \\ for specifying a single backslash in the regex in a java String.
I'd suggest (\d)\1\d\1\1 for the first case and (\d)(\d)\2\2\1 for the second. Mixing them both with non-capturing groups:
(?:(\d)\1\d\1\1)|(?:(\d)(\d)\2\2\1)
Not sure how Java plays with Regex though.

Categories

Resources