Regex to avoid some characters and some character combinations - java

I have to create a text box where the user enters a comment, and I have to validate that the input does not contain any of the characters (or character combinations) below:
:|, &, ; , $ , % , # , ' , " , \' , \" , <> , (), +, CR, LF, \
The list above is comma-delimited, so if two characters appear between a pair of commas, it is the combination that is potentially malicious, not either character in isolation.
I tried to write a regex for this, including a positive lookahead, but nothing has worked. I have gone through some earlier questions as well, but did not find a solution.
I am able to reject the single malicious characters, but not the combinations.

The single characters are simple. A negated character class, [^...], lists the characters that are not allowed in the string; in your case, [^&;$%#\'\"+\\]
[^&;$%#\'\"+\\]* will match a string that contains none of those symbols.
For the combinations, regex has the negative lookahead. Before the engine starts matching, it can check that certain patterns do not occur in the string. Syntax: (?!.*thing1|.*thing2|...). The .* is needed so that the whole string is checked, not only the current position. Here: (?!.*:\||.*<>|.*\(\)|.*CR|.*LF)
All together: ^(?!.*:\||.*<>|.*\(\)|.*CR|.*LF)[^&;$%#\'\"+\\]*$
Note that CR and LF in this pattern match the literal letters. If the list means carriage return and line feed, drop them from the lookahead and add \r\n to the character class instead.
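A minimal Java sketch of the combined check follows. The class and method names (CommentValidator, isSafe) are hypothetical, and it assumes that CR and LF in the list mean carriage return and line feed, so they are excluded as single characters (\r, \n) rather than matched as literal letters:

```java
import java.util.regex.Pattern;

class CommentValidator {
    // The lookahead rejects the combinations :|  <>  () anywhere in the input.
    // The character class rejects the single characters, including backslash.
    // Assumption: CR and LF are the control characters \r and \n.
    // DOTALL lets .* in the lookahead scan across any line separator.
    private static final Pattern SAFE = Pattern.compile(
            "^(?!.*(?::\\||<>|\\(\\)))[^&;$%#'\"+\\\\\\r\\n]*$",
            Pattern.DOTALL);

    static boolean isSafe(String comment) {
        return SAFE.matcher(comment).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSafe("Nice article, thanks")); // true
        System.out.println(isSafe("a: b | c"));             // true, : and | are not adjacent
        System.out.println(isSafe("x :| y"));               // false, contains :|
        System.out.println(isSafe("50% off"));              // false, contains %
    }
}
```

Compiling the pattern once and reusing it avoids recompiling on every validation, which String.matches() would do.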

Related

Regex pattern matching with multiple strings

Forgive me; I am not very familiar with regex patterns.
I have created the regex pattern below.
String regex = Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
        + Pattern.quote(value);
This pattern gives different results for 2 different strings.
value = "207e/160";
Use Case 1 -
When channelStr = "207e/160, 149/80"
Then channelStr.matches(regex) returns true.
Use Case 2 -
When channelStr = "207e/160, 149/80, 11"
Then channelStr.matches(regex) returns false.
I am not able to figure out why. As far as I can understand, it may be because of the multiple spaces involved when more than 2 strings are present, separated by commas.
I am not sure what the correct pattern is for more than 2 strings.
Any help will be appreciated.

If you print your pattern, it is:
\Q207e/160\E, [NnoneOoff0-9\-\+\/]+|[NnoneOoff0-9\-\+\/]+, \Q207e/160\E
It is an alternation (|) in which each side requires exactly one comma: the quoted value followed by one more item, or one item followed by the quoted value.
matches() has to match the whole string. That is the case for 207e/160, 149/80, so that one is a match.
The string 207e/160, 149/80, 11 contains 2 commas, so the pattern matches only the first part of the string, not the whole string, and matches() returns false.
To match all the values, you can use a repeating pattern:
^[NnoeOf0-9+/-]+(?:,\h*[NnoeOf0-9+/-]+)*$
^ Start of string
[NnoeOf0-9+/-]+ Match 1+ of any of the characters listed in the character class
(?: Non-capturing group
,\h* Match a comma and optional horizontal whitespace chars
[NnoeOf0-9+/-]+ Match 1+ of any of the characters listed in the character class
)* Close the non-capturing group and optionally repeat it (if there should be at least 1 comma, the quantifier can be + instead of *)
$ End of string
Example using matches():
String channelStr1 = "207e/160, 149/80";
String channelStr2 = "207e/160, 149/80, 11";
String regex = "^[NnoeOf0-9+/-]+(?:,\\h*[NnoeOf0-9+/-]+)*$";
System.out.println(channelStr1.matches(regex));
System.out.println(channelStr2.matches(regex));
Output
true
true
Note that in the character class you can put - at the end so that it does not have to be escaped; + and / do not have to be escaped either.
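The difference between a whole-string match and a partial match can be made visible with Matcher.find(). This is only an illustrative sketch; the class and helper names (MatchesVsFind, questionRegex, firstMatch) are made up for the example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    // Rebuilds the pattern from the question for a given value.
    static String questionRegex(String value) {
        return Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
                + Pattern.quote(value);
    }

    // Returns the first substring the pattern matches, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String regex = questionRegex("207e/160");
        String input = "207e/160, 149/80, 11";
        System.out.println(input.matches(regex));     // false: the whole string must match
        System.out.println(firstMatch(regex, input)); // 207e/160, 149/80
    }
}
```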

You can use regex101 to test your regex. It explains everything the pattern is doing, which helps with debugging, and the quick reference section at the bottom right shows what is available, with examples.
A few things: you escape a literal with \, so \" is a literal double quote.
If you want one or more of something, use +. These are called quantifiers and can be applied to groups, tokens, etc. The token for a whitespace character is \s, so one or more whitespace characters is \s+.
It's difficult to tell exactly what you're trying to do, but hopefully pointing you to regex101 will help. If you provide the regex you currently have, what you want to match, and the strings you're using to test it, I'll be happy to provide an example.

^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$
(The backslashes are doubled as they would be inside a Java string literal.)
^ Start
(?: ... ) Non-capturing group that defines an item and its separator. After each item except the last, the separator (,) must appear. Spaces (one, several, or none) can appear before and after the comma, which is what the " *" parts specify. The group can repeat one or more times up to the end of the string, as specified by the + quantifier after the group's closing parenthesis.
$ End
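A short sketch of how this pattern behaves in Java, assuming it is used as a string literal exactly as written. The class and method names (ChannelListCheck, isChannelList) are hypothetical:

```java
import java.util.regex.Pattern;

class ChannelListCheck {
    // Each item must be followed either by a comma that is not the last
    // thing in the string, or by the end of the string.
    private static final Pattern CHANNELS =
            Pattern.compile("^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$");

    static boolean isChannelList(String s) {
        return CHANNELS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isChannelList("207e/160, 149/80, 11")); // true
        System.out.println(isChannelList("207e/160,"));            // false, trailing comma
    }
}
```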

Complicated regex and possible simple way to do it [duplicate]

I don't write many regular expressions, so I'm going to need some help on this one.
I need a regular expression that validates that a string is a comma-delimited list of alphanumeric values.
Examples:
123, 4A67, GGG, 767 would be valid.
12333, 78787&*, GH778 would be invalid
fghkjhfdg8797< would be invalid
This is what I have so far, but it isn't quite right: ^(?=.*[a-zA-Z0-9][,]).*$
Any suggestions?

Sounds like you need an expression like this:
^[0-9a-zA-Z]+(,[0-9a-zA-Z]+)*$
POSIX bracket expressions allow a more self-descriptive version:
^[[:alnum:]]+(,[[:alnum:]]+)*$
^[[:alnum:]]+([[:space:]]*,[[:space:]]*[[:alnum:]]+)*$ // allow whitespace
If you're willing to admit underscores, too, search for entire words (\w+):
^\w+(,\w+)*$
^\w+(\s*,\s*\w+)*$ // allow whitespaces around the comma
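In Java, the POSIX bracket form [[:alnum:]] is not available; java.util.regex spells the same class \p{Alnum}. A minimal sketch of the whitespace-tolerant variant, with hypothetical names (AlnumList, isValid), checked against the examples from the question:

```java
import java.util.regex.Pattern;

class AlnumList {
    // One alphanumeric item, then zero or more ", item" repetitions,
    // with optional whitespace around each comma.
    private static final Pattern LIST =
            Pattern.compile("^\\p{Alnum}+(\\s*,\\s*\\p{Alnum}+)*$");

    static boolean isValid(String s) {
        return LIST.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("123, 4A67, GGG, 767"));   // true
        System.out.println(isValid("12333, 78787&*, GH778")); // false
        System.out.println(isValid("fghkjhfdg8797<"));        // false
    }
}
```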

Try this pattern: ^([a-zA-Z0-9]+,?\s*)+$
I tested it with your cases, as well as with just a single number, "123". I don't know whether you will always have a comma or not.
The [a-zA-Z0-9]+ means match 1 or more of these symbols
The ,? means match 0 or 1 commas (basically, the comma is optional)
The \s* handles 0 or more whitespace characters after the comma
and finally the outer + says match 1 or more repetitions of the pattern.
This will also match 123 123 abc (no commas), which might be a problem.
It will also match 123, (ends with a comma), which might be a problem.

Try the following expression:
/^([a-z0-9\s]+,)*([a-z0-9\s]+){1}$/i
This will work for:
test
test, test
test123,Test 123,test
I would strongly suggest trimming the whitespace at the beginning and end of each item in the comma-separated list.
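The trimming suggestion can be combined with a plain split, which avoids one large regex. This is a sketch under assumptions: the names (ItemListParser, parseItems) are hypothetical, and throwing an exception on an invalid item is just one possible design:

```java
import java.util.ArrayList;
import java.util.List;

class ItemListParser {
    // Splits on commas, trims each item, and rejects anything that is not
    // made of letters, digits and inner spaces.
    static List<String> parseItems(String input) {
        List<String> items = new ArrayList<>();
        // The limit -1 keeps trailing empty strings, so "a," is rejected below.
        for (String raw : input.split(",", -1)) {
            String item = raw.trim();
            if (!item.matches("[a-zA-Z0-9 ]+")) {
                throw new IllegalArgumentException("Invalid item: '" + raw + "'");
            }
            items.add(item);
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(parseItems("test123,Test 123, test ")); // [test123, Test 123, test]
    }
}
```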

You seem to be lacking repetition. How about:
^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$
I'm not sure how you'd express that in VB.Net, but in Python:
>>> import re
>>> x = [ "123, 4A67, GGG, 767", "12333, 78787&*, GH778" ]
>>> r = '^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$'
>>> for s in x:
...     print(re.match(r, s))
...
<_sre.SRE_Match object at 0xb75c8218>
None
>>>
You can use shortcuts instead of listing the [a-zA-Z0-9 ] part, but this is probably easier to understand.
Analyzing the highlights:
[a-zA-Z0-9 ]+ : match one or more (but not zero) characters from the listed ranges, or spaces.
(?:[...]+,)* : In non-capturing parentheses, match one or more of the characters, plus a comma at the end. Match such sequences zero or more times. Allowing zero repetitions permits input with no comma.
[...]+ : match at least one of these. This does not include a comma, which ensures that a trailing comma is not accepted. If a trailing comma is acceptable, then the expression is easier: ^[a-zA-Z0-9 ,]+$

Yes, when you want to match comma-separated things where a trailing comma is not legal, and each thing matches $LONGSTUFF, you have to repeat $LONGSTUFF:
$LONGSTUFF(,$LONGSTUFF)*
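In Java, the repetition can be generated by string concatenation rather than typed twice. A minimal sketch with a hypothetical helper, listOf, that wraps any item pattern; it assumes the item pattern contains no numbered backreferences, since the pattern is inserted twice:

```java
import java.util.regex.Pattern;

class ListRegex {
    // Builds item(,item)* from a single item pattern.
    static Pattern listOf(String itemRegex) {
        // The non-capturing group keeps alternations inside the item self-contained.
        String item = "(?:" + itemRegex + ")";
        return Pattern.compile(item + "(?:," + item + ")*");
    }

    public static void main(String[] args) {
        Pattern pairs = listOf("[a-z]+=[a-z]+");
        System.out.println(pairs.matcher("a=b,c=d").matches()); // true
        System.out.println(pairs.matcher("a=b,").matches());    // false, trailing comma
    }
}
```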
If $LONGSTUFF is really long and itself contains comma-separated items, it might be a good idea not to build the regexp by hand and instead let a computer do it for you, even if it's just through string concatenation. For example, I just wanted to build a regular expression to validate the CPUID parameter of a XEN configuration file, of the ['1:a=b,c=d','2:e=f,g=h'] type. I... believe this mostly fits the bill (whitespace notwithstanding!):
import re
xend_fudge_item_re = r"""
e[a-d]x= #register of the call return value to fudge
(
0x[0-9A-F]+ | #either hardcode the reply
[10xks]{32} #or edit the bitfield directly
)
"""
xend_string_item_re = r"""
(0x)?[0-9A-F]+: #leafnum (the contents of EAX before the call)
%s #one fudge
(,%s)* #repeated multiple times
""" % (xend_fudge_item_re, xend_fudge_item_re)
xend_syntax = re.compile(r"""
\[ #a list of
'%s' #string elements
(,'%s')* #repeated multiple times
\]
$ #and nothing else
""" % (xend_string_item_re, xend_string_item_re), re.VERBOSE | re.MULTILINE)

Try ^(?!,)((, *)?([a-zA-Z0-9]+)\b)*$
Step-by-step description:
Don't match a leading comma (good for the upcoming "loop").
Match an optional comma and spaces.
Match one or more of the characters you like.
The word boundary makes sure that a comma is required when more items follow in the string.

Please use: ^((([a-zA-Z0-9\s]){1,45},)+([a-zA-Z0-9\s]){1,45})$
Here the maximum item length is set to 45, as the longest word in English has 45 characters; this can be changed as required. Note that the + after the first group means at least one comma is required.

Java, Regex, strip unwanted characters [trailing, leading, between]

I need help with a regular expression to strip unwanted characters from a String (in Java).
I solved this with 4 regular expressions applied one after another.
The replacement is called many times (peaks: 50+ times/sec), and this decreases performance.
I think it should be possible with a single expression, which would improve performance a little.
The test string is
" ! ... my-Cruc i#l_\\/Disp lay.Na#m3 ?;()! "
The tasks I would like to perform with the regex:
Remove all leading non-alpha characters – [beginning of string]
Remove all trailing non-alphanumeric characters – [end of string]
Remove all non-alphanumeric characters (except [_-.]) in between
So the result will be
my-Crucil_Display.Nam3
The problem is the switch between the built-in classes Alnum and Alpha, depending on the position in the string (beginning, end), plus the exception characters [_-.] in between.
I have tried this many times in the last few days, but I cannot get it to work.
Removing leading non-alpha characters works with the regex
^([^\\p{Alpha}]+)?
But once I append the "between" part, nothing works any longer.
Removing trailing non-alphanumeric characters with the regex
([^\\p{Alnum}]+$)
works, but not in combination with the other regexes.
One of my last attempts is
(^[^\\p{Alpha}]+)?[^\\p{Alnum}\\._-]+([^\\p{Alnum}]+$)
Can anyone help to get this working?

You may use
^\P{Alpha}+|\P{Alnum}+$|[^\p{Alnum}_.-]
Java:
s = s.replaceAll("^\\P{Alpha}+|\\P{Alnum}+$|[^\\p{Alnum}_.-]", "");
Or, to make it Unicode aware, add the (?U) flag:
s = s.replaceAll("(?U)^\\P{Alpha}+|\\P{Alnum}+$|[^\\p{Alnum}_.-]", "");
Details
^\P{Alpha}+ - any 1 or more chars other than alphabetic chars at the start of the string
| - or
\P{Alnum}+$ - any 1 or more chars other than alphanumeric chars at the end of the string
| - or
[^\p{Alnum}_.-] - any char other than alphanumeric, _, . and - chars anywhere in the string
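Since the question mentions that the replacement runs many times per second, it may help to compile the pattern once: String.replaceAll compiles its regex on every call. A minimal sketch with hypothetical names (NameCleaner, clean), applied to the test string from the question:

```java
import java.util.regex.Pattern;

class NameCleaner {
    // Compiled once and reused; Pattern instances are safe to share between threads.
    private static final Pattern UNWANTED =
            Pattern.compile("^\\P{Alpha}+|\\P{Alnum}+$|[^\\p{Alnum}_.-]");

    static String clean(String s) {
        return UNWANTED.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean(" ! ... my-Cruc i#l_\\/Disp lay.Na#m3 ?;()! "));
    }
}
```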

vaadin RegexpValidator regex

I am using Vaadin's built-in RegexpValidator to validate form fields. I have a description field that can contain any character as long as it is not empty. Initially I was using ".+", which mostly worked, but when I converted the field from a TextField to a TextArea it no longer matched my strings, because "." does not match newlines.
I have tried "(.|\n|\r)+", but that also accepts input consisting of just a blank space or a newline.
I only need to make sure that at least one character has been entered; it doesn't matter which.
Normally with regex you can check for blanks with "^\s*$", but Vaadin's RegexpValidator requires the pattern to match your string, so what I am looking for is basically the opposite of "^\s*$", i.e. at least one character. RegexpValidator is really confusing me.

If you want to allow text that consists purely of whitespace, you can use
[\s\S]+
\s a whitespace character
\S a non-whitespace character
That matches at least one character and also matches newlines, because they are included in \s.
If you want to require at least one non-whitespace character, you can use
^\s*\S
That checks for 0 or more whitespace characters at the start of the string (this covers leading newlines) and succeeds when it finds the first non-whitespace character. If the validator has to match the whole string, append [\s\S]* so that the rest of the input is consumed: ^\s*\S[\s\S]*
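The whole-string variant can be tried with plain java.util.regex, independent of Vaadin. This sketch assumes, as the question states, that the validator requires the pattern to match the entire input; the names (NonBlankCheck, hasVisibleText) are hypothetical:

```java
import java.util.regex.Pattern;

class NonBlankCheck {
    // Optional leading whitespace, one non-whitespace character,
    // then anything at all (including newlines) up to the end.
    private static final Pattern NON_BLANK = Pattern.compile("\\s*\\S[\\s\\S]*");

    static boolean hasVisibleText(String s) {
        // matches() requires the pattern to cover the whole string
        return NON_BLANK.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasVisibleText("  \n "));           // false
        System.out.println(hasVisibleText("\n first\nsecond")); // true
    }
}
```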

Removing all whitespace characters except for " "

I consider myself pretty good with Regular Expressions, but this one is appearing to be surprisingly tricky: I want to trim all whitespace, except the space character: ' '.
In Java, the RegEx I have tried is: [\s-[ ]], but this one also strips out ' '.
UPDATE:
Here is the particular string that I am attempting to strip spaces from:
project team manage key
Note: it would be the characters between "team" and "manage". They appear as a long space when editing this post but view as a single space in view mode.

Try using this regular expression:
[^\S ]+
It's a bit confusing to read because of the double negative. The regular expression [\S ] matches the characters you want to keep, i.e. either a space or anything that isn't a whitespace. The negated character class [^\S ] therefore must match all the characters you want to remove.

Using a Guava CharMatcher:
String text = ...
String stripped = CharMatcher.WHITESPACE.and(CharMatcher.isNot(' '))
        .removeFrom(text);
If you actually just want that trimmed from the start and end of the string (like String.trim()), you'd use trimFrom rather than removeFrom. In recent Guava versions the WHITESPACE constant is replaced by the method CharMatcher.whitespace().

There's no subtraction of character classes in Java, otherwise you could use [\s--[ ]], note the double dash. You can always simulate set subtraction using intersection with the complement, so
[\s&&[^ ]]
should work. It's no better than [^\S ]+ from the first answer, but the principle is different and it's good to know both.
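Both forms can be compared side by side in Java. A small sketch with hypothetical names (WhitespaceStripper, stripNegated, stripIntersection); note that without extra flags \s in Java covers only the ASCII whitespace characters:

```java
class WhitespaceStripper {
    // Negated class: anything that is neither non-whitespace nor a space.
    static String stripNegated(String s) {
        return s.replaceAll("[^\\S ]+", "");
    }

    // Intersection: whitespace AND not a space.
    static String stripIntersection(String s) {
        return s.replaceAll("[\\s&&[^ ]]+", "");
    }

    public static void main(String[] args) {
        String text = "project team\t\n manage key";
        System.out.println(stripNegated(text));      // project team manage key
        System.out.println(stripIntersection(text)); // project team manage key
    }
}
```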

I solved it with this (JavaScript syntax):
anyString.replace(/[\f\t\n\v\r]+/g, '');
It is just a collection of the common whitespace characters excluding the blank (so roughly \s without blanks). It includes tab, carriage return, new line, vertical tab and form feed characters.
