How would I use regex to allow certain characters? - java

I am using a regex to validate a username: my code sends a client return code if the username contains characters the regex does not allow. My problem is that I do not know how to allow spaces.
This is my current code. I would like to allow a space, a-z, A-Z and 0-9.
if (username.length() < 1 || username.length() >= 13
|| !username.matches("[a-zA-Z_0-9]"))
{
session.getLoginPackets().sendClientPacket(3);
return;
}

The regex you're looking for is [a-zA-Z_0-9][a-zA-Z_0-9 ]* assuming you don't want a name to start with spaces.

I am quite sure you want to be Unicode compliant, so you should use
[\p{L}\p{Nd}][\p{L}\p{Nd} ]*
I used two character classes to ensure that the name does not start with a space. If this check is not needed, just remove the first class and change the quantifier of the second to a +.
From regular-expressions.info
\p{Nd} or \p{Decimal_Digit_Number}: a digit zero through nine in any script except ideographic scripts.
\p{L} or \p{Letter}: any kind of letter from any language.
More about Unicode in Java
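A minimal sketch of how the Unicode-aware pattern above could be wired into the username check. The class and method names are made up for illustration, and the 1 to 12 character limit is taken from the length test in the question.

```java
import java.util.regex.Pattern;

public class UnicodeUsername {
    // First class forbids a leading space; the second class allows spaces afterwards.
    private static final Pattern NAME =
            Pattern.compile("[\\p{L}\\p{Nd}][\\p{L}\\p{Nd} ]*");

    // Hypothetical helper: true when the name has 1 to 12 characters and
    // consists only of letters, decimal digits and (non-leading) spaces.
    static boolean isValid(String username) {
        return username.length() >= 1 && username.length() < 13
                && NAME.matcher(username).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("J\u00f6rg 42"));   // true  (o-umlaut is a letter)
        System.out.println(isValid(" leading"));       // false (starts with a space)
        System.out.println(isValid("semi;colon"));     // false (';' is not allowed)
    }
}
```

Because matches() always tests the whole input, no ^ or $ anchors are needed here.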

Use the \w metasequence (letters, digits and underscore), plus a space (or \s to match tabs too), in a character class:
var pattern = @"^[\w ]{1,12}$"; // C#, validates length also.
Edit: this seems to work for single spacing only, but it does not validate the length:
var pattern = @"^(\w+\s?)+$";

Try this one
if (username.length() < 1 || username.length() >= 13
|| !username.matches("[a-zA-Z0-9 ]+"))
{
session.getLoginPackets().sendClientPacket(3);
return;
}

Related

What is the Regular Expression for special characters (everything except digits & alphabets) in java

I wish to convert all the special characters(everything except digits & alphabets) to the character '*' by using replaceAll()
string2=string1.replaceAll("[RegX]", "*")
What shall I write in place of RegX ?
The following should work : \W. It matches anything other than a letter, digit or underscore.
We just need to add the _ to pick it up also.
string2 = string1.replaceAll("[\\W_]", "*")
The regexp "[^\\p{IsLatin}\\p{Digit}]" will also recognize language specific characters like œŒŸÉÑÜÚÓöÄß.
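As a quick illustration of the [\\W_] answer, here is a small sketch. The class name, method name and sample input are invented for the example.

```java
public class MaskSpecials {
    // \W is any non-word character; the underscore is listed separately
    // because \w (and therefore not \W) includes it.
    static String mask(String input) {
        return input.replaceAll("[\\W_]", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("a_b-c d!9")); // a*b*c*d*9
    }
}
```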

Regex starts with Capital letter and having length < 70

I want to match a string starting with capital letter and have length < 70.
I tried this regex ([A-Z][a-zA-Z\s\/\-]*\:?\'?) to check if the string starts with capital letter. It is working fine. But to check length, I changed to (([A-Z][a-zA-Z\s\/\-]*\:?\'?){4,70}) and it is not working.
Though, I can check the length using length() method of string in if statement. Doing so would make if statement lengthy. I want to combine length checking in regex itself. I think it can be done in regex, but I am not sure how.
Update(Forgot to mention): String can have either of two symbol- :,' and only one of two will be there for either zero or one time in the string.
E.g : Acceptable String : Looking forwards to an opportunity, WORK EXPERIENCE: , WORK EXPERIENCE- , India's Prime Minister
UnAcceptable String : Work Experience:: , Manager's Educational Qualification- , work experience: , Education - 2014 - 2017 , Education (Graduation)
Kindly help me.
Thanks in advance.
You'll certainly need anchors and lookarounds
(?=^[^-':\n]*[-':]{0,1}[^-':\n]*$)^[A-Z][-':\w ]{4,70}$
Thus, a string between 5-71 characters will be matched, see a demo on regex101.com. Additionally, it checks for the presence of zero or one of your Special characters (with the help of lookarounds, that is).
I would add ^ and $ to your regex:
^[A-Z].{0,69}$
should work. This means:
^ beginning of the string
[A-Z] any capital character (in English anyway)
.{0,69} up to 69 other characters
$ end of the string
for a total length of up to 70 characters...
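Why would the if statement be lengthy?
String str = "Scary";
// Non-empty, shorter than 70 characters, and starting with an upper-case letter.
if (!str.isEmpty() && str.length() < 70 && Character.isUpperCase(str.charAt(0))) {
}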
Specify a lookahead assertion at the start of the regex that asserts the string contains between 4 and 70 characters:
(?=.{4,70}$)
You would write it like this:
String regex = "(?=.{4,70}$)[A-Z][a-zA-Z\\s\\/\\-]*\\:?\\'?";
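A small sketch of how this lookahead-based pattern behaves with String-style full matching. The class and method names are hypothetical, and the 4 to 70 bounds are the ones used in this answer rather than the "length < 70" from the question.

```java
import java.util.regex.Pattern;

public class HeadingCheck {
    // The lookahead checks the overall length first; the rest checks the shape.
    private static final Pattern HEADING =
            Pattern.compile("(?=.{4,70}$)[A-Z][a-zA-Z\\s\\/\\-]*\\:?\\'?");

    static boolean isHeading(String s) {
        return HEADING.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isHeading("WORK EXPERIENCE:"));  // true
        System.out.println(isHeading("Work Experience::")); // false (two colons)
        System.out.println(isHeading("work experience:"));  // false (lower-case start)
        System.out.println(isHeading("Abc"));               // false (shorter than 4)

        StringBuilder tooLong = new StringBuilder("A");
        for (int i = 0; i < 70; i++) {
            tooLong.append('a');                            // 71 characters in total
        }
        System.out.println(isHeading(tooLong.toString()));  // false (longer than 70)
    }
}
```

Note that this pattern only accepts the colon or apostrophe at the very end, so a string such as India's Prime Minister from the question is still rejected by it.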
Working REGEX =
/\A^[A-Z][A-Za-z]*\z/

Java Regular Expressions "\\d[a-zA-z]?"

I have this regular expression in Java : \\d[a-zA-z] ?
It should only allow letters followed by 1 or 0 character.
However when I match it with, for example, 17c or even 21, it does not result in a match. This is the code I use:
if (!(pattern.matches("\\d[a-zA-z]?"))) {
Throw error...;
}
Thanks for any help!
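That's because "\d[a-zA-z]?" matches exactly one digit followed by at most one letter. Both 17c and 21 start with two digits, so neither matches.
Try "\d+[a-zA-Z]*", which matches one or more digits followed by zero or more letters. Note that the range is written A-Z here: the original A-z also admits the characters [ \ ] ^ _ and `.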
It should only allow letters followed by 1 or 0 character.
Try with this regex:
[a-zA-Z]+.?
Here:
[a-zA-Z]+ matches at least one letter (upper and lower case)
.? matches any character (except newline) zero or one times.
Another possibility is \d+[a-zA-Z]?, which may be the one you are looking for: one or more digits followed by at most one letter.
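To make the difference concrete, here is a short sketch that runs the sample inputs from the question against \d+[a-zA-Z]?. The class and method names are invented for the example.

```java
import java.util.regex.Pattern;

public class DigitsThenLetter {
    // One or more digits, optionally followed by a single ASCII letter.
    private static final Pattern P = Pattern.compile("\\d+[a-zA-Z]?");

    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("17c"));  // true
        System.out.println(isValid("21"));   // true
        System.out.println(isValid("17cd")); // false (two letters)
        System.out.println(isValid("c17"));  // false (letter first)
    }
}
```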

Regular Expression Wildcard Matching

I have a list of about 120 thousand english words (basically every word in the language).
I need a regular expression that would allow searching through these words using wildcards characters, a.k.a. * and ?.
A few examples:
if the user searches for m?st*, it would match for example master or mister or mistery.
if the user searches for *ind (any word ending in ind), it would match wind or bind or blind or grind.
Now, most users (especially the ones who are not familiar with regular expressions) know that ? is a replacement for exactly 1 character, while * is a replacement for 0, 1 or more characters. I absolutely want to build my search feature based on this.
My question is: How do I convert what the user types (m?st* for example) to a regular expression?
I searched the web (obviously including this website) and all I could find were tutorials that tried to teach me too much or questions that were somewhat similar, but not enough as to provide an answer to my own problem.
All I could figure out was that I have to replace ? with .. So m?st* becomes m.st*. However, I have no idea what to replace * with.
Any help would be greatly appreciated. Thank you.
PS: I'm totally new to regular expressions. I know how powerful they can be, but I also know they can be very hard to learn. So I just never took the time to do it...
Unless you want some funny behaviour, I would recommend you use \w instead of .
. matches whitespace and other non-word symbols, which you might not want it to do.
So I would replace ? with \w and replace * with \w*
Also if you want * to match at least one character, replace it with \w+ instead. This would mean that ben* would match bend and bending but not ben - it's up to you, just depends what your requirements are.
Take a look at this library: https://github.com/alenon/JWildcard
It wraps all not wildcard specific parts by regex quotes, so no special chars processing needed:
This wildcard:
"mywil?card*"
will be converted to this regex string:
"\Qmywil\E.\Qcard\E.*"
If you wish to convert wildcard to regex string use:
JWildcard.wildcardToRegex("mywil?card*");
If you wish to check the matching directly you can use this:
JWildcard.matches("mywild*", "mywildcard");
Default wildcard rules are "?" -> ".", "*" -> ".*", but you can change the default behaviour if you wish, by simply defining the new rules.
JWildcard.wildcardToRegex(wildcard, rules, strict);
You can use sources or download it directly using maven or gradle from Bintray JCenter: https://bintray.com/yevdo/jwildcard/jwildcard
Gradle way:
compile 'com.yevdo:jwildcard:1.4'
Maven way:
<dependency>
<groupId>com.yevdo</groupId>
<artifactId>jwildcard</artifactId>
<version>1.4</version>
</dependency>
Replace ? with . and * with .*.
Here is a way to transform wildcard into regex:
Prepend all special characters ([{\^-=$!|]}).+ with \ - so they are matched as characters and don't make user experience unexpected. Also you could enclose it within \Q (which starts the quote) and \E (which ends it). Also see paragraph about security.
Replace * wildcard with \S*
Replace ? wildcard with \S (exactly one non-space character; \S? would also accept zero characters)
Optionally: prepend pattern with ^ - this will enforce exact match with the beginning.
Optionally: append $ to pattern - this will enforce exact match with the end.
\S stands for a non-space character; the trailing * lets it occur zero or more times.
Consider using reluctant (non-greedy) quantifiers if you have characters to match after * or +. This can be done by adding ? after * or +, like this: \S*? and \S+?
Consider security: the user is effectively sending you code to run (a regex is a kind of code, and the user's string becomes part of the regex). Avoid passing unescaped user input as a regex to other parts of the application, and use it only to filter data retrieved by other means. Otherwise a user can affect the speed of your code by embedding regex syntax within the wildcard string, which could be used in DoS attacks.
Example to show execution speeds of similar patterns:
seq 1 50000000 > ~/1
du -sh ~/1
563M
time grep -P '.*' ~/1 &>/dev/null
6.65s
time grep -P '.*.*.*.*.*.*.*.*' ~/1 &>/dev/null
12.55s
time grep -P '.*..*..*..*..*.*' ~/1 &>/dev/null
31.14s
time grep -P '\S*.\S*.\S*.\S*.\S*\S*' ~/1 &>/dev/null
31.27s
I'd suggest against using .* simply because it can match anything, and usually things are separated with spaces.
Replace all '?' characters with '\w'
Replace all '*' characters with '\w*'
The '*' operator repeats the previous item, here '\w' (any word character), 0 or more times.
This assumes that none of the words contain '.', '*', and '?'.
This is a good reference
http://www.regular-expressions.info/reference.html
Replace * with .* (the regex equivalent of "0 or more of any character").
. is an expression that matches any one character, as you've discovered. In your hours of searching, you undoubtedly also stumbled across *, which is a repetition operator that when used after an expression matches the preceding expression zero or more times in a row.
So the equivalent to your meaning of * is putting these two together: .*. This then means "any character zero or more times".
See the Regex Tutorial on repetition operators.
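The replacements discussed in these answers (? becomes . and * becomes .*) can be sketched in Java roughly as follows. This is only one way to do it: the class and method names are made up, and Pattern.quote is used to neutralise any other regex metacharacters in the user's input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class WildcardSearch {
    // Converts a wildcard such as "m?st*" into a compiled, case-insensitive regex.
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    // Quote the literal run collected so far.
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Returns the words that match the wildcard as a whole (not just in part).
    static List<String> search(String wildcard, List<String> words) {
        Pattern p = toPattern(wildcard);
        List<String> hits = new ArrayList<>();
        for (String w : words) {
            if (p.matcher(w).matches()) {
                hits.add(w);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> words =
                Arrays.asList("master", "mister", "must", "wind", "blind", "window");
        System.out.println(search("m?st*", words)); // [master, mister, must]
        System.out.println(search("*ind", words));  // [wind, blind]
    }
}
```

Compiling the pattern once and reusing it for the whole word list keeps the per-word cost low, which matters for a list of about 120 thousand entries.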
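function matchWild(wild, name)
{
if (wild == '*') return true;
// Escape backslashes first so the escapes added below are not escaped again.
wild = wild.replace(/\\/g, '\\\\');
wild = wild.replace(/\./g, '\\.');
wild = wild.replace(/\//g, '\\/');
// ? stands for exactly one character, * for zero or more characters.
wild = wild.replace(/\?/g, '.');
wild = wild.replace(/\*/g, '.*');
// Anchor the pattern so the whole name has to match, not just a part of it.
// Other regex metacharacters such as + ( [ are still not escaped here.
var re = new RegExp('^' + wild + '$', 'i');
return re.test(name);
}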
This is what I use:
String wildcardToRegex(String wildcardString) {
// The 12 is arbitrary, you may adjust it to fit your needs depending
// on how many special characters you expect in a single pattern.
StringBuilder sb = new StringBuilder(wildcardString.length() + 12);
sb.append('^');
for (int i = 0; i < wildcardString.length(); ++i) {
char c = wildcardString.charAt(i);
if (c == '*') {
sb.append(".*");
} else if (c == '?') {
sb.append('.');
} else if ("\\.[]{}()+-^$|".indexOf(c) >= 0) {
sb.append('\\');
sb.append(c);
} else {
sb.append(c);
}
}
sb.append('$');
return sb.toString();
}
Special character list from https://stackoverflow.com/a/26228852/1808989.

How can I allow one space, A-Z, a-z, 0-9 in regex?

I am using a regex to validate a username: my code sends a client return code if the username contains characters the regex does not allow. My problem is that I do not know how to allow spaces.
This is my current code. I would like to allow a space, a-z, A-Z and 0-9.
if (username.length() < 1 || username.length() >= 13
|| !username.matches("[a-zA-Z_0-9]"))
{
session.getLoginPackets().sendClientPacket(3);
return;
}
Which escape sequences are available depends on the regex flavor you're using, but usually either \s or the POSIX class [:space:] will work. In flavors where a space in the regex isn't ignored you can just put the space in directly: [a-zA-Z_0-9 ] will also work.
The biggest thing missing is the repetition of the regex. For example:
if (username.length() < 1 || username.length() >= 13 || !username.matches("^[a-zA-Z_0-9 ]+$")) {
session.getLoginPackets().sendClientPacket(3);
return;
}
The space character can go anywhere inside the character set; placing it at the end simply keeps it easy to see (which I think is mostly what you were asking). The other symbols:
* '^' is 'the beginning of the entire string'
* '$' is 'the end of the string' (unless there are newlines...)
* '+' is 'what's in the [...] character set, at least once'
So, add the space at the end of [ ] and use a '+' at the end, and you should have it.
Worth noting you can do everything within the regex, e.g.:
if (!username.matches("^[a-zA-Z0-9_ ]{1,12}$")) {
session.getLoginPackets().sendClientPacket(3);
return;
}
The {1,12} is a bound saying "at least once, at most 12 times" (inclusive), which mirrors the original check that rejects lengths of 13 or more.
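I should also point out that Java supports some shortcuts: \w is equivalent to [a-zA-Z_0-9], so the set can be written [\w ]. Avoid [A-z] as a shortcut for "any upper- or lower-cased letter": that range also contains the characters [ \ ] ^ _ and `.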
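One caution about ranges such as A-z: in ASCII the upper-case and lower-case letters are not adjacent, so the range also covers the six characters that sit between Z and a. A minimal sketch (class and method names invented for the example) shows the effect:

```java
public class RangePitfall {
    // Tests a single character against the range A-z.
    static boolean inAToLowerZ(String s) {
        return s.matches("[A-z]");
    }

    // Tests a single character against the explicit letter ranges.
    static boolean isAsciiLetter(String s) {
        return s.matches("[A-Za-z]");
    }

    public static void main(String[] args) {
        System.out.println(inAToLowerZ("["));   // true
        System.out.println(inAToLowerZ("^"));   // true
        System.out.println(inAToLowerZ("_"));   // true
        System.out.println(isAsciiLetter("[")); // false
        System.out.println(isAsciiLetter("q")); // true
    }
}
```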
--
EDIT:
After several comments re: the 'single space', I have to admit I still am not reading the requirement that way.
If the trick is 'only allows one space', this should work:
if (username.length() < 1 || username.length() >= 13 || !username.matches("^[A-Za-z0-9_]*\\s[A-Za-z0-9_]*$")) {
session.getLoginPackets().sendClientPacket(3);
return;
}
Basically, you retain the size boundaries originally, then ensure it is made up of groups of letters, numbers, and underscore, with exactly one space.
Try this: [a-zA-Z_0-9]*( )?[a-zA-Z_0-9]*
This allows exactly one or no spaces within every combination of the characters a-z,A-Z,_,0-9.
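Try
!username.matches("[a-zA-Z_0-9 ]+"))
or
!username.matches("[a-zA-Z_0-9\\s]+"))
\s is broader because it includes all the white space characters, e.g. tabs. The + is needed so that more than one character can match, and in a Java string literal the backslash must be doubled.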
Try this
^[a-zA-Z_0-9]+(?: [a-zA-Z_0-9]+)?$
The string starts with at least one alphanumeric, optionally followed by a space and one or more alphanumerics until the end of the string.
See it here on Regexr
Since [a-zA-Z_0-9] is equivalent to \w you can simplify it to
^\w+(?: \w+)?$
If you want to be Unicode compliant, you should use the option Pattern.UNICODE_CHARACTER_CLASS. From its documentation:
Enables the Unicode version of Predefined character classes and POSIX character classes.
This means \w then matches Unicode letters and digits, not only the ASCII ones.
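A short sketch comparing the single-space pattern with and without Pattern.UNICODE_CHARACTER_CLASS. The class and method names are hypothetical, and the 1 to 12 character limit is carried over from the question's length check.

```java
import java.util.regex.Pattern;

public class SingleSpaceName {
    private static final String REGEX = "^\\w+(?: \\w+)?$";

    // Default: \w is [a-zA-Z_0-9].
    static final Pattern ASCII = Pattern.compile(REGEX);
    // With the flag: \w also covers non-ASCII letters and digits.
    static final Pattern UNICODE =
            Pattern.compile(REGEX, Pattern.UNICODE_CHARACTER_CLASS);

    static boolean isValid(Pattern p, String name) {
        return name.length() >= 1 && name.length() < 13
                && p.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(ASCII, "John Doe"));        // true
        System.out.println(isValid(ASCII, "John  Doe"));       // false (two spaces)
        System.out.println(isValid(ASCII, "J\u00f6rg M"));     // false (o-umlaut is not ASCII)
        System.out.println(isValid(UNICODE, "J\u00f6rg M"));   // true
    }
}
```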
