Java regex help - java

A string must not include spaces or special characters. Only a-z, A-Z, 0-9, the underscore, and the period characters are allowed.
How do I achieve this?
Update:
All the solutions posted worked for me.
Thanks everyone for helping out.

if (!myString.matches("^[a-zA-Z0-9._]*$")) {
// fail ...
}
or you can use the \w character class (shorthand for [a-zA-Z_0-9])
if (!myString.matches("^[\\w.]*$")) {
// fail ...
}

I am certain that by the time I finish typing this, you will have received your answer. So here is some genuine advice to go with it: take the time (an hour or so) to learn the basics of regular expressions.
You will be surprised how often they show up in solutions to 'real world' problems.
Great testing resource -> http://gskinner.com/RegExr/

A different solution:
text = text.replaceAll("[^\\w.]", "");
It removes the unwanted characters instead of just detecting them.
From Sun's website:
\w A word character: [a-zA-Z_0-9]

"[\\w.]+" should do the trick

You could simply delete all the characters that don't match the set [a-zA-Z0-9_.]. Alternatively you could replace characters not in the set with a valid character (e.g. the underscore). Finally you could altogether reject any string that does not consist solely of characters in the permitted set.
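The three options above (delete, replace, reject) can be sketched roughly as follows. This is only an illustration: the class name AllowedChars and its method names are hypothetical, not something from the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
final class AllowedChars {
    // Matches any single character that is NOT a letter, digit, underscore, or period.
    private static final Pattern DISALLOWED = Pattern.compile("[^a-zA-Z0-9_.]");

    // Option 1: delete every disallowed character.
    static String strip(String s) {
        return DISALLOWED.matcher(s).replaceAll("");
    }

    // Option 2: replace every disallowed character with an underscore.
    static String sanitize(String s) {
        return DISALLOWED.matcher(s).replaceAll("_");
    }

    // Option 3: reject the string unless every character is in the permitted set.
    static boolean isValid(String s) {
        return s.matches("[a-zA-Z0-9_.]*");
    }

    public static void main(String[] args) {
        String input = "my file (v2).txt";
        System.out.println(strip(input));    // myfilev2.txt
        System.out.println(sanitize(input)); // my_file__v2_.txt
        System.out.println(isValid(input));  // false
    }
}
```

Which option fits depends on whether silently changing the user's input is acceptable; rejecting is usually the least surprising.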

You can either make a "all characters must be one of these" regular expression or simply ask if any of the characters you dislike are present at all and if so reject the string. I believe the latter will be the easiest to write and understand later.
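A minimal sketch of the second approach, "is any disliked character present at all", using Matcher.find() with a negated character class. The class and method names are made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical class; only Pattern and Matcher.find() are real API here.
final class DisallowedCheck {
    // One disallowed character anywhere is enough to reject the string.
    private static final Pattern DISALLOWED = Pattern.compile("[^a-zA-Z0-9_.]");

    static boolean containsDisallowed(String s) {
        // find() scans for a match anywhere; matches() would require the
        // whole string to be a single disallowed character.
        return DISALLOWED.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsDisallowed("user_name.01")); // false
        System.out.println(containsDisallowed("user name"));    // true
    }
}
```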

Related

Java regex not matching German "Umlaut" OR underscore

I'm trying to "play around" with some REST APIs and Java code.
Since I mainly work with German text, I have already managed to get the Apache HTTP Client to work with UTF-8 encoding to make sure umlauts are handled the right way.
Still I can't get my regex to match my words correctly.
I am trying to find words/word combinations like "Büro_Licht" in strings like ..."type":"Büro_Licht"....
Using the regex ".*?type\":\"(\\w+).*?" returns "B" for me, as it doesn't recognize the "ü" as a word character. That makes sense, as \w is defined as [a-zA-Z_0-9]. For strings with no special characters I get the full "Office_Light".
So I tried another hint mentioned here in a nearly identical question (which I could not comment on, because I lack the reputation points).
Using the regex ".*?type\":\"(\\p{L}+).*?" returns "Büro" for me. But here again it stops at the underscore, for a reason I don't understand.
Is there a nice way to combine both expressions to get the "full" word including underscores and special characters?
If you have to keep using regex, which is not a great tool for parsing JSON, try the character class [\p{L}_]+. In your case it would be:
String regex = ".*?type\":\"[\\p{L}_]+\"";
With on-line example: https://regex101.com/r/57oFD5/2
\p{L} matches any kind of letter from any language
_ matches the character _ literally (case sensitive)
This will get hectic if you need to support other languages, whitespace and various other Unicode code points. For example, do you need to support an arbitrary number of whitespace characters around the :? Take a look at this answer on removing emojis; there are many corner cases.
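To actually pull the word out, the character class needs a capturing group. A rough sketch, assuming the value only contains letters and underscores (digits would need something like \p{N} added to the class); TypeExtractor and extractType are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; a JSON parser is the more robust choice.
final class TypeExtractor {
    // Group 1 captures one or more Unicode letters or underscores.
    private static final Pattern TYPE =
            Pattern.compile("\"type\":\"([\\p{L}_]+)\"");

    // Returns the captured value, or null when the pattern is not found.
    static String extractType(String json) {
        Matcher m = TYPE.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The escape in the literal below stands for the u-umlaut, so the
        // example does not depend on the encoding of the source file.
        System.out.println(extractType("{\"type\":\"B\u00fcro_Licht\"}"));
    }
}
```

Using find() instead of matches() also makes the leading and trailing .*? unnecessary.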

Regex to match a period but not if there is a period on either side (Java)

I'm looking for a regex that will match a period character, ONLY if none of that period's surrounding characters are also periods.
Fine by me... leave! FAIL
Okay.. You win. SUCCEED
Okay. SUCCEED //Note here, the period is the last char in the string.
I was thinking do:
[^\\.*]\\.
But that is just wrong and probably not at all in the right direction. I hope this question helps others in the same situation as well.
Thanks.
You need to wrap the dot in negative look arounds:
(?<![.])[.](?![.])
I prefer [.] over \\., because:
It's easier to read - there are too many back slashes in java literals already
[.] looks a bit like an X wing fighter from Star Wars ™
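A quick way to try the lookaround version against the three sample strings from the question (the class name LonePeriod is just for the example):

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the lookaround regex above.
final class LonePeriod {
    // A period with no period directly before it and none directly after it.
    private static final Pattern LONE = Pattern.compile("(?<![.])[.](?![.])");

    static boolean hasLonePeriod(String s) {
        return LONE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasLonePeriod("Fine by me... leave!")); // false
        System.out.println(hasLonePeriod("Okay.. You win."));      // true
        System.out.println(hasLonePeriod("Okay."));                // true
    }
}
```

Because lookarounds consume no characters, the match is the period alone, which also makes the pattern safe to use with replaceAll or split.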
You can use negative look ahead and look behind or this alternative regex:
String regex = "(^\\.[^\\.]|[^\\.]\\.[^\\.]|[^\\.]\\.$)";
The first alternative checks the beginning ^ of the string (in case it starts with a dot), the second looks for any dot inside, and the third looks for a dot at the end of the string $.
That regex will still match any period that isn't preceded by another period.
[^\.]\.[^\.] Takes care of both sides of the target period.
EDIT: Java doesn't have a raw string like Python, so you would need full escapes: [^.]\\.[^.]|^\\.[^.]|[^.]\\.$

validating input string "RX-EZ12345678912345B" using regex

I need to validate input string which should be in the below format:
<2_upper_case_letters><"-"><2_upper_case_letters><14-digit number><1_uppercase_letter>
Ex: RX-EZ12345678912345B
I tried something like this ^[IN]-?[A-Z]{0,2}?\\d{0,14}[A-Z]{0,1} but it's not giving the expected result.
Any help will be appreciated.
Thanks
Your biggest problem is the [IN] at the beginning, which matches only one letter, and only if it's I or N. If you want to match two of any letters, use [A-Z]{2}.
Once you fix that, your regex will still only match RX-E. That's because [A-Z]{0,2}? starts out trying to consume nothing, thanks to the reluctant quantifier, {0,2}?. Then \d{0,14} matches zero digits, and [A-Z]{0,1} greedily consumes the E.
If you want to match exactly 2 letters and 14 digits, use [A-Z]{2} and \d{14}. And since you're validating the string, you should end the regex with the end anchor, $. Result:
^[A-Z]{2}-[A-Z]{2}\d{14}[A-Z]$
...or, as a Java string literal:
"^[A-Z]{2}-[A-Z]{2}\\d{14}[A-Z]$"
As @nhahtdh observed, you don't really have to use the anchors if you're using Java's matches() method to apply the regex, but I recommend doing so anyway. It communicates your intent better, and it makes the regex portable, in case you have to use it in a different flavor/context.
EDIT: If the first two characters should be exactly IN, it would be
^IN-[A-Z]{2}\d{14}[A-Z]$
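The point about anchors can be seen by comparing matches() with find(). This is a small sketch; CodeValidator and its method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo: the same pattern with and without anchors.
final class CodeValidator {
    private static final Pattern ANCHORED =
            Pattern.compile("^[A-Z]{2}-[A-Z]{2}\\d{14}[A-Z]$");
    private static final Pattern UNANCHORED =
            Pattern.compile("[A-Z]{2}-[A-Z]{2}\\d{14}[A-Z]");

    // matches() must consume the entire input, so anchors are implied.
    static boolean matchesWhole(String s) {
        return UNANCHORED.matcher(s).matches();
    }

    // find() looks for the pattern anywhere inside the input.
    static boolean foundAnywhere(String s) {
        return UNANCHORED.matcher(s).find();
    }

    // With explicit anchors, find() no longer accepts surrounding junk.
    static boolean foundAnchored(String s) {
        return ANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        String padded = "xxRX-EZ12345678912345Bxx";
        System.out.println(matchesWhole(padded));  // false
        System.out.println(foundAnywhere(padded)); // true
        System.out.println(foundAnchored(padded)); // false
    }
}
```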
Simply translating your requirements into a java regex:
"^[A-Z]{2}-[A-Z]{2}\\d{14}[A-Z]$"
This will allow you to use:
if (!input.matches("^[A-Z]{2}-[A-Z]{2}\\d{14}[A-Z]$")) {
// do something because input is invalid
}
Not sure what you are trying to do at the beginning of your current regex.
"^[A-Z]{2}-[A-Z]{2}\\d{14}[A-Z]$"
The regex above will strictly match the input string as you specified. If you use matches function, ^ and $ may be omitted.
Since you want an exact number of repetitions, you should specify it as {<number>} only. {<number>,<number>} is used for a variable number of repetitions. And ? specifies that the token before it may or may not appear; if it must be there, then specifying ? is incorrect.
^[A-Z]{2}-[A-Z]{2}\\d{14}[A-Z]$
This should solve your purpose. You can confirm it from here
This should solve your problem. Check out the validity here
^[A-Z]{2}-[A-Z]{2}[0-9]{14}[A-Z]$
^([A-Z]{2,2}[-]{1,1}[A-Z]{2,2}[0-9]{14,14}[A-Z]{1,1}){1,1}$

How to use regex to remove punctuations in a sentence

I am trying to take from a file all the valid words. Valid words are defined as normal characters that can appear like so:
don't won't can't
and I have to ignore commas, periods and exclamation points.
I have gotten the expression to match just letters, but now it won't get words like don't, can't or won't.
This is the expression I am using: "[^A-Za-z]+". I have also tried "\'[^A-Za-z]+", but this breaks and allows all characters. Does anyone have any idea what I can use to get normal words, including don't, won't, can't and similar words?
Thank you very much
[^A-Za-z] Would mean anything NOT matching those character ranges! Try this:
[A-Za-z']
You may need to escape the single quote, in which case you'll probably need to escape the slash that escapes it:
[A-Za-z\\']
Another way (using abbreviations) is: \b[\w']+
This will match letters from any language and exclude numbers.
\b[\p{L}\!\'\?]+
Here is a very good resource for regular expressions.
http://www.regular-expressions.info/
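A sketch of extracting the words with Matcher.find(). Note that the pattern here is a slightly stricter variant of [A-Za-z']+, not the one from the answers above: it only accepts an apostrophe between letters, so a stray quote on its own is not counted as a word. WordExtractor is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration.
final class WordExtractor {
    // Letters, optionally followed by groups of (apostrophe + letters).
    private static final Pattern WORD =
            Pattern.compile("[A-Za-z]+(?:'[A-Za-z]+)*");

    // Collects every match in order of appearance.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // [Don't, stop, won't, you, I, can't]
        System.out.println(words("Don't stop, won't you? I can't!"));
    }
}
```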

Blank spaces in regular expression

I use this regular expression to validate many of the input fields of my Java web app:
"^[a-zA-Z0-9]+$"
But I need to modify it, because I have a couple of fields that need to allow blank spaces (for example: Address).
How can I modify it to allow blank spaces (if possible, not at the start)?
I think I need to use some escape character like \
I tried a few different combinations but none of them worked. Can somebody help me with this regex?
I'd suggest using this:
^[a-zA-Z0-9][a-zA-Z0-9 ]+$
It adds two things: first, you're guaranteed not to have a space at the beginning, while still allowing the characters you need. After that first character, letters a-z and A-Z are allowed, as well as all digits and spaces (there's a space at the end of the second character class).
If you want to use only a whitespace, you can do:
^[a-zA-Z0-9 ]+$
If you want to include tabs \t, new-line \n \r\n characters, you can do:
^[a-zA-Z0-9\s]+$
Also, as you asked, if you don't want the whitespace to be at the beginning:
^[a-zA-Z0-9][a-zA-Z0-9 ]+$
Use this: ^[a-zA-Z0-9]+[a-zA-Z0-9 ]+$. This should work. First atom ensures that there must be at least one character at beginning.
try like this ^[a-zA-Z0-9 ]+$ that is, add a space in it
This regex doesn't allow spaces at the start or end of the string. One downside is that it also accepts the underscore character.
^(\w+ )*\w+$
Try this one: I assume that any input with a length of at least one character is valid. The previously mentioned answers do not take that into account.
"^[a-zA-Z0-9][a-zA-Z0-9 ]*$"
If you want to allow all whitespace characters, replace the space by "\s"
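A short check of that last pattern against a few inputs; FieldValidator is a hypothetical name used only for this sketch.

```java
// Hypothetical validator built on the "^[a-zA-Z0-9][a-zA-Z0-9 ]*$" pattern above.
final class FieldValidator {
    // First character: letter or digit. Rest: letters, digits, or spaces.
    static boolean isValid(String s) {
        return s.matches("^[a-zA-Z0-9][a-zA-Z0-9 ]*$");
    }

    public static void main(String[] args) {
        System.out.println(isValid("A"));              // true
        System.out.println(isValid("12 Main Street")); // true
        System.out.println(isValid(" Main Street"));   // false
        System.out.println(isValid(""));               // false
    }
}
```

Trailing spaces are still accepted by this pattern; trimming the input first is one way to deal with that.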
