Java Regular expression validation - java

I want to validate a string in Java so that it allows only alphanumeric characters,
underscores, and a single dot character.
String fileName = (String) request.getParameter("read");
I need to validate the fileName retrieved from the request so that it
satisfies the above criteria.
I tried "^[a-zA-Z0-9_'.']*$", but this allows more than one dot character.
I need to validate my string against the following rules:
1. The filename contains only alphanumeric characters.
2. It allows only one dot character (.), for example: fileRead.pdf,
fileWrite.txt, etc.
3. It allows underscore characters. All other symbols should be
rejected.
Can anyone help me with this?

You can use the String.matches() method:
System.out.println("My_File_Name.txt".matches("\\w+\\.\\w+"));
You can also use the java.util.regex package:
java.util.regex.Pattern pattern =
java.util.regex.Pattern.compile("\\w+\\.\\w+");
java.util.regex.Matcher matcher = pattern.matcher("My_File_Name.txt");
System.out.println(matcher.matches());
For more information about regular expressions in Java, look at this page:
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

You could use two negative lookaheads here:
^((?!.*\..*\.)(?!.*_.*_)[A-Za-z0-9_.])*$
Each lookahead asserts that either a dot or an underscore does not occur two times, implying that it can occur at most once.
It wasn't completely clear whether you require one dot and/or underscore. I assumed not, but my regex could be easily modified to this requirement.
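If underscores may repeat and only the dot is limited (one possible reading of the question), the underscore lookahead can be dropped and the remaining lookahead checked once at the start of the string. A minimal sketch under that assumption; the class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

class FileNameCheck {
    // (?!.*\..*\.) rejects any input that contains two dots;
    // the character class then allows only letters, digits, '_' and '.'.
    private static final Pattern AT_MOST_ONE_DOT =
            Pattern.compile("^(?!.*\\..*\\.)[A-Za-z0-9_.]*$");

    static boolean isValid(String fileName) {
        return fileName != null && AT_MOST_ONE_DOT.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("fileRead.pdf"));      // one dot
        System.out.println(isValid("my_file_name.txt"));  // several underscores, one dot
        System.out.println(isValid("file.name.txt"));     // two dots
        System.out.println(isValid("file$name.txt"));     // forbidden symbol
    }
}
```

Like the pattern above, this sketch also accepts an empty string and names without any dot; replace * with a stricter structure if a dot is mandatory.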

You can first count the special characters whose number of occurrences is limited.
Here is the code, using StringUtils from Spring:
int occurrence = StringUtils.countOccurrencesOf("123123..32131.3", ".");
or, using StringUtils from Apache Commons Lang:
int count = StringUtils.countMatches("123123..32131.3", ".");
If the count does not meet your requirement, you can reject the string before the regex check.
Otherwise, you can go on to check that the string is alphanumeric.
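The same two-step idea can be sketched without any library by counting the dots with plain Java instead of StringUtils. The class and method names are hypothetical, and the allowed character set is taken from the question:

```java
class DotCountCheck {
    // Counts how often the character c occurs in s.
    static long count(String s, char c) {
        return s.chars().filter(ch -> ch == c).count();
    }

    static boolean isValid(String fileName) {
        // Step 1: reject early if there is more than one dot.
        if (count(fileName, '.') > 1) {
            return false;
        }
        // Step 2: every character must be a letter, digit, underscore or dot.
        return fileName.matches("[A-Za-z0-9_.]+");
    }

    public static void main(String[] args) {
        System.out.println(count("123123..32131.3", '.'));
        System.out.println(isValid("123123..32131.3"));
        System.out.println(isValid("fileWrite.txt"));
    }
}
```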

Related

Regex pattern matching with multiple strings

Forgive me, I am not very familiar with regex patterns.
I have created a regex pattern as shown below.
String regex = Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
+ Pattern.quote(value);
This regex pattern gives different results for 2 different strings.
value = "207e/160";
Use Case 1 -
When channelStr = "207e/160, 149/80"
Then channelStr.matches(regex), returns "true".
Use Case 2 -
When channelStr = "207e/160, 149/80, 11"
Then channelStr.matches(regex), returns "false".
I am not able to figure out why. As far as I understand, it may be because of the multiple spaces involved when more than 2 strings separated by commas are present.
I am not sure what the correct pattern is for more than 2 strings.
Any help will be appreciated.
If you print your pattern, it is:
\Q207e/160\E, [NnoneOoff0-9\-\+\/]+|[NnoneOoff0-9\-\+\/]+, \Q207e/160\E
It consists of an alternation (|), and both the left and the right side match exactly one mandatory comma.
matches() has to match the whole string, and that is the case for 207e/160, 149/80, so that is a match.
The string 207e/160, 149/80, 11 contains 2 commas, so you do get a partial match for the first part of the string, but the whole string does not match, and matches() returns false.
See the matches in this regex demo.
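The difference between a whole-string match and a partial match can be reproduced with the pattern from the question. A small sketch; the class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // Rebuilds the pattern from the question for a given value.
    static Pattern build(String value) {
        String quoted = Pattern.quote(value);
        return Pattern.compile(
                quoted + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, " + quoted);
    }

    // matches() succeeds only if the pattern covers the entire input.
    static boolean wholeMatch(String value, String input) {
        return build(value).matcher(input).matches();
    }

    // find() succeeds if the pattern matches any part of the input.
    static boolean partialMatch(String value, String input) {
        return build(value).matcher(input).find();
    }

    public static void main(String[] args) {
        String value = "207e/160";
        System.out.println(wholeMatch(value, "207e/160, 149/80"));
        System.out.println(wholeMatch(value, "207e/160, 149/80, 11"));
        System.out.println(partialMatch(value, "207e/160, 149/80, 11"));
    }
}
```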
To match all the values, you can use a repeating pattern:
^[NnoeOf0-9+/-]+(?:,\h*[NnoeOf0-9+/-]+)*$
^ Start of string
[NnoeOf0-9+/-]+ Match 1+ of any of the characters listed in the character class
(?: Non capture group
,\h* Match a comma and optional horizontal whitespace chars
[NnoeOf0-9+/-]+ Match 1+ of any of the characters listed in the character class
)* Close the non capture group and optionally repeat it (if there should be at least 1 comma, then the quantifier can be + instead of *)
$ End of string
Example using matches():
String channelStr1 = "207e/160, 149/80";
String channelStr2 = "207e/160, 149/80, 11";
String regex = "^[NnoeOf0-9+/-]+(?:,\\h*[NnoeOf0-9+/-]+)*$";
System.out.println(channelStr1.matches(regex));
System.out.println(channelStr2.matches(regex));
Output
true
true
Note that in the character class you can put - at the end so that it does not have to be escaped; + and / do not have to be escaped either.
You can use regex101 to test your regex. It describes everything that is going on, which helps with debugging, and it has a quick reference section at the bottom right with examples of what you can do.
A few things: you can escape characters with \ to make them literal, so \" is a literal double quote.
If you want the pattern to be one or more of something, use +. These are called quantifiers and can be applied to groups, tokens, etc. The token for a whitespace character is \s, so one or more whitespace characters would be \s+.
It's difficult to tell exactly what you're trying to do, but hopefully pointing you to regex101 will help. If you provide the regex you currently have, what you want to match, and the strings you're using to test it, I'll be happy to provide an example.
^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$
^ Start
(?: ... ) Non-capturing group that defines an item and its separator. After each item, except the last, the separator (,) must appear. Spaces (one, several, or none) can appear before and after the comma, which is specified with *. This group can appear one or more times to the end of the string, as specified by the + quantifier after the group's closing parenthesis.
Regex101 Test
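The pattern above is already written with Java string escaping, so it can be passed to Pattern.compile as is. A short sketch that tries it on a few inputs; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

class ChannelList {
    // Each item must be followed by a comma (but not at the very end)
    // or by the end of the string.
    private static final Pattern ITEMS =
            Pattern.compile("^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$");

    static boolean isValid(String channelStr) {
        return ITEMS.matcher(channelStr).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("207e/160, 149/80"));
        System.out.println(isValid("207e/160, 149/80, 11"));
        System.out.println(isValid("207e/160,"));        // trailing comma
        System.out.println(isValid("207e/160 149/80"));  // missing comma
    }
}
```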

regex to filter out string

I'm filtering strings using the regex below:
^(?!.*(P1 | P2)).*groupName.*$
Here groupName is a specific string that I replace at run time. This regex already works fine.
I have two input strings that are run through this regex. I can't change the ^(?!.*(P1 | P2)) part, so I would like to change only what comes after it. It is a very generic regex that is used in many places, so the only part I can change is groupName. Is there any way to let only the 2nd string pass through this regex?
1) ADMIN-P3-UI-READ-ONLY
2) ADMIN-P3-READ-ONLY
In the regex, groupName is just a variable that is replaced at run time with the required string. In this case I want the 2nd string to pass, so the groupName part can be replaced with READ-ONLY, but then the 1st string passes too.
Can anyone suggest how to make this work?
You could use a negative lookbehind:
(?<!UI-)READ-ONLY
so there must be no UI- immediately before READ-ONLY.
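Because only the groupName part is replaceable, the lookbehind can be passed in as that part. A minimal sketch, assuming the surrounding pattern is assembled by string concatenation; the class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

class GroupFilter {
    // Inserts the replaceable part into the fixed pattern from the question.
    static boolean passes(String groupNamePart, String input) {
        String regex = "^(?!.*(P1 | P2)).*" + groupNamePart + ".*$";
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String withLookbehind = "(?<!UI-)READ-ONLY";
        System.out.println(passes(withLookbehind, "ADMIN-P3-READ-ONLY"));
        System.out.println(passes(withLookbehind, "ADMIN-P3-UI-READ-ONLY"));
        // Without the lookbehind both strings pass.
        System.out.println(passes("READ-ONLY", "ADMIN-P3-UI-READ-ONLY"));
    }
}
```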
You can add another lookahead at the very start of your pattern to further restrict what it matches because your pattern is of the "match-everything-but" type.
So, it may look like
String extraCondition = "^(?!.*UI)";
String regex = "^(?!.*(P1|P2)).*READ-ONLY.*$";
String finalRegex = extraCondition + regex;
The pattern will look like
^(?!.*UI)^(?!.*(P1|P2)).*READ-ONLY.*$
matching
^(?!.*UI) - no UI after any zero or more chars other than line break chars as many as possible from the start of string
^(?!.*(P1|P2)) - no P1 nor P2 after any zero or more chars other than line break chars as many as possible from the start of string
.*READ-ONLY - any zero or more chars other than line break chars as many as possible and then READ-ONLY
.*$ - the rest of the string. Note you may safely remove $ here unless you want to make sure there are no extra lines in the input string.

parsing numerical address

I have been trying to parse a numerical address from a string using regex.
So far, I have been able to successfully get the numerical address (partially) 63.88.73.26:80 from the string http://63.88.73.26:80/. However I have been trying to skip over the :80/, and have had no luck.
What I have tried so far is:
Pattern.compile("[0-999].*[0-999][\\p{Digit}]", Pattern.DOTALL);
However, the match still includes :80.
I don't know what I am missing here. I have tried checking for \p{Digit} at the end, but that doesn't do much either.
Thanks for your time!
You are looking for a positive lookahead (?=...). This will match only if it is followed by a specific expression, the one in the lookahead's parentheses. In its simplest form you could have
[0-9\.]+(?=:[0-9]{0,4})
Though you may want to change the [0-9\.]+ part (match 1 or more digit or full stop) with something more complete to check that you have a properly formed address
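In Java, the lookahead version could be used as follows. This is only a sketch with the loose digits-and-dots class from above; the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostLookahead {
    // Digits and dots, but only when a colon follows; the colon and the
    // port are checked by the lookahead and are not part of the match.
    private static final Pattern HOST = Pattern.compile("[0-9.]+(?=:[0-9]{0,4})");

    // Returns the first match, or null if there is none.
    static String extract(String url) {
        Matcher m = HOST.matcher(url);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("http://63.88.73.26:80/"));
    }
}
```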
Check out regexr.com where you can fiddle your expression to your heart's content until it works...
Note that Pshemo indicated the right approach with URL and getHost():
Gets the host name of this URL, if applicable. The format of the host conforms to RFC 2732, i.e. for a literal IPv6 address, this method will return the IPv6 address enclosed in square brackets ('[' and ']').
Thus, it is best to use the proper tool here:
import java.net.*;
....
String str = new URL("http://63.88.73.26:80/").getHost();
System.out.println(str); // => 63.88.73.26
See the Java demo
You mention that you want to learn regex, so let's inspect your pattern:
[0-999] - matches any 1 digit, a single digit (0-9 creates a range that matches 0..9, and the two 9s are redundant and can be removed)
.* - any 0+ chars, greedily, i.e. up to the last...
[0-999] - see above (any 1 digit)
[\\p{Digit}] - any ASCII digit (in Java, \p{Digit} is the same as [0-9] unless the UNICODE_CHARACTER_CLASS flag is set)
That means, you match a string starting with a digit and up to the last occurrence of 2 consecutive digits.
You need a sequence of digits and dots. There are multiple ways to extract such strings.
Using verbose pattern with exact character specification together with how many occurrences you need: [0-9]{1,3}(?:\.[0-9]{1,3}){3} (the whole match - matcher.group() - holds the required value).
Using the "brute-force" character class approach (see Jonathan's answer), but I'd use a capturing group instead of a lookahead and use an unescaped dot since inside a character class it is treated as a literal dot: ([0-9.]+):[0-9] (now, the value is in matcher.group(1))
A "fancy" "get-string-between-two-strings" approach: all text other than : and / between http:// and : must be captured into a group - https?://([^:/]+): (again, the value is in matcher.group(1))
Some sample code (Approach #1):
Pattern ptrn = Pattern.compile("[0-9]{1,3}(?:\\.[0-9]{1,3}){3}");
Matcher matcher = ptrn.matcher("http://63.88.73.26:80/");
if (matcher.find()) {
System.out.println(matcher.group());
}
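Approaches #2 and #3 from the list above can be sketched the same way, reading the value from group 1 instead of the whole match. The class and method names are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostCapture {
    // Approach #2: digits and dots, followed by a colon and a digit.
    private static final Pattern CHAR_CLASS = Pattern.compile("([0-9.]+):[0-9]");
    // Approach #3: everything between the scheme and the next colon.
    private static final Pattern BETWEEN = Pattern.compile("https?://([^:/]+):");

    private static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    static String byCharacterClass(String url) {
        return firstGroup(CHAR_CLASS, url);
    }

    static String betweenDelimiters(String url) {
        return firstGroup(BETWEEN, url);
    }

    public static void main(String[] args) {
        String url = "http://63.88.73.26:80/";
        System.out.println(byCharacterClass(url));
        System.out.println(betweenDelimiters(url));
    }
}
```

Note that approach #3 also captures host names such as example.com, since it does not restrict the characters to digits and dots.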
Must read: Character Classes or Character Sets.

How to match a substring following after a string satisfying the specific pattern

Imagine, that I have the string 12.34some_text.
How can I match the substring that follows the second character (4 in my case) after the . character? In this particular case, the string I want to match is some_text.
For the string 56.78another_text it will be another_text and so on.
All accepted strings have the pattern \d\d\.\d\d\w*
If you wish to match everything from the second character after a specific one (i.e. the dot) you can use a lookbehind, like this:
(?<=[.]\d{2})(\w*)
(?<=[.]\d{2}) is a positive lookbehind that matches a dot [.] followed by two digits \d{2}.
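A short Java sketch of the lookbehind, using find() because the pattern covers only the tail of the input. The class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AfterDot {
    // The match may start only right after a dot and two digits.
    private static final Pattern TAIL = Pattern.compile("(?<=[.]\\d{2})(\\w*)");

    // Returns the text after the two digits, or null if there is no match.
    static String tail(String s) {
        Matcher m = TAIL.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(tail("12.34some_text"));
        System.out.println(tail("56.78another_text"));
    }
}
```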
Since you are using Java and every accepted string has the pattern \d\d\.\d\d\w*, the text always starts at index 5, so you can get some_text from 12.34some_text by using
String s = "12.34some_text";
String rest = s.substring(5);
and then compare the substring.

Extract specific data from string with regex

I want to capture multiple strings that match specific patterns.
For example, my string looks like this:
String textData = "#1_Label for UK#2_Label for US#4_Label for FR#";
I want to get the strings between two # characters that contain a given search string, such as UK.
The output should look like this:
If the search string is UK, then
the output should be 1_Label for UK
If the search string is label, then
the output should be 1_Label for UK, 2_Label for US and 4_Label for FR
If the search string is 1_, then
the output should be 1_Label for UK
I don't want to extract the data via an array list, and the extraction should be case insensitive.
Can you please help me with this problem?
Regards,
Ashish Mishra
You can use this regex for search:
#([^#]*?Label[^#]*)(?=#)
Replace Label with your search keyword.
Java Pattern:
Pattern p = Pattern.compile( "#([^#]*?" + Pattern.quote(keyword) + "[^#]*)(?=#)" );
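A runnable sketch around this pattern, with the CASE_INSENSITIVE flag added because the question asks for case-insensitive matching. The class name, the method name, and the comma-joined result are assumptions made for illustration:

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LabelSearch {
    // Returns all segments between two '#' that contain the keyword,
    // joined with ", "; returns an empty string if nothing matches.
    static String search(String text, String keyword) {
        Pattern p = Pattern.compile(
                "#([^#]*?" + Pattern.quote(keyword) + "[^#]*)(?=#)",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        StringJoiner result = new StringJoiner(", ");
        while (m.find()) {
            result.add(m.group(1)); // group 1 excludes the leading '#'
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String textData = "#1_Label for UK#2_Label for US#4_Label for FR#";
        System.out.println(search(textData, "UK"));
        System.out.println(search(textData, "label"));
        System.out.println(search(textData, "1_"));
    }
}
```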
If the data always is between two hashes, try a regex like this: (?i)#.*your_match.*# where your_match would be UK, label, 1_ etc.
Then use this expression in conjunction with the Pattern and Matcher classes.
If you want to match multiple strings, you'd need to exclude the hashes from the match by using look-around methods as well as reluctant modifiers, e.g. (?i)(?<=#).*?label.*?(?=#).
Short breakdown:
(?i) will make the expression case insensitive
(?<=#) is a positive look-behind, i.e. the match must be preceded by a hash (but doesn't include the hash)
.*? matches any sequence of characters but is reluctant, i.e. it tries to match as few characters as possible
(?=#) is a positive look-ahead, which means the match must be followed by a hash (also not included in the match)
Without the look-around methods the hashes would be included in the match and thus using Matcher.find() you'd skip every other label in your test string, i.e. you'd get the matches #1_Label for UK# and #4_Label for FR# but not #2_Label for US#.
Without the reluctant modifiers the expression would match everything between the first and the last hash.
As an alternative and better, replace .*? with [^#]*, which would mean that the match cannot contain any hash, thus removing the need for reluctant modifiers as well as removing the problem that looking for US would match 1_Label for UK#2_Label for US.
So most probably the final regex you're after looks like this: (?i)(?<=#)[^#]*your_match[^#]*(?=#).
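A sketch of this final look-around pattern in Java. Since the hashes are not consumed, the whole match (group 0) is already the wanted text. The class and method names are hypothetical, and Pattern.quote is added here so that keywords containing regex metacharacters are treated literally:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundSearch {
    // Collects every hash-delimited segment that contains the keyword.
    static List<String> findAll(String text, String keyword) {
        Pattern p = Pattern.compile(
                "(?i)(?<=#)[^#]*" + Pattern.quote(keyword) + "[^#]*(?=#)");
        Matcher m = p.matcher(text);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String textData = "#1_Label for UK#2_Label for US#4_Label for FR#";
        System.out.println(findAll(textData, "us"));
        System.out.println(findAll(textData, "label"));
    }
}
```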
([^#]*UK[^#]*) for UK
([^#]*Label[^#]*) for Label
([^#]*1_[^#]*) for 1_
Try this. Grab the captures. See the demos:
http://regex101.com/r/kQ0zR5/3
http://regex101.com/r/kQ0zR5/4
http://regex101.com/r/kQ0zR5/5
I have solved this problem with below pattern,
(?i)([^#]*?us[^#]*)(?=#)
Thank you so much Anubhava, VKS and Thomas for your replies.
Regards,
Ashish Mishra
