Email verification in Java

Is there a regex to validate an email address that rejects .com.com (i.e. example@example.com.com) but allows .com.uk?
I have pasted my code below; it should return false when the user enters .com.com. Can anyone verify it?
String ePattern = "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@" + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$";
Pattern p = Pattern.compile(ePattern);
Matcher m = p.matcher(studentemail);
boolean b = m.matches();
if (!b) {
    eMsg.addError("email must be in format: example@example.xxx / example@example.xx");
    request.setAttribute("errorMessage", eMsg);
    requestDispatcher = request.getRequestDispatcher("/addrecord.jsp");
    requestDispatcher.forward(request, response);
    return;
} else {
    System.out.println("valid email id");
}

Such a regex isn't specific to Java in most cases (only if it uses features that the Java regex engine doesn't support). That said, if you only want to match @example.xx, you should adjust the second part of your regex (especially the first group in that part). But note that domains like xxx.co.uk would then be rejected.
Also, xx.com.com might be an invalid domain, but it might also be valid (at least in other combinations; e.g. de.de is a valid domain), so you might want to either allow it, test for specific combinations only, or restrict the use of such domains.
However, depending on your use case, it would probably be better to send a confirmation email, since a regex can only tell you that an address looks correct; it may still be wrong.
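If you decide to test for a specific combination only, one option is a negative lookahead in front of the question's pattern. The following is a minimal sketch, assuming "@" as the separator and that ".com.com" (in any letter case) at the end of the address is the only combination you want to reject; the class and method names are hypothetical. It also shows that the unmodified pattern accepts example@example.com.com.

```java
import java.util.regex.Pattern;

// Sketch: reject one specific domain combination (".com.com" at the end)
// with a negative lookahead, leaving the question's pattern otherwise unchanged.
class EmailComComCheck {
    // The question's pattern, with "@" between local part and domain.
    static final Pattern ORIGINAL = Pattern.compile(
            "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
            + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$");

    // Same pattern, prefixed with a lookahead that fails when the whole input
    // ends in ".com.com" (matched case-insensitively via the inline (?i:...) group).
    static final Pattern NO_COM_COM = Pattern.compile(
            "^(?!.*(?i:\\.com\\.com)$)"
            + "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
            + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$");

    static boolean isValid(Pattern pattern, String email) {
        return pattern.matcher(email).matches();
    }

    public static void main(String[] args) {
        for (String email : new String[] {"example@example.com.com", "example@example.com.uk"}) {
            System.out.println(email + ": original=" + isValid(ORIGINAL, email)
                    + ", noComCom=" + isValid(NO_COM_COM, email));
        }
    }
}
```

Further combinations can be added as alternatives inside the lookahead, but this only blocks the cases you list; a confirmation email is still the more reliable check.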

Related

Regular expression for HGSV notation in Java

HGSV nomenclature follows this pattern:
xxxxx.yyyy:charactersnumbercharacters
I would like to write a regex in Java that fetches all the tokens from the above; e.g. it should yield 5 tokens:
{ 'xxxxx', 'yyyy', 'characters', 'number' , 'characters'}
I have used a simple split to fetch the tokens, but I don't think it's an optimal solution.
My current code is:
String hgsv = "BRAF.p:V600E";
// split() takes a regex, so the dot must be escaped; split(".") would return an empty array
String[] tokens = hgsv.split("\\.");
this.symbol = tokens[0];
String type = tokens[1].split(":")[0];
I would like to use Pattern and Matcher in Java, but I have no idea how to write a regex for the above tokens.
Any clue how to do that?
(I will be using a regex even to separate the characters, number and characters parts, so why not use a regex for the entire token?)
I found a link, but it is in Python; I need something similar in Java.
I think what you're probably looking for is to use capture groups, like this:
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "BRAF.p:V600E";
Pattern p = Pattern.compile("(\\w+)\\.(\\w+):([a-zA-Z]+)(\\d+)([a-zA-Z]+)");
Matcher m = p.matcher(s);
if (m.matches()) {
    String[] parts = {m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)};
    // Prints "[BRAF, p, V, 600, E]"
    System.out.println(Arrays.toString(parts));
} else {
    // The input String is invalid.
}
That's really a lot like a split, but it's more robust because the pattern also validates the String.
Note that I have no idea whether that is exactly the right pattern for you. I don't know the details of the HGSV notation you're talking about, and your description is pretty vague. (What are, e.g., xxxxx and yyyy? What are "characters"?) If you link me to a specification or a detailed description of this notation, I can try to write a regex that's more definitely correct.
Anyhow, my example shows the basic idea. You might also want to look at http://www.regular-expressions.info/brackets.html for more information.
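If you prefer to fetch the tokens by name rather than by index, Java (7 and later) also supports named capture groups. A minimal sketch using the same pattern as in the answer; the group names and the HgsvTokens class are illustrative assumptions, not part of any HGSV specification:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: the capture-group pattern with named groups, so each token can be
// fetched by name. The names are illustrative only.
class HgsvTokens {
    private static final Pattern HGSV = Pattern.compile(
            "(?<symbol>\\w+)\\.(?<type>\\w+):(?<ref>[a-zA-Z]+)(?<pos>\\d+)(?<alt>[a-zA-Z]+)");

    // Returns the five tokens in order, or null if the input does not match.
    static String[] tokens(String s) {
        Matcher m = HGSV.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group("symbol"), m.group("type"), m.group("ref"), m.group("pos"), m.group("alt")
        };
    }

    public static void main(String[] args) {
        String[] t = tokens("BRAF.p:V600E");
        System.out.println(t == null ? "invalid" : String.join(", ", t));
    }
}
```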

Apache commons-validator alternative for new gTLDs

I need to validate emails and domains. I only need formal validation; no WHOIS or other kind of domain lookup is needed.
Currently I'm using Apache's commons-validator v1.4.0.
Unfortunately, my customers use the new gTLDs, like .bike or .productions, which are not yet supported by the DomainValidator class.
See Apache's Jira issue for more details.
Are there any sound alternatives that I may easily include in my Maven POM?
If you are not concerned about internationalized addresses, you could replace the last part of the address and continue to use Apache Commons.
This approach is based on the fact that whatever the TLD is, the validity of the whole domain name is equivalent to the validity of the same domain name with the TLD replaced with com. For example:
abc.def.com is valid. Similarly, abc.def.name, abc.def.xn--kput3i and abc.def.uk are valid.
ab,de.com is not valid. Similarly, ab,de.name, ab,de.xn--kput3i and ab,de.uk are not valid.
So instead of calling
return EmailValidator.getInstance().isValid(userEmail);
You can call
if (userEmail == null) {
    return false;
}
return EmailValidator.getInstance().isValid(userEmail.trim().replaceFirst("\\.\\p{Alpha}[\\p{Alnum}-]*\\p{Alnum}$", ".com"));
Explanation
The regular expression "\\.\\p{Alpha}[\\p{Alnum}-]*\\p{Alnum}$" checks for the TLD part: it's at the end of the string (because of the $), it starts with a dot and contains no other dot, and it conforms to the standard rules: it begins with an ASCII letter, followed by zero or more alphanumerics or dashes, and ends with an alphanumeric character.
I am using trim() because EmailValidator allows spaces before and after the address. Removing the spaces just makes it easier to replace the TLD, and it shouldn't matter as far as the validity of the address is concerned.
If the string doesn't have a valid TLD at the end, String.replaceFirst() will return it as is. It could still be valid, because email addresses of the format x@[n.n.n.n], where n.n.n.n is a valid IP address, are valid. So basically, if you didn't find a TLD, you let EmailValidator decide the validity issue itself.
Of course, if the TLD is not an IANA-recognized TLD, this validation will not tell you that. An e-mail like david@galaxy.hoopie-frood will be accepted as legal, but IANA doesn't have that TLD as yet.
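To see exactly what string ends up being passed to EmailValidator, the TLD replacement can be tried on its own with plain java.util.regex. A minimal sketch; the TldToCom class and replaceTld helper are hypothetical names, and commons-validator itself is not involved:

```java
// Sketch: the trim() + replaceFirst() step from the answer, isolated so its
// output can be inspected without commons-validator.
class TldToCom {
    // A dot, then a label that starts with an ASCII letter, continues with
    // letters, digits or dashes, and ends with a letter or digit, at end of input.
    private static final String TLD_REGEX = "\\.\\p{Alpha}[\\p{Alnum}-]*\\p{Alnum}$";

    static String replaceTld(String address) {
        return address.trim().replaceFirst(TLD_REGEX, ".com");
    }

    public static void main(String[] args) {
        System.out.println(replaceTld("someone@example.bike"));
        System.out.println(replaceTld("x@[1.2.3.4]")); // no TLD found, left unchanged
    }
}
```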
Checking a domain is similar, without the trim() part:
if (userDomain == null) {
    return false;
}
return DomainValidator.getInstance().isValid(userDomain.replaceFirst("\\.\\p{Alpha}[\\p{Alnum}-]*\\p{Alnum}$", ".com"));
I have also tried JavaMail's email address validation, but I don't really like it: it allows completely invalid domain names such as net-name.net- (ending with a dash) or IP addresses (which are not allowed for e-mail without square brackets around them), and it's only good for e-mail addresses, not for domains.
Internationalization
If you need to check internationalized domains and e-mails, it's a bit different. It's easy to check internationalized domains (for example 元気。テスト). All you need to do is convert them to ASCII with java.net.IDN.toASCII() (yielding xn--z4qx76d.xn--zckzah for my example domain; this is a valid TLD), and then do the same as I wrote above.
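A minimal sketch of the IDN conversion step using only the JDK: java.net.IDN.toASCII() produces the ASCII form, and the same replaceFirst() pattern used earlier in this answer then swaps the TLD for .com. The class and method names are hypothetical, and the final DomainValidator call is left out so the example has no external dependency:

```java
import java.net.IDN;

// Sketch: IDN -> ASCII, then TLD -> ".com", ready to hand to DomainValidator.
class IdnDomainPrep {
    private static final String TLD_REGEX = "\\.\\p{Alpha}[\\p{Alnum}-]*\\p{Alnum}$";

    static String toAscii(String domain) {
        // IDN.toASCII also treats the ideographic full stop (U+3002) as a label separator.
        return IDN.toASCII(domain);
    }

    static String prepareForValidator(String domain) {
        return toAscii(domain).replaceFirst(TLD_REGEX, ".com");
    }

    public static void main(String[] args) {
        // The example domain 元気。テスト, written with Unicode escapes
        String domain = "\u5143\u6c17\u3002\u30c6\u30b9\u30c8";
        System.out.println(toAscii(domain));
        System.out.println(prepareForValidator(domain));
    }
}
```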
Internationalized e-mails are a different story. If the local part is ASCII, you can convert the domain part to ASCII. If you have to display the email address, you need to use the Unicode version, and if you have to send an email message, you use the ASCII version.
But recently a standard has been introduced for internationalized local parts as well, which also allows sending to the Unicode version of the domain name without translating it to ASCII first. Whether you want to support that requires some thought, as not many mail servers and mail transfer agents support it at the moment.
I copied the implementation from DomainValidator and replaced the TOP_LABEL_REGEX expression with "\\p{Alpha}[\\p{Alnum}-]*\\p{Alpha}".
In addition, I removed the validation against the hard-coded list of approved gTLDs. This is admittedly quite weak, since it doesn't validate against the actual list of TLDs, but I think it's good enough (it also accepts IDN gTLDs such as XN--YGBI2AMMX).
See the full list of approved gTLDs here.
import org.apache.commons.validator.routines.EmailValidator;
import org.apache.commons.validator.routines.RegexValidator;

// Copied from org.apache.commons.validator.routines.DomainValidator
private static final String DOMAIN_LABEL_REGEX = "\\p{Alnum}(?>[\\p{Alnum}-]*\\p{Alnum})*";
// Changed to include new gTLDs - http://data.iana.org/TLD/tlds-alpha-by-domain.txt
private static final String TOP_LABEL_REGEX = "\\p{Alpha}[\\p{Alnum}-]*\\p{Alpha}";
// Copied from org.apache.commons.validator.routines.DomainValidator
private static final String DOMAIN_NAME_REGEX = "^(?:" + DOMAIN_LABEL_REGEX + "\\.)+" + "(" + TOP_LABEL_REGEX + ")$";
private static final RegexValidator domainRegex = new RegexValidator(DOMAIN_NAME_REGEX);
// EmailValidator has no public no-arg constructor; use the shared instance
private static final EmailValidator EMAIL_VALIDATOR = EmailValidator.getInstance();

public static boolean isValidDomain(String domain) {
    // match() returns the capture groups, or null if the domain doesn't match
    String[] groups = domainRegex.match(domain);
    return groups != null && groups.length > 0;
}
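To check which names the relaxed DOMAIN_NAME_REGEX accepts without pulling in commons-validator, the same expression can be compiled with java.util.regex directly. This is a sketch; RelaxedDomainCheck is a hypothetical name, and like the answer's version it does not consult the IANA TLD list:

```java
import java.util.regex.Pattern;

// Sketch: the answer's modified domain regex, compiled with plain java.util.regex.
class RelaxedDomainCheck {
    private static final String DOMAIN_LABEL_REGEX = "\\p{Alnum}(?>[\\p{Alnum}-]*\\p{Alnum})*";
    // Top label must start and end with a letter, so IDN TLDs like xn--ygbi2ammx pass.
    private static final String TOP_LABEL_REGEX = "\\p{Alpha}[\\p{Alnum}-]*\\p{Alpha}";
    private static final Pattern DOMAIN_NAME = Pattern.compile(
            "^(?:" + DOMAIN_LABEL_REGEX + "\\.)+" + "(" + TOP_LABEL_REGEX + ")$");

    static boolean isValidDomain(String domain) {
        return domain != null && DOMAIN_NAME.matcher(domain).matches();
    }

    public static void main(String[] args) {
        for (String d : new String[] {"example.bike", "abc.xn--ygbi2ammx", "ab,de.com", "net-name.net-"}) {
            System.out.println(d + " -> " + isValidDomain(d));
        }
    }
}
```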
What I often do in this situation is check out the source code of the library in question (it's open source, remember?), modify it to suit my requirements, and then contribute the patch back to the project.
Your use case certainly sounds like it would be a useful contribution.
I made you a Public Suffix List Java API. Its method PublicSuffixList.getRegistrableDomain() can be used for domain validation:
PublicSuffixListFactory factory = new PublicSuffixListFactory();
PublicSuffixList suffixList = factory.build();
assertNull(suffixList.getRegistrableDomain("galaxy.hoopie-frood"));
assertNotNull(suffixList.getRegistrableDomain("example.bike"));
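Since DomainValidator is missing some of the new TLDs, the best solution for me was to add them myself. In newer commons-validator versions that provide updateTLDOverride(), this has to be done before the first getInstance() call:
DomainValidator.updateTLDOverride(DomainValidator.ArrayType.COUNTRY_CODE_PLUS, new String[]{"someTLD"});
And then obtain the EmailValidator instance:
EmailValidator validator = EmailValidator.getInstance(false, true);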

Java regex matcher results differ from Notepad++ regex find results

I am trying to extract data out of a website access log as part of a java program. Every entry in the log has a url. I have successfully extracted the url out of each record.
Within the URL, there is a parameter that I want to capture so that I can use it to query a database. Unfortunately, the web developers don't seem to have used a single standard for the parameter's name.
The parameter is usually called "course_id", but I have also seen "courseId", "course%3DId", "course%253Did", etc. The parameter name and value usually look like course_id=_22222_1, where the number I want is between the "_" and the "_1". (The value always has this format, even if the parameter name varies.)
So, my idea was to use the regex /^.*course_id[^_]*_(\d*)_1.*$/i to find and extract the number.
In java, my code is
java.util.regex.Pattern courseIDPattern = java.util.regex.Pattern.compile(".*course[^i]*id[^_]*_(\\d*)_1.*", java.util.regex.Pattern.CASE_INSENSITIVE);
java.util.regex.Matcher courseIDMatcher = courseIDPattern.matcher(_url);
_courseID = "";
if(courseIDMatcher.matches())
{
_courseID = retrieveCourseID(courseIDMatcher.group(1));
return;
}
This works for a lot of the records. However, for some records the course ID is not captured, even though the parameter is in the URL. One such example is the record:
/webapps/contentDetail?course_id=_223629_1&content_id=_3641164_1&rich_content_level=RICH&language=en_US&v=1&ver=4.1.2
However, when I used Notepad++ to do a regex replace on this URL (in fact, on every URL) with the regex above, the URL was successfully replaced by the course ID, which implies that the regex itself is correct.
Am I doing something wrong in the Java code, or is the Java matcher broken?
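A minimal sketch for narrowing this down, assuming the URL is held in a plain String: it extracts the ID with Matcher.find(), which does not need the leading and trailing .*, and compares that with the question's matches()-based pattern. One thing worth checking (an assumption, not confirmed by the question): without the DOTALL flag, . does not match line terminators, so a URL string that still carries a trailing \r or \n makes matches() fail while find() still succeeds. CourseIdExtractor and its methods are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CourseIdExtractor {
    // The question's pattern, used with matches(): it must cover the whole string.
    static final Pattern QUESTION_PATTERN = Pattern.compile(
            ".*course[^i]*id[^_]*_(\\d*)_1.*", Pattern.CASE_INSENSITIVE);

    // Same idea without the surrounding ".*", used with find(): it may match anywhere.
    static final Pattern COURSE_ID = Pattern.compile(
            "course[^i]*id[^_]*_(\\d+)_1", Pattern.CASE_INSENSITIVE);

    // Returns the numeric course ID, or null if no course parameter is found.
    static String extract(String url) {
        Matcher m = COURSE_ID.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    // True if the question's matches()-based approach accepts the URL.
    static boolean questionPatternMatches(String url) {
        return QUESTION_PATTERN.matcher(url).matches();
    }

    public static void main(String[] args) {
        String url = "/webapps/contentDetail?course_id=_223629_1&content_id=_3641164_1"
                + "&rich_content_level=RICH&language=en_US&v=1&ver=4.1.2";
        System.out.println(extract(url));
        System.out.println(questionPatternMatches(url));
        System.out.println(questionPatternMatches(url + "\n"));
    }
}
```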

Website/URL Validation Regex in Java

I need a regex to match URLs starting with "http://", "https://", "www." or "google.com".
The code I tried is:
//Pattern to check if this is a valid URL address
Pattern p = Pattern.compile("(http://|https://)(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?");
Matcher m;
m=p.matcher(urlAddress);
but this code can only match URLs such as "http://www.google.com".
I know this may be a duplicate question, but I have tried all of the regexes provided and none of them suits my requirement. Will someone please help me? Thank you.
You need to make the (http://|https://) part of your regex optional:
^(http:\/\/|https:\/\/)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$
You can use the Apache Commons Validator library (org.apache.commons.validator.UrlValidator) to validate a URL:
String[] schemes = {"http", "https"};
UrlValidator urlValidator = new UrlValidator(schemes);
And then use:
urlValidator.isValid(yourUrl); // yourUrl is the String to check
Then there is no need for a regex.
Link:
https://commons.apache.org/proper/commons-validator/apidocs/org/apache/commons/validator/routines/UrlValidator.html
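If you would rather avoid both a regex and an extra dependency, the JDK's java.net.URI can do a rough structural check. This is a swapped-in technique, not Apache's UrlValidator, and a sketch under two assumptions: inputs without a scheme (such as www.google.com or google.com) are treated as http URLs, as the question requires, and only http and https are accepted. It does not validate the TLD. UriUrlCheck is a hypothetical name.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriUrlCheck {
    static boolean isHttpUrl(String input) {
        if (input == null || input.isEmpty()) {
            return false;
        }
        // Prepend a scheme when none is given so URI can find the host part.
        String candidate = input.contains("://") ? input : "http://" + input;
        try {
            URI uri = new URI(candidate);
            String scheme = uri.getScheme();
            boolean httpScheme = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            // getHost() is null when there is no valid server-based host.
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Illegal characters (e.g. spaces) or a malformed structure
            return false;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"http://www.google.com", "www.google.com", "google.com", "ftp://google.com"}) {
            System.out.println(s + " -> " + isHttpUrl(s));
        }
    }
}
```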
If you use Java, I recommend this regex (I wrote it myself):
^(https?:\/\/)?(www\.)?([\w]+\.)+[\w]{2,63}\/?$
"^(https?:\\/\\/)?(www\\.)?([\\w]+\\.)+[\\w]{2,63}\\/?$" // as Java-String
To explain:
^ = line start
(https?://)? = "http://" or "https://" may occur.
(www\.)? = "www." may occur.
([\w]+\.)+ = a word of word characters ([a-zA-Z0-9_]) followed by a dot has to occur one or more times. (Extend this part if you need special characters like ü, ä, ö or others in your URL, and remember to use IDN.toASCII(url) if you do. If you want to know which characters are legal in general, see https://kb.ucla.edu/articles/what-characters-can-go-into-a-valid-http-url.)
[\w]{2,63} = a word ([a-zA-Z0-9_]) of 2 to 63 characters has to occur exactly once. (A TLD, i.e. a top-level domain such as .com, cannot be shorter than 2 or longer than 63 characters.)
/? = a "/"-character may occur. (some people or servers put a / at the end... whatever)
$ = line end
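A small sketch of the regex above as a Java string (with the invisible characters removed and the www dot escaped), wrapped in a hypothetical helper so you can try it on a few inputs. Note that it accepts only a host with an optional trailing slash, so a URL with a path such as http://google.com/search is rejected:

```java
import java.util.regex.Pattern;

class SimpleUrlRegex {
    // Optional scheme, optional "www.", one or more "label." parts, a 2-63 char TLD,
    // and an optional trailing slash.
    private static final Pattern URL = Pattern.compile(
            "^(https?:\\/\\/)?(www\\.)?([\\w]+\\.)+[\\w]{2,63}\\/?$");

    static boolean looksLikeUrl(String s) {
        return URL.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"http://www.google.com", "google.com", "test..com"}) {
            System.out.println(s + " -> " + looksLikeUrl(s));
        }
    }
}
```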
If you extend it with special characters, it could look like this:
^(https?:\/\/)?(www\.)?([\w\Q$-_+!*'(),%\E]+\.)+[\w]{2,63}\/?$
"^(https?:\\/\\/)?(www\\.)?([\\w\\Q$-_+!*'(),%\\E]+\\.)+[\\w]{2,63}\\/?$" // as Java-String
Avinash Raj's answer is not fully correct:
^(http:\/\/|https:\/\/)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$
The dots are not escaped, which means they match any character. Also, my version is simpler, and I have never heard of a domain like "test..com" (which actually matches).
Demo: https://regex101.com/r/vM7wT6/279
Edit:
Since I saw that some people need a regex that also matches server directories, I wrote this:
^(https?:\/\/)?([\w\Q$-_+!*'(),%\E]+\.)+(\w{2,63})(:\d{1,4})?([\w\Q/$-_+!*'(),%\E]+\.?[\w])*\/?$
While this may not be the best one, since I didn't spend much time on it, maybe it helps someone. You can see how it works here: https://regex101.com/r/vM7wT6/700
It also matches URLs like "hello.to/test/whatever.cgi".
A Java-compatible version of @Avinash's answer would be:
//Pattern to check if this is a valid URL address
Pattern p = Pattern.compile("^(http://|https://)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$");
Matcher m;
m=p.matcher(urlAddress);
boolean matches = m.matches();
String pattern = "w{3}\\.[a-z]+\\.?[a-z]{2,3}(|\\.[a-z]{2,3})";
This will only accept addresses like www.google.com and www.google.co.in.
// I use this
static boolean esURL(String cadena) {
    boolean bandera = false;
    // Accepts strings starting with http://, https://, ftp://, file:// or www.
    bandera = cadena.matches("\\b(https?://|ftp://|file://|www\\.)[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");
    return bandera;
}

RegEx for 4 values within a link (This or this or this etc)

Having a bit of trouble with this.
Say I have a link that could contain these values:
Balearic|Ibiza|Majorca|Menorca
Example1: http://site.com/Menorca
Example2: http://site.com/Ibiza
I just need a regex to tell whether the link contains any of those 4 values (case-insensitively).
Can someone point me in the right direction? It is not in a particular language, but the software I work with is Java-based.
Thanks a lot - and I'll keep trying in the meantime! :)
You can just use:
// assuming url is your URL variable
if (url.matches("(?i)^http://site\\.com/(Balearic|Ibiza|Majorca|Menorca)\\b.*$")) {
// match succeeded
}
(?i) will make sure case is ignored while doing this comparison.
Your list of values is a valid regex; just add the "i" option to make it case-insensitive, per http://rubular.com/r/euscuu7Fwj
Here is a regex that matches what you ask for:
^http://site\.com/(Balearic|Ibiza|Majorca|Menorca)$
To use it in Java, you may want to do the following:
String url = "http://site.com/Ibiza";
Pattern p = Pattern.compile("^http://site\\.com/(Balearic|Ibiza|Majorca|Menorca)$", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(url); // get a matcher object
boolean matches = m.matches(); // true for this URL
Running a regex against a whole URL is usually not good practice.
Here's a mixed approach. It looks for your values only in the path of your URLs, which both simplifies the regular expression and validates the URL. The match is case-insensitive.
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Pattern;

try {
    URI menorca = new URI("http://site.com/Menorca");
    System.out.println(menorca.getPath().substring(1));
    URI ibiza = new URI("http://site.com/Ibiza");
    System.out.println(ibiza.getPath().substring(1));
    Pattern pattern = Pattern.compile("Balearic|Ibiza|Majorca|Menorca", Pattern.CASE_INSENSITIVE);
    System.out.println(pattern.matcher(menorca.getPath()).find());
    System.out.println(pattern.matcher(ibiza.getPath()).find());
} catch (URISyntaxException use) {
    use.printStackTrace();
}
Output:
Menorca
Ibiza
true
true
