Java regex string output not as expected

I'm trying to write some code to validate email addresses based on specific guidelines given to me, and one of the guidelines is that an address such as cath#[10.1.1] should be valid. I've gotten stuck and can't figure out what's wrong with my regex string:
"[A-Za-z0-9._%+-]+[#|_at_]+[\\[|[A-Za-z0-9-]]+[0-9\\.|_dot_]+[\\]|com|com.au|co.ca|co.nz|co.us|co.uk]{2,4}"
this is some example output:
Enter an email address
cath#[10.1.1]
cath#[10.1.1] is not a valid email address
cath#[10.1.1.a]
cath#[10.1.1.a] is a valid email address
cath#[10.1.1.]
cath#[10.1.1.] is a valid email address
The last two input/outputs should be invalid, whilst the first should be valid. Could anyone possibly point me in the right direction? Thanks
EDIT - here is my code if it helps anyone
import java.util.*;
import java.lang.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class EmailAddresses {
public static void main(String[] args) {
String line;
System.out.println("Enter an email address");
Scanner scan = new Scanner(System.in);
while (scan.hasNextLine()) {
line = scan.nextLine();
Pattern pattern = Pattern.compile("[A-Za-z0-9._%+-]+(?:#|_at_)(?:\\[|[A-Za-z0-9-])(?:0-9\\.|_dot_)(?:\\]|com|com\\.au|co\\.ca|co\\.nz|co\\.us|co\\.uk){2,4}");
Matcher mat = pattern.matcher(line);
if(mat.matches()){
line = line.toLowerCase();
System.out.println(line + " is a valid email address");
}else{
System.out.println(line + " is not a valid email address");
}
}
}
}

I think there is a misconception about what the regex flavor understands with the initial regex. Brackets [] create a character class: a set of alternative single characters.
Here brackets are used to declare a set of alternative words, and that is not what they do. To declare alternative words, use a non-capturing group (?:...) and, inside this group, separate the alternatives with the alternation operator |.
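To make the difference concrete, here is a minimal sketch comparing the two forms. The class and method names are made up for illustration; only String.matches from the standard library is used.

```java
class CharClassVsGroup {
    // [com|au] is a character class: exactly ONE of the characters c, o, m, |, a, u.
    static boolean asCharacterClass(String s) {
        return s.matches("[com|au]");
    }

    // (?:com|au) is a group with alternation: the whole word "com" or the whole word "au".
    static boolean asGroup(String s) {
        return s.matches("(?:com|au)");
    }

    public static void main(String[] args) {
        System.out.println(asCharacterClass("c"));   // true: one character from the class
        System.out.println(asCharacterClass("|"));   // true: | is just a literal inside [...]
        System.out.println(asCharacterClass("com")); // false: three characters, the class matches one
        System.out.println(asGroup("com"));          // true
        System.out.println(asGroup("c"));            // false
    }
}
```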
For example:
"[\\[|[A-Za-z0-9-]]+" becomes "(?:\\[|[A-Za-z0-9-])+"
Try this regex instead:
^[A-Za-z0-9._%+-]+(?:#|_at_)(?:\[(?:\d|\.|_dot_)+(?<!\.)\]|[A-Za-z\d._-]+\.(?:com|com\.au|co\.ca|co\.nz|co\.us|co\.uk))$
Demo: http://regex101.com/r/dS8qF4
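As a rough check, the suggested regex can be dropped into Java like this. Every backslash has to be doubled inside a Java string literal. The class and method names are hypothetical, and the three inputs are the ones from the question.

```java
import java.util.regex.Pattern;

class EmailRegexDemo {
    // The regex from the answer above, split over several lines for readability.
    private static final Pattern EMAIL = Pattern.compile(
        "^[A-Za-z0-9._%+-]+(?:#|_at_)"
        + "(?:\\[(?:\\d|\\.|_dot_)+(?<!\\.)\\]"
        + "|[A-Za-z\\d._-]+\\.(?:com|com\\.au|co\\.ca|co\\.nz|co\\.us|co\\.uk))$");

    static boolean isValid(String address) {
        return EMAIL.matcher(address).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("cath#[10.1.1]"));   // true
        System.out.println(isValid("cath#[10.1.1.a]")); // false: letters are not allowed inside the brackets
        System.out.println(isValid("cath#[10.1.1.]"));  // false: the lookbehind rejects a dot right before ]
    }
}
```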

Since you are not restricted to using a single regex, I suggest you split the check.
For instance, here is a method which will try and find the separator in your input:
private static int trySeparator(final String input, final String separator)
{
int ret = input.indexOf(separator);
if (ret == -1)
return ret;
return ret == input.lastIndexOf(separator) ? ret : -1;
}
Use that within your main validation method for # and _at_, then separate the first and second parts and check them separately. Much easier than a single regex, more testable ;)
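A minimal sketch of what that split check could look like. The three part patterns (LOCAL, IP_LITERAL, HOST) and the class name are assumptions made for illustration, not rules taken from the question's guidelines; only trySeparator comes from the answer above.

```java
class SplitEmailCheck {
    // Assumed rules for each part; adjust them to the actual guidelines.
    private static final String LOCAL = "[A-Za-z0-9._%+-]+";
    private static final String IP_LITERAL = "\\[\\d+(?:\\.\\d+)*\\]";
    private static final String HOST =
        "[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*\\.(?:com|com\\.au|co\\.ca|co\\.nz|co\\.us|co\\.uk)";

    // Helper from the answer: index of the separator if it occurs exactly once, else -1.
    static int trySeparator(final String input, final String separator) {
        int ret = input.indexOf(separator);
        if (ret == -1)
            return ret;
        return ret == input.lastIndexOf(separator) ? ret : -1;
    }

    static boolean isValid(String input) {
        // Try "#" first, then fall back to "_at_".
        String separator = "#";
        int index = trySeparator(input, separator);
        if (index == -1) {
            separator = "_at_";
            index = trySeparator(input, separator);
        }
        if (index == -1)
            return false;
        // Check the two halves independently.
        String local = input.substring(0, index);
        String domain = input.substring(index + separator.length());
        return local.matches(LOCAL)
            && (domain.matches(IP_LITERAL) || domain.matches(HOST));
    }

    public static void main(String[] args) {
        System.out.println(isValid("cath#[10.1.1]"));
        System.out.println(isValid("cath#[10.1.1.]"));
    }
}
```

Each of the three patterns can now be unit-tested on its own, which is the main advantage over a single large regex.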

Related

Extracting hashtags from user input

I have been asking similar questions before so this may be taken down but I feel like the code I have now should work but it doesn't.
String post [] = new String [100];
System.out.println("\nType your post");
String userPost = input.nextLine();
post[0] = userPost;
String hashtags ="";
for (int i = 0; i<post.length && post[i]!=null;i++){
String[]words = post[i].split(" ");
for(int j=0;j<words.length;j++){
if(words[j].trim().startsWith("#")){
hashtags+=words[j].trim() + " ";
}
}
}
if(hashtags.trim().isEmpty())
System.out.println("No hashtags were typed");
else
System.out.println("Hashtags found: " + hashtags );
I feel like this should work but when running this code, it skips asking for user input and immediately prints No hashtags were typed.

What you should do is use the Java regular expression API to extract all hashtags from the given String with a suitable regexp:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class HashtagsFinder {
    public static void main(String[] args) {
        String input = "#HASHTAGS #HASHTAG #ANOTHER_HASHTAG BUT#THISISNOTAHASHTAG";
        // \B# only matches a hash that is not glued to a preceding word character
        Matcher matcher = Pattern.compile("\\B#\\w+").matcher(input);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}
A regular expression passed as the Pattern.compile() argument matches expressions in line with the following rules:
\B - ensure that there is no word boundary at the beginning.
# - ensure that there is exactly one hash char at the beginning of the word - with the previous part, it ensures that the hash char is always at the beginning of the word.
\w+ - match one or more word characters (alphanumeric and underscore), so we won't be collecting expressions with just the hash char.
Output of the code pasted above:
#HASHTAGS
#HASHTAG
#ANOTHER_HASHTAG
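It is also worth considering https://github.com/google/re2j instead of the standard java.util.regex package, especially when patterns or input come from untrusted sources. The standard library matches by backtracking, so some patterns can take exponential time or exhaust the available stack space on certain inputs, whereas RE2/J guarantees matching in time linear in the size of the input. Note that RE2/J does not support every java.util.regex feature (backreferences and lookaround, for example).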

Accept only letters in constructor using scanner in Java

Hi guys, I'm trying to create an interactive constructor that takes the user's name and surname as input from a Scanner. It all works, but now I want to accept only letters in the name and surname. I've tried Pattern and Matcher, but the constructor still sets a number as the name or surname (it tells me the input is invalid but still sets it in the User).
public User(){
System.out.println("insert name and surname");
System.out.println("Name: ");
Scanner input = new Scanner(System.in);
Pattern p = Pattern.compile("a,z - A,Z");
name = input.nextLine();
Matcher m = p.matcher(name);
if(m.matches()) {
this.setName(name);
}else{
System.out.println("invalid input");
}
System.out.println("SURNAME:");
surname= input.nextLine();
this.setSurname(surname);
p.matcher(surname);
System.out.println(Welcome);
System.out.println("--------------------------");
}

There's a lot of things going on here that aren't quite right. You are on your own for the stuff other than the regex-issue but consider the other points noted below.
the constructor should not be interactive - collect inputs and pass them to the constructor
your regex pattern is wrong so it will not match the inputs you actually want
you are reading the name into the name variable and then testing it - this is why it reports bad input but still stores it
you have no error recovery for handling bad input
write methods to do things like build a user or get user input rather than trying to do everything in one place. Limit responsibilities and it is easier to write, debug, and maintain.
Regex
As written, your pattern contains no regex metacharacters, so it is treated as literal text and will only match the exact string a,z - A,Z. I think what you are trying to do with your regex is "^[a-zA-Z]+$".
The ^ starts the match at the beginning of the String and the $ ends the match at the end of the String. Together it means the input must be an exact match to the pattern (i.e. no extraneous characters).
The [a-zA-Z] defines a character class of alphabet characters.
The + indicates one or more characters of the preceding character class match.
Note that String has a convenience method for pattern-matching so you can do something like
String regex = "^[a-zA-Z]+$";
String input = ...
if (input.matches(regex)) { ...
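For illustration, here is a small sketch of how the two patterns behave. The class and method names are hypothetical. Note that String.matches already requires the whole input to match, so the ^ and $ anchors are harmless but not strictly needed with it.

```java
class NameRegexDemo {
    // Hypothetical helper; the pattern is the one suggested above.
    static boolean isLettersOnly(String input) {
        return input.matches("^[a-zA-Z]+$");
    }

    public static void main(String[] args) {
        System.out.println(isLettersOnly("Anna"));            // true
        System.out.println(isLettersOnly("Anna1"));           // false: contains a digit
        System.out.println(isLettersOnly(""));                // false: + needs at least one letter
        System.out.println("Anna".matches("[a-zA-Z]+"));      // true: matches() is anchored implicitly
        // The pattern from the question is literal text and only matches itself:
        System.out.println("Anna".matches("a,z - A,Z"));      // false
        System.out.println("a,z - A,Z".matches("a,z - A,Z")); // true
    }
}
```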
Regarding how to create an instance of the User. Write methods to do things and let the constructor simply construct the object.
// Constructor is simple - just assign parameter arguments to members
public User(String name, String surname) {
    this.name = name;
    this.surname = surname;
}
// a method for constructing a User - get the user input and invoke the constructor
public static User buildUser(Scanner in) {
    // NAME_REGEX is assumed to be a constant such as "^[a-zA-Z]+$", defined once so it's not repeated
    String name = promptForString("Enter name", NAME_REGEX, in);
    String surname = promptForString("Enter surname", NAME_REGEX, in);
    return new User(name, surname);
}
// print the prompt, read a line, check it against the regex, return it once it is valid
public static String promptForString(String prompt, String regex, Scanner in) {
    for (;;) {
        System.out.println(prompt);
        String input = in.nextLine();
        if (!input.matches(regex)) {
            System.out.println("Invalid input - try again");
            continue;
        }
        return input;
    }
}
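Put together, the pieces above could look like the following self-contained sketch. The NAME_REGEX value, the getters, and the scripted input in main are assumptions added so the example runs without a keyboard; a real program would pass new Scanner(System.in) instead.

```java
import java.util.Scanner;

class User {
    // Assumed constant: letters only, at least one.
    private static final String NAME_REGEX = "^[a-zA-Z]+$";

    private final String name;
    private final String surname;

    User(String name, String surname) {
        this.name = name;
        this.surname = surname;
    }

    String getName() { return name; }
    String getSurname() { return surname; }

    // Collect validated input first, then construct the object.
    static User buildUser(Scanner in) {
        String name = promptForString("Enter name", NAME_REGEX, in);
        String surname = promptForString("Enter surname", NAME_REGEX, in);
        return new User(name, surname);
    }

    // Keeps asking until the line matches; nothing invalid is ever returned.
    static String promptForString(String prompt, String regex, Scanner in) {
        for (;;) {
            System.out.println(prompt);
            String input = in.nextLine().trim();
            if (input.matches(regex)) {
                return input;
            }
            System.out.println("Invalid input - try again");
        }
    }

    public static void main(String[] args) {
        // Scripted input: two invalid lines are rejected along the way.
        Scanner in = new Scanner("12345\nAnna\nSm1th\nSmith\n");
        User user = buildUser(in);
        System.out.println(user.getName() + " " + user.getSurname()); // Anna Smith
    }
}
```

If the input ends before a valid line is read, Scanner.nextLine throws NoSuchElementException; this sketch does not handle that case.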

First, I'd say such complex logic shouldn't be used in a constructor. Use it in your main method or any other dedicated method and construct the object from the processed results:
...
Scanner scanner = new Scanner(System.in);
Pattern p = Pattern.compile(???);
User user = new User();
String name = scanner.nextLine().trim();
if( p.matcher(name).matches() )
{
user.setName(name);
}
...
Now let's talk about why your code is probably not working correctly: the regex. The expression you use probably does not do what you think it does. With third-party tools like Regexr you can see what your expressions do.
In your case you only want to allow letters but not numbers. This can be done with the regex [A-Za-z]. On its own, that only checks whether the string is a single letter. However, names rarely consist of a single character, so we have to add a quantifier to the regex. Let's assume a name can be anywhere from 1 to infinitely many letters. In that case the correct quantifier is + (at least 1), which leads to our final result: Pattern p = Pattern.compile("[A-Za-z]+")
Code I used for testing:
String val = "";
Pattern pattern = Pattern.compile( "^[A-Za-z]+$" );
try ( Scanner scanner = new Scanner( System.in ) )
{
String input = scanner.nextLine().trim();
Matcher m = pattern.matcher( input );
if(m.matches())
val = input;
}
System.out.println("Value: " + val);

Write a regular expression to count sentences

I have a String :
"Hello world... I am here. Please respond."
and I would like to count the number of sentences within the String. I had an idea to use a Scanner as well as the useDelimiter method to split any String into sentences.
Scanner in = new Scanner(file);
in.useDelimiter("insert here");
I'd like to create a regular expression which can go through the String I have shown above and identify it to have two sentences. I initially tried using the delimiter:
[^?.]
It gets hung up on the ellipses.

You could use a regular expression that checks for a non end of sentence, followed by an end of sentence like:
[^?!.][?!.]
Although, as Gabe Sechan points out, a regular expression may not be accurate when the sentence includes abbreviated words such as Dr., Rd., St., etc.
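A minimal sketch of counting with that expression, using Matcher.find rather than a Scanner delimiter. The class and method names are made up for illustration. The ellipsis counts only once because only its first dot follows a non-terminator character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceCounter {
    // A non-terminator character immediately followed by a terminator.
    private static final Pattern SENTENCE_END = Pattern.compile("[^?!.][?!.]");

    static int count(String text) {
        Matcher matcher = SENTENCE_END.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "d." in "world...", "e." in "here." and "d." in "respond."
        System.out.println(count("Hello world... I am here. Please respond.")); // 3
        // The abbreviation is counted as a sentence end too.
        System.out.println(count("Dr. Smith arrived."));                        // 2
    }
}
```

The second call shows the abbreviation problem in practice: "Dr." is counted as a sentence end.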

this could help :
public int getNumSentences()
{
List<String> tokens = getTokens( "[^!?.]+" );
return tokens.size();
}
You can also add the Enter key (a line break) as a separator, and make it independent of your OS, with the following line of code:
String pattern = System.getProperty("line.separator") + " ";
You can find more about matching Enter here: Java regex: newline + white space
and hence finally the method becomes :
public int getNumSentences()
{
List<String> tokens = getTokens( "[^!?.]+" + pattern + "+" );
return tokens.size();
}
hope this could help :) !

A regular expression probably isn't the right tool for this. English is not a regular language, so regular expressions get hung up a lot. For one thing, you can't even be sure a period in the middle of the text is the end of a sentence: abbreviations (like Mr.), acronyms with periods, and initials will trip you up as well. It's not the right tool.

For your sentence : "Hello world... I am here. Please respond."
The code will be :
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class JavaRegex {
public static void main(String[] args) {
int count=0;
String sentence = "Hello world... I am here. Please respond.";
Pattern pattern = Pattern.compile("\\..");
Matcher matcher = pattern.matcher(sentence);
while(matcher.find()) {
count++;
}
System.out.println("No. of sentence = "+count);
}
}

Need help in Regex to exclude splitting string within "

I need to split a String using a comma as the separator, but if part of the string is enclosed in double quotes ("), that portion must not be split, from the opening " to the closing ", even if it contains commas in between.
Can anyone please help me solve this using a regex with lookaround?

Resurrecting this question because it had a simple regex solution that wasn't mentioned. This situation sounds very similar to "regex-match a pattern unless...".
\"[^\"]*\"|(,)
The left side of the alternation matches complete double-quoted strings. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right ones because they were not matched by the expression on the left.
Here is working code:
import java.util.regex.*;
import java.util.List;
class Program {
public static void main (String[] args) {
String subject = "\"Messages,Hello\",World,Hobbies,Java\",Programming\"";
Pattern regex = Pattern.compile("\"[^\"]*\"|(,)");
Matcher m = regex.matcher(subject);
StringBuffer b = new StringBuffer();
while (m.find()) {
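if(m.group(1) != null) m.appendReplacement(b, "SplitHere");
// quoteReplacement keeps any $ or \ inside the quoted text from being read as a group reference
else m.appendReplacement(b, Matcher.quoteReplacement(m.group(0)));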
}
m.appendTail(b);
String replaced = b.toString();
String[] splits = replaced.split("SplitHere");
for (String split : splits)
System.out.println(split);
} // end main
} // end Program
Reference
How to match pattern except in situations s1, s2, s3
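As a variation on the same idea, the pieces can be cut out directly at the captured commas, which avoids the SplitHere placeholder (and any clash if the input already contains that text). This is only a sketch; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareSplit {
    // Left side consumes complete quoted strings; right side captures "real" commas in group 1.
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|(,)");

    static List<String> split(String subject) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(subject);
        int start = 0;
        while (m.find()) {
            if (m.group(1) != null) {          // a comma outside quotes
                parts.add(subject.substring(start, m.start()));
                start = m.end();
            }                                   // quoted strings are simply skipped
        }
        parts.add(subject.substring(start));    // the remainder after the last comma
        return parts;
    }

    public static void main(String[] args) {
        for (String part : split("a,\"b,c\",d")) {
            System.out.println(part);
        }
        // prints:
        // a
        // "b,c"
        // d
    }
}
```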

Please try this:
(?<!\G\s*"[^"]*),
If you put this regex in your program, it should be:
String regex = "(?<!\\G\\s*\"[^\"]*),";
But two things are not clear:
Does the " only start next to a , or can it start in the middle of the content, as in AAA, BB"CC,DD" ? The regex above only deals with a " that starts near a , .
If the content itself contains a ", how is it escaped: with "" or \"? The regex above does not handle any escaped " format.

java phone number validation

Here is my problem:
Create a constructor for a telephone number given a string in the form xxx-xxx-xxxx or xxx-xxxx for a local number. Throw an exception if the format is not valid.
So I was thinking to validate it using a regular expression, but I don't know if I'm doing it correctly. Also what kind of exception would I have to throw? Do I need to create my own exception?
public TelephoneNumber(String aString){
if(isPhoneNumberValid(aString)==true){
StringTokenizer tokens = new StringTokenizer("-");
if(tokens.countTokens()==3){
areaCode = Integer.parseInt(tokens.nextToken());
exchangeCode = Integer.parseInt(tokens.nextToken());
number = Integer.parseInt(tokens.nextToken());
}
else if(tokens.countTokens()==2){
exchangeCode = Integer.parseInt(tokens.nextToken());
number = Integer.parseInt(tokens.nextToken());
}
else{
//throw an excemption here
}
}
}
public static boolean isPhoneNumberValid(String phoneNumber){
boolean isValid = false;
//Initialize reg ex for phone number.
String expression = "(\\d{3})(\\[-])(\\d{4})$";
CharSequence inputStr = phoneNumber;
Pattern pattern = Pattern.compile(expression);
Matcher matcher = pattern.matcher(inputStr);
if(matcher.matches()){
isValid = true;
}
return isValid;
}
Hi sorry, yes this is homework. For this assignments the only valid format are xxx-xxx-xxxx and xxx-xxxx, all other formats (xxx)xxx-xxxx or xxxxxxxxxx are invalid in this case.
I would like to know if my regular expression is correct

So I was thinking to validate it using a regular expression, but I don't know if I'm doing it correctly.
It indeed looks overcomplicated. Also, matching xxx-xxx-xxxx or xxx-xxxx where x is a digit can be done better with "(\\d{3}-){1,2}\\d{4}". To learn more about regex I recommend to go through http://regular-expressions.info.
Also what kind of exception would I have to throw? Do I need to create my own exception?
A ValidatorException seems straight forward.
public static void isPhoneNumberValid(String phoneNumber) throws ValidatorException {
    // regex is assumed to hold the pattern shown above, "(\\d{3}-){1,2}\\d{4}"
    if (!phoneNumber.matches(regex)) {
        throw new ValidatorException("Invalid phone number");
    }
}
If you don't want to create one yourself for some reasons, then I'd probably pick IllegalArgumentException, but still, I don't recommend that.
That said, this validation of course doesn't cover international and/or external telephone numbers. Unless this is really homework, I'd suggest to rethink the validation.
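For reference, a minimal sketch of how the constructor from the question could combine this regex with String.split. IllegalArgumentException is used here only so the example compiles without extra classes, even though a dedicated exception is preferred above. The field layout, the getters, and the NO_AREA_CODE marker are assumptions.

```java
class TelephoneNumber {
    // Accepts xxx-xxx-xxxx or xxx-xxxx, where x is a digit.
    private static final String FORMAT = "(\\d{3}-){1,2}\\d{4}";
    // Hypothetical marker for local numbers, which have no area code.
    static final int NO_AREA_CODE = -1;

    private final int areaCode;
    private final int exchangeCode;
    private final int number;

    TelephoneNumber(String aString) {
        if (aString == null || !aString.matches(FORMAT)) {
            throw new IllegalArgumentException("Invalid phone number: " + aString);
        }
        // After validation there are exactly 2 or 3 parts.
        String[] parts = aString.split("-");
        areaCode = parts.length == 3 ? Integer.parseInt(parts[0]) : NO_AREA_CODE;
        exchangeCode = Integer.parseInt(parts[parts.length - 2]);
        number = Integer.parseInt(parts[parts.length - 1]);
    }

    int getAreaCode() { return areaCode; }
    int getExchangeCode() { return exchangeCode; }
    int getNumber() { return number; }

    public static void main(String[] args) {
        TelephoneNumber t = new TelephoneNumber("555-123-4567");
        System.out.println(t.getAreaCode() + " " + t.getExchangeCode() + " " + t.getNumber());
    }
}
```

Storing the parts as int, as the question does, drops leading zeros (012 becomes 12); keeping them as String would avoid that.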

^(([(]?(\d{2,4})[)]?)|(\d{2,4})|([+1-9]+\d{1,2}))?[-\s]?(\d{2,3})?[-\s]?((\d{7,8})|(\d{3,4}[-\s]\d{3,4}))$
matches:
(0060)123-12345678, (0060)12312345678, (832)123-1234567, (006)03-12345678,
(006)03-12345678, 00603-12345678, 0060312345678
0000-123-12345678, 0000-12-12345678, 0000-1212345678 ... etc.
1234-5678, 01-123-4567
Can replace '-' with SPACE i.e (0080) 123 12345678
Also matches +82-123-1234567, +82 123 1234567, +800 01 12345678 ... etc.
More for house-hold/private number.
Not for 1-800-000-0000 type of number
*Tested with Regex tester http://regexpal.com/

You could match those patterns pretty easily as suggested by BalusC.
As a side note, the use of StringTokenizer is discouraged in new code (it is a legacy class, although not formally deprecated). From JavaDoc:
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
An easier way to split your string into the appropriate segments would be:
String phoneParts[] = phoneNumber.split("-");

String pincode = "589877";
// \d matches a digit; the number inside the braces is how many digits are required
Pattern pattern = Pattern.compile("\\d{6}");
Matcher matcher = pattern.matcher(pincode);
if (matcher.matches()) {
    System.out.println("Pincode is Valid");
    return true;
} else {
    System.out.println("pincode must be a 6 digit Number");
    return false;
}
