Unique regex for first name and last name - java

I have a single input where users should enter their first name and last name. I need to validate it with a regex. The requirements are:
The name should start with a capital letter (not a space)
There can't be consecutive spaces
It must support names like the following (everyone should be able to enter their own first and last name). Example:
John Smith
and
Armirat Bair Hossan
And the last character shouldn't be a space.
Please help.
At the moment I have a regex like
^\\p{L}\\[p{L} ,.'-]+$
but it rejects ALL input, which is not good.
Thanks for helping me
UPDATE:
CORRECT INPUT:
"John Smith"
"Alberto del Muerto"
INCORRECT
" John Smith "
" John Smith"

You can use
^[\p{Lu}\p{M}][\p{L}\p{M},.'-]+(?: [\p{L}\p{M},.'-]+)*$
or
^\p{Lu}\p{M}*+(?:\p{L}\p{M}*+|[,.'-])++(?: (?:\p{L}\p{M}*+|[,.'-])++)*+$
See the regex demo and demo 2
Java declaration:
if (str.matches("[\\p{Lu}\\p{M}][\\p{L}\\p{M},.'-]+(?: [\\p{L}\\p{M},.'-]+)*")) { ... }
// or if (str.matches("\\p{Lu}\\p{M}*+(?:\\p{L}\\p{M}*+|[,.'-])++(?: (?:\\p{L}\\p{M}*+|[,.'-])++)*+")) { ... }
The first regex breakdown:
^ - start of string (not necessary with matches() method)
[\p{Lu}\p{M}] - 1 Unicode letter (incl. precomposed ones as \p{M} matches diacritics and \p{Lu} matches any uppercase Unicode base letter)
[\p{L}\p{M},.'-]+ - matches 1 or more Unicode letters, diacritics, a ,, ., ' or - (if 1 letter names are valid, replace + with * at the end here)
(?: [\p{L}\p{M},.'-]+)* - 0 or more sequences of
- a space
[\p{L}\p{M},.'-]+ - 1 or more characters that are either Unicode letters or commas, or periods, or apostrophes or -.
$ - end of string (not necessary with matches() method)
NOTE: Sometimes, names contain curly apostrophes, you can add them to the character classes ([‘’]).
The 2nd regex is less efficient but more accurate, as it will only match diacritics after base letters. See more about matching Unicode letters at regular-expressions.info:
To match a letter including any diacritics, use \p{L}\p{M}*+.
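As a rough illustration (not part of the original answer), both patterns can be run against the inputs from the question. The class and method names below are made up for this sketch; only the two patterns come from the answer. The last sample starts with a bare combining accent (U+0301), which shows where the two patterns differ.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the two patterns come from the answer above.
class NameRegexDemo {
    // First pattern: a combining mark is accepted anywhere a letter is.
    static final Pattern SIMPLE = Pattern.compile(
        "[\\p{Lu}\\p{M}][\\p{L}\\p{M},.'-]+(?: [\\p{L}\\p{M},.'-]+)*");

    // Second pattern: combining marks may only follow a base letter.
    static final Pattern STRICT = Pattern.compile(
        "\\p{Lu}\\p{M}*+(?:\\p{L}\\p{M}*+|[,.'-])++(?: (?:\\p{L}\\p{M}*+|[,.'-])++)*+");

    static boolean matchesSimple(String s) {
        return SIMPLE.matcher(s).matches();
    }

    static boolean matchesStrict(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "John Smith", "Alberto del Muerto", "Armirat Bair Hossan",
            " John Smith ", " John Smith", "John  Smith",
            "\u0301ohn Smith" // starts with a bare combining acute accent
        };
        for (String s : samples) {
            System.out.println("[" + s + "] simple=" + matchesSimple(s)
                + " strict=" + matchesStrict(s));
        }
    }
}
```

Both patterns accept the three valid names and reject the inputs with leading, trailing, or doubled spaces. Only the first pattern accepts the sample that begins with a combining mark, which is the accuracy difference mentioned above.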

Try this one
^[^- '](?=(?![A-Z]?[A-Z]))(?=(?![a-z]+[A-Z]))(?=(?!.*[A-Z][A-Z]))(?=(?!.*[- '][- ']))[A-Za-z- ']{2,}$
There is also an interactive Demo of this pattern available at an external website.

You made a typo: the second \\ should be in front of p.
However even then there is a check missing for a trailing space
"^\\p{L}[\\p{L} ,.'-]+$"
For a .matches the following would suffice
"\\p{L}[\\p{L} ,.'-]*[\\p{L}.]"
Names like "del Rey, Hidalgo" do not require an initial capital.
Also, I would advise simply calling .trim() on the input; imagine a user staring at input that was rejected because of a stray blank.
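A minimal sketch of the trim-then-match idea, assuming a hypothetical helper named isPlausibleName (the name and the null handling are mine, not from the answer); the pattern is the one suggested above for .matches.

```java
// Hypothetical helper illustrating "trim first, then validate".
class TrimmedNameCheck {
    // Pattern from the answer above: a letter, then letters, spaces or
    // punctuation, ending in a letter or a period (so no trailing space).
    private static final String NAME_REGEX = "\\p{L}[\\p{L} ,.'-]*[\\p{L}.]";

    static boolean isPlausibleName(String rawInput) {
        if (rawInput == null) {
            return false;
        }
        // Strip accidental leading/trailing blanks instead of rejecting them.
        return rawInput.trim().matches(NAME_REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isPlausibleName(" John Smith "));     // true
        System.out.println(isPlausibleName("del Rey, Hidalgo")); // true
        System.out.println(isPlausibleName("   "));              // false
    }
}
```

Note that this pattern does not by itself reject repeated spaces inside the name; if that requirement matters, one of the stricter patterns from the other answers would be needed.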

Try this
^[A-Z][a-z]+(([\s][A-Z])?[a-z]+){1,2}$
but in a Java string literal, write \\ instead of \ (for example \\s)

Related

Regex for no white space at the beginning and at the end and only one in between

I am creating a user input form where a user can enter his name and I want to use a regex for the following pattern:
^[A-Za-z_äÄöÖüÜß_.-]*
It only should accept the above letters, and dots if any and slashes if any.
Moreover I want it to accept white spaces but not at the beginning and not at the end and only one white space between name parts.
E.g. if user's name is Dora F. T. Kov
it should be valid.
If I am adding \\s to my regex, it allows any amount of white spaces in my string anywhere.
How could I rewrite it based on the above concept?
Thank you a lot in advance!
How about:
^[A-Za-z_äÄöÖüÜß_.-]+(?: [A-Za-z_äÄöÖüÜß_.-]+)*$
See regex demo
^ - Matches start of string
[A-Za-z_äÄöÖüÜß_.-]+ Matches one or more of these allowed characters
(?: [A-Za-z_äÄöÖüÜß_.-]+)* - Followed by 0 or more occurrences of: single space followed by one or more of your allowed characters.
$ - Matches end of string
Try: ^[A-Za-z_äÄöÖüÜß_.-]*(?: [A-Za-z_äÄöÖüÜß_.-]*)*$
See it working here
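For comparison, here is a small sketch that runs the two suggested patterns side by side. The class and method names are hypothetical, and the umlauts are written as Unicode escapes only so that the source file encoding does not matter. It suggests that the + version enforces the spacing rules, while the * version lets empty name parts (and therefore stray spaces) through.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the "+" and "*" variants from the answers.
class SingleSpaceDemo {
    // Allowed characters from the question: A-Z, a-z, _, the German umlauts,
    // sharp s, dot and hyphen.
    private static final String CHARS =
        "[A-Za-z_\u00e4\u00c4\u00f6\u00d6\u00fc\u00dc\u00df_.-]";

    // "+" version: every name part must contain at least one character.
    static final Pattern STRICT =
        Pattern.compile("^" + CHARS + "+(?: " + CHARS + "+)*$");

    // "*" version: parts may be empty, so extra spaces slip through.
    static final Pattern LOOSE =
        Pattern.compile("^" + CHARS + "*(?: " + CHARS + "*)*$");

    static boolean strict(String s) {
        return STRICT.matcher(s).matches();
    }

    static boolean loose(String s) {
        return LOOSE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "Dora F. T. Kov", " Dora", "Dora ", "Dora  Kov" };
        for (String s : samples) {
            System.out.println("[" + s + "] strict=" + strict(s) + " loose=" + loose(s));
        }
    }
}
```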

How to make a regex to match string that starts with 0-9 or a-z with accents and must accept only this special character - _ ' between words?

My pattern must match a String that:
Can start with number
Can start with letters with accents or without accents too
Can't start with spaces
Can't start with special characters
Allow spaces between words
Do not accept special character except: - _ '
My current pattern is: ^[^_\W][\p{L}\s0-9À-ÖØ-öø-ÿ.'-]+$
Valid examples:
João Antonio
João-Antonio
João's Company
Peter Müller
François Hollande
Patrick O'Brian
Silvana Koch-Mehrin
Invalid examples:
Company N#me
100% Company
\Company
\s Company
_Blockquote
Please help me!
My best was:
/^[^_\W][\p{L}\s0-9À-ÖØ-öø-ÿ.'-]+$/gi
Test: https://regexr.com/521r2
First letter:
Start with number, letter
Exclude accents and special chars
[^\W_]
Rest of the text:
Include number, letter and accents
Include _, -, ' and space
[0-9] & [A-Za-zÀ-ÖØ-öø-ÿ] & [_\-\' ]
Here you are:
^[^\W_][0-9A-Za-zÀ-ÖØ-öø-ÿ_\-\' ]+$
See this question
When you have to deal with a complicated regex, use RegExr!
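The question does not name a language, so the following is only a sketch of how the pattern above might be tried in Java. The class and method names are hypothetical, and the accented ranges are written as Unicode escapes to keep the source ASCII-only. One assumption worth stating: in Java, without the UNICODE_CHARACTER_CLASS flag, \W is ASCII-based, so [^\W_] accepts only an ASCII letter or digit as the first character.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern is a Java translation of the one above.
class CompanyNameDemo {
    // First char: ASCII letter or digit. Rest: digits, letters (plain or
    // accented Latin-1), underscore, hyphen, apostrophe or space.
    static final Pattern NAME = Pattern.compile(
        "^[^\\W_][0-9A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF_\\-' ]+$");

    static boolean isValid(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Jo\u00e3o Antonio", "Jo\u00e3o-Antonio", "Jo\u00e3o's Company",
            "Patrick O'Brian", "Silvana Koch-Mehrin",
            "Company N#me", "100% Company", "\\Company", "_Blockquote"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

With this translation a name whose first letter is accented is rejected, which is consistent with the "Exclude accents" rule the answer gives for the first character.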
I think the requirements are not too clear but based on your examples:
^[a-zA-ZÀ-ÖØ-öø-ÿ][ '_a-zA-ZÀ-ÖØ-öø-ÿ-]+$
^ = beginning of line
$ = end of line
[a-zA-ZÀ-ÖØ-öø-ÿ] = matches all these characters specified including ones with accents
[ '_a-zA-ZÀ-ÖØ-öø-ÿ-] = same as above, plus the quote, blank space, underscore and hyphen (the hyphen goes last so it is literal rather than forming a range such as _-a)
+ = one or more (greedy)
See this for more details and examples link
Best.

Problem coming up with appropriate Regex expression

I need to match text similar to the following text in an if statement.
REG#John Smith#14102245862#7 johns road new york#John Anthony Smith
The expression is meant to match a REG keyword at the beginning of the string then username followed by an account number composed of numbers with no specific restriction on the number of digits, then the address and lastly the name of the individual the address is registered to.
The Regex expression I had come up with is not working. The regex expression is below:
^REG\#\w\#[0-9]\#\w\#\w
May you kindly assist in showing me where I went wrong and how to make it work.
Thank you in advance
The problem is that you don't use quantifiers (* or +) and space is not included within \w which stands for [A-Za-z0-9_]. The character # does not need to be escaped (at least as far as I know in Java). Try the following Regex:
^REG#[\w ]+#\d+#[\w ]+#[\w ]+
^REG matches the beginning of the string (REG) literally
# matches self literally
[\w ]+ stands for at least one word character or space
\d+ stands for at least one digit
In Java, don't forget the double escaping:
String regex = "^REG#[\\w ]+#\\d+#[\\w ]+#[\\w ]+";
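If the individual fields are needed as well, capture groups can be added. The following is a sketch under that assumption: the class name, the parse method, the added parentheses and the trailing $ are mine, not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser built around the pattern from the answer above.
class RegMessageParser {
    // Same shape as the answer's pattern, with capture groups and a trailing $
    // added for this sketch so each field can be read back.
    private static final Pattern REG = Pattern.compile(
        "^REG#([\\w ]+)#(\\d+)#([\\w ]+)#([\\w ]+)$");

    // Returns {username, account, address, registeredName}, or null if no match.
    static String[] parse(String message) {
        Matcher m = REG.matcher(message);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String sample =
            "REG#John Smith#14102245862#7 johns road new york#John Anthony Smith";
        String[] fields = parse(sample);
        if (fields != null) {
            for (String field : fields) {
                System.out.println(field);
            }
        }
    }
}
```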
Try ^REG\#.*?\#[0-9]*?\#.*?\#.* . The *? operator is a lazy (reluctant) quantifier: it matches as few characters as possible, so each .*? stops at the next part of the expression, in this case \#.

Find java comments (multi and single line) using regex

I found the following regex online at http://regexlib.com/
(\/\*(\s*|.*?)*\*\/)|(\/\/.*)
It seems to work well for the following matches:
// Compute the exam average score for the midterm exam
/**
* The HelloWorld program implements an application that
*/
BUT it also tends to match
http://regexr.com/foo.html?q=bar
at least starting at the //
I'm new to regex and a total infant, but I read that if you put a caret at the beginning it forces the match to start at the beginning of the line, however this doesn't seem to work on RegExr.
I'm using the following:
^(\/\*(\s*|.*?)*\*\/)|(\/\/.*)$
A regex for this must accept a comment start (// or /*) anywhere except inside a token that can legitimately contain those characters. If you look at the lexical structure of the Java language, the only such token is the string literal, so to avoid matching a "comment" inside a string you have to match every complete string literal that precedes the comment. Otherwise the match could start in the middle of a string literal that happens to contain // or /*.
So the text before the comment must consist of characters that do not open a string literal, interleaved with any number of complete string literals. A string literal has the following shape:
\"()*\"
and the inside of the parentheses must match anything that is not a \n, a lone ", or a lone \, but may be an escaped \\ or an escaped \". (A Unicode escape such as \u0022 is translated to " by the Java compiler before lexing, so it could also end a string; that case is ignored here.) This leads to
\"([^\\\"\n]|\\.)*\"
and this can optionally be repeated any number of times, each occurrence preceded by a character that is not a " (a " would begin the next string literal):
([^\\"](\"([^\\\"\n]|\\.)*\")?)*
So the text preceding the comment should be matched by the expression above. Then comes the comment itself, which can take either of two forms:
\/\/[^\n]*$
or
/\*([^\*]|\*[^\/])*\*\/
(that is: a slash, an escaped asterisk, then any number of items that are either something other than a * or a * followed by something other than a /, and finally the closing */ sequence)
These can be grouped in an alternative group, as in:
(\/\/[^\n]*\n|\/\*([^\*]|\*[^\/])*\*\/)
Finally, the complete expression is:
^([^\\"](\"([^\\\"\n]|\\.)*\")?)*(\/\/[^\n]*|\/\*([^\*]|\*[^/])*\*\/)
Note that the matched comment does not begin at the start of the match but in the 4th group (at the 4th left parenthesis), and the regexp must match the string from the beginning. See demo.
Note
Keep in mind that you are matching not only the comment but also the text before it, so the overall match consists of the preceding text plus the comment. Also, if you try this regexp on several comments in sequence, it will match only the last one, because the case of a /* ... /* .... */ sequence is not covered (a comment opener can itself appear inside a comment, but handling that case as well will make you hate regexps forever). The correct way to cope with this problem is to write a lex/flex specification for the Java tokens, so that you only get real tokens, but that is out of scope for this explanation. See a probably valid example here.
You can try this pattern:
(?ms)^[^'"\n]*?(?:(?:"(?:\\.|[^"])*"|'\\?.')[^'"\n]*?)*((?:(?://[^\n]*|/\*.*?\*/)[ \t]*)+)
This captures comments in group 1, but only if the comment is not inside a string. Demo.
Breakdown:
(?ms)                  multiline flag makes ^ match at the start of a line;
                       singleline flag makes . match newlines
^                      start of line
[^'"\n]*?              match anything but " or ' or newline
(?:                    then, any number of strings:
    (?:
        "              start with a quote...
        (?:            ...followed by any number of...
            \\.        ...a backslash and the escaped character
        |              or
            [^"]       any character other than "
        )*
        "              ...and finally the closing quote
    |                  or...
        '\\?.'         a single character in single quotes, possibly escaped
    )
    [^'"\n]*?          and everything up to the next string or newline
)*
(                      finally, capture (any number of) comments:
    (?:
        (?:            either...
            //[^\n]*   a single line comment
        |              or
            /\*.*?\*/  a multiline comment
        )
        [ \t]*         and any subsequent comments if only separated by whitespace
    )+
)
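A minimal sketch of how this pattern might be used from Java, with the backslashes and quotes escaped for a string literal. The class name CommentFinder and the method findComments are hypothetical; the regex itself is the one from the answer above. The sample input contains a URL inside a string literal, which is the false positive the question was about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that collects group 1 of every match.
class CommentFinder {
    // The pattern above as a Java string literal (backslashes doubled,
    // double quotes escaped), split in two pieces only for readability.
    private static final Pattern COMMENT = Pattern.compile(
        "(?ms)^[^'\"\\n]*?(?:(?:\"(?:\\\\.|[^\"])*\"|'\\\\?.')[^'\"\\n]*?)*"
        + "((?:(?://[^\\n]*|/\\*.*?\\*/)[ \\t]*)+)");

    static List<String> findComments(String source) {
        List<String> comments = new ArrayList<>();
        Matcher m = COMMENT.matcher(source);
        while (m.find()) {
            comments.add(m.group(1)); // group 1 holds only the comment text
        }
        return comments;
    }

    public static void main(String[] args) {
        String source =
            "String url = \"http://regexr.com/foo.html?q=bar\"; // trailing comment\n"
            + "/* block\n"
            + "   comment */\n"
            + "int x = 1;\n";
        for (String comment : findComments(source)) {
            System.out.println(comment);
        }
    }
}
```

On this input the sketch reports the trailing line comment and the block comment, and skips the // inside the URL string. It has only been reasoned through on small inputs like this one, so treat it as a starting point rather than a complete comment scanner.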

Java Regular Expression: what is " '- "

I came up to a line in java that uses regular expressions.
It needs a user input of Last Name
return lastName.matches( "[a-zA-z]+([ '-][a-zA-Z]+)*" );
I would like to know what is the function of the [ '-].
Also why do we need both a "+" and a "*" at the same time, and the [ '-][a-zA-Z] is in brackets?
Your RE is: [a-zA-z]+([ '-][a-zA-Z]+)*
I'll break it down into its component parts:
[a-zA-Z]+
The string must begin with any letter, a-z or A-Z, repeated one or more times (+).
([ '-][a-zA-Z]+)*
[ '-]
Any single character of <space>, ', or -.
[a-zA-Z]+
Again, any letter, a-z or A-Z, repeated once or more times.
This combination ([ '-] followed by [a-zA-Z]+) may then be repeated zero or more times.
Why [ '-]? To allow for hyphenated names, such as Higgs-Boson, or names with apostrophes, such as O'Reilly, or names with spaces, such as Van Dyke.
The expression [ '-] means "one of space, ', or -". The position of the dash matters: it must come last (or first, or be escaped); otherwise it would be read as a range operator, and other characters with code points between its neighbors would be accepted as well.
+ means "one or more repetitions"; * means "zero or more repetitions", referring to the term of the regular expression preceding the + or * modifier.
Overall, the expression matches groups of lowercase and uppercase letters separated by spaces, dashes, or single quotes.
It means it can be any one of the characters space, ' or - (space, quote, dash).
The - can also be written as \- because it can otherwise denote a range, as in a-z.
This looks like it is a pattern to match double-barreled (space or hyphen) or I-don't-know-what-to-call-it names like O'Grady... for example:
It would match
counter-terrorism
De'ville
O'Grady
smith-jones
smith and wesson
But it will not match
jones-
O'Learys'
#hashtag
Bob & Sons
The idea is, after the first [A-Za-z]+ consumes all the letters it can, the match will end right there unless the next character is a space, an apostrophe, or a hyphen ([ '-]). If one of those characters is present, it must be followed by at least one more letter.
A lot of people have difficulty with this. They naively write something like [A-Za-z]+[ '-]?[A-Za-z]*, figuring both the separator and the extra chunks of letters are optional. But they're not independently optional; if there is a separator ([ '-]), it must be followed by at least one more letter. Otherwise it would treat strings like R'- j'-' as valid. Your regex doesn't have that problem.
By the way, you've got a typo in your regex: [a-zA-z]. You want to watch out for that, because [A-z] does match all the uppercase and lowercase letters, so it will seem to be working correctly as long as the inputs are valid. But it also matches several non-letter characters whose code points happen to lie between those of Z and a. And very few IDEs or regex tools will catch that error.
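To make the points above concrete, here is a small sketch (class and constant names are hypothetical) that checks the example names against the corrected pattern, and shows one input that only the [a-zA-z] typo lets through: the underscore has a code point between Z and a.

```java
// Hypothetical demo class for the last-name pattern discussed above.
class LastNameDemo {
    // Corrected pattern: A-Z instead of the A-z typo.
    static final String FIXED = "[a-zA-Z]+([ '-][a-zA-Z]+)*";
    // Pattern as written in the question, typo included.
    static final String TYPO = "[a-zA-z]+([ '-][a-zA-Z]+)*";

    public static void main(String[] args) {
        String[] samples = {
            "O'Grady", "smith-jones", "smith and wesson",
            "jones-", "O'Learys'", "#hashtag", "Bob & Sons", "foo_bar"
        };
        for (String s : samples) {
            System.out.println(s + " fixed=" + s.matches(FIXED)
                + " typo=" + s.matches(TYPO));
        }
    }
}
```

Every sample gives the same result for both patterns except foo_bar, which the typo version accepts and the corrected version rejects.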
