Match a string in any language with a Java 8 regex

I am trying to use a Java 8 regex to match a string in any language,
as long as it contains only letters, digits and . or -
String s = "בלה בלה";
String pattern= "^[\\p{L}\\p{Digit}_.-]*$";
return s.matches(pattern);
What am I missing? This code returns false for a valid Hebrew string.

You may add a whitespace to your pattern, and use \w instead of \p{L}\p{Digit}_ while passing the Pattern.UNICODE_CHARACTER_CLASS flag:
String s = "בלה בלה";
String pattern= "(?U)[\\w\\s.-]*";
System.out.println(s.matches(pattern));
// => true
See the Java demo
Since the pattern is used inside the String#matches() method, the ^ and $ anchors are not necessary. If you plan to use the pattern with the Matcher#find() method, enclose the pattern within anchors as in the original code ("^(?U)[\\w\\s.-]*$").
Pattern details:
(?U) - the Pattern.UNICODE_CHARACTER_CLASS embedded modifier flag that makes shorthand character classes Unicode aware (you may see what \w matches in this mode)
[\\w\\s.-]* - zero or more:
\w - word chars (letters, digits, _ and some more)
\s - whitespaces
. - a dot (no need to escape it inside a character class)
- - a hyphen (no need to escape it as it is at the end of the character class)
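As a rough sketch of the points above, the following compares the original pattern, the ASCII-only \w, and the (?U) variant, and shows the anchored form used with Matcher#find(). The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class UnicodeClassDemo {
    // Hypothetical helper: true when the whole string consists of
    // word chars, whitespace, dots or hyphens (Unicode-aware).
    static boolean isValid(String s) {
        return s.matches("(?U)[\\w\\s.-]*");
    }

    // Same check with find(): anchors are needed because find()
    // would otherwise accept a partial (even empty) match.
    static boolean isValidWithFind(String s) {
        return Pattern.compile("^(?U)[\\w\\s.-]*$").matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "בלה בלה";
        // false: the space is not in the character class
        System.out.println(s.matches("[\\p{L}\\p{Digit}_.-]*"));
        // false: without (?U), \w only covers [a-zA-Z_0-9]
        System.out.println(s.matches("[\\w\\s.-]*"));
        // true: (?U) makes \w and \s Unicode-aware
        System.out.println(isValid(s));
        System.out.println(isValidWithFind(s));
        // false: ! is not in the allowed set
        System.out.println(isValid("abc!"));
    }
}
```

If the flag is preferred over the embedded (?U), Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS) should behave the same way.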

Related

Is there any difference between using regex in Java and regex in JavaScript?

I have a requirement to build a regex pattern to validate a String in Java, so I built the pattern
[A-Z][a-z]*\s?[A-Z]?[a-z]*$ for the conditions:
Should start with caps
Every other Word should start with caps
No numbers included
no consecutive two spaces allowed
Pattern.matches("[A-Z][a-z]*\s?[A-Z]?[a-z]*$","Joe V") returns false for me in java.
But the same pattern returns true for the data "Joe V" in regexr.com.
What might be the cause?

JavaScript has regex literals, while Java does not: a Java regex is written as an ordinary string literal. Since Java uses \ for escape sequences in strings (like \n), you have to escape the \ with another \ for it to reach the regex engine as a \. So any \ in the regex should be written as \\ in Java source.
Thus your regex / code should be:
Pattern.matches("[A-Z][a-z]*\\s?[A-Z]?[a-z]*$", "Joe V")
which returns true
P.S. Since Java 15, \s in a string literal is an escape sequence for a single space; in earlier versions it is a compile-time error.
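To make the escaping rule concrete, here is a minimal sketch (the class and method names are invented for this example) showing what the regex engine actually receives from the Java literal:

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // In Java source, "\\s" is two characters: a backslash and an s.
    static final String REGEX = "[A-Z][a-z]*\\s?[A-Z]?[a-z]*$";

    // Hypothetical helper wrapping Pattern.matches for the regex above.
    static boolean check(String input) {
        return Pattern.matches(REGEX, input);
    }

    public static void main(String[] args) {
        System.out.println("\\s".length()); // 2
        System.out.println(REGEX);          // [A-Z][a-z]*\s?[A-Z]?[a-z]*$
        System.out.println(check("Joe V")); // true
    }
}
```

Printing the string is a quick way to see the pattern exactly as a regex tester would expect it.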

You can use
Pattern.matches("[A-Z][a-z]*(?:\\s[A-Z][a-z]*)*","Joe V")
Pattern.matches("\\p{Lu}\\p{Ll}*(?:\\s\\p{Lu}\\p{Ll}*)*","Joe V")
See the regex demo #1 and regex demo #2.
Note that .matches requires a full string match, hence the use of ^ and $ anchors on the testing site and their absence in the code.
Details:
^ - start of string (implied in .matches)
[A-Z] / \p{Lu} - an (Unicode) uppercase letter
[a-z]* / \p{Ll}* - zero or more (Unicode) lowercase letters
(?:\s[A-Z][a-z]*)* / (?:\s\p{Lu}\p{Ll}*)* - zero or more sequences of
\s - one whitespace
[A-Z][a-z]* /\p{Lu}\p{Ll}* - an uppercase (Unicode) letter and then zero or more (Unicode) lowercase letters.
$ - end of string (implied in .matches)
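The two patterns above can be exercised with a small sketch like the one below. The class name, method names and test inputs are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

public class CapitalizedWordsDemo {
    private static final Pattern ASCII =
            Pattern.compile("[A-Z][a-z]*(?:\\s[A-Z][a-z]*)*");
    private static final Pattern UNICODE =
            Pattern.compile("\\p{Lu}\\p{Ll}*(?:\\s\\p{Lu}\\p{Ll}*)*");

    // matches() requires the whole input to match, so no anchors are needed.
    static boolean isAsciiName(String s) {
        return ASCII.matcher(s).matches();
    }

    static boolean isUnicodeName(String s) {
        return UNICODE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAsciiName("Joe V"));    // true
        System.out.println(isAsciiName("Joe  V"));   // false: two consecutive spaces
        System.out.println(isAsciiName("Joe v"));    // false: second word starts lowercase
        System.out.println(isAsciiName("Joe V2"));   // false: digit
        // \u00C9 is an uppercase E with acute accent
        System.out.println(isAsciiName("\u00C9mile Zola"));   // false
        System.out.println(isUnicodeName("\u00C9mile Zola")); // true
    }
}
```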

Regular Expression Pattern - Query

I am trying to create a pattern that would satisfy the rules below:
mydomain.com
www.mydomain.com
www.alternatedomain.com
www100.mydomain.com
online.mydomain.com
subl.mydomain.com
The pattern that I have created so far doesn't work.
I may or may not have values before mydomain, and if there are values, they should belong to a restricted set.
private static final String MY_PATTERN =
"((www*|online|subl)*\\.((mydomain|alternatedomain)\\.(com)$))";

I suggest using
String rx = "^(?:(?:www\\d*|online|subl)\\.)?(?:mydomain|alternatedomain)\\.com$";
See the regex demo
I removed or converted to non-capturing groups all capturing groups in the pattern. If you are using those parts of the string later, revert them.
If you use the regex with the .matches() method, remove ^ and $: they are redundant there, as the method makes sure the entire string matches the pattern.
Details
^ - start of string
(?:(?:www\\d*|online|subl)\\.)? - an optional non-capturing group matching
(?:www\\d*|online|subl) - www and 0+ digits, or online or subl
\\. - a dot
(?:mydomain|alternatedomain) - a non-capturing group matching either mydomain or alternatedomain
\\.com - .com substring
$ - end of string.

Try ((www\\d*|online|subl)\\.)?(mydomain|alternatedomain)\\.com
You can test your regex online here but don't forget to replace the \\ with a single \ (because in Java code \\ means a \ in regex)
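As a quick check, the pattern from the answer just above can be run against the hosts listed in the question. This is only a sketch; the class name, the helper and the rejected sample hosts are assumptions added for illustration.

```java
import java.util.regex.Pattern;

public class DomainPatternDemo {
    private static final Pattern DOMAIN = Pattern.compile(
            "((www\\d*|online|subl)\\.)?(mydomain|alternatedomain)\\.com");

    // matches() checks the entire host string, so ^ and $ are not needed.
    static boolean isAllowed(String host) {
        return DOMAIN.matcher(host).matches();
    }

    public static void main(String[] args) {
        String[] hosts = {
            "mydomain.com", "www.mydomain.com", "www.alternatedomain.com",
            "www100.mydomain.com", "online.mydomain.com", "subl.mydomain.com",
            "ftp.mydomain.com", "mydomain.org"
        };
        for (String host : hosts) {
            // The first six print true, the last two print false.
            System.out.println(host + " -> " + isAllowed(host));
        }
    }
}
```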

Java Regex, match pattern, pair of words

I am using a regex to check the correctness of a string in my application. I want to check whether the string has the following pattern: x=y&a=b&... where x, y, a, b etc. can be empty.
Example of correct strings:
abc=def&gef=cda&pdf=cdf
=&gef=def
abc=&gef=def
=abc&gef=def
Example of incorrect strings:
abc=def&gef=cda&
abc=def&gef==cda&
abc=defgef=cda&abc=gda
This is my code showing current solution:
String pattern = "[[a-zA-Z0-9]*[=]{1}[a-zA-Z0-9]*[&]{1}]*";
if(!Pattern.matches(pattern, s)){
throw new IllegalArgumentException(s);
}
This solution is bad because it accepts strings like:
abc=def&gef=def&
Can anyone help me with correct pattern?

You may use the following regex:
^[a-zA-Z0-9]*=[a-zA-Z0-9]*(?:&[a-zA-Z0-9]*=[a-zA-Z0-9]*)*$
See the regex demo
When used with matches(), the ^ and $ anchors may be omitted.
Details:
^ - start of string
[a-zA-Z0-9]* - 0+ alphanumeric chars (may be replaced with \p{Alnum})
= - a = symbol
[a-zA-Z0-9]* - 0+ alphanumeric chars
(?: - start of a non-capturing group matching sequences of...
& - a & symbol
[a-zA-Z0-9]*=[a-zA-Z0-9]* - same as above
)* - ... zero or more occurrences
$ - end of string
NOTE: If you want to make the pattern more generic, you may match any char other than = and & with a [^&=] pattern that would replace a more restrictive [a-zA-Z0-9] pattern:
^[^=&]*=[^=&]*(?:&[^=&]*=[^=&]*)*$
See this regex demo
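A minimal sketch of how the first pattern could replace the check from the question follows. The class and method names are hypothetical, and the sample strings are the ones listed in the question.

```java
import java.util.regex.Pattern;

public class QueryStringCheck {
    // One key=value pair, then zero or more "&key=value" pairs.
    private static final Pattern PAIRS = Pattern.compile(
            "[a-zA-Z0-9]*=[a-zA-Z0-9]*(?:&[a-zA-Z0-9]*=[a-zA-Z0-9]*)*");

    static boolean isValid(String s) {
        return PAIRS.matcher(s).matches();
    }

    // Mirrors the question's code: reject anything that does not match.
    static void validate(String s) {
        if (!isValid(s)) {
            throw new IllegalArgumentException(s);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc=def&gef=cda&pdf=cdf", "=&gef=def", "abc=&gef=def", "=abc&gef=def",
            "abc=def&gef=cda&", "abc=def&gef==cda&", "abc=defgef=cda&abc=gda"
        };
        for (String s : samples) {
            // The first four print true, the last three print false.
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```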

I believe you want this.
([a-zA-Z0-9]*=[a-zA-Z0-9]*&)*[a-zA-Z0-9]*=[a-zA-Z0-9]*
This matches any number of repetitions like x=y, with a & after each one; followed by one repetition like x=y without the following &.

Here you go:
^\w*=\w*(?:&(?:\w*=\w*))*$
^ is the starting anchor
\w*=\w* represents a parameter like abc=def
\w matches a word character [a-zA-Z0-9_]
\w* represents 0 or more word characters
& represents the actual ampersand literal
(?:&(?:\w*=\w*))* matches any subsequent parameters like &b=d etc.
$ represents the ending anchor
Regex101 Demo
EDIT: Made all groups non-capturing.
Note: As @WiktorStribiżew has pointed out in the comments, \w will match _ as well, so the above regex should be modified to exclude underscores if they are to be avoided in the pattern, i.e. use [A-Za-z0-9] instead of \w.

Restrict consecutive characters using Java Regex

I need to allow alphanumeric characters, "?", ".", "/" and "-" in the given string, but I need to disallow consecutive hyphens (--).
For example:
www.google.com/flights-usa should be valid
www.google.com/flights--usa should be invalid
Currently I'm using ^[a-zA-Z0-9\\/\\.\\?\\_\\-]+$.
Please suggest how to restrict consecutive hyphens only.

You may use grouping with quantifiers:
^[a-zA-Z0-9/.?_]+(?:-[a-zA-Z0-9/.?_]+)*$
See the regex demo
Details:
^ - start of string
[a-zA-Z0-9/.?_]+ - 1 or more characters from the set defined in the character class (can be replaced with [\w/.?]+)
(?:-[a-zA-Z0-9/.?_]+)* - zero or more sequences ((?:...)*) of:
- - hyphen
[a-zA-Z0-9/.?_]+ - see above
$ - end of string.
Or use a negative lookahead:
^(?!.*--)[a-zA-Z0-9/.?_-]+$
 ^^^^^^^^
See the demo here
Details:
^ - start of string
(?!.*--) - a negative lookahead that will fail the match once the regex engine finds a -- substring after any 0+ chars other than a newline
[a-zA-Z0-9/.?_-]+ - 1 or more chars from the set defined in the character class
$ - end of string.
Note that [a-zA-Z0-9_] = \w if you do not use the Pattern.UNICODE_CHARACTER_CLASS flag. So, the first would look like "^[\\w/.?]+(?:-[\\w/.?]+)*$" and the second as "^(?!.*--)[\\w/.?-]+$".
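The two Java string forms above can be compared side by side with a sketch like this one (the class and method names are invented for the example). One difference worth noticing is that the grouping pattern also rejects a leading or trailing hyphen, while the lookahead pattern only rejects a doubled one.

```java
import java.util.regex.Pattern;

public class HyphenRuleDemo {
    // Grouping: runs of allowed chars separated by single hyphens.
    private static final Pattern GROUPED =
            Pattern.compile("^[\\w/.?]+(?:-[\\w/.?]+)*$");
    // Lookahead: any allowed chars, as long as "--" never appears.
    private static final Pattern LOOKAHEAD =
            Pattern.compile("^(?!.*--)[\\w/.?-]+$");

    static boolean matchesGrouped(String s) {
        return GROUPED.matcher(s).matches();
    }

    static boolean matchesLookahead(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String valid = "www.google.com/flights-usa";
        String invalid = "www.google.com/flights--usa";
        System.out.println(matchesGrouped(valid));     // true
        System.out.println(matchesLookahead(valid));   // true
        System.out.println(matchesGrouped(invalid));   // false
        System.out.println(matchesLookahead(invalid)); // false
        // The patterns differ on a trailing hyphen:
        System.out.println(matchesGrouped("flights-"));   // false
        System.out.println(matchesLookahead("flights-")); // true
    }
}
```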

One approach is to restrict multiple dashes with negative look-behind on a dash, like this:
^(?:[a-zA-Z0-9\/\.\?\_]|(?<!-)-)+$
The right side of the |, i.e. (?<!-)-, means "a dash, unless preceded by another dash".
Demo.

I'm not sure of the efficiency of this, but I believe this should work.
^([a-zA-Z0-9\/\.\?\_]|\-(?!\-))+$
For each character, this regex checks whether it can match [a-zA-Z0-9\/\.\?\_], which is everything you included in your regex except the hyphen. If that does not match, it instead tries to match \-(?!\-), which matches a hyphen that is not followed by another hyphen (this includes a hyphen at the end of the string).
Here's a demo.

Java regex "[.]" vs "."

I'm trying to use some regex in Java and I came across this when debugging my code.
What's the difference between [.] and .?
I was surprised that .at would match "cat" but [.]at wouldn't.

[.] matches a dot (.) literally, while . matches any character except line terminators such as \n (unless you use DOTALL mode).
You can also use \. ("\\." if you use java string literal) to literally match dot.
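A short sketch of the difference, using the "cat" example from the question (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;

public class DotDemo {
    // Unescaped dot: any single character except a line terminator.
    static boolean anyCharThenAt(String s) {
        return s.matches(".at");
    }

    // Dot inside a character class: a literal dot only.
    static boolean literalDotThenAt(String s) {
        return s.matches("[.]at");
    }

    public static void main(String[] args) {
        System.out.println(anyCharThenAt("cat"));     // true
        System.out.println(literalDotThenAt("cat"));  // false
        System.out.println(literalDotThenAt(".at"));  // true
        System.out.println(".at".matches("\\.at"));   // true: escaped dot is literal too
        System.out.println(anyCharThenAt("\nat"));    // false: . skips the newline
        // With DOTALL the dot also matches line terminators.
        System.out.println(
            Pattern.compile(".at", Pattern.DOTALL).matcher("\nat").matches()); // true
    }
}
```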

The [ and ] are metacharacters that let you define a character class. Most characters enclosed in square brackets, including the dot, are interpreted literally. You can include multiple characters as well:
[.=*&^$] // Matches any single character from the list '.','=','*','&','^','$'
There are two specific things you need to know about the [...] syntax:
The ^ symbol at the beginning of the group has a special meaning: it inverts what's matched by the group. For example, [^.] matches any character except a dot .
Dash - in between two characters means any code point between the two. For example, [A-Z] matches any single uppercase letter. You can use dash multiple times - for example, [A-Za-z0-9] means "any single upper- or lower-case letter or a digit".
The two constructs above (^ and -) are common to nearly all regex engines; some engines (such as Java's) define additional syntax specific only to these engines.
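The character-class rules described above can be tried out with a sketch along these lines. The class name and helper methods are assumptions for illustration only.

```java
public class CharClassDemo {
    // Ranges combined in one class: upper-case, lower-case, digits.
    static boolean isAlphanumeric(String s) {
        return s.matches("[A-Za-z0-9]+");
    }

    // Leading ^ negates the class: any character except a dot.
    static boolean hasNoDots(String s) {
        return s.matches("[^.]*");
    }

    public static void main(String[] args) {
        // Inside the class these are plain characters, not operators.
        System.out.println("$".matches("[.=*&^$]")); // true
        System.out.println("^".matches("[.=*&^$]")); // true: ^ is not first, so it is literal
        System.out.println("a".matches("[.=*&^$]")); // false
        System.out.println(hasNoDots("abc"));        // true
        System.out.println(hasNoDots("a.c"));        // false
        System.out.println(isAlphanumeric("Abc123")); // true
        System.out.println(isAlphanumeric("abc-1"));  // false
    }
}
```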

regular-expression constructs
. => matches any character (may or may not match line terminators)
To match a literal dot, use either of the following:
[.] => matches a dot
\\. => matches a dot
NOTE: A character class in a Java regular expression is defined using square brackets "[ ]"; this subexpression matches a single character from the specified set of possible characters.
Example: in the string address, replace every "." with "[.]":
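public class ReplaceDots {
    public static void main(String[] args) {
        String address = "1.1.1.1";
        // As a regex, "[.]" matches a literal dot; the replacement "[.]" is inserted as-is.
        System.out.println(address.replaceAll("[.]", "[.]")); // 1[.]1[.]1[.]1
    }
}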
if anything is missed please add :)
