Java regular expression matching two consecutive consonants - java

I'm trying to match only strings with two consecutive consonants, but no matter what input I give to myString this never evaluates to true, so I have to assume something is wrong with the syntax of my regex. Any ideas?
if (Pattern.matches("([^aeiou]&&[^AEIOU]){2}", myString)) {...}
Additional info:
myString is a substring of at most two characters
There is no whitespace, as this string is the output of a .split with a whitespace delimiter
I'm not worried about special characters, as the program just concatenates and prints the result, though if you'd like to show me how to include something like [b-z]&&[^eiou] in your answer I would appreciate it.
Edit:
After going through these answers and testing a little more, the code I finally used was
if (myString.matches("(?i)[b-z&&[^eiou]]{2}")) {...}

[^aeiou] matches non-letter characters as well, so you should use a different pattern:
Pattern rx = Pattern.compile("[bcdfghjklmnpqrstuvwxyz]{2}", Pattern.CASE_INSENSITIVE);
if (rx.matcher(myString).matches()) {...}
If you would like to use && for an intersection, you can do it like this:
"[a-z&&[^aeiou]]{2}"

To use character class intersection, you need to wrap your syntax inside of a bracketed expression. The below matches characters that are both lowercase letters and not vowels.
[a-z&&[^aeiou]]{2}
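A minimal sketch of the intersection approach, assuming only the standard java.util.regex package. The class and method names (ConsonantPairs, isConsonantPair) are made up for illustration, and both letter cases are spelled out inside the class instead of relying on a case-insensitive flag:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the posts above.
class ConsonantPairs {
    // Letters a-z or A-Z, intersected with "anything that is not a vowel".
    private static final Pattern TWO_CONSONANTS =
            Pattern.compile("[a-zA-Z&&[^aeiouAEIOU]]{2}");

    // True only if the whole string is exactly two consonants.
    static boolean isConsonantPair(String s) {
        return TWO_CONSONANTS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isConsonantPair("st")); // true
        System.out.println(isConsonantPair("Th")); // true
        System.out.println(isConsonantPair("sa")); // false, 'a' is a vowel
        System.out.println(isConsonantPair("s1")); // false, '1' is not a letter
    }
}
```

Note that this sketch treats y as a consonant, as the patterns in the answers do.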

Related

Regular expression for year with optional closing parenthesis

I am struggling to get the following regular expression (in Java) to work nicely. I want to see if a string has a year, and the strings can be
Mar 3, 2014
or sometimes with a closing parenthesis such as
Mar 3, 2014)
I am using
text.matches("\\b((19|20)\\d{2})(\\)?)\\b")
which works in most cases, but does not match if string ends at the parenthesis
If I use
text.matches("\\b((19|20)\\d{2})(\\)?)$")
it matches text that ends after the parenthesis but not a string that has another space
I thought that \b would include end of string, but cannot get it to work.
I know I can use two regex's but that seems really ugly.
Your main problem is that matches checks whether the entire string matches the regex. What you want is to test whether the string contains a substring that the regex can match. To do so, use
Pattern p = Pattern.compile(yourRegex);
Matcher m = p.matcher(stringYouWantToTest);
if (m.find()){
//tested string contains part which can be matched by regex
}else{
//part which could be matched by regex couldn't be found
}
You can also surround your regex with .* so that it also matches the characters around the part you want to find, and keep using matches as you do now,
if(yourString.matches(".*"+yourRegex+".*"))
but this has to iterate over the entire string.
In other words, you can try to find \\b(19|20)\\d{2}\\b using Pattern/Matcher, or use something like matches(".*\\b(19|20)\\d{2}\\b.*").
BTW, the parenthesis ) is not included in the \w class, so \b accepts the position between a \w character and ) as a word boundary; for instance, "9)" matches the regex \d\b\).
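The difference between find and matches can be sketched as follows. This is an illustration only: YearFinder and findYear are hypothetical names, and the pattern is the \\b(19|20)\\d{2}\\b expression from the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the posts above.
class YearFinder {
    private static final Pattern YEAR = Pattern.compile("\\b(19|20)\\d{2}\\b");

    // Returns the first year-like token found anywhere in text, or null if none.
    static String findYear(String text) {
        Matcher m = YEAR.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findYear("Mar 3, 2014"));           // 2014
        System.out.println(findYear("Mar 3, 2014)"));          // 2014
        System.out.println(findYear("Mar 3, 2014) and more")); // 2014
        System.out.println(findYear("no year here"));          // null
        // matches() needs the whole string to fit the pattern, so this is false.
        System.out.println("Mar 3, 2014)".matches(YEAR.pattern()));
    }
}
```

Because \b also holds at the end of the input after a digit, the same pattern covers a year at the very end of the string and a year followed by a closing parenthesis.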
Your question isn't very clear, but from what I understand, this should work for you:
text.matches("((?:19|20)(?:\\d){2})\\)?");
Demo: http://regex101.com/r/lO0aH4/3
You could try something like :
".*(19|20)[0-9]{2}\\)?$"
I'm not sure this will help; it would be better to give us a complete example of the strings to match. Must the string end with a year (with an optional parenthesis), or can something else follow it?

Split on non arabic characters

I have a String like this
أصبح::ينال::أخذ::حصل (على)::أحضر
And I want to split it on non Arabic characters using java
And here's my code
String s = "أصبح::ينال::أخذ::حصل (على)::أحضر";
String[] arr = s.split("^\\p{InArabic}+");
System.out.println(Arrays.toString(arr));
And the output was
[, ::ينال::أخذ::حصل (على)::أحضر]
But I expect the output to be
[ينال,أخذ,حصل,على,أحضر]
So I don't know what's wrong with this?
You need a negated class, and to do that, you need square brackets [ ... ]. Try to split with this:
"[^\\p{InArabic}]+"
If \\p{InArabic} matches any arabic character, then [^\\p{InArabic}] will match any non-arabic character.
Another option you can consider is an equivalent syntax, using P instead of p to indicate the opposite of the \\p{InArabic} character class, as @Pshemo mentioned:
"\\P{InArabic}+"
This works the same way that \\W is the opposite of \\w.
The only possible advantage of the first syntax over the second (again, as @Pshemo mentioned) is that it is more flexible if you want to add other characters to the list of characters that shouldn't match. For example, if you want to match all non-\\p{InArabic} characters except periods:
"[^\\p{InArabic}.]+"
                ^
Otherwise, if you really want to use \\P{InArabic}, you'll need subtraction within classes:
"[\\P{InArabic}&&[^.]]+"
The expression you want is "\\P{InArabic}+"
This means match any (non-zero) number of characters that are not Arabic.
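A short sketch of the split, assuming the standard String.split and the InArabic block name used above. ArabicSplit and splitOnNonArabic are hypothetical names, and the question's sample string is written with Unicode escapes only so that the source file stays plain ASCII:

```java
import java.util.Arrays;

// Hypothetical helper, not taken from the posts above.
class ArabicSplit {
    // Splits on every run of characters outside the Arabic Unicode block.
    static String[] splitOnNonArabic(String s) {
        return s.split("\\P{InArabic}+");
    }

    public static void main(String[] args) {
        // The sample string from the question, spelled with escapes.
        String s = "\u0623\u0635\u0628\u062D::\u064A\u0646\u0627\u0644::"
                 + "\u0623\u062E\u0630::\u062D\u0635\u0644 (\u0639\u0644\u0649)::"
                 + "\u0623\u062D\u0636\u0631";
        String[] parts = splitOnNonArabic(s);
        System.out.println(parts.length); // 6
        System.out.println(Arrays.toString(parts));
    }
}
```

The first word is part of the result because it precedes the first delimiter. If the input began with a non-Arabic character, split would return an empty string as the first element.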

Java regex match all characters except

What is the correct syntax for matching all characters except specific ones.
For example I'd like to match everything but letters [A-Z] [a-z] and numbers [0-9].
I have
string.matches("[^[A-Z][a-z][0-9]]")
Is this incorrect?
Yes, you don't need nested [] like that. Use this instead:
"[^A-Za-z0-9]"
It's all one character class.
If you want to match anything but letters, you should have a look into Unicode properties.
\p{L} is any kind of letter from any language
Using an uppercase "P" instead gives the negation, so \P{L} matches anything that is not a letter.
\d or \p{Nd} matches digits
So your expression in modern Unicode style would look like this
Either using a negated character class
[^\p{L}\p{Nd}]
or intersecting negated properties (simply listing them as [\P{L}\P{Nd}] would be a union, which matches every character)
[\P{L}&&[\P{Nd}]]
The next thing is that matches() matches the expression against the complete string, so your expression is only true when the string is exactly one character long. So you would need to add a quantifier:
string.matches("[^\\p{L}\\p{Nd}]+")
returns true when the complete string consists only of non-alphanumeric characters and contains at least one of them.
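As a rough sketch of the negated Unicode class in use (NonAlphanumeric and isOnlySymbols are hypothetical names; note that the backslashes are doubled inside a Java string literal):

```java
// Hypothetical helper, not taken from the posts above.
class NonAlphanumeric {
    // True when s is non-empty and contains no Unicode letter or decimal digit.
    static boolean isOnlySymbols(String s) {
        return s.matches("[^\\p{L}\\p{Nd}]+");
    }

    public static void main(String[] args) {
        System.out.println(isOnlySymbols("!?#"));    // true
        System.out.println(isOnlySymbols(" - "));    // true, spaces are not letters or digits
        System.out.println(isOnlySymbols("a!"));     // false, contains a letter
        System.out.println(isOnlySymbols("7"));      // false, contains a digit
        System.out.println(isOnlySymbols("\u00E9")); // false, an accented letter is still a letter
        System.out.println(isOnlySymbols(""));       // false, + requires at least one character
    }
}
```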
Almost right. What you want is:
string.matches("[^A-Za-z0-9]")
string.matches("[^A-Za-z0-9]")
Let's say that you want to make sure that no Strings have the _ symbol in them, then you would simply use something like this.
Pattern pattern = Pattern.compile("_");
Matcher matcher = pattern.matcher(stringName);
if(!matcher.find()){
System.out.println("Valid String");
}else{
System.out.println("Invalid String");
}
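The find() approach above generalizes from a single symbol to a whole negated class. A small sketch, with hypothetical names (DisallowedChars, firstDisallowedIndex), that reports where the first character outside A-Z, a-z, and 0-9 occurs:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the posts above.
class DisallowedChars {
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    // Index of the first character that is not an ASCII letter or digit, or -1 if none.
    static int firstDisallowedIndex(String s) {
        Matcher m = NON_ALNUM.matcher(s);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstDisallowedIndex("abc123")); // -1
        System.out.println(firstDisallowedIndex("ab_c"));   // 2
        System.out.println(firstDisallowedIndex("!x"));     // 0
    }
}
```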
You can negate character classes:
"[^abc]" // matches any character except a, b, or c (negation).
"[^a-zA-Z0-9]" // matches non-alphanumeric characters.

Regular expression to match a character only once before any whitespace

In Java, what regular expression would I use to match a string that has exactly one colon and makes sure that the colon appears before any whitespace?
For example, it should match these strings:
label: print "Enter input"
But: I still had the money.
ghjkdhfjkgjhalergfyujhrageyjdfghbg:
area:54
But not
label: print "Enter input:"
There was one more thing: I still had the money.
ghfdsjhgakjsdhfkjdsagfjkhadsjkhflgadsjklfglsd
area::54
If you use it with matches (which requires the entire string to match), you could use
[^\\s:]*:[^:]*
Which means: arbitrarily many non-whitespace, non-: characters, then a :, then more arbitrarily many non-: characters.
I've really only used two regex concepts: (negated) character classes and repetition.
If you want to require at least one character before or after :, replace the corresponding * with + (as jlordo pointed out in a comment).
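To check this pattern against the sample strings from the question, a minimal sketch could look like this (LabelLine and hasSingleLeadingColon are hypothetical names):

```java
// Hypothetical helper, not taken from the posts above.
class LabelLine {
    // Whole string: non-whitespace, non-colon characters, one colon, then no further colon.
    static boolean hasSingleLeadingColon(String s) {
        return s.matches("[^\\s:]*:[^:]*");
    }

    public static void main(String[] args) {
        // Strings the question wants to match.
        System.out.println(hasSingleLeadingColon("label: print \"Enter input\"")); // true
        System.out.println(hasSingleLeadingColon("But: I still had the money."));  // true
        System.out.println(hasSingleLeadingColon("area:54"));                      // true
        // Strings the question wants to reject.
        System.out.println(hasSingleLeadingColon("label: print \"Enter input:\"")); // false
        System.out.println(hasSingleLeadingColon(
                "There was one more thing: I still had the money."));              // false
        System.out.println(hasSingleLeadingColon("area::54"));                     // false
    }
}
```

Since the pattern uses negated classes instead of a dot, line breaks in the input need no special flag.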
The following should work:
^[^\s:]*:(?!.*:)
If your strings can contain line breaks, use the DOTALL flag or change the regex to the following:
(?s)^[^\s:]*:(?!.*:)
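The lookahead variant is meant to be used with find() rather than matches(), because it only consumes the text up to the colon. A sketch under that assumption, with hypothetical names (SingleColon, matchesRule) and Pattern.DOTALL standing in for the inline (?s) flag:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the posts above.
class SingleColon {
    // Anchored prefix without whitespace or colons, a colon, then no colon anywhere after it.
    private static final Pattern ONE_COLON_FIRST =
            Pattern.compile("^[^\\s:]*:(?!.*:)", Pattern.DOTALL);

    static boolean matchesRule(String s) {
        return ONE_COLON_FIRST.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesRule("area:54"));  // true
        System.out.println(matchesRule("area::54")); // false
        System.out.println(matchesRule("a:\nb:"));   // false, DOTALL lets the lookahead see past the line break
    }
}
```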
It depends on what we call whitespace; it could be
[^\\p{Space}:]*:[^:]*
The following should get you started:
Matcher MatchedPattern = Pattern.compile("^(\\w+\\:{1}[\"\\w\\s\\.]*)$").matcher("yourstring");

Regex for java's String.matches method?

Basically my question is this, why is:
String word = "unauthenticated";
word.matches("[a-z]");
returning false? (Developed in Java 1.6)
Basically I want to see if a string passed to me has alpha chars in it.
The String.matches() function matches your regular expression against the whole string (as if your regex had ^ at the start and $ at the end). If you want to search for a regular expression somewhere within a string, use Matcher.find().
The correct method depends on what you want to do:
Check to see whether your input string consists entirely of alphabetic characters (String.matches() with [a-z]+)
Check to see whether your input string contains any alphabetic character (and perhaps some others) (Matcher.find() with [a-z])
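The two checks listed above can be sketched side by side. AlphaChecks and its method names are hypothetical, chosen only for this illustration:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the posts above.
class AlphaChecks {
    private static final Pattern LOWER = Pattern.compile("[a-z]");

    // Whole string must consist of one or more lowercase ASCII letters.
    static boolean isAllLowercaseLetters(String s) {
        return s.matches("[a-z]+");
    }

    // At least one lowercase ASCII letter somewhere in the string.
    static boolean containsLowercaseLetter(String s) {
        return LOWER.matcher(s).find();
    }

    public static void main(String[] args) {
        String word = "unauthenticated";
        System.out.println(word.matches("[a-z]"));           // false, the pattern covers one character only
        System.out.println(isAllLowercaseLetters(word));     // true
        System.out.println(isAllLowercaseLetters("un4"));    // false
        System.out.println(containsLowercaseLetter("123a")); // true
        System.out.println(containsLowercaseLetter("123"));  // false
    }
}
```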
Your code is checking to see if the word matches one character. What you want to check is if the word matches any number of alphabetic characters like the following:
word.matches("[a-z]+");
With [a-z] you match exactly ONE character.
What you’re probably looking for is [a-z]*
