java regex to accept any word other than none - java

I need a regular expression to match any string other than none.
I tried using
regular exp ="^[^none]$",
But it does not work.

If you are matching a String against a specific word in Java you should use equals(). In this case you want to invert the match so your logic becomes:
if(!theString.equals("none")) {
// do stuff here
}
Much less resource hungry, and much more intuitive.
If you need to match a String which contains the word "none", note that String.matches() only succeeds when the entire string matches the pattern, so use Matcher.find() instead:
if(Pattern.compile("\\bnone\\b").matcher(theString).find()) {
/* finds "none" in theString only if it is enclosed between
* “word boundaries”, so it will not match for example: "nonetheless"
*/
}
Or, if you can be fairly certain that “word boundaries” mean a specific delimiter, you can still avoid regular expressions by using the indexOf() method:
int i = theString.indexOf("none");
if(i > -1) {
if(i > 0) {
// check theString.charAt(i - 1) to see if it is a word boundary
// e.g.: whitespace
}
// the 4 is because of the fact that "none" is 4 characters long.
if((theString.length() - i - 4) > 0) {
// check theString.charAt(i + 4) to see if it is a word boundary
// e.g.: whitespace
}
}
else {
// not found.
}
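The indexOf() outline above could be completed roughly as follows. This is only a sketch: it assumes whitespace is the only delimiter that counts as a word boundary, and the class and method names (WordSearch, containsWord) are made up for illustration.

```java
class WordSearch {
    // Returns true if `word` occurs in `text` with whitespace (or the start/end
    // of the string) on both sides. Treating whitespace as the only delimiter
    // is an assumption of this sketch.
    static boolean containsWord(String text, String word) {
        if (word.isEmpty()) {
            return false; // nothing sensible to search for
        }
        int i = text.indexOf(word);
        while (i > -1) {
            int end = i + word.length();
            boolean startOk = i == 0 || Character.isWhitespace(text.charAt(i - 1));
            boolean endOk = end == text.length() || Character.isWhitespace(text.charAt(end));
            if (startOk && endOk) {
                return true;
            }
            // This occurrence was part of a longer word; look for the next one.
            i = text.indexOf(word, i + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsWord("there is none left", "none"));
        System.out.println(containsWord("nonetheless", "none"));
    }
}
```

Unlike the single indexOf() call in the outline, the loop keeps searching when the first occurrence turns out to be part of a longer word.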

You can use the regular expression (?!^none$).*. See this question for details: Regex inverse matching on specific string?
The reason "^[^none]$" doesn't work is that [^none] is a character class, not a negated word: the pattern matches only strings that consist of exactly one character other than "n", "o", or "e".
Of course, it would be easier to just use String.equals like so: !"none".equals(testString).
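To see the difference concretely, the sketch below runs the character-class attempt, a lookahead pattern, and the plain equals() check against a few inputs. The class and method names are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

class NotNoneDemo {
    // The attempt from the question: a character class, so it accepts exactly
    // one character that is not 'n', 'o', or 'e'.
    static final Pattern CHAR_CLASS = Pattern.compile("^[^none]$");

    // Negative lookahead: accept anything unless the whole input is "none".
    static final Pattern NOT_NONE = Pattern.compile("^(?!none$).*$");

    static boolean acceptedByRegex(String s) {
        return NOT_NONE.matcher(s).matches();
    }

    static boolean acceptedByEquals(String s) {
        return !"none".equals(s);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"none", "nonetheless", "x", "hello", ""}) {
            System.out.println("'" + s + "': charClass=" + CHAR_CLASS.matcher(s).matches()
                    + ", lookahead=" + acceptedByRegex(s)
                    + ", equals=" + acceptedByEquals(s));
        }
    }
}
```

The regex and equals() versions agree on ordinary single-line input. They can differ for strings containing line breaks, because . does not match line terminators by default.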

Actually this is the regex to match all words except "word":
Pattern regex = Pattern.compile("\\b(?!word\\b)\\w+\\b");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// matched text: regexMatcher.group()
// match start: regexMatcher.start()
// match end: regexMatcher.end()
}
You must use word boundaries so that "word" is not contained in other words.
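As a rough illustration of the same pattern, the sketch below collects every word except an excluded one into a list. The helper name wordsExcept is an assumption, and Pattern.quote() is added so that the excluded word is always treated literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordsExcept {
    // Returns all \w+ words in `text` except those equal to `excluded`.
    static List<String> wordsExcept(String excluded, String text) {
        Pattern p = Pattern.compile("\\b(?!" + Pattern.quote(excluded) + "\\b)\\w+\\b");
        Matcher m = p.matcher(text);
        List<String> words = new ArrayList<>();
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // "nonetheless" is kept because "none" is not followed by a word boundary there.
        System.out.println(wordsExcept("none", "none of the nonetheless none"));
    }
}
```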
Explanation:
"
\b # Assert position at a word boundary
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
word # Match the characters “word” literally
\b # Assert position at a word boundary
)
\w # Match a single character that is a “word character” (letters, digits, etc.)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b # Assert position at a word boundary
"

This is the regex you are looking for:
Pattern p = Pattern.compile("^(?!none$).*$");
String s = "your string";
Matcher m = p.matcher(s);
System.out.println(s + ": " + (m.matches() ? "Match" : "NO Match"));
That said, if you are not forced to use a regex that matches everything but "none", it is simpler, faster, and clearer to match "none" itself:
Pattern p = Pattern.compile("^none$");
Then, you just exclude the matches.
Matcher m = p.matcher(s);
System.out.println(s + ": " + (m.matches() ? "NO Match" : "Match"));

Related

Regex NotBlank and doesnt contains <

I try to create a regex for a String which is NotBlank and cannot contain "<".
My question is what Im doing wrong thank you.
"(\\A(?!\\s*\\Z))|([^<]+)"
Edit
Maybe this way how to combine this regex
^[^<]+$
with this regex
\\A(?!\\s*\\Z).+
With regex, you can use
\A(?!\s+\z)[^<]+\z
(?U)\A(?!\s+\z)[^<]+\z
The (?U) is only necessary when you expect any Unicode chars in the input.
In Java, when used with matches, the anchors on both ends are implicit:
text.matches("(?U)(?!\\s+\\z)[^<]+")
The regex in matches is executed once and requires the full string match. Here, it matches
\A - (implicit in matches) - start of string
(?U) - Pattern.UNICODE_CHARACTER_CLASS option enabled so that \s could match any Unicode whitespaces
(?!\\s+\\z) - negative lookahead: the string must not consist solely of one or more whitespace characters
[^<]+ - one or more chars other than <
\z - (implicit in matches) - end of string.
See the Java test:
String texts[] = {"Abc <<", " ", "", "abc 123"};
Pattern p = Pattern.compile("(?U)(?!\\s+\\z)[^<]+");
for(String text : texts)
{
Matcher m = p.matcher(text);
System.out.println("'" + text + "' => " + m.matches());
}
Output:
'Abc <<' => false
' ' => false
'' => false
'abc 123' => true
See an online regex test (modified to fit the single multiline string demo environment so as not to cross over line boundaries.)
You can try to use this regex:
[^<\s]+
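It matches one or more characters that are neither "<" nor whitespace. Note that this is stricter than NotBlank: it rejects any string containing whitespace, such as "abc 123".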
Here is the example to test it: https://regex101.com/r/9ptt15/2
However, you can also solve it without a regular expression. The check below rejects null, blank (empty or whitespace-only) strings, and any string containing "<":
boolean isValid = s != null && !s.trim().isEmpty() && s.indexOf('<') == -1;
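For comparison, here is a small sketch that puts a regex-based check and a plain-Java check side by side, using the inputs from the test earlier in this thread. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class NotBlankNoAngle {
    // Compiled once and reused; matches() anchors the pattern at both ends.
    private static final Pattern VALID = Pattern.compile("(?U)(?!\\s+\\z)[^<]+");

    static boolean isValidRegex(String s) {
        return s != null && VALID.matcher(s).matches();
    }

    // Plain-Java variant. trim() strips only characters <= U+0020, so exotic
    // Unicode whitespace is treated differently than by (?U)\s.
    static boolean isValidPlain(String s) {
        return s != null && !s.trim().isEmpty() && s.indexOf('<') == -1;
    }

    public static void main(String[] args) {
        for (String text : new String[] {"Abc <<", " ", "", "abc 123"}) {
            System.out.println("'" + text + "' => regex=" + isValidRegex(text)
                    + ", plain=" + isValidPlain(text));
        }
    }
}
```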

Detect non Latin characters with regex Pattern in Java

I THINK Latin characters are what I mean in my question, but I'm not entirely sure what the correct classification is. I'm trying to use a regex Pattern to test if a string contains non Latin characters. I'm expecting the following results
"abcDE 123"; // Yes, this should match
"!##$%^&*"; // Yes, this should match
"aaàààäää"; // Yes, this should match
"ベビードラ"; // No, this shouldn't match
"😀😃😄😆"; // No, this shouldn't match
My understanding is that the built-in {IsLatin} preset simply detects if any of the characters are Latin. I want to detect if any characters are not Latin.
Pattern LatinPattern = Pattern.compile("\\p{IsLatin}");
Matcher matcher = LatinPattern.matcher(str);
if (!matcher.find()) {
System.out.println("is NON latin");
return;
}
System.out.println("is latin");
TL;DR: Use regex ^[\p{Print}\p{IsLatin}]*$
You want a regex that matches if the string consists of:
Spaces
Digits
Punctuation
Latin characters (Unicode script "Latin")
Easiest way is to combine \p{IsLatin} with \p{Print}, where Pattern defines \p{Print} as:
\p{Print} - A printable character: [\p{Graph}\x20]
\p{Graph} - A visible character: [\p{Alnum}\p{Punct}]
\p{Alnum} - An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Alpha} - An alphabetic character: [\p{Lower}\p{Upper}]
\p{Lower} - A lower-case alphabetic character: [a-z]
\p{Upper} - An upper-case alphabetic character: [A-Z]
\p{Digit} - A decimal digit: [0-9]
\p{Punct} - Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\x20 - A space:
Which makes \p{Print} the same as [\p{ASCII}&&\P{Cntrl}], i.e. ASCII characters that are not control characters.
The \p{Alpha} part overlaps with \p{IsLatin}, but that's fine, since the character class eliminates duplicates.
So, regex is: ^[\p{Print}\p{IsLatin}]*$
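One way to package this regex is a small helper that uses matches(), which makes the ^ and $ anchors unnecessary. This is a sketch only: the names LatinCheck and isLatinOnly are made up, and the non-ASCII inputs are written as Unicode escapes so the example does not depend on source file encoding.

```java
import java.util.regex.Pattern;

class LatinCheck {
    // Printable ASCII plus any character in the Unicode Latin script.
    private static final Pattern LATIN_OR_PRINTABLE =
            Pattern.compile("[\\p{Print}\\p{IsLatin}]*");

    // True if every character is printable ASCII or Latin-script.
    // Because of the * quantifier, the empty string also returns true.
    static boolean isLatinOnly(String s) {
        return LATIN_OR_PRINTABLE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLatinOnly("abcDE 123"));
        System.out.println(isLatinOnly("aa\u00e0\u00e4"));   // a-grave, a-diaeresis
        System.out.println(isLatinOnly("\u30d9\u30d3"));     // Katakana
        System.out.println(isLatinOnly("\uD83D\uDE00"));     // emoji (surrogate pair)
    }
}
```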
Test
Pattern latinPattern = Pattern.compile("^[\\p{Print}\\p{IsLatin}]*$");
String[] inputs = { "abcDE 123", "!##$%^&*", "aaàààäää", "ベビードラ", "😀😃😄😆" };
for (String input : inputs) {
System.out.print("\"" + input + "\": ");
Matcher matcher = latinPattern.matcher(input);
if (! matcher.find()) {
System.out.println("is NON latin");
} else {
System.out.println("is latin");
}
}
Output
"abcDE 123": is latin
"!##$%^&*": is latin
"aaàààäää": is latin
"ベビードラ": is NON latin
"😀😃😄😆": is NON latin
The Latin Unicode blocks covering U+0000–U+024F are:
\p{InBasic_Latin}: U+0000–U+007F
\p{InLatin-1_Supplement}: U+0080–U+00FF
\p{InLatin_Extended-A}: U+0100–U+017F
\p{InLatin_Extended-B}: U+0180–U+024F
So, the answer is either
Pattern LatinPattern = Pattern.compile("^[\\p{InBasicLatin}\\p{InLatin-1Supplement}\\p{InLatinExtended-A}\\p{InLatinExtended-B}]+$");
Pattern LatinPattern = Pattern.compile("^[\\x00-\\x{024F}]+$"); //U+0000-U+024F
Note that underscores are removed from the Unicode property class names in Java.
See the Java demo:
List<String> strs = Arrays.asList(
"abcDE 123", // Yes, this should match
"!##$%^&*", // Yes, this should match
"aaàààäää", // Yes, this should match
"ベビードラ", // No, this shouldn't match
"😀😃😄😆"); // No, this shouldn't match
Pattern LatinPattern = Pattern.compile("^[\\p{InBasicLatin}\\p{InLatin-1Supplement}\\p{InLatinExtended-A}\\p{InLatinExtended-B}]+$");
//Pattern LatinPattern = Pattern.compile("^[\\x00-\\x{024F}]+$"); //U+0000-U+024F
for (String str : strs) {
Matcher matcher = LatinPattern.matcher(str);
if (!matcher.find()) {
System.out.println(str + " => is NON Latin");
//return;
} else {
System.out.println(str + " => is Latin");
}
}
Note: if you replace .find() with .matches(), you can throw away ^ and $ in the pattern.
Output:
abcDE 123 => is Latin
!##$%^&* => is Latin
aaàààäää => is Latin
ベビードラ => is NON Latin
😀😃😄😆 => is NON Latin

Checking if there is whitespace between two elements in a String

I am working with Strings where I need to separate two chars/elements if there is whitespace between them. I have seen a former post on SO about the same problem, but it has not worked for me as intended. As you would assume, I could just check if the String contains(" ") and then substring around the space. However, my strings may contain any number of whitespace characters at the end despite not having whitespace between characters. Hence my question is: how do I detect whitespace between two chars (numbers too)?
//Example with numbers in a String
String test = "2 2";
final Pattern P = Pattern.compile("^(\\d [\\d\\d] )*\\d$");
final Matcher m = P.matcher(test);
if (m.matches()) {
System.out.println("There is between space!");
}
You would use String.strip() (available since Java 11; use trim() on older versions) to remove any leading or trailing whitespace, followed by String.split(). If there is whitespace, the array will be of length 2 or greater. If there is not, it will be of length 1.
Example:
String test = " 2 2 ";
test = test.strip(); // Removes whitespace, test is now "2 2"
String[] testSplit = test.split(" "); // Splits the string, testSplit is ["2", "2"]
if (testSplit.length >= 2) {
System.out.println("There is whitespace!");
} else {
System.out.println("There is no whitespace");
}
If you need an array of a specified length, you can also specify a limit to split. For example:
"a b c".split(" ", 2); // Returns ["a", "b c"]
If you want a solution that only uses regex, the following regex matches any two groups of characters separated by a single space, with any amount of leading or trailing whitespace:
\s*(\S+\s\S+)\s*
Positive lookahead and lookbehind may also work, using the regex (?<=\\w)\\s(?=\\w):
\\w : a word character [a-zA-Z_0-9]
\\s : whitespace
(?<=\\w)\\s : positive lookbehind, matches a whitespace character preceded by a \\w
\\s(?=\\w) : positive lookahead, matches a whitespace character followed by a \\w
List<String> testList = Arrays.asList("2 2", " 245 ");
Pattern p = Pattern.compile("(?<=\\w)\\s(?=\\w)");
for (String str : testList) {
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(str + "\t: There is a space!");
} else {
System.out.println(str + "\t: There is not a space!");
}
}
Output:
2 2 : There is a space!
245 : There is not a space!
The reason your pattern does not work as expected is that ^(\\d [\\d\\d] )*\\d$, which can be simplified to ^(\\d \\d )*\\d$, starts by repeating 0 or more times what is between the parentheses.
Then it matches a digit at the end of the string. As the repetition is 0 or more times, it is optional, so the pattern also matches just a single digit, while "2 2" is rejected because each repetition must end with a space.
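A quick way to confirm this behaviour is to run the pattern from the question against a few inputs. The class name below is made up for the demonstration.

```java
import java.util.regex.Pattern;

class DigitPatternDemo {
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("^(\\d [\\d\\d] )*\\d$");

    static boolean matches(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        // A lone digit matches (zero repetitions of the group), "2 2" does not,
        // and "2 2 2" matches (one repetition "2 2 " plus the final digit).
        for (String s : new String[] {"2", "2 2", "2 2 2"}) {
            System.out.println("'" + s + "' => " + matches(s));
        }
    }
}
```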
If you want to check if there is a single space between 2 non whitespace chars:
\\S \\S
final Pattern P = Pattern.compile("\\S \\S");
final Matcher m = P.matcher(test);
if (m.find()) {
System.out.println("There is between space!");
}
Here is the simplest way you can do it:
String testString = " Find if there is a space. ";
String trimmed = testString.trim(); // Strings are immutable: trim() returns a new String without leading and trailing spaces
boolean hasSpace = trimmed.contains(" "); // true if the trimmed string still contains a space
You can also use a shorthand method in one line by chaining the two methods:
String testString = " Find if there is a space. ";
boolean hasSpace = testString.trim().contains(" ");
Use
String text = "2 2";
Matcher m = Pattern.compile("\\S\\s+\\S").matcher(text.trim());
if (m.find()) {
System.out.println("Space detected.");
}
text.trim() will remove leading and trailing whitespaces, \S\s+\S pattern matches a non-whitespace, then one or more whitespace characters, and then a non-whitespace character again.
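The same check can be wrapped in a reusable helper, sketched below with a hypothetical method name. Since \S is required on both sides of the whitespace, leading and trailing whitespace can never produce a match, so this version skips the trim() call.

```java
import java.util.regex.Pattern;

class InnerWhitespace {
    // A non-whitespace char, one or more whitespace chars, a non-whitespace char.
    private static final Pattern INNER = Pattern.compile("\\S\\s+\\S");

    static boolean hasInnerWhitespace(String s) {
        return INNER.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2 2", " 245   ", "2 2   ", "a\tb"}) {
            System.out.println("'" + s + "' => " + hasInnerWhitespace(s));
        }
    }
}
```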

Masking using regular expressions for below format

I am trying to write a regular expression to mask the below string. Example below.
Input
A1../D//FASDFAS--DFASD//.F
Output (skip the first five and last two alphanumeric characters)
A1../D//FA***********D//.F
I am trying using below regex
([A-Za-z0-9]{5})(.*)(.{2})
Any help would be highly appreciated.
You can solve your issue by using Pattern and Matcher with a regex that matches multiple groups:
String str = "A1../D//FASDFAS--DFASD//.F";
Pattern pattern = Pattern.compile("(.*?\\/\\/..)(.*?)(.\\/\\/.*)");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
str = matcher.group(1)
+ matcher.group(2).replaceAll(".", "*")
+ matcher.group(3);
}
Detail
(.*?\\/\\/..) first group: everything up to and including the first // plus the next two characters
(.*?) second group: everything between groups one and three (the part to mask)
(.\\/\\/.*) third group: the character just before the next // and everything from there to the end of the string
Outputs
A1../D//FA***********D//.F
I think this solution is more readable.
If you want to do that with a single regex you may use
text = text.replaceAll("(\\G(?!^|(?:[0-9A-Za-z][^0-9A-Za-z]*){2}$)|^(?:[^0-9A-Za-z]*[0-9A-Za-z]){5}).", "$1*");
Or, using the POSIX character class Alnum:
text = text.replaceAll("(\\G(?!^|(?:\\p{Alnum}\\P{Alnum}*){2}$)|^(?:\\P{Alnum}*\\p{Alnum}){5}).", "$1*");
See the Java demo and the regex demo. If you plan to replace any code point rather than a single code unit with an asterisk, replace . with \P{M}\p{M}*+ ("\\P{M}\\p{M}*+").
To make . match line break chars, add (?s) at the start of the pattern.
Details
(\G(?!^|(?:[0-9A-Za-z][^0-9A-Za-z]*){2}$)|^(?:[^0-9A-Za-z]*[0-9A-Za-z]){5}) -
\G(?!^|(?:[0-9A-Za-z][^0-9A-Za-z]*){2}$) - a location after the successful match that is not followed with 2 occurrences of an alphanumeric char followed with 0 or more chars other than alphanumeric chars
| - or
^(?:[^0-9A-Za-z]*[0-9A-Za-z]){5} - start of string, followed with five occurrences of 0 or more non-alphanumeric chars followed with an alphanumeric char
. - any code unit other than line break characters (if you use \P{M}\p{M}*+ - any code point).
Usually, masking of characters in the middle of a string can be done using negative lookbehind (?<!) and positive lookahead groups (?=).
But in this case lookbehind group can't be used because it does not have an obvious maximum length due to unpredictable number of non-alphanumeric characters between first five alphanumeric characters (. and / in the A1../D//FA).
A substring method can be used as a workaround for the inability to use a negative lookbehind group:
String str = "A1../D//FASDFAS--DFASD//.F";
int start = str.replaceAll("^((?:\\W{0,}\\w{1}){5}).*", "$1").length();
String maskedStr = str.substring(0, start) +
str.substring(start).replaceAll(".(?=(?:\\W{0,}\\w{1}){2})", "*");
System.out.println(maskedStr);
// A1../D//FA***********D//.F
But the most straightforward way is to use java.util.regex.Pattern and java.util.regex.Matcher:
String str = "A1../D//FASDFAS--DFASD//.F";
Pattern pattern = Pattern.compile("^((?:\\W{0,}\\w{1}){5})(.+)((?:\\W{0,}\\w{1}){2})");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
String maskedStr = matcher.group(1) +
"*".repeat(matcher.group(2).length()) +
matcher.group(3);
System.out.println(maskedStr);
// A1../D//FA***********D//.F
}
\W{0,} - 0 or more non-word characters
\w{1} - exactly 1 word character (letter, digit, or underscore)
(\W{0,}\w{1}){5} - 5 word characters, each preceded by any number of non-word characters
(?:\W{0,}\w{1}){5} - the same, but not captured as a group
^((?:\\W{0,}\\w{1}){5})(.+)((?:\\W{0,}\\w{1}){2}) - substring with the first five alphanumeric characters (group 1), everything else (group 2), substring with the last 2 alphanumeric characters (group 3)
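For comparison, the masking can also be done without any regex by locating the boundaries with two loops. This is a different technique from the answers above, shown only as a sketch: the names MaskMiddle and mask are made up, "alphanumeric" is assumed to mean ASCII letters and digits, and both counts are assumed to be at least 1.

```java
class MaskMiddle {
    private static boolean isAsciiAlnum(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Keeps everything up to the keepStart-th alphanumeric character and
    // everything from the keepEnd-th alphanumeric character counted from the
    // end; replaces each character in between with '*'.
    static String mask(String s, int keepStart, int keepEnd) {
        int from = -1; // first index to mask
        int seen = 0;
        for (int i = 0; i < s.length() && from < 0; i++) {
            if (isAsciiAlnum(s.charAt(i)) && ++seen == keepStart) {
                from = i + 1;
            }
        }
        int to = -1; // first index of the kept suffix
        seen = 0;
        for (int i = s.length() - 1; i >= 0 && to < 0; i--) {
            if (isAsciiAlnum(s.charAt(i)) && ++seen == keepEnd) {
                to = i;
            }
        }
        if (from < 0 || to < 0 || from >= to) {
            return s; // too few alphanumerics, or nothing between prefix and suffix
        }
        StringBuilder sb = new StringBuilder(s);
        for (int i = from; i < to; i++) {
            sb.setCharAt(i, '*');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("A1../D//FASDFAS--DFASD//.F", 5, 2));
    }
}
```

Strings with too few alphanumeric characters are returned unchanged, which may or may not be the behaviour you want.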

Subtracting characters in a back reference from a character class in java.util.regex.Pattern

Is it possible to subtract the characters in a Java regex back reference from a character class?
e.g., I want to use String#matches(regex) to match either:
any group of characters that are [a-z'] that are enclosed by "
Matches: "abc'abc"
Doesn't match: "1abc'abc"
Doesn't match: 'abc"abc'
any group of characters that are [a-z"] that are enclosed by '
Matches: 'abc"abc'
Doesn't match: '1abc"abc'
Doesn't match: "abc'abc"
The following regex won't compile because [^\1] isn't supported:
(['"])[a-z'"&&[^\1]]*\1
Obviously, the following will work:
'[a-z"]*'|"[a-z']*"
But, this style isn't particularly legible when a-z is replaced by a much more complex character class that must be kept the same in each side of the "or" condition.
I know that, in Java, I can just use String concatenation like the following:
String charClass = "a-z";
String regex = "'[" + charClass + "\"]*'|\"[" + charClass + "']*\"";
But, sometimes, I need to specify the regex in a config file, like XML, or JSON, etc., where java code is not available.
I assume that what I'm asking is almost definitely not possible, but I figured it wouldn't hurt to ask...
One approach is to use a negative look-ahead to make sure that every character in between the quotes is not the quotes:
(['"])(?:(?!\1)[a-z'"])*+\1
         ^^^^^^
(I also make the quantifier possessive, since there is no use for backtracking here)
This approach is, however, rather inefficient, since the pattern will check for the quote character for every single character, on top of checking that the character is one of the allowed character.
The alternative with 2 branches in the question '[a-z"]*'|"[a-z']*" is better, since the engine only checks for the quote character once and goes through the rest by checking that the current character is in the character class.
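Both variants can be checked against the cases from the question with String#matches. The sketch below uses a hypothetical class name and shows the regexes with Java string escaping applied.

```java
class QuotedMatch {
    // Back reference plus negative lookahead, with a possessive quantifier.
    static final String LOOKAHEAD = "(['\"])(?:(?!\\1)[a-z'\"])*+\\1";

    // The two-branch alternative from the question.
    static final String ALTERNATION = "'[a-z\"]*'|\"[a-z']*\"";

    static boolean matchesLookahead(String s) {
        return s.matches(LOOKAHEAD);
    }

    static boolean matchesAlternation(String s) {
        return s.matches(ALTERNATION);
    }

    public static void main(String[] args) {
        String[] inputs = {"\"abc'abc\"", "\"1abc'abc\"", "'abc\"abc'", "'abc'abc'"};
        for (String s : inputs) {
            System.out.println(s + " => lookahead=" + matchesLookahead(s)
                    + ", alternation=" + matchesAlternation(s));
        }
    }
}
```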
You could use two patterns in one OR-separated pattern, expressing both your cases:
// | case 1: [a-z'] enclosed by "
// | | OR
// | | case 2: [a-z"] enclosed by '
Pattern p = Pattern.compile("(?<=\")([a-z']+)(?=\")|(?<=')([a-z\"]+)(?=')");
String[] test = {
// will match group 1 (for case 1)
"abcd\"efg'h\"ijkl",
// will match group 2 (for case 2)
"abcd'efg\"h'ijkl",
};
for (String t: test) {
Matcher m = p.matcher(t);
while (m.find()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
}
}
Output
efg'h
null
null
efg"h
Note
There is nothing stopping you from specifying the enclosing characters or the character class itself somewhere else, then building your Pattern with components unknown at compile-time.
Something along the lines of:
// both strings are emulating unknown-value arguments
String unknownEnclosingCharacter = "\"";
String unknownCharacterClass = "a-z'";
// probably want to catch a PatternSyntaxException here for potential
// issues with the given arguments
Pattern p = Pattern.compile(
String.format(
"(?<=%1$s)([%2$s]+)(?=%1$s)",
unknownEnclosingCharacter,
unknownCharacterClass
)
);
String[] test = {
"abcd\"efg'h\"ijkl",
"abcd'efg\"h'ijkl",
};
for (String t: test) {
Matcher m = p.matcher(t);
while (m.find()) {
// note: only main group here
System.out.println(m.group());
}
}
Output
efg'h
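If the enclosing character comes from configuration, it may be worth quoting it with Pattern.quote() so that characters such as | or * are treated literally. The sketch below is one possible shape, with hypothetical names; the character class body is deliberately left unquoted because it is meant to be regex syntax, which is why a PatternSyntaxException is still possible.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EnclosedPatternBuilder {
    // enclosing: literal delimiter text; charClassBody: contents of a [...] class.
    static Pattern build(String enclosing, String charClassBody) {
        String q = Pattern.quote(enclosing);
        try {
            return Pattern.compile("(?<=" + q + ")([" + charClassBody + "]+)(?=" + q + ")");
        } catch (PatternSyntaxException e) {
            throw new IllegalArgumentException("Invalid character class: " + charClassBody, e);
        }
    }

    // Returns the first enclosed run of allowed characters, or null if none.
    static String firstEnclosed(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        Pattern p = build("\"", "a-z'");
        System.out.println(firstEnclosed(p, "abcd\"efg'h\"ijkl"));
        System.out.println(firstEnclosed(p, "abcd'efg\"h'ijkl"));
    }
}
```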
