Regex NOT operator doesn't work - java

I'm trying to filter files in a folder. I need the files that don't end with ".xml-test". The following regex works as expected (ok1,ok2,ok3 = false, ok4 = true)
String regex = ".+\\.xml\\-test$";
boolean ok1 = Pattern.matches(regex, "database123.xml");
boolean ok2 = Pattern.matches(regex, "database123.sql");
boolean ok3 = Pattern.matches(regex, "log_file012.txt");
boolean ok4 = Pattern.matches(regex, "database.xml-test");
Now I just need to negate it, but it doesn't work for some reason:
String regex = "^(.+\\.xml\\-test)$";
I still get ok1,ok2,ok3 = false, ok4 = true
Any ideas? (As people pointed out, this could be done easily without regex. But for argument's sake, assume I have to use a single regex pattern and nothing else (i.e. !Pattern.matches(..); is also not allowed).)

I think you are looking for:
if (! someString.endsWith(".xml-test")) {
...
}
No regular expression required. Throw this into a FilenameFilter as follows:
public boolean accept(File dir, String name) {
return ! name.endsWith(".xml-test");
}
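As a rough sketch of how that filter could be wired up, here is a small self-contained version. The class name NonTestFileFilter and the helper listNonTestFiles are made up for illustration; only File.list(FilenameFilter) and String.endsWith come from the JDK.

```java
import java.io.File;
import java.io.FilenameFilter;

class NonTestFileFilter {
    // Accepts every file name that does not end with ".xml-test".
    static final FilenameFilter NOT_XML_TEST =
            (dir, name) -> !name.endsWith(".xml-test");

    // Hypothetical helper: names in the folder that pass the filter
    // (an empty array if the folder cannot be read).
    static String[] listNonTestFiles(File folder) {
        String[] names = folder.list(NOT_XML_TEST);
        return names != null ? names : new String[0];
    }

    public static void main(String[] args) {
        for (String name : listNonTestFiles(new File("."))) {
            System.out.println(name);
        }
    }
}
```

The null check is there because File.list returns null when the path is not a readable directory.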

The meaning of ^ changes depending on its position in the regexp. When the symbol is inside a character class [] as the first character, it means negation of the character class; when it is outside a character class, it means the beginning of line.
The easiest way to negate a result of a match is to use a positive pattern in regex, and then to add a ! on the Java side to do the negation, like this:
boolean isGoodFile = !Pattern.matches(regex, "database123.xml");

The following Java regex asserts that a string does NOT end with: .xml-test:
String regex = "^(?:(?!\\.xml-test$).)*$";
This regex walks the string one character at a time and asserts that at each and every position the remainder of the string is not .xml-test.
Simple!
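To see the lookahead pattern in action, here is a minimal sketch that runs it against the four file names from the question. The class and method names are invented for the example; the regex is the one given above.

```java
import java.util.regex.Pattern;

class NotEndsWithLookahead {
    // At every position, the rest of the string must not be exactly ".xml-test".
    static final Pattern NOT_XML_TEST = Pattern.compile("^(?:(?!\\.xml-test$).)*$");

    static boolean accepts(String fileName) {
        return NOT_XML_TEST.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String[] names = {
            "database123.xml", "database123.sql", "log_file012.txt", "database.xml-test"
        };
        for (String name : names) {
            System.out.println(name + " -> " + accepts(name));
        }
    }
}
```

Only the last name should be rejected, which is the inverse of the result of the original positive pattern.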

^ is not a negation operator in a regexp; outside a character class it is a symbol indicating the beginning of a line.
You probably need (?!X), a zero-width negative lookahead.
But I suggest you use the File#listFiles method with a FilenameFilter implementation:
!name.endsWith(".xml-test")

If you really need to test it with a regex, then you should use a negative lookbehind from the Pattern class:
String regex = "^.*(?<!\\.xml-test)$";
How it works:
first you match the whole string: from the start (^), all characters (.*),
then you check that what you have already matched doesn't have ".xml-test" at the end (a lookbehind at the position you have reached),
then you test that it's the end of the string.
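A quick way to convince yourself is to compare the lookbehind pattern with the positive pattern from the question. This is only a sketch; the class name and the accepts helper are assumptions made for the example.

```java
import java.util.regex.Pattern;

class NotEndsWithLookbehind {
    // Match everything, then assert that the text just before the end is not ".xml-test".
    static final Pattern NOT_XML_TEST = Pattern.compile("^.*(?<!\\.xml-test)$");

    static boolean accepts(String fileName) {
        return NOT_XML_TEST.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String positive = ".+\\.xml\\-test$"; // the original pattern from the question
        String[] names = {
            "database123.xml", "database123.sql", "log_file012.txt", "database.xml-test"
        };
        for (String name : names) {
            System.out.println(name
                    + " positive=" + Pattern.matches(positive, name)
                    + " negated=" + accepts(name));
        }
    }
}
```

For these four names the two columns should always disagree, which is what "negating" the pattern means here.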

Related

String equal/contain none of them gets what I want

I have a string that can look somewhat like:
NCC_johjon (\users\johanjo\tomcattest\oysters\NCC_johjon, port 16001), utv_johjon (\users\johanjo\tomcattest\oysters\utv_johjon, port 16000)
and there could be a lot of others such as NCC_etskys, NCC_homyis and so on. I want to check whether somewhere in the string there is a part that says exactly "NCC_joh". I tried:
if(oysters.contains("NCC_joh")){
System.out.println("HEJ HEJ HEJ HALLÅ HALLÅ HALLÅ");
}
But if there is an NCC_johjon in there, it will enter the if branch, and I only want to enter it if exactly that part exists, no longer and no shorter. With .equals the argument has to match the whole String, which is not what I want either. Does anyone have an idea? It would be easier if I were working with a list of Strings, but I don't have that.
oysterPaths is a Collection at first:
Collection<TomcatResource> oysterPaths = TomcatResource.listCats(Paths.get(tomcatsPath));
Use regular expressions.
if (oysters.matches("(?s).*\\bNCC_joh\\b.*")) {
where
(?s) = single line mode, DOT-ALL, so . will match a newline too.
. = any char
.* = zero or more occurrences of . (any char)
\b = word boundary
String.matches matches the pattern against the entire string, hence the need for .* at the beginning and end.
(The word boundaries mean that NCC_joh must not be directly preceded or followed by another word character, i.e. a letter, digit, or underscore.)
This is similar to https://stackoverflow.com/a/49879388/2735286, but I would suggest using the find method with this regular expression:
\bNCC_joh\b
Using the find method simplifies the regular expression, and you search exclusively for what is relevant.
Here is the corresponding method you can use:
public static boolean superExactMatch(String expression) {
Pattern p = Pattern.compile("\\bNCC_joh\\b", Pattern.MULTILINE);
final Matcher matcher = p.matcher(expression);
final boolean found = matcher.find();
if(found) {
// For debugging purposes to see where the match happened in the expression
System.out.println(matcher.start() + " " + matcher.end());
}
return found;
}
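If the name to look for is not fixed, the same idea can be sketched with the name passed in as a parameter. The class and method names below are hypothetical, and the sample text is shortened from the question. Pattern.quote is used so that any regex metacharacters in the name are treated literally. Note that \b only behaves as expected here when the name starts and ends with a word character.

```java
import java.util.regex.Pattern;

class WholeWordSearch {
    // True if 'word' occurs in 'text' without a letter, digit, or underscore
    // directly before or after it.
    static boolean containsWholeWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String oysters = "NCC_johjon (port 16001), utv_johjon (port 16000)";
        System.out.println(oysters.contains("NCC_joh"));                // substring test
        System.out.println(containsWholeWord(oysters, "NCC_joh"));      // whole-word test
        System.out.println(containsWholeWord(oysters, "NCC_johjon"));   // whole-word test
    }
}
```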

What is the Regex for decimal numbers in Java?

I am not quite sure of what is the correct regex for the period in Java. Here are some of my attempts. Sadly, they all meant any character.
String regex = "[0-9]*[.]?[0-9]*";
String regex = "[0-9]*['.']?[0-9]*";
String regex = "[0-9]*["."]?[0-9]*";
String regex = "[0-9]*[\.]?[0-9]*";
String regex = "[0-9]*[\\.]?[0-9]*";
String regex = "[0-9]*.?[0-9]*";
String regex = "[0-9]*\.?[0-9]*";
String regex = "[0-9]*\\.?[0-9]*";
But what I want is the actual "." character itself. Anyone have an idea?
What I'm trying to do actually is to write out the regex for a non-negative real number (decimals allowed). So the possibilities are: 12.2, 3.7, 2., 0.3, .89, 19
String regex = "[0-9]*['.']?[0-9]*";
Pattern pattern = Pattern.compile(regex);
String x = "5p4";
Matcher matcher = pattern.matcher(x);
System.out.println(matcher.find());
The last line is supposed to print false but prints true anyway. I think my regex is wrong though.
Update
To match a non-negative decimal number you need this regex (note that it requires a decimal point, so a plain integer such as 19 is rejected):
^\d*\.\d+|\d+\.\d*$
or in Java syntax: "^\\d*\\.\\d+|\\d+\\.\\d*$"
String regex = "^\\d*\\.\\d+|\\d+\\.\\d*$";
String string = "123.43253";
if(string.matches(regex))
System.out.println("true");
else
System.out.println("false");
Explanation for your original regex attempts:
[0-9]*\.?[0-9]*
with Java escaping it becomes:
"[0-9]*\\.?[0-9]*";
If you need to make the dot mandatory, remove the ? mark:
[0-9]*\.[0-9]*
But this will also accept just a dot without any digits. So, if you want the validation to require digits, use + (which means one or more) instead of * (which means zero or more). In that case it becomes:
[0-9]+\.[0-9]+
If you are on Kotlin, you can use an extension function:
fun String.findDecimalDigits() =
Pattern.compile("^[0-9]*\\.?[0-9]*").matcher(this).run { if (find()) group() else "" }!!
Your initial understanding was probably right, but you were being thrown off because, with matcher.find(), the regex reports the first match it can find anywhere within the string, and your pattern can also match a zero-length string.
I would suggest "^([0-9]+\\.?[0-9]*|\\.[0-9]+)$"
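Here is a small sketch that shows both points: find() being satisfied by a partial match, and the suggested pattern checked against the examples from the question. The class and method names are made up for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DecimalCheck {
    // Suggested pattern: digits with an optional fraction, or a fraction alone.
    static final Pattern DECIMAL = Pattern.compile("^([0-9]+\\.?[0-9]*|\\.[0-9]+)$");

    static boolean isNonNegativeDecimal(String s) {
        return DECIMAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The question's pattern: find() only needs some part of the input to match.
        Matcher m = Pattern.compile("[0-9]*['.']?[0-9]*").matcher("5p4");
        if (m.find()) {
            System.out.println("find() matched \"" + m.group() + "\" inside 5p4");
        }
        String[] samples = {"12.2", "3.7", "2.", "0.3", ".89", "19", "5p4", "."};
        for (String s : samples) {
            System.out.println(s + " -> " + isNonNegativeDecimal(s));
        }
    }
}
```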
There are actually two ways to match a literal dot. One is backslash-escaping, as you do with \\., and the other is to enclose it in a character class (square brackets), like [.]. Most special characters, including the dot, become literal inside square brackets. Using \\. shows your intention more clearly than [.] if all you want is to match a literal dot. Use [] when you need to match one of several things; for example, the regex [\\d.] matches a single digit or a literal dot.
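The difference between the unescaped dot and the two literal forms can be checked with a few one-character inputs. This is just an illustrative sketch; the class and method names are not from any library.

```java
class LiteralDot {
    // An unescaped dot matches any single character (except line terminators).
    static boolean anyChar(String s) { return s.matches("."); }

    // Both of these match only a literal dot.
    static boolean escapedDot(String s) { return s.matches("\\."); }
    static boolean dotInClass(String s) { return s.matches("[.]"); }

    // A character class listing alternatives: one digit or one literal dot.
    static boolean digitOrDot(String s) { return s.matches("[\\d.]"); }

    public static void main(String[] args) {
        for (String s : new String[] {".", "p", "7"}) {
            System.out.println(s + ": any=" + anyChar(s)
                    + " escaped=" + escapedDot(s)
                    + " class=" + dotInClass(s)
                    + " digitOrDot=" + digitOrDot(s));
        }
    }
}
```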
I have tested all the cases.
public static boolean isDecimal(String input) {
return Pattern.matches("^[-+]?\\d*[.]?\\d+|^[-+]?\\d+[.]?\\d*", input);
}

Regex Multiple Strings With "or" Operator

I need to establish a java regex that will recognize the following 3 cases:
Any combination/amount of the following characters: "ACTGactg:"
or
Any single question marks "?"
or
Any string "NTC"
I will list what I have tried so far and the errors that have arisen.
public static final String VALID_STRING = "[ACTGactg:]*";
// Matches the first case but not the second or third
// as expected.
public static final String VALID_STRING = "\\?|[ACTGactg:]*";
// Matches all 3 conditions when my understanding leads me to
// believe that it should not accept the third case of "NTC"
public static final String VALID_STRING = "?|[ACTGactg:]*";
// Yields PatternSyntaxException dangling metacharacter ?
What I would expect to be accurate is the following:
public static final String VALID_STRING = "NTC|\\?|[ACTGacgt:]*";
But I want to make sure that if I take away the "NTC" that any "NTC" string will appear as invalid.
Here is the method I am using to test these regexs.
private static boolean isValid(String thisString){
boolean valid = false;
Pattern checkRegex = Pattern.compile(VALID_STRING);
Matcher matchRegex = checkRegex.matcher(thisString);
while (matchRegex.find()){
if (matchRegex.group().length() != 0){
valid = true;
}
}
return valid;
}
So here are my closing questions:
Could the "\\?" regex possibly be acting as a wildcard character that is accepting the "NTC" string?
Are the or operators "|" appropriate here?
Do I need to make use of parenthesis when using these or operators?
Here are some example incoming strings:
A:C
T:G
AA:CC
T:C:A:G
NTC
?
Thank you
Yes, the provided regex would be OK:
public static final String VALID_STRING = "NTC|\\?|[ACTGacgt:]+";
...
boolean valid = str.matches(VALID_STRING);
If you remove NTC| from the regex, the string NTC becomes invalid.
You can test it and experiment yourself here.
Since you are using the Matcher.find() method, you are looking for your pattern anywhere in the string.
This means the strings A:C, T:G, AA:CC etc. match in their entirety. But how about NTC?
It matches because find() looks for a match anywhere: the TC part of it matches, therefore you get true.
If you want to match only the strings in their entirety, either use the matches() method, or anchor the pattern with ^ and $ (grouping the alternatives, e.g. ^(?:NTC|\\?|[ACTGactg:]+)$).
Note that you don't have to check that the match is longer than 0, if you change your pattern to [ACTGactg:]+ instead of [ACTGactg:]*.
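To make the difference concrete, here is a minimal sketch that validates with matches() and, for contrast, shows what find() reports for NTC when the NTC alternative is removed. The class name, the WITHOUT_NTC constant and the helper methods are assumptions for the example.

```java
import java.util.regex.Pattern;

class SequenceValidator {
    static final Pattern VALID_STRING = Pattern.compile("NTC|\\?|[ACTGactg:]+");
    // Same pattern with the NTC alternative removed.
    static final Pattern WITHOUT_NTC = Pattern.compile("\\?|[ACTGactg:]+");

    // Whole-string validation: every character must be covered by the pattern.
    static boolean isValid(Pattern pattern, String s) {
        return pattern.matcher(s).matches();
    }

    // Substring search, shown only to illustrate why find() accepted "NTC".
    static boolean foundAnywhere(Pattern pattern, String s) {
        return pattern.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"A:C", "T:G", "AA:CC", "T:C:A:G", "NTC", "?"}) {
            System.out.println(s + " -> " + isValid(VALID_STRING, s));
        }
        System.out.println("NTC without the NTC alternative, matches(): "
                + isValid(WITHOUT_NTC, "NTC"));
        System.out.println("NTC without the NTC alternative, find(): "
                + foundAnywhere(WITHOUT_NTC, "NTC"));
    }
}
```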

Why doesn't /0/g match in a string that contains zeroes?

This code always returns "false" at last, even if Integer contains any zero:
Integer i = (int) rand(1, 200); // random [1;200)
String regexp = "/0/g";
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(i.toString());
print(i);
print(m.matches());
What is the reason? I don't get where the mistake could be.
Needed: m.matches() = "true" if Integer contains one or more zero.
The problem is that you're giving the regular expression incorrectly. The string you give Pattern.compile is just the text of the expression, without / on either side, and without flags; flags are specified separately.
So in your case, you'd just want:
String regexp = "0";
There's no "global" flag; instead, you use the methods on the resulting Matcher as appropriate to what you're doing.
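As a small illustration of "flags are specified separately", here is a sketch using the case-insensitive flag, which JavaScript would write as /abc/i. The class and constant names are invented; Pattern.CASE_INSENSITIVE and the embedded (?i) form are standard java.util.regex features.

```java
import java.util.regex.Pattern;

class RegexFlags {
    // Flag passed as the second argument to Pattern.compile.
    static final Pattern ABC_FLAG = Pattern.compile("abc", Pattern.CASE_INSENSITIVE);

    // The same flag embedded in the expression itself.
    static final Pattern ABC_EMBEDDED = Pattern.compile("(?i)abc");

    // Slashes are ordinary characters in Java: this looks for the literal text /0/g.
    static final Pattern WITH_SLASHES = Pattern.compile("/0/g");

    public static void main(String[] args) {
        System.out.println(ABC_FLAG.matcher("xABCx").find());
        System.out.println(ABC_EMBEDDED.matcher("xABCx").find());
        System.out.println(WITH_SLASHES.matcher("100").find());
    }
}
```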
Needed: m.matches() = "true" if Integer contains one or more zero.
Then you don't want to use Matcher#matches, you want Matcher#find. Or if you need to use Matcher#matches, the expression would be:
String regexp = ".*0.*";
...i.e., any number of any character, then a 0, then any number of any character. That way, the entire string can match the expression.
Of course, if you just want to know there's a zero, it's much simpler to just use
boolean flag = String.valueOf(i).indexOf('0') != -1;
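The three approaches mentioned above can be put side by side. This is only a sketch with made-up method names, but it shows that they should agree for every value in the range from the question.

```java
import java.util.regex.Pattern;

class ContainsZero {
    // find() looks for "0" anywhere in the decimal representation.
    static boolean viaFind(int n) {
        return Pattern.compile("0").matcher(Integer.toString(n)).find();
    }

    // matches() must cover the whole string, so the zero is wrapped in .* on both sides.
    static boolean viaMatches(int n) {
        return Integer.toString(n).matches(".*0.*");
    }

    // No regex at all.
    static boolean viaIndexOf(int n) {
        return Integer.toString(n).indexOf('0') != -1;
    }

    public static void main(String[] args) {
        for (int n : new int[] {7, 10, 105, 123}) {
            System.out.println(n + " -> " + viaFind(n) + " " + viaMatches(n) + " " + viaIndexOf(n));
        }
    }
}
```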
In this particular case you don't need a regex at all since you are looking for a literal character, use indexOf:
if (Str.indexOf( '0' ) != -1) {
...
About your original pattern:
a regex doesn't need to be enclosed between delimiters in Java, so the slashes are useless (they are treated as literal characters). The global modifier isn't needed either, because the global nature is determined by the method you choose (in other words, the only way to obtain several results is to call the find method in a loop).
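The "find in a loop" idea, which stands in for the global flag, might look roughly like this. The class and method names are hypothetical; Matcher.find and Matcher.start are the real API calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllMatches {
    // Collects the index of every "0" in the input by calling find() repeatedly.
    static List<Integer> zeroPositions(String s) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("0").matcher(s);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(zeroPositions("1000"));
        System.out.println(zeroPositions("123"));
    }
}
```

Each call to find() resumes where the previous match ended, so the loop visits every match once.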
print(m.find());
matches() has to match the whole input from the beginning. Use find() instead, since the input can never be just "0" in your case.
Using find will enable you to locate a 0 anywhere in the string.
matches tries to match the expression against the entire string, as if there were an implicit ^ at the start and $ at the end of your pattern, meaning it will not look for a substring. Hence false.
Also change your regex to "0" as suggested by the other answer.
Try,
String regexp = ".*0.*";
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(i.toString());
if(m.find()){
System.out.println(i);
System.out.println(m.matches());
}

java string.matches requires matches of too much of the string

The following code:
String s = "casdfsad";
System.out.println(s.matches("[a-z]"));
System.out.println(s.matches("^[a-z]"));
System.out.println(s.matches("^[a-z].*"));
outputs
false
false
true
But why is that? I did not specify any $ at the end of any of the patterns.
Does String.matches add ^ and $ implicitly to force a full string match?
Why? And can I disable full string matching, perhaps by using another method?
Edit:
If String.matches implicitly adds ^ and $, why don't String.replaceAll or String.replaceFirst also do this? Isn't this inconsistent?
Unfortunately there is no find method in String you must use Matcher.find().
Pattern pattern = Pattern.compile("[a-z]");
Matcher matcher = pattern.matcher("casdfsad");
System.out.println(matcher.find());
will output
true
EDIT: If you want to find a fixed substring and you don't need regular expressions, you can use String.indexOf(), e.g.
String someString = "Hello World";
boolean isHelloContained = someString.indexOf("Hello") > -1;
System.out.println(isHelloContained);
someString = "Some other string";
isHelloContained = someString.indexOf("Hello") > -1;
System.out.println(isHelloContained);
will output
true
false
Try adding the greedy quantifier + so that the pattern can match the whole String. Since s has more than one character, you need a quantifier that matches more than one character in the a-z range. With String.matches you don't need the boundary characters ^ and $.
String s = "casdfsad";
System.out.println(s.matches("[a-z]+"));// It will be true
You are trying to use a single-character regex for a whole String?
You could try :
String s = "casdfsad";
System.out.println(s.matches("[a-z]+"));
System.out.println(s.matches("^[a-z]+"));
System.out.println(s.matches("^[a-z].*"));
The third one matches because of the .*, which consumes the rest of the string. String.matches does not literally add ^ and $ to the pattern, but it does require the entire string to match, which has the same effect.
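For comparison, here is a sketch of the three matching modes that Matcher offers, applied to the string from the question. The class and helper names are made up; matches, lookingAt and find are the actual Matcher methods.

```java
import java.util.regex.Pattern;

class MatchModes {
    // The whole input must match.
    static boolean wholeString(String regex, String s) {
        return Pattern.compile(regex).matcher(s).matches();
    }

    // The match must start at the beginning but may stop early.
    static boolean prefix(String regex, String s) {
        return Pattern.compile(regex).matcher(s).lookingAt();
    }

    // The match may be anywhere in the input.
    static boolean anywhere(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "casdfsad";
        System.out.println(wholeString("[a-z]", s));
        System.out.println(prefix("[a-z]", s));
        System.out.println(anywhere("[a-z]", s));
        // replaceFirst searches like find(), so it only touches the first character here.
        System.out.println(s.replaceFirst("[a-z]", "X"));
    }
}
```

String.replaceAll and String.replaceFirst search for matches the way find() does, which is why they do not need the pattern to cover the whole string.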
