String.matches() with \n - java

Why does the String::matches method return false when I put \n into the String?
public class AppMain2 {
public static void main(String[] args) {
String data1 = "\n London";
System.out.println(data1.matches(".*London.*"));
}
}

It doesn't match because "." in a regex does not match line terminators by default, as described in the documentation here:
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#sum

By default Java's . does not match newlines. To have . include newlines, set the Pattern.DOTALL flag with (?s):
System.out.println(data1.matches("(?s).*London.*"));
Note for those coming from other regex flavors: the Java documentation's use of the term "match" is different from other languages. Java's String::matches() returns true only if the entire string is matched, i.e. it behaves as if a ^ and $ were added to the head and tail of the passed regex, NOT simply if the string contains a match.
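To make the difference concrete, here is a minimal sketch that compares the default dot, the (?s) flag, and a plain find(). The class and method names (DotAllDemo, defaultDot, dotAll, findOnly) are made up for this illustration and are not part of the question.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    // String.matches needs the whole input to match, and '.' skips
    // line terminators unless DOTALL is enabled.
    static boolean defaultDot(String input) {
        return input.matches(".*London.*");
    }

    // Same pattern with the inline (?s) flag, which enables Pattern.DOTALL.
    static boolean dotAll(String input) {
        return input.matches("(?s).*London.*");
    }

    // find() only needs some substring to match, so no flag is required.
    static boolean findOnly(String input) {
        return Pattern.compile("London").matcher(input).find();
    }

    public static void main(String[] args) {
        String data1 = "\n London";
        System.out.println(defaultDot(data1)); // false
        System.out.println(dotAll(data1));     // true
        System.out.println(findOnly(data1));   // true
    }
}
```

If all you need is a substring check, the find() variant (or String.contains) avoids the flag entirely.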

If you want it to return true, you need to use Pattern.DOTALL or (?s).
That way . matches any character, including \n:
String data1 = "\n London";
Pattern pattern = Pattern.compile(".*London.*", Pattern.DOTALL);
System.out.println(pattern.matcher(data1).matches());
or :
System.out.println(data1.matches("(?s).*London.*"));

"\n" is a line terminator, and . does not match it by default, so String.matches cannot match the whole string and returns false. Try compiling the pattern and searching with Matcher.find() instead, for example:
Pattern.compile(".*London.*", Pattern.MULTILINE);

Related

Regular expression to handle two different file extensions

I am trying to create a regular expression that takes a file of name
"abcd_04-04-2020.txt" or "abcd_04-04-2020.txt.gz"
How can I handle the "OR" condition for the extension? This is what I have so far:
if(fileName.matches("([\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}.[a-zA-Z]{3})")){
Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}
This handles only the .txt. How can I handle ".txt.gz"?
Thanks
Why not just use endsWith instead of a complex regex?
if(fileName.endsWith(".txt") || fileName.endsWith(".txt.gz")){
Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}
You can use the below regex to achieve your purpose:
^[\w-]+\d{2}-\d{2}-\d{4}\.txt(?:\.gz)?$
Explanation of the above regex:
^,$ - Matches start and end of the test string resp.
[\w-]+ - Matches word character along with hyphen one or more times.
\d{n} - Matches exactly as many digits as the number given in the curly braces.
(?:\.gz)? - Represents a non-capturing group matching .gz zero or one time because of the ? quantifier. You could have used | alternation (the OR you were expecting), but this is legible and more efficient too.
You can find the demo of the above regex here.
IMPLEMENTATION IN JAVA:
import java.util.regex.*;
public class Main
{
private static final Pattern pattern = Pattern.compile("^[\\w-]+\\d{2}-\\d{2}-\\d{4}\\.txt(?:\\.gz)?$", Pattern.MULTILINE);
public static void main(String[] args) {
String testString = "abcd_04-04-2020.txt\nabcd_04-04-2020.txt.gz\nsomethibsnfkns_05-06-2020.txt\n.txt.gz";
Matcher matcher = pattern.matcher(testString);
while(matcher.find()){
System.out.println(matcher.group(0));
}
}
}
You can find the implementation of the above regex in java in here.
NOTE: If you want to match for valid dates also; please visit this.
You can replace .[a-zA-Z]{3} with \.txt(\.gz)?
if(fileName.matches("([\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4})\\.txt(\\.gz)?")){
Pattern.compile("[._]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.");
}
An optional group with ? covers the OR you need. Try adding
(\.[a-zA-Z]{2})?
to your original regex:
([\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}\.[a-zA-Z]{3}(\.[a-zA-Z]{2})?)
A possible way of doing it:
Pattern pattern = Pattern.compile("^[\\w._-]+_\\d{2}-\\d{2}-\\d{4}(\\.txt(\\.gz)?)$");
Then you can run the following test:
String[] fileNames = {
"abcd_04-04-2020.txt",
"abcd_04-04-2020.tar",
"abcd_04-04-2020.txt.gz",
"abcd_04-04-2020.png",
".txt",
".txt.gz",
"04-04-2020.txt"
};
Arrays.stream(fileNames)
.filter(fileName -> pattern.matcher(fileName).find())
.forEach(System.out::println);
// output
// abcd_04-04-2020.txt
// abcd_04-04-2020.txt.gz
I think what you want (following from the direction you were going) is this:
[\\w._-]+[0-9]{2}-[0-9]{2}-[0-9]{4}\\.[a-zA-Z]{3}(?:$|\\.[a-zA-Z]{2}$)
At the end, I have an alternation. It has to either match the end of the string ($) OR it has to match a literal dot followed by 2 letters and then the end of the string (\\.[a-zA-Z]{2}$). Remember to escape the ., because in regex . means "match any character".
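The optional-group idea from the answers above can be sketched as a small helper. This is only an illustration under assumptions: the class name FileNameCheck and the method isAccepted are hypothetical, and the pattern assumes the name part is followed by an underscore and a dd-MM-yyyy date, as in the question's examples.

```java
import java.util.regex.Pattern;

class FileNameCheck {
    // name part, underscore, date digits, ".txt", then an optional ".gz"
    private static final Pattern FILE_NAME =
            Pattern.compile("[\\w.-]+_\\d{2}-\\d{2}-\\d{4}\\.txt(?:\\.gz)?");

    // matches() requires the whole file name to fit the pattern,
    // so no explicit ^ and $ anchors are needed here.
    static boolean isAccepted(String fileName) {
        return FILE_NAME.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAccepted("abcd_04-04-2020.txt"));    // true
        System.out.println(isAccepted("abcd_04-04-2020.txt.gz")); // true
        System.out.println(isAccepted("abcd_04-04-2020.tar"));    // false
        System.out.println(isAccepted("04-04-2020.txt"));         // false
    }
}
```

Note that this only checks the shape of the date (two digits, two digits, four digits), not whether it is a valid calendar date.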

String equal/contain none of them gets what I want

I have a string that can look somewhat like:
NCC_johjon (\users\johanjo\tomcattest\oysters\NCC_johjon, port 16001), utv_johjon (\users\johanjo\tomcattest\oysters\utv_johjon, port 16000)
There could be a lot of others, such as NCC_etskys, NCC_homyis and so on, and I want to check whether somewhere in the string there is a part that says exactly "NCC_joh". I tried something like
if(oysters.contains("NCC_joh")){
System.out.println("HEJ HEJ HEJ HALLÅ HALLÅ HALLÅ");
}
But if there is an NCC_johjon in there, it will enter the if branch. I only want to enter it if exactly that part exists, no longer and no shorter. With .equals, the whole String has to match, which is not what I want either. Does anyone have an idea? It would be easier if I were working with a list of Strings, but I don't have that.
The oysterPaths is a Collection at first:
Collection<TomcatResource> oysterPaths = TomcatResource.listCats(Paths.get(tomcatsPath));
Use regular expressions.
if (oysters.matches("(?s).*\\bNCC_joh\\b.*")) {
where
(?s) = single line mode, DOT-ALL, so . will match a newline too.
. = any char
.* = zero or more occurrences of . (any char)
\b = word boundary
String.matches does a match of the pattern over the entire string, hence the need for .* at begin and end.
(A word boundary means that a word has to start or end at that position, so NCC_joh only matches when it is not part of a longer word.)
This is similar to https://stackoverflow.com/a/49879388/2735286, but I would suggest to use the find method using this regular expression:
\bNCC_joh\b
Using the find method will simplify the regular expression and you will exclusively search for what is relevant.
Here is the corresponding method you can use:
public static boolean superExactMatch(String expression) {
Pattern p = Pattern.compile("\\bNCC_joh\\b", Pattern.MULTILINE);
final Matcher matcher = p.matcher(expression);
final boolean found = matcher.find();
if(found) {
// For debugging purposes to see where the match happened in the expression
System.out.println(matcher.start() + " " + matcher.end());
}
return found;
}
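If the name to look for is not fixed, the same word-boundary search can be parameterized. The following is a sketch, not the answerer's code: the class ExactNameSearch and the method containsExactName are hypothetical names, and Pattern.quote is used so that any regex metacharacters in the name are treated literally.

```java
import java.util.regex.Pattern;

class ExactNameSearch {
    // Returns true only if 'name' occurs as a whole word in 'text'.
    static boolean containsExactName(String text, String name) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(name) + "\\b");
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String oysters = "NCC_johjon (\\users\\johanjo\\tomcattest\\oysters\\NCC_johjon, port 16001), "
                + "utv_johjon (\\users\\johanjo\\tomcattest\\oysters\\utv_johjon, port 16000)";
        System.out.println(containsExactName(oysters, "NCC_joh"));    // false
        System.out.println(containsExactName(oysters, "NCC_johjon")); // true
    }
}
```

Because _ counts as a word character, \b does not split NCC_johjon into smaller words, which is what makes the shorter name NCC_joh fail here.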

Matching three or more identical characters - Java program [duplicate]

I have this small piece of code
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]"))
{
System.out.println(s);
}
}
Supposed to print
dkoe
but it prints nothing!!
Welcome to Java's misnamed .matches() method... It tries and matches ALL the input. Unfortunately, other languages have followed suit :(
If you want to see if the regex matches an input text, use a Pattern, a Matcher and the .find() method of the matcher:
Pattern p = Pattern.compile("[a-z]");
Matcher m = p.matcher(inputstring);
if (m.find())
// match
If what you want is indeed to see if an input only has lowercase letters, you can use .matches(), but you need to match one or more characters: append a + to your character class, as in [a-z]+. Or use ^[a-z]+$ and .find().
[a-z] matches a single char between a and z. So, if your string was just "d", for example, then it would have matched and been printed out.
You need to change your regex to [a-z]+ to match one or more chars.
String.matches returns whether the whole string matches the regex, not just any substring.
Java's String.matches tries to match the whole string.
That's different from Perl regexes, which try to find a matching part.
If you want to find a string with nothing but lowercase characters, use the pattern [a-z]+
If you want to find a string containing at least one lowercase character, use the pattern .*[a-z].*
I used:
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]+"))
{
System.out.println(s);
}
}
I have faced the same problem once:
Pattern ptr = Pattern.compile("^[a-zA-Z][\\']?[a-zA-Z\\s]+$");
The above failed!
Pattern ptr = Pattern.compile("(^[a-zA-Z][\\']?[a-zA-Z\\s]+$)");
The above worked with pattern within ( and ).
Your regular expression [a-z] doesn't match dkoe since it only matches Strings of length 1. Use something like [a-z]+.
The pattern needs a quantifier to match the whole string; the anchors and the capturing group () are optional with matches(). A corrected pattern looks like this:
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("(^[a-z]+$)"))
{
System.out.println(s);
}
}
You can make your pattern case insensitive by doing:
Pattern p = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);
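The two readings of the question ("only lowercase letters" versus "contains a lowercase letter") can be put side by side. This is a small sketch using the word list from the question; the class name LowercaseFilter and its two methods are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LowercaseFilter {
    private static final Pattern ANY_LOWER = Pattern.compile("[a-z]");

    // Keeps words that consist solely of lowercase letters (whole-string match).
    static List<String> onlyLowercase(String[] words) {
        List<String> result = new ArrayList<>();
        for (String s : words) {
            if (s.matches("[a-z]+")) {
                result.add(s);
            }
        }
        return result;
    }

    // Keeps words that contain at least one lowercase letter (substring search).
    static List<String> containingLowercase(String[] words) {
        List<String> result = new ArrayList<>();
        for (String s : words) {
            if (ANY_LOWER.matcher(s).find()) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"{apf", "hum_", "dkoe", "12f"};
        System.out.println(onlyLowercase(words));       // [dkoe]
        System.out.println(containingLowercase(words)); // [{apf, hum_, dkoe, 12f]
    }
}
```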

java string.matches requires matches of too much of the string

The following code:
String s = "casdfsad";
System.out.println(s.matches("[a-z]"));
System.out.println(s.matches("^[a-z]"));
System.out.println(s.matches("^[a-z].*"));
outputs
false
false
true
But why is that? I did not specify any $ at the end of any of the patterns.
Does String.matches add ^ and $ implicitly to force a full string match?
Why? And can I disable full string matching, perhaps by using another method?
Edit:
If String.matches implicitly adds ^ and $, why don't String.replaceAll or String.replaceFirst also do this? Isn't this inconsistent?
Unfortunately there is no find method in String; you must use Matcher.find().
Pattern pattern = Pattern.compile("[a-z]");
Matcher matcher = pattern.matcher("casdfsad");
System.out.println(matcher.find());
will output
true
EDIT: If you want to find full strings and you don't need regular expressions you can use String.indexOf(), e.g.
String someString = "Hello World";
boolean isHelloContained = someString.indexOf("Hello") > -1;
System.out.println(isHelloContained);
someString = "Some other string";
isHelloContained = someString.indexOf("Hello") > -1;
System.out.println(isHelloContained);
will output
true
false
Try adding the greedy quantifier + so the pattern can match the whole String. Because s has more than one character, you need a quantifier that matches more than one character in the a-z range. For String.matches, you don't need the boundary characters ^ and $.
String s = "casdfsad";
System.out.println(s.matches("[a-z]+"));// It will be true
Are you trying to use a single-character regex for a String?
You could try :
String s = "casdfsad";
System.out.println(s.matches("[a-z]+"));
System.out.println(s.matches("^[a-z]+"));
System.out.println(s.matches("^[a-z].*"));
The third one matches because of the .*. String.matches does not literally add ^ and $ to the pattern, but it only returns true when the entire string matches, which has the same effect.
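On the question of whether another method relaxes the full-string requirement: java.util.regex.Matcher offers matches(), lookingAt() and find(), which differ only in how much of the input must be covered. The wrapper class MatchModes below and its method names are just for illustration.

```java
import java.util.regex.Pattern;

class MatchModes {
    // The entire input must match the pattern.
    static boolean whole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // The pattern must match at the start of the input, but need not reach the end.
    static boolean prefix(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // The pattern may match anywhere in the input.
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String s = "casdfsad";
        System.out.println(whole("[a-z]", s));    // false
        System.out.println(prefix("[a-z]", s));   // true
        System.out.println(anywhere("[a-z]", s)); // true
        System.out.println(whole("[a-z]+", s));   // true
    }
}
```

String.replaceAll and String.replaceFirst behave like find(): they search for matching substrings, which is why they do not need the pattern to cover the whole string.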

Regular expression to match unescaped special characters only

I'm trying to come up with a regular expression that can match only characters not preceded by a special escape sequence in a string.
For instance, in the string Is ? stranded//? I want to be able to replace the ? which hasn't been escaped with another string, so I can get this result: Is Dave stranded?
But for the life of me I have not been able to figure out a way. I have only come up with regular expressions that eat all the replaceable characters.
How do you construct a regular expression that matches only characters not preceded by an escape sequence?
Use a negative lookbehind, it's what they were designed to do!
(?<!//)[?]
To break it down:
(
?<! #The negative lookbehind. It checks that the slashes below do not appear immediately before the match.
// #The slashes you are trying to avoid.
)
[\?] #Your special character list.
Only if the // is not found just before the character does the match go ahead.
I think in Java it will need to be escaped again as a string something like:
Pattern p = Pattern.compile("(?<!//)[\\?]");
Try this Java code:
String str = "Is ? stranded//?";
Pattern p = Pattern.compile("(?<!//)([?])");
Matcher m = p.matcher(str);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, m.group(1).replace("?", "Dave"));
}
m.appendTail(sb);
String s = sb.toString().replace("//", "");
System.out.println("Output: " + s);
OUTPUT
Output: Is Dave stranded?
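A more compact variant of the same lookbehind idea, sketched with String.replaceAll. The class UnescapedReplace and the method fill are hypothetical names; Matcher.quoteReplacement is used on the assumption that the replacement text should be taken literally, even if it contains $ or \.

```java
import java.util.regex.Matcher;

class UnescapedReplace {
    // Replaces every '?' that is not preceded by "//", then turns "//?" into a plain '?'.
    static String fill(String template, String value) {
        String filled = template.replaceAll("(?<!//)\\?", Matcher.quoteReplacement(value));
        return filled.replace("//?", "?");
    }

    public static void main(String[] args) {
        System.out.println(fill("Is ? stranded//?", "Dave")); // Is Dave stranded?
    }
}
```

Unlike the loop above, this removes the escape only where it precedes a ?, so other occurrences of // in the text are left alone.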
I was thinking about this and have a second, simpler solution that avoids regexes. The other answers are probably better but I thought I might post it anyway.
String input = "Is ? stranded//?";
String output = input
.replace("//?", "a717efbc-84a9-46bf-b1be-8a9fb714fce8")
.replace("?", "Dave")
.replace("a717efbc-84a9-46bf-b1be-8a9fb714fce8", "?");
Just protect the "//?" by replacing it with something unique (like a guid). Then you know any remaining question marks are fair game.
Use grouping. Here's one example:
import java.util.regex.*;
class Test {
public static void main(String[] args) {
Pattern p = Pattern.compile("([^/][^/])(\\?)");
String s = "Is ? stranded//?";
Matcher m = p.matcher(s);
String result = s;
if (m.find()) // find(), because matches() only succeeds when the whole string matches
result = m.replaceAll("$1XXX").replace("//", "");
System.out.println(s + " -> " + result);
}
}
Output:
$ java Test
Is ? stranded//? -> Is XXX stranded?
In this example, I'm:
first replacing any non-escaped ? with "XXX",
then, removing the "//" escape sequences.
EDIT Use if (m.find()) to ensure that you handle non-matching strings properly.
This is just a quick-and-dirty example. You need to flesh it out, obviously, to make it more robust. But it gets the general idea across.
Match on a set of characters OTHER than an escape sequence, then a regex special character. You could use an inverted character class ([^/]) for the first bit. Special case an unescaped regex character at the front of the string.
String aString = "Is ? stranded//?";
String regex = "(?<!//)[^a-z^A-Z^\\s^/]";
System.out.println(aString.replaceAll(regex, "Dave"));
The part of the regular expression [^a-z^A-Z^\\s^/] matches any character that is not a letter, whitespace, a forward slash or a literal ^.
The (?<!//) part does a negative lookbehind - see docco here for more info
This gives the output Is Dave stranded//?
try matching:
(^|(^.)|(.[^/])|([^/].))[special characters list]
I used this one:
((?:^|[^\\])(?:\\\\)*[ESCAPABLE CHARACTERS HERE])
Demo: https://regex101.com/r/zH1zO3/4
