Why doesn't a dot inside square brackets match any character? - java

Why doesn't the Java regular expression [.]+ match my "foo" text, while .+ matches it perfectly?

[.] is equivalent to escaping the . (dot) character, i.e. "\\." in a Java string literal.
Once the dot appears inside a character class, it loses its status as a special character and matches only a literal dot.
As "foo" doesn't contain any dots, nothing is matched. .+, on the other hand, is a greedy wildcard expression that matches one or more of any character except line terminators.
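A minimal sketch of the difference, using String.matches (which requires the whole input to match). The class name DotInClassDemo, the helper fullMatch and the sample strings are only illustrative.

```java
class DotInClassDemo {
    // True if the entire input matches the regex (String.matches is anchored at both ends).
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[.]+", "foo"));  // false: "foo" has no literal dots
        System.out.println(fullMatch("[.]+", "..."));  // true: three literal dots
        System.out.println(fullMatch("\\.+", "..."));  // true: same meaning as [.]+
        System.out.println(fullMatch(".+", "foo"));    // true: the wildcard matches each letter
    }
}
```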


How to write a Regular expression to match any non alphabet or number and also matching dot

I need to match any special character in a string. For example, the string may contain & % € ( ) and so on. It may also contain Unicode letters such as ä ö å.
But I also want to match a dot ".". For example, for the string "8x8 Inc." it should return true, because it contains a dot.
I have tried a few expressions so far, but none of them worked for me. Please let me know how it can be done. Thanks in advance!
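You can use this one:
[^a-zA-Z\d\s] -> basically anything outside the group of ASCII letters (a-z, A-Z), digits and whitespace. It will match all other characters, including letters such as ä, dots, commas, braces etc.
A simpler version would be [^\w\s], which matches any character that is neither a word character nor whitespace. Note that \w also covers the underscore, and that in Java \w is ASCII-only by default, so this version still matches ä ö å unless the Pattern.UNICODE_CHARACTER_CLASS flag is set.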
In a Java regex, .* matches any sequence of characters (except line terminators, by default).
If you want to match only a dot (.), escape it as \. so that it matches only a literal dot in the string.
In a Java program you have to write it like this:
String regex = "\\.";
Take a look at Unicode character classes. For your example, I think something like "(\\p{IsAlphabetic}|\\d)+" should work to match runs of letters (including ä ö å) and digits.
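To compare the suggestions in this thread, here is a rough sketch that checks whether a string contains at least one "special" character. It assumes find() semantics (a match anywhere in the input) and, for the second pattern, that letters such as ä ö å should not count as special. The names SpecialCharDemo, ASCII_SPECIAL, UNICODE_SPECIAL and hasSpecial are made up for the example.

```java
import java.util.regex.Pattern;

class SpecialCharDemo {
    // Anything that is not an ASCII letter, a digit or whitespace.
    static final Pattern ASCII_SPECIAL = Pattern.compile("[^a-zA-Z\\d\\s]");
    // Assumption: Unicode letters are ordinary, so they are excluded here as well.
    static final Pattern UNICODE_SPECIAL = Pattern.compile("[^\\p{IsAlphabetic}\\d\\s]");

    // find() succeeds if the pattern matches anywhere in the input.
    static boolean hasSpecial(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String umlauts = "\u00e4\u00f6\u00e5"; // the letters ä ö å
        System.out.println(hasSpecial(ASCII_SPECIAL, "8x8 Inc."));   // true: the dot
        System.out.println(hasSpecial(ASCII_SPECIAL, "8x8 Inc"));    // false
        System.out.println(hasSpecial(ASCII_SPECIAL, umlauts));      // true: not ASCII letters
        System.out.println(hasSpecial(UNICODE_SPECIAL, umlauts));    // false: alphabetic
        System.out.println(hasSpecial(UNICODE_SPECIAL, "8x8 Inc.")); // true: the dot
    }
}
```

Which of the two patterns is appropriate depends on whether non-ASCII letters should be reported as special characters.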

Can I negate the dot?

The following regular expression matches the character a:
"a"
The following regular expression matches all characters except a:
"[^a]"
The following regular expression matches a ton of characters:
"."
How do I match everything that is not matched by "."? I can't use the same technique as above:
"[^.]"
because inside the brackets, the . changes meaning and only stands for the character . itself :(
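The negative lookahead below will work.
(?:(?!.)[\S\s])
As a Java string, the regex would be:
"(?:(?!.)[\\S\\s])"
The idea behind the above regex is that it matches only the characters that a dot does not match. By default these are the line terminators \n, \r, \u0085, \u2028 and \u2029. Tabs (\t) and form feeds (\f) are matched by a dot, so this regex does not match them. With the DOTALL flag the dot matches every character, and this regex matches nothing.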
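A small sketch to see which characters the lookahead version actually picks up, assuming the default flags (no DOTALL, no UNIX_LINES). The class name NotDotDemo and the helper nonDotChars are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NotDotDemo {
    // Matches one character at a position where the dot would fail to match.
    static final Pattern NOT_DOT = Pattern.compile("(?:(?!.)[\\S\\s])");

    // Collects the numeric char value of every character matched by NOT_DOT.
    static List<Integer> nonDotChars(String input) {
        List<Integer> found = new ArrayList<>();
        Matcher m = NOT_DOT.matcher(input);
        while (m.find()) {
            found.add((int) m.group().charAt(0));
        }
        return found;
    }

    public static void main(String[] args) {
        // The tab is matched by the dot, so only \n (10) and \r (13) are reported.
        System.out.println(nonDotChars("a\tb\nc\rd")); // [10, 13]
    }
}
```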
"[^\\.]"
Use a double backslash to escape regex special characters in a Java string. For example:
\\.\\]\\[\\-\\)\\(\\?

Java regex "[.]" vs "."

I'm trying to use some regex in Java and I came across this when debugging my code.
What's the difference between [.] and .?
I was surprised that .at would match "cat" but [.]at wouldn't.
[.] matches a dot (.) literally, while . matches any character except line terminators such as \n (unless you use DOTALL mode).
You can also use \. ("\\." in a Java string literal) to match a dot literally.
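The "cat" example from the question can be checked with a few lines. This is only a sketch: the class DotVsBracketDemo and its helpers are invented names, and the last check illustrates the DOTALL remark using Pattern.DOTALL.

```java
import java.util.regex.Pattern;

class DotVsBracketDemo {
    // Whole-string match with default flags.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    // Whole-string match with DOTALL, where the dot also matches line terminators.
    static boolean fullMatchDotAll(String regex, String input) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch(".at", "cat"));     // true: . matches 'c'
        System.out.println(fullMatch("[.]at", "cat"));   // false: 'c' is not a dot
        System.out.println(fullMatch("[.]at", ".at"));   // true
        System.out.println(fullMatch("\\.at", ".at"));   // true
        System.out.println(fullMatch(".", "\n"));        // false: newline excluded by default
        System.out.println(fullMatchDotAll(".", "\n"));  // true with DOTALL
    }
}
```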
The [ and ] are metacharacters that let you define a character class. Almost anything enclosed in square brackets is interpreted literally. You can include multiple characters as well:
[.=*&^$] // Matches any single character from the list '.','=','*','&','^','$'
There are two specific things you need to know about the [...] syntax:
The ^ symbol at the beginning of the group has a special meaning: it inverts what's matched by the group. For example, [^.] matches any character except a dot (.).
Dash - in between two characters means any code point between the two. For example, [A-Z] matches any single uppercase letter. You can use dash multiple times - for example, [A-Za-z0-9] means "any single upper- or lower-case letter or a digit".
The two constructs above (^ and -) are common to nearly all regex engines; some engines (such as Java's) define additional syntax specific only to these engines.
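The character-class rules above can be tried out with a short sketch like the following. The class name CharClassDemo and the helper fullMatch are illustrative only.

```java
class CharClassDemo {
    // Whole-string match, so each regex below is tested against a single character.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[.=*&^$]", "*"));     // true: * is literal in a class
        System.out.println(fullMatch("[.=*&^$]", "^"));     // true: ^ is literal when not first
        System.out.println(fullMatch("[.=*&^$]", "a"));     // false: not in the list
        System.out.println(fullMatch("[^.]", "a"));         // true: anything but a dot
        System.out.println(fullMatch("[^.]", "."));         // false
        System.out.println(fullMatch("[A-Za-z0-9]", "Q"));  // true: inside the A-Z range
        System.out.println(fullMatch("[A-Za-z0-9]", "-"));  // false: the dash only marks ranges
    }
}
```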
Regular-expression constructs:
. => any character (may or may not match line terminators)
To match a literal dot (.), use either of the following:
[.] => matches a dot
\\. => matches a dot
NOTE: Character classes in Java regular expressions are defined using square brackets "[ ]"; such a subexpression matches a single character from the specified set of possible characters.
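Example: in the string address, replace every "." with "[.]":
public class ReplaceDots {
    public static void main(String[] args) {
        String address = "1.1.1.1";
        // The pattern "[.]" matches a literal dot; the replacement "[.]" is inserted as plain text.
        System.out.println(address.replaceAll("[.]", "[.]")); // prints 1[.]1[.]1[.]1
    }
}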
if anything is missed please add :)

Multiple Regular Expressions

I'm not used to them and am having trouble with Java's "matches" method.
I have two files: one is 111.123.399.555.xml, the other is Conf.xml.
Now I want to select only the first file using a regular expression.
string.matches("[1-9[xml[.]]]");
doesn't work.
How can I do this?
The use of string.matches("[1-9[xml[.]]]"); will not work because [] creates a character class, not a capturing group.
What this means is that, to Java, your expression is saying "match a single character that is any of: 1 to 9, or x, or m, or l, or a literal dot". Nested brackets inside a character class form a union, and the . is literal inside a class. Since matches() has to match the entire string, only one-character strings can match.
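A quick sketch showing how Java treats the nested brackets from the question as a single character class. NestedClassDemo and fullMatch are hypothetical names used only here.

```java
class NestedClassDemo {
    // The pattern from the question: a union of 1-9, the letters x, m, l, and a dot.
    static final String REGEX = "[1-9[xml[.]]]";

    // String.matches must consume the whole input.
    static boolean fullMatch(String input) {
        return input.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("7"));  // true
        System.out.println(fullMatch("x"));  // true
        System.out.println(fullMatch("."));  // true: the dot is literal inside a class
        System.out.println(fullMatch("a"));  // false: the class is not a wildcard
        System.out.println(fullMatch("111.123.399.555.xml")); // false: more than one character
    }
}
```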
Important:
"\" is treated by Java as the escape character in string literals, so for it to reach the matcher as the regex escape character (also "\"), it must itself be escaped. Thus, when you mean to use "\" in the regex, you must actually write "\\" in the Java source.
This is a bit confusing when you are not used to it. To sum it up: to match an actual "\" character in the input, you have to write "\\\\"! The first "\\" becomes "\" for the matcher, i.e. an escape character, and the second "\\", escaped by the first, becomes the literal "\" to be matched.
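The two levels of escaping can be made visible by looking at string lengths and a couple of matches. This is just a sketch; BackslashDemo and fullMatch are names made up for the example.

```java
class BackslashDemo {
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // "\\." in source is the two-character regex \. (backslash, dot).
        System.out.println("\\.".length());                // 2
        // "\\\\" in source is the two-character regex \\ (an escaped backslash).
        System.out.println("\\\\".length());               // 2
        System.out.println(fullMatch("a\\.b", "a.b"));     // true: literal dot
        System.out.println(fullMatch("a\\.b", "axb"));     // false
        // The input "a\\b" is the three characters a, backslash, b.
        System.out.println(fullMatch("a\\\\b", "a\\b"));   // true: literal backslash
    }
}
```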
The correct pattern-string to match a ###.###.###.###.xml file name, where each "#" is always a digit, is string.matches("(\\d{3}\\.){4}xml"), and it works as follows:
The \\d matches a single digit character. It is the same as using [0-9], just simpler.
The {3} specifies matching "exactly 3 times" for the previous \\d, thus matching ###.
The \\. matches a single dot character.
The () enclosing the previous parts tells the matcher "this is a capturing group". It is used by the next {4}, creating a "match this whole ###. group exactly 4 times", which matches "###.###.###.###.".
And finally, the xml at the end of the pattern-string matches exactly "xml", which, along with the previous items, makes the exact match for the pattern "###.###.###.###.xml".
For further learning, read Java's Pattern docs.
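Putting the pieces together, a minimal sketch that tests the two file names from the question against this pattern. The class FileNameDemo and the method isNumberedXml are invented for illustration.

```java
import java.util.regex.Pattern;

class FileNameDemo {
    // Four groups of "three digits and a dot", followed by the literal text xml.
    static final Pattern NUMBERED_XML = Pattern.compile("(\\d{3}\\.){4}xml");

    static boolean isNumberedXml(String fileName) {
        return NUMBERED_XML.matcher(fileName).matches(); // whole-string match
    }

    public static void main(String[] args) {
        System.out.println(isNumberedXml("111.123.399.555.xml")); // true
        System.out.println(isNumberedXml("Conf.xml"));            // false
        System.out.println(isNumberedXml("11.123.399.555.xml"));  // false: first group has 2 digits
    }
}
```

Compiling the Pattern once and reusing it avoids recompiling the regex on every call, which String.matches would do.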
string.matches("[1-9.]+\\.xml")
should do it.
[1-9.]+ matches one or more digits between 1 and 9 and/or periods. (+ means "one or more", * means "zero or more", ? means "zero or one").
\.xml matches .xml. Since . means "any character" in a regex, you need to escape it if you want it to mean a literal period: \. (and since this is in a Java string, the backslash itself needs to be escaped by doubling).
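A short sketch of how this looser pattern behaves. Note that it accepts any mix of the digits 1-9 and dots, and that 0 is not part of the class. LooseFileNameDemo and matchesLoose are hypothetical names.

```java
class LooseFileNameDemo {
    // One or more characters from 1-9 or dot, then a literal ".xml".
    static boolean matchesLoose(String fileName) {
        return fileName.matches("[1-9.]+\\.xml");
    }

    public static void main(String[] args) {
        System.out.println(matchesLoose("111.123.399.555.xml")); // true
        System.out.println(matchesLoose("Conf.xml"));            // false: letters are not in the class
        System.out.println(matchesLoose("100.xml"));             // false: 0 is outside 1-9
        System.out.println(matchesLoose("....xml"));             // true: dots alone are accepted
    }
}
```

If file names may contain zeros, [0-9.] (or [\\d.]) would be the class to use instead.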

How to escape a square bracket for Pattern compilation?

I have a comma-separated list of regular expressions:
.{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]
I have done a split on the comma. Now I'm trying to match these regexes against a generated password. The problem is that Pattern.compile does not like square brackets that are not escaped.
Can someone please give me a simple function that takes a string such as [0-9] and returns the escaped string \[0-9\]?
For some reason, the above answer didn't work for me. For those like me who come after, here is what I found.
I was expecting a single backslash to escape the bracket, however, you must use two if you have the pattern stored in a string. The first backslash escapes the second one into the string, so that what regex sees is \]. Since regex just sees one backslash, it uses it to escape the square bracket.
\\]
In regex, that will match a single closing square bracket.
If you're trying to match a newline, on the other hand, a single backslash is enough. "\n" is a string escape sequence that inserts an actual newline character into the string, so the regex engine never sees \n; it sees the newline character and matches that. The bracket needs two backslashes because \] is not a string escape sequence, it's a regex escape sequence.
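The difference between the two kinds of escape can be checked with a small sketch. BracketEscapeDemo and fullMatch are illustrative names.

```java
class BracketEscapeDemo {
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // Regex escape: the string "\\]" holds the two characters \ and ].
        System.out.println(fullMatch("\\]", "]"));   // true
        // String escape: the regex is a single raw newline character.
        System.out.println(fullMatch("\n", "\n"));   // true
        // Regex escape for newline also works: the regex engine interprets \n itself.
        System.out.println(fullMatch("\\n", "\n"));  // true
        System.out.println("\n".length());           // 1
        System.out.println("\\n".length());          // 2
    }
}
```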
You can use Pattern.quote(String).
From the docs:
public static String quote(String s)
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.
You can use the \Q and \E escape sequences: anything between \Q and \E is treated as literal text.
\Q[0-9]\E
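Both suggestions (Pattern.quote and a hand-written \Q...\E) can be compared in a few lines. This is a sketch only; QuoteDemo and matchesLiterally are invented names, and it applies when you really want to match the text "[0-9]" itself rather than a digit.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // True if input is exactly the given text, with no regex meaning applied to it.
    static boolean matchesLiterally(String literal, String input) {
        return input.matches(Pattern.quote(literal));
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("[0-9]"));               // \Q[0-9]\E
        System.out.println(matchesLiterally("[0-9]", "[0-9]"));   // true: matched as plain text
        System.out.println(matchesLiterally("[0-9]", "5"));       // false: no longer a digit class
        System.out.println("5".matches("[0-9]"));                 // true: unquoted, it is a class
        System.out.println("[0-9]".matches("\\Q[0-9]\\E"));       // true: manual \Q...\E
    }
}
```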
Pattern.compile() likes square brackets just fine. If you take the string
".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]"
and split it on commas, you end up with five perfectly valid regexes: the first one matches eight non-line-separator characters, the second matches an ASCII digit, and so on. Unless you really want to match strings like ".{8}" and "[0-9]", I don't see why you would need to escape anything.
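To illustrate that point, here is a rough sketch that splits the list and checks a password against each rule without escaping anything. It assumes each rule only has to be found somewhere in the password (find() rather than matches()); the names PasswordRulesDemo, RULES and satisfiesAll are made up for the example.

```java
import java.util.regex.Pattern;

class PasswordRulesDemo {
    // The comma-separated list from the question, used as-is.
    static final String RULES = ".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]";

    // Returns true if every rule matches somewhere in the password.
    static boolean satisfiesAll(String password) {
        for (String rule : RULES.split(",")) {
            // Each piece compiles without any escaping.
            if (!Pattern.compile(rule).matcher(password).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(satisfiesAll("Abcdef1!")); // true: all five rules are met
        System.out.println(satisfiesAll("abcdefgh")); // false: no digit, symbol or uppercase
        System.out.println(satisfiesAll("Ab1!"));     // false: shorter than 8 characters
    }
}
```

Splitting on a plain comma only works because none of these particular rules contains a comma; a quantifier such as {8,12} would break the split.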
