Java - Escape star in character class in regex [duplicate]

This question already has answers here:
Including a hyphen in a regex character bracket?
(6 answers)
Closed 3 years ago.
I would like to match a line with a regular expression. The line contains two numbers which could be separated by plus, minus or a star (multiplication). However, I am not sure how to escape the star.
line.matches("[0-9]*[+-*][0-9]*");
I also tried line.matches("[0-9]*[+-\\*][0-9]*"); but it does not work either.
Should I put the star into a separate group? Why does escaping it as \\* not work in this case?

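* is not a metacharacter inside a character class ([...]), so you don't need to escape it at all. What you need to escape is -, because inside a character class it creates a range of characters, like [a-z]. That is also why "[+-\\*]" fails: escaping the star does not stop the hyphen from forming a range.
So instead of "[+-*]", which Java reads as a range from + to * and rejects with a PatternSyntaxException (+ is U+002B and comes after * at U+002A, so the range is illegal), use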
"[+\\-*]"
or place - where it can't be used as a range indicator:
at the start of the character class: [-+*]
at the end of the character class: [+*-]
or right after another range, if you have one: [a-z-+*]
BTW, if you would like to add a literal \ to your operators, you need to write it in the regex as \\ (\ is the metacharacter used for escaping and for predefined character classes like \w, so it must itself be escaped). But since \ is also special in string literals (it represents characters via \n, \r, \t or \uXXXX), you need to escape it there as well. So in the regex, \ is represented as \\, which as a string literal is written as "\\\\".
BTW 2: to represent a digit, instead of [0-9] you can use \d (written in a string literal as "\\d", since \ is special there and requires escaping).
BTW 3: if you want to make sure there is a number on each side of the operator, use + instead of * in [0-9]*, since + means one or more occurrences of the previous element, while * means zero or more.
So your code can look like:
line.matches("\\d+[-+*]\\d+");
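To check the claims above, here is a minimal sketch (the class and method names are made up for illustration). It tries to compile the broken and fixed character classes, relying on Pattern.compile throwing PatternSyntaxException for an illegal range, and then tests the final expression on a few sample lines.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class OperatorClassDemo {

    // The answer's final check: digits, one of - + *, digits.
    static boolean isSimpleExpression(String line) {
        return line.matches("\\d+[-+*]\\d+");
    }

    // Returns true if the regex compiles, false if Java rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // '+' (U+002B) to '*' (U+002A) is a reversed, illegal range.
        System.out.println(compiles("[+-*]"));     // false
        // Escaping the star does not help: the hyphen still forms a range.
        System.out.println(compiles("[+-\\*]"));   // false
        // Hyphen escaped, or moved to the start or end: all compile.
        System.out.println(compiles("[+\\-*]"));   // true
        System.out.println(compiles("[-+*]"));     // true
        System.out.println(compiles("[+*-]"));     // true

        System.out.println(isSimpleExpression("12*34")); // true
        System.out.println(isSimpleExpression("12-34")); // true
        System.out.println(isSimpleExpression("12/34")); // false: '/' is not in the class
        System.out.println(isSimpleExpression("*34"));   // false: needs a digit before the operator
    }
}
```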

Related

Why does Java regex match underscore? [duplicate]

This question already has answers here:
Java RegEx meta character (.) and ordinary dot?
(9 answers)
Closed 2 years ago.
I was trying to match the URL pattern string.string. for any number of string. segments using ^([^\\W_]+.)([^\\W_]+.)$ as a first attempt, and it works for matching two consecutive segments. But when I generalize it to ^([^\\W_]+.)+$, it stops working and matches the wrong input "string.str_ing.".
Do you know what is incorrect with the second version?
You need to escape your . character, else it will match any character including _.
^([^\\W_]+\\.?)+$
This can be your generalized regex (note that with \\.? the dot is optional, so input without any dots matches as well).
With ^([^\\W_]+.)([^\\W_]+.)$ you match exactly two words made of a restricted set of characters, each followed by one more character. Although you have not escaped the ., it still appears to work for your input: the first group matches string, the unescaped dot matches any single character (here the literal .), and the second group matches string again.
In the latter one, the unescaped dot (.) is part of a capturing group that repeats at least once (since you use +), so any character can act as a separator. In other words, string.str_ing. is understood as:
string as the 1st word
str as the 2nd word
ing as the 3rd word
... because the unescaped dot (.) accepts any separator (both a literal . and _).
Escape the dot to make the regex work as intended:
^([^\\W_]+\\.)+$
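A small sketch comparing the unescaped and escaped patterns on the problematic input from the question (the class name is hypothetical). Both patterns are written as Java string literals, so \\W and \\. reach the regex engine as \W and \.

```java
public class DotEscapeDemo {

    // Unescaped '.': each word may be followed by ANY character.
    static final String UNESCAPED = "^([^\\W_]+.)+$";
    // Escaped dot: each word must be followed by a literal '.'.
    static final String ESCAPED = "^([^\\W_]+\\.)+$";

    public static void main(String[] args) {
        String bad = "string.str_ing.";
        String good = "string.string.string.";
        System.out.println(bad.matches(UNESCAPED)); // true: the '_' is consumed by the unescaped dot
        System.out.println(bad.matches(ESCAPED));   // false
        System.out.println(good.matches(ESCAPED));  // true
    }
}
```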
[^\W] seems a weird choice: it matches 'not a non-word character', which is equivalent to \w, i.e. a word character.
"Word characters" are uppercase letters, lowercase letters, digits and the underscore, so [^\W] on its own would match underscores; it is the extra _ in [^\W_] that excludes them again.
You probably want something more readable, like [a-z]+ or maybe [A-Za-z0-9]+.

Java - How to set the delimiter to multiple different things using .useDelimiter()?

I wish to use the Scanner method .useDelimiter() to parse a file. Previously I would have used a series of .replaceAll() statements to replace what I wanted the delimiter to be with whitespace.
Anyway, I'm trying to make the Scanner's delimiter any of the following characters: . ( ) { } [ ] , ! and standard whitespace. How would I go about doing this?
Scanner uses a regular expression (regex) to describe its delimiter. By default it is \p{javaWhitespace}+, which represents one or more (due to the + operator) whitespace characters.
In regex, to represent a single character from a set of characters we can use a character class [...]. But since [ and ] mark the start and end of a character class, these characters are metacharacters (even inside a character class). To treat them as literals we need to escape them first. We can do it by
adding \ (in a string literal written as "\\") before them,
or by placing them in \Q...\E, which represents a quoted section (where all characters are treated as literals, not metacharacters).
So a regex representing one of the ( ) { } [ ] , ! characters can look like "[\\Q(){}[],!\\E]".
If you also want to support the standard delimiter, you can combine this regex with \p{javaWhitespace}+ using the OR operator, |.
So your code can look like:
yourScanner.useDelimiter("[\\Q(){}[],!\\E]|\\p{javaWhitespace}+");
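A minimal sketch of the delimiter in use, assuming the input is a plain String rather than a file (the class and helper names are hypothetical). One caveat: each bracket alternative matches a single character, so two adjacent delimiters such as ")" followed by a space may produce an empty token. The sketch therefore also shows a variant, an assumption rather than part of the answer, that wraps the whole alternation in (?:...)+ so a run of delimiters counts as one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {

    // The answer's delimiter: one of ( ) { } [ ] , ! OR a run of whitespace.
    static final String DELIMITER = "[\\Q(){}[],!\\E]|\\p{javaWhitespace}+";

    // Assumed variant (not from the answer): any run of those characters
    // and/or whitespace is a single delimiter, so adjacent ones give no empty tokens.
    static final String RUN_DELIMITER = "(?:[\\Q(){}[],!\\E]|\\p{javaWhitespace})+";

    // Collects every token the Scanner returns for the given input and delimiter.
    static List<String> tokens(String input, String delimiter) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiter);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("call(arg1,arg2)", DELIMITER));     // [call, arg1, arg2]
        System.out.println(tokens("call(a, b) {c}!", RUN_DELIMITER)); // [call, a, b, c]
    }
}
```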

Java Regular Expression - how to use backslash [duplicate]

This question already has answers here:
java, regular expression, need to escape backslash in regex
(4 answers)
Closed 6 years ago.
I am really confused about how to escape. Sometimes I just need to prepend one backslash, but sometimes I need to prepend a double backslash, like "\\.".
Could anyone tell me why?
Also, could anyone explain the difference between
String.split("\t"),
String.split("\\t"),
String.split("\\\t"),
String.split("\\\\t")?
Backslash is a special character in string literals: we can use it to create \n or to escape " as \".
But backslash is also special in the regular expression engine: for instance, we use it for predefined character classes like \w, \d and \s.
So if you want to create a string which represents the regex text \w, you need to write it as "\\w".
If you want to write a regex which represents a literal \, then the regex text needs to look like \\, which means the String representing such text needs to be written as "\\\\".
In other words, we need to escape the backslash twice:
- once in the regex: \\
- and once in the string: "\\\\".
If you want to pass the regex engine a literal tab, you don't need to escape a backslash at all. Java understands "\t" as a string containing a tab character, and you can pass such a string to your regex engine without problems.
For convenience, Java's regex engine interprets the text \t (also \r and \n) the same way string literals interpret "\t". In other words, we can pass the regex engine text consisting of a \ character followed by a t character and be sure it will be interpreted as a tab character.
So code like split("\t") or split("\\t") will split on tab. The same goes for split("\\\t"): that string is a \ followed by an actual tab character, which the regex engine reads as an escaped (and therefore literal) tab.
Code like split("\\\\t") will split not on a tab character, but on a \ character followed by t. That happens because "\\\\", as explained, represents the text \\, which the regex engine sees as an escaped \ (so it is treated as a literal).
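The four variants from the question can be checked with a short sketch (the class and helper names are hypothetical); the helper just counts how many pieces split produces for a given input and regex.

```java
public class SplitEscapeDemo {

    // Number of pieces String.split produces for the given input and regex.
    static int pieces(String input, String regex) {
        return input.split(regex).length;
    }

    public static void main(String[] args) {
        String tabbed = "a\tb";    // 'a', TAB, 'b'
        String literal = "a\\tb";  // 'a', backslash, 't', 'b'

        // Java string holds a real TAB; the regex engine matches it literally.
        System.out.println(pieces(tabbed, "\t"));     // 2
        // Java string holds backslash + 't'; the regex engine reads it as TAB.
        System.out.println(pieces(tabbed, "\\t"));    // 2
        // Java string holds backslash + real TAB: an escaped TAB, still a TAB.
        System.out.println(pieces(tabbed, "\\\t"));   // 2
        // Java string holds two backslashes + 't': a literal backslash, then 't'.
        System.out.println(pieces(tabbed, "\\\\t"));  // 1 (input has no backslash)
        System.out.println(pieces(literal, "\\\\t")); // 2
    }
}
```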

regex for allowing only certain special characters and also including the alphanumerical characters

I'm struggling with REGEX and require it for a program.
The input should allow only alphanumeric characters plus these special characters: comma, colon, space, / and -.
I have tried (^[a-zA-Z0-9,:\S/-]*$)
As far as I understand (please correct me if I'm wrong):
a-zA-Z0-9 - The alphanumerical keys.
,: - Comma and colon
\S - Space
/ - I'm not sure how to represent a forward slash, so I escaped it
- - Dash, which I'm also not sure needs to be escaped.
I would appreciate it if this could be corrected, along with an explanation of each part.
Thanks in advance.
You can replace a-zA-Z0-9 with just \\w, which is short for [a-zA-Z_0-9] (note that this also allows the underscore). Furthermore, \\S is any character that is not whitespace; you should use \\s instead (which also admits tabs and line breaks, not only spaces). You don't need to escape /, and not even - if it's the first or last character: only when it's placed between two characters can it be interpreted as a range, and then you'd have to escape it. So, as a Java string literal, your regex can look like "^([\\w,:\\s/-]*)$".
The \S shorthand matches any character except whitespace, the opposite of what you want. Lowercase \s matches whitespace; in Java that is [ \t\n\x0B\f\r]. But if you only want spaces, just put a space in the character class.
A hyphen - needs to be escaped inside a character class unless it's the first or last character in the class, but you can always escape it just to be sure.
Slashes / don't need to be escaped. They're escaped in other languages where they are used as pattern delimiters, i.e. /regex/i.
Besides the hyphen, the characters that usually need escaping inside a character class are the backslash \\ and the closing bracket \]. In Java, escape an opening bracket \[ as well, since Java reads an unescaped [ inside a class as the start of a nested class.
Remember that in Java string literals you need double backslashes (one level is interpreted by Java, the other by the regex engine).
Regex
pattern = "^[a-zA-Z0-9 ,:/\\-]*$"
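A minimal sketch using the answer's pattern, with a few sample inputs chosen for illustration (the class and method names are hypothetical). Matcher.matches() already requires the whole input to match, so the ^ and $ anchors are redundant here but harmless.

```java
import java.util.regex.Pattern;

public class AllowedCharsDemo {

    // Letters, digits, space, comma, colon, slash and hyphen; nothing else.
    // The hyphen is escaped as in the answer; placing it last would also work.
    static final Pattern ALLOWED = Pattern.compile("^[a-zA-Z0-9 ,:/\\-]*$");

    static boolean isAllowed(String input) {
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Meeting 12:30, room A/B-2")); // true
        System.out.println(isAllowed("tab\there"));                 // false: a tab is not a space
        System.out.println(isAllowed("under_score"));               // false
        System.out.println(isAllowed("semi;colon"));                // false
    }
}
```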
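Move the start-of-line ^ and end-of-line $ anchors outside the group, like
^([a-zA-Z0-9,:\S/-]*)$
That, together with replacing \S by a literal space (\S matches any non-whitespace character, not a space), should do it.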

Java Regex Escape Characters

I'm learning Regex, and running into trouble in the implementation.
I found the RegexTestHarness in the Java Tutorials, and when I run it, the following string correctly identifies my pattern:
[\d|\s][\d]\.
(My pattern is any double digit, or any single digit preceded by a space, followed by a period.)
That string is obtained by this line in the code:
Pattern pattern =
Pattern.compile(console.readLine("%nEnter your regex: "));
When I try to write a simple class in Eclipse, it tells me the escape sequences are invalid, and won't compile unless I change the string to:
[\\d|\\s][\\d]\\.
In my class I'm using Pattern pattern = Pattern.compile(...) with that string.
When I put this string back into the TestHarness it doesn't find the correct matches.
Can someone tell me which one is correct? Is the difference due to some formatting done by console.readLine()?
\ is a special character in String literals "...". It is used to escape other special characters, or to create characters like \n, \r and \t.
To put a \ character into a string literal that will be passed to the regex engine, you need to escape it by adding another \ before it (just as you do in a regex when you need to escape metacharacters like the dot: \.). So a String representing \ is written as "\\".
This problem doesn't exist when you read data from the user, because the input is already taken literally: even if the user types \n in the console, it will be interpreted as the two characters \ and n.
Also, there is no point in adding | inside a character class [...] unless you intend the class to also match the | character. Remember that [abc] is the same as (a|b|c), so there is no need for | in "[\\d|\\s]".
If you want to represent a backslash in a Java string literal you need to escape it with another backslash, so the string literal "\\s" is two characters, \ and s. This means that to represent the regular expression [\d\s][\d]\. in a Java string literal you would use "[\\d\\s][\\d]\\.".
Note that I also made a slight modification to your regular expression: [\d|\s] will match a digit, whitespace, or the literal | character. You just want [\d\s]. A character class already means "match one of these"; since you don't need | for alternation within a character class, it loses its special meaning there.
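To see both points in practice, here is a small sketch (hypothetical class name): it confirms that the literal "\\d" is two characters long, and compares the class with and without | on a few sample strings.

```java
public class PipeInClassDemo {

    static final String WITH_PIPE = "[\\d|\\s][\\d]\\.";
    static final String WITHOUT_PIPE = "[\\d\\s]\\d\\.";

    public static void main(String[] args) {
        // The Java literal "\\d" holds two characters: a backslash and 'd'.
        System.out.println("\\d".length());               // 2
        // Inside [...] the '|' is an ordinary character, so it matches itself.
        System.out.println("|5.".matches(WITH_PIPE));     // true
        System.out.println("|5.".matches(WITHOUT_PIPE));  // false
        System.out.println(" 5.".matches(WITHOUT_PIPE));  // true
        System.out.println("42.".matches(WITHOUT_PIPE));  // true
    }
}
```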
"My pattern is any double digit or single digit preceded by a space, followed by a period."
The correct regex is:
Pattern pattern = Pattern.compile("(\\s\\d|\\d{2})\\.");
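A sketch of this regex used with Matcher.find() to collect every occurrence in a longer text (the class name and sample text are illustrative). Note that for the first alternative the leading space is part of the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberPeriodDemo {

    // Either whitespace + one digit, or exactly two digits, then a literal period.
    static final Pattern PATTERN = Pattern.compile("(\\s\\d|\\d{2})\\.");

    // Returns every match found in the text, in order.
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("Items 7. and 42. but not x3.")); // [ 7., 42.]
    }
}
```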
Also, if you take text from user input and want it matched literally rather than as a regex, call:
Pattern.quote(userInput);
to escape all the regex special characters.
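A minimal sketch of the difference, with hypothetical helpers containsRegex and containsLiteral: a raw string is compiled as a regex, while Pattern.quote makes it match only the exact text.

```java
import java.util.regex.Pattern;

public class QuoteDemo {

    // Treats userInput as a regular expression.
    static boolean containsRegex(String text, String userInput) {
        return Pattern.compile(userInput).matcher(text).find();
    }

    // Treats userInput as plain text: Pattern.quote disables its regex meaning.
    static boolean containsLiteral(String text, String userInput) {
        return Pattern.compile(Pattern.quote(userInput)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsRegex("abc", "a.c"));     // true: '.' matches 'b'
        System.out.println(containsLiteral("abc", "a.c"));   // false: needs a real '.'
        System.out.println(containsLiteral("a.c", "a.c"));   // true
        System.out.println(containsRegex("1+1=2", "1+1"));   // false: means "one or more 1s, then 1"
        System.out.println(containsLiteral("1+1=2", "1+1")); // true
    }
}
```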
You need double escaping because the first level is handled by the Java compiler for the string literal, and the second is passed on to the regex engine.
What is happening is that escape sequences are evaluated twice: once by Java, and then once by your regex.
The result is that you need to escape the escape character when you use a regex escape sequence.
For instance, if you needed a digit, you'd use
"\\d"
