Here is the error:
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 3
], [
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.clazz(Pattern.java:2493)
at java.util.regex.Pattern.sequence(Pattern.java:2030)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at java.lang.String.split(String.java:2313)
at java.lang.String.split(String.java:2355)
at testJunior2013.J2.main(J2.java:31)
This is the area of the code that is causing the issues.
String[][] split = new String[1][rows];
split[0] = (Arrays.deepToString(array2d)).split("], ["); //split at the end of an array row
What does this error mean and what needs to be done to fix the code above?
TL;DR
You want:
.split("\\], \\[")
Escape each square bracket twice, once for each context in which it has to lose its special meaning: first within the regular expression, then within the Java String literal.
Consider using Pattern#quote when you need your entire pattern to be interpreted literally.
Explanation
String#split works with a Regular Expression but [ and ] are not standard characters, regex-wise: they have a special meaning in that context.
In order to strip them from their special meaning and simply match actual square brackets, they need to be escaped, which is done by preceding each with a backslash — that is, using \[ and \].
However, in a Java String, \ is not a standard character either, and needs to be escaped as well.
Thus, just to split on [, the String used is "\\[" and you are trying to obtain:
.split("\\], \\[")
A sensible alternative
However, in this case, you're not just semantically escaping a few specific characters in a Regular Expression, but actually wishing that your entire pattern be interpreted literally: there's a method to do just that 🙂
Pattern#quote is used to signify that the:
Metacharacters [...] in your pattern will be given no special meaning.
(from the Javadoc linked above)
I recommend, in this case, that you use the following, more sensible and readable:
.split(Pattern.quote("], ["))
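As a rough sketch of the two options above, the following compares the manually escaped pattern with Pattern.quote on the output of Arrays.deepToString. The class name SplitRowsDemo, its helper methods and the sample array are made up for illustration and are not from the question.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class SplitRowsDemo {
    // Option 1: escape each bracket for the regex, then for the Java string literal.
    static String[] splitEscaped(String text) {
        return text.split("\\], \\[");
    }

    // Option 2: let Pattern.quote turn the whole delimiter into a literal pattern.
    static String[] splitQuoted(String text) {
        return text.split(Pattern.quote("], ["));
    }

    public static void main(String[] args) {
        int[][] array2d = {{1, 2}, {3, 4}, {5, 6}};  // sample data (assumption)
        String text = Arrays.deepToString(array2d);  // "[[1, 2], [3, 4], [5, 6]]"
        // Both calls split into the pieces "[[1, 2", "3, 4" and "5, 6]]".
        System.out.println(Arrays.toString(splitEscaped(text)));
        System.out.println(Arrays.toString(splitQuoted(text)));
    }
}
```

Note that the outermost brackets stay attached to the first and last piece, since only the "], [" row boundary is used as the delimiter.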
Split receives a regex, and the [ and ] characters have meaning in regex, so you should escape them with \\[ and \\].
The way you are currently doing it, the leading ] is taken as a literal, but the [ at index 3 opens a character class that is never closed, so the parser throws that error.
String.split() takes a regular expression, not a normal string, as its argument. In a regular expression, ] and [ are special characters, which need to be preceded by backslashes to be taken literally. Use .split("\\], \\[") (the double backslashes tell Java to interpret the string as "\], \[").
.split("], [")
           ^--- start of a char class
                that is never closed
Change it to
.split("], \\[")
           ^--- escape the [
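To see the difference in practice, here is a small sketch that catches the exception for the unescaped delimiter and shows that escaping only the [ is enough to compile. The class UnclosedClassDemo, the method failsToCompile and the sample input are assumptions made for this example.

```java
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of the original question.
class UnclosedClassDemo {
    // Returns true if String.split rejects the given regex with a PatternSyntaxException.
    static boolean failsToCompile(String regex) {
        try {
            "a], [b".split(regex);  // sample input (assumption)
            return false;
        } catch (PatternSyntaxException e) {
            // getDescription() and getIndex() expose what the stack trace prints.
            System.out.println(e.getDescription() + " near index " + e.getIndex());
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsToCompile("], ["));      // the [ opens a class that never closes
        System.out.println(failsToCompile("], \\["));    // escaping the [ is sufficient
        System.out.println(failsToCompile("\\], \\["));  // escaping both is also fine
    }
}
```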
Try this:
String stringToSplit = "8579.0,753.34,796.94,\"[784.2389999999999,784.34]\",\"[-4.335912230999999, -4.3603307895,4.0407909059, 4.08669583455]\",[],[],[],0.1744,14.4,3.5527136788e-15,0.330667850653,0.225286999939,Near_Crash";
String [] arraySplitted = stringToSplit.replaceAll("\"","").replaceAll("\\[","").replaceAll("\\]","").trim().split(",");
Related
I tried this but it doesn't work :
[^\s-]
Any Ideas?
[^\s-]
should work and so will
[^-\s]
[] : the char class
^  : inside the char class, ^ is the negator when it appears at the beginning
\s : short for a whitespace character
-  : a literal hyphen. A hyphen is a meta char inside a char class, but not when it appears at the beginning or at the end
It can be done much more easily:
\S which equals [^ \t\r\n\v\f] (note that \S still matches a hyphen, so it only covers the whitespace part)
Which programming language are you using? Maybe you just need to escape the backslash, like "[^\\s-]".
In Java:
String regex = "[^-\\s]";
System.out.println("-".matches(regex)); // prints "false"
System.out.println(" ".matches(regex)); // prints "false"
System.out.println("+".matches(regex)); // prints "true"
The regex [^-\s] works as expected. [^\s-] also works.
See also
Regular expressions and escaping special characters
regular-expressions.info/Character class
Metacharacters Inside Character Classes
The hyphen can be included right after the opening bracket, or right before the closing bracket, or right after the negating caret.
Note that regex is not one standard, and each language implements its own based on what the library designers felt like. Take for instance the regex standard used by bash, documented here: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05.
If you are having problems with regular expressions not working, it might be good to simplify it, for instance using "[^ -]" if this covers all forms of whitespace in your case.
Try [^- ]; \s will match 5 other characters besides the space (like tab, newline, form feed, carriage return).
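The difference between the suggestions above shows up as soon as the input contains whitespace other than a plain space. A minimal Java sketch, where the class WhitespaceClassDemo and its method names are invented for illustration:

```java
// Hypothetical demo class comparing the character classes suggested above.
class WhitespaceClassDemo {
    // Excludes only the hyphen and the plain space character.
    static boolean notHyphenOrSpace(String s) {
        return s.matches("[^- ]");
    }

    // Excludes the hyphen and every character matched by \s.
    static boolean notHyphenOrWhitespace(String s) {
        return s.matches("[^-\\s]");
    }

    // \S alone only excludes whitespace.
    static boolean notWhitespace(String s) {
        return s.matches("\\S");
    }

    public static void main(String[] args) {
        System.out.println(notHyphenOrSpace("\t"));       // true: a tab is not a plain space
        System.out.println(notHyphenOrWhitespace("\t"));  // false: \s covers the tab
        System.out.println(notWhitespace("-"));           // true: \S still accepts a hyphen
        System.out.println(notHyphenOrWhitespace("-"));   // false
    }
}
```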
I wish to use the Scanner class method .useDelimiter() to parse a file. Previously I would have used a series of .replaceAll() statements to replace what I wanted the delimiter to be with white space.
Anyway, I'm trying to make a Scanner's delimiter any of the following characters: ., (,),{,},[,],,,! and standard white space. How would I go about doing this?
Scanner uses a regular expression (regex) to describe its delimiter. By default it is \p{javaWhitespace}+, which represents one or more (due to the + operator) whitespace characters.
In regex, a single character from a set of characters can be represented with a character class [...]. But since [ and ] mark the start and end of a character class, they are metacharacters (even inside a character class). To treat them as literals we need to escape them first. We can do it by
adding \ (in string written as "\\") before them,
or by placing them in \Q...\E which represents quote section (where all characters are considered as literals, not metacharacters).
So a regex representing one of the ( ) { } [ ] , ! characters can look like "[\\Q(){}[],!\\E]".
If you want to keep the standard delimiter as well, you can combine this regex with \p{javaWhitespace}+ using the OR operator, which is |.
So your code can look like:
yourScanner.useDelimiter("[\\Q(){}[],!\\E]|\\p{javaWhitespace}+");
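A small sketch of that delimiter applied to an in-memory string rather than a file. The class DelimiterDemo, the tokens helper and the sample input are assumptions for illustration only; the delimiter string itself is the one from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not part of the original answer.
class DelimiterDemo {
    // One of ( ) { } [ ] , ! or a run of whitespace, as described above.
    static final String DELIMITER = "[\\Q(){}[],!\\E]|\\p{javaWhitespace}+";

    // Collects every token the Scanner produces for the given input.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(DELIMITER);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input (assumption) with exactly one delimiter between tokens.
        System.out.println(tokens("a(b)c{d}e[f]g,h!i j"));
    }
}
```

Only the whitespace branch carries a +, so two adjacent punctuation delimiters (or a comma followed by a space) can produce empty tokens. The dot from the question's list is not part of this class; it could be added inside the \Q...\E section.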
Consider the array:
new Pattern[] {Pattern.compile("\\["),Pattern.compile("\\]") };
IntelliJ IDEA tells me that \\ is redundant and tells me to replace this with ], i.e. the result is:
new Pattern[] {Pattern.compile("\\["),Pattern.compile("]") };
Why in the first Pattern.compile("\\[") is the \\ OK, but for the second it is redundant?
The ] symbol is not a special regex operator outside the character class if there is no corresponding unescaped [ before it. Only special characters require escaping. A [ is a special regex operator outside a character class (as it may mark the starting point of a character class). Once the Java regular expression engine sees an unescaped [ in the pattern, it knows there must be a ] to close the character class ahead. Whether it is escaped or not, it does not matter for the engine. If there is no opening [ in the expression, the ] is treated as a mere literal ] symbol. So, [abc] will match a, b or c, and \[abc] or \[abc\] will match [abc] literal character sequence.
So, the [ should be escaped always, and ] does not have to be escaped outside a character class.
When used inside a character class, both [ and ] must be escaped inside a Java regular expression as they may form intersection/subtraction patterns, unless the ] appears at the beginning of a character class (i.e. "[a]".replaceAll("[]\\[]", "") returns a).
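The rules above for the Java flavor can be checked with a few one-liners. This is only a sketch; the class BracketEscapeDemo and its method names are made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class illustrating the Java rules described above.
class BracketEscapeDemo {
    // Outside a character class, an unescaped ] is a plain literal.
    static boolean unescapedCloseMatches(String s) {
        return Pattern.matches("]", s);
    }

    // Only the opening bracket needs escaping to match the text "[abc]".
    static boolean matchesLiteralAbc(String s) {
        return s.matches("\\[abc]");
    }

    // A ] placed first inside a character class is a literal; the [ is escaped.
    static String stripBrackets(String s) {
        return s.replaceAll("[]\\[]", "");
    }

    public static void main(String[] args) {
        System.out.println(unescapedCloseMatches("]"));  // true
        System.out.println(matchesLiteralAbc("[abc]"));  // true
        System.out.println(stripBrackets("[a]"));        // a
    }
}
```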
Other regex flavors
icu onigmo - In ICU and Onigmo regex flavor, ] behaves the same as in Java regex flavor. Languages affected: swift, ruby, r (stringr), kotlin, groovy.
pcre boost .net re2 python posix - In Boost, PCRE, ] is not a special char (i.e. needs no escaping) outside a character class, and is a special char (=needs escaping) inside a character class (where it does not need escaping only if it is the first char in the character class.) It is not an error to escape it everywhere where it is supposed to match a literal ] char. Languages/tools affected: php, perl, c#/vb.net/etc., python, sed, grep, awk, elixir, r (both default base R TRE and PCRE enabled with "perl=TRUE"), tcl, google-sheets.
ecmascript - In ECMAScript flavors, ] is not special outside a character class, while [ is special outside a character class. Inside a character class, ] must ALWAYS be escaped, even if it is the first char in the character class. [ inside a character class is not special, but escaping it is an error if the regexp is compiled with the /u flag (in JavaScript). So, be careful here. Languages affected: javascript, dart, c++, vba, google-apps-script (which uses JavaScript).
The ] is considered a metacharacter only when it is used to close a character set [...].
If there is no unclosed, unescaped opening square bracket [ before the ], then ] is a simple literal that doesn't require escaping (but allows it, which is why your IDE gives you a "warning" instead of an error).
The only place where you may want to escape ] is inside a character set, when you want the regex to treat it as a simple symbol instead of the metacharacter that closes the character set.
For instance regex like "[ab\\]cd]" represents a or b or ] or c or d.
BUT similar regex can be also written like [a-d]|]. Notice that last ] is not "special" because there is no opened character class before it. So it is considered as literal - character without special meaning, which means it doesn't require escaping.
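A quick way to convince yourself that the two spellings accept the same single characters. The class ClosingBracketDemo and its methods are hypothetical names used only for this sketch.

```java
// Hypothetical demo class comparing the two patterns from the answer above.
class ClosingBracketDemo {
    // ] escaped inside the character class.
    static boolean matchesEscapedInClass(String s) {
        return s.matches("[ab\\]cd]");
    }

    // ] left unescaped outside any character class.
    static boolean matchesUnescapedOutside(String s) {
        return s.matches("[a-d]|]");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "b", "c", "d", "]", "e"}) {
            // Both columns agree for every sample; only "e" is rejected.
            System.out.println(s + " " + matchesEscapedInClass(s) + " " + matchesUnescapedOutside(s));
        }
    }
}
```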
I am new to regex. I have this regex:
\[(.*[^(\]|\[)].*)\]
Basically it should take this:
[[a][b][[c]]]
And be able to replace with:
[dd[d]]
a, b, c and d are unrelated. Needless to say, the regex isn't working: it replaces the entire string with "d" in this case.
Any explanation or aid would be great!
EDIT:
I tried another regex,
\[([^\]]{0})\]
This one worked for the case where brackets contain no inner brackets and nothing else inside. But it doesn't work for the described case.
You need to know that the dot . is a special character which represents "any character except a line terminator", and that * is greedy, so it will try to find the longest possible match.
In your regex \[(.*[^(\]|\[)].*)\] the first .* will consume as many characters as it can between the opening [ and the remainder [^(\]|\[)].*)\], which can be understood as: one character that is not [ or ], optional other characters .*, and finally ]. So this regex will match your entire input.
To get rid of that problem, remove both .* from your regex. Also, you don't need to use | or ( ) inside [^...].
System.out.println("[[a][b][[c]]]".replaceAll("\\[[^\\]\\[]\\]", "d"));
Output: [dd[d]]
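To make the effect of the greedy .* visible, this sketch runs the question's regex and the trimmed-down one on the same input. The class GreedyDemo and its method names are made up for illustration.

```java
// Hypothetical demo class contrasting the two patterns discussed above.
class GreedyDemo {
    // The regex from the question: the greedy .* lets one match span the whole input.
    static String withGreedyPattern(String s) {
        return s.replaceAll("\\[(.*[^(\\]|\\[)].*)\\]", "d");
    }

    // Without .*: only a bracket pair around a single non-bracket character matches.
    static String withCharClassOnly(String s) {
        return s.replaceAll("\\[[^\\]\\[]\\]", "d");
    }

    public static void main(String[] args) {
        String input = "[[a][b][[c]]]";
        System.out.println(withGreedyPattern(input));  // d
        System.out.println(withCharClassOnly(input));  // [dd[d]]
    }
}
```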
\[(\[a\])(\[b\])\[(\[c\])\]\]
If you need to double backslashes in the current context (such as you are placing it in a "" style string):
\\[(\\[a\\])(\\[b\\])\\[(\\[c\\])\\]\\]
An example replacement for a, b and c is [^\]]*, or if you need to escape backslashes [^\\]]*.
Now you can replace capture one, capture two and capture three each with d.
If the string you are replacing in is not exactly of that format, then you want to do a global replacement with
(\[a\])
replacing a,
(\[[^\]]*\])
doubling backslashes,
(\\[[^\\]]*\\])
Try this:
System.out.println("[[a][b][[c]]]".replaceAll("\\[[^]\\[]]", "d"));
if a,b,c are in real world more than one character, use this:
System.out.println("[[a][b][[c]]]".replaceAll("\\[[^]\\[]++]", "d"));
The idea is to use a character class that contains all characters but [ and ]. The class is: [^]\\[] and other square brackets in the pattern are literals.
Note that a literal closing square bracket doesn't need to be escaped at the first position in a character class or outside a character class.
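A short sketch of the possessive variant on input where the bracket contents are longer than one character. The class InnermostBracketsDemo, the collapse method and the sample strings are assumptions for this example.

```java
// Hypothetical demo class for the possessive pattern shown above.
class InnermostBracketsDemo {
    // Replaces each innermost, non-empty [...] group with "d".
    static String collapse(String s) {
        return s.replaceAll("\\[[^]\\[]++]", "d");
    }

    public static void main(String[] args) {
        System.out.println(collapse("[[a][b][[c]]]"));     // [dd[d]]
        System.out.println(collapse("[[ab][cd][[ef]]]"));  // [dd[d]]
        System.out.println(collapse("[]"));                // [] (++ needs at least one character)
    }
}
```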
How can I split a string using [ as the delimiter?
String line = "blah, blah [ tweet, tweet";
if I do
line.split("[");
I get an error
Exception in thread "main" java.util.regex.PatternSyntaxException:
Unclosed character class near index 1 [
Any help?
The [ is a reserved char in regex, you need to escape it,
line.split("\\[");
Just escape it :
line.split("\\[");
[ is a special metacharacter in regex which needs to be escaped if not inside a character class such as in your case.
The split method operates using regular expressions. The character [ has special meaning in those; it is used to denote character classes between [ and ]. If you want to use a literal opening square bracket, use \\[ to escape it as a special character. There are two backslashes because a backslash is also used as an escape character in Java String literals. It can get a little confusing typing regular expressions in Java code.
Please use "\\[" instead of "[".
The [ character is interpreted as a special regex character, so you have to escape it:
line.split("\\[");
If you have to split on both [ and ], then try str.split("[\\[\\]]");
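Putting the answers above together, a minimal sketch of both splits. The class SplitOnBracketDemo, its method names and the second sample string are invented for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original answers.
class SplitOnBracketDemo {
    // Splits on a literal opening square bracket.
    static String[] splitOnOpen(String s) {
        return s.split("\\[");
    }

    // Splits on either square bracket, using a character class with both escaped.
    static String[] splitOnEither(String s) {
        return s.split("[\\[\\]]");
    }

    public static void main(String[] args) {
        String line = "blah, blah [ tweet, tweet";
        // Pieces: "blah, blah " and " tweet, tweet" (the surrounding spaces are kept).
        System.out.println(Arrays.toString(splitOnOpen(line)));
        // Pieces: "x", "y" and "z".
        System.out.println(Arrays.toString(splitOnEither("x[y]z")));
    }
}
```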