I wish to use the Scanner method .useDelimiter() to parse a file. Previously I would have used a series of .replaceAll() statements to replace whatever I wanted the delimiter to be with whitespace.
Anyway, I'm trying to make a Scanner's delimiter any of the following characters: . ( ) { } [ ] , ! as well as standard whitespace. How would I go about doing this?
Scanner uses a regular expression (regex) to describe its delimiter. By default it is \p{javaWhitespace}+, which represents one or more (due to the + operator) whitespace characters.
In regex, to represent a single character from a set of characters we can use a character class [...]. But since [ and ] represent the start and end of a character class, these characters are metacharacters (even inside a character class). To treat them as literals we need to escape them first. We can do it by
adding \ (written as "\\" in a Java string) before them,
or by placing them in \Q...\E, which represents a quoted section (where all characters are treated as literals, not metacharacters).
So a regex representing one of the ( ) { } [ ] , ! characters can look like "[\\Q(){}[],!\\E]".
If you also want to keep the standard whitespace delimiter, you can combine this regex with \p{javaWhitespace}+ using the OR operator, which is |.
So your code can look like:
yourScanner.useDelimiter("[\\Q(){}[],!\\E]|\\p{javaWhitespace}+");
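For illustration, here is a small self-contained sketch that runs the delimiter above over an in-memory string instead of a file; the class name, the tokens helper and the sample input are made up for this example. One thing to watch for: because only the whitespace branch carries a +, back-to-back delimiters such as ", " leave an empty token between them. Wrapping the whole alternation in (?:...)+, a variation that is not part of the answer above, treats any run of delimiter characters as a single delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class; the input string stands in for the file contents.
public class ScannerDelimiterDemo {

    // Collects every token the scanner produces with the given delimiter regex.
    static List<String> tokens(String input, String delimiter) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiter);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "foo(bar, baz)"; // sample input, invented for this example

        // Delimiter from the answer: one bracket/comma/bang, or a run of whitespace.
        // The ", " sequence is two separate delimiter matches, so an empty token appears.
        System.out.println(tokens(input, "[\\Q(){}[],!\\E]|\\p{javaWhitespace}+"));

        // Variation: any run of delimiter characters counts as one delimiter.
        System.out.println(tokens(input, "(?:[\\Q(){}[],!\\E]|\\p{javaWhitespace})+")); // [foo, bar, baz]
    }
}
```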
I want to match any character except whitespace and hyphens. I tried this but it doesn't work:
[^\s-]
Any ideas?
[^\s-]
should work and so will
[^-\s]
[] : the character class
^ : inside a character class, ^ is the negator when it appears at the beginning
\s : short for a whitespace character
- : a literal hyphen. A hyphen is a metacharacter inside a character class, but not when it appears at the beginning or at the end.
The whitespace part can be written more simply as \S, which in Java equals [^ \t\n\x0B\f\r]. On its own, though, \S still matches the hyphen.
Which programming language are you using? Maybe you just need to escape the backslash, as in "[^\\s-]".
In Java:
String regex = "[^-\\s]";
System.out.println("-".matches(regex)); // prints "false"
System.out.println(" ".matches(regex)); // prints "false"
System.out.println("+".matches(regex)); // prints "true"
The regex [^-\s] works as expected. [^\s-] also works.
See also
Regular expressions and escaping special characters
regular-expressions.info/Character class
Metacharacters Inside Character Classes
The hyphen can be included right after the opening bracket, or right before the closing bracket, or right after the negating caret.
Note that there is no single regex standard: each language implements its own flavor, based on what its library designers chose. Take, for instance, the regex standard used by bash, documented here: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05.
If your regular expression isn't working, it may help to simplify it, for instance by using "[^ -]" if a plain space covers all the whitespace you care about.
Try [^- ]. In Java, \s matches five other characters besides the space: tab, newline, vertical tab, form feed and carriage return.
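A quick check of that difference, assuming Java as in the earlier answer (the class name is illustrative): a tab is rejected by [^\s-] but accepted by [^- ].

```java
// Hypothetical demo class comparing the two character classes on a tab.
public class HyphenSpaceDemo {
    public static void main(String[] args) {
        String tab = "\t";
        // [^- ] excludes only the hyphen and the plain space, so a tab still matches.
        System.out.println(tab.matches("[^- ]"));   // true
        // [^\s-] excludes the hyphen and every \s character, so a tab does not match.
        System.out.println(tab.matches("[^\\s-]")); // false
    }
}
```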
I'm trying to detect profanity using regex, but I want to detect the word even if it has been spaced out, like "Profa nity". However, when using the "(?x)" option, it still doesn't detect it.
I currently have:
(?ix).*Bad Word.*
I've tried using http://www.rubular.com to debug the expression, with no luck.
If it helps in any way, it's for a TeamSpeak bot where I want to kick users who have banned words in their name. The config refers to http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html, where I can't find anything relating to the (?) options.
The bot itself can be found here: https://forum.teamspeak.com/threads/51286-JTS3ServerMod-Multifunction-TS3-Server-Bot-(Idle-Record-Away-Mute-Welcome-)
when using the "(?x)" option it still doesn't want to detect
(?x) is an embedded flag option (also known as an inline modifier/option) that enables the Pattern.COMMENTS option, also known as free-spacing mode. It permits comments inside regular expressions and makes the regex engine ignore all unescaped whitespace inside the pattern. As per Free-Spacing in Character Classes:
In free-spacing mode, whitespace between regular expression tokens is ignored. Whitespace includes spaces, tabs, and line breaks. Note that only whitespace between tokens is ignored. a b c is the same as abc in free-spacing mode. But \ d and \d are not the same. The former matches d, while the latter matches a digit. \d is a single regex token composed of a backslash and a "d". Breaking up the token with a space gives you an escaped space (which matches a space), and a literal "d".
Likewise, grouping modifiers cannot be broken up. (?>atomic) is the same as (?> ato mic ) and as ( ?>ato mic). They all match the same atomic group. They're not the same as (? >atomic). The latter is a syntax error. The ?> grouping modifier is a single element in the regex syntax, and must stay together. This is true for all such constructs, including lookaround, named groups, etc.
So, to match a single space in a pattern with the (?x) modifier, you need to escape it:
String reg = "(?ix).*Bad\\ Word.*"; // Escaped space matches a space in free spacing mode
String reg = "(?ix).* Bad\\ Word .*"; // More formatting spaces, same pattern
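To see the effect, here is a small sketch with a made-up user name (class and variable names are illustrative). Without the escape, (?x) turns the pattern into .*BadWord.*, so a name containing "bad word" with a space is missed; the escaped space fixes that, and the same behavior can be had with Pattern flags instead of the inline modifiers.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; "xx bad word xx" is an invented sample name.
public class FreeSpacingDemo {
    public static void main(String[] args) {
        String name = "xx bad word xx";
        // In (?x) mode the unescaped space is ignored, so this means ".*BadWord.*".
        System.out.println(name.matches("(?ix).*Bad Word.*"));   // false
        // An escaped space survives free-spacing mode and matches a literal space.
        System.out.println(name.matches("(?ix).*Bad\\ Word.*")); // true
        // Equivalent check using Pattern flags instead of inline modifiers.
        Pattern p = Pattern.compile(".*Bad\\ Word.*", Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);
        System.out.println(p.matcher(name).matches());           // true
    }
}
```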
NOTE that you CAN'T put the space into a character class to make it meaningful in a Java regex. See below:
Java, however, does not treat a character class as a single token in free-spacing mode. Java does ignore spaces, line breaks, and comments inside character classes. So in Java's free-spacing mode, [abc] is identical to [ a b c ].
Besides, I think you actually want to make sure your pattern can match full strings that may contain line breaks. That means you need the (?s) (Pattern.DOTALL) modifier:
String reg = "(?is).*Bad Word.*";
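A minimal sketch of why (?s) matters with String.matches, assuming (as the answer guesses) that the checked text can contain a line break; the sample string is invented:

```java
// Hypothetical demo class; the two-line sample text is made up.
public class DotAllDemo {
    public static void main(String[] args) {
        String text = "first line\nBad Word";
        // Without (?s), '.' does not match '\n', so .* cannot span the line break.
        System.out.println(text.matches("(?i).*Bad Word.*"));  // false
        // With (?s) (Pattern.DOTALL), '.' also matches line terminators.
        System.out.println(text.matches("(?is).*Bad Word.*")); // true
    }
}
```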
Also, to match any whitespace, you may rely on \s:
String reg = "(?ix).*Bad\\sWord.*"; // To only match 1 whitespace
String reg = "(?ix).*Bad\\s+Word.*"; // To account for 1 or more whitespaces
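The difference between the last two patterns, shown on two invented names (class and variable names are illustrative): \s accepts exactly one whitespace character between the words, while \s+ accepts any run of them.

```java
// Hypothetical demo class; both sample names are made up.
public class WhitespaceBetweenWordsDemo {
    public static void main(String[] args) {
        String oneSpace = "Bad Word";
        String spacedOut = "Bad \t Word"; // space, tab, space

        // \s matches exactly one whitespace character.
        System.out.println(oneSpace.matches("(?ix).*Bad\\sWord.*"));   // true
        System.out.println(spacedOut.matches("(?ix).*Bad\\sWord.*"));  // false
        // \s+ accepts one or more whitespace characters.
        System.out.println(spacedOut.matches("(?ix).*Bad\\s+Word.*")); // true
    }
}
```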
Here is the error:
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 3
], [
   ^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.clazz(Pattern.java:2493)
at java.util.regex.Pattern.sequence(Pattern.java:2030)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at java.lang.String.split(String.java:2313)
at java.lang.String.split(String.java:2355)
at testJunior2013.J2.main(J2.java:31)
This is the area of the code that is causing the issues.
String[][] split = new String[1][rows];
split[0] = (Arrays.deepToString(array2d)).split("], ["); //split at the end of an array row
What does this error mean and what needs to be done to fix the code above?
TL;DR
You want:
.split("\\], \\[")
Escape each square bracket twice, once for each context in which it needs to be stripped of its special meaning: first within the regular expression, and then within the Java String.
Consider using Pattern#quote when you need your entire pattern to be interpreted literally.
Explanation
String#split works with a regular expression, and [ and ] are not ordinary characters in that context: they have a special meaning.
To strip them of their special meaning and match actual square brackets, they need to be escaped by preceding each with a backslash, that is, \[ and \].
However, in a Java String, \ is not an ordinary character either, and needs to be escaped as well.
Thus, just to split on [, the String used is "\\[", and what you are after is:
.split("\\], \\[")
A sensible alternative
However, in this case you're not just escaping a few specific characters in a regular expression; you actually want your entire pattern to be interpreted literally, and there's a method to do just that 🙂
Pattern#quote returns a pattern string for which, per its Javadoc:
Metacharacters or escape sequences in the input sequence will be given no special meaning.
I recommend, in this case, that you use the following, more sensible and readable version (it requires import java.util.regex.Pattern):
.split(Pattern.quote("], ["))
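A self-contained sketch comparing both fixes, assuming array2d is an int[][] (the question does not show its type; the values here are invented):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; array2d's type and contents are assumptions.
public class SplitRowsDemo {
    public static void main(String[] args) {
        int[][] array2d = {{1, 2}, {3, 4}, {5, 6}};
        String flat = Arrays.deepToString(array2d); // "[[1, 2], [3, 4], [5, 6]]"

        // Escaped regex: the Java literal "\\]" becomes the regex \] (a literal bracket).
        String[] escaped = flat.split("\\], \\[");
        // Pattern.quote makes every character of "], [" literal.
        String[] quoted = flat.split(Pattern.quote("], ["));

        for (String row : escaped) {
            System.out.println(row); // "[[1, 2", then "3, 4", then "5, 6]]"
        }
        System.out.println(Arrays.equals(escaped, quoted)); // true
    }
}
```

Note that the first and last pieces still carry the outer [[ and ]], so some extra trimming is needed if you want clean rows.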
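split receives a regex, and the [ and ] characters have meaning in regex, so you should escape them as \\[ and \\].
The way you are currently doing it, the [ at index 3 opens a character class that is never closed, so the parser throws that error. (Java treats a lone ] outside a character class as a literal, so the [ is the real culprit.)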
String.split() takes a regular expression as its argument, not a plain string. In a regular expression, ] and [ are special characters, which need to be preceded by backslashes to be taken literally. Use .split("\\], \\["). (The double backslashes make the Java string contain \], \[, which the regex engine then reads as literal brackets.)
.split("], [")
           ^---start of a character class, with no matching end
Change it to
.split("], \\[")
           ^---escape the [ (the backslash is doubled inside a Java string literal)
Try this, which strips the quotes and brackets before splitting on commas:
String stringToSplit = "8579.0,753.34,796.94,\"[784.2389999999999,784.34]\",\"[-4.335912230999999, -4.3603307895,4.0407909059, 4.08669583455]\",[],[],[],0.1744,14.4,3.5527136788e-15,0.330667850653,0.225286999939,Near_Crash";
String [] arraySplitted = stringToSplit.replaceAll("\"","").replaceAll("\\[","").replaceAll("\\]","").trim().split(",");
I am new to regex. I have this regex:
\[(.*[^(\]|\[)].*)\]
Basically it should take this:
[[a][b][[c]]]
And be able to replace with:
[dd[d]]
a, b, c and d are just placeholders. Needless to say, the regex isn't working: it replaces the entire string with "d" in this case.
Any explanation or aid would be great!
EDIT:
I tried another regex:
\[([^\]]{0})\]
This one worked for the case where brackets contain no inner brackets and nothing else inside. But it doesn't work for the described case.
You need to know that . (dot) is a special character which represents "any character except a line terminator", and that * is greedy, so it tries to find the longest possible match.
In your regex \[(.*[^(\]|\[)].*)\], the first .* will match as many characters as possible between the opening [ and the rest of the pattern, [^(\]|\[)].*\]. That rest can be read as: one character that is not (, ), |, [ or ], then optional further characters (.*), and finally ]. So this regex will match your entire input.
To get rid of that problem, remove both .* from your regex. Also, you don't need | or ( ) inside [^...]: inside a character class they are just literal characters.
System.out.println("[[a][b][[c]]]".replaceAll("\\[[^\\]\\[]\\]", "d"));
Output: [dd[d]]
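A runnable version of that fix, plus a variation with + (my addition, not part of the answer) for when the placeholders are longer than one character; the second input string is invented:

```java
// Hypothetical demo class for replacing innermost bracket pairs.
public class InnermostBracketsDemo {
    public static void main(String[] args) {
        // \[ literal '[', [^\]\[] one char that is neither ']' nor '[', \] literal ']'.
        System.out.println("[[a][b][[c]]]".replaceAll("\\[[^\\]\\[]\\]", "d"));     // [dd[d]]
        // Variation: '+' lets each innermost pair hold several characters.
        System.out.println("[[ab][c][[xyz]]]".replaceAll("\\[[^\\]\\[]+\\]", "d")); // [dd[d]]
    }
}
```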
\[(\[a\])(\[b\])\[(\[c\])\]\]
If you need to double backslashes in the current context (such as you are placing it in a "" style string):
\\[(\\[a\\])(\\[b\\])\\[(\\[c\\])\\]\\]
An example replacement for a, b and c is [^\]]*, or, if you need to escape backslashes, [^\\]]*.
Now you can replace capture one, capture two and capture three each with d.
If the string you are replacing in is not exactly of that format, then you want a global replacement with (\[a\]). Replacing a with a class that excludes both brackets (excluding only ] would let a match start at an outer bracket, as in [[a]) gives
(\[[^\[\]]*\])
or, with doubled backslashes,
(\\[[^\\[\\]]*\\])
Try this:
System.out.println("[[a][b][[c]]]".replaceAll("\\[[^]\\[]]", "d"));
If a, b and c are actually more than one character long, use this:
System.out.println("[[a][b][[c]]]".replaceAll("\\[[^]\\[]++]", "d"));
The idea is to use a character class that contains all characters except [ and ]. The class is [^]\[] (written "[^]\\[]" in a Java string literal), and the other square brackets in the pattern are literals.
Note that a literal closing square bracket doesn't need to be escaped at the first position in a character class, or outside a character class.
I consider myself pretty good with Regular Expressions, but this one is appearing to be surprisingly tricky: I want to trim all whitespace, except the space character: ' '.
In Java, the RegEx I have tried is: [\s-[ ]], but this one also strips out ' '.
UPDATE:
Here is the particular string that I am attempting to strip spaces from:
project team manage key
Note: it would be the characters between "team" and "manage". They appear as a long space when editing this post but display as a single space in view mode.
Try using this regular expression:
[^\S ]+
It's a bit confusing to read because of the double negative. The regular expression [\S ] matches the characters you want to keep, i.e. either a space or anything that isn't a whitespace. The negated character class [^\S ] therefore must match all the characters you want to remove.
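A minimal sketch of that pattern with String.replaceAll; the sample text is invented and uses a tab, a carriage return and a newline as the whitespace to remove:

```java
// Hypothetical demo class; the sample text is made up.
public class StripNonSpaceWhitespace {
    public static void main(String[] args) {
        String text = "project\tteam\rmanage\nkey with spaces";
        // [^\S ] = "neither non-whitespace nor space" = any whitespace except ' '.
        String stripped = text.replaceAll("[^\\S ]+", "");
        System.out.println(stripped); // projectteammanagekey with spaces
    }
}
```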
Using a Guava CharMatcher:
String text = ...
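String stripped = CharMatcher.whitespace().and(CharMatcher.isNot(' '))
        .removeFrom(text);
(Older Guava versions expose the whitespace matcher as the CharMatcher.WHITESPACE constant, which is deprecated in newer releases in favor of CharMatcher.whitespace().)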
If you actually just want that trimmed from the start and end of the string (like String.trim()) you'd use trimFrom rather than removeFrom.
There's no subtraction of character classes in Java, otherwise you could use [\s--[ ]], note the double dash. You can always simulate set subtraction using intersection with the complement, so
[\s&&[^ ]]
should work. It's no better than [^\S ]+ from the first answer, but the principle is different and it's good to know both.
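A small sketch showing that the intersection form and the double-negative form remove the same characters; the sample text is invented, and + is added so each run is removed in one match:

```java
// Hypothetical demo class; the sample text is made up.
public class IntersectionDemo {
    public static void main(String[] args) {
        String text = "project\tteam\rmanage\nkey with spaces";
        // [\s&&[^ ]]: intersection of \s with "anything but space", i.e. \s minus ' '.
        String viaIntersection = text.replaceAll("[\\s&&[^ ]]+", "");
        // Same set written with a double negative.
        String viaNegation = text.replaceAll("[^\\S ]+", "");
        System.out.println(viaIntersection);                     // projectteammanagekey with spaces
        System.out.println(viaIntersection.equals(viaNegation)); // true
    }
}
```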
I solved it with this JavaScript:
anyString.replace(/[\f\t\n\v\r]+/g, '');
It is just the set of ASCII whitespace characters excluding the blank (effectively Java's \s without the space): form feed, tab, newline, vertical tab and carriage return.
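Since the question is about Java, here is a sketch of a Java equivalent of that JavaScript line (class and sample text are invented). The vertical tab is written as \x0B, because in Java 8+ \v is a broader "vertical whitespace" class rather than just the vertical tab.

```java
// Hypothetical demo class; the sample text is made up.
public class RemoveWhitespaceExceptSpace {
    public static void main(String[] args) {
        String anyString = "project\tteam\fmanage\r\nkey with spaces";
        // Form feed, tab, newline, vertical tab (\x0B), carriage return; '+' removes whole runs.
        String cleaned = anyString.replaceAll("[\\f\\t\\n\\x0B\\r]+", "");
        System.out.println(cleaned); // projectteammanagekey with spaces
    }
}
```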