Pattern for Guava Splitter - java

I need to split a String by comma, dot, or backslash:
Pattern stringPattern = Pattern.compile("\\s+|,|\\\\|");
Splitter.on(stringPattern).omitEmptyStrings().split(description);
but this pattern doesn't work. What is wrong?

Why not use a CharMatcher?
Splitter.on(CharMatcher.anyOf(",.\\")).omitEmptyStrings().split(description);
Given your simple problem, I don't think you need regular expressions.

The correct regex for comma or dot or backslash is [.,\\], so in Java that's
Pattern.compile("[.,\\\\]")
I do like Olivier's suggestion of CharMatcher though.
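For readers who want to stay with java.util.regex, the character-class pattern above can be sketched as follows. This is only an illustration: the class name DelimiterSplitDemo and the helper splitNonEmpty are made up for this example, and the manual empty-string filter is a rough stand-in for what omitEmptyStrings() does on a Guava Splitter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Guava or the JDK.
final class DelimiterSplitDemo {
    // Regex [.,\\] matches one comma, dot, or backslash. Each backslash is
    // doubled again inside the Java string literal, giving "[.,\\\\]".
    private static final Pattern DELIMITERS = Pattern.compile("[.,\\\\]");

    // Splits on any single delimiter and drops the empty pieces that
    // adjacent or leading delimiters produce.
    static List<String> splitNonEmpty(String input) {
        List<String> parts = new ArrayList<>();
        for (String part : DELIMITERS.split(input)) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // The Java literal "a,b.c\\d,,e" is the text a,b.c\d,,e
        System.out.println(splitNonEmpty("a,b.c\\d,,e")); // prints [a, b, c, d, e]
    }
}
```

Precompiling the Pattern once in a static field avoids recompiling it on every call.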

I'd use String.split with a regular expression. The following should work (I have not tried it):
description.split("[,.\\\\]")
Then check for empty strings (Splitter has a dedicated API for this, omitEmptyStrings()).
Patterns are useful for identifying "groups". Any regex-based splitting can equally be done with String.split (instead of a Pattern); that is not to discourage anyone from using Guava!

Related

Replace a String containing "$" with "\$" in Java

How can I do it? I did some research but could not find a clear answer. I tried to use
pass = pass.replaceAll("$", "\\$");
but it does not work.
Use
pass = pass.replace("$", "\\$");
It will also replace all occurrences. See JavaDoc.
If you prefer the hard way and want to use a regex, you need:
pass = pass.replaceAll("\\$", "\\\\\\$");
This can be simplified with Matcher.quoteReplacement() but still, only use replaceAll() when you need to replace something that matches a regular expression, and use replace() when you have to replace a literal sequence.
The problem is that String.replaceAll uses regular expressions, where both \ and $ have special meanings. You don't want that as far as I can tell - you just want to replace the strings verbatim. As such, you should use String.replace:
pass = pass.replace("$", "\\$");
(Personally I think the fact that replaceAll uses regular expressions is a design mistake, but that's another matter.)
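The two approaches from the answers above can be compared side by side in a small sketch. The class name DollarEscapeDemo and its method names are invented for illustration; the regex variant uses Pattern.quote and Matcher.quoteReplacement so that neither side needs hand-written escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for comparing replace() and replaceAll().
final class DollarEscapeDemo {
    // Literal replacement: no regex is involved, so "$" is just a character.
    static String withReplace(String s) {
        return s.replace("$", "\\$");
    }

    // Regex replacement: Pattern.quote makes "$" match literally, and
    // Matcher.quoteReplacement stops "\" and "$" in the replacement from
    // being treated as escape or group-reference syntax.
    static String withReplaceAll(String s) {
        return s.replaceAll(Pattern.quote("$"), Matcher.quoteReplacement("\\$"));
    }

    public static void main(String[] args) {
        String pass = "pa$$word";
        System.out.println(withReplace(pass));    // prints pa\$\$word
        System.out.println(withReplaceAll(pass)); // prints pa\$\$word
    }
}
```

Both methods give the same result here; replace() is the simpler choice whenever the search term is a fixed string.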

Regex with -, ::, ( and )

I need to split the string
(age-is-25::OR::last_name-is-qa6)::AND::(age-is-20::OR::first_name-contains-test)
into
string[0] = (age-is-25::OR::last_name-is-qa6)
string[1] = AND
string[2] = (age-is-20::OR::first_name-contains-test)
I tried writing so many regex expressions, but nothing works as expected.
With the following regex, Matcher.groupCount() returns 2, but when I assign the results to an ArrayList the elements are null.
Pattern pattern = Pattern.compile("(\\)::)?|(::\\()?");
I tried to split it using ):: or ::(.
I know the regex looks too stupid, but being a beginner this is the best I could write.
You can use positive lookahead and lookbehind to match the first and last parentheses.
String str = "(age-is-25::OR::last_name-is-qa6)::AND::(age-is-20::OR::first_name-contains-test)";
for (String s : str.split("(?<=\\))::|::(?=\\()"))
    System.out.println(s);
Outputs:
(age-is-25::OR::last_name-is-qa6)
AND
(age-is-20::OR::first_name-contains-test)
Just a note however: It seems like you are parsing some kind of recursive language. Regular expressions are not good at doing this. If you are doing advanced parsing I would recommend you to look at other parsing methods.
To me it looks like a big part of your stress comes from the need to escape special characters in your search term. I highly recommend not escaping special characters manually, but using Pattern.quote(...) for the escaping instead.
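As a rough sketch of that suggestion, the lookaround pattern from the earlier answer can be assembled from Pattern.quote(...) pieces, so that only the lookaround syntax itself is written by hand. The class name QuotedSplitDemo and the method splitGroups are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: builds the split regex from quoted literals.
final class QuotedSplitDemo {
    // Pattern.quote wraps each literal so its characters lose any regex meaning.
    private static final String SEP = Pattern.quote("::");

    private static final Pattern SPLITTER = Pattern.compile(
            "(?<=" + Pattern.quote(")") + ")" + SEP      // "::" preceded by ")"
            + "|"
            + SEP + "(?=" + Pattern.quote("(") + ")");   // "::" followed by "("

    static String[] splitGroups(String expression) {
        return SPLITTER.split(expression);
    }

    public static void main(String[] args) {
        String str = "(age-is-25::OR::last_name-is-qa6)::AND::(age-is-20::OR::first_name-contains-test)";
        for (String s : splitGroups(str)) {
            System.out.println(s);
        }
    }
}
```

Because the parentheses sit inside lookarounds, they are not consumed by the split and stay attached to the groups.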
This should work:
"(?<=\\))::|::(?=\\()"
This should work for you.
\)::|::\(
textString.split("\\)::|::\\(")
should work.

Splitting string with character sequence as a delimiter

The requirement is to split strings in Java so that the following
"this#{s}is#{s}a#{s}string"
would result in the following array
["this","is","a","string"]
As you can see here the delimiter is the character sequence "#{s}".
What is the fastest, most efficient way of doing this using existing tools?
Am I right to assume that using a regex (String.split()) is a bit wasteful because we are splitting on a static string?
I got the assumption from here http://www.javamex.com/tutorials/regular_expressions/splitting_tokenisation_performance.shtml .
But I cannot use StringTokenizer since the delimiter is a sequence of char.
Note: currently I'm using String.split() and have no problem with that. This is pure curiosity.
Faster than using String.split is Pattern.split: i.e., precompile the pattern and store that for subsequent use. If you use the same pattern all the time, and do a lot of splitting using that pattern, it may be worth putting that pattern into a static field or something.
Also, if your pattern contains no regex metacharacters, you can pass in Pattern.LITERAL when creating the pattern. This is something you can't do with String.split. :-P
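A minimal sketch of that advice, assuming the delimiter "#{s}" from the question: the pattern is compiled once with Pattern.LITERAL and kept in a static field. The class name LiteralSplitDemo is made up for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class: precompiled literal delimiter.
final class LiteralSplitDemo {
    // Pattern.LITERAL treats "#{s}" as plain text, so the braces need no
    // escaping. The pattern is compiled once and reused for every split.
    private static final Pattern DELIMITER = Pattern.compile("#{s}", Pattern.LITERAL);

    static String[] split(String input) {
        return DELIMITER.split(input);
    }

    public static void main(String[] args) {
        String[] parts = split("this#{s}is#{s}a#{s}string");
        System.out.println(Arrays.toString(parts)); // prints [this, is, a, string]
    }
}
```

Whether this is measurably faster than String.split() depends on the workload, so it is worth benchmarking before committing to it.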

Refactor Regex Pattern - Java

I have the string aaaa_bb_cc to match and have written a regex pattern like
\\w{4}+\\_\\w{2}\\_\\w{2} and it works. Is there a simpler regex that does the same?
You don't need to escape the underscores:
\w{4}+_\w{2}_\w{2}
And you can collapse the last two parts, if you don't capture them anyway:
\w{4}+(?:_\w{2}){2}
Doesn't get shorter, though.
(Note: Re-add the needed backslashes for Java's strings, if you like; I prefer to omit them while talking about regular expressions :))
I sometimes do what I call "meta-regexing" as follows:
String pattern = "x{4}_x{2}_x{2}".replace("x", "[a-z]");
System.out.println(pattern); // prints "[a-z]{4}_[a-z]{2}_[a-z]{2}"
Note that this doesn't use \w, which can match an underscore. That is, your original pattern would match "__________".
If x really needs to be replaced with [a-zA-Z0-9], then just do it in the one place (instead of 3 places).
Other examples
Regex for metamap in Java
How do I convert CamelCase into human-readable names in Java?
Yes, you can use just \\w{4}_\\w{2}_\\w{2} or maybe \\w{4}(_\\w{2}){2}.
It looks like your \w does not need to match an underscore, so you can use [a-zA-Z0-9] instead:
[a-zA-Z0-9]{4}_[a-zA-Z0-9]{2}_[a-zA-Z0-9]{2}
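The point about \w also matching underscores can be checked with a short sketch. The class WordClassDemo and its method names are hypothetical; the alphanumeric pattern is built with the "meta-regexing" replace trick described above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class contrasting \w with an explicit character class.
final class WordClassDemo {
    // \w matches letters, digits, and the underscore.
    private static final Pattern WITH_W = Pattern.compile("\\w{4}_\\w{2}_\\w{2}");

    // "Meta-regexing": write the shape once, then substitute the class.
    private static final Pattern ALNUM =
            Pattern.compile("x{4}_x{2}_x{2}".replace("x", "[a-zA-Z0-9]"));

    static boolean matchesWithW(String s) {
        return WITH_W.matcher(s).matches();
    }

    static boolean matchesAlnum(String s) {
        return ALNUM.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Ten underscores laid out as 4 + separator + 2 + separator + 2.
        String underscores = "____" + "_" + "__" + "_" + "__";
        System.out.println(matchesWithW("aaaa_bb_cc")); // prints true
        System.out.println(matchesAlnum("aaaa_bb_cc")); // prints true
        System.out.println(matchesWithW(underscores));  // prints true
        System.out.println(matchesAlnum(underscores));  // prints false
    }
}
```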

Escaping a String from getting regex parsed in Java

In Java, suppose I have a String variable S, and I want to search for it inside of another String T, like so:
if (T.matches(S)) ...
(note: the above line was T.contains() until a few posts pointed out that that method does not use regexes. My bad.)
But now suppose S may have unsavory characters in it. For instance, let S = "[hi". The left square bracket is going to cause the regex to fail. Is there a function I can call to escape S so that this doesn't happen? In this particular case, I would like it to be transformed to "\[hi".
String.contains does not use regex, so there isn't a problem in this case.
Where a regex is required, rather than rejecting strings with regex special characters, use java.util.regex.Pattern.quote to escape them.
As Tom Hawtin said, you need to quote the pattern. You can do this in two ways (edit: actually three ways, as pointed out by @diastrophism):
Surround the string with "\Q" and "\E", like:
if (T.matches("\\Q" + S + "\\E"))
Use Pattern instead. The code would be something like this:
Pattern sPattern = Pattern.compile(S, Pattern.LITERAL);
if (sPattern.matcher(T).matches()) { /* do something */ }
This way, you can cache the compiled Pattern and reuse it. If you are using the same regex more than once, you almost certainly want to do it this way.
Note that if you are using regular expressions to test whether a string is inside a larger string, you should put .* at the start and end of the expression. But this will not work if you are quoting the pattern, since it will then be looking for actual dots. So, are you absolutely certain you want to be using regular expressions?
Try Pattern.quote(String). It will fix up anything that has special meaning in the string.
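A short sketch of this, assuming the goal is to test whether S occurs anywhere inside T: Pattern.quote makes S literal, and Matcher.find() searches for it anywhere in T, so no surrounding .* is needed. The class name QuoteSearchDemo and the method containsViaRegex are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: literal substring search through the regex API.
final class QuoteSearchDemo {
    // Pattern.quote turns the needle into a literal pattern; find() looks
    // for it anywhere in the haystack rather than requiring a full match.
    static boolean containsViaRegex(String haystack, String needle) {
        return Pattern.compile(Pattern.quote(needle)).matcher(haystack).find();
    }

    public static void main(String[] args) {
        String t = "say [hi to them";
        String s = "[hi";
        System.out.println(containsViaRegex(t, s)); // prints true
        System.out.println(t.contains(s));          // prints true, no regex involved
    }
}
```

For a plain substring test, String.contains() gives the same answer with less machinery; the quoted pattern is mainly useful when S has to be embedded in a larger regex.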
Any particular reason not to use String.indexOf() instead? That way it will always be interpreted as a regular string rather than a regex.
Regex uses the backslash character '\' to escape a literal. Given that Java also uses the backslash character, you would need to use a double backslash like:
String S = "\\[hi";
That will become the String:
\[hi
which will be passed to the regex.
Or if you only care about a literal String and don't need a regex you could do the following:
if (T.indexOf("[hi") != -1) { /* do something */ }
T.contains() (according to the Javadoc: http://java.sun.com/javase/6/docs/api/java/lang/String.html) does not use regexes; contains() simply delegates to indexOf().
So there are no regexes used here. Were you thinking of some other String method?
