I'm sure I'm just overlooking something here...
Is there a simple way to split a String on an explicit character without applying RegEx rules?
For instance, I receive a string with a dynamic delimiter, and I know the 5th character defines the delimiter.
String s = "This,is,a,sample";
For this, it's simple to do
String delimiter = String.valueOf(s.charAt(4));
String[] result = s.split(delimiter);
However, when I have a delimiter that's a special RegEx character, this doesn't work:
String s = "This*is*a*sample";
So... is there a way to split the string on an explicit character without trying to apply extra RegEx rules? I feel like I must be missing something pretty simple.
split takes a regular expression as its argument, and * is a regex metacharacter (a quantifier meaning "zero or more of the preceding element"), so a lone * is not a valid pattern. You can use Pattern#quote to have the delimiter treated as a literal:
String[] result = s.split(Pattern.quote(delimiter));
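To make that concrete, here is a minimal sketch of the Pattern.quote approach applied to the "5th character is the delimiter" case from the question. The class and method names (LiteralSplit, splitOnFifthChar) are made up for illustration; only String.split and Pattern.quote come from the JDK.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LiteralSplit {
    // Splits s on its 5th character (index 4), treating that character
    // as a literal rather than as regex syntax.
    static String[] splitOnFifthChar(String s) {
        String delimiter = String.valueOf(s.charAt(4));
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // Each call prints [This, is, a, sample]
        System.out.println(Arrays.toString(splitOnFifthChar("This,is,a,sample")));
        System.out.println(Arrays.toString(splitOnFifthChar("This*is*a*sample")));
        System.out.println(Arrays.toString(splitOnFifthChar("This|is|a|sample")));
    }
}
```

The same helper handles ordinary delimiters and regex metacharacters, so there is no need to special-case either.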
You don't need to worry about which character it is if you quote it before compiling a Pattern:
Pattern regex = Pattern.compile(Pattern.quote(String.valueOf(s.charAt(4))));
Matcher matcher = regex.matcher(yourString);
if (matcher.find()) {
    // do something
}
You can run Pattern.quote on the delimiter before feeding it in. This returns a literal pattern string in which any regex-specific characters are escaped:
delimiter = Pattern.quote(delimiter);
StringUtils.split(s, delimiter);
That treats the delimiter as a plain character rather than a regex, so pass it in unquoted.
StringUtils is part of the Apache Commons Lang library, which has tons of useful methods. It is worth taking a look; it could save you some time in the future.
Simply put your delimiter between []
String delimiter = "["+s.charAt(4)+"]";
String[] result = s.split(delimiter);
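This works because [ ] is a regex character class, which matches any single character listed between the brackets. You can also specify a list of delimiters like [*,.+-]. Note that a few characters (^, ], and \) are still special inside the brackets and need escaping.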
Related
I have a regex pattern like "(\\d{4},\\d{2},\\d{2} :\\d{2}:\\d{2}:\\d{2})"
I am passing this pattern as argument to a function which tokenizes the input string based on ",".
Example:
func((\\d{4},\\d{2},\\d{2} :\\d{2}:\\d{2}:\\d{2}),func(n))";
How do I escape the comma in the regex while tokenizing?
Can you please post the function which tokenizes the string? Could help with respect to your code then.
Without that information, you could use split() as follows (if all you want to do is split on unescaped ","):
String s = "Messages,Hello\\,World,Hobbies,Java\\,Programming";
System.out.println(Arrays.toString(s.split("(?<!\\\\),")));
Refer - http://www.javacreed.com/how-to-split-a-string-with-escaped-delimiters/
You could replace your code with:
String str = "(\\d{4}\\,\\d{2}\\,\\d{2} \\d{2}:\\d{2}:\\d{2}), func(a)";
String[] tokens = str.split("(?<!\\\\),");
System.out.println(Arrays.toString(tokens));
This will give you a string array of tokens split on ","
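As a rough sketch of the whole round trip: split on commas that are not preceded by a backslash, then strip the escaping backslash from each token. The names EscapedCommaSplit and splitUnescaped are invented for this example, and it assumes a backslash is only ever used to escape a comma.

```java
import java.util.Arrays;

public class EscapedCommaSplit {
    // Splits on commas not preceded by a backslash, then turns each
    // remaining "\," back into a plain ",".
    static String[] splitUnescaped(String input) {
        String[] tokens = input.split("(?<!\\\\),");
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = tokens[i].replace("\\,", ",");
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "Messages,Hello\\,World,Hobbies,Java\\,Programming";
        // prints [Messages, Hello,World, Hobbies, Java,Programming]
        System.out.println(Arrays.toString(splitUnescaped(s)));
    }
}
```

String.replace works on literal text, so no regex escaping is needed for the unescape step.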
@Derryl Thomas's answer is probably the correct one.
Here is an alternate technique.
Use something else to indicate the comma in your regex.
Split based on commas.
Change the "something else" back to a comma.
For example:
Instead of "(\\d{4},\\d{2},\\d{2} :\\d{2}:\\d{2}:\\d{2})"
Use "(\\d{4}boppity\\d{2}boppity\\d{2} :\\d{2}:\\d{2}:\\d{2})"
Do the split based on comma.
Change the "boppity" in the regex to a ","; perhaps like this:
newStringVariable = yourStringVariable.replace("boppity", ",");
I want to match a string which occurs after a certain pattern but I am not able to come up with a regex to do that (I am using Java).
For example, let's say I have this string,
caa,abb,ksmf,fsksf,fkfs,admkf
and I want my regex to match only those commas which are prefixed by abb. How do I do that? Is it even possible using regexes?
If I use the regex abb, it matches the whole string abb, but I only want to match the comma after that.
I ask this because I wanted to use this regex in a split method which accepts a regex. If I pass abb, as the regex, it will consider the string abb, to be the delimiter and not the , which I want.
Any help would be greatly appreciated.
String test = "caa,abb,ksmf,fsksf,fkfs,admkf";
String regex = "(?<=abb),";
String[] split = test.split(regex);
for(String s : split){
System.out.println(s);
}
Output:
caa,abb
ksmf,fsksf,fkfs,admkf
See here for information:
https://www.regular-expressions.info/lookaround.html
I want to split a string by: "?/". My string is: hello?/hi/hello.
My code is:
String [] list=myString.split("/?/");
My output is: [HELLO,hi,hellow] but I want to see: [hello,hi/hello].
How can I do that?
You need to escape ? otherwise it is interpreted as a meta character.
The simplest pattern to meet your needs is:
String[] list = myString.split("\\?/");
If you're not familiar with regular expressions, you can let Pattern.quote() do the work for you: it accepts a string and escapes any pesky special characters that would otherwise break your literal split expression:
String[] list = myString.split(Pattern.quote("?/"));
Try this
String [] list = myString.split("\\?/");
Your regexp should rather be "\\?/" (? needs to be escaped with a \)
System.out.println(Arrays.toString("hello?/hi/hello".split("\\?/")));
The split method takes a regular expression as input, so you need to escape special characters with a double backslash (which becomes a single backslash in the actual string). Escaping the / as well is harmless but not required:
String [] list = myString.split("\\?\\/");
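For comparison, here is a small sketch that runs both the unescaped and the quoted pattern on the string from the question. The class and method names (QuestionSlashSplit, splitLiteral) are hypothetical. In the unescaped pattern "/?/", the ? makes the first slash optional, so the regex matches every single "/" and leaves the "?" in the first token.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuestionSlashSplit {
    // Splits input on the exact text of delimiter, with no regex interpretation.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String myString = "hello?/hi/hello";

        // Unescaped: "/?/" means "an optional slash followed by a slash".
        // prints [hello?, hi, hello]
        System.out.println(Arrays.toString(myString.split("/?/")));

        // Quoted: splits only on the literal two-character sequence "?/".
        // prints [hello, hi/hello]
        System.out.println(Arrays.toString(splitLiteral(myString, "?/")));
    }
}
```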
I have a string of numbers that are a little weird. The source I'm pulling from has non-standard formatting, and I'm trying to switch from a .split where I need to specify the exact delimiter to split on (2 spaces, 3 spaces, etc.) to a replaceAll regex.
My data looks like this:
23574 123451 81239 1234 19274 4312457 1234719
I want to end up with
23574,xxxxx,xxxxx,xxxx
So I can just do a String.split on the ,
I would use the \s regex (any whitespace character) with a + quantifier.
In Java, that looks like this:
String[] numbers = myString.split("\\s+");
final Iterable<String> splitted = Splitter.on(' ').trimResults().omitEmptyStrings().split(input);
final String output = Joiner.on(',').join(splitted);
with Guava Splitter and Joiner
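// Match any run of whitespace (spaces, tabs, newlines).
String pattern = "\\s+";
Pattern regex = Pattern.compile(pattern);
Matcher match = regex.matcher(inputString);
// replaceAll returns the new string; it does not modify the matcher.
String stringToSplit = match.replaceAll(",");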
I think that should do it for you. If not, googling for the Matcher and Pattern classes in the java api will be very helpful.
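If all you need is the comma-separated string, String.replaceAll does the same thing in one call. This is only a sketch: the names WhitespaceToCommas and toCsv are made up, and the sample data mixes spaces, a tab, and a trailing newline purely to show that \s+ covers all of them.

```java
public class WhitespaceToCommas {
    // Trims the ends, then collapses every run of whitespace into one comma.
    static String toCsv(String input) {
        return input.trim().replaceAll("\\s+", ",");
    }

    public static void main(String[] args) {
        String data = "23574   123451  81239 1234\t19274  4312457 1234719\n";
        // prints 23574,123451,81239,1234,19274,4312457,1234719
        System.out.println(toCsv(data));
    }
}
```

Trimming first avoids a leading or trailing comma when the input starts or ends with whitespace.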
I understand this problem as a way to obtain integer numbers from a string with blank (not only space) separators.
The accepted solution does not work if the separator is a TAB \t for instance or if it has an \n at the end.
If we define an integer number as a sequence of digits, the best way to solve this is using a simple regular expression. Checking the Java 8 Pattern API, we can find that \D represents any non digit character:
\D A non-digit: [^0-9]
Since String.split() accepts a regular expression describing the possible separators, it is easy to pass "\\D+" and split a trimmed string in one shot, like this:
String source = "23574 123451 81239 1234 19274 4312457 1234719";
String trimmed = source.trim();
String[] numbers = trimmed.split("\\D+");
It is translated as split this trimmed string using any non digit character sequence as a possible separator.
I think this is a weird question. Here is my splitting:
String s = "asd#asd";
String[] raw1 = s.split("#"); // this has size of two: raw1[0] = raw1[1] = "asd"
However,
String s = "asd$asd";
String[] raw2 = s.split("$"); // this has size of ONE
raw2 is not split. Does anyone know why?
Because split() takes a regexp, and $ indicates the end-of-line. If you need to split on a character that is actually a regexp metacharacter, then you'll need to escape it.
See Pattern for the regexp metacharacters.
You may find that StringTokenizer is more appropriate for your needs. It takes a list of characters to split on, and it won't interpret them as regular expression metacharacters. However, it's a little more verbose and unwieldy to use. As Nandkumar notes below, the latest docs state that its use is discouraged in new code.
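For reference, a minimal sketch of the StringTokenizer route might look like the following. The class and method names (TokenizerSplit, tokenize) are invented for the example. One difference from split() worth knowing: StringTokenizer skips empty tokens, so consecutive delimiters do not produce empty strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerSplit {
    // Collects the tokens of input, treating every character in
    // delimiters as a literal separator (no regex involved).
    static List<String> tokenize(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [asd, asd]
        System.out.println(tokenize("asd$asd", "$"));
    }
}
```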
Because split() takes a regex and $ matches the end of a line.
You have to escape it :
s.split("\\$");
See Pattern documentation for more information on regexes.
You have to escape it:
String s = "asd$asd";
String[] raw2 = s.split("\\$"); // this has size of TWO
You need to escape the special character; make it:
s.split("\\$");