I've written some code which utilizes the split() method to return the first item delimited by periods. After a little testing I found that the array I split the string into has a length of 0 so I assume it's not splitting at all. It may be relevant that in some cases there is no period and I want the entire string returned. To compensate for this, I added a period onto the end of each String. See below:
longText=longText+".";
String tempName[]=longText.split(".");
String realName=tempName[0];
System.out.println(realName);
return realName;
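The method String#split takes a regular expression as its argument, and in a regular expression an unescaped . matches any character. Every character of your string is therefore treated as a delimiter, only empty strings are left over, and split discards trailing empty strings, which is why the array has length 0. Escape the dot instead: longText.split("\\."). See the Javadoc for String#split.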
The following: Split String on dot . as delimiter will help you
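A minimal sketch of the fix, for illustration only. The class and method names (FirstSegment, firstSegment, firstSegmentQuoted) and the sample inputs are made up for this example; only String.split and Pattern.quote come from the JDK.

```java
import java.util.regex.Pattern;

class FirstSegment {
    // Returns the text before the first literal period, or the whole
    // string when it contains no period.
    static String firstSegment(String longText) {
        // "\\." is the regex \. (a literal dot); a bare "." matches any character.
        String[] parts = longText.split("\\.");
        // split drops trailing empty strings, so an input such as "." yields
        // an empty array; guard against that before indexing.
        return parts.length > 0 ? parts[0] : "";
    }

    // Same behaviour, letting Pattern.quote do the escaping.
    static String firstSegmentQuoted(String longText) {
        String[] parts = longText.split(Pattern.quote("."));
        return parts.length > 0 ? parts[0] : "";
    }

    public static void main(String[] args) {
        System.out.println(firstSegment("john.smith.txt"));   // john
        System.out.println(firstSegment("noperiod"));         // noperiod
        System.out.println("abc.".split(".").length);         // 0, the behaviour from the question
    }
}
```

With the dot escaped there is no need to append a trailing period: a string without any period comes back as the single element of the array.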
So I have the following problem:
I have to tokenize a string using String.split() and the tokens must be in the form 07dd ddd ddd, where d is a digit. I thought of using the following regex: ^(07\\d{2}\\s\\d{3}\\d{3}) and passing it as an argument to String.split(). But for some reason, although I do have substrings of that form, it outputs the whole initial string and doesn't tokenize it.
I initially thought that it was using an empty string as a splitter, as an empty string indeed matches that regex, but even after I added & (.)+ to the regex in order to ensure that the splitter doesn't have length 0, it still outputs the whole initial string.
I know that I could have used Patterns and Matchers to solve it much faster, but I have to use String.split(). Any ideas why this happens?
A Few Pointers
Your pattern ^(07\d{2}\s\d{3}\d{3}) is missing a \s between the last two groups of digits, and the leading ^ restricts it to the very start of the string
The reason you get the whole string back is that this pattern was never found in the first place: there is no split
If you split on this pattern (once fixed), the resulting array will be strings that are in-between this pattern (these tokens are actually removed)
If you want to use this pattern (once fixed), you need a Match All, not a Split. In JavaScript this looks like arrayOfMatches = yourString.match(/pattern/g); in Java the equivalent is a loop over Matcher.find()
If you want to split, you need to use a delimiter that is present between the token (this delimiter could in fact just be a zero-width position asserted by the 07 about to follow)
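The last pointer can be sketched in Java as follows. This is an illustration under assumptions, not the asker's actual data: it assumes the tokens are separated by whitespace, and the class name PhoneTokens, the method tokenize and the sample input are hypothetical. The delimiter is the whitespace run that is immediately followed by a complete 07dd ddd ddd token, checked with a lookahead so the token itself is not consumed.

```java
import java.util.Arrays;

class PhoneTokens {
    // Splits on whitespace that is followed by a full "07dd ddd ddd" token.
    // The lookahead (?=...) only asserts that the token follows; it does not
    // consume it, so the token stays in the result.
    static String[] tokenize(String input) {
        return input.split("\\s+(?=07\\d{2}\\s\\d{3}\\s\\d{3})");
    }

    public static void main(String[] args) {
        String input = "0712 345 678 0798 765 432"; // hypothetical sample
        System.out.println(Arrays.toString(tokenize(input)));
        // [0712 345 678, 0798 765 432]
    }
}
```

The spaces inside a token are not split points because what follows them does not start a new 07dd ddd ddd token.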
Further Reading
Match All and Split are Two Sides of the Same Coin
I don't see why the following output makes sense.
String split method on an empty String returning an array of String with length 1
String[] split = "".split(",");
System.out.println(split.length);
Returns array of String with length 1
String[] split = "Java".split(",");
System.out.println(split.length);
Returns array of String with length 1
How to differentiate??
From the documentation:
The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string.
To answer your question, it does what it is expected to do: the returned substring is terminated by the end of the input string (as there was no , to be found). The documentation also states:
If the expression does not match any part of the input then the resulting array has just one element, namely this string.
Note that this is a consequence of the first statement. It is not an additional circumstance that the Java developers added in case the search string could not be found.
I hit this, too. What it's returning is the string up to but not including the split character. If you want zero tokens for an empty string, use StringTokenizer (note that the delimiter argument is a String, not a char):
StringTokenizer st = new StringTokenizer(someString, ",");
int numberOfSubstrings = st.countTokens();
It's returning the original string (which in this case is the empty string) since there was no , to split on.
It returns one because you are measuring the size of the split array, which contains one element: an empty string.
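To tell the two cases apart, one option is to check for the empty string before splitting. The sketch below is only one way to do it; the class name EmptySplit and the helper splitOrEmpty are hypothetical names introduced for this example.

```java
import java.util.StringTokenizer;

class EmptySplit {
    // Returns an empty array for an empty input instead of the
    // one-element array {""} that String.split produces.
    static String[] splitOrEmpty(String s, String regex) {
        return s.isEmpty() ? new String[0] : s.split(regex);
    }

    public static void main(String[] args) {
        System.out.println("".split(",").length);                        // 1 (the element is "")
        System.out.println("Java".split(",").length);                    // 1 (the element is "Java")
        System.out.println(splitOrEmpty("", ",").length);                // 0
        System.out.println(splitOrEmpty("Java", ",").length);            // 1
        System.out.println(new StringTokenizer("", ",").countTokens());  // 0
    }
}
```

Checking split[0].isEmpty() on the result distinguishes the cases just as well when the array has exactly one element.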
What is the difference between StringUtils.splitByWholeSeparatorPreserveAllTokens() and String.split()?
With splitByWholeSeparatorPreserveAllTokens, we could limit the number of parameters that are returned in an array. Is this the only difference?
java.lang.String.split();
Usage:
The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string. The substrings in the array are in the order in which they occur in this string. If the expression does not match any part of the input then the resulting array has just one element, namely this string.
org.apache.commons.lang.StringUtils.splitPreserveAllTokens();
Usage:
Splits the provided text into an array, separator specified, preserving all tokens, including empty tokens created by adjacent separators. This is an alternative to using StringTokenizer.
Read more: kickjava_src_apache_StringUtils
String.split() uses the final class Pattern to split:
Pattern.compile(regex).split(this, limit);
StringUtils instead uses its own method, splitWorker(String str, char separatorChar, boolean preserveAllTokens), which is a performance tune for 2.0 (JDK 1.4).
I found the following differences between String.split and splitByWholeSeparatorPreserveAllTokens:
splitByWholeSeparatorPreserveAllTokens handles null input, whereas String.split() doesn't.
In splitByWholeSeparatorPreserveAllTokens, adjacent separators are treated as separators for empty tokens.
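For comparison, here is how String.split alone treats empty tokens, using only the JDK (the Commons Lang call is left out because it needs an external dependency). The class name EmptyTokens and the sample input are made up for this sketch. By default split keeps empty tokens between adjacent separators but drops trailing ones; a negative limit keeps those as well.

```java
import java.util.Arrays;

class EmptyTokens {
    public static void main(String[] args) {
        String input = "a,,b,,"; // hypothetical sample with adjacent and trailing separators

        // Default: the empty token between the adjacent commas is kept,
        // trailing empty tokens are removed.
        String[] trimmed = input.split(",");
        System.out.println(trimmed.length + " " + Arrays.toString(trimmed));
        // 3 [a, , b]

        // Negative limit: trailing empty tokens are preserved too.
        String[] all = input.split(",", -1);
        System.out.println(all.length + " " + Arrays.toString(all));
        // 5 [a, , b, , ]
    }
}
```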
I have some Strings. They contain some data. Example: "Alberto Macano. Here is description." And another example: "Pablo Don Carlo. Description here."
What I need: a method to split the name from the description, i.e. getting the name in one string and the description in another string. It would be easier if I knew how many words the name contains, but it can contain up to 5-6 words, so I don't know how many there will be. The one thing I know for certain is that a period separates them.
You can use the .split(String regex) method to split the string into an array of strings. So for instance:
String line = "Alberto Macano. Here is description.";
String[] words = line.split("\\.");
The 'words' variable will contain the following:
{0}: Alberto Macano
{1}: Here is description (with a leading space, since only the period itself is removed)
You might notice that there are two backslashes before the period. The period is a metacharacter in regular expressions (it matches any character), so it has to be escaped with a backslash, and that backslash must itself be escaped inside a Java string literal. You might want to look at the Java Regex Documentation for more information.
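If the description may itself contain periods, a limit argument keeps everything after the first period together. This is a small sketch rather than a definitive solution; the class name NameAndDescription and the method splitNameAndDescription are hypothetical, and it assumes the name never contains a period.

```java
import java.util.Arrays;

class NameAndDescription {
    // Splits at the first period only. "\\.\\s*" also swallows the
    // whitespace after the period, and the limit of 2 stops further
    // splitting, so the description keeps its own punctuation.
    static String[] splitNameAndDescription(String line) {
        return line.split("\\.\\s*", 2);
    }

    public static void main(String[] args) {
        String[] parts = splitNameAndDescription("Alberto Macano. Here is description.");
        System.out.println(Arrays.toString(parts));
        // [Alberto Macano, Here is description.]
    }
}
```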
Use the split(String regex) method in the String class to obtain an array of String objects by splitting a String up based on some regular expression.
String.split will give you an array of Strings divided on regular expression matches. There's a summary of regular expression constructs in the java.util.regex.Pattern API documentation.
I can't see a reason why the Matcher would return a match on the pattern, but split will return a zero length array on the same regex pattern. It should return something -- in this example I'm looking for a return of 2 separate strings containing "param/value".
public class MyClass {
protected Pattern regEx = "(([a-z])+/{1}([a-z0-9])+/?)*";
public void someMethod() {
String qs = "param/value/param/value";
Matcher matcherParamsRegEx = this.regEx.matcher(qs);
if (matcherParamsRegEx.matches()) { // This finds a match.
String[] parameterValues = qs.split(this.regEx.pattern()); // No matches... zero length array.
}
}
}
The pattern can match the entire string. split() doesn't return the match, only what's in between. Since the pattern matches the whole string that only leaves an empty string to return. I think you might be under a misconception as to what split() does.
For example:
String qs = "param/value/param/value";
String[] pieces = qs.split("/");
will return an array of 4 elements: param, value, param, value.
Notice that what you search on ("/") isn't returned.
Your regex is somewhat over-complicated. For one thing you're using {1}, which is unnecessary. Second, when you do ([a-z])+ you will capture exactly one letter (the last one encountered). Compare that to ([a-z]+), which will capture the entire match. Also, you don't even need to capture for this. The pattern can be simplified to:
protected Pattern regEx = Pattern.compile("[a-z]+/([a-z0-9]+/?)*");
Technically this:
protected Pattern regEx = "(([a-z])+/{1}([a-z0-9])+/?)*";
is a compiler error, so what you actually ran versus what you posted could be anything.
The problem here is that split splits around matches of your regex. You have two consecutive matches with nothing else in between, so there is nothing left for split to return.
I can't see any way for you to get what you want from that string using split, but if you can use a different delimiter to separate pairs than you do to separate name and value, that will help a lot.
Otherwise, you might split on slashes and take alternating results as names and values, but this is error-prone.
The regex is matching--if it weren't, you would get a one-element array, that element being the whole original string. You just have the wrong idea about how split() works. On the first match attempt it finds "param/value/" and stores everything preceding that match as the first token: an empty string. The second attempt finds "param/value" and stores whatever lay between it and the first match as the next token: another empty string. The third match attempt fails, so whatever was between the second match and the end of the string becomes the final token: yet another empty string.
Having stored all the tokens, split() iterates through them in reverse, checking for trailing empty tokens. The third token is indeed empty, so it deletes that one. The second one is also empty, so it deletes that one. You see where this is going? You can force split() to preserve trailing empty matches by passing a negative integer as the second argument, but that obviously doesn't do you any good. You need to rethink your problem (whatever it is) in terms of how the regex package actually works.
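One way to rethink it, sketched here under assumptions: instead of splitting, match each pair and collect the matches with Matcher.find(). The class name ParamPairs, the method pairs and the simplified pattern ([a-z]+)/([a-z0-9]+) are choices made for this illustration, not the asker's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParamPairs {
    // One name/value pair: letters, a slash, then letters or digits.
    private static final Pattern PAIR = Pattern.compile("([a-z]+)/([a-z0-9]+)");

    // Collects every "name/value" match in order of appearance.
    static List<String> pairs(String qs) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(qs);
        while (m.find()) {
            // m.group(1) is the name and m.group(2) the value;
            // m.group() is the whole pair.
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("param/value/param/value"));
        // [param/value, param/value]
    }
}
```

The slash between two pairs is simply skipped by find(), because it cannot start a match.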