Java StringTokenizer, empty null tokens - java

I am trying to split a string into 29 tokens, but StringTokenizer won't return null (empty) tokens. I tried String.split, but I believe I am doing something wrong:
String [] strings = line.split(",", 29);
sample inputs:
10150,15:58,23:58,16:00,00:00,15:55,23:55,15:58,00:01,16:03,23:58,,,,,16:00,23:22,15:54,00:03,15:59,23:56,16:05,23:59,15:55,00:01,,,,
10155,,,,,,,,,,,07:30,13:27,07:25,13:45,,,,,,,,,,,07:13,14:37,08:01,15:23
10160,10:00,16:02,09:55,16:03,10:06,15:58,09:48,16:07,09:55,16:00,,,,,09:49,15:38,10:02,16:04,10:00,16:00,09:58,16:01,09:57,15:58,,,,

If you want the trailing empty strings to be kept, but you don't want to give a magic number for maximum, use a negative limit:
line.split(",", -1)
If line.equals("a,,c"), then line.split(",", -1)[1].isEmpty() is true; the element is an empty string, not null. This is because when "," is the delimiter, ",," has an empty string between the two delimiters, not null.
Example:
Using the explanation above, consider the string ",,".
You might expect it to break down as ",", null, and ",".
It actually breaks down as ",", "" and ",": the token between the two delimiters is the empty string.
If you want null instead of empty strings in the array returned by split, then you'd have to manually scan the array and replace them with null. I'm not sure why s == null is better than s.isEmpty(), though.
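The two points above can be sketched as follows. This is a minimal illustration, not a definitive implementation; the class name SplitKeepEmpty and the helpers splitKeepEmpty and emptyToNull are hypothetical names introduced here, and only String.split is taken from the JDK.

```java
import java.util.Arrays;

public class SplitKeepEmpty {

    // Hypothetical helper: splits on commas and keeps every empty field,
    // including trailing ones, because of the negative limit.
    static String[] splitKeepEmpty(String line) {
        return line.split(",", -1);
    }

    // Hypothetical helper: replaces empty strings with null, in place.
    static String[] emptyToNull(String[] fields) {
        for (int i = 0; i < fields.length; i++) {
            if (fields[i].isEmpty()) {
                fields[i] = null;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "10155,,,07:30,13:27,,";
        System.out.println(Arrays.toString(splitKeepEmpty(line)));
        System.out.println(Arrays.toString(emptyToNull(splitKeepEmpty(line))));
    }
}
```

The number of fields is always the number of commas plus one, so a line with 28 commas yields 29 tokens without hard-coding 29 as the limit.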
See also
Java String.indexOf and empty strings

Use StringUtils.splitPreserveAllTokens() in Apache Commons Lang library

If you want empty tokens to be retained, string.split won't work satisfactorily. StringTokenizer will also not work.
I have come up with the following method, which might be helpful for you:
public static String[] splitTotokens(String line, String delim) {
    // Count the delimiters; there is always one more token than delimiters.
    String s = line;
    int i = 0;
    while (s.contains(delim)) {
        s = s.substring(s.indexOf(delim) + delim.length());
        i++;
    }
    String[] tokens = new String[i + 1];
    for (int j = 0; j < i; j++) {
        // Take everything before the next delimiter, then drop it and the delimiter.
        tokens[j] = line.substring(0, line.indexOf(delim));
        line = line.substring(line.indexOf(delim) + delim.length());
    }
    // Whatever is left after the last delimiter is the final token (possibly empty).
    tokens[i] = line;
    return tokens;
}

Use org.springframework.util.StringUtils:
org.springframework.util.StringUtils.delimitedListToStringArray(data, delimit);
This class provides easy-to-use methods to convert between delimited strings, such as CSV strings, and collections and arrays.

Related

split a string when there is a change in character without a regular expression

There is a way to split a string into runs of repeating characters using a regex, but I want to do it without one.
For example, given a string like "EE B", my output should be an array of strings, e.g.
{"EE", " ", "B"}
My approach is:
Given a string, I first find the number of unique characters in it so I know the size of the array. Then I convert the string to an array of characters. Then I check whether the next character is the same as the current one. If it is the same, I append them together; if not, I begin a new string.
My code so far:
String myinput = "EE B";
char[] cinput = new char[myinput.length()];
cinput = myinput.toCharArray(); //turn string to array of characters
int uniquecha = myinput.length();
for (int i = 0; i < cinput.length; i++) {
if (i != myinput.indexOf(cinput[i])) {
uniquecha--;
} //this should give me the number of unique characters
String[] returninput = new String[uniquecha];
Arrays.fill(returninput, "");
for (int i = 0; i < uniquecha; i++) {
returninput[i] = "" + myinput.charAt(i);
for (int j = 0; j < myinput.length - 1; j++) {
if (myinput.charAt(j) == myinput.charAt(j + 1)) {
returninput[j] += myinput.charAt(j + 1);
} else {
break;
}
}
} return returninput;
But there is something wrong with the second part, as I can't figure out why it is not beginning a new string when the character changes.
Your question says that you don't want to use regex, but I see no reason for that requirement, other than this maybe being homework. If you are open to using regex here, then there is a one-line solution which splits your input string on the following pattern:
(?<=\S)(?=\s)|(?<=\s)(?=\S)
This pattern uses lookarounds to split wherever what precedes is a non-whitespace character and what follows is a whitespace character, or vice versa.
String input = "EE B";
String[] parts = input.split("(?<=\\S)(?=\\s)|(?<=\\s)(?=\\S)");
System.out.println(Arrays.toString(parts));
[EE, , B]
^^ a single space character in the middle
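Note that the lookaround pattern above only splits at whitespace/non-whitespace boundaries, so a run such as "BCD" would stay in one piece. If the goal is to split at every change of character, a backreference variant may do it. The following is a sketch under that assumption; the class name RunSplitter and the method splitRuns are hypothetical, and because "." does not match line terminators by default, input containing newlines would need extra care.

```java
import java.util.Arrays;

public class RunSplitter {

    // Splits wherever the next character differs from the previous one.
    // (?<=(.)) captures the preceding character into group 1;
    // (?!\1) requires that the following text does not start with it.
    static String[] splitRuns(String input) {
        return input.split("(?<=(.))(?!\\1)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitRuns("EEEE BCD DdA")));
    }
}
```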
If I understood correctly, you want to split the characters in a string so that similar-consecutive characters stay together. If that's the case, here is how I would do it:
public static ArrayList<String> splitString(String str)
{
ArrayList<String> output = new ArrayList<>();
String combo = "";
//iterates through all the characters in the input
for(char c: str.toCharArray()) {
//check if the current char is equal to the last added char
if(combo.length() > 0 && c != combo.charAt(combo.length() - 1)) {
output.add(combo);
combo = "";
}
combo += c;
}
output.add(combo); //adds the last group of characters
return output;
}
Note that instead of using an array (has a fixed size) to store the output, I used an ArrayList, which has a variable size. Also, instead of checking the next character for equality with the current one, I preferred to use the last character for that. The variable combo is used to temporarily store the characters before they go to output.
Now, here is one way to print the result following your guidelines:
public static void main(String[] args)
{
String input = "EEEE BCD DdA";
ArrayList<String> output = splitString(input);
System.out.print("[");
for(int i = 0; i < output.size(); i++) {
System.out.print("\"" + output.get(i) + "\"");
if(i != output.size()-1)
System.out.print(", ");
}
System.out.println("]");
}
The output when running the above code will be:
["EEEE", " ", "B", "C", "D", " ", "D", "d", "A"]

Get index of 3rd comma

For example I have this string params: Blabla,1,Yooooooo,Stackoverflow,foo,chinese
And I want to get the string testCaseParams until the 3rd comma: Blabla,1,Yooooooo
and then remove it and the comma from the original string so I get this: Stackoverflow,foo,chinese
I'm trying this code but testCaseParams only shows the first two values (gets index of the 2nd comma, not 3rd...)
//Get how many parameters this test case has and group the parameters
int amountOfInputs = 3;
int index = params.indexOf(',', params.indexOf(',') + amountOfInputs);
String testCaseParams = params.substring(0,index);
params = params.replace(testCaseParams + ",","");
You can hold the index of the most recently found comma in a variable and iterate until the third comma is found:
int index = -1;
for (int i = 0; i < 3; i++) {
    index = str.indexOf(',', index + 1); // search after the previous comma
    if (index == -1) break; // fewer than 3 commas
}
String left = str.substring(0, index);
String right = str.substring(index + 1); // skip comma
Edit: to validate the string, simply check if index == -1 before taking the substrings. If so, there are not 3 commas in the string.
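The search and the validation can be folded into one small helper. This is only a sketch; the class name NthComma and the method indexOfNth are hypothetical names chosen for illustration, and it relies on nothing beyond String.indexOf(int, int).

```java
public class NthComma {

    // Hypothetical helper: returns the index of the n-th occurrence of ch
    // in str (counting from 1), or -1 if there are fewer than n occurrences.
    static int indexOfNth(String str, char ch, int n) {
        int index = -1;
        for (int i = 0; i < n; i++) {
            // Start each search one position after the previous hit.
            index = str.indexOf(ch, index + 1);
            if (index == -1) {
                return -1;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String params = "Blabla,1,Yooooooo,Stackoverflow,foo,chinese";
        int index = indexOfNth(params, ',', 3);
        if (index != -1) {
            String testCaseParams = params.substring(0, index);
            String rest = params.substring(index + 1); // skip the comma
            System.out.println(testCaseParams);
            System.out.println(rest);
        }
    }
}
```

Taking the remainder with substring rather than replace also avoids accidentally removing a second occurrence of the same prefix later in the string.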
One option would be a clever use of String#split:
String input = "Blabla,1,Yooooooo,Stackoverflow,foo,chinese";
String[] parts = input.split("(?=,)");
String output = parts[0] + parts[1] + parts[2];
System.out.println(output);
One can use split with a limit of 4.
String input = "Blabla,1,Yooooooo,Stackoverflow,foo,chinese";
String[] parts = input.split(",", 4);
if (parts.length == 4) {
String first = parts[0] + "," + parts[1] + "," + parts[2];
String second = parts[3]; // "Stackoverflow,foo,chinese"
}
You can split with this regex to get the 2 parts:
String[] parts = input.split("(?<=\\G.*,.*,.*),");
It will result in parts equal to:
{ "Blabla,1,Yooooooo", "Stackoverflow,foo,chinese" }
\\G refers to the end of the previous match or the start of the string.
(?<=) is positive look-behind.
So it means match a comma for splitting, if it is preceded by 2 other commas since the previous match or the start of the string.
This will keep empty strings between commas.
I offer this here just as a "fun" one line solution:
public static int nthIndexOf(String str, String c, int n) {
return str.length() - str.replace(c, "").length() < n ? -1 : n == 1 ? str.indexOf(c) : c.length() + str.indexOf(c) + nthIndexOf(str.substring(str.indexOf(c) + c.length()), c, n - 1);
}
//Usage
System.out.println(nthIndexOf("Blabla,1,Yooooooo,Stackoverflow,foo,chinese", ",", 3)); //17
(It's recursive of course, so will blow up on large strings, it's relatively slow, and certainly isn't a sensible way to do this in production.)
As a more sensible one-liner, you can use StringUtils.ordinalIndexOf() from Apache Commons Lang, which achieves the same thing.

Split starting alphabetical characters from numeric characters

I want to split a string so that I get the leading alphabetical part (everything up to the first numeric digit) and the remaining alphanumeric part.
E.g.:
I have a string, for example: Nesc123abc456
I want to get the following two strings by splitting the above string: Nesc, 123abc456
What I have tried:
String s = "Abc1234avc";
String[] ss = s.split("(\\D)", 2);
System.out.println(Arrays.toString(ss));
But this just removes the first letter from the string.
You could maybe use lookarounds so that you don't consume the delimiting part:
String s = "Abc1234avc";
String[] ss = s.split("(?<=\\D)(?=\\d)", 2);
System.out.println(Arrays.toString(ss));
(?<=\\D) makes sure there's a non-digit before the part to be split at,
(?=\\d) makes sure there's a digit after the part to be split at.
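You need a quantifier.
Try
String[] ss = s.split("(\\D)*", 2);
More information here: http://docs.oracle.com/javase/tutorial/essential/regex/quant.html
Be aware that split discards whatever the pattern matches, so for "Abc1234avc" this returns ["", "1234avc"]: the leading letters are consumed rather than kept.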
Didn't you try replaceAll?
String s = ...;
String firstPart = s.replaceAll("[0-9].*", "");
String secondPart = s.substring(firstPart.length());
You can use:
String[] arr = "Nesc123abc456".split("(?<=[a-zA-Z])(?![a-zA-Z])", 2);
//=> [Nesc, 123abc456]
split discards the text it matches, so you would need to find the index of the first numeric digit and use substrings to get your result. This would also probably be faster than using a regex, since a regex does considerably more work behind the scenes.
int split = string.length();
for (int i = 0; i < string.length(); i++) {
    if (Character.isDigit(string.charAt(i))) {
        split = i; // index of the first digit
        break;
    }
}
String[] parts = new String[2];
parts[0] = string.substring(0, split);
parts[1] = string.substring(split);
I think this is what you asked:
String s = "Abc1234avc";
String numbers = "";
String chars = "";
for(int i = 0; i < s.length(); i++){
char c = s.charAt(i);
if(Character.isDigit(c)){
numbers += c + "";
}
else {
chars += c + "";
}
}
System.out.println("Numbers: " + numbers + "; Chars: " + chars);

How to split the string in java?

String str = "AlwinX-road-9:00pm-kanchana travels-25365445421";
String[] names = str.split("-");
I want output like following:
AlwinX-road
9:00pm
kanchana travels
25365445421
Use pattern matching to match your requirement
String str = "AlwinX-road-9:00pm-kanchana travels-25365445421";
String regex = "(^[A-Z-a-z ]+)[-]+(\\d+:\\d+pm)[-]([a-z]+\\s+[a-z]+)[-](\\d+)";
Matcher matcher = Pattern.compile( regex ).matcher( str);
while (matcher.find( ))
{
String roadname = matcher.group(1);
String time = matcher.group(2);
String travels = matcher.group(3);
String digits= matcher.group(4);
System.out.println("time="+time);
System.out.println("travels="+travels);
System.out.println("digits="+digits);
}
Since you want to keep the delimiter in your first output line, you can do the split and merge the first two elements with a "-":
String[] names = str.split("-");
System.out.println(names[0] + "-" + names[1]);
for (int i = 2; i < names.length; i++) {
    System.out.println(names[i]);
}
The split() method can't distinguish the dash in AlwinX-road and the other dashes in the string, it treats all the dashes the same. You will need to do some sort of post processing on the resulting array. If you will always need the first two strings in the array joined you can just do that. If your strings are more complex you will need to add additional logic to join the strings in the array.
One way you could do it, assuming the first '-' is always part of a two part identifier.
String str = "AlwinX-road-9:00pm-kanchana travels-25365445421";
String[] tokens = str.split("-");
String[] output = new String[tokens.length - 1];
output[0] = tokens[0] + '-' + tokens[1];
System.out.println(output[0]);
for(int i = 1; i < output.length; i++){
output[i] = tokens[i+1];
System.out.println(output[i]);
}
Looks like you want to split (with removal of all dashes but the first one).
String str = "AlwinX-road-9:00pm-kanchana travels-25365445421";
String[] names = str.split("-");
for (String value : names)
{
System.out.println(value);
}
This produces:
AlwinX
road
9:00pm
kanchana travels
25365445421
Notice that "AlwinX" and "road" were split as well, since they had a dash in between. So you will need custom logic to handle this case. Here is an example of how to do it (I used StringTokenizer):
StringTokenizer tk = new StringTokenizer(str, "-", true);
String firstString = null;
String secondString = null;
while (tk.hasMoreTokens())
{
final String token = tk.nextToken();
if (firstString == null)
{
firstString = token;
continue;
}
if (secondString == null && firstString != null && !token.equals("-"))
{
secondString = token;
System.out.println(firstString + "-" + secondString);
continue;
}
if (!token.equals("-"))
{
System.out.println(token);
}
}
This will produce:
AlwinX-road
9:00pm
kanchana travels
25365445421
From your format, I think you want to split off the first token just before the time part. You can do it this way:
String str =yourString;
String beforetime=str.split("-\\d+:\\d+[ap]m")[0]; //this is your first token,
//AlwinX-road in your example
String rest=str.substring(beforetime.length()+1);
String[] restNames=rest.split("-");
If you really need it all together in one array then see the code below:
String[] allTogether=new String[restNames.length+1];//the string with all your tokens
allTogether[0]=beforetime;
System.arraycopy(restNames, 0, allTogether, 1, restNames.length);
If you use "_" as a separator instead of "-": AlwinX-road_9:00pm_kanchana travels_25365445421
New code:
String str = new String("AlwinX-road_9:00pm_kanchana travels_25365445421");
String separator = new String("_");
String[] names = str.split(separator);
for(int i=0; i<names.length; i++){
System.out.println(names[i]);
}

Java split string on third comma

I have a string that I need to be split into 2. I want to do this by splitting at exactly the third comma.
How do I do this?
Edit
A sample string is :
from:09/26/2011,type:all,to:09/26/2011,field1:emp_id,option1:=,text:1234
The string will keep the same format - I want everything before field in a string.
If you're simply interested in splitting the string at the index of the third comma, I'd probably do something like this:
String s = "from:09/26/2011,type:all,to:09/26/2011,field1:emp_id,option1:=,text:1234";
int i = s.indexOf(',', 1 + s.indexOf(',', 1 + s.indexOf(',')));
String firstPart = s.substring(0, i);
String secondPart = s.substring(i+1);
System.out.println(firstPart);
System.out.println(secondPart);
Output:
from:09/26/2011,type:all,to:09/26/2011
field1:emp_id,option1:=,text:1234
Related question:
How to find nth occurrence of character in a string?
A naive implementation:
public static String[] split(String s)
{
    // Start at -1 so that a comma at position 0 is also counted.
    int index = -1;
    for(int i = 0; i < 3; i++)
        index = s.indexOf(",", index+1);
    return new String[] {
        s.substring(0, index),
        s.substring(index+1)
    };
}
This does no bounds checking and will throw all sorts of lovely exceptions if not given input as expected. Given "ABCD,EFG,HIJK,LMNOP,QRSTU" returns ["ABCD,EFG,HIJK","LMNOP,QRSTU"]
You can use this regex:
^([^,]*,[^,]*,[^,]*),(.*)$
The result is then in the two captures (1 and 2), not including the third comma.
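In Java, that regex could be applied with Pattern and Matcher roughly as follows. This is a sketch only; the class name ThirdCommaRegex and the method splitAtThirdComma are hypothetical, and returning null for input with fewer than three commas is an assumption made here for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThirdCommaRegex {

    // Group 1: everything before the third comma. Group 2: everything after it.
    private static final Pattern THIRD_COMMA =
            Pattern.compile("^([^,]*,[^,]*,[^,]*),(.*)$");

    // Hypothetical helper: returns {before, after}, or null when the
    // input does not contain at least three commas.
    static String[] splitAtThirdComma(String s) {
        Matcher m = THIRD_COMMA.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String s = "from:09/26/2011,type:all,to:09/26/2011,field1:emp_id,option1:=,text:1234";
        String[] parts = splitAtThirdComma(s);
        if (parts != null) {
            System.out.println(parts[0]);
            System.out.println(parts[1]);
        }
    }
}
```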
