How to parse empty tokens from comma-separated data? - java

I have some large files with comma separated data in them. Something like:
firstname,middlename,lastname
James,Tiberius,Kirk
Mister,,Spock
Leonard,,McCoy
I'm using a StringTokenizer to parse the data:
StringTokenizer st = new StringTokenizer(sLine, ",");
while (st.hasMoreTokens()) {
String sTok = st.nextToken();
tokens.add(sTok);
}
The problem is, on lines with no middle name, I only get two tokens, { "Mister", "Spock" }, but I want three tokens, { "Mister", "", "Spock" }
QUESTION: How do I get empty tokens included when parsing my comma separated data?
Thanks!

You can use the String#split(String regex) method.
String[] split = sLine.split(",");
for (String s : split) {
System.out.println("S = " + s); //Note there will be one empty S
tokens.add(s);
}

Use split(",") instead of a StringTokenizer:
String[] aux = sLine.split(",");
for(int i = 0; i < aux.length; i++) {
String sTok = aux[i];
tokens.add(sTok);
}
You can see in the documentation that StringTokenizer is a legacy class and is only kept for backward compatibility:
http://docs.oracle.com/javase/7/docs/api/java/util/StringTokenizer.html

Use the split method, but pass -1 as the second argument to keep empty strings, including trailing ones:
sLine.split(",", -1);
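A minimal sketch of how the limit argument changes the result. The class and helper names (SplitLimitDemo, splitKeepEmpty) are made up for illustration; only String#split(String, int) comes from the JDK. Note that an empty token in the middle of a line is kept either way; the negative limit matters for empty tokens at the end of a line.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: split on commas and keep every empty token,
    // including trailing ones (that is what the negative limit does).
    static String[] splitKeepEmpty(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        // Empty token in the middle: kept with or without the limit.
        System.out.println(Arrays.toString("Mister,,Spock".split(",")));  // [Mister, , Spock]
        // Empty tokens at the end: dropped by the one-argument form...
        System.out.println(Arrays.toString("Mister,,".split(",")));       // [Mister]
        // ...but kept when the limit is negative.
        System.out.println(Arrays.toString(splitKeepEmpty("Mister,,")));  // [Mister, , ]
    }
}
```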

Consider using Guava's Splitter.
You can create a splitter that either keeps or omits empty strings.
//Example keeping empty Strings (default)
Splitter splitterByComma = Splitter.on(",");
Iterable<String> split = splitterByComma.split("Mister,,Spock");
//Example omitting empty Strings
Splitter splitterOmitEmpty = Splitter.on(",").omitEmptyStrings();
Iterable<String> splitNoEmpty = splitterOmitEmpty.split("Mister,,Spock");

Related

Split string with blankline

I read a string from a textarea with .getText() and it contains a blank line.
test1111
test222
I need to split this String and store it in an array, so that array[0]=test1111
array[1]=test222
How can I do this?
Something like this should work:
String[] lines = text.split("\\s*\\n+\\s*");
Or better if you're using Java 8 (per Pshemo), use "\\R+"
This will skip over multiple blank lines, or lines filled with white space, and should trim leading and ending whitespace as well.
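A small sketch of the blank-line-skipping split, assuming Java 8 or later so that "\\R" is available. The class and method names are hypothetical, and the trim() call is an addition so that leading or trailing line breaks do not produce an empty first element.

```java
import java.util.Arrays;

class BlankLineSplitDemo {
    // Hypothetical helper: split text into its non-blank lines.
    // "\\R" matches any line break; the surrounding "\\s*" swallows
    // whitespace-only lines, so a run of blank lines acts as one separator.
    static String[] nonBlankLines(String text) {
        return text.trim().split("\\s*\\R+\\s*");
    }

    public static void main(String[] args) {
        String text = "test1111\n\n   \ntest222\n";
        System.out.println(Arrays.toString(nonBlankLines(text))); // [test1111, test222]
    }
}
```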
String[] lines = jtextFieldName_you_used.getText().split("\\n");
This stores the lines of the textarea in an array. Hope you find this helpful.
Try this.
String str = abc.getText();
for (String retval: str.split(" ")){ System.out.println(retval); }
If I understood your task correctly:
final String[] lines = data.split("\\n");
final String results[] = new String[lines.length];
int offset = 0;
for (String line : lines) {
results[offset] = line.split("\\s")[1];
offset++;
}
results:
test1111
blankline
test222
p.s: without data checks

StringTokenizer in Java: why is it adding one more space?

I am using jdk 1.6 (it is older but ok). I have a function like this:
public static ArrayList gettokens(String input, String delim)
{
ArrayList tokenArray = new ArrayList();
StringTokenizer tokens = new StringTokenizer(input, delim);
while (tokens.hasMoreTokens())
{
tokenArray.add(tokens.nextToken());
}
return tokenArray;
}
My initial intention is to use tokens to clear the input string of duplicate emails (that is initial).
Let's say I have
input = ", email-1#email.com, email-2#email.com, email-3#email.com"; //yes with , at the beginning
delim = ";,";
And when I run above function the result is:
[email-1#email.com, email-2#email.com, email-3#email.com]
Which is fine, but there is one more space added between the comma and the email.
Why is that, and how do I fix it?
Edit:
here is the function that prints the output:
List<String> tokens = StringUtility.gettokens(email, ";,");
Set<String> emailSet = new LinkedHashSet<String>(tokens);
emails = StringUtils.join(emailSet, ", ");
hehe, and now I see the answer.
Edit 2 - the root cause:
the root cause of the problem was that line of the code:
emails = StringUtils.join(emailSet, ", ");
Was adding an extra ", " when joining tokens.
From the example above, one token would look like this: " email-1#email.com". When join is applied, it adds a comma and a space before the token. So if a token starts with a space, there will be two spaces between the comma and the token.
Example:
", " + " email-1#email.com" = ",<space><space>email-1#email.com"
When printing an ArrayList, it prints all the objects separated by a comma and a space. Your input also has a space after each comma, so that results in two spaces.
You can use:
tokenArray.add(tokens.nextToken().trim());
to remove unwanted spaces from your input.
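The trim-then-join idea can be sketched end to end as follows. This is an illustration under assumptions, not the asker's code: the class and method names are hypothetical, and String.join (Java 8+) stands in for the StringUtils.join call from the question.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.StringTokenizer;

class EmailDedupDemo {
    // Hypothetical helper: tokenize on any character in delims, trim each
    // token, drop duplicates (keeping first-seen order), and re-join.
    static String dedupe(String input, String delims) {
        Set<String> unique = new LinkedHashSet<>();
        StringTokenizer st = new StringTokenizer(input, delims);
        while (st.hasMoreTokens()) {
            String token = st.nextToken().trim(); // remove the leading space
            if (!token.isEmpty()) {
                unique.add(token);
            }
        }
        // The separator contributes the only space after each comma.
        return String.join(", ", unique);
    }

    public static void main(String[] args) {
        String input = ", email-1#email.com, email-2#email.com, email-1#email.com";
        System.out.println(dedupe(input, ";,")); // email-1#email.com, email-2#email.com
    }
}
```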
You've got spaces in your string, and ArrayList's implementation of toString adds a space before each element. The idea is that you've got a list of "x", "y" and "z", the output should be "[x, y, z]" rather than "[x,y,z]"
Your real problem probably is that you've kept the spaces in the tokens. Fix:
public static List<String> gettokens(String input, String delim)
{
ArrayList<String> tokenArray = new ArrayList<String>();
StringTokenizer tokens = new StringTokenizer(input, delim);
while (tokens.hasMoreTokens())
{
tokenArray.add(tokens.nextToken().trim());
}
return tokenArray;
}
You can change the delim to include the space (", "); then it would not be contained in the token elements.
Easier would be to use the split() method which returns a string array, so basically the method will look like:
public static List<String> gettokens(String input, String delim)
{
// note: split() treats delim as a regular expression, not as a set of delimiter characters
return Arrays.asList(input.split(delim));
}
I think it would be a better approach to use split method of String, just because it would be shorter. All you would need to do is :
String[] values = input.split(delim);
It will return an array instead of a List.
The extra space appears because you are adding it in your printing method.
List<String> tokens = StringUtility.gettokens(email, ";,");
Set<String> emailSet = new LinkedHashSet<String>(tokens);
emails = StringUtils.join(emailSet, ", "); //adds a space after a comma
So StringTokenizer works as expected.
In your case, without much modifying the code, you could use trim function to clear the spaces before removing duplicates, and then join with separator ", " like this:
tokenArray.add(tokens.nextToken().trim());
And you will get result without two spaces.
There is no space or comma in between.
Try to print your ArrayList as:
for(Object obj: tokenArray )
System.out.println(obj);

Java. How to remove white space on array

For example
I split a string "+name" by +. I got a white space " " and "name" in the array (this doesn't happen if my string is "name+").
t="+name";
String[] temp=t.split("\\+");
the above code produces
temp[0]=" "
temp[1]=name
I only want to get "name" without the whitespace.
Also, if t="name+" then temp[0]=name. I'm wondering what the difference is between name+ and +name. Why do I get different output?
Simply loop through the items in the array as below and trim the white space
for (int i = 0; i < temp.length; i++){
if (temp[i] != null) temp[i] = temp[i].trim();
}
The value of the first array item is not a space (" ") but an empty string (""). The following snippet demonstrates the behaviour and provides a workaround: I simply strip leading delimiters from the input. Note that this should never be used for processing CSV files, because there a leading delimiter creates an empty column value, which is usually wanted.
for (String s : "+name".split("\\+")) {
System.out.printf("'%s'%n", s);
}
System.out.println();
for (String s : "name+".split("\\+")) {
System.out.printf("'%s'%n", s);
}
System.out.println();
for (String s : "+name".replaceAll("^\\+", "").split("\\+")) {
System.out.printf("'%s'%n", s);
}
You get the extra element in the "+name" case because there is a non-empty value ("name") after the delimiter.
The split() function only "trims" trailing delimiters that result in empty elements at the end of the array. See the Java SE documentation.
Examples of .split("\\+") output:
"+++++" = { } // zero length array because all are trailing delimiters
"+name+" = { "", "name" } // trailing delimiter removed
"name+++++" = { "name" } // trailing delimiter removed
"name+" = { "name" } // trailing delimiter removed
"++name+" = { "", "", "name" } // trailing delimiter removed
I would suggest preventing those extra delimiters on both ends rather than cleaning up afterwards.
To remove white space, use
str.replaceAll("\\s", "")
String yourString = "name +";
yourString = yourString.replaceAll("\\s", ""); // "\\W" would also strip the '+' delimiter
String[] yourArray = yourString.split("\\+");
For a one liner :
String temp[] = t.replaceAll("(^\\++)?(\\+)?(\\+*)?", "$2").split("\\+");
This will replace all multiple plus signs by one, or a plus sign at the start by empty String, and then split on plus signs.
Which will basically eliminate empty Strings in the result.
split(String regex) is equivalent to split(String regex, int limit) with limit = 0. And the documentation of the latter states :
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Which is why a '+' at the start works differently than a '+' at the end
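A short sketch that makes the asymmetry visible. The class and method names are invented for the example; the behaviour shown is that of String#split(String, int).

```java
import java.util.Arrays;

class LeadingTrailingDemo {
    // Hypothetical helper: split on '+' with the given limit and
    // render the resulting array as text.
    static String show(String input, int limit) {
        return Arrays.toString(input.split("\\+", limit));
    }

    public static void main(String[] args) {
        System.out.println(show("+name", 0));  // [, name]  leading empty string is kept
        System.out.println(show("name+", 0));  // [name]    trailing empty string is discarded
        System.out.println(show("name+", -1)); // [name, ]  negative limit keeps it
    }
}
```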
You might want to give Guava's Splitter a try. It has a nice fluent API for dealing with empty strings, trimming, etc.
@Test
public void test() {
final String t1 = "+name";
final String t2 = "name+";
assertThat(split(t1), hasSize(1));
assertThat(split(t1).get(0), is("name"));
assertThat(split(t2), hasSize(1));
assertThat(split(t2).get(0), is("name"));
}
private List<String> split(final String sequence) {
final Splitter splitter = Splitter.on("+").omitEmptyStrings().trimResults();
return Lists.newArrayList(splitter.split(sequence));
}

How to prevent java.lang.String.split() from creating a leading empty string?

Passing 0 as the limit argument prevents trailing empty strings, but how does one prevent leading empty strings?
For instance
String[] test = "/Test/Stuff".split("/");
results in an array with "", "Test", "Stuff".
Yeah, I know I could roll my own Tokenizer... but the API docs for StringTokenizer say
"StringTokenizer is a legacy class that is retained for compatibility
reasons although its use is discouraged in new code. It is recommended
that anyone seeking this functionality use the split"
Your best bet is probably just to strip out any leading delimiter:
String input = "/Test/Stuff";
String[] test = input.replaceFirst("^/", "").split("/");
You can make it more generic by putting it in a method:
public String[] mySplit(final String input, final String delim)
{
return input.replaceFirst("^" + delim, "").split(delim);
}
String[] test = mySplit("/Test/Stuff", "/");
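One caveat with the generic method: both replaceFirst and split interpret the delimiter as a regular expression, so a delimiter such as "+" or "." would not behave as a literal. A possible variant, sketched here with hypothetical names, quotes the delimiter first with Pattern.quote:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuotedSplitDemo {
    // Hypothetical variant of mySplit: treat delim as a literal string,
    // strip one leading delimiter, then split on the rest.
    static String[] splitNoLeading(String input, String delim) {
        String quoted = Pattern.quote(delim); // escape regex metacharacters
        return input.replaceFirst("^" + quoted, "").split(quoted);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitNoLeading("/Test/Stuff", "/"))); // [Test, Stuff]
        System.out.println(Arrays.toString(splitNoLeading("+name", "+")));       // [name]
    }
}
```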
Apache Commons has a utility method for exactly this: org.apache.commons.lang.StringUtils.split
StringUtils.split()
Actually in our company we now prefer using this method for splitting in all our projects.
I don't think there is a way you could do this with the built-in split method. So you have two options:
1) Make your own split
2) Iterate through the array after calling split and remove empty elements
If you make your own split you can just combine these two options
public List<String> split(String inString)
{
List<String> outList = new ArrayList<>();
String[] test = inString.split("/");
for(String s : test)
{
if(s != null && s.length() > 0)
outList.add(s);
}
return outList;
}
or you could just check whether the delimiter is in the first position before you call split, and ignore the first character if it is:
String delimiter = "/";
String delimitedString = "/Test/Stuff";
String[] test;
if(delimitedString.startsWith(delimiter)){
//start at the 1st character not the 0th
test = delimitedString.substring(1).split(delimiter);
}
else
test = delimitedString.split(delimiter);
I think you will have to manually remove the first empty string. A simple way to do that is this:
String string, subString;
int index;
String[] test;
string = "/Test/Stuff";
index = string.indexOf("/");
subString = string.substring(index+1);
test = subString.split("/");
This will exclude the leading empty string.
I don't think there is a built-in function to remove blank strings in Java. You could delete the blank strings by hand, but that may lead to errors. To be safe, you can do it with a small piece of code as follows:
List<String> list = new ArrayList<String>();
for(String str : test)
{
if(str != null && str.length() > 0)
{
list.add(str);
}
}
test = list.toArray(new String[list.size()]);
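The same split-then-filter idea can be sketched with Java 8 streams. The class and method names below are hypothetical; unlike stripping only the first element, this drops every empty string, wherever it occurs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class NonEmptySplitDemo {
    // Hypothetical helper: split on the given regex and keep only
    // the non-empty pieces, in their original order.
    static List<String> splitNonEmpty(String input, String regex) {
        return Arrays.stream(input.split(regex))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitNonEmpty("/Test/Stuff", "/"));   // [Test, Stuff]
        System.out.println(splitNonEmpty("/Test//Stuff/", "/")); // [Test, Stuff]
    }
}
```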
When using JDK 8 and streams, just add a skip(1) after the split. The following snippet decodes a (very weird) hex-encoded string.
Arrays.asList("\\x42\\x41\\x53\\x45\\x36\\x34".split("\\\\x"))
.stream()
.skip(1) // <- ignore the first empty element
.map(c->""+(char)Integer.parseInt(c, 16))
.collect(Collectors.joining())
You can use StringTokenizer for this purpose...
String test1 = "/Test/Stuff";
StringTokenizer st = new StringTokenizer(test1,"/");
while(st.hasMoreTokens())
System.out.println(st.nextToken());
This is how I've gotten around this problem. I take the string, call .toCharArray() on it to split it into an array of chars, and then loop through that array and add it to my String list (wrapping each char with String.valueOf). I imagine there's some performance tradeoff but it seems like a readable solution. Hope this helps!
char[] stringChars = string.toCharArray();
List<String> stringList = new ArrayList<>();
for (char stringChar : stringChars) {
stringList.add(String.valueOf(stringChar));
}
You can simply add a statement like if(StringUtils.isEmpty(string)) continue; before printing the string. With my JDK version 1.8, no blank will be printed.
5
this
program
gives
me
problems

Adding comma separated strings to an ArrayList and vice versa

How to add a comma separated string to an ArrayList? My string "returnedItems" could hold 1 or 20 items which I'd like to add to my ArrayList "selItemArrayList".
After the ArrayList has been populated, I'd like to later iterate through it and format the items into a comma separated string with no spaces between the items.
String returnedItems = "a,b,c";
List<String> sellItems = Arrays.asList(returnedItems.split(","));
Now iterate over the list and append each item to a StringBuilder:
StringBuilder sb = new StringBuilder();
for(String item: sellItems){
if(sb.length() > 0){
sb.append(',');
}
sb.append(item);
}
String result = sb.toString();
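On Java 8 and later, the joining loop can be replaced by String.join. The sketch below shows the round trip under that assumption; the class and method names are made up, and the list is wrapped in a new ArrayList because Arrays.asList returns a fixed-size list that cannot be added to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CsvRoundTripDemo {
    // Hypothetical helper: comma-separated string -> growable list.
    // The regex also swallows whitespace around each comma.
    static List<String> toList(String csv) {
        return new ArrayList<>(Arrays.asList(csv.split("\\s*,\\s*")));
    }

    // Hypothetical helper: list -> comma-separated string without spaces.
    static String toCsv(List<String> items) {
        return String.join(",", items);
    }

    public static void main(String[] args) {
        List<String> items = toList("a, b ,c");
        items.add("d"); // works because the list is a real ArrayList
        System.out.println(toCsv(items)); // a,b,c,d
    }
}
```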
One-liners are always popular:
Collections.addAll(arrayList, input.split(","));
split and asList do the trick:
String [] strings = returnedItems.split(",");
List<String> list = Arrays.asList(strings);
Simple one-liner:
selItemArrayList.addAll(Arrays.asList(returnedItems.split("\\s*,\\s*")));
Of course it will be more complex if you have entries with commas in them.
This can help:
for (String s : returnedItems.split(",")) {
selItemArrayList.add(s.trim());
}
//Shorter and sweeter
String [] strings = returnedItems.split(",");
selItemArrayList = Arrays.asList(strings);
//The reverse....
StringBuilder sb = new StringBuilder();
Iterator<String> iter = selItemArrayList.iterator();
while (iter.hasNext()) {
if (sb.length() > 0)
sb.append(",");
sb.append(iter.next());
}
returnedItems = sb.toString();
If the strings themselves can have commas in them, things get more complicated. Rather than rolling your own, consider using one of the many open-source CSV parsers. While they are designed to read in files, at least OpenCSV will also parse an individual string you hand it.
Commons CSV
OpenCSV
Super CSV
OsterMiller CSV
If the individual items aren't quoted then:
QString str = "a,,b,c";
QStringList list1 = str.split(",");
// list1: [ "a", "", "b", "c" ]
If the items are quoted I'd add "[]" characters and use a JSON parser.
You could use the split() method on String to convert the String to an array that you could loop through.
Although you might be able to skip the looping and parsing with a regular expression to remove the spaces using replaceAll() on a String.
String list = "one, two, three, four";
String[] items = list.split("\\p{Punct}");
List<String> aList = Arrays.asList(items);
System.out.println("aList = " + aList);
StringBuilder formatted = new StringBuilder();
for (int i = 0; i < items.length; i++)
{
formatted.append(items[i].trim());
if (i < items.length - 1) formatted.append(',');
}
System.out.println("formatted = " + formatted.toString());
import com.google.common.base.*;
Iterable<String> items = Splitter.on(",").omitEmptyStrings()
.split("Mango,Apple,Guava");
// Join again!
String itemsString = Joiner.on(",").join(items);
String csv = "Apple, Google, Samsung";
Step one: convert the comma-separated String to an array of String
String[] elements = csv.split(",");
Step two: convert the String array to a list of String
List<String> fixedLengthList = Arrays.asList(elements);
Step three: copy the fixed-length list to an ArrayList
ArrayList<String> listOfString = new ArrayList<String>(fixedLengthList);
System.out.println("list from comma separated String : " + listOfString);
System.out.println("size of ArrayList : " + listOfString.size());
Output :
list from comma separated String : [Apple,  Google,  Samsung]
size of ArrayList : 3
