split method leaving space in array - java

{
ArrayList<String> node_array = new ArrayList<String>();
String allValues[] = node.split("[(,)]");
for(String value : allValues){
node_array.add(value);
}
node is a string, for example: (3,4,5,6,3)
For some reason, when I check the contents of the ArrayList, the split seems to leave blank elements behind, specifically where ( and ) are supposed to be. What am I doing wrong?

You're asking split() to split at parentheses and commas. In your string, there is an empty substring right before the first separator, the opening parenthesis. split() keeps that empty substring and returns it as the zeroth element of the resulting array.
There are plenty of examples in the documentation that illustrate how the function works.
To work around this, you can either ignore the empty strings, or flip the regex on its head and match the numbers instead of splitting at the punctuation characters.
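As a minimal sketch of the second suggestion (matching the numbers rather than splitting at the punctuation), something like the following could work. The class name NodeParser and the method extractValues are made up for illustration, and the pattern assumes the values are non-negative integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question.
class NodeParser {
    // Matches one run of digits; assumes the values are non-negative integers.
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Collects every digit run in the input, ignoring parentheses and commas.
    static List<String> extractValues(String node) {
        List<String> values = new ArrayList<>();
        Matcher m = NUMBER.matcher(node);
        while (m.find()) {
            values.add(m.group());
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extractValues("(3,4,5,6,3)")); // prints [3, 4, 5, 6, 3]
    }
}
```

Because only the digits are matched, no empty string can end up in the list, whatever punctuation surrounds the values.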

You have defined the separator to be any one of a set of characters, and one of them is the first character in your String, so an empty string "" will show up in your ArrayList, because that's what occurs before the first separator. However, for your application you can easily fix it like this:
ArrayList<String> node_array = new ArrayList<String>();
String allValues[] = node.split("[(,)]");
for(String value : allValues){
if(!value.equals("")) node_array.add(value);
}
return node_array;

node.replace("(","").replace(")","").split(",");
or
node.substring(1,node.length()-1).split(",");

Related

Difficulty splitting string at delimiter and keeping it

I have a string that is read in pairs, separated by comma. However, I do not always want to split at the comma because there is not always 1 comma in the input. For example, the string,
(http://www.wolframalpha.com/input/?i=103%2F30+%3D+4a-3b,+71%2F60+%3D+a+%2B+b
,http://www.wolframalpha.com/input/?i=x%5E2%2B5x%2B6,file:///tmp/foo/bar/p,d,f.pdf)
is read in as a single line. In this case, I only want to split at the ,h and nowhere else in the string. Essentially, after the split, the strings should be:
http://www.wolframalpha.com/input/?i=103%2F30+%3D+4a-3b,+71%2F60+%3D+a+%2B+b
http://www.wolframalpha.com/input/?i=x%5E2%2B5x%2B6
file:///tmp/foo/bar/p,d,f.pdf
The comma inside the first string must be preserved. (I will get rid of the parentheses.) I have looked at this Stack Overflow question, and while helpful, it does not correctly split this string. This is in Java. Any help is appreciated.
You can use regex to do the split. Please see below code snippet.
String str = "(http://www.wolframalpha.com/input/?i=103%2F30+%3D+4a-3b,+71%2F60+%3D+a+%2B+b,http://www.wolframalpha.com/input/?i=x%5E2%2B5x%2B6)";
String[] strArr = str.split("(,(?=http))");
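strArr will then hold each URL as a separate element. The comma inside the first URL is left intact because it is not followed by http; the surrounding parentheses remain on the first and last elements.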
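The lookahead above only recognises http, so it would not separate the file:/// entry from the question. A sketch of one possible generalisation, assuming every element starts with a URL scheme followed by ://, is shown here. The class name UrlListSplitter, the method splitUrls and the SEPARATOR pattern are illustrative assumptions, not something from the original answer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
class UrlListSplitter {
    // Assumed rule: a comma is a separator only when a URL scheme such as
    // "http://" or "file://" follows it. The lookahead consumes nothing,
    // so the scheme stays attached to the next element.
    private static final String SEPARATOR = ",(?=[a-zA-Z][a-zA-Z0-9+.-]*://)";

    static List<String> splitUrls(String input) {
        String body = input.trim();
        // Drop the surrounding parentheses if both are present.
        if (body.startsWith("(") && body.endsWith(")")) {
            body = body.substring(1, body.length() - 1);
        }
        return Arrays.asList(body.split(SEPARATOR));
    }

    public static void main(String[] args) {
        String input = "(http://www.wolframalpha.com/input/?i=103%2F30+%3D+4a-3b,+71%2F60+%3D+a+%2B+b"
                + ",http://www.wolframalpha.com/input/?i=x%5E2%2B5x%2B6"
                + ",file:///tmp/foo/bar/p,d,f.pdf)";
        // Prints the three entries, one per line, with their inner commas intact.
        for (String url : splitUrls(input)) {
            System.out.println(url);
        }
    }
}
```

The commas in "+71" and "p,d,f.pdf" are not followed by a scheme and ://, so they are left alone.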
Split on 'http', then re-add it.
Sketch:
String input = "http://www.wolframalpha.com/input/?i=103%2F30+%3D+4a-3b,+71%2F60+%3D+a+%2B+b"
+ ",http://www.wolframalpha.com/input/?i=x%5E2%2B5x%2B6";
String[] split = input.split(",?http"); // also consumes the comma in front of each later URL
List<String> finalList = new ArrayList<String>();
for (String fixup : split)
{
if (!fixup.isEmpty()) finalList.add("http" + fixup); // the piece before the first "http" is empty
}
finalList should then contain the two URLs.

How to get the desired character from the variable sized strings?

I need to extract the name attached to the end of each string.
For example
pot-1_Sam
pot-22_Daniel
pot_444_Jack
pot_5434_Bill
I need to get the names from the above strings, i.e. Sam, Daniel, Jack and Bill.
The problem is that if I use substring, the position keeps changing with the length of the number. How can I achieve this using a regex?
Update:
Some strings have two underscores, like
pot_US-1_Sam
pot_RUS_444_Jack
Assuming your strings always follow the formats above, you don't need a regex at all; you can use the lastIndexOf and substring methods.
String result = yourString.substring(yourString.lastIndexOf("_")+1, yourString.length());
Your answer is:
String[] s = new String[4];
s[0] = "pot-1_Sam";
s[1] = "pot-22_Daniel";
s[2] = "pot_444_Jack";
s[3] = "pot_5434_Bill";
ArrayList<String> result = new ArrayList<String>();
for (String value : s) {
String[] splitedArray = value.split("_");
result.add(splitedArray[splitedArray.length-1]);
}
for(String resultingValue : result){
System.out.println(resultingValue);
}
You have 2 options:
Use the lastIndexOf method to get the index of the last _ (this assumes that there is no _ in the names you are after). Once you have that index, you can use the substring method to get the bit you are after.
Use a regular expression. The strings you have shown essentially follow a pattern in which numbers are followed by an underscore, which is in turn followed by the word you are after. You can use a regular expression such as \\d+_ (which matches one or more digits followed by an underscore) in combination with the split method. The string you are after will be in the last array position.
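The two options can be sketched side by side as follows. The class and method names are invented for the example, and both methods assume that a name actually follows the last underscore.

```java
// Hypothetical helper comparing the two options described above.
class NameExtractor {
    // Option 1: take everything after the last underscore.
    // Assumes the name itself contains no underscore.
    static String byLastIndexOf(String s) {
        return s.substring(s.lastIndexOf('_') + 1);
    }

    // Option 2: split on "digits followed by an underscore" and keep the last piece.
    // Assumes something follows the final "digits_" group; otherwise the array
    // can be shorter than expected.
    static String bySplit(String s) {
        String[] parts = s.split("\\d+_");
        return parts[parts.length - 1];
    }

    public static void main(String[] args) {
        String[] samples = {"pot-1_Sam", "pot-22_Daniel", "pot_444_Jack", "pot_US-1_Sam", "pot_RUS_444_Jack"};
        for (String s : samples) {
            System.out.println(byLastIndexOf(s) + " / " + bySplit(s));
        }
    }
}
```

Both versions also cope with the two-underscore variants from the update, since only the last underscore (or the last digit group) matters.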
Use a string tokenizer based on '_' and get the last element. No need for REGEX.
Or use the split method on the string object like so :
String[] strArray = strValue.split("_");
String lastToken = strArray[strArray.length -1];
String[] s = {
"pot-1_Sam",
"pot-22_Daniel",
"pot_444_Jack",
"pot_5434_Bill"
};
for (String e : s)
System.out.println(e.replaceAll(".*_", ""));

String.split() returning a "" unexpectedly

I have a simple method splitting a string into an array. It splits it where there are non-letter characters. The line I am using right now is as follows:
String[] words = str.split("[^a-zA-Z]");
So this should split the string at every non-alphabetical character. But the problem is that the result looks right for some of the delimiters and not for others. For example:
String str = "!!day--yaz!!";
String[] words = str.split("[^a-zA-Z]");
String result = "";
for (int i = 0; i < words.length; i++) {
result += words[i] + "1 ";
}
return result;
I added the 1 in there to see where the split takes place, because I was getting errors on null values. Anyway, when I run this code I get an output of:
1 1 day1 1 yaz1
Why is it splitting between the first two !'s and after one of the -'s, but not after the last two !'s? Why is it even splitting there at all? Any help on this would be great!
It doesn't split before or after the matches; it splits ON the matches, so you get an empty String between the two dashes and between the two leading bangs (plus one before the first bang).
This doesn't apply to the trailing bangs, because trailing empty Strings are omitted, as described in the javadoc:
Trailing empty strings are therefore not included in the resulting
array.
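To see the trailing-empty-string rule in action, here is a small sketch that compares the one-argument split with the two-argument form using a negative limit, which keeps trailing empty strings. The class and method names are made up for the demonstration.

```java
import java.util.Arrays;

// Hypothetical demo class; only String.split and Arrays.toString are real API.
class TrailingEmptyDemo {
    // Default behaviour (limit 0): trailing empty strings are dropped.
    static String[] dropTrailing(String s) {
        return s.split("[^a-zA-Z]");
    }

    // Negative limit: every piece is kept, including trailing empty strings.
    static String[] keepTrailing(String s) {
        return s.split("[^a-zA-Z]", -1);
    }

    public static void main(String[] args) {
        String str = "!!day--yaz!!";
        System.out.println(Arrays.toString(dropTrailing(str))); // prints [, , day, , yaz]
        System.out.println(Arrays.toString(keepTrailing(str))); // prints [, , day, , yaz, , ]
    }
}
```

The first output has five elements, which matches the five "1" markers in the question's output.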
This happens because it indeed uses every non-letter character as a delimiter. It means that the string "!" is conceptually split into 2 empty strings, one to the left and one to the right of the exclamation mark (the trailing empty strings are then dropped from the returned array).
Your problem can be solved in 2 steps.
Use "[^a-zA-Z]+" instead of "[^a-zA-Z]". The + will help you avoid the empty string between the 2 dashes.
Remove leading and trailing non-letter characters before splitting. This will remove the leading and trailing empty strings: str.replaceFirst("^[^a-zA-Z]+", "").replaceFirst("[^a-zA-Z]+$", "")
Finally your split will look like:
String[] words = str.replaceFirst("^[^a-zA-Z]+", "").replaceFirst("[^a-zA-Z]+$", "").split("[^a-zA-Z]+");
If you want to get rid of some of the extra splits, use split("[^a-zA-Z]+") instead of split("[^a-zA-Z]"). This will match a continuous part of the String that matches the pattern.
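A minimal sketch of that idea, assuming it is acceptable to simply skip any empty piece that remains (for example the one in front of a leading delimiter). The class name WordSplitter and the method words are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration.
class WordSplitter {
    // Splits on runs of non-letters and drops any empty piece,
    // such as the one produced by delimiters at the start of the string.
    static List<String> words(String str) {
        List<String> result = new ArrayList<>();
        for (String piece : str.split("[^a-zA-Z]+")) {
            if (!piece.isEmpty()) {
                result.add(piece);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("!!day--yaz!!")); // prints [day, yaz]
    }
}
```

With the + quantifier, "--" counts as a single delimiter, so the only empty piece left to filter out is the one before the leading "!!".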

Remove characters before a comma in a string

I was wondering what would be the best way to go about removing characters before a comma in a string, as well as removing the comma itself, leaving just the characters after the comma in the string, if the string is represented as 'city,country'.
Thanks in advance
So you want
city,country
to become
country
An easy way to do this is this:
public static void main(String[] args) {
System.out.println("city,country".replaceAll(".*,", ""));
}
This is "greedy" though, meaning it will change
city,state,country
into
country
In your case, you might want it to become
state,country
I couldn't tell from your question.
If you want "non-greedy" matching that removes only the part up to the first comma, use
System.out.println("city,state,country".replaceFirst(".*?,", ""));
this will output
state,country
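Here is a small sketch comparing the greedy and reluctant patterns. One detail worth checking for yourself: replaceAll applies the pattern repeatedly, so with ".*?," it strips every "field," prefix in turn and still ends at country, while replaceFirst removes only the first one. The class and method names are invented for the example.

```java
// Hypothetical demo of greedy versus reluctant quantifiers.
class CommaPrefixDemo {
    // Greedy: ".*," runs up to the last comma, so only the final field survives.
    static String afterLastComma(String s) {
        return s.replaceAll(".*,", "");
    }

    // Reluctant: ".*?," stops at the first comma; replaceFirst removes just that prefix.
    static String afterFirstComma(String s) {
        return s.replaceFirst(".*?,", "");
    }

    public static void main(String[] args) {
        String s = "city,state,country";
        System.out.println(afterLastComma(s));             // prints country
        System.out.println(afterFirstComma(s));            // prints state,country
        System.out.println(s.replaceAll(".*?,", ""));      // prints country
    }
}
```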
check this
String s="city,country";
System.out.println(s.substring(s.lastIndexOf(',')+1));
I found it faster than .replaceAll(".*,", "")
If what you are interested in is extracting data while leaving the original string intact you should use the split(String regex) function.
String foo = new String("city,country");
String[] data = foo.split(",");
The data array will now contain strings "city" and "country".
More info is available here: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split%28java.lang.String%29
This can be done with a combination of substring and indexOf, using indexOf to determine the position of the (first) comma, and substring to extract a portion of the string relative to that position.
String s = "city,country";
String s2 = s.substring(s.indexOf(",") + 1);
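A slightly more defensive sketch of the same approach, with the no-comma case handled explicitly. The names AfterComma and afterFirstComma are assumptions made for the example.

```java
// Hypothetical helper for illustration.
class AfterComma {
    // Returns the text after the first comma,
    // or the whole string when there is no comma at all.
    static String afterFirstComma(String s) {
        int comma = s.indexOf(',');
        return comma < 0 ? s : s.substring(comma + 1);
    }

    public static void main(String[] args) {
        System.out.println(afterFirstComma("city,country")); // prints country
        System.out.println(afterFirstComma("city"));         // prints city
    }
}
```

Without the check, indexOf returns -1 and substring(0) happens to return the whole string as well, so the explicit test is mostly there to make the intent visible.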
You could implement a sort of substring that finds all the indexes of characters before your comma and then all you'd need to do is remove them.

Finding multiple substrings using boundaries in Java

Alright, so here is my problem. Basically I have a string with 4 words in it, with each word separated by a #. What I need to do is use the substring method to extract each word and print it out. I am having trouble figuring out the parameters for it though. I can always get the first one right, but the following ones generally have problems.
Here is the first piece of the code:
word = format.substring( 0 , format.indexOf('#') );
Now from what I understand this basically means start at the beginning of the string, and end right before the #. So using the same logic, I tried to extract the second word like so:
wordTwo = format.substring ( wordlength + 1 , format.indexOf('#') );
//The plus one so I don't start at the #.
But with this I continually get errors saying it doesn't exist. I figured that the compiler was trying to read the first # before the second word, so I rewrote it like so:
wordTwo = format.substring (wordlength + 1, 1 + wordLength + format.indexOf('#') );
And with this it just completely screws it up, either not printing the second word or not stopping in the right place. If I could get any help on the formatting of this, it would be greatly appreciated. Since this is for a class, I am limited to using very basic methods such as indexOf, length, substring etc., so if you could refrain from using anything too complex that would be amazing!
If you have to use substring then you need to use the variant of indexOf that takes a start index. This means you can start looking for the second # by starting the search after the first one, i.e.
wordTwo = format.substring ( wordlength + 1 , format.indexOf('#', wordlength + 1 ) );
There are however much better ways of splitting a string on a delimiter like this. You can use a StringTokenizer. This is designed for splitting strings like this. Basically:
StringTokenizer tok = new StringTokenizer(format, "#");
String word = tok.nextToken();
String word2 = tok.nextToken();
String word3 = tok.nextToken();
Or you can use the String.split method which is designed for splitting strings. e.g.
String[] parts = format.split("#");
String word = parts[0];
String word2 = parts[1];
String word3 = parts[2];
You can use split() for strings formatted like this.
For instance if you have string like,
String text = "Word1#Word2#Word3#Word4";
You can use delimiter as,
String delimiter = "#";
Then create a string array like,
String[] temp;
For splitting string,
temp = text.split(delimiter);
The array then holds the words like this,
temp[0] // "Word1"
temp[1] // "Word2"
temp[2] // "Word3"
temp[3] // "Word4"
Use split() method to do this with "#" as the delimiter
String s = "hi#vivek#is#good";
String temp = new String();
String[] arr = s.split("#");
for(String x : arr){
temp = temp + x;
}
Or if you want to extract each word... you have it already in arr
arr[0] ---> First Word
arr[1] ---> Second Word
arr[2] ---> Third Word
I suggest that you have a look at the Javadoc for String before you proceed further.
Since this is your homework, I'll give you a couple of hints and maybe you can solve it yourself:
The signature of substring is public String substring(int beginIndex, int endIndex). As per the javadoc for this method:
Returns a new string that is a substring of this string. The substring
begins at the specified beginIndex and extends to the character at
index endIndex - 1. Thus the length of the substring is
endIndex-beginIndex.
Note that if you have to use this method, you'll have to shift your beginIndex and endIndex each time, because in your situation you have multiple words that are separated by #.
However if you look closely, there's another method in String class that might be helpful to you. That's the public String[] split(String regex) method. The javadoc for this one states:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
The split() method looks pretty interesting for your case. You can split your String with the delimiter that you have as the parameter to this method, get the String array and work with that.
Hope this helps you to understand your problem and get started towards a solution :)
Since this is homework, it may be better to try to write it yourself. But I will give a clue.
Clue:
The indexOf method has another overload: int indexOf(int ch, int fromIndex), which finds the first occurrence of the character ch in the string, starting the search at fromIndex.
http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/String.html
From this clue, the program will look something like this:
Find the index of the first '#' from the start of the string.
Extract the word from 0th character to that index.
Find the index of the first '#' from the character AFTER the first '#'.
Extract the word from just after the first '#' up to that index.
... Just do it until you get 4 words or the string ends.
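The steps above can be sketched as a loop. This is only one way to arrange it: the class name HashWords and the method words are invented, and a List is used here purely so the result can be inspected; in a course restricted to indexOf and substring, printing each word where it is added would do the same job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the indexOf(ch, fromIndex) loop.
class HashWords {
    static List<String> words(String format) {
        List<String> words = new ArrayList<>();
        int start = 0;                          // index where the current word begins
        int hash = format.indexOf('#', start);  // next '#' at or after start
        while (hash >= 0) {
            words.add(format.substring(start, hash));
            start = hash + 1;                   // skip past the '#'
            hash = format.indexOf('#', start);
        }
        words.add(format.substring(start));     // last word: no '#' follows it
        return words;
    }

    public static void main(String[] args) {
        System.out.println(words("Word1#Word2#Word3#Word4")); // prints [Word1, Word2, Word3, Word4]
    }
}
```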
Hope this helps.
I don't know why you're forced to use String#substring, but as others have mentioned, it seems like the wrong method for the kind of functionality you need.
String#split(String regex) is what you would use for such a problem, or, if your input sequence is something you don't control, I would suggest you look at the overloaded method String#split(String regex, int limit); this way you can impose a limit on the number of pieces produced, controlling the size of your resulting array.
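As a brief illustration of the limit parameter, the sketch below splits the same input with and without a limit. The class name SplitLimitDemo and the method splitAtMost are hypothetical; with a positive limit n, the array has at most n elements and the last one holds the unsplit remainder.

```java
import java.util.Arrays;

// Hypothetical demo of String.split(regex, limit).
class SplitLimitDemo {
    // Returns at most `limit` pieces when limit is positive.
    static String[] splitAtMost(String s, int limit) {
        return s.split("#", limit);
    }

    public static void main(String[] args) {
        String text = "Word1#Word2#Word3#Word4";
        System.out.println(Arrays.toString(text.split("#")));      // prints [Word1, Word2, Word3, Word4]
        System.out.println(Arrays.toString(splitAtMost(text, 2))); // prints [Word1, Word2#Word3#Word4]
    }
}
```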
