String split method is leaving empty tokens at the beginning [duplicate] - java

This question already has answers here:
Java String.split() sometimes giving blank strings
(3 answers)
Closed 9 years ago.
I have a string
String text="abc19xyz87nag";
I need to get only the numbers out of it, so I applied the "\\D+" regex as below:
String[] tks=text.split("\\D+");
But I see an empty token at the beginning of the array.
However, I have found two other solutions, shown below.
Using scanner
Scanner sc = new Scanner(text).useDelimiter("\\D+");
while(sc.hasNext()) {
System.out.println(sc.nextInt());
}
Using Pattern and Matcher
Matcher m = Pattern.compile("\\d+").matcher(text);
while (m.find()) {
System.out.println(m.group());
}
So why does String.split leave an empty token at the beginning?
Do I need to change the regex to avoid it?
Any help is appreciated.

It is a design decision not to discard empty strings at the beginning. The rationale is that split() is often used with data like
item1, item2, item3
(here the delimiter is ',') and you want to keep the non-empty items at their positions.
Now, suppose you parse lines with 3 items like above, where the first and the last are optional. If split discarded both leading and trailing empty strings and you got 2 elements back, you couldn't decide whether the input was:
, item2, item3
or
item1, item2
Because only trailing empty strings are discarded, you know that every non-empty string is at its correct position.
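The behaviour described above can be checked with a small sketch. The class name SplitEmptyTokens and the helper show are made up for illustration; the only library calls are String.split(String, int) and Arrays.toString. A limit of 0 behaves like the one-argument split, and a negative limit keeps trailing empty strings as well.

```java
import java.util.Arrays;

class SplitEmptyTokens {
    // Hypothetical helper: formats the result of split so empty tokens are visible.
    static String show(String input, String regex, int limit) {
        return Arrays.toString(input.split(regex, limit));
    }

    public static void main(String[] args) {
        // Leading delimiter: the empty first token is kept.
        System.out.println(show("abc19xyz87nag", "\\D+", 0)); // [, 19, 87]
        System.out.println(show(",item2,item3", ",", 0));     // [, item2, item3]
        // Trailing delimiter: the empty last token is dropped with limit 0 ...
        System.out.println(show("item1,item2,", ",", 0));     // [item1, item2]
        // ... and kept with a negative limit.
        System.out.println(show("item1,item2,", ",", -1));    // [item1, item2, ]
    }
}
```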

Your regex could also be written as
[^\d]+
Note that [^\d] is equivalent to \D, so this still leaves the empty token at the beginning.

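// needs: import java.util.StringTokenizer;
String s = "smt-ing we want to split";
// with no delimiter argument, StringTokenizer splits on whitespace
StringTokenizer st = new StringTokenizer(s);
while (st.hasMoreTokens()) {
    System.out.println(st.nextToken());
}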

You can filter out that empty string after splitting.
String text="abc19xyz87nag";
String[] tks=text.split("\\D+");
List<String> numList=new ArrayList<>();
for(String i:tks){
    // skip the empty token produced by the leading non-digits
    if(!"".equals(i)){
        numList.add(i);
    }
}

Related

Difficulty splitting string at delimiter and keeping it

I have a string that is read in pairs, separated by comma. However, I do not always want to split at the comma because there is not always 1 comma in the input. For example, the string,
(http://www.wolframalpha.com/input/?i=103%2F30+%3D+4a-3b,+71%2F60+%3D+a+%2B+b
,http://www.wolframalpha.com/input/?i=x%5E2%2B5x%2B6,file:///tmp/foo/bar/p,d,f.pdf)
is read in as a single line. In this case, I only want to split at the ",h" and nowhere else in the string. Essentially, after the split, the strings should be:
http://www.wolframalpha.com/input/?i=103%2F30+%3D+4a-3b,+71%2F60+%3D+a+%2B+b
http://www.wolframalpha.com/input/?i=x%5E2%2B5x%2B6
file:///tmp/foo/bar/p,d,f.pdf
The commas inside the first string must be preserved. (I will get rid of the parentheses.) I have looked at this Stack Overflow question, and while helpful, it does not correctly split this string. This is in Java. Any help is appreciated.
You can use a regex to do the split. Please see the code snippet below.
String str = "(http://www.wolframalpha.com/input/?i=103%2F30+%3D+4a-3b,+71%2F60+%3D+a+%2B+b,http://www.wolframalpha.com/input/?i=x%5E2%2B5x%2B6)";
String[] strArr = str.split("(,(?=http))");
The resulting array holds each URL as a separate element; the commas inside a URL are left untouched because the lookahead only matches a comma that is followed by "http".
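A minimal sketch of the lookahead idea applied to the full input from the question, including the file: entry. The class name UrlSplit and the method splitUrls are hypothetical, and it is an assumption that every item starts with an http://, https:// or file:// scheme; any other scheme would need to be added to the alternation.

```java
import java.util.Arrays;

class UrlSplit {
    // Splits on a comma only when it is immediately followed by a URL scheme.
    static String[] splitUrls(String input) {
        String s = input;
        // Strip the surrounding parentheses, if present.
        if (s.startsWith("(") && s.endsWith(")")) {
            s = s.substring(1, s.length() - 1);
        }
        return s.split(",(?=(?:https?|file)://)");
    }

    public static void main(String[] args) {
        String input = "(http://www.wolframalpha.com/input/?i=103%2F30+%3D+4a-3b,+71%2F60+%3D+a+%2B+b"
                + ",http://www.wolframalpha.com/input/?i=x%5E2%2B5x%2B6"
                + ",file:///tmp/foo/bar/p,d,f.pdf)";
        for (String url : splitUrls(input)) {
            System.out.println(url);
        }
    }
}
```

Because the lookahead is zero-width, the scheme itself is not consumed by the split, so nothing has to be re-added afterwards.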
Split on ",http", then re-add the "http" that the split consumed.
Sketch (needs java.util.ArrayList and java.util.List):
String input = "http://www.wolframalpha.com/input/?i=103%2F30+%3D+4a-3b,+71%2F60+%3D+a+%2B+b"
        + ",http://www.wolframalpha.com/input/?i=x%5E2%2B5x%2B6";
String[] split = input.split(",http");
List<String> finalList = new ArrayList<String>();
for (int i = 0; i < split.length; i++) {
    // every piece after the first lost its "http" prefix to the delimiter
    finalList.add(i == 0 ? split[i] : "http" + split[i]);
}
finalList should contain the two URLs.

Java Regex expression to match and store any integers

Right now, using Java, I just want it to be able to tokenize any string of integers to an array
input = 1dsa23f hj23nma9123
array = 1,23,23,9123;
I have been trying a few different ways to do it, such as string.matches("") and then tokenising once the input is in the right format, but that is too limiting for the user.
It looks like you are looking for something like
String[] nums = text.split("\\D+");
The \D regex is the negation of \d (it is like [^\d]), which means \D+ will match one or more non-digits.
The only problem with this solution is that if your text starts with non-digits, the result array will start with one empty string.
If you still want to use split, you can simply remove the leading non-digits from your text first.
String[] nums = text.replaceFirst("^\\D+","").split("\\D+");
An alternative to split, which focuses on finding delimiters, is to focus on finding the parts we are interested in. So instead of searching for non-digits, let's find digits.
We can do it in a few ways, such as Pattern/Matcher#find or Scanner. The catch is that these approaches don't return an array but single elements, which you need to store in some resizable structure like a List.
So a solution using Pattern and Matcher could look like:
List<String> numbers = new ArrayList<>();
Matcher m = Pattern.compile("\\d+").matcher(yourText);
while(m.find()){
numbers.add(m.group());
}
The solution using Scanner is similar: we just need to set the proper delimiter (non-digits) and read everything that is not a delimiter (delimiters at the start of the text are ignored, which should prevent empty strings from being returned).
List<String> nums = new ArrayList<>();
Scanner sc = new Scanner(yourText);
sc.useDelimiter("\\D+");
while(sc.hasNext()){
nums.add(sc.next());
}
final String input = "1dsa23f hj23nma9123";
final String[] parts = input.split("[^0-9]+");
for (final String s: parts) {
final int i = Integer.parseInt(s);
}
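If the end goal is an int[] rather than strings, the split-based approach can be combined with a filter for the empty leading token. This is only a sketch: the class name DigitRuns and the method extractInts are made up, it relies on Pattern.splitAsStream (Java 8 or later), and it assumes every run of digits fits into an int, since Integer.parseInt throws NumberFormatException otherwise.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DigitRuns {
    private static final Pattern NON_DIGITS = Pattern.compile("\\D+");

    // Returns every run of digits in the text as an int, in order of appearance.
    static int[] extractInts(String text) {
        return NON_DIGITS.splitAsStream(text)
                .filter(token -> !token.isEmpty()) // drops the empty token caused by leading non-digits
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extractInts("1dsa23f hj23nma9123"))); // [1, 23, 23, 9123]
        System.out.println(Arrays.toString(extractInts("abc19xyz87nag")));       // [19, 87]
    }
}
```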

I want to perform a split() on a string using a regex in Java, but would like to keep the delimited tokens in the array [duplicate]

This question already exists:
Is there a way to split strings with String.split() and include the delimiters? [duplicate]
Closed 8 years ago.
How can I format my regex to allow this?
Here's the regular expression:
"\\b[(\\w'\\-)&&[^0-9]]{4,}\\b"
It's looking for any word that is 4 letters or greater.
If I want to split, say, an article, I want an array that includes all the delimited values, plus all the values between them, all in the order that they originally appeared in. So, for example, if I want to split the following sentence: "I need to purchase a new vehicle. I would prefer a BMW.", my desired result from the split would be the following, where the italicized values are the delimiters.
"I ", "need", " to ", "purchase", " a new ", "vehicle", ". I ", "would", " ", "prefer", "a BMW."
So, each word with 4 or more characters is one token, while everything in between each delimited value is also a single token (even if it is multiple words with whitespace). I will only be modifying the delimited values and would like to keep everything else the same, including whitespace, new lines, etc.
I read in a different thread that I could use a lookaround to get this to work, but I can't seem to format it correctly. Is it even possible to get this to work the way I'd like?
I am not sure what you are trying to do, but in case you want to modify words that have at least four letters, you can use something like this (it changes words with >=4 letters to their upper-case version):
String data = "I need to purchase a new vehicle. I would prefer a BMW.";
Pattern patter = Pattern.compile("(?<![a-z\\-_'])[a-z\\-_']{4,}(?![a-z\\-_'])",
Pattern.CASE_INSENSITIVE);
Matcher matcher = patter.matcher(data);
StringBuffer sb = new StringBuffer();// holder of new version of our
// data
while (matcher.find()) {// lets find all words
// and change them with its upper case version
matcher.appendReplacement(sb, matcher.group().toUpperCase());
}
matcher.appendTail(sb);// lets not forget about part after last match
System.out.println(sb);
Output:
I NEED to PURCHASE a new VEHICLE. I WOULD PREFER a BMW.
OR if you change replacing code to something like
matcher.appendReplacement(sb, "["+matcher.group()+"]");
you will get
I [need] to [purchase] a new [vehicle]. I [would] [prefer] a BMW.
Now you can just split such string on every [ and ] to get your desired array.
Assuming that "word" is defined as [A-Za-z], you can use this regex:
(?<=(\\b[A-Za-z]{4,50}\\b))|(?=(\\b[A-Za-z]{4,50}\\b))
Full code:
class RegexSplit{
public static void main(String[] args){
String str = "I need to purchase a new vehicle. I would prefer a BMW.";
String[] tokens = str.split("(?<=(\\b[A-Za-z]{4,50}\\b))|(?=(\\b[A-Za-z]{4,50}\\b))");
for(String token: tokens){
System.out.print("["+token+"]");
}
System.out.println();
}
}
to get this output:
[I ][need][ to ][purchase][ a new ][vehicle][. I ][would][ ][prefer][ a BMW.]
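The same token list can also be built without split, by walking the matches and collecting the text between them. This is a sketch under the same assumption as above that a "word" is [A-Za-z]{4,}; the class name KeepDelimiters and the method tokenize are hypothetical, and only Matcher.find, start, end and group are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeepDelimiters {
    // Returns matched words and the text between them, in original order.
    static List<String> tokenize(String text, Pattern word) {
        List<String> tokens = new ArrayList<>();
        Matcher m = word.matcher(text);
        int last = 0; // end of the previous match
        while (m.find()) {
            if (m.start() > last) {
                tokens.add(text.substring(last, m.start())); // text before this word
            }
            tokens.add(m.group()); // the word itself
            last = m.end();
        }
        if (last < text.length()) {
            tokens.add(text.substring(last)); // text after the last word
        }
        return tokens;
    }

    public static void main(String[] args) {
        Pattern word = Pattern.compile("\\b[A-Za-z]{4,}\\b");
        String str = "I need to purchase a new vehicle. I would prefer a BMW.";
        for (String token : tokenize(str, word)) {
            System.out.print("[" + token + "]");
        }
        System.out.println();
        // prints [I ][need][ to ][purchase][ a new ][vehicle][. I ][would][ ][prefer][ a BMW.]
    }
}
```

Joining the tokens back together reproduces the original text, so whitespace and newlines between the words are preserved.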

Splitting string on multiple spaces in java [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to split a String by space
I need help while parsing a text file.
The text file contains data like
This is different type of file.
Can not split it using ' '(white space)
My problem is that the spacing between words is not uniform. Sometimes there is a single space and sometimes there are multiple spaces.
I need to split the string in such a way that I will get only words, not spaces.
str.split("\\s+") would work. The + at the end of the regular expression treats multiple spaces the same as a single space. It returns an array of strings (String[]) without any " " results.
You can use quantifiers to specify the number of spaces you want to split on:
`+` - Represents 1 or more
`*` - Represents 0 or more
`?` - Represents 0 or 1
`{n,m}` - Represents n to m
So, \\s+ will split your string on one or more spaces
String[] words = yourString.split("\\s+");
Also, if you want to specify some specific numbers you can give your range between {}:
yourString.split("\\s{3,6}"); // Split String on 3 to 6 spaces
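One detail worth sketching: if a line starts with whitespace, split("\\s+") yields an empty first element, and splitting an empty string yields a single empty element. The helper below is hypothetical (class WhitespaceSplit, method words) and simply trims the line first and special-cases blank input.

```java
import java.util.Arrays;

class WhitespaceSplit {
    // Returns only the words of a line, regardless of how many spaces separate them.
    static String[] words(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // "".split(...) would return one empty string
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("  This   is  different    type of file. ")));
        // [This, is, different, type, of, file.]
    }
}
```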
Use a regular expression.
String[] words = str.split("\\s+");
you can use regex pattern
public static void main(String[] args)
{
String s="This is different type of file.";
String s1[]=s.split("[ ]+");
for(int i=0;i<s1.length;i++)
{
System.out.println(s1[i]);
}
}
output
This
is
different
type
of
file.
You can use the replaceAll(String regex, String replacement) method of the String class to replace multiple spaces with a single space and then use the split method, or split directly on one or more whitespace characters:
String spliter="\\s+";
String[] temp;
temp=mystring.split(spliter);
Here is another way to tokenize your string if you don't want to use the split method:
public static void main(String args[])
{
    String str="This is different type of file.Can not split it using ' '(white space)";
    // consecutive delimiter characters are skipped, so no empty tokens are produced
    StringTokenizer st = new StringTokenizer(str, " ");
    while(st.hasMoreTokens())
        System.out.println(st.nextToken());
}

Give comma separated strings as input to sql "IN" clause [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
creating comma seperated string to be given as input to sql “IN” clause
Hi,
I have to implement a multiple-select dropdown, and the selected values should be formatted as input to the "IN" clause of SQL.
I am storing the selected values in a string, but there is no delimiter between the values, so I cannot split the string. Are there any other methods for formatting the string?
Fun solution: add the values to an ArrayList or LinkedList, then call toString() and replace '[' with '(' and ']' with ')'.
If you had a collection of strings and then copied them to 1 string without delimiters you cannot separate them again. Just do not do this. If you still have problems please send more details about your task.
If you know the values of the dropdown and all values are unique and not subsets of each other, then you can use String#contains() or regular expressions to test, which values have been selected.
But it's by far easier to simply add some trivial delimiter (like the common ";") while concatenating the String that holds the selection.
Example for the contains approach
String[] legalValues = {"YES","NO","MAYBE"};
String result = getSelection(); // returns a String like "MAYBEYES"
StringBuilder inClauseBuilder = new StringBuilder();
boolean isFirst = true;
for (String legalValue : legalValues) {
    if (!result.contains(legalValue)) {
        continue;
    }
    if (isFirst) {
        isFirst = false;
    } else {
        inClauseBuilder.append(",");
    }
    // SQL string literals are delimited by single quotes
    inClauseBuilder.append("'").append(legalValue).append("'");
}
String inClause = inClauseBuilder.toString();
Note - this approach will fail as soon as you have legal values like
String[] legalValues = {"YES","NO","MAYBE-YES", "MAYBE-NO"};
^^^ ^^ ^^^ ^^
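If the selected values can be kept in a list instead of one concatenated string, an alternative sketch is to generate only the placeholders for the IN clause and bind the values through a PreparedStatement. The class name InClause, the method placeholders, and the table and column names in the example query are all made up for illustration; the code below only builds the SQL text and does not touch a database.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class InClause {
    // Builds "(?, ?, ?)" with one placeholder per selected value.
    static String placeholders(int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("IN clause needs at least one value");
        }
        return "(" + String.join(", ", Collections.nCopies(count, "?")) + ")";
    }

    public static void main(String[] args) {
        List<String> selected = Arrays.asList("YES", "MAYBE"); // hypothetical dropdown selection
        // "answers" and "choice" are placeholder names, not from the original question.
        String sql = "SELECT * FROM answers WHERE choice IN " + placeholders(selected.size());
        System.out.println(sql); // SELECT * FROM answers WHERE choice IN (?, ?)
    }
}
```

Each value would then be bound with PreparedStatement.setString(i + 1, selected.get(i)), since JDBC parameter indexes start at 1. This avoids quoting the values by hand.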
