Splitting string on multiple spaces in java [duplicate] - java

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to split a String by space
I need help parsing a text file.
The text file contains data like:
This is different type of file.
Can not split it using ' '(white space)
My problem is that the spacing between words is not consistent: sometimes there is a single space and sometimes there are multiple spaces.
I need to split the string so that I get only the words, not the spaces.

str.split("\\s+") would work. The + at the end of the regular expression treats a run of several whitespace characters the same as a single one. It returns an array of strings (String[]) without any " " elements.
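A minimal sketch of the split("\\s+") approach. One caveat the answer does not mention: if the line starts with whitespace, the first array element is an empty string, so trimming the input first is common. The class and method names are illustrative only.

```java
import java.util.Arrays;

class MultiSpaceSplit {
    // Splits on runs of one or more whitespace characters (space, tab, newline, ...).
    static String[] splitWords(String line) {
        return line.split("\\s+");
    }

    public static void main(String[] args) {
        // [This, is, different, type, of, file.]
        System.out.println(Arrays.toString(splitWords("This is   different  type of file.")));
        // A leading space yields an empty first element: [, leading, spaces]
        System.out.println(Arrays.toString(splitWords("  leading spaces")));
        // Trimming the input first avoids it: [leading, spaces]
        System.out.println(Arrays.toString(splitWords("  leading spaces".trim())));
    }
}
```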

You can use quantifiers to specify how many spaces you want to split on:
`+` - Represents 1 or more
`*` - Represents 0 or more
`?` - Represents 0 or 1
`{n,m}` - Represents n to m
So \\s+ will split your string on one or more whitespace characters:
String[] words = yourString.split("\\s+");
If you want a specific number of spaces, give the range between {}:
yourString.split("\\s{3,6}"); // Split String on 3 to 6 spaces
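To make the {n,m} range concrete, here is a small sketch (class and method names are illustrative). Gaps shorter than three characters are not separators, and a gap longer than six is not consumed as a single separator: the first six characters match and the remainder stays at the front of the next piece.

```java
import java.util.Arrays;

class RangeSplit {
    // Splits only where 3 to 6 consecutive whitespace characters occur.
    static String[] splitOnThreeToSix(String s) {
        return s.split("\\s{3,6}");
    }

    public static void main(String[] args) {
        // The single space in "a b" is too short to be a separator: [a b, c]
        System.out.println(Arrays.toString(splitOnThreeToSix("a b    c")));
        // An 8-space gap: 6 spaces form the separator, 2 stay with "y".
        String eightSpaceGap = "x" + "    " + "    " + "y";
        System.out.println(Arrays.toString(splitOnThreeToSix(eightSpaceGap)));
    }
}
```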

Use a regular expression.
String[] words = str.split("\\s+");

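You can use a regex pattern:
public class SplitExample {
    public static void main(String[] args) {
        String s = "This is different type of file.";
        // "[ ]+" matches one or more space characters (tabs are not included)
        String[] s1 = s.split("[ ]+");
        for (int i = 0; i < s1.length; i++) {
            System.out.println(s1[i]);
        }
    }
}
Output: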
This
is
different
type
of
file.

You can use the replaceAll(String regex, String replacement) method of the String class to replace each run of multiple spaces with a single space, and then use the split method.
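A sketch of that two-step approach (the class and method names are illustrative). Note that " +" matches only the space character, so tabs are not collapsed; split("\\s+") does both steps at once and also covers tabs.

```java
import java.util.Arrays;

class CollapseThenSplit {
    static String[] collapseAndSplit(String s) {
        // Step 1: collapse each run of spaces into a single space.
        String collapsed = s.replaceAll(" +", " ");
        // Step 2: split on that single space.
        return collapsed.split(" ");
    }

    public static void main(String[] args) {
        // [This, is, different, type]
        System.out.println(Arrays.toString(collapseAndSplit("This is   different  type")));
    }
}
```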

String spliter="\\s+";
String[] temp;
temp=mystring.split(spliter);

Here is another way to tokenize your string if you don't want to use the split method:
import java.util.StringTokenizer;

public class TokenizeExample {
    public static void main(String[] args) {
        String str = "This is different type of file.Can not split it using ' '(white space)";
        // Each space is a delimiter; StringTokenizer skips empty tokens,
        // so runs of spaces never produce empty strings
        StringTokenizer st = new StringTokenizer(str, " ");
        while (st.hasMoreTokens())
            System.out.println(st.nextToken());
    }
}
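One difference from split("\\s+") worth knowing: StringTokenizer never returns empty tokens, so even leading spaces do not produce an empty first element. A small sketch collecting the tokens into a list (names are illustrative); keep in mind that the StringTokenizer Javadoc discourages its use in new code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerWords {
    // Every space is a delimiter, but empty tokens are never returned,
    // so runs of spaces (including leading ones) simply disappear.
    static List<String> words(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, " ");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // [This, is, different, type]
        System.out.println(words("   This is   different  type"));
    }
}
```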

Related

How to split a string if my string contains ^ symbol in java. I want to create an array out of it [duplicate]

This question already has answers here:
How do I split a string in Java?
(39 answers)
Split string with dot as delimiter
(13 answers)
Closed 2 years ago.
I have a string that contains the ^ symbol:
String tempName = "afds^afcu^e200f.pdf";
I want to split it like this:
[afds, afcu, e200f]
How can I do this?
The parameter to split() is a regular expression, which has special meta-characters. If the delimiter you're splitting on contains those special characters (e.g. ^), you have two options:
Escape the characters using \, which has to be doubled in a Java string literal to \\:
String[] result = tempName.split("\\^");
If you don't want to bother with that, or if the delimiter is dynamically assigned at runtime, so you can't escape the special characters yourself, call Pattern.quote() to do it for you:
String[] result = tempName.split(Pattern.quote("^"));
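A sketch of the Pattern.quote() route for a delimiter chosen at runtime (the class and method names are illustrative). It also shows what goes wrong without quoting: an unescaped "." matches every character, every piece is empty, and split() drops all trailing empty strings, leaving an empty array. Splitting on ^ alone keeps the ".pdf" extension on the last piece.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteSplit {
    // Treats the delimiter literally, even if it contains regex
    // metacharacters such as ^, |, . or $.
    static String[] splitLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // [afds, afcu, e200f.pdf]
        System.out.println(Arrays.toString(splitLiteral("afds^afcu^e200f.pdf", "^")));
        // Unquoted "." matches every character: prints 0
        System.out.println("a.b.c".split(".").length);
        // Quoted "." is a literal dot: [a, b, c]
        System.out.println(Arrays.toString(splitLiteral("a.b.c", ".")));
    }
}
```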
You need to add \\ in the split call to split the string on ^, because ^ is a special character in regular expressions and you need to escape it with \\:
String tempName = "afds^afcu^e200f.pdf";
String [] result = tempName.split("\\^");
System.out.println(Arrays.toString(result));
The characters that have to be escaped in Java regular expressions are:
.[]{}()<>*+-=!?^$|\
Two of the closing brackets (] and }) only need to be escaped after an opening bracket of the same type.
Inside []-brackets, some characters (like + and -) sometimes work without escaping.
String.split() in Java takes a regular expression. Since ^ is a control character in regex (when at the beginning of the regex string it means "the start of the line"), we need to escape it with a backslash. Since backslash is a control character in Java string literals, we also need to escape that with another backslash.
String tempName = "afds^afcu^e200f.pdf";
String[] parts = tempName.split("\\^");
You can retrieve the substring without the file extension and split that on the required delimiter (^), as shown below:
public static void main(String[] args) {
String tempName = "afds^afcu^e200f.pdf";
String withoutFileFormat = tempName.substring(0, tempName.length() - 4); //retrieve the string without the file format
String[] splitArray = withoutFileFormat.split("\\^"); //split it using the "^", use escape characters
System.out.println(Arrays.toString(splitArray)); //output the result
}
Required Output:
[afds, afcu, e200f]
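The length() - 4 above assumes a three-letter extension such as ".pdf". A sketch that instead cuts at the last "." works for extensions of any length; the class and method names are illustrative.

```java
import java.util.Arrays;

class SplitWithoutExtension {
    // Drops everything from the last '.' onward (if there is one), then splits on '^'.
    static String[] splitName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = (dot >= 0) ? fileName.substring(0, dot) : fileName;
        return base.split("\\^");
    }

    public static void main(String[] args) {
        // Both print [afds, afcu, e200f]
        System.out.println(Arrays.toString(splitName("afds^afcu^e200f.pdf")));
        System.out.println(Arrays.toString(splitName("afds^afcu^e200f.jpeg")));
    }
}
```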

using regular expression as delimiter with StringTokenizer

I am new to Java programming and came across the StringTokenizer class. The constructor accepts the string to be split and an optional delimiter string, each character of which is treated as an individual delimiter when splitting the original string. I was wondering if there is any way to split the string by passing a regex as the delimiter. For example:
String s="34.5xy32.6y45.7x36xy";
StringTokenizer t=new StringTokenizer(s,"xy");
System.out.println(t.nextToken());
System.out.println(t.nextToken());
The actual output is:
34.5
32.6
However, the desired output is:
34.5
32.6y45.7x36
Hope you guys can help. Also, please suggest a workaround if this is not possible with the StringTokenizer class.
Thanks in advance.
P.S. Is there any way to know which character out of the provided set the StringTokenizer is currently using as the delimiter?
Here you would want to use String.split(), which will give you an array with your desired output.
It splits your input around matches of the regular expression you provide (here simply the text "xy"). StringTokenizer, by contrast, splits around any single character of the set you provide, rather than around a regular expression.
So you change your code to:
String s="34.5xy32.6y45.7x36xy";
String[] splitString = s.split("xy");
System.out.println(splitString [0]);
System.out.println(splitString [1]);
For more complex inputs you will probably also want to check the array length, so that you don't index past the end of the array.
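Regarding the P.S. in the question: StringTokenizer's three-argument constructor with returnDelims set to true returns each delimiter character as its own token. That makes the per-character behaviour visible ("xy" acts as the set {x, y}) and shows which delimiter ended each token. A sketch; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDelims {
    // With returnDelims = true, every delimiter character comes back as a token.
    static List<String> tokensWithDelims(String s, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer t = new StringTokenizer(s, delims, true);
        while (t.hasMoreTokens()) {
            out.add(t.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // [34.5, x, y, 32.6, y, 45.7, x, 36, x, y]
        System.out.println(tokensWithDelims("34.5xy32.6y45.7x36xy", "xy"));
    }
}
```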
Try this:
final String SPLIT_STR = "xy";
final String mainStr = "34.5xy32.6y45.7x36xy";
final String[] splitStr = mainStr.split(SPLIT_STR);
System.out.println("First Index Of xy : " +
mainStr.indexOf(SPLIT_STR));
for(int index=0; index < splitStr.length; index++) {
System.out.println("Split : " + splitStr[index]);
}

Python split semantics in Java

When I split a string in Python, adjacent space delimiters are merged:
>>> str = "hi there"
>>> str.split()
['hi', 'there']
In Java, the delimiters are not merged:
$ cat Split.java
class Split {
public static void main(String args[]) {
String str = "hi              there";
String result = "";
for (String tok : str.split(" "))
result += tok + ",";
System.out.println(result);
}
}
$ javac Split.java ; java Split
hi,,,,,,,,,,,,,,there,
Is there a straightforward way to get python space split semantics in java?
String.split accepts a regular expression, so provide it with one that matches adjacent whitespace:
str.split("\\s+")
If you want to emulate the exact behaviour of Python's str.split(), you'd need to trim as well:
str.trim().split("\\s+")
Quote from the Python docs on str.split():
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
So the above is still not an exact equivalent, because it will return [''] for the empty string, but it's probably okay for your purposes :)
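The empty-string difference is easy to check. In this sketch (names are illustrative), an empty or all-whitespace input trims to "", and "".split("\\s+") finds no match and returns a one-element array holding the empty string, whereas Python returns [].

```java
class TrimSplitEdgeCase {
    // The trim-then-split emulation of Python's str.split().
    static String[] trimSplit(String s) {
        return s.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(trimSplit("").length);              // 1
        System.out.println(trimSplit("   ").length);           // 1
        System.out.println(trimSplit("  hi   there ").length); // 2
    }
}
```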
Use str.split("\\s+") instead. This will do what you need.
Java uses a regex to split,
so splitting on a single space gives you an empty array element for every extra space.
When no argument is passed, Python's split() strips leading and trailing whitespace and then treats each run of whitespace as a single separator.
So a closer Java equivalent would be:
"my string".trim().split("\\s+");
The problem with Niklas B.'s answer is that trim has its own definition of whitespace, i.e., anything with code up to '\u0020'. The following should get close enough to the Python version, including the fix for the empty string:
class TestSplit {
private static final String[] EMPTY = {};
private static String[] pySplit(String s) {
s = s.replaceAll("^\\s+", "").replaceAll("\\s+$", "");
if (s.isEmpty()) return EMPTY;
return s.split("\\s+");
}
}
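In Java, String.split takes a regex, so you can do str.split(" +") to merge runs of spaces, which is close to Python's semantics (though " +" matches only the space character, and leading spaces still produce an empty first element).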

Java parsing a string with lots of whitespace

I have a string with multiple spaces, but when I use the tokenizer it breaks it apart at all of those spaces. I need the tokens to contain those spaces. How can I utilize the StringTokenizer to return the values with the tokens I am splitting on?
You'll note in the docs for StringTokenizer that its use is discouraged in new code, and that String.split(regex) is recommended instead:
String foo = "this is      some  data      in   a string";
String[] bar = foo.split("\\s+");
Edit to add: Or, if you have greater needs than a simple split, then use the Pattern and Matcher classes for more complex regular expression matching and extracting.
Edit again: If you want to preserve your spaces, knowing a bit about regular expressions really helps:
String[] bar = foo.split("\\b");
This will split on word boundaries, preserving the whitespace between each pair of words as its own String:
public static void main( String[] args )
{
String foo = "this is      some  data      in   a string";
String[] bar = foo.split("\\b");
for (String s : bar)
{
System.out.print(s);
if (s.matches("^\\s+$"))
{
System.out.println("\t<< " + s.length() + " spaces");
}
else
{
System.out.println();
}
}
}
Output:
this
<< 1 spaces
is
<< 6 spaces
some
<< 2 spaces
data
<< 6 spaces
in
<< 3 spaces
a
<< 1 spaces
string
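For the Pattern and Matcher route mentioned above, one possible sketch matches each word together with the whitespace that follows it, so the spacing stays attached to the tokens. The regex "\\S+\\s*" and the names are choices made for this illustration; leading whitespace at the very start of the string is not part of any match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordsWithSpaces {
    // A run of non-whitespace followed by any whitespace after it.
    private static final Pattern WORD_AND_GAP = Pattern.compile("\\S+\\s*");

    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD_AND_GAP.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Brackets make the trailing spaces visible: [this ] [is   ] [some  ] [data]
        for (String tok : tokens("this is   some  data")) {
            System.out.print("[" + tok + "] ");
        }
        System.out.println();
    }
}
```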
Sounds like you may need to use regular expressions (http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/package-summary.html) instead of StringTokenizer.
Use String.split("\\s+") instead of StringTokenizer.
Note that this will only extract the non-whitespace characters separated by at least one whitespace character, if you want leading/trailing whitespace characters included with the non-whitespace characters that will be a completely different solution!
This requirement isn't clear from your original question, and there is an edit pending that tries to clarify it.
StringTokenizer in almost every non-contrived case is the wrong tool for the job.
I think it would be good to first use the replaceAll function to replace every run of multiple spaces with a single space, and then tokenize using the split function.

Splitting strings based on a delimiter

I am trying to break apart a very simple collection of strings that come in the form of
0|0
10|15
30|55
and so on. Essentially, numbers separated by pipes.
When I use Java's String split function with .split("|"), I get somewhat unpredictable results: sometimes there is white space in the first slot, and sometimes the number itself isn't where I thought it should be.
Can anybody please help and advise me on how I can use a regex to keep ONLY the integers?
I was asked to give the code trying to do the actual split. So allow me to do that in hopes to clarify further my problem :)
String temp = "0|0";
String[] splitString = temp.split("|");
results
\n
0
|
0
I am trying to get
0
0
only. Forever grateful for any help ahead of time :)
I still suggest using split(); by default it drops trailing empty strings. If you first get rid of the non-numeric characters in the string and keep only pipes and numbers, you can easily use split() to get what you want. Or you can pass multiple delimiters to split() (in the form of a regex), and this should work:
String[] splited = yourString.split("[\\|\\s]+");
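Or, with a regex:
import java.util.regex.*;
// Match each run of digits. (A lookahead requiring a following delimiter would miss a number at the very end of the string.)
Pattern pattern = Pattern.compile("\\d+");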
Matcher matcher = pattern.matcher(yourString);
while (matcher.find()) {
System.out.println(matcher.group());
}
The pipe symbol is special in a regexp (it marks alternatives), so you need to escape it. Depending on the Java version you are using, this could well explain your unpredictable results.
class t {
public static void main(String[] args)
{
String temp = "0|0";
String[] splitString = temp.split("\\|");
for (int i=0; i<splitString.length; i++)
System.out.println("splitString["+i+"] is " + splitString[i]);
}
}
outputs
splitString[0] is 0
splitString[1] is 0
Note that one backslash is the regexp escape character, but because a backslash is also the escape character in java source you need two of them to push the backslash into the regexp.
You can replace the white space with pipes and then split:
String test = "0|0 10|15 30|55";
test = test.replace(" ", "|");
String[] result = test.split("\\|"); // the pipe must be escaped, because split() takes a regex
Hope this helps.
You can use StringTokenizer.
String test = "0|0";
StringTokenizer st = new StringTokenizer(test, "|"); // use '|' as the delimiter (the default delimiters are whitespace)
int firstNumber = Integer.parseInt(st.nextToken()); //will parse out the first number
int secondNumber = Integer.parseInt(st.nextToken()); //will parse out the second number
Of course, you can always nest this inside a while loop if you have multiple strings.
Also, you need to import java.util.StringTokenizer (or java.util.*) for this to work.
The pipe ('|') is a special character in regular expressions. It needs to be "escaped" with a '\' character if you want to use it as a regular character, unfortunately '\' is a special character in Java so you need to do a kind of double escape maneuver e.g.
String temp = "0|0";
String[] splitStrings = temp.split("\\|");
The Guava library has a nice class, Splitter, which is a much more convenient alternative to String.split(). The advantages are that you can choose to split the string on specific characters (like '|'), on specific strings, or with regexps, and you can choose what to do with the resulting parts (trim them, throw away empty parts, etc.).
For example, you can call
Iterable<String> parts = Splitter.on('|').trimResults().omitEmptyStrings().split("0|0");
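If Guava is not on the classpath, a rough JDK-only approximation of trimResults().omitEmptyStrings() can be built with streams. This is a sketch under that assumption, not Guava's implementation; the names are illustrative, and it uses String.trim(), whose notion of whitespace may differ slightly from Guava's.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PipeSplit {
    // Split on a literal '|', trim each part, and drop parts that end up empty.
    static List<String> splitPipes(String s) {
        return Arrays.stream(s.split(Pattern.quote("|")))
                .map(String::trim)
                .filter(part -> !part.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // [0, 0]
        System.out.println(splitPipes(" 0 | 0 ||"));
    }
}
```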
This should work for you:
([0-9]+)
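The regex on its own needs a Matcher loop to pull out the integers. A sketch with illustrative names: each find() locates the next run of digits, and group(1) is the captured number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractInts {
    private static final Pattern DIGITS = Pattern.compile("([0-9]+)");

    // Collects every run of digits as an Integer, ignoring pipes and spaces.
    static List<Integer> extract(String s) {
        List<Integer> out = new ArrayList<>();
        Matcher m = DIGITS.matcher(s);
        while (m.find()) {
            out.add(Integer.parseInt(m.group(1)));
        }
        return out;
    }

    public static void main(String[] args) {
        // [0, 0, 10, 15, 30, 55]
        System.out.println(extract("0|0 10|15 30|55"));
    }
}
```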
Consider a scenario in which we have read a line from a CSV or XLS file as a string and need to separate the columns into an array of strings based on a delimiter.
Below is a code snippet that does this:
{ ...
....
BufferedReader reader = new BufferedReader(new FileReader("your file"));
String line = reader.readLine(); // read one line of the file
String[] splittedString = StringSplitToArray(line, "\"");
...
....
}
// requires java.util.ArrayList and java.util.List
public static String[] StringSplitToArray(String stringToSplit, String delimiter)
{
    // assumes a single-character delimiter such as "," or "\""
    char delim = delimiter.charAt(0);
    StringBuilder token = new StringBuilder();
    List<String> tokens = new ArrayList<>();
    for (char c : stringToSplit.toCharArray()) {
        if (c == delim) {
            // end of a token: keep it only if it is non-empty
            if (token.length() > 0) {
                tokens.add(token.toString());
                token.setLength(0);
            }
        } else {
            token.append(c);
        }
    }
    if (token.length() > 0) {
        tokens.add(token.toString());
    }
    // convert the list into an array
    return tokens.toArray(new String[0]);
}
The snippet above calls StringSplitToArray, which converts the line into a string array by splitting it on the delimiter passed to the method. The delimiter can be a comma (,) or a double quote (").
For more on this, follow this link : http://scrapillars.blogspot.in
