How do you split a string of words and retain whitespaces?
Here is the code:
String words[] = s.split(" ");
String s contains: "hello  world" (two spaces between the words)
After the code runs, words[] contains: "hello", "", "world"
Ideally, there should be no empty string in the middle; the array should keep both whitespace characters, so words[] should be: "hello", " ", " ", "world"
How do I get it to have this result?
You could use lookahead/lookbehind assertions:
String[] words = "hello world".split("((?<=\\s+)|(?=\\s+))");
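where (?<=\\s+) and (?=\\s+) are zero-width assertions (a lookbehind and a lookahead): they match the position next to whitespace without consuming it. The + is not needed inside them; (?<=\\s) and (?=\\s) match the same positions.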
If you can tolerate both white spaces together in one string, you can do
String[] words = s.split("\\b");
Then words contains ("hello", "  ", "world"), with both spaces in the middle element.
s.split("((?<= )|(?= ))"); is one way.
Technically, the regular expression is using lookahead and lookbehind. The single space after each = is the delimiter.
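To make the behaviour concrete, here is a minimal sketch of the lookaround split applied to the string from the question. The class and method names (KeepSpaces, splitKeepingSpaces) are made up for illustration, and only the plain space character is treated as a delimiter.

```java
import java.util.Arrays;

public class KeepSpaces {
    // Splits at every position that has a space directly before or after it,
    // so each space ends up as its own array element and nothing is dropped.
    static String[] splitKeepingSpaces(String s) {
        return s.split("(?<= )|(?= )");
    }

    public static void main(String[] args) {
        // Two spaces between the words, as in the question.
        String[] words = splitKeepingSpaces("hello  world");
        // Four elements: "hello", " ", " ", "world"
        System.out.println(words.length);
        System.out.println(Arrays.toString(words));
    }
}
```

Because the pattern only matches positions, joining the elements back together reproduces the original string exactly.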
You could do something like this:
List<String> result = new LinkedList<>();
int rangeStart = 0;
for (int i = 0; i < s.length(); ++i) {
    if (Character.isWhitespace(s.charAt(i))) {
        if (rangeStart < i) {
            result.add(s.substring(rangeStart, i));
        }
        result.add(Character.toString(s.charAt(i)));
        rangeStart = i + 1;
    }
}
if (rangeStart < s.length()) {
    result.add(s.substring(rangeStart));
}
Yeah, no regexes, sue me. This way you can see how it works more easily.
How do I swap each pair of adjacent words in a sentence in Java? For example:
Input: "hi how are you doing today jane"
Output: "how hi you are today doing jane"
what I tried:
String s = "hi how are you doing today jane";
ArrayList<String> al = new ArrayList<>();
String[] splitted = s.split("\\s+");
int n = splitted.length;
for (int i = 0; i < n; i++) {
    al.add(splitted[i]);
}
for (int i = 0; i < n - 1; i = i + 2) {
    System.out.print(al.get(i + 1) + " " + al.get(i) + " ");
}
if ((n % 2) != 0) {
    System.out.print(al.get(n - 1));
}
output I'm getting:
"how hiyou aretoday doing"
As you asked to do with only one loop and without extensive use of regex, here is another solution using Collections.swap:
String s = "hi how are you doing today jane";
List<String> splitted = new ArrayList<>(List.of(s.split("\\s+")));
for (int i = 0; i < splitted.size() - 1; i += 2)
    Collections.swap(splitted, i, i + 1);
s = String.join(" ", splitted);
System.out.println(s);
Output:
how hi you are today doing jane
Since you're using split() which takes a regex, it would seem that using regex is a valid solution, so use it:
replaceAll("(\\w+)(\\W+)(\\w+)", "$3$2$1")
Explanation
(\\w+) Match first word, and capture it as group 1
(\\W+) Match the characters between the 2 words, and capture them as group 2
(\\w+) Match second word, and capture it as group 3
$3$2$1 Replace the above with the 3 groups in reverse order
Example
System.out.println("hi how are you doing today jane".replaceAll("(\\w+)(\\W+)(\\w+)", "$3$2$1"));
Output
how hi you are today doing jane
Note: Since your code used split("\\s+"), your definition of a "word" is a sequence of non-whitespace characters. To use that definition of a word, change the regex to:
replaceAll("(\\S+)(\\s+)(\\S+)", "$3$2$1")
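The difference between the two definitions of a "word" shows up as soon as the input contains punctuation. The following sketch compares both patterns; the class and method names (SwapWords, swapByWordChars, swapByNonWhitespace) and the sample sentence are illustrative and not part of the original answer.

```java
public class SwapWords {
    // A "word" is a run of word characters (\w), so an apostrophe separates words.
    static String swapByWordChars(String s) {
        return s.replaceAll("(\\w+)(\\W+)(\\w+)", "$3$2$1");
    }

    // A "word" is a run of non-whitespace (\S), matching what split("\\s+") produces.
    static String swapByNonWhitespace(String s) {
        return s.replaceAll("(\\S+)(\\s+)(\\S+)", "$3$2$1");
    }

    public static void main(String[] args) {
        String s = "don't stop now";
        System.out.println(swapByWordChars(s));     // t'don now stop
        System.out.println(swapByNonWhitespace(s)); // stop don't now
    }
}
```

Since group 2 is re-emitted unchanged, both variants keep the original separators (double spaces, tabs) between the swapped words, which the split-and-join solutions do not.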
If you want an old-school indexed for loop with a buffer/temp variable, here you are:
public static void main(String[] args) {
    String s = "hi how are you doing today jane";
    String flip = flip(s);
    System.out.println(flip);
}

private static String flip(String sentence) {
    // Arrays.asList returns a fixed-size list, but set() is still allowed.
    List<String> words = Arrays.asList(sentence.split("\\s+"));
    // Swap each pair (i, i + 1); a trailing unpaired word stays in place.
    for (int i = 0; i + 1 < words.size(); i += 2) {
        String tmp = words.get(i + 1);
        words.set(i + 1, words.get(i));
        words.set(i, tmp);
    }
    return String.join(" ", words);
}
However, Paul's solution is way better, since it is idiomatic Java and we are no longer in the stone age :)
I want to select the first N words of a text string.
I have tried split() and substring() to no avail.
What I want is to select the first 3 words of the following sentence and copy them to another variable.
For example, if I have a string:
String greeting = "Hello this is just an example";
I want to put the first 3 words into the variable Z, so that
Z = "Hello this is"
String myString = "Copying first N numbers of words to a string";
// Split into words: arr[0] -> "Copying", arr[1] -> "first", ...
String[] arr = myString.split("\\s+");
int N = 3; // number of words that you need
// Concatenate the first N words (or all of them, if there are fewer than N)
StringBuilder nWords = new StringBuilder();
for (int i = 0; i < N && i < arr.length; i++) {
    if (i > 0) {
        nWords.append(' '); // separator only between words, no leading space
    }
    nWords.append(arr[i]);
}
System.out.println(nWords); // Copying first N
NOTE: split() returns an array of strings computed by splitting the given string around matches of the given regular expression.
So if I write the code as follows:
String myString = "1234M567M98723651";
String[] arr = myString.split("M"); // idea: split the string wherever 'M' occurs
then the results 1234, 567 and 98723651 are stored in the array.
The first split value is stored in arr[0], the second in arr[1], and so on.
The latter part of the code concatenates the required number of split words.
I hope this gives you the idea!
Thank you!
public String getFirstNStrings(String str, int n) {
    String[] sArr = str.split(" ");
    String firstStrs = "";
    for (int i = 0; i < n; i++)
        firstStrs += sArr[i] + " ";
    return firstStrs.trim();
}
Now getFirstNStrings("Hello this is just an example", 3); returns:
Hello this is
You could try something like:
String greeting = "Hello this is just an example";
int end = 0;
for (int i = 0; i < 3; i++) {
    end = greeting.indexOf(' ', end) + 1;
}
String Z = greeting.substring(0, end - 1);
N.B. This assumes there are at least three space characters in your source string. With fewer, indexOf returns -1 and the code either throws a StringIndexOutOfBoundsException or silently returns the wrong prefix.
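The same indexOf idea can be made safe for short inputs by checking for -1 inside the loop. This is only a sketch: the class and method names (FirstWords, firstNWords) are hypothetical, and it still assumes that words are separated by single space characters and that the string does not start with a space.

```java
public class FirstWords {
    // Returns the first n space-separated words of s,
    // or s itself if it contains n or fewer words.
    static String firstNWords(String s, int n) {
        if (n <= 0) {
            return "";
        }
        int end = -1;
        for (int i = 0; i < n; i++) {
            // Look for the next space after the previous one.
            end = s.indexOf(' ', end + 1);
            if (end < 0) {
                return s; // ran out of spaces: nothing to cut off
            }
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(firstNWords("Hello this is just an example", 3)); // Hello this is
        System.out.println(firstNWords("Hello this", 3));                    // Hello this
    }
}
```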
Add this in a utility class, such as Util.java
public static String getFirstNWords(String s, int n) {
    if (s == null) return null;
    if (n <= 0) return ""; // avoids indexing sArr[-1] below
    String[] sArr = s.split("\\s+");
    if (n >= sArr.length)
        return s;
    String firstN = "";
    for (int i = 0; i < n - 1; i++) {
        firstN += sArr[i] + " ";
    }
    firstN += sArr[n - 1];
    return firstN;
}
Usage:
Util.getFirstNWords("This will give you the first N words", 3);
---->
"This will give"
If you use Apache Commons Lang3, you can make it a little shorter like this:
public String firstNWords(String input, int numOfWords) {
    String[] tokens = input.split(" ");
    tokens = ArrayUtils.subarray(tokens, 0, numOfWords);
    return StringUtils.join(tokens, ' ');
}
Most of the answers posted so far use regular expressions, which can become an overhead if you have to process a large number of strings. Even str.split(" ") goes through the regex-based split API (although OpenJDK short-circuits simple single-character delimiters like this one). dave's answer is perhaps the most efficient, but it does not correctly handle strings with multiple consecutive spaces, it assumes that a regular space is the only word separator, and it assumes that the input string has 3 or more words (an assumption he has already called out). If using Apache Commons is an option, I would use the following code: it is concise, avoids regular expressions even internally, and gracefully handles input strings that have fewer than 3 words:
/* Splits by whitespace characters. All characters after the 3rd whitespace,
* if present in the input string, go into the 4th "word", which could really
* be a concatenation of multiple words. For the example in the question, the
* 4th "word" in the result array would be "just an example". Invoking the
* utility method with max-splits specified is slightly more efficient as it
* avoids the need to look for and split by space after the first 3 words have
* been extracted
*/
String[] words = StringUtils.split(greeting, null, 4);
String Z = StringUtils.join((String[]) ArrayUtils.subarray(words, 0, 3), ' ');
I have a large string that contains sequences of digits, and I have to insert a character in front of each number. For example, my string is:
String s= "Microsoft Ventures' start up \"98756\" accelerator wrong launched in apple in \"2012\" has been one of the most \"4241\" prestigious such programs in the country.";
I am looking for a way in Java to add a character in front of each number, so the modified string should look like this:
String modified= "Microsoft Ventures' start up \"x98756\" accelerator wrong launched in apple in \"x2012\" has been one of the most \"x4241\" prestigious such programs in the country.";
How do I do that in Java?
The regex that finds a quoted number is "\"[0-9]+\"". My approach is to loop through the original string word by word and, if a word matches the pattern, replace it.
String[] tokens = s.split(" ");
String modified = "";
for (int i = 0; i < tokens.length; i++) {
    // the token is a quoted run of digits, e.g. "98756"
    if (Pattern.matches("\"[0-9]+\"", tokens[i])) {
        // insert the x after the opening quote: "98756" -> "x98756"
        tokens[i] = "\"x" + tokens[i].substring(1);
    }
    modified = modified + tokens[i] + " ";
}
modified = modified.trim(); // drop the trailing space added by the loop
The code is only meant to give you the idea; please optimize it yourself (using a StringBuilder to concatenate the strings, and so on).
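As an alternative sketch (not the answerer's code), the same edit can be written as a single replaceAll with a capturing group, which avoids both the token loop and the repeated string concatenation. It assumes every number to be prefixed is wrapped in double quotes, as in the question; the class and method names (PrefixNumbers, prefixQuotedNumbers) are made up for illustration.

```java
import java.util.regex.Matcher;

public class PrefixNumbers {
    // Inserts prefix right after the opening quote of every quoted run of digits.
    static String prefixQuotedNumbers(String s, String prefix) {
        // Group 1 captures the digits; the surrounding quotes are re-emitted literally.
        // quoteReplacement keeps a prefix containing '$' or '\' from being misread.
        String replacement = "\"" + Matcher.quoteReplacement(prefix) + "$1\"";
        return s.replaceAll("\"(\\d+)\"", replacement);
    }

    public static void main(String[] args) {
        String s = "start up \"98756\" accelerator launched in \"2012\" with 3 teams";
        System.out.println(prefixQuotedNumbers(s, "x"));
        // start up "x98756" accelerator launched in "x2012" with 3 teams
    }
}
```

Unquoted numbers (the 3 in the sample) are left alone, because the pattern requires the quotes on both sides.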
The best way I can see to do this is to walk through the string, copy it piece by piece, and insert the character whenever a run of digits starts. Something like the following:
String s = "foo \"67\" blah \"89\"";
StringBuilder modified = new StringBuilder();
int i = 0;
while (i < s.length()) {
    char c = s.charAt(i);
    if (Character.isDigit(c)) {
        // Start of a digit run: emit the prefix, then copy the whole run.
        modified.append('x');
        while (i < s.length() && Character.isDigit(s.charAt(i))) {
            modified.append(s.charAt(i));
            ++i;
        }
    } else {
        // Any other character is copied unchanged.
        modified.append(c);
        ++i;
    }
}
// modified.toString() is now: foo "x67" blah "x89"
I want to split a string so that I get the leading alphabetic part (everything up to the first numeric digit) and the remaining alphanumeric part.
E.g.:
I have a string, for example: Nesc123abc456
I want to get the following two strings by splitting the above string: Nesc, 123abc456
What I have tried:
String s = "Abc1234avc";
String[] ss = s.split("(\\D)", 2);
System.out.println(Arrays.toString(ss));
But this just removes the first letter from the string.
You could maybe use lookarounds so that you don't consume the delimiting part:
String s = "Abc1234avc";
String[] ss = s.split("(?<=\\D)(?=\\d)", 2);
System.out.println(Arrays.toString(ss));
(?<=\\D) makes sure there's a non-digit before the part to be split at,
(?=\\d) makes sure there's a digit after the part to be split at.
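A short sketch of how the two lookarounds and the limit argument interact, using the string from the question. The class and method names (SplitAtFirstDigit, splitAtFirstDigit) are illustrative only.

```java
import java.util.Arrays;

public class SplitAtFirstDigit {
    // Splits at the first position that has a non-digit before it and a digit after it.
    // The limit of 2 stops after the first such position.
    static String[] splitAtFirstDigit(String s) {
        return s.split("(?<=\\D)(?=\\d)", 2);
    }

    public static void main(String[] args) {
        // With the limit: two parts.
        System.out.println(Arrays.toString(splitAtFirstDigit("Nesc123abc456")));
        // [Nesc, 123abc456]

        // Without the limit, every letter-to-digit boundary is a split point.
        System.out.println(Arrays.toString("Nesc123abc456".split("(?<=\\D)(?=\\d)")));
        // [Nesc, 123abc, 456]
    }
}
```

If the string contains no letter-to-digit boundary (for example it starts with a digit and has no later one, or has no digits at all), the result is a one-element array holding the whole string.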
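You need a quantifier.
Try
String[] ss = s.split("(\\D)*", 2);
Note that the leading letters are still consumed as the delimiter, so for "Abc1234avc" this yields an empty first element followed by "1234avc".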
More information here: http://docs.oracle.com/javase/tutorial/essential/regex/quant.html
Didn't you try replaceAll?
String s = ...;
String firstPart = s.replaceAll("[0-9].*", "");
String secondPart = s.substring(firstPart.length());
You can use:
String[] arr = "Nesc123abc456".split("(?<=[a-zA-Z])(?![a-zA-Z])", 2);
//=> [Nesc, 123abc456]
split consumes the delimiter, so you would need to find the index of the first numeric digit and use substring to get your result. This would probably also be faster than using a regex, since the regex engine does a lot more work behind the scenes.
int split = string.length();
for (int i = 0; i < string.length(); i++) {
    if (Character.isDigit(string.charAt(i))) {
        split = i;
        break;
    }
}
String[] parts = new String[2];
parts[0] = string.substring(0, split);
parts[1] = string.substring(split);
I think this is what you asked:
String s = "Abc1234avc";
String numbers = "";
String chars = "";
for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    if (Character.isDigit(c)) {
        numbers += c + "";
    } else {
        chars += c + "";
    }
}
System.out.println("Numbers: " + numbers + "; Chars: " + chars);
I have a string that I need to be split into 2. I want to do this by splitting at exactly the third comma.
How do I do this?
Edit
A sample string is :
from:09/26/2011,type:all,to:09/26/2011,field1:emp_id,option1:=,text:1234
The string will keep the same format - I want everything before field in a string.
If you're simply interested in splitting the string at the index of the third comma, I'd probably do something like this:
String s = "from:09/26/2011,type:all,to:09/26/2011,field1:emp_id,option1:=,text:1234";
int i = s.indexOf(',', 1 + s.indexOf(',', 1 + s.indexOf(',')));
String firstPart = s.substring(0, i);
String secondPart = s.substring(i+1);
System.out.println(firstPart);
System.out.println(secondPart);
Output:
from:09/26/2011,type:all,to:09/26/2011
field1:emp_id,option1:=,text:1234
Related question:
How to find nth occurrence of character in a string?
A naive implementation:
public static String[] split(String s)
{
    // Start at -1 so that a comma at position 0 is also counted.
    int index = -1;
    for (int i = 0; i < 3; i++)
        index = s.indexOf(",", index + 1);
    return new String[] {
        s.substring(0, index),
        s.substring(index + 1)
    };
}
This does no bounds checking and will throw all sorts of lovely exceptions if not given input as expected. Given "ABCD,EFG,HIJK,LMNOP,QRSTU", it returns ["ABCD,EFG,HIJK","LMNOP,QRSTU"]
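For comparison, here is a sketch of the same loop with the missing bounds check added. The helper names (SplitAtNth, nthIndexOf, splitAtNth) are hypothetical, and returning the input unchanged when there are too few delimiters is just one possible policy.

```java
import java.util.Arrays;

public class SplitAtNth {
    // Index of the n-th occurrence (1-based) of c in s, or -1 if there are fewer than n.
    static int nthIndexOf(String s, char c, int n) {
        int index = -1;
        for (int i = 0; i < n; i++) {
            index = s.indexOf(c, index + 1);
            if (index < 0) {
                return -1; // stop early instead of wrapping around to the start
            }
        }
        return index;
    }

    // Splits s at the n-th occurrence of c, dropping that character.
    // If there is no such occurrence, returns a one-element array holding s.
    static String[] splitAtNth(String s, char c, int n) {
        int index = nthIndexOf(s, c, n);
        if (index < 0) {
            return new String[] { s };
        }
        return new String[] { s.substring(0, index), s.substring(index + 1) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAtNth("ABCD,EFG,HIJK,LMNOP,QRSTU", ',', 3)));
        // [ABCD,EFG,HIJK, LMNOP,QRSTU]
        System.out.println(Arrays.toString(splitAtNth("ABCD,EFG", ',', 3)));
        // [ABCD,EFG]
    }
}
```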
You can use this regex:
^([^,]*,[^,]*,[^,]*),(.*)$
The result is then in the two captures (1 and 2), not including the third comma.
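In Java, the regex above can be applied with Pattern and Matcher. This is a minimal sketch: the class and method names (ThirdCommaRegex, split) are illustrative, the input is assumed to be a single line (the dot does not match line terminators by default), and returning the input unchanged on a failed match is an arbitrary choice.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThirdCommaRegex {
    // Group 1: everything up to (not including) the third comma. Group 2: everything after it.
    private static final Pattern THIRD_COMMA =
            Pattern.compile("^([^,]*,[^,]*,[^,]*),(.*)$");

    static String[] split(String s) {
        Matcher m = THIRD_COMMA.matcher(s);
        if (!m.matches()) {
            return new String[] { s }; // fewer than three commas
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String s = "from:09/26/2011,type:all,to:09/26/2011,field1:emp_id,option1:=,text:1234";
        System.out.println(Arrays.toString(split(s)));
    }
}
```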