Copy the first N words in a string in Java

I want to select the first N words of a text string.
I have tried split() and substring() to no avail.
What I want is to select the first 3 words of the following sentence and copy them to another variable.
For example, if I have a string:
String greeting = "Hello this is just an example";
I want to get the first 3 words into the variable Z, so that
Z = "Hello this is"

String myString = "Copying first N numbers of words to a string";
// Split on whitespace: arr[0] -> "Copying", arr[1] -> "first", ...
String[] arr = myString.split("\\s+");
int N = 3; // number of words you need
String nWords = "";
// Concatenate the first N words (stop early if the string has fewer)
for (int i = 0; i < N && i < arr.length; i++) {
    if (i > 0) {
        nWords = nWords + " "; // separator only between words, no leading space
    }
    nWords = nWords + arr[i];
}
System.out.println(nWords); // Copying first N
NOTE: split() returns an array of strings computed by splitting the given string around matches of the given regular expression.
So if I write the code as follows:
String myString = "1234M567M98723651";
String[] arr = myString.split("M"); // idea: split wherever 'M' occurs
then the array will contain 1234, 567 and 98723651.
The first split value is stored in arr[0], the second in arr[1], and so on.
The later part of the code concatenates the required number of split words.
Hope you can get an idea from this!
Thank you!

public String getFirstNStrings(String str, int n) {
    String[] sArr = str.split(" ");
    String firstStrs = "";
    // Stop early if the string has fewer than n words
    for (int i = 0; i < n && i < sArr.length; i++)
        firstStrs += sArr[i] + " ";
    return firstStrs.trim();
}
Now getFirstNStrings("Hello this is just an example", 3); returns:
Hello this is

You could try something like:
String greeting = "Hello this is just an example";
int end = 0;
for (int i=0; i<3; i++) {
end = greeting.indexOf(' ', end) + 1;
}
String Z = greeting.substring(0, end - 1);
N.B. This assumes there are at least three space characters in your source string. With fewer, indexOf returns -1 and the code either returns the wrong substring or throws a StringIndexOutOfBoundsException.
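A defensive variant of the indexOf loop above, as a minimal sketch: it stops as soon as the string runs out of spaces and returns the whole string in that case. The class and method names (FirstWordsByIndex, firstWords) are made up for illustration, and like the original it assumes a single space is the only word separator.

```java
public class FirstWordsByIndex {

    // Returns the text before the n-th space, or the whole string when it
    // contains fewer than n spaces. Only ' ' counts as a separator (assumption).
    public static String firstWords(String text, int n) {
        if (n <= 0) {
            return "";
        }
        int pos = -1;
        for (int i = 0; i < n; i++) {
            pos = text.indexOf(' ', pos + 1); // search after the previous space
            if (pos == -1) {
                return text; // fewer than n spaces: nothing to cut off
            }
        }
        return text.substring(0, pos);
    }

    public static void main(String[] args) {
        System.out.println(firstWords("Hello this is just an example", 3)); // Hello this is
        System.out.println(firstWords("Hello there", 3));                   // Hello there
    }
}
```

Consecutive spaces are still counted individually here, so input with irregular spacing is better served by one of the split("\\s+") answers.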

Add this in a utility class, such as Util.java
public static String getFirstNWords(String s, int n) {
if (s == null) return null;
String [] sArr = s.split("\\s+");
if (n >= sArr.length)
return s;
String firstN = "";
for (int i=0; i<n-1; i++) {
firstN += sArr[i] + " ";
}
firstN += sArr[n-1];
return firstN;
}
Usage:
Util.getFirstNWords("This will give you the first N words", 3);
---->
"This will give"

If you use Apache Commons Lang3, you can make it a little shorter like this:
public String firstNWords(String input, int numOfWords) {
String[] tokens = input.split(" ");
tokens = ArrayUtils.subarray(tokens, 0, numOfWords);
return StringUtils.join(tokens, ' ');
}
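If adding a dependency is not desirable, roughly the same thing can be sketched with the JDK's stream API (Java 8 or later). This is only an illustration: the class name FirstWordsStream is hypothetical, and it assumes numOfWords is not negative, since Stream.limit rejects negative values.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class FirstWordsStream {

    // Joins at most numOfWords whitespace-separated words with single spaces.
    // Assumes numOfWords >= 0.
    public static String firstNWords(String input, int numOfWords) {
        return Arrays.stream(input.trim().split("\\s+"))
                .limit(numOfWords)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(firstNWords("Hello this is just an example", 3)); // Hello this is
        System.out.println(firstNWords("Hello", 3));                         // Hello
    }
}
```

Because limit simply stops when the stream is exhausted, input with fewer words than requested is returned as-is rather than causing an exception.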

Most of the answers already posted use regular expressions, which can become an overhead if we have to process a large number of strings. Even str.split(" ") is defined in terms of regular expressions. dave's answer is perhaps the most efficient, but it does not correctly handle strings that have multiple consecutive spaces, besides assuming that a regular space is the only word separator and that the input string has 3 or more words (an assumption he has already called out). If using Apache Commons is an option, then I would use the following code, as it is concise, avoids regular expressions even internally, and gracefully handles input strings that have fewer than 3 words:
/* Splits by whitespace characters. All characters after the 3rd whitespace,
* if present in the input string, go into the 4th "word", which could really
* be a concatenation of multiple words. For the example in the question, the
* 4th "word" in the result array would be "just an example". Invoking the
* utility method with max-splits specified is slightly more efficient as it
* avoids the need to look for and split by space after the first 3 words have
* been extracted
*/
String[] words = StringUtils.split(greeting, null, 4);
String Z = StringUtils.join((String[]) ArrayUtils.subarray(words, 0, 3), ' ');

Related

Split string after every n words in java and store it in an array

I have a string for example,
String s = "This is a String which needs to be split after every n words";
Suppose I have to divide this string after every 5 words of which the output should be,
Arraylist stringArr = ["This is a String which", "needs to be split after", "every n words"]
How can I do this and store the result in an array in Java?
While there isn't a built-in way for Java to do this, it's fairly easy to do using Java's standard regular-expressions.
My example below tries to be clear, rather than trying to be the "best" way.
It's based on finding groups of five "words", each followed by a space, using the regular expression ([a-zA-Z]+ ){5}, which says
• [a-zA-Z]+ find any letters, repeated (+)
• followed by a space
• (...) gather into groups
• {5} exactly 5 times
You may want things besides letters, and you may want to allow multiple spaces or any whitespace, not just spaces, so later in the example I change the regex to (\\S+\\s+){5} where \S means any non-whitespace and \s means any whitespace.
This first goes through the process in the main method, displaying output along the way that, I hope, makes it clear what's going on; then shows how the process could be made into a method.
I create a method that will split a line into groups of n words, then call it to split your string every 5 words then again but every 3 words.
Here it is:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class LineSplitterExample
{
public static void main(String[] args)
{
String s = "This is a String which needs to be split after every n words";
//Pattern p = Pattern.compile("([a-zA-Z]+ +){5}");
Pattern p = Pattern.compile("(\\S+ +){5}");
Matcher m = p.matcher(s);
int last = 0;
List<String> collected = new ArrayList<>();
while (m.find()) {
System.out.println("Group Count = " + m.groupCount());
for (int i=0; i<m.groupCount(); i++) {
final String found = m.group(i);
System.out.printf("Group %d: %s%n", i, found);
collected.add(found);
// keep track of where the last group ended
last = m.end();
System.out.println("'m.end()' is " + last);
}
}
// collect the final part of the string after the last group
String tail = s.substring(last);
System.out.println(tail);
collected.add(tail);
String[] result = collected.toArray(new String[0]);
System.out.println("result:");
for (int n=0; n<result.length; n++) {
System.out.printf("%2d: %s%n", n, result[n]);
}
// Put a little space after the output
System.out.println("\n");
// Now use the methods...
String[] byFive = splitByWords(s, 5);
displayArray(byFive);
String[] byThree = splitByWords(s, 3);
displayArray(byThree);
}
private static String[] splitByWords(final String s, final int n)
{
//final Pattern p = Pattern.compile("([a-zA-Z]+ +){"+n+"}");
final Pattern p = Pattern.compile("(\\S+\\s+){"+n+"}");
final Matcher m = p.matcher(s);
List<String> collected = new ArrayList<>();
int last = 0;
while (m.find()) {
for (int i=0; i<m.groupCount(); i++) {
collected.add(m.group(i));
last = m.end();
}
}
collected.add(s.substring(last));
return collected.toArray(new String[0]);
}
private static void displayArray(final String[] array)
{
System.out.println("Array:");
for (int i=0; i<array.length; i++) {
System.out.printf("%2d: %s%n", i, array[i]);
}
}
}
The output I got by running this is:
Group Count = 1
Group 0: This is a String which
'm.end()' is 23
Group Count = 1
Group 0: needs to be split after
'm.end()' is 47
every n words
result:
0: This is a String which
1: needs to be split after
2: every n words
Array:
0: This is a String which
1: needs to be split after
2: every n words
Array:
0: This is a
1: String which needs
2: to be split
3: after every n
4: words
You can do it with a combination of replaceAll and split
S{N} - matches N iterations of S
() - regular expression capture group
$1 - back reference to the captured group
Replace every occurrence of N words with that occurrence followed by a special delimiter (in this case ###). Then split on that delimiter.
public static String[] splitNWords(String s, int count) {
String delim = "((?:\\w+\\s+){"+count+"})";
return s.replaceAll(delim, "$1###").split("###");
}
Demo
String s = "This is a String which needs to be split after every n words";
for (int i = 1; i < 5; i++) {
String[] arr = splitNWords(s, i);
System.out.println("Splitting on " + i + " words.");
for (String st : arr) {
System.out.println(st);
}
System.out.println();
}
prints
Splitting on 1 words.
This
is
a
String
which
needs
to
be
split
after
every
n
words
Splitting on 2 words.
This is
a String
which needs
to be
split after
every n
words
Splitting on 3 words.
This is a
String which needs
to be split
after every n
words
Splitting on 4 words.
This is a String
which needs to be
split after every n
words
I don't think there is a built-in way to split every n words. You need to specify a pattern, such as a blank space. You can, for instance, split on every blank, then iterate over the resulting array and build another one with the number of words you want in each element.
Regards
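The split-then-regroup idea described above can be sketched as follows. The names WordGrouper and groupWords are invented for this example, and it assumes any run of whitespace separates words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordGrouper {

    // Splits s on whitespace and re-joins the words in groups of n.
    // The last group may contain fewer than n words.
    public static List<String> groupWords(String s, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> groups = new ArrayList<>();
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return groups; // no words at all
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start += n) {
            int end = Math.min(start + n, words.length);
            groups.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        String s = "This is a String which needs to be split after every n words";
        // [This is a String which, needs to be split after, every n words]
        System.out.println(groupWords(s, 5));
    }
}
```

Unlike the regex answers, the groups produced here carry no trailing whitespace, because the words are re-joined with a single space.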

split a string when there is a change in character without a regular expression

There is a way to split a string into repeating characters using a regex function but I want to do it without using it.
For example, given a string like "EE B", my output will be an array of strings, e.g.
{"EE", " ", "B"}
My approach is:
Given a string, I first find the number of unique characters in it so I know the size of the array. Then I convert the string to an array of characters and check whether the next character is the same as the current one. If it is, I append it to the current string; if not, I begin a new string.
My code so far:
String myinput = "EE B";
char[] cinput = new char[myinput.length()];
cinput = myinput.toCharArray(); //turn string to array of characters
int uniquecha = myinput.length();
for (int i = 0; i < cinput.length; i++) {
if (i != myinput.indexOf(cinput[i])) {
uniquecha--;
} //this should give me the number of unique characters
String[] returninput = new String[uniquecha];
Arrays.fill(returninput, "");
for (int i = 0; i < uniquecha; i++) {
returninput[i] = "" + myinput.charAt(i);
for (int j = 0; j < myinput.length - 1; j++) {
if (myinput.charAt(j) == myinput.charAt(j + 1)) {
returninput[j] += myinput.charAt(j + 1);
} else {
break;
}
}
} return returninput;
But there is something wrong with the second part, as I can't figure out why it does not begin a new string when the character changes.
Your question says that you don't want to use regex, but I see no reason for that requirement, other than that this may be homework. If you are open to using regex here, then there is a one-line solution which splits your input string on the following pattern:
(?<=\S)(?=\s)|(?<=\s)(?=\S)
This pattern uses lookarounds to split whenever what precedes is a non-whitespace character and what follows is a whitespace character, or vice versa.
String input = "EE B";
String[] parts = input.split("(?<=\\S)(?=\\s)|(?<=\\s)(?=\\S)");
System.out.println(Arrays.toString(parts));
[EE, , B]
^^ a single space character in the middle
If I understood correctly, you want to split the characters in a string so that similar-consecutive characters stay together. If that's the case, here is how I would do it:
public static ArrayList<String> splitString(String str)
{
ArrayList<String> output = new ArrayList<>();
String combo = "";
//iterates through all the characters in the input
for(char c: str.toCharArray()) {
//check if the current char is equal to the last added char
if(combo.length() > 0 && c != combo.charAt(combo.length() - 1)) {
output.add(combo);
combo = "";
}
combo += c;
}
output.add(combo); //adds the last group of characters
return output;
}
Note that instead of using an array (has a fixed size) to store the output, I used an ArrayList, which has a variable size. Also, instead of checking the next character for equality with the current one, I preferred to use the last character for that. The variable combo is used to temporarily store the characters before they go to output.
Now, here is one way to print the result following your guidelines:
public static void main(String[] args)
{
String input = "EEEE BCD DdA";
ArrayList<String> output = splitString(input);
System.out.print("[");
for(int i = 0; i < output.size(); i++) {
System.out.print("\"" + output.get(i) + "\"");
if(i != output.size()-1)
System.out.print(", ");
}
System.out.println("]");
}
The output when running the above code will be:
["EEEE", " ", "B", "C", "D", " ", "D", "d", "A"]

Get index of 3rd comma

For example, I have this string params: Blabla,1,Yooooooo,Stackoverflow,foo,chinese
I want to get the string testCaseParams up to the 3rd comma: Blabla,1,Yooooooo
and then remove it and the comma from the original string, so I get this: Stackoverflow,foo,chinese
I'm trying this code, but testCaseParams only shows the first two values (it gets the index of the 2nd comma, not the 3rd):
//Get how many parameters this test case has and group the parameters
int amountOfInputs = 3;
int index = params.indexOf(',', params.indexOf(',') + amountOfInputs);
String testCaseParams = params.substring(0,index);
params = params.replace(testCaseParams + ",","");
You can hold the index of the most recently found comma in a variable and iterate until the third comma is found:
int index = -1;
for (int i = 0; i < 3; i++) {
    index = str.indexOf(',', index + 1); // search after the previous comma
    if (index == -1) break; // fewer than three commas
}
String left = str.substring(0, index);
String right = str.substring(index + 1); // skip comma
Edit: to validate the string, check whether index == -1 after the loop, before calling substring. If so, there are fewer than 3 commas in the string.
One option would be a clever use of String#split:
String input = "Blabla,1,Yooooooo,Stackoverflow,foo,chinese";
String[] parts = input.split("(?=,)");
String output = parts[0] + parts[1] + parts[2];
System.out.println(output);
One can use split with a limit of 4.
String input = "Blabla,1,Yooooooo,Stackoverflow,foo,chinese";
String[] parts = input.split(",", 4);
if (parts.length == 4) {
String first = parts[0] + "," + parts[1] + "," + parts[2];
String second = parts[3]; // "Stackoverflow,foo,chinese"
}
You can split with this regex to get the 2 parts:
String[] parts = input.split("(?<=\\G.*,.*,.*),");
It will result in parts equal to:
{ "Blabla,1,Yooooooo", "Stackoverflow,foo,chinese" }
\\G refers to the previous match or the start of the string.
(?<=) is positive look-behind.
So it means match a comma for splitting, if it is preceded by 2 other commas since the previous match or the start of the string.
This will keep empty strings between commas.
I offer this here just as a "fun" one line solution:
public static int nthIndexOf(String str, String c, int n) {
return str.length() - str.replace(c, "").length() < n ? -1 : n == 1 ? str.indexOf(c) : c.length() + str.indexOf(c) + nthIndexOf(str.substring(str.indexOf(c) + c.length()), c, n - 1);
}
//Usage
System.out.println(nthIndexOf("Blabla,1,Yooooooo,Stackoverflow,foo,chinese", ",", 3)); //17
(It's recursive of course, so will blow up on large strings, it's relatively slow, and certainly isn't a sensible way to do this in production.)
As a more sensible one-liner using a library, you can use StringUtils.ordinalIndexOf() from Apache Commons Lang, which achieves the same thing without the recursion.
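For comparison, an iterative version without recursion or a library might look like the sketch below. NthIndex and its method are hypothetical names, not part of any library, and unlike the recursive one-liner it counts overlapping occurrences of a multi-character search string.

```java
public class NthIndex {

    // Returns the index of the n-th occurrence (1-based) of needle in str,
    // or -1 when there are fewer than n occurrences.
    // Overlapping occurrences are counted (assumption of this sketch).
    public static int nthIndexOf(String str, String needle, int n) {
        if (n < 1 || needle.isEmpty()) {
            return -1;
        }
        int index = -1;
        for (int i = 0; i < n; i++) {
            index = str.indexOf(needle, index + 1);
            if (index == -1) {
                return -1; // ran out of occurrences
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String params = "Blabla,1,Yooooooo,Stackoverflow,foo,chinese";
        int index = nthIndexOf(params, ",", 3);
        System.out.println(index);                       // 17
        System.out.println(params.substring(0, index));  // Blabla,1,Yooooooo
        System.out.println(params.substring(index + 1)); // Stackoverflow,foo,chinese
    }
}
```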

Java: Split string by number of characters but with guarantee that string will be split only after whitespace

I want to achieve something like this.
String str = "This is just a sample string";
List<String> strChunks = splitString(str,8);
and strChunks should be like:
"This is ","just a ","sample ","string"
Please note that a chunk like "sample " has only 7 characters, because with 8 characters it would be "sample s", which would break up my next word "string".
We can also assume that a word will never be longer than the second argument of the method (8 in the example), because in my use case the second argument is always fixed at 32000.
The obvious approach that I can think of is looping through the given string, breaking it after 8 chars and then searching backwards from there for the nearest whitespace, then repeating the same thing for the remaining string.
Is there a more elegant way to achieve the same? Is there a utility method already available in a standard third-party library like Guava or Apache Commons?
Splitting on "(?<=\\G.{7,}\\s)" produces the result that you need.
\\G means the end of the previous match; .{7,} means seven or more of any character; \\s means a whitespace character.
Not a standard method, but this might suit your needs
See it on http://ideone.com/2RFIZd
public static List<String> splitString(String str, int chunksize) {
char[] chars = str.toCharArray();
ArrayList<String> list = new ArrayList<String>();
StringBuilder builder = new StringBuilder();
int count = 0;
for(char character : chars) {
if(count < chunksize - 1) {
builder.append(character);
count++;
}
else {
if(character == ' ') {
builder.append(character);
list.add(builder.toString());
count = 0;
builder.setLength(0);
}
else {
builder.append(character);
count++;
}
}
}
list.add(builder.toString());
builder.setLength(0);
return list;
}
Please note, I used the human notation for string length, because that's what your sample reflects (8 = position 7 in the string). That's why the chunksize - 1 is there.
This method takes 3 milliseconds on a text the size of http://catdir.loc.gov/catdir/enhancements/fy0711/2006051179-s.html
Splitting String using method 1.
String text="This is just a sample string";
List<String> strings = new ArrayList<String>();
int index = 0;
while (index < text.length()) {
strings.add(text.substring(index, Math.min(index + 8,text.length())));
index += 8;
}
for(String s : strings){
System.out.println("["+s+"]");
}
Splitting String using Method 2
String[] s=text.split("(?<=\\G.{"+8+"})");
for (int i = 0; i < s.length; i++) {
System.out.println("["+s[i]+"]");
}
This uses a hacked reduction to get it done without much code:
String str = "This is just a sample string";
List<String> parts = new ArrayList<>();
parts.add(Arrays.stream(str.split("(?<= )"))
.reduce((a, b) -> {
if (a.length() + b.length() <= 8)
return a + b;
parts.add(a);
return b;
}).get());
See demo using edge case input (that breaks some other answers!)
This splits after each space, then either joins up parts or adds to the list depending on the length of the pair.
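The "obvious approach" from the question (take up to chunkSize characters, then back up to the last space) can also be written directly with lastIndexOf. The following is a sketch under the question's assumptions: the class name ChunkSplitter is invented, only ' ' is treated as whitespace, and a word longer than chunkSize is simply cut at the chunk boundary as a fallback.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkSplitter {

    // Splits str into chunks of at most chunkSize characters, cutting only
    // after a space whenever one is available inside the chunk.
    public static List<String> splitString(String str, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (str.length() - start > chunkSize) {
            // Last space that still fits into [start, start + chunkSize)
            int cut = str.lastIndexOf(' ', start + chunkSize - 1);
            if (cut < start) {
                cut = start + chunkSize - 1; // no space found: hard cut
            }
            chunks.add(str.substring(start, cut + 1));
            start = cut + 1;
        }
        if (start < str.length()) {
            chunks.add(str.substring(start)); // remainder fits in one chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        // [This is , just a , sample , string]
        System.out.println(splitString("This is just a sample string", 8));
    }
}
```

Each iteration does one backwards search of at most chunkSize characters, so the whole string is scanned roughly once.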

Java split string on third comma

I have a string that I need to be split into 2. I want to do this by splitting at exactly the third comma.
How do I do this?
Edit
A sample string is :
from:09/26/2011,type:all,to:09/26/2011,field1:emp_id,option1:=,text:1234
The string will keep the same format - I want everything before field in a string.
If you're simply interested in splitting the string at the index of the third comma, I'd probably do something like this:
String s = "from:09/26/2011,type:all,to:09/26/2011,field1:emp_id,option1:=,text:1234";
int i = s.indexOf(',', 1 + s.indexOf(',', 1 + s.indexOf(',')));
String firstPart = s.substring(0, i);
String secondPart = s.substring(i+1);
System.out.println(firstPart);
System.out.println(secondPart);
Output:
from:09/26/2011,type:all,to:09/26/2011
field1:emp_id,option1:=,text:1234
Related question:
How to find nth occurrence of character in a string?
A naive implementation:
public static String[] split(String s)
{
int index = 0;
for(int i = 0; i < 3; i++)
index = s.indexOf(",", index+1);
return new String[] {
s.substring(0, index),
s.substring(index+1)
};
}
This does no bounds checking and will throw all sorts of lovely exceptions if not given input as expected. Given "ABCD,EFG,HIJK,LMNOP,QRSTU" returns ["ABCD,EFG,HIJK","LMNOP,QRSTU"]
You can use this regex:
^([^,]*,[^,]*,[^,]*),(.*)$
The result is then in the two captures (1 and 2), not including the third comma.
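Applied in Java, that regex could be used roughly like this. It is a sketch only: ThirdCommaRegex and splitAtThirdComma are illustrative names, and returning null for input with fewer than three commas is a choice made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThirdCommaRegex {

    // Group 1: everything before the third comma; group 2: everything after it.
    private static final Pattern THIRD_COMMA =
            Pattern.compile("^([^,]*,[^,]*,[^,]*),(.*)$");

    // Returns {before, after}, or null when the input has fewer than three
    // commas (or contains line breaks, which '.' does not match by default).
    public static String[] splitAtThirdComma(String s) {
        Matcher m = THIRD_COMMA.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String s = "from:09/26/2011,type:all,to:09/26/2011,field1:emp_id,option1:=,text:1234";
        String[] parts = splitAtThirdComma(s);
        System.out.println(parts[0]); // from:09/26/2011,type:all,to:09/26/2011
        System.out.println(parts[1]); // field1:emp_id,option1:=,text:1234
    }
}
```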
