String original = "This is a sentence.Rajesh want to test the application for the word split.";
List matchList = new ArrayList();
Pattern regex = Pattern.compile(".{1,10}(?:\\s|$)", Pattern.DOTALL);
Matcher regexMatcher = regex.matcher(original);
while (regexMatcher.find()) {
matchList.add(regexMatcher.group());
}
System.out.println("Match List "+matchList);
I need to parse text into an array of lines that do not exceed 10 characters in length, and no word should be broken at the end of a line.
I used the logic shown here, but when a word would be broken at the end of a line, the match skips ahead to the nearest whitespace instead and characters are lost.
For example, the actual sentence is "This is a sentence.Rajesh want to test the application for the word split." but after the logic executes I get the output below.
Match List [This is a , nce.Rajesh , want to , test the , pplication , for the , word , split.]
OK, so I've managed to get the following working, with max line length of 10, but also splitting the words that are longer than 10 correctly!
String original = "This is a sentence. Rajesh want to test the applications for the word split handling.";
List matchList = new ArrayList();
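Pattern regex = Pattern.compile("(.{1,10}(?:\\s|$))|(.{1,10})", Pattern.DOTALL);
Matcher regexMatcher = regex.matcher(original);
while (regexMatcher.find()) {
// trim() drops the whitespace that ended the line
matchList.add(regexMatcher.group().trim());
}
System.out.println("Match List "+matchList);
This is the result, one match per line (the fallback is .{1,10} rather than .{0,10} so that no empty match is added at the end of the input):
This is a
sentence.
Rajesh
want to
test the
applicatio
ns for the
word split
handling.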
This question was tagged as Groovy at some point. Assuming a Groovy answer is still valid and you are not worried about preserving multiple white spaces (e.g. ' '):
def splitIntoLines(text, maxLineSize) {
def words = text.split(/\s+/)
def lines = ['']
words.each { word ->
def lastLine = (lines[-1] + ' ' + word).trim()
if (lastLine.size() <= maxLineSize)
// Change last line.
lines[-1] = lastLine
else
// Add word as new line.
lines << word
}
lines
}
// Tests...
def original = "This is a sentence. Rajesh want to test the application for the word split."
assert splitIntoLines(original, 10) == [
"This is a",
"sentence.",
"Rajesh",
"want to",
"test the",
"application",
"for the",
"word",
"split."
]
assert splitIntoLines(original, 20) == [
"This is a sentence.",
"Rajesh want to test",
"the application for",
"the word split."
]
assert splitIntoLines(original, original.size()) == [original]
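For readers who need plain Java, the same greedy word-by-word idea could be sketched roughly as below. The class name `WordWrapSketch` is made up for illustration; like the Groovy version, this sketch collapses runs of whitespace and keeps words longer than the limit whole instead of breaking them.

```java
import java.util.ArrayList;
import java.util.List;

class WordWrapSketch {
    // Greedy wrap: keep appending words to the current line while they fit.
    // A word longer than maxLineSize ends up alone on its own (too long) line.
    static List<String> splitIntoLines(String text, int maxLineSize) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (current.length() == 0) {
                current.append(word);
            } else if (current.length() + 1 + word.length() <= maxLineSize) {
                current.append(' ').append(word);
            } else {
                // The word does not fit: close the current line, start a new one.
                lines.add(current.toString());
                current.setLength(0);
                current.append(word);
            }
        }
        lines.add(current.toString());
        return lines;
    }

    public static void main(String[] args) {
        String original =
            "This is a sentence. Rajesh want to test the application for the word split.";
        System.out.println(splitIntoLines(original, 10));
        System.out.println(splitIntoLines(original, 20));
    }
}
```

For the test sentence this should produce the same lines as the Groovy asserts above for limits 10 and 20.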
I avoided regex as it doesn't pull its weight here. This code word-wraps, and if a single word is longer than 10 characters, breaks it. It also takes care of excess whitespace.
import static java.lang.Character.isWhitespace;
import java.util.ArrayList;
import java.util.List;
public static void main(String[] args) {
final String original =
"This is a sentence.Rajesh want to test the application for the word split.";
final StringBuilder b = new StringBuilder(original.trim());
final List<String> matchList = new ArrayList<String>();
while (true) {
b.delete(0, indexOfFirstNonWsChar(b));
if (b.length() == 0) break;
final int splitAt = lastIndexOfWsBeforeIndex(b, 10);
matchList.add(b.substring(0, splitAt).trim());
b.delete(0, splitAt);
}
System.out.println("Match List "+matchList);
}
static int lastIndexOfWsBeforeIndex(CharSequence s, int i) {
if (s.length() <= i) return s.length();
for (int j = i; j > 0; j--) if (isWhitespace(s.charAt(j-1))) return j;
return i;
}
static int indexOfFirstNonWsChar(CharSequence s) {
for (int i = 0; i < s.length(); i++) if (!isWhitespace(s.charAt(i))) return i;
return s.length();
}
Prints:
Match List [This is a, sentence.R, ajesh, want to, test the, applicatio, n for the, word, split.]
I have a string, for example:
String s = "This is a String which needs to be split after every n words";
Suppose I have to divide this string after every 5 words, so that the output is:
ArrayList stringArr = ["This is a String which", "needs to be split after", "every n words"]
How can I do this and store the result in an array in Java?
While there isn't a built-in way for Java to do this, it's fairly easy to do using Java's standard regular expressions.
My example below tries to be clear, rather than trying to be the "best" way.
It's based on finding groups of five "words", each followed by a space, using the regular expression ([a-zA-Z]+ ){5}, which says
• [a-zA-Z]+ find any letters, repeated (+)
• followed by a space
• (...) gather into groups
• {5} exactly 5 times
You may want things besides letters, and you may want to allow multiple spaces or any whitespace, not just spaces, so later in the example I change the regex to (\\S+\\s+){5} where \S means any non-whitespace and \s means any whitespace.
The code first walks through the process in the main method, printing output along the way that, I hope, makes it clear what's going on; it then shows how the process can be turned into a method.
That method splits a line into groups of n words; I call it to split your string every 5 words, and then again every 3 words.
Here it is:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class LineSplitterExample
{
public static void main(String[] args)
{
String s = "This is a String which needs to be split after every n words";
//Pattern p = Pattern.compile("([a-zA-Z]+ +){5}");
Pattern p = Pattern.compile("(\\S+ +){5}");
Matcher m = p.matcher(s);
int last = 0;
List<String> collected = new ArrayList<>();
while (m.find()) {
System.out.println("Group Count = " + m.groupCount());
for (int i=0; i<m.groupCount(); i++) {
final String found = m.group(i);
System.out.printf("Group %d: %s%n", i, found);
collected.add(found);
// keep track of where the last group ended
last = m.end();
System.out.println("'m.end()' is " + last);
}
}
// collect the final part of the string after the last group
String tail = s.substring(last);
System.out.println(tail);
collected.add(tail);
String[] result = collected.toArray(new String[0]);
System.out.println("result:");
for (int n=0; n<result.length; n++) {
System.out.printf("%2d: %s%n", n, result[n]);
}
// Put a little space after the output
System.out.println("\n");
// Now use the methods...
String[] byFive = splitByWords(s, 5);
displayArray(byFive);
String[] byThree = splitByWords(s, 3);
displayArray(byThree);
}
private static String[] splitByWords(final String s, final int n)
{
//final Pattern p = Pattern.compile("([a-zA-Z]+ +){"+n+"}");
final Pattern p = Pattern.compile("(\\S+\\s+){"+n+"}");
final Matcher m = p.matcher(s);
List<String> collected = new ArrayList<>();
int last = 0;
while (m.find()) {
for (int i=0; i<m.groupCount(); i++) {
collected.add(m.group(i));
last = m.end();
}
}
collected.add(s.substring(last));
return collected.toArray(new String[0]);
}
private static void displayArray(final String[] array)
{
System.out.println("Array:");
for (int i=0; i<array.length; i++) {
System.out.printf("%2d: %s%n", i, array[i]);
}
}
}
The output I got by running this is:
Group Count = 1
Group 0: This is a String which
'm.end()' is 23
Group Count = 1
Group 0: needs to be split after
'm.end()' is 47
every n words
result:
0: This is a String which
1: needs to be split after
2: every n words
Array:
0: This is a String which
1: needs to be split after
2: every n words
Array:
0: This is a
1: String which needs
2: to be split
3: after every n
4: words
You can do it with a combination of replaceAll and split
S{N} - matches N iterations of S
() - regular expression capture group
$1 - back reference to the captured group
Replace every occurrence of N words with that occurrence followed by a special delimiter (in this case ###). Then split on that delimiter.
public static String[] splitNWords(String s, int count) {
String delim = "((?:\\w+\\s+){"+count+"})";
return s.replaceAll(delim, "$1###").split("###");
}
Demo
String s = "This is a String which needs to be split after every n words";
for (int i = 1; i < 5; i++) {
String[] arr = splitNWords(s, i);
System.out.println("Splitting on " + i + " words.");
for (String st : arr) {
System.out.println(st);
}
System.out.println();
}
prints
Splitting on 1 words.
This
is
a
String
which
needs
to
be
split
after
every
n
words
Splitting on 2 words.
This is
a String
which needs
to be
split after
every n
words
Splitting on 3 words.
This is a
String which needs
to be split
after every n
words
Splitting on 4 words.
This is a String
which needs to be
split after every n
words
I don't think there is a built-in "split every n words". You need to specify a pattern, such as whitespace. You can, for instance, split on every space and then iterate over the resulting array, building another one with the number of words you want in each element.
Regards
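The split-then-regroup idea described above could be sketched as follows. The names `GroupWordsSketch` and `groupWords` are invented for this example, and it assumes words are separated by whitespace and that n is at least 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GroupWordsSketch {
    // Split on whitespace, then join the words back together n at a time.
    static List<String> groupWords(String s, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        String[] words = s.trim().split("\\s+");
        List<String> groups = new ArrayList<>();
        for (int start = 0; start < words.length; start += n) {
            int end = Math.min(start + n, words.length);
            // copyOfRange takes the half-open range [start, end)
            groups.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        String s = "This is a String which needs to be split after every n words";
        System.out.println(groupWords(s, 5));
    }
}
```

Unlike the regex versions, the groups produced this way carry no trailing whitespace, at the cost of normalizing every separator to a single space.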
How do I swap each pair of adjacent words in a sentence in Java? For example:
Input: "hi how are you doing today jane"
Output: "how hi you are today doing jane"
What I tried:
String s = "hi how are you doing today jane";
ArrayList<String> al = new ArrayList<>();
String[] splitted = s.split("\\s+");
int n = splitted.length;
for(int i=0; i<n; i++) {
al.add(splitted[i]);
}
for(int i=0; i<n-1; i=i+2) {
System.out.print(al.get(i+1)+" "+al.get(i)+" ");
}
if((n%2) != 0) {
System.out.print(al.get(n - 1));
}
output I'm getting:
"how hiyou aretoday doing"
As you asked to do with only one loop and without extensive use of regex, here is another solution using Collections.swap:
String s = "hi how are you doing today jane";
List<String> splitted = new ArrayList<>(List.of(s.split("\\s+")));
for(int i = 0; i < splitted.size() - 1; i += 2)
Collections.swap(splitted, i, i + 1);
s = String.join(" ", splitted);
System.out.println(s);
Output:
how hi you are today doing jane
Since you're using split() which takes a regex, it would seem that using regex is a valid solution, so use it:
replaceAll("(\\w+)(\\W+)(\\w+)", "$3$2$1")
Explanation
(\\w+) Match first word, and capture it as group 1
(\\W+) Match the characters between the 2 words, and capture them as group 2
(\\w+) Match second word, and capture it as group 3
$3$2$1 Replace the above with the 3 groups in reverse order
Example
System.out.println("hi how are you doing today jane".replaceAll("(\\w+)(\\W+)(\\w+)", "$3$2$1"));
Output
how hi you are today doing jane
Note: Since your code used split("\\s+"), your definition of a "word" is a sequence of non-whitespace characters. To use that definition of a word, change the regex to:
replaceAll("(\\S+)(\\s+)(\\S+)", "$3$2$1")
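The difference between the two definitions of a "word" only shows up once punctuation is involved. A small sketch, using a made-up input and hypothetical method names, to compare the two patterns:

```java
class SwapPairsSketch {
    // A "word" is a run of word characters; punctuation stays where it is.
    static String swapWordChars(String s) {
        return s.replaceAll("(\\w+)(\\W+)(\\w+)", "$3$2$1");
    }

    // A "word" is a run of non-whitespace; punctuation travels with the word.
    static String swapNonWhitespace(String s) {
        return s.replaceAll("(\\S+)(\\s+)(\\S+)", "$3$2$1");
    }

    public static void main(String[] args) {
        String input = "hi, how are you?";
        System.out.println(swapWordChars(input));     // how, hi you are?
        System.out.println(swapNonWhitespace(input)); // how hi, you? are
    }
}
```

Which of the two is right depends on whether punctuation should be treated as part of the word it is attached to.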
If you want an old-school for loop with a buffer/temp value, here you are:
public static void main(String[] args) {
String s = "hi how are you doing today jane";
String flip = flip(s);
System.out.println(flip);
}
private static String flip(String sentence) {
List<String> words = Arrays.asList(sentence.split("\\s+"));
for (int i = 0; i < words.size(); i += 2) {
if (i + 1 < words.size()) {
String tmp = words.get(i + 1);
words.set(i + 1, words.get(i));
words.set(i, tmp);
}
}
return words.stream().map(String::toString).collect(Collectors.joining(" "));
}
However, Paul's solution is way better since it uses what Java already provides, and we are no longer in the stone age :)
How do I capitalize the first and last letters of every word in a string?
I have done it this way:
String cap = "";
for (int i = 0; i < sent.length() - 1; i++)
{
if (sent.charAt(i + 1) == ' ')
{
cap += Character.toUpperCase(sent.charAt(i)) + " " + Character.toUpperCase(sent.charAt(i + 2));
i += 2;
}
else
cap += sent.charAt(i);
}
cap += Character.toUpperCase(sent.charAt(sent.length() - 1));
System.out.print (cap);
It does not work when the first word has more than one character.
Please use simple functions, as I am a beginner.
With the Apache Commons Lang library this becomes very easy:
String testString = "this string is needed to be 1st and 2nd letter-uppercased for each word";
testString = WordUtils.capitalize(testString);
testString = StringUtils.reverse(testString);
testString = WordUtils.capitalize(testString);
testString = StringUtils.reverse(testString);
System.out.println(testString);
ThiS StrinG IS NeedeD TO BE 1sT AnD 2nD Letter-uppercaseD FoR EacH WorD
You should rather split your String using whitespace as the separator, then for each token apply toUpperCase() to the first and the last character and build a new String as the result.
A very simple sample:
String cap = "";
String sent = "hello world. again.";
String[] token = sent.split("\\s+|\\.$");
for (String currentToken : token){
if (currentToken.isEmpty()){
// skip empty tokens, e.g. produced by leading whitespace
continue;
}
if (!cap.equals("")){
cap += " ";
}
if (currentToken.length() == 1){
// a one-character word is both its first and its last letter
cap += currentToken.toUpperCase();
continue;
}
String firstChar = String.valueOf(Character.toUpperCase(currentToken.charAt(0)));
String between = currentToken.substring(1, currentToken.length()-1);
String lastChar = String.valueOf(Character.toUpperCase(currentToken.charAt(currentToken.length()-1)));
cap += firstChar+between+lastChar;
}
Of course you should favor the use of StringBuilder over String as you perform many concatenations.
Output result : HellO World. AgaiN
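As a rough illustration of the StringBuilder advice, the same token-by-token approach could look like the sketch below. The method name `capitalizeEnds` is hypothetical, and this version splits on whitespace only, so a trailing period stays attached to its word.

```java
class CapitalizeEndsSketch {
    // Upper-case the first and last character of every whitespace-separated token.
    static String capitalizeEnds(String sentence) {
        StringBuilder out = new StringBuilder();
        for (String token : sentence.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // only happens for blank input
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            StringBuilder word = new StringBuilder(token);
            int last = word.length() - 1;
            // For a one-character token, first and last are the same index.
            word.setCharAt(0, Character.toUpperCase(word.charAt(0)));
            word.setCharAt(last, Character.toUpperCase(word.charAt(last)));
            out.append(word);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalizeEnds("hello world again")); // HellO WorlD AgaiN
    }
}
```

Because the builder is modified in place, no intermediate String is created for each concatenation.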
Your code is missing out the first letter of the first word. I would treat this as a special case, i.e.
cap = ""+Character.toUpperCase(sent.charAt(0));
for (int i = 1; i < sent.length() - 1; i++)
{
.....
Of course, there are much easier ways to do what you are doing.
Basically you just need to iterate over all characters and replace them if one of the following conditions is true:
it's the first character
it's the last character
the previous character was a whitespace (or whatever you want, e.g. punctuation - see below)
the next character is a whitespace (or whatever you want, e.g. punctuation - see below)
If you use a StringBuilder for performance and memory reasons (don't create a String in every iteration which += would do) it could look like this:
StringBuilder sb = new StringBuilder( "some words in a list even with longer whitespace in between" );
for( int i = 0; i < sb.length(); i++ ) {
if( i == 0 || //rule 1
i == (sb.length() - 1 ) || //rule 2
Character.isWhitespace( sb.charAt( i - 1 ) ) || //rule 3
Character.isWhitespace( sb.charAt( i + 1 ) ) ) { //rule 4
sb.setCharAt( i, Character.toUpperCase( sb.charAt( i ) ) );
}
}
Result: SomE WordS IN A LisT EveN WitH LongeR WhitespacE IN BetweeN
If you want to check for other rules as well (e.g. punctuation etc.) you could create a method that you call for the previous and next character and which checks for the required properties.
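One way such a helper method might look is sketched here. `isBoundary` is a hypothetical name, and treating every character that is not a letter or digit as a word boundary is an assumption made for this example, not a rule from the question.

```java
class BoundaryCapitalizerSketch {
    // Assumed rule: anything that is not a letter or digit separates words.
    static boolean isBoundary(char c) {
        return !Character.isLetterOrDigit(c);
    }

    static String capitalizeWordEnds(String text) {
        StringBuilder sb = new StringBuilder(text);
        for (int i = 0; i < sb.length(); i++) {
            // The i == 0 and i == length - 1 checks short-circuit, so charAt never goes out of range.
            boolean startsWord = i == 0 || isBoundary(sb.charAt(i - 1));
            boolean endsWord = i == sb.length() - 1 || isBoundary(sb.charAt(i + 1));
            if (startsWord || endsWord) {
                // toUpperCase leaves spaces and punctuation unchanged.
                sb.setCharAt(i, Character.toUpperCase(sb.charAt(i)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWordEnds("hello world. again.")); // HellO WorlD. AgaiN.
    }
}
```

Changing what counts as a boundary then only requires editing `isBoundary`, not the loop.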
String stringToSearch = "this string is needed to be first and last letter uppercased for each word";
// First letter upper case using regex
Pattern firstLetterPtn = Pattern.compile("(\\b[a-z]{1})+");
Matcher m = firstLetterPtn.matcher(stringToSearch);
StringBuffer sb = new StringBuffer();
while(m.find()){
m.appendReplacement(sb,m.group().toUpperCase());
}
m.appendTail(sb);
stringToSearch = sb.toString();
sb.setLength(0);
// Last letter upper case using regex
Pattern LastLetterPtn = Pattern.compile("([a-z]{1}\\b)+");
m = LastLetterPtn.matcher(stringToSearch);
while(m.find()){
m.appendReplacement(sb,m.group().toUpperCase());
}
m.appendTail(sb);
System.out.println(sb.toString());
output:
ThiS StrinG IS NeedeD TO BE FirsT AnD LasT LetteR UppercaseD FoR EacH WorD
I was given a long text in which I need to find every piece of text enclosed in a pair of & characters (for example, in the text "&hello&&bye&", I need to find the words "hello" and "bye").
I tried the regex ".*&([^&])*&.*" but it doesn't work, and I don't know what's wrong with it.
Any help?
Thanks
Try this way
String data = "&hello&&bye&";
Matcher m = Pattern.compile("&([^&]*)&").matcher(data);
while (m.find())
System.out.println(m.group(1));
output:
hello
bye
No regex needed. Just iterate!
boolean started = false;
List<String> list = new ArrayList<>();
int startIndex = 0;
for(int i = 0; i < string.length(); ++i){
if(string.charAt(i) != '&')
continue;
if(!started) {
started = true;
startIndex = i + 1;
}
else {
list.add(string.substring(startIndex, i)); // the text between the opening and the closing &
}
started = !started;
}
or use split!
String[] parts = string.split("&");
for(int i = 1; i < parts.length; i += 2) { // every second
list.add(parts[i]);
}
If you don't want to use regular expressions, here's a simple way.
String string = "xyz...."; // the string containing "hello", "bye" etc.
String[] tokens = string.split("&"); // this will split the string into an array
// containing tokens separated by "&"
for(int i=0; i<tokens.length; i++)
{
String token = tokens[i];
if(token.length() > 0)
{
// handle edge case
if(i==tokens.length-1)
{
if(string.charAt(string.length()-1) == '&')
System.out.println(token);
}
else
{
System.out.println(token);
}
}
}
Two problems:
You're repeating the capturing group. This means that you'll only catch the last letter between &s in the group.
You will only match the last word because the .*s will gobble up the rest of the string.
Use lookarounds instead:
(?<=&)[^&]+(?=&)
Now the entire match will be hello (and bye when you apply the regex for the second time) because the surrounding &s won't be part of the match any more:
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("(?<=&)[^&]+(?=&)");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
matchList.add(regexMatcher.group());
}
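Both problems can be observed directly. The sketch below (class and method names are made up for illustration) runs the original pattern with matches() and compares it with the lookaround version on the sample input from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmpersandDemo {
    // The pattern from the question: group 1 only keeps its last repetition.
    static String groupOfOriginalPattern(String data) {
        Matcher m = Pattern.compile(".*&([^&])*&.*").matcher(data);
        return m.matches() ? m.group(1) : null;
    }

    // Lookarounds check for the surrounding & without consuming them.
    static List<String> betweenAmpersands(String data) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("(?<=&)[^&]+(?=&)").matcher(data);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String data = "&hello&&bye&";
        System.out.println(groupOfOriginalPattern(data)); // e
        System.out.println(betweenAmpersands(data));      // [hello, bye]
    }
}
```

One caveat of the lookaround approach: since the & characters are not consumed, an input such as "&a&b&" yields both "a" and "b", so it fits best when the delimiters really come in pairs as in the example.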
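The surrounding .* don't make sense and are unproductive. Just &([^&]*)& is sufficient (note that the * belongs inside the capturing group; with ([^&])* only the last character would be captured).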
I would simplify it even further.
Check that the first char is &
Check that the last char is &
String.split("&&") on the substring between them
In code:
if (string.length() < 2)
throw new IllegalArgumentException(string); // or return an empty array, whatever
if ((string.charAt(0) != '&') || (string.charAt(string.length()-1) != '&'))
throw new IllegalArgumentException(string); // handle this, too
String inner = string.substring(1, string.length()-1);
return inner.split("&&");
I have the following String, and I want to split it into a number of substrings (taking ',' as the delimiter) once its length reaches 36. The split does not happen exactly at the 36th position.
String message = "This is some(sampletext), and has to be splited properly";
I want to get the output as two substrings follows:
1. 'This is some (sampletext)'
2. 'and has to be splited properly'
Thanks in advance.
A solution based on regex:
String s = "This is some sample text and has to be splited properly";
Pattern splitPattern = Pattern.compile(".{1,15}\\b");
Matcher m = splitPattern.matcher(s);
List<String> stringList = new ArrayList<String>();
while (m.find()) {
stringList.add(m.group(0).trim());
}
Update:
trim() can be dropped by changing the pattern to end in a space or at the end of the string:
String s = "This is some sample text and has to be splited properly";
Pattern splitPattern = Pattern.compile("(.{1,15})\\b( |$)");
Matcher m = splitPattern.matcher(s);
List<String> stringList = new ArrayList<String>();
while (m.find()) {
stringList.add(m.group(1));
}
group(1) means that I only need the first part of the pattern (.{1,15}) as output.
.{1,15} - a sequence of any characters (".") with any length between 1 and 15 ({1,15})
\b - a word boundary (the position between a word character and a non-word character, before or after any word)
( |$) - space or end of string
In addition I've added () surrounding .{1,15} so I can use it as a whole group (m.group(1)).
Depending on the desired result, this expression can be tweaked.
Update:
If you want to split the message at a comma only when its length would exceed 36, try the following expression:
Pattern splitPattern = Pattern.compile("(.{1,36})\\b(,|$)");
The best solution I can think of is to make a function that iterates through the string. In the function you could keep track of whitespace characters, and for each 16th position you could add a substring to a list based on the position of the last encountered whitespace. After it has found a substring, you start anew from the last encountered whitespace. Then you simply return the list of substrings.
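A minimal sketch of that idea follows. The names `WhitespaceSplitSketch` and `splitAtWhitespace` are invented here, and the sketch assumes words are separated by single spaces; a run of more than maxLen characters without whitespace is simply cut at maxLen.

```java
import java.util.ArrayList;
import java.util.List;

class WhitespaceSplitSketch {
    static List<String> splitAtWhitespace(String text, int maxLen) {
        if (maxLen < 1) {
            throw new IllegalArgumentException("maxLen must be at least 1");
        }
        List<String> parts = new ArrayList<>();
        int start = 0;      // index where the current piece begins
        int lastSpace = -1; // index of the last whitespace seen so far
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                lastSpace = i;
            }
            if (i - start == maxLen) {
                // text[start..i] now holds maxLen + 1 characters, so cut here.
                if (lastSpace > start) {
                    parts.add(text.substring(start, lastSpace));
                    start = lastSpace + 1; // start anew after that whitespace
                } else {
                    parts.add(text.substring(start, i)); // no whitespace: hard cut
                    start = i;
                }
            }
        }
        if (start < text.length()) {
            parts.add(text.substring(start)); // whatever is left over
        }
        return parts;
    }

    public static void main(String[] args) {
        String message = "This is some sample text and has to be splited properly";
        System.out.println(splitAtWhitespace(message, 16));
    }
}
```

For the sample message and a limit of 16 this should give "This is some", "sample text and", "has to be" and "splited properly".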
Here's a tidy answer:
String message = "This is some sample text and has to be splited properly";
String[] temp = message.split("(?<=^.{1,16}) ");
String part1 = message.substring(0, message.length() - temp[temp.length - 1].length() - 1);
String part2 = message.substring(message.length() - temp[temp.length - 1].length());
This should work on all inputs, except when there are sequences of chars without whitespace longer than 16. It also creates the minimum amount of extra Strings by indexing into the original one.
public static void main(String[] args) throws IOException
{
String message = "This is some sample text and has to be splited properly";
List<String> result = new ArrayList<String>();
int start = 0;
while (start + 16 < message.length())
{
int end = start + 16;
while (!Character.isWhitespace(message.charAt(end--)));
result.add(message.substring(start, end + 1));
start = end + 2;
}
result.add(message.substring(start));
System.out.println(result);
}
If you have a simple text as the one you showed above (words separated by blank spaces) you can always think of StringTokenizer. Here's some simple code working for your case:
public static void main(String[] args) {
String message = "This is some sample text and has to be splited properly";
while (message.length() > 0) {
String token = "";
StringTokenizer st = new StringTokenizer(message);
while (st.hasMoreTokens()) {
String nt = st.nextToken();
String foo = "";
if (token.length()==0) {
foo = nt;
}
else {
foo = token + " " + nt;
}
if (foo.length() < 16)
token = foo;
else {
System.out.print("'" + token + "' ");
message = message.substring(token.length() + 1, message.length());
break;
}
if (!st.hasMoreTokens()) {
System.out.print("'" + token + "' ");
message = message.substring(token.length(), message.length());
}
}
}
}