How do I check that a Java String is not all whitespaces? - java

I want to check that a Java String or character array is not made up of only whitespace. How can I do this in Java?
This is a very similar question except it's Javascript:
How can I check if string contains characters & whitespace, not just whitespace?
EDIT: I removed the bit about alphanumeric characters, so it makes more sense.

Shortest solution I can think of:
if (string.trim().length() > 0) ...
This only checks for (non) white space. If you want to check for particular character classes, you need to use the mighty matches() with a regexp such as:
if (string.matches(".*\\w.*")) ...
...which checks for at least one (ASCII) alphanumeric character.

I would use the Apache Commons Lang library. It has a class called StringUtils that is useful for all sorts of String operations. isBlank returns true when the String is null, empty or all whitespace, so to check that a String is not all whitespace you negate it (or use StringUtils.isNotBlank):
!StringUtils.isBlank(<your string>)
Here is the reference: StringUtils.isBlank

Slightly shorter than what was mentioned by Carl Smotricz:
!string.trim().isEmpty();

StringUtils.isBlank(CharSequence)
https://commons.apache.org/proper/commons-lang/javadocs/api-release/org/apache/commons/lang3/StringUtils.html#isBlank-java.lang.CharSequence-

If you are using Java 11 or more recent, the new isBlank string method will come in handy:
!s.isBlank();
If you are using Java 8, 9 or 10, you could build a simple stream to check that a string is not whitespace only:
!s.chars().allMatch(Character::isWhitespace);
In addition to not requiring any third-party libraries such as Apache Commons Lang, these solutions have the advantage of handling every character that Character.isWhitespace treats as white space, and not just the characters with codes up to '\u0020' that the trim-based solutions suggested in many other answers remove. You can refer to the Javadocs for an exhaustive list of all supported white space types. Note that empty strings are also covered in both cases.
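The difference between the two approaches shows up with a character such as U+2028 (LINE SEPARATOR). Below is a minimal sketch that runs on Java 8; the class and method names are made up for illustration and are not part of any library.

```java
final class WhitespaceCheckDemo {

    // Hypothetical helper: true if s contains at least one character that
    // Character.isWhitespace does not classify as whitespace.
    static boolean hasNonWhitespace(String s) {
        return s != null && !s.chars().allMatch(Character::isWhitespace);
    }

    // Hypothetical helper: the trim-based check. trim() only strips leading
    // and trailing characters whose code is at most that of the space character.
    static boolean hasNonWhitespaceViaTrim(String s) {
        return s != null && !s.trim().isEmpty();
    }

    public static void main(String[] args) {
        String lineSeparatorOnly = "\u2028";
        // The two checks disagree for this input.
        System.out.println(hasNonWhitespace(lineSeparatorOnly));
        System.out.println(hasNonWhitespaceViaTrim(lineSeparatorOnly));
    }
}
```

For plain ASCII input such as spaces, tabs and line breaks, both helpers should give the same answer; they only diverge for whitespace characters above the space character.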

This answer focusses more on the sidenote "i.e. has at least one alphanumeric character". Besides that, it doesn't add too much to the other (earlier) solution, except that it doesn't hurt you with NPE in case the String is null.
We want false if (1) s is null or (2) s is empty or (3) s only contains whitespace characters.
public static boolean containsNonWhitespaceChar(String s) {
return !((s == null) || "".equals(s.trim()));
}

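if (Pattern.compile("\\S").matcher(target).find())
// then string contains at least one non-whitespace character
Note the use of backslash capital-S, meaning "non-whitespace char". find() is used here rather than target.matches("\\S"), because matches() only succeeds when the entire string matches the pattern, which would mean a string of exactly one non-whitespace character.
I'd wager this is the simplest (and perhaps the fastest?) solution.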
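To make the whole-string behaviour of matches() concrete, here is a small sketch contrasting it with Matcher.find(). The class and method names are hypothetical; only java.util.regex from the standard library is used.

```java
import java.util.regex.Pattern;

final class NonWhitespaceRegexDemo {

    // Compiled once; "\\S" matches a single non-whitespace character.
    private static final Pattern NON_WHITESPACE = Pattern.compile("\\S");

    // Hypothetical helper: find() succeeds if the pattern matches anywhere in s.
    static boolean containsNonWhitespace(String s) {
        return NON_WHITESPACE.matcher(s).find();
    }

    public static void main(String[] args) {
        // String.matches requires the entire input to match the pattern.
        System.out.println("abc".matches("\\S"));
        System.out.println(containsNonWhitespace("abc"));
        System.out.println(containsNonWhitespace(" \t\n"));
    }
}
```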

If you are only checking for whitespace and don't care about null then you can use org.apache.commons.lang.StringUtils.isWhitespace(String str),
StringUtils.isWhitespace(String str);
(Checks if the String contains only whitespace.)
If you also want to check for null(including whitespace) then
StringUtils.isBlank(String str);

Just a performance comparison on OpenJDK 13, Windows 10. For each of these texts:
"abcd"
" "
" \r\n\t"
" ab "
" \n\n\r\t \n\r\t\t\t \r\n\r\n\r\t \t\t\t\r\n\n"
"lorem ipsum dolor sit amet consectetur adipisici elit"
"1234657891234567891324569871234567891326987132654798"
one of the following tests was executed:
// trim + empty
input.trim().isEmpty()
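// simple match (note: matches() must match the entire input, so this is
// true only for an input that is exactly one non-whitespace character)
input.matches("\\S")
// match with precompiled pattern (same whole-input semantics)
final Pattern PATTERN = Pattern.compile("\\S");
PATTERN.matcher(input).matches()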
// java 11's isBlank
input.isBlank()
each 10.000.000 times.
The results:
METHOD     min    max   note
trim:       18    313   much slower if text not trimmed
match:    1799   2010
pattern:   571    662
isBlank:    60    338   faster the earlier it hits the first non-whitespace character
Quite surprisingly, trim+empty is the fastest, even though it needs to construct the trimmed text. It is still faster than a simple for-loop looking for a single non-whitespace character...
EDIT:
The longer the text, the more the numbers differ. Trimming a long text takes longer than a simple loop. However, the regexes are still the slowest solution.

With Java-11+, you can make use of the String.isBlank API to check if the given string is not all made up of whitespace -
String str1 = " ";
System.out.println(str1.isBlank()); // made up of all whitespaces, prints true
String str2 = " a";
System.out.println(str2.isBlank()); // prints false
The javadoc for the same is :
/**
* Returns {@code true} if the string is empty or contains only
* {@link Character#isWhitespace(int) white space} codepoints,
* otherwise {@code false}.
*
* @return {@code true} if the string is empty or contains only
* {@link Character#isWhitespace(int) white space} codepoints,
* otherwise {@code false}
*
* @since 11
*/
public boolean isBlank()

The trim method should work great for you.
http://download.oracle.com/docs/cd/E17476_01/javase/1.4.2/docs/api/java/lang/String.html#trim()
Returns a copy of the string, with
leading and trailing whitespace
omitted. If this String object
represents an empty character
sequence, or the first and last
characters of character sequence
represented by this String object both
have codes greater than '\u0020' (the
space character), then a reference to
this String object is returned.
Otherwise, if there is no character
with a code greater than '\u0020' in
the string, then a new String object
representing an empty string is
created and returned.
Otherwise, let k be the index of the
first character in the string whose
code is greater than '\u0020', and let
m be the index of the last character
in the string whose code is greater
than '\u0020'. A new String object is
created, representing the substring of
this string that begins with the
character at index k and ends with the
character at index m-that is, the
result of this.substring(k, m+1).
This method may be used to trim
whitespace from the beginning and end
of a string; in fact, it trims all
ASCII control characters as well.
Returns: A copy of this string with
leading and trailing white space
removed, or this string if it has no
leading or trailing white space.
You could trim and then compare to an empty string or possibly check the length for 0.

Alternative:
boolean isWhiteSpaces( String s ) {
return s != null && s.matches("\\s+");
}

trim() and the regular expressions mentioned in other answers do not work for all types of whitespace,
e.g. the Unicode character 'LINE SEPARATOR' (U+2028): http://www.fileformat.info/info/unicode/char/2028/index.htm
The Java method Character.isWhitespace() covers such characters.
That is why one of the already mentioned solutions,
StringUtils.isWhitespace(String) or StringUtils.isBlank(String),
should be used.

StringUtils.isEmptyOrWhitespaceOnly(<your string>)
will check :
- is it null
- is it only space
- is it empty string ""
https://www.programcreek.com/java-api-examples/?class=com.mysql.jdbc.StringUtils&method=isEmptyOrWhitespaceOnly

While I personally would prefer !str.isBlank(), as others have already suggested (or str -> !str.isBlank() as a Predicate), a more modern and efficient version of the str.trim() approach mentioned above would be to use str.strip(), considering nulls as "whitespace":
if (str != null && str.strip().length() > 0) {...}
For example as Predicate, for use with streams, e. g. in a unit test:
@Test
public void anyNonEmptyStrippedTest() {
String[] strings = null;
Predicate<String> isNonEmptyStripped = str -> str != null && str.strip().length() > 0;
assertTrue(Optional.ofNullable(strings).map(arr -> Stream.of(arr).noneMatch(isNonEmptyStripped)).orElse(true));
strings = new String[] { null, "", " ", "\\n", "\\t", "\\r" };
assertTrue(Optional.ofNullable(strings).map(arr -> Stream.of(arr).anyMatch(isNonEmptyStripped)).orElse(true));
strings = new String[] { null, "", " ", "\\n", "\\t", "\\r", "test" };
}

public static boolean isStringBlank(final CharSequence cs) {
int strLen;
if (cs == null || (strLen = cs.length()) == 0) {
return true;
}
for (int i = 0; i < strLen; i++) {
if (!Character.isWhitespace(cs.charAt(i))) {
return false;
}
}
return true;
}
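The question also mentions character arrays. The same loop can be applied to a char[] directly; the following is a sketch under that assumption, with a made-up class and method name, treating a null array as blank in the same way the method above treats a null CharSequence.

```java
final class CharArrayBlankCheck {

    // Hypothetical helper mirroring the CharSequence loop for a char[].
    // Returns true for null, for an empty array, and for whitespace-only arrays.
    static boolean isBlank(char[] chars) {
        if (chars == null) {
            return true;
        }
        for (char c : chars) {
            if (!Character.isWhitespace(c)) {
                return false; // found a non-whitespace character, stop early
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isBlank(" \t\n".toCharArray()));
        System.out.println(isBlank(" ab ".toCharArray()));
    }
}
```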

Related

CodingBat - Java - Warmup-2 - "stringYak" algorithm

I'm presently trying to understand a particular algorithm at the CodingBat platform.
Here's the problem presented by CodingBat:
*Suppose the string "yak" is unlucky. Given a string, return a version where all the "yak" are removed, but the "a" can be any char. The "yak" strings will not overlap.
Example outputs:
stringYak("yakpak") → "pak"
stringYak("pakyak") → "pak"
stringYak("yak123ya") → "123ya"*
Here's the official code solution:
public String stringYak(String str) {
String result = "";
for (int i=0; i<str.length(); i++) {
// Look for i starting a "yak" -- advance i in that case
if (i+2<str.length() && str.charAt(i)=='y' && str.charAt(i+2)=='k') {
i = i + 2;
} else { // Otherwise do the normal append
result = result + str.charAt(i);
}
}
return result;
}
I can't make sense of this line of code below. Following the logic, result would only return the character at the index, not the remaining string.
result = result + str.charAt(i);
To me it would make better sense if the code was presented like this below, where the substring function would return the letter of the index and the remaining string afterwards:
result = result + str.substring(i);
What am I missing? Any feedback from anyone would be greatly helpful and thank you for your valuable time.
String concatenation
In order to be on the same page, let's recap how string concatenation works.
When at least one of the operands in an expression with the plus sign + is an instance of String, the plus sign is interpreted as a string concatenation operator. The result of evaluating the expression is a new string created by appending the right operand (or its string representation) to the left operand (or its string representation).
String str = "allow";
char ch = 'h';
Object obj = new Object();
System.out.println(ch + str); // prints "hallow"
System.out.println("test " + obj); // prints "test java.lang.Object@16b98e56"
Explanation of the code-logic
That said, I guess you will agree that this statement concatenates a character at position i in the str to the resulting string and assigns the result of concatenation to the same variable result:
result = result + str.charAt(i);
The condition in the code provided by CodingBat first ensures that the index i+2 is valid and then checks whether the characters at indices i and i+2 are equal to y and k respectively. If that is not the case, the character at index i is appended to the resulting string. Otherwise it is discarded and the index is incremented by 2 in order to skip the whole group of characters that constitute "yak" (where a can be an arbitrary symbol).
So the resulting string is constructed in the loop character by character.
Flavors of substring()
The method substring() is overloaded; there are two flavors of it.
A version that expects two arguments, the starting index (inclusive) and the ending index (exclusive): substring(int, int).
And you can use it to achieve the same result:
// an equivalent of result = result + str.charAt(i);
result = result + str.substring(i, i + 1);
Another version of this method, which expects one argument, will not be useful here, because the result returned by str.substring(i) will not be a string containing a single character, but a substring starting from the given index, i.e. encompassing all the characters until the end of the string, as the documentation of substring(int) states:
public String substring(int beginIndex)
Returns a string that is a substring of this string. The substring
begins with the character at the specified index and extends to the
end of this string.
Examples:
"unhappy".substring(2) returns "happy"
"Harbison".substring(3) returns "bison"
"emptiness".substring(9) returns "" (an empty string)
Side note:
This coding problem was introduced in order to practice the basics of loops and string operations. But actually the simplest way to solve this problem is to use the method replaceAll(), which expects a regular expression and a replacement string:
return str.replaceAll("y.k", "");
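To compare the two approaches side by side, here is a sketch that wraps both in a hypothetical class (the class and method names are not from CodingBat). The loop version uses a StringBuilder instead of repeated string concatenation, which is a swapped-in detail and not part of the official solution.

```java
final class StringYakDemo {

    // Loop version: builds the result one character at a time.
    static String stringYakLoop(String str) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            if (i + 2 < str.length() && str.charAt(i) == 'y' && str.charAt(i + 2) == 'k') {
                i += 2; // skip the rest of the "y?k" group; the loop's i++ moves past it
            } else {
                result.append(str.charAt(i)); // append exactly one character
            }
        }
        return result.toString();
    }

    // Regex version: "." stands for the arbitrary middle character.
    static String stringYakRegex(String str) {
        return str.replaceAll("y.k", "");
    }

    public static void main(String[] args) {
        System.out.println(stringYakLoop("yak123ya"));
        System.out.println(stringYakRegex("yak123ya"));
    }
}
```

One difference worth keeping in mind: by default "." in a Java regex does not match line terminators, so the two versions can differ if the middle character is a line break.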

Capitalize first letters in words in the string with different separators using java 8 stream

I need to capitalize the first letter of every word in a string, BUT it's not as easy as it seems, because a word is considered to be any sequence of letters, digits, "_", "-" and "`", while all other chars are considered to be separators, i.e. after them the next letter must be capitalized.
Example what program should do:
For input: "#he&llo wo!r^ld"
Output should be: "#He&Llo Wo!R^Ld"
There are questions that sound similar here, but their solutions really don't help.
This one for example:
String output = Arrays.stream(input.split("[\\s&]+"))
.map(t -> t.substring(0, 1).toUpperCase() + t.substring(1))
.collect(Collectors.joining(" "));
As in my task there can be various separators, this solution doesn't work.
It is possible to split a string and keep the delimiters, so taking into account the requirement for delimiters:
word is considered to be any sequence of letters, digits, "_" , "-", "`" while all other chars are considered to be separators
the pattern which keeps the delimiters in the result array would be: "((?<=[^-`\\w])|(?=[^-`\\w]))":
[^-`\\w]: all characters except -, backtick and word characters \w: [A-Za-z0-9_]
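As a quick illustration of what this pattern produces before any capitalization happens, the sketch below just splits the example input and prints the pieces. The class and method names are hypothetical, and the behaviour described assumes Java 8 or later, where a zero-width match at the start of the input does not produce an empty leading element.

```java
import java.util.Arrays;

final class SplitKeepDelimitersDemo {

    // Splits before and after every separator character, so that both the
    // "words" and the separators end up as elements of the result array.
    static String[] tokenize(String input) {
        return input.split("((?<=[^-`\\w])|(?=[^-`\\w]))");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("#he&llo wo!r^ld")));
    }
}
```

Because the lookbehind and lookahead are zero-width, nothing is consumed by the split, and joining the pieces back together gives the original string.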
Then, the "words" are capitalized, and delimiters are kept as is:
static String capitalize(String input) {
if (null == input || 0 == input.length()) {
return input;
}
return Arrays.stream(input.split("((?<=[^-`\\w])|(?=[^-`\\w]))"))
.map(s -> s.matches("[-`\\w]+") ? Character.toUpperCase(s.charAt(0)) + s.substring(1) : s)
.collect(Collectors.joining(""));
}
Tests:
System.out.println(capitalize("#he&l_lo-wo!r^ld"));
System.out.println(capitalize("#`he`&l+lo wo!r^ld"));
Output:
#He&L_lo-wo!R^Ld
#`he`&L+Lo Wo!R^Ld
Update
If it is necessary to process not only the ASCII set of characters but also other alphabets or character sets (e.g. Cyrillic, Greek, etc.), the class \\p{IsWord} may be used, and matching of Unicode characters needs to be enabled using the pattern flag (?U):
static String capitalizeUnicode(String input) {
if (null == input || 0 == input.length()) {
return input;
}
return Arrays.stream(input.split("(?U)((?<=[^-`\\p{IsWord}])|(?=[^-`\\p{IsWord}]))"))
.map(s -> s.matches("(?U)[-`\\p{IsWord}]+") ? Character.toUpperCase(s.charAt(0)) + s.substring(1) : s)
.collect(Collectors.joining(""));
}
Test:
System.out.println(capitalizeUnicode("#he&l_lo-wo!r^ld"));
System.out.println(capitalizeUnicode("#привет&`ёж`+дос^βιδ/ως"));
Output:
#He&L_lo-wo!R^Ld
#Привет&`ёж`+Дос^Βιδ/Ως
You can't use split that easily - split will eliminate the separators and give you only the things in between. As you need the separators, no can do.
One real dirty trick is to use something called 'lookahead'. That argument you pass to split is a regular expression. Most 'characters' in a regexp have the property that they consume the matching input. If you do input.split("\\s+") then that doesn't 'just' split on whitespace, it also consumes them: The whitespace is no longer part of the individual entries in your string array.
However, consider ^ and $, or \\b. These still match things but don't consume anything. You don't consume 'end of string'. In fact, ^^^hello$$$ matches the string "hello" just as well. You can do this yourself, using lookahead: It matches when the lookahead is there but does not consume it:
String[] args = "Hello World$Huh Weird".split("(?=[\\s_$-]+)");
for (String arg : args) System.out.println("*" + arg + "*");
Unfortunately, this 'works', in that it saves your separators, but isn't getting you all that much closer to a solution:
*Hello*
* World*
*$Huh*
* *
* *
* Weird*
You can go with lookbehind as well, but it's limited; they don't do variable length, for example.
The conclusion should rapidly become: Actually, doing this with split is a mistake.
Then, once split is off the table, you should no longer use streams, either: Streams don't do well once you need to know stuff about the previous element in a stream to do the job: A stream of characters doesn't work, as you need to know if the previous character was a non-letter or not.
In general, "I want to do X, and use Y" is a mistake. Keep an open mind. It's akin to asking: "I want to butter my toast, and use a hammer to do it". Oookaaaaayyyy, you can probably do that, but, eh, why? There are butter knives right there in the drawer, just.. put down the hammer, that's toast. Not a nail.
Same here.
A simple loop can take care of this, no problem:
private static final String BREAK_CHARS = "&-_`";
public String toTitleCase(String input) {
StringBuilder out = new StringBuilder();
boolean atBreak = true;
for (char c : input.toCharArray()) {
out.append(atBreak ? Character.toUpperCase(c) : c);
atBreak = Character.isWhitespace(c) || (BREAK_CHARS.indexOf(c) > -1);
}
return out.toString();
}
Simple. Efficient. Easy to read. Easy to modify. For example, if you want to go with 'any non-letter counts', trivial: atBreak = !Character.isLetter(c);.
Contrast to the stream solution which is fragile, weird, far less efficient, and requires a regexp that needs half a page's worth of comment for anybody to understand it.
Can you do this with streams? Yes. You can butter toast with a hammer, too. Doesn't make it a good idea though. Put down the hammer!
You can use a simple FSM as you iterate over the characters in the string, with two states, either in a word, or not in a word. If you are not in a word and the next character is a letter, convert it to upper case, otherwise, if it is not a letter or if you are already in a word, simply copy it unmodified.
boolean isWord(int c) {
return c == '`' || c == '_' || c == '-' || Character.isLetter(c) || Character.isDigit(c);
}
String capitalize(String s) {
StringBuilder sb = new StringBuilder();
boolean inWord = false;
for (int c : s.codePoints().toArray()) {
if (!inWord && Character.isLetter(c)) {
sb.appendCodePoint(Character.toUpperCase(c));
} else {
sb.appendCodePoint(c);
}
inWord = isWord(c);
}
return sb.toString();
}
Note: I have used codePoints(), appendCodePoint(int), and int so that characters outside the basic multilingual plane (with code points greater than 64k) are handled correctly.
I need to capitalize first letter in every word
Here is one way to do it. Admittedly this is a mite longer, but your requirement to change the first letter to upper case (not the first digit or first non-letter) required a helper method. Otherwise it would have been easier. Some others seem to have missed this point.
Establish word pattern, and test data.
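// Note: inside the brackets, "_-`" is parsed as a range from '_' to '`',
// so a literal '-' is not part of this character class.
String wordPattern = "[\\w_-`]+";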
Pattern p = Pattern.compile(wordPattern);
String[] inputData = { "#he&llo wo!r^ld", "0hel`lo-w0rld" };
Now this simply finds each successive word in the string based on the established regular expression. As each word is found, it changes the first letter in the word to upper case and then puts it in a string buffer in the correct position where the match was found.
for (String input : inputData) {
StringBuilder sb = new StringBuilder(input);
Matcher m = p.matcher(input);
while (m.find()) {
sb.replace(m.start(), m.end(),
upperFirstLetter(m.group()));
}
System.out.println(input + " -> " + sb);
}
prints
#he&llo wo!r^ld -> #He&Llo Wo!R^Ld
0hel`lo-w0rld -> 0Hel`lo-W0rld
Words may start with digits, and the requirement was to convert the first letter (not the first character) to upper case. This method therefore finds the first letter, converts it to upper case, and
returns the new string. So 01_hello would become 01_Hello.
public static String upperFirstLetter(String word) {
char[] chs = word.toCharArray();
for (int i = 0; i < chs.length; i++) {
if (Character.isLetter(chs[i])) {
chs[i] = Character.toUpperCase(chs[i]);
break;
}
}
return String.valueOf(chs);
}

Java - removing first character of a string

In Java, I have a String:
Jamaica
I would like to remove the first character of the string and then return amaica
How would I do this?
String str = "Jamaica".substring(1);
System.out.println(str); // prints "amaica"
Use the substring() function with an argument of 1 to get the substring from position 1 (after the first character) to the end of the string (leaving the second argument out defaults to the full length of the string).
public String removeFirstChar(String s){
return s.substring(1);
}
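Note that substring(1) throws a StringIndexOutOfBoundsException for an empty string and a NullPointerException for null. The sketch below shows one possible guarded variant, plus a variant that removes the first code point so that a leading supplementary character (stored as a surrogate pair) is removed as a whole. Both method names and the choice to return the input unchanged for null or empty strings are assumptions made for illustration.

```java
final class RemoveFirstCharDemo {

    // Hypothetical null- and empty-safe variant: returns the input unchanged
    // when there is nothing to remove.
    static String removeFirstChar(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        return s.substring(1);
    }

    // Hypothetical variant that removes the first code point rather than the
    // first char, so a leading surrogate pair is not split in half.
    static String removeFirstCodePoint(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        return s.substring(s.offsetByCodePoints(0, 1));
    }

    public static void main(String[] args) {
        System.out.println(removeFirstChar("Jamaica"));
        System.out.println(removeFirstCodePoint("\uD83D\uDE00abc"));
    }
}
```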
In Java, remove leading character only if it is a certain character
Use the Java ternary operator to quickly check whether your character is there before removing it. This strips the leading character only if it exists; if passed a blank string, it returns a blank string.
String header = "";
header = header.startsWith("#") ? header.substring(1) : header;
System.out.println(header);
header = "foobar";
header = header.startsWith("#") ? header.substring(1) : header;
System.out.println(header);
header = "#moobar";
header = header.startsWith("#") ? header.substring(1) : header;
System.out.println(header);
Prints:
blankstring
foobar
moobar
Java, remove all the instances of a character anywhere in a string:
String a = "Cool";
a = a.replace("o","");
//variable 'a' contains the string "Cl"
Java, remove the first instance of a character anywhere in a string:
String b = "Cool";
b = b.replaceFirst("o","");
//variable 'b' contains the string "Col"
Use substring() and give the number of characters that you want to trim from front.
String value = "Jamaica";
value = value.substring(1);
Answer: "amaica"
You can use the substring method of the String class that takes only the beginning index and returns the substring that begins with the character at the specified index and extending to the end of the string.
String str = "Jamaica";
str = str.substring(1);
substring() method returns a new String that contains a subsequence of characters currently contained in this sequence.
The substring begins at the specified start and extends to the character at index end - 1.
It has two forms. The first is
String substring(int FirstIndex)
Here, FirstIndex specifies the index at which the substring will
begin. This form returns a copy of the substring that begins at
FirstIndex and runs to the end of the invoking string.
String substring(int FirstIndex, int endIndex)
Here, FirstIndex specifies the beginning index, and endIndex specifies
the stopping point. The string returned contains all the characters
from the beginning index, up to, but not including, the ending index.
Example
String str = "Amiyo";
// prints substring from index 3
System.out.println("substring is = " + str.substring(3)); // Output 'yo'
you can do like this:
String str = "Jamaica";
str = str.substring(1, str.length());
return str;
or in general:
public String removeFirstChar(String str){
return str.substring(1, str.length());
}
public String removeFirst(String input)
{
return input.substring(1);
}
The key thing to understand in Java is that Strings are immutable -- you can't change them. So it makes no sense to speak of 'removing a character from a string'. Instead, you make a NEW string with just the characters you want. The other posts in this question give you a variety of ways of doing that, but its important to understand that these don't change the original string in any way. Any references you have to the old string will continue to refer to the old string (unless you change them to refer to a different string) and will not be affected by the newly created string.
This has a number of implications for performance. Each time you are 'modifying' a string, you are actually creating a new string with all the overhead implied (memory allocation and garbage collection). So if you want to make a series of modifications to a string and care only about the final result (the intermediate strings will be dead as soon as you 'modify' them), it may make more sense to use a StringBuilder or StringBuffer instead.
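A small sketch of both points, with hypothetical names: the original String is untouched after substring, and a StringBuilder can absorb several modifications before a single final String is created.

```java
final class ImmutabilityDemo {

    // Hypothetical helper: removes the first and the last character using one
    // StringBuilder, creating only one new String at the end.
    static String removeFirstAndLast(String s) {
        if (s == null || s.length() < 2) {
            return s; // nothing sensible to remove; an assumption for this sketch
        }
        StringBuilder sb = new StringBuilder(s);
        sb.deleteCharAt(0);               // modifies the builder in place
        sb.deleteCharAt(sb.length() - 1); // still the same builder object
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "Jamaica";
        String shortened = original.substring(1);
        // original still refers to the unchanged string
        System.out.println(original + " / " + shortened);
        System.out.println(removeFirstAndLast(original));
    }
}
```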
I came across a situation where I had to remove not only the first character (if it was a #), but the first set of characters.
String myString = "###Hello World" could be the starting point, but I would only want to keep the Hello World. This could be done as follows.
while (myString.startsWith("#")) { // Remove all the # chars in front of the real string
myString = myString.substring(1);
}
For OP's case, replace while with if and it works as well.
You can simply use substring().
String myString = "Jamaica";
String myStringWithoutJ = myString.substring(1);
The index passed to the method indicates where the resulting string starts; in this case it starts after the first position, because we don't want the "J" in "Jamaica".
Another solution: you can solve your problem using replaceAll with a regex such as ^.{1}, for example:
String str = "Jamaica";
int nbr = 1;
str = str.replaceAll("^.{" + nbr + "}", "");//Output = amaica
My version of removing leading chars, one or multiple. For example, for String str1 = "01234", when removing leading '0', the result will be "1234". For a String str2 = "000123" the result will be "123". And for String str3 = "000" the result will be the empty string: "". Such functionality is often useful when converting numeric strings into numbers. The advantage of this solution compared with regex (replaceAll(...)) is that this one is much faster. This is important when processing a large number of Strings.
public static String removeLeadingChar(String str, char ch) {
int idx = 0;
while ((idx < str.length()) && (str.charAt(idx) == ch))
idx++;
return str.substring(idx);
}
Kotlin:
It's working fine.
tv.doOnTextChanged { text: CharSequence?, start, count, after ->
val length = text.toString().length
if (length==1 && text!!.startsWith(" ")) {
tv?.setText("")
}
}

Remove end of line characters from end of Java String

I have a string from which I'd like to remove the end-of-line characters, from the very end of the string only, using Java:
"foo\r\nbar\r\nhello\r\nworld\r\n"
which I'd like to become
"foo\r\nbar\r\nhello\r\nworld"
(This question is similar to, but not the same as question 593671)
You can use s = s.replaceAll("[\r\n]+$", "");. This trims the \r and \n characters at the end of the string
The regex is explained as follows:
[\r\n] is a character class containing \r and \n
+ is one-or-more repetition of
$ is the end-of-string anchor
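If the replacement runs often, the pattern can be compiled once and reused. The following is a sketch with made-up names; it applies the same regex as above, written with escaped backslashes so that the regex engine rather than the Java compiler interprets \r and \n.

```java
import java.util.regex.Pattern;

final class TrailingEolDemo {

    // One or more CR/LF characters, anchored at the end of the input.
    private static final Pattern TRAILING_EOL = Pattern.compile("[\\r\\n]+$");

    // Hypothetical helper: removes line breaks from the end of s only.
    static String stripTrailingEol(String s) {
        return TRAILING_EOL.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "foo\r\nbar\r\nhello\r\nworld\r\n";
        // Line breaks inside the text are kept; only the trailing ones go.
        System.out.println(stripTrailingEol(text).endsWith("world"));
    }
}
```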
References
regular-expressions.info/Anchors, Character Class, Repetition
Related topics
You can also use String.trim() to trim any whitespace characters from the beginning and end of the string:
s = s.trim();
If you need to check if a String contains nothing but whitespace characters, you can check if it isEmpty() after trim():
if (s.trim().isEmpty()) {
//...
}
Alternatively you can also see if it matches("\\s*"), i.e. zero-or-more of whitespace characters. Note that in Java, the regex matches tries to match the whole string. In flavors that can match a substring, you need to anchor the pattern, so it's ^\s*$.
Related questions
regex, check if a line is blank or not
how to replace 2 or more spaces with single space in string and delete leading spaces only
Wouldn't String.trim do the trick here?
i.e you'd call the method .trim() on your string and it should return a copy of that string minus any leading or trailing whitespace.
The Apache Commons Lang StringUtils.stripEnd(String str, String stripChars) will do the trick; e.g.
String trimmed = StringUtils.stripEnd(someString, "\n\r");
If you want to remove all whitespace at the end of the String:
String trimmed = StringUtils.stripEnd(someString, null);
Well, everyone gave some way to do it with regex, so I'll give a faster way instead:
public String replace(String val) {
for (int i=val.length()-1;i>=0;i--) {
char c = val.charAt(i);
if (c != '\n' && c != '\r') {
return val.substring(0, i+1);
}
}
return "";
}
Benchmark says it operates ~45 times faster than regexp solutions.
If you have Google's Guava libraries in your project (if not, you arguably should!) you'd do this with a CharMatcher:
String result = CharMatcher.anyOf("\r\n").trimTrailingFrom(input);
String text = "foo\r\nbar\r\nhello\r\nworld\r\n";
String result = text.replaceAll("[\r\n]+$", "");
"foo\r\nbar\r\nhello\r\nworld\r\n".replaceAll("\\s+$", "")
or
"foo\r\nbar\r\nhello\r\nworld\r\n".replaceAll("[\r\n]+$", "")

codingbat wordEnds using regex

I'm trying to solve wordEnds from codingbat.com using regex.
Given a string and a non-empty word string, return a string made of each char just before and just after every appearance of the word in the string. Ignore cases where there is no char before or after the word, and a char may be included twice if it is between two words.
wordEnds("abcXY123XYijk", "XY") → "c13i"
wordEnds("XY123XY", "XY") → "13"
wordEnds("XY1XY", "XY") → "11"
wordEnds("XYXY", "XY") → "XY"
This is the simplest as I can make it with my current knowledge of regex:
public String wordEnds(String str, String word) {
return str.replaceAll(
".*?(?=word)(?<=(.|^))word(?=(.|$))|.+"
.replace("word", java.util.regex.Pattern.quote(word)),
"$1$2"
);
}
replace is used to place in the actual word string into the pattern for readability. Pattern.quote isn't necessary to pass their tests, but I think it's required for a proper regex-based solution.
The regex has two major parts:
If after matching as few characters as possible ".*?", word can still be found "(?=word)", then lookbehind to capture any character immediately preceding it "(?<=(.|^))", match "word", and lookforward to capture any character following it "(?=(.|$))".
The initial "if" test ensures that the atomic lookbehind captures only if there's a word
Using lookahead to capture the following character doesn't consume it, so it can be used as part of further matching
Otherwise match what's left "|.+"
Groups 1 and 2 would capture empty strings
I think this works in all cases, but it's obviously quite complex. I'm just wondering if others can suggest a simpler regex to do this.
Note: I'm not looking for a solution using indexOf and a loop. I want a regex-based replaceAll solution. I also need a working regex that passes all codingbat tests.
I managed to reduce the occurrence of word within the pattern to just one.
".+?(?<=(^|.)word)(?=(.?))|.+"
I'm still looking if it's possible to simplify this further, but I also have another question:
With this latest pattern, I simplified .|$ to just .? successfully, but if I similarly tried to simplify ^|. to .? it doesn't work. Why is that?
Based on your solution I managed to simplify the code a little bit:
public String wordEnds(String str, String word) {
return str.replaceAll(".*?(?="+word+")(?<=(.|^))"+word+"(?=(.|$))|.+","$1$2");
}
Another way of writing it would be:
public String wordEnds(String str, String word) {
return str.replaceAll(
String.format(".*?(?=%1$s)(?<=(.|^))%1$s(?=(.|$))|.+", word),
"$1$2");
}
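Both variants above insert word into the pattern unquoted, so a word containing regex metacharacters would change the meaning of the pattern. The sketch below keeps the Pattern.quote call from the question and checks it against the examples from the problem statement; the class name is hypothetical.

```java
import java.util.regex.Pattern;

final class WordEndsDemo {

    // Same regex as in the question, with word quoted so that characters
    // such as "." are matched literally.
    static String wordEnds(String str, String word) {
        String quoted = Pattern.quote(word);
        String regex = ".*?(?=" + quoted + ")(?<=(.|^))" + quoted + "(?=(.|$))|.+";
        // Groups that did not take part in a match are replaced by nothing.
        return str.replaceAll(regex, "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(wordEnds("abcXY123XYijk", "XY"));
        System.out.println(wordEnds("a.b", "."));
    }
}
```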
With this latest pattern, I simplified .|$ to just .? successfully, but if I similarly tried to simplify ^|. to .? it doesn't work. Why is that?
In Oracle's implementation, the behavior of look-behind is as follow:
By "studying" the regex (with study() method in each node), it knows the maximum length and minimum length of the pattern in look-behind group. (The study() method is what allows for obvious look-behind length)
It verifies the look-behind by starting a match at every position from index (current - min_length) to position (current - max_length) and exits early if the condition is satisfied.
Effectively, it will try to verify the look-behind on the shortest string first.
The implementation multiplies the matching complexity by O(k) factor.
This explains why changing ^|. to .? doesn't work: due to the starting position, it effectively checks for word before .word. The quantifier doesn't have a say here, since the ordering is imposed by the match range.
You can check the code of match method in Pattern.Behind and Pattern.NotBehind inner classes to verify what I said above.
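The effect described above can be observed with a much smaller pattern. This is a sketch, assuming the shortest-first look-behind behaviour of the implementation discussed here; the class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookbehindOrderDemo {

    // Returns what group 1 captured for the first match of regex in input,
    // or null if there is no match at all.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // (.|^) cannot match the empty string in the middle of the input,
        // so the look-behind has to step back one character.
        System.out.println("[" + firstGroup("(?<=(.|^))b", "ab") + "]");
        // .? can match the empty string right at the current position,
        // which is the first start position the look-behind tries.
        System.out.println("[" + firstGroup("(?<=(.?))b", "ab") + "]");
    }
}
```

With the optional quantifier the look-behind is already satisfied by the empty match, so the preceding character is never captured, which is consistent with the wordEnds pattern failing after that simplification.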
In .NET's flavor, look-behind is likely implemented by the reverse matching feature, which means that no extra factor is incurred on the matching complexity.
My suspicion comes from the fact that the capturing group in (?<=(a+))b matches all a's in aaaaaaaaaaaaaab. The quantifier is shown to have free rein in the look-behind group.
I have tested that ^|. can be simplified to .? in .NET and the regex works correctly.
I am working in .NET's regex but I was able to change your pattern to:
.+?(?<=(\w?)word)(?=(\w?))|.+
with positive results. You know it's a word (alphanumeric) type character, so why not give the parser a valid hint of that fact: instead of any character, it's an optional alphanumeric character.
It may answer why you don't need to specify the anchors of ^ and $, for what exactly is $ - is it \r or \n or other? (.NET has issues with $, and maybe you are not exactly capturing a Null of $, but the null of \r or \n which allowed you to change to .? for $)
Another solution to look at...
public String wordEnds(String str, String word) {
if(str.equals(word)) return "";
int i = 0;
String result = "";
int stringLen = str.length();
int wordLen = word.length();
int diffLen = stringLen - wordLen;
while(i<=diffLen){
if(i==0 && str.substring(i,i+wordLen).equals(word)){
result = result + str.charAt(i+wordLen);
}else if(i==diffLen && str.substring(i,i+wordLen).equals(word)){
result = result + str.charAt(i-1);
}else if(str.substring(i,i+wordLen).equals(word)){
result = result + str.charAt(i-1) + str.charAt(i+wordLen) ;
}
i++;
}
if(result.length()==1) result = result + result;
return result;
}
Another possible solution:
public String wordEnds(String str, String word) {
String result = "";
if (str.contains(word)) {
for (int i = 0; i < str.length(); i++) {
if (str.startsWith(word, i)) {
if (i > 0) {
result += str.charAt(i - 1);
}
if ((i + word.length()) < str.length()) {
result += str.charAt(i + word.length());
}
}
}
}
return result;
}
