Checking if there is whitespace between two elements in a String


I am working with Strings where I need to separate two chars/elements if there is whitespace between them. I have seen an earlier post on SO about this, but it has not worked for me as intended. As you would assume, I could just check whether the String contains(" ") and then substring around the space. However, my strings may contain any number of trailing whitespace characters even when there is no whitespace between characters. Hence my question: how do I detect whitespace between two chars (numbers too)?
//Example with numbers in a String
String test = "2 2";
final Pattern P = Pattern.compile("^(\\d [\\d\\d] )*\\d$");
final Matcher m = P.matcher(test);
if (m.matches()) {
System.out.println("There is between space!");
}

You can use String.strip() (Java 11+; trim() on older versions) to remove any leading or trailing whitespace, followed by String.split(). If there is whitespace between characters, the array will be of length 2 or greater. If there is not, it will be of length 1.
Example:
String test = " 2 2 ";
test = test.strip(); // Removes whitespace, test is now "2 2"
String[] testSplit = test.split(" "); // Splits the string, testSplit is ["2", "2"]
if (testSplit.length >= 2) {
System.out.println("There is whitespace!");
} else {
System.out.println("There is no whitespace");
}
If you need an array of a specified length, you can also specify a limit to split. For example:
"a b c".split(" ", 2); // Returns ["a", "b c"]
If you want a solution that only uses regex, the following regex matches any two groups of characters separated by a single space, with any amount of leading or trailing whitespace:
\s*(\S+\s\S+)\s*
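As a rough sketch of the two approaches above, the helper below wraps each in a method. The class and method names are made up for illustration, strip() assumes Java 11 or later, and splitting on "\\s+" rather than " " is an assumption added so that tabs and repeated spaces are also handled.

```java
import java.util.regex.Pattern;

final class InnerWhitespaceCheck {
    // Same regex as above, compiled once; matches() requires the whole string to fit.
    private static final Pattern SINGLE_GAP = Pattern.compile("\\s*(\\S+\\s\\S+)\\s*");

    // strip() then split on runs of whitespace ("\\s+" is an assumption, see lead-in).
    static boolean hasInnerWhitespaceBySplit(String s) {
        String stripped = s.strip();
        return stripped.split("\\s+").length >= 2;
    }

    // Regex-only variant: exactly two groups separated by one whitespace character.
    static boolean isTwoTokens(String s) {
        return SINGLE_GAP.matcher(s).matches();
    }
}
```

Note that the regex-only variant is stricter: it rejects "a b c" and "2  2" (two spaces), while the split variant accepts both.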

Positive lookahead and lookbehind may also work if you use the regex (?<=\\w)\\s(?=\\w)
\\w : a word character [a-zA-Z_0-9]
\\s : a whitespace character
(?<=\\w)\\s : positive lookbehind, matches a whitespace if it is preceded by a \\w
\\s(?=\\w) : positive lookahead, matches a whitespace if it is followed by a \\w
List<String> testList = Arrays.asList("2 2", " 245 ");
Pattern p = Pattern.compile("(?<=\\w)\\s(?=\\w)");
for (String str : testList) {
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(str + "\t: There is a space!");
} else {
System.out.println(str + "\t: There is not a space!");
}
}
Output:
2 2 : There is a space!
245 : There is not a space!

The reason your pattern does not work as expected is that ^(\\d [\\d\\d] )*\\d$, which can be simplified to ^(\\d \\d )*\\d$, starts by repeating 0 or more times what is between the parentheses, and each repetition must end with a space.
Then it matches a digit at the end of the string. As the repetition is 0 or more times, it is optional, so the pattern also matches just a single digit, while "2 2" does not match because there is no space after the second digit.
If you want to check if there is a single space between 2 non-whitespace chars:
\\S \\S
final Pattern P = Pattern.compile("\\S \\S");
final Matcher m = P.matcher(test);
if (m.find()) {
System.out.println("There is between space!");
}
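To make the difference concrete, here is a small sketch that runs the question's pattern and the suggested replacement against a few inputs. The class name, method names, and sample strings are illustrative only.

```java
import java.util.regex.Pattern;

final class OriginalPatternDemo {
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("^(\\d [\\d\\d] )*\\d$");
    // The suggested replacement; used with find(), so it may match anywhere.
    static final Pattern FIXED = Pattern.compile("\\S \\S");

    static boolean originalMatches(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    static boolean fixedFinds(String s) {
        return FIXED.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2", "2 2", "2 2 2", "245   "}) {
            System.out.println("\"" + s + "\" original=" + originalMatches(s)
                    + " fixed=" + fixedFinds(s));
        }
    }
}
```

The original pattern accepts "2" and "2 2 2" but rejects "2 2", which is why the question's example prints nothing.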

Here is the simplest way you can do it:
String testString = " Find if there is a space. ";
testString = testString.trim(); // Strings are immutable, so keep the trimmed result
boolean hasSpace = testString.contains(" "); // true if a space remains between characters
You can also chain the two methods in one line:
String testString = " Find if there is a space. ";
boolean hasSpace = testString.trim().contains(" ");

Use
String text = "2 2";
Matcher m = Pattern.compile("\\S\\s+\\S").matcher(text.trim());
if (m.find()) {
System.out.println("Space detected.");
}
text.trim() removes leading and trailing whitespace. The \S\s+\S pattern matches a non-whitespace character, then one or more whitespace characters, and then a non-whitespace character again.
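Because the pattern requires a non-whitespace character on both sides, leading and trailing whitespace can never satisfy it, so the trim() call is optional. The sketch below uses a lookaround variant of the same idea to report where the first inner gap starts; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InnerGapFinder {
    // Whitespace run with a non-whitespace character directly before and after it.
    private static final Pattern GAP = Pattern.compile("(?<=\\S)\\s+(?=\\S)");

    // Returns the index where the first inner whitespace run starts, or -1 if none.
    static int firstInnerGap(String s) {
        Matcher m = GAP.matcher(s);
        return m.find() ? m.start() : -1;
    }
}
```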

Related

Regex to split string based on length and neighboring characters at split index

Suppose this is my sample string:
String s = "Just a sa'mple 'String.";
I want to split this string such that token length will be always less than or equal to 5 characters and also the neighboring characters at the split index are either alphanumeric or white space.
String tokens[] = s.split(<regex>);
So the tokens array here should contain following string tokens:
"Just ","a s","a'mpl","e 'St","ring."
Note that the second token "a s" is expected to have only three characters. Splitting it as "a sa'" or "a sa" would be incorrect, because the non-alphanumeric character "'" (single quote) would be just before the split index in the first case and just after it in the second.
I am trying to achieve this using Java.
I have tried iterating on String and decrementing the split index if I find neighbor character of split index other than alphanumeric or white space. I feel that my current approach is making the code complicated and using regex instead will simplify the code.

See whether this code helps
(?<=[\w\d\s.]|^)[\w\d\s.].{0,3}[\w\d\s.](?=[\w\d\s.]|$)
neighboring characters at the split index are either alphanumeric or white space
I am assuming you want to match . as well as alphanumeric characters on the edges, since your last group has it.
In java it would be like this (code provided by regex101)
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "(?<=[\\w\\d\\s.]|^)[\\w\\d\\s.].{0,3}[\\w\\d\\s.](?=[\\w\\d\\s.]|$)";
final String string = "Just a sa'mple 'String.";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
result:
Full match: Just
Full match: a s
Full match: a'mpl
Full match: e 'St
Full match: ring.

Remove leading trailing non numeric characters from a string in Java

I need to strip off all the leading and trailing characters from a string up to the first and last digit respectively.
Example : OBC9187A-1%A
Should return : 9187A-1
How do I achieve this in Java?
I understand regex is the solution, but I am not good at it.
I tried this replaceAll("([^0-9.*0-9])","")
But it returns only digits and strips all the alpha/special characters.

Here is a self-contained example of using regex and Java to solve your problem. I would suggest looking at a regex tutorial of some kind.
public static void main(String[] args) {
String test = "OBC9187A-1%A";
Pattern p = Pattern.compile("\\d.*\\d");
Matcher m = p.matcher(test);
while (m.find()) {
System.out.println("Match: " + m.group());
}
}
Output:
Match: 9187A-1
\d matches any digit, .* matches anything 0 or more times, and the second \d matches any digit. We write \\d in the Java source because \ is the escape character in Java string literals. So this regex matches a digit, followed by anything, followed by another digit, which means the input needs at least two digits. Because .* is greedy, it takes the longest possible match, spanning from the first digit to the last digit and everything in between. The while loop is there in case there is more than one match; here there can be only one, so you can keep the while loop or change it to an if like this:
if(m.find())
{
System.out.println("Match: " + m.group());
}

This will strip leading and trailing non-digit characters from string s.
String s = "OBC9187A-1%A";
s = s.replaceAll("^\\D+", "").replaceAll("\\D+$", "");
System.out.println(s);
// prints 9187A-1
Regex explanation
^\D+
^ assert position at start of the string
\D+ match any character that's not a digit [^0-9]
Quantifier: + Between one and unlimited times, as many times as possible
\D+$
\D+ match any character that's not a digit [^0-9]
Quantifier: + Between one and unlimited times, as many times as possible
$ assert position at end of the string
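The two replaceAll calls can also be folded into one by joining both patterns with alternation. This is only a sketch; the class and method names are invented for the example.

```java
final class DigitTrim {
    // Removes a leading run and a trailing run of non-digits in a single replaceAll.
    static String trimToDigits(String s) {
        return s.replaceAll("^\\D+|\\D+$", "");
    }

    public static void main(String[] args) {
        System.out.println(trimToDigits("OBC9187A-1%A")); // prints 9187A-1
    }
}
```

Unlike the \d.*\d approach, this also works when the input contains a single digit, and it returns an empty string when there are no digits at all.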

Extracting both matching and not matching regex

I have a String like this one: abc3a de'f gHi?jk. I want to split it into the substrings abc3a, de'f, gHi, ? and jk. In other words, I want to return the Strings that match the regular expression [a-zA-Z0-9'] and the Strings that do not match it. If there is a way to tell whether each resulting substring is a match or not, that would be a plus.
Thanks!

import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class HelloWorld{
public static void main(String []args){
Pattern pattern = Pattern.compile("([a-zA-Z0-9']*)?([^a-zA-Z0-9']*)?");
String str = "abc3a de'f gHi?jk";
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
if(matcher.group(1).length() > 0)
System.out.println("Match:" + matcher.group(1));
if(matcher.group(2).length() > 0)
System.out.println("Miss: `" + matcher.group(2) + "`");
}
}
}
Output:
Match:abc3a
Miss: ` `
Match:de'f
Miss: ` `
Match:gHi
Miss: `?`
Match:jk
If you don't want the whitespace reported:
Pattern pattern = Pattern.compile("([a-zA-Z0-9']*)?([^a-zA-Z0-9'\\s]*)?");
Output:
Match:abc3a
Match:de'f
Match:gHi
Miss: `?`
Match:jk

You can use this regex:
"[a-zA-Z0-9']+|[^a-zA-Z0-9' ]+"
Will give:
["abc3a", "de'f", "gHi", "?", "jk"]
Online Demo: http://regex101.com/r/xS0qG4
Java code:
Pattern p = Pattern.compile("[a-zA-Z0-9']+|[^a-zA-Z0-9' ]+");
Matcher m = p.matcher("abc3a de'f gHi?jk");
while (m.find())
System.out.println(m.group());
OUTPUT
abc3a
de'f
gHi
?
jk
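The question also asks how to tell whether each substring is a match or not. One possible way, sketched here with the same two alternatives wrapped in named groups, is to check which group is non-null after each find(). The group names hit and miss, the class name, and the output labels are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenClassifier {
    // Same alternatives as the regex above, each wrapped in a named group.
    private static final Pattern TOKEN =
            Pattern.compile("(?<hit>[a-zA-Z0-9']+)|(?<miss>[^a-zA-Z0-9'\\s]+)");

    static List<String> classify(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Exactly one of the two named groups is non-null for each match.
            if (m.group("hit") != null) {
                out.add("Match:" + m.group("hit"));
            } else {
                out.add("Miss:" + m.group("miss"));
            }
        }
        return out;
    }
}
```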

myString.split("\\s+|(?<=[a-zA-Z0-9'])(?=[^a-zA-Z0-9'\\s])|(?<=[^a-zA-Z0-9'\\s])(?=[a-zA-Z0-9'])")
splits at all the boundaries between runs of characters inside that charset and runs outside it.
The lookbehind (?<=...) matches after a character in one run, while the lookahead (?=...) matches before a character in the next run, so together they match the zero-width boundary where one kind of run ends and the other begins.
The \\s+ is not a boundary match; it matches a run of whitespace characters. This has the effect of removing whitespace from the result entirely.
The | causes splitting to happen at either kind of boundary or at a run of whitespace.
Since the lookbehind and lookahead are both positive, the boundaries will not match at the start or end of the string, so there's no need to ignore empty strings in the output unless there is white-space there.
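A quick sketch of that split on the question's sample string follows. The constant and class names are illustrative; the regex is the one above, assembled from two pieces for readability.

```java
import java.util.Arrays;

final class BoundarySplitDemo {
    static final String SET = "[a-zA-Z0-9']";
    static final String NOT_SET = "[^a-zA-Z0-9'\\s]";

    // Splits on whitespace runs and on zero-width boundaries between the two kinds of run.
    static String[] tokens(String s) {
        return s.split("\\s+"
                + "|(?<=" + SET + ")(?=" + NOT_SET + ")"
                + "|(?<=" + NOT_SET + ")(?=" + SET + ")");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("abc3a de'f gHi?jk")));
    }
}
```

With leading whitespace, as in " a?", the first element of the result is an empty string, which is the case the last sentence above warns about.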

You can use lookarounds to split:
private static String[] splitString(final String s) {
final String [] arr = s.split("(?=[^a-zA-Z0-9'])|(?<=[^a-zA-Z0-9'])");
final ArrayList<String> strings = new ArrayList<String>(arr.length);
for (final String str : arr) {
if(!"".equals(str.trim())) {
strings.add(str);
}
}
return strings.toArray(new String[strings.size()]);
}
(?=xxx) means xxx follows this position and (?<=xxx) means xxx precedes this position.
As you did not want to include all-whitespace matches in the result, you need to filter the array returned by split.

Iterating through String with .find() in Java regex

I'm currently trying to solve a problem from codingbat.com with regular expressions.
I'm new to this, so step-by-step explanations would be appreciated. I could solve this with String methods relatively easily, but I am trying to use regular expressions.
Here is the prompt:
Given a string and a non-empty word string, return a string made of each char just before and just after every appearance of the word in the string. Ignore cases where there is no char before or after the word, and a char may be included twice if it is between two words.
wordEnds("abcXY123XYijk", "XY") → "c13i"
wordEnds("XY123XY", "XY") → "13"
wordEnds("XY1XY", "XY") → "11"
etc
My code thus far:
String regex = ".?" + word+ ".?";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
String newStr = "";
while(m.find())
newStr += m.group().replace(word, "");
return newStr;
The problem is that when there are multiple instances of word in a row, the program misses the character preceding the word because m.find() progresses beyond it.
For example: wordEnds("abc1xyz1i1j", "1") should return "cxziij", but my method returns "cxzij", not repeating the "i"
I would appreciate a non-messy solution with an explanation I can apply to other general regex problems.

This is a one-liner solution:
String wordEnds = input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
This matches your edge case as a look ahead within a non-capturing group, then matches the usual (consuming) case.
Note that your requirements don't call for iteration; only the question title assumes it is necessary.
Note also that to be absolutely safe, you should escape all characters in word in case any of them are special "regex" characters, so if you can't guarantee that, you need to use Pattern.quote(word) instead of word.
Here's a test of the usual case and the edge case, showing it works:
public static String wordEnds(String input, String word) {
word = Pattern.quote(word); // add this line to be 100% safe
return input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
}
public static void main(String[] args) {
System.out.println(wordEnds("abcXY123XYijk", "XY"));
System.out.println(wordEnds("abc1xyz1i1j", "1"));
}
Output:
c13i
cxziij

Use positive lookbehind and positive lookahead, which are zero-width assertions:
(?<=(.)|^)1(?=(.)|$)
(?<=(.)|^) : looks for a character just before 1 and captures it in group 1. This is a zero-width assertion: it only tests and does not consume input, which is what lets us capture the values
1 : matches 1; you can replace it with any word
(?=(.)|$) : looks for a character just after 1 and captures it in group 2
Groups 1 and 2 contain your values. Keep calling find() until the end of the string.
So this should be like
String s1 = "abcXY123XYiXYjk";
String s2 = java.util.regex.Pattern.quote("XY");
String s3 = "";
String r = "(?<=(.)|^)"+s2+"(?=(.)|$)";
Pattern p = Pattern.compile(r);
Matcher m = p.matcher(s1);
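while(m.find()) {
// a group is null when the word sits at the very start or end of the string
if (m.group(1) != null) s3 += m.group(1);
if (m.group(2) != null) s3 += m.group(2);
}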
//s3 now contains c13iij

Use regex as follows:
// a is the input string, b is the word, c is a String accumulator that starts out empty
Matcher m = Pattern.compile("(.|)" + Pattern.quote(b) + "(?=(.?))").matcher(a);
while (m.find()) c += m.group(1) + m.group(2);
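The root cause in the question is that ".?" consumes the neighboring characters, so the next find() starts past them. One way around that, sketched below, is to match only the word itself and read the neighbors by index from the original string. This is an alternative to the answers above, not a restatement of them, and the class name is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordEndsSketch {
    static String wordEnds(String str, String word) {
        // Quote the word so regex metacharacters in it are treated literally.
        Matcher m = Pattern.compile(Pattern.quote(word)).matcher(str);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Neighbors are looked up by index, so they are never consumed by the match.
            if (m.start() > 0) {
                sb.append(str.charAt(m.start() - 1));
            }
            if (m.end() < str.length()) {
                sb.append(str.charAt(m.end()));
            }
        }
        return sb.toString();
    }
}
```

For example, wordEnds("abc1xyz1i1j", "1") yields "cxziij", with the "i" repeated as the prompt requires.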

java regex to accept any word other than none

I need a regular expression to match any string other than none.
I tried using
regular exp ="^[^none]$",
But it does not work.

If you are matching a String against a specific word in Java you should use equals(). In this case you want to invert the match so your logic becomes:
if(!theString.equals("none")) {
// do stuff here
}
Much less resource hungry, and much more intuitive.
If you need to match a String which contains the word "none", you are probably looking for something like:
if(theString.matches(".*\\bnone\\b.*")) {
/* matches() must match the whole String, hence the surrounding .*;
* "none" has to be enclosed between “word boundaries”, so this
* will not match for example: "nonetheless"
*/
}
Or, if you can be fairly certain that “word boundaries” mean a specific delimiter, you can still avoid regular expressions by using the indexOf() method:
int i = theString.indexOf("none");
if(i > -1) {
if(i > 0) {
// check theString.charAt(i - 1) to see if it is a word boundary
// e.g.: whitespace
}
// the 4 is because of the fact that "none" is 4 characters long.
if((theString.length() - i - 4) > 0) {
// check theString.charAt(i + 4) to see if it is a word boundary
// e.g.: whitespace
}
}
else {
// not found.
}
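The skeleton above leaves the boundary checks as comments. A possible completion is sketched here, under the assumption that a "word boundary" means "no letter or digit directly adjacent"; that differs slightly from \b, which also treats the underscore as a word character. The class and method names are hypothetical.

```java
final class NoneWordCheck {
    // True when "none" occurs with no letter or digit directly before or after it.
    static boolean containsWordNone(String s) {
        final String word = "none";
        int i = s.indexOf(word);
        while (i >= 0) {
            int end = i + word.length();
            boolean startOk = i == 0 || !Character.isLetterOrDigit(s.charAt(i - 1));
            boolean endOk = end == s.length() || !Character.isLetterOrDigit(s.charAt(end));
            if (startOk && endOk) {
                return true;
            }
            // Keep searching: a later occurrence may still qualify.
            i = s.indexOf(word, i + 1);
        }
        return false;
    }
}
```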

You can use the regular expression (?!^none$).*. See this question for details: Regex inverse matching on specific string?
The reason "^[^none]$" doesn't work is that [^none] is a character class: the pattern matches exactly one character that is not "n", "o", or "e", so it only accepts single-character strings such as "a".
Of course, it would be easier to just use String.equals like so: !"none".equals(testString).
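A short sketch comparing the two patterns with String.matches follows; the class and method names are illustrative.

```java
final class NotNoneDemo {
    // The pattern from the question: a single character other than n, o, or e.
    static boolean brokenMatches(String s) {
        return s.matches("^[^none]$");
    }

    // The negative-lookahead pattern: any string except exactly "none".
    static boolean notNone(String s) {
        return s.matches("(?!^none$).*");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "n", "abc", "none", "nonetheless"}) {
            System.out.println(s + ": broken=" + brokenMatches(s) + " notNone=" + notNone(s));
        }
    }
}
```

For single-line input, notNone(s) agrees with !"none".equals(s), which is why the plain equals check is usually the better choice.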

Actually this is the regex to match all words except "word":
Pattern regex = Pattern.compile("\\b(?!word\\b)\\w+\\b");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// matched text: regexMatcher.group()
// match start: regexMatcher.start()
// match end: regexMatcher.end()
}
You must use word boundaries so that "word" is not contained in other words.
Explanation:
"
\b # Assert position at a word boundary
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
word # Match the characters “word” literally
\b # Assert position at a word boundary
)
\w # Match a single character that is a “word character” (letters, digits, etc.)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b # Assert position at a word boundary
"

This is the regex you are looking for:
String s = "your string";
Pattern p = Pattern.compile("^(?!none$).*$");
Matcher m = p.matcher(s);
System.out.println(s + ": " + (m.matches() ? "Match" : "NO Match"));
That said, if you are not forced to use a regex that matches everything but "none", this is simpler, faster, clearer, and easier to write and understand:
Pattern p = Pattern.compile("^none$");
Then, you just exclude the matches.
Matcher m = p.matcher(s);
System.out.println(s + ": " + (m.matches() ? "NO Match" : "Match"));
