Regex to check if String is one word in Java


I need a regex to check whether a String contains only one word (e.g. "This", "Country", "Boston ", " Programming ").
So far I have used an alternative approach, which is to check whether the String contains spaces. However, I am sure that this can be done with a regex.
One possible pattern, in my opinion, is "^\w{2,}\s". Does this work properly? Are there any other possible answers?

The pattern ^\w{2,}\s matches 2 or more word characters from the start of the string, followed by a mandatory whitespace char (which can also match a newline).
As the pattern has no anchor at the end, it can also match Boston in Boston test. Because the whitespace char is mandatory, it does not match a word without trailing whitespace, such as This.
If you want to match a single word with at least 2 characters surrounded by optional horizontal whitespace characters, you can use \h* on both sides and add an anchor $ to assert the end of the string.
^\h*\w{2,}\h*$
In Java
String regex = "^\\h*\\w{2,}\\h*$";
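As a rough sketch of how the two patterns behave on the inputs from the question (the class and method names are made up for illustration, and \h assumes Java 8 or later):

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class OneWordCheck {
    // Pattern from the question: anchored at the start only, trailing whitespace is mandatory.
    static final Pattern ORIGINAL = Pattern.compile("^\\w{2,}\\s");
    // Suggested pattern: one word of 2+ word chars with optional horizontal whitespace around it.
    static final Pattern ONE_WORD = Pattern.compile("^\\h*\\w{2,}\\h*$");

    static boolean isOneWord(String s) {
        // matches() requires the whole input to match the pattern
        return ONE_WORD.matcher(s).matches();
    }

    static boolean originalFinds(String s) {
        // find() is satisfied by a partial match
        return ORIGINAL.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"This", "Boston ", " Programming ", "Boston test"}) {
            System.out.println("\"" + s + "\" -> original: " + originalFinds(s)
                    + ", suggested: " + isOneWord(s));
        }
    }
}
```

With these inputs the original pattern rejects "This" and accepts "Boston test", while the suggested pattern accepts the three single words and rejects "Boston test".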

Related

Regex pattern matching with multiple strings

Forgive me, I am not very familiar with regex patterns.
I have created a regex pattern as below.
String regex = Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
+ Pattern.quote(value);
This regex pattern gives different results for 2 sets of strings.
value = "207e/160";
Use Case 1 -
When channelStr = "207e/160, 149/80"
Then channelStr.matches(regex) returns "true".
Use Case 2 -
When channelStr = "207e/160, 149/80, 11"
Then channelStr.matches(regex) returns "false".
I am not able to figure out why. As far as I can understand, it may be because of the multiple spaces involved when more than 2 strings are present, separated by commas.
I am not sure what the correct pattern should be for more than 2 strings.
Any help will be appreciated.

If you print your pattern, it is:
\Q207e/160\E, [NnoneOoff0-9\-\+\/]+|[NnoneOoff0-9\-\+\/]+, \Q207e/160\E
It consists of an alternation | whose left side and right side each match exactly one mandatory comma.
matches() has to match the whole string, and that is the case for 207e/160, 149/80, so that is a match.
In the string 207e/160, 149/80, 11 there are 2 commas, so you do get a partial match for the first part of the string, but you don't match the whole string, so matches() returns false.
See the matches in this regex demo.
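To make the difference between a full and a partial match concrete, here is a small sketch that rebuilds a pattern equivalent to the one in the question and compares matches() with find(). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the resulting regex is the same string the question builds.
class MatchesVersusFind {
    static Pattern originalPattern(String value) {
        String item = "[NnoneOoff0-9\\-\\+\\/]+";
        return Pattern.compile(
                Pattern.quote(value) + ", " + item + "|" + item + ", " + Pattern.quote(value));
    }

    public static void main(String[] args) {
        Pattern p = originalPattern("207e/160");
        String two = "207e/160, 149/80";
        String three = "207e/160, 149/80, 11";

        // matches() must consume the entire input
        System.out.println(p.matcher(two).matches());   // true
        System.out.println(p.matcher(three).matches()); // false

        // find() is satisfied by the partial match "207e/160, 149/80"
        System.out.println(p.matcher(three).find());    // true
    }
}
```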
To match all the values, you can use a repeating pattern:
^[NnoeOf0-9+/-]+(?:,\h*[NnoeOf0-9+/-]+)*$
^ Start of string
[NnoeOf0-9+/-]+ Match 1+ of any of the characters listed in the character class
(?: Non capture group
,\h* Match a comma and optional horizontal whitespace chars
[NnoeOf0-9+/-]+ Match 1+ of any of the characters listed in the character class
)* Close the non capture group and optionally repeat it (if there should be at least 1 comma, then the quantifier can be + instead of *)
$ End of string
Example using matches():
String channelStr1 = "207e/160, 149/80";
String channelStr2 = "207e/160, 149/80, 11";
String regex = "^[NnoeOf0-9+/-]+(?:,\\h*[NnoeOf0-9+/-]+)*$";
System.out.println(channelStr1.matches(regex));
System.out.println(channelStr2.matches(regex));
Output
true
true
Note that in the character class you can put - at the end so that it does not have to be escaped, and the + and / do not have to be escaped either.

You can use regex101 to test your regex. It describes everything that is going on in the pattern, which helps with debugging. There is a quick reference section at the bottom right that shows what you can do, with examples.
A few things: you can add literals with \, so \" for a literal double quote.
If you want the pattern to be one or more of something, you would use +. These are called quantifiers and can be applied to groups, tokens, etc. The token for a whitespace character is \s. So, one or more whitespace characters would be \s+.
It's difficult to tell exactly what you're trying to do, but hopefully pointing you to regex101 will help. If you provide the regex you currently have, what you want to match, and the strings you're using to test it, I'll be happy to provide an example.

^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$
^ Start
(?: ... ) Non-capturing group that defines an item and its separator. After each item, except the last, the separator (,) must appear. Spaces (one, several, or none) can appear before and after the comma, which is specified with *. This group can appear one or more times to the end of the string, as specified by the + quantifier after the group's closing parenthesis.
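A minimal sketch of how this pattern could be used for validation, with a hypothetical class and method name. The negative lookahead (?!$) after the comma is what rejects a trailing comma.

```java
import java.util.regex.Pattern;

// Hypothetical helper wrapping the pattern from this answer.
class ChannelListCheck {
    static final Pattern LIST =
            Pattern.compile("^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$");

    static boolean isValidList(String s) {
        return LIST.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidList("207e/160, 149/80"));     // true
        System.out.println(isValidList("207e/160, 149/80, 11")); // true
        System.out.println(isValidList("207e/160,"));            // false: comma at the end
        System.out.println(isValidList("207e/160 149/80"));      // false: separator missing
    }
}
```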

Replace white spaces only in part of the string

I have a String like
"This is apple tree"
I want to remove the white spaces up to the word apple. After the change it will be like
"Thisisapple tree"
I need to achieve this in a single replace command combined with regular expressions.

For now it looks like you may be looking for
String s = "This is apple tree";
System.out.println(s.replaceAll("\\G(\\S+)(?<!(?<!\\S)apple)\\s", "$1"));
Output: Thisisapple tree.
Explanation:
\G represents either the end of the previous match or the start of the input (^) if there was no previous match yet (when we are attempting to find the first match)
\S+ represents one or more non-whitespace characters (to match words, including non-alphabetic characters like ' or punctuation)
(?<!(?<!\S)apple)\s the negative look-behind prevents accepting whitespace which has apple before it (I added another negative look-behind before apple to make sure that there is no non-whitespace character in front of it, which ensures that it is not part of some other word)
$1 in the replacement represents the match from group 1 (the one from (\S+)), which is the word. So we are replacing the word and the space with only the word (effectively removing the space)
WARNING: This solution assumes that
the sentence doesn't start with a space,
words are separated by exactly one space.
If we want to get rid of these assumptions we would need something like:
System.out.println(s.replaceAll("^\\s+|\\G(\\S+)(?<!(?<!\\S)apple)\\s+", "$1"));
^\s+ will allow us to match spaces at beginning of string (and replace them with content of group 1 (word) which in this case will be empty, so we will simply remove these whitespaces)
\s+ at the end allows us to match word and one or more spaces after it (to remove them)
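The more tolerant variant can be wrapped in a small helper to check it against inputs with leading and repeated spaces. This is only a sketch: the class and method names are invented, and "apple" stays hard-coded as in the answer.

```java
// Hypothetical wrapper around the second regex from this answer.
class SpacesBeforeWord {
    static final String REGEX = "^\\s+|\\G(\\S+)(?<!(?<!\\S)apple)\\s+";

    static String collapseUntilApple(String s) {
        // When only the ^\s+ branch matches, group 1 did not participate
        // and $1 contributes nothing to the replacement.
        return s.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseUntilApple("This is apple tree"));      // Thisisapple tree
        System.out.println(collapseUntilApple("  This   is apple tree"));  // Thisisapple tree
    }
}
```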

A single replace() is unlikely to solve your problem. You could do something like this:
// The limit 2 splits only at the first "apple" and keeps the remainder, even when it is empty
String[] s = "This is an apple tree, not an orange tree".split("apple", 2);
System.out.println(new StringBuilder(s[0].replace(" ", "")).append("apple").append(s[1]));

This is achieved via a lookahead assertion, like this:
String str = "This is an apple tree";
System.out.println(str.replaceAll(" (?=.*apple)", ""));
It means: remove every space that is followed, anywhere later in the string, by the word apple.
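One consequence worth seeing in a quick sketch (hypothetical class and method names): because the lookahead only asks whether apple occurs somewhere to the right, spaces are removed up to the last occurrence of apple, and apple inside a longer word such as pineapple would count as well.

```java
// Hypothetical wrapper around the lookahead replacement from this answer.
class LookaheadSpaces {
    static String removeSpacesBeforeApple(String s) {
        // Remove each space that still has "apple" somewhere to its right
        return s.replaceAll(" (?=.*apple)", "");
    }

    public static void main(String[] args) {
        // prints: Thisisanapple tree
        System.out.println(removeSpacesBeforeApple("This is an apple tree"));
        // prints: Thisisanapplebutthisapple is an orange
        System.out.println(removeSpacesBeforeApple("This is an apple but this apple is an orange"));
    }
}
```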

If you want to use a regular expression you could try:
Matcher matcher = Pattern.compile("^(.*?\\bapple\\b)(.*)$").matcher("This is an apple but this apple is an orange");
System.out.println((!matcher.matches()) ? "No match" : matcher.group(1).replaceAll(" ", "") + matcher.group(2));
This checks that "apple" is an individual word and not just part of another word such as "snapple". It also splits at the first use of "apple".

Why does this pattern not match? ([\\\\A\\\\W]its[\\\\W\\\\z])

I'm trying to do a replace with this pattern, so I need to match this:
String pattern = "[\\\\A\\\\W]its[\\\\W\\\\z]";
The way I'm interpreting my pattern is: either a beginning of the string OR a non word character like a space or comma, then an "its", then a non word character OR the end of the string.
Why doesn't it match on this "its" inside this string?
its about time
The idea is to detect incorrectly written words like "its" and fix them to "it's".
Also, why do I need so many escape characters for the pattern to be accepted by the VM at all?

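\A and \z are boundary matchers, so they cannot go inside character classes. With four backslashes in the Java string literal, the regex engine sees \\A, which is a literal backslash followed by the letter A, so your classes simply match one of the characters \, A, W or z. If you wrote them properly, i.e. with two backslashes instead of four, the regex pattern compiler would throw an exception, because \A or \z cannot go inside [] blocks.
Use plain | alternation with non-capturing groups instead: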
String pattern = "(?:\\A|\\W)its(?:\\W|\\z)";
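The following sketch contrasts the three spellings. The class and helper names are hypothetical, and the replacement at the end uses capturing groups instead of the non-capturing ones so that the neighbouring characters can be put back; that variation is an assumption of this example, not part of the answer.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class.
class ItsPatternDemo {
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    static String fixIts(String input) {
        // Capturing variant (assumption): keep what surrounded "its" via $1 and $2
        return input.replaceAll("(\\A|\\W)its(\\W|\\z)", "$1it's$2");
    }

    public static void main(String[] args) {
        String input = "its about time";
        // Four backslashes: the classes contain only literal characters, so nothing precedes "its"
        System.out.println(found("[\\\\A\\\\W]its[\\\\W\\\\z]", input)); // false
        // Two backslashes: a boundary matcher inside a class is rejected by the compiler
        System.out.println(compiles("[\\A\\W]its[\\W\\z]"));             // false
        // Alternation outside a class works
        System.out.println(found("(?:\\A|\\W)its(?:\\W|\\z)", input));   // true
        System.out.println(fixIts(input));                               // it's about time
    }
}
```

Because each match consumes the character on either side of its, two occurrences separated by a single space would not both be replaced in one pass.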

Escape symbol while spliting string using regex in java

I have a string that I received while parsing an XML document:
"ListOfItems/Item[Name='Model/Id']/Price"
I need to split it by the delimiter "/":
String[] nodes = path.split("/")
but with one condition:
"If a slash is present in the name of an item, as in the example above, I must skip this block and not split it."
I.e. after splitting I must get the following array of nodes:
ListOfItems, Item[Name='Model/Id'], Price
How can I do this using a regex?
Thanks for your help!

You can split using this regex:
/(?=(?:(?:[^']*'){2})*[^']*$)
This regex splits only on forward slashes / that are followed by an even number of single quotes, which in other words means that a / inside single quotes is not matched for splitting.
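Applied to the path from the question, the split could look like this (a minimal sketch; the class and method names are made up):

```java
import java.util.Arrays;

// Hypothetical helper around the split regex from this answer.
class QuoteAwareSplit {
    // A slash followed by an even number of single quotes up to the end of the string
    static final String SLASH_OUTSIDE_QUOTES = "/(?=(?:(?:[^']*'){2})*[^']*$)";

    static String[] splitPath(String path) {
        return path.split(SLASH_OUTSIDE_QUOTES);
    }

    public static void main(String[] args) {
        String path = "ListOfItems/Item[Name='Model/Id']/Price";
        // prints: [ListOfItems, Item[Name='Model/Id'], Price]
        System.out.println(Arrays.toString(splitPath(path)));
    }
}
```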

Another way is to use this pattern with the find method and to check whether the last match is empty. The advantage is that you don't need an additional lookahead that tests the string until the end for each possible position. The items you need are in capture group 1:
\G/?((?>[^/']+|'[^']*')*)|$
The \G is an anchor that matches either the start of the string or the position after the previous match. Using it forces all the matches to be contiguous.
(?>[^/']+|'[^']*')* defines the possible content of an item: anything that is not a / or a ', or a string between quotes.
Note that the description of a string between quotes can be improved to deal with escaped quotes: '(?>[^'\\]+|\\.)*' (with the s modifier)
The alternation with the $ is only here to ensure that the whole string has been parsed until the end. Capture group 1 of the last match must be empty. If it is null, this means that the global search stopped before the end (for example in case of unbalanced quotes).
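A possible way to drive this pattern with find() is sketched below. The class name, the method name, and the choice to throw an exception on a null group are assumptions of this example, based on the description of the last match given in this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser built around the \G pattern from this answer.
class ContiguousItemParser {
    static final Pattern ITEM = Pattern.compile("\\G/?((?>[^/']+|'[^']*')*)|$");

    static List<String> parse(String path) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(path);
        while (m.find()) {
            String item = m.group(1);
            if (item == null) {
                // Only the bare $ branch matched: the contiguous chain broke before the end
                throw new IllegalArgumentException("Could not parse the whole path: " + path);
            }
            if (m.start() == path.length()) {
                break; // empty match at the very end: the whole string was consumed
            }
            items.add(item);
        }
        return items;
    }

    public static void main(String[] args) {
        // prints: [ListOfItems, Item[Name='Model/Id'], Price]
        System.out.println(parse("ListOfItems/Item[Name='Model/Id']/Price"));
    }
}
```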

capture all characters between match character (single or repeated) on string

I'm trying to extract the string preceding a specific character, even when that character is repeated (e.g. underscore '_'):
this_is_my_example_line_0
this_is_my_example_line_1_
this_is_my_example_line_2___
_this_is_my_ _example_line_3_
__this_is_my___example_line_4__
and after running my regex I should get this (the regex should ignore any instances of the matching character in the middle of the string):
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4
In other words I'm trying to 'trim' the matched character(s) at the beginning and end of string.
I'm trying to use a Regex in Java to accomplish this, my idea is to capture the group of characters between the special character(s) at the end or beginning of the line.
So far I can only do this successfully for example 3 with this regexp:
/[^_]+|_+(.*)[_$]+|_$+/
[^_]+ not 'underscore' once or more
| OR
_+ underscore once or more
(.*) capture all characters
[_$]+ not 'underscore' once or more followed by end of line
|_$+ OR 'underscore' once or more followed by end of line
I just realized that this excludes the first word of the message in examples 0, 1 and 2, since the string doesn't start with an underscore and the pattern only starts matching after finding one.
Is there an easier way not involving regex?
I don't really care about the first character (although it would be nice); I only need to ignore the repeating character at the end. It looks like (according to this regex tester) just using /()_+$/ would work: the empty parentheses match anything before a single or repeated underscore at the end of the line. Would that be correct?
Thank you!

There are a couple of options here: you could either replace matches of ^_+|_+$ with an empty string, or extract the contents of the first capture group from the match of ^_*(.*?)_*$. Note that if your strings may span multiple lines and you want to perform the replacement on each line, you will need the Pattern.MULTILINE flag for either approach. If your strings may span multiple lines and you only want the replacement to occur at the very beginning and end, don't use Pattern.MULTILINE, but use Pattern.DOTALL for the second approach.
For example: http://regexr.com?355ff
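Both options might be sketched as follows. The class and method names are invented for this example; the replace variant uses Pattern.MULTILINE so that it trims every line, while the capture variant is meant for single-line input only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper showing both approaches from this answer.
class UnderscoreTrim {
    // Option 1: delete underscores at the start and end of each line
    static final Pattern EDGES = Pattern.compile("^_+|_+$", Pattern.MULTILINE);
    // Option 2: capture everything between the leading and trailing underscores
    static final Pattern INNER = Pattern.compile("^_*(.*?)_*$");

    static String trimByReplace(String s) {
        return EDGES.matcher(s).replaceAll("");
    }

    static String trimByCapture(String s) {
        Matcher m = INNER.matcher(s);
        // Without DOTALL the dot stops at line breaks, so multi-line input is returned unchanged
        return m.matches() ? m.group(1) : s;
    }

    public static void main(String[] args) {
        System.out.println(trimByReplace("__this_is_my___example_line_4__")); // this_is_my___example_line_4
        System.out.println(trimByCapture("_this_is_my_ _example_line_3_"));   // this_is_my_ _example_line_3
    }
}
```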

How about this pattern: [^_\n\r](.*[^_\n\r])?
String data =
        "this_is_my_example_line_0\n" +
        "this_is_my_example_line_1_\n" +
        "this_is_my_example_line_2___\n" +
        "_this_is_my_ _example_line_3_\n" +
        "__this_is_my___example_line_4__";
Pattern p = Pattern.compile("[^_\n\r](.*[^_\n\r])?");
Matcher m = p.matcher(data);
while (m.find()) {
    System.out.println(m.group());
}
output:
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4

