Regex for text in brackets which contains no specific words


I found some expressions, like
\((.*?)\)
which works well for finding any text in brackets, together with the brackets, like (text)
and
^((?!.*(word1|word2|word3)).*)+
which finds text that contains none of the specific words, e.g. it matches word4abcd but not word1 test.
How can I merge them to find text in brackets that does not contain these words, so that it matches (example) but not (example word2)?
Thanks in advance

The first regular expression uses a reluctant quantifier *? to make sure that the first available closing bracket is matched after the opening bracket. The second regular expression uses a zero-width negative look-ahead group (that's the (?!...) construction) to prevent matching certain words. To combine these tricks we're looking at something like this:
\(((?!something).*?)\)
The question is what goes in the place of the something. Simply putting .*(word1|word2) there will not work: because .* can run past the closing bracket, this will also forbid word1 or word2 outside the brackets. Replacing .* by .*? does not change that. What does work is [^)]*(word1|word2), which matches any sequence of characters other than ) followed by word1 or word2.
The resulting expression is
\(((?![^)]*(word1|word2)).*?)\)
which will match a bracketed expression that does not contain word1 or word2.
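A minimal Java sketch of the combined expression, assuming the forbidden words are literally word1 and word2 as in the question; the class and method names are illustrative only. It collects the text inside every bracket pair whose contents avoid both words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketWordFilter {
    // \( ... \)              : a bracketed span, closed at the first ")" by the reluctant .*?
    // (?![^)]*(word1|word2)) : reject it if word1/word2 occurs before the next ")"
    private static final Pattern ALLOWED =
            Pattern.compile("\\(((?![^)]*(word1|word2)).*?)\\)");

    // Returns the contents (group 1) of every accepted bracket pair.
    static List<String> allowedContents(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = ALLOWED.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [example, ok]: "(example word2)" is skipped.
        System.out.println(allowedContents("(example) and (example word2) and (ok)"));
    }
}
```

Because [^)] cannot cross a closing bracket, a forbidden word in one pair of brackets does not affect the pairs around it.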

The question isn't very clear, but here is an example that might help you.
Look at this regex:
/\((?!.*?\bexample\s+word\b[^)]*\)).*?\bexample\b.*?\)/
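This matches a pair of parentheses containing the word example, provided it is not followed by word (i.e. the phrase example word). Note that the .*? inside the look-ahead can run past the closing ), so an example word in a later pair of parentheses on the same line also blocks the match. In Java, drop the surrounding / delimiters and double each backslash.
\b marks a word boundary, so that longer words such as examples are not matched.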

Maybe this is what you want?
^\((?!.*word1)\S*\)$
Match: (wordwordexample)
No match: (word wordexample), (word1wordexample)
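A small Java check, assuming the intent is to accept a whole bracketed string that contains no whitespace and no word1. It uses the anchored idea \((?!.*word1)\S*\) with String.matches, which already requires the entire string to match, so ^ and $ are not needed there. The class name is hypothetical.

```java
class NoSpaceNoWord1 {
    // Whole-string check (String.matches anchors both ends):
    // "(", no "word1" anywhere after it, only non-whitespace characters, then ")".
    static boolean accepts(String s) {
        return s.matches("\\((?!.*word1)\\S*\\)");
    }

    public static void main(String[] args) {
        String[] samples = {"(wordwordexample)", "(word wordexample)", "(word1wordexample)"};
        for (String s : samples) {
            System.out.println(s + " -> " + accepts(s)); // true, false, false
        }
    }
}
```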

Related

What is the regex for any number of spaces followed by one or more integers?

I am trying to make a regex which will match any string which looks like this:
User<spaces><Any positive integer here><spaces>Status:<anything here>
Sample expression - User 1 Status: Not Ready.
Regex pattern - ^[User].*\d+.*[Status:].*$
As you can see, I am incorrectly using ".*" to match spaces. I tried to use \s and [" "] instead, but they did not work. How do I handle spaces or tabs in this regex?
By the way, I am using https://regex101.com/ with JavaScript regex parser to validate my Regex. I don't know if there is any nice regex helper website just for Java and not JavaScript.
Thanks.

You are using character classes (those things surrounded by []) inappropriately. [] doesn't mean "match these characters literally"; it means "match any one character in this list", so [User] matches a single U, s, e or r. Most characters already mean "match this literally" on their own.
Also, decide whether you want a : after User; your sample input User 1 Status: Not Ready. has none, so the corrected version below leaves it out.
\s is indeed used to match whitespace. You probably thought it didn't work because your regex has other mistakes that keep the whole thing from matching.
A corrected version of your regex:
^User\s*\d+\s*Status:.*$
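A quick Java check of the corrected pattern; note the doubled backslashes in the string literal. The class and method names are illustrative, and the tab-separated and non-numeric samples are extra inputs added here to show that \s also covers tabs and that the number must be digits.

```java
import java.util.regex.Pattern;

class UserStatusCheck {
    // ^User \s* \d+ \s* Status: .*$   (\s covers spaces and tabs)
    private static final Pattern USER_STATUS =
            Pattern.compile("^User\\s*\\d+\\s*Status:.*$");

    static boolean isUserStatus(String line) {
        // matches() requires the entire line to fit the pattern
        return USER_STATUS.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUserStatus("User 1 Status: Not Ready.")); // true
        System.out.println(isUserStatus("User\t42\tStatus: Ready"));   // true
        System.out.println(isUserStatus("User one Status: Ready"));    // false
    }
}
```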

Find and replace characters in brackets

I have a string kind of:
String text = "(plum) some other words, [apple], another words {pear}.";
I have to find and replace the words in brackets without replacing the brackets themselves.
If I write:
text = text.replaceAll("(\\(|\\[|\\{).*?(\\)|\\]|\\})", "fruit");
I get:
fruit some other words, fruit, another words fruit.
So the brackets went away with the fruits, but I need to keep them.
Desired output:
(fruit) some other words, [fruit], another words {fruit}.

Here is your regex:
(?<=[({\[\(])[A-Za-z]*(?=[}\]\)])
Test it here:
https://regex101.com/
In order to use it in Java, remember to double the backslashes:
(?<=[({\\[\\(])[A-Za-z]*(?=[}\\]\\)])
It matches zero or more letters (uppercase or lowercase) preceded by one of [, {, ( and followed by one of ], }, ).
If you want to have at least 1 letter between brackets just replace '*' with '+' like this:
(?<=[({\[\(])[A-Za-z]+(?=[}\]\)])
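A Java sketch using String.replaceAll with the lookaround pattern. It uses the + variant so empty brackets are left alone and drops the duplicated ( from the look-behind class; the helper name is hypothetical. Like the regex above, it does not check that the opening and closing brackets are of the same kind.

```java
class KeepBracketsLookaround {
    // (?<=[({\[]) and (?=[}\])]) check for a bracket without consuming it,
    // so only the letters in between are replaced.
    static String replaceInBrackets(String text, String replacement) {
        return text.replaceAll("(?<=[({\\[])[A-Za-z]+(?=[}\\])])", replacement);
    }

    public static void main(String[] args) {
        String text = "(plum) some other words, [apple], another words {pear}.";
        // Prints: (fruit) some other words, [fruit], another words {fruit}.
        System.out.println(replaceInBrackets(text, "fruit"));
    }
}
```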

GCP showed how to use lookaheads and lookbehinds to exclude the brackets from the matched part. But you can also match them and refer to them in your replacement string with capturing groups:
text = text.replaceAll("([\\(\\[\\{]).*?([\\)\\]\\}])", "$1fruit$2");
Also note that you can replace the | alternations with a character class [].
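A runnable version of the capturing-group approach, as a sketch with an illustrative class name. Since Java strings are immutable, replaceAll returns a new string, so the result has to be assigned or returned.

```java
class KeepBracketsGroups {
    // Group 1 captures the opening bracket, group 2 the closing one;
    // "$1fruit$2" writes both back around the new text.
    static String replaceInBrackets(String text) {
        return text.replaceAll("([\\(\\[\\{]).*?([\\)\\]\\}])", "$1fruit$2");
    }

    public static void main(String[] args) {
        String text = "(plum) some other words, [apple], another words {pear}.";
        text = replaceInBrackets(text);
        // Prints: (fruit) some other words, [fruit], another words {fruit}.
        System.out.println(text);
    }
}
```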

match whole sentence with regex

I'm trying to match sentences without capital letters with regex in Java:
"Hi this is a test" -> Shouldn't match
"hi thiS is a test" -> Shouldn't match
"hi this is a test" -> Should match
I've tried the following regex, but it also matches my second example ("hi thiS is a test").
[a-z]+
It seems like it's only looking at the first word of the sentence.
Any help?

[a-z]+ will match if your string contains any lowercase letter.
If you want to make sure your string doesn't contain uppercase letters, you could use a negated character class: ^[^A-Z]+$
Be aware that this won't handle accented characters (like É), though.
To handle those, you can use Unicode properties: ^\P{Lu}+$
\P{...} means "not in this Unicode category", and Lu is the category of uppercase letters that have a lowercase variant.
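A Java comparison of the two patterns, as an illustrative sketch (class and method names are hypothetical). String.matches requires the whole string to match, so the ^ and $ anchors are omitted. The last sample, École, is an extra input showing where the ASCII-only class falls short.

```java
class NoUppercase {
    // ASCII-only: rejects A-Z but lets accented capitals such as É through.
    static boolean noAsciiUpper(String s) {
        return s.matches("[^A-Z]+");
    }

    // Unicode-aware: \P{Lu} is any character that is not an uppercase letter.
    static boolean noUnicodeUpper(String s) {
        return s.matches("\\P{Lu}+");
    }

    public static void main(String[] args) {
        String[] samples = {"Hi this is a test", "hi thiS is a test", "hi this is a test", "\u00C9cole"};
        for (String s : samples) {
            // prints "<sample> -> <ASCII result> / <Unicode result>"
            System.out.println(s + " -> " + noAsciiUpper(s) + " / " + noUnicodeUpper(s));
        }
    }
}
```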

^[a-z ]+$
Try this. It accepts only strings made of lowercase letters and spaces, so only your third example matches.

It doesn't work because you haven't put a space in the match pattern, so your regex only matches single words with no spaces.
Try something like ^[a-z ]+$ instead (notice the space inside the square brackets). You can also use \s, which is shorthand for 'whitespace characters', but be aware that it also includes things like line feeds and carriage returns.
This pattern does the following:
^ matches the start of a string
[a-z ]+ matches any a-z character or a space, where 1 or more exists.
$ matches the end of the string.
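A brief Java check of ^[a-z ]+$ with a few made-up extra samples, as an illustrative sketch. Besides uppercase letters, it also rejects punctuation such as a comma and any whitespace other than the plain space, for example a tab.

```java
class LowercaseAndSpaces {
    // Only a-z and the space character, at least one of them, across the whole string.
    static boolean isLowercaseSentence(String s) {
        return s.matches("^[a-z ]+$");
    }

    public static void main(String[] args) {
        System.out.println(isLowercaseSentence("hi this is a test"));  // true
        System.out.println(isLowercaseSentence("hi thiS is a test"));  // false
        System.out.println(isLowercaseSentence("hi, this is a test")); // false (comma)
        System.out.println(isLowercaseSentence("hi\tthis is a test")); // false (tab)
    }
}
```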

I would actually advise against regex in this case, since you don't seem to employ extended characters.
Instead, try testing it like this:
myString.equals(myString.toLowerCase());
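A small self-contained version of this check, as a sketch; the class and method names are hypothetical. It passes Locale.ROOT so the result does not depend on the JVM's default locale, which the no-argument toLowerCase() uses. Note that it also returns true for an empty string or a string without letters.

```java
import java.util.Locale;

class LowercaseCheck {
    // True when lower-casing leaves the string unchanged.
    static boolean hasNoUppercase(String s) {
        return s.equals(s.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(hasNoUppercase("hi this is a test")); // true
        System.out.println(hasNoUppercase("hi thiS is a test")); // false
        System.out.println(hasNoUppercase("\u00C9cole"));        // false
    }
}
```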

Stop regular expression from matching across lines

I have a regular expression,
end\\s+[a-zA-Z]{1}[a-zA-Z_0-9]
which is supposed to match a line with the specifications
end abcdef123
where abcdef123 must start with a letter, followed by alphanumeric characters.
However currently it is also matching this
foobar barfooend
bar fred bob
It's picking up the end at the end of barfooend and also picking up bar, in effect returning end bar as a legitimate result.
I tried
^end\\s+[a-zA-Z]{1}[a-zA-Z_0-9]
but that doesn't seem to work at all. It ends up matching nothing.
It should be fairly simple but I can't seem to nut it out.

\s also matches newline characters. So you need either a character class containing only the whitespace characters you want, or one that excludes the ones you don't.
Instead of \\s+, use one of these:
[^\\S\r\n]: all whitespace except \r and \n, e.g. end[^\S\r\n]+[a-zA-Z][a-zA-Z_0-9]+
[ \t]: only space and tab, e.g. end[ \t]+[a-zA-Z][a-zA-Z_0-9]+
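A Java sketch comparing both suggestions on the two inputs from the question; the class and method names are illustrative. Neither pattern can cross the line break after barfooend, because the \n is excluded from the whitespace they allow.

```java
import java.util.regex.Pattern;

class SameLineEnd {
    // [ \t]+ : only spaces and tabs between "end" and the identifier.
    private static final Pattern SPACES_TABS =
            Pattern.compile("end[ \\t]+[a-zA-Z][a-zA-Z_0-9]+");
    // [^\S\r\n]+ : any whitespace except carriage return and line feed.
    private static final Pattern NO_NEWLINE_WS =
            Pattern.compile("end[^\\S\\r\\n]+[a-zA-Z][a-zA-Z_0-9]+");

    static boolean matchesSpacesTabs(String text) {
        return SPACES_TABS.matcher(text).find();
    }

    static boolean matchesNoNewlineWhitespace(String text) {
        return NO_NEWLINE_WS.matcher(text).find();
    }

    public static void main(String[] args) {
        String good = "end abcdef123";
        String bad = "foobar barfooend\nbar fred bob";
        System.out.println(matchesSpacesTabs(good) + " " + matchesNoNewlineWhitespace(good)); // true true
        System.out.println(matchesSpacesTabs(bad) + " " + matchesNoNewlineWhitespace(bad));   // false false
    }
}
```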

You can use \b (word boundary detection) to check a word boundary. In our case we will use it to match the beginning of the word end. It can also be used to match the end of a word.
As @nhahtdh stated in his comment, the {1} is redundant, as [a-zA-Z] already matches one letter in the given range.
Also your regex does not do what you want because it only matches one alphanumeric character after the first letter. Add a + at the end (for one or more times) or * (for zero or more times).
This should work:
"\\bend\\s+[a-zA-Z][a-zA-Z_0-9]*"
Edit: I think \b is better than ^ here, because ^ (without Pattern.MULTILINE) only matches at the beginning of the input.
For example, with the input "end azd123 end bfg456", ^ gives only one match, while \b matches both.
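A Java sketch of the \b versus ^ comparison, with a hypothetical helper that collects all find() results. The third call shows that \b also stops the match inside barfooend, since there is no word boundary between o and e.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryEnd {
    // Collects every non-overlapping match of regex in text.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "end azd123 end bfg456";
        System.out.println(findAll("\\bend\\s+[a-zA-Z][a-zA-Z_0-9]*", text)); // [end azd123, end bfg456]
        System.out.println(findAll("^end\\s+[a-zA-Z][a-zA-Z_0-9]*", text));   // [end azd123]
        System.out.println(findAll("\\bend\\s+[a-zA-Z][a-zA-Z_0-9]*",
                "foobar barfooend\nbar fred bob"));                            // []
    }
}
```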

Try the regular expression:
end[ ]+[a-zA-Z]\w+
\w is a word character: [a-zA-Z_0-9]

Finding a simple pattern in a string unless escaped

I have some code that looks for a simple bold markup
private Pattern bold = Pattern.compile("\\*[^\\*]*\\*");
If someone writes this is my *bolded* text, my pattern finds "bolded".
I now need a way to use * outside the context of bolding, so I'd like to allow escaping.
E.g. this is my \*non-bolded\* text should not produce any match.
Is there a simple way I can change my Regex to achieve this?

You need a negative lookbehind here:
(?<!\\)\*[^*]+(?<!\\)\*
In a Java string, this gives (backslash galore):
"(?<!\\\\)\\*[^*]+(?<!\\\\)\\*"
Note: the star (*) has no special meaning within a character class, so there is no need to escape it.
Note 2: (?<!...) is a negative lookbehind; it is an anchor, which means it finds a position but consumes no text. Literally, it can be translated as: "find a position where there is no preceding text matching regex ...". Other anchors are:
^: find a position where there is no available input before (i.e., can only match at the beginning of the input);
$: find a position where there is no available input after (i.e., can only match at the end of the input);
(?=...): find a position where the following text matches regex ... (this is called a positive lookahead);
(?!...): find a position where the following text does not match regex ... (this is called a negative lookahead);
(?<=...): find a position where the preceding text matches regex ... (this is a positive lookbehind);
\<: find a position where the preceding input is either nothing or a character which is not a word character, and the following character is a word character (implementation dependent);
\>: find a position where the following input is either nothing or a character which is not a word character, and the preceding character is a word character (implementation dependent);
\b: either \< or \>.
Note 3: JavaScript engines older than ES2018 do not support lookbehinds, and JavaScript does not support \< or \> at all.
Note 4: with some regex engines, it is possible to alter the meaning of ^ and $ to match positions at the beginning and end of each line instead; in Java, that is Pattern.MULTILINE; in Perl-like regex engines, that is /m.
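A Java sketch of the lookbehind pattern on the two example sentences from the question; the class and method names are illustrative. Both asterisks of the escaped sample are preceded by a backslash, so neither can open or close a bold span.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoldMarkup {
    // (?<!\\)\* : an asterisk not preceded by a backslash, for both the opening and closing marker.
    private static final Pattern BOLD = Pattern.compile("(?<!\\\\)\\*[^*]+(?<!\\\\)\\*");

    static List<String> boldSpans(String text) {
        List<String> spans = new ArrayList<>();
        Matcher m = BOLD.matcher(text);
        while (m.find()) {
            spans.add(m.group());
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(boldSpans("this is my *bolded* text"));         // [*bolded*]
        System.out.println(boldSpans("this is my \\*non-bolded\\* text")); // []
    }
}
```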

This negative lookbehind based regex should work for you:
(?<!\\)\*[^*]+\*(?<!\\)
Live Demo: http://www.rubular.com/r/sobKUrkTjP
When translated to Java it will become:
(?<!\\\\)\\*[^*]+\\*(?<!\\\\)
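A small Java comparison, using a made-up input *foo\* whose closing asterisk is escaped, shows why the position of the second look-behind matters. Placed after \*, (?<!\\) inspects the asterisk that was just consumed, so it never fails; placed before \*, it inspects the character in front of the closing asterisk. The class and field names are illustrative.

```java
import java.util.regex.Pattern;

class LookbehindPlacement {
    // Second look-behind before the closing \* : rejects an escaped closing asterisk.
    static final Pattern BEFORE = Pattern.compile("(?<!\\\\)\\*[^*]+(?<!\\\\)\\*");
    // Second look-behind after the closing \* : always sees "*", so it never rejects anything.
    static final Pattern AFTER = Pattern.compile("(?<!\\\\)\\*[^*]+\\*(?<!\\\\)");

    public static void main(String[] args) {
        String text = "*foo\\*"; // opening marker, but the closing one is escaped
        System.out.println(BEFORE.matcher(text).find()); // false
        System.out.println(AFTER.matcher(text).find());  // true
    }
}
```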

I think the two answers so far are interesting, but not completely correct: they don't work when bolded text contains an escaped asterisk (which I assume is one of the main reasons to escape asterisks).
For example:
My *bold \*text* here, another *bold*, more \* and *here\* and \* end* more text
should find three matches:
*bold \*text*
*bold*
*here\* and \* end*
With a small modification, this regular expression handles that:
(?<!\\)\*([^*\\]|\\\*)+\*
can be tested here:
http://www.rubular.com/r/Jeml02HHYJ
Of course, in Java some more escaping is needed:
(?<!\\\\)\\*([^*\\\\]|\\\\\\*)+\\*
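A Java sketch running this pattern over the example sentence (written on one line, with backslashes doubled for the string literal); the class and method names are illustrative. It returns the three expected spans.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedAsteriskBold {
    // Opening * not preceded by \, then one or more of:
    //   any character except * and \   or   an escaped asterisk \*
    // and finally the closing *.
    private static final Pattern BOLD =
            Pattern.compile("(?<!\\\\)\\*([^*\\\\]|\\\\\\*)+\\*");

    static List<String> boldSpans(String text) {
        List<String> spans = new ArrayList<>();
        Matcher m = BOLD.matcher(text);
        while (m.find()) {
            spans.add(m.group());
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "My *bold \\*text* here, another *bold*, more \\* and *here\\* and \\* end* more text";
        for (String span : boldSpans(text)) {
            System.out.println(span); // *bold \*text*, then *bold*, then *here\* and \* end*
        }
    }
}
```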

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
