Lookahead and lookbehind with regex

I am trying to build a regex pattern and I'm a beginner.
The string looks like this
INITIAL TEXT\KEYWORD1\TEXT1\KEYWORD2\TEXT2\KEYWORD3\TEXT3
The string starts with initial text but the keywords with their texts could be in any order or may not be present.
The initial text could contain any character including backslashes.
I want to capture the initial text so I tried something like this
(?<=(.*)(?=\KEYWORD1\|\KEYWORD2\|KEYWORD3).*)
I am able to capture it on regex101 in group 1, but my Java code doesn't recognize group 1.
Thanks for helping.

If the string starts with the text you want to capture, then you can use a start-of-string anchor followed by a lazy match on any character, terminated by a lookahead for one of the keywords (or end-of-string, to allow for the case with no keywords):
^.*?(?=\\(?:KEYWORD1|KEYWORD2|KEYWORD3)\\|$)
This will match only the INITIAL TEXT
Note that in Java you will need to double the backslash characters in the regex string.
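As a rough illustration of the doubling, here is a minimal Java sketch of the pattern above. The class name InitialTextDemo and the helper initialText are made up for this example, the sample strings are assumptions modeled on the question, and the input is assumed to be a single line (the dot does not match line terminators by default).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InitialTextDemo {
    // Regex: ^.*?(?=\\(?:KEYWORD1|KEYWORD2|KEYWORD3)\\|$)
    // Every regex backslash is doubled again inside the Java string literal.
    static final Pattern INITIAL =
            Pattern.compile("^.*?(?=\\\\(?:KEYWORD1|KEYWORD2|KEYWORD3)\\\\|$)");

    // Returns the text before the first \KEYWORDn\ marker, or the whole
    // (single-line) input when no keyword is present.
    static String initialText(String input) {
        Matcher m = INITIAL.matcher(input);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(initialText("INITIAL TEXT\\KEYWORD1\\TEXT1\\KEYWORD3\\TEXT3"));
        // Backslashes inside the initial text are kept.
        System.out.println(initialText("C:\\some\\path\\KEYWORD2\\TEXT2"));
        System.out.println(initialText("no keywords here"));
    }
}
```

Because the whole match is the initial text, m.group() is enough and no capturing group is needed.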

Related

Regular expression for matching texts before and after string

I would like to match URL strings which can be specified in the following manner.
xxx.yyy.com (For example, the regular expression should match all strings like 4xxx.yyy.com, xxx4.yyy.com, xxx.yyy.com, 4xxx4.yyy.com, 444xxx666.yyy.com, abcxxxdef.yyy.com etc).
I have tried to use
([a-zA-Z0-9]+$)xxx([a-zA-Z0-9]+$).yyy.com
([a-zA-Z0-9]*)xxx([a-zA-Z0-9]*).yyy.com
But they don't work. Please help me write a correct regular expression. Thanks in advance.
Note: I'm trying to do this in Java.

To make sure xxx is present while allowing any non-whitespace characters before and after it, put \S* on both sides. To match the whole string, add anchors at the start and end.
Remember to escape the dot to match it literally.
^\S*xxx\S*\.yyy\.com$
^ Start of string
\S*xxx\S* Match xxx between optional non whitespace chars
\.yyy Match .yyy
\.com Match .com
$ End of string
In Java, double each backslash:
String regex = "^\\S*xxx\\S*\\.yyy\\.com$";
Or specify the characters on the left and right that you would allow to match in the character class:
^[0-9A-Za-z!##$%^&*()_+]*xxx[0-9A-Za-z!##$%^&*()_+]*\.yyy\.com$
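A small sketch of how the first pattern could be checked from Java, using the sample hosts from the question plus two negative cases added here as assumptions. HostPatternDemo and isAllowed are hypothetical names.

```java
import java.util.regex.Pattern;

class HostPatternDemo {
    // Regex: ^\S*xxx\S*\.yyy\.com$ (backslashes doubled for the Java literal)
    static final Pattern HOST = Pattern.compile("^\\S*xxx\\S*\\.yyy\\.com$");

    static boolean isAllowed(String host) {
        // matches() requires the entire input to match, so the ^ and $
        // anchors are redundant here but harmless.
        return HOST.matcher(host).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "4xxx.yyy.com", "xxx4.yyy.com", "xxx.yyy.com",
            "444xxx666.yyy.com", "abc.yyy.com", "xxx.yyyXcom"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isAllowed(s));
        }
    }
}
```

The last sample is rejected only because the dots are escaped; with an unescaped dot it would be accepted.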

regular expression not containing special word in java

Input string
hello sworked? worked hello
output string
I need to match only the work in worked.
I tried this regex:
(?!s)work
But it matches the work in both sworked? and worked.

To match work not preceded by s, use a negative lookbehind:
(?<!s)work
(?<!s) disallows s immediately before work.
Or, match a word starting with work:
\bwork
\b is a word boundary.
Add \w* to match the rest of the word if necessary.
In Java, double backslashes (e.g. String regex = "\\bwork";).
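To compare the three patterns side by side, a short sketch like the following can help. WorkDemo and matchStarts are names invented for this example; it reports the start index of each match in the question's input string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WorkDemo {
    // Collects the start index of every match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "hello sworked? worked hello";
        System.out.println(matchStarts("(?!s)work", input));   // the question's attempt
        System.out.println(matchStarts("(?<!s)work", input));  // negative lookbehind
        System.out.println(matchStarts("\\bwork", input));     // word boundary
    }
}
```

The lookahead version reports two positions because (?!s) only checks that the next character is not s, which is always true when the next character is w. The other two report a single position, the work in worked.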

What is the regex for any number of spaces followed by one or more integers?

I am trying to make a regex which will match any string which looks like this:
User<spaces><Any positive integer here><spaces>Status:<anything here>
Sample expression - User 1 Status: Not Ready.
Regex pattern - ^[User].*\d+.*[Status:].*$
As you can see, I am using ".*" to incorrectly match spaces. I tried to use \s and [" "] instead, but they did not work. How do I handle spaces or tabs in this regex ?
By the way, I am using https://regex101.com/ with JavaScript regex parser to validate my Regex. I don't know if there is any nice regex helper website just for Java and not JavaScript.
Thanks.

You are using character classes (those things surrounded by []) inappropriately. The []s don't mean "match these characters literally". They mean "match any one character in this list". Outside of [], most characters already match themselves literally, so words like User need no brackets.
Also, you seem to want to match User: in your regex, yet in the example you provided, there is no :, just User. Please decide whether or not you want the :.
\s is indeed used to match whitespace. You thought it didn't work probably because your regex has other mistakes, making the whole thing not match.
A corrected version of your regex:
^User\s*\d+\s*Status:.*$
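Since the question asked about Java specifically, here is a hedged sketch that runs both the corrected pattern and the original attempt through java.util.regex. UserStatusDemo and its two helper methods are hypothetical names, and the inputs other than the question's sample are assumptions.

```java
import java.util.regex.Pattern;

class UserStatusDemo {
    // Corrected regex: ^User\s*\d+\s*Status:.*$
    static final Pattern CORRECTED = Pattern.compile("^User\\s*\\d+\\s*Status:.*$");
    // The question's original attempt, kept for comparison.
    static final Pattern ORIGINAL = Pattern.compile("^[User].*\\d+.*[Status:].*$");

    static boolean corrected(String s) {
        return CORRECTED.matcher(s).matches();
    }

    static boolean original(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "User 1 Status: Not Ready.",
            "User \t 42   Status: ok",   // tabs and runs of spaces are covered by \s*
            "User Status: no number",
            "s9:"                          // junk accepted by the character-class version
        };
        for (String s : samples) {
            System.out.println("[" + s + "] corrected=" + corrected(s)
                    + " original=" + original(s));
        }
    }
}
```

The "s9:" input shows why the character classes were a problem: [User] accepts a lone s and [Status:] accepts a lone colon.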

Finding a simple pattern in a string unless escaped

I have some code that looks for a simple bold markup
private Pattern bold = Pattern.compile("\\*[^\\*]*\\*");
If someone uses: this my *bolded* text - my pattern would find "bolded"
I now need a way to use * not in the context of bolding. So I'd like to allow escaping.
E.g. this my \*non-bolded\* text - should not find any pattern.
Is there a simple way I can change my Regex to achieve this?

You need a negative lookbehind here:
(?<!\\)\*[^*]+(?<!\\)\*
In a Java string, this gives (backslash galore):
"(?<!\\\\)\\*[^*]+(?<!\\\\)\\*"
Note: the star (*) has no special meaning within a character class, therefore there is no need to escape it
Note 2: (?<!...) is a negative lookbehind; it is an anchor, which means it finds a position but consumes no text. Literally, it can be translated as: "find a position where there is no preceding text matching regex ...". Other anchors are:
^: find a position where there is no available input before (ie, can only match at the beginning of the input);
$: find a position where there is no available input after (ie, can only match at the end of the input);
(?=...): find a position where the following text matches regex ... (this is called a positive lookahead);
(?!...): find a position where the following text does not match regex ... (this is called a negative lookahead);
(?<=...): find a position where the preceding text matches regex ... (this is a positive lookbehind);
\<: find a position where the preceding input is either nothing or a character which is not a word character, and the following character is a word character (implementation dependent);
\>: find a position where the following input is either nothing or a character which is not a word character, and the preceding character is a word character (implementation dependent);
\b: either \< or \>.
Note 3: JavaScript regexes did not support lookbehinds before ECMAScript 2018; they still do not support \< or \>.
Note 4: with some regex engines, it is possible to alter the meaning of ^ and $ to match positions at the beginning and end of each line instead; in Java, that is Pattern.MULTILINE; in Perl-like regex engines, that is /m.
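The lookbehind pattern and the Pattern.MULTILINE remark can be tried out with a sketch along these lines. BoldDemo, findBold and countLineStarts are names invented here, and the two-line string is an assumed sample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoldDemo {
    // Regex: (?<!\\)\*[^*]+(?<!\\)\*
    static final Pattern BOLD = Pattern.compile("(?<!\\\\)\\*[^*]+(?<!\\\\)\\*");

    // Returns every unescaped *...* span found in text.
    static List<String> findBold(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = BOLD.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Counts matches of ^\w+ with the given flags (0 or Pattern.MULTILINE).
    static int countLineStarts(String text, int flags) {
        Matcher m = Pattern.compile("^\\w+", flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(findBold("this my *bolded* text"));
        System.out.println(findBold("this my \\*non-bolded\\* text"));
        System.out.println(countLineStarts("first\nsecond", 0));
        System.out.println(countLineStarts("first\nsecond", Pattern.MULTILINE));
    }
}
```

Without the flag, ^ matches only at the start of the input; with Pattern.MULTILINE it also matches after the line break.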

This negative lookbehind based regex should work for you:
(?<!\\)\*[^*]+\*(?<!\\)
Live Demo: http://www.rubular.com/r/sobKUrkTjP
When translated to Java it will become:
(?<!\\\\)\\*[^*]+\\*(?<!\\\\)

I think the two answers so far are very interesting, but not completely correct. They don't work when the bolded text contains an escaped asterisk (I assume this is the main reason to escape asterisks in the first place).
For example:
My *bold \*text* here, another *bold*, more \* and *here\* and \* end* more text
Should find three groups:
*bold \*text*
*bold*
*here\* and \* end*
With a little modification, we can do that, with this regular expression:
(?<!\\)\*([^*\\]|\\\*)+\*
It can be tested here:
http://www.rubular.com/r/Jeml02HHYJ
Of course, in Java some more escaping is needed:
(?<!\\\\)\\*([^*\\\\]|\\\\\\*)+\\*
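For completeness, a minimal sketch that runs this last pattern against the example text, written as one line. EscapedBoldDemo and findBold are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedBoldDemo {
    // Regex: (?<!\\)\*([^*\\]|\\\*)+\*
    // Inside the bold span, either a character that is neither * nor \,
    // or an escaped asterisk (\*), is allowed.
    static final Pattern BOLD =
            Pattern.compile("(?<!\\\\)\\*([^*\\\\]|\\\\\\*)+\\*");

    static List<String> findBold(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = BOLD.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "My *bold \\*text* here, another *bold*, "
                + "more \\* and *here\\* and \\* end* more text";
        for (String span : findBold(text)) {
            System.out.println(span);
        }
    }
}
```

Note that a backslash inside the span is only accepted when it is immediately followed by an asterisk, so text such as *a\b* is not matched by this pattern.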

Extract words (multiple whitespace) starting with # by regular expression

I have a problem with my regular expression:
String regex = "(?<=[\\s])#\\w+\\s";
I want a regex that extracts the tags from a string like this:
"This is a Text #tag1 #tag2 #tag3"
With the regular expression, I get the last two values as result but not tag1, because there is more than one whitespace. But I want all 3 of them!
I tried some variations, but nothing worked.

Use this regular expression:
(?<=(^|\\S)\\s)#\\w+(?=\\s|$)

It's a bit unclear from your question what you're really after, so I've put up some simple alternatives:
To capture all the tags in the string, we can use a lookbehind:
((?<=\\s|^)#\\w+)
To capture all the tags at the end of the string, we can use a lookahead:
(#\\w+(?=\\s#)|#\\w+$)
If there are always three tags at the end, there's no need for a lookaround:
(#\\w+)\\s(#\\w+)\\s(#\\w+)$
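As a sketch of the first alternative (the lookbehind that captures all tags), the following collects every match into a list. HashtagDemo and tags are names made up for this example, and the run of spaces before #tag1 is an assumption based on the question's description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagDemo {
    // Regex: (?<=\s|^)#\w+
    // The lookbehind only checks the single position before '#', so any
    // amount of whitespace in front of a tag is fine.
    static final Pattern TAG = Pattern.compile("(?<=\\s|^)#\\w+");

    static List<String> tags(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(tags("This is a Text    #tag1 #tag2 #tag3"));
        // A '#' glued to a preceding word character is not treated as a tag.
        System.out.println(tags("#a b#c #d"));
    }
}
```

Nothing after the tag is consumed, so the last tag is found even without trailing whitespace.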
