Regular Expressions - Match Character Not Between Two Strings - java

I've read many questions that ask about finding a regular expression to match characters between two strings, but my problem is the inverse. I'm attempting to create an expression that will match characters NOT between two strings.
Consider the following string.
This is short & [tag]fun & interesting[/tag].
I want to replace any ampersand character that is NOT inside the tag elements with the symbol #. The result should be as shown below.
This is short # [tag]fun & interesting[/tag].
I tried the following regular expression, but unfortunately, it matches the ampersand inside the tag elements.
/(?<!\[tag\])&(?!\[\/tag\])/g
I understand that it matches that ampersand because it's surrounded by characters on either side in the string. But I can't add a random number of characters to check because the lookbehind and lookahead must be fixed length.
Is there a regular expression that will accomplish what I want here?

This does the job, even with nested tags:
Find: \[(\w+)\].+?\[/\1\](*SKIP)(*FAIL)|&
Replace: #
How it works:
\[(\w+)\].+?\[/\1\] tries to match an opening tag, some data, and the matching closing tag
(*SKIP)(*FAIL) if a tag is found, discard it and resume searching after it
| else
& match an ampersand. At this point, we are sure it is not inside a tag.
Unfortunately this doesn't work in Java, whose regex engine does not support the (*SKIP)(*FAIL) verbs, but this requirement was only added after I answered.
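Since java.util.regex has no (*SKIP)(*FAIL) verbs, one way to get a similar effect in Java is to keep the same alternation and decide in code which branch matched. The following is only a minimal sketch under that assumption, not the answerer's method; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AmpersandOutsideTags {
    // Branch 1 consumes a whole [tag]...[/tag] block, branch 2 matches a bare ampersand.
    private static final Pattern TAG_OR_AMP =
            Pattern.compile("\\[(\\w+)\\].+?\\[/\\1\\]|&");

    // Hypothetical helper: replaces every '&' outside a tag block with '#'.
    static String replaceOutsideTags(String input) {
        Matcher m = TAG_OR_AMP.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // A tag block is copied back unchanged; only a lone '&' becomes '#'.
            String replacement = m.group().equals("&") ? "#" : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: This is short # [tag]fun & interesting[/tag].
        System.out.println(replaceOutsideTags("This is short & [tag]fun & interesting[/tag]."));
    }
}
```

Because the tag branch comes first in the alternation, an ampersand inside a tag block is consumed together with the block and never reaches the second branch.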

Related

Regex-How to prevent repeated special characters?

I don't have any experience with regular expressions. I need a regular expression that doesn't allow special characters (+-*/& etc.) to be repeated.
The string can contain digits, alphanumerics, and special characters.
This should be valid : abc,df
This should be invalid : abc-,df
I would really appreciate it if you could help me! Thanks in advance.
The two solutions presented so far match a string that is not allowed.
But the title is How to prevent..., so I assume that the regex
should match the allowed string. It means that the regex should:
match the whole string if it does not contain 2
consecutive special characters,
not match otherwise.
You can achieve this putting together the following parts:
^ - start of string anchor,
(?!.*[...]{2}) - a negative lookahead for 2 consecutive special
characters (marked here as ...), in any place,
a regex matching the whole (non-empty) string,
$ - end of string anchor.
So the whole regex should be:
^(?!.*[!##$%^&*()\-_+={}[\]|\\;:'",<.>\/?]{2}).+$
Note that within a char class (between [ and ]), a backslash
must be placed before the following chars to escape them: - (if in
the middle of the sequence), the closing square bracket,
a backslash itself, and / (the regex terminator).
Or if you want to apply the regex to individual words (not the whole
string), then the regex should be:
\b(?!\S*[!##$%^&*()\-_+={}[\]|\\;:'",<.>\/?]{2})\S+
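A rough Java sketch of the whole-string variant follows. It makes two assumptions that are not in the answer: the special-character set is copied from the regex above with the duplicate # dropped, and the [ inside the class is escaped, because Java treats an unescaped [ within a character class as the start of a nested class. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class SpecialCharValidator {
    // Negative lookahead rejects any string containing two consecutive special characters.
    // The character set is illustrative; '[' and ']' are both escaped for Java.
    private static final Pattern NO_REPEATED_SPECIALS = Pattern.compile(
            "^(?!.*[!#$%^&*()\\-_+={}\\[\\]|\\\\;:'\",<.>/?]{2}).+$");

    static boolean isAllowed(String s) {
        return NO_REPEATED_SPECIALS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("abc,df"));   // prints: true
        System.out.println(isAllowed("abc-,df"));  // prints: false
    }
}
```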
[\,\+\-\*\/\&]{2,} Add more characters inside the square brackets if you want.
Demo https://regex101.com/r/CBrldL/2
Use the following regex to match the invalid string.
[^A-Za-z0-9]{2,}
[^\w!\s]{2,} This would be the shortest version to match two or more consecutive special characters (ignoring whitespace). Note that, as written, the class also leaves out !, so !! is not matched.
If you want to count whitespace too, please use [^\w]{2,}
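The "match the invalid string" approach can be sketched in Java with Matcher.find(), which looks for the pattern anywhere in the input rather than requiring a full match. This is only an illustration using the [^A-Za-z0-9]{2,} pattern from above; the class and method names are assumptions.

```java
import java.util.regex.Pattern;

final class RepeatedSpecialDetector {
    // Two or more consecutive characters that are neither ASCII letters nor digits.
    private static final Pattern TWO_SPECIALS = Pattern.compile("[^A-Za-z0-9]{2,}");

    // Returns true when the input is INVALID, i.e. contains repeated special characters.
    static boolean hasRepeatedSpecials(String s) {
        return TWO_SPECIALS.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedSpecials("abc,df"));   // prints: false
        System.out.println(hasRepeatedSpecials("abc-,df"));  // prints: true
    }
}
```

Note that this particular class also counts spaces as special characters, so "a, b" would be reported as invalid.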

Filter regex not valid

I have a problem with a text pattern in Java. I want to use this pattern to validate text entered to filter a table. The text can contain "!" (not) and "*" (like?). It is also possible to use & (and) and | (or) to join logical expressions together with text operations (endsWith, startsWith, contains). What I want is to reject invalid text. Examples of invalid text:
-
!
!*
*!
!!
**
*!
!*
A**
**A
*A&
&A*
And examples of valid text:
A*
*A
*-*
!a|b&c|!ab
!a|*b&c*|!*ab
!*aa*|*sdb&casd*|!*aasdb*
!*aa|*sdb&casd*|!*aasdb*
S*|L*
With my pattern "\s*(!?\*?[^&\|]*[&\|])*!?\*?[^&\|]*\*?\s*" both groups are accepted. I tried different combinations, but without success. Any ideas?
Your description of what you want and your examples aren't entirely clear. For example, *-* is valid, indicating that - is a valid filter character, so why is a lone - invalid? That's not logical.
But from what I can gather, you're after something like this:
\s*(?:!?\*?[^&|*!\n]+\*?[&|])*!?\*?[^&|*!\n]+\*?\s*
It uses a non-capturing group for the first, optional part. If this part is present, it ends with a | or a &. Then follows the mandatory part, which is the same except that it lacks the terminating operator:
An optional !, followed by an optional *. Then at least one character that is not a &, |, *, ! or newline. Finally, an optional * can follow.
Note that not anchoring to start and end (^ and $) implies using Java's matches method, which does that for you. With another method, or another regex flavor, you will need to add those anchors, or this will match basically anything.
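To check the suggested pattern against the examples from the question, a small Java harness along these lines could be used. It is a sketch only: the class and method names are invented, and it relies on Matcher.matches() to anchor the pattern at both ends.

```java
import java.util.regex.Pattern;

final class FilterExpressionValidator {
    // The pattern from the answer, with backslashes doubled for a Java string literal.
    private static final Pattern FILTER = Pattern.compile(
            "\\s*(?:!?\\*?[^&|*!\\n]+\\*?[&|])*!?\\*?[^&|*!\\n]+\\*?\\s*");

    static boolean isValid(String text) {
        // matches() requires the whole input to match, so no ^ or $ is needed.
        return FILTER.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"A*", "*-*", "!a|*b&c*|!*ab", "!", "A**", "*A&", "-"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

With this pattern a lone - is accepted, which is the inconsistency in the examples pointed out above; rejecting it would need an extra rule.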

Could I give Java a regular expression when java should not split an string?

Can I give the String.split method a parameter which tells it when it must not split the given string? In my particular case, I have text documents with lots of text and symbols. But in every file there are many different symbols. This is what I want to achieve:
string.split(not(A-Z,ß,ä,ö,ü));
So basically, I want String.split to only split whenever it finds a character that is not part of the German set of characters.
I hope you can help me.
There are three tokens in regular expressions that allow you to do exactly what you want to achieve:
[] creates a character class which contains all characters that are listed inside. In your particular case, you'd want this to be [a-zßäöü] as this character group contains all characters a through z, ß, ä, ö and ü.
^ negates the contents of a character class. So, using the character class from above, you'd use [^a-zßäöü] if you wanted to match any character that is not part of the character group.
Additionally, adding (?i) in front of your regular expression makes it case insensitive, which allows your expression to match the uppercase letters as well without having to add them to your expression. In Java, (?i) on its own only covers US-ASCII letters, so add the u flag as well, (?iu), to make Ä, Ö and Ü match too.
So, adding those three tokens together, you get the regular expression (?iu)[^a-zßäöü]. Now the only thing left is to put it into your String.split method and you're done:
string.split("(?iu)[^a-zßäöü]");
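A runnable sketch of this split is shown below. Two things are assumptions rather than part of the answer: the pattern is compiled with both CASE_INSENSITIVE and UNICODE_CASE, because Java's CASE_INSENSITIVE flag alone only folds US-ASCII letters and would therefore still split on Ä, Ö and Ü; and the umlauts are written as Unicode escapes so the source file does not depend on its encoding. The class and method names are made up.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class GermanSplit {
    // [^a-z plus sharp s, a-umlaut, o-umlaut, u-umlaut], matched case-insensitively.
    private static final Pattern NON_GERMAN = Pattern.compile(
            "[^a-z\u00df\u00e4\u00f6\u00fc]",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    static String[] splitOnNonGerman(String text) {
        return NON_GERMAN.split(text);
    }

    public static void main(String[] args) {
        // "Gruesse,Aerger" spelled with u-umlaut, sharp s and capital A-umlaut
        String input = "Gr\u00fc\u00dfe,\u00c4rger";
        System.out.println(Arrays.toString(splitOnNonGerman(input)));
    }
}
```

Each non-German character is its own delimiter here, so runs of them produce empty strings in the result; appending + to the class would collapse such runs into a single split point.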
Mr.Human,
If I'm understanding your question correctly, you want to split a string on non-German characters?
So,
abcdöyüp
becomes
a, b, c, dö, yü, p
If that is the case, then unfortunately you need to specify the set of characters that are non-German, e.g. [A-Z] to split on. If you are trying to accomplish something other than this, please clarify and/or provide an example.

Finding a simple pattern in a string unless escaped

I have some code that looks for a simple bold markup
private Pattern bold = Pattern.compile("\\*[^\\*]*\\*");
If someone uses: this my *bolded* text - my pattern would find "bolded"
I now need a way to use * not in the context of bolding. So I'd like to allow escaping.
E.g. this my \*non-bolded\* text - should not find any pattern.
Is there a simple way I can change my Regex to achieve this?
You need a negative lookbehind here:
(?<!\\)\*[^*]+(?<!\\)\*
In a Java string, this gives (backslash galore):
"(?<!\\\\)\\*[^*]+(?<!\\\\)\\*"
Note: the star (*) has no special meaning within a character class, therefore there is no need to escape it
Note 2: (?<!...) is a negative lookbehind; it is an anchor, which means it finds a position but consumes no text. Literally, it can be translated as: "find a position where there is no preceding text matching regex ...". Other anchors are:
^: find a position where there is no available input before (ie, can only match at the beginning of the input);
$: find a position where there is no available input after (ie, can only match at the end of the input);
(?=...): find a position where the following text matches regex ... (this is called a positive lookahead);
(?!...): find a position where the following text does not match regex ... (this is called a negative lookahead);
(?<=...): find a position where the preceding text matches regex ... (this is a positive lookbehind);
\<: find a position where the preceding input is either nothing or a character which is not a word character, and the following character is a word character (implementation dependent);
\>: find a position where the following input is either nothing or a character which is not a word character, and the preceding character is a word character (implementation dependent);
\b: either \< or \>.
Note 3: JavaScript regexes did not support lookbehinds before ECMAScript 2018; neither do they support \< or \>.
Note 4: with some regex engines, it is possible to alter the meaning of ^ and $ to match positions at the beginning and end of each line instead; in Java, that is Pattern.MULTILINE; in Perl-like regex engines, that is /m.
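As an illustration of the lookbehind version in Java, the following sketch compiles the escaped string shown above and runs it against the two inputs from the question. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoldFinder {
    // (?<!\\)\*[^*]+(?<!\\)\* written as a Java string literal.
    private static final Pattern BOLD =
            Pattern.compile("(?<!\\\\)\\*[^*]+(?<!\\\\)\\*");

    // Returns the first bold span including its asterisks, or null if there is none.
    static String firstBold(String text) {
        Matcher m = BOLD.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstBold("this my *bolded* text"));         // prints: *bolded*
        System.out.println(firstBold("this my \\*non-bolded\\* text")); // prints: null
    }
}
```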
This negative lookbehind based regex should work for you:
(?<!\\)\*[^*]+\*(?<!\\)
Live Demo: http://www.rubular.com/r/sobKUrkTjP
When translated to Java it will become:
(?<!\\\\)\\*[^*]+\\*(?<!\\\\)
I think the two answers so far are very interesting, but not completely correct. They don't work when bolded text has an escaped asterisk inside (I assume this is nearly the main reason to escape asterisks).
For example:
My *bold \*text* here, another *bold*, more \* and *here\* and
\* end* more text
Should find three groups:
*bold \*text*
*bold*
*here\* and \* end*
With a little modification, we can do that, with this regular expression:
(?<!\\)\*([^*\\]|\\\*)+\*
can be tested here:
http://www.rubular.com/r/Jeml02HHYJ
Of course, in Java some more escaping is needed:
(?<!\\\\)\\*([^*\\\\]|\\\\\\*)+\\*
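The example above can be reproduced in Java roughly as follows. This is a sketch with invented class and method names; the input is the sample text from this answer, joined onto a single line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapedBoldFinder {
    // (?<!\\)\*([^*\\]|\\\*)+\* written as a Java string literal.
    private static final Pattern BOLD =
            Pattern.compile("(?<!\\\\)\\*([^*\\\\]|\\\\\\*)+\\*");

    // Collects every bold span, including spans that contain escaped asterisks.
    static List<String> findBold(String text) {
        List<String> spans = new ArrayList<>();
        Matcher m = BOLD.matcher(text);
        while (m.find()) {
            spans.add(m.group());
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "My *bold \\*text* here, another *bold*, more \\* and "
                + "*here\\* and \\* end* more text";
        for (String span : findBold(text)) {
            System.out.println(span);
        }
    }
}
```

Run on that input, the loop yields the three groups listed above.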

Java: validating a certain string pattern

I am trying to validate a string in an 'iterative way' and all my attempts just fail!
I find it a bit complicated, and I'm guessing you could teach me how to do it right.
I assume that most of you will suggest using regex patterns, but I don't really know how. And in general, how can a regex be defined for infinite "sets"?
The string i want to validate is
"ANYTHING|NUMBER_ONLY,ANYTHING|NUMBER_ONLY..."
for example: "hello|5,word|10" and "hello|5,word|10," are both valid.
Note: I don't mind whether the string ends with a comma ','.
The Kleene star (*) lets you define "infinite sets" in regular expressions. The following pattern should do the trick:
[^,|]+\|\d+(,[^,|]+\|\d+)*,?
A----------B--------------C-
Part A matches the first element. Part B matches any following elements (notice the star). Part C is the optional comma at the end.
WARNING: Remember to escape backslashes in Java string.
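In Java, the pattern with doubled backslashes might be used like this. It is a minimal sketch; the class and method names are assumptions, and matches() is used so the whole string has to fit the pattern.

```java
import java.util.regex.Pattern;

final class PairListValidator {
    // [^,|]+\|\d+(,[^,|]+\|\d+)*,? with each backslash doubled for the Java literal.
    private static final Pattern PAIRS =
            Pattern.compile("[^,|]+\\|\\d+(,[^,|]+\\|\\d+)*,?");

    static boolean isValid(String s) {
        return PAIRS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("hello|5,word|10"));   // prints: true
        System.out.println(isValid("hello|5,word|10,"));  // prints: true
        System.out.println(isValid("hello|five"));        // prints: false
    }
}
```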
I'd suggest splitting your string into an array on the | delimiter and validating each part separately. Each part (except the first one) should match the following pattern: \d+(,.*)?
UPDATED
Split on , and validate each part with .*\|\d+
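The split-on-comma variant could look roughly like the sketch below. Two details are assumptions on top of the answer: the | in the per-part pattern is escaped so that it is a literal pipe rather than alternation, and an input that yields no parts at all is treated as invalid. The class and method names are hypothetical.

```java
final class PairListSplitValidator {
    static boolean isValid(String input) {
        // String.split drops trailing empty strings, so one trailing comma is tolerated.
        String[] parts = input.split(",");
        if (parts.length == 0) {
            return false; // input consisted only of commas
        }
        for (String part : parts) {
            // Each part must be "anything", a literal '|', then digits only.
            if (!part.matches(".*\\|\\d+")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("hello|5,word|10"));   // prints: true
        System.out.println(isValid("hello|5,word|10,"));  // prints: true
        System.out.println(isValid("hello|x"));           // prints: false
    }
}
```

Compared with the single-regex approach, this version accepts an empty name before the pipe (for example "|5"), because .* also matches the empty string.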
