Can regex extract a substring, specifically after another without including the first?

(Edit: I said 'digit'; I should have said 'alphanumeric char'.)
How do I extract a postfix from a string, given a list of possible postfixes: ",X", ",Y", ",X)" and "),Y"? Each must be preceded by an alphanumeric character to be valid, but that character must not be extracted.
What I am using is \w(,X|,Y|,X\)|\),Y){1}$, but this includes the preceding character (\w) in the extracted value.
(The unit tests pass, but they are not sophisticated enough to check the returned match.)
https://regex101.com/r/4Ggu7z/5/tests

Converting my comment to an answer.
You can use a positive lookbehind instead of matching the preceding character in your regex. Here is a working regex:
(?<=\w)(,[XY]|,X\)|\),Y)$
RegEx Demo
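
For illustration, here is a minimal Java sketch of how the lookbehind version could be used. The class name PostfixExtractor and the sample inputs are made up for this example; only the pattern comes from the answer above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PostfixExtractor {
    // (?<=\w) asserts a preceding word character without consuming it,
    // so it is not part of the returned match.
    private static final Pattern POSTFIX =
            Pattern.compile("(?<=\\w)(,[XY]|,X\\)|\\),Y)$");

    // Returns the postfix if the input ends with a valid one, else empty.
    static Optional<String> extract(String input) {
        Matcher m = POSTFIX.matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("A1,X"));   // postfix without the leading "1"
        System.out.println(extract("A1,X)"));
        System.out.println(extract("A1),Y"));
        System.out.println(extract(",X"));     // no preceding character
    }
}
```

Because the lookbehind is zero-width, m.group() contains only the postfix, which is what the question asks for.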

Related

Regex Match Reset \K Equivalent In Java

I have come up with a regex pattern to match part of a JSON value, but only the PCRE engine supports it. I want to know the Java equivalent of this regex.
Simplified version
cif:\K.*(?=(.+?){4})
Matches part of the value, leaving the last 4 characters.
cif:test1234
Matched value will be test
https://regex101.com/r/xV4ZNa/1
Note: I can only define the regex and the replacement text. I don't have access to the Java code, since it is handled by a proprietary log masking framework.

You can simplify the pattern to:
(?<=cif:).*(?=....)
Explanation
(?<=cif:) Positive lookbehind, assert cif: to the left
.* Match any character except newlines, 0+ times
(?=....) Positive lookahead, assert 4 characters (which can include spaces)
See a regex demo.
If you don't want to match empty strings, then you can use .+ instead
(?<=cif:).+(?=....)
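
As a rough sketch of how these two patterns behave in Java, assuming the masking framework does something similar to replaceAll (that is a guess, and the class name CifMasker and the "****" replacement text are made up for the example):

```java
import java.util.regex.Pattern;

class CifMasker {
    // Lookbehind replaces PCRE's \K; the lookahead keeps the last 4 characters.
    static final Pattern ALLOW_EMPTY = Pattern.compile("(?<=cif:).*(?=....)");
    static final Pattern NON_EMPTY = Pattern.compile("(?<=cif:).+(?=....)");

    // Replaces the matched part of the value with a fixed mask.
    static String mask(Pattern pattern, String input) {
        return pattern.matcher(input).replaceAll("****");
    }

    public static void main(String[] args) {
        System.out.println(mask(ALLOW_EMPTY, "cif:test1234"));
        // With .* an empty match is possible when only 4 characters follow cif:
        System.out.println(mask(ALLOW_EMPTY, "cif:1234"));
        // With .+ the same input is left untouched
        System.out.println(mask(NON_EMPTY, "cif:1234"));
    }
}
```

The second call shows why .+ may be preferable: with .* the mask is inserted even when there is nothing to hide.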

You can use a lookbehind assertion instead:
(?<=cif:).*(?=(.+?){4})
Demo: https://regex101.com/r/xV4ZNa/3

Why I got IllegalStateException here? [duplicate]

I have a string. The end is different, such as index.php?test=1&list=UL or index.php?list=UL&more=1. The one thing I'm looking for is &list=.
How can I match it, whether it's in the middle of the string or it's at the end? So far I've got [&|\?]list=.*?([&|$]), but the ([&|$]) part doesn't actually work; I'm trying to use that to match either & or the end of the string, but the end of the string part doesn't work, so this pattern matches the second example but not the first.

Use:
/(&|\?)list=.*?(&|$)/
Note that when you use a bracket expression, every character within it (with some exceptions) is going to be interpreted literally. In other words, [&|$] matches the characters &, |, and $.
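
A small Java sketch comparing the two patterns on the URLs from the question (the class and method names are made up for the example):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ListParamFinder {
    // Original attempt: inside [...], | and $ are literal characters.
    static final Pattern BRACKETS = Pattern.compile("[&|\\?]list=.*?([&|$])");
    // Fixed version: alternation in a group, so $ is an anchor again.
    static final Pattern GROUPS = Pattern.compile("(&|\\?)list=.*?(&|$)");

    static Optional<String> firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String atEnd = "index.php?test=1&list=UL";
        String inMiddle = "index.php?list=UL&more=1";
        System.out.println(firstMatch(BRACKETS, atEnd));    // no match
        System.out.println(firstMatch(GROUPS, atEnd));
        System.out.println(firstMatch(GROUPS, inMiddle));
    }
}
```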

In short
Any zero-width assertions inside [...] lose their meaning of a zero-width assertion. [\b] does not match a word boundary (it matches a backspace, or, in POSIX, \ or b), [$] matches a literal $ char, [^] is either an error or, as in ECMAScript regex flavor, any char. Same with \z, \Z, \A anchors.
You may solve the problem using any of the below patterns:
[&?]list=([^&]*)
[&?]list=(.*?)(?=&|$)
[&?]list=(.*?)(?![^&])
If you need to check for the "absolute", unambiguous string end anchor, you need to remember that in various regex flavors, it is expressed with different constructs:
[&?]list=(.*?)(?=&|$) - OK for ECMA regex (JavaScript, default C++ `std::regex`)
[&?]list=(.*?)(?=&|\z) - OK for .NET, Go, Onigmo (Ruby), Perl, PCRE (PHP, base R), Boost, ICU (R `stringr`), Java/Android
[&?]list=(.*?)(?=&|\Z) - OK for Python
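
To see why the distinction matters in Java: without the MULTILINE flag, $ also matches just before a final line terminator, while \z matches only at the very end of the input. A minimal sketch (class name and inputs are made up):

```java
import java.util.regex.Pattern;

class EndAnchors {
    static final Pattern DOLLAR = Pattern.compile("[&?]list=(.*?)(?=&|$)");
    static final Pattern ABSOLUTE = Pattern.compile("[&?]list=(.*?)(?=&|\\z)");

    static boolean found(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String withNewline = "index.php?list=UL\n";
        // $ accepts the position before the trailing newline
        System.out.println(found(DOLLAR, withNewline));
        // \z does not, and . cannot step over the newline to reach the end
        System.out.println(found(ABSOLUTE, withNewline));
        System.out.println(found(ABSOLUTE, "index.php?list=UL"));
    }
}
```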
Matching between a char sequence and a single char or end of string (current scenario)
The .*?([YOUR_SINGLE_CHAR_DELIMITER(S)]|$) pattern (suggested by João Silva) is rather inefficient since the regex engine checks for the patterns that appear to the right of the lazy dot pattern first, and only if they do not match does it "expand" the lazy dot pattern.
In these cases it is recommended to use negated character class (or bracket expression in the POSIX talk):
[&?]list=([^&]*)
See demo. Details
[&?] - a positive character class matching either & or ? (note the relationships between chars/char ranges in a character class are OR relationships)
list= - a substring, char sequence
([^&]*) - Capturing group #1: zero or more (*) chars other than & ([^&]), as many as possible
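
In Java, the negated character class version could be used like this to pull out the value of list (a sketch only; the class name ListValue is made up):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ListValue {
    // Group 1 captures everything after "list=" up to the next & or the end.
    private static final Pattern LIST = Pattern.compile("[&?]list=([^&]*)");

    static Optional<String> value(String url) {
        Matcher m = LIST.matcher(url);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(value("index.php?test=1&list=UL"));
        System.out.println(value("index.php?list=UL&more=1"));
        System.out.println(value("index.php?more=1"));
    }
}
```

No end-of-string anchor is needed here, because [^&]* simply stops when it runs out of characters.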
Checking for the trailing single char delimiter presence without returning it or end of string
Most regex flavors (including JavaScript beginning with ECMAScript 2018) support lookarounds, constructs that only return true or false depending on whether their patterns match. They are crucial when consecutive matches that may start and end with the same char are expected (see the original pattern: it may match a string starting and ending with &). Although it is not expected in a query string, it is a common scenario.
In that case, you can use two approaches:
A positive lookahead with an alternation containing positive character class: (?=[SINGLE_CHAR_DELIMITER(S)]|$)
A negative lookahead with just a negative character class: (?![^SINGLE_CHAR_DELIMITER(S)])
The negative lookahead solution is a bit more efficient because it does not contain an alternation group that adds complexity to matching procedure. The OP solution would look like
[&?]list=(.*?)(?=&|$)
or
[&?]list=(.*?)(?![^&])
See this regex demo and another one here.
Certainly, in case the trailing delimiters are multichar sequences, only a positive lookahead solution will work since [^yes] does not negate a sequence of chars, but the chars inside the class (i.e. [^yes] matches any char but y, e and s).
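
The consecutive-match problem can be reproduced with a short Java sketch. The input "&list=a&list=b" is a made-up example, as are the class and field names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsecutiveMatches {
    // Consumes the trailing &, so the next match cannot start with it.
    static final Pattern CONSUMING = Pattern.compile("[&?]list=(.*?)(&|$)");
    // Lookaheads only check the delimiter and leave it in place.
    static final Pattern POSITIVE = Pattern.compile("[&?]list=(.*?)(?=&|$)");
    static final Pattern NEGATIVE = Pattern.compile("[&?]list=(.*?)(?![^&])");

    // Collects group 1 of every match.
    static List<String> values(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "&list=a&list=b";
        System.out.println(values(CONSUMING, input)); // second value is missed
        System.out.println(values(POSITIVE, input));
        System.out.println(values(NEGATIVE, input));
    }
}
```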

Regex-How to prevent repeated special characters?

I don't have any experience with regular expressions. I need a regular expression that doesn't allow special characters (+-*/& etc.) to be repeated.
The string can contain digits, alphanumerics, and special characters.
This should be valid : abc,df
This should be invalid : abc-,df
I would really appreciate your help. Thanks in advance.

Two solutions presented so far match a string that is not allowed. But the title is How to prevent..., so I assume that the regex should match the allowed string. It means that the regex should:
match the whole string if it does not contain 2 consecutive special characters,
not match otherwise.
You can achieve this by putting together the following parts:
^ - start of string anchor,
(?!.*[...]{2}) - a negative lookahead for 2 consecutive special characters (marked here as ...), in any place,
a regex matching the whole (non-empty) string,
$ - end of string anchor.
So the whole regex should be:
^(?!.*[!##$%^&*()\-_+={}[\]|\\;:'",<.>\/?]{2}).+$
Note that within a char class (between [ and ]), a backslash must be placed before - (if in the middle of the sequence), the closing square bracket, a backslash itself, and / (the regex terminator).
Or, if you want to apply the regex to individual words (not the whole string), then the regex should be:
\b(?!\S*[!##$%^&*()\-_+={}[\]|\\;:'",<.>\/?]{2})\S+
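
A hedged Java sketch of the same idea, using a deliberately reduced set of special characters chosen for this example (+ - * / & , [ ]) rather than the full list above. One Java-specific point: an unescaped [ inside a character class starts a nested class there, so it is escaped as \[ below. The class name NoRepeatedSpecials is made up.

```java
import java.util.regex.Pattern;

class NoRepeatedSpecials {
    // ^            start of string
    // (?!.*[..]{2}) no two consecutive special characters anywhere
    // .+$          the whole, non-empty string
    private static final Pattern VALID =
            Pattern.compile("^(?!.*[+\\-*/&,\\[\\]]{2}).+$");

    static boolean isValid(String input) {
        return VALID.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("abc,df"));   // single special: allowed
        System.out.println(isValid("abc-,df"));  // "-," is two in a row: rejected
    }
}
```

Extending the set is a matter of adding characters to the class, keeping the escaping rules in mind.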

[\,\+\-\*\/\&]{2,} Add more characters in the square bracket if you want.
Demo https://regex101.com/r/CBrldL/2

Use the following regex to match the invalid string.
[^A-Za-z0-9]{2,}

[^\w!\s]{2,} This would be the shortest version to match any two consecutive special characters (ignoring space)
If you want to consider space, please use [^\w]{2,}
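
A quick Java sketch of how these two patterns differ (class name and inputs are made up). Note that, as written, the first class also excludes !, so a run of exclamation marks is not flagged by it.

```java
import java.util.regex.Pattern;

class RepeatedSpecialFinder {
    static final Pattern IGNORING_SPACE = Pattern.compile("[^\\w!\\s]{2,}");
    static final Pattern COUNTING_SPACE = Pattern.compile("[^\\w]{2,}");

    // True if the input contains a run of 2+ characters matched by the pattern,
    // i.e. the input is invalid.
    static boolean hasRepeat(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasRepeat(IGNORING_SPACE, "abc,df"));
        System.out.println(hasRepeat(IGNORING_SPACE, "abc-,df"));
        // ", " counts as a repeat only when whitespace is treated as special
        System.out.println(hasRepeat(IGNORING_SPACE, "a, b"));
        System.out.println(hasRepeat(COUNTING_SPACE, "a, b"));
    }
}
```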

How to Java Regex to match everything but specified pattern

I am trying to match everything but garbage values in the entire string. The pattern I am trying to use is:
^.*(?!\w|\s|-|\.|[#:,]).*$
I have been testing the pattern on regexPlanet and it seems to match the entire string. The input string I was using was:
Vamsi///#k03#g!!!l.com 123**5
How can I get it to match only everything except the pattern? I would like to replace any string that matches with an empty space or a special character of my choice.

The pattern, as written, is supposed to match the whole string.
^ - start of string.
.* - zero or more of any character.
(?!\w|\s|-|\.|[#:,]) - negative look-ahead for some characters.
.* - zero or more of any character.
$ - end of string.
If you only want to match characters which aren't one of the supplied characters, try simply:
[^-\w\s.#:,]
[^...] is a negated character class; it matches any single character not listed in the brackets.
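
Applied in Java to the input from the question, this could look roughly like the following (the class name GarbageStripper and the "_" replacement are made up for the example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GarbageStripper {
    // Any single character that is not -, a word char, whitespace, ., #, : or ,
    private static final Pattern NOT_ALLOWED = Pattern.compile("[^-\\w\\s.#:,]");

    static String clean(String input, String replacement) {
        // quoteReplacement keeps $ and \ in the replacement literal
        return NOT_ALLOWED.matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String input = "Vamsi///#k03#g!!!l.com 123**5";
        System.out.println(clean(input, ""));
        System.out.println(clean(input, "_"));
        // The original pattern matches the whole string: the first .* consumes
        // everything and the negative lookahead trivially succeeds at the end.
        System.out.println(input.matches("^.*(?!\\w|\\s|-|\\.|[#:,]).*$"));
    }
}
```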

Java String validation only one alphanumeric with Regex

I want to validate a String that can contain only alphanumeric characters and only one special character. I tried (\\W).{1,1}(\\w+).
But it matches only when the string starts with a special character, whereas the special character can be at any place in the String.

Use the ^ and $ anchors to instruct the regex engine to start matching from the beginning of the string and stop matching at the end of the string, so taking your regex:
^(\\W).{1,1}(\\w+)$
Please take a look at this Oracle (Java) tutorial on regular expressions.

Try this regexp: \w*\W?\w* (Java string: "\\w*\\W?\\w*")
This expression has a drawback of matching zero-length strings. If your input must have exactly one special character, remove the question mark ? from the expression.
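
A short Java sketch of both variants, using matches() so the whole string has to fit the pattern (class name and inputs are made up):

```java
import java.util.regex.Pattern;

class SingleSpecialValidator {
    // At most one non-word character, anywhere in the string.
    static final Pattern AT_MOST_ONE = Pattern.compile("\\w*\\W?\\w*");
    // Exactly one non-word character (question mark removed).
    static final Pattern EXACTLY_ONE = Pattern.compile("\\w*\\W\\w*");

    static boolean isValid(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(AT_MOST_ONE, "ab-c"));
        System.out.println(isValid(AT_MOST_ONE, "a-b-c")); // two specials
        System.out.println(isValid(AT_MOST_ONE, ""));      // the zero-length drawback
        System.out.println(isValid(EXACTLY_ONE, "abc"));   // no special at all
    }
}
```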

Use matcher.find() and not matcher.matches(), search for \\w, and remove the plus (+), because it will match the whole sequence of alphanumeric characters in your string. If your string contains only them, your regex will match the whole string.

If I understand your regex correctly, this could solve your problem:
([\w]+)([^\w])([\w]+)
