Java's regex confusion

Basically, I want to match filenames with a .json extension, but not files that start with . and not list.json.
This is what I came up with (without Java string escapes):
(?i)^([^\.][^list].+|list.+)\.json$
I used an online regex tester, RegexPlanet, to try my regex:
http://fiddle.re/x9g86
Everything works fine in the regex tester. However, when I tried it in Java, everything that has the letters l, i, s, or t is excluded, which is very confusing to me.
Can anyone give me some clues?
Many thanks in advance.

I want to match filenames with a .json extension, but not files that start with . and not list.json.
I am not sure you need regular expressions for this. I find the following much easier on the eye:
boolean match = s.endsWith(".json") && !s.startsWith(".") && !s.equals("list.json");
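One difference from the regex answers is worth noting: the regexes use (?i), while endsWith and equals are case-sensitive. A minimal sketch of a case-insensitive variant follows, assuming that case-insensitivity is actually wanted; the class and method names are made up for illustration.

```java
import java.util.Locale;

// Hypothetical helper, not part of the original answer.
final class JsonFileCheck {
    // Lower-case once, then apply the same three plain string checks.
    static boolean isWantedJson(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.endsWith(".json")
                && !lower.startsWith(".")
                && !lower.equals("list.json");
    }

    public static void main(String[] args) {
        for (String name : new String[] {"a.Json", "abc.json", "list.json", ".json"}) {
            System.out.println(name + " -> " + isWantedJson(name));
        }
    }
}
```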

You're using a character exclusion class, [^list], which ignores character order and instead of excluding list, excludes any cases of l, i, s, or t.
Instead, you want to use a negative lookahead:
(?i)(?!^list\.json$)[^\.].*\.json
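To make the difference concrete, here is a small sketch that runs the question's character-class regex and the lookahead regex against a few sample names. The class name, method names, and sample filenames are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical comparison harness for the two patterns discussed above.
final class JsonNameFilter {
    // From the question: [^list] rejects one character that is l, i, s, or t.
    static final Pattern CHAR_CLASS =
            Pattern.compile("(?i)^([^\\.][^list].+|list.+)\\.json$");
    // From this answer: the lookahead rejects only the exact name list.json.
    static final Pattern LOOKAHEAD =
            Pattern.compile("(?i)(?!^list\\.json$)[^\\.].*\\.json");

    static boolean acceptedByCharClass(String name) {
        return CHAR_CLASS.matcher(name).matches();
    }

    static boolean acceptedByLookahead(String name) {
        return LOOKAHEAD.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] names = {"data.json", "site.json", "listing.json", "list.json", ".hidden.json"};
        for (String name : names) {
            System.out.println(name
                    + " charClass=" + acceptedByCharClass(name)
                    + " lookahead=" + acceptedByLookahead(name));
        }
    }
}
```

A name such as site.json is rejected by the character-class version because its second character is i, while the lookahead version accepts it.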

A negative look-ahead will do it.
(?i)(?!\.|list\.json$).*\.json
(?!\.|list\.json$) is a negative look-ahead that checks that the characters that follow are neither a literal . nor list.json followed by the end of the string.
Code:
String regex = "(?i)(?!\\.|list\\.json$).*\\.json";
System.out.println("list.json".matches(regex)); // false
System.out.println(".json".matches(regex)); // false
System.out.println("a.Json".matches(regex)); // true
System.out.println("abc.json".matches(regex)); // true
But NPE's more readable solution is probably preferred.

Related

validating input string "RX-EZ12345678912345B" using regex

I need to validate input string which should be in the below format:
<2_upper_case_letters><"-"><2_upper_case_letters><14-digit number><1_uppercase_letter>
Ex: RX-EZ12345678912345B
I tried something like this: ^[IN]-?[A-Z]{0,2}?\\d{0,14}[A-Z]{0,1} but it's not giving the expected result.
Any help will be appreciated.
Thanks

Your biggest problem is the [IN] at the beginning, which matches only one letter, and only if it's I or N. If you want to match two of any letters, use [A-Z]{2}.
Once you fix that, your regex will still only match RX-E. That's because [A-Z]{0,2}? starts out trying to consume nothing, thanks to the reluctant quantifier, {0,2}?. Then \d{0,14} matches zero digits, and [A-Z]{0,1} greedily consumes the E.
If you want to match exactly 2 letters and 14 digits, use [A-Z]{2} and \d{14}. And since you're validating the string, you should end the regex with the end anchor, $. Result:
^[A-Z]{2}-[A-Z]{2}\d{14}[A-Z]$
...or, as a Java string literal:
"^[A-Z]{2}-[A-Z]{2}\\d{14}[A-Z]$"
As @nhahtdh observed, you don't really have to use the anchors if you're using Java's matches() method to apply the regex, but I recommend doing so anyway. It communicates your intent better, and it makes the regex portable, in case you have to use it in a different flavor or context.
EDIT: If the first two characters should be exactly IN, it would be
^IN-[A-Z]{2}\d{14}[A-Z]$
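The behaviour of the reluctant quantifier described above can be checked with a short sketch. It uses the question's regex with only the [IN] part replaced by [A-Z]{2}, next to the strict version; the class and method names are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the two patterns are taken from the discussion above.
final class SerialValidator {
    // Question's regex after replacing [IN] with [A-Z]{2}; still too permissive.
    static final Pattern LOOSE =
            Pattern.compile("^[A-Z]{2}-?[A-Z]{0,2}?\\d{0,14}[A-Z]{0,1}");
    // Exact counts plus both anchors.
    static final Pattern STRICT =
            Pattern.compile("^[A-Z]{2}-[A-Z]{2}\\d{14}[A-Z]$");

    // Returns the text the loose pattern actually consumes, or null if it finds nothing.
    static String loosePrefix(String input) {
        Matcher m = LOOSE.matcher(input);
        return m.find() ? m.group() : null;
    }

    static boolean isValid(String input) {
        return STRICT.matcher(input).matches();
    }

    public static void main(String[] args) {
        String sample = "RX-EZ12345678912345B";
        System.out.println("loose match: " + loosePrefix(sample));
        System.out.println("strict valid: " + isValid(sample));
    }
}
```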

Simply translating your requirements into a java regex:
"^[A-Z]{2}-[A-Z]{2}\\d{14}[A-Z]$"
This will allow you to use:
if (!input.matches("^[A-Z]{2}-[A-Z]{2}\\d{14}[A-Z]$")) {
    // do something because input is invalid
}

Not sure what you are trying to do at the beginning of your current regex.
"^[A-Z]{2}-[A-Z]{2}\\d{14}[A-Z]$"
The regex above will strictly match the input string as you specified. If you use the matches() method, ^ and $ may be omitted.
Since you want an exact number of repetitions, you should specify it as {<number>} only. {<number>,<number>} is used for a variable number of repetitions. And ? specifies that the preceding token may or may not appear; if the token must be there, then adding ? is incorrect.
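As a rough illustration of why the anchors can be omitted with matches() but not with find(), the sketch below applies the same unanchored pattern both ways. The class name and the sample strings are assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical demo of whole-string matching versus substring searching.
final class AnchorDemo {
    // Same pattern as above, but without ^ and $.
    static final Pattern UNANCHORED =
            Pattern.compile("[A-Z]{2}-[A-Z]{2}\\d{14}[A-Z]");

    // matches() succeeds only if the pattern consumes the entire input.
    static boolean wholeString(String s) {
        return UNANCHORED.matcher(s).matches();
    }

    // find() succeeds if the pattern occurs anywhere in the input.
    static boolean anywhere(String s) {
        return UNANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        String padded = "id=RX-EZ12345678912345B;";
        System.out.println("matches(): " + wholeString(padded));
        System.out.println("find():    " + anywhere(padded));
    }
}
```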

^[A-Z]{2}-[A-Z]{2}\d{14}[A-Z]$
This should serve your purpose. In a Java string literal, write \d as \\d.

This should solve your problem. Check out the validity here
^[A-Z]{2}-[A-Z]{2}[0-9]{14}[A-Z]$

^([A-Z]{2,2}[-]{1,1}[A-Z]{2,2}[0-9]{14,14}[A-Z]{1,1}){1,1}$

Necessary to escape a java regular expression in matches()?

I'm currently doing a test on an HTTP Origin to determine if it came from SSL:
(HttpHeaders.Names.ORIGIN).matches("/^https:\\/\\//")
But I'm finding it's not working. Do I need to escape matches() strings like a regular expression or can I leave it like https://? Is there any way to do a simple string match?
Seems like it would be a simple question, but surprisingly I'm not getting anywhere even after using a RegEx tester http://www.regexplanet.com/advanced/java/index.html. Thanks.

Java's regex doesn't need delimiters. Simply do:
.matches("https://.*")
Note that matches validates the entire input string, hence the .* at the end. And if the input contains line break chars (which . will not match), enable DOT-ALL:
.matches("(?s)https://.*")
Of course, you could also simply do:
.startsWith("https://")
which takes a plain string (no regex pattern).
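The three variants can be compared side by side in a small sketch. The class and method names are hypothetical, and the origin value is passed in as a plain string rather than read from any HTTP library.

```java
// Hypothetical comparison of the approaches discussed above.
final class OriginCheck {
    // The question's pattern: the slashes are treated as literal characters, so it never matches.
    static boolean viaDelimitedRegex(String origin) {
        return origin.matches("/^https:\\/\\//");
    }

    // Regex without delimiters; .* is needed because matches() covers the whole string.
    static boolean viaRegex(String origin) {
        return origin.matches("https://.*");
    }

    // Plain prefix test, no regex involved.
    static boolean viaStartsWith(String origin) {
        return origin.startsWith("https://");
    }

    public static void main(String[] args) {
        String origin = "https://example.com";
        System.out.println("delimited regex: " + viaDelimitedRegex(origin));
        System.out.println("plain regex:     " + viaRegex(origin));
        System.out.println("startsWith:      " + viaStartsWith(origin));
    }
}
```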

How about this regex (forward slashes need no escaping in a Java string):
"^(https:)//.*"
It works in your tester

how to negate any regular expression in Java

I have a regular expression which I want to negate, e.g.
/(.{0,4})
for which String.matches returns the following:
"/1234" true
"/12" true
"/" true
"" false
"1234" false
"/12345" false
Is there a way to negate the above (using regex only) so that the results are:
"/1234" false
"/12" false
"/" false
"" true
"1234" true
"/12345" true
I'm looking for a general solution that would work for any regex without rewriting the whole regex.
I have looked at the question "How to negate the whole regex?", which suggests using (?!pattern), but that doesn't seem to work for me.
The following regex
(?!/(.{0,4}))
returns the following:
"/1234" false
"/12" false
"/" false
"" true
"1234" false
"/12345" false
which is not what I want.
Any help would be appreciated.

You need to add anchors. The original regex (minus the unneeded parentheses):
/.{0,4}
...matches a string that contains a slash followed by zero to four more characters. But, because you're using the matches() method it's automatically anchored, as if it were really:
^/.{0,4}$
To achieve the inverse of that, you can't rely on automatic anchoring; you have to make at least the end anchor explicit within the lookahead. You also have to "pad" the regex with a .* because matches() requires the regex to consume the whole string:
(?!/.{0,4}$).*
But I recommend that you explicitly anchor the whole regex, like so:
^(?!/.{0,4}$).*$
It does no harm, and it makes your intention perfectly clear, especially to people who learned regexes from other flavors like Perl or JavaScript. The automatic anchoring of the matches() method is highly unusual.
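As a quick check, the sketch below runs the original and the negated regex over the six strings from the question. The class and method names are assumptions made for the example.

```java
// Hypothetical harness for the original and negated patterns discussed above.
final class NegationDemo {
    static final String ORIGINAL = "/.{0,4}";
    static final String NEGATED = "^(?!/.{0,4}$).*$";

    static boolean matchesOriginal(String s) {
        return s.matches(ORIGINAL);
    }

    static boolean matchesNegated(String s) {
        return s.matches(NEGATED);
    }

    public static void main(String[] args) {
        String[] inputs = {"/1234", "/12", "/", "", "1234", "/12345"};
        for (String s : inputs) {
            System.out.println("\"" + s + "\" original=" + matchesOriginal(s)
                    + " negated=" + matchesNegated(s));
        }
    }
}
```

For each of these inputs the two results are opposites of each other, which is the behaviour the question asks for.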

I know this is a really old question, but hopefully my answer can help anyone looking for this in the future.
Alan Moore's answer is almost correct, but you also need to group the whole regex, or else you risk anchoring only part of the original regex.
For example, suppose you want to negate the regex abc|def (which matches either "abc" or "def").
Prepending (?! and appending $).* gives you (?!abc|def$).*.
The anchor here applies only to def, meaning that "abcx" will not match when it should.
I would rather prepend (?!(?: and append )$).*:
String negateRegex(String regex) {
    return "(?!(?:" + regex + ")$).*";
}
From my testing it looks like negateRegex(negateRegex(regex)) would indeed be functionally the same as regex.
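The grouping problem can be reproduced with a short sketch that builds the negation both with and without the non-capturing group. The class name and the negateUngrouped helper are hypothetical; negateRegex mirrors the method above.

```java
// Hypothetical demo contrasting ungrouped and grouped negation.
final class RegexNegator {
    // Without grouping, the $ binds only to the last alternative.
    static String negateUngrouped(String regex) {
        return "(?!" + regex + "$).*";
    }

    // With grouping, the $ applies to the whole original regex.
    static String negateRegex(String regex) {
        return "(?!(?:" + regex + ")$).*";
    }

    public static void main(String[] args) {
        String regex = "abc|def";
        System.out.println("ungrouped, abcx: " + "abcx".matches(negateUngrouped(regex)));
        System.out.println("grouped, abcx:   " + "abcx".matches(negateRegex(regex)));
        System.out.println("double negation, abc: "
                + "abc".matches(negateRegex(negateRegex(regex))));
    }
}
```

Note that . does not match line terminators by default, so this sketch assumes single-line input.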

Assuming our regex is MYREG, match other lines with:
^(?:(?!.*MYREG).*)$

How to exclude occurrence of a substring from a string using regex?

I have a string input in the following two forms.
1.
<!--XYZdfdjf., 15456, hdfv.4002-->
<!DOCTYPE
2.
<!--XYZdfdjf., 15456, hdfv.4002
<!DOCTYPE
I want to return a match if form 2 is encountered and no match for form 1.
Basically, I want a regex that accepts arbitrary characters between <!-- and <!DOCTYPE, except when there is an occurrence of --> in between.
I am using Pattern, Matcher and Java regex.
I am specifically looking for a regex that is usable with Pattern.compile().
Thanks in advance.

Pattern p = Pattern.compile("(?s)<!--(?:(?!-->).)*<!DOCTYPE");
(?:(?!-->).)* matches one character at a time, after checking that it's not the first character of -->.
(?s) sets DOTALL mode (a.k.a. single-line mode), allowing the . to match newline characters.
If there's a possibility of two or more matches and you want to find them individually, you can replace the * with a non-greedy *?, like so:
"(?s)<!--(?:(?!-->).)*?<!DOCTYPE"
For example, applying that regex to the text of your question will find two matches, while the original regex will find one, longer match.
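Here is a minimal sketch that applies the first pattern to the two forms from the question. The class and method names are illustrative assumptions, and the two sample strings are the question's forms joined with a newline.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern from this answer.
final class CommentScanner {
    // Each character is consumed only if "-->" does not start at that position.
    static final Pattern UNCLOSED =
            Pattern.compile("(?s)<!--(?:(?!-->).)*<!DOCTYPE");

    static boolean hasUnclosedCommentBeforeDoctype(String html) {
        return UNCLOSED.matcher(html).find();
    }

    public static void main(String[] args) {
        String form1 = "<!--XYZdfdjf., 15456, hdfv.4002-->\n<!DOCTYPE";
        String form2 = "<!--XYZdfdjf., 15456, hdfv.4002\n<!DOCTYPE";
        System.out.println("form 1: " + hasUnclosedCommentBeforeDoctype(form1));
        System.out.println("form 2: " + hasUnclosedCommentBeforeDoctype(form2));
    }
}
```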

This seems like it is easily solved by using String.contains():
if (yourHtml.contains("-->")) {
    // exclude
} else {
    // extract the content you need
    String content =
        yourHtml.substring("<!--".length(), yourHtml.indexOf("<!DOCTYPE"));
}
I think you are looking too far into it.

\<!--([\s\S](?!--\>))*?(?=\<\!DOCTYPE)
this uses a negative lookahead to prevent the --> and a positive lookahead to find the <!DOCTYPE
Here's a good reference for atomic assertions (lookahead and behind).

I don't have a testing system handy, so I can't give you the regex, but you should look in the Pattern documentation for something called a negative lookahead assertion. This allows you to express rules of the form: match this if not followed by that.
It should help you :)

A regular expression might not be the best answer to your problem. Have you tried splitting the first line away from everything else and seeing if it contains the -->?
Specifically, something like:
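// sample input (form 2 from the question)
String htmlString = "<!--XYZdfdjf., 15456, hdfv.4002\n<!DOCTYPE";
String firstLine = htmlString.split("\r?\n")[0];
if (firstLine.contains("-->")) {
    // no match: the comment is closed on the first line
} else {
    // match
}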

How do I write a regular expression to find the following pattern?

I am trying to write a regular expression to do a find and replace operation. Assume Java regex syntax. Below are examples of what I am trying to find:
12341+1
12241+1R1
100001+1R2
So, I am searching for a string beginning with one or more digits, followed by a "1+1" substring, followed by 0 or more characters. I have the following regex:
^(\d+)(1\\+1).*
This regex will successfully find the examples above, however, my goal is to replace the strings with everything before "1+1". So, 12341+1 would become 1234, and 12241+1R1 would become 1224. If I use the first grouped expression $1 to replace the pattern, I get the wrong result as follows:
12341+1 becomes 12341
12241+1R1 becomes 12241
100001+1R2 becomes 100001
Any ideas?

Your existing regex works fine; you are just missing a \ before \d in the Java string:
String str = "100001+1R2";
str = str.replaceAll("^(\\d+)(1\\+1).*","$1");

IMHO, the regex is correct.
Perhaps you wrote it wrong in the code. If you want to use the regex ^(\d+)(1\+1).* in a Java string, you have to write something like String regex = "^(\\d+)(1\\+1).*".
Your output is most likely caused by a missing backslash in the string literal, so that the + is no longer treated as a literal plus sign.
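A short sketch with the fully escaped string literal, applied to the three examples from the question, is shown below. The class and method names are assumptions for illustration.

```java
// Hypothetical helper wrapping the replaceAll call with the escaped literal.
final class PrefixExtractor {
    // In Java source, each regex backslash is doubled: \d becomes \\d and \+ becomes \\+.
    static String stripFromOnePlusOne(String s) {
        return s.replaceAll("^(\\d+)(1\\+1).*", "$1");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12341+1", "12241+1R1", "100001+1R2"}) {
            System.out.println(s + " -> " + stripFromOnePlusOne(s));
        }
    }
}
```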

Your regex looks fine to me. I don't have access to Java, but in JavaScript the code
"12341+1".replace(/(\d+)(1\+1)/g, "$1");
returns 1234, as you'd expect. This also works on a string containing several 'codes', e.g.
"12341+1 54321+1".replace(/(\d+)(1\+1)/g, "$1");
gives 1234 5432.

Personally, I wouldn't use a regex at all (it'd be like using a hammer on a thumbtack). I'd just take a substring (pseudocode):
stringName.substring(0, stringName.indexOf("1+1"))
But it looks like other posters have already mentioned the non-greedy operator.
In most regex syntaxes you can add a '?' after a '+' or '*' to indicate that you want it to match as little as possible before moving on in the pattern. Thus ^(\d+?)(1\+1) matches as few digits as possible before the "1+1", so the captured group does not include the "1+1". Note that the greedy ^(\d+)(1\+1) captures the same digits here, because the engine backtracks until the literal 1+1 can match.
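For inputs like those in the question, a greedy and a reluctant quantifier end up capturing the same prefix, because the greedy version backtracks until the literal 1+1 fits. The sketch below compares both with the substring approach; the class and method names are hypothetical, and returning null or the unchanged input when "1+1" is absent is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison of three ways to take the text before "1+1".
final class QuantifierComparison {
    static final Pattern GREEDY = Pattern.compile("^(\\d+)1\\+1");
    static final Pattern RELUCTANT = Pattern.compile("^(\\d+?)1\\+1");

    // Returns group 1 of the first match, or null when the pattern does not match.
    private static String firstGroup(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    static String greedy(String s) {
        return firstGroup(GREEDY, s);
    }

    static String reluctant(String s) {
        return firstGroup(RELUCTANT, s);
    }

    // Substring approach; returns the input unchanged if "1+1" is absent.
    static String viaIndexOf(String s) {
        int i = s.indexOf("1+1");
        return i < 0 ? s : s.substring(0, i);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12341+1", "12241+1R1", "100001+1R2"}) {
            System.out.println(s + " greedy=" + greedy(s)
                    + " reluctant=" + reluctant(s)
                    + " indexOf=" + viaIndexOf(s));
        }
    }
}
```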
