Invalid Java Regex for JSON - java

For some reason this regex keeps saying it is invalid in java, but on the online testers it works fine:
({.+?})
I am using it for a JSON data structure.
Is there a way to get it to work with Java?
Included link: http://regexr.com/3bs0p

Here are my 2 cents:
You need to escape the opening { in Java regex to tell the engine it is not the brace starting the limiting quantifier.
Do not use regexr.com to test Java regexps; use OCPSoft Visual Regex Tester or RegexPlanet, which support Java regex syntax.
Do not use round brackets around the whole pattern; you can always refer to the whole match with .group(0) or the $0 back-reference.
The regex should look like
String pattern = "\\{.+?}"; // Note that `.+?` requires at least 1 character,
// Use .*? to allow 0 characters
And do not parse JSON with regex. See How to parse JSON in Java.
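As a rough illustration of the points above, the following sketch compiles the escaped pattern and reads each match through group(0), so no outer parentheses are needed. The class name, method name and sample input are made up for the example, and it only pulls flat brace-delimited chunks out of a string; it is not a substitute for a real JSON parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceMatchDemo {
    // The opening brace is escaped; the closing brace needs no escape in Java.
    private static final Pattern BRACED = Pattern.compile("\\{.+?}");

    // Collects every non-overlapping match; group(0) is the whole match.
    public static List<String> findBraced(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = BRACED.matcher(input);
        while (m.find()) {
            matches.add(m.group(0));
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical input with two flat objects.
        System.out.println(findBraced("{\"a\":1} and {\"b\":2}")); // [{"a":1}, {"b":2}]
    }
}
```

Because .+? needs at least one character, an input of "{}" produces no match with this pattern.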

You probably need to escape your { } with backslashes, since you are treating them as literal characters.
E.g.
(\\{.+?\\})

Related

Regex for detecting struts `.do` pages - tests failing

I am testing the following regex that exists in an older project I have inherited:
.*\\.do
Within Java, the regex is declared as:
private static final String[] ACCESS_REGEX = {".*\\.do", ""};
And is essentially checked using the wrapper for Pattern.matches method: value.matches(check).
This old regex is working fine for various incoming requests such as home.do and I am doing a test on various regex test sites (listed below):
http://www.regexplanet.com/advanced/java/index.html
http://www.freeformatter.com/java-regex-tester.html
However, I can't seem to get the regex to match various strings that I believe should match. I thought the regex above matches strings that end with .do and have some characters in front. However, when I test for these, no matches are found.
Example Test Strings:
home.do
\home.do
mmm\mmm\home.do
\mmm\home.do
home.do
Keep in mind the special meaning that the \ character has both in regular expressions and in Java string literals!
The regular expression should be
.*\.do
This works very well on http://www.freeformatter.com/java-regex-tester.html.
In a Java string literal you also need to escape the \ character, hence the regular expression in Java must be
.*\\.do
The issue is that the online regex tools you're using expect plain regex (without special characters escaped), which is .*\.do in your particular case - mind the single backslash.
On the other hand, when defined in a string literal in Java, regexes need special characters escaped, hence ".*\\.do" in your Java code.
Use unescaped regexes in the online test tools.
You need a single backslash rather than a double backslash in the regexp when you type it into the tester.
You want to escape the dot only. Remove the first backslash and the tests will pass.
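The difference between the two forms can be checked directly in Java. The sketch below is only an illustration (the class and constant names are invented here): it prints the pattern the engine actually receives from the Java literal, and shows what the engine would be asked to match if the Java-escaped text were pasted into a tester unchanged.

```java
import java.util.regex.Pattern;

class StrutsRegexDemo {
    // As written in Java source: the compiler turns "\\." into the two characters \.
    static final String JAVA_LITERAL = ".*\\.do";

    // What the engine receives if .*\\.do is pasted verbatim into a tester:
    // "anything, a literal backslash, any one character, then do".
    static final String PASTED_INTO_TESTER = ".*\\\\.do";

    public static void main(String[] args) {
        System.out.println(JAVA_LITERAL);                                   // .*\.do
        System.out.println("home.do".matches(JAVA_LITERAL));               // true
        System.out.println("mmm\\mmm\\home.do".matches(JAVA_LITERAL));     // true
        System.out.println("home.do".matches(PASTED_INTO_TESTER));         // false
        // Only a string containing a real backslash before the last three characters matches:
        System.out.println(Pattern.matches(PASTED_INTO_TESTER, "a\\.do")); // true
    }
}
```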

Java replaceAll to javascript regex

I want to move some user input tests from Java to JavaScript. The code is supposed to remove wildcard characters from the user input string, at any position. I'm attempting to convert the following Java notation to JavaScript, but keep getting the error
"Invalid regular expression: /(?<!\")~[\\d\\.]*|\\?|\\*/: Invalid group".
I have almost no experience with regex expressions. Any help will be much appreciated:
JAVA:
str = str.replaceAll("(?<!\")~[\\d\\.]*|\\?|\\*","");
My failing javascript version:
input = input.replace( /(?<!\")~[\\d\\.]*|\\?|\\*/g, '');
The problem, as anubhava points out, is that JavaScript (before ES2018) doesn't support lookbehind assertions. Sad but true. The lookbehind assertion in your original regex is (?<!\"). Specifically, it lets the tilde match only when it is not immediately preceded by a double quotation mark.
However, all is not lost. There are some tricks you can use to achieve the same result as a lookbehind. In this case, the lookbehind is there only to prevent the character prior to the tilde from being replaced as well. We can accomplish this in JavaScript by matching the character anyway, but then including it in the replacement:
input = input.replace( /([^"])~[\d.]*|\?|\*/g, '$1' );
Note that for the alternations \? and \*, there will be no groups, so $1 will evaluate to the empty string, so it doesn't hurt to include it in the replacement.
NOTE: this is not 100% equivalent to the original regular expression. In particular, lookaround assertions (like the lookbehind above) also prevent the input stream from being consumed, which can sometimes be very helpful when matching things that are right next to each other. However, in this case, I can't think of a way that that would be a problem. To make a completely equivalent regex would be more difficult, but I believe this meets the need of the original regex.
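Since Java accepts both forms, the two patterns can be compared side by side there. The following is a minimal sketch with made-up method names and sample inputs; it shows a case where the results agree and one case (a tilde at the very start of the input) where the capture-group version differs, because there is no preceding character for it to consume.

```java
class LookbehindWorkaroundDemo {
    // The original pattern from the question (lookbehind form).
    public static String withLookbehind(String s) {
        return s.replaceAll("(?<!\")~[\\d.]*|\\?|\\*", "");
    }

    // The workaround: consume the preceding character and put it back via $1.
    public static String withCapture(String s) {
        return s.replaceAll("([^\"])~[\\d.]*|\\?|\\*", "$1");
    }

    public static void main(String[] args) {
        String typical = "foo~2.5 \"bar\"~3 a?b*";
        System.out.println(withLookbehind(typical)); // foo "bar"~3 ab
        System.out.println(withCapture(typical));    // foo "bar"~3 ab

        String leadingTilde = "~1abc";
        System.out.println(withLookbehind(leadingTilde)); // abc
        System.out.println(withCapture(leadingTilde));    // ~1abc
    }
}
```

If a leading tilde matters, using (^|[^"]) in place of ([^"]) should cover that case as well, since the group can then match the empty string at the start of the input.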

Necessary to escape a java regular expression in matches()?

I'm currently doing a test on an HTTP Origin to determine if it came from SSL:
(HttpHeaders.Names.ORIGIN).matches("/^https:\\/\\//")
But I'm finding it's not working. Do I need to escape matches() strings like a regular expression or can I leave it like https://? Is there any way to do a simple string match?
Seems like it would be a simple question, but surprisingly I'm not getting anywhere even after using a RegEx tester http://www.regexplanet.com/advanced/java/index.html. Thanks.
Java's regex doesn't need delimiters. Simply do:
.matches("https://.*")
Note that matches validates the entire input string, hence the .* at the end. And if the input contains line break chars (which . will not match), enable DOT-ALL:
.matches("(?s)https://.*")
Of course, you could also simply do:
.startsWith("https://")
which takes a plain string (no regex pattern).
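To make the whole-string behaviour of matches concrete, here is a small sketch. The origin values are hypothetical and the class and method names are invented for the example.

```java
class HttpsOriginDemo {
    // Regex variant: DOT-ALL so that line breaks after the prefix do not break the match.
    public static boolean isHttpsByRegex(String origin) {
        return origin.matches("(?s)https://.*");
    }

    // Plain-string variant: no regex involved at all.
    public static boolean isHttpsByPrefix(String origin) {
        return origin.startsWith("https://");
    }

    public static void main(String[] args) {
        String origin = "https://example.com"; // hypothetical header value

        System.out.println(origin.matches("https://"));         // false: the whole string must match
        System.out.println(origin.matches("https://.*"));       // true
        // The / delimiters are treated as literal characters, so this cannot match:
        System.out.println(origin.matches("/^https:\\/\\//"));  // false

        System.out.println("https://a\nb".matches("https://.*")); // false: . stops at the line break
        System.out.println(isHttpsByRegex("https://a\nb"));       // true
        System.out.println(isHttpsByPrefix(origin));              // true
    }
}
```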
How about this Regex:
"^(https:)\/\/.*"
It works in your tester

Regex conversion from java to php

I have a regular expression in php and I need to convert it to java.
Is it possible to do so? If yes, how can I do it?
Thanks in advance
$region_pattern = "/<a href=\"#\"><img src=\"images\/ponto_[^\.]+\.gif\"[^>]*>[&nbsp;]*<strong>(?P<neighborhood>[^\(<]+)\((?P<region>[^\)]+)\)<\/strong><\/a>/i" ;
A typical conversion from any regex to java is to:
Exclude pattern delimiters => remove starting and trailing /
Remove flags (here, the trailing i); in Java these are applied to the Pattern object. You should either pass the flag when creating the Pattern object or prepend it to the regex, like (?i)<regex>
Replace every \ with \\. The backslash already has a meaning in Java (it is the escape character in string literals), so to use a backslash inside a regex in Java you have to write \\ instead of \. Thus \w becomes \\w, and \\ becomes \\\\
Above regex would become
Pattern.compile("<a href=\"#\"><img src=\"images\\/ponto_[^\\.]+\\.gif\"[^>]*>[&nbsp;]*<strong>(?P<neighborhood>[^\\(<]+)\\((?P<region>[^\\)]+)\\)<\\/strong><\\/a>", Pattern.CASE_INSENSITIVE);
This will fail, however: Java does not accept the (?P<name>...) named-group syntax, so as written it is an invalid regex in Java.
There are some problems with the original regex that have to be cleared away first. First, there's [&nbsp;], which matches one of the characters &, n, b, s, p or ;. To match an actual non-breaking space character, you should use \xA0.
You also have a lot of unneeded backslashes in there. You can get rid of some by changing the regex delimiter to something other than /; others aren't needed because they're inside character classes, where most metacharacters lose their special meanings. That leaves you with this PHP regex:
"~<img src=\"images/ponto_[^.]+\.gif\"[^>]*>\xA0*<strong>(?P<neighborhood>[^(<]+)\((?P<region>[^)]+)\)</strong>~i"
There are three things that make this regex incompatible with Java. One is the delimiters (/ originally, ~ in the version above) along with the trailing i modifier. Java doesn't use regex delimiters at all, so just drop those. The modifier can be moved into the regex itself by using the inline form, (?i), at the beginning of the regex. (That will work in PHP too, by the way.)
Next is the backslashes. The ones that are used to escape quotation marks remain as they are, but all the others get doubled because Java is more strict about escape sequences in string literals.
Finally, there are the named groups. Up until Java 6, named groups weren't supported at all; Java 7 supports them, but they use the shorter (?<name>...) syntax favored by .NET, not the Pythonesque (?P<name>...) syntax. (By the way, the shorter (?<name>...) version should work in PHP too, as should (?'name'...), also introduced by .NET.)
So the Java 7 version of your regex would be:
"(?i)<img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*<strong>(?<neighborhood>[^(<]+)\\((?<region>[^)]+)\\)</strong>"
For Java 6 or earlier you would use:
"(?i)<img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*<strong>([^(<]+)\\(([^)]+)\\)</strong>"
...and you'd have to use numbers instead of names to refer to the group captures.
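For illustration, the Java 7 pattern above could be used roughly as follows. This is a sketch only: the HTML snippet, the class name and the extract helper are invented for the example, and the captured text is trimmed because the neighborhood group would otherwise include the space before the opening parenthesis.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // The Java 7 version of the regex, split across two literals for readability.
    private static final Pattern REGION = Pattern.compile(
        "(?i)<img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*<strong>"
        + "(?<neighborhood>[^(<]+)\\((?<region>[^)]+)\\)</strong>");

    // Returns {neighborhood, region}, or null when the markup is not found.
    public static String[] extract(String html) {
        Matcher m = REGION.matcher(html);
        if (!m.find()) {
            return null;
        }
        // group("neighborhood") and group(1) refer to the same capture.
        return new String[] { m.group("neighborhood").trim(), m.group("region").trim() };
    }

    public static void main(String[] args) {
        // Hypothetical input; \u00A0 is a real non-breaking space character.
        String html = "<a href=\"#\"><IMG src=\"images/ponto_azul.gif\" border=\"0\">"
            + "\u00A0<strong>Centro (Sul)</strong></a>";
        String[] parts = extract(html);
        System.out.println(parts[0]); // Centro
        System.out.println(parts[1]); // Sul
    }
}
```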
Regex is regex regardless of language. The regex you've posted will work in both Java and PHP. You do need to make some adjustments, as the two languages don't take the pattern in exactly the same form (though the pattern itself will work in both languages).
Points to Consider
You should know that Java's Pattern object applies flags without having to specify them on the pattern string itself.
Delimiters should not be included as well. Only the pattern itself.

How do I write a regular expression to find the following pattern?

I am trying to write a regular expression to do a find and replace operation. Assume Java regex syntax. Below are examples of what I am trying to find:
12341+1
12241+1R1
100001+1R2
So, I am searching for a string beginning with one or more digits, followed by a "1+1" substring, followed by 0 or more characters. I have the following regex:
^(\d+)(1\\+1).*
This regex will successfully find the examples above, however, my goal is to replace the strings with everything before "1+1". So, 12341+1 would become 1234, and 12241+1R1 would become 1224. If I use the first grouped expression $1 to replace the pattern, I get the wrong result as follows:
12341+1 becomes 12341
12241+1R1 becomes 12241
100001+1R2 becomes 100001
Any ideas?
Your existing regex works fine, just that you are missing a \ before \d
String str = "100001+1R2";
str = str.replaceAll("^(\\d+)(1\\+1).*","$1");
IMHO, the regex is correct.
Perhaps you wrote it wrong in the code. If you want to code the regex ^(\d+)(1\+1).* in a string, you have to write something like String regex = "^(\\d+)(1\\+1).*".
Your output is the result of ^(\d+)(1+1).* replacement, as you miss some backslash in the string (e.g. "^(\\d+)(1\+1).*").
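A quick way to confirm that the correctly escaped pattern already gives the desired result is to run it over the three examples from the question. The class and method names in this sketch are arbitrary.

```java
class StripSuffixDemo {
    // Regex seen by the engine: ^(\d+)(1\+1).*
    public static String strip(String s) {
        return s.replaceAll("^(\\d+)(1\\+1).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(strip("12341+1"));    // 1234
        System.out.println(strip("12241+1R1"));  // 1224
        System.out.println(strip("100001+1R2")); // 10000
    }
}
```

The greedy \d+ first takes all the leading digits, then backtracks by one digit so that the literal 1+1 can match, which is why group 1 ends just before it.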
Your regex looks fine to me. I don't have access to Java, but in JavaScript the code
"12341+1".replace(/(\d+)(1\+1)/g, "$1");
returns 1234, as you'd expect. This also works on a string containing many 'codes', e.g.
"12341+1 54321+1".replace(/(\d+)(1\+1)/g, "$1");
gives 1234 5432.
Personally, I wouldn't use a regex at all (it'd be like using a hammer on a thumbtack); I'd just take a substring (pseudocode):
stringName.substring(0, stringName.indexOf("1+1"))
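In Java the substring idea might look like the sketch below. The helper name is invented, and the guard for a missing marker is an addition: indexOf returns -1 when "1+1" is absent, and substring(0, -1) would throw. Unlike the regex, this version does not check that the text before the marker consists of digits.

```java
class SubstringDemo {
    // Returns everything before the first "1+1", or the input unchanged if it is absent.
    public static String beforeMarker(String s) {
        int idx = s.indexOf("1+1");
        return idx < 0 ? s : s.substring(0, idx);
    }

    public static void main(String[] args) {
        System.out.println(beforeMarker("12341+1"));    // 1234
        System.out.println(beforeMarker("100001+1R2")); // 10000
        System.out.println(beforeMarker("no marker"));  // no marker
    }
}
```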
But it looks like other posters have already mentioned the non-greedy operator.
In most regex syntaxes you can add a '?' after a '+' or '*' to indicate that you want it to match as little as possible before moving on in the pattern. Thus ^(\d+?)(1\+1) matches as few digits as it can before it finds "1+1", so the captured digits do NOT INCLUDE the "1+1". (With the greedy \d+ of your original, the engine first takes the 1 as well, but then backtracks to release it, so a correctly escaped pattern ends up with the same capture.)
