Regex for detecting struts `.do` pages - tests failing - java

I am testing the following regex that exists in an older project I have inherited:
.*\\.do
Within Java, the regex is declared as:
private static final String[] ACCESS_REGEX = {".*\\.do", ""};
It is essentially checked using the String wrapper around Pattern.matches: value.matches(check).
This old regex works fine for various incoming requests such as home.do. I am now testing it on several regex test sites (listed below):
http://www.regexplanet.com/advanced/java/index.html
http://www.freeformatter.com/java-regex-tester.html
However, I can't seem to get the regex to match various strings that I believe should match. I thought the regex above matches strings that end with .do and have some characters in front. However, when I test for these, no matches are found.
Example Test Strings:
home.do
\home.do
mmm\mmm\home.do
\mmm\home.do
home.do

Remember the special meaning that the \ character has both in regular expressions and in Java string literals!
The regular expression should be
.*\.do
This works very well on http://www.freeformatter.com/java-regex-tester.html.
In a Java string literal you also need to escape the \ character, hence the regular expression in Java must be
.*\\.do

The issue is that the online regex tools you're using expect plain regex (without special characters escaped), which is .*\.do in your particular case - mind the single backslash.
On the other hand, when defined in a string literal in Java, regexes need special characters escaped, hence ".*\\.do" in your Java code.
Use unescaped regexes in the online test tools.
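The difference between the Java literal and the regex the engine actually sees can be checked with a short sketch like the one below. The class and method names (DoPatternDemo, isStrutsAction) are made up for illustration and are not from the inherited project; the sample strings are the ones from the question plus one deliberate non-match.

```java
class DoPatternDemo {
    // Java source literal: "\\." becomes the two-character regex \. (a literal dot).
    static final String ACCESS_REGEX = ".*\\.do";

    // Hypothetical helper mirroring the value.matches(check) call from the question.
    static boolean isStrutsAction(String value) {
        // String.matches requires the whole input to match the pattern.
        return value.matches(ACCESS_REGEX);
    }

    public static void main(String[] args) {
        // Printing the literal shows the text to paste into an online tester.
        System.out.println(ACCESS_REGEX);
        String[] samples = {"home.do", "\\home.do", "mmm\\mmm\\home.do", "homeXdo"};
        for (String s : samples) {
            System.out.println(s + " -> " + isStrutsAction(s));
        }
    }
}
```

Whatever the first println shows is the form to use in the test tools; the doubled backslash exists only in the Java source.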

You need a single backslash rather than a double backslash in the regexp.
You want to escape the dot only. Remove the first backslash and the tests will pass.

Related

Invalid Java Regex for JSON

For some reason this regex keeps saying it is invalid in java, but on the online testers it works fine:
({.+?})
I am using it for a JSON data structure.
Is there a way to get it to work with Java?
Included link: http://regexr.com/3bs0p
Here are my 2 cents:
You need to escape the opening { in Java regex to tell the engine it is not the brace starting the limiting quantifier.
Do not use regexr.com to test Java regexps, use OCPSoft Visual Regex Tester or RegexPlanet that support Java regex syntax.
Do not use round brackets around the whole pattern, you can always refer to it with .group(0) or $0 back-reference.
The regex should look like
String pattern = "\\{.+?}"; // Note that `.+?` requires at least 1 character,
// Use .*? to allow 0 characters
And
Do not parse JSON with regex. See How to parse JSON in Java.
You probably need to escape your { } with backslashes, since you are treating them as literal characters.
E.g.
(\\{.+?\\})
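A minimal sketch of the brace problem, assuming the standard JDK regex engine (other implementations, such as Android's, may behave differently). The class and helper names are invented for the example, and the sample input is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class BraceEscapeDemo {
    // Returns true if the regex compiles, false if the engine rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Finds the first {...} span lazily; group(0) is the whole match,
    // so no extra parentheses around the pattern are needed.
    static String firstObject(String input) {
        Matcher m = Pattern.compile("\\{.+?}").matcher(input);
        return m.find() ? m.group(0) : null;
    }

    public static void main(String[] args) {
        System.out.println(compiles("({.+?})"));     // unescaped opening brace
        System.out.println(compiles("\\{.+?}"));     // escaped opening brace
        System.out.println(firstObject("x {\"a\":1} y {\"b\":2}"));
    }
}
```

This only illustrates the escaping rule; for real JSON the advice above about using a proper parser still applies.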

Necessary to escape a java regular expression in matches()?

I'm currently doing a test on an HTTP Origin to determine if it came from SSL:
(HttpHeaders.Names.ORIGIN).matches("/^https:\\/\\//")
But I'm finding it's not working. Do I need to escape matches() strings like a regular expression or can I leave it like https://? Is there any way to do a simple string match?
Seems like it would be a simple question, but surprisingly I'm not getting anywhere even after using a RegEx tester http://www.regexplanet.com/advanced/java/index.html. Thanks.
Java's regex doesn't need delimiters. Simply do:
.matches("https://.*")
Note that matches validates the entire input string, hence the .* at the end. And if the input contains line break chars (which . will not match), enable DOT-ALL:
.matches("(?s)https://.*")
Of course, you could also simply do:
.startsWith("https://")
which takes a plain string (no regex pattern).
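The options above can be compared side by side in a small sketch. OriginCheckDemo and its methods are hypothetical names, and the origin values are sample strings rather than anything read from a real HttpHeaders object.

```java
class OriginCheckDemo {
    // Regex variant: matches() must cover the entire input, hence the trailing .*
    // and (?s) so that . also matches line break characters.
    static boolean isHttpsByRegex(String origin) {
        return origin.matches("(?s)https://.*");
    }

    // Plain string variant: no regex involved at all.
    static boolean isHttpsByPrefix(String origin) {
        return origin.startsWith("https://");
    }

    public static void main(String[] args) {
        String origin = "https://example.com";
        System.out.println(isHttpsByRegex(origin));
        System.out.println(isHttpsByPrefix(origin));
        // The delimited form from the question treats the slashes as literal text.
        System.out.println(origin.matches("/^https:\\/\\//"));
        // Without the trailing .* the whole-string match fails.
        System.out.println(origin.matches("https://"));
    }
}
```

For a simple prefix test, startsWith is probably the clearer choice, since it avoids the escaping question entirely.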
How about this Regex:
"^(https:)\/\/.*"
It works in your tester

Regular Expression Java Error

I can't run this regular expression on Java:
String regex = "/^{m:\"(.*)\",s:([0-9]{1,15}),r:([0-9]{1,15}),t:([0-9]{1,2})}$/";
String data = "{m:\"texttexttext\",s:1231,r:23123,t:1}";
Pattern p = Pattern.compile(regex);
Matcher a = p.matcher(data);
The same regex with the same data works fine on a regex tester site (such as http://gskinner.com/RegExr/)!
Two problems:
In java, (unlike perl etc) regexes are not wrapped in / characters
You must escape your { literals:
Try this:
String regex = "^\\{m:\"(.*)\",s:([0-9]{1,15}),r:([0-9]{1,15}),t:([0-9]{1,2})\\}$";
There are two problems:
The forward slashes aren't part of the pattern itself, and shouldn't be included.
You need to escape the braces at the start and end, as otherwise they'll be treated as repetition quantifiers. This may not be the case in other regular expression implementations, but it's certainly the case in Java - when I tried just removing the slashes, I got an exception in Pattern.compile.
Try this:
String regex="^\\{m:\"(.*)\",s:([0-9]{1,15}),r:([0-9]{1,15}),t:([0-9]{1,2})\\}$";
(That works with your sample data.)
As an aside, if this is meant to be parsing JSON, I would personally not try to do it with regular expressions - use a real JSON parser instead. It'll be a lot more flexible in the long run.
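As a rough check of the corrected pattern, the sketch below runs it against the sample data from the question and pulls out the four capture groups. MessageRegexDemo and parse are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MessageRegexDemo {
    // No / delimiters; the outer braces are escaped, the {1,15} quantifiers are not.
    static final Pattern MESSAGE = Pattern.compile(
        "^\\{m:\"(.*)\",s:([0-9]{1,15}),r:([0-9]{1,15}),t:([0-9]{1,2})\\}$");

    // Returns the four captured fields, or null if the input does not match.
    static String[] parse(String data) {
        Matcher m = MESSAGE.matcher(data);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    public static void main(String[] args) {
        String data = "{m:\"texttexttext\",s:1231,r:23123,t:1}";
        String[] fields = parse(data);
        System.out.println(fields == null ? "no match" : String.join(" | ", fields));
    }
}
```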
Two things:
Java does not require you to have any kind of begin/end character. so you can drop the / chars
Also, Java requires you to escape any regex metacharacters if you want to match them. In your case, the brace characters '{' and '}' need to be preceded by a double backslash (one for java escape, one for regex escape):
"^\\{m:\"(.*)\",s:([0-9]{1,15}),r:([0-9]{1,15}),t:([0-9]{1,2})\\}$"

Regex conversion from java to php

I have a regular expression in php and I need to convert it to java.
Is it possible to do so? If so, how can I do it?
Thanks in advance
$region_pattern = "/<a href=\"#\"><img src=\"images\/ponto_[^\.]+\.gif\"[^>]*>[&nbsp;]*<strong>(?P<neighborhood>[^\(<]+)\((?P<region>[^\)]+)\)<\/strong><\/a>/i" ;
A typical conversion from any regex to java is to:
Exclude pattern delimiters => remove starting and trailing /
Remove the flags (here, the trailing i); in Java these are applied to the Pattern object. Either pass them when you create the Pattern or prepend them to the regex in inline form, like (?i)<regex>
Replace every \ with \\. The \ character already has a meaning in Java (it is the escape character in string literals), so to put a backslash into a regex you have to write \\ in the source. Thus \w becomes \\w, and \\ becomes \\\\
Above regex would become
Pattern.compile("<a href=\"#\"><img src=\"images\\/ponto_[^\\.]+\\.gif\"[^>]*>[&nbsp;]*<strong>(?P<neighborhood>[^\\(<]+)\\((?P<region>[^\\)]+)\\)<\\/strong><\\/a>", Pattern.CASE_INSENSITIVE);
This will fail, however. I think it is because of the ?P construct, which as far as I know does not exist in Java, so it is an invalid regex there.
There are some problems with the original regex that have to be cleared away first. First, there's [&nbsp;], which matches one of the characters &, n, b, s, p or ;. To match an actual non-breaking space character, you should use \xA0.
You also have a lot of unneeded backslashes in there. You can get rid of some by changing the regex delimiter to something other than /; others aren't needed because they're inside character classes, where most metacharacters lose their special meanings. That leaves you with this PHP regex:
"~<img src=\"images/ponto_[^.]+\.gif\"[^>]*>\xA0*<strong>(?P<neighborhood>[^(<]+)\((?P<region>[^)]+)\)</strong>~i"
There are three things that make this regex incompatible with Java. One is the delimiters (/ originally, ~ in the version above) along with the trailing i modifier. Java doesn't use regex delimiters at all, so just drop those. The modifier can be moved into the regex itself by using the inline form, (?i), at the beginning of the regex. (That will work in PHP too, by the way.)
Next is the backslashes. The ones that are used to escape quotation marks remain as they are, but all the others get doubled because Java is more strict about escape sequences in string literals.
Finally, there are the named groups. Up until Java 6, named groups weren't supported at all; Java 7 supports them, but they use the shorter (?<name>...) syntax favored by .NET, not the Pythonesque (?P<name>...) syntax. (By the way, the shorter (?<name>...) version should work in PHP, too, as should (?'name'...), also introduced by .NET.)
So the Java 7 version of your regex would be:
"(?i)<img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*<strong>(?<neighborhood>[^(<]+)\\((?<region>[^)]+)\\)</strong>"
For Java 6 or earlier you would use:
"(?i)<img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*<strong>([^(<]+)\\(([^)]+)\\)</strong>"
...and you'd have to use numbers instead of names to refer to the group captures.
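A minimal sketch of the Java 7+ version in use, reading the groups back by name. The class name, the extract helper, and the HTML snippet (including the "Centro" and "Zona Sul" values) are invented for illustration; only the pattern itself comes from the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // (?i) replaces the trailing /i modifier; (?<name>...) replaces (?P<name>...).
    static final Pattern REGION = Pattern.compile(
        "(?i)<img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*<strong>"
        + "(?<neighborhood>[^(<]+)\\((?<region>[^)]+)\\)</strong>");

    // Returns {neighborhood, region}, or null when the markup is not found.
    static String[] extract(String html) {
        Matcher m = REGION.matcher(html);
        if (!m.find()) {
            return null;
        }
        // trim() drops the space that precedes the opening parenthesis.
        return new String[] {m.group("neighborhood").trim(), m.group("region")};
    }

    public static void main(String[] args) {
        // Made-up sample input with a real non-breaking space before <strong>.
        String html = "<a href=\"#\"><img src=\"images/ponto_azul.gif\" alt=\"\">"
            + "\u00A0<strong>Centro (Zona Sul)</strong></a>";
        String[] result = extract(html);
        System.out.println(result == null ? "no match" : result[0] + " / " + result[1]);
    }
}
```

On Java 6 or earlier the same lookup would use m.group(1) and m.group(2) with the unnamed version of the pattern.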
REGEX is REGEX regardless of language. The REGEX you've posted will work on both Java and PHP. You do need to make some adjustments, as the two languages don't take the pattern in exactly the same form (though the pattern itself will work in both languages).
Points to Consider
You should know that Java's Pattern object applies flags without having to specify them on the pattern string itself.
Delimiters should not be included either; pass only the pattern itself.

What is a regular expression for control characters?

I'm trying to match a control character in the form \^c where c is any valid character for control characters. I have this regular expression, but it's not currently working: \\[^][#-z]
I think the problem lies with the fact that the caret character (^) is part of the regular expressions parsing engine.
Match an ASCII text string of the form ^X using the pattern \^., nothing more. Match an ASCII text string of the form \^X with the pattern \\\^.. You may wish to constrain that dot to [A-Z?@_\[\]^\\], so \\\^[A-Z?@_\[\]^\\]. It’s easier to read as [?\x40-\x5F] for the bracketed character class, hence \\\^[?\x40-\x5F] for a literal BACKSLASH, followed by a literal CIRCUMFLEX, followed by something that turns into one of the valid control characters.
Note that this is the pattern as you would see it printed out, or as you’d read it from a file. It’s what you need to pass to the regex compiler. If you have it as a string literal, you must of course double each of those backslashes: "\\\\\\^[?\\x40-\\x5F]". Yes, it is insane looking, but that is because Java does not support regexes directly as Groovy and Scala (or Perl and Ruby) do. Regex work is always easier without the extra bbaacckksslllllaasshheesssssess. :)
If you had real control characters instead of indirect representations of them, you would use \pC for all literal code points with the property GC=Other, or \p{Cc} for just GC=Control.
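Both cases can be tried out with a sketch along these lines: one pattern for the textual \^X notation, using the doubled-backslash literal, and one for real control characters via \p{Cc}. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class ControlNotationDemo {
    // Regex seen by the engine: \\\^[?\x40-\x5F]
    // (literal backslash, literal circumflex, then ? or a character from @ to _).
    static final Pattern CARET_NOTATION = Pattern.compile("\\\\\\^[?\\x40-\\x5F]");

    // Matches one actual control character (general category Cc).
    static final Pattern REAL_CONTROL = Pattern.compile("\\p{Cc}");

    static boolean isCaretNotation(String s) {
        return CARET_NOTATION.matcher(s).matches();
    }

    static boolean isRealControl(String s) {
        return REAL_CONTROL.matcher(s).matches();
    }

    public static void main(String[] args) {
        // "\\^C" in Java source is the three-character text: backslash, caret, C.
        System.out.println(isCaretNotation("\\^C"));
        System.out.println(isCaretNotation("\\^c"));
        System.out.println(isRealControl("\u0003"));
    }
}
```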
Check this out: http://www.regular-expressions.info/characters.html . You should be able to use \cA to \cZ to find the control characters.
