I have built an API where you can register a callback URL.
The URLs are validated using the Apache UrlValidator class.
I now have to add a feature that allows placeholders in the configured URL.
https:/foo.com/${placeholder1}/bar/${placeholder2}
These placeholders will be dynamically replaced using the Apache StrSubstitutor or something similar.
Now my issue: how do I validate the URLs with the placeholders?
I have thought of a solution:
I replace the expected placeholders with an example value
Then I validate the URL using the Apache UrlValidator
My issue with this solution is that the Apache UrlValidator only returns a boolean, so the error message will be quite ambiguous.
Is there another solution than creating my own regex?
Update : following discussions in the comments
There is a finite number of allowed placeholders.
The format of the Strings that will replace the placeholders is also known.
The first objective is to be able to check whether the given URL, which may contain placeholders, is valid at the time it is configured.
The second objective is, if the URL is not valid, to return an intelligible error message.
There are multiple error cases:
A placeholder used in the URL is not in the allowed placeholder list
The URL is not valid independently of the placeholders
For a minimal URL validation, you could use the java.net.URL constructor (it will work with your https:/foo.com/${placeholder1}/bar/${placeholder2} example).
According to the docs, it throws:
MalformedURLException - if no protocol is specified, or an unknown protocol is found, or spec is null.
You can then leverage the URL methods as a bonus, to get parts of it such as path, protocol, etc.
I would definitely advise against re-inventing the wheel with regex for URL validation.
Note that java.net.URI has a much stricter validation and would fail your example with placeholders as is.
Edit
As discussed, since you need to validate placeholders as well, you probably want to actually try to fill them first and fail fast if something's wrong, then proceed and validate the populated URL against java.net.URI, for strict validation.
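A minimal sketch of that fill-then-validate idea, using only the JDK. The class name, the sample-value map and the error message wording are assumptions made for illustration; the placeholder syntax `${name}` is taken from the question. It reports an unknown placeholder separately from a URL that is invalid on its own.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: validates a URL template against a fixed set of placeholders.
final class CallbackUrlValidator {
    // Matches ${name} and captures the name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]*)\\}");

    // Allowed placeholder names mapped to a representative sample value (assumed input).
    private final Map<String, String> samples;

    CallbackUrlValidator(Map<String, String> samples) {
        this.samples = samples;
    }

    /** Returns an error message, or an empty Optional when the template is valid. */
    Optional<String> validate(String template) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer filled = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String sample = samples.get(name);
            if (sample == null) {
                // Fail fast: the placeholder is not in the allowed list.
                return Optional.of("Unknown placeholder '" + name
                        + "'; allowed: " + samples.keySet());
            }
            m.appendReplacement(filled, Matcher.quoteReplacement(sample));
        }
        m.appendTail(filled);

        try {
            // Strict syntax check on the populated URL.
            URI uri = new URI(filled.toString());
            if (uri.getScheme() == null || uri.getHost() == null) {
                return Optional.of("URL needs a scheme and a host: " + filled);
            }
        } catch (URISyntaxException e) {
            return Optional.of("Invalid URL independently of placeholders: " + e.getMessage());
        }
        return Optional.empty();
    }
}
```

The message from URISyntaxException includes the offending index, which is more specific than a bare boolean.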
General caveat
You might also want to make your life easier and leverage an existing framework that would allow you to use annotated path variables in the first place (e.g. Spring, etc.), but that's quite a broad discussion.
Related
Using owasp.esapi to filter incoming request parameters and headers, I'm stumbling on an issue where apparently the Referer header contains a value that is considered as using "multiple encoding".
An example:
http://123.abc.xx/xyz/input.xhtml?server=http%3A%2F%2F123.abc.xx%3A7016%2Fxyz&o=1&language=en&t=a074faf3
To me though, that URL seems to be correctly encoded, and decoding it results in a perfectly readable and correct url.
So, can anyone explain the issue here, and how to handle this?
ESAPI reports the error when running this method on the header value:
value = ESAPI.encoder().canonicalize(value);
Output:
SEVERE: [SECURITY FAILURE] INTRUSION - Mixed encoding (2x) detected
As a matter of fact yes. I fixed this bug in the upcoming release of ESAPI but it will require an API change, perhaps one that might have a bug based on your data here.
In short, prior to my fix, ESAPI just ran a regex against the URI. The problem, and the source of a slew of bug reports, is that URIs are not a regular language; they are a language of their own. So what would happen is that the URI in question would be reported as having parameters that contained HTML entities, only because some random data happened to align with known HTML entities: for example, &param=foo would be interpreted as starting with the entity &para;, which is the paragraph sign (¶). There were also some issues regarding ASCII vs. Unicode (non-BMP encodings).
At any rate there will be a new method to use in the release candidate for our next library, Encoder.getCanonicalizedURI();
This will be safe to regex against as it will be broken down and checked for mixed/multiple encoding. The method you’re currently using is now deprecated.
I'm trying to set up some expectations in MockServer where I have a specific expectation for requests matching this path
/api/users/:user_id
using the regex /api/users/.*.
However, I have a few other expectations which I want to match when accessing user-specific resources:
/api/users/:user_id/books
/api/users/:user_id/books/:book_id
/api/users/:user_id/holidays
I'm not really sure how to properly use regexes to match the first path without also affecting the requests for the user-specific resources (i.e., all requests currently match the first path).
For example, I believe for the /api/users/:user_id/books path, I can use the regex /api/users/.*/books, but this will never match, because the regex /api/users/.* for the first path will always match these deeper URLs.
I've been reading this page, but can't quite figure out how to correctly use the regexes for this particular case
One general approach here might be to use negative lookahead assertions. For the general case, you could use this pattern:
/api/users/\d+/(?!books|holidays)
Then, for the more specific cases, use patterns which you already had in mind, e.g.
/api/users/\d+/books
/api/users/\d+/holidays
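As a hedged alternative to the lookahead, the wildcard can be limited to a single path segment with `[^/]+` instead of `.*`. The sketch below assumes the matcher requires the whole path to match, as Java's `Matcher.matches()` does; the class and method names are made up for illustration and are not part of MockServer.

```java
import java.util.regex.Pattern;

// Hypothetical demo: one-segment wildcards keep the four paths mutually exclusive.
final class UserPathPatterns {
    private static final String ID = "[^/]+"; // exactly one path segment, no slashes

    static final Pattern USER = Pattern.compile("/api/users/" + ID);
    static final Pattern USER_BOOKS = Pattern.compile("/api/users/" + ID + "/books");
    static final Pattern USER_BOOK = Pattern.compile("/api/users/" + ID + "/books/" + ID);
    static final Pattern USER_HOLIDAYS = Pattern.compile("/api/users/" + ID + "/holidays");

    /** Returns a label for the first pattern that matches the full path. */
    static String classify(String path) {
        if (USER.matcher(path).matches()) return "user";
        if (USER_BOOKS.matcher(path).matches()) return "books";
        if (USER_BOOK.matcher(path).matches()) return "book";
        if (USER_HOLIDAYS.matcher(path).matches()) return "holidays";
        return "none";
    }
}
```

Because `[^/]+` cannot cross a slash, the order in which the patterns are tried does not matter here.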
I just realized that my base64-encoded header "Authentication" can't be read
with request.getHeader("Authentication").
I found this post, "getRequestProperty("Authorization") always returns null", which says it's a security feature in URLConnection.
I don't know why, but it seems to be true for request.getHeader as well.
How can I still get this header if I don't want to switch to other libraries?
I was searching through https://fossies.org/dox/apache-tomcat-6.0.45-src/catalina_2connector_2Request_8java_source.html#l01947 and found a section where restricted headers will be used if Globals.IS_SECURITY_ENABLED is set.
Since I'm working on a reverse proxy and only need to pass requests/responses through, I simply set "System.setSecurityManager(null);". For my case it might be a valid solution, but if you want to use authentication there is no reason to use this workaround.
My bad, it does work with https now.
The accepted solution did not work for me – may have something to do with different runtime environments.
However, I've managed to come up with a working snippet to access the underlying MessageHeader collection via reflection and extract the "Authorization" header value.
I have a high-performance application which deals with URLs. For every URL it needs to retrieve the appropriate settings from a predefined pool. Every settings object is associated with a URL pattern which indicates which URLs should use these settings. The matching rules are as follows:
"google.com" match pattern should match all URLs pointing to the google domain (thus, maps.google.com and www.google.com/match are matched).
"*.google.com" should match all URLs pointing to a subdomain of google.com (thus, maps.google.com matches, but google.com and www.google.com don't).
"maps.google.com" should match all URLs pointing to this specific subdomain.
Apart from the above rules, every match rule can contain a path, which means that the path part of the URL should start with the match rule path. So: "*.google.com/maps" matches "maps.google.com/maps" but not "maps.google.com/advanced".
As you can see, the rules above overlap. If two rules match the same URL, the most specific one should apply. The list above is ranked from least specific to most specific.
This seems to be such a standard problem that I was hoping to use a ready-made library rather than program it myself. Google reveals a couple of options but without a clear way to choose between them. What would you recommend as a good library for this task?
Thanks,
Boaz
I don't think you need a specific library to solve this; the standard Java API has all that you need to write the code without too much work.
Take a look at java.util.regex.Pattern and work out the regular expressions you need to match each of your rules. You might also want to use java.net.URL to parse out the different fields from the URL.
You already said you have a priority scheme to handle scenarios where multiple patterns match the URL, so that should be the last piece for this puzzle.
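A rough sketch of how the rules from the question could be turned into java.util.regex patterns. Everything here is an assumption for illustration: the class name, the choice to match against a "host/path" string without a scheme (as in the question's examples), and the convention that rules are added from least to most specific so the last match wins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical matcher: rules are added in order of increasing specificity.
final class UrlRuleMatcher {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<String> settings = new ArrayList<>();

    void addRule(String rule, String settingsName) {
        patterns.add(toPattern(rule));
        settings.add(settingsName);
    }

    /** Translates e.g. "*.google.com/maps" into a regex over "host/path". */
    static Pattern toPattern(String rule) {
        int slash = rule.indexOf('/');
        String host = slash < 0 ? rule : rule.substring(0, slash);
        String path = slash < 0 ? "" : rule.substring(slash);

        // "*.x" requires at least one subdomain label; a bare host allows zero or more.
        String hostRegex = host.startsWith("*.")
                ? "(?:[^./]+\\.)+" + Pattern.quote(host.substring(2))
                : "(?:[^./]+\\.)*" + Pattern.quote(host);
        // A rule path is a prefix of the URL path; without one, any path is accepted.
        String pathRegex = path.isEmpty() ? "(?:/.*)?" : Pattern.quote(path) + ".*";
        return Pattern.compile(hostRegex + pathRegex, Pattern.CASE_INSENSITIVE);
    }

    /** Returns the settings of the last (most specific) matching rule, or null. */
    String lookup(String hostAndPath) {
        String result = null;
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(hostAndPath).matches()) {
                result = settings.get(i);
            }
        }
        return result;
    }
}
```

For a large rule pool a linear scan may be too slow; indexing rules by registered domain before applying the regexes is one possible refinement, but that is beyond this sketch.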
It looks like a pretty straight-forward task.
Related to this question:
URL characters replacement in JSP with UrlRewrite
I want to have masked URLs in this JSP Java EE web project.
For example if I had this:
http://mysite.com/products.jsp?id=42&name=Programming_Book
I would like to turn that URL into something more User/Google friendly like:
http://mysite.com/product-Programming-Book
I've been fighting with UrlRewrite, forwarding and RequestDispatcher to accomplish what I want, but I'm kind of lost. I should probably have a filter for all HTTP requests that reformats them and forwards to the page.
Can anyone give some directions? Tips?
Thanks a lot.
UPDATE: Servlets did it. Thanks Yuval for your orientation.
I had been using UrlRewrite; as you can see in the first sentence of the question, I also asked a question about that. But I couldn't manage to get UrlRewrite to work the way I wanted. Servlets did the job.
You could use a URLRewrite filter. It's like how mod_rewrite is for Apache's HTTP web server.
http://tuckey.org/urlrewrite/
"Redirect one url
<rule>
<from>^/some/old/page\.html$</from>
<to type="redirect">/very/new/page.html</to>
</rule>
Tiny/Friendly url
<rule>
<from>^/zebra$</from>
<to type="redirect">/big/ugly/url/1,23,56,23132.html</to>
</rule>
"
It's been a while since I mucked about with JSPs, but if memory serves you can add URL patterns to your web.xml (or one of those XML config files) and have the servlet engine automatically route the request to a valid URL with your choice of parameters. I can look up the details if you like.
In your case, map http://mysite.com/product-Programming-Book to the URL
http://mysite.com/products.jsp?id=42&name=Programming_Book and the user no longer sees the real URL. Also, you can use this more user-friendly URL within your application, as a logical name for that page.
Yuval =8-)
Generally you're fronting your application with Apache. If so, look into using Apache's mod_rewrite. http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html
For one thing I'd recommend you deal with this within your application, and not rely on external rewrites, say via Apache mod_rewrite (unless you have determined this is the fastest way to do so.)
But a few things first:
I would not convert this:
http://mysite.com/products.jsp?id=42&name=Programming_Book
Into this:
http://mysite.com/product-Programming-Book
See, if I go only by your book example, I don't see what is wrong with the former URL. After all, it works for Amazon. And there is no such thing as Google-friendly URLs (only user-friendly ones). You have to consider why you want to do that type of rewriting, and how. For example, in your rewrite option, where is the id?
That is, you have to define a logical rule that defines
the unique pages you want to show, and
the unique combination of parameters that can identify each page.
For example, using your book case. Let's say you can identify any book using the following rules:
by ISBN
by Author Name, Title and, if applicable, version (if version is missing, assume latest)
if ISBN is included with Author Name, Title and/or edition, ignore all except ISBN. That is, treat it as the former (or more precisely, ignore all other book identification parameters when ISBN is present.)
With a ?parametrized url scheme, then you'd have the following possibilities:
http://yoursite/products?isbn=123465
http://yoursite/products?author=johndoe&title="the cookbook" << this assumes the latest edition, or 1 if first.
http://yoursite/products?author=johndoe&title="the cookbook"&edition=3
http://yoursite/products?title="the cookbook"&author=johndoe
http://yoursite/products?edition=3&title="the cookbook"&author=johndoe
....
and so on for all combinations. So before you look for a technical implementation, you have to think very carefully how you will do it. You'd have to create a syntax and a hierarchy of parameters (say, author will always come before title, and title will always come before edition).
So you'll end up with the following (using the same example as John Doe the author, with his book being in the 3rd edition):
http://yoursite/product/isbn/12345
http://yoursite/product/author/johndoe/the%20cookbook << see the %20 for encoding spaces (not a good idea, but something to take into account)
http://yoursite/product/author/johndoe/the%20cookbook/3
Any other combination should either generate an error or smartly figure out how to rewrite to the canonical versions and send an HTTP 3xx to the client with the appropriate URL target.
Once you have ironed those details out, you can ask yourself if the effort is worth it or necessary.
If you find that you do need it, then the easiest and cheapest DIY way is to write a filter that parses the URL, breaks the parameters down, creates a ?parametrized URL string for a JSP page, gets its RequestDispatcher and forwards to it.
You do not want to do URL redirects because these incur HTTP 303/307 round trips between your server and your client. Or at least you want to keep that to a minimum.
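The parsing step of such a filter can be sketched as a plain function, kept free of servlet classes so it runs on its own. The class name, the `/products` target and the parameter names are assumptions based on the example URLs above, and the input is assumed to be an already-decoded request path. In a real filter, the result would be handed to `request.getRequestDispatcher(target).forward(request, response)`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Optional;

// Hypothetical translator from friendly paths to the internal ?parametrized URL.
final class ProductUrlTranslator {

    /** Returns the internal URL, or empty if the path is not a known product path. */
    static Optional<String> toInternalUrl(String path) {
        // "/product/isbn/12345" splits into ["", "product", "isbn", "12345"].
        String[] parts = path.split("/");
        if (parts.length < 4 || !parts[1].equals("product")) {
            return Optional.empty();
        }
        if (parts[2].equals("isbn") && parts.length == 4) {
            return Optional.of("/products?isbn=" + enc(parts[3]));
        }
        // /product/author/<author>/<title>[/<edition>]
        if (parts[2].equals("author") && (parts.length == 5 || parts.length == 6)) {
            String url = "/products?author=" + enc(parts[3]) + "&title=" + enc(parts[4]);
            if (parts.length == 6) {
                url += "&edition=" + enc(parts[5]);
            }
            return Optional.of(url);
        }
        return Optional.empty();
    }

    // Encodes a value for use in a query string (spaces become '+').
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

An empty result is where the filter would either pass the request down the chain unchanged or answer with an error, depending on the rules chosen earlier.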