how to mandate com/net/etc in url using regex? - java

I have the regex below. How can I make a top-level domain such as .com or .net mandatory?
String regex = "^(((https?|ftp)://|(www|ftp)\\.)[a-z0-9-]+(\\.[a-z0-9-]+)+([/?].*)?)|(http://)$";
Thanks!

I would recommend that you don't use a regex for this.
I'd recommend that you parse the URL using the URL class (or URI class if that is more appropriate), and then check that the hostname part ends with one of the required top-level domains.
I'd also recommend that you avoid hard-wiring a set of top-level domains into your code and/or your regexes.
A whole swathe of new TLDs are going to go live fairly soon. Like thousands of them ...
Even ignoring the new TLDs, the set of 2-letter country TLDs is not fixed. (Does South Sudan have a code yet?)
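A minimal sketch of the parse-then-check approach described above, using java.net.URI. The class name, method name, and the allow-list contents are hypothetical; in line with the advice against hard-wiring, a real allow-list would more likely be loaded from configuration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

class TldCheck {
    // Hypothetical allow-list, hard-coded here only to keep the sketch short.
    static final Set<String> ALLOWED_TLDS = Set.of("com", "net", "org");

    // Returns true if the URL parses, has a host, and the host ends in an allowed TLD.
    static boolean hasAllowedTld(String url) {
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return false; // no scheme/authority, e.g. "www.example.com"
            }
            int dot = host.lastIndexOf('.');
            if (dot < 0 || dot == host.length() - 1) {
                return false; // no dot, or nothing after the last dot
            }
            String tld = host.substring(dot + 1).toLowerCase(Locale.ROOT);
            return ALLOWED_TLDS.contains(tld);
        } catch (URISyntaxException e) {
            return false; // not a syntactically valid URI
        }
    }
}
```

Note that a string without a scheme (such as www.example.com) has no host as far as URI is concerned, so this sketch rejects it; prepend a scheme first if such inputs should be accepted.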

This regex should do it:
[.](com|net|other)
But place it at the correct position in your big url regex (which is maybe not the best way to go...)
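One possible placement, as a sketch rather than a vetted URL validator: the TLD alternation goes right after the host labels and before the optional path or query. The class name and the TLD list are hypothetical, and this version drops the original's trailing alternative that accepted a bare http://.

```java
import java.util.regex.Pattern;

class UrlTldRegex {
    // Host labels, then a mandatory ".com/.net/.org", then an optional path or query.
    static final Pattern URL_WITH_TLD = Pattern.compile(
            "^((https?|ftp)://|(www|ftp)\\.)"   // scheme, or a www./ftp. prefix
            + "[a-z0-9-]+(\\.[a-z0-9-]+)*"      // one or more host labels
            + "\\.(com|net|org)"                // required TLD (hypothetical list)
            + "([/?].*)?$",                     // optional path or query
            Pattern.CASE_INSENSITIVE);

    static boolean isValid(String url) {
        return URL_WITH_TLD.matcher(url).matches();
    }
}
```

Like the original regex, this does not accept ports or user info in the authority part.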

After you validate the URL protocol, you may use something like ^[a-zA-Z0-9.-]+\.(com|org|net|mil|edu|COM|ORG|NET)$ (note the escaped dot before the TLD group; an unescaped . would match any character).

Related

Java regex URL for MockServer expectation

I'm trying to set up some expectations in MockServer where I have a specific expectation for requests matching this path
/api/users/:user_id
using the regex /api/users/.*.
However, I have a few other expectations which I want to match when accessing user-specific resources:
/api/users/:user_id/books
/api/users/:user_id/books/:book_id
/api/users/:user_id/holidays
I'm not really sure how to properly use regexes to match the first path without also affecting the requests for the user-specific resources (i.e., all requests are currently matching on the first path).
For example, I believe for the /api/users/:user_id/books path, I can use the regex /api/users/.*/books, but this will never match, because the regex /api/users/.* for the first path will always match these deeper URLs.
I've been reading this page, but can't quite figure out how to correctly use the regexes for this particular case
One general approach here might be to use negative lookahead assertions. For the general case, you could use this pattern:
/api/users/\d+/(?!books|holidays)
Then, for the more specific cases, use patterns which you already had in mind, e.g.
/api/users/\d+/books
/api/users/\d+/holidays
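An alternative that avoids lookaheads, sketched with plain java.util.regex: restrict each variable segment to [^/]+ so it cannot swallow deeper path segments. This assumes the path regex has to match the whole path (as String.matches does); whether MockServer applies it that way is an assumption worth verifying. The class and the classify helper are hypothetical and only demonstrate that the patterns no longer overlap.

```java
import java.util.regex.Pattern;

class UserPathPatterns {
    // [^/]+ matches exactly one path segment, so deeper URLs do not match USER.
    static final Pattern USER = Pattern.compile("/api/users/[^/]+");
    static final Pattern USER_BOOKS = Pattern.compile("/api/users/[^/]+/books");
    static final Pattern USER_BOOK = Pattern.compile("/api/users/[^/]+/books/[^/]+");
    static final Pattern USER_HOLIDAYS = Pattern.compile("/api/users/[^/]+/holidays");

    // Reports which pattern fully matches the path; at most one can.
    static String classify(String path) {
        if (USER.matcher(path).matches()) return "user";
        if (USER_BOOKS.matcher(path).matches()) return "books";
        if (USER_BOOK.matcher(path).matches()) return "book";
        if (USER_HOLIDAYS.matcher(path).matches()) return "holidays";
        return "none";
    }
}
```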

Java Url Validation With Placeholders

I have built an API where you can register a callback URL.
The URLs are validated using the Apache UrlValidator class.
I now have to add a feature that allows placeholders in the configured URL.
https:/foo.com/${placeholder1}/bar/${placeholder2}
These placeholders will be dynamically replaced using the Apache StrSubstitutor or something similar.
Now my issue: how do I validate the URLs with the placeholders?
I have thought of a solution :
I replace the expected placeholders with an example value
Then I Validate the URL using the Apache UrlValidator
My issue with this solution is that the Apache UrlValidator only returns a boolean so the error message will be quite ambiguous.
Is there another solution than creating my own regex ?
Update : following discussions in the comments
There is a finite number of allowed placeholders.
The format of the Strings that will replace the placeholders is also known.
The first objective is to be able to check whether the given URL, which may contain placeholders, is valid at the time it is configured.
The second objective is, if the URL is not valid return an intelligible error message.
There are multiple error cases :
A placeholder used in the URL is not in the allowed placeholder list
The URL is not valid independently of the placeholders
For a minimal URL validation, you could use the java.net.URL constructor (it will work with your https:/foo.com/${placeholder1}/bar/${placeholder2} example).
According to the docs, it throws:
MalformedURLException - if no protocol is specified, or an unknown protocol is found, or spec is null.
You can then leverage the URL methods as a bonus, to get parts of it such as path, protocol, etc.
I would definitely advise against re-inventing the wheel with regex for URL validation.
Note that java.net.URI has a much stricter validation and would fail your example with placeholders as is.
Edit
As discussed, since you need to validate placeholders as well, you probably want to actually try to fill them first and fail fast if something's wrong, then proceed and validate the populated URL against java.net.URI, for strict validation.
General caveat
You might also want to make your life easier and leverage an existing framework that would allow you to use annotated path variables in the first place (e.g. Spring, etc.), but that's quite a broad discussion.
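A rough sketch of the "fill the placeholders first, fail fast, then validate strictly" idea, returning a distinct message per error case. The class name, the allowed placeholder names, and their sample values are hypothetical; the host check is an extra assumption about what counts as a valid callback URL.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CallbackUrlValidator {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]*)\\}");

    // Hypothetical allowed placeholders, each mapped to a representative sample value.
    static final Map<String, String> SAMPLES =
            Map.of("placeholder1", "abc", "placeholder2", "123");

    // Returns an error message, or Optional.empty() if the template is acceptable.
    static Optional<String> validate(String template) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder filled = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String sample = SAMPLES.get(name);
            if (sample == null) {
                return Optional.of("Unknown placeholder: ${" + name + "}");
            }
            m.appendReplacement(filled, Matcher.quoteReplacement(sample));
        }
        m.appendTail(filled);
        try {
            URI uri = new URI(filled.toString());
            if (uri.getScheme() == null || uri.getHost() == null) {
                return Optional.of("URL must have a scheme and a host: " + filled);
            }
        } catch (URISyntaxException e) {
            return Optional.of("Invalid URL: " + e.getMessage());
        }
        return Optional.empty();
    }
}
```

With the host check in place, a template written with a single slash after the scheme (https:/foo.com/...) is reported as missing a host, because URI parses foo.com as part of the path.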

regex to disallow access to parent directories - java

So what I need is to create a regex which is going to be used on my server to make sure that all the files that the user is requesting access to, are under a specific directory. Let's name that dir UserFiles and let's assume that it is under the path /Server/Users/Bob/UserFiles.
So now when a client sends a request to read a file I want to validate that the path that he is asking access to is under /Bob/UserFiles/.
I thought about making sure that the path always begins with the prefix /UserFiles/ and that there is no .. in the path (which would also protect me from restricted access like /UserFiles/../../noAccess.txt).
examples of not allowed inputs:
C:/UserFiles/
../../Alice/txt.txt
/UserFiles/../../noAccess.txt
examples of allowed input:
/UserFiles/UserFiles/Alice/txt.txt
/UserFiles/txt.txt
/UserFiles/Bob/Bob/txt.txt
I cannot think of any cases where this wouldn't work. I also tried to build the regex, but it is not quite right, as it allows inputs like /UserFiles//txt.txt (and might allow others that it shouldn't, which I am not aware of).
So is my idea complete, or are there other cases I haven't thought of? If my idea is complete, could you please help me fix my regex?
(?!\.\.)^\/UserFiles\/[/\w,\s-]+\.[A-Za-z]{3}$
How about resolving the path and checking only afterwards (note, the behaviour is OS-dependent):
new File(input).getCanonicalPath().startsWith("/UserFiles/")
Or, depending on how to interpret your question:
new File(input).getCanonicalPath().startsWith("/Server/Users/Bob/UserFiles/")
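A similar check can be sketched with java.nio.file.Path, whose startsWith compares whole path components rather than characters, so a sibling directory such as /Server/Users/Bob/UserFilesEvil does not pass. The class name is hypothetical, and treating the requested path as relative to the base directory is an assumption. Unlike getCanonicalPath(), normalize() is purely lexical and does not resolve symbolic links.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class PathGuard {
    // Base directory taken from the question's example.
    static final Path BASE = Paths.get("/Server/Users/Bob/UserFiles");

    // True if the requested path, resolved against BASE, stays inside BASE.
    static boolean isInsideBase(String requested) {
        // resolve() returns an absolute `requested` unchanged, so absolute
        // paths outside BASE are rejected by the startsWith check below.
        Path resolved = BASE.resolve(requested).normalize(); // collapses "." and ".."
        return resolved.startsWith(BASE);
    }
}
```

Paths.get and resolve throw InvalidPathException for strings that are not valid paths on the current platform, so callers may want to catch that and treat it as a rejection.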

Handling special characters in domain names (without IDN)?

I am using the URI class to break apart a string url.
The getHost() method returns null when there are special characters in it.
Such as: http://✪df.ws/g44
It was suggested to use the IDN class to work around this. However, that class is only available in the Android API level 9 and above, which means 2.3 and above.
Is there another way to do this without the IDN class?
I want to be able to break apart a string url into the various pieces and be able to handle modern urls.
Thanks
Update It looks like the WebView doesn't support these types of urls either. So, it looks like I need to find a way to support or convert these urls for pre 2.3 devices.
Is there a way to convert these urls without the IDN class?
getHost() = ignore everything from the start until :// and then capture everything until you get a slash.
Wouldn't that work?
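A minimal sketch of that manual extraction; the class and method names are hypothetical. It also stops at ? and #, which goes slightly beyond "until you get a slash", and it leaves any port or user info in the result. It only splits the string and does not convert the host to its ASCII (Punycode) form.

```java
class HostExtractor {
    // Returns the text between "://" (if present) and the next '/', '?' or '#'.
    static String extractHost(String url) {
        int schemeEnd = url.indexOf("://");
        int start = (schemeEnd < 0) ? 0 : schemeEnd + 3;
        int end = url.length();
        for (int i = start; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                end = i;
                break;
            }
        }
        return url.substring(start, end);
    }
}
```

For the URL in the question, this returns the host part including the special character, where URI.getHost() returned null.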

What is the best open source java package to filter/match URLs?

I have a high-performance application which deals with URLs. For every URL it needs to retrieve the appropriate settings from a predefined pool. Every settings object is associated with a URL pattern which indicates which URLs should use these settings. The matching rules are as follows:
"google.com" match pattern should match all URLs pointing to the google domain (thus, maps.google.com and www.google.com/match are matched).
"*.google.com" should match all URLs pointing to a subdomain of google.com (thus, maps.google.com matches, but google.com and www.google.com don't).
"maps.google.com" should match all URLs pointing to this specific subdomain.
Apart from the above rules, every match rule can contain a path, which means that the path part of the URL should start with the match rule path. So: "*.google.com/maps" matches "maps.google.com/maps" but not "maps.google.com/advanced".
As you can see the rules above are overlapping. In the case two rules exist which match the same URL the most specific should apply. The list above is ranked from least specific to most specific.
This seems to be such a standard problem that I was hoping to use a ready-made library rather than write it myself. Google reveals a couple of options, but without a clear way to choose between them. What would you recommend as a good library for this task?
Thanks,
Boaz
I don't think you need a specific library to solve this; the standard Java API has all that you need to write the code without too much work.
Take a look at java.util.regex.Pattern and work out the regular expressions you need to match each of your rules. You might also want to use java.net.URL to parse out the different fields from the URL.
You already said you have a priority scheme to handle scenarios where multiple patterns match the URL, so that should be the last piece for this puzzle.
It looks like a pretty straight-forward task.
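As a rough illustration of how little code the rules need, here is a sketch that uses plain string checks instead of compiled regexes. It assumes the host and path have already been extracted (for example with java.net.URL). The class name, the method names, and the specificity ranking (longer host pattern first, then longer path) are assumptions, since the question only ranks the three host forms.

```java
import java.util.List;

class UrlRuleMatcher {
    // A rule is "host" or "host/path"; the host may start with "*." for subdomains only.
    static boolean matches(String rule, String host, String path) {
        int slash = rule.indexOf('/');
        String ruleHost = slash < 0 ? rule : rule.substring(0, slash);
        String rulePath = slash < 0 ? "" : rule.substring(slash);
        if (!path.startsWith(rulePath)) {
            return false; // the URL path must start with the rule's path
        }
        if (ruleHost.startsWith("*.")) {
            String suffix = ruleHost.substring(1); // e.g. ".google.com"
            // Subdomains only; per the question, "www." is not counted as one.
            return host.endsWith(suffix) && !host.equals("www" + suffix);
        }
        // Plain host: the domain itself or anything beneath it.
        return host.equals(ruleHost) || host.endsWith("." + ruleHost);
    }

    // Assumed ranking: longer host pattern wins, then longer path.
    static int specificity(String rule) {
        int slash = rule.indexOf('/');
        int hostLength = slash < 0 ? rule.length() : slash;
        int pathLength = slash < 0 ? 0 : rule.length() - slash;
        return hostLength * 10_000 + pathLength;
    }

    // Returns the most specific matching rule, or null if none matches.
    static String bestRule(List<String> rules, String host, String path) {
        String best = null;
        for (String rule : rules) {
            if (matches(rule, host, path)
                    && (best == null || specificity(rule) > specificity(best))) {
                best = rule;
            }
        }
        return best;
    }
}
```

A linear scan like this may be too slow for a large rule pool; indexing rules by domain suffix would be a natural next step, but that is beyond this sketch.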
