Handling special characters in domain names (without IDN)?

I am using the URI class to break apart a string url.
The getHost() method returns null when there are special characters in it.
Such as: http://✪df.ws/g44
It was suggested to use the IDN class to work around this. However, that class is only available in the Android API level 9 and above, which means 2.3 and above.
Is there another way to do this without the IDN class?
I want to be able to break apart a string url into the various pieces and be able to handle modern urls.
Thanks
Update: It looks like the WebView doesn't support these types of URLs either, so I need to find a way to support or convert these URLs on pre-2.3 devices.
Is there a way to convert these urls without the IDN class?

getHost() = ignore everything from the start until :// and then capture everything until you get a slash.
Wouldn't that work?
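A minimal sketch of that idea, assuming plain string handling is acceptable and that IPv6 literal hosts do not occur. The class and method names (HostExtractor, extractHost) are made up for illustration. Besides the slash, it also stops at '?' and '#', and strips optional user info and a port:

```java
public class HostExtractor {
    /** Returns the host part of a URL string, or null if it has no "://" separator. */
    public static String extractHost(String url) {
        int schemeEnd = url.indexOf("://");
        if (schemeEnd < 0) {
            return null;
        }
        int start = schemeEnd + 3;
        // The authority ends at the first '/', '?' or '#', or at the end of the string.
        int end = url.length();
        for (int i = start; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                end = i;
                break;
            }
        }
        String authority = url.substring(start, end);
        // Drop optional user info ("user:pass@").
        int at = authority.lastIndexOf('@');
        if (at >= 0) {
            authority = authority.substring(at + 1);
        }
        // Drop an optional port (":8080"). Assumption: no IPv6 literal hosts.
        int colon = authority.indexOf(':');
        if (colon >= 0) {
            authority = authority.substring(0, colon);
        }
        return authority;
    }
}
```

For the URL in the question this returns the host with the special character intact. Note that it only splits the string; it does not convert the host to its ASCII (Punycode) form, so it does not by itself address the WebView problem mentioned in the update.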

Related

Java Url Validation With Placeholders

I have built an API where you can register a callback URL.
The URLs are validated using the Apache UrlValidator class.
I now have to add a feature that allows placeholders in the configured URL.
https:/foo.com/${placeholder1}/bar/${placeholder2}
These placeholders will be dynamically replaced using the Apache StrSubstitutor or something similar.
Now my issue: how do I validate URLs that contain placeholders?
I have thought of a solution:
I replace the expected placeholders with an example value.
Then I validate the URL using the Apache UrlValidator.
My issue with this solution is that the Apache UrlValidator only returns a boolean, so the error message will be quite ambiguous.
Is there another solution besides creating my own regex?
Update : following discussions in the comments
There is a finite number of allowed placeholders.
The format of the Strings that will replace the placeholders is also known.
The first objective is to be able to check, at the time it is configured, whether the given URL (which may contain placeholders) is valid.
The second objective is, if the URL is not valid, to return an intelligible error message.
There are multiple error cases:
A placeholder used in the URL is not in the allowed placeholder list
The URL is not valid independently of the placeholders

For a minimal URL validation, you could use the java.net.URL constructor (it will work with your https:/foo.com/${placeholder1}/bar/${placeholder2} example).
According to the docs, it throws:
MalformedURLException - if no protocol is specified, or an unknown protocol is found, or spec is null.
You can then leverage the URL methods as a bonus, to get parts of it such as path, protocol, etc.
I would definitely advise against re-inventing the wheel with regex for URL validation.
Note that java.net.URI has a much stricter validation and would fail your example with placeholders as is.
Edit
As discussed, since you need to validate placeholders as well, you probably want to actually try to fill them first and fail fast if something's wrong, then proceed and validate the populated URL against java.net.URI, for strict validation.
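The fill-first, fail-fast approach could be sketched roughly as follows. This is only an illustration: the class name PlaceholderUrlValidator, the validate method, and the map of sample values are hypothetical, and the extra scheme/host check is an assumption about what counts as a usable callback URL. The regex is used only to find ${...} tokens, not to validate the URL itself.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderUrlValidator {
    // Matches ${name} and captures the name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]*)\\}");

    /**
     * Returns null when the template is valid, otherwise a message describing the problem.
     * sampleValues maps each allowed placeholder name to a representative value.
     */
    public static String validate(String template, Map<String, String> sampleValues) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer filled = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String sample = sampleValues.get(name);
            if (sample == null) {
                // Fail fast: the placeholder is not in the allowed list.
                return "Unknown placeholder: ${" + name + "}";
            }
            m.appendReplacement(filled, Matcher.quoteReplacement(sample));
        }
        m.appendTail(filled);
        try {
            // Strict validation of the populated URL.
            URI uri = new URI(filled.toString());
            if (uri.getScheme() == null || uri.getHost() == null) {
                return "URL must have a scheme and a host: " + filled;
            }
        } catch (URISyntaxException e) {
            return "Invalid URL after substitution: " + e.getMessage();
        }
        return null;
    }
}
```

Returning a message instead of a boolean gives the two error cases from the question distinct, readable errors: an unknown placeholder is reported by name, and a URL that is invalid regardless of placeholders carries the parser's own explanation.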
General caveat
You might also want to make your life easier and leverage an existing framework that would allow you to use annotated path variables in the first place (e.g. Spring, etc.), but that's quite a broad discussion.

Escape HTML in JSON with PlayFramework2

I am using PlayFramework2 and I can't find a way to properly handle HTML escaping.
In the template system, HTML entities are filtered by default.
But when I use REST requests with Backbone.js, my JSON objects are not filtered.
I use play.libs.Json.toJson(myModel) to transform an Object into a String.
So, in my controller, I use return ok(Json.toJson(myModel)); to send the response ... but here, the attributes of my model are not secured.
I can't find a way to handle it ...
Second question :
The template engine filters HTML entities by default, which means that we have to store the raw user input in our database.
Is that safe behaviour?
Third question:
Is there a function in the PlayFramework to manually escape strings? All those I can find require adding new dependencies.
Thanks !
Edit : I found a way at the Backbone.js templating level :
- Use myBackboneModel.escape('attr'); instead of myBackboneModel.get('attr');
The Underscore.js templating system also includes that option: <%= attr %> renders without escaping but <%- attr %> renders with escaping!
Just be careful about efficiency: strings are re-escaped at each rendering. That's why the Backbone .create() should be preferred.

The best practices on XSS-attacks prevention usually recommend you to reason about your output rather than your input. There's a number of reasons behind that. In my opinion the most important are:
It doesn't make sense to reason about escaping unless you know exactly how you are going to output or render your data, because different ways of rendering require different escaping strategies; e.g. a properly escaped HTML string is not good enough for use in a JavaScript block. Requirements and technologies change constantly: today you render your data one way, tomorrow you might render it another way (say you start working on a mobile client that doesn't require HTML escaping because it doesn't use HTML to render data at all). You can only be sure about the proper escaping strategy at the time you render your data. This is why modern frameworks delegate escaping to templating engines. I'd recommend reviewing the following article: XSS (Cross Site Scripting) Prevention Cheat Sheet
Escaping user input is actually a destructive, lossy operation: if you escape user input before persisting it to storage, you will never find out what the original input was. There's no deterministic way to 'unescape' an HTML-escaped string; consider my mobile client example above.
That is why I believe the right way to go is to delegate escaping to your templating engines (i.e. Play and the JS templating engine you're using for Backbone). There's no need to HTML-escape strings you serialize to JSON. Notice that behind the scenes the JSON serializer will JSON-escape your strings, e.g. a quote in your string will be properly escaped to ensure the resulting JSON is correct. It is a JSON serializer after all, so it only cares about proper JSON rendering; it knows nothing about HTML (and it shouldn't). However, when you render your JSON data on the client side, you should properly HTML-escape it using the functionality provided by the JS templating engine you're using for Backbone.
Answering another question: you can use play.api.templates.HtmlFormat to escape raw HTML-string manually:
import play.api.templates.HtmlFormat
...
HtmlFormat.escape("<b>hello</b>").toString()
// -> &lt;b&gt;hello&lt;/b&gt;
If you really need to make JSON-encoder escape certain HTML strings, a good idea might be to create a wrapper for them, let's say RawString and provide custom Format[RawString] which will also HTML-escape a string in its writes method. For details see: play.api.libs.json API documentation

What is the best open source Java package to filter/match URLs?

I have a high-performance application which deals with URLs. For every URL it needs to retrieve the appropriate settings from a predefined pool. Every settings object is associated with a URL pattern which indicates which URLs should use these settings. The matching rules are as follows:
"google.com" match pattern should match all URLs pointing to the google domain (thus, maps.google.com and www.google.com/match are matched).
"*.google.com" should match all URLs pointing to a subdomain of google.com (thus, maps.google.com matches, but google.com and www.google.com don't).
"maps.google.com" should match all URLs pointing to this specific subdomain.
Apart from the above rules, every match rule can contain a path, which means that the path part of the URL should start with the match rule path. So: "*.google.com/maps" matches "maps.google.com/maps" but not "maps.google.com/advanced".
As you can see, the rules above overlap. If two rules match the same URL, the most specific one should apply. The list above is ranked from least specific to most specific.
This seems to be such a standard problem that I was hoping to use a ready-made library rather than program it myself. Google reveals a couple of options but without a clear way to choose between them. What would you recommend as a good library for this task?
Thanks,
Boaz

I don't think you need a specific library to solve this; the standard Java API has all that you need to write the code without too much work.
Take a look at java.util.regex.Pattern and work out the regular expressions you need to match each of your rules. You might also want to use java.net.URL to parse out the different fields from the URL.
You already said you have a priority scheme to handle scenarios where multiple patterns match the URL, so that should be the last piece for this puzzle.
It looks like a pretty straight-forward task.
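As a rough sketch of how little code this takes, the example below uses plain string comparisons instead of regular expressions, with java.net.URL for parsing. Everything in it is an assumption made for illustration: the names UrlRuleMatcher, Rule, and bestMatch, the choice to strip a leading "www." so that www.google.com is treated like google.com (as the question's examples imply), and the specificity score (more host labels first, then a wildcard, then a longer path).

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;
import java.util.Locale;

public class UrlRuleMatcher {
    /** One rule such as "google.com", "*.google.com" or "*.google.com/maps". */
    public static final class Rule {
        public final String raw;
        public final boolean wildcard; // true for "*.host" rules (strict subdomains only)
        public final String host;      // host without the "*." prefix
        public final String path;      // "" when the rule has no path

        public Rule(String raw) {
            this.raw = raw;
            int slash = raw.indexOf('/');
            String hostPart = slash < 0 ? raw : raw.substring(0, slash);
            this.path = slash < 0 ? "" : raw.substring(slash);
            this.wildcard = hostPart.startsWith("*.");
            this.host = (wildcard ? hostPart.substring(2) : hostPart).toLowerCase(Locale.ROOT);
        }

        boolean matches(String urlHost, String urlPath) {
            boolean subdomain = urlHost.endsWith("." + host);
            boolean hostOk = wildcard ? subdomain : (subdomain || urlHost.equals(host));
            return hostOk && urlPath.startsWith(path);
        }

        /** Higher means more specific (assumed ordering, see the lead-in). */
        int specificity() {
            int labels = host.split("\\.").length;
            return (labels * 2 + (wildcard ? 1 : 0)) * 10000 + path.length();
        }
    }

    /** Returns the most specific matching rule, or null when nothing matches. */
    public static Rule bestMatch(List<Rule> rules, String url) {
        URL parsed;
        try {
            parsed = new URL(url);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + url, e);
        }
        String host = parsed.getHost().toLowerCase(Locale.ROOT);
        if (host.startsWith("www.")) {
            host = host.substring(4); // assumption: "www" is not a real subdomain
        }
        String path = parsed.getPath();
        Rule best = null;
        for (Rule rule : rules) {
            if (rule.matches(host, path)
                    && (best == null || rule.specificity() > best.specificity())) {
                best = rule;
            }
        }
        return best;
    }
}
```

This scans every rule for each URL, which is fine for a small pool. For a large pool in a high-performance setting, indexing the rules by host suffix (for example in a map keyed by registered domain) would likely be the next step.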

expand Tiny url in java

I want to write code in Java that takes a URL and identifies whether it is a tiny URL or not. If it is, the code should then determine whether the URL is malicious. If it is not malicious, print the URL.
Can anybody please help me?

You can use HttpClient to detect whether the URL is redirected to another location. After that it's a simple case of:
if (!isMalicious(redirectTargetURL))
{
    System.out.println(redirectTargetURL);
}
The isMalicious(...) implementation is left as an exercise for the reader.
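For the redirect detection itself, here is a minimal sketch that uses the JDK's java.net.HttpURLConnection rather than HttpClient, simply to avoid an extra dependency. The class and method names (RedirectResolver, redirectTarget, resolveLocation) are hypothetical, and it assumes the shortener answers a HEAD request with a 3xx status and a Location header; it follows only one hop.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

public class RedirectResolver {
    /** Resolves a Location header value (absolute or relative) against the requested URL. */
    public static String resolveLocation(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    /** Returns the redirect target of url, or null when the server does not redirect. */
    public static String redirectTarget(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        // Do not follow the redirect automatically; we only want to see where it points.
        conn.setInstanceFollowRedirects(false);
        conn.setRequestMethod("HEAD");
        try {
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            boolean redirect = status >= 300 && status < 400 && location != null;
            return redirect ? resolveLocation(url, location) : null;
        } finally {
            conn.disconnect();
        }
    }
}
```

The value returned by redirectTarget would then play the role of redirectTargetURL in the snippet above. Chains of several redirects would need a loop with a hop limit.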

If you trust Google to implement isMalicious(...), they have done so with their Safe Browsing API.

So 2 main things you want:
Identify if it's a tinyurl
Identify if the URL is malicious
The answer to part 1 is easy. Just check if the URL belongs to the domain 'tinyurl.com'. It should be straightforward to test either the raw URL string or the host part returned by the getHost() method of a java.net.URL object.
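A small sketch of the getHost() variant, assuming that subdomains such as www.tinyurl.com should count as well. The class and method names (TinyUrlCheck, isTinyUrl) are made up for the example:

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

public class TinyUrlCheck {
    /** Returns true when the URL's host is tinyurl.com or one of its subdomains. */
    public static boolean isTinyUrl(String url) {
        try {
            String host = new URL(url).getHost().toLowerCase(Locale.ROOT);
            return host.equals("tinyurl.com") || host.endsWith(".tinyurl.com");
        } catch (MalformedURLException e) {
            // Not parseable as a URL at all, so not a tiny URL either.
            return false;
        }
    }
}
```

Checking the parsed host rather than the raw string avoids false positives such as a URL that merely mentions tinyurl.com in its path or query string.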
Part 2 is more difficult to code up from scratch...
First you will need your code to figure out where the tinyurl redirects to.
The next bit really depends on how you want to define 'malicious'. Detecting deceptive URLs will require a bit of work (e.g. finding the difference between something like www.stackoverflow.com and www.stack0verf10w.com), or comparing the target URL with a malicious URL list (there are sites that publish them). There's also checking for multiple redirects, popups, and the list of criteria could go on and on.

HttpServletRequest - Quick way to encode URL and hidden field parameters

In my Java app I'm trying to prevent XSS attacks. I want to encode URL and hidden field parameters in the HttpServletRequest objects I have a handle on.
How would I go about doing this?

Don't do that. You're making it unnecessarily more complicated. Just escape it during display only. See my answer in your other topic: Java 5 HTML escaping To Prevent XSS

To properly display user-entered data on an HTML page, you simply need to ensure that any special HTML characters are properly encoded as entities, via String#replace or similar. The good news is that there is very little you need to encode (for this purpose):
str = str.replace("&", "&amp;").replace("<", "&lt;");
You can also replace > if you like, but there's no need to.
This isn't only because of XSS, but also just so that characters show up properly. You may also want to ensure that characters outside the common Latin set are turned into appropriate entities, to protect against charset issues (UTF-8 vs. Windows-1252, etc.).
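Putting both points together, a display-time escaper might look like the sketch below. It is an illustration only: the class and method names (HtmlEscaper, escape) are hypothetical, it additionally escapes '>' and '"' so the result can also be used inside a double-quoted attribute value, and it treats everything above code point 126 as "outside the common Latin set", which is an assumption.

```java
public class HtmlEscaper {
    /** Escapes text for output in HTML; call it when rendering, not when storing. */
    public static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp == '"') {
                out.append("&quot;");
            } else if (cp > 126) {
                // Numeric character reference, independent of the page's charset.
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

Iterating by code point rather than by char keeps characters outside the Basic Multilingual Plane intact as a single numeric reference.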

You can use StringEscapeUtils from the library Apache Jakarta Commons Lang
http://www.jdocs.com/lang/2.1/org/apache/commons/lang/StringEscapeUtils.html
