How can I validate hashmap using esapi? - java

For this code:
HashMap temphashMap = (HashMap) session.getAttribute("abc");
How can I validate the HashMap against cross-site scripting (XSS) using ESAPI?

You can't do this. In order to validate, you have to know in advance the specific rules and data flows of every single item in that HashMap.
How could you tell if one of the form fields in your application was a rich-text field, like the one I used to type this answer? What if some HTML tags are legal?
You can't know that.
That's why it's far more important to make sure data is properly encoded for the context it is written into.
For that you'll use the methods on the Encoder object returned by ESAPI.encoder().
The closest thing to what you're asking for would be to set up a servlet filter that runs on every request/response pair and applies an XSS filter like AntiSamy to allow certain tags and disallow others. However, the rules are different for a value used in an HTML attribute, in HTML tag text, or in a <textarea> block. You would have to write a significant amount of logic that you could avoid just by escaping all data before you hand it off to another interpreter.
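The point about encoding at output rather than validating the map can be sketched as follows. This is only an illustration under stated assumptions: escapeHtml is a hand-written stand-in for an HTML-content encoder such as ESAPI's encodeForHTML, and MapRenderer and renderAsHtml are hypothetical names, not part of ESAPI.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MapRenderer {
    // Stand-in (assumption) for an HTML-content encoder such as ESAPI's encodeForHTML.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")   // must run first so later entities are not re-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#x27;");
    }

    // The map is stored as-is; every key and value is encoded at the moment it is written into HTML.
    static String renderAsHtml(Map<String, ?> attributes) {
        StringBuilder html = new StringBuilder("<dl>");
        for (Map.Entry<String, ?> e : attributes.entrySet()) {
            html.append("<dt>").append(escapeHtml(e.getKey())).append("</dt>");
            html.append("<dd>").append(escapeHtml(String.valueOf(e.getValue()))).append("</dd>");
        }
        return html.append("</dl>").toString();
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("name", "<script>alert(1)</script>");
        System.out.println(renderAsHtml(stored));
    }
}
```

The renderer does not need to know which fields are "dangerous"; it only needs to know that it is writing into HTML element content.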

Related

Preventing xss attack in java web app while saving actual values in database

A recent AppScan run against our Java-based web application found that the application was prone to XSS attacks.
I did my research and found that a servlet filter was probably the easiest way to protect the application.
I introduced a filter in which I extended HttpServletRequestWrapper (because Java does not allow request parameters to be changed; there is no request.setParam method). I added a sanitize method there, and here is what it does:
result = ESAPI.encoder().canonicalize( input);
// Avoid null characters
result = result.replaceAll("\0", "");
// Clean out HTML
result = Jsoup.clean( result, Whitelist.none() );
After this change things looked good: I tested for XSS vulnerabilities myself and most of them were fixed. But this posed another problem. Suppose I have a form to create a product, and in the product name a user enters something like
<script>alert('somethingStupid')</script>
Ideally, I should be able to save this to the database but still be protected from XSS attacks. I am not sure what to do in my filter or anywhere else to achieve this.
HTML injection is an output-stage issue, caused by forgetting to encode text when injecting it into a context where characters are special. ESAPI offers encoders for various contexts, as discussed by @Zakaria. If you use these consistently, each in the correct context, you have fixed injection-related XSS issues.
If you are using purely JSTL tags like <c:out> for your templating, these will also HTML-escape by default. In general, it is best to generate HTML using a templating system that works HTML-escaping out for you automatically, because otherwise you are likely to forget to manually encodeForHTML occasionally.
(Aside: on projects where I am compelled to use the mostly-terrible owasp-esapi-java library, my preference is for encodeForXML over the HTML encoders, as it produces output that is safe for HTML content and quoted attribute values whilst not needlessly attempting to produce entity references for non-ASCII characters. I would typically try to avoid injecting into JavaScript string literals; it is usually easier and more maintainable to inject run-time content into HTML data- attributes and read them from separate JavaScript DOM code.)
Trying to filter out HTML at the input stage is a lamentably still-popular but completely misguided approach. It prevents you from entering HTML-like input when you need to—as you have found out, with the <script> example. Indeed, if StackOverflow used such an input filter we would not be able to have this conversation.
What's more, it's not resilient: there are many ways to smuggle potential injections past input filters. To make a filter effective you'd have to consider blocking pretty much all punctuation, which is generally not considered acceptable. Plus, any data that gets into your application by means other than request parameters won't be vetted.
Input validation is great for enforcing business rules on the formats of particular input fields, and can be used to filter out input that you never want, like control characters. But it's the wrong place to be worrying about escaping or removing HTML. The time to do that is when you're creating HTML.
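A minimal sketch of that split, assuming a hypothetical product-name field: validation enforces business rules only (required, maximum length, no control characters) and returns the value unchanged for storage, while HTML escaping happens separately when the page is built. The class name, the length limit, and the hand-written escaper are all assumptions for illustration.

```java
final class ProductNameRules {
    static final int MAX_LENGTH = 100; // hypothetical business rule

    // Input stage: enforce format rules, but do not strip or escape HTML.
    static String validate(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            throw new IllegalArgumentException("name is required");
        }
        if (raw.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("name is too long");
        }
        for (int i = 0; i < raw.length(); i++) {
            if (Character.isISOControl(raw.charAt(i))) {
                throw new IllegalArgumentException("control characters are not allowed");
            }
        }
        return raw; // stored exactly as entered
    }

    // Output stage: escape when creating HTML (stand-in for a real encoder or template engine).
    static String toHtml(String stored) {
        return stored.replace("&", "&amp;")
                     .replace("<", "&lt;")
                     .replace(">", "&gt;")
                     .replace("\"", "&quot;")
                     .replace("'", "&#x27;");
    }

    public static void main(String[] args) {
        String stored = validate("<script>alert('somethingStupid')</script>");
        System.out.println(toHtml(stored));
    }
}
```

With this arrangement the <script> product name from the question is saved verbatim and still renders as inert text.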
Cross-site scripting (XSS) is a security issue that occurs when user input is neither validated nor encoded, so that it can end up in the page as exploitable JavaScript.
Three types of XSS are known: reflected XSS, DOM-based XSS, and persistent (stored) XSS.
In your case, since you're using OWASP ESAPI, canonicalizing inputs is not enough. It is a good defence against an untrusted URL in a SRC or HREF attribute, but it is not sufficient on its own.
You should follow these rules, taken from the OWASP XSS (Cross Site Scripting) Prevention Cheat Sheet (only some of the rules are listed here; follow the link for further reading):
1- HTML Escape Before Inserting Untrusted Data into HTML Element Content: see the example :
String safe = ESAPI.encoder().encodeForHTML( request.getParameter( "input" ) );
2- Attribute Escape Before Inserting Untrusted Data into HTML Common Attributes :
String safe = ESAPI.encoder().encodeForHTMLAttribute( request.getParameter( "input" ) );
3- JavaScript Escape Before Inserting Untrusted Data into JavaScript Data Values:
String safe = ESAPI.encoder().encodeForJavaScript( request.getParameter( "input" ) );
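To show why each context needs its own encoder, here is a rough, hand-written sketch of an attribute encoder and a JavaScript string encoder. These are not ESAPI's implementations; ContextEncoders and its methods are hypothetical, and the sketch assumes control characters have already been rejected by input validation. Both encoders take the conservative approach of escaping everything except ASCII letters and digits.

```java
final class ContextEncoders {
    private static boolean isAsciiAlnum(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // HTML attribute value: every other code point becomes a hexadecimal character reference.
    static String forHtmlAttribute(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (isAsciiAlnum(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        });
        return out.toString();
    }

    // JavaScript string literal: every other UTF-16 unit becomes a four-digit hex escape.
    static String forJavaScriptString(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isAsciiAlnum(c)) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(forHtmlAttribute("\" onmouseover=\"x"));
        System.out.println(forJavaScriptString("';alert(1)//"));
    }
}
```

The same input produces different output in each context, which is why a single "sanitize once" step at input time cannot be correct for all of them.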

Escape HTML in JSON with PlayFramework2

I am using PlayFramework2 and I can't find a way to properly handle HTML escaping.
In the template system, HTML entities are filtered by default.
But when I use REST requests with Backbone.js, my JSON objects are not filtered.
I use play.libs.Json.toJson(myModel) to transform an Object into a String.
So, in my controller, I use return ok(Json.toJson(myModel)); to send the response ... but here, the attributes of my model are not secured.
I can't find a way to handle it ...
Second question:
The template engine filters HTML entities by default, which means that we have to store the raw user input in our database.
Is this safe behaviour?
Third question:
Is there a function in the Play Framework to manually escape strings? All those I can find require adding new dependencies.
Thanks !
Edit : I found a way at the Backbone.js templating level :
- Use myBackboneModel.escape('attr'); instead of myBackboneModel.get('attr');
The Underscore.js templating system also includes that option: <%= attr %> renders without escaping but <%- attr %> renders with escaping!
Just be careful about efficiency: strings are re-escaped at each rendering. That's why the Backbone .create() should be preferred.
The best practices on XSS-attacks prevention usually recommend you to reason about your output rather than your input. There's a number of reasons behind that. In my opinion the most important are:
It doesn't make sense to reason about escaping unless you know exactly how you are going to output or render your data, because different ways of rendering require different escaping strategies: a properly HTML-escaped string, for example, is not safe to drop into a JavaScript block. Requirements and technologies also change constantly. Today you render your data one way; tomorrow you might render it another (say, in a mobile client that doesn't need HTML-escaping because it doesn't use HTML at all). You can only be sure of the proper escaping strategy at the moment you render the data, which is why modern frameworks delegate escaping to templating engines. I'd recommend reviewing the following article: XSS (Cross Site Scripting) Prevention Cheat Sheet
Escaping the user's input is actually a destructive, lossy operation: if you escape the input before persisting it to storage, you will never find out what the original input was. There's no deterministic way to 'unescape' an HTML-escaped string; consider my mobile client example above.
That is why I believe the right way to go is to delegate escaping to your templating engines (i.e. Play and the JS templating engine you're using for Backbone). There's no need to HTML-escape strings you serialize to JSON. Notice that behind the scenes the JSON serializer will JSON-escape your strings: if you have a quote in your string, it will be properly escaped to ensure the resulting JSON is correct. It is a JSON serializer, after all, so it only cares about producing valid JSON; it knows nothing about HTML (and it shouldn't). However, when you render your JSON data on the client side, you should properly HTML-escape it using the functionality provided by the JS templating engine you're using for Backbone.
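The difference between JSON escaping and HTML escaping can be seen in a small hand-written sketch. This is not Play's Json.toJson; JsonStringSketch is a hypothetical class that only illustrates what a JSON serializer escapes in a string value.

```java
final class JsonStringSketch {
    // Escapes only what JSON requires: quotes, backslashes, and control characters.
    static String jsonString(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c); // '<', '>' and '&' pass through untouched
                    }
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(jsonString("<b>\"hi\"</b>"));
    }
}
```

The angle brackets survive serialization unchanged, so HTML escaping still has to happen wherever the client inserts the value into the page.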
Answering another question: you can use play.api.templates.HtmlFormat to escape raw HTML-string manually:
import play.api.templates.HtmlFormat
...
HtmlFormat.escape("<b>hello</b>").toString()
// -> &lt;b&gt;hello&lt;/b&gt;
If you really need to make JSON-encoder escape certain HTML strings, a good idea might be to create a wrapper for them, let's say RawString and provide custom Format[RawString] which will also HTML-escape a string in its writes method. For details see: play.api.libs.json API documentation

Anchors with SafeHtml

How would you use SafeHtml in combination with links?
Scenario: Our users can enter unformatted text which may contain links, e.g. I like&love http://www.stackoverflow.com. We want to safely render this text in GWT but make the links clickable, e.g. I like&love <a href="http://www.stackoverflow.com">stackoverflow.com</a>. Aside from rendering the text in the GWT frontend, we also want to send it via email, where the links should be clickable as well.
So far, we considered the following options:
Store the complete text as HTML in the backend and let the frontend assume it's correctly encoded (I like&love <a href="http://www.stackoverflow.com">stackoverflow.com</a>) -> Introduces XSS vulnerabilities
Store plain text but the links as HTML (I like&love <a href="http://www.stackoverflow.com">stackoverflow.com</a>) in the backend and use HtmlSanitizer in the frontend
Store plain text and special encoding for the links (I like&love [stackoverflow.com|http://www.stackoverflow.com]) in the backend and use a custom SafeHtml generator in the frontend
To us, the third option looks the cleanest but it seems to require the most custom code since we can't leverage GWT's SafeHtml infrastructure.
Could anybody share how to best solve the problem? Is there another option that we didn't consider so far?
Why not store the text exactly as it was entered by the user, and perform any special treatment when transforming it for the output (e.g. for sending emails, creating PDFs, ...). This is the most natural approach, and you won't have to undo any special treatment e.g. when you offer the user to edit the string.
As a general rule, I would always perform encoding/escaping/transformation only for the immediate transport/storage/output target. There are very few reasons to deviate from this rule, one of them may be performance, e.g. caching a transformed value in the DB. (In these cases, I think it's best to give the DB field a specific name like 'text_htmltransformed' - this avoids 'overescaping', which can be just as harmful as no escaping.)
Note: Escaping/encoding is no replacement for input validation.
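As a rough sketch of "store the text as entered, transform for the output target", the following plain Java turns stored text into HTML with clickable links. It does not use GWT's SafeHtml; Linkifier, the deliberately simple URL pattern, and the hand-written escaper are assumptions for illustration, and a GWT frontend would build the result with its own SafeHtml tools instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Linkifier {
    // Simplistic URL pattern (assumption): http or https followed by non-space, non-quote, non-bracket characters.
    private static final Pattern URL = Pattern.compile("https?://[^\\s<>\"']+");

    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&#x27;");
    }

    // Escapes every non-link segment and wraps each URL in an anchor; the URL itself is escaped too.
    static String toHtml(String plainText) {
        StringBuilder out = new StringBuilder();
        Matcher m = URL.matcher(plainText);
        int last = 0;
        while (m.find()) {
            out.append(escapeHtml(plainText.substring(last, m.start())));
            String url = escapeHtml(m.group());
            out.append("<a href=\"").append(url).append("\">").append(url).append("</a>");
            last = m.end();
        }
        out.append(escapeHtml(plainText.substring(last)));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHtml("I like&love http://www.stackoverflow.com"));
    }
}
```

Because the stored value stays plain text, the same string can be fed to a different transformation for email or PDF output, and it can be handed back to the user for editing without undoing anything.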

How to detect different data types inside HTML page?

What is the best way to detect data types inside html page using Java facilities DOM API, regexp, etc?
I'd like to detect types like skype plugin does for the phone/skype numbers, similar for addresses, emails, time, etc.
'Types' is an inappropriate term for the kind of information you are referring to. Choice of DOM API or regex depends upon the structure of information within the page.
If you know the structure, (for example tables being used for displaying information, you already know from which cell you can find phone number and which cell you can find email address), it makes sense to go with a DOM API.
Otherwise, you should use regex on plain HTML text without parsing it.
I'd use regexes in the following order:
Extract only the BODY content
Remove all tags to leave just plain text
Match relevant patterns in text
Of course, this assumes that markup isn't providing hints, and that you're purely extracting data, not modifying page context.
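The three steps above can be sketched with java.util.regex. The patterns here are simplified assumptions (the email pattern in particular is far from complete), and TextExtractor is a hypothetical name; the sketch only shows the order of operations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TextExtractor {
    private static final Pattern BODY =
            Pattern.compile("<body[^>]*>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Simplified email pattern, for illustration only.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    static List<String> findEmails(String html) {
        // Step 1: keep only the BODY content (fall back to the whole input if there is no body tag).
        Matcher body = BODY.matcher(html);
        String content = body.find() ? body.group(1) : html;
        // Step 2: replace tags with spaces so adjacent words do not merge.
        String text = TAG.matcher(content).replaceAll(" ");
        // Step 3: match the pattern of interest in the remaining plain text.
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String page = "<html><head><title>x@y.org</title></head>"
                + "<body><p>Mail <b>bob@example.com</b> now</p></body></html>";
        System.out.println(findEmails(page));
    }
}
```

Phone numbers, times, or addresses would follow the same shape with a different pattern in step 3.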
Hope this helps,
Phil Lello

Java: Best way to remove Javascript from HTML

What's the best library/approach for removing Javascript from HTML that will be displayed?
For example, take:
<html><body><span onmousemove='doBadXss()'>test</span></body></html>
and leave:
<html><body><span>test</span></body></html>
I see the DeXSS project. But is that the best way to go?
JSoup has a simple method for sanitizing HTML based on a whitelist.
Check http://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer
It uses a whitelist, which is safer than the blacklist approach DeXSS uses. From the DeXSS page:
There are still a number of known XSS attacks that DeXSS does not yet detect.
A blacklist only disallows known unsafe constructions, while a whitelist only allows known safe constructions. So unknown, possibly unsafe constructions will only be protected against with a whitelist.
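The difference can be illustrated with a toy attribute check. This is not how jsoup or DeXSS is implemented; AttributePolicy and both lists are hypothetical and intentionally tiny.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class AttributePolicy {
    // Blacklist: only the constructions someone thought of are blocked.
    static final Set<String> DENIED = new HashSet<>(Arrays.asList("onclick", "onmousemove", "onload"));
    // Whitelist: only the constructions known to be safe are let through.
    static final Set<String> ALLOWED = new HashSet<>(Arrays.asList("href", "title", "alt"));

    static boolean passesBlacklist(String attribute) {
        return !DENIED.contains(attribute.toLowerCase(Locale.ROOT));
    }

    static boolean passesWhitelist(String attribute) {
        return ALLOWED.contains(attribute.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // An event handler that the blacklist author did not anticipate.
        String unknown = "onpointerenter";
        System.out.println("blacklist lets it through: " + passesBlacklist(unknown));
        System.out.println("whitelist lets it through: " + passesWhitelist(unknown));
    }
}
```

The unanticipated handler slips past the blacklist but is rejected by the whitelist, simply because nobody listed it as safe.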
The easiest way would be to not have those in the first place... It probably would make sense to allow only very simple tags to be used in free-form fields and to disallow any kind of attributes.
Probably not the answer you're going for, but in many cases you only want to provide markup capabilities, not a full editing suite.
Similarly, an even easier approach would be to provide a text-based syntax, like Markdown, for editing. (There are not that many ways you can exploit the SO edit area, for instance: Markdown syntax plus a limited tag list without attributes.)
You could try dom4j http://dom4j.sourceforge.net/dom4j-1.6.1/ This is a DOM parser (as opposed to SAX) and allows you to easily traverse and manipulate the DOM, removing node attributes like onmouseover for example (or entire elements like <script>), before writing back out or streaming somewhere. Depending on how wild your html is, you may need to clean it up first - jtidy http://jtidy.sourceforge.net/ is good.
But obviously doing all this involves some overhead if you're doing this at page render time.
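The same DOM-traversal idea can be sketched with the W3C DOM classes bundled with the JDK rather than dom4j (a swapped-in technique, not the dom4j API). The sketch assumes the input is already well-formed XML with no DOCTYPE, and it only removes script elements and attributes whose names start with "on"; it does not catch other vectors such as javascript: URLs, so it is not a complete sanitizer. DomScriptStripper is a hypothetical name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomScriptStripper {
    static String strip(String wellFormedHtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations so untrusted input cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedHtml)));
            clean(doc.getDocumentElement());

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("input could not be parsed or serialized", e);
        }
    }

    // Walks the tree, dropping event-handler attributes and script child elements.
    private static void clean(Element el) {
        NamedNodeMap attrs = el.getAttributes();
        for (int i = attrs.getLength() - 1; i >= 0; i--) {
            String name = attrs.item(i).getNodeName();
            if (name.toLowerCase(Locale.ROOT).startsWith("on")) {
                el.removeAttribute(name);
            }
        }
        NodeList children = el.getChildNodes();
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child instanceof Element) {
                if ("script".equalsIgnoreCase(child.getNodeName())) {
                    el.removeChild(child);
                } else {
                    clean((Element) child);
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(strip("<html><body><span onmousemove='doBadXss()'>test</span></body></html>"));
    }
}
```

Real-world HTML is rarely well-formed XML, which is where a clean-up pass such as JTidy, mentioned above, would come in before parsing.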
