In my Java app I'm trying to prevent XSS attacks. I want to encode URL and hidden-field parameters in the HttpServletRequest objects I have a handle on.
How would I go about doing this?
Don't do that. You're making it unnecessarily complicated. Just escape it during display only. See my answer in your other topic: Java 5 HTML escaping To Prevent XSS
To properly display user-entered data on an HTML page, you simply need to ensure that any special HTML characters are properly encoded as entities, via String#replace or similar. The good news is that there is very little you need to encode (for this purpose):
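str = str.replace("&", "&amp;").replace("<", "&lt;");
You can also replace > if you like, but there's no need to in element content. Inside an attribute value, you must also escape the quote character that delimits it.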
This isn't only because of XSS, but also so that characters show up properly. You may also want to ensure that characters outside the common Latin set are turned into appropriate entities, to protect against charset issues (UTF-8 vs. Windows-1252, etc.).
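The replace chain above can be wrapped in a small helper that also covers the charset point. This is a minimal sketch, assuming the value is placed in element text (not inside an attribute) and that you want every non-ASCII character emitted as a numeric character reference; the class and method names are hypothetical.

```java
final class TextEscaper {
    /**
     * Escapes a string for use as HTML element content (not attributes).
     * Works in a single pass over code points, so the entities added here
     * are never escaped a second time. Code points above U+007F become
     * numeric references (&#NNN;), so they render correctly even if the
     * page ends up served in a mismatched charset.
     */
    static String escapeText(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp == '&') {
                sb.append("&amp;");
            } else if (cp == '<') {
                sb.append("&lt;");
            } else if (cp > 0x7F) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }
}
```

For example, escapeText("Tom & Jerry <3 café") returns "Tom &amp; Jerry &lt;3 caf&#233;". Iterating over code points rather than chars means a character outside the BMP (such as an emoji) becomes one reference instead of two broken surrogate halves.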
You can use StringEscapeUtils from the Apache Jakarta Commons Lang library (now Apache Commons Lang):
http://www.jdocs.com/lang/2.1/org/apache/commons/lang/StringEscapeUtils.html
To protect against Cross-Site Scripting attacks, I have to sanitize/validate the Java object that comes in via RequestBody. Can I use the Encoder (from OWASP) to encode the entire Java object? It seems that the Encoder only encodes strings and can't accept objects. I have this same issue in many places where I need to handle it.
Is there any way to sanitize a whole object to avoid cross-site scripting issues?
As you noticed, sanitization of input to prevent XSS (Cross Site Scripting) is only relevant for strings. Encoding other types is either impossible or meaningless.
To understand it better, you need to actually understand the mechanism and attack vector of an XSS. I suggest starting here: OWASP XSS
To solve your problem, it would make sense to create a custom method that, after the object is read from the request, sanitizes it by going over all its strings (don't forget strings in lists and other data structures) and encoding them using the OWASP encoder.
Good luck!
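A minimal reflection-based sketch of that custom method, assuming a mutable DTO with String, List, and nested-object fields. encodeForHtml is a hypothetical stand-in for the OWASP encoder's Encode.forHtml(String) so the example compiles without the dependency, and CommentDto is a hypothetical request-body class. Note that this encodes at input time, which the answers further down caution against.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical request-body DTO, for illustration only.
class CommentDto {
    String author;
    List<String> tags = new ArrayList<>();
    int rating;
}

final class ObjectSanitizer {
    // Stand-in for OWASP's Encode.forHtml(String); use the real encoder in production.
    static String encodeForHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&#34;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Encodes, in place, every String reachable from obj's declared fields,
     * including Strings inside mutable Lists and nested (non-JDK) objects.
     * Skips static, final and primitive fields. Does not handle Maps, arrays,
     * inherited fields or cyclic object graphs.
     */
    static void sanitize(Object obj) {
        if (obj == null || obj.getClass().getName().startsWith("java.")) return;
        for (Field f : obj.getClass().getDeclaredFields()) {
            int mods = f.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isFinal(mods) || f.getType().isPrimitive()) continue;
            try {
                f.setAccessible(true);
                f.set(obj, sanitizeValue(f.get(obj)));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    @SuppressWarnings("unchecked")
    private static Object sanitizeValue(Object value) {
        if (value instanceof String) return encodeForHtml((String) value);
        if (value instanceof List) {
            // Replaces elements in place; throws if the List is immutable.
            ((List<Object>) value).replaceAll(ObjectSanitizer::sanitizeValue);
        } else {
            sanitize(value); // nested DTO; JDK types such as Integer are left alone
        }
        return value;
    }
}
```

Calling ObjectSanitizer.sanitize(dto) right after deserialization turns an author of "<script>x</script>" into "&lt;script&gt;x&lt;/script&gt;" and leaves the int field untouched.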
Let's take a look at this scenario: you have a textbox that allows the user to paste any kind of text (UTF-8, Chinese or Arabic characters), then a Submit button to insert that text into a MySQL DB.
Normally, I use URLEncoder.encode(text,"UTF-8") and my app runs really stably; I never worried about users inserting special characters, since the text was encoded. When I read the text back, I just decoded it and it came out exactly the way it was before.
But some people said that we can set UTF-8 in MySQL and the Tomcat server or something, so we don't need to encode. However, this solution requires configuration, and I hate configuration as it is not a very sound solution.
Besides, users can enter junk code to hack the DB.
So, in Java and MySQL, is it good practice to encode text when it is inserted into the DB?
Some people in another forum said it is very bad to store encoded text in the DB, but they don't say why it is bad.
So this question is for people who have a lot of experience in Java and MySQL to answer!
The problem with putting URL- or XML-encoded text into the database is that it makes querying and other processing of that text difficult.
The other problem is that there are different types of escaping that are required in different contexts.
... but this solution requires configuration & I hate configuration as it is not a very sound solution.
Ermm, asserting that configuration is "not a very sound solution" is not a rational argument. The vast majority of applications with a database component require some kind of database configuration.
Besides, users can enter junk code to hack the DB.
The real solution to SQL injection is to use PreparedStatement with fixed SQL strings for your queries, inserts, updates, etc. Use placeholders for all of the query parameters and use the PreparedStatement setter methods (setString and friends) to supply their values. The JDBC driver then transmits or quotes the parameter values correctly, which removes the possibility of SQL injection attacks.
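A minimal JDBC sketch of that advice. The comments table and body column are hypothetical. The SQL text is a constant with a ? placeholder, and the user's text is bound with setString and stored exactly as typed, with no URL encoding.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class CommentDao {
    // Fixed SQL: user input never becomes part of this string.
    // Table and column names are hypothetical.
    static final String INSERT_SQL = "INSERT INTO comments (body) VALUES (?)";

    /**
     * Stores the user's text raw. Quotes, semicolons or SQL keywords inside
     * userText are treated as data by the driver, not as SQL.
     */
    static int insertComment(Connection conn, String userText) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, userText); // parameter indexes are 1-based
            return ps.executeUpdate();
        }
    }
}
```

Escaping for display then happens separately, when the stored text is written into a page.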
The other thing you need to worry about is people using unescaped XML / HTML metacharacters (like <, > and quotes) to effect XSS attacks against other users. The way to defeat that is to escape the text at the point you are creating the HTML. For instance, in a JSP you can use the JSTL <c:out> tag to escape the text.
Finally, URL-encoded text can't be inserted directly into an HTML page. The URL encoding scheme (using %'s and +'s) is not the correct encoding scheme for text in an HTML page; there you need &...; character entities. A %xx in text will appear as exactly that when you display your web page in a browser. Try it and see!
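To see what actually ends up in the page, here is a small sketch of what java.net.URLEncoder produces; the wrapper class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncodingDemo {
    /** Percent-encodes a string as URLEncoder does for form/query values. */
    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

urlEncode("Fish & Chips <b>") returns "Fish+%26+Chips+%3Cb%3E". Written into HTML, the browser shows those plus signs and percent sequences verbatim, whereas the HTML-entity form Fish &amp; Chips &lt;b&gt; displays as the original text.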
Answering the questions in the comments:
iamthepiguy said "encode everything before putting it into the DB", but you said "No". Suppose I put HTML text into the DB: there are a lot of special characters and other things, so how can we let the DB handle all of them? For example, if MySQL doesn't recognize a character, it turns it into "?", which means the text got corrupted and the user lost that text. How does MySQL handle all kinds of special characters?
If you use a PreparedStatement with SQL that has placeholders for all of the text parameters, then the JDBC driver takes care of the escaping automatically.
Also, since there is such a diversity of Unicode and special characters, how many other things do we need to worry about if we do not encode text, to make sure the system runs stably?
Same answer.
Encoded text makes the system run a bit slower, but we are headache-free.
There are no headaches if you use prepared statements and <c:out> (or the equivalent).
You said "The way to defeat that is to escape the text at the point you are creating the HTML." So we have to use Java to encode, right?
Yes, but you ONLY HTML encode the text when you output it for inclusion in a web page. If you output it as JSON, you encode using JSON escaping ... or more likely, you let the JSON serializer do it for you. If you send the text in other formats, or include it in other things, you encode it as required ... or not at all.
But the point is that you don't store it in the database in encoded form. If you do, then in nearly all cases (including HTML!!) you'd need to decode the URL-encoded text before encoding it in the correct way.
It is somewhat better in terms of stability and configuration, as well as safety from XSS attacks, to encode everything before putting it in the database. The disadvantages are that it takes slightly longer and uses slightly more space in the DB. You could instead escape everything each time the output is created, but it's easier to escape everything once.
I am using Play Framework 2 and I can't find a way to properly handle HTML escaping.
In the template system, HTML entities are filtered by default.
But when I use REST requests with Backbone.js, my JSON objects are not filtered.
I use play.libs.Json.toJson(myModel) to transform an Object into a String.
So, in my controller, I use return ok(Json.toJson(myModel)); to send the response ... but here, the attributes of my model are not secured.
I can't find a way to handle it ...
Second question:
The template engine escapes HTML by default, which means that we have to store the raw user input in our database.
Is that safe behaviour?
Third question:
Is there a function in Play Framework to manually escape strings? All those I can find require adding new dependencies.
Thanks!
Edit: I found a way at the Backbone.js templating level:
- Use myBackboneModel.escape('attr'); instead of myBackboneModel.get('attr');
Underscore.js templating also includes that option: <%= attr %> renders without escaping, but <%- attr %> renders with escaping!
Just be careful about efficiency: strings are re-escaped at each render. That's why Backbone's .create() should be preferred.
The best practices for XSS prevention usually recommend that you reason about your output rather than your input. There are a number of reasons for that; in my opinion the most important are:
It doesn't make sense to reason about escaping something unless you know exactly how you are going to output/render your data, because different ways of rendering require different escaping strategies; e.g. a properly escaped HTML string is not good enough for use inside a JavaScript block. Requirements and technologies change constantly: today you render your data one way, tomorrow you might render it another (say you start working on a mobile client that doesn't need HTML escaping because it doesn't use HTML to render data at all). You can only be sure of the proper escaping strategy at the moment you render your data. This is why modern frameworks delegate escaping to templating engines. I'd recommend reviewing the following article: XSS (Cross Site Scripting) Prevention Cheat Sheet
Escaping user input before persisting it effectively loses the original: once escaped text is in storage, there's no reliable way to tell whether a sequence such as &lt; was typed by the user or produced by the escaping, so you can't safely 'unescape' it later (consider my mobile client example above).
That is why I believe the right way to go is to delegate escaping to your templating engines (i.e. Play's and the JS templating engine you're using for Backbone). There's no need to HTML-escape strings you serialize to JSON. Behind the scenes, the JSON serializer will JSON-escape your strings; e.g. if a string contains a quote, it will be escaped so that the resulting JSON is valid. It's a JSON serializer after all, so it only cares about rendering correct JSON; it knows nothing about HTML (and it shouldn't). However, when you render your JSON data on the client side, you should properly HTML-escape it using the functionality provided by the JS templating engine you're using for Backbone.
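To illustrate that each output context has its own escaping, here is a sketch rendering one raw stored value for HTML and for JSON. Both helpers are hypothetical; in real code the JSON side would come from your serializer (Play's Json.toJson, Jackson, etc.) rather than hand-written code.

```java
final class ContextEncoding {
    /** HTML-escapes a value for element content or a quoted attribute. */
    static String forHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Renders a value as a JSON string literal; it leaves < and > alone. */
    static String forJson(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

For the raw value He said "<hi>", forHtml gives He said &quot;&lt;hi&gt;&quot; and forJson gives "He said \"<hi>\"". The JSON output still contains raw angle brackets, which is why the client-side template must HTML-escape when it inserts the value into the page.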
Answering another question: you can use play.api.templates.HtmlFormat to escape raw HTML-string manually:
import play.api.templates.HtmlFormat
...
HtmlFormat.escape("<b>hello</b>").toString()
// -> &lt;b&gt;hello&lt;/b&gt;
If you really need to make JSON-encoder escape certain HTML strings, a good idea might be to create a wrapper for them, let's say RawString and provide custom Format[RawString] which will also HTML-escape a string in its writes method. For details see: play.api.libs.json API documentation
In a portion of my J2EE/Java code, I URL-encode the output of getRequestURI() to sanitize it against XSS attacks, but Fortify SCA considers that poor validation.
Why?
The key point is that you need to convert HTML special characters to HTML entities. This is also called "HTML escaping" or "XML escaping". Basically, the characters <, >, ", & and ' need to be replaced by &lt;, &gt;, &quot;, &amp; and &#39;.
URL encoding does not do that. URL encoding converts URL special characters to percent-encoded values. This is not HTML escaping.
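A minimal sketch of that five-character mapping, the kind of escape() method the answer mentions for servlet code. HtmlEscaper is a hypothetical name; Commons Lang's StringEscapeUtils and JSTL's fn:escapeXml are the ready-made alternatives.

```java
final class HtmlEscaper {
    /**
     * Replaces the five HTML special characters with entities, making the
     * result safe for element content and for quoted attribute values.
     * It is NOT sufficient inside <script> blocks, event-handler attributes
     * or URL attributes such as href (javascript: URLs survive it).
     * A null input is rendered as an empty string.
     */
    static String escape(String s) {
        if (s == null) return "";
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '&': sb.append("&amp;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Mapping null to "" is a design choice that avoids printing the literal text "null" for a missing request parameter.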
In web applications, HTML escaping is normally done on the view side, exactly where you're redisplaying user-controlled input. In a Java EE web application, how to do that depends on the view technology you're using.
If the webapp is using modern Facelets view technology, then you don't need to escape it yourself. Facelets will already implicitly do that.
If the webapp is using legacy JSP view technology, then you need to ensure that you're using JSTL <c:out> tag or fn:escapeXml() function to redisplay user-controlled input.
<c:out value="${bean.foo}" />
<input type="text" name="foo" value="${fn:escapeXml(param.foo)}" />
If the webapp is very legacy or badly designed and uses servlets or scriptlets to print HTML, then you have a bigger problem. There are no built-in tags or functions, let alone Java methods, that can escape HTML entities. You should either write an escape() method yourself or use Apache Commons Lang's StringEscapeUtils#escapeHtml() (escapeHtml4() in Commons Lang 3) for this. Then you need to ensure that you use it everywhere you print user-controlled input.
out.print("<p>" + StringEscapeUtils.escapeHtml(request.getParameter("foo")) + "</p>");
Much better would be to redesign that legacy webapp to use JSP with JSTL.
Depending on the encoder, URL encoding may not affect certain significant characters, including the single quote (') and parentheses (JavaScript's encodeURIComponent, for example, leaves them unchanged), so URL encoding can pass certain payloads through unchanged.
For example,
onload'alert(String.fromCharCode(120))'
will be treated by some browsers as a valid attribute that can result in code execution when injected inside a tag.
The best way to avoid XSS is to treat all untrusted inputs as plain text, and then when composing your output, properly encode all plain text to the appropriate type on output.
If you want to filter inputs as an additional layer of security, make sure your filter treats all quotes (including the back-tick) and parentheses as possible code, and disallows them unless they make sense for that input.
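A minimal sketch of such an additional input filter, assuming a per-field flag that says whether quotes and parentheses are legitimate for that field. The class, method and flag are hypothetical, and the check supplements output encoding rather than replacing it.

```java
final class InputFilter {
    // Quotes (including the back-tick) and parentheses are treated as possible code.
    private static final String SUSPICIOUS = "'\"`()";

    /**
     * Defense-in-depth check. For fields that never need quotes or
     * parentheses (e.g. a username), any such character rejects the input.
     * This does NOT replace encoding on output.
     */
    static boolean isAcceptable(String input, boolean quotesAndParensAllowed) {
        if (quotesAndParensAllowed) return true;
        for (int i = 0; i < input.length(); i++) {
            if (SUSPICIOUS.indexOf(input.charAt(i)) >= 0) return false;
        }
        return true;
    }
}
```

A name field such as "O'Brien" legitimately needs the quote, which is why the flag is decided per input rather than globally.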
We have a web application. In some places there is a JavaScript-based WYSIWYG/rich-text editor. It filters some JavaScript but uses HTML to format its content.
Unfortunately, it does not filter all JavaScript: I was able to demonstrate an XSS attack with an event handler. I think client-side filtering of JavaScript is not safe at all, because anything on the client side can be manipulated.
So I would like to filter or escape JavaScript on the server side. I had a short look at ESAPI for Java, but we have a requirement, and I don't know whether it is unusual or a problem:
The HTML elements the editor uses should not be filtered or escaped, only JavaScript. The HTML should be ordinary rendered in the browser.
Is there a safe way to escape or filter JavaScript while keeping the HTML as it is?
Does ESAPI or any other API help me doing this?
How do I do it?
Thanks in advance.
It is difficult to state what escaping schemes have to be used to escape JavaScript without knowing whether the application is vulnerable to DOM-based XSS attacks or the run-of-the-mill (reflected and persistent) XSS attacks.
ESAPI for Java will help in both cases though. In the case of DOM-based XSS attacks, you would need to encode the unsafe data multiple times (and using different encoding schemes if necessary) to ensure that each parser in the parsing chain will not be subject to XSS attacks. In the case of reflected or persistent XSS attacks, you'll usually need to apply the escaping only once, in the appropriate context.
It should be kept in mind that, allowing raw HTML on its own is also unsafe, resulting in XSS. You might want to take a look at a different approach to sanitizing inputs; using AntiSamy for filtering HTML might be warranted in this case.
You need to parse the HTML and reject any tags and attributes that aren't in a strict whitelist of safe tags/attributes.
The whitelist would not include tags like <script>, <style>, or <link>, and it wouldn't include attributes like onclick, onload, or style.
You should also make sure that href and src attributes use the http or https protocols (or a relative path), and not javascript:.
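The allowlist checks described above can be sketched as follows, assuming an HTML parser (for example OWASP Java HTML Sanitizer or jsoup) has already split the input into tags and entity-decoded attribute values, and that the sanitizer re-escapes values when writing them back out. The tag/attribute sets are hypothetical placeholders for whatever your editor emits; this is not a parser.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class HtmlAllowlist {
    // Hypothetical allowlist: tag -> permitted attributes. Anything absent is dropped.
    private static final Map<String, Set<String>> ALLOWED = Map.of(
            "p", Set.of(),
            "b", Set.of(),
            "i", Set.of(),
            "ul", Set.of(),
            "li", Set.of(),
            "a", Set.of("href"),
            "img", Set.of("src", "alt"));

    private static final Set<String> URL_ATTRIBUTES = Set.of("href", "src");

    static boolean isAllowedTag(String tag) {
        return ALLOWED.containsKey(tag.toLowerCase(Locale.ROOT));
    }

    /** value must already be entity-decoded by the HTML parser. */
    static boolean isAllowedAttribute(String tag, String attr, String value) {
        Set<String> attrs = ALLOWED.get(tag.toLowerCase(Locale.ROOT));
        String name = attr.toLowerCase(Locale.ROOT);
        if (attrs == null || !attrs.contains(name)) return false; // drops onclick, style, ...
        return !URL_ATTRIBUTES.contains(name) || isSafeUrl(value);
    }

    /** Allows relative URLs and http/https; rejects every other scheme (javascript:, data:, ...). */
    static boolean isSafeUrl(String url) {
        int colon = url.indexOf(':');
        if (colon < 0) return true; // no scheme at all: relative URL
        for (int i = 0; i < colon; i++) {
            char c = url.charAt(i);
            if (c == '/' || c == '?' || c == '#') return true; // colon is inside a relative path/query
        }
        String scheme = url.substring(0, colon).toLowerCase(Locale.ROOT);
        return scheme.equals("http") || scheme.equals("https");
    }
}
```

Because the URL check allowlists schemes rather than blocklisting javascript:, variants such as " javascript:" or "JaVaScRiPt:" are rejected too.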