Java website protection solutions (especially XSS)

I'm developing a web application, and facing some security problems.
In my app users can send messages and see others' messages (a bulletin-board-like app). I'm validating all the form fields that users can send to my app.
There are some very easy fields, like "nick name", which can be 6-10 alphabetical characters, or the message sending time, which is sent to the users as a string, and then (when users ask for messages that are "younger" or "older" than a date) I parse it with SimpleDateFormat (I'm developing in Java, but my question is not related only to Java).
The big problem is the message field. I can't restrict it to only alphabetical characters (upper or lowercase), because I have to deal with some frequently used characters like ",',/,{,} etc. (users would not be satisfied if the system didn't allow them to use these).
According to this http://ha.ckers.org/xss.html, there are a lot of ways people can "hack" my site. But I'm wondering, is there anything I can do to prevent that? Not everything, because there is no 100% protection, but I'd like a solution that can protect my site.
I'm using servlets on the server side, and jQuery, on the client side. My app is "full" AJAX, so users open 1 JSP, then all the data is downloaded and rendered by jQuery using JSON. (yeah, I know it's not "users-without-javascript" friendly, but it's 2010, right? :-) )
I know front end validation is not enough. I'd like to use 3 layer validation:
- 1. front end: JavaScript validates the data, then sends it to the server
- 2. server side: the same validation; if there is anything that shouldn't be there (because the client-side JavaScript should have caught it), I BAN the user
- 3. if there is anything that I wasn't able to catch earlier, the rendering process handles it and renders it appropriately
Is there any "out of the box" solution, especially for java? Or other solution that I can use?

To minimize XSS attacks, the important thing is to encode any field data before putting it back on the page, for example changing > to &gt; and so on. Encoded this way, the data is displayed as text instead of being interpreted as markup when it is added to the page.
I think you are doing a lot of things right by whitelisting the data you expect for different fields. Beyond that, for fields that must allow other, potentially problematic characters, encoding would fix the issue for you.
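The whitelist checks described in the question (a nickname of 6-10 letters, a timestamp parsed with SimpleDateFormat) might look roughly like the sketch below. The class name, method names and the date pattern are assumptions made for illustration; the question does not say which format is used.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Pattern;

final class FieldValidator {
    // 6 to 10 ASCII letters, as described in the question.
    private static final Pattern NICKNAME = Pattern.compile("[A-Za-z]{6,10}");
    // Assumed timestamp format; replace with whatever the application really sends.
    private static final String DATE_PATTERN = "yyyy-MM-dd HH:mm:ss";

    private FieldValidator() {}

    static boolean isValidNickname(String value) {
        return value != null && NICKNAME.matcher(value).matches();
    }

    /** Returns the parsed date, or null if the text is not a strictly valid timestamp. */
    static Date parseDateOrNull(String value) {
        if (value == null) {
            return null;
        }
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat format = new SimpleDateFormat(DATE_PATTERN);
        format.setLenient(false); // reject impossible dates such as February 30
        ParsePosition position = new ParsePosition(0);
        Date parsed = format.parse(value, position);
        // Also reject input with trailing characters after a valid date.
        boolean consumedAll = position.getIndex() == value.length();
        return (parsed != null && consumedAll) ? parsed : null;
    }
}
```

The same checks would run on the server regardless of what the client-side JavaScript has already verified.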
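Furthermore, since you are using Ajax, values are not written into the page directly from URL parameters, which removes some reflected-XSS paths. An attacker can still send arbitrary requests to the server, though, so the server-side validation remains necessary.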
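As a minimal sketch of the encoding step, assuming the data ends up in an HTML body or in a quoted attribute value, a hand-written escaper could look like this. The class and method names are hypothetical; a maintained encoder library would normally be preferable to rolling your own.

```java
final class HtmlEscaper {
    private HtmlEscaper() {}

    /** Escapes the five characters that are significant in HTML text and quoted attributes. */
    static String escapeHtml(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

Because the page in the question is rendered by jQuery from JSON, the equivalent on the client is to insert user data with .text() rather than .html(), so that the browser never parses it as markup.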

Look at the AntiSamy library. It allows you to define rulesets for your application, then run your user input through AntiSamy to clean it per your rules.

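The easiest way is to do a simple replacement for the following:
< with &lt;
> with &gt;
' with \'
That covers the most common HTML and database injection cases, though parameterized queries are a more reliable defence on the database side.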

Related

filter out encoded javascript content from request

I have a problem where I am trying to cleanse the request content to strip out HTML and javascript if included in the input parameters.
This is basically to protect against XSS attacks and the ideal mechanism would be to validate input and encode the output but due to some restrictions I cannot work on the output end.
All I can do at this time is to try to cleanse the input through a filter. I am using ESAPI to canonicalize the input parameters and also using jsoup with the most restrictive Whitelist.none() option to strip all HTML.
This works as long as the malicious javascript is within some HTML tags but fails for a URL with javascript code without any HTML surrounding it, eg:
http://example.com/index.html?a=40&b=10&c='-prompt``-'
ends up showing an alert on the page. This is kind of what I am doing right now:
param = encoder.canonicalize(param, false, false);
param = Jsoup.clean(param, Whitelist.none());
So the question is:
Is there some way through which I can make sure that my input is stripped of all HTML and javascript code at the filter?
Should I throw in some regex validations but is there any regex that will take care of the cases that are getting past the check I have right now?

DISCLAIMER:
If output-escaping is not allowed in your internet-facing solution, you are in a NO-WIN SCENARIO. It's like antivirus on Windows: You'll be able to detect specific and known attacks, but you will be unable to detect or defend against unknown attacks. If your employer insists on this path, your due diligence is to make management aware of this fact and get their acceptance of the risks in writing. Every time I've confronted management with this, they've opted for the correct solution--output escaping.
================================================================
First off... watch out when using JSoup in any kind of a cleaning/filtering/input validation situation.
Upon receiving invalid HTML, like
<script>alert(1);
Jsoup will add in the missing </script> tag.
This means that if you're using Jsoup to "cleanse" HTML, it first transforms INVALID HTML into VALID HTML, before it begins processing.
So the question is: "Is there some way through which I can make sure that my input is stripped of all HTML and javascript code at the filter? Should I throw in some regex validations but is there any regex that will take care of the cases that are getting past the check I have right now?"
No. ESAPI's input validation is not appropriate for your use case, because HTML is not a regular language and ESAPI's validation rules are regular expressions. The fact is you cannot do what you ask ("make sure that my input is stripped of all HTML and javascript code at the filter") and still have a functioning web application that requires user-defined HTML/JavaScript.
You can stack the deck in your favor a little bit: I would choose something like OWASP's HTML Sanitizer and test your implementation against the XSS inputs listed here.
Many of those inputs are taken from OWASP's XSS Filter evasion cheat sheet, and will at least exercise your application against known attempts. But you will never be secure without output escaping.
===================UPDATE FROM COMMENTS==================
So the use case is to try to block all HTML and JavaScript. My recommendation is to implement Caja, since it encapsulates HTML, CSS, and JavaScript.
Javascript though is also difficult to manage from input validation, because like HTML, JavaScript is a non-regular language. Additionally, each browser has its own implementation that deviates in different ways from the ECMAScript spec. If you want to protect your input from being interpreted, this means you'd ideally have to have a parser for each browser family attempting to interpret user input in order to block it.
That is a lot of effort when all you really have to do is make sure that the output is escaped. Sorry to beat a dead horse, but I have to stress that output escaping is 100x more important than rejecting user input. You want both, but if forced to choose one or the other, output escaping is less work overall.
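To make the output-escaping point concrete for the payload in the question: '-prompt``-' contains no HTML at all, so it only does harm if the parameter is written into a JavaScript string literal. Assuming that is the context, a sketch of an escaper for that one context follows. It replaces everything except ASCII letters and digits with a \uXXXX escape, which is the conservative approach the OWASP cheat sheets describe. The class and method names are made up for illustration.

```java
final class JsStringEscaper {
    private JsStringEscaper() {}

    /**
     * Escapes a value for use inside a quoted JavaScript string literal.
     * Every character other than an ASCII letter or digit becomes a
     * four-digit hexadecimal unicode escape, so quotes and backticks
     * cannot terminate the string.
     */
    static String escapeForJsString(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean alphanumeric = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alphanumeric) {
                out.append(c);
            } else {
                out.append("\\u").append(String.format("%04x", (int) c));
            }
        }
        return out.toString();
    }
}
```

This is only valid for quoted string contexts. A value placed in an HTML body, an attribute or a URL needs the escaper for that context instead.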

Generating Dynamic URLs

I have a list of users across various companies who use one of the features that our website provides. Whenever they contact our business group, we need to send a URL via email to the requestor so that they can upload some data. None of these external users has a dedicated account. However, we do not want to give them a static link, as this could be accessed by anyone over the internet. We want dynamic links to be generated. Is this something that is usually done? Is there an industry-accepted way of doing this? Should we ensure that the dynamic link expires after a certain amount of time, and if so, are there any design options?
Thanks a lot!

Usually, it is the parameters of the URL, not the URL itself, that are dynamic. Basically you generate parameters that are stored somewhere, typically in a database, and send an email with the URL and the parameter(s). This URL is valid for only a limited period of time and possibly only for one request.
Answers to questions:
1. Yes, this is something that is quite commonly used in, for example, unsubscribing from a mailing list or validating an account with a working email address.
2. I'm not aware of any single way that is "industry accepted", there are many ways of doing it, but the idea is not that complex - you just need to decide on a suitable token format.
3. Normally you should ensure that the link expires after a certain amount of time. Depending on the use case that can be some days, a week or something else. In practice, you'd remove or disable the generated parameters in your database. However, if this data is something that might be needed for extended periods of time, you might want to think up a functionality so that it can be retrieved later on.

You may have a static URL taking a token as a parameter, e.g. http://www.mycompany.com/exchange/<UUID> or http://www.mycompany.com/exchange?token=<UUID>.
The UUID could have a validity in a time range or be limited to a single use (one access or one upload).
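A rough sketch of such a token scheme, combining a time limit with single use, is shown below. The class name and its methods are hypothetical, and the in-memory map stands in for the database table a real deployment would use.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class UploadTokens {
    // token -> expiry time; a real system would keep this in a database
    private final Map<String, Instant> expiryByToken = new ConcurrentHashMap<>();
    private final Duration lifetime;

    UploadTokens(Duration lifetime) {
        this.lifetime = lifetime;
    }

    /** Creates a random token that is valid for the configured lifetime. */
    String issue(Instant now) {
        String token = UUID.randomUUID().toString();
        expiryByToken.put(token, now.plus(lifetime));
        return token;
    }

    /** Consumes the token: returns true only once, and only before it expires. */
    boolean redeem(String token, Instant now) {
        if (token == null) {
            return false;
        }
        Instant expiry = expiryByToken.remove(token);
        return expiry != null && now.isBefore(expiry);
    }

    /** Builds the link to email, using the query-parameter form from the answer. */
    static String linkFor(String token) {
        return "http://www.mycompany.com/exchange?token=" + token;
    }
}
```

Passing the current time in as a parameter keeps the expiry logic easy to test. Whether a token should be removed on first access or only after a successful upload is a design decision this sketch does not settle.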

Another variant is to rely on cookies that already exist for your site in the user's web browser (if there are any, of course).
But there are some drawbacks to this solution:
The user may open the link on a different machine or in a different browser. The user may also have cleared all cookies, or the cookies may have expired between the last visit to your site and the moment the user follows the granted URL. In these cases the user won't be able to access your page.

Anchors with SafeHtml

How would you use SafeHtml in combination with links?
Scenario: Our users can enter unformatted text which may contain links, e.g. I like&love http://www.stackoverflow.com. We want to safely render this text in GWT but make the links clickable, e.g. I like&love <a href="http://www.stackoverflow.com">stackoverflow.com</a>. Besides rendering the text in the GWT frontend, we also want to send it via email, where the links should be clickable as well.
So far, we considered the following options:
1. Store the complete text as HTML in the backend and let the frontend assume it's correctly encoded (I like&love <a href="http://www.stackoverflow.com">stackoverflow.com</a>) -> Introduces XSS vulnerabilities
2. Store plain text but the links as HTML (I like&love <a href="http://www.stackoverflow.com">stackoverflow.com</a>) in the backend and use HtmlSanitizer in the frontend
3. Store plain text and special encoding for the links (I like&love [stackoverflow.com|http://www.stackoverflow.com]) in the backend and use a custom SafeHtml generator in the frontend
To us, the third option looks the cleanest but it seems to require the most custom code since we can't leverage GWT's SafeHtml infrastructure.
Could anybody share how to best solve the problem? Is there another option that we didn't consider so far?

Why not store the text exactly as it was entered by the user, and perform any special treatment when transforming it for the output (e.g. for sending emails, creating PDFs, ...)? This is the most natural approach, and you won't have to undo any special treatment, e.g. when you let the user edit the string.
As a general rule, I would always perform encoding/escaping/transformation only for the immediate transport/storage/output target. There are very few reasons to deviate from this rule, one of them may be performance, e.g. caching a transformed value in the DB. (In these cases, I think it's best to give the DB field a specific name like 'text_htmltransformed' - this avoids 'overescaping', which can be just as harmful as no escaping.)
Note: Escaping/encoding is no replacement for input validation.
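As an illustration of transforming at output time, the sketch below takes the stored plain text, escapes everything, and turns http/https URLs into anchors. It is plain Java rather than GWT code, the class name and the URL pattern are assumptions, and wrapping the result in a SafeHtml instance is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Linkifier {
    // Deliberately simple: only http and https, up to whitespace, quotes or angle brackets.
    private static final Pattern URL = Pattern.compile("https?://[^\\s<>\"']+");

    private Linkifier() {}

    /** Converts stored plain text to HTML: text is escaped, URLs become links. */
    static String toHtml(String plainText) {
        StringBuilder out = new StringBuilder();
        Matcher matcher = URL.matcher(plainText);
        int last = 0;
        while (matcher.find()) {
            // Escape the ordinary text between the previous match and this URL.
            out.append(escape(plainText.substring(last, matcher.start())));
            String url = escape(matcher.group());
            out.append("<a href=\"").append(url).append("\">").append(url).append("</a>");
            last = matcher.end();
        }
        out.append(escape(plainText.substring(last)));
        return out.toString();
    }

    private static String escape(String text) {
        return text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }
}
```

The same stored text can be passed through this function for the web page and for HTML email, and through a different transformation for any other output format. The pattern is crude, for instance a full stop directly after a URL is treated as part of it, so it would need refining before real use.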

How do I save a viewer response to a server?

If I want to save a response to a query on a website I'm coding to a server, how would I do that?
Here's an example. If I had a site with a "Rate us" form, and a person answered with an "AWFUL SITE!", how would I be able to save & retrieve that information?

There are several ways to do what you want to do. I'll describe two of them.
You could append each rating to the end of a file on the web server. This would be done in a server-side scripting language usually, such as PHP or ASP.NET, and you would probably want to set the permissions on the file so that it's not readable to everyone.
You could set up a table in a database (MySQL or otherwise) and add a new row for each rating given. Again, this would be done in something like PHP or ASP.NET and you would want to make sure you take precautions against SQL injection attacks (not much of a problem if you use PHP Data Objects rather than the deprecated mysql_* functions).
I would personally go for the second option as it's easier to manage and change, and it's easier to set it up so that you can store IP, name, optional email and message in every row. And like I said, you can add a new field later down the line without running into the obvious problems.

Looking for a question that combines the understanding of few web technologies [closed]

I am teaching a web development course at a CS department. I have written most of the final test by now, and each question focuses on a specific feature or a specific technology.
I wonder if you can think of or recommend a question that combines knowledge of several technologies.
The course mostly covers: HTML, CSS, JS, HTTP, Servlets, JSP and JDBC.
(as well as AJAX, ORM, basic security issues like SQL-Injection and XSS, HTML5, REST APIs)
EDIT: I will super appreciate questions with answers :-) thanks!
I'll give the bounty to the question with the highest rank, so please vote! I honestly like most of the questions here, thank you all :-)

Explain the relationship of the DOM to each of the following technologies: HTML, CSS, JavaScript.
The goal here is for the answer to make clear the student understands that HTML generates a DOM structure, CSS affects how that structure is rendered, and JavaScript affects how that structure is modified. If you understand how it all ties back into the DOM, all client-side coding becomes straightforward.

Fun question :-) How about...
On web development you need to separate content, style and behavior. Describe why this is done and what different technologies you use in which layer. Every acronym should be written in full text on first time use. (10 p)
or...
Describe what happens in a Web Browser (step by step) when a web page is transferred on the internet from a Web server through HyperText Transfer Protocol to a Client. Consider all the different technologies you have used in this course. (10 p)

Explain what happens, and which technologies could be used, when a user logs in to a protected web site using form based login that sets a HTTP cookie. (Starting from the HTML form all the way to the database and back to the browser.) Bonus question: What changes, when using AJAX for the login?
Answer (main points):
HTML: Form (using POST) with text input fields and a button. Security: The form is sent via HTTPS. The login page itself should also be an HTTPS page (otherwise, the form could be replaced by Mallory -> MITM)
Javascript: Performs some basic validation (e. g. empty password), and displays error message before sending to server.
Servlet: Receives POST request, takes username/password parameters (in plaintext), calculates (salted) hash from password, discards plaintext password.
JDBC: Selects the hashed password from the DB. It is compared with the hash calculated from the transmitted password.
Servlet: On success, creates a new session (leads to the creation of a cookie header). Prepares objects that will be used in the JSP page (and stores them in the session or request scope).
JSP: Prepares the HTML page that will be sent to the browser.
Browser: Receives HTTP response, sets cookie and displays the page.
Bonus (AJAX): The server doesn't have to prepare the entire page, but only sends the necessary data and/or HTML snippets to the client. The browser doesn't reload the entire page, but modifies the current page using JavaScript. Security: AJAX requests are restricted by the same-origin policy, and HTTP and HTTPS count as different origins, so an HTTP page cannot simply submit the login data via HTTPS using AJAX.
Caution
It should be noted, that this is not meant to be used as a HOWTO for building a secure login mechanism. This description is simplified and doesn't cover every security aspect. OTOH, as an exam question, it should probably be simplified further and adjusted to the content of the curriculum.
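For the servlet step that calculates a salted hash, one possible sketch uses PBKDF2 from the standard Java crypto API. The answer does not name an algorithm, so the choice of PBKDF2, the iteration count, the key length and all class and method names here are assumptions.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class PasswordHasher {
    private static final int ITERATIONS = 100_000; // assumed work factor
    private static final int KEY_BITS = 256;

    private PasswordHasher() {}

    /** Generates a fresh random salt, to be stored next to the hash. */
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    /** Derives a salted hash from the plaintext password. */
    static byte[] hash(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available", e);
        } finally {
            spec.clearPassword(); // discard the copy of the plaintext held by the spec
        }
    }

    /** Compares a login attempt with the stored hash in constant time. */
    static boolean matches(char[] candidate, byte[] salt, byte[] storedHash) {
        return MessageDigest.isEqual(hash(candidate, salt), storedHash);
    }
}
```

In the flow above, the salt and hash would be the values that the JDBC step reads from the database, and the servlet would call matches with the submitted password.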

You can ask them to explain how to implement the MVC pattern, and where each technology comes into use within it. Or rather, how and why?

Since the students have already developed a simplified Twitter during their course, you may ask what additional steps they would take to turn it into a real Twitter website or a clone of it, and ask them to describe each step, starting from HTML through to the ORM / database. You may explicitly specify the technologies to be used.

Well, putting on my "evil" hat for a moment, you could ask how the back end data model should dictate the layout of the front end, and any answer other than some variation of "It doesn't" gets to take the class over again. >:-)

Why should any framework you use generate HTML, CSS and JS?
DRY

Imagine you work for a security agency and were given the task of developing a website. The field agents specifically requested that the site could swap colors so that they could use it both with night vision and at the office. With what you learned, describe how you would separate content from structure to allow night/day switching and what security measures you would implement to prevent an enemy agency from stealing your data.
A spiced up question. I always find my students more interested when I put them in the middle of a plot.

Something along the lines of...
Explain how you would display the results of a call to an offsite XML feed when the user performs some action in the browser. The browser must not navigate.
A good answer would address the need for client-side scripting, the cross-domain restriction (same-origin policy), and the server-side component necessary to get around it, possibly with pseudocode or snippets.

Ask them to develop a student database system in which the user can search the database by date of birth.
Here the following technologies can be used and tested:
1. HTML for form controls
2. CSS for aesthetics
3. JavaScript for date validation
4. Very importantly, you can explain SQL injection.
5. JSP
6. Servlets
7. JDBC
8. Any database
9. AJAX
10. The MVC design pattern can be used.
