How to sanitize HTML code in Java to prevent XSS attacks? - java

I'm looking for class/util etc. to sanitize HTML code i.e. remove dangerous tags, attributes and values to avoid XSS and similar attacks.
I get html code from rich text editor (e.g. TinyMCE) but it can be send malicious way around, ommiting TinyMCE validation ("Data submitted form off-site").
Is there anything as simple to use as InputFilter in PHP? Perfect solution I can imagine works like that (assume sanitizer is encapsulated in HtmlSanitizer class):
String unsanitized = "...<...>..."; // some potentially
// dangerous html here on input
HtmlSanitizer sat = new HtmlSanitizer(); // sanitizer util class created
String sanitized = sat.sanitize(unsanitized); // voila - sanitized is safe...
Update - the simpler solution, the better! Small util class with as little external dependencies on other libraries/frameworks as possible - would be best for me.
How about that?

You can try OWASP Java HTML Sanitizer. It is very simple to use.
PolicyFactory policy = new HtmlPolicyBuilder()
.allowElements("a")
.allowUrlProtocols("https")
.allowAttributes("href").onElements("a")
.requireRelNofollowOnLinks()
.build();
String safeHTML = policy.sanitize(untrustedHTML);

Thanks to #Saljack's answer. Just to elaborate more to OWASP Java HTML Sanitizer. It worked out really well (quick) for me. I just added the following to the pom.xml in my Maven project:
<dependency>
<groupId>com.googlecode.owasp-java-html-sanitizer</groupId>
<artifactId>owasp-java-html-sanitizer</artifactId>
<version>20150501.1</version>
</dependency>
Check here for latest release.
Then I added this function for sanitization:
private String sanitizeHTML(String untrustedHTML){
PolicyFactory policy = new HtmlPolicyBuilder()
.allowAttributes("src").onElements("img")
.allowAttributes("href").onElements("a")
.allowStandardUrlProtocols()
.allowElements(
"a", "img"
).toFactory();
return policy.sanitize(untrustedHTML);
}
More tags can be added by extending the comma delimited parameter in allowElements method.
Just add this line prior passing the bean off to save the data:
bean.setHtml(sanitizeHTML(bean.getHtml()));
That's it!
For more complex logic, this library is very flexible and it can handle more sophisticated sanitizing implementation.

You could use OWASP ESAPI for Java, which is a security library that is built to do such operations.
Not only does it have encoders for HTML, it also has encoders to perform JavaScript, CSS and URL encoding. Sample uses of ESAPI can be found in the XSS prevention cheatsheet published by OWASP.
You could use the OWASP AntiSamy project to define a site policy that states what is allowed in user-submitted content. The site policy can be later used to obtain "clean" HTML that is displayed back. You can find a sample TinyMCE policy file on the AntiSamy downloads page.

HTML escaping inputs works very well. But in some cases business rules might require you NOT to escape the HTML. Using REGEX is not fit for the task and it is too hard to come up with a good solution using it.
The best solution I found was to use: http://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer
It builds a DOM tree with the provided input and filters any element not previosly allowed by a Whitelist. The API also has other functions for cleaning up html.
And it can also be used with javax.validation #SafeHtml(whitelistType=, additionalTags=)

Regarding Antisamy, you may want to check this regarding the dependencies:
http://code.google.com/p/owaspantisamy/issues/detail?id=95&can=1&q=redyetidave

Related

How to fix CWE 73 External Control of File Name or Path

During veracode scan i got CWE 73 issue in my result. Can someone suggest me how to fix this solution for the below coding scenario?
The existing solutions provide is not working,also i would like to know any ESAPI properties can be used to get rid of this issue?
try
{
String serviceFile = System.getProperty("PROP", "");
logger.info("service A", "Loading service file [" + serviceFile+ "].");//Security Issue CWE 73 Occurs in this line
}
There are several solutions for it:
Validate with a whitelist but use the input from the entry point As
we mentioned at Use a list of hardcoded values.
Validate with a simple regular expression whitelist
Canonicalise the input and validate the path
I used the first and second solutions and work fine.
More info at: https://community.veracode.com/s/article/how-do-i-fix-cwe-73-external-control-of-file-name-or-path-in-java
this document provides detailed information on recommended remedies any explicit custom solution. In my case, I tried whitelisting or blacklisting patterns in the provided method parameters however that still did not resolve CWE-73 risk. I think thats expected and the document does state this:
A static engine is limited in what it can detect. It can only scan your code, it won’t actually run your code (unlike Dynamic Scanning). So it won’t pick up any custom validation (if conditions, regular expressions, etc.). But that doesn’t mean that this is not a perfectly valid solution that removes the risk!
I tried my custom blacklisted pattern based solution and annotated the method with FilePathCleanser as documented in here and it resolved CWE-73 and veracode no longer shows that as a risk.
Try to validate this PROP string with Recommended OWASP ESAPI Validator methods, like below.
Example:
String PropParam= ESAPI.validator().getValidInput("",System.getProperty("PROP", ""),"FileRegex",false);
FileRegex is the Regular Expression against which PROP string i.e. path of the file is getting validated.
Define FileRegex = ^(.+)\/([^\/]+)$ in Validator.properties.

Is there any method to create radio button in com.ibm.cics.server

The java API for CICS is here. Does anyone know if there any method to put a couple of radio buttons to a web form using this API?
Here's my code to create radio button
HttpRequest req = HttpRequest.getHttpRequestInstance();
String msg = "ZEUSBANK ANTI-FRAUD CHECK BY SHE0008.<br> "
+ "When investigation is complete. Tick the check box and submit.<br>";
String template = "<form><input type=\"radio\"> YES<br><input type=\"radio\"> NO<br></form>";
HttpResponse resp = new HttpResponse();
Document doc = new Document();
doc.createText(msg);
doc.appendFromTemplate(template);
resp.setMediaType("text/plain");
resp.sendDocument(doc, (short)200, "OK", ASCII);
But when I run it on a browser, it print plain text and doesn't convert html tag.
Fixed it, I just change media type from text/plain to text/html and it works.
As you've already discovered, you needed to send the request with the text/html content type.
If you're planning to do more Java web-based work through CICS Java, you might want to investigate the embedded WebSphere Liberty. It adds support for Java EE features, which includes JSF, JSP and Servlets, which can make web development in Java a lot easier.
Tri,
I haven't used CICS for 15 years, so I doubt I'm an expert anymore. But looking quickly at the API, it seems like all the presentation logic would be in your regular Java code. You would then format appropriate messages and invoke the CICS API to update the server & get a response.
There doesn't seem to be any 'BMS-related' methods at all (which is a good thing).
The only 'field' method I see is com.ibm.cics.server.FormField but that only has get() methods, not set().
Are you just starting with Java CICS, or are you just stuck on this particular issue? If you have some sample code of what you are trying, post it so we can see if anyone has any ideas.
HTH, Jim

HtmlUnit to take snapshot of Ajax applications

I create a basic GWT (Google Web Toolkit) Ajax application, and now I'm trying to create snapshots to the crawlers read the page.
I create a Servlet to response the crawlers, using HtmlUnit.
My application runs perfectly when I'm on a browser. But when in HtmlUnit, it throws a lot of errors about the special chars I have in the HTML. But these chars are content, and I wouldn't like to replace it with the special codes, once it's currently working, just because of the HtmlUnit. (at least I should check before if I'm using HtmlUnit correctly )
I think HtmlUnit should read the charset information of the page and render it as a browser, once it's the objective of the project I think.
I haven't found good information about this problem. Is this an HtmlUnit limitation? Do I need to change all the content of my website to use this java library to take snapshots?
Here's my code:
if ((queryString != null) && (queryString.contains("_escaped_fragment_"))) {
// ok its the crawler
// rewrite the URL back to the original #! version
// remember to unescape any %XX characters
url = URLDecoder.decode(url, "UTF-8");
String ajaxURL = url.replace("?_escaped_fragment_=", "#!");
final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
HtmlPage page = webClient.getPage(ajaxURL);
// important! Give the headless browser enough time to execute JavaScript
// The exact time to wait may depend on your application.
webClient.waitForBackgroundJavaScript(3000);
// return the snapshot
response.getWriter().write(page.asXml());
The problem was XML confliting with the HTML. #ColinAlworth comments helped me.
I followed Google example, and there was not working.
To it work, you need to remove XML tags and let just the HTML be responded, changing the line:
// return the snapshot
response.getWriter().write(page.asXml());
to
response.getWriter().write(page.asXml().replaceFirst("<\\?.*>",""));
Now it's rendering.
But although it is being rendered, the CSS is ot working, and the DOM is not updated (GWT updates page title when page opens). HTMLUnit throwed a lot of errors about CSS, and I'm using twitter bootstrap without any changes. Apparently, HtmlUnit project have a lot of bugs, good for small tests, but not to parse complex (or even simple) HTMLs.

Is there a decent, customisable, HTML to Markdown Java API?

I want to save text I scrape from various sources without the HTML tags that are on it, but also keeping as much of the structure as I reasonably can.
Markdown seems to be the solution to this (or possibly MultiMarkdown).
There is a question which offers a suggestion on converting from HTML to markdown, but I want to specify some specific things:
ALL links (including images) are referenced at the END only (i.e. no inline urls)
NO embeded HTML (I'm not even 100% sure yet how I'd like to deal with difficult HTML... but it won't be embeded!)
So my question is as stated in the title: Is there a decent, customisable, HTML to Markdown Java API?
You could try adapting HtmlCleaner which provides a workable interface onto the DOM:
TagNode root = htmlCleaner.clean( stream );
Object[] found = root.evaluateXPath( "//div[id='something']" );
if( found.length > 0 && found instanceof TagNode ) {
((TagNode)found[0]).removeFromTree();
}
This would allow you to structure your output stream in any format that you want using a fairly simple API.
There is a great library for JS called Turndown, you can try it online here. It can be partially customized. For example, links can be referenced at the end. And as far as I know there is no embedded html, everything is transformed.
I needed it for Java (as the linked question), so I ported it. The library for Java is called CopyDown, it has the same test suite as Turndown.
To install with gradle:
dependencies {
compile 'io.github.furstenheim:copy_down:1.0'
}
Then to use it:
CopyDown converter = new CopyDown();
String myHtml = "<h1>Some title</h1><div>Some html<p>Another paragraph</p></div>";
String markdown = converter.convert(myHtml);
System.out.println(markdown);
> Some title\n==========\n\nSome html\n\nAnother paragraph\n

best way to externalize HTML in GWT apps?

What's the best way to externalize large quantities of HTML in a GWT app? We have a rather complicated GWT app of about 30 "pages"; each page has a sort of guide at the bottom that is several paragraphs of HTML markup. I'd like to externalize the HTML so that it can remain as "unescaped" as possible.
I know and understand how to use property files in GWT; that's certainly better than embedding the content in Java classes, but still kind of ugly for HTML (you need to backslashify everything, as well as escape quotes, etc.)
Normally this is the kind of thing you would put in a JSP, but I don't see any equivalent to that in GWT. I'm considering just writing a widget that will simply fetch the content from html files on the server and then add the text to an HTML widget. But it seems there ought to be a simpler way.
I've used ClientBundle in a similar setting. I've created a package my.resources and put my HTML document and the following class there:
package my.resources;
import com.google.gwt.core.client.GWT;
import com.google.gwt.resources.client.ClientBundle;
import com.google.gwt.resources.client.TextResource;
public interface MyHtmlResources extends ClientBundle {
public static final MyHtmlResources INSTANCE = GWT.create(MyHtmlResources.class);
#Source("intro.html")
public TextResource getIntroHtml();
}
Then I get the content of that file by calling the following from my GWT client code:
HTML htmlPanel = new HTML();
String html = MyHtmlResources.INSTANCE.getIntroHtml().getText();
htmlPanel.setHTML(html);
See http://code.google.com/webtoolkit/doc/latest/DevGuideClientBundle.html for further information.
You can use some templating mechanism. Try FreeMarker or Velocity templates. You'll be having your HTML in files that will be retrieved by templating libraries. These files can be named with proper extensions, e.g. .html, .css, .js obsearvable on their own.
I'd say you load the external html through a Frame.
Frame frame = new Frame();
frame.setUrl(GWT.getModuleBase() + getCurrentPageHelp());
add(frame);
You can arrange some convention or lookup for the getCurrentPageHelp() to return the appropriate path (eg: /manuals/myPage/help.html)
Here's an example of frame in action.
In GWT 2.0, you can do this using the UiBinder.
<ui:UiBinder xmlns:ui='urn:ui:com.google.gwt.uibinder'>
<div>
Hello, <span ui:field='nameSpan’/>, this is just good ‘ol HTML.
</div>
</ui:UiBinder>
These files are kept separate from your Java code and can be edited as HTML. They are also provide integration with GWT widgets, so that you can easily access elements within the HTML from your GWT code.
GWT 2.0, when released, should have a ClientBundle, which probably tackles this need.
You could try implementing a Generator to load external HTML from a file at compile time and build a class that emits it. There doesn't seem to be too much help online for creating generators but here's a post to the GWT group that might get you started: GWT group on groups.google.com.
I was doing similar research and, so far, I see that the best way to approach this problem is via the DeclarativeUI or UriBind. Unfortunately it still in incubator, so we need to work around the problem.
I solve it in couple of different ways:
Active overlay, i.e.: you create your standard HTML/CSS and inject the GET code via <script> tag. Everywhere you need to access an element from GWT code you write something like this:
RootPanel.get("element-name").setVisible(false);
You write your code 100% GWT and then, if a big HTML chunk is needed, you bring it to the client either via IFRAME or via AJAX and then inject it via HTML panel like this:
String html = "<div id='one' "
+ "style='border:3px dotted blue;'>"
+ "</div><div id='two' "
+ "style='border:3px dotted green;'"
+ "></div>";
HTMLPanel panel = new HTMLPanel(html);
panel.setSize("200px", "120px");
panel.addStyleName("demo-panel");
panel.add(new Button("Do Nothing"), "one");
panel.add(new TextBox(), "two");
RootPanel.get("demo").add(panel);
Why not to use good-old IFRAME? Just create an iFrame where you wish to put a hint and change its location when GWT 'page' changes.
Advantages:
Hits are stored in separate maintainable HTML files of any structure
AJAX-style loading with no coding at all on server side
If needed, application could still interact with loaded info
Disadvantages:
Each hint file should have link to shared CSS for common look-and-feel
Hard to internationalize
To make this approach a bit better, you might handle loading errors and redirect to default language/topic on 404 errors. So, search priority will be like that:
Current topic for current language
Current topic for default language
Default topic for current language
Default error page
I think it's quite easy to create such GWT component to incorporate iFrame interactions
The GWT Portlets framework (http://code.google.com/p/gwtportlets/) includes a WebAppContentPortlet. This serves up any content from your web app (static HTML, JSPs etc.). You can put it on a page with additional functionality in other Portlets and everything is fetched with a single async call when the page loads.
Have a look at the source for WebAppContentPortlet and WebAppContentDataProvider to see how it is done or try using the framework itself. Here are the relevant bits of source:
WebAppContentPortlet (client side)
((HasHTML)getWidget()).setHTML(html == null ? "<i>Web App Content</i>" : html);
WebAppContentDataProvider (server side):
HttpServletRequest servletRequest = req.getServletRequest();
String path = f.path.startsWith("/") ? f.path : "/" + f.path;
RequestDispatcher rd = servletRequest.getRequestDispatcher(path);
BufferedResponse res = new BufferedResponse(req.getServletResponse());
try {
rd.include(servletRequest, res);
res.getWriter().flush();
f.html = new String(res.toByteArray(), res.getCharacterEncoding());
} catch (Exception e) {
log.error("Error including '" + path + "': " + e, e);
f.html = "Error including '" + path +
"'<br>(see server log for details)";
}
You can use servlets with jsps for the html parts of the page and still include the javascript needed to run the gwt app on the page.
I'm not sure I understand your question, but I'm going to assume you've factored out this common summary into it's own widget. If so, the problem is that you don't like the ugly way of embedding HTML into the Java code.
GWT 2.0 has UiBinder, which allows you to define the GUI in raw HTMLish template, and you can inject values into the template from the Java world. Read through the dev guide and it gives a pretty good outline.
Take a look at
http://code.google.com/intl/es-ES/webtoolkit/doc/latest/DevGuideClientBundle.html
You can try GWT App with html templates generated and binded on run-time, no compiling-time.
Not knowing GWT, but can't you define and anchor div tag in your app html then perform a get against the HTML files that you need, and append to the div? How different would this be from a micro-template?
UPDATE:
I just found this nice jQuery plugin in an answer to another StackOverflow question.

Categories

Resources