Should I use URL rewriting to protect against XSS - java

Let's say someone enters the following URL in their browser:
http://www.mywebsite.com/<script>alert(1)</script>
The page is displayed as normal with the alert popup as well. I think this should result in a 404, but how do I best achieve that?
My webapp is running on a Tomcat 7 server. Modern browsers will automatically protect against this, but older ones (I am looking at you, IE6) won't.

It sounds like you are actually getting a 404 page, but that page includes the resource (in this case a piece of JavaScript code) and doesn't do any converting of < and > to their respective HTML entities. I've seen this happen on several websites.
The solution would be to create a custom 404 page which doesn't echo back the resource to the page, or that does proper HTML entity conversion beforehand. There are plenty of tutorials you can find through Google which should help you do this.
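A minimal sketch of the escaping such a custom 404 page needs, written in plain Java so it runs without a container. The class and method names are made up for illustration. In a Tomcat webapp the page itself would be registered with an <error-page> entry for error code 404 in web.xml, and the originally requested path is available from the javax.servlet.error.request_uri request attribute. The point is that this value goes through escapeHtml before it is written into the page.

```java
final class NotFoundPage {

    // Replace the characters that are significant in HTML text and attributes
    // with their entities, so echoed input is displayed instead of executed.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Body of the 404 page: the requested path is escaped, never echoed raw.
    static String render(String requestedPath) {
        return "<html><body><h1>404 Not Found</h1><p>No page at "
                + escapeHtml(requestedPath) + "</p></body></html>";
    }

    public static void main(String[] args) {
        System.out.println(render("/<script>alert(1)</script>"));
    }
}
```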

Here's what I did:
Created a high-level servlet filter that uses OWASP's HTML sanitizer to check for dodgy characters. If there are any, I redirect to our 404 page.
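A rough sketch of the check such a filter could make. Note that it swaps the OWASP HTML sanitizer for a plain character test (the characters in DODGY are an assumption about what never appears in your own URLs), and only the decision logic is shown so the code runs without the Servlet API. SuspiciousPathCheck and its methods are hypothetical names.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class SuspiciousPathCheck {

    // Characters needed for markup injection; assumed never to appear in this app's own URLs.
    private static final String DODGY = "<>\"'";

    // True if the raw (still percent-encoded) request URI contains a dodgy character,
    // either literally or after decoding. Malformed percent-encoding is rejected too.
    static boolean isSuspicious(String rawUri) {
        if (containsDodgy(rawUri)) {
            return true;
        }
        try {
            return containsDodgy(URLDecoder.decode(rawUri, StandardCharsets.UTF_8));
        } catch (IllegalArgumentException malformed) {
            return true;
        }
    }

    private static boolean containsDodgy(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (DODGY.indexOf(s.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // In a javax.servlet.Filter you would pass request.getRequestURI() here and,
        // when it returns true, call response.sendError(404) instead of chain.doFilter(...).
        System.out.println(isSuspicious("/%3Cscript%3Ealert(1)%3C/script%3E")); // true
        System.out.println(isSuspicious("/products/42"));                         // false
    }
}
```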

You should put a filter in your webapp to protect against an XSS attack.
Get all the parameters from the HttpServletRequest object and, in the filter code, replace any parameter value that starts with a script tag (such as <script>) with spaces.
This way any harmful JS script won't reach your server-side components.
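A minimal sketch of the scrubbing step, assuming "harmful" means a value that starts with a script tag as described above. It works on a Map<String, String[]> like the one HttpServletRequest.getParameterMap() returns; in a real filter you would hand the scrubbed values to the rest of the chain through an HttpServletRequestWrapper that overrides getParameter, getParameterValues and getParameterMap. ParameterScrubber and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class ParameterScrubber {

    // Blacklist test: a value starting with a script tag (ignoring leading
    // whitespace and letter case) is treated as harmful.
    static boolean looksLikeScript(String value) {
        return value.stripLeading().toLowerCase(Locale.ROOT).startsWith("<script");
    }

    // Returns a copy of the parameter map in which every harmful value is replaced
    // by spaces of the same length; the input map and its arrays are left untouched.
    static Map<String, String[]> scrub(Map<String, String[]> params) {
        Map<String, String[]> out = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> e : params.entrySet()) {
            String[] values = e.getValue().clone();
            for (int i = 0; i < values.length; i++) {
                if (looksLikeScript(values[i])) {
                    values[i] = " ".repeat(values[i].length());
                }
            }
            out.put(e.getKey(), values);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String[]> params = new LinkedHashMap<>();
        params.put("q", new String[] {"<script>alert(1)</script>"});
        params.put("page", new String[] {"2"});
        scrub(params).forEach((k, v) -> System.out.println(k + " -> [" + String.join(",", v) + "]"));
    }
}
```

Because this is a blacklist, it only catches the exact pattern it looks for (an onerror attribute on an <img> tag, for example, passes straight through), so escaping on output, as suggested for the 404 page, is the more robust of the two.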

Related

How to enable Nutch follow http redirect?

During my crawl, a page redirects to a 404 error, but when I use the "readdb" command, the status of the page is still 302 instead of 404.
Then I looked at the configuration file and found the option "http.redirect.max". I have already set "http.redirect.max" to 3 and recrawled the page, but its status is still 302.
After reading the source code, I found this line:
Response response = getResponse(u, datum, false);
in the method "getProtocolOutput" of HttpBase.java. After I changed "false" to "true" and recompiled Nutch, redirects were followed.
So is this a correct way of enabling Nutch to follow redirects? Could this modification lead to other errors while crawling?
I think that in this case Nutch is working properly: http.redirect.max controls whether the redirect is followed immediately or queued for the next round.
If you crawl one URL (A) that redirects to a 404, the first URL still exists with a 30x status; it's the second URL (B) that has the 404 response. From Nutch's side, these are two different URLs (which makes sense).
I haven't tested your change (with other scenarios), but consider a similar case: page A redirects to a different page C (which is not a 404). Would you expect the content of page C to be linked to the URL of A, ignoring the URL of C entirely?
In the browser, this is usually how we perceive it, but underneath there are two different requests/URLs.
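To see the "two URLs" behaviour outside Nutch, here is a small sketch using the JDK's HttpURLConnection against a throwaway local server (com.sun.net.httpserver, which ships with the JDK); it does not use any Nutch API. The paths /a and /b are hypothetical stand-ins for pages A and B: asking for A without following redirects reports 302, just as readdb does, while following the redirect reports B's 404.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;

final class RedirectDemo {

    // Status code for a URL; with follow=false the 30x of the first URL is reported.
    static int status(URL url, boolean follow) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setInstanceFollowRedirects(follow);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Throwaway local server: /a answers 302 pointing at /b, and /b answers 404.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/a", ex -> {
            String target = "http://127.0.0.1:" + ex.getLocalAddress().getPort() + "/b";
            ex.getResponseHeaders().add("Location", target);
            ex.sendResponseHeaders(302, -1);
            ex.close();
        });
        server.createContext("/b", ex -> {
            ex.sendResponseHeaders(404, -1);
            ex.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            URL a = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/a").toURL();
            System.out.println("A, redirect not followed: " + status(a, false)); // 302
            System.out.println("A, redirect followed:     " + status(a, true));  // 404, B's status
        } finally {
            server.stop(0);
        }
    }
}
```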

How to check if a webpage is static or dynamic

I'm doing some web scraping and using Jsoup to parse html files and my understanding is that Jsoup doesn't work well with dynamic web pages. Is there a way to check if a web page is dynamic so that I don't bother attempting to parse it using Jsoup?
Short answer: not really. You need to check case by case.
Explanation:
Today's websites are full of ajax calls. Many load important data; others are only marginally interesting when you scrape a site's content. Many very modern sites even do both: they send a completely rendered page to the client, where it gets transformed into a web app (keyword: isomorphic rendering).
So you need to check the site in question case by case. It is not that hard, though: just fire up curl and see if you get the content you need. If not, it is often not that hard to understand the structure and parameters of the ajax calls either. If you do this, you can often get even the dynamic content with Jsoup alone.
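The curl test can also be done from Java. This sketch, assuming Java 11+ for java.net.http.HttpClient, fetches the raw HTML without running any JavaScript (which is also all Jsoup sees) and checks whether a piece of text you expect on the page is in it. The class name and the default URL and text in main are only examples.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class StaticContentCheck {

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // Fetches the HTML exactly as the server sends it (no JavaScript is executed,
    // like curl) and reports whether the expected text is already in it.
    static boolean rawHtmlContains(URI page, String expectedText)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(page).GET().build();
        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body().contains(expectedText);
    }

    public static void main(String[] args) throws Exception {
        URI page = URI.create(args.length > 0 ? args[0] : "https://example.com/");
        String text = args.length > 1 ? args[1] : "Example Domain";
        System.out.println(rawHtmlContains(page, text)
                ? "Found in the raw HTML: plain Jsoup should be enough"
                : "Not in the raw HTML: probably loaded by JavaScript/ajax");
    }
}
```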
You cannot be 100% sure whether a website is dynamic or static, because there are ways to hide the clues that show a website is dynamic. But you can check a limited number of HTTP headers to test whether it's dynamic or static:
Cookie : An HTTP cookie previously sent by the server with Set-Cookie (in the server's response, look for the Set-Cookie header itself)
X-Csrf-Token : Used to prevent cross-site request forgery. Alternative header names are: X-CSRFToken and X-XSRF-TOKEN
X-Powered-By : specifies the technology (e.g. ASP.NET, PHP, JBoss) supporting the web application (version details are often in X-Runtime, X-Version, or X-AspNet-Version)
As far as I know, these are three HTTP headers whose presence indicates that server-side scripting is involved in generating the page.
Also, chances are that a webpage with form-related elements has a server-side mechanism to process the form data.
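A hedged sketch of the header check, taking the header map returned by HttpURLConnection.getHeaderFields(). Because Cookie is sent by the client, the sketch looks for Set-Cookie in the server's response instead; the header names are the ones listed above, and a match is only a hint, not proof. DynamicHints and hintsIn are hypothetical names.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class DynamicHints {

    // Lower-cased response headers treated as signs of server-side code.
    private static final Set<String> HINTS = Set.of(
            "set-cookie", "x-csrf-token", "x-csrftoken", "x-xsrf-token",
            "x-powered-by", "x-runtime", "x-version", "x-aspnet-version");

    // Returns the hint headers present in the map. getHeaderFields() stores the
    // status line under a null key, which is skipped here.
    static Set<String> hintsIn(Map<String, List<String>> headers) {
        Set<String> found = new TreeSet<>();
        for (String name : headers.keySet()) {
            if (name != null && HINTS.contains(name.toLowerCase(Locale.ROOT))) {
                found.add(name.toLowerCase(Locale.ROOT));
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        URL url = URI.create(args.length > 0 ? args[0] : "https://example.com/").toURL();
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            System.out.println("Dynamic hints: " + hintsIn(conn.getHeaderFields()));
        } finally {
            conn.disconnect();
        }
    }
}
```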

How to redirect to a page in WebCenter Sites

Is there any way to redirect to a page/template using WebCenter Sites tags, or do we need to depend on the standard J2EE response.sendRedirect() method?
If you're using a JSP wrapper, then you can't really do this since JSPs start sending the response headers too early. You'll have to render an HTML page with the meta redirect tag.
If your wrapper is XML or Groovy, then you can do this using WebCenter Sites APIs. There's a Groovy example here.
Redirecting a request is tricky in Oracle WebCenter Sites. The response.sendRedirect code doesn't work in a Sites JSP, because the response headers are committed early in the page evaluation, so we cannot set the return status code in a Sites JSP.
We can instead handle this on the client side immediately after the page loads: in JavaScript, check the condition and forward to the respective page/URL, and return that JavaScript code as the response from the Sites JSP page. The full solution is described here:
http://devble.com/forward-and-redirect-request-in-webcenter-sites/
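The JavaScript referred to in the last answer is not reproduced here. As a stand-in, this sketch builds the kind of page both answers describe: a script that calls window.location.replace, with a meta refresh tag as a fallback. A Sites JSP would write this string as its whole response; the class and method names are hypothetical, and the URL is escaped separately for the HTML attribute and for the JavaScript string.

```java
final class ClientSideRedirect {

    // Escape for use inside a double-quoted HTML attribute.
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;")
                .replace("<", "&lt;").replace(">", "&gt;");
    }

    // Escape for use inside a single-quoted JavaScript string in a script block;
    // '<' becomes a hex escape so "</script>" cannot close the block early.
    static String escapeJs(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '\'': out.append("\\'"); break;
                case '<': out.append("\\x3c"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // A page that redirects as soon as it loads: JavaScript first, the meta
    // refresh as a fallback, and a plain link for anything else.
    static String page(String targetUrl) {
        return "<!DOCTYPE html><html><head>"
                + "<meta http-equiv=\"refresh\" content=\"0; url=" + escapeAttr(targetUrl) + "\">"
                + "<script>window.location.replace('" + escapeJs(targetUrl) + "');</script>"
                + "</head><body><a href=\"" + escapeAttr(targetUrl) + "\">Continue</a></body></html>";
    }

    public static void main(String[] args) {
        System.out.println(page("https://www.example.com/target?a=1&b=2"));
    }
}
```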

Crawl contents loaded by ajax

Nowadays many websites contain some content loaded by ajax (e.g., comments on some video websites). Normally we can't crawl this data; what we get is just some JS source code. So here is the question: in what ways can we execute the JavaScript code after we get the HTML response and get to the final page we want?
I know that HtmlUnit can execute background JS, but it has many bugs and errors. Are there any other tools that can help me with this?
Some people tell me that I can crawl the ajax request URL, analyze its parameters, and send the request again to get the data. If the approach I mentioned above doesn't work out, can anyone tell me how to extract the ajax URL and send the request in the correct format?
By the way, a Java solution would be best.
Yes, Netwoof can crawl Ajax easily. Its API and bot builder let you do it without a line of code.
That's the great thing about HTTP: you don't even need Java. My go-to tool for debugging AJAX is the Chrome extension Postman. I start by looking at the request in the Chrome debugger and identifying the salient bits (URL, form-encoded params, etc.).
Then it can be as simple as opening a tab and launching requests at the server with Postman. As long as it's all in the same browser context, all of your cookies (for authentication, etc.) will be sent along too.
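Once you have found the ajax call in the browser's network panel, replaying it from Java is usually just one HTTP request. This sketch assumes Java 11+ (java.net.http.HttpClient); the endpoint, parameters and cookie value are whatever you copied from the browser, and the X-Requested-With header is only needed if the server checks for it. AjaxReplay and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

final class AjaxReplay {

    // Builds "k1=v1&k2=v2" with every key and value URL-encoded, in map iteration order.
    static String query(Map<String, String> params) {
        StringJoiner q = new StringJoiner("&");
        params.forEach((k, v) -> q.add(URLEncoder.encode(k, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return q.toString();
    }

    // Re-sends the XHR seen in the browser: same endpoint and parameters, the
    // X-Requested-With header some servers check, and the session cookie.
    static String fetch(String endpoint, Map<String, String> params, String cookieHeader)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint + "?" + query(params)))
                .header("X-Requested-With", "XMLHttpRequest")
                .header("Accept", "application/json")
                .header("Cookie", cookieHeader)
                .GET()
                .build();
        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint and values; replace them with what the network panel shows.
        System.out.println(fetch("https://www.example.com/api/comments",
                Map.of("videoId", "42"), "session=copied-from-browser"));
    }
}
```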

Redirect from docroot to an external URL in GlassFish

I've googled around and only found solutions that suggest putting an Apache httpd in front of GlassFish. Sure, that works.
But what if I do not wish to, or cannot, put anything in front of GlassFish?
That is, without using an index.jsp in the docroot of the domain containing something like:
<%
String redirectURL = "https://stackoverflow.com/";
response.sendRedirect(redirectURL);
%>
Can I make the browser redirect when I point it to http://my.glassfish.domain/ ?
To provide a bit more detail:
I tried adding a property to the virtual server:
redirect_1 from=/ url=https://stackoverflow.com/
But that makes everything redirect to https://stackoverflow.com/, e.g. http://my.glassfish.domain/myapp redirects to https://stackoverflow.com/, while all I want is for http://my.glassfish.domain/ to be redirected to https://stackoverflow.com/.
Any help please?
Maybe you can use UrlRewriteFilter to redirect users according to defined mappings. Here are some examples
I think the solution you dismiss is actually the 'best'...
Write a JSP in the docroot for the server.
If you really have to do something fancier, due to complications that you haven't described, you may want to try creating a new DefaultServlet. Look in your domain-dir/config/default-web.xml.
You may want to look at the code of the DefaultServlet that ships with GlassFish Server 3 as a guide.
Modify the DNS mapping for the given URL in your DNS server (/etc/hosts on your local machine). This may not be a feasible solution for you, but it does the job of directing the user.
No, you cannot. When a request comes to your server, there must be a page (HTML/JSP/Servlet) to process that request. That page should do whatever you want it to do.
So you must create an HTML page, a JSP, or a servlet.
Hope this helps.
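A sketch of the exact-match idea behind the problem with redirect_1, using the JDK's built-in com.sun.net.httpserver as a stand-in for GlassFish (it is not GlassFish configuration). A context registered at "/" receives every path, much like the redirect property did, so the handler compares the path with "/" exactly and only then sends the redirect; any other path gets a 404 here. A servlet or JSP in the docroot would apply the same check. RootOnlyRedirect is a hypothetical name.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;

final class RootOnlyRedirect {

    static final String TARGET = "https://stackoverflow.com/";

    // The "/" context matches every path by prefix, so only the exact root is
    // redirected; everything else (e.g. /myapp) gets a plain 404 in this demo.
    static void handle(HttpExchange ex) throws IOException {
        String path = ex.getRequestURI().getPath();
        if ("/".equals(path)) {
            ex.getResponseHeaders().add("Location", TARGET);
            ex.sendResponseHeaders(302, -1);
        } else {
            ex.sendResponseHeaders(404, -1);
        }
        ex.close();
    }

    // Starts the server on the given port (0 picks a free one).
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/", RootOnlyRedirect::handle);
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080);
        System.out.println("Try http://127.0.0.1:8080/ and http://127.0.0.1:8080/myapp");
    }
}
```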
