href in HTML and forward slash (/) and %2F? - java

I am generating an HTML page that lists files in a directory in Java. Each file name is placed in a link to the file. Since the directory names and file names contain non-ASCII characters and spaces, I used the following method to encode them.
URLEncoder.encode(str, "UTF-8").replace("+", "%20");
I can either put the full directory path in the href of <base> or prepend it to the href of each <a>. The method above converts / to %2F, but I have seen hrefs that contain /.
So, should I:
1. replace / with %2F,
2. leave / as it is, or
3. does it not matter whether it is / or %2F?
If the answer is 2, what Java method should I use instead of URLEncoder.encode(), since it replaces / with %2F?
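In an href, / separates path segments, while %2F is generally treated as a literal slash inside a single segment, so the two are not guaranteed to be interchangeable. One workaround for option 2 (a sketch, not the only option; the multi-argument java.net.URI constructors are another) is to split the path on / and encode each segment separately, reusing the URLEncoder.encode(...).replace("+", "%20") trick from the question. The helper name encodePath is hypothetical, and the URLEncoder.encode(String, Charset) overload assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class PathEncoder {

    // Hypothetical helper: percent-encodes every segment of a path but keeps '/' as the separator.
    static String encodePath(String path) {
        StringJoiner joined = new StringJoiner("/");
        // limit -1 keeps empty segments, so leading and trailing slashes survive
        for (String segment : path.split("/", -1)) {
            // URLEncoder targets form data: a literal '+' becomes %2B first,
            // so replacing the remaining '+' only affects encoded spaces
            joined.add(URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20"));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // "my docs/résumé 2.pdf" written with escapes to keep the source ASCII-only
        System.out.println(encodePath("my docs/r\u00e9sum\u00e9 2.pdf"));
        // prints my%20docs/r%C3%A9sum%C3%A9%202.pdf
    }
}
```

The resulting string can go into either <base href> or each <a href>, since the slashes stay intact.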

Related

Dynamic file path in assets reverse routing

I am trying to use reverse routing to access static Assets using:
@routes.Assets.at("path", "file")
However, I would like the file to be a dynamic part as well, like:
@for(c <- models.WebContent.find.all) {
<img src="@routes.Assets.at("/contentfiles/useruploads", "@c.picture1")">
}
But the statement above results in this HTML:
<img src="/contentfiles/userupload/@c.picture1">
As you can see, the dynamic part @c.picture1 is not interpreted as a dynamic file name but is output as raw text, resulting in a broken link. What I expect is that both parts are evaluated dynamically, resulting in e.g.:
<img src="/contentfiles/userupload/1776446515.jpg">
How can I write it so that both dynamic parts are evaluated?
PS: I have tried to escape it as @@c.picture or $@c.picture with no luck.
Thank you
When passing a variable as a function argument, use it without the @ character and without quotes; otherwise, as you can see, it is treated as a plain String:
<img src="@routes.Assets.at("/contentfiles/useruploads", c.picture1)">
The same applies to conditions. Use:
@if(foo==bar){...}
NOT
@if(@foo==@bar){...}

How to convert url encoded string to plain text in JAVA

I am looking for HTML references in an HTML page I am retrieving from the server. The problem is that for all the hyperlinks I retrieve, the text I get is URL encoded. Let's say the URL is "http://abc.def.com/gh?ij=x&kl=y&mn=z"; my program parses it as "http://abc.def.com/gh?ij=3Dx&amp;kl=3Dy&amp;mn=3Dz" (look at the difference around "=" and "&" in the two URLs). Some searching on the Web tells me that the second URL is a URL encoded form of the first URL.
What should I do to retrieve the actual URL as it is, and not its URL encoded version? Right now, I am replacing =3D with = and &amp; with &, but that is a very bad hack.
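Try using java.net.URLDecoder, which decodes percent-encoded (%XX) strings. Note, though, that =3D is not percent encoding: it looks like quoted-printable encoding (as used in MIME email, where = followed by two hex digits encodes one byte, and 0x3D is "="), and &amp; is an HTML entity, so URLDecoder alone will not undo either of them.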
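A sketch of both decoders, assuming the =3D text really is quoted-printable and that &amp; only needs to become &. The class and method names are hypothetical, the quoted-printable decoder is deliberately minimal (hex escapes and soft line breaks only), and URLDecoder.decode(String, Charset) assumes Java 10 or later.

```java
import java.io.ByteArrayOutputStream;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class UrlTextDecoding {

    // Percent decoding, which is what URLDecoder handles: %3D -> '=', %26 -> '&', '+' -> ' '.
    static String percentDecode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    // Minimal quoted-printable decoder: "=XX" hex escapes become bytes,
    // and "=" at the end of a line (a soft line break) is dropped.
    static String quotedPrintableDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] in = s.getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i < in.length; i++) {
            byte b = in[i];
            if (b != '=') {
                out.write(b);
            } else if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 1; // soft line break "=\n"
            } else if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 2; // soft line break "=\r\n"
            } else if (i + 2 < in.length
                    && Character.digit(in[i + 1], 16) >= 0
                    && Character.digit(in[i + 2], 16) >= 0) {
                out.write(Character.digit(in[i + 1], 16) * 16 + Character.digit(in[i + 2], 16));
                i += 2;
            } else {
                out.write(b); // malformed escape: keep the '=' as-is
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Hypothetical cleanup for the scraped link: undo quoted-printable, then the HTML entity.
    static String cleanScrapedUrl(String s) {
        return quotedPrintableDecode(s).replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(cleanScrapedUrl("http://abc.def.com/gh?ij=3Dx&amp;kl=3Dy&amp;mn=3Dz"));
        // prints http://abc.def.com/gh?ij=x&kl=y&mn=z
        System.out.println(percentDecode("ij%3Dx%26kl%3Dy"));
        // prints ij=x&kl=y
    }
}
```

If the page is really a MIME message, a full mail library is a sturdier choice than this hand-rolled decoder.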

Print only absolute URLs

I wrote a simple Java Web Crawler that lets the user type in any web page and it will search through the page and pull out the links as Strings. I am not using a package like Jsoup. My question is, how do I only print the absolute URLs rather than both relative and absolute URLs?
Inspect the src or href attribute to see whether it is absolute, relative, or protocol-relative (//stackoverflow.com/file). Parse the page's URL. If the attribute is protocol-relative, take the protocol from the parsed page URL and then append the attribute's content. If it is relative, strip the query string and fragment (if any) from the page URL and "append" the relative portion: a root-relative URL such as /foo replaces the whole path, while foo replaces only the last path segment. Be aware that a relative URL can look like /foo, foo, foo/bar, or ./../../bar/../foo, so you might want to resolve path traversals before printing.
Edit:
Take a look at URL and the Commons URL Builder. They'll both be helpful.
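The steps above map fairly directly onto java.net.URI, which can both test whether an href is absolute and resolve relative or protocol-relative hrefs (including ./ and ../ traversals) against the page URL. This is a sketch: the class and method names are hypothetical, the page URL is an example, and hrefs that URI cannot parse (for instance ones containing raw spaces) are simply skipped.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;

public class AbsoluteLinks {

    // True if the href already has a scheme such as "https:", i.e. it is absolute.
    // Protocol-relative hrefs ("//host/path") have no scheme, so they return false.
    static boolean isAbsolute(String href) {
        try {
            return new URI(href.trim()).isAbsolute();
        } catch (URISyntaxException e) {
            return false; // unparseable hrefs are treated as not absolute
        }
    }

    // Resolves any href against the page URL; resolve() drops the page's query and
    // fragment and normalizes "." and ".." segments. Returns null if parsing fails.
    static String toAbsolute(String pageUrl, String href) {
        try {
            return new URI(pageUrl).resolve(new URI(href.trim())).normalize().toString();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String page = "https://example.com/a/b/page.html?q=1#top";
        List<String> hrefs = List.of("https://stackoverflow.com/questions",
                "//stackoverflow.com/file", "/foo", "foo/bar", "./../../bar/../foo");

        // Option 1: print only the hrefs that are already absolute
        for (String href : hrefs) {
            if (isAbsolute(href)) {
                System.out.println(href);
            }
        }
        // Option 2: convert every href to an absolute URL and print that instead
        for (String href : hrefs) {
            System.out.println(toAbsolute(page, href));
        }
    }
}
```

Option 2 is usually what a crawler needs, because relative links still point at pages worth visiting.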

Getting wrong characters in parameter

In files.jsp I am using the following anchor and JSTL <c:url> combination:
<c:url value="downloadfile.jsp" var="dwnUrl" scope="request">
<c:param name="fileType" value="PDF"/>
<c:param name="fileId" value="${file.fileId}"/>
<c:param name="fileName" value="${file.fileName}"/>
</c:url>
Download
In downloadfile.jsp, I read the file name into a JavaScript variable as follows:
selectedFile = <c:out value='${param.fileName}'/>
Now, if the file name contains a special character, e.g. XYZ 2/3" Technical, then on the other page I get a different character instead: XYZ 2/3&#034; Technical
However, if I print request.getParameter("fileName"), it gives the correct name. What is wrong?
By default, <c:out> escapes XML special characters such as the double quote (here " becomes &#034;). This is done to produce well-formed XML and to avoid XSS.
To fix this, either get rid of <c:out>, since as of JSP 2.0 EL works fine in template text as well:
selectedFile = '${param.fileName}';
...or, if you are still on legacy JSP 1.2 or older, set its escapeXml attribute to false:
selectedFile = '<c:out value="${param.fileName}" escapeXml="false"/>';
Note that I have added the singlequotes and semicolon to make JS code valid.
Needless to say, you'll need to keep XSS risks in mind if you do so.
The funky characters in your <c:param> values are being URL encoded by <c:url>, as they should be. As far as downloadfile.jsp is concerned, the servlet container takes care of URL decoding incoming request parameters, so you don't have to. This is normal behavior and shouldn't pose any problems for you.
If you simply set escapeXml to false as @BalusC suggests, you will add an XSS vulnerability to your page. Instead, you should encode the user input at the point where it is injected into the destination language, escaping any characters that would be evaluated in that language. In this case, if the user input contained a single quote character (I'm assuming the string literal in your original example was supposed to be wrapped in single quotes, but the same would be true for double quotes), any JavaScript code that followed it would be interpreted and executed by the browser. To safely do what you are trying to do, change the line in downloadfile.jsp to:
selectedFile = '${fn:replace(param.fileName, "'", "\\'")}';
That will escape only single quotes, which would otherwise end the string literal. Note that \ is itself an escape character inside EL string literals, so the replacement has to be written as "\\'" to produce \'; a plain "\'" evaluates to just ' and changes nothing. The fn: prefix also requires the JSTL functions taglib to be declared on the page, and a backslash in the input would need escaping too, since it could otherwise escape the closing quote.
If you were using double quotes, then this would be appropriate:
selectedFile = "${fn:replace(param.fileName, '"', '\\"')}";
It is worth noting that escapeXml could be appropriate for escaping JavaScript string literals (and it often is) when the string literal will eventually be dumped into HTML markup. However, in this case, the value should not be XML escaped as it is evaluated in the context of a file path, rather than in the context of HTML.
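If the escaping is done server-side instead (for example in a servlet, before the value is exposed to the JSP), a small helper can cover quotes, backslashes and line breaks in one pass. This is a sketch with a hypothetical name, escapeJsString; a maintained library such as the OWASP Java Encoder (Encode.forJavaScript) is a sturdier choice in production.

```java
public class JsEscape {

    // Escapes a value for use inside a single- or double-quoted JavaScript string literal.
    // '<', '>', '&' and control characters are emitted as Unicode escape sequences,
    // so the result also cannot close an enclosing HTML script element.
    static String escapeJsString(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                // U+2028 and U+2029 terminate string literals in older JavaScript engines
                case '<': case '>': case '&': case '\u2028': case '\u2029':
                    sb.append(String.format("\\u%04x", (int) c));
                    break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeJsString("XYZ 2/3\" Technical"));
        // prints XYZ 2/3\" Technical
    }
}
```

The escaped value can then be placed inside either kind of quotes in the page without any further <c:out> or fn:replace processing.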

Show Images with name containing special characters

I am trying to display, using JSP, some images whose names contain special characters such as ☻ ☺ ♥ or Chinese or Arabic characters, but the images are not displayed!
<img src = "pipo².jpg" />
<img src = "pip☺☻♥o².jpg" />
What am I doing wrong?
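Try encoding the file name with URLEncoder.encode() before the HTML is sent to the page, e.g.
String encodedString = URLEncoder.encode(filename, "UTF-8").replace("+", "%20");
This converts the characters to percent-encoded UTF-8 bytes (%XX sequences), which can be used safely in the src attribute. The replace is needed because URLEncoder encodes spaces as +, which means a literal plus sign in a URL path.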
You can percent-encode the URLs using encodeURIComponent in JavaScript, which gives you:
<img src="pip%E2%98%BA%E2%98%BB%E2%99%A5o%C2%B2.jpg">
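The same percent-encoded name can be produced on the Java side. A string like pip%C3%A2%C2%98%C2%BA... is a sign of double encoding: the UTF-8 bytes were read as ISO-8859-1 somewhere (for example a wrong page or request encoding) and then encoded again. The sketch below shows the correct result and reproduces that pitfall; encodeFileName and misreadAsLatin1 are hypothetical names, and the Charset overload of URLEncoder.encode assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ImageSrcEncoding {

    // Percent-encodes a single file name (no slashes) for use in an img src attribute.
    static String encodeFileName(String name) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Simulates the bug: the UTF-8 bytes are decoded as ISO-8859-1, producing mojibake.
    static String misreadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "pip\u263A\u263B\u2665o\u00B2.jpg"; // the file name from the question
        System.out.println(encodeFileName(name));
        // prints pip%E2%98%BA%E2%98%BB%E2%99%A5o%C2%B2.jpg
        System.out.println(encodeFileName(misreadAsLatin1(name)));
        // prints the double-encoded form starting with pip%C3%A2%C2%98%C2%BA
    }
}
```

If the double-encoded form shows up, the fix is to make every step use UTF-8 rather than to encode again.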
I'd recommend renaming your files.
Using special characters in src paths is not strictly allowed; you'd have to use the URL-style (percent-encoded) escape codes for those characters instead.
