How to put an HTML link in a URLEncoder message - Java

I am trying to create a String in Java with a link in it.
The message reads:
String message ="Something happened please go back to Home and start again";
This message is ultimately encoded using
String msg = URLEncoder.encode(message,"UTF-8");
and displayed on a JSP page, but when it is rendered on the JSP page the message looks like this:
Something happened please go back to Home and start again
It is a plain string without an actual link in it.
I am not sure how to embed a link in a String message in Java.

This seems a lot like the issue discussed on this link:
https://www.talisman.org/~erlkonig/misc/lunatech%5Ewhat-every-webdev-must-know-about-url-encoding/#Donotuse%7B%7Bjava.net.URLEncoder%7D%7Dor%7B%7Bjava.net.URLDecoder%7D%7DforwholeURLs
The article says:
Do not use java.net.URLEncoder or java.net.URLDecoder for whole URLs
We are not kidding. These classes are not made to encode or decode
URLs, as their API documentation clearly says:
Utility class for HTML form encoding. This class contains static
methods for converting a String to the
application/x-www-form-urlencoded MIME format. For more information
about HTML form encoding, consult the HTML specification.
This is not about URLs. At best it resembles the query part encoding.
It is wrong to use it to encode or decode entire URLs. You would think
the standard JDK had a standard class to deal with URL encoding
properly (part by part, that is) but either it is not there, or we
have not found it, which lures a lot of people into using URLEncoder
for the wrong purpose.
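To make the article's point concrete, here is a small sketch comparing URLEncoder applied to a whole URL with letting java.net.URI quote each component separately. The host, path and query values are made up for illustration, and URLEncoder.encode(String, Charset) assumes Java 10 or later. One caveat: the multi-argument URI constructor only quotes characters that are illegal in a URI, so a literal & or + inside a query value would still need URLEncoder applied to that value alone.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class WholeUrlEncoding {
    // Form-encodes the whole string: ':', '/', '?' and '=' are escaped too,
    // so the result is no longer a usable URL
    static String formEncodeWholeUrl(String url) {
        return URLEncoder.encode(url, StandardCharsets.UTF_8);
    }

    // Lets java.net.URI quote each component; illegal characters such as the
    // space in the path become %20 while the URL structure is preserved
    static String buildUri(String host, String path, String query) {
        try {
            return new URI("http", host, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative values only
        System.out.println(formEncodeWholeUrl("http://link.com/a b?x=1")); // http%3A%2F%2Flink.com%2Fa+b%3Fx%3D1
        System.out.println(buildUri("link.com", "/a b", "x=1"));          // http://link.com/a%20b?x=1
    }
}
```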
I have formatted the relevant code from the page above and adjusted it with regard to your code:
String pathSegment = "link.com";
String message ="Something happened please go back to Home and start again";
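Applied to the question, a minimal sketch builds the anchor as ordinary HTML and form-encodes only a query-parameter value, never the markup itself. The "/home" target and the "reason" parameter are hypothetical placeholders, not taken from the question, and the link target is assumed to be trusted (it is not HTML-escaped here).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class LinkMessage {
    // Builds the HTML message; only the (hypothetical) query value is form-encoded
    static String buildMessage(String homeUrl, String reason) {
        String href = homeUrl + "?reason=" + URLEncoder.encode(reason, StandardCharsets.UTF_8);
        return "Something happened please go back to <a href=\"" + href + "\">Home</a> and start again";
    }

    public static void main(String[] args) {
        // "/home" and "session expired" are placeholders for the real link target and value
        System.out.println(buildMessage("/home", "session expired"));
        // Something happened please go back to <a href="/home?reason=session+expired">Home</a> and start again
    }
}
```

The JSP then has to write the string out without escaping it, for example with ${msg} in template text rather than <c:out value="${msg}"/>, which escapes XML by default. Only do that for markup you built yourself, never for user input.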

Related

Android parse special characters in json response

In my Android application I get a JSON response string from a PHP URL. In the response, some hotel names contain an apostrophe, but I get &#039; instead of the apostrophe. How can I parse hotel names with special characters in Android? I can see the apostrophe in the browser but not in the Android logcat.
I have tried jresponse = URLEncoder.encode(jresponse,"UTF-8"); but I could not get apostrophe for hotel name.
This is one of the hotel names in the response.
I see the following in the browser:
{"id":12747,
"name":"Oscar's",
....
}
But in the logcat:
id 12747
name Oscar&#039;s
Use the decoder instead of the encoder: URLDecoder.decode(jresponse,"UTF-8")
Use ISO-8859-2 when you create the URLEncodedEntity that you send off. You can set this as a parameter in the constructor.
Without a specified charset, you are probably sending the data in UTF-8/UTF-16 (most common) which the server is interpreting in a different way.
EDIT: It looks like ISO-8859-2 doesn't support ñ. You may have to change something server-side. http://en.wikipedia.org/wiki/ISO/IEC_8859-2
You can try the Html class, e.g.:
jresponse = Html.fromHtml(jresponse).toString(); // fromHtml returns a Spanned, not a String
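Html.fromHtml is Android-only. Outside Android, a minimal plain-Java sketch that decodes only numeric character references such as &#039; might look like this (named entities like &amp; are deliberately not handled, and out-of-range numbers are not guarded against). It also shows why the URLDecoder suggestion does not help: URLDecoder only understands %XX escapes and +. It assumes Java 10+ for the Charset overload of URLDecoder.decode.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecode {
    // Matches decimal (&#039;) and hexadecimal (&#x27;) numeric character references
    private static final Pattern NUMERIC = Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    // Sketch only: named entities are left as-is; huge numbers will throw
    static String decodeNumericEntities(String s) {
        Matcher m = NUMERIC.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            // quoteReplacement stops '$' or '\' in the decoded text being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(new String(Character.toChars(codePoint))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String name = "Oscar&#039;s";
        // URLDecoder leaves the entity untouched because it contains no %XX or '+'
        System.out.println(URLDecoder.decode(name, StandardCharsets.UTF_8)); // Oscar&#039;s
        System.out.println(decodeNumericEntities(name));                    // Oscar's
    }
}
```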

How can I clean and sanitize a URL submitted by a user for redisplay in Java?

I want a user to be able to submit a url, and then display that url to other users as a link.
If I naively redisplay what the user submitted, I leave myself open to urls like
http://somesite.com' ><script>[any javascript in here]</script>
that when I redisplay it to other users will do something nasty, or at least something that makes me look unprofessional for not preventing it.
Is there a library, preferably in java, that will clean a url so that it retains all valid urls but weeds out any exploits/tomfoolery?
Thanks!
URLs containing ' are perfectly valid. If you output them into an HTML document without escaping, then the problem lies in your lack of HTML-escaping, not in the input checking. You need to call an HTML-encoding method every time you output any variable text (including URLs) into an HTML document.
Java does not have a built-in HTML encoder (poor show!) but most web libraries do (take your pick, or write it yourself with a few string replaces). If you use JSTL tags, you get escapeXml to do it for free by default:
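If no library is available, the "few string replaces" approach could be sketched like this. It covers the five characters that matter in HTML text and in quoted attribute values; the class and method names are illustrative only.

```java
public class HtmlEscape {
    // Escapes text for HTML element content and quoted attribute values.
    // '&' is replaced first so the entities added afterwards are not double-escaped.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    public static void main(String[] args) {
        String url = "http://somesite.com' ><script>alert(1)</script>";
        // Both the attribute value and the link text are escaped before output
        System.out.println("<a href=\"" + escapeHtml(url) + "\">" + escapeHtml(url) + "</a>");
    }
}
```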
Whilst your main problem is HTML-escaping, it is still potentially beneficial to validate that an input URL is valid to catch mistakes - you can do that by parsing it with new URL(...) and seeing if you get a MalformedURLException.
You should also check that the URL begins with a known-good protocol such as http:// or https://. This will prevent anyone using dangerous URL protocols like javascript: which can lead to cross-site-scripting as easily as HTML-injection can.
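A sketch of the validation step combining both suggestions, assuming you only want to accept absolute http and https links. It uses java.net.URI rather than new URL(...) because newer JDKs deprecate the URL(String) constructors; the allowed-scheme set is an assumption to adjust. A URL that passes this check still has to be HTML-escaped when it is written out.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

public class UrlCheck {
    // Known-good protocols; anything else (javascript:, data:, ...) is rejected
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    static boolean isSafeLink(String input) {
        try {
            URI uri = new URI(input.trim());
            String scheme = uri.getScheme();
            return scheme != null
                    && ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))
                    && uri.getHost() != null; // require an absolute link with a host
        } catch (URISyntaxException e) {
            // Illegal characters such as spaces or '<' make the input unparseable
            return false;
        }
    }
}
```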
I think what you are looking for is output encoding. Have a look at OWASP ESAPI, which is a tried and tested way to perform encoding in Java.
Also, just a suggestion: if you want to check whether a user is submitting a malicious URL, you can check it against Google's malware database using the Safe Browsing API.
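You can use Apache Commons Validator's UrlValidator:
import org.apache.commons.validator.routines.UrlValidator;
String[] schemes = {"http", "https"}; // only accept web links
UrlValidator urlValidator = new UrlValidator(schemes);
if (urlValidator.isValid("http://somesite.com")) {
    // valid
}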

Escaping a value in JavaScript when submitting a form onclick

I am sending encrypted sensitive data when the user triggers the onclick event. This encrypted data sometimes contains a plus sign (+). When I retrieve this request variable on the server, the + is converted to a space, which causes the decryption to fail.
Example:
xrUxHtYpO2Yu3Z31ve+KNA==
gets converted to:
xrUxHtYpO2Yu3Z31ve KNA==
Is there a way to escape the string so it is sent as is?
The function you're looking for is "encodeURIComponent()":
var encoded = encodeURIComponent("nasty string");
You shouldn't need any code at all on the server side; URL encoding will almost certainly be implicitly un-done by your web framework. (Edit - ah, if you're using some Java/JSP web framework, then you definitely don't have to do anything fancy on the server side.)
Try replacing the + with %2B. That came from HTML URL Encoding Reference at W3Schools. Hope this helps!
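The effect can be reproduced in Java, whose URLDecoder treats form and query values the same way: a raw + becomes a space, while a value encoded as %2B survives. This minimal sketch reuses the ciphertext from the question; for these particular characters, encodeURIComponent on the client produces the same output as URLEncoder.encode does here.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PlusSign {
    // Decodes a value the way form/query data is decoded: '+' means space
    static String decodeFormValue(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    // Percent-encodes the value so '+' and '=' survive the trip (+ -> %2B, = -> %3D)
    static String encodeFormValue(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String cipher = "xrUxHtYpO2Yu3Z31ve+KNA==";
        System.out.println(decodeFormValue(cipher));                  // xrUxHtYpO2Yu3Z31ve KNA==
        System.out.println(encodeFormValue(cipher));                  // xrUxHtYpO2Yu3Z31ve%2BKNA%3D%3D
        System.out.println(decodeFormValue(encodeFormValue(cipher))); // xrUxHtYpO2Yu3Z31ve+KNA==
    }
}
```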

Encoding problem in servlet

I have a servlet which receives some parameters from the client, then does some job.
The parameter from the client is Chinese, so I often get some invalid characters in the servlet.
For example:
If I enter
http://localhost:8080/Servlet?q=中文&type=test
Then in the servlet, the 'type' parameter is correct (test); however, the 'q' parameter is not decoded correctly: it becomes invalid characters that cannot be parsed.
However, if I press Enter in the address bar again, the URL changes to:
http://localhost:8080/Servlet?q=%D6%D0%CE%C4&type=test
Now my servlet will get the right parameter of 'q'.
What is the problem?
UPDATE
BTW, it works well when I send the form with POST.
When I send them with Ajax, for example:
url="http://..q='中文',
xmlhttp.open("POST",url,true);
then the server side also gets invalid characters.
It seems that only when the Chinese characters are encoded like %xx does the server side get the right result.
That is to say, http://.../q=中文 does not work, but
http://.../q=%D6%D0%CE%C4 works.
But why does "http://www.google.com.hk/search?hl=zh-CN&newwindow=1&safe=strict&q=%E4%B8%AD%E6%96%87&btnG=Google+%E6%90%9C%E7%B4%A2&aq=f&aqi=&aql=&oq=&gs_rfai=" work?
Ensure that the encoding of the page with the form itself is also UTF-8 and ensure that the browser is instructed to read the page as UTF-8. Assuming that it's JSP, just put this in very top of the page to achieve that:
<%@ page pageEncoding="UTF-8" %>
Then, to process the GET query string as UTF-8, ensure that the servlet container in question is configured to do so. It's unclear which one you're using, so here's a Tomcat example: set the URIEncoding attribute of the <Connector> element in /conf/server.xml to UTF-8.
<Connector URIEncoding="UTF-8">
For the case that you'd like to use POST, then you need to ensure that the HttpServletRequest is instructed to parse the POST request body using UTF-8.
request.setCharacterEncoding("UTF-8");
Call this before you access the first parameter. A Filter is the best place for this.
See also:
Unicode - How to get the characters right?
Using non-ASCII characters as GET parameters (i.e. in URLs) is generally problematic. RFC 3986 recommends using UTF-8 and then percent encoding, but that's AFAIK not an official standard. And what you are using in the case where it works isn't UTF-8!
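The two query strings in the question can be decoded in Java to see what is going on: %D6%D0%CE%C4 is 中文 in GBK, while Google's %E4%B8%AD%E6%96%87 is 中文 in UTF-8. A minimal sketch, assuming Java 10+ and a JDK that provides the GBK charset:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryCharsets {
    // Percent-decodes a query value, interpreting the raw bytes with the given charset
    static String decode(String value, String charsetName) {
        return URLDecoder.decode(value, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // 中文
        System.out.println(decode("%D6%D0%CE%C4", "GBK").equals(chinese));         // true: the question's URL is GBK
        System.out.println(decode("%E4%B8%AD%E6%96%87", "UTF-8").equals(chinese)); // true: Google's URL is UTF-8
        System.out.println(decode("%D6%D0%CE%C4", "UTF-8").equals(chinese));       // false: wrong charset
        // Encoding on the sending side with UTF-8 yields the same bytes Google uses
        System.out.println(URLEncoder.encode(chinese, StandardCharsets.UTF_8));     // %E4%B8%AD%E6%96%87
    }
}
```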
It would probably be safest to switch to POST requests.
I believe that the problem is on the sending side. As I understand from your description, if you type the URL in the browser you get a "correctly" encoded request. This job is done by the browser: it knows how to convert Unicode characters to sequences of codes like %xx.
So, check how you send the request. It should be encoded before sending.
Other possibility is to use POST method instead of GET.
Do read this article on URL encoding format "www.blooberry.com/indexdot/html/topics/urlencoding.htm".
If you want, you could convert the characters to hex or Base64 and put them in the URL parameters.
I think it's better to put them in the body (POST) than in the URL (GET).
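For the Base64 suggestion, the URL-safe alphabet in java.util.Base64 uses - and _ instead of + and /, and dropping the = padding means the value needs no further escaping in a query string. A minimal round-trip sketch; the class and method names are illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Param {
    // Encodes text as UTF-8 bytes, then URL-safe Base64 without '=' padding
    static String toParam(String text) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Reverses toParam; the URL decoder accepts input without padding
    static String fromParam(String param) {
        return new String(Base64.getUrlDecoder().decode(param), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String param = toParam("\u4e2d\u6587"); // 中文
        System.out.println(param);              // 5Lit5paH
        System.out.println(fromParam(param));   // 中文
    }
}
```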

Running a JavaScript command from MATLAB to fetch a PDF file

I'm currently writing some MATLAB code to interact with my company's internal reports database. So far I can access the HTML abstract page using code which looks like this:
import com.mathworks.mde.desk.*;
wb=com.mathworks.mde.webbrowser.WebBrowser.createBrowser;
wb.setCurrentLocation(ReportURL(8:end));
pause(1);
s={};
while isempty(s)
s=char(wb.getHtmlText);
pause(.1);
end
desk=MLDesktop.getInstance;
desk.removeClient(wb);
I can extract out various bits of information from the HTML text which ends up in the variable s, however the PDF of the report is accessed via what I believe is a JavaScript command (onClick="gotoFulltext('','[Report Number]')").
Any ideas as to how I execute this JavaScript command and get the contents of the PDF file into a MATLAB variable?
(MATLAB sits on top of Java, so I believe a Java solution would work...)
I think you should take a look at the JavaScript that is being called and see what the final request to the webserver looks like.
You can do this quite easily in Firefox using the FireBug plugin.
https://addons.mozilla.org/en-US/firefox/addon/1843
Once you have found the real server request then you can just request this URL or post to this URL instead of trying to run the JavaScript.
Once you have gotten the correct URL (a la the answer from pjp), your next problem is to "get the contents of the PDF file into a MATLAB variable". Whether or not this is possible may depend on what you mean by "contents"...
If you want to get the raw data in the PDF file, I don't think there is a way currently to do this in MATLAB. The URLREAD function was the first thing I thought of to read content from a URL into a string, but it has this note in the documentation:
s = urlread('url') reads the content
at a URL into the string s. If the
server returns binary data, s will
be unreadable.
Indeed, if you try to read a PDF as in the following example, s contains some text intermingled with mostly garbage:
s = urlread('http://samplepdf.com/sample.pdf');
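Since MATLAB sits on top of Java, one possible workaround is a small Java helper on the MATLAB Java class path that reads the response as raw bytes instead of characters. This is a sketch under assumptions: the class name is hypothetical, it avoids InputStream.readAllBytes in case MATLAB's bundled JVM is older than Java 9, and it only checks the %PDF- signature rather than parsing the file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PdfBytes {
    // Downloads the body of a URL as raw bytes; no character decoding is applied
    public static byte[] fetch(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return readAll(in);
        }
    }

    // Copies a stream into a byte array in fixed-size chunks (Java 8 compatible)
    public static byte[] readAll(InputStream in) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // PDF files begin with the ASCII signature "%PDF-"
    public static boolean looksLikePdf(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        return data.length >= magic.length
                && Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }
}
```

From MATLAB, after adding the compiled class with javaaddpath, something like bytes = PdfBytes.fetch('http://samplepdf.com/sample.pdf'); should return the file contents as a numeric array that can then be written to disk or passed to a PDF parser.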
If you want to get the text from the PDF file, you have some options. First, you can use URLWRITE to save the contents of the URL to a file:
urlwrite('http://samplepdf.com/sample.pdf','temp.pdf');
Then you should be able to use one of two submissions on The MathWorks File Exchange to extract the text from the PDF:
Extract text from a PDF document by Dimitri Shvorob
PDF Reader by Tom Gaudette
If you simply want to view the PDF, you can just open it in Adobe Acrobat with the OPEN function:
open('temp.pdf');
wb=com.mathworks.mde.webbrowser.WebBrowser.createBrowser;
wb.executeScript('javascript:alert(''Some code from a link'')');
desk=com.mathworks.mde.desk.MLDesktop.getInstance;
desk.removeClient(wb);
