URL decoding of JS-encoded Japanese characters in Java 6 - java

I am using encodeURIComponent in javascript(assuming this does UTF-8 encoding) to encode a variable which could contain characters like =, +, etc. This is sent as POST to my servlet where I decode it.
This works well with English but when used with Japanese string - "バスケット", this converts to some special character sequence like this - "ãÂÂã¹ã±ãÂÂãÂÂ"
I am using following java 1.6 code to decode it but it doesn't work -
String ID = java.net.URLDecoder.decode(assignedID,"UTF-8");
where assignedID contains special character sequence. The above code returns me - "ãÂÂã¹ã±ãÂÂãÂÂ"

In your post, is the string you're sending is being sent as part of the URL or as part of the POST body. Its mostly the part of POST body, try adding (to jsp):
<% request.setCharacterEncoding("UTF-8"); %>

Related

getParameter special characters

I'm trying to get an url parameter in jee.
So I have this kind of url :
http://MySite/MySite.jsp?page=recherche&msg=toto
First i tried with : request.getParameter("msg").toString();
it works well but if I try to search "c++" , the method "getParameter()" returns "c" and not "c++" and i understand.
So I tried another thing. I get the current URL and parse it to get the value of the message :
String msg[]= request.getQueryString().split("msg=");
message=msg[1].toString();
It works now for the research "c++" but now I can't search accent. What can I do ?
EDIT 1
I encode the message in the url
String urlString=Utils.encodeUrl(request.getParameter("msg"));
so for the URL : http://MySite/MySite.jsp?page=recherche&msg=c++
i have this encoded URL : http://MySite/MySite.jsp?page=recherche&msg=c%2B%2B
And when i need it, i decode the message of the URL
String decodedUrl = URLDecoder.decode(url, "ISO-8859-1");
Thanks everybody
Anything you send via "get" method goes as part of the url, which needs to be urlencoded to be valid in case it contains at least one of the reserved characters. So, any character will need to be encoded before sending.
In order to send c++, you would have to send c%2B%2B. That would be interpreted properly at the server side.
Here some reference you can check:
http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
Now the question is, how and where do you generate your URL? According to the language, you will need to use the proper method to encode your strings.
if I try to search "c++" , the method "getParameter()" returns "c" and not "c++"
Query parameters are treated as application/x-www-form-urlencoded, so a + character in the URL means a space character in the parameter value. If you want to send a + character then it needs to be encoded in the URL as %2B:
http://MySite/MySite.jsp?page=recherche&msg=c%2B%2B
The same applies to accented characters, they need to be escaped as the bytes of their UTF-8 representation, so été would need to be:
msg=%C3%A9t%C3%A9
(é being Unicode character U+00E9, which is C3 A9 in UTF-8).
In short, it's not the fault of this code, it's the fault of whatever component is responsible for constructing the URL on the client side.
Call your URL with
msg=c%2B%2B
+ in a URL mean 'space'. It needs to be escaped.
You need to escape special characters when passing them as URL parameters. Since + means space and & means and another parameter, these cannot be used as parameter values.
See this other S.O. question.
You may want to use the Apache HTTP client library to help you with the URL encoding/decoding. The URIUtil class has what you need.
Something like this should work:
String rawParam = request.getParameter("msg");
String msgParam = URIUtil.decode(rawParam);
Your example indicates that the data is not being properly encoded on the client side. See this JavaScript question.

Tomcat: possible to parse URL parameters containing '&'?

I have a servlet running on tomcat 6 which should be called as follows:
http://<server>/Address/Details?summary="Acme & co"
However: when I iterate through the parameters in the servlet code:
//...
while (paramNames.hasMoreElements()) {
paramName = (String) paramNames.nextElement();
if (paramName.equals("summary")) {
summary = request.getParameter(paramName).toString();
}
}
//...
the value of summary is "Acme ".
I assume tomcat ignores the quotes - so it sees "& co" as a second parameter (albeit improperly formed: there's no =...).
So: is there any way to avoid this? I want the value of summary to be "Acme & co". I tried replacing '&' in the URL with & but that doesn't work (presumably because it's decoded back to a straight '&' before the params are parsed out).
Thanks.
Use http://<server>/Address/Details?summary="Acme %26 co". Because in URL special http symbol(e.g. &,/, //) does not work as parameters.
Are you encoding and decoding the URL with URLEncode ? If so, can you check what the input and output of those are ? Seems like one of the special characters is not being properly encoded/decoded
Try %26 for the &
Try your parameter like
summary="Acme & co"
& is part reserved characters. Refer RFC2396 section
2.2. Reserved Characters.
how to encode URL to avoid special characters in java
Characters allowed in GET parameter
HTTP URL - allowed characters in parameter names
http://illegalargumentexception.blogspot.in/2009/12/java-safe-character-handling-and-url.html

strange characters(?) added to the end of my subject text

I have a problem with my java code sending email to users. There is some problem with the encoding of the email. When the email arrives to email account the subject line ($subject) has encoding problems as has strange characters(?) added to the end of my subject text.
The email message content itself is fine just the subject line(?) I have searched all over but cant find,after using Unicode and content type as text/html mail body have no problem with special character
(ó) but same fix is not working for subject line.
I have a class that sends an email with javamail, with a text like
this one in subject :
"Estimado Iván Escobedo:
The problem is that when the mail arrives to its destination, it
arrives this way:
"Estimado Iv?n Escobedo:
All the á, é, í, ó, ú, etc special characters are replaced with ?.
What could be the problem and how to solve it?
You should use something like that to read the message properly:
TextMessage txtMessage = (TextMessage)message;
ByteArrayInputStream bais = new ByteArrayInputStream(txtMessage.getText().getBytes ("ISO-8859-15"))
Edit :
Sanjay found the solution.
In order to set properly the message before sending, use :
MimeUtility.encodeText(SubjectText, "ISO-8859-15", "Q")
encodeText :
Encode a RFC 822 "text" token into mail-safe form as per RFC 2047.
The given Unicode string is examined for non US-ASCII characters. If the string contains only US-ASCII characters, it is returned as-is. If the string contains non US-ASCII characters, it is first character-encoded using the specified charset, then transfer-encoded using either the B or Q encoding. The resulting bytes are then returned as a Unicode string containing only ASCII characters.
Note that this method should be used to encode only "unstructured" RFC 822 headers.

Java difference between two URL Encoded strings

What is the difference between the following two encoded strings?
%D0%9E%D0%BA%D0%B6%D1%8D%D0%B7
and
%26%231055%3B%26%231088%3B%26%231080%3B%26%231074%3B%26%231077%3B%26%231090%3B
I am trying to URL Encode the russian text "Привет" into the second encoded string above (the W3Schools encoder does it correctly), but the URL encoder that I am using keeps giving me the first encoded string above. I am using URLUTF8Encoder.java from the W3 consortium. I have to use this one as I am working on a mobile platform requiring J2ME.
Thanks!
The URL encoder at w3schools is doing it utterly wrong. The %D0%9E%D0%BA%D0%B6%D1%8D%D0%B7 is perfectly valid. That's also what I get when I do
String encoded = URLEncoder.encode("Привет", "UTF-8");
When I URL-decode the w3schools' answer as follows
String decoded = URLDecoder.decode("%26%231055%3B%26%231088%3B%26%231080%3B%26%231074%3B%26%231077%3B%26%231090%3B", "UTF-8");
then I get Привет which are exactly those Russian characters, but then converted into XML entities first.
That w3schools site is by the way in no way related to W3 Consortium. See also w3fools.
Your string "Привет" is encoded as:
%D0%9E
%D0%BA
%D0%B6
%D1%8D
%D0%B7
The second string seems to be converted into HTML entities before url-encoding:
%26%231055%3B
%26%231088%3B
%26%231080%3B
%26%231074%3B
%26%231077%3B
%26%231090%3B
%26 is &, %23 is #, %3B is ;:
П
р
и
в
е
т

Chinese character in URL with Java

I used the following line in Firefox's URL field :
http://www.baidu.com/s?wd=你
This line was generated by my Java program.
The last Chinese character in the URL field sometimes became: %C4%E3 [Correct]
Other times it became: %E4%BD%A0 [Incorrect]
I tried to use the URL with IE. It shows up still as 你, but the result page search field shows the character as 浣. Could this be a UTF-8 or UTF-16 encoding problem? How do I get the correct code %C4%E3 from the char 你 with my Java program?
URLEncoder.encode(string, encoding)

Categories

Resources