Encoding of URL query params in HTML mail


My application sends an HTML email with a hyperlink. The user should be able to click on the link in a mail client and have the browser open the page. Easy... I thought.
But now the link contains special characters in a query param. For example, the value of the second query param is id=1234/pid=1000, and after java.net.URLEncoder.encode() the resulting URL becomes
http://example.com/path/?key1=value1&key2=id%3D1234%2Fpid%3D1000
And this is what gets embedded into an HTML mail:
<html ...
<head>
<meta http-equiv=Content-Type content="text/html; charset=unicode">
...
<body lang=EN-US ...
link text
...
(at least this can be seen in the raw mail content by the receiving client).
The problem is that when I click on the link, the browser opens but the query params are double encoded:
http://example.com/path/?key1=value1&key2=id%253D1234%252Fpid%253D1000
and the link does not work anymore. Funny thing: if the link is copy/pasted into the browser, the additional encoding is omitted (as expected).
When hovering over the link with the mouse, the URL is displayed decoded:
http://example.com/path/?key1=value1&key2=id=1234/pid=1000 (tested with Outlook and Thunderbird, both behave the same way).
So my question is, what is wrong here?
Is it my first encoding and should I skip it?
Is it my first encoding and am I doing it wrong?
Are the mail clients to blame?
I can find tons of information about how URL encoding works and that it should be done but not which encoding in which use cases.
Thanks.

Finally solved it. It has nothing to do with the encoding in the mail or with the mail client: the server does an instant redirect and changes the query parameter. This makes it a duplicate of "mod_rewrite urlencoding an already urlencoded query string parameter - any way to disable this?".
I really should have tried example.com to figure this out, but I couldn't even imagine it was a problem with the web server.
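For anyone who wants to see the symptom in isolation, here is a minimal sketch that reproduces the two URLs from the question with nothing but java.net.URLEncoder. The class and method names are made up for illustration; the second encode() call only stands in for whatever re-encodes the parameter (in my case the server-side redirect), it is not what the server actually runs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; names are illustrative only.
class DoubleEncodingDemo {

    // Percent-encodes a single query parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "id=1234/pid=1000";
        String once = encode(raw);   // what the application embeds in the mail
        String twice = encode(once); // what a second encoding pass turns it into

        System.out.println("http://example.com/path/?key1=value1&key2=" + once);
        System.out.println("http://example.com/path/?key1=value1&key2=" + twice);
    }
}
```

The second line shows %25 in front of every escape, because the % sign of the first pass is itself percent-encoded on the second pass.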

Related

request.getCharacterEncoding() returns NULL... why?

A coworker of mine created a basic contact-us type form, which is mangling accented characters (è, é, à, etc). We're using KonaKart, a Java e-commerce platform, on Struts 1.
I've narrowed the issue down to the data coming in through the HttpServletRequest object. Comparing a similar (properly functioning) form, I noticed that on the old form the request object's Character Encoding (request.getCharacterEncoding()) is returned as "UTF-8", but on the new form it is coming back as NULL, and the text coming out of request.getParameter() is already mangled.
Aside from that, I haven't found any significant differences between the known-good form, and the new-and-broken form.
Things I've ruled out:
Both HTML pages have the tag: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Both form tags in the HTML use POST, and do not set encodings
Checking from Firebug, both the Request and Response headers have the same properties
Both JSP pages use the same attributes in the <%@page contentType="text/html;charset=UTF-8" language="java" %> tag
There's nothing remotely interesting going on in the *Form.java files, both inherit from BaseValidatorForm
I've checked the source file encodings, they're all set to Default - inherited from Container: UTF-8
If I convert them from ISO-8859-1 to UTF-8, it works great, but I would much rather figure out the core issue.
eg: new String(request.getParameter("firstName").getBytes("ISO-8859-1"),"UTF8")
Any suggestions are welcome, I'm all out of ideas.

Modern browsers usually don't supply the character encoding in the Content-Type header of the HTTP request. For HTML forms, however, the browser encodes the submitted data with the same character encoding that was specified in the Content-Type header of the HTTP response which served the page containing the form. You need to explicitly set the request character encoding to that same encoding yourself, which in your case is UTF-8.
request.setCharacterEncoding("UTF-8");
Do this before any request parameter is retrieved from the request (otherwise it's too late; the server platform default encoding would then be used to parse the parameters, which is indeed often ISO-8859-1). A servlet filter which is mapped on /* is a perfect place for this.
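To illustrate why the encoding has to be known before the parameters are parsed, here is a small standalone sketch. It uses java.net.URLDecoder as a stand-in for the container's parameter parsing, which is an assumption for demonstration purposes and not how any particular servlet container is implemented. The class and method names are hypothetical. The repair() method is the workaround from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; URLDecoder only simulates the container's parsing.
class FormDecodingDemo {

    // Decodes a percent-encoded form value using the given charset name.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The workaround from the question: undo a wrong ISO-8859-1 decode.
    static String repair(String mangled) {
        byte[] originalBytes = mangled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) as submitted from a UTF-8 page.
        String posted = "%C3%A9";

        String wrong = decode(posted, "ISO-8859-1"); // two characters: U+00C3 U+00A9
        String right = decode(posted, "UTF-8");      // one character: U+00E9

        System.out.println("ISO-8859-1 decode length: " + wrong.length());
        System.out.println("UTF-8 decode length: " + right.length());
        System.out.println("repair matches UTF-8 decode: " + repair(wrong).equals(right));
    }
}
```

The repair step only works because ISO-8859-1 maps every byte to a character, so no information is lost by the wrong decode. Setting the request encoding up front avoids needing it at all.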
See also:
Unicode - How to get the characters right?

request.getCharacterEncoding() relies on the Content-Type request header, not on Accept-Charset.
So application/x-www-form-urlencoded;charset=ISO8859_1 should work for the POST action. The <%@page tag doesn't affect the POST data.
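As a rough illustration of why getCharacterEncoding() comes back as null, here is a simplified sketch of pulling the charset parameter out of a Content-Type value. This is my own hypothetical helper, not the container's code, and it ignores edge cases such as whitespace around the equals sign.

```java
import java.util.Locale;

// Hypothetical helper; a simplified stand-in for what a container might do.
class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type value, or null if absent.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Typical browser form POST: no charset parameter at all.
        System.out.println(charsetOf("application/x-www-form-urlencoded"));
        // Same media type with an explicit charset parameter.
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8"));
    }
}
```

With the header a browser typically sends for a form POST there is simply no charset parameter to find, so null is the honest answer.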

Why can't I get parameters from the URL sent by Facebook after the '#' character?

I'm trying to implement Facebook login according to the Facebook Server-Side Authentication flow, using JSP and servlets.
I can log in to my FB account successfully, but when FB redirects back to my app, it sends the parameters (state...) after the '#' character.
I was looking at this: http://facebook.stackoverflow.com/questions/4144878/get-url-parameters-after-in-java.
but I'm doing it in a different way than they do.
Here's my code:
response.sendRedirect(response.encodeRedirectURL("http://www.facebook.com/dialog/oauth/?client_id=343473222406382" +
"&redirect_uri=http://localhost:8080/accountsLogger&response_type=token&state=logged"));
and I'm redirected to:
{http://localhost:8080/accountsLogger/#state=logged&access_token=AAAE4YxdpUO4BAILjJoj5GsFZBDir1YmZCy4ZC9BmZAOCAztC2QclKo46OSce7dzObL6lSzrYpRDgQycOzzhfbqThR6kVC16lmurC5X5oV1lIrsvI0h9D&expires_in=4329.}

The FB API is built (largely) to be used client-side. The hash portion of the URL (what comes after the '#') is never sent to the server by the browser (which is why you can't pick it up). If you need this information server-side, you need to pick it up using JavaScript on your accountsLogger page (document.location.hash), convert the "hash parameters" into "regular" URL parameters on another URL, and redirect the page to that URL.
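To make the split between query and fragment concrete, here is a small sketch using java.net.URI on a URL shaped like the one in the question. The access token is replaced by the placeholder TOKEN, and the class and method names are hypothetical. Note that this only works here because the full URL string is available; a servlet never receives the fragment part.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; names are illustrative only.
class FragmentDemo {

    // What ends up on the HTTP request line: path plus query, never the fragment.
    static String requestTarget(URI uri) {
        String query = uri.getRawQuery();
        return uri.getRawPath() + (query == null ? "" : "?" + query);
    }

    // Splits "a=1&b=2" into a map; values are kept raw (no percent-decoding).
    static Map<String, String> parseParams(String s) {
        Map<String, String> params = new LinkedHashMap<>();
        if (s == null || s.isEmpty()) {
            return params;
        }
        for (String pair : s.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // TOKEN is a placeholder, not a real access token.
        URI redirect = URI.create(
            "http://localhost:8080/accountsLogger/#state=logged&access_token=TOKEN&expires_in=4329");

        System.out.println("sent to server: " + requestTarget(redirect));
        System.out.println("query: " + redirect.getRawQuery());
        System.out.println("fragment params: " + parseParams(redirect.getRawFragment()));
    }
}
```

The query is null and everything of interest sits in the fragment, which is exactly the part that stays in the browser.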

The part of the URL after # is not sent to the server. It was originally intended as an anchor, that is, something to refer to on the downloaded page. The browser then does:
GET /your/url.htm?param1=value&param2=value
and if there's #someanchor at the end the browser tries to position the page to make that anchor visible (if it exists).
Old school anchor:
<a name="someanchor">Hello</a>
Standard HTML anchor:
<p id="someanchor">Hello</p> (thanks @Jon Hanna)

Submitting POST form data in UTF-8

I have a form in which the user enters his name in Chinese, but when I do
String strName = request.getParameter("name");
I get strName as some meaningless characters. As a solution I tried
request.setCharacterEncoding("UTF-8");
before reading any parameters from the request object. This worked. What I want to know is how to achieve the same in HTML/JavaScript. I have tried the
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
but this doesn't work. Any help?

It should work if you define the charset in the Content-Type request header. I believe it is usually not defined by default. For example if you use jQuery to make the request you have to add "charset=utf-8" to the contentType option: http://api.jquery.com/jQuery.ajax/
There's no way to force a web browser to send Content-Type for every request, so it's better to always call setCharacterEncoding.

What you do client-side affects how the data is sent. Apparently the data is sent as UTF-8 encoded, since the problem was in reading it server-side. So adding the meta (though OK) has no effect on this.

In pure HTML, you need to set <form enctype="multipart/form-data" accept-charset="UTF-8"> to submit UTF-8 from the browser.

Retrieve and display Tamil characters from a MySQL database in the browser

I am using Java. I can store Tamil characters in the database in the same format. But when I retrieve them and display them in the browser using JSP, they are displayed as boxes. I use the following code to save Tamil characters in the MySQL database.
Properties pr = new Properties();
pr.put("user", "root");
pr.put("password", "root");
pr.put("characterEncoding", "UTF-8");
pr.put("useUnicode", "true");
Class.forName("com.mysql.jdbc.Driver");
connection = DriverManager.getConnection(connectionURL,pr);
I can see the Tamil characters in the database, but I can't retrieve and display them in the same format. Please help me. Thanks in advance.

Add the following to the top of your JSPs:
<%@ page pageEncoding="UTF-8" %>
It instructs the server to use UTF-8 to write the characters to the response. It also adds a HTTP response Content-Type header with a value of text/html;charset=UTF-8. This is quite different from a simple <meta> tag which is ignored by webbrowsers when the content is served over HTTP. For debugging purposes, you can see the real HTTP headers using for example Fiddler2 or Firebug.
That should be sufficient.
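As a hedged illustration of what the response encoding changes, the sketch below pushes the Tamil word தமிழ் (written with Unicode escapes in the code) through ISO-8859-1 and through UTF-8. It only models the character-to-byte step with String.getBytes(), which is an assumption about where the characters get lost; it is not a JSP, and the class name is hypothetical. Boxes in the browser can also have other causes, such as a missing font.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; models only the character-to-byte encoding step.
class ResponseEncodingDemo {

    // Encodes the text to bytes and decodes it again with the same charset.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The Tamil word for "Tamil": U+0BA4 U+0BAE U+0BBF U+0BB4 U+0BCD.
        String tamil = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD";

        // ISO-8859-1 cannot represent these characters; each becomes '?'.
        String latin1 = roundTrip(tamil, StandardCharsets.ISO_8859_1);
        // UTF-8 can represent every Unicode character.
        String utf8 = roundTrip(tamil, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 survives: " + latin1.equals(tamil));
        System.out.println("UTF-8 survives: " + utf8.equals(tamil));
    }
}
```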

What is the character set of your JSP pages? Make sure that it is UTF-8.
<meta http-equiv="content-type" content="text/html; charset=UTF-8">

adarshr gave you the probable solution to your problem. You could also read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.

Where to add the UTF-8 extension in the HTML page?

I need to add charset="utf-8" to the script tags to get the translation to another language working.
I don't know where exactly I should add it, or whether there are any rules to follow. Please let me know where to add the charset. Do I need to add it to "ApplicationLoader.js" as well, or only to the jQuery plugins? Any suggestions, please.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>My Web App</title>
<link href="css/jquery/jquery.ui.all.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="js/jquery-1.4.2.min.js" charset="utf-8"></script>
<script type="text/javascript" src="js/jquery-ui-1.8.custom.min.js" charset="utf-8"></script>
<script type="text/javascript" src="js/jquery.depends.js" charset="utf-8"></script>
<script type="text/javascript" src="myemployeelist.js" ></script>
Update:
To make my situation easier to understand, here are the details. I'm a newbie still in training, so I'll explain my problem as far as I understand it.
I created a web application project in Eclipse, in which I created a class for the JDBC connection to MySQL.
I have a server-side class which gets the user's profile values from my web app's text box and saves them in the DB.
I am using a jQuery plugin for translation from English to Arabic.
I have an HTML page in which, as stated in the first part of the question, I added charset="utf-8" to the tags to specify Unicode.
I am using DWR to get the values in JS and send them to the server side.
I changed my computer's input language to Arabic and my Mozilla Firefox locale to Arabic.
I am able to enter English values in MySQL and retrieve them, but when I enter an Arabic value it doesn't get saved. The JDBC error is
java.sql.SQLException: Incorrect string value: '\xD8\xB3\xD9\x84\xD8\xA8...'
I don't know how to configure the LANG variable of my server, Jetty 6, to UTF-8. Any suggestions, please. Thanks.

If a Content-Type header is present in the HTTP response headers, then this will override the meta headers. Very often, the webserver already supplies this header by default, and more often than not the charset is absent (in which case the client's default charset is assumed, which is often ISO-8859-1). In other words, the meta headers are generally only interpreted when the resources are opened locally (not over HTTP). There's a good chance that this is the reason your meta headers apparently didn't work when served over HTTP.
You can use Firebug or Fiddler2 to determine the HTTP response headers.
You can configure the general setting for HTTP response headers at webserver level. You can also configure it on a request basis at programming language level. Since it's unclear what webserver / programming language you're using, I can't go in detail about how to configure it accordingly.
Update: as per the problem symptoms, which is the following typical MySQL exception:
java.sql.SQLException: Incorrect string value: '\xD8\xB3\xD9\x84\xD8\xA8...'
The byte sequence D8 B3 D9 84 D8 A8 is a valid UTF-8 sequence which represents those characters سلب (U+0633, U+0644 and U+0628). So the HTTP part is fine. You mentioned that you were using Jetty 6 as servletcontainer. The later builds of Jetty 6 already support UTF-8 out of the box.
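If you want to check the claim about the byte sequence yourself, here is a minimal sketch that decodes the six bytes from the exception message with a strict UTF-8 decoder. The class and method names are hypothetical. The strict decoder is used so that invalid input is reported instead of being silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class Utf8Check {

    // Strictly decodes the bytes as UTF-8; returns null if they are not valid UTF-8.
    static String decodeStrict(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The bytes quoted in the SQLException message.
        byte[] fromException = {
            (byte) 0xD8, (byte) 0xB3, (byte) 0xD9, (byte) 0x84, (byte) 0xD8, (byte) 0xA8
        };
        String decoded = decodeStrict(fromException);

        // Print the code points rather than the characters, to stay console-safe.
        for (int i = 0; i < decoded.length(); i++) {
            System.out.printf("U+%04X%n", (int) decoded.charAt(i));
        }
    }
}
```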
However, the problem is in the DB part. This exception indicates that the charset the DB/table has been instructed to use doesn't support this byte sequence. This can happen when the DB/table hasn't been instructed to use UTF-8.
To fix the DB part, issue these MySQL commands:
ALTER DATABASE db_name DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
For future DBs/tables, use CHARACTER SET utf8 COLLATE utf8_general_ci in the CREATE statement as well.

It is usually enough to declare the main document's encoding once, either through a content-type header (see @BalusC's answer for an in-depth explanation) or, if that is not available, the meta tag.
If you use the Meta tag, make sure it is in the first line of the head section.
There should be no need to specify the character set explicitly for the script files.
Of course, all the content you deal with needs to be UTF-8 encoded as well for this to work. It's not enough to just slap the content-type meta tag in front. (But you are probably aware of that.)

In this way:
<script type="text/javascript" src="[path]/myscript.js" charset="utf-8"></script>
