Obtaining the response charset of a GET or POST request in Java

I am trying to extract the response charset in a Java web app that uses Apache HTTP Client.
For example, one possible value of the "Content-Type" header is
text/html; charset=UTF-8
My code then extracts all the text after the "=" sign,
so the extracted charset is
UTF-8
Is this method of obtaining the response charset correct? Or is there some scenario where the above code will not work? Is there something I am missing here?

The method suggested by forty-two works, but it is deprecated. I found that this website has a good example of how to find the charset:
HttpEntity entity = response.getEntity();
ContentType contentType = ContentType.getOrDefault(entity);
// getCharset() returns null when the Content-Type declares no charset
Charset charset = contentType.getCharset();
System.out.println("Charset = " + charset);

Doesn't httpclient (or http core) already provide that functionality? Something like this:
HttpResponse response = ...
String charset = EntityUtils.getContentCharSet(response.getEntity());

Well, that approach (taking everything after the "=" sign) will fail when:
the charset value is quoted
the quoted value uses escapes
there are parameters other than "charset"
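To make those failure cases concrete, here is a minimal sketch of a parameter parser that copes with quoted values, backslash escapes, and extra parameters. It uses only the JDK; the class and method names (ContentTypeParams, parse, charsetOf) are made up for illustration and are not part of HttpClient. It is not a full implementation of the HTTP grammar, so in real code the library's own parser is probably the safer choice.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class ContentTypeParams {

    /** Parses the parameters that follow the media type, e.g. charset or boundary. */
    static Map<String, String> parse(String headerValue) {
        Map<String, String> params = new LinkedHashMap<>();
        int i = headerValue.indexOf(';');              // skip the media type itself
        while (i >= 0 && i < headerValue.length()) {
            i++;                                       // step over ';'
            int eq = headerValue.indexOf('=', i);
            if (eq < 0) break;
            int semi = headerValue.indexOf(';', i);
            if (semi >= 0 && semi < eq) {              // segment without '=': skip it
                i = semi;
                continue;
            }
            String name = headerValue.substring(i, eq).trim().toLowerCase(Locale.ROOT);
            StringBuilder value = new StringBuilder();
            int j = eq + 1;
            if (j < headerValue.length() && headerValue.charAt(j) == '"') {
                // Quoted value: read up to the closing quote, honouring backslash escapes
                j++;
                while (j < headerValue.length() && headerValue.charAt(j) != '"') {
                    if (headerValue.charAt(j) == '\\' && j + 1 < headerValue.length()) j++;
                    value.append(headerValue.charAt(j++));
                }
                j = headerValue.indexOf(';', j);       // next parameter, if any
            } else {
                // Unquoted value: runs to the next ';' or the end of the header
                int end = headerValue.indexOf(';', j);
                if (end < 0) end = headerValue.length();
                value.append(headerValue.substring(j, end).trim());
                j = end < headerValue.length() ? end : -1;
            }
            params.put(name, value.toString());
            i = j;
        }
        return params;
    }

    /** Returns the charset parameter, or null if the header does not declare one. */
    static String charsetOf(String headerValue) {
        return parse(headerValue).get("charset");
    }
}
```

With this sketch, charsetOf("multipart/form-data; charset=UTF-8; boundary=abc=def") gives back just UTF-8, whereas "everything after the = sign" would also drag in the boundary parameter.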

Related

How can I set the encoding of a httpExchange response?

I'm trying to modify some server code which uses an HttpExchange object to handle the server's response to the client.
My issue is that for responses containing characters not supported by ISO-8859-1, such as Chinese characters, I get something like '????' in place of the characters. I'd like to set the encoding of the response to UTF-8, but have thus far been unsuccessful in doing so.
I tried adding this line:
httpExchange.getResponseHeaders().put("charset", Arrays.asList("UTF-8"));
This successfully puts a "charset" header in the response, but I still can't send the characters I want in the response.
How do I set the encoding of the response to allow for these characters?
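Use the Content-Type header to specify the encoding, and write the body with the same encoding.
String encoding = "UTF-8";
httpExchange.getResponseHeaders().set("Content-Type", "text/html; charset=" + encoding);
// Headers must be sent before the body is written; a length of 0 means chunked transfer
httpExchange.sendResponseHeaders(200, 0);
try (Writer out = new OutputStreamWriter(httpExchange.getResponseBody(), encoding)) {
    out.write(something);
}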
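As a side note on where the '????' comes from: when a string is turned into bytes with a charset that cannot represent a character, Java substitutes a replacement byte, which for ISO-8859-1 is '?'. The sketch below shows this with the JDK only; the class name EncodeDemo and the helper roundTrip are illustrative names, not part of the original server code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodeDemo {

    /** Encodes the text to bytes with the given charset and decodes it again. */
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);   // unmappable characters become '?'
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587";         // two Chinese characters
        // Lossy: ISO-8859-1 has no mapping for these characters
        System.out.println(roundTrip(chinese, StandardCharsets.ISO_8859_1));
        // Lossless: UTF-8 can represent them
        System.out.println(roundTrip(chinese, StandardCharsets.UTF_8).equals(chinese));
    }
}
```

This is why the writer wrapped around the response body needs the same encoding as the one announced in the Content-Type header.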

How can I change the charset encoding of an HTTP response in Java?

I have to fetch a JSON object from a remote server, and for that I am using the function below. It works well, except that sometimes weird data is fetched, which I believe is because the ASCII charset is being used to decode the response.
Please find below the method that I am using:
public HttpResponse call(String serviceURL, String serviceHost, String namespace, String methodName, String payloadKey, String payloadValue) throws ClientProtocolException, IOException, JSONException
{
    HttpResponse response = null;
    HttpContext HTTP_CONTEXT = new BasicHttpContext();
    HTTP_CONTEXT.setAttribute(CoreProtocolPNames.USER_AGENT, "Mozilla/5.0");
    HttpPost httppost = new HttpPost(serviceURL);
    httppost.setHeader("User-Agent", Constants.USER_AGENT_BROWSER_FIREFOX);
    httppost.setHeader("Accept", "application/json, text/javascript, */*");
    httppost.setHeader("Accept-Language", "en-US,en;q=0.8");
    httppost.setHeader("Content-Encoding", "foo-1.0");
    httppost.setHeader("Content-Type", "application/json; charset=UTF-8");
    httppost.setHeader("X-Requested-With", "XMLHttpRequest");
    httppost.setHeader("Host", serviceHost);
    httppost.setHeader("X-Foo-Target", String.format("%s.%s", namespace, methodName));
    /* Making payload */
    JSONObject objectForPayload = new JSONObject();
    objectForPayload.put(payloadKey, payloadValue);
    StringEntity stringentity = new StringEntity(objectForPayload.toString());
    httppost.setEntity(stringentity);
    response = client.execute(httppost);
    return response;
}
All the headers that I am passing are correct; I have verified them via Inspect Element in Google Chrome and with the Firebug plugin for Mozilla Firefox.
The problem is that most of the time I get readable data, but sometimes I get unreadable data.
I debugged in Eclipse and noticed that the charset under wrappedEntity shows as "US-ASCII". I attached a screenshot for reference.
Can someone please tell me how I can change the charset of the response from ASCII to UTF-8 before I call response = client.execute(httppost);?
PS: As you may have noticed, I am passing charset=UTF-8 in the header, and I have already verified using Firebug and Google Chrome that I am passing the exact headers.
Thanks in advance.
I was able to resolve the issue; I am mentioning the fix for people who may face a similar problem.
After getting the response, first get the entity:
HttpEntity entity = response.getEntity();
Since my response was a JSON object, I convert the entity to a string, specifying "UTF-8":
responseJsonObject = new JSONObject(EntityUtils.toString(entity,"UTF-8"));
Previously I was just doing
responseJsonObject = new JSONObject(EntityUtils.toString(entity));
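For anyone wondering why the charset passed at decode time matters: the same bytes yield different strings depending on the charset used to decode them. Below is a minimal JDK-only sketch of that effect; the class DecodeDemo and the sample JSON are made up for illustration and stand in for the real response body.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {

    /** Decodes raw response bytes with the given charset. */
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Pretend this is the body the server sent, encoded as UTF-8
        byte[] body = "{\"name\":\"caf\u00e9\"}".getBytes(StandardCharsets.UTF_8);
        // Decoded with the charset the server actually used
        System.out.println(decode(body, StandardCharsets.UTF_8));
        // Decoded with the wrong charset: the accented letter turns into two junk characters
        System.out.println(decode(body, StandardCharsets.ISO_8859_1));
    }
}
```

ASCII-only responses look identical either way, which would explain why the data is readable most of the time and garbled only occasionally.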
I don't think it's a problem with your headers; I think it's a problem with your string. A header that says UTF-8 doesn't mean the bytes you write are UTF-8; that depends on how the string is encoded when it is turned into bytes, and on what's in "payloadValue".
That said, you can always encode it explicitly before sending it across the wire, for example:
objectForPayload.put(payloadKey, payloadValue);
// Pass the charset explicitly; otherwise StringEntity falls back to a default that may not be UTF-8
StringEntity stringentity = new StringEntity(
    objectForPayload.toString(),
    "UTF-8");
See if that works for you.
You may need to add an "Accept-Charset" header and set it to "UTF-8". ("Accept-Encoding" negotiates content codings such as gzip, not character sets.)
Just for the record: the "Content-Encoding" header field is incorrect - a correct server would reject the request as it contains an undefined content coding format.
Furthermore, attaching a charset parameter to application/json is meaningless.
bourne already answered that in the above comment though.
Changing entity = IOUtils.toString(response.getEntity().getContent())
to entity = EntityUtils.toString(response.getEntity(),"UTF-8")
did the trick.

Setting content length of an HTTP POST request

I am trying to make an HTTP POST request using Apache HTTP Client. I want to copy the contents of an HTTP POST request (received by my application) into another HTTP POST request (sent from my application to another URL). The code is shown below:
httpPost = new HttpPost(inputURL);
// copy headers
for (Enumeration<String> e = request.getHeaderNames(); e.hasMoreElements();) {
    String headerName = e.nextElement().toString();
    httpPost.setHeader(headerName, request.getHeader(headerName));
}
BufferedInputStream clientToProxyBuf = new BufferedInputStream(request.getInputStream());
BasicHttpEntity basicHttpEntity = new BasicHttpEntity();
basicHttpEntity.setContent(clientToProxyBuf);
basicHttpEntity.setContentLength(clientToProxyBuf.available());
httpPost.setEntity(basicHttpEntity);
HttpResponse responseFromWeb = httpclient.execute(httpPost);
Basically, I am trying to implement a proxy application that takes a URL as a parameter, forwards the request to that URL, and then serves the pages etc. in a custom look and feel.
Here request is an HttpServletRequest. I am having trouble setting the content length. Through debugging I found that clientToProxyBuf.available() does not give me the correct length of the input stream; I get HTTP error 400 in IE, and "Error 354 (net::ERR_CONTENT_LENGTH_MISMATCH): The server unexpectedly closed the connection" in Chrome.
Am I doing it wrong? Is there any other way to achieve it?
The available() method doesn't return the actual length of the stream's content; rather, it
Returns the number of bytes that can be read from this input stream without blocking. (From the Javadoc)
I suggest that you first read the whole content from the stream and then set that as the entity's content, rather than passing the stream object. That way, you will also have the actual length of the content.
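A rough sketch of that suggestion, using only the JDK: read the stream to its end into a byte array, after which the length is simply the array length. The names BufferedBody, readFully and slowStream are illustrative; slowStream only imitates a network stream whose available() reports nothing, and wrapping IOException in UncheckedIOException is a simplification for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BufferedBody {

    /** Reads the stream until end-of-stream and returns every byte. */
    static byte[] readFully(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {   // -1 signals end of stream
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Test helper: a stream that never reports any bytes as immediately available. */
    static InputStream slowStream(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int available() {
                return 0;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        InputStream in = slowStream(new byte[20000]);
        System.out.println("available() says: " + in.available());
        System.out.println("actual length:    " + readFully(in).length);
    }
}
```

In the proxy, the resulting array could then be handed to an entity that knows its own length (for example HttpClient's ByteArrayEntity) instead of calling setContentLength with a guess. Note that this buffers the whole body in memory, which may not suit very large uploads.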
It was rather simple and very obvious. I just needed to get the content length from the request header:
basicHttpEntity.setContentLength(Integer.parseInt(request.getHeader("Content-Length")));

Sending UTF-8 values in HTTP headers results in Mojibake

I want to send Arabic data from a servlet to the client using HttpServletResponse.
I am trying this:
response.setCharacterEncoding("UTF-8");
response.setHeader("Info", arabicWord);
and I receive the word like this:
String arabicWord = response.getHeader("Info");
On the client (receiving) side I also tried this:
byte[]d = response.getHeader("Info").getBytes("UTF-8");
arabicWord = new String(d);
but it seems the Unicode is lost, because I receive strange English characters. How can I send and receive Arabic UTF-8 words?
HTTP headers don't support UTF-8. They officially support ISO-8859-1 only. See also RFC 2616 section 2:
Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14].
Your best bet is to URL-encode and decode them.
response.setHeader("Info", URLEncoder.encode(arabicWord, "UTF-8"));
and
String arabicWord = URLDecoder.decode(response.getHeader("Info"), "UTF-8");
URL-encoding transforms them into %nn format, which is perfectly valid ISO-8859-1. Note that the data sent in headers may be subject to size limits. It is better to send it in the response body instead, in plain text, JSON, CSV or XML format. Using custom HTTP headers this way is a design smell.
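Here is a small self-contained sketch of that round trip, without a servlet container. The class HeaderValueCodec and the sample word are illustrative only; the encoded string is what would be passed to setHeader and later read back from getHeader.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class HeaderValueCodec {

    /** Turns arbitrary text into an ASCII-only value that is safe to put in a header. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Reverses encode() on the receiving side. */
    static String decode(String headerValue) {
        try {
            return URLDecoder.decode(headerValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String arabicWord = "\u0645\u0631\u062d\u0628\u0627";  // sample Arabic word
        String headerValue = encode(arabicWord);               // only ASCII characters remain
        System.out.println(headerValue);
        System.out.println(decode(headerValue).equals(arabicWord));
    }
}
```

Each Arabic letter takes two bytes in UTF-8 and so becomes two %nn groups, which is worth keeping in mind given the size limits mentioned above.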
I don't know where the word variable is coming from, but try this:
arabicWord = new String(d, "UTF-8");
UPDATE: Looks like the problem is with UTF-8 encoded data in HTTP headers, see: HTTP headers encoding/decoding in Java for detailed discussion.

Content Type header for POST request

I have a POST request in Java (it is similar in other languages; the problem should be language-independent) that I am sending to the server. The POST request contains only some POST parameters, no body.
I basically have this:
postData = URLEncoder.encode("user", "UTF-8") + "=" + URLEncoder.encode("jackychan", "UTF-8");
//HttpSessionToken.setRequestProperty("Content-Type", "application/xml");
OutputStream postContent = (OutputStream)HttpSessionToken.getOutputStream();
postContent.write(postData.getBytes("UTF-8"));
This works fine. The question is about the second line, which is commented out at the moment. Uncommenting this line breaks my code; my data is not XML, so I can understand this. Some REST services require you to POST a whole XML document but no POST parameters, something like this:
postData = "<xml> whatever xml structure here </xml>";
HttpSessionToken.setRequestProperty("Content-Type", "application/xml");
OutputStream postContent = (OutputStream)HttpSessionToken.getOutputStream();
postContent.write(postData.getBytes("UTF-8"));
This works too. The difference is the postData is now an XML and the content type is set.
The question now is: what if a service requires BOTH POST parameters as in example 1 AND an XML body as in example 2? How would I do this? If that never happens, why doesn't it happen?
Thanks, A.
You could do that as multipart/form-data, so you can have mixed content in a single POST body. It's similar to MIME multipart, and each part can have its own content type. Here's a previous Stack Overflow answer for multipart/form-data in Java: How can I make a multipart/form-data POST request using Java?
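To give an idea of what such a body looks like on the wire, here is a hand-rolled sketch that puts one form field and one XML part into a single multipart/form-data body. The class MultipartBody, the part names "user" and "document", and the boundary string are all assumptions for illustration; a real service dictates its own part names, and a library is usually preferable to building this by hand.

```java
class MultipartBody {

    private static final String CRLF = "\r\n";

    /** Builds a body with a plain form field followed by an XML part. */
    static String build(String boundary, String user, String xml) {
        StringBuilder body = new StringBuilder();
        // Part 1: an ordinary form parameter
        body.append("--").append(boundary).append(CRLF)
            .append("Content-Disposition: form-data; name=\"user\"").append(CRLF)
            .append(CRLF)
            .append(user).append(CRLF);
        // Part 2: the XML document, with its own Content-Type
        body.append("--").append(boundary).append(CRLF)
            .append("Content-Disposition: form-data; name=\"document\"").append(CRLF)
            .append("Content-Type: application/xml; charset=UTF-8").append(CRLF)
            .append(CRLF)
            .append(xml).append(CRLF);
        // Closing delimiter: the boundary followed by two dashes
        body.append("--").append(boundary).append("--").append(CRLF);
        return body.toString();
    }

    /** Value for the request's Content-Type header; it must carry the same boundary. */
    static String contentTypeHeader(String boundary) {
        return "multipart/form-data; boundary=" + boundary;
    }

    public static void main(String[] args) {
        String boundary = "----example-boundary-1234";
        System.out.println(contentTypeHeader(boundary));
        System.out.println(build(boundary, "jackychan", "<xml> whatever xml structure here </xml>"));
    }
}
```

With the connection from the question, the header value would go into setRequestProperty("Content-Type", ...) and the body would be written as postData.getBytes("UTF-8"). The boundary must not occur inside any of the parts.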
