Read non-English characters from HTTP GET request - Java

I have a problem getting Hebrew characters from an HTTP GET request.
I'm getting square characters like this: "[]" instead of the Hebrew characters.
The English characters are OK.
This is my function:
public String executeHttpGet(String urlString) throws Exception {
BufferedReader in = null;
try {
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet();
request.setURI(new URI(urlString));
HttpResponse response = client.execute(request);
in = new BufferedReader(new InputStreamReader(response.getEntity().getContent(),"UTF-8"));
StringBuffer sb = new StringBuffer("");
String line = "";
String NL = System.getProperty("line.separator");
while ((line = in.readLine()) != null) {
sb.append(line + NL);
}
in.close();
String page = sb.toString();
// System.out.println(page);
return page;
} finally {
if (in != null) {
try {
in.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
You can test it with this example URL:
String str = executeHttpGet("http://kavim-t.co.il/include/getXMLStations.asp?parent=7_%20_1");
Thank you!

The file you linked to doesn't seem to be UTF-8. I tested that it opens correctly using WINDOWS-1255 (a Hebrew encoding), so you should try that instead of UTF-8.

Try a different website; this one doesn't appear to use UTF-8. Alternatively, UTF-16 may work, but I haven't tried it. Your code looks fine.

As others have pointed out, the content is not actually encoded as UTF-8. You might want to look at httpEntity.getContentType() to extract the actual encoding of the content, and then pass this to your InputStreamReader. This means your code will then be able to cope correctly with any encoding.
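One way to follow this advice without depending on a particular HTTP client is to pull the charset parameter out of the Content-Type header value yourself. The following is only a minimal sketch under assumptions: the class name ContentTypeCharset and the helpers charsetOf and decode are hypothetical names made up for illustration, the header value is passed in as a plain string, and UTF-8 is used as the fallback when no usable charset is declared.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class ContentTypeCharset {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/xml; charset=windows-1255". Returns the fallback when the header
    // is null, has no charset parameter, or names a charset this JVM lacks.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // case-insensitive check for the "charset=" prefix
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Decodes the raw response bytes with the declared charset (UTF-8 if none).
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String header = "text/xml; charset=windows-1255";
        Charset cs = charsetOf(header, StandardCharsets.UTF_8);
        // Encode a Hebrew sample with the resolved charset, then decode it again.
        byte[] body = "\u05e9\u05dc\u05d5\u05dd".getBytes(cs);
        System.out.println(cs + " -> " + decode(body, header));
    }
}
```

With Apache HttpClient, the string passed as contentType would presumably come from the value of the header returned by getContentType(); the raw bytes would come from the entity's content stream.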

As posted in this other question, "Special characters in PHP / MySQL", you can set the character encoding in the PHP file.
In that example they set UTF-8, but you can set a different encoding that supports the characters you need.

Related

JSONParser returns Unexpected character () at position 0

I'm having a problem parsing JSON. Whenever I try, the following result is returned: Unexpected character () at position 0.
public Object execute(HttpRequestBase request){
DefaultHttpClient client = new DefaultHttpClient();
HttpResponse response = null;
Object object = null;
try {
response = client.execute(request);
InputStream is = response.getEntity().getContent();
BufferedReader br = new BufferedReader(new InputStreamReader((is)));
StringBuilder builder = new StringBuilder();
String output;
while ((output = br.readLine()) != null) {
builder.append(output).append("\n");
}
if (response.getStatusLine().getStatusCode() == 200) {
object = new JSONParser().parse(builder.toString());
client.getConnectionManager().shutdown();
} else {
LOG.log(Level.SEVERE, builder.toString());
throw new RuntimeException(builder.toString());
}
} catch (IOException | ParseException ex) {
LOG.log(Level.SEVERE, ex.toString());
} finally {
}
return object;
}
PS:
My response returns well-formatted JSON;
The problem happens when this piece of code runs: object = new JSONParser().parse(builder.toString());
This is part of my JSON file:
[
{
"id":2115,
"identificacao":"17\/2454634-6",
"ultima_atualizacao":null
},
{
"id":2251,
"identificacao":"17\/3052383-2",
"ultima_atualizacao":"2017-11-21"
},
{
"id":2258,
"identificacao":"17\/3070024-6",
"ultima_atualizacao":null
},
{
"id":2257,
"identificacao":"17\/3070453-5",
"ultima_atualizacao":null
}
]
Most probably your content has an unprintable special character at the beginning. For UTF-8 encoded data this may be a BOM.
Please post the start of your content as a byte[].
It is happening because of the UTF-8 BOM.
What is the UTF-8 BOM?
The UTF-8 BOM is a sequence of bytes (EF BB BF) that allows the reader to identify a file as being encoded in UTF-8.
Normally, the BOM is used to signal the endianness of an encoding, but since endianness is irrelevant to UTF-8, the BOM is unnecessary.
How to solve the issue?
Convert the encoding of your .json file (or any other file you serve) from UTF-8 with BOM to plain UTF-8 without a BOM.
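To check whether a response really starts with those three bytes, it can help to look at the raw bytes before any decoding happens. This is a small sketch for that inspection step only; the class BomCheck and its methods are hypothetical names, and the sample byte array stands in for whatever the server actually returned.

```java
// Hypothetical diagnostic helper, not part of any library.
final class BomCheck {

    // True when the data begins with the UTF-8 BOM bytes EF BB BF.
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Formats the first `count` bytes as space-separated upper-case hex.
    static String hexPrefix(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(count, data.length); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a response body: BOM followed by "[]".
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '[', ']'};
        System.out.println(hexPrefix(sample, 4));       // EF BB BF 5B
        System.out.println(startsWithUtf8Bom(sample));  // true
    }
}
```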
Use this instead of BufferedReader:
Map<String,Object> result
= (Map<String,Object>)JSONValue.parse(IOUtils.toString(response.getEntity().getContent(), "utf-8"));
You will get your JSON data as a Map and can simply iterate over it. Note that this cast assumes the top-level JSON value is an object; a top-level array is returned as a List instead.
I believe your problem is with the content-type. Use something like this:
HttpEntity content = response.getEntity();
StringBuilder sb = new StringBuilder();
InputStream is = content.getContent();
InputStreamReader isr = new InputStreamReader(is, "UTF-8");
int character;
do {
character = isr.read();
if (character >= 0) {
sb.append((char) character);
}
} while (character >= 0);
return sb.toString();
No need for BufferedReader; InputStreamReader can handle it fine.
Hope it helps!
I found the problem!
My JSON response started with an invisible character (U+FEFF, the zero-width no-break space used as the BOM), so I added this to my code:
String content = builder.toString();
content = content.replaceAll("\\uFEFF", "");
This \uFEFF was my problem! It does not happen in my dev environment, only in production!
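A slightly narrower variant removes the character only when it is the very first one, so that a U+FEFF occurring later in the payload is left alone. This is a sketch, with LeadingBom and stripLeadingBom being hypothetical names; it assumes the text has already been decoded into a String.

```java
// Hypothetical helper, not part of any library.
final class LeadingBom {

    // Removes a single leading U+FEFF, if present; otherwise returns the text unchanged.
    static String stripLeadingBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        String content = "\uFEFF[{\"id\":2115}]";
        System.out.println(stripLeadingBom(content)); // [{"id":2115}]
    }
}
```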

Encoding UTF-8 in HTTPServlet request

This may look like a problem that's already been solved, but it's not: I have gone through all the questions that deal with UTF-8 and none of the solutions has helped me.
I'm sending an HTTP request containing a JSON object to my Java servlet, using the JSON.simple library.
I added the UTF-8 encoding in the Tomcat XML file.
My HTML pages support UTF-8 encoding.
Both my database and all my tables are also UTF-8 encoded.
I changed the default encoding of the JVM to UTF-8 using system variables (yeah! that's how desperate I got).
This is my dispatcher function:
protected void doPost(HttpServletRequest request,
HttpServletResponse response) throws ServletException, IOException {
request.setCharacterEncoding("UTF-8");
AjaxParser cr = AjaxParser.ClientRequestFactory();
ClientRequest msg = cr.ParseClientAjax(request);
HandleRequest HR = new HandleRequest();
HandleRequestStatus HRS = HR.HandleMessage(msg);
AjaxResponseGenerator ARG = new AjaxResponseGenerator();
JSONObject jsonObj = ARG.HandleResponse(HRS);
response.setCharacterEncoding("UTF-8");
response.setContentType("application/json");
PrintWriter out = response.getWriter();
System.out.println(jsonObj);// write the json object to console
out.println(jsonObj);
}
and this is how I do the parsing to String:
public ClientRequest ParseClientAjax(HttpServletRequest request) {
ClientRequest msg = new ClientRequest();
StringBuffer jb = new StringBuffer();
String line = null;
try {
BufferedReader reader = request.getReader();
while ((line = reader.readLine()) != null)
jb.append(line);
} catch (Exception e) {
e.printStackTrace();
}
JSONParser parser = new JSONParser();
try {
JSONObject obj = (JSONObject) parser.parse(jb.toString());
String opcodeString = (String) obj.get("opcode");
RequestCodeEnum numericEnumCode = (RequestCodeEnum) OpCodesMap
.get(opcodeString);
msg.setOpCode(numericEnumCode);
String entityStr = obj.get("Entity").toString();
Entity entity = makeEntityFromString(numericEnumCode, entityStr);
msg.setEntity(entity);
} catch (ParseException pe) {
System.out.println(pe);
}
return msg;
}
I tried to do some debugging by printing the text I send throughout my application to the Eclipse console (which I also changed to UTF-8 encoding) to find out where the text stops being encoded correctly. I found that the text is in the right encoding until right before the execution of my query. After that, I checked the database manually, and the text is inserted there as question marks.
I tried to manually insert non-English text into my database using Workbench, and it works fine, both in the database itself and when displaying the data in my HTML afterwards.
The problem happens only when I insert data from my web page.
I'm stuck, I have no idea where the problem might be.
Any suggestions?
Try this:
InputStream inputStream = request.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream , StandardCharsets.UTF_8));

Convert encoded string to readable string in java

I am trying to send a POST request from a C# program to my Java server.
I send the request together with a JSON object.
I receive the request on the server and can read what is sent using the following Java code:
BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
OutputStream out = conn.getOutputStream();
String line = reader.readLine();
String contentLengthString = "Content-Length: ";
int contentLength = 0;
while(line.length() > 0){
if(line.startsWith(contentLengthString))
contentLength = Integer.parseInt(line.substring(contentLengthString.length()));
line = reader.readLine();
}
char[] temp = new char[contentLength];
reader.read(temp);
String s = new String(temp);
The string s is now the representation of the JSON object that I sent from the C# client. However, some characters are now messed up.
Original JSON object:
{"key1":"value1","key2":"value2","key3":"value3"}
Received string:
%7b%22key1%22%3a%22value1%22%2c%22key2%22%3a%22value2%22%2c%22key3%22%3a%22value3%22%%7d
So my question is: how do I convert the received string so it looks like the original one?
This seems to be URL-encoded, so why not use java.net.URLDecoder?
String s = java.net.URLDecoder.decode(new String(temp), StandardCharsets.UTF_8);
This assumes the charset is in fact UTF-8. (The overload that takes a Charset requires Java 10 or later.)
Those appear to be URL-encoded, so I'd use URLDecoder, like so:
String in = "%7b%22key1%22%3a%22value1%22%2c%22key2"
+ "%22%3a%22value2%22%2c%22key3%22%3a%22value3%22%7d";
try {
String out = URLDecoder.decode(in, "UTF-8");
System.out.println(out);
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
Note that you seem to have an extra percent sign in your example; with it removed, the above prints
{"key1":"value1","key2":"value2","key3":"value3"}
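Because of that stray percent sign, it may be worth guarding the decode call. The URLDecoder documentation leaves it to the implementation whether an illegal escape is passed through or rejected with an IllegalArgumentException, so the sketch below treats the exception as "could not decode". SafeUrlDecode and tryDecode are hypothetical names used only for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Optional;

// Hypothetical helper, not part of any library.
final class SafeUrlDecode {

    // Returns the decoded text, or an empty Optional when the input contains
    // an escape sequence that the decoder rejects.
    static Optional<String> tryDecode(String encoded) {
        try {
            return Optional.of(URLDecoder.decode(encoded, "UTF-8"));
        } catch (IllegalArgumentException | UnsupportedEncodingException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Well-formed input decodes normally.
        System.out.println(tryDecode("%7b%22key1%22%3a%22value1%22%7d").orElse("<malformed>"));
        // Input ending in a stray percent sign, as in the question.
        System.out.println(tryDecode("%22value3%22%%7d").orElse("<malformed>"));
    }
}
```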

Android HTTP Request Encoding

I want to make an HTTP request in my Android app, using the following code:
BufferedReader in = null;
try {
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet();
request.setURI(new URI("http://www.example.de/example.php"));
HttpResponse response = client.execute(request);
in = new BufferedReader
(new InputStreamReader(response.getEntity().getContent()));
StringBuffer sb = new StringBuffer("");
String line = "";
String NL = System.getProperty("line.separator");
while ((line = in.readLine()) != null) {
sb.append(line + NL);
}
in.close();
String page = sb.toString();
System.out.println(page);
return page;
} finally {
if (in != null) {
try {
in.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
The web page I'm calling is a PHP script which returns a string. My problem is that the special characters (ä, ü, ö, € etc.) are shown as a question mark with a box. How can I get these characters?
I think it's a problem with the encoding (German app -> UTF-8?).
Maybe you could try setting the encoding when printing to the console. Sometimes characters are correctly returned from the server but fail to display in the console.
String page = sb.toString();
PrintStream out = new PrintStream(System.out, true, "UTF-8");
out.println(page);
I have played around with your code, against http://www.google.de.
I was able to "hack" something, not sure it's the most elegant solution though.
After the line:
HttpResponse response = client.execute(request);
... I've added:
HttpEntity e = response.getEntity();
Header ct = e.getContentType();
// declared outside the if block so it is still in scope when the reader is created
String charset = null;
// getContentType() may return null when the content type is unknown
HeaderElement[] he = ct != null ? ct.getElements() : new HeaderElement[0];
if (
he.length > 0
&& he[0].getParameters().length > 0
&& he[0].getParameter(0) != null
&& he[0].getParameter(0).getName().equals("charset")
) {
charset = he[0].getParameter(0).getValue();
// with google.de, will print ISO latin ("ISO-8859-1")
Log.d("com.example.test", charset);
}
... then you can add the charset representation, or its Java equivalent, as a second argument of your InputStreamReader constructor call:
in = new BufferedReader(
new InputStreamReader(
response.getEntity().getContent(),
charset != null ? charset : "UTF-8"
)
);
Let me know if that works out for you.
Also note that in order to check Java charset equivalences, you could use Charset.forName(String charsetName) and catch the relevant Exceptions (and then revert to Charset.defaultCharset() or UTF-8, etc. in your catch statement).
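The lookup-with-fallback idea from the last paragraph could look roughly like this. It is a sketch rather than a definitive implementation: CharsetLookup and resolve are hypothetical names, and UTF-8 is chosen as the fallback here (Charset.defaultCharset() would be the other option mentioned above).

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper, not part of any library.
final class CharsetLookup {

    // Maps a charset name reported by the server to a Java Charset,
    // falling back to UTF-8 when the name is missing, illegal, or unsupported.
    static Charset resolve(String name) {
        if (name == null) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(name.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("ISO-8859-1"));      // ISO-8859-1
        System.out.println(resolve("no-such-charset")); // UTF-8
    }
}
```

The resulting Charset can be passed directly to the InputStreamReader constructor, which also accepts a Charset instead of a charset name.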

open.mapquestapi.com: http-response decoding in Java

I want to use open.mapquestapi.com from Java. It works fine until I have to deal with (German) umlauts; take the German city "Köln" as an example.
In Java, I can't get the mapquestapi response decoded correctly; I always end up with "Köln".
// String query.. e.g. "Hohenstaufenring 25, Köln"
URI uri = new URI("http", "open.mapquestapi.com", "/nominatim/v1/search", "format=json&addressdetails=1&email=[...]&countrycodes=DE&q=" + query, null);
URL mapqOsm = new URL(uri.toASCIIString());
BufferedReader reader = new BufferedReader(new InputStreamReader(mapqOsm.openStream(), "UTF-8"));
String response = "";
String line;
while ((line = reader.readLine()) != null) {
response += line;
}
reader.close();
I assume I have to decode "response" another way, but I have no ideas left on how to decode it correctly. The source file encoding is UTF-8.
How do I decode the open.mapquestapi.com response correctly in Java?
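For reference, "Köln" is exactly what the UTF-8 bytes of "Köln" look like when they are decoded as ISO-8859-1, which suggests a charset mismatch somewhere along the chain. The sketch below only reproduces that symptom and shows how to reverse it for diagnosis; it says nothing about where the mismatch happens in the MapQuest round trip, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class, not part of any library.
final class Mojibake {

    // Encodes as UTF-8, then (wrongly) decodes the bytes as ISO-8859-1.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses misdecode: re-encodes as ISO-8859-1 and decodes as UTF-8.
    // Useful for confirming the diagnosis, not as a permanent fix.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "K\u00F6ln";          // "Köln"
        String garbled = misdecode(original);    // 'ö' becomes two characters
        System.out.println(garbled.length());    // 5
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```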
