open.mapquestapi.com: http-response decoding in Java

I want to use open.mapquestapi.com from Java. It works fine until I have to deal with (German) umlauts; take the German city "Köln" as an example.
In Java, I can't get the mapquestapi response decoded correctly; I always end up with "KÃ¶ln".
// String query.. e.g. "Hohenstaufenring 25, Köln"
URI uri = new URI("http", "open.mapquestapi.com", "/nominatim/v1/search", "format=json&addressdetails=1&email=[...]&countrycodes=DE&q=" + query, null);
URL mapqOsm = new URL(uri.toASCIIString());
BufferedReader reader = new BufferedReader(new InputStreamReader(mapqOsm.openStream(), "UTF-8"));
String response = "";
String line;
while ((line = reader.readLine()) != null) {
response += line;
}
reader.close();
It seems I have to decode "response" some other way, but I have run out of ideas for how to decode it correctly. The source file encoding is UTF-8.
How do I decode the open.mapquestapi.com response correctly in Java?
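
A pattern like "KÃ¶ln" is what the UTF-8 bytes of "Köln" look like when they are decoded as ISO-8859-1 (or Windows-1252) at some step. The sketch below only reproduces that effect and reverses it; it is not a confirmed diagnosis of the MapQuest call, and the class and method names (MojibakeDemo, garble, repair) are made up for illustration. If the garbling shows up only when printing, the console encoding rather than the HTTP decoding may be responsible.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Simulates the bug: text is encoded as UTF-8 but decoded as ISO-8859-1.
    public static String garble(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses the bug: recover the original bytes, then decode them as UTF-8.
    public static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Köln", written with an escape so the source file encoding does not matter.
        String city = "K\u00F6ln";
        String garbled = garble(city);
        // U+00C3 U+00B6 are the two characters shown as the garbled umlaut.
        System.out.println(garbled.equals("K\u00C3\u00B6ln")); // prints true
        System.out.println(repair(garbled).equals(city));      // prints true
    }
}
```

If repair() turns the received string back into "Köln", the bytes were valid UTF-8 and were decoded with the wrong charset somewhere between the socket and the point where the string is displayed.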

Related

Reading HTML from URL in Java vs. Python

I'm trying to read the HTML from a particular URL and store it into a String for parsing. I referred to a previous post to help me out. When I print out what was read, all I get are special characters.
Here is my Java code (with try/catches left out) that reads from a URL and prints:
String path = "https://html1-f.scribdassets.com/913q5pjrsw60h9i4/pages/106-6b1bd15200.jsonp";
URL url = new URL(path);
InputStream in = url.openStream();
BufferedReader bw = new BufferedReader(new InputStreamReader(in, "UTF-8"));
String line;
while ((line = bw.readLine()) != null) {
System.out.println(line);
}
Program output:
�ĘY106-6b1bd15200.jsonpmP�r� �Ƨ�!�%m�vD"��Ra*��w�%����ݳ�sβ��MK�d�9+%�m��l^��މ����:���� ���8B�Vce�.A*��x$FCo���a�b�<����Xy��m�c�>t����� �Z������Gx�o� �J���oKe�0�5�kGYpb�*l����+|�U���-�N3��jBp�R�z5Cۥjh��o�;�~)����~��)~ɮhy��<c,=;tHW���'�c�=~�w���
Expected output:
window.page106_callback(["<div class=\"newpage\" id=\"page106\" style=\"width: 902px; height:1273px\">\n<div class=image_layer style=\"z-index: 1\">\n<div class=ie_fix>\n<img class=\"absimg\" style=\"left:18px;top:27px;width:860px;height:1077px;clip:rect(1px 859px 1076px 1px)\" orig=\"http://html.scribd.com/913q5pjrsw60h9i4/images/106-6b1bd15200.jpg\"/>\n</div>\n</div>\n</div>\n\n"]);
At first, I thought it was an issue with permissions or something that somehow encrypted the stream, but my friend wrote a small Python script to do the same thing and it worked, thereby ruling this out. This is what he wrote:
import requests
link = 'https://html1-f.scribdassets.com/913q5pjrsw60h9i4/pages/106-6b1bd15200.jsonp'
f = requests.get(link)
text = (f.text)
print(text)
So the question is, why is the Java version unable to correctly read and print from this particular URL? Note that I tried testing some other URLs from various websites and those worked fine. Maybe I should learn Python.

The response is gzip-encoded. You can do:
InputStream in = new GZIPInputStream(con.getInputStream());

@Maurice Perry is right. I tried it with the code below:
String url = "https://html1-f.scribdassets.com/913q5pjrsw60h9i4/pages/106-6b1bd15200.jsonp";
URL obj = new URL(url);
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
BufferedReader in = new BufferedReader(
new InputStreamReader(new GZIPInputStream(con.getInputStream())));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
System.out.println(response.toString());
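
Wrapping every response in GZIPInputStream fails with an exception when a server sends the body uncompressed, so it can help to check the Content-Encoding response header first. A minimal sketch, assuming the header value comes from con.getContentEncoding(); the class and method names (GzipAware, decodeBody, readAll, gzip) are hypothetical, and main feeds it locally compressed bytes instead of contacting the real URL.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipAware {

    // Wraps the raw body in a GZIPInputStream only if the header says it is gzip-compressed.
    // contentEncoding may be null, e.g. when the server sent no Content-Encoding header.
    public static InputStream decodeBody(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Reads the whole stream as UTF-8 text (assumption: the body is UTF-8).
    public static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Demo helper: gzip-compresses a string so the example needs no network access.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String body = "window.page106_callback([]);";
        // Compressed body with a gzip header value: decompressed before reading.
        System.out.println(readAll(decodeBody(new ByteArrayInputStream(gzip(body)), "gzip")));
        // Uncompressed body without the header: passed through unchanged.
        System.out.println(readAll(decodeBody(
                new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)), null)));
    }
}
```

With a real connection, the call would look like decodeBody(con.getInputStream(), con.getContentEncoding()).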

How to correctly change the encoding of a POST query?

When I send a POST to my page without setCharacterEncoding on the server side, I get С„С‹РІ. With setCharacterEncoding("UTF-8"), I get С‹РІР°. How do I correctly change the character encoding of a POST query?
P.S.: I read data from ServletInputStream.
Code below.
doPost
req.setCharacterEncoding("UTF-8");
BufferedReader r = new BufferedReader(new InputStreamReader(req.getInputStream()));
String line;
while ((line = r.readLine()) != null) {
System.out.println(line);
}

BufferedReader r = new BufferedReader(
new InputStreamReader(req.getInputStream(), StandardCharsets.UTF_8));
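With getInputStream you get binary data that has no encoding attached; setCharacterEncoding applies to getReader() and to parameter parsing, not to the raw stream. Hence the binary-to-text bridging class InputStreamReader needs to be given the correct encoding. Otherwise it uses the system default, System.getProperty("file.encoding").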
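
The effect of the missing charset argument can be reproduced without a servlet container. In this sketch a ByteArrayInputStream stands in for req.getInputStream(), and the class and method names (BodyReader, readBody) are illustrative only. The garbled strings in the question appear consistent with UTF-8 bytes decoded as windows-1251, so main tries that charset when the runtime supports it; that is an assumption about the asker's system default, not something the thread confirms.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BodyReader {

    // Reads a request body line by line, decoding the bytes with the given charset.
    public static String readBody(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Four Cyrillic letters, written as escapes so the source file encoding does not matter.
        String sent = "\u0444\u044B\u0432\u0430";
        byte[] wire = sent.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the client actually used restores the text.
        String right = readBody(new ByteArrayInputStream(wire), StandardCharsets.UTF_8);
        System.out.println(right.equals(sent)); // prints true

        // Decoding with a different single-byte charset produces mojibake instead.
        if (Charset.isSupported("windows-1251")) {
            String wrong = readBody(new ByteArrayInputStream(wire), Charset.forName("windows-1251"));
            System.out.println(wrong.equals(sent)); // prints false
        }
    }
}
```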

Convert encoded string to readable string in java

I am trying to send a POST request from a C# program to my Java server.
I send the request together with a JSON object.
I receive the request on the server and can read what is sent using the following Java code:
BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
OutputStream out = conn.getOutputStream();
String line = reader.readLine();
String contentLengthString = "Content-Length: ";
int contentLength = 0;
while(line.length() > 0){
if(line.startsWith(contentLengthString))
contentLength = Integer.parseInt(line.substring(contentLengthString.length()));
line = reader.readLine();
}
char[] temp = new char[contentLength];
reader.read(temp);
String s = new String(temp);
The string s is now the representation of the JSON object that I sent from the C# client. However, some characters are now messed up.
Original JSON object:
{"key1":"value1","key2":"value2","key3":"value3"}
Received string:
%7b%22key1%22%3a%22value1%22%2c%22key2%22%3a%22value2%22%2c%22key3%22%3a%22value3%22%%7d
So my question is: how do I convert the received string so that it looks like the original one?

That looks URL-encoded, so why not use java.net.URLDecoder?
String s = java.net.URLDecoder.decode(new String(temp), StandardCharsets.UTF_8);
This assumes the charset is in fact UTF-8.

Those appear to be URL-encoded, so I'd use URLDecoder, like so:
String in = "%7b%22key1%22%3a%22value1%22%2c%22key2"
+ "%22%3a%22value2%22%2c%22key3%22%3a%22value3%22%7d";
try {
String out = URLDecoder.decode(in, "UTF-8");
System.out.println(out);
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
Note that you seem to have an extra percent sign in your example; with it removed, the above prints
{"key1":"value1","key2":"value2","key3":"value3"}
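
The extra percent sign matters beyond the printed result: on OpenJDK-based runtimes, URLDecoder.decode throws an IllegalArgumentException for a malformed escape such as "%%7" (the Javadoc leaves the handling of illegal input to the implementation). A small sketch of guarding against that; the class and method names (SafeUrlDecode, decodeOrNull) are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class SafeUrlDecode {

    // Returns the decoded text, or null if the decoder rejects a malformed percent escape.
    public static String decodeOrNull(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (IllegalArgumentException e) {
            // OpenJDK throws this for escapes that are not two valid hex digits.
            return null;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 must be supported by every Java runtime, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "%7b%22key1%22%3a%22value1%22%7d";
        String bad = "%7b%22key1%22%3a%22value1%22%%7d"; // note the doubled percent sign
        System.out.println(decodeOrNull(good)); // prints {"key1":"value1"}
        System.out.println(decodeOrNull(bad));  // null on OpenJDK, where the decoder throws
    }
}
```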

Get raw text from html

I'm at quite a basic level of Android development.
I would like to get the text from a page such as "http://www.google.com". (The page I will be using will only contain text, so no pictures or anything like that.)
So, to be clear: I want to get the text written on a page into, e.g., a string in my application.
I tried this code, but I'm not even sure whether it does what I want.
URL url = new URL("http://www.google.com");
URLConnection connection = url.openConnection();
// Get the response
BufferedReader rd = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line = "";
I can't get any text from it, though. How should I do this?

In the sample code you gave, you are not even reading the response to the request. I would get the HTML with the following code:
URL u = new URL("http://www.google.com");
URLConnection conn = u.openConnection();
BufferedReader in = new BufferedReader(
new InputStreamReader(
conn.getInputStream()));
StringBuffer buffer = new StringBuffer();
String inputLine;
while ((inputLine = in.readLine()) != null)
buffer.append(inputLine);
in.close();
System.out.println(buffer.toString());
From there you would need to pass the string into some kind of HTML parser if you want only the text. From what I've heard, JTidy is a good library for this; however, I have never used any Java HTML parsing libraries.
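
For a page that really contains little more than text, a crude regex-based pass may be enough as a stopgap. The sketch below is not a substitute for a real HTML parser such as the ones named in this thread: it leaves entities like &amp;amp; untouched and can be fooled by unusual markup. The class and method names (NaiveHtmlText, toText) are made up for illustration.

```java
public class NaiveHtmlText {

    // Crude text extraction: drops script/style blocks, then all remaining tags,
    // then collapses runs of whitespace. Not a real HTML parser.
    public static String toText(String html) {
        return html
                .replaceAll("(?is)<(script|style)\\b.*?</\\1\\s*>", " ")
                .replaceAll("(?s)<[^>]*>", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p{color:red}</style></head>"
                + "<body><p>Hello</p>\n<p>plain <b>text</b></p></body></html>";
        System.out.println(toText(html)); // prints: Hello plain text
    }
}
```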

You want to extract text from an HTML file? You can make use of a specialized tool such as the Jericho HTML parser library. I'm not sure whether it can be used directly in an Android app, since it is quite big, but it is open source, so you can take only the code you need for your task.

Here is one way:
public String scrape(String urlString) throws Exception {
URL url = new URL(urlString);
URLConnection connection = url.openConnection();
BufferedReader reader = new BufferedReader(new InputStreamReader(
connection.getInputStream()));
String line = null, data = "";
while ((line = reader.readLine()) != null) {
data += line + "\n";
}
return data;
}
Here is another.

Read non-english characters from http get request

I have a problem getting Hebrew characters from an HTTP GET request.
I'm getting square characters like this: "[]" instead of the Hebrew characters.
The English characters are OK.
This is my function:
public String executeHttpGet(String urlString) throws Exception {
BufferedReader in = null;
try {
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet();
request.setURI(new URI(urlString));
HttpResponse response = client.execute(request);
in = new BufferedReader(new InputStreamReader(response.getEntity().getContent(),"UTF-8"));
StringBuffer sb = new StringBuffer("");
String line = "";
String NL = System.getProperty("line.separator");
while ((line = in.readLine()) != null) {
sb.append(line + NL);
}
in.close();
String page = sb.toString();
// System.out.println(page);
return page;
} finally {
if (in != null) {
try {
in.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
You can test it with this example URL:
String str = executeHttpGet("http://kavim-t.co.il/include/getXMLStations.asp?parent=7_%20_1");
Thank you!

The file you linked to doesn't seem to be UTF-8. I checked that it opens correctly as WINDOWS-1255 (a Hebrew encoding); you should try that instead of UTF-8.

Try a different website, it looks like it doesn't use UTF-8. Alternatively, UTF-16 may work but I haven't tried. Your code looks fine.

As others have pointed out, the content is not actually encoded as UTF-8. You might want to look at httpEntity.getContentType() to extract the actual encoding of the content, and then pass this to your InputStreamReader. This means your code will then be able to cope correctly with any encoding.
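
A minimal sketch of that idea, assuming the Content-Type value is available as a plain string (with the HttpClient code in the question that would presumably be response.getEntity().getContentType().getValue(), after a null check). The class and method names (ContentTypeCharset, charsetOf) are hypothetical. A fallback is still needed because a server can omit the charset parameter or declare one the runtime does not know.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

public class ContentTypeCharset {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/xml; charset=windows-1255". Returns the fallback when the
    // parameter is absent, malformed, or not supported by this runtime.
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for a "charset=" prefix (8 characters).
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8)); // prints ISO-8859-1
        System.out.println(charsetOf("text/html", StandardCharsets.UTF_8));                     // prints UTF-8
    }
}
```

The returned Charset can then be passed as the second argument of the InputStreamReader constructor in place of the hard-coded "UTF-8".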

Hi, as posted in this other question, "Special characters in PHP / MySQL", you can set the character set in the PHP file. In their example they set UTF-8, but you can set a different charset that supports the characters you need.
