I am working on an Android app that connects to a webpage using the Java class HttpsURLConnection and parses the HTML response using JSoup. The issue is that the HTML response from the website appears to be encoded. Any ideas on how I can get the actual HTML?
Here is my code for contacting the website:
private String GetPageContent(String url) throws Exception {
URL obj = new URL(url);
conn = (HttpsURLConnection) obj.openConnection();
// default is GET
conn.setRequestMethod("GET");
conn.setUseCaches(false);
// act like a browser
conn.setRequestProperty("User-Agent", USER_AGENT);
conn.setRequestProperty("Accept",
"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
conn.setRequestProperty("Accept-Language", "en-US,en;q=0.8,en-GB;q=0.6");
conn.setRequestProperty("Accept-Encoding" , "gzip, deflate, sdch");
conn.setRequestProperty("Connection" , "keep-alive");
if (cookies != null) {
for (String cookie : this.cookies) {
conn.addRequestProperty("Cookie", cookie.split(";", 2)[0]); // limit 2 keeps only "name=value"; a limit of 1 returns the whole string
}
}
int responseCode = conn.getResponseCode();
Log.v(TAG,"\nSending 'GET' request to URL : " + url);
Log.v(TAG,"Response Code : " + responseCode);
BufferedReader in = new BufferedReader(new InputStreamReader(
conn.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
// Get the response cookies
setCookies(conn.getHeaderFields().get("Set-Cookie"));
return response.toString();
}
And a snippet of the response:
��������������]�r�6��۞�w#ՙ�NDQ�ﱥ|�siv�Kkw�m&�HH�M, Z��ff_c_o�d�#���9�l�6����� �_=w|����/A{��!W� LZ��������f]�=wc߽�2,˨�|�8x��~�}�x1�$Ib�Uq�7�j�X|;��K
EDIT: The HTML was GZIP-compressed, as the response headers (Content-Encoding: gzip) showed.
The solution to this issue was to use the GZIPInputStream class as shown below:
BufferedReader in = new BufferedReader(new InputStreamReader(
new GZIPInputStream(conn.getInputStream())));
Based on the headers returned with the response, we can conclude that the content is gzip-encoded. Fortunately, a gzip-encoded stream is easy to decode with the GZIPInputStream class from java.util.zip.
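A slightly more general sketch, assuming the server may answer with gzip, deflate, or no compression at all: choose the wrapper from the response's Content-Encoding header instead of always wrapping in GZIPInputStream, whose constructor throws a ZipException when the body is not actually gzip. The class and method names (ContentDecoding, decode, openBody) are illustrative, the deflate branch assumes zlib-wrapped data, and sdch is not handled, so it is probably simplest to drop it from the Accept-Encoding header. According to the Android documentation for HttpURLConnection, the platform requests gzip and decompresses it transparently by default, and explicitly setting Accept-Encoding (as the question's code does) disables that; removing the header entirely is another option worth trying.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class ContentDecoding {

    /** Wraps the raw body stream according to the Content-Encoding header value. */
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding == null) {
            return raw; // no Content-Encoding header: body is not compressed
        }
        switch (contentEncoding.trim().toLowerCase(Locale.ROOT)) {
            case "gzip":
            case "x-gzip":
                return new GZIPInputStream(raw);
            case "deflate":
                // Assumes zlib-wrapped deflate; some servers send raw deflate instead
                return new InflaterInputStream(raw);
            default:
                return raw; // unknown or "identity": pass through unchanged
        }
    }

    /** Convenience wrapper for a live connection (hypothetical helper). */
    static InputStream openBody(HttpURLConnection conn) throws IOException {
        return decode(conn.getInputStream(), conn.getContentEncoding());
    }

    /** Reads the whole stream and decodes it as UTF-8 (an assumption about the page). */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Demo helper: gzip-compresses a string, standing in for a server response. */
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = gzip("<html><body>Hello</body></html>");
        InputStream in = decode(new ByteArrayInputStream(body), "gzip");
        System.out.println(readAll(in)); // prints <html><body>Hello</body></html>
    }
}
```

In the question's code, the reader would then be built as new BufferedReader(new InputStreamReader(ContentDecoding.openBody(conn))), so the same code path works whether or not the server compresses the page.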
I don't know which URL you are trying to access, but have you tried setting the charset?
BufferedReader in = new BufferedReader(new InputStreamReader(
conn.getInputStream(), "UTF8"));
Related
I am trying to send a POST (with XML in the body) to an API and get an XML response back. I need the confirmation or error details from the response body (from within the ERROR element).
I can send the POST and it does trigger the change in the API, but I cannot read the response.
When I send the POST manually from Postman, I trigger the change in the API and I can see the API's response, with the content type text/html;charset=charset=utf-8.
Here is my current code:
URL url = new URL(urlString);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setDoOutput(true);
OutputStreamWriter writer = new OutputStreamWriter(conn.getOutputStream());
writer.write(xmlString);
writer.flush();
String line;
BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
while ((line = reader.readLine()) != null) {
System.out.println(line);
}
writer.close();
reader.close();
The response body should be in the following format:
<Root>
<Session>
<UserId>theUserID</UserId>
<Password>thePassword</Password>
<ERROR Status="0" Description="Logon Successful" />
</Session>
<ActivityList>
<Activity Type="ReqUp" Incident="12345" ElapsedTime="350"
Description="example response"
Status="Complete">
<ERROR Status="0" Description="OK" />
</Activity>
</ActivityList>
</Root>
Current/Actual result:
line is showing as null
Since you want XML back from the server, you likely need to add an Accept header to the request to tell the server that you (the client) will accept XML. From your post, the default appears to be HTML.
This might work for you:
URL url = new URL(urlString);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("accept", "application/xml");
conn.setDoOutput(true);
OutputStreamWriter writer = new OutputStreamWriter(conn.getOutputStream());
If you are sending XML to the server, you may also want to add a Content-Type header, like this:
conn.setRequestProperty("content-type", "application/xml");
ref: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
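There was a bug in my original code that left line holding nothing but null by the time I used it. Because line is always null once the read loop finishes, the fix is to accumulate each line into a separate variable (builtResponse) inside the loop.
The corrected code is as follows: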
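URL url = new URL(urlString);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("accept", "text/html");
conn.setDoOutput(true); // with doOutput set, HttpURLConnection sends a POST
// Write and close the request body before reading the response
try (OutputStreamWriter writer = new OutputStreamWriter(conn.getOutputStream(), StandardCharsets.UTF_8)) {
    writer.write(xmlString);
}
StringBuilder builtResponse = new StringBuilder();
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) {
        builtResponse.append(line); // keep each line; line itself is null after the loop
    }
}
String responseText = builtResponse.toString();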
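Once the body has been read, the ERROR elements from the question can be extracted with the JDK's built-in DOM parser (javax.xml.parsers). This is a minimal sketch, assuming the response is well-formed XML like the sample above even though the server labels it text/html; the ErrorElements class and the "Status: Description" output format are illustrative choices, not part of the API being called.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ErrorElements {

    /** Returns "Status: Description" for every ERROR element, in document order. */
    static List<String> readErrors(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // getElementsByTagName searches the whole tree, so nested ERRORs are found too
        NodeList errors = doc.getElementsByTagName("ERROR");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < errors.getLength(); i++) {
            Element e = (Element) errors.item(i);
            result.add(e.getAttribute("Status") + ": " + e.getAttribute("Description"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Root><Session><ERROR Status=\"0\" Description=\"Logon Successful\"/></Session>"
                + "<ActivityList><Activity Type=\"ReqUp\"><ERROR Status=\"0\" Description=\"OK\"/>"
                + "</Activity></ActivityList></Root>";
        System.out.println(readErrors(xml)); // prints [0: Logon Successful, 0: OK]
    }
}
```

In practice you would pass it the complete response text you accumulated, and treat any Status other than "0" as a failure (assuming "0" means success, as the sample suggests).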
When I make the SOAP request from SoapUI, it returns a normal answer, but when I send it from Java code, it returns unreadable characters. I tried converting the answer to UTF-8, but it did not help. Please advise a solution; maybe something is wrong with my SOAP request. Example of the response: OÄžLU, but it should be OĞLU, or MÄ°KAYIL instead of MİKAYIL.
URL obj = new URL(url);
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
String userCredentials = username + ":" + password;
String basicAuth = "Basic " + new String(Base64.getEncoder().encode(userCredentials.getBytes()));
con.setRequestProperty("Authorization", basicAuth);
con.setRequestMethod("POST");
con.setRequestProperty("Content-Type", "text/xml");
con.setDoOutput(true);
DataOutputStream wr = new DataOutputStream(con.getOutputStream());
wr.writeBytes(myXML);
wr.flush();
wr.close();
String responseStatus = con.getResponseMessage();
System.out.println(responseStatus);
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
String xmlResponse = response.toString();
I tried:
ByteBuffer buffer = ByteBuffer.wrap(xmlResponse.getBytes("UTF-8"));
String converted = new String(buffer.array(), "UTF-8");
Try this:
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream(), "UTF-8"));
The character encoding is set as part of the Content-Type header.
I believe you're accidentally mixing charsets, which is why it is not displaying properly.
Try adding the charset to Content-Type like so:
con.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
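The garbled names follow a recognizable pattern: MÄ°KAYIL is exactly what the UTF-8 bytes of MİKAYIL become when decoded as ISO-8859-1, and OÄžLU is OĞLU decoded as Windows-1252. An InputStreamReader created without a charset uses the platform default, so on a machine whose default is a single-byte encoding the response is mis-decoded; the charset on the request's Content-Type only describes the request body. The sketch below picks the decoding charset from the response's Content-Type instead, assuming the server declares one there and that UTF-8 is a sensible fallback. ResponseCharset and charsetFrom are hypothetical helpers, not a library API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ResponseCharset {

    /**
     * Extracts the charset parameter from a Content-Type value such as
     * "text/xml; charset=utf-8", returning the fallback if it is absent or invalid.
     */
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // illegal or unsupported charset name
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        byte[] utf8 = "MİKAYIL".getBytes(StandardCharsets.UTF_8);
        // Decoding UTF-8 bytes as ISO-8859-1 reproduces the symptom: "MÄ°KAYIL"
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        // Using the charset declared in the header gives back "MİKAYIL"
        String right = new String(utf8, charsetFrom("text/xml; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(wrong.equals("MÄ°KAYIL") + " " + right.equals("MİKAYIL")); // prints true true
    }
}
```

Applied to the question's code, the reader becomes new InputStreamReader(con.getInputStream(), ResponseCharset.charsetFrom(con.getContentType(), StandardCharsets.UTF_8)). Separately, DataOutputStream.writeBytes() keeps only the low byte of each char, so any non-ASCII text in myXML is corrupted on the way out; writing myXML.getBytes(StandardCharsets.UTF_8) avoids that.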
I am using java.net.URL to make a REST web service call, using the example code below:
URL url = new URL("UrlToConnect");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setDoOutput(true);
conn.setRequestMethod("POST");
conn.setRequestProperty("Content-Type", "application/json");
String input = "{\"qty\":100,\"name\":\"iPad 4\"}";
OutputStream os = conn.getOutputStream();
os.write(input.getBytes());
os.flush();
if (conn.getResponseCode() != HttpURLConnection.HTTP_CREATED) {
throw new RuntimeException("Failed : HTTP error code : "
+ conn.getResponseCode());
}
BufferedReader br = new BufferedReader(new InputStreamReader(
(conn.getInputStream())));
String output;
System.out.println("Output from Server .... \n");
while ((output = br.readLine()) != null) {
System.out.println(output);
}
conn.disconnect();
I am trying to capture the response code from this web service call. I observed that even with a wrong URL, I get a 200 response code from the connection. Please suggest a way to capture response codes 200, 201 and 202.
Thanks.
I am returning a response to the client like this:
return Response.status(200).entity("Data was successfully loaded into database").build();
I need to read this on the client. My client code:
URL url=new URL(urlString);
// URLConnection connection=url.openConnection();
//connection.setDoOutput(true);
HttpURLConnection httpCon = (HttpURLConnection) url.openConnection();
httpCon.setDoOutput(true);
httpCon.setRequestMethod("POST");
httpCon.setRequestProperty("Content-Type",
"application/json");
How do I read this type of response on the client side?
Once you have an HttpURLConnection, you can send data to the server (if needed, and it looks like it is, since you are making a POST request):
OutputStream wr = httpCon.getOutputStream();
// Encode explicitly; DataOutputStream.writeBytes() would drop the high byte of every char
wr.write(yourData.getBytes(StandardCharsets.UTF_8));
wr.flush();
wr.close();
Then you can check the response code (e.g. whether it is 200):
int responseCode = httpCon.getResponseCode();
And read the data from the response:
BufferedReader in = new BufferedReader(
        new InputStreamReader(httpCon.getInputStream(), StandardCharsets.UTF_8));
String line;
StringBuilder response = new StringBuilder();
while ((line = in.readLine()) != null) {
    response.append(line); // append the variable read by the loop (line, not inputLine)
}
in.close();
If you want to parse JSON you can use org.json or Gson.
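Putting these steps together, here is a sketch of a client that treats any 2xx status as success (which covers 200, 201 and 202) and reads the body from getErrorStream() for error statuses, because getInputStream() throws an IOException for 4xx and 5xx responses. PostClient, isSuccess, readBody and postJson are hypothetical names, and UTF-8 for both request and response is an assumption about the server.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PostClient {

    /** Any 2xx status (200 OK, 201 Created, 202 Accepted, ...) counts as success. */
    static boolean isSuccess(int status) {
        return status >= 200 && status < 300;
    }

    /** Reads the body from the normal stream on success, otherwise from the error stream. */
    static String readBody(HttpURLConnection conn) throws IOException {
        int status = conn.getResponseCode();
        InputStream in = isSuccess(status) ? conn.getInputStream() : conn.getErrorStream();
        if (in == null) {
            return ""; // getErrorStream() returns null when there is no error body
        }
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    /** Posts a JSON string and returns "status: body". */
    static String postJson(String urlString, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlString).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json; charset=utf-8");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(json.getBytes(StandardCharsets.UTF_8));
        }
        try {
            return conn.getResponseCode() + ": " + readBody(conn);
        } finally {
            conn.disconnect();
        }
    }
}
```

Against the server code in the question, postJson(urlString, yourData) should return the status followed by the entity text passed to Response.status(200).entity(...), with a trailing newline because readBody appends one per line.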
I'm trying to read http://www.meuhumor.com.br/ in Java using this:
URL url;
HttpURLConnection connection = null;
try{
url = new URL(targetURL);
connection = (HttpURLConnection)url.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
connection.setRequestProperty("Content-Language", "en-US");
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11");
connection.setUseCaches(false);
connection.setDoInput(true);
connection.setDoOutput(true);
DataOutputStream dataout = new DataOutputStream(connection.getOutputStream());
dataout.flush();
dataout.close();
InputStream is = connection.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line;
StringBuffer response = new StringBuffer();
while((line = br.readLine()) != null){
response.append(line);
response.append('\n');
}
br.close();
String html = response.toString();
I can access the website with any browser, but when I try to get the HTML with Java, I get java.io.IOException: Server returned HTTP response code: 403 for URL:
Does anyone know a way to get the HTML?
You are most likely getting an HTTP 403 response because your POST request has no body. Your code looks like it's trying to submit a form. If your intention was to simply pull down the page content without submitting a form, try a GET request, remove the Content-Type header, remove connection.setDoOutput(true), and remove the 3 DataOutputStream lines.
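A sketch of the GET-only version this answer describes, keeping the browser-like User-Agent from the question and dropping the Content-Type header, setDoOutput(true) and the DataOutputStream lines. PageFetcher, prepareGet and fetch are illustrative names, and decoding the page as UTF-8 is an assumption (check the site's Content-Type if characters come out wrong). Whether the site then stops answering 403 depends on the server, so treat this as something to try rather than a guaranteed fix.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageFetcher {

    // Browser-like User-Agent copied from the question
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11";

    /** Configures a plain GET: no request body, no Content-Type, doOutput left false. */
    static HttpURLConnection prepareGet(String targetURL) throws IOException {
        // openConnection() only creates the object; nothing is sent yet
        HttpURLConnection conn = (HttpURLConnection) new URL(targetURL).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setUseCaches(false);
        return conn;
    }

    /** Performs the request and returns the page HTML, with '\n' after each line. */
    static String fetch(String targetURL) throws IOException {
        HttpURLConnection conn = prepareGet(targetURL);
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder html = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                html.append(line).append('\n');
            }
            return html.toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

Usage would simply be String html = PageFetcher.fetch("http://www.meuhumor.com.br/");, which performs the network request when called.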