Java URLConnection UTF-8 encoding doesn't work

I'm writing a small crawler for English-only sites, and I fetch each page by opening a URL connection. I set the encoding to UTF-8 both on the request and on the InputStreamReader, but I still get gobbledygook for some requests, while others work fine.
The code below reflects all the research and advice I could find. I have also tried switching from URLConnection to HttpURLConnection, with no luck. Some of the returned strings still look like this:
??}?r?H????P?n?c??]?d?G?o??Xj{?x?"P$a?Qt?#&??e?a#?????lfVx)?='b?"Y(defUeefee=??????.??a8??{O??????zY?2?M???3c??#
What am I missing?
My code:
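import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public static String getDocumentFromUrl(String urlString) throws Exception {
    URL url = new URL(urlString);
    URLConnection conn = url.openConnection();
    conn.setRequestProperty("Content-Type", "text/plain; charset=utf-8");
    conn.setRequestProperty("Accept-Charset", "utf-8");
    conn.setConnectTimeout(60 * 1000); // wait at most 60 seconds to connect
    conn.setReadTimeout(60 * 1000);    // and at most 60 seconds for data
    StringBuilder wholeDocument = new StringBuilder(); // starts empty, not null
    // try-with-resources closes the reader and the underlying stream exactly once
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            wholeDocument.append(inputLine).append('\n'); // readLine() strips the line break
        }
    }
    return wholeDocument.toString();
}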

The server is sending the document GZIP compressed. You can set the Accept-Encoding HTTP header to make it send the document in plain text.
conn.setRequestProperty("Accept-Encoding", "identity");
Even so, you normally shouldn't have to worry about details like this: a server that compresses its response is supposed to announce it with a Content-Encoding header, so the client knows to decompress it. What seems to be going on here is that the server is buggy: it does not send the Content-Encoding header to tell you the content is compressed. This behavior seems to depend on the User-Agent, so the site works in regular web browsers but breaks when used from Java. Setting the user agent therefore also fixes the issue:
conn.setRequestProperty("User-Agent", "Mozilla/5.0"); // for example
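If you can't rely on either header trick, another option is to detect the compression yourself. The sketch below assumes the broken server never sends Content-Encoding, so it peeks at the first two bytes for the gzip magic number (0x1f 0x8b) and only then wraps the stream in a GZIPInputStream. Valid UTF-8 text can never start with those two bytes (0x8b is a continuation byte and cannot follow an ASCII byte), so ordinary responses pass through unchanged. GzipSniffer is a hypothetical helper name, not part of any library.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

final class GzipSniffer {
    private GzipSniffer() {}

    /**
     * Returns a decompressing stream if the body starts with the gzip magic
     * number 0x1f 0x8b, otherwise returns the body unchanged.
     */
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);          // remember the start so we can rewind
        int b1 = in.read();
        int b2 = in.read();
        in.reset();          // rewind: the two bytes stay part of the stream
        boolean gzip = b1 == 0x1f && b2 == 0x8b;
        return gzip ? new GZIPInputStream(in) : in;
    }

    /** Reads the whole (possibly compressed) body and decodes it as UTF-8. */
    static String readUtf8(InputStream raw) throws IOException {
        try (InputStream in = maybeGunzip(raw)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

In getDocumentFromUrl this would replace the reader loop with a single call such as GzipSniffer.readUtf8(conn.getInputStream()); readAllBytes() needs Java 9 or later.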

Related

Adding value to path parameter in Java REST?

Update: the problem is solved, and I have added my own answer to this thread.
In short, I have tried to add a value for the "scan_id" parameter, but since the request is a POST, I can't add the value directly in the URL path.
Using the code I already have, how would I modify it so that the URL is correct, that is, so that the server accepts my POST?
Somehow I have been unable to find any examples that helped me figure out how to do this.
I know how to do a POST with a payload and a GET with parameters, but a POST with parameters is very confusing to me.
I'd appreciate any help. (I'd like to keep using HttpURLConnection, unless another example is provided that also shows how to send the request, not only how to configure the path.)
I've tried adding it to the payload.
I've tried UriBuilder, but found it confusing and at odds with the rest of my code, so I wanted to ask for help with HttpURLConnection.
URL url = new URL("http://localhost/scans/{scan_id}/launch");
HttpURLConnection con = (HttpURLConnection) url.openConnection();
con.setRequestMethod("POST");
con.setRequestProperty("tmp_value_dont_mind_this", "432432");
con.setRequestProperty("X-Cookie", "token=" + "43432");
con.setRequestProperty("X-ApiKeys", "accessKey=" + "43234;" + " secretKey=" + "43234;");
con.setDoInput(true);
con.setDoOutput(true); // NOT NEEDED FOR GETS
con.setRequestProperty("Accept", "application/json");
con.setRequestProperty("Content-Type", "application/json; charset=UTF-8");

// First attempt at writing (works when writing a payload)
OutputStreamWriter writer = new OutputStreamWriter(con.getOutputStream(), "UTF-8");
writer.write(payload);
writer.close();

// Second attempt, used instead of the first; doesn't work (wanted to replace {scan_id} in the URL)
DataOutputStream out = new DataOutputStream(con.getOutputStream());
out.writeChars("scan_id=42324"); // tried writing directly (writeChars emits 2-byte UTF-16 chars)
//out.write(payload);
out.close();
Exception:
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost/scans/launch
I'd like to get one of these three response codes, because then I'd know the URL is correct:
200 Returned if the scan was successfully launched.
403 Returned if the scan is disabled.
404 Returned if the scan does not exist.
I've tried several URLs:
localhost/scans/launch
localhost/scans//launch
localhost/scans/?/launch
localhost/scans/{scan_id}/launch
So, with the help of a friend and everyone here, I solved my problem.
The code below is all the code in an entire class, explained bit by bit. At the bottom you'll find the full class with all its syntax; it takes parameters and returns a string.
An HTTP request has several sections.
In my case these are the request headers, the parameters in the URL, and a payload.
Depending on the API, each variable it requires has to go into its respective section.
My ORIGINAL URL looked like this: "http://host:port/scans/{scan_id}/export?{history_id}"
I CHANGED it to: "https://host:port/scans/" + scan_Id + "/export?history_id=" + ID;
The API I am calling also required an argument in the payload called "format" with a value:
String payload = "{\"format\" : \"csv\"}";
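The changed URL puts scan_Id into the path and history_id into the query string. A minimal sketch of the same assembly, assuming a hypothetical base address such as https://localhost:8834: scan_Id is an int, so it can go into the path as-is, while the history ID is passed through URLEncoder (the Java 10+ Charset overload) so characters like '&' or spaces can't corrupt the query. URLEncoder applies form-style encoding (a space becomes '+'), which suits query strings but is not the right escaping for path segments. ExportUrl is a hypothetical name.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ExportUrl {
    private ExportUrl() {}

    /** Builds {base}/scans/{scanId}/export?history_id={historyId}. */
    static URI exportUri(String base, int scanId, String historyId) {
        // Path parameter: an int needs no escaping.
        String path = "/scans/" + scanId + "/export";
        // Query parameter: percent-encode the value (form-style).
        String query = "history_id=" + URLEncoder.encode(historyId, StandardCharsets.UTF_8);
        return URI.create(base + path + "?" + query);
    }
}
```

The connection would then be opened with ExportUrl.exportUri(base, scan_Id, ID).toURL().openConnection(), and the JSON payload written to the body exactly as before.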
With the new URL, I opened a connection and set the request headers I needed.
HttpsURLConnection con = (HttpsURLConnection) url.openConnection();
The setDoOutput(true) call should be commented out when making a GET request.
con.setDoInput(true);
con.setDoOutput(true);
con.setRequestMethod("POST");
con.setRequestProperty("Accept", "application/json");
con.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
con.setRequestProperty("X-Cookie", "token=" + token);
con.setRequestProperty("X-ApiKeys", "accessKey=" + "23243;" + " secretKey=" + "45543;");
Here I write the payload:
//WRITING THE PAYLOAD to the http call
OutputStreamWriter writer = new OutputStreamWriter(con.getOutputStream(), "UTF-8");
writer.write(payload);
writer.close();
After I've written the payload, I read whatever response comes back. (This depends on the call: when I do a file download, which is a GET request, there is no response to read here, because I've already read it through another piece of code.)
I hope this helps anyone who comes across this thread.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.HttpsURLConnection;

public String requestScan(int scan_Id, String token, String ID) throws IOException {
    String endpoint = "https://host:port/scans/" + scan_Id + "/export?history_id=" + ID;
    URL url = new URL(endpoint);
    String payload = "{\"format\" : \"csv\"}";
    HttpsURLConnection con = (HttpsURLConnection) url.openConnection();
    try {
        con.setDoInput(true);
        con.setDoOutput(true);
        con.setRequestMethod("POST");
        con.setRequestProperty("Accept", "application/json");
        con.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        con.setRequestProperty("X-Cookie", "token=" + token);
        con.setRequestProperty("X-ApiKeys", "accessKey=" + "324324;" + " secretKey=" + "43242;");

        // WRITING THE PAYLOAD to the HTTP call
        try (OutputStreamWriter writer =
                new OutputStreamWriter(con.getOutputStream(), StandardCharsets.UTF_8)) {
            writer.write(payload);
        }

        // READING THE RESPONSE (decoded as UTF-8, not the platform default)
        StringBuilder jsonString = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                jsonString.append(line);
            }
        }
        return jsonString.toString();
    } finally {
        con.disconnect();
    }
}
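One thing the class above doesn't cover is the question's original symptom: for 4xx and 5xx responses (such as the HTTP 400 in the question), getInputStream() throws an IOException, so whatever explanation the server sent is never read. A minimal sketch of a reader that falls back to getErrorStream(); it assumes the body is UTF-8 text, and ResponseReader is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

final class ResponseReader {
    private ResponseReader() {}

    /**
     * Returns the response body for any status code.
     * getInputStream() throws for 4xx/5xx, so those bodies come from getErrorStream().
     * Assumes the body is UTF-8 text.
     */
    static String readBody(HttpURLConnection con) throws IOException {
        int status = con.getResponseCode(); // also sends the request if not sent yet
        InputStream in = status >= 400 ? con.getErrorStream() : con.getInputStream();
        if (in == null) {
            return ""; // the server sent no body
        }
        try (InputStream body = in) {
            return new String(body.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Logging con.getResponseCode() next to the returned text makes it much easier to tell a wrong URL (404) from a malformed request (400).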
As discussed here, the solution would be to change the content type to application/x-www-form-urlencoded. But since you are already using application/json; charset=UTF-8 (which I assume is a requirement of your project), you have no choice but to redesign the whole thing. I suggest one of the following:
Add another GET service;
Add another POST service with content type application/x-www-form-urlencoded;
Replace this service with one of the above.
Do not specify the content type at all, so the client will accept anything. (I don't know whether this is possible in Java.)
If there are other solutions I'm not aware of, I don't know how compliant with the HTTP protocol they would be.
Hope I helped!
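If the server does accept scan_id as a form field, an application/x-www-form-urlencoded body could be built as in the sketch below. This rests on an assumption: form data in the body only helps if the endpoint reads the parameter from the body; it cannot fill in a {scan_id} placeholder in the path. FormBody is a hypothetical helper; keys and values go through URLEncoder (Java 10+ Charset overload).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

final class FormBody {
    private FormBody() {}

    /** Encodes key/value pairs as application/x-www-form-urlencoded text, e.g. "scan_id=42324". */
    static String encode(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /** Writes the form as the body of a POST on a freshly opened, not yet connected connection. */
    static void writeForm(HttpURLConnection con, Map<String, String> params) throws IOException {
        byte[] bytes = encode(params).getBytes(StandardCharsets.UTF_8);
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = con.getOutputStream()) {
            out.write(bytes);
        }
    }
}
```

After writeForm(), the caller still has to call getResponseCode() (or read the response) for the request to complete.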
Why not do it like this? Since you need to do a POST with HttpURLConnection, you need to write the parameters to the connection after you have opened it.
String urlParameters = "scan_id=42324";
byte[] postData = urlParameters.getBytes(StandardCharsets.UTF_8);
DataOutputStream dataOutputStream = new DataOutputStream(conn.getOutputStream());
dataOutputStream.write(postData);
Or, if the URL has /launch at the end, just change the code above to the following:
String urlParameters = "42324/launch";
byte[] postData = urlParameters.getBytes(StandardCharsets.UTF_8);
DataOutputStream dataOutputStream = new DataOutputStream(conn.getOutputStream());
dataOutputStream.write(postData);
URL url = new URL("http://localhost/scans/{scan_id}/launch");
That line looks odd to me; it seems you are trying to use a URL where you are intending the behavior of a URI Template.
The exact syntax will depend on which template implementation you choose; an implementation using the Spring libraries might look like:
import org.springframework.web.util.UriTemplate;
import java.net.URI;
import java.net.URL;
import java.util.Collections;
import java.util.Map;
// Warning - UNTESTED code ahead
UriTemplate template = new UriTemplate("http://localhost/scans/{scan_id}/launch");
Map<String,String> uriVariables = Collections.singletonMap("scan_id", "42324");
URI uri = template.expand(uriVariables);
URL url = uri.toURL();
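If pulling in Spring only for this is too much, the JDK can handle a simple version. The sketch below is not Spring's UriTemplate: it only substitutes {name} placeholders in the path and relies on the multi-argument java.net.URI constructor to percent-encode characters that are illegal in a path. A '/' inside a value would not be escaped, so values are assumed to be single path segments. SimpleUriTemplate is a hypothetical name.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SimpleUriTemplate {
    private static final Pattern VAR = Pattern.compile("\\{([^/{}]+)}");

    private SimpleUriTemplate() {}

    /** Replaces each {name} in the path template with its value; fails if a value is missing. */
    static URI expand(String scheme, String host, String pathTemplate, Map<String, String> vars)
            throws URISyntaxException {
        Matcher m = VAR.matcher(pathTemplate);
        StringBuilder path = new StringBuilder();
        while (m.find()) {
            String value = vars.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for {" + m.group(1) + "}");
            }
            m.appendReplacement(path, Matcher.quoteReplacement(value));
        }
        m.appendTail(path);
        // This constructor quotes illegal path characters, e.g. a space becomes %20.
        return new URI(scheme, host, path.toString(), null);
    }
}
```

For the question's URL, SimpleUriTemplate.expand("http", "localhost", "/scans/{scan_id}/launch", Map.of("scan_id", "42324")).toURL() yields a URL that HttpURLConnection can open directly.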

SSL peer shut down incorrectly

This is my first post here. I am a hobbyist so please bear with me.
I am attempting to grab a webpage from https://eztv.it/shows/1/24/ with the following code.
public static void WriteHTMLToFile(String URL){
try {
URI myURI=new URI(URL);
URL url = myURI.toURL();
HttpsURLConnection con= (HttpsURLConnection)url.openConnection();
File myFile=new File("c:\\project\\Test.txt");
myFile.createNewFile();
FileWriter wr=new FileWriter(myFile);
InputStream ins=con.getInputStream();
InputStreamReader isr= new InputStreamReader(ins);
BufferedReader reader = new BufferedReader(isr);
String line;
while ((line = reader.readLine()) != null) {
wr.write(line+"\n");
}
reader.close();
wr.close();
}
catch(Exception e){
log(e.toString());
}
}
When I run this I get the following:
javax.net.ssl.SSLException: SSL peer shut down incorrectly
If I run the above code on this URL: https://eztv.it/shows/887/the-blacklist/ it works as intended. The difference in size between the two pages seems to be a contributing factor: when testing different URLs on the same server, the above code only seemed to work for files smaller than ~30 KB. Anything larger generated the above exception.
I figured it out. The server is responding with gzip encoding once file sizes are over a certain size.
con.setRequestProperty("Accept-Encoding", "gzip, deflate, sdch");
was added to the request header as well as some code to handle the gzip stream.
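The gzip-handling code itself isn't shown. A minimal sketch, assuming you pick the decoder from the response's Content-Encoding header (available as URLConnection.getContentEncoding()); ContentDecoding is a hypothetical name. It only understands gzip and deflate, so it is safer to advertise just "gzip, deflate": the JDK has no decoder for sdch, and an sdch-encoded response would be unreadable. "deflate" is assumed to be the zlib-wrapped format the HTTP specification defines, which is what InflaterInputStream expects by default.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

final class ContentDecoding {
    private ContentDecoding() {}

    /** Wraps the raw body according to its Content-Encoding value (may be null). */
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding == null) {
            return raw; // no header: the body is not compressed
        }
        switch (contentEncoding.trim().toLowerCase(Locale.ROOT)) {
            case "":
            case "identity":
                return raw;
            case "gzip":
            case "x-gzip":
                return new GZIPInputStream(raw);
            case "deflate":
                return new InflaterInputStream(raw); // zlib-wrapped deflate
            default:
                // Fail loudly instead of handing compressed bytes to a text reader.
                throw new IOException("Unsupported Content-Encoding: " + contentEncoding);
        }
    }
}
```

In WriteHTMLToFile the stream would then come from ContentDecoding.decode(con.getInputStream(), con.getContentEncoding()) before being wrapped in the InputStreamReader.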

Why HttpURLConnection does not send data unless I try to receive something

I cannot understand why the following code does not put a packet onto the wire (confirmed via Wireshark). As far as I know, it is a fairly standard way of sending an HTTP POST request. I don't intend to read anything, just POST.
private void sendRequest() throws IOException {
String params = "param=value";
URL url = new URL(otherUrl.toString());
HttpURLConnection con = (HttpURLConnection)url.openConnection();
con.setDoOutput(true);
con.setDoInput(true); //setting this to `false` does not help
con.setRequestMethod("POST");
con.setRequestProperty("Content-Type", "text/plain");
con.setRequestProperty("Content-Length", "" + Integer.toString(params.getBytes().length));
con.setRequestProperty("Accept", "text/plain");
con.setUseCaches(false);
con.connect();
DataOutputStream wr = new DataOutputStream(con.getOutputStream());
wr.writeBytes(params);
wr.flush();
wr.close();
//Logger.getLogger("log").info("URL: "+url+", response: "+con.getResponseCode());
con.disconnect();
}
What happens is... actually nothing, unless I try to read something, for example by uncommenting the log line above, which reads the response code. Trying to read a response via con.getInputStream() also works. Otherwise there is no movement of packets. When I uncomment the getResponseCode() call, I can see that the HTTP POST is sent and then 200 OK is sent back. The order is correct, i.e. I don't get some wild response before sending the POST. Everything else looks exactly the same (I can attach Wireshark screenshots if needed). In the debugger the code executes, i.e. it does not block anywhere.
I don't understand under what circumstances this can happen. I believe it should be possible to send a POST request with con.setDoInput(false). Currently it either doesn't send anything or fails with an exception (when trying to execute con.getResponseCode()), because I obviously promised I won't read anything.
It might be relevant that before sendRequest I request some data from the same site, but I trust I close everything properly, i.e.:
public static String getData(String urlAddress) throws MalformedURLException, IOException {
URL url = new URL(urlAddress);
HttpURLConnection con = (HttpURLConnection)url.openConnection();
con.setDoOutput(false);
InputStream in = con.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder data = new StringBuilder();
String line;
while((line = reader.readLine()) != null) {
data.append(line);
}
reader.close();
in.close();
con.getResponseCode();
con.disconnect();
return data.toString();
}
The server for url in both cases is the same, port also, so I believe it is possible to use the same socket for communication. The above code works and retrieves the data properly.
I am not sure; maybe I don't clean something up and it gets cached, so without an explicit read the POST gets delayed. There is no other traffic on the socket.
Unless you're using fixed-length or chunked transfer mode, HttpURLConnection will buffer all your output until you call getInputStream() or getResponseCode(), so that it can send a correct Content-Length header.
If you call getResponseCode() you should have a look at its value.
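Applied to the question's sendRequest(), a minimal sketch: declare the body length with setFixedLengthStreamingMode() so the request is streamed instead of buffered, and still call getResponseCode() so the exchange completes and you learn whether it succeeded. Setting Content-Length by hand, as the question does, has no effect: HttpURLConnection treats it as a restricted header and ignores it by default. FixedLengthPost is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class FixedLengthPost {
    private FixedLengthPost() {}

    /** POSTs the body and returns the HTTP status code. */
    static int post(URL url, String params) throws IOException {
        byte[] body = params.getBytes(StandardCharsets.UTF_8);
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "text/plain; charset=UTF-8");
        // Known length: headers and body go out as they are written,
        // and HttpURLConnection sets Content-Length itself.
        con.setFixedLengthStreamingMode(body.length);
        try (OutputStream out = con.getOutputStream()) {
            out.write(body);
        }
        // Reading the status completes the exchange and reports success or failure.
        int status = con.getResponseCode();
        // Drain any response body so the underlying connection can be reused.
        InputStream in = status >= 400 ? con.getErrorStream() : con.getInputStream();
        if (in != null) {
            try (InputStream r = in) {
                r.readAllBytes();
            }
        }
        return status;
    }
}
```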

Posting minutiae byte array from applet to server

In a Grails web application, I am trying to post a minutiae (fingerprint) byte array from an applet to the server using a REST API.
This is what I have tried so far:
private String post(String purl,String customerId, byte[] regMin1,byte[] regMin2) throws Exception {
StringBuilder parameters = new StringBuilder();
parameters.append("customerId=");
parameters.append(customerId);
parameters.append("&regMin1=");
parameters.append(URLEncoder.encode(new String(regMin1),"UTF-8"));
parameters.append("&regMin2=");
parameters.append(URLEncoder.encode(new String(regMin2),"UTF-8"));
URL url = new URL(purl);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setDoOutput(true);
connection.setDoInput(true);
connection.setRequestMethod("POST");
connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
connection.setRequestProperty("Content-Length",Integer.toString(parameters.toString().getBytes().length));
DataOutputStream wr = new DataOutputStream(connection.getOutputStream ());
wr.writeBytes(parameters.toString());
wr.flush();
wr.close();
BufferedReader in = new BufferedReader(new InputStreamReader(
connection.getInputStream()));
StringBuilder builder = new StringBuilder();
String aux = "";
while ((aux = in.readLine()) != null) {
builder.append(aux);
}
in.close();
connection.disconnect();
return builder.toString();
}
I can post regMin1 and regMin2 successfully, but fingerprint verification always fails. I doubt whether I am posting correctly.
This looks like a very bad idea to me:
parameters.append(URLEncoder.encode(new String(regMin1),"UTF-8"));
...
parameters.append(URLEncoder.encode(new String(regMin2),"UTF-8"));
If regMin1 and regMin2 aren't actually UTF-8 text (and my guess is that they're not) you'll almost certainly be losing data here.
Don't treat arbitrary binary data as if it's encoded text.
Instead, convert regMin1 and regMin2 to base64 - that way you'll end up with ASCII characters which you can then decode on the server to definitely get the original binary data. You can use a URL-safe version of base64 to avoid having to worry about further encoding the result.
There's a good public domain base64 library you can use for this if you don't have anything else to hand. So for example:
parameters.append("&regMin1=")
.append(Base64.encodeBytes(regMin1, Base64.URL_SAFE))
.append("&regMin2=")
.append(Base64.encodeBytes(regMin2, Base64.URL_SAFE));
Note that you'd want to decode with the URL_SAFE option as well - don't just try to decode it as "normal" base64 data.
(You might still want to convert this to a POST request, and you'd definitely have an easier time if you could use a better HTTP library, but they're slightly separate concerns.)
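On Java 8 and later, the JDK's own java.util.Base64 has a URL-safe variant, so a third-party library isn't strictly needed. A minimal sketch (MinutiaeCodec is a hypothetical name): it drops the '=' padding so the value needs no further escaping in a form body, and the JDK's URL-safe decoder accepts unpadded input. The server must decode with the URL-safe decoder as well.

```java
import java.util.Base64;

final class MinutiaeCodec {
    private MinutiaeCodec() {}

    /** Encodes raw bytes as unpadded URL-safe Base64 (A-Z a-z 0-9 - _). */
    static String encode(byte[] raw) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }

    /** Counterpart for the receiving side: must also use the URL-safe alphabet. */
    static byte[] decode(String text) {
        return Base64.getUrlDecoder().decode(text);
    }
}
```

On the client, MinutiaeCodec.encode(regMin1) would take the place of URLEncoder.encode(new String(regMin1), "UTF-8"), and the Grails side would call Base64.getUrlDecoder().decode(...) on the received parameter to get the original bytes back.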

How to get the Response Back from the Remote Server using an HttpURLConnection Object?

I am trying to send an HTTP POST request to a remote server using an instance of the HttpURLConnection class. Although I am able to get a response code and a response message, when I try to write the input stream into a StringBuffer, I am not able to actually read any lines.
When I analyzed the packets in Wireshark, I noticed that the remote server was sending a full response. My only guess as to why I can't see it in the Java program is that I try to read from the InputStream too late.
So, how do I read the immediate, full response from the remote server using my HttpURLConnection object? Below is the code that I am using:
HttpURLConnection conn = null;
String urlStr = "...";
URL url = null;
try
{
url = new URL(urlStr);
conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.setDoInput(true);
conn.setDoOutput(true);
...
BufferedReader rd = new BufferedReader(
new InputStreamReader(conn.getInputStream()));
StringBuilder sb = new StringBuilder();
String line;
while ((line = rd.readLine()) != null)
{
sb.append(line);
}
rd.close();
...
}...
Okay, never mind. It turns out that what I was looking for was in the HTTP response's headers, so I got what I needed by looking through them. ::Face Palm::
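For anyone who lands here with the same problem: the response headers are available without reading the body at all. A minimal sketch that lists them (HeaderDump is a hypothetical name). getHeaderFields() maps each header name to its values and uses a null key for the status line, which is skipped here; for a single known header, getHeaderField(name) is enough.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class HeaderDump {
    private HeaderDump() {}

    /** Returns every response header as a "Name: value" line. */
    static List<String> headerLines(HttpURLConnection con) throws IOException {
        con.getResponseCode(); // make sure the request was sent and the response headers read
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> header : con.getHeaderFields().entrySet()) {
            if (header.getKey() == null) {
                continue; // the null key holds the status line, e.g. "HTTP/1.1 200 OK"
            }
            for (String value : header.getValue()) {
                lines.add(header.getKey() + ": " + value);
            }
        }
        return lines;
    }
}
```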
