Incorrect Java HttpClient's response stream - java

In my application I need to parse a website and save some data from it to the database. I am using HttpClient to get the page content. My code looks like this:
HttpClient client = new DefaultHttpClient();
System.out.println(doc.getUrl());
HttpGet contentGet= new HttpGet(siteUrl + personUrl);
HttpResponse response = client.execute(contentGet);
String html = convertStreamToString(response.getEntity().getContent());
/*
parse the page
*/
/***********************************************************************/
public static String convertStreamToString(InputStream is) throws Exception {
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
StringBuilder sb = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
sb.append(line + "\n");
}
is.close();
return sb.toString();
}
I am doing this in a loop: I try to get the content of several pages (their structure is the same). Sometimes it works fine, but in many cases the response is a sequence of garbage like this:
�=�v7���9�Hdz$�d7/�$�st��؎I��X^�$A6t_D���!gr�����C^��k#��MQ�2�d�8�]
I don't know where the problem is; please help me.
I have displayed headers of all responses that I got. For correct ones, there are:
Server : nginx/1.0.13
Date : Sat, 23 Mar 2013 21:50:31 GMT
Content-Type : text/html; charset=utf-8
Transfer-Encoding : chunked
Connection : close
Vary : Accept-Encoding
Expires : Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control : no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma : no-cache
Set-Cookie : pfSC=1; path=/; domain=.profeo.pl
Set-Cookie : pfSCvp=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.profeo.pl
For incorrect ones:
Server : nginx/1.2.4
Date : Sat, 23 Mar 2013 21:50:33 GMT
Content-Type : text/html
Transfer-Encoding : chunked
Connection : close
Set-Cookie : pfSCvp=3cff2422fd8f9b6e57e858d3883f4eaf; path=/; domain=.profeo.pl
Content-Encoding : gzip
Any other suggestions? My guess is that this gzip encoding is a problem here, but what can I do about it?

This probably has to do with some websites using a different character encoding in their response than your JVM default. To convert from a raw byte stream, like those provided by InputStreams, to a character stream (or a String), you have to choose a character encoding. HTTP responses can use different encodings, but they'll typically tell you which one they're using. You could do this manually by reading the charset parameter of the "Content-Type" header of the HttpResponse (the "Content-Encoding" header describes compression, not the character set), but your library provides a utility for doing this, since it's a common need. It's found in the EntityUtils class, and you can use it like so:
String html = EntityUtils.toString(response.getEntity());
You'll have to add
import org.apache.http.util.EntityUtils;
to the top of your file for that to work.
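For illustration, here is roughly what choosing the character encoding from the response headers involves. This is a minimal sketch using only the JDK, assuming the charset arrives as a charset= parameter of the Content-Type value (as in the "text/html; charset=utf-8" response shown in the question). The class and method names are hypothetical; EntityUtils already does this for you.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentTypeCharset {
    // Hypothetical helper: returns the charset named in a Content-Type value,
    // or the fallback when no charset is declared or the name is not supported.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive match on the "charset=" prefix.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/html", StandardCharsets.ISO_8859_1));
    }
}
```

The fallback matters for responses like the failing one in the question, whose Content-Type is plain "text/html" with no charset.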
If that doesn't help, another possibility is that some of the URLs you're retrieving are binary, not textual, in which case the things you're trying to do don't make sense. If that's the case, you can possibly try to distinguish between the textual responses and the binary responses by checking Content-Type header, like so:
boolean isTextual = response.getFirstHeader("Content-Type").getValue().startsWith("text");
NEW MATERIAL:
After looking at the HTTP headers you added to your question, my best guess is that this is being caused by gzip compression of the responses. You can find more info on how to deal with that in this question, but the short version is that you should try using ContentEncodingHttpClient instead of DefaultHttpClient.
Another edit: ContentEncodingHttpClient is now deprecated, and you're supposed to use DecompressingHttpClient instead.
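If you would rather handle it by hand, the idea is to wrap the entity stream in a GZIPInputStream whenever the Content-Encoding header says gzip. The following is only a sketch with the plain JDK: the class and method names are hypothetical, the body is assumed to be UTF-8, and the gzip helper exists only so the example can run without a network call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBody {
    // Reads the whole body, decompressing it first when the server
    // declared "Content-Encoding: gzip". Assumes the text is UTF-8.
    static String readBody(InputStream raw, String contentEncoding) {
        try (InputStream in = "gzip".equalsIgnoreCase(contentEncoding)
                ? new GZIPInputStream(raw) : raw) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo-only helper that compresses text the way a server would.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("<html>ok</html>");
        System.out.println(readBody(new ByteArrayInputStream(compressed), "gzip"));
    }
}
```

With the code from the question, the second argument would come from the response's Content-Encoding header, which may be absent (null) for the uncompressed responses.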

You need an HttpClient that doesn't use compression.
I use this one: HttpClientBuilder.create().disableContentCompression().build()

Related

Getting error -> Invalid header field name, with 32 [duplicate]

I am new to Flutter and I am trying to call my ASP.NET server web API.
From the logs on my server, everything goes fine but Android Studio throws an exception: "invalid header field name".
Here is the code in dart:
import 'package:http/http.dart' as http;
...
_getService() async {
String result;
try {
var url = 'http://192.168.1.14:34263/api/Mobile/test/1';
Future<http.Response> response = http.get( url );
result = response.toString();
} catch(exception){
result = exception.toString();
debugPrint(result);
}
...
}
Here is the response header (obtained via Chrome):
Access-Control-Allow-Headers:accept, authorization, Content-Type
Access-Control-Allow-Methods: GET, POST, OPTIONS, PUT, PATCH, DELETE
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: WWW-Authenticate
Cache-Control: no-cache
Content-Encoding: deflate
Content-Length:79
Content-Type: application/xml; charset=utf-8
Date: Thu, 08 Mar 2018 01:01:25 GMT
Expires:-1
Pragma:no-cache
Server:MyTestServer
X-Content-Type-Options:NOSNIFF
X-Permitted-Cross-Domain-Policies:master-only
X-SourceFiles:=?UTF-8?BDpcTXlJbmNyZWRpYmxlRHJlc3NpbmdcTXlJbmNyZWRpYmxlRHJlc3NpbmdcTXlJbmNyZWRpYmxlRHJlc3NpbmdcYXBpXE1vYmlsZVxjb3Vjb3VcMQ==?=
X-XSS-Protection:1;mode=block
Here is the answer which is returned:
<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">test</string>
Can anyone tell me what am I doing wrong?
Many thanks
Ok, I finally found out, by debugging the code.
In fact, my server added a series of field names in the response's header (via the Web.config) and the last character of one of these field names was a space.
As a result, http_parser.dart threw an exception, since spaces are not allowed in a header field name.
Nothing was detected by Chrome (or any browser) nor by Postman.
I had a similar problem, and after some heavy debugging
I removed these headers from nginx:
#add_header X−Content−Type−Options nosniff;
#add_header X−Frame−Options SAMEORIGIN;
#add_header X−XSS−Protection 1;
and it works fine. So most likely it's a backend, header-related issue.
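Both answers come down to a header field name containing a character that strict parsers reject. A field name must be an HTTP "token": ASCII letters, digits and a small set of symbols, with no spaces (the "32" in the error is the character code of a space). The sketch below checks that rule; it is written in Java purely for illustration, with hypothetical names, and is not the Dart parser's code. The nginx header names quoted above appear to contain a Unicode minus sign (U+2212) rather than an ASCII hyphen, which would fail the same check.

```java
class HeaderNameCheck {
    // Symbols allowed in an HTTP token besides ASCII letters and digits.
    private static final String TOKEN_SYMBOLS = "!#$%&'*+-.^_`|~";

    // Returns true when the name consists only of token characters.
    static boolean isValidFieldName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean alnum = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (!alnum && TOKEN_SYMBOLS.indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidFieldName("X-Content-Type-Options"));   // ASCII hyphens
        System.out.println(isValidFieldName("X-Custom-Header "));         // trailing space
        System.out.println(isValidFieldName("X\u2212Frame\u2212Options")); // Unicode minus
    }
}
```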

How do I download a document from an envelope in DocuSign?

I need to download a document from an envelope using the rest services of DocuSign.
The system I'm working on is written in JavaScript but uses Java libraries.
I'm calling the method via java.net.URL, and I can't get the bytes of the file to use in the system.
I tried to read the returned InputStream, but it doesn't return XML with base64.
var url = new java.net.URL('https://demo.docusign.net/restapi/v2.1/accounts/0c2ddaae-e258-4ade-a435-e4ee50fd2542/envelopes/c60565e2-40d9-43f3-bb2d-58e086c20fca/documents/1');
var connection = url.openConnection();
connection.setDoOutput(true);
connection.setRequestMethod("GET");
connection.setRequestProperty("X-DocuSign-Authentication", '{"Username":"user","Password":"password=","IntegratorKey": "guid"}');
connection.setRequestProperty("content-type", "text/xml;charset=UTF-8");
if(connection.getResponseCode() == 200){
try{
var retorno = new java.io.BufferedReader(new java.io.InputStreamReader(connection.getInputStream()));
var retData = new java.lang.StringBuilder();
var line;
while((line = retorno.readLine()) != null){
retData.append(line);
}
var strData = retData.toString();
When I use the SoapUi, I receive this:
JVBERi0xLjQKJfv8/blablalblalblalba
But in my code, I receive something like this:
HTTP/1.1 200 OK Cache-Control: no-cache Content-Length: 122448
Content-Type: application/pdf X-RateLimit-Reset: 1561921200
X-RateLimit-Limit: 1000 X-RateLimit-Remaining: 955
X-DocuSign-TraceToken: c5710b05-b13c-460f-b04a-1e683471934e
Content-Disposition: file; filename=blank1.pdf; documentid=1 Date:
Sun, 30 Jun 2019 18:51:47 GMT
You use the EnvelopeDocuments::get API method.
See the code example.
See the Java example implementation
Note that documentId can be the id of a specific document in the envelope or one of the reserved values:
combined -- will download a single PDF containing all of the envelope's documents
archive -- will download a zip file.
Your code example implies that you are trying to optimize the download by streaming the data to your destination. These days, with cheap memory (real and virtual), I suggest that you simply download the document to memory and then deal with it.
You can later optimize to use streams if necessary.
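A sketch of the "download to memory" approach in plain Java follows. One thing to note about the question's code: the response is application/pdf, and pushing binary data through a Reader and readLine() decodes it as text and drops line breaks, which damages the file, so the bytes should be read directly. The SoapUI output in the question looks like the base64 form of those bytes ("JVBERi0x" decodes to "%PDF-1"). The class and method names here are hypothetical, and the sample bytes stand in for connection.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DocumentBytes {
    // Reads the stream to the end as raw bytes, with no character decoding.
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Base64 text form, in case the rest of the system expects it.
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // Stand-in for the first bytes of a PDF response body.
        byte[] sample = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] document = readAll(new ByteArrayInputStream(sample));
        System.out.println(toBase64(document)); // prints "JVBERi0xLjQK"
    }
}
```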

HttpPost form submission error

I'm trying to log in to a grade database with HttpClient. I send it valid LogOnDetails.Username and LogOnDetails.Password information, but whenever I submit, it consistently sends back errors. I'm not sure where it's going wrong: it may log in successfully and then hang up, or it may not even be getting in. Here is the HTML it outputs after the request:
output
and this is my code:
HttpClient client = new DefaultHttpClient();
HttpPost post = new HttpPost("https://home-access.cfisd.net/HomeAccess/Account/LogOn");
List<NameValuePair> list = new ArrayList<NameValuePair>();
list.add(new BasicNameValuePair("LogOnDetails.Username", "s491670"));
list.add(new BasicNameValuePair("LogOnDetails.Password","qrrp4ji6t"));
post.setEntity(new UrlEncodedFormEntity(list));
HttpResponse response = client.execute(post);
BufferedReader file = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));
PrintWriter pw = new PrintWriter(new FileWriter(new File("output.txt")));
String line = null;
while((line = file.readLine())!=null)
pw.println(line);
pw.close();
If anyone could shed some light on this, I'd love them forever. If the HTML for the login form is needed, let me know. Thanks!
When the form submits the following form data is sent:
Form Data
Database:10
LogOnDetails.UserName:sadf
LogOnDetails.Password:sdf
Add the following:
list.add(new BasicNameValuePair("Database", "10")); // or 20
As you are well aware I can't test this.
If you add @robbmj's patch, you do not get the 500 error page, but a 302 (moved), which is a good start:
<html><head><title>Object moved</title></head><body>
<h2>Object moved to here.</h2>
</body></html>
You can advise the http-client to handle redirects by itself, but depending on the version of http-client, handling redirects is done differently (they're always refactoring this code). Which version are you using?
Maybe the 302 means that it was all successful and I'm logged in now. ;)
Headers returned:
Cache-Control : private
Content-Type : text/html; charset=utf-8
Location : /HomeAccess/
Server : Microsoft-IIS/7.5
X-AspNetMvc-Version : 4.0
X-AspNet-Version : 4.0.30319
Set-Cookie : ASP.NET_SessionId=hvjw3jqjoaa5ohofaaxu4od1; path=/; HttpOnly
Set-Cookie : .AuthCookie=; expires=Tue, 12-Oct-1999 05:00:00 GMT; path=/; HttpOnly
Set-Cookie : .AuthCookie=0863B972684CC784E4D9D5594354B6F08FF6FF7225836F01A9715D0ABA633042946B032987F7926588610F5FB7C18757CE759338B75E341DF56DB3FB71BC326B3D6E49EA94EEE43B39FCC84BB98F236CA0D63CE668E14434169C6B835FA671DD; path=/; HttpOnly
X-Powered-By : ASP.NET
Date : Thu, 12 Mar 2015 23:37:12 GMT
Content-Length : 129
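If you end up following the redirect yourself, the Location value above is relative, so it has to be resolved against the URL that was requested. A minimal sketch with java.net.URI; the class and method names are hypothetical, and a real follow-up request would also need to send back the cookies set in this response.

```java
import java.net.URI;

class RedirectTarget {
    // Status codes that carry a Location header to follow.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Resolves a possibly relative Location value against the request URL.
    static String resolve(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    public static void main(String[] args) {
        String requested = "https://home-access.cfisd.net/HomeAccess/Account/LogOn";
        if (isRedirect(302)) {
            System.out.println(resolve(requested, "/HomeAccess/"));
        }
    }
}
```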

Restlet Can't parse payload of a GET request

I am using Restlet in java as a simple client to make calls to a RESTful service provided by tastypie (python django)
I am monitoring the payload through tcpmon and I can see the following payload being returned; however, I can't get at the data in it.
Client calls
ClientResource resource = new ClientResource("http://localhost/someplace/");
String rep = resource.get(String.class);
Payload
HTTP/1.0 200 OK
Date: Thu, 12 Jul 2012 20:22:12 GMT
Server: WSGIServer/0.1 Python/2.7.1
Vary: Cookie
Content-Type: application/json; charset=utf-8
Set-Cookie: sessionid=63c5ea23113073e489cb8920819f37d; expires=Thu, 26-Jul-2012 20:22:12 GMT; httponly; Max-Age=1209600; Path=/
{"jsonData": "someData"}
When I debug into Restlet, I notice that InboundWay.createEntity(..) sets the data to EmptyRepresentation because the response has no length, chunked-encoding, or connection headers. Why is this? Why can't I just stream the data?
Does anyone know why this is happening? Is there a better client? So far I have tried Jersey and RESTEasy with limited success. (Jersey has a bug where it tries to mark and reset an autoCloseStream, and RESTEasy was just a pain to use as a client.) I was hoping not to have to write an HTTP client myself.
I did a bit more digging, and it looks like Restlet is doing something quite silly, in my opinion.
It looks at the response headers and checks whether it knows the size, whether the response uses chunked encoding, or whether Connection: close is specified. Otherwise it does NOT try to parse the payload. Does anyone know why this is? I didn't realize that those headers are required in any way. Why can't we use the ClosingInputStream when Connection: close is not specified?
Restlet Code
public Representation createInboundEntity(Series<Parameter> headers) {
Representation result = null;
long contentLength = HeaderUtils.getContentLength(headers);
boolean chunkedEncoding = HeaderUtils.isChunkedEncoding(headers);
// In some cases there is an entity without a content-length header
boolean connectionClosed = HeaderUtils.isConnectionClose(headers);
// Create the representation
if ((contentLength != Representation.UNKNOWN_SIZE && contentLength != 0)
|| chunkedEncoding || connectionClosed) {
InputStream inboundEntityStream = getInboundEntityStream(
contentLength, chunkedEncoding);
ReadableByteChannel inboundEntityChannel = getInboundEntityChannel(
contentLength, chunkedEncoding);
...
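For context on the headers involved: when a response carries neither Content-Length nor Transfer-Encoding, as in the HTTP/1.0 payload above, the body simply runs until the server closes the connection. The sketch below shows that framing rule with the plain JDK. It is not Restlet's code, the names are hypothetical, and it assumes a UTF-8 body and CRLF line endings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class CloseDelimitedBody {
    // Reads a raw HTTP response until end of stream (connection close)
    // and returns everything after the blank line that ends the headers.
    static String bodyOf(InputStream rawResponse) {
        byte[] all;
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = rawResponse.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            all = out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // ISO-8859-1 maps one byte to one char, so indexes equal byte offsets.
        String text = new String(all, StandardCharsets.ISO_8859_1);
        int split = text.indexOf("\r\n\r\n");
        if (split < 0) {
            return "";
        }
        int start = split + 4;
        return new String(all, start, all.length - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.0 200 OK\r\n"
                + "Content-Type: application/json; charset=utf-8\r\n"
                + "\r\n"
                + "{\"jsonData\": \"someData\"}";
        InputStream in = new ByteArrayInputStream(raw.getBytes(StandardCharsets.UTF_8));
        System.out.println(bodyOf(in));
    }
}
```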

Http Post not posting data

I'm trying to post some data from a Java client using sockets. It talks to localhost running php code, that simply spits out the post params sent to it.
Here is Java Client:
public static void main(String[] args) throws Exception {
Socket socket = new Socket("localhost", 8888);
String reqStr = "testString";
String urlParameters = URLEncoder.encode("myparam="+reqStr, "UTF-8");
System.out.println("Params: " + urlParameters);
try {
Writer out = new OutputStreamWriter(socket.getOutputStream(), "UTF-8");
out.write("POST /post3.php HTTP/1.1\r\n");
out.write("Host: localhost:8888\r\n");
out.write("Content-Length: " + Integer.toString(urlParameters.getBytes().length) + "\r\n");
out.write("Content-Type: text/html\r\n\n");
out.write(urlParameters);
out.write("\r\n");
out.flush();
InputStream inputstream = socket.getInputStream();
InputStreamReader inputstreamreader = new InputStreamReader(inputstream);
BufferedReader bufferedreader = new BufferedReader(inputstreamreader);
String string = null;
while ((string = bufferedreader.readLine()) != null) {
System.out.println("Received " + string);
}
} catch(Exception e) {
e.printStackTrace();
} finally {
socket.close();
}
}
This is how post3.php looks like:
<?php
$post = $_REQUEST;
echo print_r($post, true);
?>
I expect to see an array (myparams => "testString") as the response. But it's not passing the post args to the server.
Here is output:
Received HTTP/1.1 200 OK
Received Date: Thu, 25 Aug 2011 20:25:56 GMT
Received Server: Apache/2.2.17 (Unix) mod_ssl/2.2.17 OpenSSL/0.9.8r DAV/2 PHP/5.3.6
Received X-Powered-By: PHP/5.3.6
Received Content-Length: 10
Received Content-Type: text/html
Received
Received Array
Received (
Received )
Just a FYI, this setup works for GET requests.
Any idea whats going on here?
As Jochen and chesles rightly point out, you are using the wrong Content-Type: header - it should indeed be application/x-www-form-urlencoded. However there are several other issues as well...
The last header should be separated from the body by a blank line. Each line ending must be a complete CRLF (\r\n), so the headers end with \r\n\r\n; in your code the final line ending is just a newline (\n). This is an outright protocol violation and I'm a little surprised you haven't just got a 400 Bad Request back from the server, although Apache can be quite forgiving in this respect.
You should specify Connection: close to ensure that you are not left hanging around with open sockets, the server will close the connection as soon as the request is complete.
The final CRLF sequence is not required. PHP is intelligent enough to sort this out by itself, but other server languages and implementations may not be...
If you are working with any standardised protocol in its raw state, you should always start by at least scanning over the RFC.
Also, please learn to secure your Apache installs...
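Put together, the points above give a request shaped like the one this sketch builds: CRLF after every header line, an empty line before the body, Content-Length counted in bytes, Connection: close, and nothing after the body. The class and method names are hypothetical, and the body is assumed to be form-encoded already.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class RawPostRequest {
    // Builds the bytes of a minimal HTTP/1.1 form POST.
    static byte[] build(String host, String path, String formBody) {
        byte[] body = formBody.getBytes(StandardCharsets.UTF_8);
        String head = "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: " + body.length + "\r\n" // bytes, not chars
                + "Connection: close\r\n"
                + "\r\n"; // empty line ends the headers
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headBytes, 0, headBytes.length);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] request = build("localhost:8888", "/post3.php", "myparam=testString");
        System.out.print(new String(request, StandardCharsets.UTF_8));
    }
}
```

In the question's client, these bytes would be written to socket.getOutputStream() and flushed before reading the response.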
It looks like you are trying to send data in application/x-www-form-urlencoded format, but you are setting the Content-Type to text/html.
Use
out.write("Content-Type: application/x-www-form-urlencoded\r\n\r\n");
instead. As this page states:
The Content-Length and Content-Type headers are critical because they tell the web server how many bytes of data to expect, and what kind, identified by a MIME type.
For sending form data, i.e. data in the format key=value&key2=value2 use application/x-www-form-urlencoded. It doesn't matter if the value contains HTML, XML, or other data; the server will interpret it for you and you'll be able to retrieve the data as usual in the $_POST or $_REQUEST arrays on the PHP end.
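One detail worth checking when building the key=value&key2=value2 string: each key and each value is encoded separately, and the "=" and "&" separators are left alone. The question's code passes the whole "myparam=testString" string through URLEncoder.encode, which turns "=" into %3D, so the server no longer sees a key/value pair. A small sketch, with hypothetical class and method names:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormEncoder {
    // Encodes each key and value on its own and joins the pairs with '&'.
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on the JVM.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("myparam", "testString");
        fields.put("note", "a b&c");
        System.out.println(encode(fields)); // prints "myparam=testString&note=a+b%26c"
    }
}
```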
Alternatively, you can send your data as raw HTML, XML, etc. using the appropriate Content-Type header, but you then have to retrieve the data manually in PHP by reading the special file php://input:
<?php
echo file_get_contents("php://input");
?>
As an aside, if you're using this for anything sufficiently complex, I would strongly recommend the use of an HTTP client library like HTTPClient.
