okhttp content-length is -1 with big files - java

I am downloading a file with okhttp and things work fine - now I want to show the progress and hit a road-bump. The returned content-length is -1.
It comes back correctly from the server:
$ curl -i http://ipfs.io/ipfs/QmRMHb4Vhv8LtYqw8RkDgkdZYxJHfrfFeQaHbNUqJYmdF2
HTTP/1.1 200 OK
Date: Tue, 14 Jun 2016 11:38:16 GMT
Content-Type: application/octet-stream
Content-Length: 27865948
I traced the problem down to OkHeaders.java here:
public static long contentLength(Headers headers) {
return stringToLong(headers.get("Content-Length"));
}
I see all the other headers in headers, but not Content-Length, so headers.get("Content-Length") returns null. Does anyone have a clue how it can get lost?
Interestingly, if I change the URL to "http://google.com" I do get a content length from okhttp, even though with curl both responses look the same as far as Content-Length is concerned. This really confuses me.
Update: it seems to correlate with the size of the file. If I request smaller content from the same server, I get a Content-Length with okhttp. The problem only happens when the file is big.

It looks like the server switches to chunked transfer encoding above a certain size, and a chunked response carries no Content-Length header:
HTTP/1.1 200 OK
Date: Tue, 14 Jun 2016 14:30:07 GMT
Content-Type: application/octet-stream
Transfer-Encoding: chunked
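Since a chunked response never announces its total size, progress reporting has to cope with the -1. The following is only a minimal sketch of one way to do that, using nothing but the JDK; DownloadProgress and describe are hypothetical names, not part of okhttp. It shows a percentage when the length is known and falls back to a plain byte count otherwise.

```java
final class DownloadProgress {
    /** Sentinel for "length unknown", matching the -1 reported in the question. */
    static final long UNKNOWN_LENGTH = -1L;

    /**
     * Builds a progress label. A non-positive contentLength is treated as
     * unknown (this also avoids dividing by zero for empty bodies).
     */
    static String describe(long bytesRead, long contentLength) {
        if (contentLength <= 0) {
            return bytesRead + " bytes (total unknown)";
        }
        long percent = Math.min(100, bytesRead * 100 / contentLength);
        return percent + "% (" + bytesRead + "/" + contentLength + " bytes)";
    }

    public static void main(String[] args) {
        // Known length, as in the curl output from the question.
        System.out.println(describe(13932974L, 27865948L));
        // Chunked response: no Content-Length, so the length is -1.
        System.out.println(describe(13932974L, UNKNOWN_LENGTH));
    }
}
```

In a UI this would typically map to a determinate progress bar in the first case and an indeterminate one in the second.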

Related

Apache HTTP Client throws NoHttpResponseException When Nginx Ingress Reloaded for POST

When we reload the Nginx Ingress config, we get a NoHttpResponseException for some of our POST requests. This does not occur with either the OkHttp client or plain ab -c 100 -n 1000 https://...
I am using HttpClient 4.5.7, the latest version, with gzip compression disabled for visibility. I put a breakpoint in DefaultHttpResponseParser at:
@Override
protected HttpResponse parseHead(
final SessionInputBuffer sessionBuffer) throws IOException, HttpException {
//read out the HTTP status string
int count = 0;
ParserCursor cursor = null;
do {
// clear the buffer
this.lineBuf.clear();
final int i = sessionBuffer.readLine(this.lineBuf);
if (i == -1 && count == 0) {
// The server just dropped connection on us
throw new NoHttpResponseException("The target server failed to respond");
}
When an error occurs, we observe the buffer has the following contents:
0
1.1 200 OK
Server: nginx/1.15.5
Date: Tue, 19 Mar 2019 08:51:27 GMT
Content-Type: application/json;charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Strict-Transport-Security: max-age=15724800; includeSubDomains
10
{"success":true}
But for the regular requests, it has the following contents, which makes more sense:
HTTP/1.1 200 OK
Server: nginx/1.15.5
Date: Tue, 19 Mar 2019 08:52:30 GMT
Content-Type: application/json;charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Strict-Transport-Security: max-age=15724800; includeSubDomains
10
{"success":true}
Now, I am not sure what is wrong, because both okhttp and ab work correctly. I have tried many versions, but the problem remains.
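To help read the buffer dumps above: in a chunked body, each chunk is preceded by its size in hexadecimal, so the 10 before {"success":true} means 16 bytes, and a chunk of size 0 ends the body. The lone 0 at the start of the failing buffer therefore looks like the terminating chunk of an earlier response on the same connection, although that is an assumption and not something the dump proves. The sketch below decodes a chunked body with plain JDK code; ChunkedBodySketch is a hypothetical name, and it assumes an ASCII body with no trailer headers.

```java
final class ChunkedBodySketch {
    /**
     * Decodes a chunked message body given as a string with CRLF line endings.
     * Assumes ASCII content (so chars and bytes coincide) and ignores trailers.
     */
    static String decode(String wire) {
        StringBuilder body = new StringBuilder();
        int pos = 0;
        while (true) {
            int eol = wire.indexOf("\r\n", pos);
            if (eol < 0) {
                throw new IllegalArgumentException("missing chunk-size line");
            }
            String sizeField = wire.substring(pos, eol);
            int semi = sizeField.indexOf(';');      // drop optional chunk extensions
            if (semi >= 0) {
                sizeField = sizeField.substring(0, semi);
            }
            int size = Integer.parseInt(sizeField.trim(), 16); // size is hexadecimal
            pos = eol + 2;
            if (size == 0) {
                return body.toString();             // zero-size chunk ends the body
            }
            if (pos + size > wire.length()) {
                throw new IllegalArgumentException("truncated chunk");
            }
            body.append(wire, pos, pos + size);
            pos += size + 2;                        // skip chunk data and its CRLF
        }
    }

    public static void main(String[] args) {
        String wire = "10\r\n{\"success\":true}\r\n0\r\n\r\n";
        System.out.println(decode(wire));
    }
}
```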

Downloading from dropbox url ignores range

If I want to download a file from a Dropbox URL, my HTTP Range header is ignored:
httpRequest = new HttpGet(url.toURI());
httpRequest.addHeader("Range", "bytes=" + startPos + "-" + dwnInfo.getStopRange());
httpRequest.addHeader("Accept-Encoding", "");
So instead of downloading the file in x chunks of, for example, 5 MB each, the connection ignores the specified range and downloads x chunks of y MB, where y is the full size of the file.
Downloading from an Amazon storage link works without any problems.
Has anyone else encountered this? It only started a few days ago; it wasn't an issue before.
I looked at the Dropbox developer pages but didn't see anything saying that range support on URLs had been removed.
The link you gave is to an HTML page (total size ~46KB), so even if range retrieval worked there, it wouldn't be very useful.
Per https://www.dropbox.com/help/201/en, you can turn a share link into a direct link to the file by changing the domain to dl.dropboxusercontent.com, so your link becomes https://dl.dropboxusercontent.com/s/5c7atlfmacjf3qn/02%20Armin%20Van%20Buuren%20-%20A%20State%20Of%20Trance%20Year%20Mix%202013%20%28Cd%202%29.mp3, and range retrieval works for that URL.
(Here I'm using httpie.)
$ http get https://dl.dropboxusercontent.com/s/5c7atlfmacjf3qn/02%20Armin%20Van%20Buuren%20-%20A%20State%20Of%20Trance%20Year%20Mix%202013%20%28Cd%202%29.mp3 range:bytes=0-0
HTTP/1.1 206 PARTIAL CONTENT
Connection: keep-alive
Content-Length: 1
Content-Type: audio/mpeg
Date: Wed, 18 Jun 2014 14:53:32 GMT
Server: nginx
accept-ranges: bytes
cache-control: max-age=0
content-range: bytes 0-0/146014047
etag: 346n
pragma: public
set-cookie: uc_session=2cqmevWxG8lmGt743KMXebc23dRC5iuZEfm8Etx6V2VShWk60jmnUJajFnH1wRG4; Domain=dropboxusercontent.com; Path=/; secure; httponly
x-dropbox-request-id: 2f0c5986a62cf2f0b06af1704ece5bd7
x-server-response-time: 535
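The bytes=0-0 probe above returns the total size in content-range, which is enough to plan the remaining requests. As a rough sketch (RangePlanner and its methods are hypothetical helpers, not part of HttpClient), the Range header values for a chunked download could be computed like this:

```java
import java.util.ArrayList;
import java.util.List;

final class RangePlanner {
    /** Returns one "bytes=start-end" value per chunk; end offsets are inclusive. */
    static List<String> rangeHeaders(long totalSize, long chunkSize) {
        if (totalSize < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        List<String> headers = new ArrayList<>();
        for (long start = 0; start < totalSize; start += chunkSize) {
            long end = Math.min(start + chunkSize, totalSize) - 1;
            headers.add("bytes=" + start + "-" + end);
        }
        return headers;
    }

    /**
     * Extracts the total size from a Content-Range value such as
     * "bytes 0-0/146014047". Returns -1 if the total is "*" or unparsable.
     */
    static long totalFromContentRange(String contentRange) {
        int slash = contentRange.lastIndexOf('/');
        if (slash < 0) {
            return -1;
        }
        String total = contentRange.substring(slash + 1).trim();
        try {
            return Long.parseLong(total);
        } catch (NumberFormatException e) {
            return -1; // covers "*" (unknown total) and malformed values
        }
    }

    public static void main(String[] args) {
        long total = totalFromContentRange("bytes 0-0/146014047");
        List<String> ranges = rangeHeaders(total, 5L * 1024 * 1024);
        System.out.println(ranges.size() + " requests, first: " + ranges.get(0));
    }
}
```

Each response should still be checked for status 206: a server that ignores Range answers 200 with the full body, which is the symptom described in the question.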

App-Engine directing endpoint to previous class name, now 404 not found

I've been using App-Engine as the backend for an Android and iOS application. It's been working without problem with both the local development server (over http) and actual app-engine (over https).
Then I noticed that, while renaming endpoints, I accidentally duplicated a word in the class name of an endpoint: RegionRegionIconsEndpoint instead of simply RegionIconsEndpoint. It was a 1-line fix.
public class RegionRegionIconsEndpoint {
@ApiMethod(name = "getRegionIcons", path="regionIcons", httpMethod = HttpMethod.POST)
public RegionInfoVersion.RegionIcons getRegionIcons(User user, @Named("id") String id)
throws OAuthRequestException {
...
}
}
became
public class RegionIconsEndpoint {
@ApiMethod(name = "getRegionIcons", path="regionIcons", httpMethod = HttpMethod.POST)
public RegionInfoVersion.RegionIcons getRegionIcons(User user, @Named("id") String id)
throws OAuthRequestException {
...
}
}
I generated new cloud-endpoint libraries and continued development using the local development server. All good.
When I deployed it to the real App-Engine service, however, a problem arose. When my app starts, there are a series of calls to other endpoints defined just as the one shown above; these always work fine. Then there are calls to this endpoint. A typical call looks like this:
POST https://my-app.appspot.com/_ah/api/client/v1/regionIcons?id=foo
Authorization is also provided and the expected result comes back most of the time... say 80%. The AE logs look like this:
2014-05-02 21:36:30.551 /_ah/spi/com.example.app.endpoints.RegionIconsEndpoint.getRegionIcons 200 48ms 0kb Google-HTTP-Java-Client/1.16.0-rc (gzip) module=default version=1
70.80.59.221 - - [02/May/2014:18:36:30 -0700] "POST /_ah/spi/com.example.app.endpoints.RegionIconsEndpoint.getRegionIcons HTTP/1.1" 200 149 - "Google-HTTP-Java-Client/1.16.0-rc (gzip)" "my-app.appspot.com" ms=49 cpu_ms=41 cpm_usd=0.000017 app_engine_release=1.9.4 instance=006c1b117c1b2d35341e0f407ae5785a825b65e5
The remaining times, I get a 404 Not Found response and the AE logs have this:
2014-05-02 21:36:30.852 /_ah/spi/BackendService.logMessages 204 16ms 0kb module=default version=1
10.1.0.41 - - [02/May/2014:18:36:30 -0700] "POST /_ah/spi/BackendService.logMessages HTTP/1.1" 204 0 - - "my-app.appspot.com" ms=16 cpu_ms=0 app_engine_release=1.9.4 instance=006c1b117c1b2d35341e0f407ae5785a825b65e5
E 2014-05-02 21:36:30.851
Request URL: https://my-app.appspot.com/_ah/api/client/v1/regionIcons?id=foo
Method: client.getRegionIcons
Error Code: 404
Reason: notFound
Message: service 'com.example.app.endpoints.RegionRegionIconsEndpoint' not found
2014-05-02 21:36:30.802 /_ah/spi/com.example.app.endpoints.RegionRegionIconsEndpoint.getRegionIcons 404 16ms 0kb Google-HTTP-Java-Client/1.16.0-rc (gzip) module=default version=1
70.80.59.221 - - [02/May/2014:18:36:30 -0700] "POST /_ah/spi/com.example.app.endpoints.RegionRegionIconsEndpoint.getRegionIcons HTTP/1.1" 404 166 - "Google-HTTP-Java-Client/1.16.0-rc (gzip)" "my-app.appspot.com" ms=16 cpu_ms=0 cpm_usd=0.000019 app_engine_release=1.9.4 instance=006c1b117c1b2d35341e0f407ae5785a825b65e5
You can see on the Message line that, sometimes, AE is still trying to process the call using the old class name with the duplicated word! I've done searches over my entire code-base and the generated files and I cannot find the string "RegionRegion" anywhere. I've checked the web.xml file a dozen times and it has only the new "RegionIconsEndpoint" class name.
Wondering if somehow Google's servers were keeping old information around, I deployed the new version of my app as 2-dot-my-app.appspot.com. The behavior remains exactly the same except that there are no AE log messages for the requests that fail with 404 on this version. Successful request logs are as before.
Both my Android and iPad apps are experiencing this. In addition, I've managed to reproduce it using the web and Google's API explorer on my-app.appspot.com. In this last case, a successful request shows this:
200 OK
cache-control: no-cache, no-store, max-age=0, must-revalidate
content-encoding: gzip
content-length: 171
content-type: application/json; charset=UTF-8
date: Sat, 03 May 2014 03:07:05 GMT
etag: "G170GGjYGsLnxTffzUEJmTttHzU/LUWzmydK3mjH7IeRbEc_n9J6cDQ"
expires: Fri, 01 Jan 1990 00:00:00 GMT
pragma: no-cache
server: GSE
{
"iconsVid": "foo",
"iconsVersion": 3,
"kind": "client#resourcesItem",
"etag": "\"G170GGjYGsLnxTffzUEJmTttHzU/LUWzmydK3mjH7IeRbEc_n9J6cDQ\""
}
and a failed request shows this:
404 Not Found
cache-control: private, max-age=0
content-encoding: gzip
content-length: 169
content-type: application/json; charset=UTF-8
date: Sat, 03 May 2014 03:08:34 GMT
expires: Sat, 03 May 2014 03:08:34 GMT
server: GSE
{
"error": {
"errors": [
{
"domain": "global",
"reason": "notFound",
"message": "service 'com.example.app.endpoints.RegionRegionIconsEndpoint' not found"
}
],
"code": 404,
"message": "service 'com.example.app.endpoints.RegionRegionIconsEndpoint' not found"
}
}
again clearly showing an access to the old class name. When trying to do the same to the v2 version that I deployed (2-dot-my-app.appspot.com), it's different. A success request ends like this:
200 OK
cache-control: no-cache, no-store, max-age=0, must-revalidate
content-encoding: gzip
content-length: 171
content-type: application/json; charset=UTF-8
date: Sat, 03 May 2014 03:12:08 GMT
etag: "EP5CWx59se1v4KdDnkfEx7cTkis/LUWzmydK3mjH7IeRbEc_n9J6cDQ"
expires: Fri, 01 Jan 1990 00:00:00 GMT
pragma: no-cache
server: GSE
{
"iconsVid": "foo",
"iconsVersion": 3,
"kind": "client#resourcesItem",
"etag": "\"EP5CWx59se1v4KdDnkfEx7cTkis/LUWzmydK3mjH7IeRbEc_n9J6cDQ\""
}
and a failed request ends like this:
404 Not Found
cache-control: no-cache, no-store, max-age=0, must-revalidate
content-encoding: gzip
content-length: 29
content-type: text/html; charset=UTF-8
date: Sat, 03 May 2014 03:06:10 GMT
expires: Fri, 01 Jan 1990 00:00:00 GMT
pragma: no-cache
server: GSE
Not Found
I don't know what else to try. To me, it looks like a bug in App-Engine.
So... any ideas what is going on here and how to fix or work around it?
2014-05-04: I tried changing the method from POST to GET: exact same behavior. I tried changing the path from regionIcons to regionIconsFoo: exact same behavior. I tried changing the @Api version from v1 to v2: exact same behavior.
Finally, I tried changing the name of the class back to the previous (with the duplicated word): I get fewer failures (maybe 5% instead of 20%) but they still occur with the failing requests trying to access the now non-existent class name without the duplicated word.
Restoring the correct name resumes the originally described behavior with the original failure rate.
I've been struggling with a similar problem. Check the logs on the appengine.com project site. You should see a log entry for the update; if it has additional information about an error, check it.
Sometimes AE works well on the local machine but the deployment process reveals some bugs.
Edit:
1. Rename the class back to the old "double" name, upload it to AE, and check whether all requests work without the bug. If they do, rename the class again. (If it's an App Engine bug, this should fix it.)
2. Create the simplest possible API and substitute it for your project. Upload it to AE and check with the API explorer that everything is OK without the methods from your main project. If it is, swap the "test" project back for your real one and upload it to AE.
This isn't an answer because it doesn't address the cause, but it is my solution.
I duplicated the working class back into the old class name.
public class RegionRegionIconsEndpoint {
@ApiMethod(name = "getRegionIconsOld", path="regionIconsOld", httpMethod = HttpMethod.POST)
public RegionInfoVersion.RegionIcons getRegionIcons(User user, @Named("id") String id)
throws OAuthRequestException {
...
}
}
Now in the log, even though I'm only ever calling getRegionIcons, I see indications that both classes are being called with the "old" version handling about 20% of the requests. It's a hack and I don't like it, but it works and the clients are happy with it.
If you can't beat 'em, join 'em.

Apache not obeying If-Modified-Since

I'm downloading a JAR file, and would like to utilize If-Modified-Since so I don't get the whole file if I don't need it, but for some reason my vanilla Apache (afaik) isn't returning the 304 correctly.
This is from wireshark:
GET /whatever.jar HTTP/1.1
If-Modified-Since: Sat, 04 Jan 2014 21:46:26 GMT
User-Agent: Jakarta Commons-HttpClient/3.1
Host: example.com
HTTP/1.1 200 OK
Date: Sat, 04 Jan 2014 20:32:31 GMT
Server: Apache/2.2.4 (Unix) mod_ssl/2.2.4 OpenSSL/0.9.8e DAV/2 mod_jk/1.2.26 PHP/5.3.6 SVN/1.4.4
Last-Modified: Sat, 04 Jan 2014 19:13:14 GMT
ETag: "b6c037-1ddad9f-d17a6680"
Accept-Ranges: bytes
Content-Length: 31305119
Vary: User-Agent
Content-Type: text/plain
... [bunch of bytes] ...
There aren't other headers I need to specify, are there? Am I missing a module that Apache needs in order to read this header correctly?
Any other thoughts or suggestions?
Here is my Java code, for reference:
File jarFile = new File(filePath);
GetMethod get = new GetMethod(downloadUrl);
Date lastModified = new Date(jarFile.lastModified());
get.setRequestHeader("If-Modified-Since", DateUtil.formatDate(lastModified));
HttpClient client = new HttpClient();
int code = client.executeMethod(get);
UPDATE: Solution
The If-Modified-Since value needed to exactly match the server's Last-Modified value, and I achieved this by explicitly setting the last-modified time on the downloaded file:
String serverModified = get.getResponseHeader("Last-Modified").getValue();
jarFile.setLastModified(DateUtil.parseDate(serverModified).getTime());
After doing this, subsequent calls would not download the file.
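The snippets above rely on DateUtil from Commons HttpClient. For readers without that dependency, here is a hedged sketch of the same date handling with java.time only; HttpDates is a hypothetical helper name. It parses a Last-Modified value into epoch milliseconds (suitable for File.setLastModified) and formats milliseconds back into the fixed-width HTTP date form for If-Modified-Since.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class HttpDates {
    // Explicit pattern so the day of month is always two digits,
    // e.g. "Sat, 04 Jan 2014 19:13:14 GMT".
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    /** Parses an HTTP date such as a Last-Modified value into epoch milliseconds. */
    static long parseMillis(String httpDate) {
        return ZonedDateTime.parse(httpDate, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant()
                .toEpochMilli();
    }

    /** Formats epoch milliseconds as an HTTP date for If-Modified-Since. */
    static String format(long epochMillis) {
        return HTTP_DATE.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        String serverModified = "Sat, 04 Jan 2014 19:13:14 GMT";
        long millis = parseMillis(serverModified);
        // After a successful download: jarFile.setLastModified(millis);
        // On the next request, the same value goes back to the server:
        System.out.println("If-Modified-Since: " + format(millis));
    }
}
```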
In order to use the "If-Modified-Since" header, you must send the same value that the server sent in its "Last-Modified" header; in your capture the two differ: Sat, 04 Jan 2014 19:13:14 GMT != Sat, 04 Jan 2014 21:46:26 GMT. Apache cannot guarantee that the file wasn't modified and given a past time on purpose (perhaps through a version control roll-back).
If you want, you may check the "Last-Modified" header on the client side, by using a HeadMethod first to avoid "getting" the resource if it hasn't been modified. Then you would use a "GetMethod" if it has been modified.
See RFC2616 - Section 9, "HTTP/1.1: Method Definitions" for more.

Generating HttpResponse

When creating an HTTP response manually, how can one obtain the Server and ETag values?
* HTTP/1.1 200 OK
* Date: Mon, 23 Apr 2012 23:44:52 GMT
* Server: Apache/2.2.3 (Red Hat) <-----
* Last-Modified: Fri, 16 Sep 2005 18:08:50 GMT
* ETag: "421142-2f-400e77c517080" <-----
* Accept-Ranges: bytes
* Content-Length: 47
* Content-Type: text/plain
* Connection: close
"Server" is whatever your HTTP server wants to name or identify itself as, e.g. "Zumgto Surver 4.5".
"ETag" identifies the "version" of a particular item, so as long as your server can reasonably say "this ETag corresponds to the current version", you can send pretty much anything, e.g. "v3345" or a hash of the item. It is entirely optional if you don't support the "If-None-Match" header in requests.
Neither is required. You can make up your own server tag using the same format as above. Omit the ETag or just generate your own. You could use the current timestamp or a constant. The following formats should work.
Server: Program/version (O/S)
ETag: "Timestamp"
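As one illustration of the "hash of the item" option mentioned above, the sketch below derives a quoted ETag from the body and assembles a response head by hand. ResponseHeaders, etagFor, head, and the server name are all made up for the example; truncating a SHA-256 digest to 8 bytes is an arbitrary choice, not a requirement.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ResponseHeaders {
    /** Builds a quoted ETag from the first 8 bytes of the body's SHA-256 digest. */
    static String etagFor(byte[] body) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(body);
            StringBuilder tag = new StringBuilder("\"");
            for (int i = 0; i < 8; i++) {
                tag.append(String.format("%02x", digest[i]));
            }
            return tag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Assembles a status line and headers, terminated by an empty line. */
    static String head(String serverName, byte[] body) {
        return "HTTP/1.1 200 OK\r\n"
                + "Server: " + serverName + "\r\n"
                + "ETag: " + etagFor(body) + "\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "Content-Type: text/plain\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.print(head("Program/1.0 (Linux)", body));
    }
}
```

Because the tag depends only on the content, the same body always yields the same ETag, which is what an If-None-Match comparison needs.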
