Extract multiple JSON strings from multipart HTTP response - java

I'm using Apache HttpClient to work with a web service that returns a multipart/form-data response which contains json.
I'm having a very hard time extracting each JSON string separately so I can read the json string.
I did read similar posts on Stackoverflow, and some suggested using Apache commons fileupload, but I am not sure how that can separate the JSON strings from the whole response that has a bunch of other text such as the boundary string, content type, etc
The response looks something like below.
--Boundary_16003419_2104021487_1483424496169
Content-Type: application/json
Content-RequestId: GetMailboxes
Status-Code: 200
X-Server-Response-Time: 4ms
X-Server-Chain: domain.com
Content-RequestDuration: 5
{JSON}
--Boundary_16003419_2104021487_1483424496169
Content-Type: application/json
Content-RequestId: GetFolders
Status-Code: 200
X-Server-Response-Time: 8ms
X-Server-Chain: domain.com
Content-RequestDuration: 10
{JSON}
--Boundary_16003419_2104021487_1483424496169
Content-Type: application/json
Content-RequestId: GetAlerts
Status-Code: 200
X-Server-Response-Time: 10ms
X-Server-Chain: domain.com
Content-RequestDuration: 12
{JSON}
--Boundary_16003419_2104021487_1483424496169
Content-Type: application/json
Content-RequestId: GetAccounts
Status-Code: 200
X-Server-Response-Time: 11ms
X-Server-Chain: domain.com
Content-RequestDuration: 12
{JSON}
--Boundary_16003419_2104021487_1483424496169
Content-Type: application/json
Content-RequestId: GetAllSavedSearches
Status-Code: 200
X-Server-Response-Time: 10ms
X-Server-Chain: domain.com
Content-RequestDuration: 12
{JSON}
--Boundary_16003419_2104021487_1483424496169
Content-Type: application/json
Content-RequestId: GetAthenaSegment
Status-Code: 200
X-Server-Response-Time: 14ms
Content-RequestDuration: 21
{JSON}
--Boundary_16003419_2104021487_1483424496169
Content-Type: application/json
Content-RequestId: ListFolderThreads
Status-Code: 200
X-Server-Response-Time: 110ms
Content-RequestDuration: 116
{JSON}
--Boundary_16003419_2104021487_1483424496169
Content-Type: application/json
Content-RequestId: GetUserInfo
Status-Code: 200
X-Server-Response-Time: 197ms
Content-RequestDuration: 204
{JSON}
--Boundary_16003419_2104021487_1483424496169
Content-RequestId: Status
Content-Type: application/json
{JSON}
--Boundary_16003419_2104021487_1483424496169--
Any way to do this reliably?

One option would be Apache Mime4j. You would likely want to use MimeTokenStream or MimeStreamParser as described here if you want your application to handle response content without building a complete DOM tree in memory.

Related

How to send mutipart/mixed;boudary=batch using rest assured in java

I have searched to send the data as a batch to rest end points. But didn't got a correct solution. Could you please help me on this.
PFB the sample body of the data.
--batch
Content-Type: multipart/mixed; boundary=changeset
--changeset
Content-Type: application/http
Content-Transfer-Encoding: binary
POST CorporateAccountCollection HTTP/1.1
Content-Type: application/json
Content-ID: 2
Content-Length: 10000
{
"Name":"Testing Batch Operation MB",
"RoleCode":"XXXXX",
"CountryCode":"US",
"CorporateAccountIdentification" : [
{
"IDTypeCode":"XXXXXX",
"IDNumber":"9999999999"
}
]
}
--changeset--
--batch--

How to make an HTTP request in Java and ignore the body

To be more specific, I mean specifically to just consume the HTTP headers over the network and stop the communication before the client receives the response body.
Example
Client makes a request
GET / HTTP/1.1
Host: example.com
Connection: Keep-Alive
User-Agent: Apache-HttpClient/UNAVAILABLE (Java/1.8.0_262)
Accept-Encoding: gzip,deflate
Then the response over the network is just
HTTP/1.1 200 OK
Date: Wed, 23 Sep 2020 22:41:21 GMT
Server: Apache
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Cache-Control: public, max-age=10800
Content-Language: en
Vary: Cookie,Accept-Encoding
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
Age: 1865
grace: none
Content-Length: 9213
Connection: keep-alive
Accept-Ranges: bytes
Http protocol has six method, one of the methods is 'HEAD'. You can try use HEAD method instead of GET method.
And another stupid way : declare a web interface, and return null string.Like this:
// a web interface
String result = "";
return result;

Apache HttpClient is not showing Content-Length and Content-Encoding headers of the response

I installed Apache httpcomponents-client-5.0.x and while reviewing the headers of the http response, I was shocked it doesn't show the Content-Length and Content-Encoding headers, this is the code I used for testing
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.CloseableHttpResponse;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import com.sun.net.httpserver.Headers;
CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet request = new HttpGet(new URI("https://www.example.com"));
CloseableHttpResponse response = httpclient.execute(request);
Header[] responseHeaders = response.getHeaders();
for(Header header: responseHeaders) {
System.out.println(header.getName());
}
// this prints all the headers except
// status code header
// Content-Length
// Content-Encoding
No matter what I try I get the same result, like this
Iterator<Header> headersItr = response.headerIterator();
while(headersItr.hasNext()) {
Header header = headersItr.next();
System.out.println(header.getName());
}
Or this
HttpEntity entity = response.getEntity();
System.out.println(entity.getContentEncoding()); // NULL
System.out.println(entity.getContentLength()); // -1
According to this question that has been asked 6 years ago, it seems like an old issue even with older versions of Apache HttpClient.
Of-course the server is actually returning those headers as confirmed by Wireshark, and Apache HttpClient logs itself
2020-04-03 07:59:09,106 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << HTTP/1.1 200 OK
2020-04-03 07:59:09,106 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Content-Encoding: gzip
2020-04-03 07:59:09,106 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Accept-Ranges: bytes
2020-04-03 07:59:09,107 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Age: 451956
2020-04-03 07:59:09,107 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Cache-Control: max-age=604800
2020-04-03 07:59:09,107 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Content-Type: text/html; charset=UTF-8
2020-04-03 07:59:09,107 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Date: Fri, 03 Apr 2020 05:59:09 GMT
2020-04-03 07:59:09,108 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Etag: "3147526947+gzip"
2020-04-03 07:59:09,108 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Expires: Fri, 10 Apr 2020 05:59:09 GMT
2020-04-03 07:59:09,108 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
2020-04-03 07:59:09,108 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Server: ECS (dcb/7EEB)
2020-04-03 07:59:09,108 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Vary: Accept-Encoding
2020-04-03 07:59:09,109 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << X-Cache: HIT
2020-04-03 07:59:09,109 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Content-Length: 648
BTW, java.net.http library known as JDK HttpClient works great and show all the headers.
Is there something wrong I did, or should I report a bug that been there for years ?
HttpComponents committer here...
You did not closely pay attention what Dave G said. By default, HttpClientBuilder will enable transparent decompression and the reason why you don't see some headers anymore is here:
if (decoderFactory != null) {
response.setEntity(new DecompressingEntity(response.getEntity(), decoderFactory));
response.removeHeaders(HttpHeaders.CONTENT_LENGTH);
response.removeHeaders(HttpHeaders.CONTENT_ENCODING);
response.removeHeaders(HttpHeaders.CONTENT_MD5);
} ...
Regarding the JDK HttpClient, it will not perform any transparent decompression, therefore you see the length of the compressed stream. You have to decompress on your own.
curl committer here...
I have raised an issue too.
Update: 03 Feb. '23 The secret codez to disable automatic decompression are:
CloseableHttpClient httpclient = HttpClients.createSimple();
// OR
CloseableHttpClient httpclient = HttpClients.custom().disableContentCompression().build();
The content-length may be potentially ignored in this case.
HttpGet request = new HttpGet(new URI("https://www.example.com"));
request.setHeader("Accept-Encoding", "identity");
CloseableHttpResponse response = httpclient.execute(request);
I can see the following
HttpEntity entity = response.getEntity();
System.out.println(entity.getContentLength());
System.out.println(entity.getContentEncoding());
Output
...
2020-04-03 03:04:17.760 DEBUG 34196 --- [ main] org.apache.hc.client5.http.headers : http-outgoing-0 << Content-Length: 1256
...
1256
null
I'd like to direct your attention to this header being sent:
http-outgoing-0 >> Accept-Encoding: gzip, x-gzip, deflate
That tells the server that this client can accept gzip, x-gzip, and deflate content in response. The response is stating it is 'gzip' encoded.
http-outgoing-0 << Content-Encoding: gzip
I believe that HttpClient is transparently handling this internally and making the content available.
As stated in the other article you referenced, one of the answers indicated that the method EntityUtils.toByteArray(httpResponse.getEntity()).length could be applied to get the content length.

okhttp content-length is -1 with big files

I am downloading a file with okhttp and things work fine - now I want to show the progress and hit a road-bump. The returned content-length is -1.
It comes back correctly from the server:
⋊> ~ curl -i http://ipfs.io/ipfs/QmRMHb4Vhv8LtYqw8RkDgkdZYxJHfrfFeQaHbNUqJYmdF2 13:38:11
HTTP/1.1 200 OK
Date: Tue, 14 Jun 2016 11:38:16 GMT
Content-Type: application/octet-stream
Content-Length: 27865948
I traced the problem down to OkHeaders.java here:
public static long contentLength(Headers headers) {
return stringToLong(headers.get("Content-Length"));
}
I see all the other headers here in headers - but not Content-Length - so headers.get("Content-Length") returns null. Anyone has a clue how this can get lost?
Interestingly if I change the url to "http://google.com" I get a content-length from okhttp - but with curl both look same Content-Length wise - this really confuses me
Update: it seems to correlate with he size of the file. If I use smaller content from the same server I get a Content-Length with okhttp. The problem only happens when the file is big
It looks like above a certain size the server uses chunked encoding and you won't get a content length.
HTTP/1.1 200 OK
Date: Tue, 14 Jun 2016 14:30:07 GMT
Content-Type: application/octet-stream
Transfer-Encoding: chunked

App-Engine directing endpoint to previous class name, now 404 not found

I've been using App-Engine as the backend for an Android and iOS application. It's been working without problem with both the local development server (over http) and actual app-engine (over https).
Then I noticed that, while renaming endpoints, I accidentally duplicated a word in the class name of an endpoint: RegionRegionIconsEndpoint instead of simply RegionIconsEndpoint. It was a 1-line fix.
public class RegionRegionIconsEndpoint {
#ApiMethod(name = "getRegionIcons", path="regionIcons", httpMethod = HttpMethod.POST)
public RegionInfoVersion.RegionIcons getRegionIcons(User user, #Named("id") String id)
throws OAuthRequestException {
...
}
}
became
public class RegionIconsEndpoint {
#ApiMethod(name = "getRegionIcons", path="regionIcons", httpMethod = HttpMethod.POST)
public RegionInfoVersion.RegionIcons getRegionIcons(User user, #Named("id") String id)
throws OAuthRequestException {
...
}
}
I generated new cloud-endpoint libraries and continued development using the local development server. All good.
When I deployed it to the real App-Engine service, however, a problem arose. When my app starts, there are a series of calls to other endpoints defined just as the one shown above; these always work fine. Then there are calls to this endpoint. A typical call looks like this:
POST https://my-app.appspot.com/_ah/api/client/v1/regionIcons?id=foo
Authorization is also provided and the expected result comes back most of the time... say 80%. The AE logs look like this:
2014-05-02 21:36:30.551 /_ah/spi/com.example.app.endpoints.RegionIconsEndpoint.getRegionIcons 200 48ms 0kb Google-HTTP-Java-Client/1.16.0-rc (gzip) module=default version=1
70.80.59.221 - - [02/May/2014:18:36:30 -0700] "POST /_ah/spi/com.example.app.endpoints.RegionIconsEndpoint.getRegionIcons HTTP/1.1" 200 149 - "Google-HTTP-Java-Client/1.16.0-rc (gzip)" "my-app.appspot.com" ms=49 cpu_ms=41 cpm_usd=0.000017 app_engine_release=1.9.4 instance=006c1b117c1b2d35341e0f407ae5785a825b65e5
The remaining times, I get a 404 Not Found response and the AE logs have this:
2014-05-02 21:36:30.852 /_ah/spi/BackendService.logMessages 204 16ms 0kb module=default version=1
10.1.0.41 - - [02/May/2014:18:36:30 -0700] "POST /_ah/spi/BackendService.logMessages HTTP/1.1" 204 0 - - "my-app.appspot.com" ms=16 cpu_ms=0 app_engine_release=1.9.4 instance=006c1b117c1b2d35341e0f407ae5785a825b65e5
E 2014-05-02 21:36:30.851
Request URL: https://my-app.appspot.com/_ah/api/client/v1/regionIcons?id=foo
Method: client.getRegionIcons
Error Code: 404
Reason: notFound
Message: service 'com.example.app.endpoints.RegionRegionIconsEndpoint' not found
2014-05-02 21:36:30.802 /_ah/spi/com.example.app.endpoints.RegionRegionIconsEndpoint.getRegionIcons 404 16ms 0kb Google-HTTP-Java-Client/1.16.0-rc (gzip) module=default version=1
70.80.59.221 - - [02/May/2014:18:36:30 -0700] "POST /_ah/spi/com.example.app.endpoints.RegionRegionIconsEndpoint.getRegionIcons HTTP/1.1" 404 166 - "Google-HTTP-Java-Client/1.16.0-rc (gzip)" "my-app.appspot.com" ms=16 cpu_ms=0 cpm_usd=0.000019 app_engine_release=1.9.4 instance=006c1b117c1b2d35341e0f407ae5785a825b65e5
You can see on the Message line that, sometimes, AE is still trying to process the call using the old class name with the duplicated word! I've done searches over my entire code-base and the generated files and I cannot find the string "RegionRegion" anywhere. I've checked the web.xml file a dozen times and it has only the new "RegionIconsEndpoint" class name.
Wondering if somehow Google's servers were keeping old information around, I deployed the new version of my app as 2-dot-my-app.appspot.com. The behavior remains exactly the same except that there are no AE log messages for the requests that fail with 404 on this version. Successful request logs are as before.
Both my Android and iPad apps are experiencing this. In addition, I've managed to reproduce it using the web and Google's API explorer on my-app.appspot.com. In this last case, a successful request shows this:
200 OK
cache-control: no-cache, no-store, max-age=0, must-revalidate
content-encoding: gzip
content-length: 171
content-type: application/json; charset=UTF-8
date: Sat, 03 May 2014 03:07:05 GMT
etag: "G170GGjYGsLnxTffzUEJmTttHzU/LUWzmydK3mjH7IeRbEc_n9J6cDQ"
expires: Fri, 01 Jan 1990 00:00:00 GMT
pragma: no-cache
server: GSE
{
"iconsVid": "foo",
"iconsVersion": 3,
"kind": "client#resourcesItem",
"etag": "\"G170GGjYGsLnxTffzUEJmTttHzU/LUWzmydK3mjH7IeRbEc_n9J6cDQ\""
}
and a failed request shows this:
404 Not Found
cache-control: private, max-age=0
content-encoding: gzip
content-length: 169
content-type: application/json; charset=UTF-8
date: Sat, 03 May 2014 03:08:34 GMT
expires: Sat, 03 May 2014 03:08:34 GMT
server: GSE
{
"error": {
"errors": [
{
"domain": "global",
"reason": "notFound",
"message": "service 'com.example.app.endpoints.RegionRegionIconsEndpoint' not found"
}
],
"code": 404,
"message": "service 'com.example.app.endpoints.RegionRegionIconsEndpoint' not found"
}
}
again clearly showing an access to the old class name. When trying to do the same to the v2 version that I deployed (2-dot-my-app.appspot.com), it's different. A success request ends like this:
200 OK
cache-control: no-cache, no-store, max-age=0, must-revalidate
content-encoding: gzip
content-length: 171
content-type: application/json; charset=UTF-8
date: Sat, 03 May 2014 03:12:08 GMT
etag: "EP5CWx59se1v4KdDnkfEx7cTkis/LUWzmydK3mjH7IeRbEc_n9J6cDQ"
expires: Fri, 01 Jan 1990 00:00:00 GMT
pragma: no-cache
server: GSE
{
"iconsVid": "foo",
"iconsVersion": 3,
"kind": "client#resourcesItem",
"etag": "\"EP5CWx59se1v4KdDnkfEx7cTkis/LUWzmydK3mjH7IeRbEc_n9J6cDQ\""
}
and a failed request ends like this:
404 Not Found
cache-control: no-cache, no-store, max-age=0, must-revalidate
content-encoding: gzip
content-length: 29
content-type: text/html; charset=UTF-8
date: Sat, 03 May 2014 03:06:10 GMT
expires: Fri, 01 Jan 1990 00:00:00 GMT
pragma: no-cache
server: GSE
Not Found
I don't know what else to try. To me, it looks like a bug in App-Engine.
So... any ideas what is going on here and how to fix or work around it?
2014-05-04: I tried changing the method from POST to GET: exact same behavior. I tried changing the path from regionIcons to regionIconsFoo: exact same behavior. I tried changing the #API version from v1 to v2: exact same behavior.
Finally, I tried changing the name of the class back to the previous (with the duplicated word): I get fewer failures (maybe 5% instead of 20%) but they still occur with the failing requests trying to access the now non-existent class name without the duplicated word.
Restoring the correct name resumes the originally described behavior with the original failure rate.
I've been struggling with similar problem. Check logs on appengine.com project site. You should see log about updating, if it has additional info about error - check it.
Sometimes AE works on local machine well but deployment process reveals some bugs.
Edit:
1. Rename the class back to old "double" name, upload it to AE and check if all requests are working without a bug, if yes, rename the class again. (if it's appengine bug it should fix it).
2. Create as simple as possible api and substitute it with your project. Update it to AE and check with api explorer is everything ok, without methods from your main project. If it's ok, once again swap "test" project with your true one and upload to AE.
This isn't an answer because it doesn't address the cause, but it is my solution.
I duplicated the working class back into the old class name.
public class RegionRegionIconsEndpoint {
#ApiMethod(name = "getRegionIconsOld", path="regionIconsOld", httpMethod = HttpMethod.POST)
public RegionInfoVersion.RegionIcons getRegionIcons(User user, #Named("id") String id)
throws OAuthRequestException {
...
}
}
Now in the log, even though I'm only ever calling getRegionIcons, I see indications that both classes are being called with the "old" version handling about 20% of the requests. It's a hack and I don't like it, but it works and the clients are happy with it.
If you can't beat 'em, join 'em.

Categories

Resources