I have a Java application which uses Spring's RestTemplate API to write concise, readable consumers of JSON REST services:
In essence:
RestTemplate rest = new RestTemplate(clientHttpRequestFactory);
ResponseEntity<ItemList> response = rest.exchange(url,
HttpMethod.GET,
requestEntity,
ItemList.class);
for(Item item : response.getBody().getItems()) {
handler.onItem(item);
}
The JSON response contains a list of items, and as you can see, I have an event-driven design in my own code to handle each item in turn. However, the entire list is in memory as part of response, which RestTemplate.exchange() produces.
I would like the application to be able to handle responses containing large numbers of items - say 50,000, and in this case there are two issues with the implementation as it stands:
Not a single item is handled until the entire HTTP response has been transferred - adding unwanted latency.
The huge response object sits in memory and can't be GC'd until the last item has been handled.
Is there a reasonably mature Java JSON/REST client API out there that consumes responses in an event-driven manner?
I imagine it would let you do something like:
RestStreamer rest = new RestStreamer(clientHttpRequestFactory);
// Tell the RestStreamer "when, while parsing a response, you encounter a JSON
// element matching JSONPath "$.items[*]" pass it to "handler" for processing.
rest.onJsonPath("$.items[*]").handle(handler);
// Tell the RestStreamer to make an HTTP request, parse it as a stream.
// We expect "handler" to get passed an object each time the parser encounters
// an item.
rest.execute(url, HttpMethod.GET, requestEntity);
I appreciate I could roll my own implementation of this behaviour with streaming JSON APIs from Jackson, GSON etc. -- but I'd love to be told there was something out there that does it reliably with a concise, expressive API, integrated with the HTTP aspect.
A couple of months later; back to answer my own question.
I didn't find an expressive API to do what I want, but I was able to achieve the desired behaviour by getting the HTTP body as a stream, and consuming it with a Jackson JsonParser:
ClientHttpRequest request =
clientHttpRequestFactory.createRequest(uri, HttpMethod.GET);
ClientHttpResponse response = request.execute();
return handleJsonStream(response.getBody(), handler);
... with handleJsonStream designed to handle JSON that looks like this:
{ items: [
{ field: value, ... },
{ field: value, ... },
... thousands more ...
] }
... it validates the tokens leading up to the start of the array; it creates an Item object each time it encounters an array element, and gives it to the handler.
// important that the JsonFactory comes from an ObjectMapper, or it won't be
// able to do readValueAs()
static JsonFactory jsonFactory = new ObjectMapper().getFactory();
public static int handleJsonStream(InputStream stream, ItemHandler handler) throws IOException {
JsonParser parser = jsonFactory.createParser(stream);
verify(parser.nextToken(), START_OBJECT, parser);
verify(parser.nextToken(), FIELD_NAME, parser);
verify(parser.getCurrentName(), "items", parser);
verify(parser.nextToken(), START_ARRAY, parser);
int count = 0;
while(parser.nextToken() != END_ARRAY) {
verify(parser.getCurrentToken(), START_OBJECT, parser);
Item item = parser.readValueAs(Item.class);
handler.onItem(item);
count++;
}
parser.close(); // hope it's OK to ignore remaining closing tokens.
return count;
}
verify() is just a private static method which throws an exception if the first two arguments aren't equal.
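For reference, a minimal reconstruction of such a verify() helper might look like this (the exact signature is an assumption; in the original the third argument is the JsonParser, presumably used for location context in the error message):

```java
import java.io.IOException;
import java.util.Objects;

class JsonStreamGuards {
    // Hypothetical reconstruction of verify(): throws if actual != expected.
    // The third argument is only used to add context to the error message;
    // with a real JsonParser, getCurrentLocation() would make this more useful.
    static void verify(Object actual, Object expected, Object context) throws IOException {
        if (!Objects.equals(actual, expected)) {
            throw new IOException(
                "Expected " + expected + " but got " + actual + " (at " + context + ")");
        }
    }
}
```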
The key point is that no matter how many items there are in the stream, this method only ever holds a reference to one Item at a time.
You can try JsonSurfer, which is designed to process JSON streams in an event-driven style.
JsonSurfer surfer = JsonSurfer.jackson();
Builder builder = config();
builder.bind("$.items[*]", new JsonPathListener() {
@Override
public void onValue(Object value, ParsingContext context) throws Exception {
// handle the value
}
});
surfer.surf(new InputStreamReader(response.getBody()), builder.build());
Is there no way to break up the request? It sounds like you should use paging. Make it so that you can request the first 100 results, the next 100 results, and so on. The request should take a starting index and a count. That's very common behavior for REST services, and it sounds like the solution to your problem.
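The paging loop can be sketched generically like this (the fetchPage function stands in for the REST call, e.g. rest.exchange(url + "?start=" + start + "&count=" + count, ...); the start/count parameter names are assumptions, not from the question):

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

class PagedConsumer {
    // Sketch of a paged client loop: fetch pageSize items at a time until a
    // short page signals the end. fetchPage(start, count) abstracts the REST call.
    static int consumePaged(BiFunction<Integer, Integer, List<String>> fetchPage,
                            int pageSize,
                            Consumer<String> handler) {
        int offset = 0;
        int total = 0;
        while (true) {
            List<String> page = fetchPage.apply(offset, pageSize);
            page.forEach(handler);
            total += page.size();
            if (page.size() < pageSize) {
                break; // a short (or empty) page means we've reached the end
            }
            offset += pageSize;
        }
        return total;
    }
}
```

This keeps at most one page in memory at a time, which is the point of the suggestion.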
The whole point of REST is that it is stateless; it sounds like you're trying to make it stateful. That's anathema to REST, so you're not going to find any libraries written that way.
The transactional nature of REST is very much intentional by design, so you won't get around it easily. You'll be fighting against the grain if you try.
From what I've seen, wrapping frameworks (like the one you are using) make things easy by deserializing the response into an object -- in your case, a collection of objects.
However, to use things in a streaming fashion, you may need to get at the underlying HTTP response stream. I am most familiar with Jersey, which exposes https://jersey.java.net/nonav/apidocs/1.5/jersey/com/sun/jersey/api/client/ClientResponse.html#getEntityInputStream()
It would be used by invoking
Client client = Client.create();
WebResource webResource = client.resource("http://...");
ClientResponse response = webResource.accept("application/json")
.get(ClientResponse.class);
InputStream is = response.getEntityInputStream();
This provides you with the stream of data coming in. The next step is to write the streaming part. Given that you are using JSON, there are options at various levels, including http://wiki.fasterxml.com/JacksonStreamingApi or http://argo.sourceforge.net/documentation.html. They can consume the InputStream.
These don't really make good use of the full deserialization that can be done, but you could use them to parse out an element of a JSON array and pass that item to a typical JSON object mapper (like Jackson, GSON, etc.). This becomes the event handling logic. You could spawn new threads for this, or do whatever your use case needs.
I won't claim to know all the rest frameworks out there (or even half) but I'm going to go with the answer
Probably Not
As noted by others, this is not the way REST normally thinks of its interactions. REST is a great hammer, but if you need streaming, you are (IMHO) in screwdriver territory; the hammer might still be made to work, but it is likely to make a mess. One can argue whether it is consistent with REST all day long, but in the end I'd be very surprised to find a framework that implemented this feature. I'd be even more surprised if the feature were mature (even if the framework is), because with respect to REST your use case is an uncommon corner case at best.
If someone does come up with one I'll be happy to stand corrected and learn something new though :)
Perhaps it would be best to think in terms of Comet or websockets for this particular operation. This question may be helpful since you already have Spring. (Websockets are not really viable if you need to support IE < 10, which most commercial apps still require... sadly, I've got one client with a key customer still on IE 7 in my personal work.)
You may consider Restlet.
http://restlet.org/discover/features
It supports asynchronous request processing, decoupled from IO operations. Unlike the Servlet API, Restlet applications don't have direct control over the output stream; they only provide an output representation to be written by the server connector.
The best way to achieve this is to use another streaming runtime for the JVM that allows reading the response off websockets; the one I am aware of is Atmosphere.
This way your large dataset is both sent and received in chunks on both sides, and read in the same manner in real time, without waiting for the whole response.
There is a good POC on this here:
http://keaplogik.blogspot.in/2012/05/atmosphere-websockets-comet-with-spring.html
Server:
@RequestMapping(value="/twitter/concurrency")
@ResponseBody
public void twitterAsync(AtmosphereResource atmosphereResource){
final ObjectMapper mapper = new ObjectMapper();
this.suspend(atmosphereResource);
final Broadcaster bc = atmosphereResource.getBroadcaster();
logger.info("Atmo Resource Size: " + bc.getAtmosphereResources().size());
bc.scheduleFixedBroadcast(new Callable<String>() {
@Override
public String call() throws Exception {
//Auth using keaplogik application springMVC-atmosphere-comet-webso key
final TwitterTemplate twitterTemplate =
new TwitterTemplate("WnLeyhTMjysXbNUd7DLcg",
"BhtMjwcDi8noxMc6zWSTtzPqq8AFV170fn9ivNGrc",
"537308114-5ByNH4nsTqejcg5b2HNeyuBb3khaQLeNnKDgl8",
"7aRrt3MUrnARVvypaSn3ZOKbRhJ5SiFoneahEp2SE");
final SearchParameters parameters = new SearchParameters("world").count(5).sinceId(sinceId).maxId(0);
final SearchResults results = twitterTemplate.searchOperations().search(parameters);
sinceId = results.getSearchMetadata().getMax_id();
List<TwitterMessage> twitterMessages = new ArrayList<TwitterMessage>();
for (Tweet tweet : results.getTweets()) {
twitterMessages.add(new TwitterMessage(tweet.getId(),
tweet.getCreatedAt(),
tweet.getText(),
tweet.getFromUser(),
tweet.getProfileImageUrl()));
}
return mapper.writeValueAsString(twitterMessages);
}
}, 10, TimeUnit.SECONDS);
}
Client:
Atmosphere has its own JavaScript file to handle the different Comet/websocket transport types and requests. Using it, you can point the request at the Spring URL controller method endpoint. Once subscribed to the controller, you will receive dispatches, which can be handled by adding a request.onMessage method. Here is an example request with a websocket transport.
var request = new $.atmosphere.AtmosphereRequest();
request.transport = 'websocket';
request.url = "<c:url value='/twitter/concurrency'/>";
request.contentType = "application/json";
request.fallbackTransport = 'streaming';
request.onMessage = function(response){
buildTemplate(response);
};
var subSocket = socket.subscribe(request);
function buildTemplate(response){
if(response.state === "messageReceived"){
var data = response.responseBody;
if (data) {
try {
var result = $.parseJSON(data);
$( "#template" ).tmpl( result ).hide().prependTo( "#twitterMessages").fadeIn();
} catch (error) {
console.log("An error occurred: " + error);
}
} else {
console.log("response.responseBody is null - ignoring.");
}
}
}
It is supported in all major browsers and in native mobile clients, Apple being among the pioneers of this technology.
As mentioned here, there is excellent support for deployment environments on web and enterprise JEE containers:
http://jfarcand.wordpress.com/2012/04/19/websockets-or-comet-or-both-whats-supported-in-the-java-ee-land/
Related
When using a RestTemplate to talk to an external service, I've seen OutOfMemory errors on our application more than once because the service streams gigabytes of data (due to a bad implementation on their side: in case of errors they were sending back big stacktraces in each element of the array, which usually contains a few thousand elements). It came to about 6 GB of data, deserialized by Jackson in our app and totally exploding the Xmx of the JVM.
I've looked around, but there doesn't seem to be any way to protect against this kind of event, i.e. aborting the request when the streamed response exceeds a given size.
Is there a solution to this? We are using apache's httpcomponents httpclient 4.5.5, but any other underlying implementation would be acceptable.
Besides RestTemplates, a solution for Spring's reactive WebClient would also be welcome.
This has to be enforced at the level of the underlying HTTP client library (Spring supports several, such as the JDK client, Apache client, OkHttp...).
Since you're talking about Apache HttpComponents here, did you check HttpEntity.getContent()?
It returns an InputStream that you can read yourself and determine when the size has been exceeded.
https://hc.apache.org/httpcomponents-core-4.4.x/httpcore/apidocs/org/apache/http/HttpEntity.html
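One way to implement that check is to wrap the entity's InputStream in a guard that fails the read once a byte budget is exceeded. A minimal sketch (the class is illustrative, not from any library):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical guard: aborts the read once more than maxBytes have been
// consumed, so a runaway response fails fast instead of filling the heap.
class SizeLimitedInputStream extends FilterInputStream {
    private final long maxBytes;
    private long count;

    SizeLimitedInputStream(InputStream in, long maxBytes) {
        super(in);
        this.maxBytes = maxBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1 && ++count > maxBytes) {
            throw new IOException("Response exceeded limit of " + maxBytes + " bytes");
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
            if (count > maxBytes) {
                throw new IOException("Response exceeded limit of " + maxBytes + " bytes");
            }
        }
        return n;
    }
}
```

You would wrap the stream from HttpEntity.getContent() (or ClientHttpResponse.getBody()) before handing it to the JSON mapper.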
For the record, here is the final solution. The problem was to load a list of objects that can be very big, using pagination (through the Elasticsearch scroll API).
ResponseExtractor<Car[]> responseExtractor = responseEntity -> {
long pageContentLengthInBytes = responseEntity.getHeaders().getContentLength();
long presumableFreeMemoryInBytes = this.getAvailableFreeMemoryAmount();
if (presumableFreeMemoryInBytes - TWENTY_MEGABYTES < pageContentLengthInBytes) {
log.error("Not enough memory to store the page ({} available, content-length={}), trashing it", presumableFreeMemoryInBytes, pageContentLengthInBytes);
responseEntity.close();
return null;
}
return objectMapper.readValue(responseEntity.getBody(), Car[].class);
};
Car[] responseEntities = this.restTemplate.execute(uri, HttpMethod.GET, null, responseExtractor);
/**
* Returns the current amount of memory which may be allocated until an out-of-memory error occurs.
* see https://stackoverflow.com/a/12807848/8836232
*/
private long getAvailableFreeMemoryAmount() {
long allocatedMemory = (Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory());
return Runtime.getRuntime().maxMemory() - allocatedMemory;
}
So here's the situation: I'm implementing the caching of our webapp using vertx-redis (we were formerly using Lettuce). It's a pretty simple mechanism: there is an annotation we use on endpoints which is responsible for invoking the Redis client (whatever implementation we are using) and, if there is cached info for the given key, it should be used as the response body and the request should be finished with no further processing.
But there's this really annoying behavior with the vertx-redis implementation, in which ending the request doesn't stop the processing. I make the request and get the quick response, since there was cached info, but I can still see in the logs that the app keeps the processing going, as if the request were still open. I believe it's because I'm ending the response inside the handler for the Redis client call, like this:
client.get("key", onResponse -> {
if (onResponse.succeeded() && onResponse.result() != null) {
//ending request from here
}
});
I realize that I could maybe reproduce the behavior as it was before if I could do something like this:
String cachedInfo = client.get("key").map(onResponse -> onResponse.result());
// endResponse
But as we know, vertx-redis is a semantic API and every method returns the same instance of RedisClient. I also thought about doing something like this:
private String cachedInfo;
...
client.get("key", onResponse -> {
if (onResponse.succeeded()) {
this.cachedInfo = onResponse.result();
}
});
if (cachedInfo != null) { // The value could be unset since the lambda is running in other thread
//end request
}
I really don't know what to do. Is there a way to return the contents of the AsyncResult to a variable, or maybe set a variable from it synchronously somehow? I've also been searching for ways to somehow stop the whole flow of the current request, but couldn't find any satisfactory, non-aggressive solution so far; I'm really open to that option as well.
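There is no safe way to read the AsyncResult into a local variable synchronously on the event loop; the usual options are to keep the request-ending logic inside the handler, or to bridge the callback into a CompletableFuture and compose on it. A minimal sketch of the bridging idea (the client here is a simplified stand-in, not the real vertx-redis API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

class CallbackBridge {
    // Simplified stand-in for a callback-style client such as vertx-redis;
    // the real client would invoke the callback later, on the event loop.
    static void getAsync(String key, Consumer<String> callback) {
        callback.accept("cached:" + key);
    }

    // Bridge the callback into a CompletableFuture so the result can be
    // composed (thenAccept, thenCompose, ...) without blocking the event loop.
    static CompletableFuture<String> getAsFuture(String key) {
        CompletableFuture<String> future = new CompletableFuture<>();
        getAsync(key, future::complete);
        return future;
    }
}
```

Note that even with the bridge, you should end the HTTP response inside a continuation (e.g. thenAccept), never by polling the variable, since the callback may run after the surrounding code has already returned.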
My channel pipeline contains several decoders, all of them operating on TextWebSocketFrame messages. Now my problem is that I have to choose the right decoder based on some content of the message.
Essentially, I have to parse a certain field in the message and then decide whether I want to proceed handling the message or pass it to the next encoder/handler.
Most people suggest using a single decoder to decode all messages in such a case, but my problem is that some decoders are added dynamically, and it would be a mess to put all the logic in a single decoder.
Currently the code looks like this:
@Override
protected void decode(ChannelHandlerContext ctx, TextWebSocketFrame msg, List<Object> out) throws Exception {
String messageAsJson = msg.text();
JsonObject jsonObject = JSON_PARSER.fromJson(messageAsJson, JsonObject.class);
JsonObject messageHeader = jsonObject.getAsJsonObject(MESSAGE_HEADER_FIELD);
String serviceAsString = messageHeader.get(MESSAGE_SERVICE_FIELD).getAsString();
String inboundTypeAsString = messageHeader.get(MESSAGE_TYPE_FIELD).getAsString();
Service service = JSON_PARSER.fromJson(serviceAsString, Service.class);
InboundType inboundType = JSON_PARSER.fromJson(inboundTypeAsString, InboundType.class);
if (service == Service.STREAMING) {
out.add(decodeQuotesMessage(inboundType, messageAsJson));
} else {
}
}
So basically I'd need some logic in the else branch to pass the message to the next handler in the pipeline.
I am aware that this approach is not the most efficient one, but the architecture of my service has a slow path (running on a different thread pool), which includes this logic, and a fast path. I can accept some slow code at this place.
In general, you need something like this:
if (service == Service.STREAMING) {
ctx.pipeline().addLast(new StreamingHandler());
} else {
ctx.pipeline().addLast(new OtherHandler());
}
out.add(decodeQuotesMessage(inboundType, messageAsJson));
ctx.pipeline().remove(this);
The logic behind this is the following:
You decoded header and you know now what flow you need to follow;
You add specific handler to the pipeline according to your header;
You add decoded message to 'out' list and thus you say "send this decoded message to next handler in pipeline, that would be handler defined in step 2 in case current handler is last in pipeline";
You remove the current handler from the pipeline to avoid handler duplication in case your protocol sends the same header again and again. However, this step is specific to your protocol and may not be necessary.
This is just the general approach, however, it really depends on your protocol flow.
I'm looking for an example like this but with a synchronous call. My program needs data from an external source and should wait until the response returns (or until a timeout).
The Play WS library is meant for asynchronous requests and this is good!
Using it ensures that your server is not going to be blocked and wait for some response (your client might be blocked but that is a different topic).
Whenever possible you should always opt for the async WS call. Keep in mind that you still get access to the result of the WS call:
public static Promise<Result> index() {
final Promise<Result> resultPromise = WS.url(feedUrl).get().map(
new Function<WS.Response, Result>() {
public Result apply(WS.Response response) {
return ok("Feed title:" + response.asJson().findPath("title"));
}
}
);
return resultPromise;
}
You just need to handle it a bit differently - you provide a mapping function - basically you are telling Play what to do with the result when it arrives. And then you move on and let Play take care of the rest. Nice, isn't it?
Now, if you really really really want to block, then you would have to use another library to make the synchronous request. There is a sync variant of the Apache HTTP Client - https://hc.apache.org/httpcomponents-client-ga/index.html
I also like the Unirest library (http://unirest.io/java.html) which actually sits on top of the Apache HTTP Client and provides a nicer and cleaner API - you can then do stuff like:
Unirest.post("http://httpbin.org/post")
.queryString("name", "Mark")
.field("last", "Polo")
.asJson()
As both are publicly available, you can add them as dependencies to your project by stating this in the build.sbt file.
Alternatively, you can just block the call and wait until you get the response, with a timeout if you want:
WS.Response response = WS.url(url)
.setHeader("Authorization","BASIC base64str")
.setContentType("application/json")
.post(requestJsonNode)
.get(20000); //20 sec
JsonNode resNode = response.asJson();
In newer versions of Play, the response does not have an asJson() method anymore. Instead, Jackson (or any other JSON mapper) must be applied to the body string:
final WSResponse r = ...;
Json.mapper().readValue(r.getBody(), Type.class);
I'm fairly new to Java (I'm using Java SE 7) and the JVM and trying to write an asynchronous controller using:
Tomcat 7
Spring MVC 4.1.1
Spring Servlet 3.0
I have a component that my controller is delegating some work to that has an asynchronous portion and returns a ListenableFuture. Ideally, I'd like to free up the thread that initially handles the controller response as I'm waiting for the async operation to return, hence the desire for an async controller.
I'm looking at returning a DeferredResult -- it seems pretty easy to bridge this with ListenableFuture -- but I can't seem to find any resources that explain how the response is delivered back to the client once the DeferredResult resolves.
Maybe I'm not fully grokking how an asynchronous controller is supposed to work, but could someone explain how the response gets returned to the client once the DeferredResult resolves? There has to be some thread that picks up the job of sending the response, right?
I recently used Spring's DeferredResult to excellent effect in a long-polling situation I coded. Focusing on the 'how' of the response getting back to the user is, I believe, not the correct way to think about the object. Depending upon where it's used, it returns messages to the user in exactly the same way as a regular, synchronous call would, only in a delayed, asynchronous manner. Again, the object does not define or propose a delivery mechanism; it is just a way to 'insert' an asynchronous response into existing channels.
Per your query: yes, it does so with a timeout of the user's specification. If the code completes before the timeout, the object returns the code's result via setResult. Otherwise, if the timeout fires before the result, the default, also set by the user, is returned. Either way, the object does not return anything (other than the object itself) until one of these mechanisms is called. The object then has to be discarded, as it cannot be reused.
In my case, I was using an HTTP request/response function that would wrap the returned response in a DeferredResult object providing a default response (asking for another packet from the client, so the browser would not time out) if the computation the code was working on did not finish before the timeout. Whenever the computation was complete, it would send the response via the setResult function call. In this situation both cases would simply use the HTTP response to send a packet back to the user. However, in neither case would the response go back to the user immediately.
In practice the object worked flawlessly and allowed me to implement an effective long-polling mechanism.
Here is a snippet of the code in my example:
@RequestMapping(method = RequestMethod.POST, produces = "application/text")
@ResponseBody
// public DeferredResult<String> onMessage(#RequestBody String message, HttpSession session) {
public DeferredResult<String> onMessage(InputStream is, HttpSession session) {
String message = convertStreamToString(is);
// HttpSession session = null;
messageInfo info = getMessageInfo(message);
String state = info.getState();
String id = info.getCallID();
DeferredResult<String> futureMessage =
new DeferredResult<>(refreshIntervalSecs * msInSec, getRefreshJsonMessage(id));
if(state != null && id != null) {
if(state.equals("REFRESH")) {
// Cache response for future and "swallow" call as it is restocking call
LOG.info("Refresh received for call " + id);
synchronized (lock) {
boolean isReplaceable = callsMap.containsKey(id) && callsMap.get(id).isSetOrExpired();
if (isReplaceable)
callsMap.put(id, futureMessage);
else {
LOG.warning("Refresh packet arrived on a non-existent call");
futureMessage.setResult(getExitJsonMessage(id));
}
}
} else if (state.equals("NEW")){
// Store response for future and pass the call onto the processing logic
LOG.info("New long-poll call received with id " + id);
ClientSupport cs = clientSupportMap.get(session.getId());
if(cs == null) {
cs = new ClientSupport(this, session.getId());
clientSupportMap.put(session.getId(), cs);
}
callsMap.put(id, futureMessage);
// *** IMPORTANT ****
// This method sets up a separate thread to do work
cs.newCall(message);
}
} else {
LOG.warning("Invalid call information");
// Return value immediately when return is called
futureMessage.setResult("");
}
return futureMessage;
}