Parallel DB/WS Calls


I am building a set of web services with the intent to aggregate similar data sets across multiple backends (through db calls and service calls). Some of the queries could take more than a couple of seconds to run, and if I stack these requests sequentially, there is a chance the total run time would be outside of the desired response time.
I am hoping to make the calls in parallel, collect all results and then aggregate. What is the best approach to tackling this?
The services will be deployed to Websphere 6.1 (so java 5, j2ee 1.4).
Any information would be greatly appreciated.

Look into the java.util.concurrent API. In particular, you can create an ExecutorService, submit Callables to it, and get back Futures; the Callables run asynchronously on the executor's threads.
Your code will look something like this:
ExecutorService exec = Executors.newCachedThreadPool();
Future<ReplyA> raFuture = exec.submit(new Callable<ReplyA>() {
    public ReplyA call() throws Exception {
        // call remote service here.
        return new ReplyA(...);
    }
});
Future<ReplyB> rbFuture = exec.submit(new Callable<ReplyB>() {
    public ReplyB call() throws Exception {
        // call remote service here.
        return new ReplyB(...);
    }
});
// Both calls are now running in parallel. get() blocks until the result is ready
// and throws ExecutionException if the call itself failed.
ReplyA replyA = raFuture.get();
ReplyB replyB = rbFuture.get();
exec.shutdown();
You can also use the timeout versions of get() so you can do something reasonable if the responses are taking too long. If you decide to take this path, you would probably be better served with the ExecutorService's invokeAll method, so the timeout will apply to all of the Callables as a group:
Callable<Reply> taskA = new Callable<Reply>() { ... }; // returns a ReplyA
Callable<Reply> taskB = new Callable<Reply>() { ... }; // returns a ReplyB
List<Callable<Reply>> tasks = Arrays.asList(taskA, taskB);
List<Future<Reply>> futures = exec.invokeAll(tasks, 20, TimeUnit.SECONDS);
for (Future<Reply> future : futures) {
    if (future.isCancelled()) {
        // the task did not finish before the timeout: deal with it
    } else {
        Reply reply = future.get();
        // do something with the reply.
    }
}
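For comparison, the timed get() mentioned above could be sketched roughly as follows. This is only an illustration: backendCall, getOrFallback and aggregate are hypothetical names, the sleeps stand in for real service calls, and lambdas are used for brevity (on Java 5 the Callables would be anonymous classes as in the snippets above).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedGetSketch {
    // Hypothetical stand-in for a backend call that needs delayMillis to answer.
    static Callable<String> backendCall(String name, long delayMillis) {
        return () -> {
            Thread.sleep(delayMillis);
            return name + "-reply";
        };
    }

    // Waits at most timeoutMillis for the reply; returns the fallback on timeout or failure.
    static String getOrFallback(Future<String> future, long timeoutMillis, String fallback) {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it stops waiting
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the call itself threw
        }
    }

    // Starts both calls first, then waits for each with its own timeout.
    static String aggregate() {
        ExecutorService exec = Executors.newCachedThreadPool();
        try {
            Future<String> fast = exec.submit(backendCall("A", 10));
            Future<String> slow = exec.submit(backendCall("B", 5_000));
            return getOrFallback(fast, 1_000, "A-timeout") + ","
                 + getOrFallback(slow, 200, "B-timeout");
        } finally {
            exec.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(aggregate());
    }
}
```

Note that with this approach the individual timeouts add up in the worst case, which is the reason invokeAll with a single group timeout is suggested above.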

Start daemon threads, let them do the job, join() them all, aggregate.
Another possible way is to place the queries into a queue and let MDBs handle one request at a time. Aggregation is a bit more complicated though: you have to get the results from another queue, accumulate them between replies, and handle the errors... come on, just use the threads! :)
RequestAThread rath = new RequestAThread(dataForRequestA);
RequestBThread rbth = new RequestBThread(dataForRequestB);
...
rath.start();
rbth.start();
...
rath.join();
ReplyA ra = rath.getReply();
rbth.join();
ReplyB rb = rbth.getReply();
Result r = aggregate(ra,rb);
Error handling by taste.
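The thread classes are not shown in this answer. A minimal sketch of what one might look like, assuming hypothetical names (RequestThread, callBoth) and a placeholder in place of the real DB or service call:

```java
public class JoinSketch {
    // Hypothetical worker: runs one request on its own thread and keeps the outcome.
    static class RequestThread extends Thread {
        private final String request;
        private String reply;      // safe to read after join(), which orders the write before the read
        private Exception error;

        RequestThread(String request) {
            this.request = request;
            setDaemon(true);       // daemon threads do not keep the JVM alive
        }

        @Override
        public void run() {
            try {
                reply = "reply-to-" + request; // placeholder for the real DB or service call
            } catch (Exception e) {
                error = e;         // remember the failure for the caller
            }
        }

        String getReply() {
            if (error != null) {
                throw new IllegalStateException("request failed: " + request, error);
            }
            return reply;
        }
    }

    // Starts both requests, waits for both, then aggregates.
    static String callBoth(String a, String b) {
        RequestThread ta = new RequestThread(a);
        RequestThread tb = new RequestThread(b);
        ta.start();
        tb.start();
        try {
            ta.join();
            tb.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return ta.getReply() + "+" + tb.getReply();
    }

    public static void main(String[] args) {
        System.out.println(callBoth("a", "b"));
    }
}
```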

Related

Async method followed by a parallelly executed method in Java 8

After spending the day learning about the Java concurrency API, I still don't quite get how I could create the following functionality with the help of the CompletableFuture and ExecutorService classes:
When I get a request on my REST endpoint I need to:
Start an asynchronous task (includes DB query, filtering, etc.), which will give me a list of String URLs at the end
In the meantime, respond to the REST caller with HTTP OK, indicating that the request was received and I'm working on it
When the asynchronous task is finished, I need to send HTTP requests (with the payload, the REST caller gave me) to the URLs I got from the job. At most the number of URLs would be around a 100, so I need these to happen in parallel.
Ideally I would have some synchronized counter which counts how many of the HTTP requests succeeded or failed, so I can send this information back to the REST caller (the URL I need to send it back to is provided inside the request payload).
I already have the building blocks written (methods like getMatchingObjectsFromDB(callerPayload), getURLs(resultOfgetMachingObjects), sendHttpRequest(Url, methodType), etc.); I just can't quite figure out how to tie step 1 and step 3 together. I would use CompletableFuture.supplyAsync() for step 1, and then I would need the CompletableFuture.thenCompose method to start step 3, but it's not clear to me how parallelism can be achieved with this API. It is rather intuitive with ExecutorService executor = Executors.newWorkStealingPool(); though, which creates a thread pool based on the number of available processors, and the tasks can be submitted via the invokeAll() method.
How can I use CompletableFuture and ExecutorService together? Or how can I guarantee parallel execution of a list of tasks with CompletableFuture? A demonstrating code snippet would be much appreciated. Thanks.

You should use join() to wait for all the tasks to finish.
Create a Map<String, Boolean> result to store the outcome of each request.
In your controller:
public void yourControllerMethod() {
CompletableFuture.runAsync(() -> yourServiceMethod());
}
In your service:
// Execute your logic to get List<String> urls
List<CompletableFuture<Void>> futures = urls.stream().map(url ->
    CompletableFuture.supplyAsync(() -> requestUrl(url)) // requestUrl is assumed to return true on success
        .thenAcceptAsync(requestResult -> result.put(url, requestResult))
).collect(toList()); // you now have a list of CompletableFutures
Then use join() to wait for all of them (remember that your service is already running on its own thread):
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
Then you can determine which requests succeeded or failed by reading the result map.
Edit
Please post your code so that others can understand the problem as well.
I've read your code and here are the needed modification:
When this for loop was not commented out, the receiver webserver got
the same request twice,
I don't understand the purpose of this for loop.
Sorry, I did not clean up my previous answer. That was just a temporary idea in my head that I forgot to remove at the end :D
Just remove it from your code.
// allOf() only accepts arrays, so the List needed to be converted
/* The code never gets over this part (I know allOf() is a blocking call), even long after when the receiver got the HTTP request
with the correct payload. I'm not sure yet where exactly the code gets stuck */
Your map should be a ConcurrentHashMap because you're modifying it concurrently later.
Map<String, Boolean> result = new ConcurrentHashMap<>();
If your code still does not work as expected, I suggest removing the parallelStream() part.
CompletableFuture and parallelStream both use the common ForkJoinPool by default, so I think the pool is exhausted.
You should also create your own pool for your CompletableFutures:
Executor pool = Executors.newFixedThreadPool(10);
And execute your request using that pool:
CompletableFuture.supplyAsync(YOURTASK, pool).thenAcceptAsync(Yourtask, pool)
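Putting these suggestions together (dedicated pool, ConcurrentHashMap, allOf().join()), a self-contained sketch might look like the following. The names FanOutSketch and sendToAll are hypothetical, the Predicate stands in for the real HTTP call, and the pool size of 10 is simply taken from the line above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;

public class FanOutSketch {
    // Calls sender for every URL in parallel on a dedicated pool and records success per URL.
    // sender is a hypothetical stand-in for the real HTTP request.
    static Map<String, Boolean> sendToAll(List<String> urls, Predicate<String> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        Map<String, Boolean> result = new ConcurrentHashMap<>();
        try {
            CompletableFuture<?>[] futures = urls.stream()
                .map(url -> CompletableFuture
                    .supplyAsync(() -> sender.test(url), pool) // run on our pool, not the common pool
                    .exceptionally(ex -> false)                // a thrown exception counts as a failure
                    .thenAccept(ok -> result.put(url, ok)))
                .toArray(CompletableFuture[]::new);
            CompletableFuture.allOf(futures).join();           // wait for every request to finish
        } finally {
            pool.shutdown();
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Boolean> r = sendToAll(Arrays.asList("a", "bad", "c"), u -> !u.equals("bad"));
        long succeeded = r.values().stream().filter(Boolean::booleanValue).count();
        System.out.println(succeeded + " of " + r.size() + " requests succeeded");
    }
}
```

The success and failure counts asked for in the question can be derived from the returned map, as main does here.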

For the sake of completeness, here are the relevant parts of the code, after clean-up and testing (thanks to Mạnh Quyết Nguyễn):
Rest controller class:
@POST
@Path("publish")
public Response publishEvent(PublishEvent eventPublished) {
/*
Payload verification, etc.
*/
//First send the event to the right subscribers, then send the resulting hashmap<String url, Boolean subscriberGotTheRequest> back to the publisher
CompletableFuture.supplyAsync(() -> EventHandlerService.propagateEvent(eventPublished)).thenAccept(map -> {
if (eventPublished.getDeliveryCompleteUri() != null) {
String callbackUrl = Utility
.getUri(eventPublished.getSource().getAddress(), eventPublished.getSource().getPort(), eventPublished.getDeliveryCompleteUri(), isSecure,
false);
try {
Utility.sendRequest(callbackUrl, "POST", map);
} catch (RuntimeException e) {
log.error("Callback after event publishing failed at: " + callbackUrl);
e.printStackTrace();
}
}
});
//return OK while the event publishing happens in async
return Response.status(Status.OK).build();
}
Service class:
private static List<EventFilter> getMatchingEventFilters(PublishEvent pe) {
//query the database, filter the results based on the method argument
}
private static boolean sendRequest(String url, Event event) {
//send the HTTP request to the given URL, with the given Event payload, return true if the response is positive (status code starts with 2), false otherwise
}
static Map<String, Boolean> propagateEvent(PublishEvent eventPublished) {
// Get the event relevant filters from the DB
List<EventFilter> filters = getMatchingEventFilters(eventPublished);
// Create the URLs from the filters
List<String> urls = new ArrayList<>();
for (EventFilter filter : filters) {
String url;
try {
boolean isSecure = filter.getConsumer().getAuthenticationInfo() != null;
url = Utility.getUri(filter.getConsumer().getAddress(), filter.getPort(), filter.getNotifyUri(), isSecure, false);
} catch (ArrowheadException | NullPointerException e) {
e.printStackTrace();
continue;
}
urls.add(url);
}
Map<String, Boolean> result = new ConcurrentHashMap<>();
Stream<CompletableFuture> stream = urls.stream().map(url -> CompletableFuture.supplyAsync(() -> sendRequest(url, eventPublished.getEvent()))
.thenAcceptAsync(published -> result.put(url, published)));
CompletableFuture.allOf(stream.toArray(CompletableFuture[]::new)).join();
log.info("Event published to " + urls.size() + " subscribers.");
return result;
}
Debugging this was a bit harder than usual; sometimes the code just seemed to stop. To fix this, I put only the absolutely necessary code into the async task and made sure that code was thread-safe. I had also, at first, marked my methods inside EventHandlerService with the synchronized keyword, which kept the CompletableFuture inside the service method from executing, since it runs on a thread pool by default.
A piece of logic marked with synchronized becomes a synchronized block, so only one thread at a time can execute it while holding the same lock.

How do I complete numerous jobs with a few threads giving each the same timeout to complete?

An idea I am trying to implement is the following.
I have 1000 urls to download data from to use it for post processing (say, calculating some statistics).
I don't really need all of the downloads to finish successfully, but as many as possible.
I assume that some of the locations might be unavailable, either returning nothing valuable (e.g., HTTP 503) or taking more than TO=10 seconds to process a request.
I have T=5 threads to process the URLs in parallel, giving each download the same timeout TO.
As soon as one completes (which I expect to happen far earlier than TO elapses), I aggregate some statistics (a very fast operation) and start the next download (if any).
The solution I have come up with so far is
ExecutorService executorService = Executors.newFixedThreadPool(T);
ExecutorCompletionService<MyResult> completionService = new ExecutorCompletionService<>(executorService);
urls.forEach(url -> {
Callable<MyResult> callable = () -> new MyResult(url);
completionService.submit(callable);
});
for (int i = 0; i < urls.size(); i++) {
Future<MyResult> resultFuture = completionService.poll(TO, TimeUnit.SECONDS);
if (resultFuture == null)
continue;
MyResult myResult = resultFuture.get();
myAggregate(myResult.getRate());
}
It looks somewhat like what I am trying to achieve. But, for instance, it neither gives every download the same timeout nor cancels the Futures properly. So, what is the correct solution?

Try the invokeAll method: put your Callables in a List and call invokeAll() on your ExecutorService, passing the timeout as the second and third arguments. Tasks that have not completed when the timeout expires are cancelled.
executorService.invokeAll(callableList, 20, TimeUnit.SECONDS);
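Applied to the download scenario, this could be sketched as below. BatchTimeoutSketch and runWithDeadline are hypothetical names. One caveat worth stating: the timeout passed to invokeAll covers the whole batch rather than each download individually, so a per-request limit would more likely have to come from the HTTP client's own connect and read timeouts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BatchTimeoutSketch {
    // Runs all tasks on `threads` workers and returns only the results that
    // finished successfully before the batch timeout expired.
    static <T> List<T> runWithDeadline(List<Callable<T>> tasks, int threads,
                                       long timeout, TimeUnit unit) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<T> done = new ArrayList<>();
        try {
            // invokeAll blocks until all tasks are done or the timeout expires;
            // unfinished tasks are cancelled on return.
            for (Future<T> f : pool.invokeAll(tasks, timeout, unit)) {
                if (f.isCancelled()) {
                    continue;              // did not finish in time
                }
                try {
                    done.add(f.get());     // returns immediately, the task is already done
                } catch (ExecutionException e) {
                    // the task itself failed (e.g. HTTP 503): skip it
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return done;
    }
}
```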

Spring AMQP convertSendAndReceive with Restful multi threaded producer

I am trying to decide whether convertSendAndReceive is going to work for the following use case:
I have a RESTful web service that needs to make RPC calls and get a response back in order to service the request. I've never used the reply-to functionality in spring-amqp or RabbitMQ for that matter.
Will this work or am I headed down a path fraught with peril?
EDIT: My concern is whether the thread producing the message will get the correct corresponding response back and not get another thread's response.
I have added to a test that's listed in the spring-amqp documentation called JavaConfigFixedReplyQueueTests. I added the following test case:
(My connectionFactory bean is different but that's just to specify our SSL configuration for my company's rabbitmq instance so I'm not listing that here. All the existing tests passed with that change.)
@Test
public void testReplyContainer_multiple_threads() throws Exception
{
fixedReplyQRabbitTemplate.setReplyTimeout(-1);
// limit the number of actual threads
int poolSize = 100;
ExecutorService service = Executors.newFixedThreadPool(poolSize);
List<Future<?>> futures = new ArrayList<>();
for(int n = 0; n < 1000; n++)
{
Future<?> f = service.submit(makeNumberedRunnable(n));
futures.add(f);
}
// wait for all tasks to complete before continuing
for(Future<?> f : futures)
{
f.get();
}
// shut down the executor service so that this thread can exit
service.shutdownNow();
}
private Runnable makeNumberedRunnable(int n)
{
return new Runnable()
{
@Override
public void run()
{
assertEquals("FOO" + n, fixedReplyQRabbitTemplate.convertSendAndReceive("foo" + n));
}
};
}
Basically my test is creating many threads and making sure that the correct thread specific response is returned. The test passes just fine which gives me a little confidence in my approach.
I'm hoping Gary Russell can chime in or maybe Artem Bilan to give me their expert opinion. However, I welcome anyone knowledgeable in this area to give me their advice.
Thanks for your time.

All of the ...sendAndReceive(...) methods will properly correlate the request/reply so that the proper reply is returned on the requesting thread.

concurrent http request to independent web services

I'm trying to find a simple way to send HTTP requests concurrently to different web services. Each request is completely independent of the others.
Currently, my implementation looks like this (just a simplification, don't pay attention to the design).
Let's say I have a List<Query> queries;
public class Service {
private List<HttpClient> httpClients; // one for each web service
public List<QueryResult> doQueries(List<Query> queries) {
ExecutorService service = Executors.... ;
List<Callable<QueryResult>> .... ;
for ( Query q : queries ) {
Future<> .....
}
service.invokeAll(...) ;
***// what should i do from here ?
// how should i wait all those tasks to finish ?***
}
}
My question is specifically that:
how do I wait?

You seem to create a list of Callables, each of which returns a result of type QueryResult, as is clear from List<Callable<QueryResult>>. invokeAll() blocks until all the tasks have completed and returns one Future per task, so use code like this:
List<Future<QueryResult>> futures = executorService.invokeAll(callables);
for(Future<QueryResult> future : futures){
System.out.println("future.get = " + future.get());
}
executorService.shutdown();
If you want to set some maximum time to wait for result you can use awaitTermination method as well. IMO ExecutorCompletionService is more suited for your requirements and you can read about it in my article at dzone.
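The ExecutorCompletionService alternative mentioned above hands results back in completion order rather than submission order. A rough sketch, with hypothetical names (CompletionOrderSketch, collectAsCompleted) and generic Callables in place of the real queries:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletionOrderSketch {
    // Submits every task, then collects results in the order the tasks finish.
    static <T> List<T> collectAsCompleted(List<Callable<T>> tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<T> cs = new ExecutorCompletionService<>(pool);
        List<T> results = new ArrayList<>();
        try {
            for (Callable<T> task : tasks) {
                cs.submit(task);
            }
            for (int i = 0; i < tasks.size(); i++) {
                try {
                    results.add(cs.take().get()); // take() blocks until the next task finishes
                } catch (ExecutionException e) {
                    // this query failed: skip it and keep collecting the others
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return results;
    }
}
```

This lets the caller start processing fast responses while slower services are still working.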

You have 3 choices:
Execute each request on a separate thread. Since each thread consumes a lot of memory, you can get an OutOfMemoryError if more than about 100 requests run in parallel.
Limit the number of threads, as akhil_mittal suggested. The number of concurrent requests will then be limited as well.
Use an async I/O library, e.g. NIO.2. These allow thousands of simultaneous requests with moderate memory consumption.

How to terminate CXF webservice call within Callable upon Future cancellation

Edit
This question has gone through a few iterations by now, so feel free to look through the revisions to see some background information on the history and things tried.
I'm using a CompletionService together with an ExecutorService and Callables to concurrently call a number of functions on a few different web services through CXF-generated code. These services all contribute different information towards a single set of information I'm using for my project. The services, however, can fail to respond for a prolonged period of time without throwing an exception, prolonging the wait for the combined set of information.
To counter this I'm running all the service calls concurrently, and after a few minutes would like to terminate any calls that have not yet finished, and preferably log which ones weren't done yet, either from within the Callable or by throwing a detailed Exception.
Here's some highly simplified code to illustrate what I'm doing already:
private Callable<List<Feature>> getXXXFeatures(final WiwsPortType port,
final String accessionCode) {
return new Callable<List<Feature>>() {
@Override
public List<Feature> call() throws Exception {
List<Feature> features = new ArrayList<Feature>();
//getXXXFeatures are methods of the WS Proxy
//that can take anywhere from second to never to return
for (RawFeature raw : port.getXXXFeatures(accessionCode)) {
Feature ft = convertFeature(raw);
features.add(ft);
}
if (Thread.currentThread().isInterrupted())
log.error("XXX was interrupted");
return features;
}
};
}
And the code that concurrently starts the WS calls:
WiwsPortType port = new Wiws().getWiws();
List<Future<List<Feature>>> ftList = new ArrayList<Future<List<Feature>>>();
//Counting wrapper around CompletionService,
//so I could implement ccs.hasRemaining()
CountingCompletionService<List<Feature>> ccs =
new CountingCompletionService<List<Feature>>(threadpool);
ftList.add(ccs.submit(getXXXFeatures(port, accessionCode)));
ftList.add(ccs.submit(getYYYFeatures(port, accessionCode)));
ftList.add(ccs.submit(getZZZFeatures(port, accessionCode)));
List<Feature> allFeatures = new ArrayList<Feature>();
while (ccs.hasRemaining()) {
//Low for testing, eventually a little more lenient
Future<List<Feature>> polled = ccs.poll(5, TimeUnit.SECONDS);
if (polled != null)
allFeatures.addAll(polled.get());
else {
//Still jobs remaining, but unresponsive: Cancel them all
int jobsCanceled = 0;
for (Future<List<Feature>> job : ftList)
if (job.cancel(true))
jobsCanceled++;
log.error("Canceled {} feature jobs because they took too long",
jobsCanceled);
break;
}
}
The problem I'm having with this code is that the Callables aren't actually canceled while waiting for port.getXXXFeatures(...) to return; they somehow keep running. As the if (Thread.currentThread().isInterrupted()) log.error("XXX was interrupted"); statements show, the interrupted flag is set by the time port.getXXXFeatures returns, but that check is only reached after the web service call completes normally; the call itself is not interrupted when I call cancel.
Can anyone tell me what I am doing wrong and how I can stop the running CXF Webservice call after a given time period, and register this information in my application?
Best regards, Tim

Edit 3 New answer.
I see these options:
Post your problem to the Apache CXF project as a feature request
Fix CXF yourself and expose some features.
Look for options for asynchronous WS call support within the Apache CXF
Consider switching to a different WS provider (JAX-WS?)
Do your WS call yourself using RESTful API if the service supports it (e.g. plain HTTP request with parameters)
For über experts only: use true threads/thread group and kill the threads with unorthodox methods.

The CXF docs have some instructions for setting the read timeout on the HttpURLConnection:
http://cwiki.apache.org/CXF20DOC/client-http-transport-including-ssl-support.html
That would probably meet your needs. If the server doesn't respond in time, an exception is raised and the Callable would get the exception. (Except there is a bug where it MAY hang instead. I cannot remember if that was fixed for 2.2.2 or if it's just in the SNAPSHOTS right now.)
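As background on why cancel(true) alone does not end the call in the question: Future.cancel(true) only interrupts the worker thread, and a blocking call that does not respond to interruption simply carries on. The sketch below illustrates this with plain java.util.concurrent; the loop that swallows InterruptedException is an assumed stand-in for a blocking web service call, and CancelSketch and stillRunningAfterCancel are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CancelSketch {
    // Returns true if the task was still running after cancel(true) reported success.
    static boolean stillRunningAfterCancel() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        try {
            Future<?> f = pool.submit(() -> {
                started.countDown();
                // Stand-in for a blocking call that ignores interruption.
                while (release.getCount() > 0) {
                    try {
                        release.await();
                    } catch (InterruptedException ignored) {
                        // swallow the interrupt and keep waiting
                    }
                }
                finished.countDown();
            });
            started.await();
            boolean cancelled = f.cancel(true);          // interrupts the worker thread
            boolean running = finished.getCount() == 1;  // the task body has not completed
            release.countDown();                         // let the "call" return normally
            finished.await();
            return cancelled && f.isCancelled() && running;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(stillRunningAfterCancel());
    }
}
```

The Future reports itself as cancelled while the work continues underneath, which is why a transport-level read timeout such as the one described above is the more reliable way to bound the call.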
