I'm trying to implement an Excel export for some amount of data. After 5 minutes I receive a 504 Gateway Timeout, while in the backend the process continues its work.
The whole service needs approximately 15 minutes to finish. Is there anything I can do to prevent this? I don't have access to the servers in production.
The app is Spring Boot with an Oracle database. I'm using Apache POI for this export.
One common way to handle these kinds of problems is to have the first request start the process in the background, and when the file has been generated, download the results from another place. The first request finishes immediately, and the user can then check another view to see if the file has been generated, and download the results.
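Here is a minimal sketch of that pattern in plain Java. It assumes the export can be wrapped in a `Supplier<byte[]>` (for example, building the POI workbook and writing it to a byte array). `ExportJobs`, `start`, `status` and `result` are hypothetical names, not Spring or POI APIs. In a Spring Boot app you would expose them through three small endpoints: start, status and download.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical job registry for the "start now, download later" pattern.
class ExportJobs {
    enum Status { RUNNING, DONE, FAILED }

    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Map<String, CompletableFuture<byte[]>> jobs = new ConcurrentHashMap<>();

    // "Start export" endpoint: submits the work and returns a job id immediately,
    // long before the gateway's 5-minute timeout.
    String start(Supplier<byte[]> export) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, CompletableFuture.supplyAsync(export, pool));
        return id;
    }

    // "Status" endpoint: the client polls this until it is no longer RUNNING.
    Status status(String id) {
        CompletableFuture<byte[]> job = jobs.get(id);
        if (job == null) {
            throw new IllegalArgumentException("Unknown job " + id);
        }
        if (!job.isDone()) {
            return Status.RUNNING;
        }
        return job.isCompletedExceptionally() ? Status.FAILED : Status.DONE;
    }

    // "Download" endpoint: only call once status() is DONE; join() returns the bytes.
    byte[] result(String id) {
        return jobs.get(id).join();
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

Each HTTP call returns within milliseconds, so the gateway never sees a long-running request. Keeping results in memory keeps the sketch short. For 15-minute exports, writing the file to disk or to a table (and cleaning it up later) is probably more robust.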
You can export the data in smaller chunks. Run a test with say 10K records, make a note of the id of the last record and repeat the export starting at the next record. If 10K finishes quickly, then try 50K. If you have a timer that might come in handy. Good luck.
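The chunking idea above is usually implemented with keyset pagination. You remember the last id and ask for the rows after it, rather than using OFFSET, which gets slower the further you page. Here is a minimal sketch. It assumes the rows have a unique, increasing numeric id. `fetchAfter` and `writeChunk` are hypothetical hooks that you would back with your own query and POI code.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

// Keyset ("seek") pagination: fetch at most chunkSize rows with id > lastId, hand them
// to the writer, remember the last id, repeat. fetchAfter is a hypothetical stand-in for
// a query such as:
//   SELECT id, value FROM my_table WHERE id > ? ORDER BY id FETCH FIRST ? ROWS ONLY
// (row-limiting syntax available from Oracle 12c).
class ChunkedExporter {
    static final class Row {
        final long id;
        final String value;
        Row(long id, String value) {
            this.id = id;
            this.value = value;
        }
    }

    // Returns the total number of rows handed to writeChunk.
    static long exportAll(BiFunction<Long, Integer, List<Row>> fetchAfter,
                          int chunkSize,
                          Consumer<List<Row>> writeChunk) {
        long lastId = Long.MIN_VALUE; // start before the smallest possible id
        long total = 0;
        while (true) {
            List<Row> chunk = fetchAfter.apply(lastId, chunkSize);
            if (chunk.isEmpty()) {
                break;
            }
            writeChunk.accept(chunk);                 // e.g. append the rows to the sheet
            total += chunk.size();
            lastId = chunk.get(chunk.size() - 1).id;  // requires rows ordered by id
            if (chunk.size() < chunkSize) {
                break;                                // a short chunk means we reached the end
            }
        }
        return total;
    }
}
```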
I had the same situation, where the network-call timeout was not in our hands. I guess something in between has a 5-minute limit for receiving the first byte of the response, and once that first byte has arrived the timeout no longer applies.
My solution was this: assume you have a controller and a query layer that talks to the database. Make the process asynchronous. The call to the controller should only trigger the async execution and return a success status immediately, without waiting, while the execution continues in the background. Futures work well here, and with a future that supports callbacks (such as Guava's ListenableFuture or Java 8's CompletableFuture) you can also handle the result once it has completed.
You can implement this with Guava's ListenableFuture and Futures.addCallback in Java 8, like below (exportData is a ListenableFuture<String> and service is an Executor):
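```java
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;

Futures.addCallback(
    exportData,
    new FutureCallback<String>() {
        @Override
        public void onSuccess(String message) {
            System.out.println(message);
        }

        @Override
        public void onFailure(Throwable thrown) {
            // Log the failure; calling getCause() alone would silently discard it
            thrown.printStackTrace();
        }
    },
    service);
```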
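and in Scala:
```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

val result = Future {
  exportData(data)
}
result.onComplete {
  case Success(message) => println(s"Got the callback result: $message")
  case Failure(e)       => e.printStackTrace()
}
```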
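If you'd rather not add Guava, the JDK's own CompletableFuture (Java 8+) gives the same callback style. Here is a minimal sketch. `exportData` is a hypothetical stand-in for the real export, and the caller supplies the ExecutorService.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

class AsyncExportCallbacks {
    // Hypothetical stand-in for the real (slow) export.
    static String exportData(String data) {
        if (data.isEmpty()) {
            throw new IllegalArgumentException("nothing to export");
        }
        return "exported " + data.length() + " chars";
    }

    // The controller would call this and return at once; the callback fires later.
    static CompletableFuture<String> startExport(String data, ExecutorService pool) {
        return CompletableFuture
                .supplyAsync(() -> exportData(data), pool)  // runs on a pool thread
                .whenComplete((message, error) -> {         // success/failure callback
                    if (error == null) {
                        System.out.println("Got the callback result: " + message);
                    } else {
                        error.printStackTrace();
                    }
                });
    }
}
```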
I am developing a Java web application using Spring.
What I would like to do is this: after the user gets to a page, the code starts running a function every 10 seconds, keeping track of the time the last action was performed.
I tried to do so with a Scheduler but it starts running immediately - and not only after the user gets to a page.
```java
// Note: Spring only allows no-argument methods to be annotated with @Scheduled
@Scheduled(fixedRate = 60000)
public void run(String param) {
    //just an example of action to be performed repeatedly
    System.out.println("Previously performed action was " + new SimpleDateFormat("dd/MM/yyyy HH:mm:ss").format(previousActionTime) + " with " + param);
    //update previousActionTime
    previousActionTime.setSeconds(previousActionTime.getSeconds() + 10);
}
```
Moreover, I don't know a convenient way to store the time when the last automated action was performed.
The scheduler should be somehow activated when browsing to the page:
```java
@RequestMapping(value = "/hellopage", method = { RequestMethod.POST, RequestMethod.GET })
public String hellopage(HttpServletRequest request, HttpServletResponse response) {
    // Activate scheduler
    run(request.getParameter("param1"));
    ...
}
```
The scheduler (or whatever performs the automated actions) should stop as soon as the user visits the same web page that triggered the automated actions again. It should also run in the background without blocking any other code. To be precise, I cannot simply put a while loop with a sleep in the function mapped to the page URL, because the page has to do other things.
Any help?
Consider using ScheduledExecutorService.scheduleAtFixedRate for this, since Spring's scheduler runs independently of any user's request (which you have already observed and noted in the question).
You can use shutdownNow to terminate the scheduler once the user's session is no longer valid or a new request is received. To achieve this, you could maintain a cache of previous executors keyed by user id (or any other relevant information) to identify the instance that should be invalidated.
As an alternative you could use Timer and TimerTask if more fine grained control is required (however not recommended as noted here)
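Here is a minimal sketch of that answer in plain Java. It assumes a user id is available (for example, from the session). `PerUserScheduler` and its methods are hypothetical names. In Spring you would keep one instance as a singleton bean, call `start` or `stop` from the hellopage handler, and call `stop` from a session-expiry listener.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: one scheduled executor per user, kept in a map keyed by user id.
class PerUserScheduler {
    private final Map<String, ScheduledExecutorService> executors = new ConcurrentHashMap<>();

    // Start (or restart) the repeating action for this user. The first run happens
    // one period after the page visit, not at application startup.
    void start(String userId, Runnable action, long periodMillis) {
        ScheduledExecutorService fresh = Executors.newSingleThreadScheduledExecutor();
        ScheduledExecutorService previous = executors.put(userId, fresh);
        if (previous != null) {
            previous.shutdownNow(); // a new visit replaces the old schedule
        }
        // If action throws, later runs are silently suppressed, so catch inside it.
        fresh.scheduleAtFixedRate(action, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Stop the user's repeating action, e.g. on the next visit or when the session ends.
    boolean stop(String userId) {
        ScheduledExecutorService previous = executors.remove(userId);
        if (previous == null) {
            return false;
        }
        previous.shutdownNow();
        return true;
    }

    boolean isRunning(String userId) {
        return executors.containsKey(userId);
    }
}
```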
There are two common ways of achieving this.
The first is to run your timer client-side, in JavaScript, and have it make an AJAX/WebSocket/whatever call. This has many advantages: once the user navigates away from your site, the timer stops, and you're not tying up server-side resources, so your application will scale much more cleanly. This is by far the cleanest solution if your timer is linked to a single user.
The second is to use a message queue: put a message on the queue and have an asynchronous process check those messages, ideally aggregating multiple client sessions in a single database request. You need to figure out how to detect sessions timing out and remove their messages from the queue.
This approach is best when your timer is looking at information that's not tightly connected to the current user.
I have a Spring Boot web application with the functionality to update an entity called StudioLinking. This entity describes a temporary, mutable, descriptive logical link between two IoT devices, for which my web app is the cloud service. The links between these devices are ephemeral in nature, but the StudioLinking entity persists in the database for reporting purposes. StudioLinking is stored in the SQL-based datastore in the conventional way, using Spring Data/Hibernate. From time to time, this StudioLinking entity will be updated with new information from a REST API. When that link is updated, the devices need to respond (change colors, volume, etc.). Right now this is handled by polling every 5 seconds, but this creates lag between when a human user enters an update into the system and when the IoT devices actually update. It could be as little as a millisecond or up to 5 seconds! Clearly, increasing the polling frequency is unsustainable, and the vast majority of the time there are no updates at all!
So I am trying to develop another REST API in the same application with HTTP long polling. It will return when a given StudioLinking entity is updated or after a timeout. The listeners do not support WebSocket or anything similar, which leaves me with long polling. Long polling can leave a race condition: with consecutive messages, one message may be "lost" because it arrives between HTTP requests. While the connection is closing and reopening, a new "update" might come in and not be "noticed" if I used Pub/Sub.
It is important to note that this "subscribe to updates" API should only ever return the LATEST and CURRENT version of the StudioLinking. It should only do so when there is an actual update, or if an update happened since the last check-in. The "subscribe to updates" client will initially POST an API request to set up a new listening session and pass that along so the server knows who they are. This is needed because multiple devices may need to monitor updates to the same StudioLinking entity. I believe I can accomplish this by using separately named consumers in the Redis XREAD. (Keep this in mind for later in the question.)
After hours of research, I believe the way to accomplish this is using Redis Streams.
I have found these two links regarding Redis Streams in Spring Data Redis:
https://www.vinsguru.com/redis-reactive-stream-real-time-producing-consuming-streams-with-spring-boot/
https://medium.com/@amitptl.in/redis-stream-in-action-using-java-and-spring-data-redis-a73257f9a281
I have also read this link about long polling. Both of these links just use a sleep timer during the long poll, which is fine for demonstration purposes, but obviously I want to do something useful.
https://www.baeldung.com/spring-deferred-result
And both these links were very helpful. Right now I have no problem figuring out how to publish the updates to the Redis Stream - (this is untested "pseudo-code" but I don't anticipate having any issues implementing this)
```java
// In my StudioLinking Entity
@PostUpdate
public void postToRedis() {
    StudioLinking link = this;
    ObjectRecord<String, StudioLinking> record = StreamRecords.newRecord()
        .ofObject(link)
        .withStreamKey(streamKey); //I am creating a stream for each individual linking probably?
    this.redisTemplate
        .opsForStream()
        .add(record)
        .subscribe(System.out::println);
    atomicInteger.incrementAndGet();
}
```
But I fall flat when it comes to subscribing to said stream: So basically what I want to do here - please excuse the butchered pseudocode, it is for idea purposes only. I am well aware that the code is in no way indicative of how the language and framework actually behaves :)
```java
// linkId refers to the StudioLinking that the requester wants to monitor
// updatesId is a unique token to track individual consumers in Redis
@GetMapping("/subscribe-to-updates/{linkId}/{updatesId}")
public DeferredResult<ResponseEntity<?>> subscribeToUpdates(@PathVariable("linkId") Integer linkId,
                                                            @PathVariable("updatesId") Integer updatesId) {
    LOG.info("Received async-deferredresult request");
    DeferredResult<ResponseEntity<?>> output = new DeferredResult<>(5000L);
    output.onTimeout(() ->
        output.setErrorResult(
            ResponseEntity.status(HttpStatus.REQUEST_TIMEOUT)
                .body("IT WAS NOT UPDATED!")));
    //----------------------------------------------
    // Here is where I want to subscribe to a stream and wait for a message.
    // listenerContainer is a StreamMessageListenerContainer bean, started once at
    // startup (not per request). Within one consumer group each message goes to only
    // ONE consumer, so each device (updatesId) gets its own group here; the group must
    // already exist (XGROUP CREATE, e.g. when the listening session is set up).
    //----------------------------------------------
    String streamKey = String.valueOf(linkId);
    String group = String.valueOf(updatesId);
    // lastConsumed() (">") also delivers updates that arrived between two long polls
    Subscription subscription = listenerContainer.receiveAutoAck(
        Consumer.from(group, group),
        StreamOffset.create(streamKey, ReadOffset.lastConsumed()),
        message -> output.setResult(ResponseEntity.ok("IT WAS UPDATED!")));
    // Stop listening once this request is answered (update, timeout or error)
    output.onCompletion(subscription::cancel);
    LOG.info("servlet thread freed");
    return output;
}
```
So is there a good explanation to how I would go about this? I think the answer lies within https://docs.spring.io/spring-data/redis/docs/current/api/org/springframework/data/redis/core/ReactiveRedisTemplate.html but I am not a big enough Spring power user to really understand the terminology within Java Docs (the Spring documentation is really good, but the JavaDocs is written in the dense technical language which I appreciate but don't quite understand yet).
There are two more hurdles to my implementation:
My understanding of Spring is not at 100% yet. I haven't reached that a-ha moment where I fully understand why all these beans are floating around. I think this is the key to why I am not getting things here... The Redis configuration is floating around in the Spring ether, and I am not grasping how to just call it. I really need to keep investigating this (it is a huge hurdle with Spring for me).
These StudioLinking entities are short-lived, so I need to do some cleanup too. I will implement this later once I get the whole thing off the ground; I do know it will be needed.
Why don't you use a blocking polling mechanism? There's no need for the fancier parts of spring-data-redis. Just use a simple blocking read of 5 seconds, so this call might take around 6 seconds or so. You can decrease or increase the blocking timeout.
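```java
import java.time.Duration;
import java.util.List;
import org.springframework.data.redis.connection.stream.Consumer;
import org.springframework.data.redis.connection.stream.MapRecord;
import org.springframework.data.redis.connection.stream.ReadOffset;
import org.springframework.data.redis.connection.stream.StreamOffset;
import org.springframework.data.redis.connection.stream.StreamReadOptions;
import org.springframework.data.redis.core.StreamOperations;
import org.springframework.util.CollectionUtils;

class LinkStatus {
    private final boolean updated;
    LinkStatus(boolean updated) {
        this.updated = updated;
    }
    public boolean isUpdated() { // getter so Jackson can serialize the field
        return updated;
    }
}

// linkId refers to the StudioLinking that the requester wants to monitor
// updatesId is a unique token to track individual consumers in Redis
@GetMapping("/subscribe-to-updates/{linkId}/{updatesId}")
public LinkStatus subscribeToUpdates(
        @PathVariable("linkId") Integer linkId, @PathVariable("updatesId") Integer updatesId) {
    StreamOperations<String, String, String> op = redisTemplate.opsForStream();
    Consumer consumer = Consumer.from("test-group", "test-consumer");
    // auto-ack, blocking stream read of at most 1 record, with a timeout of 5 seconds
    StreamReadOptions readOptions =
            StreamReadOptions.empty().autoAcknowledge().block(Duration.ofSeconds(5)).count(1);
    // group reads need lastConsumed() (">"); "$" (StreamOffset.latest) only works for plain XREAD
    List<MapRecord<String, String, String>> records =
            op.read(consumer, readOptions, StreamOffset.create("test-stream", ReadOffset.lastConsumed()));
    return new LinkStatus(!CollectionUtils.isEmpty(records));
}
```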
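Independently of Redis, the lost-update race described in the question goes away if each poll carries the version (or stream entry ID) that the client last saw. The server answers immediately when something newer exists, and otherwise waits up to the timeout. This is the same idea as passing the last-seen entry ID to a blocking XREAD. Here is a minimal in-memory sketch. `VersionedLink` is hypothetical, and in the real app the version could come from the stream entry ID instead of a counter.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory stand-in for one StudioLinking's updates (no Redis involved).
// Every update bumps a version number; a long poll passes the version it last saw.
class VersionedLink {
    static final class Snapshot {
        final long version;
        final String state;
        Snapshot(long version, String state) {
            this.version = version;
            this.state = state;
        }
    }

    private long version = 0;
    private String state = "";

    synchronized void update(String newState) {
        state = newState;
        version++;
        notifyAll(); // wake every waiting poller (several devices may watch one link)
    }

    // Returns the LATEST snapshot as soon as version > lastSeenVersion, which also covers
    // updates that arrived between two polls; returns null if nothing changed in time.
    synchronized Snapshot poll(long lastSeenVersion, long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (version <= lastSeenVersion) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return null;
            }
            TimeUnit.NANOSECONDS.timedWait(this, remaining); // releases the monitor while waiting
        }
        return new Snapshot(version, state);
    }
}
```

The client keeps the `version` from each response and sends it with its next poll. Two updates between polls collapse into one response carrying only the latest state, which matches the "LATEST and CURRENT" requirement.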
I have 2 data sources: DB and server. When I start the application, I call the method from the repository (MyRepository):
```java
public Observable<List<MyObj>> fetchMyObjs() {
    Observable<List<MyObj>> localData = mLocalDataSource.fetchMyObjs();
    Observable<List<MyObj>> remoteData = mRemoteDataSource.fetchMyObjs();
    return Observable.concat(localData, remoteData);
}
```
I subscribe to it as follows:
```java
mMyRepository.fetchMyObjs()
        .compose(applySchedulers())
        .subscribe(
                myObjs -> {
                    //do something
                },
                throwable -> {
                    //handle error
                }
        );
```
I expect that the data from the database will be loaded faster, and when the download of data from the network is completed, I will simply update the data in Activity.
When the Internet is connected, everything works well. But when the application is opened without a network connection, mRemoteDataSource.fetchMyObjs() throws UnknownHostException and the whole Observable terminates. The subscriber never receives localData, although the logs show that the data was read from the database. When I call fetchMyObjs() on MyRepository again (via SwipeRefresh), the subscriber does receive localData.
How can I make sure the subscriber still receives localData when the application starts with the network off?
Try some of error handling operators:
https://github.com/ReactiveX/RxJava/wiki/Error-Handling-Operators
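I'd guess onErrorResumeNext() will be fine, but you have to test it yourself. Maybe something like this would work for you (falling back to an empty Observable, so concat completes normally after the local data):
```java
Observable<List<MyObj>> remoteData = mRemoteDataSource.fetchMyObjs()
        .onErrorResumeNext(Observable.empty()); // RxJava 1/2; in RxJava 3 this overload is onErrorResumeWith
```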
Additionally, I am not in a position to judge whether your idea is right, but maybe it's worth thinking about rebuilding this flow. Ignoring errors is not the right thing to do, that's for sure ;)
You can observe your chain with observeOn(Scheduler scheduler, boolean delayError) and delayError set to true.
delayError - indicates if the onError notification may not cut ahead of onNext notification on the other side of the scheduling boundary. If true a sequence ending in onError will be replayed in the same order as was received from upstream
I'm playing around with Vert.x and am quite new to event-loop-based servers, as opposed to the thread-per-connection model.
```java
public void start(Future<Void> fut) {
    vertx
        .createHttpServer()
        .requestHandler(r -> {
            LocalDateTime start = LocalDateTime.now();
            System.out.println("Request received - " + start.format(DateTimeFormatter.ISO_DATE_TIME));
            final MyModel model = new MyModel();
            try {
                for (int i = 0; i < 10000000; i++) {
                    //some simple operation
                }
                model.data = start.format(DateTimeFormatter.ISO_DATE_TIME) + " - " + LocalDateTime.now().format(DateTimeFormatter.ISO_DATE_TIME);
            } catch (Exception e1) {
                // TODO Auto-generated catch block
                e1.printStackTrace();
            }
            r.response().end(
                new Gson().toJson(model)
            );
        })
        .listen(4568, result -> {
            if (result.succeeded()) {
                fut.complete();
            } else {
                fut.fail(result.cause());
            }
        });
    System.out.println("Server started ..");
}
```
I'm just trying to simulate a long running request handler to understand how this model works.
What I've observed is the so called event loop is blocked until my first request completes. Whatever little time it takes, subsequent request is not acted upon until the previous one completes.
Obviously I'm missing a piece here and that's the question that I have here.
Edited based on the answers so far:
Isn't accepting all requests considered to be asynchronous? If a new connection can only be accepted when the previous one is cleared off, how is it async?
Assume a typical request takes anywhere between 100 ms to 1 sec (based on the kind and nature of the request). So it means, the event loop can't accept a new connection until the previous request finishes (even if it winds up in a second). And if I as a programmer have to think through all these and push such request handlers to a worker thread, then how does it differ from a thread/connection model?
I'm just trying to understand how is this model better from a traditional thread/conn server models? Assume there is no I/O op or all the I/O op are handled asynchronously? How does it even solve c10k problem, when it can't start all concurrent requests in parallel and have to wait till the previous one terminates?
Even if I decide to push all these operations to a worker thread(pooled), then I'm back to the same problem isn't it? Context switching between threads?
Edits and topping this question for a bounty
I still don't completely understand how this model can be called asynchronous.
Vert.x has an async JDBC client (asynchronous is the keyword), which I tried to adapt with RxJava.
Here is a code sample (Relevant portions)
```java
server.requestStream().toObservable().subscribe(req -> {
    LocalDateTime start = LocalDateTime.now();
    System.out.println("Request for " + req.absoluteURI() + " received - " + start.format(DateTimeFormatter.ISO_DATE_TIME));
    jdbc.getConnectionObservable().subscribe(
        conn -> {
            // Now chain some statements using flatmap composition
            Observable<ResultSet> resa = conn.queryObservable("SELECT * FROM CALL_OPTION WHERE UNDERLYING='NIFTY'");
            // Subscribe to the final result
            resa.subscribe(resultSet -> {
                req.response().end(resultSet.getRows().toString());
                System.out.println("Request for " + req.absoluteURI() + " Ended - " + LocalDateTime.now().format(DateTimeFormatter.ISO_DATE_TIME));
            }, err -> {
                System.out.println("Database problem");
                err.printStackTrace();
            });
        },
        // Could not connect
        err -> {
            err.printStackTrace();
        }
    );
});
server.listen(4568);
```
The select query takes approximately 3 seconds to return the complete table dump.
When I fire concurrent requests (tried with just 2), I see that the second request waits completely for the first one to complete.
If the JDBC select is asynchronous, isn't it a fair expectation for the framework to handle the second connection while it waits for the select query to return?
The Vert.x event loop is, in fact, a classical event loop that exists on many platforms. Of course, most explanations and docs can be found for Node.js, as it's the most popular framework based on this architectural pattern. Take a look at one more-or-less good explanation of the mechanics of the Node.js event loop. The Vert.x tutorial also has a fine explanation of "Don’t call us, we’ll call you" and of "Verticles".
Edit for your updates:
First of all, when you are working with an event loop, the main thread should work very quickly for all requests. You shouldn't do any long job in this loop. And of course, you shouldn't wait for a response to your call to the database.
- Schedule a call asynchronously
- Assign a callback (handler) to result
- The blocking work itself runs on a worker thread, not the event-loop thread. When it finishes, the callback is invoked (in Vert.x, by default back on the event loop) and can, for example, write the response to the socket.
So, your operations in the event loop should just schedule all asynchronous operations with callbacks and go to the next request without awaiting any results.
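Those three steps can be sketched with plain JDK executors. This is not how Vert.x is implemented internally, only the same shape. One single-threaded executor plays the event loop, a small pool runs the blocking work, and the callback is posted back onto the loop thread. `MiniEventLoop` and `executeBlocking` are hypothetical names; the latter only loosely mirrors Vert.x's own executeBlocking.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical mini event loop: one loop thread for short handlers, a worker pool
// for blocking work, and result callbacks posted back onto the loop thread.
class MiniEventLoop {
    private final ExecutorService loop = Executors.newSingleThreadExecutor();
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    // Queue a handler on the loop thread; it must return quickly.
    void onRequest(Runnable handler) {
        loop.execute(handler);
    }

    // Schedule blocking work plus a callback; the calling handler returns immediately.
    <T> void executeBlocking(Supplier<T> work, Consumer<T> callback) {
        CompletableFuture.supplyAsync(work, workers)  // runs on a worker thread
                .thenAcceptAsync(callback, loop);     // callback runs on the loop thread
    }

    void shutdown() throws InterruptedException {
        workers.shutdown();
        workers.awaitTermination(5, TimeUnit.SECONDS);
        loop.shutdown();
        loop.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With three requests whose blocking part takes 200 ms each, all three "received" steps run on the loop thread first, and the results arrive afterwards, because the loop never waits for the workers.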
Assume a typical request takes anywhere between 100 ms to 1 sec (based on the kind and nature of the request).
In that case, your request has some computationally expensive parts or accesses I/O, and your code in the event loop shouldn't wait for the results of these operations.
I'm just trying to understand how is this model better from a traditional thread/conn server models? Assume there is no I/O op or all the I/O op are handled asynchronously?
When you have many concurrent requests and a traditional programming model, you create a thread per request. What will these threads do? Mostly wait for I/O operations (for example, a result from the database). That's a waste of resources. In the event-loop model, you have one main thread that schedules operations and a preallocated number of worker threads for long tasks. None of these workers actually waits for a response; they can execute other code while waiting for an I/O result (this can be implemented with callbacks or by periodically checking the status of the I/O jobs currently in progress). I recommend going through Java NIO and NIO.2 to understand how this async I/O can actually be implemented inside the framework. Green threads are a closely related concept that is also worth understanding. Green threads and coroutines are a kind of hidden event loop that tries to achieve the same thing: fewer system threads, because a system thread can be reused while a green thread is waiting for something.
How does it even solve c10k problem, when it can't start all concurrent requests parallel and have to wait till the previous one terminates?
We certainly don't wait in the main thread to send the response to the previous request: get a request, schedule the long or I/O tasks, move on to the next request.
Even if I decide to push all these operations to a worker thread(pooled), then I'm back to the same problem isn't it? Context switching between threads?
If you do everything right, no. Even better, you get good data locality and predictable execution flow. One CPU core executes your short event loop and schedules async work without context switching, and nothing more. Other cores make the calls to the database and return the responses, and only that. Switching between callbacks, or checking different channels for I/O status, doesn't require any system-thread context switching; it all happens within one worker thread. So we have one worker thread per core, and that one system thread awaits/checks result availability from, for example, multiple database connections. Revisit the Java NIO concept to understand how this can work. (The classical NIO example is a proxy server that accepts many parallel connections (thousands), proxies requests to other remote servers, listens for their responses and sends them back to clients, all using one or two threads.)
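To make the NIO idea concrete, here is a minimal sketch in which one thread serves many connections through a `java.nio.channels.Selector`. To keep it short, it is a plain echo server rather than a proxy, and `NioEchoServer` is a hypothetical name. Vert.x itself sits on Netty, whose NIO transport is built on the same Selector mechanism.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.Iterator;

// Single-threaded NIO echo server: one thread multiplexes every connection
// through a Selector instead of dedicating one thread per client.
class NioEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;

    NioEchoServer() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0 = pick any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                selector.select(100); // wait up to 100 ms for any channel to become ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
                        }
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        ByteBuffer buf = (ByteBuffer) key.attachment();
                        if (client.read(buf) < 0) { // client closed its side
                            client.close();
                            continue;
                        }
                        buf.flip();
                        client.write(buf); // echo; unwritten bytes stay in buf (no OP_WRITE handling)
                        buf.compact();
                    }
                }
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        } finally {
            for (SelectionKey key : new ArrayList<>(selector.keys())) {
                try { key.channel().close(); } catch (IOException ignored) { }
            }
            try { selector.close(); } catch (IOException ignored) { }
        }
    }
}
```

There is no thread per connection here: every client is handled by the single loop thread, which only does work when the selector reports a channel as ready.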
About your code, I made a sample project for you to demonstrate that everything works as expected:
```java
public class MyFirstVerticle extends AbstractVerticle {
    @Override
    public void start(Future<Void> fut) {
        JDBCClient client = JDBCClient.createShared(vertx, new JsonObject()
            .put("url", "jdbc:hsqldb:mem:test?shutdown=true")
            .put("driver_class", "org.hsqldb.jdbcDriver")
            .put("max_pool_size", 30));
        client.getConnection(conn -> {
            if (conn.failed()) {throw new RuntimeException(conn.cause());}
            final SQLConnection connection = conn.result();
            // create a table
            connection.execute("create table test(id int primary key, name varchar(255))", create -> {
                if (create.failed()) {throw new RuntimeException(create.cause());}
            });
        });
        vertx
            .createHttpServer()
            .requestHandler(r -> {
                int requestId = new Random().nextInt();
                System.out.println("Request " + requestId + " received");
                client.getConnection(conn -> {
                    if (conn.failed()) {throw new RuntimeException(conn.cause());}
                    final SQLConnection connection = conn.result();
                    connection.execute("insert into test values ('" + requestId + "', 'World')", insert -> {
                        // query some data with arguments
                        connection
                            .queryWithParams("select * from test where id = ?", new JsonArray().add(requestId), rs -> {
                                connection.close(done -> {if (done.failed()) {throw new RuntimeException(done.cause());}});
                                System.out.println("Result " + requestId + " returned");
                                r.response().end("Hello");
                            });
                    });
                });
            })
            .listen(8080, result -> {
                if (result.succeeded()) {
                    fut.complete();
                } else {
                    fut.fail(result.cause());
                }
            });
    }
}
```
```java
@RunWith(VertxUnitRunner.class)
public class MyFirstVerticleTest {
    private Vertx vertx;

    @Before
    public void setUp(TestContext context) {
        vertx = Vertx.vertx();
        vertx.deployVerticle(MyFirstVerticle.class.getName(),
            context.asyncAssertSuccess());
    }

    @After
    public void tearDown(TestContext context) {
        vertx.close(context.asyncAssertSuccess());
    }

    @Test
    public void testMyApplication(TestContext context) {
        for (int i = 0; i < 10; i++) {
            final Async async = context.async();
            vertx.createHttpClient().getNow(8080, "localhost", "/",
                response -> response.handler(body -> {
                    context.assertTrue(body.toString().contains("Hello"));
                    async.complete();
                })
            );
        }
    }
}
```
Output:
Request 1412761034 received
Request -1781489277 received
Request 1008255692 received
Request -853002509 received
Request -919489429 received
Request 1902219940 received
Request -2141153291 received
Request 1144684415 received
Request -1409053630 received
Request -546435082 received
Result 1412761034 returned
Result -1781489277 returned
Result 1008255692 returned
Result -853002509 returned
Result -919489429 returned
Result 1902219940 returned
Result -2141153291 returned
Result 1144684415 returned
Result -1409053630 returned
Result -546435082 returned
So, we accept a request - schedule a request to the database, go to the next request, we consume all of them and send a response for each request only when everything is done with the database.
About your code sample, I see two possible issues. First, it looks like you don't close() the connection, which is important for returning it to the pool. Second, how is your pool configured? If there is only one free connection, these requests will serialize while waiting for it.
I recommend printing a timestamp for both requests to find where they get serialized; something is making the calls in the event loop blocking. Or... check that your test really sends the requests in parallel, rather than sending the next one only after receiving the response to the previous one.
How is this asynchronous? The answer is in your question itself:
What I've observed is the so called event loop is blocked until my first request completes. Whatever little time it takes, subsequent request is not acted upon until the previous one completes
The idea is that instead of having a new thread serve each HTTP request, the same thread is used for all of them, and you have blocked that thread with your long-running task.
The goal of the event loop is to save the time spent context switching from one thread to another, and to use idle CPU time while a task is waiting on I/O or network activity. If handling your request involved another I/O or network operation, e.g. fetching data from a remote MongoDB instance, your thread would not be blocked during that time. Instead, another request would be served by the same thread, which is the ideal use case for the event-loop model (assuming you have concurrent requests coming to your server).
If you have long-running tasks that do not involve network/I/O operations, you should consider using a thread pool instead; if you block the main event-loop thread itself, other requests will be delayed. In other words, for long-running tasks you accept the price of context switching so that the server stays responsive.
EDIT:
The way a server can handle requests can vary:
1) Spawn a new thread for each incoming request (in this model context switching is high, and there is the additional cost of spawning a new thread every time)
2) Use a thread pool to serve the requests (the same set of threads serves the requests, and extra requests get queued up)
3) Use an event loop (a single thread for all the requests, with negligible context switching; negligible rather than zero, because some other threads still run, e.g. to queue up the incoming requests)
First of all, context switching is not bad; it is required to keep an application server responsive. But too much context switching can become a problem if the number of concurrent requests gets very high (roughly more than 10k). If you want to understand this in more detail, I recommend reading the C10K article.
Assume a typical request takes anywhere between 100 ms to 1 sec (based on the kind and nature of the request). So it means, the event loop can't accept a new connection until the previous request finishes (even if it winds up in a second).
If you need to respond to a large number of concurrent requests (more than 10k), I would consider anything over 500 ms a long-running operation. Second, as I said, there are some threads and some context switching involved, e.g. to queue up incoming requests, but context switching among threads is greatly reduced because there are very few threads at a time. Third, if a network/I/O operation is involved in resolving the first request, the second request gets a chance to be resolved before the first one is, and this is where this model plays well.
And if I as a programmer have to think through all these and push such request handlers to a worker thread, then how does it differ from a thread/connection model?
Vert.x tries to give you the best of threads and event loops, so as a programmer you can decide how to make your application efficient in both scenarios, i.e. long-running operations with and without network/I/O.
I'm just trying to understand how is this model better from a traditional thread/conn server models? Assume there is no I/O op or all the I/O op are handled asynchronously? How does it even solve c10k problem, when it can't start all concurrent requests in parallel and have to wait till the previous one terminates?
The above explanation should answer this.
Even if I decide to push all these operations to a worker thread (pooled), then I'm back to the same problem isn't it? Context switching between threads?
As I said, both have pros and cons. Vert.x gives you both models, and depending on your use case you have to choose what is ideal for your scenario.
In these sorts of processing engines, you are supposed to turn long-running tasks into asynchronously executed operations (there is a methodology for doing this), so that the critical thread can complete as quickly as possible and return to perform another task. In other words, any I/O operations are passed to the framework, which calls you back when the I/O is done.
The framework is asynchronous in the sense that it supports you producing and running these asynchronous tasks, but it doesn't change your code from being synchronous to asynchronous.
Edit
This question has gone through a few iterations by now, so feel free to look through the revisions to see some background information on the history and things tried.
I'm using a CompletionService together with an ExecutorService and a Callable to concurrently call a number of functions on a few different web services through CXF-generated code. These services all contribute different information towards a single set of information I'm using for my project. However, the services can fail to respond for a prolonged period of time without throwing an exception, prolonging the wait for the combined set of information.
To counter this, I'm running all the service calls concurrently, and after a few minutes I would like to terminate any calls that have not yet finished. Preferably I'd also log which ones weren't done yet, either from within the callable or by throwing a detailed exception.
Here's some highly simplified code to illustrate what I'm doing already:
```java
private Callable<List<Feature>> getXXXFeatures(final WiwsPortType port,
        final String accessionCode) {
    return new Callable<List<Feature>>() {
        @Override
        public List<Feature> call() throws Exception {
            List<Feature> features = new ArrayList<Feature>();
            //getXXXFeatures are methods of the WS Proxy
            //that can take anywhere from a second to forever to return
            for (RawFeature raw : port.getXXXFeatures(accessionCode)) {
                Feature ft = convertFeature(raw);
                features.add(ft);
            }
            if (Thread.currentThread().isInterrupted())
                log.error("XXX was interrupted");
            return features;
        }
    };
}
```
And the code that concurrently starts the WS calls:
```java
WiwsPortType port = new Wiws().getWiws();
List<Future<List<Feature>>> ftList = new ArrayList<Future<List<Feature>>>();
//Counting wrapper around CompletionService,
//so I could implement ccs.hasRemaining()
CountingCompletionService<List<Feature>> ccs =
        new CountingCompletionService<List<Feature>>(threadpool);
ftList.add(ccs.submit(getXXXFeatures(port, accessionCode)));
ftList.add(ccs.submit(getYYYFeatures(port, accessionCode)));
ftList.add(ccs.submit(getZZZFeatures(port, accessionCode)));
List<Feature> allFeatures = new ArrayList<Feature>();
while (ccs.hasRemaining()) {
    //Low for testing, eventually a little more lenient
    Future<List<Feature>> polled = ccs.poll(5, TimeUnit.SECONDS);
    if (polled != null)
        allFeatures.addAll(polled.get());
    else {
        //Still jobs remaining, but unresponsive: Cancel them all
        int jobsCanceled = 0;
        for (Future<List<Feature>> job : ftList)
            if (job.cancel(true))
                jobsCanceled++;
        log.error("Canceled {} feature jobs because they took too long",
                jobsCanceled);
        break;
    }
}
```
The problem I'm having with this code is that the Callables aren't actually cancelled while waiting for port.getXXXFeatures(...) to return; they somehow keep running. As the if (Thread.currentThread().isInterrupted()) log.error("XXX was interrupted"); statements show, the interrupted flag is only seen after port.getXXXFeatures returns, i.e. after the web service call has completed normally, instead of the call being interrupted when I called cancel.
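The behaviour described above can be reproduced without CXF. Future.cancel(true) only sets the worker thread's interrupt flag. Code that never checks the flag keeps running, and a classic blocking socket read (which is what an HTTP-based web service call typically ends up in) generally does not respond to it. Here is a minimal sketch in which a busy loop stands in for the uninterruptible WS call; `CancelDemo` is a hypothetical name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class CancelDemo {
    // Returns true if cancel(true) "succeeded" and yet the task body still ran to the end.
    static boolean taskFinishedAfterCancel() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean finished = new AtomicBoolean(false);
        Future<?> call = pool.submit(() -> {
            started.countDown();
            long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(300);
            while (System.nanoTime() < end) {
                // stands in for a blocking WS call that never checks the interrupt flag
            }
            finished.set(true);
        });
        started.await();
        boolean cancelled = call.cancel(true);      // sets the worker's interrupt flag, nothing more
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS); // waits for the "cancelled" task to finish
        return cancelled && call.isCancelled() && finished.get();
    }
}
```

cancel(true) returns true and the Future reports isCancelled(), yet the task body still runs to completion; only its result is discarded.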
Can anyone tell me what I am doing wrong and how I can stop the running CXF Webservice call after a given time period, and register this information in my application?
Best regards, Tim
Edit 3: new answer.
I see these options:
- Post your problem to Apache CXF as a feature request.
- Fix Apache CXF yourself and expose some features.
- Look for options for asynchronous WS call support within Apache CXF.
- Consider switching to a different WS provider (JAX-WS?).
- Do the WS call yourself using a RESTful API, if the service supports it (e.g. a plain HTTP request with parameters).
- For über experts only: use true threads/thread groups and kill the threads with unorthodox methods.
The CXF docs have some instructions for setting the read timeout on the HttpURLConnection:
http://cwiki.apache.org/CXF20DOC/client-http-transport-including-ssl-support.html
That would probably meet your needs. If the server doesn't respond in time, an exception is raised and the callable gets the exception. (There is one caveat: a bug where it MAY hang instead. I cannot remember whether that was fixed for 2.2.2 or if it's only in the SNAPSHOTs right now.)
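At the socket level, a read timeout works like this: instead of blocking forever, the read fails with SocketTimeoutException, which then surfaces in the callable as an exception. Here is a minimal JDK-only sketch using a local server that accepts connections but never replies. `ReadTimeoutDemo` is hypothetical. In CXF the corresponding setting is the ReceiveTimeout of the HTTP conduit's client policy; check the linked page for the exact configuration in your version.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// A server that lets the connection be established but never answers, like the
// unresponsive web service. With a read timeout, the client's read fails instead of hanging.
class ReadTimeoutDemo {
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silent = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silent.getLocalPort())) {
            client.setSoTimeout(timeoutMillis); // socket-level equivalent of HttpURLConnection.setReadTimeout
            InputStream in = client.getInputStream();
            try {
                in.read();
                return false; // received data or EOF, which the silent server never sends
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }
}
```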