Threading in Servlets

I am working on a servlet that can take a few hours to complete the request. However, the client calling the servlet is only interested in knowing whether the request has been received by the servlet or not. The client doesn't want to wait hours before it gets any kind of response from the servlet. Also since calling the servlet is a blocking call, the client cannot proceed until it receives the response from the servlet.
To avoid this, I am thinking of launching a new thread in the servlet code. The thread launched by the servlet will do the time-consuming processing, allowing the servlet to return a response to the client very quickly. But I am not sure if this is an acceptable way of working around the blocking nature of servlet calls. I have looked into NIO, but it does not seem guaranteed to work in every servlet container, as the servlet container has to be NIO-based as well.

What you need is a job scheduler, because a scheduler that persists its jobs can make sure a job gets finished even if the server is restarted.
Take a look at the open-source Java job schedulers, most notably Quartz.

Your solution is correct, but creating threads directly in enterprise applications is considered bad practice. It is better to use a thread pool or a JMS queue.
You have to take into account what should happen if the server goes down during processing, how to react when many requests (think hundreds or even thousands) arrive at the same time, and so on. So you have chosen the right direction, but it is a bit more complicated.

A thread isn't bad but I recommend throwing this off to an executor pool as a task. Better yet a long running work manager. It's not a bad practice to return quickly like you plan. I would recommend providing some sort of user feedback indicating where the user can find information about the long running job. So:
Create a job representing the work task with a unique ID
Send the job to your background handler object (that contains an executor)
Build a url for the unique job id.
Return a page describing where they can get the result
The page with the result will have to coordinate with this background job manager. While the job is computing, this page can describe the progress. When it's done, the page can display the results of the long-running job.
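The steps above can be sketched with a plain ExecutorService. This is only an illustration under assumptions: the class name JobManager, the /jobs/ URL path, and the status strings are made up for the example, and an in-memory map like this does not survive a server restart.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical background handler: owns a bounded executor and tracks jobs by ID.
class JobManager {
    private final ExecutorService executor = Executors.newFixedThreadPool(4);
    private final Map<String, Future<String>> jobs = new ConcurrentHashMap<>();

    // Create a job with a unique ID and hand it to the executor.
    String submit(Callable<String> work) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, executor.submit(work));
        return id;
    }

    // URL the servlet can return to the client (the path is an assumption).
    String statusUrl(String id) {
        return "/jobs/" + id;
    }

    // Used by the result page: reports progress or the finished result.
    String describe(String id) {
        Future<String> job = jobs.get(id);
        if (job == null) return "UNKNOWN";
        if (!job.isDone()) return "RUNNING";
        try {
            return "DONE: " + job.get();
        } catch (Exception e) {
            return "FAILED: " + e;
        }
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

The servlet would call submit(), write statusUrl(id) into the response, and return right away; the result page would call describe(id).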

Related

Multithreading with Jersey

Here are two links which seem to be contradicting each other. I'd sooner trust the docs:
Link 1
Request processing on the server works by default in a synchronous processing mode
Link 2
It already is multithreaded.
My question:
Which is correct? Can it be both synchronous and multithreaded?
Why do the docs say the following?:
in cases where a resource method execution is known to take a long time to compute the result, server-side asynchronous processing model should be used
If the docs are correct, why is the default action synchronous? All requests are asynchronous on client-side javascript by default for user experience, it would make sense then that the default action for server-side should also be asynchronous too.
If the client does not need to serve requests in a specific order, then who cares how "EXPENSIVE" the operation is. Shouldn't all operations simply be asynchronous?

Request processing on the server works by default in a synchronous processing mode
Each request is processed on a separate thread. The request is considered synchronous because that request holds up the thread until the request is finished processing.
It already is multithreaded.
Yes, the server (container) is multi-threaded. For each request that comes in, a thread is taken from the thread pool, and the request is tied to that particular thread.
in cases where a resource method execution is known to take a long time to compute the result, server-side asynchronous processing model should be used
Yes, so that we don't hold up the container thread. There are only so many threads in the container thread pool to handle requests. If we are holding them all up with long-running requests, the container may run out of threads, blocking other requests from coming in. In asynchronous processing, Jersey hands the thread back to the container and handles the request processing itself in its own thread pool until the work is complete; it then passes the response up to the container, which sends it back to the client.
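To make the difference concrete, here is a small simulation that uses only java.util.concurrent. It is not Jersey's API: the "container" pool with a single thread stands in for the container's request threads, the "workers" pool stands in for a separate processing pool, and all names are invented for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Simulation only: "container" stands in for the container's request threads,
// "workers" for a separate pool that runs long tasks.
class HandOffDemo {
    final ExecutorService container = Executors.newFixedThreadPool(1);
    final ExecutorService workers = Executors.newFixedThreadPool(2);

    // Synchronous style: the container thread runs the work and is held until it ends.
    CompletableFuture<String> handleSync(Supplier<String> work) {
        return CompletableFuture.supplyAsync(work, container);
    }

    // Asynchronous style: the container thread only schedules the work, then is free again.
    CompletableFuture<String> handleAsync(Supplier<String> work) {
        CompletableFuture<String> response = new CompletableFuture<>();
        container.execute(() ->
            CompletableFuture.supplyAsync(work, workers).whenComplete((value, error) -> {
                if (error != null) response.completeExceptionally(error);
                else response.complete(value);
            }));
        return response;
    }

    void shutdown() {
        container.shutdown();
        workers.shutdown();
    }
}
```

With one container thread, a slow request sent through handleAsync does not stop a later quick request from being served, whereas a slow request sent through handleSync would hold the only container thread until it finished.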
If the client does not need to serve requests in a specific order, then who cares how "EXPENSIVE" the operation is.
Not really sure what the client has to do with anything here. Or at least in the context of how you're asking the question. Sorry.
Shouldn't all operations simply be asynchronous?
Not necessarily, if all the requests are quick. You could make an argument for it, but that would require performance testing and numbers you can compare before making a decision. Every system is different.

Java Long Polling: Separate Thread?

Because of browser compatibility issues, I have decided to use long polling for a real time syncing and notification system. I use Java on the backend and all of the examples I've found thus far have been PHP. They tend to use while loops and a sleep method. How do I replicate this sort of thing in Java? There is a Thread.sleep() method, which leads me to...should I be using a separate thread for each user issuing a poll? If I don't use a separate thread, will the polling requests be blocking up the server?

[Update]
First of all, yes, it is certainly possible to write a straightforward long polling request handler. The request comes in to the server, then in your handler you loop or block until the information you need is available, then you end the loop and provide the information. Just realize that for each long polling client you will be tying up a thread. This may be fine, and perhaps this is the way you should start. However, if your web server becomes so popular that the sheer number of blocked threads turns into a performance problem, consider an asynchronous solution that lets you keep a large number of client requests pending (each request stays open and gets no response until there is useful data) without tying up one or more threads per client.
[original]
The Servlet 3.0 spec provides a standard for doing this kind of asynchronous processing. Google "servlet 3.0 async". Tomcat 7 supports this. I'm guessing Jetty does also, but I have not used it.
Basically in your servlet request handler, when you realize you need to do some "long" polling, you can call a method to create an asynchronous context. Then you can exit the request handler and your thread is freed up, however the client is still blocking on the request. There is no need for any sleep or wait.
The trick is storing the async context somewhere "convenient". Then, when something happens in your app and you want to push data to the client, you go find that context, get the response object from it, write your content and invoke complete. The response is sent back to the client without you having to tie up a thread for each client.
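The "store it somewhere convenient" part can be sketched without a servlet container. In the sketch below a CompletableFuture stands in for the stored async context (in a real Servlet 3.0 handler you would keep the AsyncContext returned by request.startAsync() instead), and the class name, the client ID key, and the empty-string release value are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for a store of pending async contexts, keyed by a hypothetical client ID.
class PendingPolls {
    private final Map<String, CompletableFuture<String>> waiting = new ConcurrentHashMap<>();

    // Called from the request handler: park the request and return without blocking.
    CompletableFuture<String> park(String clientId) {
        CompletableFuture<String> pending = new CompletableFuture<>();
        CompletableFuture<String> previous = waiting.put(clientId, pending);
        if (previous != null) previous.complete(""); // release an older poll from the same client
        return pending;
    }

    // Called when the application has data: complete the parked request, if any.
    boolean push(String clientId, String message) {
        CompletableFuture<String> pending = waiting.remove(clientId);
        return pending != null && pending.complete(message);
    }
}
```

No thread sleeps or waits here: park() returns immediately, and push() is the only point where the pending request gets its answer.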

Not sure this is the best solution for what you want, but usually if you want to do something at periodic intervals in Java you use the ScheduledExecutorService. There is a good example at the top of the API document. The TimeUnit enum is great because it lets you specify the period easily and clearly, so you can have a task run every x minutes, hours, etc.
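A minimal sketch of that approach, assuming a placeholder task that only counts its own runs (the class and method names are made up for the example):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PeriodicPoller {
    // Runs a task at a fixed period until it has run at least `runs` times,
    // then stops the scheduler and returns how many runs happened.
    static int pollAtLeast(int runs, long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger count = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(runs);
        // Initial delay 0, then once every `period` units.
        scheduler.scheduleAtFixedRate(() -> {
            count.incrementAndGet(); // placeholder for the real periodic work
            done.countDown();
        }, 0, period, unit);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return count.get();
    }
}
```

For example, PeriodicPoller.pollAtLeast(3, 1, TimeUnit.SECONDS) runs the task about once a second and returns after the third run.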

Synchronous, Asynchronous and Command Client Requests with GWT and GAE

In designing my GWT/GAE app, it has become evident to me that my client-side (GWT) will be generating three types of requests:
Synchronous - "answer me right now! I'm important and require a real-time response!!!"
Asynchronous - "answer me when you can; I need to know the answer at some point but it's really not all that urgent."
Command - "I don't need an answer. This isn't really a request, it's just a command to do something or process something on the server-side."
My game plan is to implement my GWT code so that I can specify, for each specific server-side request (note: I've decided to go with RequestFactory over traditional GWT-RPC for reasons outside the scope of this question), which type of request it is:
SynchronousRequest - Synchronous (from above); sends a command and eagerly awaits a response that it then uses to update the client's state somehow
AsynchronousRequest - Asynchronous (from above); makes an initial request and somehow - either through polling or the GAE Channel API, is notified when the response is finally received
CommandRequest - Command (from above); makes a server-side request and does not wait for a response (even if the server fails to, or refuses to, oblige the command)
I guess my intention with SynchronousRequest is not to produce a totally blocking request, however it may block the user's ability to interact with a specific Widget or portion of the screen.
The added kicker here is this: GAE strongly enforces a timeout on all of its frontend instances (60 seconds). Backend instances have much more relaxed constraints for timeouts, threading, etc. So it is obvious to me that AsynchronousRequests and CommandRequests should be routed to backend instances so that GAE timeouts do not become an issue with them.
However, if GAE is behaving badly, or if we're hitting peak traffic, or if my code just plain sucks, I have to account for the scenario where a SynchronousRequest is made (which would have to go through a timeout-regulated frontend instance) and will time out unless my GAE server code does something fancy. I know there is a method in the GAE API that I can call to see how many milliseconds a request has left before it times out; the name of it escapes me right now, but it's what this "fancy" code would be based on. Let's call it public static long GAE.timeLeftOnRequestInMillis() for the sake of this question.
In this scenario, I'd like to detect that a SynchronousRequest is about to timeout, and somehow dynamically convert it into an AsynchronousRequest so that it doesn't time out. Perhaps this means sending an AboutToTimeoutResponse back to the client, and force the client to decide about whether to resend as an AsynchronousRequest or just fail. Or perhaps we can just transform the SynchronousRequest into an AsynchronousRequest and push it to a queue where a backend instance will consume it, process it and return a response. I don't have any preferences when it comes to implementation, so long as the request doesn't fail or timeout because the server couldn't handle it fast enough (because of GAE-imposed regulations).
So then, here is what I'm actually asking here:
How can I wrap a RequestFactory call inside SynchronousRequest, AsynchronousRequest and CommandRequest in such a way that the RequestFactory call behaves the way each of them is intended? In other words, so that the call either partially-blocks (synchronous), can be notified/updated at some point down the road (asynchronous), or can just fire-and-forget (command)?
How can I implement my requirement to let a SynchronousRequest bypass GAE's 60-second timeout and still get processed without failing?
Please note: timeout issues are easily circumvented by re-routing things to backend instances, but backends don't/can't scale. I need scalability here as well (that's primarily why I'm on GAE in the first place!) - so I need a solution that deals with scalable frontend instances and their timeouts. Thanks in advance!

If the computation that you want GAE to do is going to take longer than 60 seconds, then don't wait for the results to be computed before sending a response. According to your problem definition, there is no way to get around this. Instead, clients should submit work orders, and wait for a notification from the server when the results are ready. Requests would consist of work orders, which might look something like this:
    class ComputeDigitsOfPiWorkOrder {
        // parameters for the computation
        int numberOfDigitsToCompute;
        // Used by the GAE app to contact the requester when results are ready.
        ClientId clientId;
    }
This way, your GAE app can respond as soon as the work order is saved (e.g. in Task Queue), and doesn't have to wait until it actually finishes calculating a billion digits of pi before responding. Your GWT client then waits for the result using the Channel API.
In order to give some work orders higher priority, you can use multiple task queues. If you want Task Queue work to scale automatically, you'll want to use push queues. Implementing priority using push queues is a little tricky, but you can configure high-priority queues to have a faster feed rate.
You could replace Channel API with some other notification solution, but that would probably be the most straightforward.

Multithreading a jsp?

I'm new to jersey, jsp's and web application development in general so hopefully this isn't a silly question. I've got a jsp and currently when the user hits a button on it, it starts a HTTP request which takes about 5-10 minutes to return. Once it finishes they're redirected to another page.
I'm wondering, is it possible or even advisable to multithread the application so that the heavy processing starts but the user gets redirected to the next .jsp right away? If multithreading is not possible, is there another method that you would recommend for dealing with heavy processing in a web application?

A JSP is basically a Servlet (it's translated into a Java Servlet class and compiled). Theoretically you can start a new thread in a servlet (and hence in a JSP, via a scriptlet), but that's really not advised, for multiple reasons.
It would be better to make an asynchronous HTTP call via Ajax, immediately show something else to the user once the call has been sent, and display the results when the callback returns.

Rather than creating a new thread each time, it might be more efficient to have a worker thread that continually polls a shared queue. Using, for example, ArrayBlockingQueue, your web request can simply add an object to the queue and return to the user, and your worker thread (or repeating scheduled job) can take care of the heavyweight processing.
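A rough sketch of that worker-and-queue arrangement follows. The class name, the queue capacity of 100, and the "done:" marker that stands in for the real processing are all assumptions for illustration.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;

class QueueWorker {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);
    final List<String> processed = new CopyOnWriteArrayList<>();
    private final Thread worker = new Thread(this::drain, "queue-worker");

    void start() {
        worker.setDaemon(true); // do not keep the JVM alive just for this thread
        worker.start();
    }

    // Called from the web request: returns immediately; false means the queue is full.
    boolean enqueue(String job) {
        return queue.offer(job);
    }

    private void drain() {
        try {
            while (true) {
                String job = queue.take();    // blocks until a job arrives
                processed.add("done:" + job); // placeholder for the heavy processing
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop when interrupted
        }
    }

    void stop() {
        worker.interrupt();
    }
}
```

Because the queue is bounded, enqueue() can report a full queue back to the request instead of letting work pile up without limit.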

Instead of waiting for the process to complete in a JSP, you can create a TimerTask (or Quartz Job), set it for immediate execution, and redirect the user to some other page. Have that job store the result in some central place that can be accessed by another JSP (in case you want to pull the result of the job later, maybe through Ajax). Doing so, you save yourself from managing threads manually (which is error-prone), you get async functionality, and the user does not need to stare at a blank browser screen for 5-10 minutes.

It is possible.
Create a thread, store its reference somewhere that is available everywhere (a static Map) and store its key (in the session, in the code of the JSP's answer).
Following calls can retrieve the thread and check its state/results.
Anyway, use with care:
a) You will need to make sure that old results are deleted. It is inevitable that sometimes the browser will be closed, so you need a watchdog to clear data that is obviously no longer needed.
b) Users are not used to this kind of behavior. There is a serious risk that they will just "go back" and try to launch the thread again, and again, and again. Try to control it (ideally the ID of the thread will be linked to the user, so as long as an older thread is active a user cannot launch another one).
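Both caveats can be sketched as follows, using an executor-backed Future per user rather than a raw thread. Everything here is hypothetical (the class name, keying by a user ID string, the purge policy); a real watchdog would more likely purge by age so that results can still be read first.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class UserJobs {
    private static final Map<String, Future<String>> JOBS = new ConcurrentHashMap<>();
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // Starts work for the user unless an earlier job of theirs is still active.
    static boolean launch(String userId, Callable<String> work) {
        boolean[] started = {false};
        JOBS.compute(userId, (id, existing) -> {
            if (existing != null && !existing.isDone()) return existing;
            started[0] = true;
            return POOL.submit(work);
        });
        return started[0];
    }

    static boolean isRunning(String userId) {
        Future<String> job = JOBS.get(userId);
        return job != null && !job.isDone();
    }

    // Watchdog step: drop entries whose work has finished; returns how many were removed.
    static int purgeFinished() {
        int before = JOBS.size();
        JOBS.values().removeIf(Future::isDone);
        return before - JOBS.size();
    }
}
```

A second launch() for the same user returns false while the first job is still running, which covers the "go back and resubmit" case.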

If a REST web service call fails, should a message or event queue be used to retry later?

I'm building a web service with a RESTful interface (let's call it MY_API). This service relies on another RESTful web service to handle certain aspects (calling it OTHER_API). I'd like to determine what best practices I should consider for handling failures of OTHER_API.
Scenario
My UI is a single page javascript application. There are some fairly complex actions a user can take, which can easily take the user a minute or two to complete. When they are done, they click the SAVE button and MY_API is called to save the data.
MY_API has everything it needs to persist the information submitted by the user. However, there is an action that must take place that is handled by OTHER_API. For instance, OTHER_API might handle sending out emails. Or perhaps it handles adding line items to my user's billing statement. In both cases, these are critical things that must be completed, but they don't have to happen right now; they just need to happen eventually.
If OTHER_API fails, I don't want to simply tell the user their action has failed, as they spent a lot of time doing it and this will make the experience less than optimal.
Questions
So should I create some sort of Message or Event Queue that can save these failed REST requests to OTHER_API and process them later?
Any advice or suggestions on techniques to go about saving REST requests for delayed processing?
Is there a recommended open source message queue solution that would work for this type of scenario with JSON-based REST web services? Java is preferred as my backend is written in it.
Are there other techniques I should consider?

Rather than approaching this by focusing on the failure state, it would be faster and more robust to recognize that these actions should be performed asynchronously and out-of-band from the UI's request. You should indeed use a message/event/job queue: put those jobs onto the queue as quickly as possible, and respond to the original request as quickly as possible. Once you've done that, the asynchronous job can be performed independently of the original request and at its own pace, including retries as needed.
If you want your API to indicate that there are aspects of the request which have not completed, you can use the HTTP response Status Code 202 (Accepted).
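As a rough illustration of the flow (queue first, answer with 202, retry in the background), here is an in-memory sketch. The class and method names are invented, the Predicate stands in for the real call to OTHER_API, and a production setup would want a durable queue rather than a LinkedBlockingQueue that is lost on restart.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Predicate;

class RetryQueue {
    static final int ACCEPTED = 202; // HTTP 202 Accepted
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    // Called by MY_API's save handler: record the follow-up work and answer immediately.
    int accept(String payload) {
        pending.add(payload);
        return ACCEPTED;
    }

    // Called by a background worker: try each queued payload once and requeue failures.
    // Returns the number of payloads delivered in this pass.
    int drainOnce(Predicate<String> callOtherApi) {
        int delivered = 0;
        int batch = pending.size();
        for (int i = 0; i < batch; i++) {
            String payload = pending.poll();
            if (payload == null) break;
            if (callOtherApi.test(payload)) delivered++;
            else pending.add(payload); // keep it for the next pass
        }
        return delivered;
    }

    int size() {
        return pending.size();
    }
}
```

A scheduled worker could call drainOnce() periodically, so a temporary OTHER_API outage only delays the work instead of failing the user's save.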
