I'm working on a Java EE application and I want some web services to be executed in parallel.
I would like to know the pros and cons of 2 different approaches:
Use JMS queues and MDBs, so each message I put in the queue is processed in parallel. With this approach, the part of the application that puts the message into the queue then sits in a while loop, waiting for the MDBs to respond on a response queue.
Use the java concurrent API (Future / Callable).
ADDED
This is what the application needs to do:
The application already does this via an MDB, but I was considering a refactoring.
TODAY'S SCENARIO:
// CALLER CLASS
FOREACH INTEGRATION
    PUT A MESSAGE INTO THE QUEUE AND STORE ITS CORRELATION_ID IN AN ARRAY
END
THREAD.SLEEP(X) // SOME TIME FOR THE INTEGRATIONS TO FINISH
WHILE (RESPONSES ARE STILL PENDING) {
    GET THE RESPONSE FOR EACH INTEGRATION FROM THE RESPONSE QUEUE,
    USING THE CORRELATION_ID STORED EARLIER
}
// MDB CLASS
HAS A HUGE SWITCH-CASE THAT PROCESSES EACH INTEGRATION
RETURNS THE RESULT TO THE RESPONSE QUEUE
Questions:
Is it OK to use the concurrency API in Java? In my opinion, using the concurrency API would eliminate a layer of failure (JMS).
My deployment environment is WebSphere. Is it good practice to create your own threads with the Java concurrency API?
Thanks in advance
Whatever solution you go with, you will eventually need to cope with a burst of traffic. With JMS/MDB, the burst is effectively controlled by the queue. Another point to consider is that the queue can be made persistent, so it will survive a server restart. A queue can also be distributed across many servers, giving you horizontal scalability.
The thread approach is of course quicker to develop, test, and deploy. However, I would consider using a BlockingQueue so that your threads do not run amok; a sketch follows.
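For illustration, here is a minimal, hedged sketch of that thread approach: a fixed pool fed by a bounded BlockingQueue, so a burst applies back-pressure instead of spawning unbounded work. The pool sizes and the callWebService method are hypothetical stand-ins, not part of the original question.

```java
import java.util.concurrent.*;

public class BoundedExecutorExample {
    public static void main(String[] args) throws Exception {
        // A bounded queue keeps a traffic burst from piling up unbounded work;
        // CallerRunsPolicy applies back-pressure by running the task on the caller.
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(100);
        ExecutorService pool = new ThreadPoolExecutor(
                4, 8, 60, TimeUnit.SECONDS, queue,
                new ThreadPoolExecutor.CallerRunsPolicy());

        Future<String> result = pool.submit(() -> callWebService("integration-1"));
        System.out.println(result.get(30, TimeUnit.SECONDS)); // wait with a timeout

        pool.shutdown();
    }

    // Hypothetical stand-in for the real web service call.
    private static String callWebService(String name) {
        return "response from " + name;
    }
}
```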
JMS pros: you get persistence, and you can connect to existing infrastructure.
JMS cons: it seems too heavy to be used only as a dispatcher.
Manual concurrency cons: well, it's manual, and parallel programming is difficult. Also, some application servers (especially clouds) may forbid creating your own threads.
I'm not sure what exactly you want to do, but a web server already processes requests in parallel by default, so maybe you don't need anything else?
I'm learning about Vert.x and its ecosystem. First I learned about the event loop, and the concept really appeals to me.
But since Servlet 3.1 we have had async support in Java-based servers.
I'm using Spring and its class named DeferredResult, which can take a thread from Tomcat, hand execution of the logic to a thread from an executor thread pool (freeing the Tomcat thread to handle other requests), and then return the response when it's done.
In the event loop model, all blocking calls should be done by a Vert.x worker; the concept is essentially the same: you hand the blocking call to a thread and provide a callback, and when the task is done the event loop executes the callback and returns the response.
These concepts look really similar to me.
Maybe I'm missing something, but what is the difference between them?
Thread-pool-based web servers use worker threads as the primary execution context for user requests. When you develop a Spring or Java EE application, you call a lot of blocking code (JDBC, Hibernate, the JAX-RS client, etc.).
Then the Servlet 3.1 async API was added to solve problems like long polling (if all your worker threads are waiting, you cannot handle requests any more).
Event-loop-based systems (Vert.x, Node), however, are built to handle a lot of user requests with a single thread. Usually you combine them with non-blocking database drivers or web clients.
It's a very powerful model (fewer thread migrations, warm caches, etc.). But you must not block the event loop, or you can't handle events any more.
In an ideal world you would use only non-blocking libraries, but the reality is that many Java libraries are not, and we shouldn't just throw this legacy away. As a workaround, Vert.x provides a way to offload blocking code execution to a worker pool, as sketched below.
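A minimal sketch of that offloading, assuming the Vert.x 4 core API (someBlockingJdbcCall is a hypothetical stand-in for real blocking JDBC/Hibernate code):

```java
import io.vertx.core.Vertx;

public class BlockingOffloadExample {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();

        // Offload a blocking call (e.g. JDBC) to the worker pool so the
        // event loop stays free to handle other events.
        vertx.<String>executeBlocking(promise -> {
            String result = someBlockingJdbcCall(); // runs on a worker thread
            promise.complete(result);
        }, asyncResult -> {
            // Runs back on the event loop once the worker is done.
            System.out.println("Got: " + asyncResult.result());
        });
    }

    // Hypothetical blocking call standing in for real JDBC/Hibernate code.
    private static String someBlockingJdbcCall() {
        return "rows";
    }
}
```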
I hope this clarifies a bit and shows that, beyond the similarities, the use cases are different.
In Spring, you can let controllers return a Callable<T> instead of T, which immediately releases the request-processing thread and computes the result in an MvcAsync thread managed by the WebAsyncManager. You just need to wrap the controller method body in return () -> { ... return result; };. Very easy! A minimal sketch is shown below.
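For instance, a hedged sketch of such a controller (the endpoint and the simulated delay are invented for illustration):

```java
import java.util.concurrent.Callable;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ReportController {

    // Returning Callable releases the Tomcat thread immediately; the body
    // runs later on an MvcAsync thread managed by the WebAsyncManager.
    @GetMapping("/report")
    public Callable<String> report() {
        return () -> {
            Thread.sleep(2000); // stand-in for slow work
            return "report done";
        };
    }
}
```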
But what is the point? What is the difference between
a) having 500 request processing threads and letting them do all the work and
b) having just a few request processing threads and executing all requests in Callables with a concurrencyLimit of 500?
Option b) actually looks worse to me, since there is overhead involved in managing the whole MvcAsync machinery.
I get how you can harness @Async methods to execute two methods at once and return a result once both have finished, but I obviously haven't understood Callable controller methods.
Suppose you have a Tomcat server that has 10 threads listening for client requests. If you have a client that invokes an endpoint that takes 5 seconds to respond, that client holds that thread for those 5 seconds. Add a few concurrent clients and you will soon run out of threads during those 5 seconds.
The situation is even worse, because during most of those 5 seconds your request is doing mostly I/O, which means you block your thread to do nothing but wait.
So the ability of Spring to use Callable, CompletableFuture, or ListenableFuture as the return type of a controller is precisely there to allow programmers to overcome this kind of problem, to a certain extent.
Fundamentally, just returning one of these types only releases the web server thread, making it available to another client. So you get to serve more clients in the same amount of time. However, that by itself may not be enough to implement a non-blocking I/O (aka NIO) API.
Most of these features come from the core functionality offered by the Servlet API and Servlet async I/O, which Spring most likely uses under the hood. You may want to take a look at the following interesting videos, which helped me understand this from the ground up:
Scale your Web Applications with Servlet 3.1 Async I/O, Part 1
Scale your Web Applications with Servlet 3.1 Async I/O, Part 2
Into the Wild with Servlet Async IO
Those videos explain the idea behind Servlet Async I/O and the final goal of implementing NIO Web apps as well.
The holy grail here is to reach a point where the threads in your thread pool are never blocked waiting for some I/O to happen: they are either doing CPU-bound work, or they're back in the thread pool where they can be used by another client. When you do I/O you don't wait; you register some form of callback that will tell you when the results are ready, and in the meantime you can use your valuable CPU cores to work on something else. If you think it over, a Callable, a CompletableFuture, or a ListenableFuture is exactly that sort of callback object, which the Spring infrastructure uses under the hood to attend to a request in a separate thread.
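As a hedged illustration of the user-facing side of this (not Spring's internals), a controller can hand the work to a CompletableFuture on a separate pool; the endpoint, pool size, and fetchQuoteBlocking are invented for this sketch:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class QuoteController {

    // Hypothetical dedicated pool; in a real app this would be a managed bean.
    private final Executor ioPool = Executors.newFixedThreadPool(16);

    @GetMapping("/quote")
    public CompletableFuture<String> quote() {
        // The servlet thread returns at once; Spring completes the response
        // whenever the future completes on the ioPool thread.
        return CompletableFuture.supplyAsync(this::fetchQuoteBlocking, ioPool);
    }

    // Stand-in for a blocking I/O call; truly non-blocking code would
    // complete the future from an NIO callback instead.
    private String fetchQuoteBlocking() {
        return "42";
    }
}
```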
This increases your throughput, since you can serve more clients concurrently simply by optimizing the use of your valuable CPU resources, particularly if you do it in an NIO way. As you can imagine, just moving the request to another thread, although beneficial (you free a valuable Tomcat thread), would still be blocking, and therefore you'd just be moving your problem to another thread pool.
I believe this fundamental principle is also behind a good part of the work that the Spring team is currently doing in Project Reactor, since in order to leverage the power of this type of feature you need to introduce asynchronous programming into your APIs, and that is hard to do.
That's also the reason for the proliferation of frameworks like Netty, RxJava, the Reactive Streams initiative, and Project Reactor. They all seek to promote this type of optimization and programming model.
There is also an interesting movement of new frameworks that leverage these powerful features and are trying to compete with, or even complement, Spring's still-limited functionality in that area. I'm talking about interesting projects like Vert.x and Ratpack, and while we're at it, this feature is one of the major selling points of Node.js as well.
I don't really understand Heroku's dyno and worker process model as it relates to a single-process but multi-threaded Java-based server.
For example: how do I know (for a single dyno) how many processors are available to my background threads? Do I need to use something like RabbitMQ, create a separate process (app) for each background processing task, and communicate between the server and these? That seems like overkill for some scheduled tasks using cached thread pool executors. Should all Futures be changed to inter-process Futures?
I guess it comes down to this question: can I no longer write a multi-threaded server and scale the processors available to my server process to accommodate my thread activity? Or do I need to refactor my architecture to use separate processes for concurrency? If the former, do I need workers or just multiple dynos?
Thanks.
Heroku supports multiple concurrency models, so it's really up to you how you would like to architect your application. You have access to the full Java stack, so if something makes more sense to run as multiple threads in your web processes, you can definitely do that, or you can always enqueue jobs on something like RabbitMQ or Redis and process them on separate worker dynos. Multithreading is simpler and makes sense if the amount of work is light and proportional to your web requests, because it will be scaled along with the web dynos. However, if the work is large, not proportional, and/or needs to be scaled independently, then breaking it out into a separate process is the better choice.
Heroku was originally just a Ruby platform, which does not have the same threading capabilities as Java, so the use of separate worker dynos is more important for Ruby and this is reflected in some of the documentation and examples out there, which might have led to your confusion. Luckily, with Java you have more options available to you and can use what's best for the job at hand.
I wonder if there is a way to make asynchronous calls to a database.
For instance, imagine that I have a big request that takes a very long time to process. I want to send the request and receive a notification when it returns a value (by passing a listener/callback or something). I don't want to block waiting for the database to answer.
I don't consider using a pool of threads to be a solution, because it doesn't scale: in the case of heavy concurrent requests this will spawn a very large number of threads.
We face this kind of problem with network servers, and we have found solutions using the select/poll/epoll system calls to avoid having one thread per connection. I'm just wondering how to get a similar feature with database requests.
Note:
I'm aware that using a FixedThreadPool may be a good workaround, but I'm surprised that nobody has developed a truly asynchronous system (without the use of an extra thread).
Update:
Because of the lack of real practical solutions, I decided to create a library (part of Finagle) myself: finagle-mysql. It basically encodes/decodes MySQL requests/responses, and uses Finagle/Netty under the hood. It scales extremely well even with a huge number of connections.
I don't understand how any of the proposed approaches that wrap JDBC calls in actors, executors, or anything else can help here; can someone clarify?
Surely the basic problem is that the JDBC operations block on socket I/O. When they do, they block the thread they're running on; end of story. Whatever wrapping framework you choose to use, it's going to end up with one thread kept busy/blocked per concurrent request.
If the underlying database driver (MySQL?) offers a means to intercept the socket creation (see SocketFactory), then I imagine it would be possible to build an async, event-driven database layer on top of the JDBC API, but we'd have to encapsulate the whole of JDBC behind an event-driven facade, and that facade wouldn't look like JDBC (after all, it would be event driven). The database processing would happen asynchronously on a different thread from the caller, and you'd have to work out how to build a transaction manager that doesn't rely on thread affinity.
Something like the approach I mention would allow even a single background thread to process a load of concurrent JDBC executions. In practice you'd probably run a pool of threads to make use of multiple cores.
(Of course I'm not commenting on the logic of the original question, just the responses that imply that concurrency in a scenario with blocking socket I/O is possible without the use of a selector pattern; it's simpler to just work out your typical JDBC concurrency and put in a connection pool of the right size.)
It looks like MySQL probably does something along the lines I'm suggesting:
http://code.google.com/p/async-mysql-connector/wiki/UsageExample
It's impossible to make an asynchronous call to the database via JDBC, but you can make asynchronous calls to JDBC with actors (e.g., an actor makes calls to the DB via JDBC and sends messages to third parties when the calls are over) or, if you like CPS, with pipelined futures (promises); a good implementation is Scalaz Promises.
I don't consider that using a pool of threads is a solution because it doesn't scale, in the case of heavy concurrent requests this will spawn a very large number of threads.
Scala actors are event-based (not thread-based) by default; continuation scheduling allows creating millions of actors on a standard JVM setup.
If you're targeting Java, the Akka framework is an Actor-model implementation that has a good API for both Java and Scala.
Aside from that, the synchronous nature of JDBC makes perfect sense to me. The cost of a database session is far higher than the cost of a Java thread being blocked (either in the fore- or background) while waiting for a response. If your queries run so long that the capabilities of an executor service (or a wrapping actor/fork-join/promise concurrency framework) are not enough for you, and you're consuming too many threads, you should first of all think about your database load. Normally the response from a database comes back very fast, and an executor service backed by a fixed thread pool is a good enough solution (see the sketch below). If you have too many long-running queries, you should consider upfront (pre-)processing, like a nightly recalculation of the data.
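As a hedged sketch of that executor-service approach (the JDBC URL and the users table are invented for illustration): the caller gets a future and a callback instead of blocking, but note that a pool thread still blocks on the socket underneath.

```java
import java.sql.*;
import java.util.concurrent.*;

public class AsyncJdbcFacade {
    // Fixed pool sized roughly to the connection pool, as the answer suggests.
    private static final ExecutorService DB_POOL = Executors.newFixedThreadPool(8);

    // Wraps a blocking JDBC query; the caller gets a future/callback instead
    // of blocking, but a pool thread still blocks on the socket underneath.
    public static CompletableFuture<Integer> countUsers(String jdbcUrl) {
        return CompletableFuture.supplyAsync(() -> {
            try (Connection c = DriverManager.getConnection(jdbcUrl);
                 Statement s = c.createStatement();
                 ResultSet rs = s.executeQuery("SELECT COUNT(*) FROM users")) {
                rs.next();
                return rs.getInt(1);
            } catch (SQLException e) {
                throw new CompletionException(e);
            }
        }, DB_POOL);
    }
}
```

Usage would look like countUsers("jdbc:h2:mem:test").thenAccept(System.out::println); with the result delivered whenever the query finishes.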
Perhaps you could use a JMS asynchronous messaging system, which scales pretty well, IMHO:
Send a message to a queue, where the subscribers will accept the message and run the SQL process. Your main process keeps running, accepting or sending new requests.
When the SQL process ends, you run the opposite way: send a message to a response queue with the result of the process, and have a listener on the client side accept it and execute the callback code. A sketch of this round trip follows.
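A minimal, hedged sketch of the request side using the javax.jms API (the queue names are placeholders, and it assumes the worker copies the request's message ID into the reply's correlation ID):

```java
import javax.jms.*;

public class AsyncSqlRequester {
    // Sends the SQL job to a request queue and registers a listener on the
    // response queue; the correlation ID ties the reply back to the request.
    public static void submit(Session session, Queue requestQueue,
                              Queue responseQueue, String sql) throws JMSException {
        TextMessage request = session.createTextMessage(sql);
        request.setJMSReplyTo(responseQueue);
        session.createProducer(requestQueue).send(request);
        String correlationId = request.getJMSMessageID(); // assigned during send

        MessageConsumer consumer = session.createConsumer(
                responseQueue, "JMSCorrelationID = '" + correlationId + "'");
        consumer.setMessageListener(reply -> {
            // Callback runs when the worker posts the result; no blocking wait.
            System.out.println("Result ready: " + reply);
        });
    }
}
```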
It looks like a new asynchronous JDBC API, "JDBC next", is in the works.
See presentation here
You can download the API from here
Update:
This new JDBC API was later named ADBA. Then, in September 2019, work on it was stopped; see the mailing list post.
R2DBC seems to achieve similar goals. It already supports most major databases (except Oracle DB). Note that this project is a library and not part of the JDK.
There is no direct support in JDBC, but you have multiple options, such as MDBs or executors from Java 5.
"I don't consider that using a pool of threads is a solution because it doesn't scale, in the case of heavy concurrent requests this will spawn a very large number of threads."
I am curious why a bounded pool of threads is not going to scale. It is a pool, not thread-per-request, so it does not spawn a thread for each request. I have been using this for quite some time on a heavy-load web app, and we have not seen any issues so far.
As mentioned in other answers, the JDBC API is not asynchronous by nature.
However, if you can live with a subset of the operations and a different API, there are solutions. One example is https://github.com/jasync-sql/jasync-sql, which works for MySQL and PostgreSQL.
A solution is being developed to make reactive connectivity possible with standard relational databases.
People wanting to scale while retaining usage of relational databases are cut off from reactive programming due to existing standards based on blocking I/O. R2DBC specifies a new API that allows reactive code that works efficiently with relational databases.
R2DBC is a specification designed from the ground up for reactive programming with SQL databases, defining a non-blocking SPI for database driver implementors and client library authors. R2DBC drivers fully implement the database wire protocol on top of a non-blocking I/O layer.
R2DBC's WebSite
R2DBC's GitHub
Feature Matrix
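As a hedged illustration of the R2DBC SPI used with Reactor (the connection URL and the users table are placeholders; driver specifics vary):

```java
import io.r2dbc.spi.Connection;
import io.r2dbc.spi.ConnectionFactories;
import io.r2dbc.spi.ConnectionFactory;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class R2dbcExample {
    public static void main(String[] args) {
        // The URL is illustrative; use your driver's R2DBC URL.
        ConnectionFactory factory = ConnectionFactories.get("r2dbc:h2:mem:///testdb");
        Mono<Connection> connection = Mono.from(factory.create());

        Flux<String> names = Flux.usingWhen(
                connection,
                conn -> Flux.from(conn.createStatement("SELECT name FROM users").execute())
                        .flatMap(result -> result.map((row, meta) -> row.get("name", String.class))),
                Connection::close); // releases the connection when the stream ends

        // No thread blocks while waiting; rows are pushed as they arrive.
        names.subscribe(System.out::println);
    }
}
```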
The ADBCJ project seems to address this problem: http://code.google.com/p/adbcj/
There are currently two experimental natively async drivers, for MySQL and PostgreSQL.
An old question, but some more information. It is not possible to have JDBC issue asynchronous requests to the database itself, unless a vendor provides an extension to JDBC and a wrapper to handle it. That said, it is possible to wrap JDBC itself with a processing queue, and to implement logic that processes off the queue on one or more separate connections. One advantage of this for some types of calls is that, under heavy enough load, the logic can convert the calls into JDBC batches, which can speed up processing significantly. This is most useful for calls where data is being inserted and the actual result only needs to be logged if there is an error. A great example is inserts performed to log user activity: the application won't care whether the call completes immediately or a few seconds from now.
As a side note, one product on the market provides a policy-driven approach to making calls like those I described asynchronous (http://www.heimdalldata.com/). Disclaimer: I am a co-founder of this company. It allows regular expressions to be applied to data-transformation requests such as inserts/updates/deletes for any JDBC data source, and will automatically batch them together for processing. When used with MySQL and the rewriteBatchedStatements option (MySQL and JDBC with rewriteBatchedStatements=true), this can significantly lower the overall load on the database.
You have three options in my opinion:
Use a concurrent queue to distribute messages across a small, fixed number of threads. So if you have 1000 connections, you will have 4 threads, not 1000 threads.
Do the database access on another node (i.e. another process or machine) and have your database client make asynchronous network calls to that node.
Implement a true distributed system through asynchronous messages. For that you will need a messaging queue such as CoralMQ or Tibco.
Disclaimer: I am one of the developers of CoralMQ.
The Java 5.0 executors might come in handy.
You can have a fixed number of threads to handle long-running operations. And instead of Runnable you can use Callable, which returns a result. The result is encapsulated in a Future<ReturnType> object, so you can get it when it is ready. A minimal sketch follows.
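For example (a hedged sketch; runLongQuery and doOtherWork are hypothetical stand-ins for the real work):

```java
import java.util.concurrent.*;

public class ExecutorExample {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // Callable returns a value; submit() hands back a Future immediately.
        Future<Integer> rowCount = pool.submit(() -> runLongQuery());

        doOtherWork();                      // not blocked while the query runs
        System.out.println(rowCount.get()); // blocks only when the result is needed

        pool.shutdown();
    }

    // Hypothetical stand-ins for the long-running query and other work.
    private static int runLongQuery() { return 42; }
    private static void doOtherWork() { }
}
```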
Here is an outline, presented by Oracle at JavaOne, of what a non-blocking JDBC API could look like:
https://static.rainfocus.com/oracle/oow16/sess/1461693351182001EmRq/ppt/CONF1578%2020160916.pdf
So it seems that in the end, truly asynchronous JDBC calls will indeed be possible.
Just a crazy idea: you could use an iteratee pattern over a JDBC ResultSet wrapped in some Future/Promise.
Hammersmith does that for MongoDB.
I am just thinking out loud here. Why couldn't you have a pool of database connections, with each one having a thread? Each thread has access to a queue. When you want to do a query that takes a long time, you put it on the queue, and one of the threads will pick it up and handle it. You will never have too many threads, because the number of threads is bounded.
Edit: or better yet, just a number of threads. When a thread sees something in the queue, it asks for a connection from the pool and handles it.
The commons-dbutils library has support for an AsyncQueryRunner to which you provide an ExecutorService, and it returns a Future. It's worth checking out, since it's simple to use and ensures you won't leak resources. A sketch follows.
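A hedged sketch, assuming commons-dbutils 1.5+ (the connection and the users table are placeholders):

```java
import java.sql.Connection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.commons.dbutils.AsyncQueryRunner;
import org.apache.commons.dbutils.handlers.ScalarHandler;

public class DbUtilsAsyncExample {
    public static Future<Long> countUsers(Connection conn) {
        ExecutorService executor = Executors.newCachedThreadPool();
        AsyncQueryRunner runner = new AsyncQueryRunner(executor);
        // The query runs on the executor; the caller gets a Future at once.
        return runner.query(conn, "SELECT COUNT(*) FROM users",
                new ScalarHandler<Long>());
    }
}
```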
If you are interested in asynchronous database APIs for Java, you should know that there is a new initiative to come up with a set of standard APIs based on CompletableFuture and lambdas. There is also an implementation of these APIs over JDBC, which can be used to experiment with them:
https://github.com/oracle/oracle-db-examples/tree/master/java/AoJ
The JavaDoc is mentioned in the README of the github project.
I've been asked to design and implement a system for receiving a high volume of automated sensor data from a large number of devices. This data will be produced at regular intervals and sent to the server as XML in an HTTP POST. The devices will keep resending the same data if they don't receive a specific acknowledgment from the server. Some potentially heavy-duty processing of this data will need to occur before it's inserted into a number of tables in the main database via a transaction, and additionally some data points will need to be enqueued for redirection to other external URLs.
I'm planning on using a Java application server (leaning towards GlassFish) with a servlet to receive the incoming data. I'd like to implement some kind of queuing mechanism to store the data temporarily, so that the response back to the sensor isn't dependent on all the intermediate processing. Separate, independent queues are also a requirement for the data-redirection piece. After doing some research, the two main options seem to be:
1) Install a database on the app server and use tables for the various queues. The queues would be processed by a Java application, either running in the app server or standalone as its own service.
2) Use a database-backed JMS solution to implement the queuing.
I'm not that familiar with JMS, but from what I've read it seems to be the better solution in this case. The primary requirement is that no sensor data is ever lost or dropped from the queue before being processed, and that it be processed more or less sequentially. We'd also like to make it easy to halt the processing of some of the queues at certain times while still having them accumulate data, and for these messages never to expire automatically.
With strategy 1 it's obvious to me how to meet these requirements, but it may be less robust and scalable, and more complex to develop, than strategy 2, since I'd need to write my own multi-threaded code to handle the various independent queues. I'm wondering what the potential pitfalls could be in using JMS queues for this purpose, since I've never worked with them before.
Data integrity is a big issue, so I need to make sure JMS can guarantee no data loss in the event of a server reboot, a power outage, or the queue getting very large for some reason. For instance, could a problem completing transactions to the main database for a period of time cause the JVM to run out of memory, crash, and lose all accumulated data? (That would be the nightmare scenario.)
Also, I was wondering whether there is a way to pause JMS queue processing via an app server admin tool, or to easily see what's in the queue (I would be enqueuing an object consisting of the message XML plus some other data, including the timestamp received, etc.). I've read a few posts on here that deal with related issues, but wanted to get some direct feedback. Basically, I'd like to know of instances (if any) where JMS is not an appropriate queuing solution, and whether this is one of those cases. Any advice is greatly appreciated.
Kaleb's answer talks about the benefits of JMS quite eloquently, but since you're asking about pitfalls, here's what I can think of.
Not all JMS implementations are equal. In theory you can use whatever implementation suits your needs, but unless you're prepared to do some serious load testing and failure condition testing, you can't know that a particular implementation isn't going to fail under your particular use case.
Most JMS implementations use a transactional datastore, such as a relational database, as their back end. That means that rather than writing directly to a datastore you're familiar with, you have to rely on the JMS implementation's extra layer between you and the stored messages.
While swapping JMS implementations to find the one that perfectly fits your needs may seem like a simple endeavor because of the homogeneous JMS API, the critical features for failure handling, JMS server monitoring, and all the other cool stuff that exists above and beyond messaging will be a hassle to deal with if you do change implementations.
That said, I think you'd be crazy to write to the DB yourself instead of going with JMS. On the first point, ActiveMQ is a venerable JMS server used in many enterprise environments. On the second point, the fact is you'd just end up writing that extra layer yourself in order to implement messaging, and your code won't have the benefit of thousands of eyes (or a set of paid developers whose sole job is to respond to customers and make sure the JMS implementation is solid). On the third point, the same ends up being true of your backend datastore. Use JMS; you'll save yourself trouble in the long run.
If you want to go the JMS route, a standalone JMS-compatible message broker (separate from your app server) would be a good choice. Message brokers range from free open-source (like ActiveMQ at http://activemq.apache.org/ or OpenMQ at https://mq.dev.java.net/), to large-scale commercial solutions (IBM's WebSphere MQ at http://www-01.ibm.com/software/integration/wmq/ is one of the largest).
Message brokers offer guaranteed delivery (provided the server is up and listening), and you can do quite a bit to ensure the system is fail-safe, including integrated backup broker servers and instant power backup. Broker queues can eventually run out of room if your app server isn't picking up the messages, but you can assign a huge queue depth (hundreds of GB) and have the server send alerts if the messages aren't getting processed and the queue reaches a certain percentage.
Your Java app would then run on a different server entirely, and would connect to the broker and pull messages off of the queue as fast as possible. If the app server crashes or stops picking up messages for any other reason, the broker would just keep all messages in that queue until the app server begins picking them up again.
You will want to implement a poison message queue in your implementation: this is the place where messages that still cannot be processed after some number of retries will arrive.
You will probably need to write some code that can examine the messages in that queue and re-send them to the appropriate destination after fixing whatever caused them to fail.
If the sequence of message processing is important, a message ending up in the poison queue could mean all processing is halted until that message is corrected. A rough MDB sketch of the pattern follows.
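As a hedged illustration only (the JNDI name, retry limit, and empty method bodies are placeholders; JMSXDeliveryCount support varies by provider and is only mandatory as of JMS 2.0):

```java
import javax.annotation.Resource;
import javax.ejb.MessageDriven;
import javax.jms.*;

@MessageDriven
public class SensorDataMdb implements MessageListener {

    private static final int MAX_RETRIES = 5;

    @Resource(name = "jms/PoisonQueue") // hypothetical JNDI name
    private Queue poisonQueue;

    @Override
    public void onMessage(Message message) {
        try {
            int deliveries = message.getIntProperty("JMSXDeliveryCount");
            if (deliveries > MAX_RETRIES) {
                sendToPoisonQueue(message); // park it for manual inspection
                return;
            }
            process(message); // throwing here rolls back and triggers redelivery
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }

    private void process(Message message) { /* business logic */ }

    private void sendToPoisonQueue(Message message) { /* forward via a producer */ }
}
```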
As far as fault tolerance goes, you can have multiple instances of the consuming services subscribe to the same queue or topic, providing an ability to continue processing even if one or more instances goes down.
Finally, have a watchdog process that pings the various consumers on your message queue, and if one doesn't respond, have it send a message that results in a new instance being started. In this way, your message-processing environment can be somewhat self-regulating.