I have been wondering why JDBC offers only blocking operations and why I can't register a listener with a hypothetical event handler like onResultSetArrived(ResultSet rs). Why do I have to block one thread for each JDBC query?
After a while I dove into Java sockets (I assume JDBC is built on top of them) and realised that there is no event handling there either. The only way to get a non-blocking read is through the available() method, but that is very inefficient because it has to be polled periodically in a loop.
As far as I'm aware, interrupts are fundamental to how a PC works; they propagate from the hardware up through the operating system. In Java, they could be exposed as an event-driven way of reading values from a socket.
So my question is: am I missing something and there is some workaround, or is the current architecture in Java really one thread per blocking operation? And if so, isn't that inefficient?
In Java, you can have many threads. A thread does its work until it is blocked somewhere (typically on a mutex or an I/O operation). Of course, this does not block other threads.
The fundamental scenario for multithreaded applications is that you use multiple threads when waiting on a blocked thread would introduce too much waiting. What counts as "too much" depends entirely on you, but in general this is how you achieve better performance through better utilization of resources.
There are some limitations to how threads in Java work, however. Most, if not all, of them occur when a thread is blocked somewhere "outside" of Java, such as in an OS call or an external (native) library. Theoretically, if native code blocks a thread, Java cannot do anything about it. Normally this should not be a problem unless the native code has a bug.
So in the case of a blocking JDBC call, you would create a new thread to do other work while the first thread waits for the database to respond. Alternatively, you could dedicate a thread just to JDBC. You could build it exactly the way you want (with listeners etc.), except for limitations imposed by the OS. So it's possible, but it's probably not provided out of the box by JDBC drivers. There is a lot of infrastructure already in core Java that you might find useful (thread pools, workers, synchronized collections). But as with any multithreading, you need to be very careful when accessing data from different threads simultaneously.
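As a rough sketch of that idea (the JDBC URL, table and column names below are made up, and the pool size is arbitrary), you can offload the blocking query to a small dedicated pool and register a callback that fires when the result set has arrived:

    import java.sql.*;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.*;

    public class AsyncJdbcExample {
        // A small pool dedicated to JDBC work, so the caller's thread is never blocked.
        private static final ExecutorService JDBC_POOL = Executors.newFixedThreadPool(4);

        public static void main(String[] args) {
            CompletableFuture
                .supplyAsync(AsyncJdbcExample::loadUserNames, JDBC_POOL)   // runs on a pool thread
                .thenAccept(names -> System.out.println("Result arrived: " + names)) // the "listener"
                .exceptionally(ex -> { ex.printStackTrace(); return null; });

            System.out.println("Caller thread continues immediately");
        }

        private static List<String> loadUserNames() {
            List<String> names = new ArrayList<>();
            // The blocking JDBC call happens here, on the pool thread.
            try (Connection c = DriverManager.getConnection("jdbc:h2:mem:demo");
                 Statement s = c.createStatement();
                 ResultSet rs = s.executeQuery("SELECT name FROM users")) {
                while (rs.next()) {
                    names.add(rs.getString("name"));
                }
            } catch (SQLException e) {
                throw new CompletionException(e);
            }
            return names;
        }
    }

Only the pool thread ever blocks on the database socket; the caller's thread returns immediately and the callback runs when the result is ready.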
Java also has support for non-blocking I/O: the NIO package has been around since Java 1.4, and Java 7 added asynchronous channels (NIO.2). That is almost exactly what you are describing: I/O is offloaded to the OS, so your operations return immediately and you get a callback when the operation has finished. However, not all libraries support NIO. For my work I have never had a reason to use it, because I could always implement the same thing with my own threads at least as well.
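A minimal sketch of that callback style using NIO.2's AsynchronousSocketChannel (the host, port and buffer size are placeholders, and a real client would also write a request before expecting to read anything):

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.AsynchronousSocketChannel;
    import java.nio.channels.CompletionHandler;

    public class Nio2CallbackExample {
        public static void main(String[] args) throws Exception {
            AsynchronousSocketChannel channel = AsynchronousSocketChannel.open();
            ByteBuffer buffer = ByteBuffer.allocate(4096);

            // connect() returns immediately; the handler fires when the OS completes the operation.
            channel.connect(new InetSocketAddress("example.com", 80), null,
                new CompletionHandler<Void, Void>() {
                    @Override
                    public void completed(Void result, Void attachment) {
                        // Connection established; start an asynchronous read with its own callback.
                        channel.read(buffer, null, new CompletionHandler<Integer, Void>() {
                            @Override
                            public void completed(Integer bytesRead, Void attachment) {
                                System.out.println("Read " + bytesRead + " bytes without blocking a thread");
                            }
                            @Override
                            public void failed(Throwable exc, Void attachment) {
                                exc.printStackTrace();
                            }
                        });
                    }
                    @Override
                    public void failed(Throwable exc, Void attachment) {
                        exc.printStackTrace();
                    }
                });

            Thread.sleep(2000); // keep this demo JVM alive so the callbacks have a chance to fire
        }
    }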
If the question is whether the "current architecture in Java really is one thread per one blocking operation", and by "blocking operation" you mean "database operation", then the answer is no. Most database drivers available for Java today are JDBC-based and do work that way. But there are usable alternatives (https://spring.io/blog/2016/11/28/going-reactive-with-spring-data) and more on the way (https://blogs.oracle.com/java/jdbc-next:-a-new-asynchronous-api-for-connecting-to-a-database, https://dzone.com/articles/spring-5-webflux-and-jdbc-to-block-or-not-to-block). For how this works, see How is ReactiveMongo implemented so that it is considered non-blocking?
For JDBC there are also ways to wrap the blocking calls (Wrapping blocking I/O in project reactor, Spring webflux and reading from database) and projects pursuing this approach (https://dzone.com/articles/myth-asynchronous-jdbc).
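For illustration, wrapping a blocking JDBC call with Project Reactor typically looks something like this (the JDBC URL and query are made up; the point is that the blocking work is pushed onto the boundedElastic scheduler so the caller never blocks):

    import java.sql.*;
    import reactor.core.publisher.Mono;
    import reactor.core.scheduler.Schedulers;

    public class JdbcWrappedInReactor {

        // Wraps a blocking JDBC call so it runs on Reactor's boundedElastic scheduler,
        // keeping the caller's (event-loop) threads free.
        static Mono<Integer> countOrders() {
            return Mono.fromCallable(() -> {
                try (Connection c = DriverManager.getConnection("jdbc:h2:mem:demo");
                     Statement s = c.createStatement();
                     ResultSet rs = s.executeQuery("SELECT COUNT(*) FROM orders")) {
                    rs.next();
                    return rs.getInt(1);
                }
            }).subscribeOn(Schedulers.boundedElastic()); // blocking work is offloaded here
        }

        public static void main(String[] args) throws InterruptedException {
            countOrders().subscribe(count -> System.out.println("order count = " + count));
            Thread.sleep(1000); // give the asynchronous pipeline time to complete in this demo
        }
    }

Note that this only moves the blocking onto a separate worker thread; the JDBC call itself still blocks that thread.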
Related
I am trying to understand the core principles of non-blocking programming (and frameworks like Project Reactor). The main idea is to have a thread pool with a fixed number of threads (executors) and tasks that are executed on them. We should not have any blocked threads: in "user code" we just submit something to execute and provide a callback (what to do with the result). Our "user" thread is not blocked, right? But what if my task depends on some JDBC query? My task will issue this query and then be blocked waiting for the result, right? So that thread is blocked.
But we avoid creating threads (which is expensive). Is that the core benefit of this style?
If my thread pool consists of 2 executors and both are blocked waiting for something, other tasks will not be executed, right? How do I avoid that? Create more than 2 threads?
Threads are relatively costly system resources. For example, each thread needs memory for its call stack. How much depends on the operating system, but it's typically something like 1 or 2 MB. This means it's not a good idea to start thousands of threads: you'd waste 1 or 2 GB of memory just on the call stacks of 1000 threads.
So, to do things more efficiently you want to limit the number of threads, for example using a thread pool to handle work. The thread pool makes it possible to manage the number of threads that are being used.
However, imagine you have a thread pool with 10 threads and then 10 requests come in. Each of your threads will be reserved to handle one request. While they are busy, you can't handle request #11 because there is no free thread. With blocking I/O, even though all 10 threads are doing nothing (waiting for I/O to complete), request #11 cannot be handled...
When you use non-blocking I/O, threads never need to wait for I/O, so when the handling of request #3 is suspended because it needs the result of an I/O operation, the thread that was handling it can temporarily switch to handling other requests.
So, with non-blocking I/O, you never have waiting threads and you are using system resources more efficiently.
This only works if you are using non-blocking I/O from the front to the back of your system. If at the back end you are using JDBC, which is a blocking API, then you'll lose the full benefit of non-blocking I/O.
Therefore, if you have a database at the back-end, this works best if you have a DB which supports non-blocking I/O. Some NoSQL databases like MongoDB support this, and for some relational databases there are special drivers / APIs available that support this. You won't be using JDBC in that case, because JDBC is an inherently blocking API.
Oracle is working on a new API for relational databases, tentatively called ADBA, which will allow you to do non-blocking / async I/O with relational databases, but it's not ready yet.
Project Reactor is an implementation of the Reactive Streams specification. An overview can be found in the Reactive Manifesto. It's not just creating a set of threads and letting them do their jobs; it's the framework or the runtime (in this case Project Reactor) that organizes your code in such a way that it behaves as non-blocking. Also, the whole system has to be implemented in this fashion, otherwise you won't benefit from reactive streams.
If my thread pool consists of 2 executors and both are blocked waiting for something, other tasks will not be executed, right? How do I avoid that? Create more than 2 threads?
The answer is yes and no. The framework may or may not create more threads. Since non-blocking systems are event-driven all the way down to the low-level operations (for example, libuv I/O), it is not necessary for a thread to wait for the completion of an I/O operation; in the meantime, the thread can execute something else that is meaningful. The completion of the task is notified, and the dependent code can be executed by any available thread. The goal of such a system is to utilize the CPU to the fullest with limited resources (threads).
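As a plain-JDK analogue of that idea (not how Reactor is implemented internally, just an illustration), a CompletableFuture chain schedules each dependent step only when the previous one completes, so no thread sits blocked in between:

    import java.util.concurrent.CompletableFuture;

    public class EventDrivenChain {
        public static void main(String[] args) throws Exception {
            CompletableFuture<String> pipeline = CompletableFuture
                .supplyAsync(() -> "raw-result")            // simulated asynchronous completion
                .thenApply(r -> r.toUpperCase())            // dependent step, run by an available thread
                .thenApply(r -> "processed: " + r);         // another dependent step

            // No thread waits between the stages above; each stage is scheduled
            // only when the previous one has completed.
            System.out.println(pipeline.get());             // get() blocks only in this demo's main method
        }
    }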
Taken from http://www.reactive-streams.org.
The main goal of Reactive Streams is to govern the exchange of stream data across an asynchronous boundary—think passing elements on to another thread or thread-pool—while ensuring that the receiving side is not forced to buffer arbitrary amounts of data. In other words, back pressure is an integral part of this model in order to allow the queues which mediate between threads to be bounded. The benefits of asynchronous processing would be negated if the communication of back pressure were synchronous (see also the Reactive Manifesto), therefore care has to be taken to mandate fully non-blocking and asynchronous behavior of all aspects of a Reactive Streams implementation.
It's the Reactor framework that enforces this and helps you build a completely non-blocking system from the ground up.
I am dealing with the OutOfMemoryError below in WAS 6.1.
Exception in thread "UnitHoldingsPolicySummary" java.lang.OutOfMemoryError: unable to create new native thread.
I have done a lot of research on how to prevent this. From what I have found, it happens when native memory is exhausted because too many threads are created concurrently.
Now, after analysing the logs below, we can see that threads are created explicitly inside the application, which I have read is a very bad practice. (Can the experts please confirm this?)
07/07/14 08:50:38:165 BST] 0000142c SystemErr R Exception in thread "xxxxxx" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:574)
at com.fp.sv.controller.business.thread.xxxxxxxxxexecute(Unknown Source)
at com.fp.sv.controller.business.thread.xxxxxxxxx.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
I am more on the WAS administration side and don't have much knowledge of Java and thread creation in Java. I now need to discuss this with the developers, but before that I want to be 100% sure that my findings are correct and that the developers should fix the code so it does not create threads explicitly.
What are the things I need to check on the application server side before blaming this on the code?
On Solaris, I am running the command pmap -x 9547|grep -i stack|wc -l to check how many threads exist at that instant. I can see that during the 'OutOfMemory' issue this number is very high.
Could you please confirm whether this command is a good way to check the number of threads currently active?
Editing the question with my latest findings
Also, when this issue happens, one of the MQ queues backs up at the same time because WAS doesn't pick up the messages from the queue. I can see the error below in the application-specific logs.
Non recoverable Exception detected whilst connecting to queue manager or response queue
Underlying reason = MQJE001: Completion Code 2, Reason 2102
Can this issue be related to MQ as well, which in turn causes the OutOfMemory issue?
Regards,
Rahul
There are different possibilities of implementing a threading system for a virtual machine. The two extreme forms are:
Green threads: all Java Thread instances are managed within one native OS thread. This can cause problems if a method blocks within a native invocation, which makes this implementation complex. In the end, implementers need to introduce renegade threads for holding native locks to overcome such limitations.
Native threads: Each Java Thread instance is backed by a native OS thread.
Because of the limitations of green threads, all modern JVM implementations, including HotSpot, choose the latter approach. This implies that the OS needs to reserve some memory for each created thread. There is also some runtime overhead for creating such a thread, as it requires direct interaction with the underlying OS. At some point these costs accumulate, and the OS refuses to create new threads in order to protect the stability of the overall system.
Threads should therefore be pooled for reuse. Object pooling is normally considered bad practice, because many programmers used it to ease the work of the JVM's garbage collector. That is no longer useful, as modern garbage collectors are optimized for handling short-lived objects; today, pooling objects might actually slow your system down. However, if an object is backed by a costly native resource (as a Thread is), pooling is still recommended. Look into the ExecutorService for the canonical way of pooling threads in Java.
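A minimal sketch of such pooling with an ExecutorService (the pool size and task bodies are arbitrary):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class PooledThreads {
        public static void main(String[] args) throws InterruptedException {
            // A fixed pool reuses a small number of native threads instead of creating one per task.
            ExecutorService pool = Executors.newFixedThreadPool(4);

            for (int i = 0; i < 100; i++) {
                final int taskId = i;
                pool.submit(() -> System.out.println(
                    "task " + taskId + " on " + Thread.currentThread().getName()));
            }

            pool.shutdown();                            // no new tasks accepted
            pool.awaitTermination(1, TimeUnit.MINUTES); // wait for queued tasks to drain
        }
    }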
In general, keep in mind that thread context switches are expensive. You should not create a new thread for small tasks, as that will slow your application down; instead, make your application less concurrent. You only have a certain number of cores that can work concurrently in the first place, and creating more threads than your (non-virtual) cores will not improve runtime performance. Are you implementing some sort of divide-and-conquer algorithm? Look into Java's ForkJoinPool.
Yes, it's a bad practice. Normally, you don't manage threads inside a Java EE server. By "normally" I mean "while developing business applications".
According to http://www.oracle.com/technetwork/java/restrictions-142267.html:
Why is thread creation and management disallowed?
The EJB specification assigns to the EJB container the responsibility for managing threads. Allowing enterprise bean instances to create and manage threads would interfere with the container's ability to control its components' lifecycle. Thread management is not a business function, it is an implementation detail, and is typically complicated and platform-specific. Letting the container manage threads relieves the enterprise bean developer of dealing with threading issues. Multithreaded applications are still possible, but control of multithreading is located in the container, not in the enterprise bean.
However, I don't think your logs demonstrate that threads are being created explicitly. If you want to be 100% sure, decompile the deployables and look at the code in those lines.
Also take a look at this:
"java.lang.OutOfMemoryError : unable to create new native Thread"
And this:
https://plumbr.eu/outofmemoryerror/unable-to-create-new-native-thread
Concerning the number of threads used by your app, I'd try a monitoring tool like JConsole or VisualVM.
I wonder if there is a way to make asynchronous calls to a database?
For instance, imagine I have a big request that takes a very long time to process. I want to send the request and receive a notification when it returns a value (by passing a listener/callback or something); I don't want to block waiting for the database to answer.
I don't consider using a pool of threads a solution, because it doesn't scale: in the case of heavy concurrent requests it will spawn a very large number of threads.
We face this kind of problem with network servers, and we have found solutions by using the select/poll/epoll system calls to avoid having one thread per connection. I'm just wondering how to get a similar feature for database requests?
Note:
I'm aware that using a FixedThreadPool may be a good workaround, but I'm surprised that nobody has developed a truly asynchronous system (without the use of an extra thread).
Update:
Because of the lack of real practical solutions, I decided to create a library (part of Finagle) myself: finagle-mysql. It basically encodes/decodes MySQL requests/responses, and uses Finagle/Netty under the hood. It scales extremely well, even with a huge number of connections.
I don't understand how any of the proposed approaches that wrap JDBC calls in Actors, executors or anything else can help here - can someone clarify?
Surely the basic problem is that the JDBC operations block on socket I/O. When one does, it blocks the thread it's running on - end of story. Whatever wrapping framework you choose to use, it's going to end up with one thread being kept busy/blocked per concurrent request.
If the underlying database driver (MySQL?) offers a means to intercept the socket creation (see SocketFactory), then I imagine it would be possible to build an async, event-driven database layer on top of the JDBC API, but we'd have to encapsulate the whole of JDBC behind an event-driven facade, and that facade wouldn't look like JDBC (after all, it would be event driven). The database processing would happen asynchronously on a different thread to the caller, and you'd have to work out how to build a transaction manager that doesn't rely on thread affinity.
Something like the approach I mention would allow even a single background thread to process a load of concurrent JDBC executions. In practice you'd probably run a pool of threads to make use of multiple cores.
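A rough sketch of what such an event-driven facade could look like, with a single background thread owning the JDBC connection and invoking callbacks when results arrive (the class names, connection URL and error handling are all simplified assumptions):

    import java.sql.*;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.function.Consumer;

    // A tiny event-driven facade: callers enqueue a query and a callback; a single background
    // thread owns the JDBC connection and invokes the callback when the result set is ready.
    public class JdbcEventLoop {

        private static final class Job {
            final String sql;
            final Consumer<ResultSet> callback;
            Job(String sql, Consumer<ResultSet> callback) { this.sql = sql; this.callback = callback; }
        }

        private final BlockingQueue<Job> queue = new LinkedBlockingQueue<>();

        public void submit(String sql, Consumer<ResultSet> callback) {
            queue.add(new Job(sql, callback));   // returns immediately; the caller never blocks on I/O
        }

        public void start() {
            Thread worker = new Thread(() -> {
                try (Connection c = DriverManager.getConnection("jdbc:h2:mem:demo")) {
                    while (true) {
                        Job job = queue.take();                       // wait for work
                        try (Statement s = c.createStatement();
                             ResultSet rs = s.executeQuery(job.sql)) {
                            job.callback.accept(rs);                  // the "onResultSetArrived" moment
                        }
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }, "jdbc-event-loop");
            worker.setDaemon(true);
            worker.start();
        }
    }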
(Of course I'm not commenting on the logic of the original question, just the responses that imply that concurrency in a scenario with blocking socket I/O is possible without the use of a selector pattern - simpler just to work out your typical JDBC concurrency and put in a connection pool of the right size.)
It looks like MySQL probably does something along the lines I'm suggesting:
http://code.google.com/p/async-mysql-connector/wiki/UsageExample
It's impossible to make an asynchronous call to the database via JDBC, but you can make asynchronous calls to JDBC with Actors (e.g., an actor makes calls to the DB via JDBC and sends messages to third parties when the calls are done), or, if you like CPS, with pipelined futures (promises) (a good implementation is Scalaz Promises).
I don't consider that using a pool of threads is a solution because it doesn't scale, in the case of heavy concurrent requests this will spawn a very large number of threads.
Scala actors by default are event-based (not thread-based) - continuation scheduling allows creating millions of actors on a standard JVM setup.
If you're targeting Java, Akka Framework is an Actor model implementation that has a good API both for Java and Scala.
Aside from that, the synchronous nature of JDBC makes perfect sense to me. The cost of a database session is far higher than the cost of a Java thread being blocked (either in the fore- or background) while waiting for a response. If your queries run so long that the capabilities of an executor service (or wrapping Actor/fork-join/promise concurrency frameworks) are not enough for you (and you're consuming too many threads), you should first of all think about your database load. Normally the response from a database comes back very fast, and an executor service backed by a fixed thread pool is a good enough solution. If you have too many long-running queries, you should consider upfront (pre-)processing, like a nightly recalculation of the data or something like that.
Perhaps you could use a JMS asynchronous messaging system, which scales pretty well, IMHO:
Send a message to a queue, where subscribers will accept the message and run the SQL process. Your main process continues running, accepting or sending new requests.
When the SQL process ends, you can go the opposite way: send a message to a response queue with the result of the process, and a listener on the client side accepts it and executes the callback code.
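A hedged sketch of that flow with the plain JMS API (the queue names, the SQL payload and the origin of the ConnectionFactory are assumptions, and a real system would also need correlation IDs to match responses to requests):

    import javax.jms.*;

    public class JmsAsyncSql {
        public static void sendRequest(ConnectionFactory connectionFactory, String sql) throws JMSException {
            Connection connection = connectionFactory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

            // 1. Fire the request and return immediately; a subscriber on SQL.REQUEST runs the SQL.
            MessageProducer producer = session.createProducer(session.createQueue("SQL.REQUEST"));
            producer.send(session.createTextMessage(sql));

            // 2. Register a listener on the response queue; the callback fires when the result arrives.
            MessageConsumer consumer = session.createConsumer(session.createQueue("SQL.RESPONSE"));
            consumer.setMessageListener(message -> {
                try {
                    System.out.println("result: " + ((TextMessage) message).getText());
                } catch (JMSException e) {
                    e.printStackTrace();
                }
            });
            connection.start(); // begin asynchronous delivery to the listener
        }
    }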
It looks like a new asynchronous JDBC API, "JDBC next", is in the works.
See presentation here
You can download the API from here
Update:
This new JDBC API was later named ADBA. Then in September 2019 work on it was stopped; see the mailing list post.
R2DBC seems to achieve similar goals. It already supports most major databases (except Oracle DB). Note that this project is a library and not part of the JDK.
There is no direct support in JDBC, but you have multiple options, like MDBs or the Executors from Java 5.
"I don't consider that using a pool of threads is a solution because it doesn't scale, in the case of heavy concurrent requests this will spawn a very large number of threads."
I am curious why a bounded pool of threads would not scale. It is a pool, not a thread-per-request model that spawns a thread for each request. I have been using this for quite some time on a heavily loaded web app and we have not seen any issues so far.
As mentioned in other answers, the JDBC API is not async by its nature.
However, if you can live with a subset of the operations and a different API, there are solutions. One example is https://github.com/jasync-sql/jasync-sql which works for MySQL and PostgreSQL.
A solution is being developed to make reactive connectivity possible with standard relational databases.
People wanting to scale while retaining usage of relational databases are cut off from reactive programming due to existing standards based on blocking I/O. R2DBC specifies a new API that allows reactive code that works efficiently with relational databases.
R2DBC is a specification designed from the ground up for reactive programming with SQL databases, defining a non-blocking SPI for database driver implementors and client library authors. R2DBC drivers implement fully the database wire protocol on top of a non-blocking I/O layer.
R2DBC's WebSite
R2DBC's GitHub
Feature Matrix
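To give a feel for the programming model, here is a small sketch against the R2DBC SPI using Reactor (the connection URL, table and column names are placeholders, and connection cleanup is omitted):

    import io.r2dbc.spi.ConnectionFactories;
    import io.r2dbc.spi.ConnectionFactory;
    import reactor.core.publisher.Flux;
    import reactor.core.publisher.Mono;

    public class R2dbcSketch {
        public static void main(String[] args) {
            // Connection URL, table and column names are placeholders.
            ConnectionFactory factory =
                ConnectionFactories.get("r2dbc:postgresql://localhost:5432/demo");

            Flux<String> names = Mono.from(factory.create())
                .flatMapMany(connection -> connection
                    .createStatement("SELECT name FROM users")
                    .execute())
                .flatMap(result -> result.map((row, metadata) -> row.get("name", String.class)));

            // Nothing has executed yet; subscribing starts the fully non-blocking pipeline.
            names.subscribe(System.out::println);
        }
    }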
The ADBCJ project seems to address this problem: http://code.google.com/p/adbcj/
There are currently two experimental, natively asynchronous drivers, for MySQL and PostgreSQL.
An old question, but here is some more information. It is not possible to have JDBC issue asynchronous requests to the database itself, unless a vendor provides an extension to JDBC and a wrapper to handle it. That said, it is possible to wrap JDBC with a processing queue, and to implement logic that processes the queue on one or more separate connections. One advantage of this for some types of calls is that, under heavy enough load, the logic can convert the calls into JDBC batches, which can speed things up significantly. This is most useful for calls where data is being inserted and the actual result only needs to be logged if there is an error. A great example is inserts that log user activity: the application won't care whether the call completes immediately or a few seconds from now.
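A rough sketch of such a queue-plus-batching wrapper for activity-log inserts (the table name, batch size and connection URL are invented, and a real implementation would also flush on shutdown):

    import java.sql.*;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // "Fire and forget" insert queue: callers enqueue rows and return immediately;
    // a worker drains the queue and writes the rows as a single JDBC batch.
    public class BatchingInsertQueue {

        private final BlockingQueue<String> pendingActivity = new LinkedBlockingQueue<>();

        public void logActivity(String event) {
            pendingActivity.add(event);               // the caller does not wait for the database
        }

        public void startWorker() {
            Thread worker = new Thread(() -> {
                try (Connection c = DriverManager.getConnection("jdbc:h2:mem:demo");
                     PreparedStatement ps = c.prepareStatement(
                             "INSERT INTO user_activity(event) VALUES (?)")) {
                    List<String> batch = new ArrayList<>();
                    while (true) {
                        batch.clear();
                        batch.add(pendingActivity.take());        // wait for at least one row
                        pendingActivity.drainTo(batch, 499);      // grab up to 500 rows total
                        for (String event : batch) {
                            ps.setString(1, event);
                            ps.addBatch();
                        }
                        ps.executeBatch();                        // one round trip for the whole batch
                    }
                } catch (Exception e) {
                    e.printStackTrace();                          // per the text, errors here are only logged
                }
            }, "activity-batch-writer");
            worker.setDaemon(true);
            worker.start();
        }
    }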
As a side note, one product on the market provides a policy-driven approach to making asynchronous calls like those I described (http://www.heimdalldata.com/). Disclaimer: I am a co-founder of this company. It allows regular expressions to be applied to data transformation requests such as inserts/updates/deletes for any JDBC data source, and will automatically batch them together for processing. When used with MySQL and the rewriteBatchedStatements option (MySQL and JDBC with rewriteBatchedStatements=true) this can significantly lower the overall load on the database.
You have three options in my opinion:
1. Use a concurrent queue to distribute messages across a small, fixed number of threads. So if you have 1000 connections you will have 4 threads, not 1000 threads.
2. Do the database access on another node (i.e. another process or machine) and have your database client make asynchronous network calls to that node.
3. Implement a truly distributed system through asynchronous messages. For that you will need a messaging queue such as CoralMQ or Tibco.
Disclaimer: I am one of the developers of CoralMQ.
The Java 5.0 executors might come in handy.
You can have a fixed number of threads to handle long-running operations. And instead of Runnable you can use Callable, which returns a result. The result is encapsulated in a Future<ReturnType> object, so you can get it when it comes back.
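For example (the pool size and the simulated query are arbitrary):

    import java.util.concurrent.*;

    public class CallableFutureExample {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(4);

            // A Callable returns a value; the Future is a handle to that result.
            Future<Integer> rowCount = pool.submit(() -> {
                Thread.sleep(500);   // stand-in for a long-running query
                return 42;
            });

            System.out.println("doing other work while the query runs...");
            System.out.println("rows = " + rowCount.get()); // blocks only when the result is finally needed

            pool.shutdown();
        }
    }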
Here is an outline of what a non-blocking JDBC API could look like, from Oracle, presented at JavaOne:
https://static.rainfocus.com/oracle/oow16/sess/1461693351182001EmRq/ppt/CONF1578%2020160916.pdf
So it seems that in the end, truly asynchronous JDBC calls will indeed be possible.
Just a crazy idea: you could use an Iteratee pattern over a JDBC ResultSet, wrapped in some Future/Promise.
Hammersmith does that for MongoDB.
I am just throwing out ideas here. Why couldn't you have a pool of database connections, with each one having a thread? Each thread has access to a queue. When you want to do a query that takes a long time, you can put it on the queue and one of the threads will pick it up and handle it. You will never have too many threads, because the number of threads is bounded.
Edit: or better yet, just a number of threads. When a thread sees something in the queue, it asks for a connection from the pool and handles it.
The commons-dbutils library has an AsyncQueryRunner to which you provide an ExecutorService, and it returns a Future. Worth checking out, as it's simple to use and ensures you won't leak resources.
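A usage sketch, under the assumption that AsyncQueryRunner is constructed with an ExecutorService and that its query method returns a Future as described in the dbutils documentation (the connection URL, SQL and handler choice are only illustrative):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.commons.dbutils.AsyncQueryRunner;
    import org.apache.commons.dbutils.handlers.MapListHandler;

    public class DbUtilsAsyncExample {
        public static void main(String[] args) throws Exception {
            ExecutorService executor = Executors.newCachedThreadPool();
            AsyncQueryRunner runner = new AsyncQueryRunner(executor);

            try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo")) {
                // Returns immediately with a Future; the query runs on the executor's threads.
                Future<List<Map<String, Object>>> rows =
                    runner.query(conn, "SELECT * FROM users WHERE active = ?", new MapListHandler(), true);

                // ... do other work here ...

                System.out.println(rows.get()); // block only when the result is actually needed
            } finally {
                executor.shutdown();
            }
        }
    }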
If you are interested in asynchronous database APIs for Java you should know that there is a new initiative to come up with a set of standard APIs based on CompletableFuture and lambdas. There is also an implementation of these APIs over JDBC which can be used to practice these APIs:
https://github.com/oracle/oracle-db-examples/tree/master/java/AoJ
The JavaDoc is mentioned in the README of the github project.
I am at an intermediate level in Java. I am working with a company I am quite new to, and they have asked me to give a session on "threading concepts in Java with a real example". As I don't have hands-on experience with threading, I can only prepare slides on how threads can be implemented using the Thread class or the Runnable interface.
Can anybody please help me out with a real scenario for threads and its implementation?
Thanks in advance
I would recommend Brian Goetz's "Java Concurrency in Practice". It talks about features added to the JDK above and beyond Thread and Runnable that will make your life better.
In 2016 it's a better idea to dig into the java.util.concurrent package and the JDK 8 lambdas and parallel streams. No one should be trying to write multithreaded code with raw Thread unless they know what they're doing. We've been given better abstractions - use them.
I'd probably start with Sun's (now Oracle's) excellent documentation on concurrency.
As an example, you can create something like a banking application, where you have a shared data structure (accounts) and multiple threads operating on an account (performing withdrawals and deposits).
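A minimal sketch of that banking example (the amounts and iteration counts are arbitrary):

    public class BankDemo {

        static class Account {
            private long balanceCents = 10_000;

            // synchronized so concurrent deposits/withdrawals can't interleave mid-update
            synchronized void deposit(long cents)  { balanceCents += cents; }
            synchronized void withdraw(long cents) {
                if (balanceCents >= cents) {
                    balanceCents -= cents;
                }
            }
            synchronized long balance() { return balanceCents; }
        }

        public static void main(String[] args) throws InterruptedException {
            Account shared = new Account();

            Thread depositor = new Thread(() -> {
                for (int i = 0; i < 1_000; i++) shared.deposit(10);
            }, "depositor");
            Thread withdrawer = new Thread(() -> {
                for (int i = 0; i < 1_000; i++) shared.withdraw(10);
            }, "withdrawer");

            depositor.start();
            withdrawer.start();
            depositor.join();
            withdrawer.join();

            // Without the synchronized methods, lost updates could make this balance unpredictable.
            System.out.println("final balance (cents): " + shared.balance());
        }
    }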
The question is too generic to answer, but here are a few details. Threads are used to run multiple things in parallel (in theory only; in practice it depends on a lot of other factors such as the number of CPUs, the number of cores, etc.). Multithreading has gone through a lot of improvements in the JDK since its inception. You can read the tutorial here: http://download.oracle.com/javase/tutorial/essential/concurrency/
A real-life example is a telecom application, an SCP (service control point), that receives a lot of requests (on the order of 400 per second). The application that handles the requests employs a master-slave configuration: there is a thread pool, each thread of which waits for a signal to run.
The master thread receives the request, the request data is posted into some object that the thread function reads, and then the thread is signalled to run. When processing is finished, the worker thread is returned to the thread pool.
There can be a flag that tracks the status of each thread, for example busy, idle, or bad.
I have a C program that will be storing and retrieving a lot of data in a Java store. I am putting a lot of stress on my C program, and multiple threads are adding and retrieving data from the Java store. How will Java handle such a load? If there is only one main thread running the JVM and handling all the requests from C, it may become a bottleneck for me. Will Java create multiple threads to handle the load, or is it the programmer's job to create (and later abort) the threads?
My Java store is just a Hashtable that stores the data from C, as is, against a provided key.
You definitely want to check the JNI documentation about threading, which has information on attaching multiple native threads to the JVM. Also, you should consider which Map implementation you need to use. When accessed from multiple threads, Hashtable will work, but it may introduce a bottleneck because it is synchronized on every call, which effectively means only a single thread can read or write at a time. Consider ConcurrentHashMap instead, which uses lock striping and provides better concurrent throughput.
A couple of things to consider if you are concerned about bottlenecks and latency.
On a heavily loaded system, locking can introduce high overhead. If the size of your map and the frequency of writes allow, consider using an immutable map and a copy-on-write approach, where a single thread handles writes by making updates to a copy of the map and replacing the original with the new version (make sure the reference is a volatile variable). This allows reads to occur without blocking.
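A small sketch of that copy-on-write idea (the class and method names are made up):

    import java.util.HashMap;
    import java.util.Map;

    // Copy-on-write sketch: readers never block; a single writer thread replaces the whole map.
    public class CopyOnWriteStore {

        // volatile so readers always see the most recently published map
        private volatile Map<String, byte[]> store = new HashMap<>();

        // Called from many reader threads (e.g. via JNI): no locking at all.
        public byte[] get(String key) {
            return store.get(key);
        }

        // Called only from the single writer thread.
        public synchronized void put(String key, byte[] value) {
            Map<String, byte[]> copy = new HashMap<>(store); // copy the current version
            copy.put(key, value);                            // mutate the private copy
            store = copy;                                    // publish atomically via the volatile write
        }
    }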
Calling from C into Java via JNI will probably become a bottleneck too; it's not as fast as calling in the other direction (Java to C). You can pass direct ByteBuffers through to Java that contain references to the C data structures, and allow Java to call back down to C via the direct ByteBuffer.
Plain Java requires that you write your own threading.
If you are communicating to java via web services it's likely that the web container will manage threads for you.
I guess you are using JNI, in which case the situation is potentially more complex. Depending on exactly how you are doing your JNI calls, you can get multiple threads into the JVM.
I've got to ask... JNI is pretty gnarly and error-prone; it's all too easy to bring down the whole process and get all manner of mysterious errors. Are there no C libraries containing a hash table you could use? Or even write one - it's got to be less work than doing JNI.
I think this depends on the Java code's implementation. If it proves not to be threaded, here's a potentially cleaner alternative to messy JNI:
Create a Java daemon process that communicates with your store and is INTERNALLY threaded on requests, to guarantee efficient load handling. Use a single ExecutorService created by java.util.concurrent.Executors to service a work queue of store/retrieve operations. Each store/retrieve method call submits a Callable to the work queue and waits for it to be run. The ExecutorService will automagically queue and multithread the store/retrieve operations. This whole thing should be less than 100 lines of code, aside from the communication with the C program.
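A hedged sketch of that daemon's core (the names and pool size are assumptions; the IPC layer that talks to the C program is omitted):

    import java.util.Map;
    import java.util.concurrent.*;

    // Core of the daemon: a pooled executor services a work queue of store/retrieve operations,
    // each submitted as a Callable whose result is waited on by the calling method.
    public class StoreDaemon {

        // ConcurrentHashMap so the pooled worker threads can read and write safely.
        private final Map<String, byte[]> store = new ConcurrentHashMap<>();
        private final ExecutorService workers = Executors.newFixedThreadPool(4);

        public void put(String key, byte[] value) throws Exception {
            workers.submit(() -> store.put(key, value)).get(); // queue the operation, wait for it to run
        }

        public byte[] get(String key) throws Exception {
            Future<byte[]> result = workers.submit(() -> store.get(key));
            return result.get();
        }

        public void shutdown() {
            workers.shutdown();
        }
    }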
You can communicate with this Java daemon from C using inter-process communication techniques (probably a socket), which would avoid JNI and let one Java daemon thread service numerous instances of the C program.
Alternatively, you could use JNI to call the basic store/retrieve operations on your daemon. Same as currently, except the Java daemon can decorate methods to provide caching, synchronization, and all sorts of fancy goodies associated with threading.