Initialising your app using multithreading in Java

Initialising your app using multithreading in Java - java

I'm new to the whole multithreading in Java thing and I'm unable to get my head around something.
I'm trying to initialise my app properly using multithreading.
For example, I'm using a database (mongodb to be exact) and need to initialise a connection to it, then connect and check a collection exists and then read from it.
Once I have that, I will eventually have a list view (JavaFX) that will display the information taken from the database.
Ideally, whilst this is going on, I'd like other things to be done (in true mutlithreading style).
Would I need to put each submitted task into a queue of sorts and then iterate through, wait if they're not ready and then remove them once they're finished?
I've always done this singlethreaded and it's always been slow
Cheers

Use an asynchronous MongoDB client like the high level ones mentioned here
MongoDB RxJava Driver (An RxJava implementation of the MongoDB Driver)
MongoDB Reactive Streams Java Driver (A Reactive Streams implementation for the JVM)

For such purpose you could add connection pool for your application.
Everything depends on configuration for your project.
The best is to make it configurable. When load is low just have ~4 connections (min) at pool. If load is increased it could go up till 20 (max).

You need to coordinate partial tasks. If you represent tasks with threads, coordination can be done using semaphores and/or blocking queues.
More effective way is represent tasks as dataflow actors - they consume less memory, and you can generate all the tasks at the very start.

Related

What is the fastest way to connect two Java processes on the same physical machine?

I have a large in-memory cache inside my Java application, which is being filled after application starts. It's makes redeployments extremelly expensive and slowers development process.
To solve the problem I'd like to outsource the cache to a separate Java process. What is the fastest way to connect two Java processes on Linux?

As a fastest solution I'd recommend you to use Hazelcast. They support distributed maps. You can define simple, 2 nodes cluster, so when both your processes are up the date will be shared, when one of them is going down the data will be still in the memory of dedicated process, when the main process is up again the data will be shared again.
The only thing that you have to change in your code is the line where you create instance of your map. You have to use the Hazelcast API instead of new HashMap<>().

Google App Engine Java Program concurrency

In order to improve the execution speed of a Java program running in Google App Engine, can I create additional Java threads during the runtime to make use of idle machines in the data center?
I've found conflicting data thus far.

If your primary concern is to improve the execution time, take a look at Memcache and Tasks. They can be used to reduce or avoid the latency of reading from or writing to the Datastore or other storage options, fetching URLs, sending emails, etc. If you do a lot of difficult computations that can run in parallel, look at MapReduce API.
Once you remove all the delays from your program, there will be no reason to use multiple threads within a single request.
Note that App Engine instances can use multithreading to execute multiple requests at the same time, so they tend to use allocated resources efficiently. To enable it, see:
https://developers.google.com/appengine/docs/java/config/appconfig#Java_appengine_web_xml_Using_concurrent_requests

If you have a problem that calls for a multithreaded solution, you can use threads (as described on the link that you included in your question).
However, based on your reasoning ("to make use of idle machines in the datacenter"), it seems like you're misguided. You should not use threads for that reason. You use the machines hours that you pay for and not more. The only time you will have an idle machine is if you tell App Engine to keep around an extra idle machine so that it doesn't have to start up an extra machine your app gets a big usage spike.
Most of the time, unless you are truly doing parallel computation, you won't need to use multiple threads in App Engine. For instance, the datastore has an asynchronous API so that you can do multiple datastore operations in parallel without having to deal with threads yourself.
Does that make sense?

Java Cache Data before entering to database

we have to implement a monitoring module to existing web application. The purpose of the application is to record time taken at each method executed at run time and saving to 2 monitoring tables. As there are many method invocations happening in the application, we can't save to table every time a method is executed.We have used spring aop to intercept time.We used redis to cache data, and in every ten minutes, take the data from cache and save in the database.But redis seems to be a troublesome ideas as it keep making new connections and this give the application a nightmare.Is there any alternative way to do this.We considered about writing to a file and taking data from the file periodically.But that also seemed to be a resource consuming solutions.

I would suggest storing everything in your Java application. If all you want is a few stats, you can keep track of this information in a HashMap and write it every 10 minutes to the database from another thread. Min/Max/Avg time won't take much memory, and you can reset stats once it's written to the database.
If that doesn't suite your needs for some reason, Redis should be fast, and can even help you compute your statistics. I'd setup a connection pool to prevent it from making a new network connection each time. Then, I'd use asynchronous writes (pipelining) in your Redis client to speed up writes. Jedis and JRedis both support this.

multithreaded web application in java

I am doing a web application which has Java as a front end and shell script as a back end. The concept is I need to process multiple files in the back end. I will get the date range from the user (for example from July 1st-8th) and for each day process around 100 files. So in total I have 800 files to process.
I will get these details from JSP and delegate a background call to shell script and get back the results and display the same to the user.
Now I did all these in a sequential approach - by which I mean without threads. So there is only one main thread that executes and the user has to wait till 800 files are processed sequentially. However this is really slow. And because of this I am thinking of going for threads. Since I am a beginner of threads, I read a some stuffs regarding this and I have come up with the following idea:
As I read threads work have to be split .. I thought of splitting the
8 day work to 4 threads where each thread would perform 2 day work
I would like to know whether I am following a correct approach and my major concerns are:
Is it recommended to spawn multiple threads from a web application
Whether or not this is a good approach
Some guidance of how to proceed with this. An example instance would be great. Thank you.

Yes, you can run the long processing job in multi-threaded or in any high performance environment. You should also you Servlet 3.0 Asynchronous Request Processing to suspend the request thread and wait till the Long processing task is done.

Yes, there's nothing wrong with spawning multiple threads from a web application. In fact, if you're running a Servlet container (which you most likely are since you're using Java), it's already spawning multiple threads for you. In general a Servlet container will automatically spawn a new thread (or reuse one out of a pool) to handle each request it receives.
Your approach is fine, thought you'll want to fine-tune the number of threads to something that is suitable given the hardware configuration of your system and the amount of concurrent load on your web service. Also note that while spinning up a bunch of threads will reduce the total amount of time needed to process all the data, it will still leave a potentially large chunk of time before any data is ready to go back to the user. So you might get a better result by doing smaller work units sequentially, and posting each batch of results to the user-interface as soon as it is ready. Then it will still take a long while for the user to have all the data, but that can start viewing at least a portion of it almost immediately.

The way to improve user experience is not by parallelizing at Servlet level on 100000 threads but rather to provide incremental rendering of the view. First of all it would be useful to separate your application in multiple layers, according to the MVC pattern for example.
Saying that, you will have to look on how
Create a service that is able to return partial answers and a last answer, meaning that all available data has been returned. Each of this answers can be computed in parallel to improve performance.
Fill a web page incrementally, tipically by calling back this service which returns a JSON string you use to add data to the DOM. Every time you get an answer, if this is a partial answer, you call again the service providing the previous sequence number.
If you look to Liligo to understand, you will see how this is works. The technique I described is known as polling, but there are others technique to obtain similar asynchronous results at UI Level. In general, you don't want to work directly with the Servlet API, which is a very low level API,but rather use a reasonable framework or abstraction for that.
If you want a warm advice, you should have a look to the Play! framework http://www.playframework.org/documentation/2.0.2/JavaStream HTTP streaming.

Create threads in a web application is not a good solution. It is a bad design because normally it would be the container (web server) who is charged with that activity. So I think you have to find another solution.
I suggest you putting the shell scripts in cron, scheduled to run each minute, and to "activate" them you can touch files that act as semaphores. At each run the scripts verify if the web application touched the semaphore file, if so they read the date interval from those files and then start to process.

Is asynchronous jdbc call possible?

I wonder if there is a way to make asynchronous calls to a database?
For instance, imagine that I've a big request that take a very long time to process, I want to send the request and receive a notification when the request will return a value (by passing a Listener/callback or something). I don't want to block waiting for the database to answer.
I don't consider that using a pool of threads is a solution because it doesn't scale, in the case of heavy concurrent requests this will spawn a very large number of threads.
We are facing this kind of problem with network servers and we have found solutions by using select/poll/epoll system call to avoid having one thread per connection. I'm just wondering how to have a similar feature with database request?
Note:
I'm aware that using a FixedThreadPool may be a good work-around, but I'm surprised that nobody has developed a system really asynchronous (without the usage of extra thread).
** Update **
Because of the lack of real practical solutions, I decided to create a library (part of finagle) myself: finagle-mysql. It basically decodes/decodes mysql request/response, and use Finagle/Netty under the hood. It scales extremely well even with huge number of connections.

I don't understand how any of the proposed approaches that wrap JDBC calls in Actors, executors or anything else can help here - can someone clarify.
Surely the basic problem is that the JDBC operations block on socket IO. When it does this it blocks the Thread its running on - end of story. Whatever wrapping framework you choose to use its going to end up with one thread being kept busy/blocked per concurrent request.
If the underlying database drivers (MySql?) offers a means to intercept the socket creation (see SocketFactory) then I imagine it would be possible to build an async event driven database layer on top of the JDBC api but we'd have to encapsulate the whole JDBC behind an event driven facade, and that facade wouldn't look like JDBC (after it would be event driven). The database processing would happen async on a different thread to the caller, and you'd have to work out how to build a transaction manager that doesn't rely on thread affinity.
Something like the approach I mention would allow even a single background thread to process a load of concurrent JDBC exec's. In practice you'd probably run a pool of threads to make use of multiple cores.
(Of course I'm not commenting on the logic of the original question just the responses that imply that concurrency in a scenario with blocking socket IO is possible without the user of a selector pattern - simpler just to work out your typical JDBC concurrency and put in a connection pool of the right size).
Looks like MySql probably does something along the lines I'm suggesting ---
http://code.google.com/p/async-mysql-connector/wiki/UsageExample

It's impossible to make an asynchronous call to the database via JDBC, but you can make asynchronous calls to JDBC with Actors (e.g., actor makes calls to the DB via JDBC, and sends messages to the third parties, when the calls are over), or, if you like CPS, with pipelined futures (promises) (a good implementation is Scalaz Promises)
I don't consider that using a pool of threads is a solution because it doesn't scale, in the case of heavy concurrent requests this will spawn a very large number of threads.
Scala actors by default are event-based (not thread-based) - continuation scheduling allows creating millions of actors on a standard JVM setup.
If you're targeting Java, Akka Framework is an Actor model implementation that has a good API both for Java and Scala.
Aside from that, the synchronous nature of JDBC makes perfect sense to me. The cost of a database session is far higher than the cost of the Java thread being blocked (either in the fore- or background) and waiting for a response. If your queries run for so long that the capabilities of an executor service (or wrapping Actor/fork-join/promise concurrency frameworks) are not enough for you (and you're consuming too many threads) you should first of all think about your database load. Normally the response from a database comes back very fast, and an executor service backed with a fixed thread pool is a good enough solution. If you have too many long-running queries, you should consider upfront (pre-)processing - like nightly recalculation of the data or something like that.

Perhaps you could use a JMS asynchronous messaging system, which scales pretty well, IMHO:
Send a message to a Queue, where the subscribers will accept the message, and run the SQL process. Your main process will continue running and accepting or sending new requests.
When the SQL process ends, you can run the opposite way: send a message to a ResponseQueue with the result of the process, and a listener on the client side accept it and execute the callback code.

It looks like a new asynchronous jdbc API "JDBC next" is in the works.
See presentation here
You can download the API from here
Update:
This new jdbc API was later named ADBA. Then on September 2019 work was stopped see mailing list post.
R2DBC seems to achieve similar goals. It already supports most major databases (except oracle db). Note that this project is a library and not part of the jdk.

There is no direct support in JDBC but you have multiple options like MDB, Executors from Java 5.
"I don't consider that using a pool of threads is a solution because it doesn't scale, in the case of heavy concurrent requests this will spawn a very large number of threads."
I am curious why would a bounded pool of threads is not going to scale? It is a pool not thread-per-request to spawn a thread per each request. I have been using this for quite sometime on a heavy load webapp and we have not seen any issues so far.

As mentioned in other answers JDBC API is not Async by its nature.
However, if you can live with a subset of the operations and a different API there are solutions. One example is https://github.com/jasync-sql/jasync-sql that works for MySQL and PostgreSQL.

A solution is being developed to make reactive connectivity possible with standard relational databases.
People wanting to scale while retaining usage of relational databases
are cut off from reactive programming due to existing standards based
on blocking I/O. R2DBC specifies a new API that allows reactive code
that work efficiently with relational databases.
R2DBC is a specification designed from the ground up for reactive
programming with SQL databases defining a non-blocking SPI for
database driver implementors and client library authors. R2DBC drivers
implement fully the database wire protocol on top of a non-blocking
I/O layer.
R2DBC's WebSite
R2DBC's GitHub
Feature Matrix

Ajdbc project seems to answer this problem http://code.google.com/p/adbcj/
There is currently 2 experimental natively async drivers for mysql and postgresql.

An old question, but some more information. It is not possible to have JDBC issue asynchronous requests to the database itself, unless a vendor provides an extension to JDBC and a wrapper to handle JDBC with. That said, it is possible to wrap JDBC itself with a processing queue, and to implement logic that can process off the queue on one or more separate connections. One advantage of this for some types of calls is that the logic, if under heavy enough load, could convert the calls into JDBC batches for processing, which can speed up the logic significantly. This is most useful for calls where data is being inserted, and the actual result need only be logged if there is an error. A great example of this is if inserts are being performed to log user activity. The application won't care if the call completes immediately or a few seconds from now.
As a side note, one product on the market provides a policy driven approach to allowing asynchronous calls like those I described to be made asynchronously (http://www.heimdalldata.com/). Disclaimer: I am co-founder of this company. It allows regular expressions to be applied to data transformation requests such as insert/update/deletes for any JDBC data source, and will automatically batch them together for processing. When used with MySQL and the rewriteBatchedStatements option (MySQL and JDBC with rewriteBatchedStatements=true) this can significantly lower overall load on the database.

You have three options in my opinion:
Use a concurrent queue to distribute messages across a small and fixed number of threads. So if you have 1000 connections you will have 4 threads, not 1000 threads.
Do the database access on another node (i.e. another process or machine) and have your database client make asynchronous network calls to that node.
Implement a true distributed system through asynchronous messages. For that you will need an messaging queue such as CoralMQ or Tibco.
Diclaimer: I am one of the developers of CoralMQ.

The Java 5.0 executors might come handy.
You can have a fixed number of threads to handle long-running operations. And instead of Runnable you can use Callable, which return a result. The result is encapsulated in a Future<ReturnType> object, so you can get it when it is back.

Here is an outline about what an non-blocking jdbc api could look like from Oracle presented at JavaOne:
https://static.rainfocus.com/oracle/oow16/sess/1461693351182001EmRq/ppt/CONF1578%2020160916.pdf
So it seems that in the end, truly asynchronous JDBC calls will indeed be possible.

Just a crazy idea : you could use an Iteratee pattern over JBDC resultSet wrapped in some Future/Promise
Hammersmith does that for MongoDB.

I am just thinking ideas here. Why couldn't you have a pool of database connections with each one having a thread. Each thread has access to a queue. When you want to do a query that takes a long time, you can put on the queue and then one of threads will pick it up and handle it. You will never have too many threads because the number of your threads are bounded.
Edit: Or better yet, just a number of threads. When a thread sees something in a queue, it asks for a connection from the pool and handles it.

The commons-dbutils library has support for an AsyncQueryRunner which you provide an ExecutorService to and it returns a Future. Worth checking out as it's simple to use and ensure you won't leak resources.

If you are interested in asynchronous database APIs for Java you should know that there is a new initiative to come up with a set of standard APIs based on CompletableFuture and lambdas. There is also an implementation of these APIs over JDBC which can be used to practice these APIs:
https://github.com/oracle/oracle-db-examples/tree/master/java/AoJ
The JavaDoc is mentioned in the README of the github project.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.