I have written code using ExecutorService in Java. I create 10 worker threads to process rows fetched from a database, and each thread is assigned one of the resulting rows. This approach works fine when the application is deployed and running on a single instance/node.
Can anyone suggest how this will behave when my application is deployed on multiple nodes/a cluster?
Do I have to change any part of the code before deploying to a cluster?
04/12/15: Any more suggestions?
You should consider the overhead of each task. Unless each task is of at least moderate size, you may want to batch them.
In a distributed context the overhead is much higher, so you are even more likely to need to batch the work.
You will also need a framework, so the considerations will depend on the framework you choose.
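To make the batching idea concrete, here is a minimal sketch (my own illustration, not from the answer above) that groups fetched rows into chunks before submitting them, so each task amortizes its scheduling overhead over many rows:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BatchedRowProcessor {
    private static final int BATCH_SIZE = 100;   // tune to the per-row processing cost

    public static void process(List<String> rows) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (int i = 0; i < rows.size(); i += BATCH_SIZE) {
            // Submit one task per batch instead of one task per row.
            List<String> batch = rows.subList(i, Math.min(i + BATCH_SIZE, rows.size()));
            pool.submit(() -> {
                for (String row : batch) {
                    // processRow(row) would go here
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

In a cluster the same principle applies, except that "submit" becomes handing a batch to whatever distribution framework you pick (a message queue, Hazelcast, Spark, etc.).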
I wrote an application to do some testing on network nodes, like ping tests, retrieving disk space, and so on.
I use a scheduled batchlet to run the actions, but I wonder if this is the right use of a batchlet.
Would an EJB timer be more appropriate? Also, when I run a batchlet, my GlassFish server keeps a log of the batch job, and I don't necessarily need it (especially with the number of batch jobs generated during a day).
If I need to run several jobs at the same scheduled time, I think a batchlet can do it, but so can an EJB timer?
Could you give me your input on the right way to achieve this?
Thanks,
Ersch
This isn't a question with a clear answer, but there is a cost to factoring your application as a batch job, so I would look at what you're getting to see whether it's worth doing.
So you're thinking about a job consisting of a single batchlet step. In that case there's nothing to be gained from the "restart" functions: neither restarting at the failing step within a job, nor leveraging checkpoints within a chunk step. The batchlet programming model is quite simple... and even if you really like @BatchProperty, you'd now have to deal with the job XML to use it.
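For reference, the batchlet programming model really is just one method. A minimal sketch in the JSR-352 style (the class name and target host here are hypothetical):

```java
import javax.batch.api.AbstractBatchlet;
import javax.inject.Named;

@Named
public class PingBatchlet extends AbstractBatchlet {

    @Override
    public String process() throws Exception {
        // One network check; the return value becomes the step's exit status.
        boolean reachable = java.net.InetAddress.getByName("example.com").isReachable(5000);
        return reachable ? "COMPLETED" : "FAILED";
    }
}
```

The job XML then just references this step, which is exactly the extra ceremony being weighed here.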
This only starts to get more interesting if you want to start, view, and manage these executions along with the rest of your batch jobs. This might be because you're working with an implementation that offers some kind of implementation-specific add-on function. An example of this could be an integration with external scheduler software, allowing jobs to be scheduled by it. At the other extreme, if you found value in having a persisted record of all your batch job executions in one place (the job repository, usually a persistent DB), then that could also make this worthwhile for you.
But if you don't care for any of that, then an EJB timer could be the way to go instead.
Using an EJB timer is appropriate when your task executes in an eye blink (or thereabouts).
Otherwise use the batching mechanism.
Long-running tasks executed from EJB timers can be problematic because they execute in transactions, which normally time out after a short period. Increasing this transaction timeout also increases the chance of database (and perhaps other) resource locks, which can impact normal operation of your application.
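If you do stay with a timer, one common way to sidestep the transaction-timeout problem is to keep the timer method itself out of a JTA transaction. A sketch (hypothetical bean name and schedule; the checks would manage their own resources):

```java
import javax.ejb.Schedule;
import javax.ejb.Singleton;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;

@Singleton
public class NetworkCheckTimer {

    // Fires every 15 minutes; persistent = false keeps it out of the timer store.
    @Schedule(hour = "*", minute = "*/15", persistent = false)
    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED) // no JTA transaction to time out
    public void runChecks() {
        // ping hosts, check disk space, record results...
    }
}
```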
I've got a Spring Web application that's running on two different instances.
The two instances aren't aware of each other, they run on distinct servers.
That application has a scheduled Quartz job, but my problem is that the job shouldn't execute simultaneously on both instances: as it's a mail-sending job, that could cause duplicate emails being sent.
I'm using RAMJobStore, and JDBCJobStore is not an option for me due to the large number of tables it requires. (I can't afford to create many tables because of an internal restriction.)
The solutions I thought about:
- creating a single control table that has to be checked every time a job starts (with repeatable-read isolation level to avoid concurrency issues). The problem is that if the server is killed, the table might be left in an invalid state.
- using properties to designate a single server as the job-running server. The problem is that if that server goes down, jobs will stop running.
Has anyone ever experienced this problem and do you have any thoughts to share?
Start with the second solution (deactivate Quartz on all nodes except one). It is very simple to do and it is safe. Count how frequently your server goes down. If that is unacceptable, then try the first solution. The problem with the first solution is that you need solid multithreaded-programming skills to implement it without bugs. It is not so simple if multithreading is not your everyday task, and the cost of a bug in your implementation may outweigh the actual benefit.
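If you do attempt the first solution, a lease-based control table limits the damage when a node dies mid-run, because a stale lock simply expires. A minimal sketch (the job_lock table, pre-seeded with one row per job name, is my assumption, not part of the question):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.time.Instant;

public class JobLock {

    /**
     * Tries to claim the named job for leaseSeconds; returns true on exactly one node.
     * Assumes a pre-seeded table JOB_LOCK(JOB_NAME PK, LOCKED_UNTIL TIMESTAMP NULL).
     */
    public static boolean tryAcquire(Connection con, String jobName, long leaseSeconds) throws Exception {
        con.setAutoCommit(false);
        try (PreparedStatement select = con.prepareStatement(
                "SELECT locked_until FROM job_lock WHERE job_name = ? FOR UPDATE")) {
            select.setString(1, jobName);
            try (ResultSet rs = select.executeQuery()) {
                if (rs.next()) {
                    Timestamp until = rs.getTimestamp(1);
                    if (until != null && until.toInstant().isAfter(Instant.now())) {
                        con.rollback();          // another node holds a live lease
                        return false;
                    }
                }
            }
        }
        int updated;
        try (PreparedStatement update = con.prepareStatement(
                "UPDATE job_lock SET locked_until = ? WHERE job_name = ?")) {
            update.setTimestamp(1, Timestamp.from(Instant.now().plusSeconds(leaseSeconds)));
            update.setString(2, jobName);
            updated = update.executeUpdate();
        }
        con.commit();                            // lease claimed; safe to run the mail job
        return updated == 1;
    }
}
```

Each node calls tryAcquire before running the Quartz job and skips the run when it returns false; the lease expiry addresses the "server killed, table left in an invalid state" concern.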
I'm not really understanding the dyno and worker process model of Heroku as it relates to a single process but multi-threaded Java-based server.
For example: how do I know (for a single dyno) how many processors are available for my background threads? Do I need to use something like RabbitMQ and create a separate process (app) for each background processing task and communicate between the server and these? That seems a little overkill for some scheduled tasks using cached thread pool executors. Should all Futures be changed to inter-process Futures?
I guess it comes down to this question. Can I no longer write a multi-threaded server and scale the processors available to my server process in order to accommodate my thread activity? Or do I need to refactor my architecture to use separate processes for concurrency? If the former, do I need workers or just multiple dynos?
Thanks.
Heroku supports multiple concurrency models, so it's really up to you how you would like to architect your application. You have access to the full Java stack, so if something makes more sense to just be run as multiple threads in your web processes, you can definitely do that, or you can always enqueue jobs on something like RabbitMQ or Redis and process them on separate worker dynos. Multithreading is simpler and makes sense if the amount of work is light and proportional to your web requests because it will be scaled along with the web dynos; however, if the work is large, not proportional, and/or needs to be scaled independently, then breaking it out into a separate process would be better.
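As a sketch of the queue-based option (using the plain RabbitMQ Java client; the queue name, payload, and CLOUDAMQP_URL config var are illustrative assumptions), the web dyno would enqueue jobs like this and a worker dyno would consume them:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

public class JobQueue {
    public static void enqueue(String job) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setUri(System.getenv("CLOUDAMQP_URL")); // e.g. a RabbitMQ add-on's config var
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();
        try {
            channel.queueDeclare("jobs", true, false, false, null);   // durable queue
            channel.basicPublish("", "jobs",
                    MessageProperties.PERSISTENT_TEXT_PLAIN, job.getBytes("UTF-8"));
        } finally {
            channel.close();
            conn.close();
        }
    }
}
```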
Heroku was originally just a Ruby platform, which does not have the same threading capabilities as Java, so the use of separate worker dynos is more important for Ruby and this is reflected in some of the documentation and examples out there, which might have led to your confusion. Luckily, with Java you have more options available to you and can use what's best for the job at hand.
I am working on an application in which I want multiple tasks to be executed simultaneously.
I also want to be able to keep track of the number of such tasks being run in parallel, and sometimes add yet another task to be processed in parallel, in addition to the current set of tasks already being processed.
One more thing: I want to do the above not only in a desktop app, but also in a cloud app, in which I initialise another virtual machine running Tomcat and then repeat all of the above in that instance.
What is the best way to do this? If you can point me to the correct theory/guides on this subject, that would be great, although code samples are also welcome.
Concurrency is a huge topic in Java, so please take your time with it:
Lesson: Concurrency
Concurrency in a Java program is accomplished by starting your own threads; multiple processes can only be realized with multiple JVMs. When you are done with the basics, you will want to take a look at Executors. They help you structure your application because they raise the abstraction from threads to tasks.
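As a small illustration of that thread-to-task shift, and of tracking how many tasks are in flight (which the question asks about), here is a sketch; the cast to ThreadPoolExecutor is just to reach its counters:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class TaskRunner {
    public static void main(String[] args) {
        // A fixed pool of 4 workers backing an unbounded task queue.
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(4);

        for (int i = 0; i < 10; i++) {
            final int id = i;
            pool.submit(() -> System.out.println("task " + id + " on " + Thread.currentThread().getName()));
        }

        // Keep track of how many tasks are running or waiting at any moment;
        // more tasks can be submitted at any time before shutdown().
        System.out.println("active: " + pool.getActiveCount() + ", queued: " + pool.getQueue().size());

        pool.shutdown();
    }
}
```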
I don't know how much time you have planned for this, but if you are really at the start, get Java Concurrency in Practice, read it and write a kick-ass concurrent Java application.
Raising the whole thing to a distributed level is a whole other story. You cannot tackle that all at once.
Wow... what a series of steps. Start by implementing Runnable, then use Thread to run and manage your jobs. After that, you can get into Tomcat.
I'm trying to write a Spring web application on a WebLogic server that makes several independent database SELECTs (i.e. they can safely be run concurrently), one of which takes 15 minutes to execute.
Once all the results are fetched, an email containing the results will be sent to a user list.
What's a good way to get around this problem? Is there a Spring library that can help or do I go ahead and create daemon threads to do the job?
EDIT: This will have to be done at the application layer (business requirement) and the email will be sent out by the web application.
Are you sure you are doing everything optimally? 15 minutes is a really long time unless you have a gazillion rows across dozens of tables and need a heck of a lot of joins... this is your highest priority -- why is it taking so long?
Do you run the email job at set intervals, or is it invoked from your web app? If at set intervals, you should do it in an outside job, possibly on another machine. You can use daemons or the Quartz scheduler.
If you need to fire this process off from the web app, you need to do it asynchronously. You could use JMS, or you could simply have a table into which you enter a new job request, with a daemon process that looks for new jobs every so often (a sketch of this follows below). Firing off background threads is possible, but it's error-prone and not worth the complication, especially since you have other valid options that are simpler.
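Here is a minimal sketch of that jobs-table approach (the report_job table, its status values, and the helper method bodies are illustrative assumptions):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;

public class ReportJobPoller {
    private final DataSource dataSource;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public ReportJobPoller(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void start() {
        // Look for new job requests once a minute.
        scheduler.scheduleWithFixedDelay(this::pollOnce, 0, 1, TimeUnit.MINUTES);
    }

    private void pollOnce() {
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "SELECT id FROM report_job WHERE status = 'NEW'");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                long id = rs.getLong("id");
                runReportAndEmail(id);   // the long-running SELECTs plus the mail, off any request thread
                markDone(con, id);
            }
        } catch (Exception e) {
            e.printStackTrace();         // log it; the next poll will retry
        }
    }

    private void runReportAndEmail(long id) { /* fetch the results, send the mail */ }

    private void markDone(Connection con, long id) throws Exception {
        try (PreparedStatement ps = con.prepareStatement(
                "UPDATE report_job SET status = 'DONE' WHERE id = ?")) {
            ps.setLong(1, id);
            ps.executeUpdate();
        }
    }
}
```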
If you are asking about Spring support for long-running, possibly asynchronous tasks, you have a choice between Spring JMS support and Spring Batch.
You can use Spring's Quartz integration to schedule the job. That way the jobs run in the same container but do not require an HTTP request to trigger them.
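For a flavor of container-internal scheduling, here is a sketch using Spring's own @Scheduled support (a lighter stand-in for a full Quartz setup; the class name and cron expression are made up, and it assumes scheduling is enabled via @EnableScheduling or <task:annotation-driven/>):

```java
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class NightlyReportJob {

    // Runs at 2am every day inside the same Spring container; no HTTP request needed.
    @Scheduled(cron = "0 0 2 * * ?")
    public void buildAndEmailReport() {
        // run the independent SELECTs (possibly concurrently), then send the mail
    }
}
```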