java load balancer and Queue management (like PBS) - java

We have a simple java app which runs user requests (commands) on the command line and outputs results. So, as an example, say the user submits a job along the lines of:
cmd.exe ping -10 google.com
Typically, we have around 100 users, submitting 100's of requests to be exceuted. Currently, the java app simply queues the jobs and run them sequentially, without being "democratic" or "greedy" about it.
What we would like to do is have the ability to prioritize jobs, run jobs in a more "even" fashion (say 2 users have submitted 100 jobs each, we would like to run 1 job per user and switch back and forth).
To this end, I was wondering if there are any opensource tools such as PBS:
http://code.google.com/p/pbs4java/
which could integrate with java. A search on google does not reveal a lot - any comments or suggestions would be much appreciated.
UPDATE: The most important criterions I am after are:
[1] Should be opensource
[2] Should be able to integrate with Java.

I suggest taking a look at Akka Dispatch and priority features. You should manage how you want to assign the priorities to the tasks. So, in your case, as users submit more tasks, it could lead to degrading to lower priorities through time. The metrics should be defined by your problem requirement and injected into the library.
As a side node, Akka uses HawtDispatch ideas, so you might also take a look at the library if it suits you.

Related

Java/Database project automation

I have a Java/Database project in Netbeans that I would like to run once a day at a set time. I am using Derby for the database driver. I am trying to automate a process.
How can I 'schedule' this program to run at specified times?
How can I customize this to keep running until a certain criteria is met?
Say my criteria is that It has to populate 500 rows in the database. (So say at the scheduled time it runs it can only populate 400 rows, then maybe 2 hours later it tries running again to fill the last 100 rows)
Lastly, what are the best practices of automation and scheduled tasks?
How can I 'schedule' this program to run at specified times?
This can be done one of two ways, depending on your operating system - write a job that kicks off the java program at the intervals you need. You may then hook up the job to be started off on start up.
In Linux you can accomplish this with a cron job or so. On windows you may refer to this http://support.microsoft.com/kb/308569.
You may also program the scheduler into your java program using http://quartz-scheduler.org or http://www.sauronsoftware.it/projects/cron4j/ .
How can I customize this to keep running until a certain criteria is met?
This is perhaps best established from within your program, although it is hard to give you directions without much info.
Lastly, what are the best practices of automation and scheduled tasks?
Depending on your application architecture, scheduling and automation can be handled either from within the app or get support from the operating system. The criteria depends on how much control the application needs, which platform makes scheduling easy etc.
Hope this helps.
Quartz is a scheduling project for Java. I have used it in many projects and find it to be very intuitive.
It may be a little over the top for what your after but worth a look anyway.
You can make use of Timer for scheduling the events & the events/task must be implemented using TimerTask

How do I execute multiple processes simultaneously in Java?

I am working on an application in which I want multiple tasks to be executed simultaneously.
I also want to be able to keep track of the number of such tasks being run in parallel, and sometimes add yet another task to be processed in parallel, in addition to the current set of tasks already being processed.
One more thing- I want to do the above, not only in a desktop app, but also in a cloud app, in which I initialise another virtual machine running Tomcat, and then repeat all of the above in that instance.
What is the best way to do this? If you can point me to the correct theory/guides on this subject, that would be great, although code samples are also welcome.
Concurrency is a huge topic in Java, please take your time for it
Lesson: Concurrency
Concurrency in a Java program is accomplished by starting your own Threads. Multiple processes can only be realized with multiple JVMs. When you are done with the basics, you want to take a look at Executors. They will help to implement your application in a structured way since they abstract from Threads to Tasks.
I don't know how much time you have planned for this, but if you are really at the start, get Java Concurrency in Practice, read it and write a kick-ass concurrent Java application.
Raising the whole thing to a distributed level is a whole other story. You cannot tackle that all at once.
Wow... What a series of steps. Start by extending Runnable, then using Thread to run and manage your Jobs. After that, you can get into Tomcat.

multithreaded web application in java

I am doing a web application which has Java as a front end and shell script as a back end. The concept is I need to process multiple files in the back end. I will get the date range from the user (for example from July 1st-8th) and for each day process around 100 files. So in total I have 800 files to process.
I will get these details from JSP and delegate a background call to shell script and get back the results and display the same to the user.
Now I did all these in a sequential approach - by which I mean without threads. So there is only one main thread that executes and the user has to wait till 800 files are processed sequentially. However this is really slow. And because of this I am thinking of going for threads. Since I am a beginner of threads, I read a some stuffs regarding this and I have come up with the following idea:
As I read threads work have to be split .. I thought of splitting the
8 day work to 4 threads where each thread would perform 2 day work
I would like to know whether I am following a correct approach and my major concerns are:
Is it recommended to spawn multiple threads from a web application
Whether or not this is a good approach
Some guidance of how to proceed with this. An example instance would be great. Thank you.
Yes, you can run the long processing job in multi-threaded or in any high performance environment. You should also you Servlet 3.0 Asynchronous Request Processing to suspend the request thread and wait till the Long processing task is done.
Yes, there's nothing wrong with spawning multiple threads from a web application. In fact, if you're running a Servlet container (which you most likely are since you're using Java), it's already spawning multiple threads for you. In general a Servlet container will automatically spawn a new thread (or reuse one out of a pool) to handle each request it receives.
Your approach is fine, thought you'll want to fine-tune the number of threads to something that is suitable given the hardware configuration of your system and the amount of concurrent load on your web service. Also note that while spinning up a bunch of threads will reduce the total amount of time needed to process all the data, it will still leave a potentially large chunk of time before any data is ready to go back to the user. So you might get a better result by doing smaller work units sequentially, and posting each batch of results to the user-interface as soon as it is ready. Then it will still take a long while for the user to have all the data, but that can start viewing at least a portion of it almost immediately.
The way to improve user experience is not by parallelizing at Servlet level on 100000 threads but rather to provide incremental rendering of the view. First of all it would be useful to separate your application in multiple layers, according to the MVC pattern for example.
Saying that, you will have to look on how
Create a service that is able to return partial answers and a last answer, meaning that all available data has been returned. Each of this answers can be computed in parallel to improve performance.
Fill a web page incrementally, tipically by calling back this service which returns a JSON string you use to add data to the DOM. Every time you get an answer, if this is a partial answer, you call again the service providing the previous sequence number.
If you look to Liligo to understand, you will see how this is works. The technique I described is known as polling, but there are others technique to obtain similar asynchronous results at UI Level. In general, you don't want to work directly with the Servlet API, which is a very low level API,but rather use a reasonable framework or abstraction for that.
If you want a warm advice, you should have a look to the Play! framework http://www.playframework.org/documentation/2.0.2/JavaStream HTTP streaming.
Create threads in a web application is not a good solution. It is a bad design because normally it would be the container (web server) who is charged with that activity. So I think you have to find another solution.
I suggest you putting the shell scripts in cron, scheduled to run each minute, and to "activate" them you can touch files that act as semaphores. At each run the scripts verify if the web application touched the semaphore file, if so they read the date interval from those files and then start to process.

Where to host a periodically running Python or Java service?

I'm going to build a little service which monitors an IMAP email account and acts on the read messages. For this it just has to run every say 10 min, no external trigger required, but I want to host this service externally (so that I don't need to worry about up times.)
To be machine independent I could write the service in Java or Python. What are good hosting providers for this? and which of the two languages is better supported?
The service has either to run the whole time (and must do the waiting itself) or it has to be kicked off every 10 min. I guess most (web) hosts are geared towards request driven code (e.g. JSP) and I assume they shut down processes which run forever. Who offers hosting for user-written services like the one mentioned above?
Depending on what actions you require, and your requirements for resources, Google App Engine might be quite suitable for both Python and Java services (GAE supports both languages quite decently). cron jobs can be set to run every 10 minutes (the URL I gave shows how to do that with Python) and you can queue more tasks if the amount of work you need to perform on a certain occasion exceeds the 30-seconds limit that GAE supports.
GAE is particularly nice to get started and experiment since it has reasonably-generous free quotas for most all resources your jobs could consume (you need to enable billing, provide a credit card, and set up a budget, to allow your jobs to consume more than their free quota, though).
If you decide that GAE has limitations you can't stand, or would cost you too much for billed use of resources over the free quotas, any hosting provider supporting a Unix-like cron jobs scheduler should be acceptable. Starting up from scratch a Python script every 10 minutes may be faster than starting up from scratch a JVM, but that depends on what it is that you have to do every 10 minutes (for some kinds of tasks Python will be just as fast, or maybe even faster -- for others it will be slower, and we have no way to guess what kinds of tasks you require or at what "tipping point" the possibly-faster JVM will "pay for its own startup" wrt the possibly-slower Python... basically you'll need to assess that for yourself!-).
You are lucky, since Google AppEngine provides CRON jobs both for Python and Java.
GAE - Python
GAE - Java
Check out Google App Engine. You can set up a cron job for your Java or Python script.

Workload Distribution / Parallel Execution in JAVA

I have a situation here where I need to distribute work over to multiple JAVA processes running in different JVMs, probably different machines.
Lets say I have a table with records 1 to 1000. I am looking for work to be collected and distributed is sets of 10. Lets say records 1-10 to workerOne. Then records 11-20 to workerThree. And so on and so forth. Needless to say workerOne never does the work of workerTwo unless and until workerTwo couldnt do it.
This example was purely based on database but could be extended to any system, I believe be it File processing, email processing and so forth.
I have a small feeling that the immediate response would be to go for a Master/Worker approach. However here we are talking about different JVMs. Even if one JVM were to come down the other JVM should just keep doing its work.
Now the million dollar question would be: Are there any good frameworks(production ready) that would give me facility to do this. Even if there are concrete implementations of specific needs like Database records, File processing, Email processing and their likes.
I have seen the Java Parallel Execution Framework, but am not sure if it can be used for different JVMs and if one were to come down would the other keep going.I believe Workers could be on multiple JVMs, but what about the Master?
More Info 1: Hadoop would be a problem because of the JDK 1.6 requirement. Thats bit too much.
Thanks,
Franklin
Might want to look into MapReduce and Hadoop
You could also use message queues. Have one process that generates the list of work and packages it in nice little chunks. It then plops those chunks on a queue. Each one of the workers just keeps waiting on the queue for something to show up. When it does, the worker pulls a chunk off the queue and processes it. If one process goes down, some other process will pick up the slack. Simple and people have been doing it that way for a long time so there's a lot information about it on the net.
Check out Hadoop
I believe Terracotta can do this. If you are dealing with web pages, JBoss can be clustered.
If you want to do this yourself you will need a work manager which keeps track of jobs to do, jobs in progress and jobs never done which needs to be rescheduled. The workers then ask for something to do, do it, and send the result back, asking for more.
You may want to elaborate on what kind of work you want to do.
The problem you've described is definitely best solved using the master/worker pattern.
You should have a look into JavaSpaces (part of the Jini framework), it's really well suited to this kind of thing. Basically you just want to encapsulate each task to be carried out inside a Command object, subclassing as necesssary. Dump these into the JavaSpace, let your workers grab and process one at a time, then reassemble when done.
Of course your performance gains will totally depend on how long it takes you to process each set of records, but JavaSpaces won't cause any problems if distributed across several machines.
If you work on records in a single database, consider performing the work within the database itself using stored procedures. The gain for processing the records on different machine might be negated by the cost of retrieving and transmitting the work between the database and the computing nodes.
For file processing it could be a similar case. Working on files in (shared) filesystem might introduce large I/O pressure for OS.
And the cost for maintaining multiple JVM's on multiple machines might be an overkill too.
And for the question: I used the JADE (Java Agent Development Environment) for some distributed simulation once. Its multi-machine suppord and message passing nature might help you.
I would consider using Jgroups for that. You can cluster your jvms and one of your nodes can be selected as master and then can distribute the work to the other nodes by sending message over network. Or you can already partition your work items and then manage in master node the distribution of the partitions like partion-1 one goes to JVM-4 , partion-2 goes to JVM-3, partion-3 goes to JVM-2 and so on. And if JVM-4 goes down it will be realized by the master node and then master node will tell to one of the other nodes to start pick up partition-1 as well.
One other alternative which is easier to use is redis pub sub support. http://redis.io/topics/pubsub . But then you will have to maintain redis servers which i dont like.

Categories

Resources