SAX data collection design pattern

SAX data collection design pattern - java

I am new to Java development, and looking for general design patterns for a data collection application written in Java.
I have already written the prototype, which is a basic Java console application that uses SAX to retrieve data and store it in a database.
Obviously, this is not a Web app, so it doesn't need to run in a container like Tomcat, but what would people recommend? The application currently uses a basic Java timer to run every 5 minutes.
So the basic requirements that I can think of are
It needs to run all the time, so if it crashes, it needs to be restarted.
It needs to do its work every 5 minutes, so it needs a timer.
It could use Hibernate, but not if it creates any overhead, as this is a highly
date intensive application.
So what I am looking for are suggestions like:
You could run a timer widget thingumbob under Tomcat anyway and get requirement #1.... or Spring 99 has all of the features you need.
etc.

For this type of application you could have a main process that spawns a thread that does the actual work. This thread would be in a loop that basically checks to see if it supposed to be running or not. If it is running it continues. Once it does its work you can use Thread.sleep(msToSleep) to put the thread to sleep for 5 minutes. So it would go in a continuous cycle of working and sleeping. Not timer required. The main process can "ping" the thread to see if it is still functional and if it is not spawn a new thread. Depending on the OS there are similar techniques to make sure the main process is running. Using an ORM like Hibernate will add overhead so you will have to way the trade-offs between transaction performance and ease of development. If you are converting your data to objects yourself you will have to use a profiler to see if you are actually implementing it more efficiently than an ORM can.

Related

Google App Engine Java Program concurrency

In order to improve the execution speed of a Java program running in Google App Engine, can I create additional Java threads during the runtime to make use of idle machines in the data center?
I've found conflicting data thus far.

If your primary concern is to improve the execution time, take a look at Memcache and Tasks. They can be used to reduce or avoid the latency of reading from or writing to the Datastore or other storage options, fetching URLs, sending emails, etc. If you do a lot of difficult computations that can run in parallel, look at MapReduce API.
Once you remove all the delays from your program, there will be no reason to use multiple threads within a single request.
Note that App Engine instances can use multithreading to execute multiple requests at the same time, so they tend to use allocated resources efficiently. To enable it, see:
https://developers.google.com/appengine/docs/java/config/appconfig#Java_appengine_web_xml_Using_concurrent_requests

If you have a problem that calls for a multithreaded solution, you can use threads (as described on the link that you included in your question).
However, based on your reasoning ("to make use of idle machines in the datacenter"), it seems like you're misguided. You should not use threads for that reason. You use the machines hours that you pay for and not more. The only time you will have an idle machine is if you tell App Engine to keep around an extra idle machine so that it doesn't have to start up an extra machine your app gets a big usage spike.
Most of the time, unless you are truly doing parallel computation, you won't need to use multiple threads in App Engine. For instance, the datastore has an asynchronous API so that you can do multiple datastore operations in parallel without having to deal with threads yourself.
Does that make sense?

Mapping the heroku cedar model to a multithreaded application

I'm not really understanding the dyno and worker process model of Heroku as it relates to a single process but multi-threaded Java-based server.
For example: How do I know (for a single dyno) how many processors are available for my background threads? Do I need to use something like RabbitMQ and create a separate process (app) for each background processing task and communicate between the server and these? Seems a little overkill for some Scheduled Tasks using Thread Cached Executors. Should all Futures be changed to inter-process Futures?
I guess it comes down to this question. Can I no longer write a multi-threaded server and scale the processors available to my server process in order to accommodate my thread activity? Or do I need to refactor my architecture to use separate processes for concurrency? If the former, do I need workers or just multiple dynos?
Thanks.

Heroku supports multiple concurrency models, so it's really up to you how you would like to architect your application. You have access to the full Java stack, so if something makes more sense to just be run as multiple threads in your web processes, you can definitely do that, or you can always enqueue jobs on something like RabbitMQ or Redis and process them on separate worker dynos. Multithreading is simpler and makes sense if the amount of work is light and proportional to your web requests because it will be scaled along with the web dynos; however, if the work is large, not proportional, and/or needs to be scaled independently, then breaking it out into a separate process would be better.
Heroku was originally just a Ruby platform, which does not have the same threading capabilities as Java, so the use of separate worker dynos is more important for Ruby and this is reflected in some of the documentation and examples out there, which might have led to your confusion. Luckily, with Java you have more options available to you and can use what's best for the job at hand.

Java/Database project automation

I have a Java/Database project in Netbeans that I would like to run once a day at a set time. I am using Derby for the database driver. I am trying to automate a process.
How can I 'schedule' this program to run at specified times?
How can I customize this to keep running until a certain criteria is met?
Say my criteria is that It has to populate 500 rows in the database. (So say at the scheduled time it runs it can only populate 400 rows, then maybe 2 hours later it tries running again to fill the last 100 rows)
Lastly, what are the best practices of automation and scheduled tasks?

How can I 'schedule' this program to run at specified times?
This can be done one of two ways, depending on your operating system - write a job that kicks off the java program at the intervals you need. You may then hook up the job to be started off on start up.
In Linux you can accomplish this with a cron job or so. On windows you may refer to this http://support.microsoft.com/kb/308569.
You may also program the scheduler into your java program using http://quartz-scheduler.org or http://www.sauronsoftware.it/projects/cron4j/ .
How can I customize this to keep running until a certain criteria is met?
This is perhaps best established from within your program, although it is hard to give you directions without much info.
Lastly, what are the best practices of automation and scheduled tasks?
Depending on your application architecture, scheduling and automation can be handled either from within the app or get support from the operating system. The criteria depends on how much control the application needs, which platform makes scheduling easy etc.
Hope this helps.

Quartz is a scheduling project for Java. I have used it in many projects and find it to be very intuitive.
It may be a little over the top for what your after but worth a look anyway.

You can make use of Timer for scheduling the events & the events/task must be implemented using TimerTask

How do I execute multiple processes simultaneously in Java?

I am working on an application in which I want multiple tasks to be executed simultaneously.
I also want to be able to keep track of the number of such tasks being run in parallel, and sometimes add yet another task to be processed in parallel, in addition to the current set of tasks already being processed.
One more thing- I want to do the above, not only in a desktop app, but also in a cloud app, in which I initialise another virtual machine running Tomcat, and then repeat all of the above in that instance.
What is the best way to do this? If you can point me to the correct theory/guides on this subject, that would be great, although code samples are also welcome.

Concurrency is a huge topic in Java, please take your time for it
Lesson: Concurrency
Concurrency in a Java program is accomplished by starting your own Threads. Multiple processes can only be realized with multiple JVMs. When you are done with the basics, you want to take a look at Executors. They will help to implement your application in a structured way since they abstract from Threads to Tasks.
I don't know how much time you have planned for this, but if you are really at the start, get Java Concurrency in Practice, read it and write a kick-ass concurrent Java application.
Raising the whole thing to a distributed level is a whole other story. You cannot tackle that all at once.

Wow... What a series of steps. Start by extending Runnable, then using Thread to run and manage your Jobs. After that, you can get into Tomcat.

multithreaded web application in java

I am doing a web application which has Java as a front end and shell script as a back end. The concept is I need to process multiple files in the back end. I will get the date range from the user (for example from July 1st-8th) and for each day process around 100 files. So in total I have 800 files to process.
I will get these details from JSP and delegate a background call to shell script and get back the results and display the same to the user.
Now I did all these in a sequential approach - by which I mean without threads. So there is only one main thread that executes and the user has to wait till 800 files are processed sequentially. However this is really slow. And because of this I am thinking of going for threads. Since I am a beginner of threads, I read a some stuffs regarding this and I have come up with the following idea:
As I read threads work have to be split .. I thought of splitting the
8 day work to 4 threads where each thread would perform 2 day work
I would like to know whether I am following a correct approach and my major concerns are:
Is it recommended to spawn multiple threads from a web application
Whether or not this is a good approach
Some guidance of how to proceed with this. An example instance would be great. Thank you.

Yes, you can run the long processing job in multi-threaded or in any high performance environment. You should also you Servlet 3.0 Asynchronous Request Processing to suspend the request thread and wait till the Long processing task is done.

Yes, there's nothing wrong with spawning multiple threads from a web application. In fact, if you're running a Servlet container (which you most likely are since you're using Java), it's already spawning multiple threads for you. In general a Servlet container will automatically spawn a new thread (or reuse one out of a pool) to handle each request it receives.
Your approach is fine, thought you'll want to fine-tune the number of threads to something that is suitable given the hardware configuration of your system and the amount of concurrent load on your web service. Also note that while spinning up a bunch of threads will reduce the total amount of time needed to process all the data, it will still leave a potentially large chunk of time before any data is ready to go back to the user. So you might get a better result by doing smaller work units sequentially, and posting each batch of results to the user-interface as soon as it is ready. Then it will still take a long while for the user to have all the data, but that can start viewing at least a portion of it almost immediately.

The way to improve user experience is not by parallelizing at Servlet level on 100000 threads but rather to provide incremental rendering of the view. First of all it would be useful to separate your application in multiple layers, according to the MVC pattern for example.
Saying that, you will have to look on how
Create a service that is able to return partial answers and a last answer, meaning that all available data has been returned. Each of this answers can be computed in parallel to improve performance.
Fill a web page incrementally, tipically by calling back this service which returns a JSON string you use to add data to the DOM. Every time you get an answer, if this is a partial answer, you call again the service providing the previous sequence number.
If you look to Liligo to understand, you will see how this is works. The technique I described is known as polling, but there are others technique to obtain similar asynchronous results at UI Level. In general, you don't want to work directly with the Servlet API, which is a very low level API,but rather use a reasonable framework or abstraction for that.
If you want a warm advice, you should have a look to the Play! framework http://www.playframework.org/documentation/2.0.2/JavaStream HTTP streaming.

Create threads in a web application is not a good solution. It is a bad design because normally it would be the container (web server) who is charged with that activity. So I think you have to find another solution.
I suggest you putting the shell scripts in cron, scheduled to run each minute, and to "activate" them you can touch files that act as semaphores. At each run the scripts verify if the web application touched the semaphore file, if so they read the date interval from those files and then start to process.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

SAX data collection design pattern - java

Related

Google App Engine Java Program concurrency

Mapping the heroku cedar model to a multithreaded application

Java/Database project automation

How do I execute multiple processes simultaneously in Java?

multithreaded web application in java

Categories

Resources