Recently I came across a requirement where I have to provide a custom jar to applications. This jar contains threads that periodically query a database and fetch the messages (records) belonging to the particular application that uses it. So, for example, if app A uses this jar, the threads in the jar fetch only the messages for app A.
The database is shared between the apps.
This works fine for standalone apps, but for apps deployed over a cluster in an enterprise application server (WebLogic in my case) it fails, since every node in the cluster runs in its own JVM and each one spawns a listener thread for the same app. So two threads can run at the same time, fetch the same records, and process them twice. I cannot use synchronization since that would create performance bottlenecks.
I can't use singleton timer EJBs. I have heard about the WorkManager but haven't found sufficient examples on the net. I am using the Spring core framework.
If any of you could give any suggestions, it would be great.
Thanks.
First of all, please stop thinking in threads if you're dealing with Java EE; it's supposed to provide a higher level of abstraction and calls for a higher-level mindset.
Java EE 7 provides ManagedScheduledExecutorService.
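For illustration, a minimal sketch of that approach, assuming a Java EE 7 container and an arbitrary 15-minute polling interval (class and method names here are hypothetical):

import javax.annotation.PostConstruct;
import javax.annotation.Resource;
import javax.ejb.Singleton;
import javax.ejb.Startup;
import javax.enterprise.concurrent.ManagedScheduledExecutorService;
import java.util.concurrent.TimeUnit;

@Singleton
@Startup
public class MessagePoller {

    // Container-managed scheduler; injected from
    // java:comp/DefaultManagedScheduledExecutorService by default
    @Resource
    private ManagedScheduledExecutorService scheduler;

    @PostConstruct
    void schedulePolling() {
        // Poll the shared database every 15 minutes on a container-managed thread
        scheduler.scheduleAtFixedRate(this::pollMessages, 0, 15, TimeUnit.MINUTES);
    }

    private void pollMessages() {
        // fetch and process the records for this application (placeholder)
    }
}

Note that this is still per-JVM: in a cluster, every node would poll unless you add a singleton service or a locking scheme on top.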
Quartz works great in that scenario: when configured with a clustered (JDBC-backed) job store, only one node in your Java EE cluster will execute each firing of the job.
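If you go the Quartz route, the clustering behaviour comes from the JDBC job store. A rough sketch of a programmatic setup follows; the data-source properties for the hypothetical "appDS" pool, the job body, and the 15-minute interval are all placeholders:

import org.quartz.*;
import org.quartz.impl.StdSchedulerFactory;
import java.util.Properties;

public class ClusteredPollerSetup {

    // Quartz instantiates the job; with a clustered JDBC job store each firing
    // is picked up by exactly one node.
    public static class PollMessagesJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            // query the shared database and process new records here
        }
    }

    public static Scheduler start() throws SchedulerException {
        Properties props = new Properties();
        props.put("org.quartz.scheduler.instanceId", "AUTO");
        props.put("org.quartz.threadPool.threadCount", "3");
        // JDBC job store in clustered mode; the "appDS" data source
        // (org.quartz.dataSource.appDS.*) must be configured separately
        props.put("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
        props.put("org.quartz.jobStore.isClustered", "true");
        props.put("org.quartz.jobStore.dataSource", "appDS");

        Scheduler scheduler = new StdSchedulerFactory(props).getScheduler();

        JobDetail job = JobBuilder.newJob(PollMessagesJob.class)
                .withIdentity("pollMessages", "db").build();
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("pollTrigger", "db")
                .withSchedule(SimpleScheduleBuilder.simpleSchedule()
                        .withIntervalInMinutes(15).repeatForever())
                .build();

        scheduler.scheduleJob(job, trigger);
        scheduler.start();
        return scheduler;
    }
}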
I have an EJB packaged in an EAR and deployed to Glassfish.
Currently we just use Glassfish/Eclipselink for caching.
But our server is starting to come under heavy loads and I want to set it up behind a load balancer on AWS.
The problem is, I don't want my cache to be out of sync for automatically spun up instances. I want the instances to be completely automatic.
I know you can set Glassfish up in a cluster, but as far as I know that isn't automatic. I would have to manage it myself. I want to fully automate everything.
It would be awesome if the Glassfish instances could be completely independent of each other, and I could use Redis or another server like that to offload the cache. That way the cache would be in one place, the Glassfish instances could spin up and down automatically and it would never matter, I wouldn't have to register them with a Glassfish cluster. I could also use the same Redis cache for the front end of the application. Glassfish is running the business layer accessible by API calls. The front end web is running separately. I was going to set up a Redis cache for that also, but if they could both share the same cache, that would be awesome.
Any ideas?
I can only answer at a conceptual level, since I don't know the products involved in detail.
Regardless of whether you add another level of caching, you need to take care of data consistency within your application.
In a cluster setup, a local non-distributed cache is not a problem as long as there is consistency coordination between the nodes, e.g. via JMS. You need to explore how to set up that consistency coordination across your cluster.
We have a Spring Integration application which is polling a mongodb:inbound-channel-adapter like so:
<int-mongodb:inbound-channel-adapter channel="n2s.mongoResults"
                                     collection-name="entities"
                                     query="{_id: {$regex: 'mpl/objectives'}}">
    <!-- Run every 15 minutes -->
    <int:poller fixed-rate="900000"/>
</int-mongodb:inbound-channel-adapter>
Everything works fine. However, this application is deployed to a cluster and so multiple servers are running the same poller. We'd like to coordinate these servers such that only one runs the pipeline.
Of course, the servers don't know about each other, so we probably need to coordinate them through a locking mechanism in a database. Any suggestions on how to achieve this?
Notes:
We have access to both a MongoDB database and an Oracle database in this workflow. From the perspective of the workflow, it makes more sense to lock on the Oracle database.
It's fine if all servers execute the polling step and then one server locks to actually process the records, if that's easier to achieve.
You could use a distributed locking tool like ZooKeeper. Another alternative would be to change from a simple fixed-rate trigger to a scheduling framework like Quartz which, with a clustered job store, will ensure that the job only executes on a single node.
It's fine if all servers execute the polling step and then one server locks to actually process the records, if that's easier to achieve.
Yeah, that's what I would do. I think it's by far the easiest approach. See this post for details on how to do locking with Oracle.
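As an illustration of that Oracle-based locking, here is a minimal JDBC sketch using FOR UPDATE SKIP LOCKED; the MESSAGES table, its STATUS column, and the status values are hypothetical placeholders:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class OracleRecordClaimer {

    // Claims unprocessed records so that only the node holding the row locks
    // processes them; other nodes simply skip the locked rows.
    public void processAvailableRecords(Connection conn) throws SQLException {
        conn.setAutoCommit(false);
        String select = "SELECT id, payload FROM messages WHERE status = 'NEW' "
                      + "FOR UPDATE SKIP LOCKED";
        try (PreparedStatement ps = conn.prepareStatement(select);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                long id = rs.getLong("id");
                // process the record here, then mark it so no node picks it up again
                try (PreparedStatement upd = conn.prepareStatement(
                        "UPDATE messages SET status = 'DONE' WHERE id = ?")) {
                    upd.setLong(1, id);
                    upd.executeUpdate();
                }
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}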
There are several options, including:
Set auto-startup="false" and use some management tool to monitor the servers and ensure that exactly one adapter is running (you can use a control bus or JMX to start/stop the adapter).
Run the application in Spring XD containers; set the module count for the source (containing the mongo adapter) to 1 and the XD admin will make sure exactly one instance is running. It uses ZooKeeper to manage state.
Use a distributed lock to ensure only one instance processes messages. Spring Integration itself comes with a RedisLockRegistry for such things or you can use any distributed lock mechanism.
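As a sketch of the RedisLockRegistry option (the registry key, lock key, and timeout are arbitrary):

import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.integration.redis.util.RedisLockRegistry;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

public class PollerGuard {

    private final RedisLockRegistry lockRegistry;

    public PollerGuard(RedisConnectionFactory connectionFactory) {
        // All nodes share this registry key, so they compete for the same locks
        this.lockRegistry = new RedisLockRegistry(connectionFactory, "n2s-poller");
    }

    public void pollIfLockAcquired(Runnable pollingTask) throws InterruptedException {
        Lock lock = lockRegistry.obtain("mongo-poll");
        // Only the node that wins the lock runs the polling pipeline this cycle
        if (lock.tryLock(1, TimeUnit.SECONDS)) {
            try {
                pollingTask.run();
            } finally {
                lock.unlock();
            }
        }
    }
}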
There is a Java EE application where we have batches of jobs to process. Processing involves calling an external service that limits us to N concurrent requests. This bottleneck has to be enforced in our application logic, and I am wondering how we could best achieve this. Fortunately clustering is not a requirement, so we can confine the problem to a single server instance.
My first idea would be using an ExecutorService backed by a thread pool with N worker threads, so that the thread pool itself would act as the regulator. Of course this is not an EE solution.
My second idea would be somehow configuring such a thread pool in the container and using that, but I have not found any feature like this so far.
The third idea is using a Semaphore(N) object in a @Singleton EJB (see the sketch below).
The fourth idea is somehow creating a limited pool of stateless session beans and putting the limited-resource access in those. As the bean count is managed by the container, the resource usage will be limited as well.
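To make the third idea concrete, here is a minimal sketch of a Semaphore inside a singleton bean (N and all names are placeholders; bean-managed concurrency is used so the container does not serialize callers):

import javax.ejb.ConcurrencyManagement;
import javax.ejb.ConcurrencyManagementType;
import javax.ejb.Singleton;
import java.util.concurrent.Semaphore;

@Singleton
@ConcurrencyManagement(ConcurrencyManagementType.BEAN)
public class ExternalServiceGate {

    private static final int N = 5; // hypothetical limit imposed by the external service

    private final Semaphore permits = new Semaphore(N, true); // fair, so callers queue in order

    public void callExternalService(Runnable request) throws InterruptedException {
        permits.acquire();
        try {
            request.run(); // the actual call to the rate-limited service
        } finally {
            permits.release();
        }
    }
}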
(To clarify: a general solution would be best, but it is known that we're running on Glassfish 3.1.1 and maybe later on JBoss 6.x.)
Could you suggest a good architecture for this problem and/or comment on my ideas to help me decide?
Why don't you use Works? Have a look here for an overview of how to use Works in JBoss and WebLogic. I don't know about Glassfish; I'll leave that research to you ;)
In short, Works are EE-compliant threads.
The canonical solution for concurrent message processing in Java EE is to use MDBs. You can limit the number of concurrently running tasks by limiting the MDB pool size.
Setting MDB Pool Size in Glassfish
JBoss 7 EJB3 Subsystem Configuration Guide
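For illustration, a skeleton MDB; the queue name is a placeholder and the effective concurrency limit comes from the container-specific MDB pool size described in the links above:

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

@MessageDriven(mappedName = "jms/JobQueue", activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue")
})
public class JobProcessorMDB implements MessageListener {

    @Override
    public void onMessage(Message message) {
        // call the rate-limited external service here; at most "pool size"
        // messages are processed concurrently
    }
}

Each unit of work is sent to the queue as a JMS message, and the container takes care of dispatching them with bounded concurrency.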
I am looking for a pattern and/or framework which can model the following problem in an easily configurable way.
Every 3 minutes or so, I need a set of jobs to kick off in a web application context; they concurrently hit web services to obtain the latest version of the data and push it to a database. The problem is that the database is simultaneously under heavy read load from tons of complex calculations being run on the data. We are currently using Spring, so I have been looking at Spring Batch to run this process. Does anyone have suggestions/patterns/examples of using Spring or other technologies for a similar system?
We have used ServletContextListeners to kick off TimerTasks in our web applications when we needed processes to run repeatedly. The ServletContextListener fires when the app server starts or restarts the application, and the TimerTask then runs on a separate thread, repeating your code at the specified interval.
ServletContextListener
http://www.javabeat.net/examples/2009/02/26/servletcontextlistener-example/
TimerTask
http://enos.itcollege.ee/~jpoial/docs/tutorial/essential/threads/timer.html
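A rough sketch of that approach (the class name and the 3-minute interval are placeholders; register the listener in web.xml with a listener element, or with @WebListener on Servlet 3.0+):

import java.util.Timer;
import java.util.TimerTask;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

public class BatchJobContextListener implements ServletContextListener {

    private Timer timer;

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        timer = new Timer("batch-job-timer", true); // daemon thread
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                // call the web services and push the results to the database
            }
        }, 0, 3 * 60 * 1000L); // run immediately, then every 3 minutes
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        if (timer != null) {
            timer.cancel(); // stop cleanly when the webapp is stopped or redeployed
        }
    }
}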
Is refactoring the job out of the web application and into a standalone app a possibility?
That way you could stick the batch job onto a separate batch server (so that the extra load of the batch job wouldn't impact your web application), which then calls the web services and updates the database. The job can then be kicked off using something like cron or Autosys.
We're using Spring-Batch for exactly this purpose.
The database design would also depend on what the batched data is used for. If it is for reporting purposes, I would recommend separating the operational database from the reporting database, using a database link to obtain the required data from the operational database into the reporting database and then running the complex queries on the reporting database. That way the load is shifted off the operational database.
I think it's also worth looking into integration frameworks like Apache Camel. Also take a look at the so-called Enterprise Integration Patterns; the pattern catalog might provide you with some useful vocabulary for thinking about the scaling/scheduling problem at hand.
The framework itself integrates really well with Spring.
I have a Java EE application that has two components: the first is a service that scrapes some information from the internet and fills it into a database; the second is a web interface (deployed on Tomcat) from which users can browse that information.
What would be the best approach to implementing the first component? Should it run as a background daemon/service, or as a thread within the container?
I would personally separate them into different processes. Aside from anything else, it means you can restart one without worrying about the other. It also means you can really easily deploy them on different machines without pointlessly installing Tomcat for a service which doesn't actually need a web interface.
Depending on the type of application framework, Spring lets you use Quartz or the java.util.concurrent framework. Spring has a TaskExecutor abstraction (see the Spring documentation) which simplifies a lot of this, but check to see which fits best with your design.
Spring or Quartz (managed by Spring) then controls the creation and starting/stopping of Threads or Executors or Jobs, along with their frequency/period and other scheduling parameters, and also manages any pooling of jobs you might require.
I use these for all my background tasks and batch jobs in any Java EE applications I write, with no problems. Since the jobs are Spring-managed POJOs, they have access to the full dependency injection framework and everything else that Spring entails, and of course you can switch between scheduler frameworks with a simple change to your application configuration XML file as your needs change or scale.
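For instance, a Spring-managed scheduled POJO can be as small as the following sketch (annotation-based here; the same can be declared in XML with the task namespace; the interval and names are illustrative):

import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Configuration
@EnableScheduling
class SchedulingConfig {
    // enables @Scheduled processing; <task:annotation-driven/> is the XML equivalent
}

@Component
class ScraperJob {

    // Runs every 30 minutes on a Spring-managed scheduler thread
    @Scheduled(fixedRate = 30 * 60 * 1000)
    public void scrapeAndStore() {
        // fetch data from the internet and write it to the database
    }
}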
There is nothing wrong with having background jobs inside a web container, but you MUST let the web container know about it so it can be stopped and started properly.
Have a look at the load-on-startup tag in web.xml. There is some advice at http://wiki.metawerx.net/wiki/Web.xml.LoadOnStartup