There is a Java EE application where we have batches of jobs to process. Processing involves calling an external service that limits us to at most N concurrent requests. This throttle has to be implemented in our application logic, and I am wondering how we could best achieve it. Fortunately, clustering is not a requirement, so we can confine the problem to a single server instance.
My first idea would be using an ExecutorService backed by a thread pool with N worker threads, so that the thread pool itself would act as the regulator. Of course this is not an EE solution.
My second idea would be somehow configuring such a thread pool in the container and using that, but I have not found any feature like this so far.
The third idea is using a Semaphore(N) object in a @Singleton EJB (roughly as in the sketch after this list of ideas).
The fourth idea is somehow creating a limited pool of stateless session beans and putting the limited-resource access in those. As the bean count is managed by the container, the resource usage would be limited as well.
(To clarify: a general solution would be best, but it is known that we're running on GlassFish 3.1.1 and may later move to JBoss 6.x.)
Could you suggest a good architecture for this problem and/or comment on my ideas to help me decide?
Why don't you use Works? Have a look here for an overview of how to use Works in JBoss and WebLogic. I don't know about GlassFish; I'll leave that research to you ;)
In short, Works are EE-compliant threads.
The canonical solution for concurrent message processing in Java EE is to use MDBs. You can limit the number of concurrently running tasks by limiting the MDB pool size.
Setting MDB Pool Size in Glassfish
JBoss 7 EJB3 Subsystem Configuration Guide
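To make this concrete, a hypothetical MDB could look like the sketch below. The queue name is an assumption, and the actual concurrency limit does not appear in the code at all: it is the bean pool size configured in the server (glassfish-ejb-jar.xml or the JBoss EJB3 subsystem), per the links above.

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// Sketch only: the queue name is made up; the pool size (= max concurrency) is server configuration.
@MessageDriven(mappedName = "jms/JobQueue", activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue")
})
public class JobProcessorBean implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            if (message instanceof TextMessage) {
                String jobId = ((TextMessage) message).getText();
                // call the rate-limited external service here;
                // at most <pool size> of these run concurrently
            }
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }
}
```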
Recently I have come across a requirement wherein I have to provide a custom jar to applications; this jar contains threads that periodically query a database and fetch messages (records) for the particular application that uses them. So, for example, if app A uses this jar, the threads in the jar would fetch only the messages for app A.
The database is a shared db between apps.
This works fine for standalone apps, but for apps deployed on a cluster in an enterprise application server (WebLogic in my case) it fails, since each node in the cluster runs in its own JVM and each one spawns a listener thread for the same app. So there can be situations where two threads run at the same time, fetch the same records, and the records get processed twice. I cannot use synchronization, since that would lead to performance bottlenecks.
I can't use singleton timer EJBs. I have heard about WorkManager, but there aren't many examples of it online. I am using the Spring core framework.
If any of you could give any suggestions, it would be great.
Thanks.
First of all, please stop thinking in threads when you're dealing with Java EE; it is supposed to provide a higher level of abstraction for a higher-level mindset.
Java EE 7 provides ManagedScheduledExecutorService.
Quartz works great in that scenario: only one node in your Java EE cluster will execute the job.
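If you are on Java EE 7, handing the polling to the container-managed scheduler could look roughly like this (a sketch only; the resource name is the platform default and the interval is made up). Note that on a cluster every node would still run the poller, which is exactly why Quartz's clustering is attractive here.

```java
import java.util.concurrent.TimeUnit;
import javax.annotation.PostConstruct;
import javax.annotation.Resource;
import javax.ejb.Singleton;
import javax.ejb.Startup;
import javax.enterprise.concurrent.ManagedScheduledExecutorService;

// Sketch: a container-managed scheduler polls the shared table for this application's messages.
@Singleton
@Startup
public class MessagePoller {

    @Resource(lookup = "java:comp/DefaultManagedScheduledExecutorService")
    private ManagedScheduledExecutorService scheduler;

    @PostConstruct
    void startPolling() {
        scheduler.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                pollDatabase();
            }
        }, 0, 30, TimeUnit.SECONDS);
    }

    private void pollDatabase() {
        // query the shared database for records belonging to this application
    }
}
```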
Okay. This is again a question of industry practice.
Tomcat = Web Container
JBoss, WebLogic, etc. = Application Servers that have a Web Container within (for JBoss, it's a forked Tomcat)
Spring does not need an Application Server like JBoss. If we use enterprise services like JMS, etc., we can use independent systems like RabbitMQ, Apache ActiveMQ, etc.
The question is: why do people still use JBoss and other Application Servers for purely Spring-based applications?
What advantages can Spring gain by using an Application Server, such as object pooling? What specific advantages does an Application Server offer, and how are those configured?
If not for Spring, for what other purposes are Application Servers used with a Spring/Hibernate stack? (Use cases)
Actually I would say listening for JMS is probably the best reason for an application server. A stand-alone message broker does not solve the problem, since you still need a component that listens for messages. The best way to do this is to use an MDB.
In theory you can use Spring's MessageListenerContainer. However, this has several disadvantages: JMS only supports blocking reads, so Spring needs to spin up its own threads, which is totally unsupported (even in Tomcat) and can break transactions, security, naming (JNDI) and class loading (which in turn can break remoting). A JCA resource adapter, on the other hand, is free to do whatever it wants, including spinning up threads via WorkManager.
Most likely a database is used besides JMS (or another destination), at which point you need XA transactions and JTA, in other words an application server. Yes, you can patch all of this into a servlet container, but at that point it becomes indistinguishable from an application server.
IMHO the biggest reason against application servers is that it takes years after a spec is published (which itself takes years) until servers implement it and have ironed out the worst bugs. Only now, right before EE 7 is about to be published, are EE 6 servers starting to appear that are not totally riddled with bugs. It gets comical to the point where some vendors no longer fix bugs in their EE 6 line because they're already busy with the upcoming EE 7 line.
Edit
Long explanation of the last paragraph:
Java EE in a lot of places relies on what's called contextual information: information that's not explicitly passed as an argument from the server/container to the application but is implicitly "there". For example, the current user for security checks, the current transaction or connection, the current application for looking up classes to lazily load code or deserialize objects, or the current component (servlet, EJB, …) for doing JNDI lookups. All this information lives in thread locals that the server/container sets before calling a component (servlet, EJB, …). If you create your own threads, the server/container doesn't know about them, and all the features relying on this information stop working. You might get away with it by simply not using any of those features in the threads you spawn.
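A contrived sketch of what that means in practice (the JNDI name is hypothetical): the lookup on the request thread works because the container has set up the component's naming context there, while the same lookup on a hand-made thread may fail, because nobody set that context up for it.

```java
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;

// Illustrative only; the JNDI name is made up.
@WebServlet("/context-demo")
public class ContextDemoServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException {
        try {
            // Works: the container set up the component's naming context on this request thread.
            DataSource ds = (DataSource) new InitialContext().lookup("java:comp/env/jdbc/AppDS");
        } catch (NamingException e) {
            throw new ServletException(e);
        }

        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    // May fail: the container knows nothing about this thread, so the
                    // contextual information (naming, security, transactions) is not there.
                    new InitialContext().lookup("java:comp/env/jdbc/AppDS");
                } catch (NamingException e) {
                    e.printStackTrace();
                }
            }
        }).start();
    }
}
```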
Some links
http://www.oracle.com/technetwork/java/restrictions-142267.html#threads
http://www.ibm.com/developerworks/websphere/techjournal/0609_alcott/0609_alcott.html#spring-4
If we check the Servlet 3.0 specification we find:
2.3.3.3 Asynchronous processing
Java Enterprise Edition features such as Section 15.2.2, “Web Application Environment” on page 15-174 and Section 15.3.1, “Propagation of Security Identity in EJBTM Calls” on page 15-176 are available only to threads executing the initial request or when the request is dispatched to the container via the AsyncContext.dispatch method. Java Enterprise Edition features may be available to other threads operating directly on the response object via the AsyncContext.start(Runnable) method.
This is about asynchronous processing but the same restrictions apply for custom threads.
public void start(Runnable r) - This method causes the container to dispatch a thread, possibly from a managed thread pool, to run the specified Runnable. The container may propagate appropriate contextual information to the Runnable.
Again, asynchronous processing but the same restrictions apply for custom threads.
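For reference, this is the API those two quotes are about (a minimal sketch; whether any contextual information reaches the Runnable is, as quoted, left to the container):

```java
import java.io.IOException;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch of AsyncContext.start(Runnable); the URL pattern is made up.
@WebServlet(urlPatterns = "/async-demo", asyncSupported = true)
public class AsyncDemoServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        final AsyncContext ctx = req.startAsync();
        ctx.start(new Runnable() {
            @Override
            public void run() {
                try {
                    ctx.getResponse().getWriter().println("done");
                } catch (IOException e) {
                    // ignored in this sketch
                } finally {
                    ctx.complete();
                }
            }
        });
    }
}
```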
15.2.2 Web Application Environment
This type of servlet container should support this behavior when performed on threads created by the developer, but are not currently required to do so. Such a requirement will be added in the next version of this specification. Developers are cautioned that depending on this capability for application-created threads is not recommended, as it is non-portable.
Non-portable means it may work in one server but not in another.
When you want to receive messages with JMS outside of an MDB, you can use four methods on javax.jms.MessageConsumer:
#receiveNoWait() - you can do this in a container thread; it doesn't block, but it's like peeking: if no message is present it just returns null. Not very well suited for listening for messages.
#receive(long) - you can do this in a container thread, but it blocks, and you generally don't want to do blocking waits in a container thread. Again not very well suited for listening for messages.
#receive() - this blocks, possibly indefinitely. Again not very well suited for listening for messages.
#setMessageListener() - this is what you want: you get a callback when a message arrives. However, unless the library can hook into the application server, the callback won't run on a container thread. The hooks into the application server are only available to resource adapters via JCA.
So yes, it may work, but it's not guaranteed and there are a lot of things that may break.
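To make the last option concrete, here is a bare-bones sketch of #setMessageListener() outside an MDB (the JNDI names are assumptions; note that onMessage will typically run on a JMS provider thread the container does not manage):

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.MessageListener;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;

// Sketch of plain JMS listening; no container-managed transaction or security in the callback.
public class PlainJmsListener {

    public void listen() throws Exception {
        InitialContext jndi = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) jndi.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) jndi.lookup("jms/JobQueue");

        Connection connection = cf.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(queue);
        consumer.setMessageListener(new MessageListener() {
            @Override
            public void onMessage(Message message) {
                // handle the message on a provider thread
            }
        });
        connection.start(); // messages are delivered asynchronously from this point on
    }
}
```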
You are right that you don't really need a true application server (implementing all the Java EE specs) to use Spring. The biggest reason people don't use true Java EE servers like JBoss is that they have been slow as #$##% on cold start-up, making development a pain (and hot deploy still doesn't work that well).
You see there are two camps:
Java EE
Spring Framework.
One of the camps believes in the spec/committee process and the other believes in benevolent dictator / organic OSS process. Both have people with their "agendas".
You're probably not going to get a very good unbiased answer, as these two camps are much like the Emacs vs. Vim war.
Answering your questions with a Spring bias:
Because in theory it buys you less vendor lock-in (although I have found the opposite to be true).
Spring's biggest advantage is AspectJ AOP. By far.
I guess see Philippe's answer.
(start of rant)
Since @PhilippeMarschall defended Java EE, I will say that I have done the Tomcat + RabbitMQ + Spring route and it works quite well. @PhilippeMarschall's point is valid if you want proper JTA + JMS, but with a proper setup with Spring AMQP and a good transactional database like PostgreSQL this is less of an issue. Also, he is incorrect about the message-queue transactions not being bound/synchronized to the platform transactions, as Spring supports this (and IMHO much more elegantly with @Transactional AOP). Also, AMQP is just plain superior to JMS.
(end of rant)
We are using JBoss over Tomcat for the JNDI data sources and pooling. It means the programmer doesn't have to know anything about the database except its JNDI name.
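In code that boils down to something like this sketch; the JNDI name is whatever the data source was registered under in the JBoss configuration, and nothing about hosts, credentials or pool sizes appears in the application:

```java
import java.sql.Connection;
import javax.naming.InitialContext;
import javax.sql.DataSource;

// Sketch: the data source (driver, URL, credentials, pool) is defined in the server;
// the code only knows the JNDI name, which is an assumption here.
public class OrderDao {

    public void doWork() throws Exception {
        DataSource ds = (DataSource) new InitialContext().lookup("java:/jdbc/AppDS");
        Connection connection = ds.getConnection();
        try {
            // run queries; the connection comes from the server-managed pool
        } finally {
            connection.close(); // returns the connection to the pool
        }
    }
}
```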
I have a Java EE application that has two components: the first is a service that scrapes some information from the internet and fills a database with it; the second is a web interface (deployed on Tomcat) from which users can browse that information.
What would be the best approach to implement the first component? Should it run as a background daemon/service or as a thread within the container?
I would personally separate them into different processes. Aside from anything else, it means you can restart one without worrying about the other. It also means you can really easily deploy them on different machines without pointlessly installing Tomcat for a service which doesn't actually need a web interface.
Depending on the type of application framework, Spring lets you use Quartz or the java.util.concurrent framework. Spring has a TaskExecutor abstraction (see the Spring documentation) which simplifies a lot of this, but check to see which fits best with your design.
Spring or Quartz (managed by Spring) then controls the creation and starting/stopping of Threads or Executors or Jobs, along with their frequency/period and other scheduling parameters, and also manages any pooling of jobs you might require.
I use these for all my background tasks and batch jobs in any Java EE application I write, with no problems. Since the jobs are Spring-managed POJOs, they have access to the full dependency-injection framework and everything else that Spring entails, and of course you can switch between scheduler frameworks with a simple change to your application configuration XML file as your needs change or scale.
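As an illustration, a Spring-managed background job can be as small as the sketch below (annotation-driven variant; the class name and interval are made up, and the same thing can be declared in XML with the task namespace). Scheduling has to be switched on once, e.g. with <task:annotation-driven/> in the XML configuration or @EnableScheduling.

```java
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Sketch of a Spring-managed background job; name and delay are made up.
@Component
public class ScraperJob {

    // Runs on a Spring-managed scheduler thread, five minutes after the previous run finishes.
    @Scheduled(fixedDelay = 5 * 60 * 1000)
    public void scrape() {
        // fetch pages and write the results to the database
    }
}
```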
There is nothing wrong with having background jobs inside a web container, but you MUST let the web container know about it so it can be stopped and started properly.
Have a look at the load-on-startup tag in web.xml. There is some advice at http://wiki.metawerx.net/wiki/Web.xml.LoadOnStartup
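One common way to "let the container know" is a ServletContextListener (declared with @WebListener as below, or with a <listener> entry in web.xml) that starts the worker on deployment and shuts it down on undeployment. The details below are just a sketch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

// Sketch: the background pool is started and stopped together with the web application.
@WebListener
public class BackgroundJobLifecycle implements ServletContextListener {

    private ExecutorService pool;

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        pool = Executors.newSingleThreadExecutor();
        pool.submit(new Runnable() {
            @Override
            public void run() {
                // background work loop; check Thread.currentThread().isInterrupted() to stop cleanly
            }
        });
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        pool.shutdownNow(); // interrupt the worker so the container can undeploy cleanly
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```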
I want to start a background process in a Java EE (OC4J 10) environment. It seems wrong to just start a thread with "new Thread", but I can't find a good alternative.
Using a JMS queue is difficult in my special case, since my parameters for this method call are not serializable.
I also thought about using an onTimeout Timer Method on a session bean but this does not allow me to pass parameters (as far as I know).
Is there any "canon" way to handle such a task, or do I just have to revert to "new Thread" or a java.concurrent.ThreadPool.
Java EE usually attempts to remove threading from the developer's concerns. (Its success at this is a completely different topic.)
JMS is clearly the preferred approach to handle this.
With most parameters, you have the option of forcing or faking serialization, even if they aren't serializable by default. Depending on the data, consider wrapping it in a serializable object that can reload the data. How exactly will clearly depend on the parameter and the application.
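For instance, instead of putting the non-serializable object on the queue, you can send a small serializable handle and let the receiver rebuild the heavy state from it. A sketch with made-up names:

```java
import java.io.Serializable;

// Sketch of the "wrap and reload" idea: the payload carries just enough data (here an ID)
// for the consumer to reload everything else from the database.
public class ReportRequest implements Serializable {

    private static final long serialVersionUID = 1L;

    private final long reportId;

    public ReportRequest(long reportId) {
        this.reportId = reportId;
    }

    public long getReportId() {
        return reportId; // the consumer reloads the full, non-serializable state by this ID
    }
}
```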
JMS is the Java EE way of doing this. You can start your own threads if the container lets you, but that does violate the Java EE spec (you may or may not care about this).
If you don't care about Java EE generic compliance (if you would in fact resort to threads rather than deal with JMS), the Oracle container will for sure have proprietary ways of doing this (such as the OracleAS Job Scheduler).
I don't know OC4J in detail, but I have used the Thread approach and a java.util.Timer approach to perform some tasks in a Tomcat-based application. In Java 5+ there is also the option of using one of the executor services (scheduled, priority).
I don't know about the onTimeout, but you could pass parameters around in the session itself, the application context, or a static variable (some would say discouraged). But the name tells me it is invoked when the user's session times out and you want to do some cleanup.
Using JMS is the right way to do it, but it's heavier-weight.
The advantage you get is that whether you need multiple servers, one server, or whatever, once the servers are configured your "threading" can be distributed across multiple machines.
It also means you don't want to send a message for a truly trivial amount of work or with a massive amount of data. Choose your interface points well.
see here for some more info:
stackoverflow.com/questions/533783/why-spawning-threads-in-j2ee-container-is-discouraged
I've been creating threads in a container (Tomcat, JBoss) with no problems, but they were only servicing really simple queues, and I don't rely on clustering.
However, EJB 3.1 will introduce asynchronous invocation that you may find useful:
http://www.theserverside.com/tt/articles/article.tss?track=NL-461&ad=700869&l=EJB3-1Maturity&asrc=EM_NLN_6665442&uid=2882457
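For illustration, the EJB 3.1 feature looks roughly like this (the bean and method are made up): the container runs the method on one of its own managed threads and hands you a Future.

```java
import java.util.concurrent.Future;
import javax.ejb.AsyncResult;
import javax.ejb.Asynchronous;
import javax.ejb.Stateless;

// Sketch of an EJB 3.1 asynchronous method; names are hypothetical.
@Stateless
public class ReportService {

    @Asynchronous
    public Future<String> generateReport(long reportId) {
        // executed on a container-managed thread, so security, transactions and JNDI still work
        return new AsyncResult<String>("report-" + reportId);
    }
}
```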
Java EE doesn't really forbid you to create your own threads; it's the EJB spec that says "unmanaged threads" aren't allowed. The reason is that these threads are unknown to the application server, so the container cannot manage things like security and transactions on them.
Nevertheless, there are lots of frameworks out there that do create their own threads, for example Quartz, Axis and Spring. Chances are you're already using one of these, so it's not that bad to create your own threads as long as you're aware of the consequences. That said, I agree with the others that the use of JMS or JCA is preferable to manual thread creation.
By the way, OC4J allows you to create your own threads. However it doesn't allow JNDI lookups from these unmanaged threads. You can disable this restriction by specifying the -userThreads argument.
I come from a .NET background, and JMS seems quite heavy-weight to me. Instead, I recommend Quartz, which is a background-scheduling library for Java and JEE apps. (I used Quartz.NET in my ASP.NET MVC app with much success.)
Is there a way to deactivate, for a specific EJB, the collocation optimization that WebLogic applies by default?
EDIT: Some context:
We have a scheduler service that runs inside one node of the cluster. This is for historic reasons and cannot be changed at the moment.
This service makes calls to an EJB and we would like to load-balance these calls. Unfortunately, at the moment every call runs on the node that hosts the scheduler service because of the optimization mentioned in the question.
I was thinking of coding a custom load-balancing class; however, this optimization seems to be applied before the load-balancing step happens.
Supposing you are trying to call a remote EJB (load balancing of local EJBs can only be obtained through an indirection trick like the one Patrick mentioned), you will have to create a new InitialContext using the address of the cluster instead of a particular server. This new IC will provide stubs as if you were a foreign client, subject to the same load-balancing strategies as foreign clients are.
Unfortunately, this means that EJB 3 injection won't work; you will have to do the lookup yourself. There is a chance, and this is pure speculation, that the stubs you get from the cluster IC are serializable. In other words, it might be possible to bind them and get them injected using @Resource afterwards.
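A sketch of that "foreign client" lookup (the cluster address, the JNDI name and the use of WebLogic's t3 protocol are assumptions):

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;

// Sketch: looking up the remote EJB against the cluster address instead of a single node,
// so the returned stub is load-balanced like a foreign client's would be.
public class ClusterAwareLookup {

    public Object lookupLoadBalancedBean() throws Exception {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
        // cluster address listing all managed servers, not a particular one
        env.put(Context.PROVIDER_URL, "t3://node1:7001,node2:7001,node3:7001");

        InitialContext clusterContext = new InitialContext(env);
        return clusterContext.lookup("ejb/MyServiceRemote"); // hypothetical JNDI name
    }
}
```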
I'm not too familiar with the guts of WebLogic, but reading their material I would say you cannot, without some amount of trickery.
It does seem, though, that you can put a JMS (MDB) facade in front of your EJB, which would not be subject to the collocated-object optimization.
OR
If your scheduler is servlet-based, you should be able to deploy it in a separate web application within the container and have it make its calls to the EJB cluster.
How many nodes are there in your cluster? Have you considered deploying the EJB to nodes other than the one the scheduler service has been deployed to?
I happened to stumble upon this old question.
Would it be possible to put a JMS queue between the scheduler and the actual executor? Since all managed servers would consume from the same queue, the queue would act as a load balancer, in a way.
If you have solved this issue differently, I'd be interested to know how :)