We are designing an application in which MDBs will pick up incoming messages and do a series of tasks.
Some of these are functional, like XML validation, and some are cross-cutting aspects, such as logging, MIS entries, etc.
Edit:
Message types differ by functionality, such as Ordering, Raising Faults, or Information Services like Postcode Lookups. They also vary by caller, so an Order from Caller A is different from one from Caller B, though the XML structure should be mostly the same.
Each will go through functional units of work such as validation of the sender, validation of product codes (if placing an order), security checks (IP-based), registration into our DB if valid, routing to an error queue if not, and so on.
My question concerns making the functional bits modular, such that we can build one MDB that performs functions A, B, C and another MDB that performs B, C, D, and so on, based on the type of message and on which tasks are common across all the message types.
What design pattern should I be using for this?
Secondly, is there a way for me to configure these functions in an XML file, so that the MDB reads the XML to see which functions it has to execute and in what sequence? This would be an alternative to having the modules in helper POJOs or session beans invoked from the main MDB, which is what we currently have in mind.
@shinynewbike: for your problem, it would be better for the MDB to just read the message and determine its type; the MDB can then consult a factory class that returns a list of handlers implementing the same interface, which the MDB can iterate over and call. So, basically, the Command design pattern. A sample XML configuration:
<configuration>
    <handler name="A" class="A"/>
    <handler name="B" class="B"/>
    ...
    <handlers-stack name="stack1">
        <handler ref="A"/>
        <handler ref="C"/>
    </handlers-stack>
    <message type="X" handlers-ref="stack1"/>
    <message type="Y" handlers-ref="stack2"/>
</configuration>
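A minimal sketch of the Java side, assuming a hypothetical MessageHandler interface and a factory populated in code (in a real application the factory would be built by parsing the XML above; all names, including the "messageType" property, are illustrative):

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// Common interface that every functional unit of work implements (the Command).
interface MessageHandler {
    void handle(String payload) throws Exception;
}

// Factory that maps a message type to its configured handler stack.
class HandlerFactory {
    private final Map<String, List<MessageHandler>> stacksByType = new HashMap<>();

    void register(String messageType, List<MessageHandler> handlers) {
        stacksByType.put(messageType, handlers);
    }

    List<MessageHandler> handlersFor(String messageType) {
        return stacksByType.getOrDefault(messageType, Collections.<MessageHandler>emptyList());
    }
}

// The MDB's only job: determine the type and iterate over the commands.
class DispatchingListener implements MessageListener {
    private final HandlerFactory factory;

    DispatchingListener(HandlerFactory factory) {
        this.factory = factory;
    }

    public void onMessage(Message message) {
        try {
            String type = message.getStringProperty("messageType");
            String payload = ((TextMessage) message).getText();
            for (MessageHandler handler : factory.handlersFor(type)) {
                handler.handle(payload); // runs in configured order, e.g. A then C
            }
        } catch (Exception e) {
            // route to an error queue, log, etc.
        }
    }
}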
Strategy, probably, with the quirk that each MDB can have several strategies. If you want to configure the set of strategies that a bean uses in a file (or an env-entry or similar), then you'll have to obtain references to the strategies via JNDI, rather than having them injected, which is a minor pain.
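A rough sketch of that JNDI-based wiring, assuming a hypothetical "strategyNames" env-entry holding a comma-separated list, with each strategy bound in the bean's environment naming context:

import java.util.ArrayList;
import java.util.List;
import javax.annotation.Resource;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class StrategyResolver {

    @Resource(name = "strategyNames") // assumed env-entry, e.g. "A,B,C"
    private String strategyNames;

    // Look up each configured strategy instead of having it injected.
    public List<Object> resolveStrategies() throws NamingException {
        InitialContext ctx = new InitialContext();
        List<Object> strategies = new ArrayList<Object>();
        for (String name : strategyNames.split(",")) {
            strategies.add(ctx.lookup("java:comp/env/" + name.trim()));
        }
        return strategies;
    }
}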
In a non-EJB world, I would suggest Observer, but with EJBs, I think it's rather hard to have one component give another a long-lived reference to itself. Unless they're @Singleton, which MDBs aren't.
Pattern terms aside, we have strived at our company to keep business logic out of the MDB class itself. This works really well for what you are trying to build here, which almost sounds more like an Enterprise Service Bus (ESB) Service Gateway pattern. Check out the following links from MSDN (a good page even though it isn't Java) and Martin Fowler.
I would recommend allowing the MDB to take in the messages. Then you could use other patterns (Command, Strategy, Factory, etc.) to do the actual work. Or the main MDB could figure out where the message should be forwarded to and then forward it to a queue dedicated to a particular type of function.
This does add some administrative and resource overhead in the form of more queues and MDBs. But it also adds more separation between the logic for the different messages (i.e., separation of concerns). And it gives you the ability to throttle the different "implementation" queues independently, depending on performance needs, rather than having one queue be the bottleneck for all.
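A rough sketch of the forwarding variant, assuming one queue per function and a "messageType" property; all JNDI names are illustrative:

import javax.annotation.Resource;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Destination;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.Queue;
import javax.jms.Session;

// Front MDB: inspects the message type and forwards to a dedicated queue.
public class RoutingListener implements MessageListener {

    @Resource(mappedName = "jms/ConnectionFactory") // assumed name
    private ConnectionFactory connectionFactory;

    @Resource(mappedName = "jms/OrderQueue") // assumed name
    private Queue orderQueue;

    @Resource(mappedName = "jms/FaultQueue") // assumed name
    private Queue faultQueue;

    public void onMessage(Message message) {
        try {
            String type = message.getStringProperty("messageType");
            Destination target = "ORDER".equals(type) ? orderQueue : faultQueue;
            Connection connection = connectionFactory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                session.createProducer(target).send(message);
            } finally {
                connection.close();
            }
        } catch (JMSException e) {
            throw new RuntimeException(e); // let the container redeliver
        }
    }
}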
There are performance considerations to adding new queues. I wish I could give you a concrete answer as to "how much", but I can't; it depends on which application server and which JMS/messaging provider you choose. And unfortunately there is no magic "number" of queues that is right. You really have to sit down and discuss with other architects how many queues you need, and it is best to do this upfront in your design.
Next, try to figure out the load on the system. How big will your messages be? 100KB? 1MB? 5MB? Larger? Smaller? And how many messages will be coming through the system at a time? With numbers like these you can revisit your decision on the number of queues and see if it still makes sense. You can also have your application server/messaging admins (or you, if that happens to be you) throttle the queues with different configuration settings to allow for smoother messaging through your system. (You may also need to tune the application/messaging server's JVM heap settings, depending on what you encounter.)
Sadly, the best way to learn how to tune your application server is by reading whatever the documentation and forums say about it, and by hands-on experience of working with it yourself.
But even with all that, the most important thing is a good yet simple design. If you go overboard and make a queue for everything, you may impact performance. Then again, you may not. But you might over-complicate your application and make it harder to troubleshoot.
I'll try to find some more links for you, but honestly take what we all say here and discuss it with your fellow developers.
And if you could alter your question to mention how many message types you might deal with, or their purpose, we could all give you better recommendations on the design, the number of queues, etc.
While I realize that this does not squarely answer your question, you might want to have a look at Apache Commons Chain.
Related
Requirement: log events like page views and form submits. Each page has a ~1 second SLA. The application can have hundreds of concurrent users at a time.
Log events are stored into the Database.
Solution: My initial thought was to use an async logging approach, where control returns to the application and the logging happens in a different thread (via Spring's ThreadPoolTaskExecutor).
However, someone suggested that using JMS would be a more robust approach. Is the added work (setting up queues, writing to the queues, reading from the queues) required by this approach worthwhile?
What are some of the best practices / things to look out for (in a production environment) when implementing something like this?
Both approaches are valid, but one is vulnerable if your app unexpectedly stops. In your first scenario, events yet to be written to the database will be lost. Using a persistent JMS queue means that those events will be read from the queue and persisted to the database upon restart.
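For illustration, a minimal sketch of publishing a log event as a persistent message with plain JMS (the JNDI names and payload are assumptions):

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;

public class LogEventPublisher {

    public void publish(String eventPayload) throws Exception {
        InitialContext jndi = new InitialContext();
        ConnectionFactory factory = (ConnectionFactory) jndi.lookup("jms/ConnectionFactory");
        Queue logQueue = (Queue) jndi.lookup("jms/LogEventQueue");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(logQueue);
            producer.setDeliveryMode(DeliveryMode.PERSISTENT); // stored by the broker, survives restarts
            producer.send(session.createTextMessage(eventPayload));
        } finally {
            connection.close();
        }
    }
}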
Of course, if your DB writes are so much slower than placing a message of similar size on to a JMS queue, you may be solving the wrong problem?
Using JMS for logging is a complete mismatch. JMS is a Java abstraction for a middleware tool like MQ Series. That is complete overkill, and it will put you through setup and configuration hell. JMS also lets you place messages in a transactional context, so you quickly get the idea that JMS might not be much better than database writes, as @rjsang suggested.
This is not to say that JMS is not a nice technology. It is a good technology where it is applied properly.
For asynchronous logging, you are better off depending on a logging API that directly supports it, like Log4j 2. In your case, you might look at configuring an AsyncAppender together with a JDBCAppender. Log4j 2 has many more appenders as additional options, including one for JMS. However, by at least using a logging abstraction, you make it all configurable and leave yourself able to change your mind at a later time.
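A rough sketch of such a Log4j 2 configuration, with an Async appender wrapping the JDBC appender (the table, columns, and data source are assumptions):

<Configuration>
  <Appenders>
    <JDBC name="database" tableName="EVENT_LOG">
      <DataSource jndiName="java:comp/env/jdbc/LoggingDS"/>
      <Column name="EVENT_DATE" isEventTimestamp="true"/>
      <Column name="LEVEL" pattern="%level"/>
      <Column name="MESSAGE" pattern="%message"/>
    </JDBC>
    <Async name="async">
      <AppenderRef ref="database"/>
    </Async>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="async"/>
    </Root>
  </Loggers>
</Configuration>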
In the future we might have something similar to asynchronous CDI events, which should work similarly to JMS but be much more lightweight. Maybe you can get something similar to work today by combining CDI events with EJB asynchronous methods. As long as you don't use EJBs with a remote interface, it should also be pretty lightweight.
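A minimal sketch of that combination: the observer method lives on a stateless bean, so @Asynchronous moves the handling onto a container-managed thread (all names are illustrative, and each public class would live in its own file):

import javax.ejb.Asynchronous;
import javax.ejb.Stateless;
import javax.enterprise.event.Event;
import javax.enterprise.event.Observes;
import javax.inject.Inject;

// Plain event payload carrying the data to log.
public class LogEvent {
    private final String message;
    public LogEvent(String message) { this.message = message; }
    public String getMessage() { return message; }
}

// Producer side: firing the event returns quickly to the caller.
public class LogEventSource {
    @Inject
    private Event<LogEvent> events;

    public void log(String message) {
        events.fire(new LogEvent(message));
    }
}

// Observer side: the actual database write happens asynchronously.
@Stateless
public class LogEventWriter {
    @Asynchronous
    public void onLogEvent(@Observes LogEvent event) {
        // write event.getMessage() to the database here
    }
}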
You could give fully asynchronous, external tooling a try if you want to. If you have to meet your SLA at any price and resilience is important to you, you could try either Logstash or offline log processing. By doing so, you decouple your application from the database and no longer depend on database performance; if the database is slow and you're using async loggers, the queues might fill up.
With Logstash using GELF, the whole log processing is handled in a different (or even remote) JVM. Offline processing (e.g., writing CSV logs) lets you load the log data into the database afterwards.
How should I design an application composed of numerous (but identical) independent processes that need to communicate data to an enterprise application and be monitored and accessible via a web interface?
Here's a more concrete example in Java:
The independent processes are multiple instances of a standalone J2SE application. On initialization, each receives data about a "user" entity and then starts doing stuff regarding that user. (This is an infinite process, so any batch-style design would be wrong here; similarly, the start time of these processes is irrelevant.)
The enterprise application is a set of J2EE beans and web services that implement business logic, DB access, etc., and that are (for example) hosted on GlassFish.
The web front is a set of JSPs (perhaps also on GlassFish) that work with the beans.
Now, ideally, I want a way for the processes in (1) to be able to invoke methods on the beans in (2), but also for the beans in (2) to be able to update the processes in (1) about things.
So these are the required flows of execution, assuming there are 10 independent processes of (1) running for 10 different users (consider a "user" to be something easily identifiable by, say, a number):
Something happens in one of the processes of (1), and it invokes a method of the enterprise application (2) with some data.
One of the real, human users (who was already identified by the web app) clicks something on a web page of (3); this invokes a method in (2), and then some "magical" entity (which I have no idea how to name) finds the independent process from (1) that is responsible for this particular user and updates the process with some new data.
My best approach so far is to expose these J2SE apps via JMX and go from there, but there is one thing I don't understand: who or what should hold a mapping of the sort "the process at URI X is responsible for user Y" and direct the calls accordingly?
BTW, please feel free to give any advice outside of the Java platform (!), as long as it is a platform that can be scaled easily.
EDIT:
Also, is there a way to "host" such independent processes on some app server? Something that would re-spawn processes if they fail, and allow for deployment and monitoring of such processes on remote machines, etc.?
It has been some time since I last used the Java Message Service, so I'm afraid I'm not up to date with the technical details, but from your description it seems like it would suit your case, handling communication between the administration GUI and the client processes.
There are various options (I believe you are interested in asynchronous communication), so you should take a look at the latest developments and judge for yourself whether it fits your case or not.
Regarding the data size that the server would exchange with the processes, I believe this is a different topic, and I must say that the answer depends. Would it be better to send all the data in the message? Or should the message be just a notification, so that the client is notified and then connects to some enterprise bean to check the new state? I would prefer the latter, but this is something you should decide based on your requirements. I wouldn't blindly exclude the first option unless I had clear evidence that it wouldn't work.
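For illustration, the notification-only style can be as small as a message with no body and a single property (the property name and JMS setup are assumptions):

import javax.jms.Connection;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

// Tells a process "user X changed"; the process then calls back
// into an enterprise bean to fetch the actual state.
public class UserChangeNotifier {

    private final Session session;
    private final MessageProducer producer;

    public UserChangeNotifier(Connection connection, Queue queue) throws JMSException {
        this.session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        this.producer = session.createProducer(queue);
    }

    public void notifyChange(long userId) throws JMSException {
        Message notification = session.createMessage(); // header-only message
        notification.setLongProperty("userId", userId); // assumed property name
        producer.send(notification);
    }
}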
Regarding scaling, I don't think it can be much worse than the scaling of the rest of your beans. As far as the server is concerned, the processes are all clients that need to be served.
Please take the above advice with a grain of salt: I don't know the specifics of your problem/design, and I am speaking in general terms.
I hope that helps.
Are there any recommendations, best practices or good articles on providing integration hooks?
Let's say I'm developing a web-based ordering system. Eventually I'd like my client to be able to write some code, package it into a jar, drop it into the classpath, and have it change the way the software behaves.
For example, if an order comes in, the code
1. may send an email or SMS
2. may write some additional data into the database
3. may change data in the database, or decide that the order should not be saved into the database (cancel the data save)
Point 3 is quite dangerous since it interferes too much with data integrity, but if we want integration to be that flexible, is it doable?
Options so far
1. provide hooks for specific actions, e.g. if this and that occurs, call this method; the client writes an implementation for that method. This is too rigid, though.
2. a mechanism similar to servlet filters: there is code before the actual action executes and code after. I'm not quite sure how this could be designed, though.
We're using Struts2 if that matters.
This integration must be able to detect a "state change", not just the "end state" after the core action executes.
For example, if an order changes state from In Progress to Paid, it should do something, but if it changes from Draft to Paid, it should not do anything. The core action in this case would be loading the order object from the database, changing its state to Paid, and saving it again (or doing a SQL update).
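For example, a filter-style hook that sees both the old and the new state might look like this (the interface and the Order/Status types are hypothetical):

// Hypothetical hook contract; implementations ship in the client's jar.
public interface OrderInterceptor {
    // Called before the core action persists the change.
    // Returning false vetoes the save (option 3 above).
    boolean beforeSave(Order oldState, Order newState);

    // Called after a successful save.
    void afterSave(Order oldState, Order newState);
}

enum Status { DRAFT, IN_PROGRESS, PAID }

class Order {
    private final Status status;
    Order(Status status) { this.status = status; }
    Status getStatus() { return status; }
}

// Example client implementation reacting only to In Progress -> Paid.
class PaidNotifier implements OrderInterceptor {
    public boolean beforeSave(Order oldState, Order newState) {
        return true; // never veto
    }

    public void afterSave(Order oldState, Order newState) {
        if (oldState.getStatus() == Status.IN_PROGRESS
                && newState.getStatus() == Status.PAID) {
            // send the email or SMS here
        }
    }
}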
Many options, including:
Workflow tool
AOP
Messaging
DB-layer hooks
The easiest (for me at the time) was a message-based approach. I did a sort of ad hoc thing using Struts 2 interceptors, but a cleaner approach would use Spring and/or JMS.
As long as the relevant information is contained in the message, it's pretty much completely open-ended. Having a system accessible via services/etc. means the messages can tap back in to the main app in ways you haven't anticipated.
If you want this to work without system restarts, another option would be to implement the handlers in a dynamic language (e.g., Groovy). The functionality can be stored in a DB. Using a Spring factory makes this pretty fun and reduces some of the complexity of a message-based approach.
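A rough sketch of that idea: the handler source sits in a database column and gets compiled on the fly with Groovy's embedding API (the OrderHook contract is an assumption):

import groovy.lang.GroovyClassLoader;

// Hypothetical contract that the dynamic handlers implement.
interface OrderHook {
    void onOrder(String orderId);
}

class DynamicHookLoader {
    private final GroovyClassLoader groovyLoader =
            new GroovyClassLoader(DynamicHookLoader.class.getClassLoader());

    // 'source' would typically be read from the database.
    OrderHook compile(String source) throws Exception {
        Class<?> hookClass = groovyLoader.parseClass(source);
        return (OrderHook) hookClass.newInstance();
    }
}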
One issue with a synchronous approach, however, is if a handler deadlocks or takes a long time; it can impact that thread at the least, or the system as a whole under some circumstances.
I've been scratching my head over developing a simple plugin-based architecture on top of Spring for one of my current apps. No matter how much separation one achieves using patterns like MVC, one always reaches a point where coupling is inevitable.
Thus, I started weighing options. At first I thought filters were a good one: every plugin I'd make would be a filter, which I would then simply insert into the filter map. Of course, this creates a bit of overhead when enumerating and checking all the filters, but at least controllers won't have to care what happened to the data before it reached them, or what happens afterwards; they will just fetch the models (through a DAO or whatnot) and return them.
The problem with this is that not all of my app's requests are HTTP-based. Some are based on emails, others are internally scheduled (timed), so filters won't help much unless I try to adapt every type of incoming request to an HTTP request, which would be too much.
Another option I thought about was annotation-based AOP, where I annotate every method and a plugin intercepts methods based on certain conventions. My problem with this is that, first, I am not so experienced with AOP in general, and second, simply writing all those conventions already suggests a bit of coupling.
By far the option that most appeals to my way of thinking is using Spring-based events. Every type of request handler within my app (web controller, email handler, etc.) would be a sort of event dispatcher, dispatching Spring events on every major action. Plugins, on the other hand, would simply listen for when a particular event happens and do some logic. This would also let me use option #1, since some of those plugins could be filters as well: when they receive a notification that a certain controller action is done, they may just decide to do nothing and instead wait until they get called by the filter chain. I see this as a fairly nice approach. Of course, here comes the overhead again of dispatching events, plus the fact that every involved class will be coupled to Spring forever, but I see this as a necessary evil.
My main concern regarding Spring events is performance, both in terms of latency, and memory footprint.
I am still not an expert, so a bunch of feedback here would be of tremendous help. Are Spring events the best fit for this type of architecture, or is there another solution I've missed? I am aware that there might even be some third-party solutions out there already, so I'd be glad if someone could point out one or two tried and proven ones.
Thanks.
The concept of a plugin can be achieved with the Spring bean factory. If you create a common interface, you can define multiple beans that implement it and inject them where needed. Or you can use a FactoryBean to deliver the right plugin for the job.
Your idea of using events is called an "event-driven architecture". This goes a lot further than just plugins, because it not only decouples you from the implementation but also offers the possibility to decouple from which instance is used (multiple handlers), from the location (multiple machines), and from the time at which the request is handled (asynchronous handling). The trade-off is increased overall complexity, reduced component-level complexity, and the need for a messaging infrastructure. Often JMS is used, but if you just want a single-node setup, both Spring and Mule offer simple in-memory modes as well.
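A minimal sketch of the single-node, in-memory variant with Spring (class names are illustrative; @EventListener requires Spring 4.2 or later):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

// The event carries whatever the plugins need to react to.
class OrderCompletedEvent {
    final String orderId;
    OrderCompletedEvent(String orderId) { this.orderId = orderId; }
}

// Any request handler (web controller, email handler, ...) publishes events.
@Component
class OrderService {
    private final ApplicationEventPublisher publisher;

    @Autowired
    OrderService(ApplicationEventPublisher publisher) {
        this.publisher = publisher;
    }

    void completeOrder(String orderId) {
        // ... core logic ...
        publisher.publishEvent(new OrderCompletedEvent(orderId));
    }
}

// A plugin is just another bean listening for the events it cares about.
@Component
class AuditPlugin {
    @EventListener
    void onOrderCompleted(OrderCompletedEvent event) {
        // plugin logic; add @Async for asynchronous handling
    }
}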
To help you further you should expand a bit on the requirements you are trying to meet and the architectural improvements you want. So far you have mentioned that you want to use plugins and described some possible solutions, but you have not really described what you are trying to achieve.
I want to start a background process in a Java EE (OC4J 10) environment. It seems wrong to just start a Thread with "new Thread" But I can't find a good way for this.
Using a JMS queue is difficult in my special case, since my parameters for this method call are not serializable.
I also thought about using an onTimeout timer method on a session bean, but this does not allow me to pass parameters (as far as I know).
Is there any "canonical" way to handle such a task, or do I just have to revert to "new Thread" or a java.util.concurrent thread pool?
Java EE usually attempts to remove threading from the developer's concerns. (Its success at this is a completely different topic.)
JMS is clearly the preferred approach to handle this.
With most parameters, you have the option of forcing or faking serialization, even if the objects aren't serializable by default. Depending on the data, consider wrapping it in a serializable object that can reload the data. How to do this will clearly depend on the parameter and the application.
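A minimal sketch of that wrapper idea: instead of serializing the heavyweight object itself, send a serializable "claim check" that can reload it on the receiving side (EntityRef, EntityLoader, and the id-based lookup are assumptions):

import java.io.Serializable;

public class EntityRef implements Serializable {
    private static final long serialVersionUID = 1L;

    private final long entityId;

    public EntityRef(long entityId) {
        this.entityId = entityId;
    }

    // Re-materialize the non-serializable object from its id.
    public Object load(EntityLoader loader) {
        return loader.findById(entityId);
    }
}

interface EntityLoader {
    Object findById(long id);
}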
JMS is the Java EE way of doing this. You can start your own threads if the container lets you, but that does violate the Java EE spec (you may or may not care about this).
If you don't care about Java EE generic compliance (if you would in fact resort to threads rather than deal with JMS), the Oracle container will for sure have proprietary ways of doing this (such as the OracleAS Job Scheduler).
I don't know OC4J in detail, but I have used the Thread approach and a java.util.Timer approach to perform some tasks in a Tomcat-based application. In Java 5+ there is also the option to use one of the executor services (scheduled, priority).
I don't know about onTimeout, but you could pass parameters around in the session itself, the app context, or a static variable (discouraged, some would say). But the name tells me it is invoked when the user's session times out and you want to do some cleanup.
Using JMS is the right way to do it, but it's heavier weight.
The advantage you get is that if you need multiple servers, or one server, or whatever, then once the servers are configured your "threading" can be distributed to multiple machines.
It also means you don't want to send a message for a truly trivial amount of work or with a massive amount of data. Choose your interface points well.
See here for some more info:
stackoverflow.com/questions/533783/why-spawning-threads-in-j2ee-container-is-discouraged
I've been creating threads in a container (Tomcat, JBoss) with no problem, but they were really simple queues, and I don't rely on clustering.
However, EJB 3.1 will introduce asynchronous invocation, which you may find useful:
http://www.theserverside.com/tt/articles/article.tss?track=NL-461&ad=700869&l=EJB3-1Maturity&asrc=EM_NLN_6665442&uid=2882457
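For reference, a sketch of the EJB 3.1 style (the bean and method names are illustrative):

import java.util.concurrent.Future;
import javax.ejb.AsyncResult;
import javax.ejb.Asynchronous;
import javax.ejb.Stateless;

@Stateless
public class BackgroundWorker {

    // Fire-and-forget: returns to the caller immediately; the
    // container runs the body on a managed thread.
    @Asynchronous
    public void runInBackground(long taskId) {
        // long-running work here
    }

    // Or return a Future if the caller wants the result later.
    @Asynchronous
    public Future<String> runAndReport(long taskId) {
        String result = "task-" + taskId; // placeholder work
        return new AsyncResult<String>(result);
    }
}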
Java EE doesn't really forbid you to create your own threads; it's the EJB spec that says "unmanaged threads" aren't allowed. The reason is that these threads are unknown to the application server, so the container cannot manage things like security and transactions on them.
Nevertheless, there are lots of frameworks out there that do create their own threads, for example Quartz, Axis, and Spring. Chances are you're already using one of these, so it's not that bad to create your own threads as long as you're aware of the consequences. That said, I agree with the others that the use of JMS or JCA is preferred over manual thread creation.
By the way, OC4J allows you to create your own threads. However it doesn't allow JNDI lookups from these unmanaged threads. You can disable this restriction by specifying the -userThreads argument.
I come from a .NET background, and JMS seems quite heavyweight to me. Instead, I recommend Quartz, which is a background job-scheduling library for Java and Java EE apps. (I used Quartz.NET in my ASP.NET MVC app with much success.)
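A minimal sketch with the Quartz 2.x API (job and trigger names are illustrative):

import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class BackgroundJobDemo {

    // Quartz instantiates the job class itself, so it needs a no-arg constructor.
    public static class CleanupJob implements Job {
        public void execute(JobExecutionContext context) {
            // background work here
        }
    }

    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.start();

        JobDetail job = JobBuilder.newJob(CleanupJob.class)
                .withIdentity("cleanupJob")
                .build();

        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("runOnce")
                .startNow()
                .build();

        scheduler.scheduleJob(job, trigger);
    }
}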