Replicating data between sites in a J2EE app - Java

We have a J2EE app that we're still working on. It runs on an Oracle DB, and the business tier is coded with EJB 2.0, with a rich client interface.
Now the application is going to be deployed at multiple sites, and each site will be creating new items (new contracts, etc.).
What we want is to duplicate all new items into a central DB that uses the same schema as the local ones.
What do you think is the most effective way to accomplish this?
I've thought about serializing all newly created items and sending them to the remote site for integration through a Java Message Service queue. Is that a good approach?
There are also going to be some changes that need to be replicated back to the satellites.

I would say that a synchronous relationship with the centre introduces coupling that you don't want, so your async idea seems pretty good to me. You presumably have some location-dependent identifier in the records, so that new contract creations in the different locations will not clash, and you accept some latency in the replication to the centre.
So in the simple case, just use JMS messages from each location to the centre.
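For instance, a minimal satellite-side sketch (the JNDI names, and the assumption that your items are `Serializable`, are mine for illustration, not details from your app):

```java
// Minimal sketch of the satellite side: serialize a new item and send it
// to the centre over JMS. The JNDI names are assumptions for illustration.
import java.io.Serializable;
import javax.jms.ObjectMessage;
import javax.jms.Queue;
import javax.jms.QueueConnection;
import javax.jms.QueueConnectionFactory;
import javax.jms.QueueSender;
import javax.jms.QueueSession;
import javax.jms.Session;
import javax.naming.InitialContext;

public class ReplicationSender {

    public void replicate(Serializable newItem) throws Exception {
        InitialContext ctx = new InitialContext();
        QueueConnectionFactory factory =
                (QueueConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/centralReplicationQueue");

        QueueConnection connection = factory.createQueueConnection();
        try {
            QueueSession session =
                    connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
            QueueSender sender = session.createSender(queue);
            // The ObjectMessage carries the serialized item; an MDB at the
            // centre deserializes it and inserts it into the central schema.
            ObjectMessage message = session.createObjectMessage(newItem);
            sender.send(message);
        } finally {
            connection.close();
        }
    }
}
```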
The nice thing about this approach is that the satellites don't even need to know about the database structure in the centre; it can be designed completely independently.
Things get more interesting if you also need to replicate changes back from the centre to the satellites. The big question is whether we might get conflicts between changes at the centre and changes at a satellite.
Simple case: Any data item has one "home". For example, the originating satellite is the only place where changes are made, or, after creation, the centre is the only place to make changes. In that case we can treat the centre as the "hub": it can propagate changes out to the satellites. Simple JMS will do just fine for that.
Slightly harder case: Changes can be made anywhere, but only in one place at a time, so we can introduce some kind of locking scheme. I would tend to have the centre as the owner and use synchronous web services to lock and update the data. Now we are coupled, but that's necessary if we are to have a definitive owner.
Quite complex case: Anyone can change anything anywhere without locking. It's kind of an "Act First, Apologise Later" approach: we take the optimistic view that changes won't clash. We can send the changes to the centre for approval, and the centre can either use optimistic locking or merge non-conflicting changes. I would tend to do this by queueing changes at the originator but actually processing them via synchronous calls, so decoupling the specification of a change from the availability of the centre. Some more sophisticated databases have diff/merge capabilities that may help with this.
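As a hedged sketch of the optimistic-locking option at the centre (the table and column names are invented, not from your schema):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Sketch of optimistic locking at the centre: a change is applied only if
// the row version the originator saw is still current. Table and column
// names are invented for illustration.
public class OptimisticChangeProcessor {

    /** @return true if the change was applied, false if it conflicted. */
    public boolean applyStatusChange(Connection conn, long contractId,
                                     String newStatus, int expectedVersion)
            throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(
                "UPDATE contract SET status = ?, version = version + 1 "
                + "WHERE id = ? AND version = ?");
        try {
            stmt.setString(1, newStatus);
            stmt.setLong(2, contractId);
            stmt.setInt(3, expectedVersion);
            // Zero rows updated means another change got there first; the
            // centre can then reject the change or attempt a merge.
            return stmt.executeUpdate() == 1;
        } finally {
            stmt.close();
        }
    }
}
```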
The big questions are the extent to which you want to be coupled to the availability of the centre, and the likelihood of conflicting changes. Quite often, cunning application design can greatly reduce the likelihood of conflict.


Is there a common pattern to replace In Memory Datastores in a running application?

Let's say you have an application that is not using a common database, but instead loads all its data in-memory and works entirely from there.
Now, this is a web service, and every few hours you want to 'update' your in-memory database. You don't want to risk any downtime, so you just want to replace the in-memory datastore on the fly. But there are certain risks that come with that:
Certain data which was available in the 'old' datastore might not be there anymore, and if someone is consuming your data at that exact moment, they will end up with an error.
Certain data could change and might not be named the same, or be slightly different.
I was just wondering if there is a certain approach or pattern for this kind of problem?
My approach would be to have two datastores for a certain time: the 'old' one and the 'new' one. All new sessions get served by the new datastore, and all old sessions use the old datastore and might be forced to refresh to use the new one at some point. After that, I would throw the old one away and start the same song again.
The "DBMS way" to solve this would be to implement "read repeatability", usually through MVCC (multi-version concurrency control). You're somewhat doing that with the 'old' and 'new' datastore concept you describe, but in a coarse (i.e. not granular) way. A DBMS would do it on a per-SQL-cursor basis. But, depending on the potential size of the user community for the web service, you might run into scaling issues.

Should we be reusing JPA entities

Currently I am working for a company that has 6-7 Java EE projects. They are multi module maven projects that are all fairly large and serve different purposes. As such, their models are very different but for the most part the data is stored in the same database.
The problem, to me, is that since there are a few areas of overlap, they simply inject the existing DAOs all the way up the dependency chain. So I have
A-Parent
- A-JPA
- A-DAO
B-Parent
- B-JPA
- B-DAO
- A-JPA
- A-DAO
etc., etc. They are really only using about 2 percent of the other project's model and its respective DAO.
I am trying to decouple these dependencies by simply duplicating the entities needed (and only including fields/mappings for the things that are really needed) so that the same EJB isn't deployed 7 times (or more when clustered), but apparently I'm not making a convincing argument. Can anyone point me to an article with best practices for this situation, or help me bring up points to explain to my boss?
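To make it concrete, the kind of trimmed, read-only duplicate I have in mind would look roughly like this (all names invented):

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

// Project B maps the same CUSTOMER table that project A owns, but only the
// two columns B actually reads. All names are invented for illustration.
@Entity
@Table(name = "CUSTOMER")
public class CustomerRef {

    @Id
    @Column(name = "CUSTOMER_ID")
    private Long id;

    @Column(name = "NAME")
    private String name;

    // Read-only usage in project B, so no setters are exposed.
    public Long getId() { return id; }
    public String getName() { return name; }
}
```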
TLDR: I want each project to have its own set of entities, even if there is a very small bit of overlap, to reduce dependencies between projects and so we aren't deploying the same EJBs 7 times. My boss thinks there is nothing wrong with these being unnecessarily coupled. Am I making a big deal out of nothing? Thanks!
If it is a single data model that is being maintained for various applications to use, the persistence entities (and even their DAOs) may be seen as the Java API to that database, and I'd put them in a central component. Some organizations may even drive the design from the database upwards and reverse engineer the persistence entities, in which case they'll be the same or similar for different users.
Whether such a central component is a library (reused by other components) or an EJB of its own (called by other components), I would let depend on the desired transactional and caching behavior of the application, and on how you see responsibilities being organized. On one project we strongly upheld the rule that each piece of data could only be maintained by a single component (a service or an EJB), and others would have to go through that single component.
If it is a common domain model, but every EJB implements its own data storage for it, then the domain model may be shared, and I would not share the persistence entities. Then you get into the discussion of sharing a domain model among different components. The world may be viewed in slightly different ways from within different sub-domains, and I feel you end up designing your domains slightly differently across different sub-systems; hence there I would probably vote against reuse.
Everyone's mileage may vary, and I may see things differently given the actual circumstances of a particular project.

Can REST in practice really be stateless? [closed]

Consider the situation. I am writing a statistical analysis app. The app has multiple tiers:
- A frontend UI written for multiple device types: desktop, browser, mobile.
- A mid-tier servlet that offers a so-called REST service to these frontends.
- A backend that performs the extreme computation of the statistical processing.
- A further backend database that the computation tier communicates with.
Because statistical analysis requires huge amounts of processing power, you would never dream of delegating such processing to the front-end.
- The statistical analyses consist of procedures, or a series of workflow steps.
- Some steps may require so much processing power that you would not want to repeat them.
- If you have a workflow of 20 steps, you cannot execute step 20 without first executing step 19, which cannot be executed without first executing step 18, and so on.
- There are observation points; for example, the statistician must inspect the results of steps 3, 7, 9, 14 and 19 before telling the client side to proceed to the next step.
- Each of these steps is a so-called request to the REST service, telling the backend supercomputer to progressively set up the statistical model in memory.
- There are many workflows. Some workflows may incidentally share step results; e.g., Flow[dry]:Step[7] may share Flow[wet]:Step[10]. Due to the amount of processing involved, we absolutely have to prevent repeating a step that might incidentally already have been accomplished by another flow.
Therefore, you can see that in the so-called REST service being designed, it is not possible that each request be independent of any previous request.
Therefore, how true can the following statement be?
"All REST interactions are stateless. That is, each request contains all of the information necessary for a connector to understand the request, independent of any requests that may have preceded it."
Obviously, the application I described requires that requests be dependent on previous requests. There are three possibilities that I can see concerning this app.
My app does not comply with REST, since it cannot comply with stateless requests. It may use the JAX-RS framework, but using JAX-RS and all the trappings of REST does not make it REST, simply because it fails the stateless criterion.
My app is badly designed - I should disregard trying to avoid the temporal and financial cost of stacking up a statistical model again, even if it takes 5-15 minutes per workflow. Just make sure there is no dependence on previous requests, and repeat costly steps when necessary.
The stateless criterion is outdated. My understanding of REST is outdated/defective, in that the REST community has been constantly ignoring this criterion.
Is my app considered RESTful?
New Question: ISO 9000
Finally, in case my app is not considered completely RESTful, would all references to "REST" need to be omitted to pass ISO 9000 certification?
new edit:
REST-in-piece
OK, my colleague and I have discussed this and decided to call such an architecture/pattern REST-in-piece = REST in piecemeal stages.
ISTM you're reading too much into statelessness. A REST API supports traditional CRUD operations. The API for CouchDB is a good example of how DB state is updated by a series of stateless transactions.
Your task is to identify what the resources are and the "state transfers" between them. Each step in your workflow is a different state transfer, marked by a different URI. Each update/change to a resource has an accompanying POST/PATCH or an idempotent PUT or DELETE operation.
If you want to gain a better understanding of what it means to be RESTful and the reasons behind each design choice, I recommend spending an hour reading Chapter 5 of Roy Fielding's dissertation.
When making design choices, just think about what the principles of RESTful design are trying to accomplish. Set up your design so that queries are safe (don't change state) and are done in ways that are bookmarkable, cacheable, and distributable. Let each step in the workflow jump to a new state with a distinct URI so that a user can back up, branch out in different ways, etc. The whole idea is to create a scalable, flexible design.
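As a rough illustration of that (all resource and method names here are invented, not a prescription), each executed step could produce a new, addressable state:

```java
import java.net.URI;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Response;

// Each executed step yields a new, addressable state: the client POSTs to
// run a step and gets back the URI of the resulting state, which it can
// inspect, bookmark, or branch from. All names are invented.
@Path("/flows/{flow}/states")
public class WorkflowStateResource {

    @POST
    @Path("/{stateId}/steps/{step}")
    public Response executeStep(@PathParam("flow") String flow,
                                @PathParam("stateId") String stateId,
                                @PathParam("step") int step) {
        // Hypothetical call that hands the work to the computation backend
        // and returns the identifier of the newly produced state.
        String newStateId = runStep(flow, stateId, step);
        URI location = URI.create("/flows/" + flow + "/states/" + newStateId);
        return Response.created(location).build();
    }

    private String runStep(String flow, String stateId, int step) {
        return stateId + "-" + step; // placeholder
    }
}
```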
You are updating an in-memory model via a REST API. This means that you are maintaining state on the server between requests.
The REST-ful way of addressing this would be to make the client maintain the state: simply process the request and return, in the response, all the information needed to construct the next request. The server then reconstructs the in-memory model from the information in the request and does its thing. That way, if you operate in, e.g., a clustered environment, any of the available servers can handle the request.
Whether or not this is the most efficient way to do things depends on your application. There are loads of enterprise applications that use a server-side session and elaborate load balancing to ensure that clients always use the same nodes in a cluster. So having server-side state is an entirely valid design choice, and there are plenty of ways of implementing it robustly. However, server-side state generally complicates scaling out, and REST in the purest sense is all about avoiding server-side state and avoiding complexity.
A workaround/compromise is persisting the state in some kind of database or store. That way your nodes can fetch the state from disk before processing a request.
It all depends on what you need and what is acceptable to you. As the previous commenter mentioned, don't get too hung up on this whole statefulness thing. Clearly somebody will have to maintain state; the question is merely where the best place to put that state is for you, and how you access it. There are a couple of tradeoffs, which have to do with various what-if scenarios. For example, if the server crashes, do you want your client to re-run the entire set of requests to reconstruct the calculation, or do you prefer to simply resend the last request? I can imagine that you don't really need high availability here and don't mind the low risk that something occasionally goes wrong for your clients. In that case, having the state on the server side in memory is an acceptable solution.
Assuming your server holds the computation state in some hash map, a REST-ful way of passing the state around then could be simply sending back the key for the model in the response. That's a perfectly REST-ful API and you can change the implementation to persist the state or do something else without changing the API when needed. And this is the main point of being REST-ful: decouple the implementation details from the API. Your client doesn't need to know where you put the state or how you store it. All it needs is a resource representation of that state that can be manipulated.
Of course the key should be represented as a URI. I recommend you read Jim Webber's "REST in practice". It's a great introduction to designing REST-ful APIs.
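A minimal sketch of that hash-map idea, with `StatisticalModel` as a stand-in for the real in-memory model and all names invented:

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// The server parks each in-memory model under an opaque key; only the key
// (surfaced as a URI such as /models/{key}) travels in responses. Swapping
// this for a persistent store later would not change the API clients see.
public class ModelRepository {

    private final ConcurrentMap<String, StatisticalModel> models =
            new ConcurrentHashMap<String, StatisticalModel>();

    public String store(StatisticalModel model) {
        String key = UUID.randomUUID().toString();
        models.put(key, model);
        return key;
    }

    public StatisticalModel find(String key) {
        return models.get(key);
    }
}

class StatisticalModel { /* stand-in for the real in-memory model */ }
```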

Recommendations on providing an integration API

Are there any recommendations, best practices, or good articles on providing integration hooks?
Let's say I'm developing a web-based ordering system. Eventually I'd like my client to be able to write some code, package it into a jar, drop it into the classpath, and have it change the way the software behaves.
For example, if an order comes in, the code:
1. may send an email or SMS
2. may write some additional data into the database
3. may change data in the database, or decide that the order should not be saved into the database (cancel the data save)
Point 3 is quite dangerous since it interferes too much with data integrity, but if we want integration to be that flexible, is it doable?
Options so far:
1. provide hooks for specific actions, e.g. if this and that occurs, call this method; the client will write the implementation for that method. This is too rigid, though.
2. a mechanism similar to servlet filters: there is code before the actual action is executed and code after. Not quite sure how this could be designed, though.
We're using Struts2 if that matters.
This integration must be able to detect a "state change", not just the "end state" after the core action executes.
For example, if an order changes state from In Progress to Paid, then it will do something, but if it changes from Draft to Paid, it should not do anything. The core action in this case would be loading the order object from the database, changing the state to Paid, and saving it again (or doing an SQL update).
Many options, including:
Workflow tool
AOP
Messaging
DB-layer hooks
The easiest (for me at the time) was a message-based approach. I did a sort-of ad-hoc thing using Struts 2 interceptors, but a cleaner approach would use Spring and/or JMS.
As long as the relevant information is contained in the message, it's pretty much completely open-ended. Having a system accessible via services etc. means the messages can tap back into the main app in ways you haven't anticipated.
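For example, a sketch of the event shape (all names invented); because the message carries both the old and the new state, it also covers the In Progress -> Paid vs. Draft -> Paid distinction from the question:

```java
import java.io.Serializable;

// The core action publishes an event carrying both the old and the new
// state, so a handler can react to "In Progress -> Paid" while ignoring
// "Draft -> Paid". All names are invented.
public class OrderStateChangedEvent implements Serializable {

    private final long orderId;
    private final String oldState;
    private final String newState;

    public OrderStateChangedEvent(long orderId, String oldState, String newState) {
        this.orderId = orderId;
        this.oldState = oldState;
        this.newState = newState;
    }

    public long getOrderId() { return orderId; }
    public String getOldState() { return oldState; }
    public String getNewState() { return newState; }
}

// A client-supplied handler dropped onto the classpath implements just this:
interface OrderEventHandler {
    void onStateChanged(OrderStateChangedEvent event);
}
```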
If you want this to work without system restarts, another option would be to implement handlers in a dynamic language (e.g., Groovy). Functionality can be stored in a DB. Using a Spring factory makes this pretty fun and reduces some of the complexity of a message-based approach.
One issue with a synchronous approach, however, is if a handler deadlocks or takes a long time; it can impact that thread at the least, or the system as a whole under some circumstances.

External database changes -> Hibernate -> Client

An existing external system makes regular (every few seconds) updates to several database tables. We want to build a dashboard type user interface which allows the user to view additional records and important updates in near real-time. The user interface would also allow some transactions which would result in database changes.
Our thoughts are to use a stack with Hibernate and Flex (see http://dl.dropbox.com/u/1431390/overview.jpg), but we are open to using any free/open-source technology. There are a few issues we are unsure about with our proposed stack:
1) How do we automatically update the POJOs with database changes? As far as I understand it, there is no way for Hibernate to know about changes made outside its own session. Therefore, some sort of polling would have to be done to pick up new and changed records.
2) We were planning to push the data to datagrids within a Flex UI (using BlazeDS or WebORB). This seems to rely on identifying the changes and pushing them as updates down the channel. However, if we use the Hibernate->POJO approach, identifying these changes could be fairly complex, as we will simply have refreshed the data. Is there a better solution which will push the changes on the fly? I would have thought this was a common requirement, but I can't find much information online.
Any advice would be gratefully appreciated on either the architecture or the specific issues.
Many thanks,
Ken
For 1) - Use polling, or if you have the budget, use a database that supports pushing JMS messages from triggers (DB2, Oracle, MS SQL Server).
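For the polling route, a bare-bones sketch (it assumes an indexed last-modified column that the external system maintains; all names are illustrative):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

// Every few seconds, fetch the rows the external system has touched since
// the last poll. Assumes an indexed LAST_MODIFIED column maintained by the
// external system; table and column names are illustrative.
public class ChangePoller {

    private Timestamp lastSeen = new Timestamp(0L);

    public void poll(Connection conn) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(
                "SELECT id, status, last_modified FROM orders "
                + "WHERE last_modified > ? ORDER BY last_modified");
        try {
            stmt.setTimestamp(1, lastSeen);
            ResultSet rs = stmt.executeQuery();
            while (rs.next()) {
                lastSeen = rs.getTimestamp("last_modified");
                // Push the changed row down the BlazeDS/WebORB channel here.
            }
        } finally {
            stmt.close();
        }
    }
}
```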
For 2) - There is a commercial product built by Adobe which can solve this problem more easily (it has the feature that you are looking for). It has a steep learning curve and is targeted at the enterprise. Otherwise you will have to implement your own solution: refresh only the changed data, etc.
