[Background]
- There are two Java applications (A and B), and they can only communicate via an Oracle DB
- A and B share the same database table
- A and B each store the data in a cache
[Problem]
If A performs a simple transaction (insert/update/delete), the cache in A is updated. The cache in B should then be updated automatically as well!
[Current Status]
Two solutions I found and tried
- Solution1) Using DatabaseChangeListener
- Solution2) Using Socket Programming
[Question]
The solution will be used at my company, and I would like to know whether there is anything I can improve in my solutions.
1) What could be disadvantages if I use DatabaseChangeListener?
2) What could be disadvantages if I use socket programming? (Maybe it's so low-level that developers cannot maintain it due to company policy?)
3) I heard there are third-party caches that also support synchronization. Am I correct?
Please let me know if you need more information!
Thank you very much in advance!
[EDIT]
It would be much appreciated if you could leave a comment when you down-vote this. I would like to know how I can improve this question with your feedback! Thank you
Your question appears every now and then with slightly different aspects. One useful answer to that is here: Guava Cache, how to block access while doing removal
About using the DatabaseChangeListener:
Although you are fine with Oracle, I would discourage the use of vendor-specific interfaces. For me, it would be okay to use them as a performance optimization, but I would never use vendor-specific interfaces for basic functionality.
Second, the usage of the change listener may still lead to dirty reads.
About "distributed caches" as veritas suggested:
There is a difference between distributed caches and clustered caches. Distributed caches spread (aka distribute) the cached data across different nodes; clustered caches are caches for clustered applications that keep track of data consistency within the cluster. A distributed cache usually is a clustered cache, but not the other way around. For a general idea on the topic I recommend the Infinispan documentation on clustering as an intro: http://infinispan.org/docs/7.0.x/user_guide/user_guide.html#_clustering
Wrap up:
A clustered cache implementation is the thing you need. However, if you want data consistency, you still need to carefully design your transaction handling.
You can, of course, also do socket communication yourself and send simple object invalidate messages to the other applications. The challenging part is the error handling. When was the invalidate successful? Is there a timeout for the other nodes to acknowledge? When to drop a node and maintain a cluster state at all?
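To make the error-handling questions a bit more concrete, here is a minimal sketch of an invalidation broadcast with an acknowledgement timeout. Everything in it is an assumption for illustration: `CachePeer`, `InProcessPeer` and `InvalidationBroadcaster` are hypothetical names, and the peer is an in-process stand-in where a real implementation would talk over a socket. It only answers "who did not acknowledge in time"; what to do with such a node (retry, drop it from the cluster) is still up to you.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical abstraction of one remote application that holds a cache. */
interface CachePeer {
    String name();

    /** Asks the peer to drop the key; the future completes once the peer acknowledges. */
    CompletableFuture<Void> invalidate(String key);
}

/** In-process stand-in for a peer; a real one would send the message over a socket. */
final class InProcessPeer implements CachePeer {
    final Map<String, String> cache = new ConcurrentHashMap<>();
    private final String name;
    private final boolean responsive;

    InProcessPeer(String name, boolean responsive) {
        this.name = name;
        this.responsive = responsive;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public CompletableFuture<Void> invalidate(String key) {
        if (!responsive) {
            return new CompletableFuture<>(); // never completes: simulates a hung node
        }
        cache.remove(key);
        return CompletableFuture.completedFuture(null);
    }
}

/** Sends an invalidation to every peer and reports the peers that did not acknowledge in time. */
final class InvalidationBroadcaster {
    private final List<CachePeer> peers;
    private final long ackTimeoutMillis;

    InvalidationBroadcaster(List<? extends CachePeer> peers, long ackTimeoutMillis) {
        this.peers = List.copyOf(peers);
        this.ackTimeoutMillis = ackTimeoutMillis;
    }

    List<String> invalidate(String key) {
        // Send all requests first so one slow peer does not delay the others.
        Map<CachePeer, CompletableFuture<Void>> pending = new LinkedHashMap<>();
        for (CachePeer peer : peers) {
            pending.put(peer, peer.invalidate(key));
        }
        // One shared deadline for all acknowledgements.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(ackTimeoutMillis);
        List<String> unacknowledged = new ArrayList<>();
        for (Map.Entry<CachePeer, CompletableFuture<Void>> entry : pending.entrySet()) {
            long remaining = Math.max(0L, deadline - System.nanoTime());
            try {
                entry.getValue().get(remaining, TimeUnit.NANOSECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                unacknowledged.add(entry.getKey().name());
            } catch (ExecutionException | TimeoutException e) {
                unacknowledged.add(entry.getKey().name());
            }
        }
        return unacknowledged;
    }
}
```

Calling `new InvalidationBroadcaster(peers, 50).invalidate("42")` returns the names of the peers that failed or timed out, which is the point where the cluster-state decisions start.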
I would suggest a third-party cache if you have many similar use cases or many tables that need to be updated.
Please read about the Terracotta distributed cache.
It gives you exactly what you want.
You can also look at Hazelcast or Memcached.
Within our company it's kind of standard to create repositories for data that is originally stored in the database, as described for example in https://thinkinginobjects.com/2012/08/26/dont-use-dao-use-repository/.
Our web infrastructure consists of a few independent web applications within Tomcat 7 for printing, product description, product order (this is not persisted in the database!), category description etc.
They are all built on the Servlet 2 API.
So each instance/implementation of a repository holds a specialised kind of data represented by serializable classes, and the instances of these serializable classes are set up/filled by a periodically executed database query (for every result row the setters of the fields are called; this reminds me of domain-oriented entity beans with CMP).
The repositories are initialized in the servlets' init sequences (so every servlet keeps its own set of instances).
Each context has its own connection to the Oracle database (set up by a resource description file on deployment).
All the data is read only, we never need to write back to the database.
Because we need some of these data types for more than one web application (context), and some even for more than one servlet within the same web context, repositories with an identical data type are instantiated more than once, e.g. four times, twice within the same application.
In the end some of the data is duplicated and I'm not sure if this is as clever and efficient as it should be. It should be possible to share the same repository object with more than one application (JNDI?), but at the very least it must be possible to share it between several servlets within the same application context.
Besides, I'm irritated by the idea of using a "self-built" repository instead of a well-tested, openly developed cache (ehcache, jcs, ...), because some of these caches also provide options for distributed caches (so it should also work within the same container).
If certain entries are searched for, the search algorithm iterates over all entries in the repository (see link above). For every search pattern there are specialised functions which are called directly from within the business logic classes using the "entity beans"; there's no specification object or interface.
In the end the application server as a whole does not perform that well and it uses a hell of a lot of RAM (at least for approximately 10000 DB entries); in my opinion this is most probably correlated with the use of serializable XSD-to-JAXB-generated classes.
Additionally, every time an application is deployed for tests you have to wait at least two minutes until all entries of the database have been loaded into the repositories; when deploying on live there's a clearly noticeable out-of-service phase on context/servlet start-up.
I tend to think all of this is closely related to the solutions I described above.
Because I haven't got any experience in this field and I'm new in the company, I don't want to be too obtrusive.
Maybe you can help me to evaluate ideas for a better setup:
Is it better for performance and memory to unify all the repositories into one "repository servlet" and request objects from there via HTTP (I don't think so, though it seems quite modular/distributed-system friendly), or should I try to go with JNDI (never did that before) and connect to the repository similarly to a JDBC database?
Wouldn't it be even more sensible, faster and more efficient to at least use only one single connection pool for the whole Tomcat (and reference this connection pool from within the web apps' deployment descriptors)? Or might that slow down connections or limit them in some other respect?
I was told that the cache system (ehcache) didn't work well (at least not with the performance of the self written solution - though: I can't believe that). I imagine the usage of repositories backed by a distributed (as across all contexts) cache used in all web applications should not only reduce memory footprint significantly but should not be significantly slower. I believe it will be faster and have shorter start-up times, and it shouldn't need to be redeployed that often.
I'm very grateful for every tip or hint and your thoughts. Would be marvellous to get a peer review of my ideas based on practical experiences.
So thank you very much in advance!
Is it better to hold a repository for every web application (context) or is it better to share a common instance by JNDI or a similar technique?
Unless someone proves me wrong, I would say there is no way to do it in a standard way, meaning as defined in the Servlet Spec or in the rest of the Java EE spec canon.
There are technical ways to do it which probably depend on a specific application server implementation, but this cannot be "better" in its universal sense.
If you have two applications that operate on the same data, I wonder whether the partitioning of the applications is useful. Maybe all functionality operating on some kind of data needs to be in the same application?
within our company it's kind of standard to create repositories for data which is originally stored in the database as described for example in https://thinkinginobjects.com/2012/08/26/dont-use-dao-use-repository/.
I looked up Evans on our bookshelf. The blog post is quite weird. A repository and a DAO are basically the same thing: both provide CRUD operations for an object or for a tree of objects (Evans says only for the aggregate roots).
The repositories are initialized in the servlets' init sequences (so every servlet keeps its own set of instances). Each context has its own connection to the Oracle database (set up by a resource description file on deployment). [ ... ]
In the end the application server as a whole does not perform that well and it uses a hell of a lot of RAM
When something performs badly, it's best to do profiling, e.g. with YourKit, or with perf and FlameGraphs if you are on Linux. If your applications need a lot of RAM, analyze the heap, e.g. with Eclipse MAT. There is no way somebody can give you a recommendation or a hint on best practice without seeing a single line of code.
A general answer would include anything about performance tuning for Oracle DBs, JDBC, Java Collections and Concurrent Programming, Networking and Operating Systems.
I was told that the cache system (ehcache) didn't work well (at least not with the performance of the self written solution - though: I can't believe that)
I can. EHCache is 10 to 20 times slower than a simple HashMap. See: cache benchmarks. You only need a map when you do a complete preload and don't have any mutations.
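As a rough illustration of "you only need a map", here is a minimal sketch of a read-only repository that preloads everything into an immutable map and swaps in a fresh copy on reload. `PreloadedRepository` and its `loader` are hypothetical names; in the setup from the question the loader would be the periodically executed database query. Note that `Map.copyOf` rejects null keys and values, so this assumes the loaded data contains none.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Read-only repository: a complete preload into an immutable map, replaced as a whole on reload. */
final class PreloadedRepository<K, V> {
    // Hypothetical loader; in a real application this would run the database query.
    private final Supplier<Map<K, V>> loader;

    // Readers always see either the old or the new snapshot, never a half-built one.
    private volatile Map<K, V> snapshot;

    PreloadedRepository(Supplier<Map<K, V>> loader) {
        this.loader = loader;
        reload();
    }

    /** Builds the new snapshot off to the side, then publishes it with a single volatile write. */
    void reload() {
        snapshot = Map.copyOf(loader.get());
    }

    V get(K key) {
        return snapshot.get(key);
    }

    int size() {
        return snapshot.size();
    }
}
```

Because the map is never mutated after publication, lookups need no locking at all; the price is that changes only become visible after the next `reload()`.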
I imagine the usage of repositories backed by a distributed (as across all contexts) cache used in all web applications should not only reduce memory footprint significantly but should not be significantly slower
Distributed caches need to go over the network and add serialization/deserialization overhead. That's probably another factor of 30 slower. When is the distributed cache updated?
I'm very grateful for every tip or hint and your thoughts.
Wrap up:
Do the normal software engineering homework: do profiling and analysis, and spend the tuning effort in the right places.
Ask specific questions on one topic on Stack Overflow and share your code and performance data. Ask a question about one thing at a time and read https://stackoverflow.com/help/on-topic
You may also come to the conclusion that there is nothing to tune. There are applications out there that need a day to build up an in-memory data structure from persistent data. Maybe it's just a lot of data? If you do not like the downtime, use blue-green deployment. Also use smaller data sets for development and testing.
In a technical discussion, I was asked how to maintain a single instance across nodes,
and I answered with the approaches below:
1) DB based solution
2) Distributed Cache
3) Sharding
4) Maintain the Singleton single Instance in load balancer
The interviewer was expecting something more, since he replied that
the DB-based and cache approaches would work, that sharding would not work, and made no comment on the load balancer approach. He then added: let us assume that the DB and cache approaches are not allowed in your application; give me another option.
I was stuck at this point.
Then I googled and found the following
How to create singleton java class for multiple jvm support?
Singleton in Cluster environment
https://javaarchitectforum.com/2013/02/19/singleton-design-pattern-with-example/
Also found some support from the application servers
http://www.onjava.com/pub/a/onjava/2003/08/20/jboss_clustering.html
http://docs.oracle.com/cd/E12840_01/wls/docs103/cluster/service_migration.html#wp1051458
http://docs.oracle.com/cd/E24152_01/Platform.10-1/ATGPlatformProgGuide/html/s1005runningthesameschedulableservice01.html
Kindly help me with your thoughts on which would be the best approach to implement a single instance (singleton) across nodes.
All options will fail if you do not know what you are doing. I'd like to call the singleton an anti-pattern rather than a design pattern, because the way it is often used is usually a recipe for disaster.
Now, let's get to the answer. You should ask the interviewer: why is a singleton really needed? What state really needs to be stored at one specific location? Is this state mutable? What is the concurrency strategy? Will there be mostly writes? Or mostly reads? Does all access have to be synchronised?
You are thinking in the wrong direction because you think a solution to this problem actually exists. Well, you will find a way, but it will be a compromise between concurrent performance and absolute synchronous access.
Now, let us quickly walk through the options you have provided yourself:
1) DB based solution
This could work, but you have to ensure your locking strategy is supported by the database, and that you use it wisely.
2) Distributed Cache
This is essentially the same as a DB based solution. You could even write your own microservice to do this job. You should realise that a distributed cache is essentially a database, only one that is optimised for concurrent reads on the same data (and that has an expiration strategy). Keep in mind that, if not explained well, a caching solution may sound as if you are not aware that a cache usually invalidates/expires entries and has a just-in-time generation fallback. This may sound as if you did not understand the problem of wanting to have a singleton instance "alive" at all times.
3) Sharding
Sharding is a technique you can use if your singleton is too big to fit into the single-database solution mentioned before. I highly doubt that your singleton will get that big.
4) Maintain the Singleton single Instance in load balancer
This makes your load balancer schizophrenic and is a bad idea. Load balancers are really simple components, and should stay that way.
It is possible that you might benefit from STM (http://sw1nn.com/blog/2012/04/11/clojure-stm-what-why-how/), which is a technique in Clojure and thus should work on the JVM. Furthermore, I agree with everything Jan-Willem Gmelig Meyling has said above. This is an anti-pattern you should avoid. My STM suggestion is specifically in a language that does not allow you to mutate state, which is a prerequisite for such things not to become dangerous (it would prevent the schizophrenic issue JWGM talks about).
In Java, I have a HashMap containing objects (which can be serializable, if it helps). Elsewhere on a network, I have another HashMap in another copy of the application that I would like to stay in sync with the first.
For example, if on computer A someone runs myMap.put("Hello", "World"); and on computer B someone runs myMap.put("foo", "bar");, then after some time delay for changes to propagate, both computers would have myMap.get("Hello") == "World" and myMap.get("foo") == "bar".
Is this requirement met by an existing facility in the Java language, a library, or some other program? If this is already a "solved problem" it would be great not to have to write my own code for this.
If there are multiple ways of achieving this I would prefer, in priority order:
Changes are guaranteed to propagate 100% of the time (doesn't matter how long it takes)
Changes propagate rapidly
Changes propagate with minimal bandwidth use between computers.
(Note: I have had trouble searching for solutions as results are dominated by questions about synchronizing access to a Map from multiple threads in the same application. This is not what my question is about.)
You could look at the Hazelcast in-memory database.
It's an open source solution designed for distributed architectures.
It maps really well to your problem since the Hazelcast IMap extends java.util.Map.
Link: Hazelcast IMap
What you are trying to do is called clustering between two nodes.
Here are some solutions:
You can achieve your requirement using serialization: make your map serializable, then read and write the state of the map at regular intervals and sync it. This is the core, basic way to achieve your functionality, but with serialization you have to manage the sync of the map manually (i.e. you have to write the code for that).
Hazelcast is an open source distributed caching mechanism. It is the best API and has a rich library for setting up a cluster environment and sharing data between different nodes.
Coherence Web by Oracle also provides a mechanism to achieve clustering.
Ehcache is a cache library introduced in 2003 to improve performance by reducing the load on underlying resources. Ehcache is used both for general-purpose caching and for caching Hibernate (second-level cache), data access objects, security credentials, and web pages. It can also be used for SOAP and RESTful server caching, application persistence, and distributed caching.
Among all of the above, Hazelcast is the best API; going through it will surely help you.
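For the manual serialization option above, a minimal sketch could look like the following. `MapSnapshotSync` is a hypothetical helper, and the network transfer itself is left out: the byte array is simply what one node would send to the other at each interval. It is deliberately naive: removals are not propagated, conflicting keys are overwritten by whichever snapshot is merged last, and Java deserialization should only be used on data from a trusted peer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Snapshot-based sync: serialize the whole map, ship the bytes, merge them on the other side. */
final class MapSnapshotSync {
    private MapSnapshotSync() {
    }

    /** Serializes a copy of the map; these bytes are what would travel over the network. */
    static byte[] toBytes(Map<String, String> map) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new HashMap<>(map));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds the map from a snapshot received from a trusted peer. */
    @SuppressWarnings("unchecked")
    static Map<String, String> fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Map<String, String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Snapshot contains an unknown class", e);
        }
    }

    /** Merges a remote snapshot into the local map; remote entries win on key conflicts. */
    static void mergeInto(Map<String, String> local, byte[] remoteSnapshot) {
        local.putAll(fromBytes(remoteSnapshot));
    }
}
```

With the example from the question, merging A's snapshot into B and B's snapshot into A leaves both maps with "Hello" and "foo". Everything beyond that (deletes, conflict resolution, retries) is the code you would have to write yourself, which is the main argument for a library.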
Currently we have two app servers; each has an application-level cache, and there is a centralized database server. To keep both servers' app caches in sync we have set up a JMS broker in between. When a cache entry is cleared on one server, it sends a message to JMS; since the other server is registered, it receives the message and clears the particular entry based on the message content.
Since this messaging system adds latency in clearing the cache entry, for some amount of time there will be inconsistency between the application-level caches.
So we thought of having a centralized cache server to avoid all the extra work needed to keep all caches in sync.
We are thinking of using Ehcache/Terracotta or Hazelcast; these caches hold result sets, lock info, and some system-specific variables.
Please suggest best cache solution for us.
I probably can't suggest the best solution for you but I'll try to give some ideas:
Hazelcast: offers a very easy to use distributed map (and lots of other things worth having a look at; distributed SQL query is very neat):
Map<String, Object> map = Hazelcast.getMap("xxx");
and you are done. Work on the map using standard APIs. Hazelcast config/setup is quite easy (compared to Ehcache/TC). The monitoring webapp is also easy to use and helpful, but there are things missing. Performance should be more than sufficient for a small cluster (like your 2 servers).
Ehcache/Terracotta: would introduce a new infrastructure component to your setup (Terracotta Server) - may be a downside. Using this setup is in my experience quite intense in terms of things to learn and try out. The promise is enterprise class level performance and monitoring facilities.
If you don't have extreme high performance requirements I personally would go for Hazelcast and avoid the complexity of Ehcache/TC.
We have been using a centralized Memcached server (as Hibernate 2nd level cache and for other caching requirements) and it's working well for us. We are using Memcached with the XMemcached client and so far it's working without any problem.
I am working with an object that serves as a database in my application. However, I need to have redundant copies of this database. So, on init, I create multiple (say 5) copies of the same object. (I am using Java for this, so any hint of pre-existing libraries could be helpful as well.)
The object is a server that listens on a port for requests for the information it is holding. This information may be updated by other entities via the same or a different port at any time.
My question is as follows:
Would a lock strategy work in this case? That is, every time an update is made in any instance, that instance contacts all other instances and passes the update. During this time, all the requests (read or update) from other entities are queued.
Would this approach work? I have my doubts because, even if this works, I think the system is creating its own bottleneck. What do you guys say? Is there a better way of doing this distributed synchronization?
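To make the strategy concrete, here is a minimal single-process sketch of it. `LockedReplicaSet` is a hypothetical name, the replicas are plain maps rather than servers listening on ports, and the one shared lock is an assumption standing in for whatever distributed locking the real system would need. An update takes the write lock and is applied to every copy; reads queue behind it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Keeps several copies of the same data and updates all of them under one lock. */
final class LockedReplicaSet {
    private final List<Map<String, String>> replicas = new ArrayList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    LockedReplicaSet(int copies) {
        for (int i = 0; i < copies; i++) {
            replicas.add(new HashMap<>());
        }
    }

    /** Applies the update to every replica; all other requests wait until it is done. */
    void update(String key, String value) {
        lock.writeLock().lock();
        try {
            for (Map<String, String> replica : replicas) {
                replica.put(key, value); // in a real system this would be a network call per replica
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Reads from one replica; concurrent reads are allowed, but they block during an update. */
    String read(int replicaIndex, String key) {
        lock.readLock().lock();
        try {
            return replicas.get(replicaIndex).get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    int replicaCount() {
        return replicas.size();
    }
}
```

Even in this toy form the cost is visible: every update holds up all readers for as long as the slowest replica takes, and the sketch says nothing about what happens when one replica cannot be reached.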
What you're describing is a distributed cache. The big player in that space is currently Coherence though I believe JBoss Cache is catching up.
As for rolling your own, having seen the complexity in what superficially sounds like quite a simple problem, I wouldn't recommend it in a commercial setting, though it'd be a fun home project.
Are you talking about a distributed cache? Have you looked at ehcache?
Would this approach work? I have my doubts because, even if this works, I think the system is creating its own bottleneck.
It would be creating its own bottleneck. You'd be better off using an in-memory database like HSQLDB or an embedded database like SQLite.
There is a lot more to distributed synchronization than it's possible to mention in a single answer. You have to worry about two-phase commits, network partitions, etc. I would advise you to look into an existing distributed DB solution combined with an n-tier Java EE architecture that includes load-balancing.