how to keep object instances synchronized - java

I am working with an object that serves as a database in my application. However, I need redundant copies of this database, so on init I create multiple copies (say 5) of the same object. (I am using Java for this, so any hint about pre-existing libraries would be helpful as well.)
The object is a server that listens on a port for requests for the information it holds. This information may be updated by other entities via the same or a different port at any time.
My question is as follows:
Would a lock strategy work in this case? That is, every time an update is made in any instance, that instance contacts all other instances and passes the update. During this time, all requests (read or update) from other entities are queued.
Would this approach work? I have my doubts because, even if this works, I think the system is creating its own bottleneck. What do you guys say? Is there a better way of doing this distributed synchronization?
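To make the idea concrete, here is a rough sketch of what I have in mind (the Replica interface is just a stand-in for the remote calls; all names are made up). Note how the single lock that keeps the copies consistent also queues every read behind every propagating write:

    // Sketch of the proposed lock-and-propagate scheme (hypothetical names).
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    interface Replica {
        void apply(String key, String value); // a remote call in the real system
    }

    class ReplicatedStore {
        private final Map<String, String> data = new HashMap<>();
        private final List<Replica> peers; // the other copies (e.g. 4 of them)

        ReplicatedStore(List<Replica> peers) {
            this.peers = peers;
        }

        synchronized String get(String key) {
            return data.get(key); // blocked while any update is propagating
        }

        synchronized void put(String key, String value) {
            data.put(key, value);
            for (Replica peer : peers) {
                peer.apply(key, value); // all other requests queue behind this
            }
        }
    }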

What you're describing is a distributed cache. The big player in that space is currently Coherence, though I believe JBoss Cache is catching up.
As for rolling your own: having seen the complexity in what superficially sounds like quite a simple problem, I wouldn't recommend it in a commercial setting, though it would be a fun home project.

Are you talking about a distributed cache? Have you looked at ehcache?

Would this approach work? I have my doubts because, even if this works, I think the system is creating its own bottleneck.
It would be creating its own bottleneck. You'd be better off using an in-memory database like HSQLDB or an embedded database like SQLite.
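For example, an in-memory HSQLDB needs nothing more than a JDBC URL (the database name mymemdb below is arbitrary):

    // Minimal in-memory HSQLDB example; needs hsqldb.jar on the classpath.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class InMemoryDbDemo {
        public static void main(String[] args) throws Exception {
            // "mem:" keeps the whole database in RAM; nothing is written to disk
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:hsqldb:mem:mymemdb", "SA", "");
                 Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE info (k VARCHAR(64) PRIMARY KEY, v VARCHAR(256))");
                st.execute("INSERT INTO info VALUES ('Hello', 'World')");
                try (ResultSet rs = st.executeQuery("SELECT v FROM info WHERE k = 'Hello'")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }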

There is a lot more to distributed synchronization than can be covered in a single answer. You have to worry about two-phase commits, network partitions, etc. I would advise you to look into an existing distributed DB solution combined with an n-tier Java EE architecture that includes load balancing.

Related

Is it better to hold a repository for every web application (context) or is it better to share a common instance by JNDI or a similar technique

Within our company it's kind of a standard to create repositories for data which is originally stored in the database, as described for example in https://thinkinginobjects.com/2012/08/26/dont-use-dao-use-repository/.
Our web infrastructure consists of a few independent web applications within Tomcat 7 for printing, product description, product order (this is not persisted in the database!), category description, etc.
They are all built on the Servlet 2 API.
So each instance/implementation of a repository holds a specialised kind of data represented by serializable classes, and the instances of these serializable classes are set up/filled by a periodically executed database query (for every result row the setters of the fields are called; it reminds me of domain-oriented entity beans with CMP).
The repositories are initialized in the servlets' init sequences (so every servlet keeps its own set of instances).
Each context has its own connection to the Oracle database (set up by a resource description file on deployment).
All the data is read only, we never need to write back to the database.
Because we need some of these data types for more than one web application (context), and some even for more than one servlet within the same web context, repositories holding identical data types are instantiated more than once - e.g. four times, twice within the same application.
In the end some of the data is duplicated, and I'm not sure this is as clever and efficient as it should be. It should be possible to share the same repository object with more than one application (JNDI?), but at the very least it must be possible to share it between several servlets within the same application context.
Besides that, I'm irritated by the idea of using a "self-built" repository instead of something like a well-tested, openly developed cache (Ehcache, JCS, ...), because some of these caches also provide options for distributed caching (so it should also work within the same container).
If certain entries are searched for, the search algorithm iterates over all entries in the repository (see the link above). For every search pattern there are specialised functions which are called directly from within the business logic classes using the "entity beans"; there is no specification object or interface.
In the end the application server as a whole does not perform that well, and it uses an awful lot of RAM (at least for approximately 10,000 DB entries); in my opinion this is most probably correlated with the use of serializable XSD-to-JAXB-generated classes.
Additionally, every time an application is deployed for tests you have to wait at least two minutes until all entries of the database have been loaded into the repositories - when deploying to live there is a clearly noticeable out-of-service phase on context/servlet start-up.
I tend to think all of this is closely related to the solutions I described above.
Because I haven't got any experience in this field and I'm new in the company, I don't want to be too obtrusive.
Maybe you can help me to evaluate ideas for a better setup:
Is it better for performance and memory to unify all the repositories into one "repository servlet" and request objects from there via HTTP (I don't think so, though it seems quite modular and distributed-system friendly), or should I try to go with JNDI (I've never done that before) and connect to the repository similarly to a JDBC database?
Wouldn't it be even more sensible, faster and more efficient to use at least a single connection pool for the whole of Tomcat (and reference this connection pool from within the web apps' deployment descriptors)? Or might that slow down connections or limit them in some other respect?
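(For clarity, what I mean by referencing a shared pool is roughly this kind of JNDI lookup from within a servlet; the resource name jdbc/SharedOracleDS is made up and would have to match the configured <Resource> entry:)

    // Sketch of a JNDI lookup of a container-managed connection pool.
    import java.sql.Connection;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;

    public class SharedPoolLookup {
        static Connection openConnection() throws Exception {
            InitialContext ctx = new InitialContext();
            DataSource ds = (DataSource) ctx.lookup("java:comp/env/jdbc/SharedOracleDS");
            return ds.getConnection(); // pooled connection managed by the container
        }
    }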
I was told that the cache system (Ehcache) didn't work well (at least not with the performance of the self-written solution - though I can't believe that). I imagine that repositories backed by a distributed cache (as in: across all contexts) used in all web applications would not only reduce the memory footprint significantly but also not be significantly slower - I believe it would be faster, have shorter start-up times, and wouldn't need to be redeployed that often.
I'm very grateful for every tip or hint and your thoughts. Would be marvellous to get a peer review of my ideas based on practical experiences.
So thank you very much in advance!
Is it better to hold a repository for every web application (context) or is it better to share a common instance by JNDI or a similar technique
Unless someone proves otherwise, I would say there is no way to do it in a standard way, meaning as defined in the Servlet Spec or in the rest of the Java EE spec canon.
There are technical ways to do it which probably depend on a specific application server implementation, but this cannot be "better" in its universal sense.
If you have two applications that operate on the same data, I wonder whether the partitioning of the applications is useful. Maybe all functionality operating on some kind of data needs to be in the same application?
Within our company it's kind of a standard to create repositories for data which is originally stored in the database, as described for example in https://thinkinginobjects.com/2012/08/26/dont-use-dao-use-repository/.
I looked up Evans on our book shelf. The blog post is quite weird. A repository and a DAO are basically the same thing: they provide CRUD operations for an object or for a tree of objects (Evans says only for the aggregate roots).
The repositories are initialized in the servlets' init sequences (so every servlet keeps its own set of instances). Each context has its own connection to the Oracle database (set up by a resource description file on deployment). [ ... ]
In the end the application server as a whole does not perform that well and it uses an awful lot of RAM
When something performs badly, it's best to do profiling, e.g. with YourKit, or with perf and flame graphs if you are on Linux. If your applications need a lot of RAM, analyze the heap, e.g. with Eclipse MAT. There is no way anybody can give you a recommendation or a best-practice hint without seeing a single line of code.
A general answer would have to cover everything about performance tuning for Oracle DBs, JDBC, Java collections and concurrent programming, networking, and operating systems.
I was told that the cache system (Ehcache) didn't work well (at least not with the performance of the self-written solution - though I can't believe that)
I can. Ehcache is between 10 and 20 times slower than a simple HashMap; see these cache benchmarks. You only need a map when you do a complete preload and don't have any mutations.
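If that is your situation - one bulk load at start-up and read-only access afterwards - a repository can be as simple as an unmodifiable HashMap behind a final field. A minimal sketch (Product is a stand-in for your JAXB-generated classes):

    // Preload-once, read-only repository: no locking needed because the
    // map is never mutated after construction.
    import java.util.Collection;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    class Product {
        private final long id;
        Product(long id) { this.id = id; }
        long getId() { return id; }
    }

    class ProductRepository {
        private final Map<Long, Product> byId;

        ProductRepository(Collection<Product> preloaded) {
            Map<Long, Product> m = new HashMap<>();
            for (Product p : preloaded) {
                m.put(p.getId(), p);
            }
            // final field + unmodifiable wrapper = safe to share across threads
            this.byId = Collections.unmodifiableMap(m);
        }

        Product findById(long id) {
            return byId.get(id);
        }
    }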
I imagine that repositories backed by a distributed cache (as in: across all contexts) used in all web applications would not only reduce the memory footprint significantly but also not be significantly slower
Distributed caches need to go over the network and add serialization/deserialization overhead. That's probably another factor of 30 slower. And when is the distributed cache updated?
I'm very grateful for every tip or hint and your thoughts.
Wrap up:
Do the normal software engineering homework: profile, analyze, and spend the tuning effort in the right places.
Ask specific questions on one topic on Stack Overflow and share your code and performance data. Ask about one thing at a time and read https://stackoverflow.com/help/on-topic
You may also come to the conclusion that there is nothing to tune. There are applications out there that need a day to build up an in-memory data structure from persistent data. Maybe it's just a lot of data? If you don't like the downtime, use blue-green deployment. Also use smaller data sets for development and testing.

Single instance(singleton) of a class in a java application deployed in many nodes

In a technical discussion, I was asked how to maintain a single instance across nodes. I answered with the approaches below:
1) DB-based solution
2) Distributed cache
3) Sharding
4) Maintain the single instance in the load balancer
The interviewer was expecting something more, since he replied that the DB-based and cache approaches would work, sharding would not, and he had no comment on the load balancer approach. He then added: let us assume that the DB and cache approaches are not allowed in your application; give me another option.
I was stuck at this point.
Then I googled and found the following
How to create singleton java class for multiple jvm support?
Singleton in Cluster environment
https://javaarchitectforum.com/2013/02/19/singleton-design-pattern-with-example/
I also found some support from application servers:
http://www.onjava.com/pub/a/onjava/2003/08/20/jboss_clustering.html
http://docs.oracle.com/cd/E12840_01/wls/docs103/cluster/service_migration.html#wp1051458
http://docs.oracle.com/cd/E24152_01/Platform.10-1/ATGPlatformProgGuide/html/s1005runningthesameschedulableservice01.html
Kindly help me with your thoughts on which would be the best approach to implement a single instance (singleton) across nodes.
All options will fail if you do not know what you are doing. I'd like to call the singleton an anti-pattern rather than a design pattern, because the way it is often used is usually a recipe for disaster.
Now, let's get to the answer. You should ask the interviewer: why is a singleton really needed? What state really needs to be stored in one specific location? Is this state mutable? What is the concurrency strategy? Will there be mostly writes, or mostly reads? Does all access have to be synchronised?
You are thinking in the wrong direction if you think a clean solution to this problem actually exists. You will find a way, but it will be a compromise between concurrent performance and absolutely synchronous access.
Now, let us quickly walk through the options you have provided yourself:
1) DB based solution
This could work, but you have to ensure your locking strategy is supported by the database, and that you use it wisely (a minimal sketch follows after this list).
2) Distributed Cache
This is essentially the same as a DB-based solution. You could even write your own microservice to do this job. You should realise that a distributed cache is exactly the same as a database, only one that is optimised for concurrent reads of the same data (and that has an expiration strategy). Keep in mind that, if not well explained, a caching answer may sound as if you are not aware that a cache usually invalidates/expires and has a just-in-time generation fallback; that may come across as if you did not understand the problem of wanting to keep a singleton instance "alive" at all times.
3) Sharding
Sharding is a technique you can use if your singleton is too big to fit into the one-database solution mentioned before. I highly doubt that your singleton will get that big.
4) Maintain the Singleton single Instance in load balancer
This makes your load balancer schizophrenic and is a bad idea. Load balancers are really simple components, and should stay that way.
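To make option 1 concrete, here is a minimal sketch of a DB-based lock, assuming a hypothetical singleton_lock table with one row per role. Whichever node holds the row lock plays the singleton until it commits:

    // Sketch: DB-based "only one node acts" via SELECT ... FOR UPDATE.
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    class DbSingletonGuard {
        void runAsSingleton(Connection conn) throws Exception {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT owner FROM singleton_lock WHERE name = ? FOR UPDATE")) {
                ps.setString(1, "scheduled-job");
                try (ResultSet rs = ps.executeQuery()) {
                    // The row is now locked; other nodes block here (or skip with NOWAIT).
                    doTheSingletonWork();
                }
                conn.commit(); // releases the lock
            }
        }

        private void doTheSingletonWork() {
            // whatever only one node may do at a time
        }
    }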
It is possible that you might benefit from STM (see http://sw1nn.com/blog/2012/04/11/clojure-stm-what-why-how/), a technique from Clojure that should therefore work on the JVM. I furthermore agree with everything Jan-Willem Gmelig Meyling said above: this is an anti-pattern you should avoid. My STM suggestion comes specifically from a language that does not allow you to mutate state, which is a precondition for such things not becoming dangerous (it would prevent the "schizophrenic" issue he talks about).

Synchronize between two Java applications via Database

Background:
- There are two Java applications (A and B), and they can only communicate via an Oracle DB
- A and B share the same database table
- A and B each store the data in a cache
Problem:
If A performs a simple transaction (insert/update/delete), the cache in A is updated. The cache in B should be updated automatically as well!
Current Status:
Two solutions I found and tried:
- Solution 1) Using DatabaseChangeListener
- Solution 2) Using socket programming
Question:
The solution will be used in a company setting, and I would like to know if there is anything I can improve about my solutions.
1) What could be the disadvantages if I use DatabaseChangeListener?
2) What could be the disadvantages if I use socket programming? (Maybe it's so low-level that developers cannot maintain it under company policy?)
3) I heard there are 3rd-party caches that also support synchronization. Am I correct?
Please let me know if you need more information!
Thank you very much in advance!
[EDIT]
It would be much appreciated if you could leave a comment when you down-vote this. I would like to know how I can improve this question based on your feedback! Thank you
Your question appears every now and then with slightly different aspects. One useful answer to that is here: Guava Cache, how to block access while doing removal
About using the DatabaseChangeListener:
Although you are fine with Oracle, I would discourage the use of vendor-specific interfaces. For me it would be okay to use one as a performance optimization, but I would never use vendor-specific interfaces for basic functionality.
Second, the usage of the change listener may still lead to dirty reads.
About "distributed caches" as veritas suggested:
There is a difference between distributed caches and clustered caches. Distributed caches spread (i.e. distribute) the cached data across different nodes, while clustered caches are caches for clustered applications that keep track of data consistency within the cluster. A distributed cache is usually a clustered cache, but not the other way around. For a general idea of the topic I recommend the Infinispan documentation on clustering as an intro: http://infinispan.org/docs/7.0.x/user_guide/user_guide.html#_clustering
Wrap up:
A clustered cache implementation is the thing you need. However, if you want data consistency, you still need to carefully design your transaction handling.
You can, of course, also do the socket communication yourself and send simple object-invalidation messages to the other applications. The challenging part is the error handling: when was the invalidate successful? Is there a timeout for the other nodes to acknowledge? When do you drop a node, and how do you maintain the cluster state at all?
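A bare-bones version of such an invalidation message over plain sockets could look like this (the protocol and names are invented for illustration); notice how much of even this toy version is about the failure cases:

    // Sketch: pushing a cache-invalidation message to one peer node.
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.net.Socket;

    class InvalidationSender {
        boolean invalidate(String peerHost, int peerPort, String key) {
            try (Socket socket = new Socket(peerHost, peerPort)) {
                socket.setSoTimeout(2000); // how long do we wait for an ACK?
                DataOutputStream out = new DataOutputStream(socket.getOutputStream());
                DataInputStream in = new DataInputStream(socket.getInputStream());
                out.writeUTF("INVALIDATE " + key);
                out.flush();
                return "ACK".equals(in.readUTF()); // did the peer confirm?
            } catch (IOException e) {
                // Peer down, slow, or partitioned: retry? drop the node? alert?
                return false;
            }
        }
    }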
I suggest a 3rd-party cache if you have many similar use cases or many tables that need to be updated.
Please read about the Terracotta distributed cache.
It gives you exactly what you want.
You can also look at Hazelcast or Memcached.

Is there a standard way of synchronising a Map of objects across a network?

In Java, I have a HashMap containing objects (which can be serializable, if it helps). Elsewhere on a network, I have another HashMap in another copy of the application that I would like to stay in sync with the first.
For example, if on computer A someone runs myMap.put("Hello", "World"); and on computer B someone runs myMap.put("foo", "bar");, then after some time delay for changes to propagate, both computers would have myMap.get("Hello") == "World" and myMap.get("foo") == "bar".
Is this requirement met by an existing facility in the Java language, a library, or some other program? If this is already a "solved problem" it would be great not to have to write my own code for this.
If there are multiple ways of achieving this I would prefer, in priority order:
Changes are guaranteed to propagate 100% of the time (doesn't matter how long it takes)
Changes propagate rapidly
Changes propagate with minimal bandwidth use between computers.
(Note: I have had trouble searching for solutions as results are dominated by questions about synchronizing access to a Map from multiple threads in the same application. This is not what my question is about.)
You could look at the Hazelcast in-memory data grid.
It's an open-source solution designed for distributed architectures.
It maps really well to your problem, since the Hazelcast IMap extends java.util.Map.
Link: Hazelcast IMap
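Getting started is short. A sketch along these lines (the map name is arbitrary) gives every node that joins the same cluster a view of the same map:

    // Minimal Hazelcast sketch; run the same program on both computers and
    // the two "myMap" views converge automatically.
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import java.util.Map;

    public class SharedMapDemo {
        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            Map<String, String> myMap = hz.getMap("myMap"); // IMap extends java.util.Map
            myMap.put("Hello", "World"); // becomes visible on the other node
            System.out.println(myMap.get("Hello"));
        }
    }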
What you are trying to do is called clustering between two nodes. Here are some options:
1) You can achieve your requirement using serialization: make your map serializable, read and write the state of the map at regular intervals, and sync it (a sketch follows below this list). This is the most basic way to achieve the functionality, but with serialization you have to manage the synchronization of the map manually (i.e. you have to write code for it).
2) Hazelcast is an open-source distributed caching mechanism with a rich library for building a cluster environment and sharing data between different nodes.
3) Coherence Web, by Oracle, also provides a mechanism to achieve clustering.
4) Ehcache is a cache library introduced in 2003 to improve performance by reducing the load on underlying resources. It is used both for general-purpose caching and for caching Hibernate (second-level cache), data access objects, security credentials, and web pages. It can also be used for SOAP and RESTful server caching, application persistence, and distributed caching.
Of all of the above, Hazelcast is the best API; go through it and it will surely help you.
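A bare-bones sketch of option 1 (the path and interval are invented for illustration); note that noticing and merging concurrent changes is left entirely to you:

    // Sketch: periodically serialize the map so another node can reload it.
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    class MapSnapshotWriter {
        private final Map<String, String> myMap;
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        MapSnapshotWriter(Map<String, String> myMap) {
            this.myMap = myMap;
        }

        void start() {
            scheduler.scheduleAtFixedRate(() -> {
                try (ObjectOutputStream out = new ObjectOutputStream(
                        new FileOutputStream("/shared/map-state.ser"))) {
                    synchronized (myMap) { // consistent snapshot while copying
                        out.writeObject(new HashMap<>(myMap));
                    }
                } catch (IOException e) {
                    e.printStackTrace(); // real code needs proper error handling
                }
            }, 0, 5, TimeUnit.SECONDS);
        }
    }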

Highest Performance Database in Java

I need ideas for implementing a (really) high-performance in-memory database/storage mechanism in Java, in the range of storing 20,000+ Java objects, updated every 5 or so seconds.
Some options I am open to:
Pure JDBC/database combination
JDO
JPA/ORM/database combination
An Object Database
Other Storage Mechanisms
What is my best option? What are your experiences?
EDIT: I also need to be able to query these objects.
You could try something like Prevayler (basically an in-memory cache that handles serialization and backup for you so data persists and is transactionally safe). There are other similar projects.
I've used it for a large project, it's safe and extremely fast.
If it's the same set of 20,000 objects - or at least not 20,000 new objects every 5 seconds, but lots of changes - you might be better off caching the changes and periodically writing them in batch mode (JDBC batch updates are much faster than individual row updates). It depends on whether you need each write to be transactionally wrapped, and whether you need a record of the change logs or just the aggregate changes.
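As a sketch of that batching idea (the objects table and Change holder are made up), the JDBC side is just addBatch/executeBatch inside a single transaction:

    // Sketch: flushing accumulated changes as one JDBC batch.
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.util.List;

    class BatchFlusher {
        static final class Change {
            final long id;
            final String newValue;
            Change(long id, String newValue) { this.id = id; this.newValue = newValue; }
        }

        void flush(Connection conn, List<Change> pending) throws Exception {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE objects SET value = ? WHERE id = ?")) {
                for (Change c : pending) {
                    ps.setString(1, c.newValue);
                    ps.setLong(2, c.id);
                    ps.addBatch();
                }
                ps.executeBatch(); // one round trip instead of pending.size()
                conn.commit();
            }
        }
    }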
Edit: as other posts have mentioned Prevayler I thought I'd leave a note on what it does:
Basically you create a searchable/serializable object (typically a Map of some sort) which is wrapped in a Prevayler instance and serialized to disk. Rather than making changes directly to your map, you make changes by sending your Prevayler instance a serializable record of your change (just an object that contains the change instruction). Prevayler's version of a transaction is to write your serialized changes to disk so that, in the event of failure, it can load the last complete backup and then replay the changes against it. It's safe, although you do have to have enough memory to load all of your data, and it's a fairly old API, so no generic interfaces, unfortunately. But it's definitely stable and works as advertised.
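In code, that description translates roughly to the following (the PutTransaction class and directory name are illustrative; check the Prevayler docs for the exact factory methods):

    // Rough sketch of classic (pre-generics) Prevayler usage.
    import java.io.Serializable;
    import java.util.Date;
    import java.util.HashMap;
    import org.prevayler.Prevayler;
    import org.prevayler.PrevaylerFactory;
    import org.prevayler.Transaction;

    // The change instruction itself is what gets journaled to disk.
    class PutTransaction implements Transaction, Serializable {
        private final String key;
        private final String value;

        PutTransaction(String key, String value) {
            this.key = key;
            this.value = value;
        }

        public void executeOn(Object prevalentSystem, Date executionTime) {
            ((HashMap<String, String>) prevalentSystem).put(key, value);
        }
    }

    class PrevaylerDemo {
        public static void main(String[] args) throws Exception {
            Prevayler prevayler = PrevaylerFactory.createPrevayler(
                    new HashMap<String, String>(), "prevalence-base");
            prevayler.execute(new PutTransaction("Hello", "World")); // journaled, then applied
        }
    }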
I highly recommend H2. This is a kind of "second generation" version of HSQLDB done by one of the original authors. H2 allows us to unit-test our DAO layer without requiring an actual PostgreSQL database, which is awesome.
There is an active net group and mailing list, and the author Thomas Mueller is very responsive to queries (hah, little pun there.)
I don't know if it is the fastest option, but I've been very satisfied with H2 whenever I've used it. It's written by the same person who originally wrote Hypersonic (which later became HSQLDB).
Another option that is allegedly very fast is Prevayler.
It is a bit of an old question, but these days there are plenty of databases with a performance level of 20,000 ops/s. Which database to choose depends on the data structure and the type of queries you'd like to make. It also depends on the overall volume.
We had a similar problem with a large volume of time-series data, about 300,000 records/s, and we ended up writing a new database with a simple enough API and decent performance. It can do about 2,000,000 object writes/s, and we did away with ORM entirely.
It later evolved into QuestDB.
Try the following; it performs really well with Hibernate and other ORM frameworks:
http://hsqldb.org/
Chronicle Map is an embeddable pure Java persistent database, providing a simple java.util.Map interface. It withstands about 1 million queries/updates per second from a single thread, consistent read/write performance and scales almost linearly to the number of cores in the machine.
Here is some recent performance research with actual numbers:
Comparison of Jetbrains Xodus, Oracle Berkeley DB JE BTree, MapDB TreeMap, Chronicle Map and H2 MVStore Map
LmdbJava Benchmarks
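For a feel of the API, a sketch sized for the ~20,000 objects in the question might look like this (the key/value types and size hints are assumptions):

    // Sketch: an in-memory Chronicle Map sized for ~20,000 entries.
    import java.util.concurrent.ConcurrentMap;
    import net.openhft.chronicle.map.ChronicleMap;

    public class ChronicleDemo {
        public static void main(String[] args) {
            ConcurrentMap<String, Long> map = ChronicleMap
                    .of(String.class, Long.class)
                    .name("objects")
                    .entries(20_000)            // expected maximum entry count
                    .averageKey("object-12345") // sample key for size estimation
                    .create();                  // in-memory; createPersistedTo(...) for disk
            map.put("object-1", 42L);
            System.out.println(map.get("object-1"));
        }
    }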
I would give OrientDB a try.
Terracotta might also be an answer for you. It allows multiple VMs to share objects so you can distribute load etc...
You can also check out db4o
If you want to store all of your data in memory, you might want to look at Prevayler.
I've never used it myself, but it seems like a much better solution than using a relational database for those cases in which all of your data can be stored in memory.
Berkeley DB for Java is a fast in-memory database, extremely useful for simple object graphs.
HSQLDB is quite fast, but it is not ACID transaction-safe. The fastest Java database I know of is db4o: see these benchmarks.
Edit: please note that Prevayler is not a database; see http://www.prevayler.org/wiki.jsp?topic=PrevaylerIsNotADatabase. If you're out of RAM, you're out of luck.
H2 is truly fantastic indeed: in-memory, normal server mode, and transactional - you have it all. However, it doesn't compare in performance to the object databases. I see db4o mentioned; I have actually had much better performance with NeoDatis, and everything is nicely set up in Maven repositories. It is not very robust, though - like a Ferrari: fast, but not a truck like Oracle.
You can try CSQL (available in an open-source and an enterprise version). It provides a 30x performance improvement over disk-based database systems and offers a JDBC interface. It can be configured to work as a standalone main-memory database or as a transparent cache in front of MySQL, Postgres, or Oracle databases.
