I am moving my project to Spring Boot 2.3.5, which brings driver 4.6, and LatencyAwarePolicy seems to have disappeared. Is there a similar policy builder for driver 4.6, or what is the best approach there?
https://docs.datastax.com/en/drivers/java/3.6/com/datastax/driver/core/policies/LatencyAwarePolicy.Builder.html
I searched but could not find anything in the docs. Maybe https://github.com/datastax/java-driver/blob/4.x/core/src/main/java/com/datastax/oss/driver/internal/core/loadbalancing/BasicLoadBalancingPolicy.java
With BasicLoadBalancingPolicy I can connect without a data center name, but I am confused: is it as good as LatencyAwarePolicy?
The default load balancing policy in 4.x has best practices baked in, including token awareness and busy-node avoidance (which was the goal of LatencyAwarePolicy).
This blog post discusses more:
https://www.datastax.com/blog/improved-client-request-routing-apache-cassandratm
You can still plug in any LBP you care to by implementing the LoadBalancingPolicy interface, but it should not normally be necessary.
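In 4.x the load balancing policy is normally set through the driver configuration rather than a builder. A minimal `application.conf` override might look like this (the contact point and the datacenter name `dc1` are placeholders for your own values):

```
datastax-java-driver {
  basic.contact-points = [ "127.0.0.1:9042" ]
  basic.load-balancing-policy {
    # token-aware, with the busy-node avoidance described above
    class = DefaultLoadBalancingPolicy
    # the default policy requires the local DC to be named explicitly,
    # here or via CqlSession.builder().withLocalDatacenter("dc1")
    local-datacenter = dc1
  }
}
```

Only BasicLoadBalancingPolicy lets you omit the local datacenter, at the cost of datacenter-aware routing.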
Related
I have a Spring application on Heroku (it seems there is no way to use ehCache there). I found several projects on GitHub, like hibernate-memcached, for the second-level cache, but they have just a few stars and I'm not sure whether they are buggy.
Would you recommend anything for a Java + Hibernate 4 second-level cache based on memcached?
https://github.com/kwon37xi/hibernate4-memcached
I made this library.
A Heroku user emailed me to say he was using it.
I have never used Heroku myself, but I believe this library works fine there.
I have heard that Heroku memcached requires authentication; refer to this wiki page about authentication: https://github.com/kwon37xi/hibernate4-memcached/wiki/SpyMemcachedAdapter
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
There are four high-level APIs for accessing Cassandra, and I do not have time to try them all, so I hoped to find somebody who could help me choose the proper one.
I'll try to write down my findings about them:
Datanucleus-Cassandra-Plugin
pros:
supports JPA 1, JPA 2, and JDO 1 through JDO 3 - as I read in a review, JDO scales better than Hibernate with JPA
all the pros mentioned under Kundera?
cons:
no experience with JDO up to now (relevant only for me, of course ;)
documentation not found!
kundera
pros:
JPA 1.0 annotations with all advantages (standard conform, no boilerplate code, ...)
promise of the following features in the near future: JPA listeners (@PrePersist, @PostPersist, etc.) - relationships (@OneToMany, @ManyToMany, etc.) - transactional support (@Transactional)
cons:
early development stage of the plugin?
bugs?
no possibility to fix problems in the JDO / JPA framework?
s7 pelops
pros:
pure java api --> finer control over persistence?
cons:
pure java api --> boilerplate code
hector 0.7
pros:
mavenized
spring integration --> dependency injection
pure java api --> finer control over persistence?
jmx monitoring?
managing of nodes seems to be easy and flexible
cons:
pure java api (no annotations) --> boilerplate code
Conclusion so far
As I am comfortable with RDBMS, Hibernate, JPA, and Spring, and not so up to date anymore with EJB, my first impression was that going for Kundera would be the right choice. But after reading some posts regarding JDO and DataNucleus, I am not sure anymore. As the learning curve for DataNucleus seems to be steep (even for experienced JPA developers?), I am not sure whether I should go for it.
My major concern is the status of the plugin, and also the forum support/help for JDO and the Datanucleus-Cassandra-Plugin, as they are not as widespread, as far as I understood.
Is anybody out there who already has experience with some of these frameworks and can give me a hint? Maybe a mixed strategy would make sense as well: in cases (if they exist) where JDO is not flexible/sufficient enough for my needs, fall back to one of the simpler APIs of Pelops or Hector? Is this possible? Is there an approach, as in JPA, to get an SQL connection and fetch/put data?
After reading on a bit, I found the following additional information:
The Datanucleus-Cassandra-Plugin is based on Pelops, which can also be accessed directly for more flexibility and (perhaps) more performance. The direct API should be used on column families with a lot of data, while JDO/JPA access should be reserved for "administrative" data, where performance is not so important and the data volume is not overwhelming.
Which still leaves open the question of whether to start with Hector or Pelops:
Pelops, for its later Datanucleus-Cassandra-Plugin extensibility, or
Hector, for its better support for node handling.
I tried most of these solutions and found Hector the best. Even when you have a problem, you can always reach the people who wrote Hector in #cassandra on freenode, and the code is more mature as far as I'm concerned. In a Cassandra client the most critical part is connection pool management (since all the clients do mostly the same operations through Thrift, connection pooling is what makes a high-level client stand out). On that count I would vote for Hector, since I have been using it in production for over a year now with no visible problems (one reconnect issue was fixed as soon as I discovered it and sent an email about it).
I am still using cassandra 0.6 though.
The author of the datanucleus plugin, Todd Nine, is working on the next-gen JPA support in Hector now.
The Hector client was the API we chose because of the following things it offered:
Connection Pooling (huge performance gain when sharing a connection to a node)
Completely custom configuration, using interfaces for almost everything.
Automatic host discovery.
Custom Load Balancing Policy definitions (LeastActiveBalancingPolicy or RoundRobinBalancingPolicy or implement LoadBalancingPolicy)
Light-weight adapter on top of the Thrift API.
Great examples: See hector-examples
Built in JMX support.
Downside of Hector:
The documentation is not bad, but the Javadocs are lacking a bit. That could easily be a Git fork / pull request by the user community.
The ORM support was a bit limited, but not urgent for our use case. I couldn't get some of the one-to-many associations to work easily, and there is a lack of documentation on which Cassandra model to use (super columns or column families) for associated collections. Also a lack of Java examples (maybe there are some; please post if you find any).
Also, I tried using Kundera with very little success. There are not many examples to use or try, and very little forum support. It appears to be maintained by one person, which makes it even harder to choose a tool like that. Based on the SVN activity, it appears to be migrating to Hadoop, or adding support for it as well.
Kundera 2.0.4 released.
Major changes in this release:
Cross-datastore persistence (easy to migrate an existing MySQL app over NoSQL)
Support for relational databases (e.g. MySQL)
Replaced Solandra with Lucene-based indexing
Support added for bi-directional associations
Performance improvement fixes
I would also propose Astyanax; I'm working with it and I'm quite happy. Only the documentation is not really good.
Astyanax API
Astyanax implements a fluent API which guides the caller to narrow or customize the query via a set of well-defined interfaces. We've also included some recipes that will be executed efficiently and as close to the low-level RPC layer as possible. The client also makes heavy use of generics and overloading to almost eliminate the need to specify serializers.
Some key features of the API include:
Key and column types are defined in a ColumnFamily class, which eliminates the need to specify serializers.
Multiple column family key types in the same keyspace.
Annotation-based composite column names.
Automatic pagination.
Parallelized queries that are token aware.
Configurable consistency level per operation.
Configurable retry policy per operation.
Pin operations to specific node.
Async operations with a single timeout using Futures.
Simple annotation based object mapping.
Operation result returns host, latency, attempt count.
Tracer interfaces to log custom events for operation failure and success.
Optimized batch mutation.
Completely hide the clock for the caller, but provide hooks to customize it.
Simple CQL support.
RangeBuilders to simplify constructing simple as well as composite column ranges.
Composite builders to simplify creating composite column names.
Recipes for some common use cases:
CSV importer.
JSON exporter to convert any query result to JSON with a wide range of customizations.
Parallel reverse index search.
Key unique constraint validation.
http://techblog.netflix.com/2012/01/announcing-astyanax.html
I suggest you give Kundera 2.0.1 a try. It has gone through a major overhaul since its inception, and I see a lot of new features being added and bugs being fixed. Currently it supports JPA 1.0 and Cassandra 0.7.6, but they are planning to add support for Cassandra 0.8 and JPA 2.0 very soon. There is a pretty good example here: https://github.com/impetus-opensource/Kundera/wiki/Getting-Started-in-5-minutes
You can try Achilles, a new Entity Manager I've developed that supports all CQL3 features.
Entity mapping
JPA style operations
Limited support for join
Mapping of clustered entities using compound primary key
Queries (native, typed, slice)
Support for counters
Support for Consistency level
TTL & timestamp
JUnit 4 Rule to start embedded Cassandra server for testing
And more...
There are 2 implementations: Thrift & CQL
The Thrift version relies on Hector under the hood.
The CQL version pulls the brand new Java Driver Core from Datastax for all operations
Quick reference here
Is it possible (and does it make sense) to use the JDO Level 2 Cache for the Google App Engine Datastore?
First of all, why is there no documentation about this on Google's pages? Are there some problems with it? Do we need to set up limits to protect our memcache quota?
According to DataNucleus on Stackoverflow, you can set the following persistence properties:
datanucleus.cache.level2.type=javax.cache
datanucleus.cache.level2.cacheName={cache name}
Is that all? Can we choose any cache name?
Other sources on the Internet report using different settings.
Also, it seems we need to download the DataNucleus Cache support plugin. Which version would be appropriate? And do we just place it in WEB-INF/lib or does it need more setup to activate it?
Before you can figure this out, you have to answer one question:
Which version of DataNucleus are you using?
Everything on this post has to do with the old version of the plugin -- v1. Only recently has the Google Plugin for Eclipse supported v2 of the DataNucleus plugin for AppEngine (which is basically the conduit between AppEngine and the DataNucleus Core).
I'd recommend upgrading to v2 of the Datanucleus plugin for AppEngine -- if you're using Eclipse, it's easy -- there's a UI for it that allows you to select v1 or v2. Just go to your Project properties and find the App Engine settings and look for "Datanucleus JDO/JPA version".
Plus, you have to make a change to your jdoconfig.xml. Specifically, you have to change just one property:
<property name="javax.jdo.PersistenceManagerFactoryClass" value="org.datanucleus.api.jdo.JDOPersistenceManagerFactory"/>
SO -- Once you've upgraded to v2, this is how you specify caching (an addition to jdoconfig.xml):
<property name="datanucleus.cache.level2.type" value="jcache"/>
<property name="datanucleus.cache.level2.cacheName" value="NameItWhateverYouWant"/>
At this point, caching should happen automatically every time you put and get using a PersistenceManager. Hooray!
No known problems with anything to do with L2 caching and GAE/J. If people have problems then perhaps they ought to report them to Google. Set the cache name to whatever you wish. Anything put into memcache has to be Serializable, obviously, since that is what memcache requires. Yes, you need the datanucleus-cache plugin (version 1.x); put it in the same place as any other DataNucleus jars. One day Google will update to DataNucleus 2.x.
It seems there are problems after all: I tried (with JPA) and got the error someone else already reported: http://code.google.com/p/datanucleus-appengine/issues/detail?id=163
I'm hoping to find out what tools folks use to synchronize data between databases. I'm looking for a JDBC solution that can be used as a command-line tool.
There used to be a tool called Sync4J that used the SyncML framework but this seems to have fallen by the wayside.
I have heard that the Data Replication Service provided by db4o is really good. It allows you to use Hibernate to back onto an RDBMS - I don't think it supports JDBC, though (http://www.db4o.com/about/productinformation/drs/Default.aspx?AspxAutoDetectCookieSupport=1)
There is an open source project called Daffodil, but I haven't investigated it at all. (https://daffodilreplicator.dev.java.net/)
The one I am currently considering using is called SymmetricDS (http://symmetricds.sourceforge.net/)
There are others, they each do it slightly differently. Some use triggers, some poll, some use intercepting JDBC drivers. You need to decide what technical limitations you are under to determine which one you really want to use.
Wikipedia provides a nice overview of different techniques (http://en.wikipedia.org/wiki/Multi-master_replication) and also provides a link to another alternative DBReplicator (http://dbreplicator.org/).
If you have a model and DAO layer that exists already for your codebase, you can just create your own sync framework, it isn't hard.
Copying data is as simple as:
read an object from database A
remove database metadata (uuid, etc)
insert into database B
Syncing has some level of knowledge about what has been synced already. You can either do it at runtime by getting a list of uuids from TableInA and TableInB and working out which entries are new, or you can have a table of items that need to be synced (populate with a trigger upon insert/update in TableInA), and run from that. Your tool can be a TimerTask so databases are kept synced at the time granularity that you desire.
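The runtime variant of that diff (working out which entries are new from two lists of uuids) is a simple set difference. A sketch in plain Java, with the hard-coded lists standing in for the results of a `SELECT uuid FROM ...` against each database:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SyncDiff {
    /** Returns the ids present in the source table but not yet in the target. */
    static Set<String> newEntries(Collection<String> sourceIds, Collection<String> targetIds) {
        Set<String> pending = new LinkedHashSet<>(sourceIds); // keep source order
        pending.removeAll(new HashSet<>(targetIds));
        return pending;
    }

    public static void main(String[] args) {
        List<String> inA = List.of("u1", "u2", "u3"); // uuids from TableInA
        List<String> inB = List.of("u2");             // uuids from TableInB
        System.out.println(newEntries(inA, inB));     // u1 and u3 still need syncing
    }
}
```

Each pending id then drives the read / strip-metadata / insert steps above, scheduled from the TimerTask at whatever granularity you want.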
However there is probably some tool out there that does it all without any of this implementation faff, and each implementation would be different based on business needs anyway. In addition at the database level there will be replication tools.
True synchronization requires some data that I hope your database schema has (you can read the SyncML docs to see how they proceed). Sync4J won't help you much; it's really high-level and XML-oriented. If you don't foresee any conflicts (which means: really easy synchronization), you could try a lightweight ETL like Enhydra Octopus.
I'm primarily using Oracle at the moment, and the most full-featured route I've come across is Red Gate's Data Compare:
http://www.red-gate.com/products/oracle-development/data-compare-for-oracle/
This old blog gives a good summary of the solution routes available:
http://www.novell.com/coolsolutions/feature/17995.html
The JDBC-specific offerings I've come across have been very basic. The solution mentioned by Aidos seems the most feature complete if you want to go down the publish-subscribe route:
http://symmetricds.codehaus.org/
Hope this helps.
We are looking at implementing a caching framework for our application to help improve the overall performance of our site.
We are running on Websphere 6.1, Spring, Hibernate and Oracle. The focus currently is on the static data or data that changes very little through out the day but is used a lot.
So any info would be great. I have started my Google search and reading, but wanted to get a feel for what has worked for members of the community. One of the things that we are interested in is the ability to have the system invalidate the cache when a change does happen in the underlying data.
Thanks
Update:
I came across an article on the IBM web site which says that whether Hibernate's cluster-aware caches work in conjunction with WebSphere Application Server has not been determined yet; therefore, it is not yet known whether their use is supported.
Thoughts on that? We are running in a clustered environment.
Well, the Hibernate cache system does just that. I used ehCache effectively and easily with Hibernate (and its second-level cache system).
Lucene could be an option too depending on the situation. Hibernate Search or Compass could help with that (although it might take some major work).
Replication using Terracotta could also be an option although I've never done it.
Hibernate has first- and second-level caching built in. You can configure it to use EhCache, OSCache, SwarmCache, or JBoss Cache - your choice.
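For example, switching the second-level cache to EhCache in Hibernate 3.x comes down to a few configuration properties (the query-cache line is optional; region sizes and TTLs then live in your ehcache.xml):

```
hibernate.cache.use_second_level_cache=true
hibernate.cache.use_query_cache=true
hibernate.cache.provider_class=org.hibernate.cache.EhCacheProvider
```

Individual entities and collections still have to be marked cacheable, e.g. with `<cache usage="read-write"/>` in the mapping file.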
You can also use WebSphere's built-in cache, DynaCache, to implement the second-level cache. This allows you to administer, monitor, and configure your cache through WebSphere's caching infrastructure.
I've used ehCache and OSCache and found OSCache to be easier to configure and use.
One of the things that we are interested in is the ability to have the system invalidate the cache when a change does happen in the underlying data.
From what I can see, Hibernate doesn't actually do the above - from Hibernate's docs:
Be careful. Caches are never aware of changes made to the persistent store by another application (though they may be configured to regularly expire cached data).
Obviously what it means is that a cache doesn't have ESP, and can't tell if an app outside the cluster has run straight DML against the database - but I am guessing that what you want is the ability to expose a service that legacy apps can hook into to invalidate the cache when they do update that data. And there isn't, to my knowledge, a suggestion about how this might be done.
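As a sketch of that idea - assuming the legacy apps can be changed to call a small service after they touch the database (the class and method names here are hypothetical, not a Hibernate API) - the cache side boils down to a map with an explicit eviction hook:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Toy read-through cache with an explicit invalidation hook. */
public class InvalidatingCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for the real database read

    public InvalidatingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Serves from the cache, loading from the backing store on a miss. */
    public V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    /** Exposed (e.g. over HTTP or JMX) for legacy apps to call after direct DML. */
    public void invalidate(K key) {
        entries.remove(key);
    }
}
```

The next get() after invalidate() goes back to the database - essentially the "regularly expire" behaviour from the Hibernate docs, but triggered on demand instead of on a timer.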