Any alternatives to Spring Data Graph, jo4neo, ogrm? - java

I'm looking for easy-to-use graph DB + ORM solution. The requirements are:
Fluent Java interfaces, no need to use any XMLs.
Ease of graph traversal: "give me all entities of these types, starting from this one, traverse only using this set of relation types".
Full text search out of the box: p.2 + "only consider entities where this field contains this text"
No need to operate on graph level: Neo4j is great, but I'd like to avoid working with setProperty/getProperty directly.
I've already checked these:
ogrm - not supported anymore.
jo4neo - looks like doesn't work p.2 and p.3
Spring Data Graph - seems to be great things, but it's too immature - spent a week trying to make it work fine in Eclipse - no success.
Are there any other similar tools I need to check?

Spring Data Graph is the most actively developed, with a recently released version 1.1.0 and lots of work planned before SpringOne in October.
However, it does create a challenge for IDEs because of the AspectJ enhanced POJOs. Have a look at the documentation for some help getting that going.
Cheers,
Andreas

As of January 2015, Hibernate has started supporting neo4j:
http://hibernate.org/ogm/
Obviously, you can't query using hql, but they support using Cypher queries.

There is also the very new spring-data-gremlin which does everything of what you want with the power of spring-data.
It also allows native queries, spatial indexes and a bunch of other cool stuff.
Note: It is quite immature, but still worth a look.

Related

How can I persist a POJO with transactions, versioning, history, and hierarchy - using no-schema?

I am writing a system where I want to be able to persist various "Job Templates" to some kind of datastore. In our system, the Templates are domain objects which contain descriptions of how to do various mathematical calculations.
I would like to have some kind of robust store for these Template POJOs - preferably some kind of shared/remote repository so that multiple systems could access, and potentially modify, the Templates.
I am slightly overwhelmed by the number of NoSQL, graph-, document- and object-databases out there, and was hoping to get some guidance from people who've done this before!
In my ideal world, this would be a magical no-schema datastore, so that any Template object could be written out to it with a simple method call, and then retrieved again later. I would ideally like to have the following features -
1) Versioning - so I can "overwrite" a previous POJO, and track the version changes
2) History - so I can go back (time machine style) and retrieve prior versions
3) Transactions - so everything is consistent and bulletproof
4) Hierarchy - so I can group POJOs, and find them by path, etc
Some options appear to include Jackrabbit OCM, OrientDB, CouchDB. Also there is JDO (DataNucleus) vs JPA. Could anyone share any insights to cut through the fog of too much choice?
What an open question!
I use Jackrabbit, which fulfills all your needs and is very easy to get up and running. You can perform XPATH queries as well as several other types. Not very good on documentation but the JCR docs are usually enough to get by. There are other JCR implementations. You might want to consider them/find benchmarks if you intend on pushing the limits :)

Hibernate Search, Lucene or any other alternative?

I have a query which is doing ILIKE on some 11 string or text fields of table which is not big (500 000), but for ILIKE obviously too big, search query takes round 20 seconds. Database is postgres 8.4
I need to implement this search to be much faster.
What came to my mind:
I made additional TVECTOR column assembled from all columns that need to be searched, and created the full text index on it. The fulltext search was quite fast. But...I can not map this TVECTOR type in my .hbms. So this idea fell off (in any case i thaught it more as a temporary solution).
Hibernate search. (Heard about it first time today) It seems promissing, but I need experienced opinion on it, since I dont wanna get into the new API, possibly not the simplest one, for something which could be done simpler.
Lucene
In any case, this has happened now with this table, but i would like to solution to be more generic and applied for future cases related to full text searches.
All advices appreciated!
Thanx
I would strongly recommend Hibernate Search which provides a very easy to use bridge between Hibernate and Lucene. Rememeber you will be using both here. You simply annotate properties on your domain classes which you wish to be able to search over. Then when you update/insert/delete an entity which is enabled for searching Hibernate Search simply updates the relevant indexes. This will only happen if the transaction in which the database changes occurs was committed i.e. if it's rolled back the indexes will not be broken.
So to answer your questions:
Yes you can index specific columns on specific tables. You also have the ability to Tokenize the contents of the field so that you can match on parts of the field.
It's not hard to use at all, you simply work out which properties you wish to search on. Tell Hibernate where to keep its indexes. And then can use the EntityManager/Session interfaces to load the entities you have searched for.
Since you're already using Hibernate and Lucene, Hibernate Search is an excellent choice.
What Hibernate Search will primarily provide is a mechanism to have your Lucene indexes updated when data is changed, and the ability to maximize what you already know about Hibernate to simplify your searches against the Lucene indexes.
You'll be able to specify what specific fields in each entity you want to be indexed, as well as adding multiple types of indexes as needed (e.g., stemmed and full text). You'll also be able to manage to index graph for associations so you can make fairly complex queries through Search/Lucene.
I have found that it's best to rely on Hibernate Search for the text heavy searches, but revert to plain old Hibernate for more traditional searching and for hydrating complex object graphs for result display.
I recommend Compass. It's an open source project built on top of Lucene that provider a simpler API (than Lucene). It integrates nicely with many common Java libraries and frameworks such as Spring and Hibernate.
I have used Lucene in the past to index database tables. The solution works great, but remeber that you need to maintain the index. Either, you update the index every time your objects are persisted or you have a daemon indexer that dump the database tables in your Lucene index.
Have you considered Solr? It's built on top of Lucene and offers automatic indexing from a DB and a Rest API.
A year ago I would have recommended Compass. It was good at what it does, and technically still happily runs along in the application I developed and maintain.
However, there's no more development on Compass, with efforts having switched to ElasticSearch. From that project's website I cannot quite determine if it's ready for the Big Time yet or even actually alive.
So I'm switching to Hibernate Search which doesn't give me that good a feeling but that migration is still in its initial stages, so I'll reserve judgement for a while longer.
All the projects are based on Lucene. If you want to implement a very advanced features I advice you to use Lucene directly. If not, you may use Solr which is a powerful API on top of lucene that can help you index and search from DB.

High Level Java Client selection for Apache Cassandra [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
There are four high level APIs to access Cassandra and I do not have time to try them all. So I hoped to find somebody who could help me to choose the proper one.
I'll try to write down my findings about them:
Datanucleus-Cassandra-Plugin
pros:
supports JPA1, JPA2, JDO1 - JDO3 - as I read in a review, JDO scales better than Hibernate with JPA
all the pros as mentioned in kundera?
cons:
no exeirience with JDO up to now (relevant only for me of course ;)
documentation not found!
kundera
pros:
JPA 1.0 annotations with all advantages (standard conform, no boilerplate code, ...)
promise for following features in near future: JPA listeners, #PrePersist #PostPersist etc. - Relationships, #OneToMany, #ManyToMany etc. - Transactional support, #Transactional
cons:
early development stage of the plugin?
bugs?
no possibillity to fix problems in the JDO / JPA framework?
s7 pelops
pros:
pure java api --> finer control over persistence?
cons:
pure java api --> boilerplate code
hector 0.7
pros:
mavenized
spring integration --> dependency injection
pure java api --> finer control over persistence?
jmx monitoring?
managing of nodes seems to be easy and flexible
cons:
pure java api (no annotations) --> boiler plate code
Conclusion so far
As I am confident with RDMS, Hibernate, JPA, Spring and not so up to date anymore with EJB, my first impression was, to go for kundera would have been the right choice. But after reading some posts regarding JPO, DataNucleus, I am not sure anymore. As the learning curve should be steep (also for expirienced JPA developers?) for DataNucleus, I am not sure, whether I should go for it.
My major concern is the status of the plugin. Also the forum support/help for JDO and Datanucleus-Cassandra-Plugin, as it is not as wide spread, as far as I understood.
Is anybody out there, who has experience, with some of the framworks already and can give me a hint? Maybe a mixed strategy would make sense as well. In cases (if they exist) JDO is not flexible/sufficient/whatever enough for my needs, to fall back to one of the easier APIs of pelops or hector? Is this possible? Is there an approach like in JPA to get an sql connection and fetch/put data?
After reading a bit on, I found following additional information:
Datanucleus-Cassandra-Plugin is based on the pelops, which also can be accessed for more flexibility, more performance (?), which should be used on the column families with a lot of data, JDO/JPA access should be only used on "administrative" data, where performance is not so important and data amount is not overwhelming.
Which still leaves the question open to start with hector or pelops.
pelops for it's later Datanucleus-Cassandra-Plugin extensibility, or
hector for it's more sufficient support on node hanldling.
I tried most of these solutions and find hector the best. Even when you have some problem you can always reach people who wrote hector in #cassandra in freenode. and the code is more mature as far as I concern. In cassandra client the most critical part would be connection pooling management (since all the clients do mostly the same operations through thrift, but connection pooling is what makes high level client roll). In that case I would vote for hector since I am using it in production for over a year now with no visible problem (1 reconnect issue fixed as soon as I discovered and send an email about it).
I am still using cassandra 0.6 though.
The author of the datanucleus plugin, Todd Nine, is working on the next-gen JPA support in Hector now.
The Hector client was the API that we choose because of the following things that it had:
Connection Pooling (huge performance gain when sharing a connection to a node)
Complete Custom Configuration using interfaces for most everything.
Auto Discovery Hosts
Custom Load Balancing Policy definitions (LeastActiveBalancingPolicy or RoundRobinBalancingPolicy or implement LoadBalancingPolicy)
Light-weight adapter on top of the Thrift API.
Great examples: See hector-examples
Built in JMX support.
Downside of Hector:
Documentation not bad, but the Java Docs are lacking a bit. That could easily be a Git fork / pull request by the user community.
The ORM support was a bit limited, but not urgent for usage in our case. I couldn't get some of the one-to-many associations to work easily, plus lack of describing what type of Cassandra model (super columns or column families for associated collections). Also a lack of Java examples (maybe there are some, please post if you find some).
Also, I tried using kundera with very little success. Not many examples to use or try, very little forum support. It appears to be maintained by one person, which makes it even hard to choose a tool like that. It appears based on the SVN activity it was migrating to using Hadoop instead or support for it as well.
Kundera 2.0.4 released.
Major Changes in this release:
Cross-datastore persistence( Easy to migerate existing mysql app over nosql)
support for relational databases (e.g Mysql etc)
replace solandra with lucene based indexing.
Support added for bi-directinal associations.
Performance improvement fixes.
I would propose also Astyanax, I'm working with it and I'm quite happy. Only the documentation is not really good.
Astyanax API
Astyanax implements a fluent API which guides the caller to narrow or
customize the query via a set of well defined interfaces. We've also
included some recipes that will be executed efficiently and as close
to the low level RPC layer as possible. The client also makes heavy
use of generics and overloading to almost eliminate the need to
specify serializers.
Some key features of the API include:
Key and column types are defined in a ColumnFamily class which
eliminates the need to specify serializers.
Multiple column family key types in the same keyspace. Annotation based composite column names.
Automatic pagination.
Parallelized queries that are token aware.
Configurable consistency level per operation.
Configurable retry policy per operation.
Pin operations to specific node.
Async operations with a single timeout using Futures.
Simple annotation based object mapping.
Operation result returns host, latency, attempt count.
Tracer interfaces to log custom events for operation failure and success.
Optimized batch mutation.
Completely hide the clock for the caller, but provide hooks to customize it.
Simple CQL support.
RangeBuilders to simplify constructing simple as well as composite column ranges.
Composite builders to simplify creating composite column names.
Recipes Recipes for some common use cases:
CSV importer.
JSON exporter to convert any query result to JSON with a wide range of
customizations.
Parallel reverse index search.
Key unique constraint validation.
http://techblog.netflix.com/2012/01/announcing-astyanax.html
I suggest you give Kundera-2.0.1 a try. It has gone a major change since its inception and I see a lot of new features getting added and bugs being fixed. Currently it supports JPA 1.0 and Cassandra 0.7.6 but they are planning to add support for Cassandra 0.8 and JPA 2.0 very soon. There is a pretty good example here: https://github.com/impetus-opensource/Kundera/wiki/Getting-Started-in-5-minutes
You can try Achilles, a new Entity Manager I've developed that supports all CQL3 features.
Entity mapping
JPA style operations
Limited support for join
Mapping of clustered entities using compound primary key
Queries (native, typed, slice)
Support for counters
Support for Consistency level
TTL & timestamp
JUnit 4 Rule to start embedded Cassandra server for testing
And so more ...
There are 2 implementations: Thrift & CQL
The Thrift version relies on Hector under the hood.
The CQL version pulls the brand new Java Driver Core from Datastax for all operations
Quick reference here

Databases and Java

I am starting out writing java code and interacting with databases for my "nextbigthing" project. Can someone direct me towards the best way to deal with adding/updating tables/records to databases? Here is my problem. There is too much repitition when it comes to DB code in java. I have to create the tables first (I use mysql). I then create classes in Java for each table. Then I create a AddRow, DeleteRow, UpdateRow and Search* depending on my need. For every table, every need creating this huge ass sql statement and the classes all seems like a huge waste of my time. There has to be a better, easier, more efficient way of doing things. Is there something out there that I do not know that will let me just tell Java what the table is and it automatically generate the queries and execute them for me? Its simple SQL that can be auto generated if it knows the column names and DB table inter dependencies. Seems like a very reasonable thing to have.
Check out Hibernate - a standard Java ORM solution.
User hibernate for mapping your classes to Database.
Set its hbm2ddl.auto to update to avoid writing DDL yourself. But note that this is not the most optimal way to take it to production.
Consider using Hibernate:
https://www.hibernate.org/
It can create java classes with regular CRUD methods from existing database schema.
Of course there is a much better way !
You really want to learn some bits of Java EE, and in particular JPA for database access.
For a complete crash course on Java EE, check out the Sun the Java EE 5 tutorial.
http://java.sun.com/javaee/5/docs/tutorial/doc/
Part 4 - Enterprise Beans
Part 5 - Persistence (JPA)
Then you want to try Hibernate (for instance) which has an implementation of JPA.
This is for Java 5 or later.
If you are still in Java 2, you might want to try Hibernate or iBatis.
You can also try iBatis, if you want control over SQL. Else JPA is good.
You can also try using Seam Framework. It has good reverse-engineering tools.
There is also torque (http://db.apache.org/torque/) which I personally prefer because it's simpler, and does exactly what I need.
With torque I can define a database with mysql(Well I use Postgresql, but Mysql is supported too) and Torque can then query the database and then generate java classes for each table in the database. With Torque you can then query the database and get back Java objects of the correct type.
It supports where clauses (Either with a Criteria object or you can write the sql yourself) and joins.
It also support foreign keys, so if you got a User table and a House table, where a user can own 0 or more houses, there will be a getHouses() method on the user object which will give you the list of House objects the user own.
To get a first look at the kind of code you can write, take a look at
http://db.apache.org/torque/releases/torque-3.3/tutorial/step5.html which contains examples which show how to load/save/query data with torque. (All the classes used in this example are auto-generated based on the database definition).
Or, if Hibernate is too much, try Spring JDBC. It eliminates a lot of boilerplate code for you.
iBatis is another good choice, intermediate between Spring JDBC and Hibernate.
It's just a matter of using the right tools. Use an IDE with tools to autogenerate the one and other.
If you're using Eclipse for Java EE and decide to head to JPA, then I can recommend to take benefit of the builtin Dali plugin. There's a nice PDF tutorial out at Eclipse.org.
If you're using Eclipse for Java EE and decide to head to "good ol" Hibernate, then I can recommend to take benefit of the Hibernatetools plugin. There's good reference guide out at Hibernate.org.
Both tools are capable of reverse-engineering from a SQL table to fullworthy Javabeans/entities and/or mapping files. It really takes most of boilerplate pains away. The DAO pattern is slightly superflous when grabbing JPA. In case of Hibernate you can consider to use a Generic DAO.

What JDBC tools do you use for synchronization of data sources?

I'm hoping to find out what tools folks use to synchronize data between databases. I'm looking for a JDBC solution that can be used as a command-line tool.
There used to be a tool called Sync4J that used the SyncML framework but this seems to have fallen by the wayside.
I have heard that the Data Replication Service provided by Db4O is really good. It allows you to use Hibernate to back onto a RDBMS - I don't think it supports JDBC tho (http://www.db4o.com/about/productinformation/drs/Default.aspx?AspxAutoDetectCookieSupport=1)
There is an open source project called Daffodil, but I haven't investigated it at all. (https://daffodilreplicator.dev.java.net/)
The one I am currently considering using is called SymmetricDS (http://symmetricds.sourceforge.net/)
There are others, they each do it slightly differently. Some use triggers, some poll, some use intercepting JDBC drivers. You need to decide what technical limitations you are under to determine which one you really want to use.
Wikipedia provides a nice overview of different techniques (http://en.wikipedia.org/wiki/Multi-master_replication) and also provides a link to another alternative DBReplicator (http://dbreplicator.org/).
If you have a model and DAO layer that exists already for your codebase, you can just create your own sync framework, it isn't hard.
Copy data is as simple as:
read an object from database A
remove database metadata (uuid, etc)
insert into database B
Syncing has some level of knowledge about what has been synced already. You can either do it at runtime by getting a list of uuids from TableInA and TableInB and working out which entries are new, or you can have a table of items that need to be synced (populate with a trigger upon insert/update in TableInA), and run from that. Your tool can be a TimerTask so databases are kept synced at the time granularity that you desire.
However there is probably some tool out there that does it all without any of this implementation faff, and each implementation would be different based on business needs anyway. In addition at the database level there will be replication tools.
True synchronization requires some data that I hope your database schema has (you can read the SyncML doc to see how they proceed). Sync4J won't help you much, it's really high-level and XML oriented. If you don't foresee any conflicts (which means: really easy synchronisation), you could try with a lightweight ETL like Enhydra Octopus.
I'm primarily using Oracle at the moment, and the most full-featured route I've come across is Red Gate's Data Compare:
http://www.red-gate.com/products/oracle-development/data-compare-for-oracle/
This old blog gives a good summary of the solution routes available:
http://www.novell.com/coolsolutions/feature/17995.html
The JDBC-specific offerings I've come across have been very basic. The solution mentioned by Aidos seems the most feature complete if you want to go down the publish-subscribe route:
http://symmetricds.codehaus.org/
Hope this helps.

Categories

Resources