I have a query that runs ILIKE against some 11 string or text columns of a table that is not big (500,000 rows) but is obviously too big for ILIKE: the search query takes around 20 seconds. The database is Postgres 8.4.
I need to implement this search to be much faster.
What came to my mind:
I added a tsvector column assembled from all the columns that need to be searched, and created a full-text index on it. The full-text search was quite fast. But... I cannot map the tsvector type in my .hbm files. So this idea fell through (in any case I thought of it more as a temporary solution).
Hibernate Search. (I heard about it for the first time today.) It seems promising, but I need an experienced opinion on it, since I don't want to get into a new API, possibly not the simplest one, for something that could be done more simply.
Lucene
In any case, this has come up with this table now, but I would like the solution to be more generic and applicable to future cases involving full-text search.
All advice appreciated!
Thanks
I would strongly recommend Hibernate Search, which provides a very easy-to-use bridge between Hibernate and Lucene. Remember that you will be using both here. You simply annotate the properties on your domain classes that you want to be able to search over. Then, when you update, insert, or delete an entity that is enabled for searching, Hibernate Search updates the relevant indexes. This only happens if the transaction in which the database changes occur is committed; if it is rolled back, the indexes are left untouched.
So to answer your questions:
Yes you can index specific columns on specific tables. You also have the ability to Tokenize the contents of the field so that you can match on parts of the field.
It's not hard to use at all: you work out which properties you wish to search on, tell Hibernate where to keep its indexes, and then use the EntityManager/Session interfaces to load the entities you have searched for.
Since you're already using Hibernate and Lucene, Hibernate Search is an excellent choice.
What Hibernate Search will primarily provide is a mechanism to have your Lucene indexes updated when data is changed, and the ability to maximize what you already know about Hibernate to simplify your searches against the Lucene indexes.
You'll be able to specify which fields in each entity you want indexed, and to add multiple types of indexes as needed (e.g., stemmed and full text). You'll also be able to manage the index graph for associations, so you can make fairly complex queries through Search/Lucene.
I have found that it's best to rely on Hibernate Search for the text heavy searches, but revert to plain old Hibernate for more traditional searching and for hydrating complex object graphs for result display.
I recommend Compass. It's an open source project built on top of Lucene that provides a simpler API (than Lucene). It integrates nicely with many common Java libraries and frameworks such as Spring and Hibernate.
I have used Lucene in the past to index database tables. The solution works great, but remember that you need to maintain the index. Either you update the index every time your objects are persisted, or you have a daemon indexer that dumps the database tables into your Lucene index.
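To make the "update the index every time your objects are persisted" option concrete, here is a minimal sketch. It does not use Lucene at all: `SimpleTextIndex`, `onSave` and `onDelete` are hypothetical names, and a plain in-memory map stands in for the real index. The point is only that the save and delete paths must both touch the index, and that an update has to remove the old terms before adding the new ones.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a Lucene index, used only to show
// the "keep the index in step with every save and delete" strategy.
class SimpleTextIndex {
    // term -> ids of the records that contain the term
    private final Map<String, Set<Long>> postings = new HashMap<>();
    // id -> terms currently indexed for that record (needed to un-index it later)
    private final Map<Long, Set<String>> termsById = new HashMap<>();

    // Call this from the same code path that persists the record.
    void onSave(long id, String... fields) {
        onDelete(id); // drop stale terms first so an update leaves no leftovers
        Set<String> terms = new HashSet<>();
        for (String field : fields) {
            if (field == null) continue;
            // Naive tokenizer: lower-case and split on non-word characters.
            for (String token : field.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) terms.add(token);
            }
        }
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new HashSet<>()).add(id);
        }
        termsById.put(id, terms);
    }

    // Call this from the same code path that deletes the record.
    void onDelete(long id) {
        Set<String> old = termsById.remove(id);
        if (old == null) return;
        for (String term : old) {
            Set<Long> ids = postings.get(term);
            if (ids != null) {
                ids.remove(id);
                if (ids.isEmpty()) postings.remove(term);
            }
        }
    }

    // Returns the ids of all records containing the given word.
    Set<Long> search(String word) {
        Set<Long> ids = postings.get(word.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<Long>emptySet() : Collections.unmodifiableSet(ids);
    }
}
```

The daemon alternative would simply call `onSave` for every row on a schedule instead of from the persistence code.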
Have you considered Solr? It's built on top of Lucene and offers automatic indexing from a DB and a Rest API.
A year ago I would have recommended Compass. It was good at what it does, and technically still happily runs along in the application I developed and maintain.
However, there's no more development on Compass, with efforts having switched to ElasticSearch. From that project's website I cannot quite determine if it's ready for the Big Time yet or even actually alive.
So I'm switching to Hibernate Search which doesn't give me that good a feeling but that migration is still in its initial stages, so I'll reserve judgement for a while longer.
All of these projects are based on Lucene. If you want to implement very advanced features, I advise you to use Lucene directly. If not, you may use Solr, which is a powerful API on top of Lucene that can help you index and search data from a DB.
Related
I have recently started using the Jackcess library in Java for dealing with MS Access databases. The library is pretty good but I have a question regarding searching rows.
Consider that I have "Jack loves apples" in a row of a column named X, what piece of code would I use to search for all the rows of X containing the word "apples"? I know this can be easily done using wildcards in SQL but since there is no way to use SQL queries in Jackcess, that's not a valid option.
I considered using UCanAccess but I have issues with the library, even if I use the "memory=false" option while loading the database, it still takes almost 1.4GB of memory.
@centic's answer was accurate until Jackcess version 3.5.0. As of the 3.5.0 release, you can use the new PatternColumnPredicate class to do various wildcard/pattern/regex searches using Cursors.
With Jackcess you need to iterate the rows and apply the filter yourself. As long as your filter is fairly static, this should be fairly easy to build.
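A minimal sketch of that "iterate and filter yourself" approach, using only the JDK. `RowFilter` and `rowsContainingWord` are hypothetical names. The rows are modelled as maps from column name to value; as far as I know a Jackcess `Table` can be iterated and each `Row` is a `Map<String, Object>`, so the same method should accept a table directly, but treat that as an assumption to check against your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class RowFilter {
    // Returns the rows whose `column` value contains `word` as a whole word,
    // ignoring case. Rows with a null or missing value are skipped.
    static <R extends Map<String, Object>> List<R> rowsContainingWord(
            Iterable<R> rows, String column, String word) {
        // \b anchors the match on word boundaries; quote() makes the word literal.
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        List<R> matches = new ArrayList<>();
        for (R row : rows) {
            Object value = row.get(column);
            if (value != null && pattern.matcher(value.toString()).find()) {
                matches.add(row);
            }
        }
        return matches;
    }
}
```

With the example from the question, `rowsContainingWord(rows, "X", "apples")` keeps the "Jack loves apples" row. Drop the `\\b` anchors if you want plain substring matching, like `LIKE '%apples%'`.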
I have a relational database with few tables. Some of them have columns that I want to enable autocompletion / autocorrection on (e.g. titles, tags, categories).
I have seen that Apache Solr, which builds upon Lucene indexing can offer such functionality. Also data can be fed in to Solr from relational database.
My question is: is this the best way I can get autocomplete and autocorrect services for my entities? Or am I killing a mosquito with a bazooka here?
Solr requires a lot of resources, memory and stuff and I wonder if something far simpler can do the trick for me.
How many unique values do you have in titles, tags, and categories? A few thousand? Then I think you can get away with using a trie data structure. A few million records in those columns? Then Solr / Elasticsearch might be a good option.
I have used a trie for autosuggestion. Building a trie is expensive, but you can store it in Memcached or even SQL and update it periodically when new data is added to your columns.
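For a sense of how little code the trie route needs, here is a minimal prefix-completion sketch. `AutocompleteTrie`, `insert` and `suggest` are names made up for this example, and it only covers autocomplete by prefix, not autocorrection of misspellings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class AutocompleteTrie {
    private static final class Node {
        // TreeMap keeps children sorted, so suggestions come out alphabetically.
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    // Adds one value (a title, tag, or category) to the trie, lower-cased.
    void insert(String word) {
        Node node = root;
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Returns up to `limit` stored values that start with `prefix`.
    List<String> suggest(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        String p = prefix.toLowerCase(Locale.ROOT);
        Node node = root;
        for (char c : p.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return out; // nothing stored under this prefix
        }
        collect(node, new StringBuilder(p), out, limit);
        return out;
    }

    // Depth-first walk below `node`, appending complete words to `out`.
    private void collect(Node node, StringBuilder current, List<String> out, int limit) {
        if (out.size() >= limit) return;
        if (node.isWord) out.add(current.toString());
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            if (out.size() >= limit) return;
            current.append(e.getKey().charValue());
            collect(e.getValue(), current, out, limit);
            current.setLength(current.length() - 1); // backtrack
        }
    }
}
```

Rebuilding is just a loop of `insert` calls over the distinct column values, which is the part you would run periodically.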
I'm looking for easy-to-use graph DB + ORM solution. The requirements are:
Fluent Java interfaces, no need to use any XMLs.
Ease of graph traversal: "give me all entities of these types, starting from this one, traverse only using this set of relation types".
Full text search out of the box: p.2 + "only consider entities where this field contains this text"
No need to operate on graph level: Neo4j is great, but I'd like to avoid working with setProperty/getProperty directly.
I've already checked these:
ogrm - not supported anymore.
jo4neo - it looks like it doesn't support p.2 and p.3.
Spring Data Graph - seems to be a great thing, but it's too immature: I spent a week trying to make it work in Eclipse with no success.
Are there any other similar tools I need to check?
Spring Data Graph is the most actively developed, with a recently released version 1.1.0 and lots of work planned before SpringOne in October.
However, it does create a challenge for IDEs because of the AspectJ enhanced POJOs. Have a look at the documentation for some help getting that going.
Cheers,
Andreas
As of January 2015, Hibernate has started supporting neo4j:
http://hibernate.org/ogm/
Obviously, you can't query using hql, but they support using Cypher queries.
There is also the very new spring-data-gremlin, which does everything you want with the power of spring-data.
It also allows native queries, spatial indexes and a bunch of other cool stuff.
Note: It is quite immature, but still worth a look.
I'm using Hibernate EntityManager and Hibernate Annotations for ORM in a very early stage project. The project needs to launch soon, but the specs are changing constantly and I am concerned that the system will be launched and live data will be collected, and then the specs will change again and I will be in a situation where I need to change the database schema.
How can I set things up in order to minimize the impact of this? Are there any open source projects that deal with this kind of migration? Can Hibernate do this automatically (without wiping the database)?
Your advice is much appreciated.
It's more a functional or organizational problem than a technical one. No tool will automatically guess how to migrate data from one schema to another. You'd better learn how to write stored procedures in order to migrate your data.
You'll probably need to disable constraints, create temporary tables and columns, copy lots of data, and then delete the temporary tables and columns and re-enable the constraints to migrate your data.
Once in maintenance mode, every new feature that modifies the schema should also come with a script that migrates the current production schema and data to the new one.
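One way to organize those per-feature scripts is to number them and apply, in order, only the ones newer than the version the production database is at. The sketch below shows just that bookkeeping. `MigrationRunner`, `register` and `migrate` are hypothetical names, each script is a plain `Runnable` where real code would execute SQL, and persisting the version number (for example in a one-row table) is left out.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class MigrationRunner {
    // version number -> script that upgrades the schema and data to that version
    private final SortedMap<Integer, Runnable> migrations = new TreeMap<>();

    // Registers the script shipped with a schema-changing feature.
    void register(int version, Runnable script) {
        if (migrations.putIfAbsent(version, script) != null) {
            throw new IllegalArgumentException("duplicate migration version " + version);
        }
    }

    // Runs every script newer than currentVersion in ascending order and
    // returns the version the database is at afterwards.
    int migrate(int currentVersion) {
        int version = currentVersion;
        for (Map.Entry<Integer, Runnable> e : migrations.tailMap(currentVersion + 1).entrySet()) {
            e.getValue().run();
            version = e.getKey();
        }
        return version;
    }
}
```

Running `migrate` twice is harmless, because the second call finds nothing newer than the stored version.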
No system can possibly create data migration scripts automatically from just the original and the final schema. There just isn't enough information.
Consider, for example, a new column. Should it just contain the default value? Or a value calculated from other fields or tables?
There is a good book about refactoring databases: http://www.amazon.com/Refactoring-Databases-Evolutionary-Addison-Wesley-Signature/dp/0321774515/ref=sr_1_1?ie=UTF8&qid=1300140045&sr=8-1
But there is little to no tool support for this kind of stuff.
I think the best thing you can do in advance:
Don't let anybody access the database but your application
If something else absolutely must access the db directly, give it a separate set of views specifically for that purpose. This allows you to change your table structure while keeping the structure that other systems see.
Have tons of tests. I just posted an article which (with the upcoming 2nd and 3rd part) might help a little with this: http://blog.schauderhaft.de/2011/03/13/testing-databases-with-junit-and-hibernate-part-1-one-to-rule-them/
Hibernate can update the database schema to match the entity model while keeping the data that is already in the database. So do that, and write migration code in Java which sets or removes data relationships.
This works, and we have done it multiple times. But of course, try to follow a flexible development process; make what you know for sure first, then reevaluate the requirements - scrum etc.
In your case, I would recommend a NoSQL database. I don't have much experience with this kind of database, so I can't recommend a particular implementation, but you may want to look into this option too.
I'm hoping to find out what tools folks use to synchronize data between databases. I'm looking for a JDBC solution that can be used as a command-line tool.
There used to be a tool called Sync4J that used the SyncML framework but this seems to have fallen by the wayside.
I have heard that the Data Replication Service provided by Db4O is really good. It allows you to use Hibernate to back onto an RDBMS. I don't think it supports JDBC, though (http://www.db4o.com/about/productinformation/drs/Default.aspx?AspxAutoDetectCookieSupport=1)
There is an open source project called Daffodil, but I haven't investigated it at all. (https://daffodilreplicator.dev.java.net/)
The one I am currently considering using is called SymmetricDS (http://symmetricds.sourceforge.net/)
There are others, they each do it slightly differently. Some use triggers, some poll, some use intercepting JDBC drivers. You need to decide what technical limitations you are under to determine which one you really want to use.
Wikipedia provides a nice overview of different techniques (http://en.wikipedia.org/wiki/Multi-master_replication) and also provides a link to another alternative DBReplicator (http://dbreplicator.org/).
If you have a model and DAO layer that exists already for your codebase, you can just create your own sync framework, it isn't hard.
Copying data is as simple as:
read an object from database A
remove database metadata (uuid, etc)
insert into database B
Syncing requires some knowledge of what has been synced already. You can either work it out at runtime, by getting a list of uuids from TableInA and TableInB and determining which entries are new, or you can keep a table of items that need to be synced (populated by a trigger on insert/update in TableInA) and run from that. Your tool can be a TimerTask, so the databases are kept in sync at whatever time granularity you desire.
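The runtime variant (compare the uuids on both sides and copy what is missing) can be sketched like this. `UuidSync` and `copyMissing` are hypothetical names, and the two maps stand in for TableInA and TableInB keyed by uuid; in a real tool the reads and writes would go through your DAO layer. Note that it only picks up new rows, not updates or deletes.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class UuidSync {
    // Copies every record whose uuid exists in source but not in target,
    // and returns the uuids that were copied.
    static <V> Set<String> copyMissing(Map<String, V> source, Map<String, V> target) {
        // Set difference: uuids in source minus uuids already in target.
        Set<String> missing = new LinkedHashSet<>(source.keySet());
        missing.removeAll(target.keySet());
        for (String uuid : missing) {
            target.put(uuid, source.get(uuid));
        }
        return missing;
    }
}
```

Calling `copyMissing` from a `TimerTask` gives the periodic sync described above; a second run right after the first copies nothing.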
However there is probably some tool out there that does it all without any of this implementation faff, and each implementation would be different based on business needs anyway. In addition at the database level there will be replication tools.
True synchronization requires some data that I hope your database schema has (you can read the SyncML doc to see how they proceed). Sync4J won't help you much, it's really high-level and XML oriented. If you don't foresee any conflicts (which means: really easy synchronisation), you could try with a lightweight ETL like Enhydra Octopus.
I'm primarily using Oracle at the moment, and the most full-featured route I've come across is Red Gate's Data Compare:
http://www.red-gate.com/products/oracle-development/data-compare-for-oracle/
This old blog gives a good summary of the solution routes available:
http://www.novell.com/coolsolutions/feature/17995.html
The JDBC-specific offerings I've come across have been very basic. The solution mentioned by Aidos seems the most feature complete if you want to go down the publish-subscribe route:
http://symmetricds.codehaus.org/
Hope this helps.