I have imported nodes using the JDBC importer but am unable to figure out auto_index support. How do I get automatic indexing?
The tool you link to does give instructions for indexing, but I've never used it and it doesn't seem to be up to date. I would recommend you use one of the importing tools listed here. You can convert your comma-separated file to tab-separated and use this batch importer, or one of the neo4j-shell tools; both support automatic indexing.
If you want to use a JDBC driver, for instance with some data transfer tool like Pentaho Kettle, there are instructions and links on the Neo4j import page, first link above.
I know from another question that you use regular expressions heavily, and it is possible that the 'automatic index', which is backed by Lucene, may work very well for that, since you can query the index with a regexp directly. But if you want to index your nodes by their labels (the new type of index in 2.0), then you don't need to set up indexing before importing. You can create a schema index at any time and it is populated in the background. If that's what you want, you can read the documentation about working with indexes from the Java API and Cypher.
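For example, a 2.0 schema index can be created with a single Cypher statement at any time, and Neo4j populates it in the background (label and property names here are just illustrative):

```cypher
// Create a schema index on the name property of all :Person nodes
CREATE INDEX ON :Person(name)

// Queries like this can then use the index automatically
MATCH (p:Person) WHERE p.name = 'Ann' RETURN p
```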
Related
I have recently started using the Jackcess library in Java for dealing with MS Access databases. The library is pretty good but I have a question regarding searching rows.
Consider that I have "Jack loves apples" in a row of a column named X, what piece of code would I use to search for all the rows of X containing the word "apples"? I know this can be easily done using wildcards in SQL but since there is no way to use SQL queries in Jackcess, that's not a valid option.
I considered using UCanAccess, but I have issues with the library: even if I use the "memory=false" option while loading the database, it still takes almost 1.4 GB of memory.
@centic's answer was accurate until Jackcess version 3.5.0. As of the 3.5.0 release, you can use the new PatternColumnPredicate class to do various wildcard/pattern/regex searches using Cursors.
With Jackcess you need to iterate the rows and apply the filter yourself. As long as your filter is reasonably simple, this should be easy to build.
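A sketch of that filter: the snippet below keeps only the whole-word matching logic and uses a plain list to stand in for the column values, so it stays self-contained; with Jackcess you would read each value inside the row loop instead (e.g. via Row#getString):

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class WordFilter {

    // Case-insensitive whole-word match: the same check you would apply to
    // each value of column X while iterating the Jackcess Table's rows.
    static boolean containsWord(String text, String word) {
        if (text == null) {
            return false;
        }
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // Stand-in for the values of column X; with Jackcess these would come
        // from iterating the Table and calling row.getString("X").
        List<String> columnX = Arrays.asList(
                "Jack loves apples", "Banana bread", "APPLES galore");
        for (String value : columnX) {
            if (containsWord(value, "apples")) {
                System.out.println(value); // prints the two rows containing "apples"
            }
        }
    }
}
```

The word-boundary `\b` anchors avoid false hits on substrings such as "pineapples", which a plain `contains` check would match.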
I have an existing web application which uses Lucene to create indexes. Now, as per the requirements, I have to set up Solr, which will serve as a search engine for many other web applications including my web app. I do not want to create indexes within Solr; instead, I need to tell Solr to read the existing Lucene indexes rather than creating and reading its own.
As a Solr beginner, I first used Nutch to create indexes and then used those indexes within Solr. But I am unaware how to make Solr read indexes created directly with Lucene, and I did not find any documentation around this. Kindly advise how to achieve this.
It is not possible in any reliable way.
It's like saying you built an application in Ruby and now want to use Rails on top of the existing database structure. Solr (like Rails) has its own expectations about naming and workflows around the Lucene core, and there is no migration path.
You can try using Luke to confirm the internal data structures differences between Lucene and Solr for yourself.
I have never done that before, but since Solr is built on Lucene, you can try these steps; dataDir is the main point here.
I am assuming you are deploying it in /usr/local, so change the paths accordingly, and that you have basic knowledge of Solr configuration.
Download Solr and copy dist/apache-solr-x.x.x.war to tomcat/webapps
Copy example/solr/conf to /usr/local/solr/
Set solr.home to /usr/local/solr
In solrconfig.xml, change dataDir to /usr/local/solr/data (Solr looks for the index directory inside it)
Change schema.xml accordingly, i.e. adjust the fields to match your existing index
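The relevant solrconfig.xml fragment would look roughly like this (the path matches the layout assumed in the steps above):

```xml
<!-- solrconfig.xml: point Solr at the directory that contains the index/ folder -->
<dataDir>/usr/local/solr/data</dataDir>
```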
I am implementing an auto-suggest facility for a search input box using CQ5.5.
This article on Predictive Search mentions a search/suggestion component in AEM (5.6), which seems to be present in CQ5.5, but missing the com.day.cq.search.suggest.impl.SuggestionIndexManager service dependencies it requires.
Is it possible to add this facility through some add-on package or alternative CQ5.5 feature?
The underlying Lucene suggest API does not seem to be exposed, but perhaps there is some Jackrabbit API that I could use?
It is available out of the box starting with CQ/AEM 5.6. For 5.5 (and even 5.4, IIRC) it is available to customers as a feature pack ("cq search suggestions"). Please contact daycare or the usual channels.
The way it works is that it stores an auto-complete word index in the repository (an optimized JCR structure is used here, no Lucene et al.). To populate this index, an API can be used that takes words and their frequencies, e.g. based on how often search terms are actually entered by end users (Google-style; this only works well if you have many searches going on).
Another way, and the way to build an initial index, is also provided: it reads the custom Lucene index maintained by Jackrabbit.
I have a query which does ILIKE on some 11 string or text fields of a table which is not big (500,000 rows), but for ILIKE it is obviously too big: the search query takes around 20 seconds. The database is Postgres 8.4.
I need to make this search much faster.
What came to my mind:
I added an extra tsvector column assembled from all the columns that need to be searched, and created a full-text index on it. The full-text search was quite fast. But... I cannot map this tsvector type in my .hbm files, so this idea fell through (in any case I thought of it more as a temporary solution).
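For reference, the tsvector approach in plain SQL looks like this (table and column names here are illustrative); the mapping problem is only on the Hibernate side:

```sql
-- Combined full-text column over the searchable fields
ALTER TABLE docs ADD COLUMN search_vec tsvector;
UPDATE docs SET search_vec =
    to_tsvector('english', coalesce(col1, '') || ' ' || coalesce(col2, ''));
CREATE INDEX docs_search_idx ON docs USING gin(search_vec);

-- Index-backed search instead of ILIKE
SELECT * FROM docs WHERE search_vec @@ to_tsquery('english', 'apples');
```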
Hibernate Search (I heard about it for the first time today). It seems promising, but I'd like an experienced opinion on it, since I don't want to get into a new API, possibly not the simplest one, for something that could be done more simply.
Lucene
In any case, this has happened now with this table, but I would like the solution to be more generic and applicable to future full-text search cases.
All advice appreciated!
Thanks
I would strongly recommend Hibernate Search, which provides a very easy-to-use bridge between Hibernate and Lucene. Remember you will be using both here. You simply annotate the properties on your domain classes which you wish to be able to search over. Then, when you update/insert/delete an entity which is enabled for searching, Hibernate Search updates the relevant indexes. This only happens if the transaction in which the database changes occur is committed, i.e. if it's rolled back the indexes are left untouched.
So to answer your questions:
Yes, you can index specific columns on specific tables. You also have the ability to tokenize the contents of a field so that you can match on parts of it.
It's not hard to use at all: you simply work out which properties you wish to search on, tell Hibernate where to keep its indexes, and then use the EntityManager/Session interfaces to load the entities you have searched for.
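A minimal sketch of the mapping (the entity and field names are made up; the annotations are Hibernate Search's):

```java
// Illustrative sketch only: marking an entity and one property as indexed.
// Index updates then ride along with the usual Hibernate commits.
@Entity
@Indexed
public class Product {

    @Id
    @GeneratedValue
    private Long id;

    // Tokenized so queries can match individual words in the description
    @Field(index = Index.TOKENIZED)
    private String description;
}
```

Once the entity is annotated, a FullTextSession obtained via Search.getFullTextSession(session) can run Lucene queries that return managed entities.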
Since you're already using Hibernate and Lucene, Hibernate Search is an excellent choice.
What Hibernate Search primarily provides is a mechanism to keep your Lucene indexes updated when data changes, plus the ability to leverage what you already know about Hibernate to simplify your searches against the Lucene indexes.
You'll be able to specify which fields in each entity you want indexed, as well as add multiple types of indexes as needed (e.g., stemmed and full text). You'll also be able to manage the index graph for associations, so you can make fairly complex queries through Search/Lucene.
I have found that it's best to rely on Hibernate Search for the text heavy searches, but revert to plain old Hibernate for more traditional searching and for hydrating complex object graphs for result display.
I recommend Compass. It's an open source project built on top of Lucene that provides a simpler API (than Lucene's). It integrates nicely with many common Java libraries and frameworks such as Spring and Hibernate.
I have used Lucene in the past to index database tables. The solution works great, but remember that you need to maintain the index: either you update the index every time your objects are persisted, or you have a daemon indexer that dumps the database tables into your Lucene index.
Have you considered Solr? It's built on top of Lucene and offers automatic indexing from a DB and a REST API.
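That automatic DB indexing is done with Solr's DataImportHandler, configured in a file like the sketch below (connection details, table and field names here are placeholders; the handler itself is registered via a requestHandler in solrconfig.xml):

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="dbuser" password="dbpass"/>
  <document>
    <!-- One Solr document per row of the query -->
    <entity name="item" query="SELECT id, name FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```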
A year ago I would have recommended Compass. It was good at what it did, and technically it still happily runs along in the application I developed and maintain.
However, there's no more development on Compass, with efforts having switched to ElasticSearch. From that project's website I cannot quite determine whether it's ready for the big time yet, or indeed actively maintained.
So I'm switching to Hibernate Search, which doesn't give me quite as good a feeling, but that migration is still in its initial stages, so I'll reserve judgement for a while longer.
All of these projects are based on Lucene. If you want to implement very advanced features, I advise you to use Lucene directly. If not, you may use Solr, which is a powerful API on top of Lucene that can help you index and search a DB.
Our web application dynamically generates tables and relations. It also generates indexes at a basic level. We are looking for a MySQL profiler that will be able to suggest indexes. We have come across these two profilers:
MySQL Jet Profiler: will not tell us which index or covering index will do the job.
QOT: will not work on a live database; basically you have to give the query and schema in "diff" files.
Neither of these will do the job. Any ideas?
QOT supports server-side schemata; check out the latest Launchpad version. Also, it never required "diff" files: you just had to provide normal SQL scripts, like the ones generated by mysqldump. Now both ways (scripts and a live connection) are supported.
BR,
Vladimir