I am implementing an auto-suggest facility for a search input box using CQ5.5.
This article on Predictive Search mentions a search/suggestion component in AEM 5.6. The component seems to be present in CQ 5.5 as well, but the com.day.cq.search.suggest.impl.SuggestionIndexManager service it depends on is missing.
Is it possible to add this facility through some add-on package or alternative CQ5.5 feature?
The underlying Lucene suggest API does not seem to be exposed, but perhaps there is some Jackrabbit API that I could use?
It is available out of the box starting with CQ/AEM 5.6. For 5.5 (and even 5.4, IIRC) it is available to customers as a feature pack (CQ Search Suggestions). Please contact Daycare or the usual channels.
The way it works is that it stores an auto-complete word index in the repository (an optimized JCR structure is used here, no Lucene et al.). To populate this index, an API can be used by passing words and their frequencies, e.g. based on how often search terms are actually entered by end users (Google-style; this only works well if you have many searches going on).
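To give a rough idea of the input this takes, here is a minimal, self-contained sketch of how the (word, frequency) pairs might be derived from a query log; the feature pack's actual API for feeding in these pairs is not public, so it is not shown:

    import java.util.HashMap;
    import java.util.Map;

    public class SuggestTermFrequencies {
        public static void main(String[] args) {
            // Simulated query log; in practice this would come from your
            // search analytics.
            String[] queryLog = { "lucene", "lucene suggest", "lucerne", "lucene" };

            // Count how often each word was searched for; these
            // (word, frequency) pairs are what the suggestion index is fed.
            Map<String, Integer> freq = new HashMap<>();
            for (String query : queryLog) {
                for (String word : query.toLowerCase().split("\\s+")) {
                    freq.merge(word, 1, Integer::sum);
                }
            }
            freq.forEach((w, f) -> System.out.println(w + " -> " + f));
        }
    }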
Another way, intended for building an initial index, is also provided: it reads the custom Lucene index maintained by Jackrabbit.
Related
I have an existing web application which uses Lucene to create indexes. Now, as per a new requirement, I have to set up Solr, which will serve as a search engine for many other web applications including my own. I do not want to create the indexes again within Solr; instead, I need to tell Solr to read the existing Lucene indexes rather than building and reading its own.
As a beginner with Solr, I first used Nutch to create indexes and then used those indexes within Solr. But I am unaware of how to make Solr read indexes created by Lucene, and I did not find any documentation on this. Kindly advise how to achieve this.
It is not possible in any reliable way.
It's like saying you built an application in Ruby and now want to use Rails on top of the existing database structure. Solr (like Rails) has its own expectations about naming and workflows around the Lucene core, and there is no migration path.
You can try using Luke to confirm the internal data-structure differences between Lucene and Solr for yourself.
I have never done that before, but since Solr is built on Lucene, you can try these steps; dataDir is the main point here.
I am assuming you are deploying it in /usr/local (change accordingly) and have basic knowledge of Solr configuration.
Download Solr and copy dist/apache-solr-x.x.x.war to tomcat/webapps
Copy example/solr/conf to /usr/local/solr/
Set solr.home to /usr/local/solr
In solrconfig.xml, change dataDir to /usr/local/solr/data (Solr looks for the index directory inside it; see the snippet below)
Change schema.xml accordingly, i.e. you need to change the fields to match those in your existing index.
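For the dataDir step, the relevant element in solrconfig.xml looks like this (using the path assumed above):

    <!-- solrconfig.xml: Solr expects the Lucene files in the "index"
         subdirectory of this path -->
    <dataDir>/usr/local/solr/data</dataDir>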
This is going to be a bit open-ended initially. My team has a requirement to utilize Oracle Label Security (OLS). Because we would like to enable "fast" search capabilities (Solr/Lucene), how can we correctly retrieve data that is indexed in Lucene/Solr while respecting the OLS policy in place?
One way you can integrate external systems like your OLS is Solr's PostFilter interface. A very good write-up of how to use this has been published in the article Custom security filtering in Solr by Erik Hatcher.
Basically you have a hook after all search and filtering has been done. There you can open a connection to your database and filter the search results according to the user's access rights.
To speed this up, you should consider placing some security-relevant artifacts into your index, which you can then apply as an ordinary filter. That way you do a pre-filtering pass, so that you do not overwhelm the PostFilter.
Currently there is nothing pre-built by the community, but you could kick something off on GitHub.
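As a starting point, here is a minimal sketch of the PostFilter approach; checkOlsAccess(...) is a hypothetical placeholder for a lookup against the OLS-protected database, and exact class names and signatures vary between Solr versions:

    import java.io.IOException;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.solr.search.DelegatingCollector;
    import org.apache.solr.search.ExtendedQueryBase;
    import org.apache.solr.search.PostFilter;

    public class OlsPostFilter extends ExtendedQueryBase implements PostFilter {

        @Override
        public boolean getCache() {
            return false; // post filters must not be cached
        }

        @Override
        public int getCost() {
            // a cost of 100 or more tells Solr to run this after
            // all other filters
            return Math.max(super.getCost(), 100);
        }

        @Override
        public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
            return new DelegatingCollector() {
                @Override
                public void collect(int doc) throws IOException {
                    // only pass documents the current user may see
                    // on to the next collector
                    if (checkOlsAccess(doc)) {
                        super.collect(doc);
                    }
                }
            };
        }

        private boolean checkOlsAccess(int doc) {
            // hypothetical: resolve the document's label and compare it
            // with the user's OLS clearance, e.g. via a JDBC lookup
            return true;
        }
    }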
I have imported nodes using the JDBC importer, but I am unable to figure out auto_index support. How do I get automatic indexing?
The tool you link to does give instructions for indexing, but I've never used it and it doesn't seem to be up to date. I would recommend you use one of the importing tools listed here. You can convert your comma-separated file to tab-separated and use this batch importer, or one of the neo4j-shell tools; both support automatic indexing.
If you want to use a JDBC driver, for instance with some data transfer tool like Pentaho Kettle, there are instructions and links on the Neo4j import page, first link above.
I know from another question that you use regular expressions heavily, and it is possible that the 'automatic index', which is a Lucene index, may be very good for that, since you can query the index with a regexp directly. But if you want to index your nodes by their labels, using the new type of index in 2.0, then you don't need to set up indexing before importing. You can create an index at any time and it is populated in the background. If that's what you want, you can read the documentation about working with indexes from the Java API and Cypher.
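For the label-based index, creating one after the import is straightforward from the Java API; here is a minimal sketch using the Neo4j 2.0-era API, with a made-up User label and name property:

    import org.neo4j.graphdb.DynamicLabel;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    public class CreateSchemaIndex {
        public static void main(String[] args) {
            GraphDatabaseService db =
                    new GraphDatabaseFactory().newEmbeddedDatabase("data/graph.db");
            try (Transaction tx = db.beginTx()) {
                // Index :User(name); it is populated in the background,
                // so this also works after the import has finished.
                db.schema()
                  .indexFor(DynamicLabel.label("User"))
                  .on("name")
                  .create();
                tx.success();
            }
            db.shutdown();
        }
    }

The Cypher equivalent is CREATE INDEX ON :User(name).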
I have a query which does ILIKE on some 11 string or text fields of a table which is not big (500,000 rows), but obviously too big for ILIKE; the search query takes around 20 seconds. The database is Postgres 8.4.
I need to implement this search to be much faster.
What came to my mind:
I made an additional TSVECTOR column assembled from all the columns that need to be searched, and created a full-text index on it. The full-text search was quite fast. But... I cannot map this TSVECTOR type in my .hbm files, so this idea fell through (in any case, I thought of it more as a temporary solution).
Hibernate Search. (I heard about it for the first time today.) It seems promising, but I would like an experienced opinion on it, since I don't want to get into a new API, possibly not the simplest one, for something which could be done more simply.
Lucene
In any case, this has happened now with this table, but I would like the solution to be more generic and applicable to future cases involving full-text search.
All advice appreciated!
Thanks
I would strongly recommend Hibernate Search, which provides a very easy-to-use bridge between Hibernate and Lucene. Remember, you will be using both here. You simply annotate the properties on your domain classes which you wish to be able to search over. Then, when you update/insert/delete an entity which is enabled for searching, Hibernate Search simply updates the relevant indexes. This only happens if the transaction in which the database changes occurred was committed, i.e. if it is rolled back the indexes will not be broken.
So to answer your questions:
Yes, you can index specific columns on specific tables. You also have the ability to tokenize the contents of a field so that you can match on parts of it.
It's not hard to use at all: you simply work out which properties you wish to search on, tell Hibernate Search where to keep its indexes, and then use the EntityManager/Session interfaces to load the entities you have searched for (see the sketch below).
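A minimal sketch of what that looks like, assuming a made-up Product entity with a searchable description field (Hibernate Search 3.x-era annotations and query API, matching the Hibernate/Lucene versions of the time):

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import org.hibernate.search.annotations.Field;
    import org.hibernate.search.annotations.Indexed;

    @Entity
    @Indexed // Hibernate Search maintains a Lucene index for this entity
    public class Product {

        @Id
        @GeneratedValue
        private Long id;

        @Field // tokenized by default, so parts of the text can be matched
        private String description;

        // getters/setters omitted
    }

Querying then goes through a FullTextSession wrapped around the normal Hibernate Session:

    import java.util.List;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.util.Version;
    import org.hibernate.search.FullTextSession;
    import org.hibernate.search.Search;

    // inside a DAO method, given an open Hibernate Session "session":
    FullTextSession fullTextSession = Search.getFullTextSession(session);
    QueryParser parser = new QueryParser(Version.LUCENE_30, "description",
            new StandardAnalyzer(Version.LUCENE_30));
    org.apache.lucene.search.Query luceneQuery = parser.parse("fast search");
    List<Product> results = fullTextSession
            .createFullTextQuery(luceneQuery, Product.class)
            .list();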
Since you're already using Hibernate and Lucene, Hibernate Search is an excellent choice.
What Hibernate Search primarily provides is a mechanism to keep your Lucene indexes updated when data changes, plus the ability to leverage what you already know about Hibernate to simplify your searches against the Lucene indexes.
You'll be able to specify which fields in each entity you want indexed, as well as add multiple types of indexes as needed (e.g., stemmed and full text). You'll also be able to manage the index graph for associations, so you can run fairly complex queries through Search/Lucene.
I have found that it's best to rely on Hibernate Search for the text heavy searches, but revert to plain old Hibernate for more traditional searching and for hydrating complex object graphs for result display.
I recommend Compass. It's an open-source project built on top of Lucene that provides a simpler API than raw Lucene. It integrates nicely with many common Java libraries and frameworks such as Spring and Hibernate.
I have used Lucene in the past to index database tables. The solution works great, but remember that you need to maintain the index: either you update the index every time your objects are persisted, or you have a daemon indexer that dumps the database tables into your Lucene index.
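For the daemon-indexer variant, a minimal sketch against the Lucene 3.x API (the JDBC URL, table, and column names are made up):

    import java.io.File;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class TableIndexer {
        public static void main(String[] args) throws Exception {
            IndexWriterConfig config = new IndexWriterConfig(
                    Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36));
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:postgresql://localhost/mydb", "user", "secret");
                 IndexWriter writer = new IndexWriter(
                         FSDirectory.open(new File("/var/index/products")), config)) {
                writer.deleteAll(); // full re-dump of the table
                Statement st = conn.createStatement();
                ResultSet rs = st.executeQuery("SELECT id, description FROM product");
                while (rs.next()) {
                    Document doc = new Document();
                    doc.add(new Field("id", rs.getString("id"),
                            Field.Store.YES, Field.Index.NOT_ANALYZED));
                    doc.add(new Field("description", rs.getString("description"),
                            Field.Store.YES, Field.Index.ANALYZED));
                    writer.addDocument(doc);
                }
            }
        }
    }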
Have you considered Solr? It's built on top of Lucene and offers automatic indexing from a DB and a REST API.
A year ago I would have recommended Compass. It was good at what it did, and technically it still happily runs along in the application I developed and maintain.
However, there's no more development on Compass, with efforts having switched to ElasticSearch. From that project's website I cannot quite determine if it's ready for the Big Time yet or even actually alive.
So I'm switching to Hibernate Search, which doesn't give me quite as good a feeling, but that migration is still in its initial stages, so I'll reserve judgement for a while longer.
All of these projects are based on Lucene. If you want to implement very advanced features, I advise you to use Lucene directly. If not, you may use Solr, which is a powerful API on top of Lucene that can help you index and search from a DB.
There are a few threads floating around on the topic, but I think my use-case is somewhat different.
What I want to do:
Full text search component for my GAE/J app
The index size is small: 25-50MB or so
I do not need live updates to the index, a periodic re-indexing is fine
This is for auto-complete and the like, so it needs to be extremely fast (I get the impression that implementing an inverted index in Datastore introduces considerable latency)
My strategy so far (just planning, haven't tried implementing anything yet):
Use Lucene with RAMDirectory
A periodic cron job creates the index, serializes it to the Datastore, stores an update id (or timestamp)
Search servlet loads the index on startup and creates the RAMDirectory
On each request the servlet checks the current update id and reloads the index as necessary (sketched below)
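A minimal sketch of the load step, assuming each Lucene index file is stored as a datastore entity of a made-up kind IndexFile, keyed by file name, with the bytes in a data property (Lucene 3.x plus the low-level datastore API):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.IndexOutput;
    import org.apache.lucene.store.RAMDirectory;
    import com.google.appengine.api.datastore.Blob;
    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.Query;

    public class IndexLoader {
        public static IndexSearcher load() throws Exception {
            DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
            RAMDirectory dir = new RAMDirectory();
            // Copy every serialized index file into the in-memory directory.
            for (Entity file : ds.prepare(new Query("IndexFile")).asIterable()) {
                byte[] bytes = ((Blob) file.getProperty("data")).getBytes();
                IndexOutput out = dir.createOutput(file.getKey().getName());
                out.writeBytes(bytes, bytes.length);
                out.close();
            }
            return new IndexSearcher(IndexReader.open(dir));
        }
    }

Note that a single datastore entity is limited to about 1 MB, so index files larger than that would have to be split across several entities.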
The main thing I'm fuzzy on is how to synchronize in-memory data between instances - will this work, or am I missing something?
Also, how far can I push it before I start having problems with memory use? I couldn't find anything on RAM quotas for GAE. (This index is small, but I can think of more stuff I'd like to add)
And, of course, any thoughts on better approaches?
If you're okay with periodic rebuilds, and your index is small, your current approach sounds mostly okay. Instead of building the index online and serializing it to the datastore, though, why not build it offline, and upload it with the app? Then, you can instantiate it directly from the disk store, and to push an update, you deploy a new version of your app.
Recently GAE added a "text search" service. Take a look at GAE Java Text Search.
For autocomplete, perhaps you could store the top N matches for each prefix (basically what you'd put in the drop-down menu) in memcache? The memcache entities could be backed by entities in the datastore and reloaded if needed.
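A minimal sketch of that idea (loadTopMatchesFromDatastore is a hypothetical helper standing in for the datastore-backed reload):

    import java.util.Collections;
    import java.util.List;
    import com.google.appengine.api.memcache.MemcacheService;
    import com.google.appengine.api.memcache.MemcacheServiceFactory;

    public class PrefixSuggester {
        @SuppressWarnings("unchecked")
        public static List<String> suggest(String prefix) {
            MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
            List<String> matches = (List<String>) cache.get(prefix);
            if (matches == null) {
                // Cache miss: reload the precomputed top-N list for this
                // prefix from the datastore, then cache it.
                matches = loadTopMatchesFromDatastore(prefix);
                cache.put(prefix, matches);
            }
            return matches;
        }

        private static List<String> loadTopMatchesFromDatastore(String prefix) {
            // Hypothetical placeholder: query a datastore kind holding the
            // precomputed top-N completions per prefix.
            return Collections.emptyList();
        }
    }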
Well, as of GAE 1.5.0, it looks like resident Backends can be used to create a search service.
Of course, there's no free quota for these.
App Engine now includes a full-text search API (Experimental): https://developers.google.com/appengine/docs/java/search/