Tutorial on faceted search with Java and Lucene 3.6

I'm looking for a tutorial about faceted search using Java and Lucene 3.6. I don't want to use Solr. I need something that describes the steps needed to build the index (with categories, etc.) and how to run the search: which classes and methods to use, and so on. Thanks in advance.

I think these may help you.
The first is the official user guide.
1. http://lucene.apache.org/core/3_6_1/api/contrib-facet/org/apache/lucene/facet/doc-files/userguide.html
2. http://shaierera.blogspot.com/2012/11/lucene-facets-part-1.html
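As orientation before reading the guide, here is a library-free sketch of what facet counting over hierarchical categories computes. It is not the Lucene facet API: the class name, method name, and the slash-separated sample paths are all made up for illustration, and the real module stores categories in a separate taxonomy index rather than in plain strings.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Lucene.
final class CategoryFacetSketch {

    /**
     * Takes the category paths of each matching document and returns, for every
     * path and every ancestor of a path, the number of documents under it.
     */
    static Map<String, Integer> countFacets(List<List<String>> hitsCategories) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> docCategories : hitsCategories) {
            // Collect nodes in a set so a document counts once per node,
            // even if two of its paths share an ancestor.
            Set<String> nodes = new HashSet<>();
            for (String path : docCategories) {
                int slash = -1;
                do {
                    slash = path.indexOf('/', slash + 1);
                    nodes.add(slash < 0 ? path : path.substring(0, slash));
                } while (slash >= 0);
            }
            for (String node : nodes) {
                counts.merge(node, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Given three hits tagged `author/Mark Twain` + `date/2010`, `author/Mark Twain`, and `author/Kurt Vonnegut` + `date/2010`, the sketch reports 3 documents under `author` and 2 under `date`, which is the kind of per-category count a faceted result page shows next to the hits.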

Here is a tutorial for building a quick Lucene search: http://www.hascode.com/2010/03/how-to-build-a-quick-lucene-search/
It might be a bit too basic for you, though.

On the same blog there is a tutorial on faceting with Hibernate Search, so Lucene and JPA are under the hood, but perhaps this is not the combination you're looking for:
http://www.hascode.com/2012/03/hibernate-search-faceting-discrete-and-range-faceting-by-example/
The Lucene 4.x documentation has a whole chapter on this topic, but it is an early release:
http://lucene.apache.org/core/4_0_0-ALPHA/facet/org/apache/lucene/facet/doc-files/userguide.html
If you've found a better source, please keep us up to date :)

Related

Is there a Solr alternative that does not use XML or binaries?

I'm looking for a full-text search framework to search my blog. I have seen the Solr getting-started guide, but I find Solr to be a black-box framework: in my experience, if it raises an error, it will be hard to debug and to know what is happening in its inner code.
So my question is: is there a Java search framework that I can use with pure Java (without XML or binaries)?
Solr is open-source, you can see it here http://lucene.apache.org/solr/
You normally would not need to debug it to begin with. Solr just wraps the Lucene engine in a REST API with a nice web interface, so what would your problem really be?
If you want alternatives, nowadays the best (and perhaps only) alternative with lots of support would be Elasticsearch; you can read more here: http://www.elasticsearch.org/overview/
Have a blast.

neo4j java API quick example using REST

I am trying to get the project from github to work.
It can be found here: https://github.com/neo4j/java-rest-binding
Has anyone put this into a JAR already? I am trying to connect to a local Neo4j store. Any other suggestions would be appreciated. I just want to be able to quickly access node zero.
As a complement to Axel's answer, here is a good tutorial with a lot of examples of Neo4j usage (core APIs, indexing, traversals, Cypher and REST interactions): https://github.com/jimwebber/neo4j-tutorial.
In particular, the Koan11 class illustrates how to call the built-in REST API, and Koan12 shows how to roll your own API via unmanaged extensions.
I recently posted an answer to a similar question here:
Neo4j, REST API, java - cypher queries
From there you can grab my pom.xml and my Java file for your quick example.
Also, you might find this tutorial helpful:
http://thought-bytes.blogspot.com/2013/07/getting-started-with-neo4j-java-rest-heroku.html

Add faceting over multivalued fields to an application using Hibernate Search

We use Hibernate Search in our application, including faceting. Recently we found a big limitation: faceting over fields that can have multiple values doesn't work properly with Hibernate Search. If a document has multiple values for a faceted field (e.g. multiple categories), only one of the values is taken into account.
I can currently think of two solutions:
use bobo-browse (http://code.google.com/p/bobo-browse/)
use Solr (http://lucene.apache.org/solr/)
In both solutions we would continue to maintain the index and run queries with Hibernate Search as before, and run an additional bobo-browse or Solr query for faceting where required (bobo-browse or Solr would use the index in a kind of "read-only" manner). The problem is that we update the index quite often and would like to get really fresh data in faceting queries. Bobo-browse doesn't automatically integrate with Hibernate Search, and I might run into problems keeping the search up to date (e.g. https://groups.google.com/forum/?fromgroups=#!topic/bobo-browse/sn_Efc-YClU). Its documentation looks a bit untidy and incomplete. Solr, on the other hand, seems like a really big thing to add just to get faceting to work properly, and I'm still afraid I might run into problems with updating/refreshing the index.
Do you have any experience in that matter? Any suggestions?
As a Hibernate Search developer, I'd suggest you join us and help implement what you need.
None of us has actually needed multivalued faceting, so we're not really sure which solution to pick either; it seems you have a real need, which is perfect for exploring the alternatives and trying them out.
Hibernate Search already depends on many Solr modules, especially because of the large collection of excellent analysers. I'm confident we could find a way to embed the faceting logic of Solr and package it nicely in our consistent API, without the need to actually start Solr in server mode.
I guess we could do the same with bobo-browse; I'd prefer Solr so as not to add other dependencies, but if bobo-browse proves a superior solution, why not. You can help us with this choice.
What would you get in exchange?
we'll maintain it: compatibility will stay with any future version. hopefully you'll help a bit.
eternal gratitude from other users ;)
rock solid testing from thousands of other users
bugfixes and improvements from ..
a rock star badge on your CV
What is required?
unit tests
documentation updates
sensible code
https://community.jboss.org/wiki/ContributingToHibernateSearch
I also use Bobo Browse in combination with Hibernate Search. I also have the problem with regular updates and the read-only issue. Bobo is not the easiest library out there and I've looked several times at ways to integrate with Hibernate Search and just gave up because of the complexity.
I use timed reloads of the index in order to ensure freshness but that creates a lot of garbage to be collected. Lucene has over time optimized the process of reopening indexreaders, but the Bobo team is not really focused on supporting that. https://linkedin.jira.com/browse/BOBO-31 describes this issue.
The Hibernate Search infrastructure should provide enough flexibility to integrate. Zoie is a real-time indexing system, like Hibernate Search, that is integrated with Bobo: https://linkedin.jira.com/wiki/display/BOBO/Realtime+Faceting+with+Zoie Perhaps it can inspire your efforts.
This is something of a solution to the multi-value facet-count problem for hibernate-search.
Blog: http://outbottle.com/hibernate-search-multivalue-facet-counts/
The blog is complete with a Java Class that can be reused to generate facet-counts for single-value and multi-value fields.
The solution provided is based on the BitSet solution provided here: http://sujitpal.blogspot.ie/2007/04/lucene-search-within-search-with.html
The blog has a Maven project which demonstrates the solution quite comprehensively. The project demonstrates using the hibernate-search faceting API to filter on a date-range AND a 1-to-many (single-value) facet-group AND a many-to-many (multi-value) facet-group combined.
The solution is then invoked to correctly derive facet-counts for each facet-group.
The solution facilitates results similar to this jsFiddle emulation: http://goo.gl/y5C9UO (except that the emulation does not demo the range faceting).
The jsFiddle is part of a larger blog which explores the concept of facet searching in general: http://outbottle.com/understanding-faceted-searching/. If you’re like me and are finding the whole notion of facet-searching quite confusing then this will help.
It may not be the best solution in the world, so feel free to give feedback.
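To make the BitSet idea concrete, here is a minimal sketch of the counting step. It is not the class from the blog, and the names `BitSetFacetCounts`, `count`, and `bits` are hypothetical. It assumes one bitset per facet value, with bit i set when document i carries that value, and one bitset for the documents matching the current query and filters.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not the class published on the blog.
final class BitSetFacetCounts {

    /**
     * @param hits        bits set for the ids of documents matching the query
     * @param valueToDocs for one facet field, each value mapped to the ids of
     *                    the documents that carry it
     * @return for each value, the number of matching documents carrying it
     */
    static Map<String, Integer> count(BitSet hits, Map<String, BitSet> valueToDocs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : valueToDocs.entrySet()) {
            // Clone first: BitSet.and mutates its receiver, and the
            // per-value bitsets are meant to be reused across queries.
            BitSet overlap = (BitSet) entry.getValue().clone();
            overlap.and(hits);
            counts.put(entry.getKey(), overlap.cardinality());
        }
        return counts;
    }

    /** Convenience factory for a bitset with the given document ids set. */
    static BitSet bits(int... docIds) {
        BitSet set = new BitSet();
        for (int id : docIds) {
            set.set(id);
        }
        return set;
    }
}
```

Because a document with several values has its bit set in several bitsets, it contributes to every matching value, which is exactly what the single-value limitation described in the question prevents.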

Searching with Hibernate Confusion

I am a beginner with Hibernate. I have been browsing many web tutorials, which are confusing me. I just want to know which direction to take for searching when using Hibernate.
Some tutorials say to use Hibernate Search with Lucene, others say to use Criteria, and others say to use createSQLQuery.
Can someone guide me on this?
You should start with Hibernate Core: read the tutorial, test the examples, and modify them. This is IMHO the best way to become familiar with any framework.
Lucene is used by Hibernate Search, which is an extension to Hibernate Core that indexes text fields and provides full-text search. Don't get confused by the extension until you have mastered the core functions.

Any Latent Semantic Indexing?

Is there any open source implementation of LSI in Java? I want to use that library for my project. I have seen jLSI but it implements some other model of LSI. I want a standard model.
Have you considered LDA (Latent Dirichlet allocation)? I haven't really either, but I encountered the same problem with LSI recently (patents). From what I understand LDA is a related/more powerful technique. http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation apparently has some links to open-source implementations.
A Google search for "java LSI" leads to a similar question that recommends SemanticVectors, a package built on top of Lucene that is 'similar' to LSI. I don't know if it's closer than the jLSI implementation.
That thread also mentions that LSI is patented and there aren't a lot of implementations of it. So if you need a standard implementation you may have to use a language other than java.
The S-Space Package has an open source version of LSA, with bindings for the LSI document vectors. (Both approaches operate on the same term-document matrix and are equivalent except in the output.) It's a fairly scalable approach that uses the thin-SVD. I've used it to run LSI on all of Wikipedia with no issue (after removing the infrequent terms with less than 5 occurrences).
As Scott Ray mentioned, the SemanticVectors package also has a good LSI implementation that recently switched to using the same thin-SVD (SVDLIBJ), so you might check that out if you haven't before.
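For readers unfamiliar with the SVD step, here is a toy sketch of the idea. It is not how S-Space or SVDLIBJ work internally: it swaps in plain power iteration and recovers only the largest singular value and its right singular vector of a small dense term-document matrix. The class and method names are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of S-Space or SemanticVectors.
final class TopSingularSketch {

    /** Approximates the dominant right singular vector of a (rows = terms, columns = documents). */
    static double[] dominantRightVector(double[][] a, int iterations) {
        int cols = a[0].length;
        double[] v = new double[cols];
        // Unit-length start; assumed not orthogonal to the dominant vector.
        Arrays.fill(v, 1.0 / Math.sqrt(cols));
        for (int it = 0; it < iterations; it++) {
            double[] w = timesTranspose(a, times(a, v)); // w = A^T (A v)
            double length = norm(w);
            if (length == 0.0) {
                break; // A v vanished; keep the previous v
            }
            for (int j = 0; j < cols; j++) {
                v[j] = w[j] / length;
            }
        }
        return v;
    }

    /** Largest singular value, computed as ||A v|| for the unit vector v found above. */
    static double topSingularValue(double[][] a, int iterations) {
        return norm(times(a, dominantRightVector(a, iterations)));
    }

    private static double[] times(double[][] a, double[] v) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < v.length; j++) {
                out[i] += a[i][j] * v[j];
            }
        }
        return out;
    }

    private static double[] timesTranspose(double[][] a, double[] u) {
        double[] out = new double[a[0].length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < out.length; j++) {
                out[j] += a[i][j] * u[i];
            }
        }
        return out;
    }

    private static double norm(double[] x) {
        double sum = 0.0;
        for (double value : x) {
            sum += value * value;
        }
        return Math.sqrt(sum);
    }
}
```

A real LSI pipeline keeps the top k singular triplets of a large sparse matrix, which is why the packages above rely on a dedicated sparse SVD solver rather than a loop like this one.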
A Google search for NLP tools turned up these slides, which I think help ...
I believe that LSA/LSI was patented in 1989, which means the patent should have just expired. Hopefully we will see some nice open source applications soon.
Have you tried the Semantic Vector package?
http://code.google.com/p/semanticvectors/
