What is the Use of Lucene? - java

I have heard the name Lucene many times; it shows up most of the time when I look for details about web crawlers. What is Lucene used for?

Lucene is a search engine library designed to address the problem of performing keyword search over a large number of documents. It works by processing the documents to extract all of the words and then building an inverted index, which maps each word to the documents that contain it. This index allows the search engine to quickly identify the documents containing the user's search term or terms, rank them, and return them to the user.
Lucene supports a variety of advanced features such as phrase queries, wildcard queries and proximity queries (i.e. "cat" near "dog"), search for keywords within particular "fields" (e.g. subject, author) and so on.
Basically, it is one of the ways to add text search capability to document management applications of various kinds.
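As a rough illustration of the inverted index idea described above, here is a minimal sketch in plain Java. It is not Lucene's implementation or API: the class name InvertedIndexSketch, the crude tokenizer, and the AND-only search are all assumptions made for this example, and there is no ranking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy class, not part of Lucene.
class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Lowercases the text and splits it on anything that is not a letter or digit. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Indexes one document: records its id under every term it contains. */
    public void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns the ids of documents that contain every term of the query. */
    public SortedSet<Integer> search(String query) {
        SortedSet<Integer> result = null;
        for (String term : tokenize(query)) {
            SortedSet<Integer> docs =
                    postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs);   // first term: start from its postings
            } else {
                result.retainAll(docs);         // later terms: intersect
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "The cat sat on the mat");
        index.add(2, "A dog chased the cat");
        index.add(3, "Dogs bark");
        System.out.println(index.search("cat"));      // prints [1, 2]
        System.out.println(index.search("dog cat"));  // prints [2]
    }
}
```

The lookup never scans the documents themselves; it only reads the entries for the query terms, which is why this kind of search stays fast as the collection grows.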

Lucene is a search engine library. You would use Lucene in a project if you wanted fast indexed search. More details can be found at http://lucene.apache.org/java/docs/index.html

Related

Should I use multiple Lucene directories/indexes to search different types of data?

I have many MySQL tables that store different types of data, such as goods, categories, brands, suppliers, etc. Each of them needs full-text search via Lucene.
So I plan to build one Lucene Directory (and one IndexWriter plus one IndexReader for that Directory) for each table, e.g.
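// "directories" is an illustrative name; FSDirectory.open takes a Path in Lucene 5+ (a File in earlier versions)
Map<String, Directory> directories = new HashMap<>();
directories.put("goods", FSDirectory.open(Paths.get(luceneDirRoot, "goods")));
directories.put("catagories", FSDirectory.open(Paths.get(luceneDirRoot, "catagories")));
...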
Is this a good way to use Lucene?
Furthermore, how can I find out how many directories I have created with Lucene, like the MySQL command "SHOW TABLES"? new File(luceneDirRoot).listFiles() is an option, but I am not sure whether there are other non-Lucene folders.
I would implement one Lucene index per MySQL table, provided you do not need to search across several tables. The alternative would be to write everything into one index and add the table name as a field to each Lucene document; that way you could limit a search to a particular table.
AFAIK Lucene has no equivalent of SHOW TABLES of the kind you want, but you can easily do that yourself, e.g. by using a naming convention for the directories.
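One way to approximate SHOW TABLES without a naming convention is to list only those subdirectories that look like Lucene indexes. The sketch below uses plain java.nio and relies on an assumption about Lucene's on-disk layout, namely that an index directory contains a file whose name starts with "segments". The class and method names are hypothetical. If Lucene is on the classpath, DirectoryReader.indexExists(Directory) is probably the more robust check.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class IndexDirectoryLister {

    /** Assumption: a Lucene index directory holds a regular file named "segments...". */
    public static boolean looksLikeLuceneIndex(Path dir) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "segments*")) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Returns the sorted names of the subdirectories of root that look like indexes. */
    public static List<String> listIndexes(Path root) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(root)) {
            for (Path child : children) {
                if (Files.isDirectory(child) && looksLikeLuceneIndex(child)) {
                    names.add(child.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Pass luceneDirRoot as the first argument; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(listIndexes(root));
    }
}
```

Folders without a segments file, such as a log directory under the same root, are skipped.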
I would recommend looking at Hibernate Search, which is a good match for your needs: it builds one index directory per table and lets you perform full-text search while handling the low-level Lucene issues for you. You just configure the index by annotating the JPA entities corresponding to your tables and implement the full-text queries. This is much easier than using bare Lucene with data from MySQL on your own; Hibernate Search builds the index for you and integrates well with data from a relational DB such as MySQL.

Hibernate Search Result Ranking

I am using Hibernate Search along with Lucene to implement full-text search on my database. I want to know whether a Hibernate Search query or a Lucene query returns the top-ranked, most relevant results. The documentation says:
Apache Lucene provides a very flexible and powerful way to sort
results. While the default sorting (by relevance) is appropriate most
of the time
Link: http://docs.jboss.org/hibernate/search/4.2/reference/en-US/html_single/#search-query
Section: 5.1.3.3. Sorting
But I am very confused by the results, as they are always ordered by the IDs of the objects. I just need the top 100 most relevant records.
See Customizing Lucene's scoring formula
Sorting by relevance is affected by your Analyzer choices. If you are getting results in primary key order, they probably all have the same score. That is normally very unlikely, so my guess is that you're not enabling tokenization on any searched field.
Make sure you're tokenizing the fields used in the Query and they are using an appropriate Analyzer. To pick an appropriate one you'll have to experiment a bit as it depends on the language (if it's natural language) or on what kind of data you're indexing.
To actually debug the sort order applied by Relevance sort, see usage of Projections in the Hibernate Search documentation: both FullTextQuery.SCORE and FullTextQuery.EXPLANATION can be very useful to understand what's going on.
A handy utility for quickly experimenting with the effect of different Analyzers is org.hibernate.search.util.AnalyzerUtils. You can either write unit tests that create the Analyzer instance yourself, or retrieve analyzers by name using org.hibernate.search.engine.SearchFactory.getAnalyzer(String), or get the base one used for a specific indexed entity by entity type: org.hibernate.search.engine.SearchFactory.getAnalyzer(Class).
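To see why identical scores can look like "sorted by ID", consider this small plain-Java sketch. It does not use Hibernate Search or Lucene: Hit is a hypothetical stand-in for a search hit, and the sketch assumes that hits with equal scores keep the order in which they were stored, which here is ID order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical toy class, not part of Hibernate Search or Lucene.
class TopHitsSketch {

    /** Stand-in for a search hit: an entity id plus a relevance score. */
    static final class Hit {
        final long id;
        final float score;

        Hit(long id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Returns the ids of the n best hits, highest score first; ties keep input order. */
    public static List<Long> topIds(List<Hit> hits, int n) {
        return hits.stream()
                // sorting an ordered stream is stable, so equal scores stay in input order
                .sorted(Comparator.comparingDouble((Hit h) -> h.score).reversed())
                .limit(n)
                .map(h -> Long.valueOf(h.id))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Every hit scores the same, e.g. because the searched field was not tokenized.
        List<Hit> flat = Arrays.asList(new Hit(1, 1.0f), new Hit(2, 1.0f), new Hit(3, 1.0f));
        System.out.println(topIds(flat, 3));    // prints [1, 2, 3]

        // Scores differ, so relevance decides the order.
        List<Hit> ranked = Arrays.asList(new Hit(1, 0.2f), new Hit(2, 0.9f), new Hit(3, 0.5f));
        System.out.println(topIds(ranked, 2));  // prints [2, 3]
    }
}
```

If the scores projected with FullTextQuery.SCORE turn out to be identical for all results, the first case is the one to suspect.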

Content searching in a web application

I want to do a content search in my database. The requirement is that it has to work like a Google search, completely based on Ajax. Can you suggest a framework, an architecture, or any other idea?
Example:
The Employee table contains the employee's first name, last name, middle name and email. I have to search the table by providing a value for any one of these fields, and the details of the matching employee should be shown.
Consider using an index-based search engine. Apache Lucene is immensely popular, fast and well documented.
There are different parts to your question:
1) Ajax library: you could use jQuery, which provides simple Ajax methods: http://api.jquery.com/jQuery.ajax/
2) On the server end there are a couple of options, depending on the type of database you're using. Is it relational or NoSQL? Is your choice of database flexible, or is it set in stone?
Lucene provides a query language for an index, which suits more complex information and search requirements. But if your use case is as simple as the one above, you might just fire off different SQL queries (assuming your database is relational).
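For the simple SQL route mentioned above, a minimal sketch could build one parameterized LIKE query across the searchable columns. The table name employee, the column names, and the choice of '!' as the escape character are assumptions for illustration. The column list must come from a fixed list in your code, never from user input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only.
class EmployeeSearchSql {

    /** Wraps the term in % wildcards, escaping LIKE metacharacters with '!'. */
    public static String likePattern(String term) {
        String escaped = term.replace("!", "!!")
                             .replace("%", "!%")
                             .replace("_", "!_");
        return "%" + escaped + "%";
    }

    /** Builds a query with one "column LIKE ?" condition per column, joined by OR. */
    public static String buildSql(List<String> columns) {
        String where = columns.stream()
                .map(c -> c + " LIKE ? ESCAPE '!'")
                .collect(Collectors.joining(" OR "));
        return "SELECT * FROM employee WHERE " + where;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("first_name", "middle_name", "last_name", "email");
        System.out.println(buildSql(columns));
        System.out.println(likePattern("smith"));  // prints %smith%
    }
}
```

With JDBC, the resulting string would go to Connection.prepareStatement, and the same pattern would be bound to each placeholder with PreparedStatement.setString. Note that a leading % wildcard usually prevents the database from using an ordinary index, which is one reason to move to Lucene as the data grows.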

Using Hibernate-Search for Complex Queries instead of Criteria API

In an app using Wicket+Spring+JPA/Hibernate stack, I have an Inbox/Search page which should have quite complex search capabilities, where records saved in a database are filtered using a myriad of filtering options. So far I've used JPA Criteria API to build the database query but it's getting quite messy. I was wondering if Hibernate-Search would be a good fit for this even though I don't really need any full-text search capabilities, I just feel (from what I read about it) that producing the query might be a bit easier?
Sorry, but Hibernate Search is based on Lucene. It is not just another query language.
Lucene does not search for entities in your database; it searches for attributes in the Lucene index.
Hibernate Search adds the functionality to connect the entities in your database to the Lucene index.
Hibernate Search and Lucene are great tools when you need advanced full-text search. But if you don't need it, they are only a lot of unnecessary work (and problems).
So, as long as you do not need Lucene, Hibernate Search does not fit your needs.
The primary use case for Hibernate Search is fulltext search. However, it can also be used to index/search simple attributes/criteria. Whether the syntax for writing the queries is simpler than a criteria query is a matter of taste.
If you are not using the fulltext search capabilities you have to consider that you are adding an additional step in your application. The search query will be run against the Lucene index which will return entity ids (unless projection is used). The matching entities will then be fetched from the database.
On the other hand, once you use Hibernate Search it is easy to "improve" your search by adding some fulltext search capabilities to some of your criteria (if possible).
Whether or not you are using Search, I think the key is to write some sort of framework which programmatically builds your queries - Search or Criteria queries.

Should I use Lucene only for search?

Our website needs to give out data to the world. This is open-source data that we have stored, and we want to make it publicly available. It's about 2 million records.
We've implemented the search of these records using Lucene, which is fine, however we'd like to show an individual record (say the user clicks on it after the search is done) and provide more detailed information for that record.
This more detailed information, however, isn't stored in the index directly. There are many-to-many relationships, and we use our relational database (MySQL) to provide this information.
For example, a single record belongs to a category, and we want the user to be able to click on that category and see the rest of the records within it (there are many more associations like this).
My question is, should we use Lucene also to store this sort of information and retrieve it through simple search (category:apples), or should MySQL continue doing this logical job? Should I use Lucene only for the search part?
EDIT
I would like to point out that all of our records are pretty static.... changes are made to this data once every week or so.
Lucene's strength lies in rapidly building an index of a set of documents and allowing you to search over them. If this "detailed information" does not need to be indexed or searched over, then don't store it in Lucene.
Lucene is not a database, it's an index.
You want to use Lucene to store data? I think that's OK. I've used Solr (http://lucene.apache.org/solr/), which is built on top of Lucene, both as a search engine and to store additional data related to each record for front-end display. It worked with 500k records for me, and I think it should be fine with 2 million records.
