Implementing Google like Searching using Hibernate Search and Lucene

I am using Hibernate Search and Lucene for full-text search on the content field of my document database. I have a search text box that takes the user's query, and I have currently fixed the search to phrase matching. I want to support a combination of search types. To explain, say the user wants to search for "United States". With phrase-based search, I get every occurrence of the whole phrase, and individual occurrences of "United" and "States" are ignored. With field matching, I get all results containing the individual query words. My question: is there a direct way to have Hibernate Search apply a phrase query when the user wraps the query in quotation marks (or some other marker), and fall back to word-based matching otherwise? Likewise, if the user enters two query words separated by a Boolean operator, it should apply a Boolean query, and so on. For example:
Example Query | Description
United States | Search for all occurrences of two words: United and States
"United States" | Search for phrase "United States"
United NOT States | Apply Boolean not query on United and States
etc
I want to implement something like Google. I know Google is far too powerful to replicate, but at least a little of it can be done. Is there any built-in functionality in Hibernate Search and Lucene for this, or do I need to give the user some operators, parse the query manually, detect the operators and other symbols, and then build the query based on what I find? Kindly help.

There is nothing like that directly in Hibernate Search, but Lucene has a query parser. For its syntax, have a look at http://lucene.apache.org/core/4_10_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description.
If you are happy with its functionality and syntax you could just pass the user input to the Lucene query parser. If not, you will need to write your own syntax and syntax parser which will translate the query into an appropriate Hibernate Search / Lucene query.
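
If you do end up writing your own parser, the first step might look like the sketch below. It only splits the raw input into plain terms, quoted phrases, and terms negated with NOT, matching the three examples in the question. The class and method names are hypothetical and not part of Hibernate Search or Lucene, and it handles far less syntax than Lucene's own query parser does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not a Hibernate Search or Lucene class.
final class SimpleQuerySplitter {

    /** One parsed piece of the user's input. */
    static final class Clause {
        final String text;
        final boolean phrase;   // true if the user wrote it in double quotes
        final boolean negated;  // true if it was preceded by NOT

        Clause(String text, boolean phrase, boolean negated) {
            this.text = text;
            this.phrase = phrase;
            this.negated = negated;
        }

        @Override
        public String toString() {
            String body = phrase ? "\"" + text + "\"" : text;
            return negated ? "-" + body : body;
        }
    }

    // Either a double-quoted run (group 1) or a run of non-space characters (group 2).
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<Clause> split(String input) {
        List<Clause> clauses = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        boolean negateNext = false;
        while (m.find()) {
            boolean phrase = m.group(1) != null;
            String text = phrase ? m.group(1).trim() : m.group(2);
            if (!phrase && text.equals("NOT")) {
                negateNext = true;  // applies to the next term or phrase
                continue;
            }
            if (!text.isEmpty()) {
                clauses.add(new Clause(text, phrase, negateNext));
            }
            negateNext = false;
        }
        return clauses;
    }
}
```

Each resulting clause could then be turned into a keyword, phrase, or Boolean query, for example with the keyword(), phrase(), and bool() builders of the Hibernate Search query DSL.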

Related

Precise Search with MySQL

Let's suppose we have one table with the columns id and post_text.
And we have the following data in it:
id post_text
Rent hotel Mohali near bus stop.
Hotel mohali
Rent hotel Delhi
Rent hotel Lima , lower prices.
Let's suppose I want to search for the words "Rent mohali"; then I should get only the first result,
i.e.
id post_text
Rent hotel Mohali near bus stop.
because both words, rent and mohali, are present in it.
#mysql, #java, #searching ,#precise search

If you're using MySQL and want flexible searching, you can use the full-text search feature, which is more powerful than LIKE. Be aware that you have to change the structure of your database by adding a full-text index. In addition to ranking results by relevance, full-text search offers more options for filtering the query.
The LIKE keyword, on the other hand, is less powerful and less flexible, but you don't have to change your database structure with indexes or deal with natural language mode. If you go that way, you have to split the search into multiple LIKE conditions to find two things in one search.
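
The LIKE approach described above can be sketched as follows: one LIKE condition per word, joined with AND, so that only rows containing every word match. This is a minimal sketch; the table name posts and the class name are assumptions for illustration, and escaping of % and _ inside the user's input is left out. The pattern values are meant to be bound to the ? placeholders of a PreparedStatement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; "posts" is an assumed table name.
final class LikeQueryBuilder {

    /** Turns "Rent mohali" into the patterns %Rent% and %mohali%. */
    static List<String> likeParams(String userInput) {
        List<String> params = new ArrayList<>();
        for (String word : userInput.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                params.add("%" + word + "%");
            }
        }
        return params;
    }

    /** Builds the SQL with one "post_text LIKE ?" condition per pattern. */
    static String buildSql(List<String> params) {
        StringBuilder sql = new StringBuilder("SELECT id, post_text FROM posts");
        for (int i = 0; i < params.size(); i++) {
            sql.append(i == 0 ? " WHERE " : " AND ").append("post_text LIKE ?");
        }
        return sql.toString();
    }
}
```

Whether the comparison is case-insensitive depends on the collation of the column, so "mohali" matching "Mohali" is not guaranteed by the query itself.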

SQLite Prefix search JAVA Virtual Table [duplicate]

I am using SQLite FTS extension in my iOS application.
It performs well but the problem is that it matches only string prefixes (or starts with keyword search).
i.e.
This works:
SELECT * FROM tablename WHERE columnname MATCH 'searchterm*'
but the following two don't:
SELECT * FROM tablename WHERE columnname MATCH '*searchterm'
SELECT * FROM tablename WHERE columnname MATCH '*searchterm*'
Is there any workaround for this, or any way to use FTS to build a query similar to a LIKE '%searchterm%' query?
EDIT:
As pointed out by Retterdesdialogs, storing the entire text in reverse order and running a prefix search on a reverse string is a possible solution for ends with/suffix search problem, which was my original question, but it won't work for 'contains' search. I have updated the question accordingly.

In my iOS and Android applications, I have shied away from FTS search for exactly the reason that it doesn't support substring matches due to lack of suffix queries.
The workarounds seem complicated.
I have resorted to using LIKE queries, which while being less performant than MATCH, served my needs.

The workaround is to store the reversed string in an extra column. See this link (it's not exactly the same, but it should give an idea):
Search Suffix using Full Text Search

To get it to work for contains queries, you need to store all suffixes of the terms you want to be able to search. This has the downside of making the database really large, but that can be avoided by compressing the data.
SQLite FTS contains and suffix matches
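
The two workarounds from these answers can be sketched as below: reversing a term so that a suffix search becomes a prefix search, and generating all suffixes of a term so that a "contains" search becomes a prefix search on one of them. This is a minimal sketch with hypothetical names; how the values are stored in the FTS table is left open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for preparing values to index.
final class SuffixTerms {

    /** Reversed term: a prefix match on this value is a suffix match on the original. */
    static String reversed(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    /** All suffixes of the term, longest first, e.g. "term" gives term, erm, rm, m. */
    static List<String> suffixes(String term) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < term.length(); i++) {
            out.add(term.substring(i));
        }
        return out;
    }
}
```

With every suffix indexed, a query such as MATCH 'er*' would hit the stored suffix "erm" and so find "term". A term of length n produces n suffixes, which is where the growth in database size comes from.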

Where can we use ElasticSearch and Where can we use MongoDB?

My question is: in which situations should we choose MongoDB, and in which situations should we choose Elasticsearch?

If you want to search for a particular word and you know that word is present in your database, you can use MongoDB directly. If you want to search on partial words, go for Elasticsearch.
Example: if you create a text index on some fields of your documents, MongoDB text search works on whole words. Suppose the field test in your collection has a text index and holds the value "I am testing it". If you pass "testing" to a text search, you get the document that contains the word "testing". But if you search for a fragment such as "tes", you get no data, because text search matches whole (stemmed) words rather than substrings.
If you try the same in Elasticsearch, with a suitable analyzer (for example n-grams) or a prefix query, even partial searches such as "tes" or "testi" return data.
reference: http://blog.mpayetta.com/elasticsearch/mongodb/2016/08/04/full-text-indexing-with-elastic-search-and-mongodb/

Is it possible to create a multivalued polyfield in Solr that will allow custom logic at query time?

I'm working with a pretty niche requirement to model a relational structure within Solr and thought that a custom polyfield would be the most suitable solution to my problem. In short, each record in the index will have a number of embargo and expiry dates for when the content should be considered 'available'. These dates are grouped with another kind of categorisation (let's say by device), so for example, any given item in the index may be available for mobile users between two dates, but only available for desktop users between another two dates.
Much like the currency and the latlon types, I would index the values as a comma separated list representing each availability window, for example:
mobile,2013-09-23T00:00:00Z,2013-09-30T00:00:00Z
So, a single index record could look like
{
id: "1234",
text: ["foobarbaz"],
availability: [
"mobile,2013-09-23T00:00:00Z,2013-09-30T00:00:00Z",
"pc,2013-09-22T00:00:00Z,2013-09-30T00:00:00Z"
]
}
The custom type would do the job of parsing the incoming value and storing it accordingly. Is this a viable solution? How would I approach the custom logic required at query time to filter by device and then make sure that NOW is within the provided dates?
My attempt so far has been based on the Currency field type, but now I've dialled it back to just storing the string in its un-parsed state. If I could prove that the filtering I want is even possible before using the polyfield features, then I'll know if it's worth continuing.
Does anybody else have any experience writing custom (poly)fields, or doing anything similar to what I'm doing?
Thanks!

If you want to be able to filter and search on these ranges, I don't think you'll have much luck storing records like that. It would make more sense to me to have a more structured document, something like:
id: "1234",
text: ["foobarbaz"],
mobileavailabilitystart: "2013-09-23T00:00:00Z",
mobileavailabilityend: "2013-09-30T00:00:00Z",
pcavailabilitystart: "2013-09-22T00:00:00Z",
pcavailabilityend: "2013-09-30T00:00:00Z"
Indexing the full contents of a csv line in Lucene/Solr, in a single field, would allow you to perform full-text searches on it, but would not be a good way to support querying for a specific element of it.
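
With one start and one end field per device, as in the structured document above, the "is NOW inside the window" check can be expressed as two range clauses. The sketch below only builds that filter string. It assumes the fields are indexed as Solr date fields, and the class and method names are hypothetical.

```java
// Hypothetical helper; field names follow the structured document above.
final class AvailabilityFilter {

    /** Builds a filter that matches documents whose window for the device contains NOW. */
    static String forDevice(String device) {
        // Restrict the device name so it cannot inject query syntax.
        if (!device.matches("[a-z]+")) {
            throw new IllegalArgumentException("unexpected device name: " + device);
        }
        return device + "availabilitystart:[* TO NOW] AND "
             + device + "availabilityend:[NOW TO *]";
    }
}
```

For mobile users the resulting string could be passed as a filter query (fq), so it restricts the result set without affecting relevance scoring.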

What is the Use of Lucene?

I have heard the name Lucene many times; it shows up most of the time when I try to find details about web crawlers. What is the use of Lucene?

Lucene is a search engine library designed to address the problem of performing keyword search over a large number of documents. The system works by processing the documents to extract all of the words, and then creating an inverted index that maps each word to the documents containing it. This index allows the search engine to quickly identify the documents containing the user's search term or terms, rank them, and then return them to the user.
Lucene supports a variety of advanced features such as phrase queries, wildcard queries and proximity queries (i.e. "cat" near "dog"), search for keywords within particular "fields" (e.g. subject, author) and so on.
Basically, it is one of the ways to add text search capability to document management applications of various kinds.
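
To make the indexing idea above concrete, here is a toy inverted index. It is a minimal sketch of the general technique only, not of how Lucene is implemented: there is no ranking, no stemming, and no phrase support, and all names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy example; real Lucene indexes are far more elaborate.
final class TinyInvertedIndex {

    // word -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Extracts the words of a document and records the document id under each word. */
    void add(int docId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the ids of the documents that contain every word of the query. */
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String word : query.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue;
            }
            Set<Integer> docs = postings.getOrDefault(word, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);  // copy, so the index itself is not modified
            } else {
                result.retainAll(docs);        // keep only documents that also have this word
            }
        }
        return result == null ? new TreeSet<Integer>() : result;
    }
}
```

A lookup touches only the entries for the query words instead of scanning every document, which is why this kind of index makes keyword search fast.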

Lucene is a search engine. You would use lucene in a project if you wanted a fast indexed search. More details can be found on http://lucene.apache.org/java/docs/index.html
