How can I automatically convert all Lucene TermQuery objects to PrefixQuery?

I'm using QueryParser with a StandardAnalyzer to parse a queryString. With this setup, if I search for "key short", it will not match the text "keyboard shortcut".
I think it's because the queryString "key short" gets parsed as BooleanQuery(TermQuery("key"), TermQuery("short")). If I wanted it to match "keyboard shortcut", I'd have to search for "key* short*". I'd like the QueryParser to do this for me automatically, i.e. produce BooleanQuery(PrefixQuery("key"), PrefixQuery("short")) when given the queryString "key short".
Is this the right approach? If so, how should I go about doing this?

I never found a 'proper' solution to this, so I implemented a hack that appends wildcards to the individual words in the raw query and then feeds the result to the QueryParser:
private static final Pattern QUERY_WORD_PATTERN = Pattern.compile("(?<= |^)(?!AND|OR)(\\w+)(?= |$)");
...
String processedQuery = String.format("%s OR %s",
        QUERY_WORD_PATTERN.matcher(queryString).replaceAll("$1*"),
        queryString);
Query query = new QueryParser(CONTENTS_FIELD, analyzer).parse(processedQuery);
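
To see what the preprocessing step does on its own, here is a minimal sketch in plain Java with no Lucene dependency. The class name WildcardRewriter and its method names are made up for illustration. The sketch also tightens the pattern slightly, which is an assumption about the intent and not part of the answer above: the lookahead (?!AND|OR) skips any word that merely starts with AND or OR (for example ORANGE), so the version below adds a word boundary and also treats NOT as an operator.

```java
import java.util.regex.Pattern;

final class WildcardRewriter {
    // Matches a space-delimited word unless it is exactly AND, OR or NOT.
    // The \b keeps words that only start with an operator (e.g. ORANGE) eligible.
    private static final Pattern QUERY_WORD = Pattern.compile(
            "(?<= |^)(?!(?:AND|OR|NOT)\\b)(\\w+)(?= |$)");

    /** Appends '*' to every plain word of the raw query string. */
    static String appendWildcards(String queryString) {
        return QUERY_WORD.matcher(queryString).replaceAll("$1*");
    }

    /** Combines the prefix form with the original query, as in the hack above. */
    static String expand(String queryString) {
        return String.format("%s OR %s", appendWildcards(queryString), queryString);
    }

    public static void main(String[] args) {
        // The resulting string is what would be handed to QueryParser.parse(...).
        System.out.println(expand("key short"));
    }
}
```

Words that are quoted or carry a field prefix (such as name:abc) are not preceded and followed by a space or the string boundary, so the pattern leaves them untouched. Whether that is the desired behaviour depends on the queries you expect.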

Related

Lucene query modification

I have a requirement to modify the values in a Lucene query that is supplied as a string.
I take the Lucene query as input from the user interface and pass it to Elasticsearch.
For example:
Input : name:"abc" and age:26
Output expected: name: "abcmodified" and userage:26
How do I parse and modify a string-formatted Lucene query in Java?

Have you tried looking into org.apache.lucene.queryparser.classic.QueryParser? It has functionality to return a Lucene Query Object from an input string. For example:
String rawQuery = "name:abc AND age:26";
QueryParser parser = new QueryParser(Version.LUCENE_45, null, new WhitespaceAnalyzer(Version.LUCENE_45));
BooleanQuery query = (BooleanQuery) parser.parse(rawQuery);
query.clauses().get(0).setQuery(new TermQuery(new Term("name", "abcmodified")));
query.clauses().get(1).setQuery(new TermQuery(new Term("userage", "26")));
System.out.println(query);
This prints +name:abcmodified +userage:26, which is essentially what you want. You can make the processing smarter with a recursive method that traverses the query and branches on the query type (Boolean, Prefix, Term, Fuzzy, etc.).
Hope this helps!
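
The recursive traversal mentioned above might look roughly like the following. To keep the sketch runnable without Lucene on the classpath, it uses a tiny hypothetical query model (Node, Term, And) and not Lucene's own Query classes; with Lucene you would branch on BooleanQuery, TermQuery and so on in the same way. The rename rules (name gets a "modified" suffix, age becomes userage) are taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class QueryTreeSketch {
    /** Hypothetical stand-in for a parsed query; NOT a Lucene class. */
    interface Node {}

    /** Leaf: a single field:text pair. */
    static final class Term implements Node {
        final String field;
        final String text;
        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }
        @Override public String toString() {
            return field + ":" + text;
        }
    }

    /** Inner node: every child is required. */
    static final class And implements Node {
        final List<Node> children;
        And(List<Node> children) {
            this.children = children;
        }
        @Override public String toString() {
            StringBuilder sb = new StringBuilder();
            for (Node child : children) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                boolean nested = child instanceof And;
                sb.append('+').append(nested ? "(" + child + ")" : child.toString());
            }
            return sb.toString();
        }
    }

    /** Returns a rewritten copy of the tree, recursing into And nodes. */
    static Node rewrite(Node node) {
        if (node instanceof And) {
            List<Node> rewritten = new ArrayList<>();
            for (Node child : ((And) node).children) {
                rewritten.add(rewrite(child));
            }
            return new And(rewritten);
        }
        Term term = (Term) node;
        if (term.field.equals("name")) {
            return new Term("name", term.text + "modified");
        }
        if (term.field.equals("age")) {
            return new Term("userage", term.text);
        }
        return term; // fields without a rule are left unchanged
    }

    public static void main(String[] args) {
        Node query = new And(Arrays.<Node>asList(new Term("name", "abc"), new Term("age", "26")));
        System.out.println(rewrite(query));
    }
}
```

Building a new tree instead of mutating clauses in place is a design choice of the sketch; it keeps the rewrite independent of whether the underlying query objects are mutable.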

Lucene Phrase Query not working

I have a String address = "456 SOME STREET";
which I have to search for in Lucene. I created the index field for it like this:
StringField address = new StringField(Constants.ORGANIZATION_ADDRESS, address, Field.Store.YES);
And I am using a PhraseQuery to search for this string with the code below:
String[] tokens = address.split("\\s+");
PhraseQuery addressQuery = new PhraseQuery(Constants.ORGANIZATION_ADDRESS, tokens);
finalQuery.add(addressQuery, BooleanClause.Occur.MUST);
But it's not giving me any results. I have tried TermQuery as well, but that is not working either. I would really appreciate any help, because I have tried many options and cannot figure out what's wrong.
I have also tried the following.
For indexing:
doc.add(new StringField(Constants.ORGANIZATION_ADDRESS, address,Field.Store.YES));
Searching with a TermQuery:
fullAddressExact = fullAddressExact.toLowerCase();
TermQuery tq = new TermQuery(new Term(Constants.ORGANIZATION_ADDRESS, fullAddressExact));
finalQuery.add(tq, BooleanClause.Occur.MUST);
Even this doesn't give any results. My intention is to get an exact match.

You should probably use TextField, not StringField, when indexing the documents.
StringField indexes the string as is, without breaking it into tokens, so in your example the index will contain the single term "456 SOME STREET". Only a TermQuery with exactly this term (or a PrefixQuery) will retrieve it. The match is case-sensitive, so the lowercased TermQuery in your second attempt will not find it either.
TextField is the standard field for indexing text. It splits the text into tokens (using a Tokenizer) and indexes the words separately, so in your example 456, SOME, and STREET can each be used to find the document.
Read more about it here (a bit old, but relevant).
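
The difference can be illustrated with a toy model in plain Java. This is not Lucene code: the class FieldIndexingSketch and its methods are hypothetical, and the "analyzer" is just lowercasing plus a whitespace split, on the assumption that the real analyzer lowercases as StandardAnalyzer does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class FieldIndexingSketch {
    /** Untokenized field (StringField-like): the whole value is the only indexed term. */
    static Set<String> indexAsSingleTerm(String value) {
        return Collections.singleton(value);
    }

    /** Tokenized field (TextField-like): lowercase, then split on whitespace. */
    static Set<String> indexAsTokens(String value) {
        String[] tokens = value.toLowerCase(Locale.ROOT).trim().split("\\s+");
        return new LinkedHashSet<>(Arrays.asList(tokens));
    }

    /** A term lookup succeeds only on an exact, case-sensitive match with an indexed term. */
    static boolean termMatches(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        String address = "456 SOME STREET";
        Set<String> exact = indexAsSingleTerm(address);
        Set<String> tokens = indexAsTokens(address);
        System.out.println(termMatches(exact, "456 SOME STREET")); // whole value, same case
        System.out.println(termMatches(exact, "456 some street")); // lowercased copy
        System.out.println(termMatches(exact, "SOME"));            // one word of the value
        System.out.println(termMatches(tokens, "some"));           // one token of the analyzed value
    }
}
```

In this model only the first and last lookups succeed, which mirrors the behaviour described above: an untokenized field needs the exact original string, while a tokenized field is found through its individual (normalized) words.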

Lucene: queryparser vs phrasequery or termquery

What are the advantages of not using QueryParser and using PhraseQuery or TermQuery directly? It seems to me that QueryParser can replace either of them.
For example, if I want to search for an exact phrase, I can do:
String searchString = "\"word1 word2\"";
QueryParser queryParser = new QueryParser(Version.LUCENE_46,"content", analyzer);
Query query = queryParser.parse(searchString);
Or, if I want to search for two terms, I can do:
String searchString = "word1* AND word2*";
QueryParser queryParser = new QueryParser(Version.LUCENE_46,"content", analyzer);
Query query = queryParser.parse(searchString);
Currently, I am only using QueryParser and it is working for me, but is this the correct way of using Lucene?

The main disadvantage of not using QueryParser is the following (it applies especially when using Solr/Elasticsearch):
When you create the TermQuery yourself, something like this:
Query q = new TermQuery(new Term("text", "keyword"));
you need to apply the analyzers/filters manually. Say the user types KeyWord: if you pass it straight into a TermQuery and the text was lowercased at indexing time, you will not find anything. Lowercasing is simple, of course, but do you really want to reimplement stemming, n-gramming, etc. in your own code instead of relying on the existing functionality of analyzers/filters?

ElasticSearch Full Text Search

I am trying to run a full-text search with a regular expression using the Elasticsearch Java API. My filter looks like this:
FilterBuilder qFilter = FilterBuilders.regexpFilter("_all", ".*" + text + ".*");
But it only matches a single word, not a phrase. For example:
if the source contains a string like "one two three four five.." and my text string is "two", "our", or "thr", then it works.
But when my realTimeTextIn string is "two three", the full-text search doesn't work. I can't search for more than one word.
What am I missing here?
The rest of the code looks something like this:
FilterBuilder qFilter = FilterBuilders.regexpFilter("_all", ".*"+q+".*");
SearchResponse response = ClientProvider.instance().getClient().prepareSearch(index)
.setTypes(type)
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setPostFilter(qFilter)
.setFrom(0).setSize(250).setExplain(true)
.execute()
.actionGet();
Thanks for your help.

When the text string is empty or null, this join method throws an exception.
You can use the regexp filter like this instead:
FilterBuilder qFilter = FilterBuilders.regexpFilter("_all",(".*"+q+".*").replace(" ", ".*"));
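
The string-building part of this can be sketched and checked in plain Java. RegexpPatternSketch and toContainsPattern are hypothetical names, and returning ".*" for null or blank input is an assumed policy, not something the answer specifies. Two caveats: the sketch does not escape regular-expression metacharacters in the user's text, and java.util.regex is used below only to sanity-check the shape of the pattern; it is not the engine Elasticsearch uses. As far as I know, regexp filters are matched against individual indexed terms, so on an analyzed field such as _all a pattern spanning several words may still not match.

```java
import java.util.regex.Pattern;

final class RegexpPatternSketch {
    /**
     * Builds ".*word1.*word2.*" from free text.
     * Returns ".*" (match anything) for null or blank input instead of throwing.
     */
    static String toContainsPattern(String text) {
        if (text == null || text.trim().isEmpty()) {
            return ".*";
        }
        return ".*" + String.join(".*", text.trim().split("\\s+")) + ".*";
    }

    public static void main(String[] args) {
        String pattern = toContainsPattern("two three");
        System.out.println(pattern);
        // Checked against the whole source string, not against analyzed terms.
        System.out.println(Pattern.matches(pattern, "one two three four five"));
    }
}
```

The returned string would then be passed as the second argument of regexpFilter in place of the inline concatenation.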

That is an interesting question. I found something like phrase queries and phrase matching:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/phrase-matching.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_phrase_search.html
In the Java API we can do this for a query (I tested this):
SearchResponse response = client.prepareSearch(index)
.setTypes(type)
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setFrom(0).setSize(250).setExplain(true).setQuery(QueryBuilders.matchPhraseQuery(field, "one two"))
.execute()
.actionGet();
I'm sorry, but I didn't find any solution.
You can try building a script filter (insert plain JSON into your filter instead of using the Java method) or something called a query filter:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-filter.html
I hope that it helped you a little.
EDIT:
Of course there is a simple solution, but I don't know if it satisfies you.
FilterBuilder qFilter = FilterBuilders.regexpFilter(
        "_all", ".*" + Joiner.on(".*").join(text.split(" ")) + ".*");

I implemented a full-text search like this using a query builder:
QueryBuilder queryBuilder = QueryBuilders.multiMatchQuery(query)
.field("name", 2.0f)
.field("email")
.field("title")
.field("jobDescription", 3.0f)
.type(MultiMatchQueryBuilder.Type.PHRASE_PREFIX);

Lucene: Multiple words in a single term

Let's say I have a doc like
stringfield:123456
textfield:name website stackoverflow
and I build a query in the following manner
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_42);
QueryParser luceneQueryParser = new QueryParser(Version.LUCENE_42, "", analyzer);
Query luceneSearchQuery = luceneQueryParser.parse("textfield:\"name website\"");
it will return the doc as expected, but if I build my query using the Lucene query API
PhraseQuery firstNameQuery = new PhraseQuery();
firstNameQuery.add(new Term("textfield","name website"));
it will not give me any result; I will have to tokenize "name website" and add each token to the PhraseQuery.
Is there any default way in the query API to tokenize the way the parser does when parsing a query string?
Sure, I can do that myself, but why reinvent the wheel if it's already implemented?

You are adding the entire query as a single term to your PhraseQuery. You are on the right track, but when tokenized, that will not be a single term, but rather two. That is, your index has the terms name, website, and stackoverflow, but your query has only one term, "name website", which matches none of those.
The correct way to use a PhraseQuery is to add each term to it separately:
PhraseQuery phrase = new PhraseQuery();
phrase.add(new Term("textfield", "name"));
phrase.add(new Term("textfield", "website"));

When you call:
luceneQueryParser.parse("textfield:\"name website\"");
Lucene tokenizes the string "name website" and gets two terms.
When you create:
new Term("textfield","name website")
Lucene does not tokenize the string "name website"; it uses the whole string as a single term.
That explains the result you described. Also, when you index the document, the field textfield must be indexed and tokenized.
