Lucene query object and search - java

I am upgrading from Lucene 3.6 to 5.3.0, but the search doesn't want to take my parameters when using 5.3.0.
This works in 3.6:
IndexSearcher searcher = new IndexSearcher(IndexReader.open(directory));
SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_36);
QueryParser parser = new QueryParser(Version.LUCENE_36, "contents",
analyzer);
TopDocs topDocs = null;
Query query = parser.parse(queryString);
topDocs = searcher.search(query, 1000);
But in 5.3, the compiler is asking me to use SrndQuery, but I still get an error on the searcher.search method:
IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(directory));
Analyzer analyzer = new SimpleAnalyzer();
QueryParser parser = new QueryParser();
TopDocs topDocs = null;
SrndQuery query = QueryParser.parse(queryString);
topDocs = searcher.search(query, 1000); // Error: The method search(Query, int) in the type IndexSearcher is not applicable for the arguments (SrndQuery, int)
Not sure what I am doing wrong here. Any ideas?
P.S. I am upgrading because I am not able to get Highlighted text from some PDFs I recently indexed.

It bears stating that you are using the Surround query parser (org.apache.lucene.queryparser.surround.parser.QueryParser) rather than the standard query parser. If you intended to use the standard parser, you are importing the wrong class; the classic one is org.apache.lucene.queryparser.classic.QueryParser.
The problem you are running into is that a SrndQuery isn't really a Lucene Query, so you can't pass it to the searcher and get results. You need to transform it into a Lucene Query first, which is done via the SrndQuery.makeLuceneQueryField method. You'll need to create a BasicQueryFactory to pass into it, but those are easy to construct:
SrndQuery query = QueryParser.parse(queryString);
BasicQueryFactory factory = new BasicQueryFactory(1000 /*maxBasicQueries*/);
Query luceneQuery = query.makeLuceneQueryField("myDefaultField", factory);
topDocs = searcher.search(luceneQuery, 1000);
Somewhat tangential note: I wondered whether you should keep the BasicQueryFactory around rather than creating a new one for every search, but that appears to be unnecessary. Nothing expensive happens in the constructor, and it looks like Solr's SurroundQParserPlugin constructs a new one for each query it parses, so doing the same should be fine.

Related

Apache Lucene createWeight() for wildcard query

I'm using Apache Lucene 6.6.0 and I'm trying to extract terms from the search query. Current version of code looks like this:
Query parsedQuery = new AnalyzingQueryParser("", analyzer).parse(query);
Weight weight = parsedQuery.createWeight(searcher, false);
Set<Term> terms = new HashSet<>();
weight.extractTerms(terms);
It works pretty much fine, but recently I noticed that it doesn't support queries with wildcards (i.e. * sign). If the query contains wildcard(s), then I get an exception:
java.lang.UnsupportedOperationException: Query id:123*456 does not implement createWeight
    at org.apache.lucene.search.Query.createWeight(Query.java:66)
    at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:751)
    at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:60)
    at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:225)
So is there a way to use createWeight() with wildcarded queries? Or maybe there's another way to extract search terms from query without createWeight()?
Long story short, it is necessary to rewrite the query, for example, as follows:
final AnalyzingQueryParser analyzingQueryParser = new AnalyzingQueryParser("", analyzer);
// TODO: The rewrite method can be overridden.
// analyzingQueryParser.setMultiTermRewriteMethod(MultiTermQuery.CONSTANT_SCORE_BOOLEAN_REWRITE);
Query parsedQuery = analyzingQueryParser.parse(query);
// Here parsedQuery is an instance of the org.apache.lucene.search.WildcardQuery class.
parsedQuery = parsedQuery.rewrite(reader);
// Here parsedQuery is an instance of the org.apache.lucene.search.MultiTermQueryConstantScoreWrapper class.
final Weight weight = parsedQuery.createWeight(searcher, false);
final Set<Term> terms = new HashSet<>();
weight.extractTerms(terms);
Please refer to the thread:
Nabble: Lucene - Java Users - How to get the terms matching a WildCardQuery in Lucene 6.2?
Mail archive: How to get the terms matching a WildCardQuery in Lucene 6.2?
for further details.
It seems the mentioned Stack Overflow question is this one: How to get matches from a wildcard Query in Lucene 6.2.
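To see why the rewrite step matters, it may help to picture what it does conceptually: a wildcard query has no concrete terms of its own until the pattern is expanded against the terms that actually exist in the index. Below is a minimal plain-Java sketch of that idea. It does not use Lucene at all; the class name, the method names and the sample term dictionary are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Conceptual stand-in for multi-term rewriting; none of these names are Lucene API.
class WildcardExpansionSketch {

    // Translate a pattern where '*' matches any run of characters and '?' matches one character.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Enumerate the terms of a (hypothetical) term dictionary that the pattern matches.
    static List<String> expand(String wildcard, SortedSet<String> termDictionary) {
        Pattern pattern = toRegex(wildcard);
        List<String> matches = new ArrayList<>();
        for (String term : termDictionary) {
            if (pattern.matcher(term).matches()) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed sample terms; in a real index these would come from the reader.
        SortedSet<String> terms =
                new TreeSet<>(Arrays.asList("123456", "1230456", "123457", "999"));
        System.out.println(expand("123*456", terms));
    }
}
```

Without access to the term dictionary (the reader, in the answer above) there is nothing to enumerate, which is roughly why the unrewritten query cannot produce a Weight or a set of terms.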

Apache Lucene 6 QueryParser range query is not working with IntPoint

I'm using the new IntPoint in Lucene 6 and I want to do a range search.
Using IntPoint.newRangeQuery the search works and the correct documents are returned, however when I'm using QueryParser (classic) or the new StandardQueryParser nothing is returned.
// This works
Query query = IntPoint.newRangeQuery("duration",1,20);
System.out.println(query);
//This doesn't work
QueryParser parser = new QueryParser("name", analyzer);
Query query = parser.parse("duration:[1 TO 20]");
System.out.println(query);
//This doesn't work
StandardQueryParser queryParserHelper = new StandardQueryParser();
Query query = queryParserHelper.parse("timestamp:[1 TO 20]", "timestamp");
System.out.println(query);
// In all 3 cases it prints: timestamp:[1 TO 20]
Is this a bug or am I missing something?
It's not a bug, and I wouldn't say you are missing anything, really. QueryParser doesn't have any support for IntPoint fields, or any other numeric (PointValues) field types. Range queries in QueryParser syntax will always generate a TermRangeQuery, which searches that field based on lexicographic order in the inverted index, and that will not work for searching PointValues fields. Generating these queries using IntPoint.newRangeQuery and similar methods is the correct thing to do.
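As a side illustration of what "lexicographic order" means for a range like [1 TO 20], here is a small plain-Java sketch. It does not use Lucene; the class and method names are hypothetical, and String.compareTo only approximates how terms are ordered (which is enough for ASCII digits). Point fields have the additional problem that their values are not stored as terms in the inverted index at all, which this sketch does not model.

```java
// Plain-Java illustration of lexicographic range matching; not Lucene API.
class LexicographicRangeSketch {

    // True if value lies in [lower, upper] when compared as strings, the way terms are ordered.
    static boolean inTermRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    // True if value lies in [lower, upper] when compared as integers.
    static boolean inNumericRange(int value, int lower, int upper) {
        return value >= lower && value <= upper;
    }

    public static void main(String[] args) {
        // "3" falls outside the string range while "100" falls inside it,
        // which is the opposite of the numeric result for both values.
        for (String v : new String[] {"3", "15", "100"}) {
            System.out.println(v
                    + " term-range=" + inTermRange(v, "1", "20")
                    + " numeric-range=" + inNumericRange(Integer.parseInt(v), 1, 20));
        }
    }
}
```

So even for numbers indexed as plain text, a parsed range query would only behave numerically if the terms were padded or encoded so that string order and numeric order agree.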

Lucene: queryparser vs phrasequery or termquery

What are the advantages of not using QueryParser and using PhraseQuery or TermQuery instead? It seems to me you can use QueryParser to replace any of those.
For example, if I want to search for an exact phrase, I can do:
String searchString = "\"word1 word2\"";
QueryParser queryParser = new QueryParser(Version.LUCENE_46,"content", analyzer);
Query query = queryParser.parse(searchString);
or if I want to search for 2 terms, I can do
String searchString = "word1* AND word2*";
QueryParser queryParser = new QueryParser(Version.LUCENE_46,"content", analyzer);
Query query = queryParser.parse(searchString);
Currently, I am only using queryparser and it is working for me, but is this the correct way of using Lucene?
The main disadvantage of not using QueryParser is the following (it applies especially when using Solr/Elastic):
When you create a TermQuery directly, something like this:
Query q = new TermQuery(new Term("text", "keyword"));
you have to apply the analyzers/filters manually. Say the user types KeyWord: if you pass it straight into a TermQuery and the field was lowercased at indexing time, you will not find anything. Lowercasing is simple, but do you really want to reimplement stemming, n-gramming, and so on in your own code instead of relying on the existing functionality of analyzers/filters?

lucene BooleanQuery.Builder Build doesn't Work

Hello guys, I have a question :)
I create a BooleanQuery like this:
BooleanQuery.Builder qry = new BooleanQuery.Builder();
qry.add(new TermQuery(new Term("Name", "Anna")), BooleanClause.Occur.SHOULD);
And if I do a search like this now:
TopDocs docs = searcher.search(qry.build(), hitsPerPage);
it gets zero results. But if I use this code:
TopDocs docs = searcher.search(parser.parse(qry.build().toString()), hitsPerPage);
then I get the right results. Can you explain to me why I have to parse it again?
I am using version 5.5.0, and Name is a TextField.
A TextField runs your data through an analyzer and will likely produce the term "anna" (lowercase). A TermQuery does not run anything through an analyzer, so it searches for "Anna" (uppercase) and this does not match. Create the TermQuery with the lowercased term and you should see results: new TermQuery(new Term("Name", "anna")).
The BooleanQuery has nothing to do with this. In fact, this particular query would rewrite itself to the underlying TermQuery, since that is its only subquery.
The parser takes the string "Name:Anna" (produced by the TermQuery), runs it through the analyzer and gives you a "Name:anna" TermQuery, that's why it works if you run the query through the parser – it involves the necessary analyzing step.
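The mismatch can be reproduced without Lucene in a few lines. The following is only a toy model, assuming an analyzer that splits on non-alphanumeric characters and lowercases; the class, its methods and the sample document are all hypothetical and are not Lucene API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy inverted index that mimics an analyzed field; none of these names are Lucene API.
class AnalyzedFieldSketch {

    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stand-in for an analyzer: lowercase, then split on anything that is not a letter or digit.
    static String[] analyze(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    // Index time: the text is analyzed, so only lowercased tokens end up in the index.
    void index(int docId, String text) {
        for (String token : analyze(text)) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
            }
        }
    }

    // Behaves like a TermQuery: exact lookup, the term is not analyzed.
    Set<Integer> termLookup(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    // Behaves like going through a query parser: analyze first, then look up each token.
    Set<Integer> parsedLookup(String userInput) {
        Set<Integer> hits = new HashSet<>();
        for (String token : analyze(userInput)) {
            hits.addAll(termLookup(token));
        }
        return hits;
    }

    public static void main(String[] args) {
        AnalyzedFieldSketch idx = new AnalyzedFieldSketch();
        idx.index(1, "Anna Karenina");
        System.out.println("term lookup 'Anna':   " + idx.termLookup("Anna"));
        System.out.println("term lookup 'anna':   " + idx.termLookup("anna"));
        System.out.println("parsed lookup 'Anna': " + idx.parsedLookup("Anna"));
    }
}
```

The exact lookup for "Anna" comes back empty because only "anna" was ever stored, while the analyzed lookup finds the document, mirroring the TermQuery versus parser behaviour described above.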

Lucene - get document ids from term

In Lucene 4.1, I see you can use DirectoryReader.docFreq() to get the number of documents in an index containing a given term. Is there a way to actually get those documents? Either the objects or id numbers would be fine. I think AtomicReader.termDocsEnum() would be useful, but I'm not sure if I can use AtomicReader - I don't see how to create an AtomicReader instance on a given directory.
Why not just search for it?
IndexSearcher searcher = new IndexSearcher(directoryReader);
TermQuery query = new TermQuery(new Term("field", "term"));
TopDocs topdocs = searcher.search(query, numberToReturn);
