Apache Lucene createWeight() for wildcard query

Apache Lucene createWeight() for wildcard query - java

I'm using Apache Lucene 6.6.0 and I'm trying to extract terms from the search query. Current version of code looks like this:
Query parsedQuery = new AnalyzingQueryParser("", analyzer).parse(query);
Weight weight = parsedQuery.createWeight(searcher, false);
Set<Term> terms = new HashSet<>();
weight.extractTerms(terms);
It works pretty much fine, but recently I noticed that it doesn't support queries with wildcards (i.e. * sign). If the query contains wildcard(s), then I get an exception:
java.lang.UnsupportedOperationException: Query
id:123*456 does not implement createWeight at
org.apache.lucene.search.Query.createWeight(Query.java:66) at
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:751)
at
org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:60)
at
org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:225)
So is there a way to use createWeight() with wildcarded queries? Or maybe there's another way to extract search terms from query without createWeight()?

Long story short, it is necessary to rewrite the query, for example, as follows:
final AnalyzingQueryParser analyzingQueryParser = new AnalyzingQueryParser("", analyzer);
// TODO: The rewrite method can be overridden.
// analyzingQueryParser.setMultiTermRewriteMethod(MultiTermQuery.CONSTANT_SCORE_BOOLEAN_REWRITE);
Query parsedQuery = analyzingQueryParser.parse(query);
// Here parsedQuery is an instance of the org.apache.lucene.search.WildcardQuery class.
parsedQuery = parsedQuery.rewrite(reader);
// Here parsedQuery is an instance of the org.apache.lucene.search.MultiTermQueryConstantScoreWrapper class.
final Weight weight = parsedQuery.createWeight(searcher, false);
final Set<Term> terms = new HashSet<>();
weight.extractTerms(terms);
Please refer to the thread:
Nabble: Lucene - Java Users - How to get the terms matching a WildCardQuery in Lucene 6.2?
Mail archive: How to get the terms matching a WildCardQuery in Lucene 6.2?
for further details.
It seems the mentioned Stack Overflow question is this one: How to get matches from a wildcard Query in Lucene 6.2.

Related

Elasticsearch multiMatchQuery with wildcard

I have a requirement to search in all the fileds data with wild card in java using elaticsearch spring data in spring boot
Here is my code
String queryString = "362*";
final Query searchQuery = new NativeSearchQueryBuilder().withFilter(QueryBuilders.multiMatchQuery(queryString))
.build();
SearchHits<Measurement> meas = elasticsearchTemplate.search(searchQuery, Measurement.class);
System.out.println(meas.getTotalHits());
i am not able to get results with wildcard.
can someone help here.

multi-match queries is the multi field version of the match query. And like this one, it does not support wildcards but works on analysed text. See the documentation at https://www.elastic.co/guide/en/elasticsearch/reference/7.13/query-dsl-multi-match-query.html and https://www.elastic.co/guide/en/elasticsearch/reference/7.13/query-dsl-match-query.html

How to use JOOQ's parser to extract table names from SQL statements [duplicate]

Using the JOOQ parser API, I'm able to parse the following query and get the parameters map from the resulting Query object. From this, I can tell that there is one parameter, and it's name is "something".
However, I haven't been able to figure out how to determine that the parameter "something" is assigned to a column named "BAZ" and that column is part of the table "BAR".
Does the parser API have a way to get the table/column metadata associated to each parameter?
String sql = "SELECT A.FOO FROM BAR A WHERE A.BAZ = :something";
DSLContext context = DSL.using...
Parser parser = context.parser();
Query query = parser.parseQuery(sql);
Map<String, Param<?>> params = query.getParams();

Starting from jOOQ 3.16
jOOQ 3.16 introduced a new, experimental (as of 3.16) query object model API, which can be traversed, see:
The manual
A blog post about traversing jOOQ expression trees
Specifically, you can write:
List<QueryPart> parts = query.$traverse(
Traversers.findingAll(q -> q instanceof Param)
);
Or, to conveniently produce exactly the type you wanted:
Map<String, Param<?>> params = query.$traverse(Traversers.collecting(
Collectors.filtering(q -> q instanceof Param,
Collectors.toMap(
q -> ((Param<?>) q).getParamName(),
q -> (Param<?>) q
)
)
));
The Collectors.toMap() call could include a mergeFunction, in case you have the same param name twice.
Pre jOOQ 3.16
As of jOOQ 3.11, the SPI that can be used to access the internal expression tree is the VisitListener SPI, which you have to attach to your context.configuration() prior to parsing. It will then be invoked whenever you traverse that expression tree, e.g. on your query.getParams() call.
However, there's quite a bit of manual plumbing that needs to be done. For example, the VisitListener will only see A.BAZ as a column reference without knowing directly that A is the renamed table BAR. You will have to keep track of such renaming yourself when you visit the BAR A expression.

Apache Lucene 6 QueryParser range query is not working with IntPoint

I'm using Lucene 6 new IntPoint and I want to do some range search
Using IntPoint.newRangeQuery the search works and the correct documents are returned, however when I'm using QueryParser (classic) or the new StandardQueryParser nothing is returned.
// This works
Query query = IntPoint.newRangeQuery("duration",1,20);
System.out.println(query);
//This doesn't work
QueryParser parser = new QueryParser("name", analyzer);
Query query = parser.parse("duration:[1 TO 20]");
System.out.println(query);
//This doesn't work
StandardQueryParser queryParserHelper = new StandardQueryParser();
Query query = queryParserHelper.parse("timestamp:[1 TO 20]", "timestamp");
System.out.println(query);
// In all 3 cases it prints: timestamp:[1 TO 20]
Is this a bug or am I missing something?

It's not a bug, and I wouldn't say you are missing anything, really. QueryParser doesn't have any support for IntPoint fields, or any other numeric (PointValues) field types. Range queries in QueryParser syntax will always generate a TermRangeQuery, which will search for that field based on lexicographic order in the inverted index, which will not be work for searching PointValues fields. Generating these using IntPoint.newRangeQuery and similar methods is the correct thing to do.

MongoDB Text Index using Java Driver

Using the MongoDB Java API, I have not been able to successfully locate a full example using text search. The code I am using is this:
DBCollection coll;
String searchString = "Test String";
coll.createIndex(new BasicDBObject ("blogcomments", "text"));
DBObject q = start("blogcomments").text(searchString).get();
The name of my collection that I am performing the search on is blogcomments. creatIndex() is the replacement method for the deprecated method ensureIndex(). I have seen examples for how to use the createIndex(), but not how to execute actual searches with the Java API. Is this the correct way to go about doing this?

That's not quite right. Queries that use indexes of type "text" can not specify a field name at query time. Instead, the field names to include in the index are specified at index creation time. See the documentation for examples. Your query will look like this:
DBObject q = QueryBuilder.start().text(searchString).get();

ElasticSearch - Using FilterBuilders

I am new to ElasticSearch and Couchbase. I am building a sample Java application to learn more about ElasticSearch and Couchbase.
Reading the ElasticSearch Java API, Filters are better used in cases where sort on score is not necessary and for caching.
I still haven't figured out how to use FilterBuilders and have following questions:
Can FilterBuilders be used alone to search?
Or Do they always have to be used with a Query? ( If true, can someone please list an example? )
Going through a documentation, if I want to perform a search based on field values and want to use FilterBuilders, how can I accomplish that? (using AndFilterBuilder or TermFilterBuilder or InFilterBuilder? I am not clear about the differences between them.)
For the 3rd question, I actually tested it with search using queries and using filters as shown below.
I got empty result (no rows) when I tried search using FilterBuilders. I am not sure what am I doing wrong.
Any examples will be helpful. I have had a tough time going through documentation which I found sparse and even searching led to various unreliable user forums.
private void processQuery() {
SearchRequestBuilder srb = getSearchRequestBuilder(BUCKET);
QueryBuilder qb = QueryBuilders.fieldQuery("doc.address.state", "TX");
srb.setQuery(qb);
SearchResponse resp = srb.execute().actionGet();
System.out.println("response :" + resp);
}
private void searchWithFilters(){
SearchRequestBuilder srb = getSearchRequestBuilder(BUCKET);
srb.setFilter(FilterBuilders.termFilter("doc.address.state", "tx"));
//AndFilterBuilder andFb = FilterBuilders.andFilter();
//andFb.add(FilterBuilders.termFilter("doc.address.state", "TX"));
//srb.setFilter(andFb);
SearchResponse resp = srb.execute().actionGet();
System.out.println("response :" + resp);
}
--UPDATE--
As suggested in the answer, changing to lowercase "tx" works. With this question resolved. I still have following questions:
In what scenario(s), are filters used with query? What purpose will this serve?
Difference between InFilter, TermFilter and MatchAllFilter. Any illustration will help.

Right, you should use filters to exclude documents from being even considered when executing the query. Filters are faster since they don't involve any scoring, and cacheable as well.
That said, it's pretty obvious that you have to use a filter with the search api, which does execute a query and accepts an optional filter. If you only have a filter you can just use the match_all query together with your filter. A filter can be a simple one, or a compund one in order to combine multiple filters together.
Regarding the Java API, the names used are the names of the filters available, no big difference. Have a look at this search example for instance. In your code I don't see where you do setFilter on your SearchRequestBuilder object. You don't seem to need the and filter either, since you are using a single filter. Furthermore, it might be that you are indexing using the default mappings, thus the term "TX" is lowercased. That's why when you search using the term filter you don't find any match. Try searching for "tx" lowercased.
You can either change your mapping if you want to keep the "TX" term as it is while indexing, probably setting the field as not_analyzed if it should only be a single token. Otherwise you can change filter, you might want to have a look at a query that is analyzed, so that your query wil be analyzed the same way the content was indexed.
Have a look at the query DSL documentation for more information regarding queries and filters:
MatchAllFilter: matches all your document, not that useful I'd say
TermFilter: Filters documents that have fields that contain a term (not analyzed)
AndFilter: compound filter used to put in and two or more filters
Don't know what you mean by InFilterBuilder, couldn't find any filter with this name.
The query usually contains what the user types in through the text search box. Filters are more way to refine the search, for example clicking on facet entries. That's why you would still have the query plus one or more filters.

To append to what #javanna said:
A lot of confusion can come from the fact that filters can be defined in several ways:
standalone (with a required query, for instance match_all if all you need is the filters) (http://www.elasticsearch.org/guide/reference/api/search/filter/)
or as part of a filtered query (http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query/)
What's the difference you might ask. And indeed you can construct exactly the same logic in both ways.
The difference is that a query operates on BOTH the resultset as well as any facets you have defined. Whereas, a Filter (when defined standalone) only operates on the resultset and NOT on any facets you may have defined (explained here: http://www.elasticsearch.org/guide/reference/api/search/filter/)

To add to the other answers, InFilter is only used with FilterBuilders. The definition is, InFilter: A filter for a field based on several terms matching on any of them.
The query Java API uses FilterBuilders, which is a factory for filter builders that can dynamically create a query from Java code. We do this using a form and we build our query based on user selections from it with checkboxes, options, and dropdowns.
Here is some Example code for FilterBuilders and there is a snippet from that link that uses InFilter as shown below:
FilterBuilder filterBuilder;
User user = (User) auth.getPrincipal();
if (user.getGroups() != null && !user.getGroups().isEmpty()) {
filterBuilder = FilterBuilders.boolFilter()
.should(FilterBuilders.nestedFilter("userRoles", FilterBuilders.termFilter("userRoles.key", auth.getName())))
.should(FilterBuilders.nestedFilter("groupRoles", FilterBuilders.inFilter("groupRoles.key", user.getGroups().toArray())));
} else {
filterBuilder = FilterBuilders.nestedFilter("userRoles", FilterBuilders.termFilter("userRoles.key", auth.getName()));
}
...

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.