Query matching multiple Ids? Hibernate Search 5 - java

Hello, this query used to work in Hibernate Search 4.2, but after upgrading to v5 it apparently no longer splits the search terms:
@Indexed
public class Foo {
@DocumentId
private Integer id;
.....
}
.....
QueryBuilder qb = fullTextEntityManager.getSearchFactory()
.buildQueryBuilder().forEntity(Foo.class).get();
org.apache.lucene.search.Query luceneQuery = qb
.keyword()
.onFields("id")
.matching("123 567")
.createQuery();
In v4, Hibernate Search would create a query matching either of the 2 IDs in the example, however in v5, Hibernate Search no longer splits the "123 567" into 2 terms and treats the whole string as a single value. The same type of query seems to yield the old v4 behavior on any other field that's not the DocumentId. I've read the migration guide and I haven't seen any mention of this change of behavior. How would you rewrite this query now?
Can someone shed some light on this? Thank you.

The field mapped as @DocumentId needs to be treated as a special case. Since it's also used for deletion (and updates) of the document within the index, it has to be treated as a single keyword to avoid any ambiguity: no Analyzer will be applied to it.
The QueryBuilder DSL automatically pre-processes the matching() clause with the same Analyzer as used during the indexing pipeline; since the id field is treated as a single, unique keyword the input text is not broken down in this case.
To load entities by id, I'd generally recommend using a traditional Hibernate Criteria query rather than a full-text query.
If you want to use the full-text query to combine it with other full-text restrictions, you can either map the id property to an additional @Field (on top of @DocumentId) and give it a different name, or target each individual id with a TermQuery and combine all the term queries with a BooleanQuery.
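To make the difference between the two behaviours concrete, here is a JDK-only model (not Lucene itself) of exact-term matching on an unanalyzed id field, assuming the id is indexed as its plain string form. It compares matching the raw input as one keyword with splitting the input yourself and OR-ing one exact match per id, which is what a BooleanQuery of SHOULD TermQuery clauses does. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical JDK-only model of exact-term matching on an unanalyzed id field. */
final class IdMatchModel {

    private IdMatchModel() {}

    /** Splits raw user input on whitespace, as you would before building one term per id. */
    static List<String> splitIds(String raw) {
        return Arrays.stream(raw.trim().split("\\s+"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** SHOULD semantics: the document matches if its id equals at least one term. */
    static boolean matchesAny(String indexedId, List<String> terms) {
        return terms.contains(indexedId);
    }

    public static void main(String[] args) {
        String input = "123 567";
        // @DocumentId behaviour in v5: the whole input is a single, unanalyzed term.
        System.out.println(matchesAny("567", Collections.singletonList(input))); // false
        // Split-then-OR behaviour: one exact term per id.
        System.out.println(matchesAny("567", splitIds(input)));                  // true
    }
}
```

In actual Lucene code, each element returned by splitIds(...) would become a new TermQuery(new Term("id", id)) added to a BooleanQuery with BooleanClause.Occur.SHOULD.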
I suspect it's not mentioned in the migration guide as this was never intended to work like it did in v4. The fact that it no longer works as in older versions is likely the consequence of a bug fix.

Related

Hibernate search: Indexed data with Ngram filter and while searching it gives incorrect result due to tokenizing while querying

I have an analyzer with this configuration,
searchMapping//
.analyzerDef(BaseEntity.CUSTOM_SEARCH_INDEX_ANALYZER, WhitespaceTokenizerFactory.class)//
.filter(LowerCaseFilterFactory.class)//
.filter(ASCIIFoldingFilterFactory.class)//
.filter(NGramFilterFactory.class).param("minGramSize", "1").param("maxGramSize", "200");
This is how my entity field is configured
@Field(analyzer = @Analyzer(definition = CUSTOM_SEARCH_INDEX_ANALYZER))
private String bookName;
This is how I create a search query
queryBuilder.keyword().onField(prefixedPath).matching(matchingString).createQuery()
I have one entity with bookName="Gulliver" and another with bookName="xGulliver".
If I search for bookName = xG, I get both entities, whereas I would expect only the one with bookName="xGulliver".
I also looked at the query produced by Hibernate Search:
Executing Lucene query '+(+(+(+(bookName:x bookName:xg bookName:g))))'
I guess the Lucene query above is built using BooleanJunction::must conditions, which means it should match all the conditions.
So why does it return both entities? I don't understand.
I could also override the analyzer at query time, using a KeywordTokenizer instead of the NGramFilterFactory, but then I would have to override it for every field before creating the QueryBuilder. That doesn't look good: I have about 100 indexed fields, some of them dynamic, and I create an individual query for each field.
Is there any other way to override the analyzer in version 5.11, or is this handled more easily in Hibernate Search 6.x?
The Hibernate versions I use are:
hibernate-search-elasticsearch, hibernate-search-orm = 5.11.4.Final
Above Lucene query is prepared using BooleanJunction::must conditions by Lucene I guess which means it should match all the conditions. Still why its giving me both entity data. I dont understand here.
When you create a keyword query using Hibernate Search, the string passed to that query is analyzed, and if there are multiple tokens, Hibernate Search creates a boolean query with one "should" clause for each token. You can see it in "bookName:x bookName:xg bookName:g": there is no "+" sign before "bookName", which means those are not "must" clauses but "should" clauses. A document only needs to match one of them, and "Gulliver" is indexed with the gram "g", so it matches as well.
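The following JDK-only model (not Lucene) reproduces why both entities come back. It lowercases the value and emits every gram of 1 to 200 characters, as the analyzer definition above does; whitespace tokenization and ASCII folding are left out because the sample values are single ASCII words. It then compares SHOULD and MUST matching. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical JDK-only model of n-gram analysis with SHOULD vs MUST matching. */
final class NGramMatchModel {

    private NGramMatchModel() {}

    /** Lowercases the text and emits every n-gram with minGram <= n <= maxGram. */
    static List<String> ngrams(String text, int minGram, int maxGram) {
        String s = text.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < s.length(); start++) {
            for (int n = minGram; n <= maxGram && start + n <= s.length(); n++) {
                grams.add(s.substring(start, start + n));
            }
        }
        return grams;
    }

    /** SHOULD clauses: at least one query gram must be present in the document. */
    static boolean matchesShould(String docValue, String query) {
        Set<String> docGrams = new HashSet<>(ngrams(docValue, 1, 200));
        return ngrams(query, 1, 200).stream().anyMatch(docGrams::contains);
    }

    /** MUST clauses: every query gram must be present in the document. */
    static boolean matchesMust(String docValue, String query) {
        Set<String> docGrams = new HashSet<>(ngrams(docValue, 1, 200));
        return docGrams.containsAll(ngrams(query, 1, 200));
    }

    public static void main(String[] args) {
        System.out.println(ngrams("xG", 1, 200));             // [x, xg, g]
        System.out.println(matchesShould("Gulliver", "xG"));  // true: shares "g"
        System.out.println(matchesMust("Gulliver", "xG"));    // false: no "x"
        System.out.println(matchesMust("xGulliver", "xG"));   // true
    }
}
```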
I can also override the analyzer while querying by having KeywordTokenizer instead of NGramFilterFactory but this is like I have to override for each and every field before creating QueryBuilder which doesnt looks good because then I have to override all index fields which I have about 100 fields and some are dynamic fields and I create individual query for each field.
True, that's annoying.
Is there any other way to override the analyzer in 5.11 version
In 5.11, I don't think there is any other way to override analyzers.
If necessary and if you're using the Lucene backend, I believe you should be able to bypass the Hibernate Search DSL just for this specific query:
1. Get the analyzer you want: something like Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer("myAnalyzerWithoutNGramTokenFilter").
2. Analyze the search terms: call analyzer.tokenStream(...) and use the TokenStream as appropriate. You'll get a list of tokens.
3. Create the Lucene Query: essentially it will be a boolean query with one TermQuery for each token.
4. Pass the resulting Query to Hibernate Search as usual.
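The effect of those four steps can be modelled without Lucene: the search terms go through a hypothetical search-time analyzer that only splits on whitespace and lowercases (no n-gram filter), and each resulting token is then looked up as an exact term among the indexed grams. Requiring every token (MUST) is a choice made for this sketch, not something the steps prescribe; with a single token such as "xg" it makes no difference. All names are illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical JDK-only model: n-gram analysis at index time, plain analysis at query time. */
final class SearchAnalyzerModel {

    private SearchAnalyzerModel() {}

    /** Index-time model: lowercase, then all grams of 1..maxGram characters per word. */
    static Set<String> indexGrams(String value, int maxGram) {
        Set<String> grams = new HashSet<>();
        for (String word : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            for (int start = 0; start < word.length(); start++) {
                for (int n = 1; n <= maxGram && start + n <= word.length(); n++) {
                    grams.add(word.substring(start, start + n));
                }
            }
        }
        return grams;
    }

    /** Search-time model: whitespace split + lowercase, no n-gram filter. */
    static List<String> searchTokens(String query) {
        return Arrays.stream(query.trim().toLowerCase(Locale.ROOT).split("\\s+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    /** One exact term per token, all required: every token must be an indexed gram. */
    static boolean matches(String docValue, String query) {
        Set<String> grams = indexGrams(docValue, 200);
        List<String> tokens = searchTokens(query);
        return !tokens.isEmpty() && grams.containsAll(tokens);
    }

    public static void main(String[] args) {
        System.out.println(searchTokens("xG"));          // [xg]
        System.out.println(matches("Gulliver", "xG"));   // false
        System.out.println(matches("xGulliver", "xG"));  // true
    }
}
```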
or is it handled in some other way in hibernate-search 6.x version in easier way?
It's dead simple in Hibernate Search 6.0.0.Beta4. There are two solutions:
Implicitly: in your mapping, you can specify not only an analyzer (using @FullTextField(analyzer = "myAnalyzer")), but also a "search" analyzer using @FullTextField(analyzer = "myAnalyzer", searchAnalyzer = "mySearchAnalyzer"). The "default" analyzer will be used when indexing, while the "search" analyzer will be used when searching (querying).
Explicitly: at query time, you can override the analyzer on a given predicate by calling .analyzer("mySearchAnalyzer") while building the predicate. There is one example in this section of the documentation.
Note however that dynamic fields are not supported yet in Hibernate Search 6: HSEARCH-3273.

How to exclude or ignore field when building a generic Lucene Query to run against more than 1 Entity

We're in the process of converting our java application from using SOLR/Lucene to Elasticsearch 5.6.6.
In one particular area, we previously built a single Lucene BooleanQuery and ran it against three different entities, looking for matches. We didn't need to specify the entity it would be used against until we ran the actual query.
BooleanQuery luceneQuery = myQueryBuilder.buildMyQuery();
session.createFullTextQuery(luceneQuery, entityOne);
session.createFullTextQuery(luceneQuery, entityTwo);
session.createFullTextQuery(luceneQuery, entityThree);
One sub-query within [luceneQuery] above searched on taxId, which entityOne doesn't have (it has no indexed taxId field) but the other two entities do. It all worked fine and no exceptions were thrown; I believe the unknown/un-indexed field was simply ignored, though I'm not exactly sure how.
Now that we're converting to the Elasticsearch DSL, we need to provide the entity up front, so (for better or worse) I build the query three times, once per entity, like so:
QueryBuilder entityOneQB = session.getSearchFactory().buildQueryBuilder().forEntity(EntityOne.class).get();
QueryBuilder entityTwoQB = session.getSearchFactory().buildQueryBuilder().forEntity(EntityTwo.class).get();
QueryBuilder entityThreeQB = session.getSearchFactory().buildQueryBuilder().forEntity(EntityThree.class).get();
// Create 3 identical queries (except for the entity they target)
Query entityOneQuery = myQueryBuilder.buildMyQuery(entityOneQB);
Query entityTwoQuery = myQueryBuilder.buildMyQuery(entityTwoQB);
Query entityThreeQuery = myQueryBuilder.buildMyQuery(entityThreeQB);
Where buildMyQuery() has a number of sub-queries but the one dealing with taxId looks something like:
qb.bool().should(
qb.keyword()
.onField("taxId")
.matching(taxId)
.createQuery()
);
However, since entityOne doesn't have taxId as an indexed column/field, createQuery() now throws an exception:
SearchException: Unable to find field taxId in EntityOne
My questions are:
Is there some way to tell Lucene to ignore the field if the entity doesn't have it?
If not, is there some way, using the passed in QueryBuilder to determine what the entity is, so that, within the taxId subquery code, I can basically say if (entityType == EntityOne) {return null;} so that this particular sub-query won't be included in the overall query?
Is there some way to tell Lucene to ignore the field if the entity doesn't have it?
Just a clarification: it's Hibernate Search that implements the DSL and throws exceptions, not Lucene. Lucene is the underlying technology, and doesn't perform much validation.
If your goal is to retrieve all three entities in a single result list, and if fields with the same name in different entity types are configured similarly (e.g. field "name" appears in entity 1 and 2, but has the same analyzer), you could simply build a single query and retrieve all three types in that single query. You will have to:
Make sure, when building the single Lucene query, to always use the query builder of an entity type that actually defines the field you're targeting: when targeting taxId, for instance, you can use the query builder for EntityTwo or EntityThree, but not the one for EntityOne. Yes, that's right: you can mix multiple query builders in a single query, as long as fields with the same name are configured similarly in all targeted entities.
Build the FullTextQuery like this: session.createFullTextQuery(luceneQuery, EntityOne.class, EntityTwo.class, EntityThree.class);
If not, is there some way, using the passed in QueryBuilder to determine what the entity is, so that, within the taxId subquery code, I can basically say if (entityType == EntityOne) {return null;} so that this particular sub-query won't be included in the overall query?
No, there is not. You could add a parameter to your method to pass the entity type, though: buildMyQuery(Class<?> type, QueryBuilder queryBuilder) instead of buildMyQuery(QueryBuilder queryBuilder).
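A minimal sketch of that suggestion follows, assuming a hypothetical set of entity types known to index taxId. To stay runnable without Hibernate Search, sub-queries are modelled as strings, and the "name" clause is an invented stand-in for the other sub-queries; in real code each clause would be the Query returned by qb.keyword()...createQuery(), added to the boolean junction only when it is non-null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for the three mapped entities.
class EntityOne {}
class EntityTwo {}
class EntityThree {}

final class MyQueryBuilderSketch {

    // Assumption: only these types map an indexed "taxId" field.
    private static final Set<Class<?>> TYPES_WITH_TAX_ID =
            new HashSet<>(Arrays.asList(EntityTwo.class, EntityThree.class));

    private MyQueryBuilderSketch() {}

    /** Returns the taxId clause, or null when the target type has no such field. */
    static String taxIdClause(Class<?> type, String taxId) {
        if (!TYPES_WITH_TAX_ID.contains(type)) {
            return null; // skip the clause instead of letting createQuery() throw
        }
        return "taxId:" + taxId;
    }

    /** Collects the non-null clauses; each would be a should() clause in real code. */
    static List<String> buildMyQuery(Class<?> type, String name, String taxId) {
        List<String> clauses = new ArrayList<>();
        clauses.add("name:" + name); // hypothetical sub-query present for every type
        String tax = taxIdClause(type, taxId);
        if (tax != null) {
            clauses.add(tax);
        }
        return Collections.unmodifiableList(clauses);
    }

    public static void main(String[] args) {
        System.out.println(buildMyQuery(EntityOne.class, "acme", "42")); // [name:acme]
        System.out.println(buildMyQuery(EntityTwo.class, "acme", "42")); // [name:acme, taxId:42]
    }
}
```

Keeping the set of supported types explicit (rather than inspecting the entity class by reflection) avoids confusing a Java field with an indexed field.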

Spring-data-couchbase - running non ad-hoc parametrized query

Is there any way to execute a parametrized N1QL query with the adhoc flag turned off, using the @Query annotation?
For example, for this query:
@Query("#{#n1ql.selectEntity} WHERE #{#n1ql.filter} and author = $author")
List<Comment> getCommentsByAuthor(@Param("author") String author);
If not, is there any other way to force Couchbase to use a secondary index (in this example, the index on the author field) in annotation queries?
You seem to mix a few things:
Parametrized queries work with the @Query annotation as in your snippet. The documentation mentions it (below the first warning block in this section).
N1QL should pick up secondary indexes automatically, depending on the statement. The subtlety here is the n1ql.filter expression: SDC needs it to limit such queries to the correct set of documents in a heterogeneous bucket.
adhoc is something else: it is about prepared statements. SDC doesn't use that feature, and it only produces N1QL queries with the default value for adhoc (which is still true as far as I know).
If you've logged the query produced by this annotation and run an EXPLAIN on it to effectively see that the index is not picked up, maybe try inverting the two expressions in the WHERE clause?

Class for JPA criteria entity with _ appended to className doesn't exist

I'm trying to convert a simple Play/JPA query to use the criteria API. Below isn't even the query I'm trying to convert; this one's even simpler -- just trying to get something to succeed to begin with.
All the examples I've been finding online expect you to be able to use a class that has _ appended to the class name, much like what I've seen hibernate queries do to table name aliases in the generated SQL. However, I can't get my code to compile this way since there is no class: ExtendedHaulTrain_ (there is however ExtendedHaulTrain)
Is there some kind of annotation I need to add to the ExtendedHaulTrain class? Perhaps I have not been reading deeply enough but the examples I've found so far don't address the issue of the class with the underbar appended.
Here's my code that fails to compile on the last line, specifically on ExtendedHaulTrain_
Query query = JPA.em().createQuery("select DISTINCT(x.trnType) from ExtendedHaulTrain x");
List<String> trainTypes = query.getResultList();
//as criteria query
CriteriaBuilder cb = JPA.em().getCriteriaBuilder();
CriteriaQuery<ExtendedHaulTrain> q = cb.createQuery(ExtendedHaulTrain.class);
Root<ExtendedHaulTrain> xhtRoot = q.from(ExtendedHaulTrain.class);
q.select(xhtRoot.get(ExtendedHaulTrain_.trnType)).distinct(true);
Instead of the metamodel classes (their names end with '_'), you can always reference the attribute by its name as a string (the CriteriaQuery then has to be typed to the selected attribute, e.g. CriteriaQuery<String>):
q.select(xhtRoot.get("trnType")).distinct(true);
As noted in my comment there is a notion of a meta-model class I'd rather avoid. So below is how I converted my existing query to use the criteria API. Again, this is just to get a success under my belt; I'm probably not going to replace this query. Rather I have another more complex query, for which I intend to use the Criteria API; this was just to get some familiarity with the Criteria API -- there will probably be more questions to follow!
/*
Query query = JPA.em().createQuery("select DISTINCT(x.trnType) from ExtendedHaulTrain x");
List<String> trainTypes = query.getResultList();
*/
CriteriaBuilder cb = JPA.em().getCriteriaBuilder();
// the query selects a String attribute, so type it as CriteriaQuery<String>
CriteriaQuery<String> cq = cb.createQuery(String.class);
Root<ExtendedHaulTrain> root = cq.from(ExtendedHaulTrain.class);
cq.select(root.<String>get("trnType")).distinct(true);
List<String> trainTypes = JPA.em().createQuery(cq).getResultList();
I understand that you don't like these metamodels, but they are actually very useful and keep your code on the safe side of type safety (believe me, once you begin writing more queries, you will see the advantage). And you don't have to write them by hand: you can generate them automatically with so-called metamodel generators, which are annotation processors. Hibernate, for example, provides one (the Hibernate JPA 2 Metamodel Generator). Generating them is easy in Eclipse, and also in Maven. I recommend using them.
UPDATE
Beyond avoiding typos in string attribute names such as xhtRoot.get("trynType"), type safety also means that you work with the correct join types. Remember that, unlike named queries, criteria queries are not checked at deployment time. With the metamodel, if you remove a type or use the wrong one in the generic part of a join result (WrongOwner below), for example
Join<WrongOwner, Address> address = pet.join(Pet_.owners).join(Owner_.addresses); // pet is a Root<Pet>
you will find out at compile time.

JPA 2 CriteriaQuery, using a limit

I am using JPA 2. For safety reasons, I work type-safe with CriteriaQuery objects (so I am not looking for solutions based on typed queries and so on).
I recently came across an issue in which I needed to set an SQL LIMIT.
After a lot of searching, I was still not successful in finding a solution.
CriteriaQuery<Product> query = getEntityManager().getCriteriaBuilder().createQuery(Product.class);
Root<Product> product = query.from(Product.class);
query.select(product);
return em.createQuery(query).getResultList();
Can anyone help me?
Define limit and offset on the Query:
return em.createQuery(query)
.setFirstResult(offset) // offset
.setMaxResults(limit) // limit
.getResultList();
From the documentation:
TypedQuery<X> setFirstResult(int startPosition)
Set the position of the first result to retrieve.
Parameters: startPosition - position of the first result, numbered from 0
TypedQuery<X> setMaxResults(int maxResult)
Set the maximum number of results to retrieve.
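setFirstResult and setMaxResults behave like "skip, then limit" over the ordered result list. The JDK-only sketch below models that on an in-memory list (it does not touch JPA) and adds a hypothetical helper that converts a 0-based page index into the offset to pass to setFirstResult. Like JPA, it rejects negative arguments with an IllegalArgumentException.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical JDK-only model of offset/limit paging. */
final class PagingModel {

    private PagingModel() {}

    /** Models setFirstResult(firstResult).setMaxResults(maxResults) on an ordered list. */
    static <T> List<T> page(List<T> rows, int firstResult, int maxResults) {
        if (firstResult < 0 || maxResults < 0) {
            throw new IllegalArgumentException("firstResult and maxResults must be >= 0");
        }
        return rows.stream().skip(firstResult).limit(maxResults).collect(Collectors.toList());
    }

    /** Hypothetical helper: offset for a 0-based page index. */
    static int offsetFor(int pageIndex, int pageSize) {
        return Math.multiplyExact(pageIndex, pageSize);
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(page(rows, 0, 3));                // [1, 2, 3]
        System.out.println(page(rows, offsetFor(2, 3), 3));  // [7]
    }
}
```

Without an orderBy(...) on the CriteriaQuery, the database may return rows in any order, so consecutive pages are only stable if the query is sorted.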
For the sake of completeness, I want to answer the initial question with regards to the JPA Criteria API.
First of all, you might clarify for yourself beforehand when to use JPQL and when to use the Criteria API depending on the use case. There is a nice article on this at the ObjectDB documentation website which states:
A major advantage of using the criteria API is that errors can be detected earlier, during compilation rather than at runtime. On the other hand, for many developers string based JPQL queries, which are very similar to SQL queries, are easier to use and understand.
I recommend this article in general because it describes concisely how to use the JPA Criteria API. There is a similar article about the Query API.
Back to the question:
A CriteriaQuery offers a set of restrictions that are accessible, for instance, through the where() method. As you might intuitively guess, you cannot limit the query to a particular number of results with such a restriction, unless you have a trivial case like restricting on a unique identifier (which would make using the Criteria API pointless). Simply put: a limit is not a criterion and is therefore not covered by that API. See also the old but gold Java EE docs for more details.
Solution
However, you can of course use your CriteriaQuery object as the foundation for an executable JPA query. So first, you create your CriteriaQuery as is:
CriteriaQuery<Product> criteriaQuery =
getEntityManager().getCriteriaBuilder().createQuery(Product.class);
Root<Product> product = criteriaQuery.from(Product.class);
criteriaQuery.select(product);
Then pass the CriteriaQuery to EntityManager.createQuery(CriteriaQuery<T>), which returns a TypedQuery<T>:
TypedQuery<Product> limitedCriteriaQuery = getEntityManager().createQuery(criteriaQuery)
.setMaxResults(resultLimit); // this is the important part
return limitedCriteriaQuery.getResultList();
And that is basically how you should use both APIs according to the documentation and the provided articles.
