Lucene/Hibernate Search - Query associated collections? - java

I'm writing a Seam-based application, making use of JPA/Hibernate and Hibernate Search (Lucene). I have an object called Item that has a many-to-many relation to an object
Keyword. It looks like this (some annotations omitted):
@Indexed
public class Item {
    ...
    @IndexedEmbedded
    private List<Keyword> keywords;
    ...
}

@Indexed
public class Keyword {
    ...
    @Field
    private String value;
    ...
}
I'd like to be able to run a query for all Item objects that contain a particular keyword value. I've set up numerous test objects in my database, and the indexes appear to be created properly. However, when I create and run a query for "keywords.value" = <MY KEYWORD VALUE>, I always get 0 results back.
Does Hibernate Search/Lucene have the ability to run queries of this type? Is there something else I should be doing? Are there additional annotations that I could be missing?

Hibernate Search is perfectly suited to this kind of query, but it can be done in a simpler way.
On your problem: text indexed by Hibernate Search (Lucene) is analysed, and the default analyser applies:
Lower-casing of the input
Splitting into separate terms on whitespace
So if you're defining the queries as a TermQuery (I'm assuming that's what you did, as it's the simplest form), then you have to match against the lower-case form of a single token (with no spaces).
Bearing this in mind, you could also dump all your keywords into a single String field on the Item entity, chaining them into one string separated by whitespace, without needing to map them as separate Keyword entities.
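For illustration, a minimal sketch of such a TermQuery (the session variable and the keyword value "mykeyword" are placeholders; the term is already lower-cased to match what the default analyser indexes):
import java.util.List;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.hibernate.Session;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.Search;

// Sketch only: query the embedded "keywords.value" field with a single lower-case token.
public List<Item> findItemsByKeyword(Session session) {
    FullTextSession fullTextSession = Search.getFullTextSession(session);
    Query luceneQuery = new TermQuery(new Term("keywords.value", "mykeyword"));
    @SuppressWarnings("unchecked")
    List<Item> results = fullTextSession
            .createFullTextQuery(luceneQuery, Item.class)
            .list();
    return results;
}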

Related

Hibernate search: Indexed data with Ngram filter and while searching it gives incorrect result due to tokenizing while querying

I have an analyzer with this configuration:
searchMapping//
        .analyzerDef(BaseEntity.CUSTOM_SEARCH_INDEX_ANALYZER, WhitespaceTokenizerFactory.class)//
        .filter(LowerCaseFilterFactory.class)//
        .filter(ASCIIFoldingFilterFactory.class)//
        .filter(NGramFilterFactory.class).param("minGramSize", "1").param("maxGramSize", "200");
This is how my entity field is configured:
@Field(analyzer = @Analyzer(definition = CUSTOM_SEARCH_INDEX_ANALYZER))
private String bookName;
This is how I create a search query:
queryBuilder.keyword().onField(prefixedPath).matching(matchingString).createQuery()
I have one entity with bookName="Gulliver" and another entity with bookName="xGulliver".
If I search with bookName = xG, I get both entities back, whereas I would expect only the entity with bookName="xGulliver".
I also looked at the query produced by Hibernate Search:
Executing Lucene query '+(+(+(+(bookName:x bookName:xg bookName:g))))'
The Lucene query above is prepared using BooleanJunction::must conditions, I guess, which means it should match all the conditions. So why is it giving me both entities? I don't understand.
I can also override the analyzer while querying by using a KeywordTokenizer instead of the NGramFilterFactory, but then I would have to override it for each and every field before creating the QueryBuilder, which doesn't look good: I have about 100 indexed fields, some of them dynamic, and I create an individual query for each field.
Is there any other way to override the analyzer in version 5.11, or is this handled in some easier way in Hibernate Search 6.x?
The Hibernate Search versions I use are:
hibernate-search-elasticsearch, hibernate-search-orm = 5.11.4.Final
The Lucene query above is prepared using BooleanJunction::must conditions, I guess, which means it should match all the conditions. So why is it giving me both entities? I don't understand.
When you create a keyword query using Hibernate Search, the string passed to that query is analyzed, and if there are multiple tokens, Hibernate Search creates a boolean query with one "should" clause for each token. You can see it here in "bookName:x bookName:xg bookName:g": there is no "+" sign before "bookName", which means those are not "must" clauses, they are "should" clauses.
I can also override the analyzer while querying by using a KeywordTokenizer instead of the NGramFilterFactory, but then I would have to override it for each and every field before creating the QueryBuilder, which doesn't look good: I have about 100 indexed fields, some of them dynamic, and I create an individual query for each field.
True, that's annoying.
Is there any other way to override the analyzer in 5.11 version
In 5.11, I don't think there is any other way to override analyzers.
If necessary, and if you're using the Lucene backend, I believe you should be able to bypass the Hibernate Search DSL just for this specific query:
Get the analyzer you want: something like Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer("myAnalyzerWithoutNGramTokenFilter").
Analyze the search terms: call analyzer.tokenStream(...) and use the TokenStream as appropriate. You'll get a list of tokens.
Create the Lucene Query: essentially it will be a boolean query with one TermQuery for each token.
Pass the resulting Query to Hibernate Search as usual.
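A rough sketch of those four steps, assuming Hibernate Search 5.11 with the Lucene backend; the analyzer name, the bookName field and the Book entity are placeholders taken from the discussion above, not verified code:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.hibernate.search.FullTextSession;

// Sketch only: bypass the Hibernate Search DSL for one query.
public List<?> searchBookName(FullTextSession fullTextSession, String searchString) throws IOException {
    // 1. Get the analyzer you want to use at query time.
    Analyzer analyzer = fullTextSession.getSearchFactory()
            .getAnalyzer("myAnalyzerWithoutNGramTokenFilter");

    // 2. Analyze the search terms into tokens.
    List<String> tokens = new ArrayList<>();
    try (TokenStream stream = analyzer.tokenStream("bookName", searchString)) {
        CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
        stream.reset();
        while (stream.incrementToken()) {
            tokens.add(termAtt.toString());
        }
        stream.end();
    }

    // 3. Build a boolean query with one TermQuery per token.
    BooleanQuery.Builder builder = new BooleanQuery.Builder();
    for (String token : tokens) {
        builder.add(new TermQuery(new Term("bookName", token)), BooleanClause.Occur.MUST);
    }
    Query luceneQuery = builder.build();

    // 4. Pass the resulting query to Hibernate Search as usual.
    return fullTextSession.createFullTextQuery(luceneQuery, Book.class).list();
}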
or is this handled in some easier way in Hibernate Search 6.x?
It's dead simple in Hibernate Search 6.0.0.Beta4. There are two solutions:
Implicitly: in your mapping, you can specify not only an analyzer (using @FullTextField(analyzer = "myAnalyzer")), but also a "search" analyzer using @FullTextField(analyzer = "myAnalyzer", searchAnalyzer = "mySearchAnalyzer"). The "default" analyzer will be used when indexing, while the "search" analyzer will be used when searching (querying).
Explicitly: at query time, you can override the analyzer on a given predicate by calling .analyzer("mySearchAnalyzer") while building the predicate. There is one example in this section of the documentation.
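For illustration, the explicit, query-time override looks roughly like this in the 6.x DSL (a sketch against the 6.0 API; method names shifted slightly across the early betas, and Book/bookName/mySearchAnalyzer are just the names used in this question):
import java.util.List;

import javax.persistence.EntityManager;

import org.hibernate.search.mapper.orm.Search;
import org.hibernate.search.mapper.orm.session.SearchSession;

// Sketch only: override the analyzer on a single match predicate at query time.
public List<Book> searchBooks(EntityManager entityManager, String terms) {
    SearchSession searchSession = Search.session(entityManager);
    return searchSession.search(Book.class)
            .where(f -> f.match()
                    .field("bookName")
                    .matching(terms)
                    .analyzer("mySearchAnalyzer"))
            .fetchHits(20);
}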
Note however that dynamic fields are not supported yet in Hibernate Search 6: HSEARCH-3273.

How to selectively load hibernate entity while still initializing (not loading) lazy collection

Consider the following simple entity model:
class Order {
    int id;
    String description;
    // one-to-one eager load with join column specified
    Detail details;
    // one-to-many lazy load with mappedBy specified
    Collection<Item> items;
}

class Detail {
}

class Item {
    String name;
    // reference to Order
}
Now, let's say the requirement is to load all the orders with item details by some criteria (e.g. description matching something). Simple: I write an HQL query like "from Order where description...". This loads, say, 1000 entities, and the items collection is lazy-loaded. I force-load the items within the session by calling size().
This of course led to an N+1 problem, so I decided to use batch fetching for items. I just added the batch-size annotation on the items collection, and got far fewer queries, as expected.
However, I am not interested in 'detail' at all, but since it is a one-to-one eager load, there is always one query per Order to load it. I simply want to do away with these queries.
To solve this, I tried a select without details, but I am not sure how to include items (the collection) in the query so that it is loaded exactly as if I had selected everything (that is, lazily, so that batch fetching can still kick in on later calls). Some suggestions are to use a join in the where clause, but that initializes my collection with an empty ArrayList (and not with a PersistentBag, as is the case with lazy loading).
Looking for solutions.
One possible solution is the following:
Create a POJO which will contain a query result. Example:
public class OrderResult {
    private String description;
    private String itemName;
    // ... more fields, if any

    public OrderResult(String desc, String itemName) {
        this.description = desc;
        this.itemName = itemName;
    }

    // getters & setters
}
Create a JPQL query using a constructor expression as:
List<OrderResult> resultList = entityManager.createQuery("SELECT NEW OrderResult(o.description, i.name) FROM Order o JOIN o.items i where <condition>", OrderResult.class).getResultList();
So you'll get a list of instances of OrderResult containing only the information you're interested in.
NOTE 1: You're talking about HQL, but HQL is the Hibernate-specific legacy query language. As Hibernate is an implementation of JPA, and you tagged your question with JPA, this solution should work in your environment too.
NOTE 2: In the solution I am using the so-called constructor expression of JPQL, which is defined using NEW in the select clause. The argument to the NEW operator must be a fully qualified class name, e.g. if you put the OrderResult class in the package com.mycompany.myproject.order, then the expression should look like:
SELECT NEW com.mycompany.myproject.order.OrderResult(...) FROM ...
NOTE 3: This is just to give you a hint of how to implement the solution and should be considered pseudocode.

Transpose result of hibernate query into list of POJOs

I have a generic class that contains a runQuery method with the following setup:
public Object runQuery(String query) {
    Query retVal = getSession().createSQLQuery(query);
    return retVal.list();
}
I am trying to figure out how to transpose the returned values into a list of Check objects (List<Check>):
public class Check {
    private int id;
    private String name;
    private String confirmationId;
    // getters and setters
}
Most of the queries that I run are actually stored procedures in MySQL. I know of native queries and result transformers (which, if implemented, would mean I have to change my generic class, and I'd rather not do that).
Any ideas how I can accomplish this with my current setup?
You can find tutorials on ORMs ( What is Object/relational mapping(ORM) in relation to Hibernate and JDBC? ).
Basically, you add annotations to your Check class to tell Hibernate which Java field matches which DB column, write a JPQL query (it looks like SQL), and Hibernate fetches your objects and maps them from the DB to POJOs.
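For example, a minimal JPA/Hibernate mapping for the Check class could look like this (the table and column names are assumptions; adjust them to your schema):
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

// Hypothetical mapping of the Check class to a database table.
@Entity
@Table(name = "checks")
public class Check {

    @Id
    @Column(name = "id")
    private int id;

    @Column(name = "name")
    private String name;

    @Column(name = "confirmation_id")
    private String confirmationId;

    // getters and setters
}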
It's a broad subject, this is a good start: https://www.tutorialspoint.com/hibernate/hibernate_quick_guide.htm
It will require some configuration, but it's worth it. Here's a tutorial on annotation-based configuration: https://www.tutorialspoint.com/hibernate/hibernate_annotations.htm but with less explanation of how ORM works (there's also EclipseLink as an ORM).
Otherwise, you could write your own mapper, which takes the values from a ResultSet and sets them on your class. For a lot of reasons, I would recommend using an ORM rather than this method (except maybe if you have only one class that is stored in the DB, which I doubt).
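If you stay with the existing createSQLQuery setup, each result row comes back as an Object[], so a hand-written mapper could look roughly like this (the column order id, name, confirmation_id is an assumption):
import java.util.ArrayList;
import java.util.List;

// Sketch of a hand-written mapper on top of the existing runQuery-style setup.
@SuppressWarnings("unchecked")
public List<Check> runCheckQuery(String query) {
    List<Object[]> rows = getSession().createSQLQuery(query).list();
    List<Check> checks = new ArrayList<>();
    for (Object[] row : rows) {
        Check check = new Check();
        check.setId(((Number) row[0]).intValue());  // column 1: id
        check.setName((String) row[1]);             // column 2: name
        check.setConfirmationId((String) row[2]);   // column 3: confirmation_id
        checks.add(check);
    }
    return checks;
}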

Hibernate-Search #IndexedEmbedded with HashMap, how to include keyset in index

I'm trying to integrate Hibernate Search into an application. The application entities can have multiple properties that are stored multilingually. This is accomplished by splitting the non-multilingual and multilingual properties into separate entities. An example snippet of this split looks like this (Hibernate annotations omitted, as the database part is working fine):
@Indexed
public class Assignment {
    @DocumentId
    private UUID id;

    @IndexedEmbedded
    private Map<String, AssignmentI18n> i18n;

    // Other properties
}

public class AssignmentI18n {
    @DocumentId
    @FieldBridge(impl = AssignmentI18nBridge.class)
    private AssignmentI18nId id;

    @Field
    private String title;

    @Field
    private String description;

    @Field
    private String requirements;

    public static class AssignmentI18nId {
        private UUID assignmentId;
        private String iso;
    }
}
Now I would like to make this data searchable using Hibernate Search by treating it as a single entity in the index. With the annotations set up this way, that happens, but all entries of the multilingual fields are stored in the same field in the index. Basically my index structure looks like this:
id
i18n.title
i18n.description
i18n.requirements
As all values of the multilingual data are indexed in the same field, I can no longer distinguish which language they belong to. Is there a way to make the index look more like this instead?
id
i18n.nl.title
i18n.en.title
i18n.nl.description
i18n.en.description
i18n.nl.requirements
i18n.en.requirements
Basically I would like to add the HashMap key to the index field name. I've looked into the possibility of treating the map as a field with a custom FieldBridge, but that doesn't seem like the correct approach.
If you want to make the indexed fields look like the ones you describe, use a custom field bridge. That's how you could get this structure, but since your map value is quite complex it would take quite a lot of custom code to create all the fields.
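A minimal sketch of such a bridge, assuming Hibernate Search 5's FieldBridge contract and applied to the i18n map property (the getter names on AssignmentI18n are assumptions):
import java.util.Map;

import org.apache.lucene.document.Document;
import org.hibernate.search.bridge.FieldBridge;
import org.hibernate.search.bridge.LuceneOptions;

// Hypothetical bridge; it would be applied to the map, e.g.
// @Field @FieldBridge(impl = AssignmentI18nMapBridge.class) private Map<String, AssignmentI18n> i18n;
public class AssignmentI18nMapBridge implements FieldBridge {

    @Override
    @SuppressWarnings("unchecked")
    public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
        Map<String, AssignmentI18n> i18n = (Map<String, AssignmentI18n>) value;
        for (Map.Entry<String, AssignmentI18n> entry : i18n.entrySet()) {
            String prefix = name + "." + entry.getKey(); // e.g. "i18n.nl"
            AssignmentI18n translation = entry.getValue();
            luceneOptions.addFieldToDocument(prefix + ".title", translation.getTitle(), document);
            luceneOptions.addFieldToDocument(prefix + ".description", translation.getDescription(), document);
            luceneOptions.addFieldToDocument(prefix + ".requirements", translation.getRequirements(), document);
        }
    }
}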
You could create a feature request for Hibernate Search here. I could imagine that this type of feature would be of general use: basically a way, either via an @IndexedEmbedded option or via an additional annotation, to define how the map key becomes part of the Lucene field name. That said, have you thought about how exactly you would then search this index? Does the user somehow specify a locale, and depending on that locale you target the appropriate fields? Also, how do you deal, in your approach, with configuring different stemmers depending on the language?

solrj: how to store and retrieve List<POJO> via multivalued field in index

My use case is an index which holds titles of online media. The provider of the data associates a list of categories with each title. I am using SolrJ to populate the index via an annotated POJO class
e.g.
#Field("title")
private String title;
#Field("categories")
private List<Category> categoryList;
The associated POJO is
public class Category {
    private Long id;
    private String name;
    ...
}
My question has two parts:
a) Is this possible via SolrJ? The docs only contain an example of @Field using a List of String, so I assume the serialization/marshalling only supports simple types?
b) How would I set up the schema to hold this? I have a naive assumption that I just need to set multiValued=true on the required field and it will all work by magic.
I'm just starting to implement this so any response would be highly appreciated.
The answer is as you thought:
a) You have only simple types available. So you will have a List of the same type, e.g. String. The point is that you can't represent complex types inside a Lucene document, so you won't be able to deserialize them either.
b) The problem is that you are trying to represent relational thinking in a "document store". That will probably only work up to a certain point. If you want to represent categories inside a Lucene document, just use the category name string; it is not necessary to store an id as well.
The only reason to store an id as well is if you want to do a lookup in an RDBMS alongside the search. If you want to do this, you need to make sure that the id and the category name stay soft-linked. This does not work for every 1:n relation. (Every 1:n relation where the related table consists only of required fields is possible; if you have an optional field, you need to put something like a filler "empty" constant in the field, if possible.)
However, if these 1:n relations are not sparse, it is actually possible, provided you maintain the order in which you add fields to the document. So the case with the category relation can probably be represented if you don't sort the lists.
You could implement a method which returns the Category, instantiating it with the values at position 0...n. So the solution would be: if you want the first category, it will be at position 0 of every list related to this category.
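For illustration, the position-aligned approach could be sketched like this (the field names category_ids and category_names, and the Category constructor, are assumptions; both fields would be declared multiValued="true" in the schema):
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.beans.Field;

public class MediaTitle {

    @Field("title")
    private String title;

    // Two parallel multivalued fields; the order of the two lists must stay aligned.
    @Field("category_ids")
    private List<Long> categoryIds = new ArrayList<>();

    @Field("category_names")
    private List<String> categoryNames = new ArrayList<>();

    // Rebuild the Category objects from the position-aligned lists.
    public List<Category> getCategories() {
        List<Category> categories = new ArrayList<>();
        for (int i = 0; i < categoryIds.size(); i++) {
            categories.add(new Category(categoryIds.get(i), categoryNames.get(i)));
        }
        return categories;
    }
}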
