We're in the process of converting our java application from Hibernate Search 5 to 6 with an Elasticsearch backend.
For background, see How to do highlighting within HibernateSearch over Elasticsearch, a question we asked when upgrading our highlighting code from the Lucene backend to the Elasticsearch backend, and how it was resolved.
Hibernate Search 6 seems to support using two backends at the same time, Lucene and Elasticsearch, so we'd like to use Elasticsearch for all our queries and Lucene for the highlighting, if that's possible.
Here is basically what we're trying to do:
public boolean matchPhoneNumbers() {
    String phoneNumber1 = "603-436-1234";
    String phoneNumber2 = "603-436-1234";

    LuceneBackend luceneBackend =
            Search.mapping(entityManager.getEntityManagerFactory())
                    .backend().unwrap(LuceneBackend.class);
    Analyzer analyzer = luceneBackend.analyzer("phoneNumberKeywordAnalyzer").get();

    // ... builds a Lucene Query using the analyzer and phoneNumber1 term
    Query phoneNumberQuery = buildQuery(analyzer, phoneNumber1, ...);
    return isMatch("phoneNumberField", phoneNumber2, phoneNumberQuery, analyzer);
}

private boolean isMatch(String field, String target, Query sourceQ, Analyzer analyzer) {
    Highlighter highlighter = new Highlighter(new QueryScorer(sourceQ, field));
    highlighter.setTextFragmenter(new NullFragmenter());
    try {
        String result = highlighter.getBestFragment(analyzer, field, target);
        return StringUtils.hasText(result);
    } catch (IOException e) {
        ...
    }
}
What I've attempted so far is to configure two separate backends in the configuration properties, per the documentation, like this:
properties.setProperty("hibernate.search.backends.elasticsearch.analysis.configurer", "com.bt.demo.search.AnalysisConfigurer");
properties.setProperty("hibernate.search.backends.lucene.analysis.configurer", "com.bt.demo.search.CustomLuceneAnalysisConfigurer");
properties.setProperty("hibernate.search.backends.elasticsearch.type", "elasticsearch");
properties.setProperty("hibernate.search.backends.lucene.type", "lucene");
properties.setProperty("hibernate.search.backends.elasticsearch.uris", "http://127.0.0.1:9200");
The AnalysisConfigurer class implements ElasticsearchAnalysisConfigurer, and CustomLuceneAnalysisConfigurer implements LuceneAnalysisConfigurer. The analyzers are defined twice: once in the Elasticsearch configurer and again in the Lucene configurer.
I don't know why both hibernate.search.backends.elasticsearch.type and hibernate.search.backends.lucene.type are necessary, but if I don't include the lucene.type property I get: Ambiguous backend type: configuration property 'hibernate.search.backends.lucene.type' is not set.
But if I do have both backend types set, I get
HSEARCH000575: No default backend. Check that at least one entity is configured to target the default backend when attempting to retrieve the Lucene backend, like this:
Search.mapping(entityManager.getEntityManagerFactory())
.backend().unwrap(LuceneBackend.class);
And the same error when trying to retrieve the Elasticsearch backend.
I've also added @Indexed(..., backend = "elasticsearch") to my entities, since I want them indexed in Elasticsearch and don't need them in Lucene. I also tried adding a fake entity with @Indexed(..., backend = "lucene"), but it made no difference.
What have I got configured wrong?
I don't know why both hibernate.search.backends.elasticsearch.type and hibernate.search.backends.lucene.type are necessary, but if I don't include the lucene.type property I get: Ambiguous backend type: configuration property 'hibernate.search.backends.lucene.type' is not set.
That's because the backend name is just that: a name. Hibernate Search doesn't infer particular information from it, even if you name your backend "lucene" or "elasticsearch". You could have multiple Elasticsearch backends for all it knows :)
But if I do have both backend properties types set, I get HSEARCH000575: No default backend. Check that at least one entity is configured to target the default backend, when attempting to retrieve the Lucene backend, like:
Search.mapping(entityManager.getEntityManagerFactory())
.backend().unwrap(LuceneBackend.class);
You called .backend(), which retrieves the default backend, i.e. the backend that doesn't have a name and is configured through hibernate.search.backend.* instead of hibernate.search.backends.<somename>.* (see https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#configuration-structure ).
But you are apparently mapping all your entities to named backends, one named elasticsearch and one named lucene. So the default backend simply doesn't exist.
You should call this:
Search.mapping(entityManager.getEntityManagerFactory())
.backend("lucene").unwrap(LuceneBackend.class);
I've also added @Indexed(..., backend = "elasticsearch") to my entities since I wish to have them saved into Elasticsearch
Since you obviously only want to use one backend for indexing, I would recommend reverting that change (keeping @Indexed without setting @Indexed.backend) and simply using the default backend.
In short, remove the @Indexed.backend and replace this:
properties.setProperty("hibernate.search.backends.elasticsearch.analysis.configurer", "com.bt.demo.search.AnalysisConfigurer");
properties.setProperty("hibernate.search.backends.lucene.analysis.configurer", "com.bt.demo.search.CustomLuceneAnalysisConfigurer");
properties.setProperty("hibernate.search.backends.elasticsearch.type", "elasticsearch");
properties.setProperty("hibernate.search.backends.lucene.type", "lucene");
properties.setProperty("hibernate.search.backends.elasticsearch.uris", "http://127.0.0.1:9200");
With this
properties.setProperty("hibernate.search.backend.analysis.configurer", "com.bt.demo.search.AnalysisConfigurer");
properties.setProperty("hibernate.search.backends.lucene.analysis.configurer", "com.bt.demo.search.CustomLuceneAnalysisConfigurer");
properties.setProperty("hibernate.search.backend.type", "elasticsearch");
properties.setProperty("hibernate.search.backends.lucene.type", "lucene");
properties.setProperty("hibernate.search.backend.uris", "http://127.0.0.1:9200");
You don't technically have to do that, but I think it will be simpler in the long term. It keeps the Lucene backend as a separate hack that doesn't affect your whole application.
I also tried adding a fake entity with @Indexed(..., backend = "lucene")
I confirm you will need that fake entity mapped to the "lucene" backend, otherwise Hibernate Search will not create the "lucene" backend.
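As a rough sketch, a minimal placeholder entity could look like the following. The class name is made up, any persisted entity mapped to the "lucene" backend will do, and the persistence import may be javax or jakarta depending on your versions:
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;

// Hypothetical placeholder entity whose only purpose is to make
// Hibernate Search create and configure the "lucene" backend.
@Entity
@Indexed(backend = "lucene")
public class LuceneBackendPlaceholder {

    @Id
    private Long id;
}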
Using Hibernate Search 5.11.3 with programmatic API (no annotations), is there a way to facet on dynamic fields added in a class or field bridge? I don't see any 'facet' config available in FieldMetadataBuilder when using MetadataProvidingFieldBridge.
I have tried various combinations of luceneOptions.addSortedDocValuesFieldToDocument() and luceneOptions.addFieldToDocument() in the set() method. This successfully updates the index, but I cannot perform facet queries.
I am trying to do a basic attribute facet/filter where I have a generic table of attributes with id/name and attribute values associated with products. For various reasons I am using the programmatic API, and especially for attributes I can't make use of the @Facet annotation. So for a product, I added this class bridge to Product.class:
public class ProductClassTagValuesBridge implements FieldBridge
{
    @Override
    public void set(String name, Object value, Document document, LuceneOptions luceneOptions)
    {
        Product product = (Product) value;
        for (TagValue v : product.getTagValues())
        {
            Tag tag = v.getTag();
            String tagName = "tag-" + tag.getId();
            String tagValue = v.getId().toString();

            // not sure if this line is required? Have tried with and without
            luceneOptions.addFieldToDocument(tagName, tagValue, document);
            luceneOptions.addSortedDocValuesFieldToDocument(tagName, tagValue, document);
        }
    }
}
Then I build my (test) faceting request to search tag-56 (which I confirmed is in the index using Luke):
FacetParameterContext context = queryBuilder.facet()
        .name("tag-56")
        .onField("tag-56")
        .discrete();
FacetingRequest facetingRequest = context.createFacetingRequest();
Which, when used in the search/FacetManager, gives me the error:
org.hibernate.search.exception.SearchException: HSEARCH000268: Facet request 'TAG_56' tries to facet on field 'tag-56' which either does not exist or is not configured for faceting (via @Facet). Check your configuration.
I have also tried the custom config solution from this post: Hibernate Search: configure Facet for custom FieldBridge
For the custom field I added a field bridge to tagValues on my product. The same error occurs.
mapping.entity(Product.class).indexed()
        .property("tagValues", ElementType.FIELD).field()
                .analyze(Analyze.NO).store(Store.YES)
                .bridge(ProductTagValuesFieldBridge.class)
Short answer: Hibernate Search does not allow that... yet.
Long answer:
Hibernate Search 5 allows dynamic fields, but does not allow faceting on fields declared in custom bridges.
That is to say, you can add arbitrary values to your index that don't fit a pre-defined schema, but you cannot use faceting on those fields.
Hibernate Search 6 allows faceting (now called "aggregations") on fields declared in custom bridges (just declare them as .aggregable(Aggregable.YES)), but does not allow dynamic fields yet.
EDIT: Starting with 6.0.0.Beta7, dynamic fields are supported thanks to field templates. So the rest of my message is not useful anymore.
See this section of the documentation for more information about field templates. It's totally possible to declare an aggregable, dynamic field in your bridge.
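As a rough sketch of what that can look like in a Hibernate Search 6 custom type binder (the class and template names are made up for illustration, and the binder/bridge API changed slightly across 6.x versions, so treat this as an outline rather than copy-paste code):
import org.hibernate.search.engine.backend.document.DocumentElement;
import org.hibernate.search.engine.backend.types.Aggregable;
import org.hibernate.search.mapper.pojo.bridge.TypeBridge;
import org.hibernate.search.mapper.pojo.bridge.binding.TypeBindingContext;
import org.hibernate.search.mapper.pojo.bridge.mapping.programmatic.TypeBinder;
import org.hibernate.search.mapper.pojo.bridge.runtime.TypeBridgeWriteContext;

public class ProductTagBinder implements TypeBinder {

    @Override
    public void bind(TypeBindingContext context) {
        context.dependencies().use("tagValues");

        // Field template: any index field whose path matches "tag-*" is
        // created on the fly as an aggregable keyword field.
        context.indexSchemaElement()
                .fieldTemplate("tagTemplate",
                        f -> f.asString().aggregable(Aggregable.YES))
                .matchingPathGlob("tag-*");

        context.bridge(new Bridge());
    }

    private static class Bridge implements TypeBridge {
        @Override
        public void write(DocumentElement target, Object bridgedElement,
                TypeBridgeWriteContext writeContext) {
            Product product = (Product) bridgedElement;
            for (TagValue v : product.getTagValues()) {
                // Dynamic field names such as "tag-56" match the template above.
                target.addValue("tag-" + v.getTag().getId(), v.getId().toString());
            }
        }
    }
}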
Original message about ways to work without dynamic fields (obsolete):
That is to say, if you know the list of tags upon startup, are able to list them all, and are certain they won't change while your application is up, you could declare the fields upfront and use faceting on them. But if you don't know the list of tags upon startup, none of this is possible (yet).
Until dynamic fields are added to Hibernate Search 6, the only solution is to use Hibernate Search 5 and to re-implement faceting yourself. As you can expect, this will be complex and you will have to get your hands dirty with Lucene. You will have to:
Add fields of type SortedSetDocValuesFacetField to your document in your custom bridge.
Ensure Hibernate Search calls FacetsConfig.build on your documents after they are populated. One way to do that (through a hack) would be to declare a dummy @Facet field on your entity, even if you don't use it.
Completely ignore Hibernate Search's query feature and perform faceting yourself from an IndexReader. You can get an IndexReader from Hibernate Search as explained here. There's an example of how to perform faceting in org.hibernate.search.query.engine.impl.QueryHits#updateStringFacets; a rough sketch of the read side is shown below.
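A minimal sketch of that read side with plain Lucene, assuming the documents were indexed with SortedSetDocValuesFacetField and run through FacetsConfig.build as described above (the "tag-56" dimension is just the example from the question):
import org.apache.lucene.facet.FacetResult;
import org.apache.lucene.facet.Facets;
import org.apache.lucene.facet.FacetsCollector;
import org.apache.lucene.facet.sortedset.DefaultSortedSetDocValuesReaderState;
import org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts;
import org.apache.lucene.facet.sortedset.SortedSetDocValuesReaderState;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;

public class TagFacetCounter {

    // indexReader is the reader obtained from Hibernate Search.
    public FacetResult countTagValues(IndexReader indexReader) throws Exception {
        SortedSetDocValuesReaderState state =
                new DefaultSortedSetDocValuesReaderState(indexReader);

        IndexSearcher searcher = new IndexSearcher(indexReader);
        FacetsCollector facetsCollector = new FacetsCollector();
        // Collect matches for whatever query you need; a match-all query here.
        FacetsCollector.search(searcher, new MatchAllDocsQuery(), 10, facetsCollector);

        // Count facet values for the "tag-56" dimension and return the top 10.
        Facets facets = new SortedSetDocValuesFacetCounts(state, facetsCollector);
        return facets.getTopChildren(10, "tag-56");
    }
}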
My Spring Boot app is using Couchbase 5.1 community.
My app needs both a primary & several secondary indexes.
Currently, in order to create the needed indexes, I open the query page in the UI and manually create the indexes the app needs, as described here.
I was looking for a way to do it automatically via code, so when the app is starting, it will check if the indexes are missing and will create them if needed.
Is there a way to do it via Spring Data or via the Couchbase client?
You can create them using the DSL from the Index class. There's an example of using it in the documentation under "Indexing the Data: N1QL & GSI".
From that example:
You can also create secondary indexes on specific fields of the JSON,
for better performance:
Index.createIndex("index_name").on(bucket.name(), "field_to_index")
In this case, give a name to your index, specify the target bucket AND
the field(s) in the JSON to index.
If the index already exists, there will be an IndexAlreadyExistsException (see documentation), so you'll need to check for that.
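As a rough sketch with the Couchbase Java SDK 2.x (the index and field names are illustrative): BucketManager also exposes helpers with an ignoreIfExist flag, which sidesteps handling the exception yourself.
import com.couchbase.client.java.Bucket;

public class IndexInitializer {

    // Create the indexes at startup, skipping any index that already exists.
    public void createIndexesIfMissing(Bucket bucket) {
        // first flag: ignore if the index already exists; second flag: do not defer the build
        bucket.bucketManager().createN1qlPrimaryIndex(true, false);
        bucket.bucketManager().createN1qlIndex("idx_userId", true, false, "userId");
    }
}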
So this is how I solved it:
import com.couchbase.client.java.Bucket;

public class MyCouchBaseRepository {

    private Bucket bucket;

    public MyCouchBaseRepository(<My Repository that extends CouchbasePagingAndSortingRepository> myRepository) {
        bucket = myRepository.getCouchbaseOperations().getCouchbaseBucket();
        createIndices();
    }

    private void createIndices() {
        bucket.bucketManager().createN1qlPrimaryIndex(true, false);
        bucket.query(N1qlQuery.simple("CREATE INDEX xyz ON `myBucket`(userId) WHERE _class = 'com.example.User'"));
        ...
    }
}
I'm trying to build a faceted classification using Spring Data Mongo and I'm confused on how to use Aggregation.facet method.
As I'm trying to figure out how this works, I'm using the same FacetOperation twice, and I'm getting a java.lang.IllegalArgumentException: Invalid reference 'producer.fundings'!. This FacetOperation alone in the Aggregation works fine!
FacetOperation fo1 = facet(
        unwind("producer.fundings"),
        project().and("producer.fundings.type").as("type").and("producer.fundings.acronym").as("name"),
        group("name", "type").count().as("count"),
        project("count").and("_id.name").as("name").and("_id.type").as("type").andExclude("_id")
).as("fundingAcronymFacet");

FacetOperation fo2 = facet(
        unwind("producer.fundings"),
        project().and("producer.fundings.type").as("type").and("producer.fundings.acronym").as("name"),
        group("name", "type").count().as("count"),
        project("count").and("_id.name").as("name").and("_id.type").as("type").andExclude("_id")
).as("fundingNameFacet");

Aggregation agg = Aggregation.newAggregation(fo1, fo2);
AggregationResults<FacetClassification> groupResults = mongoTemplate.aggregate(agg, "observations", FacetClassification.class);
List<FacetClassification> facet = groupResults.getMappedResults();
So either I'm not using the facet method correctly and only one call is needed to create the different facets, which would match how it is implemented in the MongoDB API: $facet (aggregation).
Or I need to chain the facet calls to create the different facets of my classification, and in that case I need to understand what happens after the first call and why the exact same reference is not found.
The documentation only provides examples that create one facet, and I couldn't find any example elsewhere: Spring Data Mongo Faceted Classification.
Related: Using multiple facets in MongoDB Spring Data
Any help would be appreciated!
You can chain multiple facet operations using the and() and as() methods. Chaining keeps both pipelines inside a single $facet stage, so they both see the original documents; with two separate FacetOperations, the second one runs against the output of the first, where producer.fundings no longer exists, hence the invalid-reference error. The example should look like this to create two different facets in the same aggregation operation:
FacetOperation fo1 = facet(
        unwind("producer.fundings"),
        project().and("producer.fundings.type").as("type").and("producer.fundings.acronym").as("name"),
        group("name", "type").count().as("count"),
        project("count").and("_id.name").as("name").and("_id.type").as("type").andExclude("_id")
).as("fundingAcronymFacet")
.and(
        unwind("producer.fundings"),
        project().and("producer.fundings.type").as("type").and("producer.fundings.acronym").as("name"),
        group("name", "type").count().as("count"),
        project("count").and("_id.name").as("name").and("_id.type").as("type").andExclude("_id")
).as("fundingNamesFacet");
So I have an app that needs to store certain configuration info, and so I am planning on storing the configs as simple JSON documents in Mongo:
appConfig: {
    fizz: true,
    buzz: 34
}
This might map to a Java POJO/entity like:
public class AppConfig {
    private boolean fizz;
    private int buzz;
}
etc. Ordinarily, with relational databases, I use Hibernate/JPA for O/R mapping from table data to/from Java entities. I believe the closest JSON/Mongo companion to table/Hibernate is a Morphia/GSON combo: use Morphia to drive connectivity from my Java app to Mongo, and then use GSON to O/J map the JSON to/from Java POJOs/entities.
The problem here is that, over time, my appConfig document structure will change. It may be something simple like:
appConfig: {
    fizz: true,
    buzz: 34,
    foo: "Hello!"
}
Which would then require the POJO/entity to become:
public class AppConfig {
    private boolean fizz;
    private int buzz;
    private String foo;
}
But the problem is that I may have tens of thousands of JSON documents already stored in Mongo that don't have foo properties in them. In this specific case, the obvious solution is to set a default on the property like:
public class AppConfig {
    private boolean fizz;
    private int buzz;
    private String foo = "Hello!";
}
However in reality, eventually the AppConfig document/schema/structure might change so much that it in no way, shape or form resembles its original design. But the kicker is: I need to be backwards-compatible and, preferably, be capable of updating/transforming documents to match the new schema/structure where appropriate.
My question: how is this "versioned document" problem typically solved?
I usually solve this problem by adding a version field to each document in the collection.
You might have several documents in the AppConfig collection:
{
    _id: 1,
    fizz: true,
    buzz: 34
}
{
    _id: 2,
    version: 1,
    fizz: false,
    buzz: 36,
    foo: "Hello!"
}
{
    _id: 3,
    version: 1,
    fizz: true,
    buzz: 42,
    foo: "Goodbye"
}
In the above example, there are two documents at version one and one older document at version zero (in this pattern, I generally interpret a missing or null version field as version zero, because I only add the field once I start versioning documents in production).
The two principles of this pattern:
Documents are always saved at the newest version when they are actually modified.
When a document is read, if it's not at the newest version, it gets transparently upgraded to the newest version.
You do this by checking the version field, and performing a migration when the version isn't new enough:
DBObject update(DBObject document) {
    if (document.getInt("version", 0) < 1) {
        document.put("foo", "Hello!"); // add default value for foo
        document.put("version", 1);
    }
    return document;
}
This migration can fairly easily add fields with default values, rename fields, and remove fields. Since it's located in application code, you can do more complicated calculations as necessary.
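For instance, a hypothetical version-2 migration (the buzzCount rename is invented purely for illustration) could extend the same method:
DBObject update(DBObject document) {
    int version = document.getInt("version", 0);
    if (version < 1) {
        document.put("foo", "Hello!"); // version 1 introduced foo with a default
    }
    if (version < 2) {
        // hypothetical version 2 renamed buzz to buzzCount
        document.put("buzzCount", document.removeField("buzz"));
    }
    document.put("version", 2);
    return document;
}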
Once the document has been migrated, you can run it through whatever ODM solution you like to convert it into Java objects. This solution no longer has to worry about versioning, since the documents it deals with are all current!
With Morphia this could be done using the @PreLoad annotation.
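A rough sketch of that with Morphia 1.x (the collection name, default values, and version handling here are illustrative, not part of the original question):
import com.mongodb.DBObject;

import org.mongodb.morphia.annotations.Entity;
import org.mongodb.morphia.annotations.Id;
import org.mongodb.morphia.annotations.PreLoad;

@Entity("appConfig")
public class AppConfig {

    @Id
    private Object id;

    private int version = 1;
    private boolean fizz;
    private int buzz;
    private String foo;

    // Morphia calls this with the raw document before mapping it onto this
    // class, so older documents can be upgraded in place as they are read.
    @PreLoad
    void migrate(DBObject document) {
        Object v = document.get("version");
        int docVersion = (v instanceof Number) ? ((Number) v).intValue() : 0;
        if (docVersion < 1) {
            document.put("foo", "Hello!"); // default value introduced in version 1
            document.put("version", 1);
        }
    }
}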
Two caveats:
Sometimes you may want to save the upgraded document back to the database immediately. The most common reasons for this are when the migration is expensive, the migration is non-deterministic or integrates with another database, or you're in a hurry to upgrade an old version.
Adding or renaming fields that are used as criteria in queries is a bit trickier. In practice, you may need to perform more than one query, and unify the results.
In my opinion, this pattern highlights one of the great advantages of MongoDB: since the documents are versioned in the application, you can seamlessly migrate data representations in the application without any offline "migration phase" like you would need with a SQL database.
The JSON deserializer solves this in a very simple way for you (using Java).
Just allow your POJO/entity to grow with new fields. When you deserialize the JSON from Mongo into your entity, all missing fields will simply be null.
mongoDocument v1       :   Entity of v3
{
    fizz = "abc",     -->  fizz = "abc";
    buzz = 123        -->  buzz = 123;
                      -->  newObj = null;
                      -->  obj_v3 = null;
}
You can even use this the other way around if you'd like your legacy servers to work with newer database objects:
mongoDocument v3       :   Entity of v1
{
    fizz: "abc",      -->  fizz = "abc";
    buzz: 123,        -->  buzz = 123;
    newObj: "zzz",    -->
    obj_v3: "b        -->
}
Depending on whether the fields are present or not, they will be populated by the deserializer.
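A minimal sketch of the idea with Gson, assuming the AppConfig POJO from the question (the "extra" field is invented; most JSON mappers that tolerate missing and unknown fields behave the same way):
import com.google.gson.Gson;

public class AppConfigCompatDemo {

    public static void main(String[] args) {
        Gson gson = new Gson();

        // Old v1 document: no "foo" field. After deserialization into the
        // newer AppConfig class, foo is simply left null.
        String oldJson = "{\"fizz\": true, \"buzz\": 34}";
        AppConfig fromOld = gson.fromJson(oldJson, AppConfig.class);

        // Newer document with a field the older entity doesn't know about:
        // the unknown "extra" field is silently ignored by Gson.
        String newJson = "{\"fizz\": true, \"buzz\": 34, \"extra\": \"zzz\"}";
        AppConfig fromNew = gson.fromJson(newJson, AppConfig.class);
    }
}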
Keep in mind that primitive booleans are not well suited for this, since they default to false (depending on which deserializer you use).
So unless you actively need to work with versioned objects, why bother with the overhead when you can build a legacy-safe server implementation that, with just a few null checks, can handle any of the older objects.
I hope this proposal might help you with your set-up
I think the thread below will help you, although it is not about versioning documents in the DB, and it uses spring-data-mongodb:
How to add a final field to an existing spring-data-mongodb document collection?
So you can assign values to the POJO based on the existence of the property in the document, using a Converter implementation.
You have a couple of options with Morphia, at least. You could use a versioned class name and rely on Morphia's use of the className property to fetch the correct class version; your application would then just have to migrate that old object to the new class definition. Another option is to use @PreLoad and massage the DBObject coming out of Mongo into the new shape before Morphia maps the DBObject to your class. Using a version field on the class, you can determine which migration to run when the data is loaded. From that point, it would just look like the new form to Morphia and would map seamlessly. Once you save that configuration object back to Mongo, it would be in the new form and the next load wouldn't need to run the migration.
What is the best way to change a database table schema at run time in the Play! Framework? I get an unspecified number of columns from a client application and cannot have domain objects.
Another option may be to use a schema-less datastore such as MongoDB. It seems there is a mongo module for Play: http://www.playframework.org/modules/mongo.
Why not use Apache DDLUtils? It's not plugged into Play! by default, but I use it in other projects and it's quite useful.
http://db.apache.org/ddlutils/
I think this is exactly what you were looking for; have a look at the Play snippets:
http://www.playframework.org/community/snippets/13
public class Application extends Controller {

    @Before
    public static void setDBSchema() {
        String schema = "someSchema"; // This can come from a Cache, for example
        Play.configuration.setProperty("hibernate.default_schema", schema);
        JPA.entityManagerFactory = null;
        JPAPlugin plugin = new JPAPlugin();
        plugin.onApplicationStart();
    }

    ...
}
You just change the configured Hibernate schema and then force the JPAPlugin to restart.