Creating and using LuceneAnalysisDefinitionProvider with Hibernate Search

Creating and using LuceneAnalysisDefinitionProvider with Hibernate Search - java

When you search Stackoverflow or the Internet for LuceneAnalysisDefinitionProvider, you'll find hundreds of pages, each of them having the same code copied from another page without any decent explanation or further examples of usage.
So I tried to do it by myself and failed. Here is my code:
public class CustomLuceneAnalysisDefinitionProvider
implements LuceneAnalysisDefinitionProvider {
#Override
public void register(final LuceneAnalysisDefinitionRegistryBuilder builder) {
builder
.analyzer("customAnalyzer")
.tokenizer(StandardTokenizerFactory.class)
.charFilter(MappingCharFilterFactory.class)
.param("mapping",
"org/hibernate/search/test/analyzer/mapping-chars.properties")
.tokenFilter(ASCIIFoldingFilterFactory.class)
.tokenFilter(LowerCaseFilterFactory.class)
.tokenFilter(StopFilterFactory.class)
// WRONG! It's not "mapping"!
// .param("mapping",
// "org/hibernate/search/test/analyzer/stoplist.properties")
.param("words",
"classpath:/stoplist.properties")
.param("ignoreCase", "true");
}
}
Now we have CustomLuceneAnalysisDefinitionProvider and what's next?
Where to put and how to address mapping-chars.properties when adding it as a parameter to MappingCharFilterFactory?
What is the contents of mapping-chars.properties and how to create mine of modify existing?
Where to put stoplist.properties and how to address it when adding as mapping parameter to StopFilterFactory?
How to add previously defined customAnalyzer to single #Field mentioned below?
#Field(
index = Index.YES,
analyze = Analyze.YES,
store = Store.NO,
bridge = #FieldBridge(impl = LocalizedFieldBridge.class)
)
private LocalizedField description;
On some pages I found option to put this definition into application.properties:
hibernate.search.lucene.analysis_definition_provider = com.thevegcat.app.search.CustomAnalysisDefinitionProvider
But I don't want to replace original analyzer, I just want to use custom analyzer for few specific properties.
EDIT#1
Looking into org.apache.lucene.analysis.core.StopFilterFactory line 86, one can notice it takes words as a key, not mapping.
EDIT#2
If you put your stop words file in src/main/resources, then you have to address it:
.param("words", "classpath:/stoplist.properties")

you'll find hundreds of pages, each of them having the same code copied from another page without any decent explanation or further examples of usage.
Hibernate Search 5 had its problems, one of which was lack of documentation in some areas. Now that it's in maintenance mode, those problems are unlikely to get addressed.
There is some documentation for that feature in the Hibernate Search 5 documentation: https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#section-programmatic-analyzer-definition
You'll get better documentation of that feature by migrating to Hibernate Search 6+.
That being said, most of your questions related to Lucene features, so you probably won't find answers in Hibernate Search's documentation. You could find them in Lucene's documentation. How to find such documentation is explained in the Hibernate Search 6 documentation:
To know more about the behavior of these character filters, tokenizers and token filters, either browse the Lucene Javadoc or read the corresponding section on the Solr Wiki (you don’t need Solr to use these analyzers, it’s just that there is no documentation page for Lucene proper).
Where to put and how to address mapping-chars.properties when adding it as a parameter to MappingCharFilterFactory?
In your classpath.
What is the contents of mapping-chars.properties and how to create mine of modify existing?
That's the kind of things that Lucene doesn't document, at least not clearly. Solr's documentation is better: https://solr.apache.org/guide/6_6/charfilterfactories.html#CharFilterFactories-solr.MappingCharFilterFactory
Where to put stoplist.properties and how to address it when adding as mapping parameter to StopFilterFactory?
Put it in the classpath, and pass the path to that file from the root of your classpath.
How to add previously defined customAnalyzer to single #Field mentioned below?
Well that is documented, at least: https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#_referencing_named_analyzers
#Field(analyzer = #Analyzer(definition = "customAnalyzer"))
On some pages I found option to put this definition into application.properties:
hibernate.search.lucene.analysis_definition_provider = com.thevegcat.app.search.CustomAnalysisDefinitionProvider
But I don't want to replace original analyzer, I just want to use custom analyzer for few specific properties.
You won't replace an "analyzer", you will register an analysis definition provider. Which will add analyzer definitions to Hibernate Search, which can then be referenced from #Field. Setting an analysis definition provider does not, in itself, change your mapping in any way.

Related

Highlighting in Hibernate Search 6 and Elasticsearch backend

We're in the process of converting our java application from Hibernate Search 5 to 6 with an Elasticsearch backend.
For some good background info, see How to do highlighting within HibernateSearch over Elasticsearch for a question we had when upgrading our highlighting code from a Lucene to Elasticsearch backend and how it was resolved.
Hibernate Search 6 seems to support using 2 backends at the same time, Lucene and Elasticsearch, so we'd like to use Elasticsearch for all our queries and Lucene for the highlighting, if that's possible.
Here is basically what we're trying to do:
public boolean matchPhoneNumbers() {
String phoneNumber1 = "603-436-1234";
String phoneNumber2 = "603-436-1234";
LuceneBackend luceneBackend =
Search.mapping(entityManager.getEntityManagerFactory())
.backend().unwrap(LuceneBackend.class);
Analyzer analyzer = luceneBackend.analyzer("phoneNumberKeywordAnalyzer").get();
//... builds a Lucene Query using the analyzer and phoneNumber1 term
Query phoneNumberQuery = buildQuery(analyzer, phoneNumber1, ...);
return isMatch("phoneNumberField", phoneNumber2, phoneNumberQuery, analyzer);
}
private boolean isMatch(String field, String target, Query sourceQ, Analyzer analyzer) {
Highlighter highlighter = new Highlighter(new QueryScorer(sourceQ, field));
highlighter.setTextFragmenter(new NullFragmenter());
try {
String result = highlighter.getBestFragment(analyzer, field, target);
return StringUtils.hasText(result);
} catch (IOException e) {
...
}
}
What I've attempted so far is to configure two separate backends in the configuration properties, per the documentation, like this:
properties.setProperty("hibernate.search.backends.elasticsearch.analysis.configurer", "com.bt.demo.search.AnalysisConfigurer");
properties.setProperty("hibernate.search.backends.lucene.analysis.configurer", "com.bt.demo.search.CustomLuceneAnalysisConfigurer");
properties.setProperty("hibernate.search.backends.elasticsearch.type", "elasticsearch");
properties.setProperty("hibernate.search.backends.lucene.type", "lucene");
properties.setProperty("hibernate.search.backends.elasticsearch.uris", "http://127.0.0.1:9200");
The AnalysisConfigurer class implements ElasticsearchAnalysisConfigurer and
CustomLuceneAnalysisConfigurer implements from LuceneAnalysisConfigurer.
Analyzers are defined twice, once in the Elasticsearch configurer and again in the Lucene configurer.
I don't know why both hibernate.search.backends.elasticsearch.type and hibernate.search.backends.lucene.type are necessary but if I don't include the lucene.type, I get Ambiguous backend type: configuration property 'hibernate.search.backends.lucene.type' is not set.
But if I do have both backend properties types set, I get
HSEARCH000575: No default backend. Check that at least one entity is configured to target the default backend, when attempting to retrieve the Lucene backend, like:
Search.mapping(entityManager.getEntityManagerFactory())
.backend().unwrap(LuceneBackend.class);
And the same error when trying to retrieve the Elasticsearch backend.
I've also added #Indexed(..., backend = "elasticsearch") to my entities since I wish to have them saved into Elasticsearch and don't need them in Lucene. I also tried adding a fake entity with #Indexed(..., backend = "lucene") but it made no difference.
What have I got configured wrong?

I don't know why both hibernate.search.backends.elasticsearch.type and hibernate.search.backends.lucene.type are necessary but if I don't include the lucene.type, I get Ambiguous backend type: configuration property 'hibernate.search.backends.lucene.type' is not set.
That's because the backend name is just that: a name. Hibernate Search doesn't infer particular information from it, even if you name your backend "lucene" or "elasticsearch". You could have multiple Elasticsearch backends for all it knows :)
But if I do have both backend properties types set, I get HSEARCH000575: No default backend. Check that at least one entity is configured to target the default backend, when attempting to retrieve the Lucene backend, like:
Search.mapping(entityManager.getEntityManagerFactory())
.backend().unwrap(LuceneBackend.class);
``
You called .backend(), which retrieves the default backend, i.e. the backend that doesn't have a name and is configured through hibernate.search.backend.* instead of hibernate.search.backends.<somename>.* (see https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#configuration-structure ).
But you are apparently mapping all your entities to a named backends, one named elasticsearch and one named lucene. So the default backend just doesn't exist.
You should call this:
Search.mapping(entityManager.getEntityManagerFactory())
.backend("lucene").unwrap(LuceneBackend.class);
I've also added #Indexed(..., backend = "elasticsearch") to my entities since I wish to have them saved into Elasticsearch
Since you obviously only want to use one backend for indexing, I would recommend reverting that change (keeping #Indexed without setting #Indexed.backend) and simply making using the default backend.
In short, remove the #Indexed.backend and replace this:
properties.setProperty("hibernate.search.backends.elasticsearch.analysis.configurer", "com.bt.demo.search.AnalysisConfigurer");
properties.setProperty("hibernate.search.backends.lucene.analysis.configurer", "com.bt.demo.search.CustomLuceneAnalysisConfigurer");
properties.setProperty("hibernate.search.backends.elasticsearch.type", "elasticsearch");
properties.setProperty("hibernate.search.backends.lucene.type", "lucene");
properties.setProperty("hibernate.search.backends.elasticsearch.uris", "http://127.0.0.1:9200");
With this
properties.setProperty("hibernate.search.backend.analysis.configurer", "com.bt.demo.search.AnalysisConfigurer");
properties.setProperty("hibernate.search.backends.lucene.analysis.configurer", "com.bt.demo.search.CustomLuceneAnalysisConfigurer");
properties.setProperty("hibernate.search.backend.type", "elasticsearch");
properties.setProperty("hibernate.search.backends.lucene.type", "lucene");
properties.setProperty("hibernate.search.backend.uris", "http://127.0.0.1:9200");
You don't technically have to do that, but I think it will be simpler in the long term. It keeps the Lucene backend as a separate hack that doesn't affect your whole application.
I also tried adding a fake entity with #Indexed(..., backend = "lucene")
I confirm you will need that fake entity mapped to the "lucene" backend, otherwise Hibernate Search will not create the "lucene" backend.

Can a Hibernate Search FieldBridge configure facets for dynamic fields?

Using Hibernate Search 5.11.3 with programmatic API (no annotations), is there a way to facet on dynamic fields added in a class or field bridge? I don't see any 'facet' config available in FieldMetadataBuilder when using MetadataProvidingFieldBridge.
I have tried various combinations of luceneOptions.addSortedDocValuesFieldToDocument() and luceneOptions.addFieldToDocument() in the set() method. This successfully updates the index, but I cannot perform facet queries.
I am trying to do a basic attribute facet/filter where I have a generic table of attributes with id/name and attribute values associated with products. For various reasons I am using the programmatic API and especially for attributes I can't make use of the #Facet annotation. So for a product, I added this class bridge to Product.class:
public class ProductClassTagValuesBridge implements FieldBridge
{
#Override
public void set(String name, Object value, Document document, LuceneOptions luceneOptions)
{
Product product = (Product) value;
for (TagValue v : product.getTagValues())
{
Tag tag = v.getTag();
String tagName = "tag-" + tag.getId();
String tagValue = v.getId().toString();
// not sure if this line is required? Have tried with and without
luceneOptions.addFieldToDocument(tagName, tagValue, document);
luceneOptions.addSortedDocValuesFieldToDocument(tagName, tagValue, document);
}
}
}
Then I build my (test) faceting request to search tag-56 (which I confirmed is in the index using Luke):
FacetParameterContext context = queryBuilder.facet()
.name("tag-56")
.onField("tag-56")
.discrete();
FacetingRequest facetingRequest = context.createFacetingRequest();
Which when used in the search/FacetManager gives me the error:
org.hibernate.search.exception.SearchException: HSEARCH000268: Facet request 'TAG_56' tries to facet on field 'tag-56' which either does not exist or is not configured for faceting (via #Facet). Check your configuration.
I have also tried the custom config solution from the solution in this post: Hibernate Search: configure Facet for custom FieldBridge
For the custom field I added a field bridge to tagValues on my product. The same error occurs.
mapping.entity(Product.class).indexed()
.property("tagValues", ElementType.FIELD).field()
.analyze(Analyze.NO).store(Store.YES)
.bridge(ProductTagValuesFieldBridge.class)

Short answer: Hibernate Search does not allow that... yet.
Long answer:
Hibernate Search 5 allows dynamic fields, but does not allow faceting on fields declared in custom bridges.
That is to say, you can add arbitrary values to your index that don't fit a pre-defined schema, but you cannot use faceting on those fields.
Hibernate search 6 allows faceting (now called "aggregations") on fields declared in custom bridges (just declare them as .aggregable(Aggregable.YES)), but does not allow dynamic fields yet.
EDIT: Starting with 6.0.0.Beta7, dynamic fields are supported thanks to field templates. So the rest of my message is not useful anymore.
See this section of the documentation for more information about field templates. It's totally possible to declare an aggregable, dynamic field in your bridge.
Original message about ways to work without dynamic fields (obsolete):
That is to say, if you know the list of tags upon startup, are able to list them all, and are certain they won't change while your application is up, you could declare the fields upfront and use faceting on them. But if you don't know the list of tags upon startup, none of this is possible (yet).
Until dynamic fields are added to Hibernate Search 6, the only solution is to use Hibernate Search 5 and to re-implement faceting yourself. As you can expect, this will be complex and you will have to get your hands dirty with Lucene. You will have to:
Add fields of type SortedSetDocValuesFacetField to your document in your custom bridge.
Ensure Hibernate Search calls FacetsConfig.build on your documents after they are populated. One way to do that (through a hack) would be to declare a dummy #Facet field on your entity, even if you don't use it.
Completely ignore Hibernate Search's query feature and perform faceting yourself from an IndexReader. You can get an IndexReader from Hibernate Search as explained here. There's an example of how to perform faceting in org.hibernate.search.query.engine.impl.QueryHits#updateStringFacets.

How to create a Elasticsearch node specifying default search analyzers for indexing and searching

The Elasticsearch documents state the following:
The default logical name allows one to configure an analyzer that will be used both for indexing and for searching APIs. The default_index logical name can be used to configure a default analyzer that will be used just when indexing, and the default_search can be used to configure a default analyzer that will be used just when searching.
In other words, it is possible to configure a default analyzer used when indexing, and another used when searching.
This question and its answer helped me to create a node with a default analyzer for indexing, which (simplified) can programmatically be done like this:
public Node node() {
ImmutableSettings.Builder elasticsearchSettings = ImmutableSettings.settingsBuilder()
.put("index.analysis.analyzer.default.type", "keyword");
return NodeBuilder.nodeBuilder()
.settings(elasticsearchSettings.build())
.node();
}
What would be the equivalent way of specifying the default analyzer to be used when searching?

I believe the default analyzers can be defined using:
index.analysis.analyzer.default_index.type for indexing
index.analysis.analyzer.default_search.type for searching

Hibernate + JodaTime mapping different types

I am using hibernate reverse engineering and trying to get my Timestamps to map to a JodaTime type.
I have setup my hibernate.reveng.xml file properly
<sql-type jdbc-type="TIMESTAMP" hibernate-type="org.joda.time.contrib.hibernate.PersistentDateTime" not-null="true"></sql-type>
The issue is that when i run the rev-eng process my Java classes also get the members created as PersistentDateTime objects, but I don't want that because they are not usable. I need the java objects to be org.joda.time.DateTime
So I tried creating a custom engineering strategy
public class C3CustomRevEngStrategy extends DelegatingReverseEngineeringStrategy {
public C3CustomRevEngStrategy(ReverseEngineeringStrategy res) {
super(res);
}
public String columnToHibernateTypeName(TableIdentifier table, String columnName, int sqlType, int length, int precision, int scale, boolean nullable, boolean generatedIdentifier) {
if(sqlType==Types.TIMESTAMP) {
return "org.joda.time.DateTime";
} else {
return super.columnToHibernateTypeName(table, columnName, sqlType, length, precision, scale, nullable, generatedIdentifier);
}
}
}
My thought was that the hibernate mapping files would get the hibernate.reveng.xml file settings and the java objects would get the settings from the custom strategy file...but that was not the case. Both the mapping file and Object are of type "org.joda.time.DateTime" which is not what I want.
How can I achieve my goal? Also, I am NOT using annotations.
Hibernate 3.6
JodaTime 2.3
JodaTime-Hibernate 1.3
Thanks
EDIT: To clarify exactly what the issue is
After reverse engineering this is what I get in my mapping file and POJO class
<property name="timestamp" type="org.joda.time.contrib.hibernate.PersistentDateTime">
private PersistentDateTime timestamp;
As a POJO property, PersistentDateTime is useless to me as I cannot do anything with it such as time manipulations or anything. So this is what I want after my reverse engineering
<property name="timestamp" type="org.joda.time.contrib.hibernate.PersistentDateTime">
private org.joda.time.DateTime timestamp;
Using the Jidira library as suggested below gives me the same result, a POJO that I cannot use.

The JodaTime-Hibernate library is deprecated, and is probably the source of your problem. Don't dispair however as there is a (better) alternative.
You will need to use the JadiraTypes library to create the correct JodaTime objects from Hibernate. Add the library which can be found here to your project classpath and then change your type to org.jadira.usertype.dateandtime.joda.PersistantDateTime. All of the JodaTime objects have a corresponding mapping in that package, so if you decide to change to another object then just update your type.
This should ensure that your objects get created correctly.
I should add a caveat to my answer, which is that I have never used the JadiraTypes library with Hibernate 3. If it only supports Hibernate 4 (I don't see why it would, but...) let me know and I'll delete my answer.

The Hibernate Tools don't seem to separate Hibernate Types from the Java types. If you would be using annotations, this would be more clear as in that case you'd need an #Type annotation, which Hibernate will not generate at all. So using the exposed APIs won't help here.
Fortunately, Hibernate lets you plug into the actual code (or XML) generation after it does its processing. You can do that by replacing the Freemarker templates it uses to generate both XML and Java code. You'll need to use Ant for reverse engineering, however.
To start using Ant for this purpose (if you're not doing so already), you can either pull Hibernate Tools in as a build-time dependency using your dependency manager or download a JAR. Put the jar in Ant's classpath, but also extract the template files from it: since you're using XML configuration, you'll be interested in the /hbm directory. Then add the task using <taskdef> and, assuming that you extracted the templates to TPL_PATH/hbm/*.ftl, call it using <hibernatetool templatepath="TPL_PATH" ...>. (For more info, see below for a link to the docs). Using Ant also helps with automation, for example on CI servers where you won't have Eclipse.
You'll want to keep hibernate-type="org.joda.time.DateTime" in your hibernate.reveng.xml so that the Java files get generated correctly, but replace it with org.joda.time.contrib.hibernate.PersistentDateTime in the generated XML. To do that, edit TPL_PATH/hbm/property.hbm.ftl, replace ${property.value.typeName} with ${javaType}, and assign javaType to the right value before it's used:
<#assign javaType=property.value.typeName>
<#if javaType.equals("DateTime")>
<#assign javaType="org.jadira.usertype.dateandtime.joda.PersistentDateTime">
</#if>
<property
name="${property.name}"
type="${javaType}"
...
You might want to remove newlines to keep the generated XML clean.
This is all described in the Hibernate Tools documentation at http://docs.jboss.org/tools/archive/3.2.1.GA/en/hibernatetools/html/codegen.html#d0e6349 - except that the documentation doesn't tell you exactly which templates you need to modify, you need to figure that out by reading the templates.

Running hibernate tool annotation generation without the "catalog" attribute

when i run my hibernate tools
it reads from the db and create java classes for each tables,
and a java class for composite primary keys.
that's great.
the problem is this line
#Table(name="tst_feature"
,catalog="tstdb"
)
while the table name is required, the "catalog" attribute is not required.
sometimes i want to use "tstdb", sometimes i want to use "tstdev"
i thought which db was chosen depends on the jdbc connection url
but when i change the jdbc url to point to "tstdev", it is still using "tstdb"
so,
i know what must be done,
just don't know how its is done
my options are
suppress the generation of the "catalog" attribute
currently i'm doing this manually (not very productive)
or i could write a program that parses the java file and remove the attribute manually
but i'm hoping i don't have to
OR
find a way to tell hibernate to ignore the "catalog" attribute and use the schema that is explicitly specified.
i don't know the exact setting i have to change to achive this, or even if the option is available.

You need to follow 3 steps -
1) In the hibernate.cfg.xml, add this property
hibernate.default_catalog = MyDatabaseName
(as specified in above post)
2) In the hibernate.reveng.xml, add all the table filters like this
table-filter match-name="MyTableName"
(just this, no catalog name here)
3) Regenerate hibernate code
You will not see any catalog name in any of the *.hbm.xml files.
I have used Eclipse Galileo and Hibernate-3.2.4.GA.

There is a customization to the generation, that will tell what tables to put in what catalog.
You can specify the catalogue manually (in reveng file, <table> element), or programmatically (in your custom ReverseEngineeringStrategy class if I remember well).
Also, I recently had to modify the generation templates.
See the reference documentation :
http://docs.jboss.org/tools/archive/3.0.1.GA/en/hibernatetools/html_single/index.html#hibernaterevengxmlfile
you can customize the catalogue of each of your tables manually
https://www.hibernate.org/hib_docs/tools/viewlets/custom_reverse_engineering.htm watch a movie that explains a lot ...
http://docs.jboss.org/tools/archive/3.0.1.GA/en/hibernatetools/html_single/index.html#d0e5363 for customizing the templates (I start with the directory that's closest to my needs, copy all of them in my own directory, then edit as will)
Sorry, this could get more precise, but I don't have access to my work computer right now.

The attribute catalog is a "connection" attribute and should be specified in the "connection" config file hibernate.cfg.xml and NOT in a "data" config file *.hbm.xml.
I generate hibernate code via ant task <hibernatetool> and I put this replace task after regeneration (replace schema-name with your database).
<replace dir='../src' token='catalog="schema-name"' value=''>
So, after generation, attribute catalog has been removed.
This is a workaround, but code generated works in my development a production environment with different schema-name.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.