Hibernate Search with Elasticsearch creating fields with .keyword suffix - java

I just implemented the integration of Hibernate Search with Elasticsearch, using Hibernate Search 5.8 and ES 5.5.
I have several fields created specifically for sorting, and they are all called [field]Sort.
When I was testing it locally, the first time I let Hibernate create the indexes, it created the String sort fields like this:
nameSort -> text
nameSort.keyword -> keyword
I realized that I should use the suffixed field for sorting.
But then, when I destroyed my Elasticsearch cluster to start over, it didn't create the suffixed fields; it just created the sort fields as keyword directly.
I recreated the cluster 5 or more times and it never created the suffixed fields again.
When I finally deployed my changes to our staging environment, it created the suffixed fields again, causing my queries to fail, because they try to sort on a text field instead of a keyword field.
Now, I'm really not sure why it sometimes creates the suffix and sometimes doesn't.
Is there any rule?
Is there a way to avoid it creating 2 fields and making it always create only one keyword field with exactly the name I gave it?
Here's an example of a sort field:
@Field(name = "nameSort", analyze = Analyze.NO, store = Store.YES, index = Index.NO)
@SortableField(forField = "nameSort")
public String getNameSort() {
    return name != null ? name.toLowerCase(Locale.ENGLISH) : null;
}
Thanks in advance for any help.

Hibernate Search does no such thing as creating a separate keyword field for text fields. It creates either a text field or a keyword field, depending on whether the field should be analyzed. In your case, the field is not analyzed, so it should create a keyword field.
Now, Hibernate Search is not alone here, and this behavior could stem from the Elasticsearch cluster itself. Did you check whether you have particular index templates on your Elasticsearch cluster? They could lead to Elasticsearch adding a keyword sub-field whenever Hibernate Search creates a text field.
On a side note, you may be interested to know that Hibernate Search 5.8 allows defining normalizers (the same concept as Elasticsearch normalizers), which would let you annotate the getName() getter directly and avoid doing the lowercase conversion yourself. See this blog post for more information.
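For example, a minimal sketch of a normalizer-based mapping might look like this (the normalizer name and entity are made up; the exact API is described in the blog post):

import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Normalizer;
import org.hibernate.search.annotations.NormalizerDef;
import org.hibernate.search.annotations.SortableField;
import org.hibernate.search.annotations.Store;
import org.hibernate.search.annotations.TokenFilterDef;

@Indexed
@NormalizerDef(name = "lowercase",
        filters = @TokenFilterDef(factory = LowerCaseFilterFactory.class))
public class MyEntity {

    private String name;

    // The normalizer lowercases at indexing time, so no manual toLowerCase() getter is needed.
    @Field(name = "nameSort", normalizer = @Normalizer(definition = "lowercase"), store = Store.YES)
    @SortableField(forField = "nameSort")
    public String getName() {
        return name;
    }
}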

Related

Hibernate Search: data indexed with an ngram filter gives incorrect search results due to tokenization while querying

I have an analyzer with this configuration,
searchMapping
    .analyzerDef(BaseEntity.CUSTOM_SEARCH_INDEX_ANALYZER, WhitespaceTokenizerFactory.class)
    .filter(LowerCaseFilterFactory.class)
    .filter(ASCIIFoldingFilterFactory.class)
    .filter(NGramFilterFactory.class)
        .param("minGramSize", "1")
        .param("maxGramSize", "200");
This is how my entity field is configured
@Field(analyzer = @Analyzer(definition = CUSTOM_SEARCH_INDEX_ANALYZER))
private String bookName;
This is how I create a search query
queryBuilder.keyword().onField(prefixedPath).matching(matchingString).createQuery()
I have an entity with bookName="Gulliver" and another entity with bookName="xGulliver".
If I search for bookName = "xG", I get both entities, where I would expect only the entity with bookName="xGulliver".
I also looked at the query produced by Hibernate Search:
Executing Lucene query '+(+(+(+(
bookName:x
bookName:xg
bookName:g))))
The above Lucene query is prepared using BooleanJunction::must conditions, I guess, which means it should match all the conditions. So why is it giving me both entities? I don't understand this.
I can also override the analyzer while querying, using a KeywordTokenizer instead of the NGramFilterFactory, but then I would have to override the analyzer for each and every field before creating the QueryBuilder, which doesn't look good: I have about 100 indexed fields, some of them dynamic, and I create an individual query for each field.
Is there any other way to override the analyzer in version 5.11, or is this handled in some easier way in Hibernate Search 6.x?
The Hibernate Search versions I use are:
hibernate-search-elasticsearch, hibernate-search-orm = 5.11.4.Final
The above Lucene query is prepared using BooleanJunction::must conditions, I guess, which means it should match all the conditions. So why is it giving me both entities? I don't understand this.
When you create a keyword query using Hibernate Search, the string passed to that query is analyzed, and if there are multiple tokens, Hibernate Search creates a boolean query with one "should" clause for each token. You can see it here, in "bookName:x bookName:xg bookName:g": there is no "+" sign before "bookName", which means those are not "must" clauses; they are "should" clauses.
I can also override the analyzer while querying, using a KeywordTokenizer instead of the NGramFilterFactory, but then I would have to override the analyzer for each and every field before creating the QueryBuilder, which doesn't look good: I have about 100 indexed fields, some of them dynamic, and I create an individual query for each field.
True, that's annoying.
Is there any other way to override the analyzer in 5.11 version
In 5.11, I don't think there is any other way to override analyzers.
If necessary and if you're using the Lucene backend, I believe you should be able to bypass the Hibernate Search DSL just for this specific query:
Get the analyzer you want: something like Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer("myAnalyzerWithoutNGramTokenFilter").
Analyze the search terms: call analyzer.tokenStream(...) and use the TokenStream as appropriate. You'll get a list of tokens.
Create the Lucene Query: essentially it will be a boolean query with one TermQuery for each token.
Pass the resulting Query to Hibernate Search as usual, as in the sketch below.
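For illustration, here is a rough sketch of those four steps (the analyzer name, the field name and the Book entity are made up, and this assumes the Lucene backend):

import java.io.IOException;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.hibernate.search.FullTextSession;

public List<?> searchWithoutNGrams(FullTextSession fullTextSession) throws IOException {
    // 1. Get the analyzer you want (a definition without the ngram filter).
    Analyzer analyzer = fullTextSession.getSearchFactory()
            .getAnalyzer("myAnalyzerWithoutNGramTokenFilter");

    // 2. Analyze the search terms, and 3. build one TermQuery per token.
    BooleanQuery.Builder builder = new BooleanQuery.Builder();
    try (TokenStream stream = analyzer.tokenStream("bookName", "xG")) {
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        stream.reset();
        while (stream.incrementToken()) {
            builder.add(new TermQuery(new Term("bookName", term.toString())),
                    BooleanClause.Occur.MUST);
        }
        stream.end();
    }
    Query luceneQuery = builder.build();

    // 4. Pass the resulting query to Hibernate Search as usual.
    return fullTextSession.createFullTextQuery(luceneQuery, Book.class).list();
}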
or is this handled in some easier way in Hibernate Search 6.x?
It's dead simple in Hibernate Search 6.0.0.Beta4. There are two solutions:
Implicitly: in your mapping, you can specify not only an analyzer (using @FullTextField(analyzer = "myAnalyzer")), but also a "search" analyzer, using @FullTextField(analyzer = "myAnalyzer", searchAnalyzer = "mySearchAnalyzer"). The "default" analyzer will be used when indexing, while the "search" analyzer will be used when searching (querying).
Explicitly: at query time, you can override the analyzer on a given predicate by calling .analyzer("mySearchAnalyzer") while building the predicate. There is one example in this section of the documentation.
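For illustration, here are sketches of both solutions (the analyzer names, the Book entity and searchSession are made up; this uses the 6.0 API shape, which may differ slightly in Beta4):

// Implicit: a separate search analyzer declared in the mapping.
@FullTextField(analyzer = "myNGramAnalyzer", searchAnalyzer = "myWhitespaceAnalyzer")
private String bookName;

// Explicit: overriding the analyzer on a single predicate at query time.
List<Book> hits = searchSession.search(Book.class)
        .where(f -> f.match()
                .field("bookName")
                .matching("xG")
                .analyzer("myWhitespaceAnalyzer"))
        .fetchHits(20);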
Note however that dynamic fields are not supported yet in Hibernate Search 6: HSEARCH-3273.

Share data among different scenarios in a single feature - java

Cucumber, Java.
My feature file looks like this:
Feature: ...
Scenario 1: ... Generate a unique number
Scenario 2: ... Do some validations on the unique number generated
Using Spring for dependency injection, the unique number generated in Scenario 1 is assigned to a String, and the same String needs to be used in Scenario 2 as well.
But I'm getting a null String value in Scenario 2. I think the dependency injection in Scenario 2 is creating a new object, which gets the default value of null.
Please help me resolve this issue. I need to know how Java objects can be passed across different scenarios in a single feature.
Use a singleton?
1) Generate the unique number in the 1st scenario
2) Call getInstance() in the 2nd (see the sketch below)
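For example, a minimal holder class might look like this (all names are illustrative; note that static state is shared JVM-wide, so reset it between features if needed):

// Shared across scenarios: Cucumber re-creates glue objects per scenario,
// but static state survives.
public final class ScenarioContext {

    private static final ScenarioContext INSTANCE = new ScenarioContext();

    private String uniqueNumber;

    private ScenarioContext() {
    }

    public static ScenarioContext getInstance() {
        return INSTANCE;
    }

    public String getUniqueNumber() {
        return uniqueNumber;
    }

    public void setUniqueNumber(String uniqueNumber) {
        this.uniqueNumber = uniqueNumber;
    }
}

In the step for Scenario 1 you would call ScenarioContext.getInstance().setUniqueNumber(generated), and in Scenario 2, ScenarioContext.getInstance().getUniqueNumber().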
Use Gherkin with QAF, which provides different ways to share information between steps or scenarios.
For example, if your step returns a value, you can use it like this:
Then get text of 'element'
And store into 'application.refID'
To refer to any stored value or any property, you can use ${property}. For example:
Given application to update is '${application.refID}'
You can refer to application.refID in any subsequent scenario.
If you want to do this in a Java step, you can write code like this:
// store value for further use
getBundle().setProperty("application.refID", "myvalue");
// retrieve application.refID anywhere
getBundle().getString("application.refID");

Handling null in FreeMarker

Situation:
An old Java project using FreeMarker has many finished templates that work great.
Every template uses data from a Transaction object.
This Transaction object is very large, because it wraps all the data about a transaction.
The templates contain a lot of expressions like this:
get("object1").getNestedObject2().getNestedObject3().getValue();
Problem:
A new requirement appeared: all templates have to be processed for a preview with no real data. All numbers should be zero and all strings should be ---.
Unsatisfactory solutions:
Remake all templates to check for null values. (A lot of work, and not safe.)
Create a Transaction object that contains all default values. (A lot of work.)
So my question is: can I tell FreeMarker that, if it finds null (or finds null along the way), it should use 0 where it was expecting a number, or --- where it was expecting a string?
Or do you see any better solution?
If you need to show a dummy data model to the templates, your best bet is probably a custom ObjectWrapper (see Configuration.setObjectWrapper). Everything that reads the data model runs through the TemplateModel-s, and the root TemplateModel is made by the ObjectWrapper, so it can control what values the templates get for what names.
But the question is, when you have to return a dummy value for a name, how can you tell what its type will be? It's not just about finding out whether it will be a string or a number, but also whether it will be a method (like getNestedObject2) or a hash (something that can be followed by .). What can help here is that FreeMarker allows a value to have multiple types, so you can return a value that can be used as a method and as a hash and as a string, for example. Depending on the application, that hack might be good enough; except, you still have to decide whether the value is a string or a number, because ${} will print the numerical value if the value is both a string and a number.
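A rough sketch of that multi-typed dummy value idea, assuming FreeMarker 2.3.x (all names are made up, and the string-vs-number decision per name is left open, as noted above):

import java.util.List;

import freemarker.template.DefaultObjectWrapper;
import freemarker.template.TemplateHashModel;
import freemarker.template.TemplateMethodModelEx;
import freemarker.template.TemplateModel;
import freemarker.template.TemplateModelException;
import freemarker.template.TemplateNumberModel;
import freemarker.template.TemplateScalarModel;

public class DummyObjectWrapper extends DefaultObjectWrapper {

    @Override
    public TemplateModel wrap(Object obj) throws TemplateModelException {
        // Replace nulls with a chameleon value; wrap real objects normally.
        return obj == null ? new DummyValue() : super.wrap(obj);
    }

    /** Acts as a hash, a method, a string ("---") and a number (0) at once. */
    private static class DummyValue implements TemplateHashModel,
            TemplateMethodModelEx, TemplateScalarModel, TemplateNumberModel {

        @Override
        public TemplateModel get(String key) {
            return this; // dummy.anything is another dummy
        }

        @Override
        public boolean isEmpty() {
            return false;
        }

        @Override
        public Object exec(List arguments) {
            return this; // dummy.getNestedObject2() is another dummy
        }

        @Override
        public String getAsString() {
            return "---";
        }

        @Override
        public Number getAsNumber() {
            return 0; // note: ${...} will prefer this over "---"
        }
    }
}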

Key-Value on top of Appengine

Although App Engine is already schema-less, you still need to define the entities to be stored in the Datastore through the DataNucleus persistence layer. So I am thinking of a way to get around this: a layer that stores key-value pairs at runtime, instead of compile-time entities.
The way this is done with Redis is by creating a key like this:
private static final String USER_ID_FORMAT = "user:id:%s";
private static final String USER_NAME_FORMAT = "user:name:%s";
From the docs, the Redis types are: String, Linked-list, Set, Sorted set. I am not sure if there are more.
As far as the GAE Datastore is concerned, a String "key" and a "value" would make up the entity to be stored.
Like:
public class KeyValue {
    private String key;
    private Value value; // value can be a String, Linked-list, Set or Sorted set, etc.
    // Code omitted
}
The justification for this scheme is rooted in the RESTful access to the Datastore (provided by datanucleus-api-rest).
Using this REST API, to persist an object or entity:
POST http://datanucleus.appspot.com/dn/guestbook.Greeting
{"author":null,
"class":"guestbook.Greeting",
"content":"test insert",
"date":1239213923232}
The problem with this approach is that in order to persist an entity, the actual class needs to be defined at compile time; with a key-value store mechanism, we could instead simplify the call:
POST http://datanucleus.appspot.com/dn/org.myframework.KeyValue
{"class":"org.myframework.KeyValue",
 "key":"user:id:johnsmith;followers",
 "value":"the_list"}
Passing a single string as the "value" is fairly easy; I can use a JSON array for a list, set or sorted list. The real question is how to actually persist the different types of data passed into the interface. Should there be multiple KeyValue entities, each representing a basic type it supports: KeyValueString? KeyValueList? etc.
Looks like you're using a JSON based REST API, so why not just store Value as a JSON string?
You do not need to use the Datanucleus layer, or any of the other fine ORM layers (like Twig or Objectify). Those are optional, and are all based on the low-level API. If I interpret what you are saying properly, perhaps it already has the functionality that you want. See: https://developers.google.com/appengine/docs/java/datastore/entities
Datanucleus is a specific framework that runs on top of GAE. You can however access the database at a lower, less structured, more key/value-like level - the low-level API. That's the lowest level you can access directly.
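Combining the two suggestions, a minimal sketch with the low-level API might look like this (the kind, key and property names are made up; error handling is trimmed):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Text;

DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

// Store an arbitrary JSON string under a caller-supplied key name.
Entity kv = new Entity("KeyValue", "user:id:johnsmith;followers");
kv.setUnindexedProperty("value", new Text("[\"follower1\",\"follower2\"]"));
datastore.put(kv);

// Read it back by key.
try {
    Entity found = datastore.get(KeyFactory.createKey("KeyValue", "user:id:johnsmith;followers"));
    String json = ((Text) found.getProperty("value")).getValue();
} catch (EntityNotFoundException e) {
    // handle missing key
}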
BTW, the low-level "GAE datastore" internally runs on 6 global Google Megastore tables, which in turn are hosted on the Google Bigtable database system.
Saving JSON as a String works fine. But you will need ways to retrieve your objects other than by ID. That is, you need a way to index your data to support any kind of useful query on it.

What is the role of #Indexable in elasticsearch-osem?

I'm looking into integrating Elasticsearch into my Spring/JPA-driven application.
For this purpose, the elasticsearch-osem project seems like an amazing fit.
What I can't understand is the role of the @Indexable(indexName = "someIndex") annotation shown in the example from the introduction to the project.
What confuses me is the fact that in the same example it says:
Then you can write objects to the ElasticSearch client:
node.client().prepareIndex("twitter", "tweet","1").setSource(context.write(tweet)).execute().actionGet();
Where "twitter" is the index-name.
I think my question is: why should one also define an @Indexable on a field, and why should they define an index name?
Thanks
With @Indexable you say which fields should be included in the index. The indexName is the name of the field in the index; it is not the name of the index, which you set with your other call.
From Javadoc:
/**
* The name of the field that will be stored in the index. Defaults to the property/field name.
*/
String indexName() default "";
After looking through the source code, I was able to see that @Indexable is used either to supply aliases for the fields of indexed properties in indexed entities, or to allow indexing of properties in un-indexed entities.
You can see this in the getIndexableProperties method in the AttributeSourceImpl type, where a comment says:
Searchable class properties are implicitly Indexable
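Putting that together, a small illustrative sketch (the class and field names are made up, based on the annotations mentioned above):

// Properties of a Searchable class are implicitly Indexable;
// @Indexable here merely renames the field in the index.
@Searchable
public class Tweet {

    @Indexable(indexName = "user_name") // indexed under "user_name" instead of "user"
    private String user;

    private String message; // implicitly indexed under "message"
}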
