I have created an ElasticSearch index using the ElasticSearch Java API. Now I would like to perform some aggregations on the data stored in this index, but I get the following error:
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [item] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
As suggested at this link, to solve this issue I should enable fielddata on the "item" text field, but how can I do that using the ElasticSearch Java API?
An alternative might be mapping the "item" field as a keyword, but same question: how can I do that with the ElasticSearch Java API?
For a new index, you can set the mappings at creation time by doing something like:
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.common.xcontent.XContentType;

CreateIndexRequest createIndexRequest = new CreateIndexRequest(indexName);
String source = // probably read the mapping from a file
createIndexRequest.source(source, XContentType.JSON);
restHighLevelClient.indices().create(createIndexRequest);
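For instance, the source string could be loaded from a bundled file with plain JDK I/O (a sketch; the path is illustrative):
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Read the JSON mapping shown below from a file on disk.
String source = new String(
        Files.readAllBytes(Paths.get("src/main/resources/mapping.json")),
        StandardCharsets.UTF_8);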
The mapping should have the same format as a request made against the REST endpoint, similar to this:
{
  "mappings": {
    "your_type": {
      "properties": {
        "your_property": {
          "type": "keyword"
        }
      }
    }
  }
}
Use XContentBuilder; it makes it easy to build the JSON to create or update a mapping. It looks something like:
.startObject("field_name")
    .field("type", "text")
    .field("fielddata", true)
.endObject()
Afterwards, use a CreateIndexRequest to create a new index, or a PutMappingRequest to update an existing mapping.
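For example, a minimal sketch of enabling fielddata on the "item" field from the question, assuming the high-level REST client and a pre-7.x index with a single type named "my_type" (the type name is illustrative):
import org.elasticsearch.action.admin.indices.mapping.put.PutMappingRequest;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

// Build the mapping JSON programmatically.
XContentBuilder builder = XContentFactory.jsonBuilder()
    .startObject()
        .startObject("properties")
            .startObject("item")
                .field("type", "text")
                .field("fielddata", true)
            .endObject()
        .endObject()
    .endObject();

PutMappingRequest putMappingRequest = new PutMappingRequest(indexName);
putMappingRequest.type("my_type"); // omit the type on Elasticsearch 7+
putMappingRequest.source(builder);
restHighLevelClient.indices().putMapping(putMappingRequest);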
I recently encountered this issue as well, but in my case I hit it while performing sorting. I have posted a working solution for this problem in another similar question -> set field data = true on java elasticsearch
I attempted an upgrade from Hibernate Search 5.8.0.CR1 to 5.8.2.Final
and from ElasticSearch 2.4.2 to 5.6.4.
When I run my application, I get the following error:
Status: 400 Bad Request
Error message: {"root_cause":[{"type":"illegal_argument_exception",
"reason":"Fielddata is disabled on text fields by default.
Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index.
Note that this can however use significant memory. Alternatively use a keyword field instead."}]
I read about Fielddata here:
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/fielddata.html#_fielddata_is_disabled_on_literal_text_literal_fields_by_default
But I'm not sure how to address this issue, especially from Hibernate Search.
My title field definition looks like this:
#Field(name = "title", analyzer = #Analyzer(definition = "my_collation_analyzer"))
#Field(name = "title_polish", analyzer = #Analyzer(definition = "polish"))
protected String title;
I'm using the following analyzer definition:
@AnalyzerDef(name = "my_collation_analyzer",
    tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
    filters = { @TokenFilterDef(
        name = "polish_collation", factory = ElasticsearchTokenFilterFactory.class, params = {
            @org.hibernate.search.annotations.Parameter(name = "type", value = "'icu_collation'"),
            @org.hibernate.search.annotations.Parameter(name = "language", value = "'pl'") }) })
(The polish analyzer comes from the analysis-stempel plugin.)
The Elasticsearch notes on fielddata recommend changing the type of the field
from text to keyword, or setting fielddata=true, but I'm not sure
how to do that using Hibernate Search annotations, because there are no such
properties in the @Field annotation.
Update:
Thank you very much for the help on this. I changed my code to this:
@NormalizerDef(name = "my_collation_normalizer",
    filters = { @TokenFilterDef(
        name = "polish_collation_normalization", factory = ElasticsearchTokenFilterFactory.class, params = {
            @org.hibernate.search.annotations.Parameter(name = "type", value = "'icu_collation'"),
            @org.hibernate.search.annotations.Parameter(name = "language", value = "'pl'") }) })
...
#Field(name = "title_for_search", analyzer = #Analyzer(definition = "polish"))
#Field(name = "title_for_sort", normalizer = #Normalizer(definition = "my_collation_normalizer"))
#SortableField(forField = "title_for_sort")
protected String title;
Is it OK? As I understand it, there should be no tokenization in a normalizer, but I'm not sure what else to use instead of @TokenFilterDef and factory = ElasticsearchTokenFilterFactory.class (?).
Unfortunately I'm also getting the following error:
Error message: {"root_cause":
[{"type":"illegal_argument_exception",
"reason":"Custom normalizer [my_collation_normalizer] may not use filter
[polish_collation_normalization]"}]
I need collation for sorting, as described in my previous question here: ElasticSearch - define custom letter order for sorting
Update 2:
I tested ElasticSearch version 5.6.5 and I think it allows icu_collation in normalizers (my annotations were accepted).
If you are trying to sort on the "title" field, then maybe you forgot to mark the field as sortable using the @SortableField annotation. (More information here) [EDIT: In Hibernate Search 6 you would use @KeywordField(sortable = Sortable.YES). See here]
Also, to avoid errors and for better performance, you should consider using normalizers instead of analyzers for fields you want to sort on (such as your "title" field). This will turn your field into a keyword field, which is what the Elasticsearch logs are hinting at.
More information on normalizers in Hibernate Search is available here, and here are the Elasticsearch specifics in Hibernate Search.
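For reference, a minimal sketch of the Hibernate Search 6 equivalent, assuming the standard HS6 mapping annotations; the entity name is illustrative, the field names mirror the question, and the normalizer is assumed to be defined on the backend:
import org.hibernate.search.engine.backend.types.Sortable;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.KeywordField;

@Indexed
public class Message {
    // Analyzed field for full-text search, normalized keyword field for sorting.
    @FullTextField(name = "title_for_search", analyzer = "polish")
    @KeywordField(name = "title_for_sort", normalizer = "my_collation_normalizer",
            sortable = Sortable.YES)
    protected String title;
}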
You most likely kept the old schema in your Elasticsearch cluster and tried to use it in Elasticsearch 5 with Hibernate Search. This will not work.
When upgrading from Elasticsearch 2 to 5, you must take some steps to upgrade the Elasticsearch schema, in order to use it with Hibernate Search. The easiest option (by far) is to delete the indexes and reindex your whole database. You can find details in the documentation: https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#_upgrading_elasticsearch
Note that you may also have to delete indexes and reindex if your Elasticsearch schema was generated from a Beta version of Hibernate Search: Beta versions are unstable, and may generate an incorrect schema. They are nice for experiments, but definitely not for production environments.
I'm using the jongo API - org.jongo.MongoCollection is the class.
I have a list of object ids, converted them to an ObjectId[], and am trying to query as follows:
collection.find("{_id:{$in:#}}", ids).as(Employee.class);
The query throws the exception: "java.lang.IllegalArgumentException: Too many parameters passed to query: {"_id":{"$in":#}}".
The query doesn't work as described in the linked question: In Jongo, how to find multiple documents from Mongodb by a list of IDs.
Any suggestion on how to resolve?
Thanks.
Try it with a List, as shown in the docs:
List<Integer> ages = Lists.newArrayList(22, 63); // Lists is from Guava
friends.find("{age: {$in:#}}", ages); // → will produce {age: {$in:[22,63]}}
For example, the following snippet, which I crafted quick & dirty just now, worked for me (I use the older verbose syntax as I am currently on such a system ...):
List<ObjectId> ids = new ArrayList<ObjectId>();
ids.add(new ObjectId("57bc7ec7b8283b457ae4ef01"));
ids.add(new ObjectId("57bc7ec7b8283b457ae4ef02"));
ids.add(new ObjectId("57bc7ec7b8283b457ae4ef03"));
int count = friends.find("{ _id: { $in: # } }", ids).as(Friend.class).count();
I'm trying to implement an application that gets documents from MongoDB and inserts them into ElasticSearch. Here is the piece of code that should insert a document into the ElasticSearch index:
final Document o = (Document) document.get("o"); // this is where object lives
client.prepareIndex(index, mapping, id.toString())
.setSource(o.toJson())
.execute().actionGet();
And finally I get this error:
java.lang.IllegalArgumentException: Mapper for [title] conflicts with existing mapping in other types:
[mapper [title] has different [store_term_vector] values, mapper [title] has different [store_term_vector_offsets] values, mapper [title] has different [store_term_vector_positions] values, mapper [title] has different [store_term_vector_payloads] values]
at org.elasticsearch.index.mapper.FieldTypeLookup.checkCompatibility(FieldTypeLookup.java:117)
at org.elasticsearch.index.mapper.MapperService.checkNewMappersCompatibility(MapperService.java:368)
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:319)
I've tried removing the index completely using curl -XDELETE and recreating it using curl -XPUT, but the error remains.
Here is what my index mapping looks like:
{
  "msg": {
    "mappings": {
      "Message": {
        "properties": {
          "title": {
            "type": "string",
            "term_vector": "with_positions_offsets_payloads",
            "analyzer": "russian"
          }
        }
      }
    }
  }
}
However, if I remove the term_vector part from the index settings, the code inserts new documents successfully.
Can someone explain to me what the problem is? The same problem occurs when I use mongo-connector: if the settings contain a term_vector entry for the title field, mongo-connector fails with the same exception, and it works fine without term_vector.
Are you sure you are using the correct term_vector value? I am only aware of five valid values for that attribute, as listed in the documentation:
Possible values are no, yes, with_offsets, with_positions, with_positions_offsets. Defaults to no.
I would suggest trying a different term_vector value, such as with_positions_offsets, to see if you get the results you're expecting.
I hope my answer helps someone else.
The problem was that I had another mapping (type) in the same index that also has a title field. You have to update all other mappings to use the same settings for that field.
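For illustration, a sketch of a consistent index where a second, hypothetical type OtherMessage declares title with the exact same settings as Message:
{
  "msg": {
    "mappings": {
      "Message": {
        "properties": {
          "title": {
            "type": "string",
            "term_vector": "with_positions_offsets_payloads",
            "analyzer": "russian"
          }
        }
      },
      "OtherMessage": {
        "properties": {
          "title": {
            "type": "string",
            "term_vector": "with_positions_offsets_payloads",
            "analyzer": "russian"
          }
        }
      }
    }
  }
}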
I have a MongoDB aggregation query and it works perfectly in the shell.
How can I rewrite this query to use it with Morphia?
org.mongodb.morphia.aggregation.Group.addToSet(String field) accepts only one field name, but I need to add an object to the set.
Query:
......aggregate([
    {$group:
        {"_id":"$subjectHash",
        "authors":{$addToSet:"$fromAddress.address"},
---->>  "messageDataSet":{$addToSet:{"sentDate":"$sentDate","messageId":"$_id"}},
        "messageCount":{$sum:1}}},
    {$sort:{....}},
    {$limit:10},
    {$skip:0}
])
Java code:
AggregationPipeline aggregationPipeline = myDatastore.createAggregation(Message.class)
    .group("subjectHash",
        grouping("authors", addToSet("fromAddress.address")),
--------??????------>> grouping("messageDataSet", ???????),
        grouping("messageCount", new Accumulator("$sum", 1))
    ).sort(...).limit(...).skip(...);
That's currently not supported, but if you file an issue I'd be happy to include it in an upcoming release.
Thanks for your answer; I had guessed as much from the source code. :(
I don't want to use spring-data or the java-driver directly (for this project), so I changed my document representation.
I added a messageDataSet object which contains sentDate and messageId (and some other nested objects); these values become duplicated in the document, which is bad design.
The aggregation becomes: "messageDataSet":{$addToSet:"$messageDataSet"},
and the Java code is: grouping("messageDataSet", addToSet("messageDataSet")),
This works with Morphia. Thanks.
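Putting it together, a minimal sketch of the revised pipeline, assuming the Morphia 1.x aggregation API; the sort key, limit, and skip values are illustrative:
import org.mongodb.morphia.aggregation.Accumulator;
import org.mongodb.morphia.aggregation.AggregationPipeline;
import org.mongodb.morphia.query.Sort;
import static org.mongodb.morphia.aggregation.Group.addToSet;
import static org.mongodb.morphia.aggregation.Group.grouping;

AggregationPipeline aggregationPipeline = myDatastore.createAggregation(Message.class)
    .group("subjectHash",
        grouping("authors", addToSet("fromAddress.address")),
        // messageDataSet is now a single embedded field, so addToSet works on it
        grouping("messageDataSet", addToSet("messageDataSet")),
        grouping("messageCount", new Accumulator("$sum", 1)))
    .sort(Sort.descending("messageCount")) // illustrative sort key
    .limit(10)
    .skip(0);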
Is it possible to update an existing MongoDB collection with new data? I am using a Hadoop job to read and write data to Mongo. The required scenario is:
Say the first collection in Mongo is
{
  "_id" : 1,
  "value" : "aaa",
  "value2" : null
}
After reading the data from Mongo and processing it, MongoDB should contain
{
  "_id" : 1,
  "value" : "aaa",
  "value2" : "bbb"
}
If possible, please provide some dummy code.
BasicBSONObject query = new BasicBSONObject();
query.append("fieldname", value); // matching condition
BasicBSONObject update = new BasicBSONObject();
update.append("$set", new BasicBSONObject().append("newfield", value)); // new field and value
MongoUpdateWritable muw = new MongoUpdateWritable(query, update, false, true);
context.write(key, muw);
query : provides the matching condition.
update : adds the new field and value to the existing documents.
MongoUpdateWritable:
the 3rd parameter is the upsert flag (same as in MongoDB),
the 4th parameter enables updating multiple documents in a collection.
Set this in the driver class:
job.setOutputValueClass(MongoUpdateWritable.class);
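For completeness, a minimal sketch of the driver wiring, assuming the Mongo-Hadoop connector classes under com.mongodb.hadoop; the output URI and job name are illustrative:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.io.MongoUpdateWritable;
import com.mongodb.hadoop.util.MongoConfigUtil;

Configuration conf = new Configuration();
// Collection that will receive the updates (illustrative URI).
MongoConfigUtil.setOutputURI(conf, "mongodb://localhost:27017/mydb.mycollection");

Job job = Job.getInstance(conf, "mongo-update-job");
job.setOutputFormatClass(MongoOutputFormat.class);
job.setOutputValueClass(MongoUpdateWritable.class);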
I have done it by extending org.apache.hadoop.mapreduce.RecordWriter and overriding the write method of that class.
The Mongo-Hadoop Connector doesn't currently support this feature. You can open a feature request in the MongoDB Jira if you like.
I have done it with Stratio; if you are using Spark, you can check it out!