I'm trying to implement an application that gets documents from MongoDB and inserts them into Elasticsearch. Here is the piece of code that should insert a document into the Elasticsearch index:
final Document o = (Document) document.get("o"); // this is where object lives
client.prepareIndex(index, mapping, id.toString())
    .setSource(o.toJson())
    .execute().actionGet();
And finally I get this error:
java.lang.IllegalArgumentException: Mapper for [title] conflicts with existing mapping in other types:
[mapper [title] has different [store_term_vector] values, mapper [title] has different [store_term_vector_offsets] values, mapper [title] has different [store_term_vector_positions] values, mapper [title] has different [store_term_vector_payloads] values]
at org.elasticsearch.index.mapper.FieldTypeLookup.checkCompatibility(FieldTypeLookup.java:117)
at org.elasticsearch.index.mapper.MapperService.checkNewMappersCompatibility(MapperService.java:368)
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:319)
I've tried to remove the index completely using XDELETE and recreate it using XPUT, but the error remains.
Here is what my index settings look like:
{
  "msg": {
    "mappings": {
      "Message": {
        "properties": {
          "title": {
            "type": "string",
            "term_vector": "with_positions_offsets_payloads",
            "analyzer": "russian"
          }
        }
      }
    }
  }
}
However, if I remove the term_vector part from the index settings, the code inserts the new document successfully.
Can someone explain to me what the problem is? The same problem occurs when I try to use mongo-connector: if the settings contain the term_vector part for the title field, mongo-connector fails with the same exception, and it works fine without term_vector.
Are you sure you are using the correct term_vector value? I am only aware of five valid values for that attribute, as listed in the documentation:
Possible values are no, yes, with_offsets, with_positions, with_positions_offsets. Defaults to no.
I would suggest trying a different term_vector such as with_positions_offsets to see if you get the results you're expecting.
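For example, a hypothetical variant of your title mapping with just that one attribute value swapped:
"title": {
  "type": "string",
  "term_vector": "with_positions_offsets",
  "analyzer": "russian"
}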
I hope my answer will help someone else.
The problem was that I had another mapping in the same index that also has a title field. You have to update all the other mappings to use the same settings.
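For illustration, the index mappings would need to look something like the following. The second type "Comment" here is hypothetical; the point is that every type that maps a title field must declare identical settings for it:
{
  "mappings": {
    "Message": {
      "properties": {
        "title": {
          "type": "string",
          "term_vector": "with_positions_offsets_payloads",
          "analyzer": "russian"
        }
      }
    },
    "Comment": {
      "properties": {
        "title": {
          "type": "string",
          "term_vector": "with_positions_offsets_payloads",
          "analyzer": "russian"
        }
      }
    }
  }
}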
I am using the Apache Beam ParquetIO.read(schema) method to read data from a parquet file. When performing the read I was getting the following error: java.lang.NullPointerException: null of com.namespace.myfield field myfield.
This was occurring because the field in question had a null value in the source data. I updated the Avro schema used by the ParquetIO.read(schema) method to include a union, so that it now looks like the below:
{
  "type": "record",
  "name": "TABLE",
  "namespace": "com.namespace",
  "fields": [
    {
      "name": "myfield",
      "type": [
        "null",
        {
          "type": "fixed",
          "name": "myfield",
          "size": 5,
          "logicalType": "decimal",
          "precision": 10,
          "scale": 5
        }
      ]
    }
  ]
}
My thinking was this would allow the value to be null or of the fixed type required.
When I run the same code now I get a different error: org.apache.avro.UnresolvedUnionException: Not in union ["null",{"type":"fixed","name":"myfield","namespace":"com.namespace","size":5,"logicalType":"decimal","precision":10,"scale":5}]: [0, 0, 0, 0, 0]
When I debug the code and step through it, the exception appears to be thrown from the org.apache.avro.generic.GenericData class, within the resolveUnion method, and it looks as though it is unable to find the required fixed type because it can't handle the complex type within the union array.
Has anyone had any experience of getting ParquetIO working with reading a file using an avro schema that contains a union of null and a fixed type?
For reference, I am using version 2.19.0 of beam-sdks-java-io-parquet, and I believe this in turn uses v1.8.2 of org.apache.avro. I am unsure whether this is occurring because there is a known bug in the older versions being used, or if I am missing something in the format of the schema.
UPDATE
It now looks like the error occurs because the lookup searches for the fixed field within the union by the name "myfield"; however, it is only findable by its fully qualified name "com.namespace.myfield". I am not entirely sure what to change so that it searches for the field including the namespace.
So I figured this out, for anyone who runs into the same issue. ParquetIO.read() in Apache Beam uses org.apache.avro.generic. When resolving the union, there is a line of code in the resolveUnion method of the GenericData class:
Integer i = union.getIndexNamed(getSchemaName(datum));
The getIndexNamed method is called for the fixed type. Within this method there is a map called indexByName which contains the elements of the union, and the line of code above searches it for a field called 'myfield'. 'myfield', however, is not in that map: when the map was created, the field was added under its full name (including the namespace), so it was called 'com.namespace.myfield'. As a result the union can never be resolved.
If I remove the namespace from the record it is able to resolve the union with no issues.
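For reference, this is roughly what the working schema looks like: the same schema as above, just without the namespace attribute on the record, so that the plain name the lookup uses matches the name under which the fixed type was registered:
{
  "type": "record",
  "name": "TABLE",
  "fields": [
    {
      "name": "myfield",
      "type": [
        "null",
        {
          "type": "fixed",
          "name": "myfield",
          "size": 5,
          "logicalType": "decimal",
          "precision": 10,
          "scale": 5
        }
      ]
    }
  ]
}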
I'm getting a result from the Cloudant DB, and the response type is a Document object.
This is my query:
FindResult queryResult = cloudantConfig.clientBuilder()
    .postFind(findOptions)
    .execute()
    .getResult();
This is my result from the Cloudant DB:
{
  "bookmark": "Tq2MT8lPzkzJBYqLOZaWZOQXZVYllmTm58UHpSamxLukloFUc8BU41GXBQAtfh51",
  "docs": [
    {
      "sports": [
        {
          "name": "CRICKET",
          "player_access": [
            "All"
          ]
        }
      ]
    }
  ]
}
I'd like to access 'name' and 'player_access', but I can only go as far as 'sports'; I can't get to 'name' or 'player_access'. This is how I attempted to obtain 'name':
queryResult.getDocs().get(0).get("sports").get(0).get("name");
With the above I'm getting an error like this: The method get(int) is undefined for the type Object
I'm receiving the values when I only go as far as 'sports'.
This is how I obtain sports:
queryResult.getDocs().get(0).get("sports");
When I sysout the aforementioned sports, I get the results below.
[{name=CRICKET, player_access=[All]}]
So, how do I gain access to 'name' and 'player_access' here? Can somebody help me with this?
I've dealt with JSON values recently, but ended up just using regex and splitting/matching from there.
You can regex everything from "name" up to (but not including) the next comma, and do the same for player_access.
Be aware that this is just a workaround, and not the best option, but sometimes JSON objects in Java can be tricky.
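A minimal sketch of that workaround, assuming the queryResult from the question; the patterns are keyed to the exact output shown above ([{name=CRICKET, player_access=[All]}]) and would need hardening for real data:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stringify the nested structure, e.g. "[{name=CRICKET, player_access=[All]}]"
String raw = queryResult.getDocs().get(0).get("sports").toString();

// Everything after "name=" up to the next comma or closing brace
Matcher name = Pattern.compile("name=([^,}]+)").matcher(raw);
// The bracketed list following "player_access="
Matcher access = Pattern.compile("player_access=\\[([^\\]]+)\\]").matcher(raw);

if (name.find()) {
    System.out.println(name.group(1));   // CRICKET
}
if (access.find()) {
    System.out.println(access.group(1)); // All
}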
I'm trying to add a new field (LastLoginDate of type Date) to an existing collection. Here is my sample script:
db.createCollection("MyTestCollection", {
  "validator": {
    "$or": [
      { "username": { "$type": "string" } },
      { "notes": { "$type": "string" } }
    ]
  }
})
db.getCollectionInfos({name: "MyTestCollection"});
[
  {
    "name": "MyTestCollection",
    "options": {
      "validator": {
        "$or": [
          { "username": { "$type": "string" } },
          { "notes": { "$type": "string" } }
        ]
      }
    }
  }
]
What is the best way to add the new field LastLoginDate : { $type: "date" } to the existing collection "MyTestCollection"?
Adding a new document, or updating the collection with the new field, may create the field, but I'm not sure how to enforce the date type on it. After adding the new field, if I execute the getCollectionInfos() command above again, it doesn't show a type validator for the newly added field.
I "should" probably prefix this with one misconception in your question. The fact is that MongoDB differs from a traditional RDBMS in that it is "schemaless", and you do not in fact need to "create fields" at all. So this differs from a "table schema", where you cannot do anything until the schema changes. "Validation", however, is a different thing, as well as a still relatively new feature as of writing.
If you want to "add a validation rule" then there are methods, which depend on the current state of the collection. In either case, there actually is no "add to" function; the action is instead to "replace" all the validation rules with the new ones specified. Read on for the rules of how this works.
Existing Documents
Where the collection has existing documents, as noted in the documentation:
Existing Documents
You can control how MongoDB handles existing documents using the validationLevel option.
By default, validationLevel is strict and MongoDB applies validation rules to all inserts and updates. Setting validationLevel to moderate applies validation rules to inserts and to updates to existing documents that fulfill the validation criteria. With the moderate level, updates to existing documents that do not fulfill the validation criteria are not checked for validity.
This and the following example section are basically saying that, in addition to the options on .createCollection(), you may also modify an existing collection with documents, but you should be "wary" that the present documents may not meet the required rules. Therefore use "moderate" if you are unsure whether the rule will be met for all documents in the collection.
In order to apply the rules, you presently use the .runCommand() method to issue the collMod "command" that sets the validation rules, along with the "validationLevel" from the passage above.
Since you have existing rules, we can use .getCollectionInfos() to retrieve them, then add the new rule and apply:
let validator = db.getCollectionInfos({name: "MyTestCollection"})[0].options.validator;
validator.$or.push({ "LastLoginDate": { "$type": "date" } });
db.runCommand({
  "collMod": "MyTestCollection",
  "validator": validator,
  "validationLevel": "moderate"
});
Of course, as noted before, if you are confident that all the documents meet the conditions, then you can apply the default "strict" level instead.
Empty Collection
If the collection is actually "empty" with no documents at all, or if you can "drop" the collection because the current data is of no consequence, then you can simply vary the above and use .createCollection() in combination with .drop():
let validator = db.getCollectionInfos({name: "MyTestCollection"})[0].options.validator;
validator.$or.push({ "LastLoginDate": { "$type": "date" } });
db.getCollection("MyTestCollection").drop();
db.createCollection( "MyTestCollection", { "validator": validator });
If your validator uses $jsonSchema instead, the same approach works:
let previousValidator = db.getCollectionInfos({name: "collectionName"})[0].options.validator;
// push the new key onto the required array
previousValidator.$jsonSchema.required.push("isBloodReportAvailable")
let isBloodReportAvailable = { "bsonType": "bool", "description": "must be a bool object and is optional" }
// add the new property to the validator
previousValidator.$jsonSchema.properties["isBloodReportAvailable"] = isBloodReportAvailable
db.runCommand({
  "collMod": "collectionName",
  "validator": previousValidator,
});
I have created an ElasticSearch index using the ElasticSearch Java API. Now I would like to perform some aggregations on data stored in this index, but I get the following error:
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [item] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
As suggested at this link, to solve this issue I should enable fielddata on the "item" text field, but how can I do that using the ElasticSearch Java API?
An alternative might be mapping the "item" field as a keyword, but same question: how can I do that with the ElasticSearch Java API?
For a new index you can set the mappings at creation time by doing something like:
CreateIndexRequest createIndexRequest = new CreateIndexRequest(indexName);
String source = // probably read the mapping from a file
createIndexRequest.source(source, XContentType.JSON);
restHighLevelClient.indices().create(createIndexRequest);
The mapping should have the same format as a request you would make against the REST endpoint, similar to this:
{
  "mappings": {
    "your_type": {
      "properties": {
        "your_property": {
          "type": "keyword"
        }
      }
    }
  }
}
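Putting the two together, a minimal sketch might look like the following; the index name "my_index" and type "my_type" are placeholders, and the mapping JSON is inlined here instead of being read from a file:
String source =
    "{\n" +
    "  \"mappings\": {\n" +
    "    \"my_type\": {\n" +
    "      \"properties\": {\n" +
    "        \"item\": { \"type\": \"keyword\" }\n" +
    "      }\n" +
    "    }\n" +
    "  }\n" +
    "}";

CreateIndexRequest createIndexRequest = new CreateIndexRequest("my_index");
createIndexRequest.source(source, XContentType.JSON);
restHighLevelClient.indices().create(createIndexRequest);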
Use XContentBuilder; it makes it easy to build the JSON for creating or updating a mapping. It looks something like:
.startObject("field's name")
.field("type", "text")
.field("fielddata", true)
.endObject()
Afterwards, just use CreateIndexRequest to create a new index, or PutMappingRequest to update an old mapping.
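A minimal sketch of the update path, assuming a RestHighLevelClient named client and an existing index named "my_index" (both placeholders; exact request classes vary a little between client versions), enabling fielddata on the "item" field from the question:
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.indices.PutMappingRequest;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

// Build the mapping JSON programmatically instead of as a raw string
XContentBuilder builder = XContentFactory.jsonBuilder()
    .startObject()
        .startObject("properties")
            .startObject("item")
                .field("type", "text")
                .field("fielddata", true)
            .endObject()
        .endObject()
    .endObject();

PutMappingRequest request = new PutMappingRequest("my_index").source(builder);
client.indices().putMapping(request, RequestOptions.DEFAULT);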
I recently encountered this issue as well, but in my case I faced it when performing sorting. I have posted a working solution for this problem in another similar question -> set field data = true on java elasticsearch
I have some Mongo data that looks like this:
{
  "_id": {
    "$oid": "5984cfb276c912dd03c1b052"
  },
  "idkey": "123",
  "objects": [{
    "key1": "481334",
    "key2": {
      "key3": "val3",
      "key4": "val4"
    }
  }]
}
I want to know what the value of key4 is. I also need to filter the results by idkey and key1. So I tried:
doc = mongoCollection.find(and(eq("idKey", 123),eq("objects.key1", 481334))).first();
and this works. But I want to check the value of key4 without having to unwrap the entire object. Is there some query I can perform that gives me just the value of key4? Note that I can update the value of key4 with:
mongoCollection.updateOne(and(eq("idKey", 123), eq("objects.key1", 481334)),Updates.set("objects.$.key2.key4", "someVal"));
Is there a similar query I can run just to get the value of key4?
Update
Thanks a lot @dnickless for your help. I tried both of your suggestions but I am getting null. Here is what I tried:
existingDoc = mongoCollection.find(and(eq("idkey", 123), eq("objects.key1", 481334))).first();
This gives me:
Document{{_id=598b13ca324fb0717c509e2d, idkey="2323", objects=[Document{{key1="481334", key2=Document{{key3=val3, key4=val4}}}}]}}
So far so good. Next I tried:
mongoCollection.updateOne(and(eq("idkey", "123"), eq("objects.key1", "481334")),Updates.set("objects.$.key2.key4", "newVal"));
Now I tried to get the updated document with:
updatedDoc = mongoCollection.find(and(eq("idkey", "123"),eq("objects.key1","481334"))).projection(Projections.fields(Projections.excludeId(), Projections.include("key4", "$objects.key2.key4"))).first();
For this I got:
Document{{}}
And finally I tried:
updatedDoc = mongoCollection.aggregate(Arrays.asList(
        Aggregates.match(and(eq("idkey", "123"), eq("objects.key1", "481334"))),
        Aggregates.unwind("$objects"),
        Aggregates.project(Projections.fields(
            Projections.excludeId(),
            Projections.computed("key4", "$objects.key2.key4")))))
    .first();
And for this I got:
Document{{key4="newVal"}}
So I'm happy :) but can you think of a reason why the first approach did not work?
Final answer
Thanks for the update @dnickless:
document = collection.find(and(eq("idkey", "123"), eq("objects.key1", "481334"))).projection(fields(excludeId(), include("key4", "objects.key2.key4"))).first();
Your data sample contains a lowercase "idkey" whereas your query uses "idKey". In my examples below, I use the lowercase version. Also, you are querying for the integers 123 and 481334 as opposed to strings, which would be correct looking at your sample data. I'm going with the string version in my code below in order to make it work against the provided sample data.
You have two options:
Either you simply limit your result set but keep the same structure using a simple find + projection:
document = collection.find(and(eq("idkey", "123"), eq("objects.key1", "481334"))).projection(fields(excludeId(), include("objects.key2.key4"))).first();
Or, probably nicer in terms of output (not necessarily speed, though), you use the aggregation framework in order to really just get what you want:
document = collection.aggregate(Arrays.asList(
        match(and(eq("idkey", "123"), eq("objects.key1", "481334"))),
        unwind("$objects"),
        project(fields(excludeId(), computed("key4", "$objects.key2.key4")))))
    .first();