I have the below mapping structure for my Elasticsearch index.
{
"users": {
"mappings": {
"user-type": {
"properties": {
"lastModifiedBy": {
"type": "string"
},
"lastModifiedDate": {
"type": "date",
"format": "dateOptionalTime"
},
"details": {
"type": "nested",
"properties": {
"lastModifiedBy": {
"type": "string"
},
"lastModifiedDate": {
"type": "date",
"format": "dateOptionalTime"
},
"views": {
"type": "nested",
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"properties": {
"properties": {
"name": {
"type": "string"
},
"type": {
"type": "string"
},
"value": {
"type": "string"
}
}
}
}
}
}
}
}
}
}
}
}
Basically I want to retrieve ONLY the view object inside details, based on the index id and the view id (details.views.id).
I have tried the Java code below, but it does not seem to work.
SearchRequestBuilder srq = this.client.prepareSearch(this.indexName)
.setTypes(this.type)
.setQuery(QueryBuilders.termQuery("_id", sid))
.setPostFilter(FilterBuilders.nestedFilter("details.views",
FilterBuilders.termFilter("details.views.id", id)));
Below is the query structure generated by this Java code:
{
"query": {
"term": {
"_id": "123"
}
},
"post_filter": {
"nested": {
"filter": {
"term": {
"details.views.id": "def"
}
},
"path": "details.views"
}
}
}
Since details is nested and views is nested inside details, you basically need two nested filters as well (one for each level). Also, the constraint on the _id field is best done with the ids query. The query DSL would look like this:
{
"query": {
"ids": {
"values": [
"123"
]
}
},
"post_filter": {
"nested": {
"filter": {
"nested": {
"path": "details.view",
"filter": {
"term": {
"details.views.id": "def"
}
}
}
},
"path": "details"
}
}
}
Translating this into Java code yields:
// 2nd-level nested filter
FilterBuilder detailsView = FilterBuilders.nestedFilter("details.views",
FilterBuilders.termFilter("details.views.id", id));
// 1st-level nested filter
FilterBuilder details = FilterBuilders.nestedFilter("details", detailsView);
// ids constraint
IdsQueryBuilder ids = QueryBuilders.idsQuery(this.type).addIds("123");
SearchRequestBuilder srq = this.client.prepareSearch(this.indexName)
.setTypes(this.type)
.setQuery(ids)
.setPostFilter(details);
PS: I second what @Paul said, i.e. always play around with the query DSL first, and only once you have zeroed in on the exact query you need, translate it into its Java form.
{
"status": 200,
"id": "123e4567-e89b-12d3-a456-426655440000",
"shop": {
"c73bcdcc-2669-4bf6-81d3-e4ae73fb11fd": {
"123e4567-e89b-12d3-a456-426655443210": {
"quantity": {
"value": 10
}
},
"123e4567-e89b-12d3-a456-426655443211": {
"quantity": {
"value": 20
}
}
}
}
}
This is my JSON response. I want to validate the fields "c73bcdcc-2669-4bf6-81d3-e4ae73fb11fd", "123e4567-e89b-12d3-a456-426655443210" and "123e4567-e89b-12d3-a456-426655443211", which are uniquely generated every time the endpoint is hit.
Building on @pxcv7r's answer:
To validate UUID in particular, you may use format in JSON schema, which provides built-in support for the UUID syntax: { "type": "string", "format": "uuid" }
See https://json-schema.org/understanding-json-schema/reference/string.html
Additionally, you can use a combination of "propertyNames" and "unevaluatedProperties" to avoid the need for any regular expression:
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"type": "object",
"properties": {
"status": {
"type": "integer"
},
"id": {
"type": "string",
"format": "uuid"
},
"shop": {
"type": "object",
"minProperties": 1,
"maxProperties": 1,
"propertyNames": {
"format": "uuid"
},
"unevaluatedProperties": {
"type":"object",
"minProperties": 1,
"propertyNames": {
"format": "uuid"
},
"unevaluatedProperties": {
"title": "single variant of a shop",
"type": "object",
"properties": {
"quantity": {
"type": "object",
"properties": {
"value": {
"type": "integer"
}
}
}
}
}
}
}
}
}
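If you prefer to check this programmatically instead of with an online validator, here is a minimal sketch in Java using the networknt json-schema-validator library together with Jackson (the library choice and the schemaJson/responseJson variables are my assumptions, not part of the question; also note that enforcement of "format": "uuid" may depend on how the validator is configured):
import java.util.Set;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.networknt.schema.JsonSchema;
import com.networknt.schema.JsonSchemaFactory;
import com.networknt.schema.SpecVersion;
import com.networknt.schema.ValidationMessage;

// schemaJson holds the draft 2019-09 schema above as a String,
// responseJson holds the endpoint response to validate
ObjectMapper mapper = new ObjectMapper();
JsonSchema schema = JsonSchemaFactory
        .getInstance(SpecVersion.VersionFlag.V201909)
        .getSchema(schemaJson);
Set<ValidationMessage> errors = schema.validate(mapper.readTree(responseJson));
// an empty set means the response matched the schema
errors.forEach(e -> System.out.println(e.getMessage()));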
To validate in JSON Schema that a string conforms to a regular expression pattern, use pattern (note that the regex word boundary \b has to be escaped as \\b inside a JSON string):
{ "type": "string", "pattern": "\\b[0-9a-f]{8}\\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\\b[0-9a-f]{12}\\b" }
The concrete pattern is adapted from the question Searching for UUIDs in text with regex; see there for more details.
You need "patternProperties":
{
"$schema":"http://json-schema.org/draft-07/schema#",
"type":"object",
"properties": {
"shop":{
"type":"object",
"additionalProperties":false,
"patternProperties":{
"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}": {
"type":"object",
"patternProperties" :{
"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}":{
"type":"object",
"properties":{
"quantity":{
"type":"object",
"properties":{
"value":{
"type":"integer"
}
}
}
}
}
}
}
}
}
}
}
I have a JSON schema:
{
"type": "object",
"properties": {
"name": { "type": ["string", "null"] },
"credit_card": {
"type": ["string", "null"]
},
"billing_address": {
"type": ["string", "null"]
}
},
"dependencies": [{
"credit_card": ["billing_address"]
}]
}
I want the billing_address value to be present whenever a credit_card value is provided. But since I have specified the type of billing_address as ["string", "null"], it accepts a null value even when a credit_card value is present, and hence does not validate the way I want. Could someone suggest the right approach for doing this?
Thanks in advance.
Your dependencies should be defined as an object {} not an array []. You just need to remove the outer square brackets:
"dependencies": {
"credit_card": ["billing_address"]
}
Overall, this gives the following schema:
{
"type": "object",
"properties": {
"name": {
"type": ["string", "null"]
},
"credit_card": {
"type": ["string", "null"]
},
"billing_address": {
"type": ["string", "null"]
}
},
"dependencies": {
"credit_card": ["billing_address"]
}
}
Using the above schema, the following JSON is valid:
{
"name": "Abel",
"credit_card": "1234...",
"billing_address": "some address here..."
}
But the following JSON is invalid:
{
"name": "Abel",
"credit_card": "1234"
}
You can test these using an online validator such as this one.
You may also want to consider removing the null values you are using in the schema. For example, by using this:
{
"type": "object",
"properties": {
"name": {
"type": "string",
},
"credit_card": {
"type": "string",
},
"billing_address": {
"type": "string",
}
},
"dependencies": {
"credit_card": ["billing_address"]
}
}
Using this revised schema, you will now also get a validation error for JSON such as the following:
{
"name": "Abel",
"credit_card": null,
"billing_address": "some address here..."
}
Update - both fields are present but null:
If both credit_card and billing_address are null, then this case can be handled using a conditional validation (added to the end of our schema below):
{
"type": "object",
"properties": {
"name": {
"type": ["string", "null"]
},
"credit_card": {
"type": ["string", "null"]
},
"billing_address": {
"type": ["string", "null"]
}
},
"dependencies": {
"credit_card": ["billing_address"]
},
"if": {
"properties": { "credit_card": { "const": null } }
},
"then": {
"properties": { "billing_address": { "const": null } }
}
}
Now, the following will also be valid:
{
"name": "Abel",
"credit_card": null,
"billing_address": null
}
One note of warning: This uses a relatively newer feature of the JSON Schema spec. It is supported by the online validator I referred to above - but I do not know if it is supported by whatever validator you may be using.
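For what it's worth, here is a minimal sketch of how such a check could look in Java with the networknt json-schema-validator library (my choice of validator, not necessarily yours; schemaJson is assumed to hold the schema above as a String):
import java.util.Set;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.networknt.schema.JsonSchema;
import com.networknt.schema.JsonSchemaFactory;
import com.networknt.schema.SpecVersion;
import com.networknt.schema.ValidationMessage;

ObjectMapper mapper = new ObjectMapper();
JsonSchema schema = JsonSchemaFactory
        .getInstance(SpecVersion.VersionFlag.V7) // if/then needs draft-07 or later
        .getSchema(schemaJson);

// credit_card is null, so billing_address must be null as well -> expect no errors
String doc = "{\"name\": \"Abel\", \"credit_card\": null, \"billing_address\": null}";
Set<ValidationMessage> errors = schema.validate(mapper.readTree(doc));
System.out.println(errors.isEmpty()); // true, if the validator supports if/then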
I am trying to configure Elasticsearch with synonyms.
These are my settings:
"analysis": {
"analyzer": {
"category_synonym": {
"tokenizer": "whitespace",
"filter": [
"synonym_filter"
]
}
},
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
}
}
}
Mappings config:
"category": {
"properties": {
"name": {
"type":"string",
"search_analyzer" : "category_synonym",
"index_analyzer" : "standard",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
And this is the list of my synonyms:
film => video
ooh => panels, poster
commercial => advertisement
print => magazine
I must say that I am using Elasticsearch Java API.
I am using QueryBuilders.queryStringQuery because this is the only way I have found to set an analyzer on my request.
So, when I run:
QueryBuilders.queryStringQuery("name:film").analyzer(analyzer)
It returns:
[
{
"id": 71,
"name": "Pitch video",
"description": "... ",
"parent": null
},
{
"id": 25,
"name": "Video",
"description": "... ",
"parent": null
}
]
That is perfect for me, but when I call something like this:
QueryBuilders.queryStringQuery("name:vid").analyzer(analyzer)
I expect it to return the same objects, but there is nothing: []
So, I added an asterisk to the queryStringQuery:
QueryBuilders.queryStringQuery("name:vid*").analyzer(analyzer)
Works well, but now
QueryBuilders.queryStringQuery("name:film*").analyzer(analyzer)
returns [].
So, how can I configure Elasticsearch so that it returns the same objects when I search for video, vid, film and fil?
Thanks in advance!
Hm, I don't think Elasticsearch will know to "translate" fil into vid :-). So, I think you need edgeNGrams for this, both at indexing and search time.
PUT test
{
"settings": {
"analysis": {
"analyzer": {
"category_synonym": {
"tokenizer": "whitespace",
"filter": [
"synonym_filter",
"my_edgeNGram_filter"
]
},
"standard_edgeNGram": {
"tokenizer": "standard",
"filter": [
"lowercase",
"synonym_filter",
"my_edgeNGram_filter"
]
}
},
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
},
"my_edgeNGram_filter": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 8
}
}
}
},
"mappings": {
"test": {
"properties": {
"name": {
"type": "string",
"analyzer": "category_synonym",
"index_analyzer": "standard_edgeNGram",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
POST test/test/1
{"name": "Pitch video"}
POST test/test/2
{"name": "Video"}
GET /test/test/_search
{
"query": {
"query_string": {
"query": "name:fil"
}
}
}
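Back in the Java API, the last query above would look roughly like this (a sketch assuming the same transport client as in the question and the test index/type from this example):
SearchResponse response = client.prepareSearch("test")
        .setTypes("test")
        .setQuery(QueryBuilders.queryStringQuery("name:fil"))
        .execute()
        .actionGet();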
I understand how to build a mapping for any index and type. But I want to have fields my_field_1 and my_field_2 that will not be analyzed for all the indexes and types that will be created in the future.
PUT /address_index
{
"mappings":{
"address":{
"properties":{
"state":{
"type":"string",
"fields":{
"raw":{
"type":"string",
"index":"not_analyzed"
}
}
}
}
}
}
}
I also saw in one of the links how to do this for all string fields, but I am unable to add it for just the fields mentioned above.
I will be implementing this in Java. However, just the JSON DSL would be a good head start.
You can do this by creating an index template with a pattern of "*", meaning it will apply to all indices you create in the future, and defining this mapping in there.
PUT 127.0.0.1:9200/_template/stacktest
{
"template": "*",
"settings": {
"number_of_shards": 1
},
"mappings": {
"address": {
"properties": {
"state": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
Now you can create an index with any name and this mapping will apply to it.
PUT 127.0.0.1:9200/testindex/
GET 127.0.0.1:9200/testindex/_mapping
{
"testindex": {
"mappings": {
"address": {
"properties": {
"state": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
}
Note that the index: not_analyzed part was transformed into the keyword datatype, as string has been deprecated. You should use text and keyword if you are on version 5.x.
Edit to address your comments
To adapt this to the specific two fields mentioned by you, the following request would create the template:
{
"template": "*",
"settings": {
"number_of_shards": 1
},
"mappings": {
"_default_": {
"properties": {
"my_field_1": {
"type": "string",
"index": "not_analyzed"
},
"my_field_2": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
If you now index a document into a new index, those two fields will be not_analyzed for any type of document, while any other string fields will be analyzed, which is the way I understood your original question.
POST 127.0.0.1:9200/testindex/address
{
"my_field_1": "this is not_analyzed",
"my_field_2": "this is not_analyzed either",
"other_field": "this however is analyzed"
}
POST 127.0.0.1:9200/testindex/differenttype
{
"my_field_1": "this is not_analyzed",
"my_field_2": "this is not_analyzed either",
"other_field": "this however is analyzed"
}
Now check the mapping and notice the difference:
{
"testindex": {
"mappings": {
"differenttype": {
"properties": {
"my_field_1": {
"type": "keyword"
},
"my_field_2": {
"type": "keyword"
},
"other_field": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"address": {
"properties": {
"my_field_1": {
"type": "keyword"
},
"my_field_2": {
"type": "keyword"
},
"other_field": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"_default_": {
"properties": {
"my_field_1": {
"type": "keyword"
},
"my_field_2": {
"type": "keyword"
}
}
}
}
}
}
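Since you mention implementing this in Java: with a pre-5.x transport client, the template above can be registered roughly as follows (a sketch; I am assuming the 2.x-era builder API in which setTemplate() and addMapping() accept raw JSON strings):
// the _default_ mapping from the template above, inlined as a JSON string
String defaultMapping = "{\"_default_\": {\"properties\": {"
        + "\"my_field_1\": {\"type\": \"string\", \"index\": \"not_analyzed\"},"
        + "\"my_field_2\": {\"type\": \"string\", \"index\": \"not_analyzed\"}}}}";

client.admin().indices()
        .preparePutTemplate("stacktest") // template name, as in the example
        .setTemplate("*")                // apply to every index created later
        .addMapping("_default_", defaultMapping)
        .execute()
        .actionGet();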
Problem: How to create an index from a JSON file using the Java API?
The JSON file contains a definition for the index de_brochures. It also defines an analyzer de_analyzer with custom filters that are used by the respective index.
As the JSON works with curl and Sense, I assume I have to adapt its syntax to work with the Java API.
I don't want to use XContentFactory.jsonBuilder() as the JSON comes from a file!
I have the following JSON file to create my mapping from and to set settings. Using Sense with PUT /indexname, it does create an index from this:
{
"mappings": {
"de_brochures": {
"properties": {
"text": {
"type": "string",
"store": true,
"index_analyzer": "de_analyzer"
},
"classification": {
"type": "string",
"index": "not_analyzed"
},
"language": {
"type": "string",
"index": "not_analyzed"
}
}
}
},
"settings": {
"analysis": {
"filter": {
"de_stopwords": {
"type": "stop",
"stopwords": "_german_"
},
"de_stemmer": {
"type": "stemmer",
"name": "light_german"
}
},
"analyzer": {
"de_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"de_stopwords",
"de_stemmer"
]
}
}
}
}
}
As the above did not work with addMapping() alone, I tried to split it into two separate files (I realized that I had to remove the "mappings": and "settings": parts):
------ Mapping json ------
{
"de_brochures": {
"properties": {
"text": {
"type": "string",
"store": true,
"index_analyzer": "de_analyzer"
},
"classification": {
"type": "string",
"index": "not_analyzed"
},
"language": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
------- Settings json --------
{
"analysis": {
"filter": {
"de_stopwords": {
"type": "stop",
"stopwords": "_german_"
},
"de_stemmer": {
"type": "stemmer",
"name": "light_german"
}
},
"analyzer": {
"de_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"de_stopwords",
"de_stemmer"
]
}
}
}
}
This is my Java code to load and add/set the JSON:
CreateIndexRequestBuilder createIndexRequestBuilder = client.admin().indices().prepareCreate(index);
// CREATE SETTINGS
String settings_json = new String(Files.readAllBytes(brochures_settings_path));
createIndexRequestBuilder.setSettings(settings_json);
// CREATE MAPPING
String mapping_json = new String(Files.readAllBytes(brochures_mapping_path));
createIndexRequestBuilder.addMapping("de_brochures", mapping_json);
CreateIndexResponse indexResponse = createIndexRequestBuilder.execute().actionGet();
There is no more complaint about the mapping file's structure, but it now fails with the error:
Caused by: org.elasticsearch.index.mapper.MapperParsingException: Analyzer [de_analyzer] not found for field [text]
Solution:
I managed to do it with my original JSON file using createIndexRequestBuilder.setSource(settings_json);
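For reference, a minimal sketch of that working version (brochures_index_path is my placeholder name for the path of the original combined file):
// load the original combined JSON ("mappings" and "settings" side by side)
String source_json = new String(Files.readAllBytes(brochures_index_path));

CreateIndexRequestBuilder createIndexRequestBuilder = client.admin().indices().prepareCreate(index);
// setSource() takes the complete index definition in one go,
// so no separate setSettings()/addMapping() calls are needed
createIndexRequestBuilder.setSource(source_json);
CreateIndexResponse indexResponse = createIndexRequestBuilder.execute().actionGet();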
I think the problem is with the structure of your mapping file.
Here is a sample example.
mapping.json
{
"en_brochures": {
"properties": {
"text": {
"type": "string",
"store": true,
"index_analyzer": "en_analyzer",
"term_vector": "yes"
},
"classification": {
"type": "string",
"index": "not_analyzed"
},
"language": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
String mapping = new String(Files.readAllBytes(Paths.get("mapping.json")));
createIndexRequestBuilder.addMapping("en_brochures", mapping);
CreateIndexResponse indexResponse = createIndexRequestBuilder.execute().actionGet();
This works for me, you can try it.