Jolt Transform JSON Spec - java

I need to transform the input JSON below into the output JSON, and I'm not sure how to write the spec for it. I need to re-position one field ("homePage") as a root-level element. Any help or suggestions would be appreciated.
Input JSON :
[{
  "uuid": "cac40601-ffc9-4fd0-c5a1-772ac65f0587",
  "pageId": 123456,
  "page": {
    "indexable": true,
    "rootLevel": false,
    "homePage": false
  }
}]
Output JSON :
[{
  "uuid": "cac40601-ffc9-4fd0-c5a1-772ac65f0587",
  "pageId": 123456,
  "homePage": false,
  "page": {
    "indexable": true,
    "rootLevel": false
  }
}]

This Jolt spec should work for you (tested with https://jolt-demo.appspot.com/):
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "uuid": "[&1].uuid",
        "pageId": "[&1].pageId",
        "page": {
          "indexable": "[&2].page.indexable",
          "rootLevel": "[&2].page.rootLevel",
          "homePage": "[&2].homePage"
        }
      }
    }
  }
]
input:
{
  "uuid": "cac40601-ffc9-4fd0-c5a1-772ac65f0587",
  "pageId": 123456,
  "page": {
    "indexable": true,
    "rootLevel": false
  },
  "homePage": false
}
output:
[{
  "uuid": "cac40601-ffc9-4fd0-c5a1-772ac65f0587",
  "pageId": 123456,
  "page": {
    "indexable": true,
    "rootLevel": false
  },
  "homePage": false
}]
Explanation:
From the Shiftr javadoc:
& Path lookup
As Shiftr processes data and walks down the spec, it maintains a data structure describing the path it has walked.
The & wildcard can access data from that path in a 0 major, upward oriented way.
Example:
{
  "foo": {
    "bar": {
      "baz": // &0 = baz, &1 = bar, &2 = foo
    }
  }
}
Next question: how do we wrap the output object into an array? A good example can be found in this post; the input/output pair shown above (a bare object in, the same object wrapped in an array out) illustrates the effect.
So, in our case, "[&1].uuid" says: place the uuid value in an object inside the output array. The array index is supplied by the &1 wildcard; for uuid it resolves to the index at which the object containing the uuid key sits in the original JSON.
Next, [&2] works just like [&1], except that the "indexable" key sits one level deeper in the input JSON. That's why we used [&2] instead of [&1] (have another look at the foo-bar example from the docs).
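If you want to run this spec from Java rather than the demo site, Jolt's Chainr does it in a few lines. A minimal sketch, assuming the spec and input above are saved as classpath resources /spec.json and /input.json (made-up file names):

import com.bazaarvoice.jolt.Chainr;
import com.bazaarvoice.jolt.JsonUtils;

public class JoltDemo {
    public static void main(String[] args) {
        // Load the shift spec (a JSON array of operations) and the input document.
        Chainr chainr = Chainr.fromSpec(JsonUtils.classpathToList("/spec.json"));
        Object input = JsonUtils.classpathToObject("/input.json");

        // Apply the transform and print the result as JSON.
        Object output = chainr.transform(input);
        System.out.println(JsonUtils.toJsonString(output));
    }
}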

Related

Use JsonPath to read nested array properties

[
  {
    "path": "test",
    "resources": [
      {
        "name": "testfile"
      }
    ]
  },
  {
    "path": "test-1",
    "resources": [
      {
        "name": "testfile-1"
      }
    ]
  }
]
Given a JSON like the above, is it possible to read it in the following format using JsonPath?
[
  {
    "path": "test",
    "name": "testfile"
  },
  {
    "path": "test-1",
    "name": "testfile-1"
  }
]
It's safe to assume that the resources array will always have size 1.
I tried $.[*]['path', ['resources'][0]['name']] but it does not show the value of path.
As mentioned, this cannot be done with JSONPath alone; selecting items from different levels and recombining them into new array elements does not work. You need a JSON transformer such as Jolt instead.
There are different ways, but you could use the shift transformer as shown below to get the desired output:
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        // for each element of the top-level array, copy "path"
        // and the single resource's "name" into one output object
        "path": "[&1].path",
        "resources": {
          "*": {
            "name": "[&3].name"
          }
        }
      }
    }
  }
]
Try it online with your own input, e.g. at https://jolt-demo.appspot.com/.
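If you must stay with JsonPath in Java, a pragmatic workaround is to run two reads and zip the results yourself; the recombination happens in plain Java, not in JsonPath. A sketch with the Jayway json-path library:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import com.jayway.jsonpath.JsonPath;

public class JsonPathZipDemo {
    public static void main(String[] args) {
        String json = "[{\"path\":\"test\",\"resources\":[{\"name\":\"testfile\"}]},"
                + "{\"path\":\"test-1\",\"resources\":[{\"name\":\"testfile-1\"}]}]";

        // Two separate JsonPath reads over the same document.
        List<String> paths = JsonPath.read(json, "$[*].path");
        List<String> names = JsonPath.read(json, "$[*].resources[0].name");

        // Zip the two lists back together; this relies on the stated
        // assumption that every resources array has exactly one element.
        List<Map<String, String>> combined = new ArrayList<>();
        for (int i = 0; i < paths.size(); i++) {
            combined.add(Map.of("path", paths.get(i), "name", names.get(i)));
        }
        System.out.println(combined);
    }
}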

How to insert any nested JSON object into Elasticsearch using the Java API

In my project, we use Flink to handle log data and then send it into Elasticsearch. However, I found that ES does not recognize a JSON object; it only recognizes some basic data types. Therefore, I could only transform the JSON object into a string, but then, when I check the log data in Elasticsearch, the format is really hard to understand.
"hits" : {
"total" : 10,
"max_score" : 1.0,
"hits" : [
{
"_index" : "wyh_dye_test",
"_type" : "nested",
"_id" : "gzlvM3EBRgA6CE7yDw8l",
"_score" : 1.0,
"_source" : {
"id" : "id",
"module" : "wyh_key",
"content" : """{"map":{"wyh_key":"wyh_value","user_key":"user_value","wqq_key":"wqq_value","hello_key":"hello_value"}}"""
}
}
This is my Kibana search result; as you can see, the content field is really hard to read.
You can update this index mapping, then put the data into the corresponding fields:
PUT xxx_index/_mapping/xxx_type
{
  "properties": {
    "wyh_key": {
      "type": "keyword"
    },
    "user_key": {
      "type": "keyword"
    },
    "wqq_key": {
      "type": "keyword"
    },
    "hello_key": {
      "type": "keyword"
    }
  }
}
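Since the question is about the Java API: once the mapping has real fields, pass the nested data as a Map (or XContentBuilder) rather than a pre-serialized string, and the client will send it as a JSON object that ES indexes field by field. A minimal sketch using the high-level REST client (since deprecated in favor of the newer Java API client); the host, index name, and keys are taken from the question, everything else is assumed:

import java.util.HashMap;
import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class NestedIndexDemo {
    public static void main(String[] args) throws Exception {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));

        // Build the nested content as a Map instead of serializing it to a
        // string; the client converts it to a JSON object, so ES sees real fields.
        Map<String, Object> content = new HashMap<>();
        content.put("wyh_key", "wyh_value");
        content.put("user_key", "user_value");

        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "id");
        doc.put("module", "wyh_key");
        doc.put("content", content);

        client.index(new IndexRequest("wyh_dye_test").source(doc), RequestOptions.DEFAULT);
        client.close();
    }
}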

Elasticsearch matching string with like operator

I would like to query Elasticsearch to retrieve all documents whose field value is like a given string.
For example, field LIKE "abc" has to return:
"abc"
"abcdef"
"abcd"
"abc1"
So, all fields that have the string "abc" inside.
I tried this query, but it returns only the documents with field = "abc":
{
  "query": {
    "more_like_this": {
      "fields": ["FIELD"],
      "like_text": "abc",
      "min_term_freq": 1,
      "max_query_terms": 12
    }
  }
}
What is the correct query?
Thanks
If you're trying to do a prefix query, then you can use this:
{
  "query": {
    "prefix": { "field": "abc" }
  }
}
See the ElasticSearch Prefix Query documentation.
Although your question is incomplete, I will try to give you several ideas.
One way surely is a prefix query, but a much more efficient one is to build an edge ngram analyzer. That way your data is prepared at insert time and queries will be much faster. Edge ngram is also the most flexible way to implement your functionality, because you can autocomplete words that appear in any order. If you don't need that, and you only need "search as you type" queries, then the best approach is the completion suggester. If you need to find strings that appear in the middle of words, then have a look at the ngram analyzer.
Here is how I set up an edge ngram analyzer in my code:
{
  "settings": {
    "analysis": {
      "filter": {
        "edge_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 256
        }
      },
      "analyzer": {
        "edge_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "edge_filter"]
        },
        "lowercase_whitespace": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "name": {
          "type": "keyword",
          "fields": {
            "suggest": {
              "type": "text",
              "analyzer": "edge_analyzer",
              "search_analyzer": "lowercase_whitespace"
            }
          }
        }
      }
    }
  }
}
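At query time you then search the name.suggest subfield; since the ngrams were produced at index time, a plain match query behaves like a fast search-as-you-type lookup. A sketch using the Java high-level REST client (my_index is a placeholder name):

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class EdgeNgramSearchDemo {
    public static void main(String[] args) throws Exception {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));

        // Query the edge-ngram-analyzed subfield; because the ngrams were
        // built at index time, a match query acts like a fast prefix search.
        SearchRequest request = new SearchRequest("my_index")
                .source(new SearchSourceBuilder()
                        .query(QueryBuilders.matchQuery("name.suggest", "abc")));

        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        System.out.println(response.getHits().getTotalHits());
        client.close();
    }
}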
You should be able to perform a wildcard query, as described in the Elasticsearch documentation. Note that the leading wildcard cannot use the index efficiently, so this can be slow on large indices:
{
  "query": {
    "wildcard": {
      "<<FIELD NAME>>": "*<<QUERY TEXT>>*"
    }
  }
}
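For completeness, the prefix and wildcard variants can also be built from Java with the standard query builders. A sketch ("field" is a placeholder name), keeping in mind that the leading * makes the wildcard query expensive:

import org.elasticsearch.index.query.PrefixQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.WildcardQueryBuilder;

public class LikeQueryDemo {
    public static void main(String[] args) {
        // Matches values starting with "abc": abc, abcdef, abcd, abc1, ...
        PrefixQueryBuilder prefix = QueryBuilders.prefixQuery("field", "abc");

        // Matches "abc" anywhere in the value; the leading * cannot use the
        // index efficiently, so prefer the edge ngram approach above at scale.
        WildcardQueryBuilder wildcard = QueryBuilders.wildcardQuery("field", "*abc*");

        // QueryBuilder#toString prints the generated query JSON.
        System.out.println(prefix);
        System.out.println(wildcard);
    }
}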

ElasticSearch mapping for dynamic keys for indexing a map

I have a sample JSON which I want to index into Elasticsearch.
Sample JSON indexed:
PUT test/names/1
{
  "1": {
    "name": "abc"
  },
  "2": {
    "name": "def"
  },
  "3": {
    "name": "xyz"
  }
}
where:
index name: test
type name: names
id: 1
Now the default mapping generated by Elasticsearch is:
{
  "test": {
    "mappings": {
      "names": {
        "properties": {
          "1": {
            "properties": {
              "name": {
                "type": "string"
              }
            }
          },
          "2": {
            "properties": {
              "name": {
                "type": "string"
              }
            }
          },
          "3": {
            "properties": {
              "name": {
                "type": "string"
              }
            }
          },
          "metadataFieldDefinition": {
            "properties": {
              "name": {
                "type": "string"
              }
            }
          }
        }
      }
    }
  }
}
If the map size increases from 3 (currently) to, say, a thousand or a million, then Elasticsearch will create a mapping entry for each key, which may cause a performance issue as the mapping collection becomes huge.
I tried creating a mapping by setting:
"dynamic": false,
"type": "object"
but it was overridden by ES, since it didn't match the indexed data.
Please let me know how I can define a mapping so that ES does not create one like the above.
I think there might be a little confusion here in terms of how we index documents.
PUT test/names/1
{ ...document... }
This says: the following document belongs to index test and is of type names, with id 1. The entire document is treated as one document of type names. Using the PUT API as you currently are, you cannot index multiple documents at once. ES immediately interprets 1, 2, and 3 as properties of type object, each containing a property name of type string.
Effectively, ES thinks you are trying to index ONE document, instead of three.
To get many documents into index test with type names, you could do this, using the curl syntax:
curl -XPUT "http://your-es-server:9200/test/names/1" -d '
{
  "name": "abc"
}'
curl -XPUT "http://your-es-server:9200/test/names/2" -d '
{
  "name": "ghi"
}'
curl -XPUT "http://your-es-server:9200/test/names/3" -d '
{
  "name": "xyz"
}'
This specifies the document ID in the endpoint you are indexing to. Your mapping will then look like this:
"test": {
"mappings": {
"names": {
"properties": {
"name": {
"type": "string"
}
}
}
}
}
Final word: split your indexing up into discrete operations, or check out the Bulk API for the syntax to POST multiple operations in a single request.
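A hedged sketch of the bulk alternative with the Java high-level REST client (note that recent versions drop mapping types, so the names type goes away; the host and document values mirror the curl commands above):

import org.apache.http.HttpHost;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

public class BulkIndexDemo {
    public static void main(String[] args) throws Exception {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("your-es-server", 9200, "http")));

        // One bulk request carrying three separate index operations,
        // mirroring the three discrete curl PUTs above.
        BulkRequest bulk = new BulkRequest()
                .add(new IndexRequest("test").id("1").source("{\"name\":\"abc\"}", XContentType.JSON))
                .add(new IndexRequest("test").id("2").source("{\"name\":\"ghi\"}", XContentType.JSON))
                .add(new IndexRequest("test").id("3").source("{\"name\":\"xyz\"}", XContentType.JSON));

        client.bulk(bulk, RequestOptions.DEFAULT);
        client.close();
    }
}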

ElasticSearch : Sorting by nested documents' values

I am facing trouble in my use of Elasticsearch in my Java application.
Let me explain: I have a mapping which is something like:
{
  "products": {
    "properties": {
      "id": {
        "type": "long",
        "ignore_malformed": false
      },
      "locations": {
        "properties": {
          "category": {
            "type": "long",
            "ignore_malformed": false
          },
          "subCategory": {
            "type": "long",
            "ignore_malformed": false
          },
          "order": {
            "type": "long",
            "ignore_malformed": false
          }
        }
      },
      ...
So, as you can see, I receive a list of products, which are composed of locations. In my model, these locations are all of a product's categories: a product can be in one or more categories, and in each of these categories the product has an order, which is the order in which the client wants to show them.
For instance, a diamond product can have the first place in Jewelry, but the third place in Woman (my examples are not so logical ^^).
So, when I click on Jewelry, I want to show these products, ordered by the field locations.order within this specific category.
For the moment, when I search all the products of a specific category, the response I receive from Elasticsearch is something like:
{"id":5331880,"locations":[{"category":5322606,"order":1},
{"category":5883712,"subCategory":null,"order":3},
{"category":5322605,"subCategory":6032961,"order":2},.......
Is it possible to sort these products by the element locations.order for the specific category I am searching on? For instance, if I am querying category 5322606, I want the order 1 of this product to be taken.
Thank you very much beforehand !
Regards,
Olivier.
First a correction of terminology: in Elasticsearch, "parent/child" refers to completely separate docs, where the child doc points to the parent doc. Parent and children are stored on the same shard, but they can be updated independently.
With your example above, what you are trying to achieve can be done with nested docs.
Currently, your locations field is of type:"object". This means that the values in each location get flattened to look something like this:
{
  "locations.category": [5322606, 5883712, 5322605],
  "locations.subCategory": [6032961],
  "locations.order": [1, 3, 2]
}
In other words, the "sub" fields get flattened into multi-value fields, which is of no use to you, because there is no correlation between category: 5322606 and order: 1.
However, if you change locations to be type:"nested" then internally it will index each location as a separate doc, meaning that each location can be queried independently, using the dedicated nested query and filter.
By default, the nested query will return a _score based upon how well each location matches, but in your case you want to return the highest value of the order field from any matching children. To do this, you'll need to use a custom_score query.
So let's start by creating the index with the appropriate mapping:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
  "mappings" : {
    "products" : {
      "properties" : {
        "locations" : {
          "type" : "nested",
          "properties" : {
            "order" : {
              "type" : "long"
            },
            "subCategory" : {
              "type" : "long"
            },
            "category" : {
              "type" : "long"
            }
          }
        },
        "id" : {
          "type" : "long"
        }
      }
    }
  }
}
'
Then we index your example doc:
curl -XPOST 'http://127.0.0.1:9200/test/products?pretty=1' -d '
{
  "locations" : [
    {
      "order" : 1,
      "category" : 5322606
    },
    {
      "order" : 3,
      "subCategory" : null,
      "category" : 5883712
    },
    {
      "order" : 2,
      "subCategory" : 6032961,
      "category" : 5322605
    }
  ],
  "id" : 5331880
}
'
And now we can search for it using the queries we discussed above:
curl -XGET 'http://127.0.0.1:9200/test/products/_search?pretty=1' -d '
{
  "query" : {
    "nested" : {
      "query" : {
        "custom_score" : {
          "script" : "doc[\u0027locations.order\u0027].value",
          "query" : {
            "constant_score" : {
              "filter" : {
                "and" : [
                  {
                    "term" : {
                      "category" : 5322605
                    }
                  },
                  {
                    "term" : {
                      "subCategory" : 6032961
                    }
                  }
                ]
              }
            }
          }
        }
      },
      "score_mode" : "max",
      "path" : "locations"
    }
  }
}
'
Note: the single quotes within the script have been escaped as \u0027 to get around shell quoting. The script actually looks like this: "doc['locations.order'].value"
If you look at the _score from the results, you can see that it has used the order value from the matching location:
{
  "hits" : {
    "hits" : [
      {
        "_source" : {
          "locations" : [
            {
              "order" : 1,
              "category" : 5322606
            },
            {
              "order" : 3,
              "subCategory" : null,
              "category" : 5883712
            },
            {
              "order" : 2,
              "subCategory" : 6032961,
              "category" : 5322605
            }
          ],
          "id" : 5331880
        },
        "_score" : 2,
        "_index" : "test",
        "_id" : "cXTFUHlGTKi0hKAgUJFcBw",
        "_type" : "products"
      }
    ],
    "max_score" : 2,
    "total" : 1
  },
  "timed_out" : false,
  "_shards" : {
    "failed" : 0,
    "successful" : 5,
    "total" : 5
  },
  "took" : 9
}
Just to add a more up-to-date version related to sorting a parent by a child field: we can query a parent doc type sorted by a child field ('count', e.g.) as in the following gist.
https://gist.github.com/robinloxley1/7ea7c4f37a3413b1ca16
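As a further update: the custom_score query shown above was removed in later Elasticsearch versions. For the original nested-locations use case, the same ordering can now be expressed as a nested sort instead of a score. A hedged sketch with the Java high-level REST client's builders (the index name test and category 5322606 come from the thread):

import org.apache.lucene.search.join.ScoreMode;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.NestedSortBuilder;
import org.elasticsearch.search.sort.SortOrder;

public class NestedSortDemo {
    public static void main(String[] args) {
        // Sort by locations.order, considering only the nested location
        // whose category matches the one being browsed.
        FieldSortBuilder sort = new FieldSortBuilder("locations.order")
                .order(SortOrder.ASC)
                .setNestedSort(new NestedSortBuilder("locations")
                        .setFilter(QueryBuilders.termQuery("locations.category", 5322606)));

        // Restrict results to products that actually have such a location.
        SearchRequest request = new SearchRequest("test")
                .source(new SearchSourceBuilder()
                        .query(QueryBuilders.nestedQuery("locations",
                                QueryBuilders.termQuery("locations.category", 5322606),
                                ScoreMode.None))
                        .sort(sort));
        System.out.println(request.source()); // prints the generated JSON body
    }
}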
