How do I transform JSON using Jolt? - java

I have to transform one JSON document into another.
I am new to Jolt. If you know of any other methods in Java, please let me know.
The input can have many other attributes, nested to any depth.
I have to write generic code that can consume all the fields in the JSON and transform them into the desired output shown below.
Input
{
"id": "123456789",
"OrderType": "ABC",
"Abc": [
{
"Name": "Pluto",
"Value": "Charon"
},
{
"Name": "Earth",
"Value": "Moon"
}
]
}
Desired Output
"MyFieldList": [
{
"Footer": "My Footer",
"fieldList": [
{
"label": "id",
"fieldName": "id",
"fieldValue": "123456789",
"editable": false,
"dataType": "STRING"
},
{
"label": "OrderType",
"fieldName": "OrderType",
"fieldValue": "ABC",
"editable": false,
"dataType": "STRING"
},
{
"label": "Pluto",
"fieldName": "Pluto",
"fieldValue": "Charon",
"editable": false,
"dataType": "STRING"
},
{
"label": "Earth",
"fieldName": "Earth",
"fieldValue": "Moon",
"editable": false,
"dataType": "STRING"
}]
}
]
I have tried this Jolt spec, but I cannot figure out how to flatten the nested part.
{
"operation": "shift",
"spec": {
"*": {
"$": "[#2].fieldName",
"@": "[#2].fieldValue",
"#false": "[#2].editable",
"# ": "[#2].Size",
"#STRING": "[#2].dataType"
}
}
}

The important part is creating an array of arrays before flattening it into the fieldList:
[
{
"operation": "shift",
"spec": {
"id": {
"$": "[#1].[#1].fieldName",
"@": "[#1].[#1].fieldValue",
"#false": "[#1].[#1].editable",
"#STRING": "[#1].[#1].dataType"
},
"OrderType": {
"$": "[#2].[#1].fieldName",
"@": "[#2].[#1].fieldValue",
"#false": "[#2].[#1].editable",
"#STRING": "[#2].[#1].dataType"
},
"Abc": {
"*": {
"Name": "[#3].[&1].fieldName",
"Value": "[#3].[&1].fieldValue",
"#false": "[#3].[&1].editable",
"#STRING": "[#3].[&1].dataType"
}
}
}
},
{
"operation": "shift",
"spec": {
"*": {
"*": "MyFieldList.fieldList.[]"
}
}
},
{
"operation": "default",
"spec": {
"MyFieldList": {
"Footer": "My Footer"
}
}
}
]
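Since the question also asks about other methods in Java, here is a minimal plain-Java sketch of the same flattening. It assumes the input has already been parsed into nested Map and List objects by whatever JSON library is in use. The class name FieldListFlattener and its methods are hypothetical, and the rule that an object with both "Name" and "Value" keys becomes a single field is inferred from the sample input only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldListFlattener {

    // Builds one entry of the fieldList; editable and dataType are fixed,
    // as in the desired output of the question.
    static Map<String, Object> field(String name, Object value) {
        Map<String, Object> f = new LinkedHashMap<>();
        f.put("label", name);
        f.put("fieldName", name);
        f.put("fieldValue", value);
        f.put("editable", false);
        f.put("dataType", "STRING");
        return f;
    }

    // Walks the parsed JSON recursively and appends one field per scalar
    // attribute or per {"Name": ..., "Value": ...} pair (assumed convention).
    // Scalars that sit directly inside a list are ignored.
    static void collect(Object node, List<Map<String, Object>> out) {
        if (node instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) node;
            if (map.containsKey("Name") && map.containsKey("Value")) {
                out.add(field(String.valueOf(map.get("Name")), map.get("Value")));
                return;
            }
            for (Map.Entry<?, ?> e : map.entrySet()) {
                Object v = e.getValue();
                if (v instanceof Map || v instanceof List) {
                    collect(v, out);
                } else {
                    out.add(field(String.valueOf(e.getKey()), v));
                }
            }
        } else if (node instanceof List) {
            for (Object item : (List<?>) node) {
                collect(item, out);
            }
        }
    }

    static List<Map<String, Object>> flatten(Map<String, Object> input) {
        List<Map<String, Object>> out = new ArrayList<>();
        collect(input, out);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> pluto = new LinkedHashMap<>();
        pluto.put("Name", "Pluto");
        pluto.put("Value", "Charon");
        Map<String, Object> input = new LinkedHashMap<>();
        input.put("id", "123456789");
        input.put("OrderType", "ABC");
        input.put("Abc", List.of(pluto));
        for (Map<String, Object> f : flatten(input)) {
            System.out.println(f);
        }
    }
}
```

The resulting list could then be placed under "fieldList" next to the "Footer" value before serializing it back to JSON.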

Related

What will be the Jolt transform for that?

I want the Jolt transform for the given input. Your help is highly appreciated, thanks.
I am providing the input and the expected output.
In the input JSON the Photos array is dynamic in nature. Here it has 3 elements, but it can have 3, 4, 5 or any number.
INPUT JSON
{
"Entity": {
"card": {
"cardNo":"123456789",
"cardStatus":"10",
"cardAddress":"UK",
"cardAddress1":"US",
"cardCity":"mk" ,
"name": "RAM",
"lastName": "ABU",
"name1": "RAM1",
"lastName1": "ABU1"
},
"Photos": [
{
"Id": "327703",
"Caption": "TEST>> photo 1",
"Url": "http://bob.com/0001/327703/photo.jpg"
},
{
"Id": "327704",
"Caption": "TEST>> photo 2",
"Url": "http://bob.com/0001/327704/photo.jpg"
},
{
"Id": "327704",
"Caption": "TEST>> photo 2",
"Url": "http://bob.com/0001/327704/photo.jpg"
}
]
}
}
OUTPUT I AM GETTING after the Jolt transform
{
"tab": {
"text": "123456789"
},
"address": [
{
"add": "UK",
"add2": "US",
"city": "mk"
}
],
"Photos": [
{
"no": "327703",
"caption": "TEST>> photo 1"
},
{
"no": "327704",
"caption": "TEST>> photo 2"
},
{
"no": "327704",
"caption": "TEST>> photo 2"
}
]
}
WHAT WILL BE THE CORRECT JOLT TRANSFORM FOR THIS?
The Jolt spec that I have used is
[
{
"operation": "shift",
"spec": {
"Entity": {
"card": {
"cardNo": "tab.text",
"cardAddress": "address[0].add",
"cardAddress1": "address[0].add2",
"cardCity": "address[0].city",
"name": "Photos[&1].no",
"lastName": "Photos[&1].caption",
"name1": "Photos[&1].no",
"lastName1": "Photos[&1].caption"
},
"Photos": {
"*": {
"Id": "Photos[&1].no",
"Caption": "Photos[&1].caption"
}
}
}
}
}
]
EXPECTED OUTPUT:
{
"tab": {
"text": "123456789"
},
"address": [
{
"add": "UK",
"add2": "US",
"mk": "mk"
}
],
"Photos": [
{
"no": "RAM",
"caption2": "ABU"
},
{
"no": "RAM1",
"caption2": "ABU1"
},
{
"no": "327703",
"caption2": "TEST>> photo 1"
},
{
"no": "327704",
"caption2": "TEST>> photo 2"
},
{
"no": "327704",
"caption2": "TEST>> photo 2"
}
]
}
I am very new to Jolt transforms. Your help is highly appreciated. Thanks
You can use two consecutive shift transformation specs. In the first, determine the groupings as desired; in the second, turn the grouped Photos values into an array of objects, such as in the following:
[
{
"operation": "shift",
"spec": {
"Entity": {
"card": {
"cardNo": "tab.text",
"cardAddress": "address[0].add",
"cardAddress1": "address[0].add2",
"cardC*": "address[0].mk",
"nam*": "Photos.no",
"lastNam*": "Photos.caption2"
},
"Photos": {
"*": {
"Id": "Photos.no",
"Caption": "Photos.caption2"
}
}
}
}
},
{
"operation": "shift",
"spec": {
"*": "&", // "else" case --> the arrays/objects/attributes other than "Photos"
"Photos": {
"*": {
"*": {
"@": "&3[&1].&2"
}
}
}
}
}
]

Jolt JSON Spec for Nested Object Transformation

I have a requirement to transform a nested object in a JSON structure.
Input JSON
{
"data": {
"PRODUCTS": {
"ProductID": "1234-5678",
"ModelNumber": "B550",
"Price": "199",
"Quantity": "1",
"ATTRIBUTES": {
"ProductID": "1234-5678",
"Height": "25",
"Width": "75"
}
}
}
}
Required Output
{
"data": {
"products": [
{
"productId": "1234-5678",
"modelNumber": "B550",
"unitPrice": "199",
"quantity": "1",
"attributes": [
{
"productId": "1234-5678",
"height": "25",
"width": "75"
}
]
}
]
}
}
My JSON Spec:
[
{
"operation": "modify-overwrite-beta",
"spec": {
"data": {
"PRODUCTS": "=toList"
}
}
},
{
"operation": "shift",
"spec": {
"data": {
"PRODUCTS": {
"*": {
"ProductID": "data.products[&1].productId",
"ModelNumber": "data.products[&1].modelNumber",
"Price": "data.products[&1].unitPrice",
"Quantity": "data.products[&1].quantity",
"ATTRIBUTES": {
"ProductID": "data.products[&1].attributes[&1].productId",
"Height": "data.products[&1].attributes[&1].height",
"Width": "data.products[&1].attributes[&1].width"
}
}
}
}
}
},
{
"operation": "default",
"spec": {
"data": {
"*": {}
}
}
}
]
Current Output
{
"data" : {
"products" : [ {
"productId" : "1234-5678",
"modelNumber" : "B550",
"unitPrice" : "199",
"quantity" : "1"
} ]
}
}
I want to convert the ATTRIBUTES nested object to a list, and also rename the nodes inside the ATTRIBUTES object as per the expected output. Can someone shed some light on how I can achieve this?
You can apply three consecutive shift transformations:
1. To rename all keys as desired
2. To create the innermost array (attributes)
3. To create the outermost array (products)
such as
[
{
"operation": "shift",
"spec": {
"data": {
"*": {
"ProductID": "&2.products.productId",
"ModelNumber": "&2.products.modelNumber",
"Price": "&2.products.unitPrice",
"Quantity": "&2.products.quantity",
"*": {
"ProductID": "&3.products.attributes.productId",
"Height": "&3.products.attributes.height",
"Width": "&3.products.attributes.width"
}
}
}
}
},
{
"operation": "shift",
"spec": {
"*": {
"*": {
"attributes": "&2.&1.&[]",
"*": "&2.&1.&"
}
}
}
},
{
"operation": "shift",
"spec": {
"*": {
"*": "&1.&[]"
}
}
}
]
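The array-creation steps (the second and third shifts) can also be pictured in plain Java. The following is only a sketch of the idea, not Jolt code: it assumes the renamed structure is available as nested Map objects, and the class name ListWrapper and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ListWrapper {

    // Replaces parent[key] with a one-element list, unless the value is
    // absent or already a list.
    static void wrapInList(Map<String, Object> parent, String key) {
        Object value = parent.get(key);
        if (value != null && !(value instanceof List)) {
            List<Object> wrapped = new ArrayList<>();
            wrapped.add(value);
            parent.put(key, wrapped);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        attributes.put("height", "25");
        attributes.put("width", "75");

        Map<String, Object> products = new LinkedHashMap<>();
        products.put("productId", "1234-5678");
        products.put("attributes", attributes);

        Map<String, Object> data = new LinkedHashMap<>();
        data.put("products", products);

        // Innermost array first, then the outermost one,
        // mirroring the order of the shift steps above.
        wrapInList(products, "attributes");
        wrapInList(data, "products");

        System.out.println(data);
    }
}
```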

JSON file schema evaluation using json-schema-validator

I have a sample JSON file, and I have also come up with the schema below to validate it:
//[gcp_ingestion_parameters_schema.json]
{
...
"properties": {
"application": {
"$ref": "#/definitions/application"
},
"ingestion": {
"$ref": "#/definitions/ingestion"
}
},
"definitions": {
"applicaion": {
"type": "object",
"properties": {
"project_id": {
"type": "string"
},
"path_to_json_key_file": {
"type": "string"
}
},
"required": [
"project_id",
"path_to_json_key_file"
]
},
...
I am still not sure how to write the schema file. In my sample file, both the application and ingestion tags should occur exactly once, but fileingestion-mappings inside ingestion can occur one or more times.
I have written some Java code to validate my JSON file (the first file) against the provided JSON schema file,
but I get the following exception:
Exception in thread "main" com.github.fge.jsonschema.core.exceptions.ProcessingException: fatal: JSON Reference "#/definitions/application" cannot be resolved
level: "fatal"
schema: {"loadingURI":"#","pointer":"/properties/application"}
ref: "#/definitions/application"
Can someone with experience working with the above library answer the questions asked in this thread?
As suggested, you have a typo in your schema. It should be as below:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "http://json-schema.org/draft-07/schema#",
"title": "Core schema meta-schema",
"definitions": {
"schemaArray": {
"type": "array",
"minItems": 1,
"items": { "$ref": "#" }
},
"nonNegativeInteger": {
"type": "integer",
"minimum": 0
},
"nonNegativeIntegerDefault0": {
"allOf": [
{ "$ref": "#/definitions/nonNegativeInteger" },
{ "default": 0 }
]
},
"simpleTypes": {
"enum": [
"array",
"boolean",
"integer",
"null",
"number",
"object",
"string"
]
},
"stringArray": {
"type": "array",
"items": { "type": "string" },
"uniqueItems": true,
"default": []
}
},
"type": ["object", "boolean"],
"properties": {
"$id": {
"type": "string",
"format": "uri-reference"
},
"$schema": {
"type": "string",
"format": "uri"
},
"$ref": {
"type": "string",
"format": "uri-reference"
},
"$comment": {
"type": "string"
},
"title": {
"type": "string"
},
"description": {
"type": "string"
},
"default": true,
"readOnly": {
"type": "boolean",
"default": false
},
"examples": {
"type": "array",
"items": true
},
"multipleOf": {
"type": "number",
"exclusiveMinimum": 0
},
"maximum": {
"type": "number"
},
"exclusiveMaximum": {
"type": "number"
},
"minimum": {
"type": "number"
},
"exclusiveMinimum": {
"type": "number"
},
"maxLength": { "$ref": "#/definitions/nonNegativeInteger" },
"minLength": { "$ref": "#/definitions/nonNegativeIntegerDefault0" },
"pattern": {
"type": "string",
"format": "regex"
},
"additionalItems": { "$ref": "#" },
"items": {
"anyOf": [
{ "$ref": "#" },
{ "$ref": "#/definitions/schemaArray" }
],
"default": true
},
"maxItems": { "$ref": "#/definitions/nonNegativeInteger" },
"minItems": { "$ref": "#/definitions/nonNegativeIntegerDefault0" },
"uniqueItems": {
"type": "boolean",
"default": false
},
"contains": { "$ref": "#" },
"maxProperties": { "$ref": "#/definitions/nonNegativeInteger" },
"minProperties": { "$ref": "#/definitions/nonNegativeIntegerDefault0" },
"required": { "$ref": "#/definitions/stringArray" },
"additionalProperties": { "$ref": "#" },
"definitions": {
"type": "object",
"additionalProperties": { "$ref": "#" },
"default": {}
},
"properties": {
"type": "object",
"additionalProperties": { "$ref": "#" },
"default": {}
},
"patternProperties": {
"type": "object",
"additionalProperties": { "$ref": "#" },
"propertyNames": { "format": "regex" },
"default": {}
},
"dependencies": {
"type": "object",
"additionalProperties": {
"anyOf": [
{ "$ref": "#" },
{ "$ref": "#/definitions/stringArray" }
]
}
},
"propertyNames": { "$ref": "#" },
"const": true,
"enum": {
"type": "array",
"items": true,
"minItems": 1,
"uniqueItems": true
},
"type": {
"anyOf": [
{ "$ref": "#/definitions/simpleTypes" },
{
"type": "array",
"items": { "$ref": "#/definitions/simpleTypes" },
"minItems": 1,
"uniqueItems": true
}
]
},
"format": { "type": "string" },
"contentMediaType": { "type": "string" },
"contentEncoding": { "type": "string" },
"if": {"$ref": "#"},
"then": {"$ref": "#"},
"else": {"$ref": "#"},
"allOf": { "$ref": "#/definitions/schemaArray" },
"anyOf": { "$ref": "#/definitions/schemaArray" },
"oneOf": { "$ref": "#/definitions/schemaArray" },
"not": { "$ref": "#" }
},
"default": true
}
This works perfectly fine with the JSON you have provided.
You have a typo in applicaion; it should be application.
Change "definitions": { "applicaion": { to "definitions": { "application": {
Also, refer to this link to validate your schema: https://www.liquid-technologies.com/online-json-schema-validator.
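A typo like this can also be caught before the schema is handed to the validator. The sketch below is plain Java and does not use the json-schema-validator API. It assumes the schema has already been parsed into nested Map and List objects, it only understands local references of the form "#/definitions/<name>", and the class name LocalRefChecker is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LocalRefChecker {

    private static final String PREFIX = "#/definitions/";

    // Recursively gathers every "$ref" string value found in the schema.
    static void collectRefs(Object node, List<String> refs) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                if ("$ref".equals(e.getKey()) && e.getValue() instanceof String) {
                    refs.add((String) e.getValue());
                } else {
                    collectRefs(e.getValue(), refs);
                }
            }
        } else if (node instanceof List) {
            for (Object item : (List<?>) node) {
                collectRefs(item, refs);
            }
        }
    }

    // Returns the "#/definitions/..." references that have no matching key
    // under the top-level "definitions" object.
    static List<String> unresolved(Map<String, Object> schema) {
        Object defs = schema.get("definitions");
        Map<?, ?> definitions = defs instanceof Map ? (Map<?, ?>) defs : Map.of();
        List<String> refs = new ArrayList<>();
        collectRefs(schema, refs);
        List<String> missing = new ArrayList<>();
        for (String ref : refs) {
            if (ref.startsWith(PREFIX)
                    && !definitions.containsKey(ref.substring(PREFIX.length()))) {
                missing.add(ref);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("properties",
                Map.of("application", Map.of("$ref", "#/definitions/application")));
        // Misspelled key, as in the question.
        schema.put("definitions", Map.of("applicaion", Map.of("type", "object")));
        System.out.println(unresolved(schema)); // prints [#/definitions/application]
    }
}
```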

Jolt conditional spec

I want a conditional transformation where I need to add a property in output if the value of a specific field in input matches my condition. Below is my input and output required.
Input
{
"attr": [
{
"name": "first",
"validations": [
{
"type": "Required",
"value": true
}
]
},
{
"name": "last",
"validations": [
{
"type": "lenght",
"value": "10"
}
]
},
{
"name": "email",
"validations": [
{
"type": "min",
"value": 10
}
]
}
]
}
Output
{
"out": [
{
"name": "first",
"required": "yes"
},
{
"name": "last"
},
{
"name": "email"
}
]
}
So I am able to get as far as the condition, but inside the condition, & and @ are resolved relative to the input rather than the output. Can anybody help me out with the transformation? Below is the spec I have written so far.
[
{
"operation": "shift",
"spec": {
"attr": {
"*": {
"name": "out.&1.name",
"validations": {
"*": {
"type": {
"Required": {
"@(2,value)": "out.&1.req"
}
}
}
}
}
}
}
}
]
This spec does the transform.
[
{
"operation": "shift",
"spec": {
"attr": {
"*": {
"name": "out[&1].name",
"validations": {
"*": {
"type": {
"Required": {
"#yes": "out[&5].required"
}
}
}
}
}
}
}
}
]
However, I think you meant to grab the "value": true that is a sibling of "type": "Required", rather than have the output be "yes".
If so, swap in this bit.
"Required": {
"@(2,value)": "out[&5].required"
}
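For comparison, the same conditional rule can be written directly in Java. This is a minimal sketch, assuming the "attr" array has been parsed into a List of Map objects; the class name ConditionalMapper is hypothetical, and it emits the fixed value "yes" as in the requested output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConditionalMapper {

    // Copies "name" for every attribute and adds "required": "yes" only
    // when one of its validations has the type "Required".
    static List<Map<String, Object>> transform(List<Map<String, Object>> attrs) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> attr : attrs) {
            Map<String, Object> item = new LinkedHashMap<>();
            item.put("name", attr.get("name"));
            Object validations = attr.get("validations");
            if (validations instanceof List) {
                for (Object v : (List<?>) validations) {
                    if (v instanceof Map
                            && "Required".equals(((Map<?, ?>) v).get("type"))) {
                        item.put("required", "yes");
                    }
                }
            }
            out.add(item);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> requiredRule = new LinkedHashMap<>();
        requiredRule.put("type", "Required");
        requiredRule.put("value", true);
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("name", "first");
        first.put("validations", List.of(requiredRule));

        Map<String, Object> minRule = new LinkedHashMap<>();
        minRule.put("type", "min");
        minRule.put("value", 10);
        Map<String, Object> email = new LinkedHashMap<>();
        email.put("name", "email");
        email.put("validations", List.of(minRule));

        System.out.println(transform(List.of(first, email)));
    }
}
```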

How to implement an autocomplete search field (suggestor) with an existing ElasticSearch index?

The ES index consists of 2 types that are implicitly mapped (default mapping). One type is "person" or an author, the 2nd type is "document".
The index has some 500k entries.
What I have to do is: implement an autocomplete (suggestions) functionality where only the fields "title", "classification" (document) and "name" (author) are relevant for the suggestions shown to the user.
Could it be done without changing the 500k docs in the index?
I found some tutorials that suggest preparing a specific mapping and also altering the documents (this I want to avoid if possible), but I am new to this and not sure how to go about this problem.
Below is the JSON for the index, and how the documents look:
//a Document
{
"rawsource": "Phys.Rev. D67 (2003) 084031",
"pubyear": 2003,
"citedFrom": 19,
"topics": [
{
"name": "General Relativity and Quantum Cosmology"
}
],
"cited": [
{
"ref": 0,
"id": "PN132433"
},
{
"ref": 1,
"id": "PN206900"
}
],
"id": "PN120001",
"collection": "PN",
"source": "Phys Rev D",
"classification": "Physics",
"title": "Observables in causal set cosmology",
"url": "http://arxiv.org/abs/gr-qc/0210061",
"authors": [
{
"name": "Brightwell, Graham"
},
{
"name": "Dowker, H. Fay"
},
{
"name": "Garcia, Raquel S."
},
{
"name": "Henson, Joe"
},
{
"name": "Sorkin, Rafael D."
}
]
}
//a Person (author)
{
"name": "Terasawa, M.",
"documents": [
{
"citedFrom": 0,
"id": "PN039187"
}
],
"coAuthors": [
{
"name": "Famiano, M. A.",
"count": "1"
},
{
"name": "Boyd, R. N.",
"count": "1"
}
],
"topics": [
{
"name": "Astrophysics",
"count": "1"
}
]
}
//the mapping (implicit/default)
{
"dlsnew": {
"aliases": {
},
"mappings": {
"person": {
"properties": {
"coAuthors": {
"properties": {
"count": {
"type": "string"
},
"name": {
"type": "string"
}
}
},
"documents": {
"properties": {
"citedFrom": {
"type": "long"
},
"id": {
"type": "string"
}
}
},
"name": {
"type": "string"
},
"referenced": {
"properties": {
"count": {
"type": "string"
},
"id": {
"type": "string"
}
}
},
"topics": {
"properties": {
"count": {
"type": "string"
},
"name": {
"type": "string"
}
}
}
}
},
"document": {
"properties": {
"abstract": {
"type": "string"
},
"authors": {
"properties": {
"name": {
"type": "string"
}
}
},
"cited": {
"properties": {
"id": {
"type": "string"
},
"ref": {
"type": "long"
}
}
},
"citedFrom": {
"type": "long"
},
"classification": {
"type": "string"
},
"collection": {
"type": "string"
},
"id": {
"type": "string"
},
"pubyear": {
"type": "long"
},
"rawsource": {
"type": "string"
},
"source": {
"type": "string"
},
"title": {
"type": "string"
},
"topics": {
"properties": {
"name": {
"type": "string"
}
}
},
"url": {
"type": "string"
}
}
}
},
"settings": {
"index": {
"creation_date": "1454247029258",
"number_of_shards": "5",
"uuid": "k_CyQaxwSAaae67wW98HyQ",
"version": {
"created": "1050299"
},
"number_of_replicas": "1"
}
},
"warmers": {
}
}
}
The implementation is to be done using Java and the Vaadin framework (this is not relevant at this point, but examples in Java/Vaadin will be most welcome).
Thanks.
So, I think I solved my problem on the Elasticsearch side, or at least to a good enough extent for me and the task at hand. I followed this ruby example.
I had to re-index all documents to accommodate the new settings for my index and to change my mapping explicitly.
The key is in defining proper analyzers and an edgeNGram filter in this case, like so:
"settings": {
"index": {
"analysis": {
"filter": {
"def_ngram_filter": {
"min_gram": "1",
"side": "front",
"type": "edgeNGram",
"max_gram": "16"
}
},
"analyzer": {
"def_search_analyzer": {
"filter": [
"lowercase",
"asciifolding"
],
"type": "custom",
"tokenizer": "def_tokenizer"
},
"def_ngram_analyzer": {
"filter": [
"lowercase",
"asciifolding",
"def_ngram_filter"
],
"type": "custom",
"tokenizer": "def_tokenizer"
},
"def_shingle_analyzer": {
"filter": [
"shingle",
"lowercase",
"asciifolding"
],
"type": "custom",
"tokenizer": "def_tokenizer"
},
"def_default_analyzer": {
"filter": [
"lowercase",
"asciifolding"
],
"type": "custom",
"tokenizer": "def_tokenizer"
}
},
"tokenizer": {
"def_tokenizer": {
"type": "whitespace"
}
}
}
}
}
and then use these in the mapping for the fields to be searched, like so:
"mappings": {
"person": {
"properties": {
"coAuthors": {
"properties": {
"count": {
"type": "string"
},
"name": {
"type": "string"
}
}
},
"documents": {
"properties": {
"citedFrom": {
"type": "long"
},
"id": {
"type": "string"
}
}
},
"name": {
"type": "string",
"analyzer": "def_default_analyzer",
"fields": {
"ngrams": {
"type": "string",
"index_analyzer": "def_ngram_analyzer",
"search_analyzer": "def_search_analyzer"
},
"shingles": {
"type": "string",
"analyzer": "def_shingle_analyzer"
},
"stemmed": {
"type": "string",
"analyzer": "def_snowball_analyzer"
}
}
},
"referenced": {
"properties": {
"count": {
"type": "string"
},
"id": {
"type": "string"
}
}
},
"topics": {
"properties": {
"count": {
"type": "string"
},
"name": {
"type": "string"
}
}
}
}
},
"document": {
"properties": {
"abstract": {
"type": "string"
},
"authors": {
"properties": {
"name": {
"type": "string",
"analyzer": "def_default_analyzer",
"fields": {
"ngrams": {
"type": "string",
"index_analyzer": "def_ngram_analyzer",
"search_analyzer": "def_search_analyzer"
},
"shingles": {
"type": "string",
"analyzer": "def_shingle_analyzer"
},
"stemmed": {
"type": "string",
"analyzer": "def_snowball_analyzer"
}
}
}
}
},
"cited": {
"properties": {
"id": {
"type": "string"
},
"ref": {
"type": "long"
}
}
},
"citedFrom": {
"type": "long"
},
"classification": {
"type": "string"
},
"collection": {
"type": "string"
},
"id": {
"type": "string"
},
"pubyear": {
"type": "long"
},
"rawsource": {
"type": "string"
},
"source": {
"type": "string"
},
"title": {
"type": "string",
"analyzer": "def_default_analyzer",
"fields": {
"ngrams": {
"type": "string",
"index_analyzer": "def_ngram_analyzer",
"search_analyzer": "def_search_analyzer"
},
"shingles": {
"type": "string",
"analyzer": "def_shingle_analyzer"
},
"stemmed": {
"type": "string",
"analyzer": "def_snowball_analyzer"
}
}
},
"topics": {
"properties": {
"name": {
"type": "string",
"analyzer": "def_default_analyzer",
"fields": {
"ngrams": {
"type": "string",
"index_analyzer": "def_ngram_analyzer",
"search_analyzer": "def_search_analyzer"
},
"shingles": {
"type": "string",
"analyzer": "def_shingle_analyzer"
},
"stemmed": {
"type": "string",
"analyzer": "def_snowball_analyzer"
}
}
}
}
},
"url": {
"type": "string"
}
}
}
}
Then querying the index with the following works as expected:
curl -XGET "http://localhost:9200/_search" -d'
{
"size": 5,
"query": {
"multi_match": {
"query": "physics",
"type": "most_fields",
"fields": [
"document.title^10",
"document.title.shingles^2",
"document.title.ngrams",
"person.name^10",
"person.name.shingles^2",
"person.name.ngrams",
"document.topics.name^10",
"document.topics.name.shingles^2",
"document.topics.name.ngrams"
],
"operator": "and"
}
}
}'
Hope this will help someone. It is probably not the best example, as I am a complete noob at this, but it worked for me.
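To make the effect of the edgeNGram filter more concrete, here is a small plain-Java illustration (not Elasticsearch code) of the front-anchored prefixes that a filter with min_gram 1 and max_gram 16 would be expected to produce for a single, already lowercased token. The class name EdgeNGrams is hypothetical, and the sketch works on Java chars rather than on whatever unit the real filter uses.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGrams {

    // Returns the prefixes of the token whose length lies between minGram
    // and maxGram (inclusive), shortest first.
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("physics", 1, 16));
        // prints [p, ph, phy, phys, physi, physic, physics]
    }
}
```

Because every prefix is indexed in the ngrams sub-field, a partially typed word such as "phys" can match at search time, which is why the search analyzer in the settings above does not apply the n-gram filter again.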
There exist different Autocomplete components for Vaadin.
Have a look at this link.
Depending on which Add-On you choose, the databinding is done differently, but you have to "connect" it to your index.
