Trouble encoding avro enum with null default - java

org.apache.avro.UnresolvedUnionException: Not in union ["null",{"type":"enum","name":"document_change_type","namespace":"document","symbols":["create","update","delete"]}]: create
I am passing in the string "create" for this field, and it is throwing the above exception.
"create" is one of the 3 acceptable values for the enum, so what is causing the exception?

Suppose your Avro schema looks like this:
{
  "type" : "record",
  "namespace" : "document",
  "name" : "document_details",
  "fields" : [
    { "name" : "documentName", "type" : "string" },
    { "name" : "documentChange",
      "type" : [ "null",
        { "type" : "enum",
          "namespace" : "document",
          "name" : "documentChangeType",
          "symbols" : [ "create", "update", "delete" ]
        }
      ]
    }
  ]
}
You can create the record for this schema in your code as follows:
GenericRecord documentDetailsRecord = new GenericData.Record(schema);
// The enum schema is the second branch (index 1) of the ["null", enum] union
GenericEnumSymbol enumSymbol = new GenericData.EnumSymbol(
        schema.getField("documentChange").schema().getTypes().get(1), "create");
documentDetailsRecord.put("documentName", "someDocumentName");
documentDetailsRecord.put("documentChange", enumSymbol);
You can get the list of schemas for all the branches of a union field as follows:
schema.getField(<unionFieldName>).schema().getTypes()
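If you would rather not hard-code the union branch index, a minimal sketch (assuming the field is a ["null", enum] union as above; the helper name is hypothetical) is to look the enum branch up by type:
import org.apache.avro.Schema;

// Hypothetical helper: find the enum branch of a union schema by type,
// instead of assuming it sits at index 1.
static Schema enumBranch(Schema unionSchema) {
    for (Schema branch : unionSchema.getTypes()) {
        if (branch.getType() == Schema.Type.ENUM) {
            return branch;
        }
    }
    throw new IllegalArgumentException("union has no enum branch");
}

// Usage:
Schema documentChangeEnum = enumBranch(schema.getField("documentChange").schema());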

Related

How to map two datasets in Spark Java

Hi, I'm reading data from MongoDB into a Spark application.
My MongoDB contains 2 collections.
One is profile_data (the actual data with field names), which holds all the input data, including some unique fields:
{
  "MessageStatus" : 2,
  "Origin" : 1,
  "_id" : ObjectId("596340fe8b0fa35d2880db1a"),
  "accerlation" : 19.4,
  "cylinders" : 4,
  "displacement" : 119,
  "file_id" : ObjectId("59633e48b760e7c8071a6c1c"),
  "horsepower" : 82,
  "modelyear" : 82,
  "modified_date" : ISODate("2017-07-10T08:47:01.641Z"),
  "mpg" : 31,
  "snet_id" : "new_project",
  "unique_id" : "784",
  "username" : "chevy s-10",
  "weight" : 2720
}
And another collection is predictive_model_details, which holds the ML model details (model name, feature fields, and prediction field), just like metadata:
{
  "_id" : ObjectId("56b4351be4b064bb19a90324"),
  "algorithm_id" : "55d717a53d9e22022ff2a1e9",
  "algorithm_name" : "K- Nearest Neighbours (IBK)",
  "client_id" : "562e1d51b760d0e408151b91",
  "feature_fields" : [
    { "name" : "Origin", "type" : "int" },
    { "name" : "accerlation", "type" : "Double" },
    { "name" : "displacement", "type" : "Int" },
    { "name" : "horsepower", "type" : "Int" },
    { "name" : "modelyear", "type" : "Int" }
  ],
  "makeActiveStatus" : "0",
  "model_name" : "test1",
  "parameter_type" : "system_defined",
  "parameters" : [
    { "symbol" : "-K", "value" : "1" }
  ],
  "predictor" : {
    "name" : "mpg",
    "type" : "Int"
  },
  "result_exists" : true,
  "snet_id" : "new_project"
}
So I've created 2 Datasets in Spark for the two collections in MongoDB. Now I want to map these 2 Datasets so that the feature fields and the prediction field come together.
The common field between the 2 Datasets is snet_id.
Could anyone please help?
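A minimal sketch of one way to do this (assuming both collections are loaded as Dataset&lt;Row&gt;, for example with the MongoDB Spark connector, and that joining on the common field snet_id is what is meant by mapping the two Datasets together; the method and variable names are hypothetical):
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Join the two Datasets on the common field snet_id.
// profileData and modelDetails are assumed to be loaded from the
// profile_data and predictive_model_details collections.
static Dataset<Row> joinOnSnetId(Dataset<Row> profileData, Dataset<Row> modelDetails) {
    return profileData.join(
            modelDetails,
            profileData.col("snet_id").equalTo(modelDetails.col("snet_id")));
}
From the joined Dataset you could then select the feature fields and the prediction field listed in predictive_model_details.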

How do I GroupBy in Spring Data Mongodb without Aggregation?

I have data like below, and I want to group that data by type. I'm using spring-data-mongodb.
[
  {
    "_id" : ObjectId("58a5518aace6132a88309d98"),
    "type" : "SMS"
  },
  {
    "_id" : ObjectId("58a5518bace6132a88309d99"),
    "type" : "PUSH_NOTIFICATION"
  },
  {
    "_id" : ObjectId("58a5519aace6132a0094d7df"),
    "type" : "SMS"
  },
  {
    "_id" : ObjectId("58a5519aace6132a0094d7e0"),
    "type" : "PUSH_NOTIFICATION"
  }
]
I'm using this method, and it doesn't work:
GroupByResults<Queuing> results = mongoTemplate.group("queuing",
GroupBy.key("type"), Queuing.class);
Does anyone know the best and clearest way to do this grouping using spring-data-mongodb?
Thanks.
This is the correct syntax for the group operation:
GroupByResults<Queuing> results = mongoTemplate.group("queuing",
GroupBy.key("type").initialDocument("{}").reduceFunction("function(doc, prev) {}"),
Queuing.class);
More information here http://docs.spring.io/spring-data/mongodb/docs/current/reference/html/#mongo.group.example
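If you actually need something computed per group (the empty reduce function above just returns one document per key), a sketch that counts documents per type might look like this; the count accumulator field is an assumption, not part of the original question:
GroupByResults<Queuing> results = mongoTemplate.group("queuing",
        GroupBy.key("type")
               .initialDocument("{ count: 0 }")
               .reduceFunction("function(doc, prev) { prev.count += 1; }"),
        Queuing.class);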

Spring Data Tool Suite - Bulk Insert of objects

I am developing an API that creates a list of Questions, and I would like to check whether Spring Data has any native capability that supports bulk insert, or whether I have to create a custom query using the @Query annotation.
I have referred to Spring Data MongoDB support for bulk insert/save, and I would like to check whether a unique ObjectId will still be generated through a bulk insert/save.
Here is a sample of the definition I am expecting, where each question is differentiated by a unique id:
"questions" : [
  {
    "id" : "01-QuestionId",
    "type" : "multiple",
    "question" : "What is your Gender?",
    "options" : [
      { "key" : "a", "value" : "Male" },
      { "key" : "b", "value" : "Female" }
    ],
    "survey" : {
      "id" : "123",
      "name" : "Test1",
      "description" : "First Survey"
    }
  },
  {
    "id" : "02-QuestionId",
    "type" : "multiple",
    "question" : "What is your income?",
    "options" : [
      { "key" : "a", "value" : "1000" },
      { "key" : "b", "value" : "2000" }
    ],
    "survey" : {
      "id" : "123",
      "name" : "Test1",
      "description" : "First Survey"
    }
  }
]
Thanks all!
Robin
Found out after deeper research into Spring Data.
We can just use the save() or insert() methods from the MongoRepository interface.
For example:
final List<Question> savedQuestions = questionRepository.save(questions);
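On the ObjectId part of the question: when the id field is left null, an id is generated for each document on insert, including bulk inserts. A minimal sketch, assuming a Question entity and repository along these lines:
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.repository.MongoRepository;

public class Question {
    @Id
    private String id;     // left null; populated with a generated ObjectId on insert
    private String type;
    private String question;
    // other fields, getters and setters omitted for brevity
}

public interface QuestionRepository extends MongoRepository<Question, String> {
}

// insert() writes the whole list in one batch and still generates ids:
final List<Question> saved = questionRepository.insert(questions);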

Design an Avro schema based on my JSON document

I have been reading a lot about Apache Avro these days and I am more inclined towards using it instead of JSON. Currently, what we are doing is serializing the JSON document using Jackson and then writing that serialized JSON document into Cassandra for each row key/user id.
Then we have a REST service that reads the whole JSON document using the row key and then deserialize it and use it further.
Now, from reading on the web, it looks like Avro requires a schema beforehand... I am not sure how to come up with a schema in Apache Avro for my JSON document.
Below is my JSON document that I am writing into Cassandra after serializing it using Jackson. Now how to come up with an Avro schema for the below JSON?
{
  "lv" : [ {
    "v" : {
      "site-id" : 0,
      "categories" : {
        "321" : {
          "price_score" : "0.2",
          "confidence_score" : "0.5"
        },
        "123" : {
          "price_score" : "0.4",
          "confidence_score" : "0.2"
        }
      },
      "price-score" : 0.5,
      "confidence-score" : 0.2
    }
  } ],
  "lmd" : 1379231624261
}
Can anyone provide a simple example of how to come up with an Avro schema based on my above JSON document? Thanks for the help.
The simplest way to define an Avro schema like the one you have outlined above is to start from what they call IDL. IDL is a higher-level language than the Avro schema (JSON) and makes writing Avro schemas much more straightforward.
See Avro IDL here: http://avro.apache.org/docs/current/idl.html
To define what you've got above in JSON, you're going to define a set of records in IDL that look like this:
@namespace("com.sample")
protocol sample {
  record Category {
    union { null, string } price_score = null;
    union { null, string } confidence_score = null;
  }

  record vObject {
    int site_id = 0;
    union { null, map<Category> } categories = null;
    union { null, float } price_score = null;
    union { null, float } confidence_score = null;
  }

  record SampleObject {
    union { null, array<vObject> } lv = null;
    long lmd = -1;
  }
}
When you run the compiler tool (as listed on the website above), you will get an Avro schema generated like so:
{
  "protocol" : "sample",
  "namespace" : "com.sample",
  "types" : [ {
    "type" : "record",
    "name" : "Category",
    "fields" : [ {
      "name" : "price_score",
      "type" : [ "null", "string" ],
      "default" : null
    }, {
      "name" : "confidence_score",
      "type" : [ "null", "string" ],
      "default" : null
    } ]
  }, {
    "type" : "record",
    "name" : "vObject",
    "fields" : [ {
      "name" : "site_id",
      "type" : "int",
      "default" : 0
    }, {
      "name" : "categories",
      "type" : [ "null", { "type" : "map", "values" : "Category" } ],
      "default" : null
    }, {
      "name" : "price_score",
      "type" : [ "null", "float" ],
      "default" : null
    }, {
      "name" : "confidence_score",
      "type" : [ "null", "float" ],
      "default" : null
    } ]
  }, {
    "type" : "record",
    "name" : "SampleObject",
    "fields" : [ {
      "name" : "lv",
      "type" : [ "null", { "type" : "array", "items" : "vObject" } ],
      "default" : null
    }, {
      "name" : "lmd",
      "type" : "long",
      "default" : -1
    } ]
  } ],
  "messages" : { }
}
Using whatever language you'd like, you can now generate a set of objects, and the default toString operation is to output in JSON form as you have above. However, the true power of Avro comes with its compact binary encoding and compression capabilities. To see the real benefits of Avro, you should write it out in the Avro binary format.
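As a minimal sketch of that last point (assuming the schema above has been parsed into a Schema object and record is a GenericRecord built against it):
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

// Serialize a GenericRecord to Avro binary bytes, e.g. before writing to Cassandra
static byte[] toAvroBinary(Schema schema, GenericRecord record) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    writer.write(record, encoder);
    encoder.flush();
    return out.toByteArray();
}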
Hope this helps!

Mongo query inside Hashmap with unknown hash key

Platform: MongoDB, Spring, SpringDataMongoDB
I have a collection called "Encounter" with the structure below:
Encounter:
{
  "_id" : "49a0515b-e020-4e0d-aa6c-6f96bb867288",
  "_class" : "com.keype.hawk.health.emr.api.transaction.model.Encounter",
  "encounterTypeId" : "c4f657f0-015d-4b02-a216-f3beba2c64be",
  "visitId" : "8b4c48c6-d969-4926-8b8f-05d2f58491ae",
  "status" : "ACTIVE",
  "form" : {
    "_id" : "be3cddc5-4cec-4ce5-8592-72f1d7a0f093",
    "formCode" : "CBC",
    "fields" : {
      "dc" : {
        "label" : "DC",
        "name" : "tc"
      },
      "tc" : {
        "label" : "TC",
        "name" : "tc"
      },
      "notes" : {
        "label" : "Notes",
        "name" : "notes"
      }
    },
    "notes" : "Blood Test",
    "dateCreated" : NumberLong("1376916746564"),
    "dateModified" : NumberLong("1376916746564"),
    "staffCreated" : 10013,
    "staffModified" : 10013
  }
}
The element "fields" is represented using a Java HashMap as:
protected LinkedHashMap<String, Field> fields;
The key of the hashmap is not fixed, but generated at run time.
How do I query to get all documents in the collection where "label" = "TC"?
It's not possible to query like db.encounter.find({'form.fields.dc.label':'TC'}) because the element name 'dc' is not known. I want to skip that position and execute the query, something like:
db.encounter.find({'form.fields.*.label':'TC'});
Any ideas?
Also, how do I best use indexes in this scenario?
This would work if fields were an array, with your key as part of each sub-document instead:
"fields" : [
{ "key" : "dc",
"label" : "DC",
"name" : "dc"
},
{ "key" : "tc",
"label" : "TC",
"name" : "tc"
}
]
In this case, you could simply query for any sub-element inside the array:
db.coll.find({"form.fields.label":"TC"})
Not sure how you would integrate that with Spring, but perhaps the idea helps? As far as indexes are concerned, you can index into the array, which gives you a multi-key index. Basically, the index will have a separate entry pointing to the document for each array value.
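For example, a multikey index on the label values inside the array could be created like this (a sketch using Spring Data's MongoTemplate; the collection name "encounter" is an assumption):
import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.index.Index;

// Creates a multikey index: one index entry per element of form.fields
mongoTemplate.indexOps("encounter")
             .ensureIndex(new Index("form.fields.label", Sort.Direction.ASC));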
