I am developing an API that creates a list of Questions, and would like to know whether STS has any native capability that supports bulk insert, or whether I have to write a custom query using the @Query annotation.
I have read that Spring Data MongoDB supports bulk insert/save; will a unique ObjectId still be generated for each document when I use bulk insert/save?
Here is a sample of the definition I am expecting, where each question is distinguished by a unique id.
"questions": [
{
"id" : "01-QuestionId",
"type" : "multiple",
"question" : "What is your Gender?",
"options" : [
{
"key" : "a",
"value" : "Male"
},
{
"key" : "b",
"value" : "Female"
}
],
"survey":{
"id": "123",
"name": "Test1",
"description": "First Survey"
}
},
{
"id" : "02-QuestionId",
"type" : "multiple",
"question" : "What is your income?",
"options" : [
{
"key" : "a",
"value" : "1000"
},
{
"key" : "b",
"value" : "2000"
}
],
"survey":{
"id": "123",
"name": "Test1",
"description": "First Survey"
}
}
]
Thanks all!
Robin
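I found the answer after some deeper research into Spring Data.
We can simply use the save() or insert() methods of the MongoRepository interface, both of which accept a collection of entities. (In Spring Data 2.x and later, the overload of save() that takes a collection was renamed saveAll().)
For example: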
final List savedQuestions = questionRepository.save(questions);
I am planning to implement CAS with MongoDB authentication, using an existing MongoDB collection that does not follow the document schema expected by the Apereo documentation.
According to the documentation, accounts are expected to be found in the collection in this form:
{
"username": "casuser",
"password": "34598dfkjdjk3487jfdkh874395",
"first_name": "john",
"last_name": "smith"
}
Is there a way to customize the username and password attributes so that a document like the one below can be used?
{
"_id" : "000debf7-ee17-42ec-b267-9028b721cd57",
"firstName" : "Steve Ward",
"lastName" : "Steve Ward",
"emailAddress1" : "test@inter.com",
"status" : "active",
"users" : [
{
"userName" : "weldoneinc",
"password" : "KJH8u3lRYvm82EbiGjKZs7exPbY=",
"createUser" : "TEST",
"createDate" : ISODate("2020-05-22T15:28:54.439+0000"),
"updateUser" : "TEST",
"updateDate" : ISODate("2020-05-22T15:28:54.439+0000")
}
]
}
I have the following MongoDB document structure:
db.menus.findOne()
{
"_id" : ObjectId("5cf25412326c3f4f26df039b"),
"restaurantId" : "301728",
"items" : [
{
"itemId" : "CEBM4H41JR",
"name" : "Crun Chicken",
"imageUrl" : "",
"price" : 572,
"attributes" : [
"Tasty",
"Spicy"
]
},
{
"itemId" : "53Q0XS3HPR",
"name" : "Devils Chicken",
"imageUrl" : "",
"price" : 595,
"attributes" : [
"Gravy",
"Salty"
]
}
]
}
I am trying to write a query to get all the menus based on the "attributes" field under "items" in the document.
The following query, which matches on the "name" of an item, works and returns a result:
db.menus.find({ 'items' : {$elemMatch : {'name' : {$regex : "Chicken Thali", $options: 'i' }}}}).pretty()
I tried the following to match on "attributes" instead, but it does not work:
db.menus.find({'items' : {$elemMatch : {'attributes' : {$all : [{$regex : "Tasty", $options: 'i' }]}}}})
How do I get the list, and how can I write this query for a MongoRepository in a Spring Boot application?
Further, based on the restaurantIds obtained, I have to query the restaurants collection to find all the matching restaurants. Its documents have the following structure:
{
"_id" : ObjectId("5cf2540e326c3f4f26de93dd"),
"restaurantId" : "301728",
"name" : "Desire Foods",
"imageUrl" : "https://b.zmtcdn.com/data/pictures/8/301728/d690ccb500d746530f56e1d637949da2_featured_v2.jpg",
"latitude" : 28.4900591,
"longitude" : 77.3066401,
"attributes" : [
"Chinese",
" Fast Food",
" Bakery"
],
"opensAt" : "09:30",
"closesAt" : "22:30"
}
Is the whole operation possible in a single query?
I think you can modify your query to use $in instead of $all.
To achieve your intended result, you can try:
db.collection.aggregate([
{
"$match": {
"items": {
"$elemMatch": {
"attributes": {
"$in": [
"Tasty"
]
}
}
}
}
},
{
"$lookup": {
"from": "restaurant",
"localField": "restaurantId",
"foreignField": "restaurantId",
"as": "restaurants"
}
},
{
"$unwind": "$restaurants"
},
{
"$replaceRoot": { "newRoot": "$restaurants" }
}
])
Use $match at the appropriate stages, as early as possible, to limit the documents pulled into memory.
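To make clear what each stage of the pipeline above is meant to do, here is a rough plain-Python sketch of the same logic over in-memory lists. It is an illustration only, not MongoDB code: the function name restaurants_serving and the trimmed sample data are made up for this example, and the case-insensitive comparison is an assumption borrowed from the $options: 'i' in the question (a plain string inside $in is an exact match).

```python
def restaurants_serving(menus, restaurants, attribute):
    """Mimic $match -> $lookup -> $unwind -> $replaceRoot on plain dicts."""
    wanted = attribute.lower()

    def item_matches(item):
        # Case-insensitive membership test on the item's attributes array
        return any(a.lower() == wanted for a in item.get("attributes", []))

    results = []
    for menu in menus:
        # $match with $elemMatch: keep the menu if at least one item matches
        if not any(item_matches(item) for item in menu.get("items", [])):
            continue
        # $lookup + $unwind + $replaceRoot: emit every restaurant document
        # that shares the menu's restaurantId
        for restaurant in restaurants:
            if restaurant["restaurantId"] == menu["restaurantId"]:
                results.append(restaurant)
    return results


# Trimmed copies of the documents from the question
menus = [{
    "restaurantId": "301728",
    "items": [
        {"name": "Crun Chicken", "attributes": ["Tasty", "Spicy"]},
        {"name": "Devils Chicken", "attributes": ["Gravy", "Salty"]},
    ],
}]
restaurants = [{"restaurantId": "301728", "name": "Desire Foods"}]

for r in restaurants_serving(menus, restaurants, "tasty"):
    print(r["name"])
```

Doing the join in application code like this costs an extra round trip, which is why the single aggregation with $lookup is usually preferable when both collections live in the same database.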
In my project we use Flink to process log data and then send the data to Elasticsearch. However, I found that ES did not recognize my JSON object, only basic data types, so I had to convert the JSON object into a string. As a result, when I look at the log data in Elasticsearch, the format is really hard to read.
"hits" : {
"total" : 10,
"max_score" : 1.0,
"hits" : [
{
"_index" : "wyh_dye_test",
"_type" : "nested",
"_id" : "gzlvM3EBRgA6CE7yDw8l",
"_score" : 1.0,
"_source" : {
"id" : "id",
"module" : "wyh_key",
"content" : """{"map":{"wyh_key":"wyh_value","user_key":"user_value","wqq_key":"wqq_value","hello_key":"hello_value"}}"""
}
}
This is my Kibana search result; as you can see, the content field is really hard to read.
You can update the index mapping and then put the data into the corresponding fields.
PUT xxx_index/_mapping/xxx_type
{
"properties": {
"wyh_key": {
"type": "keyword"
},
"user_key": {
"type": "keyword"
},
"wqq_key": {
"type": "keyword"
},
"hello_key": {
"type": "keyword"
}
}
}
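Putting the data "into the corresponding fields" means parsing the JSON string before the document is sent to Elasticsearch. A minimal sketch of that step follows, assuming the keys under "map" should become top-level fields as in the mapping above; the function name expand_content is hypothetical, and in the real job this logic would live in the Flink sink rather than in Python.

```python
import json


def expand_content(source):
    """Return a copy of the document with its JSON string parsed into fields."""
    doc = dict(source)
    parsed = json.loads(doc.pop("content"))
    # The sample wraps the real keys in a "map" object; lift them to the top
    # level (assumption: this matches the mapping shown above)
    doc.update(parsed.get("map", parsed))
    return doc


source = {
    "id": "id",
    "module": "wyh_key",
    "content": '{"map":{"wyh_key":"wyh_value","user_key":"user_value",'
               '"wqq_key":"wqq_value","hello_key":"hello_value"}}',
}
doc = expand_content(source)
print(json.dumps(doc, indent=2))
```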
I have been reading a lot about Apache Avro these days and I am inclined to use it instead of JSON. Currently, we serialize the JSON document using Jackson and then write the serialized JSON document into Cassandra for each row key/user id.
Then we have a REST service that reads the whole JSON document using the row key and then deserialize it and use it further.
Now while reading on the web it looks like, Avro requires a schema beforehand... I am not sure how to come up with a schema in Apache Avro for my JSON document.
Below is my JSON document that I am writing into Cassandra after serializing it using Jackson. Now how to come up with an Avro schema for the below JSON?
{
"lv" : [ {
"v" : {
"site-id" : 0,
"categories" : {
"321" : {
"price_score" : "0.2",
"confidence_score" : "0.5"
},
"123" : {
"price_score" : "0.4",
"confidence_score" : "0.2"
}
},
"price-score" : 0.5,
"confidence-score" : 0.2
}
} ],
"lmd" : 1379231624261
}
Can anyone provide a simple example of how to come up with an Avro schema based on my JSON document above? Thanks for the help.
The simplest way to define an Avro schema for what you have outlined above is to start from what they call IDL. IDL is a higher-level language than the Avro schema (JSON) and makes writing an Avro schema much more straightforward.
See avro IDL here: http://avro.apache.org/docs/current/idl.html
To define what you've got above in JSON, you're going to define a set of records in IDL that look like this:
@namespace("com.sample")
protocol sample {
record Category {
union {null, string} price_score = null;
union {null, string} confidence_score = null;
}
record vObject {
int site_id = 0;
union {null, map<Category>} categories = null;
union {null, float} price_score = null;
union {null, float} confidence_score = null;
}
record SampleObject {
union {null, array<vObject>} lv = null;
long lmd = -1;
}
}
When you run the compiler tool (as listed on that website above), you will get an avro schema generated like so:
{
"protocol" : "sample",
"namespace" : "com.sample",
"types" : [ {
"type" : "record",
"name" : "Category",
"fields" : [ {
"name" : "price_score",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "confidence_score",
"type" : [ "null", "string" ],
"default" : null
} ]
}, {
"type" : "record",
"name" : "vObject",
"fields" : [ {
"name" : "site_id",
"type" : "int",
"default" : 0
}, {
"name" : "categories",
"type" : [ "null", {
"type" : "map",
"values" : "Category"
} ],
"default" : null
}, {
"name" : "price_score",
"type" : [ "null", "float" ],
"default" : null
}, {
"name" : "confidence_score",
"type" : [ "null", "float" ],
"default" : null
} ]
}, {
"type" : "record",
"name" : "SampleObject",
"fields" : [ {
"name" : "lv",
"type" : [ "null", {
"type" : "array",
"items" : "vObject"
} ],
"default" : null
}, {
"name" : "lmd",
"type" : "long",
"default" : -1
} ]
} ],
"messages" : {
}
}
Using whatever language you'd like, you can now generate a set of objects, and the default "toString" operation outputs them in JSON form as you have above. However, the true power of Avro comes from its compression capabilities. You should really write the data out in Avro binary format to see the real benefits of Avro.
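One detail worth noting: Avro names may contain only letters, digits and underscores, which is why the schema above uses site_id and price_score where the original JSON has "site-id" and "price-score". Below is a small sketch of renaming the keys before handing the document to Avro. The helper name to_schema_names is made up for this example, it renames every dict key it sees (map keys such as "321" are unaffected only because they contain no hyphen), and it does not address the "v" wrapper, which the schema above does not model.

```python
import json


def to_schema_names(value):
    """Recursively replace '-' with '_' in dict keys."""
    if isinstance(value, dict):
        return {k.replace("-", "_"): to_schema_names(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_schema_names(v) for v in value]
    return value


# Shortened version of the JSON document from the question
raw = json.loads("""
{"lv": [{"v": {"site-id": 0,
               "categories": {"321": {"price_score": "0.2",
                                      "confidence_score": "0.5"}},
               "price-score": 0.5,
               "confidence-score": 0.2}}],
 "lmd": 1379231624261}
""")
converted = to_schema_names(raw)
print(sorted(converted["lv"][0]["v"].keys()))
```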
Hope this helps!
I am having trouble using ElasticSearch in my Java application.
To explain: I have a mapping that looks something like this:
{
"products": {
"properties": {
"id": {
"type": "long",
"ignore_malformed": false
},
"locations": {
"properties": {
"category": {
"type": "long",
"ignore_malformed": false
},
"subCategory": {
"type": "long",
"ignore_malformed": false
},
"order": {
"type": "long",
"ignore_malformed": false
}
}
},
...
So, as you can see, I receive a list of products, each of which has a list of locations. In my model, these locations are all the categories the product belongs to, which means a product can be in one or more categories. In each of these categories the product has an order, which is the position in which the client wants it shown.
For instance, a diamond product can be in first place in Jewelry but in third place in Woman (my examples are not very logical ^^).
So, when I click on Jewelry, I want to show these products ordered by the field locations.order for that specific category.
For the moment, when I search for all the products in a specific category, the response I receive from ElasticSearch looks something like this:
{"id":5331880,"locations":[{"category":5322606,"order":1},
{"category":5883712,"subCategory":null,"order":3},
{"category":5322605,"subCategory":6032961,"order":2},.......
Is it possible to sort these products by locations.order for the specific category I am searching for? For instance, if I am querying category 5322606, I want order 1 to be used for this product.
Thank you very much in advance!
Regards,
Olivier.
First a correction of terminology: in Elasticsearch, "parent/child" refers to completely separate docs, where the child doc points to the parent doc. Parent and children are stored on the same shard, but they can be updated independently.
With your example above, what you are trying to achieve can be done with nested docs.
Currently, your locations field is of type:"object". This means that the values in each location get flattened to look something like this:
{
"locations.category": [5322606, 5883712, 5322605],
"locations.subCategory": [6032961],
"locations.order": [1, 3, 2]
}
In other words, the "sub" fields get flattened into multi-value fields, which is of no use to you, because there is no correlation between category: 5322606 and order: 1.
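The loss of correlation is easy to reproduce outside Elasticsearch. The sketch below flattens the example locations the way an object field is conceptually indexed; it is only an illustration of the idea, not Elasticsearch's actual indexing code, and the function name flatten_object_field is made up.

```python
def flatten_object_field(name, objects):
    """Flatten a list of sub-objects into multi-value fields, skipping nulls."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            if value is not None:
                flat.setdefault(f"{name}.{key}", []).append(value)
    return flat


locations = [
    {"category": 5322606, "order": 1},
    {"category": 5883712, "subCategory": None, "order": 3},
    {"category": 5322605, "subCategory": 6032961, "order": 2},
]
flat = flatten_object_field("locations", locations)
print(flat)
```

Once flattened, nothing records which order value belonged to which category, so a query for category 5322606 cannot pick out order 1.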
However, if you change locations to be type:"nested" then internally it will index each location as a separate doc, meaning that each location can be queried independently, using the dedicated nested query and filter.
By default, the nested query will return a _score based upon how well each location matches, but in your case you want to return the highest value of the order field from any matching children. To do this, you'll need to use a custom_score query.
So let's start by creating the index with the appropriate mapping:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
"mappings" : {
"products" : {
"properties" : {
"locations" : {
"type" : "nested",
"properties" : {
"order" : {
"type" : "long"
},
"subCategory" : {
"type" : "long"
},
"category" : {
"type" : "long"
}
}
},
"id" : {
"type" : "long"
}
}
}
}
}
'
Then we index your example doc:
curl -XPOST 'http://127.0.0.1:9200/test/products?pretty=1' -d '
{
"locations" : [
{
"order" : 1,
"category" : 5322606
},
{
"order" : 3,
"subCategory" : null,
"category" : 5883712
},
{
"order" : 2,
"subCategory" : 6032961,
"category" : 5322605
}
],
"id" : 5331880
}
'
And now we can search for it using the queries we discussed above:
curl -XGET 'http://127.0.0.1:9200/test/products/_search?pretty=1' -d '
{
"query" : {
"nested" : {
"query" : {
"custom_score" : {
"script" : "doc[\u0027locations.order\u0027].value",
"query" : {
"constant_score" : {
"filter" : {
"and" : [
{
"term" : {
"category" : 5322605
}
},
{
"term" : {
"subCategory" : 6032961
}
}
]
}
}
}
}
},
"score_mode" : "max",
"path" : "locations"
}
}
}
'
Note: the single quotes within the script have been escaped as \u0027 to get around shell quoting. The script actually looks like this: "doc['locations.order'].value"
If you look at the _score from the results, you can see that it has used the order value from the matching location:
{
"hits" : {
"hits" : [
{
"_source" : {
"locations" : [
{
"order" : 1,
"category" : 5322606
},
{
"order" : 3,
"subCategory" : null,
"category" : 5883712
},
{
"order" : 2,
"subCategory" : 6032961,
"category" : 5322605
}
],
"id" : 5331880
},
"_score" : 2,
"_index" : "test",
"_id" : "cXTFUHlGTKi0hKAgUJFcBw",
"_type" : "products"
}
],
"max_score" : 2,
"total" : 1
},
"timed_out" : false,
"_shards" : {
"failed" : 0,
"successful" : 5,
"total" : 5
},
"took" : 9
}
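To see where the _score of 2 comes from, here is a plain-Python sketch of what the query above computes for one product: every nested location that passes the filter is scored with its own order value, and score_mode "max" keeps the highest of those scores. The function name nested_max_score is hypothetical and the code only imitates the scoring logic; it does not call Elasticsearch.

```python
def nested_max_score(product, category, sub_category=None):
    """Return the highest `order` among matching locations, or None if none match."""
    orders = [
        loc["order"]
        for loc in product["locations"]
        if loc["category"] == category
        and (sub_category is None or loc.get("subCategory") == sub_category)
    ]
    return max(orders) if orders else None


product = {
    "id": 5331880,
    "locations": [
        {"order": 1, "category": 5322606},
        {"order": 3, "subCategory": None, "category": 5883712},
        {"order": 2, "subCategory": 6032961, "category": 5322605},
    ],
}

# Same filter as the query above: category 5322605 and subCategory 6032961
print(nested_max_score(product, 5322605, 6032961))
```

Sorting a list of products by this value then gives the per-category ordering the question asks for.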
Just adding a more up-to-date version related to sorting parents by a child field.
We can query the parent doc type sorted by a child field (e.g. 'count') along the following lines:
https://gist.github.com/robinloxley1/7ea7c4f37a3413b1ca16