Bulk Import in CosmosDb with Stored Procedure and Partitioning - java

I've been using Azure's Cosmos DB for a while. Recently I did a bulk import using a stored procedure on one collection in my database, and that worked fine. Now I have to do the same in another collection, which uses partitioning. I searched the Azure code samples and modified my previous bulk insert function like this:
public void createMany(JSONArray aDocumentList, PartitionKey aPartitionKey) throws DocumentClientException {
    List<String> aList = new ArrayList<String>();
    for (int aIndex = 0; aIndex < aDocumentList.length(); aIndex++) {
        JSONObject aJsonObj = aDocumentList.getJSONObject(aIndex);
        aList.add(aJsonObj.toString());
    }
    String aSproc = getCollectionLink() + BULK_INSERTION_PROCEDURE;
    RequestOptions requestOptions = new RequestOptions();
    requestOptions.setPartitionKey(aPartitionKey);
    String result = documentClient.executeStoredProcedure(aSproc, requestOptions,
            new Object[] { aList }).getResponseAsString();
}
but this code gives me this error:
com.microsoft.azure.documentdb.DocumentClientException: Message: {"Errors":["Encountered exception while executing function. Exception = Error: {\"Errors\":[\"Requests originating from scripts cannot reference partition keys other than the one for which client request was submitted.\"]}\r\nStack trace: Error: {\"Errors\":[\"Requests originating from scripts cannot reference partition keys other than the one for which client request was submitted.\"]}\n at callback (bulkInsertionStoredProcedure.js:1:1749)\n at Anonymous function (bulkInsertionStoredProcedure.js:689:29)"]}
I'm not quite sure what that error actually means. Since the partition key is just a JSON key in the document, why would it be needed in the other documents as well? Do I need to append it (under the partitionKey key) to each of my documents? Could anyone please tell me what I'm missing here? I've searched the internet and haven't found anything useful that could make it work.

I've already answered this question here. The gist of it is that the documents you insert with your sproc must have a partition key value that matches the one you pass in the request options:
// ALL documents inserted must have a partitionKey value that matches
// the "aPartitionKey" value
requestOptions.setPartitionKey(aPartitionKey);
So if aPartitionKey == 123456, then all of the documents you insert with the sproc must belong to that partition. If the documents you want to bulk insert span multiple partitions, you will have to group them by partition key and run the sproc separately for each group, as sketched below.
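A minimal sketch of that grouping, reusing documentClient, getCollectionLink() and BULK_INSERTION_PROCEDURE from the question (plus java.util.Map/HashMap); the partitionKey field name is an assumption, so adjust it to your collection's actual partition key path:
public void createManyGrouped(JSONArray aDocumentList) throws DocumentClientException {
    // Group the document bodies by their partition key value
    Map<String, List<String>> aGroups = new HashMap<>();
    for (int aIndex = 0; aIndex < aDocumentList.length(); aIndex++) {
        JSONObject aJsonObj = aDocumentList.getJSONObject(aIndex);
        String aKeyValue = aJsonObj.getString("partitionKey"); // assumed field name
        aGroups.computeIfAbsent(aKeyValue, k -> new ArrayList<>()).add(aJsonObj.toString());
    }
    String aSproc = getCollectionLink() + BULK_INSERTION_PROCEDURE;
    // One sproc execution per partition key, each scoped to that single partition
    for (Map.Entry<String, List<String>> aEntry : aGroups.entrySet()) {
        RequestOptions aOptions = new RequestOptions();
        aOptions.setPartitionKey(new PartitionKey(aEntry.getKey()));
        documentClient.executeStoredProcedure(aSproc, aOptions, new Object[] { aEntry.getValue() });
    }
}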

Related

How to get Datastore entity id from com.google.datastore.v1.Entity

I have written code to fetch data from Google Datastore in my Google Cloud Dataflow program. I am able to fetch all fields of the entity except the Id field, which is auto-generated. I have tried using entity.getKey(), but I am getting null.
Below is my code snippet:
Datastore datastore = DataflowDatastoreService.getDatastoreObject(null, null, null);
Query.Builder queryBuilder = Query.newBuilder();
Filter filter1 = Filter.newBuilder()
        .setPropertyFilter(PropertyFilter.newBuilder()
                .setProperty(PropertyReference.newBuilder().setName("cId"))
                .setOp(PropertyFilter.Operator.EQUAL)
                .setValue(Value.newBuilder().setIntegerValue(1059438885900008L).build())
                .build())
        .build();
Filter filter2 = Filter.newBuilder()
        .setPropertyFilter(PropertyFilter.newBuilder()
                .setProperty(PropertyReference.newBuilder().setName("active"))
                .setOp(PropertyFilter.Operator.EQUAL)
                .setValue(Value.newBuilder().setBooleanValue(Boolean.TRUE).build())
                .build())
        .build();
Filter composeFilter = Filter.newBuilder()
        .setCompositeFilter(CompositeFilter.newBuilder()
                .addFilters(filter1)
                .setOp(Operator.AND)
                .addFilters(filter2)
                .build())
        .build();
queryBuilder.addKind(KindExpression.newBuilder().setName("MyMaster").build());
queryBuilder.setFilter(composeFilter).build();
RunQueryRequest request = DataflowDatastoreService.makeRequest(queryBuilder.build(), null);
RunQueryResponse response = datastore.runQuery(request);
QueryResultBatch batch = response.getBatch();
List<EntityResult> entityResults = batch.getEntityResultsList();
List<Entity> myEntities = new ArrayList<>();
for (EntityResult entityResult : entityResults) {
    myEntities.add(entityResult.getEntity());
}
Map<String, Value> entityMap = myEntities.get(0).getPropertiesMap();
In my code I am able to get all the fields via the entityMap keys, but I am not getting the key. Is there any other way I can fetch all the fields along with the Id?
Note: I'm not a Java user; this answer is based on Python experience.
Indeed, entities returned in a regular query result do not contain the entity key/ID, and attempting to obtain it afterwards is rather inefficient: you need to reach back to the datastore for each individual entity (not even looking at why that doesn't appear to be working for you).
If I need the entity keys/IDs, I'd instead use a keys-only query, obtaining the keys, from which I can easily get:
the key IDs, locally, without making actual datastore calls (in Python via key.id(); I don't know the Java equivalent, but see the sketch below)
the entities themselves via direct key lookup, which can be batched for efficiency.
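A rough Java sketch of such a keys-only query with the com.google.datastore.v1 classes already used in the question; the assumption here is that a keys-only query is expressed as a projection on the special __key__ property:
// Keys-only query: project only "__key__" (assumption about the v1 API)
Query.Builder keysOnlyQuery = Query.newBuilder();
keysOnlyQuery.addKind(KindExpression.newBuilder().setName("MyMaster").build());
keysOnlyQuery.addProjection(Projection.newBuilder()
        .setProperty(PropertyReference.newBuilder().setName("__key__"))
        .build());
RunQueryRequest keysRequest = DataflowDatastoreService.makeRequest(keysOnlyQuery.build(), null);
RunQueryResponse keysResponse = datastore.runQuery(keysRequest);
for (EntityResult er : keysResponse.getBatch().getEntityResultsList()) {
    // numeric auto-generated id, obtained locally from the returned key
    long id = er.getEntity().getKey().getPath(0).getId();
}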
entity.getKey().getPathList().get(0).getId()
This helped me achieve the result: getting the entity Id through the getKey() method.

No response with a query by ID on Azure DocumentDB

I'm currently facing very slow or no responses on a collection when looking up by ID. I have ~2 million documents in a partitioned collection. If I look up a document using the partitionKey and id, the response is immediate:
SELECT * FROM c WHERE c.partitionKey=123 AND c.id="20566-2"
If I try using only the id:
SELECT * FROM c WHERE c.id="20566-2"
the response never returns; the Java client seems frozen, and I get the same behavior using the Data Explorer in the Azure Portal. I also tried looking up by another field that isn't the id or the partitionKey, and that response always returns. When I run the select from the Java client I always set the flag to enable cross-partition queries.
The next thing to try is avoiding the character "-" in the ID, to test whether this character blocks the query (although I didn't find anything about it in the documentation).
The issue is related to your Java code. The Azure DocumentDB Java SDK wraps the DocumentDB REST APIs, and according to the Query Documents REST API reference, as @DanCiborowski-MSFT said, the header x-ms-documentdb-query-enablecrosspartition explains the reason for your issue, as described below.
Header: x-ms-documentdb-query-enablecrosspartition
Required/Type: Optional/Boolean
Description: If the collection is partitioned, this must be set to True to allow execution across multiple partitions. Queries that filter against a single partition key, or against single-partitioned collections do not need to set the header.
So you need to set it to True to enable querying across multiple partitions without a partitionKey in the WHERE clause, by passing an instance of the FeedOptions class to the queryDocuments method, as below.
FeedOptions queryOptions = new FeedOptions();
queryOptions.setEnableCrossPartitionQuery(true); // Enable query across multiple partitions
String collectionLink = collection.getSelfLink();
FeedResponse<Document> queryResults = documentClient.queryDocuments(
collectionLink,
"SELECT * FROM c WHERE c.id='20566-2'", queryOptions);

Working with CRUD operations in Couchbase

I am a novice Couchbase user, and I am trying to insert documents into the default bucket as shown below.
I found the following two ways to insert JSON documents into the bucket:
1) Inserting by preparing a JsonDocument and upserting it into the bucket:
StringBuilder strBuilder = new StringBuilder();
strBuilder.append("{'phone':{'y':{'phonePropertyList':{'dskFlag':'false','serialId':1000,'inputTray':{'LIST': {'e':[{'inTray':{'id':'1','name':'BypassTray','amount': {'unit':'sheets','state':'empty','typical':'0','capacity':'100'}");
String LDATA = strBuilder.toString();
Cluster cluster = CouchbaseCluster.create("localhost");
Bucket bucket = cluster.openBucket("default");
JsonObject deviceinfoObj = JsonObject.create().put("phoneinfo", LDATA);
bucket.upsert(JsonDocument.create("phone", deviceinfoObj));
2) Or by using a direct query, like SQL:
String query = "upsert into default(KEY, VALUE) values(LDATA)"
I am not able to find out how to execute the above query like a normal SQL statement.
Example:
Statement st = connection.createStatement();
ResultSet rs = st.executeQuery(query);
How can I use a N1QL query to insert a JSON document into a Couchbase bucket?
I found two ways of fetching documents.
i) Getting the document directly by using the document Id:
bucket.get("phone").content().get("phoneinfo")
ii) Getting documents by iterating over a N1qlQueryResult:
N1qlQueryResult result = bucket.query(N1qlQuery.simple("select * from default;"));
for (N1qlQueryRow row : result) {
System.out.println(row);
}
I got confused by the different approaches for inserting and fetching documents to/from a bucket in Couchbase.
If I insert documents using the 1st approach, I need to prepare a JsonObject with some key and value as the entire JSON document.
So I think it is better to insert documents using the second approach, so that I can fetch them using the N1QL result set (2nd approach); with the first approach I would need to know the number of documents in the bucket, and only then could I loop through all of them.
Queries:
1) How do I get selected nested nodes from a document?
2) In a JSON document, do I need to split the key-value pairs for each node and put them into the JsonObject to prepare the JsonDocument?
3) How do I create a view on the bucket for fast retrieval?
Why don't you create a simple Java POJO and annotate it as a @Document? Then use a CrudRepository to save and retrieve documents from Couchbase, roughly as sketched after the link below.
This sample might help you:
https://github.com/maverickmicky/spring-couchbase-cache
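A minimal sketch of that approach, assuming Spring Data Couchbase is on the classpath and configured; Phone, phoneInfo and PhoneRepository are hypothetical names:
import org.springframework.data.annotation.Id;
import org.springframework.data.couchbase.core.mapping.Document;
import org.springframework.data.repository.CrudRepository;

@Document
public class Phone {
    @Id
    private String id;        // becomes the document key in the bucket
    private String phoneInfo; // other fields are mapped to JSON properties
    // getters and setters omitted
}

// in a separate file
public interface PhoneRepository extends CrudRepository<Phone, String> {
}

// usage (e.g. injected into a service):
// phoneRepository.save(phone);       // insert/update the JSON document
// phoneRepository.findById("phone"); // fetch it back by key (findOne in Spring Data 1.x)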

DynamoDB - DynamoDBMapper query and get entire result set

I'm querying a DynamoDB table using the hash key. Each record in the table is uniquely identified by a hash key and a range key:
DynamoDBMapper mapper;
....
MyClass myClass = new MyClass();
myClass.setHashKey(hashKey);
DynamoDBQueryExpression<MyClass> queryExpression = new DynamoDBQueryExpression<MyClass>()
.withHashKeyValues(myClass);
PaginatedQueryList<MyClass> entries = mapper.query(MyClass.class, queryExpression);
//Work with the elements of entries
When the result set is more than 1 MB, how can I retrieve the rest?
I cannot find any method to get the LastEvaluatedKey mentioned in the docs.
The AWS SDK's DynamoDB mapper handles pagination for you. It queries the database internally, and when you need more than 1 MB of data, it issues further queries and gets the rest for you.
If you want the complete list in one go, you can use operations like size(), or copy the paginated result into another list, either of which makes the mapper fetch the complete result.
In short, you need not worry about LastEvaluatedKey and its handling; it is handled for you.
Examples:
PaginatedQueryList<T> resultPaginatedList = dynamoDBMapper.query(getModelClass(), queryExpression);
List<T> queryList = new LinkedList<>(resultPaginatedList); //--- Line1
logger.info("Total elements found: " + queryList.size()); //--- Line2
Both line 1 and line 2 depict operations for which the mapper will fetch the complete result (by querying multiple times, handled by the SDK), not just the first 1 MB.
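If you want to control when those extra fetches happen, the mapper's pagination loading strategy can be configured. A sketch, assuming the v1 SDK's DynamoDBMapperConfig API and reusing mapper and queryExpression from the question:
// EAGER_LOADING fetches every page up front; the default LAZY_LOADING
// fetches additional pages as the PaginatedQueryList is traversed.
DynamoDBMapperConfig config = DynamoDBMapperConfig.builder()
        .withPaginationLoadingStrategy(
                DynamoDBMapperConfig.PaginationLoadingStrategy.EAGER_LOADING)
        .build();
PaginatedQueryList<MyClass> entries = mapper.query(MyClass.class, queryExpression, config);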

Java method for MongoDB collection.save()

I'm having a problem with MongoDB in Java when I try to add documents with a customized _id field. When I insert a new document into the collection, I want to ignore the document if its _id already exists.
In the Mongo shell, collection.save() can be used in this case, but I cannot find the equivalent method in the MongoDB Java driver.
Just to add an example:
I have a collection of documents containing websites' information, with the URLs as the _id field (which is unique).
I want to add some more documents. Some of the new documents might already exist in the current collection, so I want to keep adding all the new documents except for the duplicate ones.
This can be achieved with collection.save() in the Mongo shell, but with the MongoDB Java driver I can't find the equivalent method.
Hopefully someone can share the solution. Thanks in advance!
In the MongoDB Java driver, you could try using a BulkWriteOperation obtained from the initializeUnorderedBulkOperation() method of the DBCollection object (the one that represents your collection); unlike an ordered bulk write, an unordered one keeps going after individual failures, which is what you want for skipping duplicates. It is used as follows:
MongoClient mongo = new MongoClient("localhost", port_number);
DB db = mongo.getDB("db_name");
DBCollection col = db.getCollection("collection_name"); // the collection the bulk write targets
ArrayList<DBObject> objectList = new ArrayList<>(); // fill this list with the objects to insert
BulkWriteOperation operation = col.initializeUnorderedBulkOperation();
for (int i = 0; i < objectList.size(); i++) {
    operation.insert(objectList.get(i));
}
BulkWriteResult result = operation.execute();
With this method, each insert gets its own error handling, so a document with a duplicated id will produce a write error as usual, but the remaining documents will still be attempted. The duplicate-key failures surface as a BulkWriteException, and its getWriteResult() gives you the BulkWriteResult whose getInsertedCount() tells you how many documents were really inserted.
This can prove to be a bit inefficient if a lot of data is inserted this way, though. This is just sample code (found on journaldev.com and edited to fit your situation); you may need to adapt it to your configuration, and it is untested.
I guess save() does something like this:
fun save(doc: Document, col: MongoCollection<Document>) {
    if (doc.get("_id") == null) {
        doc.put("_id", ObjectId()) // generate a new id only when the document has none
    }
    // replace the document with this _id, or insert it if it does not exist yet
    col.replaceOne(Document("_id", doc.get("_id")), doc, ReplaceOptions().upsert(true))
}
Maybe they removed save() so that you can decide yourself how the new id is generated.
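For the Java driver specifically, here is a minimal sketch of the same upsert-style save, assuming the modern com.mongodb.client API; the database, collection and field names are placeholders:
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ReplaceOptions;
import org.bson.Document;

// replaceOne with upsert(true) behaves like the shell's save():
// insert when the _id is new, replace when it already exists.
MongoClient client = MongoClients.create("mongodb://localhost:27017");
MongoCollection<Document> websites = client.getDatabase("db_name").getCollection("websites");
Document site = new Document("_id", "https://example.com").append("title", "Example");
websites.replaceOne(Filters.eq("_id", site.get("_id")), site, new ReplaceOptions().upsert(true));
Note that, like save(), this overwrites an existing document with the same _id; if you want to skip duplicates instead, the unordered bulk insert from the first answer is the closer fit.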
