I am using bulkWriteWithOptions to insert multiple documents into the DB. I want to get the inserted documents back so that I can tell which ones were inserted, which failed, and which were duplicates.
Here is the code I am using:
mongoClient.bulkWriteWithOptions(collection, operations, options, repoAsyncResult -> {
    if (repoAsyncResult.failed()) {
        LOGGER.error("Bulk insertion failed : {}", repoAsyncResult.cause().getMessage());
        if (repoAsyncResult.cause() instanceof MongoBulkWriteException) {
            MongoBulkWriteException exception = (MongoBulkWriteException) repoAsyncResult.cause();
            exception.getWriteErrors().forEach(error -> {
                LOGGER.error("Insert Error : " + error.getMessage());
            });
        }
        repoFuture.fail(repoAsyncResult.cause());
    } else {
        LOGGER.info("Bulk insertion successful : {}", repoAsyncResult.result().toJson());
        repoFuture.complete(repoAsyncResult.result().toJson());
    }
});
Is there any way to get the inserted documents as a result?
No. You can only get the IDs of upserted documents from repoAsyncResult.result().getUpserts(), which returns a List<JsonObject>; reading "_id" from each entry gives you the upserted IDs.
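For illustration, here is a minimal sketch of the success branch that logs those upserted IDs, assuming (as in the code above) that the handler's result is Vert.x's MongoClientBulkWriteResult:

} else {
    MongoClientBulkWriteResult result = repoAsyncResult.result();
    // getUpserts() lists only upserted documents; plain inserts are not reported individually.
    result.getUpserts().forEach(upsert ->
            LOGGER.info("Upserted _id: {}", upsert.getString("_id")));
    repoFuture.complete(result.toJson());
}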
The official Firebase docs explain how to delete a single document, but they do not explain how to delete multiple documents at once.
See: Delete data from Cloud Firestore
One common scenario where a user may want to delete multiple documents is, for example:
Removing all of the items from a user's shopping cart (but not from other users' carts).
To do this efficiently, use a batched write, which lets you commit multiple delete operations as a single atomic operation.
See: Transactions and batched writes | Firebase Documentation
For example, to delete three documents at once:
WriteBatch batch = db.batch();

batch.delete(db.collection("cartItems").document("item1"));
batch.delete(db.collection("cartItems").document("item2"));
batch.delete(db.collection("cartItems").document("item3"));

batch.commit()
    .addOnSuccessListener((result) -> {
        Log.i(LOG_TAG, "Selected items have been deleted.");
    })
    .addOnFailureListener((error) -> {
        Log.e(LOG_TAG, "Failed to delete selected items.", error);
    });
So if you know the document IDs in advance, this is fairly simple:
public void removeSelectedItemsFromShoppingCart(List<String> itemIds) {
    WriteBatch batch = db.batch();
    CollectionReference collection = db.collection("cartItems");

    for (String itemId : itemIds) {
        batch.delete(collection.document(itemId));
    }

    batch.commit()
        .addOnSuccessListener((result) -> {
            Log.i(LOG_TAG, "Selected items have been removed.");
        })
        .addOnFailureListener((error) -> {
            Log.e(LOG_TAG, "Failed to remove selected items.", error);
        });
}
But if you don't know the document IDs in advance, you may have to query the database first:
public void removeAllItemsFromShoppingCart(String userId) {
    db.collection("cartItems")
        .whereEqualTo("userId", userId)
        .get()
        .addOnSuccessListener((querySnapshot) -> {
            WriteBatch batch = db.batch();
            for (DocumentSnapshot doc : querySnapshot) {
                batch.delete(doc.getReference());
            }
            batch.commit()
                .addOnSuccessListener((result) -> {
                    Log.i(LOG_TAG, "All items have been removed.");
                })
                .addOnFailureListener((error) -> {
                    Log.e(LOG_TAG, "Failed to remove all items.", error);
                });
        })
        .addOnFailureListener((error) -> {
            Log.e(LOG_TAG, "Failed to get your cart items.", error);
        });
}
Note that the code above uses a one-time read (get()) rather than a real-time listener. This is critical here: a listener would keep firing as documents change, so the code could keep trying to delete documents after the operation is done.
I have been looking for a way to create a sort of alert when new documents are added to ES via Logstash. I have seen some threads on here, such as stackoverflow.com/a/51980618/4604579, but that does not really serve my purpose, as the plug-ins mentioned do not work with the newest version of ELK and there is no Changes API yet.
So I have resorted to trying two different approaches:
1. Create a scroll and run over all the documents in a given index using the Search API, retain the last document's ID, and use it after a given timeout period to fetch all documents that were added after it.
2. Create a Watcher that checks at a given interval (for example, every 5 minutes) whether new documents have been added to an index.
I have made progress on approach 1: I can scroll through the roughly 50k documents currently in ES and retrieve the last document's ID (I sort the query by timestamp in ascending order, so I know the last document is the latest one inserted). But I don't know how efficient this approach is, and I know a scroll may time out after a given delay, so if no new documents are inserted the scroll context will be removed.
I was also looking into using a Watcher, but I don't really understand how to set up the condition that checks whether a new document was inserted into a given index.
I imagine I can do something along these lines:
PUT _watcher/watch/new_docs
{
  "trigger" : {
    "schedule" : {
      "interval" : "5s"
    }
  },
  "input" : {
    "search" : {
      "request" : {
        "indices" : "logstash",
        "body" : {
          "size" : 0,
          "query" : { "match" : { "@timestamp" : "now-5s" } }
        }
      }
    }
  },
  "condition" : {
    "compare" : { ?? }
  },
  "actions" : {
    "my_webhook" : {
      "webhook" : {
        "method" : "POST",
        "host" : "mylisteninghost",
        "port" : 9200,
        "path" : "/{{watch_id}}",
        "body" : "New document {{document ID}} errors"
      }
    }
  }
}
I am not exactly sure how to define or use the Watcher, or whether it would even work.
Can anyone let me know what the best course of action would be?
Thank you
EDIT:
For those interested, I found a way to poll the ES REST API using Search After. The difference is that Scroll takes a snapshot of the documents in ES, so any documents added afterwards won't be in that snapshot. Search After, by contrast, is stateless: it relies on unique sort values (in my case timestamp/id), and we keep the last values fetched and then query for all documents that come after them. This way, if any new documents are added, they come after the held timestamp and are picked up by the next query.
Code:
public static void searchAfterElasticData()
        throws FileNotFoundException, IOException, InterruptedException {
    // Create a search request for a given index
    SearchRequest search_request = new SearchRequest(elastic_index);
    SearchSourceBuilder source_builder =
            getSearchSourceBuilder("@timestamp", "_id", 100);
    search_request.source(source_builder);
    SearchResponse search_response = null;
    try {
        search_response = client.search(search_request, RequestOptions.DEFAULT);
    } catch (ElasticsearchException | ConnectException ex) {
        log.info("Error while querying Elastic API: {}", ex.toString());
    }

    if (search_response != null) {
        SearchHit[] search_hits = search_response.getHits().getHits();
        Object[] sort_values = null;
        while (search_hits != null) {
            if (search_hits.length > 0) {
                // If there are records retrieved, parse them
                for (SearchHit hit : search_hits) {
                    Map<String, Object> source_map = hit.getSourceAsMap();
                    try {
                        parse((String) source_map.get("message"));
                    } catch (Exception ex) {
                        log.error("Error while parsing: {}",
                                (String) source_map.get("message"));
                    }
                }
                // Get the sort values of the last record and do a new request
                log.info("Getting sorting values");
                sort_values = search_response.getHits()
                        .getAt(search_hits.length - 1).getSortValues();
            } else {
                log.info("Waiting 1 minute for new entries");
                Thread.sleep(60000);
            }
            source_builder.searchAfter(sort_values);
            search_request.source(source_builder);
            search_response =
                    client.search(search_request, RequestOptions.DEFAULT);
            search_hits = search_response.getHits().getHits();
            log.info("Fetched hits: {}", search_hits.length);
            log.info("Searching after for new hits");
        }
    }
}
I would still like to know whether it is possible to do the same using a Watcher. Also, if anyone has suggestions to make the code more elegant, please share.
Thank you
I have a stored procedure in the Cosmos DB Emulator. All this procedure does is delete ALL documents from mycollection. When I run it in the browser (https://localhost:8081/_explorer/index.html), it works great. Then I try to call it from Java:
RequestOptions requestOptions = new RequestOptions();
requestOptions.setPartitionKey(new PartitionKey(null));
System.out.println("START DELETE PROCEDURE");
StoredProcedureResponse spr = client.executeStoredProcedure(sprocLink, requestOptions, null);
System.out.println(spr.getResponseAsString());
And I get the following result: {"deleted": 0,"continuation": false}
This is CRAZY. When I run this stored procedure from the browser, I get this result: {"deleted": 10,"continuation": false}. Then (after adding back those 10 documents, of course) I run the same procedure from Java and get this result: {"deleted": 0,"continuation": false}
So when the stored procedure is run from Java, it is called but doesn't do the job. It deletes nothing... Why would this happen?
Below is the stored procedure:
function testStoredProcedure() {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();
    var response = getContext().getResponse();
    var responseBody = {
        deleted: 0,
        continuation: true
    };
    var query = 'SELECT * FROM mycollection ';

    // Validate input.
    if (!query) throw new Error("The query is undefined or null.");

    tryQueryAndDelete();

    // Recursively runs the query w/ support for continuation tokens.
    // Calls tryDelete(documents) as soon as the query returns documents.
    function tryQueryAndDelete(continuation) {
        var requestOptions = {continuation: continuation};
        var isAccepted = collection.queryDocuments(collectionLink, query, requestOptions, function (err, retrievedDocs, responseOptions) {
            if (err) throw err;
            if (retrievedDocs.length > 0) {
                // Begin deleting documents as soon as documents are returned from the query results.
                // tryDelete() resumes querying after deleting; no need to page through continuation tokens.
                // - this is to prioritize writes over reads given timeout constraints.
                tryDelete(retrievedDocs);
            } else if (responseOptions.continuation) {
                // Else if the query came back empty, but with a continuation token; repeat the query w/ the token.
                tryQueryAndDelete(responseOptions.continuation);
            } else {
                // Else if there are no more documents and no continuation token - we are finished deleting documents.
                responseBody.continuation = false;
                response.setBody(responseBody);
            }
        });

        // If we hit execution bounds - return continuation: true.
        if (!isAccepted) {
            response.setBody(responseBody);
        }
    }

    // Recursively deletes documents passed in as an array argument.
    // Attempts to query for more on empty array.
    function tryDelete(documents) {
        if (documents.length > 0) {
            // Delete the first document in the array.
            var isAccepted = collection.deleteDocument(documents[0]._self, {}, function (err, responseOptions) {
                if (err) throw err;
                responseBody.deleted++;
                documents.shift();
                // Delete the next document in the array.
                tryDelete(documents);
            });

            // If we hit execution bounds - return continuation: true.
            if (!isAccepted) {
                response.setBody(responseBody);
            }
        } else {
            // If the document array is empty, query for more documents.
            tryQueryAndDelete();
        }
    }
}
For partitioned containers, a partition key value must be provided in the request options when executing a stored procedure. Stored procedures are always scoped to a single partition key; items with a different partition key value are not visible to the stored procedure. The same applies to triggers.
You are setting the partition key to null in requestOptions. null is a valid partition key value, but it does not appear to be the partition key value of your 10 documents.
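For illustration, a minimal sketch of executing the procedure with the documents' actual partition key value, reusing the calls from your own code ("myPartitionValue" is a placeholder, not something from your data):

RequestOptions requestOptions = new RequestOptions();
// Use the partition key value the 10 documents were actually written with.
requestOptions.setPartitionKey(new PartitionKey("myPartitionValue"));
StoredProcedureResponse spr = client.executeStoredProcedure(sprocLink, requestOptions, null);
System.out.println(spr.getResponseAsString());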
Humbly reposting @Jay Gong's answer from How to specify NONE partition key for deleting a document in Document DB java SDK?
Maybe it will help someone. Put:
PartitionKey partitionKey = new PartitionKey(Undefined.Value());
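A short usage sketch, reusing the request options from the question; this assumes the target documents were stored without a partition key value (i.e. with the undefined value):

RequestOptions requestOptions = new RequestOptions();
// Undefined.Value() targets documents that have no partition key value set.
requestOptions.setPartitionKey(new PartitionKey(Undefined.Value()));
StoredProcedureResponse spr = client.executeStoredProcedure(sprocLink, requestOptions, null);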
I have a collection with some documents in it. In my application I create this collection first and then insert documents. Based on the requirements, I also need to truncate (delete all documents in) the collection. Using the DocumentDB Java API, I have written the following code for this purpose:
DocumentClient documentClient = getConnection(masterkey, server, portNo);
List<Database> databaseList = documentClient.queryDatabases("SELECT * FROM root r WHERE r.id='" + schemaName + "'", null).getQueryIterable().toList();
DocumentCollection collection = null;
Database databaseCache = (Database) databaseList.get(0);
List<DocumentCollection> collectionList = documentClient.queryCollections(databaseCache.getSelfLink(), "SELECT * FROM root r WHERE r.id='" + collectionName + "'", null).getQueryIterable().toList();

// truncate logic
if (collectionList.size() > 0) {
    collection = ((DocumentCollection) collectionList.get(0));
    if (truncate) {
        try {
            documentClient.deleteDocument(collection.getSelfLink(), null);
        } catch (DocumentClientException e) {
            e.printStackTrace();
        }
    }
} else { // create logic
    RequestOptions requestOptions = new RequestOptions();
    requestOptions.setOfferType("S1");
    collection = new DocumentCollection();
    collection.setId(collectionName);
    try {
        collection = documentClient.createCollection(databaseCache.getSelfLink(), collection, requestOptions).getResource();
    } catch (DocumentClientException e) {
        e.printStackTrace();
    }
}
With the above code I am able to create a new collection successfully, and I am also able to insert documents into it. But while truncating the collection I get the error below:
com.microsoft.azure.documentdb.DocumentClientException: The input authorization token can't serve the request. Please check that the expected payload is built as per the protocol, and check the key being used. Server used the following payload to sign: 'delete
colls
eyckqjnw0ae=
I am using Azure Document DB Java API version 1.9.5.
It would be of great help if you could point out the error in my code or suggest a better way of truncating a collection. I would really appreciate any help here.
According to your description and code, I think the issue is caused by the code below.
try {
    documentClient.deleteDocument(collection.getSelfLink(), null);
} catch (DocumentClientException e) {
    e.printStackTrace();
}
The code above calls deleteDocument, but you are passing a collection link as the documentLink argument.
So if your real intention is to delete the collection, please use the method DocumentClient.deleteCollection(collectionLink, options).
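For illustration, a minimal sketch of that change in your truncate branch: deleting the collection and, if you still need it, recreating an empty one afterwards is one way to "truncate". It reuses the variables from your code and is a sketch, not a drop-in replacement:

if (truncate) {
    try {
        // Delete the whole collection instead of passing its link to deleteDocument.
        documentClient.deleteCollection(collection.getSelfLink(), null);
        // Optionally recreate an empty collection with the same id afterwards.
        DocumentCollection newCollection = new DocumentCollection();
        newCollection.setId(collectionName);
        RequestOptions requestOptions = new RequestOptions();
        requestOptions.setOfferType("S1");
        collection = documentClient.createCollection(databaseCache.getSelfLink(), newCollection, requestOptions).getResource();
    } catch (DocumentClientException e) {
        e.printStackTrace();
    }
}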
I am trying to re-index an ES index with Java:
// reindex all documents from the old into the new index
QueryBuilder qb = QueryBuilders.matchAllQuery();
SearchResponse scrollResp = client.prepareSearch("my_index").setSearchType(SearchType.SCAN).setScroll(new TimeValue(600000)).setQuery(qb).setSize(100).execute().actionGet();

while (true) {
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(600000)).execute().actionGet();
    final int documentFoundCount = scrollResp.getHits().getHits().length;

    // Break condition: No hits are returned
    if (documentFoundCount == 0) {
        break;
    }

    // otherwise add all documents which are found (in this scroll-search) to a bulk operation for reindexing.
    logger.info("Found {} documents in the scroll search, re-indexing them via bulk now.", documentFoundCount);
    BulkRequestBuilder bulk = client.prepareBulk();
    for (SearchHit hit : scrollResp.getHits()) {
        bulk.add(new IndexRequest(newIndexName, hit.getType()).source(hit.getSource()));
    }
    bulk.execute(new ActionListener<BulkResponse>() {
        @Override
        public void onResponse(BulkResponse bulkItemResponses) {
            logger.info("Reindexed {} documents from '{}' to '{}'.", bulkItemResponses.getItems().length, currentIndexName, newIndexName);
        }

        @Override
        public void onFailure(Throwable e) {
            logger.error("Could not complete the index re-aliasing.", e);
        }
    });
}

// these following lines should only be executed if the re-indexing was successful for _all_ documents.
logger.info("Finished re-indexing all documents, now setting the aliases from the old to the new index.");
try {
    client.admin().indices().aliases(new IndicesAliasesRequest().removeAlias(currentIndexName, "my_index").addAlias("my_index", newIndexName)).get();
    // finally, delete the old index
    client.admin().indices().delete(new DeleteIndexRequest(currentIndexName)).actionGet();
} catch (InterruptedException | ExecutionException e) {
    logger.error("Could not complete the index re-aliasing.", e);
}
In general, this works, but the approach has one problem:
If there is a failure during re-indexing, e.g. it takes too long and is stopped by some transaction watch (it runs during EJB startup), the alias is still switched and the old index is removed anyway.
How can I make the alias switch happen if and only if all bulk requests were successful?
You're not waiting until the bulk request finishes. If you call execute() without actionGet(), the request runs asynchronously, which means you will start changing aliases and deleting indices before the new index is completely built.
Also:
client.admin().indices().aliases(new IndicesAliasesRequest().removeAlias(currentIndexName, "my_index").addAlias("my_index", newIndexName)).get();
This should end with execute().actionGet() and not get(). That is probably why your alias is not getting set.
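For illustration, a hedged sketch of making the bulk call synchronous and only proceeding when it reports no failures, using the same transport-client builders as in the question (retry handling and other details are omitted):

// Inside the scroll loop: run each bulk batch synchronously and stop on failure.
BulkResponse bulkResponse = bulk.execute().actionGet();
if (bulkResponse.hasFailures()) {
    logger.error("Bulk re-indexing failed: {}", bulkResponse.buildFailureMessage());
    return; // leave the alias and the old index untouched
}
logger.info("Reindexed {} documents from '{}' to '{}'.",
        bulkResponse.getItems().length, currentIndexName, newIndexName);

// After the loop, i.e. only once every batch succeeded: switch the alias, then delete the old index.
client.admin().indices()
        .aliases(new IndicesAliasesRequest()
                .removeAlias(currentIndexName, "my_index")
                .addAlias("my_index", newIndexName))
        .actionGet();
client.admin().indices().delete(new DeleteIndexRequest(currentIndexName)).actionGet();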