Elasticsearch update document without creating new index - java

I have many existing indices partitioned by date, e.g. index_190901, index_190902, ...
And I have an API which takes index_name and doc_id as inputs. Users want to update some documents in an index by passing the fields to change, the index_name, and the doc_id.
I'm trying to update a document using the following code:
updateRequest.index("invalid_daily_index")
    .type("type")
    .id("id")
    .doc(jsonMap);
It works fine if the user passes an existing index, but if the user passes a non-existing index, a new index with no documents is created.
I know that I can turn off auto_create_index, but I still want indices to be created automatically when I insert new documents.
Checking whether an index exists with client.indices().exists(request, RequestOptions.DEFAULT) is quite expensive; I don't want to do that on every request.
How can I make Elasticsearch not create a new index when I use UpdateRequest?

You can block automatic creation of non-existing indices by setting action.auto_create_index to false in the cluster settings:
PUT _cluster/settings
{
  "persistent": { "action.auto_create_index": "false" }
}
For details, take a look at the reference documentation.
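The same setting can also be applied from Java. Below is a rough sketch using the High Level REST Client's cluster API (treat the exact builder names as an assumption to verify against your client version):
ClusterUpdateSettingsRequest request = new ClusterUpdateSettingsRequest();
// persistently disable automatic index creation
request.persistentSettings(Settings.builder()
    .put("action.auto_create_index", "false")
    .build());
client.cluster().putSettings(request, RequestOptions.DEFAULT);
Note that action.auto_create_index also accepts a comma-separated list of patterns (for example "+index_*,-*"), which may let you keep automatic creation for your date-partitioned inserts while blocking everything else.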

Related

Problems with Google Firestore collectionGroup - whereArrayContains

I'm working on an Android App written in Java.
I'm trying to query for documents in Cloud Firestore using collectionGroup and whereArrayContains.
My structure in Cloud Firestore looks like this:
Teams (collection)
  - teamUUID (document)
    - reservations (collection)
      - reservationDate (document)
        - places (collection)
          - placeID (document)
            - userUUIDs (array of strings)
              -> one entry -> "vQn9vbWzTtcsB71hgPBhX2uWBuI3"
partial screenshot of the structure
I want to get all documents in the collection places, where the userUUIDs field contains a specific string. My code for this is the following:
db.collectionGroup("places")
    .whereArrayContains("userUUIDs", "vQn9vbWzTtcsB71hgPBhX2uWBuI3")
    .get()
    .addOnSuccessListener(queryDocumentSnapshots -> {
        Log.d(TAG, String.valueOf(queryDocumentSnapshots.getDocuments().size()));
        if (queryDocumentSnapshots.isEmpty()) {
            // nothing found
            Log.d(TAG, "nothing found for this user");
        }
    });
When this code is executed, the query succeeds, but no documents are returned.
If I leave out .whereArrayContains("userUUIDs", "vQn9vbWzTtcsB71hgPBhX2uWBuI3"), one document is returned.
Google Firestore automatically created the index to query for "userUUIDs".
index created in firestore
Why are there no documents returned using whereArrayContains?
Edit:
The problem seems to exist with all queries.
I added a string testValue to places and made a whereEqualTo query. The result was the same: after creating an index (automatically via the link in the console), the onSuccessListener code was executed, but no document was found.
Thanks in advance.
Why are there no documents returned using whereArrayContains?
Because you most probably didn't create the correct index. If the userUUIDs field exists within the documents of the "places" collection, then when running the app you'll find a message like this in logcat:
Caused by: io.grpc.StatusException: FAILED_PRECONDITION: The query requires a COLLECTION_GROUP_CONTAINS index for collection places and field userUUIDs. You can create it here: https://console.firebase.google.com/v1/r/project/...
You can simply click on that link, or copy and paste the URL into a web browser, and your index will be created automatically.
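When the required index is missing, the query's Task typically fails with that FAILED_PRECONDITION error rather than succeeding with zero results, so attaching a failure listener is an easy way to surface the index-creation link in your own logs. A minimal sketch, reusing the query from the question:
db.collectionGroup("places")
    .whereArrayContains("userUUIDs", "vQn9vbWzTtcsB71hgPBhX2uWBuI3")
    .get()
    .addOnSuccessListener(snapshots ->
        Log.d(TAG, "found " + snapshots.size() + " documents"))
    .addOnFailureListener(e ->
        // FAILED_PRECONDITION errors carry the index-creation URL in their message
        Log.e(TAG, "query failed", e));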

Spring MongoTemplate upsert entire list

I have a list of objects which I want to insert into a collection. mongoTemplate.insert(list); works fine, but now I want to change it to an upsert, since my list can contain duplicates of objects that are already in the collection. So what I want is to insert the entire list and, along the way, skip any item that is already present in the collection.
You can try the continueOnError or ordered flag like this:
db.collection.insert(myArray, {continueOnError: true})
OR,
db.collection.insert(myArray, {ordered: false})
You need to create a unique index on your object's id field (if there is no unique constraint already), so that inserting the same id again produces an error.
With that unique constraint in place you can insert the array directly, or use a bulk insert.
With a plain insert you can set the flag continueOnError: true, which keeps the insertion going whenever an error occurs because a document with the same id already exists.
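Since the question uses Spring's MongoTemplate, the same continue-past-duplicates behaviour can be expressed as an unordered bulk insert. A minimal sketch, assuming a unique index on the duplicate-detection field already exists and that MyEntity and list are placeholders for your entity class and object list:
BulkOperations bulkOps = mongoTemplate.bulkOps(BulkOperations.BulkMode.UNORDERED, MyEntity.class);
bulkOps.insert(list);
try {
    bulkOps.execute();
} catch (BulkOperationException e) {
    // duplicate-key failures are collected here; the non-duplicate documents were still inserted
}
Unordered mode matters here: an ordered bulk write stops at the first duplicate-key error, while an unordered one attempts every document.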
The only way to do a bulk-upsert operation is the method MongoCollection.bulkWrite (or at least: the only way I know... ;-))
To use it, you have to convert your documents to the appropriate WriteModel: for full-document upserts on a per-document basis this is ReplaceOneModel (UpdateOneModel expects an update built from operators such as $set rather than a plain replacement document).
List<Document> toUpdate = ...;
MongoCollection<Document> coll = ...;
// Convert each Document to a ReplaceOneModel<Document>
List<ReplaceOneModel<Document>> bulkOperationList = toUpdate.stream()
    .map(doc -> new ReplaceOneModel<>(
        Filters.eq("_id", doc.get("_id")), // identify by the same _id
        doc,                               // full replacement document
        new ReplaceOptions().upsert(true)))
    .collect(Collectors.toList());
// Write to DB
coll.bulkWrite(bulkOperationList);
(Disclaimer: I only typed this code, I never ran it)

MongoDB result set getting modified after execution of a query

In my application there are 2 threads:
crawl the web-sites and insert the data into MongoDB
retrieve the crawled sites and perform business logic
In order to retrieve the crawled sites I use the following query:
Document query = new Document("fetchStatus", new Document("$lte", fetchStatusParam));
FindIterable<Document> unfetchedEpisodes = dbC_Episodes.find(query);
As a result I get all episodes whose fetchStatus is less than or equal to the specified value.
In the next step, I store the items of the result set in a HashMap<String, TrackedEpisode> (an object property) in order to track them:
for (Document document : unfetchedEpisodes) {
    this.trackedEpisodes.put(document.get("_id").toString(), new TrackedEpisode(document));
}
Then I do some business logic, which:
doesn't modify the unfetchedEpisodes result set.
doesn't remove any object from trackedEpisodes.
Up till now everything is OK.
In the last step, I go over all retrieved documents and mark them as fetched in order to prevent duplicate fetching in the future.
for (Document document : unfetchedEpisodes) {
    if (this.trackedEpisodes.containsKey(document.get("_id").toString())) {
        // prevent repeated fetching
        document.put("fetchStatus", FetchStatus.IN_PROCESS.getID());
        if (this.trackedEpisodes.get(document.get("_id").toString()).isExpired()) {
            document.put("isExpired", true);
            document.put("fetchStatus", FetchStatus.FETCHED.getID());
        }
    } else {
        System.out.println("BOO! Strange new object detected");
    }
    dbC_Episodes.updateOne(new Document("_id", document.get("_id")), new Document("$set", document));
}
I ran this code for a couple of days and noticed that it sometimes reaches the else branch of the if (this.trackedEpisodes.containsKey()) statement. That seems odd: how can unfetchedEpisodes and trackedEpisodes be out of sync and not contain the same items?
I began to investigate and noticed that whenever I reach "BOO! Strange new object detected", the document iterator contains an item that exists in the database but should not yet be in unfetchedEpisodes, because I did not execute a new query against the database.
I checked several times that the retrieved items are stored in trackedEpisodes, and every element from unfetchedEpisodes had indeed been added to trackedEpisodes, yet afterwards I still sometimes end up at "BOO! Strange new object detected".
My questions:
Why does unfetchedEpisodes get new items after the query has been executed?
Is it possible that unfetchedEpisodes is modified by the MongoDB driver after the find() call has been executed?
Should I perhaps call some kind of .close() after executing a query against MongoDB?
The used versions:
MongoDB: 3.2.3, x64
MongoDB Java Driver: mongodb-driver-3.2.2, mongodb-driver-core-3.2.2, bson-3.2.2
When you call find here:
FindIterable<Document> unfetchedEpisodes = dbC_Episodes.find(query);
you are not actually getting all the episodes back. You are getting a database cursor pointing to the matched documents.
Then when you call:
for (Document document : unfetchedEpisodes) { ... }
an iterator is created over all of the documents that match the query.
When you call it a second time, a new cursor is returned for the same query, and all of the documents that match at that moment are iterated over.
If the collection has changed in between, the results will be different.
If you want to ensure that the contents of unfetchedEpisodes stay unchanged, one option is to pull the entire result set into memory and iterate over it there rather than on the DB, e.g.
ArrayList<Document> unfetchedEpisodes = dbC_Episodes.find(query).into(new ArrayList<Document>());

Java method for MongoDB collection.save()

I'm having a problem with MongoDB in Java when adding documents with a customized _id field. When I insert a new document into that collection, I want to ignore it if its _id already exists.
In the Mongo shell, collection.save() can be used in this case, but I cannot find the equivalent method in the MongoDB Java driver.
Just to add an example:
I have a collection of documents containing websites' information,
with the URL as the _id field (which is unique).
I want to add some more documents; some of them might already exist in the current collection. So I want to keep adding all the new documents except for the duplicate ones.
This can be achieved with collection.save() in the Mongo shell, but I can't find the equivalent method in the MongoDB Java driver.
Hopefully someone can share the solution. Thanks in advance!
In the MongoDB Java driver, you could try using the BulkWriteOperation object with the initializeUnorderedBulkOperation() method of the DBCollection object (the one that contains your collection); an unordered bulk operation keeps going after individual errors. It is used as follows:
MongoClient mongo = new MongoClient("localhost", port_number);
DB db = mongo.getDB("db_name");
DBCollection col = db.getCollection("collection_name");
ArrayList<DBObject> objectList; // Fill this list with your objects to insert
BulkWriteOperation operation = col.initializeUnorderedBulkOperation();
for (int i = 0; i < objectList.size(); i++) {
    operation.insert(objectList.get(i));
}
BulkWriteResult result = operation.execute();
With an unordered bulk operation, each insert is attempted independently, so documents with a duplicated _id produce an error as usual, but the operation still continues with the rest of the documents. At the end, you can use the getInsertedCount() method of the BulkWriteResult object to find out how many documents were actually inserted.
This can prove a bit inefficient if lots of data is inserted this way, though. This is just sample code (found on journaldev.com and edited to fit your situation); you may need to adapt it to your configuration, and it is untested.
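For reference, the same continue-past-duplicates pattern with the newer MongoCollection API would look roughly like the untested sketch below, where docs and col are placeholders for your document list and collection:
List<InsertOneModel<Document>> inserts = docs.stream()
    .map(InsertOneModel::new)
    .collect(Collectors.toList());
try {
    BulkWriteResult result = col.bulkWrite(inserts, new BulkWriteOptions().ordered(false));
    System.out.println("Inserted: " + result.getInsertedCount());
} catch (MongoBulkWriteException e) {
    // duplicate _id errors are reported here, but the remaining documents were still written
    System.out.println("Inserted: " + e.getWriteResult().getInsertedCount());
}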
I guess save is doing something like this.
fun save(doc: Document, col: MongoCollection<Document>) {
    if (doc.getObjectId("_id") == null) {
        doc.put("_id", ObjectId()) // generate a new id if none is present
    }
    // upsert: replace the document with this _id, or insert it if it does not exist yet
    col.replaceOne(Document("_id", doc.getObjectId("_id")), doc, ReplaceOptions().upsert(true))
}
Maybe they removed save so you decide how to generate the new id.

Create index in MongoDB 3.2 to avoid duplicated documents/rows

I'm using MongoDB 3.2 and want to avoid duplicates in my collection. In order to do that I use the createIndex() method (I tried different variants; none of them works):
dbColl.createIndex(new Document("guid", 1));
dbColl.createIndex(new BasicDBObject("guid", 1));
dbColl.createIndex(new Document("guid.content", 1));
dbColl.createIndex(new BasicDBObject("guid.content", 1));
Then I try to insert the data with:
itemsArr.forEach(
    item -> dbColl.insertOne(Document.parse(item.toString()))
);
I do this two times and expect that the second time MongoDB will not add any new rows, since the data has already been added and there is an index on the guid field. But that's not the case: MongoDB adds duplicates despite the index.
Why does MongoDB add duplicates even though there is an index on the guid and/or guid.content field? And how do I fix it? I want a document with a given guid to be added only once.
Here is a sample of the document structure:
In my data the guid field is a unique document identifier.
Regular indexes allow multiple documents with the same value.
What you need is not a regular index but a unique index. These are created with the method createIndex(DBObject keys, DBObject options), passing an options object where unique is true.
collection.createIndex(new BasicDBObject("guid", 1), new BasicDBObject("unique", true));
With the help of Phillip, I put together a complete working solution to the problem «How to avoid duplicates / skip duplicates on insert» in MongoDB 3.2 with Java Driver 3.2.0:
IndexOptions options = new IndexOptions();
// ensure the index is unique
options.unique(true);
// define the index
dbColl.createIndex(new BasicDBObject("guid", 1), options);
// add data to DB
for (Object item : itemsArr) {
    // if there is a duplicate, skip it and write it to the console (optionally)
    try {
        dbColl.insertOne(Document.parse(item.toString()));
    } catch (com.mongodb.MongoWriteException ex) {
        //System.err.println(ex.getMessage());
    }
}
Feel free to use this ready-to-use solution.
