I am using MongoDB 2.6.1. The question from me is that-
"Is it possible for keep a track of _id in Bulk Operations??"
Suppose if I have created one object for BulkWriteOperation, for example 50 documents to be inserted to the 'B' collection from 'A' collection. I need keep a list of successful write operations and failed write operations also.
Bulk Inserts and deletes are working fine. But the question is that-
-- "I need to keep a track of _ids, for a query- find the documents from A and insert to B collection. In the mean while, I need to keep a list of _ids (successful and failed operations). I need to delete the documents in A collection, only for those successful operations and keep failed documents as it is"--
Please help me out.
Thanking you :) :)
First, you'll need to use UnorderedBulkOperation for the entire batch to execute. You will need to use a try/catch around your BulkWriteOperation.execute(), catching BulkWriteException which will give you access to a list of BulkWriteError as well as the BulkWriteResult.
Here's a quick and dirty example:
MongoClient m = new MongoClient("localhost");
DB db = m.getDB( "test" );
DBCollection coll = db.getCollection( "bulk" );
coll.drop();
coll.createIndex(new BasicDBObject("i", 1), new BasicDBObject("unique", true));
BulkWriteOperation bulkWrite = coll.initializeUnorderedBulkOperation();
for (int i = 0; i < 100; i++) {
bulkWrite.insert(new BasicDBObject("i", i));
}
// Now add 10 documents to the batch that will generate a unique index error
for (int i = 0; i < 10; i++) {
bulkWrite.insert(new BasicDBObject("i", i));
}
BulkWriteResult result = null;
List<BulkWriteError> errors = null;
try {
result = bulkWrite.execute();
} catch (BulkWriteException bwe) {
bwe.printStackTrace();
errors = bwe.getWriteErrors();
result = bwe.getWriteResult();
}
for (BulkWriteError e : errors) {
System.out.println(e.getIndex() + " failed");
}
System.out.println(result);
Related
We created a program to make the use of the database easier in other programs. So the code im showing gets used in multiple other programs.
One of those other programs gets about 10,000 records from one of our clients and has to check if these are in our database already. If not we insert them into the database (they can also change and have to be updated then).
To make this easy we load all the entries from our whole table (at the moment 120,000), create a class for every entry we get and put all of them into a Hashmap.
The loading of the whole table this way takes around 5 minutes. Also we sometimes have to restart the program because we run into a GC overhead error because we work on limited hardware. Do you have an idea of how we can improve the performance?
Here is the code to load all entries (we have a global limit of 10.000 entries per query so we use a loop):
public Map<String, IMasterDataSet> getAllInformationObjects(ISession session) throws MasterDataException {
IQueryExpression qe;
IQueryParameter qp;
// our main SDP class
Constructor<?> constructorForSDPbaseClass = getStandardConstructor();
SimpleDateFormat itaTimestampFormat = new SimpleDateFormat("yyyyMMddHHmmssSSS");
// search in standard time range (modification date!)
Calendar cal = Calendar.getInstance();
cal.set(2010, Calendar.JANUARY, 1);
Date startDate = cal.getTime();
Date endDate = new Date();
Long startDateL = Long.parseLong(itaTimestampFormat.format(startDate));
Long endDateL = Long.parseLong(itaTimestampFormat.format(endDate));
IDescriptor modDesc = IBVRIDescriptor.ModificationDate.getDescriptor(session);
// count once before to determine initial capacities for hash map/set
IBVRIArchiveClass SDP_ARCHIVECLASS = getMasterDataPropertyBag().getSDP_ARCHIVECLASS();
qe = SDP_ARCHIVECLASS.getQueryExpression(session);
qp = session.getDocumentServer().getClassFactory()
.getQueryParameterInstance(session, new String[] {SDP_ARCHIVECLASS.getDatabaseName(session)}, null, null);
qp.setExpression(qe);
qp.setHitLimitThreshold(0);
qp.setHitLimit(0);
int nrOfHitsTotal = session.getDocumentServer().queryCount(session, qp, "*");
int initialCapacity = (int) (nrOfHitsTotal / 0.75 + 1);
// MD sets; and objects already done (here: document ID)
HashSet<String> objDone = new HashSet<>(initialCapacity);
HashMap<String, IMasterDataSet> objRes = new HashMap<>(initialCapacity);
qp.close();
// do queries until hit count is smaller than 10.000
// use modification date
boolean keepGoing = true;
while(keepGoing) {
// construct query expression
// - basic part: Modification date & class type
// a. doc. class type
qe = SDP_ARCHIVECLASS.getQueryExpression(session);
// b. ID
qe = SearchUtil.appendQueryExpressionWithANDoperator(session, qe,
new PlainExpression(modDesc.getQueryLiteral() + " BETWEEN " + startDateL + " AND " + endDateL));
// 2. Query Parameter: set database; set expression
qp = session.getDocumentServer().getClassFactory()
.getQueryParameterInstance(session, new String[] {SDP_ARCHIVECLASS.getDatabaseName(session)}, null, null);
qp.setExpression(qe);
// order by modification date; hitlimit = 0 -> no hitlimit, but the usual 10.000 max
qp.setOrderByExpression(session.getDocumentServer().getClassFactory().getOrderByExpressionInstance(modDesc, true));
qp.setHitLimitThreshold(0);
qp.setHitLimit(0);
// Do not sort by modification date;
qp.setHints("+NoDefaultOrderBy");
keepGoing = false;
IInformationObject[] hits = null;
IDocumentHitList hitList = null;
hitList = session.getDocumentServer().query(qp, session);
IDocument doc;
if (hitList.getTotalHitCount() > 0) {
hits = hitList.getInformationObjects();
for (IInformationObject hit : hits) {
String objID = hit.getID();
if(!objDone.contains(objID)) {
// do something with this object and the class
// here: construct a new SDP sub class object and give it back via interface
doc = (IDocument) hit;
IMasterDataSet mdSet;
try {
mdSet = (IMasterDataSet) constructorForSDPbaseClass.newInstance(session, doc);
} catch (Exception e) {
// cause for this
String cause = (e.getCause() != null) ? e.getCause().toString() : MasterDataException.ERRMSG_PART_UNKNOWN;
throw new MasterDataException(MasterDataException.ERRMSG_NOINSTANCE_POSSIBLE, this.getClass().getSimpleName(), e.toString(), cause);
}
objRes.put(mdSet.getID(), mdSet);
objDone.add(objID);
}
}
doc = (IDocument) hits[hits.length - 1];
Date lastModDate = ((IDateValue) doc.getDescriptor(modDesc).getValues()[0]).getValue();
startDateL = Long.parseLong(itaTimestampFormat.format(lastModDate));
keepGoing = (hits.length >= 10000 || hitList.isResultSetTruncated());
}
qp.close();
}
return objRes;
}
Loading 120,000 rows (and more) each time will not scale very well, and your solution may not work in the future as the record size grows. Instead let the database server handle the problem.
Your table needs to have a primary key or unique key based on the columns of the records. Iterate through the 10,000 records performing JDBC SQL update to modify all field values with where clause to exactly match primary/unique key.
update BLAH set COL1 = ?, COL2 = ? where PKCOL = ?; // ... AND PKCOL2 =? ...
This modifies an existing row or does nothing at all - and JDBC executeUpate() will return 0 or 1 indicating number of rows changed. If number of rows changed was zero you have detected a new record which does not exist, so perform insert for that new record only.
insert into BLAH (COL1, COL2, ... PKCOL) values (?,?, ..., ?);
You can decide whether to run 10,000 updates followed by however many inserts are needed, or do update+optional insert, and remember JDBC batch statements / auto-commit off may help speed things up.
I currently have a cursor that is going through a MongoDB Collection and is taking out a couple of different values out and adding them to another table. However I have noticed that when the process is running the cursor isn't covering all the documents within the collection (Found out by adding counter).
The Beacon Lookup Collection has 3342 documents, but from the logging I can only see it's iterated through 1114 of them and finishes the cursor with no error.Looking at the cursor when debugging it does contain all 3343 documents.
Below is the method I am trying to run and currently having issues with:
public void flattenCollection(){
MongoCollection<Document> beaconLookup = getCollection("BEACON_LOOKUP");
MongoCollection<Document> triggers = getCollection("DIM_TRIGGER");
System.out.println(beaconLookup.count());
// count = 3342
long count = beaconLookup.count();
MongoCursor<Document> beaconLookupCursor = beaconLookup.find().batchSize((int) count).noCursorTimeout(true).iterator();
MongoCursor<Document> triggersCursor = triggers.find().iterator();
try {
while (beaconLookupCursor.hasNext()) {
int major = (Integer) beaconLookupCursor.next().get("MAJOR");
int minor = (Integer) beaconLookupCursor.next().get("MINOR");
if(major==1215) {
System.out.println("MAJOR " + major + " MINOR " + minor);
}
triggers.updateMany(and(eq("MAJOR", major),
eq("MINOR", minor)),
combine(set("BEACON_UUID",beaconLookupCursor.next().get("UUID"))));
count = count - 1;
System.out.println(count);
}
} finally {
beaconLookupCursor.close();
}
}
Any advice would be great!
You are calling next() more than one time for each iteration.
Change
int major = (Integer) beaconLookupCursor.next().get("MAJOR");
int minor = (Integer) beaconLookupCursor.next().get("MINOR");
to
Document doc = beaconLookupCursor.next();
int major = (Integer) doc.get("MAJOR");
int minor = (Integer) doc.get("MINOR");
Looks like there is one more call for UUID. Update that with doc reference too.
I want to generate test data for MongoDB. The size should be 200 Mb. I tried this code:
#Test
public void testMongoDBTestDataGenerate()
{
MongoClient mongoClient = new MongoClient("localhost", 27017);
DB db = mongoClient.getDB("development");
DBCollection collection = db.getCollection("ssv");
for (int i = 0; i < 100; i++)
{
BasicDBObject document = new BasicDBObject();
document.put("database", "test");
document.put("table", "hosting");
BasicDBObject documentDetail = new BasicDBObject();
documentDetail.put("records", 99);
documentDetail.put("index", "vps_index1");
documentDetail.put("active", "true");
document.put("detail", documentDetail);
collection.insert(document);
}
mongoClient.close();
}
How I can generate data exactly with this size?
I am not getting what are you trying to achieve by setting size 200 Mb.
You can add logical checks.
db.testCollection.stats() - You can check size of the collection before every insert.
Object.bsonsize(..) - Also you can check document size before inserting to make it exactly 200 MB.
And also you can make capped collection where you can notify number of documents or size of collection.
Hope this helps.
What I would probably do is create a capped collection with size 200MB (209715200 bytes):
db.createCollection( "ssv", { capped: true, size: 209715200 } )
Then insert records as you are doing. Then at intervals inside the for loop, check if the collection if full (or almost full).
So in your code, maybe (totally pseudo code):
if(i % 10 == 0) {
if(db.ssv.stats().size >= 209715100){ //Or an arbitrary value closer to 200MB
break;
}
}
Why would you copy the same data 100 times to make 200mb worth test data? Instead
1.append a counter to the value so that you can generate data sequentially
OR
Use random function to generate random data
#Test
public void testMongoDBTestDataGenerate()
{
MongoClient mongoClient = new MongoClient("localhost", 27017);
DB db = mongoClient.getDB("development");
DBCollection collection = db.getCollection("ssv");
int counter=0;
for (int i = 0; i < 873813; i++)
{
BasicDBObject document = new BasicDBObject();
document.put("database", "test");
document.put("table", "hosting");
BasicDBObject documentDetail = new BasicDBObject();
documentDetail.put("counter0", counter++);
documentDetail.put("counter1", counter++);
documentDetail.put("counter2", counter++);
documentDetail.put("counter3", counter++);
documentDetail.put("counter4", counter++);
documentDetail.put("counter5", counter++);
documentDetail.put("counter6", counter++);
documentDetail.put("counter7", counter++);
documentDetail.put("counter8", counter++);
documentDetail.put("counter9", counter++);
document.put("detail", documentDetail);
collection.insert(document);
}
mongoClient.close();
}
}
10 eight two-bytes char strings and 10 eight-bytes numbers => 240B
240B * 873813 = 200MB
A quick dirty solution based on Robert Udah and Somnath Muluk suggestions:
create a field with type nvarchar then generate a .txt file in your for example desktop with 200MB data. then put this string into that field. that's it.
The function below merge word MongoDB collection and map content like this:
Collection:
cat 3,
dog 5
Map:
dog 2,
zebra 1
Collection after merge:
cat 3,
dog 7,
zebra 1
We have empty collection and map with about 14000 elements.
Oracle PL/SQL procedure using one merge SQL running on 15k RPM HD do it in less then a second.
MongoBD on SSD disk needs about 53 seconds.
It looks like Oracle prepares in memory image of file operation
and saves result in one i/o operation.
MongoDB probably does 14000 i/o - it is about 4 ms for each insert. It is corresponds with performance of SSD.
If I do just 14000 inserts without search for documents existence as in case of merge everything works also fast - less then a second.
My questions:
Can the code be improved?
Maybe it necessary to do something with MongoDB configuration?
Function code:
public void addBookInfo(String bookTitle, HashMap<String, Integer> bookInfo)
{
// insert information to the book collection
Document d = new Document();
d.append("book_title", bookTitle);
book.insertOne(d);
// insert information to the word collection
// prepare collection of word info and book_word info documents
List<Document> wordInfoToInsert = new ArrayList<Document>();
List<Document> book_wordInfoToInsert = new ArrayList<Document>();
for (String key : bookInfo.keySet())
{
Document d1 = new Document();
Document d2 = new Document();
d1.append("word", key);
d1.append("count", bookInfo.get(key));
wordInfoToInsert.add(d1);
d2.append("book_title", bookTitle);
d2.append("word", key);
d2.append("count", bookInfo.get(key));
book_wordInfoToInsert.add(d2);
}
// this is collection of insert/update DB operations
List<WriteModel<Document>> updates = new ArrayList<WriteModel<Document>>();
// iterator for collection of words
ListIterator<Document> listIterator = wordInfoToInsert.listIterator();
// generate list of insert/update operations
while (listIterator.hasNext())
{
d = listIterator.next();
String wordToUpdate = d.getString("word");
int countToAdd = d.getInteger("count").intValue();
updates.add(
new UpdateOneModel<Document>(
new Document("word", wordToUpdate),
new Document("$inc",new Document("count", countToAdd)),
new UpdateOptions().upsert(true)
)
);
}
// perform bulk operation
// this is slowly
BulkWriteResult bulkWriteResult = word.bulkWrite(updates);
boolean acknowledge = bulkWriteResult.wasAcknowledged();
if (acknowledge)
System.out.println("Write acknowledged.");
else
System.out.println("Write was not acknowledged.");
boolean countInfo = bulkWriteResult.isModifiedCountAvailable();
if (countInfo)
System.out.println("Change counters avaiable.");
else
System.out.println("Change counters not avaiable.");
int inserted = bulkWriteResult.getInsertedCount();
int modified = bulkWriteResult.getModifiedCount();
System.out.println("inserted: " + inserted);
System.out.println("modified: " + modified);
// insert information to the book_word collection
// this is very fast
book_word.insertMany(book_wordInfoToInsert);
}
I have an Oracle 12c database query, which pulls a table of 13 columns and more than 114470 rows in a daily basis.
I was not concerned with this issue until I moved the same code from my DEV server to my PROD server.
On my DEV environment the query takes 3 min:26 sec to complete its execution.
However on PROD the exact same code takes 15 min:34 sec for finishing.
These times were retrieved adding logs on the following code execution:
private List<Map<String, String>> getFieldInformation(ResultSet sqlResult) throws SQLException {
//Map between each column name and the designated data as String
List<Map<String, String>> rows = new ArrayList<Map<String,String>>();
// Count number of returned records
ResultSetMetaData rsmd = sqlResult.getMetaData();
int numberOfColumns = rsmd.getColumnCount();
boolean continueLoop = sqlResult.next();
// If there are no results we return an empty list not null
if(!continueLoop) {
return rows;
}
while (continueLoop) {
Map<String, String> columns = new LinkedHashMap<String, String>();
// Reset Variables for data
String columnLabel = null;
String dataInformation = null;
// Append to the map column name and related data
for(int i = 1; i <= numberOfColumns; i++) {
columnLabel = rsmd.getColumnLabel(i);
dataInformation = sqlResult.getString(i);
if(columnLabel!=null && columnLabel.length()>0 && (dataInformation==null || dataInformation.length() <= 0 )) {
dataInformation = "";
}
columns.put(columnLabel, dataInformation);
}
rows.add(columns);
continueLoop = sqlResult.next();
}
return rows;
}
I understand that "getString" is not the best way for retrieving non TEXT data, but due to the nature of the project I not always know the data type.
Furthermore, I checked in PROD under task manager, that "Memory (Private Working Set)" is being reserved very slowly.
So I would appreciate if you could help in the following questions:
Why there is a discrepancy in the execution timings for both environments? Can you please highlight some ways for checking this issue?
Is there a way were I can see my result set required memory and reserve the same upfront? Will this have some improvements in Performance?
How can I improve the performance for getString(i) method?
Thank you in advance for your assistance.
Best regards,
Ziza