I have a cursor that iterates over a MongoDB collection, reads a few values from each document, and adds them to another table. However, I noticed (by adding a counter) that while the process is running the cursor does not cover all the documents in the collection.
The Beacon Lookup collection has 3342 documents, but the logging shows that only 1114 of them are iterated before the cursor finishes without an error. When I inspect the cursor in the debugger, it does contain all 3342 documents.
Below is the method I am trying to run and currently having issues with:
public void flattenCollection(){
MongoCollection<Document> beaconLookup = getCollection("BEACON_LOOKUP");
MongoCollection<Document> triggers = getCollection("DIM_TRIGGER");
System.out.println(beaconLookup.count());
// count = 3342
long count = beaconLookup.count();
MongoCursor<Document> beaconLookupCursor = beaconLookup.find().batchSize((int) count).noCursorTimeout(true).iterator();
MongoCursor<Document> triggersCursor = triggers.find().iterator();
try {
while (beaconLookupCursor.hasNext()) {
int major = (Integer) beaconLookupCursor.next().get("MAJOR");
int minor = (Integer) beaconLookupCursor.next().get("MINOR");
if(major==1215) {
System.out.println("MAJOR " + major + " MINOR " + minor);
}
triggers.updateMany(and(eq("MAJOR", major),
eq("MINOR", minor)),
combine(set("BEACON_UUID",beaconLookupCursor.next().get("UUID"))));
count = count - 1;
System.out.println(count);
}
} finally {
beaconLookupCursor.close();
}
}
Any advice would be great!
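You are calling next() three times in each iteration, and every call advances the cursor to a new document. That is why only a third of the documents (3342 / 3 = 1114) are counted, and why MAJOR, MINOR and UUID each come from a different document.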
Change
int major = (Integer) beaconLookupCursor.next().get("MAJOR");
int minor = (Integer) beaconLookupCursor.next().get("MINOR");
to
Document doc = beaconLookupCursor.next();
int major = (Integer) doc.get("MAJOR");
int minor = (Integer) doc.get("MINOR");
There is one more next() call, for UUID. Replace it with doc.get("UUID") as well.
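To make the effect visible without a database, here is a minimal sketch that uses a plain java.util.Iterator in place of the MongoCursor (an assumption for illustration; both advance by one element per next() call). The class and method names are hypothetical, and the hasNext() guards are added only so the demo cannot throw when the element count is not a multiple of three.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class CursorNextDemo {

    // Mimics the loop from the question: three next() calls per pass,
    // so every pass consumes three elements instead of one.
    static int iterationsWithThreeNextCalls(List<Integer> docs) {
        Iterator<Integer> it = docs.iterator();
        int iterations = 0;
        while (it.hasNext()) {
            it.next();                    // in the question: read MAJOR
            if (it.hasNext()) it.next();  // in the question: read MINOR
            if (it.hasNext()) it.next();  // in the question: read UUID
            iterations++;
        }
        return iterations;
    }

    // The fix: call next() once, keep the element, read every field from it.
    static int iterationsWithOneNextCall(List<Integer> docs) {
        Iterator<Integer> it = docs.iterator();
        int iterations = 0;
        while (it.hasNext()) {
            Integer doc = it.next(); // single advance per pass
            iterations++;
        }
        return iterations;
    }

    // Builds n placeholder "documents".
    static List<Integer> sampleDocs(int n) {
        List<Integer> docs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            docs.add(i);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Integer> docs = sampleDocs(3342);
        System.out.println(iterationsWithThreeNextCalls(docs)); // prints 1114
        System.out.println(iterationsWithOneNextCall(docs));    // prints 3342
    }
}
```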
We created a program to make the database easier to use from other programs, so the code I'm showing is used in several other programs.
One of those programs receives about 10,000 records from one of our clients and has to check whether they are already in our database. If not, we insert them (they can also have changed, in which case they have to be updated).
To make this easy, we load every entry from the whole table (120,000 at the moment), create an object for each entry, and put all of them into a HashMap.
Loading the whole table this way takes around 5 minutes. We also sometimes have to restart the program because we run into a GC overhead error, since we work on limited hardware. Do you have an idea of how we can improve the performance?
Here is the code that loads all entries (we have a global limit of 10,000 entries per query, so we use a loop):
public Map<String, IMasterDataSet> getAllInformationObjects(ISession session) throws MasterDataException {
IQueryExpression qe;
IQueryParameter qp;
// our main SDP class
Constructor<?> constructorForSDPbaseClass = getStandardConstructor();
SimpleDateFormat itaTimestampFormat = new SimpleDateFormat("yyyyMMddHHmmssSSS");
// search in standard time range (modification date!)
Calendar cal = Calendar.getInstance();
cal.set(2010, Calendar.JANUARY, 1);
Date startDate = cal.getTime();
Date endDate = new Date();
Long startDateL = Long.parseLong(itaTimestampFormat.format(startDate));
Long endDateL = Long.parseLong(itaTimestampFormat.format(endDate));
IDescriptor modDesc = IBVRIDescriptor.ModificationDate.getDescriptor(session);
// count once before to determine initial capacities for hash map/set
IBVRIArchiveClass SDP_ARCHIVECLASS = getMasterDataPropertyBag().getSDP_ARCHIVECLASS();
qe = SDP_ARCHIVECLASS.getQueryExpression(session);
qp = session.getDocumentServer().getClassFactory()
.getQueryParameterInstance(session, new String[] {SDP_ARCHIVECLASS.getDatabaseName(session)}, null, null);
qp.setExpression(qe);
qp.setHitLimitThreshold(0);
qp.setHitLimit(0);
int nrOfHitsTotal = session.getDocumentServer().queryCount(session, qp, "*");
int initialCapacity = (int) (nrOfHitsTotal / 0.75 + 1);
// MD sets; and objects already done (here: document ID)
HashSet<String> objDone = new HashSet<>(initialCapacity);
HashMap<String, IMasterDataSet> objRes = new HashMap<>(initialCapacity);
qp.close();
// do queries until hit count is smaller than 10.000
// use modification date
boolean keepGoing = true;
while(keepGoing) {
// construct query expression
// - basic part: Modification date & class type
// a. doc. class type
qe = SDP_ARCHIVECLASS.getQueryExpression(session);
// b. ID
qe = SearchUtil.appendQueryExpressionWithANDoperator(session, qe,
new PlainExpression(modDesc.getQueryLiteral() + " BETWEEN " + startDateL + " AND " + endDateL));
// 2. Query Parameter: set database; set expression
qp = session.getDocumentServer().getClassFactory()
.getQueryParameterInstance(session, new String[] {SDP_ARCHIVECLASS.getDatabaseName(session)}, null, null);
qp.setExpression(qe);
// order by modification date; hitlimit = 0 -> no hitlimit, but the usual 10.000 max
qp.setOrderByExpression(session.getDocumentServer().getClassFactory().getOrderByExpressionInstance(modDesc, true));
qp.setHitLimitThreshold(0);
qp.setHitLimit(0);
// Do not sort by modification date;
qp.setHints("+NoDefaultOrderBy");
keepGoing = false;
IInformationObject[] hits = null;
IDocumentHitList hitList = null;
hitList = session.getDocumentServer().query(qp, session);
IDocument doc;
if (hitList.getTotalHitCount() > 0) {
hits = hitList.getInformationObjects();
for (IInformationObject hit : hits) {
String objID = hit.getID();
if(!objDone.contains(objID)) {
// do something with this object and the class
// here: construct a new SDP sub class object and give it back via interface
doc = (IDocument) hit;
IMasterDataSet mdSet;
try {
mdSet = (IMasterDataSet) constructorForSDPbaseClass.newInstance(session, doc);
} catch (Exception e) {
// cause for this
String cause = (e.getCause() != null) ? e.getCause().toString() : MasterDataException.ERRMSG_PART_UNKNOWN;
throw new MasterDataException(MasterDataException.ERRMSG_NOINSTANCE_POSSIBLE, this.getClass().getSimpleName(), e.toString(), cause);
}
objRes.put(mdSet.getID(), mdSet);
objDone.add(objID);
}
}
doc = (IDocument) hits[hits.length - 1];
Date lastModDate = ((IDateValue) doc.getDescriptor(modDesc).getValues()[0]).getValue();
startDateL = Long.parseLong(itaTimestampFormat.format(lastModDate));
keepGoing = (hits.length >= 10000 || hitList.isResultSetTruncated());
}
qp.close();
}
return objRes;
}
Loading 120,000 rows (and more) each time will not scale well, and your solution may stop working as the table grows. Instead, let the database server handle the problem.
Your table needs a primary key or unique key based on the columns of the records. Iterate through the 10,000 records and, for each one, run a JDBC SQL update that sets all field values, with a where clause that exactly matches the primary/unique key.
update BLAH set COL1 = ?, COL2 = ? where PKCOL = ?; // ... AND PKCOL2 =? ...
This either modifies an existing row or does nothing at all, and JDBC executeUpdate() returns 0 or 1 to indicate the number of rows changed. If it returns 0, you have detected a new record that does not exist yet, so perform an insert for that record only.
insert into BLAH (COL1, COL2, ... PKCOL) values (?,?, ..., ?);
You can decide whether to run 10,000 updates followed by however many inserts are needed, or to do an update plus an optional insert per record. JDBC batch statements and turning auto-commit off may help speed things up.
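As a rough sketch of the update-then-insert approach described above, assuming a table BLAH with one key column and two value columns as in the placeholder statements: the Row class, the upsertAll method and the column layout are hypothetical, and only standard java.sql calls are used. It does the update plus optional insert per record inside a single transaction.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class UpsertSketch {

    // Hypothetical record shape: one key column and two value columns.
    static final class Row {
        final String pk;
        final String col1;
        final String col2;

        Row(String pk, String col1, String col2) {
            this.pk = pk;
            this.col1 = col1;
            this.col2 = col2;
        }
    }

    static final String UPDATE_SQL = "update BLAH set COL1 = ?, COL2 = ? where PKCOL = ?";
    static final String INSERT_SQL = "insert into BLAH (COL1, COL2, PKCOL) values (?, ?, ?)";

    // Updates each row by key; inserts it only when the update changed nothing.
    // Returns the number of rows that had to be inserted.
    static int upsertAll(Connection conn, List<Row> rows) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one commit for the whole run instead of one per statement
        int inserted = 0;
        try (PreparedStatement update = conn.prepareStatement(UPDATE_SQL);
             PreparedStatement insert = conn.prepareStatement(INSERT_SQL)) {
            for (Row row : rows) {
                update.setString(1, row.col1);
                update.setString(2, row.col2);
                update.setString(3, row.pk);
                if (update.executeUpdate() == 0) {
                    // No row matched the key, so this record is new.
                    insert.setString(1, row.col1);
                    insert.setString(2, row.col2);
                    insert.setString(3, row.pk);
                    insert.executeUpdate();
                    inserted++;
                }
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // leave the table unchanged if anything failed
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return inserted;
    }
}
```

Nothing is loaded into memory up front here, so heap use stays independent of the table size. Whether it is faster than the HashMap approach depends on the key index and network latency, so it is worth measuring on the real data.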
I have been wondering if there is a way to access the complete list of a user's Twitter followers.
We have tried calling the REST API via twitter4j:
public List<User> getFriendList() {
List<User> friendList = null;
try {
friendList = mTwitter.getFollowersList(mTwitter.getId(), -1);
} catch (IllegalStateException e) {
e.printStackTrace();
} catch (TwitterException e) {
e.printStackTrace();
}
return friendList;
}
But it returns a list of only 20 followers.
I tried making the same call in a loop, but that causes a rate limit exception, which says we are not allowed to make too many requests in a short interval of time.
Is there a way around this?
You should definitely use getFollowersIDs. As the documentation says, this returns an array (list) of IDs objects. Note that it causes the list to be broken into pages of around 5000 IDs at a time. To begin paging provide a value of -1 as the cursor. The response from the API will include a previous_cursor and next_cursor to allow paging back and forth.
The tricky part is to handle the cursor. If you can do this, then you will not have the problem of getting only 20 followers.
The first call to getFollowersIDs will need to be given a cursor of -1. For subsequent calls, you need to update the cursor value, by getting the next cursor, as done in the while part of the loop.
long cursor =-1L;
IDs ids;
do {
ids = twitter.getFollowersIDs(cursor);
for(long userID : ids.getIDs()){
friendList.add(userID);
}
} while((cursor = ids.getNextCursor())!=0 );
Here is a very good reference:
https://github.com/yusuke/twitter4j/blob/master/twitter4j-examples/src/main/java/twitter4j/examples/friendsandfollowers/GetFriendsIDs.java
Now, if the user has more than around 75000 followers, you will have to do some waiting (see Vishal's answer).
The first 15 calls will yield you around 75000 IDs. Then you will have to sleep for 15 minutes. Then make another 15 calls, and so on till you get all the followers. This can be done using a simple Thread.sleep(time_in_milliseconds) outside the for loop.
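The paging and waiting described above can be sketched as follows. To keep the example self-contained, the twitter4j call is hidden behind a hypothetical PageFetcher interface (standing in for twitter.getFollowersIDs(cursor)), and the wait goes through a hypothetical Pauser hook so that real code can pass a Thread.sleep wrapper. The 15 calls per 15 minutes figures are taken from this thread and may not match the current API limits.

```java
import java.util.ArrayList;
import java.util.List;

class FollowerPager {

    // Hypothetical stand-in for one page of the response: the IDs plus the next cursor.
    static final class Page {
        final long[] ids;
        final long nextCursor;

        Page(long[] ids, long nextCursor) {
            this.ids = ids;
            this.nextCursor = nextCursor;
        }
    }

    // Hypothetical stand-in for twitter.getFollowersIDs(cursor).
    interface PageFetcher {
        Page fetch(long cursor);
    }

    // Hypothetical hook for the wait; real code could wrap Thread.sleep here.
    interface Pauser {
        void pause(long millis);
    }

    static final int CALLS_PER_WINDOW = 15;               // figure quoted in this thread
    static final long WINDOW_MILLIS = 15L * 60L * 1000L;  // 15 minutes

    // Follows the cursor until it becomes 0, pausing after every 15 calls.
    static List<Long> fetchAll(PageFetcher fetcher, Pauser pauser) {
        List<Long> all = new ArrayList<>();
        long cursor = -1L; // -1 asks for the first page
        int callsInWindow = 0;
        do {
            if (callsInWindow == CALLS_PER_WINDOW) {
                pauser.pause(WINDOW_MILLIS); // wait out the rate limit window
                callsInWindow = 0;
            }
            Page page = fetcher.fetch(cursor);
            callsInWindow++;
            for (long id : page.ids) {
                all.add(id);
            }
            cursor = page.nextCursor; // 0 means there are no more pages
        } while (cursor != 0);
        return all;
    }
}
```

The pause sits inside the do/while loop but outside the inner for loop, so it triggers once per 15 requests rather than once per ID.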
Just change it like this and try; this is working for me:
try {
Log.i("act twitter...........", "ModifiedCustomTabBarActivity.class");
// final JSONArray twitterFriendsIDsJsonArray = new JSONArray();
long cursor = -1L; // -1 requests the first page
IDs ids;
do {
// fetch the page for the current cursor; without this the loop would re-read the first page forever
ids = mTwitter.mTwitter.getFriendsIDs(cursor);
for (long id : ids.getIDs()) {
String ID = "followers ID #" + id;
String[] firstname = ID.split("#");
String first_Name = firstname[0];
String Id = firstname[1];
Log.i("split...........", first_Name + Id);
// look the user up once and reuse the result; every showUser call counts against the rate limit
User user = mTwitter.mTwitter.showUser(id);
String Name = user.getName();
String screenname = user.getScreenName();
// Log.i("id.......", "followers ID #" + id);
// Log.i("Name..", user.getName());
// Log.i("Screen_Name...", user.getScreenName());
// Log.i("image...", user.getProfileImageURL());
}
cursor = ids.getNextCursor();
} while (ids.hasNext());
} catch (Exception e) {
e.printStackTrace();
}
Try This...
ConfigurationBuilder confbuilder = new ConfigurationBuilder();
confbuilder.setOAuthAccessToken(accessToken)
.setOAuthAccessTokenSecret(secretToken)
.setOAuthConsumerKey(TwitterOAuthActivity.CONSUMER_KEY)
.setOAuthConsumerSecret(TwitterOAuthActivity.CONSUMER_SECRET);
Twitter twitter = new TwitterFactory(confbuilder.build()).getInstance();
PagableResponseList<User> followersList;
ArrayList<String> list = new ArrayList<String>();
try
{
followersList = twitter.getFollowersList(screenName, cursor);
for (int i = 0; i < followersList.size(); i++)
{
User user = followersList.get(i);
String name = user.getName();
list.add(name);
System.out.println("Name" + i + ":" + name);
}
listView.setAdapter(new ArrayAdapter<String>(this, android.R.layout.simple_list_item_1 , list));
listView.setVisibility(View.VISIBLE);
friend_list.setVisibility(View.INVISIBLE);
post_feeds.setVisibility(View.INVISIBLE);
twit.setVisibility(View.INVISIBLE);
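}
catch (TwitterException e)
{
// getFollowersList throws the checked TwitterException
e.printStackTrace();
}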
This is a tricky one.
You should specify whether you're using application or per user tokens and the number of users you're fetching followers_ids for.
You get just 15 calls per 15 minutes in case of an application token. You can fetch a maximum of 5000 followers_ids per call. That gives you a maximum of 75K followers_ids per 15 minutes.
If any of the users you're fetching followers_ids for has over 75K followers, you'll get the rate_limit error immediately. If you're fetching for more than 1 user, you'll need to build strong rate_limit handling in your code with sleeps and be very patient.
The same applies for friends_ids.
I've not had to deal with fetching more than 75K followers/friends for a given user but come to think of it, I don't know if it's even possible anymore.
I am trying to insert an item into MongoDB using the Java MongoDB driver. Before inserting, I try to get the next id to use, but for some reason I always get 4 as nextId. I am using the method below to get nextId before inserting any item:
private Long getNextIdValue(DBCollection dbCollection) {
Long nextSequenceNumber = 1L;
DBObject query = new BasicDBObject();
query.put("id", -1);
DBCursor cursor = dbCollection.find().sort(query).limit(1);
while (cursor.hasNext()) {
DBObject itemDBObj = cursor.next();
nextSequenceNumber = new Long(itemDBObj.get("id").toString()) + 1;
}
return nextSequenceNumber;
}
I have 13 records in total in my MongoDB collection. What am I doing wrong here?
Please don't do that. There is no need to manage ids yourself, as the driver already does this well. Just use the right type and annotations for the field:
@Id
@ObjectId
private String id;
Then write a generic method to insert all entities:
public T create(T entity) throws MongoException, IOException {
WriteResult<? extends Object, String> result = jacksonDB.insert(entity);
return (T) result.getSavedObject();
}
This creates an ObjectId for each document: a unique, indexed value that embeds its creation time, which is much more robust than computing the "next id" yourself.
https://www.tutorialspoint.com/mongodb/mongodb_objectid.htm
How can you perform arithmetic operations like +1 on a String?
nextSequenceNumber = new Long(itemDBObj.get("id").toString()) + 1;
Try creating a sequence collection like this:
{"id":"MySequence","sequence":1}
Then use an update to increment the sequence:
// Query for the sequence document
Query query = new Query(Criteria.where("id").is("MySequence"));
// Increment the sequence by 1
Update update = new Update();
update.inc("sequence", 1);
// Return the document as it looks after the increment
FindAndModifyOptions findAndModifyOptions = new FindAndModifyOptions();
findAndModifyOptions.returnNew(true);
SequenceCollection sequenceCollection = mongoOperations.findAndModify(query, update, findAndModifyOptions, SequenceCollection.class);
return sequenceCollection.getSequence();
I found a workaround using db.collection.count(): I simply take the total count and increment it by 1 to get the id for my object.
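One caveat with a count-based id, sketched here with an in-memory list standing in for the collection (an assumption for illustration; the class and method names are hypothetical): as soon as a document has been deleted, count + 1 can produce an id that is already taken, whereas max + 1 does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CountBasedIdDemo {

    // The workaround: next id = number of documents + 1.
    static long nextIdFromCount(List<Long> ids) {
        return ids.size() + 1L;
    }

    // What the sort-descending/limit(1) query in the question aims for: highest id + 1.
    static long nextIdFromMax(List<Long> ids) {
        return ids.isEmpty() ? 1L : Collections.max(ids) + 1L;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>(Arrays.asList(1L, 2L, 3L));
        ids.remove(Long.valueOf(2L)); // simulate deleting the document with id 2

        System.out.println(nextIdFromCount(ids)); // prints 3, which is already in use
        System.out.println(nextIdFromMax(ids));   // prints 4
    }
}
```

Neither variant is safe when two clients insert at the same time; the atomic increment on a sequence document shown in the other answer addresses that case.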
I have an Oracle 12c database query that pulls a table of 13 columns and more than 114470 rows on a daily basis.
I was not concerned about this until I moved the same code from my DEV server to my PROD server.
On my DEV environment the query takes 3 min 26 sec to complete.
However, on PROD the exact same code takes 15 min 34 sec to finish.
These times were obtained by adding logging around the following code:
private List<Map<String, String>> getFieldInformation(ResultSet sqlResult) throws SQLException {
//Map between each column name and the designated data as String
List<Map<String, String>> rows = new ArrayList<Map<String,String>>();
// Count number of returned records
ResultSetMetaData rsmd = sqlResult.getMetaData();
int numberOfColumns = rsmd.getColumnCount();
boolean continueLoop = sqlResult.next();
// If there are no results we return an empty list not null
if(!continueLoop) {
return rows;
}
while (continueLoop) {
Map<String, String> columns = new LinkedHashMap<String, String>();
// Reset Variables for data
String columnLabel = null;
String dataInformation = null;
// Append to the map column name and related data
for(int i = 1; i <= numberOfColumns; i++) {
columnLabel = rsmd.getColumnLabel(i);
dataInformation = sqlResult.getString(i);
if(columnLabel!=null && columnLabel.length()>0 && (dataInformation==null || dataInformation.length() <= 0 )) {
dataInformation = "";
}
columns.put(columnLabel, dataInformation);
}
rows.add(columns);
continueLoop = sqlResult.next();
}
return rows;
}
I understand that getString is not the best way to retrieve non-text data, but due to the nature of the project I do not always know the data type.
Furthermore, I checked in Task Manager on PROD that "Memory (Private Working Set)" is being reserved very slowly.
So I would appreciate your help with the following questions:
Why is there a discrepancy in the execution times between the two environments? Can you please suggest some ways to investigate this?
Is there a way to see how much memory my result set requires and reserve it up front? Would this improve performance?
How can I improve the performance of the getString(i) method?
Thank you in advance for your assistance.
Best regards,
Ziza
I am using MongoDB 2.6.1. My question is:
"Is it possible to keep track of _ids in bulk operations?"
Suppose I have created a BulkWriteOperation object, for example to insert 50 documents from collection 'A' into collection 'B'. I need to keep a list of the successful write operations as well as the failed ones.
Bulk inserts and deletes are working fine. But the question is:
-- "I need to keep track of _ids for this flow: find the documents in A and insert them into B. Meanwhile, I need to keep lists of the _ids of successful and failed operations. I then need to delete from A only the documents whose operations succeeded, and keep the failed documents as they are."--
Please help me out.
Thanking you :) :)
First, you'll need to use an unordered bulk operation (initializeUnorderedBulkOperation) so that the entire batch is executed even when some writes fail. Wrap BulkWriteOperation.execute() in a try/catch and catch BulkWriteException, which gives you access to a list of BulkWriteError as well as the BulkWriteResult.
Here's a quick and dirty example:
MongoClient m = new MongoClient("localhost");
DB db = m.getDB( "test" );
DBCollection coll = db.getCollection( "bulk" );
coll.drop();
coll.createIndex(new BasicDBObject("i", 1), new BasicDBObject("unique", true));
BulkWriteOperation bulkWrite = coll.initializeUnorderedBulkOperation();
for (int i = 0; i < 100; i++) {
bulkWrite.insert(new BasicDBObject("i", i));
}
// Now add 10 documents to the batch that will generate a unique index error
for (int i = 0; i < 10; i++) {
bulkWrite.insert(new BasicDBObject("i", i));
}
BulkWriteResult result;
try {
result = bulkWrite.execute();
} catch (BulkWriteException bwe) {
bwe.printStackTrace();
result = bwe.getWriteResult();
// report the errors here, so nothing is dereferenced when the batch succeeds without errors
for (BulkWriteError e : bwe.getWriteErrors()) {
System.out.println(e.getIndex() + " failed");
}
}
System.out.println(result);
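To get from the failed indexes back to _ids, one option is to remember the _id of every document in the order it was added to the bulk operation; BulkWriteError.getIndex() then points into that list. The sketch below shows only that bookkeeping step with plain collections. The BulkOutcome class is hypothetical, _ids are represented as String for simplicity (real ones may be ObjectId), and it assumes an unordered bulk operation, so every write that did not fail was applied.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BulkOutcome {

    final List<String> succeeded = new ArrayList<>();
    final List<String> failed = new ArrayList<>();

    // idsInOrder.get(i) is the _id of the i-th operation added to the bulk;
    // failedIndexes holds the values collected from BulkWriteError.getIndex().
    static BulkOutcome partition(List<String> idsInOrder, Collection<Integer> failedIndexes) {
        Set<Integer> failedSet = new HashSet<>(failedIndexes);
        BulkOutcome outcome = new BulkOutcome();
        for (int i = 0; i < idsInOrder.size(); i++) {
            if (failedSet.contains(i)) {
                outcome.failed.add(idsInOrder.get(i));
            } else {
                outcome.succeeded.add(idsInOrder.get(i));
            }
        }
        return outcome;
    }
}
```

The succeeded list can then drive the delete on collection A, while the failed list is kept for a retry or for logging.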