MongoDB - find() calls getting stuck sometimes - Timeout - java

We are using MongoDB for saving and fetching data.
All calls that put data into collections work fine and go through a common method.
All calls that fetch data from collections also go through a common method, but they only work fine some of the time.
Sometimes, only for one of the collections, my calls get stuck forever and keep using CPU. I have to kill the threads manually; otherwise they consume my whole CPU.
Mongo Connection
MongoClient mongo = new MongoClient(hostName , Integer.valueOf(port));
DB mongoDb = mongo.getDB(dbName);
Code To fetch
DBCollection collection = mongoDb.getCollection(collectionName);
DBObject dbObject = new BasicDBObject("_id" , key);
DBCursor cursor = collection.find(dbObject);
I have figured out which collection is causing the issue, but how can I improve this, given that it happens only for this particular collection and only sometimes?
EDIT
Code to save
DBCollection collection = mongoDb.getCollection(collectionName);
DBObject query = new BasicDBObject("_id" , key);
DBObject update = new BasicDBObject();
update.put("$set" , JSON.parse(value));
collection.update(query , update , true , false);
Bulk Write / collection
DB mongoDb = controllerFactory.getMongoDB();
DBCollection collection = mongoDb.getCollection(collectionName);
BulkWriteOperation bulkWriteOperation = collection.initializeUnorderedBulkOperation();
Map<String, Object> dataMap = (Map<String, Object>) JSON.parse(value);
for (Entry<String, Object> entrySet : dataMap.entrySet()) {
BulkWriteRequestBuilder bulkWriteRequestBuilder = bulkWriteOperation.find(new BasicDBObject("_id" ,
entrySet.getKey()));
DBObject update = new BasicDBObject();
update.put("$set" , entrySet.getValue());
bulkWriteRequestBuilder.upsert().update(update);
}
// nothing is sent to the server until execute() is called
bulkWriteOperation.execute();
How can I set a timeout for fetch calls?

A different approach is to use the API recommended for the MongoDB 3.2 Java driver. Keep in mind that you have to update your .jar libraries to the latest version (if you haven't already).
import com.mongodb.MongoClient;
import com.mongodb.MongoClientException;
import com.mongodb.MongoException;
import com.mongodb.client.FindIterable;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

// Class name is only for illustration; the methods are static so main() can call them directly
public class MongoExample {
public static MongoClient connectToClient(String hostName, String port) {
try {
// The 3.x driver connects lazily, so some connection errors only surface on the first operation
MongoClient client = new MongoClient(hostName, Integer.valueOf(port));
return client;
} catch(MongoClientException e) {
System.err.println("Cannot connect to client.");
return null;
}
}
public static MongoDatabase connectToDB(MongoClient client, String databaseName) {
try {
MongoDatabase db = client.getDatabase(databaseName);
return db;
} catch(Exception e) {
System.err.println("Error in connecting to database " + databaseName);
return null;
}
}
public static void closeConnection(MongoClient client) {
client.close();
}
public static void findDoc(MongoDatabase db, String collectionName, Object key) {
MongoCollection<Document> collection = db.getCollection(collectionName);
try {
FindIterable<Document> iterable = collection
.find(new Document("_id", key));
Document doc = iterable.first();
// first() returns null when nothing matches, so check for it instead of catching NullPointerException
if (doc == null) {
System.err.println("Document with _id " + key + " does not exist.");
return;
}
// For an Int64 field named 'special_id'; Long rather than long because the field may be missing
Long specialId = doc.getLong("special_id");
System.out.println("special_id = " + specialId);
} catch(MongoException e) {
System.err.println("Error in retrieving document.");
}
}
public static void insertToDB(MongoDatabase db, String collectionName) {
try {
db.getCollection(collectionName).insertOne(new Document()
.append("special_id", 5L) // 5L is stored as Int64, so getLong() can read it back
//Append anything
);
} catch(MongoException e) {
System.err.println("Error in inserting new document.");
}
}
public static void updateDoc(MongoDatabase db, String collectionName, long id) {
MongoCollection<Document> collection = db.getCollection(collectionName);
try {
collection.updateOne(new Document("_id", id),
new Document("$set",
new Document("special_id",
7L)));
} catch(MongoException e) {
System.err.println("Error in updating document.");
}
}
public static void main(String[] args) {
String hostName = "myHost";
String port = "27017"; // must be numeric; 27017 is MongoDB's default port
String databaseName = "myDB";
String collectionName = "myCollection";
Object key = "myKey"; // placeholder _id value to look up
MongoClient client = connectToClient(hostName, port);
if(client != null) {
MongoDatabase db = connectToDB(client, databaseName);
if(db != null) {
findDoc(db, collectionName, key);
}
closeConnection(client);
}
}
}
EDIT: As the others suggested, check from the command line whether finding the document by its _id is slow there too. If it is, the problem may be your hard drive. The _id field is supposed to be indexed, but for better or for worse, try re-creating the index on the _id field.
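On the question of setting a timeout: the drivers have their own knobs (in the 2.x driver, DBCursor.maxTime(long, TimeUnit) from 2.12 on, and in the 3.x driver FindIterable.maxTime(long, TimeUnit), ask the server to abort a query that runs too long; MongoClientOptions holds client-side connect and socket timeouts; check the documentation for your driver version). If you also want a hard upper bound on how long the calling thread waits, one option is to run the fetch on a worker thread and wait on the Future with a timeout. The sketch below uses only java.util.concurrent; the lambdas in main are hypothetical stand-ins for your collection.find(...) call. Note that cancelling the future only interrupts the worker: a thread spinning in a loop that ignores interrupts keeps running, so this limits how long the caller waits, not the underlying bug.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimedFetch {
    // Daemon worker threads, so a stuck fetch cannot keep the JVM alive on its own
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-fetch");
        t.setDaemon(true);
        return t;
    });

    /** Runs fetch on a worker thread; empty if it does not finish in time (or returns null). */
    static <T> Optional<T> fetchWithTimeout(Callable<T> fetch, long timeout, TimeUnit unit) {
        Future<T> future = POOL.submit(fetch);
        try {
            return Optional.ofNullable(future.get(timeout, unit));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; code that ignores interrupts keeps running
            return Optional.empty();
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("fetch failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        Optional<String> fast = fetchWithTimeout(() -> "doc-42", 1, TimeUnit.SECONDS);
        Optional<String> slow = fetchWithTimeout(() -> {
            Thread.sleep(5_000); // simulates a fetch that hangs
            return "late";
        }, 100, TimeUnit.MILLISECONDS);
        System.out.println(fast.orElse("timeout") + " / " + slow.orElse("timeout")); // prints "doc-42 / timeout"
    }
}
```

The caller decides what an empty result means (retry, log, or report an error), which keeps the timeout policy out of the common fetch method.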

The answers posted by others are great, but they did not solve my problem.
The actual issue was in my existing code: my cursor was stuck in a while loop forever.
I was missing a few checks, which have been fixed now.
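The missing checks are only described in words above. One common way for a cursor loop to spin forever at full CPU is a `while (cursor.hasNext())` loop whose body skips calling `next()` on some path, so `hasNext()` never becomes false. The sketch below assumes that was the shape of the bug (it is not the poster's actual code) and uses a plain java.util.Iterator as a stand-in for DBCursor, which follows the same hasNext()/next() contract.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class CursorLoop {
    // Buggy shape, for comparison only (isValid is a hypothetical check):
    //   while (cursor.hasNext()) {
    //       if (!isValid(cursor)) continue;   // next() is never called on this path,
    //       out.add(cursor.next());           // so hasNext() stays true forever
    //   }

    /** Collects non-empty entries; every iteration advances the cursor exactly once. */
    static List<String> readAll(Iterator<String> cursor) {
        List<String> out = new ArrayList<>();
        while (cursor.hasNext()) {
            String doc = cursor.next(); // advance first, before any check can skip the element
            if (doc == null || doc.isEmpty()) {
                continue; // skipping is safe because the cursor has already moved on
            }
            out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        Iterator<String> cursor = Arrays.asList("a", "", "b", null, "c").iterator();
        System.out.println(readAll(cursor)); // prints [a, b, c]
    }
}
```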

Just some possible explanations/thoughts.
In general, a query by _id should be fast, since _id is always indexed. The code snippet looks correct, so the cause is probably in MongoDB itself. This leads me to a couple of suggestions:
Try to connect to MongoDB directly from the command line and run the find from there. Chances are you'll still observe occasional slowness.
In this case:
Maybe it's the disks (maybe this particular server is deployed on a slow disk, or at least the slowness correlates with slow disk access).
Maybe you have a sharded configuration and one shard is slower than the others.
Maybe it's a network issue that occurs sporadically. If you run MongoDB locally or in a staging environment with the same collection, does this reproduce?
Maybe (although I hardly believe it) the query runs in a sub-optimal way. In this case, you can use explain(), as someone has already suggested here.
If you happen to have a replica set, check what your read preference is. Who knows, maybe you are reading this _id from a sub-optimal server.

Related

How to mass delete multiple rows in hbase?

I have the following rows with these keys in hbase table "mytable"
user_1
user_2
user_3
...
user_9999999
I want to use the Hbase shell to delete rows from:
user_500 to user_900
I know there is no direct way to delete a range of rows, but is there a way I could use the "BulkDeleteProcessor" to do this?
I see here:
https://github.com/apache/hbase/blob/master/hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestBulkDeleteProtocol.java
I'd like to just paste in the imports and then paste this into the shell, but I have no idea how to go about it. Does anyone know how I can use this endpoint from the JRuby HBase shell?
Table ht = TEST_UTIL.getConnection().getTable("my_table");
long noOfDeletedRows = 0L;
Batch.Call<BulkDeleteService, BulkDeleteResponse> callable =
new Batch.Call<BulkDeleteService, BulkDeleteResponse>() {
ServerRpcController controller = new ServerRpcController();
BlockingRpcCallback<BulkDeleteResponse> rpcCallback =
new BlockingRpcCallback<BulkDeleteResponse>();
public BulkDeleteResponse call(BulkDeleteService service) throws IOException {
Builder builder = BulkDeleteRequest.newBuilder();
builder.setScan(ProtobufUtil.toScan(scan));
builder.setDeleteType(deleteType);
builder.setRowBatchSize(rowBatchSize);
if (timeStamp != null) {
builder.setTimestamp(timeStamp);
}
service.delete(controller, builder.build(), rpcCallback);
return rpcCallback.get();
}
};
Map<byte[], BulkDeleteResponse> result = ht.coprocessorService(BulkDeleteService.class, scan
.getStartRow(), scan.getStopRow(), callable);
for (BulkDeleteResponse response : result.values()) {
noOfDeletedRows += response.getRowsDeleted();
}
ht.close();
If there is no way to do this through JRuby, a Java or alternative way to quickly delete multiple rows is fine.
Do you really want to do it in the shell? There are various better ways. One is to use the native Java API:
Construct an ArrayList of Delete objects.
Pass this list to the Table.delete method.
Method 1: If you already know the range of keys:
public void massDelete(byte[] tableName) throws IOException {
HTable table=(HTable)hbasePool.getTable(tableName);
String tablePrefix = "user_";
int startRange = 500;
int endRange = 900; // matches the user_500 to user_900 range in the question
List<Delete> listOfBatchDelete = new ArrayList<Delete>();
for(int i=startRange;i<=endRange;i++){
String key = tablePrefix+i;
Delete d=new Delete(Bytes.toBytes(key));
listOfBatchDelete.add(d);
}
try {
table.delete(listOfBatchDelete);
} finally {
if (hbasePool != null && table != null) {
hbasePool.putTable(table);
}
}
}
Method 2: If you want to do a batch delete based on a scan result:
public void bulkDelete(final HTable table) throws IOException {
Scan s = new Scan();
List<Delete> listOfBatchDelete = new ArrayList<Delete>();
// add your filter to the scan, e.g. s.setFilter(yourFilter); (yourFilter is a placeholder)
// the scanner holds server-side resources, so close it when done
try (ResultScanner scanner = table.getScanner(s)) {
for (Result rr : scanner) {
Delete d = new Delete(rr.getRow());
listOfBatchDelete.add(d);
}
}
// any IOException from the delete propagates to the caller
table.delete(listOfBatchDelete);
}
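Method 2 selects rows with a scan. If you bound that scan with a start and stop row rather than a filter, keep in mind that HBase sorts row keys lexicographically by their bytes, not numerically, and the stop row is exclusive. With unpadded numeric suffixes, a scan from user_500 to user_901 therefore also returns keys such as user_7 and user_5000, whereas Method 1's generated list contains exactly user_500 through user_900. The plain-Java sketch below has no HBase dependency: it emulates that ordering with String.compareTo (which agrees with byte order for ASCII keys like these) over a small sample in the question's key format.

```java
import java.util.ArrayList;
import java.util.List;

final class RowKeyRanges {
    /** Keys whose numeric suffix lies in [from, to]: what Method 1 generates. */
    static List<String> numericRange(String prefix, int from, int to) {
        List<String> keys = new ArrayList<>();
        for (int i = from; i <= to; i++) {
            keys.add(prefix + i);
        }
        return keys;
    }

    /** Keys in [startRow, stopRow) under string order, mimicking a start/stop-row scan. */
    static List<String> lexicographicRange(List<String> keys, String startRow, String stopRow) {
        List<String> hits = new ArrayList<>();
        for (String k : keys) {
            if (k.compareTo(startRow) >= 0 && k.compareTo(stopRow) < 0) {
                hits.add(k);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> all = numericRange("user_", 1, 9999); // small sample of the table
        List<String> scanned = lexicographicRange(all, "user_500", "user_901");
        System.out.println(numericRange("user_", 500, 900).size()); // prints 401
        System.out.println(scanned.size());                         // prints 4455
        System.out.println(scanned.contains("user_5000"));          // prints true
    }
}
```

Zero-padding the numeric part of the key (user_0000500) is the usual way to make byte order and numeric order agree, if you control the key format.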
As for using a coprocessor, my only advice is: don't use a coprocessor unless you are an expert in HBase.
Coprocessors have many built-in issues; if you need, I can provide a detailed description.
Secondly, when you delete anything from HBase, it is never deleted directly: a tombstone marker is attached to the record, and the data is physically removed later during a major compaction. So there is no need to use a coprocessor, which is highly resource-intensive.
Modified code to support batch operation.
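int batchSize = 50;
int batchCounter = 0;
for (int i = startRange; i <= endRange; i++) {
String key = tablePrefix + i;
Delete d = new Delete(Bytes.toBytes(key));
listOfBatchDelete.add(d);
batchCounter++;
if (batchCounter == batchSize) {
// send a full batch and start a new one
table.delete(listOfBatchDelete);
listOfBatchDelete.clear();
batchCounter = 0;
}
}
// send the remaining deletes (fewer than batchSize)
if (!listOfBatchDelete.isEmpty()) {
table.delete(listOfBatchDelete);
}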
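The batching logic above can be factored into a small reusable helper. This plain-Java sketch is independent of HBase: `sink` is a hypothetical stand-in for something like `table::delete`, and the helper always flushes the final partial batch, which a loop that only sends when the counter reaches batchSize would otherwise leave unsent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class Batcher {
    /** Sends items to sink in chunks of at most batchSize, including the last partial chunk. */
    static <T> int sendInBatches(List<T> items, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int calls = 0;
        List<T> batch = new ArrayList<>(batchSize);
        for (T item : items) {
            batch.add(item);
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batch = new ArrayList<>(batchSize); // fresh list: the sink may keep or modify the old one
                calls++;
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // remainder smaller than batchSize
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 500; i <= 900; i++) {
            keys.add("user_" + i);
        }
        List<Integer> sizes = new ArrayList<>();
        int calls = sendInBatches(keys, 50, batch -> sizes.add(batch.size()));
        System.out.println(calls + " calls, last batch size " + sizes.get(sizes.size() - 1)); // prints "9 calls, last batch size 1"
    }
}
```

Handing the sink a fresh list per batch, instead of clearing and reusing one, avoids surprises if the consumer holds on to or mutates the list it was given.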
Creating HBase conf and getting table instance.
Configuration hConf = HBaseConfiguration.create(conf);
hConf.set("hbase.zookeeper.quorum", "Zookeeper IP");
hConf.set("hbase.zookeeper.property.clientPort", ZookeeperPort);
HTable hTable = new HTable(hConf, tableName);
If you already know the row keys of the records that you want to delete from the HBase table, you can use the following approach:
1. First, create a list of Delete objects for these row keys:
for (int rowKey = 1; rowKey <= 10; rowKey++) {
deleteList.add(new Delete(Bytes.toBytes(rowKey + "")));
}
2. Then get the Table object from an HBase Connection:
Table table = connection.getTable(TableName.valueOf(tableName));
3. Once you have the Table object, call delete() and pass it the list:
table.delete(deleteList);
The complete code looks like this:
Configuration config = HBaseConfiguration.create();
config.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
config.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
String tableName = "users";
// Connection and Table are Closeable; try-with-resources releases them when done
try (Connection connection = ConnectionFactory.createConnection(config);
Table table = connection.getTable(TableName.valueOf(tableName))) {
List<Delete> deleteList = new ArrayList<Delete>();
for (int rowKey = 500; rowKey <= 900; rowKey++) {
deleteList.add(new Delete(Bytes.toBytes("user_" + rowKey)));
}
table.delete(deleteList);
}

Runtime Exception In Mongo Db Java Query

I am new to Java.
I am doing a search in WindowBuilder using the Java MongoDB driver.
When I execute the code below, I get a runtime exception.
try{
// To connect to mongodb server
MongoClient mongoClient = new MongoClient( "localhost" , 27017 );
// Now connect to your databases
DB db = mongoClient.getDB( "Ticket" );
System.out.println("Connect to database successfully");
DBCollection coll = db.getCollection("OnlineT");
System.out.println("Collection created successfully");
F_stn = (String)fm.getSelectedItem();
T_stn = (String)to.getSelectedItem();
BasicDBObject doc = new BasicDBObject("From",F_stn);
BasicDBObject doc1 = new BasicDBObject("To",T_stn);
DBCursor ser = coll.find(doc);
DBCursor ser2 = coll.find(doc1);
while(ser.hasNext())
{
String data=ser.next().get("To").toString();
System.out.println(data);
if(data.equals(T_stn))
{
System.out.println("i m in");
String dis=ser.next().toString();
System.out.println(dis);
break;
}
else
System.out.println("No data found");
}
}
It works, but when it enters the if block, it does not print the DBObject.
Please suggest a way to do this. Thanks in advance.
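In the if block, you have:
String dis=ser.next().toString();
This moves your cursor to the next position without checking hasNext() first, so you get a different document from the one you matched, and if the match was the last document, next() fails with an exception. I think that is the problem.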
Instead, you can do something like:
while (ser.hasNext()) {
// call next() exactly once per iteration and keep the result
DBObject dbObject = ser.next();
String data = dbObject.get("To").toString();
System.out.println(data);
if (data.equals(T_stn)) {
System.out.println("i m in");
System.out.println(dbObject);
break;
} else {
System.out.println("No data found");
}
}
In addition, you don't need toString() for printing; println() calls the object's toString() method automatically.
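To see why the second next() call misbehaves, here is a plain-Java sketch with a java.util.Iterator standing in for DBCursor (DBCursor implements Iterator<DBObject>, so it follows the same contract; the station names are made-up sample values). The buggy version returns the element after the match, and when the match is the last element, the extra next() throws NoSuchElementException, a likely candidate for the runtime exception in the question.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class CursorNextDemo {
    /** Mirrors the question's loop: calls next() a second time inside the match branch. */
    static String buggyFind(Iterator<String> cursor, String wanted) {
        while (cursor.hasNext()) {
            String data = cursor.next();
            if (data.equals(wanted)) {
                return cursor.next(); // advances again: wrong element, or NoSuchElementException
            }
        }
        return null;
    }

    /** Fixed version: keeps the element returned by the single next() call. */
    static String fixedFind(Iterator<String> cursor, String wanted) {
        while (cursor.hasNext()) {
            String data = cursor.next();
            if (data.equals(wanted)) {
                return data;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> to = Arrays.asList("Pune", "Delhi", "Mumbai");
        System.out.println(fixedFind(to.iterator(), "Delhi")); // prints Delhi
        System.out.println(buggyFind(to.iterator(), "Delhi")); // prints Mumbai
        try {
            buggyFind(to.iterator(), "Mumbai"); // match is the last element
        } catch (NoSuchElementException e) {
            System.out.println("NoSuchElementException");
        }
    }
}
```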

neo4j - batch insertion using neo4j rest graph db

I'm using version 2.0.1.
I have hundreds of thousands of nodes that need to be inserted. My Neo4j graph DB is on a standalone server, and I'm using the REST API through the neo4j-rest-graphdb library to achieve this.
However, I'm getting slow performance. I've chopped my queries into batches, sending 500 Cypher statements in a single HTTP call. The result that I'm getting is:
10:38:10.984 INFO commit
10:38:13.161 INFO commit
10:38:13.277 INFO commit
10:38:15.132 INFO commit
10:38:15.218 INFO commit
10:38:17.288 INFO commit
10:38:19.488 INFO commit
10:38:22.020 INFO commit
10:38:24.806 INFO commit
10:38:27.848 INFO commit
10:38:31.172 INFO commit
10:38:34.767 INFO commit
10:38:38.661 INFO commit
And so on.
The query that I'm using is as follows:
MERGE (a{main:{val1},prop2:{val2}}) MERGE (b{main:{val3}}) CREATE UNIQUE (a)-[r:relationshipname]-(b);
My code is this:
private RestAPI restAPI;
private RestCypherQueryEngine engine;
private GraphDatabaseService graphDB = new RestGraphDatabase("http://localdomain.com:7474/db/data/");
...
restAPI = ((RestGraphDatabase) graphDB).getRestAPI();
engine = new RestCypherQueryEngine(restAPI);
...
Transaction tx = graphDB.getRestAPI().beginTx();
try {
int ctr = 0;
while (isExists) {
ctr++;
//excute query here through engine.query()
if (ctr % 500 == 0) {
tx.success();
tx.close();
tx = graphDB.getRestAPI().beginTx();
LOGGER.info("commit");
}
}
tx.success();
} catch (FileNotFoundException | NumberFormatException | ArrayIndexOutOfBoundsException e) {
tx.failure();
} finally {
tx.close();
}
Thanks!
UPDATED BENCHMARK.
Sorry for the confusion: the benchmark I posted isn't accurate and does not measure 500 queries per commit. My ctr variable doesn't actually count the Cypher queries.
So now I'm getting about 500 queries per 3 seconds, and those 3 seconds keep increasing as well. It's still much slower than embedded Neo4j.
If you have the ability to use Neo4j 2.1.0-M01 (don't use it in production yet!), you could benefit from new features. If you create/generate a CSV file like this:
val1,val2,val3
a_value,another_value,yet_another_value
a,b,c
....
you'd only need to launch the following code:
final GraphDatabaseService graphDB = new RestGraphDatabase("http://server:7474/db/data/");
final RestAPI restAPI = ((RestGraphDatabase) graphDB).getRestAPI();
final RestCypherQueryEngine engine = new RestCypherQueryEngine(restAPI);
final String filePath = "file://C:/your_file_path.csv";
engine.query("USING PERIODIC COMMIT 500 LOAD CSV WITH HEADERS FROM '" + filePath
+ "' AS csv MERGE (a{main:csv.val1,prop2:csv.val2}) MERGE (b{main:csv.val3})"
+ " CREATE UNIQUE (a)-[r:relationshipname]->(b);", null);
You'd have to make sure that the file can be accessed from the machine where your server is installed on.
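If your data currently lives in Java objects, a minimal sketch for producing such a CSV is below. It uses only the JDK; the column names val1,val2,val3 come from the example above, and the quoting follows the common RFC 4180 convention (wrap a field in double quotes and double any embedded quotes). Whether LOAD CSV in this milestone accepts every quoted case is worth checking against your Neo4j version.

```java
import java.util.Arrays;
import java.util.List;

final class CsvExport {
    /** Quotes a field if it contains a comma, quote, or line break; doubles embedded quotes. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Builds the CSV text: a header line, then one line per row, each ending in '\n'. */
    static String toCsv(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        appendRow(sb, header);
        for (List<String> row : rows) {
            appendRow(sb, row);
        }
        return sb.toString();
    }

    private static void appendRow(StringBuilder sb, List<String> fields) {
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields.get(i)));
        }
        sb.append('\n');
    }

    public static void main(String[] args) {
        String csv = toCsv(Arrays.asList("val1", "val2", "val3"),
                Arrays.asList(Arrays.asList("a_value", "another_value", "yet_another_value"),
                              Arrays.asList("a", "b,c", "say \"hi\"")));
        // Write this string to a file the Neo4j server can read, e.g. with java.nio.file.Files.write
        System.out.print(csv);
    }
}
```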
Take a look at my server plugin, which does this for you on the server. If you build it and put it in the plugins folder, you can use the plugin from Java as follows:
final RestAPI restAPI = new RestAPIFacade("http://server:7474/db/data");
final RequestResult result = restAPI.execute(RequestType.POST, "ext/CSVBatchImport/graphdb/csv_batch_import",
new HashMap<String, Object>() {
{
put("path", "file://C:/.../neo4j.csv");
}
});
EDIT:
You can also use a BatchCallback in the Java REST wrapper to boost performance; it also removes the transaction boilerplate code. You could write your code along these lines:
final RestAPI restAPI = new RestAPIFacade("http://server:7474/db/data");
int counter = 0;
List<Map<String, Object>> statements = new ArrayList<>();
while (isExists) {
statements.add(new HashMap<String, Object>() {
{
put("val1", "abc");
put("val2", "abc");
put("val3", "abc");
}
});
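if (++counter % 500 == 0) {
restAPI.executeBatch(new Process(statements));
statements = new ArrayList<>();
}
}
// send the last, partially filled batch (fewer than 500 statements)
if (!statements.isEmpty()) {
restAPI.executeBatch(new Process(statements));
}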
static class Process implements BatchCallback<Object> {
private static final String QUERY = "MERGE (a{main:{val1},prop2:{val2}}) MERGE (b{main:{val3}}) CREATE UNIQUE (a)-[r:relationshipname]-(b);";
private List<Map<String, Object>> params;
Process(final List<Map<String, Object>> params) {
this.params = params;
}
@Override
public Object recordBatch(final RestAPI restApi) {
for (final Map<String, Object> param : params) {
restApi.query(QUERY, param);
}
return null;
}
}

Avoiding duplicate entries in a mongoDB with Java and JSON objects

I'm developing an analysis program for Twitter data.
I'm using MongoDB, and at the moment I'm trying to write a Java program that gets tweets from the Twitter API and puts them in the database.
Getting the tweets already works very well, but I have a problem when I want to put them in the database. As the Twitter API often returns the same tweets, I need some kind of index in the database.
First of all, I connect to the database and get the collection related to the search term, or create this collection if it doesn't exist.
public void connectdb(String keyword)
{
try {
// on constructor load initialize MongoDB and load collection
initMongoDB();
items = db.getCollection(keyword);
BasicDBObject index = new BasicDBObject("tweet_ID", 1);
items.ensureIndex(index);
} catch (MongoException ex) {
System.out.println("MongoException :" + ex.getMessage());
}
}
Then I get the tweets and put them in the database:
public void getTweetByQuery(boolean loadRecords, String keyword) {
if (cb != null) {
TwitterFactory tf = new TwitterFactory(cb.build());
Twitter twitter = tf.getInstance();
try {
Query query = new Query(keyword);
query.setCount(50);
QueryResult result;
result = twitter.search(query);
System.out.println("Getting Tweets...");
List<Status> tweets = result.getTweets();
for (Status tweet : tweets) {
BasicDBObject basicObj = new BasicDBObject();
basicObj.put("user_name", tweet.getUser().getScreenName());
basicObj.put("retweet_count", tweet.getRetweetCount());
basicObj.put("tweet_followers_count", tweet.getUser().getFollowersCount());
UserMentionEntity[] mentioned = tweet.getUserMentionEntities();
basicObj.put("tweet_mentioned_count", mentioned.length);
basicObj.put("tweet_ID", tweet.getId());
basicObj.put("tweet_text", tweet.getText());
if (mentioned.length > 0) {
// System.out.println("Mentioned length " + mentioned.length + " Mentioned: " + mentioned[0].getName());
}
try {
items.insert(basicObj);
} catch (Exception e) {
System.out.println("MongoDB Connection Error : " + e.getMessage());
loadMenu();
}
}
// Printing fetched records from DB.
if (loadRecords) {
getTweetsRecords();
}
} catch (TwitterException te) {
System.out.println("te.getErrorCode() " + te.getErrorCode());
System.out.println("te.getExceptionCode() " + te.getExceptionCode());
System.out.println("te.getStatusCode() " + te.getStatusCode());
if (te.getStatusCode() == 401) {
System.out.println("Twitter Error : \nAuthentication credentials (https://dev.twitter.com/pages/auth) were missing or incorrect.\nEnsure that you have set valid consumer key/secret, access token/secret, and the system clock is in sync.");
} else {
System.out.println("Twitter Error : " + te.getMessage());
}
loadMenu();
}
} else {
System.out.println("MongoDB is not Connected! Please check mongoDB instance running..");
}
}
But as I mentioned before, the same tweets often come back, and they end up duplicated in the database.
I think the tweet_ID field is a good field for an index and should be unique in the collection.
Set the unique option on your index to have MongoDb enforce uniqueness:
items.ensureIndex(index, new BasicDBObject("unique", true));
Note that you'll need to manually drop the existing index and remove all duplicates or you won't be able to create the unique index.
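The unique index is what actually guarantees uniqueness in the collection. Once it exists, inserting a duplicate tweet_ID makes the insert fail with an exception, so the catch block in getTweetByQuery that prints "MongoDB Connection Error" would also fire for plain duplicates. As a complement, duplicates within one search result can be dropped in memory before inserting. The plain-Java sketch below shows that step; the Tweet class is a hypothetical stand-in for twitter4j's Status, modelling only the ID and text.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TweetDedup {
    /** Hypothetical stand-in for twitter4j's Status: only the fields needed here. */
    static final class Tweet {
        final long id;
        final String text;

        Tweet(long id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    /** Keeps the first tweet seen for each ID, preserving the original order. */
    static List<Tweet> uniqueById(Collection<Tweet> tweets) {
        Map<Long, Tweet> byId = new LinkedHashMap<>();
        for (Tweet t : tweets) {
            byId.putIfAbsent(t.id, t);
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Tweet> fetched = new ArrayList<>();
        fetched.add(new Tweet(1L, "first"));
        fetched.add(new Tweet(2L, "second"));
        fetched.add(new Tweet(1L, "first again"));
        for (Tweet t : uniqueById(fetched)) {
            System.out.println(t.id + " " + t.text); // prints "1 first", then "2 second"
        }
    }
}
```

This only removes duplicates inside a single batch; tweets already stored from earlier runs are still caught by the unique index.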
This question is already answered, but I would like to contribute a bit, since MongoDB API 2.11 offers a method that takes the unique option as a parameter:
public void ensureIndex(DBObject keys, String name, boolean unique)
A minor reminder for anyone who wants to store JSON documents in MongoDB: uniqueness must be applied to a BasicDBObject key, not to values. For example:
BasicDBObject basicObj = new BasicDBObject();
basicObj.put("user_name", tweet.getUser().getScreenName());
basicObj.put("retweet_count", tweet.getRetweetCount());
basicObj.put("tweet_ID", tweet.getId());
basicObj.put("tweet_text", tweet.getText());
basicObj.put("a_json_text", "{\"info_details\":{\"info_id\":\"1234\"},\"info_date\":{\"year\":\"2012\",\"month\":\"12\",\"day\":\"10\"}}");
In this case, you can create a unique index only on the BasicDBObject keys:
BasicDBObject index = new BasicDBObject();
int directionOrder = 1;
index.put("tweet_ID", directionOrder);
boolean isUnique = true;
items.ensureIndex(index, "unique_tweet_ID", isUnique);
An index on a value inside the JSON text, such as "info_id", would not work, since it is part of a string value rather than a BasicDBObject key.
Using indexes in MongoDB is not as easy as it sounds. You can also check the MongoDB docs for more details: Mongo Indexing Tutorials and Mongo Index Concepts. Direction order can be important to understand once you need a compound index, which is explained well in Why Direction order matter.

Duplicate Keys in Oracle Berkeley DB Java Edition

I'm using Oracle Berkeley DB Java Edition with tables in key/value format. I'm trying to insert duplicate keys, but I keep getting SecondaryIntegrityException. According to Oracle, if setSortedDuplicates() is set to true, then duplicates are allowed. This does not work in my case. Below is some code with key=bob, value=smith. The first time I run it, it runs as expected. If I run it a second time, changing only value=johnson, I get SecondaryIntegrityException. Is there something I'm doing wrong? Thanks.
String key = "bob";
String value = "smith";
EnvironmentConfig envConfig = new EnvironmentConfig();
envConfig.setAllowCreate(true);
envConfig.setTransactional(false);
Environment myDBenvironment = new Environment(new File(filePath), envConfig);
DatabaseConfig dbConfig = new DatabaseConfig();
dbConfig.setAllowCreate(true);
dbConfig.setTransactional(false);
Database myDatabase = myDBenvironment.openDatabase(null, dbname,
dbConfig);
// create secondary database
SecondaryConfig mySecConfig = new SecondaryConfig();
mySecConfig.setAllowCreate(true);
mySecConfig.setSortedDuplicates(true);
mySecConfig.setTransactional(false);
mySecConfig.setKeyCreator(new SecondKeyCreator());
SecondaryDatabase mySecondaryDatabase = myDBenvironment
.openSecondaryDatabase(null, secdbname, myDatabase,
mySecConfig);
DatabaseEntry myKey = new DatabaseEntry(key.getBytes("UTF-8"));
Record mydata = new Record();
mydata.setobjectVal(value);
DatabaseEntry myrecord = new DatabaseEntry();
new RecordTupleBinding().objectToEntry(mydata, myrecord);
myDatabase.put(null, myKey, myrecord);
mySecondaryDatabase.close();
myDatabase.close();
myDBenvironment.close();
public class SecondKeyCreator implements SecondaryKeyCreator{
@Override
public boolean createSecondaryKey(SecondaryDatabase arg0,
DatabaseEntry key, DatabaseEntry data, DatabaseEntry secondKey) {
RecordTupleBinding binding = new RecordTupleBinding();
Record record = (Record) binding.entryToObject(data);
try {
secondKey.setData(data.getData());
} catch (Exception e) {
e.printStackTrace();
}
return true;
}
}
Although I am not an expert on the topic, let me try to help you.
According to Oracle documentation, "If a primary database is to be associated with one or more secondary databases, it may not be configured for duplicates". Do you have an association from this database? If so, this may be the reason.
I hope it helps.
A secondary database is required to allow duplicates. The code above works if
secondKey.setData(data.getData());
is changed to
secondKey.setData(((String)record.getobjectVal()).getBytes());
