MongoDB Java driver: Undefined values are not shown

Open the mongo shell and create a document with an undefined value:
> mongo
MongoDB shell version: 2.4.0
connecting to: test
> use mydb
switched to db mydb
> db.mycol.insert( {a_number:1, a_string:"hi world", a_null:null, an_undefined:undefined} );
> db.mycol.findOne();
{
"_id" : ObjectId("51c2f28a7aa5079cf24e3999"),
"a_number" : 1,
"a_string" : "hi world",
"a_null" : null,
"an_undefined" : null
}
As we can see, the JavaScript shell displays the undefined value (stored in the db) as null when presenting it to the user. But in the db the value is still undefined, as we are going to see with Java.
Let's create a "bug_undefined_java_mongo.java" file with the following content:
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.MongoClient;

public class bug_undefined_java_mongo
{
    String serv_n = "myserver"; // server name
    String db_n = "mydb";       // database name
    String col_n = "mycol";     // collection name

    public static void main(String[] args)
    {
        new bug_undefined_java_mongo().start();
    }

    public void start()
    {
        pr("Connecting to server ...");
        MongoClient cli = null;
        try
        {
            cli = new MongoClient( serv_n );
        }
        catch (Exception e)
        {
            pr("Can't connect to server: " + e);
            System.exit(1);
        }
        if (cli == null)
        {
            pr("Can't connect to server");
            System.exit(1);
        }

        pr("Selecting db ...");
        DB db_res = cli.getDB( db_n );

        pr("Selecting collection ...");
        DBCollection col = db_res.getCollection( col_n );

        pr("Searching documents ...");
        DBCursor cursor = null;
        try
        {
            cursor = col.find( );
        }
        catch (Exception e)
        {
            pr("Can't search for documents: " + e);
            System.exit(1);
        }

        pr("Printing documents ...");
        try
        {
            while (cursor.hasNext())
            {
                Object doc_obj = cursor.next();
                System.out.println("doc: " + doc_obj);
            }
        }
        catch (Exception e)
        {
            pr("Can't browse documents: " + e);
            return;
        }
        finally
        {
            pr("Closing cursor ...");
            cursor.close();
        }
    }

    public void pr(String cad)
    {
        System.out.println(cad);
    }
}
After compiling and running it, we get this:
Connecting to server ...
Selecting db ...
Selecting collection ...
Searching documents ...
Printing documents ...
doc: { "_id" : { "$oid" : "51c2f0f85353d3425fcb5a14"} , "a_number" : 1.0 , "a_string" : "hi world" , "a_null" : null }
Closing cursor ...
We see that the "a_null:null" pair is shown, but... the "an_undefined:undefined" pair has disappeared! (both the key and the value).
Why? Is it a bug?
Thank you

Currently undefined is not supported by the Java driver, as there is no equivalent mapping in Java.
Other drivers such as pymongo and the js shell handle this differently by casting undefined to None (or null) when representing the data; however, it is a separate datatype, and it is deprecated in the BSON spec.
If you need it in the Java driver, then you will have to code your own decoder factory and set it like so:
collection.setDBDecoderFactory(MyDecoder.FACTORY);
A minimal example with defined handling for undefined, including the factory, is available on GitHub in the horn of mongo repo.
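For reference, such a decoder can be quite small. The sketch below is against the 2.x driver API and is illustrative rather than a verified implementation (the class name UndefNullDecoder is made up): it overrides the gotUndefined() callback so undefined fields are stored as null instead of being dropped.

import com.mongodb.DBCallback;
import com.mongodb.DBCollection;
import com.mongodb.DBDecoder;
import com.mongodb.DBDecoderFactory;
import com.mongodb.DefaultDBCallback;
import com.mongodb.DefaultDBDecoder;

public class UndefNullDecoder extends DefaultDBDecoder {

    public static final DBDecoderFactory FACTORY = new DBDecoderFactory() {
        public DBDecoder create() {
            return new UndefNullDecoder();
        }
    };

    @Override
    public DBCallback getDBCallback(DBCollection collection) {
        return new DefaultDBCallback(collection) {
            @Override
            public void gotUndefined(String name) {
                gotNull(name); // keep the key, store null instead of dropping the pair
            }
        };
    }
}

With that in place, calling col.setDBDecoderFactory(UndefNullDecoder.FACTORY); before the find() should make the an_undefined key show up as null.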

I see, creating a factory could be a solution.
Anyway, many developers would probably find it useful to be able to enable a mapping in the driver that automatically converts "undefined" values to "null". For example, by calling a mapUndefToNull() method:
cli = new MongoClient( myserver );
cli.mapUndefToNull(true);
In my case, I'm running a MapReduce (it is JavaScript code) on my collections, and I have to explicitly convert the undefined values (generated when accessing non-existent keys) to null, to prevent the Java driver from removing them:
try { value = this[ key ] } catch(e) {value = null}
if (typeof value == "undefined") value = null; // prevent the Java driver from removing the pair
So, as a suggestion, I'd like a mapUndefToNull() method to be added to the Java driver, if possible.
Thank you

Related

CosmosDB : CosmosPatchOperation not working via Stored Procedure

I want to replace the Cosmos batch with a stored procedure, as my requirement is to upsert 100+ records, which a Cosmos batch does not support. I am adding 2 Java objects and 1 CosmosPatchOperations to a List and passing it to the method below. Whenever I add the Cosmos patch object, no rows get inserted or updated; otherwise it works fine. I want to perform both the insert and the patch operation in the same transaction. Can somebody please guide me on how to modify the SP so that it supports both insert and patch operations?
String rowsUpserted = "";
try {
    rowsUpserted = container
            .getScripts()
            .getStoredProcedure("createEvent")
            .execute(Arrays.asList(listObj), options)
            .getResponseAsString();
} catch (Exception e) {
    e.printStackTrace();
}
Stored Proc
function createEvent(items) {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();
    var count = 0;

    if (!items) throw new Error("The array is undefined or null.");

    var numItems = items.length;
    if (numItems == 0) {
        getContext().getResponse().setBody(0);
        return;
    }

    tryCreate(items[count], callback);

    function tryCreate(item, callback) {
        var options = { disableAutomaticIdGeneration: false };
        var isAccepted = collection.upsertDocument(collectionLink, item, options, callback);
        if (!isAccepted) getContext().getResponse().setBody(count);
    }

    function callback(err, item, options) {
        if (err) throw err;
        count++;
        if (count >= numItems) {
            getContext().getResponse().setBody(count);
        } else {
            tryCreate(items[count], callback);
        }
    }
}
Patching doesn't appear to be supported by the Collection type in the JavaScript stored procedure API. I suspect this is because patching is more of an optimisation for remote calls, and stored procedures execute locally, so it's not really necessary.
API reference is here: http://azure.github.io/azure-cosmosdb-js-server/Collection.html
upsertDocument is expecting the full document.
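If the only blocker for transactional batch is the 100-operation limit, one alternative worth noting: a transactional batch is scoped to a single partition key and capped at 100 operations, so larger workloads can be split into several batches (each batch atomic on its own, though not atomic across batches). A hedged sketch with the Java SDK v4 (item objects, ids, paths, and the partition key value are illustrative):

import com.azure.cosmos.CosmosContainer;
import com.azure.cosmos.models.CosmosBatch;
import com.azure.cosmos.models.CosmosBatchResponse;
import com.azure.cosmos.models.CosmosPatchOperations;
import com.azure.cosmos.models.PartitionKey;

// All operations in one batch must share the same partition key.
CosmosBatch batch = CosmosBatch.createCosmosBatch(new PartitionKey("pk-value"));
batch.upsertItemOperation(eventObject1); // plain POJOs, illustrative names
batch.upsertItemOperation(eventObject2);

CosmosPatchOperations patch = CosmosPatchOperations.create()
        .replace("/status", "processed"); // illustrative path and value
batch.patchItemOperation("existing-item-id", patch);

CosmosBatchResponse response = container.executeCosmosBatch(batch);
if (!response.isSuccessStatusCode()) {
    System.err.println("Batch failed with status " + response.getStatusCode());
}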

Hive-MetaStore issue during updating table partitions

I am trying to update hive-table partitions using the Hive Java APIs. These are the steps that I am following to achieve this:
1. Extracting partitions which are not in the metastore.
2. Adding these partitions to the table.
3. Going back to the Hive command line and running the show partitions and msck repair table commands, just to make sure everything is fine.
What I got:
1. Show partitions is working fine (giving the list of partitions which I have added).
2. The MSCK REPAIR command is not working (getting this: Partitions are not present in metastore.)
Here is the piece of code that I am using:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.Warehouse;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat;
import org.apache.hadoop.hive.ql.metadata.CheckResult;
import org.apache.hadoop.hive.ql.metadata.Hive;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker;
import org.apache.hadoop.hive.ql.metadata.Table;
import org.apache.hadoop.hive.ql.plan.AddPartitionDesc;
import org.apache.hadoop.mapred.TextInputFormat;

public class HiveMetastoreChecker {

    public static void main(String[] args) {
        final String dbName = "db_name";
        final String tableName = "db_name.table_name";
        CheckResult result = new CheckResult();
        try {
            Configuration configuration = new Configuration();
            HiveConf conf = new HiveConf();
            conf.addResource(configuration);
            Hive hive = Hive.get(conf, true);
            HiveMetaStoreChecker checker = new HiveMetaStoreChecker(hive);

            Table table = new Table(dbName, tableName);
            table.setDbName(dbName);
            table.setInputFormatClass(TextInputFormat.class);
            table.setOutputFormatClass(HiveIgnoreKeyTextOutputFormat.class);
            table = hive.getTable(dbName, tableName);

            checker.checkMetastore(dbName, tableName, null, result);
            System.out.println(table.getDataLocation());

            List<CheckResult.PartitionResult> partitionNotInMs = result.getPartitionsNotInMs();
            System.out.println("not in ms " + partitionNotInMs.size());
            List<org.apache.hadoop.hive.ql.metadata.Partition> partitions = hive.getPartitions(table);
            System.out.println("partitions size " + partitions.size());

            AddPartitionDesc apd = new AddPartitionDesc(table.getDbName(), table.getTableName(), false);
            List<String> finalListOfPartitionsNotInMs = new ArrayList<String>();
            for (CheckResult.PartitionResult part : partitionNotInMs) {
                if (!finalListOfPartitionsNotInMs.contains(part.getPartitionName().replace("/", ""))) {
                    finalListOfPartitionsNotInMs.add(part.getPartitionName().replace("/", ""));
                }
            }
            for (String partition : finalListOfPartitionsNotInMs) {
                apd.addPartition(Warehouse.makeSpecFromName(partition), table.getDataLocation().toString());
            }
            hive.createPartitions(apd);
        } catch (HiveException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (MetaException e) {
            e.printStackTrace();
        }
    }
}
Any kind of help would be appreciated.
Thanks.
MSCK REPAIR is failing on Hive? If yes, then check whether the partition column name is in CAPITAL letters. I found the same issue, where my partition on AWS S3 was like DCA=1000.
If that is the case, then execute MSCK REPAIR using Spark SQL and it will work, in case you don't want to rename the partition into lower case.
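For example, a minimal sketch of running the repair through Spark SQL (assuming a Spark build with Hive support and a configured metastore; the app name is illustrative, the table name is from the question):

import org.apache.spark.sql.SparkSession;

public class MsckRepair {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("msck-repair")
                .enableHiveSupport()
                .getOrCreate();
        // Registers partitions found on storage but missing from the metastore
        spark.sql("MSCK REPAIR TABLE db_name.table_name");
        spark.stop();
    }
}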

MongoDB - find() calls getting stuck sometimes - Timeout

We are using MongoDB for saving and fetching data.
All calls that put data into collections work fine and go through a common method.
All calls that fetch data from collections also go through a common method and work fine most of the time.
But sometimes, and only for one of the collections, my calls get stuck forever, consuming CPU. I have to kill the threads manually, otherwise it consumes my whole CPU.
Mongo Connection
MongoClient mongo = new MongoClient(hostName , Integer.valueOf(port));
DB mongoDb = mongo.getDB(dbName);
Code To fetch
DBCollection collection = mongoDb.getCollection(collectionName);
DBObject dbObject = new BasicDBObject("_id" , key);
DBCursor cursor = collection.find(dbObject);
Though I have figured out which collection is causing the issue, how can I improve upon this, since it occurs only for this particular collection, and only sometimes?
EDIT
Code to save
DBCollection collection = mongoDb.getCollection(collectionName);
DBObject query = new BasicDBObject("_id" , key);
DBObject update = new BasicDBObject();
update.put("$set" , JSON.parse(value));
collection.update(query , update , true , false);
Bulk Write / collection
DB mongoDb = controllerFactory.getMongoDB();
DBCollection collection = mongoDb.getCollection(collectionName);
BulkWriteOperation bulkWriteOperation = collection.initializeUnorderedBulkOperation();
Map<String, Object> dataMap = (Map<String, Object>) JSON.parse(value);
for (Entry<String, Object> entrySet : dataMap.entrySet()) {
    BulkWriteRequestBuilder bulkWriteRequestBuilder =
            bulkWriteOperation.find(new BasicDBObject("_id", entrySet.getKey()));
    DBObject update = new BasicDBObject();
    update.put("$set", entrySet.getValue());
    bulkWriteRequestBuilder.upsert().update(update);
}
How can I set a timeout for fetch calls?
A different approach is to use the API proposed for the MongoDB 3.2 driver. Keep in mind that you have to update your .jar libraries to the latest version (if you haven't already).
import com.mongodb.MongoClient;
import com.mongodb.MongoClientException;
import com.mongodb.MongoException;
import com.mongodb.client.FindIterable;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public final class MongoAccess {

    public static MongoClient connectToClient(String hostName, String port) {
        try {
            return new MongoClient(hostName, Integer.valueOf(port));
        } catch (MongoClientException e) {
            System.err.println("Cannot connect to client.");
            return null;
        }
    }

    public static MongoDatabase connectToDB(MongoClient client, String databaseName) {
        try {
            return client.getDatabase(databaseName);
        } catch (Exception e) {
            System.err.println("Error in connecting to database " + databaseName);
            return null;
        }
    }

    public static void closeConnection(MongoClient client) {
        client.close();
    }

    public static void findDoc(MongoDatabase db, String collectionName, Object key) {
        MongoCollection<Document> collection = db.getCollection(collectionName);
        try {
            FindIterable<Document> iterable = collection
                    .find(new Document("_id", key));
            Document doc = iterable.first();
            // For an Int64 field named 'special_id'
            long specialId = doc.getLong("special_id");
        } catch (MongoException e) {
            System.err.println("Error in retrieving document.");
        } catch (NullPointerException e) {
            System.err.println("Document with _id " + key + " does not exist.");
        }
    }

    public static void insertToDB(MongoDatabase db, String collectionName) {
        try {
            db.getCollection(collectionName).insertOne(new Document()
                    .append("special_id", 5)
                    // Append anything
            );
        } catch (MongoException e) {
            System.err.println("Error in inserting new document.");
        }
    }

    public static void updateDoc(MongoDatabase db, String collectionName, long id) {
        MongoCollection<Document> collection = db.getCollection(collectionName);
        try {
            collection.updateOne(new Document("_id", id),
                    new Document("$set", new Document("special_id", 7)));
        } catch (MongoException e) {
            System.err.println("Error in updating new document.");
        }
    }

    public static void main(String[] args) {
        String hostName = "myHost";
        String port = "myPort";
        String databaseName = "myDB";
        String collectionName = "myCollection";
        Object key = "myKey"; // the _id to look up
        MongoClient client = connectToClient(hostName, port);
        if (client != null) {
            MongoDatabase db = connectToDB(client, databaseName);
            if (db != null) {
                findDoc(db, collectionName, key);
            }
            closeConnection(client);
        }
    }
}
EDIT: As the others suggested, check from the command line whether finding the document by its ID is slow there too. If it is, maybe this is a problem with your hard drive. The _id field is supposed to be indexed, but for better or for worse, try re-creating the index on the _id field.
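As for the timeout question itself, here is a hedged sketch for the legacy 2.x API (the timeout values are illustrative): network timeouts are set per client via MongoClientOptions, and a per-query server-side limit is available through DBCursor.maxTime (driver 2.12+, server 2.6+):

import java.util.concurrent.TimeUnit;
import com.mongodb.BasicDBObject;
import com.mongodb.DBCursor;
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ServerAddress;

// Client-wide network timeouts, in milliseconds
MongoClientOptions opts = MongoClientOptions.builder()
        .connectTimeout(10000) // give up connecting after 10s
        .socketTimeout(60000)  // abort socket reads/writes that hang for 60s
        .build();
MongoClient mongo = new MongoClient(new ServerAddress(hostName, Integer.valueOf(port)), opts);

// Per-query server-side execution limit: the server kills the operation after 5s
DBCursor cursor = collection.find(new BasicDBObject("_id", key))
        .maxTime(5, TimeUnit.SECONDS);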
The answers posted by others are great, but did not solve my problem.
The issue was actually in my existing code itself: my cursor was waiting in a while loop for an infinite time.
I was missing a few checks, which have now been added, and the problem is resolved.
Just some possible explanations/thoughts.
In general, a "query by id" has to be fast, since _id is supposed to be indexed, always. The code snippet looks correct, so probably the reason is in mongo itself. This leads me to a couple of suggestions:
Try to connect to mongo directly from the command line and run the "find" from there. The chances are that you'll still be able to observe occasional slowness.
In this case:
Maybe it's about the disks (maybe this particular server is deployed on a slow disk, or at least there is a correlation with some slowness in accessing the disk).
Maybe you have a sharded configuration and one shard is slower than the others.
Maybe it's a network issue that occurs sporadically. If you run mongo locally/on a staging env. with the same collection, does this reproduce?
Maybe (although I hardly believe it) the query runs in a sub-optimal way. In that case you can use "explain()", as someone has already suggested here; see the sketch below.
If you happen to have a replica set, please figure out what the Read Preference is. Who knows, maybe you prefer to read this id from a sub-optimal server.
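A minimal sketch of that explain() check with the 2.x API, reusing the collection and key from the question:

DBObject plan = collection.find(new BasicDBObject("_id", key)).explain();
System.out.println(plan); // check which index is used and how many documents are scanned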

I got an error using MapReduce in MongoDB

First of all, I am using Windows XP 32-bit as the OS, MongoDB as the NoSQL DB, and Eclipse as the editor. I got an assignment from my school about MapReduce, so I decided to count the working-age and non-working-age population using mapreduce. I use this code to input data, saved as Insert.java:
package mongox;

import com.mongodb.BasicDBObject;
import com.mongodb.Mongo;
import com.mongodb.DB;
import com.mongodb.DBCollection;

public class Insert {
    public static void main(String[] args) {
        try {
            Mongo mongox = new Mongo();
            DB db = mongox.getDB("DBPublic");
            DBCollection koleksi = db.getCollection("lancestorvalley");

            BasicDBObject object = new BasicDBObject();
            object.put("NIK", "7586930211");
            object.put("Name", "Richard Bou");
            object.put("Sex", "M");
            object.put("Age", "31");
            object.put("Blood", "A");
            object.put("Status", "Married");
            object.put("Education", "Bachelor degree");
            object.put("Employment", "Labor");
            koleksi.insert(object);
        }
        catch (Exception e) {
            System.out.println(e.toString());
        }
    }
}
I use this code for MapReduce, saved as Mapreduce.java:
package mongox;

import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MapReduceCommand;
import com.mongodb.MapReduceOutput;
import com.mongodb.Mongo;

public class Mapreduce {
    public static void main(String[] args) {
        try {
            Mongo mongox = new Mongo("localhost", 27017);
            DB db = mongox.getDB("DBPublic");
            DBCollection koleksi = db.getCollection("lancestorvalley");

            String map = "function() { " +
                    "var category; " +
                    "if ( this.Age >= 15 && this.Age <= 59 ) " +
                    "category = 'Working-Age Population'; " +
                    "else " +
                    "category = 'Non-Working-Age Population'; " +
                    "emit(category, {Nama: this.Nama});}";

            String reduce = "function(key, values) { " +
                    "var sum = 0; " +
                    "values.forEach(function(doc) { " +
                    "sum += 1; " +
                    "}); " +
                    "return {data: sum};} ";

            MapReduceCommand cmd = new MapReduceCommand(koleksi, map, reduce,
                    null, MapReduceCommand.OutputType.INLINE, null);
            MapReduceOutput out = koleksi.mapReduce(cmd);
            for (DBObject o : out.results()) {
                System.out.println(o.toString());
            }
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I have already inserted 5000 records, and when I run Mapreduce.java the output is:
{ "_id" : "Non-Working-age population" , "value" : { "data" : 41.0}}
{ "_id" : "Working-age Population" , "value" : { "data" : 60.0}}
Is there something wrong with my code in Mapreduce.java? Why is the output only like that, while the data is about 5000 records?
Hopefully someone can help me. Thanks in advance, guys.
The MongoDB docs explicitly state the below, which might be the cause of the unexpected behavior:
Platform Support
Starting in version 2.2, MongoDB does not support Windows XP. Please use a more recent version of Windows to use more recent releases of MongoDB.
Moreover:
MongoDB for Windows 32-bit runs on any 32-bit version of Windows newer than Windows XP. 32-bit versions of MongoDB are only intended for older systems and for use in testing and development systems. 32-bit versions of MongoDB only support databases smaller than 2GB.
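Independently of the platform issue, one property of MapReduce is worth checking here: MongoDB may call reduce repeatedly on partial results (re-reduce), so reduce must return a value of the same shape as what map emits, and it must sum the partial counts rather than count values. The posted map emits {Nama: ...} while reduce returns {data: sum} and adds 1 per value, which can produce totals far below the real document count. A hedged sketch of a re-reduce-safe pair, in the same Java-string style as Mapreduce.java:

// Map emits a counter of the same shape that reduce returns.
String map = "function() { " +
        "var category; " +
        "if ( this.Age >= 15 && this.Age <= 59 ) " +
        "category = 'Working-Age Population'; " +
        "else " +
        "category = 'Non-Working-Age Population'; " +
        "emit(category, {data: 1});}";

// Reduce sums partial counts, so re-reducing its own output stays correct.
String reduce = "function(key, values) { " +
        "var sum = 0; " +
        "values.forEach(function(doc) { sum += doc.data; }); " +
        "return {data: sum};}";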

Avoiding duplicate entries in a mongoDB with Java and JSON objects

I'm developing an analysis program for Twitter data.
I'm using MongoDB, and at the moment I am trying to write a Java program to get tweets from the Twitter API and put them in the database.
Getting the tweets already works very well, but I have a problem when I want to put them in the database. As the Twitter API often returns the same tweets again, I have to place some kind of index in the database.
First of all, I connect to the database and get the collection related to the search term, or create this collection if it doesn't exist.
public void connectdb(String keyword) {
    try {
        // on constructor load initialize MongoDB and load collection
        initMongoDB();
        items = db.getCollection(keyword);
        BasicDBObject index = new BasicDBObject("tweet_ID", 1);
        items.ensureIndex(index);
    } catch (MongoException ex) {
        System.out.println("MongoException :" + ex.getMessage());
    }
}
Then I get the tweets and put them in the database:
public void getTweetByQuery(boolean loadRecords, String keyword) {
    if (cb != null) {
        TwitterFactory tf = new TwitterFactory(cb.build());
        Twitter twitter = tf.getInstance();
        try {
            Query query = new Query(keyword);
            query.setCount(50);
            QueryResult result;
            result = twitter.search(query);
            System.out.println("Getting Tweets...");
            List<Status> tweets = result.getTweets();
            for (Status tweet : tweets) {
                BasicDBObject basicObj = new BasicDBObject();
                basicObj.put("user_name", tweet.getUser().getScreenName());
                basicObj.put("retweet_count", tweet.getRetweetCount());
                basicObj.put("tweet_followers_count", tweet.getUser().getFollowersCount());
                UserMentionEntity[] mentioned = tweet.getUserMentionEntities();
                basicObj.put("tweet_mentioned_count", mentioned.length);
                basicObj.put("tweet_ID", tweet.getId());
                basicObj.put("tweet_text", tweet.getText());
                if (mentioned.length > 0) {
                    // System.out.println("Mentioned length " + mentioned.length + " Mentioned: " + mentioned[0].getName());
                }
                try {
                    items.insert(basicObj);
                } catch (Exception e) {
                    System.out.println("MongoDB Connection Error : " + e.getMessage());
                    loadMenu();
                }
            }
            // Printing fetched records from DB.
            if (loadRecords) {
                getTweetsRecords();
            }
        } catch (TwitterException te) {
            System.out.println("te.getErrorCode() " + te.getErrorCode());
            System.out.println("te.getExceptionCode() " + te.getExceptionCode());
            System.out.println("te.getStatusCode() " + te.getStatusCode());
            if (te.getStatusCode() == 401) {
                System.out.println("Twitter Error : \nAuthentication credentials (https://dev.twitter.com/pages/auth) were missing or incorrect.\nEnsure that you have set valid consumer key/secret, access token/secret, and the system clock is in sync.");
            } else {
                System.out.println("Twitter Error : " + te.getMessage());
            }
            loadMenu();
        }
    } else {
        System.out.println("MongoDB is not Connected! Please check mongoDB instance running..");
    }
}
But as I mentioned before, the same tweets often come back, so there are duplicates in the database.
I think the tweet_ID field is a good field for an index and should be unique in the collection.
Set the unique option on your index to have MongoDB enforce uniqueness:
items.ensureIndex(index, new BasicDBObject("unique", true));
Note that you'll need to manually drop the existing index and remove all duplicates or you won't be able to create the unique index.
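For completeness, a hedged sketch of those steps with the 2.x API (removing the existing duplicate documents is left out; items, basicObj, and tweet are from the question's code):

BasicDBObject index = new BasicDBObject("tweet_ID", 1);
items.dropIndex(index); // drop the old, non-unique index first
items.ensureIndex(index, new BasicDBObject("unique", true));

// Optionally write with an upsert keyed on tweet_ID, so re-fetched
// tweets overwrite the stored document instead of failing the insert:
items.update(new BasicDBObject("tweet_ID", tweet.getId()), basicObj, true, false);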
This question is already answered, but I would like to contribute a bit, since MongoDB API 2.11 offers a method which receives the unique option as a parameter:
public void ensureIndex(DBObject keys, String name, boolean unique)
A minor reminder for anyone who would like to store JSON documents in MongoDB: note that uniqueness must be applied to a BasicDBObject key, not over values. For example:
BasicDBObject basicObj = new BasicDBObject();
basicObj.put("user_name", tweet.getUser().getScreenName());
basicObj.put("retweet_count", tweet.getRetweetCount());
basicObj.put("tweet_ID", tweet.getId());
basicObj.put("tweet_text", tweet.getText());
basicObj.put("a_json_text", "{"info_details":{"info_id":"1234"},"info_date":{"year":"2012"}, {"month":"12"}, {"day":"10"}}");
In this case, you can create a unique index only on basic object keys:
BasicDBObject index = new BasicDBObject();
int directionOrder = 1;
index.put("tweet_ID", directionOrder);
boolean isUnique = true;
items.ensureIndex(index, "unique_tweet_ID", isUnique);
Any index regarding a JSON value like "info_id" would not work, since it's not a BasicDBObject key.
Using indexes on MongoDB is not as easy as it sounds. You may also check the MongoDB docs for more details: Mongo Indexing Tutorials and Mongo Index Concepts. Direction order might be pretty important to understand once you need a compound index, which is well explained in Why Direction order matter.
