Update an existing collection in MongoDB using Java-Hadoop connector - java

Is it possible to update an existing MongoDB collection with new data? I am using a Hadoop job to read and write data to Mongo. The required scenario is:
Say a document in the collection initially looks like this:
{
"_id" : 1,
"value" : "aaa",
"value2" : null
}
After reading the data from Mongo and processing it, MongoDB should contain:
{
"_id" : 1,
"value" : "aaa",
"value2" : "bbb"
}
If possible, please provide some dummy code.

BasicBSONObject query=new BasicBSONObject();
query.append("fieldname", value);
BasicBSONObject update=new BasicBSONObject();
update.append("$set", new BasicBSONObject().append("newfield",value));
MongoUpdateWritable muw=new MongoUpdateWritable(query,update,false,true);
context.write(key, muw);
query: the match condition that selects the documents to update.
update: the modification to apply; here $set adds the new field and value to the matching documents.
MongoUpdateWritable:
The 3rd parameter is the upsert flag (same meaning as in MongoDB).
The 4th parameter enables multi-update, so every matching document in the collection is updated.
Set the output value class in the driver class:
job.setOutputValueClass(MongoUpdateWritable.class);
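To make the roles of the query, the $set document, and the two boolean flags concrete, here is a minimal plain-Java model of the update semantics. It is only a sketch on in-memory maps: the class and method names (SetUpdateSketch, matches, update) are made up for illustration and are not part of the mongo-hadoop connector or the MongoDB driver.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Plain-Java model of update(query, {$set: fields}, upsert, multi).
// Hypothetical names; this is NOT the connector API, it only mimics the semantics.
final class SetUpdateSketch {

    // True when every key/value pair of the query is present in the document.
    static boolean matches(Map<String, Object> doc, Map<String, Object> query) {
        for (Map.Entry<String, Object> e : query.entrySet()) {
            if (!doc.containsKey(e.getKey())
                    || !Objects.equals(doc.get(e.getKey()), e.getValue())) {
                return false;
            }
        }
        return true;
    }

    // Applies the $set fields to the first match (multi = false) or to all matches (multi = true).
    // With upsert = true and no match, a new document is built from the query plus the fields.
    // Returns the number of documents that were matched or inserted.
    static int update(List<Map<String, Object>> collection, Map<String, Object> query,
                      Map<String, Object> fields, boolean upsert, boolean multi) {
        int touched = 0;
        for (Map<String, Object> doc : collection) {
            if (matches(doc, query)) {
                doc.putAll(fields);
                touched++;
                if (!multi) {
                    break;
                }
            }
        }
        if (touched == 0 && upsert) {
            Map<String, Object> created = new LinkedHashMap<>(query);
            created.putAll(fields);
            collection.add(created);
            touched = 1;
        }
        return touched;
    }

    public static void main(String[] args) {
        // The document from the question, before the update.
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", 1);
        doc.put("value", "aaa");
        doc.put("value2", null);
        List<Map<String, Object>> collection = new ArrayList<>();
        collection.add(doc);

        Map<String, Object> query = new LinkedHashMap<>();
        query.put("_id", 1);
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("value2", "bbb");

        // upsert = false, multi = true, as in the MongoUpdateWritable call above.
        update(collection, query, fields, false, true);
        System.out.println(collection);
    }
}
```

With upsert set to false, a query that matches nothing leaves the collection unchanged, which is usually what you want when the job should only enrich documents that already exist.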

I did it by extending org.apache.hadoop.mapreduce.RecordWriter and overriding its write method.

The Mongo-Hadoop Connector doesn't currently support this feature. You can open a feature request in the MongoDB Jira if you like.

I did this with Stratio. If you are using Spark, you can check it out!

Related

Differences between insert, save, update in MongoTemplate (Spring)

I am starting with Spring and MongoDB. I have seen that there are several methods to insert and / or update. I have also read some posts here explaining some concepts. But I don't quite understand them.
Correct me if I'm wrong or if things are missing.
update(): only updates an object, and only works if it has an id.
upsert(): updates the object if it exists (it must have an id) or inserts it if it does not.
insert(): does not need an id and adds a document to the collection.
save(): I don't really know how it differs from insert.
If there are more methods that work similarly and that I forgot to mention, I would appreciate if you could explain it as well.
Save
The save method saves the document to the collection for the entity type of the given object. When a collection name is passed, the document is saved in that collection, even if the entity is of a different type.
Student ram = new Student(101,"Ram",20);
mongoTemplate.save(ram);
Person newPerson = new Person(102, "Shyam");
mongoTemplate.save(newPerson, "student");
After Save
{ "_id" : 101, "name" : "Ram", "age" : 20, "_class" : "com.concretepage.entity.Student" }
{ "_id" : 102, "name" : "Shyam", "_class" : "com.concretepage.entity.Person" }
Insert
To insert a document into a MongoDB collection, MongoTemplate provides the insert method. The following code inserts one document.
Student ram = new Student(1,"Ram",20);
mongoTemplate.insert(ram);
After Insert
{ "_id" : 1, "name" : "Ram", "age" : 20, "_class" : "com.concretepage.entity.Student" }
For more information, see the link below:
https://www.concretepage.com/spring-5/spring-data-mongotemplate
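The practical difference the question asks about is what happens when a document with the same _id already exists: insert fails with a duplicate-key error, while save replaces the stored document. The plain-Java model below sketches that behaviour on an in-memory map. The class InsertVsSaveSketch and its methods are hypothetical and stand in for the collection; the sketch does not call Spring Data, and the IllegalStateException is only a stand-in for the duplicate-key exception Spring raises.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory model of a collection keyed by _id. Hypothetical names; not Spring Data code.
final class InsertVsSaveSketch {
    private final Map<Object, Map<String, Object>> byId = new LinkedHashMap<>();

    // insert: only adds. An _id that is already present is an error.
    void insert(Map<String, Object> doc) {
        Object id = doc.get("_id");
        if (byId.containsKey(id)) {
            throw new IllegalStateException("duplicate _id: " + id);
        }
        byId.put(id, new LinkedHashMap<>(doc));
    }

    // save: insert-or-replace. An existing document with the same _id is replaced as a whole.
    void save(Map<String, Object> doc) {
        byId.put(doc.get("_id"), new LinkedHashMap<>(doc));
    }

    Map<String, Object> findById(Object id) {
        return byId.get(id);
    }

    int count() {
        return byId.size();
    }

    // Builds a document shaped like the Student examples above.
    static Map<String, Object> student(int id, String name, int age) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", id);
        doc.put("name", name);
        doc.put("age", age);
        return doc;
    }

    public static void main(String[] args) {
        InsertVsSaveSketch students = new InsertVsSaveSketch();
        students.insert(student(1, "Ram", 20));
        try {
            students.insert(student(1, "Ram", 21)); // same _id again
        } catch (IllegalStateException e) {
            System.out.println("insert rejected: " + e.getMessage());
        }
        students.save(student(1, "Ram", 21)); // replaces the stored document
        System.out.println(students.findById(1));
    }
}
```

So insert is the right call when a duplicate should be treated as a bug, and save when "create or overwrite" is the intended behaviour.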

ElasticSearch Java API: Enabling fielddata on text fields

I have created an ElasticSearch index using the ElasticSearch Java API. Now I would like to perform some aggregations on data stored in this index, but I get the following error:
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [item] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
As suggested at this link, to solve this issue I should enable fielddata on the "item" text field, but how can I do that using the ElasticSearch Java API?
An alternative might be mapping the "item" field as a keyword, but same question: how can I do that with the ElasticSearch Java API?
For a new index you can set the mappings at creation time by doing something like:
CreateIndexRequest createIndexRequest = new CreateIndexRequest(indexName);
String source = // probably read the mapping from a file
createIndexRequest.source(source, XContentType.JSON);
restHighLevelClient.indices().create(createIndexRequest);
The mapping should have the same format as the request body you would send to the REST endpoint, similar to this:
{
"mappings": {
"your_type": {
"properties": {
"your_property": {
"type": "keyword"
}
}
}
}
}
Use XContentBuilder, which makes it easy to build the JSON for creating or updating a mapping. It looks like this:
.startObject("field's name")
.field("type", "text")
.field("fielddata", true)
.endObject()
Afterwards, use CreateIndexRequest to create a new index or PutMappingRequest to update an existing mapping.
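For reference, the two options from the question (text with fielddata enabled, or keyword) come down to the JSON bodies built in the sketch below. It uses plain string assembly with no Elasticsearch classes; the class MappingJsonSketch and the method propertiesFor are hypothetical names. Whether the body also needs an enclosing type name, as in the JSON example earlier, depends on the Elasticsearch version and is left out here.

```java
// Plain string assembly only; no Elasticsearch classes are used here.
// Hypothetical helper for illustration.
final class MappingJsonSketch {

    // Builds {"properties":{"<field>":{...}}} for a single field.
    // keyword = true  -> map the field as a keyword
    // keyword = false -> keep it as text and enable fielddata
    // Assumption: the field name needs no JSON escaping.
    static String propertiesFor(String field, boolean keyword) {
        String definition = keyword
                ? "{\"type\":\"keyword\"}"
                : "{\"type\":\"text\",\"fielddata\":true}";
        return "{\"properties\":{\"" + field + "\":" + definition + "}}";
    }

    public static void main(String[] args) {
        System.out.println(propertiesFor("item", false));
        System.out.println(propertiesFor("item", true));
    }
}
```

As the error message in the question warns, fielddata on a text field can use significant memory, so the keyword variant is usually the safer choice when the field is only needed for aggregations or sorting.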
I recently ran into this issue as well, in my case while sorting. I posted a working solution in a similar question: set field data = true on java elasticsearch

Spring mongo query collection on property with underscore char

I'm building a query to retrieve elements from a mongo collection, using MongoTemplate. The query criteria contains a property with an underscore, that somehow is replaced with '._', making the query always return 0 elements.
Criteria matchingCriteria = Criteria
.where("entries").elemMatch(Criteria.where("app_id").is(appId))
Looking at the logs, I can see the generated query as follows:
o.s.data.mongodb.core.MongoTemplate: find using query: { "entries" : { "$elemMatch" : { "app._id" : "5834718ab0"}}} fields: null for class: Ranking in collection: ranking
I've already tried with BasicQuery, escaping the underscore with '\\', and using the unicode escape "app\u005Fid". None of them worked. It's important to note that a collection with the name "app" exists in my database.
The behaviour doesn't look standard. When I use another property with an underscore the value is not replaced:
Criteria matchingCriteria = Criteria.where("entries").elemMatch(Criteria.where("unique_app_id").is(appId))
The logs:
o.s.data.mongodb.core.MongoTemplate find using query: { "entries" : { "$elemMatch" : { "unique_app_id" : "1131706359"}}} fields: null for class: class Ranking in collection: ranking
entries is an array of objects with the following format:
{
"instanceId" : "654ba2d16579e",
"app_id" : "583471adb0",
"unique_app_id" : "554577506",
"value" : 169
}
It's worth mentioning that the same query (without the underscore replacement) works fine in a mongo IDE (Robomongo in this case).
I'm using spring-boot-starter-data-mongodb 1.4.1.RELEASE.
I'm really out of ideas right now.
Any suggestion ?
Per section 3.4.3 of this Spring Data Commons documentation:
As we treat underscore as a reserved character we strongly advise to
follow standard Java naming conventions (i.e. not using underscores in
property names but camel case instead).
I don't believe you can use an underscore character in the middle of an element's name using Spring. Manual references are named after the referenced collection. Use the document type (collection name in singular) followed by _id ( <document>_id ). This is the only case where you can use underscore in the middle.
Update: Here is an existing pull request for the exact behavior you're seeing, as well as Spring's bug tracker for it.
From the Mongo shell, I can execute the following query with success:
> db.app.findOne({ "entries" : { "$elemMatch" : { "app_id" : "1"}}})
{
"_id" : ObjectId("58a5bc6afa8dd4ae3097d5f7"),
"name" : "Keith",
"entries" : [
{
"instanceId" : "654ba2d16579e",
"app_id" : "1"
}
]
}
So, perhaps the Spring API doesn't split when it finds multiple _ tokens when parsing a criteria, but does split for traversal when parsing one.

Using $addToSet with Java Morphia aggregation

I have a MongoDB aggregation query that works perfectly in the shell.
How can I rewrite this query to use it with Morphia?
org.mongodb.morphia.aggregation.Group.addToSet(String field) accepts only one field name, but I need to add an object to the set.
Query:
......aggregate([
{$group:
{"_id":"$subjectHash",
"authors":{$addToSet:"$fromAddress.address"},
---->> "messageDataSet":{$addToSet:{"sentDate":"$sentDate","messageId":"$_id"}},
"messageCount":{$sum:1}}},
{$sort:{....}},
{$limit:10},
{$skip:0}
])
Java code:
AggregationPipeline aggregationPipeline = myDatastore.createAggregation(Message.class)
.group("subjectHash",
grouping("authors", addToSet("fromAddress.address")),
--------??????------>> grouping("messageDataSet", ???????),
grouping("messageCount", new Accumulator("$sum", 1))
).sort(...)).limit(...).skip(...);
That's currently not supported, but if you file an issue I'd be happy to include it in an upcoming release.
Thanks for your answer. I guessed as much from the source code. :(
I don't want to use spring-data or the java-driver directly (for this project), so I changed my document representation.
I added a messageDataSet object that contains sentDate and messageId (and some other nested objects). These values are now duplicated in the document, which is bad design.
The aggregation becomes: "messageDataSet":{$addToSet:"$messageDataSet"},
and the Java code is: grouping("messageDataSet", addToSet("messageDataSet")),
This works with Morphia. Thanks.

Retrieving a Subset of Fields from Mongodb in Java

I'm trying to retrieve only a subset of fields from MongoDB using the Java driver. In the documentation I found a way to do this in JavaScript:
db.posts.find( { tags : 'tennis' }, { comments : 0 } );
The trouble is, if I do a similar thing in Java
db.getCollection("posts").find(new BasicDBObject("comments",0));
it filters for objects where "comments" == 0 and still pulls the comments field as usual.
How do I do this properly in Java?
I think you have to use it the following way:
BasicDBObject keys = new BasicDBObject();
keys.put("comments", 0);
db.getCollection("posts").find(new BasicDBObject(), keys);
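The reason the first attempt behaved like a filter is that the first argument of find is always the query; the field selection goes in the second argument. The plain-Java model below sketches that split on in-memory maps. ProjectionSketch and its find method are hypothetical and only mimic the idea: they handle equality filters and exclusion (value 0) projections, and tags is modelled as a single string rather than an array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// In-memory model of find(filter, keys). Hypothetical names; not the MongoDB driver.
final class ProjectionSketch {

    // filter: field -> required value (selects documents).
    // keys:   field -> 0 means "leave this field out of each result".
    static List<Map<String, Object>> find(List<Map<String, Object>> collection,
                                          Map<String, Object> filter,
                                          Map<String, Integer> keys) {
        List<Map<String, Object>> results = new ArrayList<>();
        for (Map<String, Object> doc : collection) {
            boolean match = true;
            for (Map.Entry<String, Object> e : filter.entrySet()) {
                if (!Objects.equals(doc.get(e.getKey()), e.getValue())) {
                    match = false;
                    break;
                }
            }
            if (!match) {
                continue;
            }
            // Copy so the stored document keeps all of its fields.
            Map<String, Object> projected = new LinkedHashMap<>(doc);
            for (Map.Entry<String, Integer> e : keys.entrySet()) {
                if (e.getValue() == 0) {
                    projected.remove(e.getKey());
                }
            }
            results.add(projected);
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Object> post = new LinkedHashMap<>();
        post.put("_id", 1);
        post.put("tags", "tennis");
        post.put("comments", Arrays.asList("first", "second"));
        List<Map<String, Object>> posts = new ArrayList<>();
        posts.add(post);

        Map<String, Object> filter = new LinkedHashMap<>();
        filter.put("tags", "tennis");
        Map<String, Integer> keys = new LinkedHashMap<>();
        keys.put("comments", 0);

        // Filter in the first argument, excluded fields in the second.
        System.out.println(find(posts, filter, keys));
    }
}
```

Passing the {comments: 0} map as the first argument of this model returns nothing, because no document has comments equal to 0, which mirrors the behaviour described in the question.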
