Using Spring Mongo is there a way of deleting multiple files from Mongo(stored with GridFS) all at once using only query or something similar? For example if I have in collection .files a field Language I want to be able to delete all entries from .files and also from .chunks in a single query. Not sure if this is possible in Mongo (outside Spring).
Tried to use GridFsTemplate but the delete method calls GridFS.remove method (code from Spring-Mongodb)
public void delete(Query query) {
getGridFs().remove(getMappedQuery(query));
}
And this method gets all files in memory and then deletes them one by one:
public void remove( DBObject query ){
for ( GridFSDBFile f : find( query ) ){
f.remove();
}
}
void remove(){
_fs._filesCollection.remove( new BasicDBObject( "_id" , _id ) );
_fs._chunkCollection.remove( new BasicDBObject( "files_id" , _id ) );
}
UPDATE: from comments:
Yes, you can't perform such kind of queries in MongoDB. Also, there is no direct way of deleting bunch of files in GridFS using both mongofiles command line tool and mongo java driver.
To remove an entire collection at once, you can perform one of these operations:
If your collection is a MongoDB collection, you can delete it with
MongoDatabase db = client.getDatabase("db");
db.getCollection("MyCollectionName").drop();
If you are not certain collection named "MyCollectionName" exists, you should check that first, as described here.
If your collection is a GridFS collection, you can delete it with
MongoDatabase db = client.getDatabase("db");
db.getCollection("MyCollectionName.files").drop();
db.getCollection("MyCollectionName.chunks").drop();
This works, since GridFS collections consist of a .files and a .chunks collection behind the screens, and those are accessible (and removable) in the 'classic' way.
DBObject query;
//write appropriate query
List<GridFSDBFile> fileList = gfs.find(query);
for (GridFSDBFile f : fileList)
{
gfs.remove(f.getFilename());
// if you have not set fileName and
// your _id is of ObjectId type, then you can use
//gfs.remove((ObjectId) file.getId());
}
Related
I am trying to find how to use mongo Atlas search indexes, from java application, which is using spring-data-mongodb to query the data, can anyone share an example for it
what i found was as code as below, but that is used for MongoDB Text search, though it is working, but not sure whether it is using Atlas search defined index.
TextQuery textQuery = TextQuery.queryText(new TextCriteria().matchingAny(text)).sortByScore();
textQuery.fields().include("cast").include("title").include("id");
List<Movies> movies = mongoOperations
.find(textQuery, Movies.class);
I want smaple java code using spring-data-mongodb for below query:
[
{
$search: {
index: 'cast-fullplot',
text: {
query: 'sandeep',
path: {
'wildcard': '*'
}
}
}
}
]
It will be helpful if anyone can explain how MongoDB Text Search is different from Mongo Atlas Search and correct way of using Atalas Search with the help of java spring-data-mongodb.
How to code below with spring-data-mongodb:
Arrays.asList(new Document("$search",
new Document("index", "cast-fullplot")
.append("text",
new Document("query", "sandeep")
.append("path",
new Document("wildcard", "*")))),
new Document())
Yes, spring-data-mongo supports the aggregation pipeline, which you'll use to execute your query.
You need to define a document list, with the steps defined in your query, in the correct order. Atlas Search must be the first step in the pipeline, as it stands. You can translate your query to the aggregation pipeline using the Mongo Atlas interface, they have an option to export the pipeline array in the language of your choosing. Then, you just need to execute the query and map the list of responses to your entity class.
You can see an example below:
public class SearchRepositoryImpl implements SearchRepositoryCustom {
private final MongoClient mongoClient;
public SearchRepositoryImpl(MongoClient mongoClient) {
this.mongoClient = mongoClient;
}
#Override
public List<SearchEntity> searchByFilter(String text) {
// You can add codec configuration in your database object. This might be needed to map
// your object to the mongodb data
MongoDatabase database = mongoClient.getDatabase("aggregation");
MongoCollection<Document> collection = database.getCollection("restaurants");
List<Document> pipeline = List.of(new Document("$search", new Document("index", "default2")
.append("text", new Document("query", "Many people").append("path", new Document("wildcard", "*")))));
List<SearchEntity> searchEntityList = new ArrayList<>();
collection.aggregate(pipeline, SearchEntity.class).forEach(searchEntityList::add);
return searchEntityList;
}
}
I have created a MongoDB Collection using the following code, I want to update this collection add a new column named "username" and also want to change the data type of roleId from String to Long. Please advice how to do this in Java Mongo API
fun createUsers(mongoDB: MongoDatabase) {
var collOptions: ValidationOptions = ValidationOptions()
collOptions.validator(
Filters.and(
Filters.exists("userId"),
Filters.type("userId",BsonType.STRING),
Filters.exists("roleId"),
Filters.type("roleId", BsonType.INT32)
)
)
collOptions.validationLevel(ValidationLevel.STRICT)
collOptions.validationAction(ValidationAction.ERROR)
mongoDB.createCollection("user", CreateCollectionOptions().validationOptions(collOptions))
}
I'm trying to get the objectId of an object that I have updated - this is my java code using the java driver:
Query query = new Query();
query.addCriteria(Criteria.where("color").is("pink"));
Update update = new Update();
update.set("name", name);
WriteResult writeResult = mongoTemplate.updateFirst(query, update, Colors.class);
Log.e("object id", writeResult.getUpsertedId().toString());
The log message returns null. I'm using a mongo server 3.0 on mongolab as I'm on the free tier so it shouldn't return null. My mongo shell is also:
MongoDB shell version: 3.0.7
Is there an easy way to return the object ID for the doc that I have just updated? What is the point of the method getUpsertedId() if I cannot return the upsertedId?
To do what I want, I currently have to issue two queries which is highly cumbersome:
//1st query - updating the object first
Query query = new Query();
query.addCriteria(Criteria.where("color").is("pink"));
Update update = new Update();
update.set("name", name);
WriteResult writeResult = mongoTemplate.updateFirst(query, update, Colors.class);
//2nd query - find the object so that I can get its objectid
Query queryColor = new Query();
queryColor.addCriteria(Criteria.where("color").is("pink"));
queryColor.addCriteria(Criteria.where("name").is(name));
Color color = mongoTemplate.findOne(queryColor, Color.class);
Log.e("ColorId", color.getId());
As per David's answer, I even tried his suggestion to rather use upsert on the template, so I changed the code to the below and it still does not work:
Query query = new Query();
query.addCriteria(Criteria.where("color").is("pink"));
Update update = new Update();
update.set("name", name);
WriteResult writeResult = mongoTemplate.upsert(query, update, Colors.class);
Log.e("object id", writeResult.getUpsertedId().toString());
Simon, I think its possible to achieve in one query. What you need is a different method called findAndModify().
In java driver for mongoDB, it has a method called findOneAndUpdate(filter, update, options).
This method returns the document that was updated. Depending on the options you specified for the method, this will either be the document as it was before the update or as it is after the update. If no documents matched the query filter, then null will be returned. Its not required to pass options, in that case it will return the document that was updated before update operation was applied.
A quick look at the mongoTemplate java driver docs here: http://docs.spring.io/spring-data/mongodb/docs/current/api/org/springframework/data/mongodb/core/FindAndModifyOptions.html tells me that you can use the method call:
public <T> T findAndModify(Query query,
Update update,
FindAndModifyOptions options,
Class<T> entityClass)
You can also change the FindAndModifyOptions class to take on an 'upsert' if the item was not found in the query.If it is found, the object will just be modified.
Upsert only applies if both
The update options had upsert on
A new document was actually created.
Your query neither has upsert enabled, nor creates a new document. Therefore it makes perfect sense that the getUpsertedId() returns null here.
Unfortunately it is not possible to get what you want in a single call with the current API; you need to split it into two calls. This is further indicated by the Mongo shell API for WriteResults:
The _id of the document inserted by an upsert. Returned only if an
upsert results in an insert.
This is an example to do this with findOneAndUpdate(filter,update,options) in Scala:
val findOneAndUpdateOptions = new FindOneAndUpdateOptions
findOneAndUpdateOptions.returnDocument(ReturnDocument.AFTER)
val filter = Document.parse("{\"idContrato\":\"12345\"}")
val update = Document.parse("{ $set: {\"descripcion\": \"New Description\" } }")
val response = collection.findOneAndUpdate(filter,update,findOneAndUpdateOptions)
println(response)
I would like to update a specific collection in MongoDb via Spark in Java.
I am using the MongoDB Connector for Hadoop to retrieve and save information from Apache Spark to MongoDb in Java.
After following Sampo Niskanen's excellent post regarding retrieving and saving collections to MongoDb via Spark, I got stuck with updating collections.
MongoOutputFormat.java includes a constructor taking String[] updateKeys, which I am guessing is referring to a possible list of keys to compare on existing collections and perform an update. However, using Spark's saveAsNewApiHadoopFile() method with parameter MongoOutputFormat.class, I am wondering how to use that update constructor.
save.saveAsNewAPIHadoopFile("file:///bogus", Object.class, Object.class, MongoOutputFormat.class, config);
Prior to this, MongoUpdateWritable.java was being used to perform collection updates. From examples I've seen on Hadoop, this is normally set on mongo.job.output.value, maybe like this in Spark:
save.saveAsNewAPIHadoopFile("file:///bogus", Object.class, MongoUpdateWritable.class, MongoOutputFormat.class, config);
However, I'm still wondering how to specify the update keys in MongoUpdateWritable.java.
Admittedly, as a hacky way, I've set the "_id" of the object as my document's KeyValue so that when a save is performed, the collection will overwrite the documents having the same KeyValue as _id.
JavaPairRDD<BSONObject,?> analyticsResult; //JavaPairRdd of (mongoObject,result)
JavaPairRDD<Object, BSONObject> save = analyticsResult.mapToPair(s -> {
BSONObject o = (BSONObject) s._1;
//for all keys, set _id to key:value_
String id = "";
for (String key : o.keySet()){
id += key + ":" + (String) o.get(key) + "_";
}
o.put("_id", id);
o.put("result", s._2);
return new Tuple2<>(null, o);
});
save.saveAsNewAPIHadoopFile("file:///bogus", Object.class, Object.class, MongoOutputFormat.class, config);
I would like to perform the mongodb collection update via Spark using MongoOutputFormat or MongoUpdateWritable or Configuration, ideally using the saveAsNewAPIHadoopFile() method. Is it possible? If not, is there any other way that does not involve specifically setting the _id to the key values I want to update on?
I tried several combination of config.set("mongo.job.output.value","....") and several combination of
.saveAsNewAPIHadoopFile(
"file:///bogus",
classOf[Any],
classOf[Any],
classOf[com.mongodb.hadoop.MongoOutputFormat[Any, Any]],
mongo_config
)
and none of them worked.
I made it to work by using MongoUpdateWritable class as output of my map method:
items.map(row => {
val mongo_id = new ObjectId(row("id").toString)
val query = new BasicBSONObject()
query.append("_id", mongo_id)
val update = new BasicBSONObject()
update.append("$set", new BasicBSONObject().append("field_name", row("new_value")))
val muw = new MongoUpdateWritable(query,update,false,true)
(null, muw)
})
.saveAsNewAPIHadoopFile(
"file:///bogus",
classOf[Any],
classOf[Any],
classOf[com.mongodb.hadoop.MongoOutputFormat[Any, Any]],
mongo_config
)
The raw query executed in mongo is then something like this:
2014-11-09T13:32:11.609-0800 [conn438] update db.users query: { _id: ObjectId('5436edd3e4b051de6a505af9') } update: { $set: { value: 10 } } nMatched:1 nModified:0 keyUpdates:0 numYields:0 locks(micros) w:24 3ms
Trying to use a similar example from the sample code found here
My sample function is:
void query()
{
String nodeResult = "";
String rows = "";
String resultString;
String columnsString;
System.out.println("In query");
// START SNIPPET: execute
ExecutionEngine engine = new ExecutionEngine( graphDb );
ExecutionResult result;
try ( Transaction ignored = graphDb.beginTx() )
{
result = engine.execute( "start n=node(*) where n.Name =~ '.*79.*' return n, n.Name" );
// END SNIPPET: execute
// START SNIPPET: items
Iterator<Node> n_column = result.columnAs( "n" );
for ( Node node : IteratorUtil.asIterable( n_column ) )
{
// note: we're grabbing the name property from the node,
// not from the n.name in this case.
nodeResult = node + ": " + node.getProperty( "Name" );
System.out.println("In for loop");
System.out.println(nodeResult);
}
// END SNIPPET: items
// START SNIPPET: columns
List<String> columns = result.columns();
// END SNIPPET: columns
// the result is now empty, get a new one
result = engine.execute( "start n=node(*) where n.Name =~ '.*79.*' return n, n.Name" );
// START SNIPPET: rows
for ( Map<String, Object> row : result )
{
for ( Entry<String, Object> column : row.entrySet() )
{
rows += column.getKey() + ": " + column.getValue() + "; ";
System.out.println("nested");
}
rows += "\n";
}
// END SNIPPET: rows
resultString = engine.execute( "start n=node(*) where n.Name =~ '.*79.*' return n.Name" ).dumpToString();
columnsString = columns.toString();
System.out.println(rows);
System.out.println(resultString);
System.out.println(columnsString);
System.out.println("leaving");
}
}
When I run this in the web console I get many results (as there are multiple nodes that have an attribute of Name that contains the pattern 79. Yet running this code returns no results. The debug print statements 'in loop' and 'nested' never print either. Thus this must mean there are not results found in the Iterator, yet that doesn't make sense.
And yes, I already checked and made sure that the graphDb variable is the same as the path for the web console. I have other code earlier that uses the same variable to write to the database.
EDIT - More info
If I place the contents of query in the same function that creates my data, I get the correct results. If I run the query by itself it returns nothing. It's almost as the query works only in the instance where I add the data and not if I come back to the database cold in a separate instance.
EDIT2 -
Here is a snippet of code that shows the bigger context of how it is being called and sharing the same DBHandle
package ContextEngine;
import ContextEngine.NeoHandle;
import java.util.LinkedList;
/*
* Class to handle streaming data from any coded source
*/
public class Streamer {
private NeoHandle myHandle;
private String contextType;
Streamer()
{
}
public void openStream(String contextType)
{
myHandle = new NeoHandle();
myHandle.createDb();
}
public void streamInput(String dataLine)
{
Context context = new Context();
/*
* get database instance
* write to database
* check for errors
* report errors & success
*/
System.out.println(dataLine);
//apply rules to data (make ContextRules do this, send type and string of data)
ContextRules contextRules = new ContextRules();
context = contextRules.processContextRules("Calls", dataLine);
//write data (using linked list from contextRules)
NeoProcessor processor = new NeoProcessor(myHandle);
processor.processContextData(context);
}
public void runQuery()
{
NeoProcessor processor = new NeoProcessor(myHandle);
processor.query();
}
public void closeStream()
{
/*
* close database instance
*/
myHandle.shutDown();
}
}
Now, if I call streamInput AND query in in the same instance (parent calls) the query returns results. If I only call query and do not enter ANY data in that instance (yet web console shows data for same query) I get nothing. Why would I have to create the Nodes and enter them into the database at runtime just to return a valid query. Shouldn't I ALWAYS get the same results with such a query?
You mention that you are using the Neo4j Browser, which comes with Neo4j. However, the example you posted is for Neo4j Embedded, which is the in-process version of Neo4j. Are you sure you are talking to the same database when you try your query in the Browser?
In order to talk to Neo4j Server from Java, I'd recommend looking at the Neo4j JDBC driver, which has good support for connecting to the Neo4j server from Java.
http://www.neo4j.org/develop/tools/jdbc
You can set up a simple connection by adding the Neo4j JDBC jar to your classpath, available here: https://github.com/neo4j-contrib/neo4j-jdbc/releases Then just use Neo4j as any JDBC driver:
Connection conn = DriverManager.getConnection("jdbc:neo4j://localhost:7474/");
ResultSet rs = conn.executeQuery("start n=node({id}) return id(n) as id", map("id", id));
while(rs.next()) {
System.out.println(rs.getLong("id"));
}
Refer to the JDBC documentation for more advanced usage.
To answer your question on why the data is not durably stored, it may be one of many reasons. I would attempt to incrementally scale back the complexity of the code to try and locate the culprit. For instance, until you've found your problem, do these one at a time:
Instead of looping through the result, print it using System.out.println(result.dumpToString());
Instead of the regex query, try just MATCH (n) RETURN n, to return all data in the database
Make sure the data you are seeing in the browser is not "old" data inserted earlier on, but really is an insert from your latest run of the Java program. You can verify this by deleting the data via the browser before running the Java program using MATCH (n) OPTIONAL MATCH (n)-[r]->() DELETE n,r;
Make sure you are actually working against the same database directories. You can verify this by leaving the server running. If you can still start your java program, unless your Java program is using the Neo4j REST Bindings, you are not using the same directory. Two Neo4j databases cannot run against the same database directory simultaneously.