I've decided to move one of our projects from PostgreSQL to MongoDB; the project deals with images. I can save images and retrieve them by their _id now, but I couldn't find a GridFSOperations method that would let me safely fetch all documents. I need this so I can take the photo metadata saved with each image and index it with Lucene (I need full-text search on some of the relevant metadata, and we may also need to rebuild the Lucene index in the future).
In the old code, I simply had a function with an offset and limit for the SQL query, because I found out (the hard way) that our dev system can only do bulk Lucene adds in groups of 5k. Is there an equivalent way of doing this with GridFS?
Edit:
function inherited from the old interface:
public List<Photo> getPublicPhotosForReindexing(long offset, long limit) {
    List<Photo> result = new ArrayList<>();
    List<GridFSDBFile> files = gridFsOperations.find(new Query().limit((int) limit).skip((int) offset));
    for (GridFSDBFile file : files) {
        result.add(convertToPhoto(file));
    }
    return result;
}
a simple converter taking parts of the metadata and putting it in the POJO I made:
private Photo convertToPhoto(GridFSDBFile fsFile) {
    Photo resultPhoto = new Photo(fsFile.getId().toString());
    try {
        resultPhoto
            .setOriginalFilename(fsFile.getFilename())
            // .setPhotoData(IOUtils.toByteArray(fsFile.getInputStream()))
            .setDateAdded(fsFile.getUploadDate());
    } catch (Exception e) {
        logger.error("Should not hit this one", e);
    }
    return resultPhoto;
}
When you use GridFS, the information is stored in your MongoDB database in two collections. The first is fs.files, which holds the main reference to the file; the second is fs.chunks, which holds the actual "chunks" of data. See the examples:
Collection: fs.files
{
    "_id" : ObjectId("53229d20f3dde871df8b89a7"),
    "filename" : "receptor.jpg",
    "chunkSize" : 262144,
    "uploadDate" : ISODate("2014-03-14T06:09:36.462Z"),
    "md5" : "f1e71af6d0ba9c517280f33b4cbab3f9",
    "length" : 138905
}
Collection: fs.chunks
{
    "_id" : ObjectId("53229d20824b12efe88cc1f2"),
    "files_id" : ObjectId("53229d20f3dde871df8b89a7"),
    "n" : 0,
    "data" : // all of the binary data
}
So really these are just normal MongoDB documents and normal collections.
As you can see, there are various ways you can "query" these collections with the standard API:
The ObjectId is monotonic and therefore ever increasing: newer entries always have a higher ObjectId value than older ones. Most importantly, you can record the last _id that was indexed and query for everything greater than it.
The uploadDate field also holds a general timestamp, so you can use it for date-range based queries.
So you see that GridFS is really just "driver-level magic" for working with ordinary MongoDB documents while treating the binary data as a single logical file.
Since these are just normal collections with normal documents, unless you are retrieving or updating the binary content itself, you can use the normal methods to select and find.
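For example, here is a minimal sketch of a batched fetch keyed on _id instead of skip/limit, assuming the gridFsOperations bean and convertToPhoto helper from your question (and that your Spring Data version honors sort and limit on GridFS queries, as your skip/limit code implies):

// Page through fs.files by _id range: pass null for the first batch,
// then the highest _id seen so far for each subsequent batch.
public List<Photo> getPhotoBatch(ObjectId lastIndexedId, int batchSize) {
    Query query = new Query();
    if (lastIndexedId != null) {
        // only files newer than the last one indexed
        query.addCriteria(Criteria.where("_id").gt(lastIndexedId));
    }
    query.with(new Sort(Sort.Direction.ASC, "_id")).limit(batchSize);

    List<Photo> result = new ArrayList<>();
    for (GridFSDBFile file : gridFsOperations.find(query)) {
        result.add(convertToPhoto(file));
    }
    return result;
}

Unlike skip(), an _id range query stays cheap however deep into the collection you are, because it walks the _id index; it also gives you a natural resume point for rebuilding the Lucene index in 5k batches.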
Related
I'm trying to filter the data from my database using this code:
fdb.orderByChild("title").startAt(searchquery).endAt(searchquery+"\uf8ff").addValueEventListener(valuelistener2);
My database is like this:
"g12" : {
"Books" : {
"-Mi_He4vHXOuKHNL7yeU" : {
"title" : "Technical Sciences P1"
},
"-Mi_He50tUPTN9XDiVow" : {
"title" : "Life Sciences"
},
"-Mi_He51dhQfl3RAjysQ" : {
"title" : "Technical Sciences P2"
}}
While the code works, it only returns the first value that matches the query and doesn't fetch the rest of the data, even though it matches.
If I put "T" as my search query, I only get the first title, "Technical Sciences P1", and not the other one with P2.
(Sorry for the vague and common question title, it's just I've been looking for a solution for so long)
While the code works, it only returns the first value that matches the query
That's the expected behavior, since the Firebase Realtime Database does not support native indexing or full-text search on string properties.
When you are using the following query:
fdb.orderByChild("title").startAt(searchquery).endAt(searchquery+"\uf8ff")
It means that you are trying to get all elements that start with searchquery. For example, if you have a title called "Don Quixote" and you search for "Don", your query will return the correct results. However, searching for "Quix" will yield no results.
You might consider downloading the entire node and searching the fields client-side, but this solution isn't practical for anything but small data sets. To enable full-text search of your Firebase Realtime Database data, I recommend using a third-party search service like Algolia or Elasticsearch.
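For completeness, a sketch of that client-side approach, reusing the fdb reference and searchquery variable from your code (again, only workable for small nodes):

// Download the node once, then do a "contains" match locally.
// This reads ALL children, so it is only viable for small data sets.
fdb.addListenerForSingleValueEvent(new ValueEventListener() {
    @Override
    public void onDataChange(DataSnapshot snapshot) {
        List<String> matches = new ArrayList<>();
        for (DataSnapshot child : snapshot.getChildren()) {
            String title = child.child("title").getValue(String.class);
            if (title != null && title.toLowerCase().contains(searchquery.toLowerCase())) {
                matches.add(title);
            }
        }
        // ... hand matches to the UI ...
    }

    @Override
    public void onCancelled(DatabaseError error) {
        // handle/log the error
    }
});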
If at some point you consider trying Cloud Firestore, please see the following example:
Is it possible to use Algolia query in FirestoreRecyclerOptions?
It shows the approach with Cloud Firestore, but you can use it in the same way with the Firebase Realtime Database.
All of the examples I'm finding online have very simple document IDs, but what do you do if you're auto-generating all your IDs (as the docs say you should)? For example, I want to query the date when a user was created. The document ID for this is below, but I've just copy-pasted it from the Firestore console. How would I know the document ID so that I can query any user's info? Note that I will have users, groups, usergroups, and so on. There will be quite a few collections, each using the auto-ID feature, and I need to be able to update any row in any collection.
val docRef = db.collection("users").document("9AjpkmJdAdFScZaeo8FV45DT54E")
docRef.get()
    .addOnSuccessListener { document ->
        if (document != null) {
            Log.e("Query", "Data: ${document.data}")
        } else {
            Log.e("Query", "Document is null")
        }
    }
    .addOnFailureListener { exception ->
        Log.e("Query", "Failure")
    }
If you have data to query, that should all be stored as fields in the documents. Don't put that data in the ID of the documents - use field values.
You can filter documents in a collection using "where" clauses, as shown in the documentation. What you're showing here isn't enough to make specific recommendations, but you definitely want to think about your queries ahead of time and model your data to suit them.
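As a sketch, if each user document stored its creation time in a field (createdAt here is a hypothetical field name), the "when was this user created" lookup becomes an ordinary where clause, with no document ID needed:

// Hypothetical: store creation time as a createdAt field on each user
// document, then filter on it instead of the auto-generated document ID.
FirebaseFirestore db = FirebaseFirestore.getInstance();
db.collection("users")
        .whereGreaterThan("createdAt", new Date(0)) // some cutoff timestamp
        .get()
        .addOnSuccessListener(snapshot -> {
            for (DocumentSnapshot doc : snapshot.getDocuments()) {
                Log.d("Query", doc.getId() + " => " + doc.getData());
            }
        });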
If you need to update a document, you must first query for it, then update what you find in the query results. This is extremely common, as Firestore does not provide any SQL-like "update where" query that both locates and updates data in the same command.
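A minimal sketch of that query-then-update pattern, with hypothetical field names:

// Find the matching documents, then update each one via its reference.
db.collection("users")
        .whereEqualTo("email", "jane@example.com") // hypothetical field
        .get()
        .addOnSuccessListener(snapshot -> {
            for (DocumentSnapshot doc : snapshot.getDocuments()) {
                doc.getReference().update("active", true);
            }
        });

Each update here is a separate write; if they must succeed or fail together, a batched write or transaction is the usual tool.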
I have a Firestore db structure like so;
How can I query all documents under the auctions collection that have a given cargoOwnerId?
I am using com.google.firebase:firebase-firestore:20.2.0 in my project.
I have looked through similar issues here on Stack Overflow, but without success.
I have also read Google's documentation about doing this here, but nothing seems to work when I run the code below:
FirebaseFirestore.getInstance()
        .collection("auctions")
        .whereEqualTo("cargoOwnerId", "ZkYu6H6ObiTrFSX5uqNb7lWU7KG3")
        .get()
        .addOnCompleteListener(task -> {
            Log.d("Logging", "Size: " + task.getResult().size());
        });
I expect it to return a list of all the documents that contain ZkYu6H6ObiTrFSX5uqNb7lWU7KG3 as the cargoOwnerId, but it returns nothing at all; the size is 0.
Am I missing something?
The way your document is structured, what you're trying to do is not possible. You can't query by map properties unless you also know the name of the map property, and that name would have to be consistent across all your documents.
At the top level of your document you apparently have a single field, with the same ID as the document, which is a map. It's not clear why you want a single map field in your document. It seems that you don't want a single map at all, and instead want the fields of that map to be document fields. That would allow you to perform the query you're asking about. Perhaps you made a mistake in populating the document.
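For illustration, a sketch of writing an auction with cargoOwnerId as a top-level field (the other field name is hypothetical), which would make the whereEqualTo query from the question work as written:

// Store cargoOwnerId as a top-level document field, not nested in a map.
Map<String, Object> auction = new HashMap<>();
auction.put("cargoOwnerId", "ZkYu6H6ObiTrFSX5uqNb7lWU7KG3");
auction.put("cargoDescription", "..."); // hypothetical extra field
FirebaseFirestore.getInstance()
        .collection("auctions")
        .add(auction);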
This is possible.
Try reading the result in an addOnSuccessListener, where the returned QuerySnapshot exposes size() directly, instead of calling task.getResult().size().
Example:
db.collection("auctions")
.whereEqualTo("cargoOwnerId","ZkYu6H6ObiTrFSX5uqNb7lWU7KG3")
.get()
.addOnSuccessListener { documents ->
Log.d("Logging", "Size: " + documents.size());
for (document in documents) {
Log.d(TAG, "${document.id} => ${document.data}")
}
}
.addOnFailureListener { exception ->
Log.w(TAG, "Error getting documents: ", exception)
}
Is there a way to get records which match a query partially in Solr?
For q="java enterprise", among the records below,
{
    "name": "java",
    "case": "enterprise"
},
{
    "name": "java enterprise",
    "case": "enterprise"
}
I want to fetch only those records which have java and enterprise mentioned separately and not together, i.e. only the record below should come into my result:
{
    "name": "java",
    "case": "enterprise"
}
Is there a way to search for only those records and eliminate from the results the documents that have an exact match?
You don't need to use an exact phrase match; instead, you can use a boolean query in that case:
(name:"java" AND case:"enterprise") OR (name:"enterprise" AND case:"java")
I'm having a problem with MongoDB in Java when adding documents with a customized _id field: when I insert a new document into a collection, I want the document to be ignored if its _id already exists.
In the Mongo shell, collection.save() can be used in this case, but I cannot find the equivalent method in the MongoDB Java driver.
Just to add an example:
I have a collection of documents containing websites' information, with the URL as the _id field (which is unique).
I want to add some more documents. Some of the new ones may already exist in the current collection, so I want to keep adding all the new documents except for the duplicates.
This can be achieved with collection.save() in the Mongo shell, but I can't find the equivalent method in the MongoDB Java driver.
Hopefully someone can share the solution. Thanks in advance!
In the MongoDB Java driver, you could try using a BulkWriteOperation obtained from the initializeUnorderedBulkOperation() method of the DBCollection object (the one that contains your collection); unlike an ordered bulk operation, an unordered one carries on past individual errors. It is used as follows:
MongoClient mongo = new MongoClient("localhost", port_number);
DB db = mongo.getDB("db_name");
DBCollection col = db.getCollection("collection_name");

ArrayList<DBObject> objectList; // fill this list with the objects to insert

// Unordered, so a duplicate-key error on one insert doesn't stop the rest
BulkWriteOperation operation = col.initializeUnorderedBulkOperation();
for (int i = 0; i < objectList.size(); i++) {
    operation.insert(objectList.get(i));
}
BulkWriteResult result = operation.execute();
With this method, errors are reported per insert, so documents with a duplicate _id will produce an error as usual while the remaining documents are still inserted. If any duplicates are hit, execute() throws a BulkWriteException; its getWriteResult() still gives you the BulkWriteResult, whose getInsertedCount() tells you how many documents were really inserted.
This can prove to be a bit inefficient if lots of data is inserted this way, though. This is just sample code (found on journaldev.com and edited to fit your situation); you may need to adapt it to your configuration, and it is untested.
I guess save does something like this:
fun save(doc: Document, col: MongoCollection<Document>) {
    if (doc.getObjectId("_id") == null) {
        doc.put("_id", ObjectId()) // generate a new id if none is set
    }
    // Upsert: replace the existing document, or insert if it doesn't exist
    col.replaceOne(
        Document("_id", doc.getObjectId("_id")),
        doc,
        ReplaceOptions().upsert(true)
    )
}
Maybe they removed save so that you can decide how to generate the new id yourself.