Reading all the documents from a bucket - java

Is there a way to read all the documents from a bucket? It is an active bucket, and I want to access newly created documents as well.
A few people suggested using a view to query against the bucket. How can I create a view that will be updated with new or updated documents?
The newly created view's map function is:
function (doc, meta) {
  emit(doc);
}
The reduce function is empty. When I query the view with bucket.query(ViewQuery.from("test1", "all")).totalRows(), it returns 0 results.

On the zero-results issue: did you promote the view to a production view? This is a common mistake. Development views only look at a small subset of the data so as not to overwhelm the server. Try this first.
Also, never emit the entire document if you can help it, especially if you are looking over all documents in a bucket. You want to emit the IDs of the documents and then if you need to get the content of those objects, do a get operation or bulk operation. I would give you a direct link for the bulk operations, but you have not said what SDK you are using and those are SDK specific. Here is the one for Java, for example.
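For example, here is a rough sketch against the Java SDK 2.x API the question is already using, assuming the map function is changed to emit only the ID (e.g. emit(meta.id, null)):
ViewResult result = bucket.query(ViewQuery.from("test1", "all"));
for (ViewRow row : result.allRows()) {
    JsonDocument doc = bucket.get(row.id());   // fetch the full document by its ID
    if (doc != null) {
        // work with doc.content()
    }
}
For large result sets you would replace the per-row get with the SDK's bulk/async get pattern.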
All that being said, I have questions about why you are doing the equivalent of SELECT * FROM bucket. What are you planning to do with this data once you have it? What are you really trying to do? There are lots of options on how to solve this, of course.

A view is just a predefined query over a bucket. New or changed documents will be shown in the view.
You can check the results of your View when you create it by clicking the Show Results button in the Web UI, so if 0 documents show up there, it should be no surprise you get 0 from the SDK.
If you are running Couchbase Server 4+ and the latest SDK, you could use N1QL and create a primary index on your bucket, then do a regular SELECT * FROM bucket to get all the documents.
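For instance, a rough sketch with the Java SDK 2.x N1QL API ("myBucket" is a placeholder bucket name):
bucket.query(N1qlQuery.simple("CREATE PRIMARY INDEX ON `myBucket`"));
N1qlQueryResult all = bucket.query(N1qlQuery.simple("SELECT * FROM `myBucket`"));
for (N1qlQueryRow row : all.allRows()) {
    // row.value() holds the JSON of one document
}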

Related

Delete firestore document based on rule

I created an app where people can upload some images of themselves.
Now, in order to deal with cases where people upload inappropriate images, I created a reporting system. Basically, every time someone reports an image, the ID of the reporting user is added to an array in Firestore like this:
db.collection("Reports")
  .document(ImageID)
  .update("Reports", FieldValue.arrayUnion(UID));
Now I want to set up a rule so that if, for example, the size of the array reaches 5 (5 different people report the image), the image is automatically deleted from the cloud.
Is there any way to do this without reading the array and checking its size every time?
Thank you
You can create a trigger on your Reports collection.
export const updateReportTrigger = functions.firestore
  .document('Reports/{ImageID}')
  .onUpdate(onUpdate)

async function onUpdate({ before, after }, context) {
  const newData = after.data()
  if (newData && newData.Reports && newData.Reports.length >= 5) {
    // put your delete logic here (>= 5 matches "delete once 5 people have reported")
    // you can access the document id through context.params.ImageID
  }
}
https://firebase.google.com/docs/functions/firestore-events
The most common way to do that would be through Cloud Functions, which are server-side code that is automatically triggered when (for this use-case) something is written to Firestore. See the documentation on Cloud Functions and the page on Firestore triggers.
An alternative (but more involved) approach would be to secure access through a query and security rules. In this scenario you'd:
Add a reportCount field to the document, and ensure it gets updated in sync with the reports.
From the application code, use a query to only request images with a reportCount less than 5 (a sketch of such a query follows below).
Use security rules to only allow queries with that clause in them.
As said, this is more involved in the amount of code you write, but the advantage is that no server-side code is involved and that documents are immediately blocked once too many reports come in for them.
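A rough sketch of that client-side query with the Android Firestore SDK; "Images" and "reportCount" are assumed collection/field names:
FirebaseFirestore db = FirebaseFirestore.getInstance();
db.collection("Images")
  .whereLessThan("reportCount", 5)
  .get()
  .addOnSuccessListener(snapshot -> {
      for (DocumentSnapshot image : snapshot.getDocuments()) {
          // show only images that have fewer than 5 reports
      }
  });
The security rules then have to reject any read that is not constrained to reportCount < 5, so a client cannot simply drop the where() clause.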

Is there an easy way to get the Nth page of items from DynamoDB in Java?

I am working on a web app backed by Amazon DynamoDB,
and I want my users to be able to jump directly to the Nth page to view item info.
I have been told that pagination in DynamoDB is based on the last evaluated key rather than limit/offset; it doesn't natively support offset (see DynamoDB Scan / Query Pagination).
Does that mean that if I want to get to the 10th page of items, I have to query the 9 pages before it first? (Which really doesn't seem like a good solution.)
Is there an easier way to do that?
You are right, DynamoDB doesn't support a numerical offset. The only way to paginate is to take the LastEvaluatedKey from one response and pass it as ExclusiveStartKey on the next request. You still have some good options to achieve pagination using a page number.
Fast Cursor
You can make fast pagination requests by discarding the full result and getting only the keys. You are limited to 1MB per request, but that represents a large number of keys! Using this, you can move your cursor to the required position and start reading full objects.
This solution is acceptable for small/medium datasets. You will run into performance and cost issues on large datasets.
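A rough sketch with the AWS SDK for Java v1; the table name "Items", the key attribute "id", and the pageSize/desiredPage variables are all placeholders:
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
Map<String, AttributeValue> startKey = null;
int page = 0;
do {
    ScanRequest keysOnly = new ScanRequest()
            .withTableName("Items")
            .withProjectionExpression("id")      // keys only, not the full items
            .withLimit(pageSize)
            .withExclusiveStartKey(startKey);
    ScanResult result = client.scan(keysOnly);
    page++;
    startKey = result.getLastEvaluatedKey();     // cursor pointing at the next page
} while (startKey != null && page < desiredPage);
// Once positioned, scan again from startKey without the projection to read the
// full objects of the desired page.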
Numerical index
You can also create a global secondary index on which you paginate your dataset. For example, you can add an offset property to all your objects and query this global index directly to get the desired page.
Obviously this only works if you don't use any custom filter... and you have to maintain this value when inserting/deleting/updating objects. So this solution is only good if you have an 'append only' dataset.
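A rough sketch of querying such an index with the v1 SDK (reusing the client from the previous sketch); the index name, its keys and the page arithmetic are all assumptions layered on the idea above:
QueryRequest pageQuery = new QueryRequest()
        .withTableName("Items")
        .withIndexName("offset-index")           // GSI: constant hash key + numeric "offset" range key
        .withKeyConditionExpression("gsiPk = :all AND #o BETWEEN :from AND :to")
        .withExpressionAttributeNames(Collections.singletonMap("#o", "offset"))
        .withExpressionAttributeValues(Map.of(
                ":all",  new AttributeValue().withS("ALL"),
                ":from", new AttributeValue().withN("90"),   // page 10 with a page size of 10
                ":to",   new AttributeValue().withN("99")));
QueryResult tenthPage = client.query(pageQuery);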
Cached Cursor
This solution is built on the first one, but instead of fetching the keys every single time, you can cache the page positions and reuse them for later requests. Cache tools like Redis or Memcached can help you achieve that (a sketch follows below):
You check the cache to see if the pages are already calculated.
If not, you scan your dataset getting only the keys, and store the starting key of each page in your cache.
You request the desired page to fetch the full objects.
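A rough sketch of the cache lookup with Jedis; the Redis key names and the serialize step are assumptions:
Jedis redis = new Jedis("localhost");
String cachedStartKey = redis.hget("items:pageStartKeys", String.valueOf(desiredPage));
if (cachedStartKey == null) {
    // Not cached yet: run the keys-only scan from the first sketch and store the
    // serialized ExclusiveStartKey of every page it walks past, e.g.
    // redis.hset("items:pageStartKeys", String.valueOf(page), serialize(startKey));
}
// Then scan/query from the (deserialized) start key to fetch the full objects of that page.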
Choose the solution that fits your needs. I hope this will help you :)

XPages: Navigating around a document collection

I create a document collection and am able to put the doc ID of the second doc in the first doc, the third in the second, and so on until the last document. This lets me navigate from the first document to the second when the user approves a job, and so on. But I also want to be able to go from the second document back to the first when the user rejects the task, and I have not been able to store the doc ID of the first document in the second one. Below is the code I am currently using:
Document nextJob = null;
Document thisJob = null;
DocumentCollection col = lookup.getAllDocumentsByKey(ID, true);
if (col != null) {
    Document job = col.getFirstDocument();
    while (job != null) {
        thisJob = job;
        thisJob.replaceItemValue("DocID", thisJob.getUniversalID());
        thisJob.save(true);
        if (nextJob != null) {
            nextJob.replaceItemValue("TaskSuccessor", thisJob.getUniversalID());
            nextJob.save(true);
        }
        nextJob = thisJob;
        Document tmpDoc = job;
        job = col.getNextDocument(tmpDoc);
    }
}
To echo Frantisek and others, updating the documents is not best practice. The key to how to achieve it is to consider a number of questions:
What do you mean by first, next and previous job?
What is the number of jobs involved?
How are save conflicts going to be minimised / resolved by you / the users?
How are deletions being handled, to ensure referential integrity?
What happens when you need to archive data?
If it's for all users and 'next' means next by date created, create a view based on date created. It will be quicker to create, completely negate the issue of save conflicts or deletes, and not have a significant performance hit unless you're dealing with very large numbers of jobs (in which case you should be considering archiving).
If it's a small number of jobs, store them in a Java Map. But you need to handle deletions. Because you'll be loading the map when the app loads, archiving is not a problem.
If it's next / previous per user, a better method would be storing the order in a document per person in the database. If replicas are not involved, Note IDs can be used and will be shorter. It will negate save conflicts. But it may cause problems with large numbers of jobs - you will probably need to create new fields programmatically and also handle deletions.
DonMaro's suggestion fits with a graph database approach of edges (the third documents) between the vertices (the jobs).
In most cases, views will be the easiest and most recommended approach. IBM have included view index enhancements in 9.0.1 FP3 and will allow view indexes to be stored outside the NSF in the next point release.
Even if you're confident that you can build a better indexing system than what is already included in Domino, there are other aspects like save conflicts that need to be handled, and your decision may not allow for future functional requirements like security, deletion, archiving etc.
Well, while pointing out that you should really consider Frantisek Kossuth's comment (UNIDs can change in case you ever have to copy/paste a document back into the database, e.g. from a backup; consider generating unique values with @Unique instead):
just create a third document object "prevJob" and store the previous document there when/before changing to the next one.
Then you can access the UNID just as you already do by "prevJob.getUniversalID()" and store it in the document you're currently processing.
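A minimal sketch of that idea, following the loop from the question; "TaskPredecessor" is an assumed field name for the backward link:
Document prevJob = null;
Document job = col.getFirstDocument();
while (job != null) {
    job.replaceItemValue("DocID", job.getUniversalID());
    if (prevJob != null) {
        // forward link on the previous document
        prevJob.replaceItemValue("TaskSuccessor", job.getUniversalID());
        prevJob.save(true);
        // backward link on the current document
        job.replaceItemValue("TaskPredecessor", prevJob.getUniversalID());
    }
    job.save(true);
    prevJob = job;
    job = col.getNextDocument(job);
}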

CouchDB/Couchbase/MongoDB transaction emulation?

I've never used CouchDB/MongoDB/Couchbase before and am evaluating them for my application. Generally speaking, they seem to be a very interesting technology that I would like to use. However, coming from an RDBMS background, I am hung up on the lack of transactions. But at the same time, I know that there is going to be much less a need for transactions as I would have in an RDBMS given the way data is organized.
That being said, I have the following requirement and not sure if/how I can use a NoSQL DB.
I have a list of clients
Each client can have multiple files
Each file must be sequentially numbered for that specific client
Given an RDBMS this would be fairly simple: one table for clients, one (or more) for files. In the client table, keep a counter of the last file number, and increment it by one when inserting a new record into the file table. Wrap everything in a transaction and you are assured that there are no inconsistencies. Heck, just to be safe, I could even put a unique constraint on a (clientId, filenumber) index to ensure that there is never the same filenumber used twice for a client.
How can I accomplish something similar in MongoDB or CouchDB/base? Is it even feasible? I keep reading about two-phase commits, but I can't seem to wrap my head around how that works in this kind of instance. Is there anything in Spring/Java that provides two-phase commit that would work with these DBs, or does it need to be custom code?
CouchDB is transactional at the document level by default. Every document in CouchDB contains a _rev key, and all updates to a document are performed against this _rev key:
Get the document.
Send it for update using the _rev property.
If the update succeeds, you have updated the latest _rev of the document.
If the update fails, the document was not current. Repeat steps 1-3 (a rough sketch of this loop over the HTTP API is shown below).
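Here is a rough sketch of that get/update/retry loop against CouchDB's plain HTTP API, using java.net.http (Java 11+); the database and document names are made up:
HttpClient http = HttpClient.newHttpClient();
String docUrl = "http://localhost:5984/mydb/client42";
while (true) {
    // 1. Get the document; its JSON body contains the current _rev
    HttpResponse<String> getResp = http.send(
            HttpRequest.newBuilder(URI.create(docUrl)).GET().build(),
            HttpResponse.BodyHandlers.ofString());
    // ... parse getResp.body(), keep its _rev, change the fields you need ...
    String updatedJson = "{\"_id\":\"client42\",\"_rev\":\"<rev-from-get>\",\"lastFileNumber\":42}";
    // 2. Send the update carrying that _rev
    HttpResponse<String> putResp = http.send(
            HttpRequest.newBuilder(URI.create(docUrl))
                    .PUT(HttpRequest.BodyPublishers.ofString(updatedJson))
                    .build(),
            HttpResponse.BodyHandlers.ofString());
    // 3. 201 means the update won; 409 means someone else updated first, so retry
    if (putResp.statusCode() == 201) break;
    if (putResp.statusCode() != 409) throw new IllegalStateException(putResp.body());
}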
Check out this answer by MrKurt for a more detailed explanation.
The CouchDB recipes book has a banking example that shows how transactions are done in CouchDB.
And there is also this atomic bank transfers article that illustrates transactions in CouchDB.
Anyway, the common theme in all of these links is that if you follow the CouchDB pattern of updating against a _rev, you can't end up with an inconsistent state in your database.
Heck, just to be safe, I could even put a unique constraint on a (clientId, filenumber) index to ensure that there is never the same filenumber used twice for a client.
All CouchDB documents are unique, since the _id fields of two documents can't be the same. Check out the view cookbook:
This is an easy one: within a CouchDB database, each document must have a unique _id field. If you require unique values in a database, just assign them to a document’s _id field and CouchDB will enforce uniqueness for you.
There’s one caveat, though: in the distributed case, when you are running more than one CouchDB node that accepts write requests, uniqueness can be guaranteed only per node or outside of CouchDB. CouchDB will allow two identical IDs to be written to two different nodes. On replication, CouchDB will detect a conflict and flag the document accordingly.
Edit based on comment
In a case where you want to increment a field in one document based on the successful insert of another document:
You could use separate documents in this case. You insert a document and wait for the success response, then add another document like
{_id:'some_id','count':1}
With this you can set up a map/reduce view that simply counts these documents, and you have your counter. Instead of updating a single document on every change, you insert a new document to reflect each successful insert.
I always end up with the case where a failed file insert would leave the DB in an inconsistent state especially with another client successfully inserting a file at the same time.
Okay, so I already described how you can do updates over separate documents, but even when updating a single document you can avoid inconsistency if you:
Insert a new file
When CouchDB gives a success message, attempt to update the counter.
Why does this work?
It works because when you try to update a document you must supply a _rev string. You can think of _rev as the local state of your document. Consider this scenario:
You read the document that is to be updated.
You change some fields.
Meanwhile, another request has already changed the original document, which means the document now has a new _rev.
But you ask CouchDB to update the document with the stale _rev you read in step 1.
CouchDB rejects the update with a conflict.
You read the document again, get the latest _rev, and attempt the update again.
So if you do this you will always have to update against the latest revision of the document. I hope this makes things a bit clearer.
Note:
As pointed out by Daniel, the _rev rules don't apply to bulk updates.
Yes, you can do the same with MongoDB and with Couchbase/CouchDB using the proper approach.
First of all, in MongoDB you have unique indexes, which solve part of the problem:
- http://docs.mongodb.org/manual/tutorial/create-a-unique-index/
You also have a documented pattern for implementing sequences properly:
- http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
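A rough sketch of both pieces with the MongoDB Java driver; the database, collection and field names are illustrative:
MongoClient client = MongoClients.create("mongodb://localhost:27017");
MongoDatabase db = client.getDatabase("app");
// Unique constraint on (clientId, fileNumber), like the RDBMS index in the question
MongoCollection<Document> files = db.getCollection("files");
files.createIndex(Indexes.ascending("clientId", "fileNumber"),
        new IndexOptions().unique(true));
// Auto-incrementing field pattern: atomically bump a per-client counter document
MongoCollection<Document> counters = db.getCollection("counters");
Document counter = counters.findOneAndUpdate(
        Filters.eq("_id", "client42"),
        Updates.inc("seq", 1L),
        new FindOneAndUpdateOptions().upsert(true).returnDocument(ReturnDocument.AFTER));
long nextFileNumber = counter.getLong("seq");
files.insertOne(new Document("clientId", "client42").append("fileNumber", nextFileNumber));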
You have many options for implementing cross-document/collection transactions; you can find some good information about this in this blog post:
http://edgystuff.tumblr.com/post/93523827905/how-to-implement-robust-and-scalable-transactions (two-phase commit is documented in detail here: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/)
Since you are talking about Couchbase, you can find some patterns here too:
http://docs.couchbase.com/couchbase-devguide-2.5/#providing-transactional-logic

Should I use Lucene only for search?

Our website needs to give out data to the world. This is open-source data that we have stored, and we want to make it publicly available. It's about 2 million records.
We've implemented the search of these records using Lucene, which is fine; however, we'd like to show an individual record (say the user clicks on it after the search is done) and provide more detailed information for that record.
This more detailed information, however, isn't stored in the index directly... there are many-to-many relationships, and we use our relational database (MySQL) to provide this information.
For example, a single record belongs to a category; we want the user to be able to click on that category and see the rest of the records within that category (there are lots more associations like this).
My question is, should we use Lucene also to store this sort of information and retrieve it through simple search (category:apples), or should MySQL continue doing this logical job? Should I use Lucene only for the search part?
EDIT
I would like to point out that all of our records are pretty static.... changes are made to this data once every week or so.
Lucene's strength lies in rapidly building an index of a set of documents and allowing you to search over them. If this "detailed information" does not need to be indexed or searched over, then don't store it in Lucene.
Lucene is not a database, it's an index.
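As a rough illustration of that split (Lucene's Java API; the field names and index path are made up): index only the searchable fields, store just the MySQL primary key, and fetch the detailed relational data from MySQL after the search.
Directory dir = FSDirectory.open(Paths.get("/tmp/records-index"));
IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
Document doc = new Document();
doc.add(new TextField("title", "Some record title", Field.Store.NO));   // searchable
doc.add(new StringField("category", "apples", Field.Store.NO));         // filterable, e.g. category:apples
doc.add(new StoredField("mysqlId", 12345L));                            // only stored, not searched
writer.addDocument(doc);
writer.close();
// After a search, read "mysqlId" from each hit and run the detailed
// (many-to-many) queries against MySQL for the record the user clicked.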
You want to use Lucene to store data? I think that's OK. I've used Solr (http://lucene.apache.org/solr/),
which is built on top of Lucene, as a search engine that also stores extra data related to each record for front-end display. It worked with 500k records for me, and with 2 million records I think it should be fine.
