In this documentation, there is example code for Future<V>, and I can see this pattern used in some GAE datastore ORMs. What are the implications of using Future for datastore put, get, or delete methods? What is the common motivation for doing this?
The reason Future is used is that the database operations are performed asynchronously, so you can carry on doing other work and come back later to check the result of the operation.
The implication of an asynchronous store is that you could read a stale value, i.e., a put has been sent but the Future operation is not yet complete, or worse, the value has already been deleted; but this is something the ORM has hopefully solved for you.
You can use Future.get to block the current thread, wait for the operation to complete, and check the result.
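As a minimal sketch of what that looks like with the low-level GAE API (assuming com.google.appengine.api.datastore; the ORMs wrap the same mechanism):

    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.Future;

    import com.google.appengine.api.datastore.AsyncDatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.Key;

    public class AsyncPutExample {
        public Key putAndWait(Entity entity) throws InterruptedException {
            AsyncDatastoreService datastore =
                    DatastoreServiceFactory.getAsyncDatastoreService();

            // The write starts in the background; this call returns immediately.
            Future<Key> futureKey = datastore.put(entity);

            // ... carry on with other work here ...

            try {
                // Block until the write finishes; any datastore error surfaces here.
                return futureKey.get();
            } catch (ExecutionException e) {
                throw new RuntimeException("Datastore put failed", e.getCause());
            }
        }
    }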
Related
If I have Collection<CompletableFuture<MyResult>>, I expect to convert this into CompletableFuture<Collection<MyResult>>. After the conversion I have only one future and can easily write business logic on the MyResult collection using CompletableFuture methods like thenApply, thenAccept, etc. But CompletableFuture#allOf has result type Void, so after invoking it I get "no results". E.g., I cannot (as I understand it) retrieve any results from the returned future that correspond to the Collection<CompletableFuture<MyResult>>.
I suspect that CompletableFuture#allOf just returns a Future which completes after all the futures in the collection. So I can invoke CompletableFuture#allOf(...).isDone and then manually (!) transform the Collection<CompletableFuture> to a CompletableFuture<Collection> in a loop. Is my assumption right?
Yes, the allOf method does not supply data, but it does signal that all futures have completed. This eliminates the need for the more cumbersome countdown-latch approach. The expectation is that you then convert the completed futures back into a usable Collection to apply your business logic. See this question for implementation details. A great discussion of this topic is available at this blog post.
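The conversion is only a few lines; here is a common sketch (the helper name sequence is illustrative) that combines allOf with thenApply so the join calls never block:

    import java.util.Collection;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.stream.Collectors;

    public final class Futures {
        // Converts Collection<CompletableFuture<T>> into a
        // CompletableFuture<List<T>> that completes when all inputs complete.
        public static <T> CompletableFuture<List<T>> sequence(
                Collection<CompletableFuture<T>> futures) {
            return CompletableFuture
                    .allOf(futures.toArray(new CompletableFuture[0]))
                    // By the time allOf completes, every join() returns immediately.
                    .thenApply(ignored -> futures.stream()
                            .map(CompletableFuture::join)
                            .collect(Collectors.toList()));
        }
    }

After that, sequence(myFutures).thenAccept(results -> ...) hands you the whole collection in one callback.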
If you need CompletableFuture<Collection<MyResult>> as the result, you can get it by using the allAsList method from https://github.com/spotify/completable-futures (the spotify completable-futures library). CompletableFutures.allAsList(List<CompletableFuture<MyResult>>) will give you CompletableFuture<List<MyResult>>.
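For instance (a sketch assuming the library is on the classpath; MyResult stands in for the result type from the question):

    import java.util.List;
    import java.util.concurrent.CompletableFuture;

    import com.spotify.futures.CompletableFutures;

    public class AllAsListExample {
        static CompletableFuture<List<MyResult>> combine(
                List<CompletableFuture<MyResult>> futures) {
            // One future that completes when every input future has completed.
            return CompletableFutures.allAsList(futures);
        }
    }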
Say I'm creating an entity like this:
Answer answer = new Answer(this, question, optionId);
ofy().save().entity(answer);
Should I check whether the write process is successful?
Say I want to perform another action (increment a counter); should I make a transaction that includes the writing process?
And also, how can I check if the writing process is successful?
An error while saving will produce an exception. Keep in mind that since you are not calling now(), you have started an async operation, and the actual exception may surface when the session is closed (e.g., at the end of the request).
Yes, if you want to increment a counter, you need to run a transaction that encompasses the load, increment, and save. Also keep in mind that a transaction can be retried even though it actually committed successfully, so a naive transaction can overcount. If you need a rigidly exact count, the pattern is significantly more complex. All databases suffer from some variation of this problem.
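A sketch of the transactional increment with Objectify (Counter is a hypothetical entity, assumed to be registered with ObjectifyService); note the caveat above, since a retry after a lost commit acknowledgement would increment twice:

    import com.googlecode.objectify.VoidWork;
    import com.googlecode.objectify.annotation.Entity;
    import com.googlecode.objectify.annotation.Id;

    import static com.googlecode.objectify.ObjectifyService.ofy;

    @Entity
    class Counter {
        @Id Long id;
        long value;
    }

    class CounterService {
        void increment(final long counterId) {
            // transact() retries the whole unit of work on contention,
            // so the load, increment, and save must all live inside it.
            ofy().transact(new VoidWork() {
                public void vrun() {
                    Counter counter =
                            ofy().load().type(Counter.class).id(counterId).now();
                    counter.value++;
                    ofy().save().entity(counter); // commits with the transaction
                }
            });
        }
    }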
What is the most suitable way to handle optimistic locking in JPA? I have the solutions below but don't know which of them is better to use.
Handling the optimistic locking exception in a catch block and retrying.
Using an atomic flag variable: if another thread is processing, wait until it finishes. This way data modification or locking contention may be avoided.
Maintaining a queue of all incoming database change requests and processing them one by one.
Can anyone suggest a better solution to this problem?
You don't say why you are using optimistic locking.
You usually use it to avoid blocking resources (like database rows) for a long time, i.e. data is read from a database and displayed to the user. Eventually the user makes changes to the database, and the data is written back.
You don't want to block the data for other users during that time. In a scenario like this you don't want to use option 2, for the same reason.
Option 1 is not easy, because an optimistic locking exception tells you that something has changed the data behind your back, and you would overwrite those changes with your data. Simply retrying the same write won't help here.
Option 3 might be possible in some situations, but adds a lot of complexity and possible errors. This would be my last resort by far.
In my experience optimistic locking exceptions are quite rare. In most cases the easiest way out is to discard everything, and re-do it from start, even if it means to tell the user: sorry, there was an unexpected problem, do it again.
On the other hand, if you get these problems regularly between two competing threads, you should try to avoid them. In these cases option 2 might be the way to go, but it depends on the scenario.
If the conflict occurs between a user interaction and a background thread (and not between two users) you could try to change the timing of the background thread, or signal the background thread to delay its work.
To sum it up: It mostly depends on your setup, and when and how the exception occurs.
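For completeness, the reload-and-reapply variant of option 1 (suitable for the competing-threads case, where the change can be recomputed from fresh data) could look like this sketch; Account is a hypothetical entity with a @Version field:

    import javax.persistence.Entity;
    import javax.persistence.EntityManager;
    import javax.persistence.Id;
    import javax.persistence.OptimisticLockException;
    import javax.persistence.RollbackException;
    import javax.persistence.Version;

    @Entity
    class Account {
        @Id Long id;
        @Version long version; // checked automatically by JPA on commit
        long balance;
    }

    class BalanceUpdater {
        private static final int MAX_ATTEMPTS = 3;

        void addToBalance(EntityManager em, long accountId, long delta) {
            for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
                em.getTransaction().begin();
                // Re-load on every attempt: retrying with the stale object
                // would just fail the version check again.
                Account account = em.find(Account.class, accountId);
                account.balance += delta;
                try {
                    em.getTransaction().commit(); // version check happens here
                    return;
                } catch (OptimisticLockException | RollbackException e) {
                    if (em.getTransaction().isActive()) {
                        em.getTransaction().rollback();
                    }
                    em.clear(); // detach the stale state before retrying
                }
            }
            throw new IllegalStateException("Update failed after retries");
        }
    }

Note this only works because the increment is recomputed from fresh data each time; blindly re-writing the original data would silently discard the competing change, which is exactly the problem described above.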
In my Java app, sometimes my users do some work that requires a datastore write, but I don't want to keep the user waiting while the datastore is writing. I want to immediately return a response to the user while the data is stored in the background.
It seems fairly clear that I could do this by using GAE task queues, enqueueing a task to store the data. But I also see that there's an Async datastore API, which seems like it would be much easier than dealing with task queues.
Can I just call AsyncDatastoreService.put() and then return from my servlet? Will that API store my data without keeping my users waiting?
I think you are right that the Async calls seem easier. However, the docs for AsyncDatastore mention one caveat that you should consider:
Note: Exceptions are not thrown until you call the get() method. Calling this method allows you to verify that the asynchronous operation succeeded.
The "get" in that note is being called on the Future object returned by the async call. If you just return from your servlet without ever calling get on the Future object, you might not know for sure whether your put() worked.
With a queued task, you can handle the error cases more explicitly, or just rely on the automatic retries. If all you want to queue is datastore puts, you should be able to create (or find) a utility class that does most of the work for you.
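For example, a minimal sketch of such a utility using a DeferredTask (the low-level Entity is Serializable, so it can ride along in the task payload, subject to the payload size limit):

    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.taskqueue.DeferredTask;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskOptions;

    public class QueuedPut implements DeferredTask {
        private final Entity entity;

        public QueuedPut(Entity entity) {
            this.entity = entity;
        }

        @Override
        public void run() {
            // Runs later on the task queue, with automatic retries on failure.
            DatastoreServiceFactory.getDatastoreService().put(entity);
        }

        public static void enqueue(Entity entity) {
            QueueFactory.getDefaultQueue().add(
                    TaskOptions.Builder.withPayload(new QueuedPut(entity)));
        }
    }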
Unfortunately, there aren't any really good solutions here. You can enqueue a task, but there are several big problems with that:
Task payloads are limited in size, and that size is smaller than the entity size limit.
Writing a record to the datastore is actually pretty fast, in wall-clock time. A significant part of the cost, too, is serializing the data, which you have to do to add it to the task queue anyway.
By using the task queue, you're creating more eventual consistency - the user may come back and not see their changes applied, because the task has not yet executed. You may also be introducing transaction issues - how do you handle concurrent updates?
If something fails, it could take an arbitrarily long time to apply the user's updates. In such situations, it probably would have been better to simply return an error to the user.
My recommendation would be to use the async API where possible, but to always write to the datastore directly. Note that you need to wait on all your outstanding API calls, as Peter points out, or you won't know if they failed; if you don't wait on them yourself, the app server will, before returning a response to the user.
If all you need is for the user to have a responsive interface while stuff churns away in the DB in the background, all you have to do is make the call asynchronous at the client level, i.e., do some Ajax that sends the DB write request, immediately update the user's display, and then update the view however you wish in the Ajax request's callback.
You can easily add GWT support to your GAE project (either via the Eclipse plugin or the Maven GAE plugin) and have the time of your life doing asynchronous stuff.
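A sketch of the client side with GWT RPC (AnswerServiceAsync and the view methods are hypothetical; the point is that the callback frees you to update the UI immediately):

    import com.google.gwt.user.client.rpc.AsyncCallback;

    // Hypothetical async stub matching a GWT-RPC service interface.
    interface AnswerServiceAsync {
        void saveAnswer(Answer answer, AsyncCallback<Void> callback);
    }

    class AnswerView {
        private final AnswerServiceAsync answerService;

        AnswerView(AnswerServiceAsync answerService) {
            this.answerService = answerService;
        }

        void submit(Answer answer) {
            showPendingState(); // update the display before the server replies
            answerService.saveAnswer(answer, new AsyncCallback<Void>() {
                @Override public void onSuccess(Void result) { showSavedState(); }
                @Override public void onFailure(Throwable caught) { showErrorState(caught); }
            });
        }

        void showPendingState() { /* ... */ }
        void showSavedState() { /* ... */ }
        void showErrorState(Throwable t) { /* ... */ }
    }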
I'm using appengine servers. I expect to get many requests (dozens) in close proximity that will put some of my data in an inconsistent state. The cleanup of that data can be efficiently batched - for example, it would be best to run my cleanup code just once, after the dozens of requests have all completed. I don't know exactly how many requests there will be, or how close together they will be. It is OK if the cleanup code is run multiple times, but it must be run after the last request.
What's the best way to minimize the number of cleanup runs?
Here's my idea:
public void handleRequest() {
    manipulateData();
    if (memCacheHasCleanupToken()) {
        return; // yay, a cleanup is already scheduled
    } else {
        scheduleDeferredCleanup(5000); // run ~5 seconds from now
        addCleanupTokenToMemCache();
    }
}
...
public void deferredCleanupMethod() {
    removeCleanupTokenFromMemcache();
    cleanupData();
}
I think this will break down because cleanupData might receive outdated data even after some request has found that there IS a cleanup token in the memcache (HRD latency, etc), so some data might be missed in the cleanup.
So, my questions:
Will this general strategy work? Maybe if I use a transactional lock on a datastore entity?
What strategy should I use?
The general strategy you suggest will work, providing the data that needs cleaning up isn't stored on each instance (e.g., it's in the datastore or memcache), and provided your scheduleDeferredCleanup method uses the task queue. An optimization would be to use task names based on the time interval in which they run, to avoid scheduling duplicate cleanups if the memcache key expires.
One issue to watch out for with the procedure you describe above, though, is race conditions. As stated, a request being processed at the same time as the cleanup task may check memcache, observe the token is there, and neglect to enqueue a cleanup task, whilst the cleanup task has already finished, but not yet removed the memcache key. The easiest way to avoid this is to make the memcache key expire on its own, but before the related task will execute. That way, you may schedule duplicate cleanup tasks, but you should never omit one that's required.
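Putting both suggestions together, a sketch with a time-bucketed task name and a token that expires before the task fires (the window sizes are illustrative, and /tasks/cleanup is a hypothetical handler):

    import com.google.appengine.api.memcache.Expiration;
    import com.google.appengine.api.memcache.MemcacheService;
    import com.google.appengine.api.memcache.MemcacheServiceFactory;
    import com.google.appengine.api.taskqueue.Queue;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskAlreadyExistsException;
    import com.google.appengine.api.taskqueue.TaskOptions;

    public class CleanupScheduler {
        private static final String TOKEN_KEY = "cleanup-scheduled";

        public void scheduleCleanup() {
            MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
            if (cache.get(TOKEN_KEY) != null) {
                return; // a cleanup is already scheduled for this window
            }

            // Name the task after the 5-second window it belongs to, so a
            // duplicate add in the same window is rejected by the queue itself.
            long bucket = System.currentTimeMillis() / 5000;
            Queue queue = QueueFactory.getDefaultQueue();
            try {
                queue.add(TaskOptions.Builder
                        .withUrl("/tasks/cleanup")
                        .taskName("cleanup-" + bucket)
                        .countdownMillis(5000));
            } catch (TaskAlreadyExistsException e) {
                // Another request already enqueued this window's cleanup.
            }

            // The token expires before the task runs, so a required cleanup
            // may occasionally be scheduled twice, but is never skipped.
            cache.put(TOKEN_KEY, Boolean.TRUE, Expiration.byDeltaSeconds(4));
        }
    }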