Q: Parse.com query count stability - java

For testing purposes I put the following code in the onCreate() of an Activity:
// Create 50 objects
for (int i = 0; i < 50; i++) {
    ParseObject obj = new ParseObject("test_obj");
    obj.put("foo", "bar");
    try {
        obj.save();
    } catch (ParseException pe) {
        Log.d("Parsetest", "Failed to save " + pe.toString());
    }
}
// Count them
for (int i = 0; i < 10; i++) {
    ParseQuery<ParseObject> query = ParseQuery.getQuery("test_obj");
    query.countInBackground(new CountCallback() {
        @Override
        public void done(int count, ParseException e) {
            if (e == null) {
                Log.d("Parsetest", "Background found " + count + " objects");
            } else {
                Log.d("Parsetest", "Query issue" + e.toString());
            }
        }
    });
}
I would expect the count to always be fifty; however, running this code yields something like:
D/Parsetest(17761): Background found 0 objects
D/Parsetest(17761): Background found 0 objects
D/Parsetest(17761): Background found 0 objects
D/Parsetest(17761): Background found 0 objects
D/Parsetest(17761): Background found 0 objects
D/Parsetest(17761): Background found 0 objects
D/Parsetest(17761): Background found 50 objects
D/Parsetest(17761): Background found 0 objects
D/Parsetest(17761): Background found 0 objects
D/Parsetest(17761): Background found 0 objects
Can somebody explain this behavior and how to correct it?

Without knowing further details, I'm inclined to believe the inconsistency is due to threading and the mixing of synchronous and asynchronous calls.
For example, obj.save() is synchronous (reference), but without seeing the rest of your code it's possible that the synchronous save is being executed on a background thread.
Additionally, query.countInBackground is asynchronous and is being called ten times in a loop. This simultaneously creates ten separate background requests that query Parse for the object count, and depending on how the saves are scheduled, there can be race conditions.
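If the goal is simply a stable count, one way to rule out the race (a minimal sketch, assuming the standard Parse Android SDK and its ParseObject.saveAll) is to persist everything first and only then issue a single count:
import java.util.ArrayList;
import java.util.List;

// Build all 50 objects, save them in one blocking call, then count once.
List<ParseObject> objects = new ArrayList<>();
for (int i = 0; i < 50; i++) {
    ParseObject obj = new ParseObject("test_obj");
    obj.put("foo", "bar");
    objects.add(obj);
}
try {
    ParseObject.saveAll(objects); // blocks until all 50 are persisted
} catch (ParseException pe) {
    Log.d("Parsetest", "Failed to save " + pe.toString());
}
ParseQuery<ParseObject> query = ParseQuery.getQuery("test_obj");
query.countInBackground(new CountCallback() {
    @Override
    public void done(int count, ParseException e) {
        if (e == null) {
            Log.d("Parsetest", "Found " + count + " objects");
        }
    }
});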
Lastly, there are documented limitations on count operations with Parse.
Count queries are rate limited to a maximum of 160 requests per minute. They can also return inaccurate results for classes with more than 1,000 objects. Thus, it is preferable to architect your application to avoid this sort of count operation (by using counters, for example.)
From Héctor Ramos on the Parse Developers Google group:
Count queries have always been expensive once you throw some constraints in. If you only care about the total size of the collection, you can run a count query without any constraints and that one should be pretty fast, as getting the total number of records is a different problem than counting how many of these match an arbitrary list of constraints. This is just the reality of working with database systems.
Given the cost of count operations, it is possible that Parse has mechanisms in place to prevent rapid bursts of count operations from a given client.
If you need to perform count operations often, the recommended approach is to use Cloud Code afterSave hooks to increment/decrement a counter as needed.

Related

Java performance issue: Need to iterate more than 8 million records with a target-branch check

We have a system that processes a flat file and, with only a couple of validations, inserts the records into a database.
This code:
// there can be 8 million lines
for (String line : lines) {
    if (!Class.isBranchNoValid(validBranchNoArr, obj.branchNo)) {
        continue;
    }
    list.add(line);
}
Definition of isBranchNoValid:
// the array length ranges from 2 to 5 only
public static boolean isBranchNoValid(String[] validBranchNoArr, String branchNo) {
    for (int i = 0; i < validBranchNoArr.length; i++) {
        if (validBranchNoArr[i].equals(branchNo)) {
            return true;
        }
    }
    return false;
}
The validation is at line level (we have to filter out, i.e. skip, any line whose branchNo is not in the array). Earlier, this filtering wasn't in place.
Now severe performance degradation is troubling us.
I understand (maybe I am wrong) that this repeated function call is causing a lot of stack-frame creation, resulting in very frequent GC invocations.
I can't figure out a way (if it is even possible) to perform this filtering without such a high performance cost (a small difference is fine).
This is certainly not a stack problem, because your function is not recursive: nothing is kept on the stack between calls, and after each call the local variables are discarded since they are no longer needed.
You could put the valid numbers in a set and use that for some optimization, but in your case I am not sure it will bring any benefit at all, since you have at most 5 elements.
So there are several possible bottlenecks in your scenario:
1. reading the lines of the file
2. parsing each line to construct the object to insert into the database
3. checking the applicability of the object (i.e. the branch-no filter)
4. inserting into the db
Generally you'd say IO is the slowest, so 1. and 4. You're saying nothing except 3. changed, right? That is weird.
Anyway, if you want to optimize this, I wouldn't pass the array around 8 million times, and I wouldn't iterate it every time either. Since your valid branches are known up front, create a HashSet from them; it has O(1) lookup.
Set<String> validBranches = Arrays.stream(branches)
        .collect(Collectors.toCollection(HashSet::new));
Then, iterate the lines:
for (String line : lines) {
    YourObject obj = parse(line);
    if (validBranches.contains(obj.branchNo)) {
        writeToDb(obj);
    }
}
or, in the stream version:
Files.lines(yourPath)
        .map(this::parse)
        .filter(o -> validBranches.contains(o.branchNo))
        .forEach(this::writeToDb);
I'd also check whether it isn't more efficient to first collect a batch of objects and then write them to the db in one go, as sketched below. It's also possible that handling the lines in parallel gains some speed, in case the parsing is time-intensive.
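As a hedged sketch of that batching idea with plain JDBC (the SQL, table, and YourObject fields are illustrative assumptions, not from the question):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.Set;

class BatchInserter {
    private static final String SQL = "INSERT INTO records (branch_no, line) VALUES (?, ?)"; // assumed schema
    private static final int BATCH_SIZE = 1000;

    void insertFiltered(List<String> lines, Set<String> validBranches,
                        String jdbcUrl, String user, String password) throws SQLException {
        try (Connection con = DriverManager.getConnection(jdbcUrl, user, password);
             PreparedStatement ps = con.prepareStatement(SQL)) {
            int pending = 0;
            for (String line : lines) {
                YourObject obj = parse(line); // your existing parsing step
                if (!validBranches.contains(obj.branchNo)) {
                    continue; // the O(1) filter from above
                }
                ps.setString(1, obj.branchNo);
                ps.setString(2, line);
                ps.addBatch();
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch(); // one round-trip per 1000 rows
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the remainder
            }
        }
    }

    private YourObject parse(String line) {
        YourObject o = new YourObject();
        o.branchNo = line.substring(0, 4); // placeholder parsing; adapt to your format
        return o;
    }

    static class YourObject {
        String branchNo;
    }
}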

Consuming a collection concurrently using Reactor

From time to time I have to implement the classic concurrent producer-consumer solution across the project I'm involved in. The problem pretty much reduces to a collection that gets populated from multiple threads and is consumed by several consumers.
In a nutshell: the collection is bounded, say to 10k entities. Once the buffer size is hit, a worker task is submitted that consumes these 10k entities. There is a limit on such workers, say 10, which in the worst-case scenario means I can have up to 10 workers, each consuming 10k entities.
I do have to play with some locking here and there, and with some checks around buffer overflow (the case when producers generate too much data while all workers are busy processing their chunks), so I have to discard new events to avoid OOM (not the best solution, but stability is P1 ;)).
These days I was looking at Reactor and for a way to use it instead of going low-level and doing all the things described above, so the dumb question is: can Reactor be used for this use case?
For now, forget about overflow/discarding: how can I achieve the N consumers for a broadcaster?
I was looking particularly at a Broadcaster with a buffer plus a thread-pooled dispatcher:
void test() {
    final Broadcaster<String> sink = Broadcaster.create(Environment.initialize());
    Dispatcher dispatcher = Environment.newDispatcher(2048, 20, DispatcherType.WORK_QUEUE);
    sink.buffer(100)
        .consumeOn(dispatcher, this::log);
    for (int i = 0; i < 100000; i++) {
        sink.onNext("element " + i);
        if (i % 1000 == 0) {
            System.out.println("added elements " + i);
        }
    }
}
void log(List<String> values) {
    System.out.print("simulating slow processing....");
    System.out.println("processing: " + Arrays.toString(values.toArray()));
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
My intention here is to have the broadcaster execute log(..) asynchronously whenever the buffer size is reached; however, it looks like it always executes log(...) in blocking mode: it processes 100 elements, and only once those are done, the next 100, and so on. How can I make it asynchronous?
A possible pattern is to use flatMap with publishOn:
Flux.range(1, 1_000_000)
    .buffer(100)
    .flatMap(b -> Flux.just(b)
        .publishOn(SchedulerGroup.io())
        .doOnNext(this::log))
    .consume(...);
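In case it helps, roughly the same pattern in current Reactor 3 might look like the sketch below (assuming migrating is an option; SchedulerGroup and consume were superseded by Schedulers and subscribe/blockLast):
import java.util.List;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class BatchConsumer {
    public static void main(String[] args) {
        Flux.range(1, 1_000_000)
            .map(i -> "element " + i)
            .buffer(100)                                 // emit List<String> batches of 100
            .flatMap(batch -> Mono.just(batch)
                .publishOn(Schedulers.boundedElastic())  // hop each batch onto a worker thread
                .doOnNext(BatchConsumer::log), 10)       // at most 10 batches in flight
            .blockLast();                                // block only to keep the demo alive
    }

    static void log(List<String> batch) {
        System.out.println(Thread.currentThread().getName()
                + " processing " + batch.size() + " elements");
        try {
            Thread.sleep(1000); // simulate slow processing
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}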

Is there a per-request limit for simultaneous transactions?

I'm using a lot of (sharded) counters in my application. According to my current design, a single request can cause 100-200 different counters to increment.
So for each counter I pick one shard whose value I increment. I increment each shard in a transaction, which means I end up doing 100-200 transactions as part of processing a single request. Naturally I intend to do this asynchronously, so that I will essentially be running all 100-200 transactions in parallel.
As this number feels pretty high, I'm left wondering whether there is some per-request or per-instance limit on the number of simultaneous transactions (or datastore requests). I could not find information on this in the documentation.
By the way, for some reason Google's documentation states that "if your app has counters that are updated frequently, you should not increment them transactionally" [1], while on the other hand their code example on sharding counters uses a transaction to increment the shard [2]. I have figured I can use transactions as long as I use enough shards. I prefer transactions, as I would like my counters not to miss increments.
[1] https://cloud.google.com/appengine/docs/java/datastore/transactions
[2] https://cloud.google.com/appengine/articles/sharding_counters
There are three limitations that will probably cause you problems here:
1/sec write limit per entity group
5 entity groups per XG
10 concurrent 'threads' per instance
The last one is the tricky one for your use case.
It's a bit hard to find info on (and the information may in fact be out of date, so it's worth testing), but each instance allows only 10 concurrent core threads (regardless of instance size: F1/F2/F...).
That is, ignoring the creation of background threads: if you assume that each request takes a thread, as does each RPC (datastore, memcache, text search, etc.), you can only use 10 at a time. If the scheduler thinks an incoming request would exceed 10, it will route the request to a new instance.
In a scenario where you want to write to 100 entities in parallel, I'd expect only about 10 concurrent writes to be allowed (the rest blocking), and your instance could also only service one request at a time.
Alternatives for you:
Use dedicated memcache: you'll need to handle backing the counters onto durable storage, but you could do that in batches on a backend. This may result in you losing some data due to flushes; whether that's OK or not, you'll have to decide.
Use CloudSQL sequences or tables: if you don't require huge scale but do require lots of counters, this may be a good approach. You could store counts as raw counts, or as time-series data and post-process for accurate counts.
Use pull queues to update counters in batches on a backend. You can process many 'events' across your many counter tables in larger batches. The downside is that the counts will not be up to date at any given point in time.
The best approach is probably a hybrid. For example, accepting some eventual consistency in counts (see the sketch after this list):
When a request comes in: atomically increment the counters in memcache
When a request comes in: queue an 'event' task
Serve needed counts from memcache; if not present, load from the datastore
Use TTLs on memcache, so that eventually the datastore is seen as the 'source of truth'
Run a cron job which pulls 100 'event' tasks off the queue every 5 minutes (or as appropriate) and updates the counters for all the events in a transaction in the datastore
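A minimal sketch of the memcache side of that hybrid (assuming the App Engine Java APIs; the key prefix, queue name, and the loadFromDatastore helper are illustrative assumptions):
import com.google.appengine.api.memcache.Expiration;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class HybridCounter {
    private static final MemcacheService cache = MemcacheServiceFactory.getMemcacheService();

    // Request path: atomic increment in memcache plus an 'event' task for later folding.
    public static void increment(String counterName) {
        cache.increment("counter:" + counterName, 1L, 0L); // initial value 0 if the key is absent
        Queue queue = QueueFactory.getQueue("counter-events"); // hypothetical pull queue
        queue.add(TaskOptions.Builder
                .withMethod(TaskOptions.Method.PULL)
                .payload(counterName));
    }

    // Read path: memcache first, fall back to the datastore.
    public static long get(String counterName) {
        Object cached = cache.get("counter:" + counterName);
        if (cached != null) {
            return (Long) cached;
        }
        long fromDatastore = loadFromDatastore(counterName); // assumed helper: sum the shards
        cache.put("counter:" + counterName, fromDatastore,
                Expiration.byDeltaSeconds(300)); // TTL keeps the datastore the source of truth
        return fromDatastore;
    }

    private static long loadFromDatastore(String name) {
        return 0L; // placeholder
    }
}
A cron-triggered backend would then lease tasks off the pull queue in batches and apply them to the datastore counters transactionally.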
UPDATE: I found a section in the docs on controlling the maximum number of concurrent requests. It makes a nebulous reference to:
You may experience increased API latency if this setting is too high.
I'd say it's worth playing with.
I see that you're using a sharded-counter approach to avoid contention, as described in cloud.google.com/appengine/articles/sharding_counters.
Can you collect all your counters in a single entity, so that each shard is a bunch of counters? Then you wouldn't need so many separate transactions. According to cloud.google.com/appengine/docs/python/ndb/#quotas, an entity can be at most 1MB, and 200 integers will certainly fit into that size restriction just fine.
It may be that you don't know the property names in advance. Here is an approach, expressed in Go using its PropertyLoadSaver interface, that can deal with dynamic counter names.
const (
    counterPrefix = "COUNTER:"
)

type shard struct {
    // We manage the saving and loading of counters explicitly.
    counters map[string]int64 `datastore:"-"`
}

// NewShard constructs a new shard.
func NewShard() *shard {
    return &shard{make(map[string]int64)}
}

// Save implements PropertyLoadSaver.
func (s *shard) Save(c chan<- datastore.Property) error {
    defer close(c)
    for key, value := range s.counters {
        c <- datastore.Property{
            Name:    counterPrefix + key,
            Value:   value,
            NoIndex: true,
        }
    }
    return nil
}

// Load implements PropertyLoadSaver.
func (s *shard) Load(c <-chan datastore.Property) error {
    s.counters = make(map[string]int64)
    for prop := range c {
        if strings.HasPrefix(prop.Name, counterPrefix) {
            s.counters[prop.Name[len(counterPrefix):]] = prop.Value.(int64)
        }
    }
    return nil
}
The key is to use the raw API to define your own property names when saving to the datastore. The Java API almost certainly has similar access, given the existence of PropertyContainer.
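For illustration, a hedged sketch of the same idea with the Java low-level datastore API (Entity allows dynamically named properties; the kind name, prefix, and method shape are my assumptions):
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Transaction;

public class CounterShards {
    private static final String KIND = "CounterShard"; // assumed kind name
    private static final String PREFIX = "COUNTER:";

    // Increment several named counters stored on one shard entity, in one transaction.
    public static void increment(String shardName, Iterable<String> counterNames) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Transaction txn = ds.beginTransaction();
        try {
            Key key = KeyFactory.createKey(KIND, shardName);
            Entity shard;
            try {
                shard = ds.get(txn, key);
            } catch (EntityNotFoundException e) {
                shard = new Entity(key); // first write to this shard
            }
            for (String name : counterNames) {
                String prop = PREFIX + name; // dynamic property name
                Long current = (Long) shard.getProperty(prop);
                shard.setUnindexedProperty(prop, (current == null ? 0L : current) + 1);
            }
            ds.put(txn, shard);
            txn.commit();
        } finally {
            if (txn.isActive()) {
                txn.rollback();
            }
        }
    }
}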
The rest of the code described in the sharding article would then be expressed in terms of manipulating a single entity that knows about multiple counters. So, for example, rather than having Increment() deal with a single counter:
// Increment increments the named counter.
func Increment(c appengine.Context, name string) error {
    ...
}
we'd change its signature to a bulk-oriented operation:
// Increment increments the named counters.
func Increment(c appengine.Context, names []string) error {
    ...
}
and the implementation would find a single shard, call Increment() for each of the counters we'd want to increment, and Save() that single entity to the datastore, all within a single transaction. Query would also involve consulting all the shards... but reads are fast. We still maintain the sharding architecture to avoid write contention.
The complete example code for Go is:
package sharded_counter
import (
"fmt"
"math/rand"
"strings"
"appengine"
"appengine/datastore"
)
const (
numShards = 20
shardKind = "CounterShard"
counterPrefix = "counter:"
)
type shard struct {
// We manage the saving and loading of counters explicitly.
counters map[string]int64 `datastore:"-"`
}
// NewShard constructs a new shard.
func NewShard() *shard {
return &shard{make(map[string]int64)}
}
// Returns a list of the names stored in the shard.
func (s *shard) Names() []string {
names := make([]string, 0, len(s.counters))
for name, _ := range s.counters {
names = append(names, name)
}
return names
}
// Lookup finds the counter's value.
func (s *shard) Lookup(name string) int64 {
return s.counters[name]
}
// Increment adds to the counter's value.
func (s *shard) Increment(name string) {
s.counters[name]++
}
// Save implements PropertyLoadSaver.
func (s *shard) Save(c chan<- datastore.Property) error {
for key, value := range s.counters {
c <- datastore.Property{
Name: counterPrefix + key,
Value: value,
NoIndex: true,
}
}
close(c)
return nil
}
// Load implements PropertyLoadSaver.
func (s *shard) Load(c <-chan datastore.Property) error {
s.counters = make(map[string]int64)
for prop := range c {
if strings.HasPrefix(prop.Name, counterPrefix) {
s.counters[prop.Name[len(counterPrefix):]] = prop.Value.(int64)
}
}
return nil
}
// AllCounters returns all counters.
func AllCounters(c appengine.Context) (map[string]int64, error) {
var results map[string]int64
results = make(map[string]int64)
q := datastore.NewQuery(shardKind)
q = q.Ancestor(ancestorKey(c))
for t := q.Run(c); ; {
var s shard
_, err := t.Next(&s)
if err == datastore.Done {
break
}
if err != nil {
return results, err
}
for _, name := range s.Names() {
results[name] += s.Lookup(name)
}
}
return results, nil
}
// ancestorKey returns an key that all counter shards inherit.
func ancestorKey(c appengine.Context) *datastore.Key {
return datastore.NewKey(c, "CountersAncestor", "CountersAncestor", 0, nil)
}
// Increment increments the named counters.
func Increment(c appengine.Context, names []string) error {
shardName := fmt.Sprintf("shard%d", rand.Intn(numShards))
err := datastore.RunInTransaction(c, func(c appengine.Context) error {
key := datastore.NewKey(c, shardKind, shardName, 0, ancestorKey(c))
s := NewShard()
err := datastore.Get(c, key, s)
// A missing entity and a present entity will both work.
if err != nil && err != datastore.ErrNoSuchEntity {
return err
}
for _, name := range names {
s.Increment(name)
}
_, err = datastore.Put(c, key, s)
return err
}, nil)
return err
}
which, if you look closely, is pretty much the example with a single, unnamed counter, extended to handle multiple counter names. I changed the query side a little so that reads use the same ancestor key, keeping us within the same entity group.
Thanks for the responses! I think I now have the answers I need.
Regarding the per-request or per-instance limit
There is a per-instance limit on concurrent threads, which effectively limits the number of concurrent transactions. The default limit is 10. It can be increased, but it is unclear what side effects that would have.
Regarding the underlying problem
I chose to divide the counters into groups such that counters that are usually incremented "together" are in the same group. Shards carry partial counts for all counters within the group the individual shard is associated with.
Counts are still incremented in transactions, but thanks to the grouping, at most five transactions per request are needed. Each transaction increments numerous partial counts stored in a single shard, which is represented as a single datastore entity.
Even if the transactions are run in series, the time to process a request will still be acceptable. Each counter group has a few hundred counters. I make sure there are enough shards to avoid contention.
It should be noted that this solution is only possible because the counters can be divided into fairly large groups of counters that are typically incremented together.

Making computation-threads cancellable in a smart way

I am wondering how to reach a compromise between fast cancellation responsiveness and performance for threads whose body looks similar to this loop:
for (int i = 0; i < HUGE_NUMBER; ++i) {
    // some easy computation, like adding numbers
    // which are the result of the previous iteration of this loop
}
If the computation in the loop body is quite easy, then adding a simple check-and-react to each iteration:
if (Thread.currentThread().isInterrupted()) {
    throw new InterruptedException("Cancelled");
}
may slow down execution of the code.
Even if I change the above condition to:
if (i % 100 == 0 && Thread.currentThread().isInterrupted()) {
    throw new InterruptedException("Cancelled");
}
the compiler still cannot precompute the values of i and check the condition only in specific situations, since HUGE_NUMBER is a variable and can have different values.
So I'd like to ask whether there's any smart way of adding such a check to the presented code, knowing that:
HUGE_NUMBER is a variable and can have different values
the loop body consists of easy-to-compute code that relies on previous computations.
What I mean is that one iteration of the loop is quite fast, but a HUGE_NUMBER of iterations can take quite some time, and this is what I want to avoid.
First of all, use Thread.interrupted() instead of Thread.currentThread().isInterrupted() in this case.
You should also ask yourself whether checking the interruption flag really slows down your calculation too much! On the one hand, if the loop body is VERY simple, even a huge number of iterations (the upper limit is Integer.MAX_VALUE) will run in a few seconds. Even if checking the interruption flag results in an overhead of 20 or 30%, this will not add much to the total runtime of your algorithm.
On the other hand, if the loop body is not that simple and thus runs longer, testing the interruption flag will not be a noticeable overhead, I think.
Don't do tricks like if (i % 10000 == 0), as this will slow down the calculation much more than a plain Thread.interrupted().
There is one small trick that you could use, but think twice, because it makes your code more complex and less readable.
Whenever you have a loop like this:
for (int i = 0; i < max; i++) {
    // loop-body using i
}
you can split the total range of i into several intervals of size INTERVAL_SIZE:
int start = 0;
while (start < max) {
    final int next = Math.min(start + INTERVAL_SIZE, max);
    for (int i = start; i < next; i++) {
        // loop-body using i
    }
    start = next;
}
Now you can add your interruption check right before or after the inner loop!
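Put together, that might look like this (a minimal sketch; INTERVAL_SIZE is a tuning constant of my choosing, not from the answer):
// Interval loop with the interruption check hoisted out of the hot inner loop.
static final int INTERVAL_SIZE = 1 << 20; // assumption: roughly one check per million iterations

static long compute(int max) throws InterruptedException {
    long x = 0;
    int start = 0;
    while (start < max) {
        if (Thread.interrupted()) {        // one flag check per interval, not per iteration
            throw new InterruptedException("Cancelled");
        }
        final int next = Math.min(start + INTERVAL_SIZE, max);
        for (int i = start; i < next; i++) {
            if (i % 2 == 0) x++;           // same toy loop body as the benchmark below
        }
        start = next;
    }
    return x;
}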
I've done some tests on my system (JDK 7) using the following loop-body
if (i % 2 == 0) x++;
and Integer.MAX_VALUE / 2 iterations. The results are as follows (after warm-up):
Simple loop without any interruption checks: 1,949 ms
Simple loop with check per iteration: 2,219 ms (+14%)
Simple loop with check per 1 million-th iteration using modulo: 3,166 ms (+62%)
Simple loop with check per 1 million-th iteration using bit-mask: 2,653 ms (+36%)
Interval-loop as described above with check in outer loop: 1,972 ms (+1.1%)
So even if the loop body is as simple as the one above, the overhead of a per-iteration check is only 14%! It is therefore recommended not to do any tricks, but simply to check the interruption flag via Thread.interrupted() on every iteration!
Make your calculation an Iterator.
Although this does not sound terribly useful, the benefit is that you can then quite easily write filter iterators that can be surprisingly flexible. They can be added and removed simply, even through configuration if you wish. There are a number of benefits; try it.
You can then add a filtering Iterator that watches the time and checks for interrupt on a regular basis, or something even more flexible.
You can even add further filtering without compromising the original calculation with interspersed, brittle status checks.
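As a sketch of that filtering-iterator idea (the class and constant names are mine, purely illustrative):
import java.util.Iterator;

// Wraps any Iterator and polls the interruption flag every CHECK_EVERY elements.
final class InterruptibleIterator<T> implements Iterator<T> {
    private static final int CHECK_EVERY = 10_000; // assumption: tune to your workload
    private final Iterator<T> delegate;
    private long count = 0;

    InterruptibleIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        if (++count % CHECK_EVERY == 0 && Thread.interrupted()) {
            // Wrap, since Iterator methods cannot throw checked exceptions.
            throw new RuntimeException(new InterruptedException("Cancelled"));
        }
        return delegate.hasNext();
    }

    @Override
    public T next() {
        return delegate.next();
    }
}
The calculation itself never changes; cancellation (or timing, logging, further filtering) is layered on by wrapping the iterator.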

Starvation in non-blocking approaches

I've been reading about non-blocking approaches for some time. Here is a piece of code for a so-called lock-free counter:
public class CasCounter {
    private SimulatedCAS value;

    public int getValue() {
        return value.get();
    }

    public int increment() {
        int v;
        do {
            v = value.get();
        } while (v != value.compareAndSwap(v, v + 1));
        return v + 1;
    }
}
I was just wondering about this loop:
do {
    v = value.get();
} while (v != value.compareAndSwap(v, v + 1));
People say:
So it tries again, and again, until all other threads trying to change the value have done so. This is lock free as no lock is used, but not blocking free as it may have to try again (which is rare) more than once (very rare).
My question is: how can they be so sure about that? As for me, I can't see any reason why this loop couldn't be infinite, unless the JVM has some special mechanism to prevent it.
The loop can be infinite (since it can lead to starvation for your thread), but the likelihood of that happening is very small. For you to experience starvation, some other thread would have to succeed in changing the value you want to update between your read and your store, and that would have to happen repeatedly.
It would be possible to write code to deliberately trigger starvation, but for real programs it would be unlikely to happen.
Compare-and-swap is usually used when you don't expect write conflicts very often. Say there is a 50% chance of a "miss" when you update: then there is a 25% chance that you will miss in two consecutive loops, and less than a 0.1% chance that no update would succeed in 10 loops. For real-world examples a 50% miss rate is very high (basically doing nothing but updating), and as the miss rate is reduced, to say 1%, the risk of not succeeding in two tries is only 0.01%, and in 3 tries 0.0001%.
The usage is similar to the following problem: set a variable a to 0 and have two threads each update it with a = a + 1 a million times concurrently. At the end, a could hold any value between 1000000 (every other update was lost due to an overwrite) and 2000000 (no update was overwritten).
The closer to 2000000 you get, the more likely the CAS approach is to work, since that means the CAS would usually see the expected value and be able to set the new value.
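For reference, the same retry loop with real CAS from the standard library (a sketch using java.util.concurrent.atomic, which is what SimulatedCAS stands in for; note that AtomicInteger.compareAndSet returns a boolean rather than the old value):
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCasCounter {
    private final AtomicInteger value = new AtomicInteger();

    public int getValue() {
        return value.get();
    }

    public int increment() {
        int v;
        do {
            v = value.get();
            // Retry only if another thread changed the value between our get and our CAS.
        } while (!value.compareAndSet(v, v + 1));
        return v + 1;
    }
}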
Edit: I think I have a satisfactory answer now. The bit that confused me was the 'v != compareAndSwap'. In the actual code, CAS returns true if the value is equal to the compared expression. Thus, even if the first thread is interrupted between get and CAS, the second thread will complete the swap and exit the method, and the first thread will then be able to do its CAS.
Of course, it is possible that if two threads call this method an infinite number of times, one of them never gets the chance to run the CAS at all, especially if it has a lower priority, but this is one of the risks of unfair locking (the probability is very low, however). As I've said, a queueing mechanism would be able to solve this problem.
Sorry for the initial wrong assumptions.
