App Engine Datastore - Incrementing property increments by 2 - java

I'm trying to build out a simple App Engine Datastore entity that basically keeps count of how many times it was viewed. I've got the code set up in a JSP file, and it does increment the value; however, every time it seems to increment by 2 rather than 1. Here's the code in question:
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Entity story = datastore.get(KeyFactory.stringToKey(request.getParameter("story_key")));

String json_out = "";
int num_views = 0;
if (story.getProperty("views") != null) {
    num_views = Integer.parseInt(story.getProperty("views").toString());
}

// Update the view count of this object.
story.setProperty("views", num_views + 1);
datastore.put(story);

json_out += "{";
json_out += "\"title\":\"" + story.getProperty("title") + "\", ";
json_out += "\"views\":\"" + num_views + "\"";
json_out += "}";
out.println(json_out);
Any idea why it would be incrementing by 2? I've even tried subtracting one when I get the number of views, but then the number just stays the same all the time as you'd expect. Really odd.

If you are implementing a counter using the datastore, you should use techniques that allow for high throughput. Your solution above could easily write to the datastore more than a few times per second, violating HRD write limits. It's also not thread-safe: the read-modify-write is not run in a transaction, so concurrent updates can interleave and the result is not what you expect. Try shard counters, which fix these issues:
http://code.google.com/appengine/articles/sharding_counters.html
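Before sharding, the immediate correctness fix is to wrap the read-modify-write in a transaction so concurrent requests can't lose updates. A minimal sketch against the low-level Datastore API (retry handling omitted; the entity key is assumed to come from the request parameter as in the question):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Transaction;

public class ViewCounter {

    // Transactionally increment the "views" property of the given entity.
    public static long incrementViews(Key storyKey) throws EntityNotFoundException {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        Transaction txn = datastore.beginTransaction();
        try {
            Entity story = datastore.get(txn, storyKey);
            Object current = story.getProperty("views");
            // The datastore stores integer properties as Long.
            long views = (current == null) ? 0 : ((Number) current).longValue();
            story.setProperty("views", views + 1);
            datastore.put(txn, story);
            txn.commit();
            return views + 1;
        } finally {
            if (txn.isActive()) {
                txn.rollback();
            }
        }
    }
}

Note that a transaction alone still caps you at a few writes per second on the entity group; the sharding article above spreads the count over N entities to get past that limit.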

Related

How to allow rapid post requests on Google-App?

I'm building an Android order-taking application and need to store the orders in Google Sheets. It worked fine until I put the POST request call inside a for loop; I fixed the resulting failures by adding a delay in the loop. However, is there any other way to do this? Because each time I submit an order, the app lags until the for loop has finished.
I've added a Thread.sleep to avoid this problem, like so:
for (int i = 0; i < order.size(); i++) {
    String itemName = order.get(i);
    String addOn = addOns.get(i);
    String quantity = order.get(i);
    quantity = quantity.substring(quantity.length() - 2, quantity.length() - 1);
    itemName = itemName.substring(0, itemName.length() - 4);
    System.out.println("Item ID: " + i + " " + itemName + " " + addOn + " " + quantity + " " + orderId);
    try {
        Thread.sleep(550); // This is the delay I added.
    } catch (InterruptedException ex) {
        // do stuff
    }
    addItemToSheet(itemName, addOn, quantity, orderId);
}
if (null != getActivity()) {
    ((MainActivity) getActivity()).clearOrderList();
}
Here is the Google Apps Script:
var ss = SpreadsheetApp.openByUrl("SPREADSHEET URL");
var sheet = ss.getSheetByName('Items');

function doPost(e) {
  var action = e.parameter.action;
  if (action == 'addItem') {
    return addItem(e);
  }
}

function addItem(e) {
  var date = new Date();
  var id = sheet.getLastRow();
  var orderNum = e.parameter.orderNum;
  var itemName = e.parameter.itemName;
  var quantity = e.parameter.quantity;
  var extra = e.parameter.extra;
  sheet.appendRow([date, id, orderNum, itemName, quantity, extra]);
  return ContentService.createTextOutput("Success").setMimeType(ContentService.MimeType.TEXT);
}
Is there anything I can do to either the Java loop or the Apps Script so that I can avoid using Thread.sleep, which freezes my app until the operation is done? Maybe an alternative way to delay in Java, or a faster operation in Apps Script?
In order to achieve what you want, I would recommend multithreading.
Multithreading is a Java feature that allows concurrent execution of
two or more parts of a program for maximum utilization of the CPU. Each
part of such a program is called a thread, so threads are lightweight
processes within a process.
This way your application will continue to run on one thread while your POST requests are handled on a different thread. Please bear in mind the wait() and notify() methods, which are used to control your workflow:
wait(): This method behaves exactly as if it simply performs the call wait(0).
notify(): This method wakes up a single thread that is waiting on this object's monitor.
You can find more details about multithreading in Java, and about the methods mentioned above, in the official Java documentation.
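As a rough sketch of this idea (assumptions: order, addOns, orderId, and addItemToSheet come from the question's Fragment and are accessible there; everything else is illustrative, not the asker's actual code), the loop moves onto a background thread and the UI cleanup is posted back to the main thread:

// Replaces the original loop: the POSTs now run off the main thread.
new Thread(new Runnable() {
    @Override
    public void run() {
        for (int i = 0; i < order.size(); i++) {
            String itemName = order.get(i);
            String addOn = addOns.get(i);
            String quantity = order.get(i);
            quantity = quantity.substring(quantity.length() - 2, quantity.length() - 1);
            itemName = itemName.substring(0, itemName.length() - 4);
            addItemToSheet(itemName, addOn, quantity, orderId); // network call, off the UI thread
        }
        // UI updates must happen back on the main thread.
        final Activity activity = getActivity();
        if (activity != null) {
            activity.runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    ((MainActivity) activity).clearOrderList();
                }
            });
        }
    }
}).start();

If the spreadsheet still rejects rapid-fire requests, the delay can stay inside this background thread, where it no longer freezes the app.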

Prometheus Counter Inconsistency

I'm using the Prometheus Java simpleclient within a web service to keep track of how many events result in one status or another.
I'm able to check within the logs that the counter is being invoked and is incrementing internally, but it seems that a lot of times the data is not making it to the /metrics endpoint.
For example, just now, after incrementing the counter 3 times for the same status a few minutes apart each, the log would print out "Current Value = 0, New value = 1" three times. The first two times did not show any data on the /metrics endpoint, and after the 3rd increment, it finally showed a value of 1, which means I lost the record of the first 2 events.
The code I have is below, aside from some name changes.
private static final Counter myCounter = Counter.build()
        .name("myMetric")
        .help("My metric")
        .labelNames("status")
        .register();
...
private static void incrementCounter(String status) {
    Counter.Child counter = myCounter.labels(status);
    Logger.info("Before incrementing counter for status=" + status + ". Current value=" + counter.get());
    counter.inc();
    Logger.info("After incrementing counter for status=" + status + ". New value=" + counter.get());
}
I'm at a loss as to why Prometheus doesn't seem to keep track of these counters consistently. Is anyone able to see what's wrong, or suggest a better way to record these Counter metrics?
The only reason I can guess is concurrent incrementCounter calls.
The io.prometheus.client.SimpleCollector#labels method is not thread-safe (despite the children field having a ConcurrentMap type), so it is possible to get a different io.prometheus.client.Counter.Child on every call.
As for getting metrics via HTTP: every call to the /metrics endpoint leads to an io.prometheus.client.Counter#collect call, which retrieves the value of only one of those children.
I would suggest using your own concurrent map to store the counters:
private static final ConcurrentMap<String, Counter.Child> counters = new ConcurrentHashMap<>();
// ...
private static void incrementCounter(String status) {
    Counter.Child counter = counters.computeIfAbsent(status, myCounter::labels);
    // ...
}
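Put together, a minimal self-contained version of the suggested fix could look like the following (names follow the question; whether labels() is actually racy depends on your simpleclient version, so treat the cache as a defensive measure):

import io.prometheus.client.Counter;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class StatusMetrics {

    private static final Counter myCounter = Counter.build()
            .name("myMetric")
            .help("My metric")
            .labelNames("status")
            .register();

    // Cache exactly one Child per status value; computeIfAbsent guarantees
    // the mapping function runs at most once per key.
    private static final ConcurrentMap<String, Counter.Child> counters =
            new ConcurrentHashMap<>();

    public static void incrementCounter(String status) {
        counters.computeIfAbsent(status, myCounter::labels).inc();
    }
}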

Google RateLimiter not Working for counter

I have a case of limiting calls to 100/s.
I am thinking of using Google Guava's RateLimiter. I tested it like this:
int cps = 100;
RateLimiter limiter = RateLimiter.create(cps);
for (int i = 0; i < 200; i++) {
    limiter.acquire();
    System.out.print("\rNumber of records processed = " + i+1);
}
But the code did not stop at 100 records to let 1 sec be completed. Am I doing something wrong?
The RateLimiter is working fine. The problem is that your output is buffered, because you are not flushing it each time. Usually, standard output is line-buffered. So if you had written
System.out.println("Number of records processed = " + (i+1));
you would have seen a pause at 100. However, what you have:
System.out.print("\rNumber of records processed = " + i+1);
has two problems. First, the "\r" is not taken as a new line and does not cause flushing; therefore the whole output is buffered and is printed to the console all in one go. Second, you need to put (i+1) in parentheses. What you have appends i to the string, and then appends 1 to the resultant string.
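Putting both fixes together, the print statement would become (keeping the carriage return for the in-place progress display, which means flushing explicitly):

System.out.print("\rNumber of records processed = " + (i + 1));
System.out.flush(); // "\r" is not a newline, so line buffering never flushes on its own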
Besides @DodgyCodeException's suggestions regarding output flushing and the (i + 1) concatenation, let's run this code to make sure you understand how RateLimiter works:
final double permitsPerSecond = 1;
RateLimiter limiter = RateLimiter.create(permitsPerSecond);
final Stopwatch stopwatch = Stopwatch.createStarted();
int i = 0;
for (; i < 2 * permitsPerSecond; i++) {
    limiter.acquire();
}
System.out.println("Elapsed = " + stopwatch.stop().elapsed(TimeUnit.MILLISECONDS) + "ms");
System.out.println("Number of records processed = " + i);
(Note that I set the number of tries to twice the permitsPerSecond value.) When you set permitsPerSecond to 1, you'll see:
Elapsed = 1001ms
Number of records processed = 2
For permitsPerSecond = 10 and permitsPerSecond = 100, the elapsed time approaches (in the mathematical sense) the 2s limit, because the 11th or 101st acquire has to wait for the rate set in the RateLimiter:
Elapsed = 1902ms
Number of records processed = 20
and
Elapsed = 1991ms
Number of records processed = 200

I'm getting different results every time I run my code

I'm using ELKI to cluster my data. I used KMeansLloyd<NumberVector> with k=3, and every time I run my Java code I get totally different clustering results. Is this normal, or is there something I should do to make my output nearly stable? Here is my code, which I got from the ELKI tutorials:
DatabaseConnection dbc = new ArrayAdapterDatabaseConnection(a);
// Create a database (which may contain multiple relations!)
Database db = new StaticArrayDatabase(dbc, null);
// Load the data into the database (do NOT forget to initialize...)
db.initialize();
// Relation containing the number vectors:
Relation<NumberVector> rel = db.getRelation(TypeUtil.NUMBER_VECTOR_FIELD);
// We know that the ids must be a continuous range:
DBIDRange ids = (DBIDRange) rel.getDBIDs();
// K-means should be used with squared Euclidean (least squares):
// SquaredEuclideanDistanceFunction dist = SquaredEuclideanDistanceFunction.STATIC;
CosineDistanceFunction dist = CosineDistanceFunction.STATIC;
// Default initialization, using global random:
// To fix the random seed, use: new RandomFactory(seed);
RandomlyGeneratedInitialMeans init = new RandomlyGeneratedInitialMeans(RandomFactory.DEFAULT);
// Textbook k-means clustering:
KMeansLloyd<NumberVector> km = new KMeansLloyd<>(dist, //
        3 /* k - number of partitions */, //
        0 /* maximum number of iterations: no limit */, init);
// K-means will automatically choose a numerical relation from the data set:
// But we could make it explicit (if there were more than one numeric
// relation!): km.run(db, rel);
Clustering<KMeansModel> c = km.run(db);
// Output all clusters:
int i = 0;
for (Cluster<KMeansModel> clu : c.getAllClusters()) {
    // K-means will name all clusters "Cluster" in lack of noise support:
    System.out.println("#" + i + ": " + clu.getNameAutomatic());
    System.out.println("Size: " + clu.size());
    System.out.println("Center: " + clu.getModel().getPrototype().toString());
    // Iterate over objects:
    System.out.print("Objects: ");
    for (DBIDIter it = clu.getIDs().iter(); it.valid(); it.advance()) {
        // To get the vector use:
        NumberVector v = rel.get(it);
        // Offset within our DBID range: "line number"
        final int offset = ids.getOffset(it);
        System.out.print(v + " " + offset);
        // Do NOT rely on using "internalGetIndex()" directly!
    }
    System.out.println();
    ++i;
}
I would say it is because you are using RandomlyGeneratedInitialMeans, whose documentation says:
Initialize k-means by generating random vectors (within the data set's value range).
RandomlyGeneratedInitialMeans init = new RandomlyGeneratedInitialMeans(RandomFactory.DEFAULT);
Yes, it is normal.
K-Means is supposed to be initialized randomly. It is desirable to get different results when running it multiple times.
If you don't want this, use a fixed random seed.
From the code you copy and pasted:
// To fix the random seed, use: new RandomFactory(seed);
That is exactly what you should do...
long seed = 0;
RandomlyGeneratedInitialMeans init = new RandomlyGeneratedInitialMeans(
        new RandomFactory(seed));
This was too long for a comment. As @Idos stated, you are initializing your data randomly; that's why you're getting random results. Now the question is: how do you ensure the results are robust? Try this:
Run the algorithm N times. Each time, record the cluster membership for each observation. When you are finished, classify an observation into the cluster which contained it most often. For example, suppose you have 3 observations, 3 classes, and run the algorithm 3 times:
obs  R1  R2  R3
  1   A   A   B
  2   B   B   B
  3   C   B   B
Then you should classify obs1 as A, since it was most often classified as A. Classify obs2 as B, since it was always classified as B. And classify obs3 as B, since it was most often classified as B by the algorithm. The results should become increasingly stable the more times you run the algorithm; a sketch of this majority vote follows.
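Here is a minimal, self-contained sketch of the majority-vote step in plain Java (independent of ELKI; it assumes you have already collected one label per observation per run, and that labels are comparable across runs, which in practice requires a label-alignment step because k-means cluster numbering is arbitrary):

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class MajorityVote {

    // labelsPerRun[r][o] = cluster label of observation o in run r.
    public static String[] consensus(String[][] labelsPerRun) {
        int numObs = labelsPerRun[0].length;
        String[] result = new String[numObs];
        for (int o = 0; o < numObs; o++) {
            Map<String, Integer> votes = new HashMap<>();
            for (String[] run : labelsPerRun) {
                votes.merge(run[o], 1, Integer::sum);
            }
            // Assign the label this observation received most often.
            String best = null;
            for (Map.Entry<String, Integer> e : votes.entrySet()) {
                if (best == null || e.getValue() > votes.get(best)) {
                    best = e.getKey();
                }
            }
            result[o] = best;
        }
        return result;
    }

    public static void main(String[] args) {
        // The 3-observation example from the table above; rows are runs R1..R3.
        String[][] runs = {
                { "A", "B", "C" }, // R1
                { "A", "B", "B" }, // R2
                { "B", "B", "B" }, // R3
        };
        System.out.println(Arrays.toString(consensus(runs))); // prints [A, B, B]
    }
}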

Spring data / Neo4j path length with large data sets

I have been running the following query to find relatives within a certain "distance" of a given person:
#Query("start person=node({0}), relatives=node:__types__(className='Person') match p=person-[:PARTNER|CHILD*]-relatives where LENGTH(p) <= 2*{1} return distinct relatives")
Set<Person> getRelatives(Person person, int distance);
The 2*{1} comes from one conceptual "hop" between people being represented as two nodes - one Person and one Partnership.
This has been fine so far, on test populations. Now I'm moving on to actual data sets of 1-10 million people, and the query takes forever (also from the data browser in the web interface).
Assuming the cost came from loading every Person node into relatives, I rewrote the query as a test in the data browser:
start person=node(385716) match p=person-[:PARTNER|CHILD*1..10]-relatives where relatives.__type__! = 'Person' return distinct relatives
And that works fine, in fractions of a second on the same data store. But when I want to put it back into Java:
#Query("start person=node({0}) match p=person-[:PARTNER|CHILD*1..{1}]-relatives where relatives.__type__! = 'Person' return relatives")
Set<Person> getRelatives(Person person, int distance);
That won't work:
[...]
Nested exception is Properties on pattern elements are not allowed in MATCH.
"start person=node({0}) match p=person-[:PARTNER|CHILD*1..{1}]-relatives where relatives.__type__! = 'Neo4jPerson' return relatives"
^
Is there a better way of putting a path-length restriction in there? I would prefer not to use a WHERE clause, as that would involve loading ALL the paths, potentially traversing millions of nodes when I need only go to a depth of 10. That would presumably leave me no better off.
Any ideas would be greatly appreciated!
Michael to the rescue!
My solution:
public Set<Person> getRelatives(final Person person, final int distance) {
    final String query = "start person=node(" + person.getId() + ") "
            + "match p=person-[:PARTNER|CHILD*1.." + 2 * distance + "]-relatives "
            + "where relatives.__type__! = '" + Person.class.getSimpleName() + "' "
            + "return distinct relatives";
    return this.query(query);
    // Where I would previously instead have called
    // return personRepository.getRelatives(person, distance);
}

public Set<Person> query(final String q) {
    final EndResult<Person> result = this.template.query(q, MapUtil.map()).to(Neo4jPerson.class);
    final Set<Person> people = new HashSet<Person>();
    for (final Person p : result) {
        people.add(p);
    }
    return people;
}
Which runs very quickly!
You're almost there :)
Your first query is a full graph scan, which effectively loads the whole database into memory and pulls all nodes through the pattern match multiple times. So it won't be fast, and it also returns huge result sets; I don't know if that's what you want.
The second query looks good. The only thing is that you cannot parameterize the min/max values of variable-length relationships, due to how queries are optimized and cached.
So for right now you'd have to go with template.query(), or with different query methods in your repository for different max values, as sketched below.
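For the second option, a sketch of what fixed-depth repository methods might look like (the interface name and the chosen depths are assumptions for illustration; each upper bound is a literal precisely because it cannot be a parameter):

public interface PersonRepository extends GraphRepository<Person> {

    // Depth 2 (= 4 node hops, since each conceptual hop is Person + Partnership):
    @Query("start person=node({0}) match p=person-[:PARTNER|CHILD*1..4]-relatives "
            + "where relatives.__type__! = 'Person' return distinct relatives")
    Set<Person> getRelativesDepth2(Person person);

    // Depth 5 (= 10 node hops):
    @Query("start person=node({0}) match p=person-[:PARTNER|CHILD*1..10]-relatives "
            + "where relatives.__type__! = 'Person' return distinct relatives")
    Set<Person> getRelativesDepth5(Person person);
}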
