datastore count query fetchOptions - java

I want to do the following:
PreparedQuery pq = datastore.prepare(q);
int count = pq.countEntities(FetchOptions.ALL);
But there is no ALL option. So how do I do it?
For context, say I want to count all entries in my table where color is orange.
If I can't do this directly with DatastoreService, can I use DataNucleus's JPA? That is, does it support SELECT COUNT(*) ... for the App Engine datastore?

You can count the total number of records using the following code.
com.google.appengine.api.datastore.Query qry = new com.google.appengine.api.datastore.Query("EntityName");
com.google.appengine.api.datastore.DatastoreService datastoreService = DatastoreServiceFactory.getDatastoreService();
int totalCount = datastoreService.prepare(qry).countEntities(FetchOptions.Builder.withDefaults());
I hope this helps.

The accepted answer is not correct: its count maxes out at 1000.
This is how to get the correct count:
DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Query query = new Query("__Stat_Kind__");
Query.Filter eqf = new Query.FilterPredicate("kind_name",
Query.FilterOperator.EQUAL,
"MY_ENTITY_KIND");
query.setFilter(eqf);
Entity entityStat = ds.prepare(query).asSingleEntity();
Long totalEntities = (Long) entityStat.getProperty("count");

You can use Google's plugin for DataNucleus, which seems to support count().

This is old, but it should help new developers looking for a way out.
The best way to go about this is a sharded counter. Since you know the kind will grow over time, maintain the count as you write: each time a record is inserted (or the entity group is updated with a new record), increment one of several counter shards. The counts of the individual shards then sum to the actual total number of elements in the datastore table or kind.
Use this link for help on how to go about it. For a better understanding, watch the Google I/O 2008 talk on scaling web applications here, then move on to the App Engine documentation here so you get the grasp of it quickly. There is also a GitHub example test.
For an additional example, use this Blog Tutorial link, which explains a simple example.
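The sharding idea above can be sketched in plain Java. This is only an in-memory illustration under stated assumptions: the class name ShardedCounter and its methods are hypothetical, and in a datastore-backed version each array slot would instead be a separate counter entity updated in its own transaction.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

/**
 * In-memory illustration of a sharded counter (hypothetical class).
 * Each slot stands in for one counter entity in the datastore.
 */
final class ShardedCounter {
    private final AtomicLongArray shards;

    ShardedCounter(int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        this.shards = new AtomicLongArray(numShards);
    }

    /** Call once per inserted record: bumps one randomly chosen shard. */
    void increment() {
        int shard = ThreadLocalRandom.current().nextInt(shards.length());
        shards.incrementAndGet(shard);
    }

    /** The total count is the sum over all shards. */
    long total() {
        long sum = 0;
        for (int i = 0; i < shards.length(); i++) {
            sum += shards.get(i);
        }
        return sum;
    }
}
```

Spreading writes over several shards is what lets the counter keep up with frequent inserts; reading the total costs one read per shard, so the number of shards is a trade-off between write throughput and read cost.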

Related

Set up Accumulo table through api

I'm new to Accumulo, and this may sound silly, but I was wondering how to set up a table through the API. The documentation is definitely lacking. I have been able to find
conn.tableOperations().createTable("myTable");
as well as setting up locality groups:
Map<String, Set<Text>> localityGroups = new HashMap<String, Set<Text>>();
HashSet<Text> metadataColumns = new HashSet<Text>();
metadataColumns.add(new Text("domain"));
metadataColumns.add(new Text("link"));
HashSet<Text> contentColumns = new HashSet<Text>();
contentColumns.add(new Text("body"));
contentColumns.add(new Text("images"));
localityGroups.put("metadata", metadataColumns);
localityGroups.put("content", contentColumns);
conn.tableOperations().setLocalityGroups("mytable", localityGroups);
Map<String, Set<Text>> groups =
conn.tableOperations().getLocalityGroups("mytable");
This is from the documentation, but I want to know how to take the first approach and build the table, then build the columns.
Thanks in advance!
There is no inherent schema for a table to set up. Once it is created using the API you found, you can insert whatever key-value pairs you wish in it.

Query All Google App Engine

I'm trying to build a query that gets all rows in the datastore, but my problem is that I have hundreds of rows, and when I run it even once I almost hit the quota limits.
So my question is: what am I doing wrong?
Query query = new Query(myObject);
PreparedQuery pq = datastore.prepare(query);
QueryResultList<Entity> results = pq.asQueryResultList(fetchOptions);
resp.setContentType("text/plain");
resp.getWriter().println(results.size());
for (Entity entity : results) {
resp.getWriter().println(entity.getProperty("name"));
}
What you have wrong is your algorithm. You can't do that with many rows, since your frontend request will time out.
Look at task queues and backends.

GAE datastore querying integer fields

I noticed strange behavior when querying the GAE datastore. Under certain circumstances, a Filter does not work for integer fields. The following Java code reproduces the problem:
log.info("start experiment");
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
int val = 777;
// create and store the first entity.
Entity testEntity1 = new Entity(KeyFactory.createKey("Test", "entity1"));
Object value = new Integer(val);
testEntity1.setProperty("field", value);
datastore.put(testEntity1);
// create the second entity by using BeanUtils.
Test test2 = new Test(); // just a regular bean with an int field
test2.setField(val);
Entity testEntity2 = new Entity(KeyFactory.createKey("Test", "entity2"));
Map<String, Object> description = BeanUtilsBean.getInstance().describe(test2);
for(Entry<String,Object> entry:description.entrySet()){
testEntity2.setProperty(entry.getKey(), entry.getValue());
}
datastore.put(testEntity2);
// now try to retrieve the entities from the database...
Filter equalFilter = new FilterPredicate("field", FilterOperator.EQUAL, val);
Query q = new Query("Test").setFilter(equalFilter);
Iterator<Entity> iter = datastore.prepare(q).asIterator();
while (iter.hasNext()) {
log.info("found entity: " + iter.next().getKey());
}
log.info("experiment finished");
The log looks like this:
INFO: start experiment
INFO: found entity: Test("entity1")
INFO: experiment finished
For some reason it only finds the first entity even though both entities are actually stored in the datastore and both 'field' values are 777 (I see it in the Datastore Viewer)! Why does it matter how the entity is created? I would like to use BeanUtils, because it is convenient.
The same problem occurs on the local devserver and when deployed to GAE.
OK, I found out what is going on. The "problem" is that for some reason BeanUtils transforms integers into strings. A string looks exactly the same in the Datastore Viewer, but it is of course not the same value, so the integer filter does not match it. This pretty much fooled me. I should have studied the Apache BeanUtils manual.
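To illustrate the mismatch described above: the string "777" and the integer 777 are different values, so a described bean needs its numeric strings converted back before they are stored. The sketch below is a minimal, hypothetical helper (DescribedValues and coerceIntegers are not part of BeanUtils or the datastore API) that works on a plain Map<String, String> standing in for the described bean.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DescribedValues {
    /**
     * Converts values that look like whole numbers back to Long and leaves
     * everything else untouched. Caveat: this would also convert a genuine
     * string property such as "007", so real code should decide per property.
     */
    static Map<String, Object> coerceIntegers(Map<String, String> described) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : described.entrySet()) {
            String v = e.getValue();
            // At most 18 digits, so the value always fits in a long.
            if (v != null && v.matches("-?\\d{1,18}")) {
                out.put(e.getKey(), Long.valueOf(v));
            } else {
                out.put(e.getKey(), v);
            }
        }
        return out;
    }
}
```

The converted map could then be copied into the entity property by property, in place of the raw described values.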
Have you given the datastore 1 second after writing before you query the data? Sometimes you don't have to (ancestor queries, perhaps) but other times you do. The GAE/J documentation will give full details.
The fact that the entities are created with BeanUtils is completely irrelevant. If the entities are in the datastore (you can see them in the viewer) and the field value is indexed (it does not show "unindexed" next to the value in the Datastore Viewer), then you can query for them using a filter. This works; it is the basic functionality of the datastore.
Given that the entities are created and indexed, I suggest that Ian Marshall's suggestion is probably correct. To test this, go to the preferences for App Engine and un-tick "Enable local HRD support". This will ensure that when you write an Entity you can query for it immediately.
It is not important whether you store an Integer, an int, or any other integer value: they are all stored as a long internally, and when you read your value back you will get a Long (despite storing an Integer).
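A small sketch of what that means when reading a value back, assuming the property arrives as a Long inside an Object reference. The helper name NumericProperty.asInt is hypothetical.

```java
final class NumericProperty {
    /**
     * Converts a numeric property value to int. Going through Number avoids
     * the ClassCastException that a direct (Integer) cast throws when the
     * value is actually a Long. Math.toIntExact throws ArithmeticException
     * if the value does not fit in an int.
     */
    static int asInt(Object propertyValue) {
        return Math.toIntExact(((Number) propertyValue).longValue());
    }
}
```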

Update all objects in JPA entity

I'm trying to update all my 4000 Objects in ProfileEntity but I am getting the following exception:
javax.persistence.QueryTimeoutException: The datastore operation timed out, or the data was temporarily unavailable.
This is my code:
public synchronized static void setX4all()
{
em = EMF.get().createEntityManager();
Query query = em.createQuery("SELECT p FROM ProfileEntity p");
List<ProfileEntity> usersList = query.getResultList();
int a,b,x;
for (ProfileEntity profileEntity : usersList)
{
a = profileEntity.getA();
b = profileEntity.getB();
x = func(a,b);
profileEntity.setX(x);
em.getTransaction().begin();
em.persist(profileEntity);
em.getTransaction().commit();
}
em.close();
}
I'm guessing that I take too long to query all of the records from ProfileEntity.
How should I do it?
I'm using Google App Engine so no UPDATE queries are possible.
Edited 18/10
In these 2 days I tried:
using Backends as Thanos Makris suggested, but got to a dead end. You can see my question here.
reading the DataNucleus suggestion on Map-Reduce, but really got lost.
I'm looking for a different direction. Since I am only going to do this update once, maybe I can update manually every 200 objects or so.
Is it possible to query for the first 200 objects, then the second 200 objects, and so on?
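The batching idea raised here can be sketched independently of any persistence API. In the sketch below, BatchRunner, forEachBatch and the fetchPage parameter are hypothetical names: fetchPage stands in for whatever paged query is available and takes an offset and a limit.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

final class BatchRunner {
    /**
     * Repeatedly asks fetchPage for (offset, limit) and hands each item to
     * action, stopping at the first page shorter than batchSize.
     * Returns the number of items processed.
     */
    static <T> int forEachBatch(BiFunction<Integer, Integer, List<T>> fetchPage,
                                int batchSize,
                                Consumer<T> action) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int offset = 0;
        while (true) {
            List<T> page = fetchPage.apply(offset, batchSize);
            for (T item : page) {
                action.accept(item);
            }
            offset += page.size();
            if (page.size() < batchSize) {
                return offset; // a short (or empty) page means the data is exhausted
            }
        }
    }
}
```

With JPA, fetchPage might be built on the standard Query.setFirstResult and Query.setMaxResults methods, with one transaction per batch. Whether that pages efficiently on App Engine is not something this sketch verifies.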
Given your scenario, I would advise running a native update query:
Query query = em.createNativeQuery("update ProfileEntity pe set pe.X = 'x'");
query.executeUpdate();
Please note: Here the query string is SQL i.e. update **table_name** set ....
This will work better.
Change the update process to use something like Map-Reduce. This means all is done in datastore. The only problem is that appengine-mapreduce is not fully released yet (though you can easily build the jar yourself and use it in your GAE app - many others have done so).
If you want to call setX for all objects, it is better to use an update statement (i.e. native SQL) through the JPA entity manager instead of fetching all the objects and updating them one by one.
Maybe you should consider using the Task Queue API, which lets you execute tasks for up to 10 minutes. If you need to update so many entities that Task Queues do not fit, you could also consider using Backends.
Put the transaction outside of the loop:
em.getTransaction().begin();
for (ProfileEntity profileEntity : usersList) {
...
}
em.getTransaction().commit();
Your class does not behave very well. JPA is not suitable for bulk updates done this way: you are just starting a lot of transactions in rapid sequence and producing a lot of load on the database. A better solution for your use case would be a scalar query that sets all the objects without loading them into the JVM first (depending on your object structure and lazy loading, you may load much more data than you think).
See the Hibernate reference:
http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/batch.html#batch-direct

Hibernate ScrollableResults Do Not Return The Whole Set of Results

Some of the queries we run have 100'000+ results and it takes forever to load them and then send them to the client. So I'm using ScrollableResults to have a paged results feature. But we're topping at roughly 50k results (never exactly the same amount of results).
I'm on an Oracle9i database, using the Oracle 10 drivers and Hibernate is configured to use the Oracle9 dialect. I tried with the latest JDBC driver (ojdbc6.jar) and the problem was reproduced.
We also followed some advice and added an ordering clause, but the problem was reproduced.
Here is a code snippet that illustrates what we do:
final int pageSize = 50;
Criteria crit = sess.createCriteria(ABC.class);
crit.add(Restrictions.eq("property", value));
crit.setFetchSize(pageSize);
crit.addOrder(Order.asc("property"));
ScrollableResults sr = crit.scroll();
...
...
ArrayList page = new ArrayList(pageSize);
do{
for (Object entry : page)
sess.evict(entry); //to avoid having our memory just explode out of proportion
page.clear();
for (int i =0 ; i < pageSize && ! metLastRow; i++){
if (sr.next())
page.add(sr.get(0));
else
metLastRow = true;
}
metLastRow = metLastRow?metLastRow:sr.isLast();
sendToClient(page);
}while(!metLastRow);
So why does the result set tell me it is at the end when there should be many more results?
Your code snippet is missing important pieces, like the definitions of resultSet and page. But I wonder anyway, shouldn't the line
if (resultSet.next())
be rather
if (sr.next())
?
As a side note, AFAIK cleaning up superfluous objects from the persistence context could be achieved simply by calling
session.flush();
session.clear();
instead of looping through the collection of object to evict each separately. (Of course, this requires that the query is executed in its own independent session.)
Update: OK, next round of guesses :-)
Can you actually check what rows are sent to the client and compare that against the result of the equivalent SQL query run directly against the DB? It would be good to know whether this code retrieves (and sends to the client) all rows up to a certain limit, or only some rows (like every 2nd) from the whole result set, or ... That could shed some light on the root cause.
Another thing you could try is
crit.setFirstResult(0).setMaxResults(200000);
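For comparison, the paging loop from the question can be written so that the end of the data is detected only by the cursor running out, with no separate isLast() check. This is a sketch under assumptions, not a confirmed fix for the truncation: a plain Iterator stands in for ScrollableResults, and Pager and sendInPages are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class Pager {
    /**
     * Drains rows in pages of at most pageSize and passes each non-empty
     * page to sink. The end of the data is detected only by hasNext()
     * returning false. Returns the total number of rows sent.
     */
    static <T> long sendInPages(Iterator<T> rows, int pageSize, Consumer<List<T>> sink) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long sent = 0;
        List<T> page = new ArrayList<>(pageSize);
        while (rows.hasNext()) {
            page.add(rows.next());
            if (page.size() == pageSize) {
                sink.accept(new ArrayList<>(page)); // hand over a copy, then reuse the buffer
                sent += page.size();
                page.clear();
            }
        }
        if (!page.isEmpty()) {
            sink.accept(new ArrayList<>(page)); // final partial page
            sent += page.size();
        }
        return sent;
    }
}
```

Comparing the returned row count with the result of a COUNT query run directly against the database would show whether rows are lost in the loop or earlier.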
As I had the same issue in a large project whose code is based on List<E> instances, I wrote a very limited List implementation with only iterator support, to browse a ScrollableResults without refactoring all the service implementations and method prototypes.
This implementation is available in my IterableListScrollableResults.java Gist.
It also regularly flushes Hibernate entities from the session. Here is a way to use it, for instance when exporting all non-archived entities from the DB as a text file with a for loop:
Criteria criteria = getCurrentSession().createCriteria(LargeVolumeEntity.class);
criteria.add(Restrictions.eq("archived", Boolean.FALSE));
criteria.setReadOnly(true);
criteria.setCacheable(false);
List<E> result = new IterableListScrollableResults<E>(getCurrentSession(),
criteria.scroll(ScrollMode.FORWARD_ONLY));
for(E entity : result) {
dumpEntity(file, entity);
}
I hope it helps.
