What is the best/fastest way to check if an Entity exists in a google-app-engine datastore? For now I'm trying to get the entity by key and checking if the get() returns an error.
I don't know the process of getting an Entity on the datastore. Is there a faster way for doing only this check?
What you proposed would indeed be the fastest way to know if your entity exists. The only thing slowing you down is the time it takes to fetch and deserialize your entity. If your entity is large, this can slow you down.
If this action (checking for existence) is a major bottleneck for you and you have large entities, you may want to roll your own check using two entities: your existing entity with the data, and a second entity that either stores a reference to the real entity or is simply empty, with a key that is a variation on the original entity key that you can compute. You can check for existence quickly using the second entity, and then fetch the first entity only if the data is needed.
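A minimal sketch of the two-entity idea, with the datastore modelled as in-memory collections. This is an assumption for illustration only and is not the App Engine API; the names MarkerStoreSketch, markerKeyFor, put, exists and load are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Stand-in for the datastore: one map for the large entities,
// one set for the tiny marker entities (key only, no payload).
class MarkerStoreSketch {
    private final Map<String, byte[]> dataEntities = new HashMap<>();
    private final Set<String> markerKeys = new HashSet<>();

    // The marker key is computed from the original key,
    // so no lookup is needed to find it.
    static String markerKeyFor(String key) {
        return "marker:" + key;
    }

    // Write the large entity and its marker together.
    void put(String key, byte[] payload) {
        dataEntities.put(key, payload);
        markerKeys.add(markerKeyFor(key));
    }

    // The existence check touches only the small marker, never the payload.
    boolean exists(String key) {
        return markerKeys.contains(markerKeyFor(key));
    }

    // Fetch the payload only when the data is actually needed.
    byte[] load(String key) {
        return dataEntities.get(key);
    }
}
```

In a real datastore the two writes would presumably need to happen in the same transaction; otherwise the marker and the data entity can drift apart.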
The better way, I think, would be to design your keys such that you know there will be no duplicates, or to make your operations idempotent, so that even if an old entity is overwritten, it doesn't matter.
com.google.appengine.api has been deprecated in favor of the App Engine GCS client.
Have you considered using a query? Guess-and-check is not a scalable way to find out if an entity exists in a datastore. A query can be created to retrieve entities from the datastore that meet a specified set of conditions:
https://developers.google.com/appengine/docs/java/datastore/queries
EDIT:
What about the key-only query? Key-only queries run faster than queries that return complete entities. To return only the keys, use the Query.setKeysOnly() method.
new Query("Kind").addFilter(Entity.KEY_RESERVED_PROPERTY, FilterOperator.EQUAL, key).setKeysOnly();
Source: [1]: http://groups.google.com/group/google-appengine-java/browse_thread/thread/b1d1bb69f0635d46/0e2ba938fad3a543?pli=1
You could fetch using a List<Key> containing only one Key. That method returns a Map<Key, Entity>, and looking the key up in the map gives you either the entity or null if it was not found, for example:
Entity e = datastoreService.get(Arrays.asList(key)).get(key);
In general though I think it'd be easier to wrap the get() in a try/catch that returns null if the EntityNotFoundException is caught.
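The try/catch wrapper can be sketched as follows. To keep the example self-contained it does not use the App Engine classes: NotFoundSketchException and ThrowingStoreSketch are hypothetical stand-ins for an exception-throwing get(), and getOrNull and exists are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a checked "entity not found" exception.
class NotFoundSketchException extends Exception {
    NotFoundSketchException(String key) {
        super("No entity for key " + key);
    }
}

// Hypothetical stand-in for a store whose get() throws when the key is absent.
class ThrowingStoreSketch {
    private final Map<String, String> entities = new HashMap<>();

    void put(String key, String value) {
        entities.put(key, value);
    }

    // Signals absence with an exception instead of returning null.
    String get(String key) throws NotFoundSketchException {
        String value = entities.get(key);
        if (value == null) {
            throw new NotFoundSketchException(key);
        }
        return value;
    }

    // Wrapper that turns the exception into a null result.
    String getOrNull(String key) {
        try {
            return get(key);
        } catch (NotFoundSketchException e) {
            return null;
        }
    }

    // Existence check built on the wrapper.
    boolean exists(String key) {
        return getOrNull(key) != null;
    }
}
```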
Related
Looking for an efficient way to update only one property for an entity in GAE.
I know I can do a get by key, set a property and then put. But will the get not be very inefficient as it will load all properties? I have heard that you can make property specific queries but I was worried that once you load an entity with only say one or two out of its total properties, then put it back in the datastore that the properties not loaded in the query will be lost.
Any Advice?
PS also not sure about the query method because I heard direct gets are more efficient. Any possibility of a query that specifies simply the key and therefore will be just as efficient?
Afaik, entities are stored in a serialised form, so it makes no difference if you need one or all properties as they will all be loaded when entity's serialised form is loaded.
The "property specific queries" are actually called projection queries. They work on indexes only and only recreate "projected" fields you queried by. Since entities are only partially loaded (only projected fields are loaded) they should not be saved back to the Datastore.
Just use normal query and then multi-put. Yes, direct gets are more efficient (and less costly) but you need to have key/id of the entity.
If you need to update one property far more than others, you can move it into a separate, simpler entity that you can load and update independently of the main entity. This could be a child entity, or a separate one that shares key characteristics.
E.g.
Email <- main entity
Unread <- child entity of email
When the email is created, create an unread entity. When it's read, delete the unread entity. When searching for unread emails, perform a key-only query on the Unread entities, extract parent keys to find the Email entities you want.
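A rough sketch of the Email/Unread pattern, with both entity kinds modelled as in-memory collections rather than real datastore entities (an assumption for illustration). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UnreadMarkerSketch {
    // Main entities keyed by email id; the body may be large.
    private final Map<Long, String> emails = new LinkedHashMap<>();
    // Marker entities: only the parent email id is stored.
    private final Set<Long> unread = new LinkedHashSet<>();

    // Creating an email also creates its Unread marker.
    void createEmail(long id, String body) {
        emails.put(id, body);
        unread.add(id);
    }

    // Reading an email deletes the marker; the email itself is not rewritten.
    void markRead(long id) {
        unread.remove(id);
    }

    // Plays the role of a keys-only query on Unread:
    // returns parent ids without touching any email body.
    List<Long> unreadEmailIds() {
        return new ArrayList<>(unread);
    }

    // Load the full email only when it is needed.
    String loadEmail(long id) {
        return emails.get(id);
    }
}
```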
I am using Spring+Hibernate for my application. I have a few CRUD operations. Before inserting, I need to check if a similar entry is already in the database, if yes it should not be inserted.
For example: if I am trying to create a Department, before inserting the row I should check whether a department with the same name already exists. If it does, the method returns an error message.
Now, I know the unique key constraint can be set on the column to do the check. But, I want to know if there is any other way to do this.
The only way I can think of is first fetching all the departments from the database and check against each object.
Please let me know if there is any other way.
"The only way I can think of is first fetching all the departments from the database and check against each object."
You don't need to fetch all departments from the database. It should be enough to search the database for the department with the name you want to insert. Since the name should have a unique key anyway, the lookup should be fast enough.
If your @Id attribute is the department name, then the saveOrUpdate API of Hibernate will check if an object with that id is already present in the DB. If so it will update, else it will create a new entry. Hope this helps. See this link.
You can try to find the object in the database with the "get" method:
Cat cat = (Cat) sess.get(Cat.class, id);
If the returned object is null, you can add a new one.
For better performance, use a query with a "count" predicate instead, which avoids loading the whole object.
I'm working on an AppEngine project and I'm using JDO on top of the AppEngine datastore for persistence. I have an entity that uses an encoded string as the key and also uses an application generated keyname (also a string). I did this because my app would frequently scoop data (potentially scooping the same thing) from the wild and attempt to persist them. In an attempt to avoid persisting several entities which essentially contain the same data, I decided to hash some properties about these data so as to get a consistent keyname (not manipulating keys directly because of entity relationships).
The problem now is that whenever I calculate my hash (keyname) and attempt to store the entity, if it already exists in the datastore, the datastore (or JDO or whoever the culprit is) silently overwrites the properties of the entity in the datastore without raising any exception. This has serious effects on the app because it overrides the timeStamps (a field) of the entities (which we use for ordering).
How best can I get around this?
You need to do get-before-set (Check and set or CAS).
CAS is a fundamental tenet of concurrency, and it's a necessary evil of parallel computing.
Gets are much cheaper than sets anyway, so it may actually save you money.
Instead of blind writing to datastore, first retrieve; if the entity doesn't exist, catch the exception and just put the entity. If it does exist, do a deep compare before you save. If nothing has changed, don't persist it (and save that cost). If it has changed, choose your merge strategy however you please. One (slightly ugly) way to maintain dated revisions is to store the previous entity as a field in the updated entity (may not work for many revisions).
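The get-before-set flow can be sketched like this. The datastore is modelled as a plain map, and the merge strategy (take the new payload, keep the original timestamp) is just one choice made for this sketch; StoredItem, Outcome and save are hypothetical names, not part of JDO or the datastore API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class GetBeforeSetSketch {
    // Hypothetical entity: a payload plus the timestamp used for ordering.
    static final class StoredItem {
        final String payload;
        final long createdAt;

        StoredItem(String payload, long createdAt) {
            this.payload = payload;
            this.createdAt = createdAt;
        }
    }

    enum Outcome { CREATED, UNCHANGED, MERGED }

    // Stand-in for the datastore, keyed by the hashed key name.
    private final Map<String, StoredItem> store = new HashMap<>();

    Outcome save(String key, String payload, long now) {
        StoredItem existing = store.get(key);          // get first
        if (existing == null) {
            store.put(key, new StoredItem(payload, now));
            return Outcome.CREATED;
        }
        if (Objects.equals(existing.payload, payload)) {
            return Outcome.UNCHANGED;                  // nothing changed: skip the write
        }
        // Merge strategy assumed here: new payload, original timestamp preserved.
        store.put(key, new StoredItem(payload, existing.createdAt));
        return Outcome.MERGED;
    }

    StoredItem get(String key) {
        return store.get(key);
    }
}
```

This sketch is not safe under concurrent writers; against a real datastore the get and the put would need to share a transaction.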
But, in this case, you have to get before set. If you don't expect many duplicates and want to be really chintzy, you can do an exists query first... Which is to do a keys-only count query on the key you want to use (costs 7x less than a full get). If (count() == 0) then put() else getAndMaybePut() fi
The count query syntax might look slow, but from my benchmarks, it's the fastest (and cheapest) possible way to tell if an entity exists:
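import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.Query.FilterOperator;
import com.google.appengine.api.datastore.Query.FilterPredicate;

public boolean exists(Key key) {
    // Scope the query to the key's kind and, if present, its ancestor.
    Query q;
    if (key.getParent() == null) {
        q = new Query(key.getKind());
    } else {
        q = new Query(key.getKind(), key.getParent());
    }
    q.setKeysOnly();
    q.setFilter(new FilterPredicate(
            Entity.KEY_RESERVED_PROPERTY, FilterOperator.EQUAL, key));
    // A keys-only count limited to 1 never loads the entity itself.
    return 1 == DatastoreServiceFactory.getDatastoreService().prepare(q)
            .countEntities(FetchOptions.Builder.withLimit(1));
}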
You must do a get() to see if an entity with the same key exists before you put() the new entity. There is no way around doing this.
You can use memcache and local "in-memory" caching to speed up your get() operation. This may only help if you are likely to read the same information multiple times. If not, the memcache query may actually slow down your process.
To ensure that two requests do not overwrite each other you should use a transaction (not possible with a query as suggested by Ajax unless you put all items in a single entity group which may limit your updates to 1 per second)
In pseudo code:
1. Create Key from hashing data
2. Check in-memory cache for key (use a concurrent set of keys, e.g. ConcurrentHashMap.newKeySet()), return if found
3. Check MemcacheService for key, return if found
4. Start transaction
5. Get entity from datastore, return if found
6. Create entity in datastore
7. Commit transaction, return if fails due to concurrent update
8. Put Key in cache (in-memory and memcache)
Step 7 will fail if another request (thread) has already written the same key at the same time.
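The steps above can be sketched as follows, using only in-memory stand-ins. This is an assumption for illustration: the two key sets play the roles of the local cache and memcache, and ConcurrentMap.putIfAbsent stands in for the transactional get-then-create. All names are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class LayeredCreateSketch {
    // Per-instance in-memory cache of keys known to exist.
    private final Set<String> localKeys = ConcurrentHashMap.newKeySet();
    // Stand-in for the shared memcache.
    private final Set<String> sharedCacheKeys = ConcurrentHashMap.newKeySet();
    // Stand-in for the datastore.
    private final ConcurrentMap<String, String> datastore = new ConcurrentHashMap<>();

    // Returns true only if this call created the entity.
    boolean createIfAbsent(String key, String value) {
        // Cheapest check first: the local cache.
        if (localKeys.contains(key)) {
            return false;
        }
        // Next: the shared cache; remember a hit locally.
        if (sharedCacheKeys.contains(key)) {
            localKeys.add(key);
            return false;
        }
        // Atomic get-or-create; a concurrent writer for the same key
        // makes this return the existing value instead of null.
        boolean created = datastore.putIfAbsent(key, value) == null;
        // Record the key in both caches. Unlike the pseudo code, this sketch
        // also caches keys that were found to exist already.
        localKeys.add(key);
        sharedCacheKeys.add(key);
        return created;
    }

    String get(String key) {
        return datastore.get(key);
    }
}
```

Caching only keys (not entities) keeps both caches small, and a stale miss is harmless because the final atomic step still refuses to overwrite an existing entity.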
What I suggest is that, instead of saving the ID as a string, you either use a Long ID for your entity or use the Key datatype, which is auto-generated by App Engine.
@PersistenceCapable
public class Test {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Long ID;
    // getter and setter
}
This will give you a unique value every time.
I have a large object that I store using objectify. I need a list of those objects with only subset of the properties populated. How can this be done?
App Engine stores and retrieves entities as encoded Protocol Buffers. There's no way for the underlying infrastructure to store, update, or retrieve only part of an entity, so there's no point having a library that does this; hence Objectify, like other libraries, doesn't. If you regularly need to access only part of an entity, split those fields into a separate entity.
It's not a good idea to split an entity in two in a noSql database: when you need to read a list of entries, you would be obliged to do n requests to get the second part of the list (n x m if your data is split in more entities). This is naturally due to the fact that there is no possible join in noSql databases.
What could be done is to "cache": duplicate the needed subset in another entity to get the most of performance. It has the disadvantage of being obliged to write twice on a persist of the main entity (if a field of the subset was changed).
What I usually do is write a /** OPTIMIZE xxxx */ comment on the class that needs to read a subset and get back to it when I need more performance.
I am looking for a way to save or update records according to the table's unique key (which is composed of several columns).
I want to achieve the same functionality used by INSERT ... ON DUPLICATE KEY UPDATE - meaning to blindly save a record, and have the DB/Hibernate insert a new one, or update the existing one if the unique key already exists.
I know I can use @SQLInsert( sql="INSERT INTO .. ON DUPLICATE KEY UPDATE"), but I was hoping not to write my own SQLs and let Hibernate do the job. (I am assuming it will do a better job - otherwise why use Hibernate?)
Hibernate may throw a ConstraintViolationException when you attempt to insert a row that breaks a constraint (including a unique constraint). If you don't get that exception, you may get some other general Hibernate exception - it depends on the version of Hibernate and the ability of Hibernate to map the MySQL exception to a Hibernate exception in the version and type of database you are using (I haven't tested it on everything).
You will only get the exception after calling flush(), so you should make sure this is also in your try-catch block.
I would be careful of implementing solutions where you check that the row exists first. If multiple sessions are updating the table concurrently you could get a race condition. Two processes read the row at nearly-the-same time to see if it exists; they both detect that it is not there, and then they both try to create a new row. One will fail depending on who wins the race.
A better solution is to attempt the insert first and if it fails, assume it was there already. However, once you have an exception you will have to roll back, so that will limit how you can use this approach.
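The insert-first approach can be sketched without Hibernate as follows. The table with its unique key is modelled by a ConcurrentMap, and DuplicateKeyException is a hypothetical stand-in for the constraint-violation exception; none of this is Hibernate API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class InsertFirstSketch {
    // Hypothetical stand-in for a constraint-violation exception.
    static final class DuplicateKeyException extends RuntimeException {
        DuplicateKeyException(String key) {
            super("Duplicate unique key: " + key);
        }
    }

    // Stand-in for a table with a unique key; putIfAbsent makes the
    // uniqueness check and the insert a single atomic step.
    private final ConcurrentMap<String, String> table = new ConcurrentHashMap<>();

    // Behaves like an INSERT against a unique constraint.
    void insert(String uniqueKey, String row) {
        if (table.putIfAbsent(uniqueKey, row) != null) {
            throw new DuplicateKeyException(uniqueKey);
        }
    }

    // Attempt the insert first; a violation means the row was already there.
    // Returns true if this call inserted the row.
    boolean insertIfAbsent(String uniqueKey, String row) {
        try {
            insert(uniqueKey, row);
            return true;
        } catch (DuplicateKeyException e) {
            return false;
        }
    }

    String find(String uniqueKey) {
        return table.get(uniqueKey);
    }
}
```

Because the check and the write happen in one step, two callers cannot both conclude that the row is missing. With Hibernate the catch branch would also have to roll back, as noted above.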
This doesn't really sound like a clean approach to me. It would be better to first see if an entity with given key(s) exists. If so, update it and save it, if not create a new one.
EDIT
Or maybe consider if merge() is what you're looking for:
if there is a persistent instance with the same identifier currently associated with the session, copy the state of the given object onto the persistent instance
if there is no persistent instance currently associated with the session, try to load it from the database, or create a new persistent instance
the persistent instance is returned
the given instance does not become associated with the session, it remains detached
Source: http://docs.jboss.org/hibernate/core/3.3/reference/en/html/objectstate.html
You could use saveOrUpdate() from Session class.