Java App Engine DatastoreService or PersistenceManager? - java

Working on an app built on GAE in Java. I'm having trouble figuring out whether I want to use DatastoreService and the Entity class, or make classes for my objects and use the PersistenceManager.
Can someone explain the differences between these two services?

If you don't have a fixed reason to use either of those, take a look at Objectify. It's a much better and more usable library for storage.
The Datastore services from the SDK are low-level services that let you talk directly to Bigtable over RPC. You'll be writing code that translates into Bigtable's data formats and API, and calling Bigtable's RPC methods.
The JDO specification and its persistence manager are a Java standard that deals with creating data classes and their storage and retrieval. It's one level of abstraction higher than the direct datastore services. Google has implemented the JDO spec to run on GAE, so you can use it if you want to.
Objectify is an alternative to JDO that isn't an enterprise Java standard, but is much easier and more fun to use. It follows the Python API quite closely, which makes it nice and concise.
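The difference in abstraction level can be sketched in plain Java. Note these are simplified stand-in classes, not the actual SDK types (real code would use com.google.appengine.api.datastore.Entity and a JDO/Objectify-annotated POJO):

```java
import java.util.HashMap;
import java.util.Map;

public class AbstractionSketch {
    // Low-level style: a schemaless property bag, roughly like the SDK's Entity class.
    // You manage property names and types by hand.
    static class Entity {
        final String kind;
        final Map<String, Object> properties = new HashMap<>();
        Entity(String kind) { this.kind = kind; }
        void setProperty(String name, Object value) { properties.put(name, value); }
        Object getProperty(String name) { return properties.get(name); }
    }

    // JDO/Objectify style: a plain data class; the mapping layer derives
    // the stored properties from the fields for you.
    static class Greeting {
        String author;
        String content;
        Greeting(String author, String content) { this.author = author; this.content = content; }
    }

    public static void main(String[] args) {
        Entity e = new Entity("Greeting");
        e.setProperty("author", "alice");
        e.setProperty("content", "hello");

        Greeting g = new Greeting("alice", "hello");

        System.out.println(e.getProperty("author"));
        System.out.println(g.author);
    }
}
```

Both end up storing the same data; the question is whether you want to write the translation code yourself or let a mapping layer do it.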

My vote goes to DatastoreService, because PersistenceManager (JDO) consumes more resources (= more money) and is slower.
http://gaejava.appspot.com/ — here you can compare them. Try running this test a couple of times.

Related

How can I integrate both Java and Python in a Visual Studio project?

I'm developing software that involves database creation, manipulation, and extension in a proprietary format, which is partly inspired by SQL but more extensive.
Part of my software is coded in Python, most of it in Java, and just a bit of front-end in Visual Basic. How can I integrate the back-end code written in Python and Java with VB (2012) and call functions among them, in a single solution?
I would approach it by picking one of the three languages as your database back-end, then developing an API for interacting between them over HTTP or some other protocol.
For instance, you could use Python Django's great ORM to develop your database models, and then use the Django Tastypie library to expose API resources for your database models at endpoints like http://localhost/api/v2/foo/ (a list of foo objects) and http://localhost/api/v2/foo/24/ (a detailed foo object with foo.id = 24).
Then, write a class in your VB.NET App_Code folder called localhostRESTfulApi which consumes your django-tastypie API. This can be as simple or as complex as you want.
The simple case would involve invoking API calls verbosely, using something like localhostRESTfulApi.POST([endpoint URL], [new object data]). You have to know how the POST/GET/PATCH requests work in all circumstances, and it's not very OOP-y.
The complex solution would be to write wrappers for the objects in your Django database back-end as classes in VB (and again in Java) which have a constructor that either takes null for a new object or an id for an existing object, which it would then retrieve via the API upon instantiation (e.g. by consuming the localhostRESTfulApi.GET method). This way you save yourself some clunkiness by only having to write the API GET/POST logic in the private methods of the object class, instead of having it littered throughout your application.
You would also write a save() method which does the appropriate API PATCH/PUT request behind the scenes to sync with the database back-end -- or perhaps override your get/set attribute methods on the class to retrieve and save changes on-the-fly (without invoking an asynchronous save() method).
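The wrapper pattern described above can be sketched in Java (the answer suggests writing the wrappers in Java as well as VB). The Foo class, the /api/v2/foo/ endpoints, and the Transport interface are all hypothetical; the HTTP layer is an interface so the GET/PATCH logic lives in one place and can be faked here instead of hitting a real server:

```java
import java.util.HashMap;
import java.util.Map;

public class ApiWrapperSketch {
    // Hypothetical transport abstraction; a real one would issue HTTP requests.
    interface Transport {
        String get(String url);
        void patch(String url, String body);
    }

    static class Foo {
        private final Transport api;
        private final int id;
        private String name;

        // Constructor retrieves the object from the API upon instantiation.
        Foo(Transport api, int id) {
            this.api = api;
            this.id = id;
            // Real code would parse JSON; the fake transport returns the value directly.
            this.name = api.get("http://localhost/api/v2/foo/" + id + "/");
        }

        String getName() { return name; }
        void setName(String name) { this.name = name; }

        // save() syncs local changes back to the server via PATCH.
        void save() {
            api.patch("http://localhost/api/v2/foo/" + id + "/", name);
        }
    }

    public static void main(String[] args) {
        // A fake back-end standing in for the django-tastypie API.
        Map<String, String> store = new HashMap<>();
        store.put("http://localhost/api/v2/foo/24/", "widget");
        Transport fake = new Transport() {
            public String get(String url) { return store.get(url); }
            public void patch(String url, String body) { store.put(url, body); }
        };

        Foo foo = new Foo(fake, 24);   // fetched on instantiation
        foo.setName("gadget");
        foo.save();                    // PATCHed back behind the scenes
        System.out.println(store.get("http://localhost/api/v2/foo/24/"));
    }
}
```

All request logic sits inside Foo, so application code never touches URLs or HTTP verbs directly.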
I'm curious to see what solutions other SO users come up with. This is merely one way I have done it previously on a multiple language/platform application (for which I also needed an external RESTful API anyway).

Understand Twitter API

I'm working on a Java program that (at least is trying to) utilize the Twitter API; however, it is my first project using any type of API and I am a little confused. What is the benefit of using a Java library for the Twitter API, such as Twitter4J, and how would one go about not using one? I'm a little fuzzy on the topic of APIs in general, and I'm not finding anything in my searches that really makes it clear how to use one. Do I need to use a Java library, or can I do it without one? What are the pros and cons of using one vs. not using one? I am relatively new to this and am having some issues. Any help?
First, what an API is:

An application programming interface (API) is a particular set of rules ('code') and specifications that software programs can follow to communicate with each other. It serves as an interface between different software programs and facilitates their interaction, similar to the way the user interface facilitates interaction between humans and computers. An API can be created for applications, libraries, operating systems, etc., as a way of defining their "vocabularies" and resource-request conventions (e.g. function-calling conventions). It may include specifications for routines, data structures, object classes, and protocols used to communicate between the consumer program and the implementer program of the API.
Using the Twitter4J library would allow you to easily call commands that perform complex operations, such as getting tweets as they come in. For projects like this, using a library is the best way to go, since you're also going to be required to get an access key, which grants you permission to use the API.
Examples using Twitter4J: http://twitter4j.org/en/code-examples.html
You need to distinguish between an "API" and a "Library"
You NEED the Twitter API: it's the thing that connects Twitter to your code. You can use it to send a "post this to my account" command, for instance.
You CAN use a library: it helps your code talk to the API by doing some of the work for you. You can call a function with only a string as a parameter, and this function calls the aforementioned post-to-Twitter API.
You can of course say that the library itself has an API, but that would be confusing the situation a bit.
In the end, it is quite nice to use the library because it lets you write code in your own language.
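To make the distinction concrete, here is a rough Java sketch: without a library you assemble the HTTP request to the web API yourself; with a library like Twitter4J, one method call does it for you. The endpoint shown is illustrative and no request is actually sent (a real call would also need OAuth signing):

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ApiVsLibrary {
    // Without a library: build the request to the web API by hand.
    static URI buildStatusUpdateRequest(String status) throws Exception {
        String encoded = URLEncoder.encode(status, StandardCharsets.UTF_8.name());
        // Illustrative endpoint; real calls also require OAuth headers.
        return new URI("https://api.twitter.com/1.1/statuses/update.json?status=" + encoded);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildStatusUpdateRequest("hello world"));

        // With Twitter4J, the equivalent is roughly:
        //   Twitter twitter = TwitterFactory.getSingleton();
        //   twitter.updateStatus("hello world");
        // The library handles encoding, signing, and response parsing for you.
    }
}
```

The library doesn't replace the API; it just wraps the request-building drudgery in plain Java method calls.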

Is it recommended to use GAE types?

I see that GAE provides several types like Email or PostalAddress for its entities. I've read they don't provide any validation. So I wonder: what's the benefit of using them instead of storing the data on a simple String field? Any reason I should use them?
EDIT: answered at this question
I hope Google answers this. Some guesses:
The Python docs talk about how these property types are represented in gdata. Can you instantiate an Email property from a gdata feed and then set it on a datastore entity? Could your app engine app (some day) provide a gdata-style web service?
Maybe Google has heuristics about the storage requirements of these types that help them optimize storage in the datastore.
Maybe they'll add functionality to these types in a later release.
Maybe they like typing for typing's sake.
Currently, the only purpose for them seems to be that they seamlessly convert to the gdata Atom feeds which Google uses. They are probably fields implemented in the framework on which Google Apps is built, which were then included in the App Engine data model. That way, if you want to make an app that interacts with Google Apps, it is that much easier.
I'm also certain that the values are "normalized" properly to be indexed. The value of the LinkProperty is most likely changed from www.stackoverflow.com to com.stackoverflow.www.
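What such a type amounts to can be sketched in plain Java: a thin wrapper that tags a string with a meaning but, as the question notes, performs no validation. This is a hypothetical stand-in, not the actual SDK implementation:

```java
public class TypedWrapperSketch {
    // Minimal stand-in for a datastore type like Email: it carries intent
    // ("this string is an email address") but validates nothing.
    static final class Email {
        private final String address;
        Email(String address) { this.address = address; }
        String getEmail() { return address; }
    }

    public static void main(String[] args) {
        // Both are accepted without complaint, valid or not:
        Email good = new Email("user@example.com");
        Email bad = new Email("not-an-email");
        System.out.println(good.getEmail());
        System.out.println(bad.getEmail());
    }
}
```

So the win over a plain String field is documentation of intent (and whatever storage/indexing treatment the datastore gives the type), not correctness checking.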

BigTable vs noSQL

May I know whether 'NoSQL' databases have limitations like BigTable's, where we should 'denormalize' our tables/entities?
Is there any API wrapper that allows us to write code once and use it with both Google App Engine's BigTable and other NoSQL stores? (Something like Hibernate.)
Yes; for example, in MongoDB you don't have joins, since it is non-relational, so it does change how we store and browse the data.
As MongoDB is non-relational (no joins), references ("foreign keys") between documents are generally resolved client-side by additional queries to the server. Two conventions are common for references in MongoDB: first, simple manual references, and second, the DBRef standard, which many drivers support explicitly.
It seems that the consensus is to denormalize and duplicate to speed up reads and avoid the cost of joining distributed data together, with the join-and-merge logic done at the application level.
As to whether it is an absolute requirement to denormalize the database, I am not sure (other SO members can probably enlighten us). But I think the database should be modeled with these "limitations" in mind, along with a good study of how the data is going to be queried. This should give the least impedance to the process.
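The client-side reference resolution described in the quote can be sketched in Java, with plain maps standing in for MongoDB collections (a simplification, not actual driver code):

```java
import java.util.HashMap;
import java.util.Map;

public class ManualReferenceSketch {
    public static void main(String[] args) {
        // Two "collections": posts reference authors by id instead of joining.
        Map<String, Map<String, String>> authors = new HashMap<>();
        Map<String, Map<String, String>> posts = new HashMap<>();

        Map<String, String> alice = new HashMap<>();
        alice.put("name", "Alice");
        authors.put("a1", alice);

        Map<String, String> post = new HashMap<>();
        post.put("title", "Hello");
        post.put("authorId", "a1");   // manual reference, the "foreign key"
        posts.put("p1", post);

        // No join: the reference is resolved with a second query, client-side.
        Map<String, String> p = posts.get("p1");
        Map<String, String> author = authors.get(p.get("authorId"));
        System.out.println(p.get("title") + " by " + author.get("name"));
    }
}
```

Denormalizing would instead copy the author's name into each post document, trading duplicated data for a single read.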
See Also:
Bigtable database design theory
GAE - How to live with no joins?
Any API wrapper that allow we to write code once and can be used for google app engine BigTable and nosql ? (something like Hibernate)
JDO is datastore-agnostic, so it might just provide what you want to some extent.
Seems there are lots of recent projects to use JDO and JPA with "NoSQL" products.
See:
Datanucleus-Cassandra
Datanucleus-Cassandra-Plugin
Any API wrapper that allow we to write code once and can be used for google app engine BigTable and nosql ? (something like Hibernate)
While abstraction libraries definitely help portability, you have to take into consideration the particular platform you're running on. If you're going to go with Google App Engine, you have to be aware of the incurred startup costs inherent with additional abstraction libraries.
You should weigh the pros and cons of using something like JDO or JPA. Also take a look at the Objectify library, which offers a more native interface but has the downside of being coupled to the App Engine Datastore.

Should I invest in GraniteDS for Flex + Java development?

I'm new to Flex development, and RIAs in general. I've got a CRUD-style Java + Spring + Hibernate service on top of which I'm writing a Flex UI. Currently I'm using BlazeDS. This is an internal application running on a local network.
It's become apparent to me that RIAs work more like a desktop application than a web application, in that we load up the entire model (or at least the portion we're interested in) and work with it directly on the client. This doesn't really jibe well with BlazeDS, because it only supports remoting and not data management; thus it can become a lot of extra work to keep clients in sync and to avoid reloading the model, which can be large (especially since lazy loading is not possible).
So it feels like I'm left with a situation where I have to treat my Flex application more like a regular old web application, doing a lot of fine-grained loading of data.
LiveCycle is too expensive. The free version of WebOrb for Java really only does remoting.
Enter GraniteDS. As far as I can determine, it's the only free solution out there that has many of the data management features of LiveCycle. I've started to go through its documentation a bit and suddenly feel like it's yet another quagmire of framework that I'll have to learn just to get an application running.
So my question(s) to the StackOverflow audience:

1) Do you recommend GraniteDS, especially if my current Java stack is Spring + Hibernate?

2) At what point do you feel it starts to pay off? That is, at what level of application complexity do you feel that using GraniteDS really starts to make development that much better? In what ways?
If you're committed to Spring and don't want to introduce Seam then I don't think that Granite DS will give you much beyond Blaze DS. There is a useful utility that ensures only a single instance of any one entity exists in the client at any one time but it's actually pretty easy to do that with a few instances of Dictionary with weak references and some post-processing applied to the server calls. A lot of the other features are Seam-specific as alluded to here in the docs:
http://www.graniteds.org/confluence/display/DOC/6.+Tide+Data+Framework
Generally, the Tide approach is to minimize the amount of code needed to make things work between the client and the server. Its principles are very similar to the ones of JBoss Seam, which is the main reason why the first integration of Tide has been done with this framework. Integrations with Spring and EJB 3 are also available but are a little more limited.
I do however think that Granite's approach to data management is a big improvement over Livecycle's because they are indeed quite different. From the Granite docs:
All client/server interactions are done exclusively by method calls on services exposed by the server, and thus respect transaction boundaries and security defined by the remote services.
This is different from how LiveCycle DS uses "managed collections", where you invoke fill() to grab large swathes of data and then invoke commit() methods to persist changes en masse. That treats the back end like a raw data-access API and starts to get complicated (or simply falls apart entirely) when you have fine-grained security requirements. Therefore I think Granite's approach is far more workable.
All data management features (serialization of JPA detached entities, client entity caching, data paging...) work with Spring.
GraniteDS does not mandate anything, you only need Seam if you want to use Seam on the server.
Actually, the free version of WebORB for Java does do data management. I've recently posted a comparison between WebORB for Java, LiveCycle DS, BlazeDS and GraniteDS. You can view this comparison chart here: http://bit.ly/d7RVnJ I'd be interested in your comments and feedback as we want this to be the most comprehensive feature comparison on the web.
Cheers,
Kathleen
Have you looked at the spring-blazeDS integration project?
GraniteDS with Seam Framework, Hibernate and MySql is a very nice combination. What I do is create the database, use seamgen to generate hibernate entities then work from there.
