Previously I asked a question about building a document management system on top of GAE using Google Cloud Storage (Document management system using Google Cloud Storage), and I think I got appropriate answers for it. This question is just an extension of the same. My question is: can I handle versioning through my Java code as described in this link (developers.google.com/storage/docs/object-versioning), e.g. listing all versions of an object, retrieving a specific version of an object, etc.?
I found list APIs for listing and deleting objects and performing several other operations on Google Cloud Storage, but can I also handle versioning through any of the Java APIs it provides?
Thanks in advance.
As the Google Cloud Storage documentation states (https://developers.google.com/storage/docs/developer-guide), stored objects are immutable.
That is, once an object is stored you can only delete it and store a new one, even under the same name.
So to have versioning you can organize the data in pseudo-folders, like bucket/file-name/version-1, bucket/file-name/version-2, etc.
Then you need to add some business logic to handle these versions (access the most recent one when needed, delete outdated ones, etc.). In a document management system, however, it's good to think about transactions, conflicts, etc. So you will probably want to manage versions in a DB (on GAE?) and just store the version contents in the cloud as files (e.g. named by file content hashes).
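For illustration, here is a minimal sketch of listing the versions of one file under this scheme with the google-cloud-storage Java client; the bucket name, object name, and version-naming convention are assumptions for the example.

import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class ListFileVersions {
    public static void main(String[] args) {
        Storage storage = StorageOptions.getDefaultInstance().getService();
        // Assumed layout: my-bucket/file-name/version-N holds version N of "file-name".
        for (Blob blob : storage.list("my-bucket",
                Storage.BlobListOption.prefix("file-name/")).iterateAll()) {
            System.out.println(blob.getName()); // e.g. file-name/version-1
        }
    }
}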
I want to port a social networking application from SQL to JanusGraph. I'll be building the backend in Java because JanusGraph has excellent documentation on its official website. I have some beginner questions.
JanusGraph graph = JanusGraphFactory.open("my_setup.properties");
Is the .properties file the only identifier to access a graph, or is it the file path? (In SQL we have a name for a database. Is there anything like a graph name?)
If I make a copy of the properties file with the same preferences and rename it to my_setup_2.properties, will it access the same graph or will it create a new graph?
Is there any way to identify, from my storage backend or search backend, that these vertices belong to this graph?
For what kind of queries is the storage backend used, and for what kind of queries is the search backend used?
Is there any way to dump my database (for porting the graph from one server to another, just like an SQL dump)?
I have only found hosting service providers for JanusGraph 0.1.1, which is outdated (the latest one is 0.2.1, which supports the latest Elasticsearch). If I go to production with JanusGraph 0.1.1, how badly will it affect me if I use Elasticsearch for the search backend?
Is the .properties file the only identifier to access a graph, or is it the file path? (In SQL we have a name for a database. Is there anything like a graph name?)
JanusGraph has pluggable storage and index backends. The .properties file just tells JanusGraph which backends to use and how they are configured. Different graph instances simply point to different storage folders, indexes, etc. Looking at the documentation for the config file, it seems you can specify a graphname, which can be used with the ConfiguredGraphFactory to open a graph in this fashion: ConfiguredGraphFactory.open("graphName")
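For illustration, a minimal sketch of what such a .properties file might contain; the backend choices and hostnames are assumptions for the example, and JanusGraphFactory.open("my_setup.properties") then connects to whatever these settings point at.

# my_setup.properties
# Storage backend, e.g. Cassandra via CQL (hostname is a placeholder)
storage.backend=cql
storage.hostname=127.0.0.1
# Optional mixed-index backend used for full-text queries
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1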
If I make a copy of the properties file with the same preferences and rename it to my_setup_2.properties, will it access the same graph or will it create a new graph?
Yes, it will access the same data and hence the same graph.
Is there any way to identify, from my storage backend or search backend, that these vertices belong to this graph?
I don't know exactly for every storage backend, but in the case of Elasticsearch, indexes created by JanusGraph are prefixed with janusgraph. I think there are similar mechanisms for the other backends.
For what kind of queries is the storage backend used, and for what kind of queries is the search backend used?
The index backend is used whenever you add a has step on a property indexed with a mixed index. I think all other queries, including a has step on a property covered by a composite index, use the storage backend. For OLAP workloads you can even plug Spark or Giraph into your storage backend to do the heavy lifting.
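To make that concrete, a small Gremlin sketch; the property names and index setup are assumptions, and textContains is JanusGraph's full-text predicate (org.janusgraph.core.attribute.Text.textContains).

// g = graph.traversal();
// Assuming "name" is covered by a composite index and "bio" by a mixed (Elasticsearch) index:
g.V().has("name", "alice").toList();                   // exact match, answered by the storage backend
g.V().has("bio", textContains("networking")).toList(); // full-text predicate, answered by the index backend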
Is there any way to dump my database (for porting the graph from one server to another, just like an SQL dump)?
Graphs can be exported to and imported from graph file formats like GraphML, which also lets you interface with other graph tools, such as Gephi. You won't be able to take an SQL dump of your SQL database and import it directly into JanusGraph, though. If you are considering loading a lot of nodes and edges at once, please go through the documentation about bulk loading.
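For example, a minimal sketch of a GraphML export through the TinkerPop API (the file name is a placeholder, and the exact API may vary between TinkerPop versions):

import org.apache.tinkerpop.gremlin.structure.io.IoCore;

// Writes the whole graph to a GraphML file that tools like Gephi can read back in.
graph.io(IoCore.graphml()).writeGraph("my-graph.graphml");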
I have only found hosting service providers for JanusGraph 0.1.1, which is outdated (the latest one is 0.2.1, which supports the latest Elasticsearch). If I go to production with JanusGraph 0.1.1, how badly will it affect me if I use Elasticsearch for the search backend?
I don't know of any hosting providers for JanusGraph 0.2.x, but you will easily find hosted services for the pluggable storage backends compatible with JanusGraph 0.2.x.
I am currently attempting to create an automated data export from an existing Google Datastore "Kind", writing the output to a JSON file.
I'm having trouble finding a suitable example that allows me to simply pull specific entities from the Datastore and write them out to an output file.
All the examples and documentation I've found assume I am creating an App Engine project to interface with the Datastore. The program I need to create would have to sit locally on a server and query the Datastore to pull down the data.
Is my approach possible? Any advice on how to achieve this would be appreciated.
Yes, it is possible.
But you'll have to use one of the generic Cloud Datastore client libraries, not the GAE-specific one(s). You'll still need a GAE app, but you don't have to run your code in it; see Dependency on App Engine application.
You could also use one of the REST/RPC/GQL APIs, see APIs & Reference.
Some of the referenced docs contain examples as well.
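For example, here is a minimal sketch using the generic google-cloud-datastore Java client; the kind name MyKind, the output file name, and the naive JSON rendering are assumptions for the example.

import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.QueryResults;

import java.io.FileWriter;
import java.io.IOException;

public class DatastoreExport {
    public static void main(String[] args) throws IOException {
        // Authenticates via GOOGLE_APPLICATION_CREDENTIALS, so it can run outside App Engine.
        Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

        Query<Entity> query = Query.newEntityQueryBuilder()
                .setKind("MyKind") // placeholder kind name
                .build();
        QueryResults<Entity> results = datastore.run(query);

        try (FileWriter out = new FileWriter("export.json")) {
            out.write("[");
            boolean first = true;
            while (results.hasNext()) {
                Entity entity = results.next();
                if (!first) out.write(",");
                first = false;
                StringBuilder json = new StringBuilder("{");
                boolean firstProp = true;
                for (String name : entity.getNames()) {
                    if (!firstProp) json.append(",");
                    firstProp = false;
                    // Naive rendering: real code should escape quotes and handle value types properly.
                    json.append("\"").append(name).append("\":\"")
                        .append(entity.getValue(name).get()).append("\"");
                }
                json.append("}");
                out.write(json.toString());
            }
            out.write("]");
        }
    }
}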
Honestly, I do not enjoy working with SQLite on Android beyond trivial apps. It is a real pain to keep the database structure up to date between app versions, and actually writing data access code is not much fun either when one is used to working with Hibernate and Entity Framework.
I am hoping there are alternative ways for me to store persistent data that will be reliable and robust. E.g. would serializing a collection of objects to external storage be an option? I expect my data to be around 5MB at most at any time.
Are there any other options? Specifically, I am downloading e.g. stock lists and contact details from a server, then allowing the user to mark records as processed, etc. I was thinking of an XML file, but that creates another problem: how to robustly handle XML in Java using the Android API.
Obviously first prize would have been a NoSQL database, but I know that's not going to be practical even if a stable mobile version existed.
Have you looked at an Android SQLite ORM framework which gives you DAOs and generates the database from POJOs for you (like Hibernate does)?
For example: http://greenrobot.org/greendao/
Then you can easily update and version your database structure.
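As a rough sketch of what that looks like with greenDAO 3, where annotation processing generates the DAO for you; the entity and property names here are just examples.

import org.greenrobot.greendao.annotation.Entity;
import org.greenrobot.greendao.annotation.Id;

@Entity
public class Contact {
    @Id
    private Long id;
    private String name;
    private boolean processed; // e.g. the user marked this record as processed

    // greenDAO's Gradle plugin generates constructors, getters/setters,
    // and a ContactDao with insert/update/query methods at build time.
}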
I'm fairly new to the whole web programming business and have the following problem:
I have two webapps, one an Axis web service and the other a Spring application. Both should get a set of data from a library which holds the data in memory. This data is large, so copying it for each app is not an option.
What I have done so far is develop the library, which loads and holds the data in a static container. The plan was that both apps instantiate the class containing the container and can then access the data.
Sadly, this doesn't work: I get an exception saying that the objects I want to use are in different classloaders.
My question is: how can I provide such a container for both webapps in Tomcat 7?
BTW: a database is not an option because it's too slow.
Edit: I should have been clearer about the data. The data is a topic map stored in a topic map engine (see http://www.isotopicmaps.org). The engine is used to access the data and is therefore the access point to it. We have our own engine, which holds the data in memory and is faster than a database backend.
I want to have a servlet which provides the configuration and loading of topic maps, and then the two servlets above should be able to read and modify a topic map. That's why I need a sort of shared access point to the engine.
This is what distributed caches, key-value stores, document stores, and NoSQL databases are built for. There are many options, with new ones appearing every day. The free and open-source options are likely to meet your needs and provide you with as much support as you will need. The one that is currently my favorite is Membase.
So you want a distributed in-memory cache for a server cluster. You can use, among others, Terracotta for this. You can find a nice introduction to Terracotta here.
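As an illustration, a minimal sketch with Ehcache 2, which can be clustered across JVMs via Terracotta; the cache name and key are placeholders, the cache itself must be declared in ehcache.xml, and serializedTopicMap stands in for whatever serializable form your engine exposes.

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

// Each webapp creates its own CacheManager from the shared ehcache.xml;
// the Terracotta configuration in that file makes the cache contents visible to both JVMs.
CacheManager manager = CacheManager.create();
Cache topicMaps = manager.getCache("topicMaps");

topicMaps.put(new Element("opera-topic-map", serializedTopicMap)); // store from one webapp
Element hit = topicMaps.get("opera-topic-map");                    // read from the other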
Update: I actually disagree with the argument that a database is "too slow". If it's slow, then the data model and/or the data access code is simply badly designed.
I have a little application written in PHP+MySQL that I want to port to App Engine, but I just can't find a way to port my MySQL data to the datastore.
How am I supposed to save the data into my datastore? Is that even possible? I can only see documentation for persistence of Java objects. Does that mean I have to port my database to a bunch of fake objects, one per row?
Edit: I say fake objects because I don't want to use them; they're just a way to get around a shortcoming of the GAE design.
I have a 30 MB table I need to check on every GET. By using objects I would need to create an object for every row, so I'd have a Java class of maybe 45 MB with thousands upon thousands of lines like:
Row row23423 = new Row(123, 346, 75, 34, "a cow");
I just can't believe this is the only way.
Here's an idea: what about populating the datastore by POSTing the objects one by one, like the posts in a blog? You write a class that generates and persists the data, and then you curl the URL with the data, one item at a time. Slow, but it might work?
How to upload data with the bulk loader is described here. It's not supported directly in Java yet, but that doesn't have to stop you; just do the following:
Create an app.yaml that looks something like this:
application: myapp
version: upload
runtime: python
api_version: 1
handlers:
- url: /remote_api
script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
login: admin
Make sure the application name is the same as your Java app's, and the version is not the same as the version you're using for Java. Upload this 'empty' app using appcfg.py.
Now, follow the directions for bulk loading in the page linked to above. When it comes time to run the tool, specify the server address with --server=upload.latest.myapp.appspot.com .
Since multiple versions of the same app share the same datastore - even across runtimes - the data uploaded with the Python version will be accessible to the Java one.
There is documentation on the datastore here.
I can't see anything about a raw data-porting service, but if you can extract the data from your MySQL database into text files, it should be relatively easy to write a script to import it into the App Engine datastore using the persistence frameworks it provides.
Your script would take your raw data, convert it into a (Java) object model, and import those Java objects into the store.
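For instance, a minimal sketch using the GAE JDO interface of that era; the Row class, its fields, the CSV layout, and the PMF helper (the singleton PersistenceManagerFactory wrapper from the GAE docs) are assumptions for the example.

import javax.jdo.PersistenceManager;
import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

@PersistenceCapable
public class Row {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Long id;

    @Persistent private int quantity;
    @Persistent private int price;
    @Persistent private String label;

    public Row(int quantity, int price, String label) {
        this.quantity = quantity;
        this.price = price;
        this.label = label;
    }
}

// Import loop: one persistent object per CSV line.
// for (String line : csvLines) {
//     String[] f = line.split(",");
//     PersistenceManager pm = PMF.get().getPersistenceManager();
//     try {
//         pm.makePersistent(new Row(Integer.parseInt(f[0]), Integer.parseInt(f[1]), f[2]));
//     } finally {
//         pm.close();
//     }
// }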
Migrating an application to Google's App Engine would, I think, be quite some task. As you have seen, App Engine does not have a relational database; instead it uses BigTable. The migration will likely involve exporting your data to Java objects (serialized in some way) and then inserting them.
You say "fake" objects in your post, but as you will have to use Java objects anyway, I don't think they would be fake, unless you plan on using one set of objects for the migration and a new set for the application.
There is no (good) general answer to the question of how to port a relational application to the GAE datastore, because the notion of "data" is incompatible between the two. Relational databases are all about the schema. GAE doesn't even have one. It's a schemaless persistent object datastore with very specific APIs. The environment is great for certain types of apps if you're developing from scratch, but it's pretty tricky to port to.
That said, you can import CSV files, as Nick explains, which you should be able to export from MySQL fairly easily. GAE supports Java and Python "at the same time" using the versions mechanism, so you can set up your datastore in Python and then run your Java application against it. (A Java version of the bulk loader is under development.)