Couchbase Cluster and Bucket management

Couchbase Cluster and Bucket management - java

I am developing a server-side app using Java and couchbase. I am trying to understand the pros and cons of handling the cluster and bucket management from the java code over using the couchbase admin web console.
For instance, should I handle create/ remove buckets, indexing, and update buckets in my java code?
The reason I want to handle as many as couchbase administration functions is my app is expected to run on-prem not a cloud services. I want to avoid that our customers need to learn how to administrate couchbase.

The main reason to use the management APIs programmatically, rather than using the admin console, is exactly as you say: when you need to handle initializing and maintaining yourself, especially if the application needs to be deployed elsewhere. Generally speaking, you'll want to have some sort of database initializer or manager module in your code, which handles bootstrapping the correct buckets and indexes if they don't exist.
If all you need to do is handle preparing the DB environment one time for your application, you can also use the command line utilities that come with Couchbase, or send calls to the REST API. A small deployment script would probably be easier than writing code to do the same thing.

Related

Microservices - Stubbing/Mocking

I am developing a product using microservices and am running into a bit of an issue. In order to do any work, I need to have all 9 services running on my local development environment. I am using Cloud Foundry to run the applications, but when running locally I am just running the Spring Boot Jars themselves. Is there anyway to setup a more lightweight environment so that I don't need everything running? Ideally, I would only like to have the service I am currently working on to have to be real.

I believe this is a matter of your testing strategy. If you have a lot of micro-services in your system, it is not wise to always perform end-to-end testing at development time -- it costs you productivity and the set up is usually complex (like what you observed).
You should really think about what is the thing you wanna test. Within one service, it is usually good to decouple core logic and the integration points with other services. Ideally, you should be able to write simple unit tests for your core logic. If you wanna test integration points with other services, use mock library (a quick google search shows this to be promising http://spring.io/blog/2007/01/15/unit-testing-with-stubs-and-mocks/)
If you don't have already, I would highly recommend to set up a separate staging area with all micro-services running. You should perform all your end-to-end testing there, before deploying to production.
This post from Martin Fowler has a more comprehensive take on micro-service testing stratey:
https://martinfowler.com/articles/microservice-testing

It boils down to a test technique that you use. Here my recent answer in another topic that you could find useful https://stackoverflow.com/a/44486519/2328781.
In general, I think that Wiremock is a good choice because of the following reasons:
It has out-of-the-box support by Spring Boot
It has out-of-the-box support by Spring Cloud Contract, which gives a possibility to use a very powerful technique called Consumer Driven Contracts.
It has a recording feature. Setup your Wiremock as a proxy and make requests through it. This will generate stubs for you automatically based on your requests and responses.

There are multiple tools out there that let you create mocked versions of your microservices.
When I encountered this exact problem myself I decided to create my own tool which is tailored for microservice testing. The goal is to never have to run all microservices at once, only the one that you are working on.
You can read more about the tool and how to use it to mock microservices here: https://mocki.io/mock-api-microservices. If you only want to run them locally, it is possible using the open source CLI tool

It can be solved if your microservices allow passing metadata along with requests.
Good microservice architecture should use central service discovery, also every service should be able to take metadata map along with request payload. Known fields of this map can be somehow interpreted and modified by the service then passed to next service.
Most popular usage of per-request metadata is request tracing (i.e. collecting tree of nodes used to process this request and timings for every node) but it also can be used to tell entire system which nodes to use
Thus plan is
register your local node in dev environment service discovery
send request to entry node of your system along with metadata telling everyone to use your local service instance instead of default one
metadata will propagate and your local node will be called by dev environment, then local node will pass processed results back to dev env
Alternatively:
use code generation for inter-service communication to reduce risk of failing because of mistakes in RPC code
resort to integration tests, mocking all client apis for microservice under development
fully automate deployment of your system to your local machine. You will possibly need to run nodes with reduced memory (which is generally OK as memory is commonly consumed only under load) or buy more RAM.

An approach would be to use / deploy an app which maps paths / urls to json response files. I personally haven't used it but I believe http://wiremock.org/ might help you

For java microservices, you should try Stybby4j. This will mock the json responses of other microservices using Stubby server. If you feel that mocking is not enough to map all the features of your microservices, you should setup a local docker environment to deploy the dependent microservices.

Backup and restore entities from Google App engine

I am looking to implement a way to transfer data from one application to another programmatically in Google app engine.
I know there is a way to achieve this with database_admin console but that process is very time inefficient.
I am currently implementing this with the use of Google Cloud Storage(GCS) but that involves querying data, saving it to GCS and then reading from GCS from different app and restoring it.
Please let me know if anyone knows a simpler way of transferring data between two applications programmatically.
Thanks!

Haven't tried this myself but it sounds like it should work: Use the data_store admin to backup your objects to GCS from one app, then use your other app to restore that file from GCS. This should be a good method if you only require a one time sync.
If you need to constantly replicate data from one app to another, introducing REST endpoints at one or both sides could help:
https://code.google.com/p/appengine-rest-server/ (this is in Python, I know, but just define a version of your app for the REST endpoint)
You just need to make sure your model definitions match on both sides (pretty much update the app at both sides with the same deployment code) and only have the side that needs to sync data track time of last sync and use the REST endpoints to pull in new data. Cron Jobs can do this.
Alternatively, create a PostPut callback on all of your models to make a POST call every time a model is written to your datastore to the proper REST endpoint on the other app.
You can batch update with one method, or keep a constantly updated version with the other method (at the expense of more calls).

Are you trying to move data between two App Engine applications or trying to export all your data from App Engine so you can move to a different hosting system? Your question doesn't have enough information to understand what you're attempting to do. Based on the vague requirements I would say typically this would be handled with a web service that you write in one application that exposes the data and the other application calls that service to consume the data. I'm not sure why Cloud Endpoints was down voted because that provides a nice way to expose your data as a JSON based web service with a minimum of coding fuss.
I'd recommend adding some more details into your question like exactly what are you trying to accomplish and maybe a mock data sample.

You could create a backup of your data using bulkloader and then restore it on another application.
Details here:
https://developers.google.com/appengine/docs/python/tools/uploadingdata?csw=1#Python_Downloading_and_uploading_all_data
Refer to this post if you are using Java:
Downloading Google App Engine Database

I don't know if this could be suitable for your concrete scenario, but Google Cloud Endpoints are definitely a simple way of transferring data programmatically from Google App Engine.
This is kind of the Google implementation of REST web services, so they allow you to share resources using URLs. This is still an experimental technology, but as long as I've worked with them, they work perfectly. Moreover they are very well integrated with GAE and the Google Plugin for Eclipse.
You can automatically generate an endpoint from a JDO persistent class and I think you can automatically generate the client libraries as well (although I didn't try that).

Two different Java applications sharing the same database

In my web application I have a part which needs to continuously crawl the Web, process those data and present it to a user. So I was wondering if it is a good approach to split it up into two separate applications where one would do the crawling, data processing and store the data in the database. And the other app would be a web application (mounted on some web server) which would present to a user the data from the database and allow him a certain interaction with the data.
The reason I think I need this split is because if I make certain changes to my web app (like adding new functionalities, change the interface etc.) I wouldn't like the crawling to be interrupted.
My application stack is Tapestry (web layer), Spring, Hibernate (over MySQL) and my own implementation of the crawler independent from the others.
Is it good for the integration to be done just by using the same database? This might cause an issue with accessing the database from the both applications at the same time. Or can the integration be done on the Hibernate level, so both applications could use the same Hibernate session? But can the app from one JVM instance access the object from another JVM instance?
I would be grateful for any suggestions regarding this matter.
UPDATE
The user (from web app's interface) would enter the URLs for crawler to parse. The crawler app would just read the tables with URLs the web app populates. And vice versa, the data processed by the crawler would just be presented on the user interface. So, I think I shouldn't concern about any kind locking, right?
Thanks,
Nikola

I would definitely keep them separated like you are planning. The web crawling is more a "batch" process than a request driven web application. The web crawling app will run in its own JVM and your web app will be running in a servlet/Java EE container.
How often will the crawler run or is it a continuously running process? You may want to consider the frequency based on your requirements.
Will the users from web app be updating the same tables that the crawler will post data to? In that case you will need to take precaution otherwise a potential deadlock may arise. If you want your web app to auto refresh data based on new inserts in the tables then you can create a message driven bean (using JMS) to asynchronously notify the web app from the crawler app. When a new data insert message arrives you can either do a form submit on your page or use ajax to update the data on the page itself.
The web app should use connection pooling and the batch app could use DBCP or C3P0. I am not sure you gain much benefit by trying to share the database sessions in this scenario.
This way you have the integration between the two apps while not slowing down each other waiting on other to process.
HTH!

You are right, splitting the application into two could be reasonable in your case.
Disadvantages of separating into two applications -
You can not cache in Hibernate or any other cached mutable objects that are modifiable from both applications in any one of them. Optimistic locking should work fine with two hibernate applications. I don't see any other problems.
Advantages you have already specified in your code.

How to share a library for data access in tomcat 7?

I'm fairly new to the whole web programming stuff and have the following problem:
I have 2 webapps, one a axis web service and another one is a spring application. Both should get a set of data from a library which contains the data in memory. This data is large so copying the data for each app is no option.
What I did so far is developing the library which loads and contains the data in a static container. The plan was, that both apps instatiate the class containing the container and may access the data.
Sadly, this doesn't work. I get an exception that the object I want to use are in different classloaders.
My question is: How can I provide such a container provider for both libraries in tomcat 7?
BTW: A database is no option, because its to slow.
Edit: I should have been clear about the data. The data is a Topic Map stored in an topic map engine. (see http://www.isotopicmaps.org ). The engine is used to access the data and therefore is the access point to the data. We have an own engine, which hold the data inmemory which is faster than a database backend.
I Want to have a servlet which provides the configuration and loading of topic maps and then the two servlets above should be able to read and modify a topic map. Thats why I need to have a sort of shared access point to the engine.

This is what distributed caches, key-value stores, document stores, and noSql databases are built for. There are many options and new ones each day. The free and open-source options are likely to meet your needs and provide you with as much support as you will needs. The one the is currently my favorite is membase.

So you want a distributed in-memory cache for a server cluster. You can use among others Terracotta for this. You can find here a nice introduction to Terracotta.
Update: I actually disagree the argument that a database is "too slow". If it's slow, then the datamodel and/or data access code is simply badly designed.

How do I put data into the datastore of Google's app engine?

I have a little application written in php+mysql I want to port to AppEngine, but I just can't find the way to port my mysql data to the datastore.
How am I supposed to save the data into my datastore? Is that even possible? I can only see documentation for persistence of Java objects, does that mean I have to port my database to a bunch of fake objects, one per line?
Edit: I say fake objects because I don't want to use them, they're just a way to get over a shortcoming of the GAE design.
I have a 30 megs table I need to check on every GET, by using objects I would need to create an object for every row, so I'd have a java class of maybe 45 megs with thousands upon thousands of lines like:
Row Row23423 = new Row (123,346,75,34,"a cow");
I just can't believe this is the only way.
Here's an idea, what about populating the data store by POST-ing the objects one by one? I mean, like the posts in a blog. You write a class that generates and persists the data, and then you Curl the url with the data, one by one. Slow, but it may work?

How to upload data with the bulk loader is described here. It's not supported directly in Java yet, but that doesn't have to stop you - just do the following:
Create an app.yaml that looks something like this:
application: myapp
version: upload
runtime: python
api_version: 1
handlers:
- url: /remote_api
script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
login: admin
Make sure the application name is the same as your Java app's, and the version is not the same as the version you're using for Java. Upload this 'empty' app using appcfg.py.
Now, follow the directions for bulk loading in the page linked to above. When it comes time to run the tool, specify the server address with --server=upload.latest.myapp.appspot.com .
Since multiple versions of the same app share the same datastore - even across runtimes - the data uploaded with the Python version will be accessible to the Java one.

There is documentation on the datastore here.
I can't see anything about a raw data-porting service but if you can extract the data from your MySQL database into text files, then it should be relatively easy to write a script to import it into the app engine's data store using the persistence frameworks provided by it.
Your script would take your raw data, convert into a (Java) object model and imprt those Java objects into the store.

Migrating an application to Googles App Engine I think would be quite some task. As you have seen the App Engine does not have a relational database instead it uses BigTable. This will likely involve exporting it to Java objects (serialized in some way) and the inserting them.
You say "fake" objects in your post but I as you will have to use Java objects anyway I don't think they would be fake unless you plan on using one set of objects for the migration and a new set for the application.

There is no (good) general answer to the question of how to port a relational application to the GAE datastore, because the notion of "data" is incompatible between the two. Relational databases are all about the schema. GAE doesn't even have one. It's a schemaless persistent object datastore with very specific APIs. The environment is great for certain types of apps if you're developing from scratch, but it's pretty tricky to port to.
That said, you can import CSV files, as Nick explains, which you should be able to export from MySQL fairly easily. GAE supports Java and Python "at the same time" using the versions mechanism. So you can set up your data store in Python, and then run against it for your application in Java. (A Java version of the bulk loader is under development.)

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.