I want to capture the Dropwizard Metrics of my Cassandra cluster in my Java program (I don't want to use JMX) and pass those values as JSON to another server (which will use them to generate alarms, etc.). I'm new to Java and would really appreciate some guidance. Are there any native Dropwizard APIs for collecting these metrics? Can you provide sample Java code that uses such an API to fetch a metric, for example? The reason for not using JMX is that I've read here that it's not recommended to gather metrics from a production environment this way, as JMX's RPC API is fragile.
You can send metrics using the available plugins for the Metrics library, such as Graphite or Ganglia.
To do this, put the .jar file for the corresponding plugin into Cassandra's lib directory, add the corresponding configuration file for the plugin, and add the following line to Cassandra's jvm.options file:
-Dcassandra.metricsReporterConfigFile=<reporting-configuration>.yaml
and restart Cassandra to pick up the changes.
There are several blog posts on configuring Cassandra to use custom metrics plugins that provide more details: 1, 2.
You may also try to set up the standard Metrics servlets and query them over HTTP; they are configured in much the same way: add the library and provide the configuration.
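Since the question also asks for sample Java code: below is a minimal sketch of how the Graphite reporter from the Metrics library is typically wired up in code, which is roughly what the Cassandra reporter plugin does for you once the YAML file is in place. The metrics-graphite dependency and the endpoint graphite.example.com:2003 are assumptions for illustration only.

    import java.net.InetSocketAddress;
    import java.util.concurrent.TimeUnit;

    import com.codahale.metrics.MetricRegistry;
    import com.codahale.metrics.graphite.Graphite;
    import com.codahale.metrics.graphite.GraphiteReporter;

    public class GraphiteReportingSketch {
        public static void main(String[] args) {
            // Registry holding the metrics (Cassandra maintains its own registry
            // internally; this one is only for illustration).
            MetricRegistry registry = new MetricRegistry();
            registry.counter("example.requests").inc();

            // Hypothetical Graphite endpoint -- replace with your own host and port.
            Graphite graphite = new Graphite(new InetSocketAddress("graphite.example.com", 2003));

            GraphiteReporter reporter = GraphiteReporter.forRegistry(registry)
                    .convertRatesTo(TimeUnit.SECONDS)
                    .convertDurationsTo(TimeUnit.MILLISECONDS)
                    .build(graphite);

            // Push everything in the registry to Graphite once a minute.
            reporter.start(1, TimeUnit.MINUTES);
        }
    }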
I found the commit log (.log) files in the folder and would like to analyze them. For example, I want to know which queries were executed in the history of the machine. Is there any code to do that?
Commit log files are specific to a version of Cassandra, and you may need to tinker with CommitLogReader, etc. You can find more information in the documentation on Change Data Capture.
But the main issue for you is that the commit log doesn't contain the queries that were executed; it contains the data that was modified. What you really need is audit functionality, and here you have several choices:
It's built into the upcoming Cassandra 4.0 - see the documentation on how to use it
use the ecAudit plugin open-sourced by Ericsson - it supports Cassandra 2.2, 3.0 & 3.11
if you use DataStax Enterprise (DSE), it has built-in support for audit logging
I am developing a server-side app using Java and Couchbase. I am trying to understand the pros and cons of handling cluster and bucket management from the Java code versus using the Couchbase admin web console.
For instance, should I handle creating/removing buckets, indexing, and updating buckets in my Java code?
The reason I want to handle as many Couchbase administration functions as possible is that my app is expected to run on-prem, not as a cloud service. I want to avoid our customers having to learn how to administer Couchbase.
The main reason to use the management APIs programmatically, rather than the admin console, is exactly as you say: when you need to handle initialization and maintenance yourself, especially if the application needs to be deployed elsewhere. Generally speaking, you'll want some sort of database initializer or manager module in your code that handles bootstrapping the correct buckets and indexes if they don't exist.
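As a rough illustration, here is a minimal sketch of such an initializer using the Couchbase Java SDK 3.x management API (older SDK versions expose different manager classes); the connection string, credentials, and bucket name are placeholders:

    import com.couchbase.client.core.error.BucketExistsException;
    import com.couchbase.client.core.error.IndexExistsException;
    import com.couchbase.client.java.Cluster;
    import com.couchbase.client.java.manager.bucket.BucketSettings;

    public class DatabaseInitializer {

        // Create the bucket and a primary index only if they are missing,
        // so the app can bootstrap itself on a fresh on-prem install.
        public void ensureBucket(Cluster cluster, String bucketName) {
            try {
                cluster.buckets().createBucket(BucketSettings.create(bucketName).ramQuotaMB(100));
            } catch (BucketExistsException alreadyThere) {
                // Bucket was bootstrapped on an earlier run; nothing to do.
            }
            try {
                cluster.queryIndexes().createPrimaryIndex(bucketName);
            } catch (IndexExistsException alreadyThere) {
                // Primary index already present.
            }
        }

        public static void main(String[] args) {
            Cluster cluster = Cluster.connect("couchbase://127.0.0.1", "Administrator", "password");
            new DatabaseInitializer().ensureBucket(cluster, "app-data");
            cluster.disconnect();
        }
    }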
If all you need to do is prepare the DB environment one time for your application, you can also use the command-line utilities that come with Couchbase, or send calls to the REST API. A small deployment script would probably be easier than writing code to do the same thing.
I am writing an application that collects a huge amount of data and stores it in Neo4j. For this I'm using Java code.
In order to quickly analyze the data, I want to point a standalone Neo4j server at the same database and then use the Neo4j console to query it with Cypher.
This seems to be a lot of hassle. I have already changed neo4j-server.properties to point to the directory where my Java code is collecting the data, and also set the flag allow_store_upgrade=true in neo4j.properties.
However, I am still facing issues because of locks.
Is there a standard way to achieve this?
You need to have neo4j-shell-<version>.jar on your classpath and set remote_shell_enabled='true' as a config option while initializing your embedded instance.
I wrote a blog post on this some time ago: http://blog.armbruster-it.de/2014/01/using-remote-shell-combined-with-neo4j-embedded/
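For illustration, a minimal sketch against the Neo4j 2.x embedded API (the neo4j-server.properties era mentioned in the question); the store path is a placeholder:

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;
    import org.neo4j.shell.ShellSettings;

    public class EmbeddedWithRemoteShell {
        public static void main(String[] args) {
            // Open the same store directory the collector writes to and expose it
            // over the remote shell so neo4j-shell can connect to the running JVM.
            final GraphDatabaseService db = new GraphDatabaseFactory()
                    .newEmbeddedDatabaseBuilder("/path/to/graph.db")
                    .setConfig(ShellSettings.remote_shell_enabled, "true")
                    .newGraphDatabase();

            // Make sure the store is closed cleanly when the JVM exits.
            Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
                public void run() {
                    db.shutdown();
                }
            }));
        }
    }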
I am creating a library (a Java jar file) that provides a solution to a problem. The library is mainly targeted at web applications (J2EE applications) and can be used with Spring and other frameworks.
The targeted J2EE application will be deployed in a clustered environment. Users will add the library to the application classpath to use it.
The library depends on some configuration, which is packaged inside the library (jar) itself and used at run time.
At run time the configuration can be modified.
As the library targets a clustered environment, any modification to the configuration must be replicated to all nodes of the cluster.
As per my understanding, there are two ways to hold configuration for use at run time (I am not sure; correct me if I am wrong):
1. Store the configuration in a file
2. Store the configuration in a database
In the first approach (store the configuration in a file):
There will be a property file in the library to hold the initial configuration.
At server start-up, the configuration from the property file will be copied to a file (abc.xml) at a physical location on the server.
There will be a set of APIs to perform CRUD actions on the abc.xml file in the user's home location.
Every time, this abc.xml file will be used.
In this approach holding the data is possible, but for a clustered environment I don't see how a modification would be propagated to all nodes of the cluster.
In the second approach (store the configuration in a database table):
When the toolkit (jar file) is published, SQL table-creation queries are published with it.
The user has to create the tables using those queries.
There will be a property file in the library to hold the initial configuration.
At server start-up, the configuration from the property file will be copied to the database.
There will be a set of APIs to perform CRUD actions on the database.
Whenever the configuration is modified, all nodes of the cluster can be updated with the latest data using some third-party tool (Hazelcast or anything else).
In my analysis I found that Quartz uses the database approach to hold its configuration.
So when you download the Quartz distribution, it also contains the SQL queries to create the required tables in the database, which Quartz itself then uses.
I want to know the standard design practices for holding configuration in a library (jar), and which factors need to be considered in such cases.
There are other solutions as well. Use a cluster-aware caching technology such as EhCache, Apache JCS, or Hazelcast, and use the cache API to retrieve configuration data from the library. You could add a listener within your library to poll the configuration file and update the cache.
If you are planning to use solution 1, you could set up a listener within your library that listens to the configuration file and updates the server copy whenever there is a change. The same applies to solution 2, but if I were in your situation I would rather use a caching technology for frequently accessed data such as configuration. The advantage is that you would not have to update the configuration on every node yourself, because the cache replicates itself.
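For example, with Hazelcast (imports below are for Hazelcast 4.x/5.x; 3.x keeps IMap in com.hazelcast.core) the library could keep its configuration in a distributed map, so an update made on one node is visible on every node; the map name and keys are placeholders:

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.map.IMap;
    import com.hazelcast.map.listener.EntryUpdatedListener;

    public class SharedConfiguration {

        private final IMap<String, String> config;

        public SharedConfiguration() {
            // Every node that embeds the library joins the same Hazelcast cluster,
            // so all nodes share the "library-config" map.
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            config = hz.getMap("library-config");

            // Runs on each member whenever some node changes a setting.
            config.addEntryListener((EntryUpdatedListener<String, String>) event ->
                    System.out.println("Config changed: " + event.getKey() + " = " + event.getValue()),
                    true);
        }

        public String get(String key, String defaultValue) {
            String value = config.get(key);
            return value != null ? value : defaultValue;
        }

        public void update(String key, String value) {
            // The put is replicated to all cluster members automatically.
            config.put(key, value);
        }
    }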
I am planning to migrate a previously created Java web application to Azure. The application used log4j for application-level logs, which were saved to a locally created file. The problem is that, with the Azure role having multiple instances, I must collect and aggregate these logs and also make sure they are stored in persistent storage instead of on the virtual machines' hard drives.
Logging is a critical component of the application, but it must not slow down the actual work. I have considered multiple options and I am curious about the best practice: the best solution considering security, log consistency, and performance both at storage time and during later processing. Here is the list of options:
Using log4j with a custom Appender to store information in Azure SQL.
Using log4j with a custom Appender to store information in Azure Tables storage.
Writing an additional tool that transfers data from the local hard drive to either of the above persistent storages.
Is there any other method or are there any complete solutions for this problem for Java?
Which of the above would be best considering the above mentioned criteria?
There's no out-of-the-box solution right now, but... a custom appender for Table Storage makes sense, as you can then query your logs in a similar fashion to diagnostics (perf counters, etc.).
The only consideration is if you're writing log statements in massive quantity (like hundreds of times per second). At that rate, you'll start to notice the transaction costs showing up on the monthly bill. At a penny per 10,000 transactions and 100 per second, you're looking at about $250 per instance per month. If you have multiple instances, the cost goes up from there. With SQL Azure you'd have no transaction cost, but you'd have higher storage costs.
If you want to go with a storage-transfer approach, you can set up Windows Azure diagnostics to watch a directory and upload files periodically to blob storage. The only snag is that Java doesn't have direct support for configuring diagnostics. If you're building your project from Eclipse, you only have a script file that launches everything, so you'd need to write a small .NET app or use something like AzureRunMe. If you're building a Visual Studio project to launch your Java app, then you can set up diagnostics without a separate app.
There's a blog post from Persistent Systems that was just published regarding Java and diagnostics setup. I'll update this answer with a link once it's live. Also, have a look at Cloud Ninja for Java, which implements Tomcat logging (and related parsing) using an external .NET exe that sets up diagnostics, as described in the upcoming post.
Please visit my blog and download the document. In the document, look for the chapter "Tomcat Solution Diagnostics" for the error-logging solution. The document was written a while back, but you can certainly use this method to capture any kind of Java-based logging (log4j included) in Tomcat and view it directly.
Chapter 6: Tomcat Solution Diagnostics
Error Logging
Viewing Log Files
http://blogs.msdn.com/b/avkashchauhan/archive/2010/10/29/windows-azure-tomcat-solution-accelerator-full-solution-document.aspx
In any scenario where there is a custom application (java.exe, php.exe, python, etc.), I suggest creating the log file directly in the "Local Storage" folder and then initializing Azure Diagnostics in the worker role (WorkerRole.cs) to export these custom log files from the Azure VM to your Azure blob storage.
How to create custom logs on local storage is described here.
Using Azure Diagnostics and sending logs to Azure blob storage would be cheaper and more robust than any other method you have described.
Finally, I decided to write a log4j appender. I didn't need to gather diagnostics information; my main goal was only to gather the log files in an easily exchangeable way. My first fear was that it would slow down the application, but by writing only to memory and only periodically writing the log data out to Azure tables, it works perfectly without making too many API calls.
Here are the main steps for my implementation:
First I created an entity class to be stored in Azure Tables, called LogEntity, that extends com.microsoft.windowsazure.services.table.client.TableServiceEntity.
Next I wrote the appender, which extends org.apache.log4j.AppenderSkeleton and contains a java.util.List<LogEntity>.
In the overridden method protected void append(LoggingEvent event) I only add to this collection; a separate thread periodically empties the list and writes the data to Azure tables.
Finally I added the newly created Appender to my log4j configuration file.
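For reference, this is roughly the shape of the appender described above; a sketch only, which buffers the raw LoggingEvents and leaves the conversion to LogEntity and the actual Azure table write inside a stub (writeBatchToAzureTables is a hypothetical helper), because that part depends on the Azure SDK version in use:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.log4j.AppenderSkeleton;
    import org.apache.log4j.spi.LoggingEvent;

    public class AzureTableAppender extends AppenderSkeleton {

        private final List<LoggingEvent> buffer = new ArrayList<LoggingEvent>();

        public AzureTableAppender() {
            // Background thread that drains the in-memory buffer periodically,
            // so the logging call itself never waits on a network round trip.
            Thread flusher = new Thread(new Runnable() {
                public void run() {
                    while (true) {
                        try {
                            Thread.sleep(10000);
                        } catch (InterruptedException e) {
                            return;
                        }
                        flush();
                    }
                }
            });
            flusher.setDaemon(true);
            flusher.start();
        }

        @Override
        protected void append(LoggingEvent event) {
            synchronized (buffer) {
                buffer.add(event); // cheap: only remembers the event in memory
            }
        }

        private void flush() {
            List<LoggingEvent> batch;
            synchronized (buffer) {
                batch = new ArrayList<LoggingEvent>(buffer);
                buffer.clear();
            }
            if (!batch.isEmpty()) {
                writeBatchToAzureTables(batch); // hypothetical helper, see note above
            }
        }

        private void writeBatchToAzureTables(List<LoggingEvent> batch) {
            // Convert each event into a LogEntity and insert it with the Azure
            // table client here.
        }

        @Override
        public void close() {
            flush();
        }

        @Override
        public boolean requiresLayout() {
            return false;
        }
    }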
Another alternative:
Could we not continue using log4j the standard way (for example with a DailyRollingFileAppender), only with the file created on a UNC path on a VM (IaaS)?
This VM would only need a bit of disk space and no great processing power, so one could share an existing VM or create one with the most minimal configuration, preferably in the same region and cloud service.
The accumulated log files can be accessed via RDP/FTP, etc.
That way one would not incur transaction costs or the cost of developing a special log4j appender, so it could turn out to be a cheaper alternative.
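A minimal sketch of what this would look like programmatically with log4j 1.x (a log4j.properties entry would work just as well); the UNC path \\logvm\logs is a placeholder for the share on the small log VM:

    import java.io.IOException;

    import org.apache.log4j.DailyRollingFileAppender;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;

    public class UncFileLoggingSetup {
        public static void main(String[] args) throws IOException {
            // Roll the file daily; the path points at a share on the dedicated log VM.
            DailyRollingFileAppender appender = new DailyRollingFileAppender(
                    new PatternLayout("%d{ISO8601} [%t] %-5p %c - %m%n"),
                    "\\\\logvm\\logs\\application.log",  // hypothetical UNC share
                    "'.'yyyy-MM-dd");

            Logger.getRootLogger().addAppender(appender);
            Logger.getLogger(UncFileLoggingSetup.class).info("logging to the shared VM");
        }
    }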
PS: I am referring more to one's application logging, not to the app-server logs (Tomcat's catalina/manager .log files or WebLogic's .out files).