I am planning to migrate a previously created Java web application to Azure. The application used log4j for application-level logs, which were saved to a locally created file. The problem is that, with the Azure role having multiple instances, I must collect and aggregate these logs and also make sure that they are stored in persistent storage instead of on the virtual machine's hard drive.
Logging is a critical component of the application, but it must not slow down the actual work. I have considered multiple options and I am curious about the best practice: the best solution considering security, log consistency, and performance, both at storage time and during later processing. Here is a list of the options:
Using log4j with a custom Appender to store information in Azure SQL.
Using log4j with a custom Appender to store information in Azure Table storage.
Writing an additional tool that transfers data from the local hard drive to one of the above persistent stores.
Is there any other method, or are there any complete solutions for this problem in Java?
Which of the above would be best, considering the criteria mentioned above?
There's no out-of-the-box solution right now, but... a custom appender for Table Storage makes sense, as you can then query your logs in a similar fashion to diagnostics (perf counters, etc.).
The only consideration is whether you're writing log statements in massive quantity (like hundreds of times per second). At that rate, you'll start to notice transaction costs showing up on the monthly bill: at a penny per 10,000 transactions and 100 writes per second, that's 100 × 86,400 × 30 ≈ 259 million transactions per month, or about $250 per instance. If you have multiple instances, the cost goes up from there. With SQL Azure you'd have no transaction cost, but you'd have a higher storage cost.
If you want to go with a storage-transfer approach, you can set up Windows Azure Diagnostics to watch a directory and upload files periodically to blob storage. The only snag is that Java doesn't have direct support for configuring diagnostics. If you're building your project from Eclipse, you only have a script file that launches everything, so you'd need to write a small .NET app or use something like AzureRunMe. If you're building a Visual Studio project to launch your Java app, then you can set up diagnostics without a separate app.
There's a blog post from Persistent Systems, just published, regarding Java and diagnostics setup. I'll update this answer with a link once it's live. Also, have a look at Cloud Ninja for Java, which implements Tomcat logging (and related parsing) using an external .NET exe that sets up diagnostics, as described in the upcoming post.
Please visit my blog and download the document. In this document, look for the chapter "Tomcat Solution Diagnostics" for the error-logging solution. The document was written a long time ago, but you can certainly use this method to capture any kind of Java-based logging (log4j included) in Tomcat and view it directly.
Chapter 6: Tomcat Solution Diagnostics
Error Logging
Viewing Log Files
http://blogs.msdn.com/b/avkashchauhan/archive/2010/10/29/windows-azure-tomcat-solution-accelerator-full-solution-document.aspx
In any scenario where there is a custom application (java.exe, php.exe, python, etc.), I suggest creating the log file directly in the "Local Storage" folder and then initializing Azure Diagnostics in the worker role (WorkerRole.cs) to export these custom log files directly from the Azure VM to your Azure Blob storage.
How to create custom logs on local storage is described here.
Using Azure Diagnostics and sending logs to Azure Blob storage would be cheaper and more robust than any other method you have described.
Finally, I decided to write a log4j appender. I didn't need to gather diagnostics information; my main goal was only to gather the log files in an easily exchangeable way. My first fear was that it would slow down the application, but by writing only to memory and only periodically writing the log data out to Azure Tables, it works perfectly without making too many API calls.
Here are the main steps of my implementation (a rough sketch follows the list):
First I created an entity class to be stored in Azure Tables, called LogEntity that extends com.microsoft.windowsazure.services.table.client.TableServiceEntity.
Next I wrote the appender, which extends org.apache.log4j.AppenderSkeleton and contains a java.util.List<LogEntity>.
In the overridden method protected void append(LoggingEvent event), I only add to this collection; a separate thread periodically empties the list and writes the data to Azure Tables.
Finally I added the newly created Appender to my log4j configuration file.
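Since the answer only outlines the steps, here is a minimal sketch of such an appender. It is not the author's actual code: the LogEntity fields, the 30-second flush interval, and the writeBatch() hook are assumptions, and the real persistence call would use the Azure SDK's table client (e.g., a batch insert) in place of the placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.spi.LoggingEvent;

public class AzureTableAppender extends AppenderSkeleton {

    // Simple stand-in for the TableServiceEntity subclass described above.
    public static class LogEntity {
        final long timestamp;
        final String level;
        final String logger;
        final String message;

        LogEntity(long timestamp, String level, String logger, String message) {
            this.timestamp = timestamp;
            this.level = level;
            this.logger = logger;
            this.message = message;
        }
    }

    private final ConcurrentLinkedQueue<LogEntity> buffer =
            new ConcurrentLinkedQueue<LogEntity>();
    private final ScheduledExecutorService flusher =
            Executors.newSingleThreadScheduledExecutor();

    public AzureTableAppender() {
        // Flush the in-memory buffer periodically (30 s is an assumed interval).
        flusher.scheduleWithFixedDelay(new Runnable() {
            public void run() {
                flush();
            }
        }, 30, 30, TimeUnit.SECONDS);
    }

    @Override
    protected void append(LoggingEvent event) {
        // Only enqueue in memory here: no I/O on the logging thread.
        buffer.add(new LogEntity(event.getTimeStamp(),
                event.getLevel().toString(),
                event.getLoggerName(),
                event.getRenderedMessage()));
    }

    private void flush() {
        List<LogEntity> batch = new ArrayList<LogEntity>();
        LogEntity e;
        while ((e = buffer.poll()) != null) {
            batch.add(e);
        }
        if (!batch.isEmpty()) {
            writeBatch(batch);
        }
    }

    protected void writeBatch(List<LogEntity> batch) {
        // Placeholder: the real version would perform a batch insert into
        // Azure Table storage via the Azure SDK's table client.
    }

    @Override
    public void close() {
        flush(); // write out anything still buffered
        flusher.shutdown();
    }

    @Override
    public boolean requiresLayout() {
        return false;
    }
}
```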
Another alternative:
Could we not continue using log4j in the standard way (e.g., a DailyRollingFileAppender), except that the file is created on a UNC path on a VM (IaaS)?
This VM will only need a bit of disk space, and need not have any great processing power. So one could share an available VM, or create a VM with the most minimal configuration, preferably in the same region and cloud service.
The accumulated log files can be accessed via RDP, FTP, etc.
That way one incurs neither the transaction costs nor the cost of developing a special log4j appender, so it could turn out to be a cheaper alternative.
Thanks, Jeevan
PS: I am referring more to one's application logging, not to the app-server logs (the catalina/manager .log or .out files of WebLogic).
Related
We maintain our server once a week.
Sometimes the customer wishes us to change some settings that are already cached in the server.
My colleague always writes some JSP code to change these settings, which are stored in memory.
Is this kind of methodology a good approach?
If our project is not a web container, which tools can help me?
Usually, in my experience, the server configuration is not stored only in the server's memory:
What happens if, after a configuration change, the server is restarted or just goes down for some system reason?
What happens if you have more than one instance of the same server (a cluster of servers, in other words)?
So usually people opt for various "externalized configuration" options, ranging from file-based configuration plus redeploying the whole cluster upon each configuration change, to configuration management servers (like Consul, etcd, and so on). There are also some solutions that came from (and are used in) the Java world: Apache ZooKeeper and Spring Cloud Config Server, to name a few; there are others. In addition, it is sometimes convenient to store the configuration in a database.
Now to your question: if your project is not a web container, you don't care that the configuration will "disappear" after a server restart, and you're not running a distributed cluster of servers, then using JSP indeed doesn't seem appropriate in this case.
Maybe you should take a look at JMX (Java Management Extensions), which has a built-in solution for this, so you will probably be able to get rid of the web container (which your team seems not to use anyway, other than for the JSP modifications you've described).
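To make the JMX suggestion concrete, here is a minimal sketch; the bean, the TTL attribute, and the ObjectName are all made-up examples:

```java
import java.lang.management.ManagementFactory;

import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxConfigExample {

    // Standard MBeans require a public management interface named "<Class>MBean".
    public interface CacheSettingsMBean {
        int getTimeToLiveSeconds();
        void setTimeToLiveSeconds(int seconds);
    }

    // In-memory settings holder whose values can be changed at runtime.
    public static class CacheSettings implements CacheSettingsMBean {
        private volatile int ttlSeconds = 300; // assumed default

        public int getTimeToLiveSeconds() { return ttlSeconds; }
        public void setTimeToLiveSeconds(int seconds) { ttlSeconds = seconds; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(new CacheSettings(),
                new ObjectName("com.example:type=CacheSettings"));
        // The setting can now be inspected and changed at runtime with
        // jconsole or JMC: no web container or JSP page required.
        Thread.sleep(Long.MAX_VALUE);
    }
}
```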
You basically need an in-memory cache. There are multiple solutions in the other answers, which include creating your own implementation or using an existing Java library. You can also fetch data from the database and add a cache over the database layer.
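As a rough illustration of the "create your own implementation" option, a tiny LRU cache can be built on java.util.LinkedHashMap; the capacity handling is the only custom part:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Evicts the least-recently-accessed entry once capacity is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // access-order: gets refresh an entry's recency
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

A library such as Guava or Ehcache gives you the same thing with expiry, statistics, and thread safety handled for you.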
I want to build a more advanced logging mechanism for my java web applications, similar to App engine logs.
My needs are:
Stream logs to a database (e.g., SQL, BigQuery, or something else)
Automatically log important data (like app context, request URL, request ID, browser user agent, user ID, etc.)
For point 1, I can use a "buffering" implementation, where logs are put into different lists and a periodic cron job (thread) gathers all the logs in memory and writes them to the database (which can also be on another server).
For point 2, the only way I found of doing this is to inject the needed objects into my classes (subsystems), like ServletContext, HttpServletRequest, the current user, etc., all modeled into a custom class (let's say AppLogContext), which can then be used by the logging mechanism.
The problem here is that I don't know whether this is good practice. For example, it means that many classes will have to contain this object, which has access to the servlet context and HTTP request objects, and I'm thinking this may create architectural problems (when building modules, layers, etc.) or even security issues.
App Engine automatically logs this kind of information (and much more, like latencies, CPU usage, etc., though that is more complicated), and it can be found in the project's console logs (it can also duplicate the logs to BigQuery tables). I need something like this for Jetty or other Java web app servers.
So, is there another way of doing this: other patterns, different approaches? (I couldn't find 3rd-party libraries for any of these points.)
Thank you.
You don't really need to reinvent the wheel.
There is a common practice that you can follow:
Just log to a file using a standard logger
(If you need to see logs in request context) Logback, Log4J, and SLF4J support the Mapped Diagnostic Context (MDC), which you can use to put the current request into every log line (just initialize the context in a filter and put in a request id, for example a random UUID). You can aggregate log entries by this id later (see the filter sketch after this list).
Then use the ELK stack:
Logstash to gather the logs
Elasticsearch to store them
Kibana to analyze them
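Here is a rough sketch of the MDC initialization mentioned above, as a servlet filter using SLF4J; the key name requestId and the random UUID are illustrative choices (you would then reference the key in the log pattern, e.g. %X{requestId}):

```java
import java.io.IOException;
import java.util.UUID;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

import org.slf4j.MDC;

// Puts a per-request id into the MDC so that every log line written
// while handling the request carries it.
public class RequestIdFilter implements Filter {

    public void init(FilterConfig config) { }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        MDC.put("requestId", UUID.randomUUID().toString());
        try {
            chain.doFilter(req, res);
        } finally {
            MDC.remove("requestId"); // don't leak ids across pooled threads
        }
    }

    public void destroy() { }
}
```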
I am creating a management system. There are some functions, like adding or removing records from the management system. I do not want to use any type of file or database. I thought that collections could be a solution, but they will not have a permanent effect, i.e. the changes made in one run session will not be reflected when running the same application a second time.
If there is a way, I would like someone to provide some hints.
Collections, and any other third-party caches, are for runtime storage.
Unless you persist your data, it is not possible for the application to know about it and pick it up the second time.
When you run your application two times, one after the other, and want to share information between those executions, you have to store it in some form of persistence. Especially since you want to be able to reboot your system between executions, this rules out keeping the information in RAM. So you either have to use a disk (which would require the use of files) or some kind of online storage.
In Cloud Foundry there is no persistent filesystem storage in the app containers. The standard suggestion is to use a DB, but I have a specific scenario where writing to a persistent file is needed: I use a 3rd-party app that requires storing config data in a file. I can select the file path, but I cannot override the persistent-file requirement.
Is there any library that abstracts filesystem visible to Java and actually stores files on Amazon S3?
It is only one single file that builds up as we go along. The file size is about 1 MB but could reach a few MBs.
The application design documentation from Cloud Foundry recommends not writing to the local filesystem:
http://docs.cloudfoundry.org/devguide/deploy-apps/prepare-to-deploy.html#filesystem
This is to be compliant with what is called a twelve-factor application, where one uses backing services to access items like storage systems and runs processes that don't rely on shared storage.
Unfortunately there does not appear to be a file system service for Cloud Foundry, although it's been discussed:
https://groups.google.com/a/cloudfoundry.org/forum/#!topic/vcap-dev/Kj08I2H7HHc
Such a service will eventually appear in order to support applications like Drupal, in spite of recommendations to use more efficient mechanisms like S3 to store files.
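No specific library is named in this thread, but as a sketch of the workaround the answer implies (treating the container filesystem as scratch space and S3 as the durable copy), something like the following with the AWS SDK for Java could work; the bucket name, key, and sync points are assumptions:

```java
import java.io.File;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;

// Restores the config file at startup and pushes it back after changes.
public class S3ConfigSync {
    private static final String BUCKET = "my-app-config";      // assumed bucket
    private static final String KEY = "thirdparty/config.dat"; // assumed key

    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    public void download(File localFile) {
        // Copy the durable version into the (ephemeral) container filesystem.
        s3.getObject(new GetObjectRequest(BUCKET, KEY), localFile);
    }

    public void upload(File localFile) {
        // Push the current state back to durable storage.
        s3.putObject(BUCKET, KEY, localFile);
    }
}
```

The app would call download() before starting the 3rd-party component and upload() whenever the file changes (or on a timer, given that it only grows to a few MBs).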
I have a Java web project handling several objects (each in turn containing n objects of type A (e.g., time and value) and m objects of type B (e.g., time and a String array)). The web project itself contains several servlets/JSPs for visualization as well as some logic for data manipulation, and it currently runs on Apache Tomcat.
Is it possible to store all the data in the server's (or, most of the time, local) memory while the server is running? If Tomcat is shut down, the data could be stored in a simple file; no restrictions there. On server startup, I just want to read the files in and write the objects to memory. How can I get Tomcat to do this?
The reason I do not want to use an extra database is that I want to deliver a zip file containing Tomcat with the deployed *.war file (as I don't want my prof getting stuck with Tomcat server setup, etc.).
Thanks, ChrisH
You could implement ServletContextListener and put the load-from-file and save-to-file logic in the contextInitialized() and contextDestroyed() methods, which are invoked during the webapp's startup and shutdown respectively.
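A minimal sketch of that listener, assuming Java 7+, plain serialization, and an unpacked WAR; the file location, attribute name, and payload type are placeholders:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

public class PersistenceListener implements ServletContextListener {

    private static final String FILE_NAME = "data.ser"; // assumed file
    private static final String ATTR = "dataStore";     // assumed attribute

    public void contextInitialized(ServletContextEvent sce) {
        Serializable data = new HashMap<String, Object>(); // default when no file exists yet
        File file = dataFile(sce);
        if (file.exists()) {
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
                data = (Serializable) in.readObject();
            } catch (IOException | ClassNotFoundException e) {
                sce.getServletContext().log("Could not load data", e);
            }
        }
        sce.getServletContext().setAttribute(ATTR, data);
    }

    public void contextDestroyed(ServletContextEvent sce) {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new FileOutputStream(dataFile(sce)))) {
            out.writeObject(sce.getServletContext().getAttribute(ATTR));
        } catch (IOException e) {
            sce.getServletContext().log("Could not save data", e);
        }
    }

    private File dataFile(ServletContextEvent sce) {
        // getRealPath() only works for an unpacked WAR, which matches
        // the "zip file with Tomcat" delivery described in the question.
        return new File(sce.getServletContext().getRealPath("/WEB-INF"), FILE_NAME);
    }
}
```

Register it in web.xml with a listener element (or with @WebListener on Servlet 3.0+).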
You can read and write objects to disk, but they all need to implement java.io.Serializable first. Here is a Serialization tutorial with code examples.
That said, have you considered an embedded database, so that you don't need to install a database server? You could use JDK 6's built-in JavaDB for this, or its competitor HSQLDB. Alternatively, if the data are pure key-value pairs, you could also just use the java.util.Properties API (tutorial here). Just place the properties file somewhere on the classpath and use ClassLoader#getResourceAsStream() to get an InputStream of it, or place it somewhere in WEB-INF and use ServletContext#getResourceAsStream().
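For the Properties option, a minimal sketch (the file name settings.properties is an assumption):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class SettingsLoader {

    // Loads key-value pairs from a properties file on the classpath.
    public static Properties load() throws IOException {
        Properties props = new Properties();
        InputStream in = Thread.currentThread().getContextClassLoader()
                .getResourceAsStream("settings.properties");
        if (in == null) {
            throw new IOException("settings.properties not found on classpath");
        }
        try {
            props.load(in);
        } finally {
            in.close();
        }
        return props;
    }
}
```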
I think HSQLDB is exactly what you need: a small database that can run embedded alongside Apache Tomcat. It stores data in memory while also allowing contents to be written to and read from a file.
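A minimal sketch of embedded, file-backed HSQLDB; the JDBC URL path and the table are illustrative (and the CREATE TABLE would fail on a second run, so real code would check for the table first):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class EmbeddedDbDemo {
    public static void main(String[] args) throws Exception {
        // "file:" keeps the data in local files; no separate server process.
        Connection con = DriverManager.getConnection(
                "jdbc:hsqldb:file:data/appdb", "SA", "");
        try {
            Statement st = con.createStatement();
            st.execute("CREATE TABLE records (id INT PRIMARY KEY, name VARCHAR(100))");
            st.execute("SHUTDOWN"); // flushes the in-memory state to the files
        } finally {
            con.close();
        }
    }
}
```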
If the app shuts down unexpectedly, you'll lose all your data, because it won't have time to write it to disk.
You could use a database like SQLite, Derby, or HSQLDB, which store their data on the filesystem.
If you don't want to mess with a DB, then you could store everything in memory and flush it to disk every time it's modified. A couple of tips here (a sketch combining them follows the list):
Serialization can make this really easy. Make all your objects implement Serializable, and give them a serial version id
Use a BufferedOutputStream when writing to disk; this is faster than a straight FileOutputStream.
DO NOT overwrite your old data file directly! Write to a new file, and when done writing, move the completed file on top of your old file. That way, if the server shuts down while you're in the middle of writing your data file, you still have the good file which was written before.
You should acquire a read lock on your data while writing it out; any other code that modifies the data should acquire a write lock.
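Here is a sketch combining those tips, assuming Java 7+, with an assumed file name and a List standing in for the real object model:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SnapshotStore {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Path target = Paths.get("data.ser"); // assumed location
    private final List<String> data = new ArrayList<String>();

    public void add(String item) {
        lock.writeLock().lock(); // modifiers take the write lock
        try {
            data.add(item);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public void saveToDisk() throws IOException {
        lock.readLock().lock(); // flushing only reads the data
        try {
            Path tmp = Files.createTempFile(
                    target.toAbsolutePath().getParent(), "data", ".tmp");
            ObjectOutputStream out = new ObjectOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(tmp)));
            try {
                out.writeObject(new ArrayList<String>(data));
            } finally {
                out.close();
            }
            // Never overwrite the old file directly: move the finished file
            // over it, so a crash mid-write leaves the previous copy intact.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING,
                    StandardCopyOption.ATOMIC_MOVE);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```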
If you don't care about the possibility that your application may scribble all over your data files, that your Tomcat/JVM may crash, or that your machine may die losing all in-memory objects, then managing persistence as you suggest is an option. But you'll have quite a bit of infrastructure to build, test, and maintain, and you'll miss out on the "value-add" tools that most RDBMSs provide: backup, query tools, optimizers, replication, and so on.
But if catastrophic data loss is not an option for you, you should use an RDBMS, an ODBMS, or something similar for your persistence.