Strategies/techniques for crash reporting in Java

I'm developing a new Java desktop application and would like to include a crash reporting facility - you know the kind of thing - program crashes, box pops up asking user to click okay to send it, etc.
I'm comfortable with how the error will be sent back to our servers - probably via a simple web service. What I'm more unsure about is how the mechanism for capturing the failure should be implemented. I would welcome advice from anyone who has implemented something similar.

There is a command-line option you can give the JVM that runs a command (for example a batch file) after the JVM crashes, and another that produces a memory dump when it runs out of memory. All you do is create an external program that does the error reporting, then use the JVM option to have that program send the dump by email.
-XX:+HeapDumpOnOutOfMemoryError -XX:OnError="<cmd args>;<cmd args>"

Use Thread.setUncaughtExceptionHandler and the static Thread.setDefaultUncaughtExceptionHandler to (attempt to) report exceptions to your logging system.
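A minimal sketch of the default-handler approach. CrashReporter and its lastReport field are hypothetical names; a real reporter would hand the text to whatever upload or logging mechanism you use instead of just storing it.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical reporter: it only stores the report; sending it is left out.
final class CrashReporter implements Thread.UncaughtExceptionHandler {
    // Last report built; a real reporter would queue this for upload.
    final AtomicReference<String> lastReport = new AtomicReference<>();

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        try {
            StringWriter trace = new StringWriter();
            error.printStackTrace(new PrintWriter(trace, true));
            lastReport.set("Thread: " + thread.getName() + System.lineSeparator() + trace);
        } catch (Throwable secondary) {
            // Reporting must never throw; nothing is left to catch it.
        }
    }

    // Installs the reporter for every thread that has no handler of its own.
    static CrashReporter install() {
        CrashReporter reporter = new CrashReporter();
        Thread.setDefaultUncaughtExceptionHandler(reporter);
        return reporter;
    }
}
```

Calling CrashReporter.install() early in main covers threads created afterwards as well, since the default handler is consulted whenever a thread has no handler of its own.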

I see three cases:
1. Catastrophes. The JVM itself is either dead or dying. You cannot assume that any of your code will work; for example, you may be unable to allocate any memory. Hence in this case you can't reasonably hope to send any diagnostics. The best you can hope for is to have some diagnostics, such as core dumps, left in the ashes of the dead program.
In this case you could, on startup of a new run, look for such debris and suggest that the user gather it or, with rather more effort, attempt to assemble a diagnostic package yourself.
2. The low-level application code does not catch an exception, perhaps a RuntimeException such as a NullPointerException. In this case you could catch Exception in your main (assuming you have one) and have some hope that your Crash Reporter code will work.
Pass the exception, and its stack trace, to the Crash Reporter.
3. Your low-level code catches something really unhealthy: not enough to terminate the process, but worth reporting. Here you have not only the exception to hand but also other contextual information, so there is rather more to send to the Crash Reporter.
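For the first case, looking for debris on the next startup could be sketched as below. This assumes a HotSpot JVM, which by default writes its fatal error log as hs_err_pid<pid>.log in the working directory; CrashDebris is a hypothetical helper, and which directory to scan depends on how your application is launched.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper that looks for leftovers of a previous JVM crash.
final class CrashDebris {
    // Default HotSpot fatal error log name: hs_err_pid<pid>.log
    private static final Pattern FATAL_ERROR_LOG = Pattern.compile("hs_err_pid\\d+\\.log");

    static boolean isFatalErrorLog(String fileName) {
        return FATAL_ERROR_LOG.matcher(fileName).matches();
    }

    // Lists fatal error logs directly inside dir; empty if dir is missing or unreadable.
    static List<File> findFatalErrorLogs(File dir) {
        File[] matches = dir.listFiles((parent, name) -> isFatalErrorLog(name));
        if (matches == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(matches));
    }
}
```

On startup you might call CrashDebris.findFatalErrorLogs(new File(System.getProperty("user.dir"))) and, if the list is not empty, ask the user whether the files may be sent.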

Use logging. The generic pattern works like this:
Create an appender that sends the error message to the server (most logging frameworks support appenders that transmit log messages via mail or even JDBC). If there is no existing appender that fits, the frameworks have examples of how to write one.
Add that appender to the root logger and set its threshold to ERROR.
Log an error when you notice an exception. The logging framework will then do the plumbing for you.
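The pattern above can be sketched with java.util.logging, where an appender is called a Handler and the closest level to ERROR is SEVERE. CrashReportHandler and its outbox list are hypothetical; the actual transmission to the server is left out and replaced by an in-memory buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

// Hypothetical handler: buffers error records where a real one would transmit them.
final class CrashReportHandler extends Handler {
    final List<String> outbox = new ArrayList<>();

    CrashReportHandler() {
        setLevel(Level.SEVERE); // threshold: ignore everything below SEVERE
    }

    @Override
    public synchronized void publish(LogRecord record) {
        if (!isLoggable(record)) {
            return; // below the threshold or rejected by a filter
        }
        Throwable thrown = record.getThrown();
        outbox.add(record.getMessage() + (thrown == null ? "" : " | " + thrown));
    }

    @Override
    public void flush() {
        // Nothing is buffered outside the outbox in this sketch.
    }

    @Override
    public void close() {
        // No resources to release in this sketch.
    }
}
```

Attaching it to the root logger is one line, Logger.getLogger("").addHandler(new CrashReportHandler()), after which any log call at SEVERE anywhere in the application reaches the handler.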

I don't know if this is the best that Java currently has to offer, but this is what I did a while back.
First, all interesting activity likely to crash was dispatched via a command pattern. This application consisted of hitting an application server over the internet, so a lot could go wrong. The exceptions were caught by the command dispatcher and the appropriate result displayed to the user (generally an error dialog, followed by a shutdown and an e-mail sent about the crash).
Second, a custom event queue was used in Swing to catch any exceptions that happened on the event thread. I would hope that Java has a better solution by now, but basically when an exception happened you had to check whether your code was involved; otherwise some Swing bugs could crash your application, which isn't pleasant. And of course recursion had to be checked for (the crash repeating itself over and over again as you try to display a message to the user).
By the way, almost any crash, including an out-of-memory error, will leave your JVM running well enough to send an e-mail in most cases. After an out-of-memory error, unwinding the stack generally releases enough references (and therefore heap) to allow further garbage collection, letting your code live. But in such an event you should still exit quickly. IDEA keeps going after an out-of-memory error, but often isn't functioning well. They would be better off exiting, IMO.
To link in your behavior, subclass EventQueue and push an instance of it as the new queue:
Toolkit.getDefaultToolkit().getSystemEventQueue().push(newQueue);
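A sketch of such an EventQueue subclass, including the recursion check mentioned above. ReportingEventQueue and the reporter callback are hypothetical names; what the callback does (dialog, e-mail, shutdown) is up to you.

```java
import java.awt.AWTEvent;
import java.awt.EventQueue;
import java.util.function.Consumer;

// Sketch of an event queue that reports exceptions thrown during event dispatch.
final class ReportingEventQueue extends EventQueue {
    private final Consumer<Throwable> reporter; // hypothetical callback
    private boolean reporting;                  // guards against recursive reports

    ReportingEventQueue(Consumer<Throwable> reporter) {
        this.reporter = reporter;
    }

    @Override
    protected void dispatchEvent(AWTEvent event) {
        try {
            super.dispatchEvent(event);
        } catch (Throwable error) {
            report(error);
        }
    }

    void report(Throwable error) {
        if (reporting) {
            return; // a failure while reporting must not trigger another report
        }
        reporting = true;
        try {
            reporter.accept(error);
        } finally {
            reporting = false;
        }
    }
}
```

An instance of this class is what would be passed as newQueue in the push call above.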

One option would be to use BugSense. It is targeted towards mobile-application crash reporting but the API states that it could be used for any kind of crash reporting. It's quite simple from what I've read and all one needs to do is create a simple POST request with all the values.
{
"client": {
"name": "bugsense-android", // Obligatory
"version": "0.6"
},
"request": {
"remote_ip": "10.0.0.1",
"custom_data": {
"key1": "value1",
"key2": "value2"
}
},
"exception": {
"message": "java.lang.RuntimeException: exception requested", // Obligatory
"where": "MainActivity.java:47", // Obligatory
"klass": "java.lang.RuntimeException", // Type of exception
"backtrace": "java.lang.RuntimeException: exception requested\r\nat com.sfalma.trace.example.MainActivity$1.onClick(MainActivity.java:47)\r\nat android.view.View.performClick(View.java:2408)\r\nat android.view.View$PerformClick.run(View.java:8816)\r\nat android.os.Handler.handleCallback(Handler.java:587)\r\nat android.os.Handler.dispatchMessage(Handler.java:92)\r\nat android.os.Looper.loop(Looper.java:123)\r\nat android.app.ActivityThread.main(ActivityThread.java:4627)\r\nat java.lang.reflect.Method.invokeNative(Native Method)\r\nat java.lang.reflect.Method.invoke(Method.java:521)\r\nat com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:868)\r\nat com.android.internal.os.ZygoteInit.main(ZygoteInit.java:626)\r\nat dalvik.system.NativeStart.main(Native Method)\\n" // Obligatory
},
"application_environment": {
"phone": "android", // Device model (could be PC or Mac) Obligatory
"appver": "1.2", // Obligatory
"appname": "com.sfalma", // Obligatory
"osver": "2.3", // Obligatory
"wifi_on": "true",
"mobile_net_on": "true",
"gps_on": "true",
"screen_dpi(x:y)": "120.0:120.0",
"screen:width": "240",
"screen:height": "400",
"screen:orientation": "normal"
}
}
You can read more about it here.
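As a rough illustration, the values of the "exception" object in the payload above could be derived from a Throwable like this. ExceptionFields is a hypothetical helper; the field names are taken from the sample payload, the exact backtrace formatting the service expects is an assumption, and turning the map into JSON and POSTing it is left to whatever JSON and HTTP libraries you already use.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: derives the "exception" fields of the payload from a Throwable.
final class ExceptionFields {
    static Map<String, String> from(Throwable error) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("message", error.toString());

        // "where" is the file and line of the top stack frame, if there is one.
        StackTraceElement[] frames = error.getStackTrace();
        String where = frames.length == 0
                ? "unknown"
                : frames[0].getFileName() + ":" + frames[0].getLineNumber();
        fields.put("where", where);

        fields.put("klass", error.getClass().getName());

        // Assumption: the standard printStackTrace text is acceptable as the backtrace.
        StringWriter trace = new StringWriter();
        error.printStackTrace(new PrintWriter(trace, true));
        fields.put("backtrace", trace.toString());
        return fields;
    }
}
```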

Related

akka.pattern.AskTimeoutException while running Lagom HelloWorld example

I have a problem while trying my hands on the Hello World example explained here.
Kindly note that I have just modified the HelloEntity.java file to be able to return something other than "Hello, World!". Most likely my changes are taking too long, and hence I am getting the timeout error below.
I am currently doing a PoC on a single node to understand the Lagom framework and do not have the liberty to deploy multiple nodes.
I have also tried modifying the default lagom.circuit-breaker setting in application.conf to "call-timeout = 100s"; however, this does not seem to have helped.
Following is the exact error message for your reference:
{"name":"akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://hello-impl-application/system/sharding/HelloEntity#1074448247]] after [5000 ms]. Sender[null] sent message of type \"com.lightbend.lagom.javadsl.persistence.CommandEnvelope\".","detail":"akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://hello-impl-application/system/sharding/HelloEntity#1074448247]] after [5000 ms]. Sender[null] sent message of type \"com.lightbend.lagom.javadsl.persistence.CommandEnvelope\".\n\tat akka.pattern.PromiseActorRef$.$anonfun$defaultOnTimeout$1(AskSupport.scala:595)\n\tat akka.pattern.PromiseActorRef$.$anonfun$apply$1(AskSupport.scala:605)\n\tat akka.actor.Scheduler$$anon$4.run(Scheduler.scala:140)\n\tat scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:866)\n\tat scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:109)\n\tat scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103)\n\tat scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:864)\n\tat akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)\n\tat akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)\n\tat akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)\n\tat akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)\n\tat java.lang.Thread.run(Thread.java:748)\n"}
Question: Is there a way to increase the Akka timeout by modifying application.conf or any of the Java source files in the Hello World project? Can you please help me with the exact details?
Thanks in advance for your time and help.

The call timeout is the timeout for circuit breakers, which is configured using lagom.circuit-breaker.default.call-timeout. But that's not what is timing out above. What is timing out is the request to your HelloEntity, and that timeout is configured using lagom.persistence.ask-timeout. There is a timeout on requests to entities because, in a multi-node environment, your entities are sharded across nodes, so an ask on them may go to another node, and a timeout is needed in case that node is not responding.
All that said, I don't think changing the ask-timeout will solve your problem. If you have a single node, then your entities should respond instantly if everything is working ok.
Is that the only error you're seeing in the logs?
Are you seeing this in devmode (ie, using the runAll command), or are you running the Lagom service some other way?
Is your database responding?

Thanks James for the help/pointer.
Adding the following lines to resources/application.conf did the trick for me:
lagom.persistence.ask-timeout=30s
hello {
..
..
call-timeout = 30s
call-timeout = ${?CIRCUIT_BREAKER_CALL_TIMEOUT}
..
}
A call is service-to-service communication: a ServiceClient communicating with a remote server. It uses a circuit breaker. It is an extra-service call.
An ask (in the context of lagom.persistence) sends a command to a persistent entity. That happens across the nodes inside your Lagom service. It does not use circuit breaking. It is an intra-service call.

Information Exposure Through an Error Message in checkmarx

try {
//code
} catch (ParseException e) {
e.printStackTrace();
} catch (MalformedURLException e) {
LOG.error("Error in finding Resource Bundle", e);
}
I wrote the code like that, but when I run the Checkmarx code analysis tool on it I get "Information Exposure Through an Error Message". How do I resolve this, and when does this finding occur?

What is Information Exposure Through an Error Message?
The software generates an error message that includes sensitive information about its environment, users, or associated data.
The sensitive information may be valuable information on its own (such as a password), or it may be useful for launching other, more deadly attacks. If an attack fails, an attacker may use error information provided by the server to launch another more focused attack.
(Quote taken from CWE-209: Information Exposure Through an Error Message)
You did not specify, but I'm assuming that the Checkmarx tool pointed to printStackTrace() as the problematic end point of the flow.
By using this method, an exception (including its entire stack trace) will be printed to the standard error stream. This might include information that may be sensitive by itself (like usernames or passwords) or at least disclose some environment data. If this data is exposed to a user, it can be abused or used maliciously for more effective attacks.
There are many other reasons not to use printStackTrace() that way, as can be seen here: Why is exception.printStackTrace() considered bad practice?
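A minimal sketch of the usual alternative: send the details to a logger and give the user only a generic message. DateInput and describe are hypothetical names, java.util.logging stands in for whatever logger you use, and whether a particular Checkmarx configuration then stops flagging the code is not something this sketch can guarantee.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example: parse user input without exposing error details.
final class DateInput {
    private static final Logger LOG = Logger.getLogger(DateInput.class.getName());

    // Returns a message that is safe to show to the user.
    static String describe(String text) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setLenient(false);
        try {
            format.parse(text);
            return "Date accepted.";
        } catch (ParseException e) {
            // Full details go to the log, which should be readable by operators only.
            LOG.log(Level.WARNING, "Could not parse date input", e);
            // The user sees a generic message with no stack trace or internals.
            return "The date could not be read. Please use the format yyyy-MM-dd.";
        }
    }
}
```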

First of all, remove e.printStackTrace();.
Now, as it is compulsory to log errors, you can't remove LOG.error("Error in finding Resource Bundle", e);.
So just close the finding with a justification that logs are being generated. As this is a LOW severity finding, there is no big issue.
This happens every time with our project too :P .

Java log levels - When to use What

When should I use the log levels below? An example would be great.
Trace Vs Debug
Warn Vs Error Vs Fatal
WARN VS ERROR Vs FATAL
Will I need to use FATAL in my application code in the first place?
I have never seen FATAL logging in any of the projects I have worked on so far.
I have read that in the case of FATAL the program will end. If so, I wonder how my log statement will execute.
Moreover, I think FATAL cannot be used in the case of a memory allocation failure, as the JVM will throw an out-of-memory error and exit the program, so the developer cannot log anything. If this is correct, then where exactly would I use FATAL?
For ERROR and WARN:
In a catch block, if I do not have alternate logic to perform for the error condition, I log the exception at ERROR level; the exception is then transformed into a user-specific message and displayed on screen.
WARN, on the other hand, is used when there is an alternate flow or path for handling the exception.
For DEBUG:
This is to check what exception was thrown and where; "what" here means the data that caused the error. Hence it can be used just before and after complex logic in the code.
Please let me know if my understanding is correct.
Example:
class MyLogLevel {
    void method1(int empId) {
        log.trace("method1 starting");
        try {
            log.info("before getting data from DB");
            log.debug("executing the value for emp id : " + empId);
            // DBConnection and code here
        } catch (Exception1 e1) {
            log.warn("record not found. So assigning default value");
            // Code logic to assign default value
        } catch (Exception2 e2) {
            // Due to DB connection error. Connection not established
            log.error("DB connection not established");
        }
        log.trace("method1 ending");
    }
}

In my experience, a somewhat common practice is:
Always use DEBUG for debugging purposes. I seldom see people use TRACE.
For something that is bad for the system but does not necessarily cause a problem (i.e. whether it is an error depends on the calling context), use WARN. E.g. you could write a function that sometimes returns NaN, but NaN might not be an error for the caller, depending on the context.
For something that is surely an error somewhere in the system or in the caller's input data, and that definitely needs human involvement (i.e. someone from your production support team needs to look at it), use ERROR. E.g. you want to write a person's record into the database but find that the primary key (firstname, lastname) is NULL.
For something that would cause the entire system to shut down or seriously impact the system, use FATAL. That means people need to look at it immediately. Examples include problems that cause startup failure, memory allocation failure, a message processing system failing to initialize its messaging layer, etc.
Hope the above helps.
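The NaN and primary-key examples above might look like this in code. This is only a sketch using java.util.logging, which has no FATAL level; treating FINE as DEBUG, WARNING as WARN and SEVERE as ERROR is an approximate mapping, and LevelExamples with its methods is a hypothetical class.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical class illustrating which level fits which situation.
final class LevelExamples {
    private static final Logger LOG = Logger.getLogger(LevelExamples.class.getName());

    // WARN-style case: NaN may or may not be a problem for the caller.
    static double ratio(double numerator, double denominator) {
        LOG.fine("computing ratio of " + numerator + " and " + denominator); // DEBUG
        double result = numerator / denominator;
        if (Double.isNaN(result)) {
            LOG.warning("ratio is NaN; the caller decides whether that is an error");
        }
        return result;
    }

    // ERROR-style case: the record cannot be stored and someone has to look at it.
    static boolean savePerson(String firstName, String lastName) {
        if (firstName == null || lastName == null) {
            LOG.log(Level.SEVERE, "cannot save person: primary key (firstName, lastName) is incomplete");
            return false;
        }
        // Hypothetical: the database write would go here.
        return true;
    }
}
```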

Should every catch block log the exception?

Some books mention that the following pattern is bad. They say that an exception which is rethrown should not also be logged, to avoid duplicate log entries for the same exception. Are there any other issues?
I am confused: if I don't log an exception when I rethrow it, will the problem still be recorded anywhere?
And if I do log it, I worry that too many log entries will be generated if everybody does the same.
catch (NoUserException e) {
LOG.error("No user available", e);
throw new UserServiceException("No user available", e);
}
the reference
http://today.java.net/pub/a/today/2006/04/06/exception-handling-antipatterns.html#logAndThrow

I'm not sure about the books you mentioned, but to me, as someone who'll have to debug the code and find the root cause of the bugs, I'd like to read about it later in the logs as close as possible to the place where it first triggered.

Every logging call has a switch to disable that log message, so you should log every exception that is unexpected. If you expect the exception, for example when you check whether a String is a number and use the exception to learn the result, then you do not need to log it.

As far as exceptions are concerned, the most important log message should be located in the service layer. The important thing is to keep the whole stack trace so the issue can be located easily even after several rethrows.
You can always put logs in all layers and adjust the logging level for certain layers to see only the logs from the layer you are currently debugging or working on. Other logs can be set to OFF. Read the documentation of your favorite logger to learn more about that.
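One way to avoid the duplicate entries while keeping the whole stack trace is to wrap the exception with its cause when rethrowing and to log only once at the outermost layer. A sketch under those assumptions follows; UserLookup, findUser, loadUser and handleRequest are hypothetical, and UserServiceException is made unchecked here only to keep the example short.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical service showing "wrap and rethrow, log once at the boundary".
final class UserLookup {
    private static final Logger LOG = Logger.getLogger(UserLookup.class.getName());

    static class NoUserException extends Exception {
        NoUserException(String message) { super(message); }
    }

    static class UserServiceException extends RuntimeException {
        UserServiceException(String message, Throwable cause) { super(message, cause); }
    }

    // Hypothetical low-level lookup that always fails, for illustration.
    static String findUser(String id) throws NoUserException {
        throw new NoUserException("no row for id " + id);
    }

    // Wraps and rethrows without logging; the cause keeps the original stack trace.
    static String loadUser(String id) {
        try {
            return findUser(id);
        } catch (NoUserException e) {
            throw new UserServiceException("No user available", e);
        }
    }

    // Outermost boundary: the single place where the failure is logged.
    static String handleRequest(String id) {
        try {
            return loadUser(id);
        } catch (UserServiceException e) {
            LOG.log(Level.SEVERE, "Request failed", e);
            return "error";
        }
    }
}
```

Because the logger receives the wrapper with its cause attached, the single log entry contains both stack traces.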

Sending an email when an Exception is Thrown

I have written a java class where if a method throws an exception, an email is sent, via java mail, with a report to the administrators.
It works - my question is w.r.t elegance - to catch the exception thrown by the main method, the sendEmail() method resides in the catch block of the main method. The sendEmail() method has its own try-catch block.
In effect - it looks like below - is there a more beautiful way of writing this?
try {
    foo();
} catch (Exception e) {
    try {
        sendEmail();
    } catch (Exception mailError) {
        log(mailError.getMessage());
    }
}

If you want something "more elegant", one simple suggestion is to have your sendEmail helper method catch and log the email exceptions. (I don't imagine you want the exceptions to propagate ... or do some other recovery ...)
However, there is something more important to say. What you are implementing here is the wrong approach to reporting errors:
If something goes badly wrong with your application, there is a chance that you will spam the administrator with multiple emails reporting the same problem over and over.
By sending emails from deep within your code, you make it hard for the administrator to integrate your application's error reporting.
A better approach is to report the problem via a Java logging framework such as Log4J. If the administrator wants to, he or she can configure some kind of monitoring system like LogWatch, Nagios, etc. Such a monitoring system will detect and classify errors and anomalies (like your application's errors) in the various logger streams, de-duplicate them and, if the administrator configures it to, send a notification via email, pager or whatever.

Java can have nested try / catch blocks.
If you'd like, you can move the try / catch sendmail block to another method. When the try / catch blocks are more complex, it will make the code easier to understand.
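Moving the inner try / catch into its own method could look like the sketch below. FailureNotifier, Mailer and trySendEmail are hypothetical names; the Mailer interface stands in for the actual JavaMail code so the example stays self-contained.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper that keeps the mail error handling out of the caller.
final class FailureNotifier {
    private static final Logger LOG = Logger.getLogger(FailureNotifier.class.getName());

    // Hypothetical stand-in for the code that actually sends mail.
    interface Mailer {
        void send(String subject, String body) throws Exception;
    }

    // Sends the report and logs, rather than propagates, any mail failure.
    static boolean trySendEmail(Mailer mailer, Throwable failure) {
        try {
            mailer.send("Application failure", String.valueOf(failure));
            return true;
        } catch (Exception mailError) {
            LOG.log(Level.WARNING, "Could not send failure email", mailError);
            return false;
        }
    }

    // The caller now needs only one try / catch.
    static boolean run(Runnable task, Mailer mailer) {
        try {
            task.run();
            return true;
        } catch (RuntimeException e) {
            trySendEmail(mailer, e);
            return false;
        }
    }
}
```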
