Blackboard server - issue with restarting - java

SEVERE: The web application [/webapps/bb-nautilus-BBLEARN] appears to have started a thread named [MessageQueueHandler-bb-nautilus-content-blitz-0] but has failed to stop it. This is very likely to create a memory leak.

If an application starts all kinds of stuff (registering jdbc drivers, starting threads, ...) when it is fired up, it is the responsibility of that application to also clean up after itself when it is stopped.
Are you the author of this application? Correct your code. Not the author of this application? Submit a bug report.
In the latter case, until the bug is addressed it might be possible to add a ServletContextListener of your own making to the deployment. But clearing up leftover Threads from "foreign" code is at any rate going to require you to figure out how to find those Thread objects and then subsequently stop() them, which is a deprecated method.
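For the case where you are the author, a minimal sketch of "cleaning up after itself" in plain Java might look like the following. The class name BackgroundWorker and the thread name are hypothetical. The assumption is that start() and stop() would be called from a ServletContextListener's contextInitialized() and contextDestroyed() methods. It only covers threads the application owns and does nothing for threads started by "foreign" code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical component that owns one background thread and can stop it again. */
class BackgroundWorker {
    // Single worker thread with a recognisable (made-up) name.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(
            r -> new Thread(r, "example-queue-handler"));

    /** Intended to be called on application startup: begins the polling loop. */
    public void start() {
        executor.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    TimeUnit.MILLISECONDS.sleep(50); // stand-in for real work
                } catch (InterruptedException e) {
                    // Restore the interrupt flag so the loop condition ends the task.
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    /**
     * Intended to be called on application shutdown: interrupts the loop and
     * waits for the thread to finish. Returns false if it did not stop in time.
     */
    public boolean stop(long timeout, TimeUnit unit) throws InterruptedException {
        executor.shutdownNow(); // interrupts the running task
        return executor.awaitTermination(timeout, unit);
    }
}
```

The loop exits cooperatively through interruption, so nothing here relies on the deprecated Thread.stop().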

MessageQueue may be busy doing something strange. It refuses to exit. So, reboot the server and try starting Bb Learn after that. Post the new error after you know that no part of Bb Learn was running after a failed app restart.

Related

Whenever my program crashes in eclipse it stays running in the background

It is really frustrating especially when I am working with sockets. Anyone know how to fix this? I constantly go into the task manager...
I think the most likely reason for this is a thread which does not terminate. This might be caused by the thread waiting for a time out, but a number of other reasons might prevent the thread from exiting as well.
I suggest you connect jvisualvm (part of the JDK, located in the bin folder) to your application and investigate what part of your application stays alive.
Edit: If your application runs in your system's default VM, you should see it in jvisualvm out of the box. But if you are using different VMs, you have to start the application with appropriate parameters in order to connect jvisualvm to it.
This short guide explains the settings pretty well.
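As a lightweight complement to jvisualvm, the application can also report its own live non-daemon threads, since those are the ones that keep a JVM running after main() returns. This is only a sketch; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that reports which threads can keep the JVM alive. */
class LiveThreads {
    /** Returns the names of all live non-daemon threads in this JVM. */
    static List<String> nonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        // getAllStackTraces() returns a snapshot keyed by every live thread.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon()) {
                names.add(t.getName());
            }
        }
        return names;
    }
}
```

Printing LiveThreads.nonDaemonThreadNames() from a shutdown path or an exception handler should show whether, for example, a socket reader thread is still running.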

Why does net.exe start <servicename> report a failure when the service starts?

I have a Java application that uses the Apache Daemon service installer to register it as a Windows service. I am using Puppet to run an exec{} block to register the service, which works, and then chains a service{} block to start the service. Puppet uses "net.exe start" to run the service, but that command reports an error, even though the service starts correctly.
The output from running the command in a PowerShell session is:
PS C:\ProgramData\PuppetLabs\puppet\etc\modules> net start myservice
The myservice_descriptive_name service is starting.....
The myservice_descriptive_name service could not be started.
More help is available by typing NET HELPMSG 3523.
As I refresh the Windows service panel while this command is running, I see the state change from:
blank field -> starting -> started
Is this a problem caused by the apache wrapper, which is starting a jvm in a separate shell or some other side effect? And, more importantly, can I get around this problem in Puppet while still using the service{} block? Is it possible to substitute sc.exe, which does not suffer the same problem, short of using an exec{} block?
To take the questions in order:
The net start command reports failure because the service appears to have hung.
Yes, the problem is caused by the Apache wrapper.
Specifically, the wrapper is telling Windows that it will reach the first checkpoint within two seconds. Since there does not appear to be any way for the Java code to implement a checkpoint, or to change the wait hint, this means that the service must start within two seconds to be compliant with the Windows service specification.
(In principle, Windows is entitled to terminate your service at this point. So far as I know, no current versions of Windows do so, though they may log error messages.)
Short of modifying Puppet or (preferably) the Apache wrapper, the only obvious workaround is to ensure that your service "starts" immediately, rather than waiting for initialization to complete.
This is less than ideal, since it means that the service can't provide feedback to Puppet if it really does fail to initialize, but no worse than your suggestion of using sc start instead of net start.
JPBlanc's answer explains why net.exe times out waiting on the service to start, even though it does end up starting. You can definitely try swapping out net.exe calls for sc.exe (Service Control) instead.
I've created a ticket to address this - https://tickets.puppetlabs.com/browse/PUP-5475
If you find that sc.exe doesn't also time out while waiting, please comment and/or file a pull request containing the change. At any rate, using something better than net.exe would be preferred.
The explanation is that the service takes too much time to start and does not communicate correctly with the starter.
When you write a service that initiates communications or DB connections, you have to communicate with the Service Control Manager (SCM) to tell it that you are still starting. As long as the service sends this kind of "I am still starting" message, the SCM can wait as much time as you need to start. But many service writers, and many tools that wrap exe files as services, ignore that, so the SCM returns "service could not be started". In Win32 this is handled by the SetServiceStatus function; you will find more details in its documentation.

Crystal Report java library, logon hangs forever

Our project uses Business Objects for reports. Our java webapps that launch reports go through a web service we set up to handle the business rules of how we want to launch them. Works great...with one wrinkle.
BO appears to be massively unreliable. The thing frequently goes down or fails to come up after a nightly timed restart. Our Ops team has sort of gotten used to this as a fact of life.
But the part of that which impacts me, on the java team, is that our webservice tries to log on to BO, and instead of timing out or erroring like it should, the BO java library hangs forever. Evidently it is connecting to a half-started BO, and never gives up.
Looking around the internet, it appears that others have experienced this, but none of the things I see suggests how to set a timeout on the logon process so that if it fails, the web service doesn't lock up forever (which in turn can cause our app server to become unstable).
The connection is pretty simple:
session = CrystalEnterprise.getSessionMgr().logon(boUserName, boPassword, boServerName, boSecurityType);
All I am looking for is some way to make sure that if BO is dead, my webservice doesn't die with it. A timeout...a way to reliably detect if BO is not started and healthy before trying to logon....something. Our BO "experts" don't seem to think there is anything they can do about BO's instability and they know even less about the java library.
Ideas?
The Java SDK does not detail how to define a timeout when calling logon. I can only assume that this means it falls back on a default network connection timeout.
However, if a connection is made but the SDK doesn't receive the required information (and keeps waiting for an answer), a network timeout will never be reached as this is an application issue, not a network issue.
Therefore, the only thorough solution would be to deal with the instabilities in your BusinessObjects platform (for which you should create a separate question and describe the issue in more detail).
If this is not an option, an alternative could be to launch the connection attempt in a separate thread and implement a timeout yourself, killing the thread when the predefined timeout is reached and optionally retrying the connection attempt several times.
Keep in mind though that while the initial logon might be successful, the instabilities described in your question could cause other issues (e.g. a different SDK call could remain hanging forever due to the same issue that caused your logon call to hang).
Again, the only good solution is to look at the root cause of your platform instabilities.
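The separate-thread timeout suggested above could be sketched roughly as follows, using only java.util.concurrent. The TimedCall class and callWithTimeout method are hypothetical names, not part of the BusinessObjects SDK. Note that Java cannot safely kill a thread: cancel(true) only interrupts it, so a logon call that ignores interruption will keep occupying its (daemon) thread even though the caller has moved on.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper that bounds how long the caller waits for a blocking call. */
class TimedCall {
    // Daemon threads, so a call that hangs forever cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-call");
        t.setDaemon(true);
        return t;
    });

    /**
     * Runs the call on a pool thread and waits at most the given time for it.
     * Throws TimeoutException if no result arrived in time.
     */
    static <T> T callWithTimeout(Callable<T> call, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = POOL.submit(call);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Best effort only: this interrupts the worker, it does not kill it.
            future.cancel(true);
            throw e;
        }
    }
}
```

With this helper, the logon from the question might be wrapped as TimedCall.callWithTimeout(() -> CrystalEnterprise.getSessionMgr().logon(boUserName, boPassword, boServerName, boSecurityType), 30, TimeUnit.SECONDS), where the 30 seconds is an arbitrary assumption. Because every hung logon leaves a stuck thread behind, retries should be limited.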

How to implement a java daemon program in weblogic?

I have the task to port a standalone java daemon program to J2EE on weblogic.
Old: The java program starts two threads which loop endlessly based on an interval that can be configured via a properties file.
New: The program should run on weblogic 10.1.x and start when the managed server it will be deployed to is started or the servlet is initialized, and it shouldn't have to be invoked by a client.
I know already that creating your own threads is highly discouraged for weblogic, so I'll search for another way to make this happen. I already tried via a startup class, but that means the server remains in the state STARTING forever, because naturally the program is designed to run forever; I didn't know the server is actually waiting for the Startup Class to end. The next best thing I know of would be the usual servlet: call its URL once and start the program from it. Even then, how would you prevent the browser from getting hung up on the servlet call (because it does run forever) without making the program logic asynchronous by creating a thread? Also, I read something about Listeners; would that be the thing I should be looking for?
One last thing: I definitely need to run it on weblogic, so suggestions for other solutions wouldn't help me.
This is a confusing question because it's so basic... You just need to create a web service with your endless loops in it. You don't need to hit a URL to start it. Just deploy a .war or .ear file with your code and you're done.
http://docs.oracle.com/cd/E13222_01/wls/docs81/webserv/example.html

Memory Leak in multiple apps

I have a memory leak in two apps in Tomcat 6.0.35 server that appeared "out of nowhere". One app is Solr and the other is our own software. I'm hoping someone has seen this before as it's been happening to me for the last few weeks and I have to keep restarting Tomcat in a production environment.
It appeared on our original server despite the fact that none of the code related to thread or DB connection operation has been touched. As the old server this app runs on was due to be retired I migrated the site to a new server and a "cleaner" environment with the idea that would clear out any legacy stuff. But it continues to happen.
Just before Tomcat shuts down the catalina.out log is filled with errors like:
2012-04-25 21:46:00,300 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/AppName] appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This is very likely to create a memory leak.
2012-04-25 21:46:00,339 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/AppName] appears to have started a thread named [com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2] but has failed to stop it. This is very likely to create a memory leak.
2012-04-25 21:46:00,470 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/AppName] is still processing a request that has yet to finish. This is very likely to create a memory leak. You can control the time allowed for requests to finish by using the unloadDelay attribute of the standard Context implementation.
During that migration we went from Solr 1.4->Solr 3.6 in an attempt to fix the problem. When the errors above start filling the log, the Solr error below follows right behind, repeated 10-15 times, and then Tomcat stops working and I have to shut it down and start it up again to get it to respond.
2012-04-25 21:46:00,527 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/solr] created a ThreadLocal with key of type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value [org.apache.solr.schema.DateField$ThreadLocalDateFormat#1f1e90ac]) and a value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat#6b2ed43a]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
My research has brought up a lot of suggestions about changing the code that manages threads to make sure they kill off DB pooled connections etc., but this code has not been changed in nearly 12 months. Also, the Solr application is crashing and that's 3rd party, so my thinking is that this is environmental (jar conflict, versioning, config fat fingered?).
My last change was updating the mysql connector for java to the latest as some memory leak bugs existed around pooling in earlier releases but the server's just crashed again only a few hours later.
One thing I just noticed is I'm seeing thousands of sessions in the Tomcat web manager but that could be a red herring.
If anyone has seen this any help is very much appreciated.
[Edit]
I think I found the source of the problem. It wasn't a memory leak after all. I've taken over an application from another development team that uses c3p0 for database pooling via Hibernate. c3p0 has a bug/feature that if you don't release DB connections c3p0 can go into a waiting state once all the connections (via MaxPoolSize: default is 15) are used. It will wait indefinitely for a connection to become available. Hence my stall.
I upped the MaxPoolSize firstly from 25->100 and my application ran for several days without a hang and then from 100->1000 and it's been running steady ever since (over 2 weeks).
This isn't the complete solution, as I still need to find out why it's running out of pooled connections. In the meantime I also set c3p0's unreturnedConnectionTimeout to 4hrs, which enforces a 4hr time limit on all connections regardless of whether they're active or not. If it's an active connection it will close it and re-open again.
Not pretty, and c3p0 don't recommend it, but it gives me some breathing space to find out the source of the problem.
Note: when using c3p0 with Hibernate the settings are stored in your persistence.xml file, but not all settings can be put there. Some settings (e.g. unreturnedConnectionTimeout) must go in c3p0.properties.
You state that the sequence of events is:
errors appear
Tomcat stops responding
restart is required
However, the memory leak error messages only get reported when the web application is stopped. Therefore, something is triggering the web applications to stop (or reload). You need to figure out what is triggering this and stop it.
Regarding the actual leaks, you may find this useful:
http://people.apache.org/~markt/presentations/2010-11-04-Memory-Leaks-60mins.pdf
It looks like both your app and Solr have some leaks that need to be fixed. The presentation will provide you with some pointers. I would also consider an upgrade to the latest Tomcat 7.0.x. The memory leak detection has been improved and not all improvements have made it into 6.0.x yet.
