Our project uses Business Objects for reports. Our Java webapps that launch reports go through a web service we set up to handle the business rules of how we want to launch them. Works great...with one wrinkle.
BO appears to be massively unreliable. The thing frequently goes down or fails to come up after a nightly timed restart. Our Ops team has sort of gotten used to this as a fact of life.
But the part that impacts me, on the Java team, is that our web service tries to log on to BO, and instead of timing out or erroring like it should, the BO Java library hangs forever. Evidently it is connecting to a half-started BO and never gives up.
Looking around the internet, it appears that others have experienced this, but nothing I've found suggests how to set a timeout on the logon process so that if it fails, the web service doesn't lock up forever (which in turn can cause our app server to become unstable).
The connection is pretty simple:
session = CrystalEnterprise.getSessionMgr().logon(boUserName, boPassword, boServerName, boSecurityType);
All I am looking for is some way to make sure that if BO is dead, my web service doesn't die with it. A timeout... a way to reliably detect whether BO is started and healthy before trying to log on... something. Our BO "experts" don't seem to think there is anything they can do about BO's instability, and they know even less about the Java library.
Ideas?
The Java SDK does not detail how to define a timeout when calling logon. I can only assume that this means it falls back on a default network connection timeout.
However, if a connection is made but the SDK doesn't receive the required information (and keeps waiting for an answer), a network timeout will never be reached as this is an application issue, not a network issue.
Therefore, the only thorough solution would be to deal with the instabilities in your BusinessObjects platform (for which you should create a separate question and describe the issue in more detail).
If this is not an option, an alternative could be to launch the connection attempt in a separate thread and implement a timeout yourself, killing the thread when the predefined timeout is reached and optionally retrying the connection attempt several times.
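For illustration, here is a rough sketch of that wrapper approach using an ExecutorService. The helper name and the timeout value are arbitrary, and the imports assume the usual com.crystaldecisions.sdk.framework classes from the BO Java SDK:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import com.crystaldecisions.sdk.framework.CrystalEnterprise;
import com.crystaldecisions.sdk.framework.IEnterpriseSession;

public class BoLogonWithTimeout {

    // Runs the logon in a worker thread and gives up after the given number of seconds.
    public static IEnterpriseSession logon(final String user, final String password,
            final String server, final String securityType, long timeoutSeconds) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<IEnterpriseSession> future = executor.submit(new Callable<IEnterpriseSession>() {
            public IEnterpriseSession call() throws Exception {
                return CrystalEnterprise.getSessionMgr().logon(user, password, server, securityType);
            }
        });
        try {
            return future.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            // Best effort only: the SDK call may ignore the interrupt and keep the worker thread alive.
            future.cancel(true);
            throw new Exception("BO logon did not complete within " + timeoutSeconds + " seconds", e);
        } finally {
            executor.shutdownNow();
        }
    }
}

The call from the question then becomes something like: session = BoLogonWithTimeout.logon(boUserName, boPassword, boServerName, boSecurityType, 30);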
Keep in mind though that while the initial logon might be successful, the instabilities described in your question could cause other issues (e.g. a different SDK call could remain hanging forever due to the same issue that caused your logon call to hang).
Again, the only good solution is to look at the root cause of your platform instabilities.
I have been fighting with this issue for ages and I cannot for the life of me figure out what the problem is. Let me set the stage for the stack we are using:
Web-based Java 8 application
GWT
Hibernate 4.3.11
MySQL
MongoDB
Spring
Tomcat 8 (including Tomcat connection pooling instead of, for example, C3P0)
Hibernate Search / Lucene
Terracotta and EhCache
The problem is that every couple of days (sometimes every second day, sometimes once every 10 days, it varies) in the early hours of the morning, our application "locks up". To clarify, it does not crash, you just cannot log in or do anything for that matter. All background tasks - everything - just halt. If we attempt to login when it is in this state, we can see in our log file that it is authenticating us as a valid user, but no response is ever sent so the application just "spins".
The only pattern we have found to date related to when these "lock ups" occur is that it happens when our morning scheduled tasks or SAP imports are running. It is not always the same process that is running, though; sometimes the lock up happens during one of our SAP imports and sometimes during internal scheduled task execution. All that these things have in common is that they run outside of business hours (between 1am and 6am) and that they are quite intensive processes.
We are using JavaMelody for monitoring, and what we see every time is that, starting at different times in this 1-6am window, the number of used JDBC connections just starts to spike (as per the attached image). Once that starts, it is just a matter of time before the lock up occurs, and the only way to solve it is to bounce Tomcat, thereby restarting the application.
As far as I can tell, memory, CPU, etc. are all fine when the lock up occurs; the only thing that looks like it has an issue is the constantly increasing number of used JDBC connections.
I have checked the code for our transaction management so many times to ensure that transactions are being closed off correctly (the transaction management code is quite old-fashioned: explicit begin and commit in the try block, rollback in the catch blocks and entity manager close in a finally block). It all seems correct to me, so I am really, really stumped. In addition, I have also recently explicitly set the Hibernate connection release mode to after_transaction, but the issue still occurs.
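For reference, the pattern looks roughly like this (emf and the work inside the try block are placeholders rather than the real code):

EntityManager em = emf.createEntityManager();      // emf: a placeholder EntityManagerFactory
EntityTransaction tx = em.getTransaction();
try {
    tx.begin();
    // ... persistence work ...
    tx.commit();
} catch (RuntimeException e) {
    if (tx.isActive()) {
        tx.rollback();
    }
    throw e;
} finally {
    em.close();                                     // should release the underlying JDBC connection
}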
The other weird thing is that we run several instances of the same application for different clients, and this issue only happens regularly for one client. They are our client with by far the most data to be processed, though, and although all clients run these scheduled tasks, this big client is the only one with SAP imports. That is why I originally thought that the SAP imports were the issue, but it locked up just after 1am this morning, which was a couple of hours before the imports even start running. In this case it locked up while an internal scheduled task was executing.
Does anyone have any idea what could be causing this strange behavior? I have looked into everything I can think of but to no avail.
After some time and a lot of trial and error, my team and I managed to sort out this issue. It turns out that the spike in JDBC connections was not the cause of the lock-ups but rather a consequence of them. Apache Terracotta was the culprit: it was simply becoming unresponsive, it seems. It might have been a resource allocation issue, but I don't think so, since this was happening on low-usage servers as well and they had more than enough resources available.
Fortunately we no longer needed Terracotta, so I removed it. As I said in the question, we were getting these lock-ups every couple of days - at least once per week, every week. Since removing it we have had no such lock-ups for 4 months and counting. So if anyone else experiences the same issue and you are using Terracotta, try dropping it and things might come right, as they did in my case.
As said by coladict, you need to look at the "Opened jdbc connections" page in the JavaMelody monitoring report, before the server "locks up".
Sorry if you need to do that at 2 or 3 in the morning, but perhaps you can run a wget command automatically during the night.
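For example, a small standalone sketch that saves a snapshot of the monitoring page every half hour (the URL, interval and file naming are placeholders; point it at whichever report page you need):

import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MelodySnapshot {
    public static void main(String[] args) {
        // Placeholder URL: point this at your application's JavaMelody report.
        final String url = "http://localhost:8080/myapp/monitoring";
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try (InputStream in = new URL(url).openStream()) {
                Path out = Paths.get("melody-" + System.currentTimeMillis() + ".html");
                Files.copy(in, out);                 // keep a timestamped copy to inspect in the morning
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 30, TimeUnit.MINUTES);
    }
}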
I have a simple Java server program that constantly checks a queue, and when a message arrives, it just processes it.
It's very basic, tiny and robust, and I basically monitor whether it's functioning properly by checking the logs and using some 'ping messages' that give me an OK answer.
I have to make some improvements to this program (add some new functionality), and I was wondering what would be the best approach to redo this kind of server, using some (free) server or framework and taking advantage of better monitoring tools. As it's very tiny, the code would be dead easy to port anywhere.
For example, I don't think a Tomcat is necessary for this usage, but would like to know if there are any interesting/recommended alternatives.
I've searched other threads on this topic, but the answers given don't mention any kind of server or framework, for example:
java- how to wait constantly for an event to happen
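For what it's worth, the plain-JDK version of such a loop is just a thread blocking on a queue; here is a minimal sketch (an in-JVM BlockingQueue stands in for whatever queue the program actually reads from):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueWorker implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<String>();

    public void submit(String message) {
        queue.offer(message);
    }

    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                String message = queue.take();   // blocks until a message arrives
                process(message);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // restore the flag and let the thread finish
        }
    }

    private void process(String message) {
        System.out.println("Processing: " + message);   // placeholder for the real handling
    }
}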
So I've been tracking a bug for a day or two now which happens out on a remote server that I have little control over. The ins and outs of my code are: I provide a jar file to our UI team; it wraps Postgres and provides storage for data that users import. The import process is very slow for multiple reasons, one of which is that the users are importing unpredictable, large amounts of data (which we can't really cut down on). This has led to a whole plethora of timeout issues.
After some preliminary investigation, I've narrowed it down to the JDBC connection to the Postgres database timing out. I had a lot of trouble replicating this on my local test setup, but I finally managed to by reducing the 'socketTimeout' of the connection properties to 10s (there's more than 10s between consecutive calls made on the connection).
My question now is: what is the best way to keep this alive? I've set 'tcpKeepAlive' to true, but this doesn't seem to have an effect; do I need to poll the connection manually or something? From what I've read, I'm assuming that polling is automatic and controlled by the OS. If this is true, I don't really have control of the OS settings in the run environment, so what would be the best way to handle this?
I was considering testing the connection each time it is used and, if it has timed out, just creating a new one. Would this be the correct course of action, or is there a better way to keep the connection alive? I've just taken a look at this post, where people suggest that you should open and close a connection per query:
When my app loses connection, how should I recover it?
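For what it's worth, the check-and-reconnect idea would look roughly like this (a sketch only; m_Url and m_Props are placeholders for however the connection is currently built):

// Returns a usable connection, replacing m_Connection if it has gone stale.
private Connection getHealthyConnection() throws SQLException {
    // isValid() sends a lightweight request to the server and waits at most the given number of seconds.
    if (m_Connection == null || m_Connection.isClosed() || !m_Connection.isValid(5)) {
        m_Connection = DriverManager.getConnection(m_Url, m_Props);
    }
    return m_Connection;
}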
In my situation, I have a series of sequential inserts which take place on a single thread, if a single one fails, they all fail. To achieve this I've used transactions:
m_Connection.setAutoCommit(false);                 // start an explicit transaction
m_TransactionSave = m_Connection.setSavepoint();   // savepoint to roll back to if any insert fails
// Do something
m_Connection.commit();
m_TransactionSave = null;
m_Connection.setAutoCommit(true);                  // back to auto-commit
If I do keep reconnecting, or use a connection pool like PGBouncer (like someone suggested in comments), how do I persist this transaction across them?
JDBC connections to Postgres can be configured with a keep-alive setting. An issue was raised against this functionality here: JDBC keep alive issue. Additionally, there's the parameter help page.
From the notes on that, you can add the following to your connection parameters for the JDBC connection:
tcpKeepAlive=true;
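In Java that can be set either as a URL parameter or via the Properties passed to the driver, for example (host, database and credentials below are placeholders):

Properties props = new Properties();
props.setProperty("user", "myuser");
props.setProperty("password", "mypassword");
props.setProperty("tcpKeepAlive", "true");      // ask the driver to enable TCP keep-alive on its socket
Connection conn = DriverManager.getConnection("jdbc:postgresql://dbhost:5432/mydb", props);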
Reducing the socketTimeout should make things worse, not better. The socketTimeout is a measure of how long a connection should wait when it expects data to arrive but none has come yet. Making it longer, not shorter, would be my instinct.
Is it possible that you are using PGBouncer? That process will actively kill connections from the server side if there is no activity.
Finally, if you are running on Linux, you can change the TCP keep alive settings with: keep alive settings. I am sure something similar exists for Windows.
I am building an android app that communicates with a server on a regular basis as long as the app is running.
I do this by initiating a connection to the server when the app starts, then I have a separate thread for receiving messages called ReceiverThread, this thread reads the message from the socket, analyzes it, and forwards it to the appropriate part of the application.
This thread runs in a loop, reading whatever it has to read and then blocking on the read() command until new data arrives, so it spends most of its time blocked.
I handle sending messages through a different thread, called SenderThread. What I am wondering about is: should I structure the SenderThread in a similar fashion? Meaning, should I maintain some form of queue for this thread, let it send all the messages in the queue and then block until new messages enter the queue, or should I just start a new instance of the thread every time a message needs to be sent, let it send the message and then "die"? I am leaning towards the first approach, but I do not know what is actually better, both in terms of performance (keeping a blocked thread in memory versus initializing new threads) and in terms of code correctness.
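To make the first approach concrete, I imagine something roughly like this (a simplified sketch; the stream handling and message format are placeholders):

import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SenderThread extends Thread {
    private final BlockingQueue<String> outbox = new LinkedBlockingQueue<String>();
    private final OutputStream out;                   // the socket's output stream

    public SenderThread(OutputStream out) {
        this.out = out;
    }

    public void send(String message) {
        outbox.offer(message);                        // callable from any thread, never blocks
    }

    @Override
    public void run() {
        try {
            while (!isInterrupted()) {
                String message = outbox.take();       // blocks until something is queued
                out.write(message.getBytes("UTF-8"));
                out.flush();
            }
        } catch (InterruptedException e) {
            // interrupted while waiting: just let the thread finish
        } catch (IOException e) {
            // connection problem: the rest of the app needs to be told to reconnect
        }
    }
}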
Also since all of my activities need to be able to send and receive messages I am holding a reference to both threads in my Application class, is that an acceptable approach or should I implement it differently?
One problem I have encountered with this is that sometimes if I close my application and run it again I actually have two instances of ReceiverThread, so I get some messages twice.
I am guessing that this is because my application did not actually close and the previous thread was still active (blocked on the read() operation), and when I opened the application again a new thread was initialized, but both were connected to the server so the server sent the message to both. Any tips on how to get around this problem, or on how to completely re-organize it so it will be correct?
I tried looking up these questions but found some conflicting examples for my first question, and nothing that is useful enough and applies to my second question...
1. Your approach is OK if you really need to keep an open connection between the server and client at all times, at all costs. However, I would use an asynchronous connection, like sending an HTTP request to the server and then getting a reply whenever the server feels like it.
If you need the server to reply to the client at some later time, but you don't know when, you could also look into the Google Cloud Messaging framework, which gives you a transparent and consistent way of sending small messages to your clients from your server.
You need to consider some things when you're developing a mobile application.
A smartphone doesn't have an endless amount of battery.
A smartphone's Internet connection is somewhat volatile, and you will lose connectivity at various times.
When you keep a direct connection to the server open all the time, your app keeps sending keep-alive packets, which means you'll suck the phone dry pretty fast.
When the Internet connection is as unstable as it gets on mobile broadband, you will lose the connection sometimes and need to recover from it. So if you use TCP because you want to make sure your packets are received, you end up resending the same packets a lot of times, which gives you a lot of overhead.
Also, you might run into threading problems on the server side if you open threads on the server on your own, which it sounds like you do. Let's say you have 200 clients connecting to the server at the same time, each with one thread open on the server. If the server needs to serve 200 different threads at the same time, this could end up being quite a performance-consuming task, and you will need to do a lot of work on your own as well.
2. When you exit your application, you'll need to clean up after yourself. This should be done in the onPause method of the Activity which is active.
This means killing off all active threads (or at least interrupting them), saving the state of your UI (if you need to) and flushing and closing whatever open connections to the server you have.
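As a rough illustration (receiverThread, senderThread and socket are placeholders for however these are actually held):

@Override
protected void onPause() {
    super.onPause();
    if (receiverThread != null) {
        receiverThread.interrupt();    // note: a thread blocked in read() usually only wakes up once the socket is closed
    }
    if (senderThread != null) {
        senderThread.interrupt();
    }
    try {
        if (socket != null) {
            socket.close();            // unblocks the read() and releases the server-side connection
        }
    } catch (IOException e) {
        // ignore: we are shutting down anyway
    }
}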
As far as using Threads goes, I would recommend using some of the built-in threading tools like Handlers, or implementing an AsyncTask.
If you really think Thread is the way to go, I would definitely recommend using a Singleton pattern as a "manager" for your threading.
This manager would control your threads, so you don't end up with more than one Thread talking to the server at any given time, even though you're in another part of the application.
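A bare-bones sketch of such a manager (how the threads and their connection are constructed is left out; the point is only the single, guarded entry point):

public final class ConnectionManager {
    private static final ConnectionManager INSTANCE = new ConnectionManager();

    private Thread receiverThread;
    private Thread senderThread;

    private ConnectionManager() { }

    public static ConnectionManager getInstance() {
        return INSTANCE;
    }

    // Starts the given threads only if none are already running, so a restarted
    // Activity cannot end up with a second ReceiverThread connected to the server.
    public synchronized void startIfNotRunning(Thread receiver, Thread sender) {
        if (receiverThread == null || !receiverThread.isAlive()) {
            receiverThread = receiver;
            receiverThread.start();
        }
        if (senderThread == null || !senderThread.isAlive()) {
            senderThread = sender;
            senderThread.start();
        }
    }

    public synchronized void shutdown() {
        if (receiverThread != null) receiverThread.interrupt();
        if (senderThread != null) senderThread.interrupt();
        receiverThread = null;
        senderThread = null;
    }
}

An Activity (or the Application subclass discussed below) would then call ConnectionManager.getInstance().startIfNotRunning(new ReceiverThread(...), new SenderThread(...)) instead of starting the threads itself.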
As far as the Application class implementation goes, take a look at the Application class documentation:
Base class for those who need to maintain global application state. You can provide your own implementation by specifying its name in your AndroidManifest.xml's <application> tag, which will cause that class to be instantiated for you when the process for your application/package is created.
There is normally no need to subclass Application. In most situations, static singletons can provide the same functionality in a more modular way.
So keeping away from implementing your own Application class is recommended. However, if you let one of your Activities initialize your own singleton class for managing the threads and connections, you might (just might) run into trouble, because the initialization of the singleton might "bind" to that specific Activity; if that Activity is removed from the screen and paused, it might be killed, and the singleton might be killed along with it. So initializing the singleton inside your Application implementation might prove useful.
Sorry for the wall of text, but your question is quite open-ended, so I've tried to give you a somewhat open-ended answer - hope it helps ;-)
I have the following situation: using a "classical" Java server (using ServerSocket), I would like to detect as rapidly as possible when the connection with the client fails unexpectedly (i.e. non-gracefully, without a FIN packet).
The way I'm simulating this is as follows:
I'm running the server on a Linux box
I connect with telnet to the box
After the connection has succeeded, I add a "DROP" rule to the box's firewall
What happens is that the sending blocks after ~10k of data. I don't know for how long, but I've waited more than 10 minutes on several occasions. What I've researched so far:
Socket.setSoTimeout - however this affects only reads. If there are only writes, it doesn't have an effect
Checking for errors with PrintWriter.checkError(), since PW swallows the exceptions - however it never returns true
How could I detect this error condition, or at least configure the timeout value? (either at the JVM or at the OS level)
Update: after ~20min checkError returned true on the PrintWriter (using the server JVM 1.5 on a CentOS machine). Where is this timeout value configured?
The ~20 min timeout is because of standard TCP settings in Linux. It's really not a good idea to mess with them unless you know what you're doing. I had a similar project at work, where we were testing connection loss by disconnecting the network cable and things would just hang for a long time, exactly like you're seeing. We tried messing with the following TCP settings, which made the timeout quicker, but it caused side effects in other applications where connections would be broken when they shouldn't, due to small network delays when things got busy.
net.ipv4.tcp_retries2
net.ipv4.tcp_syn_retries
If you check the man page for tcp (man tcp) you can read about what these settings mean and maybe find other settings that might apply. You can either set them directly under /proc/sys/net/ipv4 or use sysctl.conf. These two were the ones we found made send/recv fail more quickly. Try setting them both to 1 and you'll see the send call fail a lot faster. Make sure to take note of the current settings before changing them.
I will reiterate that you really shouldn't mess with these settings. They can have side effects on the OS and other applications. The best solution is, like Kitson says, to use a heartbeat and/or an application-level timeout.
Also look into how to create a non-blocking socket, so that the send call won't block like that. Although keep in mind that sending with a non-blocking socket is usually successful as long as there's room in the send buffer. That's why it takes around 10k of data before it blocks, even though you broke the connection before that.
The only sure fire way is to generate application level "checks" instead of relying on the transport level. For example, a bi-directional heartbeat message, where if either end does not get the expected message, it closes and resets the connection.
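A rough sketch of that idea (the interval, timeout and the "PING" message are arbitrary; each end would run something equivalent):

// Each peer sends a small heartbeat message at a fixed interval...
ScheduledExecutorService heartbeat = Executors.newSingleThreadScheduledExecutor();
heartbeat.scheduleAtFixedRate(new Runnable() {
    public void run() {
        try {
            OutputStream out = socket.getOutputStream();
            out.write("PING\n".getBytes("UTF-8"));
            out.flush();
        } catch (IOException e) {
            // the write itself failed: treat the connection as dead
        }
    }
}, 5, 5, TimeUnit.SECONDS);

// ...and the reading side treats prolonged silence as a dead connection.
socket.setSoTimeout(15000);                     // read() now throws SocketTimeoutException after 15 s of silence
try {
    int b = socket.getInputStream().read();     // data or a heartbeat byte under normal conditions
    // ... handle the data / heartbeat ...
} catch (SocketTimeoutException e) {
    socket.close();                             // nothing arrived in time: reset the connection
}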