So basically the problem is described in the title.
The server works in the following way:
It listens for a new connection.
Once a connection is requested, it adds the request to a queue,
then continues listening for the next connection.
A separate process takes care of the queue and spawns a new thread to deal with each client's request (a minimal sketch of this pattern is shown below).
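Something along these lines, as a sketch only (the port number and class names are made up for illustration; this is not the actual server code):

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class QueuedSocketServer {
        private static final BlockingQueue<Socket> queue = new LinkedBlockingQueue<>();

        public static void main(String[] args) throws IOException {
            // Separate consumer that drains the queue and spawns a thread per client.
            Thread dispatcher = new Thread(() -> {
                while (true) {
                    try {
                        Socket client = queue.take();
                        new Thread(() -> handle(client)).start();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
            dispatcher.start();

            try (ServerSocket server = new ServerSocket(9000)) {
                while (true) {
                    queue.add(server.accept());   // enqueue and go straight back to accept()
                }
            }
        }

        private static void handle(Socket client) {
            try (Socket c = client) {
                // read the request, log to MySQL, write the response...
            } catch (IOException ignored) {
            }
        }
    }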
The server code is similar to this tutorial (everything is in try / catch; unfortunately I can't show the source code due to company policy).
It seems to work very well until the number of clients exceeds ~50; then it just hangs with no exceptions, warnings, etc. There is a CPU thread limit of 32k and no limits on the number of open files, open sockets, etc. The OS is CentOS 5.5 (the same seems to happen on Ubuntu, though). The server logs data to MySQL using ODBC. Separate stress tests of both showed that I can have up to 32k Java processes (limited by /proc/sys/kernel/threads-max) and that MySQL can perform up to 20k simple operations per second, so I'm assuming the problem is with the sockets.
So the question really is:
What is the limiting factor in socket connections and how can I make it bigger?
Or am I looking in the wrong place?
The chances are that you have induced a deadlock somewhere in the code. The key indicator here is whether by 'hang' you mean the CPU usage of the server drops to nothing and no further activity is seen in the server.
When the server hangs, run the JDK tool jstack against its process. This should show you what is waiting on which lock. Also in the toolkit is jvisualvm, and on a Unix box a simple kill -3 <pid> will do a thread dump to stderr.
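If you would rather check from inside the JVM itself (for example from a watchdog thread or an admin command in your server), a minimal sketch using the standard ThreadMXBean looks something like this; the class name is made up:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class DeadlockCheck {
        public static void main(String[] args) {
            ThreadMXBean bean = ManagementFactory.getThreadMXBean();
            long[] ids = bean.findDeadlockedThreads();   // null if no threads are deadlocked
            if (ids == null) {
                System.out.println("No deadlock detected");
                return;
            }
            for (ThreadInfo info : bean.getThreadInfo(ids, true, true)) {
                System.out.println(info);                // thread name, state, lock owner, stack
            }
        }
    }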
Without the code, or at least a reproducible sample, I'm afraid I can't help much more. One thing you might want to look at is using Jetty as your embedded server instead of a hand-rolled one; they have already been through the deadlock/threading pain so you don't have to.
I don't know if this will help you, and you may already be doing it, but try running your socket server with the Java switch -server, which selects the Java HotSpot Server VM. The -server switch turns on the optimizing JIT along with a few other "server-class" settings; generally you get the best performance out of this setting. The default VM is -client.
Also check your other parameters, so your socket server doesn't run with minimal resources.
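One way to confirm which VM and flags the server actually started with is to log them at startup. A small sketch (the class name is made up):

    import java.lang.management.ManagementFactory;
    import java.lang.management.RuntimeMXBean;

    public class VmInfo {
        public static void main(String[] args) {
            RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
            System.out.println("VM: " + runtime.getVmName());           // e.g. a "... Server VM"
            System.out.println("Args: " + runtime.getInputArguments()); // -Xmx, -Xms, GC flags, etc.
            System.out.println("Max heap: " + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
        }
    }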
Have a nice day
Related
I haven't found a clear answer on this one.
I have a client/server application in Java 7. The server and client are on separate computers. The client issues a short command (one line of 10 characters) to the server and the server responds with a 120-character string. This is repeated every X seconds, where X is the rate set in the configuration file; it could be anywhere from 1 second to Integer.MAX_VALUE seconds.
Every time I've created a client/server application, the philosophy has been: create the connection, do the business, close the connection, and then do whatever else with the data. This seems to be the way things should be done, especially when using try-with-resources.
What are the hiccups with leaving a socket connection open for X seconds? Is it really best practice to close down and reconnect, or is it better practice for the socket to remain connected and just send the command every X seconds?
I think the answer depends a bit on the number of clients you expect to have.
If you will never have very many client connections open, then I'd say leave the connection open and call it good, especially if latency is an issue; even on LANs, I've seen connections take several milliseconds to initialize. If you expect hundreds or thousands of clients to connect and do this, however, I would reconnect every time. As others have said, leaving sockets open usually means you have a thread left running for each one, which can take several megabytes of stack space per thread. Do this several thousand times and you will have a big problem on most machines.
Another issue is port space. Just because the TCP/IP stack gives us 65535 ports doesn't mean all are usable; in practice the OS restricts outgoing connections to its ephemeral port range and local firewalls may block others, so even if you had enough memory to run thousands of simultaneous threads, you could very well run out of ports if you leave a lot of connections open simultaneously.
IMHO the client should open, do its thing and then close.
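A minimal sketch of that connect / send / close cycle, assuming the 10-character command and 120-character reply from the question (host, port and character set are placeholders):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    public class PollingClient {
        // Hypothetical host/port; the question only says client and server are separate machines.
        public static String poll(String host, int port, String command) throws IOException {
            try (Socket socket = new Socket(host, port);
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII))) {
                out.println(command);        // the short command
                return in.readLine();        // the ~120-character response
            }                                // socket is closed here; every X seconds we reconnect
        }
    }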
On the server...
In UNIX one usually forks a process to answer each call; on Windows one typically creates a new thread for each inbound call.
I have a web application running across multiple locations.
I can see many connections piling up by running this command on Linux:
ps -ef|grep LOCAL
This shows me the active Oracle connections with their process IDs, and the connection count has been growing by 5-7 every hour. After a few hours the application slows down and eventually the Tomcat server needs to be restarted.
Since I can see the connections growing, is there any way to trace where they come from, i.e. which classes or objects created these lingering connections?
I am not using Tomcat connection pooling. I tried generating thread dumps by issuing kill -3 <tomcat pid>, but they were of no use to me as I am not able to understand them; I even tried thread analyzers.
Is there any simple way to find the originating classes associated with these lingering connections, either through some Tomcat feature or by any other means?
In JProfiler, you could use the JDBC probe to get the stack trace that opened a connection. You would select the connection in the timeline and jump to the events view, where you can select the "Connection opened" event. In the lower pane, the associated stack trace is shown.
Disclaimer: My company develops JProfiler
You could search for uses of javax.sql.DataSource.getConnection() using your IDE.
If you start tomcat in debug mode, you can look for instances of the connection class (and see them increasing). Also, putting a breakpoint on the constructor will catch them in the act of being created.
But really you should be using a connection pool. That is the easiest solution to your problems.
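As a sketch of what the pooled version might look like (the DAO class, table and column names are made up; the DataSource would come from whatever pool you configure, e.g. Tomcat's JDBC pool), the important part is the try-with-resources, which returns connections to the pool and prevents exactly the kind of leak you are seeing:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    public class CustomerDao {
        private final DataSource dataSource;   // injected pooled DataSource

        public CustomerDao(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        public String findName(long id) throws SQLException {
            // try-with-resources guarantees the connection goes back to the pool,
            // even if the query throws.
            try (Connection con = dataSource.getConnection();
                 PreparedStatement ps = con.prepareStatement("SELECT name FROM customer WHERE id = ?")) {
                ps.setLong(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString(1) : null;
                }
            }
        }
    }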
Perhaps these two tools can help you determine what slows your server application's performance:
jmeter
ab benchmarking tool
Performance might also have slowed due to some simple implementation issues. You might want to use NIO (buffer-oriented, non-blocking I/O) instead of classic blocking I/O for web applications; a sketch follows below. You might also be doing a lot of string concatenation (use StringBuilder, or StringBuffer if you need thread safety).
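For the NIO suggestion, a minimal non-blocking echo-style sketch (the port, buffer size and class name are arbitrary, and a real server would also handle partial writes):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    public class NioEchoServer {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(8080));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            ByteBuffer buffer = ByteBuffer.allocate(1024);
            while (true) {
                selector.select();                          // block until at least one channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buffer.clear();
                        int read = client.read(buffer);
                        if (read == -1) {                   // peer closed the connection
                            client.close();
                        } else {
                            buffer.flip();
                            client.write(buffer);           // echo back what was received
                        }
                    }
                }
            }
        }
    }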
I am using Cassandra in my Java application, connecting through a Thrift client. If the Cassandra disk gets full, Cassandra terminates automatically, so from my Java program I cannot tell that a full disk is the reason Cassandra is down.
How can I avoid the automatic termination of Cassandra, or is there any way to identify the disk-full error?
Also, I don't have physical access to the Cassandra drive; it is running on a remote machine.
Disk errors and, in general, hardware/system errors are not usually handled specially by applications. In such scenarios the database should simply provide as much durability as possible, and that is the correct behavior: shut down and break as little as possible.
As for your application: if you cannot connect to the database, it does not matter what caused the error; your app will not work anyway.
There are special tools that can monitor your machine, e.g. Nagios. If you are the administrator of that server, use such applications; when the disk fills up you will receive an email or a text. Use such tools rather than writing several hundred lines of code to handle a random and very rare situation.
Set up SSH access to the Cassandra machine and use an SSH client library like JSch to run df /cassandra/drive (on Linux) or fsutil volume diskfree c:\cassandra\drive (on Windows) from your Java client. Capture the output, which is simple to parse, to obtain the free disk space. That way your application can monitor what is happening on the server, alert the user, and refuse to add data if running out of disk space becomes a threat.
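A rough sketch of that JSch approach (host, credentials and the /var/lib/cassandra mount point are placeholders; df -P keeps each filesystem on one line, which makes parsing easy):

    import com.jcraft.jsch.ChannelExec;
    import com.jcraft.jsch.JSch;
    import com.jcraft.jsch.Session;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class CassandraDiskCheck {
        public static long freeKilobytes(String host, String user, String password) throws Exception {
            JSch jsch = new JSch();
            Session session = jsch.getSession(user, host, 22);
            session.setPassword(password);
            session.setConfig("StrictHostKeyChecking", "no");   // fine for a sketch, not for production
            session.connect();
            try {
                ChannelExec channel = (ChannelExec) session.openChannel("exec");
                channel.setCommand("df -P /var/lib/cassandra");
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(channel.getInputStream()))) {
                    channel.connect();
                    reader.readLine();                          // skip the header line
                    String[] cols = reader.readLine().trim().split("\\s+");
                    return Long.parseLong(cols[3]);             // "Available" column, in 1K blocks
                } finally {
                    channel.disconnect();
                }
            } finally {
                session.disconnect();
            }
        }
    }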
You can also use standard monitoring tools or set up a server-side script to send a message when disk space is low. However, this will not stop your application from crashing; you still need to take action once you see that disk space is running low.
Is there an efficient way to limit the bandwidth of a particular Java process?
I am familiar with solutions like trickle that limit the bandwidth of a process at runtime:
sudo trickle -s -d 1024 /path/to/app.sh
But dealing with Java processes is more of a challenge, because the application launches a JVM, or in some cases a wrapper service that launches a JVM, which means that solutions like trickle will not work.
I could try to limit the whole Java process with trickle (by wrapping or messing with the /usr/bin/java symlink), but that is ugly.
Does anyone know of a better solution for limiting the bandwidth of a java process (JVM)?
Thanks!
Unfortunately I don't think trickle can do it.
I had a similar issue and solved it by throttling bandwidth on a particular port. For example, if your application opens port 34567 to communicate, you can apply a firewall setting and throttle it down.
On a Mac I am using ipfw, for example:
sudo ipfw pipe 1 config bw 5KByte/s
sudo ipfw add 2 pipe 1 src-port 6666
On linux I am using "tc", examples & source: http://www.cyberciti.biz/faq/linux-traffic-shaping-using-tc-to-control-http-traffic/
Finally, you can create a bash script that monitors processes, picks the ones you need, and applies port throttling to them.
The question is not really clear.
Do you have control of the Java code? Otherwise, are you the system administrator?
If you control the Java code, you could use the Socket API and call setPerformancePreferences(int connectionTime, int latency, int bandwidth) on each socket; note that this only expresses relative preferences to the implementation, it is not a hard bandwidth limit (see the sketch below).
Otherwise, the ability to limit bandwidth depends on the OS and on how the Java applications are executed.
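A minimal sketch of using the setPerformancePreferences hint mentioned above, with the caveat that it only ranks connection time, latency and bandwidth relative to each other and may be ignored entirely by the implementation (host, port and class name are placeholders):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class LowBandwidthClient {
        public static Socket open(String host, int port) throws IOException {
            Socket socket = new Socket();                 // create unconnected so the hint can be set first
            // Relative priorities only: latency is valued over bandwidth and connection time here.
            // This is just a hint and does not enforce any hard bandwidth cap.
            socket.setPerformancePreferences(0, 2, 1);
            socket.connect(new InetSocketAddress(host, port));
            return socket;
        }
    }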
I have the following situation: using a "classical" Java server (based on ServerSocket), I would like to detect as rapidly as possible when the connection with the client has failed unexpectedly (i.e. non-gracefully, without a FIN packet).
The way I'm simulating this is as follows:
I'm running the server on a Linux box
I connect with telnet to the box
After the connection has succeeded, I add a DROP rule to the box's firewall
What happens is that the sending blocks after ~10k of data. I don't know for how long, but I've waited more than 10 minutes on several occasions. What I've researched so far:
Socket.setSoTimeout - however this affects only reads. If there are only writes, it doesn't have an effect
Checking for errors with PrintWriter.checkError(), since PW swallows the exceptions - however it never returns true
How could I detect this error condition, or at least configure the timeout value? (either at the JVM or at the OS level)
Update: after ~20min checkError returned true on the PrintWriter (using the server JVM 1.5 on a CentOS machine). Where is this timeout value configured?
The ~20 min timeout is because of standard TCP settings in Linux. It's really not a good idea to mess with them unless you know what you're doing. I had a similar project at work, where we were testing connection loss by disconnecting the network cable and things would just hang for a long time, exactly like you're seeing. We tried messing with the following TCP settings, which made the timeout quicker, but it caused side effects in other applications where connections would be broken when they shouldn't, due to small network delays when things got busy.
net.ipv4.tcp_retries2
net.ipv4.tcp_syn_retries
If you check the man page for TCP (man tcp) you can read about what these settings mean and maybe find other settings that might apply. You can either set them directly under /proc/sys/net/ipv4 or use sysctl.conf. These two were the ones we found made the send/recv fail quicker. Try setting them both to 1 and you'll see the send call fail a lot faster. Make sure to take note of the current settings before changing them.
I will reiterate that you really shouldn't mess with these settings. They can have side effects on the OS and other applications. The best solution is, as Kitson says, to use a heartbeat and/or an application-level timeout.
Also look into how to create a non-blocking socket, so that the send call won't block like that. Although keep in mind that sending with a non-blocking socket is usually successful as long as there's room in the send buffer. That's why it takes around 10k of data before it blocks, even though you broke the connection before that.
The only sure-fire way is to implement application-level checks instead of relying on the transport level: for example, a bi-directional heartbeat message where, if either end does not get the expected message, it closes and resets the connection.
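To make that concrete, a minimal server-side sketch of such a heartbeat, assuming a simple line-based PING/PONG exchange (the message names, the 5-second read timeout and the 1-second interval are arbitrary choices). It combines the application-level heartbeat with setSoTimeout, which does apply here because the handler is waiting on a read:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.Socket;

    public class HeartbeatHandler implements Runnable {
        private final Socket socket;

        public HeartbeatHandler(Socket socket) {
            this.socket = socket;
        }

        @Override
        public void run() {
            try (Socket s = socket;
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()))) {
                s.setSoTimeout(5_000);                 // no PONG within 5s => SocketTimeoutException
                while (true) {
                    out.println("PING");
                    String reply = in.readLine();      // blocks at most 5s because of the read timeout
                    if (!"PONG".equals(reply)) {
                        break;                         // null (EOF) or unexpected reply: drop the connection
                    }
                    Thread.sleep(1_000);               // heartbeat interval
                }
            } catch (IOException | InterruptedException e) {
                // timeout, reset, or interrupt: treat the peer as gone; try-with-resources closes the socket
            }
        }
    }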