I have a third-party component which tries to send too many UDP messages to too many separate addresses in a certain situation. This is a burst which happens when the software is started, and the situation is temporary. I'm actually not sure whether the problem is the sheer number of messages or the fact that each of them goes to a separate IP address.
Anyway, changing the underlying protocol or the problematic component is not an option, so I'm looking for a workaround. The stack trace looks like this:
java.io.IOException: No buffer space available
at java.net.PlainDatagramSocketImpl.send(Native Method)
at java.net.DatagramSocket.send(DatagramSocket.java:612)
This issue occurs (at least) with Java versions 1.6.0_13 and 1.6.0_10, on Ubuntu 9.04 and RHEL 4.6.
Are there any Java system properties or Linux configuration tweaks which might help?
I've finally determined what the issue is. The Java IOException is misleading: it says "No buffer space available", but the root cause is that the local ARP table has filled up. On Linux, the ARP cache is limited to 1024 entries by default; the garbage-collection thresholds are controlled by the files /proc/sys/net/ipv4/neigh/default/gc_thresh1, gc_thresh2, and gc_thresh3.
What was happening in my case (and, I assume, in yours) is that the Java code was sending out UDP packets from an IP address in the same subnet as the destination addresses. When this is the case, the Linux machine performs an ARP lookup for each destination to translate the IP address into a hardware MAC address. Since you are blasting out packets to many different IPs, the local ARP table fills up quickly, hits the 1024-entry limit, and that is when the Java exception is thrown.
The solution is simple: either increase the limit by raising the thresholds in the files mentioned above (gc_thresh3 is the hard limit; raise gc_thresh1 and gc_thresh2 along with it), or move your server into a different subnet than your destination addresses, which causes the Linux box to no longer perform neighbor ARP lookups (they are handled by a router on the network instead).
When sending lots of messages, especially over gigabit Ethernet on Linux, the stock kernel parameters are usually not optimal. You can increase the Linux kernel buffer sizes for networking through:
echo 1048576 > /proc/sys/net/core/wmem_max
echo 1048576 > /proc/sys/net/core/wmem_default
echo 1048576 > /proc/sys/net/core/rmem_max
echo 1048576 > /proc/sys/net/core/rmem_default
As root.
Or use sysctl:
sysctl -w net.core.rmem_max=8388608
There are tons of network options. See Linux Network Tuning by IBM and More tuning information.
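Note that raising the kernel limits only changes what the OS will allow; on the Java side you can also request a larger send buffer explicitly, and it is worth reading the value back, since the kernel silently caps the request at net.core.wmem_max. A minimal sketch (the 1 MB figure simply mirrors the values above):

import java.net.DatagramSocket;
import java.net.SocketException;

public class BufferSizeCheck {
    public static void main(String[] args) throws SocketException {
        DatagramSocket socket = new DatagramSocket();
        // Request a 1 MB send buffer; the kernel caps this at
        // net.core.wmem_max, so read it back to see what we actually got.
        socket.setSendBufferSize(1048576);
        System.out.println("Effective send buffer: " + socket.getSendBufferSize());
        socket.close();
    }
}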
It might be a bit complicated, but as far as I know, Java uses the SPI¹ pattern for the networking sub-library. This allows you to change the implementation used for various network operations. If you use OpenJDK, you can get some hints about how and what to wrap with your own implementation. Then, in your implementation, you can slow down the I/O with some sleeps, for example.
Or, just for fun, you could override the default DatagramSocket with your modified implementation. Give it the same package name and, as far as I know, it will take precedence over the default JRE class. At least this method worked for me with a buggy 3rd-party library.
Edit:
¹ Service Provider Interface is a way to separate client and service code within an API. This separation allows different client and different provider implementations. A provider can usually be recognized by a name ending in Impl, just like in your stack trace: java.net.PlainDatagramSocketImpl is the provider implementation, while DatagramSocket is the client-side API.
You commented that you don't want to slow down the communication permanently. There are several hacks to avoid that; for example, measure the elapsed time in your code, starting at the first incoming method call, and apply the slowdown only within the first 1-2 minutes. After that you can skip the sleep.
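Combining the two ideas, a replacement socket might look roughly like this (a sketch only; the class name, window length, and sleep duration are all invented and would need tuning):

import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketException;

// Hypothetical drop-in socket: throttles sends during the first
// two minutes after construction, then behaves normally.
public class ThrottledDatagramSocket extends DatagramSocket {
    private static final long THROTTLE_WINDOW_MS = 2 * 60 * 1000;
    private static final long SLEEP_PER_SEND_MS = 5;
    private final long start = System.currentTimeMillis();

    public ThrottledDatagramSocket() throws SocketException {
        super();
    }

    @Override
    public void send(DatagramPacket p) throws IOException {
        if (System.currentTimeMillis() - start < THROTTLE_WINDOW_MS) {
            try {
                Thread.sleep(SLEEP_PER_SEND_MS); // spread the startup burst out
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        super.send(p);
    }
}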
Another option would be to identify the misbehaving class in the library, decompile it with JAD, fix it, and then replace the original class file in the library.
I'm currently seeing this problem as well, on both Debian and RHEL. At this point I believe I've isolated it to the NIC and/or the NIC driver. What hardware configuration do you have that also exhibits this problem? For us it seems to occur only on new Dell PowerEdge servers we recently acquired, which have Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet NICs.
I too can confirm that the cause is the rapid generation of outbound UDP packets to many different IP addresses in a short window. I've attempted to write a simple Java application that can reproduce it (since ours occurs with snmp4j).
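The kind of reproducer I have in mind looks roughly like this (the subnet and port are placeholders; the machine has to sit in the same subnet as the targets for the neighbor ARP lookups to happen):

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

// Blasts one small UDP packet at each of ~4000 distinct addresses,
// which should overflow a default 1024-entry ARP cache when run
// from a machine inside that subnet.
public class ArpOverflowRepro {
    public static void main(String[] args) throws Exception {
        DatagramSocket socket = new DatagramSocket();
        byte[] payload = new byte[] { 0x00 };
        for (int a = 1; a <= 16; a++) {
            for (int b = 1; b <= 250; b++) {
                InetAddress target = InetAddress.getByName("10.0." + a + "." + b);
                socket.send(new DatagramPacket(payload, payload.length, target, 9999));
            }
        }
        socket.close();
    }
}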
EDIT
Look at my answer here: Java IOException: No buffer space available while sending UDP packets on Linux
I got this error when I tried to run a Coherence cluster in two local JVMs using a Wi-Fi connection to the database.
If I run it over Ethernet, it works fine.
Related
I am writing a simple p2p application using Java 7 and tomp2p. The problem is that peers need to bootstrap to other peers in the same network, and for that to work, the ports have to be set correctly and the broadcast messages need to be sent and received properly.
I would like to know the best setup for testing the application (or any distributed application) using a single machine (since I do not always have multiple machines to experiment with).
First, I simply tried running two instances of my application in different terminals (and this worked), but as soon as I tested it on a real network with two machines, the peers were no longer able to find each other.
Therefore, I am now running Ubuntu 12.04 as the host OS and a VirtualBox VM with a Fedora 17 image. However, for my application to work, the host and the VM need to appear as if they were on the same network, and somehow I could not figure out the right setup for this to work (due to some NAT issues).
Does anybody have experience with testing a distributed application on a single system, and can give me some hints about the setup and the virtual machines used?
Thanks in advance,
r0f1
I want to create Java network servers which share one IP address, something like the Piranha cluster.
Is there any solution similar to this?
P.S. They have to work as a cluster: if one server is down, the second one should handle the traffic.
Well, the obvious solution would be to build your Java servers behind the Piranha layer; i.e. implement the application services on "real server 1", "real server 2", and so on, in Java.
I'm pretty sure that you can't implement a Piranha-like solution in (pure) Java. The IP-level load balancing is implemented in the network stack in the OS kernel (I think) of the "director". That rules out (pure) Java for two reasons:
It is impractical to put Java code inside the kernel.
To do it in user space in Java would entail using native code to read and write raw network packets. That is not possible in pure Java.
Besides, the chances are that you'd get better network throughput if the director layer was not implemented in Java, pure or otherwise.
Of course, there are other ways to do load balancing as well ...
Just create your standalone TCP/IP servers to listen on different ports (and of course the IP address would be the same, as that is your requirement).
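A minimal sketch of that arrangement (the address and ports are placeholders):

import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Two listeners sharing one IP address on different ports.
public class TwoServers {
    public static void main(String[] args) throws IOException {
        InetAddress shared = InetAddress.getByName("192.168.1.10");
        ServerSocket serverA = new ServerSocket(8080, 50, shared);
        ServerSocket serverB = new ServerSocket(8081, 50, shared);
        System.out.println("Listening on " + serverA.getLocalSocketAddress()
                + " and " + serverB.getLocalSocketAddress());
        // accept() loops would follow, typically one thread per listener
    }
}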
I am reading here that
On connect, the JVM (Java Virtual Machine) tries to resolve the hostname to IP/port. Windows tries a NetBIOS NS query on UDP (User Datagram Protocol) port 137 with a timeout of 1.5 seconds, ignores any ICMP (Internet Control Message Protocol) port-unreachable packets and repeats this two more times, adding up to a value of 4.5 seconds. I suggest putting critical hostnames in your HOSTS file to make sure they are resolved quickly. Another possibility is turning off NetBIOS altogether and running pure TCP/IP on your LAN (Local Area Network).
Is this still an issue nowadays? I am asking because I am working on a heartbeat sensor and was curious.
Your citation is not a normative reference, just another hobby site, and in this case it is dead wrong. None of this has anything to do with setSoTimeout(). The author is totally confused between name-resolution time, connect time, and read time. setSoTimeout() sets a read timeout, and is unaffected by the shenanigans he describes, accurate or otherwise, which wouldn't even happen at connect time as he states: they would happen at name-resolution time.
It's far from the only confusion to be found on that site, or even on that page, let me assure you. I told the author about several errors on this page ten years ago, and about quite a lot of others, all of which remain uncorrected to this day, which gives you an idea of the site's accuracy, currency, and content-review mechanisms. His only response was to add a rude remark about me. Unconvincing as a peer-review mechanism.
Stick to authoritative sources.
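To make the distinction between those timeouts concrete, here is a small sketch (the host, port, and timeout values are placeholders):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Shows which timeout governs which phase of a TCP client.
public class TimeoutPhases {
    public static void main(String[] args) throws IOException {
        Socket socket = new Socket();
        // Name resolution happens here, inside InetSocketAddress,
        // and is not covered by either timeout below.
        InetSocketAddress addr = new InetSocketAddress("example.com", 80);
        // Connect timeout: bounds the TCP handshake only.
        socket.connect(addr, 3000);
        // Read timeout: bounds each subsequent read() on the stream.
        socket.setSoTimeout(5000);
        socket.close();
    }
}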
I have seen applications that can detect adjacent networks and the desktops and devices attached to them. They can also discover the computer/device names within 30 seconds.
Should I try running the ping and net view commands via Runtime.exec(), since I find them fast?
How can I capture the output of these commands?
I tried sockets, but they are time-consuming. Their only advantage is that I can also tell that the remote machine has my application installed (the one in which I created the socket, enabling this communication).
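For what it's worth, I understand that capturing a command's output would look something like this (a sketch; the ping arguments are just an example, and -c 1 is the Linux form, -n 1 on Windows):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Runs a command and captures its stdout line by line.
public class CommandOutput {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("ping", "-c", "1", "192.168.1.1");
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process process = pb.start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        process.waitFor();
    }
}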
Regards
Timeouts in the initialization of a Socket are useful, but you cannot get each connection to complete in less than about 300 milliseconds. On the server side there is also a timeout implementation. In both cases the communication is one-sided. Multi-threading will help.
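To make "multi-threading will help" concrete: probing a /24 serially at 300 ms per host takes well over a minute, while a parallel sweep with InetAddress.isReachable() finishes in seconds. A sketch (the subnet, pool size, and timeout are placeholders):

import java.net.InetAddress;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Probes 254 addresses in parallel instead of one at a time.
public class ParallelSweep {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(64);
        for (int i = 1; i <= 254; i++) {
            final String host = "192.168.1." + i;
            pool.execute(() -> {
                try {
                    if (InetAddress.getByName(host).isReachable(300)) {
                        System.out.println(host + " is up");
                    }
                } catch (Exception e) {
                    // unreachable or unresolvable; skip it
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}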
So basically the problem is described in the title.
The server works in the following way:
Listens for a new connection
Once a connection is requested, adds the request to the queue
Continues listening for new connections
A separate process takes care of the queue and spawns a new thread to deal with each client's request.
The server code is similar to this tutorial (everything is in try/catch; unfortunately I can't show the source code - company policy), but the overall structure is roughly like the sketch below.
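A sketch of that structure, assuming a fixed worker pool (the port and pool size are placeholders; the pool's internal queue plays the role of the request queue):

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// The accept loop hands each client off to a worker pool, so the
// listener itself never blocks on client I/O.
public class QueuedServer {
    public static void main(String[] args) throws IOException {
        ExecutorService workers = Executors.newFixedThreadPool(100);
        ServerSocket listener = new ServerSocket(9090);
        while (true) {
            final Socket client = listener.accept(); // blocks until next connection
            workers.execute(() -> {
                try (Socket c = client) {
                    // read the request, talk to MySQL, write the response
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
        }
    }
}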
It seems to work very well until the number of clients exceeds ~50; then it just hangs, with no exceptions/warnings/etc. There is a CPU thread limit of 32k and there are no limits on the number of open files/open sockets/etc. The OS is CentOS 5.5 (the same seems to happen on Ubuntu, though). The server logs data to MySQL using ODBC. Separate stress tests of both showed that I can have up to 32k Java threads (limited by /proc/sys/kernel/threads-max) and that MySQL can perform up to 20k simple operations per second, so I'm assuming the problem is with the sockets.
So the question really is:
What is the limiting factor for socket connections, and how can I increase it?
OR am I looking in the wrong place?
The chances are that you have introduced a deadlock somewhere in the code. The key indicator here is whether by 'hang' you mean that the CPU usage of the server drops to nothing and no further activity is seen in the server.
When the server hangs, run the JDK tool jstack against its process. This should show you what is waiting on which lock. Also in the toolkit is jvisualvm, and on a Unix box a simple kill -3 pid will do a thread dump to stderr.
Without the code, or at least a reproducible sample, I'm afraid I can't help much more. One thing you might want to look at is using Jetty as your embedded server instead of a hand-rolled one; they have already been through the deadlock/threading pain so you don't have to.
I don't know if this will help you, or whether you are already using it, but try running your socket server with the Java switch -server, which selects the Java HotSpot Server VM. The -server flag turns on the optimizing JIT along with a few other "server-class" settings. Generally you get the best performance out of this setting. The default VM is -client.
Also check your other parameters, so that your socket server doesn't run with minimal resources.
Have a nice day