I'm running experiments on my machines A and B, both with Ubuntu Server 11.04 installed. A and B are connected to the same 1000 Mbps switch.
A is the sender:
while (total <= 10000)
    send_udp_datagramPacket(new byte[100]) to B
B is the receiver:
while (true)
    receive()
But finally I got less than 10,000 (about 9960) at B. Why is this happening?
Where did the lost packets go? Were they never actually sent onto the wire to the switch? Did the switch lose them? Did they actually reach B, but B's OS discarded them? Or did they reach Java, but Java threw them away because of a full buffer?
Any reply would be appreciated.
Remember, UDP does not provide reliable communication; it is intended for situations in which data loss is acceptable (streaming media, for instance). Chances are good that this is a buffer overflow (my guess; don't rely on it), but the point is that if this data loss is not acceptable, use TCP instead.
If this is just for the sake of experimentation, try adding a delay (Thread.sleep()) in the loop and increase it until you get the maximum number of received packets.
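For example, a minimal sender along those lines might look like this (the class name, address, port and the 1 ms delay are all placeholders to tune, not a recommendation):

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class UdpSender {
    public static void main(String[] args) throws Exception {
        DatagramSocket socket = new DatagramSocket();
        InetAddress receiver = InetAddress.getByName("192.168.0.2"); // placeholder for B
        byte[] payload = new byte[100];
        for (int total = 0; total < 10000; total++) {
            socket.send(new DatagramPacket(payload, payload.length, receiver, 9000));
            Thread.sleep(1); // increase until the received count stops improving
        }
        socket.close();
    }
}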
EDIT:
As mentioned in a comment, the sleep() is NOT a fix and you WILL eventually lose packets... that's just UDP.
But finally I got less than 10,000 (about 9960) at B. Why is this happening?
UDP is a lossy protocol. Even if you got all 10,000 in this test, you would still have to code for the possibility that some packets will be lost. They can also be fragmented (if larger than 532 bytes) and/or arrive out of order.
Where did the lost packets go?
They were dropped.
Were they never actually sent onto the wire to the switch?
They can be dropped just about anywhere. I don't believe Java has any logic for dropping packets (but this too is not guaranteed in all implementations). They could be dropped by the OS or the network adapter, corrupted on the wire, or dropped by the switch.
Did the switch lose them?
It will do this if a packet arrives corrupted in some way or one of its buffers fills up.
Did they actually reach B, but B's OS discarded them?
Yes, or A's OS could have discarded them.
Or did they reach Java, but Java threw them away because of a full buffer?
Java doesn't have its own buffers. It uses the underlying buffers from the OS. But the packets could be lost at this stage.
Note: No matter how much you decrease the packet loss, you must always allow for some loss.
Why does this Java programme cause UDP packet loss?
The question is ill-formed. Neither Java nor your program causes UDP packet loss. UDP causes UDP packet loss. There is no guarantee that any UDP packet will ever arrive. See RFC 768.
Related
I have scoured the interwebz with no result. We are facing a problem where some Android devices experience severe packet loss. To give some background, the application connects to a specific Wifi and looks for UDP packets broadcast on port 17216. These packets are of size 832 bytes, excluding the wrapped headers, and are sent at a regular rate of four per second.
We have only met the problem on two devices, a low-end Turbox Rubik II tablet and an ASUS Memo Pad HD 7. The other devices we've tested (phones and tablets) all gather the packets at the stipulated regular interval.
The function that receives the packets is this:
public void run()
{
    while (isUDPServerRunning)
    {
        try
        {
            // Blocks until a datagram arrives and fills 'packet' in place
            socket.receive(packet);
            ProcessRawPacketData();
            DisplayLoggingInfo();
        }
        catch (IOException e)
        {
            Log.e("receive", e.getMessage());
            e.printStackTrace();
        }
    }
}
And that is part of a Runnable. The socket is created thus:
byte[] buffer = new byte[1024];
DatagramSocket socket;
DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
with the socket being initialized in the onCreate() method of our Service extension:
socket = new DatagramSocket(SERVERPORT);
The packets are being received by the Wifi module. We've confirmed that by rooting one of the devices and installing a packet sniffer, so the problem must somehow be code related.
On the affected devices packets are received correctly for a couple of seconds and then there is complete dropout that lasts for several seconds, so I estimate the loss to exceed 50%.
Any help would be much appreciated. We are pulling our hair out.
Update I was mistaken about the packet sniffer. It seems that the packet sniffer is also losing several relevant packets on the rooted device. Sometimes, though, simply starting the packet sniffer fixes the issue! Turning Bluetooth on/off as suggested below does not seem to make a difference. Could this be another hardware issue?
Update 2 Here is an example of the logs I'm printing immediately after the socket.receive() line. Notice how it skips half a minute's worth of packets and then works fine for a few seconds.
05-25 15:44:38.670: D/LOG(4393): Packet Received
05-25 15:44:38.941: D/LOG(4393): Packet Received
05-25 15:45:09.482: D/LOG(4393): Packet Received
05-25 15:45:09.716: D/LOG(4393): Packet Received
05-25 15:45:09.928: D/LOG(4393): Packet Received
05-25 15:45:10.184: D/LOG(4393): Packet Received
05-25 15:45:10.451: D/LOG(4393): Packet Received
05-25 15:45:10.661: D/LOG(4393): Packet Received
Packet loss (as you know, of course) can happen at multiple stages along the transmission:
Sending from the server
Transmission over the network
Physical reception at the client and handling in hardware
Processing/buffering of the packet in the kernel/OS
Handling/buffering of the packet in your app.
You can quickly check whether point 1 or 2 is an issue by having other devices listen for the same broadcast while being connected to the same Wifi router. Sounds like you already did this and that there is no issue. (Note that a packet that gets dropped in step 2 (or sometimes even 1) might not be missing from the Wireshark dump if you run it on the server.)
Points 3 through 5 are therefore likely to be the problem and they might be a little harder to separate out.
Here are a couple of things that might help:
Like @Mick suggested, don't just print out when you received the packet, but give every packet an increasing ID number to figure out whether you actually lost a packet or whether it was just delayed.
Move your packet-receiving code into its own thread (if it isn't already) and set the priority of that thread to MAX_PRIORITY to minimize the chance that your code is holding up the lunch line. Given that the Memo Pad is a quad-core 1.2GHz machine, MAX_PRIORITY shouldn't even be necessary, but if you aren't currently running the receive-loop in its own dedicated thread, you might see hiccups anyways. If this fixes things, simply have a minimal receive-loop stick the packets into your own buffer-queue and have an independent thread process them.
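A minimal sketch of that split, assuming the broadcast port 17216 from the question (the queue capacity and packet size are guesses to adjust for your traffic):

import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hand-off queue between the receive loop and the processing thread.
final BlockingQueue<byte[]> inbox = new ArrayBlockingQueue<byte[]>(1024);

Thread receiver = new Thread(new Runnable() {
    @Override
    public void run() {
        try {
            DatagramSocket socket = new DatagramSocket(17216);
            byte[] buffer = new byte[1024];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            while (!Thread.currentThread().isInterrupted()) {
                socket.receive(packet);
                // Copy out just this datagram's bytes; the buffer is reused on the next receive().
                inbox.offer(Arrays.copyOfRange(packet.getData(), packet.getOffset(),
                        packet.getOffset() + packet.getLength()));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
});
receiver.setPriority(Thread.MAX_PRIORITY);
receiver.start();
// A separate, normal-priority thread can now drain 'inbox' and do the real processing.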
Check/increase the size of the packet buffer for receiving packets via setReceiveBufferSize(...) (more verbose Java reference here). Make sure you specify a size that can hold many packets. Given that running the packet-sniffer sometimes seems to help things, it does sound like there might be some socket setting that can improve things, which the sniffer happens to set.
On the server you can also add a tag to the packet that tells all involved devices how to treat the packet. If you call setTrafficClass(IPTOS_RELIABILITY), you are asking everyone involved to optimize their packet handling for maximum reliability. Not all devices will care, but it may make a difference.
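For what it's worth, here is a small sketch of both socket tweaks from the previous two points. The 256 KB size and the 0x04 constant are just starting points; the OS may silently cap the buffer request, and devices are free to ignore the traffic class:

// Receiver side: ask for a larger buffer, then read back what the OS actually granted.
DatagramSocket rx = new DatagramSocket(17216);
rx.setReceiveBufferSize(256 * 1024);
System.out.println("SO_RCVBUF granted: " + rx.getReceiveBufferSize());

// Sender side: hint that reliability is preferred (IPTOS_RELIABILITY = 0x04).
DatagramSocket tx = new DatagramSocket();
tx.setTrafficClass(0x04);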
You can try to use DatagramChannels instead of DatagramSockets and then use select() to wait for the next packet to read. While this technically should not make a difference, sometimes using a different API call can provide a work-around for an issue.
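A rough sketch of that NIO variant, reusing port 17216 and the 1 KB buffer from the question (everything else here is an assumption, not the OP's code):

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

DatagramChannel channel = DatagramChannel.open();
channel.configureBlocking(false);
channel.socket().bind(new InetSocketAddress(17216));

Selector selector = Selector.open();
channel.register(selector, SelectionKey.OP_READ);

ByteBuffer buffer = ByteBuffer.allocate(1024);
while (isUDPServerRunning) {
    if (selector.select(1000) == 0) {
        continue; // nothing readable within the 1 s timeout
    }
    selector.selectedKeys().clear();
    buffer.clear();
    if (channel.receive(buffer) != null) { // returns the sender's address, or null
        buffer.flip();
        // hand the buffer contents off for processing here
    }
}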
Unfortunately Android is a very heterogeneous environment where many manufacturers will provide their own kernel modules, etc. This also introduces various incompatibilities or non-standard behavior everywhere. You might be able to find a custom ROM (Cyanogen, etc.?) for one or both of your problem-devices. If installing that instead of the factory ROM fixes your problem, then it's a bug in the manufacturer provided (kernel) network drivers, in which case, you might get lucky to find a work-around, or you could maybe file a bug-report with them, but in general, you might just have to select those devices as unsupported in the Play Store to avoid bad reviews...
Finally, here is a work-around that should fix the issue for sure:
Add some code to your client that detects dropped packets and, if the drop-rate goes too high, opens a TCP connection to the server instead, which will then guarantee packet delivery. Given that your packets are small and infrequent and that only a few devices will ever need to use this mechanism, I don't think that this should cause a problem for your server load. If you don't have a way to change the server code to provide a TCP stream, you could write an independent proxy-server that collects the UDP packets and makes them available via TCP. If you can run it on the same machine as the original server, you even know what IP address it is at (the same as the source address of the UDP packets that did arrive).
Just a wild guess, but how long do your computations on the packet take? Is it possible that the allocated buffer for the socket fills up and starts to drop packets?
I know, this sounds unlikely for a transfer rate of about 4 KB/s... but if your computations take longer than 250 ms, then this would occur sooner or later. This would also explain why some devices work like a charm and others don't.
Have you tried removing the computations and just printing the "packet received" message for debugging?
Interestingly enough, both of the devices that are experiencing UDP packet loss happen to have Mediatek SoCs. Do your other test devices have this same chipset?
This may be a bug in the driver for the Wi-Fi of those SoCs. Since it only shows up with UDP, and isn't a constant 100% loss, it may have gone unnoticed by everyone until now.
This sounds very similar to Bluetooth interference symptoms that can be seen on Android (and iOS - in fact anything with WiFi and Bluetooth together) devices.
2.4 GHz WiFi and Bluetooth share the same frequency band and can interfere with each other - on some devices this is very pronounced, maybe due to the internal layout.
It is also possible that you can see it on some devices and not others because of the versions of WiFi they support - newer 5 GHz WiFi does not interfere with Bluetooth in the same way, but some older or more basic Android devices may not support it.
You can test if this is the cause quite easily by switching off bluetooth on the device while testing (if your app can function without bluetooth).
TCP_NODELAY is an option to enable quick sending of TCP packets, regardless of their size. This is a very useful option when speed matters; however, I'm curious what it will do to this:
Socket socket = [some socket];
socket.setTcpNoDelay(true);
OutputStream out = socket.getOutputStream();
out.write(byteArray1);
out.write(byteArray2);
out.write(byteArray3);
out.flush();
I tried to find out what flush actually does on a SocketOutputStream, but as far as I can tell, it doesn't do anything. I had hoped it would tell the socket "send all your buffered data NOW", but unfortunately it doesn't.
My question is: are these 3 byte arrays sent in one packet? I know you don't have much control over how TCP constructs a network packet, but is there any way to tell the socket to (at least try to) pack these byte arrays, so network overhead is avoided? Could manually packing the byte arrays and sending them in one call to write help?
My question is: are these 3 byte arrays sent in one packet?
As you have disabled the Nagle algorithm, almost certainly not, but you can't be 100% sure.
I know you don't have much control over how TCP constructs a network packet, but is there any way to tell the socket to (at least try to) pack these byte arrays
Yes. Don't disable the Nagle algorithm.
so network overhead is avoided? Could manually packing the byte arrays and sending them in one call to write help?
Yes, or simpler still just wrap the socket output stream in a BufferedOutputStream and call flush() when you want the data to be sent, as per your present code. You are correct that flush() does nothing on a socket output stream, but it flushes a BufferedOutputStream.
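For instance, keeping the byteArray variables from the question (the endpoint is a placeholder, and whether you also set TCP_NODELAY is up to you):

Socket socket = new Socket("example.com", 12345); // placeholder endpoint
// Buffer the writes in user space; nothing reaches the socket until flush(),
// assuming the three arrays fit within the 8 KB buffer.
BufferedOutputStream out = new BufferedOutputStream(socket.getOutputStream(), 8192);
out.write(byteArray1);
out.write(byteArray2);
out.write(byteArray3);
out.flush(); // hands all buffered bytes to the socket in one go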
Could manually packing the byte arrays and sending them in one call to write help?
Yes, send them all in a single call to write. This will maximize the chances that your bytes will be sent in a single packet.
Of course, you can never know, since there are too many variables involved - different OSs and lots of different networking gear between you and your peer, but if you give the OS the ability to pack everything together, it will generally try to.
If you disable Nagle and make separate system calls (remember, the OS controls the sockets, not your application or Java), you're asking the OS to send them individually. The OS has no idea you're about to call write again with some more data.
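A simple way to do the manual packing, assuming the three arrays fit comfortably in memory:

ByteArrayOutputStream packed = new ByteArrayOutputStream();
packed.write(byteArray1);
packed.write(byteArray2);
packed.write(byteArray3);
// One write() call is one system call, so the OS sees all the bytes at once
// and can decide how to segment them.
socket.getOutputStream().write(packed.toByteArray());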
I have two debian servers located on the same subnet. They are connected by a switch. I am aware the UDP is unreliable.
Question 1: I assume the link layer is Ethernet, and the MTU of standard Ethernet is 1500 bytes. However, when I did a ping from one server to the other, I found that the maximum packet size that can be sent is 65507 bytes. Shouldn't it be 1500 bytes? Can I say that, because there is no router between these two servers, the IP datagram will not be fragmented?
Question 2: Because the two servers are directly connected through a switch, can I assume that all datagrams arrive in order and that there is no loss on the path?
Question 3: How can I determine the chance of a datagram being dropped at the server because of buffer overflow? What size should the receive buffer be so that datagrams do not overflow it?
No. UDP is not even reliable between processes on the same machine. If packets are sent to a socket without giving the receiver process time to read them, the buffer will overflow and packets will be lost.
You did your ping test with fragmentation enabled. Besides that, ping doesn't use UDP, but ICMP, so the results mean nothing. UDP packets smaller than the MTU will not be fragmented, but the MTU depends on more factors, such as IP options and VLAN headers, so it may not be greater than 1500.
No. Switches perform buffering, and it's possible for the internal buffers to overflow. Consider a 24 port switch where 23 nodes are all transmitting as fast as possible to the last node. Clearly the connection to the last node cannot handle the aggregate traffic of 23 other links, the switch will try to buffer packets but eventually end up dropping them.
Besides that, electrical noise can corrupt packets in transit, causing them to be discarded when the checksum fails.
To analyze the chance of buffer overflow, you could employ queuing theory to find the probability that a packet arrives when the buffer is full. You'll need some assumptions regarding the probability distribution of the packet arrival rate and the processing time. The number of packets in the buffer then forms a finite chain, hopefully Markovian, which you can solve for the steady-state probability of each state in the chain. Good search keywords to find out more would be "queuing theory", "Markov chain", "call capacity", "circuit capacity", "load factor".
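As a hedged illustration of that approach: if you model the receiver as an M/M/1/K queue (Poisson arrivals at rate lambda, exponential service at rate mu, room for K packets), the drop probability is the steady-state probability that an arriving packet finds the buffer full, which you can compute like this:

// Drop probability of an M/M/1/K queue. This is a modelling assumption,
// not a measurement of any real NIC or OS buffer.
static double dropProbability(double lambda, double mu, int bufferSize) {
    double rho = lambda / mu;              // offered load
    if (rho == 1.0) {
        return 1.0 / (bufferSize + 1);     // limiting case when arrival rate equals service rate
    }
    // Steady-state probability of the "buffer full" state in the Markov chain.
    return (1.0 - rho) * Math.pow(rho, bufferSize)
            / (1.0 - Math.pow(rho, bufferSize + 1));
}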
EDIT: You changed the title of the question. The answer to your new question is: "You can't prove something that isn't true." If you want to make a reliable application using UDP, you should add your own acknowledgement and loss handling logic.
The 64 KB maximum packet size is the absolute limit of the protocol, as opposed to the 1500 byte MTU you may have configured (the MTU can be changed easily, the 64 KB limit cannot).
In practice you will probably never see reordered datagrams in your scenario. And you'll probably only lose them if the receiving side is not processing them fast enough (or is shut off completely).
The "chances" of a datagram being dropped by the receiver is not something we can really quantify without knowing a whole lot more about your situation. If the receiver processes datagrams faster than the sender sends them, you're fine, otherwise you may lose some--know how many and exactly when is a considerably finer point.
The IP stack will fragment and defragment the packet for you. You can test this by setting the don't-fragment flag: the packet will then be dropped.
No. They will most likely come in order and probably not be dropped, but the network stacks in your sender, the switch, and the receiver are free to drop a packet if they can't handle it when it arrives. Also remember that when a large packet is fragmented, one lost fragment means the whole packet will be dropped by the stack.
I guess you can probe by sending 1000 packets and measuring the loss, but historical values do not predict the future...
Question 1: You are confusing the MTU with the maximum IP datagram size; see here.
Question 2: Two servers connected via a switch does not guarantee that datagrams arrive in order. There will be other network transmissions occurring that can interfere with the UDP stream, potentially causing out-of-sequence frames.
Question 3: Answered by Ben Voigt above.
I am receiving ~3000 UDP packets per second, each of them having a size of ~200 bytes. I wrote a Java application which listens for those UDP packets and just writes the data to a file. Then the server sends 15000 messages at the previously specified rate. After writing to the file, it contains only ~3500 messages. Using Wireshark I confirmed that all 15000 messages were received by my network interface. After that I tried changing the buffer size of the socket (which was initially 8496 bytes):
((java.net.MulticastSocket) socket).setReceiveBufferSize(32 * 1024);
That change increased the number of messages saved to ~8000. I kept increasing the buffer size up to 1 MB. After that, the number of messages saved reached ~14400. Increasing the buffer size to larger values would not increase the number of messages saved. I think I have reached the maximum allowed buffer size. Still, I need to capture all 15000 messages which were received by my network interface.
Any help would be appreciated. Thanks in advance.
Smells like a bug, most likely in your code. If the UDP packets are delivered over the network, they will be queued for delivery locally, as you've seen in Wireshark. Perhaps your program just isn't making timely progress on reading from its socket - is there a dedicated thread for this task?
You might be able to make some headway by detecting which packets are being lost by your program. If all the packets lost are early ones, perhaps the data is being sent before the program is ready to receive them. If they're all later, perhaps it exits too soon. If they are lost at regular intervals, there may be some trouble in the code that loops to receive packets, etc.
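If you can get the sender to embed a sequence number (say, a big-endian int in the first four bytes of each datagram; that layout is an assumption here, not something your server necessarily does), detecting where the losses fall becomes straightforward:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.nio.ByteBuffer;

DatagramSocket socket = new DatagramSocket(9000);            // placeholder port
byte[] buf = new byte[2048];
DatagramPacket packet = new DatagramPacket(buf, buf.length);
int expected = -1;
while (true) {
    socket.receive(packet);
    // Assumed layout: sequence number in the first 4 bytes of the payload.
    int seq = ByteBuffer.wrap(packet.getData(), packet.getOffset(), 4).getInt();
    if (expected != -1 && seq != expected) {
        System.err.println("lost " + (seq - expected) + " packet(s) before #" + seq);
    }
    expected = seq + 1;
}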
In any case you seem exceptionally anxious about lost packets. By design UDP is not a reliable transport. If the loss of these multicast packets is a problem for your system (rather than just a mystery that you'd like to solve for performance reasons) then the system design is wrong.
The problem you appear to be having is the delay from writing to a file. I would read all the data into memory before writing to the file (or write to the file in another thread).
However, there is no way to ensure 100% of packets are received with UDP without the ability to ask for packets to be sent again (something TCP does for you).
I see that you are using UDP to send the file contents. In UDP the order of packets is not assured. If you are not worried about the order, you can put all the packets in a queue and have another thread process the queue and write the contents to the file, as sketched below. That way the socket reader thread is not blocked by file operations.
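A sketch of that split, reusing your existing socket; the file name is a placeholder and the unbounded queue is only for illustration (a bounded queue with an explicit drop policy may be wiser):

import java.io.FileOutputStream;
import java.net.DatagramPacket;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<byte[]>();

// Writer thread: drains the queue and does the (slow) file I/O.
Thread writer = new Thread(new Runnable() {
    public void run() {
        try {
            FileOutputStream out = new FileOutputStream("capture.bin"); // placeholder name
            while (true) {
                out.write(queue.take());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
});
writer.start();

// Reader loop: nothing but receive() and an in-memory hand-off.
byte[] buf = new byte[2048];
DatagramPacket packet = new DatagramPacket(buf, buf.length);
while (true) {
    socket.receive(packet);
    queue.add(Arrays.copyOfRange(packet.getData(), packet.getOffset(),
            packet.getOffset() + packet.getLength()));
}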
The receive buffer size is configured at OS level.
For example, on a Linux system: sysctl -w net.core.rmem_max=26214400, as in this article:
https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html
This is a Windows only answer, but the following changes in the Network Controller Card properties made a DRAMATIC difference in packet loss for our use-case.
We are consuming around 200 Mbps of UDP data and were experiencing substantial packet loss under moderate server load.
The network card in use is an Asus ROG Aerion 10G card, but I would expect most high-end network controller cards to expose similar properties. You can access them via Device Manager->Network card->Right-Click->Properties->Advanced Options.
1. Increase number of Receive Buffers:
Default value was 512; we could increase it up to 1024. In our case, higher settings were accepted, but the network card became disabled once we exceeded 1024. Having a larger number of available buffers at the network-card level gives the system more tolerance to latency in transferring data from the network card buffers to the socket buffers where our apps can finally read the data.
2. Set Interrupt Moderation Rate to 'Off':
If I understood correctly, interrupt moderation coalesces multiple "buffer fill" notifications (via interrupts) into a single notification. So, the CPU will be interrupted less-often and fetch multiple buffers during each interrupt. This reduces CPU usage, but increases the chance a ready buffer is overwritten before being fetched, in case the interrupt is serviced late.
Additionally, we increased the socket buffer size (as the OP already did) and also enabled Circular Buffering at the socket level, as suggested by Len Holgate in a comment; this should also increase tolerance to latency in processing the socket buffers.
I read this question about the error that I'm getting and I learned that UDP data payloads can't be more than 64k. The suggestions that I've read are to use TCP, but that is not an option in this particular case. I am interfacing with an external system that is transmitting data over UDP, but I don't have access to that external system at this time, so I'm simulating it.
I have data messages that are upwards of 1,400,000 bytes in some instances, and it's a requirement that the UDP protocol is used. I am not able to change protocols (I would much rather use TCP, or a reliable protocol built on UDP). Instead, I have to find a way to transmit large payloads over UDP from a test application into the system that I am building, and to read those large payloads in the system that I'm building for processing. I don't have to worry about dropped packets, either - if I don't get the datagram, I don't care - just wait for the next payload to arrive. If it's incomplete or missing, just throw it all away and continue waiting. I also don't know the size of the datagram in advance (they range from a few hundred bytes to 1,400,000+ bytes).
I've already set my send and receive buffer sizes large enough, but that's not sufficient. What else can I do?
UDP packets have a 16-bit length field. It has nothing to do with Java. They cannot be bigger, period. If the server you are talking to is immutable, you are stuck with what you can fit into a packet.
If you can change the server and thus the protocol, you can more or less reimplement TCP for yourself. Since UDP is defined to be unreliable, you need the full retransmission mechanism to cope with packets that are dropped in the network somewhere. So, you have to split the 'message' into chunks, send the chunks, and have a protocol for requesting retransmission of lost chunks.
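A rough sketch of the sender side of such a scheme, with an invented 12-byte header (message id, chunk index, chunk count); the receiver would buffer chunks per message id, reassemble when all chunks are present, and discard incomplete messages, which matches the "throw it away and wait" requirement from the question:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.ByteBuffer;

// Splits 'message' into datagrams carrying at most CHUNK payload bytes each,
// prefixed with messageId, chunkIndex and chunkCount. The header layout is
// invented for this sketch; both ends must agree on whatever format you pick.
static void sendChunked(DatagramSocket socket, InetAddress dest, int port,
                        int messageId, byte[] message) throws Exception {
    final int CHUNK = 1400; // stay under a typical 1500-byte MTU
    int chunkCount = (message.length + CHUNK - 1) / CHUNK;
    for (int i = 0; i < chunkCount; i++) {
        int offset = i * CHUNK;
        int len = Math.min(CHUNK, message.length - offset);
        ByteBuffer buf = ByteBuffer.allocate(12 + len);
        buf.putInt(messageId).putInt(i).putInt(chunkCount);
        buf.put(message, offset, len);
        socket.send(new DatagramPacket(buf.array(), buf.position(), dest, port));
    }
}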
It's a requirement ...
The requirement should also therefore dictate the packetization technique. You need more information about the external system and its protocol. Note that the maximum IPv4 UDP payload is 65535-28 bytes, and the maximum practical payload is < 1500 bytes once a router gets involved.