Severe UDP packet loss on some Android devices

I have scoured the interwebz with no result. We are facing a problem where some Android devices experience severe packet loss. To give some background, the application connects to a specific Wifi and looks for UDP packets broadcast on port 17216. These packets are of size 832 bytes, excluding the wrapped headers, and are sent at a regular rate of four per second.
We have only met the problem on two devices, a low-end Turbox Rubik II tablet and an ASUS Memo Pad HD 7. The other devices we've tested (phones and tablets) all gather the packets at the stipulated regular interval.
The function that receives the packets is this:
public void run()
{
    while (isUDPServerRunning)
    {
        try
        {
            socket.receive(packet);
            ProcessRawPacketData();
            DisplayLoggingInfo();
        }
        catch (IOException e)
        {
            Log.e("receive", e.getMessage());
            e.printStackTrace();
        }
    }
}
And that is part of a Runnable. The socket is created thus:
byte[] buffer = new byte[1024];
DatagramSocket socket;
DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
with the socket being initialized in the onCreate() method of our Service extension:
socket = new DatagramSocket(SERVERPORT);
The packets are being received by the Wifi module. We've confirmed that by rooting one of the devices and installing a packet sniffer, so the problem must somehow be code related.
On the affected devices packets are received correctly for a couple of seconds and then there is complete dropout that lasts for several seconds, so I estimate the loss to exceed 50%.
Any help would be much appreciated. We are pulling our hair out.
Update: I was mistaken about the packet sniffer. It seems that the packet sniffer is also losing several relevant packets on the rooted device. Sometimes, though, simply starting the packet sniffer fixes the issue! Turning Bluetooth on/off as suggested below does not seem to make a difference. Could this be another hardware issue?
Update 2: Here is an example of the logs I'm printing immediately after the socket.receive() line. Notice how it skips half a minute's worth of packets and then works fine for a few seconds.
05-25 15:44:38.670: D/LOG(4393): Packet Received
05-25 15:44:38.941: D/LOG(4393): Packet Received
05-25 15:45:09.482: D/LOG(4393): Packet Received
05-25 15:45:09.716: D/LOG(4393): Packet Received
05-25 15:45:09.928: D/LOG(4393): Packet Received
05-25 15:45:10.184: D/LOG(4393): Packet Received
05-25 15:45:10.451: D/LOG(4393): Packet Received
05-25 15:45:10.661: D/LOG(4393): Packet Received

Packet loss (as you know, of course) can happen at multiple stages along the transmission:
1. Sending from the server
2. Transmission over the network
3. Physical reception at the client and handling in hardware
4. Processing/buffering of the packet in the kernel/OS
5. Handling/buffering of the packet in your app.
You can quickly check whether point 1 or 2 is an issue by having other devices listen for the same broadcast while connected to the same Wifi router. It sounds like you already did this and found no issue. (Note that a packet that gets dropped in step 2 (or sometimes even 1) might not be missing from the Wireshark dump if you run it on the server.)
Points 3 through 5 are therefore likely to be the problem and they might be a little harder to separate out.
Here are a couple of things that might help:
Like @Mick suggested, don't just print out when you received the packet, but give every packet an increasing ID number to figure out whether you actually lost a packet or whether it was just delayed.
Move your packet-receiving code into its own thread (if it isn't already) and set the priority of that thread to MAX_PRIORITY to minimize the chance that your code is holding up the lunch line. Given that the Memo Pad is a quad-core 1.2GHz machine, MAX_PRIORITY shouldn't even be necessary, but if you aren't currently running the receive-loop in its own dedicated thread, you might see hiccups anyway. If this fixes things, simply have a minimal receive-loop stick the packets into your own buffer-queue and have an independent thread process them.
Check/increase the size of the packet buffer for receiving packets via setReceiveBufferSize(...) (more verbose Java reference here). Make sure you specify a size that can hold many packets. Given that running the packet-sniffer sometimes seems to help things, it does sound like there might be some socket setting that can improve things, which the sniffer happens to set.
On the server you can also add a tag to the packet that tells all involved devices how to treat the packet. If you call setTrafficClass(IPTOS_RELIABILITY), you are asking everyone involved to optimize their packet handling for maximum reliability. Not all devices will care, but it may make a difference.
You can try to use DatagramChannels instead of DatagramSockets and then use select() to wait for the next packet to read. While this technically should not make a difference, sometimes using a different API call can provide a work-around for an issue.
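A minimal sketch of the dedicated receive thread and queue hand-off described above, using only java.net and java.util.concurrent. The class name UdpReceiver, its methods, and the buffer sizes are hypothetical choices for illustration and are not taken from the asker's code.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketException;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical helper: drains a UDP socket on its own thread and hands payloads to a queue. */
class UdpReceiver implements Runnable {
    private final DatagramSocket socket;
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    UdpReceiver(int port, int receiveBufferBytes) throws SocketException {
        socket = new DatagramSocket(port);
        // This is only a request; the OS may grant a different size.
        socket.setReceiveBufferSize(receiveBufferBytes);
    }

    BlockingQueue<byte[]> queue() { return queue; }

    int localPort() { return socket.getLocalPort(); }

    Thread start() {
        Thread t = new Thread(this, "udp-receiver");
        t.setPriority(Thread.MAX_PRIORITY);
        t.setDaemon(true);
        t.start();
        return t;
    }

    void stop() {
        running = false;
        socket.close(); // a blocked receive() then fails with a SocketException
    }

    @Override
    public void run() {
        byte[] buffer = new byte[1024];
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        while (running) {
            try {
                packet.setLength(buffer.length); // defensive reset before reusing the packet
                socket.receive(packet);
                // Copy the payload out and go straight back to receive(); no processing here.
                queue.offer(Arrays.copyOf(packet.getData(), packet.getLength()));
            } catch (IOException e) {
                if (running) e.printStackTrace();
            }
        }
    }
}
```

A separate worker thread would then call queue().take() and do the parsing and logging, so slow processing can no longer delay the next receive() call.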
Unfortunately, Android is a very heterogeneous environment in which many manufacturers ship their own kernel modules, which introduces incompatibilities and non-standard behavior. You might be able to find a custom ROM (Cyanogen, etc.) for one or both of your problem devices. If installing that instead of the factory ROM fixes your problem, then the bug is in the manufacturer-provided (kernel) network drivers. In that case you might get lucky and find a work-around, or you could file a bug report with the manufacturer, but in general you might just have to mark those devices as unsupported in the Play Store to avoid bad reviews.
Finally, here is a work-around that should fix the issue for sure:
Add some code to your client that detects dropped packets and, if the drop-rate goes too high, opens a TCP connection to the server instead, which will then guarantee packet delivery. Given that your packets are small and infrequent and that only a few devices will ever need to use this mechanism, I don't think that this should cause a problem for your server load. If you don't have a way to change the server code to provide a TCP stream, you could write an independent proxy-server that collects the UDP packets and makes them available via TCP. If you can run it on the same machine as the original server, you even know what IP address it is at (the same as the source address of the UDP packets that did arrive).
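The drop detection could look roughly like the sketch below. It assumes the server is changed to put a monotonically increasing sequence number into each packet, which the original packets may not contain. LossMonitor and its method names are hypothetical, and the threshold for switching to TCP is left to the caller.

```java
/** Hypothetical loss monitor; assumes each packet carries an increasing sequence number. */
class LossMonitor {
    private long expectedNext = -1; // -1 until the first packet has been seen
    private long received = 0;
    private long missed = 0;

    /** Records the sequence number of a packet that arrived. */
    void onPacket(long seq) {
        if (expectedNext < 0) {
            expectedNext = seq + 1; // first packet defines the starting point
            received++;
        } else if (seq >= expectedNext) {
            missed += seq - expectedNext; // every skipped number counts as missed
            expectedNext = seq + 1;
            received++;
        } else {
            // Late arrival (or a duplicate; this sketch does not tell them apart).
            if (missed > 0) missed--;
            received++;
        }
    }

    /** Fraction of packets considered lost so far, between 0 and 1. */
    double lossRate() {
        long total = received + missed;
        return total == 0 ? 0.0 : (double) missed / total;
    }

    /** True once the observed loss rate exceeds the given threshold. */
    boolean shouldFallBackToTcp(double threshold) {
        return lossRate() > threshold;
    }
}
```

In practice the counters would probably be reset every few seconds so that the decision reflects recent behavior instead of the whole session.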

Just a wild guess, but how long do your computations on the packet take? Is it possible that the allocated buffer for the socket fills up and starts to drop the packets?
I know, this sounds unlikely for a transfer rate of about 4 KB/s... But if your computations take longer than 250 ms, then this would occur sooner or later. This would also explain why some devices work like a charm and others don't.
Have you tried removing the computations and just printing the "packet received" message for debugging?

Interestingly enough, both of the devices that are experiencing UDP packet loss happen to have Mediatek SoCs. Do your other test devices have this same chipset?
This may be a bug in the driver for the Wi-Fi of those SoCs. Being that it only shows up with UDP, and isn't always 100%, it may have been unnoticed by everyone until now.

This sounds very similar to Bluetooth interference symptoms that can be seen on Android (and iOS - in fact anything with WiFi and Bluetooth together) devices.
2.4 GHz WiFi and Bluetooth share the same frequency band and can interfere with each other. On some devices this is very pronounced, maybe due to the internal layout.
It is also possible that you can see it on some devices and not others because of the versions of WiFi they support. The newer 5 GHz based WiFi does not interfere with Bluetooth in the same way, but some older or more basic Android devices may not support it.
You can test whether this is the cause quite easily by switching off Bluetooth on the device while testing (if your app can function without Bluetooth).

Related

DatagramChannel Send Missing On Wire

I'm seeing some occasional missing data with a datagram channel in a tool I'm developing. UDP is part of the requirement here, so I'm mostly just trying to troubleshoot the behavior I'm seeing. The tool is being developed with Java 7 (another requirement), but the computer on which I'm seeing the behavior occur is running on a Java 8 JRE.
I have a decorator class that decorates a call to DatagramChannel.send with some additional behavior, but the call effectively boils down to this:
public int send(ByteBuffer buffer, SocketAddress target) throws IOException
{
    // some additional decorating code that can't be shared follows
    int bytesToWrite = buffer.remaining();
    int bytesWritten = decoratedChannel.send(buffer, target);
    if (bytesWritten != bytesToWrite) {
        // log the occurrence
    }
    return bytesWritten;
}
There is an additional bit of decoration above this that performs our own fragmentation (as part of the requirements of the remote host). Thus the source data is always guaranteed to be at most 1000 bytes (well within the limit for an ethernet frame). The decorated channel is also configured for blocking I/O.
What I'm seeing on rare occasions, is that this routine (and thus the DatagramChannel's send method) will be called, but no data is seen on the wire (which is monitored with Wireshark). The send routine always returns the number of bytes that should have been written in this case too (so bytesWritten == bytesToWrite).
I understand that UDP has reliability issues (for which we have our own data reliability mechanism that accounts for data loss and other issues), but I'm curious about the behavior of the Datagram channel's implementation. If send is returning the number of bytes written, should I not at least see a corresponding frame in Wireshark? Otherwise, I would expect the native implementation to possibly throw an exception, or at least not return the number of bytes I expected to write?

I actually ended up discovering the cause with more fiddling in Wireshark. I was unintentionally filtering out ARP requests, which seem to be the cause of the problem, as mentioned in this answer:
ARP queues only one outbound IP datagram for a specified destination address while that IP address is being resolved to a MAC address. If a UDP-based application sends multiple IP datagrams to a single destination address without any pauses between them, some of the datagrams may be dropped if there is no ARP cache entry already present. An application can compensate for this by calling the Iphlpapi.dll routine SendArp() to establish an ARP cache entry, before sending the stream of packets.
It appears the ARP entries were going stale very quickly, and the occasional ARP request would cause the dropped packet. I increased the ARP timeout for the interface on the PC, and dropped packets now happen much less often.

Which is the best approach to send large UDP packets in sequence

I have an Android application that needs to send data over UDP every 100 milliseconds. Each UDP packet is 15000 bytes on average. The packets are sent as broadcasts.
Every 100 milliseconds the lines below are run in a loop.
DatagramPacket sendPacket = new DatagramPacket(sendData, sendData.length, broadcast, 9876);
clientSocket.send(sendPacket);
The application starts out working fine, but after about 1 minute the frequency of received packets decreases until packets no longer arrive at the destination.
The theoretical limit (on Windows) for the maximum size of a UDP packet is 65507 bytes.
I know the MTU of the network medium is 1500 bytes, and that when I send a bigger packet it is broken into several fragments; if one fragment does not reach the destination, the whole packet is lost.
I do not understand why the packets are sent correctly for the first minute and then stop arriving after a while. So I wonder what would be the best approach to solve this problem?

It's exactly the problem you described. Each datagram you broadcast is split into 44 packets. If any one of those is lost, the datagram is lost. As soon as you have enough traffic to cause, say, 1% packet loss, you have 35% datagram loss. 2% packet loss equals 60% datagram loss.
You need to keep your broadcast datagrams small enough not to fragment. If you have a stream of 65,507 byte chunks such that you cannot change the fact that you must have the whole chunk for the data to be useful, then naive UDP broadcast was a bad choice.
I'd have to know a lot more about the specifics of your application to make a sensible recommendation. But if you have a chunk of data around 64KB such that you need the whole chunk for the data to be useful, and you can't change that, then you should be using an approach that divides that data into pieces with some redundancy such that some pieces can be lost. With erasure coding, you can divide 65,507 bytes of data into 46 chunks, each 1,490 bytes, such that the original data can be reconstructed from any 44 chunks. This would tolerate moderate datagram loss with only about a 4% increase in data size.
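The percentages above follow from treating fragment losses as independent: a datagram survives only if all of its n fragments do, so the datagram loss rate is 1 - (1 - p)^n. Below is a small sketch of that arithmetic. FragmentLoss and its method names are hypothetical, and the fragment count assumes IPv4 without options (20-byte header) plus an 8-byte UDP header. Under those assumptions a 65,507-byte payload comes out at 45 fragments, close to the 44 used above, and a 15,000-byte payload at 11.

```java
/** Hypothetical helper for the fragment-loss arithmetic; assumes independent fragment losses. */
class FragmentLoss {
    /** IP fragments needed for a UDP payload, assuming a 20-byte IPv4 header and 8-byte UDP header. */
    static int fragmentCount(int udpPayloadBytes, int mtu) {
        int perFragment = ((mtu - 20) / 8) * 8; // fragment data is carried in multiples of 8 bytes
        int ipPayload = udpPayloadBytes + 8;     // the UDP header is part of the fragmented payload
        return (ipPayload + perFragment - 1) / perFragment;
    }

    /** Probability that at least one of the fragments is lost, which loses the whole datagram. */
    static double datagramLoss(double fragmentLossRate, int fragments) {
        return 1.0 - Math.pow(1.0 - fragmentLossRate, fragments);
    }

    public static void main(String[] args) {
        System.out.println("fragments for 15000 bytes: " + fragmentCount(15000, 1500));
        System.out.printf("1%% fragment loss, 44 fragments: %.1f%% datagram loss%n",
                100 * datagramLoss(0.01, 44));
        System.out.printf("2%% fragment loss, 44 fragments: %.1f%% datagram loss%n",
                100 * datagramLoss(0.02, 44));
    }
}
```

Even at 11 fragments per datagram, 1% fragment loss already translates into roughly 10% datagram loss, which is why keeping each datagram below the MTU matters.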

TCP is used specifically instead of UDP when you need reliable and correctly ordered delivery. But assuming you really need UDP for broadcasting, you could:
debug the network to see how & where packets are lost, or maybe it is the receiver that is clogged/lagged. But often you don't have control over these things. Is a WiFi network involved? If so it's hard to get good QoS.
do something on the application layer to ensure ordering and reliable delivery. For example, SIP normally uses UDP, but the protocol uses transactions and sequence numbers so clients & servers will retransmit messages as needed.
implement packet loss concealment. Using maths, the receiver can recreate a lost packet, analogous to how a RAID disk setup can lose drives and still function.
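As a rough illustration of the RAID analogy, the sketch below uses single XOR parity, the same idea RAID 5 uses: the sender transmits one extra parity packet per group, and the receiver can rebuild any one missing packet of that group. XorParity and its method names are hypothetical, all packets in a group are assumed to have the same length, and a real scheme would need headers that identify the group and the position of each packet.

```java
/** Hypothetical single-parity scheme: tolerates one lost packet per group. */
class XorParity {
    /** Builds the parity packet as the byte-wise XOR of equally sized data packets. */
    static byte[] parity(byte[][] packets) {
        byte[] p = new byte[packets[0].length];
        for (byte[] packet : packets) {
            for (int i = 0; i < p.length; i++) {
                p[i] ^= packet[i];
            }
        }
        return p;
    }

    /** Rebuilds the one missing packet (the null entry) from the survivors and the parity packet. */
    static byte[] recover(byte[][] packetsWithOneNull, byte[] parity) {
        byte[] out = parity.clone();
        for (byte[] packet : packetsWithOneNull) {
            if (packet == null) continue; // this is the packet being rebuilt
            for (int i = 0; i < out.length; i++) {
                out[i] ^= packet[i];
            }
        }
        return out;
    }
}
```

If two packets of the same group are lost, this scheme cannot recover either of them; tolerating more losses needs a stronger erasure code.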
That your setup works fine for a minute and then doesn't is a hint that there is either network congestion or software congestion on the broadcast or receiver side.
Can you do some packet captures with Wireshark and share the results?

I chose UDP for my peer-to-peer service; how can I prove it's reliable in my situation?

I have two Debian servers located on the same subnet. They are connected by a switch. I am aware that UDP is unreliable.
Question 1: I assume the link layer is Ethernet, and the MTU of standard Ethernet is 1500 bytes. However, when I pinged one server from the other, I found that the maximum packet size that can be sent is 65507. Shouldn't it be 1500 bytes? Can I say that, because there is no router between these two servers, the IP datagram will not be fragmented?
Question 2: Because the two servers are directly connected through a switch, can I assume that all datagrams arrive in order and that there is no loss on the path?
Question 3: How can I determine the chance of a datagram being dropped at the server because of buffer overflow? What size should I set the receive buffer to so that datagrams will not overflow it?

No. UDP is not even reliable between processes on the same machine. If packets are sent to a socket without giving the receiver process time to read them, the buffer will overflow and packets will be lost.
You did your ping test with fragmentation enabled. Besides that, ping doesn't use UDP, but ICMP, so the results mean nothing. UDP packets smaller than the MTU will not be fragmented, but the MTU depends on more factors, such as IP options and VLAN headers, so it may not be greater than 1500.
No. Switches perform buffering, and it's possible for the internal buffers to overflow. Consider a 24 port switch where 23 nodes are all transmitting as fast as possible to the last node. Clearly the connection to the last node cannot handle the aggregate traffic of 23 other links, the switch will try to buffer packets but eventually end up dropping them.
Besides that, electrical noise can corrupt packets in transit, causing them to be discarded when the checksum fails.
To analyze the chance of buffer overflow, you could employ queuing theory to find the probability that a packet arrives when the buffer is full. You'll need some assumptions regarding the probability distribution of the packet transmission rate and the processing time. The number of packets in the buffer then forms a finite chain, hopefully Markov, which you can solve for the steady-state probabilities of each state in the chain. Good search keywords to find out more would be "queuing theory", "Markov chain", "call capacity", "circuit capacity", "load factor".
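As one concrete instance of such a model, the sketch below uses the textbook M/M/1/K queue. That is a strong assumption: Poisson arrivals, exponentially distributed processing times, a single consumer, and room for K packets including the one being processed. Real traffic may not fit it. With load ρ = λ/μ, the probability that an arriving packet finds the buffer full is (1 - ρ)ρ^K / (1 - ρ^(K+1)), or 1/(K+1) when ρ = 1. The class and method names are hypothetical.

```java
/** Hypothetical M/M/1/K model of a receive buffer; assumes Poisson arrivals and exponential service. */
class Mm1kQueue {
    /**
     * Steady-state probability that an arriving packet finds the buffer full.
     * arrivalRate and serviceRate are in packets per second; capacity counts packets.
     * Not numerically hardened: very large capacities with load above 1 can overflow.
     */
    static double dropProbability(double arrivalRate, double serviceRate, int capacity) {
        double rho = arrivalRate / serviceRate;
        if (Math.abs(rho - 1.0) < 1e-12) {
            return 1.0 / (capacity + 1); // limit of the formula as the load approaches 1
        }
        return (1.0 - rho) * Math.pow(rho, capacity) / (1.0 - Math.pow(rho, capacity + 1));
    }

    public static void main(String[] args) {
        // Example numbers only: 3000 packets/s arriving, 4000 packets/s processed.
        for (int k : new int[] {8, 32, 128}) {
            System.out.println("K=" + k + " drop probability=" + dropProbability(3000, 4000, k));
        }
    }
}
```

Under this model, when the receiver is faster than the sender on average, the drop probability falls off quickly as the buffer grows. When it is slower, no buffer size helps in the long run.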
EDIT: You changed the title of the question. The answer to your new question is: "You can't prove something that isn't true." If you want to make a reliable application using UDP, you should add your own acknowledgement and loss handling logic.

The 64 KB maximum packet size is the absolute limit of the protocol, as opposed to the 1500 byte MTU you may have configured (the MTU can be changed easily, the 64 KB limit cannot).
In practice you will probably never see reordered datagrams in your scenario. And you'll probably only lose them if the receiving side is not processing them fast enough (or is shut off completely).
The "chances" of a datagram being dropped by the receiver is not something we can really quantify without knowing a whole lot more about your situation. If the receiver processes datagrams faster than the sender sends them, you're fine; otherwise you may lose some. Knowing how many and exactly when is a considerably finer point.

The IP stack will fragment and defragment the packet for you. You can test this by setting the no-fragment flag. The packet will be dropped.
No. They will most likely come in order, and probably not be dropped, but the network stack in your sender, router and receiver is free to drop a packet if it can't handle it when it arrives. Also remember that when a large packet is fragmented, one lost fragment means that the whole packet will be dropped by the stack.
I guess you can probe by sending 1000 packets and measuring the loss, but historical values do not predict the future...

Question 1: You are confusing the MTU with the TCP maximum packet size; see here.
Question 2: Two servers connected via a switch does not guarantee that datagrams arrive in order. Other network transmissions will be occurring that can interfere with the UDP stream, potentially causing out-of-sequence frames.
Question 3: Answered by Ben Voigt above.

How to minimize UDP packet loss

I am receiving ~3000 UDP packets per second, each ~200 bytes in size. I wrote a Java application which listens for those UDP packets and just writes the data to a file. The server then sends 15000 messages at the previously specified rate. After the run, the file contains only ~3500 messages. Using Wireshark I confirmed that all 15000 messages were received by my network interface. After that I tried changing the buffer size of the socket (which was initially 8496 bytes):
((java.net.MulticastSocket) socket).setReceiveBufferSize(32 * 1024);
That change increased the number of messages saved to ~8000. I kept increasing the buffer size up to 1MB. After that, number of messages saved reached ~14400. Increasing buffer size to larger values wouldn't increase the number of messages saved. I think I have reached the maximum allowed buffer size. Still, I need to capture all 15000 messages which were received by my network interface.
Any help would be appreciated. Thanks in advance.

Smells like a bug, most likely in your code. If the UDP packets are delivered over the network, they will be queued for delivery locally, as you've seen in Wireshark. Perhaps your program just isn't making timely progress on reading from its socket - is there a dedicated thread for this task?
You might be able to make some headway by detecting which packets are being lost by your program. If all the packets lost are early ones, perhaps the data is being sent before the program is waiting to receive them. If they're all later, perhaps it exits too soon. If they are at regular intervals there may be some trouble in your code which loops receiving packets. etc.
In any case you seem exceptionally anxious about lost packets. By design UDP is not a reliable transport. If the loss of these multicast packets is a problem for your system (rather than just a mystery that you'd like to solve for performance reasons) then the system design is wrong.

The problem you appear to be having is a delay when writing to the file. I would read all the data into memory before writing to the file (or write to the file in another thread).
However, there is no way to ensure that 100% of packets are received with UDP without the ability to ask for packets to be sent again (something TCP does for you).

I see that you are using UDP to send the file contents. In UDP the order of packets is not assured. If you are not worried about the order, put all the packets in a queue and have another thread process the queue and write the contents to the file. This way the socket reader thread is not blocked by file operations.

The receive buffer size is configured at OS level.
For example on Linux system, sysctl -w net.core.rmem_max=26214400 as in this article
https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html
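Because the OS can cap the value, it may be worth reading the size back after requesting it. A minimal sketch follows; ReceiveBufferCheck and requestAndReadBack are hypothetical names, while setReceiveBufferSize and getReceiveBufferSize are the standard java.net.DatagramSocket calls. The value reported back can differ from the request in either direction depending on the platform.

```java
import java.net.DatagramSocket;
import java.net.SocketException;

/** Hypothetical check of how much receive buffer the OS actually reports after a request. */
class ReceiveBufferCheck {
    /** Requests a receive buffer size and returns what the socket reports afterwards. */
    static int requestAndReadBack(DatagramSocket socket, int requestedBytes) throws SocketException {
        socket.setReceiveBufferSize(requestedBytes); // a hint; the OS may apply its own limit
        return socket.getReceiveBufferSize();
    }

    public static void main(String[] args) throws SocketException {
        try (DatagramSocket socket = new DatagramSocket()) {
            int requested = 1024 * 1024;
            int granted = requestAndReadBack(socket, requested);
            System.out.println("requested=" + requested + " reported=" + granted);
        }
    }
}
```

If the reported value stops growing while the requested value keeps increasing, the OS-level limit mentioned above is the likely reason.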

This is a Windows only answer, but the following changes in the Network Controller Card properties made a DRAMATIC difference in packet loss for our use-case.
We are consuming around 200 Mbps of UDP data and were experiencing substantial packet loss under moderate server load.
The network card in use is an Asus ROG Aerion 10G card, but I would expect most high-end network controller cards to expose similar properties. You can access them via Device Manager->Network card->Right-Click->Properties->Advanced Options.
1. Increase number of Receive Buffers:
Default value was 512; we could increase it up to 1024. In our case, higher settings were accepted, but the network card becomes disabled once we exceed 1024. Having a larger number of available buffers at the network-card level gives the system more tolerance to latency in transferring data from the network card buffers to the socket buffers where our apps finally can read the data.
2. Set Interrupt Moderation Rate to 'Off':
If I understood correctly, interrupt moderation coalesces multiple "buffer fill" notifications (via interrupts) into a single notification. So, the CPU will be interrupted less-often and fetch multiple buffers during each interrupt. This reduces CPU usage, but increases the chance a ready buffer is overwritten before being fetched, in case the interrupt is serviced late.
Additionally, we increased the socket buffer size (as the OP already did) and also enabled Circular Buffering at the socket level, as suggested by Len Holgate in a comment. This should also increase tolerance to latency in processing the socket buffers.

Why does this Java programme cause UDP packet loss?

I'm running experiments on my machines A and B, both with Ubuntu Server 11.04 installed. A and B are connected to the same 1000 Mbps switch.
A is the sender:
    while (total <= 10,000)
        send_udp_datagramPacket(new byte[100]) to B
B is the receiver:
    while (true)
        receive()
But finally I got less than 10,000 (about 9960) at B. Why is this happening?
Where did the lost packets go? Were they not actually sent to the wire to the switch? Or the switch lost them? Or they indeed got to B, but B's OS discarded them? Or they reached to Java, but Java threw them away because of a full buffer?
Any reply would be appreciated.

Remember, UDP does not provide reliable communication, it is intended for situations in which data loss is acceptable (streaming media for instance). Chances are good that this is a buffer overflow (my guess, don't rely on it) but the point is that if this data loss is not acceptable, use TCP instead.
If this is just for the sake of experimentation, try adding a delay (Thread.sleep()) in the loop and increase it until you get the maximum number of received packets.
EDIT:
As mentioned in a comment, the sleep() is NOT a fix and WILL eventually lose packets... that's just UDP.

But finally I got less than 10,000 (about 9960) at B. Why is this happening?
UDP is a lossy protocol. Even if you got 10,000 in this test you would still have to code for the possibility that some packets will be lost. They can also be fragmented (if larger than 532 bytes) and/or arrive out of order.
Where did the lost package go?
They were dropped.
Were they not actually sent to the wire to the switch?
They can be dropped just about anywhere. I don't believe Java has any logic for dropping packets (but this too is not guaranteed in all implementations). A packet could be dropped by the OS or the network adapter, corrupted on the wire, or dropped by the switch.
Or the switch lost them?
It will do this if the packet arrived corrupt in some way or a buffer filled.
Or they indeed got to B, but B's OS discarded them?
Yes, or A's OS could have discarded them.
Or they reached to Java, but Java threw them away because of a full buffer?
Java doesn't have its own buffers. It uses the underlying buffers from the OS. But the packets could be lost at this stage.
Note: No matter how much you decrease the packet loss, you must always allow for some loss.

Why does this Java programme cause UDP packet loss?
The question is ill-formed. Neither Java nor your program causes UDP packet loss. UDP causes UDP packet loss. There is no guarantee that any UDP packet will ever arrive. See RFC 768.
