I'm using the DatagramPacket and DatagramSocket classes in Java to send messages over UDP. My knowledge of networking is incomplete. I know that:
when a datagram is sent, it may in fact be split into multiple pieces of data travelling independently on the network (for example, if my datagram's length is greater than the MTU);
UDP does not guarantee the order in which messages are received (and does not guarantee that they are received at all).
Putting this information together, I "understand" that if I send one (large) DatagramPacket, I may receive the bytes of my datagram in any order (and some parts may even be missing)! But I think I misunderstood something, because if that were the case, nobody would use such a protocol.
How can I ensure that the datagram I receive (if I receive it) is equal to the datagram I have sent?
Your understanding is incorrect. If your datagram is broken into fragments by IP (below the UDP layer) at the sending side, then IP at the receiver will reassemble those fragments in the correct order before passing the entire reassembled datagram up to the receiver's UDP layer. If any fragments of the datagram are lost then the reassembly will fail, the partially-reconstructed datagram will be discarded, and nothing will be passed up to the receiver's UDP layer. So the receiving UDP -- and therefore the receiving application -- gets either a complete datagram or nothing. It will never get a partial datagram, and it will never get a datagram whose content has been scrambled because of fragmentation.
The receiving application can be given a partial (truncated) datagram if the incoming datagram is larger than the application's receive buffer, but that has nothing to do with fragmentation.
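For illustration, here is a minimal receive sketch, assuming a plain DatagramSocket on an arbitrary port (both the port and the buffer size are assumptions, not from the question): it is the size of the byte[] handed to DatagramPacket, not IP fragmentation, that decides whether the application sees a truncated datagram.

import java.net.DatagramPacket;
import java.net.DatagramSocket;

// Minimal receive sketch: the port number and buffer size here are arbitrary choices.
public class ReceiveSketch {
    public static void main(String[] args) throws Exception {
        try (DatagramSocket socket = new DatagramSocket(4445)) {
            byte[] buf = new byte[65507]; // large enough for any UDP payload over IPv4
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            socket.receive(packet); // blocks until one whole datagram has arrived
            // packet.getLength() is the size of the datagram actually delivered.
            // Had buf been smaller than the datagram, the excess bytes would have been
            // silently discarded (truncation), independently of any IP fragmentation.
            System.out.println("received " + packet.getLength() + " bytes");
        }
    }
}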
Putting this information together, I "understand" that if I send one (large) DatagramPacket, I may receive the bytes of my datagram in any order
No.
(and some parts may even be missing)!
No.
You will receive a UDP datagram intact and entire or not at all.
But I think I misunderstood something, because if that were the case, nobody would use such a protocol.
Correct. It isn't the case.
How can I ensure that the datagram I receive (if I receive it) is equal to the datagram I have sent?
It always is. If it arrives. However it may arrive zero, one, or more times, and it may arrive out of order.
The generally accepted maximum practical UDP datagram size is 534 bytes of payload. You are guaranteed that IP will not fragment that, either at the sender or at any intermediate host, and non-fragmentation decreases your chance of packet loss. (If any fragment is lost the datagram is lost, as stated by @ottomeister above.)
If sequence is important to you, you need sequence numbers in your datagrams. This can also help to protect you against duplicates, as you know what sequence number you're up to so you can spot a duplicate.
If arrival is important to you, you need an ACK- or NACK-based application protocol.
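Putting those last two points into code form, here is a minimal sketch of the sequence-number idea, assuming a simple 4-byte header of my own invention (not any standard framing):

import java.nio.ByteBuffer;

// Sketch: prepend a 4-byte sequence number to each datagram payload. The receiver
// can then detect gaps (loss), reordering, and duplicates by comparing numbers.
final class SequencedPayload {

    static byte[] wrap(int sequenceNumber, byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(sequenceNumber)
                .put(payload)
                .array();
    }

    static int sequenceOf(byte[] datagram) {
        return ByteBuffer.wrap(datagram).getInt(); // first four bytes
    }
}

A receiver that remembers the last sequence number it processed can drop duplicates and spot gaps, and an ACK- or NACK-based exchange built on the same numbers covers the arrival requirement.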
In Java NIO, does Selector.select() guarantee that at least one entire UDP datagram's content is available at the Socket Channel, or could the Selector in theory wake when less than a datagram, say a couple of bytes, is available?
What happens if the transport protocol is TCP? With regard to Selector.select(), is there a difference from UDP?
From the API:
Selects a set of keys whose corresponding channels are ready for I/O operations.
It doesn't, however, specify what "ready" means.
So my questions:
How do incoming datagrams/streams get from the hardware to the Java application's Socket (Channels)?
When using a UDP or TCP client, should one assume that at least one datagram has been received, or could the Selector wake when only part of a datagram is available?
It doesn't, however, specify what "ready" means.
So my questions:
how incoming packages/streams go from hardware to Java application Socket (Channels).
They arrive at the NIC where they are buffered and then passed to the network protocol stack and from there to the socket receive buffer. From there they are retrieved when you call read().
when using UDP or TCP client, should one assume that at least one package is received
You mean packet. Actually in the case of UDP you mean datagram. You can assume that an entire datagram has been received in the case of UDP.
or Selector could wake when there is only a part of [packet] available?
In the case of TCP you can assume that either at least one byte is available or end of stream has been reached. There is no such thing as a 'package' or 'packet' or 'message' at the TCP level.
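As a hedged illustration of the UDP case (the port and buffer size below are assumptions): when the Selector reports OP_READ on a DatagramChannel, a single receive() hands over one whole datagram.

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class UdpSelectorSketch {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        DatagramChannel channel = DatagramChannel.open()
                .bind(new InetSocketAddress(9999)); // arbitrary port
        channel.configureBlocking(false);
        channel.register(selector, SelectionKey.OP_READ);

        ByteBuffer buffer = ByteBuffer.allocate(65507); // big enough for any UDP payload
        while (selector.select() > 0) {
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isReadable()) {
                    buffer.clear();
                    // For UDP, readiness means at least one complete datagram is queued;
                    // receive() delivers it in one call (truncating only if the buffer is small).
                    channel.receive(buffer);
                }
            }
            selector.selectedKeys().clear();
        }
    }
}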
I've read some conflicting things about how UDP/Java datagram channels operate. I need to know a few things:
Does UDP have an inherent way to tell if the packet is received whole, and in order, before .read(ByteBuffer b) is called? I've read at least one article saying that UDP inherently discards incomplete or out-of-order data.
Does DatagramChannel treat one send(buffer..) as one datagram packet? What if it's a partial send?
Can a .read(..) read more than one packet of data, resulting in data being discarded if the buffer given as the method's argument was only designed to handle one packet of data?
Does UDP have an inherent way to tell if the packet is received whole, and in order, before .read(ByteBuffer b) is called? I've read at least one article saying that UDP inherently discards incomplete or out-of-order data.
Neither statement is correct. It would be more accurate to say that IP has a way to tell if a datagram's fragments have all arrived, and then and only then does it even present it to UDP. Reassembly is the responsibility of the IP layer, not UDP. If the fragments don't arrive, UDP never even sees it. If they expire before reassembly is complete, IP throws them away.
Before/after read() is called is irrelevant.
Does DatagramChannel treat one send(buffer..) as one datagram packet?
Yes.
what if it's a partial send?
There is no such thing in UDP.
Can a read(.. ) read more than one packet of data
A UDP read will return exactly and only one datagram, or fail.
resulting in data being discarded if the buffer given as the method's argument was only designed to handle one packet of data?
Can't happen.
Re your comment below, which is about a completely different question, the usual technique for detecting truncation is to use a buffer one larger than the largest expected datagram. Then if you ever get a datagram that size, (i) it's an application protocol error, and (ii) it may have been truncated too.
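A sketch of that idiom, with MAX_EXPECTED standing in for whatever limit your application protocol actually defines:

import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;

// Sketch of the truncation-detection idiom: receive into a buffer one byte larger
// than the largest datagram the application protocol allows.
final class TruncationCheck {
    static final int MAX_EXPECTED = 512; // assumed application protocol limit

    static DatagramPacket receiveOne(DatagramSocket socket) throws IOException {
        byte[] buf = new byte[MAX_EXPECTED + 1];
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        socket.receive(packet);
        if (packet.getLength() == MAX_EXPECTED + 1) {
            // Larger than the protocol allows: an application protocol error,
            // and the datagram may also have been truncated to fit this buffer.
            throw new IOException("oversized (possibly truncated) datagram");
        }
        return packet;
    }
}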
My application uses the multicast capability of UDP.
In short, I am using Java and wish to transmit all data using a single multicast address and port. The multicast listeners will be logically divided into sub-groups, which can change at runtime and may not wish to process data that comes from outside their group.
To make this happen, I have written the code so that all running instances of the application join the same multicast group and port, but carefully observe each packet's sender to determine whether it belongs to their sub-group.
Warning: the minimum packet size for my application is 30000-60000 bytes!
Will reading every packet using MulticastSocket.receive(DatagramPacket) and determining whether it's a required packet cause too much overhead (or even buffer overflow)?
Would it generate massive traffic, leading to congestion in the network, because every packet is sent to everyone?
Every packet is not sent to everyone, since multicast routing (e.g. PIM) builds a multicast tree that places receivers and senders optimally; it is the network that copies the packet as and when needed. Multicast packets are broadcast (more accurately, flooded at Layer 2) at the final hop. IGMP assists multicast at the last hop and makes sure that if no receiver has joined on the last hop, no such flooding is done.
"and may not wish to process data that comes from outside their group." The receive call will return the next received datagram, so there is little one can do to avoid receiving packets that are not meant for one's sub-group. Can't your application use multiple, different groups?
Every packet may be sent to everyone, but each one will only appear on the network once.
However, unless this application is running entirely in a LAN that is entirely under your control, including all routers, it is already wildly infeasible: the generally accepted maximum practical UDP datagram size is 534 bytes once you go through a router you don't control.
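For reference, a minimal sketch of the filter-by-sender approach described in the question; the group address, port, and sub-group membership set are all assumptions:

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.util.Set;

// Sketch: every instance joins the same multicast group and filters on the sender's address.
public class SubGroupReceiver {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("230.0.0.1"); // assumed group address
        Set<InetAddress> mySubGroup =
                Set.of(InetAddress.getByName("192.168.1.10")); // assumed sub-group senders

        try (MulticastSocket socket = new MulticastSocket(4446)) { // assumed port
            socket.joinGroup(group);
            byte[] buf = new byte[65507];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);
                if (!mySubGroup.contains(packet.getAddress())) {
                    continue; // sender is outside this instance's sub-group: skip it
                }
                // process packet.getData(), offset 0, length packet.getLength()
            }
        }
    }
}

Joining one multicast group per sub-group, as suggested above, would let IGMP and the routers do this filtering instead of the application.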
I have a simple Java program which acts as a server, listening for UDP packets. I then have a client which sends UDP packets over 3G.
Something I've noticed is that occasionally the following appears to occur: I send one packet and, seconds later, it is still not received. I then send another packet and suddenly they both arrive.
I was wondering if it is possible that some sort of system is in place that waits for a certain amount of data instead of sending an undersized packet. In my application, I only send around 2-3 bytes of data per packet, although the UDP header and so on will bulk the message up a bit.
The aim of my application is to get these few bytes of data from A to B as fast as possible. Huge emphasis on speed. Is it all just coincidence? I suppose I could increase the packet size, but it just seems like the transfer time will increase, and 3G isn't exactly perfect.
Since the comments are getting rather lengthy, it might be better to turn them into an answer altogether.
If your app is not receiving data until a certain quantity is retrieved, then chances are there is some sort of buffering going on behind the scenes. A good example (not saying this applies to you directly) is that if you or the underlying libraries are reading with something like readLine() or readFully(bytes) on a stream wrapped around the socket, the call will block until it receives a newline or the requested number of bytes before returning. Judging by the fact that your program seems to retrieve all of the data once a certain threshold is reached, it sounds like this is the case.
A good way to debug this is to use Wireshark. Wireshark doesn't care about your program -- it analyzes the raw packets that are sent to and from your computer, and can tell you whether the issue is on the sending or the receiving end.
If you use Wireshark and see that the data from the first send arrives on your physical machine well before the second, then the issue lies with your receiving end. If you see that the first packet arrives at the same time as the second packet, then the issue lies with the sender. Without seeing the code, it's hard to say what you're doing and what, specifically, is causing the data to show up only after more than 2-3 bytes have been received -- but buffering of this kind would explain exactly the behavior you're seeing.
There are several probable causes of this:
Cellular data networks are not "always-on". Depending on the underlying technology, there can be a substantial delay between when a first packet is sent and when IP connectivity is actually established. This will be most noticeable after IP networking has been idle for some time.
Your receiver may not be correctly checking the socket for readability. Regardless of what high-level APIs you may be using, underneath there needs to be a call to select() to check whether the socket is readable. When a datagram arrives, select() should unblock and signal that the socket descriptor is readable. Alternatively, but less efficiently, you could set the socket to non-blocking and poll it with a read. Polling wastes CPU time when there is no data and delays detection of arrival for up to the polling interval, but can be useful if for some reason you can't spare a thread to wait on select().
I said above that select() should signal readability on a watched socket when data arrives, but this behavior can be modified by the socket's "receive low-water mark". The default value is usually 1, meaning that any data will signal readability. But if SO_RCVLOWAT is set higher (via setsockopt() or a higher-level equivalent), then readability will not be signaled until more than the specified amount of data has arrived. You can check the value with getsockopt() or whatever API is equivalent in your environment.
Item 1 would cause the first datagram to actually be delayed, but only when the IP network has been idle for a while and not once it comes up active. Items 2 and 3 would only make it appear to your program that the first datagram was delayed: a packet sniffer at the receiver would show the first datagram arriving on time.
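On the receiving side, a plain blocking receiver along these lines (the port is an assumption) unblocks for every datagram, however small, as long as nothing has raised the low-water mark; it can help rule out items 2 and 3:

import java.net.DatagramPacket;
import java.net.DatagramSocket;

// Sketch: a blocking UDP receiver that returns once per datagram, even a 2-3 byte one.
public class SmallDatagramReceiver {
    public static void main(String[] args) throws Exception {
        try (DatagramSocket socket = new DatagramSocket(5000)) { // assumed port
            byte[] buf = new byte[1500];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet); // no minimum amount of data is accumulated by default
                System.out.println("got " + packet.getLength() + " bytes");
            }
        }
    }
}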
Do those packets simply disappear? Or do they wait for the destination? Or does the packet go back and throw an exception?
And in Java, what is the difference between the byte[] buffer and the length in the DatagramPacket constructor?
DatagramPacket dp = new DatagramPacket(new byte[...], length);
From Wikipedia:
UDP is... Unreliable – When a message is sent, it cannot be known if it will reach its destination; it could get lost along the way. There is no concept of acknowledgment, retransmission or timeout.
Even if the destination is online, there is no guarantee that the UDP packet will arrive, arrive in the order sent, or not be fragmented. (I believe packets smaller than 532 bytes will not be fragmented.) It is possible to have all three: fragmented, out of order and incomplete, for the same packet.
The simplicity and stability of your network will determine how robust UDP packet delivery is, but you have to assume it is unreliable at least some of the time. All you can do is minimise the loss.
It is up to you to decide what to do if a packet is lost and how to detect it.
If you want broadcast, reliable delivery of messages, I suggest you look at JMS Topics or Queues, e.g. ActiveMQ.
If you are using the UDP protocol, you can't guarantee that your packet is going to be received.
So the answer is: it will be sent even if its destination is not online, but it may simply be lost.
With the TCP protocol, delivery within an established connection is acknowledged and retransmitted, so the receiver either gets the data or the sender learns that the connection has failed. TCP cannot deliver to a host that is offline either.
Do those packets simply disappear? Or do they wait for the destination? Or does the packet go back and throw an exception?
What happens depends on the nature of the "offline" status.
If the UDP message reaches the host, but the application is not listening, it will typically be silently discarded. It definitely won't be queued waiting for the application to listen. (That would be pointless, and potentially dangerous.)
If the UDP message cannot get to the host because the host itself is offline, the message will be silently discarded. (If the packets can reach the destination host's local network, then there is nothing apart from the host itself that can tell if the host actually received the packets.)
If the network doesn't know how to route the IP packets to the UDP server (and a few other scenarios), an ICMP "Destination Unreachable" packet may be sent to the sender, and that typically gets reported as a Java exception. However this is not guaranteed. So the possible outcomes are:
the UDP packet is black-holed and the sender gets no indication, or
the UDP packet is black-holed and the sender gets a Java exception.
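In Java, the exception case typically only shows up on a connected DatagramSocket; here is a hedged sketch with a placeholder host and port:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.net.PortUnreachableException;
import java.net.SocketTimeoutException;

// Sketch: only a *connected* DatagramSocket surfaces ICMP "port unreachable" as an exception.
public class UnreachableProbe {
    public static void main(String[] args) throws Exception {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.connect(new InetSocketAddress("192.0.2.1", 9)); // placeholder host and port
            socket.setSoTimeout(2000);
            byte[] data = {1, 2, 3};
            socket.send(new DatagramPacket(data, data.length));
            try {
                socket.receive(new DatagramPacket(new byte[16], 16));
            } catch (PortUnreachableException e) {
                // An ICMP Destination Unreachable came back and was reported as an exception.
                System.out.println("destination unreachable");
            } catch (SocketTimeoutException e) {
                // No indication either way: the datagram was black-holed.
                System.out.println("no response");
            }
        }
    }
}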
If the UDP packet is blocked by a firewall, then the behaviour is hard to predict. (Firewalls often "lie" in their responses to unwanted traffic.)
The only situation where you would expect there to be queuing of UDP traffic is when the network is working, the host is working and the application is listening. Limited queuing is then possible if the application is slow in accepting the packets; i.e. it takes too long between successive calls to receive on the datagram socket. But even there, the queueing / buffering is strictly limited, and beyond that the messages will be dropped.
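The size of that limited queue is the socket receive buffer, which can be nudged upward (subject to OS limits); a brief sketch:

import java.net.DatagramSocket;

// Sketch: enlarging the socket receive buffer gives the limited queue more room,
// but datagrams are still silently dropped once it fills up.
public class ReceiveBufferSketch {
    public static void main(String[] args) throws Exception {
        try (DatagramSocket socket = new DatagramSocket(6000)) { // assumed port
            socket.setReceiveBufferSize(1 << 20); // request about 1 MiB; the OS may cap it
            System.out.println("effective receive buffer: " + socket.getReceiveBufferSize());
        }
    }
}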