Netty Adaptive UDP Multicast Support

Netty Adaptive UDP Multicast Support - java

Newbies having trouble processing a UDP video stream using Netty 3.2.4. On different machines we see dropped bytes etc using Netty. We have a little counter after Netty gets the bytes in, to see how many bytes are received. The variance is more than what just what UDP unreliability would account for. In our case, we also save the bytes to a file to playback the video. Playing the video in VLC really illustrates the dropped bytes. (Packet sizes being sent were around 1000 Bytes).
Questions
Are our assumptions of the Netty API correct in terms of not being able to use the AdaptiveReceiveBufferSizePredictor for UDP stream listener?
Is there a better explanation of the behavior we're seeing?
Is there a better solution? Is there a way to use an adaptive predictor with UDP?
What We've Tried
...
DatagramChannelFactory datagramChannelFactory = new OioDatagramChannelFactory(
Executors.newCachedThreadPool());
connectionlessBootstrap = new ConnectionlessBootstrap(datagramChannelFactory);
...
datagramChannel = (DatagramChannel) connectionlessBootstrap.bind(new
InetSocketAddress(multicastPort));
datagramChannel.getConfig().setReceiveBufferSizePredictor(new
FixedReceiveBufferSizePredictor(2*1024*1024));
...
From documentation and Google searches, I think the correct way to do this is to use a OioDatagramChannelFactory instead of a NioDatagramChannelFactory.
Additionally, while I couldn't find it explicity stated, you can only use a FixedReceiveBufferSizePredictor with the OioDatagramChannelFactory (vs AdaptiveReceiveBufferSizePredictor). We found this out by looking at the source code and realizing that the AdaptiveReceiveBufferSizePredictor's previousReceiveBufferSize() method was not being called from the OioDatagramWorker class (whereas it was called from the NioDatagramWorker)
So, we originally set the FixedReceivedBufferSizePredictor to (2*1024*1024)
Observed Behavior
Running on different machines(different processing power) we're seeing a different number of bytes being taken in by Netty. In our case, we are streaming video via UDP and we are able to use the playback of the streamed bytes to diagnose the quality of the bytes read in (Packet sizes being sent were around 1000 Bytes).
We then experimented with different buffer sizes and found that 1024*1024 seemed to make things work better...but really have no clue why.
In looking at how FixedReceivedBufferSizePredictor works, we realized that it simply creates a new buffer each time a packet comes in. In our case it would create a new buffer of 2*1024*1024 Bytes whether the packet was 1000 Bytes or 3 MB. Our packets were only 1000 Bytes, so we we didn't think that was our problem. Could any of this logic in here be causing a performance problem? For example, the creation of the buffer each time a packet comes in?
Our Workaround
We then thought about ways to make the buffer size dynamic but realized we couldn't use the AdaptiveReceiveBufferSizePredictor as noted above. We experimented and created our own MyAdaptiveReceiveBufferSizePredictor along with the accompanying MyOioDatagramChannelFactory, *Channel, *ChannelFactory, *PipelineSink, *Worker classes (that eventually call the MyAdaptiveReceiveBufferSizePredictor). The predictor simply changes the buffer size to double the buffer size based on the last packet size or reduce it. This seemed to improve things.

Not right sure what causes your performance issues but I found this thread.
It might be caused by the creation of ChannelBuffers for each incoming packet in which case you'll have to wait for Milestone 4.0.0.

Related

Queuing byte data in java

I want to stream data over network continuously. The source gives me a byte array that I'd want to store in a data structure which serves as buffer to compensate for any network lags.
What is the most efficient data structure to store the bytes in a queue fashion. Think of it as a pipe where one thread pumps in the data and other one reads and sends it over the network, while the pipe itself is long enough to contain multiple frames of the input data.
Is Queue efficient enough?

A Queue would not be efficient if you put bytes in one at a time. It would eat lots of memory, create GC pressure, and slow things down.
You could make the overhead of Queues reasonable if you put reasonably-sized (say 64kB) byte[]s or ByteBuffers in them. That buffer size could be tunable and changed based on performance experiments or perhaps even be adaptive at runtime.
TCP already compensates for network lags. If you are using UDP then you will need to handle congestion properly or things will go badly. In practice using TCP or UDP directly creates a lot of extra work and reinvention of wheels.
ZeroMQ (or the pure Java JeroMQ) is a good library option with an efficient wire protocol (good enough for realtime stock trading platforms). It handles the queueing transparently and gives a lot of options for different client models including things like PUB SUB that would help if you have lots of clients on a broadcast. Within a process ZeroMQ can manage the queueing of data being producuers and consumers. You could even use it to efficiently broadcast the same bytes to workers that do independent things with the same stream (ex: one doing usage metering and another doing transcoding).
There are other libraries that may also work. I think Netty handles things like this efficiently for example.

You should look into the OKIO libraray

Socket communication between Java and C: Good buffer size

I have to implement a socket communicatio between a Server written in Java and a Client written in C.
The maximum amount of data that I will have to transmit is 64KB.
In the most socket communication tutorials they are working with buffer sizes of about 1024 Byte or less.
Is it a (maybe performance) problem to set the buffer to 64KB?
The two software parts will run on the same machine or at least in the same local area network.
And if it is a problem: How to handle messages that are bigger than buffer in general?

The buffer can be smaller than the messages without any problem while the receiver consumes the data as fast as the sender generates it. A bigger buffer lets your receiver to have more time to process the message, but usually you don't need a giant buffer: for example, when you download software the size of a file can be more than 1GB, but your browser/ftp client just reads the buffer and stores the data in a file in your local hard disk.
And in general, you can ignore the language used to create the client or the server, only the network protocol matters. Every language has its own libraries to handle sockets with ease.

I suggest a larger buffer but I suspect you see less than 5% difference whether you use 1 KB or 64 KB.
Note: b = bit and B = byte, k = 1000 and K = 1024 and it is best not to get the confused (not that it is likely to matter here)

How does TCP_NODELAY affect consecutive write() calls?

TCP_NODELAY is an option to enable quick sending of TCP packets, regardless of their size. This is very useful option when speed matters, however, I'm curious what it will do to this:
Socket socket = [some socket];
socket.setTcpNoDelay(true);
OutputStream out = socket.getOutputStream();
out.write(byteArray1);
out.write(byteArray2);
out.write(byteArray3);
out.flush();
I tried to find what flush actually does on a SocketOutputStream, but as far as I know now, it doesn't do anything. I had hoped it would tell the socket "send all your buffered data NOW", but, unfortunately, no it doesn't.
My question is: are these 3 byte arrays sent in one packet? I know you don't have much control over how TCP constructs a network packet, but is there any way to tell the socket to (at least try to) pack these byte arrays, so network overhead is avoided? Could manually packing the byte arrays and sending them in one call to write help?

My question is: are these 3 byte arrays sent in one packet?
As you have disabled the Nagle algorithm, almost certainly not, but you can't be 100% sure.
I know you don't have much control over how TCP constructs a network packet, but is there any way to tell the socket to (at least try to) pack these byte arrays
Yes. Don't disable the Nagle algorithm.
so network overhead is avoided? Could manually packing the byte arrays and sending them in one call to write help?
Yes, or simpler still just wrap the socket output stream in a BufferedOutputStream and call flush() when you want the data to be sent, as per your present code. You are correct that flush() does nothing on a socket output stream, but it flushes a BufferedOutputStream.

Could manually packing the byte arrays and sending them in one call to write help?
Yes, send them all in a single call to write. This will maximize the chances that your bytes will be sent in a single packet.
Of course, you can never know, since there are too many variables involved - different OSs and lots of different networking gear between you and your peer, but if you give the OS the ability to pack everything together, it will generally try to.
If you disable nagle and make seperate system calls (remember, the OS controls the sockets, not your appliation or java), you're asking the OS to send them individually. The OS has no idea you're about to call write again with some more data.

Java DatagramPacket (UDP) maximum send/recv buffer size

In Java when using DatagramPacket suppose you have a byte[1024*1024] buffer. If you just pass that for the DatagramPacket when sending/receiving will a Java receive call for the DatagramPacket block until it has read the entire megabyte?
I'm asking if Java will split it up or just try to send the entire thing which gets dropped.
Normally the size limit is around 64KB for a UDP packet, but I wondered since Java's API allow for byte arrays if that is a limit and something super huge is dropped or split up and reassembled for you.
If it is dropped what API call would tell me the maximum data payload I can use in the Java call? I've heard that IPv6 also has jumbo frames, but does DatagramPacket (or DatagramSocket) support that since UDP defines the header spec?

DatagramPacket is just a wrapper on a UDP based socket, so the usual UDP rules apply.
64 kilobytes is the theoretical maximum size of a complete IP datagram, but only 576 bytes are guaranteed to be routed. On any given network path, the link with the smallest Maximum Transmit Unit will determine the actual limit. (1500 bytes, less headers is the common maximum, but it is impossible to predict how many headers there will be so its safest to limit messages to around 1400 bytes.)
If you go over the MTU limit, IPv4 will automatically break the datagram up into fragments and reassemble them at the end, but only up to 64 kilobytes and only if all fragments make it through. If any fragment is lost, or if any device decides it doesn't like fragments, then the entire packet is lost.
As noted above, it is impossible to know in advance what the MTU of path will be. There are various algorithms for experimenting to find out, but many devices do not properly implement (or deliberately ignore) the necessary standards so it all comes down to trial and error. Or you can just guess 1400 bytes per message.
As for errors, if you try to send more bytes than the OS is configured to allow, you should get an EMSGSIZE error or its equivalent. If you send less than that but more than the network allows, the packet will just disappear.

java.net.DatagramPacket buffer max size is 65507.
See
https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
Maximum Transmission Unit (MTU) size varies dependent on implementation but is arguably irrelevant to the basic question "Java DatagramPacket (UDP) maximum send/recv buffer size" as the MTU is transparent to the java.net.DatagramPacket layer.

# Mihai Danila. Because I couldn't add a comment to the above answer, that's why writing into reply section.
In continuation of your answer on MTU size, in my practice, I try to use NetworkInterface.getMTU()-40 for setting the buffer size of DatagramSocket.setSendBufferSize(). So, trying not to rely on getSendBufferSize() This is to make sure it matches different window sizes on different platforms and is universally acceptable on ethernet (ignoring dial-up for a moment). I haven't hardcoded it to 1460 bytes (1500-20-20) because on windows, the MTU size is universally 1500. However, windows platform's own window size is 8192 bytes, but I believe, by setting the SO_SNDBUF to < MTU, I am burdening the network/IP layer less, and for all the hops for routers and receivers, some overheads. Thus, reducing some latency over the network.
Similarly, for the receive buffer, I am using a max of 64K or 65535 bytes. This way my program is portable on different platforms using different window sizes.
Do you think it sounds OK? I have not implemented any tools to measure any differences but assuming that its the case based on what's out there.

How to minimize UDP packet loss

I am receiving ~3000 UDP packets per second, each of them having a size of ~200bytes. I wrote a java application which listens to those UDP packets and just writes the data to a file. Then the server sends 15000 messages with previously specified rate. After writing to the file it contains only ~3500 messages. Using wireshark I confirmed that all 15000 messages were received by my network interface. After that I tried changing the buffer size of the socket (which was initially 8496bytes):
(java.net.MulticastSocket)socket.setReceiveBufferSize(32*1024);
That change increased the number of messages saved to ~8000. I kept increasing the buffer size up to 1MB. After that, number of messages saved reached ~14400. Increasing buffer size to larger values wouldn't increase the number of messages saved. I think I have reached the maximum allowed buffer size. Still, I need to capture all 15000 messages which were received by my network interface.
Any help would be appreciated. Thanks in advance.

Smells like a bug, most likely in your code. If the UDP packets are delivered over the network, they will be queued for delivery locally, as you've seen in Wireshark. Perhaps your program just isn't making timely progress on reading from its socket - is there a dedicated thread for this task?
You might be able to make some headway by detecting which packets are being lost by your program. If all the packets lost are early ones, perhaps the data is being sent before the program is waiting to receive them. If they're all later, perhaps it exits too soon. If they are at regular intervals there may be some trouble in your code which loops receiving packets. etc.
In any case you seem exceptionally anxious about lost packets. By design UDP is not a reliable transport. If the loss of these multicast packets is a problem for your system (rather than just a mystery that you'd like to solve for performance reasons) then the system design is wrong.

The problem you appear to be having is that you get delay writing to a file. I would read all the data into memory before writing to the file (or writing to a file in another thread)
However, there is no way to ensure 100% of packet are received with UDP without the ability to ask for packets to be sent again (something TCP does for you)

I see that you are using UDP to send the file contents. In UDP the order of packets is not assured. If you not worried about the order, you put all the packets in a queue and have another thread process the queue and write the contents to file. By this the socket reader thread is not blocked because of file operations.

The receive buffer size is configured at OS level.
For example on Linux system, sysctl -w net.core.rmem_max=26214400 as in this article
https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html

This is a Windows only answer, but the following changes in the Network Controller Card properties made a DRAMATIC difference in packet loss for our use-case.
We are consuming around 200 Mbps of UDP data and were experiencing substantial packet loss under moderate server load.
The network card in use is an Asus ROG Aerion 10G card, but I would expect most high-end network controller cards to expose similar properties. You can access them via Device Manager->Network card->Right-Click->Properties->Advanced Options.
1. Increase number of Receive Buffers:
Default value was 512; we could increase it up to 1024. In our case, higher settings were accepted, but the network card becomes disabled once we exceed 1024. Having a larger number of available buffers at the network-card level gives the system more tolerance to latency in transferring data from the network card buffers to the socket buffers where our apps finally can read the data.
2. Set Interrupt Moderation Rate to 'Off':
If I understood correctly, interrupt moderation coalesces multiple "buffer fill" notifications (via interrupts) into a single notification. So, the CPU will be interrupted less-often and fetch multiple buffers during each interrupt. This reduces CPU usage, but increases the chance a ready buffer is overwritten before being fetched, in case the interrupt is serviced late.
Additionally, we increased the socket buffer size (as the OP already did) and also enabled Circular Buffering at the socket level, as suggested by Len Holgate in a comment, this should also increase tolerance to latency in processing the socket buffers.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.