I have a device wich send TCP packets with 1440byte payload very fast to Windows XP os. I set the TcpAckFrequency to be 0, i.e. send ACK immediately back after a package received.
I wrote a Java application which read in a thread the socket with:
in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
in.read(charArray, 0, 720);
My problem is that Windows buffer become full after some (50-60)packet received, after that send a DUP ACK which slows down the communication.
I don't understand why become it full, as i always reading the buffer?
One possible explanation is that the Java application is not able to keep up. In other words, it can't process the data as fast at the device is sending it. If that is the problem, you have to come up with a solution that lets the application process the data faster.
If you have multiple cores, you may be able to get better thoughput by refactoring the Java application to use one thread to read the data from the socket, and another thread (or threads) to do the processing.
Another solution might simply be to profile the Java application to see if there is scope for tuning optimization.
Finally, you might be able to improve throughput by using a larger buffer size for the BufferedReader.
Related
I need to have lots of network connections open at the same time(!) and transfer data as fast as possible. Thousands of connections. Right now, I have one thread for each connection and reading charwise from the Inputstream of that connection.
And I have the strong suspicion that the CPU/switching between the thousands of threads might impose some performance problems here even though the servers are really slow (low two-digit KB/s), since I've observed that the throughput isn't even close to being proportional to the number of threads.
Therefore I'd like to ask some programmers experienced in parallel programming:
Is it worth rewriting the entire program so that one thread reads from multiple InputStreams in a round robin like fashion? Would that, if there is a speedup, be worth the programming? How many connections per thread? Or do you have another idea for reading really really fast from multiple network input streams?
If I don't read a char, will the server wait to send the next one until I do? What if my thread is sleeping?
reading charwise
You know data is transmitted in packets right? Reading a single character at a time is very inefficient. Each read has to traverse all the layers from your program to the network stack in the operating system. You should try to read one full segment of data at a time.
If I don't read a char, will the server wait to send the next one until I do? What if my thread is sleeping?
That's why the operating system has a buffer for incoming data, also called a window. When TCP segments arrive, they are put into the receive buffer. When your program requests to read from the socket, the operating system returns data from the receive buffer. If the receive buffer is full, the packet is lost and has to be sent again.
For more about how TCP works, see https://beej.us/guide/bgnet/
Wikipedia is pretty good but fairly dense
https://en.m.wikipedia.org/wiki/Transmission_Control_Protocol
Is it worth rewriting the entire program so that one thread reads from multiple InputStreams in a round robin like fashion? Would that, if there is a speedup, be worth the programming?
What you're describing would require moving from blocking I/O to non-blocking I/O. Non-blocking will require fewer system resources, but it is significantly harder to implement correctly and efficiently. So don't do it unless you have a pressing reason.
Thousands of threads (and stacks...) are probably too many for the OS scheduler, memory management units, caches...
You need just a few threads (one per CPU) and use a select()-based solution
on each of them.
Have a look at Selector, ServerSocketChannel and SocketChannel.
(see pages 30-31 of https://www.enib.fr/~harrouet/Data/Courses/Memo_Sockets.pdf)
Edit (after a question in the comments)
Selector is not just a clever algorithm encapsulated in a class.
It relies internally on the select() system-call (or equivalent,
there are many).
The operating system is aware of a set of file-descriptors (communication
means) it has to watch and, as soon as something happens on one (or
several) of them, it wakes up the process (or thread) which is blocked
on this selector.
The idea is to stay blocked as long as possible (to save resources) and
to be waken-up only on when something useful has to be done with incoming
(there are variants) data.
In your current implementation, you use thousands of threads which are
all blocked on a read()/recv() operation because you cannot know
beforehand which connection will be the next one to deliver something.
On the other hand, with a select()-based implementation, a single
thread can be blocked watching many connections at the same time
but will only react to handle the few ones which just delivered new
data.
So I suggest that you start a pool of few threads (one per CPU for example) and as soon as the main program accepts a new incoming
connection it chooses one of them (you can keep a count for each
of them) in order to make it in charge of this new connection.
All of this requires the proper synchronisation of course and probably
a trick (a special file descriptor in the selector for example) in
order to wake-up a blocked thread when it is assigned a new connection.
I am familiar with the concept of InputStream,buffers and why they are useful (when you need to work with data that might be larger then the machines RAM for example).
I was wondering though, how does the InputStream actually carry all that data?. Could a OutOfMemoryError be caused if there is TOO much data being transfered?
Case-scenario
If I connect from a client to a server,requesting a 100GB file, the server starts iterating through the bytes of the file with a buffer, and writing the bytes back to the client with outputStream.write(byte[]). The client is not ready to read the InputStream right now,for whatever reason. Will the server continue sending the bytes of the file indefinitely? and if so, won't the outputstream/inputstream be larger than the RAM of one of these machines?
InputStream and OutputStream implementations do not generally use a lot of memory. In fact, the word "Stream" in these types means that it does not need to hold the data, because it is accessed in a sequential manner -- in the same way that a stream can transfer water between a lake and the ocean without holding a lot of water itself.
But "stream" is not the best word to describe this. It's more like a pipe, because when you transfer data from a server to a client, every stage transfers back-pressure from the client that controls the rate at which data gets sent. This is similar to how your faucet controls the rate of flow through your pipes all the way to the city reservoir:
As the client reads data, it's InputStream only requests more data from the OS when its internal (small) buffers are empty. Each request allows only a limited amount of data to be transferred;
As data is requested from the OS, its own internal buffer empties, and it notifies the server about how much space there is for new data. The server can send only this much (that's called 'flow control' in TCP: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Resource_usage)
On the server side, the server-side OS sends out data from its own internal buffer when the client has space to receive it. As its own internal buffer empties, it allows the writing process to re-fill it with more data.
As the server-side process write()s to its OutputStream, the OutputStream will try to write data to the OS. When the OS buffer is full, it will make the server process wait until the server-side buffer has space to accept new data.
Notice that a slow client can make the server process take a very long time. If you're writing a server, and you don't control the clients, then it's very important to consider this and to ensure that there are not a lot of server-side resources tied up while a long data transfer takes place.
Your question is as interesting as difficult to answer properly.
First: InputStream and OutputStream are not a storage means, but an access means: They describe that the data shall be accessed in sequential, unidirectional order, but not how it shall be stored. The actual way of storing the data is implementation-dependent.
So, would there be an InputStream that stores the whole amount of data simultaneally in memory? Yes, could be, though it would be an appalling implementation. The most common and sensitive implementation of InputStreams / OutputStreams is by storing just a fixed and short amount of data into a temporary buffer of 4K-8K, for example.
(So far, I supposed you already knew that, but it was necessary to tell.)
Second: What about connected writting / reading streams between a server and a client? In a common scenario of buffered writting, the server will not write more data than the buffer allows. So, if the server starts writing, and the client then goes down (for whatever reason), the server will just keep writing until the buffer is full, and then set it as ready for reading, and until the read is not completed (by the client peer), the server won't fill the buffer again. Remember: This kind of read/write is blocking: The client blocks until there is a buffer ready to be read, and the server blocks (or, at least, the server thread bound to this connection, it's understood) until the last read is completed.
How many time will the server block? Typically, a server should have a security timeout to ensure that long blocks will break the connection, thus releasing the blocked thread. The same should have the client.
The timeouts set for the connection depend on the implementation, and the protocol.
No, it does not need to hold all data. I just advances forward in the file (usually using buffered data). The stream can discard old buffers as it pleases.
Note that there are a a lot of very different implementations of inputstreams, so the exact behaviour varies a lot.
This question concerns the design of a Java application for reading and processing large amounts of data from a few dozen UDP sockets, but I think it is relevant for other languages and environments.
I've seen network applications like the one described above have dedicated thread(s) for reading data off the socket buffer as quickly as possible, requeuing it inside the application and then processing it in a separate thread.
Is there anything wrong with leaving the data in the socket buffer until your processing thread is ready to receive the next piece of data? Is there any advantage to reading the data quickly and requeuing inside the application?
If the processing logic is not fast enough, the buffers will fill up. But if the processing logic is too slow to handle the inbound data, it seems like it does not matter where the data is queued. In case of a sudden spike in inbound data, the socket buffers should be large enough to handle it.
The buffer size for received UDP packets in the network stack is limited. If the buffer is full, some packets will be lost.
If the software handling UDP packets know, that it may need some time before it is able to process the packet, it makes sense to read the packet as soon as possible, relieving the network stack buffer and rather implement your own buffer or queue for the packets, in which they can be cached until processing resources are actually available.
I have written a small program in Java which reads info from a serial port (some text messages and also some binary data, mixed), and then send it through a TCP socket.
At first I tried to get data into a buffer, and send everything when a '/n' arrived, but binary data doesn't work that way.
My second thought was sending the data all the time, byte by byte. But it gets too slow, and I guess I'm losing data while sending it, even thought I used threads.
Next idea would be sending data when I reach 100 bytes stored for example, but it is a dirty fix.
My serial data comes from a GPS receiver, and these devices usually send many sentences of data (text and binary) each second (or configurable to 10, 50, 100Hz, it depends of the model). Anyway, there is a gap of time where data is received, and a considerable larger gap of time when nothing is being received. I guess the fanciest way to get rid of my data would be detecting that gap and sending all the stored data then, before new data arrives when the gap ends.
How could I detect this kind of event?
EDIT: I use RXTX package to get rid of the serial port, and I'm working over Windows 7 64bits
To detegt the gap, just enable timeout with enableReceiveTimeout.
Just read and write byte[] buffers as you get them, making sure to only write the actual count returned by the read() method.
I am receiving ~3000 UDP packets per second, each of them having a size of ~200bytes. I wrote a java application which listens to those UDP packets and just writes the data to a file. Then the server sends 15000 messages with previously specified rate. After writing to the file it contains only ~3500 messages. Using wireshark I confirmed that all 15000 messages were received by my network interface. After that I tried changing the buffer size of the socket (which was initially 8496bytes):
(java.net.MulticastSocket)socket.setReceiveBufferSize(32*1024);
That change increased the number of messages saved to ~8000. I kept increasing the buffer size up to 1MB. After that, number of messages saved reached ~14400. Increasing buffer size to larger values wouldn't increase the number of messages saved. I think I have reached the maximum allowed buffer size. Still, I need to capture all 15000 messages which were received by my network interface.
Any help would be appreciated. Thanks in advance.
Smells like a bug, most likely in your code. If the UDP packets are delivered over the network, they will be queued for delivery locally, as you've seen in Wireshark. Perhaps your program just isn't making timely progress on reading from its socket - is there a dedicated thread for this task?
You might be able to make some headway by detecting which packets are being lost by your program. If all the packets lost are early ones, perhaps the data is being sent before the program is waiting to receive them. If they're all later, perhaps it exits too soon. If they are at regular intervals there may be some trouble in your code which loops receiving packets. etc.
In any case you seem exceptionally anxious about lost packets. By design UDP is not a reliable transport. If the loss of these multicast packets is a problem for your system (rather than just a mystery that you'd like to solve for performance reasons) then the system design is wrong.
The problem you appear to be having is that you get delay writing to a file. I would read all the data into memory before writing to the file (or writing to a file in another thread)
However, there is no way to ensure 100% of packet are received with UDP without the ability to ask for packets to be sent again (something TCP does for you)
I see that you are using UDP to send the file contents. In UDP the order of packets is not assured. If you not worried about the order, you put all the packets in a queue and have another thread process the queue and write the contents to file. By this the socket reader thread is not blocked because of file operations.
The receive buffer size is configured at OS level.
For example on Linux system, sysctl -w net.core.rmem_max=26214400 as in this article
https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html
This is a Windows only answer, but the following changes in the Network Controller Card properties made a DRAMATIC difference in packet loss for our use-case.
We are consuming around 200 Mbps of UDP data and were experiencing substantial packet loss under moderate server load.
The network card in use is an Asus ROG Aerion 10G card, but I would expect most high-end network controller cards to expose similar properties. You can access them via Device Manager->Network card->Right-Click->Properties->Advanced Options.
1. Increase number of Receive Buffers:
Default value was 512; we could increase it up to 1024. In our case, higher settings were accepted, but the network card becomes disabled once we exceed 1024. Having a larger number of available buffers at the network-card level gives the system more tolerance to latency in transferring data from the network card buffers to the socket buffers where our apps finally can read the data.
2. Set Interrupt Moderation Rate to 'Off':
If I understood correctly, interrupt moderation coalesces multiple "buffer fill" notifications (via interrupts) into a single notification. So, the CPU will be interrupted less-often and fetch multiple buffers during each interrupt. This reduces CPU usage, but increases the chance a ready buffer is overwritten before being fetched, in case the interrupt is serviced late.
Additionally, we increased the socket buffer size (as the OP already did) and also enabled Circular Buffering at the socket level, as suggested by Len Holgate in a comment, this should also increase tolerance to latency in processing the socket buffers.