So this is basically my situation.
I have the server and client both connected and running. The way it works is the server sends information to the client, then WAITS until it receives a response from the client, then the server sends information once again to the client, and this process repeats again and again.
This means they are basically just communicating back and forth. The data they are sending varies: sometimes it's large, sometimes small. Needless to say, even if it is very small I still flush it out, because the other side cannot proceed until it receives it.
I was wondering if I should use smaller or larger buffer sizes for sending/receiving. I feel like it may be a waste to have a huge buffer size like 65536 and then flush out something as small as "Hello", but then again sometimes the data can be as large as 100 megabytes.
Anybody have any clue on what I should do?
Is it a waste to have a large buffer size if I flush out a small amount of data?
Is there a way to adjust buffer size depending on the data being sent?
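To illustrate, here is roughly the shape of the loop I mean (a simplified sketch, not my actual code; the length-prefixed format, the 8192 buffer size, and the nextMessage supplier are just stand-ins):

    import java.io.BufferedInputStream;
    import java.io.BufferedOutputStream;
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.net.Socket;
    import java.util.function.Supplier;

    // Simplified sketch of the server side of the back-and-forth loop.
    static void serve(Socket socket, Supplier<byte[]> nextMessage) throws Exception {
        DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(socket.getOutputStream(), 8192)); // the buffer size in question
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(socket.getInputStream(), 8192));
        while (true) {
            byte[] message = nextMessage.get();   // sometimes "Hello", sometimes ~100 MB
            out.writeInt(message.length);         // length prefix so the client knows how much to read
            out.write(message);
            out.flush();                          // must flush, even for tiny messages
            int len = in.readInt();               // block here until the client responds
            byte[] response = new byte[len];
            in.readFully(response);
        }
    }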
Thanks! I appreciate all/any responses.
Related
I am familiar with the concept of InputStreams, buffers, and why they are useful (for example, when you need to work with data that might be larger than the machine's RAM).
I was wondering, though: how does the InputStream actually carry all that data? Could an OutOfMemoryError be caused if there is TOO much data being transferred?
Case-scenario
If I connect from a client to a server, requesting a 100GB file, the server starts iterating through the bytes of the file with a buffer and writing them back to the client with outputStream.write(byte[]). The client is not ready to read its InputStream right now, for whatever reason. Will the server continue sending the bytes of the file indefinitely? And if so, won't the OutputStream/InputStream end up larger than the RAM of one of these machines?
InputStream and OutputStream implementations do not generally use a lot of memory. In fact, the word "Stream" in these types means that it does not need to hold the data, because it is accessed in a sequential manner -- in the same way that a stream can transfer water between a lake and the ocean without holding a lot of water itself.
But "stream" is not the best word to describe this. It's more like a pipe, because when you transfer data from a server to a client, every stage transfers back-pressure from the client that controls the rate at which data gets sent. This is similar to how your faucet controls the rate of flow through your pipes all the way to the city reservoir:
As the client reads data, its InputStream only requests more data from the OS when its internal (small) buffers are empty. Each request allows only a limited amount of data to be transferred;
As data is requested from the OS, the OS's own internal buffer empties, and it notifies the server about how much space there is for new data. The server can send only that much (that's called 'flow control' in TCP: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Resource_usage)
On the server side, the server-side OS sends out data from its own internal buffer when the client has space to receive it. As its own internal buffer empties, it allows the writing process to re-fill it with more data.
As the server-side process write()s to its OutputStream, the OutputStream will try to write data to the OS. When the OS buffer is full, it will make the server process wait until the server-side buffer has space to accept new data.
Notice that a slow client can make the server process take a very long time. If you're writing a server, and you don't control the clients, then it's very important to consider this and to ensure that there are not a lot of server-side resources tied up while a long data transfer takes place.
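To make that concrete, here is a minimal sketch of a server-side copy loop (file path and 8 KB buffer size are arbitrary). Only one small buffer is ever held by the process; the write() call simply blocks whenever the OS send buffer is full, which is the back-pressure described above:

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.io.OutputStream;

    // Stream a file of any size to a client while holding only 8 KB in the process.
    static void sendFile(String path, OutputStream clientOut) throws Exception {
        byte[] buf = new byte[8 * 1024];
        try (InputStream file = new FileInputStream(path)) {
            int n;
            while ((n = file.read(buf)) != -1) {
                // Blocks here when the OS socket buffer is full and the client
                // is not reading -- back-pressure, not memory growth.
                clientOut.write(buf, 0, n);
            }
        }
        clientOut.flush();
    }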
Your question is as interesting as it is difficult to answer properly.
First: InputStream and OutputStream are not a storage means, but an access means: They describe that the data shall be accessed in sequential, unidirectional order, but not how it shall be stored. The actual way of storing the data is implementation-dependent.
So, could there be an InputStream that stores the whole amount of data in memory simultaneously? Yes, there could be, though it would be an appalling implementation. The most common and sensible implementation of InputStreams / OutputStreams stores just a fixed, small amount of data in a temporary buffer of 4K-8K, for example.
(So far, I suppose you already knew that, but it was necessary to say.)
Second: What about connected writing / reading streams between a server and a client? In a common scenario of buffered writing, the server will not write more data than the buffer allows. So, if the server starts writing and the client then goes down (for whatever reason), the server will just keep writing until the buffer is full, then mark it as ready for reading, and until the read is completed (by the client peer), the server won't fill the buffer again. Remember: this kind of read/write is blocking: the client blocks until there is a buffer ready to be read, and the server (or, at least, the server thread bound to this connection) blocks until the last read is completed.
How long will the server block? Typically, a server should have a safety timeout to ensure that long blocks break the connection, thus releasing the blocked thread. The client should have the same.
The timeouts set for the connection depend on the implementation, and the protocol.
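For example, with plain Java sockets those safety timeouts might look like this (the 30-second values are placeholders; choose them to suit your protocol):

    import java.net.InetSocketAddress;
    import java.net.ServerSocket;
    import java.net.Socket;

    // Safety timeouts on both peers: a blocked read gives up after 30 s
    // (SocketTimeoutException) instead of tying up the thread forever.
    static Socket acceptClient(ServerSocket server) throws Exception {
        Socket connection = server.accept();
        connection.setSoTimeout(30_000);                            // read timeout on the server side
        return connection;
    }

    static Socket connectToServer(String host, int port) throws Exception {
        Socket client = new Socket();
        client.connect(new InetSocketAddress(host, port), 30_000);  // connect timeout
        client.setSoTimeout(30_000);                                // read timeout on the client side
        return client;
    }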
No, it does not need to hold all the data. It just advances forward through the file (usually using buffered data). The stream can discard old buffers as it pleases.
Note that there are a lot of very different implementations of InputStream, so the exact behaviour varies a lot.
I have written a small program in Java which reads info from a serial port (some text messages and also some binary data, mixed) and then sends it through a TCP socket.
At first I tried to get the data into a buffer and send everything when a '\n' arrived, but binary data doesn't work that way.
My second thought was sending the data all the time, byte by byte. But it gets too slow, and I guess I'm losing data while sending it, even though I used threads.
Next idea would be sending data when I reach 100 bytes stored for example, but it is a dirty fix.
My serial data comes from a GPS receiver, and these devices usually send many sentences of data (text and binary) each second (or can be configured for 10, 50, or 100 Hz, depending on the model). Anyway, there is a gap of time where data is received, and a considerably larger gap of time when nothing is being received. I guess the cleanest way to send off my data would be to detect that gap and send all the stored data then, before new data arrives when the gap ends.
How could I detect this kind of event?
EDIT: I use the RXTX package to handle the serial port, and I'm working on Windows 7 64-bit.
To detect the gap, just enable a receive timeout with enableReceiveTimeout.
Just read and write byte[] buffers as you get them, making sure to only write the actual count returned by the read() method.
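A minimal sketch of that approach, assuming RXTX's SerialPort and a buffered OutputStream around the TCP socket's stream (the 50 ms timeout is an arbitrary guess for the gap, and the value a timed-out read returns can differ between RXTX versions):

    import gnu.io.SerialPort;
    import java.io.InputStream;
    import java.io.OutputStream;

    // Forward serial data to the TCP stream, flushing whenever a read times out,
    // i.e. when the gap between GPS bursts is detected.
    static void forward(SerialPort serialPort, OutputStream tcpOut) throws Exception {
        serialPort.enableReceiveTimeout(50);      // block at most 50 ms per read
        InputStream serialIn = serialPort.getInputStream();
        byte[] buf = new byte[4096];
        while (true) {
            int n = serialIn.read(buf);           // bytes available now; returns 0 or -1
                                                  // (version-dependent) when the timeout expires
            if (n > 0) {
                tcpOut.write(buf, 0, n);          // write only the count actually read
            } else {
                tcpOut.flush();                   // nothing arrived: gap detected, push data out
            }
        }
    }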
I am receiving ~3000 UDP packets per second, each of them having a size of ~200 bytes. I wrote a Java application which listens for those UDP packets and just writes the data to a file. Then the server sends 15000 messages at the previously specified rate. After writing to the file, it contains only ~3500 messages. Using Wireshark I confirmed that all 15000 messages were received by my network interface. After that I tried changing the buffer size of the socket (which was initially 8496 bytes):
((java.net.MulticastSocket) socket).setReceiveBufferSize(32 * 1024);
That change increased the number of messages saved to ~8000. I kept increasing the buffer size up to 1MB. After that, the number of messages saved reached ~14400. Increasing the buffer size to larger values wouldn't increase the number of messages saved. I think I have reached the maximum allowed buffer size. Still, I need to capture all 15000 messages which were received by my network interface.
Any help would be appreciated. Thanks in advance.
Smells like a bug, most likely in your code. If the UDP packets are delivered over the network, they will be queued for delivery locally, as you've seen in Wireshark. Perhaps your program just isn't making timely progress on reading from its socket - is there a dedicated thread for this task?
You might be able to make some headway by detecting which packets are being lost by your program. If all the packets lost are early ones, perhaps the data is being sent before the program is ready to receive them. If they're all later ones, perhaps it exits too soon. If they're lost at regular intervals, there may be some trouble in the code that loops receiving packets, etc.
In any case you seem exceptionally anxious about lost packets. By design UDP is not a reliable transport. If the loss of these multicast packets is a problem for your system (rather than just a mystery that you'd like to solve for performance reasons) then the system design is wrong.
The problem you appear to be having is that you get delays writing to the file. I would read all the data into memory before writing to the file (or write to the file in another thread).
However, there is no way to ensure 100% of packets are received with UDP without the ability to ask for packets to be sent again (something TCP does for you).
I see that you are using UDP to send the file contents. In UDP the order of packets is not assured. If you are not worried about the order, you can put all the packets in a queue and have another thread process the queue and write the contents to the file. That way, the socket reader thread is not blocked by file operations.
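A minimal sketch of that reader/writer split, assuming a MulticastSocket and a plain output file (the 2048-byte buffer and the unbounded queue are simplifications):

    import java.io.FileOutputStream;
    import java.net.DatagramPacket;
    import java.net.MulticastSocket;
    import java.util.Arrays;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // One thread only receives; another drains the queue to disk,
    // so file I/O never stalls the socket reader.
    static void capture(MulticastSocket socket, String path) throws Exception {
        BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();

        Thread writer = new Thread(() -> {
            try (FileOutputStream out = new FileOutputStream(path)) {
                while (true) {
                    out.write(queue.take());          // blocks until a packet is queued
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        writer.start();

        byte[] buf = new byte[2048];                  // comfortably larger than the ~200-byte packets
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        while (true) {
            packet.setLength(buf.length);             // reset, or a short packet shrinks later receives
            socket.receive(packet);
            queue.add(Arrays.copyOf(packet.getData(), packet.getLength()));
        }
    }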
The receive buffer size is configured at OS level.
For example, on a Linux system: sysctl -w net.core.rmem_max=26214400, as in this article:
https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html
This is a Windows only answer, but the following changes in the Network Controller Card properties made a DRAMATIC difference in packet loss for our use-case.
We are consuming around 200 Mbps of UDP data and were experiencing substantial packet loss under moderate server load.
The network card in use is an Asus ROG Aerion 10G card, but I would expect most high-end network controller cards to expose similar properties. You can access them via Device Manager->Network card->Right-Click->Properties->Advanced Options.
1. Increase number of Receive Buffers:
The default value was 512; we could increase it up to 1024. In our case, higher settings were accepted, but the network card became disabled once we exceeded 1024. Having a larger number of available buffers at the network-card level gives the system more tolerance to latency in transferring data from the network-card buffers to the socket buffers where our apps can finally read the data.
2. Set Interrupt Moderation Rate to 'Off':
If I understood correctly, interrupt moderation coalesces multiple "buffer fill" notifications (via interrupts) into a single notification. So the CPU is interrupted less often and fetches multiple buffers during each interrupt. This reduces CPU usage, but increases the chance that a ready buffer is overwritten before being fetched if the interrupt is serviced late.
Additionally, we increased the socket buffer size (as the OP already did) and also enabled Circular Buffering at the socket level, as suggested by Len Holgate in a comment; this should also increase tolerance to latency in processing the socket buffers.
I'm writing a Java client application that will consume high rate UDP data and I want to minimize packet loss at the host/application layer (I understand there may be unavoidable loss in the network layer).
What is a reasonably high buffer size (MulticastSocket.setReceiveBufferSize())?
What is the ideal DatagramPacket buffer size? Is there a downside to using 64k?
I have very limited insight into the network topology and the sender application. This is running on Linux. TCP is not an option.
What is a reasonably high buffer size (MulticastSocket.setReceiveBufferSize())?
Figure out how much your application might jitter and the rate of data you need to receive. e.g. if your application pauses to do something for 0.5 seconds (like garbage collection), and you're receiving data at 10MB/sec, you'd need a buffer of 5MB to make up for not receiving data for those 0.5 seconds.
Note that you might need to tune the net.core.rmem_max sysctl on Linux to be allowed to set the buffers to the desired size (IIRC you actually only get half the size of what you specify in the sysctl); the default net.core.rmem_max might be rather low.
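As a sketch, you can request the computed size and then check what the OS actually granted (the 5 MB figure is just the worked example above):

    import java.net.MulticastSocket;

    // Ask for a large receive buffer and verify what was actually granted.
    static void configureReceiveBuffer(MulticastSocket socket) throws Exception {
        int wanted = 5 * 1024 * 1024;                 // e.g. 10 MB/s * 0.5 s pause = 5 MB
        socket.setReceiveBufferSize(wanted);
        int granted = socket.getReceiveBufferSize();
        if (granted < wanted) {
            // Capped by the OS; on Linux, raise net.core.rmem_max and try again.
            System.err.println("Receive buffer capped at " + granted + " bytes");
        }
    }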
What is the ideal DatagramPacket buffer size? Is there a downside to using 64k?
The ideal is the MTU of your network; for normal Ethernet, that means a UDP payload of 1472 bytes. Anything bigger is a bad idea, as it causes fragmented IP packets. IP fragmentation is generally considered a bad thing, as it adds overhead and can cause more lost data.
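For the receive side, a minimal sketch with a buffer sized to that payload (1472 is taken from the figure above; use a larger buffer if your senders might ever send bigger datagrams, since bytes beyond the buffer are silently discarded):

    import java.net.DatagramPacket;
    import java.net.MulticastSocket;

    // Receive loop with an MTU-sized payload buffer.
    static void receiveLoop(MulticastSocket socket) throws Exception {
        byte[] buf = new byte[1472];                  // one un-fragmented UDP payload on normal Ethernet
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        while (true) {
            packet.setLength(buf.length);             // reset before each receive
            socket.receive(packet);
            // handle packet.getLength() bytes starting at buf[0] here
        }
    }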
Socket send and receive buffers can be as large as you like, a megabyte or two if you want.
The maximum practical datagram size via a router is 534 bytes.
Does anyone have any tips on how to calculate the bandwidth usage of a socket?
For example, as I send data over a socket to the server I am connected to, I want to show the Kb/s that is being sent.
Google search didn't reveal anything useful. Maybe I'm searching the wrong terms.
The best you're probably going to be able to easily do is to record when you start writing and then count the bytes you successfully pass to the Socket.getOutputStream().write() method. For a small amount of data, that will be very inaccurate, as it's just filling up the OS's transmission buffer, which will initially accept bytes much faster than it actually sends them.
It should amortize to essentially the correct rate over a fairly large amount of data, however.
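A minimal sketch of that counting approach (the chunked input is just a placeholder for however you produce the data):

    import java.io.OutputStream;
    import java.net.Socket;

    // Rough sender-side throughput: bytes handed to the socket divided by elapsed time.
    // Only meaningful over a large amount of data, as noted above.
    static void sendAndMeasure(Socket socket, byte[][] chunks) throws Exception {
        OutputStream out = socket.getOutputStream();
        long start = System.nanoTime();
        long bytes = 0;
        for (byte[] chunk : chunks) {
            out.write(chunk);
            bytes += chunk.length;
            double seconds = (System.nanoTime() - start) / 1e9;
            if (seconds > 0) {
                System.out.printf("%.1f KB/s%n", (bytes / 1024.0) / seconds);
            }
        }
        out.flush();
    }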