Sockets (in Java). Splitting byte chunks

Let's say you have a continuous binary stream of data, and each piece of data has to be separated from the next somehow. What is the best way to do it?
Reading from a socket with read(byte[] arr) doesn't guarantee that you will receive exactly the same number of bytes as you sent with write(byte[] arr): arr may be split (out of 10 bytes you first read 8 and then 2) or merged with the next one.
One way of solving this is to specify the incoming byte array's size first: read exactly 4 bytes, convert them into an int x, then read x bytes. But this only works with TCP and may completely mess everything up if you send a wrong byte array size even once.
Another one I can think of is prefixing chunks of data with pseudo-random byte sequences. Initialize Random on the client and the server with the same seed and use random.nextBytes(byte[] arr) for the prefixes. The downside is that, to make it very unlikely that the random sequence shows up inside an actual data chunk, you have to make it pretty long, which adds a lot of useless traffic. And again, it is no way out for UDP sockets.
So what are other good ways of doing this, and are there any simple libraries that would let me just call conn.sendDataChunk(byte[] arr)?

One way of solving this is to specify the incoming byte array's size first: read exactly 4 bytes, convert them into an int x, then read x bytes.
Yep, that's exactly what you should do. In other words, you're adding a message header before each message. This is a practical necessity when you want to layer a message-based network protocol atop a stream-based one that has no concept of message boundaries. (TCP purposefully obscures the IP packet boundaries.)
You could also use this as an opportunity to add other fields to the message header, such as a message ID number to help you distinguish between different message types.
But this only works with TCP and may completely mess everything up if you send a wrong byte array size even once.
This is true. So don't send a malformed header!
In all seriousness, a header with a length field is standard practice. It's a good idea to add sanity checks on the length: make sure it's not too large (or negative) so you don't end up allocating 2GB of memory to read the next message.
Also, don't assume you can read the whole header with a single read(). It may take multiple reads to get the whole header.
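
A minimal sketch of the length-header approach with the sanity check described above. The class name LengthPrefixedFraming and the 1 MiB limit MAX_FRAME_SIZE are assumptions made for illustration; pick a limit that fits your own protocol. DataInputStream.readInt() and readFully() keep reading until they have all the bytes they need, so neither the header nor the body relies on a single read().

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class LengthPrefixedFraming {
    // Assumed upper bound for one message; not part of any standard.
    static final int MAX_FRAME_SIZE = 1 << 20;

    /** Writes a 4-byte big-endian length header followed by the payload. */
    static void writeFrame(OutputStream out, byte[] payload) throws IOException {
        if (payload.length > MAX_FRAME_SIZE) {
            throw new IOException("Payload too large: " + payload.length);
        }
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    /** Reads one frame; throws EOFException if the stream ends in the middle of it. */
    static byte[] readFrame(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt(); // blocks until all 4 header bytes have arrived
        if (length < 0 || length > MAX_FRAME_SIZE) {
            // Reject bad headers before allocating anything.
            throw new IOException("Invalid frame length: " + length);
        }
        byte[] payload = new byte[length];
        data.readFully(payload); // loops internally until the array is full
        return payload;
    }
}
```

With a socket you would pass socket.getOutputStream() and socket.getInputStream(). If readFrame throws because of a bad length, the stream can no longer be trusted and the usual reaction is to close the connection.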

Regarding UDP sockets: reading a UDP socket is not the same as reading a TCP socket. You either get the UDP packet or you do not. From UDP:
Datagrams – Packets are sent individually and are checked for integrity only if they arrive. Packets have definite boundaries which are honored upon receipt, meaning a read operation at the receiver socket will yield an entire message as it was originally sent.
So for UDP, you do not need to worry about reading an incorrect number of bytes. But you do have to worry about what happens if some data does not arrive, or if it arrives in a different order than it was sent.
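
One common way to notice missing or reordered datagrams is to put a sequence number in front of each payload. The sketch below only illustrates that idea: the names SequencedDatagrams and SequenceTracker are made up, the 4-byte header layout is an assumption, and wrap-around of the counter is ignored.

```java
import java.nio.ByteBuffer;

final class SequencedDatagrams {
    /** Prefixes the payload with a 4-byte sequence number. */
    static byte[] wrap(int sequence, byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length).putInt(sequence).put(payload).array();
    }

    /** Reads the sequence number back from the first 4 bytes of a received datagram. */
    static int sequenceOf(byte[] datagram) {
        return ByteBuffer.wrap(datagram).getInt();
    }
}

final class SequenceTracker {
    private int nextExpected = 0;

    /**
     * Returns how many datagrams were skipped before this one (0 means in order),
     * or -1 if it is a duplicate or arrived after a newer datagram.
     */
    int accept(int sequence) {
        if (sequence < nextExpected) {
            return -1;
        }
        int skipped = sequence - nextExpected;
        nextExpected = sequence + 1;
        return skipped;
    }
}
```

What to do with a gap or a late datagram (ignore it, buffer it, ask for a resend) is up to the application.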

Related

Netty Unable to receive whole packet

I can't figure out why I am getting the first packet split, while I receive the rest of the packets whole.
The first thing received is the IMEI (17 bytes). The Netty server sends back a 01 response, the packets start arriving, and I answer each of them with another response.
But why do I keep getting the first packet in two parts, while the others arrive as one (which is OK)? It always receives up to 1024 bytes and then the remaining 251 bytes. The whole packet is up to 1275 bytes.

Generally speaking, there is no guarantee whether a packet is split or not when using TCP, so you cannot make any assumptions about this.
That said, what you see may be the result of using AdaptiveRecvByteBufAllocator (which is the default), as it starts with small allocation sizes and then increases them if needed.
You could use a different RecvByteBufAllocator if you want to change the behaviour. But again this is nothing you can depend on.

Why use a LengthFieldPrepender/LengthFieldBasedFrameDecoder

I have thought about this for some time now: why should I use a LengthFieldPrepender and a LengthFieldBasedFrameDecoder on a TCP connection?
I don’t get the reason. My only idea was that it ensures the data is transferred correctly by checking the length, but if my understanding of TCP is correct, TCP itself should make sure that the data is transferred correctly.

TCP is a stream protocol. It is up to the application to frame the data, i.e. determine where a unit of data - a packet or a message - starts and where it ends. The two basic methods to achieve this reliably are either to prepend the length of the message or to append a delimiter. There are many ways to encode the prepended length and there are many possibilities for a delimiter. The TCP protocol does not guarantee that the data that was sent by means of a single write will be received by a single read, although this is often the case for short messages.
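
To illustrate the delimiter variant, here is a small sketch in plain Java (not Netty code; the class name DelimiterFrameDecoder is made up). It accepts whatever chunk each read happens to deliver and only emits a message once the delimiter has been seen. It assumes the delimiter byte can never occur inside a message, which rules out raw binary payloads unless you escape them.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

final class DelimiterFrameDecoder {
    private final byte delimiter;
    // Bytes of the message that is still incomplete.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    DelimiterFrameDecoder(byte delimiter) {
        this.delimiter = delimiter;
    }

    /** Feeds the bytes delivered by one read; returns every message they complete. */
    List<byte[]> feed(byte[] chunk, int length) {
        List<byte[]> messages = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            if (chunk[i] == delimiter) {
                messages.add(pending.toByteArray());
                pending.reset();
            } else {
                pending.write(chunk[i]);
            }
        }
        return messages;
    }
}
```

A message split across two reads comes out whole after the second call, and two messages arriving in one read come out as two list entries.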

You're correct that the TCP protocol should handle that the data is transferred correctly. Though in many cases, the amount of data being sent is not known at the time of reception (i.e. a variable amount).
To solve this, the length of the data being sent is added to the header of the packet. If the length field has a fixed size of n bytes, we can read those n bytes first to learn the length, and then read that many further bytes, which hold the data.
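
Conceptually, a length-field decoder has to buffer bytes until a whole frame is available. The following plain-Java sketch shows that idea only; it is not Netty's implementation, and the class name LengthFieldAccumulator, the 4-byte big-endian length field and the MAX_FRAME limit are all assumptions.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class LengthFieldAccumulator {
    private static final int MAX_FRAME = 1 << 20; // assumed sanity limit
    // Unread leftover bytes sit between position and limit.
    private ByteBuffer pending = ByteBuffer.allocate(0);

    /** Feeds the bytes delivered by one read; returns every frame they complete. */
    List<byte[]> feed(byte[] chunk, int length) {
        // Join the leftover bytes with the new chunk (simple, not tuned for speed).
        ByteBuffer merged = ByteBuffer.allocate(pending.remaining() + length);
        merged.put(pending).put(chunk, 0, length).flip();

        List<byte[]> frames = new ArrayList<>();
        while (merged.remaining() >= 4) {
            int frameLength = merged.getInt(merged.position()); // peek, do not consume yet
            if (frameLength < 0 || frameLength > MAX_FRAME) {
                throw new IllegalStateException("Invalid frame length: " + frameLength);
            }
            if (merged.remaining() < 4 + frameLength) {
                break; // body not complete yet, wait for more bytes
            }
            merged.getInt(); // consume the header
            byte[] frame = new byte[frameLength];
            merged.get(frame);
            frames.add(frame);
        }
        pending = merged; // whatever is left waits for the next call
        return frames;
    }
}
```

In Netty you would not write this yourself: putting a LengthFieldBasedFrameDecoder in the pipeline does this buffering, so the handlers after it only see complete frames.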

udp file transfer project - is error checking necessary?

I have been given the classic task of transferring files using UDP. In different resources I have read both that checking the packets for errors (adding a CRC alongside the data) is necessary and that UDP already checks for corrupted packets and discards them, so I would only need to worry about resending dropped packets.
Which one of them is correct? Do I need to manually perform an integrity check on the arrived packets or incorrect ones are already discarded?
Language for the project is Java by the way.
EDIT: Some sources (course books, the internet) say the checksum only covers the header and therefore only ensures that the sender and receiver IPs etc. are correct. Some sources say the checksum also covers the data segment. Some sources say the checksum may cover the data segment BUT that this is optional and decided by the OS.
EDIT 2: I asked my professors and they say UDP error checking on the data segment is optional in IPv4 and the default in IPv6. But I still don't know whether it is under the programmer's control, the OS's, or another layer's...

First fact:
UDP has a 16-bit checksum field starting at bit 48 of the packet header. This suffers from (at least) two weaknesses:
The checksum is not mandatory (over IPv4): a field with all bits set to 0 is defined as "no checksum".
It is a 16-bit checksum in the strict sense of the word, so it is susceptible to undetected corruption.
Together this means that UDP's built-in checksum may or may not be reliable enough, depending on your environment.
Second fact:
An even more realistic threat than data corruption along the transport is packet loss and reordering: UDP makes no guarantees that
all packets (eventually) arrive at all
packets arrive in the same sequence as sent
Indeed, UDP has no built-in mechanism at all for dealing with payloads bigger than a single packet, because it wasn't built for that.
Conclusion:
Appending packet after packet as received, without additional measures, is bound to produce a received stream that differs from the sent stream in all but the most favourable environments, which makes UDP a less than optimal protocol for direct file transfer.
If you want to or must use UDP to transfer files, you need to build those parts that are integral to TCP but not to UDP into the application. There is a saying, though, that this will most likely result in an inferior reimplementation of TCP.
Successful implementations include many peer-to-peer file-sharing protocols, where protection against connection interruption and packet loss or reordering needs to be part of the application functionality anyway to defeat or mitigate filters.
Implementation recommendations:
What has worked for us is a chunked window implementation: the payload is separated into chunks of a fixed and convenient length (we used 1023 bytes), and a status array of N such chunks is kept on the sending and the receiving end.
On the sending side:
A UDP message is initiated, containing such a chunk, its sequence number in the stream (more than once) and a checksum or hash.
The status array marks this chunk as "sent/pending" with a timestamp.
Sending stops if the complete status array (send window) is consumed.
On the receiving side:
Received packets are checked against their checksum.
Corrupted packets are negatively acknowledged if all copies of the sequence number agree, and dropped otherwise.
OK packets are marked in the status array as "received/pending" with a timestamp.
Acknowledgement works by sending an ack packet if either enough chunks have been received to fill an ack packet, or the timestamp of the oldest "received/pending" chunk grows too old (some ms to some 100 ms).
Ack packets need checksumming, but no sequencing.
Chunks, for which an ack has been sent, are marked as "ack/pending" with timestamp in the status array
On the sending side:
Ack packets are received and checked, corrupted packets are dropped
Chunks, for which an ack was received, are marked as "ack/done" in the status array
If the first chunk in the status array is marked "ack/done", the status array slides up until its first chunk is again one that is not marked done.
This possibly releases one or more unsent chunks to be sent.
For chunks in status "sent/pending", a timeout on the timestamp triggers a new send for this chunk, as the original chunk might have been lost.
On the receiving side:
Reception of chunk i+N (N being the window width) marks chunk i as "ack/done", sliding up the receive window. If not all chunks sliding out of the receive window are marked as "ack/pending", this constitutes an unrecoverable error.
For chunks in status "ack/pending", a timeout on the timestamp triggers a new ack for this chunk, as the original ack message might have been lost.
Obviously, a special message type from the sending side is needed when the send window slides past the end of the file, to signal reception of an ack without sending chunk N+i. We implemented it by simply sending N chunks more than exist, but without a payload.
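
The sender half of such a window could be sketched roughly as below. This is not the implementation described above, only an illustration under simplifying assumptions: the class name SendWindow is made up, the status array covers the whole file rather than just N entries, acks are per chunk, and negative acks and the end-of-file message are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SendWindow {
    enum State { UNSENT, SENT_PENDING, ACK_DONE }

    private final State[] states;   // one entry per chunk of the file
    private final long[] sentAt;    // time of the most recent send of each chunk
    private final int windowSize;   // N in the description
    private int base = 0;           // first chunk that is not yet "ack/done"

    SendWindow(int totalChunks, int windowSize) {
        this.states = new State[totalChunks];
        Arrays.fill(states, State.UNSENT);
        this.sentAt = new long[totalChunks];
        this.windowSize = windowSize;
    }

    /** Returns the chunks in the window that are unsent or timed out, marking them "sent/pending". */
    List<Integer> chunksToSend(long now, long timeoutMillis) {
        List<Integer> result = new ArrayList<>();
        int end = Math.min(base + windowSize, states.length);
        for (int i = base; i < end; i++) {
            boolean timedOut = states[i] == State.SENT_PENDING && now - sentAt[i] >= timeoutMillis;
            if (states[i] == State.UNSENT || timedOut) {
                states[i] = State.SENT_PENDING;
                sentAt[i] = now;
                result.add(i);
            }
        }
        return result;
    }

    /** Marks a chunk as "ack/done" and slides the window past leading done chunks. */
    void onAck(int chunk) {
        if (chunk < base || chunk >= states.length) {
            return; // stale or bogus ack
        }
        if (states[chunk] == State.SENT_PENDING) {
            states[chunk] = State.ACK_DONE;
        }
        while (base < states.length && states[base] == State.ACK_DONE) {
            base++;
        }
    }

    boolean isComplete() {
        return base == states.length;
    }
}
```

The caller would invoke chunksToSend periodically, send one UDP message per returned index, and call onAck for every chunk listed in a valid ack packet.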

You can be reasonably sure the packets you receive are the same as what was sent (i.e. if you send packet A and receive packet A, they are almost certainly identical). The transport-layer checksum on each packet takes care of this. Since UDP does not guarantee delivery, however, you need to make sure you received everything that was sent and that you put it in the correct order.
In other words, if packets A, B, and C were sent in that order you might actually receive only A and B (or none). You might get them out of order, C, B, A. So your checking needs to take care of the guaranteed delivery aspect that TCP provides (verify ordering, ensure all the data is there, and notify the server to resend whatever you didn't receive) to whatever degree you require.
The reason to prefer UDP over TCP is that for some applications neither data ordering nor data completeness matter. For example, when streaming AAC audio packets the individual audio frames are so small that a small amount of them can be safely discarded or played out of order without disrupting the listening experience to any significant degree. If 99.9% of the packets are received and ordered correctly you can play the stream just fine and no one will notice. This works well for some cellular/mobile applications and you don't even have to worry about resending missed frames (note that Shoutcast and some other servers do use TCP for streaming in some cases [to facilitate in-band metadata], but they don't have to).
If you need to be sure all the data is there and ordered correctly, then you should use TCP, which will take care of verifying that data is all there, ordering it correctly, and resending if necessary.

UDP uses the same strategy for detecting corrupted packets that TCP uses: a 16-bit checksum in the packet header.
The UDP packet structure is well known (as is TCP's), so a packet can easily be tampered with if it is not encrypted. Adding another checksum (for instance CRC-32) would make it more robust against corruption. If the data is going to be encrypted anyway (manually or over an SSL channel), I wouldn't bother adding another checksum.
Please also take into consideration that a packet can be delivered twice. Make sure you deal with that accordingly.
You can check both packet structures on Wikipedia; both have checksums:
Transmission Control Protocol
User Datagram Protocol
You can check the TCP packet structure with more detail to get tips on how to deal with dropped packets. TCP protocol uses a "Sequence Number" and "Acknowledgment Number" for that purpose.
I hope this helps, and good luck.

UDP will drop packets that fail the internal per-packet checksum. A check at the application layer is still useful once a payload appears to be complete: it confirms that what was received is actually complete (no dropped packets) and matches what was sent. Note that a plain CRC only catches accidental corruption; protecting against man-in-the-middle or other attacks takes a keyed MAC or encryption.
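
As a rough sketch of an application-level check, the following appends a CRC-32 (java.util.zip.CRC32) to a payload and verifies it on receipt. The class name ChecksummedPacket and the layout (payload followed by a 4-byte CRC) are assumptions for illustration. A CRC only detects accidental corruption, not deliberate tampering.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

final class ChecksummedPacket {
    /** Returns payload followed by its 4-byte CRC-32. */
    static byte[] seal(byte[] payload) {
        return ByteBuffer.allocate(payload.length + 4)
                .put(payload)
                .putInt(crcOf(payload))
                .array();
    }

    /** Returns the payload, or null if the packet is too short or the CRC does not match. */
    static byte[] open(byte[] packet, int length) {
        if (length < 4) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(packet, 0, length);
        byte[] payload = new byte[length - 4];
        buf.get(payload);
        int expected = buf.getInt();
        return crcOf(payload) == expected ? payload : null;
    }

    private static int crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return (int) crc.getValue(); // the low 32 bits hold the whole CRC
    }
}
```

The same idea works per file instead of per packet: compute one checksum or hash over the whole file and compare it after reassembly.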

Redirect serial port output to a socket

I have written a small program in Java which reads data from a serial port (some text messages and also some binary data, mixed) and then sends it through a TCP socket.
At first I tried to collect data in a buffer and send everything when a '\n' arrived, but binary data doesn't work that way.
My second thought was sending the data all the time, byte by byte. But that gets too slow, and I guess I'm losing data while sending it, even though I used threads.
The next idea would be sending data whenever I have, say, 100 bytes stored, but that is a dirty fix.
My serial data comes from a GPS receiver, and these devices usually send many sentences of data (text and binary) each second (or configurable to 10, 50 or 100 Hz, depending on the model). Anyway, there is a span of time in which data is received, and a considerably larger span of time in which nothing is received. I guess the fanciest way to send off my data would be to detect that gap and send all the stored data then, before new data arrives when the gap ends.
How could I detect this kind of event?
EDIT: I use the RXTX package to access the serial port, and I'm working on 64-bit Windows 7.

To detect the gap, just enable a receive timeout with enableReceiveTimeout.

Just read and write byte[] buffers as you get them, making sure to only write the actual count returned by the read() method.
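
A minimal sketch of that loop, assuming in is the serial port's InputStream and out is the socket's OutputStream (the class name StreamPump and the 4096-byte buffer size are arbitrary choices):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamPump {
    /** Copies everything from in to out until end of stream; returns the number of bytes copied. */
    static long pump(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int count;
        while ((count = in.read(buffer)) != -1) {
            if (count == 0) {
                continue; // a read may return no data, e.g. after a receive timeout
            }
            out.write(buffer, 0, count); // only the bytes actually read, not the whole buffer
            out.flush();                 // forward promptly instead of waiting for more
            total += count;
        }
        return total;
    }
}
```

Because read() returns as soon as some data is available, each burst from the device tends to be forwarded in a few large writes rather than byte by byte.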

In Java, how do I deal with UDP messages that are greater than the maximum UDP data payload?

I read this question about the error that I'm getting and I learned that UDP data payloads can't be more than 64k. The suggestions that I've read are to use TCP, but that is not an option in this particular case. I am interfacing with an external system that is transmitting data over UDP, but I don't have access to that external system at this time, so I'm simulating it.
I have data messages that are upwards of 1,400,000 bytes in some instances, and it's a requirement that UDP is used. I am not able to change protocols (I would much rather use TCP or a reliable protocol built on UDP). Instead, I have to find a way to transmit large payloads over UDP from a test application into the system that I am building, and to read those large payloads in that system for processing. I don't have to worry about dropped packets, either: if I don't get the datagram, I don't care, I just wait for the next payload to arrive. If it's incomplete or missing, I just throw it all away and continue waiting. I also don't know the size of the datagram in advance (they range from a few hundred bytes to 1,400,000+ bytes).
I've already set my send and receive buffer sizes large enough, but that's not sufficient. What else can I do?

UDP packets have a 16-bit length field. It has nothing to do with Java. They cannot be bigger, period. If the server you are talking to is immutable, you are stuck with what you can fit into a packet.
If you can change the server and thus the protocol, you can more or less reimplement TCP for yourself. Since UDP is defined to be unreliable, you need the full retransmission mechanism to cope with packets that are dropped in the network somewhere. So, you have to split the 'message' into chunks, send the chunks, and have a protocol for requesting retransmission of lost chunks.
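
If you control both ends, the splitting part could look roughly like the sketch below. The 12-byte header (message id, fragment index, fragment count) and the class names Fragmenter and Reassembler are invented for illustration and are not any existing protocol. There is no retransmission here: an incomplete message is simply thrown away when a fragment of a different message arrives, as the question asks for.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class Fragmenter {
    static final int HEADER_SIZE = 12; // messageId, index, count: three 4-byte ints

    /** Splits a message into datagram payloads carrying at most maxChunk data bytes each. */
    static List<byte[]> split(int messageId, byte[] message, int maxChunk) {
        int count = Math.max(1, (message.length + maxChunk - 1) / maxChunk);
        List<byte[]> datagrams = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int offset = i * maxChunk;
            int size = Math.min(maxChunk, message.length - offset);
            datagrams.add(ByteBuffer.allocate(HEADER_SIZE + size)
                    .putInt(messageId).putInt(i).putInt(count)
                    .put(message, offset, size)
                    .array());
        }
        return datagrams;
    }
}

final class Reassembler {
    private int currentId;
    private byte[][] parts; // null while no message is in progress
    private int received;

    /** Returns the full message once its last missing fragment arrives, otherwise null. */
    byte[] accept(byte[] datagram, int length) {
        if (length < Fragmenter.HEADER_SIZE) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(datagram, 0, length);
        int id = buf.getInt();
        int index = buf.getInt();
        int count = buf.getInt();
        if (count <= 0 || index < 0 || index >= count) {
            return null; // malformed header
        }
        if (parts == null || id != currentId || count != parts.length) {
            // A different message has started: drop whatever was incomplete.
            currentId = id;
            parts = new byte[count][];
            received = 0;
        }
        if (parts[index] != null) {
            return null; // duplicate fragment
        }
        parts[index] = new byte[buf.remaining()];
        buf.get(parts[index]);
        if (++received < count) {
            return null;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part, 0, part.length);
        }
        parts = null;
        return out.toByteArray();
    }
}
```

Each element returned by split would go into its own DatagramPacket. Keeping maxChunk well below 1500 bytes avoids IP fragmentation on typical Ethernet paths.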

It's a requirement ...
The requirement should therefore also dictate the packetization technique. You need more information about the external system and its protocol. Note that the maximum IPv4 UDP payload is 65535-28 bytes, and the maximum practical payload is < 1500 bytes once a router gets involved.
