Corrupt Protocol Buffers Messages - java

I am using Protocol Buffers for Swift (latest from CocoaPods) and Google's official Java Protocol Buffers library (version 2.6.0) to pass messages between a Java server (ServerSocket) and a Swift iOS app (GCDAsyncSocket).
Most messages (several hundred per second; I am streaming audio as float arrays, among other things) flow just fine. Occasionally, however, a message from client to server won't parse. The Java code throws a
com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero)
On both ends I am sending a 4-byte Big-Endian integer representing the number of bytes to follow, then the raw protobuf message. On both ends I am receiving the number of bytes to follow, blocking until I get that many bytes, and then attempting to parse.
There are no errors observed in the Java->Swift direction, only Swift->Java.
The vast majority of messages are fine. The problem appears to increase in frequency with the number of messages being processed.
In Java, each client has a thread talking to it and a thread listening to it. The listener thread pulls messages off the wire and puts them into per-client LinkedBlockingQueues. The talking thread pulls messages off the LinkedBlockingQueue for its client, serializes them, and sends them to that client's output stream.
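The Java sending code isn't shown; below is a minimal sketch of what such a per-client talker thread might look like, assuming a DataOutputStream wrapping each client's socket and the 4-byte big-endian framing described above (class and field names are illustrative, not the poster's code):

import com.google.protobuf.Message;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative talker thread: drains one client's queue and writes framed messages.
class Talker implements Runnable {
    private final LinkedBlockingQueue<Message> queue;
    private final DataOutputStream out; // wraps this client's socket OutputStream

    Talker(LinkedBlockingQueue<Message> queue, DataOutputStream out) {
        this.queue = queue;
        this.out = out;
    }

    public void run() {
        try {
            while (true) {
                Message msg = queue.take();         // block until a message is queued
                byte[] payload = msg.toByteArray(); // serialize the protobuf message
                out.writeInt(payload.length);       // 4-byte big-endian length prefix
                out.write(payload);                 // raw protobuf bytes
                out.flush();
            }
        } catch (InterruptedException | IOException e) {
            e.printStackTrace();
        }
    }
}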
Swift sending side:
// Take a messageBuilder, serialize and transmit it
func transmit(messageBuilder: Message_.Builder) {
    do {
        messageBuilder.src = self.networkID;
        let data = try messageBuilder.build().data()
        var dataLength = CFSwapInt32HostToBig(UInt32(data.length))
        self.socket.writeData(NSData(bytes: &dataLength, length: 4), withTimeout: 1, tag: 0)
        self.socket.writeData(data, withTimeout: 1, tag: 0)
    } catch let error as NSError {
        NSLog("Failed to transmit.")
        NSLog(error.localizedDescription)
    }
}
Java receiving side:
public void run() {
    while (true) {
        try {
            byte[] lengthField = new byte[4];
            try {
                ghost.in.readFully(lengthField, 0, 4);
            } catch (EOFException e) {
                e.printStackTrace();
                ghost.shutdown();
                return;
            }
            int bytesToRead = ByteBuffer.wrap(lengthField).order(ByteOrder.BIG_ENDIAN).getInt();
            byte[] wireMessage = new byte[bytesToRead];
            in.readFully(wireMessage, 0, bytesToRead);
            HauntMessaging.Message message = HauntMessaging.Message.parseFrom(wireMessage);
            // do something with the message
        } catch (IOException e) {
            e.printStackTrace();
            ghost.shutdown();
            return;
        }
    }
}
Any ideas?

To debug protocol buffer messages:
capture the packet in Wireshark
right-click on the packet subsection that contains only the protobuf message and copy the hex stream
use a hex editor to save the hex stream to a file
protoc --decode_raw < file and match the output tags and data to the tags in your .proto file
Given the exception message, Protocol message contained an invalid tag (zero), I suspect that Swift failed to build the protobuf message and sent an empty message.
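If Wireshark isn't handy, the same bytes can be captured on the Java side instead: a minimal sketch of a helper (the file name is illustrative) that saves an unparseable frame to disk so it can be run through protoc --decode_raw:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class FrameDump {
    // Hypothetical helper: call this when parseFrom(wireMessage) throws, then inspect
    // the captured bytes offline with `protoc --decode_raw < bad-frame.bin`.
    static void dumpBadFrame(byte[] wireMessage) {
        try {
            Files.write(Paths.get("bad-frame.bin"), wireMessage);
        } catch (IOException io) {
            io.printStackTrace();
        }
    }
}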

Got it!
The two consecutive calls to socket.writeData were not atomic, and they were being made from several threads. The writes were getting interleaved, so the stream could carry one message's length followed by a different message's length (and/or somebody else's payload).
Wrapping those two calls in a dispatch_async block on a serial queue (DISPATCH_QUEUE_SERIAL) fixed the problem.
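The accepted fix above is on the Swift side, but the same hazard exists anywhere more than one thread can write to the same socket. A Java-side sketch of one equivalent guard (not the poster's code; names are illustrative) is to build the length prefix and payload into a single buffer and serialize the write with a lock:

import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

final class FramedWriter {
    // Illustrative: one frame = 4-byte big-endian length + payload, written in a single
    // call and guarded by a lock so two threads cannot interleave their frames.
    static void writeFrame(DataOutputStream out, byte[] payload) throws IOException {
        ByteBuffer frame = ByteBuffer.allocate(4 + payload.length);
        frame.putInt(payload.length); // ByteBuffer is big-endian by default
        frame.put(payload);
        synchronized (out) {
            out.write(frame.array());
            out.flush();
        }
    }
}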

Related

Send message through Socket Outputstream from Swift Client to Java Server

As the title says, I'm having some problems connecting my Swift Client to a Java Server by using only native libraries.
Right now I have a Java(client) to Java(server) connection, which works flawlessly, and a Swift(client) to Java(server) connection. My Swift client does connect to the Java server (it shows on the console), but the readFully of my server hangs - the only thing I see is the enormous message length on the console and that's pretty much it.
Swift Code (send):
func send(message: String) {
    let data = "msg:\(message)".data(using: .utf8)!
    data.withUnsafeBytes {
        guard let pointer = $0.baseAddress?.assumingMemoryBound(to: UInt8.self) else {
            print("Error joining chat")
            return
        }
        out!.write(pointer, maxLength: data.count)
    }
}
Java Server (read):
int length = in.readInt();
System.out.println("Message length: " + length);
if (length > 0) {
    byte[] message = new byte[length];
    in.readFully(message, 0, message.length); // This is where the problem is
}
So, basically, the Java server hangs on the readFully method. Any ideas why that is? Whether I send a simple String or a byte array, it always hangs, and for some reason the length is extremely big. Can someone help me understand what I'm doing wrong? I've been stuck on this for days and still haven't figured it out (I'm still pretty new to Swift).
Thank you very much!
In your Java code, you expect the first 4 bytes to be the length of the message, but in the Swift code, you didn't send the length of the message first. You sent the message itself directly. So now the Java code is left confused as to why the message is shorter than it expected, and waiting for more data to read.
You can try sending the number of bytes in data first:
let byteCountAsByteArray = withUnsafeBytes(of: Int32(data.count).bigEndian, Array.init)
out!.write(byteCountAsByteArray, maxLength: 4)
out!.write(pointer, maxLength: data.count)
I suspect that you are using a DataInputStream in your Java code, hence .bigEndian. But if you are reading this in a little endian way, use .littleEndian instead.

Sending Protocol Buffer encoded message from Python Server to Java Client

I'm writing a little server that uses protocol buffer to encode some data.
TCP Socket is opened between Android Client and Python Server
Android Client sends string for processing as normal newline delimited utf-8.
Python Server does some processing to generate a response, which gives an Array of Int Arrays: [[int]]. This is encoded in the protocol buffer file:
syntax = "proto2";
package tts;

message SentenceContainer {
    repeated Sentence sentence = 1;
}

message Sentence {
    repeated uint32 phonemeSymbol = 1;
}
It gets loaded into this structure and sent as follows...
container = ttsSentences_pb2.SentenceContainer()
for sentence in input_sentences:
    phonemes = container.sentence.add()
    # Add all the phonemes to the phoneme list
    phonemes.phonemeSymbol.extend(processor.text_to_sequence(sentence))

payload = container.SerializeToString()
client.send(payload)
Android Client receives Protocol Buffer encoded message and tries to decode.
This is where I'm stuck...
// I get the InputStream when the TCP connection is first opened
bufferIn = socket.getInputStream();
TtsSentences.SentenceContainer sentences = TtsSentences.SentenceContainer.parseDelimitedFrom(bufferIn);
When receiving the message the client gets this exception:
E/TCP: Server Error
com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:164)
at com.google.protobuf.GeneratedMessageLite.parsePartialDelimitedFrom(GeneratedMessageLite.java:1527)
at com.google.protobuf.GeneratedMessageLite.parseDelimitedFrom(GeneratedMessageLite.java:1496)
at com.tensorspeech.tensorflowtts.TtsSentences$SentenceContainer.parseDelimitedFrom(TtsSentences.java:221)
at com.tensorspeech.tensorflowtts.network.PersistentTcpClient.run(PersistentTcpClient.java:100)
at com.tensorspeech.tensorflowtts.MainActivity.lambda$onCreate$0$MainActivity(MainActivity.java:71)
at com.tensorspeech.tensorflowtts.-$$Lambda$MainActivity$NTUE8bAusaoF3UGkWb7-Jt806BY.run(Unknown Source:2)
at java.lang.Thread.run(Thread.java:919)
I already know this problem is caused by protocol buffers not being self-delimiting, but I'm not sure how I'm supposed to properly delimit the messages. I've tried appending a newline (client.send(payload + b'\n')) and prepending the message size in bytes (client.send(container.ByteSize().to_bytes(2, 'little') + payload)), but I am not sure how to proceed.
It's a shame there's no documentation on how to use Protocol Buffer over TCP Sockets in Java...
OK, I worked this out...
In the case where you have a short-lived connection, the socket closing would signify the end of the payload, so no extra logic is required.
In my case, I have a long-lived connection, so closing the socket to signify the end of the payload wouldn't work.
With a Java Client & Server, you could get around this by using:
MessageLite.writeDelimitedTo(OutputStream)
then on the recipient side:
MessageLite.parseDelimitedFrom(InputStream).
Easy enough...
But in the Python API, there is no writeDelimitedTo() function, so we must recreate what writeDelimitedTo() does. Fortunately, it's simple: it just prefixes the message with a varint (_VarintBytes) encoding of the payload size!
from google.protobuf.internal.encoder import _VarintBytes

client, _ = socket.accept()
payload = your_PB_item.SerializeToString()
size = len(payload)  # payload is already serialized bytes, so its length is the size to send
client.send(_VarintBytes(size) + payload)
Then on the Java recipient side...
bufferIn = socket.getInputStream();
yourPbItem message;
if ((message = yourPbItem.parseDelimitedFrom(bufferIn)) != null) {
    // Do stuff :)
}
This way, your protocol buffer library knows exactly how many bytes to read and when to stop reading from the InputStream, rather than sitting and listening indefinitely.

Transmitting/receiving compressed data with sockets: how to properly receive the data sent from the client

I have developed a client-server chat using the Sockets and it works great, but when I try to transmit data with Deflate compression it doesn't work: the output is "empty" (actually it's not empty, but I'll explain below).
The compression/decompression part is 100% working (I have already tested it), so the problem must be elsewhere in the transmission/receiving part.
I send the message from the client to the server using these methods:
// streamOut is an instance of DataOutputStream
// message is a String
if (zip) { // zip is a boolean variable: true means that compression is active
    streamOut.write(Zip.compress(message)); // Zip.compress(String) returns a byte[] array of the compressed "message"
} else {
    // if compression isn't active, the client sends the uncompressed message to the server (and this works great)
    streamOut.writeUTF(message);
}
streamOut.flush();
And the server receives the message from the client using these other methods:
// streamIn is an instance of DataInputStream
if (server.zip) { // same as before: true = compression is active
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    byte[] buf = new byte[512];
    int n;
    while ((n = streamIn.read(buf)) > 0) {
        bos.write(buf, 0, n);
    }
    byte[] output = bos.toByteArray();
    System.out.println("output: " + Zip.decompress(output)); // Zip.decompress(byte[]) returns a String of the decompressed byte[] array received
} else {
    System.out.println("output: " + streamIn.readUTF()); // this works great
}
Debugging a little bit my program, I've discovered that the while loop never ends, so:
byte[] output = bos.toByteArray();
System.out.println("output: " + Zip.decompress(output));
is never called.
If I put those 2 lines of code in the while loop (after bos.write()), then all works fine (it prints the message sent from the client)! But I don't think that's the solution, because the byte[] array received may vary in size. Because of this I assumed that the problem is in the receiving part (the client is actually able to send data).
So my problem became the while loop in the receiving part. I tried with:
while ((n = streamIn.read(buf)) != -1) {
and even with the condition != 0, but it's the same as before: the loop never ends, so the output part is never called.
-1 is only returned when the socket is closed or broken. You could close the socket after sending your zipped content, and your code would start working. But I suspect you want to keep the socket open for more (future) chat messages. So you need some other way of letting the client know when a discrete message has been fully transmitted. Like Patrick suggested, you could transmit the message length before each zipped payload.
You might be able to leverage something in the deflate format itself, though. I think it has a last-block-in-stream marker. If you're using java.util.zip.Inflater have a look at Inflater.finished().
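The question's Zip.decompress isn't shown, but for reference, a decompressor built on java.util.zip.Inflater can use finished() to detect the end of a deflate stream without waiting for the socket to close. A minimal sketch (class and method names are illustrative):

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class DeflateUtil {
    // Illustrative decompressor: stops once the deflate stream's final block is reached.
    static String decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && inflater.needsInput()) {
                break; // input ran out before the final block; the data is truncated
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}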
The read function will not return a -1 until the stream is closed. What you can do is calculate the number of bytes that should be sent from the server to the client, and then read that number of bytes on the client side.
Calculating the number of bytes is as easy as sending the length of the byte array returned from the Zip.compress function before the actual message, and then using the readInt function to get that number.
Using this algorithm makes sure that you read the correct number of bytes before decompressing, so even if the client actually reads 0 bytes it will continue to read until it receives all the bytes it wants. You can do streamIn.read(buf, 0, Math.min(bytesLeft, buf.length)) to only read as many bytes as you still need.
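A minimal sketch of that approach, reusing the streams and Zip helpers from the question (so this assumes streamOut/streamIn are the DataOutputStream/DataInputStream shown above): the sender writes the compressed payload's length first, and the receiver loops until exactly that many bytes have arrived.

// Sender: length prefix, then the compressed bytes.
byte[] compressed = Zip.compress(message);
streamOut.writeInt(compressed.length);
streamOut.write(compressed);
streamOut.flush();

// Receiver: read exactly that many bytes before decompressing.
int bytesLeft = streamIn.readInt();
ByteArrayOutputStream bos = new ByteArrayOutputStream();
byte[] buf = new byte[512];
while (bytesLeft > 0) {
    int n = streamIn.read(buf, 0, Math.min(bytesLeft, buf.length));
    if (n < 0) {
        break; // connection closed before the full payload arrived
    }
    bos.write(buf, 0, n);
    bytesLeft -= n;
}
System.out.println("output: " + Zip.decompress(bos.toByteArray()));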
Your problem is the way you are working with the stream. You must send some metadata so your client knows what data to expect. Ideally you would define a protocol/state machine to read the stream. For your example, as a quick and dirty solution, send something like the data size or a termination sequence.
Example of a solution:
Server: send the "data size" before the compressed data.
Client: read the "data size" first, then loop until that many bytes have been read. Something like:
while (dataRead < dataExpected) {
    int n = streamIn.read(buf, 0, Math.min(buf.length, dataExpected - dataRead));
    if (n < 0) break; // stream closed before the full payload arrived
    dataRead += n;
}
Of course, you need to read dataExpected beforehand, with similar code.
Tip: you could also use UDP if you don't mind possibly losing data. It's easier to program with datagrams...

Java Socket Issue: Packets Are Merged At The Receiver Side

I'm having a socket problem. It occurs when I run the server and client on the same PC, i.e. using the "localhost" parameter, but it is not seen when different PCs are used.
The client sends a file with this code:
output_local.write(buffer, 0, bytesRead);
output_local.flush();
And after that, in another method, I send a command with this:
outputStream.write(string);
outputStream.flush();
The server appends the command to the end of the file, so it thinks it hasn't received the command from the client yet. Do you have an idea what might be causing this problem? How can I fix it? Below is the file-receive method on the server:
while (true) {
    try {
        bytesReceived = input.read(buffer);
    } catch (IOException ex) {
        Logger.getLogger(Server.class.getName()).log(Level.SEVERE, null, ex);
        System.out.println("exception occured");
        break;
    }
    System.out.println("received:" + bytesReceived);
    try {
        /* Write to the file */
        wr.write(buffer, 0, bytesReceived);
    } catch (IOException ex) {
        Logger.getLogger(Server.class.getName()).log(Level.SEVERE, null, ex);
    }
    total_byte = total_byte + bytesReceived;
    if (total_byte >= filesizeInt) {
        break;
    }
}
If you want message-like support, you need to create a protocol that defines what you're going to send and receive.
In TCP, you can't rely on separate "packets" being received separately (e.g., sending 4 chunks of 10 bytes may be received as 1 chunk of 40, as 2 chunks of 20, or as one chunk of 39 and one chunk of 1). TCP guarantees in-order delivery, but not any particular 'packetization' of your data.
So, for example, if you're sending a string you need to first send the string length then its bytes. The logic in pseudocode would be something like:
Client:
Send the command indicator
Send the payload length
Send the payload
Server:
Read the command indicator
Read the payload length
Loop reading payload until the complete length has been read
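A minimal Java sketch of that pseudocode, assuming DataOutputStream/DataInputStream wrap the socket streams (class and method names are illustrative):

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class Framing {
    // Client side: command indicator, payload length, payload.
    static void sendMessage(DataOutputStream out, byte command, byte[] payload) throws IOException {
        out.writeByte(command);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // Server side: read the indicator and the length, then block until the payload is complete.
    static byte[] readMessage(DataInputStream in) throws IOException {
        byte command = in.readByte(); // command indicator
        int length = in.readInt();
        byte[] payload = new byte[length];
        in.readFully(payload);        // readFully loops internally until 'length' bytes arrive
        return payload;
    }
}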
The defect is that you're treating a stream-based protocol (TCP) as if it were a message-oriented protocol. It's not. You should assume that this can happen.
If you need to break your stream into individual messages, you should use either delimiters or (preferably IMO) a length prefix for each message. You should also then anticipate that any read you issue may not receive as much data as you've asked for - in other words, not only can messages be combined if you're not careful, but they can easily be split.
I mentioned that I prefer length-prefixing to delimiters. Pros and cons:
The benefit of using a message delimiter is that you don't need to know the message size before you start sending.
The benefits of using a length prefix are:
The code for reading the message doesn't need to care about the data within the message at all - it only needs to know how long it is. You read the message length, you read the message data (looping round until you've read it all) and then you pass the message on for processing. Simple.
You don't need to worry about "escaping" the delimiter if you want it to appear within a normal message.
As TCP is a stream oriented connection, this behaviour is normal if the writer writes faster than the reader reads, or than the TCP stack sends packets.
You should add a separator to separate the parts of the streams, e.g. by using a length field for sub packets, or by using separators such as newline (\n, char code 10).
Another option could be to use UDP (or even SCTP), but that depends on the task to be fulfilled.

DataOutputStream not flushing

I have a Java Client which sends UTF-8 strings to a C# TCP-Server, I'm using a DataOutputStream to send the strings. The code looks like this:
public void sendUTF8String(String ar) {
    if (socket.isConnected()) {
        try {
            dataOutputStream.write(ar.getBytes(Charset.forName("UTF-8")));
            dataOutputStream.flush();
        } catch (IOException e) {
            handleException(e);
        }
    }
}
The problem is that flush doesn't seem to work right. If I send two Strings close to each other, the server receives only one message containing both strings. The whole thing works if I do a Thread.sleep(1000) between calls, but that is obviously not a solution.
What am I missing?
flush() doesn't guarantee that a data packet gets shipped off. Your TCP/IP stack is free to bundle your data for maximum efficiency. Worse, there are probably a bunch of other TCP/IP stacks between you and your destination, and they are free to do the same.
I think you shouldn't rely on packet bundling. Insert a logical terminator/divider in your data and you will be on the safe side.
You shouldn't worry about how the data is broken up into packets.
You should include the length of the string in your messages, and then on the receiving end you would read the length first. So for example to send you would do
byte[] arbytes = ar.getBytes(Charset.forName("UTF-8"));
output.writeInt(arbytes.length);
output.write(arbytes);
and then in your reader you do
byte[] arbytes = new byte[input.readInt()];
for (int i = 0; i < arbytes.length; i++) {
    arbytes[i] = (byte) input.read();
}
// convert bytes back to string
You can't just call input.read(arbytes) because the read function doesn't necessarily read the entire length of the array. You can do a loop where you read a chunk at a time but the code for that is a bit more complex.
Anyway, you get the idea.
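Since the reader above already appears to be a DataInputStream (it calls readInt), the chunked loop can also be avoided entirely with readFully, which does that looping internally and blocks until the whole array is filled. A minimal sketch, reusing the input stream from the snippet above:

byte[] arbytes = new byte[input.readInt()];
input.readFully(arbytes); // keeps reading until arbytes is completely filled (or throws EOFException)
String received = new String(arbytes, Charset.forName("UTF-8"));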
Also, if you really want to control what goes in what packets, you can use Datagram Sockets, but if you do that then delivery of the packet is not guaranteed.
Sockets send a stream of data, not messages.
You shouldn't rely on the packets you receive being the same size as they are sent.
Packets can be grouped together as you have seen but they can also be broken up.
Use @Chad Okere's suggestion on how to ensure you get blocks the same way they were sent.
However in your case, you can just use
dataOutputStream.writeUTF(ar); // sends a string as UTF-8
and
String text = dataInputStream.readUTF(); // reads a string as UTF-8
