I've run into some mind-twisting bafflement after getting my hands into an old legacy project. The project consists of a Java application and a C++ application, which communicate over sockets. Both applications are designed to run in cross-platform environments, so I'd like to keep the code as universal as possible.
I ended up rewriting parts of the communication logic, since the previous implementation had some issues with foreign characters. Now I've run into a problem with endianness, which I hope someone can spell out for me.
The Java software writes messages to the socket with an OutputStreamWriter, using UTF-16LE encoding, as follows:
OutputStream out = _socket.getOutputStream();
outputWriter = new OutputStreamWriter(new BufferedOutputStream(out), "UTF-16LE");
// ... create msg
outputWriter.write(msg, 0, msg.length());
outputWriter.flush();
The C++ program receives the message one byte at a time, as follows:
char buf[1];
std::queue<char> q;
std::u16string recUtf16Msg;

do {
    int iResult = recv(socket, buf, 1, 0);
    if (iResult <= 0)
        break; // Error or EOS

    for (int i = 0; i < iResult; i++) {
        q.push(buf[i]);
    }

    while (q.size() >= 2) {
        char firstByte = q.front();
        q.pop();
        char secondByte = q.front();
        q.pop();
        char16_t utf16char = (firstByte << (sizeof(char) * CHAR_BIT)) ^
                             (0x00ff & secondByte);
        // Change endianness, if necessary
        utf16char = ntohs(utf16char);
        recUtf16Msg.push_back(utf16char);
    }
    // ... end of message check removed for clarity
} while (true);
Now, the issue I'm really facing is that the code above actually works, but I'm not sure why. The C++ side is written to receive messages that use network byte order (big-endian), yet Java seems to be sending the data in little-endian order.
On the C++ side we even use the ntohs function to change the byte order to the one desired by the host machine. As I understand the specification, ntohs is supposed to swap the byte order if the host platform uses little-endian byte order. However, here ntohs swaps the bytes of the little-endian characters we received, which should leave them big-endian, and yet the software works flawlessly.
Can someone point out what exactly is happening? Do I accidentally swap the bytes already when assembling utf16char? Why does ntohs make everything work when it seems to act exactly opposite to the documentation? To compile I'm using Clang with libc++.
I left out parts of the code for clarity, but you should get the general idea. Also, I'm aware that using a queue and a dynamic array may not be the most efficient way of handling the data, but it's clean and performs well enough for this purpose.
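For reference, here is a minimal standalone Java snippet (not part of the project) that shows what the writer above actually puts on the wire for a single character:

import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;

public class Utf16LeProbe {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        OutputStreamWriter writer = new OutputStreamWriter(bytes, "UTF-16LE");
        writer.write("A"); // U+0041
        writer.flush();
        for (byte b : bytes.toByteArray()) {
            System.out.printf("%02x ", b); // prints "41 00": low byte first, no BOM
        }
    }
}

In other words, the first byte received on the C++ side is the low-order byte of the code unit, which the shift in the loop then places in the high half of utf16char before ntohs runs.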
Related
I am creating an easy-to-use server-client model with an extensible protocol, where the server is in Java and clients can be Java, C#, what-have-you.
I ran into this issue: Java data streams write strings with a short designating the length, followed by the data.
C# lets me specify the encoding I want, but it only reads one byte for the length. (actually, it says '7 bits at a time'...this is odd. This might be part of my problem?)
Here is my setup: The server sends a string to the client once it connects. It's a short string, so the first byte is 0 and the second byte is 9; the string is 9 bytes long.
//...
_socket.Connect(host, port);
var stream = new NetworkStream(_socket);
_in = new BinaryReader(stream, Encoding.UTF8);
Console.WriteLine(_in.ReadString()); //outputs nothing
Reading a single byte before reading the string of course outputs the expected string. But, how can I set up my stream reader to read a string using two bytes as the length, not one? Do I need to subclass BinaryReader and override ReadString()?
The C# BinaryWriter/BinaryReader behavior uses, if I recall correctly, the 8th bit of each length byte to signal whether another byte of the count follows; the last byte of the count has that bit clear. This allows counts up to 127 to fit in a single byte while still allowing much larger count values (up to 2^31 - 1); it's a bit like UTF-8 in that respect.
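To illustrate that encoding, here is a rough Java sketch of my own (the class and method names are hypothetical) that writes the length prefix the way C#'s BinaryWriter does, in case you ever want the Java server to emit strings that BinaryReader.ReadString can parse directly:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class SevenBitLength {
    // Seven bits of the count per byte, low-order group first; the high bit of
    // each byte is set while more length bytes follow.
    static void write7BitEncodedInt(OutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // another length byte follows
            value >>>= 7;
        }
        out.write(value); // high bit clear: last byte of the count
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        write7BitEncodedInt(out, payload.length); // a single byte, 0x05
        out.write(payload);
        // out.toByteArray() now matches what BinaryReader.ReadString expects
        // when the reader was constructed with Encoding.UTF8
    }
}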
For your own purposes, note that you are writing the whole protocol (presumably), so you have complete control over both ends. Both behaviors you describe, in C# and Java, are implemented by what are essentially helper classes in each language. There's nothing saying that you have to use them, and both languages offer a way to simply encode text directly into an array of bytes which you can send however you like.
If you do want to stick with the Java-based protocol, you can use BitConverter to convert between a short and a byte[] so that you can send and receive those two bytes explicitly. For example:
_in = new BinaryReader(stream, Encoding.UTF8);
byte[] header = _in.ReadBytes(2);
if (BitConverter.IsLittleEndian)
    Array.Reverse(header); // Java writes the length big-endian
short count = BitConverter.ToInt16(header, 0);
byte[] data = _in.ReadBytes(count);
// Java's writeUTF emits "modified" UTF-8; plain UTF-8 matches for ordinary text
string text = Encoding.UTF8.GetString(data);
Console.WriteLine(text); // outputs the string
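For completeness, the Java side that presumably produces the format above is DataOutputStream.writeUTF, which prefixes the text with an unsigned two-byte, big-endian length and then writes the characters in (modified) UTF-8. A minimal sketch of mine (the port and string are illustrative):

import java.io.DataOutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class SendOneString {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(12345);
             Socket client = server.accept();
             DataOutputStream out = new DataOutputStream(client.getOutputStream())) {
            out.writeUTF("some text"); // wire bytes: 0x00 0x09 followed by 9 ASCII bytes
            out.flush();
        }
    }
}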
I am trying to read serial data using Java:
byte[] text = new byte[5];
for (int i = 0; i < 5; i++) {
    text[i] = (byte) in.read();
}
For some reason it returns some weird data. I have tried libraries like RXTX and COMM, but then other issues come up. Is there some way to read everything in a raw Java way?
Any code snippets would be helpful.
Thanks
The problem is that byte is just too narrow for this serial data (because it is signed), so I lose the most significant bit. The solution is to read everything into an int.
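For example, a small sketch of that fix, reusing the in stream from the question (everything else here is illustrative):

// in.read() already returns an int in the range 0..255, or -1 at end of stream,
// so keep the values as ints instead of narrowing them to signed bytes.
int[] text = new int[5];
for (int i = 0; i < 5; i++) {
    int value = in.read();
    if (value < 0) {
        break; // stream ended before 5 bytes arrived
    }
    text[i] = value; // 0..255, top bit preserved
}

If you do need a byte[] afterwards, you can still cast each value back down with (byte) value and recover the unsigned value later with b & 0xFF.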
Can someone explain to me why this works just fine with in.available()>0 commented out, but when I put it back in it breaks?
mySocket = new Socket("blahblah", 12345);
BufferedInputStream in = new BufferedInputStream(mySocket.getInputStream());
....
char result[] = new char[length];
for (int i = 0; i < length && !mySocket.isClosed() /*&& in.available() > 0*/; i++) {
    result[i] = (char) in.read();
}
More specifically: I'm making an Android app where a user can search for a term; that search is sent to some thingy in interspace, I get back results in XML form, and do stuff with them. When the XML I get back is small enough (see "length" in the code above), the code works just fine with in.available() > 0 left in. But if the length is large, in.available() returns 0. With that check commented out, everything continues to run smoothly.
Why is that? And is it something I need to worry about and fix?
in.available() lets you know whether you can read data at that moment without blocking. Since sockets carry a stream of data, the data may not be available immediately but only a short time later. For example, on a 1 Gbit connection, full-sized packets arrive no closer than about 15 microseconds apart, which is a long time for a computer.
I think the reason in.available() == 0 when the data is large is that the sender hasn't had a chance to write it all to your socket yet. You shouldn't need to use in.available(). Also, I wouldn't suggest reading a single char at a time; that will be really slow with a lot of data and VERY chatty over the network. Consider reading into a byte array of size "length", as sketched below.
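For illustration, a sketch of that suggestion, reusing mySocket and length from the question (the UTF-8 charset is an assumption about what the server sends):

DataInputStream in = new DataInputStream(
        new BufferedInputStream(mySocket.getInputStream()));
byte[] buffer = new byte[length];
in.readFully(buffer); // blocks until the whole payload has arrived
String xml = new String(buffer, "UTF-8");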
I have a Java client and a C++ server. All values are sent as byte arrays. The numeric values are received fine, but the string values, when stored in a char array in C++, have special characters such as form feed or line feed at the end of the value. Can someone suggest a solution to this problem?
Yes: use Google Protocol Buffers for serialization/deserialization. It's an open-source, stable, easy-to-use, cross-platform package.
How are you serialising/deserialising? You should decide on an encoding (for example ASCII), then write the length of the string first as an int; that way the server can read the int and will know how many bytes of the string to read.
Once it has read the bytes, it just needs to append a '\0' to the end of the char* buffer to terminate the string in the array.
Depending on what you are using to write the string in Java, you would do something like:
byte[] bytes = string.getBytes("ASCII");
out.writeInt(bytes.length); // e.g. on a DataOutputStream; length in bytes, not characters
out.write(bytes);
and in your C++ server you would do the reverse.
1) Make sure the server code is complying with your protocol at the byte level.
2) Make sure the client code is complying with your protocol at the byte level.
3) If you have done 1 and 2, and you still have problems, your protocol is broken. Most likely, it fails to properly specify how the server specifies where the strings end and how the client establishes where the strings end.
I have to send a short string from an ANSI C application to a Java application through a socket (already done). Because it's important data, I have to encrypt it using a password like "abc123". How should I do this in the simplest way?
By "socket" I assume you mean a TCP/IP connection. In that case you should consider using Secure Sockets Layer (SSL). SSL pretty much solves most of the security problems associated with sending data across the wire. The only thing you need to work out is how to distribute keys to each end of the pipe.
I strongly recommend that you don't roll your own system. Crypto is hard to get right so use an existing, well tested implementation.
If you're talking about a Unix domain socket then you probably don't need to bother with encryption since domain sockets are just inter-process pipes.
As mentioned, it depends very much on how secure you want this to be; the sensible answer is to find a Java and a C implementation of the same cryptosystem and use those.
If you are willing to accept the lower security that usually comes with home-brewing these things, which I assume you are given the "simplest way" in your question, and assuming both the source and runtime at both ends are secure (i.e. you only need to worry about the data being intercepted in transit), you could use whatever password you like as the seed for a pseudo-random number generator (the remainder of dividing a large prime by the byte index, or similar) and XOR the bytes of data with the generated numbers. Not the most secure, but very quick to implement.
#include <stddef.h>
#include <stdint.h>

uint8_t encrypt(uint8_t iData, size_t iPos) {
    // Super large prime, our 'password', best kept super secret
    const uint64_t iSeed = 32416190071;
    // Mostly to stop divide by zero
    // (starting at the obvious place would also give away more about the prime)
    const size_t iOffset = 10;
    uint8_t iPad = iSeed % (iPos + iOffset);
    return iPad ^ iData;
}
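If the Java end mirrors this scheme, a rough counterpart could look like the following (the method and names are mine, not from any existing code); since XOR is its own inverse, the same routine handles both encryption and decryption:

// Java counterpart of the C routine above: XOR each byte with a pad derived
// from the shared prime and the byte's position in the message.
static int crypt(int dataByte, long pos) {
    final long SEED = 32416190071L; // the same large prime "password" as the C side
    final long OFFSET = 10;         // the same offset, avoids a modulo by zero at position 0
    int pad = (int) (SEED % (pos + OFFSET)) & 0xFF; // keep the low 8 bits, as in the C version
    return (dataByte ^ pad) & 0xFF; // dataByte is expected in the range 0..255
}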
char encrypt(char ch) {
    return ch ^ 0x55; // XOR with a fixed key byte; applying it again decrypts
}