Handling C char arrays with Java char arrays

Through a socket I am sending information from a program written in C to a program written in Java.
From the C program I send two bytes in a char array (over an Internet socket), and the received information in Java is also stored in a char array.
My main problem is that the received information in the Java array does not correspond properly to the transmitted information from the C program.
I have read that the chars in Java are 16 bits long, and the chars in C are 8 bits long. That may be the problem, but I do not know how to handle/solve that.
The C code to send information is the following:
char buffer[256];
bzero(buffer,256);
n = read(fd, buffer, 255); // getting info from an uart port, which works properly
n = write(sockfd,buffer,3); // send the information through the socket
Part of the Java code (for an Android app) is the following:
char[] buffer = new char[256];
BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
int readX = in.read(buffer, 0, 3);
if (readX > 0) { // I am using a handler to manipulate the info
Message msg = new Message();
msg.obj = buffer;
mHandler.sendMessage(msg);
}
....
// Part of the handler is the following:
mHandler = new Handler() {
    @Override
    public void handleMessage(Message msg) {
        char[] buffer;
        buffer = (char[]) msg.obj; // the information here is different from the one sent from the C program
        ....
    }
};
I would really appreciate any suggestions to solve this problem.
Thanks in advance, Gus.

In C and C++ the char data type is an 8-bit character, corresponding roughly to the Java byte type. In Java, the fundamental char type is a 16-bit Unicode character (a UTF-16 code unit). When you convert from bytes to characters (or vice versa) in Java, a mapping has to occur based on the character encoding of the byte stream (UTF-8, ISO-8859-1, etc.), so you have to know how the C byte stream is encoded. In your case I'd guess it's ISO-8859-1. If it's really binary data, then you should use the Java byte type.
EDIT:
You have to know whether the data being sent from C is character or binary, and if character, how the C program is encoding the data (ISO-8859-1, UTF-8, etc).
If the data is binary then you must use BufferedInputStream to read bytes, not BufferedReader, which decodes bytes into characters, which you don't want since the data is binary.
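For example, a minimal sketch of the binary reading side, assuming the rest of the Android code stays as in the question (the socket, Message, and mHandler names come from there), might look like this:
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.util.Arrays;

InputStream in = new BufferedInputStream(socket.getInputStream()); // raw bytes, no charset decoding
byte[] buffer = new byte[256];
int readX = in.read(buffer, 0, 3); // may return fewer than 3 bytes; loop if you need exactly 3
if (readX > 0) {
    Message msg = new Message();
    msg.obj = Arrays.copyOf(buffer, readX); // deliver only the bytes actually read, in a fresh array
    mHandler.sendMessage(msg);
}
The handler would then cast msg.obj to byte[] instead of char[].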
If the data is character, then you can use the 2-argument constructor
InputStreamReader(InputStream in, String charSetName)
or one of the other 2-arg constructors that let you specify how to decode bytes into characters.

Use the Java byte type, which is an 8-bit signed integer. Also ensure that your char type in C is actually 8 bits. The Boost StaticAssert facility can be used to ensure that.

Related

Data is corrupted after sending from PHP to Java using TCP

I am trying to send data from a PHP TCP server to a Java TCP client.
I am comparing my results by looking at the hex values of the data.
The PHP script reads STDIN, sends it through the socket one byte at a time, and Java reads it using DataInputStream.read(), converts it to hex, and displays it.
If I manually type data into the script, it works OK.
If I use a file with data, it works OK.
But when I feed it /dev/urandom (even a few bytes), the data arrives corrupted on the Java side. There is always the hex value efbfbd in random places instead of the correct data.
Please help me with this issue.
PHP code:
$f = fopen('php://stdin', 'rb');
while ($line = fread($f, 1)) {
    $length = 1;
    echo bin2hex($line)."\n";
    echo socket_write($client, $line, 1)."\n";
    $sent = socket_write($client, $line, $length);
    if ($sent === false) {
        break;
    }
    // Check if the entire message has been sent
    if ($sent < $length) {
        // If the entire message was not sent,
        // keep the part of the message that has not yet been sent
        $line = substr($line, $sent);
        // Get the length of the part not yet sent
        $length -= $sent;
    }
}
Java code:
in = new DataInputStream(clientSocket.getInputStream());
byte[] data = new byte[1];
int count = 0;
while(in.available() > 0){
//System.out.println(in.available());
in.read(data);
String message = new String(data);
System.out.println(message);
//System.out.flush();
System.out.println( toHex(message) );
//in.flush();
message = "";
}
You're stumbling over encoding. By calling new String(data), the byte array is converted to a string using your default encoding, whatever that encoding may be (you can set it, for example to UTF-8, by starting the JVM with java -Dfile.encoding=UTF-8).
The Java code you want would most likely look like the following:
in = new DataInputStream(clientSocket.getInputStream());
byte[] data = new byte[1];
int count = 0;
while (in.available() > 0) {
// System.out.println(in.available());
in.read(data);
String hexMessage = Integer.toHexString(data[0] & 0xFF);
String stringMessage = new String(data, "UTF-8"); // US-ASCII, ISO-8859-1, ...
System.out.println(hexMessage);
}
Update: I missed the 32bit issue. The 8-bit byte, which is signed in Java, is sign-extended to a 32-bit int. To effectively undo this sign extension, one can mask the byte with 0xFF.
There are two main issues with your Java program.
First, the use of in.available(). It does not tell you how many bytes are still in the message. It merely says how many bytes are currently in the stream and can be read without blocking. For example, if the server sends two packets of data over the socket, one has arrived but one is still in transit over the Internet, and each packet has 200 bytes (this is just an example), then the first call will return 200. If you read 200 bytes, you're sure not to be blocked. But if the second packet has not arrived yet, your next call to in.available() will return 0. If you stop at this point, you only have half the data. Not what you wanted.
Typically you either have to read until you reach end-of-stream (InputStream.read() returns -1), and then you can't use the same stream anymore and you close the socket, or you have a specific protocol that tells you how many bytes to expect and you read that number of bytes.
But that's not the reason for the strange values you see in the output from your program. The reason is that Java and PHP represent strings completely differently. In PHP, a string can contain any bytes at all, and interpreting them as characters is up to the programmer.
This basically means that a PHP string is the equivalent of a byte[] in Java.
But a Java String is completely different. It consists internally of an array of char, and a char is always two bytes, a UTF-16 code unit. When you convert bytes you read into a Java String, the bytes are always decoded using some character encoding so that the appropriate characters are stored in the string.
For example, if your bytes are 44 4F 4C 4C, and the character encoding is ISO-8859-1, this will be interpreted as the characters \u0044, \u004F, \u004C, \u004C. It will be a string of four characters - "DOLL". But if your character encoding is UTF-16, the bytes will be interpreted as \u444F and \u4C4C. A string of only two characters, "䑏䱌".
When you were reading from the console or from a file, the data was probably in the encoding that Java expects by default. This is usually the case when the file is written in pure English, with just English letters, spaces and punctuation. These are all 7-bit characters which are the same in ISO-8859-1 and UTF-8, which are the common defaults. But in /dev/urandom you'd have some bytes in the range 80 through FF, which may be treated differently when interpreted into a UTF-16 Java string.
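A quick way to see this for yourself (a small sketch using the byte values from the DOLL example above):
import java.nio.charset.StandardCharsets;

byte[] b = {0x44, 0x4F, 0x4C, 0x4C};
System.out.println(new String(b, StandardCharsets.ISO_8859_1)); // prints "DOLL"
System.out.println(new String(b, StandardCharsets.UTF_16BE));   // prints the two characters \u444F and \u4C4C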
Furthermore, you didn't show your toHex() method in Java. It probably reads bytes back from the string again, but using which encoding? If you read the bytes into the String using ISO-8859-1, and got them out in UTF-8, you'd get completely different bytes.
If you want to see exactly what PHP sent you, don't put the bytes in a String. Write a toHex method that works on byte arrays, and use the byte[] you read directly.
Also, always remember to check the number of bytes returned by read() and only interpret that number of bytes! read() does not always fill the entire array. So in your new toHex() method, you also need to pass the number of bytes read as a parameter, so that it doesn't display the part of the array beyond them. In your case you have just a one-byte array - which is not recommended - but even then, read() can return 0, a perfectly legal value indicating that this particular call found no bytes available, although some may be available by the next read().
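For example, a minimal sketch of such a toHex method, taking the byte array and the count returned by read() (the method and variable names here are illustrative, not from the original post):
static String toHex(byte[] data, int count) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < count; i++) {
        sb.append(String.format("%02x ", data[i] & 0xFF)); // mask with 0xFF to undo sign extension
    }
    return sb.toString().trim();
}

// usage with the code from the question:
int n = in.read(data);
if (n > 0) {
    System.out.println(toHex(data, n));
}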
As the comment above says, you might be having trouble with the string representation of the bytes (String message = new String(data);). To be certain, get the data bytes and encode them in Base64, for example. You can use a library such as Apache Commons Codec, or java.util.Base64 in Java 8, to do that. You should be able to do something similar in PHP (base64_encode) to compare.
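For instance, a hedged sketch using the Java 8 java.util.Base64 class (here n is assumed to be the number of bytes actually returned by read()):
import java.util.Arrays;
import java.util.Base64;

String b64 = Base64.getEncoder().encodeToString(Arrays.copyOf(data, n));
System.out.println(b64); // compare against base64_encode() output on the PHP side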

C# - Writing strings to a stream using two bytes for length, not one

I am creating an easy to use server-client model with an extensible protocol, where the server is in Java and clients can be Java, C#, what-have-you.
I ran into this issue: Java data streams write strings with a short designating the length, followed by the data.
C# lets me specify the encoding I want, but it only reads one byte for the length. (actually, it says '7 bits at a time'...this is odd. This might be part of my problem?)
Here is my setup: The server sends a string to the client once it connects. It's a short string, so the first byte is 0 and the second byte is 9; the string is 9 bytes long.
//...
_socket.Connect(host, port);
var stream = new NetworkStream(_socket);
_in = new BinaryReader(stream, Encoding.UTF8);
Console.WriteLine(_in.ReadString()); //outputs nothing
Reading a single byte before reading the string of course outputs the expected string. But, how can I set up my stream reader to read a string using two bytes as the length, not one? Do I need to subclass BinaryReader and override ReadString()?
The C# BinaryWriter/BinaryReader length prefix uses, if I recall correctly, the high bit of each byte to signal whether another length byte follows (a 7-bit variable-length encoding). This allows counts up to 127 to fit in a single byte while still allowing much larger count values (up to 2^31 - 1); it's a bit like UTF-8 in that respect.
For your own purposes, note that you are writing the whole protocol (presumably), so you have complete control over both ends. Both behaviors you describe, in C# and Java, are implemented by what are essentially helper classes in each language. There's nothing saying that you have to use them, and both languages offer a way to simply encode text directly into an array of bytes which you can send however you like.
If you do want to stick with the Java-based protocol, you can use BitConverter to convert between a short and a byte[] so that you can send and receive those two bytes explicitly. For example:
_in = new BinaryReader(stream, Encoding.UTF8);
byte[] header = _in.ReadBytes(2);
if (BitConverter.IsLittleEndian)
    Array.Reverse(header); // Java writes the length big-endian; BitConverter uses the machine's byte order
short count = BitConverter.ToInt16(header, 0);
byte[] data = _in.ReadBytes(count);
string text = Encoding.UTF8.GetString(data);
Console.WriteLine(text); // outputs something
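For reference, a hedged sketch of what the matching Java writing side might look like if you frame the string yourself instead of relying on DataOutputStream.writeUTF (which uses modified UTF-8; plain UTF-8 is assumed here, and sendString is an illustrative name):
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

static void sendString(Socket socket, String text) throws IOException {
    byte[] payload = text.getBytes(StandardCharsets.UTF_8);
    DataOutputStream out = new DataOutputStream(socket.getOutputStream());
    out.writeShort(payload.length); // two length bytes, big-endian (network order)
    out.write(payload);
    out.flush();
}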

Send byte array to iOS app from Java app (with quickserver)

I have a byte array in Java. The values are both positive and negative (since they are greater than 127 in the original unsigned array). Now I want to send this array with QuickServer (http://www.quickserver.org/) to the TCP client in an iOS application I am writing as well. I pass the byte array to the sendClientBinary() method, which accepts a byte array as its input. However, when I receive the array in the iOS client app, all the negative values seem to have been converted into some kind of two-byte form: -71 (0xB9) in NetBeans appears in the Xcode memory view as 0xC2 0xB9, and -67 (0xBD) in NetBeans appears as 0xC2 0xBD.
Can anyone please provide explanation for this?
I am also able to convert my byte array to a char array and mask out the upper byte, so the char array holds the correct values in the full 0-255 range; however, there is no way to pass a char array to the sendClientBinary() method, which only accepts a byte array as input.
Should I try casting or converting the char array back to a byte array somehow?
//Some code in Java:
//reading my byte array from a method and converting it to a char array (sorry if it's not the most efficient way, I just need something simple right now)
byte byteArray[] = (byte[])functionReturningByteArray();
char charArray[] = new char[byteArray.length];
for (int ij = 0; ij < byteArray.length; ij++)
{
charArray[ij] = (char) byteArray[ij];
if (charArray[ij] > 255)
charArray[ij] &= 0xFF;
}
//and the code sending the data over TCP socket (via Quickserver):
clientH.setDataMode(DataMode.BINARY, DataType.OUT);
clientH.sendClientBinary(byteArray);
//--this is received in iOS as 16-bit values with some prefix such as 0xC2 or 0xC3 for negative values, if not for the prefix the value would be correct
//or an attempt to send the charArray:
clientH.setDataMode(DataMode.BYTE, DataType.OUT);
clientH.sendClientBytes(charArray.toString());
//--this doesn't resemble my bytes once received in iOS at all
//iOS reception code:
case NSStreamEventHasBytesAvailable:
{
if(stream == inputStream)
{
int len = 0;
len = [inputStream read:receptionBuf maxLength:2048*2048*2];
packetBytesReceived += len;
[packetData appendBytes:receptionBuf length:len];
NSString* fullData = [[NSString alloc] initWithData:packetData encoding:NSASCIIStringEncoding];
...
...
I think the problem might be NSASCIIStringEncoding: the main part of my data packet is character data, but some of the content is just raw byte values, and that could be the cause...? Will start working on it.
0xC2 is a UTF-8 lead byte. It indicates that the byte that follows is the second byte of a two-byte UTF-8 sequence, so 0xC2 0xB9 decodes to a superscript character, in particular ¹ (superscript one). My guess (since I assume this is not what you are actually trying to send) is that an encoding is being applied somewhere it shouldn't be.
Problem solved. I now read the binary portion of the data payload directly from the packetData variable in the iOS application (instead of from fullData, which is an NSString), without first converting it to a string and then decoding it back to bytes with UTF-8 encoding.
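For the narrower question of getting a char[] holding 0-255 values back into a byte[] that sendClientBinary() accepts, a minimal sketch on the Java side (reusing the charArray and clientH names from the question):
byte[] bytes = new byte[charArray.length];
for (int i = 0; i < charArray.length; i++) {
    bytes[i] = (byte) charArray[i]; // the narrowing cast keeps the low 8 bits, restoring the original byte value
}
clientH.setDataMode(DataMode.BINARY, DataType.OUT);
clientH.sendClientBinary(bytes);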

Java help to convert UDP packet to int

I'm receiving a UDP packet (I don't know in which format; I think UTF-16, little-endian). The only thing I know is the following documentation, directly from the developers' page:
The master servers each respond by sending FF FF FF FF 73 0A followed
by a (relatively) unique 4-byte "challenge" number.
So this is how I'm receiving the packet:
byte[] buff = new byte[64];
DatagramPacket packet = new DatagramPacket(buff, buff.length);
socket.receive(packet);
The packet is received and everything is okay, but now I'm stuck. I need that 4-byte integer. Must I split the buffer, or... I don't know what to do.
This is the received data:
˙˙˙˙s
Ň?t
I tried to convert to hex but the output is very interesting:
-0000000000000000000000000008cf52df7c08c
Method to convert:
public String toHex(String arg) throws UnsupportedEncodingException {
return String.format("%040x", new BigInteger(arg.getBytes("UTF-16LE")));
}
Then I tried to convert the hex back to a string (from the method above) and the result is even more interesting (sorry, I can't copy-paste it; something goes wrong). Anyway, the method used to convert hex to a string is:
public String hexToString(String hex){
StringBuilder output = new StringBuilder();
for (int i = 0; i < hex.length(); i+=2) {
String str = hex.substring(i, i+2);
output.append((char)Integer.parseInt(str, 16));
}
return new String(output);
}
So with all that said, I'm stuck. I don't know what I am supposed to do. Do I need to split the UDP packet into pieces, or...?
I'm receiving a UDP packet (I don't know in which format; I think UTF-16, little-endian). The only thing I know is the following documentation.
You really need to find out what the packet actually contains. The packet contents you have posted in your question don't make much sense to me, and don't seem to correspond to the supposed format.
Start out by dumping the bytes of the byte array like this:
byte[] bytes = packet.getData();  // the buffer you handed to the DatagramPacket
int len = packet.getLength();     // number of bytes actually received
for (int i = 0; i < len; i++) {
    System.out.format("%02x ", bytes[i]);
}
Then compare that with the expected format from the documentation. If they match (more or less) then you can start on the problem of deciding how to extract the data that you need. Otherwise, you first need to figure out what the format REALLY is. Maybe we can help ... but we need a reliable rendering of the packet (e.g. produced as above.)
FWIW, the reason that you are getting -0000000000000000000000000008cf52df7c08c is (I think) that BigInteger(byte[]) is interpreting the byte array as a signed number. Anyway, that's not a good way to do this. The UDP packet body is a sequence of bytes, not a number.
I also think it is unlikely that the UDP packet is UTF-16. FFFF is described thus in the official Unicode code charts:
Noncharacters:
These codes are intended for process-internal uses, but are not permitted for interchange. [...]
FFFF : • the value FFFF is guaranteed not to be a Unicode character at all
So if someone is claiming that this is UTF-16, the usage is violating the Unicode standard.
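Once a dump like the one above confirms the documented FF FF FF FF 73 0A header, a hedged sketch for pulling out the 4-byte challenge, assuming it is big-endian (the documentation quoted above does not actually say which byte order is used):
import java.nio.ByteBuffer;

byte[] data = packet.getData();
int len = packet.getLength();
if (len >= 10) {
    // skip the 6 header bytes FF FF FF FF 73 0A, then read the next 4 bytes as an int
    int challenge = ByteBuffer.wrap(data, 6, 4).getInt();
    System.out.println("challenge = " + challenge);
}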

Java Character with value greater than 127 sending via tcp

Good day,
I am having some trouble. I am trying to send a character array over the network to a server.
Here is my code:
char[] arr= {3,4};
public void sendMessage(String message){
if (out != null && !out.checkError()) {
out.print(arr);
out.flush();
}
}
When I run that, Wireshark detects that it received a packet with 2 bytes of data. OK, good, all is well.
Now when I run this:
char[] arr= {3,160}; // notice 160 *****
public void sendMessage(String message){
if (out != null && !out.checkError()) {
out.print(arr);
out.flush();
}
}
Wireshark says that the packet has 3 bytes of data. The exact data is:
03 c2 a0
Now why is it adding the c2 in there? I understand that it has something to do with the fact that my char is bigger than 127, but I need to send this value of 160.
Please can you help me? Do I need to use another data type, or send it in a different way? I know you can do this in C. How can I do it in Java?
Here is the code for my out object:
PrintWriter out;
out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(socket.getOutputStream())), true);
Thanks
In fact, the program is doing exactly what you told it to do. It is sending two characters, i.e. the Unicode code points 3 (\u0003) and 160 (\u00a0). These characters are being encoded using your platform's default character encoding, which appears to be UTF-8. The bytes c2 a0 are the UTF-8 encoding of the \u00a0 character.
But what you are actually trying to do is send 2 bytes.
In Java char is a 16 bit type, not an 8 bit type. If you want to send 8 bit values you need to use the byte type.
The other mistake you are making is that you are trying to send (essentially) binary data using a Writer. The Writer interface is for (16-bit) character oriented data. You should be using the OutputStream API ...
Anyhow ... here's a code snippet to illustrate how you should send an array of bytes:
byte[] arr = new byte[]{3, (byte) 160};
OutputStream out = socket.getOutputStream();
out.write(arr);
I just changed it to this now: out.print(new String(arr).getBytes(Charset.forName("UTF-8"))); and this is the data I get in Wireshark, 11 bytes: 5b42403431653230396538. Shouldn't it still be 2 bytes?
You are making it worse!
Let's take this apart:
new String(arr) gives you a 2 character String.
.getBytes(...) will turn that into a 3 byte array containing the bytes 03 c2 a0.
out.print(...) will attempt to call a print method on the PrintWriter API.
But which one? Well you supplied an argument whose declared type is byte[]. And that will result in you calling print(Object).
But wait a minute ... what does PrintWriter.print(Object) do? Well, the first thing it does is call toString() on the argument.
And what does that do? Well, since the object is a byte[], this calls the default toString() method provided by java.lang.Object. And that gives you a String that looks like [B@xxxxxxxx, where [B is the "class name" for a byte array and the sequence of x's is a hexadecimal representation of the array object's identity hashcode!
And then you output that String.
And behold your 2 bytes (actually characters) have turned into 11 bytes.
You haven't told us anything about out - not even the type - but I strongly suspect it's using UTF-8 to encode your character data. U+00A0 (i.e. character 160) is encoded as 0xc2 0xa0 in UTF-8.
Using UTF-8 is almost certainly the best thing to do if you control both the reading and writing code: it means the whole of Unicode is available to you. If you used ISO-8859-1 (which is what I suspect you expected to use - a single byte per character, for all characters) you would be limiting yourself to only the range [U+0000, U+00FF].
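If you really do want exactly one byte per char for values in the [0, 255] range, a minimal sketch using ISO-8859-1 explicitly, keeping the PrintWriter from the question (sending a byte[] through the OutputStream, as shown above, is usually the cleaner option):
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

char[] arr = {3, 160};
PrintWriter out = new PrintWriter(
        new BufferedWriter(
                new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.ISO_8859_1)),
        true);
out.print(arr);
out.flush(); // Wireshark should now show exactly two bytes: 03 a0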
You can do the following to send your data as UTF-8:
out.print(new String(arr).getBytes(Charset.forName("UTF-8")));
Also you have to create your output stream with UTF-8 encoding, as follows:
out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), "UTF-8")), true);
Now you will send your char array encoded as UTF-8, so a char may take more than one byte (characters above U+007F encode to two or more bytes). See @JonSkeet's answer to understand why you need UTF-8 encoding.
Make sure the in and out sides use the same encoding; then you will get exactly what you want. E.g.:
When out sending:
mPrintWriterClient = new PrintWriter(new BufferedWriter(new OutputStreamWriter(mSocket.getOutputStream(), "UTF-8")), true);
when in receiving:
mBufferedReaderClient = new BufferedReader(new InputStreamReader(mSocket.getInputStream(), "UTF-8"));
