Sending a Java character with a value greater than 127 via TCP

Good day,
I am having some trouble: I am trying to send a character array over the network to a server. Here is my code:
char[] arr = {3, 4};

public void sendMessage(String message) {
    if (out != null && !out.checkError()) {
        out.print(arr);
        out.flush();
    }
}
When I run that, Wireshark detects that it received a packet with 2 bytes of data. OK, good, all is well.
Now when I run this:

char[] arr = {3, 160}; // notice the 160 *****

public void sendMessage(String message) {
    if (out != null && !out.checkError()) {
        out.print(arr);
        out.flush();
    }
}
Wireshark says that the packet has 3 bytes of data. The exact data is:

03 c2 a0

Now why is it adding the c2 in there? I understand that it has something to do with the fact that my char is bigger than 127, but I need to send this value of 160.
Please can you help me? Do I need to use another data type, or send it in a different way? I know you can do this in C. How can I do it in Java?
Here is the code for my out object:
PrintWriter out;
out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(socket.getOutputStream())), true);
Thanks

In fact, the program is doing exactly what you told it to do. It is sending two characters, i.e. the Unicode code points 3 (\u0003) and 160 (\u00a0). These characters are being encoded using your platform's default character encoding, which appears to be UTF-8. The bytes c2 a0 are the UTF-8 encoding of the \u00a0 character.
But what you are actually trying to do is send 2 bytes.
In Java char is a 16 bit type, not an 8 bit type. If you want to send 8 bit values you need to use the byte type.
The other mistake you are making is that you are trying to send (essentially) binary data using a Writer. The Writer interface is for (16-bit) character oriented data. You should be using the OutputStream API ...
Anyhow ... here's a code snippet to illustrate how you should send an array of bytes;
byte[] arr = new byte[] {3, (byte) 160};
OutputStream out = socket.getOutputStream();
out.write(arr);
I just changed it to out.print(new String(arr).getBytes(Charset.forName("UTF-8"))); and this is the data I get in Wireshark, 11 bytes: 5b 42 40 34 31 65 32 30 39 65 38. Shouldn't it still be 2 bytes?
You are making it worse!
Let's take this apart:
new String(arr) gives you a 2 character String.
.getBytes(...) will turn that into a 3 byte array containing the bytes 03 c2 a0.
out.print(...) will attempt to call a print method on the PrintWriter API.
But which one? Well you supplied an argument whose declared type is byte[]. And that will result in you calling print(Object).
But wait a minute ... what does PrintWriter.print(Object) do? Well, the first thing it does is call toString() on the argument.
And what does that do? Well, since the object is a byte[], this calls the default toString() method provided by java.lang.Object. And that gives you a String that looks like [B@xxxxxxxx, where [B is the "class name" for a byte array and the sequence of xs is a hexadecimal representation of the array object's identity hash code!
And then you output that String.
And behold your 2 bytes (actually characters) have turned into 11 bytes.
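You can watch this happen with a tiny standalone experiment (a sketch; the hex hash after [B@ is the array's identity hash code, so it will differ from run to run):

import java.io.PrintWriter;

public class PrintBytesDemo {
    public static void main(String[] args) {
        PrintWriter pw = new PrintWriter(System.out, true);
        byte[] arr = {3, (byte) 160};
        pw.print(arr); // resolves to print(Object), which prints arr.toString()
        pw.flush();    // prints something like [B@41e209e8 -- 11 characters, not 2
    }
}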

You haven't told us anything about out - not even the type - but I strongly suspect it's using UTF-8 to encode your character data. U+00A0 (i.e. character 160) is encoded as 0xc2 0xa0 in UTF-8.
Using UTF-8 is almost certainly the best thing to do if you control both the reading and writing code: it means the whole of Unicode is available to you. If you used ISO-8859-1 (which is what I suspect you expected to use - a single byte per character, for all characters) you would be limiting yourself to only the range [U+0000, U+00FF].
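For example, if single bytes on the wire are really what you want, a sketch of the ISO-8859-1 route might look like this (assuming the same open socket as in the question):

import java.io.*;
import java.nio.charset.StandardCharsets;

// given an open Socket named socket:
PrintWriter out = new PrintWriter(new BufferedWriter(
        new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.ISO_8859_1)), true);
char[] arr = {3, 160};
out.print(arr); // each char in [0, 255] becomes exactly one byte: 03 a0
out.flush();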

To send your data as UTF-8, create your output stream with UTF-8 encoding, as follows:

out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), "UTF-8")), true);

Now out.print(arr) will send your char array encoded as UTF-8, using up to 3 bytes per char. See @JonSkeet's answer to understand why the encoding matters.

Make sure the in and out sides use the same encoding; then you will get exactly what you want. E.g.,
when out is sending:

mPrintWriterClient = new PrintWriter(new BufferedWriter(new OutputStreamWriter(mSocket.getOutputStream(), "UTF-8")), true);

when in is receiving:
mBufferedReaderClient = new BufferedReader(new InputStreamReader(mSocket.getInputStream(), "UTF-8"));
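As a quick check that the two ends agree, here is a sketch of the round trip with a byte array standing in for the socket (the values 3 and 160 are taken from the question above):

import java.io.*;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stand-in for the socket
        PrintWriter out = new PrintWriter(new BufferedWriter(
                new OutputStreamWriter(wire, StandardCharsets.UTF_8)), true);
        out.print(new char[] {3, 160});
        out.flush();

        BufferedReader in = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(wire.toByteArray()), StandardCharsets.UTF_8));
        char[] buf = new char[2];
        in.read(buf);
        System.out.println((int) buf[0] + " " + (int) buf[1]); // 3 160: the original chars survive
    }
}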

Related

Data is corrupted after sending from PHP to JAVA using tcp

I am trying to send data from a PHP TCP server to a Java TCP client.
I am comparing my results by comparing the hex values of the data.
The PHP script reads STDIN, sends it through the socket one byte at a time, and Java reads it using DataInputStream.read(), converts it to hex and displays it.
If I manually type data into the script, it works OK.
If I use a file with data, it works OK.
But when I pipe in /dev/urandom (even a few bytes), the data on the Java side comes through corrupted. There is always a hex value of efbfbd in random places instead of the correct data.
Please help me with this issue.
PHP code:
$f = fopen('php://stdin', 'rb');
while ($line = fread($f, 1)) {
    $length = 1;
    echo bin2hex($line)."\n";
    echo socket_write($client, $line, 1)."\n";
    $sent = socket_write($client, $line, $length);
    if ($sent === false) {
        break;
    }
    // Check if the entire message has been sent
    if ($sent < $length) {
        // If not, keep the part of the message that has not yet been sent
        $line = substr($line, $sent);
        // and the length of the unsent part
        $length -= $sent;
    }
}
Java code:
in = new DataInputStream(clientSocket.getInputStream());
byte[] data = new byte[1];
int count = 0;
while (in.available() > 0) {
    //System.out.println(in.available());
    in.read(data);
    String message = new String(data);
    System.out.println(message);
    //System.out.flush();
    System.out.println(toHex(message));
    //in.flush();
    message = "";
}
You're stumbling over encoding. By calling new String(data), the byte array is converted to a string using your default encoding, whatever that encoding may be (you can set it to UTF-8, for example, with java -Dfile.encoding=UTF-8).
The Java code you want would most likely look the following:
in = new DataInputStream(clientSocket.getInputStream());
byte[] data = new byte[1];
while (in.available() > 0) {
    in.read(data);
    String hexMessage = Integer.toHexString(data[0] & 0xFF);
    String stringMessage = new String(data, "UTF-8"); // US-ASCII, ISO-8859-1, ...
    System.out.println(hexMessage);
}
Update: I missed the 32-bit issue. The 8-bit byte, which is signed in Java, is sign-extended to a 32-bit int. To effectively undo this sign extension, mask the byte with 0xFF.
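A two-liner illustrates the sign-extension point:

byte b = (byte) 0xA0;
System.out.println(b);        // -96: the byte is sign-extended to int
System.out.println(b & 0xFF); // 160: masking keeps only the low 8 bits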
There are two main issues with your Java program.
First, the use of in.available(). It does not tell you how many bytes there are still in the message. It merely says how many bytes are ready in the stream for reading without blocking. For example, if the server sends two packets of data over the socket, one has arrived but one is still in transit, and each packet has 200 bytes (just an example), then the first call will return 200. If you read those 200 bytes, you're sure not to be blocked. But if the second packet has not arrived yet, your next call to in.available() will return 0. If you stop at this point, you only have half the data. Not what you wanted.
Typically you either have to read until you reach end-of-stream (InputStream.read() returns -1), and then you can't use the same stream anymore and you close the socket, or you have a specific protocol that tells you how many bytes to expect and you read that number of bytes.
But that's not the reason for the strange values you see in the output from your program. The reason is that Java and PHP represent strings completely differently. In PHP, a string can contain any bytes at all, and their interpretation as characters is up to the programmer.
This basically means that a PHP string is the equivalent of a byte[] in Java.
But Java Strings are completely different. A String consists internally of an array of char, and char is always two bytes, in UTF-16 encoding. When you convert bytes you have read into a Java String, the bytes are always decoded using some character encoding so that the appropriate characters are stored in the string.
For example, if your bytes are 44 4F 4C 4C, and the character encoding is ISO-8859-1, this will be interpreted as the characters \u0044, \u004F, \u004C, \u004C. It will be a string of four characters - "DOLL". But if your character encoding is UTF-16, the bytes will be interpreted as \u444F and \u4C4C. A string of only two characters, "䑏䱌".
When you were reading from the console or from a file, the data was probably in the encoding that Java expects by default. This is usually the case when the file is written in pure English, with just English letters, spaces and punctuation. These are all 7-bit characters which are the same in ISO-8859-1 and UTF-8, which are the common defaults. But in /dev/urandom you'd have some bytes in the range 80 through FF, which may be treated differently when interpreted into a UTF-16 Java string.
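That is also exactly where your ef bf bd comes from: a byte sequence that is not valid UTF-8 decodes to the replacement character U+FFFD, and re-encoding that character as UTF-8 yields precisely those three bytes. A minimal sketch:

import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    public static void main(String[] args) {
        byte[] raw = {(byte) 0x80}; // 0x80 on its own is invalid UTF-8
        String s = new String(raw, StandardCharsets.UTF_8);
        System.out.println((int) s.charAt(0)); // 65533, i.e. U+FFFD
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02x ", b); // ef bf bd
        }
    }
}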
Furthermore, you didn't show your toHex() method in Java. It probably reads bytes back from the string again, but using which encoding? If you read the bytes into the String using ISO-8859-1, and got them out in UTF-8, you'd get completely different bytes.
If you want to see exactly what PHP sent you, don't put the bytes in a String. Write a toHex method that works on byte arrays, and use the byte[] you read directly.
Also, always remember to check the number of bytes returned by read() and only interpret that many bytes! read() does not always fill the entire array. So your new toHex() method needs to take the number of bytes read as a parameter, so that it doesn't display the parts of the array beyond them. In your case you just have a one-byte array (which is not recommended), but even then, read() can return 0; that's a perfectly legal value indicating that no bytes were available in this particular call, although some may be available by the next read().
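A byte-array-based toHex along those lines might look like this (a sketch; the method name just mirrors the one in the question):

public static String toHex(byte[] data, int count) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < count; i++) {
        sb.append(String.format("%02x ", data[i] & 0xFF)); // mask to undo sign extension
    }
    return sb.toString();
}

Called as int n = in.read(data); if (n > 0) System.out.println(toHex(data, n)); so that only the bytes actually read are shown.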
As the comment above says, you might be having trouble with the string representation of the bytes (String message = new String(data)). To be certain, you should take the data bytes and encode them in Base64, for example. You can use a library such as Apache Commons, or Java 8, to do that. You should be able to do something similar in PHP and compare.

Java help to convert UDP packet to int

I'm receiving a UDP packet (in a format I don't know; I think UTF-16, little-endian). The only thing I know is the following doc, straight from the developers' page:
The master servers each respond by sending FF FF FF FF 73 0A followed
by a (relatively) unique 4-byte "challenge" number.
So this is how I'm receiving the packet:
byte[] buff = new byte[64];
DatagramPacket packet = new DatagramPacket(buff, buff.length);
socket.receive(packet);
The packet was received and everything is okay, but now I'm stuck. I need that 4-byte integer. Must I split the buffer, or... I don't know what to do.
This is the received data:
˙˙˙˙s
Ň?t
I tried to convert to hex but the output is very interesting:
-0000000000000000000000000008cf52df7c08c
Method to convert:
public String toHex(String arg) throws UnsupportedEncodingException {
    return String.format("%040x", new BigInteger(arg.getBytes("UTF-16LE")));
}
Then I tried to convert the hex back to a string (from the method above) and the result is much more interesting (sorry, I can't copy-paste it; something goes wrong). Anyway, the method used to convert hex to string is:
public String hexToString(String hex) {
    StringBuilder output = new StringBuilder();
    for (int i = 0; i < hex.length(); i += 2) {
        String str = hex.substring(i, i + 2);
        output.append((char) Integer.parseInt(str, 16));
    }
    return new String(output);
}
So with all that said, I'm stuck. I don't know what I'm supposed to do. Do I need to split the UDP packet into pieces, or...?
I'm receiving an UDP packet (in which format I don't know, I think UTF-16 -little endian-), only thing that I know is the following doc.
You really need to find out what the packet actually contains. The packet contents you have posted in your question don't make much sense to me, and don't seem to correspond to the supposed format.
Start out by dumping the bytes of the byte array like this:
byte[] bytes = packet.getData();
int len = packet.getLength();   // number of bytes actually received
for (int i = 0; i < len; i++) {
    System.out.format("%02x ", bytes[i]);
}
Then compare that with the expected format from the documentation. If they match (more or less) then you can start on the problem of deciding how to extract the data that you need. Otherwise, you first need to figure out what the format REALLY is. Maybe we can help ... but we need a reliable rendering of the packet (e.g. produced as above.)
FWIW, the reason that you are getting -0000000000000000000000000008cf52df7c08c is (I think) that BigInteger(byte[]) is interpreting the byte array as a signed number. Anyway, that's not a good way to do this. The UDP packet body is a sequence of bytes, not a number.
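If the packet really does follow the documented layout (FF FF FF FF 73 0A followed by a 4-byte challenge), extracting the challenge could look like the sketch below. Note the big-endian assumption: that's conventional network byte order, but the quoted doc doesn't actually say.

import java.nio.ByteBuffer;

// buff and packet as in the question
int len = packet.getLength();
if (len >= 10) {
    ByteBuffer bb = ByteBuffer.wrap(buff, 6, 4); // skip the 6-byte header
    int challenge = bb.getInt();                 // reads 4 bytes, big-endian by default
    System.out.printf("challenge = %08x%n", challenge);
}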
I also think it is unlikely that the UDP packet is UTF-16. FFFF is described thus in the official Unicode code charts:
Noncharacters:
These codes are intended for process-internal uses, but are not permitted for interchange. [...]
FFFF : • the value FFFF is guaranteed not to be a Unicode character at all
So if someone is claiming that this is UTF-16, the usage is violating the Unicode standard.

Handling C char arrays with Java char arrays

Through a socket I am sending information from a program written in C to a program written in Java.
From the C program I am sending two bytes in a char array (over an Internet socket), and the received information in Java is stored in a char array as well.
My main problem is that the received information in the Java array does not correspond properly to the transmitted information from the C program.
I have read that chars in Java are 16 bits long while chars in C are 8 bits long. That may be the problem, but I do not know how to handle or solve it.
The C code to send information is the following:
char buffer[256];
bzero(buffer,256);
n = read(fd, buffer, 255); // getting info from an uart port, which works properly
n = write(sockfd,buffer,3); // send the information through the socket
Part of the Java code (for an Android app) is the following:
char[] buffer = new char[256];
BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
int readX = in.read(buffer, 0, 3);
if (readX > 0) { // I am using a handler to manipulate the info
    Message msg = new Message();
    msg.obj = buffer;
    mHandler.sendMessage(msg);
}
....
// Part of the handler is the following:
mHandler = new Handler() {
    @Override
    public void handleMessage(Message msg) {
        char[] buffer;
        buffer = (char[]) msg.obj; // the information here differs from what the C program sent
        ....
    }
};
I would really appreciate any suggestion to solve this problem.
Thanks in advance, Gus.
In C and C++ the char data type is 8-bit characters, corresponding roughly to the Java byte type. In Java, the fundamental char type is a 16-bit Unicode character. When you convert from bytes to characters (or vice-versa) in Java, a mapping has to occur, depending on the character encoding of the byte stream (UTF-8, ISO-8859-1, etc), so you have to know how the C byte stream is encoded. In your case I'd guess it's ISO-8859-1. If it's really binary data, then you should use the Java byte type.
EDIT:
You have to know whether the data being sent from C is character or binary, and if character, how the C program is encoding the data (ISO-8859-1, UTF-8, etc).
If the data is binary then you must use BufferedInputStream to read bytes, not BufferedReader, which decodes bytes into characters, which you don't want since the data is binary.
If the data is character, then you can use the 2-argument constructor
InputStreamReader(InputStream in, String charSetName)
or one of the other 2-arg constructors that let you specify how to decode bytes into characters.
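A sketch of both options, assuming the C side is sending raw binary (which seems most likely here); the variable names mirror your code:

import java.io.*;
import java.nio.charset.StandardCharsets;

// Option 1 -- binary data: read bytes, not chars
BufferedInputStream in = new BufferedInputStream(socket.getInputStream());
byte[] buffer = new byte[256];
int readX = in.read(buffer, 0, 3); // readX = number of bytes actually read

// Option 2 -- character data: name the encoding the C program used
BufferedReader reader = new BufferedReader(
        new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));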
Use the Java byte type, which is an 8-bit signed integer. Also ensure that your char type in C is actually 8 bits. The Boost StaticAssert facility can be used to ensure that.

How do you write any ASCII character to a file in Java?

Basically I'm trying to use a BufferedWriter to write to a file in Java. The problem is, I'm actually doing some compression, so I generate ints between 0 and 255, and I want to write the character whose ASCII value is equal to that int. When I try writing to the file, it writes many ? characters, so when I read the file back in, it reads those as 63, which is clearly not what I want. Any ideas how I can fix this?
Example code:
int a = generateCode(character); //a now has an int between 0 and 255
bw.write((char) a);
a is always between 0 and 255, but it sometimes writes '?'
You are really trying to write / read bytes to / from a file.
When you are processing byte-oriented data (as distinct from character-oriented data), you should be using InputStream and OutputStream classes and not Reader and Writer classes.
In this case, you should use FileInputStream / FileOutputStream, and wrap with a BufferedInputStream / BufferedOutputStream if you are doing byte-at-a-time reads and writes.
Those pesky '?' characters are due to issues in the encoding/decoding process that happens when Java converts between characters and your platform's default text encoding. The conversion from bytes to characters and back is often "lossy", depending on the encoding scheme used. You can avoid this by using the byte-oriented stream classes.
(And the answers that point out that ASCII is a 7-bit not 8-bit character set are 100% correct. You are really trying to read / write binary octets, not characters.)
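Concretely, a byte-oriented version of your snippet might look like this (a sketch reusing your generateCode(); the file name is made up):

import java.io.*;

// writing
BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream("codes.bin"));
int a = generateCode(character); // still an int between 0 and 255
out.write(a);                    // writes the low 8 bits as one byte -- no '?' substitution
out.close();

// reading back
BufferedInputStream in = new BufferedInputStream(new FileInputStream("codes.bin"));
int b = in.read(); // returns 0..255, or -1 at end of stream
in.close();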
You need to make up your mind what are you really doing. Are you trying to write some bytes to a file, or are you trying to write encoded text? Because these are different concepts in Java; byte I/O is handled by subclasses of InputStream and OutputStream, while character I/O is handled by subclasses of Reader and Writer. If what you really want to write is bytes to a file (which I'm guessing from your mention of compression), use an OutputStream, not a Writer.
Then there's another confusion you have, which is evident from your mention of "ASCII characters from 0-255." There are no ASCII characters above 127. Please take 15 minutes to read this: "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" (by Joel Spolsky). Pay particular attention to the parts where he explains the difference between a character set and an encoding, because it's critical for understanding Java I/O. (To review whether you understood, here's what you need to learn: Java Writers are classes that translate character output to byte output by applying a client-specified encoding to the text, and sending the bytes to an OutputStream.)
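(If you do conclude the data is text, the Writer-based fix would be to specify an encoding that maps all 256 values one-to-one, such as ISO-8859-1, instead of relying on the platform default. A sketch, where a is the int from the snippet above:)

import java.io.*;
import java.nio.charset.StandardCharsets;

Writer w = new OutputStreamWriter(new FileOutputStream("codes.txt"),
        StandardCharsets.ISO_8859_1);
w.write((char) a); // every value 0..255 maps to exactly one byte
w.close();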
Java strings are based on 16-bit characters, and Java tries to perform conversions around that assumption when there is no explicit specification.
The following sample code writes and reads data directly as bytes, meaning 8-bit numbers which have an ASCII meaning associated with them.
import java.io.*;

public class RWBytes {
    public static void main(String[] args) throws IOException {
        String filename = "MiTestFile.txt";
        byte[] bArray1 = new byte[5];
        byte[] bArray2 = new byte[5];
        bArray1[0] = 65; // A
        bArray1[1] = 66; // B
        bArray1[2] = 67; // C
        bArray1[3] = 68; // D
        bArray1[4] = 69; // E

        FileOutputStream fos = new FileOutputStream(filename);
        fos.write(bArray1);
        fos.close();

        FileInputStream fis = new FileInputStream(filename);
        fis.read(bArray2);
        fis.close();

        for (int i = 0; i < bArray2.length; i++) {
            System.out.println("As the byte value: " + bArray2[i]); // the numeric byte value
            System.out.println("Converted to char for printing to the screen: " + String.valueOf((char) bArray2[i]));
        }
    }
}
A fixed subset of the 7-bit ASCII code is printable; A = 65, for example, while 10 corresponds to the "new line" character, which steps down one line on screen when found and "printed". Many other codes manipulate a character-oriented screen invisibly, like tabs and spaces, and there are other control characters which had purposes such as ringing a bell.
The upper range above 127 is defined as whatever the implementer wanted; only the lower half has standard meanings associated with it.
For general binary byte handling there are no such qualms: the bytes are simply numbers which represent the data. Only when you try to print them to the screen do they become meaningful in all kinds of ways.

How is this 13 bytes long?

Two quotes:
All of the remaining messages in the protocol take the form of <length prefix><message ID><payload>. The length prefix is a four byte big-endian value. The message ID is a single decimal byte. The payload is message dependent.
request: <len=0013><id=6><index><begin><length>
The request message is fixed length, and is used to request a block.
The payload contains the following information:
index: integer specifying the zero-based piece index
begin: integer specifying the zero-based byte offset within the piece
length: integer specifying the requested length.
When I write everything, it sums up to only 5 bytes, using:
ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
byteStream.write(13);
byteStream.write(6);
byteStream.write(index);
byteStream.write(begin);
byteStream.write(length);
message = byteStream.toByteArray();
EDIT: Sorry, I was kind of pissed when I wrote this. It's the BitTorrent protocol.
Using this spec.
The write() method writes one byte.
If you pass it a char or an int, it just strips everything above the low 8 bits, as if by & 0xFF.
You have more options with DataOutputStream (writeInt, writeShort, etc.), but it uses big-endian byte order, so you might need to call Integer.reverseBytes() (or Short.reverseBytes()) before passing the value to the writeXYZ() method if you need little-endian.
ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
DataOutputStream dout = new DataOutputStream(byteStream);
dout.writeInt(0x13);    // running length: 4
dout.write(6);          // running length: 5
dout.writeShort(index); // a guess, running length: 7
dout.writeLong(begin);  // >4GB support? running length: 15
dout.writeInt(length);  // clients accept up to 2^17, running length: 19
dout.flush();           // to be sure
message = byteStream.toByteArray();
Remark: The spec doesn't state the length of index, begin and length. I just wanted to give a sample of the available options.
Edit 2: Edited the sample based on D.Shawley's answer and a spec found here.
I'm not sure what you are getting at here... the quoted text doesn't say what the length of <index>, <begin>, or <length> is. The first quote states, rather clearly, that a message consists of a 4-byte length, followed by a 1-byte identifier, and an arbitrary payload.
The length of the payload is probably either the value specified as <length> or <length>+5 depending on exactly what <length> means. The second quote looks like the definition of whatever message is identified by the 1-byte identifier of 0x06. I would guess that:
the payload, the bytes that make up <index><begin><length>, is probably 14 bytes long
the length is being displayed in hex so 0x0013 is 19 decimal
In any case, the byte stream that you are generating does not seem to match the message definition AND the message definition lacks clarity.
write() writes bytes; five write() calls produce 5 bytes.
See write(int b):

Writes the specified byte to this output stream. The general contract for write is that one byte is written to the output stream. The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored. Subclasses of OutputStream must provide an implementation for this method.

Parameters: b - the byte.
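Putting the pieces together: under the usual reading of the BitTorrent spec, index, begin and length are each 4-byte big-endian integers, so the length prefix is decimal 13 (1 ID byte + 12 payload bytes) and the whole message is 17 bytes. A sketch under that assumption:

import java.io.*;

ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
DataOutputStream dout = new DataOutputStream(byteStream);
dout.writeInt(13);     // 4-byte big-endian length prefix: 00 00 00 0d
dout.writeByte(6);     // 1-byte message ID for "request"
dout.writeInt(index);  // 4 bytes
dout.writeInt(begin);  // 4 bytes
dout.writeInt(length); // 4 bytes
dout.flush();
message = byteStream.toByteArray(); // 17 bytes total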
