Two quotes:
All of the remaining messages in the protocol take the form of <length prefix><message ID><payload>. The length prefix is a four byte big-endian value. The message ID is a single decimal byte. The payload is message dependent.
request: <len=0013><id=6><index><begin><length>
The request message is fixed length, and is used to request a block.
The payload contains the following information:
index: integer specifying the zero-based piece index
begin: integer specifying the zero-based byte offset within the piece
length: integer specifying the requested length.
When I write everything, it sums up to 5 bytes, using:
ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
byteStream.write( 13 );
byteStream.write( 6 );
byteStream.write( index );
byteStream.write( begin );
byteStream.write( length );
message = byteStream.toByteArray();
EDIT: Sorry, I was kind of pissed when I wrote it. It's the BitTorrent protocol, using this spec.
The write() method writes a single byte.
If you pass it a char or an int, it just strips everything above the low 8 bits, as if by & 0xFF.
You have more options with DataOutputStream (writeInt, writeShort, etc.). It writes in big-endian byte order, which is exactly what the quoted spec asks for; if you ever need little-endian instead, you would call Integer.reverseBytes() (or Short.reverseBytes()) on the value before passing it to the writeXYZ() method.
ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
DataOutputStream dout = new DataOutputStream(byteStream);
dout.writeInt(0x13);    // length prefix; running total: 4 bytes
dout.write(6);          // message ID; total: 5
dout.writeShort(index); // a guess; total: 7
dout.writeLong(begin);  // >4GB support? total: 15
dout.writeInt(length);  // clients accept lengths up to 2^17; total: 19 = 0x13
dout.flush();           // to be sure
message = byteStream.toByteArray();
Remark: The spec doesn't state the length of index, begin and length. I just wanted to give a sample of the available options.
Edit 2: Edited the sample based on D.Shawley's answer and a spec found here.
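For what it's worth, the unofficial BitTorrent specification wiki defines index, begin and length as 4-byte big-endian integers, which makes the length prefix 13 decimal (1 id byte + 12 payload bytes) and means DataOutputStream's big-endian order can be used as-is. A sketch under that reading, with made-up values:
int index = 0, begin = 0, length = 16384; // example values; 16 KiB is the typical block size
ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
DataOutputStream dout = new DataOutputStream(byteStream);
dout.writeInt(13);     // length prefix: id (1) + index (4) + begin (4) + length (4)
dout.writeByte(6);     // message ID: request
dout.writeInt(index);  // zero-based piece index
dout.writeInt(begin);  // zero-based byte offset within the piece
dout.writeInt(length); // requested block length
dout.flush();
byte[] message = byteStream.toByteArray(); // 17 bytes on the wire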
I'm not sure what you are getting at here... the quoted text doesn't say what the length of <index>, <begin>, or <length> is. The first quote states, rather clearly, that a message consists of a 4-byte length, followed by a 1-byte identifier, and an arbitrary payload.
The length of the payload is probably either the value specified as <length> or <length>+5 depending on exactly what <length> means. The second quote looks like the definition of whatever message is identified by the 1-byte identifier of 0x06. I would guess that:
the payload, the bytes that make up <index><begin><length>, is probably 14 bytes long
the length is being displayed in hex so 0x0013 is 19 decimal
In any case, the byte stream that you are generating does not seem to match the message definition AND the message definition lacks clarity.
write() writes bytes; five write() calls produce 5 bytes.
See write(int b).
Writes the specified byte to this output stream. The general contract for write is that one byte is written to the output stream. The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored. Subclasses of OutputStream must provide an implementation for this method.
Parameters: b - the byte.
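A quick demonstration of that contract (the value is arbitrary):
ByteArrayOutputStream bos = new ByteArrayOutputStream();
bos.write(0x12345678); // only the eight low-order bits survive
System.out.println(Integer.toHexString(bos.toByteArray()[0] & 0xFF)); // prints 78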
I ran into an actually quite weird problem with the put method of java.nio.ByteBuffer, and I wondered if you might know the answer to it, so let's get to it.
My goal is to append some data to a ByteBuffer. The problem is that after I call the put method, it always adds a 0 after the byte array.
Here is the method which has those side effects:
public ByteBuffer toByteBuffer() {
    ByteBuffer buffer = ByteBuffer.allocateDirect(1024);
    buffer.put(type.name().getBytes()); // 0 added here
    data.forEach((key, value) -> {
        buffer.putChar('|');
        buffer.put(key.getBytes()); // 0 added here
        buffer.putChar('=');
        buffer.put(value.getBytes()); // 0 added here
    });
    buffer.flip();
    return buffer;
}
Expected Output should look something like this:
ClientData|OAT=14.9926405|FQRIGHT=39.689075|.....
Actual output, where _ represents the 0s:
ClientData_|OAT_=14.9926405_|FQRIGHT_=39.689075_|.....
The documentation doesn't say anything about this side effect.
Also, put only seems to add a 0 between the byte arrays, not at the very end of the buffer.
I assume the method is wrong, or at least not properly documented, but I really have no clue why it would behave this way.
I think you may be slightly misinterpreting what is happening here. I note your comment about "but not at the very end of the buffer".
The \0 is actually coming from the putChar(char) calls, not the put(byte[]) calls. As it says in the docs (emphasis added):
Writes two bytes containing the given char value, in the current byte order
The default byte order is big endian; given that the chars you are writing are in the 7-bit ASCII range, this means "the byte you want" is going to be preceded by 0x00.
If you want to write a byte, use put(byte):
buffer.put((byte) '|');
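Applied to the method from the question (keeping its hard-coded 1024-byte buffer; the explicit charset, java.nio.charset.StandardCharsets.US_ASCII, is my addition, since getBytes() otherwise uses the platform default):
public ByteBuffer toByteBuffer() {
    ByteBuffer buffer = ByteBuffer.allocateDirect(1024);
    buffer.put(type.name().getBytes(StandardCharsets.US_ASCII));
    data.forEach((key, value) -> {
        buffer.put((byte) '|');   // one byte instead of putChar's two
        buffer.put(key.getBytes(StandardCharsets.US_ASCII));
        buffer.put((byte) '=');
        buffer.put(value.getBytes(StandardCharsets.US_ASCII));
    });
    buffer.flip();
    return buffer;
}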
I am trying to send data from PHP TCP server to JAVA TCP client.
I am comparing my results by comparing hex values of the data.
The PHP script reads STDIN and sends it through the socket one byte at a time; the Java side reads it using DataInputStream.read(), converts it to hex, and displays it.
If I manually type data into the script, it works OK.
If I use a file with data, it works OK.
But when I feed it /dev/urandom (even a few bytes), the data on the Java side comes out corrupted: there is always the hex value efbfbd in random places instead of the correct data.
Please help me with this issue.
PHP code:
$f = fopen('php://stdin', 'rb');
while ($line = fread($f, 1)) {
    $length = 1;
    echo bin2hex($line) . "\n";
    $sent = socket_write($client, $line, $length);
    echo $sent . "\n";
    if ($sent === false) {
        break;
    }
    // Check if the entire message has been sent
    if ($sent < $length) {
        // If not, keep the part of the message that has not yet been sent
        $line = substr($line, $sent);
        // and the length of the unsent part
        $length -= $sent;
    }
}
Java code:
in = new DataInputStream(clientSocket.getInputStream());
byte[] data = new byte[1];
int count = 0;
while (in.available() > 0) {
    //System.out.println(in.available());
    in.read(data);
    String message = new String(data);
    System.out.println(message);
    //System.out.flush();
    System.out.println(toHex(message));
    //in.flush();
    message = "";
}
You're stumbling over encoding. By calling new String(data), the byte array is converted to a string using your default encoding, whatever that may be (you can set the encoding to UTF-8, for example, with java -Dfile.encoding=UTF-8).
The Java code you want would most likely look like the following:
in = new DataInputStream(clientSocket.getInputStream());
byte[] data = new byte[1];
while (in.available() > 0) {
    in.read(data);
    String hexMessage = Integer.toHexString(data[0] & 0xFF);
    String stringMessage = new String(data, "UTF-8"); // or US-ASCII, ISO-8859-1, ...
    System.out.println(hexMessage);
}
Update: I missed the 32-bit issue. The 8-bit byte, which is signed in Java, is sign-extended to a 32-bit int. Masking the byte with 0xFF effectively undoes this sign extension.
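A quick illustration of the sign extension and the mask (0xA0 chosen because it is above 127):
byte b = (byte) 0xA0;                              // -96 as a signed Java byte
System.out.println(Integer.toHexString(b));        // ffffffa0 (sign-extended to int)
System.out.println(Integer.toHexString(b & 0xFF)); // a0 (the mask undoes the extension)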
There are two main issues with your Java program.
First, the use of in.available(). It does not tell you how many bytes are still in the message. It merely says how many bytes are ready in the stream and can be read without blocking. For example, if the server sends two packets of data over the socket, one has arrived but one is still in transit, and each packet has 200 bytes (just an example), then the first call returns 200. If you read 200 bytes, you're sure not to be blocked. But if the second packet has not arrived yet, the next call to in.available() will return 0. If you stop at this point, you only have half the data. Not what you wanted.
Typically you either have to read until you reach end-of-stream (InputStream.read() returns -1), and then you can't use the same stream anymore and you close the socket, or you have a specific protocol that tells you how many bytes to expect and you read that number of bytes.
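In code, the read-until-end-of-stream pattern looks like this (buffer size arbitrary; in is the question's DataInputStream):
byte[] buf = new byte[4096];
int n;
while ((n = in.read(buf)) != -1) {
    // process exactly n bytes of buf; read() may return fewer than buf.length
}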
But that's not the reason for the strange values you see in your program's output. The reason is that Java and PHP represent strings completely differently. In PHP, a string can contain any bytes at all, and their interpretation as characters is up to the programmer.
This basically means that a PHP string is the equivalent of a byte[] in Java.
But Java Strings are completely different. A String consists internally of an array of char, and char is always two bytes, in UTF-16 encoding. When you convert bytes you've read into a Java String, it's always done by decoding the bytes with some character encoding so that the appropriate characters are stored in the string.
For example, if your bytes are 44 4F 4C 4C, and the character encoding is ISO-8859-1, this will be interpreted as the characters \u0044, \u004F, \u004C, \u004C. It will be a string of four characters - "DOLL". But if your character encoding is UTF-16, the bytes will be interpreted as \u444F and \u4C4C. A string of only two characters, "䑏䱌".
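Here are those same four bytes decoded both ways in code:
byte[] bytes = {0x44, 0x4F, 0x4C, 0x4C};
System.out.println(new String(bytes, java.nio.charset.StandardCharsets.ISO_8859_1)); // DOLL
System.out.println(new String(bytes, java.nio.charset.StandardCharsets.UTF_16));     // 䑏䱌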
When you were reading from the console or from a file, the data was probably in the encoding that Java expects by default. This is usually the case when the file is written in pure English, with just English letters, spaces and punctuation. These are all 7-bit characters which are the same in ISO-8859-1 and UTF-8, which are the common defaults. But in /dev/urandom you'd have some bytes in the range 80 through FF, which may be treated differently when interpreted into a UTF-16 Java string.
Furthermore, you didn't show your toHex() method in Java. It probably reads bytes back from the string again, but using which encoding? If you read the bytes into the String using ISO-8859-1, and got them out in UTF-8, you'd get completely different bytes.
If you want to see exactly what PHP sent you, don't put the bytes in a String. Write a toHex method that works on byte arrays, and use the byte[] you read directly.
Also, always remember to check the number of bytes returned by read() and only interpret that many bytes! read() does not always fill the entire array. So your new toHex() method also needs the number of bytes read as a parameter, so that it doesn't display the parts of the array beyond them. In your case you just have a one-byte array (which is not recommended), but even then read() can return 0, a perfectly legal value indicating that this particular call found no bytes available, although some may be available by the next read().
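A sketch of such a toHex (the name mirrors the question's method; the count parameter follows the advice above):
static String toHex(byte[] data, int count) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < count; i++) {
        sb.append(String.format("%02x", data[i] & 0xFF)); // the mask undoes sign extension
    }
    return sb.toString();
}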
As the comment above says, you might be having trouble with the string representation of the bytes (String message = new String(data)). To be certain, get the raw bytes and encode them in Base64, for example. You can use a library such as Apache Commons, or Java 8's built-in support. You should be able to do something similar in PHP to compare.
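For example, with Java 8's java.util.Base64 (data being the byte[] you read):
String b64 = java.util.Base64.getEncoder().encodeToString(data);
System.out.println(b64); // compare against base64_encode() output on the PHP side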
I am creating an easy to use server-client model with an extensible protocol, where the server is in Java and clients can be Java, C#, what-have-you.
I ran into this issue: Java data streams write strings with a short designating the length, followed by the data.
C# lets me specify the encoding I want, but it only reads one byte for the length. (Actually, it says "7 bits at a time"... this is odd. This might be part of my problem?)
Here is my setup: The server sends a string to the client once it connects. It's a short string, so the first byte is 0 and the second byte is 9; the string is 9 bytes long.
//...
_socket.Connect(host, port);
var stream = new NetworkStream(_socket);
_in = new BinaryReader(stream, Encoding.UTF8);
Console.WriteLine(_in.ReadString()); //outputs nothing
Reading a single byte before reading the string of course outputs the expected string. But, how can I set up my stream reader to read a string using two bytes as the length, not one? Do I need to subclass BinaryReader and override ReadString()?
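For reference, the server side is just the standard data-stream write; a sketch (the actual string is made up, 9 ASCII characters long):
DataOutputStream out = new DataOutputStream(socket.getOutputStream());
out.writeUTF("ninechars"); // writes the length 9 as a 2-byte big-endian short, then the bytes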
The C# BinaryWriter/Reader behavior uses, if I recall correctly, the 8th bit of each byte to signify whether another count byte follows. This allows counts up to 127 to fit in a single byte while still allowing actual count values much larger (i.e. up to 2^31-1); it's a bit like UTF-8 in that respect.
For your own purposes, note that you are writing the whole protocol (presumably), so you have complete control over both ends. Both behaviors you describe, in C# and Java, are implemented by what are essentially helper classes in each language. There's nothing saying that you have to use them, and both languages offer a way to simply encode text directly into an array of bytes which you can send however you like.
If you do want to stick with the Java-based protocol, you can use BitConverter to convert between a short and a byte[] so that you can receive those two bytes explicitly. One caveat: Java writes the length big-endian (network order), while BitConverter uses the machine's native order (usually little-endian), so the value needs a byte-order conversion. For example:
_in = new BinaryReader(stream, Encoding.UTF8);
byte[] header = _in.ReadBytes(2);
// Java wrote the length big-endian; convert from network order to host order
short count = IPAddress.NetworkToHostOrder(BitConverter.ToInt16(header, 0));
byte[] data = _in.ReadBytes(count);
string text = Encoding.UTF8.GetString(data);
Console.WriteLine(text); // outputs something
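One further caveat if the server keeps using Java's writeUTF: it emits modified UTF-8 (U+0000 and characters outside the BMP are encoded differently from standard UTF-8), which only starts to matter once you send characters beyond plain ASCII.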
Good day,
I am having some trouble. I am trying to send a character array over the network to a server.
Here is my code:
char[] arr = {3, 4};

public void sendMessage(String message) {
    if (out != null && !out.checkError()) {
        out.print(arr);
        out.flush();
    }
}
When I run that, lanshark detects that it received a packet with 2 bytes of data. OK, good, all is well.
Now when I run this:
char[] arr = {3, 160}; // notice 160 *****

public void sendMessage(String message) {
    if (out != null && !out.checkError()) {
        out.print(arr);
        out.flush();
    }
}
lanshark says that the packet has 3 bytes of data? The exact data is:
03 c2 a0
Now why is it adding the c2 in there? I understand that it has something to do with the fact that my char is bigger than 127, but I need to send this value of 160.
Please can you help me. Do I need to use another data type, or send it in a different way? I know you can do this in C. How can I do it in Java?
Here is the code for my out object:
PrintWriter out;
out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(socket.getOutputStream())), true);
Thanks
In fact, the program is doing exactly what you told it to do. It is sending two characters, i.e. the Unicode codepoints 3 (\u0003) and 160 (\u00a0). These characters are being encoded using your platform's default character encoding, which appears to be UTF-8. The bytes c2 a0 are the UTF-8 encoding of the \u00a0 character.
But what you are actually trying to do is send 2 bytes.
In Java char is a 16 bit type, not an 8 bit type. If you want to send 8 bit values you need to use the byte type.
The other mistake you are making is that you are trying to send (essentially) binary data using a Writer. The Writer interface is for (16-bit) character oriented data. You should be using the OutputStream API ...
Anyhow... here's a code snippet to illustrate how you should send an array of bytes:
byte[] arr = new byte[]{3, (byte) 160};
OutputStream out = socket.getOutputStream();
out.write(arr);
I just changed it to that now: out.print(new String(arr).getBytes(Charset.forName("UTF-8"))); and this is the data I get on wireshark, 11 bytes: 5b42403431653230396538. It should still be 2 bytes?
You are making it worse!
Let's take this apart:
new String(arr) gives you a 2-character String.
.getBytes(...) will turn that into a 3-byte array containing the bytes 03 c2 a0.
out.print(...) will attempt to call a print method on the PrintWriter API.
But which one? Well, you supplied an argument whose declared type is byte[], and that results in you calling print(Object).
But wait a minute... what does PrintWriter.print(Object) do? The first thing it does is call toString() on the argument.
And what does that do? Since the object is a byte[], this calls the default toString() method provided by java.lang.Object. That gives you a String that looks like [B@xxxxxxxx, where [B is the "class name" for a byte array, and the sequence of x's is a hexadecimal representation of the array object's identity hashcode!
And then you output that String.
And behold your 2 bytes (actually characters) have turned into 11 bytes.
You haven't told us anything about out - not even the type - but I strongly suspect it's using UTF-8 to encode your character data. U+00A0 (i.e. character 160) is encoded as 0xc2 0xa0 in UTF-8.
Using UTF-8 is almost certainly the best thing to do if you control both the reading and writing code: it means the whole of Unicode is available to you. If you used ISO-8859-1 (which is what I suspect you expected to use - a single byte per character, for all characters) you would be limiting yourself to only the range [U+0000, U+00FF].
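If a single byte per character is really what you want and the [U+0000, U+00FF] range is acceptable, a sketch built on the question's writer:
PrintWriter out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(
        socket.getOutputStream(), java.nio.charset.StandardCharsets.ISO_8859_1)), true);
out.print(new char[]{3, 160}); // goes out as exactly the two bytes 03 a0
out.flush();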
You can send your data as UTF-8 as follows, printing the String and letting the writer encode it:
out.print(new String(arr));
You also have to create your writer with UTF-8 encoding, as follows:
out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), "UTF-8")), true);
Now you will send your char array encoded as UTF-8, where a char above 127 may take two or three bytes. See the answer of @JonSkeet to understand why the encoding matters here.
Make sure the in and out sides use the same encoding; then you will get exactly what you want. E.g.,
when sending:
mPrintWriterClient = new PrintWriter(new BufferedWriter(new OutputStreamWriter(mSocket.getOutputStream(), "UTF-8")), true);
when receiving:
mBufferedReaderClient = new BufferedReader(new InputStreamReader(mSocket.getInputStream(), "UTF-8"));
Basically I'm trying to use a BufferedWriter to write to a file using Java. The problem is, I'm actually doing some compression, so I generate ints between 0 and 255, and I want to write the character whose ASCII value is equal to that int. When I try writing to the file, it writes many ? characters, and when I read the file back in, it reads those as 63, which is clearly not what I want. Any ideas how I can fix this?
Example code:
int a = generateCode(character); //a now has an int between 0 and 255
bw.write((char) a);
a is always between 0 and 255, but it sometimes writes '?'
You are really trying to write / read bytes to / from a file.
When you are processing byte-oriented data (as distinct from character-oriented data), you should be using InputStream and OutputStream classes and not Reader and Writer classes.
In this case, you should use FileInputStream / FileOutputStream, and wrap with a BufferedInputStream / BufferedOutputStream if you are doing byte-at-a-time reads and writes.
Those pesky '?' characters are due to issues in the encoding/decoding process that happens when Java converts between characters and your platform's default text encoding. The conversion from bytes to characters and back is often "lossy", depending on the encoding scheme used. You can avoid this by using the byte-oriented stream classes.
(And the answers that point out that ASCII is a 7-bit not 8-bit character set are 100% correct. You are really trying to read / write binary octets, not characters.)
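A self-contained sketch of the byte-oriented approach (the file name and values are invented; the question's generateCode is stood in for by a fixed array):
import java.io.*;

public class ByteCodesDemo {
    public static void main(String[] args) throws IOException {
        int[] codes = {0, 63, 160, 255}; // example 0-255 values

        try (OutputStream out = new BufferedOutputStream(new FileOutputStream("codes.bin"))) {
            for (int a : codes) {
                out.write(a); // stores the low 8 bits: the exact 0-255 value
            }
        }

        try (InputStream in = new BufferedInputStream(new FileInputStream("codes.bin"))) {
            int b;
            while ((b = in.read()) != -1) {
                System.out.println(b); // prints 0, 63, 160, 255 unchanged
            }
        }
    }
}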
You need to make up your mind what are you really doing. Are you trying to write some bytes to a file, or are you trying to write encoded text? Because these are different concepts in Java; byte I/O is handled by subclasses of InputStream and OutputStream, while character I/O is handled by subclasses of Reader and Writer. If what you really want to write is bytes to a file (which I'm guessing from your mention of compression), use an OutputStream, not a Writer.
Then there's another confusion you have, which is evident from your mention of "ASCII characters from 0-255." There are no ASCII characters above 127. Please take 15 minutes to read this: "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" (by Joel Spolsky). Pay particular attention to the parts where he explains the difference between a character set and an encoding, because it's critical for understanding Java I/O. (To review whether you understood, here's what you need to learn: Java Writers are classes that translate character output to byte output by applying a client-specified encoding to the text, and sending the bytes to an OutputStream.)
Java strings are based on 16-bit characters, and Java tries to perform conversions around that assumption whenever no encoding is clearly specified.
The following sample code writes and reads data directly as bytes, meaning 8-bit numbers, which have an ASCII meaning associated with them.
import java.io.*;

public class RWBytes {
    public static void main(String[] args) throws IOException {
        String filename = "MiTestFile.txt";
        byte[] bArray1 = new byte[5];
        byte[] bArray2 = new byte[5];
        bArray1[0] = 65; // A
        bArray1[1] = 66; // B
        bArray1[2] = 67; // C
        bArray1[3] = 68; // D
        bArray1[4] = 69; // E

        FileOutputStream fos = new FileOutputStream(filename);
        fos.write(bArray1);
        fos.close();

        FileInputStream fis = new FileInputStream(filename);
        fis.read(bArray2); // fills the 5-byte array from the 5-byte file
        fis.close();

        for (int i = 0; i < bArray2.length; i++) {
            System.out.println("As the byte value: " + bArray2[i]);
            System.out.println("Converted to char for printing: " + (char) bArray2[i]);
        }
    }
}
A fixed subset of the 7-bit ASCII code is printable (A=65, for example), while 10 corresponds to the "new line" character, which steps down one line on screen when "printed". Many other codes manipulate a character-oriented screen; these are invisible and affect the screen representation, like tabs and spaces. There are also other control characters whose purpose was, for example, to ring a bell.
The higher range above 127 is defined as whatever the implementer wanted; only the lower half has standard meanings associated with it.
For general binary byte handling there are no such qualms: bytes are numbers that represent the data. Only when you try to print them to the screen do they become meaningful in all kinds of ways.