I am sending a byte over a TCP connection. When I send a single negative number (like -30 in this example), I get three bytes:
Client Side:
PrintWriter out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(socket.getOutputStream())));
out.write((byte)-30);
out.flush();
out.close();
Server Side:
DataInputStream is = new DataInputStream(clientSocket.getInputStream());
byte[] bbb = new byte[3];   // buffer for the received bytes
is.readFully(bbb);
for (int i = 0; i < bbb.length; i++)
    System.out.println(i + ":" + bbb[i]);
What I get is:
0:-17
1:-65
2:-94
but I sent just -30.
You're using a writer, and you're calling Writer.write(int):
Writes a single character. The character to be written is contained in the 16 low-order bits of the given integer value; the 16 high-order bits are ignored.
So there's a conversion to int, then the bottom 16 bits of that int are taken. You're actually writing Unicode character 65506 (U+FFE2) in your platform's default encoding (which appears to be UTF-8). That's not what you want to write, but that's what you are writing.
If you only want to write binary data, you shouldn't be using a Writer at all. Just use OutputStream - wrap it in DataOutputStream if you want, but don't use a Writer. The Writer classes are for text.
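For example, a minimal sketch of the fix (assuming the same socket as in the question):
OutputStream out = socket.getOutputStream();
out.write(-30);   // only the low-order 8 bits go on the wire: 0xE2
out.flush();
out.close();
With this, readFully on the server side receives the single byte -30.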
Related
How do you count the number of bytes in this binary file (t.dat) without running the code (as a theoretical question)?
Assuming that you run the following program on Windows using the default ASCII encoding.
import java.io.*;

public class Bin {
    public static void main(String[] args) throws IOException {
        DataOutputStream output = new DataOutputStream(
                new FileOutputStream("t.dat"));
        output.writeInt(12345);
        output.writeUTF("5678");
        output.close();
    }
}
Instead of trying to compute the bytes output by each write operation, you could simply check the length of the file after it's closed using new File("t.dat").length().
If you wanted to figure it out without checking the length directly: an int takes up 4 bytes, and something written with writeUTF takes up 2 bytes to represent the encoded length of the string, plus the space the string itself takes, which in this case is another 4 bytes -- in (modified) UTF-8, each of the characters in "5678" requires 1 byte.
So that's 4 + 2 + 4, or 10 bytes.
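A quick way to confirm the arithmetic, assuming the Bin program above has already produced t.dat:
System.out.println(new File("t.dat").length());   // prints 10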
I am using BufferedWriter to write text to files in Java, providing a custom buffer size in the constructor. The file is written to in chunks of whatever size I give (for example, if I give a buffer size of 8 KB, the file is written to once per 8 KB). But when I look at the memory occupied by the BufferedWriter object (using the YourKit profiler), it is actually twice the given buffer size (16 KB in this case).
I looked at the internal implementation to see why this is happening, and I see that it creates a char array of the given size. When it writes to the array, it makes sense that the array occupies twice the buffer size, as each char occupies 2 bytes.
My question is: how does BufferedWriter manage to write only 8 KB in this case, when it is storing 16 KB in the buffer? And is this technically correct? Each time, it flushes only 8 KB (half), even though it has 16 KB in the buffer.
But I expected all the chars stored in the char array to be written to the file when it reaches the buffer size (which would be 16 KB in my given example).
8K of chars occupies 16 KB of memory. Correct.
Now let's assume that the chars are actually all in the ASCII subset.
When you write a character stream to an output file in Java, the characters are encoded as a byte stream according to some encoding scheme. (This encoding is performed by stuff inside the OutputStreamWriter class, for example.)
When you encode those 8K characters using an 8-bit character set / encoding scheme such as ASCII or Latin-1 ... or UTF-8 (!!) ... each character is encoded as 1 byte. Therefore flushing a buffer containing those 8K characters generates an 8K-byte write.
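To make that concrete, here is an illustrative sketch (not the original poster's code) showing that 8K ASCII-subset chars encode to 8K bytes in UTF-8 but 16K bytes in UTF-16:
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodedSize {
    public static void main(String[] args) {
        char[] chars = new char[8 * 1024];   // 8K chars = 16 KB in memory
        Arrays.fill(chars, 'a');             // all in the ASCII subset
        String s = new String(chars);
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);    // 8192
        System.out.println(s.getBytes(StandardCharsets.UTF_16BE).length); // 16384
    }
}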
The size you pass to BufferedWriter is the size of its char array:
public BufferedWriter(Writer out, int sz) {
    super(out);
    if (sz <= 0)
        throw new IllegalArgumentException("Buffer size <= 0");
    this.out = out;
    cb = new char[sz];   // the buffer: sz chars, i.e. 2*sz bytes of memory
    nChars = sz;
    nextChar = 0;
    lineSeparator = java.security.AccessController.doPrivileged(
        new sun.security.action.GetPropertyAction("line.separator"));
}
A single char is not equal to a single byte; the relationship is defined by your character encoding.
Therefore, to do exactly what you described, you have to switch to another class, BufferedOutputStream, whose internal buffer is counted in bytes:
public BufferedOutputStream(OutputStream out, int size) {
    super(out);
    if (size <= 0) {
        throw new IllegalArgumentException("Buffer size <= 0");
    }
    buf = new byte[size];
}
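A minimal usage sketch (the file name and payload are illustrative):
byte[] data = new byte[1024];   // some binary payload
// Buffers exactly 8 KB of bytes before each underlying write, regardless of encoding.
OutputStream out = new BufferedOutputStream(new FileOutputStream("data.bin"), 8 * 1024);
out.write(data);
out.close();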
It depends on the encoding used to write the file: ISO-8859-1 stores a character as a single byte, and UTF-8 encodes all ASCII characters as a single byte.
The write() method in FileOutputStream takes an int but ignores the 24 high-order bits and writes only the low-order byte to the stream.
If a file contains characters whose code value is more than 127, and bytes are read from it and then written to an output stream (another text file), how will the characters be preserved, given that in Java a byte can have a maximum value of +127?
Say a text file (input.txt) has the character '›', whose value in the file's 8-bit encoding is 155.
An input stream reads from it:
int in = new FileInputStream("input.txt").read();   // in == 155
Now it writes to another text file (output.txt):
new FileOutputStream("output.txt").write(in);
Here integer "in" is truncated to byte which will have corresponding decimal value : -101.
How it successfully manages to write the character to file even though information about it seems to have been lost?
Just now I went through the description of the write(int) method in the Java docs, and what I observed was:
The general contract for write is that one byte is written to the output stream. The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored.
So I believe that, contrary to what I thought earlier (that the int in write() is truncated as happens when downcasting an int to a byte for values greater than 127), the 24 high-order bits are simply ignored and only the 8 least significant bits are considered.
No truncation and conversion to byte occurs.
I guess I am correct.
I think your confusion is caused by the fact that specs for character sets typically take the view that bytes are unsigned, while Java treats bytes as signed.
In fact 155 as an unsigned byte is -101 as a signed byte. (256 - 101 == 155). The bit patterns are identical. It is just a matter of whether you think of them as signed or unsigned.
How the truncation is coded is implementation specific. But there is no loss of information ... assuming that you had an 8-bit code in the first place.
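A short sketch of that signed/unsigned round trip:
int in = 155;                  // value returned by read(): an unsigned byte widened to int
byte b = (byte) in;            // same bit pattern (0x9B), now viewed as signed
System.out.println(b);         // -101
System.out.println(b & 0xFF);  // 155 again; no information was lost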
I was reading through this article. It has the following snippet:
OutputStream output = new FileOutputStream("c:\\data\\output-text.txt");
while (moreData) {
    int data = getMoreData();   // moreData and getMoreData() are placeholders from the article
    output.write(data);
}
output.close();
It is mentioned:
OutputStreams are used for writing byte based data, one byte at a time. The write() method of an OutputStream takes an int which contains the byte value of the byte to write.
Let's say I am writing the string Hello World to the file, so each character in the string gets converted to an int by the getMoreData() method. How does it get written: as a character or as a byte in output-text.txt? If it gets written as a byte, what is the advantage of writing in bytes if I have to "reconvert" the bytes to characters?
Each character (and almost anything stored in a file) is a byte or a sequence of bytes. For example:
Lowercase 'a' is written as one byte with decimal value 97.
Number '1' is written as one byte with decimal value 49.
Once the information is written to a file there is no longer any concept of data types; everything is just a stream of bytes. What matters is the encoding used to store the information in the file.
Have a look at an ASCII table, which is very useful for beginners learning about information encoding.
To illustrate this, create a file containing the text 'hello world':
$ echo 'hello world' > hello.txt
Then output the bytes written to the file using the od command:
$ od -td1 hello.txt
0000000 104 101 108 108 111 32 119 111 114 108 100 10
0000014
The above means: at offset 0000000 from the start of the file, I see one byte with decimal value 104 (the character 'h'), then one byte with decimal value 101 (the character 'e'), and so on.
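The Java equivalent of that od inspection (a sketch, assuming hello.txt exists as created above):
import java.io.FileInputStream;

public class Dump {
    public static void main(String[] args) throws Exception {
        FileInputStream in = new FileInputStream("hello.txt");
        int b;
        while ((b = in.read()) != -1)
            System.out.print(b + " ");   // prints 104 101 108 108 111 32 119 111 114 108 100 10
        in.close();
    }
}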
The article is incomplete, because OutputStream has overloaded write methods that take a byte[], a byte[] along with offset and length arguments, or a single int.
In the case of writing a String to a stream when the only interface you have is OutputStream (say you don't know what the underlying implementation is), it would be much better to use output.write(string.getBytes()). Iteratively peeling off a single int at a time and writing it to the file is going to perform horribly compared to a single call to write that passes an array of bytes.
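For example, a sketch of the single-call approach (file name as in the article; StandardCharsets is java.nio.charset.StandardCharsets):
OutputStream output = new FileOutputStream("c:\\data\\output-text.txt");
output.write("Hello World".getBytes(StandardCharsets.UTF_8));   // one call, one byte[]
output.close();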
Streams operate on bytes and simply read/write raw data.
Readers and writers interpret the underlying data as text using character sets such as UTF-8 or US-ASCII. This means they may take single-byte characters (e.g. ASCII) and convert the data into Java's UTF-16 strings.
Streams use bytes, readers/writers use strings (or other complex types).
The java.io.OutputStream class is the superclass of all classes representing an output stream of bytes. When bytes are written to an OutputStream, it may not write them immediately; instead, the write method may put the bytes into a buffer.
There are several write methods, as described below:
void write(byte[] b)
This method writes b.length bytes from the specified byte array to this output stream.
void write(byte[] b, int off, int len)
This method writes len bytes from the specified byte array, starting at offset off, to this output stream.
void write(int b)
This method writes the specified byte to this output stream.
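A quick sketch showing the three overloads side by side, assuming out is any OutputStream:
byte[] data = {72, 101, 108, 108, 111};   // "Hello" in ASCII
out.write(data);          // writes all five bytes
out.write(data, 1, 3);    // writes "ell": three bytes starting at offset 1
out.write(72);            // writes the single byte 'H'; the 24 high-order bits of the int are ignored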
I am trying to make a TCP request in Java to an existing TCP server.
The interface specification is:
Field       Length    Type
Length      2 bytes   16-bit binary
Message ID  1 byte    8-bit binary
MSGTYPE     1 byte    8-bit binary
Variable1   4 bytes   32-bit binary
Variable2   30 bytes  ASCII
Variable3   1 byte    8-bit binary
I understand how to convert a String to binary using BigInteger:
String testing = "Test Binary";
byte[] bytes = testing.getBytes();
BigInteger bi = new BigInteger(bytes);
System.out.println(bi.toString(2));
My understanding is that if I wanted to make a TCP request I would first need to convert each value to a binary string and then append the values to a StringBuffer.
Unfortunately my understanding is limited, so I wanted some advice on creating the TCP request correctly.
I wouldn't use String (as you have binary data), StringBuffer (ever), or BigInteger (as this is not what it is designed for).
Assuming you have a big-endian data stream, I would use DataOutputStream:
DataOutputStream out = new DataOutputStream(new BufferedOutputStream(socket.getOutputStream()));
out.writeShort(length);                                       // 2 bytes, big-endian
out.write(messageId);                                         // 1 byte
out.write(msgtype);                                           // 1 byte
out.writeInt(var1);                                           // 4 bytes, 32-bit big-endian binary
out.write(Arrays.copyOf(var2.getBytes("ISO-8859-1"), 30));    // pad (or truncate) to exactly 30 bytes
out.write(var3);                                              // 1 byte
out.flush(); // optional
If you have a little-endian protocol, you need to use ByteBuffer, in which case I would use a blocking NIO SocketChannel.
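A rough sketch of that little-endian variant (the field values and socketChannel are assumed to exist; var2Bytes is a 30-byte ASCII array padded by the caller):
ByteBuffer buf = ByteBuffer.allocate(39).order(ByteOrder.LITTLE_ENDIAN);   // 2+1+1+4+30+1 bytes
buf.putShort((short) length);   // 2 bytes, little-endian
buf.put((byte) messageId);      // 1 byte
buf.put((byte) msgtype);        // 1 byte
buf.putInt(var1);               // 4 bytes, little-endian
buf.put(var2Bytes);             // 30 ASCII bytes
buf.put((byte) var3);           // 1 byte
buf.flip();                     // switch from filling the buffer to draining it
socketChannel.write(buf);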
BTW, I would use ISO-8859-1 (8-bit bytes) rather than US-ASCII (7-bit bytes).
You are going in the wrong direction. The fact that the message specification states, for example, that the first field is 16-bit binary doesn't mean you will send a binary string. You will just send an (unsigned?) 16-bit number, which will, as a matter of fact, be encoded in binary, since its internal representation can only be that.
When writing to a socket through a DataOutputStream, as in
int value = 123456;
out.writeInt(value);
you are already writing it in binary.
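You can see those bytes for yourself with a small sketch:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream out = new DataOutputStream(baos);
out.writeInt(123456);
for (byte b : baos.toByteArray())
    System.out.printf("%02X ", b);   // 00 01 E2 40: the 32-bit value, big-endian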