While reading a Java book on byte streams, I came across an example that the book uses to show the difference between byte streams and character streams. The example is the number 199. According to the book, if this number is written to a character stream, it is written as three characters: 0x31 0x39 0x39 (the digits '1', '9', '9'). But if it is written to a byte stream, it is written as the single value 0xC7. My doubt is that 199 does not fit into a byte in Java, whose range is -128 to 127. So shouldn't it be written as two bytes instead of one? Is 199 written as one byte or two bytes in binary streams?
If you call OutputStream.write(int), which is a method for writing a single byte, it will ignore all the bits except the bottom eight. That means that 199 and -57 would be written exactly the same way. For that particular method, that's the way it works because it is only supposed to write a byte.
If you call some other method, it will work differently. For instance, DataOutputStream.writeInt writes an integer as four bytes, because that's what that method is for.
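To see this concretely, here is a minimal sketch that captures the output in a ByteArrayOutputStream instead of a file. The class and helper names are made up for illustration; only OutputStream.write(int) and DataOutputStream.writeInt come from the answer above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class SingleByteWriteDemo {

    // Returns what one OutputStream.write(int) call emits for the given value.
    static byte[] viaWrite(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(value); // keeps only the low-order eight bits
        return out.toByteArray();
    }

    // Returns what DataOutputStream.writeInt(int) emits for the given value.
    static byte[] viaWriteInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(out)) {
            data.writeInt(value); // four bytes, most significant byte first
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(viaWrite(199)));    // [-57]
        System.out.println(Arrays.toString(viaWrite(-57)));    // [-57]
        System.out.println(Arrays.toString(viaWriteInt(199))); // [0, 0, 0, -57]
    }
}
```

The byte prints as -57 only because Java's byte type is signed; the bit pattern that ends up in the stream is still 0xC7.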
So I'm working on a BitTorrent project. I need to create a bitfield according to the number of pieces, so I use BitSet. The problem is that the method toByteArray doesn't return the bytes in the order I want.
Eg:
//number of piece=11
bitSet.set(5,16); //following bittorrent specification
bitSet.toByteArray() -> 0xe0ff (this is the byte that i get)
But what i want is 0xffe0
Thanks in advance.
This is due to big-endian vs little-endian mismatch. BitSet is strictly bit-wise little-endian. BitSet#toByteArray() handles reversing the bit-order, but outputs little-endian bytes. So you'll have to rearrange the bytes yourself to match the desired order. The desired order will depend on the size of the "word" in your output data structure.
If the output is a short, you swap the bytes two at a time; if it's an int you need to reverse each group of 4 output bytes, and for a long each group of 8. It's possible to do this with a ByteBuffer, but it's probably more work unless you're already using ByteBuffers.
Unfortunately, there isn't a BitSet.toShortArray(), which would do what you want.
You might find the Wikipedia article Endianness useful.
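For the two-byte case in the question, the rearrangement could look roughly like this. It is only a sketch: the class and method names are invented, and it treats the whole array as a single word, so a longer bitfield would need the bytes regrouped according to the word size as described above.

```java
import java.util.BitSet;

// Hypothetical demo class; the names are illustrative only.
class BitfieldOrderDemo {

    // Returns a copy of the input with the byte order reversed.
    static byte[] reversed(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[in.length - 1 - i];
        }
        return out;
    }

    // Formats bytes as lowercase hex without separators.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(5, 16);                        // bits 5..15, as in the question
        byte[] littleEndian = bits.toByteArray();
        System.out.println(hex(littleEndian));           // e0ff
        System.out.println(hex(reversed(littleEndian))); // ffe0
    }
}
```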
I'm learning about text I/O and binary I/O in Java right now. I read that each value that you write to a file is initially stored in binary. For text I/O, the individual digits are converted to their corresponding Unicode values and then encoded to the file-specific encoding such as ASCII. For binary I/O, the binary value is directly represented in the file. For example, 199 would be represented as 0xC7, which in binary is 11000111. Now I'm confused about one part. If a variable is initially stored in a binary format, does each digit represent a separate byte that is stored, or is the entirety of the number stored as a single byte? For example, is 199 originally stored as 0xC7, which would be 11000111 in binary? Or would it be stored in 3 bytes, with each byte representing the binary value of one digit? If it was stored in 3 separate bytes, does binary I/O convert that 3-byte number to a single byte? If it's stored in a single byte, how does text I/O translate that single byte into 3 separate byte values? I'm just confused about how to word this. Hope you can understand what I'm getting at. Thanks
The only thing a computer is capable of dealing with is sets of 0/1 bits, which are stored in memory or, if you wish, on a storage device. Those bits can be streamed to monitors and converted to characters by graphics hardware. Same story with keyboards: you press a key and a few bits of data are sent to the computer.
Bits are stored in memory and are accessible by memory addresses. The addresses are also sets of bits.
For practical reasons the bits are grouped into bytes, words, long words, ... A byte used to be the smallest addressable unit of bits and historically ended up as a group of 8 bits, which is currently used in most of the hardware. Modern memory can store data in multiple byte addressable chunks. Same for the disk, you store data there, using specific addressing mechanisms. But in any case those are just sets of bits.
What you are confused about is the interpretation of those bits. They can represent integer numbers, floating point numbers, characters, addresses, ... The way they are interpreted only depends on the program which uses them.
Characters do not exist in the computer. They are just an abstraction provided by programming languages. Programs interpret the bits stored on the computer. There are standards. For example, the ASCII encoding maps English characters plus a few special characters to the numbers 0 to 127. Those fit into a single byte (leaving the values 128 to 255 for special use). A print command reads those bytes one by one and sends them to the graphics hardware to form letters on the screen as specified by the encoding standard. A different encoding scheme will display the same bytes differently.
If you write a program with the "hello world" string in it, the program will convert the symbols between the quotes into a sequence of 11 ASCII bytes. (In C it will add one more byte with the value 0, which terminates the string.) Unicode is yet another way to represent characters. Depending on the encoding used (UTF-8, UTF-16, ...), a Unicode character takes one or several bytes of data. There are other schemes as well. One thing to pay attention to: if you write strings to disk using a certain encoding, you should read them back with the same encoding, or your prints will give you garbage. But you can always read and copy them as binary data without interpretation.
So, any variable of any type is just an abstraction and always consists of bytes of data which your program knows how to interpret based on the data type and/or operations it wants to perform. Variables of type int, double, any java object, including String, are just sets of bytes of different sizes. Only the program (and java interpreter is a program) knows what to do with them, use them in calculations or display as characters.
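Applied to the 199 from the question, a small sketch (class and method names are mine, chosen for illustration) shows the two interpretations side by side: the decimal digits encoded as ASCII text, and the value stored as one raw byte.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class TextVsBinaryDemo {

    // Text form: the decimal digits of the value, one ASCII byte per digit.
    static byte[] asText(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.US_ASCII);
    }

    // Binary form: the low-order eight bits of the value as a single byte.
    static byte[] asSingleByte(int value) {
        return new byte[] { (byte) value };
    }

    // Formats bytes as space-separated hex, e.g. "0x31 0x39".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(asText(199)));       // 0x31 0x39 0x39
        System.out.println(hex(asSingleByte(199))); // 0xC7
    }
}
```

In memory the int 199 is neither of these: it is a 4-byte value, and the I/O method you call decides which of the forms above reaches the file.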
In ojdbc6, an accessor can call the oracle.jdbc.driver.T4CMAREngine's unmarshalCLR method during unmarshaling of results from a database. Inside unmarshalCLR, there is also this unmarshalUB1 method.
What do these two methods do?
It's an Oracle database specific thing relating to their TNS protocol.
A google search turns up a spec, though I have no idea how accurate or up-to-date it is.
Mentioning CLRs:
A CLR is a byte array in 64-byte blocks. If its length <=64, it is just
length-byte-preceeded and written as native. Null arrays can be written as the
single bytes 0x0 or 0xff. If length >64, first a LNG byte (0xfe) is written,
then the array is written in length-byte-preceeded chunks of 64 bytes (although
the final chunk can be shorter), followed by a 0 byte. A chunk preceeded by a
length of 0xfe is ignored.
Looks like a CLR is an encoded byte array.
A UB1 is simply an unsigned byte (data type length of 1 byte).
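Purely as an illustration of the quoted description, an encoder for non-null arrays might be sketched as follows. This is based only on the text quoted above, not on Oracle's driver code; the class name, method name and constants are hypothetical, and the null markers (0x0 / 0xff) and the "ignored chunk" rule are left out.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch derived from the quoted spec text; not Oracle's code.
class ClrSketch {
    static final int CHUNK = 64;  // block size named in the quoted text
    static final int LNG = 0xFE;  // marker byte for arrays longer than 64

    // Encodes a non-null byte array as described in the quoted text.
    static byte[] encode(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (data.length <= CHUNK) {
            out.write(data.length);            // single length byte
            out.write(data, 0, data.length);   // followed by the raw bytes
        } else {
            out.write(LNG);                    // long-form marker
            for (int off = 0; off < data.length; off += CHUNK) {
                int n = Math.min(CHUNK, data.length - off);
                out.write(n);                  // length byte of this chunk
                out.write(data, off, n);       // chunk payload
            }
            out.write(0);                      // terminating zero byte
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode(new byte[3]).length);   // 4
        System.out.println(encode(new byte[100]).length); // 104
    }
}
```

A 100-byte array becomes the marker, a 64-byte chunk, a 36-byte chunk and the terminator: 1 + 65 + 37 + 1 = 104 bytes.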
I was reading through this article. It has this following snippet
OutputStream output = new FileOutputStream("c:\\data\\output-text.txt");
while(moreData) {
    int data = getMoreData();
    output.write(data);
}
output.close();
It is mentioned:
OutputStreams are used for writing byte based data, one byte at a time. The write() method of an OutputStream takes an int which contains the byte value of the byte to write.
Let's say I am writing the string Hello World to the file, so each character in the string gets converted to an int by the getMoreData() method. How does it get written: as a character or as a byte in output-text.txt? If it gets written as bytes, what is the advantage of writing bytes if I then have to convert the bytes back to characters?
Each character (and almost anything stored on a file) is a byte / bytes. For example:
Lowercase 'a' is written as one byte with decimal value 97.
Number '1' is written as one byte with decimal value 49
There's no more concept of data types once the information is written into a file, everything is just a stream of bytes. What's important is the encoding used to store the information into the file
Have a look at ascii table, which is very useful for beginners learning information encoding.
To illustrate this, create a file containing the text 'hello world'
$ echo 'hello world' > hello.txt
Then output the bytes written to the file using od command:
$ od -td1 hello.txt
0000000 104 101 108 108 111 32 119 111 114 108 100 10
0000014
The above means that at offset 0000000 from the start of the file there is one byte with decimal value 104 (the character 'h'), then one byte with decimal value 101 (the character 'e'), and so on. The final byte, 10, is the newline that echo appends.
The article is incomplete, because an OutputStream has overloaded methods for write that take a byte[], a byte[] along with offset and length arguments, or a single int.
In the case of writing a String to a stream when the only interface you have is OutputStream (say you don't know what the underlying implementation is), it would be much better to use output.write(string.getBytes(charset)) with an explicitly chosen charset, since the no-argument getBytes() uses the platform default encoding. Iteratively peeling off a single int at a time and writing it to the file is going to perform horribly compared to a single call to write that passes an array of bytes.
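As a rough sketch of both approaches (the class and method names are invented, and a ByteArrayOutputStream stands in for the file), the two produce identical bytes; the difference is only in how many write calls reach the stream:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class StringToStreamDemo {

    // One write call for the whole encoded string.
    static byte[] writeAtOnce(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
        return out.toByteArray();
    }

    // One write(int) call per byte, as in the article's loop.
    static byte[] writeOneByOne(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : text.getBytes(StandardCharsets.US_ASCII)) {
            out.write(b);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Same decimal values that od -td1 printed for hello.txt.
        System.out.println(Arrays.toString(writeAtOnce("hello world\n")));
        // [104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 10]
    }
}
```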
Streams operate on bytes and simply read/write raw data.
Readers and writers interpret the underlying data as characters using a character set such as UTF-8 or US-ASCII. This means that a reader may take single-byte characters (for example ASCII) and convert them into Java's UTF-16 strings, and a writer does the reverse.
Streams use bytes; readers/writers use characters and strings.
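A short sketch of the writer side, assuming an OutputStreamWriter wrapped around an in-memory stream (the class and method names are illustrative): the same two-character string turns into a different number of bytes depending on the charset the writer is given.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class WriterEncodingDemo {

    // Writes the text through a Writer and returns the bytes that came out.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text); // characters in, encoded bytes out
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "h\u00e9"; // 'h' followed by e with an acute accent
        System.out.println(encode(text, StandardCharsets.ISO_8859_1).length); // 2
        System.out.println(encode(text, StandardCharsets.UTF_8).length);      // 3
        System.out.println(encode(text, StandardCharsets.UTF_16BE).length);   // 4
    }
}
```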
The java.io.OutputStream class is the superclass of all classes representing an output stream of bytes. When bytes are written to an OutputStream, they may not be written out immediately; a buffering implementation may hold them in a buffer until the buffer fills or flush() is called.
There are methods to write as mentioned below:
void write(byte[] b)
This method writes b.length bytes from the specified byte array to this output stream.
void write(byte[] b, int position, int length)
This method writes length bytes from the specified byte array starting at offset position to this output stream.
void write(int b)
This method writes the specified byte to this output stream; only the low-order eight bits of b are used.
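The three overloads and the buffering behaviour can be tried out with a sketch like the one below. It assumes a BufferedOutputStream with its default buffer size wrapped around an in-memory sink; the class and method names are made up for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; the names are illustrative only.
class WriteOverloadsDemo {

    // Returns {bytes visible in the sink before flush, bytes visible after}.
    static int[] sinkSizesAroundFlush() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferedOutputStream out = new BufferedOutputStream(sink)) {
            byte[] data = {10, 20, 30, 40, 50};
            out.write(data);        // write(byte[]): all five bytes
            out.write(data, 1, 3);  // write(byte[], int, int): 20, 30, 40
            out.write(0x1C7);       // write(int): only 0xC7 is kept
            int before = sink.size(); // still held in the buffer
            out.flush();
            int after = sink.size();  // 5 + 3 + 1 bytes delivered
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sinkSizesAroundFlush();
        System.out.println(sizes[0] + " -> " + sizes[1]); // 0 -> 9
    }
}
```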
What else are represented as stream of bytes?
At a certain level of abstraction, just about everything is stored, represented or transferred as a sequence or stream of bytes.
Ok, what can be stored/transferred as a System.IO.Stream object in .NET or counterpart in Java?
Any information that can be represented by a computer can (in theory) be turned into a sequence of bytes and stored / transferred via a byte-oriented I/O stream. You may need to write some software to transform the computer representation of the information into a sequence of bytes that is suitable for transfer via a byte stream. However, any finite representation can be transformed into bytes.
The only things that you cannot represent and transmit as a byte stream are those that only have an infinite representation (e.g. the complete value of Pi, or the set of all prime numbers), and those that have no digital representation (e.g. beauty or Barack Obama).
Ok, what can be stored/transferred as a System.IO.Stream object in .NET or counterpart in Java?
I don't know about the .NET case, but Java's ObjectOutputStream only works for classes that implement the Serializable or Externalizable interfaces. (And in the former case, all other classes in the non-transient closure of the original object must also implement Serializable.)
Some system classes are not Serializable; for example, Thread, Process, various IO classes and most AWT / Swing related classes. The common theme is that these classes all involve some kind of resource that is managed by the operating system.
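A quick way to check this for a given object is to attempt the serialization and see whether NotSerializableException is thrown. The helper below is a sketch with invented names; it writes to an in-memory stream and discards the result.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class SerializationDemo {

    // True if ObjectOutputStream can write the object and everything it references.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream oos =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false; // some class in the object graph is not Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Object> holder = new ArrayList<>();
        holder.add(new Thread());
        System.out.println(canSerialize("hello"));      // true
        System.out.println(canSerialize(new Thread())); // false
        System.out.println(canSerialize(holder));       // false
    }
}
```

The last case shows the closure rule: ArrayList itself is Serializable, but the Thread it contains is not.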
This may be more of a philosophical matter, but anything that you can think of objectively can be stored as a sequence of numbers. Bytes are just one example, but you can store them as a sequence of numbers, text characters (because they are also translatable to numbers), peanuts on a table, anything.
For example, you can represent the same thing either as bytes or as hex digits, which are themselves written with the decimal digits and the characters A, B, C, D, E and F, such as
#nav{color:#123ABC;}
You can also Base64-encode anything; Base64 means that each output character takes one of 64 possible values. You could make up Base65 if you wanted to, and it would work too.
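In Java terms, the three bytes behind the colour value above can be shown in both notations with a small sketch (the class and method names are mine; java.util.Base64 is the standard library encoder):

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; the names are illustrative only.
class SameBytesDemo {

    // Hex notation: two characters per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Base64 notation: four characters per three bytes.
    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        byte[] colour = {0x12, 0x3A, (byte) 0xBC};
        System.out.println(toHex(colour));    // 123ABC
        System.out.println(toBase64(colour)); // Ejq8
        // Decoding gives back exactly the same bytes.
        byte[] back = Base64.getDecoder().decode(toBase64(colour));
        System.out.println(Arrays.equals(colour, back)); // true
    }
}
```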
Then what can be represented? What can you think of? What can you define rationally? All that can be thought can be represented as a stream of numbers. Every file on our hard drives sits one after the other in a huge stream; the concepts of "folders", "files", etc. are just an abstraction over offsets into that huge chain of ones and zeroes that we interpret as bytes, ints, chars, etc.