The DataInputStream class's documentation says that it is able to read data in a machine-independent way. What exactly does that mean?
Does it mean it will receive the exact same data regardless of the programming language in which the other end of the Socket communication is written?
Is DataInputStream useful for reading primitives sent from a program written in another language (e.g. if a C application is sending and a Java application is receiving)? If not, which class would be?
The DataInputStream works on a sequence of bytes. When it reads larger values from such a sequence, it uses a fixed interpretation. For example, when reading an int, which requires 4 bytes, it reads them in big-endian format. That is, if the byte stream contains 0x01 0x23 0x45 0x67, the DataInputStream will read this as the integer 0x01234567.
In short, it uses a fixed Endianness instead of relying on the endianness of the platform.
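To make the fixed interpretation concrete, here is a minimal sketch that feeds the four bytes from the example above through a DataInputStream and also assembles them by hand in the same high-byte-first order. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class BigEndianRead {
    // Reads one int the way DataInputStream always does: high byte first.
    static int readWithDataInputStream(byte[] raw) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return in.readInt();
        }
    }

    // The same interpretation spelled out by hand with shifts and masks.
    static int assembleBigEndian(byte[] raw) {
        return ((raw[0] & 0xFF) << 24)
             | ((raw[1] & 0xFF) << 16)
             | ((raw[2] & 0xFF) << 8)
             |  (raw[3] & 0xFF);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = {0x01, 0x23, 0x45, 0x67};
        System.out.printf("0x%08X%n", readWithDataInputStream(raw)); // 0x01234567
        System.out.printf("0x%08X%n", assembleBigEndian(raw));       // 0x01234567
    }
}
```

Both calls give the same result on any platform, because neither depends on the byte order of the CPU the JVM runs on.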
Plus, it defines the exact size and representation of several data types, whose sizes depend on the execution environment in other programming languages. For example, in C the type int is at least 16 bits wide, while Java defines it as exactly 32 bits wide, and so does Java's DataInputStream.
The DataInputStream is great when you need to exchange data between Java programs. If you need to exchange data between different programming languages, you should use another library that is implemented in all involved programming languages. Maybe Google's protobuf. Or if your data is text data, use JSON or XML.
Related
I'm aware that this is probably not the best idea but I've been playing around trying to read a file in PHP that was encoded using Java's DataOutputStream.
Specifically, in Java I use:
dataOutputStream.writeInt(number);
Then in PHP I read the file using:
$data = fread($handle, 4);
$number = unpack('N', $data);
The strange thing is that the only format character in PHP that gives the correct value is 'N', which is supposed to represent "unsigned long (always 32 bit, big endian byte order)". I thought that int in Java was always signed?
Is it possible to reliably read data encoded in Java in this way or not? In this case the integer will only ever need to be positive. It may also need to be quite large so writeShort() is not possible. Otherwise of course I could use XML or JSON or something.
This is fine, as long as you don't need that extra bit. The signed format l (instead of N) uses machine byte order, so it would only give the right result on a big-endian machine.
Note, however, that the maximum number that you can store is 2,147,483,647 unless you want to do some math on the Java side to get the proper negative integer to represent the desired unsigned integer.
Note that a signed Java integer uses the two's complement method to represent a negative number, so it's not as easy as flipping a bit.
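If the full unsigned 32-bit range is ever needed, the "math on the Java side" can be as small as a narrowing cast when writing and Integer.toUnsignedLong when reading back. This is only a sketch: the class and method names are invented here, and it assumes the reader on the other end treats the four bytes as an unsigned big-endian value (as PHP's 'N' format does).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class UnsignedIntRoundTrip {
    // Writes a value in 0..4294967295 as four big-endian bytes.
    static byte[] writeUnsigned(long value) throws IOException {
        if (value < 0 || value > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt((int) value); // the cast keeps the low 32 bits unchanged
        }
        return buffer.toByteArray();
    }

    // Reads four big-endian bytes and interprets them as unsigned.
    static long readUnsigned(byte[] raw) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return Integer.toUnsignedLong(in.readInt());
        }
    }

    public static void main(String[] args) throws IOException {
        long big = 3_000_000_000L;                  // larger than Integer.MAX_VALUE
        System.out.println((int) big);              // -1294967296 as a signed int
        System.out.println(readUnsigned(writeUnsigned(big))); // 3000000000
    }
}
```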
DataOutputStream.writeInt:
"Writes an int to the underlying output stream as four bytes, high byte first."
The formats available for the unpack function for signed integers all use machine dependent byte order. My guess is that your machine uses a different byte order than Java. If that is true, the DataOutputStream + unpack combination will not work for any signed primitive.
I have a C# server that cannot be altered. In C#, a byte ranges from 0 to 255, while in Java it ranges from -128 to 127.
I have read about the problem with unsigned byte/ints/etc and the only real option as I have found out is to use "more memory" to represent the unsigned thing:
http://darksleep.com/player/JavaAndUnsignedTypes.html
Is that really true?
So when there is network communication between the Java client and the C# server, the Java client receives byte arrays from the server. The server sends them "as unsigned" but when received they will be interpreted as signed bytes, right?
Do I then have to typecast each byte into an Int and then add 127 to each of them?
I'm not sure here... but how do I interpret it back to the same values (int, strings etc) as I had on the C# server?
I find this whole situation extremely messy (not to mention the endianness problems, but that's for another post).
A byte is 0-255 in C#, not 0-254.
However, you really don't need to worry in most cases - basically in both C# and Java, a byte is 8 bits. If you send a byte array from C#, you'll receive the same bits in Java and vice versa. If you then convert parts of that byte array to 32-bit integers, strings etc, it'll all work fine. The signed-ness of bytes in Java is almost always irrelevant - it's only if you treat them numerically that it's a problem.
What's more of a problem is the potential for different endianness when (say) converting a 32-bit integer into 4 bytes in Java and then reading the data in C# or vice versa. You'd have to give more information about what you're trying to do in order for us to help you there. In particular do you already have a protocol that you need to adhere to? If so, that pretty much decides what you need to do - and how hard it will be depends on what the protocol is.
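As an illustration of the endianness point, here is a small sketch that converts an int to four bytes with an explicitly chosen byte order using java.nio.ByteBuffer. Which order is correct depends entirely on the protocol the server uses; LITTLE_ENDIAN below is just an assumption for the example (it is what .NET's BinaryWriter produces), and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class ExplicitByteOrder {
    // Encodes a 32-bit int into four bytes in the requested byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Decodes four bytes back into an int using the same byte order.
    static int decode(byte[] raw, ByteOrder order) {
        return ByteBuffer.wrap(raw).order(order).getInt();
    }

    public static void main(String[] args) {
        int value = 0x01234567;
        System.out.println(Arrays.toString(encode(value, ByteOrder.BIG_ENDIAN)));    // [1, 35, 69, 103]
        System.out.println(Arrays.toString(encode(value, ByteOrder.LITTLE_ENDIAN))); // [103, 69, 35, 1]
    }
}
```

As long as both sides agree on the order, the value survives the round trip; decoding with the wrong order silently gives a different number.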
If you get to choose the protocol, you may wish to use a platform-independent serialization format such as Protocol Buffers which is available for both .NET and Java.
Unfortunately, yes ... the answer is to "use more memory", at least on some level.
You can store the data as a byte array in Java, but when you need to use that data numerically you'll need to move up to an int and add 256 to negative values. A bitwise & with 0xFF does this quickly, and it leaves non-negative values unchanged, so no branch is needed:
// bytes is the byte[] received from the server
int foo = bytes[3] & 0xFF; // always in the range 0..255
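Applied to a whole array, the masking idea might look like the sketch below. The class name and sample values are made up; Byte.toUnsignedInt (Java 8+) does the same thing as the mask.

```java
import java.util.Arrays;

class UnsignedBytes {
    // Widens each signed Java byte to an int in the range 0..255.
    static int[] toUnsigned(byte[] signed) {
        int[] result = new int[signed.length];
        for (int i = 0; i < signed.length; i++) {
            result[i] = signed[i] & 0xFF; // equivalent to Byte.toUnsignedInt(signed[i])
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] received = {0, 127, -128, -1}; // how Java shows the wire bytes 0x00 0x7F 0x80 0xFF
        System.out.println(Arrays.toString(toUnsigned(received))); // [0, 127, 128, 255]
    }
}
```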
Java bytes are signed values, and I'm trying to establish a TCP socket connection with a C# program that expects the bytes to be unsigned.
I am not able to change the code on the C# portion.
How can I go about sending these bytes in the correct format using Java?
Thanks
No, Java bytes are signed values. In general C# bytes are unsigned. (You'd need the sbyte type to refer to signed bytes; I can't remember the last time I used sbyte.)
However, it shouldn't matter at all in terms of transferring data across the wire - normally you just send across whatever binary data you've got (e.g. what you've read from a file) and both sides will do the right thing. A byte with value -1 on the Java side will come through as a byte with value 255 on the C# side.
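A quick way to convince yourself: the sketch below takes values in the range 0..255, writes them to an in-memory stream standing in for the socket's OutputStream, and looks at what ends up "on the wire". The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

class SendUnsigned {
    // Turns values 0..255 into the bytes that would go out over the socket.
    static byte[] encode(int[] values) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        for (int v : values) {
            if (v < 0 || v > 255) {
                throw new IllegalArgumentException("not an unsigned byte: " + v);
            }
            wire.write(v); // OutputStream.write(int) emits only the low 8 bits
        }
        return wire.toByteArray();
    }

    public static void main(String[] args) {
        byte[] wire = encode(new int[]{255, 200});
        // Java prints the bytes as signed, but the bit patterns are 0xFF and 0xC8.
        System.out.println(wire[0] + " " + wire[1]);                   // -1 -56
        System.out.println((wire[0] & 0xFF) + " " + (wire[1] & 0xFF)); // 255 200
    }
}
```

The receiving side sees the bit patterns 0xFF and 0xC8, which a C# byte reads as 255 and 200.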
If you can tell us more about exactly what you're trying to do (what the data is) we may be able to help more, but I strongly suspect you can just ignore the difference in this case.
It doesn't matter. The numbers are just strings of bits and the fact that they're signed or unsigned doesn't matter as far as the bits are concerned. And it's the bits that are transferred so all that signed/unsigned information is irrelevant.
1100101010101111 is still 1100101010101111 regardless of signed/unsigned.
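The same 16-bit pattern can be checked in Java, which has a signed 16-bit type (short) and a helper to read it as unsigned. This is just a small demonstration with invented names.

```java
class SameBits {
    // The bit pattern from the text, stored in Java's signed 16-bit type.
    static final short PATTERN = (short) 0b1100101010101111;

    static int asSigned() {
        return PATTERN; // sign-extended when widened to int
    }

    static int asUnsigned() {
        return Short.toUnsignedInt(PATTERN); // same bits, read as 0..65535
    }

    public static void main(String[] args) {
        System.out.println(asSigned());   // -13649
        System.out.println(asUnsigned()); // 51887
        System.out.println(Integer.toBinaryString(asUnsigned())); // 1100101010101111
    }
}
```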
I'd like to know if there is a simple way to "cast" a byte array containing a data-structure of a known layout to an Object. The byte[] consists of BCD packed values, 1 or 2-byte integer values and character values. I'm obtaining the byte[] via reading a file with a FileInputStream.
People who've worked on IBM-Mainframe systems will know what I mean right away - the problem is I have to do the same in Java.
Any suggestions welcome.
No, because the object layout can vary depending on what VM you're using, what architecture the code is running on etc.
Relying on an in-memory representation has always felt brittle to me...
I suggest you look at DataInputStream - that will be the simplest way to parse your data, I suspect.
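As a rough sketch of what that could look like: the record layout below (a 1-byte integer, a 2-byte integer, then 3 bytes of packed data) is invented purely for illustration, as are the class and field names. DataInputStream reads multi-byte integers high byte first, which matches big-endian mainframe data.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class RecordParser {
    // Hypothetical record: 1-byte type, 2-byte count, 3 raw bytes of packed data.
    static final class Rec {
        final int type;
        final int count;
        final byte[] packed;

        Rec(int type, int count, byte[] packed) {
            this.type = type;
            this.count = count;
            this.packed = packed;
        }
    }

    static Rec parse(byte[] raw) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            int type = in.readUnsignedByte();   // 1 byte, 0..255
            int count = in.readUnsignedShort(); // 2 bytes, big-endian, 0..65535
            byte[] packed = new byte[3];
            in.readFully(packed);               // throws EOFException if the data is too short
            return new Rec(type, count, packed);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = {(byte) 0xC8, 0x01, 0x02, 0x12, 0x34, 0x56};
        Rec rec = parse(raw);
        System.out.println(rec.type + " " + rec.count); // 200 258
    }
}
```

With a real file, the ByteArrayInputStream would be replaced by the FileInputStream (ideally wrapped in a BufferedInputStream).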
Not immediately, but you can write one pretty easily if you know exactly what the bytes represent.
To convert a BCD packed number you need to extract the two digits encoded. The four lower bits encode the lowest digit and you get that by &'ing with 15 (1111 binary). The four upper bits encode the highest digit which you get by shifting right 4 bits and &'ing with 15.
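The nibble extraction described above can be sketched as follows. This assumes plain packed BCD with two decimal digits in every byte; IBM packed-decimal fields usually carry a sign in the last nibble, which this sketch does not handle. The names are hypothetical.

```java
class PackedBcd {
    // Decodes unsigned packed BCD (two digits per byte, most significant first).
    static long decode(byte[] packed) {
        long value = 0;
        for (byte b : packed) {
            int high = (b >> 4) & 15; // upper four bits: the higher digit
            int low = b & 15;         // lower four bits: the lower digit
            if (high > 9 || low > 9) {
                throw new IllegalArgumentException(
                        String.format("not a BCD digit pair: 0x%02X", b & 0xFF));
            }
            value = value * 100 + high * 10 + low;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[]{0x12, 0x34, 0x56})); // 123456
    }
}
```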
Also note that IBM most likely has tooling available if this is what you are actually doing. For the IBM i, look for jt400, the IBM Toolbox for Java.
What else is represented as a stream of bytes?
At a certain level of abstraction, just about everything is stored, represented or transferred as a sequence or stream of bytes.
Ok, what can be stored/transferred as a System.IO.Stream object in .NET or counterpart in Java?
Any information that can be represented by a computer can (in theory) be turned into a sequence of bytes and stored / transferred via a byte-oriented I/O stream. You may need to write some software to transform the computer representation of the information into a sequence of bytes that is suitable for transfer via a byte stream. However, any finite representation can be transformed into bytes.
The only things that you cannot represent and transmit as a byte stream are those that only have an infinite representation (e.g. the complete value of Pi, or the set of all prime numbers), and those that have no digital representation (e.g. beauty or Barack Obama).
Ok, what can be stored/transferred as a System.IO.Stream object in .NET or counterpart in Java?
I don't know about the .NET case, but Java's ObjectOutputStream only works for classes that implement the Serializable or Externalizable interfaces. (And in the former case, all other classes in the non-transient closure of the original object must also implement Serializable.)
Some system classes are not Serializable; for example, Thread, Process, various IO classes and most AWT / Swing related classes. The common theme is that these classes all involve some kind of resource that is managed by the operating system.
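Here is a small sketch of both cases: a Serializable class that round-trips through a byte array, and a class whose non-transient Thread field makes serialization fail. The Point and Holder classes are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationDemo {
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Serializable itself, but it holds a non-transient field that is not.
    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        final Thread worker = new Thread(() -> { });
    }

    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] raw) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point copy = (Point) fromBytes(toBytes(new Point(3, 4)));
        System.out.println(copy.x + "," + copy.y); // 3,4

        try {
            toBytes(new Holder());
        } catch (NotSerializableException e) {
            System.out.println("cannot serialize: " + e.getMessage());
        }
    }
}
```

Marking the worker field transient would let Holder serialize, at the cost of the field being null after deserialization.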
This may be more of a philosophical matter, but anything that you can think of objectively can be stored as a sequence of numbers. Bytes are just one example, but you can store them as a sequence of numbers, text characters (because they are also translatable to numbers), peanuts on a table, anything.
For example, you can represent the same data either as bytes or as hex digits, which are themselves written with the decimal digits and the characters A, B, C, D, E and F, right? Such as
#nav{color:#123ABC;}
You can also Base64-encode anything; Base64 means each symbol has 64 possible values. You could make up Base65 if you wanted to, and it would work too.
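To illustrate, the sketch below shows the same five bytes written as hex digits and as Base64 text, using java.util.Base64 from the standard library. The class name and the sample string are arbitrary.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class SameBytesDifferentText {
    // Two hex digits per byte, lowest-level textual view of the data.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Four Base64 symbols for every three bytes, padded with '='.
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] data = "Hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(data));    // 48656c6c6f
        System.out.println(toBase64(data)); // SGVsbG8=

        // Decoding gives back exactly the original bytes.
        byte[] back = Base64.getDecoder().decode(toBase64(data));
        System.out.println(Arrays.equals(data, back)); // true
    }
}
```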
Then what can be represented? What can you think of? What can you define rationally? All that can be thought can be represented as a stream of numbers - every file in our hard drives is one after the other in a huge stream, the concept of "folders", "files", etc. is just an abstraction of offsets in that huge chain of ones and zeroes that we interpret as bytes, ints, chars, etc.