I'm working on a simple VM/interpreter kind of program for a simple toy language I implemented. Currently the compiler emits textual assembly instructions such as push and add to be executed by the VM.
Recently I figured it would be a better idea to actually compile to binary opcodes instead of textual ones, for performance and space. (It doesn't actually matter in this toy project, but this is for learning).
Soon I realized, even though generally I consider myself a decent Java programmer, that I have no idea how to work with binary data in Java. No clue.
Regarding this, I have two questions:
How can I save binary data to a file? For example say I want to save the byte 00000001 and then the byte 00100000 to a file (each byte will be an opcode for the VM) - how can I do that?
How can I read binary data from a file on disk, save it in variables, parse it and manipulate it? Do I use the usual I/O and parsing techniques I use with regular Strings, or is this something different?
Thanks for your help
You should be able to use any OutputStream to do this, because they all have a write method that takes a byte. For example, you could use a FileOutputStream.
Similarly, you can use an InputStream's read method for reading bytes.
OutputStream and its subtypes have methods to write bytes to files. DataOutputStream has additional methods to write int, long, float, double, etc. if you need those, although it writes them in big endian byte order. The corresponding input streams have equivalent methods for reading.
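As a rough sketch of the DataOutputStream / DataInputStream pairing described above: the opcode values and the "push operand, then add" layout are made up for illustration, and an in-memory buffer stands in for the file (wrapping a FileOutputStream or FileInputStream instead should work the same way).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class OpcodeRoundTrip {
    // Hypothetical opcodes for illustration; a real VM would define its own.
    static final int OP_PUSH = 0x01;
    static final int OP_ADD = 0x20;

    // Encodes "push <operand>; add" as 1 + 4 + 1 = 6 bytes.
    static byte[] encode(int operand) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(OP_PUSH);
            out.writeInt(operand);   // 4 bytes, big-endian
            out.writeByte(OP_ADD);
        }
        return buffer.toByteArray();
    }

    // Reads back the operand that follows the first opcode.
    static int decodeOperand(byte[] program) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(program))) {
            int opcode = in.readUnsignedByte();
            if (opcode != OP_PUSH) {
                throw new IOException("unexpected opcode " + opcode);
            }
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] program = encode(42);
        System.out.println(program.length + " bytes, operand = " + decodeOperand(program));
    }
}
```

Using readUnsignedByte for opcodes avoids surprises with values above 127, since Java's byte type is signed.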
You may want to try to use ByteArrayInputStream and DataOutputStream
Refer to the Oracle documentation for details.
Also, check out this similar question: How to output binary data to a file in Java?
Use a FileOutputStream. It has a write(byte[] b) method to write an array of bytes to a file and a write(int b) method to write a single byte to the file.
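So, to save 00000001 followed by 00100000 (the two bytes from the question), you can do:
// try-with-resources closes the stream, so the bytes actually reach the file
try (FileOutputStream file = new FileOutputStream(new File(path))) {
    file.write(1);   // 00000001
    file.write(32);  // 00100000
}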
Similarly, you can use a FileInputStream to read from a binary file.
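For the reading side, a minimal sketch might look like this. It assumes the file is small enough to hold in memory; the temporary file and the helper names readAll and demo are only there for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

class ReadOpcodes {
    // Reads the whole file one byte at a time.
    // read() returns the byte as an int in 0..255, or -1 at end of file.
    static int[] readAll(File file) throws IOException {
        int[] opcodes = new int[(int) file.length()];
        try (FileInputStream in = new FileInputStream(file)) {
            for (int i = 0; i < opcodes.length; i++) {
                int b = in.read();
                if (b == -1) {
                    throw new IOException("file ended early");
                }
                opcodes[i] = b;
            }
        }
        return opcodes;
    }

    // Writes the two bytes from the question to a temporary file and reads them back.
    static int[] demo() throws IOException {
        File file = File.createTempFile("opcodes", ".bin");
        file.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(0b00000001);
            out.write(0b00100000);
        }
        return readAll(file);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(demo())); // prints [1, 32]
    }
}
```

So no String parsing is involved: each value comes back as a number that you can compare against your opcode constants or take apart with bitwise operators.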
Does anyone know where I can find that algorithm? It takes a double and StringBuilder and appends the double to the StringBuilder without creating any objects or garbage. Of course I am not looking for:
sb.append(Double.toString(myDouble));
// or
sb.append(myDouble);
I tried poking around the Java source code (I am sure it does it somehow) but I could not see any block of code/logic clear enough to be re-used.
I have written this for ByteBuffer. You should be able to adapt it. Writing it to a direct ByteBuffer saves you having to convert it to bytes or copy it into "native" space.
See public ByteStringAppender append(double d)
If you are logging this to a file, you might use the whole library as it can write around 20 million doubles per second sustained. It can do this without system calls as it writes to a memory mapped file.
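For readers who just want the general idea, here is a deliberately naive sketch. It is not the algorithm from the library mentioned above. It assumes a fixed number of decimal places, a finite value, and a scaled value that fits in a long; appendFixed is a made-up name. The sketch itself creates no intermediate String, but whether StringBuilder.append(long) allocates internally depends on the JDK.

```java
class FixedDoubleAppender {
    // Appends d with exactly `decimals` digits after the decimal point.
    // Assumptions: d is finite and d * 10^decimals fits in a long.
    static StringBuilder appendFixed(StringBuilder sb, double d, int decimals) {
        if (d < 0) {
            sb.append('-');
            d = -d;
        }
        long scale = 1;
        for (int i = 0; i < decimals; i++) {
            scale *= 10;
        }
        long scaled = Math.round(d * scale);
        sb.append(scaled / scale);          // integer part
        if (decimals > 0) {
            sb.append('.');
            long frac = scaled % scale;     // fractional digits as an integer
            // Emit leading zeros so that e.g. 5 with three decimals becomes "005".
            for (long p = scale / 10; p > 1 && frac < p; p /= 10) {
                sb.append('0');
            }
            sb.append(frac);
        }
        return sb;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        appendFixed(sb, 3.14159, 2);
        System.out.println(sb); // prints 3.14
    }
}
```

This trades accuracy for simplicity: it does not produce the shortest round-trippable representation the way Double.toString does, and it ignores NaN and infinities.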
I am trying to write a Long to a file using Java file I/O, as follows:
fstreamOut = new FileOutputStream(new File("C:\\Basmah","dataOutput.7"),true);
DataOutputStream out=new DataOutputStream(fstreamOut);
Long p= Long.parseLong(longNumberInString ); // Number of digits for this long key are 7-15
out.writeLong(p);
The problem is that when I write a 7-15 digit number using writeLong, it writes 8 bytes to the file.
Then I am trying to read the same record into my program and decode it
Long l=in.readLong();
but I don't get the same number as I wrote; instead I get an EOFException.
A long is 64 bits long. That makes 8 bytes. The DataOutputStream's writeLong method writes the binary representation of the long, not the textual one.
Without knowing the code used to read the long value, it's impossible to tell why it doesn't work.
The code given in your example and comment should work. The fact that it doesn't suggests that something else is going on here:
Maybe the writing and reading is happening on different files.
Maybe the file being written is not flushed / closed before you attempt to read it.
Maybe something else is overwriting the file.
Maybe the snippets of code you have provided are different enough to the real code to make a difference.
In the code that attempts to read the file, print what you get when you call f.length().
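To rule out the flush/close possibility, something along these lines can serve as a known-good baseline. It is only a sketch: it uses a temporary file rather than the path from the question, does not open the file in append mode, and the method names are invented.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class LongRoundTrip {
    // Writes one long, closes the stream, then reads the value back.
    static long writeThenRead(File file, long value) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
            out.writeLong(value);
        } // the stream is flushed and closed here, before any reading starts

        // A length smaller than 8 would explain an EOFException from readLong.
        System.out.println("file length = " + file.length());

        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            return in.readLong();
        }
    }

    // Convenience wrapper that uses a temporary file.
    static long demo(long value) throws IOException {
        File file = File.createTempFile("long-demo", ".bin");
        file.deleteOnExit();
        return writeThenRead(file, value);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo(123456789012345L));
    }
}
```

Note that with append mode (the `true` argument in the question's FileOutputStream), each run adds another 8 bytes, and a fresh DataInputStream will start reading from the first record, not the most recent one.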
I'm attempting to read magic numbers/bytes to check the format of a file. Will reading a file byte by byte work in the same way on a Linux machine?
Edit: The following shows how to get the magic bytes from a class file using an int. I'm trying to do the same for a variable number of bytes.
http://www.rgagnon.com/javadetails/java-0544.html
I'm not sure that I understand what you are trying to do, but it sounds like what you are trying to do isn't the same thing as what the code that you are linking to is doing.
The Java class format is specified to start with a magic number, so that code can only be used to verify whether a file might be a Java class or not. You can't use the same logic and apply it to arbitrary file formats.
Edit: .. or do you only want to check for wav files?
Edit2: Java's DataInputStream reads multi-byte values in big-endian order, which means that you can use DataInputStream.readInt to read the first four bytes from the file, and then compare the returned int with 0x52494646 ("RIFF" as a big-endian integer).
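For a variable number of magic bytes, one possible approach is to read exactly that many bytes and compare arrays. The sketch below is an assumption about what is wanted, and hasMagic is a made-up helper name. Reading raw bytes this way behaves the same on Linux as on any other platform, since no character encoding or native byte order is involved.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class MagicBytes {
    // Returns true if the stream starts with exactly the given magic bytes.
    static boolean hasMagic(InputStream in, byte[] magic) throws IOException {
        byte[] header = new byte[magic.length];
        try {
            // readFully keeps reading until the array is full or the stream ends.
            new DataInputStream(in).readFully(header);
        } catch (EOFException e) {
            return false; // stream is shorter than the magic number
        }
        return Arrays.equals(header, magic);
    }

    // Convenience overload for data that is already in memory.
    static boolean hasMagic(byte[] data, byte[] magic) throws IOException {
        return hasMagic(new ByteArrayInputStream(data), magic);
    }

    public static void main(String[] args) throws IOException {
        byte[] riff = {'R', 'I', 'F', 'F'};
        // First 12 bytes of a made-up WAV header: "RIFF", a size field, "WAVE".
        byte[] data = {'R', 'I', 'F', 'F', 0x24, 0, 0, 0, 'W', 'A', 'V', 'E'};
        System.out.println(hasMagic(data, riff)); // prints true
    }
}
```

For a real file, pass a FileInputStream (ideally inside try-with-resources) instead of the in-memory array.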
According to here, the C compiler will pad out values when writing a structure to a binary file. As the example in the link says, when writing a struct like this:
struct {
char c;
int i;
} a;
to a binary file, the compiler will usually leave an unnamed, unused hole between the char and int fields, to ensure that the int field is properly aligned.
How could I create an exact replica of the binary output file (generated in C), using a different language (in my case, Java)?
Is there an automatic way to apply C padding in Java output? Or do I have to go through compiler documentation to see how it works (the compiler is g++ by the way).
Don't do this, it is brittle and will lead to alignment and endianness bugs.
For external data it is much better to explicitly define the format in terms of bytes and write explicit functions to convert between internal and external format, using shift and masks (not union!).
This is true not only when writing to files, but also in memory. It is the fact that the struct is padded in memory, that leads to the padding showing up in the file, if the struct is written out byte-by-byte.
It is in general very hard to replicate with certainty the exact padding scheme, although I guess some heuristics would get you quite far. It helps if you have the struct declaration, for analysis.
Typically, fields larger than one char will be aligned so that their starting offset inside the structure is a multiple of their size. This means shorts will generally be on even offsets (divisible by 2, assuming sizeof (short) == 2), while doubles will be on offsets divisible by 8, and so on.
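The heuristic in the previous paragraph can be sketched in Java as follows. This assumes every field is aligned to its own size, that all sizes are powers of two, and that the struct's total size is rounded up to its largest field. Real compilers can deviate from this (packing pragmas, platform ABIs), so treat it as a starting point to check against actual files.

```java
import java.util.Arrays;

class StructLayout {
    // Rounds offset up to the next multiple of alignment (a power of two).
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    // Returns the offset of each field, followed by the padded total size.
    static int[] offsets(int... fieldSizes) {
        int[] result = new int[fieldSizes.length + 1];
        int offset = 0;
        int maxAlignment = 1;
        for (int i = 0; i < fieldSizes.length; i++) {
            offset = alignUp(offset, fieldSizes[i]); // skip padding before the field
            result[i] = offset;
            offset += fieldSizes[i];
            maxAlignment = Math.max(maxAlignment, fieldSizes[i]);
        }
        result[fieldSizes.length] = alignUp(offset, maxAlignment); // trailing padding
        return result;
    }

    public static void main(String[] args) {
        // struct { char c; int i; } with a 1-byte char and a 4-byte int
        System.out.println(Arrays.toString(offsets(1, 4))); // prints [0, 4, 8]
    }
}
```

For the struct from the question this gives the char at offset 0, three bytes of padding, the int at offset 4, and a total size of 8.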
UPDATE: It is for reasons like this (and also reasons having to do with endianness) that it is generally a bad idea to dump whole structs out to files. It's better to do it field-by-field, like so:
put_char(out, a.c);
put_int(out, a.i);
Assuming the put-functions only write the bytes needed for the value, this will emit a padding-less version of the struct to the file, solving the problem. It is also possible to ensure a proper, known, byte-ordering by writing these functions accordingly.
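On the Java side, matching functions could look roughly like the sketch below, using shifts and masks to fix the byte order. Little-endian is an arbitrary choice here; what matters is that both programs agree on it. The names putChar, putIntLE and encode are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class FieldWriter {
    // Writes a single byte (the equivalent of a C char).
    static void putChar(OutputStream out, byte c) throws IOException {
        out.write(c); // write(int) only emits the low 8 bits
    }

    // Writes a 32-bit int as four bytes, least significant byte first.
    static void putIntLE(OutputStream out, int value) throws IOException {
        out.write(value & 0xFF);
        out.write((value >>> 8) & 0xFF);
        out.write((value >>> 16) & 0xFF);
        out.write((value >>> 24) & 0xFF);
    }

    // Encodes the two fields back to back, with no padding in between.
    static byte[] encode(byte c, int i) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        putChar(out, c);
        putIntLE(out, i);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode((byte) 'A', 0x01020304);
        System.out.println(bytes.length); // prints 5
    }
}
```

The result is 5 bytes instead of the 8 a padded struct dump would typically produce, and the layout no longer depends on the compiler.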
"Is there an automatic way to apply C padding in Java output? Or do I have to go through compiler documentation to see how it works (the compiler is g++ by the way)."
Neither. Instead, you explicitly specify a data/communication format and implement that specification, rather than relying on implementation details of the C compiler. You won't even get the same output from different C compilers.
For interoperability, look at the ByteBuffer class.
Essentially, you create a buffer of a certain size, put() variables of different types at different positions, and then call array() at the end to retrieve the "raw" data representation:
ByteBuffer bb = ByteBuffer.allocate(8);
bb.order(ByteOrder.LITTLE_ENDIAN);
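bb.put(0, (byte) someChar);   // a C char is a single byte
bb.putInt(4, someInteger);    // bytes 1-3 are left as padding; the int starts at offset 4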
byte[] rawBytes = bb.array();
But it's up to you to work out where to put padding-- i.e. how many bytes to skip between positions.
For reading data written from C, you generally wrap() a ByteBuffer around some byte array that you've read from a file.
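A minimal sketch of the reading direction, assuming the 8-byte little-endian layout used above (a 1-byte char at offset 0, three padding bytes, a 4-byte int at offset 4). The class and method names are made up, and the byte array stands in for data read from the file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class StructReader {
    // Reads the char field at offset 0.
    static byte readCharField(byte[] raw) {
        return ByteBuffer.wrap(raw).get(0);
    }

    // Reads the int field at offset 4, skipping the three padding bytes.
    static int readIntField(byte[] raw) {
        ByteBuffer bb = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        return bb.getInt(4);
    }

    public static void main(String[] args) {
        // 'x', three padding bytes, then 42 as a little-endian int
        byte[] raw = {'x', 0, 0, 0, 0x2A, 0, 0, 0};
        System.out.println((char) readCharField(raw) + " " + readIntField(raw)); // prints x 42
    }
}
```

The absolute get methods take a byte offset, so the padding is handled simply by never reading those positions.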
In case it's helpful, I've written more on ByteBuffer.
A handy way of reading/writing C structs in Java is to use the javolution Struct class (see http://www.javolution.org). This won't help you with automatically padding/aligning your data, but it does make working with raw data held in a ByteBuffer much more convenient. If you're not familiar with javolution, it's well worth a look as there's lots of other cool stuff in there too.
This hole is configurable: the compiler has switches to align structs on 1/2/4/8-byte boundaries.
So the first question is: Which alignment exactly do you want to simulate?
With Java, the sizes of data types are defined by the language specification. For example, a byte type is 1 byte, short is 2 bytes, and so on. This is unlike C, where the size of each type is architecture-dependent.
Therefore, it would be important to know how the binary file is formatted in order to be able to read the file into Java.
It may be necessary to take steps in order to be certain that fields are a specific size, to account for differences in the compiler or architecture. The mention of alignment seems to suggest that the output file will depend on the architecture.
You could try Preon:
"Preon is a java library for building codecs for bitstream-compressed data in a declarative (annotation based) way. Think JAXB or Hibernate, but then for binary encoded data."
It can handle big- and little-endian binary data, alignment (padding), and various numeric types, among other features. It is a very nice library; I like it very much.
my 0.02$
I highly recommend protocol buffers for exactly this problem.
As I understand it, you're saying that you don't control the output of the C program. You have to take it as given.
So do you have to read this file for some specific set of structures, or do you have to solve this in a general case? I mean, is the problem that someone said, "Here's the file created by program X, you have to read it in Java"? Or do they expect your Java program to read the C source code, find the structure definition, and then read it in Java?
If you've got a specific file to read, the problem isn't really very difficult. Either by reviewing the C compiler specifications or by studying example files, figure out where the padding is. Then on the Java side, read the file as a stream of bytes, and build the values you know are coming. Basically I'd write a set of functions to read the required number of bytes from an InputStream and turn them into the appropriate data type. Like:
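// Reads len bytes, most significant byte first, and assembles them into an int.
// PrematureEndOfDataException is assumed to be your own exception class.
int readInt(InputStream is, int len)
    throws IOException, PrematureEndOfDataException
{
    int n = 0;
    while (len-- > 0)
    {
        int i = is.read(); // 0..255, or -1 at end of stream
        if (i == -1)
            throw new PrematureEndOfDataException();
        // Use the unsigned value directly; casting to byte first would
        // sign-extend values above 127 and corrupt the result.
        n = (n << 8) | i;
    }
    return n;
}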
You can alter the packing on the C side to ensure that no padding is used, or alternatively you can look at the resulting file in a hex editor, which lets you write a parser in Java that ignores the padding bytes.