Java: Efficiently converting an array of longs to an array of bytes

I have an array of longs I want to write to disk. The most efficient disk I/O functions take in byte arrays, for example:
FileOutputStream.write(byte[] b, int offset, int length)
...so I want to begin by converting my long[] to byte[] (8 bytes for each long). I'm struggling to find a clean way to do this.
Direct typecasting doesn't seem allowed:
ConversionTest.java:6: inconvertible types
found : long[]
required: byte[]
byte[] byteArray = (byte[]) longArray;
^
It's easy to do the conversion by iterating over the array, for example:
ByteBuffer bytes = ByteBuffer.allocate(longArray.length * (Long.SIZE / 8));
for (long l : longArray)
{
    bytes.putLong(l);
}
byte[] byteArray = bytes.array();
...however that seems far less efficient than simply treating the long[] as a series of bytes.
Interestingly, when reading the file, it's easy to "cast" from byte[] to longs using Buffers:
LongBuffer longs = ByteBuffer.wrap(byteArray).asLongBuffer();
...but I can't seem to find any functionality to go the opposite direction.
I understand there are endian considerations when converting from long to byte, but I believe I've already addressed those: I'm using the Buffer framework shown above, which defaults to big endian, regardless of native byte order.

No, there is not a trivial way to convert from a long[] to a byte[].
Your best option is likely to wrap your FileOutputStream with a BufferedOutputStream and then write out the individual byte values for each long (using bitwise operators).
Another option is to create a ByteBuffer and put your long values into the ByteBuffer and then write that to a FileChannel. This handles the endianness conversion for you, but makes the buffering more complicated.
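For illustration, a minimal sketch of the first option, assuming big-endian byte order to match the Buffer default (the helper name writeLongs is made up here):
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper: writes each long as 8 big-endian bytes.
static void writeLongs(String fileName, long[] longArray) throws IOException {
    try (OutputStream out = new BufferedOutputStream(new FileOutputStream(fileName))) {
        for (long l : longArray) {
            for (int shift = 56; shift >= 0; shift -= 8) {
                out.write((int) (l >>> shift)); // write(int) keeps only the low 8 bits
            }
        }
    }
}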

Concerning efficiency: many details will, in fact, hardly make a difference. The hard disk is by far the slowest part involved here, and in the time that it takes to write a single byte to the disk, you could have converted thousands or even millions of longs to bytes. A performance test here will tell you less about the implementation than about the hard disk. When in doubt, one should run dedicated benchmarks comparing the different conversion strategies and comparing the different writing methods, respectively.
Assuming that the main goal is a functionality that allows a convenient conversion and does not impose an unnecessary overhead, I'd like to propose the following approach:
One can create a ByteBuffer of sufficient size, view it as a LongBuffer, use the bulk LongBuffer#put(long[]) method (which takes care of endianness conversions, if necessary, and does so as efficiently as possible), and finally write the original ByteBuffer (which is now filled with the long values) to the file, using a FileChannel.
Following this idea, I think that this method is convenient and (most likely) rather efficient:
private static void bulkAndChannel(String fileName, long[] longArray)
{
    ByteBuffer bytes =
        ByteBuffer.allocate(longArray.length * Long.BYTES);
    bytes.order(ByteOrder.nativeOrder()).asLongBuffer().put(longArray);
    try (FileOutputStream fos = new FileOutputStream(fileName))
    {
        fos.getChannel().write(bytes);
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}
(Of course, one could argue about whether allocating one "large" buffer is the best idea. But thanks to the convenience methods of the Buffer classes, this could be modified with reasonable effort to write "chunks" of data of an appropriate size, for the case where one really wants to write a huge array and the memory overhead of the corresponding ByteBuffer would be prohibitive.)
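As a sketch of that chunked variant (the chunk size of 8192 longs is an arbitrary choice, not a recommendation):
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;

private static void chunkedWrite(String fileName, long[] longArray) throws IOException {
    final int CHUNK_LONGS = 8192; // 64 KiB per write; tune as needed
    ByteBuffer bytes = ByteBuffer.allocate(CHUNK_LONGS * Long.BYTES)
        .order(ByteOrder.nativeOrder());
    try (FileChannel channel = new FileOutputStream(fileName).getChannel()) {
        for (int offset = 0; offset < longArray.length; offset += CHUNK_LONGS) {
            int count = Math.min(CHUNK_LONGS, longArray.length - offset);
            bytes.clear();                       // reuse the same buffer
            bytes.asLongBuffer().put(longArray, offset, count);
            bytes.limit(count * Long.BYTES);     // only write what was filled
            while (bytes.hasRemaining()) {
                channel.write(bytes);
            }
        }
    }
}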

OP here.
I have thought of one approach: ByteBuffer.asLongBuffer() returns an instance of ByteBufferAsLongBufferB, a class which wraps ByteBuffer in an interface for treating the data as longs while properly managing endianness. I could extend ByteBufferAsLongBufferB, and add a method to return the raw byte buffer (which is protected).
But this seems so esoteric and convoluted I feel there must be an easier way. Either that, or something in my approach is flawed.

Related

How to store file bytes in memory for files bigger than 2^32 bytes

While working with encryption of big files, I've been forced to learn how these files are read from and written to the filesystem at a lower abstraction level than usual.
But there are some things that I can't understand at all, the main one being: why does java.io.File.length() return a long value when, to hold a file's bytes in memory, I need to create a byte array whose constructor takes an int parameter? E.g.:
File file = // ...
byte[] total = new byte[(int) file.length()];
byte[] buffer = new byte[1024];
...
While working with normal files I wouldn't even think about this, since an int can index up to 2^31 - 1 array elements, which is, if I'm not mistaken, just short of 2 GB.
I wonder why this is, whether it is actually done this way or I'm misunderstanding Java's API, and which alternatives I have to simplify this task.
PS: I know that with big files I shouldn't aim to read and keep the whole thing in memory, since that would easily raise an OutOfMemoryError; this is just a hypothetical use case.
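(For what it's worth, the usual alternative is to avoid the single giant array entirely and process the file in fixed-size chunks; a minimal sketch:)
import java.io.FileInputStream;
import java.io.IOException;

// Processes a file of any size without ever holding it all in memory.
static void processInChunks(String path) throws IOException {
    byte[] buffer = new byte[8192];
    try (FileInputStream in = new FileInputStream(path)) {
        int n;
        while ((n = in.read(buffer)) > 0) {
            // handle buffer[0..n) here, e.g. feed it to a cipher
        }
    }
}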

How to read different objects from byte[]

I have a byte[] with 3 different objects. How can I read from the byte[] and separate the objects?
My code:
public byte[] toByteArray() {
    byte[] bytes;
    byte[] sb = start.toString().getBytes();
    byte[] gb = goal.toString().getBytes();
    byte[] mb = gameBoard.toString().getBytes();
    bytes = new byte[sb.length + gb.length + mb.length];
    System.arraycopy(sb, 0, bytes, 0, sb.length);
    System.arraycopy(gb, 0, bytes, sb.length, gb.length);
    System.arraycopy(mb, 0, bytes, gb.length, mb.length);
    return bytes;
}
Seems like you are talking about Java, not JavaScript.
I recommend having a look at binary serialization, which I guess is what you are looking for: Saving to binary/serialization java
If you store your data like this, it will be very difficult to read it back.
I recommend using a built-in object-to-byte[] (and back) conversion such as Serializable.
Also, to store several objects inside one byte[] array, have a look at ObjectOutputStream.
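A minimal sketch of that approach, assuming the three fields implement Serializable (Board is a placeholder for whatever their real type is):
import java.io.*;

// Write the objects into one byte[] in a fixed order...
public byte[] toByteArray() throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
        oos.writeObject(start);
        oos.writeObject(goal);
        oos.writeObject(gameBoard);
    }
    return baos.toByteArray();
}

// ...and read them back in the same order.
public void fromByteArray(byte[] bytes) throws IOException, ClassNotFoundException {
    try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
        start = (Board) ois.readObject();
        goal = (Board) ois.readObject();
        gameBoard = (Board) ois.readObject();
    }
}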
First of all, you will need an actual byte[] that the data can be read from. There are some issues with what you are trying:
toString() is usually not fit to produce data you can reconstruct the object from. It might work for an integer, get a bit messed up with floating point, and be outright impossible with complex objects that only tell you their type and id (as Davide's comment pointed out).
There are no cues about where one object starts and ends. Even worse, you may have messed up the start position of the 3rd object: the last arraycopy's destination offset should be sb.length + gb.length, not gb.length.
The JRE has built-in serialization.
Other people use XML or JSON when they need to interoperate with something else. You might even implement your own flavor of java.text.Format, which is able to format and parse your objects. Pick your poison.

Adding elements of an Array to an ArrayList

I have a very small threaded application, which is collecting small chunks of data in arrays (because it is sound data, and Java wants that to be an array) and trying to put it into an ArrayList for storage. All of that is effectively the front half of a producer/consumer pattern.
Problem: It doesn't seem to work.
On the producer end, I have this code:
public synchronized void run() {
    // do a whole bunch of audio set-up
    try {
        // more audio stuff
        while (true) {
            if (producing) {
                byte[] data = new byte[line.getBufferSize()];
                numBytesRead = line.read(data, 0, data.length);
                System.out.println("Producer: Size of data[] is " + data.length);
                // Save this chunk of data.
                buffer.addData(data);
            }
This seems straightforward, aside from the audio stuff and bookkeeping.
In the buffer class, I have:
public class Buffer {
    ArrayList list;
    public void addData(byte[] data) {
        list.addAll(Arrays.asList(data));
    }
This also seems straightforward.
Here is the problem: If my array is of length (say) 1024, and the elements are all there (which I've verified they are), I would expect the size of the ArrayList to grow by 1024 every time I add data. It doesn't. It grows by 1, as though I were making either an ArrayList of ArrayLists or an ArrayList of arrays, rather than the ArrayList of elements I desire.
I suspect I'm going to have this problem on the flip side as well, where I might have an ArrayList of tens of thousands of bytes, and want to retrieve an array of the first 1024 elements.
I cannot help but think I'm missing something very simple. Can anyone shed light on why this is not working? (Or if there is some fundamentally better way to do what I'm trying to do?)
Arrays.asList() will not perform the conversion from byte to Byte; it will return a list containing one element: the byte[] you pass in.
If your aim is to add a Byte object for every byte, you will have to do that yourself in a loop. Note that this will use much more memory than passing byte[]s around, however.
Also note that it is not guaranteed that you will read data.length bytes every time, even if the input stream has more than enough data left (a result of buffer sizes, concurrency, etc.), so you run the risk of passing a bunch of 0 bytes at the end of your buffer if you read fewer bytes than you asked for.
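A quick sketch of both points:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

byte[] data = {1, 2, 3};

// Arrays.asList on a primitive array yields ONE element: the array itself.
List<byte[]> wrapped = Arrays.asList(data);
System.out.println(wrapped.size()); // prints 1

// Boxing each byte by hand gives the intended per-element list.
List<Byte> boxed = new ArrayList<>(data.length);
for (byte b : data) {
    boxed.add(b); // auto-boxes byte to Byte
}
System.out.println(boxed.size()); // prints 3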
byte[] data should be Byte[]. You must use the wrapper object, not the primitive.
I cannot help but think I'm missing something very simple. Can anyone shed light on why this is not working? (Or if there is some fundamentally better way to do what I'm trying to do?)
Storing byte data in an ArrayList<Byte> has a lot of memory overhead compared to a byte[] array. If you're working with a large amount of data, you may want to use a byte[] array for storage as well. Take a look at the source code for ByteArrayOutputStream - I don't know if it will work for you as-is, but you might be able to create a similar sort of class that manages an expanding byte array.
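A minimal sketch of that idea, reusing ByteArrayOutputStream itself as the growable store (the extra length parameter also addresses the partial-read caveat mentioned above; method names here are made up):
import java.io.ByteArrayOutputStream;

public class Buffer {
    // Growable byte storage without per-byte boxing overhead.
    private final ByteArrayOutputStream store = new ByteArrayOutputStream();

    public synchronized void addData(byte[] data, int length) {
        store.write(data, 0, length); // copy only the bytes actually read
    }

    public synchronized byte[] snapshot() {
        return store.toByteArray();
    }
}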

How to initialize a ByteBuffer if you don't know how many bytes to allocate beforehand?

Is this:
ByteBuffer buf = ByteBuffer.allocate(1000);
...the only way to initialize a ByteBuffer?
What if I have no idea how many bytes I need to allocate..?
Edit: More details:
I'm converting one image file format to a TIFF file. The problem is that the starting file format can be any size, but I need to write the data to the TIFF in little-endian order. So I'm reading the stuff I'm eventually going to write to the TIFF file into the ByteBuffer first, so I can put everything in little endian, and then I'm going to write it to the outfile. I guess since I know how long IFDs and headers are, and I can probably figure out how many bytes are in each image plane, I can just use multiple ByteBuffers during this whole process.
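(For the little-endian requirement specifically, note that a ByteBuffer's byte order is settable; a minimal illustration:)
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

ByteBuffer buf = ByteBuffer.allocate(1000);
buf.order(ByteOrder.LITTLE_ENDIAN); // the default is BIG_ENDIAN
buf.putInt(42);                     // stored as 2A 00 00 00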
The types of places that you would use a ByteBuffer are generally the types of places that you would otherwise use a byte array (which also has a fixed size). With synchronous I/O you often use byte arrays, with asynchronous I/O, ByteBuffers are used instead.
If you need to read an unknown amount of data using a ByteBuffer, consider using a loop with your buffer, appending the data to a ByteArrayOutputStream as you read it. When you are finished, call toByteArray() to get the final byte array.
Any time you aren't absolutely sure of the size (or maximum size) of a given input, reading in a loop (possibly using a ByteArrayOutputStream, but otherwise just processing the data as a stream as it is read) is the only way to handle it. Without some sort of loop, any remaining data will of course be lost.
For example:
final byte[] buf = new byte[4096];
int numRead;
// Use try-with-resources to auto-close streams.
try (
    final FileInputStream fis = new FileInputStream(...);
    final ByteArrayOutputStream baos = new ByteArrayOutputStream()
) {
    while ((numRead = fis.read(buf)) > 0) {
        baos.write(buf, 0, numRead);
    }
    final byte[] allBytes = baos.toByteArray();
    // Do something with the data.
}
catch (final Exception e) {
    // Do something on failure...
}
If you instead wanted to write Java ints, or other things that aren't raw bytes, you can wrap your ByteArrayOutputStream in a DataOutputStream:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(baos);
while (thereAreMoreIntsFromSomewhere()) {
    int someInt = getIntFromSomewhere();
    dos.writeInt(someInt);
}
byte[] allBytes = baos.toByteArray();
Depends.
Library
Converting file formats tends to be a solved problem for most problem domains. For example:
Batik can transcode between various image formats (including TIFF).
Apache POI can convert between office spreadsheet formats.
Flexmark can generate HTML from Markdown.
The list is long. The first question should be, "What library can accomplish this task?" If performance is a consideration, your time is likely better spent optimising an existing package to meet your needs than writing yet another tool. (As a bonus, other people get to benefit from the centralised work.)
Known Quantities
Reading a file? Allocate file.length() bytes.
Copying a string? Allocate string.length() bytes.
Copying a TCP packet? Allocate 1500 bytes, for example.
Unknown Quantities
When the number of bytes is truly unknown, you can do a few things:
Make a guess.
Analyze example data sets and use the average length.
Example
Java's StringBuffer, unless otherwise instructed, uses an initial buffer size that holds 16 characters. Once the 16 characters are filled, a new, longer array is allocated, and the original 16 characters are copied into it. If the StringBuffer had an initial size of 1024 characters, the reallocation would not happen as early or as often.
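For instance (StringBuilder has the same sizing behaviour):
// Default capacity holds 16 chars; exceeding it triggers reallocation and copying.
StringBuilder small = new StringBuilder();

// Pre-sized: no reallocation until 1024 chars are exceeded.
StringBuilder big = new StringBuilder(1024);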
Optimization
Either way, this is probably a premature optimization. Typically you would allocate a set number of bytes when you want to reduce the number of internal memory reallocations that get executed.
It is unlikely that this will be the application's bottleneck.
The idea is that it's only a buffer - not the whole of the data. It's a temporary resting spot for data: you read a chunk into it, process it (possibly writing it somewhere else), and repeat. So allocate yourself a big enough "chunk" and it normally won't be a problem.
What problem are you anticipating?

Best way to read structured binary files with Java

I have to read a binary file in a legacy format with Java.
In a nutshell the file has a header consisting of several integers, bytes and fixed-length char arrays, followed by a list of records which also consist of integers and chars.
In any other language I would create structs (C/C++) or records (Pascal/Delphi) which are byte-by-byte representations of the header and the record. Then I'd read sizeof(header) bytes into a header variable and do the same for the records.
Something like this: (Delphi)
type
  THeader = record
    Version: Integer;
    Type: Byte;
    BeginOfData: Integer;
    ID: array[0..15] of Char;
  end;
...
procedure ReadData(S: TStream);
var
  Header: THeader;
begin
  S.ReadBuffer(Header, SizeOf(THeader));
  ...
end;
What is the best way to do something similar with Java? Do I have to read every single value on its own or is there any other way to do this kind of "block-read"?
To my knowledge, Java forces you to read a file as bytes rather than being able to block read. If you were serializing Java objects, it'd be a different story.
The other examples shown use the DataInputStream class with a File, but you can also use a shortcut: The RandomAccessFile class:
RandomAccessFile in = new RandomAccessFile("filename", "r");
int version = in.readInt();
byte type = in.readByte();
int beginOfData = in.readInt();
byte[] tempId = new byte[16]; // the array must be allocated before reading into it
in.readFully(tempId);         // readFully() guarantees all 16 bytes are read
String id = new String(tempId);
Note that you could turn the response values into a class, if that would make it easier.
If you were using Preon, all you would have to do is this:
public class Header {
    @BoundNumber int version;
    @BoundNumber byte type;
    @BoundNumber int beginOfData;
    @BoundString(size = "15") String id;
}
Once you have this, you create a Codec using a single line:
Codec<Header> codec = Codecs.create(Header.class);
And you use the Codec like this:
Header header = Codecs.decode(codec, file);
You could use the DataInputStream class as follows:
DataInputStream in = new DataInputStream(new BufferedInputStream(
        new FileInputStream("filename")));
int x = in.readInt();
double y = in.readDouble();
etc.
Once you get these values you can do with them as you please. Look up the java.io.DataInputStream class in the API for more info.
I may have misunderstood you, but it seems to me you're creating in-memory structures that you hope will be byte-per-byte accurate representations of what you want to read from the hard disk, then copying the whole thing into memory and manipulating it from there?
If that's indeed the case, you're playing a very dangerous game. At least in C, the standard doesn't enforce things like padding or alignment of struct members. Not to mention things like big/little endianness or parity bits... So even if your code happens to run, it's very non-portable and risky - you depend on the compiler's creator not changing their mind in future versions.
Better to create an automaton that both validates the structure being read (byte per byte) from the HD and fills an in-memory structure if it's indeed OK. You may lose some milliseconds (not as many as it may seem, since modern OSes do a lot of disk read caching), but you gain platform and compiler independence. Plus, your code will be easily ported to another language.
Post Edit: In a way, I sympathize with you. In the good ol' days of DOS/Win3.11, I once created a C program to read BMP files, using exactly the same technique. Everything was nice until I tried to compile it for Windows - oops!! int was now 32 bits long, rather than 16! When I tried to compile on Linux, I discovered gcc had very different rules for bit field allocation than Microsoft C (6.0!). I had to resort to macro tricks to make it portable...
I have used Javolution and javastruct; both handle the conversion between bytes and objects.
Javolution provides classes that represent C types. All you need to do is write a class that describes the C structure. For example, from the C header file,
struct Date {
    unsigned short year;
    unsigned char month;
    unsigned char day;
};
should be translated into:
public static class Date extends Struct {
    public final Unsigned16 year = new Unsigned16();
    public final Unsigned8 month = new Unsigned8();
    public final Unsigned8 day = new Unsigned8();
}
Then call setByteBuffer to initialize the object:
Date date = new Date();
date.setByteBuffer(ByteBuffer.wrap(bytes), 0);
javastruct uses annotation to define fields in a C structure.
@StructClass
public class Foo {
    @StructField(order = 0)
    public byte b;
    @StructField(order = 1)
    public int i;
}
To initialize an object:
Foo f2 = new Foo();
JavaStruct.unpack(f2, b);
FileInputStream lets you read in bytes. So open the file with FileInputStream and read in sizeof(header) bytes. I am assuming that the header has a fixed format and size; that isn't mentioned in the initial post, but things would get much more complex if the header had optional parts and varying sizes.
Once you have the bytes, there can be a header class to which you assign the contents of the buffer you've already read. Then parse the records in a similar fashion.
Here is a link showing how to read bytes using a ByteBuffer (Java NIO):
http://exampledepot.com/egs/java.nio/ReadChannel.html
As other people mention, DataInputStream and the nio Buffers are probably the low-level APIs you are after for dealing with binary data in Java.
However, you probably want something like Construct (the wiki page has good examples too: http://en.wikipedia.org/wiki/Construct_(python_library)), but for Java.
I don't know of any (Java versions) offhand, but taking that approach (declaratively specifying the struct in code) would probably be the right way to go. With a suitable fluent interface in Java, it would probably be quite similar to a DSL.
EDIT: a bit of googling reveals this:
http://javolution.org/api/javolution/io/Struct.html
That might be the kind of thing you are looking for. I have no idea whether it works or is any good, but it looks like a sensible place to start.
I would create an object that wraps around a ByteBuffer representation of the data and provide getters to read directly from the buffer. In this way, you avoid copying data from the buffer to primitive types. Furthermore, you could use a MappedByteBuffer to get the byte buffer. If your binary data is complex, you can model it using classes and give each class a sliced version of your buffer.
class SomeHeader {
    private final ByteBuffer buf;

    SomeHeader(ByteBuffer fileBuffer) {
        // you may need to set limits accordingly before
        // fileBuffer.limit(...)
        this.buf = fileBuffer.slice();
        // you may need to skip the sliced region
        // fileBuffer.position(endPos)
    }

    public short getVersion() {
        return buf.getShort(POSITION_OF_VERSION_IN_BUFFER);
    }
}
Also useful are the methods for reading unsigned values from byte buffers.
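For example, a minimal sketch of the widening trick (the toUnsigned* helpers exist since Java 8):
import java.nio.ByteBuffer;

ByteBuffer buf = ByteBuffer.wrap(new byte[] {(byte) 0xFF, (byte) 0xFE});

// getShort(0) yields the signed value -2; widening recovers the unsigned 65534.
int unsignedShort = Short.toUnsignedInt(buf.getShort(0));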
HTH
I've written up a technique to do this sort of thing in Java - similar to the old C-like idiom of reading bit fields. Note it is just a start, but it could be expanded upon.
here
In the past I used DataInputStream to read data of arbitrary types in a specified order. This will not allow you to easily account for big-endian/little-endian issues.
As of 1.4, the java.nio.Buffer family might be the way to go, though your code might actually end up more complicated. These classes do have support for handling endianness issues.
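For example, reading a little-endian int becomes a one-liner once the order is set (a small sketch):
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

byte[] raw = {0x01, 0x00, 0x00, 0x00}; // little-endian encoding of 1
int value = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt(); // 1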
A while ago I found this article on using reflection and parsing to read binary data. In this case, the author is using reflection to read Java's binary .class files. But if you are reading binary data into a class of your own, it may be of some help.
