I have a byte[] with 3 different objects. How can I read from the byte[] and separate the objects?
My code:
public byte[] toByteArray() {
    byte[] bytes;
    byte[] sb = start.toString().getBytes();
    byte[] gb = goal.toString().getBytes();
    byte[] mb = gameBoard.toString().getBytes();
    bytes = new byte[sb.length + gb.length + mb.length];
    System.arraycopy(sb, 0, bytes, 0, sb.length);
    System.arraycopy(gb, 0, bytes, sb.length, gb.length);
    System.arraycopy(mb, 0, bytes, gb.length, mb.length);
    return bytes;
}
Seems like you are talking about Java, not JavaScript.
I recommend having a look at binary serialization, which I guess is what you are looking for: Saving to binary/serialization java
If you store your data like this, reading it back will be a very difficult task.
I recommend using the built-in object-to-byte[] (and back) conversion that comes with Serializable.
Also, to store several objects inside one byte[] array, have a look at ObjectOutputStream.
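For example, a minimal sketch of that approach (assuming start, goal, and gameBoard all implement Serializable; the method shapes are illustrative):

import java.io.*;

static byte[] toByteArray(Object start, Object goal, Object gameBoard) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
        oos.writeObject(start);
        oos.writeObject(goal);
        oos.writeObject(gameBoard);
    }
    return bos.toByteArray();
}

static Object[] fromByteArray(byte[] bytes) throws IOException, ClassNotFoundException {
    try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
        // Objects come back in the order they were written; cast them to your real types.
        return new Object[] { ois.readObject(), ois.readObject(), ois.readObject() };
    }
}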
First of all, you will need an actual byte[] that the data can be read back from. There are some issues with what you are trying:
toString() is usually not fit to produce data you can reconstruct an object from. It might work for an integer, gets a bit messy with floating point, and is outright impossible with complex objects, which only tell you their type and id (as Davide pointed out in a comment).
There are no cues about where one object starts and ends. Even worse: you may have messed up the start position of the 3rd object, since the destination offset of the last arraycopy should be sb.length + gb.length, not gb.length.
The JRE has built-in serialization.
Other people use XML or JSON when they need to interoperate with something else. You might even implement your own flavor of java.text.Format that is able to format and parse your objects. Pick your poison.
As I understand it, ByteArrayInputStream is used to read byte[] data.
Why should I use it rather than a simple byte[] (for example, one read from a DB)?
What is the difference between them?
If the input is always a byte[], then you're right, there's often no need for the stream. And if you don't need it, don't use it. One additional advantage of a ByteArrayInputStream is that it serves as a very strong indication that you intend the bytes to be read-only (since the stream doesn't provide an interface for changing them), though it's important to note that a programmer can often still access the bytes directly, so you shouldn't use that in a situation where security is a concern.
But if it's sometimes a byte[], sometimes a file, sometimes a network connection, etc, then you need some sort of abstraction for "a stream of bytes, and I don't care where they come from." That's what an InputStream is. When the source happens to be a byte array, ByteArrayInputStream is a good InputStream to use.
This is helpful in many situations, but to give two concrete examples:
You're writing a library that takes bytes and processes them somehow (maybe it's an image processing library, for instance). Users of your library may supply bytes from a file, or from a byte[] in memory, or from some other source. So, you provide an interface that accepts an InputStream — which means that if what they have is a byte[], they need to wrap it in a ByteArrayInputStream.
You're writing code that reads a network connection. But to unit test that code, you don't want to have to open up a connection; you want to just supply some bytes in the code. So the code takes an InputStream, and your test provides a ByteArrayInputStream.
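A minimal sketch of that second case (the method and names here are made up for illustration):

import java.io.*;

// Production code depends only on the InputStream abstraction.
static int countLines(InputStream in) throws IOException {
    int lines = 0;
    int b;
    while ((b = in.read()) != -1) {
        if (b == '\n') lines++;
    }
    return lines;
}

// In a unit test, supply bytes from memory instead of a real connection.
static void testCountLines() throws IOException {
    byte[] fake = "HTTP/1.1 200 OK\nContent-Length: 0\n".getBytes("US-ASCII");
    assert countLines(new ByteArrayInputStream(fake)) == 2;
}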
A ByteArrayInputStream contains an internal buffer that contains bytes that may be read from the stream. An internal counter keeps track of the next byte to be supplied by the read method. (from the reference documentation)
ByteArrayInputStream is like a wrapper that protects the underlying array from external modification.
It gives you ready-made read, mark, and skip operations.
A stream also has the advantage that you don't have to have all the bytes in memory at the same time, which is convenient if the data is large and can easily be handled in small chunks.
Whereas if you choose a plain byte[], you have to reinvent the wheel: reading, skipping, and tracking the current index all have to be done explicitly.
byte data[] = { 65, 66, 67, 68, 69 };

// Plain array: you drive the index yourself.
for (int index = 0; index < data.length; index++) {
    System.out.print((char) data[index] + " ");
}

// Stream: read() advances an internal counter and returns -1 at the end.
int c;
ByteArrayInputStream bInput = new ByteArrayInputStream(data);
while ((c = bInput.read()) != -1) {
    System.out.println(Character.toUpperCase((char) c));
}
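ByteArrayInputStream also gives you skip, mark, and reset for free; a quick sketch using the same data array:

ByteArrayInputStream in = new ByteArrayInputStream(data);
in.skip(2);                            // jump over the first two bytes
in.mark(0);                            // remember this position (the limit argument is ignored for this class)
System.out.println((char) in.read()); // prints 'C'
in.reset();                            // rewind to the mark
System.out.println((char) in.read()); // prints 'C' again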
ByteArrayInputStream is a good wrapper for a byte[]; the core is understanding streams. A stream is an ordered sequence of bytes of indeterminate length. Input streams move bytes of data into a Java program from some generally external source. In java.io you can decorate one stream with another to get more functionality, though the performance may suffer. The power of the stream metaphor is that the differences between these sources and destinations are abstracted away: all input and output operations are simply treated as streams using the same classes and the same methods. You don't learn a new API for every different kind of device; the same API that reads files can read network sockets, serial ports, Bluetooth transmissions, and more.
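For instance, a quick sketch of that decoration idea (the byte source here could just as well be a file or socket stream):

import java.io.*;

byte[] raw = { 0, 0, 0, 42 };  // four bytes of a big-endian int

// Decorate the byte-array stream with buffering and typed reads.
DataInputStream in = new DataInputStream(
        new BufferedInputStream(new ByteArrayInputStream(raw)));
int value = in.readInt();  // 42; the same call works on any InputStream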
I have an array of longs I want to write to disk. The most efficient disk I/O functions take in byte arrays, for example:
FileOutputStream.write(byte[] b, int offset, int length)
...so I want to begin by converting my long[] to byte[] (8 bytes for each long). I'm struggling to find a clean way to do this.
Direct typecasting doesn't seem allowed:
ConversionTest.java:6: inconvertible types
found   : long[]
required: byte[]
    byte[] byteArray = (byte[]) longArray;
                                ^
It's easy to do the conversion by iterating over the array, for example:
ByteBuffer bytes = ByteBuffer.allocate(longArray.length * (Long.SIZE / 8));
for (long l : longArray) {
    bytes.putLong(l);
}
byte[] byteArray = bytes.array();
...however that seems far less efficient than simply treating the long[] as a series of bytes.
Interestingly, when reading the file, it's easy to "cast" from byte[] to longs using Buffers:
LongBuffer longs = ByteBuffer.wrap(byteArray).asLongBuffer();
...but I can't seem to find any functionality to go the opposite direction.
I understand there are endian considerations when converting from long to byte, but I believe I've already addressed those: I'm using the Buffer framework shown above, which defaults to big endian, regardless of native byte order.
No, there is not a trivial way to convert from a long[] to a byte[].
Your best option is likely to wrap your FileOutputStream with a BufferedOutputStream and then write out the individual byte values for each long (using bitwise operators).
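A minimal sketch of that first option (the method name and buffering choice are illustrative; output is big-endian to match the Buffer default mentioned in the question):

import java.io.*;

static void writeLongs(String fileName, long[] longArray) throws IOException {
    try (BufferedOutputStream out =
            new BufferedOutputStream(new FileOutputStream(fileName))) {
        for (long l : longArray) {
            // Emit the 8 bytes of each long, most significant first.
            for (int shift = 56; shift >= 0; shift -= 8) {
                out.write((int) (l >>> shift));  // write() keeps only the low 8 bits
            }
        }
    }
}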
Another option is to create a ByteBuffer and put your long values into the ByteBuffer and then write that to a FileChannel. This handles the endianness conversion for you, but makes the buffering more complicated.
Concerning the efficiency: many details will, in fact, hardly make a difference. The hard disk is by far the slowest part involved here, and in the time it takes to write a single byte to the disk, you could have converted thousands or even millions of longs to bytes. A performance test here will tell you less about the performance of the implementation than about the performance of the hard disk. When in doubt, one should make dedicated benchmarks comparing the different conversion strategies and the different writing methods, respectively.
Assuming that the main goal is a functionality that allows a convenient conversion and does not impose unnecessary overhead, I'd like to propose the following approach:
One can create a ByteBuffer of sufficient size, view it as a LongBuffer, use the bulk LongBuffer#put(long[]) method (which takes care of endianness conversion, if necessary, and does so as efficiently as it can be done), and finally write the original ByteBuffer (now filled with the long values) to the file, using a FileChannel.
Following this idea, I think that this method is convenient and (most likely) rather efficient:
private static void bulkAndChannel(String fileName, long[] longArray)
{
    ByteBuffer bytes = ByteBuffer.allocate(longArray.length * Long.BYTES);
    bytes.order(ByteOrder.nativeOrder()).asLongBuffer().put(longArray);
    try (FileOutputStream fos = new FileOutputStream(fileName))
    {
        fos.getChannel().write(bytes);
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}
(Of course, one could argue about whether allocating a "large" buffer is the best idea. But thanks to the convenience methods of the Buffer classes, this could easily and with reasonable effort be modified to write "chunks" of data with an appropriate size, for the case that one really wants to write a huge array and the memory overhead of creating the corresponding ByteBuffer would be prohibitively large)
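In that spirit, a sketch of the chunked variant (the chunk size is an arbitrary assumption; tune it):

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

private static void chunkedWrite(String fileName, long[] longArray) throws IOException {
    final int CHUNK_LONGS = 4096;  // assumption, not a magic number from the question
    ByteBuffer bytes = ByteBuffer.allocate(CHUNK_LONGS * Long.BYTES);
    try (FileChannel channel = new FileOutputStream(fileName).getChannel()) {
        for (int offset = 0; offset < longArray.length; offset += CHUNK_LONGS) {
            int count = Math.min(CHUNK_LONGS, longArray.length - offset);
            bytes.clear();
            bytes.asLongBuffer().put(longArray, offset, count);
            bytes.limit(count * Long.BYTES);
            while (bytes.hasRemaining()) {
                channel.write(bytes);  // write() may be partial, so loop
            }
        }
    }
}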
OP here.
I have thought of one approach: ByteBuffer.asLongBuffer() returns an instance of ByteBufferAsLongBufferB, a class which wraps ByteBuffer in an interface for treating the data as longs while properly managing endianness. I could extend ByteBufferAsLongBufferB, and add a method to return the raw byte buffer (which is protected).
But this seems so esoteric and convoluted I feel there must be an easier way. Either that, or something in my approach is flawed.
I am developing a Java-based downloader for binary data. This data is transferred via a text-based protocol (UU-encoded). For the networking task the netty library is used. The binary data is split by the server into many thousands of small packets and sent to the client (i.e. the Java application).
From netty I receive a ChannelBuffer object every time a new message (data) is received. Now I need to process that data; among other tasks I need to check the header of the packet coming from the server (like the HTTP status line). To do so I call ChannelBuffer.array() to get a byte[] array. This array I can then convert into a String via new String(byte[]) and easily check (e.g. compare) its content (again, like comparing to the "200" status message in HTTP).
The software I am writing is using multiple threads/connections, so that I receive multiple packets from netty in parallel.
This usually works fine, however, while profiling the application I noticed that when the connection to the server is good and data comes in very fast, then this conversion to the String object seems to be a bottleneck. The CPU usage is close to 100% in such cases, and according to the profiler very much time is spent in calling this String(byte[]) constructor.
I searched for a better way to get from the ChannelBuffer to a String, and noticed the former also has a toString() method. However, that method is even slower than the String(byte[]) constructor.
So my question is: Does anyone of you know a better alternative to achieve what I am doing?
Perhaps you could skip the String conversion entirely? You could have constants holding byte arrays for your comparison values and check array-to-array instead of String-to-String.
Here's some quick code to illustrate. Currently you're doing something like this:
String http200 = "200";
// byte[] -> String conversion happens every time
String input = new String(channelBuffer.array());
return input.equals(http200);
Maybe this is faster:
// Ideally convert String -> byte[] only once. Store these
// arrays somewhere and look them up instead of recalculating.
final byte[] http200 = "200".getBytes("UTF-8"); // select the correct charset!

// The input doesn't have to be converted at all!
byte[] input = channelBuffer.array();
return Arrays.equals(input, http200);
Some of the checking you are doing might only need to look at part of the buffer. A few ideas:
1. If you could use the alternate form of the String constructor:
new String(byteArray, offset, length)
that might mean far fewer bytes get converted to a String. Your example of looking for "200" within the message would be a case in point.
2. You might find that you can use the length of the byte array as a clue. If some messages are long and you are looking for a short one, ignore the long ones and don't convert them to characters. Or something like that.
3. Along with what @EricGrunzke said, partially inspect the byte buffer to filter out some messages and find that you don't need to convert them from bytes to characters.
4. If your bytes are ASCII characters, the conversion to characters might be quicker if you specify the charset "ASCII" instead of whatever the default is for your server:
new String(bytes, "ASCII")
In fact, you might be able to pick and choose the charset for byte-to-character conversion in some organized fashion that speeds things up.
Depending on what you are trying to do, there are a few options:
If you are just trying to get the response status, can't you simply call getStatus()? That would probably be faster than getting the string out.
If you are trying to convert the buffer, then, assuming you know it will be ASCII (which it sounds like you do), just leave the data as byte[] and convert your UUDecode method to work on a byte[] instead of a String.
The biggest cost of the String conversion is most likely the copying of the data from the byte array to the internal char array of the String; that, combined with the character decoding, is most likely just a bunch of work that you don't need to do.
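To illustrate that last point, a hypothetical sketch of a UUDecode core that works directly on byte[] (standard uuencoding packs 6 bits per ASCII character, offset by 32; this helper and its names are assumptions, not the asker's actual method):

// Decode one uuencoded group: 4 ASCII bytes -> 3 raw bytes, no String involved.
static void decodeGroup(byte[] in, int inOff, byte[] out, int outOff) {
    int a = (in[inOff]     - 32) & 0x3F;
    int b = (in[inOff + 1] - 32) & 0x3F;
    int c = (in[inOff + 2] - 32) & 0x3F;
    int d = (in[inOff + 3] - 32) & 0x3F;
    out[outOff]     = (byte) ((a << 2) | (b >>> 4));
    out[outOff + 1] = (byte) ((b << 4) | (c >>> 2));
    out[outOff + 2] = (byte) ((c << 6) | d);
}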
I have a very small threaded application, which is collecting small chunks of data in arrays (because it is sound data, and Java wants that to be an array) and trying to put it into an ArrayList for storage. All of that is effectively the front half of a producer/consumer pattern.
Problem: It doesn't seem to work.
On the producer end, I have this code:
public synchronized void run() {
    // do a whole bunch of audio set-up
    try {
        // more audio stuff
        while (true) {
            if (producing) {
                byte[] data = new byte[line.getBufferSize()];
                numBytesRead = line.read(data, 0, data.length);
                System.out.println("Producer: size of data[] is " + data.length);
                // Save this chunk of data.
                buffer.addData(data);
            }
This seems straightforward, aside from the audio stuff and bookkeeping.
In the buffer class, I have:
public class Buffer {
    ArrayList list;

    public void addData(byte[] data) {
        list.addAll(Arrays.asList(data));
    }
}
This also seems straightforward.
Here is the problem: If my array is of length (say) 1024, and the elements are all there (which I've verified that they are) I would expect the size of the ArrayList to grow by 1024 every time I add data. It doesn't. It grows by 1, as though I was making either an ArrayList of ArrayLists or an ArrayList of Arrays, rather than the Arraylist of elements I desire.
I suspect I'm going to have this problem on the flip side as well, where I might have an ArrayList of tens of thousands of bytes, and want to retrieve an array of the first 1024 elements.
I cannot help but think I'm missing something very simple. Can anyone shed light on why this is not working? (Or if there is some fundamentally better way to do what I'm trying to do?)
Arrays.asList() will not perform the conversion from byte to Byte; it will return a list containing a single element: the byte[] you passed in.
If your aim is to add a Byte object for every byte, you will have to do that yourself in a loop, as sketched below. Note that this will use much more memory than passing byte[]s around, however.
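A minimal sketch of that loop (assuming list is initialized as an ArrayList<Byte>):

// Box each byte explicitly; Arrays.asList cannot do this for a primitive array.
for (byte b : data) {
    list.add(b);  // autoboxing: byte -> Byte
}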
Also note that it is not guaranteed, even if the input stream has more than enough data left, that you will read data.length bytes every time (a result of buffer sizes, concurrency, etc.), so you run the risk of adding a bunch of 0 bytes from the end of your buffer if you read fewer bytes than you asked for.
byte[] data should be Byte[]. You must use the wrapper object, not the primitive, for Arrays.asList to give you a list of elements.
I cannot help but think I'm missing something very simple. Can anyone shed light on why this is not working? (Or if there is some fundamentally better way to do what I'm trying to do?)
Storing byte data in an ArrayList<Byte> has a lot of memory overhead compared to a byte[] array. If you're working with a large amount of data, you may want to use a byte[] array for storage as well. Take a look at the source code for ByteArrayOutputStream; I don't know if it will work for you as-is, but you might be able to create a similar sort of class that manages an expanding byte array.
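In fact, ByteArrayOutputStream may already do the job here; a sketch, assuming chunks only need to be appended and read back in bulk (reusing the data/numBytesRead names from the question):

import java.io.ByteArrayOutputStream;

// Producer side: append each audio chunk to an expanding byte buffer.
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
buffer.write(data, 0, numBytesRead);  // append only the bytes actually read

// Consumer side: grab everything collected so far as a plain byte[].
byte[] all = buffer.toByteArray();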
I have to read a binary file in a legacy format with Java.
In a nutshell the file has a header consisting of several integers, bytes and fixed-length char arrays, followed by a list of records which also consist of integers and chars.
In any other language I would create structs (C/C++) or records (Pascal/Delphi) which are byte-by-byte representations of the header and the record. Then I'd read sizeof(header) bytes into a header variable and do the same for the records.
Something like this: (Delphi)
type
  THeader = record
    Version: Integer;
    Type: Byte;
    BeginOfData: Integer;
    ID: array[0..15] of Char;
  end;
...
procedure ReadData(S: TStream);
var
  Header: THeader;
begin
  S.ReadBuffer(Header, SizeOf(THeader));
  ...
end;
What is the best way to do something similar with Java? Do I have to read every single value on its own or is there any other way to do this kind of "block-read"?
To my knowledge, Java forces you to read a file as bytes rather than being able to block read. If you were serializing Java objects, it'd be a different story.
The other examples shown use the DataInputStream class with a File, but you can also use a shortcut: the RandomAccessFile class:
RandomAccessFile in = new RandomAccessFile("filename", "r");
int version = in.readInt();
byte type = in.readByte();
int beginOfData = in.readInt();
byte[] tempId = new byte[16];
in.readFully(tempId);  // read() may return fewer bytes; readFully fills the array
String id = new String(tempId);
Note that you could turn the response into a class, if that would make it easier.
If you would be using Preon, then all you would have to do is this:
public class Header {
    @BoundNumber int version;
    @BoundNumber byte type;
    @BoundNumber int beginOfData;
    @BoundString(size="15") String id;
}
Once you have this, you create a Codec using a single line:
Codec<Header> codec = Codecs.create(Header.class);
And you use the Codec like this:
Header header = Codecs.decode(codec, file);
You could use the DataInputStream class as follows:
DataInputStream in = new DataInputStream(new BufferedInputStream(
new FileInputStream("filename")));
int x = in.readInt();
double y = in.readDouble();
etc.
Once you get these values you can do with them as you please. Look up the java.io.DataInputStream class in the API for more info.
I may have misunderstood you, but it seems to me you're creating in-memory structures that you hope will be a byte-per-byte accurate representation of what you want to read from the hard disk, then copying the whole thing into memory and manipulating it from there?
If that's indeed the case, you're playing a very dangerous game. At least in C, the standard doesn't enforce things like padding or alignment of struct members, not to mention things like big/little endianness or parity bits... So even if your code happens to run, it's very non-portable and risky: you depend on the compiler's creator not changing their mind in future versions.
Better to create an automaton that both validates the structure being read (byte per byte) from the HD and fills an in-memory structure if it's indeed OK. You may lose some milliseconds (not as many as it may seem, since modern OSes do a lot of disk read caching), but you gain platform and compiler independence. Plus, your code will be easily ported to another language.
Post edit: In a way I sympathize with you. In the good ol' days of DOS/Win3.11, I once created a C program to read BMP files, and used exactly the same technique. Everything was nice until I tried to compile it for Windows: oops!! int was now 32 bits long, rather than 16! When I tried to compile it on Linux, I discovered gcc had very different rules for bit-field allocation than Microsoft C (6.0!). I had to resort to macro tricks to make it portable...
I used Javolution and javastruct; both handle the conversion between bytes and objects.
Javolution provides classes that represent C types. All you need to do is to write a class that describes the C structure. For example, from the C header file,
struct Date {
    unsigned short year;
    unsigned char month;
    unsigned char day;
};
should be translated into:
public static class Date extends Struct {
    public final Unsigned16 year = new Unsigned16();
    public final Unsigned8 month = new Unsigned8();
    public final Unsigned8 day = new Unsigned8();
}
Then call setByteBuffer to initialize the object:
Date date = new Date();
date.setByteBuffer(ByteBuffer.wrap(bytes), 0);
javastruct uses annotations to define the fields of a C structure.
@StructClass
public class Foo {
    @StructField(order = 0)
    public byte b;
    @StructField(order = 1)
    public int i;
}
To initialize an object:
Foo f2 = new Foo();
JavaStruct.unpack(f2, b);
I guess FileInputStream lets you read in bytes. So, open the file with FileInputStream and read in sizeof(header) bytes. I am assuming that the header has a fixed format and size. That isn't mentioned in the initial post, but I'm assuming it is the case, since it would get much more complex if the header had optional arguments and different sizes.
Once you have the bytes, there can be a header class to which you assign the contents of the buffer you've already read (a sketch follows below). Then parse the records in a similar fashion.
Here is a link about reading bytes using a ByteBuffer (Java NIO):
http://exampledepot.com/egs/java.nio/ReadChannel.html
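A minimal sketch of that idea, assuming the header layout from the question (int version, byte type, int beginOfData, 16-char id), big-endian data, and an ASCII id; all names are illustrative:

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

class LegacyHeader {
    int version;
    byte type;
    int beginOfData;
    String id;

    static LegacyHeader read(String fileName) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(fileName))) {
            LegacyHeader h = new LegacyHeader();
            h.version = in.readInt();      // DataInputStream always reads big-endian
            h.type = in.readByte();
            h.beginOfData = in.readInt();
            byte[] rawId = new byte[16];
            in.readFully(rawId);
            h.id = new String(rawId, "US-ASCII").trim();
            return h;
        }
    }
}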
As other people mention, DataInputStream and the Buffer classes are probably the low-level APIs you are after for dealing with binary data in Java.
However, you probably want something like Construct (the wiki page has good examples too: http://en.wikipedia.org/wiki/Construct_(python_library)), but for Java.
I don't know of any Java versions offhand, but taking that approach (declaratively specifying the struct in code) would probably be the right way to go. With a suitable fluent interface in Java it would probably be quite similar to a DSL.
EDIT: a bit of googling reveals this:
http://javolution.org/api/javolution/io/Struct.html
Which might be the kind of thing you are looking for. I have no idea whether it works or is any good, but it looks like a sensible place to start.
I would create an object that wraps around a ByteBuffer representation of the data and provide getters to read directly from the buffer. In this way, you avoid copying data from the buffer to primitive types. Furthermore, you could use a MappedByteBuffer to get the byte buffer. If your binary data is complex, you can model it using classes and give each class a sliced version of your buffer.
class SomeHeader {
    private final ByteBuffer buf;

    SomeHeader(ByteBuffer fileBuffer) {
        // you may need to set limits accordingly before slicing
        // fileBuffer.limit(...)
        this.buf = fileBuffer.slice();
        // you may need to skip the sliced region afterwards
        // fileBuffer.position(endPos)
    }

    public short getVersion() {
        return buf.getShort(POSITION_OF_VERSION_IN_BUFFER);
    }
}
Also useful are the methods for reading unsigned values from byte buffers.
HTH
I've written up a technique to do this sort of thing in Java, similar to the old C-like idiom of reading bit-fields: here. Note it is just a start, but it could be expanded upon.
In the past I used DataInputStream to read data of arbitrary types in a specified order. This will not let you easily account for big-endian/little-endian issues.
As of 1.4 the java.nio.Buffer family might be the way to go, but it seems that your code might actually end up more complicated. These classes do have support for handling endianness issues.
A while ago I found this article on using reflection and parsing to read binary data. In that case, the author was using reflection to read Java binary .class files. But if you are reading the data into a class, it may be of some help.