This is the first time I have stored binaries in RethinkDB, and storing them went quite well. The approach was simple (as documented in the ReQL command reference). Retrieving the binary from the database is also fairly easy, yet I am struggling to convert it into a byte[]. The documentation says that r.binary() should return a byte[], but in my case it returns a MapObject with a key called data. When I retrieve that, data is an object that cannot be cast to byte[].
My code
MapObject mo = (MapObject)r.binary(continents.get("visibleMapImageBinary")).build();
//^^ is the MapObject that I can retrieve
String b = (String)mo.get("data");
However, I do not know how to get this back into a byte[]. I also tried converting the String into a byte array, which failed as well.
Thanks for any advice :)
After some trying out (and googling what [B is, which I thought stood for nothing but is actually the JVM's internal type name for byte[]), this is my solution. There is no need to call build() or even cast to MapObject.
byte[] temp = (byte[])continents.get("visibleMapImageBinary");
I actually thought I tried this before, but it seems like I didn't ... well :)
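For anyone else puzzled by [B: it is the class name the JVM reports for byte[]. Below is a minimal sketch in plain Java, with no RethinkDB driver involved, that checks the runtime type before casting. The row map and the asBytes helper are made up for illustration and only stand in for whatever the driver hands back.

```java
import java.util.HashMap;
import java.util.Map;

class BinaryFieldDemo {
    // Hypothetical helper: returns the field as byte[] or fails with a readable message.
    static byte[] asBytes(Map<String, Object> row, String key) {
        Object value = row.get(key);
        if (value instanceof byte[]) {
            return (byte[]) value;
        }
        String type = (value == null) ? "null" : value.getClass().getName();
        throw new IllegalArgumentException(key + " is not a byte[] but " + type);
    }

    public static void main(String[] args) {
        // Stand-in for a row returned by the database driver (assumption).
        Map<String, Object> row = new HashMap<>();
        row.put("visibleMapImageBinary", new byte[] {1, 2, 3});

        byte[] data = asBytes(row, "visibleMapImageBinary");
        System.out.println(data.getClass().getName()); // prints "[B"
        System.out.println(data.length);               // prints 3
    }
}
```

Printing getClass().getName() on the value you get back is usually the quickest way to see what a cast will accept.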
I am trying to read serial using Java:
byte[] text = new byte[5];
for(int i = 0;i<5;i++){
text[i] = (byte)in.read();
}
For some reason it returns some weird data. I have tried libraries like RXTX and COMM, but then other issues come up. Is there some way to read everything in a raw Java way?
Any code snippets would be helpful.
Thanks
The problem is that byte is signed, so any value above 127 comes back negative and the most significant bit looks lost. The solution is to read everything into an int.
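A small sketch of what that looks like, assuming the serial port is exposed as an ordinary InputStream (a ByteArrayInputStream stands in for it here, and the class and method names are made up). InputStream.read() already returns 0..255 or -1, so the trick is simply not to cast it down to byte, or to mask with 0xFF if you already have a byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class UnsignedRead {
    // Converts a signed byte back to its unsigned value (0..255).
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Reads exactly count bytes and keeps each one as an int in the range 0..255.
    static int[] readUnsigned(InputStream in, int count) throws IOException {
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            int b = in.read(); // 0..255, or -1 at end of stream
            if (b < 0) {
                throw new IOException("stream ended after " + i + " bytes");
            }
            values[i] = b;
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the serial input stream (assumption).
        InputStream in = new ByteArrayInputStream(new byte[] {(byte) 0xF0, 0x10});
        int[] v = readUnsigned(in, 2);
        System.out.println(v[0]);                    // prints 240
        System.out.println((byte) v[0]);             // prints -16
        System.out.println(toUnsigned((byte) v[0])); // prints 240
    }
}
```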
I am developing a Java-based downloader for binary data. This data is transferred via a text-based protocol (UU-encoded). For the networking task the netty library is used. The binary data is split by the server into many thousands of small packets and sent to the client (i.e. the Java application).
From netty I receive a ChannelBuffer object every time a new message (data) is received. Now I need to process that data; among other tasks, I need to check the header of the packet coming from the server (like the HTTP status line). To do so I call ChannelBuffer.array() to get a byte[] array. I can then convert this array into a string via new String(byte[]) and easily check (e.g. compare) its content (again, like a comparison to the "200" status message in HTTP).
The software I am writing is using multiple threads/connections, so that I receive multiple packets from netty in parallel.
This usually works fine, however, while profiling the application I noticed that when the connection to the server is good and data comes in very fast, then this conversion to the String object seems to be a bottleneck. The CPU usage is close to 100% in such cases, and according to the profiler very much time is spent in calling this String(byte[]) constructor.
I searched for a better way to get from the ChannelBuffer to a String, and noticed the former also has a toString() method. However, that method is even slower than the String(byte[]) constructor.
So my question is: Does anyone of you know a better alternative to achieve what I am doing?
Perhaps you could skip the String conversion entirely? You could have constants holding byte arrays for your comparison values and check array-to-array instead of String-to-String.
Here's some quick code to illustrate. Currently you're doing something like this:
String http200 = "200";
// byte[] -> String conversion happens every time
String input = new String(ChannelBuffer.array());
return input.equals(http200);
Maybe this is faster:
// Ideally only convert String->byte[] once. Store these
// arrays somewhere and look them up instead of recalculating.
final byte[] http200 = "200".getBytes("UTF-8"); // Select the correct charset!
// Input doesn't have to be converted!
byte[] input = ChannelBuffer.array();
return Arrays.equals(input, http200);
1. Some of the checking you are doing might only need to look at part of the buffer. If so, you could use the alternate form of the String constructor:
new String(byteArray, startCol, length)
That might mean far fewer bytes get converted to a String.
Your example of looking for "200" within the message would be an example.
2. You might find that you can use the length of the byte array as a clue. If some messages are long and you are looking for a short one, ignore the long ones and don't convert to characters. Or something like that.
3. Along with what @EricGrunzke said, you could look at just part of the byte buffer to filter out some messages, and you may find that you don't need to convert them from bytes to characters at all.
4. If your bytes are ASCII characters, the conversion to characters might be quicker if you use charset "ASCII" instead of whatever the default is for your server:
new String(bytes, "ASCII")
might be faster in that case.
In fact, you might be able to pick and choose the charset for conversion byte-character in some organized fashion that speeds up things.
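To make the points above concrete, here is a rough sketch of checking a status prefix directly on the bytes and converting only a small slice when a String is really needed. The message layout ("200 OK" at the start) and the names StatusCheck and startsWith are assumptions for illustration, not part of netty.

```java
import java.nio.charset.StandardCharsets;

class StatusCheck {
    // Convert the expected value once, not once per message.
    static final byte[] STATUS_200 = "200".getBytes(StandardCharsets.US_ASCII);

    // True if data, starting at offset, begins with expected; no String is created.
    static boolean startsWith(byte[] data, int offset, byte[] expected) {
        if (offset < 0 || data.length - offset < expected.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (data[offset + i] != expected[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for the array obtained from the buffer (assumption).
        byte[] msg = "200 OK\r\npayload".getBytes(StandardCharsets.US_ASCII);
        System.out.println(startsWith(msg, 0, STATUS_200)); // prints true

        // If a String is needed after all, convert only the first few bytes.
        String head = new String(msg, 0, 3, StandardCharsets.US_ASCII);
        System.out.println(head); // prints 200
    }
}
```

Whether this actually beats the String path in your setup is something only the profiler can tell you.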
Depending on what you are trying to do there are a few options:
If you are just trying to get the response status, can't you just call getStatus()? This would probably be faster than getting the string out.
If you are trying to convert the buffer, then, assuming you know it will be ASCII (which it sounds like you do), just leave the data as a byte[] and change your UUDecode method to work on a byte[] instead of a String.
The biggest cost of the string conversion is most likely copying the data from the byte array into the internal char array of the String. Combined with the charset conversion, that is most likely just a bunch of work that you don't need to do.
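As a rough illustration of decoding on bytes, here is a sketch that decodes one uuencoded line without ever building a String. It assumes a well-formed line (a length character followed by complete groups of four characters) and does no validation; UuLine and decodeLine are made-up names, not the asker's UUDecode method.

```java
import java.nio.charset.StandardCharsets;

class UuLine {
    // Maps one uuencoded character to its 6-bit value ('`' and ' ' both mean 0).
    private static int sixBits(byte c) {
        return (c - 32) & 0x3F;
    }

    // Decodes a single uuencoded line given as raw ASCII bytes.
    static byte[] decodeLine(byte[] line) {
        int n = sixBits(line[0]); // first character encodes the decoded length
        byte[] out = new byte[n];
        int o = 0;
        for (int i = 1; o < n; i += 4) {
            // Four 6-bit values make up three output bytes.
            int bits = (sixBits(line[i]) << 18) | (sixBits(line[i + 1]) << 12)
                     | (sixBits(line[i + 2]) << 6) | sixBits(line[i + 3]);
            out[o++] = (byte) (bits >> 16);
            if (o < n) out[o++] = (byte) (bits >> 8);
            if (o < n) out[o++] = (byte) bits;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] line = "#0V%T".getBytes(StandardCharsets.US_ASCII);
        byte[] decoded = decodeLine(line);
        // The String here is only for printing the result.
        System.out.println(new String(decoded, StandardCharsets.US_ASCII)); // prints Cat
    }
}
```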
I am currently trying to perform some regex on the result of a DatagramPacket.getData() call.
Implemented as String myString = new String(thepkt.getData()):
But weirdly, Java is dropping the closing quotation mark that it uses to enclose the data (see the linked image below).
When I click the field in the variable inspector during a debug session and don't change anything, when I click off the variable field it corrects itself again without me changing anything. It even highlights the variable inspection field in yellow to signal change.
Its values are also displaying like it is still a byte array rather than a String object
http://i.imgur.com/8ZItsZI.png
It's throwing off my regex and I can't see anything that would cause it. It's a client server simulation and on the client side, the getData returns the data no problem.
I got it working by using the solution provided in:
https://stackoverflow.com/a/8557165/1700855
But I still don't understand how not specifying the length of the packet to the String constructor would cause it to drop the systematic end double quotes. Can anyone provide an explanation as I really like to understand solutions to my issues before moving on :)
The problem is that you didn't read the spec for DatagramPacket.getData:
Returns the data buffer. The data received or the data to be sent
starts from the offset in the buffer, and runs for length long.
So, to be correct, you should use
new String(thepkt.getData(), thepkt.getOffset(), thepkt.getLength())
Or, to not use the default charset:
new String(thepkt.getData(), thepkt.getOffset(), thepkt.getLength(), someCharset)
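A small sketch of the difference, without any networking: the packet is built by hand and setLength() plays the role of what receive() does. The 16-byte buffer, the "hello" payload and the UTF-8 charset are assumptions for the demo. Converting the whole buffer leaves trailing NUL characters in the String, which is probably what confuses the debugger's display and the regex.

```java
import java.net.DatagramPacket;
import java.nio.charset.StandardCharsets;

class PacketText {
    // Converts only the bytes that belong to the packet, not the whole buffer.
    static String text(DatagramPacket pkt) {
        return new String(pkt.getData(), pkt.getOffset(), pkt.getLength(),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[16]; // receive buffer, larger than the payload
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(payload, 0, buf, 0, payload.length);

        DatagramPacket pkt = new DatagramPacket(buf, buf.length);
        pkt.setLength(payload.length); // receive() would set this to the received size

        String whole = new String(pkt.getData(), StandardCharsets.UTF_8);
        System.out.println(whole.length());     // prints 16 (5 letters + 11 NUL chars)
        System.out.println(text(pkt).length()); // prints 5
    }
}
```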
I have to read a binary file in a legacy format with Java.
In a nutshell the file has a header consisting of several integers, bytes and fixed-length char arrays, followed by a list of records which also consist of integers and chars.
In any other language I would create structs (C/C++) or records (Pascal/Delphi) which are byte-by-byte representations of the header and the record. Then I'd read sizeof(header) bytes into a header variable and do the same for the records.
Something like this: (Delphi)
type
THeader = record
Version: Integer;
Type: Byte;
BeginOfData: Integer;
ID: array[0..15] of Char;
end;
...
procedure ReadData(S: TStream);
var
Header: THeader;
begin
S.ReadBuffer(Header, SizeOf(THeader));
...
end;
What is the best way to do something similar with Java? Do I have to read every single value on its own or is there any other way to do this kind of "block-read"?
To my knowledge, Java forces you to read a file as bytes rather than being able to block read. If you were serializing Java objects, it'd be a different story.
The other examples shown use the DataInputStream class with a File, but you can also use a shortcut: The RandomAccessFile class:
RandomAccessFile in = new RandomAccessFile("filename", "r");
int version = in.readInt();
byte type = in.readByte();
int beginOfData = in.readInt();
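byte[] tempId = new byte[16];   // the ID field is 16 bytes long
in.readFully(tempId);           // plain read() may return fewer bytes than requested
String id = new String(tempId);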
Note that you could turn the response values into a class, if that would make it easier.
If you were using Preon, then all you would have to do is this:
public class Header {
@BoundNumber int version;
@BoundNumber byte type;
@BoundNumber int beginOfData;
@BoundString(size="15") String id;
}
Once you have this, you create a Codec using a single line:
Codec<Header> codec = Codecs.create(Header.class);
And you use the Codec like this:
Header header = Codecs.decode(codec, file);
You could use the DataInputStream class as follows:
DataInputStream in = new DataInputStream(new BufferedInputStream(
new FileInputStream("filename")));
int x = in.readInt();
double y = in.readDouble();
etc.
Once you get these values you can do with them as you please. Look up the java.io.DataInputStream class in the API for more info.
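Applied to the header from the question, that could look roughly like the sketch below. It assumes the fields are stored back to back with no padding, that integers are big-endian (which is what DataInputStream always reads; a file written by Delphi on x86 is more likely little-endian, in which case each int would need Integer.reverseBytes), and that the ID is ASCII padded with zero bytes. The class names and the sample bytes are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class HeaderReader {
    static final class Header {
        int version;
        byte type;
        int beginOfData;
        String id;
    }

    // Reads the fields one by one, in the order they appear in the file.
    static Header read(DataInputStream in) throws IOException {
        Header h = new Header();
        h.version = in.readInt();
        h.type = in.readByte();
        h.beginOfData = in.readInt();
        byte[] id = new byte[16];
        in.readFully(id); // blocks until all 16 bytes are read
        h.id = new String(id, StandardCharsets.US_ASCII).trim(); // drops zero padding
        return h;
    }

    // Convenience wrapper for data that is already in memory.
    static Header parse(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hand-built sample header: 4 + 1 + 4 + 16 = 25 bytes, big-endian.
        byte[] sample = ByteBuffer.allocate(25)
                .putInt(1).put((byte) 2).putInt(25)
                .put("LEGACY".getBytes(StandardCharsets.US_ASCII))
                .array();
        Header h = parse(sample);
        System.out.println(h.version + " " + h.type + " " + h.beginOfData + " " + h.id);
        // prints "1 2 25 LEGACY"
    }
}
```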
I may have misunderstood you, but it seems to me you're creating in-memory structures you hope will be a byte-per-byte accurate representation of what you want to read from hard-disk, then copy the whole stuff onto memory and manipulate thence?
If that's indeed the case, you're playing a very dangerous game. At least in C, the standard doesn't enforce things like padding or alignment of the members of a struct. Not to mention things like big/little endianness or parity bits... So even if your code happens to run, it's very non-portable and risky: you depend on the compiler's creator not changing their mind in future versions.
Better to create an automaton that both validates the structure being read (byte by byte) from the HD and fills an in-memory structure if it's indeed OK. You may lose some milliseconds (not as many as it may seem, because modern OSes do a lot of disk read caching), but you gain platform and compiler independence. Plus, your code will be easy to port to another language.
Post Edit: In a way I sympathize with you. In the good-ol' days of DOS/Win3.11, I once created a C program to read BMP files. And used exactly the same technique. Everything was nice until I tried to compile it for Windows - oops!! Int was now 32 bits long, rather than 16! When I tried to compile on Linux, discovered gcc had very different rules for bit fields allocation than Microsoft C (6.0!). I had to resort to macro tricks to make it portable...
I have used Javolution and javastruct, both of which handle the conversion between bytes and objects.
Javolution provides classes that represent C types. All you need to do is to write a class that describes the C structure. For example, from the C header file,
struct Date {
unsigned short year;
unsigned char month;
unsigned char day;
};
should be translated into:
public static class Date extends Struct {
public final Unsigned16 year = new Unsigned16();
public final Unsigned8 month = new Unsigned8();
public final Unsigned8 day = new Unsigned8();
}
Then call setByteBuffer to initialize the object:
Date date = new Date();
date.setByteBuffer(ByteBuffer.wrap(bytes), 0);
javastruct uses annotations to define fields in a C structure.
@StructClass
public class Foo{
@StructField(order = 0)
public byte b;
@StructField(order = 1)
public int i;
}
To initialize an object:
Foo f2 = new Foo();
JavaStruct.unpack(f2, b);
I guess FileInputStream lets you read in bytes. So, open the file with FileInputStream and read in sizeof(header) bytes. I am assuming that the header has a fixed format and size. I don't see that mentioned in the initial post, but I am assuming that is the case, as it would get much more complex if the header had optional args and different sizes.
Once you have the info, there can be a header class in which you assign the contents of the buffer that you've already read. And then parse the records in a similar fashion.
Here is a link on reading bytes using a ByteBuffer (Java NIO):
http://exampledepot.com/egs/java.nio/ReadChannel.html
As other people mention, DataInputStream and Buffers are probably the low-level APIs you are after for dealing with binary data in Java.
However, you probably want something like Construct (the wiki page has good examples too: http://en.wikipedia.org/wiki/Construct_(python_library)), but for Java.
I don't know of any (Java versions) off hand, but taking that approach (declaratively specifying the struct in code) would probably be the right way to go. With a suitable fluent interface in Java it would probably be quite similar to a DSL.
EDIT: bit of googling reveals this:
http://javolution.org/api/javolution/io/Struct.html
Which might be the kind of thing you are looking for. I have no idea whether it works or is any good, but it looks like a sensible place to start.
I would create an object that wraps around a ByteBuffer representation of the data and provide getters to read directly from the buffer. In this way, you avoid copying data from the buffer to primitive types. Furthermore, you could use a MappedByteBuffer to get the byte buffer. If your binary data is complex, you can model it using classes and give each class a sliced version of your buffer.
class SomeHeader {
private final ByteBuffer buf;
SomeHeader( ByteBuffer fileBuffer){
// you may need to set limits accordingly before
// fileBuffer.limit(...)
this.buf = fileBuffer.slice();
// you may need to skip the sliced region
// fileBuffer.position(endPos)
}
public short getVersion(){
return buf.getShort(POSITION_OF_VERSION_IN_BUFFER);
}
}
Also useful are the methods for reading unsigned values from byte buffers.
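ByteBuffer itself only hands out signed values, so the usual approach is to mask into the next wider type. A minimal sketch follows; the helper names are made up, and the buffer uses the default big-endian order.

```java
import java.nio.ByteBuffer;

class UnsignedGetters {
    // Each helper widens the signed value and masks off the sign extension.
    static int getUnsignedByte(ByteBuffer buf, int index) {
        return buf.get(index) & 0xFF;
    }

    static int getUnsignedShort(ByteBuffer buf, int index) {
        return buf.getShort(index) & 0xFFFF;
    }

    static long getUnsignedInt(ByteBuffer buf, int index) {
        return buf.getInt(index) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(
                new byte[] {(byte) 0xFF, (byte) 0xFE, (byte) 0xFD, (byte) 0xFC});
        System.out.println(buf.get(0));               // prints -1
        System.out.println(getUnsignedByte(buf, 0));  // prints 255
        System.out.println(getUnsignedShort(buf, 0)); // prints 65534
        System.out.println(getUnsignedInt(buf, 0));   // prints 4294901244
    }
}
```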
HTH
I've written up a technique to do this sort of thing in java - similar to the old C-like idiom of reading bit-fields. Note it is just a start but could be expanded upon.
here
In the past I used DataInputStream to read data of arbitrary types in a specified order. This will not allow you to easily account for big-endian/little-endian issues.
As of 1.4 the java.nio.Buffer family might be the way to go, but it seems that your code might actually become more complicated. These classes do have support for handling endian issues.
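For what it's worth, the endian handling is a single call. Here is a sketch that reads the header from the question out of a ByteBuffer in little-endian order. The little-endian byte order, the packed layout (no padding between fields) and the zero-padded ASCII ID are all assumptions about the legacy format, and the class name is made up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class LittleEndianHeader {
    final int version;
    final byte type;
    final int beginOfData;
    final String id;

    // Reads 25 bytes starting at the buffer's current position.
    LittleEndianHeader(ByteBuffer source) {
        // duplicate() leaves the caller's position and byte order untouched.
        ByteBuffer buf = source.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        version = buf.getInt();
        type = buf.get();
        beginOfData = buf.getInt();
        byte[] rawId = new byte[16];
        buf.get(rawId);
        id = new String(rawId, StandardCharsets.US_ASCII).trim(); // drops zero padding
    }

    public static void main(String[] args) {
        // Hand-built sample: 4 + 1 + 4 + 16 = 25 bytes, little-endian.
        ByteBuffer sample = ByteBuffer.allocate(25).order(ByteOrder.LITTLE_ENDIAN);
        sample.putInt(1).put((byte) 2).putInt(25)
              .put("LEGACY".getBytes(StandardCharsets.US_ASCII));
        sample.flip(); // rewind so the header can be read from the start

        LittleEndianHeader h = new LittleEndianHeader(sample);
        System.out.println(h.version + " " + h.type + " " + h.beginOfData + " " + h.id);
        // prints "1 2 25 LEGACY"
    }
}
```

For a real file, the same constructor could be fed a buffer filled from a FileChannel instead of the hand-built sample.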
A while ago I found this article on using reflection and parsing to read binary data. In this case, the author is using reflection to read the java binary .class files. But if you are reading the data into a class file, it may be of some help.