Recently I created a wrapper to read and write data into a byte array. To do it, I've been using an ArrayList<Byte>, but I was wondering if this is the most efficient way to do it, because:
addAll() doesn't work with byte arrays (even using Arrays.asList(), which gives me a List<byte[]>). To work around it I'm looping and adding one byte per iteration, but I suppose this means a lot of function calls and so has a performance cost.
The same happens for getting a byte[] from the ArrayList. I can't cast from Byte[] to byte[], so I have to use a loop for it.
I suppose storing Byte instead of byte uses more memory.
I know ByteArrayInputStream and ByteArrayOutputStream could be used for this, but they have some drawbacks:
I wanted to implement methods for reading different data types in different byte orders (for example, readInt, readLEInt, readUInt, etc.), while those classes can only read / write a byte or a byte array. This isn't really a problem because I could fix that in the wrapper. But here comes the second problem.
I wanted to be able to write and read at the same time because I'm using this to decompress some files. So to create a wrapper for it I would need to include both a ByteArrayInputStream and a ByteArrayOutputStream. I don't know if those could be synchronized in some way, or whether I'd have to copy the entire data from one to the other each time I wrote to the wrapper.
So here is my question: would using a ByteBuffer be more efficient? I know you can read integers, floats, etc. from it, and even change the byte order. What I was wondering is whether there is a real performance difference between using a ByteBuffer and an ArrayList<Byte>.
Definitely ByteBuffer or ByteArrayOutputStream. In your case ByteBuffer seems fine. Check the Javadoc, as it has useful methods. For putInt/getInt and the like, you may want to set the byte order (of those 4 bytes):
byteBuffer.order(ByteOrder.LITTLE_ENDIAN);
With files you could use getChannel() or variants and then use a MappedByteBuffer.
A ByteBuffer may wrap a byte array, or allocate.
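As a rough illustration of the points above, here is a minimal sketch of writing and reading an int in little-endian order, both with an allocated buffer and with a buffer wrapped around an existing array. The class and method names (ByteBufferSketch, writeLittleEndianInt, and so on) are made up for this example; only the ByteBuffer and ByteOrder calls come from the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: the class and method names are hypothetical.
final class ByteBufferSketch {

    // Encodes an int as 4 little-endian bytes using a freshly allocated buffer.
    static byte[] writeLittleEndianInt(int value) {
        ByteBuffer buffer = ByteBuffer.allocate(4);  // fixed capacity, backed by a byte[]
        buffer.order(ByteOrder.LITTLE_ENDIAN);       // the default order is BIG_ENDIAN
        buffer.putInt(value);
        return buffer.array();                       // the backing array, no copy
    }

    // Wraps an existing array (no copy) and reads a little-endian int at the given offset.
    static int readLittleEndianInt(byte[] data, int offset) {
        return ByteBuffer.wrap(data)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt(offset);                     // absolute get: the position is not moved
    }

    // Java has no unsigned int, so an unsigned 32-bit value is widened into a long.
    static long readLittleEndianUInt(byte[] data, int offset) {
        return readLittleEndianInt(data, offset) & 0xFFFFFFFFL;
    }
}
```

Because the same ByteBuffer can be both written to and read from, it may cover the "read and write at the same time" requirement from the question, as long as the fixed capacity is acceptable.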
Keep in mind that every object has overhead associated with it including a bit of memory per object and garbage collection once it goes out of scope.
Using List<Byte> would mean creating / garbage collecting an object per byte which is very wasteful.
ByteBuffer is a wrapper class around a byte array. It doesn't have a dynamic size like ArrayList, but it consumes less memory per byte and is faster.
If you know the size you need, use ByteBuffer. If you don't, you could use ByteArrayOutputStream (perhaps wrapped in an ObjectOutputStream, which has methods to write different kinds of data). To read the data you have written to a ByteArrayOutputStream, you can extend it and access the fields buf[] and count. Those fields are protected, so they are accessible from a subclass. It looks like this:
public class ByteArrayOutputStream extends OutputStream {
/**
* The buffer where data is stored.
*/
protected byte buf[];
/**
* The number of valid bytes in the buffer.
*/
protected int count;
...
}
public class ReadableBAOS extends ByteArrayOutputStream{
public byte readByte(int index) {
if (index < 0 || index >= count) {
throw new IndexOutOfBoundsException();
}
return buf[index];
}
}
This way you can add methods to your subclass that read bytes from the underlying buffer without copying its content each time, as the toByteArray() method does.
It seems that String.getBytes() will create a new byte array, so there is an extra memory copy. Can I encode a String directly to a ByteBuffer without an intermediate byte array?
for example:
void putString(ByteBuffer bb, String s) {
byte[] arr = s.getBytes(StandardCharsets.UTF_8);
bb.put(arr);
}
This piece of code creates a byte array, encodes the string into it, then copies the content of the byte array into the ByteBuffer.
I think the byte array is unnecessary: it adds GC pressure and an extra memory copy.
You can use CharsetEncoder to write directly to a ByteBuffer:
static void putString(ByteBuffer buffer, String str, Charset charset) {
CharsetEncoder encoder = charset.newEncoder();
encoder.encode(CharBuffer.wrap(str), buffer, true);
encoder.flush(buffer);
}
It's your responsibility to make sure enough space has been allocated. You can also check the result of the encode() method to see if it was successful.
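A minimal sketch of checking the result, assuming you want a boolean "did it fit" answer and the buffer left untouched on failure. The name tryPutString and that rollback policy are choices made for this example, not part of the answer above; the CharsetEncoder and CoderResult calls are from the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

// Illustrative only: the class and method names are hypothetical.
final class EncodeSketch {

    // Encodes str straight into buffer. Returns false (and restores the
    // buffer position) if the buffer did not have enough room.
    static boolean tryPutString(ByteBuffer buffer, String str, Charset charset) {
        int start = buffer.position();
        CharsetEncoder encoder = charset.newEncoder();
        CoderResult result = encoder.encode(CharBuffer.wrap(str), buffer, true);
        if (result.isUnderflow()) {
            // All input consumed; let the encoder write any trailing bytes.
            result = encoder.flush(buffer);
        }
        if (result.isOverflow()) {
            buffer.position(start);  // undo the partial write
            return false;
        }
        if (result.isError()) {
            // Malformed or unmappable input, e.g. an unpaired surrogate.
            throw new IllegalArgumentException("Cannot encode string: " + result);
        }
        return true;
    }
}
```

Note that a new encoder is still created per call here; if that matters, one encoder per thread could be kept and reset() between uses.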
I can't think of a simple way to completely eliminate intermediate byte arrays.
However if you're worrying about this because the String is huge, you can break it into chunks:
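for (int offset = 0; offset < str.length(); offset += chunkSize) {
// Clamp the end index so the last chunk does not run past the end of the string.
// Caveat: a chunk boundary that splits a surrogate pair will not encode correctly.
String chunk = str.substring(offset, Math.min(offset + chunkSize, str.length()));
byteBuffer.put(chunk.getBytes(StandardCharsets.UTF_8));
}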
However, if your input strings are huge enough that this optimisation is necessary, the overall architecture of your program is probably ill-conceived.
You should not worry about GC performance unless you've seen something unusual while profiling. The JRE is brilliant at efficient GC.
No, it is not possible. String objects don't have encoding.
String objects are immutable by design. The whole idea of the class is to not allow manipulation of any underlying data structures (mainly for security and performance optimization reasons).
In that sense, there is no better approach for acquiring the bytes that make up a String object in Java.
I want to write ONLY the values of the data members of an object into a file, so I can't use serialization here, since it writes a lot of other information that I don't need. Here is what I have implemented in two ways: one using ByteBuffer and the other without it.
Without using ByteBuffer:
1st method
public class DemoSecond {
byte characterData;
byte shortData;
byte[] integerData;
byte[] stringData;
public DemoSecond(byte characterData, byte shortData, byte[] integerData,
byte[] stringData) {
super();
this.characterData = characterData;
this.shortData = shortData;
this.integerData = integerData;
this.stringData = stringData;
}
public static void main(String[] args) {
DemoSecond dClass= new DemoSecond((byte)'c', (byte)0x7, new byte[]{3,4},
new byte[]{(byte)'p',(byte)'e',(byte)'n'});
File checking= new File("c:/objectByteArray.dat");
try {
if (!checking.exists()) {
checking.createNewFile();
}
// POINT A
FileOutputStream bo = new FileOutputStream(checking);
bo.write(dClass.characterData);
bo.write(dClass.shortData);
bo.write(dClass.integerData);
bo.write(dClass.stringData);
// POINT B
bo.close();
} catch (FileNotFoundException e) {
System.out.println("FNF");
e.printStackTrace();
} catch (IOException e) {
System.out.println("IOE");
e.printStackTrace();
}
}
}
One more thing: the size of the data members will always remain fixed, i.e. characterData = 1 byte, shortData = 1 byte, integerData = 2 bytes and stringData = 3 bytes. So the total size of this class is ALWAYS 7 bytes.
Using ByteBuffer:
2nd method
// POINT A
FileOutputStream bo = new FileOutputStream(checking);
ByteBuffer buff= ByteBuffer.allocate(7);
buff.put(dClass.characterData);
buff.put(dClass.shortData);
buff.put(dClass.integerData);
buff.put(dClass.stringData);
bo.write(buff.array());
// POINT B
I want to know which of the two methods is more optimized, and the reason why.
The above class DemoSecond is just a sample class.
My original classes will be 5 to 50 bytes in size. I don't think size will be the issue here.
But each of my classes is of fixed size, like DemoSecond.
Also, there are a great many of these that I am going to write into the binary file.
PS
If I use serialization, it also writes the names "characterData", "shortData", "integerData", "stringData" and other information which I don't want in the file. What I am concerned with here is THEIR VALUES ONLY. In this example that is: 'c', 7, 3, 4, 'p', 'e', 'n'. I want to write only these 7 bytes into the file, NOT the other information, which is USELESS to me.
As you are doing file I/O, you should bear in mind that the I/O operations are likely to be very much slower than any work done by the CPU in your output code. To a first approximation, the cost of I/O is an amount proportional to the amount of data you are writing, plus a fixed cost for each operating system call made to do the I/O.
So in your case you want to minimise the number of operating system calls made to do the writing. This is done by buffering data in the application, so the application performs fewer but larger operating system calls.
Using a byte buffer, as you have done, is one way of doing this, so your ByteBuffer code will be more efficient than your FileOutputStream code.
But there are other considerations. Your example is not performing many writes, so it is likely to be very fast anyway. Any optimisation is likely to be a premature optimisation. Optimisations tend to make code more complicated and harder to understand. To understand your ByteBuffer code a reader needs to understand how a ByteBuffer works in addition to everything they need to understand for the FileOutputStream code. And if you ever change the file format, you are more likely to introduce a bug with the ByteBuffer code (for example, by having too small a buffer).
Buffering of output is commonly done. So it should not surprise you that Java already provides code to help you. That code will have been written by experts, tested and debugged. Unless you have special requirements you should always use such code rather than writing your own. The code I am referring to is the BufferedOutputStream class.
To use it simply adapt your code that does not use the ByteBuffer, by changing the line of your code that opens the file to
OutputStream bo = new BufferedOutputStream(new FileOutputStream(checking));
The two methods differ only in the byte buffer allocated.
If you are concerned about unnecessary writes to the file, there is already BufferedOutputStream, which allocates its buffer internally. If you are writing to the same output stream multiple times, it is definitely more efficient than manually allocating a buffer every time.
It would be simplest to use a DataOutputStream around a BufferedOutputStream around the FileOutputStream.
NB You can't squeeze 'shortData' into a byte. Use the various primitives of DataOutputStream, and use the corresponding ones of DataInputStream when reading them back.
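A minimal sketch of that layering, assuming a hypothetical record layout of one byte, one short, one int and a fixed-length run of raw bytes (10 bytes in total for a 3-byte string, not the 7-byte layout from the question). The class and method names are invented for the example, and the methods take a generic stream so the same code works for a file or an in-memory buffer.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Illustrative only: names and the record layout are hypothetical.
final class RecordIoSketch {

    // Writes only the field values, big-endian, with no names or headers.
    static void writeRecord(OutputStream sink, byte characterData, short shortData,
                            int integerData, byte[] stringData) {
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(sink))) {
            out.writeByte(characterData);  // 1 byte
            out.writeShort(shortData);     // 2 bytes
            out.writeInt(integerData);     // 4 bytes
            out.write(stringData);         // raw bytes, length known to the reader
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the fields back in the same order and returns the int field.
    static int readIntegerField(InputStream source) {
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(source))) {
            in.readByte();
            in.readShort();
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

To write to the file from the question, pass new FileOutputStream(checking) as the sink. When writing many records, it would be better to keep one DataOutputStream open for all of them rather than opening one per record as this sketch does.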
Just wondering whether it is possible to update a byte array, created in Java, from C code without returning it from C.
I have a situation where I need to update a single byte array multiple times through JNI, and returning the byte array from C takes a lot of JNI calls. Does anybody know how to do this?
Code should be something like this
Java Code
byte[] storeData;
updateFromNative(storeData); //update the byteArray in native code;
//use the storeData in Java with updated value.
Updating data in the array is one thing, allocating is another. If you know the size, and it's not supposed to change, allocate the array beforehand, pass it into JNI, and use the JNI functions SetByteArrayRegion() (or GetByteArrayElements() followed by ReleaseByteArrayElements()) to set elements. Like this:
byte[] storeData = new byte[Size];
updateFromNative(storeData);
However, if you want to (re)allocate the array within JNI, you're stuck with returning it. There are no out parameters in Java. One way around it is passing an object that holds the array as a member variable and updating that member variable, but that complicates the JNI part somewhat.
For evaluating an algorithm I have to count how often the items of a byte-array are read/accessed. The byte-array is filled with the contents of a file and my algorithm can skip over many of the bytes in the array (like for example the Boyer–Moore string search algorithm). I have to find out how often an item is actually read. This byte-array is passed around to multiple methods and classes.
My ideas so far:
1) Increment a counter at each spot where the byte-array is read. This seems error-prone since there are many of these spots. Additionally I would have to remove this code afterwards such that it does not influence the runtime of my algorithm.
2) Use an ArrayList instead of a byte-array and override its "get" method. Again, there are a lot of methods that would have to be modified and I suspect that there would be a performance loss.
3) Can I somehow use the Eclipse debug-mode? I see that I can specify a hit-count for watchpoints but it does not seem to be possible to output the hit count?!
4) Can maybe the Reflection API help me somehow?
5) Somewhat like 2), but in order to reduce the effort: Can I make a Java method accept an ArrayList where it wants an array such that it transparently calls the "get" method whenever an item is read?
There might be an out-of-the-box solution but I'd probably just wrap the byte array in a simple class.
public class ByteArrayWrapper {
private byte [] bytes;
private long readCount = 0;
public ByteArrayWrapper( byte [] bytes ) {
this.bytes = bytes;
}
public int getSize() { return bytes.length; }
public byte getByte( int index ) { readCount++; return bytes[ index ]; }
public long getReadCount() { return readCount; }
}
Something along these lines. Of course this does influence the running time but not very much. You could try it and time the difference, if you find it is significant, we'll have to find another way.
The most efficient way to do this is to add some code injection. However, this is likely to be much more complicated to get right than writing a wrapper for your byte[] and passing it around (tedious, but at least the compiler will help you). If you use a wrapper which does basically nothing (no counting), it will be almost as efficient as not using a wrapper, and when you want counting you can use an implementation which does that.
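One way to sketch the "do-nothing wrapper versus counting wrapper" idea is a small interface with two implementations. The names ByteSource, PlainByteSource and CountingByteSource are invented for this example, and the per-index counter array is just one possible way to record accesses.

```java
// Illustrative only: all type names here are hypothetical.
interface ByteSource {
    int size();
    byte get(int index);
}

// Pass-through implementation: no bookkeeping, for normal timing runs.
final class PlainByteSource implements ByteSource {
    private final byte[] bytes;

    PlainByteSource(byte[] bytes) { this.bytes = bytes; }

    @Override public int size() { return bytes.length; }
    @Override public byte get(int index) { return bytes[index]; }
}

// Counting implementation: records the total and per-index number of reads.
final class CountingByteSource implements ByteSource {
    private final byte[] bytes;
    private final int[] readsPerIndex;
    private long totalReads;

    CountingByteSource(byte[] bytes) {
        this.bytes = bytes;
        this.readsPerIndex = new int[bytes.length];
    }

    @Override public int size() { return bytes.length; }

    @Override public byte get(int index) {
        byte value = bytes[index];  // throws first if index is out of range
        readsPerIndex[index]++;
        totalReads++;
        return value;
    }

    long totalReads() { return totalReads; }
    int readsAt(int index) { return readsPerIndex[index]; }
}
```

The algorithm under test would take a ByteSource parameter, so switching between measuring accesses and measuring runtime is a matter of which implementation gets passed in.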
You could use EHCache without too much overhead: implement an in-memory cache, keyed by array index. EHCache provides an API which will let you query hit rates "out of the box".
There's no way to do this automatically with a real byte[]. Using JVM TI might help here, but I suspect it's overkill.
Personally I'd write a simple wrapper around the byte[] with methods to read() and write() specific fields. Those methods can then track all accesses (either individually for each byte, or as a total or both).
Of course this would require the actual access to be modified, but if you're testing some algorithms that might not be such a big drawback. The same goes for performance: it will definitely suffer a bit, but the effect might be small enough not to worry about it.
Given a generic array T[], where T extends java.lang.Number, I would like to write the array to a byte[], using ByteArrayOutputStream. java.io.DataOutput (and an implementation such as java.io.DataOutputStream) appears close to what I need, but there is no generic way to write the elements of the T[] array. I want to do something like
ByteArrayOutputStream out = new ByteArrayOutputStream();
DataOutputStream dataOut = new DataOutputStream(out);
for (T v : getData()) {
dataOut.write(v); // <== uh, oh
}
but there is no generic <T> void write(T v) method on DataOutput.
Is there any way to avoid having to write a whole bunch of instanceof spaghetti?
Clarification
The byte[] is being sent to a non-Java client, so object serialization isn't an option. I need, for example, the byte[] generated from a Float[] to be a valid float[] in C.
No, there isn't. The instanceof "spaghetti" would have to exist somewhere anyway. Make a generic method that does that:
public <T> void write(DataOutputStream stream, T object) {
// instanceofs and writes here
}
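A minimal sketch of what such a method might look like, assuming only the standard boxed types need to be supported. The names NumberWriterSketch and toBytes are made up for the example; anything other than the six listed types is rejected rather than guessed at.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: the class and method names are hypothetical.
final class NumberWriterSketch {

    // Dispatches on the runtime type and writes the matching primitive.
    static void write(DataOutputStream stream, Number value) throws IOException {
        if (value instanceof Integer) {
            stream.writeInt(value.intValue());
        } else if (value instanceof Float) {
            stream.writeFloat(value.floatValue());
        } else if (value instanceof Double) {
            stream.writeDouble(value.doubleValue());
        } else if (value instanceof Long) {
            stream.writeLong(value.longValue());
        } else if (value instanceof Short) {
            stream.writeShort(value.shortValue());
        } else if (value instanceof Byte) {
            stream.writeByte(value.byteValue());
        } else {
            throw new IllegalArgumentException("Unsupported Number type: " + value.getClass());
        }
    }

    // Writes every element of the array and returns the raw bytes.
    static <T extends Number> byte[] toBytes(T[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream dataOut = new DataOutputStream(out)) {
            for (T v : data) {
                write(dataOut, v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

DataOutputStream always writes big-endian, so a C client on a little-endian machine would have to byte-swap each value. Alternatively, the Java side could build the bytes with a ByteBuffer set to ByteOrder.LITTLE_ENDIAN.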
You can just use an ObjectOutputStream instead of a DataOutputStream, since all Numbers are guaranteed to be serializable.
Regarding the last edit, I would try this approach (ugly or not).
1) Check via instanceof which type you have
2) Store it in a primitive and extract the bytes you need (e.g. for an integer), like this (for the two low-order bytes)
byte[] bytes = new byte[2];
bytes[0]=(byte)(i>>8);
bytes[1]=(byte)i;
3) Send it via the byte[] array
4) Get stuck, because different C implementations use different numbers of bytes for an integer, so nobody can guarantee that the results will equal your initial numbers. E.g. how do you want to handle Java's 4-byte integer with 2-byte integers in C? How do you handle Long?
So... I don't see a way to do it, but I'm not an expert in this area.
Please correct me if I'm wrong. ;-)