Hit-Count (reads) of an array in Java - java

For evaluating an algorithm I have to count how often the items of a byte-array are read/accessed. The byte-array is filled with the contents of a file and my algorithm can skip over many of the bytes in the array (like for example the Boyer–Moore string search algorithm). I have to find out how often an item is actually read. This byte-array is passed around to multiple methods and classes.
My ideas so far:
1) Increment a counter at each spot where the byte-array is read. This seems error-prone since there are many of these spots. Additionally, I would have to remove this code afterwards so that it does not influence the runtime of my algorithm.
2) Use an ArrayList instead of a byte-array and override its "get" method. Again, there are a lot of methods that would have to be modified, and I suspect that there would be a performance loss.
3) Can I somehow use the Eclipse debug mode? I see that I can specify a hit count for watchpoints, but it does not seem to be possible to output the hit count.
4) Can the Reflection API help me somehow?
5) Somewhat like 2), but in order to reduce the effort: can I make a Java method accept an ArrayList where it wants an array, such that it transparently calls the "get" method whenever an item is read?

There might be an out-of-the-box solution but I'd probably just wrap the byte array in a simple class.
public class ByteArrayWrapper {
    private byte[] bytes;
    private long readCount = 0;
    public ByteArrayWrapper(byte[] bytes) {
        this.bytes = bytes;
    }
    public int getSize() { return bytes.length; }
    public byte getByte(int index) { readCount++; return bytes[index]; }
    public long getReadCount() { return readCount; }
}
Something along these lines. Of course this does influence the running time, but not by much. You could try it and time the difference; if you find it is significant, we'll have to find another way.

The most efficient way to do this is to add some code injection. However, this is likely to be much more complicated to get right than writing a wrapper for your byte[] and passing that around (tedious, but at least the compiler will help you). If you use a wrapper which does basically nothing (no counting), it will be almost as efficient as not using a wrapper, and when you want counting you can use an implementation which does that.
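One way to read the "wrapper that does nothing versus wrapper that counts" idea is as two implementations of one small interface. The sketch below is only an illustration: ByteSource, PlainBytes, CountingBytes and countZerosAtEvenIndexes are hypothetical names, and the method stands in for the real algorithm.

```java
public class SwappableWrapperDemo {
    /** Hypothetical read-only view of the data; the algorithm depends only on this. */
    interface ByteSource {
        byte get(int index);
        int length();
    }

    /** Pass-through implementation for timing runs: no bookkeeping at all. */
    static final class PlainBytes implements ByteSource {
        private final byte[] bytes;
        PlainBytes(byte[] bytes) { this.bytes = bytes; }
        @Override public byte get(int index) { return bytes[index]; }
        @Override public int length() { return bytes.length; }
    }

    /** Counting implementation for evaluation runs. */
    static final class CountingBytes implements ByteSource {
        private final byte[] bytes;
        private long reads;
        CountingBytes(byte[] bytes) { this.bytes = bytes; }
        @Override public byte get(int index) { reads++; return bytes[index]; }
        @Override public int length() { return bytes.length; }
        long reads() { return reads; }
    }

    /** Stand-in for the real algorithm: inspects only every second byte. */
    static int countZerosAtEvenIndexes(ByteSource src) {
        int zeros = 0;
        for (int i = 0; i < src.length(); i += 2) {
            if (src.get(i) == 0) zeros++;
        }
        return zeros;
    }

    public static void main(String[] args) {
        byte[] data = {0, 1, 0, 1, 0};
        CountingBytes counted = new CountingBytes(data);
        int zeros = countZerosAtEvenIndexes(counted);
        System.out.println(zeros + " zeros found using " + counted.reads() + " reads");
    }
}
```

Because the algorithm only sees the interface, switching between the two implementations is a one-line change at the call site.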

You could use EHCache without too much overhead: implement an in-memory cache, keyed by array index. EHCache provides an API which will let you query hit rates "out of the box".

There's no way to do this automatically with a real byte[]. Using JVM TI might help here, but I suspect it's overkill.
Personally I'd write a simple wrapper around the byte[] with methods to read() and write() specific fields. Those methods can then track all accesses (either individually for each byte, or as a total or both).
Of course this would require the actual access to be modified, but if you're testing some algorithms that might not be such a big drawback. The same goes for performance: it will definitely suffer a bit, but the effect might be small enough not to worry about it.
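A minimal sketch of such a wrapper, tracking both a total and a per-index count, could look like the following. The class name PerIndexCountingBytes and the skippedCount() helper are made up for this illustration; only the read()/write() idea comes from the answer above.

```java
public class PerIndexCountingBytes {
    private final byte[] bytes;
    private final int[] readsPerIndex; // one counter per byte position
    private long totalReads;

    public PerIndexCountingBytes(byte[] bytes) {
        this.bytes = bytes;
        this.readsPerIndex = new int[bytes.length];
    }

    public byte read(int index) {
        byte value = bytes[index]; // a bad index throws here, before any counter changes
        readsPerIndex[index]++;
        totalReads++;
        return value;
    }

    public void write(int index, byte value) { bytes[index] = value; }
    public int length() { return bytes.length; }
    public long totalReads() { return totalReads; }
    public int readsAt(int index) { return readsPerIndex[index]; }

    /** Number of positions that were never read, i.e. skipped by the algorithm. */
    public int skippedCount() {
        int skipped = 0;
        for (int n : readsPerIndex) {
            if (n == 0) skipped++;
        }
        return skipped;
    }

    public static void main(String[] args) {
        PerIndexCountingBytes data = new PerIndexCountingBytes(new byte[] {10, 20, 30, 40});
        data.read(3);
        data.read(3);
        data.read(1);
        System.out.println("total=" + data.totalReads() + " skipped=" + data.skippedCount());
    }
}
```

The per-index array costs four extra bytes per data byte, so for very large files a total counter alone may be the more practical choice.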

Related

Android - deletion of resources in Java [duplicate]

String secret="foo";
WhatILookFor.securelyWipe(secret);
And I need to know that it will not be removed by java optimizer.
A String cannot be "wiped". It is immutable, and short of some really dirty and dangerous tricks you cannot alter that.
So the safest solution is to not put the data into a string in the first place. Use a StringBuilder or an array of characters instead, or some other representation that is not immutable. (And then clear it when you are done.)
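As a rough sketch of the "array of characters, cleared when done" approach: the use() method below is a hypothetical stand-in for whatever consumes the secret, and the overwrite only covers this one array, not any copies the JVM may have made elsewhere.

```java
import java.util.Arrays;

public class CharArraySecretDemo {
    /** Hypothetical consumer of the secret; a real one might authenticate with it. */
    static int use(char[] secret) {
        return secret.length;
    }

    /** Uses the secret, then overwrites the array even if use() throws. */
    static int useAndWipe(char[] secret) {
        try {
            return use(secret);
        } finally {
            Arrays.fill(secret, '\0');
        }
    }

    public static void main(String[] args) {
        char[] secret = {'f', 'o', 'o'};
        useAndWipe(secret);
        // After the call the array holds only zero characters.
        System.out.println(Arrays.equals(secret, new char[3]));
    }
}
```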
For the record, there are a couple of ways that you can change the contents of a String's backing array. For example, you can use reflection to fish out a reference to the String's backing array, and overwrite its contents. However, this involves doing things that the JLS states have unspecified behaviour so you cannot guarantee that the optimizer won't do something unexpected.
My personal take on this is that you are better off locking down your application platform so that unauthorized people can't gain access to the memory / memory dump in the first place. After all, if the platform is not properly secured, the "bad guys" may be able to get hold of the string contents before you erase it. Steps like this might be warranted for small amounts of security critical state, but if you've got a lot of "confidential" information to process, it is going to be a major hassle to not be able to use normal strings and string handling.
You would need direct access to the memory.
You really wouldn't be able to do this with String, since you don't have reliable access to the string, and don't know if it's been interned somewhere, or if an object was created that you don't know about.
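If you really needed to do this, you'd have to do something like
public class SecureString implements CharSequence {
    private final char[] data;
    public SecureString(char[] data) { this.data = data; }
    public int length() { return data.length; }
    public char charAt(int index) { return data[index]; }
    public CharSequence subSequence(int start, int end) {
        return new SecureString(java.util.Arrays.copyOfRange(data, start, end));
    }
    public void wipe() {
        for (int i = 0; i < data.length; i++) data[i] = '.'; // overwrite with a filler char
    }
}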
That being said, if you're worried about data still being in memory, you have to realize that if it was ever in memory at one point, then an attacker probably already got it. The only thing you realistically protect yourself from is a core dump being flushed to a log file.
Regarding the optimizer, I highly doubt it will optimize away the operation. If you really needed a guarantee, you could do something like this:
public int wipe() {
    // wipe the array to a random value (rand is assumed to be a java.util.Random field)
    java.util.Arrays.fill(data, (char) rand.nextInt(60000));
    // compute hash to force optimizer to do the wipe
    int hash = 0;
    for (int i = 0; i < data.length; i++) {
        hash = hash * 31 + (int) data[i];
    }
    return hash;
}
This will force the compiler to do the wipe. It makes it roughly twice as long to run, but it's a pretty fast operation as it is, and doesn't increase the order of complexity.
Store the data off-heap using the "Unsafe" methods. You can then zero over it when done and be certain that it won't be pushed around the heap by the JVM.
Here is a good post on Unsafe:
http://highlyscalable.wordpress.com/2012/02/02/direct-memory-access-in-java/
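For illustration only, here is a sketch that swaps Unsafe for a direct ByteBuffer, which is a different but publicly supported way to keep bytes outside the Java heap. The class and method names are hypothetical, and the byte[] handed to store() still lives on the heap and needs clearing separately, as the example does.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class OffHeapSecretDemo {
    /** Copies the secret into a direct (off-heap) buffer. */
    static ByteBuffer store(byte[] secret) {
        ByteBuffer buf = ByteBuffer.allocateDirect(secret.length);
        buf.put(secret);
        buf.flip(); // make the stored bytes readable from position 0
        return buf;
    }

    /** Overwrites every byte of the buffer with zero using absolute puts. */
    static void wipe(ByteBuffer buf) {
        for (int i = 0; i < buf.capacity(); i++) {
            buf.put(i, (byte) 0);
        }
    }

    public static void main(String[] args) {
        byte[] secret = {102, 111, 111}; // the bytes of "foo", built without a String
        ByteBuffer offHeap = store(secret);
        Arrays.fill(secret, (byte) 0);   // clear the on-heap copy right away
        System.out.println(offHeap.get(0)); // still readable from the off-heap copy
        wipe(offHeap);
        System.out.println(offHeap.get(0)); // zero after the wipe
    }
}
```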
If you're going to use a String then I think you are worried about it appearing in a memory dump. I suggest using String.replace() on key-characters so that when the String is used at run-time it will change and then go out of scope after it is used and won't appear correctly in a memory dump. However, I strongly recommend that you not use a String for sensitive data.

Java - ByteBuffer or ArrayList<Byte>?

Recently I created a wrapper to read and write data into a byte array. To do it, I've been using an ArrayList<Byte>, but I was wondering if this is the most efficient way to do it, because:
addAll() doesn't work with byte arrays (even using Arrays.asList(), which returns a List<byte[]>). To fix it I'm just looping and adding one byte per iteration, but I suppose this involves a lot of function calls and therefore has a performance cost.
The same happens for getting a byte[] from the ArrayList. I can't cast from Byte[] to byte[], so I have to use a loop for it.
I suppose storing Byte instead of byte uses more memory.
I know ByteArrayInputStream and ByteArrayOutputStream could be used for this, but they have some drawbacks:
I wanted to implement methods for reading different data types in different byte order (for example, readInt, readLEInt, readUInt, etc), while those classes only can read / write a byte or a byte array. This isn't really a problem because I could fix that in the wrapper. But here comes the second problem.
I wanted to be able to write and read at the same time because I'm using this to decompress some files. To create a wrapper for that I would need to include both a ByteArrayInputStream and a ByteArrayOutputStream. I don't know if those could be synchronized in some way or if I'd have to write the entire data of one to the other each time I wrote to the wrapper.
And so, here comes my question: would using a ByteBuffer be more efficient? I know you can read integers, floats, etc. from it, and can even change the byte order. What I was wondering is whether there is a real performance difference between using a ByteBuffer and an ArrayList<Byte>.
Definitely ByteBuffer or ByteArrayOutputStream. In your case ByteBuffer seems fine. Check the Javadoc, as it has nice methods. For putInt/getInt and such, you might want to set the byte order (of those 4 bytes):
byteBuffer.order(ByteOrder.LITTLE_ENDIAN);
With files you could use getChannel() or variants and then use a MappedByteBuffer.
A ByteBuffer may wrap an existing byte array or allocate its own storage.
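To make the byte-order point concrete, here is a small sketch using only java.nio calls (allocate, wrap, order, putInt, absolute getInt). The class and helper names are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteBufferOrderDemo {
    /** Encodes an int as 4 bytes, least significant byte first. */
    static byte[] encodeLittleEndian(int value) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(value);
        return buf.array();
    }

    /** Wraps an existing array (no copy) and reads a big-endian int at the given offset. */
    static int decodeBigEndian(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getInt(offset);
    }

    /** Reads a little-endian unsigned 32-bit value into a long. */
    static long decodeUnsignedLittleEndian(byte[] bytes, int offset) {
        int raw = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
        return raw & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] le = encodeLittleEndian(0x01020304);
        System.out.println(Integer.toHexString(decodeBigEndian(le, 0)));
        System.out.println(decodeUnsignedLittleEndian(encodeLittleEndian(-1), 0));
    }
}
```

Absolute reads such as getInt(offset) do not move the buffer's position, so the same buffer can be written sequentially and read at arbitrary offsets, which is what the question needs for decompression.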
Keep in mind that every object has overhead associated with it including a bit of memory per object and garbage collection once it goes out of scope.
Using List<Byte> means storing an object reference for every byte (typically 4 or 8 bytes each) and boxing and unboxing on every access, which is very wasteful.
ByteBuffer is a wrapper class around a byte array. It doesn't have a dynamic size like ArrayList, but it consumes less memory per byte and is faster.
If you know the size you need, use ByteBuffer. If you don't, you could use ByteArrayOutputStream (perhaps wrapped in an ObjectOutputStream, which has methods to write different kinds of data). To read back the data you have written to a ByteArrayOutputStream, you can extend ByteArrayOutputStream and access the fields buf[] and count. Those fields are protected, so a subclass can use them. It looks like this:
public class ByteArrayOutputStream extends OutputStream {
    /**
     * The buffer where data is stored.
     */
    protected byte buf[];
    /**
     * The number of valid bytes in the buffer.
     */
    protected int count;
    ...
}
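import java.io.ByteArrayOutputStream;
public class ReadableBAOS extends ByteArrayOutputStream {
    // Reads straight from the protected buffer; only indices below count hold valid data.
    public byte readByte(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException();
        }
        return buf[index];
    }
}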
This way you can add methods to your subclass that read bytes from the underlying buffer without copying its contents each time, as the toByteArray() method does.
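Building on the subclass idea, a sketch of multi-byte helpers might look like this. RandomAccessBAOS, writeIntLE and readIntLE are hypothetical names chosen to mirror the readLEInt-style methods the question asks about.

```java
import java.io.ByteArrayOutputStream;

public class RandomAccessBAOS extends ByteArrayOutputStream {
    /** Appends an int as 4 bytes, least significant byte first. */
    public void writeIntLE(int value) {
        write(value & 0xFF);
        write((value >>> 8) & 0xFF);
        write((value >>> 16) & 0xFF);
        write((value >>> 24) & 0xFF);
    }

    /** Reads a little-endian int at index directly from buf, without copying. */
    public synchronized int readIntLE(int index) {
        if (index < 0 || index > count - 4) {
            throw new IndexOutOfBoundsException("index " + index + ", valid bytes " + count);
        }
        return (buf[index] & 0xFF)
                | ((buf[index + 1] & 0xFF) << 8)
                | ((buf[index + 2] & 0xFF) << 16)
                | ((buf[index + 3] & 0xFF) << 24);
    }

    public static void main(String[] args) {
        RandomAccessBAOS out = new RandomAccessBAOS();
        out.writeIntLE(0x01020304);
        out.write(0x7F);
        System.out.println(Integer.toHexString(out.readIntLE(0)));
    }
}
```

The read method is synchronized because the write methods of ByteArrayOutputStream are, and buf may be replaced when the stream grows.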

Optimizing Java Array Copy

So for my research group I am attempting to convert some old C++ code to Java and am running into an issue where in the C++ code it does the following:
method(array+i, other parameters)
Now I know that Java does not support pointer arithmetic, so I got around this by copying the subarray from array+i to the end of array into a new array, but this causes the code to run horribly slowly (i.e. 100x slower than the C++ version). Is there a way to get around this? I saw someone mention a built-in method on here, but is that any faster?
Not only does your code become slower, it also changes the semantics of what is happening: when you make a call in C++, no array copying is done, so any change the method may apply to the array is happening in the original, not in the throw-away copy.
To achieve the same effect in Java change the signature of your function as follows:
void method(array, offset, other parameters)
Now the caller has to pass the position in the array that the method should consider the "virtual zero" of the array. In other words, instead of writing something like
for (int i = 0 ; i != N ; i++)
    ...
you would have to write
for (int i = offset ; i != offset+N ; i++)
    ...
This would preserve the C++ semantics of passing an array to a member function.
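A concrete version of the offset-parameter signature, as a sketch: the scale method and its parameters are invented for illustration and play the role of the C++ call method(array + i, ...).

```java
import java.util.Arrays;

public class OffsetParameterDemo {
    /**
     * Multiplies n elements in place, starting at offset.
     * Corresponds to a C++ call such as scale(array + offset, n, factor).
     */
    static void scale(int[] array, int offset, int n, int factor) {
        for (int i = offset; i != offset + n; i++) {
            array[i] *= factor;
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        scale(data, 2, 3, 10); // touches only indices 2..4 of the original array
        System.out.println(Arrays.toString(data));
    }
}
```

No copy is made, and the changes land in the caller's array, which matches what the C++ code did.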
The C++ function probably relied on processing from the beginning of the array. In Java it should be configured to run from an offset into the array so the array doesn't need to be copied. Copying the array, even with System.arraycopy, would take a significant amount of time.
It could be defined as a Java method with something like this:
void method(<somearraytype> array, int offset, other parameters)
Then the method would start at the offset into the array, and it would be called something like this:
method(array, i, other parameters);
If you wish to pass a sub-array to a method, an alternative to copying the sub-array into a new array would be to pass the entire array with an additional offset parameter that indicates the first relevant index of the array. This would require changes in the implementation of method, but if performance is an issue, that's probably the most efficient way.
The right way to handle this is to refactor the method to take the signature
method(int[] array, int i, other parameters)
so that you pass the whole array (by reference), and then tell the method where to start its processing from. Then you don't need to do any copying.

Standard API way to check if one array is contained in another

I have two byte[] arrays in a method like this:
private static boolean containsBytes(byte[] body, byte[] checker){
    //Code you do not want to ever see here.
}
I want to, using the standard API as much as possible, determine if the series contained in the checker array exists anywhere in the body array.
Right now I'm looking at some nasty code that did a hand-crafted algorithm. The performance of the algorithm is OK, which is about all you can say for it. I'm wondering if there is a more standard api way to accomplish it. Otherwise, I know how to write a readable hand-crafted one.
To get a sense of scale here, the checker array would not be larger than 48 (probably less) and the body might be a few kb large at most.
Not in the standard library (like Jon Skeet said, probably nothing there that does this) but Guava could help you here with its method Bytes.indexOf(byte[] array, byte[] target).
boolean contained = Bytes.indexOf(body, checker) != -1;
Plus, the same method exists in the classes for the other primitive types as well.
I don't know of anything in the standard API to help you here. There may be something in a third party library, although it would potentially need to be implemented repeatedly, once for each primitive type :(
EDIT: I was going to look for Boyer-Moore, but this answer was added on my phone, and I ran out of time :)
Depending on the data and your requirements, you may find that a brute force approach is absolutely fine - and a lot simpler to implement than any of the fancier algorithms available. The simple brute force approach is generally my first port of call - it often turns out to be perfectly adequate :)
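For the sizes mentioned in the question (a checker of at most 48 bytes, a body of a few kB), a readable brute-force version could be as short as the sketch below. The indexOf helper is a name chosen here, not a standard API method.

```java
public class BruteForceContains {
    /** Returns the first index at which checker occurs in body, or -1 if it does not. */
    static int indexOf(byte[] body, byte[] checker) {
        if (checker.length == 0) {
            return 0; // the empty sequence is found at the start, as with String.indexOf
        }
        for (int start = 0; start <= body.length - checker.length; start++) {
            int matched = 0;
            while (matched < checker.length && body[start + matched] == checker[matched]) {
                matched++;
            }
            if (matched == checker.length) {
                return start;
            }
        }
        return -1;
    }

    static boolean containsBytes(byte[] body, byte[] checker) {
        return indexOf(body, checker) >= 0;
    }

    public static void main(String[] args) {
        byte[] body = {1, 2, 3, 2, 3, 4};
        System.out.println(containsBytes(body, new byte[] {2, 3, 4}));
        System.out.println(containsBytes(body, new byte[] {3, 3}));
    }
}
```

The worst case is body.length times checker.length comparisons, which is small at this scale.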
You probably already know this, but what you're trying to (re-)implement is basically a string search:
http://en.wikipedia.org/wiki/String_searching_algorithm
The old code might in fact be an implementation of one of the string search algorithms; for better performance, it might be good to implement one of the other algorithms. You didn't mention how often this method is going to be called, which would help to decide whether it's worth doing that.
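The collections framework can search a list for a sublist, but Arrays.asList only wraps object arrays element by element: given a byte[] it produces a List<byte[]> with a single element, so the search would not compare the individual bytes. With a List<Byte> view of each array, such as the one Guava's Bytes.asList provides, I think this would work reasonably well:
import java.util.Collections;
import com.google.common.primitives.Bytes;
boolean found = Collections.indexOfSubList(Bytes.asList(body), Bytes.asList(checker)) >= 0;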
