AccessViolationException on SByte[].Length - java

First of all, I looked at other AccessViolationException questions here on SO, but I mostly didn't understand them because of terms like "marshalling", unsafe code, etc.
Context: I am trying to port some of the Java Netty code to C#. Maybe I have mixed up the two languages and can't see it anymore.
I just have two methods, one forwarding parameters to the other.
public override ByteBuf WriteBytes(sbyte[] src)
{
    WriteBytes(src, 0, src.Length);
    return this;
}

public override ByteBuf WriteBytes(sbyte[] src, int srcIndex, int length)
{
    SetBytes(WriterIndex, src, srcIndex, length); // AccessViolationException is thrown here
    return this;
}
Now, I'm testing the first method in my unit test, like this:
var byteArray = new sbyte[256];
for (int i = 0, b = sbyte.MinValue; b <= sbyte.MaxValue; i++, b++) // -128 ... 127
{
    byteArray[i] = (sbyte) b;
}
buffer.WriteBytes(byteArray); // this is the invocation
What I have found so far is that the problem seems to arise from the length parameter. Maybe I'm not allowed to pass src.Length from the first method to the second.
Also, please note that I am using sbyte, not byte. (I hope this doesn't matter.)
Does this have to do with pass-by-reference or pass-by-value of arrays?
Edit:
I found that the exception must be thrown somewhere in the depths of SetBytes. At first I believed that SetBytes was never called, because I had set a breakpoint on the method's entry and the debugger never stopped there. The debugger doesn't seem to work properly, as it sometimes doesn't stop at the breakpoints I set. After I managed to step through the whole depths of SetBytes, the AccessViolationException was never thrown. I then ran the test ten times and the exception didn't appear. When I debugged the whole thing again, the exception appeared again.
Why is that??

Related

Should a local variable be introduced for an array element at a specific index which is accessed repeatedly?

If an array element at a specific index is accessed repeatedly in a loop, should a local variable be introduced for the sake of performance? That is, does accessing an array by index introduce overhead?
For example:
public void test(int[] arr) {
    for (int i = 0; i < (1 << 20); i++) {
        System.out.println(arr[0]);
    }
}

public void test2(int[] arr) {
    int first = arr[0];
    for (int i = 0; i < (1 << 20); i++) {
        System.out.println(first);
    }
}
Is test2() better than test() in terms of performance?
Update - languages of interest
Golang, C, Java
The question is entirely dependent on the language used, and more specifically on the toolchain used (compilers, JIT, interpreters, etc.). Since the provided code is in Java, I will consider the case of Java running on a mainstream JVM like HotSpot.
Mainstream JVM implementations can easily optimize this themselves, as long as the loop is a hot loop: the JVM can see that arr[0] is constant here, especially if the function is executed multiple times. So, as is usually the case, it is not a problem, and you should not care about such micro-optimizations unless you have a benchmark showing it actually is one. The proposed optimization does not matter here anyway, because the println call will be several orders of magnitude slower than anything else in the loop.
Note, however, that when you have a lot of small loops and the code is executed only a few times or very rarely, the second version can be slightly faster. The reason is that the first version produces slightly less efficient bytecode, which may never be optimized by the JVM because of the cost of compiling bytecode to fast native code (the JVM has to find a trade-off).
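If you do end up needing such a benchmark, a minimal JMH sketch could look like this (this assumes the org.openjdk.jmh dependency is on the classpath; the class and method names are illustrative, not from the question):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
public class ArrayAccessBench {
    int[] arr = {42};

    @Benchmark
    public void indexedEachIteration(Blackhole bh) {
        // re-reads arr[0] on every trip through the loop
        for (int i = 0; i < (1 << 20); i++) {
            bh.consume(arr[0]);
        }
    }

    @Benchmark
    public void hoistedLocal(Blackhole bh) {
        // reads arr[0] once, before the loop
        int first = arr[0];
        for (int i = 0; i < (1 << 20); i++) {
            bh.consume(first);
        }
    }
}

The Blackhole.consume() calls stand in for println and keep the JIT from eliminating the loop bodies as dead code.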
Jérôme Richard's answer has the most important part in it: don't worry about this kind of micro-optimization unless/until you have a benchmark showing that it's important.
I'll answer from the Go and C sides in a different way though. The two bits of code have different meanings here. (I'm not really a Java programmer so I'll just refer to Aliasing in Java with Arrays and Reference Types for the Java variant of this point.) Let's also change the code so that we have a mystery function, rather than some known-to-do-nothing-but-print function:
/* C */
extern void f(int);

void test(int *arr) {
    int i;
    for (i = 0; i < (1 << 20); i++) {
        f(arr[0]);
    }
}

// Go
func test(arr []int, f func(int)) {
    for i := 0; i < (1 << 20); i++ {
        f(arr[0])
    }
}
Now let's consider a valid call to test. Here's part of the C-language implementation of f:
extern int A[];

void f(int arg) {
    /* do something with arg */
    A[0]++;
}
The call to test reads:
test(A);
That is, arr in test is A, and f() modifies A[0]. So each call to f() must re-read A[0] and may pass a different integer value each time.
If you modify test to read:
/* C */
extern void f(int);

void test(int *arr) {
    int i;
    int arg = arr[0];
    for (i = 0; i < (1 << 20); i++) {
        f(arg);
    }
}
then suddenly each call to f passes only the original value of A[0]. So these two programs have different meanings.
Go and C have similar aliasing rules. However, Go compilers can often "see further" than C compilers (because the compiler usually gets a better chance to do function inlining, if nothing else) and hence detect whether or not some aliasing may be taking place. It's easier, in a sense, for a Go compiler to grab the arr[0] value once outside the loop, if that's possible, than it is for the C compiler. That's not a function of the language itself: it's a function of the traditional ways that C and Go compilers have been written.
Still, the upshot of all this is that if you intend to pass the same value to your function every trip through the loop, you can write that as code by copying arr[0] to a local variable before running the loop. If you intend to allow arr[0] to be modified each trip through the loop, you can write that by writing the variant without a local variable—but it might also be wise to put in a comment, noting that the called function is intended to be able to modify the array element.
Write the code so that the reader can understand the intent first. Then, if and when it proves to be a bottleneck, write the code in some more-obscure-but-faster form, if that's possible and appropriate.
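To make the aliasing point concrete in Java as well, here is a small self-contained sketch (names are illustrative): when the callee mutates arr[0], the re-reading variant and the hoisted variant print different output.

public class AliasDemo {
    static void bump(int[] arr) {
        arr[0]++;
    }

    public static void main(String[] args) {
        int[] arr = {0};

        // Re-read arr[0] every iteration: prints 0, 1, 2.
        for (int i = 0; i < 3; i++) {
            System.out.println(arr[0]);
            bump(arr);
        }

        arr[0] = 0;

        // Hoist into a local first: prints 0, 0, 0.
        int first = arr[0];
        for (int i = 0; i < 3; i++) {
            System.out.println(first);
            bump(arr);
        }
    }
}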

Is ByteBuf.arrayOffset useless?

I'm learning Netty in Action.
In chapter 5.2.2, ByteBuf usage patterns, there is a piece of code that confused me. It is shown below.
ByteBuf heapBuf = ...;
if (heapBuf.hasArray()) {
    byte[] array = heapBuf.array();
    int offset = heapBuf.arrayOffset() + heapBuf.readerIndex();
    int length = heapBuf.readableBytes();
    handleArray(array, offset, length);
}
I wondered what is the use case of the ByteBuf.arrayOffset() method. The documentation for that method reads:
Returns the offset of the first byte within the backing byte array of this buffer.
Then, I looked up the arrayOffset() method in UnpooledHeapByteBuf.java which implements ByteBuf. The implementation for the method always just returns 0, as seen below.
@Override
public int arrayOffset() {
    return 0;
}
So, is ByteBuf#arrayOffset useless?
There may be other implementations of ByteBuf, and it is quite possible that some of them have a more useful or even complex implementation.
So returning 0 works for UnpooledHeapByteBuf, but that does not mean there aren't other implementations of ByteBuf that need something different.
The method should do what the documentation states, and you can imagine that other implementations have an offset different from 0, for example if they use something like a circular array as the backing byte array.
In that case, the method needs to return the index where the current start pointer is located, not 0.
(Picture such a circular array with the current pointer at index 2 rather than 0; the pointer moves around the array as the buffer is used.)
And on the user side, if you want to use your ByteBuf object safely, you should also use the method. You can avoid it if you know you are operating on an UnpooledHeapByteBuf, but even then you shouldn't, because the internal behavior could change in a future version.
No, it's not useless at all, as it allows one huge byte array to back multiple ByteBuf instances. This is in fact done in PooledHeapByteBuf.
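As a hedged illustration (this assumes Netty's io.netty.buffer API; whether arrayOffset() is nonzero remains an implementation detail, which is exactly why the book's code adds it), a buffer wrapping part of a shared array shows the offset in action:

import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

public class ArrayOffsetDemo {
    public static void main(String[] args) {
        byte[] shared = {10, 20, 30, 40, 50};
        // A view onto bytes 30, 40, 50 of the shared array.
        ByteBuf view = Unpooled.wrappedBuffer(shared, 2, 3);

        if (view.hasArray()) {
            byte[] array = view.array();   // may be the shared array itself
            int offset = view.arrayOffset() + view.readerIndex();
            int length = view.readableBytes();
            System.out.println(array[offset]); // 30, regardless of the backing layout
            System.out.println(length);        // 3
        }
    }
}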

Recommended way to handle problems/errors in algorithms

Keeping stack traces out of it, let's say that an 'error' is a problem that you didn't want to occur, but did.
If I were to use a boolean system to check if the action successfully completed, it would look something like this:
String[] array = new String[10];
int i = 0;

public boolean accessValue(int id) {
    if (id < array.length) {
        //do something
        return true;
    }
    return false;
}

while (true) {
    if (!accessValue(i++)) {
        //tend to situation
    }
}
If I were to use Exceptions, it would look like this:
class InvalidAccessException extends Throwable {
}

public boolean accessValue(int id) throws InvalidAccessException {
    if (id >= array.length || id < 0)
        throw new InvalidAccessException();
    //do something
    return true;
}

while (true) {
    try {
        accessValue(i++);
    } catch (InvalidAccessException e) {
        //tend to situation
    }
}
The only thing that matters to me is that when a problem occurs, I'm notified in some way and have the option to handle the situation. Which approach is more commonly practiced? Does it just depend on the situation, or are there reasons for picking one over the other?
The first approach you mention is more C oriented, where functions yield various integers to denote how the function fared during its execution.
Although this works, it (in my opinion) introduces extra problems, since a developer needs to go through other documentation or another developer's code to understand why a particular value was returned.
In Java, as far as I know, the way to go is always to throw exceptions when something goes wrong (even when you expect it to go wrong). The obvious advantage of using exceptions is that the code is more readable: just by seeing your method signature, I know what potential issues your method could cause. This lets me code faster, since I do not need to dig through your documentation or code just to see how your method behaves (although I might still need to dig through documentation or code to find out why your code is throwing exceptions).
Also, since Java does not have a tuple implementation for returning error codes and values together, you would need to create your own, which could hurt code reusability and readability; in my opinion, that is always something you should avoid.
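To make that concrete, such a hand-rolled "tuple" would look something like this minimal sketch (the names are illustrative): every caller now has to remember to check the error code before touching the value, which is the readability cost mentioned above.

public class Result<T> {
    public final T value;       // meaningful only when errorCode == 0
    public final int errorCode; // 0 means success

    public Result(T value, int errorCode) {
        this.value = value;
        this.errorCode = errorCode;
    }
}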
EDIT:

What if my intention isn't to go back into my code and find where the error was thrown to fix it? I just want to be notified that an error happened, in a way that lets me easily handle the situation. Rather than going into the code and fixing it manually, I want to be able to trigger another set of code (like a handleError() method), which has an algorithm that will even things out (whichever algorithm I may choose). Will handling with Exceptions give me any advantage in this case?
Yes, it should, since exception handling allows you to handle exceptional events. So in your code, you could have this:
while (true) {
    try {
        accessValue(i++);
    } catch (InvalidAccessException e) {
        //Execute algorithms here
    }
}
Having a stack trace is helpful when, as you say, you are debugging a problem, since it tells you which methods were called when your program crashed. That being said, it is not the only benefit of using exceptions (as mentioned above).
Another potential problem I see with using return values is when different developers work on the same function. You could have something like this, designed by one developer:
int doSomething() {
    //if it worked, yield 0
    //else yield some number between 1 and 10
}
Then another developer comes along who believes that errors should have negative numbers and extends the above method:
int doSomething() {
    //if it worked, yield 0
    //else yield some number between 1 and 10
    //something else went wrong, return -1
}
This would mean you need to go through all the other functions calling doSomething() and make sure they handle the new case where the return value is negative. This is cumbersome and error prone.
EDIT 2:
I hope I am getting your point. I see this issue when you return true/false:
Assume this:
public boolean foo(Object arg1, Object arg2)
{
    if (arg1 is invalid) return false;
    if (arg2 is invalid) return false;
}
In the above example, what does false mean? Does it mean arg1 is invalid or arg2? What if you need to trigger different algorithms for different parameter validity?
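Here is a hedged sketch of the exception-based alternative (the null checks stand in for whatever validity checks apply): the exception message can say exactly which argument was invalid, which a bare false cannot.

public void foo(Object arg1, Object arg2) {
    if (arg1 == null) {
        throw new IllegalArgumentException("arg1 is invalid");
    }
    if (arg2 == null) {
        throw new IllegalArgumentException("arg2 is invalid");
    }
    // do something with valid arguments
}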

Is compiler able to optimize reference creation?

Take for example a loop like this:
public boolean method() {
    for (int i = 0; i < 5; i++) {
        if (this.object.getSomething().getSomeArray().get(i).getArray().size() > 0)
            return false;
    }
    return true;
}
Each get method simply retrieves a private attribute. A more readable version of the same code would be:
public boolean method() {
    MySomeArray mySomeArray = this.object.getSomething().getSomeArray();
    for (int i = 0; i < 5; i++) {
        MyArray array = mySomeArray.get(i).getArray();
        if (array.size() > 0)
            return false;
    }
    return true;
}
Another version is:
public boolean method() {
    MySomeArray mySomeArray = this.object.getSomething().getSomeArray();
    MyArray array;
    for (int i = 0; i < 5; i++) {
        array = mySomeArray.get(i).getArray();
        if (array.size() > 0)
            return false;
    }
    return true;
}
I know that in theory compilers can optimize many things, and in this case (in my opinion) the three versions of the loop should be optimized into exactly the same machine code.
Am I correct, or would there be a difference in the number of instructions executed by the three versions?
If MySomeArray, as well as all other classes involved in your dereference chain, are at the bottom of their respective class hierarchies, then HotSpot will have an easy time turning all those virtual function calls into "plain" (non-virtual) calls by a technique known as monomorphic call site optimization.
This can also happen even if the classes involved are not leaf classes. The important thing is that at each call site, only one object type ever gets dispatched on.
With the uncertainty of virtual functions out of the way, the compiler can proceed to inline all the calls, and then to perform any further optimizations, like hoisting in your case. The ultimate values retrieved from the chain of dereferencing can be bound to registers, etc.
Note that much of the above is subject to the entire code path being free of any happens-before relations to the actions of other threads. In practice this mostly means no volatile variable access and no synchronized blocks (within your own code as well as within all the code called from your code).
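As a hedged illustration of that caveat (the field names here are made up), a volatile field read inside a loop creates a happens-before edge and so must be re-read on every iteration, while a plain field read is eligible for hoisting once the calls are inlined:

public class HoistingCaveat {
    static class Holder {
        volatile int volatileSize; // must be re-read on every iteration
        int plainSize;             // the JIT may read this once and hoist it
    }

    static boolean scanVolatile(Holder h) {
        for (int i = 0; i < 5; i++) {
            if (h.volatileSize > 0) { // cannot be hoisted out of the loop
                return false;
            }
        }
        return true;
    }

    static boolean scanPlain(Holder h) {
        for (int i = 0; i < 5; i++) {
            if (h.plainSize > 0) { // free to be hoisted after inlining
                return false;
            }
        }
        return true;
    }
}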
Write a test case that uses this method and print the generated assembly code when you run it (on HotSpot, with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, which requires the hsdis disassembler plugin). You can then check for yourself how many of the calls are inlined. I'm skeptical about the compiler being able to inline them all, but the JIT compiler can be surprising.
I would prefer the more readable version anyway, because it's more readable.
With enough inlining, the compiler can indeed hoist the method calls out of the loop, very much like you did by hand in your second and third examples. The details of whether it will actually do this depend entirely on the behavior and size of the methods in question, and the sophistication of the JIT involved.
I wrote up your example and tested it with Caliper, and all the methods have equivalent timings. I didn't inspect the assembly, since that's more involved - but I'll bet they are near equivalent.
The trouble is that you are making assumptions that the compiler cannot make.
You know that this.object.getSomething().getSomeArray() does not change each time around the loop but the compiler has no way to know that. Especially since other threads may potentially be modifying those variables at the same time...

Returning null from native methods using JNI

I have some native code which returns a jbyteArray (so byte[] on the Java side) and I want to return null. However, I run into problems if I simply return 0 in place of the jbyteArray.
Some more information:
The main logic is in Java; the native method is used to encode some data into a byte stream. Don't ask... it has to be done like this. Recently, the native code had to be changed a bit, and now it runs horribly slow. After some experimentation, which included commenting out all code in the native method before the return, it turns out that returning 0 causes the slowdown. When returning an actual jbyteArray, everything is fine.
Method signatures for my code:
On the C++ side:
extern "C" JNIEXPORT jbyteArray JNICALL Java_com_xxx_recode (JNIEnv* env, jclass java_this, jbyteArray origBytes, jobject message)
On the Java side:
private static native byte[] recode(byte[] origBytes, Message message);
The native code looks something like this:
jbyteArray javaArray;
if (error != ERROR) {
    // convert to jbyteArray
    javaArray = env->NewByteArray((jsize) message.size);
    env->SetByteArrayRegion(javaArray, 0, message.size, reinterpret_cast<jbyte*>(message.buffer()));
    if (env->ExceptionOccurred()) {
        env->ExceptionDescribe();
        error = ERROR;
    }
}
if (error == ERROR) {
    return 0; // does NOT work - doesn't crash, just slows everything down horribly
}
else {
    return javaArray; // works perfectly
}
Does anyone know of any reason this could happen? Is it valid to return NULL from a native method in place of a jbyteArray, or is there another procedure for returning null to Java? Unfortunately, I had no luck on Google.
Thanks!
EDIT: Added additional information.
This is an old question but I had it too a minute ago...
You say in your question:
return 0; // does NOT work - doesn't crash, just slows everything down horribly
I just gave it a try, actually, with a jintArray, as that is what my code has to allocate and return, unless an error happens (defined by some criteria unrelated to this topic), in which case it has to return a null result.
It turns out that returning NULL (defined as ((void*)0)) works perfectly and is interpreted as null back on the Java side. I didn't notice any degradation in performance. And unless I missed something, returning 0 with no void* cast would not change anything.
So I don't think this was the cause of the slowdown you encountered. NULL looks just fine to return null.
EDIT:
I can confirm that the return value has nothing to do with performance. I just tested the same code returning a null value on one side, and its counterpart returning an object (a jintArray) on the other. Performance is similar for NULL, a jintArray of size 0, and a random jintArray of a few KB allocated statically.
I also tried changing the value of a field of the caller's class and returning void, with roughly the same performance: a very little bit slower, probably due to the reflection code needed to look up that field and set it.
All these tests were made under Android, not under standalone Java - maybe this is why? (see comments):
An API 17 x86 emulator running under HAXM
An API 19 one, running under the same conditions
Two API 19 physical devices - an Asus tablet and a Galaxy 5 - running under Dalvik.
There's some asymmetry in your code that struck my eye: you never decide on the type of object to return, except when returning 'nothing'. Apparently the env object decides how to allocate a javaArray, so why not ask it to return some kind of empty array? It may be that the 0 you return needs to be handled in a special way while marshalling between JNI and Java.
Have you tried returning a NULL reference?
This is untested (don't have a JNI development environment at hand at the moment) but you should be able to create a new global reference to NULL and return it like this:
return (*env)->NewGlobalRef(env, NULL);
EDIT: That being said, you check whether an exception occurred, but you do not clear it. As far as I understand, that means it is still "thrown" in the Java layer, so you should be able to use just that as your error indicator; then it does not matter what the function returns. In fact, calling a JNI function other than ExceptionClear()/ExceptionDescribe() while an exception is thrown is not "safe" according to the documentation. That the function is "slow" might be caused by the ExceptionDescribe() call writing debugging information.
So, if I understand this correctly, this should be a well-behaved function throwing an exception the first time an error occurs, and returning NULL on each subsequent call (until 'error' is cleared):
if (error != ERROR) {
    jbyteArray javaArray = env->NewByteArray((jsize) message.size);
    env->SetByteArrayRegion(javaArray, 0, message.size, reinterpret_cast<jbyte*>(message.buffer()));
    if (env->ExceptionOccurred()) {
        error = ERROR;
        return 0;
    }
    return javaArray;
} else {
    return env->NewGlobalRef(NULL);
}
Again, this is untested, since I don't have a JNI environment available right now.
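For reference, a hypothetical Java-side caller mirroring the question's signature can treat the native NULL like any other null (the library name and the error-handling policy here are assumptions, not from the question; the Message type is taken from the question's declaration):

public class Recoder {
    static {
        System.loadLibrary("recode"); // assumed native library name
    }

    private static native byte[] recode(byte[] origBytes, Message message);

    static byte[] recodeOrThrow(byte[] origBytes, Message message) {
        byte[] result = recode(origBytes, message);
        if (result == null) {
            // a native NULL (or a NULL global ref) arrives here as plain null
            throw new IllegalStateException("native recode failed");
        }
        return result;
    }
}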
