java jna - get byte array by reference java.lang.IndexOutOfBoundsException - java

I'm using JNA and I get a strange error getting a byte array.
I use this code:
PointerByReference mac=new PointerByReference();
NativeInterface.getMac(mac);
mac.getPointer().getByteArray(0,8)
And it throws an IndexOutOfBoundsException: Bounds exceeds available space : size=4, offset=8, even though I'm sure the returned value is 8 bytes long.
I tried to get that array as String:
mac.getPointer().getString(0)
And here I successfully get a String that is 8 characters long.
Can you understand why?
Thank you.

PointerByReference.getValue() returns the Pointer you're looking for. PointerByReference.getPointer() returns its address.
mac.getPointer().getByteArray(0, 8) is attempting to read 8 bytes from the memory allocated for the PointerByReference itself (which holds a pointer) and put those bytes into a Java primitive array. You're asking for 8 bytes but only 4 are allocated (the size of a pointer here), hence the error.
mac.getPointer().getString(0) is attempting to read a C string from the memory allocated for the pointer value (as if it were a const char *) and convert that C string into a Java String. It only bounds-checks the start of the string on the Java side, so it will keep reading memory (even if it is technically out of bounds) until it finds a zero byte.
EDIT
mac.getValue().getByteArray(0, 8) will give you what you were originally trying to obtain (an array of 8 bytes).
EDIT
If your called function is supposed to be writing to a buffer (and not writing the address of a buffer), then you should change its signature to accept byte[] instead, e.g.
byte[] buffer = new byte[8];
getMac(buffer);

Related

How can an array of a data type with the size [1] suddenly be bigger than the datatype itself in Java

today I have been experimenting with memory in Java. Specifically, I was serializing objects into binary data and deserializing them again. Something caught my eye: an array of bytes with size 1 takes up less binary data than a single byte. Here's what I mean:
I defined a single byte in java and printed out the binary data of the byte:
byte size: 75bytes
101011001110110100000000000001010111001101110010000000000000111001101010011000010111011001100001001011100110110001100001011011100110011100101110010000100111100101110100011001011001110001001110011000001000010011101110010100001111010100011100000000100000000000000001010000100000000000000101011101100110000101101100011101010110010101111000011100100000000000010000011010100110000101110110011000010010111001101100011000010110111001100111001011100100111001110101011011010110001001100101011100101000011010101100100101010001110100001011100101001110000010001011000000100000000000000000011110000111000000000001
and here's a byte[1] array
byte[] size: 28bytes
10101100111011010000000000000101011101010111001000000000000000100101101101000010101011001111001100010111111110000000011000001000010101001110000000000010000000000000000001111000011100000000000000000000000000000000000100000001
But if I print out the size of byte[0] (the byte at index 0 in the array) it suddenly grows back to 75 bytes:
size of byte[0] in byte array:
101011001110110100000000000001010111001101110010000000000000111001101010011000010111011001100001001011100110110001100001011011100110011100101110010000100111100101110100011001011001110001001110011000001000010011101110010100001111010100011100000000100000000000000001010000100000000000000101011101100110000101101100011101010110010101111000011100100000000000010000011010100110000101110110011000010010111001101100011000010110111001100111001011100100111001110101011011010110001001100101011100101000011010101100100101010001110100001011100101001110000010001011000000100000000000000000011110000111000000000001
And yes, it's the full object and not metadata or something, because using this binary data I can reconstruct the object to its original state, so the values are stored inside the binary data. Here's the code I used to find out the size of the data:
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;

public class MemoryFunctions {
    static int sizeOf(Object input) {
        int size = 0;
        ByteArrayOutputStream checker = new ByteArrayOutputStream();
        try {
            ObjectOutputStream byteArray = new ObjectOutputStream(checker);
            byteArray.writeObject(input);
            byteArray.flush();
            byte[] sizeDetector = checker.toByteArray();
            size = sizeDetector.length;
            int amountOfBytes = 0;
            for (byte b : sizeDetector) {
                System.out.print(String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0'));
                amountOfBytes += 1;
            }
            System.out.println("real size in byte " + amountOfBytes);
            System.out.println();
        } catch (Exception e) {
            System.err.println(e);
        }
        return size;
    }
}
Is there any reason that a byte array takes up less space than the byte itself? I need to heavily optimize a program. Using this information, would it be a better idea to keep the values of a class that I want to serialize into binary data in array form, or are there any benefits to using "the full value"? Also, I am kind of confused by this information because, as far as I know, byte and byte[] are primitive datatypes, so they don't get passed by reference but are stored as binary in memory "as is". What I also found out is that getting the size of the value at index 0 in the smaller array suddenly gives me 75 bytes again. Does this mean that values are generated another time when you access an index of an array?
It'd be nice if any of you had more information about this topic and could answer my questions.
In Java, byte is a primitive type passed by value, but all arrays, including byte[], are effectively objects and are passed by reference.
If you pass a byte[1] to your sizeOf, it passes the array itself, because all object types are subclasses of (and therefore compatible with) the parameter type Object, and sizeOf serializes the array.
If you try to pass a primitive byte, that doesn't work directly, because no primitive type is a reference type, so it cannot be compatible with java.lang.Object or any other object type. Instead the byte is 'boxed' into an object of the language-defined class java.lang.Byte (note the different spelling) -- and sizeOf serializes that object. This is often called auto-boxing (and auto-unboxing for the reverse) because the compiler performs these conversions without you writing them in the source code. The boxed object is actually about the same size in memory as the array (and both are significantly larger than the primitive), but as commented by https://stackoverflow.com/users/869736/louis-wasserman, the serialization of the java.lang.Byte object is more complicated and longer than the serialization of the byte[1] array.
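A minimal, self-contained way to see this difference (my own sketch, not the question's code; the exact byte counts depend on the class descriptors the stream writes out, so only the relative sizes matter):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class BoxedVsArraySize {
    // Serialize any object and return the resulting stream length in bytes.
    static int serializedSize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray().length;
    }

    public static void main(String[] args) throws IOException {
        int arraySize = serializedSize(new byte[1]);            // the byte[1] array
        int boxedSize = serializedSize(Byte.valueOf((byte) 0)); // what a primitive byte auto-boxes to
        System.out.println("byte[1] -> " + arraySize + " bytes");
        System.out.println("Byte    -> " + boxedSize + " bytes");
        // The boxed Byte's stream is longer: it drags in the class
        // descriptors for java.lang.Byte and its superclass java.lang.Number,
        // while the array only needs the compact "[B" descriptor.
    }
}
```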

What happens when a Java String overflows?

As far as I understand, Java Strings are just an array of characters, with the maximum length being an integer value.
If I understand this answer correctly, it is possible to cause an overflow with a String - albeit in "unusual circumstances".
Since Java Strings are based on char arrays and Java automatically checks array bounds, buffer overflows are only possible in unusual scenarios:
If you call native code via JNI
In the JVM itself (usually written in C++)
The interpreter or JIT compiler does not work correctly (the bounds checks are mandated by Java bytecode)
Correct me if I'm wrong, but I believe this means that you can write outside the bounds of the array, without triggering the ArrayIndexOutOfBounds (or similar) exception.
I've encountered issues in C++ with buffer overflows, and I can find plenty of advice about other languages, but none specifically answering what would happen if you caused a buffer overflow with a String (or any other array type) in Java.
I know that Java Strings are bounds-checked, and can't be overflowed by native Java code alone (unless issues are present in the compiler or JVM, as per points 2 and 3 above), but the first point implies that it is technically possible to get a char[] into an... undesirable position.
Given this, I have two specific questions about the behaviour of such issues in Java, assuming the above is correct:
If a String can overflow, what happens when it does?
What would the implications of this behaviour be?
Thanks in advance.
To answer your first question: I had the luck of actually causing such an error, and execution just stopped, throwing one of these errors:
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
So that was my case; I don't know if it represents a security problem the way a buffer overflow does in C and C++.
A String in Java is immutable, so once created there is no writing to the underlying array of char or array of byte (which one is used depends on the Java version and the contents of the String). OK, using JNI could circumvent that, but with pure Java it is impossible to leave the bounds of the array without causing an ArrayIndexOutOfBoundsException or similar.
The only way to cause a kind of overflow in the context of String handling would be to create a new String that is too long. Make sure that your JVM has enough heap (around 36 GB), create a char array of length Integer.MAX_VALUE - 1, populate it appropriately, call new String( char [] ) with that array, and then execute
var string = string.concat( new String( array ) );
But the result is just an exception telling you that it was attempted to create a too large array.
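To make the pure-Java side of this concrete, a tiny sketch (my own, for illustration) showing that reading past a String's end throws an exception rather than reading adjacent memory:

```java
public class StringBounds {
    public static void main(String[] args) {
        String s = "overflow";
        try {
            // Index s.length() is one past the last valid index; the JVM's
            // mandatory bounds check throws instead of reading adjacent memory.
            char c = s.charAt(s.length());
            System.out.println(c); // never reached
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("caught: " + e);
        }
    }
}
```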

Use Java unsafe to point char array to a memory location

Some analysis of a Java application showed that it's spending a lot of time decoding UTF-8 byte arrays into String objects. The stream of UTF-8 bytes is coming from an LMDB database, and the values in the database are Protobuf messages, which is why it decodes UTF-8 so much. Another problem caused by this is that Strings take up a large chunk of memory because of the decoding from the memory map into a String object in the JVM.
I want to refactor this application so it does not allocate a new String every time it reads a message from the database. I want the underlying char array in the String object to simply point to the memory location.
package testreflect;

import java.lang.reflect.Field;

import sun.misc.Unsafe;

public class App {
    public static void main(String[] args) throws Exception {
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        Unsafe UNSAFE = (Unsafe) field.get(null);
        char[] sourceChars = new char[] { 'b', 'a', 'r', 0x2018 };

        // Encoding to a byte array; asBytes would be an LMDB entry
        byte[] asBytes = new byte[sourceChars.length * 2];
        UNSAFE.copyMemory(sourceChars,
                UNSAFE.arrayBaseOffset(sourceChars.getClass()),
                asBytes,
                UNSAFE.arrayBaseOffset(asBytes.getClass()),
                sourceChars.length * (long) UNSAFE.arrayIndexScale(sourceChars.getClass()));

        // Copying the byte array to the char array works, but is there a way to
        // have the char array simply point to the byte array without copying?
        char[] test = new char[sourceChars.length];
        UNSAFE.copyMemory(asBytes,
                UNSAFE.arrayBaseOffset(asBytes.getClass()),
                test,
                UNSAFE.arrayBaseOffset(test.getClass()),
                asBytes.length * (long) UNSAFE.arrayIndexScale(asBytes.getClass()));

        // Allocate a String object, but set its underlying
        // byte array manually to avoid the extra memory copy
        long stringOffset = UNSAFE.objectFieldOffset(String.class.getDeclaredField("value"));
        String stringTest = (String) UNSAFE.allocateInstance(String.class);
        UNSAFE.putObject(stringTest, stringOffset, test);
        System.out.println(stringTest);
    }
}
So far, I've figured out how to copy a byte array to a char array and set the underlying array in a String object using the Unsafe package. This should reduce the amount of CPU time the application is wasting decoding UTF-8 bytes.
However, this does not solve the memory problem. Is there a way to have a char array point to a memory location and avoid a memory allocation altogether? Avoiding the copy altogether will reduce the number of unnecessary allocations the JVM is making for these strings, leaving more room for the OS to cache entries from the LMDB database.
I think you are taking the wrong approach here.
So far, I've figured out how to copy a byte array to a char array and set the underlying array in a String object using the Unsafe package. This should reduce the amount of CPU time the application is wasting decoding UTF-8 bytes.
Erm ... no.
Using a memory copy to copy from a byte[] to a char[] is not going to work. Each char in the destination char[] will actually contain 2 bytes from the original. If you then try to wrap the char[] as a String, you will get a weird kind of mojibake.
What a real UTF-8 to String conversion does is convert between 1 and 4 bytes (code units) representing a UTF-8 code point into 1 or 2 16-bit code units representing the same code point in UTF-16. That cannot be done using a plain memory copy.
If you aren't familiar with it, it would be worth reading the Wikipedia article on UTF-8 so that you understand how the text is encoded.
The solution depends on what you intend to do with the text data.
If the data must really be in the form of String (or StringBuilder or char[]) objects, then you really have no choice but to do the full conversion. Try anything else and you are liable to mess up; e.g. garbled text and potential JVM crashes.
If you want something that is "string like", you could conceivably implement a custom subclass of CharSequence that wraps the bytes in the messages and decodes the UTF-8 on the fly. But doing that efficiently may be a problem, especially implementing the charAt method as an O(1) method.
If you simply want to hold and/or compare the (entire) texts, this could possibly be done by representing them as byte[] objects. These operations can be performed on the UTF-8 encoded data directly.
If the input text could actually be sent in character encoding with a fixed 8-bit character size (e.g. ASCII, Latin-1, etc) or as UTF-16, that simplifies things.
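As a sketch of that last, simplified case: if the bytes are known to be Latin-1/ASCII (one byte per character), a zero-copy CharSequence view is straightforward. This is my own illustration, not code from the question, and it is only valid under that fixed 8-bit assumption; real UTF-8 needs full decoding as explained above.

```java
import java.nio.charset.StandardCharsets;

// A zero-copy view of Latin-1/ASCII bytes as a CharSequence.
public final class Latin1Slice implements CharSequence {
    private final byte[] data;
    private final int offset, length;

    public Latin1Slice(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    @Override public int length() { return length; }

    @Override public char charAt(int index) {
        // O(1) because each character is exactly one byte.
        return (char) (data[offset + index] & 0xFF);
    }

    @Override public CharSequence subSequence(int start, int end) {
        return new Latin1Slice(data, offset + start, end - start);
    }

    @Override public String toString() {
        // Only materialize a String when one is truly needed.
        return new String(data, offset, length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bytes = "hello world".getBytes(StandardCharsets.ISO_8859_1);
        Latin1Slice word = new Latin1Slice(bytes, 6, 5);
        System.out.println(word); // prints "world", decoded only here
    }
}
```

No allocation happens until toString() is called, so comparisons and scans can run directly over the mapped bytes.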

c++ (u256)*(h256 const*)(char*[] + int) cast rewriting to java

I need to rewrite some code from C++ to Java and I've got into trouble with this C++ code:
using u256 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<256, 256, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;
using h256 = FixedHash<32>;
using bytes = std::vector<byte>;
uint32_t offset = ...;
bytes m_data = ...;
u256 result;
result = (u256)*(h256 const*)(m_data.data() + (size_t)offset);
I have no idea what's going on or how to rewrite it in Java.
I've understood that first we apply an offset, so we're now pointing at some element of the m_data array; then we cast it to a pointer to h256 (I watched it in the debugger, and this cast did the following: we take the data from m_data at the offset and get a 32-byte array with leading zeros).
And then we take the first value (I'm not sure about this) of this array and cast it to u256? But the first value after the (h256 const*) cast is zero, yet the resulting value is not zero.
Do you have any ideas?
I don't know what a u256 is, and the question is missing the typedef, but this is the typical way in C to get a scalar type (int16_t, int32_t, int64_t, double, ...) from a buffer in memory.
Essentially the use of the syntax:
type t = (type)*(const type *)(buffer + offset)
... lets you obtain an object of a specific type from a byte array, starting at a particular index.
It's not very safe, but it's blazingly fast when converted to assembly!
NOTE: the pointer math depends on the declaration of "buffer": if it's int8_t *, for instance, the value will be read starting at the "offset"-th byte; if it's int32_t *, it will be read starting at the "offset * 4"-th byte.
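In Java there's no pointer cast, but the same read can be expressed by slicing 32 bytes out of the array and interpreting them as an unsigned integer. A sketch of that idea (my own; it assumes the h256 bytes are big-endian, which the "leading zeros" observation in the question suggests):

```java
import java.math.BigInteger;
import java.util.Arrays;

public class U256Read {
    // Rough Java equivalent of (u256)*(h256 const*)(m_data.data() + offset):
    // take the 32 bytes at `offset` and treat them as an unsigned 256-bit value.
    static BigInteger readU256(byte[] data, int offset) {
        byte[] slice = Arrays.copyOfRange(data, offset, offset + 32);
        return new BigInteger(1, slice); // signum = 1 -> always non-negative
    }

    public static void main(String[] args) {
        byte[] buf = new byte[64];
        buf[4 + 31] = 42; // least significant byte of the 32-byte word at offset 4
        System.out.println(readU256(buf, 4)); // prints 42
    }
}
```

Unlike the C++ cast, this copies the 32 bytes, but it is bounds-checked and endianness-explicit.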

conversion of byte array to string causing OOM

In my application I'm storing strings using RandomAccessFile, and while reading a string back I need to convert a byte array to a String, which is causing an OOM. Is there a better way to convert than this:
str = new String(b, "UTF-8");
where b is the byte array.
Is there a better way to convert other than new String(bytes, "UTF-8") ?
This is actually a rather complicated question.
This constructor cannot simply incorporate the byte[] into the string:
Prior to Java 9, it is always necessary to decode the byte array to a UTF-16 coded array of char. So the constructor is liable to allocate roughly double the memory used by the source byte[].
With Java 9 you have the option of using the new compact representation for String. If you do, AND if the UTF-8 encoded byte array only contains code points in the Latin-1 range (\u0000 to \u00ff), then the String's value is a byte[]. However, even in this case the constructor must copy the bytes to a new byte[].
In both cases, there is no more space-efficient way to create a String from a byte[]. Furthermore, I don't think there is a more space-efficient way to do the conversion starting with a stream of bytes and a character count. (I am excluding things like modifying the java.lang.* implementation, or breaking abstraction using reflection.)
Bottom line: when converting a byte[] to a String you should allow at least twice as much contiguous free memory as the original byte[] if you want your code to work on older JVMs.
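If the real pressure is peak memory while loading, one mitigation (my own sketch, not part of the answer above) is to decode straight from the stream, so the full byte[] and the finished String never have to coexist in memory; only a small chunk buffer does:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class StreamingDecode {
    // Decode UTF-8 incrementally instead of materializing the whole byte[] first.
    static String readUtf8(InputStream in, int expectedChars) throws IOException {
        StringBuilder sb = new StringBuilder(expectedChars);
        try (Reader r = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] chunk = new char[8192];
            int n;
            while ((n = r.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "héllo wörld".getBytes(StandardCharsets.UTF_8);
        String s = readUtf8(new ByteArrayInputStream(bytes), 16);
        System.out.println(s);
    }
}
```

For the RandomAccessFile case one could feed this from Channels.newInputStream(raf.getChannel()); the String itself still costs UTF-16 (or Latin-1) storage, but the large temporary byte[] is avoided.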
