How much overhead does a Java Foreign Linker API call have?

I was curious how much overhead there is in calling functions via JEP 389: Foreign Linker API. I have searched around, but I can't find that kind of information. I am evaluating whether it is realistic to use it to wrap a C++ 3D math library that uses SIMD. For that use case the overhead must be very low, since the operations themselves take very little time.
By my measurements it takes about 10 nanoseconds per call on my machine. That is certainly impressive, but it does not work out when the target function itself only takes nanoseconds to run!
I don't have a ton of Java experience, so I was wondering if I am missing some configuration, or just something obvious. I'm just trying to wrap something up for Clojure.
This is (roughly) how I got my numbers:
rtm.cpp
extern "C" {
    float inc_float(float f) {
        return f + 1.0f;
    }
}
Compiled with
clang -std=c++17 -O0 -shared -undefined dynamic_lookup -o librtm.so rtm.cpp
(also tested with -Ofast)
WrapperTest.java
import static jdk.incubator.foreign.CLinker.C_FLOAT;
import jdk.incubator.foreign.*;
import java.lang.invoke.*;
import java.nio.file.Path;
@Test
public void testTonsOfSimpleCallsAlt() throws Throwable {
    var path = Path.of("/path/to/librtm.so");
    var libraryLookup = LibraryLookup.ofPath(path);
    var incFloatHandle = CLinker.getInstance().downcallHandle(
            libraryLookup.lookup("inc_float").get(),
            MethodType.methodType(float.class, float.class),
            FunctionDescriptor.of(C_FLOAT, C_FLOAT)
    );
    // warmup
    for (var i = 0; i < 1024; ++i) {
        float dummy = (float) incFloatHandle.invokeExact(1.0f);
    }
    long startTime = System.nanoTime();
    float total = 1.0f;
    int loops = 1024 * 1024 * 100;
    int ops = 10;
    for (var i = 0; i < loops; ++i) {
        // The actual important call!
        total = (float) incFloatHandle.invokeExact(total);
        // nine more of the above line...
        total = total - 10.0f;
    }
    long endTime = System.nanoTime();
    float nanos = (float) (endTime - startTime);
    System.out.println("Time taken in ms: " + (nanos / 1000000.0f));
    System.out.println("Time taken per op in ns: " + (nanos / (float) loops / (float) ops));
}
JVM arguments
--add-modules jdk.incubator.foreign -Dforeign.restricted=permit
Java version
> java --version
openjdk 16 2021-03-16
OpenJDK Runtime Environment AdoptOpenJDK (build 16+36)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 16+36, mixed mode, sharing)
Looking in a profiler, I see these sorts of call stacks around the call to the C++ function:
- WrapperTest.java:131 java.lang.invoke.LambdaForm$MH.0x0000000800227000.invokeExact_MT(Object, float, Object) 991 1
- LambdaForm$MH java.lang.invoke.LambdaForm$MH.0x0000000800229c00.invoke(Object, float) 991 1
- LambdaForm$MH java.lang.invoke.LambdaForm$MH.0x000000080021d800.invoke(Object, float, long) 991 1
I have just run this via the IntelliJ test runner.
So is the overhead 10ns per call (machine dependent blah blah) or is there a way to get this running faster?

I was really curious about this as well, so I wrote a quick benchmark to see how it compares to JNA as well as shared memory.
The results and code are here:
https://github.com/TwoClocks/benchmark_java_foreign_functions
I got around 25ns on this particular machine, which seems pretty good to me. A C++ virtual function call from C++ is roughly 5-10ns, so this is quite good for calling native code from the JVM (at least I think so). JNA comes in around 700ns. JNI might be a bit faster than JNA, but it is still far slower than the new foreign function stuff.
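For context, part of that 10-25ns is just the java.lang.invoke plumbing rather than the native transition. A rough way to separate the two is to time invokeExact on a plain Java target, with no native call at all. This is a hedged sketch (the class and method names are made up for illustration; numbers will vary by machine and JIT state):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class HandleBaseline {
    // Pure-Java stand-in for the native inc_float, so we measure only handle overhead.
    static float incFloat(float f) {
        return f + 1.0f;
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle handle = MethodHandles.lookup().findStatic(
                HandleBaseline.class, "incFloat",
                MethodType.methodType(float.class, float.class));

        // warmup so the JIT can compile and inline the handle call
        float total = 1.0f;
        for (int i = 0; i < 100_000; ++i) {
            total = (float) handle.invokeExact(total);
        }

        int loops = 10_000_000;
        long start = System.nanoTime();
        for (int i = 0; i < loops; ++i) {
            total = (float) handle.invokeExact(total);
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("ns per invokeExact: " + ((double) elapsed / loops));
        System.out.println("total: " + total); // keep the loop from being dead-code eliminated
    }
}
```

If the pure-Java handle call is already a few nanoseconds on your machine, then most of the remaining cost in the Panama numbers is the native transition itself.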

Related

Setting milliseconds of System time with java

Would it be possible to set the system time, including the milliseconds component, on Windows using Java?
I am trying to synchronize clocks between a couple of computers, but the OS API seems to offer only setting the time in the format HH:MM:SS.
This is what I tried:
public static void main(String[] args) throws InterruptedException, IOException {
String time="10:00:20"; // this works
String timeWithMiliseconds = "10:00:20.100"; // this doesn't change time at all
Runtime rt = Runtime.getRuntime();
rt.exec("cmd /C time " + time);
}
I am wondering how NTP clients work if it's not possible to set the milliseconds component.
One way to deal with this issue could be to calculate how many milliseconds remain until the clock reaches the next whole second, and sleep for that long. Is there a better, more direct way to achieve this?
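The sleep-until-the-next-second idea above can be sketched in a few lines; this is a hedged illustration (class and method names are made up), not a precise synchronization technique, since Thread.sleep itself has millisecond-level jitter:

```java
public class SetTimeOnSecond {
    // Milliseconds remaining until the system clock reaches the next whole second.
    static long millisUntilNextSecond(long nowMillis) {
        return 1000 - (nowMillis % 1000);
    }

    public static void main(String[] args) throws InterruptedException {
        long wait = millisUntilNextSecond(System.currentTimeMillis());
        Thread.sleep(wait);
        // At this point the clock is approximately at a whole second, so issuing
        // a seconds-granularity "time HH:MM:SS" command now introduces minimal
        // sub-second error.
        System.out.println(System.currentTimeMillis() % 1000 + " ms past the second");
    }
}
```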
As mentioned by immibis, you could use the Windows function SetSystemTime to set the time.
Below is a snippet which calls that Windows function using JNA 4.2:
Kernel32 kernel = Kernel32.INSTANCE;
WinBase.SYSTEMTIME newTime = new WinBase.SYSTEMTIME();
newTime.wYear = 2015;
newTime.wMonth = 11;
newTime.wDay = 10;
newTime.wHour = 12;
newTime.wMinute = 0;
newTime.wSecond = 0;
newTime.wMilliseconds = 0;
kernel.SetSystemTime(newTime);
For further information, have a look at the sources of JNA and at these links:
the SetSystemTime Windows function and
the SYSTEMTIME structure.
An introduction to JNA from 2009: Simplify Native Code Access with JNA.
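Note that SetSystemTime interprets its argument as UTC, so the struct fields should come from UTC time rather than local time. The field-filling part can be done with java.time; this is a hedged sketch in which SystemTimeFields is a hypothetical stand-in for JNA's WinBase.SYSTEMTIME, so the example runs without a Windows box:

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class SystemTimeFields {
    // Hypothetical stand-in for the fields of JNA's WinBase.SYSTEMTIME.
    short wYear, wMonth, wDay, wHour, wMinute, wSecond, wMilliseconds;

    // Populate the struct-like fields from the current UTC time,
    // including the milliseconds component.
    static SystemTimeFields fromNow() {
        ZonedDateTime utc = ZonedDateTime.now(ZoneOffset.UTC);
        SystemTimeFields t = new SystemTimeFields();
        t.wYear = (short) utc.getYear();
        t.wMonth = (short) utc.getMonthValue();
        t.wDay = (short) utc.getDayOfMonth();
        t.wHour = (short) utc.getHour();
        t.wMinute = (short) utc.getMinute();
        t.wSecond = (short) utc.getSecond();
        t.wMilliseconds = (short) (utc.getNano() / 1_000_000);
        return t;
    }

    public static void main(String[] args) {
        SystemTimeFields t = fromNow();
        System.out.println(t.wYear + "-" + t.wMonth + "-" + t.wDay + " "
                + t.wHour + ":" + t.wMinute + ":" + t.wSecond + "." + t.wMilliseconds);
    }
}
```

With JNA you would copy these values into a real WinBase.SYSTEMTIME before calling kernel.SetSystemTime(newTime).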

Java execution from commandline is slower than in IntelliJ

I have written a simple factorial program, with arbitrary precision:
import java.math.BigInteger;

public class Fac {
    public static void main(String[] args) {
        int stop = 100000;
        long start = System.currentTimeMillis();
        BigInteger integer = new BigInteger("1");
        for (int i = 2; i <= stop; i++) {
            integer = integer.multiply(new BigInteger(i + ""));
        }
        System.out.println("It took: " + (System.currentTimeMillis() - start) + "ms");
        //System.out.println(integer);
    }
}
When I run it in IntelliJ:
It took: 5392ms
When I run it from the command line:
It took: 17919ms
The command-line run is done with:
javac Fac.java
java Fac
I know this is not the best way to measure time, but the gap is so huge that it does not matter.
Why is the performance that different?
Other people have noticed a similar difference; however, as far as I can tell, their conclusions seem unrelated to my situation.
Why is my application running faster in IntelliJ compared to command-line?
http://grails.1312388.n4.nabble.com/Why-does-IntelliJ-IDEA-runs-faster-than-Windows-command-line-td3894823.html
It's because you are launching the JVM for your program with a different classpath, arguments, etc.
If you run the program in IntelliJ, you will see that the first line of the Run window is something like "C:\Program ...".
Click on it to expand it, and you will see all the arguments used when IntelliJ runs your program (I am splitting an example over several lines here).
"C:\Program Files (x86)\Java\jdk1.8.0_40\bin\java"
-Didea.launcher.port=7532
"-Didea.launcher.bin.path=C:\Program Files (x86)\JetBrains\IntelliJ IDEA 14.0.3\bin"
-Dfile.encoding=UTF-8
-classpath "C:\Program Files (x86)\Java\jdk1.8.0_40\jre\lib\charsets.jar;...etc..."
Fac
If you duplicate the exact same arguments (using the exact same JVM), then you will likely see similar performance when you run your application manually.
Your system settings for PATH, JAVA_HOME and CLASSPATH are used by default for launching your program if you don't specify them explicitly.
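To check which flags and classpath a given launch actually received, you can print them from inside the program itself via the standard java.lang.management API; running this small sketch both from IntelliJ and from the command line makes any difference obvious:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

public class JvmArgs {
    public static void main(String[] args) {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        // JVM flags (-Xmx, -D..., etc.) exactly as the JVM received them
        System.out.println("JVM args:  " + runtime.getInputArguments());
        // Effective classpath, whether it came from -classpath or the CLASSPATH variable
        System.out.println("Classpath: " + runtime.getClassPath());
    }
}
```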

libfaketime and java on RHEL 5 / RHEL 6

In order to test Java code with the date/time set into the past or future, I want to try libfaketime (currently we just adjust the system clock, but that causes a lot of trouble, like non-working Kerberos, etc.).
I am trying with this small test program:
$ cat time.java
import java.util.*;

class TimeTest {
    public static void main(String[] s) {
        long timeInMillis = System.currentTimeMillis();
        Calendar cal = Calendar.getInstance();
        cal.setTimeInMillis(timeInMillis);
        java.util.Date date = cal.getTime();
        System.out.println("Date: " + date);
    }
}
And executed this:
LD_ASSUME_KERNEL=2.6.18 LD_PRELOAD=/usr/lib64/libfaketime.so.1 FAKETIME="-15d" /opt/IBM/WebSphere/AppServer/java_1.7_64/bin/java TimeTest
Invalid clock_id for clock_gettime: -172402[root@myhost ~]#
But as you can see, I just get an error message.
The test was performed on a RHEL 6.5 server, kernel 2.6.32-431, with
libfaketime 0.9.6.
Do you have any suggestions for how I can solve this? I'm also interested in hearing about your experiences with libfaketime and Java on RHEL.
I have also reported this issue at: https://github.com/wolfcw/libfaketime/issues
Best regards,
Erling
I've observed this incorrect behaviour as well with IBM JVM 1.7.0, while on Oracle JVM 1.6.0 this works as expected.
The explanation is that the IBM JVM apparently has an internal bug which manifests as calling the clock_gettime system call with an incorrect clock_id parameter (a random negative value).
The workaround (not a fix) is to modify libfaketime.c to reset the clock_id to a valid value in the fake_clock_gettime function:
case FT_START_AT: /* User-specified offset */
    if (user_per_tick_inc_set)
    {
      /* increment time with every time() call */
      next_time(tp, &user_per_tick_inc);
    }
    else
    {
      if (clk_id < 0) { // the JVM calls clock_gettime() with an invalid random negative clock_id value
        clk_id = CLOCK_REALTIME;
      }
      switch (clk_id)
      // the rest is the same
This will prevent the libfaketime.so.1 library from exiting with the error you are observing:
printf("\nInvalid clock_id for clock_gettime: %d", clk_id);
exit(EXIT_FAILURE);
Please note this workaround has a drawback: when the JVM incorrectly asks the system for an invalid clock_id, we substitute a valid one, which may not be what the application expects.

Java 7 fails to collect permanent generation which is collected by java 5

Does anybody know why Java 7 fails to collect the permanent generation of an app, resulting in java.lang.OutOfMemoryError: PermGen, while Java 5 collects the permanent generation and the app runs fine?
The app evaluates Jython expressions in a loop; one iteration takes approx. 5 sec.
The body of the loop looks like:
PythonInterpreter py = new PythonInterpreter();
py.set("AI", 1);
((PyInteger) py.eval(expr)).getValue();
Screenshots of JVisualVM were taken for the app running on Java 7 and Java 5.
In both cases the same parameters are used:
-Xmx700m
-XX:MaxPermSize=100m
-XX:+HeapDumpOnOutOfMemoryError
-Xloggc:"C:\Temp\gc.log" -XX:+PrintGCDetails -XX:-TraceClassUnloading -XX:+PrintClassHistogram
Using a small example to reproduce the problem, I found that the program running on Java 7 outside Eclipse does not suffer from the memory leak in the permanent generation.
import org.python.core.PySystemState;
import org.python.util.PythonInterpreter;

public class Test01 {
    public static void main(String[] args) throws Exception {
        PySystemState.initialize();
        long startNanos = System.nanoTime();
        for (int i = 0; i < 450000; i++) {
            PythonInterpreter pi = new PythonInterpreter();
            long elapsedNanos = System.nanoTime() - startNanos;
            int avgStepInMicros = (int) ((elapsedNanos / 1000) / (i + 1));
            final String code = String.format(
                    "stepNo = %d + 1\n" +
                    "if stepNo %% 100 == 0:\n" +
                    " print 'stepNo: %%d, elapsedMillis: %%d, avgStepInMicros: %%d' %% (stepNo, %d, %d)",
                    i, elapsedNanos / 1000000, avgStepInMicros);
            pi.exec(code);
        }
    }
}
MAT showed a debugger thread as a garbage collection root.
Strangely, debugging the app on Java 5 does not show this problem.
One possibility for the PermGen leak is the Serializable interface implemented by each PyInteger being stored in a static class_to_type map (PyType.java:101); this is a Jython bug. The only interesting changes to PermGen allocation between Java 5 and 7 that I am aware of are the removal of interned strings in 7 and some changes to direct byte buffer memory allocation, so the temporal behaviour of your graph might instead be explained by types being unloaded on each iteration in Java 5.
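The leak pattern described above, a static map keeping class metadata strongly reachable, can be illustrated in plain Java. This is a hedged sketch of the general mechanism only (class and method names are made up), not Jython's actual PyType code:

```java
import java.util.HashMap;
import java.util.Map;

public class StaticCacheLeak {
    // A static map in the spirit of Jython's class_to_type: entries are never
    // removed, so every Class key stays strongly reachable, and on Java 7 and
    // earlier the class metadata behind it (PermGen) can never be unloaded.
    static final Map<Class<?>, String> classToType = new HashMap<>();

    static void register(Class<?> cls) {
        // putIfAbsent keeps the first mapping; nothing ever evicts it
        classToType.putIfAbsent(cls, cls.getName());
    }

    public static void main(String[] args) {
        register(Integer.class);
        register(String.class);
        System.out.println("cached classes: " + classToType.size());
    }
}
```

A WeakHashMap (weak keys) would let the collector drop an entry once the class becomes otherwise unreachable, which is the usual fix for this pattern.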

How to call a java function from python/numpy?

It is clear to me how to extend Python with C++, but what if I want to write a function in Java to be used with numpy?
Here is a simple scenario: I want to compute the average of a numpy array using a Java class. How do I pass the numpy vector to the Java class and gather the result?
Thanks for any help!
I spent some time on my own question and would like to share my answer, as I feel there is not much information on this topic on Stack Overflow. I also think Java will become more relevant in scientific computing (e.g. see the WEKA package for data mining) because of the improving performance and other good software-development features of Java.
In general, it turns out that, with the right tools, it is much easier to extend Python with Java than with C/C++!
Overview and assessment of tools to call Java from Python
http://pypi.python.org/pypi/JCC: because of the lack of proper documentation, this tool was useless to me.
Py4J: requires starting the Java process before using Python. As remarked by others, this is a possible point of failure. Moreover, not many examples of its use are documented.
JPype: although development seems to be dead, it works well and there are many examples of it on the web (e.g. see http://kogs-www.informatik.uni-hamburg.de/~meine/weka-python/ for using data-mining libraries written in Java). Therefore I decided to focus on this tool.
Installing JPype on Fedora 16
I am using Fedora 16; since there are some issues when installing JPype on Linux, I describe my approach.
Download JPype, then modify the setup.py script by providing the JDK path at line 48:
self.javaHome = '/usr/java/default'
then run:
sudo python setup.py install
After a successful installation, check this file:
/usr/lib64/python2.7/site-packages/jpype/_linux.py
and remove or rename the method getDefaultJVMPath() to getDefaultJVMPath_old(), then add the following method:
def getDefaultJVMPath():
    return "/usr/java/default/jre/lib/amd64/server/libjvm.so"
Alternative approach: do not make any changes to the file _linux.py above, but never use the method getDefaultJVMPath() (or methods which call it); instead, provide the path to the JVM directly. Note that there are several candidate paths; for example, my system also has the following paths, referring to different versions of the JVM (it is not clear to me whether the client or the server JVM is better suited):
/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre/lib/x86_64/client/libjvm.so
/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre/lib/x86_64/server/libjvm.so
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server/libjvm.so
Finally, add the following line to ~/.bashrc (or run it each time before opening a python interpreter):
export JAVA_HOME='/usr/java/default'
(The above directory is in reality just a symbolic link to my latest version of the JDK, which is located at /usr/java/jdk1.7.0_04.)
Note that all the tests shipped with JPype, i.e. JPype-0.5.4.2/test/testsuite.py, will fail (so do not worry about them).
To see if it works, test this script in python:
import jpype
jvmPath = jpype.getDefaultJVMPath()
jpype.startJVM(jvmPath)
# print a random text using a Java class
jpype.java.lang.System.out.println ('Berlusconi likes women')
jpype.shutdownJVM()
Calling Java classes from Python, also using numpy
Let's start by implementing a Java class containing some functions which I want to apply to numpy arrays. Since there is no concept of state, I use static functions so that I do not need to create any Java objects (creating Java objects would not change anything).
/**
 * Cookbook to pass numpy arrays to Java via JPype
 * @author Mannaggia
 */
package test.java;

public class Average2 {

    public static double compute_average(double[] the_array) {
        // compute the average
        double result = 0;
        for (int i = 0; i < the_array.length; i++) {
            result = result + the_array[i];
        }
        return result / the_array.length;
    }

    // multiplies array by a scalar
    public static double[] multiply(double[] the_array, double factor) {
        double[] the_result = new double[the_array.length];
        for (int i = 0; i < the_array.length; i++) {
            the_result[i] = the_array[i] * factor;
        }
        return the_result;
    }

    /**
     * Matrix multiplication.
     */
    public static double[][] mult_mat(double[][] mat1, double[][] mat2) {
        // find sizes
        int n1 = mat1.length;
        int n2 = mat2.length;
        int m1 = mat1[0].length;
        int m2 = mat2[0].length;
        // check that we can multiply
        if (n2 != m1) {
            //System.err.println("Error: The number of columns of the first argument must equal the number of rows of the second");
            //return null;
            throw new IllegalArgumentException(
                    "Error: The number of columns of the first argument must equal the number of rows of the second");
        }
        // if we can, then multiply
        double[][] the_results = new double[n1][m2];
        for (int i = 0; i < n1; i++) {
            for (int j = 0; j < m2; j++) {
                // initialize
                the_results[i][j] = 0;
                for (int k = 0; k < m1; k++) {
                    the_results[i][j] = the_results[i][j] + mat1[i][k] * mat2[k][j];
                }
            }
        }
        return the_results;
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        // test case
        double[] an_array = {1.0, 2.0, 3.0, 4.0};
        double res = Average2.compute_average(an_array);
        System.out.println("Average is = " + res);
    }
}
The name of the class is a bit misleading: we do not only compute the average of a numpy vector (method compute_average), but also multiply a numpy vector by a scalar (method multiply) and, finally, do matrix multiplication (method mult_mat).
After compiling the above Java class we can now run the following Python script:
import numpy as np
import jpype
jvmPath = jpype.getDefaultJVMPath()
# we need to specify the classpath used by the JVM
classpath='/home/mannaggia/workspace/TestJava/bin'
jpype.startJVM(jvmPath,'-Djava.class.path=%s' % classpath)
# numpy array
the_array=np.array([1.1, 2.3, 4, 6,7])
# build a JArray; note that we need to specify the Java double type using the jpype.JDouble wrapper
the_jarray2=jpype.JArray(jpype.JDouble, the_array.ndim)(the_array.tolist())
# the class was declared in package test.java, so grab that package first
testPkg = jpype.JPackage('test').java
Class_average2 = testPkg.Average2
res2=Class_average2.compute_average(the_jarray2)
np.abs(np.average(the_array)-res2) # ok perfect match!
# now try to multiply an array
res3=Class_average2.multiply(the_jarray2,jpype.JDouble(3))
# convert to numpy array
res4=np.array(res3) #ok
# matrix multiplication
the_mat1=np.array([[1,2,3], [4,5,6], [7,8,9]],dtype=float)
#the_mat2=np.array([[1,0,0], [0,1,0], [0,0,1]],dtype=float)
the_mat2=np.array([[1], [1], [1]],dtype=float)
the_mat3=np.array([[1, 2, 3]],dtype=float)
the_jmat1=jpype.JArray(jpype.JDouble, the_mat1.ndim)(the_mat1.tolist())
the_jmat2=jpype.JArray(jpype.JDouble, the_mat2.ndim)(the_mat2.tolist())
res5=Class_average2.mult_mat(the_jmat1,the_jmat2)
res6=np.array(res5) #ok
# other test
the_jmat3=jpype.JArray(jpype.JDouble, the_mat3.ndim)(the_mat3.tolist())
res7=Class_average2.mult_mat(the_jmat3,the_jmat2)
res8=np.array(res7)
res9=Class_average2.mult_mat(the_jmat2,the_jmat3)
res10=np.array(res9)
# test error due to invalid matrix multiplication
the_mat4=np.array([[1], [2]],dtype=float)
the_jmat4=jpype.JArray(jpype.JDouble, the_mat4.ndim)(the_mat4.tolist())
res11=Class_average2.mult_mat(the_jmat1,the_jmat4)
jpype.java.lang.System.out.println ('Goodbye!')
jpype.shutdownJVM()
I consider Jython to be one of the best options; it makes using Java objects from Python seamless. I actually integrated WEKA with my Python programs, and it was super easy: just import the WEKA classes and call them from the Python code as you would in Java.
http://www.jython.org/
I'm not sure about numpy support, but the following might be helpful:
http://pypi.python.org/pypi/JCC/
