How to improve GroovyClassLoader.loadClass performance?

How to improve GroovyClassLoader.loadClass performance? - java

I'm using Groovy 3.0.6 in my Java project as a scripting language, allowing the user to extend the application at runtime via groovy scripting. Some users have several hundred script snippets. I'm using GroovyClassLoader to load them (also CompileStatic, but that seems to have negligible impact).
My problem is that the parseClass method is extremely slow (more than 1 second compile time per script). Here's the profiler output of a JUnit test that compiles a bunch of scripts:
The call tree. Note that I let the test run for 724 seconds and groovy is consuming 87% of the time for compilation. It strikes me as odd that LexerATNSimulator.getReachableConfigSet consumes so much time...
The hot spots of the test run. Unsurprisingly, nothing but groovy (and java.util) here.
The source code is given to the GroovyClassLoader.parseClass(...) method as a regular String. I do have a custom code cache in place, but the compilation of a single script hits my application hard when there are hundreds of snippets to compile.
Is there anything I can do to reduce compilation times? They're crippling for the user experience at the moment.

Related

Sporadic java.lang.NoClassDefFoundError in Scala

We have a weird problem. We are using an automatic test tool. The DSL was implemented in Scala. The system which we test with this tool was written in Java, and the interface between the two components is RMI. Indeed, the interface part of the automatic test tool is also Java (the rest is Scala). We have the full control of the source code of these components.
We already have at the magnitude of thousand test cases. We execute these test cases automatically once every night, using Jenkins on a Linux server. The problem is that we sporadically receive a java.lang.NoClassDefFoundError exception. This typically happens when trying to access a Java artifacts from a Scala code.
If we execute the same test case manually, or check the result of the next nightly run, then typically the problem solves automatically, but sometimes it happens again in a completely different place. In case of some runs no such problem appears at all. The biggest problem is that the error is not reproducible; furthermore, as it happens in case of an automatic run, we have hardly any information about the exact circumstances, just the test case and the log.
Has somebody already encountered with such a problem? Do you have any idea how to proceed? Any hint or piece of information would be helpful, not only the solution of the problem. Thank you!

I found the reason of the error (99% sure). We had the following 2 Jenkins jobs:
Job1: Performs a full clean build of the tested system, written in Java, then performs a full clean build of the DSL, and finally executes the test cases. This is a long running job (~5 hours).
Job2: Performs a full clean build of the tested system, and then executes something else on it. The DSL is not involved. This is a shorter job (~1 hour).
We have one single Maven repository for all the jobs. Furthermore, some parts of the tested component is part of the interface between the two components.
Considering the time stamps the following happened:
Job1 performed the full build of both components, and started a test suite containing several test cases, which execution lasts about half an hour.
The garbage collector might swept out the components not used yet.
Job2 started the build, and it also rebuilt the interface parts, including the one swept out by garbage collector of Job1.
Job1 reached a test case which uses an interface component already swept out.
The solution was the following: we moved Job2 to an earlier time; now it finishes the job before Job1 starts the tests.

understanding Jvmstat graph

I am trying to use and understand the visualgc garbage collection monitoring tool for the first time. However, one of the multiple outputs on the graph, I couldn't make sense of. I tried this link which is an Oracle documentation on visualgc : - visualgc - Visual Garbage Collection Monitoring Tool
Could not understand :
Graph showing compile time. I am confused with what it means by compile time. As per the documentation from above link it means that total time spent on compilation process but then why does it shows : 59 complies. Does JVM complie the code multiple times? Or 59 is just the kind of number of tasks JVM did to compile only once? I thought that code is complied only once.

By "compilation" it is assumed that we're talking about Just-in-Time compilation of initially interpreted bytecode into native code. Yes, the same piece of Java bytecode may go through compilation several times. A familiar example is when a monomorphic call site's type assertion fails, forcing a recompile into a megamorphic call site. A method may have any number of call sites, so just this concern may cause any number of recompiles.
Also note that in modern architectures with dynamic bytecode generation, classes are often being generated, loaded, and unloaded as the program is running. This is another source of continuous JIT compilation.

Technial confusion between compilation and interpretation

I've read many definitions and statements about "Interpretation" and "compilation". But I am still very much confused.
Technically speaking, what is REALLY the difference between interpretation and compilation under the hood? Let me elaborate (please correct any wrong concept I might have) :
In java, the source code is "compiled" into ByteCode which is then "interpreted" and/or "just-in-time compiled" into machine code. But what is the difference between just in time compilation and interpretation? I mean, in the end, as far as my guess goes, the Host's CPU will run machine code only. Thus, in interpretation as well, instructions ARE being converted into machine code which can be understood by the CPU. So, where do we draw the line between just-in-time compilation and interpretation?
P.S. This is my conception. It might be totally wrong. In that case, Kindly excuse my stupidity and correct me.
Thanks.

1. Frankly speaking the idea that java has both Compiler and Interpreter is a myth, its the behavior of it that is marked as Compilation and Interpreter.
2. Java compiler compiles the human readable code to byte code. Which then is converted by the JIT (Just In Time Compiler) during runtime into machine level executable code.
3. During Runtime JIT identifies the runtime intensive part of the code and then converts it into machine level executable code, this part of the code is known as Hot-Spot, and thats why JIT is called as Hot-Spot compiler.
4. JIT uses the Virtual Memory Table ( V-table), which is a pointer to the method in the class. The Hot-Spot code is then converted to its machine level executable code, its address is stored here, and when this part is called again, then its directly fetched by this stored address. This behavior of JIT to keep compiling small amount of code during Run time is assumed to be Interpreted Behavior, And the JIT behaviour of storing this for later use is assumed as Compilation.
5. Virtual Memory Table also has a table which stores the address of the byte code, which can be used if needed.

When the code is compiled, the generated artifact is understandable directly by the hardware. Basically it's a machine code sent directly to the CPU. This also means that an artifact compiled against given CPU architecture won't run on another. The advantage is immediate startup and great performance.
In interpreted environments there is either no compilation at all or the result of such step is an intermediary code. This code is two abstract to be sent directly to processing unit. Instead a separate layer is needed (virtual machine, interpreter) that reads this artifact and executes it in some sandbox environment. The advantage of this approach is portability - intermediate code can run on any platform where native interpreter is available. Unfortunately the performance is almost always worse.
JIT in Java is a hybrid technology. First bytecode is interpreted, each bytecode instruction is executed by the interpreter. However at some point in time (and under some conditions) bytecode is translated into machine code and sent directly to CPU to improve performance. This approach brings best of both worlds - portability of intermediate code and speed of native code. Moreover, the JIT knows much more about runtime behaviour of your code (how many times given loop is called on average? Is this method really virtual?), so the machine code can be even faster then the one generated by an ordinary compiler (!)

You're right that eventually everything has to be converted to machine code. The basic difference is that in the case of an interpreter, this translation occurs every time the code runs, whereas a compiler does this translation ahead of time, after which the compiler is not required for running the program.
Just-in-time compilation is a combination of both, where the JIT compiler is still required for running the program, and the code is compiled at run-time.
Compilation takes time but it is advantageous when the same piece of code is run several times, for eg. in a loop. The Java HotSpot VM takes this approach further by initially interpreting bytecode directly and then JIT-compiling a piece of code once it has run a certain number of times.

Interpreters interpret code line by line, and decides the machine code at run time;
Compiler consume code by chunk, and decides the machine code at compile time;
JIT compiler is a hybrid approach, in which the code is generated at run time (but could be already cached to improve performance), but is consumed in chunk.

An interpreted environment involves instructions being executed immediately after parsing, where both the parsing and execution are done by the interpreter. This means that the machine you run the code on must have the interpreter in order to run the program.1
A compiler will parse the instructions into machine code and store them for later execution. Java however is bytecompiled2, which means that this process turns the instructions into ByteCode, which will then be used by the interpreter.

make java startup faster

Is there some way to get reasonable (not noticeable) starting times for Java, thus making it suitable for writing command line scripts (not long-lived apps)?
For a demonstation of the issue, take a simple Hello World program in Java and JavaScript (run w/ node.js) on my Macbook Pro:
$ time java T
Hello world!
real 0m0.352s
user 0m0.301s
sys 0m0.053s
$ time node T.js
Hello world!
real 0m0.098s
user 0m0.079s
sys 0m0.013s
There is a noticeable lag with the Java version, not so with Node. This makes command line tools seem unresponsive. (This is especially true if they rely on more than one class, unlike the simple T.java above.

Not likely, only thing you might be able to try is a different implementation of the JVM, but that probably won't change. Most Java apps are (relatively) long lived though and possibly interactive, which means the JVM startup time becomes lost in the noise of normal uses.

Have you actually tried timing a Java command-line app called repeatedly, though? I would expect after the first incarnation for the start-up time to be alleviated somewhat by the library classes being in the file system cache.
That said, yes, the Java platform is not one of the simplest and in any case you're not going to compete with a small native executable.
Edit: as you say that the timings above are for "warmed up" calls, then a possible workaround could be:
write the gubbins of your commands in Java
write a simple local continually running "server" that takes commands and passes them to the relevant Java routines
write a simple native command-line wrapper (or write it in something that's fast to start up) whose sole raison d'être is to pass commands on to the Java server and spit out the result.
This ain't nice, but it could allow you to write the gubbins of your routines in Java (which I assume is essentially what you want) while still keeping the command line model of invocation.

As others have said, the plain answer is just "not really". You can possibly make minor performance improvements, but you're never going to get away from the fact that the VM is going to take a while to start up and get going.
Make sure you haven't got the server VM selected for apps like this - that's one thing that really will increase the start up time.
The only real way round it is to compile Java to native code, which you can do with GCJ - so if you must write these apps in Java and you must have them faster, that might be a route to look down. Bear in mind though it's not that up-to-date and maintenance on it largely seems to be dying out too.

Haven't tried it yet but might be worth looking at nailgun. It will run your Java programs in the same JVM, so after "warming up" should be pretty fast. A "hello world" example goes from taking 0.132s to taking 0.004s
http://www.martiansoftware.com/nailgun/background.html

You can get a small speed-up with class data sharing https://rmannibucau.metawerx.net/post/java-class-data-sharing-docker-startup
A much bigger speedup should come from doing ahead-of-time compilation to a static binary using GraalVM native-image, although it's still tricky to use. A lot of libraries haven't been made compatible.

When built from eclipse, why is a java program slower than when built from command line?

I did some simple function call and string operation in a loop, the java program runs much faster under command line than launching ( Run as... ) from eclipse...
6 lines of output were printed, each line is around 120 characters.
each line is a perf result ranges from 50ms to 300ms.
The total time is a little more than 2 seconds.
"much slower" here means, for certain operations ( function call ), I see 20ms vs 300 ms.
After running on console once, the speed on eclipse catches up!
After I change and build the code in eclipse, the speed on CL will drop if I don't rebuild it with command line.
Looks like some hotspot information is only generated with CL...

Maybe it is just the eclipse console that is slower than your operating systems console?
Plus, at a total runtime of ~2 seconds, your benchmark probably is just super inaccurate.

Most likely the culprit is memory usage as a result of Eclipse loading, with the possibility that Eclipse is also doing something additional to the executable like swapping class loaders, or starting the java debugger.
I would say the most likely answer however is simply: Eclipse uses a lot of resources, especially memory, and is starving the system a bit, leading to swapping, and decreased performance. YMMV, and there's no guarantee I'm right without seeing your system, it's just my best guess.

I do agree with other comments that Eclipse is doing something when running the application and printing the console.
Eclipse has its own compiler (usually referred as Eclipse JDT) which supports incremental compilation. There is a possibility that the binary compiled by Eclipse is not optimized as it is compiled by javac.
These two compiler serves different purpose, JDT mainly enable Eclipse to provide state-of-art refactoring and auto-completion, and javac spends a lot of effort doing optimization.

I would say it's understandable that the application would run slower with all the Eclipse baggage underneath. Eclipse spawns the JVM process as a child and I am sure still does its own 'magic'.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

How to improve GroovyClassLoader.loadClass performance? - java

Related

Sporadic java.lang.NoClassDefFoundError in Scala

understanding Jvmstat graph

Technial confusion between compilation and interpretation

make java startup faster

When built from eclipse, why is a java program slower than when built from command line?

Categories

Resources