I want to compare the performance of two different implementations (Python and Java) of the same algorithm. I run scripts on the terminal (using Ubuntu 18) like this:
time script_name
I'm not sure how accurate this is. Is it possible to increase the accuracy of this benchmark? Perhaps there is a way to remove any restrictions or set-up in Python or Java?
As explained in this answer, the correct way to benchmark a program using time is the following command:
sudo chrt -f 99 /usr/bin/time --verbose <benchmark>. However, note that this will only be accurate if the algorithm takes at least a second to execute, as otherwise the exec call might take up a big part of the benchmark.
This answer suggests using perf stat instead, as such:
perf stat -r 10 -d <your app and arguments>.
The accuracy of the time command is probably fine for most testing.
But if you want more control, you could write a second Python script that imports subprocess and uses subprocess.call() (or one of several related functions in the subprocess module) to run the two versions of the algorithm.
Your timing script could also import time and call time.perf_counter() before and after the algorithm runs, to show how much time passed.
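A minimal sketch of such a timing script, assuming Python 3. The function name time_command and the placeholder command are made up for illustration; substitute the real Python and Java invocations. It uses time.perf_counter() for the wall-clock measurement and repeats each run, since a single measurement is noisy:

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=5):
    """Run cmd `repeats` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command fails, so a failed run is never timed silently.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder command; replace with e.g. ["python3", "algo.py"] or ["java", "Algo"].
    candidates = {"python-noop": [sys.executable, "-c", "pass"]}
    for name, cmd in candidates.items():
        runs = time_command(cmd)
        print(f"{name}: min={min(runs):.4f}s median={statistics.median(runs):.4f}s")
```

Note that this measures the whole process, including interpreter or JVM startup, just like time does.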
I am running into an issue where Java is slow when used over SSL. The solution is to add -Djava.security.egd=file:/dev/./urandom to the java command line. Since I have multiple JVMs, I don't want to modify every single JVM invocation to contain this string, and hence would like to add it to the file $JAVA_HOME/jre/lib/security/java.security
Now, the java.security file already contains securerandom.source=file:/dev/urandom
Two questions on this:
Why and how is "/dev/urandom" different from "/dev/./urandom"? Why doesn't Java accept "/dev/urandom"?
For the JVMs that I have running, how can I tell whether they are using the correct urandom device (vs. random)?
This is actually a hack introduced into the JVM back in the 1.3 or 1.4 days:
http://bugs.sun.com/view_bug.do?bug_id=4705093
http://bugs.sun.com/view_bug.do?bug_id=6202721
The basic issue is that in the native JVM code they hardcoded /dev/urandom to actually use /dev/random to attempt to ensure sufficient entropy. Since /dev/urandom is supposed to be guaranteed not to block, this has the unintended consequence of blocking if not enough entropy is available.
The hardcoding looks specifically for the string /dev/urandom, so providing something that resolves to the same thing but doesn't match that causes the desired behavior. If you code /dev/./urandom you bypass the hardcoded aliasing and get to the intended urandom entropy source.
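To illustrate why the extra "./" matters, here is a small Python sketch. The function matches_hardcoded_name is a hypothetical model of a literal string comparison as described above, not the JVM's actual code. The point is only that both spellings name the same device, while a literal comparison sees two different strings:

```python
import os.path

# The exact string that, per the explanation above, gets special-cased.
HARDCODED_NAME = "/dev/urandom"


def matches_hardcoded_name(configured_path: str) -> bool:
    """Hypothetical model of a literal string match; not the JVM's real code."""
    return configured_path == HARDCODED_NAME


for path in ("/dev/urandom", "/dev/./urandom"):
    # normpath collapses the redundant "./" component, as the filesystem would.
    print(path, "->", os.path.normpath(path), "| literal match:", matches_hardcoded_name(path))
```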
Using /dev/urandom (non-blocking) is fine for most cases. Personally, I would configure a hardware random number generator to be sure there is enough entropy (see below).
If you have to use /dev/random (blocking) and cannot afford to block, you should make sure there is always enough entropy. One solution is to configure a hardware random number generator.
Assuming you're on Linux, you can check the available entropy with:
cat /proc/sys/kernel/random/entropy_avail
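If you would rather poll this value from a script, something along these lines should work. It is only a sketch: the function name read_entropy_avail is made up, and it returns None on systems where the file does not exist or cannot be parsed.

```python
from pathlib import Path
from typing import Optional

ENTROPY_FILE = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_avail(path: str = ENTROPY_FILE) -> Optional[int]:
    """Return the kernel's entropy estimate, or None if the file is unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Not Linux, /proc not mounted, or unexpected file contents.
        return None


if __name__ == "__main__":
    print("entropy_avail:", read_entropy_avail())
```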
If you are on a machine that has a hardware random number generator, you most probably want to install rngd. You can check whether your CPU has one by issuing the command:
cat /proc/cpuinfo
Look for a flag called rdrand. You can also check whether the file /dev/hwrng is present. You might have to (or want to) load the corresponding module:
ls /lib/modules/*/kernel/drivers/char/hw_random
For me this is:
sudo modprobe tpm-rng
To make it permanent:
echo tpm-rng | sudo tee -a /etc/modules
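The two hardware checks described above (CPU flag and /dev/hwrng) can also be scripted. This is a rough sketch with made-up function names; it assumes an x86 machine, where /proc/cpuinfo has a "flags" line and the CPU's hardware RNG instruction shows up as rdrand:

```python
import os


def cpu_has_flag(cpuinfo_text: str, flag: str = "rdrand") -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists `flag`."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Compare whole words so that e.g. "rand" does not match "rdrand".
        if key.strip() == "flags" and flag in value.split():
            return True
    return False


def hwrng_device_present(device: str = "/dev/hwrng") -> bool:
    """Return True if the hardware RNG device node exists."""
    return os.path.exists(device)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print("rdrand flag:", cpu_has_flag(handle.read()))
    except OSError:
        print("rdrand flag: /proc/cpuinfo not readable")
    print("/dev/hwrng present:", hwrng_device_present())
```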
If you happen to be on Ubuntu/Debian just install the package rng-tools.
sudo aptitude install rng-tools
If you check your entropy before and after installing rng-tools you should see a significant increase.
The following command should show you available entropy sources:
sudo rngd -f -r /dev/hwrng -v
Note that if you need better security, you want to mix multiple entropy sources. I'm not sure whether rng-tools supports this.
I have a Java program which is launched through command-line by a Bash script, which is in turn called at various intervals by cron.
There are several operations performed by this program, the first being the copy of a possibly large number of more or less large files. (Anything from 10000 files of 30 KB to 1 big 1 GB file, but both of these are edge cases.)
I am curious about how this step should be accomplished to ensure performance (as in speed).
I can use either the cp command from Bash, or Java 7's Files.copy(). I will run my own tests, but I'm wondering if someone has any comparison data I could take into account before deciding on an implementation?
Is there some way to get reasonable (not noticeable) starting times for Java, thus making it suitable for writing command line scripts (not long-lived apps)?
For a demonstration of the issue, take a simple Hello World program in Java and JavaScript (run with node.js) on my Macbook Pro:
$ time java T
Hello world!
real 0m0.352s
user 0m0.301s
sys 0m0.053s
$ time node T.js
Hello world!
real 0m0.098s
user 0m0.079s
sys 0m0.013s
There is a noticeable lag with the Java version, not so with Node. This makes command line tools seem unresponsive. (This is especially true if they rely on more than one class, unlike the simple T.java above.)
Not likely. The only thing you might be able to try is a different implementation of the JVM, but that probably won't change much. Most Java apps are (relatively) long lived though and possibly interactive, which means the JVM startup time becomes lost in the noise of normal use.
Have you actually tried timing a Java command-line app called repeatedly, though? I would expect after the first incarnation for the start-up time to be alleviated somewhat by the library classes being in the file system cache.
That said, yes, the Java platform is not one of the simplest and in any case you're not going to compete with a small native executable.
Edit: as you say that the timings above are for "warmed up" calls, then a possible workaround could be:
write the gubbins of your commands in Java
write a simple local continually running "server" that takes commands and passes them to the relevant Java routines
write a simple native command-line wrapper (or write it in something that's fast to start up) whose sole raison d'être is to pass commands on to the Java server and spit out the result.
This ain't nice, but it could allow you to write the gubbins of your routines in Java (which I assume is essentially what you want) while still keeping the command line model of invocation.
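To make the shape of that workaround concrete, here is a rough sketch of the server/wrapper split. It is written in Python purely for illustration (in the scenario above the server side would be Java), and every name in it (handle_command, start_server, send_command) is made up. The server stays resident and answers one line per connection; the wrapper only connects, sends the command, and returns the reply:

```python
import socket
import socketserver
import threading


def handle_command(command: str) -> str:
    """Hypothetical stand-in for the real routines living in the resident process."""
    return command.upper()


class CommandHandler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        # One request per connection: read a line, reply with a line.
        command = self.rfile.readline().decode("utf-8").strip()
        self.wfile.write((handle_command(command) + "\n").encode("utf-8"))


def start_server(host: str = "127.0.0.1", port: int = 0) -> socketserver.TCPServer:
    """Start the resident server in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), CommandHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_command(command: str, host: str, port: int) -> str:
    """What the thin command-line wrapper does: send the command, return the reply."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall((command + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reply:
            return reply.readline().strip()


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    print(send_command("hello", host, port))
    server.shutdown()
    server.server_close()
```

In a real setup the server would listen on a fixed, local-only port or a Unix socket, and the wrapper would be a separate small executable, so only the wrapper's startup cost is paid per invocation.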
As others have said, the plain answer is just "not really". You can possibly make minor performance improvements, but you're never going to get away from the fact that the VM is going to take a while to start up and get going.
Make sure you haven't got the server VM selected for apps like this - that's one thing that really will increase the start up time.
The only real way round it is to compile Java to native code, which you can do with GCJ - so if you must write these apps in Java and you must have them faster, that might be a route to look down. Bear in mind though it's not that up-to-date and maintenance on it largely seems to be dying out too.
Haven't tried it yet but might be worth looking at nailgun. It will run your Java programs in the same JVM, so after "warming up" should be pretty fast. A "hello world" example goes from taking 0.132s to taking 0.004s
http://www.martiansoftware.com/nailgun/background.html
You can get a small speed-up with class data sharing https://rmannibucau.metawerx.net/post/java-class-data-sharing-docker-startup
A much bigger speedup should come from doing ahead-of-time compilation to a static binary using GraalVM native-image, although it's still tricky to use. A lot of libraries haven't been made compatible.
I have a question related to PHP and Java programming.
I am going to develop a web application to do unit testing.
PHP is the language I'll use for the web side, and it will call the exec() function.
As I read, the function returns the output of the execution, which I do need.
But that's not enough: I would also like to know how much memory is used during the execution.
The web application will run on an Apache web server on a native Linux operating system (preferably Ubuntu).
This is the second case:
If there is a Java source file containing a program that requires user input during execution, how can I execute it via a web server and also pass all the lines that act as the user input?
The next problem is that the exec() function only accepts parameters inline.
How if I want
So, does anyone have an idea how to do these things?
The /usr/bin/time program (documented in time(1)) can return the amount of memory used during execution:
$ /usr/bin/time echo hello
hello
0.00user 0.00system 0:00.00elapsed ?%CPU (0avgtext+0avgdata 2512maxresident)k
0inputs+0outputs (0major+205minor)pagefaults 0swaps
You can see that the echo(1) program required 2.5 megabytes of memory and ran very quickly. Larger programs will be more impressive:
$ /usr/bin/time jacksum --help
Unknown argument. Use -h for help. Exit.
Command exited with non-zero status 2
0.08user 0.03system 0:00.87elapsed 12%CPU (0avgtext+0avgdata 57456maxresident)k
25608inputs+64outputs (92major+4072minor)pagefaults 0swaps
jacksum is a Java-based program, so it took 57 megabytes to tell me I screwed up the command line arguments. That's more like it.
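If you need the number programmatically rather than by eye, the maxresident figure can be pulled out of that report with a regular expression. A small sketch in Python (the function name parse_max_resident_kb is made up, and it assumes the default GNU time output format shown above):

```python
import re
from typing import Optional

# Matches e.g. "2512maxresident)k" in the default GNU time report.
_MAX_RESIDENT = re.compile(r"(\d+)maxresident\)k")


def parse_max_resident_kb(time_output: str) -> Optional[int]:
    """Return the maximum resident set size in kilobytes, or None if not found."""
    match = _MAX_RESIDENT.search(time_output)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    report = (
        "0.00user 0.00system 0:00.00elapsed ?%CPU "
        "(0avgtext+0avgdata 2512maxresident)k"
    )
    print(parse_max_resident_kb(report), "KB")
```

Keep in mind that /usr/bin/time writes its report to stderr, so whatever launches the command has to capture stderr as well as stdout.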
You might also find the BSD process accounting system worthwhile. See lastcomm(1), sa(8), and dump-acct(8) for more information.
I'm trying to figure out what is more efficient in terms of server load, which is pretty big at the moment, so additional load wouldn't be a great idea. Here is what I need to do:
I have a log file which changes, sometimes every second and other times every few minutes, which is not really relevant to this question. I'm trying to find out whether it is more efficient to start a Java program from a cron job or to write a shell script that cron executes instead. This is all under Linux. Which is the better idea?
Checking log files is mostly I/O anyway, so the actual CPU time in both cases is negligible here. So what matters is the startup time, and spawning a shell script in Linux is orders of magnitude faster than starting up the JVM.
As Peter said, it's dependent on what the program is supposed to do.
Generally, starting a Java program has quite a bit of overhead in comparison to a shell script. If some complex operations have to be done, however, Java may well be your best bet.
I personally would choose Python for scripts for which shell is not really suited and for which Java may be overkill :)
Also be aware that system administrators can often read and understand shell scripts quite well, but Java is a different matter. That may be a problem, or maybe not.
In general Perl is considered to be best for text parsing, but since all you need to do is print the changes to console you should simply do a tail -f rather than cron jobs etc.
There are many tools in *nix that parse files efficiently, e.g. grep and tail. These tools are coded in C with very efficient algorithms for parsing files. Definitely go for these shell tools. No Java please. Firstly, starting it up is slow. You can't compare it with running a C program like grep. Secondly, you will find it troublesome (in terms of compilation) to troubleshoot your script if anything goes wrong.
If you have to spawn n^2 processes for n lines of log files, you have to optimize even beyond bash. If you want to spawn only a single process, the language will not make a difference, imho.
I would rather check out in which language I/O and text processing is better for your needs. If a simple text processing in Java is too much on a "server", doing the same in assembly will be too much as well.