Suppose I write a program using immutable data structures in Java. Even though it is not a functional language, it should be able to execute parallely. How do I ensure that my program is being executed using all the cores of my processer? How does the computer decide which code can be run parallely?
P.S. My intent in asking this question was not to find out how to parrallelize java programs. But to know - how does the computer parallelize code. Can it do it in a functional program written in a non functional language?
Java programs are parallelized through threads. The computer can't magically figure out how to distribute the pieces of your application across all the cores in an imperative language like Java. Only a functional language like Erlang or Haskell could do that. Read up on Java threads.
I am not aware of automatic parallelization JVMs. They do exist for other languages such as FORTRAN.
You might find the JSR166y fork-join framework scheduled for JDK7 interesting.
i dont think you can "force" the JVM to parallelize your program, but having a separate thread executing each "task", if you can break down your program that way, would probably do the trick in most cases? parallelism is still not guaranteed however.
You can write functions with automatically parallelise tasks, it is fairly easy to do for specific cases, however I am not aware of any built-in Java API which does this. (Except perhaps the Executor/ExecutorService)
Something that I used in school that did alot of the work for you.
http://www.cs.rit.edu/~ark/pj.shtml
It has the capability to do SMP or message based parallelism.
Whether or not it is useful to you is a different question :-)
Related
Does anybody know of a way to lock down individual threads within a Java process to specific CPU cores (on Linux)? I've done this in C, but can't find how to do this in Java. My instincts are that this will require a JNI call, but I was hoping someone here might have some insight or might have done it before.
Thanks!
You can't do this in pure java. But if you really need it -- you can use JNI to call native code which do the job. This is the place to start with:
http://ovatman.blogspot.com/2010/02/using-java-jni-to-set-thread-affinity.html
http://blog.toadhead.net/index.php/2011/01/22/cputhread-affinity-in-java/
UPD: After some thinking, I've decided to create my own class for this: ThreadAffinity.java It's JNA-based, and very simple -- so, if you want to use it in production, may be you should spent some time making it more stable, but for benchmarking and testing it works well as is.
UPD 2: There is another library for working with thread affinity in java. It uses same method as previously noted, but has another interface
I know it's been a while, but if anyone comes across this thread, here's how I solved this problem. I wrote a script that would do the following:
"jstack -l "
Take the results, find the "nid"'s of the threads I want to manually lock down to cores.
Taskset those threads.
You might want to take a look at https://github.com/peter-lawrey/Java-Thread-Affinity/blob/master/src/test/java/com/higherfrequencytrading/affinity/AffinityLockBindMain.java
IMO, this will not be possible unless you use native calls. JVM is supposed to be platform independent, any system calls done to achieve this will not result in a portable code.
It's not possible (at least with plain Java).
You can use thread pools to limit the amount of threads (and therefore cores) used for different types of work, but there is no way to specify a core to use.
There is even the (small) possibility that your Java runtime doesn't support native threading for your OS or hardware. In this case, green threads are used and only one core will be used for the whole JVM.
Since Python has some issues with GIL, Java is better for developing multiprocessing applications. Could you please justify the exact reasoning of java's effective processing than python in your way?
The biggest problem in multithreading in CPython is the Global Interpreter Lock (GIL) (note that other Python implementations don't necessarily share this problem!)
The GIL is an implementation detail that effectively prevents parallel (simultaneous) execution of separate threads in Python. The problem is that whenever Python byte code is to be executed, then the current thread must have acquired the GIL and only a single thread can have the GIL at any given moment.
So if 5 threads are trying to execute some Python byte code, then they will effectively run interleaved, because each one will have to wait for the GIL to become available. This is not usually a problem with single-core computers, as the physical constraints have the same effect: only a single thread can run at a time.
In multi-core/SMP computers, however this becomes a bottleneck. These days almost everthing is running on multiple cores, including effectively all smartphones and even many embedded systems.
Java has no such restrictions, so multiple threads can execute at the exact same time.
I would disagree that Python is not better than Java for Multi-Processing application.
First, I am assuming that the OP is using 'better' to mean 'faster code execution' as far as I can tell.
I suffer from 'speed-freak' syndrome, probably from having come from a C/ASM background, so I have spent considerable time getting to the bottom of the "is Python slow?" issue.
The simple answer to that? "It can be." Here's some important points:
1) With a multi-threadded application, Python is going to have a disadvantage to any language that doesn't have something similar to the GIL. The GIL is an artifact of the Python VM in CPython, not the Python language itself. Some Python VM's like Jython, IronPython, etc do not have a GIL.
2) In a Multi-Process application, the GIL doesn't really apply, and thus you can now start to harness faster execution of your Python code unmolested for the most part by the GIL. I strongly suggest if you want to write large Python code that needs both speed and concurrency, that you learn about Multi-Processing, and possibly ZMQ/0MQ for message passing.
3) Regardless of the GIL, Java displays faster code execution than Python in many areas. This is due to native differences in how Python handles objects in memory:
A number of Python functions create copies of objects in memory rather than modifying them ( see http://www.skymind.com/~ocrow/python_string/ for examples)
Python uses Dict to store attributes for objects, etc. I don't want to distract and delve into these areas, but I can generally say that some of the 'neat' things that Python can do come at a speed cost. It's also important to know that there are ways around the default behaviour if that is causing too high of a speed penalty for you.
4) Some of Java's speed advantage is due to more optimization in the Java VM over Python as far as I can tell. Once you eliminate the differences in how much behind-the-scenes memory/object work is done, Java can often still beat Python. Is it because Java has had more attention than Python? I'm not sure, with enough funding I feel that CPython could be faster.
Check http://c2.com/cgi/wiki?PythonProblems for more discussion on some of these issues.
I will say that I have decided to embrace Python nearly 100% going forward with new code.
Don't fall into the premature optimization trap, and remember you can always call C code in a pinch. Make your code work well, make it maintainable, then start to optimize once the speed of the application isn't fast enough for your needs.
Interesting Benchmarks:
http://benchmarksgame.alioth.debian.org/u64/python.php
Further information about Python speed issues can be found here:
http://www.infoworld.com/d/application-development/van-rossum-python-not-too-slow-188715
What is the easiest way in Java for achieving multi-core? And by this, I mean, to specifically point out on what core to execute some parts of the project, so good-old "normal" java threads are not an option.
So far, I was suggested JConqurr (which is an Eclipse toolkit for multi-core programming in java), JaMP (which extends Java for OpenMP), and MPJ express, of which I don't know much. Which of the above do you consider the best, or do you have other suggestions? It would be preferable to somehow mesure the performance boost/gain, but not exclusive.
Any help would be much appreciated.
Thanks,
twentynine.
Even though it is easy to write multi-threaded code in Java, there is nothing in the Java standard runtime which generically allow you to tell the JVM or the operating system how to schedule your program.
Hence you will need to have code specifically for your JVM and/or your operating system, and that code may not be doable in Java (unless you dive into JNI or JNA). External programs can pin processes to a CPU in many Unix versions (and probably Windows too), but I don't think you can do this for individual threads.
Scala is quite popular for this. It runs on the JVM and has bindings for Java to hook them together.
I have a choice to develop an app which will rely heavily on threading (up to to 200). I know I can use other Ruby interpreters for threading such as JRuby. But there are 2 things:
1) Jruby doesn't support 1.9 yet, so that is a no. Is there any other non-green thread interpreter that supports at least 1.9 as that is a prerequisite for me if I use Ruby.
2) Even using an interpreter such as Jruby, would I really get decent thread perfomance that I can get in Java? Perhaps I should just use Java for this application.
Note: this isn't an attempt at subjective discussion. It is for advice regarding thread performance only. Also, this isn't Java vs Ruby or anything of that nature. I am newer to Ruby and hoping to clear this up for my own benefit, thanks.
You should really benchmark it.
Are your threads going to be doing a lot of simultaneous computation? Then you'd probably need native threads. But if you are going to be waiting for IO all the time, then maybe Ruby's green threads are fine.
Even with this advice, you should cook up a small test program and see if the straightforward way (just using Ruby 1.9) will work.
Does anybody know of a way to lock down individual threads within a Java process to specific CPU cores (on Linux)? I've done this in C, but can't find how to do this in Java. My instincts are that this will require a JNI call, but I was hoping someone here might have some insight or might have done it before.
Thanks!
You can't do this in pure java. But if you really need it -- you can use JNI to call native code which do the job. This is the place to start with:
http://ovatman.blogspot.com/2010/02/using-java-jni-to-set-thread-affinity.html
http://blog.toadhead.net/index.php/2011/01/22/cputhread-affinity-in-java/
UPD: After some thinking, I've decided to create my own class for this: ThreadAffinity.java It's JNA-based, and very simple -- so, if you want to use it in production, may be you should spent some time making it more stable, but for benchmarking and testing it works well as is.
UPD 2: There is another library for working with thread affinity in java. It uses same method as previously noted, but has another interface
I know it's been a while, but if anyone comes across this thread, here's how I solved this problem. I wrote a script that would do the following:
"jstack -l "
Take the results, find the "nid"'s of the threads I want to manually lock down to cores.
Taskset those threads.
You might want to take a look at https://github.com/peter-lawrey/Java-Thread-Affinity/blob/master/src/test/java/com/higherfrequencytrading/affinity/AffinityLockBindMain.java
IMO, this will not be possible unless you use native calls. JVM is supposed to be platform independent, any system calls done to achieve this will not result in a portable code.
It's not possible (at least with plain Java).
You can use thread pools to limit the amount of threads (and therefore cores) used for different types of work, but there is no way to specify a core to use.
There is even the (small) possibility that your Java runtime doesn't support native threading for your OS or hardware. In this case, green threads are used and only one core will be used for the whole JVM.