I need to parallelize a CPU-intensive Java application on my multicore desktop, but I am not very comfortable with thread programming. I looked at Scala, but that would mean learning a new language, which is really time-consuming. I also looked at the Ateji PX Java parallel extensions, which seem very easy to use, but I have not yet had a chance to evaluate them. Would anyone recommend it? Other suggestions welcome.
Thanks in advance for your help
Bill
I would suggest you try the built-in ExecutorService for distributing multiple tasks across multiple threads/cores. Do you have any requirements which this might not do for you?
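For example, a minimal sketch of how that might look (expensiveComputation here is a hypothetical stand-in for your actual work units):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDemo {
    public static void main(String[] args) throws Exception {
        // Size the pool to the number of available cores.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        // Submit independent tasks; each returns a partial result.
        List<Future<Long>> results = new ArrayList<Future<Long>>();
        for (int i = 0; i < 100; i++) {
            final int chunk = i;
            results.add(pool.submit(new Callable<Long>() {
                public Long call() {
                    return expensiveComputation(chunk);
                }
            }));
        }

        // Collect the results; get() blocks until each task finishes.
        long total = 0;
        for (Future<Long> f : results) {
            total += f.get();
        }
        pool.shutdown();
        System.out.println("Total: " + total);
    }

    // Hypothetical stand-in for the CPU-intensive work being parallelized.
    static long expensiveComputation(int chunk) {
        long sum = 0;
        for (int i = 0; i < 1000000; i++) sum += (chunk + i) % 7;
        return sum;
    }
}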
The Java concurrency utilities:
http://download.oracle.com/javase/1.5.0/docs/guide/concurrency/overview.html
make parallel programming in Java even easier than it already was. I would suggest starting there - if you are uncomfortable with that level of working with threads, I would think twice about proceeding further. Parallelizing anything requires some level of technical comfort with how concurrent computation is done and coordinated. In my opinion, it can't get much easier than that framework - which is part of the reason why you see so few alternatives.
Second, the main thing you should think about is what the unit of work is for parallelization. If your units of work are independent (i.e., each parallel task does not impact the others), things are generally far easier, because you don't need to worry much (or at all) about synchronization. Put effort into thinking about how to model the problem so that the computation is as independent as possible. If you model it well, you will almost certainly reduce the lines of code (which reduces errors, etc.).
Admittedly, frameworks that automatically parallelize for you are less error-prone, but they can be suboptimal if your unit of work doesn't fit their parallelization scheme.
I am the lead developer of Ateji PX. As you mention, guaranteeing thread safety is an important topic. It is also a very difficult one, and there's not much help out there besides hand-written and hand-checked @ThreadSafe annotations. See e.g. "The Problem with Threads".
We are currently working on a parallel verifier for Ateji PX. This has become possible because parallelism in Ateji PX is compositional (unlike threads) and based on a sound mathematical foundation, namely pi-calculus. Even without a tool, experience shows that expressing parallelism in an intuitive and compositional way makes it much easier to "think parallel" and catch errors earlier.
I browsed quickly through the Ateji PX web site. It seems to be a nice product, but I'm afraid you will be disappointed at some point, since Ateji PX only provides an intuitive, simple way of performing high-level parallel operations, such as distributing the workload across several workers, creating rendezvous points between parallel tasks, etc. However, as you can read in the FAQ, in the section "How do you detect and prevent data dependencies?", Ateji PX does not ensure that the underlying code is thread safe. So at any rate you'll still need skills in Java thread programming.
Edit:
Also consider that when maintenance time comes and you aren't available to perform it, it'll be easier to find a contractor, employee or trainee with skills in standard Java multithreaded programming than in Ateji PX.
One last word: there's a free 30-day evaluation, so try it.
Don't worry: Java 7 is coming with the Fork/Join framework by Doug Lea for parallelizing work across cores.
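As a taste of it, here's a minimal Fork/Join sketch (summing an array by recursive splitting; the threshold and workload are just illustrative):

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Recursively splits the summation until chunks are small enough to compute directly.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10000;
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) / 2;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                          // run the left half asynchronously
        return right.compute() + left.join(); // compute the right half, then join
    }
}

class ForkJoinDemo {
    public static void main(String[] args) {
        long[] data = new long[1000000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        ForkJoinPool pool = new ForkJoinPool();
        System.out.println(pool.invoke(new SumTask(data, 0, data.length)));
    }
}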
I'm opening this question since I can't find easy-to-understand, summarized information about this topic. There isn't even a good YouTube video that explains it.
I'm currently studying real-time programming, and static and dynamic scheduling is part of it. I just can't seem to get my head around it.
If someone could explain the advantages and disadvantages of static and dynamic scheduling in an educational way, that would be really helpful.
What I've got so far is the following:
Static scheduling:
An off-line approach where a schedule is generated manually. It can be modified at run-time, but this isn't recommended, because it can then cause threads to miss their deadlines. It's easy to implement and to analyze; and because it's easy to analyze, it's easy to see whether the system will meet all of its deadlines.
Dynamic scheduling:
An on-line approach where the schedule is generated automatically. It can be modified at run-time by the system, and this shouldn't (in most cases) cause threads to miss their deadlines. If the system changes, it's easy to generate a new schedule, since it's produced automatically. However, there is no guarantee that the system meets all of its deadlines.
Can anyone explain these two a bit better than I have, or perhaps add more information about them? Maybe illustrate them with an image so it'll be easier to wrap my head around.
In simple terms,
Static scheduling is the mechanism where we have already controlled, in our code, the order/way that threads/processes execute (at compile time). If you have used any control (locks, semaphores, joins, sleeps) over threads in your program to achieve some goal, then you have intentionally used static (compile-time) scheduling.
Dynamic scheduling is the mechanism where thread scheduling is done by the operating system, based on whatever scheduling algorithm is implemented at the OS level. The execution order of the threads is then completely dependent on that algorithm, unless we impose some control on it (with static scheduling).
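A small sketch of the contrast in Java (using "static" in this answer's loose sense of programmer-imposed ordering):

public class SchedulingDemo {
    public static void main(String[] args) throws InterruptedException {
        // "Static" control: we force A to finish before B starts.
        Thread a = new Thread(new Runnable() {
            public void run() { System.out.println("task A"); }
        });
        Thread b = new Thread(new Runnable() {
            public void run() { System.out.println("task B"); }
        });
        a.start();
        a.join();   // block until A completes
        b.start();
        b.join();

        // "Dynamic" scheduling: both run freely; the OS decides the
        // interleaving, so the output order is unpredictable.
        Thread c = new Thread(new Runnable() {
            public void run() { System.out.println("task C"); }
        });
        Thread d = new Thread(new Runnable() {
            public void run() { System.out.println("task D"); }
        });
        c.start();
        d.start();
    }
}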
I don't think 'advantages' is the best term here. Simply put, when you implement any control over threads in your code to achieve some task, make sure you use as few controls as possible, and in the most optimized way. :))
Addition:
Comparison between Static & Dynamic Scheduling
Generally we would never have a computer program that depends entirely on just one of static or dynamic scheduling.
Instead, some programs are pretty much controlled from the code itself (strongly static). This would be a good example of that.
And some programs are strongly dynamic (weakly static). This would be a good example of that. There, apart from the start of the 2 threads, the rest of the program's execution is a free flyer.
Please don't try to find a strict criterion that would label a program as either strongly static or strongly dynamic. :))
Positives & Negatives
Dynamic scheduling is faster in execution than static scheduling, since it's basically a free flyer without any intentional waits, joins, etc. (any kind of synchronization/protection between threads).
Dynamic scheduling is not aware of any thread dependencies (safety, synchronization, etc.). If you followed the sources I mentioned above, you probably get the idea.
So generally, how good a multithreading programmer you are depends on how few restrictions, dependencies and bottlenecks you have imposed on your threads while still achieving your task successfully. :))
I think I have covered quite a few things. Please ask me questions if you have any. :))
Dynamic scheduling:
Main advantages (pros):
- Enables handling of cases where dependences are unknown at compile time
- Simplifies the compiler
- Allows compiled code to run efficiently on a different pipeline
Disadvantages (cons):
- Significant increase in hardware complexity
- Increased power consumption
- Could generate imprecise exceptions
With static scheduling, the order of the threads or processes is controlled by the compiler, so it is fixed at compile time.
A data dependency involving memory cannot be resolved or even recognised at compile time, which is why the concept of dynamic scheduling was introduced.
Dynamic scheduling also determines the order of execution, but here the hardware does it rather than the compiler.
Hi.
I searched for a while and stumbled on the lack of a specific instrument. There are many good, even free, profilers for Java, which let you see how much time the various parts of the code took overall, but nothing for the tasks below.
Given.
A Java multithreaded program, doing some calculations through sun.misc.Unsafe, using a simple fork-join pool and a ThreadPoolExecutor, running on one Oracle JVM on an Intel Xeon-based server-class node. Nothing looks hard, and the hot spots are obvious. I need to figure out whether any optimizations are possible, which of them are promising, and assess their potential.
Needed.
For this, among other information, I need decent insight into what exactly happens inside the hardware. To start the research, I want at least cache misses, memory bandwidth overhead, and overall timings of byte-code or hardware instructions. This would be very useful for avoiding a "box ticking" approach. To clarify what exactly I need, here is the wiki link of a similar instrument which I used for legacy C and Fortran code: https://en.wikipedia.org/wiki/VTune (look briefly at the "Hardware event sampling" paragraph in particular); it fulfilled all my purposes.
Thanks in advance.
Since Python has some issues with the GIL, Java is supposedly better for developing multiprocessing applications. Could you please explain exactly why Java's parallel processing is more effective than Python's?
The biggest problem with multithreading in CPython is the Global Interpreter Lock (GIL). (Note that other Python implementations don't necessarily share this problem!)
The GIL is an implementation detail that effectively prevents parallel (simultaneous) execution of separate threads in Python. The problem is that whenever Python byte code is to be executed, the current thread must have acquired the GIL, and only a single thread can hold the GIL at any given moment.
So if 5 threads are trying to execute some Python byte code, they will effectively run interleaved, because each one has to wait for the GIL to become available. This is not usually a problem on single-core computers, where the physical constraints have the same effect: only a single thread can run at a time.
On multi-core/SMP computers, however, this becomes a bottleneck. These days almost everything runs on multiple cores, including effectively all smartphones and even many embedded systems.
Java has no such restrictions, so multiple threads can execute at the exact same time.
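For example, in this minimal sketch the CPU-bound loops genuinely run in parallel, one per core, which equivalent threads in CPython cannot do:

public class NoGilDemo {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        Thread[] workers = new Thread[cores];
        final long[] results = new long[cores];

        for (int i = 0; i < cores; i++) {
            final int id = i;
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    // Pure CPU work: with no GIL, these loops execute
                    // simultaneously, one per core.
                    long sum = 0;
                    for (long j = 0; j < 100000000L; j++) sum += j % 3;
                    results[id] = sum;
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println("All workers ran in parallel.");
    }
}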
I would disagree with the premise that Python is worse than Java for multi-processing applications.
First, I am assuming that the OP is using 'better' to mean 'faster code execution' as far as I can tell.
I suffer from 'speed-freak' syndrome, probably from having come from a C/ASM background, so I have spent considerable time getting to the bottom of the "is Python slow?" issue.
The simple answer to that? "It can be." Here's some important points:
1) In a multithreaded application, Python is at a disadvantage compared to any language that doesn't have something similar to the GIL. The GIL is an artifact of the Python VM in CPython, not of the Python language itself. Some Python VMs, like Jython and IronPython, do not have a GIL.
2) In a multi-process application, the GIL doesn't really apply, so you can now start to harness faster execution of your Python code, for the most part unmolested by the GIL. I strongly suggest that if you want to write large Python code that needs both speed and concurrency, you learn about multiprocessing, and possibly ZMQ/0MQ for message passing.
3) Regardless of the GIL, Java shows faster code execution than Python in many areas. This is due to native differences in how Python handles objects in memory:
A number of Python functions create copies of objects in memory rather than modifying them (see http://www.skymind.com/~ocrow/python_string/ for examples).
Python uses a dict to store object attributes, etc. I don't want to get side-tracked and delve into these areas, but I can generally say that some of the 'neat' things Python can do come at a speed cost. It's also important to know that there are ways around the default behaviour if it causes too high a speed penalty for you.
4) Some of Java's speed advantage is due to more optimization in the Java VM than in Python's, as far as I can tell. Once you eliminate the differences in how much behind-the-scenes memory/object work is done, Java can often still beat Python. Is it because Java has had more attention than Python? I'm not sure; with enough funding, I feel that CPython could be faster.
Check http://c2.com/cgi/wiki?PythonProblems for more discussion on some of these issues.
I will say that I have decided to embrace Python nearly 100% going forward with new code.
Don't fall into the premature optimization trap, and remember you can always call C code in a pinch. Make your code work well, make it maintainable, then start to optimize once the speed of the application isn't fast enough for your needs.
Interesting Benchmarks:
http://benchmarksgame.alioth.debian.org/u64/python.php
Further information about Python speed issues can be found here:
http://www.infoworld.com/d/application-development/van-rossum-python-not-too-slow-188715
On a multicore box, the Java thread scheduler's decisions are rather arbitrary: it assigns thread priorities based on when the thread was created, from which thread it was created, etc.
The idea is to run a tuning process using PSO (particle swarm optimization) that would randomly set thread priorities and eventually converge on optimal ones, where the fitness function is the total run time of the program.
Of course there would be more parameters; for example, the priorities could shift during the run, to find an optimal priority function.
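To make the idea concrete, here's a rough sketch of what I mean; plain random search stands in for the PSO part, and the loop inside runWorkload is just a placeholder workload:

import java.util.Arrays;
import java.util.Random;

// Toy tuning harness: pick candidate priority vectors, measure total
// run time as the fitness, and keep the best vector found.
public class PriorityTuner {
    static final int NUM_THREADS = 4;
    static final Random rnd = new Random();

    public static void main(String[] args) throws InterruptedException {
        int[] best = null;
        long bestTime = Long.MAX_VALUE;

        for (int trial = 0; trial < 20; trial++) {
            int[] candidate = new int[NUM_THREADS];
            for (int i = 0; i < NUM_THREADS; i++) {
                candidate[i] = Thread.MIN_PRIORITY
                        + rnd.nextInt(Thread.MAX_PRIORITY - Thread.MIN_PRIORITY + 1);
            }
            long elapsed = runWorkload(candidate); // fitness = total run time
            if (elapsed < bestTime) {
                bestTime = elapsed;
                best = candidate;
            }
        }
        System.out.println("Best priorities found: " + Arrays.toString(best));
    }

    // Runs the workload once with the given per-thread priorities and times it.
    static long runWorkload(int[] priorities) throws InterruptedException {
        Thread[] threads = new Thread[priorities.length];
        long start = System.nanoTime();
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    long sum = 0; // placeholder CPU-bound work
                    for (long j = 0; j < 50000000L; j++) sum += j % 5;
                }
            });
            threads[i].setPriority(priorities[i]);
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        return System.nanoTime() - start;
    }
}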
How practical and interesting does the idea sound? Any suggestions are welcome.
Just some background:
I've been programming in Java/C/C++ for a few years now on various projects. Another alternative would be building a thread scheduler based on this idea in C, where the default thread scheduler is the OS's.
Your approach as described is a static approach, i.e. you need to run the program several times, then come up with a scheduling solution, then ship your scheduling information with the program.
The problem is that for most non-trivial programs, performance will partly depend on the specific data they're working with. Even if you find an optimal way to schedule threads for one data set, there is absolutely no guarantee that it will improve speed on another one. In most cases, running what would be a long and arduous optimization every time they want to do a new release will not be worth it for devs, except perhaps for large computation efforts (where the programs are likely to be manually tuned, and not written in Java anyway).
I'd say a self-learning thread scheduler is a nice idea, but you can't treat it as a classical optimization problem here. You either need to be sure that your scheduling order will remain optimal (unlikely) or find an optimization method that works at runtime. And the issue here might be that it wouldn't take much for the overhead of your scheduler to destroy any performance gain you might get.
I think this is a somewhat subjective question, but overall, no, I don't think it would work.
Best way to find out -- start an open source project and see people's usage/reaction.
It sounds very interesting to me -- but I personally don't find it very useful. Perhaps we're just not at the point where concurrent programming is as prevalent and easy as it could be.
With the promotion of functional programming, I guess the world would move towards avoiding thread synchronization as much as possible (thus making thread scheduling less of an impact in overall performance)
From my personal subjective experience, most performance problems in software can be solved by improving one single bottleneck area that accounts for 90% of the slowdown. This optimizer may help find that out. I am not sure how much the scheduling strategy could improve overall performance, though.
Don't get discouraged, though! I'm just talking out of thin air. It sounds fun, so why not just play with it anyway :)
I have a semi-big Java application. It's written quite poorly, and I suspect there are a lot of simple things I can do that will clean it up a bit and improve performance.
For example, I recently found the String.matches(regex) method used quite a lot in loops with the same regex, so I've replaced those calls with precompiled Patterns. I use FindBugs (which is great, BTW), but it didn't spot this problem, and I'm limited in the tools I can use here at work.
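For illustration, the change looks roughly like this (the "\\d+" pattern is just an example):

import java.util.regex.Pattern;

class RegexDemo {
    // Compile once, outside the loop, instead of calling String.matches(),
    // which recompiles the regex on every call.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static int countNumeric(String[] lines) {
        int count = 0;
        for (String line : lines) {
            if (DIGITS.matcher(line).matches()) {
                count++;
            }
        }
        return count;
    }
}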
Is there anything else simple like this that I should look at?
Have a look at these different tools:
FindBugs (you already did it).
Checkstyle
PMD
...
I also suggest having a look at Sonar, which is a great wrapper for all these tools.
All these tools are free!
Overall, they will not help you improve the performance, but rather the quality of the code.
First of all, make it a well-written application. In my experience, most of the performance benefits come from not doing stupid things, rather than from doing clever optimisations.
Once you have a well-written application, then is the time to run a profiler and optimise only what matters.
Is performance an issue? If so, I wouldn't redo anything until I'd profiled the code under a wide variety of conditions and had some hard data in hand to tell me where time was being spent. You might be changing things that have no benefit at all.
I'd be concerned about thread safety. How much attention was paid to that?
If you're going to refactor, start by writing JUnit tests first. They'll help you familiarize yourself with the code and provide a safety net. Tests must pass before and after your changes.
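For example, a minimal characterization test that pins down current behaviour before you touch anything (PriceCalculator and its expected value are hypothetical placeholders for your own code):

import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Hypothetical class under test.
class PriceCalculator {
    double totalWithTax(double net) { return net * 1.07; }
}

public class PriceCalculatorTest {
    @Test
    public void totalIncludesTaxForKnownInput() {
        PriceCalculator calc = new PriceCalculator();
        // Whatever the code does today is the baseline to preserve.
        assertEquals(107.0, calc.totalWithTax(100.0), 0.001);
    }
}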
Most importantly, don't undertake a big refactoring just because you consider it a mess. Your team and customer should be on board with what you're doing before you start. Have you communicated your (admittedly good) ideas to others? Software is a team sport; communication is key.
One very important thing when refactoring to improve an application is to first refactor the code so that the source looks at least decent. After that's done, never guess where to optimize; always measure where the bottlenecks are and concentrate on solving those problems. Generally, programmers like to do things such as exchanging recursive methods for loops or choosing exactly the right sorting algorithm, which very often makes very little difference at all. So make sure you focus on the correct area (are you using too much CPU? memory? too many threads? holding locks too long?)
EDIT: As one of the other posters wrote, also make sure that others on your team and your boss consider this work worth doing; if it's not an issue for them, they probably couldn't care less.
Run the code through a profiler (check both speed and memory). Once you find where it is slow (usually not where you think it is) figure out what you can do to speed it up.
Another useful thing (if you are a little brave) is to use NetBeans 7.0 M2 (don't be too panicked, their non-release versions are generally very stable); it has a plugin called "Jackpot" which searches your code for refactorings. Some of them have to do with performance, but I don't think any of them will make a radical change in speed.
Generally speaking, keep the code clean and easy to read and it'll be fast. When it isn't fast, you will have an easier time speeding it up than if it is a mess.
What I did once, when writing something I knew had to be fast (code to parse class files), was to run the profiler after each change. At one step I thought I would reduce memory by calling String.intern() to make sure that all of the Strings were pooled together. When I added the intern() call, the memory did go down a bit, but the time went up by some huge amount (String.intern is needlessly slow, or it was a few years ago). At that point I knew that what I had just done was unacceptably slow, and I undid the change.
I don't recommend doing that in general development, but running the code through a profiler once a day or once a week just to see how things are isn't a productivity killer.
If you are into books, I would highly recommend reading Effective Java by Josh Bloch.