My question is quite simple:
I am working on Ubuntu and I wrote a program in Java (with Eclipse IDE).
The program does not read or write anything anywhere; it just does a lot of calculation and creates many instances of home-made classes.
The output of the program is simple: it writes A, B or C to the terminal (consider it a random process).
I must run the program repeatedly until I get A 1000000 times, and count the number of times I got B and C. I did it and it works, but it is too slow.
For example, the output is:
"A:1000000
B:1012458
C:1458"
This is where I need your help:
I want to parallelize the program. I tried multi-threading but it was not faster! So, since each simulation is independent, I want to use multiple processes instead. I would like, for example, to create 10 processes and ask each of them to run the program until A appears 100000 times (so 10 * 100000 = 1000000, as I want).
The problem is that I need the total number of B and C, and for now I get 10 separate values of each.
How can I do this? I tried ProcessBuilder (http://docs.oracle.com/javase/7/docs/api/java/lang/ProcessBuilder.html) but I do not understand how it works!
The only idea I have so far is to launch my program (with the A target set to 100000) 10 times in the terminal with the command:
"java Main & java Main & java Main & java Main & java Main & java Main & java Main & java Main & java Main & java Main"
But then I must sum the B and C occurrences MANUALLY. I am sure there is a better way to do this! I thought about creating 10 files with the values of (A), B and C, then reading all of them and summing them up, but that is really a lot of work just to sum some integers, isn't it?
Thanks in advance, I'm waiting for help :D
ps: To answer easily, let's say I have a program named "prog" that takes a single int argument representing the number of A's I want to reach.
Parallelization makes sense only if you have a multicore CPU. Call Runtime.getRuntime().availableProcessors() to find out how many threads you should run.
Then, bear in mind that running 10 batches of 100000 repetitions is not the same as running one batch of 1000000 repetitions if the internal state of your application changes between runs, so think about whether parallelization is applicable at all in your case.
To get the total number of A, B and C results, use an AtomicInteger for each counter, shared by all threads. Before each simulation, check whether the count of A is still less than 1000000.
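A minimal sketch of that idea, with the actual simulation replaced by a random draw as a stand-in (the class and field names here are illustrative, not from your code):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class Counter {
    static final AtomicInteger countA = new AtomicInteger();
    static final AtomicInteger countB = new AtomicInteger();
    static final AtomicInteger countC = new AtomicInteger();
    static final int TARGET_A = 1_000_000;

    public static void main(String[] args) throws InterruptedException {
        int n = Runtime.getRuntime().availableProcessors();
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Thread(() -> {
                java.util.Random rnd = new java.util.Random();
                // keep simulating until A has been seen TARGET_A times in total
                while (countA.get() < TARGET_A) {
                    int outcome = rnd.nextInt(3); // stand-in for one real simulation
                    if (outcome == 0) countA.incrementAndGet();
                    else if (outcome == 1) countB.incrementAndGet();
                    else countC.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println("A:" + countA + " B:" + countB + " C:" + countC);
    }
}
```

Because the check and the increment are separate operations, the final A count may slightly overshoot 1000000 (a few threads can pass the check at the same time), which is usually acceptable for this kind of counting experiment.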
On a single machine, parallel processing is usually more efficient with multiple threads than with multiple processes, since threads share one address space and are cheaper to create and to communicate between.
On a single-core/single-CPU system, however, parallelism brings a small performance penalty and no benefit for pure calculation. Yet when slow I/O is involved, multi-threading may still speed up the process, because one thread can compute while another waits.
In short: for CPU-bound work on one machine, multi-processing is generally no faster than multi-threading, and adds process-management overhead.
You could write a main class that launches your prog using one of the variants of the Runtime.exec(...) method (or ProcessBuilder). The main program can then read each process's output stream to collect the value each process has computed.
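A sketch of that launcher, assuming (as in your "prog" description) that the child program is a compiled class named Main taking the A target as its only argument and printing lines like "B:1234" and "C:56"; the Launcher class name and parseCounts helper are my own illustration:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class Launcher {
    /** Sums the B and C counts from output lines like "B:1234" / "C:56". */
    static long[] parseCounts(List<String> lines) {
        long b = 0, c = 0;
        for (String line : lines) {
            if (line.startsWith("B:")) b += Long.parseLong(line.substring(2).trim());
            else if (line.startsWith("C:")) c += Long.parseLong(line.substring(2).trim());
        }
        return new long[] { b, c };
    }

    public static void main(String[] args) throws Exception {
        int processes = 10;
        int aPerProcess = 100_000; // 10 * 100000 = 1000000 A's in total
        List<Process> running = new ArrayList<>();
        for (int i = 0; i < processes; i++) {
            running.add(new ProcessBuilder("java", "Main",
                    String.valueOf(aPerProcess)).start());
        }
        long totalB = 0, totalC = 0;
        for (Process p : running) {
            List<String> lines = new ArrayList<>();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) lines.add(line);
            }
            p.waitFor();
            long[] counts = parseCounts(lines);
            totalB += counts[0];
            totalC += counts[1];
        }
        System.out.println("B:" + totalB + " C:" + totalC);
    }
}
```

This removes the manual summing step: the parent process reads each child's stdout directly instead of going through files.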
Related
Is it possible to demonstrate time slicing in Java using only built-in Java classes?
When I search on the internet, I only find definitions like the following:
Time slicing is a term usually associated with the processor and the operating system. When an operating system runs many processes, each process has to get a chance to run, i.e. each process should get the processor for a particular amount of time. So if three processes p0, p1 and p2 are running, p0 may run for 5 ms, then it is p1's turn, then p2's. The time given to each process is called a time slice, and different scheduling algorithms exist (depending on the operating system) to decide which process runs next.
I am getting this question in Java interview questions, but I cannot find any Java code example for it.
Is time slicing merely an operating-system concept, or is there a practical way to demonstrate it using Java? Can someone please share an example?
Call int n = Runtime.getRuntime().availableProcessors(), then start n + 1 threads. By definition, if n + 1 runnable threads are competing for n cores, time slicing is occurring. That is the practical demonstration: run n + 1 busy threads while only having n CPU cores.
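A minimal sketch of that demonstration (iteration counts and interleaving will vary by machine; the class name is my own):

```java
public class TimeSliceDemo {
    public static void main(String[] args) throws InterruptedException {
        int n = Runtime.getRuntime().availableProcessors();
        System.out.println("Cores: " + n + ", starting " + (n + 1) + " busy threads");
        Thread[] threads = new Thread[n + 1];
        for (int i = 0; i < threads.length; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                long end = System.currentTimeMillis() + 1000; // busy-spin ~1 second
                long iterations = 0;
                while (System.currentTimeMillis() < end) iterations++;
                // with n+1 busy threads on n cores, at least two threads must share
                // a core, so the OS scheduler has to time-slice between them
                System.out.println("thread " + id + " did " + iterations + " iterations");
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
    }
}
```

On an otherwise idle machine you would expect the two threads sharing a core to report noticeably fewer iterations than the others, which is the time slice made visible.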
I'm writing a simple program that calculates the number Pi according to this formula. Before I elaborate on the problem, let me say that I'm testing my program (written in Java 8) on a 12-core CPU with 24 threads. According to htop, there is no other load on the server when running the tests, so that is out of the question.
I expected near-linear speedup, but the program starts to choke at a high number of threads (say >8, where it leaves the y=x line). Beyond that point, the execution time for the same parameters is roughly constant regardless of the number of threads, and the speedup tops out at about 10.
Without giving too much concrete information, I would like to know how I can analyze where my program chokes. In other words, what are the must-do's when checking a parallel program's speedup?
I am making a program which tries to get all the possible outcomes. The threads in the program spawn more threads (a little over a thousand). I am really bad at multithreading, and I fear that the thread creation won't stop. I am using the Eclipse IDE, which has a terminate button; will this stop all the running threads, and if not, is there another way? Can the JVM handle this?
Yes. Clicking the terminate button in an Eclipse-launched JVM will halt that JVM, and that will stop all of its running threads (just like killing the JVM process).
As for running a thousand threads, I wouldn't recommend it... that sounds like a really slow approach (since each thread can only run a maximum of ~n/1000th of the time on n CPU cores).
Actually, there is no restriction on creating 1000 or more threads in the Java programming language. The problem is that it will slow the work down.
Do you know how threads work? Threads just create an environment, with the help of the OS, in which the application user can feel that the programs are running in parallel. But the actual scenario is different. A computer with a single-core processor can handle only one operation at an instant; the OS interleaves operations one after another so that the user feels they run in parallel.
For example, consider a three-threaded application where each thread has a for loop. The first thread adds numbers inside its loop and keeps the result in a variable named result1, the second thread multiplies numbers and keeps the result in result2, and the third thread subtracts numbers and keeps the result in result3.
Now, if all these threads start at the same instant with the same priority, the OS will interleave their instructions: it can send a number to be added to result1, in the next instant a number to be multiplied into result2, in the next a number to be subtracted from result3, and in the next it can add again.
That means a single-core processor cannot actually perform three computations simultaneously; it performs one, pauses the others, and cycles through them in this way.
I think you now understand why running 1000 threads will slow down the whole process. If performance is not an issue for the task and you just need the output, you can run 1000+ threads.
But if you need better performance, you have to consider something else. Have you heard about MapReduce? By implementing MapReduce with the Hadoop framework you can get better performance for this type of problem. However, you first have to express your problem in the MapReduce model; the framework then computes your task in parallel across more than one computer.
Another option is setting priorities. In Java you can set a thread's priority, giving critical tasks a higher priority than simpler ones. If your problem's tasks can be divided into high- and low-priority ones, this can improve performance.
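Before reaching for a distributed framework, the usual first step is a thread pool sized to the core count: you still submit your 1000+ units of work, but only a handful of threads execute them. A minimal sketch (the class name and the trivial task body are my own stand-ins):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolDemo {
    static final AtomicInteger completed = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        // a pool sized to the core count runs 1000 tasks without 1000 threads
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> {
                completed.incrementAndGet(); // stand-in for one outcome to explore
            });
        }
        pool.shutdown();                     // accept no new tasks; queued ones still run
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("completed: " + completed.get());
    }
}
```

This keeps the scheduling overhead bounded no matter how many tasks the program generates.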
I was trying to get timing data for various Java programs, on which I then had to perform some regression analysis. Here are the two methods I used to get the timing data:
System.currentTimeMillis(): I used this initially, but I wanted the timing data to be consistent when the same program was run multiple times. The variation was huge, and even larger when two instances of the same code were executed in parallel. So I dropped this and started looking for profilers.
-XX countBytecodes flag in the HotSpot JVM: Since the variation in the timing data was huge, I thought of measuring the number of bytecodes executed instead. This should give a more stable count when the same program is executed multiple times. But this also varied: when the programs were executed sequentially the variation was small, but during parallel runs of the same code it was huge. I also tried running with -Xint, but the results were similar.
So I am looking for a profiler that could give me the count of bytecodes executed when some code runs, where the count stays constant (or correlates almost perfectly) across runs of the same program. Alternatively, is there some other metric from which I could derive timing data that stays almost constant across multiple runs?
I wanted the timing data to be constant when the same program was run multiple times
That is not possible on a real machine unless it is designed as a hard real-time system, which your machine almost certainly is not.
I am looking for some profiler that could give me the count of byte codes executed when a code is executed.
Assuming you could do this, it wouldn't prove anything. You wouldn't be able to see, for example, that ++ can be 90x cheaper than %, depending on the hardware you run on. You wouldn't be able to see that a mispredicted branch on an if can be up to 100x more expensive than a correctly predicted one. You wouldn't be able to see that a memory access which triggers a TLB miss can be more expensive than copying 4 KB of data.
if there could be some other metric based on which I could get timing data, which should stay almost constant across multiple runs.
You can run it many times and take the average. This hides outliers and gives you a favourable idea of throughput. It can be a reproducible number for a given machine if you run it long enough.
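A minimal averaging harness along those lines (a sketch; the class name and the synthetic workload are my own, and a serious benchmark would use a dedicated harness such as JMH rather than hand-rolled timing):

```java
public class TimingHarness {
    // synthetic workload standing in for the real program being timed
    static long work() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i % 7;
        return sum;
    }

    static double averageMillis(int runs) {
        // warm-up runs let the JIT compile the hot path before measuring
        for (int i = 0; i < 5; i++) work();
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) work();
        return (System.nanoTime() - start) / (runs * 1_000_000.0);
    }

    public static void main(String[] args) {
        System.out.printf("average: %.3f ms per run%n", averageMillis(20));
    }
}
```

The warm-up matters: the first few iterations run interpreted and would otherwise drag the average up in a way that has nothing to do with the code being measured.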
I'm working on a system at the moment. It's a complex system but it boils down to a Solver class with a method like this:
public int solve(int problem); // returns the solution, or 0 if no solution found
Now, when the system is up and running, a run time of about 5 seconds for this method is expected and is perfectly fast enough. However, I plan to run some tests that look a bit like this:
List<Integer> problems = getProblems();
List<Integer> solutions = new ArrayList<Integer>(problems.size());
Solver solver = getSolver();
for (int problem : problems) {
    solutions.add(solver.solve(problem));
}
// see what percentage of solutions are zero
// get the arithmetic mean of the non-zero solutions
// etc etc
The problem is that I want to run this on a large number of problems and don't want to wait forever for the results. So say I have a million test problems and I want the tests to complete in the time it takes me to make a cup of tea; I have two questions:
Say I have a million-core processor, that instances of Solver are thread-safe with no locking (they're immutable or something), and that all their computation is in memory (i.e. no disk, network or other I/O). Can I just replace the solutions list with a thread-safe list and kick off threads to solve each problem, and expect it to be faster? How much faster? Could it run in 5 seconds?
Is there a decent cloud computing service out there for Java where I can buy 5 million seconds of time and get this code to run in five seconds? What do I need to do to prepare my code for running on such a cloud? How much do 5 million seconds cost, anyway?
Thanks.
You have expressed your problem with two major points of serialisation: problem production and solution consumption (currently expressed as Lists of Integers). You want to start handing out the first problems as soon as you can (currently nothing is solved until all problems are produced).
I am also assuming there is a correlation between the problem list order and the solution list order – that is, solutions.get(3) is the solution for problems.get(3) – which would be a huge problem for parallelising it. You'd be better off having a Pair<P, S> of problem/solution so you don't need to maintain that correlation.
Parallelising the solver method will not be difficult, although exactly how you do it depends a lot on the compute cost of each solve call (generally, the more expensive the call, the lower the relative overhead of parallelising it, so if calls are very cheap you need to batch them). If you end up with a distributed solution you'll have much higher costs, of course. The Executor framework and the fork/join extensions are a great starting point.
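A sketch of the Executor-based version, pairing each problem with its solution so ordering no longer matters (the ParallelSolve class and its trivial solve method are stand-ins for your Solver, not your actual code):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSolve {
    // stand-in for the question's Solver; assumed thread-safe
    static int solve(int problem) {
        return problem % 10 == 0 ? 0 : problem * 2; // 0 means "no solution found"
    }

    public static void main(String[] args) throws Exception {
        List<Integer> problems = new ArrayList<>();
        for (int i = 1; i <= 100; i++) problems.add(i);

        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        List<Future<Map.Entry<Integer, Integer>>> futures = new ArrayList<>();
        for (int p : problems) {
            // pair each problem with its solution so list ordering no longer matters
            Callable<Map.Entry<Integer, Integer>> task =
                    () -> new SimpleEntry<>(p, solve(p));
            futures.add(pool.submit(task));
        }
        int zeros = 0;
        long sum = 0;
        for (Future<Map.Entry<Integer, Integer>> f : futures) {
            int s = f.get().getValue();
            if (s == 0) zeros++;
            else sum += s;
        }
        pool.shutdown();
        System.out.println("zero solutions: " + zeros
                + ", mean of non-zero: " + (double) sum / (problems.size() - zeros));
    }
}
```

Collecting results via Futures also avoids contention on a shared solutions list, since each worker only touches its own pair.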
You're asking extremely big questions. There is overhead per thread, and a key thing to note is that threads run inside the parent process. If you wanted to run a million of these solvers at literally the same time, you'd have to spread them across their own processes and machines.
You can use one program per input and then use a simple batch scheduler like Condor (for Linux) or HPC (for Windows). You can run those on Amazon too, but there's a bit of a learning curve; it's not just "upload Java code & go".
Sure, you could use a standard worker-thread paradigm to run things in parallel. But there will be some synchronization overhead (e.g., updates to the solutions list will cause lock contention when everything tries to finish at the same time), so it won't run in exactly 5 seconds. But it would be faster than 5 million seconds :-)
Amazon EC2 runs between US$0.085 and US$0.68 per hour depending on how much CPU you need (see their pricing), so maybe about $120. Of course, you'll need to set up something separate to distribute your jobs across the various CPUs. One option might be to use Hadoop (see this question about whether Hadoop is right for running simulations).
You could watch something like Guy Steele's talk on parallelism for more on how to think parallel.
Use an appropriate Executor. Have a look at http://download.oracle.com/javase/6/docs/api/java/util/concurrent/Executors.html#newCachedThreadPool()
Check out these articles on concurrency:
http://www.vogella.de/articles/JavaConcurrency/article.html
http://www.baptiste-wicht.com/2010/09/java-concurrency-part-7-executors-and-thread-pools/
Basically, Java 7's new Fork/Join framework will work really well for this approach. Essentially you set up your million+ tasks and it spreads them as best it can across all available processors. You would have to provide your own custom "cloud" task executor, but it can be done.
This assumes, of course, that your solving algorithm is embarrassingly parallel. In short, as long as each Solver is fully self-contained, the work can be split among an arbitrary number of processors.
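A sketch of the Fork/Join approach: a RecursiveTask that splits the problem array until chunks are small enough to solve directly (the SolveTask class, the threshold, and the trivial solve method are illustrative stand-ins, not the question's actual Solver):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SolveTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 1_000; // below this size, solve sequentially
    final int[] problems;
    final int from, to;

    SolveTask(int[] problems, int from, int to) {
        this.problems = problems;
        this.from = from;
        this.to = to;
    }

    // stand-in for the self-contained solver from the question
    static long solve(int problem) {
        return problem * 2L;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) sum += solve(problems[i]);
            return sum;
        }
        int mid = (from + to) / 2;
        SolveTask left = new SolveTask(problems, from, mid);
        SolveTask right = new SolveTask(problems, mid, to);
        left.fork();                        // run the left half asynchronously
        return right.compute() + left.join();
    }

    public static void main(String[] args) {
        int[] problems = new int[100_000];
        for (int i = 0; i < problems.length; i++) problems[i] = i;
        long total = new ForkJoinPool().invoke(
                new SolveTask(problems, 0, problems.length));
        System.out.println("sum of solutions: " + total);
    }
}
```

The threshold is the batching knob mentioned above: the cheaper each solve call, the larger you want the sequential chunks to be so that task-scheduling overhead doesn't dominate.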