Is it possible to do time slicing in Java using built-in Java classes?
When I search the internet, I only find definitions like the following:
Time slicing is a term usually associated with the processor and the operating system. When an operating system runs many processes, each process must get a chance to run, i.e. each process gets the processor for a particular amount of time. So if three processes p0, p1 and p2 are running, p0 might run for 5 ms, then it is p1's turn, then p2's. The time given to each process in this way is called a time slice, and different scheduling algorithms exist (depending on the operating system) for deciding which process runs on the processor next.
I get this question among Java interview questions, but I am not able to find any Java code example for it.
Is time slicing merely an operating-system concept, or is there practical usage that can be shown with Java programming? Can someone please share an example?
Call int n = Runtime.getRuntime().availableProcessors(); and then start n + 1 threads. By definition, if n + 1 runnable threads are competing for n cores, time slicing is occurring. For the practical usage you ask about, that is all you need: run n + 1 CPU-bound threads on n CPU cores.
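A minimal sketch of that demonstration (the class and method names here are my own invention): start one more CPU-bound thread than there are cores, so the scheduler has to time-slice at least two of them onto the same core, yet all of them still finish.

```java
public class TimeSliceDemo {
    // Busy-spin task so every thread genuinely competes for a core.
    static long busyWork(long iterations) {
        long acc = 0;
        for (long i = 0; i < iterations; i++) {
            acc += i;
        }
        return acc;
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        int threadCount = cores + 1; // one more runnable thread than cores forces time slicing
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                long r = busyWork(50_000_000L);
                System.out.println("thread " + id + " done (" + r + ")");
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // all threads finish even though they outnumber the cores
        }
    }
}
```

The JVM does not expose the slices themselves; the point is only that more runnable threads than cores means the OS must be interleaving them.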
What is the explanation of this metric: system_load_average_1m
This is the help text, but I don't really understand it.
HELP system_load_average_1m The sum of the number of runnable entities
queued to available processors and the number of runnable entities
running on the available processors averaged over a period of time
In this case I have one CPU.
Does it mean that there are too many tasks to be handled at the same time, so tasks are queued and waiting to be processed? Could the system be stuck for this period of time?
The Micrometer metric merely exposes the underlying OS load number, so you can learn more about the load average elsewhere. For example, https://www.howtogeek.com/194642/understanding-the-load-average-on-linux-and-other-unix-like-systems/ states:
Unix systems traditionally just counted processes waiting for the CPU, but Linux also counts processes waiting for other resources — for example, processes waiting to read from or write to the disk.
On its own, the load number doesn’t mean too much. A computer might have a load of 0 one split-second, and a load of 5 the next split-second as several processes use the CPU. Even if you could see the load at any given time, that number would be basically meaningless.
That’s why Unix-like systems don’t display the current load. They display the load average — an average of the computer’s load over several periods of time. This allows you to see how much work your computer has been performing.
So it is averaging the load over the last minute and displaying that.
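For reference, the number the metric is built on can be read directly from the JDK via the standard java.lang.management API (the class name below is my own; the API calls are standard):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadAverageProbe {
    // Returns the 1-minute system load average, or -1.0 if the platform
    // (e.g. Windows) does not provide one.
    static double oneMinuteLoadAverage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return os.getSystemLoadAverage();
    }

    public static void main(String[] args) {
        System.out.println("system_load_average_1m ~ " + oneMinuteLoadAverage());
    }
}
```

On a one-CPU box, a sustained value above 1.0 means runnable tasks were, on average, waiting for the CPU during the last minute.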
I need to determine an ideal number of threads for a batch program that runs in a batch framework supporting parallel mode, like parallel steps in Spring Batch.
As far as I know, having too many threads executing the steps of a program is not good; it can degrade performance through context switching, contention over shared resources (locking, synchronization), and so on. (Are there any other factors?)
Of course, the best way to find the ideal number of threads is to test the actual program while adjusting the thread count. But in my situation, running such tests is not easy, because they require many things (people, test scheduling, test data, etc.) that are difficult for me to arrange right now. So before running actual tests, I want a way to make the best possible estimate of the ideal number of threads for my program.
What should I consider to estimate the ideal number of threads (steps) for my program? The number of CPU cores? The number of processes on the machine my program would run on? The number of database connections?
Is there a rational way, such as a formula, for a situation like this?
The most important consideration is whether your application/calculation is CPU-bound or IO-bound.
If it's IO-bound (a single thread spends most of its time waiting for external resources such as database connections, file systems, or other external sources of data), then you can assign (many) more threads than the number of available processors. How many more also depends on how well the external resource scales; local file systems, probably not that much.
If it's (mostly) CPU-bound, then a number slightly above the count of available processors is probably best.
General Equation:
Number of Threads <= (Number of cores) / (1 - blocking factor)
Where 0 <= blocking factor < 1
Number of cores of the machine: Runtime.getRuntime().availableProcessors()
You can see the parallelism available to you by printing out:
ForkJoinPool.commonPool()
The default parallelism is the number of cores of your machine minus one, because one is reserved for the main thread.
Source link
Time : 1:09:00
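A small sketch putting the pieces above together (the class and method names are my own; the equation is the one quoted above, and the common-pool remark is the JDK default):

```java
import java.util.concurrent.ForkJoinPool;

public class ThreadCountEstimator {
    // Number of threads <= cores / (1 - blockingFactor), where 0 <= blockingFactor < 1.
    static int idealThreads(int cores, double blockingFactor) {
        if (blockingFactor < 0 || blockingFactor >= 1) {
            throw new IllegalArgumentException("blocking factor must be in [0, 1)");
        }
        return (int) (cores / (1 - blockingFactor));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores: " + cores);
        // Pure CPU-bound work (blocking factor 0) -> one thread per core.
        System.out.println("CPU-bound estimate:   " + idealThreads(cores, 0.0));
        // Threads blocked half the time -> twice as many threads.
        System.out.println("50% blocked estimate: " + idealThreads(cores, 0.5));
        // The common pool defaults to cores - 1 workers (the caller acts as one more).
        System.out.println("commonPool: " + ForkJoinPool.commonPool());
        System.out.println("commonPool parallelism: " + ForkJoinPool.commonPool().getParallelism());
    }
}
```

The blocking factor is the fraction of time a thread spends waiting rather than computing, so it has to be estimated (or measured) for your workload.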
What should I consider to get the ideal number of threads (steps) for my program? The number of CPU cores? The number of processes on the machine my program would run on? The number of database connections? Is there a rational way, such as a formula, for a situation like this?
This is tremendously difficult to do without a lot of knowledge of the actual code that you are threading. As @Erwin mentions, IO-bound versus CPU-bound is the key piece of knowledge needed before you can determine whether threading the application will bring any improvement at all. Even if you did manage to find the sweet spot for your particular hardware, you might boot on another server (or a different instance of a virtual cloud node) and see radically different performance numbers.
One thing to consider is to change the number of threads at runtime. The ThreadPoolExecutor.setCorePoolSize(...) is designed to be called after the thread-pool is in operation. You could expose some JMX hooks to do this for you manually.
You could also allow your application to monitor the application or system CPU usage at runtime and tweak the values based on that feedback. You could also keep AtomicLong throughput counters and dial the threads up and down at runtime trying to maximize the throughput. Getting that right might be tricky however.
I typically try to:
make a best guess at a thread number
instrument the application so you can determine the effects of different numbers of threads
allow the number to be tweaked at runtime via JMX so you can see the effects
make sure the number of threads is configurable (via a system property, maybe) so you don't have to re-release to try different thread counts
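A minimal sketch of the resizable-pool idea (class and method names are my own; the only real API involved is java.util.concurrent.ThreadPoolExecutor, whose setCorePoolSize is documented to work on a live pool):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ResizablePoolDemo {
    // Build a pool whose core size can be changed while it is running.
    static ThreadPoolExecutor newResizablePool(int initialCoreSize) {
        return new ThreadPoolExecutor(
                initialCoreSize, initialCoreSize * 2,   // core and max size
                60L, TimeUnit.SECONDS,                  // idle keep-alive
                new LinkedBlockingQueue<>());
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newResizablePool(2);
        System.out.println("core size before: " + pool.getCorePoolSize());
        // This is the call a JMX hook (or a feedback loop) would make at runtime:
        pool.setCorePoolSize(4);
        System.out.println("core size after:  " + pool.getCorePoolSize());
        pool.shutdown();
    }
}
```

In practice you would expose setCorePoolSize through a small JMX MBean so the dial can be turned without a restart.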
I'm writing a simple program that calculates the number Pi according to this formula. Before I elaborate on the problem, let me say that I'm testing my program (written in Java 8) on a 12-core CPU with 24 threads. According to htop, the server has no other load while the tests run, so interference from other processes is ruled out.
I expected near-linear speedup, but the program starts to choke at a high number of threads (say >8, where it departs from the y = x line). At that point, the execution time for the same parameters with different thread counts becomes constant, and the speedup tops out at about 10.
Without more concrete information, I would like to know how I can analyze where my program chokes. In other words, what are the must-dos when checking a parallel program's speedup?
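One basic must-do is to verify that the parallel version computes the same result as the serial one while you measure the speedup, so you know you are timing correct code. A self-contained sketch of such a harness (the workload and all names are placeholders of my own, not the asker's Pi program):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SpeedupProbe {
    // CPU-bound placeholder work: sum of i*i over [from, to).
    static long work(long from, long to) {
        long acc = 0;
        for (long i = from; i < to; i++) acc += i * i;
        return acc;
    }

    // Splits [0, n) across `threads` workers and returns the total.
    static long parallelWork(long n, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long chunk = n / threads;
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                long from = t * chunk;
                long to = (t == threads - 1) ? n : from + chunk;
                futures.add(pool.submit(() -> work(from, to)));
            }
            long total = 0;
            for (Future<Long> f : futures) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long n = 100_000_000L;
        long t1 = System.nanoTime();
        long serial = work(0, n);
        long serialNs = System.nanoTime() - t1;
        for (int threads = 1; threads <= Runtime.getRuntime().availableProcessors(); threads *= 2) {
            long t2 = System.nanoTime();
            long parallel = parallelWork(n, threads);
            long ns = System.nanoTime() - t2;
            if (parallel != serial) throw new AssertionError("parallel result differs");
            System.out.printf("%2d threads: speedup %.2fx%n", threads, (double) serialNs / ns);
        }
    }
}
```

Beyond this, the usual suspects for a speedup plateau around the core count are hyper-threading (24 hardware threads are not 24 cores), a serial fraction of the algorithm (Amdahl's law), and contention on shared state or memory bandwidth.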
This question is about using the gm4java library to interact with Graphics Magick (in scala).
I've been testing the PooledGMService as it is demonstrated here with scala, and it's working well.
However, I noticed that it does not perform similarly to batch mode within the command line interface for gm (gm batch batchfile.gm). When I run a gm batch file from the command line with any number of images, it launches 1 gm process. However, if I:
val config = new GMConnectionPoolConfig()
val service = new PooledGMService(config)
and then share the instance of service across 4 threads, where I perform some operation on one image per thread like:
service.execute(
  "convert",
  srcPath.toString(),
  "-resize", percent + "%",
  outPath.toString()
)
I see that 4 separate gm processes are created.
I believe this has a performance impact (a test with 100 images using the code above takes the same time as the gm CLI with a batch file, but my scala code uses 4x as much CPU).
My question is: how do I use gm4java so that a single gm process works on several images (or at least several kinds of conversions of the same image), just like the CLI batch mode? I've made a few attempts (some desperately silly) with no luck here.
My exact scala code can be found here, if you are curious.
update 05/27/14
With the guidance of a comment by gm4java's author I realized that I was benchmarking two different gm commands. The updated benchmarking results are:
100 x 30MB images (3.09GB total)
on an i7 quad-core (8 logical CPUs with hyper-threading)
Criteria              Time
gm cli batchfile      106s
my code, 1 thread     112s
my code, 4 threads     40s
my code, 6 threads     31s
my code, 7 threads     31s
my code, 8 threads     28s
Upon closer inspection, I also saw that while my code ran, the same gm processes with the same process IDs stayed alive the whole time. This alleviated my worry that I was losing performance to overhead from starting and terminating gm processes.
Rephrasing
I guess the heart of my question is: what can I do to make gm4java as fast as possible? The tip about matching the gm process count to the machine's core count is useful. Is there anything else that comes to mind?
My particular use case is resizing input images (30MB on average, 50-60MB occasionally, and 100-500MB very rarely) to a few set sizes (with thumbnails being the most important and highest priority). Deployment will probably be on Amazon EC2 with 7 or 14 "compute units".
The design of PooledGMService is to make maximal use of your computing power by starting multiple GM processes that handle your image-manipulation requests in a highly concurrent manner. 100 images is too small a sample size for a performance test. If your goal is to make the best use of your multi-CPU server to convert images, you need to test with a large number of samples (at least a few thousand) and tweak the configuration to find the best number of concurrent GM processes. See the documentation of GMConnectionPoolConfig for all the configuration options.
If you have only 8 CPUs, don't start more than 7 GM processes. If you are testing on a 2-CPU laptop, don't run more than 2 GM processes. In the example, you accepted all the default configuration settings, which start at most 8 GM processes on demand. But that isn't the right configuration for processing just 100 images on a mere 2-CPU laptop.
If all you want is to mimic the command-line batch mode, then SimpleGMService is your best friend. Look at the usage pattern here.
The right solution depends very much on your real use case. If you can tell us more about what exactly you are trying to achieve, your hardware environment, and so on, we will be better equipped to help you.
My question is quite simple:
I am working on Ubuntu and I wrote a program in Java (with Eclipse IDE).
The program does not read or write anything anywhere; it just does a lot of calculation and creates many instances of home-made classes.
The output of the program is simple: it writes A, B or C to the terminal (consider it a random process).
I must run the program repeatedly until I get A 1000000 times, counting how many times I got B and C along the way. I did it and it works, but it is too slow.
For example, the output is:
"A:1000000
B:1012458
C:1458"
This is where I need your help:
I want to parallelize the program. I tried multi-threading but it was not faster! So, since each simulation is independent, I want to use multiple processes. I would like, for example, to create 10 processes and ask each of them to run the program until A appears 100000 times (so 10 * 100000 = 1000000, as I want).
The problem is that I need the total number of B and C, and for now I get 10 values of each.
How can I do this? I tried ProcessBuilder (http://docs.oracle.com/javase/7/docs/api/java/lang/ProcessBuilder.html) but I do not understand how it works!
The only idea I have so far is to launch my program (with A up to 100000) 10 times from the terminal with the command:
"java Main & java Main & java Main & java Main & java Main & java Main & java Main & java Main & java Main & java Main"
But then I must sum the B and C occurrences MANUALLY. I am sure there is a better way to do this! I thought about creating 10 files with the values of (A), B and C and then reading and summing all of them, but that is really a lot of work just to sum some integers, isn't it?
Thanks in advance, I'm waiting for help :D
PS: To keep answers simple, assume I have a program named "prog" that takes a single int argument representing the number of A's I want to reach.
Parallelization makes sense only if you have a multicore CPU. Call Runtime.getRuntime().availableProcessors() to find out how many threads you should run.
Then, note that running 10 batches of 100000 repetitions is not the same as running one batch of 1000000 repetitions, since the internal state of your application changes as it runs, so consider whether parallelization is applicable to your case at all.
To get the total number of A, B, and C results, just use one AtomicInteger per counter, shared by all threads. Each time, check whether the count of A is still less than 1000000.
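A sketch of that suggestion (simulateOnce and the other names are placeholders for the real A/B/C experiment; note that the check-then-act on countA means the final count can slightly overshoot the target):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

public class AbcCounter {
    static final AtomicInteger countA = new AtomicInteger();
    static final AtomicInteger countB = new AtomicInteger();
    static final AtomicInteger countC = new AtomicInteger();

    // Placeholder for the real simulation: returns 'A', 'B' or 'C'.
    static char simulateOnce() {
        int r = ThreadLocalRandom.current().nextInt(3);
        return r == 0 ? 'A' : (r == 1 ? 'B' : 'C');
    }

    static void runUntil(int targetA) throws InterruptedException {
        int workers = Runtime.getRuntime().availableProcessors();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                // All threads share the same counters, so the stop check is global.
                while (countA.get() < targetA) {
                    switch (simulateOnce()) {
                        case 'A': countA.incrementAndGet(); break;
                        case 'B': countB.incrementAndGet(); break;
                        default:  countC.incrementAndGet(); break;
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
    }

    public static void main(String[] args) throws InterruptedException {
        runUntil(1_000_000);
        System.out.println("A:" + countA + " B:" + countB + " C:" + countC);
    }
}
```

Because all threads live in one JVM, the totals come out in one place and no files or manual summing are needed.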
On a single machine, parallel processing is more efficient when using multiple threads as compared to multiple processes.
When running on a single-core/single-CPU system, however, parallel processing brings a small performance penalty and no benefit for pure calculation. Yet when, for example, multiple slow IO operations are involved, multi-threading may speed up the process after all.
In short: multi-processing will always be slower than multi-threading.
You could make a main class that launches your prog using one of the versions of the Runtime.exec(...) method. That way you can read each child process's standard output to transmit the value each of your processes has computed back to the main program.
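A rough sketch of that idea using ProcessBuilder. The child processes here are stand-ins (a POSIX sh echoing fake counts, an assumption of mine); in the real setup each child would be "java Main 100000" printing its B and C totals on one line:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ProcessSum {
    // Launches a command, reads one "B:<n> C:<m>" line from its stdout,
    // and returns {n, m}. The real child would be "java", "Main", "100000".
    static int[] readCounts(String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).start();
        int[] counts = new int[2];
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line = r.readLine();                  // e.g. "B:1012458 C:1458"
            String[] parts = line.trim().split("\\s+");
            counts[0] = Integer.parseInt(parts[0].substring(2)); // after "B:"
            counts[1] = Integer.parseInt(parts[1].substring(2)); // after "C:"
        }
        p.waitFor();
        return counts;
    }

    public static void main(String[] args) throws Exception {
        int totalB = 0, totalC = 0;
        // Stand-in children: each just echoes fake counts (assumes a POSIX shell).
        for (int i = 0; i < 3; i++) {
            int[] c = readCounts("sh", "-c", "echo B:10 C:2");
            totalB += c[0];
            totalC += c[1];
        }
        System.out.println("B:" + totalB + " C:" + totalC);
    }
}
```

The parent sums the children's results, so no intermediate files and no manual addition are needed; to run the children concurrently, start all the Process objects first and read their outputs afterwards.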