I'm working on a very specific solution for computational offloading. I can do it well with custom programming in C++/Java, but I'm searching for a way to do the same in Hadoop or any other framework. I searched a lot but found nothing worthwhile.
As we know, a normal Hadoop job is made of a Map and a Reduce phase, both of which run on machines of roughly the same power. The map phase doesn't need that power and could be offloaded to cheap commodity hardware like a Raspberry Pi, while reduce should run on a strong machine.
So is it possible to isolate these two phases and make them machine-aware?
On each node you can create a mapred-site.xml file to override any default settings. These settings will then only apply to this node (task tracker).
On each node you can then specify values for:
mapreduce.tasktracker.reduce.tasks.maximum
mapreduce.tasktracker.map.tasks.maximum
On nodes where you only want to run reduce tasks, set the maximum map tasks to 0, and vice versa.
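For example, a reduce-only node's mapred-site.xml might look like this (the reduce-slot count of 4 is an assumption to tune per node):

<configuration>
    <property>
        <name>mapreduce.tasktracker.map.tasks.maximum</name>
        <value>0</value> <!-- no map slots on this node -->
    </property>
    <property>
        <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
        <value>4</value> <!-- reduce slots; tune to the node's hardware -->
    </property>
</configuration>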
Here is the list of configuration options
Reduce tasks can run on a different node, but what is the advantage of running them on a more powerful machine?
You can use the same commodity hardware configuration for both Map and Reduce nodes.
Fine-tuning a MapReduce job is the trickier part. It depends on:
1) Your input size
2) The time taken for the mapper to complete the map task
3) The number of map and reduce tasks you set
etc.
Apart from the config changes suggested by Gerhard, have a look at some of the tips below for fine-tuning job performance.
Tips to Tune the number of map and reduce tasks appropriately
Diagnostics/symptoms:
1) Each map or reduce task finishes in less than 30-40 seconds.
2) A large job does not utilize all available slots in the cluster.
3) After most mappers or reducers are scheduled, one or two remain pending and then run all alone.
Tuning the number of map and reduce tasks for a job is important. Some tips:
1) If each task takes less than 30-40 seconds, reduce the number of tasks.
2) If a job has more than 1TB of input, consider increasing the block size of the input dataset to 256M or even 512M so that the number of tasks will be smaller.
3) So long as each task runs for at least 30-40 seconds, increase the number of mapper tasks to some multiple of the number of mapper slots in the cluster.
4) Don't schedule too many reduce tasks. For most jobs, the number of reduce tasks should be equal to or slightly fewer than the number of reduce slots in the cluster.
If you still want to have different configurations, have a look at this question and the Wiki link.
EDIT:
Configure mapred.map.tasks in 1.x (mapreduce.job.maps in 2.x) and mapred.reduce.tasks in 1.x (mapreduce.job.reduces in 2.x) accordingly on your nodes, depending on the hardware configuration. Configure more reducers on the better hardware nodes. But before configuring these parameters, make sure you have taken care of the input size, map processing time, etc.
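For illustration, the same knobs can also be set per job from the driver; a minimal sketch (the job name is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TunedJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("mapreduce.job.maps", 40); // a hint: the actual map count follows the input splits
        Job job = Job.getInstance(conf, "tuned-job"); // hypothetical job name
        job.setNumReduceTasks(8); // same effect as setting mapreduce.job.reduces
        // ... set mapper/reducer classes and input/output paths as usual
    }
}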
Related
This is a compound question regarding how changing the thread pool sizes at run-time affects the Spring Batch run-time system.
To start, a terminology clarification: concurrency = number of running steps, and parallelism = number of threads per step.
To give a clear picture of how I am using Spring Batch: I currently have a large number of files (200+) being generated, and I am using Spring Batch to transfer them, with each step mapping to one file.
Everything about the job is dynamic: the number of steps, and each step's reader and writer are distinct to that step, so no step shares readers or writers. There is a thread pool dedicated to running the steps concurrently, and each step then has its own thread pool so we can do parallelism per step. Combined with the commit interval, this gives great throughput and control.
So my questions are:
How can I change the number of running steps after the Job has started?
How can I change the commit interval after a step has started processing?
So let's consider an example of why I would like to do this and what exactly I mean by changing the "running steps" and "commit interval".
Consider the case where you have a total of 300 steps to process with a step thread pool of size 5. I begin processing and realize that I have more resources to utilize, so I would like to change the thread count to, say, 8.
When I actually do this at run-time, what I experience is that the thread pool does increase but the number of running steps does not change. Why is that?
Following similar logic, say I have more memory to utilize; I would then like to increase my commit interval at run-time. Surprisingly, I have not found anything in the StepExecution class that would let me change the commit interval. Why not?
What is interesting is that for parallelism I am able to change the number of running threads by simply increasing that thread pool's size. From simply changing the number of parallel threads, I noticed a massive increase in throughput.
If you would like more information, I can provide code and a link to the repository.
Thank you very much.
While it is possible to make the commit interval and thread pool size configurable and change them at startup time, it is not possible to change them at runtime (i.e., "in-flight") once the job execution has started.
Making the commit interval and thread pool size configurable (via application/system properties or passing them as job parameters) will allow you to empirically adapt the values to best utilize your resources without having to recompile/repackage your application.
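For example, a minimal Spring Batch sketch of that configurability (the bean name, property name, and String item type are assumptions):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class FileTransferStepConfig {

    // The commit interval is read from a property at startup and is then
    // fixed for the life of the job execution.
    @Bean
    public Step fileTransferStep(StepBuilderFactory steps,
                                 ItemReader<String> reader,
                                 ItemWriter<String> writer,
                                 @Value("${transfer.commit-interval:100}") int commitInterval) {
        return steps.get("fileTransferStep")
                .<String, String>chunk(commitInterval)
                .reader(reader)
                .writer(writer)
                .build();
    }
}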
The runtime dynamism you are looking for is not available by default, but you can always implement the Step interface and use it as part of a Spring Batch job next to other step types provided out-of-the-box by the framework.
I am using AWS EMR to run a MapReduce job. My input set contains 1 million files, each around 15 KB. Since the input files are very small, this leads to a huge number of mappers. So I changed the S3 block size to 20 KB and used 5 r3.2xlarge instances, but the number of concurrent tasks running is still just 30. Shouldn't the job run more concurrent mappers after the block size was reduced, or is the memory taken by each mapper still the same even after reducing the block size?
How can I limit the memory usage of each mapper or increase the number of concurrent mapper tasks? The current expected completion time is 100 hours; will combining these files into a smaller number of bigger files, like 400 MB each, increase the processing time?
Reducing the block size can increase the number of mappers required for a particular job, but it will not increase the number of mappers that your cluster can run in parallel at a given point, nor the memory used by those mappers.
"used 5 r3.2xlarge instances but number of concurrent tasks running is still just 30"
To find the number of parallel mappers/reducers that a Hadoop 2 EMR cluster can support, please see this article: AWS EMR Parallel Mappers?
Example: r3.2xlarge × 5 core nodes:
mapreduce.map.memory.mb = 3392
yarn.scheduler.maximum-allocation-mb = 54272
yarn.nodemanager.resource.memory-mb = 54272
One core node can have 54272 / 3392 = 16 mappers.
So the cluster can have a total of 16 × 5 = 80 mappers in parallel.
So if your job spins up, say, 1000 mappers, the cluster can launch 80 of them with the preconfigured memory and heap on your nodes, and the remaining mappers will simply be queued up.
If you want more parallel mappers, you might want to configure less memory (based on that math) and a smaller heap per mapper.
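As a hedged sketch, lowering the per-mapper memory in the cluster configuration might look like this (the values are illustrative halves of the defaults above, keeping the heap at roughly 80% of the container size):

<property>
    <name>mapreduce.map.memory.mb</name>
    <value>1696</value> <!-- half of the 3392 default: ~32 mappers per node -->
</property>
<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1356m</value> <!-- ~80% of the container memory -->
</property>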
What you are looking for is CombineFileInputFormat.
Do remember that the map split size by default equals the HDFS block size; changing one will not affect the other.
Please follow the link: http://bytepadding.com/big-data/map-reduce/understanding-map-reduce-the-missing-guide/
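For illustration, a minimal driver-side sketch using CombineTextInputFormat (a concrete CombineFileInputFormat for text input); the 400 MB cap per combined split and the job name are assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;

public class CombineSmallFilesDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "combine-small-files"); // hypothetical job name
        // Pack many small files into each split so one mapper handles thousands of 15 KB files
        job.setInputFormatClass(CombineTextInputFormat.class);
        CombineTextInputFormat.setMaxInputSplitSize(job, 400L * 1024 * 1024); // assumed 400 MB cap
        // ... set mapper/reducer classes and input/output paths as usual
    }
}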
I need to determine an ideal number of threads for a batch program that runs in a batch framework supporting parallel mode, like parallel steps in Spring Batch.
As far as I know, it is not good to have too many threads executing the steps of a program; it can have a negative effect on performance. Several factors can cause performance degradation: context switching, race conditions on shared resources (locking, synchronization), and so on (are there any other factors?).
Of course, the best way to find the ideal number of threads is to run actual tests of the program while adjusting the thread count. But in my situation it is not easy to run such tests, because too many things are needed for them (people, test scheduling, test data, etc.) that are difficult for me to prepare right now. So before the actual tests, I want to make the best possible guess at the ideal number of threads for my program.
What should I consider to get the ideal number of threads (steps) for my program? The number of CPU cores? The number of processes on the machine my program would run on? The number of database connections?
Is there a rational way, such as a formula, for a situation like this?
The most important consideration is whether your application/calculation is CPU-bound or IO-bound.
If it's IO-bound (a single thread spends most of its time waiting for external resources such as database connections, file systems, or other external sources of data), then you can assign (many) more threads than the number of available processors. Of course, how many also depends on how well the external resource scales: local file systems, probably not that much.
If it's (mostly) CPU-bound, then slightly over the number of available processors is probably best.
General Equation:
Number of Threads <= (Number of cores) / (1 - blocking factor)
Where 0 <= blocking factor < 1
Number of cores of a machine: Runtime.getRuntime().availableProcessors()
You can see the parallelism available to you by printing out ForkJoinPool.commonPool(); its parallelism is the number of cores of your machine minus 1, because one is reserved for the main thread.
Source link (time: 1:09:00)
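Putting the equation and the core count together, a minimal sketch (the blocking factor of 0.9 is an assumed IO-heavy workload, not a measured value):

import java.util.concurrent.ForkJoinPool;

public class ThreadCountEstimate {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        double blockingFactor = 0.9; // assumed: ~90% of each thread's time is spent waiting
        int threads = (int) (cores / (1 - blockingFactor));
        System.out.println("cores = " + cores + ", suggested threads <= " + threads);
        // The common pool's parallelism is typically cores - 1 (one left for the caller)
        System.out.println(ForkJoinPool.commonPool());
    }
}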
What should I consider to get the ideal number of threads (steps) for my program? The number of CPU cores? The number of processes on the machine my program would run on? The number of database connections? Is there a rational way, such as a formula, for a situation like this?
This is tremendously difficult to do without a lot of knowledge of the actual code that you are threading. As @Erwin mentions, whether the operations are IO- or CPU-bound is the key thing to know before you can determine whether threading the application will result in any improvement at all. Even if you did manage to find the sweet spot for your particular hardware, you might boot on another server (or a different instance of a virtual cloud node) and see radically different performance numbers.
One thing to consider is changing the number of threads at runtime. ThreadPoolExecutor.setCorePoolSize(...) is designed to be called after the thread pool is in operation. You could expose some JMX hooks to do this manually.
You could also have your application monitor its own or the system's CPU usage at runtime and tweak the values based on that feedback. You could also keep AtomicLong throughput counters and dial the threads up and down at runtime, trying to maximize throughput. Getting that right might be tricky, however.
I typically try to:
make a best guess at a thread number
instrument the application so you can determine the effects of different numbers of threads
allow it to be tweaked at runtime via JMX so you can see the effects
make sure the number of threads is configurable (via a system property, maybe) so you don't have to re-release to try different thread numbers
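As a hedged illustration of that runtime-tweak idea (the class and the "worker.threads" property name are assumptions):

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// A worker pool whose size is configurable at startup (system property) and
// adjustable while running, e.g. from a JMX-exposed operation.
public class ResizablePool {
    private final ThreadPoolExecutor pool;

    public ResizablePool() {
        int n = Integer.getInteger("worker.threads", 4);
        pool = new ThreadPoolExecutor(n, n, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    }

    public synchronized void setPoolSize(int newSize) {
        if (newSize > pool.getMaximumPoolSize()) { // growing: raise the max first
            pool.setMaximumPoolSize(newSize);
            pool.setCorePoolSize(newSize);
        } else {                                   // shrinking: lower the core first
            pool.setCorePoolSize(newSize);
            pool.setMaximumPoolSize(newSize);
        }
    }
}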
Below is the default configuration in hazelcast.xml:
<jobtracker name="default">
    <max-thread-size>0</max-thread-size>
    <!-- Queue size 0 means number of partitions * 2 -->
    <queue-size>0</queue-size>
    <retry-count>0</retry-count>
    <chunk-size>1000</chunk-size>
    <communicate-stats>true</communicate-stats>
    <topology-changed-strategy>CANCEL_RUNNING_OPERATION</topology-changed-strategy>
</jobtracker>
How should this configuration be updated to get better performance for map-reduce in a Java application?
The values you normally want to optimize are chunk-size and communicate-stats. The first property depends heavily on the way your MR job works and needs some trial and error; the goal is to keep the reducers busy all the time (so, depending on the reduce operation, either a bigger chunk size for heavy operations or smaller chunks for light ones). Setting communicate-stats to false deactivates the transmission of statistical information, which is normally not used anyway.
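For illustration, a hedged sketch of tuned values (the numbers are assumptions to show the direction of the change, not recommendations):

<jobtracker name="default">
    <max-thread-size>0</max-thread-size>
    <queue-size>0</queue-size>
    <retry-count>0</retry-count>
    <chunk-size>5000</chunk-size> <!-- larger chunks suit a lightweight reduce operation -->
    <communicate-stats>false</communicate-stats> <!-- skip unused stats transmission -->
    <topology-changed-strategy>CANCEL_RUNNING_OPERATION</topology-changed-strategy>
</jobtracker>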
It seems there is a limit on the number of jobs that the Quartz scheduler can run per second. In our scenario we have about 20 jobs per second firing, 24x7. Quartz worked well up to 10 jobs per second (with 100 Quartz threads and a 100-connection database pool for a JDBC-backed JobStore). However, when we increased it to 20 jobs per second, Quartz became very slow and its triggered jobs fired very late compared to their scheduled times, causing many misfires and eventually slowing down the overall performance of the system significantly. One interesting fact is that JobExecutionContext.getScheduledFireTime().getTime() for such delayed triggers comes out to be 10-20 or even more minutes after their scheduled time.
How many jobs can the Quartz scheduler run per second without affecting the scheduled times of the jobs, and what would be the optimum number of Quartz threads for such a load?
Or am I missing something here?
Details about what we want to achieve:
We have almost 10k items (categorized among 2 or more categories; in the current case we have 2 categories) on which we need to do some processing at a given frequency, e.g. every 15, 30, or 60 minutes, and these items should be processed within that interval with a given per-minute throttle. For example, at a 60-minute frequency, 5k items per category should be processed with a throttle of 500 items per minute. So ideally these items should be processed within the first 10 (5000/500) minutes of each hour of the day, with each minute having 500 items to process, distributed evenly across the seconds of the minute, so we would have around 8-9 items per second for one category.
Now, to achieve this, we use Quartz as the scheduler, which triggers jobs for processing these items. However, we don't process each item within the Job.execute method, because that would take 5-50 seconds (averaging 30 seconds) per item, involving a web service call. Instead we push a message for each item onto a JMS queue, and separate server machines process those jobs. I have noticed that the time taken by the Job.execute method is never more than 30 milliseconds.
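For illustration, a minimal sketch of the enqueue-only pattern described above (the queue name, data-map key, and the use of Spring's JmsTemplate are assumptions for the example):

import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.springframework.jms.core.JmsTemplate;

// The Quartz job only enqueues the item id; the 5-50 second web service call
// happens on the separate worker machines consuming the queue.
public class EnqueueItemJob implements Job {
    private JmsTemplate jmsTemplate; // injected, e.g. via a SpringBeanJobFactory

    @Override
    public void execute(JobExecutionContext ctx) throws JobExecutionException {
        String itemId = ctx.getMergedJobDataMap().getString("itemId"); // assumed key
        jmsTemplate.convertAndSend("item.processing.queue", itemId);   // milliseconds, not seconds
    }
}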
Server Details:
A Solaris SPARC 64-bit server with an 8-core/16-thread CPU and 16 GB RAM for the scheduler; we have two such machines in the scheduler cluster.
In a previous project, I was confronted with the same problem. In our case, Quartz performed well up to a granularity of one second. Sub-second scheduling was a stretch and, as you are observing, misfires happened often and the system became unreliable.
We solved this issue by creating two levels of scheduling: Quartz schedules a job 'set' of n consecutive jobs. With a clustered Quartz, this means that a given server in the system gets this job 'set' to execute. The n tasks in the set are then taken in by a "micro-scheduler": basically a timing facility that uses the native JDK API to further time the jobs down to a 10 ms granularity.
To handle the individual jobs, we used a master-worker design, where the master takes care of the scheduled delivery (throttling) of the jobs to a multi-threaded pool of workers.
If I had to do this again today, I'd rely on a ScheduledThreadPoolExecutor to manage the 'micro-scheduling'. For your case, it would look something like this:
import java.util.Set;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

ScheduledThreadPoolExecutor scheduledExecutor;
...
scheduledExecutor = new ScheduledThreadPoolExecutor(THREAD_POOL_SIZE);
...
// Evenly spread the execution of a set of tasks over a period of time.
// Task is assumed to implement Runnable.
public void schedule(Set<Task> taskSet, long timePeriod, TimeUnit timeUnit) {
    if (taskSet.isEmpty()) return; // or indicate some failure ...
    long period = TimeUnit.MILLISECONDS.convert(timePeriod, timeUnit);
    long delay = period / taskSet.size();
    long accumulativeDelay = 0;
    for (Task task : taskSet) {
        // each task starts `delay` ms after the previous one
        scheduledExecutor.schedule(task, accumulativeDelay, TimeUnit.MILLISECONDS);
        accumulativeDelay += delay;
    }
}
This gives you a general idea of how to use the JDK facility to micro-schedule tasks. (Disclaimer: you need to make this robust for a production environment, e.g. checking for failing tasks, managing retries if supported, etc.)
With some testing + tuning, we found an optimal balance between the Quartz jobs and the amount of jobs in one scheduled set.
We experienced a 100X throughput improvement in this way. Network bandwidth was our actual limit.
First of all, check "How do I improve the performance of JDBC-JobStore?" in the Quartz documentation.
As you can probably guess, there is no absolute value or definitive metric. It all depends on your setup. However, here are a few hints:
20 jobs per second means around 100 database queries per second, including updates and locking. That's quite a lot!
Consider distributing your Quartz setup into a cluster. However, if the database is the bottleneck, that won't help you. Maybe TerracottaJobStore will come to the rescue?
With K cores in the system, anything fewer than K threads will underutilize your system. If your jobs are CPU-intensive, K is fine. If they call external web services, block, or sleep, consider much bigger values. However, more than 100-200 threads will significantly slow down your system due to context switching.
Have you tried profiling? What is your machine doing most of the time? Can you post a thread dump? I suspect poor database performance rather than CPU, but it depends on your use case.
You should limit your number of threads to somewhere between n and 3n, where n is the number of processors available. Spinning up more threads will cause a lot of context switching, since most of them will be blocked most of the time.
As for jobs per second, it really depends on how long the jobs run and how often they are blocked on operations like network and disk IO.
Also, consider that perhaps Quartz isn't the tool you need. If you're sending off 1-2 million jobs a day, you might want to look into a custom solution. What are you even doing with 2 million jobs a day?!
Another option, which is a really bad way to approach the problem but sometimes works: what is the server it's running on? Is it an older server? Bumping up the RAM or other specs might give you some extra 'oomph'. Not the best solution, for sure, because it delays the problem rather than addressing it, but if you're in a crunch it might help.
In situations with a high number of jobs per second, make sure your SQL server uses row locks and not table locks. In MySQL this is done by using the InnoDB storage engine instead of the default MyISAM storage engine, which only supports table locks.
Fundamentally, the approach of doing one item at a time is doomed and inefficient when you're dealing with such a large number of things to do in such a short time. You need to group things. The suggested approach of using a job set that then micro-schedules each individual job is a first step, but it still means doing a whole lot of almost nothing per job. Better would be to improve your web service so you can tell it to process N items at a time, and then invoke it with sets of items to process, as sketched below. Even better is to avoid doing this sort of thing via web services at all and to process the items inside the database, as sets, which is what databases are good at. Any job that processes one item at a time is a fundamentally unscalable design.
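For illustration, a hedged sketch of that batching idea (ItemServiceClient and the batch size of 100 are hypothetical, not part of the original system):

import java.util.List;

public class BatchDispatcher {
    private static final int BATCH_SIZE = 100; // assumed; tune to what the service can absorb

    // Hypothetical client interface for a web service that accepts N items per call
    interface ItemServiceClient {
        void processItems(List<String> batch);
    }

    public void dispatch(List<String> itemIds, ItemServiceClient client) {
        for (int i = 0; i < itemIds.size(); i += BATCH_SIZE) {
            // one round trip covers the whole batch instead of one call per item
            List<String> batch = itemIds.subList(i, Math.min(i + BATCH_SIZE, itemIds.size()));
            client.processItems(batch);
        }
    }
}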