Today | started looking at the different MapReduce Schedulers, because I would like to be able to start the processing on a new jobs when slots became available. So I started look at the other schedulers that come with Hadoop.
The Capacity Scheduler is designed to run Hadoop Map-Reduce as a shared, multi-tenant cluster in an operator-friendly manner while maximizing the throughput and the utilization of the cluster while running Map-Reduce applications.
Fair scheduling is a method of assigning resources to jobs such that all jobs get, on average, an equal share of resources over time. When there is a single job running, that job uses the entire cluster. When other jobs are submitted, tasks slots that free up are assigned to the new jobs, so that each job gets roughly the same amount of CPU time. Unlike the default Hadoop scheduler, which forms a queue of jobs, this lets short jobs finish in reasonable time while not starving long jobs. It is also an easy way to share a cluster between multiple of users. Fair sharing can also work with job priorities – the priorities are used as weights to determine the fraction of total compute time that each job gets.
Hadoop On Demand (HOD) is a system for provisioning and managing independent Hadoop MapReduce and Hadoop Distributed File System (HDFS) instances on a shared cluster of nodes see this page. HOD is a tool that makes it easy for administrators and users to quickly setup and use Hadoop. HOD is also a very useful tool for Hadoop developers and testers who need to share a physical cluster for testing their own Hadoop versions.
I decided to started with the Fair Scheduler, since it seem to fit my needs, but I will try to keep you informed of my progress.