Modifying the Number of Mappers or Reducers on a Running EMR Cluster

Amazon EMR unfortunately doesn’t give you an easy way to change the maximum number of mappers and reducers each node runs once the cluster is up. To set them before booting the cluster, add

--bootstrap-action="s3://elasticmapreduce/bootstrap-actions/configure-hadoop"  \
   --args "-m,mapred.tasktracker.map.tasks.maximum=4,-m,mapred.tasktracker.reduce.tasks.maximum=2"

to the elastic-mapreduce.rb command, adjusting the two values as appropriate.

For a running EMR cluster, you can use the following steps. On the headnode, navigate to the conf directory; it will be in a path similar to /home/hadoop/.versions/1.0.3/conf

Edit mapred-site.xml and change the value of either or both of these properties:

mapred.tasktracker.map.tasks.maximum

mapred.tasktracker.reduce.tasks.maximum
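
If you would rather script the edit than open an editor, something like the following might work. This is only a sketch: set_prop is a hypothetical helper name, and it assumes each property's <name> and <value> elements sit on the same line of mapred-site.xml. If your file splits them across lines, edit it by hand instead.

```shell
# Hypothetical helper (not part of Hadoop or EMR): set_prop FILE NAME VALUE
# Assumption: <name>...</name><value>...</value> appear on one line.
set_prop() {
  file=$1
  name=$2
  value=$3
  # Escape the dots in the property name so sed matches them literally.
  pattern=$(printf '%s' "$name" | sed 's/\./\\./g')
  # Rewrite only the text between <value> and </value> for this property.
  # -i.bak edits in place and keeps the original as FILE.bak.
  sed -i.bak "s#\(<name>${pattern}</name>[[:space:]]*<value>\)[^<]*\(</value>\)#\1${value}\2#" "$file"
}
```

For example, running set_prop mapred-site.xml mapred.tasktracker.map.tasks.maximum 4 from the conf directory would set the map limit to 4 and leave a mapred-site.xml.bak copy behind.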

Then run these commands from the conf directory on the headnode:

# distribute the file to all nodes
hadoop job -list-active-trackers | sed "s/^.*_//" | sed "s/:.*//" | xargs -t -I{} -P10 scp -o StrictHostKeyChecking=no  mapred-site.xml hadoop@{}:.versions/1.0.3/conf/

# bounce the tasktrackers on each node
hadoop job -list-active-trackers | sed "s/^.*_//" | sed "s/:.*//" | xargs -t -I{} -P10 ssh -o StrictHostKeyChecking=no hadoop@{}   sudo /etc/init.d/hadoop-tasktracker stop

# restart the jobtracker on the headnode
sudo /etc/init.d/hadoop-jobtracker stop
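
The two sed expressions in the pipelines above turn each tracker name into a bare hostname for scp and ssh. Here is a small sketch of what they do, assuming tracker names look roughly like tracker_<host>:localhost/127.0.0.1:<port>. The sample line and the tracker_host function name are made up for illustration.

```shell
# Hypothetical sample line; real names come from `hadoop job -list-active-trackers`.
sample="tracker_ip-10-0-0-1.ec2.internal:localhost/127.0.0.1:12345"

# The first sed drops everything through the last underscore;
# the second drops everything from the first colon onward.
tracker_host() {
  sed "s/^.*_//" | sed "s/:.*//"
}

printf '%s\n' "$sample" | tracker_host
# prints: ip-10-0-0-1.ec2.internal
```

Because the first expression is greedy, a hostname that itself contained an underscore would be cut short, so it is worth eyeballing the output of the pipeline once before piping it into xargs.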

One way to verify the change is to check the jobtracker web page, which reports the cluster's total map and reduce task capacity.
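
The web page tells you what the running daemons picked up. To also confirm that the file on a given node carries the value you expect, a helper along these lines could read a property back. Again a sketch only: get_prop is a hypothetical name, and it assumes <name> and <value> share a line in mapred-site.xml.

```shell
# Hypothetical helper (not part of Hadoop or EMR): get_prop FILE NAME
# Assumption: <name>...</name><value>...</value> appear on one line.
get_prop() {
  file=$1
  name=$2
  # Escape the dots in the property name so sed matches them literally.
  pattern=$(printf '%s' "$name" | sed 's/\./\\./g')
  # Print only the text between <value> and </value> for this property.
  sed -n "s#.*<name>${pattern}</name>[[:space:]]*<value>\([^<]*\)</value>.*#\1#p" "$file"
}
```

From the conf directory, get_prop mapred-site.xml mapred.tasktracker.reduce.tasks.maximum would print the configured reduce limit, or nothing if the property is absent or laid out differently.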
