Install
To run on CPUs:
$ pip install horovod
To run on GPUs with NCCL:
$ HOROVOD_GPU_OPERATIONS=NCCL pip install horovod
See the Installation Guide for more details.
Modify
This example shows how to modify a TensorFlow v1 training script to use Horovod:
import tensorflow as tf
import horovod.tensorflow as hvd

# 1: Initialize Horovod
hvd.init()

# 2: Pin this process to a single GPU, chosen by its local rank (one GPU per process)
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

# 3: Wrap the optimizer with Horovod's DistributedOptimizer and scale the
#    learning rate by the number of workers
opt = tf.train.AdagradOptimizer(0.01 * hvd.size())
opt = hvd.DistributedOptimizer(opt)

# 4: Broadcast initial variable states from rank 0 to all other processes
hooks = [hvd.BroadcastGlobalVariablesHook(0)]

# 5: Save checkpoints only on worker 0 to prevent other workers from corrupting them
checkpoint_dir = '/tmp/train_logs' if hvd.rank() == 0 else None
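The learning-rate scaling in step 3 and the rank-0 checkpoint guard in step 5 are plain arithmetic on hvd.size() and hvd.rank(). A minimal sketch of that logic, runnable without Horovod installed (here size and rank are stand-in values for what hvd.size() and hvd.rank() would return in a 4-process job):

```python
# Stand-in for hvd.size() in a 4-process job (assumption for illustration).
size = 4

# Step 3: scale the base learning rate by the number of workers, since the
# effective batch size grows with the worker count.
base_lr = 0.01
scaled_lr = base_lr * size
print(scaled_lr)  # 0.04

# Step 5: only rank 0 gets a checkpoint directory; every other rank passes
# None, so only one process ever writes checkpoints.
for rank in range(size):
    checkpoint_dir = '/tmp/train_logs' if rank == 0 else None
    print(rank, checkpoint_dir)
```

The same guard pattern applies to anything that should happen exactly once per job, such as logging metrics or exporting the final model.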
See the examples directory and API for more details.
Run
To run on a machine with 4 GPUs:
$ horovodrun -np 4 -H localhost:4 python train.py
To run on 4 machines with 4 GPUs each:
$ horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python train.py
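With the host list above, horovodrun launches 16 processes across 4 machines. A hedged sketch of how global ranks relate to hosts and local ranks, assuming ranks fill each host's slots in the order the hosts are listed:

```python
# Assumed launch layout: 4 hosts x 4 slots, ranks assigned host by host.
hosts = ['server1', 'server2', 'server3', 'server4']
slots = 4  # processes (GPUs) per host

for rank in range(len(hosts) * slots):
    host = hosts[rank // slots]   # which machine runs this process
    local_rank = rank % slots     # which GPU it pins (see step 2 above)
    print(rank, host, local_rank)
```

For example, global rank 5 would run on server2 with local rank 1, which is why the script pins GPUs by hvd.local_rank() rather than hvd.rank().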
See the Run documentation for more details.