This example shows how DALI's implementation of automatic augmentations - most notably AutoAugment and TrivialAugment - can be used in training. It shows the training of EfficientNet, an image classification model first described in EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.
The code is based on NVIDIA Deep Learning Examples - it has been extended with a DALI pipeline supporting automatic augmentations, which can be found :fileref:`here <docs/examples/use_cases/pytorch/efficientnet/image_classification/dali.py>`.
- The default values of the parameters were adjusted to values used in EfficientNet training.
- --data-backend parameter was changed to accept pytorch, pytorch_optimized, synthetic, dali or dali_proxy. It is set to dali by default.
- --dali-device was added to control placement of some of DALI operators.
- --augmentation was replaced with --automatic-augmentation, now supporting disabled, autoaugment, and trivialaugment values.
- --workers defaults were halved to accommodate DALI. The value is automatically doubled when pytorch data loader is used. Thanks to this the default value performs well with both loaders.
- The model is restricted to EfficientNet-B0 architecture.
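The option changes listed above can be pictured with a small argparse sketch. It is not the code from main.py: the function names and the `default_workers` value are placeholders chosen for illustration, the `gpu` and `autoaugment` defaults follow this document's statement that the DALI GPU variant with AutoAugment is used by default, and doubling is applied only to the `pytorch` backend because that is the only case the text mentions.

```python
import argparse


def build_parser(default_workers=5):
    """Sketch of the data-loading options; default_workers is a placeholder value."""
    parser = argparse.ArgumentParser(description="Illustrative option parser")
    parser.add_argument(
        "--data-backend",
        default="dali",
        choices=["pytorch", "pytorch_optimized", "synthetic", "dali", "dali_proxy"],
    )
    parser.add_argument("--dali-device", default="gpu", choices=["cpu", "gpu"])
    parser.add_argument(
        "--automatic-augmentation",
        default="autoaugment",
        choices=["disabled", "autoaugment", "trivialaugment"],
    )
    parser.add_argument("--workers", type=int, default=default_workers)
    return parser


def effective_workers(args):
    """Double the worker count for the native PyTorch loader, as the text describes."""
    if args.data_backend == "pytorch":
        return args.workers * 2
    return args.workers


if __name__ == "__main__":
    parsed = build_parser().parse_args(["--data-backend", "pytorch"])
    print(parsed.data_backend, effective_workers(parsed))
```

Keeping the halved value as the stored default and doubling it at the point of use lets one default serve both loaders, which is the design choice the last-but-one bullet describes.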
This model uses the following data augmentation:
- For training:
- Random resized crop to target image size (in this case 224)
- Scale from 8% to 100%
- Aspect ratio from 3/4 to 4/3
- Random horizontal flip
- [Optional: AutoAugment or TrivialAugment]
- Normalization
- For inference:
- Scale to target image size + additional size margin (in this case it is 224 + 32 = 256)
- Center crop to target image size (in this case 224)
- Normalization
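The geometry of the two pipelines above can be sketched in plain Python. This is only an illustration of the parameter ranges, not the DALI operators used by the example: `sample_crop` and `inference_geometry` are hypothetical helpers, the aspect ratio is drawn log-uniformly (one common choice, which DALI's own operator may or may not share), the fallback to the full image is a simplification, and the inference helper assumes a square resized image.

```python
import math
import random


def sample_crop(width, height, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3),
                rng=random, attempts=10):
    """Sample a crop window (x, y, w, h) covering 8%-100% of the image area."""
    area = width * height
    for _ in range(attempts):
        target_area = area * rng.uniform(*scale)
        # Draw the aspect ratio log-uniformly between 3/4 and 4/3.
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            x = rng.randint(0, width - w)
            y = rng.randint(0, height - h)
            return x, y, w, h
    # Simplified fallback: use the whole image if no window fits.
    return 0, 0, width, height


def inference_geometry(target=224, margin=32):
    """Return the resize size and the center-crop offset for inference."""
    resize_to = target + margin
    offset = (resize_to - target) // 2
    return resize_to, offset


if __name__ == "__main__":
    random.seed(0)
    print(sample_crop(500, 375))
    print(inference_geometry())
```

The sampled window would then be resized to 224 x 224, while at inference time the image is resized to 256 and the central 224 x 224 region is kept.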
The EfficientNet script operates on ImageNet 1k, a widely popular image classification dataset from the ILSVRC challenge.
- Download the dataset from http://image-net.org/download-images
- Extract the training data:
mkdir train && mv ILSVRC2012_img_train.tar train/ && cd train
tar -xvf ILSVRC2012_img_train.tar && rm -f ILSVRC2012_img_train.tar
find . -name "*.tar" | while read NAME ; do mkdir -p "${NAME%.tar}"
tar -xvf "${NAME}" -C "${NAME%.tar}"; rm -f "${NAME}"; done
cd ..
- Extract the validation data and move the images to subfolders:
mkdir val && mv ILSVRC2012_img_val.tar val/ && cd val && tar -xvf ILSVRC2012_img_val.tar
wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash
The directory in which the train/ and val/ directories are placed is referred to as $PATH_TO_IMAGENET in this document.
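After the extraction steps, a quick sanity check of the directory layout can save a failed training run. The sketch below assumes that both train/ and val/ end up with one subfolder per class (ImageNet 1k has 1000 classes); the helper names are made up for this illustration and are not part of the example's scripts.

```python
from pathlib import Path


def count_class_dirs(root, split):
    """Count the class subfolders inside <root>/<split>."""
    split_dir = Path(root) / split
    return sum(1 for entry in split_dir.iterdir() if entry.is_dir())


def check_layout(root, expected_classes=1000):
    """Report, per split, whether the expected number of class folders exists."""
    return {
        split: count_class_dirs(root, split) == expected_classes
        for split in ("train", "val")
    }


if __name__ == "__main__":
    import os

    # PATH_TO_IMAGENET is read from the environment, as in the shell commands.
    dataset_root = os.environ.get("PATH_TO_IMAGENET")
    if dataset_root:
        print(check_layout(dataset_root))
```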
- Make sure you are either using the NVIDIA PyTorch NGC container or you have DALI and PyTorch installed.
- Install NVIDIA DLLogger and pynvml.
To run training on a single GPU, use the main.py entry point:
- For FP32:
python ./main.py --batch-size 64 $PATH_TO_IMAGENET
- For AMP:
python ./main.py --batch-size 64 --amp --static-loss-scale 128 $PATH_TO_IMAGENET
You may need to adjust the --batch-size parameter for your machine.
You can change the data loader and automatic augmentation scheme that are used by adding:
- --data-backend: dali | dali_proxy | pytorch | synthetic,
- --automatic-augmentation: disabled | autoaugment | trivialaugment (the last one only for DALI),
- --dali-device: cpu | gpu (only for DALI).
By default DALI GPU-variant with AutoAugment is used (dali and dali_proxy backends).
- dali: Leverages a DALI pipeline along with DALI's PyTorch iterator for data loading, preprocessing, and augmentation.
- dali_proxy: Uses a DALI pipeline for preprocessing and augmentation while relying on PyTorch's data loader. DALI Proxy facilitates the transfer of data to DALI for processing. See :ref:`pytorch_dali_proxy`.
- pytorch: Employs the native PyTorch data loader for data preprocessing and augmentation.
- synthetic: Creates synthetic data on the fly, which is useful for testing and benchmarking purposes. This backend eliminates the need for actual datasets, providing a convenient way to simulate data loading.
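The constraints between these options can be written down as a small check. This is a sketch of the rules as stated in this document (trivialaugment and --dali-device apply only to the DALI backends); `check_options` is a hypothetical helper, and how main.py itself reacts to an unsupported combination is not described here.

```python
DALI_BACKENDS = {"dali", "dali_proxy"}


def check_options(data_backend, automatic_augmentation, dali_device=None):
    """Return a list of human-readable problems; an empty list means the combination is fine."""
    problems = []
    uses_dali = data_backend in DALI_BACKENDS
    if automatic_augmentation == "trivialaugment" and not uses_dali:
        problems.append("trivialaugment is only available with the DALI backends")
    if dali_device is not None and not uses_dali:
        problems.append("--dali-device only applies to the DALI backends")
    return problems


if __name__ == "__main__":
    print(check_options("dali", "trivialaugment", "gpu"))
    print(check_options("pytorch", "trivialaugment"))
```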
For example, to run EfficientNet with AMP and a batch size of 128, using DALI with TrivialAugment, invoke:
python ./main.py --amp --static-loss-scale 128 --batch-size 128 --data-backend dali --automatic-augmentation trivialaugment $PATH_TO_IMAGENET
To run on multiple GPUs, use multiproc.py to launch the main.py entry point script, passing the number of GPUs as the --nproc_per_node argument. For example, to run the model on 8 GPUs using AMP and DALI with AutoAugment, invoke:
python ./multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 128 --data-backend dali --automatic-augmentation autoaugment $PATH_TO_IMAGENET
To see the full list of available options and their descriptions, use the -h or --help command-line option, for example:
python main.py -h
To run the training in a standard configuration (DGX A100/DGX-1V, AMP, 400 Epochs, DALI with AutoAugment), invoke the following command:
- for DGX1V-16G:
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 128 $PATH_TO_IMAGENET
- for DGX-A100:
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 256 $PATH_TO_IMAGENET
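To put the two standard configurations in perspective, the sketch below works out the global batch size and the number of iterations per epoch. It assumes that --batch-size is a per-GPU value and that a final partial batch is kept; neither is stated in this document. The training-set size is that of ILSVRC2012, and the function names are illustrative.

```python
import math

# Number of images in the ILSVRC2012 (ImageNet 1k) training set.
IMAGENET_TRAIN_IMAGES = 1_281_167


def global_batch_size(per_gpu_batch, num_gpus):
    """Total number of samples processed per iteration across all GPUs."""
    return per_gpu_batch * num_gpus


def steps_per_epoch(per_gpu_batch, num_gpus, dataset_size=IMAGENET_TRAIN_IMAGES):
    """Iterations needed to see the dataset once, keeping the last partial batch."""
    return math.ceil(dataset_size / global_batch_size(per_gpu_batch, num_gpus))


if __name__ == "__main__":
    for name, batch in (("DGX1V-16G", 128), ("DGX-A100", 256)):
        print(name, global_batch_size(batch, 8), steps_per_epoch(batch, 8))
```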
To run training benchmarks with different data loaders and automatic augmentations, you can use the following commands, assuming that they are running on DGX1V-16G with 8 GPUs, a batch size of 128 and AMP:
# Adjust the following variable to control where to store the results of the benchmark runs
export RESULT_WORKSPACE=./
# synthetic benchmark
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 \
    --batch-size 128 --epochs 1 --prof 1000 --no-checkpoints \
    --training-only --data-backend synthetic \
    --workspace $RESULT_WORKSPACE \
    --report-file bench_report_synthetic.json $PATH_TO_IMAGENET
# DALI without automatic augmentations
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 \
    --batch-size 128 --epochs 4 --no-checkpoints --training-only \
    --data-backend dali --automatic-augmentation disabled \
    --workspace $RESULT_WORKSPACE \
    --report-file bench_report_dali.json $PATH_TO_IMAGENET
# DALI with AutoAugment
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 \
    --batch-size 128 --epochs 4 --no-checkpoints --training-only \
    --data-backend dali --automatic-augmentation autoaugment \
    --workspace $RESULT_WORKSPACE \
    --report-file bench_report_dali_aa.json $PATH_TO_IMAGENET
# DALI with TrivialAugment
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 \
    --batch-size 128 --epochs 4 --no-checkpoints --training-only \
    --data-backend dali --automatic-augmentation trivialaugment \
    --workspace $RESULT_WORKSPACE \
    --report-file bench_report_dali_ta.json $PATH_TO_IMAGENET
# DALI proxy with AutoAugment
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 \
    --batch-size 128 --epochs 4 --no-checkpoints --training-only \
    --data-backend dali_proxy --automatic-augmentation autoaugment \
    --workspace $RESULT_WORKSPACE \
    --report-file bench_report_dali_proxy_aa.json $PATH_TO_IMAGENET
# DALI proxy with TrivialAugment
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 \
    --batch-size 128 --epochs 4 --no-checkpoints --training-only \
    --data-backend dali_proxy --automatic-augmentation trivialaugment \
    --workspace $RESULT_WORKSPACE \
    --report-file bench_report_dali_proxy_ta.json $PATH_TO_IMAGENET
# PyTorch without automatic augmentations
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 \
    --batch-size 128 --epochs 4 --no-checkpoints --training-only \
    --data-backend pytorch --automatic-augmentation disabled \
    --workspace $RESULT_WORKSPACE \
    --report-file bench_report_pytorch.json $PATH_TO_IMAGENET
# PyTorch with AutoAugment:
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 \
    --batch-size 128 --epochs 4 --no-checkpoints --training-only \
    --data-backend pytorch --automatic-augmentation autoaugment \
    --workspace $RESULT_WORKSPACE \
    --report-file bench_report_pytorch_aa.json $PATH_TO_IMAGENET
Validation is done every epoch, and can also be run separately on a checkpointed model:
python ./main.py --evaluate --epochs 1 --resume <path to checkpoint> \
    -b <batch size> $PATH_TO_IMAGENET
To run inference on a JPEG image, you first have to extract the model weights from the checkpoint:
python checkpoint2model.py --checkpoint-path <path to checkpoint> \
    --weight-path <path where weights will be stored>
Then, run the classification script:
python classify.py --pretrained-from-file <path to weights from previous step> \
    --precision AMP|FP32 --image <path to JPEG image>
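As a side note on the benchmark runs listed earlier: apart from the synthetic run, they differ only in the data backend, the automatic augmentation and the report file name, so the command lines can be generated from a small table. The sketch below uses only flags that appear in this document; the `RUNS` table and `benchmark_command` helper are illustrative and not part of the example's scripts. The shell variables are left unexpanded so that the printed lines can be pasted into a shell.

```python
# Flags shared by all non-synthetic benchmark runs in this document.
COMMON = (
    "python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 "
    "--batch-size 128 --epochs 4 --no-checkpoints --training-only"
)

# (data backend, automatic augmentation, report file)
RUNS = [
    ("dali", "disabled", "bench_report_dali.json"),
    ("dali", "autoaugment", "bench_report_dali_aa.json"),
    ("dali", "trivialaugment", "bench_report_dali_ta.json"),
    ("dali_proxy", "autoaugment", "bench_report_dali_proxy_aa.json"),
    ("dali_proxy", "trivialaugment", "bench_report_dali_proxy_ta.json"),
    ("pytorch", "disabled", "bench_report_pytorch.json"),
    ("pytorch", "autoaugment", "bench_report_pytorch_aa.json"),
]


def benchmark_command(backend, augmentation, report):
    """Build one benchmark command line as a single string."""
    return (
        f"{COMMON} --data-backend {backend} "
        f"--automatic-augmentation {augmentation} "
        f"--workspace $RESULT_WORKSPACE --report-file {report} $PATH_TO_IMAGENET"
    )


if __name__ == "__main__":
    for run in RUNS:
        print(benchmark_command(*run))
```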