
Introducing the Model Zoo for Intel Architecture

This blog post was originally published at Intel's website. It is reprinted here with the permission of Intel.

Are you a data scientist who wants to optimize the performance of your machine learning (ML) inference workloads? Perhaps you’ve heard of the Intel® Optimization for TensorFlow* and the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), but have not yet seen a working application in your domain that takes full advantage of Intel’s optimizations. The Model Zoo for Intel Architecture is an open-source collection of optimized machine learning inference applications that demonstrates how to get the best performance on Intel platforms. The project contains more than 20 pre-trained models, benchmarking scripts, best practice documents, and step-by-step tutorials for running deep learning (DL) models optimized for Intel® Xeon® Scalable processors.

With the Model Zoo, you can easily:

  • Learn which AI topologies and applications Intel has optimized to run on its hardware
  • Benchmark the performance of optimized models on Intel hardware
  • Get started efficiently running optimized models in the cloud or on bare metal

What’s in Version 1.3

The latest release of the Model Zoo features optimized models for the TensorFlow* framework and benchmarking scripts for both 32-bit floating point (FP32) and 8-bit integer (Int8) precision. Most commercial DL applications today use FP32 precision for training and inference, though 8-bit multipliers can be used for inference with minimal to no loss in accuracy. The Int8 models were created using post-training quantization techniques for reduced model size and lower latency.
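To illustrate the idea behind post-training quantization, here is a minimal NumPy sketch of symmetric int8 weight quantization: each float weight is scaled so the largest magnitude maps to 127, then rounded to an 8-bit integer. This is illustrative only; the Model Zoo's Int8 models were produced with Intel's TensorFlow quantization tooling, not this code.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric post-training quantization of a weight tensor to int8 (illustrative)."""
    scale = np.abs(weights).max() / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the int8 values and scale."""
    return q.astype(np.float32) * scale

# Toy example: the quantize/dequantize round trip loses little precision.
w = np.array([0.5, -1.27, 0.0, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

Because the int8 values occupy a quarter of the memory of FP32 and feed 8-bit multipliers, this trade of a small, bounded rounding error for size and speed is what enables the "minimal to no loss in accuracy" result mentioned above.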

FP32 TensorFlow Models

  • Adversarial Networks: DCGAN
  • Content Creation: DRAW
  • Face Detection and Alignment: FaceNet, MTCNN
  • Image Recognition: Inception ResNet V2, Inception V3, Inception V4, MobileNet V1, ResNet 101, ResNet 50, SqueezeNet
  • Image Segmentation: Mask R-CNN, UNet
  • Language Translation: GNMT, Transformer-LT
  • Object Detection: Faster R-CNN, R-FCN, SSD-MobileNet, SSD-ResNet34
  • Recommendation Systems: NCF, Wide & Deep
  • Text-to-Speech: WaveNet

Int8 TensorFlow Models

  • Image Recognition: Inception ResNet V2, Inception V3, Inception V4, ResNet 101, ResNet 50
  • Object Detection: Faster R-CNN, R-FCN, SSD-MobileNet
  • Recommendation Systems: Wide & Deep

You can run benchmarks by cloning the repository and following the step-by-step instructions for your topology of choice. Each model’s benchmark README contains detailed information for downloading a pre-trained model from a public cloud storage location, acquiring a test dataset, and launching the model’s benchmarking script. The benchmarking scripts are designed to run by default in a containerized environment using Intel-optimized TensorFlow Docker* images, with all the necessary dependencies taken care of automatically. There is an alpha feature that allows you to run without Docker, but in this mode you must manually set up and install all the required dependencies. Either way, the script automatically applies the optimal TensorFlow runtime settings for your Intel hardware and produces an output log describing the model performance metrics and the settings used. There are options for testing real-time inference (latency with batch size 1) and maximum-throughput inference (large batch size), and some scripts also offer the option of measuring accuracy.
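To give a sense of what "optimal TensorFlow runtime settings" typically means, the sketch below assembles the kind of OpenMP and threading settings that Intel's performance guides commonly recommend for MKL-DNN builds of TensorFlow. The helper name and the specific values are illustrative assumptions, not the launch script's actual logic, which tunes these per model and platform.

```python
import os

def mkl_runtime_settings(physical_cores):
    """Commonly recommended runtime settings for MKL-DNN TensorFlow (illustrative)."""
    return {
        # Pin OpenMP worker threads to physical cores and keep them spinning briefly
        "OMP_NUM_THREADS": str(physical_cores),
        "KMP_BLOCKTIME": "1",
        "KMP_AFFINITY": "granularity=fine,verbose,compact,1,0",
        # Parallelism within a single op vs. across independent ops
        "intra_op_parallelism_threads": physical_cores,
        "inter_op_parallelism_threads": 1,
    }

# Export the environment-variable portion before starting TensorFlow.
settings = mkl_runtime_settings(28)  # e.g., a 28-core Xeon Scalable socket
for key in ("OMP_NUM_THREADS", "KMP_BLOCKTIME", "KMP_AFFINITY"):
    os.environ[key] = settings[key]
```

The `--socket-id 0` flag used in the examples below serves a related purpose: it confines the run to one CPU socket so threads and memory stay local to that socket's cores.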

Example: FP32 Inception V3 Benchmarks

To show benchmarking in action, the following steps are reproduced from the FP32 Inception V3 benchmark README and adjusted for brevity:

  1. Clone the intelai/models repository.

    $ git clone https://github.com/IntelAI/models.git

  2. Download the pre-trained model.

    $ wget https://storage.googleapis.com/intel-optimized-tensorflow/models/inceptionv3_fp32_pretrained_model.pb

  3. If you would like to run Inception V3 FP32 inference and test for accuracy, you will need the ImageNet dataset. Benchmarking for latency and throughput does not require the ImageNet dataset; synthetic/dummy data is used if no dataset is provided. Instructions for downloading the dataset and converting it to the TFRecord format can be found in the TensorFlow documentation.

  4. Navigate to the benchmarks directory in your local clone of the intelai/models repo. The launch_benchmark.py script in the benchmarks directory is used to start a benchmarking run in an optimized TensorFlow Docker container. It has arguments to specify the model, framework, mode, precision, and Docker image.

    For latency (using --batch-size 1):

    python launch_benchmark.py \
    --model-name inceptionv3 \
    --precision fp32 \
    --mode inference \
    --framework tensorflow \
    --batch-size 1 \
    --socket-id 0 \
    --docker-image intelaipg/intel-optimized-tensorflow:latest-devel-mkl \
    --in-graph /home/<user>/inceptionv3_fp32_pretrained_model.pb

    Example log tail:

    Inference with dummy data.
    Iteration 1: 1.075 sec
    Iteration 2: 0.023 sec
    Iteration 3: 0.016 sec
    ...
    Iteration 38: 0.014 sec
    Iteration 39: 0.014 sec
    Iteration 40: 0.014 sec
    Average time: 0.014 sec
    Batch size = 1
    Latency: 14.442 ms
    Throughput: 69.243 images/sec
    Ran inference with batch size 1
    Log location outside container: {--output-dir value}/benchmark_inceptionv3_inference_fp32_20190104_025220.log

    For throughput (using --batch-size 128):

    python launch_benchmark.py \
    --model-name inceptionv3 \
    --precision fp32 \
    --mode inference \
    --framework tensorflow \
    --batch-size 128 \
    --socket-id 0 \
    --docker-image intelaipg/intel-optimized-tensorflow:latest-devel-mkl \
    --in-graph /home/<user>/inceptionv3_fp32_pretrained_model.pb

    Example log tail:

    Inference with dummy data.
    Iteration 1: 2.024 sec
    Iteration 2: 0.765 sec
    Iteration 3: 0.781 sec
    ...
    Iteration 38: 0.756 sec
    Iteration 39: 0.760 sec
    Iteration 40: 0.757 sec
    Average time: 0.760 sec
    Batch size = 128
    Throughput: 168.431 images/sec
    Ran inference with batch size 128
    Log location outside container: {--output-dir value}/benchmark_inceptionv3_inference_fp32_20190104_024842.log
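The logged metrics are related by simple arithmetic: throughput is batch size divided by the average per-batch time, and latency at batch size 1 is just that average time in milliseconds. The short sketch below checks this against the two runs above (the throughput run's "Average time" is rounded to 0.760 s in the log, so the computed value lands close to, not exactly on, the logged 168.431 images/sec).

```python
# Latency run: batch size 1, unrounded average time of 14.442 ms per batch
lat_avg = 0.014442                  # seconds per batch
latency_ms = lat_avg * 1000.0       # 14.442 ms, as logged
throughput_bs1 = 1 / lat_avg        # ~69.24 images/sec, as logged

# Throughput run: batch size 128, average time rounded in the log
tp_avg = 0.760                      # seconds per batch
throughput_bs128 = 128 / tp_avg     # ~168.4 images/sec, close to the logged 168.431
```

Note also that both runs discard the first iteration's time (over 1 s for warm-up versus ~0.014 s steady state in the latency run) when averaging, so the reported numbers reflect steady-state performance.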

Documentation

In addition to benchmarking scripts and instructions for each model, the repository contains a documentation section with best practice guides for achieving maximum performance with Intel Optimization for TensorFlow and TensorFlow Serving. These are the best resources for in-depth knowledge about installing and tuning the frameworks for optimal performance on Intel hardware. Included in the documentation are hands-on tutorials for a selection of models in the Model Zoo and a tutorial on how to quantize the FP32 ResNet50 model to Int8 precision for improved performance while retaining high accuracy. Here is a sample of the documents found in v1.3 (see the documentation README for a full list):

Intel Optimization for TensorFlow

Intel Optimization for TensorFlow Serving

What’s Next?

Future releases of the Model Zoo will add more Int8 precision models and more hands-on tutorials covering additional models for TensorFlow, TensorFlow Serving, and the Int8 quantization process. We are also working on expanding the Model Zoo to include additional frameworks and benchmarking scripts that cover training in addition to inference and accuracy.

Visit the project on GitHub for more information and instructions on getting started.

Melanie Hart Buehler
Cloud Software Engineer, Artificial Intelligence Products Group, Intel