This section lists sample commands for running inference on different hardware.
The CPU requires FP32 or INT8 models. All other hardware requires FP16 models, though the GPU can also run FP32 models in some cases, at reduced performance.
To prepare for the exercise, we're going to convert an FP32 SqueezeNet 1.1 model (provided with OpenVINO) to an FP16 version.
- Create a directory for the FP16 squeezenet model.
mkdir ~/squeezenet1.1_FP16
- Move into the new directory.
cd ~/squeezenet1.1_FP16
- Use the Model Optimizer to convert the model to FP16:
python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model ~/openvino_models/models/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.caffemodel --data_type FP16 --output_dir .
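- The Model Optimizer writes an Intermediate Representation (IR): an .xml file describing the network topology and a .bin file holding the weights. As a quick check (the file names are inferred from the input model name), list the output directory to confirm both files were created:
ls ~/squeezenet1.1_FP16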
- Copy the labels file to the directory with the new FP16 SqueezeNet model.
cp ~/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.labels .
- Copy the car image to the samples directory for ease of use.
sudo cp /opt/intel/openvino/deployment_tools/demo/car.png ~/inference_engine_samples/intel64/Release
- Go to the samples directory:
cd ~/inference_engine_samples/intel64/Release
- Use the Inference Engine to run a sample application on the CPU or iGPU:
- To run the sample application on the CPU:
./classification_sample -i car.png -m ~/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml
- To run the sample application on the iGPU:
./classification_sample -i car.png -m ~/squeezenet1.1_FP16/squeezenet1.1.xml -d GPU
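- The -d option selects the target device and defaults to CPU when omitted, so the CPU command above is equivalent to passing -d CPU explicitly:
./classification_sample -i car.png -m ~/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml -d CPU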
- To run inference using both your FPGA and CPU, add the -d option with a HETERO: target:
./classification_sample -i car.png -m ~/squeezenet1.1_FP16/squeezenet1.1.xml -d HETERO:FPGA,CPU
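- With a HETERO: target, the Inference Engine runs each layer on the first device in the list that supports it and falls back to the next device otherwise. The same pattern works for other device pairs; for example, assuming the GPU plugin is available on your system:
./classification_sample -i car.png -m ~/squeezenet1.1_FP16/squeezenet1.1.xml -d HETERO:GPU,CPU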
- To run the sample application on your target accelerator:
- To run the sample application on the Intel® Arria® 10 GX FPGA Development Kit:
aocl program acl0 /opt/intel/computer_vision_sdk_2018.5.445/bitstreams/a10_devkit_bitstreams/5-0_A10DK_FP11_SqueezeNet.aocx
- To run the sample application on the Intel® Programmable Acceleration Card (PAC) with Intel® Arria® 10 GX FPGA:
aocl program acl0 /opt/intel/computer_vision_sdk_2018.5.445/bitstreams/a10_dcp_bitstreams/5-0_RC_FP11_SqueezeNet.aocx
- To run the sample application on the Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Mustang-F100-A10):
aocl program acl0 /opt/intel/computer_vision_sdk_2018.5.445/bitstreams/a10_vision_design_bitstreams/5-0_PL1_FP11_SqueezeNet.aocx
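- If programming fails or the board is not detected, you can check that the FPGA is visible to the OpenCL runtime (assuming the board support package is installed) with the diagnostic utility:
aocl diagnose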
- To run the sample application on the Intel® Vision Accelerator Design with Intel® Movidius™ VPUs:
./classification_sample -i car.png -m ~/squeezenet1.1_FP16/squeezenet1.1.xml -d HDDL
- To run the sample application on the Intel® Movidius™ Neural Compute Stick or Intel® Neural Compute Stick 2:
./classification_sample -i car.png -m ~/squeezenet1.1_FP16/squeezenet1.1.xml -d MYRIAD
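- If the MYRIAD device is not found, you can confirm that the stick enumerates on USB; 03e7 is the Movidius vendor ID:
lsusb | grep 03e7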
- Use the -ni option to increase the number of iterations; this reduces the impact of one-time initialization on the measured throughput:
./classification_sample -i car.png -m ~/squeezenet1.1_FP16/squeezenet1.1.xml -d HETERO:FPGA,CPU -ni 100
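- The sample accepts other options as well. For example, -nt sets how many top classification results are printed (run ./classification_sample -h to confirm the available options on your build):
./classification_sample -i car.png -m ~/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml -ni 100 -nt 5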
NOTE: Throughput is reported in Frames Per Second (FPS), which tells you how quickly inference runs on the given hardware.
An accelerator may initially report a lower FPS because of its one-time initialization cost. Increasing the number of iterations, as in the previous step, amortizes that cost and gives a better sense of the sustained inference speed.
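For example (illustrative numbers): if 100 iterations complete in 2.0 seconds, the reported throughput is 100 / 2.0 = 50 FPS, whereas a single iteration that includes a 0.5-second device initialization would report far less than the sustained rate.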