
Increasing batch size does not improve efficiency #65

@mrpositron

Description


When I increase the batch size, TensorRT inference time grows proportionally, so efficiency does not improve: if inference on a batch of size 8 takes 20 ms, inference on a batch of size 16 takes 40 ms. I am not sure why this is happening.

I have converted an EfficientNet backbone from TF to ONNX, and then to TensorRT. In TF I specified the batch size as follows:

# Save the backbone model with a full serving signature.
import tensorflow as tf

@tf.function()
def my_predict(my_prediction_inputs, **kwargs):
    prediction = mod(my_prediction_inputs, training=False)
    return {"prediction": prediction}

# `mod`, `batch_size`, and `bbone_name` are defined earlier;
# the batch dimension is fixed to `batch_size` here.
my_signatures = my_predict.get_concrete_function(
    my_prediction_inputs=tf.TensorSpec([batch_size, 256, 256, 3], dtype=tf.float32, name="image")
)

tf.saved_model.save(mod, bbone_name, signatures=my_signatures)
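
For reference, the batch dimension can instead be left symbolic, so the same SavedModel can be exported once and used with any batch size downstream. A minimal sketch, assuming `mod` and `bbone_name` are defined as above:

import tensorflow as tf

@tf.function()
def my_predict_dynamic(my_prediction_inputs, **kwargs):
    prediction = mod(my_prediction_inputs, training=False)
    return {"prediction": prediction}

# None makes the batch dimension dynamic; tf2onnx then exports an ONNX
# graph whose first input dimension is symbolic rather than fixed.
dynamic_signatures = my_predict_dynamic.get_concrete_function(
    my_prediction_inputs=tf.TensorSpec([None, 256, 256, 3], dtype=tf.float32, name="image")
)
tf.saved_model.save(mod, bbone_name, signatures=dynamic_signatures)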

Converting TensorFlow model to ONNX

$ python -m tf2onnx.convert --saved-model mods/effnet-l/bbone --output mods/effnet-l/bbone.onnx
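
As a quick sanity check, the input shape that tf2onnx actually recorded can be inspected with the onnx package (a sketch, assuming the path above); a fixed first dimension here pins every engine built from this file to that batch size:

import onnx

model = onnx.load("mods/effnet-l/bbone.onnx")
for inp in model.graph.input:
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)  # e.g. image [8, 256, 256, 3] if the batch was fixed at 8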

Converting ONNX model to TensorRT and saving it.

import engine as eng  # local helper module wrapping the TensorRT builder
from onnx import ModelProto
import tensorrt as trt

base_dir = "mods/effnet-l"
# base_dir = "mods/resnet152/"
onnx_path = base_dir + "/bbone.onnx"
engine_name = base_dir + "/bbone.plan"

batch_size = 8

# Parse the ONNX file as a sanity check that it is well-formed.
model = ModelProto()
with open(onnx_path, "rb") as f:
    model.ParseFromString(f.read())

# The engine is built for this single fixed input shape.
shape = [batch_size, 256, 256, 3]

engine = eng.build_engine(onnx_path, shape=shape)
eng.save_engine(engine, engine_name)
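
Since `engine` is a local helper module, its internals aren't shown here. For comparison, a rough sketch of an equivalent build against the stock TensorRT 8.x Python API (not the author's engine.py); the optimization profile is what lets a single plan be tuned for, and accept, a range of batch sizes when the ONNX input is dynamic:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, min_shape, opt_shape, max_shape):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    # min/opt/max shapes for the (dynamic) batch dimension.
    profile = builder.create_optimization_profile()
    profile.set_shape(network.get_input(0).name, min_shape, opt_shape, max_shape)
    config.add_optimization_profile(profile)

    return builder.build_serialized_network(network, config)

plan = build_engine("mods/effnet-l/bbone.onnx",
                    (1, 256, 256, 3), (8, 256, 256, 3), (16, 256, 256, 3))
with open("mods/effnet-l/bbone.plan", "wb") as f:
    f.write(plan)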

Here is the inference code for TensorRT.

Everything works correctly; the problem is the speed. If I double the batch size, inference time also doubles, so batching does not reduce the per-image cost at all.

// Flatten an HxWxC float cv::Mat into a contiguous vector<float>,
// row by row if the matrix is not stored continuously.
std::vector<float> EffnetBBone::convert_mat_to_fvec(cv::Mat mat)
{
    std::vector<float> array;
    if (mat.isContinuous())
    {
        array.assign((float *)mat.data, (float *)mat.data + mat.total() * mat.channels());
    }
    else
    {
        for (int i = 0; i < mat.rows; ++i)
        {
            array.insert(array.end(), mat.ptr<float>(i), mat.ptr<float>(i) + mat.cols * mat.channels());
        }
    }
    return array;
}



EffnetBBone::EffnetBBone(std::string base_dir, bool half_precision)
{
    onnx_net = new Trt();
    if (half_precision)
    {
        onnx_net->EnableFP16();
    }
    // Build the engine from the ONNX file (or load the cached .plan).
    onnx_net->BuildEngine(base_dir + "/bbone.onnx", base_dir + "/bbone.plan");
    onnx_net->SetLogLevel((int)Severity::kINTERNAL_ERROR);
}

std::vector<float> EffnetBBone::run_batch(std::vector<cv::Mat> batch_img, bool normalized)
{
    cv::Mat crop;
    std::vector<float> batch_fvec;
    // 327680 bytes / sizeof(float) = 81920 output floats per image.
    int size = batch_img.size() * (327680 / 4);
    std::vector<float> output(size);
    for (size_t i = 0; i < batch_img.size(); i++)
    {
        std::vector<float> fvec;
        crop = batch_img[i];
        cv::Mat img_f32;

        crop.convertTo(img_f32, CV_32F);
        if (normalized == false)
        {
            img_f32 = img_f32 / 256.f;
        }
        fvec = convert_mat_to_fvec(img_f32);
        batch_fvec.insert(batch_fvec.end(), fvec.begin(), fvec.end());
    }

    // Copy the whole batch to the GPU, run the engine once, copy the result back.
    onnx_net->CopyFromHostToDevice(batch_fvec, inputBindIndex);
    bool state = onnx_net->Forward();
    assert(state == true);
    onnx_net->CopyFromDeviceToHost(output, outputBindIndex);
    return output;
}
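
To see whether the host-side preprocessing or the engine itself dominates, it helps to time just the engine execution. A Python sketch of that measurement against the saved plan (assumes pycuda is installed and a dynamic-shape engine as sketched above; binding 0 is the input, binding 1 the output):

import time
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open("mods/effnet-l/bbone.plan", "rb") as f:
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

batch = 16
context.set_binding_shape(0, (batch, 256, 256, 3))  # only needed for dynamic engines

h_in = np.random.rand(batch, 256, 256, 3).astype(np.float32)
d_in = cuda.mem_alloc(h_in.nbytes)
h_out = np.empty(tuple(context.get_binding_shape(1)), dtype=np.float32)
d_out = cuda.mem_alloc(h_out.nbytes)
cuda.memcpy_htod(d_in, h_in)

for _ in range(10):                     # warm-up
    context.execute_v2([int(d_in), int(d_out)])
t0 = time.perf_counter()
for _ in range(100):
    context.execute_v2([int(d_in), int(d_out)])
dt = (time.perf_counter() - t0) / 100
print(f"{dt * 1e3:.2f} ms/batch, {dt * 1e3 / batch:.3f} ms/image")

If the per-image time stays flat as the batch grows, the GPU is likely already saturated at batch 8, in which case larger batches add latency without adding throughput.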


System environment (please complete the following information):

  • Device: GeForce RTX 3090
  • OS: Ubuntu 20.04
  • Driver version: 470.103.01
  • CUDA version: 11.2
  • TensorRT version: 8.4.0
