Description
When I increase the batch size, the total inference time on TensorRT grows proportionally, so the per-image time does not improve at all. Basically, if inference on a batch of size 8 takes 20ms, inference on a batch of size 16 takes 40ms. I am not sure why this is happening ...
I have converted an EfficientNet backbone from TF to ONNX, and then to TensorRT. In TF I specified the batch size as follows:
# save backbone model w/ full signature!
@tf.function()
def my_predict(my_prediction_inputs, **kwargs):
    prediction = mod(my_prediction_inputs, training=False)
    return {"prediction": prediction}

my_signatures = my_predict.get_concrete_function(
    my_prediction_inputs=tf.TensorSpec([batch_size, 256, 256, 3], dtype=tf.float32, name="image")
)
tf.saved_model.save(mod, bbone_name, signatures=my_signatures)
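(For reference, the batch dimension could also be left dynamic by passing None for the first axis of the TensorSpec. A minimal sketch using the same mod and bbone_name as above; this is not the export that produced the numbers in this report:)

# Sketch: same export, but with a dynamic batch dimension (None), so the
# resulting ONNX input is not pinned to a single batch size.
@tf.function()
def my_predict_dyn(my_prediction_inputs, **kwargs):
    prediction = mod(my_prediction_inputs, training=False)
    return {"prediction": prediction}

dyn_signatures = my_predict_dyn.get_concrete_function(
    my_prediction_inputs=tf.TensorSpec([None, 256, 256, 3], dtype=tf.float32, name="image")
)
tf.saved_model.save(mod, bbone_name, signatures=dyn_signatures)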
Converting the TensorFlow model to ONNX:
$ python -m tf2onnx.convert --saved-model mods/effnet-l/bbone --output mods/effnet-l/bbone.onnx
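To check which batch dimension actually ended up in the ONNX file, the declared input shape can be printed (a quick standalone check; the path matches the command above):

import onnx

# Print the declared input shapes; the first dimension is the batch.
model = onnx.load("mods/effnet-l/bbone.onnx")
for inp in model.graph.input:
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)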
Converting the ONNX model to TensorRT and saving it:
import engine as eng
from onnx import ModelProto
import tensorrt as trt

base_dir = "mods/effnet-l"
# base_dir = "mods/resnet152/"
onnx_path = base_dir + "/bbone.onnx"
engine_name = base_dir + "/bbone.plan"
batch_size = 8

model = ModelProto()
with open(onnx_path, "rb") as f:
    model.ParseFromString(f.read())

shape = [batch_size, 256, 256, 3]
engine = eng.build_engine(onnx_path, shape=shape)
eng.save_engine(engine, engine_name)
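engine here is a small helper module from a tutorial, so build_engine is opaque in this snippet. With the TensorRT 8.x Python API, a helper that pins the input to a static shape would look roughly like this (an illustrative sketch, not the exact helper I use):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, shape):
    # Explicit-batch network: the batch size is part of the tensor shape,
    # so whatever shape is set here is the one the engine will run with.
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    # Pin the input to the requested static shape, e.g. [8, 256, 256, 3].
    network.get_input(0).shape = shape
    return builder.build_serialized_network(network, config)

If the ONNX input had a dynamic batch dimension instead, the config would also need an optimization profile (builder.create_optimization_profile() plus config.add_optimization_profile(...)) covering the batch sizes to be run.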
Here is the inference code for TensorRT. Everything works correctly; the only problem is speed. If I double the batch size, the inference time also doubles, so batching does not improve throughput at all.
std::vector<float> EffnetBBone::convert_mat_to_fvec(cv::Mat mat)
{
    std::vector<float> array;
    if (mat.isContinuous())
    {
        // Continuous matrix: copy the whole buffer in one go.
        array.assign((float *)mat.data, (float *)mat.data + mat.total() * mat.channels());
    }
    else
    {
        // Non-continuous matrix: copy row by row.
        for (int i = 0; i < mat.rows; ++i)
        {
            array.insert(array.end(), mat.ptr<float>(i), mat.ptr<float>(i) + mat.cols * mat.channels());
        }
    }
    return array;
}
EffnetBBone::EffnetBBone(std::string base_dir, bool half_precision)
{
    onnx_net = new Trt();
    if (half_precision)
    {
        onnx_net->EnableFP16();
    }
    onnx_net->BuildEngine(base_dir + "/bbone.onnx", base_dir + "/bbone.plan");
    onnx_net->SetLogLevel((int)Severity::kINTERNAL_ERROR);
}
std::vector<float> EffnetBBone::run_batch(std::vector<cv::Mat> batch_img, bool normalized)
{
    cv::Mat crop;
    std::vector<float> batch_fvec;
    // Per-image output size in floats (327680 bytes / 4 bytes per float).
    int size = batch_img.size() * (327680 / 4);
    std::vector<float> output(size);
    for (size_t i = 0; i < batch_img.size(); i++)
    {
        std::vector<float> fvec;
        crop = batch_img[i];
        cv::Mat img_f32;
        crop.convertTo(img_f32, CV_32F);
        if (!normalized)
        {
            img_f32 = img_f32 / 256.f;
        }
        fvec = convert_mat_to_fvec(img_f32);
        batch_fvec.insert(batch_fvec.end(), fvec.begin(), fvec.end());
    }
    onnx_net->CopyFromHostToDevice(batch_fvec, inputBindIndex);
    bool state = onnx_net->Forward();
    assert(state == true);
    onnx_net->CopyFromDeviceToHost(output, outputBindIndex);
    return output;
}
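To take my C++ wrapper out of the measurement, the model can also be benchmarked directly with trtexec at both batch sizes (paths and the input name image are from the setup above; --shapes only has an effect if the ONNX input is dynamic):

$ trtexec --onnx=mods/effnet-l/bbone.onnx --shapes=image:8x256x256x3
$ trtexec --onnx=mods/effnet-l/bbone.onnx --shapes=image:16x256x256x3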
System environment:
- Device: GeForce RTX 3090
- OS: Ubuntu 20.04
- Driver version: 470.103.01
- CUDA version: 11.2
- TensorRT version: 8.4.0