Why does INT8 quantization occupy more GPU memory than float16 with TensorRT quantization? #69

@nameli0722

Description

@nameli0722

Please describe your problem in English if possible; it will be helpful to more people.
Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:
1.
2.

Screenshots
If applicable, add screenshots to help explain your problem.

System environment (please complete the following information):

  • Device:
  • OS:
  • Driver version:
  • CUDA version:
  • TensorRT version:
  • Others:

CMake output

Running output
