Why does INT8 quantization occupy more GPU memory than float16 with TensorRT quantization? #69

@nameli0722

Description

@nameli0722

Please describe your problem in English if possible; it will be helpful to more people.
Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:
1.
2.

Screenshots
If applicable, add screenshots to help explain your problem.

System environment (please complete the following information):

  • Device:
  • OS:
  • Driver version:
  • CUDA version:
  • TensorRT version:
  • Others:

CMake output

Running output
