[BUG] I tried the image rocm/deepspeed:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1_DeepSpeed_Inference with the Llama-2 model and got errors #69

@sunpian1

Description

Free memory : 19.685547 (GigaBytes)
Total memory: 23.984375 (GigaBytes)
Requested memory: 0.312500 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0x7f5cbbe00000
Memory access fault by GPU node-1 (Agent handle: 0x564cd7ba91d0) on address 0x7f5ccfe2c000. Reason: Page not present or supervisor privilege.
[2024-02-19 08:36:43,155] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 3349
[2024-02-19 08:36:43,156] [ERROR] [launch.py:322:sigkill_handler] ['/opt/conda/envs/py_3.9/bin/python', '-u', 'test.py', '--local_rank=0'] exits with return code = -6
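
A quick arithmetic check on the addresses in the log above may help with triage. The sketch below assumes two things the log does not state: that "GigaBytes" means GiB (2^30 bytes), and that the "Requested memory" figure is the size of the buffer starting at the "WorkSpace" address. The variable names are only illustrative.

```python
# Assumption: the log's "GigaBytes" are GiB (2**30 bytes).
GIB = 1 << 30

# Values copied from the log output.
workspace_base = 0x7F5CBBE00000   # "WorkSpace: 0x7f5cbbe00000"
fault_address = 0x7F5CCFE2C000    # address in the "Memory access fault" line

# Assumption: "Requested memory: 0.312500" is the workspace size.
requested_bytes = int(0.3125 * GIB)

# Distance of the faulting address from the start of the workspace.
offset = fault_address - workspace_base

# How far the faulting address lies beyond the requested size.
overrun = offset - requested_bytes

print(f"fault offset from workspace base: {offset:#x} ({offset / GIB:.6f} GiB)")
print(f"requested workspace size:         {requested_bytes:#x}")
print(f"bytes past the requested size:    {overrun} ({overrun // 1024} KiB)")
```

If both assumptions hold, the faulting address falls about 176 KiB past the end of the requested workspace, which would be consistent with an out-of-bounds access on the workspace buffer rather than the GPU running out of memory. This is a guess from the numbers alone, not a confirmed diagnosis.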
