Description
I changed ARGS to:
ARGS="--model-name /mnt/sdb/llm_models/opt-1.3b-pt
--model-type opt-save
--seed 42
--fp16
--num-layers 12
--max-layers 24
--budget 1024
--num-iters 10
--dist-url tcp://127.0.0.1:9032
--token-micro-batch-size 1
--world-size 2 --pipeline-group-size 2 --data-group-size 1
--pp-mode pipe_sync_sample_mask_token_pipe
--infer-data ${file}"
Also, I changed the memmap path in hf_opt_module_save.py:
    module.fp_mlp_query = np.memmap(
        f"/lustre/fsw/nvresearch/ldm/diffusion/data/175b_c4/mlp_sp_x_{module.layer_index}.mmap",
        dtype="float16",
        mode="w+",
        shape=(
            400000,
            config.hidden_size,
        ),
    )
But when I run the script, it crashes with a bus error:
./run_infer_opt_1.3b_collect_sp_data.sh: line 30: 294291 Bus error (core dumped) python dist_inference_runner.py
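
One possible cause, which is not confirmed here, is that the memmap backing files cannot actually be allocated: a bus error can occur when a process writes to a memory-mapped page for which the filesystem has no space left. The following is only a minimal sketch for checking that, not part of the repository. The helper names `memmap_bytes` and `check_memmap_target` are hypothetical, and `hidden_size=2048` and `num_files=24` are assumptions for opt-1.3b (one file per layer, using the `--max-layers 24` value above); the real script may create more memmaps per layer than the one shown.

```python
import os
import shutil

FLOAT16_BYTES = 2  # one float16 element occupies 2 bytes


def memmap_bytes(rows, hidden_size, itemsize=FLOAT16_BYTES):
    """Size in bytes of one (rows, hidden_size) memmap file."""
    return rows * hidden_size * itemsize


def check_memmap_target(directory, rows, hidden_size, num_files):
    """Check that `directory` exists and has room for `num_files` memmaps.

    Returns (ok, message). This is a hypothetical helper for diagnosis only.
    """
    if not os.path.isdir(directory):
        return False, f"directory does not exist: {directory}"
    needed = memmap_bytes(rows, hidden_size) * num_files
    free = shutil.disk_usage(directory).free
    if free < needed:
        return False, f"need {needed} bytes, only {free} free"
    return True, f"ok: need {needed} bytes, {free} free"


if __name__ == "__main__":
    # Assumed values: hidden_size=2048 and 24 files (one per layer).
    ok, msg = check_memmap_target(
        "/lustre/fsw/nvresearch/ldm/diffusion/data/175b_c4",
        rows=400000,
        hidden_size=2048,
        num_files=24,
    )
    print(msg)
```

Under these assumptions each file is 400000 * 2048 * 2 bytes, about 1.6 GB, so the target directory would need roughly 39 GB free in total.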