[BUG] Silent bug in Eval Agent #408

@ParagEkbote

Describe the bug

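There is a silent bug in the evaluation agent when evaluating models. It is not triggered by every script, but it is quite annoying to deal with when it is. In the run below, evaluation fails with a NotImplementedError ("Cannot copy out of meta tensor; no data!") while the HQQ model is being reloaded, and a TypeError is then raised from SmashConfig.__del__. WDYT?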

What I did

Test script:

import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from pruna import smash, SmashConfig
from pruna.data.pruna_datamodule import PrunaDataModule
from pruna.evaluation.evaluation_agent import EvaluationAgent
from pruna.evaluation.task import Task
from pruna.evaluation.metrics import (
    TotalTimeMetric,
    LatencyMetric,
    ThroughputMetric,
    TotalParamsMetric,
    TotalMACsMetric,
)

os.environ["TOKENIZERS_PARALLELISM"] = "false"

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B").to(device)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Configure quantization
smash_config = SmashConfig(device=device)
smash_config["quantizer"] = "hqq"
smash_config["hqq_weight_bits"] = 4
smash_config["hqq_compute_dtype"] = "torch.bfloat16"
smash_config["compiler"] = "torch_compile"
smash_config["torch_compile_fullgraph"] = True
smash_config["torch_compile_dynamic"] = True
smash_config["torch_compile_mode"] = "max-autotune"

# Smash model
smashed_model = smash(model, smash_config)

# Setup evaluation
datamodule = PrunaDataModule.from_string(
    dataset_name="WikiText",
    tokenizer=tokenizer,
    collate_fn_args={"max_seq_len": 512},
    dataloader_args={"batch_size": 8, "num_workers": 0},
)
datamodule.limit_datasets(5)

# Create metrics and evaluate
metrics = [
    TotalTimeMetric(),
    LatencyMetric(),
    ThroughputMetric(),
    TotalParamsMetric(),
    TotalMACsMetric(),
]

task = Task(metrics, datamodule=datamodule)
eval_agent = EvaluationAgent(task)

# Run evaluation - bug appears on script exit after this
results = eval_agent.evaluate(smashed_model)

print(f"Evaluation complete: {len(results)} metrics")
# Bug appears here when Python exits

Traceback:

WARNING - Argument cache_dir not found in config file. Skipping...
INFO - Could not load HQQ model using pipeline, trying generic HQQ pipeline...
INFO - Using best available device: 'cuda'
WARNING - Argument cache_dir not found in config file. Skipping...
100%|████████████████████████████████| 111/111 [00:00<00:00, 40618.37it/s]
  0%|                                        | 0/253 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/pruna/evaluation/evaluation_agent.py", line 109, in evaluate
    results.extend(self.compute_stateless_metrics(model, stateless_metrics))
  File "/pruna/evaluation/evaluation_agent.py", line 276, in compute_stateless_metrics
    results.append(metric.compute(model, self.task.dataloader))
  File "/pruna/evaluation/metrics/metric_memory.py", line 386, in compute
    return self.metric.compute(model, dataloader)
  File "/pruna/evaluation/metrics/metric_memory.py", line 154, in compute
    metric_model = self._load_and_prepare_model(str(save_path), model_cls)
  File "/pruna/evaluation/metrics/metric_memory.py", line 327, in _load_and_prepare_model
    model = model_cls.from_pretrained(model_path)
  File "/pruna/telemetry/metrics.py", line 218, in wrapper
    result = func(*args, **kwargs)
  File "/pruna/engine/pruna_model.py", line 367, in from_pretrained
    model, smash_config = load_pruna_model(model_source, **kwargs)
  File "/pruna/engine/load.py", line 75, in load_pruna_model
    model = LOAD_FUNCTIONS[smash_config.load_fns[0]](model_path, smash_config, **kwargs)
  File "/pruna/engine/load.py", line 568, in __call__
    return self.value(*args, **kwargs)
  File "/pruna/engine/load.py", line 398, in load_hqq
    quantized_model = algorithm_packages["AutoHQQHFModel"].from_quantized(...)
  [... HQQ loading details ...]
NotImplementedError: Cannot copy out of meta tensor; no data!

Exception ignored in: <function SmashConfig.__del__ at 0x715001cdff40>
Traceback (most recent call last):
  File "/pruna/config/smash_config.py", line 122, in __del__
  File "/pruna/config/smash_config.py", line 141, in cleanup_cache_dir
  File "/python3.10/pathlib.py", line 1290, in exists
TypeError: 'NoneType' object is not callable
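
Note on the second error: a TypeError like this from inside __del__ is typical of cleanup running during interpreter shutdown, when module globals that the method relies on may already have been torn down. One possible direction is to register the cleanup with weakref.finalize instead of relying on __del__. The sketch below only illustrates that pattern and is not Pruna's actual code; the class ConfigWithCache and its attributes are hypothetical stand-ins.

```python
import shutil
import tempfile
import weakref
from pathlib import Path


class ConfigWithCache:
    """Hypothetical stand-in for a config object that owns a temporary cache directory."""

    def __init__(self) -> None:
        self.cache_dir = Path(tempfile.mkdtemp(prefix="cache_"))
        # The finalizer keeps its own references to shutil.rmtree and the path
        # string, so it does not depend on `self` or on module globals that may
        # be gone at shutdown. It runs when the object is garbage collected or,
        # at the latest, at interpreter exit (atexit is True by default).
        self._finalizer = weakref.finalize(
            self, shutil.rmtree, str(self.cache_dir), ignore_errors=True
        )

    def cleanup_cache_dir(self) -> None:
        """Remove the cache directory now; calling this again is a no-op."""
        self._finalizer()


config = ConfigWithCache()
cache_path = config.cache_dir
config.cleanup_cache_dir()  # explicit cleanup; the finalizer will not run twice
```

The finalizer callback must not hold a reference to the object itself, which is why only the path string is passed to it.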

Expected behavior

Model evaluation should complete without errors.

Environment

  • pruna version: 0.2.10
  • python version: 3.11
  • Operating System: 5.15.0-1084-aws-x86_64-with-glibc2.31

Labels

bug
