@JewelRoam (Collaborator) commented Jan 15, 2026

PR Category

Feature

Description

Usage:

python3 -m graph_net_bench.torch.eval_backend_diff \
    --model-path str \
    --model-path-list str \
    --reference-config=$(base64 -w 0 <<EOF
{
    "seed": int,
    "compiler": str,
    "device": str,
    "op_lib": str,
    "warmup": int,
    "trials": int,
    "log_prompt": str,
    "model_path_prefix": str,
    "backend_config": dict
}
EOF
) \
    --target-config=$(base64 -w 0 <<EOF
{
    "seed": int,
    "compiler": str,
    "device": str,
    "op_lib": str,
    "warmup": int,
    "trials": int,
    "log_prompt": str,
    "model_path_prefix": str,
    "backend_config": dict
}
EOF
)
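
The two config flags take a base64-encoded JSON object. A minimal sketch of that round trip in Python, for callers that build the configs programmatically instead of with a shell heredoc. The helper names `encode_config` and `decode_config` and all of the values are assumptions made for illustration; only the key names come from the usage text above.

```python
import base64
import json


def encode_config(config: dict) -> str:
    """Serialize a config dict to JSON and base64-encode it on a single line."""
    raw = json.dumps(config).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_config(encoded: str) -> dict:
    """Inverse of encode_config: base64-decode, then parse the JSON object."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


# Hypothetical values; the keys mirror the usage text.
reference_config = {
    "seed": 123,
    "compiler": "nope",
    "device": "cuda",
    "op_lib": "default",
    "warmup": 1,
    "trials": 1,
    "log_prompt": "graph-net-bench-log",
    "model_path_prefix": "/path/to/models",
    "backend_config": {},
}

encoded = encode_config(reference_config)
decoded = decode_config(encoded)
```

The string in `encoded` plays the same role as the output of `base64 -w 0` in the shell form: one unwrapped line that can be passed as a single argument.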

TODO

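This is an initial version that runs end to end; there is still a lot of room for improvement:

  • (done) Improve the semantics of the function names and config parameters
  • (done) Resolve the compiler dynamically from config.compiler instead of registering it statically; op-library loading stays unchanged for now
  • (done) Replace command-line args passing with a JSON config: the configuration that eval_backend_perf needs is specified directly by the calling layer, and eval_backend_diff only passes the data through
  • (done) Split args.config into args.reference_config and args.target_config, each passed unmodified to eval_backend_perf, so that eval_backend_diff is aware of as little information as possible
  • (done) Replace the references to test_compiler in test_device, and add a unit test for test_device
  • Add output-file passing and RPC support to replace the existing test_device
  • ... to be added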
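
One way to read the "resolve the compiler dynamically from config.compiler" item is a name-to-factory registry. The sketch below is an illustration under that assumption, not the code in this PR: `register_backend`, `get_backend`, and `NopeBackend` are hypothetical names, and treating `"nope"` as a pass-through backend is also an assumption.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a compiler name to a factory. The real
# project may resolve backends differently (for example via importlib).
_BACKEND_REGISTRY: Dict[str, Callable[[dict], object]] = {}


def register_backend(name: str):
    """Class decorator that records a backend factory under `name`."""
    def decorator(factory):
        _BACKEND_REGISTRY[name] = factory
        return factory
    return decorator


@register_backend("nope")
class NopeBackend:
    """Assumed pass-through backend: returns the model unchanged."""

    def __init__(self, backend_config: dict):
        self.backend_config = backend_config

    def __call__(self, model):
        return model


def get_backend(config: dict):
    """Look up the backend named by config["compiler"] and instantiate it."""
    name = config["compiler"]
    try:
        factory = _BACKEND_REGISTRY[name]
    except KeyError:
        known = sorted(_BACKEND_REGISTRY)
        raise ValueError(f"unknown compiler {name!r}; known: {known}") from None
    return factory(config.get("backend_config", {}))
```

With this shape the caller only needs the config dict, which matches the goal of keeping eval_backend_diff unaware of backend details.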

paddle-bot (bot) commented Jan 15, 2026

Thanks for your contribution!

test_multi_models(args)


def complete_default_args(
Collaborator

Don't delete this function. Rename it to check_or_complete_args.
It is useful when reading the code, because it tells the reader which parameters exist.

@JewelRoam (Collaborator, Author) commented Jan 16, 2026

It has now been moved into eval_backend_perf and renamed to check_and_complete_args.
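
A minimal sketch of what a function like check_and_complete_args could look like, given the reviewer's point that it documents the available parameters. The default values and the specific checks are assumptions; only the function name and the key names appear in this PR.

```python
import copy

# Hypothetical defaults; the keys mirror the usage text in the PR description.
DEFAULT_CONFIG = {
    "seed": 123,
    "compiler": "nope",
    "device": "cuda",
    "op_lib": None,
    "warmup": 3,
    "trials": 5,
    "log_prompt": "graph-net-bench-log",
    "model_path_prefix": None,
    "backend_config": {},
}


def check_and_complete_args(config: dict) -> dict:
    """Reject unknown keys, fill in missing ones, and validate the counters."""
    unknown = set(config) - set(DEFAULT_CONFIG)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")

    # Copy so that neither the defaults nor the caller's dict is mutated.
    completed = copy.deepcopy(DEFAULT_CONFIG)
    completed.update(copy.deepcopy(config))

    for key in ("warmup", "trials"):
        value = completed[key]
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{key} must be a non-negative int, got {value!r}")
    return completed
```

Keeping every supported key in one table is what makes the function useful as documentation: a reader can see the full parameter list without tracing call sites.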

@JewelRoam changed the title from "Add eval_backend_perf to further refactor previous test_compiler" to "[Feature] Add eval_backend_perf to further refactor previous test_compiler" on Jan 16, 2026
@JewelRoam changed the title from "[Feature] Add eval_backend_perf to further refactor previous test_compiler" to "[Feature] Add eval_backend_perf to replace test_compiler" on Jan 16, 2026
@JewelRoam changed the title from "[Feature] Add eval_backend_perf to replace test_compiler" to "[Feature] Add eval_backend_diff & eval_backend_perf to replace test_compiler & test_device" on Jan 16, 2026
@JewelRoam changed the title from "[Feature] Add eval_backend_diff & eval_backend_perf to replace test_compiler & test_device" to "[Feature] Add eval_backend_diff & eval_backend_perf to replace test_compiler / test_device" on Jan 16, 2026
@JewelRoam (Collaborator, Author) commented Jan 20, 2026

[image]

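When abstracting out local_runner, process_runner, and remote_runner, the single-responsibility principle requires keeping the config given to eval_backend_perf separate from the config given to the runner. That is, runner-specific entries (for example host and port) are added on top; the runner is unaware of the remaining entries and passes them straight through to eval_backend_perf.
This leads to a design decision: one of the following two input formats has to be chosen.

  1. Extend the current design further, with parameters as follows (the runner entries have to be separated out using hasattr; the original format that does not specify runner_type is also easy to keep compatible):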
python3 -m graph_net_bench.torch.eval_backend_diff \
    --model-path-list $model_list \
    --reference-config $(base64 -w 0 <<EOF
{
    "runner_type": "remote",
    "machine": "$REMOTE_MACHINE",
    "port": $REMOTE_PORT,
    "compiler": "nope",
    "device": "cuda",
    "warmup": 1,
    "trials": 1,
    "model_path_prefix": "$AI4C_ROOT"
}
EOF
) \
    --target-config ……
  2. The alternative design is as follows (namespace isolation; the benefit is that it is explicit and clearer, but it makes the nesting deeper):
python3 -m graph_net_bench.torch.eval_backend_diff \
    --model-path-list $model_list \
    --reference-config $(base64 -w 0 <<EOF
{
    "execution": {"compiler": "nope", "device": "cuda", ……},
    "strategy": {"runner_type": "remote", "machine": "192.168.1.100", ……}
}
EOF
) \
    --target-config ……
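
To make the trade-off concrete, here is a sketch of how a runner might separate its own entries from the entries destined for eval_backend_perf under each format. It operates on the decoded JSON dict with key lookups rather than on an argument object with hasattr. The function names, the `RUNNER_KEYS` tuple, and the `"local"` fallback for configs without `runner_type` are assumptions for illustration.

```python
from typing import Tuple

# Assumed set of runner-only keys, taken from the first example above.
RUNNER_KEYS = ("runner_type", "machine", "port")


def split_flat_config(config: dict) -> Tuple[dict, dict]:
    """Option 1: one flat dict; runner keys are picked out by name."""
    runner = {k: config[k] for k in RUNNER_KEYS if k in config}
    # Assumed fallback so that older configs without runner_type still work.
    runner.setdefault("runner_type", "local")
    execution = {k: v for k, v in config.items() if k not in RUNNER_KEYS}
    return runner, execution


def split_namespaced_config(config: dict) -> Tuple[dict, dict]:
    """Option 2: the two groups already live under separate top-level keys."""
    runner = dict(config.get("strategy", {}))
    runner.setdefault("runner_type", "local")
    execution = dict(config["execution"])
    return runner, execution


flat = {
    "runner_type": "remote",
    "machine": "192.168.1.100",
    "port": 8000,  # hypothetical port
    "compiler": "nope",
    "device": "cuda",
}
namespaced = {
    "execution": {"compiler": "nope", "device": "cuda"},
    "strategy": {"runner_type": "remote", "machine": "192.168.1.100", "port": 8000},
}

flat_result = split_flat_config(flat)
namespaced_result = split_namespaced_config(namespaced)
```

Both calls produce the same pair of dicts. The difference is where the knowledge lives: option 1 needs a maintained list of runner keys in code, while option 2 encodes the split in the input itself at the cost of one extra nesting level.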
