Training error when using w2v_bert2.0 as the encoder #37

@liushenme

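When training with w2v_bert2.0 as the encoder, an index out-of-bounds error is raised after about 300 steps. Is this caused by the length of the training audio?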

【2025-07-31 17:12:53】[2025-07-31 17:12:53,090] [INFO] [logging.py:107:log_dist] [Rank 0] step=100, skipped=0, lr=[6.666666666666668e-05], mom=[(0.9, 0.999)]
【2025-07-31 17:12:53】[2025-07-31 17:12:53,092] [INFO] [timer.py:264:stop] epoch=0/micro_step=200/global_step=100, RunningAvgSamplesPerSec=59.76838046856318, CurrSamplesPerSec=58.883498797084, MemAllocated=16.48GB, MaxMemAllocated=21.82GB
【2025-07-31 17:12:53】INFO:root:Training Epoch: 1/2, step 200 lr 6.666666666666668e-05 completed (loss: 7.358315467834473, acc: 0.17391304671764374)
【2025-07-31 17:13:47】INFO:root:Training Epoch: 1/2, step 300 lr 7.253637530185604e-05 completed (loss: 9.75600528717041, acc: 0.06666667014360428)
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [9,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [10,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [11,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [12,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [13,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [14,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [15,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [16,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [17,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [18,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [19,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [20,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [21,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [22,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [23,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [24,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [26,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [27,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [28,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
......
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1553: indexSelectLargeIndex: block: [34,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
【2025-07-31 17:14:39】[rank2]:[E731 17:14:39.353937124 ProcessGroupNCCL.cpp:1899] [PG ID 1 PG GUID 1 Rank 2] Process group watchdog thread terminated with exception: CUDA error: device-side assert triggered
【2025-07-31 17:14:39】Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
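
Editorial note: the assertion `srcIndex < srcSelectDimSize` in `indexSelectLargeIndex` is raised when an index used in an index-select or embedding lookup is greater than or equal to the size of the table being indexed. Whether that index comes from token IDs, labels, or length-dependent positions cannot be determined from this log alone. A minimal sketch of a CPU-side range check follows; the function name `find_out_of_range` and the sample values are hypothetical and are not part of the project's code. In a training script the same check could be applied to `tensor.flatten().tolist()` just before the lookup, and running with the environment variable `CUDA_LAUNCH_BLOCKING=1` makes the Python traceback point at the operation that failed.

```python
from typing import Iterable, List, Tuple


def find_out_of_range(indices: Iterable[int], table_size: int) -> List[Tuple[int, int]]:
    """Return (position, value) pairs for every index outside [0, table_size)."""
    return [
        (pos, idx)
        for pos, idx in enumerate(indices)
        if not 0 <= idx < table_size
    ]


# Hypothetical values for illustration only: a lookup table with 8 rows and a
# flattened batch of indices, one of which (9) is too large for the table.
bad = find_out_of_range([0, 3, 7, 9, 2], table_size=8)
print(bad)  # [(3, 9)]
```

If the check reports offending values only for batches that contain long utterances, that would point toward a length-related cause; if it reports them regardless of length, the vocabulary or label mapping is a more likely place to look.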
