Description
Checkpoint provides 80 classes but module expects 2. Reinitializing denoising class embed.
Training for 3000 steps...
Logging every 100 steps.
Validating every 1000 steps.
Saving checkpoints every 1000 steps.
Train Step 1/3000 | Train Loss: 30.2364
Train Step 100/3000 | Train Loss: 30.6223
Train Step 200/3000 | Train Loss: 24.7357
Train Step 300/3000 | Train Loss: 26.9152
Train Step 400/3000 | Train Loss: 30.5777
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1251, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/multiprocessing/queues.py", line 113, in get
    if not self._poll(timeout):
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/multiprocessing/connection.py", line 440, in _poll
    r = wait([self], timeout)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/multiprocessing/connection.py", line 948, in wait
    ready = selector.select(timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/utils/data/_utils/signal_handling.py", line 73, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 1055) is killed by signal: Killed.
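For context, a DataLoader worker being killed by SIGKILL like this usually means the host (or container) ran out of memory and the OS OOM killer terminated the worker process. Common workarounds are lowering `num_workers`, disabling memory pinning, or reducing the batch size. Below is a minimal sketch of that kind of change, not the actual training script from this run; the dataset and batch size are placeholders.

```python
# Hypothetical sketch: reduce DataLoader memory pressure when workers are
# killed by the OOM killer. The dataset and batch size below are
# placeholders, not the values used in the run above.
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for the real training dataset.
dataset = TensorDataset(
    torch.randn(1024, 3, 64, 64),
    torch.randint(0, 2, (1024,)),
)

loader = DataLoader(
    dataset,
    batch_size=4,       # smaller batches reduce per-worker memory
    num_workers=0,      # 0 loads data in the main process; try 1-2 before going higher
    pin_memory=False,   # pinned memory adds host RAM pressure
)

for images, labels in loader:
    pass  # training step would go here
```

If the job runs inside Docker, increasing the container's shared memory (`--shm-size`) can also help, since DataLoader workers pass batches through shared memory.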