Labels: bug (Something isn't working)
Description
Describe the issue:
I am using dask_jobqueue.SLURMCluster in my workflows, and I run about 15 of them concurrently. Occasionally two separate workflows appear to race to create the same output log directory, and one of them fails with:
  File "/datasets/work/jones-storage/work/miniconda/miniforge3/envs/flint_main/lib/python3.12/site-packages/dask_jobqueue/core.py", line 661, in __init__
    self._dummy_job # trigger property to ensure that the job is valid
    ^^^^^^^^^^^^^^^
  File "/datasets/work/jones-storage/work/miniconda/miniforge3/envs/flint_main/lib/python3.12/site-packages/dask_jobqueue/core.py", line 690, in _dummy_job
    return self.job_cls(
           ^^^^^^^^^^^^^
  File "/datasets/work/jones-storage/work/miniconda/miniforge3/envs/flint_main/lib/python3.12/site-packages/dask_jobqueue/slurm.py", line 37, in __init__
    super().__init__(
  File "/datasets/work/jones-storage/work/miniconda/miniforge3/envs/flint_main/lib/python3.12/site-packages/dask_jobqueue/core.py", line 375, in __init__
    os.makedirs(self.log_directory)
  File "<frozen os>", line 225, in makedirs
FileExistsError: [Errno 17] File exists: 'flint_logs'
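For reference, [Errno 17] is EEXIST: plain os.makedirs raises FileExistsError whenever the target directory already exists, while passing exist_ok=True makes the second call a no-op. A minimal sketch (the helper name makedirs_twice is hypothetical, used only for illustration):

```python
import errno
import os
import tempfile

def makedirs_twice(path, exist_ok):
    """Hypothetical helper: call os.makedirs twice; return the errno of any failure."""
    os.makedirs(path)  # first caller creates the directory
    try:
        os.makedirs(path, exist_ok=exist_ok)  # second caller finds it already there
    except FileExistsError as exc:
        return exc.errno
    return None

with tempfile.TemporaryDirectory() as base:
    print(makedirs_twice(os.path.join(base, "a"), exist_ok=False) == errno.EEXIST)  # True
    print(makedirs_twice(os.path.join(base, "b"), exist_ok=True))  # None
```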
Minimal Complete Verifiable Example:
Given the race condition involved, I am not able to give a concise, single-process example. My guess is that both workflows get past the check referenced here:
dask-jobqueue/dask_jobqueue/core.py, line 379 in d562e6c:
    if worker_extra_args is not None:
and that only one of them then succeeds in creating the directory.
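The suspected interleaving can be forced deterministically in a single process. This is a minimal sketch that assumes the directory creation is guarded by an existence check before os.makedirs is called; check_then_create and simulate_race are hypothetical helpers, not part of dask-jobqueue.

```python
import os
import tempfile

def check_then_create(path):
    """Hypothetical helper: run the existence check now, defer the creation step."""
    missing = not os.path.exists(path)  # step 1: check

    def create():
        if missing:  # step 2: act on the (possibly stale) result of step 1
            os.makedirs(path)

    return create

def simulate_race(path):
    # Both workflows finish the check before either one creates the directory.
    create_first = check_then_create(path)
    create_second = check_then_create(path)
    create_first()  # first workflow creates the directory
    try:
        create_second()  # second workflow still believes it is missing
    except FileExistsError:
        return "second workflow: FileExistsError"
    return "no error"

with tempfile.TemporaryDirectory() as base:
    print(simulate_race(os.path.join(base, "flint_logs")))
```

The check and the creation are two separate filesystem operations, so any caller that acts between them sees a stale answer; no amount of checking first closes that window.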
Would it make sense to use pathlib.Path here? Something like:
from pathlib import Path

if self.log_directory:
    log_path = Path(self.log_directory)
    log_path.mkdir(exist_ok=True, parents=True)
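A rough way to check the proposed fix under contention is sketched below. It assumes threads in one process are a reasonable stand-in for separate workflows, and make_log_dir and create_concurrently are hypothetical names; 15 callers mirrors the ~15 concurrent workflows described above.

```python
import tempfile
import threading
from pathlib import Path

def make_log_dir(log_directory):
    # Proposed fix: creation succeeds even if another caller created it first
    if log_directory:
        Path(log_directory).mkdir(parents=True, exist_ok=True)

def create_concurrently(log_directory, n_callers=15):
    """Start n_callers threads that all try to create the same directory at once."""
    barrier = threading.Barrier(n_callers)
    errors = []

    def caller():
        barrier.wait()  # release all threads together to maximise contention
        try:
            make_log_dir(log_directory)
        except OSError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=caller) for _ in range(n_callers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors

with tempfile.TemporaryDirectory() as base:
    target = Path(base) / "nested" / "flint_logs"
    print(create_concurrently(target), target.is_dir())  # [] True
```

Without pathlib, os.makedirs(self.log_directory, exist_ok=True) gives the same behaviour. In both cases FileExistsError is still raised if the path exists but is not a directory, which is arguably the right outcome.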
Anything else we need to know?:
Environment:
- Dask version:
- Python version: Python 3.12.8
- Operating System: SLES 15.5
- Install method (conda, pip, source): pip
- dask_jobqueue: 0.9.0