Potential for SLURMCluster log directory creation failing #692

@tjgalvin

Describe the issue:

I am using dask_jobqueue.SLURMCluster in my workflows and run about 15 workflows at a time. Occasionally two separate workflows seem to try to create the output log directory at the same moment, and one of them fails:

  File "/datasets/work/jones-storage/work/miniconda/miniforge3/envs/flint_main/lib/python3.12/site-packages/dask_jobqueue/core.py", line 661, in __init__
    self._dummy_job  # trigger property to ensure that the job is valid
    ^^^^^^^^^^^^^^^
  File "/datasets/work/jones-storage/work/miniconda/miniforge3/envs/flint_main/lib/python3.12/site-packages/dask_jobqueue/core.py", line 690, in _dummy_job
    return self.job_cls(
           ^^^^^^^^^^^^^
  File "/datasets/work/jones-storage/work/miniconda/miniforge3/envs/flint_main/lib/python3.12/site-packages/dask_jobqueue/slurm.py", line 37, in __init__
    super().__init__(
  File "/datasets/work/jones-storage/work/miniconda/miniforge3/envs/flint_main/lib/python3.12/site-packages/dask_jobqueue/core.py", line 375, in __init__
    os.makedirs(self.log_directory)
  File "<frozen os>", line 225, in makedirs
FileExistsError: [Errno 17] File exists: 'flint_logs'

Minimal Complete Verifiable Example:

Because this is a race condition, I cannot give a concise, reproducible example. My guess is that both workflows get past:

if worker_extra_args is not None:

and subsequently only one is successfully able to make the directory.
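The suspected interleaving can be replayed deterministically in a single process. This is only a sketch: it assumes the directory creation is guarded by an existence check (the traceback shows only the os.makedirs call, so the check is an assumption), and simulate_check_then_create_race is a hypothetical helper, not part of dask_jobqueue.

```python
import os
import tempfile


def simulate_check_then_create_race(log_directory):
    """Replay two workflows that both check before either one creates."""
    # Step 1: both workflows run the existence check while the directory is missing.
    a_sees_missing = not os.path.exists(log_directory)
    b_sees_missing = not os.path.exists(log_directory)

    # Step 2: each workflow acts on its (now possibly stale) check result.
    outcomes = []
    for name, sees_missing in (("A", a_sees_missing), ("B", b_sees_missing)):
        if not sees_missing:
            outcomes.append((name, "skipped"))
            continue
        try:
            os.makedirs(log_directory)  # no exist_ok, as in the traceback
            outcomes.append((name, "created"))
        except FileExistsError:
            outcomes.append((name, "FileExistsError"))
    return outcomes


with tempfile.TemporaryDirectory() as tmp:
    outcomes = simulate_check_then_create_race(os.path.join(tmp, "flint_logs"))
    print(outcomes)  # [('A', 'created'), ('B', 'FileExistsError')]
```

Workflow B fails even though its own check said the directory was missing, which matches the FileExistsError in the traceback above.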

Would it make sense to use pathlib.Path here? Something like

from pathlib import Path

if self.log_directory:
    log_path = Path(self.log_directory)
    # exist_ok avoids FileExistsError if another workflow created it first
    log_path.mkdir(exist_ok=True, parents=True)
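As a rough check that this approach tolerates concurrent callers, the sketch below runs the same creation call from many threads at once. The name ensure_log_directory is hypothetical, and threads here only stand in for the separate workflows described in the issue; it is not a test of dask_jobqueue itself.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def ensure_log_directory(log_directory):
    """Create log_directory if needed, tolerating a concurrent creator."""
    log_path = Path(log_directory)
    # parents=True builds missing parent directories;
    # exist_ok=True suppresses FileExistsError when the directory already exists.
    log_path.mkdir(parents=True, exist_ok=True)
    return log_path


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "nested", "flint_logs")
    # 32 calls across 16 threads stand in for concurrent workflows.
    with ThreadPoolExecutor(max_workers=16) as pool:
        results = list(pool.map(ensure_log_directory, [target] * 32))
    all_ok = all(path.is_dir() for path in results)
    print(all_ok)
```

The standard-library call os.makedirs(log_directory, exist_ok=True) behaves the same way and would be a smaller change to the existing line. In both cases a FileExistsError is still raised if the path exists but is not a directory.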

Anything else we need to know?:

Environment:

  • Dask version:
  • Python version: Python 3.12.8
  • Operating System: SLES 15.5
  • Install method (conda, pip, source): pip
  • dask_jobqueue: 0.9.0


Labels: bug (Something isn't working)
