What are you trying to do?
I am a new user to SubstraFL and am currently going through the example at https://docs.substra.org/en/stable/examples/substrafl/get_started/run_mnist_torch.html.
Issue Description (what is happening?)
The notebook failed at the following cell with an OSError.
from substrafl.experiment import execute_experiment
import logging
import substrafl
substrafl.set_logging_level(loglevel=logging.ERROR)
# A round is defined by a local training step followed by an aggregation operation
NUM_ROUNDS = 3
compute_plan = execute_experiment(
client=clients[ALGO_ORG_ID],
strategy=strategy,
train_data_nodes=train_data_nodes,
evaluation_strategy=my_eval_strategy,
aggregation_node=aggregation_node,
num_rounds=NUM_ROUNDS,
experiment_folder=str(pathlib.Path.cwd() / "tmp" / "experiment_summaries"),
dependencies=dependencies,
clean_models=False,
name="MNIST documentation example",
)
Expected Behavior (what should happen?)
The tutorial should run to completion without raising this error.
Reproducible Example
No response
Operating system
Ubuntu 20.04
Python version
3.11.9
Installed Substra versions
substra==0.53.0
substrafl==0.46.0
substratools==0.21.4
Installed versions of dependencies
# packages in environment at /mnt/batch/tasks/shared/LS_root/mounts/clusters/hpang8/code/Users/hpang/conda_envs/substrafl_env:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
annotated-types 0.7.0 pypi_0 pypi
anyio 4.2.0 py311h06a4308_0
argon2-cffi 21.3.0 pyhd3eb1b0_0
argon2-cffi-bindings 21.2.0 py311h5eee18b_0
asttokens 2.0.5 pyhd3eb1b0_0
async-lru 2.0.4 py311h06a4308_0
attrs 23.1.0 py311h06a4308_0
babel 2.11.0 py311h06a4308_0
beautifulsoup4 4.12.3 py311h06a4308_0
bleach 4.1.0 pyhd3eb1b0_0
brotli-python 1.0.9 py311h6a678d5_8
build 1.2.1 pypi_0 pypi
bzip2 1.0.8 h5eee18b_6
ca-certificates 2024.7.2 h06a4308_0
certifi 2024.7.4 py311h06a4308_0
cffi 1.16.0 py311h5eee18b_1
charset-normalizer 3.3.2 pyhd3eb1b0_0
click 8.1.7 pypi_0 pypi
cloudpickle 3.0.0 pypi_0 pypi
cmake 3.30.1 pypi_0 pypi
comm 0.2.1 py311h06a4308_0
contourpy 1.2.1 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
debugpy 1.6.7 py311h6a678d5_0
decorator 5.1.1 pyhd3eb1b0_0
defusedxml 0.7.1 pyhd3eb1b0_0
docker 7.1.0 pypi_0 pypi
executing 0.8.3 pyhd3eb1b0_0
expat 2.6.2 h6a678d5_0
filelock 3.15.4 pypi_0 pypi
fonttools 4.53.1 pypi_0 pypi
idna 3.7 py311h06a4308_0
ipykernel 6.28.0 py311h06a4308_0
ipython 8.25.0 py311h06a4308_0
jedi 0.19.1 py311h06a4308_0
jinja2 3.1.4 py311h06a4308_0
joblib 1.4.2 pypi_0 pypi
json5 0.9.6 pyhd3eb1b0_0
jsonschema 4.19.2 py311h06a4308_0
jsonschema-specifications 2023.7.1 py311h06a4308_0
jupyter-lsp 2.2.0 py311h06a4308_0
jupyter_client 8.6.0 py311h06a4308_0
jupyter_core 5.7.2 py311h06a4308_0
jupyter_events 0.10.0 py311h06a4308_0
jupyter_server 2.14.1 py311h06a4308_0
jupyter_server_terminals 0.4.4 py311h06a4308_1
jupyterlab 4.0.11 py311h06a4308_0
jupyterlab_pygments 0.1.2 py_0
jupyterlab_server 2.25.1 py311h06a4308_0
kiwisolver 1.4.5 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.4.4 h6a678d5_1
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libsodium 1.0.18 h7b6447c_0
libstdcxx-ng 11.2.0 h1234567_1
libuuid 1.41.5 h5eee18b_0
lit 18.1.8 pypi_0 pypi
markupsafe 2.1.5 pypi_0 pypi
matplotlib 3.6.3 pypi_0 pypi
matplotlib-inline 0.1.6 py311h06a4308_0
mistune 2.0.4 py311h06a4308_0
mpmath 1.3.0 pypi_0 pypi
nbclient 0.8.0 py311h06a4308_0
nbconvert 7.10.0 py311h06a4308_0
nbformat 5.9.2 py311h06a4308_0
ncurses 6.4 h6a678d5_0
nest-asyncio 1.6.0 py311h06a4308_0
networkx 3.3 pypi_0 pypi
notebook 7.0.8 py311h06a4308_2
notebook-shim 0.2.3 py311h06a4308_0
numpy 1.24.3 pypi_0 pypi
nvidia-cublas-cu11 11.10.3.66 pypi_0 pypi
nvidia-cuda-cupti-cu11 11.7.101 pypi_0 pypi
nvidia-cuda-nvrtc-cu11 11.7.99 pypi_0 pypi
nvidia-cuda-runtime-cu11 11.7.99 pypi_0 pypi
nvidia-cudnn-cu11 8.5.0.96 pypi_0 pypi
nvidia-cufft-cu11 10.9.0.58 pypi_0 pypi
nvidia-curand-cu11 10.2.10.91 pypi_0 pypi
nvidia-cusolver-cu11 11.4.0.1 pypi_0 pypi
nvidia-cusparse-cu11 11.7.4.91 pypi_0 pypi
nvidia-nccl-cu11 2.14.3 pypi_0 pypi
nvidia-nvtx-cu11 11.7.91 pypi_0 pypi
openssl 3.0.14 h5eee18b_0
overrides 7.4.0 py311h06a4308_0
packaging 24.1 py311h06a4308_0
pandas 1.5.3 pypi_0 pypi
pandocfilters 1.5.0 pyhd3eb1b0_0
parso 0.8.3 pyhd3eb1b0_0
pexpect 4.8.0 pyhd3eb1b0_3
pillow 10.4.0 pypi_0 pypi
pip 24.0 py311h06a4308_0
pip-tools 7.4.1 pypi_0 pypi
platformdirs 3.10.0 py311h06a4308_0
prometheus_client 0.14.1 py311h06a4308_0
prompt-toolkit 3.0.43 py311h06a4308_0
prompt_toolkit 3.0.43 hd3eb1b0_0
psutil 5.9.0 py311h5eee18b_0
ptyprocess 0.7.0 pyhd3eb1b0_2
pure_eval 0.2.2 pyhd3eb1b0_0
pycparser 2.21 pyhd3eb1b0_0
pydantic 2.8.2 pypi_0 pypi
pydantic-core 2.20.1 pypi_0 pypi
pygments 2.15.1 py311h06a4308_1
pyparsing 3.1.2 pypi_0 pypi
pyproject-hooks 1.1.0 pypi_0 pypi
pysocks 1.7.1 py311h06a4308_0
python 3.11.9 h955ad1f_0
python-dateutil 2.9.0post0 py311h06a4308_2
python-fastjsonschema 2.16.2 py311h06a4308_0
python-json-logger 2.0.7 py311h06a4308_0
python-slugify 8.0.4 pypi_0 pypi
pytz 2024.1 py311h06a4308_0
pyyaml 6.0.1 py311h5eee18b_0
pyzmq 25.1.2 py311h6a678d5_0
readline 8.2 h5eee18b_0
referencing 0.30.2 py311h06a4308_0
requests 2.31.0 pypi_0 pypi
rfc3339-validator 0.1.4 py311h06a4308_0
rfc3986-validator 0.1.1 py311h06a4308_0
rpds-py 0.10.6 py311hb02cf49_0
scikit-learn 1.3.1 pypi_0 pypi
scipy 1.14.0 pypi_0 pypi
send2trash 1.8.2 py311h06a4308_0
setuptools 69.5.1 py311h06a4308_0
six 1.16.0 pyhd3eb1b0_1
sniffio 1.3.0 py311h06a4308_0
soupsieve 2.5 py311h06a4308_0
sqlite 3.45.3 h5eee18b_0
stack_data 0.2.0 pyhd3eb1b0_0
substra 0.53.0 pypi_0 pypi
substrafl 0.46.0 pypi_0 pypi
substratools 0.21.4 pypi_0 pypi
sympy 1.13.1 pypi_0 pypi
terminado 0.17.1 py311h06a4308_0
text-unidecode 1.3 pypi_0 pypi
threadpoolctl 3.5.0 pypi_0 pypi
tinycss2 1.2.1 py311h06a4308_0
tk 8.6.14 h39e8969_0
torch 2.0.1 pypi_0 pypi
torchvision 0.15.2 pypi_0 pypi
tornado 6.4.1 py311h5eee18b_0
tqdm 4.66.4 pypi_0 pypi
traitlets 5.14.3 py311h06a4308_0
triton 2.0.0 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
typing_extensions 4.11.0 py311h06a4308_0
tzdata 2024a h04d1e81_0
urllib3 2.2.2 py311h06a4308_0
wcwidth 0.2.5 pyhd3eb1b0_0
webencodings 0.5.1 py311h06a4308_1
websocket-client 1.8.0 py311h06a4308_0
wheel 0.43.0 py311h06a4308_0
xz 5.4.6 h5eee18b_1
yaml 0.2.5 h7b6447c_0
zeromq 4.3.5 h6a678d5_0
zlib 1.2.13 h5eee18b_1
Logs / Stacktrace
Rounds progress: 100%|██████████| 3/3 [00:00<00:00, 1050.24it/s]
Compute plan progress: 10%|▉ | 2/21 [02:35<24:34, 77.61s/it]
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
Cell In[14], line 9
6 # A round is defined by a local training step followed by an aggregation operation
7 NUM_ROUNDS = 3
----> 9 compute_plan = execute_experiment(
10 client=clients[ALGO_ORG_ID],
11 strategy=strategy,
12 train_data_nodes=train_data_nodes,
13 evaluation_strategy=my_eval_strategy,
14 aggregation_node=aggregation_node,
15 num_rounds=NUM_ROUNDS,
16 experiment_folder=str(pathlib.Path.cwd() / "tmp" / "experiment_summaries"),
17 dependencies=dependencies,
18 clean_models=False,
19 name="MNIST documentation example",
20 )
File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substrafl/experiment.py:498, in execute_experiment(client, strategy, train_data_nodes, experiment_folder, num_rounds, aggregation_node, evaluation_strategy, dependencies, clean_models, name, additional_metadata, task_submission_batch_size)
485 # save the experiment summary in experiment_folder
486 _save_experiment_summary(
487 experiment_folder=experiment_folder,
488 compute_plan_key=compute_plan_key,
(...)
496 additional_metadata=additional_metadata,
497 )
--> 498 compute_plan = client.add_compute_plan(
499 substra.sdk.schemas.ComputePlanSpec(
500 key=compute_plan_key,
501 tasks=tasks,
502 name=name or timestamp,
503 metadata=cp_metadata,
504 ),
505 auto_batching=True,
506 batch_size=task_submission_batch_size,
507 )
508 logger.info(("The compute plan has been registered to Substra, its key is {0}.").format(compute_plan.key))
509 return compute_plan
File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/client.py:48, in logit.<locals>.wrapper(*args, **kwargs)
46 error = None
47 try:
---> 48 return f(*args, **kwargs)
49 except Exception as e:
50 error = e.__class__.__name__
File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/client.py:548, in Client.add_compute_plan(self, data, auto_batching, batch_size)
542 if not is_valid_uuid(spec.key):
543 raise exceptions.ComputePlanKeyFormatError(
544 "The compute plan key has to respect the UUID format. You can use the uuid library to generate it. \
545 Example: compute_plan_key=str(uuid.uuid4())"
546 )
--> 548 return self._backend.add(spec, spec_options=spec_options)
File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/backend.py:487, in Local.add(self, spec, spec_options, key)
485 else:
486 if spec.__class__.type_ == schemas.Type.ComputePlan:
--> 487 compute_plan = add_asset(spec, spec_options)
488 return compute_plan
489 else:
File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/backend.py:406, in Local._add_compute_plan(self, spec, spec_options)
403 compute_plan = self._db.add(compute_plan)
405 # go through the tasks sorted by rank
--> 406 compute_plan = self.__execute_compute_plan(spec, compute_plan, visited, tasks, spec_options)
407 return compute_plan
File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/backend.py:269, in Local.__execute_compute_plan(self, spec, compute_plan, visited, tasks, spec_options)
266 if not task_spec:
267 continue
--> 269 self.add(
270 key=task_spec.key,
271 spec=task_spec,
272 spec_options=spec_options,
273 )
275 progress_bar.update()
277 return compute_plan
File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/backend.py:491, in Local.add(self, spec, spec_options, key)
489 else:
490 key = key or spec.compute_key()
--> 491 add_asset(key, spec, spec_options)
492 return key
File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/backend.py:437, in Local._add_task(self, key, spec, spec_options)
420 task = models.Task(
421 key=key,
422 creation_date=self.__now(),
(...)
433 metadata=spec.metadata if spec.metadata else dict(),
434 )
436 task = self._db.add(task)
--> 437 self._worker.schedule_task(task)
438 return task
File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/compute/worker.py:313, in Worker.schedule_task(self, task)
310 elif asset_type == schemas.Type.Dataset:
311 dataset = self._db.get_with_files(schemas.Type.Dataset, task_input.asset_key)
312 cmd_line_inputs.append(
--> 313 self._prepare_dataset_input(
314 dataset=dataset,
315 task_input=task_input,
316 input_volume=volumes[VOLUME_INPUTS],
317 multiple=multiple,
318 )
319 )
320 addable_asset = dataset
322 if addable_asset:
File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/compute/worker.py:161, in Worker._prepare_dataset_input(self, dataset, task_input, input_volume, multiple)
157 def _prepare_dataset_input(
158 self, dataset: models.Dataset, task_input: models.InputRef, input_volume: str, multiple: bool
159 ):
160 path_to_opener = input_volume / Filenames.OPENER.value
--> 161 Path(dataset.opener.storage_address).link_to(path_to_opener)
162 return TaskResource(
163 id=task_input.identifier,
164 value=f"{TPL_VOLUME_INPUTS}/{Filenames.OPENER.value}",
165 multiple=multiple,
166 )
File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/pathlib.py:1226, in Path.link_to(self, target)
1211 """
1212 Make the target path a hard link pointing to this path.
1213
(...)
1220 Use `hardlink_to()` instead.
1221 """
1222 warnings.warn("pathlib.Path.link_to() is deprecated and is scheduled "
1223 "for removal in Python 3.12. "
1224 "Use pathlib.Path.hardlink_to() instead.",
1225 DeprecationWarning, stacklevel=2)
-> 1226 self.__class__(target).hardlink_to(self)
File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/pathlib.py:1208, in Path.hardlink_to(self, target)
1206 if not hasattr(os, "link"):
1207 raise NotImplementedError("os.link() not available on this system")
-> 1208 os.link(target, self)
OSError: [Errno 95] Operation not supported: '/mnt/batch/tasks/shared/LS_root/mounts/clusters/hpang8/code/Users/hpang/Projects/Federated_learning/substrafl/local-worker/yumnknd_/61c0f7fa-5228-4804-9d24-8beac24bfbc2/mnist_opener.py' -> '/mnt/batch/tasks/shared/LS_root/mounts/clusters/hpang8/code/Users/hpang/Projects/Federated_learning/substrafl/local-worker/d18aa0b7-4aaf-4a4d-9e87-ebead4d168f9/inputs/opener.py'
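For context, the failure happens when the local backend hard-links the opener into the task input folder (pathlib.Path.link_to, which calls os.link). The /mnt/batch/... path in the error suggests an Azure ML mounted file share, and such mounts commonly reject hard links with Errno 95 (Operation not supported). Below is a minimal diagnostic sketch to confirm whether the filesystem backing the working directory supports hard links; supports_hardlinks is a hypothetical helper written only for this check, not part of Substra.

import os
import tempfile
from pathlib import Path

def supports_hardlinks(directory: Path) -> bool:
    """Return True if the filesystem backing `directory` allows os.link()."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        src = Path(tmp) / "src"
        dst = Path(tmp) / "dst"
        src.touch()
        try:
            # Same call that fails in worker._prepare_dataset_input above.
            os.link(src, dst)
            return True
        except OSError:
            return False

# Run from the notebook's working directory, where the local-worker folder is created.
print(supports_hardlinks(Path.cwd()))

If this prints False, one possible workaround (under the assumption that the mount is the culprit) is to run the example from a directory on a local disk, e.g. under /tmp, so that the local-worker folder lives on a filesystem that supports hard links.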