Closed
Labels
CI — To do with continuous integration.
bug — Something isn't working.
hpcflow — Issue with the upstream hpcflow package.
Description
GitHub applies rate limiting to its API to stop egregious scraping. Fine, except that it causes spontaneous CI failures for us. For example (and pretty typical), from the log of #502 (copied here so it persists):
================================== FAILURES ===================================
________________________ test_load_case_from_npz_file _________________________
@pytest.mark.xfail(
condition=sys.platform == "darwin",
raises=requests.exceptions.HTTPError,
reason=(
"GHA MacOS runners use the same IP address, so we get rate limited when "
"retrieving demo data from GitHub."
),
)
def test_load_case_from_npz_file():
> npz_file_path = mf.get_demo_data_file_path("load_cases.npz")
matflow\tests\param_classes\test_load.py:40:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.venv\lib\site-packages\hpcflow\sdk\app.py:4530: in get_demo_data_file_path
return self.get_data_file_path("data", file_key=file_name)
.venv\lib\site-packages\hpcflow\sdk\app.py:4505: in get_data_file_path
cached_path = self.cache_file(data_type, file_key, manifest)
.venv\lib\site-packages\hpcflow\sdk\app.py:4374: in cache_file
fs, url_path = rate_limit_safe_url_to_fs(self, url, logger=self.logger)
.venv\lib\site-packages\hpcflow\sdk\app.py:364: in rate_limit_safe_url_to_fs
return _inner(*args, **kwargs)
.venv\lib\site-packages\decorator.py:235: in fun
return caller(func, *(extras + args), **kw)
.venv\lib\site-packages\reretry\api.py:161: in retry_decorator
return retry_call(
.venv\lib\site-packages\reretry\api.py:218: in retry_call
return func(
.venv\lib\site-packages\reretry\api.py:31: in __retry_internal
return f()
.venv\lib\site-packages\hpcflow\sdk\app.py:362: in _inner
return url_to_fs(*args, **kwargs)
.venv\lib\site-packages\fsspec\core.py:415: in url_to_fs
fs = filesystem(protocol, **inkwargs)
.venv\lib\site-packages\fsspec\registry.py:322: in filesystem
return cls(**storage_options)
.venv\lib\site-packages\fsspec\spec.py:84: in __call__
obj = super().__call__(*args, **kwargs, **strip_tokenize_options)
.venv\lib\site-packages\fsspec\implementations\github.py:66: in __init__
self.ls("")
.venv\lib\site-packages\fsspec\implementations\github.py:175: in ls
r.raise_for_status()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Response [403]>
def raise_for_status(self):
"""Raises :class:`HTTPError`, if one occurred."""
http_error_msg = ""
if isinstance(self.reason, bytes):
# We attempt to decode utf-8 first because some servers
# choose to localize their reason strings. If the string
# isn't utf-8, we fall back to iso-8859-1 for all other
# encodings. (See PR #3538)
try:
reason = self.reason.decode("utf-8")
except UnicodeDecodeError:
reason = self.reason.decode("iso-8859-1")
else:
reason = self.reason
if 400 <= self.status_code < 500:
http_error_msg = (
f"{self.status_code} Client Error: {reason} for url: {self.url}"
)
elif 500 <= self.status_code < 600:
http_error_msg = (
f"{self.status_code} Server Error: {reason} for url: {self.url}"
)
if http_error_msg:
> raise HTTPError(http_error_msg, response=self)
E requests.exceptions.HTTPError: 403 Client Error: rate limit exceeded for url: https://api.github.com/repos/hpcflow/matflow-data/git/trees/main
.venv\lib\site-packages\requests\models.py:1026: HTTPError
---------------------------- Captured stderr call -----------------------------
WARNING matflow: 403 Client Error: rate limit exceeded for url: https://api.github.com/repos/hpcflow/matflow-data/git/trees/main, attempt 1/3 failed - retrying in 5 seconds...
WARNING matflow: 403 Client Error: rate limit exceeded for url: https://api.github.com/repos/hpcflow/matflow-data/git/trees/main, attempt 2/3 failed - retrying in 26.5256876096892 seconds...
WARNING matflow: 403 Client Error: rate limit exceeded for url: https://api.github.com/repos/hpcflow/matflow-data/git/trees/main, attempt 3/3 failed - giving up!
------------------------------ Captured log call ------------------------------
WARNING matflow:api.py:91 403 Client Error: rate limit exceeded for url: https://api.github.com/repos/hpcflow/matflow-data/git/trees/main, attempt 1/3 failed - retrying in 5 seconds...
WARNING matflow:api.py:91 403 Client Error: rate limit exceeded for url: https://api.github.com/repos/hpcflow/matflow-data/git/trees/main, attempt 2/3 failed - retrying in 26.5256876096892 seconds...
WARNING matflow:api.py:100 403 Client Error: rate limit exceeded for url: https://api.github.com/repos/hpcflow/matflow-data/git/trees/main, attempt 3/3 failed - giving up!
=========================== short test summary info ===========================
FAILED matflow/tests/param_classes/test_load.py::test_load_case_from_npz_file - requests.exceptions.HTTPError: 403 Client Error: rate limit exceeded for url: https://api.github.com/repos/hpcflow/matflow-data/git/trees/main
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!
============= 1 failed, 8 passed, 2 skipped, 1 xfailed in 34.07s ==============
This is not a failure of the code under test, but of an auxiliary service. Re-running the job usually works, but having to do that manually is very annoying, especially when the cause is a transient problem in something we don't control.
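For reference, the retry behaviour visible in the log (via reretry's `retry_call` with exponential backoff: ~5 s before attempt 2, ~26.5 s before attempt 3, then give up) boils down to something like the sketch below. The function name and defaults here are illustrative, not matflow's or reretry's actual API.

```python
import random
import time


def retry_with_backoff(func, tries=3, delay=5.0, backoff=5.3, jitter=0.0):
    """Call ``func``, retrying on failure with exponential backoff.

    Illustrative sketch of the pattern in the log above: each retry waits
    ``delay * backoff**(attempt - 1)`` seconds (plus optional jitter), and
    the final failed attempt re-raises.
    """
    for attempt in range(1, tries + 1):
        try:
            return func()
        except Exception:
            if attempt == tries:
                raise  # last attempt failed: give up
            wait = delay * backoff ** (attempt - 1) + random.uniform(0, jitter)
            print(f"attempt {attempt}/{tries} failed - retrying in {wait} seconds...")
            time.sleep(wait)
```

Of course, no amount of retrying helps here: the rate limit applies per IP for roughly an hour, so three attempts over ~30 seconds were never going to succeed.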
Can we do something about caching the data? All matflow needs is a consistent local caching mechanism (possibly opt-in), and we can then use actions/cache in the workflow to persist that cache between runs, so the GitHub API is only hit on a cache miss.
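The caching mechanism wouldn't need to be elaborate. A minimal sketch, where `fetch` stands in for the existing GitHub download path and `cache_dir` is the directory that actions/cache would persist (all names here are hypothetical, not matflow's actual API):

```python
from pathlib import Path


def get_cached_data_file(file_name, cache_dir, fetch):
    """Return a local path for ``file_name``, downloading at most once.

    ``fetch(file_name, dest)`` is a placeholder for the existing download
    logic; it is only invoked on a cache miss, so a pre-populated
    ``cache_dir`` (restored by actions/cache) avoids the GitHub API entirely.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / file_name
    if not cached.exists():  # cache miss: hit the network once
        fetch(file_name, cached)
    return cached
```

With something like this in place, the CI workflow would only reach api.github.com on the first run after the cache is evicted, rather than once per test session per runner.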