Move extra dev commits into main #218
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,5 @@ | ||
| docker/mysql/data | ||
| .DS_Store | ||
|
|
||
| # Byte-compiled / optimized / DLL files | ||
| __pycache__/ | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,13 +1,33 @@ | ||||||||||||||||||||||||||
| services: | ||||||||||||||||||||||||||
| database: | ||||||||||||||||||||||||||
| image: "openml/test-database" | ||||||||||||||||||||||||||
| profiles: ["python", "php", "all"] | ||||||||||||||||||||||||||
| image: "openml/test-database:20240105" | ||||||||||||||||||||||||||
| container_name: "openml-test-database" | ||||||||||||||||||||||||||
| environment: | ||||||||||||||||||||||||||
| MYSQL_ROOT_PASSWORD: ok | ||||||||||||||||||||||||||
| ports: | ||||||||||||||||||||||||||
| - "3306:3306" | ||||||||||||||||||||||||||
| healthcheck: | ||||||||||||||||||||||||||
| test: ["CMD", "mysqladmin" ,"ping", "-h", "localhost"] | ||||||||||||||||||||||||||
| start_period: 30s | ||||||||||||||||||||||||||
| start_interval: 1s | ||||||||||||||||||||||||||
| timeout: 3s | ||||||||||||||||||||||||||
| interval: 5s | ||||||||||||||||||||||||||
| retries: 10 | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| database-setup: | ||||||||||||||||||||||||||
| profiles: ["python", "php", "all"] | ||||||||||||||||||||||||||
| image: mysql | ||||||||||||||||||||||||||
| container_name: "openml-test-database-setup" | ||||||||||||||||||||||||||
| volumes: | ||||||||||||||||||||||||||
| - ./docker/database/update.sh:/database-update.sh | ||||||||||||||||||||||||||
| command: /bin/sh -c "/database-update.sh" | ||||||||||||||||||||||||||
| depends_on: | ||||||||||||||||||||||||||
| database: | ||||||||||||||||||||||||||
| condition: service_healthy | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| docs: | ||||||||||||||||||||||||||
| profiles: ["all"] | ||||||||||||||||||||||||||
| build: | ||||||||||||||||||||||||||
| context: . | ||||||||||||||||||||||||||
| dockerfile: docker/docs/Dockerfile | ||||||||||||||||||||||||||
|
|
@@ -16,8 +36,35 @@ services: | |||||||||||||||||||||||||
| volumes: | ||||||||||||||||||||||||||
| - .:/docs | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| elasticsearch: | ||||||||||||||||||||||||||
| profiles: ["php", "all"] | ||||||||||||||||||||||||||
| image: docker.elastic.co/elasticsearch/elasticsearch:6.8.23 | ||||||||||||||||||||||||||
| container_name: "openml-elasticsearch" | ||||||||||||||||||||||||||
| platform: "linux/amd64" | ||||||||||||||||||||||||||
| ports: | ||||||||||||||||||||||||||
| - "9200:9200" # also known as /es (nginx) | ||||||||||||||||||||||||||
| - "9300:9300" | ||||||||||||||||||||||||||
| env_file: docker/elasticsearch/.env | ||||||||||||||||||||||||||
| healthcheck: | ||||||||||||||||||||||||||
| test: curl 127.0.0.1:9200/_cluster/health | grep -e "green" | ||||||||||||||||||||||||||
| start_period: 30s | ||||||||||||||||||||||||||
| start_interval: 5s | ||||||||||||||||||||||||||
| timeout: 3s | ||||||||||||||||||||||||||
| interval: 10s | ||||||||||||||||||||||||||
|
Comment on lines +51 to +53
issue (bug_risk): Same As with the database service,
Comment on lines +48 to +53
Elasticsearch healthcheck may be too strict for single-node clusters. The healthcheck only accepts "green".
Proposed fix to accept both green and yellow:
healthcheck:
- test: curl 127.0.0.1:9200/_cluster/health | grep -e "green"
+ test: curl -s 127.0.0.1:9200/_cluster/health | grep -E '"status":"(green|yellow)"'
start_period: 30s |
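The same acceptance rule can be sketched outside the shell. This is only an illustration, assuming the `_cluster/health` response is a JSON object with a top-level `status` field; the helper name `is_cluster_ready` is hypothetical and not part of this PR.

```python
import json

# Statuses treated as healthy. A single-node cluster often stays "yellow"
# because replica shards have no second node to be assigned to.
ACCEPTABLE_STATUSES = {"green", "yellow"}


def is_cluster_ready(health_body: str) -> bool:
    """Return True when the cluster-health JSON reports green or yellow."""
    try:
        status = json.loads(health_body).get("status")
    except (json.JSONDecodeError, AttributeError):
        # Not JSON at all, or JSON that is not an object.
        return False
    return status in ACCEPTABLE_STATUSES


print(is_cluster_ready('{"cluster_name": "docker-cluster", "status": "yellow"}'))
```

Parsing the JSON rather than grepping avoids matching the word "green" elsewhere in the response, such as in a cluster name.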
||||||||||||||||||||||||||
| deploy: | ||||||||||||||||||||||||||
| resources: | ||||||||||||||||||||||||||
| limits: | ||||||||||||||||||||||||||
| cpus: '1' | ||||||||||||||||||||||||||
| memory: 1G | ||||||||||||||||||||||||||
| reservations: | ||||||||||||||||||||||||||
| cpus: '0.2' | ||||||||||||||||||||||||||
| memory: 250M | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| php-api: | ||||||||||||||||||||||||||
| image: "openml/php-rest-api" | ||||||||||||||||||||||||||
| profiles: ["php", "all"] | ||||||||||||||||||||||||||
| image: "openml/php-rest-api:v1.2.2" | ||||||||||||||||||||||||||
| container_name: "openml-php-rest-api" | ||||||||||||||||||||||||||
| env_file: docker/php/.env | ||||||||||||||||||||||||||
| ports: | ||||||||||||||||||||||||||
| - "8002:80" | ||||||||||||||||||||||||||
| depends_on: | ||||||||||||||||||||||||||
|
|
@@ -33,7 +80,8 @@ services: | |||||||||||||||||||||||||
| interval: 1m | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| python-api: | ||||||||||||||||||||||||||
| container_name: "python-api" | ||||||||||||||||||||||||||
| profiles: ["python", "all"] | ||||||||||||||||||||||||||
| container_name: "openml-python-rest-api" | ||||||||||||||||||||||||||
| build: | ||||||||||||||||||||||||||
| context: . | ||||||||||||||||||||||||||
| dockerfile: docker/python/Dockerfile | ||||||||||||||||||||||||||
|
|
@@ -43,20 +91,3 @@ services: | |||||||||||||||||||||||||
| - .:/python-api | ||||||||||||||||||||||||||
| depends_on: | ||||||||||||||||||||||||||
| - database | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| elasticsearch: | ||||||||||||||||||||||||||
| image: docker.elastic.co/elasticsearch/elasticsearch:6.8.23 | ||||||||||||||||||||||||||
| container_name: "elasticsearch" | ||||||||||||||||||||||||||
| ports: | ||||||||||||||||||||||||||
| - "9200:9200" | ||||||||||||||||||||||||||
| - "9300:9300" | ||||||||||||||||||||||||||
| environment: | ||||||||||||||||||||||||||
| - ELASTIC_PASSWORD=default | ||||||||||||||||||||||||||
| - discovery.type=single-node | ||||||||||||||||||||||||||
| - xpack.security.enabled=false | ||||||||||||||||||||||||||
| healthcheck: | ||||||||||||||||||||||||||
| test: curl 127.0.0.1:9200/_cluster/health | grep -e "green" | ||||||||||||||||||||||||||
| start_period: 30s | ||||||||||||||||||||||||||
| start_interval: 5s | ||||||||||||||||||||||||||
| timeout: 3s | ||||||||||||||||||||||||||
| interval: 1m | ||||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,31 @@ | ||||||
| #/bin/bash | ||||||
|
Fix the shebang syntax. The shebang is missing the "!".
Proposed fix:
-#/bin/bash
+#!/bin/bash
Tools: Shellcheck (0.11.0) [error] 1-1: Use #!, not just #, for the shebang. (SC1113) |
||||||
| # Change the filepath of openml.file | ||||||
| # from "https://www.openml.org/data/download/1666876/phpFsFYVN" | ||||||
| # to "http://minio:9000/datasets/0000/0001/phpFsFYVN" | ||||||
| mysql -hdatabase -uroot -pok -e 'UPDATE openml.file SET filepath = CONCAT("http://minio:9000/datasets/0000/", LPAD(id, 4, "0"), "/", SUBSTRING_INDEX(filepath, "/", -1)) WHERE extension="arff";' | ||||||
|
|
||||||
| # Update openml.expdb.dataset with the same url | ||||||
| mysql -hdatabase -uroot -pok -e 'UPDATE openml_expdb.dataset DS, openml.file FL SET DS.url = FL.filepath WHERE DS.did = FL.id;' | ||||||
|
Comment on lines +5 to +8
issue (bug_risk): The MinIO dataset URL pattern here is inconsistent with the application’s routing logic and will fail for larger dataset IDs. In |
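To make the concern concrete, here is a rough Python model of what the `UPDATE` statement computes. It assumes MySQL's documented `LPAD` behaviour of shortening a string that is longer than the target length. The `bucketed_url` function is a hypothetical alternative layout chosen purely for illustration; it is not taken from the application's routing code.

```python
BASE = "http://minio:9000/datasets"


def sql_style_url(file_id: int, filepath: str) -> str:
    """Mimic the SQL: fixed "0000" bucket plus LPAD(id, 4, "0")."""
    filename = filepath.rsplit("/", 1)[-1]   # SUBSTRING_INDEX(filepath, "/", -1)
    padded = str(file_id).rjust(4, "0")[:4]  # LPAD cuts strings longer than 4 characters
    return f"{BASE}/0000/{padded}/{filename}"


def bucketed_url(file_id: int, filepath: str) -> str:
    """Hypothetical layout: the bucket is derived from the id instead of hardcoded."""
    filename = filepath.rsplit("/", 1)[-1]
    return f"{BASE}/{file_id // 10_000:04d}/{file_id:04d}/{filename}"


source = "https://www.openml.org/data/download/1666876/phpFsFYVN"
print(sql_style_url(1, source))      # matches the example in the script comment
print(sql_style_url(12345, source))  # id is cut to "1234" under the "0000" bucket
print(bucketed_url(12345, source))
```

For IDs below 10000 both functions agree; above that, the SQL version silently maps distinct IDs onto the same path.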
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
| # Create the data_feature_description TABLE. TODO: can we make sure this table exists already? | ||||||
| mysql -hdatabase -uroot -pok -Dopenml_expdb -e 'CREATE TABLE IF NOT EXISTS `data_feature_description` ( | ||||||
| `did` int unsigned NOT NULL, | ||||||
| `index` int unsigned NOT NULL, | ||||||
| `uploader` mediumint unsigned NOT NULL, | ||||||
| `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, | ||||||
| `description_type` enum("plain", "ontology") NOT NULL, | ||||||
| `value` varchar(256) NOT NULL, | ||||||
| KEY `did` (`did`,`index`), | ||||||
| CONSTRAINT `data_feature_description_ibfk_1` FOREIGN KEY (`did`, `index`) REFERENCES `data_feature` (`did`, `index`) ON DELETE CASCADE ON UPDATE CASCADE | ||||||
| )' | ||||||
|
|
||||||
| # SET dataset 1 to active (used in unittests java) | ||||||
| mysql -hdatabase -uroot -pok -Dopenml_expdb -e 'INSERT IGNORE INTO dataset_status VALUES (1, "active", "2024-01-01 00:00:00", 1)' | ||||||
| mysql -hdatabase -uroot -pok -Dopenml_expdb -e 'DELETE FROM dataset_status WHERE did = 2 AND status = "deactivated";' | ||||||
|
|
||||||
| # Temporary fix in case the database missed the kaggle table. The PHP Rest API expects the table to be there, while indexing. | ||||||
| mysql -hdatabase -uroot -pok -Dopenml_expdb -e 'CREATE TABLE IF NOT EXISTS `kaggle` (`dataset_id` int(11) DEFAULT NULL, `kaggle_link` varchar(500) DEFAULT NULL)' | ||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| ELASTIC_PASSWORD=default | ||
| discovery.type=single-node | ||
| xpack.security.enabled=false |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| API_KEY=AD000000000000000000000000000000 | ||
| BASE_URL=http://php-api:80/ | ||
| MINIO_URL=http://minio:9000/ | ||
| DB_HOST_OPENML=database:3306 | ||
| DB_HOST_EXPDB=database:3306 | ||
| DB_USER_OPENML=root | ||
| DB_PASS_OPENML=ok | ||
| DB_USER_EXPDB_WRITE=root | ||
| DB_PASS_EXPDB_WRITE=ok | ||
| DB_USER_EXPDB_READ=root | ||
| DB_PASS_EXPDB_READ=ok | ||
| ES_URL=elasticsearch:9200 | ||
| ES_PASSWORD=default | ||
| INDEX_ES_DURING_STARTUP=false | ||
|
Comment on lines +1 to +14
Analysis chain
Script executed:
#!/bin/bash
# Check if .env files are properly ignored
echo "=== Checking .gitignore for .env patterns ==="
if [ -f .gitignore ]; then
grep -n "\.env" .gitignore || echo "No .env patterns found in .gitignore"
else
echo ".gitignore not found"
fi
echo -e "\n=== Checking if this .env file is tracked by git ==="
git ls-files docker/php/.env && echo "WARNING: .env file is tracked!" || echo "OK: .env file is not tracked"
echo -e "\n=== Searching for .env.example or similar template files ==="
find . -name ".env*" -type f | grep -v ".git" | head -20
echo -e "\n=== Checking git status of docker/php/.env ==="
git status docker/php/.env 2>/dev/null || echo "File not found or not in git"
Repository: openml/server-api
Length of output: 438
Script executed:
# Check the .gitignore content around line 126 to see the pattern
echo "=== .gitignore content around line 126 ==="
sed -n '120,130p' .gitignore
echo -e "\n=== Full .env patterns in .gitignore ==="
grep -n "env" .gitignore
echo -e "\n=== Check if docker directory has any .gitignore ==="
cat docker/.gitignore 2>/dev/null || echo "No docker/.gitignore"
echo -e "\n=== Check if docker/php has any .gitignore ==="
cat docker/php/.gitignore 2>/dev/null || echo "No docker/php/.gitignore"
Repository: openml/server-api
Length of output: 745
Remove docker/php/.env from version control and create a template file. This file contains sensitive credentials (API keys, database passwords) and is currently tracked by git despite having an ignore pattern in .gitignore.
Tools: dotenv-linter (4.0.0)
[warning] 4-4: [UnorderedKey] The DB_HOST_OPENML key should go before the MINIO_URL key (UnorderedKey)
[warning] 5-5: [UnorderedKey] The DB_HOST_EXPDB key should go before the DB_HOST_OPENML key (UnorderedKey)
[warning] 6-6: [UnorderedKey] The DB_USER_OPENML key should go before the MINIO_URL key (UnorderedKey)
[warning] 7-7: [UnorderedKey] The DB_PASS_OPENML key should go before the DB_USER_OPENML key (UnorderedKey)
[warning] 8-8: [UnorderedKey] The DB_USER_EXPDB_WRITE key should go before the DB_USER_OPENML key (UnorderedKey)
[warning] 9-9: [UnorderedKey] The DB_PASS_EXPDB_WRITE key should go before the DB_PASS_OPENML key (UnorderedKey)
[warning] 10-10: [UnorderedKey] The DB_USER_EXPDB_READ key should go before the DB_USER_EXPDB_WRITE key (UnorderedKey)
[warning] 11-11: [UnorderedKey] The DB_PASS_EXPDB_READ key should go before the DB_PASS_EXPDB_WRITE key (UnorderedKey)
[warning] 12-12: [UnorderedKey] The ES_URL key should go before the MINIO_URL key (UnorderedKey)
[warning] 13-13: [UnorderedKey] The ES_PASSWORD key should go before the ES_URL key (UnorderedKey)
[warning] 14-14: [UnorderedKey] The INDEX_ES_DURING_STARTUP key should go before the MINIO_URL key (UnorderedKey) |
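One way to produce such a template is to blank only the values that look like secrets. The sketch below is an assumption-laden illustration: the marker list (`PASS`, `KEY`) and the function name `to_template` are hypothetical, and which keys count as sensitive is a project decision.

```python
# Substrings that mark a key as sensitive (an assumption for this sketch).
SECRET_MARKERS = ("PASS", "KEY")


def to_template(env_text: str) -> str:
    """Return the .env content with secret-looking values emptied."""
    out = []
    for line in env_text.splitlines():
        stripped = line.strip()
        # Keep blank lines, comments, and anything that is not KEY=VALUE.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            out.append(line)
            continue
        key, _, value = stripped.partition("=")
        if any(marker in key.upper() for marker in SECRET_MARKERS):
            value = ""
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


sample = "API_KEY=AD000000000000000000000000000000\nBASE_URL=http://php-api:80/\nDB_PASS_OPENML=ok\n"
print(to_template(sample), end="")
```

The output could be committed under a name such as `.env.example` (again an assumption), while the real `.env` stays untracked.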
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,3 +15,7 @@ database="openml_expdb" | |
|
|
||
| [databases.openml] | ||
| database="openml" | ||
|
|
||
| [routing] | ||
| minio_url="http://minio:9000/" | ||
| server_url="http://php-api:80/" | ||
|
Comment on lines +19 to +21
Analysis chain
Script executed:
#!/bin/bash
# Find Python files that might load configuration
find . -name "*.py" -type f | head -20
# Search for toml loading patterns
rg -n "toml|config" --type=py -i | head -30
# Check the src/config.toml file
if [ -f "src/config.toml" ]; then
cat -n src/config.toml
fi
# Search for environment variable patterns in Python files
rg -n "environ|getenv|os.env" --type=py | head -20
Repository: openml/server-api
Length of output: 4101
Routing configuration lacks environment variable override support. The routing URLs use Docker service names. |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -49,14 +49,8 @@ def get_flow(flow_id: int, expdb: Annotated[Connection, Depends(expdb_connection | |
| ] | ||
|
|
||
| tags = database.flows.get_tags(flow_id, expdb) | ||
| flow_rows = database.flows.get_subflows(for_flow=flow_id, expdb=expdb) | ||
| subflows = [ | ||
| { | ||
| "identifier": flow.identifier, | ||
| "flow": get_flow(flow_id=flow.child_id, expdb=expdb), | ||
| } | ||
| for flow in flow_rows | ||
| ] | ||
| flow_rows = database.flows.get_subflows(flow_id, expdb) | ||
| subflows = [get_flow(flow_id=flow.child_id, expdb=expdb) for flow in flow_rows] | ||
|
Comment on lines +52 to +53
Potential infinite recursion with circular subflow references. The recursive call to get_flow does not guard against cycles.
Proposed fix with cycle detection:
 @router.get("/{flow_id}")
-def get_flow(flow_id: int, expdb: Annotated[Connection, Depends(expdb_connection)] = None) -> Flow:
+def get_flow(
+ flow_id: int,
+ expdb: Annotated[Connection, Depends(expdb_connection)] = None,
+ _visited: set[int] | None = None,
+) -> Flow:
+ if _visited is None:
+ _visited = set()
+ if flow_id in _visited:
+ raise HTTPException(status_code=HTTPStatus.INTERNAL_SERVER_ERROR, detail="Circular subflow reference detected")
+ _visited.add(flow_id)
+
flow = database.flows.get(flow_id, expdb)
...
flow_rows = database.flows.get_subflows(flow_id, expdb)
- subflows = [get_flow(flow_id=flow.child_id, expdb=expdb) for flow in flow_rows]
+ subflows = [get_flow(flow_id=flow.child_id, expdb=expdb, _visited=_visited.copy()) for flow in flow_rows]
|
||
|
|
||
| return Flow( | ||
| id_=flow.id, | ||
|
|
||
issue (bug_risk): The start_interval key is not a valid Docker healthcheck option and will be ignored.
Docker healthchecks only support test, interval, timeout, retries, and start_period; start_interval is ignored. To control check frequency, use interval (and optionally start_period) instead, and remove start_interval here and in the Elasticsearch service or replace it with a supported option.