Optimize MLDCAT-AP endpoint performance #213

@lucifer4330k

Description

The module docstring of the MLDCAT-AP router describes the implementation as "incredibly inefficient" and intended only as a proof of concept.

Location

src/routers/mldcat_ap/dataset.py:1-4

"""Router for MLDCAT-AP endpoints: https://semiceu.github.io/MLDCAT-AP/releases/1.0.0/#examples

Incredibly inefficient, but it's just a proof of concept.
Specific queries could be written to fetch e.g., a single feature or quality.
"""

Impact

  • Poor performance when fetching MLDCAT-AP formatted datasets
  • Multiple unnecessary database queries
  • Not suitable for production use

Current Issues

  • Fetches entire datasets, features, and qualities even when only partial data is needed
  • No query optimization for specific feature or quality lookups
  • Repeated database calls for related entities
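
To make the repeated-call pattern concrete, here is a minimal, self-contained sketch using an in-memory SQLite database. The schema (`dataset`, `feature`), the sample rows, and all function names are hypothetical and are not taken from the project's code; `sqlite3` is used only so the example runs on its own. It contrasts per-dataset lookups with a single JOIN and counts the SQL statements each approach issues.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory database (hypothetical schema, illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE feature (id INTEGER PRIMARY KEY, dataset_id INTEGER, name TEXT);
        INSERT INTO dataset VALUES (1, 'iris'), (2, 'wine'), (3, 'adult');
        INSERT INTO feature VALUES
            (1, 1, 'sepal_length'), (2, 1, 'sepal_width'),
            (3, 2, 'alcohol'), (4, 3, 'age');
        """
    )
    return conn


def features_n_plus_one(conn):
    """One query for the datasets, then one more query per dataset."""
    result = {}
    datasets = conn.execute("SELECT id, name FROM dataset ORDER BY id").fetchall()
    for dataset_id, name in datasets:
        rows = conn.execute(
            "SELECT name FROM feature WHERE dataset_id = ? ORDER BY id",
            (dataset_id,),
        ).fetchall()
        result[name] = [row[0] for row in rows]
    return result


def features_joined(conn):
    """A single JOIN returns the same data in one round-trip."""
    result = {}
    rows = conn.execute(
        """
        SELECT d.name, f.name
        FROM dataset AS d
        LEFT JOIN feature AS f ON f.dataset_id = d.id
        ORDER BY d.id, f.id
        """
    ).fetchall()
    for dataset_name, feature_name in rows:
        result.setdefault(dataset_name, [])
        if feature_name is not None:  # datasets without features keep an empty list
            result[dataset_name].append(feature_name)
    return result


def count_statements(func):
    """Run func against a fresh database and count the SQL statements it issues."""
    conn = make_db()
    statements = []
    conn.set_trace_callback(statements.append)
    data = func(conn)
    conn.set_trace_callback(None)
    conn.close()
    return data, len(statements)
```

Calling `count_statements(features_n_plus_one)` and `count_statements(features_joined)` returns the same data, but the first issues one extra statement per dataset on top of the initial lookup, which is the cost the bullets above describe.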

Suggested Implementation

  1. Write specific SQL queries for targeted data retrieval
  2. Use JOINs to reduce number of database round-trips
  3. Implement pagination and filtering at the database level
  4. Add caching for frequently accessed metadata
  5. Profile and benchmark performance improvements
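
Steps 1 to 4 could look roughly like the sketch below. It is not the project's implementation: the schema, the sample data, and the function names (`get_feature`, `list_features`, `get_dataset_name`) are assumptions made for illustration, and plain `sqlite3` stands in for whatever database layer the router actually uses.

```python
import sqlite3
from functools import lru_cache

# Hypothetical schema for illustration; not the project's actual tables.
SCHEMA = """
CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (
    id INTEGER PRIMARY KEY, dataset_id INTEGER, name TEXT, data_type TEXT
);
"""

CONN = sqlite3.connect(":memory:")
CONN.executescript(SCHEMA)
CONN.execute("INSERT INTO dataset VALUES (1, 'iris')")
CONN.executemany(
    "INSERT INTO feature VALUES (?, 1, ?, 'numeric')",
    [(i, f"feature_{i}") for i in range(1, 8)],
)
CONN.commit()


def get_feature(dataset_id: int, feature_id: int):
    """Step 1: fetch a single feature instead of loading the whole dataset."""
    return CONN.execute(
        "SELECT id, name, data_type FROM feature WHERE dataset_id = ? AND id = ?",
        (dataset_id, feature_id),
    ).fetchone()


def list_features(dataset_id: int, limit: int = 3, offset: int = 0):
    """Steps 2 and 3: JOIN once and let the database apply LIMIT/OFFSET."""
    return CONN.execute(
        """
        SELECT d.name, f.id, f.name
        FROM feature AS f
        JOIN dataset AS d ON d.id = f.dataset_id
        WHERE d.id = ?
        ORDER BY f.id
        LIMIT ? OFFSET ?
        """,
        (dataset_id, limit, offset),
    ).fetchall()


@lru_cache(maxsize=256)
def get_dataset_name(dataset_id: int):
    """Step 4: cache rarely changing metadata in-process."""
    row = CONN.execute(
        "SELECT name FROM dataset WHERE id = ?", (dataset_id,)
    ).fetchone()
    return row[0] if row else None
```

For example, `list_features(1, limit=3, offset=3)` returns only the second page of features for dataset 1, with the dataset name attached by the JOIN. An in-process cache like `lru_cache` can serve stale values after an update, so `get_dataset_name.cache_clear()` (or a shared cache with expiry) would be needed wherever the metadata can change.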

Related

    Labels

    wontfix: This will not be worked on