@kuzaxak kuzaxak commented Dec 2, 2025

Mirror repositories store metadata and zipballs on the local filesystem, which prevents running Packeton in stateless container deployments. When using S3 for regular package storage (`STORAGE_SOURCE=s3`), the `/data` directory still accumulates mirror data that cannot be shared across replicas.

This change extends the existing Flysystem-based S3 storage abstraction to mirror repositories. Both metadata (packages.json, provider includes, package JSON files) and distribution archives are now stored in S3 when configured.

Zipball paths are sharded by hash to optimize S3 performance. S3 partitions data by key prefix, so distributing files across prefixes like `ab/cd/...` prevents hot spots when mirroring large repositories such as packagist.org.
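
The sharding idea can be sketched as follows. This is an illustration in Python, not the PR's PHP code: the hash function (SHA-1), the two-level layout, and the helper name `sharded_key` are all assumptions, since the description only shows the `ab/cd/...` prefix shape.

```python
import hashlib


def sharded_key(name: str, levels: int = 2, width: int = 2) -> str:
    """Build a storage key whose leading directories come from a hash of the name.

    Hypothetical helper: SHA-1 and the levels/width defaults are assumptions.
    """
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    # Take `levels` consecutive slices of `width` hex characters as prefixes.
    prefixes = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join(prefixes + [name])
```

Because the prefix is derived from the name alone, the same archive always maps to the same key, so any replica can locate it without shared state.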

Local caching is optional and configurable via the `MIRROR_METADATA_CACHE_DIR` and `MIRROR_DIST_CACHE_DIR` environment variables. When they are empty, all operations go directly to S3 for fully stateless operation. When they are set to `/tmp` paths, ephemeral caching improves read performance while the deployment remains stateless across container restarts.
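
The "empty means no cache" rule might be resolved like this. It is a Python sketch under assumptions: the helper name `cache_dir` is hypothetical, and treating an unset variable the same as an empty one is a guess, not something the description states.

```python
import os
from typing import Mapping, Optional


def cache_dir(var: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured cache directory, or None when caching is disabled.

    Assumption: an unset variable behaves like an empty one.
    """
    value = env.get(var, "").strip()
    return value or None
```

A caller would then go directly to S3 whenever `cache_dir("MIRROR_DIST_CACHE_DIR")` returns `None`.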

The streaming fallback in `MirrorController` handles cases where local cache writes fail, ensuring zipball downloads still work by streaming directly from S3.
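
The fallback behaviour can be sketched roughly as below. This is Python for illustration only and does not reproduce `MirrorController`: the function name `serve_zipball`, the `open_remote` callback standing in for an S3 read stream, and the chunk size are all hypothetical.

```python
import os
from typing import BinaryIO, Callable, Iterator, Optional


def serve_zipball(
    open_remote: Callable[[], BinaryIO],
    cache_path: Optional[str],
    chunk_size: int = 65536,
) -> Iterator[bytes]:
    """Yield archive bytes, preferring a local cache copy when one can be written."""
    cached = False
    if cache_path:
        try:
            os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
            with open_remote() as src, open(cache_path, "wb") as dst:
                for chunk in iter(lambda: src.read(chunk_size), b""):
                    dst.write(chunk)
            cached = True
        except OSError:
            # Cache write failed (read-only disk, full disk, bad path):
            # fall back to streaming. A partial cache file may remain;
            # real code would likely clean it up.
            cached = False
    # Serve from the cache if it was written, otherwise reopen the remote stream.
    source = open(cache_path, "rb") if cached else open_remote()
    with source:
        yield from iter(lambda: source.read(chunk_size), b"")
```

The cache write sits in its own `try` block so that a failure is detected before any bytes are sent to the client, which avoids mixing cached and streamed data in one response.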

Closes issue #304

@kuzaxak kuzaxak requested a review from vtsykun as a code owner December 2, 2025 17:22