background: CEPH QuotaExceeded #10

@bwalsh

Ceph RGW QuotaExceeded Analysis for Bucket EllrottLab

Context

The application is receiving an S3-style error:

<Error>
  <Code>QuotaExceeded</Code>
  <BucketName>EllrottLab</BucketName>
  ...
</Error>

The object store backend is Ceph RGW (RADOS Gateway), not AWS S3. In Ceph RGW, QuotaExceeded means that a configured quota limit has been reached. It does not necessarily mean the cluster is out of raw capacity, and it is not request-rate throttling.

How Ceph RGW Quotas Work

Ceph RGW supports quotas at two primary scopes:

  1. User quota

    • Limits total usage for a given RGW user (across all buckets).
    • Controls:
      • max_size: maximum total bytes used.
      • max_objects: maximum number of objects.
  2. Bucket quota

    • Limits usage for a specific bucket.
    • Controls:
      • max_size: maximum bytes in that bucket.
      • max_objects: maximum number of objects in that bucket.

When either a user quota or a bucket quota is enabled and exceeded, RGW returns:

<Code>QuotaExceeded</Code>

for subsequent write operations (PUT Object, multipart upload, CopyObject, etc.) that would increase usage.

What QuotaExceeded Is and Is Not

It IS:

  • A signal that a configured quota has been reached:
    • Bucket size (max_size for the bucket), or
    • Bucket object count (max_objects for the bucket), or
    • User total size (max_size across all buckets), or
    • User total object count (max_objects across all buckets).

It is NOT:

  • AWS S3 throttling (SlowDown, ServiceUnavailable, etc.).
  • Direct evidence of cluster-wide space exhaustion (though that can co-exist).
  • A transient, retryable rate-limit error. It will persist until the quota is adjusted or usage is reduced.
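Because QuotaExceeded is not retryable, a client wrapper can branch on the error code instead of blindly retrying. The following is a minimal sketch, not part of the application: the function names s3_error_code and classify_s3_error are hypothetical, the list of retryable codes is only illustrative, and the parser assumes the <Code> element sits on a single line of the response body.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative, not from any library.

# Print the text between <Code> and </Code> in an S3-style XML error body.
# Prints nothing if no <Code> element is found.
s3_error_code() {
  printf '%s\n' "$1" | sed -n 's:.*<Code>\([^<]*\)</Code>.*:\1:p' | head -n 1
}

# Map an error code to an action:
#   retry     - throttling-style codes that may succeed later
#   fix-quota - QuotaExceeded, which persists until quota or usage changes
#   other     - anything else
classify_s3_error() {
  case "$1" in
    SlowDown|ServiceUnavailable|RequestTimeout) echo "retry" ;;
    QuotaExceeded) echo "fix-quota" ;;
    *) echo "other" ;;
  esac
}

body='<Error><Code>QuotaExceeded</Code><BucketName>EllrottLab</BucketName></Error>'
classify_s3_error "$(s3_error_code "$body")"   # prints: fix-quota
```

A wrapper built this way would stop and alert on fix-quota rather than looping on a write that cannot succeed.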

Diagnostic Procedure

The goal is to determine:

  1. Whether the quota violation is at the bucket level or the user level.
  2. Which parameter (size vs object count) is responsible.

Step 1: Inspect bucket stats and bucket quota

  • Use radosgw-admin bucket stats to view current bucket usage and owner:

    radosgw-admin bucket stats --bucket=EllrottLab | jq .

    This shows:

    • Current size and object count.
    • The owner field, which is the RGW user that owns the bucket.
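  • Check bucket-level quota configuration. radosgw-admin documents quota set, quota enable, and quota disable, but, as far as the documented CLI goes, no per-bucket quota get subcommand; the bucket's quota settings are reported in the bucket_quota field of the bucket stats output:

    radosgw-admin bucket stats --bucket=EllrottLab | jq '.bucket_quota'

    Here you will see:

    • Whether the bucket quota is enabled.
    • Any values set for max_size and max_objects (a negative value means no limit).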

Step 2: Determine bucket owner and inspect user quota

  • Extract the bucket owner:

    OWNER=$(radosgw-admin bucket stats --bucket=EllrottLab | jq -r '.owner')
    echo "Bucket owner (RGW user): $OWNER"
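  • Inspect the user’s quota and usage:

    radosgw-admin user info --uid="$OWNER" | jq '.user_quota, .bucket_quota'
    radosgw-admin user stats --uid="$OWNER" --sync-stats

    This shows:

    • Whether user quotas are enabled (user_quota.enabled).
    • max_size / max_objects for the user.
    • The default per-bucket quota applied to this user’s buckets (bucket_quota in the user info output).
    • Aggregate usage across all buckets for this user (from user stats; --sync-stats refreshes the totals first).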

Step 3: Decide which quota is limiting

Compare:

  • Bucket usage vs bucket quota (max_size, max_objects).
  • User usage vs user quota.

Any enabled scope where usage meets or exceeds a configured limit will generate QuotaExceeded. If both scopes are at their limit, both must be addressed before writes succeed.

Examples:

  • If EllrottLab bucket is ~1 TB and bucket max_size is 1T, and bucket quota is enabled → bucket quota is limiting.
  • If the user’s aggregate usage across multiple buckets hits max_size while the bucket quota is generous or disabled → user quota is limiting.
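The comparison in Step 3 can be sketched as a small shell predicate. This is an illustration under stated assumptions, not RGW's actual enforcement logic: quota_hit is a hypothetical helper, it treats a negative limit as "unlimited", and the byte values below are made-up examples. The jq paths in the comments assume the usual bucket stats layout (usage under "rgw.main") and may differ on your release.

```shell
#!/usr/bin/env bash
# quota_hit ENABLED USAGE LIMIT
# Succeeds (exit 0) only when the quota is enabled, the limit is
# non-negative, and usage has reached or passed the limit.
# Assumption: a negative limit means "no limit".
quota_hit() {
  local enabled=$1 usage=$2 limit=$3
  [ "$enabled" = "true" ] || return 1
  [ "$limit" -ge 0 ] || return 1
  [ "$usage" -ge "$limit" ]
}

# In practice the inputs would come from radosgw-admin, for example
# (assumed field paths, verify against your own output):
#   radosgw-admin bucket stats --bucket=EllrottLab | jq '.usage["rgw.main"].size'
#   radosgw-admin bucket stats --bucket=EllrottLab | jq '.bucket_quota.max_size'
bucket_bytes=1099511627776   # hypothetical: 1 TiB currently stored
bucket_max=1099511627776     # hypothetical: 1 TiB bucket max_size

if quota_hit true "$bucket_bytes" "$bucket_max"; then
  echo "bucket max_size is limiting"
fi
```

The same predicate can be reused for object counts and for the user scope by passing num_objects / max_objects or the user-level totals.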

Remediation Options

Depending on policy, you can:

  1. Increase the bucket quota (if bucket quota is the limiting factor):

    radosgw-admin quota set --quota-scope=bucket --bucket=EllrottLab --max-size=10T --max-objects=-1
    radosgw-admin quota enable --quota-scope=bucket --bucket=EllrottLab
    • --max-size=10T: set maximum bucket size to 10 TB (adjust to your environment).
    • --max-objects=-1: disable object-count limit.
  2. Increase the user quota (if user quota is the limiting factor):

    radosgw-admin quota set --quota-scope=user --uid="$OWNER" --max-size=50T --max-objects=-1
    radosgw-admin quota enable --quota-scope=user --uid="$OWNER"
    • --max-size=50T: increase total allowed size across all buckets for this user.
  3. Reduce usage (if policy does not allow raising quotas):

    • Delete unneeded objects from the bucket.
    • Apply lifecycle policies (e.g., expiration on old data) and wait for them to take effect.

Post-Change Verification

After adjusting quotas, verify that:

  • The bucket and user quotas are now high enough compared to current usage:

    radosgw-admin bucket stats --bucket=EllrottLab | jq '.usage, .bucket_quota'
    radosgw-admin user info --uid="$OWNER" | jq '.user_quota'
    radosgw-admin user stats --uid="$OWNER" --sync-stats
  • New uploads no longer return QuotaExceeded.
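To make "high enough compared to current usage" concrete, the check can be reduced to a percentage. The helper below is a rough sketch: pct_used is a hypothetical name, it uses integer arithmetic (so results are rounded down), and the sample numbers are invented for illustration.

```shell
#!/usr/bin/env bash
# pct_used USAGE LIMIT
# Prints usage as an integer percentage of the limit.
# Assumptions: a negative limit means "no limit" (prints "unlimited");
# a zero limit is treated as already reached (prints 100).
pct_used() {
  local usage=$1 limit=$2
  if [ "$limit" -lt 0 ]; then
    echo "unlimited"
  elif [ "$limit" -eq 0 ]; then
    echo 100
  else
    echo $(( usage * 100 / limit ))
  fi
}

# Hypothetical example: 1 TiB used against a 10 TiB limit.
pct_used 1099511627776 10995116277760   # prints: 10
```

A value at or near 100 after the change suggests the new limit leaves little headroom and uploads may start failing again soon.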

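If QuotaExceeded persists despite apparently generous quotas, investigate:

  • Whether quotas are being applied at the correct scope (e.g., updating the wrong user, or raising the bucket quota when the user quota is the one being hit).
  • Whether RGW is still working from cached usage statistics. Quota stats are cached and synced periodically, so a quota change or a large deletion may take some time to be reflected.
  • Any higher-level pool or namespace constraints (e.g., underlying Ceph pool near full).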

Summary

For Ceph RGW:

  • QuotaExceeded is a logical limit error, driven by the RGW quota subsystem, not a generic storage or transaction-rate failure.
  • The typical resolution path is:
    1. Identify whether bucket or user quotas are limiting.
    2. Adjust max_size and/or max_objects in line with storage policy.
    3. Re-verify usage and resume uploads.

This should be treated as an intentional guardrail, and quota changes should follow your organization’s storage governance and approval processes.
