
Conversation

@sergenyalcin
Member

@sergenyalcin sergenyalcin commented Jan 14, 2026

Description of your changes

This PR adds storage version migration support to upjet, enabling automatic migration of CRD resources from an old storage version to a new one when CRD schemas are updated.

When a CRD's storage version changes (e.g., from v1beta1 to v1beta2), existing resources stored in etcd remain in the old version until they are explicitly migrated. This can cause issues during upgrades, especially when introducing breaking changes.

The storage version migrator automates this migration process by:

  1. Listing all resources of a CRD
  2. Applying an empty patch to each resource (triggering the API server to convert and rewrite them in the new storage version)
  3. Updating the CRD's status.storedVersions to reflect only the new storage version
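
A minimal sketch of steps 1 and 2 (batched listing plus the no-op patch), assuming a controller-runtime client; the function name is illustrative, and retries and error wrapping are omitted:

package main

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// migrateStoredObjects lists every object of the given kind in pages of 500
// and applies an empty merge patch to each one. The patch changes nothing,
// but it forces the API server to read, convert, and rewrite the object in
// the CRD's current storage version.
func migrateStoredObjects(ctx context.Context, kube client.Client, gvk schema.GroupVersionKind) error {
	resources := unstructured.UnstructuredList{}
	resources.SetGroupVersionKind(gvk.GroupVersion().WithKind(gvk.Kind + "List"))
	continueToken := ""
	for {
		if err := kube.List(ctx, &resources, client.Limit(500), client.Continue(continueToken)); err != nil {
			return err
		}
		for i := range resources.Items {
			res := resources.Items[i]
			if err := kube.Patch(ctx, &res, client.RawPatch(types.MergePatchType, []byte(`{}`))); err != nil {
				return err
			}
		}
		continueToken = resources.GetContinue()
		if continueToken == "" {
			break
		}
	}
	// Step 3 (not shown) patches the CRD's status.storedVersions so that it
	// lists only the new storage version.
	return nil
}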

Core Implementation (pkg/config/crds_migrator.go)

  • CRDsMigrator: Handles storage version migration for a list of GroupVersionKinds
  • Run(): Executes the migration process with pagination support (500 resources per batch)
  • GetCRDNameFromGVK(): Resolves CRD names from GroupVersionKind using discovery client
  • PrepareCRDsMigrator(): Scans provider resources and creates a migrator for those with previous versions

Provider Integration (pkg/config/provider.go)

  • Added StorageVersionMigrator field to the Provider struct
  • Added WithStorageVersionMigrator() provider option

Example Usage

In provider configuration (config/provider.go):

func GetProvider(ctx context.Context, sdkProvider *schema.Provider) (*ujconfig.Provider, error) {
    pc := ujconfig.NewProvider(...)

    // Configure resources
    pc.ConfigureResources()

    // Prepare storage version migrator for resources with previous versions
    ujconfig.PrepareCRDsMigrator(pc)

    return pc, nil
}

In provider main (cmd/provider/main.go):

// After setting up controllers, run storage version migration
discoveryClient, err := discovery.NewDiscoveryClientForConfig(cfg)
if err != nil {
	logr.Info("Failed to create discovery client, skipping the storage version migration", "err", err)
} else {
	// Create a non-cached client for the migration since the manager cache hasn't started yet
	directClient, err := client.New(cfg, client.Options{Scheme: mgr.GetScheme()})
	if err != nil {
		logr.Info("Failed to create direct client for storage version migration", "err", err)
	} else {
		if err = clusterProvider.StorageVersionMigrator.Run(ctx, logr, discoveryClient, directClient); err != nil {
			logr.Info("Failed to run storage version migrator", "err", err)
		}
	}
}
kingpin.FatalIfError(mgr.Start(ctrl.SetupSignalHandler()), "Cannot start controller manager")

I have:

  • Read and followed Upjet's contribution process.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested

Tested in this branch: https://github.com/sergenyalcin/provider-upjet-azuread/tree/sv-migrator. Validated the migration.

Summary by CodeRabbit

  • New Features

    • Automatic CRD storage-version migration with batched conversions, configurable retry/backoff, permission checks, and verification.
    • New CLI to run CRD storage-version updates from mappings (flag or file) with per-CRD reporting and aggregated results.
    • Provider integration to plug in the storage-version migrator.
  • Chores

    • Updated Kubernetes dependency declarations.
  • Documentation

    • Added an example CRD mapping file for the CLI.


@coderabbitai

coderabbitai bot commented Jan 14, 2026

📝 Walkthrough

Adds a CRD storage-version migration facility (CRDsMigrator) with discovery-driven migration logic, integrates it into Provider, adds a standalone CLI (crds-migrator) to run updates from mappings or files, and promotes k8s.io/apiextensions-apiserver to a direct go.mod requirement.

Changes

  • Dependency Management (go.mod): Promoted k8s.io/apiextensions-apiserver v0.33.0 from indirect to a direct require in the primary require block (+1/-1).
  • CRD Storage-Version Migration (pkg/config/crds_migrator.go): Added CRDsMigrator, an options API (WithRetryBackoff), a constructor, Run(ctx, logr, discoveryClient, kube) orchestration, and helpers: GetCRDNameFromGVK, UpdateCRDStorageVersion, CheckCRDStatusUpdatePermission, PrepareCRDsMigrator. Implements discovery, stored-version comparison, batched listing, patch-triggered conversion, retries/backoff, and status updates (+275/-0). Review attention: retry/backoff logic, RESTMapper resolution, permission checks, and status patch semantics.
  • Provider Integration (pkg/config/provider.go): Added StorageVersionMigrator *CRDsMigrator to Provider and a WithStorageVersionMigrator(migrator *CRDsMigrator) option (+12/-0).
  • CLI: crds-migrator (cmd/crdsmigrator/main.go, cmd/crdsmigrator/example.yaml): New CLI to apply CRD storage-version updates via --crd-names or --crd-file; builds kube config, registers schemes, creates a client, configures retry backoff, optionally checks permissions, calls UpdateCRDStorageVersion, and reports per-CRD results (+229/+9). Review attention: input parsing/validation, kubeconfig resolution, permission-skip behavior, and aggregated error reporting.

Sequence Diagram

sequenceDiagram
    participant CLI as CLI
    participant Migrator as CRDsMigrator
    participant Discovery as RESTMapper/Discovery
    participant API as Kubernetes API

    CLI->>Migrator: Start with CRD mapping & backoff config
    Migrator->>Discovery: Discover API resources / build RESTMapper
    loop per GVK
        Migrator->>Discovery: Resolve CRD name from GVK
        Discovery-->>Migrator: CRD name
        Migrator->>API: Get CRD object
        API-->>Migrator: CRD
        Migrator->>Migrator: Determine storageVersion & needMigration
        alt storedVersions differ
            Migrator->>API: List resources of storage version (batched)
            API-->>Migrator: Resource batch
            loop each resource
                Migrator->>API: Patch resource (noop) to trigger conversion (with retry/backoff)
                API-->>Migrator: Resource patched
            end
            Migrator->>API: Patch CRD.Status.StoredVersions (with retry/backoff)
            API-->>Migrator: Status patched
        end
    end
    Migrator-->>CLI: Report successes/failures

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hop through CRD fields and names,

Patch by patch I nudge their frames,
Discovery lights the path to roam,
Retries hum soft to guide them home,
🥕 New versions wake and take their place.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 61.54%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title 'Add storage version migration support' directly and clearly summarizes the main change: introducing a new CRD storage version migration facility across multiple files (crds_migrator.go, provider.go, and a CLI tool).



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
pkg/config/crds_migrator.go (3)

11-11: Inconsistent error package usage.

This file uses github.com/crossplane/crossplane-runtime/v2/pkg/errors while provider.go in the same package uses github.com/pkg/errors. Per coding guidelines, prefer github.com/pkg/errors for consistency.

♻️ Suggested change
-	"github.com/crossplane/crossplane-runtime/v2/pkg/errors"
+	"github.com/pkg/errors"

96-116: Consider adding progress logging for large migrations.

For CRDs with thousands of resources, the batch patching loop could run for a long time without feedback. Consider adding periodic progress logging (e.g., after each batch).

♻️ Optional enhancement
+		batchCount := 0
 		for {
 			if err := kube.List(ctx, &resources,
 				client.Limit(500),
 				client.Continue(continueToken),
 			); err != nil {
 				return errors.Wrapf(err, "cannot list %s", resources.GroupVersionKind().String())
 			}

 			for i := range resources.Items {
 				// apply empty patch for storage version upgrade
 				res := resources.Items[i]
 				if err := kube.Patch(ctx, &res, client.RawPatch(types.MergePatchType, []byte(`{}`))); err != nil {
 					return errors.Wrapf(err, "cannot patch %s %q", crd.Spec.Names.Kind, res.GetName())
 				}
 			}
+			batchCount++
+			logr.Debug("Processed batch", "crd", crdName, "batch", batchCount, "resourcesInBatch", len(resources.Items))

 			continueToken = resources.GetContinue()
 			if continueToken == "" {
 				break
 			}
 		}

141-154: Consider wrapping errors with context.

Per coding guidelines, wrap errors with context using patterns like errors.Wrap(err, "context"). The bare error returns on lines 144 and 150 lose context about what operation failed.

♻️ Suggested fix
 func GetCRDNameFromGVK(discoveryClient discovery.DiscoveryInterface, gvk schema.GroupVersionKind) (string, error) {
 	groupResources, err := restmapper.GetAPIGroupResources(discoveryClient)
 	if err != nil {
-		return "", err
+		return "", errors.Wrap(err, "cannot get API group resources")
 	}

 	mapper := restmapper.NewDiscoveryRESTMapper(groupResources)
 	mapping, err := mapper.RESTMapping(gvk.GroupKind(), gvk.Version)
 	if err != nil {
-		return "", err
+		return "", errors.Wrapf(err, "cannot get REST mapping for %s", gvk.String())
 	}

 	return mapping.Resource.Resource + "." + mapping.Resource.Group, nil
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 914f0cb and 3258b35.

📒 Files selected for processing (3)
  • go.mod
  • pkg/config/crds_migrator.go
  • pkg/config/provider.go
🧰 Additional context used
📓 Path-based instructions (3)
**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.go: Do not use any type throughout codebase - use concrete types or type parameters instead
Use pointer types for optional fields in generated structs
Avoid type aliases in favor of explicit types
Use github.com/pkg/errors for error wrapping with context
Return errors from functions instead of panicking, except for impossible states
Wrap errors with context using patterns like: errors.Wrap(err, "cannot configure resource")
Avoid circular dependencies between packages

Files:

  • pkg/config/crds_migrator.go
  • pkg/config/provider.go
pkg/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

Public API packages should be organized under pkg/ directory

Files:

  • pkg/config/crds_migrator.go
  • pkg/config/provider.go
go.mod

📄 CodeRabbit inference engine (CLAUDE.md)

go.mod: Module path must be github.com/crossplane/upjet/v2
When testing in providers, add replace github.com/crossplane/upjet/v2 => ../upjet to provider's go.mod
Run make modules.check to verify go.mod/go.sum are tidy before committing

Files:

  • go.mod
🧬 Code graph analysis (2)
pkg/config/crds_migrator.go (1)
pkg/config/provider.go (1)
  • Provider (73-186)
pkg/config/provider.go (1)
pkg/config/crds_migrator.go (1)
  • CRDsMigrator (28-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: lint
  • GitHub Check: unit-tests
  • GitHub Check: check-diff
🔇 Additional comments (7)
go.mod (1)

39-39: LGTM!

The promotion of k8s.io/apiextensions-apiserver to a direct dependency is appropriate since the new CRDsMigrator directly uses extv1.CustomResourceDefinition. The version aligns correctly with other k8s.io dependencies at v0.33.0.

pkg/config/provider.go (2)

182-186: LGTM!

The StorageVersionMigrator field follows the established pattern for optional Provider fields with pointer type and clear documentation.


315-321: LGTM!

The WithStorageVersionMigrator option follows the consistent pattern of other ProviderOption functions in this file.

pkg/config/crds_migrator.go (4)

27-39: LGTM!

The struct and constructor are clean and follow Go conventions with an unexported field and exported constructor.


118-134: LGTM!

The status update with verification is good defensive programming. Using MergeFrom for the status patch is the correct approach.


156-171: LGTM!

The function correctly identifies resources requiring migration based on PreviousVersions and constructs GVKs following upjet conventions.


1-1: License header year appears incorrect.

The copyright year is 2026, but current date is January 2026 and this is new code. Typically, copyright years reflect when the code was written. If this was written in 2025 or earlier during development, consider using the appropriate year.

⛔ Skipped due to learnings
Learnt from: CR
Repo: crossplane/upjet PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-13T21:29:32.267Z
Learning: Use Kubernetes-native patterns via crossplane-runtime
Learnt from: CR
Repo: crossplane/upjet PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-13T21:29:32.266Z
Learning: Applies to go.mod : Module path must be `github.com/crossplane/upjet/v2`


func PrepareCRDsMigrator(pc *Provider) {
	var gvkList []schema.GroupVersionKind
	for _, r := range pc.Resources {
		if len(r.PreviousVersions) != 0 {
Collaborator


note: just a heads-up, not because there is an immediate issue. This is also not actually specific to the changes in this PR.

I checked the usage of r.PreviousVersions in general. We rely solely on the provider (and therefore the developer) to configure it correctly. There are no validations or cross-checks around this configuration (especially against r.Version), e.g.:

  • It is possible to forget configuring it: r.Version = "v1beta2" and r.PreviousVersions = [] (though this might still be a valid scenario after v1beta1 is actually removed).
  • There is no check that Version and PreviousVersions are disjoint sets: r.Version = "v1beta2" with r.PreviousVersions = ["v1beta1", "v1beta2"] is possible.
  • There is no check that the PreviousVersions are behind the current Version: r.Version = "v1beta2" with r.PreviousVersions = ["v1beta3", "v1beta4"].

Just noting these since this configuration is getting a new feature built around it.

Member Author


Yes, I also considered this. I believe that maintaining a complete version list as a new feature may make sense, and it should be stored automatically rather than configured manually.

origCrd := crd.DeepCopy()

crd.Status.StoredVersions = []string{storageVersion}
if err := kube.Status().Patch(ctx, &crd, client.MergeFrom(origCrd)); err != nil {
Collaborator


Unfortunately, the provider might need extra RBAC here for patching CRDs, which is not ideal.

Member Author


Yes, that's right. I manually added these roles to the provider ClusterRole and tested. If we don't take this step, I believe it will be updated at some point anyway; however, we can't validate the migration.

From an RBAC manager's perspective, granting this permission to all providers may not be a sensible approach, because many providers will not perform this operation. An option is to proceed without this part, i.e., without patching the CRDs; that sounds better than granting this right to all providers.

@sergenyalcin
Member Author

After the latest changes, the implementation consists of two main components:

1. Library: CRDsMigrator (pkg/config/crds_migrator.go)

Automatic Migration During Provider Startup:

  • The provider automatically migrates custom resources (CRs) to the new storage version during startup
  • Resources are patched with an empty patch to trigger conversion by the API server
  • Includes configurable retry logic with exponential backoff for resilience
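
A hedged sketch of how this wiring could look, assuming a constructor named NewCRDsMigrator (only WithRetryBackoff and WithStorageVersionMigrator are named in this PR) and illustrative backoff values:

package main

import (
	"time"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/util/wait"

	ujconfig "github.com/crossplane/upjet/v2/pkg/config"
)

// migratorOption builds the provider option that plugs the migrator in.
// NewCRDsMigrator is an assumed name; the two With* options are from this PR.
func migratorOption(gvks []schema.GroupVersionKind) ujconfig.ProviderOption {
	m := ujconfig.NewCRDsMigrator(gvks,
		ujconfig.WithRetryBackoff(wait.Backoff{
			Duration: time.Second, // initial delay between attempts
			Factor:   2.0,         // double the delay after each failure
			Jitter:   0.1,         // up to 10% randomization per step
			Steps:    5,           // give up after five attempts
		}))
	return ujconfig.WithStorageVersionMigrator(m)
}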

Default Behavior & RBAC Limitation:

  • CR Migration: Successfully patches all CRs to trigger storage version conversion
  • CRD Status Update: By default, providers lack permission to update CRD status in the Crossplane ecosystem

The CRD status update is the final step; it marks the old storage version as removed. Without it, all resources are still successfully migrated to the new storage version, but the CRD status continues to list the old versions in status.storedVersions.
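
To check whether this final step is still pending for a given CRD, the stored versions can be inspected directly (standard kubectl; the CRD name is only an example):

# Shows which API versions still have objects recorded in etcd for this CRD.
# After a complete migration, only the new storage version should be listed.
kubectl get crd buckets.s3.aws.upbound.io -o jsonpath='{.status.storedVersions}'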

2. CLI Tool: crdsmigrator (cmd/crdsmigrator/)

A standalone tool for cluster administrators to complete the CRD status update when providers don't have sufficient permissions.

# Using comma-separated list
crds-migrator update \
  --crd-names "buckets.s3.aws.upbound.io:v1beta2,users.iam.aws.upbound.io:v1beta1"

# Using YAML file
crds-migrator update --crd-file crds-mapping.yaml

YAML format:

buckets.s3.aws.upbound.io: v1beta2
users.iam.aws.upbound.io: v1beta1

Provider Release Integration

For Provider Maintainers:

When releasing a provider version that includes storage version changes, include this information in release notes:

  1. List of CRDs requiring migration with their target storage versions
  2. Two migration options for users:

Option 1: Grant CRD Status Permissions to Provider (Recommended for automated environments)

  • Add a ClusterRole granting CRD status update permissions to your provider deployment (a hypothetical example follows below).
  • Then bind it to your provider's ServiceAccount. The provider will automatically complete the full migration on startup.
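
The exact ClusterRole is not reproduced in this excerpt; a hypothetical manifest granting what the status update needs might look like this (role name illustrative; the rules follow standard Kubernetes RBAC):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: provider-crd-status-migrator # illustrative name
rules:
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["get", "list"]
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions/status"]
    verbs: ["patch", "update"]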

Option 2: Manual Migration Using CLI Tool (Recommended for controlled environments)

Download the crdsmigrator tool and run it with cluster admin credentials:

# Example for provider-aws v2.5.0
crds-migrator update --crd-file migration-v2.5.0.yaml

Provide the migration-v2.5.0.yaml file in the release assets with the CRD mappings.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@cmd/crdsmigrator/main.go`:
- Around line 89-95: Validate CLI flag values before constructing retryBackoff:
ensure updateRetries (used for wait.Backoff.Steps) is >0, updateDuration is >0,
updateFactor is >0, and updateJitter is within [0,1]; if any check fails return
a descriptive error (or exit) instead of building wait.Backoff. Perform these
checks right before creating retryBackoff and reference the symbols
retryBackoff, wait.Backoff, updateRetries, updateDuration, updateFactor, and
updateJitter so the code path short-circuits on invalid values and avoids
passing a bad Backoff to wait.ExponentialBackoff.
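
A sketch of that validation, assuming pkg/errors and the flag variables named above; the exact flag wiring in main.go is not reproduced:

package main

import (
	"time"

	"github.com/pkg/errors"
	"k8s.io/apimachinery/pkg/util/wait"
)

// buildRetryBackoff validates the CLI flag values before constructing the
// wait.Backoff, so an invalid Backoff never reaches wait.ExponentialBackoff.
func buildRetryBackoff(updateRetries int, updateDuration time.Duration, updateFactor, updateJitter float64) (wait.Backoff, error) {
	if updateRetries <= 0 {
		return wait.Backoff{}, errors.New("retries must be greater than zero")
	}
	if updateDuration <= 0 {
		return wait.Backoff{}, errors.New("duration must be greater than zero")
	}
	if updateFactor <= 0 {
		return wait.Backoff{}, errors.New("factor must be greater than zero")
	}
	if updateJitter < 0 || updateJitter > 1 {
		return wait.Backoff{}, errors.New("jitter must be within [0, 1]")
	}
	return wait.Backoff{
		Steps:    updateRetries,
		Duration: updateDuration,
		Factor:   updateFactor,
		Jitter:   updateJitter,
	}, nil
}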
♻️ Duplicate comments (1)
pkg/config/crds_migrator.go (1)

87-101: Consider skipping when the REST mapper can’t resolve a GVK.

If RESTMapping returns a no‑match error (e.g., CRD not installed), the current behavior aborts the entire migration run. Skipping that GVK is usually safer.

See NoMatchError handling in k8s.io/apimachinery/pkg/api/meta, used together with RESTMapper.RESTMapping.
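
A sketch of the skip, using meta.IsNoMatchError; the helper name and return shape are illustrative:

package main

import (
	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// resolveOrSkip returns (mapping, false, nil) on success, (nil, true, nil)
// when the GVK has no REST mapping (e.g. the CRD is not installed) so the
// caller can skip it, and a non-nil error for anything else.
func resolveOrSkip(mapper meta.RESTMapper, gvk schema.GroupVersionKind) (*meta.RESTMapping, bool, error) {
	mapping, err := mapper.RESTMapping(gvk.GroupKind(), gvk.Version)
	if err != nil {
		if meta.IsNoMatchError(err) {
			return nil, true, nil // skip this GVK instead of aborting the run
		}
		return nil, false, err
	}
	return mapping, false, nil
}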
🧹 Nitpick comments (1)
cmd/crdsmigrator/example.yaml (1)

1-5: Prefer YAML mapping for the example file.

JSON is valid YAML, but the .yaml extension suggests a YAML mapping; switching improves readability for users copying this example.

♻️ Proposed update
-{
-  "locations.conditionalaccess.azuread.upbound.io":"v1beta2",
-  "invitations.invitations.azuread.upbound.io":"v1beta2",
-  "principals.serviceprincipals.azuread.upbound.io":"v1beta2"
-}
+locations.conditionalaccess.azuread.upbound.io: v1beta2
+invitations.invitations.azuread.upbound.io: v1beta2
+principals.serviceprincipals.azuread.upbound.io: v1beta2


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@pkg/config/crds_migrator.go`:
- Around line 147-155: The patch loop should treat NotFound as a benign case so
deleted resources don't fail migration: inside the retry.OnError callback where
kube.Patch(...) is called (affecting res, types.MergePatchType), capture the
patch error and if apierrors.IsNotFound(err) return nil to stop retrying and
continue; otherwise return the error to preserve retry behavior. Also add the
k8s.io/apimachinery/pkg/api/errors import (commonly aliased apierrors). Keep the
existing wrapping logic (errors.Wrapf) for non-NotFound failures. (Sketches of both fixes follow after this list.)
- Around line 228-243: The SSAR creation currently returns an error on any
failure which aborts the run; change the error handling around kube.Create(ctx,
ssar) in crds_migrator.go so that when the creation fails with
permission-related errors (use apierrors.IsForbidden, apierrors.IsUnauthorized,
or apierrors.IsNotFound) you treat it as "permission denied" and return (false,
nil) instead of an error; for all other errors continue to wrap and return the
error as before. Ensure you reference the existing ssar variable and the
kube.Create(ctx, ssar) call when making the change.
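
Hedged sketches of the two fixes above; the function wrappers and names are illustrative. First, treating NotFound as benign inside the retry loop, using client-go's retry helper:

package main

import (
	"context"

	"github.com/pkg/errors"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// patchIgnoringNotFound applies the no-op conversion patch with retries. A
// NotFound during migration means the resource was deleted and no longer
// needs conversion, so it stops retrying and reports success.
func patchIgnoringNotFound(ctx context.Context, kube client.Client, backoff wait.Backoff, res *unstructured.Unstructured) error {
	err := retry.OnError(backoff, func(error) bool { return true }, func() error {
		if err := kube.Patch(ctx, res, client.RawPatch(types.MergePatchType, []byte(`{}`))); err != nil {
			if apierrors.IsNotFound(err) {
				return nil // deleted mid-migration; stop retrying
			}
			return err
		}
		return nil
	})
	return errors.Wrapf(err, "cannot patch %q", res.GetName())
}

Second, a permission probe that degrades to "not allowed" instead of failing the run; it uses the standard SelfSubjectAccessReview API with the attributes a CRD status patch requires:

package main

import (
	"context"

	"github.com/pkg/errors"
	authorizationv1 "k8s.io/api/authorization/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// checkCRDStatusPatchPermission reports whether the current identity may
// patch CRD status. Permission-related failures while creating the review
// itself are treated as "not allowed" rather than as fatal errors.
func checkCRDStatusPatchPermission(ctx context.Context, kube client.Client) (bool, error) {
	ssar := &authorizationv1.SelfSubjectAccessReview{
		Spec: authorizationv1.SelfSubjectAccessReviewSpec{
			ResourceAttributes: &authorizationv1.ResourceAttributes{
				Group:       "apiextensions.k8s.io",
				Resource:    "customresourcedefinitions",
				Subresource: "status",
				Verb:        "patch",
			},
		},
	}
	if err := kube.Create(ctx, ssar); err != nil {
		if apierrors.IsForbidden(err) || apierrors.IsUnauthorized(err) || apierrors.IsNotFound(err) {
			return false, nil // treat as permission denied, not a hard failure
		}
		return false, errors.Wrap(err, "cannot create SelfSubjectAccessReview")
	}
	return ssar.Status.Allowed, nil
}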


@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pkg/config/crds_migrator.go`:
- Around line 117-119: The Debug log call in the CRD migration loop uses
logr.Debug with a single value (crdName) instead of key/value pairs; update the
call (the one referencing needMigration and crdName) to pass a descriptive key
and the crdName as its value (e.g., "crd" or "crdName") so it conforms to
crossplane-runtime logging.Logger expectations and preserves the existing
control flow (keep the continue).
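
For example (illustrative message text; the key/value form is what crossplane-runtime's logging.Logger expects):

// Pass crdName as the value of an explicit key rather than as a bare value.
logr.Debug("CRD does not need storage version migration", "crd", crdName)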

