
ontap-nas-economy: cannot discover existing flexvols after MetroCluster failover #1082

@clementnuss

Description


Describe the bug
After MetroCluster SVM failover, Trident ontap-nas-economy driver cannot discover existing flexvols. after some investigation with debuggTraceFlags.api=true, we noticed it is searching for flexVols with a snapshot policy of none, while the existing flexVols (on the main site) have none-DR snapshot policy.
The driver creates new flexvols instead of reusing existing ones with available capacity, causing SVM volume quota exhaustion.

Environment
Provide accurate information about the environment to help us reproduce the issue.

  • Trident version: 25.06.2
  • Kubernetes version: 1.33
  • Kubernetes orchestrator: vanilla
  • NetApp backend types: MetroCluster ONTAP
  • Other: SVM volume quota limit 100, qtreesPerFlexvol: 200, limitVolumeSize: 5000Gi

To Reproduce
Steps to reproduce the behavior:

  1. Configure ontap-nas-economy backend with MetroCluster SVM
  2. Create flexvols and provision qtrees
  3. Trigger MetroCluster SVM failover (SVM name changes from name to name-mc, flexvols' snapshot-policy changes from none to none-DR)
  4. Provision new PVCs after failover
  5. Observe: Trident creates new flexvols instead of reusing existing flexvols with none-DR policy
  6. Verify in debug logs: volume-get-iter ZAPI query includes <snapshot-policy>none</snapshot-policy>, filtering out none-DR volumes
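
For reference, step 6 corresponds to a query along these lines (a simplified, illustrative sketch of the volume-get-iter ZAPI request body, not the exact payload Trident emits; element names follow the ZAPI volume-attributes schema):

```xml
<volume-get-iter>
  <query>
    <volume-attributes>
      <volume-id-attributes>
        <!-- flexvol name prefix used by the economy driver -->
        <name>trident_qtree_pool_*</name>
      </volume-id-attributes>
      <volume-snapshot-attributes>
        <!-- this filter excludes the post-failover "none-DR" flexvols -->
        <snapshot-policy>none</snapshot-policy>
      </volume-snapshot-attributes>
    </volume-attributes>
  </query>
</volume-get-iter>
```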

Expected behavior
Trident should discover and reuse existing flexvols regardless of snapshot policy differences (e.g., none vs none-DR).

Additional context
I've implemented a hack/workaround that removes the snapshot policy from the query (we are not interested in that value in our setup), so that the query returns both none-DR and none flexvols:

```patch
diff --git a/storage_drivers/ontap/api/ontap_zapi.go b/storage_drivers/ontap/api/ontap_zapi.go
index ba1021f0..77780b7a 100644
--- a/storage_drivers/ontap/api/ontap_zapi.go
+++ b/storage_drivers/ontap/api/ontap_zapi.go
@@ -1637,7 +1637,12 @@ func (c Client) VolumeListByAttrs(
 	if snapReserve >= 0 {
 		queryVolSpaceAttrs.SetPercentageSnapshotReserve(snapReserve)
 	}
-	queryVolSnapshotAttrs := azgo.NewVolumeSnapshotAttributesType().SetSnapshotPolicy(snapshotPolicy)
+	queryVolSnapshotAttrs := azgo.NewVolumeSnapshotAttributesType()
+	// Only filter by snapshot policy if specified (non-empty)
+	// This allows finding flexvols with different snapshot policies (e.g., "none-DR" vs "none")
+	if snapshotPolicy != "" {
+		queryVolSnapshotAttrs.SetSnapshotPolicy(snapshotPolicy)
+	}
 	if snapshotDir != nil {
 		queryVolSnapshotAttrs.SetSnapdirAccessEnabled(*snapshotDir)
 	}
diff --git a/storage_drivers/ontap/ontap_nas_qtree.go b/storage_drivers/ontap/ontap_nas_qtree.go
index a04852d5..f8314a63 100644
--- a/storage_drivers/ontap/ontap_nas_qtree.go
+++ b/storage_drivers/ontap/ontap_nas_qtree.go
@@ -1443,12 +1443,14 @@ func (d *NASQtreeStorageDriver) findFlexvolForQtree(
 	}
 
 	// Get all volumes matching the specified attributes
+	// Note: Do not filter by SnapshotPolicy to allow discovery of flexvols with different
+	// snapshot policies (e.g., "none-DR" vs "none" in MetroCluster environments)
 	volAttrs := &api.Volume{
 		Aggregates:      []string{aggregate},
 		Encrypt:         enableEncryption,
 		Name:            d.FlexvolNamePrefix() + "*",
 		SnapshotDir:     convert.ToPtr(enableSnapshotDir),
-		SnapshotPolicy:  snapshotPolicy,
+		SnapshotPolicy:  "", // Empty = don't filter by snapshot policy
 		SpaceReserve:    spaceReserve,
 		SnapshotReserve: snapshotReserveInt,
 		TieringPolicy:   tieringPolicy,
```