[SPEC | CORE] : Allow table level override for scan planning #14867

singhpk234 · 2025-12-17T08:38:02Z

About the change

Scan Planning Modes

Single enum ScanPlanningMode with 4 values:

client-only - MUST use client-side planning
client-preferred (default) - Prefer client-side, but flexible
catalog-preferred - Prefer server-side, but flexible (fallback to client if unavailable)
catalog-only - MUST use server-side planning

Negotiation Logic

When both client and server configure scan-planning-mode:

Both same → Use that mode
Incompatible → client-only + catalog-only = FAIL
ONLY beats PREFERRED → Hard requirement wins over flexible preference
Both PREFERRED → Client wins
Only one set → Use it
Neither set → Default client-preferred
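A minimal Java sketch of the rules listed above, for illustration only. The names `ScanPlanningResolver`, `resolve`, and `isStrict` are hypothetical and are not taken from the PR's code; an empty `Optional` stands for "not configured".

```java
import java.util.Optional;

/** Hypothetical enum mirroring the four modes in the PR description. */
enum ScanPlanningMode {
  CLIENT_ONLY,
  CLIENT_PREFERRED,
  CATALOG_PREFERRED,
  CATALOG_ONLY;

  /** True for the hard requirements (the two ONLY modes). */
  boolean isStrict() {
    return this == CLIENT_ONLY || this == CATALOG_ONLY;
  }
}

/** Hypothetical helper; a sketch of the negotiation rules, not the PR's implementation. */
final class ScanPlanningResolver {
  private ScanPlanningResolver() {}

  static ScanPlanningMode resolve(
      Optional<ScanPlanningMode> client, Optional<ScanPlanningMode> server) {
    // Neither set -> default client-preferred
    if (!client.isPresent() && !server.isPresent()) {
      return ScanPlanningMode.CLIENT_PREFERRED;
    }
    // Only one set -> use it
    if (!client.isPresent()) {
      return server.get();
    }
    if (!server.isPresent()) {
      return client.get();
    }

    ScanPlanningMode c = client.get();
    ScanPlanningMode s = server.get();
    // Both same -> use that mode
    if (c == s) {
      return c;
    }
    // Incompatible -> client-only + catalog-only = FAIL
    if (c.isStrict() && s.isStrict()) {
      throw new IllegalStateException("Incompatible scan planning modes: " + c + " vs " + s);
    }
    // ONLY beats PREFERRED
    if (c.isStrict()) {
      return c;
    }
    if (s.isStrict()) {
      return s;
    }
    // Both PREFERRED (different types) -> client wins
    return c;
  }
}
```

For example, `resolve(Optional.of(CLIENT_PREFERRED), Optional.of(CATALOG_ONLY))` yields `CATALOG_ONLY`, while `CLIENT_ONLY` paired with `CATALOG_ONLY` throws.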

ML : https://lists.apache.org/thread/z1g4y8b4ogdrn0jjtjlgg7yjgxdbzpvg

nastra · 2025-12-17T10:16:19Z

while the changes make sense to me, we may actually want to discuss this in the broader community to decide whether we want to override server-side scan planning at the table level

singhpk234 · 2025-12-20T00:01:20Z

ML thread : https://lists.apache.org/thread/z1g4y8b4ogdrn0jjtjlgg7yjgxdbzpvg

geruh · 2025-12-20T02:44:05Z

Thanks for raising this @singhpk234! I like the direction here. As a user today, there are two modes: either use scan planning or not. That raises the question of when I should use one versus the other, and right now there is no clear insight or story from the user's perspective.

Now from a catalog's perspective, the modes make sense. For instance, if the catalog is using planning to enforce governance, the required mode would signify intent. But I do have a question on the semantics. In the current implementation, IIUC, optional seems to be similar to required, which renders the combination of mode=None and catalog support for planning as the optional case. So I'd say: if the intent is truly to express optionality to the client, should there be a client-side override that wins? Or is the expectation that the catalog avoids sending the mode?

singhpk234 · 2025-12-22T17:16:00Z

Thanks for the feedback @geruh !

optional seems to be similar to required.

Optional in this context means the catalog really doesn't have an opinion on what the client decides; the client can choose local or remote planning. The way I was thinking about it: let's say you are running a lot of concurrent queries in your Spark cluster and your driver is slim. Even though we are using Spark, we may prefer catalog-side planning. That said, yes, in this implementation, if the catalog supports the plan endpoint and doesn't have any opinion on this, the Java implementation always does scan planning. Being able to toggle this based on client-side config would be ideal. Maybe when the server sends optional and the client side has configured required, we should not allow overwriting the key to optional, and reuse the optional? WDYT

I see @RussellSpitzer has similar feedback in the ML thread too; let me take a deeper look at their feedback and respond there as well.

singhpk234 · 2025-12-22T20:08:53Z

Decision matrix : scan planning mode (required | optional | none) :

Client Config	Server Config	Supports REST?	Planning Outcome
Required	Required	Yes	Server Planning (Server overrides and fulfills requirement)
Required	Optional	Yes	Server Planning (Client mandates, Server is flexible)
Required	None	Yes	Server Planning (Client mandate takes priority for Server)
Optional	Required	Yes	Server Planning (Server Required overrides Client Optional)
Optional	Optional	Yes	Server Planning (Defaulted for REST efficiency)
Optional	None	Yes	Local Planning (Server cannot, Client is flexible)
None	Required	Yes	Server Planning (Server Required overrides Client None)
None	Optional	Yes	Local Planning (Client mandate takes priority)
None	None	Yes	Local Planning (Both agree on Client side)
Not configured	Required	Yes	Server Planning (Server Required overrides Client None)
Not configured	Optional	Yes	Local Planning
Not configured	None	Yes	Local Planning
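The twelve rows above collapse to a small rule, sketched below in Java under the assumption (true of every row) that the catalog supports the REST planning endpoint. The names `PlanningSetting`, `PlanningSide`, and `ThreeValueMatrix` are hypothetical, and a `null` client setting stands for "Not configured".

```java
/** Hypothetical three-valued setting from the first decision matrix. */
enum PlanningSetting {
  REQUIRED,
  OPTIONAL,
  NONE
}

/** Where planning ends up happening. */
enum PlanningSide {
  SERVER,
  LOCAL
}

/** Sketch of the first decision matrix; assumes the REST planning endpoint is supported. */
final class ThreeValueMatrix {
  private ThreeValueMatrix() {}

  /** A null client setting means the client did not configure anything. */
  static PlanningSide decide(PlanningSetting client, PlanningSetting server) {
    // Any REQUIRED, from either side, results in server planning.
    if (client == PlanningSetting.REQUIRED || server == PlanningSetting.REQUIRED) {
      return PlanningSide.SERVER;
    }
    // Both explicitly OPTIONAL: the matrix defaults to server planning.
    if (client == PlanningSetting.OPTIONAL && server == PlanningSetting.OPTIONAL) {
      return PlanningSide.SERVER;
    }
    // Every remaining row in the matrix is local planning.
    return PlanningSide.LOCAL;
  }
}
```

Written this way, the only row where neither side says REQUIRED and the server still plans is Optional/Optional.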

Decision matrix : scan planning mode(client only | client preferred | catalog preferred | catalog only)

Client Config	Server Config	Supports REST?	Planning Outcome
Client Only	Client Only	Yes	Local Planning: Both agree on client-side implementation.
Client Only	Client Preferred	Yes	Local Planning: Both Server and Client agree.
Client Only	Catalog Preferred	Yes	Local Planning: Client ignores server optimization capabilities and chooses client only impl.
Client Only	Catalog Only	Yes	FAIL
Client Preferred	Client Only	Yes	Local Planning: Client preference aligns with server's strict client-only requirement.
Client Preferred	Client Preferred	Yes	Local Planning: Both parties prefer local execution; negotiation defaults to Client.
Client Preferred	Catalog Preferred	Yes	Local Planning: Both are flexible; client preference wins.
Client Preferred	Catalog Only	Yes	Server Planning: Server MUST plan; client yields its preference.
Catalog Preferred	Client Only	Yes	Local Planning: Server requires client-side planning; client yields its preference.
Catalog Preferred	Client Preferred	Yes	Server Planning: Both are flexible; client preference (catalog) wins.
Catalog Preferred	Catalog Preferred	Yes	Server Planning: Both parties prefer server-side; Catalog is chosen.
Catalog Preferred	Catalog Only	Yes	Server Planning: Server MUST plan; client preference aligns.
Catalog Only	Client Only	Yes	FAIL
Catalog Only	Client Preferred	Yes	Server Planning: Server is flexible; Catalog MUST plan.
Catalog Only	Catalog Preferred	Yes	Server Planning: Client MUST use Catalog; server agrees with preference.
Catalog Only	Catalog Only	Yes	Server Planning: Strict agreement on server-side implementation.
Not Configured	Client Only	Yes	Local Planning: Client MUST adhere to Catalog.
Not Configured	Client Preferred	Yes	Local Planning:
Not Configured	Catalog Preferred	Yes	Server Planning:
Not Configured	Catalog Only	Yes	Server Planning: Strict agreement on server-side implementation.

RussellSpitzer · 2025-12-22T21:43:26Z

I wasn't thinking about it quite that way. I was assuming the client is configured independently of the catalog. The client can either have a user preference or none. If none, it does whatever the catalog feeds back to it if the client supports that mode. Otherwise it does what is manually specified.

So

Client (None) -> Follow Catalog Config (Client Only, Client Preferred, Catalog Preferred or Catalog Only), Fail if the client doesn't support config
Client (ClientPlanning) -> Fail if Catalog says Catalog Only, Client planning for all other cases
Client (CatalogPlanning) -> Fail if Catalog says Client Only, Catalog planning for all other cases

A user without a preference or who wants the Catalog to make the determination just leaves this unset on the client. Or a Client which wants to override the catalog can set a specific mode and fail fast if the Catalog doesn't support it.
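The three rules above could be sketched in Java roughly as follows. All names here (`CatalogMode`, `ClientPreference`, `Planner`, `ClientOverrideSketch`) are hypothetical, and the "fail if the client doesn't support config" check for the unset case is left out to keep the sketch short.

```java
/** Hypothetical catalog-side modes, as named in the comment above. */
enum CatalogMode {
  CLIENT_ONLY,
  CLIENT_PREFERRED,
  CATALOG_PREFERRED,
  CATALOG_ONLY
}

/** Hypothetical client-side setting: unset, or an explicit user choice. */
enum ClientPreference {
  NONE,
  CLIENT_PLANNING,
  CATALOG_PLANNING
}

/** Which side ends up planning the scan. */
enum Planner {
  CLIENT,
  CATALOG
}

/** Sketch of the "client is configured independently of the catalog" alternative. */
final class ClientOverrideSketch {
  private ClientOverrideSketch() {}

  static Planner decide(ClientPreference preference, CatalogMode catalog) {
    switch (preference) {
      case CLIENT_PLANNING:
        // Fail fast only when the catalog insists on planning itself.
        if (catalog == CatalogMode.CATALOG_ONLY) {
          throw new IllegalStateException("Client planning requested, catalog is catalog-only");
        }
        return Planner.CLIENT;
      case CATALOG_PLANNING:
        // Fail fast only when the catalog insists on client planning.
        if (catalog == CatalogMode.CLIENT_ONLY) {
          throw new IllegalStateException("Catalog planning requested, catalog is client-only");
        }
        return Planner.CATALOG;
      default:
        // NONE: follow whatever the catalog feeds back.
        boolean clientSide =
            catalog == CatalogMode.CLIENT_ONLY || catalog == CatalogMode.CLIENT_PREFERRED;
        return clientSide ? Planner.CLIENT : Planner.CATALOG;
    }
  }
}
```

Compared with the four-by-four matrix, an explicit client choice here only fails against the opposite ONLY mode and otherwise always wins.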

nastra · 2026-01-07T10:43:55Z

core/src/main/java/org/apache/iceberg/rest/RESTCatalogProperties.java

+   * <p>Values:
+   *
+   * <ul>
+   *   <li>CLIENT_ONLY - MUST use client-side planning. Fails if paired with CATALOG_ONLY from other


I think using ENFORCED might be a better fit instead of ONLY, wdyt? That explains the intent more naturally

You mean CLIENT_ENFORCED | CATALOG_ENFORCED? I believe it does sound more authoritative; since we are including this in the spec, this might be the language we prefer. Let me think a bit more on this.

I would actually prefer that we just remove it and have client, client-preferred, catalog-preferred, and catalog.

Using words like ENFORCED or REQUIRED don't quite feel right and ultimately, if we're going with this enumeration, it is explicit.

I'm still a bit skeptical that we even need the notion of preferences, e.g. client-preferred, catalog-preferred. It's plausible that servers could have more insights to give a more intelligent preference, but it feels overcomplicated compared to just having a "clients-choice" (not a real mode, just something that's inferred when the endpoint is supported but not required) instead of 2 preferences. It simplifies the decision matrix logic below, and clients can then use their own heuristics.

I think the decision as to whether preferences make sense comes down to this:
is it better to have clients just make intelligent choices when server side planning is available but not required, or is it better for servers to indicate preferences. My thought process is if a server really feels like it's advantageous to do remote planning, may as well just send it back as required.

I should note: I'm not super opinionated on this, but I do think it'd be great if we could outline some concrete cases where we think a preference is advantageous (in both directions) just to make it clear if the complexity is worth it.

is it better to have clients just make intelligent choices when server side planning is available but not required, or is it better for servers to indicate preferences. My thought process is if a server really feels like it's advantageous to do remote planning, may as well just send it back as required

This is mostly from the POV that it depends on the load each side is under at the moment the call is made. For example, let's take the following cases:

I am using PyIceberg and I know I am low on resources, so it's better that I just do remote planning if possible. If the table is big and the catalog can plan it, PyIceberg can say it prefers catalog planning, and the server, based on catalog_only / catalog_preferred, can have that negotiation.

Let's say I am Spark and I have big compute infra, but depending on the current workload:

Let's say it is an environment with a lot of concurrent queries: I will not have a lot of memory available to plan this, so I would start by saying I prefer catalog.

Let's say I have a dedicated cluster: rather than doing a remote plan I would do it in my JVM, so I would say client_only from the client side.

Server Side

If the server is under load and the client is open to planning on the client end, then it's better that the server just says: hey, I am burdened / low on resources, are you open to planning on the client end? Hence the soft signal client_preferred. The server has no clue what the client is; it sends this decision based purely on its own state. Sending client_only would have caused trouble for something like PyIceberg in case it is configured to catalog_only.

Please let me know what you think of these cases.

I am using PyIceberg and I know I am low on resources, so it's better that I just do remote planning if possible. If the table is big and the catalog can plan it, PyIceberg can say it prefers catalog planning, and the server, based on catalog_only / catalog_preferred, can have that negotiation.

Yeah I guess I'm mainly coming from the perspective that if a user is running PyIceberg in a low resource environment, then a user would either knowingly explicitly configure the client property to use remote planning, or PyIceberg would internally choose what planning it wants when it's optional (could be something simple like just do client planning, could be heuristics based, it's all up to client implementations).

It's nice that the server could use this as a dynamic mechanism to control planning based on the load but I think there are already mechanisms for that. A server could just throttle a client initiated planning, and then a client could fall back to using client side planning for instance. This doesn't require additional protocol complexity to support today (I believe).
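A rough Java sketch of the throttle-then-fall-back idea described above. Everything here is hypothetical: `ThrottledException`, `FallbackPlanning`, and the two suppliers are illustrative stand-ins, not Iceberg APIs, and how a real server signals throttling is not modeled.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical signal that the server refused or throttled a planning request. */
final class ThrottledException extends RuntimeException {
  ThrottledException(String message) {
    super(message);
  }
}

/** Sketch: try server-side planning first, fall back to client-side planning when throttled. */
final class FallbackPlanning {
  private FallbackPlanning() {}

  static <T> List<T> plan(Supplier<List<T>> remotePlanner, Supplier<List<T>> localPlanner) {
    try {
      // Client-initiated remote planning; the server may throttle it under load.
      return remotePlanner.get();
    } catch (ThrottledException e) {
      // No extra protocol field involved: the client simply plans locally instead.
      return localPlanner.get();
    }
  }
}
```

This only works when the client is allowed to plan locally, so it would not apply to a table whose catalog mandates server-side planning.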

Let's say I am Spark and I have big compute infra, but depending on the current workload:
Let's say it is an environment with a lot of concurrent queries: I will not have a lot of memory available to plan this, so I would start by saying I prefer catalog.
Let's say I have a dedicated cluster: rather than doing a remote plan I would do it in my JVM, so I would say client_only from the client side.

Yeah, same principle as the PyIceberg case imo, I feel like in these circumstances a user would either explicitly configure stuff, and if we need a little bit more dynamism based on server/client load, we'd build that logic directly in the client without specing out preferences.

As far as I can tell, the main benefit of codifying preferences in the spec is that it standardizes client behavior when the endpoint is optional but not required (i.e. we know exactly what PyIceberg, Java, Rust etc would do in this situation given some combination of options in that matrix). With my approach, there'd be deviation in client behavior across different implementations, but I personally think that's kind of an advantage in this case.

I personally don't feel like that's super useful but as I said, I'm willing to move forward here since I guess these additional options aren't that complicated for clients to implement and there's some level of benefit I can see to standardizing behavior across clients.

nastra · 2026-01-07T17:31:29Z

core/src/main/java/org/apache/iceberg/rest/RESTCatalogProperties.java

+  // Negotiation rules: ONLY beats PREFERRED, both PREFERRED = client wins
+  // Default when neither client nor server provides: client-preferred
+  public static final String SCAN_PLANNING_MODE = "scan-planning-mode";
+  public static final String SCAN_PLANNING_MODE_DEFAULT =


I think it would make sense to split out introducing the different planning modes from the option of overriding this at the table level

Is this more from a backward-compatibility POV? Asking because we haven't shipped any Iceberg Java version with this config yet.

amogh-jahagirdar · 2026-01-08T16:50:42Z

core/src/main/java/org/apache/iceberg/rest/ScanPlanningNegotiator.java

+ *   <li><b>Neither configured</b>: Use default (CLIENT_PREFERRED)
+ * </ul>
+ */
+public class ScanPlanningNegotiator {


I'm not really sure we need a whole separate "negotiator" class; it feels a bit over the top, and it's not really a negotiation imo. We're using a defined set of rules to determine the planning mode. Have we considered just having a static ScanPlanningMode#determinePlanningDecision?

amogh-jahagirdar · 2026-01-08T16:55:00Z

core/src/main/java/org/apache/iceberg/rest/RESTCatalogProperties.java

+   * <p>Values:
+   *
+   * <ul>
+   *   <li>CLIENT_ONLY - MUST use client-side planning. Fails if paired with CATALOG_ONLY from other


I'm still a bit skeptical that we even need the notion of preferences, e.g. client-preferred, catalog-preferred. It's plausible that servers could have more insights to give a more intelligent preference, but it feels overcomplicated compared to just having a "clients-choice" (not a real mode, just something that's inferred when the endpoint is supported but not required) instead of 2 preferences. It simplifies the decision matrix logic below, and clients can then use their own heuristics.

I think the decision as to whether preferences make sense comes down to this:
is it better to have clients just make intelligent choices when server side planning is available but not required, or is it better for servers to indicate preferences. My thought process is if a server really feels like it's advantageous to do remote planning, may as well just send it back as required.

I should note: I'm not super opinionated on this, but I do think it'd be great if we could outline some concrete cases where we think a preference is advantageous (in both directions) just to make it clear if the complexity is worth it.

pvary · 2026-01-10T06:32:32Z

open-api/rest-catalog-open-api.py

+    - **Both PREFERRED**: When both are PREFERRED (different types), client config wins
+    - **Both same**: When both have the same value, use that planning type
+    - **Only one configured**: Use the configured side (client or server)
+    - **Neither configured**: Use default (`client-preferred`)


I have a concern about some catalogs starting to make every table CATALOG_ONLY, which would essentially lock users into the catalog without providing a way to migrate the data to another catalog.
Maybe we should add a sentence to the spec requiring that there be some users for whom the catalog MUST provide access to the metadata files.

WDYT?

the catalog without providing a way to migrate the data to another catalog

We would still have a way to migrate, mostly because loadTable gives back the metadata.json pointer (which is self-describing for the table state), and the catalog ADMIN would be able to use that pointer to register the table with another REST- or Metastore-backed catalog. In the model where storage is decoupled from compute, it is the administrator of the catalog who has given the catalog access to vend storage creds, and the administrator can very well take it back.

This feature is mostly like: I want to read the table, can you help me with the data | delete files that correspond to the table? Nevertheless, I believe CATALOG_ONLY will be used primarily for governance cases, and also for things like scanning huge tables where planning can put a lot of pressure on the JVM (Trino coordinator instability | Spark requiring distributed planning), where the catalog can do some efficient indexing (stuff like Redis) etc. to help these engines.

All in all, IMHO, I believe exposing this option would not cause vendor lock-in or prevent migration; please let me know if I am missing something.

The specification states:

The corresponding file location of table metadata should be returned in the `metadata-location` field

However, it does not specify that this location must be readable by any user. (Perhaps this is something we should clarify going forward.)

Before the introduction of CATALOG_ONLY tables, users were required to have direct read access to the metadata files in order to plan queries on the table. That implied an access requirement, even though it was never explicitly documented. With the introduction of CATALOG_ONLY, this implicit requirement no longer applies, and we currently do not have an explicit requirement defined in the specification either.

danielcweeks · 2026-01-13T21:52:14Z

open-api/rest-catalog-open-api.yaml

        - `token`: Authorization bearer token to use for table requests if OAuth2 security is enabled
+        - `scan-planning-mode`: Controls scan planning behavior for table operations. This property can be configured by:
+          - **Server**: Returned in `LoadTableResponse.config()` to advertise server preference/requirement
+          - **Client**: Set in catalog properties to override server configuration


I think we need to remove anything in the REST spec referencing the client configuration. This spec should be explicitly reserved for what the server needs to implement.

How a client configures/defines its behavior should not be covered in the spec (it's an implementation detail).

singhpk234 requested review from amogh-jahagirdar and nastra December 17, 2025 08:38

github-actions bot added the core label Dec 17, 2025

sfc-gh-prsingh force-pushed the feature/selectively-override-failure branch from a04195b to 474e982 Compare December 17, 2025 09:15

github-actions bot added the OPENAPI label Dec 17, 2025

sfc-gh-prsingh force-pushed the feature/selectively-override-failure branch 2 times, most recently from 3144cd9 to f68fbe2 Compare December 17, 2025 17:35

singhpk234 changed the title ~~CORE: Allow table level override for scan planning~~ [SPEC | CORE] : Allow table level override for scan planning Dec 20, 2025

sfc-gh-prsingh force-pushed the feature/selectively-override-failure branch 2 times, most recently from 10123db to 7f2a05b Compare December 20, 2025 02:22

github-actions bot added the spark label Dec 22, 2025

sfc-gh-prsingh force-pushed the feature/selectively-override-failure branch 3 times, most recently from 9b1adae to 66eb57a Compare December 24, 2025 01:21

nastra reviewed Jan 7, 2026

core/src/main/java/org/apache/iceberg/rest/RESTCatalogProperties.java

nastra reviewed Jan 7, 2026

anoopj reviewed Jan 7, 2026

core/src/main/java/org/apache/iceberg/rest/RESTCatalogProperties.java

sfc-gh-prsingh force-pushed the feature/selectively-override-failure branch from 66eb57a to 860ba21 Compare January 7, 2026 22:38

sfc-gh-prsingh added 5 commits January 7, 2026 22:53

CORE: Allow table level override for scan planning

f8e9b6f

Add test and spec changes

f65a613

Add rest planning mode

9f2923e

adjust build

e5c378e

Address review feedback

ef9e24f

Adjust preference mode

1f9bdcb

sfc-gh-prsingh force-pushed the feature/selectively-override-failure branch from 860ba21 to 1f9bdcb Compare January 7, 2026 23:04

amogh-jahagirdar reviewed Jan 8, 2026

pvary reviewed Jan 10, 2026

singhpk234 added this to the Iceberg 1.11.0 milestone Jan 11, 2026

danielcweeks reviewed Jan 13, 2026
