
Conversation

@MeredithAnya (Member) commented Jan 8, 2026

Simple overview of what I was trying to achieve here:

  1. Pick the host that has the tables you want to copy; that host should be discoverable via the clusters, which is why I reused the dropdown.
  2. The Show Tables button under Source Host shows you a list of all the tables, so you can verify those are the ones you want to copy over.
  3. You must manually know the IP to enter for your new replica; it's confusing in the screenshots because in my local setup the target and source are the same.
  4. The Show Tables button under Target Host should be blank for a new replica; it was meant as a way to see which tables have already been created so far as you execute COPY TABLE on the tables you want.
  5. The preview statement was basically a sanity check, like a dry-run command, to make sure the cluster name was right for ON CLUSTER.
  6. When adding a new replica there is really no reason to run ON CLUSTER since it's just the one node, so that's why I added the toggle as an option.

Previewing what the CREATE statement will be

Screenshot 2026-01-07 at 5 51 15 PM

What an error looks like when running COPY TABLE

Screenshot 2026-01-07 at 5 59 33 PM

After running COPY TABLE successfully

Screenshot 2026-01-07 at 5 54 06 PM
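For illustration only (this is not code from the PR): a rough sketch of what the preview in step 5 amounts to, fetching a table's definition from the source host with the clickhouse-driver package and optionally rewriting it to run ON CLUSTER. The function name, hosts, and table names are placeholders; the real snuba-admin code goes through its own connection helpers rather than instantiating a client directly.

```python
# Hypothetical preview helper; not the snuba-admin implementation.
from typing import Optional

from clickhouse_driver import Client


def preview_create_statement(
    source_host: str, table: str, cluster: Optional[str] = None, port: int = 9000
) -> str:
    client = Client(host=source_host, port=port)
    # SHOW CREATE TABLE returns a single row with the full DDL as one column.
    ((statement,),) = client.execute(f"SHOW CREATE TABLE {table}")
    if cluster is not None:
        # Crude rewrite for preview purposes only; a real implementation would
        # derive the ON CLUSTER clause from cluster configuration.
        statement = statement.replace(
            f"CREATE TABLE {table}",
            f"CREATE TABLE {table} ON CLUSTER {cluster}",
            1,
        )
    return statement


# Example (placeholder values):
# print(preview_create_statement("127.0.0.1", "default.errors_local", cluster="errors"))
```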

MeredithAnya requested a review from a team as a code owner, January 8, 2026 02:09

Comment on lines +51 to +54 of snuba/admin/clickhouse/copy_tables.py:
source_host, 9000, storage_name, client_settings=settings
)
target_connection = get_clusterless_node_connection(
target_host, 9000, storage_name, client_settings=settings

Bug: The copy_table function ignores the port sent from the frontend and always connects to a hardcoded port 9000, which will cause connection failures for nodes on non-standard ports.
Severity: CRITICAL | Confidence: High

🔍 Detailed Analysis

The backend API for copying tables ignores the source_port and target_port parameters sent from the frontend. The copy_table function in snuba/admin/clickhouse/copy_tables.py is implemented to always use a hardcoded port 9000 for both the source and target ClickHouse connections. Since the UI allows selecting nodes that may run on different ports, this discrepancy will cause connection failures or, more critically, connections to the wrong ClickHouse instance, potentially leading to data being copied from or to an incorrect database.

💡 Suggested Fix

Modify the view handler in snuba/admin/views.py to extract source_port and target_port from the request JSON. Pass these ports to the copy_table function. Update the copy_table function signature to accept the port parameters and use them when calling get_clusterless_node_connection instead of the hardcoded 9000.
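A hedged sketch of what that fix could look like. copy_table's real signature is not shown in this PR excerpt, so the parameter list below is a guess; only the two get_clusterless_node_connection calls quoted above are taken from the actual code.

```python
# Sketch of snuba/admin/clickhouse/copy_tables.py with the ports threaded
# through instead of the hardcoded 9000. get_clusterless_node_connection is
# the helper already imported by the real module; it is not redefined here.
def copy_table(
    source_host: str,
    source_port: int,
    target_host: str,
    target_port: int,
    storage_name: str,
    settings: dict,
) -> None:
    source_connection = get_clusterless_node_connection(
        source_host, source_port, storage_name, client_settings=settings
    )
    target_connection = get_clusterless_node_connection(
        target_host, target_port, storage_name, client_settings=settings
    )
    ...


# And in the view handler (snuba/admin/views.py), read the ports from the
# request body instead of dropping them; defaulting to 9000 preserves the
# current behaviour when the frontend omits them:
# source_port = int(req.get("source_port", 9000))
# target_port = int(req.get("target_port", 9000))
```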

🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: snuba/admin/clickhouse/copy_tables.py#L51-L54

Potential issue: The backend API for copying tables ignores the `source_port` and
`target_port` parameters sent from the frontend. The `copy_table` function in
`snuba/admin/clickhouse/copy_tables.py` is implemented to always use a hardcoded port
`9000` for both the source and target ClickHouse connections. Since the UI allows
selecting nodes that may run on different ports, this discrepancy will cause connection
failures or, more critically, connections to the wrong ClickHouse instance, potentially
leading to data being copied from or to an incorrect database.


In snuba/admin/views.py (around line 469):

source_host = req["source_host"]
target_host = req["target_host"]

ipaddress.ip_address(target_host)

Bug: The API validates the target_host using ipaddress.ip_address(), which will raise an unhandled ValueError if a user provides a valid hostname instead of an IP address.
Severity: HIGH | Confidence: High

🔍 Detailed Analysis

The view handler for the table copy operation in snuba/admin/views.py validates the target_host using ipaddress.ip_address(). This function raises a ValueError if the input is a hostname instead of a valid IP address. However, the system's configuration and other parts of the codebase permit the use of hostnames for ClickHouse nodes. Since the UI provides a free-form text input for the target host, a user entering a valid hostname (e.g., 'clickhouse-node-1') will trigger an unhandled ValueError, causing the API request to fail.

💡 Suggested Fix

Remove the ipaddress.ip_address(target_host) validation. If validation is required, use a method that correctly resolves and validates both IP addresses and hostnames, or apply the same validation logic consistently to both source_host and target_host.
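A small sketch of the alternative validation mentioned above, accepting either a literal IP address or a resolvable hostname; whether DNS resolution belongs in the request path at all is a separate design question.

```python
import ipaddress
import socket


def validate_host(host: str) -> None:
    """Raise ValueError if host is neither an IP address nor a resolvable hostname."""
    try:
        ipaddress.ip_address(host)
        return
    except ValueError:
        pass
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        raise ValueError(
            f"{host!r} is neither a valid IP address nor a resolvable hostname"
        )
```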

🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: snuba/admin/views.py#L469

Potential issue: The view handler for the table copy operation in `snuba/admin/views.py`
validates the `target_host` using `ipaddress.ip_address()`. This function raises a
`ValueError` if the input is a hostname instead of a valid IP address. However, the
system's configuration and other parts of the codebase permit the use of hostnames for
ClickHouse nodes. Since the UI provides a free-form text input for the target host, a
user entering a valid hostname (e.g., 'clickhouse-node-1') will trigger an unhandled
`ValueError`, causing the API request to fail.


@phacops (Contributor) commented Jan 8, 2026

I don't really want to have a tool allowing me to selectively create tables on another replica so I think we should just show what tables would be created but not let us select them. It creates weird imbalances if we omit certain tables, even if not in use, and it's cleaner to properly remove all at once from a cluster.

Also, I'm not sure we actually can use this in production if the list of hosts is populated via what's on the query nodes as usual. When we create a new replica, before we can add it to the list of replicas on the query nodes, we need to create the tables first and have them start replicating.

Can we get a list of replicas attached to the clusters from storage nodes directly? Or maybe, can we make it a free-form field and just know to paste the right value for the Target host field?

I think we could get away with not opting out of ON CLUSTER and just letting us execute that on an existing storage node, which would know about the new replica even if the query node doesn't. So, we'd take 1-1 as our reference node (it's always the first node created, so it always has the tables we want on other replicas) and just have a button "Copy tables to all replicas" and let ON CLUSTER do the work.

And that way, it'll simplify the UI too:

  • no list of tables to select for creation
  • no ON CLUSTER button
  • no create table statement preview
  • no local/target host selection
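For illustration only (none of this exists in snuba): a rough sketch of that single-button flow, reading every table definition from the reference node with clickhouse-driver and re-issuing it with IF NOT EXISTS ... ON CLUSTER so ClickHouse fans it out to every replica, including ones the query nodes don't know about yet. Ordering constraints (dictionaries, materialized views) and the DDL rewrite itself would need more care in a real implementation.

```python
# Hypothetical "Copy tables to all replicas" action; host, cluster, and
# database names are placeholders.
from clickhouse_driver import Client


def copy_tables_to_all_replicas(
    reference_host: str, cluster: str, database: str = "default", port: int = 9000
) -> None:
    client = Client(host=reference_host, port=port, database=database)
    for (table,) in client.execute("SHOW TABLES"):
        ((create,),) = client.execute(f"SHOW CREATE TABLE {table}")
        # SHOW CREATE TABLE returns "CREATE TABLE <db>.<table> ..."; rewrite the
        # header so the same DDL is applied on every replica in the cluster.
        ddl = create.replace(
            f"CREATE TABLE {database}.{table}",
            f"CREATE TABLE IF NOT EXISTS {database}.{table} ON CLUSTER {cluster}",
            1,
        )
        client.execute(ddl)
```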

@MeredithAnya (Member, Author) commented Jan 8, 2026

> I don't really want to have a tool allowing me to selectively create tables on another replica so I think we should just show what tables would be created but not let us select them. It creates weird imbalances if we omit certain tables, even if not in use, and it's cleaner to properly remove all at once from a cluster.
>
> Also, I'm not sure we actually can use this in production if the list of hosts is populated via what's on the query nodes as usual. When we create a new replica, before we can add it to the list of replicas on the query nodes, we need to create the tables first and have them start replicating.
>
> Can we get a list of replicas attached to the clusters from storage nodes directly? Or maybe, can we make it a free-form field and just know to paste the right value for the Target host field?
>
> I think we could get away with not opting out of ON CLUSTER and just letting us execute that on an existing storage node, which would know about the new replica even if the query node doesn't. So, we'd take 1-1 as our reference node (it's always the first node created, so it always has the tables we want on other replicas) and just have a button "Copy tables to all replicas" and let ON CLUSTER do the work.
>
> And that way, it'll simplify the UI too:
>
> • no list of tables to select for creation
> • no ON CLUSTER button
> • no create table statement preview
> • no local/target host selection

Okay, so I was about to write up a more detailed description of the idea behind what I was doing, but I'll just quickly do that here. The idea was to:

  1. Pick the host that has the tables you want to copy; that host should be discoverable via the clusters, which is why I reused the dropdown.
  2. The Show Tables button under Source Host shows you a list of all the tables, so you can verify those are the ones you want to copy over.
  3. You must manually know the IP to enter for your new replica; it's confusing in the screenshots because in my local setup the target and source are the same.
  4. The Show Tables button under Target Host should be blank for a new replica; it was meant as a way to see which tables have already been created so far as you execute COPY TABLE on the tables you want.
  5. The preview statement was basically a sanity check, like a dry-run command, to make sure the cluster name was right for ON CLUSTER.
  6. When adding a new replica there is really no reason to run ON CLUSTER since it's just the one node, so that's why I added the toggle as an option.

So that was the idea behind the UI, as for some of your questions:

  • ref(admin): allow system queries on nodes sans clusters #7605 should have added the logic to handle inputting IP hosts directly without the need for the query nodes to know about them yet
  • I don't really want to have a tool allowing me to selectively create tables on another replica so I think we should just show what tables would be created but not let us select them.

    • I'm pretty sure that is how copy tables works, and while it's not perfect, this was meant to be a slight improvement before we basically revamp it. My understanding is that we wanted to do it table by table and wait for replication to complete for each table before moving on, which we could automate later, but that was not in my scope for this.
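Purely illustrative (not part of this PR or the copy tables script): the per-table wait described above could eventually be automated by polling system.replicas on the new replica until the table's replication queue drains. The thresholds, timeout, and polling interval below are made up.

```python
import time

from clickhouse_driver import Client


def wait_for_replication(
    host: str, database: str, table: str, port: int = 9000, timeout: float = 600.0
) -> None:
    client = Client(host=host, port=port)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        rows = client.execute(
            "SELECT queue_size, absolute_delay FROM system.replicas "
            "WHERE database = %(db)s AND table = %(table)s",
            {"db": database, "table": table},
        )
        # Consider the table caught up once the replication queue is empty and
        # the replica reports no delay.
        if rows and rows[0] == (0, 0):
            return
        time.sleep(5)
    raise TimeoutError(f"{database}.{table} did not catch up within {timeout}s")
```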

That all being said, if the UI is confusing and not helpful (and not better than our current solution of copy tables) then I'm happy to scrap this. I was trying to reuse a lot of what we had in admin already, and I kind of assumed this would be dead soon once we overhauled our process of bringing up a new replica/shard anyway.

@phacops (Contributor) commented Jan 12, 2026

> That all being said, if the UI is confusing and not helpful (and not better than our current solution of copy tables) then I'm happy to scrap this. I was trying to reuse a lot of what we had in admin already, and I kind of assumed this would be dead soon once we overhauled our process of bringing up a new replica/shard anyway.

I think it's useful to have this in snuba-admin, but it needs to be simplified. You outline 5 or 6 steps in your description to create tables; I'm advocating for only 1. The copy tables script does indeed work by having you write down all the tables you want to copy, and to me that's a problem, not a feature.

I think a tool like this in snuba-admin would help us, and we have the chance to simplify it as well. It could give us the status of tables on all replicas in a cluster (which we can already fetch with the InactiveReplicas query in snuba-admin) and a way to create tables on all replicas, and that's about it. We let ClickHouse manage the rest.
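For illustration, the kind of per-replica status such a tool could surface, using ClickHouse's clusterAllReplicas() table function to read system.replicas from every node in a cluster. This is a sketch, not the existing InactiveReplicas query; the host and cluster names are placeholders.

```python
from clickhouse_driver import Client


def replica_table_status(query_host: str, cluster: str, port: int = 9000) -> list:
    client = Client(host=query_host, port=port)
    # One row per (replica, table): readonly flag, pending queue size, and delay.
    return client.execute(
        f"""
        SELECT hostName() AS host, database, table, is_readonly, queue_size, absolute_delay
        FROM clusterAllReplicas('{cluster}', system.replicas)
        ORDER BY host, database, table
        """
    )
```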
