
Conversation

@MeredithAnya (Member) commented Jan 8, 2026

Simple overview of what I was trying to achieve here:

  1. Pick the host that has the tables you want to copy; that host should be discoverable via the clusters, which is why I reused the dropdown.
  2. The Show Tables button under Source Host shows you a list of all the tables, so you can verify those are the ones you want to copy over.
  3. You must manually know the IP to enter for your new replica; it's confusing in the screenshots because in my local setup the target and source are the same.
  4. The Show Tables button under Target Host should be blank for a new replica; it was meant as a way to see which tables have already been created so far as you execute COPY TABLE on the tables you want.
  5. The preview statement was basically a sanity check, like a dry-run command, to make sure the cluster name was right for ON CLUSTER.
  6. When adding a new replica there is really no reason to run ON CLUSTER since it's just the one node, so that's why I added the toggle as an option.

Previewing what the CREATE statement will be

Screenshot 2026-01-07 at 5 51 15 PM

What an error looks like when running COPY TABLE

Screenshot 2026-01-07 at 5 59 33 PM

After running COPY TABLE successfully

Screenshot 2026-01-07 at 5 54 06 PM
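For illustration only (this is not code from the PR): a rough sketch of what the preview in step 5 amounts to, fetching a table's definition from the source host with the clickhouse-driver package and optionally rewriting it to run ON CLUSTER. The function name, hosts, and table names are placeholders; the real snuba-admin code goes through its own connection helpers rather than instantiating a client directly.

```python
# Hypothetical preview helper; not the snuba-admin implementation.
from typing import Optional

from clickhouse_driver import Client


def preview_create_statement(
    source_host: str, table: str, cluster: Optional[str] = None, port: int = 9000
) -> str:
    client = Client(host=source_host, port=port)
    # SHOW CREATE TABLE returns a single row with the full DDL as one column.
    ((statement,),) = client.execute(f"SHOW CREATE TABLE {table}")
    if cluster is not None:
        # Crude rewrite for preview purposes only; a real implementation would
        # derive the ON CLUSTER clause from cluster configuration.
        statement = statement.replace(
            f"CREATE TABLE {table}",
            f"CREATE TABLE {table} ON CLUSTER {cluster}",
            1,
        )
    return statement


# Example (placeholder values):
# print(preview_create_statement("127.0.0.1", "default.errors_local", cluster="errors"))
```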

MeredithAnya requested a review from a team as a code owner, January 8, 2026 02:09

Comment on lines +51 to +54 of snuba/admin/clickhouse/copy_tables.py:
source_host, 9000, storage_name, client_settings=settings
)
target_connection = get_clusterless_node_connection(
target_host, 9000, storage_name, client_settings=settings

Bug: The copy_table function ignores the port sent from the frontend and always connects to a hardcoded port 9000, which will cause connection failures for nodes on non-standard ports.
Severity: CRITICAL | Confidence: High

🔍 Detailed Analysis

The backend API for copying tables ignores the source_port and target_port parameters sent from the frontend. The copy_table function in snuba/admin/clickhouse/copy_tables.py is implemented to always use a hardcoded port 9000 for both the source and target ClickHouse connections. Since the UI allows selecting nodes that may run on different ports, this discrepancy will cause connection failures or, more critically, connections to the wrong ClickHouse instance, potentially leading to data being copied from or to an incorrect database.

💡 Suggested Fix

Modify the view handler in snuba/admin/views.py to extract source_port and target_port from the request JSON. Pass these ports to the copy_table function. Update the copy_table function signature to accept the port parameters and use them when calling get_clusterless_node_connection instead of the hardcoded 9000.
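A hedged sketch of what that fix could look like. copy_table's real signature is not shown in this PR excerpt, so the parameter list below is a guess; only the two get_clusterless_node_connection calls quoted above are taken from the actual code.

```python
# Sketch of snuba/admin/clickhouse/copy_tables.py with the ports threaded
# through instead of the hardcoded 9000. get_clusterless_node_connection is
# the helper already imported by the real module; it is not redefined here.
def copy_table(
    source_host: str,
    source_port: int,
    target_host: str,
    target_port: int,
    storage_name: str,
    settings: dict,
) -> None:
    source_connection = get_clusterless_node_connection(
        source_host, source_port, storage_name, client_settings=settings
    )
    target_connection = get_clusterless_node_connection(
        target_host, target_port, storage_name, client_settings=settings
    )
    ...


# And in the view handler (snuba/admin/views.py), read the ports from the
# request body instead of dropping them; defaulting to 9000 preserves the
# current behaviour when the frontend omits them:
# source_port = int(req.get("source_port", 9000))
# target_port = int(req.get("target_port", 9000))
```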

🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: snuba/admin/clickhouse/copy_tables.py#L51-L54

Potential issue: The backend API for copying tables ignores the `source_port` and
`target_port` parameters sent from the frontend. The `copy_table` function in
`snuba/admin/clickhouse/copy_tables.py` is implemented to always use a hardcoded port
`9000` for both the source and target ClickHouse connections. Since the UI allows
selecting nodes that may run on different ports, this discrepancy will cause connection
failures or, more critically, connections to the wrong ClickHouse instance, potentially
leading to data being copied from or to an incorrect database.


In snuba/admin/views.py (around line 469):

source_host = req["source_host"]
target_host = req["target_host"]

ipaddress.ip_address(target_host)

Bug: The API validates the target_host using ipaddress.ip_address(), which will raise an unhandled ValueError if a user provides a valid hostname instead of an IP address.
Severity: HIGH | Confidence: High

🔍 Detailed Analysis

The view handler for the table copy operation in snuba/admin/views.py validates the target_host using ipaddress.ip_address(). This function raises a ValueError if the input is a hostname instead of a valid IP address. However, the system's configuration and other parts of the codebase permit the use of hostnames for ClickHouse nodes. Since the UI provides a free-form text input for the target host, a user entering a valid hostname (e.g., 'clickhouse-node-1') will trigger an unhandled ValueError, causing the API request to fail.

💡 Suggested Fix

Remove the ipaddress.ip_address(target_host) validation. If validation is required, use a method that correctly resolves and validates both IP addresses and hostnames, or apply the same validation logic consistently to both source_host and target_host.
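A small sketch of the alternative validation mentioned above, accepting either a literal IP address or a resolvable hostname; whether DNS resolution belongs in the request path at all is a separate design question.

```python
import ipaddress
import socket


def validate_host(host: str) -> None:
    """Raise ValueError if host is neither an IP address nor a resolvable hostname."""
    try:
        ipaddress.ip_address(host)
        return
    except ValueError:
        pass
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        raise ValueError(
            f"{host!r} is neither a valid IP address nor a resolvable hostname"
        )
```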

🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: snuba/admin/views.py#L469

Potential issue: The view handler for the table copy operation in `snuba/admin/views.py`
validates the `target_host` using `ipaddress.ip_address()`. This function raises a
`ValueError` if the input is a hostname instead of a valid IP address. However, the
system's configuration and other parts of the codebase permit the use of hostnames for
ClickHouse nodes. Since the UI provides a free-form text input for the target host, a
user entering a valid hostname (e.g., 'clickhouse-node-1') will trigger an unhandled
`ValueError`, causing the API request to fail.


@phacops (Contributor) commented Jan 8, 2026

I don't really want to have a tool allowing me to selectively create tables on another replica so I think we should just show what tables would be created but not let us select them. It creates weird imbalances if we omit certain tables, even if not in use, and it's cleaner to properly remove all at once from a cluster.

Also, I'm not sure we actually can use this in production if the list of hosts is populated via what's on the query nodes as usual. When we create a new replica, before we can add it to the list of replicas on the query nodes, we need to create the tables first and have them start replicating.

Can we get a list of replicas attached to the clusters from storage nodes directly? Or maybe, can we make it a free-form field and just know to paste the right value for the Target host field?

I think we could get away with not opting out of ON CLUSTER and just letting us execute that on an existing storage node, which would know about the new replica even if the query node doesn't. So, we'd take 1-1 as our reference node (it's always the first node created, so it always has the tables we want on other replicas) and just have a button "Copy tables to all replicas" and let ON CLUSTER do the work.

And that way, it'll simplify the UI too:

  • no list of tables to select for creation
  • no ON CLUSTER button
  • no create table statement preview
  • no local/target host selection
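For illustration only (none of this exists in snuba): a rough sketch of that single-button flow, reading every table definition from the reference node with clickhouse-driver and re-issuing it with IF NOT EXISTS ... ON CLUSTER so ClickHouse fans it out to every replica, including ones the query nodes don't know about yet. Ordering constraints (dictionaries, materialized views) and the DDL rewrite itself would need more care in a real implementation.

```python
# Hypothetical "Copy tables to all replicas" action; host, cluster, and
# database names are placeholders.
from clickhouse_driver import Client


def copy_tables_to_all_replicas(
    reference_host: str, cluster: str, database: str = "default", port: int = 9000
) -> None:
    client = Client(host=reference_host, port=port, database=database)
    for (table,) in client.execute("SHOW TABLES"):
        ((create,),) = client.execute(f"SHOW CREATE TABLE {table}")
        # SHOW CREATE TABLE returns "CREATE TABLE <db>.<table> ..."; rewrite the
        # header so the same DDL is applied on every replica in the cluster.
        ddl = create.replace(
            f"CREATE TABLE {database}.{table}",
            f"CREATE TABLE IF NOT EXISTS {database}.{table} ON CLUSTER {cluster}",
            1,
        )
        client.execute(ddl)
```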

@MeredithAnya (Member, Author) commented Jan 8, 2026

> I don't really want to have a tool allowing me to selectively create tables on another replica so I think we should just show what tables would be created but not let us select them. It creates weird imbalances if we omit certain tables, even if not in use, and it's cleaner to properly remove all at once from a cluster.
>
> Also, I'm not sure we actually can use this in production if the list of hosts is populated via what's on the query nodes as usual. When we create a new replica, before we can add it to the list of replicas on the query nodes, we need to create the tables first and have them start replicating.
>
> Can we get a list of replicas attached to the clusters from storage nodes directly? Or maybe, can we make it a free-form field and just know to paste the right value for the Target host field?
>
> I think we could get away with not opting out of ON CLUSTER and just letting us execute that on an existing storage node, which would know about the new replica even if the query node doesn't. So, we'd take 1-1 as our reference node (it's always the first node created, so it always has the tables we want on other replicas) and just have a button "Copy tables to all replicas" and let ON CLUSTER do the work.
>
> And that way, it'll simplify the UI too:
>
> • no list of tables to select for creation
> • no ON CLUSTER button
> • no create table statement preview
> • no local/target host selection

Okay, so I was about to write up a more detailed description of the idea behind what I was doing, but I'll just quickly do that here. The idea was to:

  1. Pick the host that has the tables you want to copy; that host should be discoverable via the clusters, which is why I reused the dropdown.
  2. The Show Tables button under Source Host shows you a list of all the tables, so you can verify those are the ones you want to copy over.
  3. You must manually know the IP to enter for your new replica; it's confusing in the screenshots because in my local setup the target and source are the same.
  4. The Show Tables button under Target Host should be blank for a new replica; it was meant as a way to see which tables have already been created so far as you execute COPY TABLE on the tables you want.
  5. The preview statement was basically a sanity check, like a dry-run command, to make sure the cluster name was right for ON CLUSTER.
  6. When adding a new replica there is really no reason to run ON CLUSTER since it's just the one node, so that's why I added the toggle as an option.

So that was the idea behind the UI, as for some of your questions:

  • ref(admin): allow system queries on nodes sans clusters #7605 should have added the logic to handle inputting IP hosts directly without the need for the query nodes to know about them yet
  • I don't really want to have a tool allowing me to selectively create tables on another replica so I think we should just show what tables would be created but not let us select them.

    • I'm pretty sure that is how copy tables works, and while it's not perfect, this was meant to be a slight improvement before we basically revamp it. My understanding is that we wanted to do it table by table and wait for replication to complete for each table before moving on, which we could automate later, but that was not in my scope for this.
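Purely illustrative (not part of this PR or the copy tables script): the per-table wait described above could eventually be automated by polling system.replicas on the new replica until the table's replication queue drains. The thresholds, timeout, and polling interval below are made up.

```python
import time

from clickhouse_driver import Client


def wait_for_replication(
    host: str, database: str, table: str, port: int = 9000, timeout: float = 600.0
) -> None:
    client = Client(host=host, port=port)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        rows = client.execute(
            "SELECT queue_size, absolute_delay FROM system.replicas "
            "WHERE database = %(db)s AND table = %(table)s",
            {"db": database, "table": table},
        )
        # Consider the table caught up once the replication queue is empty and
        # the replica reports no delay.
        if rows and rows[0] == (0, 0):
            return
        time.sleep(5)
    raise TimeoutError(f"{database}.{table} did not catch up within {timeout}s")
```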

That all being said, if the UI is confusing and not helpful (and not better than our current solution of copy tables) then I'm happy to scrap this. I was trying to reuse a lot of what we had in admin already, and I kind of assumed this would be dead soon once we overhauled our process of bringing up a new replica/shard anyway.

@phacops (Contributor) commented Jan 12, 2026

> That all being said, if the UI is confusing and not helpful (and not better than our current solution of copy tables) then I'm happy to scrap this. I was trying to reuse a lot of what we had in admin already, and I kind of assumed this would be dead soon once we overhauled our process of bringing up a new replica/shard anyway.

I think it's useful to have this in snuba-admin, but it needs to be simplified. You outline 5 or 6 steps in your description to create tables; I'm advocating for only 1. The copy tables script does indeed work by having you write down all the tables you want to copy, and to me that's a problem, not a feature.

I think a tool like this in snuba-admin would help us, and we have the chance to simplify it as well. It could give us the status of tables on all replicas in a cluster (which we can already fetch with the InactiveReplicas query in snuba-admin) and a way to create tables on all replicas, and that's about it. We let ClickHouse manage the rest.
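For illustration, the kind of per-replica status such a tool could surface, using ClickHouse's clusterAllReplicas() table function to read system.replicas from every node in a cluster. This is a sketch, not the existing InactiveReplicas query; the host and cluster names are placeholders.

```python
from clickhouse_driver import Client


def replica_table_status(query_host: str, cluster: str, port: int = 9000) -> list:
    client = Client(host=query_host, port=port)
    # One row per (replica, table): readonly flag, pending queue size, and delay.
    return client.execute(
        f"""
        SELECT hostName() AS host, database, table, is_readonly, queue_size, absolute_delay
        FROM clusterAllReplicas('{cluster}', system.replicas)
        ORDER BY host, database, table
        """
    )
```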
