ref(snuba-admin): add copy tables page #7623
Conversation
```python
source_connection = get_clusterless_node_connection(
    source_host, 9000, storage_name, client_settings=settings
)
target_connection = get_clusterless_node_connection(
    target_host, 9000, storage_name, client_settings=settings
)
```
Bug: The copy_table function ignores the port sent from the frontend and always connects to a hardcoded port 9000, which will cause connection failures for nodes on non-standard ports.
Severity: CRITICAL | Confidence: High
🔍 Detailed Analysis
The backend API for copying tables ignores the source_port and target_port parameters sent from the frontend. The copy_table function in snuba/admin/clickhouse/copy_tables.py is implemented to always use a hardcoded port 9000 for both the source and target ClickHouse connections. Since the UI allows selecting nodes that may run on different ports, this discrepancy will cause connection failures or, more critically, connections to the wrong ClickHouse instance, potentially leading to data being copied from or to an incorrect database.
💡 Suggested Fix
Modify the view handler in snuba/admin/views.py to extract source_port and target_port from the request JSON. Pass these ports to the copy_table function. Update the copy_table function signature to accept the port parameters and use them when calling get_clusterless_node_connection instead of the hardcoded 9000.
Location: snuba/admin/clickhouse/copy_tables.py#L51-L54
Reference ID: 8315514
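The suggested fix could look roughly like the sketch below. The helper name `get_clusterless_node_connection` comes from the diff itself, but the surrounding signatures (`copy_table`, the view handler, the request keys) are illustrative assumptions, not the PR's actual code; the real connection factory is stubbed out here so the sketch is self-contained.

```python
from typing import Any, Dict, Tuple


def get_clusterless_node_connection(
    host: str, port: int, storage: str, client_settings: Dict[str, Any]
) -> Dict[str, Any]:
    # Stand-in for the real Snuba helper shown in the diff; here it just
    # records where a connection would have been opened.
    return {"host": host, "port": port, "storage": storage}


def ports_from_request(req: Dict[str, Any]) -> Tuple[int, int]:
    # Pull source_port/target_port from the request JSON, falling back to
    # the ClickHouse native-protocol default (9000) when they are absent,
    # so older frontends keep working.
    return int(req.get("source_port", 9000)), int(req.get("target_port", 9000))


def copy_table(
    source_host: str,
    source_port: int,
    target_host: str,
    target_port: int,
    storage_name: str,
    settings: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    # The ports now come from the caller instead of being hardcoded to 9000.
    source = get_clusterless_node_connection(
        source_host, source_port, storage_name, client_settings=settings
    )
    target = get_clusterless_node_connection(
        target_host, target_port, storage_name, client_settings=settings
    )
    return source, target
```

In the view handler, `ports_from_request(request.json)` would then feed `copy_table` directly.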
```python
source_host = req["source_host"]
target_host = req["target_host"]

ipaddress.ip_address(target_host)
```
Bug: The API validates the target_host using ipaddress.ip_address(), which will raise an unhandled ValueError if a user provides a valid hostname instead of an IP address.
Severity: HIGH | Confidence: High
🔍 Detailed Analysis
The view handler for the table copy operation in snuba/admin/views.py validates the target_host using ipaddress.ip_address(). This function raises a ValueError if the input is a hostname instead of a valid IP address. However, the system's configuration and other parts of the codebase permit the use of hostnames for ClickHouse nodes. Since the UI provides a free-form text input for the target host, a user entering a valid hostname (e.g., 'clickhouse-node-1') will trigger an unhandled ValueError, causing the API request to fail.
💡 Suggested Fix
Remove the ipaddress.ip_address(target_host) validation. If validation is required, use a method that correctly resolves and validates both IP addresses and hostnames, or apply the same validation logic consistently to both source_host and target_host.
Location: snuba/admin/views.py#L469
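If validation is kept rather than removed, one way to accept both literal IP addresses and hostnames is to try `ipaddress.ip_address()` first and fall back to an RFC 1123 hostname check, applied identically to `source_host` and `target_host`. This is a sketch of that approach, not the PR's code; the function name and regex are assumptions.

```python
import ipaddress
import re

# RFC 1123 labels: 1-63 alphanumeric/hyphen chars, no leading/trailing hyphen.
_HOSTNAME_RE = re.compile(
    r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)(\.(?!-)[A-Za-z0-9-]{1,63}(?<!-))*$"
)


def is_valid_host(host: str) -> bool:
    """Accept either a literal IPv4/IPv6 address or an RFC 1123 hostname."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        pass  # Not an IP literal; fall through to the hostname check.
    return len(host) <= 253 and bool(_HOSTNAME_RE.match(host))
```

The view handler would then reject the request with a 400 when `is_valid_host` returns `False` for either host, instead of letting `ValueError` propagate.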
I don't really want to have a tool allowing me to selectively create tables on another replica, so I think we should just show what tables would be created but not let us select them. It creates weird imbalances if we omit certain tables, even if not in use, and it's cleaner to properly remove all at once from a cluster.

Also, I'm not sure we actually can use this in production if the list of hosts is populated via what's on the query nodes as usual. When we create a new replica, before we can add it to the list of replicas on the query nodes, we need to create the tables first and have them start replicating. Can we get a list of replicas attached to the clusters from storage nodes directly? Or maybe, can we make it a free-form field and just know to paste the right value for the

I think we could get away with not opting out of

And that way, it'll simplify the UI too:
Okay, so I was about to write up a more detailed description of the idea behind what I was doing, but I'll just quickly do that here. The idea was to
So that was the idea behind the UI. As for some of your questions:
That all being said, if the UI is confusing and not helpful (and not better than our current solution of copy tables), then I'm happy to scrap this. I was trying to reuse a lot of what we had in admin already, and I kind of assumed this would be dead soon once we overhauled our process of bringing up a new replica/shard anyway.
I think it's useful to have this in

I think a tool like this in snuba-admin would help us, and we have the chance to simplify it as well. It could give us the status of tables on all replicas in a cluster (which we can already fetch with the
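On the question of getting replica information from the storage nodes directly: ClickHouse exposes a `system.replicas` table on any node hosting `Replicated*` tables, which reports per-table replica counts and health. A minimal sketch of the query a tool could issue (which columns to surface is a judgment call, and this is not what the PR does today):

```python
def replica_status_query(database: str) -> str:
    # system.replicas reports, per replicated table, how many replicas
    # ZooKeeper/Keeper knows about and how many are currently active.
    return (
        "SELECT table, replica_name, total_replicas, active_replicas "
        f"FROM system.replicas WHERE database = '{database}'"
    )
```

Running this against each storage node would reveal tables that exist on some replicas but not others, which is exactly the imbalance discussed above.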
Simple overview of what I was trying to achieve here:
- Previewing what the `CREATE` statement will be
- What an error looks like when running COPY TABLE
- After running COPY TABLE successfully
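The preview-then-copy flow above could in principle be built on ClickHouse's `SHOW CREATE TABLE` statement: fetch the DDL from the source node, show it to the operator, and only forward it to the target once confirmed. The helper names below are illustrative, not the PR's implementation:

```python
def preview_create_statement(database: str, table: str) -> str:
    # Query to run against the *source* node; the returned DDL is shown
    # to the operator before anything is executed on the target.
    return f"SHOW CREATE TABLE {database}.{table}"


def is_safe_to_run(statement: str) -> bool:
    # Guard before forwarding to the target node: only ever execute
    # CREATE TABLE statements, nothing destructive.
    return statement.lstrip().upper().startswith("CREATE TABLE")
```

A confirm button in the UI would then gate the second step on `is_safe_to_run` plus explicit operator approval.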