-
Notifications
You must be signed in to change notification settings - Fork 18
fix: Validate collection names #106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughAdds collection name validation to the client (type, non-empty, allowed chars [A-Za-z0-9_], max length), invokes validation in collection creation/get-or-create flows, adds unit tests for validation, and updates README documenting name constraints and Configuration wrapper notes. Changes
Sequence Diagram(s)(omitted — changes do not introduce multi-component sequential interactions requiring a diagram) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/pyseekdb/client/client_base.py (1)
89-127: Solid validation logic with comprehensive error handling.The validation function correctly enforces all requirements: type checking, non-empty constraint, length bounds, and character restrictions. The guard against prefix misconfiguration (lines 112-116) is a thoughtful addition.
Optional: Remove redundant variable assignment
Line 111 assigns
effective_max = available_length, butavailable_lengthis already the value you need. Consider simplifying:# Calculate effective maximum based on table prefix and database limit table_prefix = CollectionNames.table_name("") # Guard against misconfiguration where prefix itself is too long available_length = max(0, _MAX_TABLE_NAME_LENGTH - len(table_prefix)) - effective_max = available_length - if effective_max <= 0: + if available_length <= 0: raise ValueError( "Invalid collection table prefix configuration: no space left for collection name. " f"Prefix={table_prefix!r}, table name limit={_MAX_TABLE_NAME_LENGTH}." ) - if len(name) > effective_max: + if len(name) > available_length: raise ValueError( f"Collection name too long: {len(name)} characters; " - f"maximum allowed is {effective_max} for the current prefix." + f"maximum allowed is {available_length} for the current prefix." )
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
README.mdsrc/pyseekdb/client/client_base.pytests/unit_tests/test_collection_name_validation.py
🧰 Additional context used
🧬 Code graph analysis (2)
src/pyseekdb/client/client_base.py (1)
src/pyseekdb/client/meta_info.py (2)
CollectionNames(12-15)table_name(14-15)
tests/unit_tests/test_collection_name_validation.py (2)
src/pyseekdb/client/client_base.py (1)
_validate_collection_name(89-126)src/pyseekdb/client/meta_info.py (2)
CollectionNames(12-15)table_name(14-15)
🪛 Ruff (0.14.10)
src/pyseekdb/client/client_base.py
104-104: Avoid specifying long messages outside the exception class
(TRY003)
106-106: Avoid specifying long messages outside the exception class
(TRY003)
113-116: Avoid specifying long messages outside the exception class
(TRY003)
118-121: Avoid specifying long messages outside the exception class
(TRY003)
123-126: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (5)
src/pyseekdb/client/client_base.py (3)
43-49: LGTM! Well-documented validation constants.The regex pattern and table name length constant are correctly defined. The comments clearly explain the validation strategy of computing an effective maximum by subtracting the collection prefix length from the database limit.
435-435: Excellent placement of validation.Calling
_validate_collection_name(name)at the beginning ofcreate_collectionensures early failure before any table name construction or SQL execution, providing clear error messages to users.
794-795: Good practice: validate early before any database interaction.Validating the collection name before checking existence ensures immediate feedback for invalid names. While this validation also occurs in
create_collection(line 435) if a new collection is created, the early check here is appropriate for fast failure.tests/unit_tests/test_collection_name_validation.py (1)
1-81: Excellent test coverage for collection name validation.The test suite comprehensively covers all validation rules:
- Valid names including the edge case exactly at maximum length
- Type checking for non-string inputs
- Empty name rejection
- Length limit enforcement
- Invalid character detection with multiple examples
The
_effective_max_lengthproperty correctly mirrors the production calculation, ensuring tests remain synchronized with the implementation.README.md (1)
261-261: Documentation accurately reflects validation constraints.The parameter documentation correctly describes all collection name requirements enforced by
_validate_collection_name: non-empty, alphanumeric plus underscore characters, and the computed maximum length (59 characters with the defaultc$v1$prefix).
|
@hnwyllmm PTAL |
|
Thanks very much for your contribution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/pyseekdb/client/client_base.py (2)
82-109: Validation logic is comprehensive and correct.The function properly enforces all required constraints (type, non-empty, length, character set) with clear error messages.
Optional: Address Ruff TRY003 warnings
Static analysis suggests extracting long error messages into custom exception classes for better code organization. Since the error messages are clear and actionable as-is, this is purely a style improvement:
class CollectionNameError(ValueError): """Base exception for collection name validation errors.""" pass class EmptyCollectionNameError(CollectionNameError): """Collection name cannot be empty.""" def __init__(self): super().__init__("Collection name must not be empty") class CollectionNameTooLongError(CollectionNameError): """Collection name exceeds maximum length.""" def __init__(self, length: int, max_length: int): super().__init__( f"Collection name too long: {length} characters; " f"maximum allowed is {max_length}." ) class InvalidCollectionNameCharsError(CollectionNameError): """Collection name contains invalid characters.""" def __init__(self): super().__init__( "Collection name contains invalid characters. " "Only letters, digits, and underscore are allowed: [a-zA-Z0-9_]" )Then use:
raise EmptyCollectionNameError()instead of constructing messages inline.
536-632: Consider adding validation to other collection methods for consistency.While
create_collectionandget_or_create_collectioncorrectly validate names,get_collection,delete_collection, andhas_collectiondo not. Adding validation would provide clearer error messages when users attempt operations with invalid names (e.g., "Invalid collection name" instead of "Collection not found").Proposed additions
def get_collection( self, name: str, embedding_function: EmbeddingFunctionParam = _NOT_PROVIDED ) -> "Collection": + _validate_collection_name(name) # Construct table name: c$v1${name} table_name = CollectionNames.table_name(name)def delete_collection(self, name: str) -> None: + _validate_collection_name(name) # Construct table name: c$v1${name} table_name = CollectionNames.table_name(name)def has_collection(self, name: str) -> bool: + _validate_collection_name(name) # Construct table name: c$v1${name} table_name = CollectionNames.table_name(name)Also applies to: 634-652, 721-741
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
README.mdsrc/pyseekdb/client/client_base.pytests/unit_tests/test_collection_name_validation.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit_tests/test_collection_name_validation.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/pyseekdb/client/client_base.py (3)
src/pyseekdb/client/admin_client.py (1)
AdminAPI(34-94)src/pyseekdb/client/collection.py (3)
Collection(19-570)name(63-65)embedding_function(88-90)src/pyseekdb/client/embedding_function.py (1)
EmbeddingFunction(38-65)
🪛 Ruff (0.14.10)
src/pyseekdb/client/client_base.py
96-96: Avoid specifying long messages outside the exception class
(TRY003)
98-98: Avoid specifying long messages outside the exception class
(TRY003)
100-103: Avoid specifying long messages outside the exception class
(TRY003)
105-108: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (5)
src/pyseekdb/client/client_base.py (3)
39-42: LGTM! Constants correctly define validation constraints.The pattern restricts collection names to
[A-Za-z0-9_]and the 512-character limit aligns with the reviewer's feedback addressing issue #74.
417-417: LGTM! Validation correctly placed at method entry.The validation occurs before any database interaction, ensuring invalid names are rejected at the client as intended.
776-777: LGTM! Validation with clear intent comment.The validation is correctly placed before the
has_collectioncheck, ensuring invalid names are rejected early.README.md (2)
261-261: LGTM! Documentation accurately describes validation rules.The collection name constraints are clearly documented and match the implementation in
_validate_collection_name.
262-272: LGTM! Configuration documentation is clear and comprehensive.The updated documentation effectively explains the Configuration wrapper pattern and provides helpful guidance on when to use it versus HNSWConfiguration directly.
already fixed |
Summary
Enforce stricter, database-aware validation for collection names in pyseekdb, ensuring that invalid names are rejected at the client layer before any SQL is executed #74
Solution Description
Summary by CodeRabbit
Documentation
New Features
Tests
Other
✏️ Tip: You can customize this high-level summary in your review settings.