@apstndb
Owner

@apstndb apstndb commented Sep 6, 2025

Summary

This PR implements the --sample-database flag for initializing the embedded emulator with Google's official Spanner sample databases, as specified in #470.

Changes

New Features

  • --sample-database=<name> flag to initialize emulator with official samples
  • --list-samples flag to list available sample databases
  • Support for 5 official samples: banking, finance, finance-graph, finance-pg, gaming

Implementation Details

  • URI-based resource loading supporting multiple schemes:
    • gs:// - Google Cloud Storage (for official samples)
    • file:// - Local file system
    • http:// / https:// - HTTP(S) downloads
  • Integration with spanemuboost for DDL/DML initialization
  • memefish for proper SQL statement splitting
  • Automatic dialect detection (PostgreSQL for finance-pg)
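The scheme handling listed above can be sketched as a small dispatch on the parsed URI. This is only an illustration under assumptions: `classifyURI` and `loaderKind` are hypothetical names, not identifiers from sample_databases.go, and the real loaders may be organized differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// loaderKind names the (hypothetical) loader that would handle a URI.
type loaderKind string

const (
	loaderGCS  loaderKind = "gcs"
	loaderFile loaderKind = "file"
	loaderHTTP loaderKind = "http"
)

// classifyURI picks a loader from the URI scheme and rejects anything
// outside the four schemes listed in the PR description.
func classifyURI(raw string) (loaderKind, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", raw, err)
	}
	switch u.Scheme {
	case "gs":
		return loaderGCS, nil
	case "file":
		return loaderFile, nil
	case "http", "https":
		return loaderHTTP, nil
	default:
		return "", fmt.Errorf("unsupported URI scheme %q in %q", u.Scheme, raw)
	}
}

func main() {
	// The URIs below are placeholders, not the real sample locations.
	for _, raw := range []string{
		"gs://example-bucket/schema.sdl",
		"file:///tmp/schema.sql",
		"https://example.com/data.sql",
		"ftp://example.com/data.sql",
	} {
		kind, err := classifyURI(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", raw, kind)
	}
}
```

Rejecting unknown schemes up front keeps the error close to the user's input instead of surfacing later as a failed download.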

Code Organization

  • sample_databases.go - Core implementation with registry and URI loading
  • sample_databases_test.go - Comprehensive test coverage
  • Refactored to minimize code duplication and improve maintainability

Test Plan

  • Unit tests for all URI loading functions
  • Tests for statement parsing
  • Sample registry validation tests
  • Manual testing with --list-samples
  • Error handling for invalid sample names
  • make check passes (all tests and lint)

Usage Examples

# List available samples
spanner-mycli --list-samples

# Start emulator with banking sample
spanner-mycli --embedded-emulator --sample-database=banking

# Start with PostgreSQL sample
spanner-mycli --embedded-emulator --sample-database=finance-pg

# Use with specific database ID
spanner-mycli --embedded-emulator --sample-database=gaming --database=test-gaming

Future Work

Fixes #470

🤖 Generated with Claude Code

…ith official samples

- Add --sample-database flag to load Google's official sample databases
- Add --list-samples flag to show available sample databases
- Support 5 official samples: banking, finance, finance-graph, finance-pg, gaming
- Implement URI-based resource loading (gs://, file://, http://, https://)
- Integrate with spanemuboost for DDL/DML initialization
- Use memefish for proper SQL statement splitting
- Add cloud.google.com/go/storage dependency for GCS support

Fixes #470

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @apstndb, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the spanner-mycli tool by enabling the embedded Cloud Spanner emulator to be initialized with pre-populated official sample databases. This feature streamlines the setup process for development and testing environments, allowing users to quickly get a functional database instance with relevant data without manual configuration. The implementation supports loading data from diverse sources and automatically handles dialect differences.

Highlights

  • New Feature: Sample Database Initialization: Introduces a new --sample-database flag to initialize the embedded Cloud Spanner emulator with official sample databases (banking, finance, finance-graph, finance-pg, gaming).
  • New Feature: List Samples: Adds a --list-samples flag to display all available sample databases and their descriptions.
  • Flexible URI-based Resource Loading: Enables loading schema and data from various sources including Google Cloud Storage (gs://), local file system (file://), and HTTP/HTTPS URLs (http://, https://).
  • Enhanced Emulator Integration: Leverages spanemuboost for DDL/DML initialization and memefish for robust SQL statement splitting, including automatic dialect detection for PostgreSQL samples.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature              | Command             | Description                                                                            |
|----------------------|---------------------|----------------------------------------------------------------------------------------|
| Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.              |
| Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.                   |
| Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help                 | /gemini help        | Displays a list of available commands.                                                 |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions

github-actions bot commented Sep 6, 2025

Code Metrics Report

📊 View detailed coverage report (available for 7 days)

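|                     | main (889ab33) | #472 (6209b57) |  +/-  |
|---------------------|----------------|----------------|-------|
| Coverage            |          70.2% |          69.9% | -0.3% |
| Code to Test Ratio  |          1:1.2 |          1:1.2 |  -0.1 |
| Test Execution Time |            33s |            35s |   +2s |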
Details
  |                     | main (889ab33) | #472 (6209b57) |  +/-  |
  |---------------------|----------------|----------------|-------|
- | Coverage            |          70.2% |          69.9% | -0.3% |
  |   Files             |             51 |             53 |    +2 |
  |   Lines             |           5534 |           5688 |  +154 |
+ |   Covered           |           3885 |           3977 |   +92 |
- | Code to Test Ratio  |          1:1.2 |          1:1.2 |  -0.1 |
  |   Code              |          12765 |          13067 |  +302 |
+ |   Test              |          15975 |          16312 |  +337 |
- | Test Execution Time |            33s |            35s |   +2s |

Code coverage of files in pull request scope (63.2% → 62.6%)

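| Files               | Coverage | +/-    | Status   |
|---------------------|----------|--------|----------|
| cli.go              |    51.8% |  -1.3% | modified |
| file_safety.go      |    91.3% | +91.3% | added    |
| main.go             |    66.6% |  -4.1% | modified |
| sample_databases.go |    68.7% | +68.7% | added    |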

Reported by octocov

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable feature for initializing the embedded emulator with official sample databases. The implementation is well-structured, with the core logic for handling sample databases and URI-based resource loading cleanly separated into a new sample_databases.go file. The test coverage for the new functionality is also comprehensive.

My review includes a few high-severity suggestions focused on improving performance, robustness, and resource safety, particularly around network and file operations, in alignment with the project's style guide.

apstndb and others added 6 commits September 6, 2025 17:59
- Add --sample-database and --list-samples flags to usage section
- Add "Sample Databases" subsection under Embedded Cloud Spanner Emulator
- Include examples showing how to list and use sample databases
- Fix column name in COUNT example (functions need AS alias for column name)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ations

- Implement loadMultipleFromURIs() for efficient batch loading of mixed sources
- Add parallel processing using goroutines with sync.WaitGroup.Go() (Go 1.25 feature)
- Batch all GCS operations to use a single client for efficiency
- Process file:// and HTTP(S) URIs in parallel for better performance
- Add HTTP timeout using context.WithTimeout (30 seconds)
- Simplify main.go to use the unified batch loading approach

This optimization significantly improves loading performance when initializing
sample databases with multiple resources from different sources.

- Extract common goroutine pattern into processURIs helper function
- Reduce code duplication between file:// and HTTP(S) processing
- Maintain same functionality with cleaner, more maintainable code
- Use WaitGroup.Go() consistently through helper function

- Remove unnecessary pre-grouping into fileJobs, httpJobs, etc.
- Process non-GCS URIs directly in the main loop using WaitGroup.Go()
- Only collect GCS URIs separately for batch processing efficiency
- Remove unnecessary mutex (each goroutine writes to unique index)
- Reduce code by ~20 lines while maintaining same functionality

This makes the code simpler and more direct - we process URIs as we
encounter them instead of categorizing first then processing.

- Replace loadBatchFromGCS with loadFromGCSWithClient that uses a passed client
- Create GCS client once and share it across all GCS operations
- Treat all URI types uniformly in the same processing loop
- Further reduce code by ~30 lines while maintaining functionality

This completes the simplification - all URI types are now processed
identically with just the loader function being different.
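
The "each goroutine writes to unique index" point from these commits can be illustrated with a short sketch. It is written with wg.Add/wg.Done so that it also compiles on toolchains older than Go 1.25; the PR itself uses sync.WaitGroup.Go(), which wraps the same pattern. `loadAll` and `fakeLoad` are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// loadAll runs load for every URI concurrently. Goroutine i writes only to
// results[i] and errs[i], so the slices need no mutex.
func loadAll(uris []string, load func(string) ([]byte, error)) ([][]byte, []error) {
	results := make([][]byte, len(uris))
	errs := make([]error, len(uris))

	var wg sync.WaitGroup
	for i, uri := range uris {
		wg.Add(1)
		go func(i int, uri string) {
			defer wg.Done()
			results[i], errs[i] = load(uri)
		}(i, uri)
	}
	wg.Wait() // all writes are complete and visible after Wait returns
	return results, errs
}

// fakeLoad stands in for a real gs://, file:// or http(s):// loader.
func fakeLoad(uri string) ([]byte, error) {
	if strings.HasPrefix(uri, "bad:") {
		return nil, fmt.Errorf("cannot load %s", uri)
	}
	return []byte("content of " + uri), nil
}

func main() {
	uris := []string{"file:///schema.sql", "bad:///data.sql"}
	results, errs := loadAll(uris, fakeLoad)
	for i, uri := range uris {
		if errs[i] != nil {
			fmt.Printf("%s: error: %v\n", uri, errs[i])
			continue
		}
		fmt.Printf("%s: %s\n", uri, results[i])
	}
}
```

Results come back in input order regardless of which goroutine finishes first, at the cost of tracking indexes by hand.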
…DML processing

- Change loadMultipleFromURIs to return map[string][]byte for cleaner access
- Eliminate error-prone manual index tracking with resultIdx
- Extract common DDL/DML processing into addStatements helper function
- Add mutex protection for concurrent map writes (required for maps)
- Simplify error messages to be more consistent

This refactoring reduces code duplication and makes the implementation
more robust by eliminating manual index management.
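
A rough sketch of the map-based variant described in this commit, assuming a single mutex guards the shared map. Unlike distinct slice elements, a Go map is not safe for concurrent writes, which is why the lock comes back. `loadToMap` and `stubLoad` are hypothetical names; the real loadMultipleFromURIs may differ in signature and error handling.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// loadToMap fetches every URI concurrently and returns the contents keyed by
// URI. All map writes (and the shared error) are protected by mu.
func loadToMap(uris []string, load func(string) ([]byte, error)) (map[string][]byte, error) {
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		firstErr error
	)
	results := make(map[string][]byte, len(uris))

	for _, uri := range uris {
		wg.Add(1)
		go func(uri string) {
			defer wg.Done()
			data, err := load(uri) // the slow work happens outside the lock
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = fmt.Errorf("load %s: %w", uri, err)
				}
				return
			}
			results[uri] = data
		}(uri)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, firstErr
	}
	return results, nil
}

// stubLoad stands in for a real loader.
func stubLoad(uri string) ([]byte, error) {
	if strings.HasSuffix(uri, "missing.sql") {
		return nil, fmt.Errorf("not found")
	}
	return []byte("-- " + uri), nil
}

func main() {
	got, err := loadToMap([]string{"file:///schema.sql", "file:///data.sql"}, stubLoad)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(got["file:///schema.sql"]))
	fmt.Println(string(got["file:///data.sql"]))
}
```

Callers can then look up schema and data content by URI instead of by position.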
@apstndb
Owner Author

apstndb commented Sep 6, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable feature for initializing the embedded emulator with sample databases. The implementation is generally well-structured, utilizing URI-based loading for flexibility. However, I've identified a critical compilation error in the concurrency logic, a missing security check that violates the project's style guide, and an incorrect test case that will fail. Addressing these points will significantly improve the robustness and correctness of the new functionality.

- Add maxFileSize constant set to 10MB for sample databases
- Check file size with os.Stat before reading file:// URIs
- Check object size with Attrs() before reading GCS objects
- Check Content-Length header and use io.LimitReader for HTTP responses
- Prevents accidental OOM from loading unexpectedly large files

Addresses review feedback about file size safety limits.
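
The Content-Length check plus io.LimitReader combination mentioned above could look roughly like the following. The sketch works on any io.Reader so it runs without a network; for HTTP, the reader would be the response body and the declared length would be resp.ContentLength. `readLimited` and `readLimitedString` are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

const maxSampleSize = 10 * 1024 * 1024 // 10MB, the limit named in the commit above

// readLimited reads r but fails once more than limit bytes arrive.
// declaredLen is the advertised size (-1 when unknown); it only allows an
// early rejection and is never trusted as the real size.
func readLimited(r io.Reader, declaredLen, limit int64) ([]byte, error) {
	if declaredLen > limit {
		return nil, fmt.Errorf("declared size %d exceeds limit %d", declaredLen, limit)
	}
	// Ask for limit+1 bytes: receiving the extra byte proves the source is too big.
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, fmt.Errorf("content exceeds limit of %d bytes", limit)
	}
	return data, nil
}

// readLimitedString is a demo helper that feeds a string through readLimited.
func readLimitedString(s string, declaredLen, limit int64) ([]byte, error) {
	return readLimited(strings.NewReader(s), declaredLen, limit)
}

func main() {
	data, err := readLimitedString("CREATE TABLE t (id INT64) PRIMARY KEY (id);", -1, maxSampleSize)
	fmt.Println(len(data), err)

	// No declared length, but the body is larger than the 4-byte limit.
	_, err = readLimitedString("0123456789", -1, 4)
	fmt.Println(err)
}
```

Reading one byte past the limit distinguishes "exactly at the limit" from "truncated by the limiter", which a plain io.LimitReader(r, limit) cannot do.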
- Remove unnecessary loop variable copy (loop variables are per-iteration since Go 1.22)
- Check reader.Close() error with defer func pattern
- Fix formatting with gofumpt
- Use separate variable name for os.Stat error to avoid shadowing
@apstndb
Owner Author

apstndb commented Sep 6, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable feature for initializing the embedded emulator with sample databases. The implementation is well-structured, with logic for handling various URI schemes encapsulated in a new sample_databases.go file. I've found one critical issue that will prevent the code from compiling, which I've detailed in a specific comment.

- Add validation to prevent reading from special files (devices, pipes, sockets)
- Document WaitGroup.Go() as a Go 1.25 feature in styleguide
- Improves safety when loading file:// URIs

Addresses review feedback about file safety validation.

Document the observed behavior that memefish.SplitRawStatements:
- Includes leading line comments with their statements
- Does not include block comments between statements
Reference the upstream PR that will fix block comment handling

- Created file_safety.go with ValidateFileSafety and SafeReadFile functions
- Unified file safety validation logic between cli.go and sample_databases.go
- Added comprehensive tests for file safety functions
- Added platform-specific FIFO tests (Unix-only with runtime.GOOS check)
- Updated CLAUDE.md to specify copyright header convention for new files
- Fixed cli_test.go to match new error messages from SafeReadFile

This improves security by centralizing file validation logic and makes it
easier to maintain consistent safety checks across the codebase.
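
A minimal sketch of the kind of check these commits describe: stat the path, refuse anything that is not a regular file, and enforce a size limit before reading. The function names are hypothetical and are not the ValidateFileSafety or SafeReadFile signatures from file_safety.go.

```go
package main

import (
	"fmt"
	"os"
)

// checkRegularFile rejects paths that are not regular files (directories,
// devices, pipes, sockets) and files larger than maxSize bytes.
func checkRegularFile(path string, maxSize int64) error {
	info, err := os.Stat(path)
	if err != nil {
		return fmt.Errorf("stat %s: %w", path, err)
	}
	if !info.Mode().IsRegular() {
		return fmt.Errorf("%s is not a regular file (mode %s)", path, info.Mode())
	}
	if info.Size() > maxSize {
		return fmt.Errorf("%s is too large: %d bytes (limit %d)", path, info.Size(), maxSize)
	}
	return nil
}

// readChecked validates the path and then reads it. The check and the read
// are separate system calls, so the file could still change in between.
func readChecked(path string, maxSize int64) ([]byte, error) {
	if err := checkRegularFile(path, maxSize); err != nil {
		return nil, err
	}
	return os.ReadFile(path)
}

// writeTemp is a demo helper that creates a small temporary file.
func writeTemp(content string) (string, error) {
	f, err := os.CreateTemp("", "sample-*.sql")
	if err != nil {
		return "", err
	}
	if _, err := f.WriteString(content); err != nil {
		f.Close()
		return "", err
	}
	return f.Name(), f.Close()
}

func main() {
	path, err := writeTemp("SELECT 1;")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.Remove(path)

	data, err := readChecked(path, 100)
	fmt.Printf("%q %v\n", data, err)
	fmt.Println(checkRegularFile(os.TempDir(), 100)) // a directory is rejected
	fmt.Println(checkRegularFile(path, 4))           // 9 bytes exceeds a 4-byte limit
}
```

Checking the mode first matters because opening a FIFO or device with a plain read can block indefinitely or never reach end of file.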
@apstndb
Owner Author

apstndb commented Sep 6, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable feature for initializing the embedded emulator with sample databases. The implementation is well-structured, with good separation of concerns into new files for sample database logic and file safety checks. The URI-based loading is a flexible approach. My review focuses on a critical compilation error in the concurrent download logic, improving error reporting for multiple failures, and ensuring consistency with the project's newly defined copyright guidelines for new files.

- Fixed copyright headers for new files (sample_databases.go, sample_databases_test.go) to use 'Copyright 2025 apstndb'
- Improved error handling in loadMultipleFromURIs to report all errors when multiple failures occur
- Updated CLAUDE.md to clarify copyright policy for files created after fork vs existing files
- Errors are now sorted for deterministic output
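
One way to get the "report all errors, in deterministic order" behavior described in this commit is to key failures by URI, sort the keys, and join the wrapped errors. This is a sketch under that assumption, using errors.Join from the standard library (Go 1.20 or later); `joinSorted` and `demoFailures` are hypothetical names, and the PR may format its message differently.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// joinSorted combines per-URI failures into a single error. The failures are
// listed in URI order, so the message is identical on every run even though
// the goroutines that produced them finished in arbitrary order.
func joinSorted(failures map[string]error) error {
	if len(failures) == 0 {
		return nil
	}
	uris := make([]string, 0, len(failures))
	for uri := range failures {
		uris = append(uris, uri)
	}
	sort.Strings(uris)

	errs := make([]error, 0, len(uris))
	for _, uri := range uris {
		errs = append(errs, fmt.Errorf("%s: %w", uri, failures[uri]))
	}
	return errors.Join(errs...)
}

// demoFailures returns two made-up failures for the example.
func demoFailures() map[string]error {
	return map[string]error{
		"gs://b/c.sql":  errors.New("permission denied"),
		"file:///a.sql": errors.New("not found"),
	}
}

func main() {
	fmt.Println(joinSorted(demoFailures()))
}
```

Because each entry wraps the original error with %w, errors.Is and errors.As still work on the joined result.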
@apstndb
Owner Author

apstndb commented Sep 6, 2025

/gemini review

Fixed trailing whitespace on line 166 of sample_databases.go
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable feature for initializing the embedded emulator with sample databases, along with a convenient --list-samples flag. The implementation is well-structured, with URI-based resource loading and a new centralized file safety module. The test coverage for the new functionality is also comprehensive. I've found one critical issue related to concurrency that will prevent the code from compiling, which I've detailed in a specific comment.

@apstndb
Owner Author

apstndb commented Sep 6, 2025

Development Insights from Implementation

This PR implements the --sample-database feature for initializing the embedded Cloud Spanner emulator with official sample databases. Here are the key insights from the implementation:

Architecture Decisions

  1. Unified URI Loading: Created loadMultipleFromURIs() to handle mixed URI schemes (gs://, file://, http://) in parallel. This approach:

    • Reuses a single GCS client across multiple operations for efficiency
    • Uses sync.WaitGroup.Go() (Go 1.25 feature) for clean concurrent processing
    • Returns results as a map for deterministic access without index tracking
  2. File Safety Module: Extracted common file safety checks into file_safety.go:

    • Centralized validation logic for file size limits and special file types
    • Different size limits for different contexts (100MB for source files, 10MB for sample databases)
    • Platform-specific handling for FIFO tests using runtime.GOOS
  3. Error Handling: Implemented comprehensive error reporting that collects all failures and reports them with deterministic ordering, improving debugging experience when multiple URIs fail.

Key Learnings

  1. Go 1.25 Features: The new sync.WaitGroup.Go() method significantly simplifies concurrent code. We documented this in .gemini/styleguide.md to prevent false positive reviews.

  2. Copyright Policy: Established clear policy - new files created after the fork use "Copyright [year] apstndb", while existing files keep "Copyright [year] Google LLC".

  3. memefish Behavior: Discovered that memefish.SplitRawStatements() includes leading line comments with the statement they precede, but not block comments. This behavior is documented with a TODO noting potential future changes.

  4. Security Considerations: For a personal CLI tool, the threat model differs from enterprise software. File size limits and special file validation are sufficient safeguards without over-engineering security measures.

Testing Coverage

  • Added comprehensive tests for file safety validation including platform-specific FIFO tests
  • Test coverage for new modules: file_safety.go (91.3%), sample_databases.go (68.7%)
  • All integration tests pass with the sample database feature

Future Considerations

The URI loading infrastructure created here could be extended to other features like the \. meta command for sourcing SQL files from various locations (tracked in Issue #471).

@apstndb
Owner Author

apstndb commented Sep 6, 2025

/gemini summary

@gemini-code-assist
Contributor

Summary of Changes

This pull request significantly enhances the spanner-mycli tool by enabling users to quickly set up and experiment with pre-populated Spanner emulator instances using official sample databases. This streamlines development, testing, and demonstration workflows by providing realistic data environments with minimal setup, making it easier to get started and explore Spanner's capabilities.

Highlights

  • New Features: Introduced --sample-database=<name> flag to initialize the embedded emulator with Google's official Spanner sample databases (banking, finance, finance-graph, finance-pg, gaming).
  • New Features: Added --list-samples flag to display available sample databases and their descriptions.
  • URI-based Resource Loading: Implemented a robust URI-based resource loading mechanism supporting gs:// (Google Cloud Storage), file:// (local file system), and http(s):// schemes, with parallel fetching for multiple URIs.
  • File Safety Enhancements: Centralized file safety checks into a new file_safety.go module, including validation for file size limits (10MB for samples, 100MB for general source files) and prevention of reading from special files (e.g., devices, named pipes).
  • Integration with Emulator Tools: Integrated sample database loading with spanemuboost for DDL/DML initialization and memefish for proper SQL statement splitting and automatic dialect detection (including PostgreSQL).
  • Codebase Policy Update: Updated CLAUDE.md to specify new license header guidelines for files added after the project fork.

Changelog

  • CLAUDE.md
    • Added new guidelines for license headers, differentiating between new files and those originating from the spanner-cli fork.
  • README.md
    • Updated the help output to include the new --sample-database and --list-samples flags.
    • Added a new section detailing the usage of sample databases with examples for listing and initializing different samples.
  • cli.go
    • Refactored the executeSourceFile function to utilize the new SafeReadFile utility for improved file safety and error handling.
  • cli_test.go
    • Updated existing tests for executeSourceFile to reflect the new error messages returned by SafeReadFile for non-existent, non-regular, and excessively large files.
  • file_safety.go
    • New file: Introduced FileSafetyOptions struct and ValidateFileSafety function to enforce file size limits and prevent reading from non-regular files.
    • New file: Added SafeReadFile function as a wrapper for os.ReadFile that incorporates the new safety checks.
  • file_safety_test.go
    • New file: Added comprehensive unit tests for ValidateFileSafety and SafeReadFile, covering various scenarios including file size limits, non-regular files (like FIFOs), and non-existent files.
  • go.mod
    • Added new Go module dependencies, including cloud.google.com/go/storage for Google Cloud Storage access and updated OpenTelemetry related modules.
  • go.sum
    • Updated checksums for new and modified Go module dependencies.
  • main.go
    • Added --sample-database and --list-samples flags to the command-line options.
    • Implemented logic to handle the --list-samples flag, printing a formatted list of available sample databases.
    • Integrated sample database loading into the emulator startup process, including dialect detection and parsing DDL/DML statements from specified URIs.
    • Added validation checks to ensure --sample-database is used with --embedded-emulator and that --list-samples is mutually exclusive with --sample-database.
  • sample_databases.go
    • New file: Defined SampleDatabase struct to hold configuration for each sample.
    • New file: Created sampleDatabases map as a registry of official Spanner sample databases with their schema/data URIs and descriptions.
    • New file: Implemented loadMultipleFromURIs to concurrently fetch content from gs://, file://, and http(s):// URIs with safety checks.
    • New file: Provided helper functions loadFromGCSWithClient and loadFromHTTP for specific URI schemes, including timeout and size limit checks.
    • New file: Added ListAvailableSamples to generate a formatted string of available samples.
    • New file: Implemented ParseStatements using memefish to correctly split SQL/SDL content into individual statements.
  • sample_databases_test.go
    • New file: Added unit tests for ListAvailableSamples, loadFromHTTP, loadFromURI (for file scheme), and ParseStatements.
    • New file: Included tests to validate the sampleDatabases registry, ensuring correct dialect and URI formatting.

Activity

  • The github-actions[bot] posted a Code Metrics Report, indicating a slight decrease in overall coverage (-0.3%) but an increase in covered lines (+92) and files (+2).
  • apstndb initiated multiple /gemini review commands to trigger automated reviews.
  • apstndb provided detailed "Development Insights from Implementation," outlining architectural decisions (unified URI loading, file safety module), key learnings (Go 1.25 features, copyright policy, memefish behavior), security considerations, testing coverage, and future work.
  • The gemini-code-assist[bot] raised several high and critical priority review comments regarding file safety (size limits, special files), GCS client efficiency, HTTP client timeouts, and the use of sync.WaitGroup.Go().
  • apstndb addressed and fixed most of the issues raised by the bot, confirming fixes in subsequent commits.
  • A recurring discussion point was the sync.WaitGroup.Go() method: apstndb asserted that it is valid in Go 1.25, while the bot repeatedly flagged it as a compilation error. This points to a gap in either the bot's Go version knowledge or its understanding of the project's Go version target.
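
For readers who want a concrete picture of the registry and flag checks described in the changelog above, here is a reduced sketch. It is not the code from sample_databases.go or main.go: the `sample` struct, the descriptions, and the gs://example-bucket URIs are placeholders, and only the sample names and the two validation rules come from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// sample is a hypothetical, reduced stand-in for the SampleDatabase struct.
type sample struct {
	Description string
	PostgreSQL  bool   // true selects the PostgreSQL dialect
	SchemaURI   string // placeholder URI, not a real bucket path
}

// registry uses two sample names from the PR; everything else is made up.
var registry = map[string]sample{
	"banking": {
		Description: "placeholder banking sample",
		SchemaURI:   "gs://example-bucket/banking/schema.sdl",
	},
	"finance-pg": {
		Description: "placeholder PostgreSQL-dialect sample",
		PostgreSQL:  true,
		SchemaURI:   "gs://example-bucket/finance-pg/schema.sdl",
	},
}

// listSamples renders the registry sorted by name so output is stable.
func listSamples(reg map[string]sample) string {
	names := make([]string, 0, len(reg))
	for name := range reg {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%s\t%s\n", name, reg[name].Description)
	}
	return b.String()
}

// validateFlags applies the two rules named in the changelog, plus a lookup
// of the sample name in the registry.
func validateFlags(sampleName string, embeddedEmulator, listOnly bool) error {
	if listOnly && sampleName != "" {
		return errors.New("--list-samples and --sample-database are mutually exclusive")
	}
	if sampleName == "" {
		return nil
	}
	if !embeddedEmulator {
		return errors.New("--sample-database requires --embedded-emulator")
	}
	if _, ok := registry[sampleName]; !ok {
		return fmt.Errorf("unknown sample database %q", sampleName)
	}
	return nil
}

func main() {
	fmt.Print(listSamples(registry))
	fmt.Println(validateFlags("finance-pg", true, false))
	fmt.Println(validateFlags("banking", false, false))
}
```

Keeping the dialect in the registry entry means the caller never has to guess it from the sample name.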

@apstndb apstndb merged commit adf0cdc into main Sep 6, 2025
2 checks passed
@apstndb apstndb deleted the issue-470-sample-database branch September 6, 2025 14:59
@apstndb apstndb added the enhancement New feature or request label Sep 6, 2025


Development

Successfully merging this pull request may close these issues.

Add --sample-database flag for initializing embedded emulator with official samples

2 participants