

@enitrat
Collaborator

@enitrat enitrat commented Jan 4, 2026

Summary

  • Replace pre-summarized file approach with full mdbook build workflow
  • Download and build complete Cairo Book from GitHub releases
  • Remove the quiz-cairo, cairo, and gettext preprocessors from book.toml before building
  • Delete legacy 314KB cairo_book_summary.md file

Context

Previously, CairoBookIngester used a pre-generated summary file (cairo_book_summary.md) that required manual regeneration. This was inconsistent with other mdbook ingesters like StarknetFoundryIngester, which build the full documentation.

This migration:

  1. Downloads the latest Cairo Book release from GitHub
  2. Modifies book.toml to remove problematic preprocessors (quiz-cairo, cairo, gettext)
  3. Builds the complete book with mdbook build
  4. Processes all generated markdown files for full content coverage
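Step 2 could look roughly like the following minimal sketch. It assumes each preprocessor is declared as a [preprocessor.<name>] table in book.toml (mdBook's configuration convention) and uses a simple line-based scan instead of a full TOML parser. removePreprocessors is a hypothetical name; it is not the updateBookConfig() implementation from this PR.

```typescript
// Hypothetical helper (not from the PR): drops whole [preprocessor.<name>] tables,
// including nested [preprocessor.<name>.*] tables, from a book.toml string.
// Line-based sketch: it only recognizes table headers, it does not parse TOML.
function removePreprocessors(toml: string, names: string[]): string {
  // Matches a table header such as [preprocessor.cairo] or [[output.html.redirect]].
  const headerRe = /^\s*\[\[?\s*([^\]]+?)\s*\]\]?\s*(#.*)?$/;
  const kept: string[] = [];
  let skipping = false;

  for (const line of toml.split("\n")) {
    const match = headerRe.exec(line);
    if (match) {
      // Normalize quoted keys such as [preprocessor."quiz-cairo"].
      const table = match[1].replace(/["'\s]/g, "");
      // Start skipping at a targeted table; stop at the next table that is not targeted.
      skipping = names.some(
        (n) => table === `preprocessor.${n}` || table.startsWith(`preprocessor.${n}.`),
      );
    }
    if (!skipping) kept.push(line);
  }
  return kept.join("\n");
}
```

In this sketch, removePreprocessors(text, ["quiz-cairo", "cairo", "gettext"]) keeps [book], [output.*], and any other preprocessor tables untouched. Exact table-name matching is what prevents removing "cairo" from also deleting "quiz-cairo".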

Technical Changes

CairoBookIngester.ts:

  • Added downloadAndExtractRepo() - downloads from GitHub releases
  • Added updateBookConfig() - removes preprocessors from book.toml
  • Added buildMdBook() - builds mdbook with CLI
  • Removed readSummaryFile() and chunkSummaryFile() - no longer needed
  • Removed custom process() override - uses parent class workflow
  • Updated to match StarknetFoundryIngester pattern
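Relying on the parent class workflow instead of a custom process() override follows the template-method pattern. Here is a minimal sketch of that shape. MdBookIngesterSketch, CairoBookIngesterSketch, and the injected fetchRepo, editConfig, and run callbacks are hypothetical; the repository's actual base class and method signatures may differ.

```typescript
import * as path from "node:path";

// Hypothetical command runner, injected so the workflow can be exercised without git or mdbook.
type Runner = (cmd: string, args: string[], cwd?: string) => void;

// Hypothetical base class: it owns process(), so subclasses only implement the source-specific steps.
abstract class MdBookIngesterSketch {
  protected abstract downloadAndExtractRepo(): string; // returns the local book directory
  protected abstract updateBookConfig(bookDir: string): void;
  protected abstract buildMdBook(bookDir: string): void;

  // Template method: every mdbook source runs the same steps in the same order.
  process(): string {
    const bookDir = this.downloadAndExtractRepo();
    this.updateBookConfig(bookDir);
    this.buildMdBook(bookDir);
    return bookDir; // chunking and embedding of the built markdown would follow from here
  }
}

// Hypothetical subclass: note that it does not override process().
class CairoBookIngesterSketch extends MdBookIngesterSketch {
  constructor(
    private readonly targetDir: string,
    private readonly fetchRepo: (dest: string) => void, // release download, later a shallow git clone
    private readonly editConfig: (tomlPath: string) => void, // strips the unwanted preprocessors
    private readonly run: Runner,
  ) {
    super();
  }

  protected downloadAndExtractRepo(): string {
    this.fetchRepo(this.targetDir);
    return this.targetDir;
  }

  protected updateBookConfig(bookDir: string): void {
    this.editConfig(path.join(bookDir, "book.toml"));
  }

  protected buildMdBook(bookDir: string): void {
    // Runs `mdbook build` from the book's root directory.
    this.run("mdbook", ["build"], bookDir);
  }
}
```

Because process() lives only in the base class, another mdbook source would implement the three hooks rather than reimplement the pipeline.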

Cleanup:

  • Deleted python/src/cairo_coder_tools/ingestion/generated/cairo_book_summary.md
  • Verified no orphaned references to removed methods

Benefits

  • ✅ Complete Cairo Book content instead of condensed summaries
  • ✅ Consistent ingestion pattern across all mdbook sources
  • ✅ Automatic updates from latest Cairo Book releases
  • ✅ No manual summary regeneration required
  • ✅ Cleaner codebase without legacy files

Test Plan

  • Run ingestion: cd ingesters && bun run generate-embeddings
  • Verify Cairo Book ingester completes successfully
  • Check vector database for Cairo Book chunks
  • Verify mdbook preprocessors are properly removed during build
  • Confirm no errors related to quiz-cairo/cairo/gettext preprocessors
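For the preprocessor check in the test plan, a small helper could scan the edited book.toml for leftover [preprocessor.<name>] tables before mdbook build runs. This is a hypothetical sketch: listPreprocessors and assertPreprocessorsRemoved are not from the PR, and they use a line-based regex instead of a TOML parser.

```typescript
// Hypothetical helper: names of all preprocessors declared via [preprocessor.<name>] headers.
// Handles bare and double-quoted names; line-based, not a full TOML parser.
function listPreprocessors(toml: string): string[] {
  const names = new Set<string>();
  for (const line of toml.split("\n")) {
    const match = /^\s*\[\s*preprocessor\.("?)([^".\]\s]+)\1/.exec(line);
    if (match) names.add(match[2]);
  }
  return Array.from(names);
}

// The three preprocessors this PR strips from the Cairo Book config.
const REMOVED_PREPROCESSORS = ["quiz-cairo", "cairo", "gettext"];

// Throws if any of the removed preprocessors is still declared.
function assertPreprocessorsRemoved(toml: string): void {
  const leftover = listPreprocessors(toml).filter((n) => REMOVED_PREPROCESSORS.includes(n));
  if (leftover.length > 0) {
    throw new Error(`book.toml still declares: ${leftover.join(", ")}`);
  }
}
```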

Replace pre-summarized file approach with full mdbook build workflow,
matching StarknetFoundryIngester pattern. This provides complete Cairo
Book content instead of a condensed summary.

Changes:
- Download Cairo Book from GitHub releases
- Remove quiz-cairo, cairo, and gettext preprocessors from book.toml
- Build full mdbook and process all generated markdown files
- Remove legacy cairo_book_summary.md file (314KB)
- Remove readSummaryFile() and chunkSummaryFile() methods
- Remove custom process() override in favor of parent class workflow
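Processing all generated markdown files amounts to walking the build output. The following is a minimal sketch, assuming the markdown lands under a single output directory that is passed in as root; where that directory is depends on the renderers enabled in book.toml, so treat any specific path as an assumption. collectMarkdownFiles is a hypothetical helper.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: recursively collects every .md file under the build output directory.
// Symlinks and non-markdown assets are skipped; results are sorted for a deterministic ingestion order.
function collectMarkdownFiles(root: string): string[] {
  const files: string[] = [];
  const walk = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile() && entry.name.endsWith(".md")) {
        files.push(full);
      }
    }
  };
  walk(root);
  return files.sort();
}
```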

Benefits:
- Access to complete Cairo Book content, not just summaries
- Consistent ingestion pattern across all mdbook sources
- Automatic updates from latest Cairo Book releases
@enitrat
Collaborator Author

enitrat commented Jan 4, 2026

[AUTOMATED] Updated the implementation to use git clone from the main branch instead of downloading from GitHub releases. This provides more up-to-date content and simplifies the implementation significantly by removing the need for zip extraction logic.

Replace GitHub release download approach with direct git clone from the
main branch for more up-to-date content and simpler implementation.

Changes:
- Replace axios/AdmZip download logic with git clone --depth 1
- Clone from main branch instead of latest release
- Remove axios and AdmZip dependencies from imports
- Update docstrings to reflect cloning approach
- Simplify downloadAndExtractRepo method significantly

Benefits:
- Always get the latest Cairo Book content from main
- No need to wait for releases
- Simpler implementation without zip extraction logic
- Faster with shallow clone (--depth 1)
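The shallow clone of main could be sketched as follows, assuming git is on PATH. cloneArgs and shallowClone are hypothetical names. Deleting a stale destination directory before cloning is an assumption about how re-runs are handled, not something stated in the PR.

```typescript
import { execFileSync } from "node:child_process";
import * as fs from "node:fs";

// Hypothetical helper: arguments for a shallow clone of a single branch.
// --depth 1 fetches only the latest commit and implies --single-branch.
function cloneArgs(repoUrl: string, dest: string, branch = "main"): string[] {
  return ["clone", "--depth", "1", "--branch", branch, repoUrl, dest];
}

// Hypothetical helper: clones into a fresh directory, removing any copy left by a previous run.
// execFileSync throws if git exits with a non-zero status.
function shallowClone(repoUrl: string, dest: string, branch = "main"): void {
  fs.rmSync(dest, { recursive: true, force: true });
  execFileSync("git", cloneArgs(repoUrl, dest, branch), { stdio: "inherit" });
}
```

Passing the argument list to execFileSync, rather than building a shell string, avoids quoting problems when the destination path contains spaces.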
@enitrat enitrat force-pushed the feat/cairo-book-mdbook-migration branch from 48115db to 180dc64 on January 4, 2026 14:01
@enitrat enitrat merged commit 8c3ded9 into main Jan 4, 2026
3 checks passed
2 participants