-
Notifications
You must be signed in to change notification settings - Fork 0
Docs: Add comprehensive LDP tutorial and directory READMEs #33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Added complete getting started tutorial using tested example code from the examples/ directory, including: - MinIO operations - Spark data processing - Iceberg table management - Airflow workflow orchestration - Production-ready DAG examples Added README files to all key directories explaining: - airflow/ - DAG development and deployment - config/ - Service configurations (Iceberg, etc.) - docker/ - Custom Docker images and dependencies - examples/ - Tested code examples and patterns - scripts/ - Utility scripts for platform management - spark/ - Spark job development and submission - terraform/ - Infrastructure as Code deployment - tests/ - Testing strategies and frameworks All documentation references tested, working code to ensure users have a solid foundation for building their own pipelines. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updated tutorial to follow best practice of copying example code to appropriate locations before running: - Spark jobs: Copy to spark/jobs/ before submitting - Airflow DAGs: Copy to airflow/dags/ before deploying - Scripts: Copy to scripts/ before running This approach: - Keeps examples/ directory clean as reference material - Prevents accidental modifications to tested code - Teaches users the proper workflow - Makes examples/ a template library Added clear warnings and notes throughout the tutorial. Made this rule #1 in Best Practices section. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Clarified that the examples/ directory is: - Optional and can be deleted if users want to start fresh - A reference library and template collection - Not required for LDP platform to function Updated all instructions to emphasize: - Always copy examples to appropriate folders before running - Never run code directly from examples/ - Three approaches: copy & modify, reference only, or delete entirely Added clear table showing where to copy each example type: - Spark jobs → spark/jobs/ - Airflow DAGs → airflow/dags/ - Scripts → scripts/ Made best practice #1: Never run from examples/ directly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added comprehensive Windows support to the tutorial: Platform Support: - ✅ Windows 10/11 with Docker Desktop - ✅ Linux (Ubuntu, Fedora, etc.) - ✅ macOS (Intel and Apple Silicon) Changes: - All commands now shown for both platforms with 💻 and 🐧 icons - Windows PowerShell, Command Prompt, and WSL guidance - File path differences (\ vs /) clearly explained - Platform-specific command table for reference - Windows alternatives for make commands using docker-compose Key additions: - Platform prerequisites section - Windows vs Linux command comparison table - File path handling notes for Windows users - Recommended Windows setup (PowerShell, WSL, Docker Desktop) - Cross-platform troubleshooting guidance Examples now show: - cp vs Copy-Item/copy commands - make commands vs direct docker-compose commands - Platform-specific log viewing This makes LDP accessible to the majority Windows user base while maintaining full support for Linux/macOS developers. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Major corrections to reflect actual LDP architecture: Architecture: - Minikube (local Kubernetes) + Terraform deployment - NOT docker-compose Scripts: - Windows: PowerShell scripts in scripts/windows/ - setup.ps1, start.ps1, stop.ps1, check-health.ps1 - Linux/macOS: Bash scripts in scripts/ - setup.sh, start.sh, stop.sh, check-health.sh Commands: - Removed all docker-compose references - Removed make commands (use actual scripts) - Use kubectl for interacting with services - Show actual script paths throughout Prerequisites: - Minikube, kubectl, Terraform, Helm - Installation instructions for Windows (choco/winget), Linux, macOS Service Access: - Services accessed via Minikube IP + NodePorts - Ports: Airflow=30080, MinIO=30901, Spark=30707, Jupyter=30888 Unified commands where possible: - kubectl commands work same on all platforms - Only platform-specific: copy files and run setup/start scripts This now accurately reflects the actual deployment model and gives Windows users (majority) proper PowerShell guidance. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added prominent Getting Started Tutorial section: - Link to comprehensive tutorial as primary learning resource - Lists all topics covered (setup, MinIO, Spark, Iceberg, Airflow) - Emphasizes tested, ready-to-use code - Positioned before Quick Start for visibility Reorganized Documentation section: - Added "Getting Started" category with tutorial first - Added "Understanding LDP" category - Added "Operations & Deployment" category - Added "Directory READMEs" section linking to all folder docs - Made tutorial the "START HERE" resource Added "Recent Updates" section: - December 2024 major documentation update - Dependency updates (s3fs/fsspec fix, Python 3.13, Airflow 3.1.5, etc.) - Documentation improvements (cross-platform, Windows PowerShell, etc.) - Cleanup notes (removed Hive, clarified examples/ as optional) Platform ordering: - Windows listed first (majority of users) - Then macOS and Linux This makes the tutorial the main entry point for new users and provides visibility into recent platform improvements. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Added complete getting started tutorial using tested example code from the examples/ directory, including:
Added README files to all key directories explaining:
All documentation references tested, working code to ensure users have a solid foundation for building their own pipelines.
🤖 Generated with Claude Code