@klagrida
Contributor

Added complete getting started tutorial using tested example code from the examples/ directory, including:

  • MinIO operations
  • Spark data processing
  • Iceberg table management
  • Airflow workflow orchestration
  • Production-ready DAG examples

Added README files to all key directories explaining:

  • airflow/ - DAG development and deployment
  • config/ - Service configurations (Iceberg, etc.)
  • docker/ - Custom Docker images and dependencies
  • examples/ - Tested code examples and patterns
  • scripts/ - Utility scripts for platform management
  • spark/ - Spark job development and submission
  • terraform/ - Infrastructure as Code deployment
  • tests/ - Testing strategies and frameworks

All documentation references tested, working code to ensure users have a solid foundation for building their own pipelines.

🤖 Generated with Claude Code

klagrida and others added 6 commits December 20, 2025 13:20
Added complete getting started tutorial using tested example code
from the examples/ directory, including:
- MinIO operations
- Spark data processing
- Iceberg table management
- Airflow workflow orchestration
- Production-ready DAG examples

Added README files to all key directories explaining:
- airflow/ - DAG development and deployment
- config/ - Service configurations (Iceberg, etc.)
- docker/ - Custom Docker images and dependencies
- examples/ - Tested code examples and patterns
- scripts/ - Utility scripts for platform management
- spark/ - Spark job development and submission
- terraform/ - Infrastructure as Code deployment
- tests/ - Testing strategies and frameworks

All documentation references tested, working code to ensure
users have a solid foundation for building their own pipelines.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Updated tutorial to follow best practice of copying example code
to appropriate locations before running:
- Spark jobs: Copy to spark/jobs/ before submitting
- Airflow DAGs: Copy to airflow/dags/ before deploying
- Scripts: Copy to scripts/ before running

This approach:
- Keeps examples/ directory clean as reference material
- Prevents accidental modifications to tested code
- Teaches users the proper workflow
- Makes examples/ a template library

Added clear warnings and notes throughout the tutorial.
Made this rule #1 in the Best Practices section.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Clarified that the examples/ directory is:
- Optional and can be deleted if users want to start fresh
- A reference library and template collection
- Not required for the LDP platform to function

Updated all instructions to emphasize:
- Always copy examples to appropriate folders before running
- Never run code directly from examples/
- Three approaches: copy & modify, reference only, or delete entirely

Added clear table showing where to copy each example type:
- Spark jobs → spark/jobs/
- Airflow DAGs → airflow/dags/
- Scripts → scripts/
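
The copy-before-run mapping above can be sketched in Python, which behaves the same on Windows, Linux, and macOS. This is a minimal illustration and not code from the repository: the `DESTINATIONS` table, its keys, and the `copy_example` helper are hypothetical names; only the three destination folders come from the list above.

```python
import shutil
from pathlib import Path

# Destination folders come from the list above; the keys are hypothetical labels.
DESTINATIONS = {
    "spark_job": Path("spark/jobs"),
    "airflow_dag": Path("airflow/dags"),
    "script": Path("scripts"),
}


def copy_example(example: Path, kind: str, repo_root: Path = Path(".")) -> Path:
    """Copy one file from examples/ into its working folder and return the new path."""
    target_dir = repo_root / DESTINATIONS[kind]
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 leaves the original under examples/ untouched and keeps its timestamps.
    return Path(shutil.copy2(example, target_dir / example.name))
```

Edits then go into the copy, so the file under examples/ stays available as an unmodified reference.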

Made best practice #1: Never run from examples/ directly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Added comprehensive Windows support to the tutorial:

Platform Support:
- ✅ Windows 10/11 with Docker Desktop
- ✅ Linux (Ubuntu, Fedora, etc.)
- ✅ macOS (Intel and Apple Silicon)

Changes:
- All commands now shown for both platforms with 💻 and 🐧 icons
- Windows PowerShell, Command Prompt, and WSL guidance
- File path differences (\ vs /) clearly explained
- Platform-specific command table for reference
- Windows alternatives for make commands using docker-compose

Key additions:
- Platform prerequisites section
- Windows vs Linux command comparison table
- File path handling notes for Windows users
- Recommended Windows setup (PowerShell, WSL, Docker Desktop)
- Cross-platform troubleshooting guidance

Examples now show:
- cp vs Copy-Item/copy commands
- make commands vs direct docker-compose commands
- Platform-specific log viewing

This makes LDP accessible to Windows users, who make up the majority
of the user base, while maintaining full support for Linux/macOS developers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Major corrections to reflect actual LDP architecture:

Architecture:
- Minikube (local Kubernetes) + Terraform deployment
- NOT docker-compose

Scripts:
- Windows: PowerShell scripts in scripts/windows/
  - setup.ps1, start.ps1, stop.ps1, check-health.ps1
- Linux/macOS: Bash scripts in scripts/
  - setup.sh, start.sh, stop.sh, check-health.sh
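
As a rough sketch of the script layout just listed, the helper below maps an action to the matching script path for the current operating system. The `script_path` function and the `ACTIONS` tuple are hypothetical; only the directory names and script file names are taken from the list above.

```python
import platform
from typing import Optional

# Script base names from the list above.
ACTIONS = ("setup", "start", "stop", "check-health")


def script_path(action: str, system: Optional[str] = None) -> str:
    """Return the repo-relative script for an action on the given OS."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    # platform.system() reports "Windows", "Linux", or "Darwin" (macOS).
    system = system or platform.system()
    if system == "Windows":
        return f"scripts/windows/{action}.ps1"
    # Linux and macOS share the Bash scripts in scripts/.
    return f"scripts/{action}.sh"
```

For example, `script_path("setup", "Windows")` gives `scripts/windows/setup.ps1`, while `script_path("setup", "Linux")` gives `scripts/setup.sh`.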

Commands:
- Removed all docker-compose references
- Removed make commands (use actual scripts)
- Use kubectl for interacting with services
- Show actual script paths throughout

Prerequisites:
- Minikube, kubectl, Terraform, Helm
- Installation instructions for Windows (choco/winget), Linux, macOS
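
A quick way to confirm these prerequisites before running the setup script might look like the sketch below. It assumes each tool is installed under its usual lowercase command name (`minikube`, `kubectl`, `terraform`, `helm`); the `missing_tools` helper is hypothetical and not part of the repository.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumption: each prerequisite is on PATH under its usual lowercase command name.
REQUIRED_TOOLS = ("minikube", "kubectl", "terraform", "helm")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing: " + ", ".join(missing) if missing else "All prerequisites found.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; by default it uses `shutil.which`, which works on Windows, Linux, and macOS.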

Service Access:
- Services accessed via Minikube IP + NodePorts
- Ports: Airflow=30080, MinIO=30901, Spark=30707, Jupyter=30888
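
To illustrate the access model, the sketch below builds one URL per service from the Minikube IP and the NodePorts listed above. The `service_urls` helper is hypothetical, the `http://` scheme is an assumption, and the IP is expected to come from running `minikube ip`.

```python
from typing import Dict

# NodePorts as listed above.
NODE_PORTS = {"airflow": 30080, "minio": 30901, "spark": 30707, "jupyter": 30888}


def service_urls(minikube_ip: str) -> Dict[str, str]:
    """Build one URL per service; plain HTTP is an assumption, not confirmed by the source."""
    return {name: f"http://{minikube_ip}:{port}" for name, port in NODE_PORTS.items()}
```

With an example address of `192.168.49.2`, `service_urls("192.168.49.2")["airflow"]` evaluates to `http://192.168.49.2:30080`.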

Unified commands where possible:
- kubectl commands work the same on all platforms
- Only copying files and running the setup/start scripts are platform-specific

This now accurately reflects the actual deployment model
and gives Windows users (majority) proper PowerShell guidance.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Added prominent Getting Started Tutorial section:
- Link to comprehensive tutorial as primary learning resource
- Lists all topics covered (setup, MinIO, Spark, Iceberg, Airflow)
- Emphasizes tested, ready-to-use code
- Positioned before Quick Start for visibility

Reorganized Documentation section:
- Added "Getting Started" category with tutorial first
- Added "Understanding LDP" category
- Added "Operations & Deployment" category
- Added "Directory READMEs" section linking to all folder docs
- Made tutorial the "START HERE" resource

Added "Recent Updates" section:
- December 2024 major documentation update
- Dependency updates (s3fs/fsspec fix, Python 3.13, Airflow 3.1.5, etc.)
- Documentation improvements (cross-platform, Windows PowerShell, etc.)
- Cleanup notes (removed Hive, clarified examples/ as optional)

Platform ordering:
- Windows listed first (majority of users)
- Then macOS and Linux

This makes the tutorial the main entry point for new users
and provides visibility into recent platform improvements.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@klagrida klagrida merged commit f311bf2 into main Dec 20, 2025
15 checks passed
@klagrida klagrida deleted the docs/ldp-tutorial-and-readmes branch December 20, 2025 12:44