Lightweight, edge-first pipeline for collecting Enviro+ sensor data on Raspberry Pi and syncing it directly to cloud object storage. Data is stored as Parquet with Hive partitioning for fast analytics using DuckDB-WASM.

OpenSensor Enviroplus

Modern, CLI-based environmental sensor data collector using Polars, Apache Arrow, and Hive-partitioned Parquet for Raspberry Pi Enviro+.

Part of the OpenSensor.Space network for open environmental data.

Features

  • UUID v7 Station IDs: Time-ordered UUIDs for better database performance
  • Modern Stack: Polars streaming, Apache Arrow, Hive-partitioned Parquet
  • Memory Efficient: Optimized for Raspberry Pi with limited RAM
  • CLI-First: Simple Python commands replace bash scripts
  • Smart Logging: Rich console output for easy debugging
  • Cloud Sync: Multi-provider sync using obstore (S3, R2, GCS, Azure, MinIO, Wasabi, Backblaze, Hetzner)
  • Prefix-based IAM: S3 bucket access control per station
  • Type Safe: Pydantic settings with validation
  • Production Ready: Graceful error handling, automatic retries
  • Browser-queryable: DuckDB-wasm compatible Parquet output
  • Temperature & Humidity Compensation: CPU heat correction using Pimoroni's dewpoint formula (applied in collector and CLI test)
  • System Health Monitoring: Optional CPU, memory, disk, WiFi signal, NTP sync tracking
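
The UUID v7 station IDs above embed a 48-bit Unix-millisecond timestamp in the most significant bits, which is why they sort by creation time and index well in databases. The project relies on the uuid6 package's RFC 9562 implementation; as a rough stdlib-only illustration of the layout (a sketch, not the package's actual code):

```python
import os
import time
import uuid


def uuid7_sketch() -> uuid.UUID:
    """Illustrative UUID v7: 48-bit Unix-ms timestamp, then version and
    variant bits, then random bits. The project itself uses the uuid6 package."""
    ts_ms = time.time_ns() // 1_000_000
    value = (ts_ms & 0xFFFFFFFFFFFF) << 80          # timestamp in top 48 bits
    value |= int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (value & ~(0xF << 76)) | (0x7 << 76)    # version nibble = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)    # RFC variant bits = 10
    return uuid.UUID(int=value)


earlier = uuid7_sketch()
time.sleep(0.002)
later = uuid7_sketch()
assert str(earlier) < str(later)  # IDs generated later sort after earlier ones
```

Because the timestamp leads, new station IDs land near each other in any index sorted by ID, avoiding the random-insert churn of UUID v4.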

Quick Start

One-Line Install (Recommended)

curl -LsSf https://raw.githubusercontent.com/walkthru-earth/opensensor-enviroplus/main/install.sh | sudo bash

This installs system dependencies, UV, and opensensor-enviroplus, and configures permissions. After rebooting:

cd ~/opensensor
opensensor setup                  # Configure station
opensensor test                   # Verify sensors
sudo opensensor service setup     # Install as service

Manual Installation

Prerequisites

# Update system packages
sudo apt-get update

# Install git and required system libraries
sudo apt-get install -y git python3-dev python3-cffi libportaudio2

# Enable I2C and SPI interfaces (required for sensors)
sudo raspi-config nonint do_i2c 0
sudo raspi-config nonint do_spi 0

# Install UV package manager (fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

Note: The I2C interface is required for BME280, LTR559, and gas sensors. SPI is needed for the LCD display. A reboot may be required after enabling these interfaces.

Install from PyPI

uv tool install opensensor-enviroplus

Or install from source

git clone https://github.com/walkthru-earth/opensensor-enviroplus.git
cd opensensor-enviroplus
uv sync

Fix Permissions (reboot required)

sudo $(which opensensor) fix-permissions
sudo reboot

Configure and Start

opensensor setup                  # Configure station ID
opensensor test                   # Verify sensors work
sudo opensensor service setup     # Install & start as service

CLI Commands

Command                      Description
opensensor setup             Interactive configuration wizard
opensensor test              Test sensors with live readings table
opensensor info              Show config, sensors, data stats, service status
opensensor start             Run collector in foreground (for debugging)
opensensor sync              Manual sync to cloud storage
opensensor --version         Show package version
opensensor fix-permissions   Fix serial port permissions (sudo required)

Service Commands

Command                      Description
opensensor service setup     Install, enable, and start service
opensensor service status    Show service status
opensensor service logs -f   Follow live logs
opensensor service stop      Stop the service
opensensor service restart   Restart the service
opensensor service remove    Completely remove service

Configuration

Configuration via .env file (auto-generated by opensensor setup):

# Station identification (UUID v7 - auto-generated)
OPENSENSOR_STATION_ID=019ab383-d789-74e2-a460-bb92b1c13681

# Data collection
OPENSENSOR_READ_INTERVAL=5              # Seconds between sensor reads
OPENSENSOR_BATCH_DURATION=900           # 15-minute batches

# Temperature/humidity compensation (for Raspberry Pi CPU heat)
OPENSENSOR_TEMP_COMPENSATION_ENABLED=true
OPENSENSOR_TEMP_COMPENSATION_FACTOR=2.25  # Pimoroni's official factor
OPENSENSOR_PMS5003_DEVICE=/dev/serial0  # Serial port for PMS5003

# Health monitoring (CPU, memory, disk, WiFi, NTP sync)
OPENSENSOR_HEALTH_ENABLED=true

# Health Data Storage (Optional - for separate storage)
# OPENSENSOR_HEALTH_STORAGE_PROVIDER=gcs
# OPENSENSOR_HEALTH_STORAGE_BUCKET=my-health-bucket

# Output settings
OPENSENSOR_OUTPUT_DIR=output
OPENSENSOR_COMPRESSION=zstd             # Efficient compression (snappy, zstd, gzip)

# Cloud sync (optional)
OPENSENSOR_SYNC_ENABLED=true
OPENSENSOR_SYNC_INTERVAL_MINUTES=15

# Storage provider (s3, r2, gcs, azure, minio, wasabi, backblaze, hetzner)
OPENSENSOR_STORAGE_PROVIDER=s3
OPENSENSOR_STORAGE_BUCKET=my-sensor-bucket
OPENSENSOR_STORAGE_PREFIX=sensors/station-019ab383  # For IAM scoping
OPENSENSOR_STORAGE_REGION=us-west-2

# Provider credentials (see .env.example for all providers)
OPENSENSOR_AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
OPENSENSOR_AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

# Logging
OPENSENSOR_LOG_LEVEL=INFO
OPENSENSOR_LOG_DIR=logs

See .env.example for a complete template with all provider configurations and IAM policy examples.
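
The compensation factor configured above follows the approach used in Pimoroni's Enviro+ examples: the BME280 sits close to the Pi's CPU and reads warm, so the corrected temperature subtracts a fraction of the CPU-to-sensor difference. A minimal sketch of that formula (the collector's actual implementation may also average the CPU temperature over several readings, and adjusts humidity to preserve the dewpoint):

```python
def compensate_temperature(raw_c: float, cpu_c: float, factor: float = 2.25) -> float:
    """CPU-heat correction in the style of Pimoroni's Enviro+ examples:
    subtract a fraction of the CPU-to-sensor temperature difference."""
    return raw_c - ((cpu_c - raw_c) / factor)


# Sensor reads 28.0 degC while the CPU sits at 55.0 degC:
print(compensate_temperature(28.0, 55.0))  # -> 16.0
```

A larger factor means less correction; 2.25 is Pimoroni's recommended starting point, tunable per enclosure via OPENSENSOR_TEMP_COMPENSATION_FACTOR.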

Supported Cloud Storage Providers

Provider    Type                      Notes
s3          AWS S3                    Native support
r2          Cloudflare R2             S3-compatible, no egress fees
gcs         Google Cloud Storage      Native support
azure       Azure Blob Storage        Native support
minio       MinIO                     S3-compatible, self-hosted
wasabi      Wasabi                    S3-compatible, affordable
backblaze   Backblaze B2              S3-compatible
hetzner     Hetzner Object Storage    S3-compatible
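
The prefix-based IAM feature pairs with policies along these lines. This is a hedged sketch for AWS S3, reusing the example bucket and prefix from the configuration above; see .env.example for the project's actual policy examples:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "StationReadWriteOwnPrefix",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::my-sensor-bucket/sensors/station-019ab383/*"
    },
    {
      "Sid": "StationListOwnPrefix",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-sensor-bucket",
      "Condition": {
        "StringLike": { "s3:prefix": "sensors/station-019ab383/*" }
      }
    }
  ]
}
```

Scoped this way, a compromised station can only touch its own prefix, never other stations' data.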

Architecture

Data Flow

Sensors (5s) -> Polars Collector -> Hive-Partitioned Parquet (15min) -> S3/MinIO (obstore)
                    ↓
              Health Metrics (~1min) -> Separate Parquet (output-health/) -> S3/GCS (configurable)

Output Format (Hive-Partitioned Parquet)

output/                                           # Sensor data
  station=019ab383-d789-74e2-a460-bb92b1c13681/
    year=2025/
      month=11/
        day=24/
          data_1430.parquet  # Batch written at 14:30
          data_1445.parquet  # Batch written at 14:45

output-health/                                    # System health (optional)
  station=019ab383-d789-74e2-a460-bb92b1c13681/
    year=2025/
      month=11/
        day=24/
          health_1430.parquet  # ~15 health records per batch

Benefits:

  • Browser-queryable with DuckDB-wasm
  • Partition pruning for fast time-range queries
  • Simple, universal format (no proprietary transaction logs)
  • Perfect for append-only time-series data
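
Since the layout above is plain Hive-partitioned Parquet, any DuckDB client, including DuckDB-WASM in a browser, can query it directly. A hedged sketch, assuming a publicly readable bucket and illustrative column names (timestamp, pm2_5) that may differ from the collector's actual schema:

```sql
-- Daily PM2.5 averages for one station in November 2025. The station/year/
-- month/day filters match Hive partition keys, so DuckDB prunes partitions
-- and reads only the files under matching directories.
SELECT year, month, day, avg(pm2_5) AS avg_pm2_5
FROM read_parquet(
  's3://my-sensor-bucket/sensors/station-019ab383/**/*.parquet',
  hive_partitioning = true
)
WHERE station = '019ab383-d789-74e2-a460-bb92b1c13681'
  AND year = '2025' AND month = '11'
GROUP BY year, month, day
ORDER BY day;
```

This is the partition-pruning benefit in practice: a one-day query touches at most a handful of 15-minute batch files instead of scanning the whole bucket.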

See ARCHITECTURE.md for detailed diagrams and scalability analysis.

Development

# Install with dev dependencies
uv sync --group dev

# Format code
uv run ruff format .

# Lint code
uv run ruff check .

# Run with UV (no venv activation needed)
uv run opensensor --help

Tech Stack

  • Python 3.10+ - Modern Python with type hints
  • UV - Fast Rust-based package manager (10-100x faster than pip)
  • Polars 1.35+ - High-performance DataFrames with streaming
  • PyArrow 22+ - Columnar memory format (zero-copy operations)
  • uuid6 - RFC 9562 UUID v7 implementation
  • obstore - Rust-powered object storage (S3/GCS/Azure)
  • Pydantic Settings - Type-safe configuration
  • Typer + Rich - Beautiful CLI with auto-completion
  • Ruff - Extremely fast Python linter and formatter

License

MIT License - see LICENSE file for details

Credits

Built by the walkthru.earth team for the OpenSensor.Space network.

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Run tests and linting (uv run ruff check .)
  4. Commit your changes (git commit -m 'feat: add amazing feature')
  5. Push to the branch (git push origin feature/amazing-feature)
  6. Open a Pull Request
