76 changes: 76 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,76 @@
name: CI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: ["3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"

      - name: Lint with ruff
        run: |
          ruff check src tests

      - name: Check formatting with black
        run: |
          black --check src tests

      - name: Type check with mypy
        run: |
          mypy src --ignore-missing-imports

      - name: Run tests with pytest
        run: |
          pytest tests/ -v --cov=tablediff_arrow --cov-report=xml

      - name: Upload coverage to Codecov
        if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.11'
        uses: codecov/codecov-action@v4
        with:
          file: ./coverage.xml
          fail_ci_if_error: false

  build:
    runs-on: ubuntu-latest
    needs: test

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install build dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build

      - name: Build package
        run: python -m build

      - name: Check package
        run: |
          pip install twine
          twine check dist/*
31 changes: 31 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,31 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
      - id: check-json
      - id: check-toml
      - id: check-merge-conflict
      - id: debug-statements

  - repo: https://github.com/psf/black
    rev: 23.12.1
    hooks:
      - id: black
        language_version: python3.10

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.1.11
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.8.0
    hooks:
      - id: mypy
        additional_dependencies: [types-all]
        args: [--ignore-missing-imports]
36 changes: 36 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,36 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.0] - 2025-10-13

### Added
- Initial release of tablediff-arrow
- Core table comparison functionality with keyed comparisons
- Support for Parquet, CSV, and Arrow IPC file formats
- Support for local and S3 file paths
- Numeric tolerances (absolute and relative) for comparisons
- HTML report generation with styled output
- CSV report generation with separate files for changes, left-only, and right-only rows
- Command-line interface (CLI) with comprehensive options
- Python library API for programmatic use
- Comprehensive test suite with 86% coverage
- Pre-commit hooks for code quality
- GitHub Actions CI workflow for automated testing
- Support for Python 3.10+
- MIT License
- Documentation and examples

### Features
- **Fast Performance**: Built on Apache Arrow for efficient data processing
- **Multiple Formats**: Parquet, CSV, and Arrow IPC support
- **S3 Support**: Optional S3 filesystem integration
- **Flexible Comparisons**: Single or multiple key columns
- **Numeric Tolerances**: Configure absolute and relative tolerances per column
- **Rich Reports**: Generate HTML and CSV reports with detailed differences
- **CLI and Library**: Use as a command-line tool or Python library

[0.1.0]: https://github.com/psmman/tablediff-arrow/releases/tag/v0.1.0
203 changes: 203 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,203 @@
# Contributing to tablediff-arrow

Thank you for your interest in contributing to tablediff-arrow! This document provides guidelines and instructions for contributing to the project.

## Getting Started

### Prerequisites

- Python 3.10 or higher
- Git
- pip

### Setting Up Development Environment

1. Fork and clone the repository:
   ```bash
   git clone https://github.com/YOUR_USERNAME/tablediff-arrow.git
   cd tablediff-arrow
   ```

2. Install the package in development mode with all dependencies:
   ```bash
   pip install -e ".[dev]"
   ```

3. Install pre-commit hooks:
   ```bash
   pre-commit install
   ```

## Development Workflow

### Running Tests

Run all tests:
```bash
pytest
```

Run tests with coverage:
```bash
pytest --cov=tablediff_arrow --cov-report=html
```

Run a specific test file:
```bash
pytest tests/test_compare.py
```

Run a specific test:
```bash
pytest tests/test_compare.py::test_identical_tables
```

### Code Quality

#### Formatting

Format code with Black:
```bash
black src tests
```

#### Linting

Lint code with Ruff:
```bash
ruff check src tests
```

Fix auto-fixable issues:
```bash
ruff check --fix src tests
```

#### Type Checking

Run type checking with mypy:
```bash
mypy src --ignore-missing-imports
```

### Pre-commit Hooks

Pre-commit hooks run automatically when you commit. To run manually:
```bash
pre-commit run --all-files
```

## Making Changes

### Branch Naming

Use descriptive branch names:
- `feature/add-new-format-support`
- `fix/handle-nan-values`
- `docs/update-examples`

### Commit Messages

Follow the Conventional Commits format:
- `feat: add support for JSON format`
- `fix: handle NaN values in comparisons`
- `docs: update README with new examples`
- `test: add tests for S3 functionality`
- `refactor: simplify comparison logic`
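
A commit subject line can be checked against this format mechanically. The sketch below is a hypothetical helper, not part of the repository. It accepts only the five types listed above, plus the optional scope and `!` marker that the Conventional Commits specification allows.

```python
import re

# Hypothetical helper, not part of tablediff-arrow.
# Types are limited to the ones listed in this guide.
ALLOWED_TYPES = ("feat", "fix", "docs", "test", "refactor")

# Shape: <type>[(scope)][!]: <description>
SUBJECT_RE = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional(subject: str) -> bool:
    """Return True if a commit subject line matches the pattern above."""
    return SUBJECT_RE.match(subject) is not None
```

For example, `is_conventional("fix: handle NaN values in comparisons")` passes, while a subject such as `"Added stuff"` does not.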

### Pull Request Process

1. Create a new branch for your changes
2. Make your changes and add tests
3. Ensure all tests pass and code is properly formatted
4. Update documentation if needed
5. Push your branch and create a pull request
6. Wait for review and address any feedback

## Code Style Guidelines

### Python Style

- Follow PEP 8
- Use type hints where appropriate
- Maximum line length: 100 characters
- Use meaningful variable and function names

### Documentation

- Add docstrings to all public functions and classes
- Use Google-style docstrings
- Update README for user-facing changes
- Add examples for new features
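
As an illustration of the docstring and type-hint conventions above, here is a minimal sketch. The function `within_tolerance` is hypothetical and exists only to show the layout; it is not the project's comparison logic.

```python
import math


def within_tolerance(
    left: float,
    right: float,
    abs_tol: float = 0.0,
    rel_tol: float = 0.0,
) -> bool:
    """Check whether two numeric values are equal within a tolerance.

    Hypothetical example that illustrates Google-style docstrings; it is
    not taken from the tablediff-arrow source.

    Args:
        left: Value from the left table.
        right: Value from the right table.
        abs_tol: Maximum allowed absolute difference.
        rel_tol: Maximum allowed difference relative to the larger magnitude.

    Returns:
        True if the difference is within either tolerance.

    Raises:
        ValueError: If a tolerance is negative.
    """
    if abs_tol < 0 or rel_tol < 0:
        raise ValueError("tolerances must be non-negative")
    # math.isclose passes when the difference satisfies either bound.
    return math.isclose(left, right, rel_tol=rel_tol, abs_tol=abs_tol)
```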

### Testing

- Write tests for all new features
- Maintain or improve test coverage
- Use pytest fixtures for test data
- Test edge cases and error conditions

## Project Structure

```
tablediff-arrow/
├── src/
│   └── tablediff_arrow/
│       ├── __init__.py      # Package initialization
│       ├── cli.py           # Command-line interface
│       ├── compare.py       # Core comparison logic
│       ├── loader.py        # Data loading utilities
│       └── reports.py       # Report generation
├── tests/
│   ├── test_cli.py          # CLI tests
│   ├── test_compare.py      # Comparison tests
│   ├── test_loader.py       # Loader tests
│   └── test_reports.py      # Report tests
├── .github/
│   └── workflows/
│       └── ci.yml           # CI/CD configuration
├── pyproject.toml           # Project configuration
└── README.md                # Project documentation
```

## Adding New Features

### Adding a New File Format

1. Update `loader.py` to handle the new format
2. Add tests in `tests/test_loader.py`
3. Update CLI to support format selection
4. Update documentation
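
Step 1 often comes down to mapping a file extension to a loader function. The sketch below shows one possible shape for that dispatch. Every name in it (`LOADERS`, `register`, `load_table`, the `.jsonl` format) is hypothetical, and it returns plain lists of dicts to stay dependency-free. The real `loader.py` may be organized differently.

```python
import csv
import json
from pathlib import Path
from typing import Callable

# Hypothetical sketch, not the project's loader.py.
# Maps a lowercase file extension to the function that reads it.
LOADERS: dict[str, Callable[[Path], list[dict]]] = {}


def register(extension: str):
    """Register a loader function for a file extension such as ".csv"."""

    def decorator(func):
        LOADERS[extension] = func
        return func

    return decorator


@register(".csv")
def load_csv(path: Path) -> list[dict]:
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


@register(".jsonl")  # the newly added format in this example
def load_jsonl(path: Path) -> list[dict]:
    with path.open() as handle:
        return [json.loads(line) for line in handle if line.strip()]


def load_table(path: str) -> list[dict]:
    """Pick a loader by file extension and fail clearly if none exists."""
    file_path = Path(path)
    try:
        loader = LOADERS[file_path.suffix.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {file_path.suffix!r}") from None
    return loader(file_path)
```

With a registry like this, a new format needs only one new function, and a test in `tests/test_loader.py` can cover both the new loader and the unsupported-extension error.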

### Adding a New Report Format

1. Create a new function in `reports.py`
2. Add tests in `tests/test_reports.py`
3. Update CLI to support new format
4. Update documentation
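
As a rough illustration of step 1, a JSON report function might look like the sketch below. The name `write_json_report` and its signature are assumptions: the existing functions in `reports.py` may take a different result object. The three row categories mirror those of the CSV report (changes, left-only, right-only).

```python
import json
from pathlib import Path


def write_json_report(
    changed: list[dict],
    left_only: list[dict],
    right_only: list[dict],
    output_path: str,
) -> Path:
    """Write the three row categories and their counts to one JSON file.

    Hypothetical signature used for illustration only.
    """
    report = {
        "summary": {
            "changed": len(changed),
            "left_only": len(left_only),
            "right_only": len(right_only),
        },
        "changed": changed,
        "left_only": left_only,
        "right_only": right_only,
    }
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # default=str avoids failures on values such as dates or decimals.
    path.write_text(json.dumps(report, indent=2, default=str), encoding="utf-8")
    return path
```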

## Debugging

### Running with Debug Output

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
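
The snippet above enables debug output for every library in the process. To limit it to this package, a narrower configuration can help. This sketch assumes the package's modules create their loggers with `logging.getLogger(__name__)`, which the repository does not confirm.

```python
import logging

# Keep third-party libraries at WARNING and above.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Assumption: package modules use logging.getLogger(__name__), so their
# loggers all sit under the "tablediff_arrow" name.
logging.getLogger("tablediff_arrow").setLevel(logging.DEBUG)
```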

### Interactive Testing

```bash
python -i examples.py
```

## Getting Help

- Create an issue for bugs or feature requests
- Check existing issues before creating new ones
- Provide detailed information in issues
- Be respectful and constructive

## License

By contributing to tablediff-arrow, you agree that your contributions will be licensed under the MIT License.