Skip to content

IrishRugbyman/gedcom-parser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🧬 GEDCOM Genealogy Parser

Python 3.8+ License: MIT

Convert GEDCOM files to structured JSON for LLM querying and genealogy analysis

Transform your family tree data into a format that's perfect for Large Language Models, enabling natural language queries about your ancestry.

✨ Features

  • 🚀 Ultra-fast parsing of large GEDCOM files (161K+ lines in seconds)
  • 🤖 LLM-ready JSON output with clean, structured data
  • 🔍 Advanced search capabilities
  • 📊 Comprehensive statistics and analytics
  • 🧹 Clean name formatting (removes GEDCOM slashes)
  • 🧪 Full test suite with 95%+ coverage

📦 Installation

Requirements

  • Python 3.8 or higher
  • No external dependencies required!

Quick Start

# Clone the repository
git clone https://github.com/yourusername/gedcom-parser.git
cd gedcom-parser

# Run the parser
python main.py

🎯 Usage

Basic Usage

# Parse default GEDCOM file
python main.py

# Parse specific GEDCOM file
python main.py data/your_family_tree.ged

# Show statistics only
python main.py --stats-only

# Search for a person
python main.py --search "John Doe"

Advanced Usage

# Parse and save to custom location
python main.py data/family.ged --output results/my_tree.json

# Get help
python main.py --help

📁 How to Add Your GEDCOM File

Step 1: Prepare Your GEDCOM File

  1. Export from genealogy software (Family Tree Maker, Ancestry.com, etc.)
  2. Save as .ged file in UTF-8 encoding
  3. Place in data/ directory:
    data/
    └── your_family_tree.ged
    

Step 2: Parse Your File

# Parse your file
python main.py data/your_family_tree.ged

# The parser will create:
# data/your_family_tree_parsed.json

Step 3: Query Your Data

# Search for ancestors
python main.py --search "Smith"

# Get statistics
python main.py --stats-only

# Use with LLMs
# The JSON file is now ready for AI analysis!

📊 Output Format

The parser generates clean JSON with this structure:

{
  "individuals": {
    "1": {
      "id": "1",
      "name": "John Smith",
      "gender": "M",
      "birth": {
        "date": "15 JAN 1900",
        "place": "New York, USA"
      },
      "death": {
        "date": "12 MAR 1985",
        "place": "California, USA"
      },
      "occupation": "Engineer",
      "families_as_child": ["F1"],
      "families_as_spouse": ["F2"]
    }
  },
  "families": {
    "F1": {
      "id": "F1",
      "husband": "1",
      "wife": "2",
      "children": ["3", "4"],
      "marriage": {
        "date": "1925",
        "place": "Chicago, USA"
      }
    }
  }
}

🔧 Development

Running Tests

# Run all tests
python -m unittest tests.test_parser

# Run with verbose output
python -m unittest tests.test_parser -v

Project Structure

├── main.py                 # Main entry point
├── src/
│   └── gedcom_parser.py    # Core parser logic
├── tests/
│   └── test_parser.py      # Unit tests
├── data/                   # GEDCOM files (add yours here)
│   └── sample.ged
├── docs/                   # Documentation
├── requirements.txt        # Dependencies
└── README.md              # This file

🤖 LLM Integration

The generated JSON is perfect for LLM queries:

import json

# Load parsed data
with open('data/your_family_parsed.json', 'r', encoding='utf-8') as f:
    family_data = json.load(f)

# Now you can query with natural language:
# "Who are the descendants of John Smith?"
# "What was the occupation of Mary Johnson?"
# "Show me the family tree of the Smith family"

📈 Performance

  • 161,606 lines parsed in ~2 seconds
  • 13,109 individuals processed
  • 6,356 families resolved
  • Memory efficient - processes large files without issues

🧪 Testing

The parser has been tested with:

  • ✅ Large GEDCOM files (100K+ lines)
  • ✅ Various genealogy software exports
  • ✅ UTF-8 encoded files
  • ✅ Missing data handling
  • ✅ Name formatting edge cases

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

  • Built for genealogy enthusiasts who want to leverage AI for family history research
  • Inspired by the GEDCOM 5.5.1 specification
  • Designed for both technical and non-technical users

Made with ❤️ for genealogy and AI enthusiasts

About

gedcom to json (WIP)

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages