🧬 GEDCOM Genealogy Parser

Convert GEDCOM files to structured JSON for LLM querying and genealogy analysis

Transform your family tree data into a format that's perfect for Large Language Models, enabling natural language queries about your ancestry.

✨ Features

🚀 Ultra-fast parsing of large GEDCOM files (161K+ lines in seconds)
🤖 LLM-ready JSON output with clean, structured data
🔍 Advanced search capabilities
📊 Comprehensive statistics and analytics
🧹 Clean name formatting (removes GEDCOM slashes)
🧪 Full test suite with 95%+ coverage

📦 Installation

Requirements

Python 3.8 or higher
No external dependencies required!

Quick Start

# Clone the repository
git clone https://github.com/yourusername/gedcom-parser.git
cd gedcom-parser

# Run the parser
python main.py

🎯 Usage

Basic Usage

# Parse default GEDCOM file
python main.py

# Parse specific GEDCOM file
python main.py data/your_family_tree.ged

# Show statistics only
python main.py --stats-only

# Search for a person
python main.py --search "John Doe"

Advanced Usage

# Parse and save to custom location
python main.py data/family.ged --output results/my_tree.json

# Get help
python main.py --help

📁 How to Add Your GEDCOM File

Step 1: Prepare Your GEDCOM File

Export from genealogy software (Family Tree Maker, Ancestry.com, etc.)
Save as .ged file in UTF-8 encoding
Place in data/ directory:
```
data/
└── your_family_tree.ged
```

Step 2: Parse Your File

# Parse your file
python main.py data/your_family_tree.ged

# The parser will create:
# data/your_family_tree_parsed.json

Step 3: Query Your Data

# Search for ancestors
python main.py --search "Smith"

# Get statistics
python main.py --stats-only

# Use with LLMs
# The JSON file is now ready for AI analysis!

📊 Output Format

The parser generates clean JSON with this structure:

{
  "individuals": {
    "1": {
      "id": "1",
      "name": "John Smith",
      "gender": "M",
      "birth": {
        "date": "15 JAN 1900",
        "place": "New York, USA"
      },
      "death": {
        "date": "12 MAR 1985",
        "place": "California, USA"
      },
      "occupation": "Engineer",
      "families_as_child": ["F1"],
      "families_as_spouse": ["F2"]
    }
  },
  "families": {
    "F1": {
      "id": "F1",
      "husband": "1",
      "wife": "2",
      "children": ["3", "4"],
      "marriage": {
        "date": "1925",
        "place": "Chicago, USA"
      }
    }
  }
}

🔧 Development

Running Tests

# Run all tests
python -m unittest tests.test_parser

# Run with verbose output
python -m unittest tests.test_parser -v

Project Structure

├── main.py                 # Main entry point
├── src/
│   └── gedcom_parser.py    # Core parser logic
├── tests/
│   └── test_parser.py      # Unit tests
├── data/                   # GEDCOM files (add yours here)
│   └── sample.ged
├── docs/                   # Documentation
├── requirements.txt        # Dependencies
└── README.md              # This file

🤖 LLM Integration

The generated JSON is perfect for LLM queries:

import json

# Load parsed data
with open('data/your_family_parsed.json', 'r', encoding='utf-8') as f:
    family_data = json.load(f)

# Now you can query with natural language:
# "Who are the descendants of John Smith?"
# "What was the occupation of Mary Johnson?"
# "Show me the family tree of the Smith family"

📈 Performance

161,606 lines parsed in ~2 seconds
13,109 individuals processed
6,356 families resolved
Memory efficient - processes large files without issues

🧪 Testing

The parser has been tested with:

✅ Large GEDCOM files (100K+ lines)
✅ Various genealogy software exports
✅ UTF-8 encoded files
✅ Missing data handling
✅ Name formatting edge cases

🤝 Contributing

Fork the repository
Create a feature branch
Add tests for new functionality
Ensure all tests pass
Submit a pull request

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

Built for genealogy enthusiasts who want to leverage AI for family history research
Inspired by the GEDCOM 5.5.1 specification
Designed for both technical and non-technical users

Made with ❤️ for genealogy and AI enthusiasts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🧬 GEDCOM Genealogy Parser

✨ Features

📦 Installation

Requirements

Quick Start

🎯 Usage

Basic Usage

Advanced Usage

📁 How to Add Your GEDCOM File

Step 1: Prepare Your GEDCOM File

Step 2: Parse Your File

Step 3: Query Your Data

📊 Output Format

🔧 Development

Running Tests

Project Structure

🤖 LLM Integration

📈 Performance

🧪 Testing

🤝 Contributing

📄 License

🙏 Acknowledgments

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data		data
docs		docs
src		src
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
PUBLISH_GUIDE.md		PUBLISH_GUIDE.md
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt

License

IrishRugbyman/gedcom-parser

Folders and files

Latest commit

History

Repository files navigation

🧬 GEDCOM Genealogy Parser

✨ Features

📦 Installation

Requirements

Quick Start

🎯 Usage

Basic Usage

Advanced Usage

📁 How to Add Your GEDCOM File

Step 1: Prepare Your GEDCOM File

Step 2: Parse Your File

Step 3: Query Your Data

📊 Output Format

🔧 Development

Running Tests

Project Structure

🤖 LLM Integration

📈 Performance

🧪 Testing

🤝 Contributing

📄 License

🙏 Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages