Convert GEDCOM files to structured JSON for LLM querying and genealogy analysis
Transform your family tree data into a format that's perfect for Large Language Models, enabling natural language queries about your ancestry.
- 🚀 Ultra-fast parsing of large GEDCOM files (161K+ lines in seconds)
- 🤖 LLM-ready JSON output with clean, structured data
- 🔍 Advanced search capabilities
- 📊 Comprehensive statistics and analytics
- 🧹 Clean name formatting (removes GEDCOM slashes)
- 🧪 Full test suite with 95%+ coverage
- Python 3.8 or higher
- No external dependencies required!
# Clone the repository
git clone https://github.com/yourusername/gedcom-parser.git
cd gedcom-parser
# Run the parser
python main.py# Parse default GEDCOM file
python main.py
# Parse specific GEDCOM file
python main.py data/your_family_tree.ged
# Show statistics only
python main.py --stats-only
# Search for a person
python main.py --search "John Doe"# Parse and save to custom location
python main.py data/family.ged --output results/my_tree.json
# Get help
python main.py --help- Export from genealogy software (Family Tree Maker, Ancestry.com, etc.)
- Save as
.gedfile in UTF-8 encoding - Place in
data/directory:data/ └── your_family_tree.ged
# Parse your file
python main.py data/your_family_tree.ged
# The parser will create:
# data/your_family_tree_parsed.json# Search for ancestors
python main.py --search "Smith"
# Get statistics
python main.py --stats-only
# Use with LLMs
# The JSON file is now ready for AI analysis!The parser generates clean JSON with this structure:
{
"individuals": {
"1": {
"id": "1",
"name": "John Smith",
"gender": "M",
"birth": {
"date": "15 JAN 1900",
"place": "New York, USA"
},
"death": {
"date": "12 MAR 1985",
"place": "California, USA"
},
"occupation": "Engineer",
"families_as_child": ["F1"],
"families_as_spouse": ["F2"]
}
},
"families": {
"F1": {
"id": "F1",
"husband": "1",
"wife": "2",
"children": ["3", "4"],
"marriage": {
"date": "1925",
"place": "Chicago, USA"
}
}
}
}# Run all tests
python -m unittest tests.test_parser
# Run with verbose output
python -m unittest tests.test_parser -v├── main.py # Main entry point
├── src/
│ └── gedcom_parser.py # Core parser logic
├── tests/
│ └── test_parser.py # Unit tests
├── data/ # GEDCOM files (add yours here)
│ └── sample.ged
├── docs/ # Documentation
├── requirements.txt # Dependencies
└── README.md # This file
The generated JSON is perfect for LLM queries:
import json
# Load parsed data
with open('data/your_family_parsed.json', 'r', encoding='utf-8') as f:
family_data = json.load(f)
# Now you can query with natural language:
# "Who are the descendants of John Smith?"
# "What was the occupation of Mary Johnson?"
# "Show me the family tree of the Smith family"- 161,606 lines parsed in ~2 seconds
- 13,109 individuals processed
- 6,356 families resolved
- Memory efficient - processes large files without issues
The parser has been tested with:
- ✅ Large GEDCOM files (100K+ lines)
- ✅ Various genealogy software exports
- ✅ UTF-8 encoded files
- ✅ Missing data handling
- ✅ Name formatting edge cases
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
MIT License - see LICENSE file for details
- Built for genealogy enthusiasts who want to leverage AI for family history research
- Inspired by the GEDCOM 5.5.1 specification
- Designed for both technical and non-technical users
Made with ❤️ for genealogy and AI enthusiasts