Convert GEDCOM files to lossless, AI-ready JSON. Preserve family genealogy data with perfect fidelity for current and future integrations.
Lossless preservation: Every piece of data from your GEDCOM files is retained, including unfamiliar tags stored in rawTags fields.
AI-ready JSON: Clean, machine-friendly structure designed for embeddings, RAG systems, and future analysis pipelines.
Future-proof: Scalable architecture ready for integrations with webtrees, FamilySearch API, and AI pipelines.
Non-technical friendly: Simple scripts and clear documentation so anyone can convert and analyze family data.
- Node.js 20.0.0 or later
- Your GEDCOM (.ged) files from Ancestry.com or other genealogy software
cd family-tree
npm install
Place your .ged files in data/raw/. These files are ignored by Git by default and should not be committed.
npm run convert -- data/raw/yourfile.ged
Output: a timestamped JSON file at data/json/yourfile-YYYY-MM-DD.json
npm run stats -- data/json/yourfile-YYYY-MM-DD.json
Prints family counts, date ranges, top surnames, and birthplaces.
npm run validate -- data/json/yourfile-YYYY-MM-DD.json
Checks each person record against the JSON schema.
npm run verify
Writes a dated report to data/verification-report-YYYY-MM-DD.json.
npm run fetch-sources
npm run build-context
This pipeline first fetches period and place facts from external sources (DPLA, Europeana, Chronicling America, LOC, OWID, and optionally World History Encyclopedia). It then uses Ollama only to synthesize those facts into narrative-ready context under data/historical-context/.
A full tree range typically takes 20-60 minutes to process; this is expected.
See docs/setup-context-sources.md for API key setup and run order.
Useful scoped builds:
npm run context:status
npm run build-context:world
npm run build-context:usa
npm run build-context:ohio
npm run build-context:indiana
npm run enrich-context
npm run build-context:reset
family-tree/
├── data/
│   ├── raw/                    # Original .ged files (never modified)
│   ├── json/                   # Converted JSON output (timestamped)
│   └── archive/                # Older versions for reference
├── scripts/
│   ├── ged-to-json.js          # Main conversion engine
│   ├── validate.js             # Schema validation
│   └── stats.js                # Statistical analysis
├── schema/
│   └── person.schema.json      # JSON schema for validation
├── docs/
│   ├── data-dictionary.md      # Explains every JSON field
│   └── integrations.md         # Future API integrations
└── README.md                   # This file
Converted JSON follows a standardized structure:
{
  "meta": {
    "source": "myfamily.ged",
    "convertedAt": "2024-01-15T14:30:00Z",
    "gedcomVersion": "5.5.1",
    "totalIndividuals": 250,
    "totalFamilies": 80
  },
  "individuals": [
    {
      "id": "@I1@",
      "name": {
        "full": "John Henry Smith",
        "given": "John Henry",
        "surname": "Smith"
      },
      "sex": "M",
      "birth": {
        "date": "12 MAR 1845",
        "dateISO": "1845-03-12",
        "place": "Boston, Massachusetts, USA"
      },
      "death": { ... },
      "familiesAsSpouse": ["@F1@"],
      "familiesAsChild": ["@F2@"]
    }
  ],
  "families": [ ... ],
  "sources": [ ... ],
  "notes": [ ... ],
  "repositories": [ ... ]
}

Key features:
- Both raw GEDCOM dates AND ISO 8601 conversions
- Date qualifiers (ABT, BEF, AFT, BET) preserved
- All unrecognized tags stored in rawTags, so nothing is lost
- Warnings logged for unparseable or unfamiliar GEDCOM elements
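To make the date handling concrete, here is a minimal sketch of how a raw GEDCOM date could be kept alongside an ISO 8601 conversion. It is not the converter's actual code: the function name `parseGedcomDate` and the `qualifier` field are assumptions for illustration, and only the `date` and `dateISO` fields appear in the structure above.

```javascript
// Hypothetical helper, not taken from scripts/ged-to-json.js.
const MONTHS = {
  JAN: "01", FEB: "02", MAR: "03", APR: "04", MAY: "05", JUN: "06",
  JUL: "07", AUG: "08", SEP: "09", OCT: "10", NOV: "11", DEC: "12",
};
const QUALIFIERS = new Set(["ABT", "BEF", "AFT", "BET"]);

function parseGedcomDate(raw) {
  const tokens = raw.trim().toUpperCase().split(/\s+/);
  let qualifier = null;
  if (QUALIFIERS.has(tokens[0])) qualifier = tokens.shift();

  // "BET <date> AND <date>" is a range, so no single ISO value applies.
  if (qualifier === "BET") return { date: raw, qualifier, dateISO: null };

  let dateISO = null;
  const isYear = (t) => /^\d{4}$/.test(t);
  if (tokens.length === 3 && /^\d{1,2}$/.test(tokens[0]) && MONTHS[tokens[1]] && isYear(tokens[2])) {
    dateISO = `${tokens[2]}-${MONTHS[tokens[1]]}-${tokens[0].padStart(2, "0")}`;
  } else if (tokens.length === 2 && MONTHS[tokens[0]] && isYear(tokens[1])) {
    dateISO = `${tokens[1]}-${MONTHS[tokens[0]]}`;
  } else if (tokens.length === 1 && isYear(tokens[0])) {
    dateISO = tokens[0];
  }
  // The raw string is always returned, so an unparseable date is never lost.
  return { date: raw, qualifier, dateISO };
}
```

Under these assumptions, `parseGedcomDate("12 MAR 1845")` yields a `dateISO` of `"1845-03-12"`, while `parseGedcomDate("ABT 1850")` keeps the `ABT` qualifier and reduces the ISO value to the year.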
npm run convert -- data/raw/myfile.ged
Outputs to data/json/myfile-YYYY-MM-DD.json. Re-running produces the same output (idempotent).
npm run stats -- data/json/myfile-YYYY-MM-DD.json
Shows:
- Person and family counts
- Male/female/unknown breakdown
- Earliest and latest birth/death years
- Top 10 surnames
- Top 10 birthplaces
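The same kind of summary can be computed directly from the converted JSON. The sketch below is not the code in scripts/stats.js; the function name `summarize` and the shape of its return value are assumptions, and it relies only on the `individuals`, `families`, `sex`, `name.surname`, and `birth.dateISO` fields shown in the structure above.

```javascript
// Hypothetical summary helper for a converted tree object.
function summarize(tree) {
  const people = tree.individuals ?? [];
  const sexCounts = { M: 0, F: 0, unknown: 0 };
  const surnameCounts = new Map();
  let earliestBirthYear = null;
  let latestBirthYear = null;

  for (const person of people) {
    if (person.sex === "M" || person.sex === "F") sexCounts[person.sex] += 1;
    else sexCounts.unknown += 1;

    const surname = person.name?.surname;
    if (surname) surnameCounts.set(surname, (surnameCounts.get(surname) ?? 0) + 1);

    // dateISO may be missing or partial; only the leading year is used here.
    const year = Number.parseInt(person.birth?.dateISO?.slice(0, 4), 10);
    if (Number.isInteger(year)) {
      if (earliestBirthYear === null || year < earliestBirthYear) earliestBirthYear = year;
      if (latestBirthYear === null || year > latestBirthYear) latestBirthYear = year;
    }
  }

  const topSurnames = [...surnameCounts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, 10);

  return {
    people: people.length,
    families: (tree.families ?? []).length,
    sexCounts,
    earliestBirthYear,
    latestBirthYear,
    topSurnames,
  };
}
```

To try it, load a converted file with `JSON.parse(require("node:fs").readFileSync(path, "utf8"))` and pass the result to `summarize`.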
npm run validate -- data/json/myfile-YYYY-MM-DD.json
Checks each person record against the schema. Reports any validation errors.
npm run scaffold
Creates documents/by-person/... folders and per-person metadata.json (local-only).
npm run media-inventory
Writes data/media-inventory.json showing which referenced media files are present locally vs missing.
- data-dictionary.md: Complete reference for every JSON field
- integrations.md: Roadmap for webtrees, FamilySearch API, and AI pipelines
- Use a private repository: Store sensitive .ged files in a private Git repository or locally only
- Anonymize before sharing: Remove dates and details for living people before publishing
- Separate public/private data: Keep sensitive files in data/archive/private/
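As a rough illustration of the anonymization step, the sketch below redacts anyone who might still be living in a converted tree. This is not a tool shipped with the project. The function names and the rule for "possibly living" (no death record, and a birth year that is unknown or within the last 100 years) are assumptions you should adjust to your own privacy requirements.

```javascript
// Hypothetical rule: treat a person as possibly living when there is no
// death record and the birth year is unknown or within maxAge years.
function isPossiblyLiving(person, currentYear, maxAge = 100) {
  if (person.death) return false;
  const year = Number.parseInt(person.birth?.dateISO?.slice(0, 4), 10);
  if (!Number.isInteger(year)) return true; // unknown birth year: be conservative
  return currentYear - year <= maxAge;
}

// Returns a copy of the tree in which possibly-living people keep only
// their id and family links, so the tree shape survives redaction.
function anonymizeLiving(tree, currentYear = new Date().getFullYear()) {
  return {
    ...tree,
    individuals: (tree.individuals ?? []).map((person) =>
      isPossiblyLiving(person, currentYear)
        ? {
            id: person.id,
            name: { full: "Living" },
            familiesAsSpouse: person.familiesAsSpouse ?? [],
            familiesAsChild: person.familiesAsChild ?? [],
          }
        : person
    ),
  };
}
```

This only touches the `individuals` array. Entries in `families`, `notes`, and `sources` can also mention living people, so review those before publishing.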
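Git's core.sparseCheckout setting (stored in .git/config) limits which tracked files are checked out into your working tree. It does not control what gets committed, so rely on .gitignore to keep private files out of the repository.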
This repo is configured for public publishing: .gitignore excludes data/raw/, data/json/*.json, and per-person documents/by-person/ content by default. See data/README.md and documents/README.md.
- GEDCOM β JSON conversion with lossless preservation
- JSON schema validation
- Statistical analysis
- webtrees integration: Import JSON directly into webtrees
- FamilySearch API integration (read-only): Matching + download tooling
- AI/RAG pipeline: Generate embeddings and query with LLMs
- GEDCOM-X support: Upgrade to modern GEDCOM-X format
- Web UI: Visual tree explorer and editor
- Mobile app: React Native companion for mobile browsing
- Backup & preservation: Store family data in durable JSON format
- AI analysis: Generate embeddings for RAG systems
- Web publishing: Easily import into webtrees for family website
- Research: Analyze genealogy data programmatically
- Integration: Connect with FamilySearch, Ancestry, other platforms
- Version control: Track genealogy changes over time with Git
- Parser: parse-gedcom (fast, standard GEDCOM parsing)
- Validation: ajv (JSON Schema validation)
- Date parsing: date-fns (robust date handling)
- Format: GEDCOM 5.5.1 (standard format from Ancestry.com)
Q: "File not found" error
A: Make sure your .ged file path is relative to the project root, e.g., data/raw/myfile.ged
Q: Warnings logged for many tags
A: This is normal! The converter preserves all GEDCOM tags, even non-standard ones. Check data/json/warnings-YYYY-MM-DD.log for details. They're safe in rawTags.
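For intuition, the preservation behavior described in this answer can be sketched as follows. This is not the converter's real implementation: the node shape `{ tag, value, children }`, the `KNOWN_TAGS` list, and the function name `splitTags` are all assumptions made for illustration.

```javascript
// Hypothetical list of tags the converter maps to named JSON fields.
const KNOWN_TAGS = new Set(["NAME", "SEX", "BIRT", "DEAT", "FAMS", "FAMC"]);

// Splits a record's child nodes into recognized ones and preserved ones.
// Nothing is dropped: every unrecognized node lands in rawTags, and a
// warning records that it was seen.
function splitTags(recordChildren) {
  const known = [];
  const rawTags = [];
  const warnings = [];
  for (const node of recordChildren) {
    if (KNOWN_TAGS.has(node.tag)) {
      known.push(node);
    } else {
      rawTags.push({
        tag: node.tag,
        value: node.value ?? "",
        children: node.children ?? [],
      });
      warnings.push(`Unrecognized tag ${node.tag} preserved in rawTags`);
    }
  }
  return { known, rawTags, warnings };
}
```

Vendor extensions such as underscore-prefixed tags would take the `rawTags` path in this sketch, which is why a file exported from genealogy software can produce many warnings without losing data.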
Q: Dates not parsing correctly
A: GEDCOM supports many date formats. If a date doesn't convert to ISO 8601, it's preserved as-is in the date field for manual inspection.
Q: How do I merge multiple GEDCOM files?
A: Not yet automated, but you can merge the individuals, families, and other arrays in the JSON files manually. Check integrations.md for upcoming merge tools.
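One thing to watch when merging by hand: two GEDCOM files usually reuse the same record IDs (both may contain `@I1@`), so concatenating arrays directly would create collisions. The sketch below is one possible approach, not an official merge tool. It rewrites every string of the form `@...@` with a per-file prefix before concatenating. The names `namespaceIds` and `mergeTrees`, the `A_`/`B_` prefixes, and the `mergedFrom` meta field are assumptions.

```javascript
// Recursively rewrites GEDCOM-style pointers such as "@I1@" to "@A_I1@".
function namespaceIds(value, prefix) {
  if (typeof value === "string") {
    return /^@[^@]+@$/.test(value) ? `@${prefix}${value.slice(1)}` : value;
  }
  if (Array.isArray(value)) return value.map((item) => namespaceIds(item, prefix));
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [key, namespaceIds(inner, prefix)])
    );
  }
  return value;
}

const ARRAY_KEYS = ["individuals", "families", "sources", "notes", "repositories"];

// Concatenates the top-level arrays of two converted trees after
// namespacing their IDs. It does not detect duplicate people.
function mergeTrees(a, b) {
  const left = namespaceIds(a, "A_");
  const right = namespaceIds(b, "B_");
  const merged = { meta: { mergedFrom: [a.meta?.source, b.meta?.source] } };
  for (const key of ARRAY_KEYS) {
    merged[key] = [...(left[key] ?? []), ...(right[key] ?? [])];
  }
  return merged;
}
```

People who appear in both files will still be present twice after this step; identifying and linking them remains manual work.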
Q: Why does context generation take so long?
A: fetch-sources + build-context process many 5-year windows across multiple scopes and enforce API/model rate limits. A full run commonly takes 20-60 minutes. Run npm run update-context quarterly for incremental refresh.
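A back-of-the-envelope sketch shows why the work adds up. The function `fiveYearWindows` and its alignment of windows to multiples of five are assumptions, not the pipeline's actual logic; the scope names are taken from the scoped build commands listed earlier.

```javascript
// Hypothetical: splits a year range into consecutive 5-year windows,
// aligned so each window starts on a multiple of `size`.
function fiveYearWindows(startYear, endYear, size = 5) {
  const windows = [];
  const first = Math.floor(startYear / size) * size;
  for (let from = first; from <= endYear; from += size) {
    windows.push({ from, to: from + size - 1 });
  }
  return windows;
}

// A tree spanning 1800-1999 yields 40 windows. Across four scopes that is
// 160 window/scope combinations, each of which may need fetches and a
// model call subject to rate limits.
const scopes = ["world", "usa", "ohio", "indiana"];
const windows = fiveYearWindows(1800, 1999);
console.log(windows.length * scopes.length); // prints 160
```

The scoped builds reduce this count by restricting the run to a single scope.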
MIT. Feel free to use, modify, and share.
Ideas? Found a bug? Please open an issue or submit a pull request!
Happy genealogy hunting!
- Delete any personal data files from data/ and documents/ before committing (they are gitignored but exist in your working tree)
- Confirm no private files were accidentally committed:
git ls-files data/json/ data/raw/ documents/by-person/
- Copy .env.example to .env and fill in your credentials (never commit .env)
- Run npm run scaffold after converting your GEDCOM to create person folders
- Run npm run verify to check your data quality before running any API scripts