PrimitiveContext/voynich

Voynich Manuscript — Deciphered

Computational structural analysis of the Voynich manuscript (Beinecke MS 408) revealing systematic grammar, case morphology, and consistent clause structure across all five manuscript sections.

Doctor's Personal Alphabet

The profile points to a Dravidian-speaking Siddha medical practitioner who designed a personal script to encode their pharmacopeia. The positional rules are too systematic for a naturally evolved writing system — this script was engineered, not inherited. Early folios show rougher, less consistent glyph forms; later sections are fluid and assured. The scribe was learning their own writing system as they wrote.

No second copy exists. No parallel text. No Rosetta Stone. This was a private notebook — never meant for anyone else to read.

Reasonable Confusion

The Voynich manuscript's statistics simultaneously mimic several different systems, and each metric points in a different direction:

  • High Index of Coincidence (0.075) made it look like a simple substitution cipher of Latin or Italian — but the bigram coverage was far too low (7%) for any European alphabet.
  • Low bigram coverage suggested a large syllabary (~140 symbols) — but the IC was 5x too high for a syllabary, which would spread frequency across all glyphs.
  • Narrow word-length distribution (CV = 0.386) matched Latin syllable statistics perfectly — but the positional dominance (0.839) was higher than any natural language alphabet.
  • Extreme positional constraints (some characters 99%+ word-initial or word-final) looked like a constructed system — but the paradigm fill rate (15%) and suffix Zipf exponent (0.893) were exactly in natural language range.
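Three of the statistics quoted above can be computed from any tokenized transcription. The following is a minimal sketch, not the project's own code: the function names are hypothetical, the definition of bigram coverage (observed within-word pairs over all possible ordered pairs) is an assumed reading of the term, and the sample tokens are invented.

```python
from collections import Counter
from statistics import mean, pstdev


def index_of_coincidence(text: str) -> float:
    """Chance that two characters drawn without replacement are identical."""
    counts = Counter(text)
    n = sum(counts.values())
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def bigram_coverage(words: list[str]) -> float:
    """Observed within-word character pairs over all possible ordered pairs.

    Assumption: this is one plausible reading of "bigram coverage".
    """
    alphabet = {ch for w in words for ch in w}
    seen = {w[i:i + 2] for w in words for i in range(len(w) - 1)}
    return len(seen) / len(alphabet) ** 2 if alphabet else 0.0


def word_length_cv(words: list[str]) -> float:
    """Coefficient of variation (population std / mean) of word lengths."""
    lengths = [len(w) for w in words]
    return pstdev(lengths) / mean(lengths)


# Invented toy tokens, not real manuscript data.
sample = "4oham sam 4okan san hae9".split()
print(index_of_coincidence("".join(sample)))
print(bigram_coverage(sample))
print(word_length_cv(sample))
```

A small alphabet with skewed frequencies pushes the first number up, while strict positional rules keep the second number low, which is the combination described above.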

The resolution: it's an abugida, in which a small set of consonant bases (~14) combines with vowel modifiers to produce ~50 surface glyphs. This gives you a small effective alphabet (high IC) with many surface forms (low bigram coverage) and strong positional rules (onset, nucleus, and coda occupy distinct character slots).

The language itself is agglutinative with SOV word order, which produces the regular word lengths and clause-final verb patterns. The two noun classifiers (h- organic, k- material) act as scribal semantic markers, not grammatical gender — 64% of nouns carry neither.

Figure: Composite Distance. Telugu positional abugida encoding is the closest statistical match across all metrics (composite distance 0.066).

Figure: Radar Comparison. Telugu nearly matches the Voynich on all four key metrics. English and Latin syllabary encodings fail on IC.

Figure: Typological Matrix. Dravidian languages match 94% of the Voynich's typological features, more than any other language family tested.

Figure: Syllabary Sweep. No single language + simple syllabary reproduces all four metrics. The IC gap (bottom-right) is what rules out a pure syllabary and points to an abugida with a small base alphabet.
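The composite distance is not defined in this README. One way such a score could be formed is sketched below, under stated assumptions: it averages the relative absolute difference over shared metrics, the metric names are hypothetical, the Voynich values are the ones quoted above, and the candidate profile is an invented placeholder rather than a measured result.

```python
def composite_distance(target: dict[str, float], candidate: dict[str, float]) -> float:
    """Mean relative absolute difference over the metrics both profiles share.

    Assumption: an illustrative formula, not necessarily the one used here.
    Target values must be non-zero.
    """
    shared = sorted(target.keys() & candidate.keys())
    if not shared:
        raise ValueError("profiles share no metrics")
    total = sum(abs(candidate[m] - target[m]) / abs(target[m]) for m in shared)
    return total / len(shared)


# Values quoted in this README for the manuscript.
voynich = {
    "ic": 0.075,
    "bigram_coverage": 0.07,
    "word_length_cv": 0.386,
    "positional_dominance": 0.839,
}

# Invented placeholder profile for some candidate encoding (not real data).
candidate = {
    "ic": 0.070,
    "bigram_coverage": 0.08,
    "word_length_cv": 0.400,
    "positional_dominance": 0.800,
}

print(composite_distance(voynich, candidate))
```

Dividing by the target value keeps a small-valued metric such as IC from being swamped by larger-valued ones; other normalizations (for example z-scores across all candidates) would rank encodings differently.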


What We Found (Verifiable)

The EVA transcription encodes a consistent agglutinative grammar:

  • 6 case suffixes with distinct verb selectional preferences (-an: 38% before verb 1H; -am: 33% before verb 1cH; -ae: 34% before verb 1K)
  • SOV word order — 76.5% of clause-final words end in suffix -9
  • Two noun classifiers (h- and k- prefixes) — NOT grammatical gender: 64% of nouns are unmarked, 54 roots appear in both classes, verbs don't agree
  • Participle chaining for sequential procedures (-c89 = "having done")
  • Definite article 4o- (proclitic, 97% character binding)
  • Clause-final demonstratives (sam, san, sae) with case agreement
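Two of the counts in this list, the clause-final suffix rate and the verb that follows a given case suffix, reduce to simple tallies once the text is split into clauses. The sketch below is illustrative only: how clauses are segmented is an assumption left to the caller, the function names are hypothetical, and the tokens are invented.

```python
from collections import Counter


def clause_final_suffix_rate(clauses: list[list[str]], suffix: str) -> float:
    """Share of non-empty clauses whose last word ends with `suffix`."""
    finals = [clause[-1] for clause in clauses if clause]
    if not finals:
        return 0.0
    return sum(word.endswith(suffix) for word in finals) / len(finals)


def next_word_distribution(tokens: list[str], suffix: str) -> dict[str, float]:
    """Relative frequency of the word that follows each word ending in `suffix`."""
    followers = Counter(
        nxt for word, nxt in zip(tokens, tokens[1:]) if word.endswith(suffix)
    )
    total = sum(followers.values())
    return {word: count / total for word, count in followers.items()}


# Invented toy clauses, not real manuscript data.
clauses = [["4oham", "1c89", "sam9"], ["kan", "1H9"], ["hae", "san"]]
print(clause_final_suffix_rate(clauses, "9"))

tokens = ["kan", "1H", "ham", "1cH", "tan", "1H"]
print(next_word_distribution(tokens, "an"))
```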

This grammar is internally consistent, reaching 80% parseability across all 5 sections (biological, herbal A, herbal B, astronomical, recipe/stars) and 29,000 words.

Figure: Case Distribution. Case suffix distribution across manuscript sections. Each case has a distinct frequency profile matching its grammatical function.

Figure: Clause-Final Verbs. 76.5% of clause-final words carry the finite verb suffix -9, confirming SOV order.

Figure: Section Profiles. Noun root frequency varies systematically by section, consistent with a medical handbook covering different domains.

Figure: Classifier Distribution. h/k noun classifiers are scribal semantic markers, not grammatical gender: 64% of nouns are unmarked.

To verify: run translate_voynich.py on the standard EVA transcription. The grammar rules are encoded in the script. The parseability percentage is reproducible.

What We Propose (Preliminary)

Building on the grammar, further analysis suggests:

  • Script type: positional syllabary/abugida with ~50 distinct glyphs (EVA collapses to ~30, destroying phonetic information)
  • Language family: closest statistical match is Dravidian (Telugu positional abugida encoding, composite distance 0.066 across 6 metrics)
  • Content: Siddha medical handbook — pharmacopeia, anatomy, medical astrology, and compounding procedures
  • 23 preliminary glyph-to-syllable mappings from 9 plant name readings
  • Most common content word may read as "amma" (body/being), a Proto-Dravidian root

Figure: Syllabary Table. Plant name cross-references yield consistent syllable mappings across 9 plants.

Figure: Feature Slots. Five mutual-exclusion character groups: characters within a group never appear adjacent because they compete for the same structural slot. This is the signature of a featural script.
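The mutual-exclusion property can be checked directly: for a proposed group, count how often two of its members sit next to each other inside a word. This is a minimal sketch with a hypothetical function name and invented tokens; the actual five groups are not listed in this README, so the group used here is a placeholder.

```python
def adjacent_pairs_within(words: list[str], group: str) -> int:
    """Count adjacent character pairs in which both characters belong to `group`."""
    members = set(group)
    return sum(
        1
        for word in words
        for left, right in zip(word, word[1:])
        if left in members and right in members
    )


# Invented toy tokens and a placeholder group, not real manuscript data.
words = ["oak", "eko", "kan"]
print(adjacent_pairs_within(words, "ae"))  # zero means the group is never self-adjacent here
```

A count of zero over a large corpus is consistent with the characters competing for one slot. Because rare characters can avoid each other by chance, a comparison against the count expected from the individual character frequencies would make the check more convincing.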

These proposals need independent verification, particularly by a Dravidian linguist working from original manuscript glyphs rather than EVA transcription.

Translation Coverage

The structural translation resolves 81% of words to English glosses. The remaining 19% are left as bracketed placeholders [...] and fall into three categories:

  1. Uppercase EVA variants — visually distinct glyphs for which we lack phonetic values
  2. Special characters — unusual glyphs outside the standard EVA alphabet
  3. Rare roots — insufficient distributional data to constrain meaning

The 81% that is translated is backed by distributional evidence. The 19% that remains bracketed is honestly unknown.
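The coverage figure can be recomputed by counting bracketed placeholders in the translated text. The sketch below rests on an assumption about layout: each gloss is a whitespace-separated token and every unresolved word appears as a single bracketed token such as [...]. The real format of VOYNICH_TRANSLATION.txt may differ, and the function name is hypothetical.

```python
import re

# Assumption: an unresolved word is one token fully enclosed in square brackets.
PLACEHOLDER = re.compile(r"^\[.*\]$")


def gloss_coverage(lines: list[str]) -> float:
    """Share of whitespace-separated tokens that are not bracketed placeholders."""
    total = 0
    unresolved = 0
    for line in lines:
        for token in line.split():
            total += 1
            if PLACEHOLDER.match(token):
                unresolved += 1
    return (total - unresolved) / total if total else 0.0


# Invented toy lines, not real translation output.
lines = ["the body [...] boil", "[...]"]
print(gloss_coverage(lines))
```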

Files

Core:

  • VOYNICH_TRANSLATION.txt — complete structural translation (4634 lines)
  • voynich_lexicon.txt — lexicon with syntactic frames (9217 entries)
  • voynich_syllabary.txt — 23 glyph-to-syllable mappings with evidence
  • voynich_semantic_map.txt — root meanings with confidence levels
  • voynich_clause_structure.txt — case system, verb forms, clause templates
  • translate_voynich.py — translation engine (reproducible)

Evidence:

  • METHODS.md — complete methodology, confidence levels, known limitations
  • voynich_glyph_inventory.txt — visual variant catalog from hi-res images
  • voynich_plant_identifications.txt — 20 plants with Dravidian names
  • voynich_unified_findings.txt — 80 consolidated findings
  • voynich_*_report.txt — individual analysis reports

Related Work

Zero-knowledge semantic topology extraction via dual grammar induction. Runs two complementary compression algorithms (Sequitur for structure, MR-RePair for frequency) on raw bytes, overlays the rulesets into a 2x2 matrix, and discovers relational structure from the residuals. The core principle — let structure emerge from the data without assumptions, then classify what the algorithms agree on vs. disagree on — is the same approach applied here to the Voynich manuscript's morphological patterns.

Biomechanical analysis of writing systems — modeling glyphs as physical stroke paths with two-axis motor cost, curvature, pen lifts, and transition angles. One useful idea from that work: characters in a writing system exist in a constrained energy landscape where positional patterns and transition costs encode structural information about the script. That perspective informed parts of the Voynich script analysis, particularly the mutual-exclusion character groups and positional dominance patterns.

Status

Exploratory. Not peer-reviewed. The grammar is internally consistent and reproducible. The phonetic decryption and language identification are preliminary and need independent verification.
