ARPAbet to Ingglish/IPA Mapping

Overview

How Ingglish maps ARPAbet notation from the CMU Pronouncing Dictionary to Ingglish spellings and IPA.

ARPAbet is a phonemic notation system that uses ASCII characters to represent English phonemes (contrastive speech sounds). Each English word in the CMU dictionary has an ARPAbet transcription.

For why we chose these spellings, see Design Decisions.

Pronunciation Dictionary

We use the CMU Pronouncing Dictionary (cmudict):

  • Contains ~126,000 entries (variant pronunciations are pre-resolved at build time)
  • Uses ARPAbet phoneme notation
  • Includes stress markers (0=none, 1=primary, 2=secondary)
  • Maintained by Carnegie Mellon University
  • Available as npm package: cmu-pronouncing-dictionary
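Stress digits are attached directly to vowel symbols ("AH0", "OW1"), so a converter first has to split each token into a symbol and an optional stress level. Below is a minimal TypeScript sketch. It assumes transcriptions arrive as space-separated strings such as "HH AH0 L OW1", the form used in Example Translations below. `Phone` and `parseArpabet` are hypothetical names and are not part of cmu-pronouncing-dictionary or the Ingglish codebase.

```typescript
// A parsed ARPAbet phone: the bare symbol plus its stress digit, if any.
// Only vowels carry stress digits (0 = none, 1 = primary, 2 = secondary).
interface Phone {
  symbol: string;            // e.g. "AH", "HH"
  stress: 0 | 1 | 2 | null;  // null for consonants
}

// Split a space-separated ARPAbet transcription such as "HH AH0 L OW1".
function parseArpabet(transcription: string): Phone[] {
  return transcription
    .trim()
    .split(/\s+/)
    .map((token) => {
      // Uppercase symbol, optionally followed by a single stress digit.
      const match = /^([A-Z]+)([012])?$/.exec(token);
      if (match === null) {
        throw new Error(`Not an ARPAbet token: ${token}`);
      }
      const digit: string | undefined = match[2];
      return {
        symbol: match[1],
        stress: digit === undefined ? null : (Number(digit) as 0 | 1 | 2),
      };
    });
}
```

For example, `parseArpabet("HH AH0 L OW1")` yields HH with no stress, AH with stress 0, L with no stress, and OW with stress 1.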

Vowel Mappings

ARPAbet Ingglish IPA Example Words Notes
AA o ɑ father, hot, rock Open back vowel
AE a æ cat, bat Near-open front vowel
AH uh/a ʌ/ə but, cup / about, sofa Stressed /ʌ/ → 'uh'; unstressed /ə/ (AH0) → 'a'; see Schwa and STRUT below
AO aw ɔ thought, law Open-mid back rounded
AW ou aʊ cow, how Diphthong
AY ai aɪ my, time Diphthong
EH e ɛ bed, red Open-mid front vowel
EY ay eɪ say, day Diphthong
IH i ɪ bit, sit Near-close front vowel
IY ee i bee, see Close front vowel (also written /iː/)
OW oh oʊ go, show Diphthong
OY oi ɔɪ boy, toy Diphthong
UH u ʊ book, put Near-close back vowel
UW oo u too, blue Close back vowel (also written /uː/)
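The table reduces to a plain lookup plus one stress check for AH. Below is a minimal sketch. `VOWEL_SPELLINGS` and `vowelSpelling` are hypothetical names, and the R-colored combinations described later are deliberately ignored here.

```typescript
// Ingglish spellings for the ARPAbet vowels in the table above, minus AH,
// whose spelling depends on stress.
const VOWEL_SPELLINGS: Record<string, string> = {
  AA: "o", AE: "a", AO: "aw", AW: "ou", AY: "ai", EH: "e", EY: "ay",
  IH: "i", IY: "ee", OW: "oh", OY: "oi", UH: "u", UW: "oo",
};

// Spell one vowel given its stress digit (0 = none, 1 = primary, 2 = secondary).
function vowelSpelling(symbol: string, stress: 0 | 1 | 2): string {
  if (symbol === "AH") {
    // AH0 is schwa /ə/ -> "a"; AH1 and AH2 are STRUT /ʌ/ -> "uh".
    return stress === 0 ? "a" : "uh";
  }
  // Every other vowel ignores stress.
  const spelling: string | undefined = VOWEL_SPELLINGS[symbol];
  if (spelling === undefined) {
    throw new Error(`Not a vowel in the table: ${symbol}`);
  }
  return spelling;
}
```

Only AH consults the stress digit, which matches the split described in the AH row.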

Consonant Mappings

Stops (Plosives)

ARPAbet Ingglish IPA Example Words
B b b bat, cab
D d d dog, bed
G g ɡ go, big
K k k cat, back
P p p pat, cup
T t t top, cat

Fricatives

ARPAbet Ingglish IPA Example Words Notes
DH dh ð the, father Voiced dental fricative
F f f fat, laugh
S s s sat, miss
SH sh ʃ she, push
TH th θ think, bath Voiceless dental fricative
V v v van, love
Z z z zoo, is
ZH zh ʒ measure, beige

Affricates

ARPAbet Ingglish IPA Example Words
CH ch tʃ chat, batch
JH j dʒ just, edge

Nasals

ARPAbet Ingglish IPA Example Words
M m m man, come
N n n no, pen
NG ng ŋ sing, thing

Glottal

ARPAbet Ingglish IPA Example Words
HH h h hat, ahead

Liquids & Glides

ARPAbet Ingglish IPA Example Words
L l l let, well
R r ɹ run, car
W w w wet, away
Y y j yes, you

R-Colored Vowels

When certain vowels are followed by R, they combine into special r-colored sounds. Ingglish uses dedicated spellings for these combinations:

Phoneme Sequence Ingglish IPA Example Words Notes
AE + R arr æɹ arrow, barrow, carrot Cat vowel + R
EH + R air ɛɹ air, care, there Bed vowel + R
IH + R eer ɪɹ beer, beard, fear NEAR vowel (bit vowel + R)
AA + R ar ɑɹ star, car, far Father vowel + R
AO + R or ɔɹ store, more, for Thought vowel + R
ER er ɝ bird, her, turn Standalone r-colored vowel

Why This Matters

Without special handling, the vowel mappings would produce confusing results:

  • "star" (AA + R) would become "stor" (o + r), which looks like "store"
  • "store" (AO + R) would become "stawr" (aw + r)
  • "fair" (EH + R) would become "fer" (collides with "fur" → "fer")
  • "carry" (AE + R) would become "karee", whose 'ar' would read like the 'ar' of "car" → "kar" once AA + R → ar is in place
  • "beard" (IH + R) would become "bird" (looks like the animal)

The R-rule fixes this:

  • "star" → star (intuitive)
  • "store" → stor (clearly different from "star")
  • "fair" → fair (distinct from "fur" → "fer")
  • "carry" → karree (distinct from "car" → "kar")
  • "beer" → beer (identical! without the rule it would be "bir")

The rule applies only when the vowel is immediately followed by R in the phoneme sequence. Standalone AA, AO, EH, AE, and IH vowels use their regular spellings (o, aw, e, a, i).
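The rule is a one-phoneme lookahead: before spelling a vowel, check whether the next phoneme is R. Below is a minimal sketch. It assumes stress digits have already been stripped, and it uses a hypothetical base table that covers only the phonemes in the examples above. `R_COLORED`, `BASE`, and `spellWithRRule` are illustrative names.

```typescript
// Dedicated spellings for vowel + R sequences (from the R-Colored Vowels table).
const R_COLORED: Record<string, string> = {
  AE: "arr", EH: "air", IH: "eer", AA: "ar", AO: "or",
};

// Hypothetical minimal base table: only the phonemes needed for the examples.
const BASE: Record<string, string> = {
  AA: "o", AE: "a", AO: "aw", EH: "e", IH: "i", IY: "ee", ER: "er",
  B: "b", D: "d", F: "f", K: "k", R: "r", S: "s", T: "t",
};

// Spell a sequence of stress-free ARPAbet symbols, applying the R-rule.
function spellWithRRule(symbols: string[]): string {
  let out = "";
  for (let i = 0; i < symbols.length; i++) {
    const current = symbols[i];
    const rColored: string | undefined = R_COLORED[current];
    if (rColored !== undefined && symbols[i + 1] === "R") {
      out += rColored;
      i++; // skip the R: it is already part of the R-colored spelling
      continue;
    }
    // No R follows (or the vowel has no R-colored form): use the base spelling.
    const base: string | undefined = BASE[current];
    if (base === undefined) throw new Error(`No spelling for ${current}`);
    out += base;
  }
  return out;
}
```

With this sketch, `["S", "T", "AA", "R"]` spells "star" and `["K", "AE", "R", "IY"]` spells "karree", while a standalone AA, as in `["K", "AA", "T"]`, keeps its regular spelling "kot".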

Why Not Use R-Colored Spellings for All Vowels?

Why not use the R-colored vowel bases everywhere? If AA were always 'a', AO always 'o', EH always 'ai', and AE always 'ar', R-coloring would happen automatically, with no special rules needed.

The problem is readability. These spellings would make words look like different English words:

  • "hot" → "hat" (looks like the head covering)
  • "law" → "lo" (looks incomplete)
  • "bed" → "baid" (looks like "bade" or "bayed")

The R-colored spellings (ar, or, air, arr) were chosen because they match English conventions in the R context: "star", "store", "air", and "arrow" all look natural. Using their base vowels everywhere, however, would turn many words into misleading lookalikes of other English words.

With these R-colored vowel rules in place, no two vowel + R combinations collide anywhere in the dictionary.

If Ingglish ever gets popular enough that this exception is the biggest complaint, we'd happily revisit it. The rule helps English readers today, but a future version could drop it for full consistency.

Example Translations

English Phonemes Ingglish
hello HH AH0 L OW1 haloh
world W ER1 L D werld
beautiful B Y UW1 T AH0 F AH0 L byootafal
think TH IH1 NG K thingk
the DH AH0 dha
English IH1 NG G L IH0 SH Ingglish
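Combining the mapping tables, the AH stress split (see Schwa and STRUT below), and the R-rule reproduces these rows. Below is a self-contained sketch under those assumptions. `toIngglish` is a hypothetical name, and capitalization is not handled, so "English" comes out as "ingglish".

```typescript
// Base spellings from the Vowel Mappings and Consonant Mappings tables, plus ER.
// AH is absent because its spelling depends on stress.
const BASE: Record<string, string> = {
  AA: "o", AE: "a", AO: "aw", AW: "ou", AY: "ai", EH: "e", ER: "er",
  EY: "ay", IH: "i", IY: "ee", OW: "oh", OY: "oi", UH: "u", UW: "oo",
  B: "b", D: "d", G: "g", K: "k", P: "p", T: "t",
  DH: "dh", F: "f", S: "s", SH: "sh", TH: "th", V: "v", Z: "z", ZH: "zh",
  CH: "ch", JH: "j", M: "m", N: "n", NG: "ng", HH: "h",
  L: "l", R: "r", W: "w", Y: "y",
};

// Vowel + R sequences that take a dedicated R-colored spelling.
const R_COLORED: Record<string, string> = {
  AE: "arr", EH: "air", IH: "eer", AA: "ar", AO: "or",
};

function toIngglish(transcription: string): string {
  // Split each token into its symbol and its optional stress digit.
  const phones = transcription.trim().split(/\s+/).map((token) => ({
    symbol: token.replace(/[012]$/, ""),
    stress: token.match(/[012]$/)?.[0] ?? null,
  }));
  let out = "";
  for (let i = 0; i < phones.length; i++) {
    const { symbol, stress } = phones[i];
    // R-rule: vowel immediately followed by R.
    const rColored: string | undefined = R_COLORED[symbol];
    if (rColored !== undefined && phones[i + 1]?.symbol === "R") {
      out += rColored;
      i++; // the R is already part of the R-colored spelling
      continue;
    }
    // AH is the only phoneme whose stress digit changes the spelling.
    if (symbol === "AH") {
      out += stress === "0" ? "a" : "uh";
      continue;
    }
    const base: string | undefined = BASE[symbol];
    if (base === undefined) throw new Error(`Unknown ARPAbet symbol: ${symbol}`);
    out += base;
  }
  return out;
}
```

For instance, `toIngglish("HH AH0 L OW1")` returns "haloh" and `toIngglish("B Y UW1 T AH0 F AH0 L")` returns "byootafal".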

Schwa and STRUT

The CMU dictionary uses a single phoneme AH for both stressed /ʌ/ (the STRUT vowel, as in "but" and "cup") and unstressed /ə/ (schwa, as in "about" and "sofa"). Ingglish splits these by stress level:

  • AH1/AH2 (stressed /ʌ/) → 'uh': but, cup, run, son
  • AH0 (unstressed /ə/) → 'a': about, sofa, the, and

This split exploits their complementary distribution (/ʌ/ appears only in stressed syllables, /ə/ only in unstressed syllables). Many phonological analyses treat them as allophones of a single phoneme (e.g., Giegerich 1992, English Phonology: An Introduction).

Using 'a' for schwa preserves the English spelling of extremely common words: "a", "and", "about", "away", "important", "hospital", "normal", "signal". See Spelling Iteration Log for the full rationale.

Stress Handling

ARPAbet includes stress markers on vowels:

  • 0 = no stress (unstressed)
  • 1 = primary stress
  • 2 = secondary stress

Ingglish Output

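We drop stress digits when mapping to Ingglish spellings, so Ingglish does not mark stress. The one exception is AH, whose digit selects 'uh' (AH1/AH2) or 'a' (AH0), as described in Schwa and STRUT above. The output is simpler and still phonemically accurate.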

IPA Output

IPA output preserves stress information using standard IPA stress markers:

  • ˈ (U+02C8) = primary stress
  • ˌ (U+02CC) = secondary stress

Stress markers are placed at syllable boundaries following the Maximal Onset Principle. This means the marker appears before the onset consonants of the stressed syllable, not directly before the vowel.

Example: "hello" /həˈloʊ/

  • The stress marker goes before "l" (the syllable onset), not before "oʊ"

Example: "examination" /ɪɡˌzæməˈneɪʃən/

  • Secondary stress before "z" (onset of second syllable)
  • Primary stress before "n" (onset of fourth syllable)

The system uses English phonotactics (valid onset clusters like /bl/, /str/, /skw/) to place stress markers at the right syllable boundaries.
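Below is a minimal sketch of Maximal Onset stress placement. It assumes the input is already a list of IPA segments with stress attached to the vowels, so the ARPAbet-to-IPA step is omitted. The onset list is a hypothetical, deliberately partial stand-in for a full inventory of English onsets. `Segment`, `LEGAL_CLUSTERS`, and `toIpaWithStress` are illustrative names and not the system's actual code.

```typescript
// One segment of a transcription: an IPA string plus, for vowels, a stress level.
interface Segment {
  ipa: string;
  stress?: 0 | 1 | 2; // present only on vowels (the syllable nuclei)
}

// Hypothetical, partial list of legal multi-consonant onsets, keyed by
// segments joined with "+".
const LEGAL_CLUSTERS = new Set<string>([
  "p+l", "p+ɹ", "b+l", "b+ɹ", "t+ɹ", "d+ɹ", "k+l", "k+ɹ", "k+w", "ɡ+l", "ɡ+ɹ",
  "f+l", "f+ɹ", "θ+ɹ", "ʃ+ɹ", "t+w", "d+w", "s+p", "s+t", "s+k", "s+m", "s+n",
  "s+l", "s+w", "s+p+l", "s+p+ɹ", "s+t+ɹ", "s+k+ɹ", "s+k+w",
]);

// Any single consonant except /ŋ/ counts as a legal onset in this sketch.
function isLegalOnset(consonants: string[]): boolean {
  if (consonants.length === 0) return true;
  if (consonants.length === 1) return consonants[0] !== "ŋ";
  return LEGAL_CLUSTERS.has(consonants.join("+"));
}

function toIpaWithStress(segments: Segment[]): string {
  const marks = new Map<number, string>(); // segment index -> mark inserted before it
  let previousVowel = -1;
  segments.forEach((segment, i) => {
    if (segment.stress === undefined) return; // consonant
    // The first syllable's onset is every consonant before its vowel.
    let onsetStart = 0;
    if (previousVowel >= 0) {
      // Maximal Onset: this syllable's onset is the longest legal suffix of the
      // consonants between the previous vowel and this one.
      const cluster = segments.slice(previousVowel + 1, i).map((s) => s.ipa);
      let k = 0;
      while (!isLegalOnset(cluster.slice(k))) k++; // the empty suffix always succeeds
      onsetStart = previousVowel + 1 + k;
    }
    if (segment.stress === 1) marks.set(onsetStart, "ˈ");
    if (segment.stress === 2) marks.set(onsetStart, "ˌ");
    previousVowel = i;
  });
  return segments.map((s, i) => (marks.get(i) ?? "") + s.ipa).join("");
}
```

For "examination", /ɡz/ is not in the onset list, so /ɡ/ stays in the first syllable and ˌ lands before /z/. The function then yields /ɪɡˌzæməˈneɪʃən/, matching the example above.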

Limitations

  1. Homophones: Words that sound the same will have the same Ingglish spelling
    • "their", "there", "they're" → all become the same
    • See False Friends Analysis for a full breakdown of how this affects real words
  2. Digraph boundary ambiguity: When two letters that form a digraph appear adjacent across a morpheme boundary, the spelling can be misread. For example, "hothouse" → "hothous" where 'th' represents /t/+/h/ (not /θ/), "mishap" → "mishap" where 'sh' is /s/+/h/ (not /ʃ/), and "engage" → "engayj" where 'ng' is /n/+/g/ (not /ŋ/). This is an inherent limitation of digraph-based orthographies; the same ambiguity exists in standard English (compare "hothouse" vs "nothing"). Cases where this matters are rare.
  3. Accent coverage: The CMU dictionary represents General American English. This includes maintaining the cot-caught distinction (/ɑ/ vs /ɔ/) even though many American speakers merge these vowels. We preserve the distinction because the CMU dictionary does and because it serves speakers who maintain it.
  4. Allophonic detail not captured: As a phonemic (not phonetic) system, Ingglish does not represent allophonic variation such as aspiration of stops (/pʰ/ in "pin" vs /p/ in "spin"), flapping of /t/ ([ɾ] in "butter"), or vowel nasalization. These are predictable from context and don't change word meanings.
  5. Missing words: Proper nouns, neologisms, and slang may not be in the dictionary.
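The digraph ambiguity in item 2 can be detected mechanically. For each pair of adjacent phoneme spellings, check whether the last letter of the first and the first letter of the second form a digraph. Below is a minimal sketch. It assumes the per-phoneme spellings are already available, and it checks only the consonant digraphs. `findBoundaryDigraphs` is a hypothetical helper.

```typescript
// Consonant digraphs used in Ingglish spellings.
const DIGRAPHS = new Set(["th", "dh", "sh", "zh", "ch", "ng"]);

interface BoundaryDigraph {
  index: number;   // position of the first of the two phonemes
  digraph: string; // the letter pair a reader may misparse
}

// Given the Ingglish spelling of each phoneme in a word, report every place
// where the last letter of one spelling and the first letter of the next
// accidentally form a digraph (e.g. T "t" + HH "h" -> "th").
function findBoundaryDigraphs(spellings: string[]): BoundaryDigraph[] {
  const hits: BoundaryDigraph[] = [];
  for (let i = 0; i + 1 < spellings.length; i++) {
    const pair = spellings[i].slice(-1) + spellings[i + 1].charAt(0);
    if (DIGRAPHS.has(pair)) hits.push({ index: i, digraph: pair });
  }
  return hits;
}
```

Given `["h", "o", "t", "h", "ou", "s"]` ("hothous"), the sketch flags "th" at index 2. Given `["i", "ng", "g", "l", "i", "sh"]` ("Ingglish"), it flags nothing, because there the digraphs are single phonemes.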