Architecture Overview
High-level architecture of the Ingglish project.
Project Structure
ingglish/
├── packages/
│ ├── normalize/ # Text cleanup, case handling, tokenization, word patterns
│ ├── phonemes/ # Phoneme data + conversion
│ ├── dictionary/ # CMU dict, lookup, frequency
│ ├── g2p/ # Rule-based grapheme-to-phoneme
│ ├── ipa/ # IPA ↔ ARPAbet conversion
│ ├── shavian/ # Shavian alphabet conversion
│ ├── deseret/ # Deseret alphabet conversion
│ ├── fallback/ # Unknown word strategies
│ ├── core/ # Translation API (translate + detect)
│ ├── dom/ # DOM translation utilities
│ ├── website/ # React web application
│ ├── extension/ # Chrome extension
│ └── cors-proxy/ # Cloudflare Worker proxy
├── docs/ # Documentation
└── .github/ # CI/CD workflows
Package Dependencies
@ingglish/normalize (0 deps)
@ingglish/phonemes (0 deps) ◄┐
@ingglish/dictionary (0 deps) ◄┤
@ingglish/g2p ──► phonemes ◄┼── @ingglish/fallback
│
@ingglish/ipa ──► phonemes │
@ingglish/shavian ──► phonemes + dictionary
@ingglish/deseret ──► phonemes + dictionary
│
ingglish ◄── all above packages
▲
@ingglish/dom ──► @ingglish/normalize (peer: core)
▲
@ingglish/website ──► @ingglish/dom + ingglish
@ingglish/extension ──► ingglish
Library Packages
@ingglish/normalize - Text cleanup, case handling, tokenization
src/
├── index.ts # Barrel exports
├── case.ts # Case pattern detection/application, splitCamelCase
├── text.ts # normalizeApostrophes, stripDiacritics, URL/email preservation
└── tokenize.ts # WORD_SPLIT_REGEX, WORD_TEST_REGEX, tokenizeText, tokenizeIPA, etc.
@ingglish/phonemes - Phoneme data + conversion
src/
├── index.ts # Barrel exports
├── arpabet.ts # ARPAbet phoneme definitions
├── phonotactics.ts # English sound rules for stress
├── types.ts # OutputFormat type
├── to-ingglish.ts # ARPAbet → Ingglish
├── from-ingglish.ts # Ingglish → ARPAbet
├── ingglish-maps.ts # Phoneme mapping tables
├── custom-format.ts # Custom format registration
└── format-registry.ts # Format registry for extensible output
@ingglish/dictionary - CMU dict, lookup, frequency
src/
├── index.ts # Barrel exports
├── loader.ts # Load and cache CMU dictionary
├── lookup.ts # Word pronunciation lookup
├── reverse.ts # Build reverse index (phoneme → words)
├── frequency.ts # Word frequency ranking
├── custom-words.ts # Custom pronunciations (tech terms)
└── data/ # Generated dictionary and frequency data
@ingglish/g2p - Rule-based grapheme-to-phoneme
src/
├── index.ts # Public API
├── g2p-rules.ts # Core G2P conversion rules
├── stress.ts # Stress assignment
└── stress.test.ts # Stress prediction tests
@ingglish/ipa - IPA ↔ ARPAbet conversion
src/
├── index.ts # Barrel exports
├── to-ipa.ts # ARPAbet → IPA with stress
└── from-ipa.ts # IPA → ARPAbet
@ingglish/shavian - Shavian alphabet conversion
src/
├── index.ts # Barrel exports
├── to-shavian.ts # ARPAbet → Shavian
├── from-shavian.ts # Shavian → ARPAbet
├── shavian-maps.ts # Shavian mapping tables
└── tokenize.ts # Shavian tokenization
@ingglish/deseret - Deseret alphabet conversion
src/
├── index.ts # Barrel exports
├── to-deseret.ts # ARPAbet → Deseret
├── from-deseret.ts # Deseret → ARPAbet
├── deseret-maps.ts # Deseret mapping tables
└── tokenize.ts # Deseret tokenization
@ingglish/fallback - Unknown word strategies
src/
├── index.ts # Fallback orchestration
├── acronyms.ts # Acronym/initialism handling
├── compounds.ts # Compound word splitting
├── stemming.ts # Base word + suffix matching
└── british.ts # British spelling variants
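The lookup-then-fallback idea behind stemming ("base word + suffix matching") can be sketched as follows. This is a minimal illustration, not the package's actual code: `sampleDictionary`, `SUFFIX_PHONEMES`, and `resolvePhonemes` are hypothetical names, and the single `-ing` rule is an assumed example of a suffix rule.

```typescript
type Phonemes = string[];

// Tiny stand-in for the CMU dictionary (assumption: the real data comes from @ingglish/dictionary).
const sampleDictionary = new Map<string, Phonemes>([
  ["hello", ["HH", "AH0", "L", "OW1"]],
  ["walk", ["W", "AO1", "K"]],
]);

// Hypothetical suffix table; the real stemming rules are not shown in this document.
const SUFFIX_PHONEMES: Array<[string, Phonemes]> = [["ing", ["IH0", "NG"]]];

type Resolution = {
  phonemes: Phonemes | null;
  source: "dictionary" | "stemming" | "unknown";
};

function resolvePhonemes(word: string): Resolution {
  const lower = word.toLowerCase();

  // 1. Direct dictionary hit.
  const direct = sampleDictionary.get(lower);
  if (direct) return { phonemes: direct, source: "dictionary" };

  // 2. Stemming fallback: strip a known suffix, look up the base word,
  //    then re-attach the suffix phonemes.
  for (const [suffix, suffixPhonemes] of SUFFIX_PHONEMES) {
    const isSuffixed =
      lower.length > suffix.length && lower.slice(-suffix.length) === suffix;
    const base = isSuffixed
      ? sampleDictionary.get(lower.slice(0, -suffix.length))
      : undefined;
    if (base) return { phonemes: [...base, ...suffixPhonemes], source: "stemming" };
  }

  // 3. Nothing matched; a caller could try other strategies (acronyms, compounds, G2P).
  return { phonemes: null, source: "unknown" };
}
```

Returning the `source` alongside the phonemes is a design choice of this sketch; it makes it easy to see which strategy produced a pronunciation when debugging.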
ingglish - Translation API
The core package is a thin orchestration layer. It imports from the packages above and exports the public translation API.
src/
├── index.ts # Public API: translate, reverseTranslate, Sync variants
├── translate/ # Translation logic
│ ├── forward.ts # English → Ingglish/IPA
│ ├── reverse.ts # Ingglish/IPA → English
│ ├── contractions.ts # Handle "don't", "I'm", etc.
│ └── preserved.ts # URL/email preservation during translation
└── detect/ # Language detection
└── language.ts # Detect Ingglish vs English text
Translation Flow
English Text
│
▼
┌─────────────────┐
│ translateText │ (format: 'ingglish' | 'ipa')
└────────┬────────┘
│ tokenize
▼
┌─────────────────┐ ┌──────────────────────┐
│ translateWord │────>│ lookupPronunciation │
└────────┬────────┘ └────────┬─────────────┘
│ │
│ found? │ CMU Dictionary
│ │
┌────┴────┐ │
│ │ ▼
▼ ▼ ┌──────────────┐
phonemes unknown │ phonemes │
│ │ └──────┬───────┘
│ │ │
│ ┌────┴────┐ │
│ │ stemming│ │
│ │ rules │ │
│ └────┬────┘ │
│ │ │
└────┬────┘ │
│ │
▼ │
┌────────────────────┐ │
│ Output Format? │<──────────┘
└────────┬───────────┘
│
┌────┴────┐
│ │
▼ ▼
Ingglish IPA
│ │
▼ ▼
┌───────────┐ ┌───────────┐
│ phonemes │ │phonemesTo │
│ToIngglish │ │ IPA │
└─────┬─────┘ └─────┬─────┘
│ │
▼ ▼
"haloh" "/həˈloʊ/"
Reverse Translation Flow
Supports both Ingglish and IPA input:
Ingglish Text IPA Text
│ │
▼ ▼
┌─────────────────┐ ┌────────────────────┐
│reverseTranslate │ │ reverseTranslate │
│ Text │ │ IPAText │
└────────┬────────┘ └─────────┬──────────┘
│ │
▼ ▼
┌────────────────────┐ ┌────────────────────┐
│ingglishToPhonemes │ │ ipaToArpabet │
└─────────┬──────────┘ └─────────┬──────────┘
│ │
└───────────┬───────────┘
│
▼
┌─────────────────────┐
│ lookupByPhonemes │ Reverse dictionary lookup
└────────┬────────────┘
│
▼
┌─────────────────────┐
│ sortByFrequency │ Rank homophones
└────────┬────────────┘
│
▼
English Text (most common match)
Key Data Structures
CMU Dictionary
// Word → Phoneme array (pre-split at build time)
{
"hello": ["HH", "AH0", "L", "OW1"],
"world": ["W", "ER1", "L", "D"],
...
}
Reverse Dictionary (built at runtime)
// Phoneme key → English words (sorted by frequency)
Map<string, string[]>
{
"T UW": ["to", "too", "two"],
"DH EH R": ["there", "their", "they're"],
...
}
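One way such a reverse index could be built at runtime is sketched below. It assumes keys are formed by dropping ARPAbet stress digits and joining with spaces, which matches the `"T UW"` key shown above. `phonemeKey`, `buildReverseIndex`, and the rank values are illustrative assumptions, not the package's API or real frequency data.

```typescript
type ForwardDictionary = Record<string, string[]>;

// Drop ARPAbet stress digits ("UW1" -> "UW") and join into a lookup key.
function phonemeKey(phonemes: string[]): string {
  return phonemes.map((p) => p.replace(/[0-9]/g, "")).join(" ");
}

// rank: lower number = more frequent word (hypothetical ranking input).
function buildReverseIndex(
  dict: ForwardDictionary,
  rank: Map<string, number>
): Map<string, string[]> {
  const index = new Map<string, string[]>();

  // Group every word under its stress-free phoneme key.
  for (const word of Object.keys(dict)) {
    const key = phonemeKey(dict[word]);
    const bucket = index.get(key);
    if (bucket) bucket.push(word);
    else index.set(key, [word]);
  }

  // Sort each group of homophones so the most frequent word comes first.
  // Words without a rank sort last.
  const rankOf = (word: string): number => rank.get(word) ?? Number.MAX_VALUE;
  index.forEach((words) => words.sort((a, b) => rankOf(a) - rankOf(b)));
  return index;
}
```

With the index built once, each reverse lookup is a single `Map.get` on the phoneme key, and the first element of the result is the most common match.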
Phoneme Map
// ARPAbet → Ingglish spelling
{
"HH": "h",
"AH": "uh",
"L": "l",
"OW": "oh",
...
}
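A minimal sketch of how a table like this could drive the ARPAbet → Ingglish conversion is shown below. It uses only the four entries listed above and simply strips the trailing stress digit before lookup; the real conversion may treat stress or reduced vowels differently. `PHONEME_TO_INGGLISH` and `toIngglishSketch` are hypothetical names.

```typescript
// Entries copied from the table above; the full table would cover every ARPAbet phoneme.
const PHONEME_TO_INGGLISH: Record<string, string> = {
  HH: "h",
  AH: "uh",
  L: "l",
  OW: "oh",
};

function toIngglishSketch(phonemes: string[]): string {
  return phonemes
    .map((phoneme) => {
      // "OW1" -> "OW": the stress digit is not part of the map key.
      const base = phoneme.replace(/[0-9]$/, "");
      const spelling = PHONEME_TO_INGGLISH[base];
      if (spelling === undefined) {
        throw new Error(`No Ingglish spelling for phoneme ${phoneme}`);
      }
      return spelling;
    })
    .join("");
}

// "low" is L OW1 in ARPAbet, so this call returns "loh".
const lowSpelling = toIngglishSketch(["L", "OW1"]);
```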
DOM Library (@ingglish/dom)
Browser-only utilities for translating DOM content.
Module Structure
src/
├── index.ts # Public API exports
├── types.ts # DOMTranslatorOptions interface
├── translate/ # DOM translation logic
│ ├── index.ts # translateDOM orchestration
│ ├── translator.ts # Core DOM translation algorithm
│ ├── apply-map.ts # Apply pre-computed translations
│ ├── restore.ts # Restore original text
│ └── tooltip-fragment.ts # Hover tooltip HTML generation
├── observe/ # Dynamic content handling
│ ├── index.ts # observeAndTranslate entry point
│ └── observer.ts # MutationObserver implementation
└── traversal/ # DOM traversal
├── index.ts # Traversal exports
├── browser.ts # Browser detection
├── extract.ts # Word extraction from text nodes
├── skip-rules.ts # Skip logic for tags/classes
├── text-nodes.ts # TreeWalker and text node utilities
└── tooltip.ts # Tooltip styling utilities
Key Features
- Chunked translation: Uses requestAnimationFrame for smooth rendering on large pages
- Tooltip support: Wraps translated words in spans with original text on hover
- MutationObserver: Auto-translates dynamically added content (SPAs)
- Attribute translation: Handles title, alt, placeholder, aria-label
- Skip logic: Respects <code>, <pre>, .no-translate, contenteditable
- Pre-computed translations: applyTranslationsMap() for external translation sources
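The skip logic in the list above can be illustrated with a small predicate. This is a sketch under assumptions: `ElementLike` is a hypothetical structural stand-in for a DOM element so the code runs outside a browser, and walking up the ancestor chain is an assumed behavior, not something this document confirms.

```typescript
// Minimal stand-in for a DOM Element (assumption, for illustration only).
interface ElementLike {
  tagName: string;
  classNames: string[];
  contentEditable?: boolean;
  parent?: ElementLike;
}

const SKIP_TAGS = ["CODE", "PRE"];
const SKIP_CLASS = "no-translate";

// Returns true if the element or any ancestor is marked as "do not translate".
function shouldSkip(element: ElementLike | undefined): boolean {
  for (let node = element; node !== undefined; node = node.parent) {
    if (SKIP_TAGS.indexOf(node.tagName.toUpperCase()) !== -1) return true;
    if (node.classNames.indexOf(SKIP_CLASS) !== -1) return true;
    if (node.contentEditable === true) return true;
  }
  return false;
}
```

In a browser, the same checks would map onto `element.tagName`, `element.classList.contains(...)`, and `element.isContentEditable`.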
Website (@ingglish/website)
React single-page application with three main features:
Components
src/
├── components/
│ ├── TextTranslator.tsx # Bidirectional text translation
│ ├── UrlTranslator.tsx # Web page translation
│ ├── SpellingGuide.tsx # Phoneme mapping reference
│ ├── Extension.tsx # Chrome extension info page
│ └── Docs.tsx # Documentation viewer
├── contexts/
│ └── FormatContext.tsx # Output format state (Ingglish/IPA)
├── hooks/
│ └── useUrlTranslator.ts # URL fetching & translation logic
└── App.tsx # Tab navigation & routing
URL Translation Architecture
┌─────────────┐ ┌─────────────┐ ┌──────────────┐
│ Browser │────>│ CORS Proxy │────>│ Target Site │
│ (iframe) │ │ (Worker) │ │ │
└─────────────┘ └─────────────┘ └──────────────┘
│
▼
┌─────────────┐
│ translateDOM│ In-place DOM modification
└─────────────┘
- User enters URL
- Website fetches via CORS proxy
- HTML is written to sandboxed iframe
- translateDOM from @ingglish/dom walks text nodes and translates them
- Links are intercepted for navigation within iframe
Chrome Extension (@ingglish/extension)
Components
src/
├── manifest.json # Extension configuration
├── content-script.ts # Content script (DOM walking + message passing)
├── background.ts # Service worker (holds dictionary ~5MB)
└── popup.ts # Popup UI
Architecture
The extension uses a message-passing architecture to keep the content script lightweight:
- Background service worker: Loads the full CMU dictionary (~5MB) once
- Content script: Lightweight (~11KB), walks DOM and sends words to background for translation
- Translation cache: 50K-entry in-memory cache in the background worker for fast repeated lookups
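A bounded FIFO cache of the kind described here can be sketched with a `Map`, which iterates keys in insertion order. `FifoCache` is a hypothetical name and this is not the extension's actual implementation; the 50,000 limit mirrors the figure quoted above.

```typescript
class FifoCache<V> {
  private readonly entries = new Map<string, V>();

  constructor(private readonly maxEntries: number) {
    if (maxEntries < 1) throw new Error("maxEntries must be at least 1");
  }

  get(key: string): V | undefined {
    return this.entries.get(key);
  }

  set(key: string, value: V): void {
    if (!this.entries.has(key) && this.entries.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest entry.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    // Updating an existing key keeps its original position (FIFO, not LRU).
    this.entries.set(key, value);
  }

  get size(): number {
    return this.entries.size;
  }
}

// Sized like the cache described above.
const translationCache = new FifoCache<string>(50000);
```

FIFO eviction avoids the bookkeeping of LRU: reads never reorder entries, so a cache hit is a plain `Map.get`.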
Flow
┌──────────────┐ ┌──────────────┐
│ Popup UI │────>│ Message │
│ (popup.ts) │ │ Passing │
└──────────────┘ └──────┬───────┘
│
▼
┌──────────────────────────────────────────────────┐
│ Content Script (content-script.ts) │
│ • Walks DOM, collects text nodes │
│ • Sends batches of words to background │
│ • Applies translations in chunks (RAF) │
│ • Debounced MutationObserver (100ms) │
│ • In-place span updates for format switching │
└──────────────────────┬───────────────────────────┘
│ chrome.runtime.sendMessage
▼
┌──────────────────────────────────────────────────┐
│ Background (background.ts) │
│ • Loads CMU dictionary on startup │
│ • Caches translations (50K entries, FIFO) │
│ • Returns translated words │
│ • Manages tab-specific translation state │
└──────────────────────────────────────────────────┘
Performance Optimizations
- Debounced MutationObserver: Waits 100ms for mutations to settle before processing, preventing freezes on sites with rapid DOM updates (e.g., infinite scroll)
- In-place format switching: When switching between Ingglish and IPA, updates existing spans directly instead of restoring and re-translating the entire page
- Chunked DOM updates: Uses requestAnimationFrame to apply translations in chunks of 50 elements, keeping the main thread responsive
- Pre-collected text nodes: Passes pre-collected nodes to applyTranslationsMap() to avoid double DOM traversal
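The chunked-update idea can be sketched as a small helper that processes a fixed number of items, then yields to a scheduler before continuing. `applyInChunks` and `Scheduler` are hypothetical names; the scheduler is passed in so the sketch also runs outside a browser.

```typescript
type Scheduler = (callback: () => void) => void;

// Applies `apply` to every item, at most `chunkSize` items per scheduled step.
function applyInChunks<T>(
  items: T[],
  apply: (item: T) => void,
  schedule: Scheduler,
  chunkSize = 50,
  onDone?: () => void
): void {
  const size = Math.max(1, chunkSize); // guard against a zero or negative chunk size
  let index = 0;

  const step = (): void => {
    const end = Math.min(index + size, items.length);
    for (const item of items.slice(index, end)) apply(item);
    index = end;

    if (index < items.length) {
      schedule(step); // yield, then continue with the next chunk
    } else if (onDone) {
      onDone();
    }
  };

  step(); // the first chunk runs immediately
}
```

In a browser the scheduler could be `(cb) => requestAnimationFrame(cb)`, so each chunk runs in its own frame and the page can paint and handle input between chunks.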
CORS Proxy (@ingglish/cors-proxy)
Cloudflare Worker that proxies requests to bypass CORS restrictions.
┌────────────┐ ┌───────────────────┐ ┌─────────────┐
│ Website │────>│ Cloudflare Worker │────>│ Target URL │
│ │ │ │ │ │
│ │<────│ + CORS headers │<────│ │
└────────────┘ └───────────────────┘ └─────────────┘
Security features:
- Origin allowlist validation
- SSRF prevention (blocks loopback and private IP ranges: 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, ::1)
- Protocol restriction (HTTP/HTTPS only)
- Content-Type checking (HTML only)
- Cache control headers (minimum 5 minutes)
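The protocol restriction and private-range check from the list above could look roughly like this. It is a sketch only: `isPrivateHost` and `isAllowedTarget` are hypothetical names, the functions take an already-parsed protocol and hostname, and hostnames that merely resolve to private addresses (for example via DNS) are not covered.

```typescript
// True for loopback and private literals: 127.x, 10.x, 172.16-31.x, 192.168.x, ::1.
function isPrivateHost(hostname: string): boolean {
  // URL parsers report IPv6 hosts in brackets, e.g. "[::1]".
  const host = hostname.replace(/^\[|\]$/g, "").toLowerCase();
  if (host === "::1") return true;

  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!match) return false; // not an IPv4 literal

  const first = Number(match[1]);
  const second = Number(match[2]);
  return (
    first === 127 ||
    first === 10 ||
    (first === 172 && second >= 16 && second <= 31) ||
    (first === 192 && second === 168)
  );
}

// protocol uses the URL convention with a trailing colon, e.g. "https:".
function isAllowedTarget(protocol: string, hostname: string): boolean {
  if (protocol !== "http:" && protocol !== "https:") return false;
  return !isPrivateHost(hostname);
}
```

In a Worker, the two arguments would typically come from `new URL(target)`, whose `protocol` and `hostname` properties already use these formats.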
Data Flow Summary
┌───────────────────────────────────────────────────────────────────┐
│ Build Time │
├───────────────────────────────────────────────────────────────────┤
│ CMU Dictionary (126K words) ──> bundled with @ingglish/dictionary│
│ SUBTLEX Frequencies (74K) ──> bundled with @ingglish/dictionary │
└───────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Runtime │
├─────────────────────────────────────────────────────────────┤
│ loadDictionary() ──> parse & cache dictionary │
│ translateText() ──> O(n) word lookup + phoneme conversion │
│ reverseTranslate() ──> O(1) phoneme key lookup + frequency │
└─────────────────────────────────────────────────────────────┘
All paths are linear (no quadratic or exponential complexity). Dictionary data is loaded on-demand via dynamic imports.
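The "load on demand, then cache" behavior can be sketched by memoizing the loader's promise. `createCachedLoader` and `DictionaryData` are hypothetical names; in the real package the loader would presumably be a dynamic `import()` of the generated data, which is replaced here by an in-memory stub.

```typescript
type DictionaryData = Record<string, string[]>;

// Wraps a loader so the underlying load runs at most once per successful result.
function createCachedLoader(
  load: () => Promise<DictionaryData>
): () => Promise<DictionaryData> {
  let cached: Promise<DictionaryData> | undefined;

  return () => {
    if (cached === undefined) {
      cached = load().catch((error) => {
        cached = undefined; // allow a retry after a failed load
        throw error;
      });
    }
    return cached; // concurrent callers share the same in-flight promise
  };
}

// Stub loader standing in for a dynamic import of the dictionary data.
let stubLoadCount = 0;
const loadDictionarySketch = createCachedLoader(() => {
  stubLoadCount++;
  return Promise.resolve({ hello: ["HH", "AH0", "L", "OW1"] });
});
```

Caching the promise rather than the resolved value means callers that arrive while the first load is still in flight do not trigger a second load.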
See Performance for complexity tables, profiling scripts, and optimization guidelines.