About this tool

What it is

A native-speaker validation form for the project's Nepali grapheme-to-phoneme (G2P) gold lexicon. The lexicon underpins every TTS evaluation we publish.

The frontend produces candidate pronunciations from rules and from a source lexicon (Google's language-resources/ne, CC BY 4.0). Native speakers and linguists validate those candidates one word at a time. Validated entries become the project's gold lexicon.

Why we use IPA-style labels and not raw IPA

Typing IPA in a browser is hard. The project's labels (ax, tx, dz, etc.) are plain ASCII and easy to type, and each one maps unambiguously to a single IPA equivalent. The cheatsheet has the full table.

Why we collapse some orthographic distinctions in default mode

The default mode is spoken_nepali. Per the primary references (Khatiwada 2009, Regmi 2025):

If you think a specific word needs the careful Sanskritized pronunciation, please note that in the comments field; don't try to encode it in default-mode phones.

What the form does to your decisions

  1. Saves them in your browser's local storage as you go (so you can stop and resume).
  2. When you click "Export decisions as TSV", you get a single file containing all your decisions plus your name.
  3. You email or upload that file to the project. We run a script that validates phones against the v1.0 inventory and appends accepted entries to gold_lexicon.tsv.

Any phone string that doesn't validate (e.g., one containing a mistyped label) is flagged for follow-up before promotion. Nothing reaches the gold lexicon without passing the inventory check.
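For the curious, the inventory check can be pictured roughly as follows. This is a minimal sketch, not the project's actual script: the column names (word, phones), the space-separated phone format, and the three-label inventory are assumptions made for illustration, and the helper names unknown_labels and triage are hypothetical.

```python
import csv
import io

# Hypothetical subset of the phone inventory. Only ax, tx and dz are named
# on this page; the full v1.0 table is in the cheatsheet.
INVENTORY = frozenset({"ax", "tx", "dz"})


def unknown_labels(phones, inventory=INVENTORY):
    """Return the labels in a space-separated phone string that are not in the inventory."""
    return [label for label in phones.split() if label not in inventory]


def triage(tsv_text, inventory=INVENTORY):
    """Split exported rows into (accepted, flagged).

    Assumes a header row with 'word' and 'phones' columns (hypothetical names).
    Rows with an empty phone string or any unknown label are flagged.
    """
    accepted, flagged = [], []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        phones = (row.get("phones") or "").strip()
        bad = unknown_labels(phones, inventory)
        if not phones or bad:
            flagged.append((row, bad))
        else:
            accepted.append(row)
    return accepted, flagged


# Two placeholder rows: the second contains a mistyped label ("axx").
SAMPLE = "word\tphones\nw1\tdz ax tx\nw2\tdz axx tx\n"
accepted, flagged = triage(SAMPLE)
print(len(accepted), "accepted;", len(flagged), "flagged")
```

In this sketch only the rows in accepted would go on to be appended to gold_lexicon.tsv; each flagged row carries the list of offending labels so a reviewer can follow up.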

Privacy and licensing

Source-code references

The sources behind the v1.0 phone inventory and rules:

Contact

For questions about a specific entry or the workflow, or if you would like to contribute, write to hello@ampixa.com.