Chapter 3 Orthography and transcription
How Sakhalin Ainu has been written — in Latin, Cyrillic, and katakana — and the romanisation used in this grammar.
Sakhalin Ainu has no traditional script, but it has a longer history of being written than is often assumed, and in three different scripts. Russian observers and at least one native writer used Cyrillic between the 1860s and 1906; Bronisław Piłsudski and every later linguist used some form of the Latin alphabet; and from the Japanese period onward Ainu words and whole texts were set down in katakana, by researchers and by Ainu writers themselves. This chapter surveys these practices with attested specimens of each, then defines the romanisation used throughout this grammar and the Latin–Cyrillic–katakana correspondences used by this site. The phoneme inventory that the romanisation encodes is justified in the phonology chapter; the sources themselves are surveyed in the chapter on dialects and sources.
3.1 Cyrillic
Cyrillic is the script of two of the earliest bodies of connected Sakhalin Ainu text: the specimens appended to Mikhail M. Dobrotvorsky's (Михаил Михайлович Добротворский) dictionary of 1875, and the letters that Sentoku Tarōji (千徳太郎治) wrote to Piłsudski in 1906.
3.1.1 Dobrotvorsky and the earliest records
Dobrotvorsky, a Russian military physician, lived on Sakhalin from 1867 to 1872 and compiled an Ainu–Russian dictionary of over ten thousand headwords, more than half of them collected by himself, mainly from speakers of the central West coast such as Sambaku of Nayoro (Sakaguchi 2021: 43–44). At the end of the dictionary he printed two short connected pieces, Ciwokanke's speech and a reconciliation dialogue, which Piłsudski had already singled out as the first usable Sakhalin texts (Piłsudski 1912: XXIV).
Dobrotvorsky wrote Ainu with the pre-reform Russian alphabet, marking stress but not vowel length (Sakaguchi 2021: 45–47). Several of his conventions are phonetically revealing. He alternates б/п and к/г for /p/ and /k/, recording the intervocalic voicing that is allophonic in Sakhalin Ainu (see phonology). Word-initial /r/ he consistently heard with an occlusive onset and wrote тр, as in трéкоро for reekoro ‘very’ — an observation discussed at length by Piłsudski (1912: 8) and matched by Chiri's report that initial r wavers with a t-like stop (reekoro ~ teekoro) (Chiri 1942: 第1章). A silent ъ closes every consonant-final word, as in contemporary Russian spelling, and /c/ is normally ч. Example (1) shows the opening of his first text with our retranscription.
(1) Танъ котàнъ охтà утáса—анъ кусý áреги анъ.
‘I have come to visit this village.’
Dobrotvorsky 1875: 90; Sakaguchi 2021: 49; West Sakhalin, West coast, central (recorded 1867–1872)
Dobrotvorsky's áреги shows both the voiced allophone of /k/ and the o~u, e~i wavering typical of unstressed vowels; the form is ariki 'come.PL', the plural stem regularly selected by the indefinite person.
3.1.2 Sentoku Tarōji's letters of 1906
Sentoku Tarōji (1872–1929), an East-coast Ainu of Otasan schooled under the Russian administration, wrote three letters in Ainu to Piłsudski, dated 4 June, 15 June and 11 August 1906. They are kept in the Sakhalin Regional Museum and were deciphered and published with a Japanese translation by Tangiku Itsuji (丹菊逸治) and Ogihara Shinko (荻原眞子) (Tangiku & Ogihara 2001). They are the earliest known Ainu-language texts written by an Ainu hand, antedating Chiri Yukie's (知里幸恵) アイヌ神謡集 of 1923 by seventeen years (Nakagawa 2006: 21).
Sentoku's orthography is an unstandardised but internally consistent adaptation of Russian cursive. In the letters we observe the following habits: the coda fricative is х (яхка yahka 'even if'); /c/ is written both ц and ч (цеп cep 'fish', нучя ~ нуця nuca 'Russian'); б appears for /p/ after nasals (камбі ~ кампі kampi 'letter'); both и and the pre-reform і serve for /i/; and — most striking — the devoiced /i/ of syllables between voiceless consonants is simply not written: цсе for cise ‘house’, цкін for cikin 'if', цнукара for ci=nukara 'we read it'. This matches the strong devoicing of unaccented /i/ that Chiri describes for the dialect (Chiri 1942: 第1章). Sentoku also mixes in Arabic numerals and undigested Russian words (помощникъ старосты 'assistant village head'), exactly as a bilingual letter-writer would.
(2) сонко аноманде цкін нукараконде ва,
‘I send you this letter; when it arrives, please read it.’
Tangiku & Ogihara 2001: Letter 1; East Sakhalin, Otasan (Sentoku Tarōji)
цкін with the devoiced i of cikin unwritten; nukara konte, literally 'give seeing', is a benefactive request.
(3) Пороноп окта 4 цсе нуця ан
‘At Poronop there are four Russian households.’
Tangiku & Ogihara 2001: Letter 1; East Sakhalin, Otasan (Sentoku Tarōji)
Sentoku writes okta where West-coast sources have ohta; both are attested forms of the locative postposition.
3.2 Latin
3.2.1 Piłsudski's practical orthography
The twenty-seven East-coast texts of Piłsudski's Materials are printed in a practical Latin orthography which he explains in the prefatory "Remarks on Ainu phonetics, and on the transcription of the texts" (Piłsudski 1912: 1–10). Its values are Polish rather than English: j is the glide /y/, v alternates with w, and a palatalised sibilant "between s and sh" is written with the Polish acute, wavering with plain s in his own hearing — iśam ~ isam 'there is not', śumari 'fox' (Piłsudski 1912: 5–6). The letter x renders the velar fricative that he correctly identified as found "only in Saghalien", corresponding to Hokkaidō p, t, k: axto 'rain' beside Yeso apto, cux beside cup 'sun', cex beside cep 'fish' (Piłsudski 1912: 7, 9). Voiced letters b, d, g appear wherever he heard intervocalic voicing; stress is marked on all non-oxytone words; whispered word-final vowels are set as small superscript letters; and words are divided etymologically rather than by sandhi (Piłsudski 1912: 9–10). Dal Corso has recently mapped this notation systematically onto a phonemic analysis (Dal Corso 2024). Example (4) shows one of his sentences in the original orthography and in ours.
(4) Ćiśe oxta eśirep án.
‘I arrived at the house.’
Piłsudski 1912: 160; Dal Corso 2024: 71; East Sakhalin
Piłsudski's ć and ś record the automatic palatalisation of c and of s before i; the prefix e- on sirepa continues the deictic he- 'up to' (Dal Corso 2024: 71–72).
3.2.2 Linguistic romanisations
Chiri Mashiho (知里真志保) gave the Sakhalin dialect its first systematic linguistic transcription. In his 1942 grammar he replaced the Hepburn-style digraphs of earlier work with č, š, j and ü, and wrote the Sakhalin coda fricative as x (max 'woman', tex 'hand') (Chiri 1942: 凡例); vowel length he left unwritten, regarding it as non-distinctive (Chiri 1942: 第1章), a decision since reversed by all later workers (see Itabashi (2001) on the phonemic status of length). Nakagawa traces how this notation fed into the strict one-letter-one-phoneme romanisation established in the 1950s and codified in the dialect dictionary edited by Hattori Shirō (服部四郎) (Nakagawa 2006: 3–5): there every vowel-initial syllable is preceded by an apostrophe for the glottal stop, /c/ is written c, the coda fricative h, and long vowels are doubled (Hattori 1964: 34).
Murasaki Kyōko (村崎恭子) adopted the Hattori system, in simplified form, for her grammar and for all her text editions: glottal stop ' before vowel-initial syllables, doubled long vowels, particles written as separate words (Murasaki 1976; Murasaki 2025: 凡例). Her transcription of the tales of Asai Take (浅井タケ) is the single largest body of Sakhalin text in this notation (Murasaki 2001). Recent philological work on the older corpora (Sakaguchi, Dal Corso) keeps the same letter values but omits the apostrophe and separates person clitics with an equals sign (Sakaguchi 2021: 45), which is also the practice of this grammar.
3.3 Katakana
Kana transcription of Ainu reaches back to the Edo-period interpreters, and native kana writing — Yamamoto Tasuke (山本多助), Nabesawa Motozō (鍋沢元蔵), Sunazawa Kura (砂沢クラ) and others — has been studied in detail by Kirikae (1997) and Nakagawa (2006: 24–31); that material is almost entirely Hokkaidō. For Sakhalin, kana enters with the Japanese administration. Kindaichi Kyōsuke's (金田一京助) edition of the autobiography of the Sakhalin Ainu Yamabe Yasunosuke (山辺安之助), あいぬ物語 (1913), prints the Ainu text as katakana ruby over the Japanese translation — though the spelling there is Kindaichi's, not Yamabe's own (Nakagawa 2006: 25). Sentoku, who had written Ainu in Cyrillic in 1906, used katakana for the Ainu names and words in his Japanese-language book of 1929 (Sentoku 1929).
The Sakhalin-specific kana conventions in use today were fixed by Chiri's 1942 grammar: syllable codas are written with small kana (ㇷ゚ ッ ㇰ ㇱ ㇺ for -p, -t, -k, -s, -m), and the coda fricative /h/ with a small ha-row kana agreeing in vowel colour with the preceding vowel — マㇵ mah 'woman', クㇷ kuh 'belt', テㇸ teh 'hand', オㇹ oh 'spear' (Chiri 1942: 凡例). The vowel-matched small kana for coda h remain the standard way of writing Sakhalin words in kana and are used on this site (§3.5).
The main Sakhalin text actually written down in kana by an Ainu writer is the tale that Yamamoto Tasuke (1904–1993), a Kushiro Ainu and later a prolific kana author, recorded in February 1937 from Kimura Usarusima (木村ウサルシマ) of Ay on the East coast, and published in his Ainu-language journal アイヌ・モシリ (Sakaguchi 2019). Yamamoto writes codas with small kana, but uses シュ in words like クシュ for kusu ‘in order to’, and sometimes -p and -k (チカㇷ゚, パテㇰ) where Sakhalin -h is expected — possibly carrying over his native Kushiro forms (Sakaguchi 2019: 114).
(5) オㇷ゚ケ マヌ。
‘He farted, it is said.’
Sakaguchi 2019: 115; East Sakhalin, Ay (Kimura Usarusima)
Yamamoto's ㇷ゚ where the Sakhalin form is ohke; Sakaguchi retranscribes with -h.
(6) イコロ ヌカラ クシュ オマン。
‘She went to look at the treasures.’
Sakaguchi 2019: 116; East Sakhalin, Ay (Kimura Usarusima)
3.4 The romanisation of this grammar
This grammar writes Sakhalin Ainu with eleven consonant letters (p t k c s h m n r w y) and five vowel letters (a e i o u). Long vowels are doubled, as in kestoono 'the other day'; short vowels are written once, as in itah ‘speech’. The glottal stop that begins every vowel-initial syllable is predictable and is left unwritten. The letter h serves both as syllable onset and for the coda fricative (itah, ahkas 'walk'); other codas are written as the sources have them (isam, kotan; on the -p, -t, -k retained in the far south see Sakaguchi (2021: 45) and morphophonology). Person clitics are separated with an equals sign (an=nukara), affixes in morphemic lines with a hyphen. Table 1 maps these letters onto the source transcriptions discussed above.
| This grammar | Hattori 1964 / Murasaki | Chiri 1942 | Piłsudski 1912 | Dobrotvorsky 1875 / Sentoku 1906 |
|---|---|---|---|---|
| p | p | p | p, b | п, б |
| t | t | t | t, d | т, д |
| k | k | k | k, g | к, г |
| c | c | č | c, ć | ч, ц |
| s | s | s, š | s, ś | с, сь, ш |
| h (onset) | h | h | h | х, г |
| h (coda) | h | x | x | х |
| m, n | m, n | m, n | m, n | м, н |
| r | r | r | r (tr, dr after n) | р; initial тр |
| w | w | ü | w, v | в |
| y | y | j | j | й; я, ю, е for ya, yu, ye |
| a e i o u | a e i o u | a e i o u | a e i o u | а э/е и/і о у |
| long vowels aa, ii… | doubled | not written | stress accent only | stress accent only |
| glottal stop: unwritten | ' | unwritten | unwritten | unwritten |
Retranscription from each source therefore follows mechanically. From Murasaki-edited texts we drop the apostrophes and keep everything else, including the doubled vowels; from Piłsudski we replace ś, ć, j, v, x with s, c, y, w, h, remove the accents, and restore vowel length only where independent evidence supports it; from the Cyrillic sources we apply the table above, restoring the devoiced vowels that Sentoku leaves unwritten. Examples (1) and (4) above show the procedure for Cyrillic and for Piłsudski's Latin; (7) completes the set with a sentence from Murasaki's edition of Asai Take.
(7) Sannupista re monimahpo 'okayahci.
‘In Sannupis there lived three girls.’
Murasaki 2001: text 4; West Sakhalin, Odasu (Asai Take)
Murasaki's apostrophe before 'okayahci marks the glottal onset; we drop it. On the plural suffix -(a)hci see the chapter on number.
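The letter-level part of this procedure can be sketched in a few lines of Python. This is a minimal illustration, not the tooling behind this grammar. The function names `strip_accents`, `from_pilsudski` and `from_murasaki` are hypothetical, and the mapping of b, d, g to p, t, k is taken from Table 1 rather than from the list of replacements in the text. Vowel length, morpheme boundaries, Piłsudski's superscript vowels and his tr, dr after n are deliberately not handled, since they need independent evidence rather than a substitution table.

```python
import unicodedata

# Plain-letter substitutions for Piłsudski's orthography.
# ś, ć and the stress accents are handled separately by stripping
# combining marks, so they do not appear in this table.
_PILSUDSKI = str.maketrans({
    "j": "y", "J": "Y",
    "v": "w", "V": "W",
    "x": "h", "X": "H",
    # Voiced allophones, following Table 1 (assumption: always devoiced).
    "b": "p", "B": "P",
    "d": "t", "D": "T",
    "g": "k", "G": "K",
})


def strip_accents(text: str) -> str:
    """Remove combining marks, so that ś -> s, ć -> c, á -> a."""
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", bare)


def from_pilsudski(text: str) -> str:
    """Letter-level retranscription of a Piłsudski (1912) form."""
    return strip_accents(text).translate(_PILSUDSKI)


def from_murasaki(text: str) -> str:
    """Drop the glottal-stop apostrophe of the Hattori/Murasaki notation.

    Apply this to Ainu text only: it would also delete quotation
    marks around glosses.
    """
    for mark in ("'", "\u2019", "\u02bc"):
        text = text.replace(mark, "")
    return text


if __name__ == "__main__":
    print(from_pilsudski("Ćiśe oxta eśirep án."))
    # Cise ohta esirep an.
    print(from_murasaki("Sannupista re monimahpo 'okayahci."))
    # Sannupista re monimahpo okayahci.
```

The output for (4) is only the first, mechanical step: segmenting the verb and its clitic still has to be done by hand.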
Where sources disagree on a form, the example keeps the form of the text actually cited, and the variation is discussed in prose (so Sentoku's okta stands beside Dobrotvorsky's ohta in §3.1). We never normalise one writer's dialect to another's. Restoring a phoneme that the writer left unwritten, such as Sentoku's devoiced i in cikin, is recovery within his own dialect and therefore a different operation from rewriting one dialect's form as another's: the first is phonemic restoration, the second is the normalisation we avoid. Plural verb morphology such as -(a)hci in (7) is treated in the number chapter, and the person clitics that the equals sign separates are treated in the chapter on person and alignment.
3.5 Script correspondences on this site
Outside the grammar, this site displays every Ainu form in three scripts (Latin, Cyrillic and katakana), converted automatically from the romanisation just described. The conversion between Latin and Cyrillic is lossless in both directions; the kana conversion is lossy, since Ainu katakana as commonly written does not distinguish, for example, ay from a.i (both アイ). A form that cannot be syllabified is left unconverted. Table 2 gives the phoneme correspondences; Table 3 illustrates them with attested words.
| Phoneme | Latin | Cyrillic | Kana, onset | Kana, coda |
|---|---|---|---|---|
| /p/ | p | п | パ ピ プ ペ ポ | ㇷ゚ |
| /t/ | t | т | タ テ ト; tu = ト゚ | ッ |
| /k/ | k | к | カ キ ク ケ コ | ㇰ |
| /c/ | c | ц | チャ チ チュ チェ チョ | — |
| /s/ | s | с | サ シ ス セ ソ | ㇱ |
| /h/ | h | х | ハ ヒ フ ヘ ホ | ㇵ ㇶ ㇷ ㇸ ㇹ after a, i, u, e, o |
| /m/ | m | м | マ ミ ム メ モ | ㇺ |
| /n/ | n | н | ナ ニ ヌ ネ ノ | ン |
| /r/ | r | р | ラ リ ル レ ロ | ㇻ ㇼ ㇽ ㇾ ㇿ after a, i, u, e, o |
| /w/ | w | в | ワ ウィ ウェ ウォ | ウ |
| /y/ | y | я, ю, ё, е for ya, yu, yo, ye; coda й | ヤ ユ ヨ イェ | イ |
| /a i u e o/ | a i u e o | а и у э о | ア イ ウ エ オ | — |
| long vowels | aa, ii… | аа, ии… | アア (vowel kana repeated) | — |

| Latin | Cyrillic | Katakana | Gloss |
|---|---|---|---|
| itah | итах | イタㇵ | speech |
| aynu | айну | アイヌ | person, Ainu |
| cise | цисэ | チセ | house |
| ceh | цэх | チェㇸ | fish |
| tuytah | туйтах | ト゚イタㇵ | folktale |
| nispa | ниспа | ニㇱパ | gentleman, sir |
| yahka | яхка | ヤㇵカ | even if |
| ohta | охта | オㇹタ | in, at |
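The kana columns of Tables 2 and 3 can be reproduced with a small syllabifying converter. The Python below is a sketch under stated assumptions and is not the site's actual code. The name `to_kana` is hypothetical. The input is assumed to be a single lowercase word in this grammar's romanisation. Syllables are assumed to be (C)V(C), with a consonant counted as a coda unless a vowel follows it; that rule is an assumption here, since the chapter does not spell out the syllabification. Syllables that Table 2 does not list (ti, wu, yi) are treated as unconvertible.

```python
VOWELS = "aiueo"  # column order used in the kana rows below

TU = "\u30c8\u309a"      # ト + combining handakuten: the one-letter tu
CODA_P = "\u31f7\u309a"  # small ㇷ + combining handakuten: coda -p

# Onset rows follow Table 2; None marks a syllable the table does not list.
ONSET = {
    "":  ["ア", "イ", "ウ", "エ", "オ"],
    "p": ["パ", "ピ", "プ", "ペ", "ポ"],
    "t": ["タ", None, TU, "テ", "ト"],
    "k": ["カ", "キ", "ク", "ケ", "コ"],
    "c": ["チャ", "チ", "チュ", "チェ", "チョ"],
    "s": ["サ", "シ", "ス", "セ", "ソ"],
    "h": ["ハ", "ヒ", "フ", "ヘ", "ホ"],
    "m": ["マ", "ミ", "ム", "メ", "モ"],
    "n": ["ナ", "ニ", "ヌ", "ネ", "ノ"],
    "r": ["ラ", "リ", "ル", "レ", "ロ"],
    "w": ["ワ", "ウィ", None, "ウェ", "ウォ"],
    "y": ["ヤ", None, "ユ", "イェ", "ヨ"],
}

# Codas are one fixed kana each, except h and r, which are
# chosen to match the preceding vowel.
CODA = {"p": CODA_P, "t": "ッ", "k": "ㇰ", "s": "ㇱ", "m": "ㇺ",
        "n": "ン", "w": "ウ", "y": "イ"}
CODA_BY_VOWEL = {"h": "ㇵㇶㇷㇸㇹ", "r": "ㇻㇼㇽㇾㇿ"}


def to_kana(word: str) -> str:
    """Convert one romanised word; return it unchanged if it
    cannot be syllabified."""
    out, i, n = [], 0, len(word)
    while i < n:
        onset = ""
        if word[i] in ONSET:          # consonant onset, if any
            onset = word[i]
            i += 1
        if i >= n or word[i] not in VOWELS:
            return word               # no vowel nucleus
        v = VOWELS.index(word[i])
        i += 1
        kana = ONSET[onset][v]
        if kana is None:
            return word               # syllable not in Table 2
        out.append(kana)
        # A following consonant is a coda unless a vowel comes after it.
        if (i < n and word[i] not in VOWELS
                and (i + 1 == n or word[i + 1] not in VOWELS)):
            c = word[i]
            if c in CODA_BY_VOWEL:
                out.append(CODA_BY_VOWEL[c][v])
            elif c in CODA:
                out.append(CODA[c])
            else:
                return word           # e.g. coda c, or a non-letter
            i += 1
    return "".join(out)
```

Under these assumptions `to_kana("ay")` and `to_kana("ai")` both give アイ, which is the loss of information mentioned in the text, and a form containing a clitic boundary such as `an=nukara` comes back unconverted unless it is split first.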
Two cells deserve comment. The kana coda series for /h/ and /r/ are vowel-matched, the convention introduced by Chiri (1942: 凡例) (§3.3); and the one-letter syllable ト゚ for tu continues a long line of single-character devices for this syllable — ヅ in Chiri's kana (Chiri 1942: 凡例), ト゚ and ツ゚ in much twentieth-century practice, including Yamamoto Tasuke's (Nakagawa 2006: 26). The Cyrillic of the site is a phonemic transliteration of the romanisation, not a revival of Dobrotvorsky's or Sentoku's spellings: it writes э for /e/ in all positions except after /y/, keeps doubled long vowels, and uses no stress marks or hard signs.
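The Cyrillic transliteration just described is simple enough to state as code. The following Python is an illustrative sketch, not the site's implementation. The names `to_cyrillic` and `to_latin` are hypothetical, and the input is assumed to be lowercase text in this grammar's romanisation. Characters outside the inventory, such as the equals sign of the clitic boundary, are passed through unchanged.

```python
# Consonants and vowels of Table 2 (Latin -> Cyrillic).
CONS = dict(zip("ptkcshmnrw", "пткцсхмнрв"))
VOWEL = dict(zip("aiueo", "аиуэо"))
# y fuses with a following a, u, o, e; elsewhere it is written й.
Y_PLUS = {"a": "я", "u": "ю", "o": "ё", "e": "е"}


def to_cyrillic(text: str) -> str:
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "y" and nxt in Y_PLUS:
            out.append(Y_PLUS[nxt])   # ya, yu, yo, ye as one letter
            i += 2
        elif ch == "y":
            out.append("й")           # coda y
            i += 1
        else:
            out.append(CONS.get(ch) or VOWEL.get(ch) or ch)
            i += 1
    return "".join(out)


# The inverse table: every Cyrillic letter maps back to one Latin string.
TO_LATIN = {cyr: lat for lat, cyr in {**CONS, **VOWEL}.items()}
TO_LATIN.update({cyr: "y" + lat for lat, cyr in Y_PLUS.items()})
TO_LATIN["й"] = "y"


def to_latin(text: str) -> str:
    return "".join(TO_LATIN.get(ch, ch) for ch in text)


if __name__ == "__main__":
    for word in ["itah", "aynu", "cise", "yahka", "an=nukara"]:
        cyr = to_cyrillic(word)
        # Round trip: Latin -> Cyrillic -> Latin gives the input back.
        print(word, cyr, to_latin(cyr) == word)
```

Because each Cyrillic letter has exactly one Latin reading (е is always ye, э always e), the round trip returns the original form, which is what makes the conversion lossless in both directions.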