Chapter 2 Dialects, sources, and previous research
The West and East coast dialect groups, the major text collections and recordings, and earlier grammatical descriptions.
Sakhalin Ainu is known from a record built up in four waves: the Russian-era documentation of the 1860s–1900s (Dobrotvorsky, Piłsudski), the writings of native authors and Japanese scholars in the Karafuto period (Sentoku, Kindaichi, Chiri), the post-war work with evacuee speakers in Hokkaidō (Hattori, Murasaki, Tamura), and the philological and descriptive research of recent decades. The waves do not sample the language evenly: the nineteenth- and early twentieth-century sources come mostly from the east coast, the post-war recordings almost entirely from the west coast. This chapter first sets out the dialect landscape and then surveys the sources and the research built on them.
2.1 Dialect division
The classical picture of Ainu dialect relations rests on the 200-item lexical survey carried out from 1955 by Hattori Shirō 服部四郎 and Chiri Mashiho 知里真志保 among evacuee speakers in Hokkaidō, which covered six Sakhalin localities: Ociho 落帆, Tarantomari 多蘭泊, Maoka 真岡, Shiraura 白浦, Rayciska 来知志, and Nairo 内路 (Sakaguchi 2024: 8), (Ono 2020: 231–232). The cluster analysis of Asai (1974: 100) grouped these six against all Hokkaidō dialects and North Kuril as one of three top-level branches, and divided them internally into a northern group (Nairo, in the Tarayka region), a central group (Ociho, Maoka, Shiraura, Rayciska), and a southern group (Tarantomari). Ono (2020) reanalysed Asai’s cognacy matrix with modern clustering methods and found the deepest division in Ainu to run between the Sakhalin dialects and everything else; the coherence of Sakhalin as a unit is robust under either analysis.
Within Sakhalin, the documentation sorts itself by coast. West-coast varieties are known from post-war speakers: Rayciska and Esutoru 恵須取 (Fujiyama Haru 藤山ハル), Odasu 小田洲 (Asai Take 浅井タケ), Usoro 鵜城 (the speaker recorded by Wada Bunjirō 和田文治郎), Maoka and Tarantomari (speakers recorded by Murasaki and by the 1950s surveys) (Tangiku 2022), (Dal Corso 2024: 57–58). East-coast varieties are known chiefly from the Russian era and the first Karafuto years: Ay, Tunayci 富内, Hunup, and Tarayka through Piłsudski’s texts, Otasan 小田寒 through Sentoku Tarōji’s letters, and Tunayci and Ociho through Kindaichi’s 1907 phonetic survey (Dal Corso 2024: 58), (Dal Corso 2025a), (Sakaguchi 2024: 8).
Tarayka and neighbouring Nairo in the north-east stand apart from the rest of Sakhalin. They retain the syllable-final stops p t k that other Sakhalin dialects neutralise to h (Tarayka tek ‘hand’, sik ‘eye’ against west-coast teh, sis), and they turn word-initial r into a stop in a number of words (Nairo tup ‘ice’ where Esutoru has ruh) (Tangiku 2022); Vovin (1993) treats this Nairo t- as continuing a distinct Proto-Ainu initial, his *d-. In Asai’s matrix Nairo shares only 60–74 of 110 test items with the other Sakhalin localities, which themselves share 84–93 with one another (Asai 1974: 92), (Ono 2020: 234). The divergence matches the ethnographic position of the Tarayka people, who lived in the contact zone with the Uilta and Nivkh and are singled out as culturally distinct in the nineteenth-century Japanese records (Shiraishi & Tangiku 2022).
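The shared-item counts are easier to compare as proportions of the 110 test items. The following minimal sketch only converts the figures quoted above into percentages; the dictionary labels and the helper name `as_percent` are choices made for this illustration and are not taken from Asai (1974) or Ono (2020).

```python
# Shared-item counts quoted in the text (Asai 1974: 92; Ono 2020: 234).
TEST_ITEMS = 110

# Lowest and highest number of shared items for each comparison.
# The labels are wording chosen for this sketch, not terms from the sources.
shared_ranges = {
    "Nairo vs. other Sakhalin localities": (60, 74),
    "other Sakhalin localities among themselves": (84, 93),
}


def as_percent(count: int, total: int = TEST_ITEMS) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Convert each (low, high) count range into a (low, high) percentage range.
percent_ranges = {
    label: (as_percent(low), as_percent(high))
    for label, (low, high) in shared_ranges.items()
}

for label, (low, high) in percent_ranges.items():
    print(f"{label}: {low}-{high}% of {TEST_ITEMS} items shared")
```

On these figures Nairo shares roughly 55 to 67 per cent of the test items with the other Sakhalin localities, against roughly 76 to 85 per cent among those localities themselves, so the two ranges do not overlap.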
2.1.1 What differs between the coasts
The two coastal groups differ at every level of structure; here we point to the differences taken up in later chapters. In phonology, beside the coda facts just mentioned (Chapter 4), the Piłsudski corpus shows open-mid values of e and o for the east coast where the west-coast recordings show close-mid vowels, frequent intervocalic voicing of plosives that is marginal in the west-coast materials, and no trace of the glottal stop heard in west-coast speech (Dal Corso 2024: 59–61). In the lexicon there are coast-bound pairs such as west-coast wampakka ‘mittens’ (a Uilta loan) against east-coast matumere (Shiraishi & Tangiku 2022), and finer-grained isoglosses within each coast, for example rayki ‘kill’ in Shiraura and Esutoru against rayke in Maoka, Tarantomari, and Ociho (Tangiku 2022).
The sharpest morphological difference found so far concerns person marking (Chapter 13). The east-coast dialects of Piłsudski’s corpus have a dedicated exclusive first-person plural set (ci- for transitive subjects, -as for intransitive subjects, and an object prefix in- found nowhere else in Ainu), while the west-coast materials lack the productive set, retaining at most marginal traces (Dal Corso 2024: 66–68):
Kamúi utara inránu.
‘the gods love me’
Piłsudski 1912: 86; Dal Corso 2024: 67; East Sakhalin
in- is the exclusive first-person plural object prefix, here referring to the narrator, as is usual in folklore; the stem is ranup with elision of the final stop.
In the west-coast corpus the same job falls to the indefinite set: speakers refer to themselves with an= and its congeners, as in Asai Take’s
‘tell it, and I will listen’
Murasaki 2001: text 5; West Sakhalin, Odasu (Asai Take)
Verbal morphology also diverges in the functions of the applicative and reflexive prefixes e-, ko-, and yay-, which show valency-preserving uses in the east-coast texts beyond what the west-coast materials attest (Chapter 14) (Dal Corso 2024: 68–76). Since the east-coast record predates the west-coast record by sixty to seventy years, each such difference must be weighed as possibly diachronic rather than purely geographic (Dal Corso 2024: 53–54).
2.2 The documentary record
Dobrotvorsky 1875. The Ainu–Russian dictionary of Mikhail M. Dobrotvorsky Михаил Михайлович Добротворский (1836–1874), compiled during his five years as a military physician on Sakhalin (1867–1872) and published posthumously in Kazan, is the earliest substantial record of the language (Dobrotvorsky 1875), (Sakaguchi 2021: 43). It numbers 10,930 entries, of which 5,733 — among them about 100 personal names and 511 place names — come from his own fieldwork, the rest from earlier glossaries; his own material is recognisable by its accent marks (Sakaguchi 2021: 43). An appendix contains two connected texts: a speech addressed by Ciwokanke to the priest Simeon at Kusunay, and a reconciliation dialogue. The named consultants of the example sentences — Sambaku, Kasituru, Sukenno — lived on the central west coast, and the language of the examples, with first-person singular cokay, differs from the north-west-coast speech recorded after the war (Sakaguchi 2021: 43–44). The transcription is good for its date but runs together i/e and o/u and does not mark vowel length; the texts were re-edited, romanised, and translated by Sakaguchi (2021), whose edition this grammar follows when citing them.
Piłsudski 1912. Bronisław Piłsudski (1866–1918), exiled to Sakhalin in 1887, worked among the east-coast Ainu in 1896 and again from 1902 to 1905, latterly for the Russian Academy of Sciences (Dal Corso 2024: 56–57). His Materials for the Study of the Ainu Language and Folklore publishes 27 ucaskuma ‘tradition, historical tale’ out of a collection he himself put at 350 texts (Piłsudski 1912: XXI). The texts were dictated by named speakers of four east-coast villages (Tarayka, Hunup, Ay, and Tunayci), each introduced in the notes to his or her first text: Sisratoka, aged 28, of Tarayka; Ipoxni, aged 32, of Hunup; and several tellers at Tunayci, among them the 88-year-old Yasinoske (Piłsudski 1912), (Dal Corso 2024: 58). Most were monolingual first-language speakers, so the corpus shows the language with almost no Japanese or Russian admixture, a state never captured again (Dal Corso 2024: 54, 58). Each text carries an English translation and copious grammatical, lexical, and ethnographic notes, and the preface includes Piłsudski’s description of his transcription, a close phonetic one which he was able to check with the phonetician Rousselot in London in 1910 (Piłsudski 1912: XV), (Dal Corso 2024: 59). Its limitations are those of its method: dictation is slow, interrupts narration, and invites unnatural repetition, as Piłsudski himself observed (Piłsudski 1912: IX), (Dal Corso 2024: 59). He also recorded part of his collection on phonograph wax cylinders, many now degraded beyond playback, though some audio has been recovered by laser reading (Dal Corso 2024: 59). Counting later publications, some forty of his Sakhalin texts are now in print with the Ainu original (Sakaguchi 2024: 8).
Sentoku Tarōji. The one substantial body of Sakhalin Ainu written by a native speaker comes from Sentoku Tarōji 千徳太郎治 (Ainu name Taronci), a Tsuishikari Ainu who returned to Otasan on the east coast and worked with Piłsudski (Sakaguchi 2024: 8). His letters to Piłsudski, written in 1906 in Cyrillic-script Ainu, were deciphered and published with transliteration and Japanese translation by Tangiku & Ogihara (2001); they are first-hand epistolary prose, a register otherwise absent from the corpus (Chapter 3 discusses their spelling). His Japanese-language book 樺太アイヌ叢話 Karafuto ainu sōwa (Sentoku 1929) contains two further oral-literature texts in Ainu (Sakaguchi 2024: 8).
The Karafuto period. Kindaichi Kyōsuke 金田一京助 visited the new territory in 1907 and published detailed reports on the phonetics of Tunayci and Ociho, followed by the autobiography of the Tsuishikari-raised speaker Yamabe Yasunosuke 山辺安之助, あいぬ物語 Ainu monogatari (1913), with a grammatical outline as its appendix — the first systematic sketch of the language (Sakaguchi 2024: 8). The central work of the period is Chiri (1942), on which see §2.3. Texts taken down by the physician Wada Bunjirō from a speaker of the Usoro dialect have been published in instalments by Kitahara (2013–2019) (Dal Corso 2024: 57–58), and folktales by Numahata Ume 沼端ウメ of Usoro have likewise appeared from manuscript (Sakaguchi 2024: 9).
Post-war fieldwork. After the evacuation, all fieldwork took place in Hokkaidō. The Hattori–Chiri survey of 1955 (published 1960) supplied the comparative lexicon analysed by Asai (1974) and led to Hattori’s dialect dictionary (Hattori 1964), whose Sakhalin columns rest largely on Rayciska forms from Fujiyama Haru; Hattori also published studies of her dialect’s age-graded vocabulary (1957), accent (1961), and vowel length (1967) (Sakaguchi 2024: 8–9), (Tangiku 2022). Murasaki Kyōko 村崎恭子 recorded Fujiyama Haru extensively: her text collection (Murasaki 1976) presents folktales and conversations of Fujiyama Haru and of Ōta Yuk, a speaker of the Maoka dialect, and her grammar (Murasaki 1979) is the only full descriptive grammar of any Sakhalin dialect; the 1976 corpus was re-edited with glosses and an English translation by Dal Corso (2021a), on which see also Dal Corso (2024: 57–58). Murasaki’s earliest analysis of Rayciska syntax appeared under her maiden name as Furukawa (1967).
The last speakers. Asai Take of Odasu was recorded by Murasaki from the mid-1980s into the early 1990s; the corpus files of the present project carry session dates of 1984–1991. A first volume of the folktales was issued privately in 1996 (Murasaki 1996); the complete set then appeared with audio as 改訂浅井タケ昔話全集 in two volumes in the ELPR series (Murasaki 2001), (Tangiku 2022). A further ELPR volume edited by Tamura Suzuko 田村すず子 presents the oral literature of Yamada Hayo 山田ハヨ, recorded in 1958, together with other Sakhalin sound materials (Kitahara et al. 2003). Murasaki has since published shorter collections of Fujiyama Haru material (2013, 2016) (Tangiku 2022) and a conversation course based on her field data (Murasaki 2025).
2.2.1 Generational variation
The Rayciska record is deep enough to show variation in apparent time as well as in space. Hattori’s dictionary sets certain forms apart as onne itah ‘elders’ speech’, distinct from what his consultants themselves said: the score numeral hohne ‘twenty’ carries this label, beside the everyday decimal tu kunkutu (Hattori 1964: 262), (Murasaki 2009: 74), and the same consultants knew the exclusive pronoun cioka only as the speech of elders or of the southern west coast (Hattori 1961: 5), (Sakaguchi 2024: 27). Murasaki draws the same line in her course materials, listing elders’ kaa and enkaa beside everyday kaske and enkaske among the relational nouns (Murasaki 2025: 17–18). These are not mere memories of a lost stage: the doublets coexist in running speech — Asai Take’s fifth folktale has both kasketa and elders’ kaata ‘on top of’ within a single passage (Murasaki 2001: text 5) — though the elders’ member could be moribund, as when Naert’s Rayciska informant produced the score construction wanpe ikasma sine hoh ‘thirty’ only as an archaism (Sakaguchi 2022: 71). Hattori’s 1957 article on the age-graded vocabulary of Fujiyama Haru’s dialect, noted above, was the first study to take the phenomenon as its object (Sakaguchi 2024: 8–9).
| gloss | everyday | elders’ (onne itah) | sources | discussion |
|---|---|---|---|---|
| ‘twenty’ | tu kunkutu | hohne | (Hattori 1964: 262); (Murasaki 2009: 74); (Murasaki 2025: 25) | Chapter 10 |
| ‘on top of (in contact)’ | kaske | kaa | (Murasaki 2025: 17–18); (Murasaki 2001: text 5) | Chapter 11 |
| ‘above (detached)’ | enkaske | enkaa | (Murasaki 2025: 17–18) | Chapter 11 |
| ‘we (exclusive)’ | indefinite an=, anoka | cioka (or southern WS) | (Hattori 1961: 5); (Sakaguchi 2024: 27) | Chapter 13 |
This generational layering must be kept apart from the register and genre variation that runs through the same corpus. The alternation between the KU-series and the indefinite AN-series in first-person reference tracks text type, not speaker age: the Sentoku letters use an= twenty-three times and ku= never, Yamabe’s autobiography is narrated almost wholly in the AN-series, while Asai Take’s tuytah ‘folktales’ use ku= freely in dialogue (Sakaguchi 2024: 46) (Chapter 13). Both kinds of variation bear on the warning given in §2.1.1: with an east-coast record two generations older than the west-coast one, differences between the coasts may be differences between periods, and the elders’ doublets show such change caught in progress within a single community.
2.3 Previous research
The first extensive grammar centred on Sakhalin Ainu is アイヌ語法研究 Ainu gohō kenkyū by Chiri Mashiho (Chiri 1942), published in the reports of the Karafuto Government Museum in Toyohara. Chiri planned three parts — phonology, morphology, and phraseology — but completed only the phonology and part of the morphology, never reaching syntax, as his preface records (Chiri 1942: まへがき). The work treats Ainu as a whole with Sakhalin at the centre, systematically marking Hokkaidō and Karafuto forms, and its phonology establishes points still cited today: the five-vowel system, the Sakhalin-only coda x (our h), the exclusion of stops and r from Sakhalin codas, and the lengthening of accented vowels (Chiri 1942: §§1–6). Its examples are drawn partly from Piłsudski’s texts, partly from Chiri’s own Karafuto fieldwork (Sakaguchi 2024: 8).
Murasaki’s grammar (Murasaki 1979) describes the phonology and morphology of Rayciska from her own recordings and remains the reference description; Piłsudski’s grammatical notes, Kindaichi’s 1913 outline, and Hattori’s articles are its main predecessors (Sakaguchi 2024: 8–9). Specialised studies of the post-war corpus include Tangiku (1998) on west-coast verbs in -rV, Itabashi (2001) on the correspondence between Sakhalin vowel length and Hokkaidō pitch accent, and Murasaki (2009) on numerals.
Recent work falls into three strands. One is typological reassessment: Bugaeva (2016) argues that the Sakhalin possessive-based nominalisations are innovations, and the Handbook of the Ainu Language (Bugaeva 2022) contains chapters directly on Sakhalin: Tangiku (2022) on the differences between Karafuto and Hokkaidō dialects and Shiraishi & Tangiku (2022) on contact among the languages of Sakhalin. The second is the description of grammatical subsystems across the whole record: Dal Corso on evidentiality (Dal Corso 2018), the rise of analytic negation (Dal Corso 2021), (Dal Corso 2025b), and morphological alignment (Dal Corso 2025a); Sakaguchi Ryō 阪口諒 on utara (Sakaguchi 2020a), the nominal plural suffix -ahcin (Sakaguchi 2020b), numerals and calculation (Sakaguchi 2022), and person and number marking across all sources in his dissertation (Sakaguchi 2024). The third is philology: Sakaguchi’s editions of early texts, including the Dobrotvorsky texts (Sakaguchi 2021), the edition of the Sentoku letters by Tangiku Itsuji 丹菊逸治 (Tangiku & Ogihara 2001), and Dal Corso’s re-edition of the Murasaki corpus (Dal Corso 2021a) and analysis of Piłsudski’s corpus (Dal Corso 2024). No descriptive grammar of any east-coast dialect yet exists (Sakaguchi 2024: 9); for the east coast, Piłsudski’s corpus and the Sentoku letters must do the work that recordings do for the west.
2.4 The corpus used in this grammar
The examples in this grammar are drawn from the published record surveyed above: the Asai Take folktales (Murasaki 2001), the Sentoku letters (Tangiku & Ogihara 2001), the Dobrotvorsky texts in Sakaguchi’s edition (Dobrotvorsky 1875), (Sakaguchi 2021), the Yamada Hayo recordings (Kitahara et al. 2003), Piłsudski’s Materials (Piłsudski 1912), the examples in Chiri’s grammar (Chiri 1942), and Murasaki’s course sentences (Murasaki 2025).
Every example cites the original publication, with text, letter, page, or section number, and carries a dialect tag — WS for the west coast, ES for the east coast, SA where no finer localisation is possible — together with the locality and speaker where these are known. All Ainu material is retranscribed into the romanisation defined in Chapter 3; where a source’s own spelling is itself the point, it is reproduced as an additional line, as with the Cyrillic of the Sentoku letters in Chapter 1.
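For readers who handle the examples electronically, the citation convention just described can be sketched as a small record type. This is an illustrative assumption only: the class name `Example`, its field names, and the `attribution` method are hypothetical and do not describe the project's actual corpus files. The one instance shown reuses the east-coast example from §2.1.1.

```python
from dataclasses import dataclass
from typing import Optional

# Dialect tags as defined in the text; the long names follow the
# attribution lines used under the examples in this chapter.
DIALECT_TAGS = {
    "WS": "West Sakhalin",
    "ES": "East Sakhalin",
    "SA": "Sakhalin",  # no finer localisation possible
}


@dataclass(frozen=True)
class Example:
    """One cited example (hypothetical field names, for illustration only)."""

    ainu: str                              # text in the grammar's romanisation
    translation: str
    source: str                            # publication with text/page/section
    dialect: str                           # one of the keys of DIALECT_TAGS
    locality: Optional[str] = None         # given where known
    speaker: Optional[str] = None          # given where known
    source_spelling: Optional[str] = None  # extra line when the spelling is the point

    def __post_init__(self) -> None:
        # Reject tags outside the three defined in the text.
        if self.dialect not in DIALECT_TAGS:
            raise ValueError(f"unknown dialect tag: {self.dialect!r}")

    def attribution(self) -> str:
        """Build the attribution line: source; dialect, locality (speaker)."""
        place = DIALECT_TAGS[self.dialect]
        if self.locality:
            place += f", {self.locality}"
        if self.speaker:
            place += f" ({self.speaker})"
        return f"{self.source}; {place}"


gods = Example(
    ainu="Kamúi utara inránu.",
    translation="the gods love me",
    source="Piłsudski 1912: 86; Dal Corso 2024: 67",
    dialect="ES",
)
print(gods.attribution())
```

Keeping locality and speaker optional mirrors the statement that they are given only where known, and validating the tag on construction keeps the three-way WS/ES/SA distinction from drifting.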