The Beautiful Dissociation of the Japanese Language

Marco Giancotti, April 20, 2024

Cover image:

Drawing 1937,0710,0.290, Katsushika Hokusai, British Museum

When I tell people around the world that I've been living in Japan for over a decade, many look both impressed and mystified at once. The place has a good reputation. Some folks are in awe at the temples and the gardens, others at the nature or the food. The extreme tidiness and civility of the local culture are the target of universal admiration. But many of those same people see the local language as an almost impenetrable barrier, a world of pain that one must go through in order to be allowed to live here. I must be so patient and smart, they think. But I (begrudgingly) have to tell them that it's not entirely true.

The thing is (I tell those people) the language is part of the wonders of the place. It was the biggest charm for me in the first place. It's complex, yes, but it's rich and quirky and different. In particular, a whole realm of consciousness exists in the sphere of Japanese speakers that's perhaps truly unique in the world, more so than the sushi and the nature and decorum. It even allows for new literary techniques that are unimaginable in any other language.

Usually this is the point when I lose my interlocutor. I might as well be speaking Japanese to them.

I've always wanted to explain that realm, to show what a strange and mind-bending world is accessed by learning Japanese. It feels almost impossible. In this rather long and winding post, I'll try anyway. I'll do my best to convey something that's damn near untranslatable. Whether this is a rabbit hole you want to tumble down with me, that's up to you.

The Things I'm Not Talking About

Unless you've studied the language for a good while, you might only be aware of one or more of the following strange-sounding facets of Japanese:

It uses kanji characters for writing (more on this later), and it uses a whole lot of them. Depending on who you ask, there are four, five or more thousand characters in use, and you can't read a newspaper if you don't know at least 2,000 of the more common ones.
There are also two syllabic scripts (syllable-based alphabets) in use, called hiragana and katakana. These two, plus kanji, are used all together, sometimes but not always interchangeably.
There are lots of "untranslatable" words in Japanese, like the salutation お疲れ様 (otsukare-sama), roughly meaning "I appreciate the hard work you've been doing", and もったいない (mottainai) for "it would be a pity not to enjoy that to its fullest value".
It has exotic-sounding grammar features, like the subject-object-verb sentence structure (i.e. "anteater ant eat"), and it lacks important-sounding grammatical elements like articles ("the", "a"), any singular-plural distinction, and most verb tenses familiar to speakers of English and the Romance languages.
It's very vague and context dependent.

These aspects give the language an arcane and difficult-sounding aura, but none of them is truly unique to Japanese. Chinese, for instance, uses way more characters and has an even more bare-bones grammar. All languages have untranslatable terms. And several other cultures routinely employ multiple writing systems. These aren't the things I mean by "a truly unique realm", and they're not the topic of this post.

What's usually not known are the subtle effects of the strange history of the Japanese tongue. These effects are, I believe, absolutely unique to this language, with no parallel anywhere else in the world. On top of that, native Japanese speakers are usually so accustomed to these quirks that they never give them a second thought. So no one—except, I guess, some linguists—ever talks or thinks about these fascinating aspects of Japanese. I write about them here because, well, that's mottainai.

It took me a while to put my finger on it, but now I know what the source of that uniqueness is: it's the unstoppable, wonderful dissociation between what's written and what is spoken in Japanese. To see how that could have happened, we need to take a step back.

Dissociated from Birth: a History

The Japanese language existed in purely oral form for centuries, until mainland scholars brought Chinese characters to the Japanese islands. That happened in the 5th century C.E. These characters are, unlike the western alphabets, "logograms", that is, each character is associated with a specific meaning.

A three-stroke character corresponds to the meaning of 'mountain', a different character with six strokes corresponds to 'blood'.

Chinese and Japanese are enormously different spoken languages. Except for a large number of words imported directly into Japanese (but evolved to sound quite unlike the originals), the two languages have essentially nothing in common. The pronunciation, the grammar, everything is 180° different. A consequence of this is that those Chinese characters, evolved over millennia to fit the Chinese language like a glove, were a bad match for the way the islanders spoke.

Imagine those poor scholars of the Yamato court in Western Japan in the 7th century. They must have been intrigued by this revolutionary technology called "writing", where you could freeze your words onto a stone or the blade of a sword so that others might understand them later. Why leave it to the Chinese immigrants? Why not master it for their own native language?

Except it must have been excruciatingly difficult. The characters were meant to be used as modular building blocks—a kind of modularity that Japanese just didn't have.

Not many depictions of scholars from the Kofun period survive (they must have been too busy wrangling the kanji), but warrior statues abound. So here is a warrior.

While Chinese uses fixed "plug-and-play" markers to indicate tenses and grammatical functions, on this front Japanese is more similar to English, because it modifies the very shape of words for those things. So in Chinese you say "chī" for "eat" and "chīle" for "ate". The "le" part indicates an action that has completed, and it can be strapped onto any verb to turn it into a past tense. So those two spoken words can be neatly segregated into written logograms: 吃 ("chī") for the present tense, and 吃了 ("chī" + "le") for the past.

Compare that to English where we transform "eat" into "ate", and the equivalent Japanese modification of "taberu" into "tabeta". In both cases we're not adding or removing blocks but changing part of the word to convey the difference. In techie terms, the Chinese script doesn't support the structure of languages like English and Japanese. It doesn't have what it takes.
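For readers who think in code, the contrast above can be put in a rough Python sketch. The function names are made up for illustration, and the Japanese rule shown only covers verbs that behave like taberu (drop the final ru, add ta); it is not a general conjugator.

```python
def chinese_completed(verb: str) -> str:
    """Append the completion marker 了 (le); the verb itself is untouched."""
    return verb + "了"


def japanese_past(verb_romaji: str) -> str:
    """Turn a taberu-type verb into its past form by rewriting its ending.

    Illustrative only: other Japanese verb classes change in other ways.
    """
    if not verb_romaji.endswith("ru"):
        raise ValueError("this sketch only handles verbs ending in -ru")
    return verb_romaji[:-2] + "ta"


print(chinese_completed("吃"))   # 吃了
print(japanese_past("taberu"))   # tabeta
```

In the Chinese case the original block survives intact and a second block is bolted on, which is exactly what a one-character-per-block script is good at. In the Japanese case the word itself is rewritten, so the script needs some way to represent the changing tail.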

The only solution for those early Japanese scribes, then, was to do a lot of shoehorning. And boy, did they shoehorn.

The Japanese scholar-aristocrats began repurposing the Chinese characters, which they called kanji (for, well, "Chinese characters"). Sometimes, instead of using them for their meaning, they used them for (gasp!) their pronunciation. By ignoring the original content of a kanji, they could string them together to form almost any sound.

To a Chinese reader, such words would have looked utterly random, devoid of any coherence or structure. But to a trained Japanese reader, they translated into familiar words.

Over the centuries those "sound-only" kanji, called man'yougana, evolved into something else entirely. They became simpler, more streamlined, and more standardized. Where the symbols were originally composed of many short strokes, they gradually lost detail and complexity. Where the scribes could choose between a slew of different kanji for any given sound (for instance, the sound pa could be represented by any of 20 characters), later the number of options dwindled and eventually settled on two.

That's how the two syllable-based alphabets in use today, hiragana and katakana, came about (collectively kana). For example, this is how the sound for "i" (pronounced "ee") evolved from two separate kanji into respective kana pronounced exactly the same.

Two Chinese characters that were pronounced 'i' were eventually simplified into phonetic characters for hiragana and katakana.

With this new sound-based tool invented by the islanders, finally the Japanese language had a suitably flexible way to write anything one could pronounce. Today full-blown kanji are used for their meaning, while hiragana and katakana are used for sound-based writing and grammar stuff.

(By the way, the application of Chinese characters to other languages happened in several other places, like Korea and Vietnam, but only Japanese still retains the writing system today, making it a unique case among contemporary languages.)

And so, thanks to this grafting of one language's way of writing onto a vastly different tongue, a dissociation was born. Japanese is a language where the spoken and the written co-evolved in directions never seen elsewhere. The differences in grammatical structure are only part of the story. Let's start with the basics.

Anomaly 1: One Way to Write It, Many Ways to Read It

As with most language pairs, there was rarely a one-to-one correspondence between Chinese and Japanese words. Often a single Chinese word or character could merely approximate the meaning of several spoken Japanese words. Each of those local words might have been related to the others, but it carried a different nuance. Even so, for lack of a better solution, often the same kanji was used for all the various meanings.

This had two major effects. First, while in any Chinese language each character is typically associated with a single way to speak it, in Japanese most kanji can be pronounced in multiple, very different ways. This part is perhaps the biggest bane of Japanese students.

Extreme cases have 15+ different pronunciations (readings) for a single kanji! For example, the kanji for "life":

The same character, meaning generically 'life', is pronounced 'shēng' in Mandarin, while in Japanese it can be pronounced as 'sei', 'shou', 'ikasu', 'ikiru', 'ikeru', 'u', 'umare', 'umareru', 'umu', 'ou', 'ki', 'nasu', 'naru', 'nama', 'haeru', 'hayasu', 'musu', ...

Some of these pronunciations are common, others quite rare, but you need to know and be able to discriminate most of them in order to correctly read and write modern Japanese.
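In data-structure terms, this is a one-to-many mapping from character to sound. Here is a minimal Python sketch using only the readings listed above for the "life" kanji; the names `READINGS` and `possible_readings` are invented for the example, and the list is as incomplete as the one in the text.

```python
# Readings of 生 as listed above (the text's own list is not exhaustive).
READINGS = {
    "生": [
        "sei", "shou", "ikasu", "ikiru", "ikeru", "u", "umare", "umareru",
        "umu", "ou", "ki", "nasu", "naru", "nama", "haeru", "hayasu", "musu",
    ],
}


def possible_readings(kanji: str) -> list[str]:
    """Return every reading this tiny table knows for a kanji."""
    return READINGS.get(kanji, [])


candidates = possible_readings("生")
print(len(candidates))  # 17
```

The lookup alone never tells you which of the candidates is right. That choice depends on the surrounding word and context, which is what the reader has to supply.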

The second effect of the imperfect match between the written word and the meanings it is associated with is a kind of "chronic looseness" in the conversion of language to and from writing. A Japanese reader isn't expected to correctly pronounce everything. New, unfamiliar words will be opaque to them. In a sense, this is similar to English, only much worse.

In English, the question is usually about the right way to pronounce a vowel or two. As is clear from the example just above, in Japanese, sometimes you don't know if the kanji for life has to be read as "nasu" or "shou". Add to that the huge number of kanji in circulation, and in many cases you have absolutely nothing to work with. If you haven't seen the kanji before, you have zero hints about the right sounds to make.

Japanese has a trick up her sleeve to solve this problem, called furigana. These are tiny kana characters showing you how to pronounce a difficult kanji.

A sign in a train station. It's often impossible to guess the pronunciation of place names, hence the furigana above the official kanji name. Source: Wikipedia.

Kids keep on learning new fundamental kanji until the end of high school, so they wouldn't be able to read without extensive application of furigana. Adults are able to survive with less furigana, but you'll still find them on the rarer words that people might not have encountered before (or for which they might have forgotten the reading). For example, this is a page I opened at random from a Haruki Murakami non-fiction book.

A page in vertical Japanese script, from a Murakami book. Two tiny hiragana characters are highlighted between the lines of normal-sized text.

In the whole page, only one kanji word (a somewhat rare way to say "mock") has the pronunciation spelled out with furigana (those two tiny squiggles between the lines).

Remember this part about furigana, because they'll come up again in the following sections.

Now for a practical consequence: people need to get creative just to teach others how to write their names.

Anomaly 2: Spelling Bee, but Creative

There is one problem that arises because of how kanji work: how do you explain which character you're talking about without writing it down?

This happens all the time with people's names. The Japanese like to choose nice and distinctive kanji for their names, even when using common name pronunciations (another instance of dissociation: common name readings on unheard-of kanji choices are all the rage this century). This means that, just by hearing what someone is called, you're usually unable to write it down.

So people have to explain the kanji to you, and they do it by telling you which other well-known words each kanji appears in, or how it is built from simpler components.

Perhaps the most famous scene showing the hoops you have to jump through just to explain your name's spelling. It's from the manga "Death Note". Here the woman is giving a fake name, but soon afterwards Raito finds out her real name and uses it to kill her real quick.

Anomaly 3: Breaking Out of the Box

Reading is made more difficult by the frequent use of jukujikun, words where the pronunciation is not the combination of the normal readings of the individual kanji they contain. In these cases there isn't any correspondence between parts of the spoken word and the kanji that represent it.

Japanese has a lot of compound words of Chinese origin, where two or more kanji appear as a set. These compounds are usually very straightforward sequences of kanji readings. So 美術 (bijutsu), meaning art, is the combination of 美 (bi, beauty) and 術 (jutsu, skill). If you're confident about the individual sounds, you just have to say them one after the other. Finally something simple!

The word for art, 'bijutsu', is the simple composition of 'bi' for beauty and 'jutsu' for skill. Similarly, the word for science, which is pronounced 'kagaku', is the sum of 'ka' for subject and 'gaku' for study.

But no, you can't let your guard down. The shoehorning work of the ancients has left deformations that survive in the modern language.

Take, for example, the word for "adult". It's pronounced otona and written 大人, the kanji for "big" and "person" respectively. Following the usual method, you might wonder, is oto a pronunciation of the first kanji, and na of the second? Or is it o and tona?

Well, neither. There is no way to split them. The two-kanji word exists as a single block, and looking up each kanji separately won't yield even a bit of this word's true reading. There are many examples like that.

Unlike most kanji compounds in Japanese and all words in Chinese, jukujikun like the words for today ('kyou') and tomorrow ('ashita') cannot be seen as the composition of sub-parts corresponding to the kanji.

A bull-headed shrike. Source: Alpsdake, Wikimedia Commons (CC BY-SA 4.0).

In some cases, the number of kanji is greater than the number of syllables in the word! Try splitting that up.

The name for the bird called 'bull-headed shrike' is pronounced mozu, with two syllables, but it's written with three kanji.

Anomaly 4: Two Words in a Trench Coat

But, despite all of its difficulties, this dissociation of writing and speaking opens up some interesting opportunities.

For example, there is a category of verbs that I find fascinating. As far as I know, it has no official academic name, and I've never heard anyone else even mention this. I would describe them as "two words in a trench coat". In each case, you can see how two simpler verbs were chained and wrapped with a single kanji.

For example, the verb 司る (tsukasadoru), meaning "to be in charge", seems pretty innocent at first. It's one kanji plus its hiragana ending indicating the tense, like all other verbs. But look up the etymology, and you'll find that it used to be two words: 官 (tsukasa) for "position of authority" and 取る (toru) for "take". Each of these two words has its own kanji and independent meaning. But over the years, their combined form became so routine that someone decided to give it its own different kanji, probably for no other reason than convenience. Here are a couple more cases:

The verb 'kokorozasu', which means to aspire, is written with a single kanji but can be seen as the fusion of the separate words 'kokoro', for heart, and 'sasu', for to point, each with its own kanji. The same goes for the verb 'mitomeru' (to confirm), a combination of the verbs 'mite' (see) and 'tomeru' (retain).

These fusions are like language fossils. They show that kanji are only an after-the-fact addition to a pre-existing vocabulary. But at a practical level, they're a way to exploit the looseness of Japanese writing to make it more convenient (notice how short the final written words are compared to the originals).

Anomaly 5: Make the kanji Work for You

In Anomaly 1 I said that one kanji can be ambiguous in terms of its pronunciation. But the flip side of the same phenomenon is that kanji can help reduce ambiguity in meaning, increasing the precision of the written word.

Spoken Japanese is actually rather poor in vocabulary. A lot of its verbs are reused in very different contexts with different meanings that are only related in a very abstract way. Thanks to the unique slap-it-on-and-you're-ready-to-go mindset of Japanese writing, however, the vagueness can be pared down a lot.

There is a surprising number of verbs that have exactly the same pronunciation, but are written with different kanji in different contexts. For example:

The verb pronounced 'yomu' can be written with two different kanji depending on whether it means to read or to compose a poem. The verb pronounced 'okuru' has different kanji for the meaning of sending and gifting. And the verb 'toru' has at least 7 different kanji forms, for the following meanings: to take, to take a picture, to harvest, to capture, to ingest a substance, to steal, and to record.

The cool thing here is that these are all different meanings, but if you squint you can see how they must have originated from the same primordial word. They started with a generic, blunt verb (e.g. toru, to take), and later applied different kanji to it to distinguish its nuances. Handy!
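If you like to see this kind of thing as a lookup table, here is a small Python sketch of the toru case. The seven meanings come from the list above; the kanji spellings are the standard dictionary ones, filled in for this example, and the names `TORU` and `write_toru` are invented.

```python
# One spoken verb, several written forms.
# Keys: the meanings listed in the text. Values: standard dictionary spellings.
TORU = {
    "to take": "取る",
    "to take a picture": "撮る",
    "to harvest": "採る",
    "to capture": "捕る",
    "to ingest a substance": "摂る",
    "to steal": "盗る",
    "to record": "録る",
}


def write_toru(meaning: str) -> str:
    """Pick the written form of 'toru' for a given meaning."""
    return TORU[meaning]


print(write_toru("to take a picture"))  # 撮る
print(len(set(TORU.values())))          # 7
```

Going from meaning to writing is a clean lookup. Going from sound to writing is not: all seven values are read aloud as the same toru, so the spoken word alone can't tell you which one to pick.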

There are also non-verb examples of this trick. My favorites are all the versions of the word "cousins" meant in a reciprocal sense, as in "she and I are cousins (of each other)". In spoken language, you just say itoko in all cases, and that's it. In written language, you use the appropriate combinations of the kanji 兄 (ani, older brother), 弟 (otouto, younger brother), 姉 (ane, older sister), and 妹 (imouto, younger sister), preceded by the kanji 従 (shitagau, accompany), to specify the exact genders in the relationship. (These are also jukujikun, come to think of it.)

The same noun 'itoko', meaning cousins, is written in four different combinations of kanji: one for cousins who are both male, one for cousins who are both female, one where the older cousin is male and the younger is female, and one that's the other way around.

You can use kanji, then, to add a layer of meaning that doesn't exist in the spoken language. Which brings us to the last and, in my opinion, most interesting point.

Anomaly 6: Dissociation as Canvas

Finally we come to gikun, the most exquisite (ab)use of the rift between written and spoken Japanese. It's based on the clever use of furigana, the little pronunciation marks explained above. Ninety-nine percent of the time, people use furigana as you would expect—plainly indicating the correct dictionary reading of each word. But once you have a tool, who can resist playing with it?

Gikun is the replacement of a kanji's or word's normal pronunciation with something else through furigana. Novelists and manga-ka use it to inject an almost subliminal layer of meaning beyond what is afforded by the words and kanji. The effect is similar to a textual voice-over that runs alongside the actual text you're reading.

You see it a lot in manga: the actual kanji say something, but the furigana, instead of giving you the true pronunciation of the word, give you something else entirely. Sometimes it's a synonym of the word with a more pungent nuance. For example:

Source: Boku no Hero Academia, cited by japanesewithanime.com (CC BY-SA 4.0)

The author of the excellent gikun explanation on japanesewithanime.com clarifies the context:

Todoroki Shouto 轟焦凍 has both cold and heat abilities, which come from the sides of his body: from the right comes cold, from the left comes heat.

Here Shouto, one of the main characters, is telling the flame-y man "during combat, I won't use my heat power for any reason at all." Being a kids' comic, all kanji have furigana. But the kanji for "heat" (highlighted in red) comes with an unexpected reading. Instead of the official netsu, the furigana reads hidari, which means "left".

So the reader gets two messages at the same time: the character says "I won't use the left", but the text is saying "I won't use heat".
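One way to picture the mechanism is as a pair of fields: the kanji on the main line and the small reading beside it. The Python sketch below is only an illustration under that assumption. The class name `Annotated` and the one-entry table `STANDARD_READINGS` are invented, and the romanized readings stand in for the kana you'd see on the page. The `<ruby>` and `<rt>` tags are HTML's standard markup for this kind of annotation.

```python
from dataclasses import dataclass

# Hypothetical mini-dictionary of standard readings, just for this example.
STANDARD_READINGS = {"熱": {"netsu"}}


@dataclass(frozen=True)
class Annotated:
    base: str      # the kanji printed in the main line of text
    furigana: str  # the small reading printed beside it

    def is_gikun(self) -> bool:
        """True when the furigana is not a standard reading of the base.

        Raises KeyError for kanji missing from the toy dictionary.
        """
        return self.furigana not in STANDARD_READINGS[self.base]

    def to_html(self) -> str:
        """Render the pair with HTML ruby markup."""
        return f"<ruby>{self.base}<rt>{self.furigana}</rt></ruby>"


plain = Annotated("熱", "netsu")   # ordinary furigana
gikun = Annotated("熱", "hidari")  # the reading from the manga panel

print(plain.is_gikun(), gikun.is_gikun())  # False True
print(gikun.to_html())                     # <ruby>熱<rt>hidari</rt></ruby>
```

The point of the sketch is that both fields reach the reader at once. Ordinary furigana keep them redundant, while gikun deliberately lets them drift apart.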

Sometimes, authors use gikun for the luxury of introducing cool foreign-sounding words while simultaneously providing their meaning through kanji.

Source: Fullmetal Alchemist, cited by japanesewithanime.com (CC BY-SA 4.0)

And the context from japanesewithanime.com:

Good guy with mechanical arm fights bad guy with mechanical arm.

The blond guy says, "Oh, an automail colleague?" to say that they both have mechanical arms. But "automail" isn't a real word, and the Japanese reader may not guess the etymology of "automatic" + "mail (armor)". So the meaning is provided by the kanji (literally "mechanical armor"), and the word "automail" comes as furigana above that. Again, you get two things in one swoop: a neologism and its meaning.

Other times it's simply a clarification of the word in that specific context.

Source: Noragami, cited by japanesewithanime.com (CC BY-SA 4.0)

The context:

Yato 夜ト, who is a God fighting spiritual beings related to human's negative feelings, goes to the hospital make [sic] a visit to someone...

The off-screen speaker is saying, "Tonight will be rough too. This place is their nest. Now that the regalia aren't with us, we shouldn't stay long." (I assume the "regalia" are some kind of warrior.) In this case the words "this place" appear as furigana for the word "hospital". The kanji say "hospital", the reading says "here/this place". Through gikun, the author is avoiding confusion on the current location of the characters.

Some novelists use this dissociation for artistic effect, too. Horror-mystery writer Natsuhiko Kyogoku, who loves to create the Japanese version of a Gothic atmosphere, constantly uses archaic, long-forgotten kanji in his brick-sized novels. You, as the average Japanese reader, probably have never seen most of those kanji before, but you're able to follow without problems thanks to his gikun. The furigana supply modern readings that you can recognize, even though they are not the real readings of those obsolete kanji.

Kyogoku's gikun shenanigans are a bit too much at times. Someone who uses them more sparingly and subtly is Haruki Murakami.

Another page in vertical Japanese script, from a Murakami book. Again, tiny furigana characters between the lines are highlighted.

This is a page from his novel Norwegian Wood. Jay Rubin, the novel's English translator, worded the highlighted sentence like this:

What if somewhere inside me there is a dark limbo where all the truly important memories are heaped and slowly turning into mud?

But the word "limbo" (pointed to by the arrow) in the original is actually a gikun. The Japanese reader sees two things at once: the kanji with the meaning of "a remote region" (normally pronounced hendo) and the katakana pronunciation of the foreign word "limbo" as furigana next to it.

The Japanese language does have more accurate words for "limbo", words with a stronger link to the original Catholic meaning of the word, but Murakami decided not to use them.

This little choice, seen by the reader in a fleeting instant as they devour the pages, and likely not even noticed consciously, is doing a lot of work. It's indulging in the mystical-sounding foreign word "limbo", but it's also clarifying the general meaning of liminal space for the readers unfamiliar with it. It's making the metaphor of an actual hidden place within oneself stronger, while avoiding overly religious undertones. In short, this gikun alters the flavor of the word a tiny bit, just enough to achieve the thematic and stylistic goals of the sentence. Like a pinch of nutmeg in your butter cake.

How is one to translate that?

It's difficult to convey what it feels like to read in this way. For me, back when I started reading in Japanese some 16 years ago, it was a totally new experience, something that I never thought would be possible with text. It's like reading in stereo, where sometimes the same message is conveyed to you in two different formats on separate channels, and sometimes two messages blend together as something new.

Because of the unique dissociation between the written word and the way it is pronounced, Japanese is not only harder to learn, but it's also more malleable and richer in a way that cannot be imitated. It's an extra dimension of language and a happy historical accident.