With Unicode adding more and more useless emoji, and seemingly doing little else, it’s time to ask an important question: what the fuck is the Unicode Consortium supposed to be doing anyway?
It’s time to dust off Howard Oakley’s excellent blog post Why we can’t keep stringing along with Unicode, and think about the Normalization problem for file names and the Glyph Variation problem of CJK font sets. The two fit together surprisingly well; my take is that they must be tackled together to find a solution. Let’s take a look at the essential points that Oakley makes:
Unicode is one of the foundations of digital culture. Without it, the loss of world languages would have accelerated greatly, and humankind would have become the poorer. But if the effect of Unicode is to turn a tower of Babel into a confusion of encodings, it has surely failed to provide a sound encoding system for language.
Neither is normalisation an answer. To perform normalisation sufficient to ensure that users are extremely unlikely to confuse any characters with different codes, a great many string operations would need to go through an even more laborious normalisation process than is performed patchily at present.
Pretending that the problem isn’t significant, or will just quietly go away, is also not an answer, unless you work in a purely English linguistic environment. With increasing use of Unicode around the world, and increasing global use of electronic devices like computers, these problems can only grow in scale…
Having grown the Unicode standard from just over seven thousand characters in twenty-four scripts, in Unicode 1.0.0 of 1991, to more than an eighth of a million characters in 135 scripts now (Unicode 9.0), it is time for the Unicode Consortium to map indistinguishable characters to the same encodings, so that each visually distinguishable character is represented by one, and only one, encoding.
The Normalization Problem and the Glyph Variation Problem
As Oakley explains earlier in the post, the problem for file system naming boils down to the fact that Unicode represents many visually identical characters using different encodings. Older file systems like HFS+ used Normalization to resolve the problem, but it is incomplete and inefficient. Modern file systems like APFS avoid Normalization to improve performance.
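Here’s a minimal Swift sketch of the file name problem, nothing more than Foundation string calls: one Hiragana ga encoded two different ways. The strings compare ‘equal’, but the bytes a file system actually stores do not match unless somebody pays the normalization tax.

```swift
import Foundation

// が as one precomposed code point (NFC form)
let composed = "\u{304C}"
// か + combining voicing mark (NFD form), the way HFS+ stored names
let decomposed = "\u{304B}\u{3099}"

// Swift's String compares by canonical equivalence, so these are "equal"...
print(composed == decomposed)      // true

// ...but the underlying bytes a file system sees are not the same.
print(Array(composed.utf8))        // [227, 129, 140]
print(Array(decomposed.utf8))      // [227, 129, 139, 227, 130, 153]

// Normalizing both to NFC makes the bytes match, at a cost on every lookup.
let nfc = decomposed.precomposedStringWithCanonicalMapping
print(Array(nfc.utf8))             // [227, 129, 140]
```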
Glyph variations are the other side of the coin. Instead of identical-looking characters using different encodings, we have different-looking characters that are variations of the same ‘glyph’. They have the same encoding but have to be distinguished as variation 1, 2, 3, etc. of the parent glyph. Because this is a CJK problem, western software developers traditionally see it as a separate problem for the OpenType partners to solve and not worth considering.
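And here’s the flip side, sketched the same way. An Ideographic Variation Sequence glues a variation selector onto the base code point, so the ‘same’ Kanji becomes two distinct strings that only a font with the right glyph set can tell apart (the rendering noted in the comments assumes a font that supports the sequence):

```swift
import Foundation

// 葛 as a bare code point: the font picks whichever regional form it likes.
let base = "\u{845B}"
// 葛 + variation selector VS17 (U+E0100): explicitly request one glyph variant.
let variant = "\u{845B}\u{E0100}"

// Unlike the normalization case, these are NOT canonically equivalent...
print(base == variant)                // false
// ...even though, without font support, both render identically on screen.
print(base.unicodeScalars.count)      // 1
print(variant.unicodeScalars.count)   // 2
```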
Put another way, there needs to be an unambiguous 1-to-1 mapping and an unambiguous 1-1/1-2/1-3-to-1 mapping. I say the problems are two sides of the same coin and must be solved together. Unicode has done a good job of mapping things, but it is way past time for Unicode to evolve beyond that and tackle bigger things: lose the western-centric problem-solving worldview (i.e. let’s fix western encoding issues first and deal with CJK issues later), and start solving problems from a truly global viewpoint.
The new SF Symbols system font is a fantastic addition to iOS, macOS, etc., and it is an OpenType (née QuickDraw GX TrueType) variable font to boot. But please, Apple, the crusty old Font and Typography palette has to go. Put the poor dead horse in its grave already. A fantastic new variable font from Apple screams for a nice new user-friendly font UI to access those lovely glyph variations. If you don’t, nobody will ever find them. And that’s a shame.
I knew Nat McCully back in the 1990s when he was an engineer at Claris responsible for the development of the highly successful Japanese version of ClarisWorks. I think he also worked on a QuickDraw GX version of ClarisWorks until GX was killed in 1997.
Luckily for us, he moved from Apple to Adobe in 1998 and put his extensive knowledge of advanced Japanese typography and his programming skills to work on a big problem: the inability to reproduce beautiful Japanese layout (kumihan) with the western-created layout software and fonts of that time (QuarkXPress, Illustrator, InDesign 1.0, etc.). McCully explains the background and the two-year development of InDesign J at the 10:25 mark in his presentation, including the challenges of working around the limitations of baseline font metrics while developing good line break algorithms for Japanese layout.
The result was InDesign 1.0J, which shipped in early 2001. InDesign J was the first, and only, major software application developed outside of Japan that followed the Japanese Industrial Standard (JIS) X 4051 typesetting and composition specification (the kumihan “bible”) and traditional Japanese print production methods. I have covered some basics of Japanese layout before, but a review is helpful for first-time readers. I’ll use a mix of my own material and McCully’s presentation to explain.
Japanese Layout Basics: Space
Japanese culture is the only culture I know of where the central core cultural value is subtraction: how much can we take away to bring out the essential beauty of an object. This is embodied in ikebana, in Japanese gardens, and in people: the Kanji shitsuke 躾, inadequately rendered in English as ‘discipline’, is a Kanji that originated in Japan, not China. Western culture and Chinese culture are similar in that the central cultural value is addition: how much can we add to make an object more beautiful.
The central core value of traditional Japanese text composition and layout, called ‘kumihan’, is space. Kumihan is driven by what text will fit in a given space, and by how to balance and minutely control that space. Space is not empty; it is a discrete element of kumihan, equal in standing to the Kanji themselves. Kanji characters are contained in little boxes of space known as virtual bodies; the arrangement is also called the grid system, because the middle of each box is one center point on a grid. Everything is calculated from the center and the space surrounding that center; there is no baseline.
Western-created DTP layout is graphics-driven and calculated from margins and font baselines. PostScript and OpenType fonts, and all the layout engines built on them, evolved around that baseline typography model and its font metrics. Adobe was well acquainted with the shortcomings of its own font technology, and InDesign J got around the problems by adding proprietary Kanji virtual body font metrics and Japanese line break algorithms.
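To make the limitation concrete, here is a rough Core Text sketch, my illustration rather than Adobe’s actual code, and the font name is just an assumed Japanese font on macOS: everything the font API hands back is baseline-relative, and the virtual body has to be reconstructed by the layout engine itself.

```swift
import CoreText
import CoreGraphics

// All the metrics Core Text exposes are relative to the baseline.
let font = CTFontCreateWithName("HiraginoSans-W3" as CFString, 16, nil)
print(CTFontGetAscent(font))    // distance above the baseline
print(CTFontGetDescent(font))   // distance below the baseline
print(CTFontGetLeading(font))   // extra space between lines

// There is no "virtual body" metric anywhere; a kumihan-style engine has
// to approximate the em box itself, e.g. a square of the point size
// centered on each glyph, and derive its layout grid from that.
let pointSize = CTFontGetSize(font)
let virtualBody = CGSize(width: pointSize, height: pointSize)
print(virtualBody)
```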
That is fine for InDesign and print production, but web layout and typography via CSS is an entirely different world. There are three huge obstacles to good vertical Japanese typography on the web:
No font metrics for virtual body/em-box glyph space placement: everything has to be accomplished with baseline metrics
No reliable space control
No reliable line breaks
In the presentation McCully highlights a web post from Vincent De Oliveira that neatly summarizes the basic problems of working with CSS:
Line-height and vertical-align are simple CSS properties. So simple that most of us are convinced to fully understand how they work and how to use them. But it’s not. They really are complex, maybe the hardest ones, as they have a major role in the creation of one of the less-known feature of CSS: inline formatting context…
- inline formatting context is really hard to understand
- vertical-align is not very reliable
- a line-box’s height is computed based on its children’s line-height and vertical-align properties
- we cannot easily get/set font metrics with CSS
This is further complicated by all the devices out there. A Japanese web page that looks good on iOS looks terrible on a Samsung Galaxy, because Samsung uses a different layout engine with a different idea of how to use space.
The end result, as McCully concludes in his presentation, is that quality Japanese vertical layout on web pages is very difficult to achieve. It requires a massive amount of extra work dealing with CSS limitations and optimizing things deep at the OS layout engine level, such as in iOS/macOS Core Text.
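To give a sense of how deep that work goes, here is a hedged Core Text sketch of just the two opt-ins vertical Japanese needs before any real line breaking starts: vertical glyph forms per text run, and right-to-left line progression per frame (the haiku is just sample text):

```swift
import CoreText
import Foundation

let haiku = "古池や蛙飛び込む水の音"

// 1. Per-run: ask the font for its vertical glyph variants ('vert' forms).
let attributed = NSAttributedString(
    string: haiku,
    attributes: [
        NSAttributedString.Key(kCTVerticalFormsAttributeName as String): true
    ]
)

// 2. Per-frame: lines must progress right-to-left across the column.
let framesetter = CTFramesetterCreateWithAttributedString(attributed as CFAttributedString)
let path = CGPath(rect: CGRect(x: 0, y: 0, width: 300, height: 300), transform: nil)
let frameAttributes = [
    kCTFrameProgressionAttributeName as String: CTFrameProgression.rightToLeft.rawValue
] as CFDictionary
let frame = CTFramesetterCreateFrame(
    framesetter, CFRange(location: 0, length: 0), path, frameAttributes)

// Everything else (JIS X 4051 line breaking, hanging punctuation, grid
// alignment) is still up to the application.
print(CTFrameGetLines(frame))
```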
Failure of Open Standards
It’s helpful at this point to remember the key goals of QuickDraw GX:
Treat all writing systems and layout models equally as one single layout package
Make advanced typography a comprehensive high level framework that is standard across the OS and applications, simple to use for developers, and easily available to all users
People only remember the failure of GX technology at a time when Apple was spinning out of control, but the goals were, and remain, visionary and timeless. GX was about breaking advanced typography out of a niche to make advanced typography of all writing systems a widely used standard feature for developers and for users. Unfortunately those goals were forgotten in the rise of web technologies and open standards like CSS and EPUB, which focused on improving web-based text only from a western typography perspective. Vertical text layout almost didn’t make it into the EPUB format until a small but vocal group of Japanese programmers argued for its inclusion. They encountered a lot of resistance along the way, which seems to be the case for any feature outside the immediate needs of the western typography perspective.
What we have now are web technologies and OS text layout engines that offer advanced layout from a limited perspective, for a limited set of web designer-programmers. In other words, niche. We can see the same thing happening with OpenType Variable Fonts. They are mostly for the web. They will remain niche. They will remain western, due to the high development costs of creating OpenType Variable Fonts with huge glyph sets like Japanese.
It’s an unfortunate situation, but without a vision and strong leadership, the smart people in the room always run off in different directions creating an animal farm of different ideas and approaches pulling in different directions. That’s what open is. Very rarely does it coalesce into a tight integrated whole greater than the sum of the parts.
In the eternal words of venerable Japanese font engineer Tomihisa Uchida, “no matter what kind of fancy fonts you have, they look bad with poor typography”. Which brings us to Apple News+ and why it will never see the light of day in Japan: Japanese customers will never pay for a news subscription service that doesn’t deliver good-looking vertical text content. The Apple News Format can’t pull that one off, and Apple is not going to spend the resources to do it right. The iWork vertical text support feature is certainly proof of that.
Today is International Haiku Day and Apple Education is celebrating it in Japan with the new version of iWork Pages that finally supports vertical text layout. In honor of International Haiku Day and vertical text support in Pages, I tried writing a haiku in vertical text using the new version of Pages. This is how it went…
Update: Japanese reactions to the Apple Education ad (top screen shot) now running on Twitter are fun and sarcastic: “Are you serious? Way too late,” “Good thing I didn’t wait and installed egword,” “10 years too late,” “Oh, Pages is finally useable,” etc.
Good localization is never easy, which is to say it’s easy to fuck up, especially when different app pieces come from different companies. I already pointed out that the Yahoo-supplied backend Japanese data took a real nosedive after the Verizon purchase, but there is more.
Japanese stock ticker names in the Stocks app widget are hideous to look at. They shrink into oblivion instead of intelligently truncating a long name to keep it readable. This is a textbook case of how not to do app internationalization. Nobody at Verizon or Apple evidently cares enough about quality to fix it. It’s another nail in the coffin of Apple’s typography legacy.
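The galling part is that the fix is one property, not a research project. A minimal UIKit sketch of the difference (the label setup and the company name are placeholders, not the widget’s actual code):

```swift
import UIKit

let name = "トヨタ自動車株式会社"   // a typical long Japanese company name

// What the widget does: scale the text down until it fits, however tiny.
let shrinking = UILabel()
shrinking.text = name
shrinking.adjustsFontSizeToFitWidth = true
shrinking.minimumScaleFactor = 0.3   // glyphs shrink into oblivion

// What it should do: keep glyphs readable and truncate with an ellipsis.
let truncating = UILabel()
truncating.text = name
truncating.adjustsFontSizeToFitWidth = false
truncating.lineBreakMode = .byTruncatingTail   // "トヨタ自動車…"
```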