Unicode needs a new Mission

With Unicode adding more and more useless emoji, and seemingly doing little else, it’s time to ask an important question: what the fuck is the Unicode Consortium supposed to be doing anyway?

It’s time to dust off Howard Oakley’s excellent blog post Why we can’t keep stringing along with Unicode, and think about the Normalization problem for file names and the Glyph Variation problem of CJK font sets. These problems fit together surprisingly well. My take is the problems must be tackled together as one thing to find a solution. Let’s take a look at the essential points that Oakley makes:

Unicode is one of the foundations of digital culture. Without it, the loss of world languages would have accelerated greatly, and humankind would have become the poorer. But if the effect of Unicode is to turn a tower of Babel into a confusion of encodings, it has surely failed to provide a sound encoding system for language.

Neither is normalisation an answer. To perform normalisation sufficient to ensure that users are extremely unlikely to confuse any characters with different codes, a great many string operations would need to go through an even more laborious normalisation process than is performed patchily at present.

Pretending that the problem isn’t significant, or will just quietly go away, is also not an answer, unless you work in a purely English linguistic environment. With increasing use of Unicode around the world, and increasing global use of electronic devices like computers, these problems can only grow in scale…

Having grown the Unicode standard from just over seven thousand characters in twenty-four scripts, in Unicode 1.0.0 of 1991, to more than an eighth of a million characters in 135 scripts now (Unicode 9.0), it is time for the Unicode Consortium to map indistinguishable characters to the same encodings, so that each visually distinguishable character is represented by one, and only one, encoding.

The Normalization Problem and the Glyph Variation Problem
As Oakley explains earlier in the post: the problem for file system naming boils down to the fact that Unicode represents many visually-identical characters using different encodings. Older file systems like HFS+ used Normalization to resolve the problem, but it is incomplete and inefficient. Modern file systems like APFS avoid Normalization to improve performance.
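To make the file naming problem concrete, here is a minimal Python sketch using the standard `unicodedata` module. The file name `café.txt` and the helper `same_name` are made up for illustration and do not come from Oakley’s post. It shows two names that look identical but compare as different, how normalizing both sides makes them match, and one case where normalization does not help at all.

```python
import unicodedata

# Two spellings of the same visible file name (hypothetical example).
composed = "caf\u00e9.txt"     # é as a single code point, U+00E9
decomposed = "cafe\u0301.txt"  # e followed by combining acute accent, U+0301


def same_name(a: str, b: str, form: str = "NFD") -> bool:
    """Compare two names after normalizing both to the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A raw comparison sees two different strings of different lengths.
print(composed == decomposed)           # False
print(len(composed), len(decomposed))   # 8 9

# Normalizing both sides first makes them compare equal.
print(same_name(composed, decomposed))  # True

# Normalization is incomplete: Latin A and Cyrillic А look alike,
# but no normalization form maps one onto the other.
print(same_name("A", "\u0410", "NFKC"))  # False
```

The last line is the part Oakley is complaining about: every comparison pays for normalization, and look-alike characters from different scripts still slip through.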

Glyph variations are the other side of the coin. Instead of identical-looking characters using different encodings, we have different-looking characters that are variations of the same ‘glyph’. They have the same encoding but they have to be distinguished as variation 1, 2, 3, etc. of the parent glyph. Because this is a CJK problem, Western software developers traditionally see it as a separate problem for the OpenType partners to solve and not worth considering.
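One way this “variation 1, 2, 3” idea shows up in plain text is the variation selector: an extra code point that follows the base character and asks the font for a particular shape. The Python sketch below is only an illustration. The base character 葛 (U+845B) and the two selectors are my own choice of example, the helper names are hypothetical, and whether a given sequence is registered or drawn differently depends on the font and on the Ideographic Variation Database.

```python
BASE = "\u845b"      # 葛, one CJK unified ideograph
VS17 = "\U000E0100"  # variation selector 17
VS18 = "\U000E0101"  # variation selector 18


def is_variation_selector(ch: str) -> bool:
    """True for VS1-VS16 (U+FE00..U+FE0F) and VS17-VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_selectors(text: str) -> str:
    """Drop variation selectors, leaving only the base characters."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


variant_a = BASE + VS17
variant_b = BASE + VS18

# Same parent character, but the strings differ because of the selector.
print(variant_a == variant_b)  # False

# Without the selectors, both collapse to the same encoding.
print(strip_selectors(variant_a) == strip_selectors(variant_b))  # True
```

Set next to the normalization case, the mirror image is plain: there, different encodings have to be treated as one name, while here one encoding has to be kept apart as several shapes.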

Put another way, there needs to be an unambiguous 1-to-1 mapping and an unambiguous 1-1/1-2/1-3-to-1 mapping. I say the problems are two sides of the same coin and must be solved together. Unicode has done a good job of mapping things, but it is way past time for Unicode to evolve beyond that and tackle bigger things: lose the Western-centric problem-solving worldview, and start solving problems from a truly global viewpoint.

macOS High Sierra 10.13.3 still leaks encryption passwords in plain text

Thank goodness that Howard Oakley and friends are staying on top of APFS bugs and security issues:

If you have erased an existing unencrypted APFS volume to change it into an encrypted APFS volume in the last 20 days or so, then you can be certain that the passphrase to that encrypted volume is stored in your unified log, and accessible to anyone who can access your Mac as an admin user (or when an admin user is logged on).

Just like the last security problem, the APFS format itself is not the problem; a Disk Utility bug is. Hopefully Apple will fix this ASAP.

Is APFS fully supported yet?

Nobody covers APFS better than Howard Oakley:

So, as of High Sierra 10.13.3, APFS is the standard file system for SSDs which are only used by High Sierra systems, “can” be used on hard disks which are only used by High Sierra systems, but remains unsupported on Fusion Drives.

There are four major limitations to the use of APFS.

Essential reading.

Using APFS in High Sierra

Howard Oakley takes stock of APFS in High Sierra, both the good and the not so good. If you have the slightest interest in APFS, read his posts. The quick summary is that if your Mac boots from an SSD, you can reap the benefits of APFS Clones and Snapshots, which can be substantial.

If your Mac boots from a Fusion Drive or hard disk, you are in limbo because Apple has not completed APFS Fusion Drive/HD support. Oakley warns of potential “adverse effects of copy-on-write, perhaps the single most important technology behind APFS” on hard disk media and concludes:

you can see why the performance of APFS on rotating disks is far inferior to that of HFS+. That is, though, something of a worst case.

But there is more. APFS brings yet more changes to basic Finder behaviors.

Apple has made Finder’s simple human interface progressively more complex. Originally:

  • Dragging an item from one folder to another on the same volume moved it; to copy you Option-dragged.
  • Dragging an item from one volume to another copied it.

Children of all ages, myself included, have found those principles clean and simple, and quite fail-safe.

Now, rules have become:

  • Dragging an item from one folder to another on the same volume moves it.
  • To make a copy (not clone) on an HFS+ volume, Option-drag to another location.
  • To make a clone (not copy) on an APFS volume, Option-drag to another location, but I can’t see how to make a true copy.
  • Dragging an item from one volume to another copies it, unless either of the volumes is on iCloud Drive, in which case it moves it.
  • To make a copy (not clone) to or from iCloud Drive, use Option-drag instead.

Messy.

I agree with Oakley’s final summary that we’ll have to wait and find out how serious Apple’s commitment to macOS really is. High Sierra is not turning out to be the next Snow Leopard. Not by a long shot. Will macOS remain a serious platform or become an iPhone accessory?