From Emperors to Zombies:
Ruby from Unicode 10 to 12.1

https://www.sw.it.aoyama.ac.jp/2019/pub/RubyKaigiLight/

Ruby Kaigi 2019 Lightning Talk, Fukuoka, Japan, April 19, 2019

Martin J. DÜRST (please call me Martin)

duerst@it.aoyama.ac.jp, Aoyama Gakuin University

Ruby Programming Language

© 2019 Martin J. Dürst, Aoyama Gakuin University

Abstract

Each new version of Ruby is based on the latest version of Unicode. This lightning talk looks at the updates from Unicode 10.0.0 to Unicode 12.1.0. Unicode 12.1.0 is a special version that adds a single character: a square ligature of the name of the new Japanese era Reiwa (令和), which will begin on May 1st, 2019. Unicode 12.0.0 added several scripts and characters, but didn't require any changes to Ruby's internals. Unicode 11.0.0, on the other hand, required some changes to Ruby's internals, which led to the discovery of a bug affecting, among other things, the zombie emoji.

For Best Viewing

These slides have been created in HTML, for projection with Opera (≤12.17 Windows/Mac/Linux; use F11 to switch to projection mode). The slides have been optimized for a screen size of 1920×1080 pixels, but can easily be viewed on other screens, too. Texts in gray, like this one, are comments/notes which do not appear on the slides. Please note that depending on the browser and OS you use, some rare characters or special character combinations may not display as intended, but e.g. as empty boxes, question marks, or as separate pieces rather than composed.

 

What is Your Ruby Version?

What is Your Unicode Version?

RbConfig::CONFIG["UNICODE_VERSION"]

"12.1.0"
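The version string can also be checked from a program. A minimal sketch; the helper name `unicode_at_least?` is hypothetical. It compares with `Gem::Version` because plain string comparison gets multi-digit versions wrong.

```ruby
require "rbconfig"
require "rubygems"  # provides Gem::Version; normally loaded already

# Hypothetical helper: does this Ruby's Unicode data cover at least `required`?
def unicode_at_least?(required, current = RbConfig::CONFIG["UNICODE_VERSION"])
  Gem::Version.new(current) >= Gem::Version.new(required)
end

puts RbConfig::CONFIG["UNICODE_VERSION"]  # 12.1.0 on Ruby 2.6.3
puts unicode_at_least?("12.1.0")          # true on Ruby 2.6.3 and later
puts "9.0.0" > "12.1.0"                   # true: why plain string comparison is not enough
```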

Who cares?
(Isn't it irrelevant?)

Does Your Unicode Version Matter?

New Scripts and Characters
New Emoji
Reiwa (令和) support

Ruby and Unicode Versions

Year (Y)   Unicode Version (U)           Ruby Version (R)
           published in spring/summer    published around Christmas
2014       7.0.0                         2.2
2015       8.0.0                         2.3
2016       9.0.0                         2.4
2017       10.0.0                        2.5
2018       11.0.0                        2.6
2019       12.0.0                        2.7

U = Y - 2007 = 10R - 15        R = (Y - 1992) · 0.1 = 0.1U + 1.5
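The two rules of thumb can be checked against every row of the table. A minimal sketch; `TABLE` and `rules_hold?` are illustrative names. Rationals (`String#to_r`, the `r` literal suffix) keep the 0.1 factors exact, so no floating-point rounding creeps into the comparisons.

```ruby
# Rows of the table: year => [Unicode version, Ruby version]
TABLE = {
  2014 => ["7.0.0", "2.2"],
  2015 => ["8.0.0", "2.3"],
  2016 => ["9.0.0", "2.4"],
  2017 => ["10.0.0", "2.5"],
  2018 => ["11.0.0", "2.6"],
  2019 => ["12.0.0", "2.7"],
}

# Checks U = Y - 2007 = 10R - 15 and R = (Y - 1992) * 0.1 = 0.1U + 1.5 for one row
def rules_hold?(year, unicode, ruby)
  u = unicode.to_i  # major Unicode version, e.g. "7.0.0" -> 7
  r = ruby.to_r     # Ruby version as an exact rational, e.g. "2.2" -> (11/5)
  u == year - 2007 && u == 10 * r - 15 &&
    r == (year - 1992) * 0.1r && r == 0.1r * u + 1.5r
end

TABLE.each do |year, (unicode, ruby)|
  puts "#{year}: Unicode #{unicode}, Ruby #{ruby}: #{rules_hold?(year, unicode, ruby)}"
end
```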

 

Not so Fast!

Faster!!!

Ruby and Unicode Versions

Supported Unicode version    Ruby version
11.0.0: June 5, 2018         2.6.0: December 25, 2018
12.0.0: March 5, 2019        2.6.2: March 13, 2019
12.1.0: May 7, 2019          2.6.3: April 17, 2019

 

Faster Indeed!

Supported Unicode version    Ruby version                Time to Publication
11.0.0: June 5, 2018         2.6.0: December 25, 2018    203 days
12.0.0: March 5, 2019        2.6.2: March 13, 2019         8 days
12.1.0: May 7, 2019          2.6.3: April 17, 2019       -20 days
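The last column can be recomputed from the release dates in the table. A minimal sketch using Ruby's standard `date` library. Subtracting one `Date` from another yields the difference in days as a `Rational`. The constant names are illustrative.

```ruby
require "date"

# [Unicode version, Unicode release, Ruby version, Ruby release], from the table
RELEASES = [
  ["11.0.0", Date.new(2018, 6, 5), "2.6.0", Date.new(2018, 12, 25)],
  ["12.0.0", Date.new(2019, 3, 5), "2.6.2", Date.new(2019, 3, 13)],
  ["12.1.0", Date.new(2019, 5, 7), "2.6.3", Date.new(2019, 4, 17)],
]

# Days from the Unicode release to the Ruby release (negative: Ruby came first)
GAPS = RELEASES.map { |_, unicode_date, _, ruby_date| (ruby_date - unicode_date).to_i }

RELEASES.zip(GAPS).each do |(unicode, _, ruby, _), days|
  puts "Unicode #{unicode} -> Ruby #{ruby}: #{days} days"
end
```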

 

Faster But Why?

Unicode 12.1.0 is still in beta

 

The Easiest: Unicode 12.0.0

This is what an easy (I'd like to say typical) upgrade looks like:

 

The Weirdest: Unicode 11.0.0

 

Extended Grapheme Cluster
WHAT?

/\X/ (read as backslash-X)
WHAT?

Extended Grapheme Clusters

Example: Flag of Wales

flag_of_Wales = "\u{1F3F4 E0067 E0062 E0077 E006C E0073 E007F}"

flag_of_Wales.length → 7 (characters)

flag_of_Wales.bytes.length → 28 (bytes)

flag_of_Wales.grapheme_clusters.length → 1
(extended grapheme clusters)

"A#{flag_of_Wales}Z".match? /A\XZ/ → true
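The same experiment as a runnable script. The `.` and `scan` checks are additions for contrast and are not on the slide. `\X` consumes a whole extended grapheme cluster, while `.` consumes a single character (code point).

```ruby
# Flag of Wales: WAVING BLACK FLAG + tag characters spelling "gbwls" + CANCEL TAG
flag_of_Wales = "\u{1F3F4 E0067 E0062 E0077 E006C E0073 E007F}"

p flag_of_Wales.length                    # 7 characters (code points)
p flag_of_Wales.bytesize                  # 28 bytes: each code point takes 4 bytes in UTF-8
p flag_of_Wales.grapheme_clusters.length  # 1 extended grapheme cluster

framed = "A#{flag_of_Wales}Z"
p framed.match?(/\AA\XZ\z/)  # true: \X spans the whole flag
p framed.match?(/\AA.Z\z/)   # false: . matches only the black flag code point
p framed.scan(/\X/).length   # 3 clusters: "A", the flag, "Z"
```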

 

How is \X Implemented?

 

How to Understand Code?

Rewrite it!

 

Rewritten!

700 lines, 1 function
⇒ 300 lines, 5 functions
Very Different Code Style

Questions Remain

No way to reuse node tree?
No way to convert subexpression to node tree?

How to Rewrite Safely?

Test First!
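A minimal sketch of the test-first idea. Before touching the `\X` code, pin down its behavior with cases written in the notation of Unicode's GraphemeBreakTest.txt data file, where `÷` marks a cluster boundary and `×` marks no boundary. The four cases and the helper name `expected_clusters` are illustrative. A real test suite would read the whole data file.

```ruby
# Cases in GraphemeBreakTest.txt notation: "÷" = boundary, "×" = no boundary
CASES = [
  "÷ 0061 ÷ 0062 ÷",                                           # a, b: two clusters
  "÷ 0020 × 0308 ÷",                                           # space + combining diaeresis
  "÷ 000D × 000A ÷",                                           # CR LF stays together
  "÷ 1F3F4 × E0067 × E0062 × E0077 × E006C × E0073 × E007F ÷", # flag of Wales
]

# Turns one case into the list of clusters it expects, as UTF-8 strings
def expected_clusters(line)
  line.split("÷").map(&:strip).reject(&:empty?).map { |cluster|
    cluster.split("×").map { |hex| hex.strip.to_i(16) }.pack("U*")
  }
end

# A case fails unless both \X and String#grapheme_clusters split it as expected
failures = CASES.reject { |line|
  expected = expected_clusters(line)
  text = expected.join
  text.scan(/\X/) == expected && text.grapheme_clusters == expected
}

puts failures.empty? ? "all #{CASES.size} cases pass" : "failing: #{failures.inspect}"
```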

The Zombie Bug

 

The Zombie Bug Explained
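The bug itself is not spelled out in these notes, and the following does not reproduce it. It is only a hedged sanity check of what a current Ruby should report for a zombie ZWJ sequence (woman zombie): four code points, but a single extended grapheme cluster, with `\X` and `String#grapheme_clusters` agreeing.

```ruby
# Woman zombie: ZOMBIE + ZERO WIDTH JOINER + FEMALE SIGN + VARIATION SELECTOR-16
woman_zombie = "\u{1F9DF 200D 2640 FE0F}"

p woman_zombie.length                    # 4 code points
p woman_zombie.grapheme_clusters.length  # 1: the ZWJ joins the two pictographs
p woman_zombie.scan(/\X/).length         # 1: the regexp engine agrees

# Surrounded by ASCII letters, the sequence is still a single cluster
p "a#{woman_zombie}b".scan(/\X/) == ["a", woman_zombie, "b"]  # true
```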

 

Unicode 12.1.0 Background

 

The Fastest: Unicode 12.1.0, Part 1

 

Reiwa Background

 

The Fastest: Unicode 12.1.0, Part 2

 

The Fastest: Unicode 12.1.0, Part 3

 

New in Ruby 2.6.3

In 2.6.2:

"\u32FF".match? /\p{age=12.1}/
SyntaxError: invalid character property name

In 2.6.3:

"\u32FF".unicode_normalize :nfkc
"令和"

"\u32FF".match? /\p{age=12.1}/
true
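A script meant to run on both 2.6.2 and 2.6.3 cannot write `/\p{age=12.1}/` as a literal, because older versions reject it while parsing the file. A minimal sketch that builds the regexp at run time and rescues `RegexpError`. The names `REIWA_SQUARE`, `AGE_12_1` and `reiwa_supported?` are illustrative.

```ruby
require "rbconfig"
require "rubygems"  # provides Gem::Version

REIWA_SQUARE = "\u32FF"  # U+32FF SQUARE ERA NAME REIWA, added in Unicode 12.1.0

def reiwa_supported?
  Gem::Version.new(RbConfig::CONFIG["UNICODE_VERSION"]) >= Gem::Version.new("12.1.0")
end

# Built at run time: as a literal, /\p{age=12.1}/ would stop older Rubies
# from parsing the file at all
AGE_12_1 = begin
  Regexp.new("\\p{age=12.1}")
rescue RegexpError
  nil
end

if reiwa_supported? && AGE_12_1
  puts REIWA_SQUARE.unicode_normalize(:nfkc)  # 令和
  puts REIWA_SQUARE.match?(AGE_12_1)          # true
else
  puts "Unicode #{RbConfig::CONFIG['UNICODE_VERSION']}: no Reiwa support yet"
end
```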

 

Enjoy!