https://www.sw.it.aoyama.ac.jp/2023/pub/RubyꝩduЯ/
duerst@it.aoyama.ac.jp, Aoyama Gakuin University
© 2023 Martin J. Dürst, Aoyama Gakuin University
A bit over a year ago, the Trojan Source attacks (https://trojansource.codes) created quite a bit of a scare. This talk looks at what has already been done, and what can and should be done, for Ruby.
Ruby has embraced Unicode in the form of UTF-8 for source code so that identifiers as well as comments can use non-ASCII characters. This can be very convenient but also may be dangerous.
We will explain the dangers: Bidirectional attacks can use special Unicode formatting characters to regroup source text so that it looks like it does something, but actually does something else. Homoglyph attacks can use lookalike characters to confuse code reviewers. Invisible characters and special spaces can be even more difficult to detect.
Remedies include better Ruby parsing, new checks in editors, IDEs, and code management sites such as GitHub, and stronger linters such as RuboCop. We will discuss what has already been done, what still needs to be done, and how to use the various tools together.
Please note that depending on the browser and OS you use, some rare characters or special character combinations may not display as intended, but e.g. as empty boxes, question marks, or apart rather than composed.
Mainly in the following areas:
String#encode (Ruby 1.9)
String#unicode_normalize (Ruby 2.2)
String#upcase, ... (Ruby 2.4)
Nicholas Boucher, University of Cambridge
Ross Anderson: well-known security expert at the University of Cambridge and the University of Edinburgh
Author of Security Engineering, 1182 pages, 1.887kg
white = 40
black = 45
whіtе = white + 6.5 # komi
score = white - black
if score>0 then puts 'white wins'
else puts 'black wins'
end
Who wins?
Please raise your hand:
✋ White wins: Left hand | Black wins: Right hand ✋
white = 40 black = 45 white = white + 6.5 # komi score = white - black if score>0 then puts 'white wins' else puts 'black wins' end
Who wins?
Please raise your hand:
✋ White wins: Left hand | Black wins: Right hand ✋
50 => white 40 => black score = black - white if score>0 then puts 'white wins' else puts 'black wins' end
Who wins?
Please raise your hand:
✋ White wins: Left hand | Black wins: Right hand ✋
= 50 = 40 if - > 0 then puts 'white wins' else puts 'black wins' end
Please raise your hand:
✋ White wins: Left hand | Black wins: Right hand ✋
Example | White wins | Black wins |
---|---|---|
Example 1 | ✋✋✋✋✋✋✋✋ | |
Example 2 | ✋✋✋✋✋✋✋✋ | ✋ |
Example 3 | ✋✋✋✋✋✋✋✋ | |
Example 4 | ✋✋✋✋ | ✋✋✋✋ |
Example 1: black wins
Example 2: black wins
Example 3: black wins
Example 4: black wins
Example | White wins | Black wins |
---|---|---|
Example 1 | ✋✋✋✋✋✋✋✋ | ✓ OK |
Example 2 | ✋✋✋✋✋✋✋✋ | ✓ OK ✋ |
Example 3 | ✋✋✋✋✋✋✋✋ | ✓ OK |
Example 4 | ✋✋✋✋ | ✓ OK ✋✋✋✋ |
Don't be disappointed: It's my fault that you guessed wrong.
white = 40
black = 45
whіtе = white + 6.5 # komi
score = white - black
if score>0 then puts 'white wins'
else puts 'black wins'
end
whіtе => wh\u0456t\u0435
whіtе = white + 6.5 # komi
і: U+0456: Cyrillic Small Letter Byelorussian-Ukrainian I (Russian i: и)
е: U+0435: Cyrillic Small Letter IE
The 6.5 points get assigned to the fake whіtе, so the real white loses
This is a homoglyph attack
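One way to catch this kind of lookalike is to flag identifiers that mix scripts. The following is only a minimal sketch: the helper name `mixed_script_identifiers` is made up for illustration, it checks just Latin versus Cyrillic, and it uses a simple regular expression rather than a real Ruby tokenizer.

```ruby
# Hypothetical helper (not part of Ruby or any linter): return identifier-like
# tokens that contain both Latin and Cyrillic letters.
def mixed_script_identifiers(source)
  # Rough approximation of an identifier: a letter or underscore,
  # followed by letters, digits, or underscores.
  source.scan(/[\p{Alpha}_][\p{Alnum}_]*/).uniq.select do |token|
    token.match?(/\p{Latin}/) && token.match?(/\p{Cyrillic}/)
  end
end

# The komi line from the example, with the two Cyrillic letters written as escapes.
code = "white = 40\nwh\u0456t\u0435 = white + 6.5 # komi\n"

mixed_script_identifiers(code).each do |name|
  # Show the code points so the lookalike letters become visible.
  puts name.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
end
```

An identifier written entirely in Cyrillic would pass this check, so it only covers the mixed-script case shown on the slide.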
white = 40
black = 45
white<U+200B> = white + 6.5 # komi
score = white - black
if score>0 then puts 'white wins'
else puts 'black wins'
end
U+200B: ZERO WIDTH SPACE
Ruby allows all non-ASCII spaces in identifiers!
Let's call this an invisible space attack
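The effect can be reproduced by evaluating the example with the zero-width space written as an escape, so that it stays visible in the source. This is a small sketch for experimentation; the variable name `source` and the use of `eval` are choices made here, not something from the slides.

```ruby
# The example from the slide; \u200B inside the heredoc becomes a real
# ZERO WIDTH SPACE, so the third line assigns to a *different* variable.
source = <<~RUBY
  white = 40
  black = 45
  white\u200B = white + 6.5 # komi
  white - black
RUBY

score = eval(source)
puts score > 0 ? "white wins" : "black wins"
```

The komi goes to the variable whose name ends in U+200B, so the real `white` stays at 40.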
<U+3000> = 50
<U+200B> = 40
if <U+200B> - <U+3000> > 0 then puts 'white wins'
else puts 'black wins'
end
U+200B: ZERO WIDTH SPACE
U+3000: IDEOGRAPHIC SPACE (fullwidth space, 全角スペース)
Ruby allows variable/method names consisting only of spaces!
(not really an attack)
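A simple check for such characters could look like the sketch below. The helper name `suspicious_spaces` is hypothetical, and the choice of Unicode categories (Zs for space separators, Cf for format characters) is an assumption about what is worth reporting, not an established rule.

```ruby
# Hypothetical helper: report every space separator (Zs) or format character (Cf)
# other than the ordinary ASCII space, as [line, column, code point].
def suspicious_spaces(source)
  findings = []
  source.each_line.with_index(1) do |line, lineno|
    line.chomp.each_char.with_index(1) do |ch, col|
      next if ch == " "
      if ch.match?(/[\p{Zs}\p{Cf}]/)
        findings << [lineno, col, format("U+%04X", ch.ord)]
      end
    end
  end
  findings
end

# First two lines of the example, with the invisible names written as escapes.
sample = "\u3000 = 50\n\u200B = 40\n"
suspicious_spaces(sample).each do |lineno, col, code_point|
  puts "line #{lineno}, column #{col}: #{code_point}"
end
```

A real tool would also have to decide whether such characters are acceptable inside string literals and comments; this sketch reports them everywhere.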
50 => LREwhitePDF
40 => RLOLREblackPDF
score = RLOLREblackPDF - LREwhitePDF
if score > 0 then puts 'white wins'
else puts 'black wins'
end
RLO, LRE, PDF: Unicode bidirectional (bidi) formatting characters
Allowed as part of variable names in Ruby!
score = RLOLREblackPDF - LREwhitePDF
is shown as
score = white - black
This is a bidirectional reordering attack
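To make such characters visible, each bidi control can be replaced by its abbreviation, much like the notation used on these slides. This is a minimal sketch; the names `BIDI_CONTROLS`, `BIDI_PATTERN`, and `reveal_bidi` are invented for the example.

```ruby
# Bidi formatting characters and their usual abbreviations.
BIDI_CONTROLS = {
  "\u061C" => "ALM", "\u200E" => "LRM", "\u200F" => "RLM",
  "\u202A" => "LRE", "\u202B" => "RLE", "\u202C" => "PDF",
  "\u202D" => "LRO", "\u202E" => "RLO",
  "\u2066" => "LRI", "\u2067" => "RLI", "\u2068" => "FSI", "\u2069" => "PDI"
}.freeze

BIDI_PATTERN = Regexp.union(BIDI_CONTROLS.keys)

# Hypothetical helper: replace each bidi control with a visible <NAME> marker.
def reveal_bidi(line)
  line.gsub(BIDI_PATTERN) { |ch| "<#{BIDI_CONTROLS[ch]}>" }
end

hidden = "score = \u202E\u202Ablack\u202C - \u202Awhite\u202C"
puts reveal_bidi(hidden)
# prints: score = <RLO><LRE>black<PDF> - <LRE>white<PDF>
```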
PDF ends LRE, RLO, and others
In this example, each PDF ends an LRE; the RLO ends at the line end
Abbreviation | Direction | Generation | Closed by | Influence |
---|---|---|---|---|
LRM | → | 1 | - | local |
RLM | ← | 1 | - | local |
ALM | ← | 1 | - | local |
LRE | → | 1 | PDF | general flow |
RLE | ← | 1 | PDF | general flow |
LRO | → | 1 | PDF | forcing |
RLO | ← | 1 | PDF | forcing |
PDF | | 1 | - | |
LRI | → | 2 | PDI | general flow |
RLI | ← | 2 | PDI | general flow |
FSI | first strong | 2 | PDI | general flow |
PDI | | 2 | - | |
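The opener/closer pairs can be used for a rough check of whether a line leaves any bidi scope open at its end, as the RLO in the example does. The sketch below is deliberately simplified: it counts first-generation and second-generation controls separately and ignores the interactions between them that the Unicode Bidirectional Algorithm defines. All names are hypothetical.

```ruby
EMBEDDING_OPENERS = ["\u202A", "\u202B", "\u202D", "\u202E"].freeze # LRE RLE LRO RLO
ISOLATE_OPENERS   = ["\u2066", "\u2067", "\u2068"].freeze           # LRI RLI FSI
PDF = "\u202C" # closes LRE, RLE, LRO, RLO
PDI = "\u2069" # closes LRI, RLI, FSI

# Hypothetical helper: count the scopes still open at the end of the line.
def unclosed_bidi(line)
  embeddings = isolates = 0
  line.each_char do |ch|
    if EMBEDDING_OPENERS.include?(ch)
      embeddings += 1
    elsif ch == PDF
      embeddings -= 1 if embeddings > 0
    elsif ISOLATE_OPENERS.include?(ch)
      isolates += 1
    elsif ch == PDI
      isolates -= 1 if isolates > 0
    end
  end
  { embeddings: embeddings, isolates: isolates }
end

# Second line of the example: the PDF closes the LRE, the RLO stays open.
p unclosed_bidi("40 => \u202E\u202Ablack\u202C")
```

A line with a nonzero count is not necessarily malicious, but it is a reasonable candidate for a warning.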
Why RLO? RLOLREblackPDF - LREwhitePDF displays as white - black
Why LRE.....PDF? Without them, the display would be etihw - kcalb
Why 40 => black and not black = 40? RLOLREblackPDF = 40 would display as 40 = black, which does not look like Ruby
Finger pointing doesn't solve any problems
Why not simply disallow non-ASCII characters in code?
(over Trojan source attacks, that is)
randornhouse.com
case)<0x202c>
)U+020C
for bidi controls, раура
l
for non-ASCII characters<200C>
LRI
for second-generation bidi controls, nothing special for first-generation bidi controls
Send questions and comments to Martin Dürst (mailto:duerst@it.aoyama.ac.jp) or open/contribute to a bug report or feature request
The latest version of this presentation is available at: