Update on IDNs and IRIs

Abstract

In domain names such as www.unicode.org, only a limited set of characters is allowed. This limitation also applies to Uniform Resource Identifiers (URIs) such as http://www.unicode.org. Internationalized Domain Names (IDNs) and Internationalized Resource Identifiers (IRIs) changed this a few years ago, both allowing a wide range of characters from the Unicode repertoire. The specifications underlying these technologies are currently being revised: a major overhaul for IDNs and a minor one for IRIs. The long-overdue and now imminent introduction of the first internationalized top-level domain names will significantly increase the importance of IDNs and IRIs in the near future.

The presentation will give a general overview of IDNs and IRIs and discuss the current revisions of the specifications in detail. For IDNs, the set of allowed characters is now defined using an inclusion-based model rather than the earlier exclusion-based model. Fixed tables are replaced by a property-based selection process so that the specification is not tied to a single version of Unicode. The mapping step (dealing with casing and normalization, among other things) is moved out of the core libraries and closer to the user, to allow adaptations for special cases and to reduce user surprises. The IRI specification is being extended with descriptions of widely used variants for handling characters that are, strictly speaking, not allowed in IRIs. Both specifications are affected by bug fixes to the bidirectionality restrictions.
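The separation between an application-level mapping step and the core conversion can be sketched as follows. This is a minimal illustration, not the procedure defined by the specifications: the function names `map_label` and `to_a_label` are hypothetical, and the mapping policy shown (lowercasing followed by NFC) is only one assumed choice among those an application might make. The core step uses the Punycode codec from the Python standard library and performs no mapping or validity checking of its own.

```python
import unicodedata

ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded label


def map_label(user_input: str) -> str:
    """Application-level mapping (assumed policy): lowercase, then NFC.

    Under the revised model this step lives close to the user and
    may vary between applications; it is not part of the core.
    """
    return unicodedata.normalize("NFC", user_input.lower())


def to_a_label(u_label: str) -> str:
    """Core conversion: encode an already-mapped label, no mapping here."""
    if u_label.isascii():
        return u_label  # plain ASCII labels are left unchanged
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


# What the user typed is mapped first, then handed to the core step.
print(to_a_label(map_label("BÜCHER")))  # xn--bcher-kva
```

Keeping the two functions apart means that an application with special requirements can replace `map_label` without touching the conversion itself.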

Feature	IDNA 2003	IDNA 2008
Unicode coverage	Unicode 3.2	Unicode 5.2 and beyond
Registration vs. lookup	same rules apply	registration is stricter
Symbols and punctuation	mostly allowed	mostly prohibited
Mapping and normalization	required, predefined	not needed, may vary
Coverage definition	by table	by rules + exceptions
Context-specific characters	no	yes (ZWJ/ZWNJ)
Combining marks at end of RTL label	not allowed	allowed (needed for Dhivehi, Yiddish...)
Numbers at end of RTL label	not allowed	allowed
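The "by rules + exceptions" row can be illustrated with a small property-based check. The sketch below is a strong simplification and only an assumption about how such a rule might look: it accepts a character when its Unicode general category is a letter, mark, or digit category and when the character is stable under NFKC normalization and case folding. The names `LETTER_DIGIT_CATEGORIES`, `is_unstable`, and `is_candidate` are hypothetical, and the exception lists, contextual rules (such as those for ZWJ/ZWNJ), and the hyphen are not modeled.

```python
import unicodedata

# General categories treated as letters, marks, and digits in this sketch.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_unstable(ch: str) -> bool:
    """True if NFKC normalization plus case folding changes the character."""
    nfkc = unicodedata.normalize("NFKC", ch)
    return ch != unicodedata.normalize("NFKC", nfkc.casefold())


def is_candidate(ch: str) -> bool:
    """Inclusion-based test: allowed only if a rule explicitly admits it."""
    return (unicodedata.category(ch) in LETTER_DIGIT_CATEGORIES
            and not is_unstable(ch))


for ch in ["ü", "日", "5", "A", "♥", "!"]:
    print(repr(ch), unicodedata.category(ch), is_candidate(ch))
```

Because the decision is derived from character properties rather than from a fixed table, characters added in later Unicode versions are classified by the same rule. Symbols and punctuation fall outside the listed categories, and uppercase letters are rejected because case folding changes them.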

Update on Internationalized Domain Names and
Internationalized Resource Identifiers

IUC 33, San Jose, CA, U.S.A., October 2009

Martin J. DÜRST

Overview

Abstract

Assumptions

Some Terms to Start With

Anatomy of a Resource Identifier

IRI and IDN Examples

Why Internationalizing Identifiers?

IRI Implementation

IDN Implementation

Everything is Fine?

IDNA Update: IDNA 2008

Changes from IDNA 2003 to IDNA 2008

Why IDNA 2008

Problems with IDNA 2008

Top-Level IDNs

IRI Update

Internationalizing IRI Components

Email Address Internationalization

Conclusions

Q & A
