Abstract

If the Latin Alphabet is not your (or your customer's) main script, there are many good reasons for including non-Latin characters in a Web address (URL/URI). This presentation will tell you why, when, and how you can and should do this, and provide the necessary background to make things work for servers and clients.

Non-ASCII characters have been used in Web addresses for more than a decade. Such Web addresses are called Internationalized Resource Identifiers (IRIs) and have been specified in RFC 3987 since 2005. Early this year, the IETF chartered a Working Group to update RFC 3987.

The presentation will first explain the basic rules for working with IRIs, in particular the conversion to URIs via UTF-8 and percent-encoding. To provide a deeper understanding, we will then concentrate on the major issues that the IRI Working Group is working to address:

Moving from defining IRIs as a presentation element, while restricting protocols to using URIs, to defining IRIs as protocol elements on par with URIs.
Balancing syntactic uniformity for long-term simplicity against backwards compatibility with established browser behavior, in particular for the domain name and fragment identifier parts of an IRI.
Moving the specification from a before-after descriptive style to a more procedural style that covers edge cases of implementations found in the wild.
Comparison, normalization, and security issues for IRIs.
Restrictions and display advice for bidirectional IRIs.


Speakers' Introduction

What's an IRI

URIs are often also called URLs (Uniform/Universal Resource Locators), although strictly speaking, URLs are a subset of URIs.

Why Internationalized?

How IRIs Work

The extended character repertoire is essentially the only difference between URIs and IRIs, and conversion is easy using UTF-8 and percent-encoding. However, as in many other areas of Unicode and internationalization, the details can be surprisingly tricky.
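The basic conversion can be sketched in a few lines of Python. This is a minimal illustration, not the full RFC 3987 mapping: the function name `iri_to_uri` and the `example.org` addresses are assumptions made for this example, and the sketch deliberately leaves the domain name part alone, since that part raises the Punycode question treated later in the talk. The last two calls hint at one of the tricky details: the same visible text in a different Unicode normalization form yields a different URI.

```python
import unicodedata


def iri_to_uri(iri: str) -> str:
    """Percent-encode each non-ASCII character as its UTF-8 octets.

    ASCII characters (including delimiters and existing %XX escapes)
    are copied through unchanged. Hypothetical helper for illustration.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            # One %XX triplet per UTF-8 octet of the character.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


# Precomposed e-acute (U+00E9) in the path.
print(iri_to_uri("http://example.org/r\u00e9sum\u00e9"))
# http://example.org/r%C3%A9sum%C3%A9

# The same visible text with 'e' followed by a combining acute (U+0301).
decomposed = "http://example.org/re\u0301sume\u0301"
print(iri_to_uri(decomposed))
# http://example.org/re%CC%81sume%CC%81

# Normalizing to NFC first gives the precomposed result again.
print(iri_to_uri(unicodedata.normalize("NFC", decomposed)))
# http://example.org/r%C3%A9sum%C3%A9
```

Whether and where such normalization should happen is one of the open questions; the sketch only shows that the choice changes the resulting URI.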

IETF IRI WG

Documents being Updated

There is a chance that we will create additional documents when we split up some work (list of WG documents).

Registration Guidelines

Main Issues for IRIs

Decomposition of an IRI

To Punycode or not to Punycode

Query Part

Bidirectionality

Conventions for Bidi Display

Normalization

Browser Quirks and Other Legacy

Scheme Names

How You Can Contribute

Conclusions

Q & A

Further Material

There are links throughout the talk; please use them!

Some older material (more background information):

Update on Internationalized Domain Names and Internationalized Resource Identifiers, Martin J. Dürst, Internationalization and Unicode Conference 33, San Jose, CA, USA, October 2009.
IRIs and IDNs: Testing, Implementation, and Specification Evolvement, Martin J. Dürst, 31st Internationalization and Unicode Conference, San Jose, CA, USA, October 2007.
Internationalized Resource Identifiers, Internationalization & Unicode Conference 26, San Jose, CA, USA, Sept. 2004.
Recent Progress on Internationalized Resource Identifiers (IRIs), Martin J. Dürst, 25th Internationalization and Unicode Conference, Washington DC, USA, April 2004.
Internationalized Resource Identifiers (IRIs) - Server-side Implementation, Martin J. Dürst, 24th Internationalization & Unicode Conference, Atlanta, GA, USA, Sept. 2003.
Internationalized Resource Identifiers: From Specification to Testing, Martin J. Dürst, 19th International Unicode Conference, San Jose, CA, USA, Sept. 2001.
Internationalizing Internet Identifiers, Martin J. Dürst, 11th International Unicode Conference, San Jose, CA, USA, Sept. 1997.

IRIs Beyond the Napkin: A Survey of Internationalized Resource Identifier Issues and Implementation

IUC 34, Santa Clara, CA, U.S.A., 20 October 2010

Martin J. DÜRST and Addison PHILLIPS

Overview

Abstract

Speakers' Introduction

What's an IRI

Why Internationalized?

How IRIs Work

IETF IRI WG

Documents being Updated

Registration Guidelines

Main Issues for IRIs

Decomposition of an IRI

To Punycode or not to Punycode

Query Part

Bidirectionality

Conventions for Bidi Display

Normalization

Browser Quirks and Other Legacy

Scheme Names

How You Can Contribute

Conclusions

Q & A

Further Material
