
An Introduction to Writing Systems & Unicode:
A review of script characteristics affecting computer-based script support and Unicode

Part 1: Introduction & TOC

Next part. Part 2: Large character sets

Contents

  1. Front matter
    1. Introduction
    2. Sources
  2. Large character sets
    1. CJK character sets
      1. Chinese
      2. Japanese
      3. Korean
      4. Visual characteristics
      5. Radicals
    2. Character sets, encodings, and multi-byte characters
      1. Character sets & encodings
      2. Unification
      3. Respecting character boundaries
      4. Tools
    3. Inputting ideographic characters
      1. Getting to the right character quickly
      2. Chinese input methods
      3. Alternative representations of characters
      4. Tools
  3. Complex script rendering
    1. Definitions
    2. Combining characters
      1. Arabic & Hebrew short vowels
      2. Context-sensitive placement of diacritics
      3. Vowel signs
      4. Precomposed vs. decomposed
      5. Coding combining characters
      6. Normalization
    3. Context-sensitive glyph shaping
      1. Word final glyph variants
      2. Cursive script
      3. Inputting cursive glyphs
    4. More character to glyph rendering
      1. Special joining forms
      2. Positional variation
      3. Ligatures
      4. Joiner & non-joiner control characters
      5. Grapheme clusters
  4. Text direction
    1. Vertical text
      1. Text flow
      2. Rotations & shifts
      3. Tate chu yoko
      4. Vertical columns
    2. Bidirectional text
      1. Right alignment
      2. Bidirectional ordering
      3. Unicode bidirectional algorithm
      4. Mirrored characters
      5. Bidi formatting control characters
      6. Visual selection
    3. Directional bias in layout & graphics
      1. Screen layout
      2. Graphics, icons and charts
  5. Text boundaries & wrapping
    1. Word boundaries
      1. Western
      2. Chinese
      3. Japanese
      4. Korean
      5. Thai
    2. Line breaking
      1. Basic alternatives
      2. CJK line breaking rules
      3. Wrapping Latin text in Arabic & Hebrew
      4. Hyphenation
    3. Justification
      1. Basic alternatives
      2. Justification in Chinese & Japanese
      3. Justification in Arabic
  6. Typographic differences
    1. Character size & line height
      1. Glyph complexity
      2. Line height & inter-line spacing
      3. Baseline alignment
      4. Proportional spacing
    2. Ruby
      1. Furigana
      2. Bopomofo
      3. Interlinear annotation characters
    3. Other typographic differences
      1. Emphasis
      2. Kumimoji and warichu
  7. Sorting & case conversion
    1. Sorting
      1. Basic Latin
      2. Arabic
      3. Thai
      4. Korean
      5. Chinese & Japanese
      6. Multilingual text
      7. Indexing & alphabetic ordering
    2. Case conversion

Introduction

Front matter

Intended audience

Anyone who wants to better understand how scripts work in computerised environments, particularly with regard to Unicode. The material should be accessible to a wide audience, from software engineers to managers.

No previous knowledge is assumed, and the tutorial is perfectly accessible to beginners. Because of the breadth of scripts it discusses, it has also been very well reviewed by people at intermediate and advanced levels.

Why should you read this?

When planning to introduce products into new markets, it is important to understand the impact of having to support different scripts. The tutorial will make clear that this is not usually a trivial issue and that, if you need to implement support, it may involve decisions at a very early stage of the design process.

This tutorial is particularly useful for people who are new to Unicode, in that it provides an overview of the basics in the context of real examples.

Objectives

This material was initially developed for delivery as a regularly featured tutorial at Internationalization & Unicode Conferences.

The tutorial will provide you with an understanding of key requirements for implementing writing systems in information technology. It will do this by examining real examples of a wide range of modern scripts to discover features that a computerized implementation must support. It will also make special reference, where appropriate, to how the Unicode Standard points the way forward for meeting these requirements.

The tutorial does not provide detailed coding advice, but it does provide the essential background information you need to understand the fundamental issues. It also serves as an excellent orientation for newcomers to the topic, providing a wide-ranging framework that helps in assimilating further, more detailed and specific information.

Naturally, given the tutorial format, this is an ambitious approach, and it means that we cannot go into great detail on any particular topic. If you would like to understand a topic better, there are a couple of excellent resources cited at the end of the tutorial, one of which is the very readable Unicode Standard itself.

Scripts addressed and Conventions

We will organize the material in the tutorial by concept, rather than by script. To help you, the script or scripts to which the concept applies will always be listed at the top right of the slide.

The main scripts we will use as examples include:

The tutorial covers most of the key features of each of these scripts.

An objective of the tutorial is to introduce a number of terms used to describe script features or characters. These terms are called out under the slide title on slides where they are introduced.

There is a set of web pages with sample text in each of the main scripts we will address. Each sample page is a translation of the same English text. We will use these samples to illustrate as many of the points made as possible, so that you can experiment with the examples yourself. In fact, where I have taken an example from a sample page, I have typically included the text of that sample on the slide to help you locate real instances more easily.

If you use these examples for your own material, please ensure that you cite this paper and the web site as a source reference.

Key sources

The top two sources provide very accessible information if you wish to delve deeper into most of the topics covered in this tutorial.


Author: Richard Ishida.


Content created February 2003. Last update 2010-08-29 13:25 GMT