Japanese text requires special attention in the design of DSS because of the complexity of the language. Some of the issues that make Japanese text difficult for a transnational DSS to handle are highlighted below.
In Japanese, one cannot assume that one byte is equivalent to one character, because Japanese characters generally require multiple bytes for representation.
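The gap between bytes and characters can be illustrated with a short Python sketch. The function names and the sample word are assumptions chosen for illustration, and UTF-8 and Shift JIS stand in for whatever encoding a given system actually uses.

```python
def char_and_byte_counts(text, encoding="utf-8"):
    """Return (number of characters, number of bytes) for text in one encoding."""
    return len(text), len(text.encode(encoding))


def truncate_to_bytes(text, max_bytes, encoding="utf-8"):
    """Cut text to at most max_bytes without leaving half a character behind."""
    raw = text.encode(encoding)[:max_bytes]
    # errors="ignore" drops an incomplete multi-byte sequence at the cut point.
    return raw.decode(encoding, errors="ignore")


if __name__ == "__main__":
    word = "日本語"  # "Japanese language", three characters
    print(char_and_byte_counts("DSS"))              # ASCII: one byte per character
    print(char_and_byte_counts(word, "utf-8"))      # three bytes per character
    print(char_and_byte_counts(word, "shift_jis"))  # two bytes per character
    print(truncate_to_bytes(word, 4))               # only one whole character fits
```

A fixed-width field sized in bytes therefore holds fewer Japanese characters than English ones, and naive byte slicing can split a character in two.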
The Japanese character set contains over 10,000 characters.
The Japanese writing system is a mixture of four different writing systems: Roman characters, Hiragana, Katakana, and Kanji.
Roman characters correspond to the 52 letters (counting both upper case and lower case) of the English alphabet. In addition, there are Roman characters for the ten numerals. Japanese writers use Roman characters primarily in the construction of tables and in the creation of acronyms.
Hiragana characters represent sounds, typically syllables. Generally, these characters are used to create suffixes for some words, or to write native Japanese words. Hiragana characters have a flowing, calligraphic look. For example, the character ま represents the sound "ma," whereas the character み represents the sound "mi."
Katakana characters also form a phonetic script. However, they are used to represent words of foreign origin, such as bread, パン (pronounced "pan"), which was derived from the Portuguese word for bread, pão (pronounced "pown"). In addition, they are used for emphasis, similar to the way we use italics in English. Katakana characters have a squared, rigid look in comparison to Hiragana characters. For example, the character マ represents the sound "ma," while the character ク represents the sound "ku."
Kanji characters were borrowed from the Chinese over 1500 years ago. Tens of thousands of these characters exist, although only a few thousand are in everyday use. These characters represent specific words or combinations of words. For example, 木 used alone indicates a tree, while 林, which doubles the tree element, indicates woods, and 森, which triples it, means a forest.
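A system that processes Japanese text often needs to tell the four scripts apart. The following Python sketch does so by Unicode code point range. The function names are hypothetical, Unicode is assumed as the internal representation, and the ranges are simplified: they cover the basic Hiragana, Katakana, and CJK Unified Ideographs blocks only, so full-width Roman letters and rarer Kanji fall under "other."

```python
def script_of(ch):
    """Classify a single character as roman, hiragana, katakana, kanji, or other."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:   # Hiragana block
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:   # Katakana block
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:   # CJK Unified Ideographs (basic block)
        return "kanji"
    if ch.isascii() and ch.isalnum():  # A-Z, a-z, 0-9
        return "roman"
    return "other"


def scripts_in(text):
    """Return the script of each character in text, in order."""
    return [script_of(ch) for ch in text]


if __name__ == "__main__":
    # "to eat bread": Katakana loanword, Hiragana particle, Kanji stem, Hiragana suffix
    for ch, script in zip("パンを食べる", scripts_in("パンを食べる")):
        print(ch, script)
```

A single short sentence routinely mixes three or four scripts, so sorting, searching, and input validation cannot assume one script per field.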
There is no single recognized character set for Japanese comparable to ASCII for English, nor is there a universally recognized encoding method for Japanese.
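The practical consequence of competing encodings can be sketched in Python. The three Japanese encodings below (Shift JIS, EUC-JP, and ISO-2022-JP) plus UTF-8 are examples chosen for illustration, not a complete list, and the helper names are hypothetical.

```python
ENCODINGS = ("shift_jis", "euc_jp", "iso2022_jp", "utf-8")


def encode_all(text):
    """Encode text under each example encoding and return {encoding: bytes}."""
    return {enc: text.encode(enc) for enc in ENCODINGS}


def decode_with(data, encoding):
    """Decode bytes, substituting U+FFFD for any sequence the codec rejects."""
    return data.decode(encoding, errors="replace")


if __name__ == "__main__":
    encoded = encode_all("木")
    for enc, data in encoded.items():
        print(enc, data.hex(" "))
    # Reading Shift JIS bytes as if they were EUC-JP does not recover the text.
    print(decode_with(encoded["shift_jis"], "euc_jp"))
```

The same character yields a different byte sequence under each encoding, so a DSS that exchanges Japanese data must record or negotiate the encoding alongside the data itself.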