Character Encodings

This page documents the character encodings supported by the tailor.iconv API in the Function service. Understanding these encodings is essential for handling text data conversion between different character sets, especially when working with Japanese, Chinese, and Korean text.

Overview

The Function service converts text between character encodings through the tailor.iconv API. It supports over 100 character encodings, with a particular focus on those used by Japanese business systems and enterprise environments.

Unicode

| Encoding Name | Aliases | Description | Character Types |
| --- | --- | --- | --- |
| UTF-8 | UTF8 | Unicode (UTF-8) | Full-width/half-width alphanumeric, kana, kanji, symbols |
| UTF-16 | UTF16 | Unicode (UTF-16) | Full-width/half-width alphanumeric, kana, kanji, symbols |
| UTF-16BE | - | Unicode (UTF-16 Big Endian) | Full-width/half-width alphanumeric, kana, kanji, symbols |
| UTF-16LE | - | Unicode (UTF-16 Little Endian) | Full-width/half-width alphanumeric, kana, kanji, symbols |
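The difference between the Unicode rows above is only in how a code point is laid out as bytes. The following sketch illustrates this with standard JavaScript alone; it does not call tailor.iconv, and the helper names `utf16Bytes` and `toHex` are hypothetical names introduced for this illustration.

```javascript
// Show how one code point is laid out as bytes in each Unicode encoding form.
// Uses only standard JavaScript; tailor.iconv is not called here.

// Hypothetical helper: encode a string as UTF-16 code units with the chosen byte order.
function utf16Bytes(text, littleEndian) {
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i); // one 16-bit code unit
    const hi = unit >> 8;
    const lo = unit & 0xff;
    if (littleEndian) {
      bytes.push(lo, hi);
    } else {
      bytes.push(hi, lo);
    }
  }
  return bytes;
}

// Hypothetical helper: format bytes as upper-case hex pairs for display.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0").toUpperCase()).join(" ");
}

const sample = "あ"; // U+3042 HIRAGANA LETTER A
const utf8Hex = toHex(new TextEncoder().encode(sample));
const utf16beHex = toHex(utf16Bytes(sample, false));
const utf16leHex = toHex(utf16Bytes(sample, true));

console.log(utf8Hex);    // E3 81 82
console.log(utf16beHex); // 30 42
console.log(utf16leHex); // 42 30
```

Plain UTF-16 without a BE or LE suffix generally depends on a byte order mark or a default byte order. Which one tailor.iconv uses is not stated on this page, so choosing UTF-16BE or UTF-16LE explicitly is likely the safer option when the receiving system expects a specific order.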

Japanese Character Encodings

Shift_JIS Family Encodings

| Encoding Name | Aliases | Description | Character Types |
| --- | --- | --- | --- |
| Shift_JIS | SHIFT_JIS, SJIS | Shift JIS code | Half-width alphanumeric, half-width katakana, full-width kana, JIS Level 1 & 2 kanji |
| SJIS | - | Shift_JIS alias | Half-width alphanumeric, half-width katakana, full-width kana, JIS Level 1 & 2 kanji |
| CP932 | Windows-31J, MS932 | Microsoft extended Shift_JIS | Half-width alphanumeric, half-width katakana, full-width kana, JIS Level 1 & 2 kanji, NEC special characters, IBM extended characters |

EUC-JP Family Encodings

| Encoding Name | Aliases | Description | Character Types |
| --- | --- | --- | --- |
| EUC-JP | EUCJP, eucJP | Japanese EUC code | Half-width alphanumeric, full-width kana, JIS Level 1 & 2 kanji |
| EUC-JP-MS | eucJP-ms | Microsoft extended EUC-JP | Half-width alphanumeric, full-width kana, JIS Level 1 & 2 kanji, NEC special characters, IBM extended characters |

ISO-2022 Family Encodings

| Encoding Name | Aliases | Description | Character Types |
| --- | --- | --- | --- |
| ISO-2022-JP | ISO2022JP, JIS | JIS code (7-bit) | ASCII, JIS Roman, half-width katakana, JIS Level 1 & 2 kanji |

Enterprise System Encodings

IBM EBCDIC Japanese Encodings

| Encoding Name | Aliases | Description | Character Types |
| --- | --- | --- | --- |
| IBM037 | EBCDIC-CP-US, CP037 | EBCDIC US/Canada | Alphanumeric |
| IBM290 | EBCDIC-JP-KANA, CP290 | Japanese EBCDIC katakana | Alphanumeric, katakana |
| IBM930 | CP930 | Japanese EBCDIC (kanji) | Alphanumeric, katakana, hiragana, JIS Level 1 & 2 kanji |
| IBM939 | CP939 | Japanese EBCDIC extended | Alphanumeric, katakana, hiragana, JIS Level 1 & 2 kanji, extended characters |
| IBM943 | CP943 | Japanese PC code | Half-width alphanumeric, half-width katakana, full-width kana, JIS Level 1 & 2 kanji |
| EBCDIC-JP-E | - | Japanese EBCDIC alphanumeric katakana | Alphanumeric, katakana |
| EBCDIC-JP-KANA | - | Japanese EBCDIC katakana | Alphanumeric, katakana |

Enterprise System Aliases

| Alias Name | Actual Encoding | Description | Character Types |
| --- | --- | --- | --- |
| HitachiKEIS83 | IBM290 | Hitachi KEIS83 compatible | Alphanumeric, katakana |
| HitachiKEIS90 | IBM290 | Hitachi KEIS90 compatible | Alphanumeric, katakana |
| NECJIPS | IBM290 | NEC JIPS compatible | Alphanumeric, katakana |
| NECJIS | IBM290 | NEC JIS compatible | Alphanumeric, katakana |

Suffix characters are ignored in enterprise system aliases.
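When logging or debugging, it can help to know which actual encoding stands behind an alias. The sketch below copies the alias table into a lookup object; `ENTERPRISE_ALIASES` and `resolveEnterpriseAlias` are hypothetical names that are not part of tailor.iconv, and only exact alias names are matched because the suffix rule is not specified in enough detail here to model it.

```javascript
// Alias table copied from the "Enterprise System Aliases" table above.
const ENTERPRISE_ALIASES = {
  HitachiKEIS83: "IBM290",
  HitachiKEIS90: "IBM290",
  NECJIPS: "IBM290",
  NECJIS: "IBM290",
};

// Hypothetical helper: return the actual encoding behind an alias,
// or the name unchanged when it is not a listed enterprise alias.
function resolveEnterpriseAlias(name) {
  return Object.prototype.hasOwnProperty.call(ENTERPRISE_ALIASES, name)
    ? ENTERPRISE_ALIASES[name]
    : name;
}

console.log(resolveEnterpriseAlias("HitachiKEIS83")); // IBM290
console.log(resolveEnterpriseAlias("UTF-8"));         // UTF-8
```

The lookup is for diagnostics only. The alias names themselves can be passed to tailor.iconv.convert directly, as the alias example on this page does.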

Chinese Encodings

| Encoding Name | Aliases | Description | Character Types |
| --- | --- | --- | --- |
| GB2312 | EUC-CN | Simplified Chinese (basic) | ASCII, simplified Chinese |
| GBK | CP936 | Simplified Chinese (extended) | ASCII, simplified Chinese (extended) |
| GB18030 | - | Simplified Chinese (complete) | ASCII, simplified Chinese (all characters) |
| Big5 | BIG5 | Traditional Chinese | ASCII, traditional Chinese |
| BIG5HKSCS | Big5-HKSCS | Hong Kong extended traditional Chinese | ASCII, traditional Chinese, Hong Kong additional characters |

Korean Encodings

| Encoding Name | Aliases | Description | Character Types |
| --- | --- | --- | --- |
| EUC-KR | EUCKR | Korean EUC | ASCII, Korean |
| UHC | CP949 | Unified Hangul Code | ASCII, Korean (extended) |
| JOHAB | - | Combinatorial Hangul | ASCII, Korean (combinatorial) |
| ISO-2022-KR | ISO2022KR | Korean ISO-2022 | ASCII, Korean characters |

Other Languages

| Encoding Name | Aliases | Description | Character Types |
| --- | --- | --- | --- |
| ISO-8859-1 | Latin-1 | Western European | ASCII, Western European characters |
| ASCII | US-ASCII | American Standard Code | Basic ASCII characters (0-127) |

API Usage Examples

Basic Conversion

```javascript
// UTF-8 to Shift_JIS conversion
const sjisData = tailor.iconv.convert("日本語テキスト", "UTF-8", "Shift_JIS");

// EUC-JP to UTF-8 conversion
const utf8Text = tailor.iconv.convert(eucjpData, "EUC-JP", "UTF-8");
```

Using Enterprise Aliases

```javascript
// HitachiKEIS83 (actually IBM290) conversion
const keisData = tailor.iconv.convert("カタカナ", "UTF-8", "HitachiKEIS83");
```

Custom Replacement Characters

```javascript
// Replace unconvertible characters with asterisk
const asciiText = tailor.iconv.convert("Hello 世界!", "UTF-8", "ASCII//TRANSLIT:*");
// Result: "Hello **!"

// Custom string replacement
const asciiText2 = tailor.iconv.convert("Test 日本語", "UTF-8", "ASCII//TRANSLIT:[?]");
// Result: "Test [?][?][?]"

// Underscore replacement
const asciiText3 = tailor.iconv.convert("abc世界xyz", "UTF-8", "ASCII//TRANSLIT:_");
// Result: "abc__xyz"
```
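To reason about these results outside the Function service, for example in a local unit test, the replacement behaviour for an ASCII target can be approximated in plain JavaScript. This is a rough model under the assumption that every code point above 127 is replaced exactly once; it is not how tailor.iconv is implemented, and `modelAsciiReplacement` is a hypothetical name.

```javascript
// Local model only: replace every code point outside ASCII (0-127)
// with the given replacement string. Not the tailor.iconv implementation.
function modelAsciiReplacement(text, replacement = "?") {
  let out = "";
  // for...of iterates by code point, so a surrogate pair is replaced once.
  for (const ch of text) {
    out += ch.codePointAt(0) <= 0x7f ? ch : replacement;
  }
  return out;
}

console.log(modelAsciiReplacement("Hello 世界!", "*"));  // Hello **!
console.log(modelAsciiReplacement("Test 日本語", "[?]")); // Test [?][?][?]
console.log(modelAsciiReplacement("abc世界xyz", "_"));    // abc__xyz
```

The model reproduces the three documented results above. It makes no attempt to cover targets other than ASCII.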

TRANSLIT with Special Characters

```javascript
// Convert special characters using TRANSLIT
const specialText = "abc ß α € àḃç";

// Default TRANSLIT (uses ? for unconvertible characters)
const result1 = tailor.iconv.convert(specialText, "UTF-8", "ASCII//TRANSLIT");
// Result: "abc ? ? ? ???"

// Custom replacement with asterisk
const result2 = tailor.iconv.convert(specialText, "UTF-8", "ASCII//TRANSLIT:*");
// Result: "abc * * * ***"

// Custom replacement with underscore
const result3 = tailor.iconv.convert(specialText, "UTF-8", "ASCII//TRANSLIT:_");
// Result: "abc _ _ _ ___"
```

Checking Available Encodings

```javascript
// Get all Japanese-related encodings
const allEncodings = tailor.iconv.encodings();
// 943 is matched on its own so that CP949 (Korean UHC) is not picked up
const jpEncodings = allEncodings.filter((enc) =>
  enc.match(/JP|JIS|Shift|KEIS|93[029]|943|EBCDIC.*JP/i),
);
```
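The same list can be used to check for a single encoding before attempting a conversion. The sketch below assumes that tailor.iconv.encodings() returns an array of strings, which the filter example implies, and that comparing names case-insensitively is acceptable; `findEncoding` and the sample list are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper: look up a name in a list such as the one returned by
// tailor.iconv.encodings(). Returns the listed spelling, or null if absent.
// Case-insensitive matching is an assumption, not documented API behaviour.
function findEncoding(available, wanted) {
  const target = wanted.toLowerCase();
  return available.find((enc) => enc.toLowerCase() === target) || null;
}

// Sample list for illustration only; in the Function service the list
// would come from tailor.iconv.encodings().
const sampleEncodings = ["UTF-8", "Shift_JIS", "EUC-JP", "IBM930"];

console.log(findEncoding(sampleEncodings, "shift_jis")); // Shift_JIS
console.log(findEncoding(sampleEncodings, "KOI8-R"));    // null
```

Passing the list in as a parameter keeps the helper testable outside the service.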

Error Handling

The tailor.iconv API supports special flags for handling characters that cannot be converted:

  • //IGNORE: Silently ignores characters that cannot be converted
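  • //TRANSLIT: Attempts to transliterate characters to similar ones in the target encoding; characters that still cannot be converted are replaced with ? by default
  • //TRANSLIT:char: Extension that replaces each unconvertible character with the given character or string instead of ?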
    • Example: ASCII//TRANSLIT:* replaces unconvertible characters with *
    • Example: ASCII//TRANSLIT:[?] replaces unconvertible characters with [?]
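The flags above are appended to the target encoding name, so building that string in one place can reduce typos. The following is a minimal sketch; `targetWithFlag` and its option names are hypothetical, and it emits at most one flag because this page does not say whether flags can be combined.

```javascript
// Hypothetical helper: build a target-encoding string carrying one of the
// error-handling flags described above. Not part of tailor.iconv.
function targetWithFlag(encoding, options = {}) {
  if (options.ignore) {
    return `${encoding}//IGNORE`;
  }
  if (options.replacement !== undefined) {
    return `${encoding}//TRANSLIT:${options.replacement}`;
  }
  if (options.translit) {
    return `${encoding}//TRANSLIT`;
  }
  return encoding; // no flag requested
}

console.log(targetWithFlag("ASCII", { ignore: true }));        // ASCII//IGNORE
console.log(targetWithFlag("ASCII", { translit: true }));      // ASCII//TRANSLIT
console.log(targetWithFlag("ASCII", { replacement: "[?]" }));  // ASCII//TRANSLIT:[?]
```

The resulting string would be passed as the third argument to tailor.iconv.convert, in the same position as "ASCII//TRANSLIT:*" in the examples on this page.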

Best Practices

  1. Encoding Selection

    • For text with kanji: Use UTF-8, Shift_JIS, EUC-JP, or IBM930/939
    • For katakana-only text: IBM290 is also available
    • For email transmission: ISO-2022-JP is recommended
  2. Preventing Character Corruption

    • Use the same encoding for both sender and receiver
    • Specify the source encoding accurately
  3. Error Handling

    • Use //IGNORE to skip unconvertible characters
    • Use //TRANSLIT for character substitution
    • Use custom replacement characters for specific requirements
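The encoding selection guidance can be sketched as a simple check on the text itself. This is a heuristic under stated assumptions: text made up only of ASCII and katakana (including the prolonged sound mark ー) is treated as "katakana-only", and the exact repertoire that IBM290 accepts is not verified here. `candidateEncodings` and the two constants are hypothetical names.

```javascript
// Encodings this page recommends for text that contains kanji.
const KANJI_CAPABLE = ["UTF-8", "Shift_JIS", "EUC-JP", "IBM930", "IBM939"];

// Heuristic: only ASCII, katakana, and the prolonged sound mark (U+30FC).
const ASCII_OR_KATAKANA = /^[\x00-\x7F\p{Script=Katakana}\u30FC]*$/u;

// Hypothetical helper: list candidate target encodings for a piece of text,
// following the "Encoding Selection" guidance above.
function candidateEncodings(text) {
  return ASCII_OR_KATAKANA.test(text)
    ? [...KANJI_CAPABLE, "IBM290"] // katakana-only: IBM290 is also available
    : [...KANJI_CAPABLE];
}

console.log(candidateEncodings("カタカナ ABC").includes("IBM290"));   // true
console.log(candidateEncodings("日本語テキスト").includes("IBM290")); // false
```

A check like this only narrows the choice. The encoding that the receiving system actually expects still takes priority, in line with the guidance on preventing character corruption.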