Character Codes: ASCII, Unicode, UTF-8, Shift_JIS Differences Explained for the IT Passport Exam

April 27, 2026

Organizes the basics of character codes (ASCII, Unicode, UTF-8, Shift_JIS, EUC-JP) and the causes of mojibake (garbled text) for the IT Passport exam.

Tags: IT Passport, Technology, Character Codes

What Are Character Codes?

Character codes are rules for representing characters as numbers inside a computer. It is helpful to understand them in two parts: the code system (a mapping table of characters and numbers) and the encoding method (how numbers are converted into byte sequences).
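The short Python sketch below separates the two parts. Python is used for all examples in this article, and the helper `describe` is a hypothetical name chosen for illustration. `ord` returns the number that the Unicode code system assigns to a character, and `str.encode` returns the byte sequence that the UTF-8 encoding method produces from that number.

```python
def describe(ch: str) -> tuple[str, bytes]:
    """Return the Unicode code point (code system) and UTF-8 bytes (encoding method)."""
    code_point = f"U+{ord(ch):04X}"  # number assigned by the code system
    encoded = ch.encode("utf-8")  # byte sequence chosen by the encoding method
    return code_point, encoded


for ch in ["A", "あ"]:
    code_point, encoded = describe(ch)
    print(ch, code_point, encoded.hex(" "))
```

Running this prints `A U+0041 41` and `あ U+3042 e3 81 82`. Each character has one number, but the number of bytes depends on the encoding method.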

Major Character Codes

ASCII

ASCII is a basic code, first published in 1963, that uses 7 bits (128 values) to represent uppercase and lowercase letters, digits, symbols, and control characters. It cannot represent Japanese characters.
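The 7-bit limit can be checked directly. The minimal sketch below uses a hypothetical helper, `is_ascii_text`, which tests whether every character's number falls in the range 0 to 127. It then shows that Python's built-in `ascii` codec refuses Japanese text.

```python
def is_ascii_text(text: str) -> bool:
    """True if every character fits in ASCII's 7-bit range (0-127)."""
    return all(ord(ch) < 128 for ch in text)


print(ord("A"), ord("a"), ord("0"))  # 65 97 48
print(is_ascii_text("IT Passport"))  # True
print(is_ascii_text("日本語"))  # False

try:
    "日本語".encode("ascii")
except UnicodeEncodeError as err:
    # Japanese characters have numbers far above 127, so encoding fails
    print("cannot encode:", err.reason)
```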

JIS Code Systems

The JIS code systems are Japanese codes based on the Japanese Industrial Standards. JIS X 0208 defines 6,879 characters, including 6,355 kanji. Common encoding methods include Shift_JIS (SJIS), widely used on Windows, and EUC-JP, traditionally used on UNIX systems.
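The same JIS character turns into different bytes depending on the encoding method. This sketch uses the hypothetical helper `jis_bytes` and relies on Python's standard `shift_jis` and `euc_jp` codecs to encode one character both ways.

```python
def jis_bytes(ch: str) -> dict[str, str]:
    """Show the bytes of one character under two JIS-family encodings."""
    return {
        "shift_jis": ch.encode("shift_jis").hex(" "),  # common on Windows
        "euc_jp": ch.encode("euc_jp").hex(" "),  # common on UNIX systems
    }


for ch in ["A", "あ"]:
    print(ch, jis_bytes(ch))
```

For あ this gives `82 a0` in Shift_JIS and `a4 a2` in EUC-JP. ASCII letters stay single bytes in both. Windows in practice uses cp932, Microsoft's extension of Shift_JIS, which Python also provides as the `cp932` codec.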

Unicode Systems

The Unicode system is an international standard for handling characters from all the world's writing systems in a single, unified way. Representative encoding methods include UTF-8, a variable-length encoding (1 to 4 bytes per character) that is the mainstream standard on the web; UTF-16, which uses 2 bytes for most characters and 4 bytes for supplementary characters; and UTF-32, which always uses 4 bytes.
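The byte counts above can be confirmed by encoding a few characters. In the sketch below, the helper `byte_lengths` is hypothetical. The `-le` (little-endian) codec variants are chosen deliberately, because Python's plain `utf-16` and `utf-32` codecs prepend a BOM, which would inflate the counts.

```python
def byte_lengths(ch: str) -> dict[str, int]:
    """Bytes needed for one character in each Unicode encoding (no BOM)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


for ch in ["A", "é", "あ", "😀"]:
    print(ch, byte_lengths(ch))
```

UTF-8 grows from 1 byte (A) to 4 bytes (😀). UTF-16 stays at 2 bytes until the emoji, which needs 4. UTF-32 is always 4.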

Main Causes of Mojibake (Garbled Text)

The main cause of mojibake is a mismatch between the character code the sender used to write the bytes and the one the receiver assumes when reading them. For example, it occurs when a page written in UTF-8 is interpreted as Shift_JIS. It can be prevented by declaring the character code explicitly, either in the HTTP header (for example, Content-Type: text/html; charset=UTF-8) or with <meta charset="UTF-8"> in the HTML.
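The sketch below first reproduces mojibake by decoding UTF-8 bytes as Shift_JIS. It then avoids the problem by reading the charset declared in a Content-Type value. The helper `decode_with_declared_charset` is hypothetical and uses the standard library's `email.message.Message.get_content_charset` to parse the header value. Falling back to UTF-8 when no charset is declared is an assumption.

```python
from email.message import Message

text = "こんにちは"
utf8_bytes = text.encode("utf-8")  # the sender writes UTF-8

# The receiver wrongly assumes Shift_JIS, so the bytes are grouped at the
# wrong boundaries; errors="replace" avoids an exception on invalid bytes.
garbled = utf8_bytes.decode("shift_jis", errors="replace")
print(garbled)


def decode_with_declared_charset(body: bytes, content_type: str) -> str:
    """Decode bytes using the charset declared in a Content-Type value.

    Falling back to UTF-8 when no charset is declared is an assumption.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or "utf-8"  # lowercased, e.g. "utf-8"
    return body.decode(charset, errors="replace")


print(decode_with_declared_charset(utf8_bytes, "text/html; charset=UTF-8"))
```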

How to Choose a Character Code

Use Case	Recommended Code
International web	UTF-8
Compatibility with existing Windows files	Shift_JIS (maintain compatibility)
General new development	UTF-8
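When existing Shift_JIS files are moved to UTF-8, the usual approach is to decode with the old code and re-encode with the new one. The minimal sketch below assumes the legacy bytes are valid Shift_JIS and uses a hypothetical helper, `sjis_to_utf8`. Files produced on Windows are often cp932, so passing `"cp932"` as the source encoding may be the safer choice.

```python
def sjis_to_utf8(legacy: bytes, source_encoding: str = "shift_jis") -> bytes:
    """Re-encode legacy Japanese bytes as UTF-8.

    Decoding with the default errors="strict" raises UnicodeDecodeError
    instead of silently producing mojibake if the source guess is wrong.
    """
    text = legacy.decode(source_encoding)
    return text.encode("utf-8")


legacy = "文字コード".encode("shift_jis")
converted = sjis_to_utf8(legacy)
print(converted.decode("utf-8"))
```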

Related Terms

BOM (Byte Order Mark)

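The BOM is the character U+FEFF placed at the beginning of a file or data stream. In UTF-16 and UTF-32 it indicates the byte order (big-endian or little-endian). UTF-8 has no byte-order issue, but some tools add a BOM (the bytes EF BB BF) to mark a file as UTF-8 and others do not, so you need to be aware of whether a file is "UTF-8 with BOM" or "UTF-8 without BOM".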
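Python's standard library makes the UTF-8 BOM visible. `codecs.BOM_UTF8` holds the three bytes, and the `utf-8-sig` codec writes a BOM when encoding and strips one when decoding.

```python
import codecs

print(codecs.BOM_UTF8.hex(" "))  # ef bb bf

with_bom = "abc".encode("utf-8-sig")  # "utf-8-sig" writes a BOM first
print(with_bom)  # b'\xef\xbb\xbfabc'

# Plain "utf-8" keeps the BOM as an invisible U+FEFF character...
print(repr(with_bom.decode("utf-8")))  # '\ufeffabc'
# ...while "utf-8-sig" strips it (and also accepts input without a BOM).
print(repr(with_bom.decode("utf-8-sig")))  # 'abc'
```

A stray U+FEFF at the start of text is a common symptom when a file with a BOM is read by a tool that does not expect one.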

Surrogate Pairs

Surrogate pairs are the UTF-16 mechanism for representing characters that do not fit in a single 2-byte unit (code points above U+FFFF, such as many emoji). Such a character is stored as two 2-byte units, for a total of 4 bytes.
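The pair is calculated arithmetically. Subtract 0x10000 from the code point, then split the remaining 20 bits into a high half (added to 0xD800) and a low half (added to 0xDC00). The helper `to_surrogate_pair` is a hypothetical name. Its result is checked against Python's own `utf-16-be` encoder.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000 to U+10FFFF) into UTF-16 units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary character")
    offset = code_point - 0x10000  # 20-bit value
    high = 0xD800 + (offset >> 10)  # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low surrogate
    return high, low


high, low = to_surrogate_pair(ord("😀"))  # U+1F600
print(f"{high:04X} {low:04X}")  # D83D DE00
print("😀".encode("utf-16-be").hex(" "))  # d8 3d de 00
```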

Key Points for the IT Passport Exam

The IT Passport exam asks you to compare the features of ASCII, Shift_JIS, and UTF-8, and tests the purpose of Unicode (unifying the world's characters) and the causes of mojibake.

Typical Past Exam Question Patterns

"Which character code uniformly represents characters from around the world?" → Unicode
"Which character code is the standard for the web?" → UTF-8

Related Terms

Compression (Lossless/Lossy) and JPEG/PNG/MP3
Number Base Conversion (Binary, Hexadecimal, and Logical Operations)
HTTP (How HTTP/HTTPS Works)

Study Tips

Learn the three families (ASCII, the JIS systems, and the Unicode systems); the current mainstream is UTF-8. For exam preparation, it is especially effective to understand that mojibake is caused by a mismatch between the sending and receiving character codes.

Summary

If you grasp the lineage of major character codes, the dominance of UTF-8, and the causes of mojibake, you can score points on related questions. For comprehensive practice on the Technology domain, see the Technology Summary. For a full-length practice test, use the Mock Exam.
