Representing text — ASCII and Unicode
Each character → a binary code. ASCII for English; Unicode for everything.
Computers don't 'understand' text directly. Each character (letter, digit, symbol) is mapped to a numerical CODE, then stored as binary.
ASCII (American Standard Code for Information Interchange). Uses 7 bits per character → 128 codes. Examples:
- 'A' = 65 = 1000001
- 'a' = 97 = 1100001
- '0' = 48 = 0110000
- ' ' (space) = 32 = 0100000
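The character→code→binary mapping in the examples above can be checked with a short sketch. It assumes Python's built-in ord() and format(); the helper name to_ascii_bits is made up for illustration and is not part of any syllabus or library.

```python
def to_ascii_bits(ch: str) -> str:
    """Return the 7-bit binary pattern for a single ASCII character."""
    code = ord(ch)  # ord() gives the character's numerical code
    if code > 127:
        # Codes above 127 do not fit in 7 bits, so they are not ASCII
        raise ValueError(f"{ch!r} is outside the 7-bit ASCII range")
    return format(code, "07b")  # pad with leading zeros to 7 bits


for ch in ["A", "a", "0", " "]:
    print(repr(ch), ord(ch), to_ascii_bits(ch))
# 'A' 65 1000001
# 'a' 97 1100001
# '0' 48 0110000
# ' ' 32 0100000
```

Note that the digit '0' is stored as code 48, not as the number 0: a character code and a numeric value are different things.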
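ASCII covers the English uppercase and lowercase letters, the digits 0-9, basic punctuation, and a set of non-printing control codes (such as tab and newline). Extended ASCII uses 8 bits → 256 codes, adding accented characters and symbols. Several different 8-bit extensions exist, so the meaning of codes 128-255 can vary between systems.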
Limitation. ASCII is English-centric. It can't represent Chinese, Arabic, Hindi, emoji, or thousands of other characters.
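A rough way to see this limitation in practice, assuming Python strings (which hold Unicode characters): any character whose code is 128 or above cannot be stored as 7-bit ASCII. The helper name fits_in_ascii is hypothetical.

```python
def fits_in_ascii(text: str) -> bool:
    """True if every character's code is in the 7-bit range 0-127."""
    return all(ord(ch) < 128 for ch in text)


print(fits_in_ascii("Hello, world!"))  # True
print(fits_in_ascii("caf\u00e9"))      # False: e-acute has code 233

# Trying to store non-ASCII text as ASCII fails outright
try:
    "\u4f60\u597d".encode("ascii")  # two Chinese characters
except UnicodeEncodeError as err:
    print("Cannot encode as ASCII:", err.reason)
```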
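Unicode. The modern standard, designed to cover characters from virtually every writing system, plus symbols and emoji. Each character is assigned a unique code (a code point), and the code space has room for over 1 million of them (1,114,112). Uses up to 32 bits per character; the exact number depends on the encoding. The most common encoding is UTF-8, which uses 8, 16, 24 or 32 bits (1-4 bytes) depending on the character, and is backwards-compatible with ASCII: the first 128 codes are stored as the same single-byte values.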
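The variable width of UTF-8 can be illustrated with a minimal sketch, assuming Python's built-in str.encode(). The sample characters and the helper name utf8_bits are chosen for illustration only.

```python
samples = {
    "A": "Latin capital A",
    "\u00e9": "e with acute accent",
    "\u20ac": "euro sign",
    "\U0001F600": "grinning face emoji",
}


def utf8_bits(ch: str) -> int:
    """Number of bits UTF-8 uses to store one character."""
    return len(ch.encode("utf-8")) * 8


for ch, name in samples.items():
    print(f"U+{ord(ch):04X} {name}: {utf8_bits(ch)} bits")
# U+0041 Latin capital A: 8 bits
# U+00E9 e with acute accent: 16 bits
# U+20AC euro sign: 24 bits
# U+1F600 grinning face emoji: 32 bits

# Backwards compatibility: an ASCII character is stored as the same byte in UTF-8
assert "A".encode("utf-8") == "A".encode("ascii")
```

Text that contains only ASCII characters therefore takes exactly the same bytes in UTF-8 as in ASCII, which is one reason UTF-8 became so widely used.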
Cambridge tip. The mark scheme rewards (a) a precise definition of ASCII (7-bit, 128 characters), (b) at least one character→code example, and (c) the reason Unicode was developed (more characters, covering all scripts).
- ASCII: 7-bit, 128 chars, English-centric.
- Unicode: up to 32-bit, all scripts.
- Each character has a unique numerical code → binary.
- UTF-8 is the dominant Unicode encoding.