
Common Text Encoding Schemes 101 | Understanding How Computers Read Text | ASCII to Unicode Evolution, Simplified

Understanding text encoding and common text encoding schemes helps to explain how computers store, process, and display text across various languages and platforms.

Introduction

Have you ever wondered how your computer knows what letters you are using? To you, the words feel natural when you type a message to a friend, name a file, or search for something online. But to a computer, everything is just numbers — a world of 0s and 1s.

So, how does it read text? That is where something magical called text encoding comes in.

What is Text Encoding?

Text encoding is like a secret language between humans and machines. It turns each letter, symbol, or emoji you type into a special number — a code that computers can understand and process.

Every “M”, “$”, or “😊” becomes a series of 0s and 1s (like a digital Morse code). This is how your messages travel, get stored, and show up perfectly on your screen.
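To make this concrete, here is a minimal Python sketch of the idea (Python is just one convenient way to peek at these numbers; the exact bytes that get stored depend on the encoding scheme, covered below):

```python
# Each character has a numeric code; that number is what is ultimately
# stored and transmitted as 0s and 1s.
for ch in ["M", "$", "😊"]:
    code = ord(ch)                      # the character's numeric code (Unicode code point)
    print(ch, code, format(code, "b"))  # e.g. M 77 1001101
```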

Types of Text Encoding

Text encoding has evolved and can be classified as follows:

(Tree diagram: the five common text encoding schemes, from ASCII to Unicode.)

ASCII

ASCII stands for American Standard Code for Information Interchange. It is the grandparent of modern text encoding: the classic, old-school scheme.

It uses 7 bits (just 0s and 1s) to represent 128 characters. The characters include:

  • digits 0–9
  • symbols (like @ and #)
  • uppercase and lowercase letters (A–Z, a–z)
  • control codes (like a line break)

Long before emojis and worldwide languages appeared on screens, there was ASCII.

Example

The word “Peace!” in ASCII is as follows:

Character | ASCII (Decimal) | ASCII (Hex) | ASCII (Binary)
P | 80 | 50 | 01010000
e | 101 | 65 | 01100101
a | 97 | 61 | 01100001
c | 99 | 63 | 01100011
e | 101 | 65 | 01100101
! | 33 | 21 | 00100001

With ASCII, early computers could speak English — at least a basic version of it.
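If you want to reproduce the table above, a short Python sketch like the following prints the decimal, hex, and binary ASCII codes for each character of “Peace!”:

```python
# Print the ASCII value of each character of "Peace!" in decimal, hex, and binary.
word = "Peace!"
print("Char  Dec  Hex  Binary")
for ch in word:
    code = ord(ch)                       # ASCII value (0-127 for plain ASCII text)
    print(f"{ch:<6}{code:<5}{code:02X}   {code:08b}")
```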

Extended ASCII

As people wanted to write in more languages, the need for more characters arose. This led to the development of Extended ASCII.

Extended ASCII used 8 bits, doubling the capacity to 256 characters. It allowed for additional accented letters (like é, ñ, ü, ç, ø), symbols (like ©, €, ¥, §, ¿, ¡), and various European characters (like ß, æ, Ø, Å, Þ).

Example

The string “Péã©€¡” in Extended ASCII (such as Windows-1252 or ISO 8859-1) is represented as:

Character | Extended ASCII (Decimal) | Extended ASCII (Hex) | Extended ASCII (Binary)
P | 80 | 50 | 01010000
é | 233 | E9 | 11101001
ã | 227 | E3 | 11100011
© | 169 | A9 | 10101001
€ | 128 | 80 | 10000000
¡ | 161 | A1 | 10100001

Note:

The euro sign (€) is not present in older ISO 8859-1 but is included in Windows-1252, a popular Extended ASCII variant used by Microsoft.
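As a quick illustration, here is a small Python sketch that encodes the sample string with Windows-1252 (Python calls this codec cp1252) and prints one byte per character; plain 7-bit ASCII would refuse to encode it:

```python
# Extended ASCII in practice: one byte (0-255) per character under Windows-1252.
text = "Péã©€¡"
data = text.encode("cp1252")            # cp1252 = Windows-1252
for ch, byte in zip(text, data):
    print(ch, byte, format(byte, "02X"), format(byte, "08b"))

# The same string cannot be encoded as plain 7-bit ASCII:
# text.encode("ascii")  -> raises UnicodeEncodeError
```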

A world with hundreds of languages cannot fit into 256 characters. This is where Unicode and its UTF encodings come into play.

Unicode

What if you want to write in Chinese, Arabic, Greek, or Cyrillic scripts, or use emojis? ASCII will not help. That is why we now use Unicode.

Unicode can be defined as:

“a universal codebook that can represent over a million characters from every known writing system on Earth.”

Unicode comes in different flavours — or “encodings” — depending on how you want to store or transmit the data.

Unicode is the ‘Global Translator’.

Classification of Unicode

The most common encodings are:

  • UTF-8 (most popular on the web)
  • UTF-16 (used in Windows, Java, etc.)
  • UTF-32 (fixed-width, less efficient but simple)

UTF-8

UTF-8 stands for Unicode Transformation Format, 8-bit. It is perfect for websites, apps, and global communication. That is why it dominates online.

Main features
  • It is smart and flexible.
  • It uses 1 to 4 bytes per character.
  • It is backward compatible with ASCII, so older systems still work.

UTF-8 is the Most Popular Kid on the Block.
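A small sketch of those two properties, assuming Python 3.8+ for bytes.hex() with a separator: ASCII characters keep their original single-byte values, while other characters simply use more bytes.

```python
# UTF-8 is variable-width (1-4 bytes per character) and ASCII-compatible.
for ch in ["A", "é", "字", "😊"]:
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), encoded.hex(" "))
# A 1 41                <- same byte value as ASCII "A"
# é 2 c3 a9
# 字 3 e5 ad 97
# 😊 4 f0 9f 98 8a
```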

UTF-16

UTF-16 balances space efficiency and character coverage for global scripts. It is not ASCII-compatible, so mixing it with legacy systems may require conversion.

Main Features
  • It can handle all characters.
  • It uses 2 or 4 bytes per character.
  • It is common in Windows, Java, and some document formats.

UTF-16 is like a middle ground.
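For a concrete view, here is a minimal sketch using Python's utf-16-be codec (big-endian, no byte-order mark): common characters take 2 bytes, while characters outside the Basic Multilingual Plane, such as emoji, take 4 bytes as a surrogate pair.

```python
# UTF-16: 2 bytes for most characters, 4 bytes (a surrogate pair) for the rest.
for ch in ["A", "ف", "😊"]:
    encoded = ch.encode("utf-16-be")    # big-endian, no byte-order mark (BOM)
    print(ch, len(encoded), encoded.hex(" "))
# A 2 00 41
# ف 2 06 41
# 😊 4 d8 3d de 0a
```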

UTF-32

UTF-32 is super simple but memory-hungry. It wastes space, especially for text-heavy files or webpages.

Main features
  • Uses exactly 4 bytes per character.
  • Easy for computers to process (fixed size).
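A matching sketch for UTF-32, again using Python's big-endian codec without a byte-order mark: every character occupies exactly 4 bytes, whether it needs them or not.

```python
# UTF-32: a fixed 4 bytes per character, regardless of the character.
for ch in ["A", "字", "😊"]:
    encoded = ch.encode("utf-32-be")    # big-endian, no byte-order mark (BOM)
    print(ch, len(encoded), encoded.hex(" "))
# A 4 00 00 00 41
# 字 4 00 00 5b 57
# 😊 4 00 01 f6 0a
```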

Encoding Comparison Table — “Sky”

Character | S | k | y
Decimal (Unicode) | 83 | 107 | 121
Hexadecimal (Unicode) | 53 | 6B | 79
Unicode Code Point | U+0053 | U+006B | U+0079

Encoding | Bytes per Char | Total Bytes | Byte Values (Hex)
UTF-8 | 1 byte | 3 bytes | 53 6B 79
UTF-16 | 2 bytes | 6 bytes | 00 53 00 6B 00 79
UTF-32 | 4 bytes | 12 bytes | 00 00 00 53 00 00 00 6B 00 00 00 79

Encoding Comparison Table — “فلك” (Falak)

Character | ف | ل | ك
Decimal (Unicode) | 1601 | 1604 | 1603
Hexadecimal (Unicode) | 0641 | 0644 | 0643
Unicode Code Point | U+0641 | U+0644 | U+0643

Encoding | Bytes per Char | Total Bytes | Byte Values (Hex)
UTF-8 | 2 bytes | 6 bytes | D9 81 D9 84 D9 83
UTF-16 | 2 bytes | 6 bytes | 06 41 06 44 06 43
UTF-32 | 4 bytes | 12 bytes | 00 00 06 41 00 00 06 44 00 00 06 43

The differences in hex representations come from:

  • The encoding format of UTF-8/16/32
  • The number of bytes used per character
  • The byte order (big-endian vs little-endian), especially in UTF-16 and UTF-32
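The byte-order point is easy to see in a short sketch: the same word, “Sky”, encoded with big-endian and little-endian UTF-16 contains the same code points, only with the two bytes of each unit swapped (Python's plain utf-16 codec additionally prepends a byte-order mark and uses the machine's native order).

```python
# Same text, same code points, different byte order.
word = "Sky"
print(word.encode("utf-16-be").hex(" "))  # 00 53 00 6b 00 79  (big-endian)
print(word.encode("utf-16-le").hex(" "))  # 53 00 6b 00 79 00  (little-endian)
print(word.encode("utf-16").hex(" "))     # BOM first (ff fe or fe ff), then native order
```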

Encode Your Name

Want to see your name the way a computer does? We shall use ASCII and Unicode encoding to see how your name is stored and understood by machines.

Example

Name: Ali

ASCII Code (for English Letters)

1. Open MS Word.

2. Type your name: Ali

3. Now use the ASCII method:

  • Place your cursor after the A
  • Hold down Alt and type 65 on the numeric keypad → Word inserts A (ASCII 65)

4. You can check the remaining letters against this chart:

Letter | ASCII (Decimal) | ASCII (Hex) | Binary
A | 65 | 41 | 01000001
l | 108 | 6C | 01101100
i | 105 | 69 | 01101001

Unicode Code Point (for All Languages)

Unicode lets us go beyond English. It is useful for different languages, emojis, math symbols, and more.

1. Open MS Word.

2. Type your name: Ali or علی

3. Select the letter.

4. Press Alt + X

5. Word will convert the character to its Unicode Hex Code:

Character | Language | Unicode Code Point (Hex) | Unicode (Decimal) | Description
A | English | U+0041 | 65 | Latin Capital Letter A
l | English | U+006C | 108 | Latin Small Letter L
i | English | U+0069 | 105 | Latin Small Letter I
ع | Arabic | U+0639 | 1593 | Arabic Letter Ain
ل | Arabic | U+0644 | 1604 | Arabic Letter Lam
ی | Arabic | U+06CC | 1740 | Arabic Letter Farsi Yeh

Note:

  • To check a character's ASCII value, use its decimal code (typed with the Alt key).
  • To check a character's Unicode value, use its hex code (shown by Alt + X).

Try It Yourself!

You can type your name in any language — English, Arabic, Chinese, Japanese, etc. Then use the same steps to see how your name is stored as code by a computer.
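If you would rather skip MS Word, a few lines of Python do the same job; the names below (“Ali” and “علی”) are only samples, so substitute your own.

```python
# Show each character of a name with its Unicode code point and UTF-8 bytes.
for name in ["Ali", "علی"]:
    print(name)
    for ch in name:
        print(f"  {ch}  U+{ord(ch):04X}  UTF-8: {ch.encode('utf-8').hex(' ')}")
```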

Conclusion

Behind every message you send or document you write, there is a hidden system at work. This system turns your words into machine-readable code.

From the simple beginnings of ASCII to the powerful and flexible Unicode, text encoding is what allows computers to store, process, and share human language.

It ensures:

  • Text displays correctly across devices
  • Support for multiple languages
  • Global communication

Frequently Asked Questions (FAQs)

What is encoding?

Encoding is the process of converting characters (like letters, numbers, and symbols) into a format that computers can understand and store, usually as binary code (0s and 1s).

Name common text encoding schemes.

  • ASCII (American Standard Code for Information Interchange)
  • Extended ASCII (Windows-1252, ISO 8859-1, etc.)
  • Unicode (with encodings like UTF-8, UTF-16, UTF-32)

What do you know about the evolution from ASCII to Unicode?

  • ASCII was designed for English and includes only 128 characters (7-bit).
  • Extended ASCII was introduced (8-bit, 256 characters) as global computing grew. But that still was not enough for all languages.
  • Unicode was developed to include every writing system, emoji, and symbol. It supports over 140,000 characters.

What is the primary purpose of the ASCII encoding scheme?

To represent the Basic English characters and control codes (like Enter, Tab, etc.) in binary format so computers can process and display text.

Explain the difference between ASCII and Unicode.

Feature | ASCII | Unicode
Bit Size | 7 bits (standard) | 8–32 bits (UTF-8, UTF-16, etc.)
Characters | 128 (English only) | 1.1+ million possible code points (all languages)
Language Support | English only | Global (Arabic, Chinese, emojis, etc.)
Compatibility | Legacy systems | Modern systems and the web

How does Unicode handle characters from different languages?

It assigns a unique code point (like an ID number) to each character, no matter the language or script.

Example

  • A (English) → U+0041
  • ف  (Arabic) → U+0641
  • 字 (Chinese) → U+5B57
  • 😀 (Emoji) → U+1F600

These code points are stored using UTF encodings.

Explain how characters are encoded using Unicode. Provide examples.

Each character is mapped to a Unicode code point, then stored using an encoding (like UTF-8).

Examples

Character | Language | Unicode Code Point | UTF-8 (Hex Bytes)
A | English | U+0041 | 41
ف | Arabic | U+0641 | D9 81
字 | Chinese | U+5B57 | E5 AD 97
😀 | Emoji | U+1F600 | F0 9F 98 80
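A quick way to verify rows like these yourself is to let Python do the conversion; this sketch prints the code point and UTF-8 bytes for each example character:

```python
# Code point in, UTF-8 bytes out.
for ch in ["A", "ف", "字", "😀"]:
    print(f"{ch}  U+{ord(ch):04X}  {ch.encode('utf-8').hex(' ').upper()}")
# A  U+0041  41
# ف  U+0641  D9 81
# 字  U+5B57  E5 AD 97
# 😀  U+1F600  F0 9F 98 80
```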

What does ASCII stand for?

American Standard Code for Information Interchange

How many bits are used in the standard ASCII encoding?

7 bits!

Standard ASCII uses 7 bits, which can represent 2⁷ = 128 characters.

What is a key advantage of Unicode over ASCII?

It can represent characters from many different languages.
