Character encoding methods

Character encoding methods arose with the advent of computers, when the need emerged to represent non-numeric values in binary code.

For encoding symbols, a method was proposed that was later widely adapted for sounds and images. The set of symbols that a computer system can input and display is called the alphabet of that system. It includes Arabic numerals, letters of the Latin alphabet, punctuation marks, special symbols and signs, letters of the national alphabet, and pseudo-graphic symbols: shaded blocks, rectangles, single and double frames, arrows. Initially, 1 byte (8 bits) was assigned to each character, which made it possible to encode an alphabet of 2⁸ = 256 different characters. This gave rise to the code table, a system in which each character of the alphabet is assigned a unique code. However, different computer manufacturers created their own code tables for the same symbols, so characters typed using one code table were displayed incorrectly when read with another. To address this, a standard code table called ASCII (American Standard Code for Information Interchange) was adopted in the United States; it was first published in 1963 by the American Standards Association, the predecessor of ANSI. This table was used by programs running under the MS-DOS operating system, among others, and acquired international status.
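The idea of a code table can be sketched in a few lines of Python. The table below is a made-up four-character alphabet (the names toy_table, toy_encode and toy_decode are assumptions for illustration, not any historical table); it only shows that each character gets a unique code and that one byte leaves room for 256 such codes.

```python
# One byte has 8 bits, so it can hold 2**8 distinct codes.
BITS_PER_BYTE = 8
CODES_PER_BYTE = 2 ** BITS_PER_BYTE  # 256

# Hypothetical toy code table: every character has exactly one code.
toy_table = {"A": 0, "B": 1, "C": 2, " ": 3}
# Reverse table for decoding: code -> character.
toy_reverse = {code: ch for ch, code in toy_table.items()}


def toy_encode(text: str) -> bytes:
    """Replace each character by its code, one byte per character."""
    return bytes(toy_table[ch] for ch in text)


def toy_decode(data: bytes) -> str:
    """Look each byte up in the reverse table to recover the text."""
    return "".join(toy_reverse[code] for code in data)


if __name__ == "__main__":
    encoded = toy_encode("CAB A")
    print(list(encoded))        # the numeric codes, one per character
    print(toy_decode(encoded))  # the original text again
```

A character that is missing from the table raises KeyError here, which mirrors the real limitation: a code table can represent only the characters it lists.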

A code table based on ASCII, as it is usually presented, contains 256 characters and their codes. The table consists of two parts: basic and extended. The basic part (symbols with codes from 0 to 127 inclusive) is ASCII proper, a 7-bit code; it is fixed by the adopted standard and cannot be changed. It includes control characters (codes 0 to 31 and code 127), Arabic numerals, letters of the Latin alphabet, punctuation marks, and special symbols.
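The split of the table into ranges can be illustrated with a small Python sketch. The function name classify_code is hypothetical; the ranges follow the ASCII standard, in which codes 0 to 31 and code 127 are control characters and codes 32 to 126 are printable.

```python
def classify_code(code: int) -> str:
    """Say which part of a 256-entry code table a code belongs to."""
    if not 0 <= code <= 255:
        raise ValueError("a one-byte code must be in the range 0..255")
    if code >= 128:
        return "extended"   # depends on the chosen national table
    if code < 32 or code == 127:
        return "control"    # e.g. line feed (10), escape (27), delete (127)
    return "printable"      # digits, Latin letters, punctuation, space


if __name__ == "__main__":
    # ord() gives the code of a character; for the basic part these
    # codes are the ASCII codes.
    for ch in "A", "a", "0", "\n":
        print(repr(ch), ord(ch), classify_code(ord(ch)))
```

Because the basic part is fixed, the letter A has code 65 in every table that extends ASCII; only the codes from 128 upward change meaning between tables.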

The extended part (symbols with codes from 128 to 255) is given over to national alphabets, pseudo-graphic symbols, and some special symbols. It is not defined by ASCII itself: its contents are set by separate standards and vendor code pages, and they vary depending on the national alphabet of the country in which the table is used and on the particular encoding chosen. The Windows operating system supports a large number of extended tables for various national alphabets. The most common Windows code table for the Russian alphabet is Cyrillic Windows-1251.
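How the same extended code changes meaning from one table to another can be seen with Python's built-in codecs. This is a minimal sketch: the helper names decode_byte and reinterpret are hypothetical, while "cp1251", "cp1252" and "cp866" are codec names shipped with Python for Windows-1251, Windows-1252 and the MS-DOS Cyrillic code page 866.

```python
def decode_byte(value: int, table: str) -> str:
    """Return the character that one byte value stands for in a code table."""
    return bytes([value]).decode(table)


def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one code table, then read the bytes with another."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    # Code 0xC0 lies in the extended part, so its meaning depends on the table.
    print(decode_byte(0xC0, "cp1251"))  # Cyrillic capital letter A
    print(decode_byte(0xC0, "cp1252"))  # Latin capital letter A with grave
    # Code 0x41 lies in the basic part and is the same everywhere.
    print(decode_byte(0x41, "cp1251"), decode_byte(0x41, "cp1252"))
    # A Russian word saved as Windows-1251 and opened as code page 866
    # keeps its length but turns into unrelated symbols.
    word = "\u041f\u0440\u0438\u0432\u0435\u0442"
    print(word, "->", reinterpret(word, "cp1251", "cp866"))
```

The bytes on disk are identical in both readings; only the table used to interpret them differs, which is why text typed under one table is displayed incorrectly under another.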

Code tables of 256 codes were clearly insufficient for some Asian countries, whose writing systems use thousands of characters. Therefore, in 1991 a single standard appeared, called Unicode and originally built on a 16-bit coding scheme. Sixteen bits allow 2¹⁶ = 65,536 characters to be encoded, which at the time was considered enough to cover all the national alphabets in one table; the standard has since been extended beyond this limit. Since each symbol in the 16-bit form occupies two bytes (instead of one, as before), a text document stored this way is twice as long as the same document in a one-byte code table.
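The arithmetic in this paragraph can be checked directly in Python. In this sketch the codec "utf-16-be" is used as a stand-in for the two-byte form described above, which is an assumption that holds for characters within the first 65,536 codes; encoded_size is a hypothetical helper name.

```python
# Number of distinct codes available with 16 bits.
CODES_IN_16_BITS = 2 ** 16  # 65536


def encoded_size(text: str, encoding: str) -> int:
    """Return how many bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


if __name__ == "__main__":
    text = "Hello"
    one_byte = encoded_size(text, "ascii")      # one byte per character
    two_byte = encoded_size(text, "utf-16-be")  # two bytes per character
    print(CODES_IN_16_BITS, one_byte, two_byte)
    # Each two-byte code is the old one-byte code padded with a zero byte.
    print("A".encode("utf-16-be"))
```

For text made up of basic Latin characters the two-byte form is exactly twice the size of the one-byte form, which is the doubling the paragraph describes.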
