Summary And Basic Python Operations
ASCII (American Standard Code for Information Interchange) is a standard for representing characters. The ASCII code consists of 128 characters. Seven bits are enough to distinguish 128 codes, but each character is conventionally stored as an 8-bit binary integer, meaning that each integer is specified by 8 binary digits (“bits”). In representing characters, only non-negative integers are used. To recall the previous discussion of binary operations and arithmetic, each bit can only have the value of “0” or “1”. For example, the first few non-negative integers are shown below in their binary representation. For ease of reading, the 8-bit values are shown in groups of four (4) bits.
Integer Binary
0 0000 0000
1 0000 0001
2 0000 0010
3 0000 0011
4 0000 0100
5 0000 0101
6 0000 0110
7 0000 0111
8 0000 1000
9 0000 1001
10 0000 1010
Integers can be written as sums of powers of 2, which are used because binary digits (bits) can only assume two values: 0 or 1. For example, 1 = 2⁰, which means that the 0th (rightmost) bit is 1, with the rest of the bits set to zero. Therefore, 1 is represented in binary as 0000 0001. 2 = 2¹, which means that the 1st (second rightmost) bit is 1, with the remaining bits set to zero. Therefore, 2 is represented in binary as 0000 0010. For the integer 3, 3 = 2 + 1, and, given the representation of 1 and 2, 3 in binary is 0000 0011, or the binary representation of 2 added to the binary representation of 1. The same procedure is used for all other integers, up to 2⁸ – 1 = 255. Although 8 bits give 2⁸ = 256 distinct values, one of them must represent zero (in binary, 0 = 0000 0000), so the largest representable integer is 255.
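The table of small integers can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name to_8bit_groups is an illustrative assumption and does not come from the course materials. It relies on the built-in format function with the "08b" format specification, which pads the binary digits with leading zeros to a width of 8.

```python
def to_8bit_groups(n):
    """Return n (0..255) as 8 binary digits in two groups of four."""
    bits = format(n, "08b")          # e.g. 7 -> '00000111'
    return bits[:4] + " " + bits[4:]  # insert a space for readability

# Print the integers 0 through 10 next to their 8-bit representation.
for n in range(11):
    print(n, to_8bit_groups(n))
```

Running the loop prints one line per integer, in the same layout as the table.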
In the following discussion, it is useful to know the powers of 2, up to 2⁸:
2⁰ = 1
2¹ = 2 (= 2 x 1, or double the previous power of 2)
2² = 4 (= 2 x 2)
2³ = 8 (= 2 x 4)
2⁴ = 16 (= 2 x 8)
2⁵ = 32 (= 2 x 16)
2⁶ = 64 (= 2 x 32)
2⁷ = 128 (= 2 x 64)
2⁸ = 256 (= 2 x 128)
Recall that zero (0) is also represented in binary as 0.
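For readers who already know some Python, the list of powers of 2 can be generated rather than memorized. The sketch below uses the ** (exponentiation) operator; the variable name powers is chosen here purely for illustration.

```python
# Powers of 2 from 2**0 up to 2**8, in increasing order.
powers = [2 ** k for k in range(9)]

for k, value in enumerate(powers):
    print(f"2**{k} = {value}")

# Each power is double the previous one.
doubling_holds = all(powers[k] == 2 * powers[k - 1] for k in range(1, 9))
```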
For example, 14 = 8 + 4 + 2 = 2³ + 2² + 2¹, so its binary representation is 0000 1110. Similarly, 27 = 16 + 8 + 2 + 1 = 2⁴ + 2³ + 2¹ + 2⁰, and therefore, in binary, 27 = 0001 1011. In other words, the 4th, 3rd, 1st, and 0th bits are 1, and the remaining bits of the 8-bit value are zeros.
As a more complex example, consider 159. Using only the powers of 2 (and some practice), it is seen that 159 = 128 (largest power of 2 less than or equal to 159) + 31 (left over, i.e. 159 – 128)
= 128 + 16 (largest power of 2 less than or equal to 31) + 15 (left over)
= 128 + 16 + 8 (largest power of 2 less than or equal to 15) + 7 (left over)
= 128 + 16 + 8 + 4 (largest power of 2 less than or equal to 7) + 3 (left over)
= 128 + 16 + 8 + 4 + 2 (largest power of 2 less than or equal to 3) + 1 (left over)
= 128 + 16 + 8 + 4 + 2 + 1
= 2⁷ + 2⁴ + 2³ + 2² + 2¹ + 2⁰ (from the list of powers of 2 above)
Since the powers of 2 in the above sum are 7, 4, 3, 2, 1, and 0, the 7th, 4th, 3rd, 2nd, 1st, and 0th bits of the 8-bit binary number are 1, and the remaining bits are 0. Therefore, in binary,
159₁₀ = 1001 1111₂. The subscript 10 after 159 indicates that 159 is a decimal, base-10 number. Although this subscript can be omitted because decimal numbers are understood, the subscript is provided simply for clarity. The subscript 2 after the binary number clarifies that 1001 1111 is a binary, base-2 number.
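The step-by-step procedure used for 159 (repeatedly take the largest power of 2 that fits, then continue with what is left over) can be sketched in Python as follows. The function name powers_of_two_in is a hypothetical helper introduced for this illustration; it is not one of the functions in the course script.

```python
def powers_of_two_in(n):
    """Return the powers of 2 that sum to n, largest first (n >= 0)."""
    terms = []
    power = 1
    # Find the largest power of 2 that is less than or equal to n.
    while power * 2 <= n:
        power *= 2
    # Walk down through the powers, subtracting each one that fits.
    while power >= 1 and n > 0:
        if power <= n:
            terms.append(power)
            n -= power          # what is "left over"
        power //= 2
    return terms

terms = powers_of_two_in(159)
# Bit k of the 8-bit value is 1 exactly when 2**k appears in the sum.
bits = "".join("1" if 2 ** k in terms else "0" for k in range(7, -1, -1))
print(terms, bits)
```

For 159 the terms are 128, 16, 8, 4, 2, and 1, which gives the same bit pattern as the manual calculation.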
A binary (base-2) representation can be easily converted to a decimal (base-10) format. One need only identify the positions of the ones in the binary number and add the corresponding powers of 2. For instance, the binary number 0010 1001 can be written in terms of powers of 2 as:
0010 1001₂ = (0 x 2⁷) + (0 x 2⁶) + (1 x 2⁵) + (0 x 2⁴) + (1 x 2³) + (0 x 2²) + (0 x 2¹) + (1 x 2⁰)
= 2⁵ + 2³ + 2⁰ = 32 + 8 + 1
= 41
Another example demonstrates that only bits with the value of 1 need to be considered.
0101 1110₂ = 2⁶ + 2⁴ + 2³ + 2² + 2¹ = 64 + 16 + 8 + 4 + 2 = 94.
Bit positions 7, 5, and 0 are zeros, and since 0 x 2⁷ = 0, 0 x 2⁵ = 0, and 0 x 2⁰ = 0, those positions do not contribute to the final sum.
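The same idea, adding the power of 2 for every bit that is 1 and ignoring the zeros, can be written as a short Python function. This is a sketch for illustration only; the name binary_to_decimal is an assumption and is not taken from the course script.

```python
def binary_to_decimal(bits):
    """Sum the powers of 2 at the positions where the bit is '1'."""
    bits = bits.replace(" ", "")   # allow grouped input such as '0010 1001'
    total = 0
    # reversed() makes position 0 the rightmost bit.
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total

print(binary_to_decimal("0010 1001"))
print(binary_to_decimal("0101 1110"))
```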
(Note: For readers who have no programming experience, the following section can be skipped. Those readers can return to this section after studying the interactive Python tutorials presented in the next course.)
In Python, the bin function converts a base-10 integer into a string containing its binary representation. For example:
>>> bin(108)
'0b1101100'
>>> bin(22)
'0b10110'
The prefix 0b indicates binary. The base-10 number 108 can be fully represented with 7 bits, while 22 requires 5 bits. However, in an 8-bit binary representation, the leading zeros would be written out. For example, 108₁₀ = 0110 1100₂, and 22₁₀ = 0001 0110₂. Using the example of 159 above:
>>> bin(159)
'0b10011111'
which matches the result of the manual conversion above.
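Because bin drops leading zeros, a fixed-width 8-bit string has to be requested explicitly. One possible way, shown here as a sketch, is the built-in format function with the "08b" specification; another is to strip the 0b prefix and pad with the string method zfill. The variable names are illustrative.

```python
# format(n, "08b"): binary digits, zero-padded to a width of 8, no '0b' prefix.
eight_bit_108 = format(108, "08b")
eight_bit_22 = format(22, "08b")

# Equivalent approach: remove the '0b' prefix, then pad on the left with zeros.
eight_bit_22_alt = bin(22)[2:].zfill(8)

print(eight_bit_108, eight_bit_22, eight_bit_22_alt)
```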
The converse operation, converting a binary string to a base-10 integer, can be accomplished with the int (integer) function, specifying the base of the input argument – in this case, 2, because the string represents a binary number. For example,
>>> int('10110100', 2)
180
Using the two examples above, the Python operations are:
>>> int('00101001', 2)
41
>>> int('01011110', 2)
94
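Since bin and int(..., 2) are converse operations, applying one after the other should return the original value. The short check below is a sketch that verifies this for every 8-bit integer; note that int accepts the 0b prefix when the base is given as 2.

```python
# Convert each 8-bit value to a binary string and back again.
round_trip_ok = all(int(bin(n), 2) == n for n in range(256))

# The '0b' prefix produced by bin() is accepted by int() when base=2.
value = int("0b101001", 2)

print(round_trip_ok, value)
```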
NOTE:
For those interested in the workings of the algorithms for the binary-to-base-10 and base-10-to-binary conversions, the Python script, binary_decimal_conversions.py, contains two functions for base-10 to binary: convert_decimal_to_8bit_binary and convert_decimal_to_8bit_binary_powers. The first function is a straightforward implementation, as demonstrated in the discussion above, while the second function uses powers of 2, and is somewhat more mathematically complex. However, the reader is encouraged to study these implementations, try them, and possibly modify them.
The script also contains two functions for converting an 8-character binary string to a base-10 integer: convert_8bit_binary_to_decimal and convert_8bit_binary_to_decimal_logarithms. The first function is a straightforward implementation, while the second function uses base-2 logarithms and is mathematically more complex.
A sample session is shown below.
>>> b0 = convert_decimal_to_8bit_binary(159)
>>> b1 = convert_decimal_to_8bit_binary_powers(159)
>>> b0
'10011111'
>>> b1
'10011111'
>>> b0 == b1
True
>>> n0 = convert_8bit_binary_to_decimal('00101001')
>>> n1 = convert_8bit_binary_to_decimal_logarithms('00101001')
>>> n0
41
>>> n1
41
>>> n0 == n1
True
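The script itself is not reproduced here, so the following is not its code. It is only a hedged sketch of one common way to convert a base-10 integer to an 8-bit binary string, namely repeated division by 2, under the hypothetical name decimal_to_8bit_binary_sketch. The functions in binary_decimal_conversions.py may well be implemented differently.

```python
def decimal_to_8bit_binary_sketch(n):
    """Illustrative only: build the 8-bit string by repeated division by 2."""
    if not 0 <= n <= 255:
        raise ValueError("n must be in the range 0..255")
    digits = []
    for _ in range(8):
        digits.append(str(n % 2))  # the remainder is the current rightmost bit
        n //= 2                    # drop that bit and continue
    # The bits were collected right to left, so reverse them.
    return "".join(reversed(digits))

print(decimal_to_8bit_binary_sketch(159))
```

The result can be compared against format(n, "08b") to confirm that the sketch behaves as intended.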
Recall that the base-2 logarithm is the inverse of the corresponding power of 2. That is, log₂(2ⁿ) = n, and 2^(log₂(m)) = m.
For example, if n = 5, then 2ⁿ = 2⁵ = 32, and log₂(32) = 5 = n. Furthermore, if m = 128, then log₂(m) = log₂(128) = 7 (because 2⁷ = 128), and 2^(log₂(128)) = 2⁷ = 128 = m.
This can be seen by experimenting with the corresponding Python functions, assuming that the NumPy numerical library has been imported under the name np (import numpy as np).
>>> n = 5
>>> m = 2 ** n
>>> m
32
>>> np.log2(m)
5.0
>>> np.log2(m) == n
True
>>> m = 128
>>> n = np.log2(m)
>>> n
7.0
>>> 2 ** np.log2(m)
128.0
>>> 2 ** np.log2(m) == m
True
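One way the base-2 logarithm can help with conversions is in locating the largest power of 2 that fits in a number: rounding log₂(n) down gives the position of the leftmost 1 bit. The sketch below uses math.log2 from the standard library instead of NumPy so that it runs without extra packages; the function name highest_power_of_two is a hypothetical helper, and this is not claimed to be how the course script uses logarithms.

```python
import math

def highest_power_of_two(n):
    """Return k such that 2**k is the largest power of 2 <= n (for n >= 1)."""
    return math.floor(math.log2(n))

k = highest_power_of_two(159)
print(k, 2 ** k)   # exponent and the corresponding power of 2

# For 8-bit values this agrees with the integer method bit_length().
agrees = all(highest_power_of_two(n) == n.bit_length() - 1 for n in range(1, 256))
```

For 159 the exponent is 7, matching the first step of the manual decomposition (159 = 128 + 31).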
Returning to the discussion of ASCII codes, as stated above, each ASCII character is stored as an 8-bit binary number. ASCII tables can be found in a variety of places, including the Internet. However, some basic codes are worth pointing out.
Character ASCII CODE
A (upper-case) 65
a (lower-case) 97
0 (“zero” character) 48
Escape key 27
Space character 32
* (asterisk) 42
. (period) 46
/ (forward slash) 47
: (colon) 58
\ (backward slash) 92
~ (tilde) 126
Note that the codes for the alphabetic and numeric characters can be determined from the codes provided above. For example, B follows A, and therefore, the ASCII code for B is 66. Consequently, the ASCII code for C is 67. Following this procedure, the ASCII code for G is 71. Because the ASCII code for a (lower-case “A”) is 97, the code for b is 98, the code for c is 99, and the code for g is 103. Similarly, since 1 follows 0 numerically, the ASCII code for the character 1 is 49, for the character 2, 50, and for the character 9, the ASCII code is 57. The ASCII codes for other characters can be found in widely available tables.
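The counting procedure described above can be sketched in Python by starting from the three base codes (65 for A, 97 for a, 48 for the character 0) and adding each character's offset. The dictionary names are illustrative assumptions; string.ascii_uppercase, string.ascii_lowercase, and string.digits are constants from the standard library.

```python
import string

# Offset of each character from the first one in its group, added to the base code.
upper = {ch: 65 + i for i, ch in enumerate(string.ascii_uppercase)}
lower = {ch: 97 + i for i, ch in enumerate(string.ascii_lowercase)}
digits = {ch: 48 + i for i, ch in enumerate(string.digits)}

print(upper["G"], lower["g"], digits["9"])

# Cross-check the computed codes against Python's built-in ord function.
matches_ord = all(ord(ch) == code
                  for table in (upper, lower, digits)
                  for ch, code in table.items())
```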
In the popular Python language, which is employed extensively in the digital humanities, the ASCII code for a character can be found with the ord function. For example:
>>> ord('.')
46
>>> ord('~')
126
>>> ord('G')
71
>>> ord('g')
103
>>> ord('c')
99
>>> ord('9')
57
>>> ord('h')
104
>>> ord('H')
72
>>> ord('&')
38
>>> ord('"')
34
>>> ord('\'')
39
>>> ord('\\')
92
Note the last two ord operations. Because the single quote (') delimits the string in these examples, a single-quote character inside the string must be preceded by the special backslash character (\) so that it is not read as the end of the string. The backslash is itself the escape character, so to determine the ASCII code for the backslash, it too must be preceded by a backslash.
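As a brief illustration of the escaping rule, Python also accepts double quotes as string delimiters, in which case the single quote needs no backslash. The sketch below compares the two spellings; the variable names are illustrative.

```python
# Two ways to write a string holding one single-quote character.
escaped = ord('\'')      # single-quote delimiters: escape needed
unescaped = ord("'")     # double-quote delimiters: no escape needed

# '\\' is ONE character (a backslash), written with an escape.
backslash_length = len('\\')
backslash_code = ord('\\')

print(escaped, unescaped, backslash_length, backslash_code)
```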
In the converse procedure, i.e., to determine the character from its ASCII code, the chr function is used in Python. For example:
>>> chr(65)
'A'
>>> chr(48)
'0'
>>> chr(32)
' '
>>> chr(111)
'o'
>>> chr(55)
'7'
>>> chr(40)
'('
>>> chr(120)
'x'
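The ord and chr functions can be combined with the 8-bit binary formatting discussed earlier to show how a short piece of text is stored. The following is a minimal sketch; the sample text and variable names are chosen only for illustration.

```python
text = "Hello"

# Character -> ASCII code -> 8-bit binary string.
codes = [ord(ch) for ch in text]
binary = [format(code, "08b") for code in codes]

# Going back: binary string -> integer -> character.
restored = "".join(chr(int(b, 2)) for b in binary)

for ch, code, b in zip(text, codes, binary):
    print(ch, code, b)
```

The restored string equals the original text, which confirms that ord and chr are converse operations for these characters.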
However, there are only 128 standard ASCII codes. Codes for integers in the range 128 to 255 are not standardized. Most written languages, however, require additional characters, such as accented letters, and many do not use Roman script at all. Consequently, a different encoding scheme is required. Unicode is now the accepted standard; it defines the UTF-8, UTF-16, and UTF-32 Unicode Transformation Formats, as well as other encodings. Lists of Unicode characters are readily accessible.
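A small experiment can make the difference between ASCII and Unicode concrete. In Python 3, ord returns the Unicode code point of a character, which coincides with the ASCII code for the 128 ASCII characters. The sketch below uses the accented letter é (code point 233, outside the ASCII range) and the str.encode method to show how many bytes each encoding uses; the variable names are illustrative.

```python
ch = "\u00e9"                      # the letter e with an acute accent

code_point = ord(ch)               # beyond the 128 ASCII codes
utf8_bytes = ch.encode("utf-8")    # two bytes in UTF-8
utf16_len = len(ch.encode("utf-16-le"))   # two bytes in UTF-16
utf32_len = len(ch.encode("utf-32-le"))   # four bytes in UTF-32

# An ASCII character keeps its single-byte code in UTF-8.
ascii_in_utf8 = "A".encode("utf-8")

# Trying to encode the accented letter as ASCII fails.
try:
    ch.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(code_point, utf8_bytes, utf16_len, utf32_len, ascii_in_utf8, ascii_failed)
```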