Introduction To Computer Organization
This section discusses the basic principles of the organization of computer systems. The discussion will focus on the functional units of computer systems, and how these units are organized in a working computer. However, the details will be presented as levels of abstraction instead of at the level of electronic components, which is not in the scope of this course. Such abstractions also hide unnecessary details (Schneider & Gersting, 2018).
First, the primary components of a computer system will be presented. Modern computers use what is known as the von Neumann architecture, formulated by American mathematician John von Neumann (1903 – 1957). The von Neumann architecture is foundational for the vast majority of computers in use today. Although the details of the architecture are quite complex, it has three main characteristics (Schneider & Gersting, 2018):
- Four subsystems (Schneider & Gersting, 2018):
  - Memory
  - Input/output mechanisms
  - The arithmetic/logic unit (ALU)
  - The control unit, which together with the ALU makes up the central processing unit (CPU), the “brain” of the computer
- The stored program concept
- The sequential execution of instructions
Memory is the functional unit in which data are stored and from which they are retrieved. There are different types of memory access, depending on how memory is searched and how data are retrieved from it. In random access memory (RAM), memory is organized into cells, each identified by a unique address. Any cell can (theoretically) be accessed in the same amount of time, and the values in cells may be read and changed. The cell size, called the memory width, is usually 8 bits. The maximum memory size, or address space, is 2^N cells, where N is the length of the memory address in bits.
Memory supports two basic operations: fetch and store. In a fetch operation, data are retrieved from memory. The fetch is described as nondestructive because the contents of the memory cell are not modified when they are retrieved. In a store operation, data are written to a memory cell, overwriting its previous contents; the operation is therefore termed a destructive store.
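To make the address-space formula and the two memory operations concrete, here is a minimal Python sketch of a RAM. The class name `RAM`, the 16-bit address length, and the 8-bit cell width are illustrative assumptions, not a model of any real memory chip.

```python
class RAM:
    """Toy random access memory: 2**address_bits cells, each cell_width bits wide."""

    def __init__(self, address_bits: int, cell_width: int = 8) -> None:
        self.cell_width = cell_width
        self.size = 2 ** address_bits   # address space: 2^N cells for N-bit addresses
        self.cells = [0] * self.size    # every cell is reachable directly by its address

    def _check_address(self, address: int) -> None:
        if not 0 <= address < self.size:
            raise IndexError(f"address {address} is outside 0..{self.size - 1}")

    def fetch(self, address: int) -> int:
        """Nondestructive fetch: return the cell's value and leave the cell unchanged."""
        self._check_address(address)
        return self.cells[address]

    def store(self, address: int, value: int) -> None:
        """Destructive store: the previous contents of the cell are overwritten."""
        self._check_address(address)
        if not 0 <= value < 2 ** self.cell_width:
            raise ValueError(f"value {value} does not fit in {self.cell_width} bits")
        self.cells[address] = value


mem = RAM(address_bits=16)            # hypothetical 16-bit addresses
print(mem.size)                       # 65536 cells
mem.store(42, 7)
print(mem.fetch(42), mem.fetch(42))   # 7 7 (fetching twice does not alter the cell)
mem.store(42, 9)                      # the old value 7 is gone
print(mem.fetch(42))                  # 9
```

Checking the value against the cell width mirrors the fact that each cell holds only a fixed number of bits.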
Memory access time is the time required to fetch or store data. In modern computers, RAM typically requires 5–10 nanoseconds (5–10 × 10^-9 seconds). Decoding memory addresses and the details of memory access and storage are complex and are beyond the scope of the current discussion.
Although memory access time is quite short (that is, RAM is very fast), RAM speeds have improved more slowly than CPU (central processing unit) speeds. A special-purpose type of memory, known as cache memory, is placed on or very close to the CPU to speed up memory access. Cache memory is much faster than RAM, but it is necessarily much smaller and also more expensive. Cache memory relies on the principle of locality: values close to recently accessed memory locations are likely to be accessed soon, and a value that has recently been used is likely to be used again. For example, if memory address X is accessed to retrieve data, then it is likely that the data at address X + 1 (or X + 2, or X + 3) will be required next. Exploiting this principle, the system can load the values in neighbouring memory locations into the cache and keep recently used values there. The performance gained from locality can be quantified by the cache hit rate, the percentage of accesses for which the needed value is found in the cache (Schneider & Gersting, 2018).
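The effect of locality on the hit rate can be sketched with a tiny simulated cache. This is a minimal illustration under assumed parameters: the cache is fully associative, holds two blocks of four neighbouring addresses each, and evicts the least recently used block. Real caches (direct-mapped, set-associative, multi-level) are organized differently, and the function name `cache_hit_rate` is hypothetical.

```python
from collections import OrderedDict


def cache_hit_rate(addresses, block_size=4, capacity_blocks=2):
    """Return the fraction of accesses served by a tiny cache.

    On a miss, the whole block containing the address (the address and its
    neighbours) is loaded, which exploits spatial locality; the least recently
    used block is evicted when the cache is full, which favours temporal locality.
    """
    addresses = list(addresses)
    if not addresses:
        return 0.0
    cache = OrderedDict()   # block number -> None, least recently used first
    hits = 0
    for address in addresses:
        block = address // block_size
        if block in cache:
            hits += 1
            cache.move_to_end(block)          # mark block as most recently used
        else:
            cache[block] = None               # miss: bring the block in from RAM
            if len(cache) > capacity_blocks:
                cache.popitem(last=False)     # evict the least recently used block
    return hits / len(addresses)


print(cache_hit_rate(range(16)))                 # 0.75: sequential addresses
print(cache_hit_rate([5, 5, 5, 5]))              # 0.75: the same value reused
print(cache_hit_rate([0, 100, 200, 300] * 2))    # 0.0: scattered addresses
```

Sequential and repeated accesses score well because of spatial and temporal locality, while widely scattered addresses defeat this small cache entirely.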
Input/output and mass storage are the next topics to be discussed. Input/output (I/O) connects the CPU to the external world, as well as to other devices. I/O devices are familiar to most computer users. Common input devices include a keyboard, mouse, and microphone. Monitors, printers, audio, and haptics (tactile feedback) are examples of common output devices. Data storage is accomplished through hard drives, solid-state drives (SSDs), digital video disks (DVDs), and flash drives (USB sticks), among others. Also in the realm of I/O, inter-computer communication takes place through computer networks (Schneider & Gersting, 2018).
RAM is a form of volatile memory, whose contents are lost when power is removed, whereas mass storage systems are forms of nonvolatile memory, in which stored data persist. There are many varieties of nonvolatile memory, including direct access storage devices (DASDs) and sequential access storage devices (SASDs), such as magnetic tape. DASDs include hard drives, CDs, DVDs, and any other storage device that contains disks. A disk surface is divided into tracks, concentric rings on the disk surface, and sectors, fixed-size segments of tracks. Sectors are the units of retrieval for disks: a sector is read in its entirety from storage into memory. The time to retrieve data is the sum of the seek time, the time to position the read/write head over the correct track; the latency, the time for the correct sector to rotate under the read/write head; and the transfer time, the time for the contents of the sector to be read into or written from memory. Flash memory is an example of a DASD that contains no disk (Schneider & Gersting, 2018).
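The three components of disk access time can be combined in a short calculation. This sketch assumes the common simplification that, on average, the desired sector is half a revolution away from the head; the drive figures (9 ms seek, 7,200 RPM, 500 sectors per track) are hypothetical values chosen only for illustration.

```python
def average_sector_access_ms(seek_ms: float, rpm: float, sectors_per_track: int) -> float:
    """Average time to read one sector: seek time + latency + transfer time."""
    rotation_ms = 60_000 / rpm                     # one full revolution, in milliseconds
    latency_ms = rotation_ms / 2                   # on average the sector is half a turn away
    transfer_ms = rotation_ms / sectors_per_track  # one sector passes under the head
    return seek_ms + latency_ms + transfer_ms


# Hypothetical drive: 9 ms average seek, 7,200 RPM, 500 sectors per track.
print(round(average_sector_access_ms(9, 7200, 500), 2))   # 13.18
```

Seek time and rotational latency dominate, while transferring a single sector takes only a tiny fraction of a millisecond.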
DASDs and SASDs are orders of magnitude slower than RAM, with access times on the order of microseconds to milliseconds. Recall that RAM is itself much slower than the CPU. Therefore, I/O devices are generally a computational bottleneck, as the CPU “waits” for data from slower I/O devices or from RAM. The I/O controller manages data transfer with slow I/O devices, freeing the processor to do other work while the data are in transit. When the I/O task has completed and the data are ready, the controller sends a special signal, known as an interrupt signal, to the processor. The processor is thereby “interrupted” so that it can process the newly arrived data (Schneider & Gersting, 2018).
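The division of labour between the processor and the I/O controller can be pictured as a step-by-step timeline. This is a highly simplified, hypothetical simulation (the function name `simulate` and the tick-based timing are assumptions): the controller advances a transfer while the processor keeps doing unrelated work, and the processor handles the interrupt only once the controller signals that the data are ready.

```python
def simulate(cpu_work, io_ticks):
    """Toy timeline of a processor and an I/O controller (both hypothetical).

    At every tick the controller advances the transfer by one step while the
    processor executes one unit of unrelated work. When the transfer finishes,
    the controller raises an interrupt, which the processor handles before
    continuing with its own work.
    """
    events = []
    remaining = io_ticks
    interrupt_pending = False
    for work in cpu_work:
        if remaining > 0:                # the controller, not the CPU, does the transfer
            remaining -= 1
            if remaining == 0:
                interrupt_pending = True     # "data are ready" signal
        if interrupt_pending:            # the CPU checks for interrupts between tasks
            events.append("interrupt: process I/O data")
            interrupt_pending = False
        events.append(work)
    return events


print(simulate(["task A", "task B", "task C", "task D"], io_ticks=2))
# ['task A', 'interrupt: process I/O data', 'task B', 'task C', 'task D']
```

The processor never idles while the transfer is in progress; it only diverts attention to the I/O data when the interrupt arrives.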
The arithmetic/logic unit, or ALU, is a component of the CPU. It contains the circuits for arithmetic operations, including addition, subtraction, multiplication, and division, as well as circuits for comparison and logic, such as tests for equality and the AND, OR, and NOT operators. The ALU also contains registers, small, extremely fast dedicated memory units connected directly to its circuits. The data path describes how information flows into and out of the ALU: from registers to circuits, and from circuits back to registers (Schneider & Gersting, 2018).
At a high level of abstraction, the flow of information in a computation can be described as follows. Data arrive in the registers from memory or storage “outside” the ALU. The register values are passed to the ALU circuits, which all compute their results, and a selector signal input to the multiplexor indicates which operation's result is wanted. The multiplexor passes that single result back to a register, from which it can be sent outside the ALU as output.
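The role of the multiplexor can be sketched in a few lines of Python. The selector codes, the register names `R0`–`R2`, and the particular set of operations are hypothetical; the point is only that every circuit produces a result and the selector signal decides which one is kept.

```python
# Hypothetical selector codes for the multiplexor.
ADD, SUB, MUL, EQ, AND, OR = range(6)


def multiplexor(inputs, selector):
    """Pass through exactly one of several inputs, chosen by the selector signal."""
    return inputs[selector]


def alu(a, b, selector):
    # Every circuit computes its result from the register values at the same time...
    outputs = [
        a + b,          # ADD
        a - b,          # SUB
        a * b,          # MUL
        int(a == b),    # EQ: 1 if equal, else 0
        a & b,          # AND (bitwise)
        a | b,          # OR (bitwise)
    ]
    # ...and the multiplexor keeps only the one the control signal asks for.
    return multiplexor(outputs, selector)


registers = {"R0": 6, "R1": 3, "R2": 0}
registers["R2"] = alu(registers["R0"], registers["R1"], ADD)   # result back to a register
print(registers["R2"])                                          # 9
print(alu(6, 3, AND), alu(6, 3, OR), alu(6, 6, EQ))            # 2 7 1
```

Division and the unary NOT are left out of this sketch: computing a division unconditionally would fail whenever the second operand is zero, and NOT takes only one input.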
The control unit implements the stored program characteristic of the von Neumann architecture. Programs are encoded in a binary representation and stored in memory. The control unit fetches (retrieves) encoded instructions from memory, decodes them, and executes them. Each instruction contains an operation code (op code), indicating the operation to be performed, and one or more address fields, indicating which memory addresses or registers contain the data (or further instructions) on which to operate (Schneider & Gersting, 2018).
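An instruction format with an op code and an address field can be illustrated with bit operations. The 16-bit layout below (a 4-bit op code followed by a 12-bit address) is a hypothetical format chosen for simplicity; real instruction sets use many different layouts.

```python
from typing import Tuple

OPCODE_BITS = 4      # hypothetical: 16 possible op codes
ADDRESS_BITS = 12    # hypothetical: addresses 0..4095


def encode(opcode: int, address: int) -> int:
    """Pack an op code and an address field into one 16-bit instruction word."""
    if not 0 <= opcode < 2 ** OPCODE_BITS:
        raise ValueError("op code out of range")
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError("address out of range")
    return (opcode << ADDRESS_BITS) | address


def decode(instruction: int) -> Tuple[int, int]:
    """Split an instruction word back into (op code, address field)."""
    opcode = instruction >> ADDRESS_BITS
    address = instruction & ((1 << ADDRESS_BITS) - 1)
    return opcode, address


word = encode(3, 42)
print(format(word, "016b"))   # 0011000000101010
print(decode(word))           # (3, 42)
```

The first four bits carry the op code and the remaining twelve carry the address, which is exactly what a decoder must separate before execution.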
Instructions encoded in this way form machine language: binary strings that encode instructions executed directly by circuits in the computer hardware. A machine-language program is essentially a sequence of instructions that encodes an algorithm. Each type of computer processor has its own machine language, and the set of instructions that a particular processor implements is called its instruction set. Two of the main categories of computers, by instruction set, are reduced instruction set computers, or RISC machines, and complex instruction set computers, or CISC machines. RISC systems have a relatively small set of simple instructions, each highly optimized, and their hardware is comparatively easy to design. CISC systems are characterized by a large instruction set, in which a single instruction may perform a complex operation; the hardware design is correspondingly complex. Modern computer hardware often combines RISC and CISC characteristics (Schneider & Gersting, 2018).
There are instructions to perform a variety of computational tasks, including data transfer, in which data are moved, for example, from memory to a register; arithmetic and logical operations; comparison operations, in which two values are compared (e.g., is A < B?); and branch operations, which change the flow of the program from sequential execution to a non-sequential instruction. Branching is what enables conditional operations and loops.
Other important components of a computer system include the program counter (PC), a register that holds the address of the next instruction to be executed, and the instruction register (IR), which holds the encoding of the instruction currently being executed. The instruction decoder circuit decodes the op code of an instruction. Specialized helper circuits send addresses to the proper circuits and send signals to the ALU, the I/O controller, and memory (Schneider & Gersting, 2018).
From the discussion above, the main features of the von Neumann architecture can be identified. The architecture operates on the Fetch-Decode-Execute cycle, also called the von Neumann cycle, which the machine repeats until a HALT instruction is encountered or an error condition forces the cycle to end. In the Fetch phase, the next instruction to be executed, whose address is held in the PC, is retrieved from memory and loaded into the IR, and the PC is advanced to the following instruction. In the Decode phase, the instruction decoder determines the op code to be executed. The operations performed in the Execute phase are, of course, different for each instruction (Schneider & Gersting, 2018).
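The Fetch-Decode-Execute cycle, together with the PC, the IR, and the instruction categories (data transfer, arithmetic, comparison, branching), can be sketched as a tiny interpreter. Everything here is hypothetical: the op code names, the single register `r`, the comparison flag, and the example program. For readability, instructions are stored as `(op code, address)` pairs rather than binary words, in the same list as the data, to mirror the stored program concept.

```python
def run(memory):
    """Execute a stored program until HALT, returning the final memory contents.

    Instructions are (op code, address) pairs stored in the same list as the
    data, mirroring the stored program concept.
    """
    pc = 0        # program counter: address of the next instruction
    r = 0         # a single general-purpose register
    flag = None   # condition code set by COMPARE
    while True:
        ir = memory[pc]   # FETCH: copy the instruction at address PC into the IR
        pc += 1           # by default, continue with the next instruction in sequence
        op, x = ir        # DECODE: separate the op code from the address field
        # EXECUTE: behaviour depends on the op code
        if op == "LOAD":
            r = memory[x]
        elif op == "STORE":
            memory[x] = r
        elif op == "ADD":
            r = r + memory[x]
        elif op == "COMPARE":
            flag = "LT" if r < memory[x] else ("EQ" if r == memory[x] else "GT")
        elif op == "JUMPLT":
            if flag == "LT":
                pc = x    # branch: break out of sequential execution
        elif op == "JUMP":
            pc = x
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown op code {op!r} at address {pc - 1}")


# Hypothetical program: add 3 to TOTAL five times (TOTAL = 3 * 5).
program = [
    ("LOAD", 9),      # 0: R <- TOTAL
    ("ADD", 10),      # 1: R <- R + THREE
    ("STORE", 9),     # 2: TOTAL <- R
    ("LOAD", 11),     # 3: R <- COUNT
    ("ADD", 12),      # 4: R <- R + ONE
    ("STORE", 11),    # 5: COUNT <- R
    ("COMPARE", 13),  # 6: compare COUNT with FIVE
    ("JUMPLT", 0),    # 7: loop while COUNT < FIVE
    ("HALT", 0),      # 8: stop
    0,                # 9: TOTAL
    3,                # 10: THREE
    0,                # 11: COUNT
    1,                # 12: ONE
    5,                # 13: FIVE
]
final = run(program)
print(final[9], final[11])   # 15 5
```

Because program and data share one memory, a faulty jump into the data region would make the decode step fail, which corresponds to the error condition that can end the cycle.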
Although the von Neumann architecture has proven highly successful for a variety of computational tasks, and has indeed been foundational for almost all computers and computer-based devices in existence today, it has posed some difficulties as the computational problems to be solved have grown larger and more complex. One difficulty is that processor speeds no longer increase exponentially, as they did for decades in line with Moore’s Law, named for Gordon Moore, a co-founder of Intel Corporation. First formulated in 1965 and revised in 1975, Moore’s Law states, in essence, that the number of transistors that can reside on a microprocessor chip doubles approximately every two years (a period often quoted as 18 months), with a corresponding increase in processor performance. Increasing the number of transistors on a single chip requires shrinking the transistors, which places the gates much closer together, and there are physical limits to how closely they can be packed. Packing transistors more densely also produces more heat, which rises rapidly as the number of transistors increases. Furthermore, the laws of physics impose a maximum speed: signals cannot travel through wires faster than the speed of light. These problems underscore what is known as the von Neumann bottleneck, the inability of sequential machines to scale to increasingly large and complex computational problems (Schneider & Gersting, 2018).
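The exponential growth described by Moore’s Law is easy to compute. This sketch simply applies repeated doubling; the starting value of roughly 2,300 transistors (about the count of Intel’s 1971 4004 processor) and the 40-year horizon are illustrative choices, and the result is a projection, not a measured transistor count.

```python
def projected_transistors(initial: float, years: float, doubling_years: float = 2.0) -> float:
    """Exponential growth under Moore's Law: the count doubles every doubling_years."""
    return initial * 2 ** (years / doubling_years)


# Starting point: roughly 2,300 transistors (about the count of Intel's 1971 4004).
for period in (2.0, 1.5):
    estimate = projected_transistors(2300, 40, period)
    print(f"doubling every {period} years -> {estimate:,.0f} transistors after 40 years")
```

With a two-year doubling period, 40 years means 20 doublings, a factor of about one million (2,300 × 2^20 ≈ 2.4 billion). Shortening the period to 18 months raises the projection roughly a hundredfold, which shows how sensitive such extrapolations are to the assumed rate.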
To overcome the problems inherent in the von Neumann architecture, different classes of architectures, known as non-von Neumann architectures, are increasingly studied as alternative ways to organize computers. Except for parallel computing, most implementations of these architectures are experimental or theoretical. In parallel computing, or parallel processing, multiple independent processing units coordinate their activities so that they operate simultaneously. In the recent past, parallel computing, sometimes called high-performance computing (HPC) to emphasize its efficiency and speed benefits, was the exclusive domain of supercomputers: large, expensive systems that were difficult to program, maintain, and operate. Today, users can take advantage of high-performance computing through desktop and laptop machines with multiple cores (multiple processors on a single die) and through cloud computing, in which computation and storage take place transparently to the user on possibly remote machines connected by networks.
One important type of non-von Neumann architecture is multiple-instruction-multiple-data (MIMD) parallel processing. MIMD, as the name suggests, employs multiple instruction streams and multiple data streams: it uses multiple independent processors, each of which can run its own program on its own data at its own rate. Distributed computing, in which systems need not reside in the same location and communicate by passing messages over networks, is a type of MIMD. MIMD is also characteristic of cluster computing, in which relatively inexpensive commodity computers are connected through a local area network (LAN) or a wide area network (WAN), or, less commonly, through the Internet, which is becoming an option for distributed MIMD computing. There are many varieties of MIMD systems, including special-purpose systems, represented by newer generations of supercomputers.
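The defining property of MIMD, different programs running on different data, can be imitated on a single machine. In this sketch, threads from Python’s standard `concurrent.futures` module stand in for independent processors, and the three small "programs" are hypothetical examples. In CPython, threads are not guaranteed to execute at the same instant, and a real cluster would run each program on a separate machine that communicates by message passing, so this shows only the structure of MIMD, not its speed.

```python
from concurrent.futures import ThreadPoolExecutor


# Three different "programs" (instruction streams)...
def count_words(text):
    return len(text.split())


def largest(numbers):
    return max(numbers)


def checksum(data):
    return sum(data) % 256


# ...each paired with its own data stream.
jobs = [
    (count_words, "the quick brown fox"),
    (largest, [3, 17, 8]),
    (checksum, b"MIMD"),
]

# Each worker stands in for an independent processor running its own program.
with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    futures = [pool.submit(program, data) for program, data in jobs]
    results = [future.result() for future in futures]

print(results)   # [4, 17, 39]
```

No worker waits on or shares instructions with another; each finishes at its own rate, and the results are collected afterwards.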
In grid computing, machines of varying power are connected over networks and/or the Internet and work in concert to perform specific tasks. Examples include the SETI@home project, part of the Search for Extraterrestrial Intelligence (SETI), and “Foldit”, a crowdsourced game in which players contribute to solving problems in protein folding. In these projects, computers run algorithms to analyze (sometimes constantly arriving) data. Another active research area is the design of parallel algorithms that take advantage of increased processing power.
Non-von Neumann architectures are also increasingly employed to overcome the shortcomings of the traditional architecture. One example is the shared memory multiprocessor, in which many independent CPUs share a large pool of memory. Such systems are relatively easy to program and are beneficial for single-instruction-multiple-data (SIMD) parallelism. However, they are not as scalable as distributed-memory clusters. Graphics processing units (GPUs) are another example of a SIMD architecture, and they are available on most desktop and laptop systems. Although originally designed for graphics-intensive applications and gaming, GPUs have been recognized to have general-purpose computing capabilities, as they are massively parallel, with many relatively simple individual processing units.
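SIMD parallelism, in which a single instruction is broadcast to many processing elements that each hold their own data, can be modelled in a few lines. This is a conceptual sketch: the instruction names and the function `simd_step` are hypothetical, and the list comprehension runs sequentially, whereas real SIMD hardware such as a GPU executes the lanes in lockstep.

```python
# Hypothetical instruction set shared by all processing elements.
OPERATIONS = {
    "ADD": lambda x, y: x + y,
    "MUL": lambda x, y: x * y,
    "MAX": max,
}


def simd_step(instruction, lanes_a, lanes_b):
    """Broadcast ONE instruction to every processing element (lane).

    Each lane applies the same operation to its own pair of values. On real SIMD
    hardware the lanes run in lockstep; this list comprehension only models that.
    """
    if len(lanes_a) != len(lanes_b):
        raise ValueError("every lane needs one value from each operand")
    op = OPERATIONS[instruction]
    return [op(x, y) for x, y in zip(lanes_a, lanes_b)]


print(simd_step("ADD", [1, 2, 3, 4], [10, 20, 30, 40]))   # [11, 22, 33, 44]
print(simd_step("MUL", [1, 2, 3, 4], [10, 20, 30, 40]))   # [10, 40, 90, 160]
```

There is only one instruction stream: every lane performs the same operation at each step, and only the data differ from lane to lane.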