
Domain 3: Security Engineering [Engineering and Management of Security]

Eric Conrad, ... Joshua Feldman, in CISSP Study Guide [Third Edition], 2016

CISC and RISC

CISC [Complex Instruction Set Computer] and RISC [Reduced Instruction Set Computer] are two forms of CPU design. CISC uses a large set of complex machine language instructions, while RISC uses a reduced set of simpler instructions.

The “best” way to design a CPU has long been debated: should the low-level commands be longer and more powerful, so that fewer individual instructions are needed to perform a complex task [CISC], or should the commands be shorter and simpler, requiring more individual instructions to perform a complex task [RISC] but allowing fewer cycles per instruction and more efficient code? There is no “correct” answer: both approaches have pros and cons. x86 CPUs [among many others] are CISC; ARM [used in many cell phones and PDAs], PowerPC, SPARC, and others are RISC.


URL: //www.sciencedirect.com/science/article/pii/B9780128024379000047

Instruction Sets

Marilyn Wolf, in Computers as Components [Third Edition], 2012

2.2 Preliminaries

In this section, we will look at some general concepts in computer architecture and programming, including some different styles of computer architecture and the nature of assembly language.

2.2.1 Computer Architecture Taxonomy

Before we delve into the details of microprocessor instruction sets, it is helpful to develop some basic terminology. We do so by reviewing a taxonomy of the basic ways we can organize a computer.

von Neumann architectures

A block diagram for one type of computer is shown in Figure 2.1. The computing system consists of a central processing unit [CPU] and a memory. The memory holds both data and instructions, and can be read or written when given an address. A computer whose memory holds both data and instructions is known as a von Neumann machine.

Figure 2.1. A von Neumann architecture computer.

The CPU has several internal registers that store values used internally. One of those registers is the program counter [PC], which holds the address in memory of an instruction. The CPU fetches the instruction from memory, decodes the instruction, and executes it. The program counter does not directly determine what the machine does next, but only indirectly by pointing to an instruction in memory. By changing only the instructions, we can change what the CPU does. It is this separation of the instruction memory from the CPU that distinguishes a stored-program computer from a general finite-state machine.
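The fetch-decode-execute loop and the single shared memory described above can be sketched in a few lines. This is a minimal illustrative machine with a made-up three-instruction set, not any real processor; the point is that code and data live in one address space and the PC is just an address into it.

```python
# A minimal von Neumann machine sketch: one memory array holds both the
# program and its data, and the program counter [pc] is an address into
# that same memory. The tiny instruction set here is hypothetical.

def run(memory):
    pc = 0   # program counter: address of the next instruction
    acc = 0  # a single accumulator register
    while True:
        op, arg = memory[pc]      # fetch
        pc += 1
        if op == "LOAD":          # decode + execute
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "HALT":
            return memory

# Program and data share one address space: addresses 0-3 hold code,
# addresses 4-6 hold data.
mem = [
    ("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0),  # code
    2, 3, 0,                                             # data
]
print(run(mem)[6])  # prints 5
```

Because instructions sit in the same writable memory as data, a program on this machine could overwrite its own instructions, which is exactly the stored-program property the text describes.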

An alternative to the von Neumann style of organizing computers is the Harvard architecture, which is nearly as old as the von Neumann architecture. As shown in Figure 2.2, a Harvard machine has separate memories for data and program. The program counter points to program memory, not data memory. As a result, it is harder to write self-modifying programs [programs that write data values, then use those values as instructions] on Harvard machines.

Figure 2.2. A Harvard architecture.

Harvard architectures are widely used today for one very simple reason—the separation of program and data memories provides higher performance for digital signal processing. Processing signals in real time places great strains on the data access system in two ways: First, large amounts of data flow through the CPU; and second, that data must be processed at precise intervals, not just when the CPU gets around to it. Data sets that arrive continuously and periodically are called streaming data. Having two memories with separate ports provides higher memory bandwidth; not making data and instructions compete for the same port also makes it easier to move the data at the proper times. DSPs constitute a large fraction of all microprocessors sold today, and most of them are Harvard architectures. A single example shows the importance of DSP: Most of the telephone calls in the world go through at least two DSPs, one at each end of the phone call.

Another axis along which we can organize computer architectures relates to their instructions and how they are executed. Many early computer architectures were what is known today as complex instruction set computers [CISC]. These machines provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths. One of the advances in the development of high-performance microprocessors was the concept of reduced instruction set computers [RISC]. These computers tended to provide somewhat fewer and simpler instructions. RISC machines generally use load/store instruction sets—operations cannot be performed directly on memory locations, only on registers. The instructions were also chosen so that they could be efficiently executed in pipelined processors. Early RISC designs substantially outperformed CISC designs of the period. As it turns out, we can use RISC techniques to efficiently execute at least a common subset of CISC instruction sets, so the performance gap between RISC-like and CISC-like instruction sets has narrowed somewhat.

Instruction set characteristics

Beyond the basic RISC/CISC characterization, we can classify computers by several characteristics of their instruction sets. The instruction set of the computer defines the interface between software modules and the underlying hardware; the instructions define what the hardware will do under certain circumstances. Instructions can have a variety of characteristics, including:

fixed versus variable length;

addressing modes;

numbers of operands;

types of operations supported.

We often characterize architectures by their word length: 4-bit, 8-bit, 16-bit, 32-bit, and so on. In some cases, the length of a data word, an instruction, and an address are the same. Particularly for computers designed to operate on smaller words, instructions and addresses may be longer than the basic data word.

Little-endian vs. big-endian

One subtle but important characterization of architectures is the way they number bits, bytes, and words. Cohen [Coh81] introduced the terms little-endian mode [with the lowest-order byte residing in the low-order bits of the word] and big-endian mode [the lowest-order byte stored in the highest bits of the word].
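The two byte orders are easy to see concretely. The sketch below uses Python's standard `struct` module to lay out the same 32-bit value both ways:

```python
import struct

# The same 32-bit value laid out in memory in both byte orders.
value = 0x12345678
little = struct.pack("<I", value)  # lowest-order byte stored first
big = struct.pack(">I", value)     # highest-order byte stored first

print(little.hex())  # 78563412
print(big.hex())     # 12345678
```

A CPU reading this word must agree with whoever wrote it on which of these two layouts is in use; the value itself is identical in both cases.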

We can also characterize processors by their instruction execution, a separate concern from the instruction set. A single-issue processor executes one instruction at a time. Although it may have several instructions at different stages of execution, only one can be at any particular stage of execution. Several other types of processors allow multiple-issue execution. A superscalar processor uses specialized logic to identify at run time instructions that can be executed simultaneously. A VLIW processor relies on the compiler to determine what combinations of instructions can be legally executed together. Superscalar processors often use too much energy and are too expensive for widespread use in embedded systems. VLIW processors are often used in high-performance embedded computing.

The set of registers available for use by programs is called the programming model, also known as the programmer model. [The CPU has many other registers that are used for internal operations and are unavailable to programmers.]

Architectures and implementations

There may be several different implementations of an architecture. In fact, the architecture definition serves to define those characteristics that must be true of all implementations and what may vary from implementation to implementation. Different CPUs may offer different clock speeds, different cache configurations, changes to the bus or interrupt lines, and many other changes that can make one model of CPU more attractive than another for any given application.

The CPU is only part of a complete computer system. In addition to the memory, we also need I/O devices to build a useful system. We can build a computer from several different chips, but many useful computer systems come on a single chip. A microcontroller is one form of a single-chip computer that includes a processor, memory, and I/O devices. The term microcontroller usually refers to a computer system chip with a relatively small CPU, one that includes some read-only memory for program storage. A system-on-chip generally refers to a larger processor that includes on-chip RAM, usually supplemented by an off-chip memory.

2.2.2 Assembly Languages

Figure 2.3 shows a fragment of ARM assembly code to remind us of the basic features of assembly languages. Assembly languages usually share the same basic features:

one instruction appears per line;

labels, which give names to memory locations, start in the first column;

instructions must start in the second column or after to distinguish them from labels;

comments run from some designated comment character [; in the case of ARM] to the end of the line.

Figure 2.3. An example of ARM assembly language.

Assembly language follows this relatively structured form to make it easy for the assembler to parse the program and to consider most aspects of the program line by line. [It should be remembered that early assemblers were written in assembly language to fit in a very small amount of memory. Those early restrictions have carried into modern assembly languages by tradition.] Figure 2.4 shows the format of an ARM data processing instruction such as an ADD. For the instruction

Figure 2.4. Format of an ARM data processing instruction.

ADDGT r0,r3,#5

the cond field would be set according to the GT condition [1100], the opcode field would be set to the binary code for the ADD instruction [0100], the first operand register Rn would be set to 3 to represent r3, the destination register Rd would be set to 0 for r0, and the operand 2 field would be set to the immediate value of 5.
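Packing those fields into the 32-bit instruction word can be sketched as below. The field positions follow the standard ARM data-processing layout [cond in bits 31–28, the I bit at 25, opcode in 24–21, the S bit at 20, Rn in 19–16, Rd in 15–12, operand 2 in the low 12 bits]; since Figure 2.4 is not reproduced here, take that layout as an assumption from the ARM architecture rather than from this text.

```python
# Packing the fields of ADDGT r0,r3,#5 into a 32-bit word, assuming the
# standard ARM data-processing layout: cond[31:28], I bit [25],
# opcode[24:21], S bit [20], Rn[19:16], Rd[15:12], operand2[11:0].

def encode_dp(cond, opcode, rn, rd, imm):
    i_bit = 1  # operand 2 is an immediate value
    s_bit = 0  # do not update the condition flags
    return (cond << 28) | (i_bit << 25) | (opcode << 21) | (s_bit << 20) \
        | (rn << 16) | (rd << 12) | imm

# cond = GT [1100], opcode = ADD [0100], Rn = r3, Rd = r0, immediate 5
word = encode_dp(cond=0b1100, opcode=0b0100, rn=3, rd=0, imm=5)
print(hex(word))  # 0xc2830005
```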

Assemblers must also provide some pseudo-ops to help programmers create complete assembly language programs. An example of a pseudo-op is one that allows data values to be loaded into memory locations. These allow constants, for example, to be set into memory. An example of a memory allocation pseudo-op for ARM is:

BIGBLOCK % 10

The ARM % pseudo-op allocates a block of memory of the size specified by the operand and initializes those locations to zero.
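A sketch of how an assembler might process this pseudo-op follows; the `symtab` and `image` structures are illustrative names, not part of any real assembler's interface.

```python
# Handling the ARM "%" memory-allocation pseudo-op: record the label's
# address in the symbol table and emit the requested number of zeroed
# memory cells. The symtab/image data structures are hypothetical.

def assemble_percent(line, symtab, image):
    label, _, size = line.split()   # e.g. "BIGBLOCK % 10"
    symtab[label] = len(image)      # the label names the block's address
    image.extend([0] * int(size))   # zero-initialized block

symtab, image = {}, []
assemble_percent("BIGBLOCK % 10", symtab, image)
print(symtab["BIGBLOCK"], len(image))  # 0 10
```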

2.2.3 VLIW Processors

CPUs can execute programs faster if they can execute more than one instruction at a time. If the operands of one instruction depend on the results of a previous instruction, then the CPU cannot start the new instruction until the earlier instruction has finished. However, adjacent instructions may not directly depend on each other, and in that case the CPU can execute several instructions simultaneously.

Several different techniques have been developed to parallelize execution. Desktop and laptop computers often use superscalar execution. A superscalar processor scans the program during execution to find sets of instructions that can be executed together. Digital signal processing systems are more likely to use very long instruction word [VLIW] processors. These processors rely on the compiler to identify sets of instructions that can be executed in parallel. Superscalar processors can find parallelism that VLIW processors can't—some instructions may be independent in some situations and not others. However, superscalar processors are more expensive in both cost and energy consumption. Because it is relatively easy to extract parallelism from many DSP applications, the efficiency of VLIW processors can more easily be leveraged by digital signal processing software.

In modern terminology, a set of instructions is bundled together into a VLIW packet, which is a set of instructions that may be executed together. The execution of the next packet will not start until all the instructions in the current packet have finished executing. The compiler identifies packets by analyzing the program to determine sets of instructions that can always execute together.

Inter-instruction dependencies

To understand parallel execution, let's first understand what constrains instructions from executing in parallel. A data dependency is a relationship between the data operated on by instructions. In the example of Figure 2.5, the first instruction writes into r0 while the second instruction reads from it. As a result, the first instruction must finish before the second instruction can perform its addition. The data dependency graph shows the order in which these operations must be performed.

Figure 2.5. Data dependencies and order of instruction execution.

Branches can also introduce control dependencies. Consider this simple branch

 bnz r3,foo

 add r0,r1,r2

foo: …

The add instruction is executed only if the branch instruction that precedes it does not take its branch.

Opportunities for parallelism arise because many combinations of instructions do not introduce data or control dependencies. The natural grouping of assignments in the source code suggests some opportunities for parallelism that can also be influenced by how the object code uses registers. Consider the example of Figure 2.6. Although these instructions use common input registers, the result of one instruction does not affect the result of the other instructions.

Figure 2.6. Instructions without data dependencies.

VLIW processors examine inter-instruction dependencies only within a packet of instructions. They rely on the compiler to determine the necessary dependencies and group instructions into a packet to avoid combinations of instructions that can't be properly executed in a packet. Superscalar processors, in contrast, use hardware to analyze the instruction stream and determine dependencies that need to be obeyed.
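The compiler-side grouping described above can be sketched as a greedy packer: an instruction joins the current packet unless it reads a register the packet writes [a data dependency], rewrites one, or writes a register the packet reads. This is an illustrative simplification of what a real VLIW tool chain does, with instructions modeled as hypothetical (dest, src1, src2) register triples.

```python
# Greedily group instructions into VLIW packets, starting a new packet
# whenever an instruction depends on one already in the current packet.
# Instructions are (dest, src1, src2) register-name tuples; the register
# names and packet width are illustrative.

def pack_vliw(instrs, width=4):
    packets, packet = [], []
    written, read = set(), set()
    for dst, s1, s2 in instrs:
        conflict = {s1, s2, dst} & written or {dst} & read
        if conflict or len(packet) == width:
            packets.append(packet)              # close the current packet
            packet, written, read = [], set(), set()
        packet.append((dst, s1, s2))
        written.add(dst)
        read.update({s1, s2})
    if packet:
        packets.append(packet)
    return packets

code = [
    ("r0", "r1", "r2"),  # independent
    ("r3", "r1", "r4"),  # shares only inputs with the first: same packet
    ("r5", "r0", "r3"),  # reads r0 and r3: must wait for the packet above
]
print(len(pack_vliw(code)))  # prints 2
```

The first two instructions share input registers but no results, so they land in one packet, mirroring the situation of Figure 2.6; the third reads both results and is forced into the next packet.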

VLIW and embedded computing

A number of different processors have implemented VLIW execution modes and these processors have been used in many embedded computing systems. Because the processor does not have to analyze data dependencies at run time, VLIW processors are smaller and consume less power than superscalar processors. VLIW is very well suited to many signal processing and multimedia applications. For example, cellular telephone base stations must perform the same processing on many parallel data streams. Channel processing is easily mapped onto VLIW processors because there are no data dependencies between the different signal channels.


URL: //www.sciencedirect.com/science/article/pii/B9780123884367000027

Stairway to Successful Kernel Exploitation

Enrico Perla, Massimiliano Oldani, in A Guide to Kernel Exploitation, 2011

CPU and Registers

The CPU's role is extremely simple: execute instructions. All the instructions that a CPU can execute comprise the architecture's instruction set. At the very least, a typical instruction set provides instructions for arithmetic and logic operations [add, sub, or, and, etc.], control flow [jump/branch, call, int, etc.], and memory manipulation [load, store, push, pop, etc.]. Since accessing memory is usually a slow operation [compared to the speed at which the CPU can crank through instructions], the CPU has a set of local, fast registers. These registers can be used to store temporary values [general-purpose registers] or to keep relevant control information and data structures [special-purpose registers]. CPU instructions usually operate on registers.

Computer architectures are divided into two major families: RISC [Reduced Instruction Set Computer], which focuses on having simple, fixed-size instructions that can execute in a clock cycle; and CISC [Complex Instruction Set Computer], which has instructions of different sizes that perform multiple operations and that can execute for more than a single clock cycle. We can further differentiate the two based on how they access memory: RISC architectures require memory access to be performed through either a load [copy from memory] or a store instruction, whereas CISC architectures may have a single instruction to access memory and, for example, perform some arithmetic operation on its contents. For this reason, RISC architectures are also usually referred to as load-store architectures. On RISC architectures, apart from load, store, and some control flow instructions, all the instructions operate solely on registers.
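The load-store constraint can be made concrete with a toy example: incrementing a word in memory takes one memory-operand instruction on a CISC-style machine, but three instructions on a load-store machine. The mnemonics and the tiny interpreter below are illustrative, not any real architecture's.

```python
# Contrasting the two styles on "increment the word at address A".
# A CISC-style machine can name memory directly in the arithmetic
# instruction; a load-store [RISC-style] machine must move the value
# through a register. Mnemonics and addresses are hypothetical.

cisc_program = [("add_mem", "A", 1)]      # one instruction reads, adds, writes

risc_program = [("load",  "r1", "A"),     # memory -> register
                ("addi",  "r1", "r1", 1), # register-only arithmetic
                ("store", "r1", "A")]     # register -> memory

mem, regs = {"A": 41}, {}
for ins in risc_program:
    if ins[0] == "load":
        regs[ins[1]] = mem[ins[2]]
    elif ins[0] == "addi":
        regs[ins[1]] = regs[ins[2]] + ins[3]
    elif ins[0] == "store":
        mem[ins[2]] = regs[ins[1]]

print(mem["A"], len(risc_program), len(cisc_program))  # 42 3 1
```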

Note

Today the distinction between RISC and CISC is blurry, and many of the issues of the past have less impact [e.g., binary size]. As an example, all recent x86 processors decode complex instructions into micro-operations [micro-ops], which are then executed by what is pretty much an internal RISC core.

The CPU fetches the instructions to execute from memory, reading a stream of bytes and decoding it according to its instruction set. A special-purpose register, usually called the instruction pointer [IP] or program counter [PC], keeps track of what instruction is being executed.

As we discussed in Chapter 2, a system can be equipped with a single CPU, in which case it is referred to as a uniprocessor [UP] system, or with multiple CPUs, in which case it is called a symmetric multiprocessing [SMP] system. SMP systems are intrinsically more complex for an operating system to handle, since now true simultaneous execution is in place. From the attacker's point of view, though, SMP systems open more possibilities, especially when it comes to winning race conditions, as we will discuss later in this chapter.


URL: //www.sciencedirect.com/science/article/pii/B9781597494861000036

Intel® Pentium® Processors

In Power and Performance, 2015

2.2.2 μops

Processor architectures are classified as either a Reduced Instruction Set Computer [RISC] or a Complex Instruction Set Computer [CISC]. The difference between the two classifications is that RISC architectures have a small number of simple general-purpose instructions that each perform one single operation, essentially providing the basic building blocks for computations. CISC architectures, on the other hand, have a large number of more complex instructions, each capable of performing multiple internal operations.

For example, consider performing an arithmetic operation on a value in memory. For a RISC architecture, the corresponding arithmetic instruction would only be capable of operating on a register. As a result, before the operation could begin, a load instruction would be issued to fetch the value from memory and store it into a register. Once that is complete, the operation would be performed, with the result stored in a register. Finally, a store instruction would be issued to commit the result back to memory. On the other hand, the arithmetic operation’s instruction for a CISC architecture would accept a memory operand. Assuming the memory operand is the instruction’s destination operand, this form of the instruction would automatically fetch the value from memory, perform the operation, and then commit the result back to memory, all in one instruction.

As a result, CISC architectures are often able to perform an algorithm in fewer instructions than a RISC architecture, since one CISC instruction can perform the equivalent work of multiple RISC instructions. On the other hand, due to the simplified nature of their instructions, RISC architectures are often less complex, and therefore require less silicon. Additionally, due to the logical separation of different instructions for specific tasks, RISC architectures are capable of scheduling and executing instructions at a finer granularity than CISC architectures.

The x86 family of processors is classified as CISC, since x86 instructions are capable of performing multiple internal operations. Starting with the Pentium Pro, Intel Architecture is actually a hybrid approach between the two. The instruction set is not modified, so x86 instructions are still CISC, but the Front End of the processor translates each instruction into one or more micro-ops, typically referred to as μops or sometimes just uops. These μops are very similar to RISC instructions, each specialized for a specific task.

Consider the previous example for how CISC and RISC architectures handle an arithmetic operation. The x86 instruction set still supports memory operands for that arithmetic instruction, making it appear CISC to the programmer; however, the Front End might decode that single instruction into three μops. The first, a load μop, might be responsible for loading the contents described by the memory operand. The second μop would then be responsible for performing the actual operation. The third μop would then be responsible for committing the result back to memory.
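That decomposition can be sketched as a small function: an instruction with a memory destination is cracked into load, operate, and store μops, while a register-only instruction maps to a single μop. The instruction tuples and μop names here are illustrative, not Intel's actual internal representation.

```python
# A sketch of the Front End decomposition described above: one CISC-style
# instruction with a memory destination operand is cracked into
# load / operate / store μops. Formats and names are hypothetical.

def crack(instr):
    op, dest, src = instr              # e.g. ("add", "[mem]", "eax")
    if dest.startswith("["):           # memory destination operand
        return [("uop_load", "tmp", dest),
                ("uop_" + op, "tmp", src),
                ("uop_store", dest, "tmp")]
    return [("uop_" + op, dest, src)]  # register-only: a single μop

uops = crack(("add", "[mem]", "eax"))
print(len(uops))  # prints 3
```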

This hybrid approach gives Intel Architectures the benefits of both approaches. Since memory accesses can be expensive, fetching fewer instructions benefits performance. The CISC nature of the x86 instruction set can be thought of as opcode compression, thereby improving instruction fetch bandwidth. At the same time, by breaking these complex instructions into smaller μops, the execution pipeline can be more agile and flexible, as Section 2.2.3 describes.

The cost of this approach is a more complicated Front End, which requires logic for decoding instructions into μops. In general, this cost is insignificant compared to the performance improvement achieved.


URL: //www.sciencedirect.com/science/article/pii/B9780128007266000021

Cloud Infrastructure Servers

Caesar Wu, Rajkumar Buyya, in Cloud Data Centers and Cost Modeling, 2015

11.7 Summary

In this chapter, we concentrated on the server. We not only discussed x86 [or CISC] servers, but also explored the details of RISC servers, with a special focus on the Oracle/Sun SPARC server. From a computer evolution perspective, we can see why the client/server architecture has become the mainstream computer architecture in the data center:

1.

The Internet has become widespread and has been integrated into our daily life. The number of hosts has increased exponentially [roughly 900%] during the last 20 years or so.

2.

In contrast to a mainframe, the client/server architecture is very flexible and much easier to deploy.

3.

The cost of server deployment is only a fraction of that for a mainframe.

We also presented brief information about the major vendors that provide these servers:

x86 [CISC] processors: These are often produced by two major vendors, Intel and AMD.

RISC [SPARC] processors: These are mainly provided by Oracle/Sun and Fujitsu.

We clarified all terms, units, and jargon with regard to servers and processors. We aimed to establish the foundation for our cost modeling because these units and terms, such as sockets, cores, methods, domains, and multithreading, will be the physical baseline for measuring the cost of a cloud infrastructure.

As Extremetech illustrated, the Intel x86 chip has not only taken over the PC/workstation market, but also the server market in data centers during the last 30 years. In comparison with RISC servers, the x86 or CISC server has gradually become the dominant computer in the server market, and RISC servers have been losing ground.

The landscape of the server market has changed a lot since 2005. Cisco has been gaining momentum in the x86 server [blade server] market since 2009. Based on IDC data from the second quarter of 2013, traditional server vendors are losing x86 server market share, and “others” [including some Chinese vendors, such as Huawei and Lenovo] have gained a significant amount of the server market [40% in volume and 21% in revenue]. This indicates that x86 servers have become a commodity type of product because many servers are OEM products made in China. This could be one of the influential factors for many decision makers in making capex investment decisions. We will continue this discussion in later chapters. Despite major traditional server vendors [such as HP, IBM, and Dell] losing market share, they still hold over 50% market share in terms of volume and 70% in revenue [for all types of servers, including mainframe, x86, RISC, and EPIC].

After we discussed the details of x86 servers or processors, we moved on to the physical installation of servers. Traditionally, there were only two server styles: tower and rack-mounted servers [or pizza boxes]. During the dot-com boom era between the late 1990s and early 2000s, people started to develop blade servers in order to deploy servers on a large scale while saving physical space and cabling. From an incremental perspective, the rack-mounted server would be a better solution for TCO/ROI. We will give more detail in later chapters.

In the final part of this chapter, we unveiled the details of RISC servers. We explained what a RISC server is and how x86 [CISC] and RISC servers differ. We also looked at why the RISC server is losing market share to x86.

In particular, we focused on Oracle/Sun SPARC servers. We explained the difference between M-series and T-series SPARC servers. We also listed both M-series and T-series SPARC server configurations and listed prices for both obsolete and current models.

At the end of this chapter, we briefly touched on SPARC logical domains [LDoms], or the VM manager for SPARC, in order to explain how to estimate the number of VMs per physical SPARC server.


URL: //www.sciencedirect.com/science/article/pii/B9780128014134000118

Digital Signal Processors

James D. Broesch, in Digital Signal Processing, 2009

Technology Trade-offs

In a RISC processor, no instruction occupies more than one memory word; it can be fetched in one bus cycle and executes in one machine cycle. On the other hand, many RISC instructions may be needed to perform the same function as one CISC-type instruction, but in the RISC case, you pay for that complexity only when it is needed.

Getting analog signals into and out of a general-purpose microprocessor often requires a lot of external hardware. Some microcontrollers have built-in A/D and D/A converters, but in most cases, these converters only have 8- or 12-bit resolution, which is not sufficient for many applications. Sometimes these converters are also quite slow. Even if there are good built-in converters, there is always a need for external sample-and-hold [S/H] circuits and [analog] anti-aliasing and reconstruction filters.

Some microprocessors have built-in high-speed serial communication circuitry, such as a serial peripheral interface [SPI] or I2C™. In such cases we still need external converters, but the interface will be easier than the traditional approach, i.e., connecting the converters in parallel to the system bus. Parallel communication will of course be faster, but the circuits needed will be more complicated, and we will be stealing capacity from a common, single system bus.

The interrupt facilities found on many general-purpose processors are in many cases “overkill” for signal processing systems. In this kind of real-time application, timing is crucial and synchronous programming is preferred. The number of asynchronous events, e.g., interrupts, is kept to a minimum. Digital signal processing systems using more than a few interrupt sources are rare. One single interrupt source [be it timing or sample rate] or none is common.


URL: //www.sciencedirect.com/science/article/pii/B9780750689762000080

ARM EMBEDDED SYSTEMS

ANDREW N. SLOSS, ... CHRIS WRIGHT, in ARM System Developer's Guide, 2004

1.1 THE RISC DESIGN PHILOSOPHY

The ARM core uses a RISC architecture. RISC is a design philosophy aimed at delivering simple but powerful instructions that execute within a single cycle at a high clock speed. The RISC philosophy concentrates on reducing the complexity of instructions performed by the hardware because it is easier to provide greater flexibility and intelligence in software rather than hardware. As a result, a RISC design places greater demands on the compiler. In contrast, the traditional complex instruction set computer [CISC] relies more on the hardware for instruction functionality, and consequently the CISC instructions are more complicated. Figure 1.1 illustrates these major differences.

Figure 1.1. CISC vs. RISC. CISC emphasizes hardware complexity. RISC emphasizes compiler complexity.

The RISC philosophy is implemented with four major design rules:

1.

Instructions—RISC processors have a reduced number of instruction classes. These classes provide simple operations that can each execute in a single cycle. The compiler or programmer synthesizes complicated operations [for example, a divide operation] by combining several simple instructions. Each instruction is a fixed length to allow the pipeline to fetch future instructions before decoding the current instruction. In contrast, in CISC processors the instructions are often of variable size and take many cycles to execute.

2.

Pipelines—The processing of instructions is broken down into smaller units that can be executed in parallel by pipelines. Ideally the pipeline advances by one step on each cycle for maximum throughput. Instructions can be decoded in one pipeline stage. There is no need for an instruction to be executed by a miniprogram called microcode as on CISC processors.

3.

Registers—RISC machines have a large general-purpose register set. Any register can contain either data or an address. Registers act as the fast local memory store for all data processing operations. In contrast, CISC processors have dedicated registers for specific purposes.

4.

Load-store architecture—The processor operates on data held in registers. Separate load and store instructions transfer data between the register bank and external memory. Memory accesses are costly, so separating memory accesses from data processing provides an advantage because you can use data items held in the register bank multiple times without needing multiple memory accesses. In contrast, with a CISC design the data processing operations can act on memory directly.

These design rules allow a RISC processor to be simpler, and thus the core can operate at higher clock frequencies. In contrast, traditional CISC processors are more complex and operate at lower clock frequencies. Over the course of two decades, however, the distinction between RISC and CISC has blurred as CISC processors have implemented more RISC concepts.


URL: //www.sciencedirect.com/science/article/pii/B9781558608740500022

The Software-Defined Radio as a Platform for Cognitive Radio

Max Robert, Bruce A. Fette, in Cognitive Radio Technology [Second Edition], 2009

General-Purpose Processors

General-purpose processors are the target processors that probably first come to mind to anyone writing a computer program. GPPs are the processors that power desktop computers and are at the center of the computer revolution that began in the 1970s. The landscape of microprocessor design is dotted with a large number of devices from a variety of manufacturers. These different processors, while unique in their own right, do share some similarities, namely, a generic instruction set, an instruction sequencer, and a memory management unit [MMU].

There are two general types of instruction sets: [1] machines with fairly broad instruction sets, known as complex instruction set computers [CISCs]; and [2] machines with a narrow instruction set, known as reduced instruction set computers [RISCs]. Generally, CISC instructions give the assembly programmer powerful instructions that support efficient implementation of certain common software functions. RISC instruction sets, while narrower, are designed so that compilers can produce efficient code. The differences between CISC and RISC are arbitrary, and both styles of processors are converging toward a single type of instruction set. Regardless of whether the machine is CISC or RISC, both share a generic nature in their instructions. These include instructions that perform multiplication, addition, or storage, but these instruction sets are not tailored to a particular type of application. In the context of CR, the application in which we are most interested is signal processing.

The other key aspect of the GPP is the use of an MMU. Because GPPs are designed for generic applications, they are usually coupled with an operating system. This operating system creates a level of abstraction over the hardware, allowing applications to be developed with little or no knowledge of the underlying hardware. Managing memory by hand is a tedious and error-prone process; in a system running multiple applications, it involves paging and scattering code and data across different blocks of physical memory. An MMU lets the developer "see" a contiguous set of memory, even though the underlying memory structure may be fragmented or otherwise too difficult to control directly [especially in a multitasking system that has been running continuously for an extended period of time]. Given the generic nature of the applications that run on a GPP, an MMU is critical because it allows the easy blending of different applications with no special care needed on the developer's part.
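The contiguous-view illusion the MMU provides can be sketched with a simple single-level page table; the page size, table contents, and function name below are illustrative assumptions, not any particular processor's scheme:

```python
PAGE_SIZE = 4096  # 4 KB pages, a common (but here assumed) page size

# Illustrative page table: virtual page number -> physical frame number.
# The frames are deliberately non-contiguous, yet the program sees one
# unbroken virtual address range.
page_table = {0: 7, 1: 3, 2: 9}

def translate(vaddr):
    """Translate a virtual address to a physical one, as an MMU would."""
    vpn = vaddr // PAGE_SIZE     # virtual page number
    offset = vaddr % PAGE_SIZE   # offset within the page
    if vpn not in page_table:
        raise MemoryError("page fault at virtual address %#x" % vaddr)
    return page_table[vpn] * PAGE_SIZE + offset

# Two virtually adjacent bytes straddle a page boundary and land in
# completely different physical frames.
print(hex(translate(0x0FFF)))  # last byte of virtual page 0, in frame 7
print(hex(translate(0x1000)))  # first byte of virtual page 1, in frame 3
```

Even though virtual pages 0 and 1 map to physical frames 7 and 3, the program addresses them as one unbroken range; a real MMU performs this translation in hardware, with the operating system filling in the page table.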

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9780123745354000035

Instruction Sets

Marilyn Wolf, in Computers as Components [Fourth Edition], 2017

2.2.1 Computer architecture taxonomy

Before we delve into the details of microprocessor instruction sets, it is helpful to develop some basic terminology. We do so by reviewing a taxonomy of the basic ways through which we can organize a computer.

von Neumann architectures

A block diagram for one type of computer is shown in Fig. 2.1. The computing system consists of a central processing unit [CPU] and a memory. The memory holds both data and instructions and can be read or written when given an address. A computer whose memory holds both data and instructions is known as a von Neumann machine.

Figure 2.1. A von Neumann architecture computer.

The CPU has several internal registers that store values used internally. One of those registers is the program counter [PC], which holds the address in memory of an instruction. The CPU fetches the instruction from memory, decodes the instruction, and executes it. The program counter does not directly determine what the machine does next, but only indirectly by pointing to an instruction in memory. By changing only the instructions, we can change what the CPU does. It is this separation of the instruction memory from the CPU that distinguishes a stored-program computer from a general finite-state machine.
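The fetch-decode-execute cycle described above can be sketched as a toy von Neumann machine; the instruction names, the single-accumulator design, and the memory layout below are invented for illustration:

```python
# Toy von Neumann machine: one memory holds both instructions and data.
# The instruction set (LOAD/ADD/STORE/HALT) is invented for this sketch.
memory = [
    ("LOAD", 5),   # acc = memory[5]
    ("ADD", 6),    # acc = acc + memory[6]
    ("STORE", 7),  # memory[7] = acc
    ("HALT", 0),
    None,          # unused
    10,            # data at address 5
    32,            # data at address 6
    0,             # result will be written to address 7
]

pc = 0   # program counter: address of the next instruction
acc = 0  # accumulator register

while True:
    opcode, operand = memory[pc]  # fetch and decode
    pc += 1                       # the PC only points; it does not act
    if opcode == "LOAD":
        acc = memory[operand]
    elif opcode == "ADD":
        acc += memory[operand]
    elif opcode == "STORE":
        memory[operand] = acc
    elif opcode == "HALT":
        break

print(memory[7])  # 42
```

Note that the program and its data share the single `memory` list, which is exactly the property that makes this a von Neumann machine.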

Harvard architectures

An alternative to the von Neumann style of organizing computers is the Harvard architecture, which is nearly as old as the von Neumann architecture. As shown in Fig. 2.2, a Harvard machine has separate memories for data and program. The program counter points to program memory, not data memory. As a result, it is harder to write self-modifying programs [programs that write data values, then use those values as instructions] on Harvard machines.

Figure 2.2. A Harvard architecture.

Harvard architectures are widely used today for one very simple reason—the separation of program and data memories provides higher performance for digital signal processing. Processing signals in real time places great strains on the data access system in two ways: first, large amounts of data flow through the CPU; and second, that data must be processed at precise intervals, not just when the CPU gets around to it. Data sets that arrive continuously and periodically are called streaming data. Having two memories with separate ports provides higher memory bandwidth; not making instructions and data compete for the same port also makes it easier to move the data at the proper times. DSPs constitute a large fraction of all microprocessors sold today, and most of them are Harvard architectures. A single example shows the importance of DSP: most of the telephone calls in the world go through at least two DSPs, one at each end of the phone call.

RISC versus CISC

Another axis along which we can organize computer architectures relates to their instructions and how they are executed. Many early computer architectures were what is known today as complex instruction set computers [CISC]. These machines provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths. One of the advances in the development of high-performance microprocessors was the concept of reduced instruction set computers [RISC]. These computers tended to provide somewhat fewer and simpler instructions. RISC machines generally use load/store instruction sets—operations cannot be performed directly on memory locations, only on registers. The instructions were also chosen so that they could be efficiently executed in pipelined processors. Early RISC designs substantially outperformed CISC designs of the period. As it turns out, we can use RISC techniques to efficiently execute at least a common subset of CISC instruction sets, so the performance gap between RISC-like and CISC-like instruction sets has narrowed somewhat.
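The load/store distinction can be illustrated with a small sketch; the addresses, register names, and assembly mnemonics in the comments are hypothetical, chosen only to show the shape of each style:

```python
mem = {0x10: 5, 0x14: 7, 0x18: 0}   # mock data memory
regs = {"r1": 0, "r2": 0}           # mock register file

# CISC style: one instruction may operate directly on memory operands,
#   e.g. a hypothetical  ADD [0x18], [0x10], [0x14]
mem[0x18] = mem[0x10] + mem[0x14]

# RISC (load/store) style: memory is touched only by loads and stores,
# so the same work becomes several simpler instructions.
regs["r1"] = mem[0x10]                 # LDR r1, [0x10]
regs["r2"] = mem[0x14]                 # LDR r2, [0x14]
regs["r1"] = regs["r1"] + regs["r2"]   # ADD r1, r1, r2
mem[0x18] = regs["r1"]                 # STR r1, [0x18]

print(mem[0x18])  # 12 either way
```

The CISC-style version is one instruction, but that instruction must internally perform two reads, an add, and a write; the RISC version exposes those steps as four simple instructions that pipeline cleanly.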

Instruction set characteristics

Beyond the basic RISC/CISC characterization, we can classify computers by several characteristics of their instruction sets. The instruction set of the computer defines the interface between software modules and the underlying hardware; the instructions define what the hardware will do under certain circumstances. Instructions can have a variety of characteristics, including:

fixed versus variable length;

addressing modes;

numbers of operands;

types of operations supported.

Word length

We often characterize architectures by their word length: 4-bit, 8-bit, 16-bit, 32-bit, and so on. In some cases, the length of a data word, an instruction, and an address are the same. Particularly for computers designed to operate on smaller words, instructions and addresses may be longer than the basic data word.

Little-endian versus big-endian

One subtle but important characterization of architectures is the way they number bits, bytes, and words. Cohen [Coh81] introduced the terms little-endian mode [with the lowest-order byte residing in the low-order bits of the word] and big-endian mode [the lowest-order byte stored in the highest bits of the word].
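Python's built-in int.to_bytes makes the two byte orders easy to compare; the 32-bit value here is just an example:

```python
value = 0x12345678  # a 32-bit word

little = value.to_bytes(4, "little")  # lowest-order byte stored first
big = value.to_bytes(4, "big")        # highest-order byte stored first

print(little.hex())  # 78563412
print(big.hex())     # 12345678

# Reading bytes back with the wrong assumed order yields a different
# value, which is why endianness matters whenever machines exchange
# binary data.
print(hex(int.from_bytes(little, "big")))  # 0x78563412
```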

Instruction execution

We can also characterize processors by their instruction execution, a separate concern from the instruction set. A single-issue processor executes one instruction at a time. Although it may have several instructions at different stages of execution, only one instruction can be at any particular stage at a time. Several other types of processors allow multiple-issue execution. A superscalar processor uses specialized logic to identify at run time instructions that can be executed simultaneously. A VLIW processor relies on the compiler to determine which combinations of instructions can be legally executed together. Superscalar processors often use too much energy and are too expensive for widespread use in embedded systems. VLIW processors are often used in high-performance embedded computing.

The set of registers available for use by programs is called the programming model, also known as the programmer model. [The CPU has many other registers that are used for internal operations and are unavailable to programmers.]

Architectures and implementations

There may be several different implementations of an architecture. In fact, the architecture definition serves to define those characteristics that must be true of all implementations and what may vary from implementation to implementation. Different CPUs may offer different clock speeds, different cache configurations, changes to the bus or interrupt lines, and many other changes that can make one model of CPU more attractive than another for any given application.

CPUs and systems

The CPU is only part of a complete computer system. In addition to the memory, we also need I/O devices to build a useful system. We can build a computer from several different chips, but many useful computer systems come on a single chip. A microcontroller is one form of a single-chip computer that includes a processor, memory, and I/O devices. The term microcontroller is usually used to refer to a computer system chip with a relatively small CPU and one that includes some read-only memory for program storage. A system-on-chip generally refers to a larger processor that includes on-chip RAM that is usually supplemented by an off-chip memory.

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9780128053874000029

Tiny computers, hidden control

Tim Wilmshurst, in Designing Embedded Systems with PIC Microcontrollers [Second Edition], 2010

1.3.2 Instruction sets – the Complex Instruction Set Computer and the Reduced Instruction Set Computer

Any CPU has a set of instructions that it recognises and responds to; all programs are built up in one way or another from this instruction set. We want computers to execute code as fast as possible, but how to achieve this aim is not always an obvious matter. One approach is to build sophisticated CPUs with vast instruction sets, with an instruction ready for every foreseeable operation. This leads to the CISC, the Complex Instruction Set Computer. A CISC has many instructions and considerable sophistication. Yet the complexity of the design needed to achieve this tends to lead to slow operation. One characteristic of the CISC approach is that instructions have different levels of complexity. Simple ones can be expressed in a short instruction code, say one byte of data, and execute quickly. Complex ones may need several bytes of code to define them and take a long time to execute.

Another approach is to keep the CPU very simple and have a limited instruction set. This leads to the RISC approach – the Reduced Instruction Set Computer. The instruction set, and hence overall design, is kept simple. This leads to fast operation. One characteristic of the RISC approach is that each instruction is contained within a single binary word. That word must hold all information necessary, including the instruction code itself, as well as any address or data information also needed. A further characteristic, an outcome of the simplicity of the approach, is that every instruction normally takes the same amount of time to execute.
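The single-word encoding can be sketched as bitfield packing; the field widths below [a 6-bit opcode and an 8-bit operand in a 14-bit word, loosely inspired by the mid-range PIC instruction format] and the mnemonic are illustrative assumptions:

```python
OPCODE_BITS = 6
OPERAND_BITS = 8  # together: a 14-bit instruction word (illustrative split)

def encode(opcode, operand):
    """Pack an opcode and operand into one fixed-width instruction word."""
    assert opcode < (1 << OPCODE_BITS) and operand < (1 << OPERAND_BITS)
    return (opcode << OPERAND_BITS) | operand

def decode(word):
    """Recover both fields; every instruction decodes the same way."""
    return word >> OPERAND_BITS, word & ((1 << OPERAND_BITS) - 1)

word = encode(0b000111, 0x2A)  # a hypothetical "ADDWF 0x2A"
print(bin(word))
print(decode(word))  # (7, 42)
```

Because every instruction occupies exactly one word, fetch and decode take the same time for all instructions, which is what makes the uniform execution time described above possible.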

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9781856177504100022

Why does a computer system need instructions to perform a task?

For a computer to know how to do anything, it must be provided with instructions. For example, asking the computer to draw a square requires a set of instructions telling it how to draw the square. In Logo, a user completes this task by giving the computer a short sequence of movement and turn commands.
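The Logo-style idea can be sketched in Python, whose standard turtle module descends from Logo; the function below (a hypothetical helper for this sketch) simply records the forward/turn commands rather than drawing them, so it runs without a display:

```python
# Logo-style instruction list for drawing a square: the computer only
# "knows" how to move forward and turn, so a square must be spelled
# out as repeated forward/turn steps.
def square_instructions(side):
    program = []
    for _ in range(4):
        program.append(("FORWARD", side))
        program.append(("RIGHT", 90))  # four 90-degree turns = 360 degrees
    return program

print(square_instructions(100))

# Python's standard turtle module can execute the same idea directly:
#   import turtle
#   for _ in range(4):
#       turtle.forward(100)
#       turtle.right(90)
```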

What is the purpose of a set of instructions?

An instruction set is a group of commands for a central processing unit [CPU] in machine language. The term can refer to all possible instructions for a CPU or a subset of instructions to enhance its performance in certain situations.

What is a set of instructions given to the computer to perform a task?

A set of instructions given to a computer is called a program. Computer programming refers to the detailed steps of instructions given to a computer in an appropriate computer language, which enable the computer to perform a variety of tasks in sequence or even intermittently.

Is a set of instructions needed for a computer to work it helps to run the computer hardware?

Software is a set of instructions or programs that tells a computer what to do or how to perform a specific task [computer software runs on hardware]. The two main types of software are systems software and application software.
