
Organization of Computer Systems:
§ 3: Computer Arithmetic

Instructor: M.S. Schmalz

Reading Assignments and Exercises

This section is organized as follows:

    3.1. Arithmetic and Logic Operations
    3.2. Arithmetic Logic Units and the MIPS ALU
    3.3. Boolean Multiplication and Division
    3.4. Floating Point Arithmetic
    3.5. Floating Point in MIPS

Information contained herein was compiled from a variety of text- and Web-based sources, is intended as a teaching aid only (to be used in conjunction with the required text), and is not to be used for any commercial purpose. Particular thanks is given to Dr. Enrique Mafla for his permission to use selected illustrations from his course notes in these Web pages.

In order to secure your understanding of the topics in this section, students should review the discussion of number representation in Section 2.4, especially two's complement.

3.1. Arithmetic and Logic Operations

Reading Assignments and Exercises

The ALU is the core of the computer - it performs arithmetic and logic operations on data that not only realize the goals of various applications (e.g., scientific and engineering programs), but also manipulate addresses (e.g., pointer arithmetic). In this section, we will overview algorithms used for the basic arithmetic and logical operations. A key assumption is that two's complement representation will be employed, unless otherwise noted.

3.1.1. Boolean Addition

When adding two numbers, if the sum of the digits in a given position equals or exceeds the modulus, then a carry is propagated. For example, in Boolean addition, if two ones are added, the sum is obviously 2 (base 10), which exceeds the modulus of 2 for Boolean numbers (B = Z2 = {0,1}, the integers modulo 2). Thus, we record a zero for the sum and propagate a carry valued at 1 into the next more significant digit, as shown in Figure 3.1.

Figure 3.1. Example of Boolean addition with carry propagation, adapted from [Maf01].
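As a minimal sketch of this carry rule, the following Python models digit-by-digit Boolean addition (an illustrative model of Figure 3.1, not hardware):

```python
def add_binary(a, b):
    """Add two equal-length bit lists (most significant bit first),
    recording a sum bit and propagating a carry at each position."""
    carry = 0
    result = []
    for ai, bi in zip(reversed(a), reversed(b)):
        s = ai + bi + carry     # 0, 1, 2, or 3
        result.append(s % 2)    # sum digit modulo 2
        carry = s // 2          # carry into the next more significant digit
    result.append(carry)        # final carry out
    return result[::-1]
```

For example, adding 011 (3) and 001 (1) propagates a carry through two positions and yields 0100 (4).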

3.1.2. Boolean Subtraction

When subtracting two numbers, two alternatives present themselves. First, one can formulate a subtraction algorithm, which is distinct from addition. Second, one can negate the subtrahend (i.e., in a - b, the subtrahend is b) then perform addition. Since we already know how to perform addition as well as two's complement negation, the second alternative is more practical. Figure 3.2 illustrates both processes, using the decimal subtraction 12 - 5 = 7 as an example.

Figure 3.2. Example of Boolean subtraction using (a) unsigned binary representation, and (b) addition with two's complement negation - adapted from [Maf01].

Just as we have a carry in addition, the subtraction of Boolean numbers uses a borrow. For example, in Figure 3.2a, in the first (least significant) digit position, the difference 0 - 1 in the one's place is realized by borrowing a 1 from the two's place (next more significant digit). The borrow is propagated upward (toward the most significant digit) until it is zeroed (i.e., until we encounter a difference of 1 - 0).
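The negate-then-add alternative can be sketched as follows (a teaching model using an assumed 8-bit word width):

```python
def twos_complement_sub(a, b, bits=8):
    """Compute a - b by adding the two's complement negation of b."""
    mask = (1 << bits) - 1
    neg_b = (~b + 1) & mask    # invert the bits of b, then add 1
    return (a + neg_b) & mask  # any carry out of the top bit is discarded
```

With 8-bit operands, 12 - 5 yields 7, and 5 - 12 yields 249, which is the 8-bit two's complement encoding of -7.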

3.1.3. Overflow

Overflow occurs when there are insufficient bits in a binary number representation to portray the result of an arithmetic operation. Overflow occurs because computer arithmetic is not closed with respect to addition, subtraction, multiplication, or division. Overflow cannot occur in addition (subtraction), if the operands have different (resp. identical) signs.

To detect and compensate for overflow, one needs n+1 bits if an n-bit number representation is employed. For example, in 32-bit arithmetic, 33 bits are required to detect or compensate for overflow. This can be implemented in addition (subtraction) by letting a carry (borrow) occur into (from) the sign bit. To make a pictorial example of convenient size, Figure 3.3 illustrates the four possible sign combinations of differencing 7 and 6 using a number representation that is four bits long (i.e., can represent integers in the interval [-8,7]).

Figure 3.3. Example of overflow in Boolean arithmetic, adapted from [Maf01].
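The four sign combinations of Figure 3.3 can be checked with a small sketch that tests whether the true sum leaves the representable 4-bit interval:

```python
def add_overflows(a, b, bits=4):
    """Signed overflow test: the true sum falls outside the representable
    interval [-2^(bits-1), 2^(bits-1) - 1]."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return not (lo <= a + b <= hi)
```

With four bits, 7 + 6 and (-7) + (-6) overflow, while 7 + (-6) and (-7) + 6 do not, matching the rule that overflow requires operands of identical sign.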

3.1.4. MIPS Overflow Handling

MIPS raises an exception when overflow occurs. Exceptions (or interrupts) act like procedure calls. The register $epc stores the address of the instruction that caused the interrupt, and the instruction

mfc0 register, $epc

moves the contents of $epc to register. For example, register could be $t1. This is an efficient approach, since no conditional branch is needed to test for overflow.

Ii'south complement arithmetic operations (add together, addi, and sub instructions) raise exceptions on overflow. In contrast, unsigned arithmetics (addu and addiu) instructions do non raise an exception on overflow, since they are used for arithmetic operations on addresses (recall our discussion of pointer arithmetic in Section 2.6). In terms of high-level languages, C ignores overflows (ever uses addu, addiu, and subu), while FORTRAN uses the appropriate pedagogy to detect overflow. Effigy 3.4 illustrates the apply of conditional co-operative on overflow for signed and unsigned improver operations.

Figure 3.4. Example of overflow in Boolean arithmetic, adapted from [Maf01].

3.1.5. Logical Operations

Logical operations apply to fields of bits within a 32-bit word, such as bytes or bit fields (in C, as discussed in the next paragraph). These operations include shift-left and shift-right operations (sll and srl), as well as bitwise and, or (and, andi, or, ori). As we saw in Section 2, bitwise operations treat an operand as a vector of bits and operate on each bit position.

C bit fields are used, for example, in programming communications hardware, where manipulation of a bit stream is required. In Figure 3.5 is presented C code for an example communications routine, where a structure called receiver is formed from an eight-bit field called receivedByte and two one-bit fields called ready and enable. The C routine sets receiver.ready to 0 and receiver.enable to 1.

Figure 3.5. Example of C bit field use in MIPS, adapted from [Maf01].

Note how the MIPS code implements the functionality of the C code, where the state of the registers $s0 and $s1 is illustrated in the five lines of diagrammed register contents below the code. In particular, the initial register state is shown in the first two lines. The sll instruction loads the contents of $s1 (the receiver) into $s0 (the data register), and the result of this is shown on the second line of the register contents. Next, the srl instruction right-shifts $s0 24 bits, thereby discarding the enable and ready field information, leaving just the received byte. To signal the receiver that the data transfer is completed, the andi and ori instructions are used to set the enable and ready bits in $s1, which corresponds to the receiver. The data in $s0 has already been received and put in a register, so there is no need for its further manipulation.
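The shift-and-mask idiom can be modeled in Python. The field layout below (received byte in bits 2-9, ready in bit 1, enable in bit 0) is an assumption for illustration only, since the exact packing appears only in Figure 3.5:

```python
WORD_MASK = 0xFFFFFFFF  # 32-bit word

def extract_byte(word, left_shift):
    """Isolate an 8-bit field: shift it to the top of the word (as sll does),
    then logically shift right 24 bits (as srl does), dropping other fields."""
    return ((word << left_shift) & WORD_MASK) >> 24

def set_bit(word, pos):
    """Set a one-bit field with an or-immediate mask, as ori does."""
    return word | (1 << pos)

def clear_bit(word, pos):
    """Clear a one-bit field by and-ing with a mask, as andi does."""
    return word & (WORD_MASK ^ (1 << pos))
```

Under the assumed layout, shifting left by 22 moves bits 2-9 into the top byte, so the subsequent right shift by 24 recovers the received byte alone.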

3.2. Arithmetic Logic Units and the MIPS ALU

Reading Assignments and Exercises

In this section, we discuss hardware building blocks, ALU design and implementation, as well as the design of a one-bit ALU and a 32-bit ALU. We then overview the implementation of the MIPS ALU.

3.2.1. Basic Concepts of ALU Design

ALUs are implemented using lower-level components such as logic gates, including and, or, and not gates, and multiplexers. These building blocks work with individual bits, but the actual ALU works with 32-bit registers to perform a variety of tasks such as arithmetic and shift operations.

In principle, an ALU is built from 32 separate one-bit ALUs. Typically, one constructs separate hardware blocks for each task (e.g., arithmetic and logical operations), where each operation is applied to the 32-bit registers in parallel, and the selection of an operation is controlled by a multiplexer. The advantage of this approach is that it is easy to add new operations to the instruction set, simply by associating an operation with a multiplexer control code. This can be done provided that the mux has sufficient capacity. Otherwise, new data lines must be added to the mux(es), and the CPU must be modified to accommodate these changes.

3.2.2. One-bit ALU Design

As a result, the ALU consists of 32 muxes (one for each output bit) arranged in parallel to send output bits from each operation to the ALU output.

3.2.2.1. And/Or Operations. As shown in Figure 3.6, a simple (one-bit) ALU operates in parallel, producing all possible results that are then selected by the multiplexer (represented by an oval shape at the output of the and/or gates). The output C is thus selected by the multiplexer. (Note: If the multiplexer were to be applied at the input(s) rather than the output, twice the amount of hardware would be required, because there are two inputs versus one output.)

Figure 3.6. Example of a simple one-bit ALU, where the oval represents a multiplexer with a control code denoted by Op and an output denoted by C - adapted from [Maf01].

3.2.2.2. Full Adder. Now let us consider the one-bit adder. Recalling the carry situation shown in Figure 3.1, we show in Figure 3.7 that there are two types of carries - carry in (occurs at the input) and carry out (at the output).

Figure 3.7. Carry-in and carry-out in Boolean addition, adapted from [Maf01].

Here, each bit of addition has three input bits (Ai, Bi, and CarryIni), as well as two output bits (Sumi, CarryOuti), where CarryIni+1 = CarryOuti. (Note: The "i" subscript denotes the i-th bit.) This relationship can be seen when considering the full adder's truth table, shown below:

Given the four one-valued results in the truth table, we can use the sum-of-products method to construct a one-bit adder circuit from four three-input and gates and one four-input or gate, as shown in Figure 3.8a. The CarryOut calculation can be similarly implemented with three two-input and gates and one three-input or gate, as shown in Figure 3.8b. These two circuits can be combined to effect a one-bit full adder with carry, as shown in Figure 3.8c.

(a)                                                                         (b)

(c)

Figure 3.8. Full adder circuit (a) sum-of-products form from above-listed truth table, (b) CarryOut production, and (c) one-bit full adder with carry - adapted from [Maf01].
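The sum-of-products construction can be written out directly as a sketch, with each Python term standing in for one gate of Figure 3.8:

```python
def full_adder(a, b, carry_in):
    """Sum via four 3-input and gates feeding a 4-input or gate (Fig. 3.8a);
    CarryOut via three 2-input and gates and a 3-input or gate (Fig. 3.8b)."""
    na, nb, nc = 1 - a, 1 - b, 1 - carry_in   # inverted inputs
    s = (a & nb & nc) | (na & b & nc) | (na & nb & carry_in) | (a & b & carry_in)
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out
```

Each of the four product terms in s corresponds to one of the four one-valued rows of the truth table.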

Recalling the symbol for the one-bit adder, we can add an addition operation to the one-bit ALU shown in Figure 3.6. This is done by putting two control lines on the output mux, and by having an additional control line that inverts the b input (shown as "Binvert" in Figure 3.9).

(a)                                                             (b)

Figure 3.9. One-bit ALU with three operations: and, or, and addition: (a) Least significant bit, (b) Remaining bits - adapted from [Maf01].

3.2.3. 32-bit ALU Design

The final implementation of the preceding technique is in a 32-bit ALU that incorporates the and, or, and addition operations. The 32-bit ALU can be simply constructed from the one-bit ALU by chaining the carry bits, such that CarryIni+1 = CarryOuti, as shown in Figure 3.10.

Figure 3.10. 32-bit ALU with three operations: and, or, and addition - adapted from [Maf01].

This yields a composite ALU with two 32-bit input vectors a and b, whose i-th bits are denoted by ai and bi, where i = 0..31. The result is also a 32-bit vector, and there are two control buses - one for Binvert, and one for selecting the operation (using the mux shown in Figure 3.9). There is one CarryOut bit (at the bottom of Figure 3.10), and no CarryIn.
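A behavioral sketch of this chaining (modeling the ripple-carry data path, not the gates themselves):

```python
def ripple_add(a, b, bits=32):
    """Chain one-bit full adders so that CarryIn[i+1] = CarryOut[i],
    as in Figure 3.10; returns (sum vector, final CarryOut)."""
    carry = 0             # the composite ALU has no CarryIn
    result = 0
    for i in range(bits):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return result, carry  # carry is the single CarryOut bit
```

The loop makes the O(N) sequential carry propagation explicit, which is revisited in the performance discussion of Section 3.2.5.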

We next examine the MIPS ALU and how it supports operations such as shifting and branching.

3.2.4. MIPS ALU Design

We begin by assuming that we have the generic one-bit ALU designed in Sections 3.2.1-3.2.3, and shown below:

Here, the Bnegate input is the same as the Binvert input in Figure 3.9, and we assume that we have three control inputs to the mux whose control line configuration is associated with an operation, as follows:

3.2.4.1. Support for the slt Instruction. The slt instruction (set on less-than) has the following format:

slt rd, rs, rt

where rd = 1 if rs < rt, and rd = 0 otherwise.

Observe that the inputs rs and rt can represent high-level language input variables A and B. Thus, we have the following implication:

A < B => A - B < 0 ,

which is implemented as follows:

    Step 1. Perform subtraction using negation and a full adder

    Step 2. Check the most significant bit (sign bit)

    Step 3. The sign bit tells us whether or not A < B

To implement slt, we need (a) a new input line called Less that goes directly to the mux, and (b) a new control code (111) to select the slt operation. Unfortunately, the result for slt cannot be taken directly as the output from the adder. Instead, we need a new output line called Set that is used only for the slt instruction. Overflow detection logic is also associated with this bit. The additional logic that supports slt is shown in Figure 3.11.

Figure 3.11. One-bit ALU with additional logic for slt operation - adapted from [Maf01].

Thus, for a 32-bit ALU, the additional cost of the slt instruction is (a) augmentation of each of 32 muxes to have three control lines instead of two, (b) augmentation of each of 32 one-bit ALU's control signal structure to have an additional (Less) input, and (c) the addition of overflow detection circuitry, a Set output, and an xor gate on the output of the sign bit.
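The slt data path (sign bit of the subtraction, corrected by the xor gate with the overflow flag) can be modeled as a sketch; the 4-bit width in the examples is chosen only to keep values small:

```python
def slt(rs, rt, bits=32):
    """slt result: the sign bit of rs - rt, xor-ed with the overflow flag
    of that subtraction (the xor gate on the sign-bit output)."""
    mask = (1 << bits) - 1
    sign = lambda v: (v >> (bits - 1)) & 1
    diff = (rs - rt) & mask
    # overflow: operands of different signs, and the difference's sign
    # differs from the sign of rs
    overflow = sign(rs) != sign(rt) and sign(diff) != sign(rs)
    return sign(diff) ^ overflow
```

Without the xor correction, comparing 7 with -6 in four bits would give the wrong answer, because the subtraction 7 - (-6) overflows and flips the apparent sign.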

three.2.4.two. Back up for the bne Teaching. Recall the branch-on-non-equal instruction bne r1, r2, Label, where r1 and r2 denote registers and Label is a branch target label or address. To implement bne, we notice that the following implication holds:

A - B = 0 => A = B .

then add hardware to test if the comparison between A and B, implemented as (A - B), is zero. Again, this can be done using negation and the full adder that we have already designed as part of the ALU. The additional step is to or all 32 results from each of the one-bit ALUs, then invert the output of the or operation. Thus, if all 32 bits from the one-bit full adders are zero, then the output of the or gate will be zero (inverted, it will be one). Otherwise, the output of the or gate will be one (inverted, it will be zero). We also need to consider A - B, to see if there is overflow when A = 0. A block diagram of the hardware modification is shown in Figure 3.12.

Figure 3.12. 32-bit ALU with additional logic to support bne and slt instructions - adapted from [Maf01].

Here, the additional hardware involves 32 separate output lines from the 32 one-bit adders, as well as a cascade of or gates to implement a 32-input nor gate (which doesn't exist in practice, due to excessive fan-in requirement).
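The or-then-invert zero test can be sketched as follows (a behavioral model of the nor cascade, not the gate network):

```python
def branch_not_equal(a, b, bits=32):
    """bne support: or together all result bits of a - b, then invert."""
    mask = (1 << bits) - 1
    diff = (a - b) & mask
    any_set = 0
    for i in range(bits):      # the or-gate cascade over 32 result lines
        any_set |= (diff >> i) & 1
    zero = any_set ^ 1         # inverted: 1 only when a - b == 0
    return zero == 0           # branch is taken when the operands differ
```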

three.2.4.3. Support for Shift Instructions. Considering the sll, srl, and sra instructions, these are supported in the ALU under design by adding a information line for the shifter (both left and right). Notwithstanding, the shifters are much more easily implemented at the transistor level (due east.yard., exterior the ALU) rather than trying to fit more circuitry onto the ALU itself.

In order to implement a shifter external to the ALU, we consider the design of a barrel shifter, shown schematically in Figure 3.13. Here, the closed switch pattern, denoted by black filled circles, is controlled by the CPU through control lines to a mux or decoder. This allows data line xi to be sent to output xj, where i and j can be unequal.

Figure 3.13. Four-bit barrel shifter, where "x >> 1" denotes a shift amount greater than 1 - adapted from [Maf01].

This type of N-bit shifter is well understood and easy to construct, but has space complexity of O(N^2).

3.2.4.4. Support for Immediate Instructions. In the MIPS immediate instruction formats, the first input to the ALU is the first register (we'll call it rs) in the immediate instruction, while the second input is either data from a register rt or a zero- or sign-extended constant (immediate). To support this type of instruction, we need to add a mux at the second input of the ALU, as shown in Figure 3.14. This allows us to select whether rt or the sign-extended immediate is input to the ALU.

Figure 3.14. Supporting immediate instructions on a MIPS ALU design, where IR denotes the instruction register, and (/16) denotes a 16-bit parallel bus - adapted from [Maf01].

3.2.5. ALU Performance Issues

When estimating or measuring ALU performance, one wonders if a 32-bit ALU is as fast as a one-bit ALU - what is the degree of parallelism, and do all operations execute in parallel? In practice, some operations on N-bit operands (e.g., addition with sequential propagation of carries) take O(N) time. Other operations, such as bitwise logical operations, take O(1) time. Since addition can be implemented in a variety of ways, each with a certain level of parallelism, it is wise to consider the possibility of a full adder being a computational bottleneck in a simple ALU.

We previously discussed the ripple-carry adder (Figure 3.10) that propagates the carry bit from stage i to stage i+1. It is readily seen that, for an N-bit input, O(N) time is required to propagate the carry to the most significant bit. In contrast, the fastest N-bit adder uses O(log2 N) stages in a tree-structured configuration with N-1 one-bit adders. Thus, the complexity of this technique is O(log2 N) work. In a sequential model of computation, this translates to O(log2 N) time. If one is adding smaller numbers (e.g., up to 10-bit integers with current memory technology), then a lookup table can be used that (1) forms a memory address A by concatenating binary representations of the two operands, and (2) produces a result stored in memory that is accessed using A. This takes O(1) time, that is dependent upon memory bandwidth.

An intermediate approach between these extremes is to use a carry-lookahead adder (CLA). Suppose we do not know the value of the carry-in bit (which is usually the case). We can express the generation (g) of a carry bit for the i-th position of two operands a and b, as follows:

gi = ai bi ,

where the i-th bits of a and b are and-ed. Similarly, the propagated carry is expressed as:

pi = ai + bi ,

where the i-th bits of a and b are or-ed. This allows us to recursively express the carry bits in terms of the carry-in c0, as follows:

Did we get rid of the ripple? (Well, sort of...) What we did was transform the work involved in carry propagation from the adder circuitry to a large equation for cN. However, this equation must still be computed in hardware. (Lesson: In computing, you don't get much for free.)
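The recursion c(i+1) = gi + pi·ci can be unrolled in a short sketch that produces every carry directly from the operand bits and c0:

```python
def cla_carries(a, b, c0=0, bits=4):
    """Expand c_{i+1} = g_i | (p_i & c_i), with g_i = a_i & b_i (generate)
    and p_i = a_i | b_i (propagate), to obtain all carries from c0."""
    carries = [c0]
    for i in range(bits):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        g = ai & bi            # generate: this position creates a carry
        p = ai | bi            # propagate: this position passes one along
        carries.append(g | (p & carries[i]))
    return carries             # [c0, c1, ..., c_bits]
```

In hardware, the loop is flattened into one two-level equation per carry, which is where the cost of the CLA circuit comes from.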

Unfortunately, it is prohibitively costly to build a CLA circuit for operands as large as 16 bits. Instead, we can use the CLA principle to create a two-tiered circuit, for example, at the bottom level an array of four 4-bit full adders (economical to construct), connected at the top level by a CLA, as shown below:

Using a two-level CLA architecture, where lower- (upper-) case g and p denote the first (second) level generates and propagates, we have the following equations:

      P0 = p3 p2 p1 p0
      P1 = p7 p6 p5 p4
      P2 = p11 p10 p9 p8
      P3 = p15 p14 p13 p12

      G0 = g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0
      G1 = g7 + p7 g6 + p7 p6 g5 + p7 p6 p5 g4
      G2 = g11 + p11 g10 + p11 p10 g9 + p11 p10 p9 g8
      G3 = g15 + p15 g14 + p15 p14 g13 + p15 p14 p13 g12

Assuming that and as well as or gates have the same propagation delay, comparative analysis of the ripple carry vs. carry lookahead adders reveals that the total time to compute a CLA result is the summation of all gate delays along the longest path through the CLA. In the case of the 16-bit adder exemplified above, the CarryOut signals c16 and C4 define the longest path. For the ripple carry adder, this path has length 2(16) = 32.

For the two-level CLA, we have two levels of logic in terms of the architecture (P and G versus p and g). Pi is specified in one level of logic using pi. Gi is specified in one level of logic using pi and gi. Also, pi and gi each represent one level of logic computed in terms of inputs ai and bi. Thus, the CLA critical path length is 2 + 2 + 1 = 5, which means that the two-level 16-bit CLA is 6.4 = 32/5 times faster than a 16-bit ripple carry adder.

It is also useful to note that the logic equation for a one-bit adder can be expressed more simply with xor logic, for example:

A + B = A xor B xor CarryIn .

In some technologies, xor is more efficient than and/or gates. Also, processors are now designed in CMOS technology, which allows fewer muxes (this also applies to the barrel shifter). However, the design principles are similar.

3.2.6. Summary

We have shown that it is feasible to build an ALU to support the MIPS ISA. The key idea is to use a multiplexer to select the output from a collection of functional units operating in parallel. We can replicate a one-bit ALU that uses this principle, with appropriate connections between replicates, to produce an N-bit ALU.

Important things to remember about ALUs are: (a) all of the gates are working in parallel, (b) the speed of a gate is affected by the number of inputs (degree of fan-in), and (c) the speed of a circuit depends on the number of gates in the longest computational path through the circuit (this can vary per operation). Finally, we have shown that changes in architectural organization can improve performance, similar to better algorithms in software.

3.3. Boolean Multiplication and Division

Reading Assignments and Exercises

Multiplication is more complicated than addition, being implemented by shifting as well as addition. Because of the partial products involved in most multiplication algorithms, more time and more circuit area is required to compute, allocate, and sum the partial products to obtain the multiplication result.

3.3.1. Multiplier Design

We herein discuss three versions of the multiplier design based on the pencil-and-paper algorithm for multiplication that we all learned in grade school, which operates on Boolean numbers, as follows:

          Multiplicand:   0010   # Stored in register r1
          Multiplier:   x 1101   # Stored in register r2
          --------------------
          Partial Prod    0010   # No shift for LSB of Multiplier
             "      "    0000    # 1-bit shift of zeroes (can omit)
             "      "   0010     # 2-bit shift for bit 2 of Multiplier
             "      "  0010      # 3-bit shift for bit 3 of Multiplier
          --------------------   # Zero-fill the partial products and add
          PRODUCT      0011010   # Sum of all partial products -> r3

A flowchart of this algorithm, adapted for multiplication of 32-bit numbers, is shown in Figure 3.15, below, together with a schematic representation of a simple ALU circuit that implements this version of the algorithm. Here, the multiplier and the multiplicand are shifted relative to each other, which is more efficient than shifting the partial products alone.


(a)

(b)
Figure 3.15. Pencil-and-paper multiplication of 32-bit Boolean number representations: (a) algorithm, and (b) simple ALU circuitry - adapted from [Maf01].

The second version of this algorithm is shown in Figure 3.16. Here, the product is shifted with respect to the multiplier, and the multiplicand is shifted after the product register has been shifted. A 64-bit register is used to store both the multiplicand and the product.


(a)

(b)
Figure 3.16. Second version of pencil-and-paper multiplication of 32-bit Boolean number representations: (a) algorithm, and (b) schematic diagram of ALU circuitry - adapted from [Maf01].

The third version puts results in the product register if and only if the least significant bit of the product produced on the previous iteration is one-valued. Only the product register is shifted. This reduces by approximately 50 percent the amount of shifting that has to be done, which reduces time and hardware requirements. The algorithm and ALU schematic diagram is shown in Figure 3.17.


(a)

(b)
Figure 3.17. Third version of pencil-and-paper multiplication of 32-bit Boolean number representations: (a) algorithm, and (b) schematic diagram of ALU circuitry - adapted from [Maf01].

Thus, we have the following shift-and-add scheme for multiplication:
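The third version of the scheme can be sketched in a few lines (a behavioral model using a 4-bit width to match the example above):

```python
def shift_add_multiply(multiplicand, multiplier, bits=4):
    """Third-version multiplier (Figure 3.17): test the low product bit,
    conditionally add the multiplicand into the upper half, then shift
    only the product register right."""
    product = multiplier                     # low half initially holds the multiplier
    for _ in range(bits):
        if product & 1:                      # LSB decides whether to add
            product += multiplicand << bits  # add into the upper half
        product >>= 1                        # only the product register shifts
    return product
```

Multiplying 0010 by 1101 reproduces the pencil-and-paper result 0011010 (decimal 26).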

The preceding algorithms and circuitry do not hold for signed multiplication, since the bits of the multiplier no longer correspond to shifts of the multiplicand. The following example is illustrative:

A solution to this problem is Booth's Algorithm, whose flowchart and corresponding schematic hardware diagram are shown in Figure 3.18. Here, the examination of the multiplier is performed with lookahead toward the next bit. Depending on the bit configuration, the multiplicand is positively or negatively signed, and the multiplier is shifted or unshifted.


(a)

(b)
Figure 3.18. Booth's procedure for multiplication of 32-bit Boolean number representations: (a) algorithm, and (b) schematic diagram of ALU circuitry - adapted from [Maf01].

Observe that Booth's algorithm requires only the addition of a subtraction step and the comparison operations for the two-bit codes, versus the one-bit comparison in the preceding three algorithms. An example of Booth's algorithm follows:

Here, N = 4 iterations of the loop are required to produce a product from two N = 4 digit operands. Four shifts and two subtractions are required. From the analysis of the algorithm shown in Figure 3.18a, it is easily seen that the maximum work for multiplying two N-bit numbers is given by O(N) shift and addition operations. From this, the worst-case computation time can be computed given CPI for the shift and addition instructions, as well as cycle time of the ALU.
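A behavioral sketch of Booth's procedure (register naming A, Q, Q_1 is a common textbook convention assumed here, not taken from Figure 3.18):

```python
def booth_multiply(m, r, bits=4):
    """Booth's algorithm: inspect the pair (Q0, Q_1); 01 adds the multiplicand
    into the upper half A, 10 subtracts it; then arithmetic-shift [A, Q, Q_1]."""
    n = bits
    mask = (1 << n) - 1
    M = m & mask                       # multiplicand, two's complement
    A, Q, Q_1 = 0, r & mask, 0         # upper half, multiplier, lookahead bit
    for _ in range(n):
        pair = ((Q & 1) << 1) | Q_1
        if pair == 0b01:
            A = (A + M) & mask
        elif pair == 0b10:
            A = (A - M) & mask
        # arithmetic right shift of the combined register [A, Q, Q_1]
        Q_1 = Q & 1
        Q = ((Q >> 1) | ((A & 1) << (n - 1))) & mask
        A = ((A >> 1) | (A & (1 << (n - 1)))) & mask
    product = (A << n) | Q
    if product >> (2 * n - 1):         # reinterpret as a signed 2n-bit value
        product -= 1 << (2 * n)
    return product
```

The 01 and 10 cases mark the trailing and leading edges of runs of ones in the multiplier, which is why signed operands are handled correctly.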

3.3.2. Design of Arithmetic Division Hardware

Division is a similar operation to multiplication, especially when implemented using a procedure similar to the algorithm shown in Figure 3.18a. For example, consider the pencil-and-paper method for dividing the byte 10010011 by the nybble 1011:

The governing equation is as follows:

Dividend = Quotient · Divisor + Remainder .

3.3.2.1. Unsigned Division. The unsigned division algorithm that is similar to Booth's algorithm is shown in Figure 3.19a, with an example shown in Figure 3.19b. The ALU schematic diagram is given in Figure 3.19c. The analysis of the algorithm and circuit is very similar to the preceding discussion of Booth's algorithm.


(a)

(b)

(c)
Figure 3.19. Division of 32-bit Boolean number representations: (a) algorithm, (b) example using division of the unsigned integer 7 by the unsigned integer 3, and (c) schematic diagram of ALU circuitry - adapted from [Maf01].
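The behavior of the unsigned algorithm can be sketched as restoring division; this models the trial-subtract-and-shift idea, not the exact register layout of Figure 3.19c:

```python
def divide_unsigned(dividend, divisor, bits=8):
    """Restoring division: shift the next dividend bit into the remainder,
    trial-subtract the divisor, and set the quotient bit when it fits."""
    assert divisor != 0                # division by zero must be caught in software
    remainder, quotient = 0, 0
    for i in range(bits - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder >= divisor:       # trial subtraction succeeds
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder
```

Dividing 7 by 3 gives quotient 2 and remainder 1, matching Figure 3.19b, and dividing the byte 10010011 by the nybble 1011 gives quotient 13 and remainder 4.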

3.3.2.2. Signed Division. With signed division, we negate the quotient if the signs of the divisor and dividend disagree. The remainder and the dividend must have the same signs. The governing equation is as follows:

Remainder = Dividend - (Quotient · Divisor) ,

and the following four cases apply:

We present the preceding division algorithm, revised for signed numbers, as shown in Figure 3.20a. Four examples, corresponding to each of the four preceding sign permutations, are given in Figures 3.20b and 3.20c.

(a)

(b)

(c)
Figure 3.20. Division of 32-bit Boolean number representations: (a) algorithm, and (b,c) examples using division of +7 or -7 by the integer +3 or -3; adapted from [Maf01].

Self-Exercise. Be able to trace each example shown in Figures 3.20b,c through the algorithm whose flowchart is given in Figure 3.20a. Know how each part of the algorithm works, and why it behaves that way. Hint: This exercise, or a part of it, is likely to be an exam question.

3.3.2.3. Division in MIPS. MIPS supports multiplication and division using existing hardware, primarily the ALU and shifter. MIPS needs one extra hardware component - a 64-bit register able to support sll and sra instructions. The upper (high) 32 bits of the register contain the remainder resulting from division. This is moved into a register in the MIPS register stack (e.g., $t0) by the mfhi command. The lower 32 bits of the 64-bit register contain the quotient resulting from division. This is moved into a register in the MIPS register stack by the mflo command.

In MIPS assembly language code, signed division is supported by the div instruction and unsigned division, by the divu instruction. MIPS hardware does not check for division by zero. Thus, a divide-by-zero exception must be detected and handled in system software. A similar comment holds for overflow or underflow resulting from division.

Figure 3.21 illustrates the MIPS ALU that supports integer arithmetic operations (+,-,x,/).

Figure 3.21. MIPS ALU supporting the integer arithmetic operations (+,-,x,/), adapted from [Maf01].

Self-Exercise. Show how the MIPS ALU in Figure 3.21 supports the integer arithmetic operations (+,-,x,/) using the algorithms and hardware diagrams given thus far. Hint: This exercise, or a part of it, is likely to be an exam question.

3.4. Floating Point Arithmetic

Reading Assignments and Exercises

Floating point (FP) representations of decimal numbers are essential to scientific computation using scientific notation. The standard for floating point representation is the IEEE 754 Standard. In a computer, there is a tradeoff between range and precision - given a fixed number of binary digits (bits), precision can vary inversely with range. In this section, we overview decimal to FP conversion, MIPS FP instructions, and how registers are used for FP computations.

We have seen that an n-bit register can represent unsigned integers in the range 0 to 2^n - 1, as well as signed integers in the range -2^(n-1) to 2^(n-1) - 1. However, there are very large numbers (e.g., 3.15576 · 10^23), very small numbers (e.g., 10^-25), rational numbers with repeated digits (e.g., 2/3 = 0.666666...), irrationals such as 2^(1/2), and transcendental numbers such as e = 2.718..., all of which need to be represented in computers for scientific computation to be supported.

We call the manipulation of these types of numbers floating point arithmetic because the decimal point is not fixed (as for integers). In C, such variables are declared as the float datatype.

3.4.1. Scientific Notation and FP Representation

Scientific notation has the following configuration:

and can be in normalized form (mantissa has exactly one digit to the left of the decimal point, e.g., 2.3425 · 10^-19) or non-normalized form. Binary scientific notation has the following configuration, which corresponds to the decimal forms:

Assume that we have the following normal format for scientific notation in Boolean numbers:

±1.xxxxxxx_2 · 2^yyyyy_2 ,

where "xxxxxxx" denotes the significand and "yyyyy" denotes the exponent, and we assume that the number has sign S. This implies the following 32-bit representation for FP numbers:

which can represent decimal numbers ranging from -2.0 · 10^-38 to 2.0 · 10^38.
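The field layout above can be verified programmatically. As an illustration (in Python rather than C, since Python's struct module conveniently exposes the raw IEEE 754 encoding; the helper name fp32_fields is ours), the following sketch unpacks a single-precision value into its sign, exponent, and significand fields:

```python
import struct

def fp32_fields(x):
    # Re-encode x as a 32-bit IEEE 754 single and split out the three fields.
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF      # 23 bits; the leading 1 is implicit
    return sign, exponent, fraction

print(fp32_fields(1.0))   # (0, 127, 0): 1.0 = +1.0_2 * 2^(127 - 127)
print(fp32_fields(-2.0))  # (1, 128, 0): -2.0 = -1.0_2 * 2^(128 - 127)
```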

3.4.2. Overflow and Underflow

In FP, overflow and underflow are slightly different than in integer arithmetic. FP overflow (underflow) refers to the positive (negative) exponent being too large for the number of bits allotted to it. This problem can be somewhat ameliorated by the use of double precision, whose format is shown as follows:

Here, two 32-bit words are combined to support an 11-bit signed exponent and a 52-bit significand. This representation is declared in C using the double datatype, and can support numbers with decimal exponents ranging from -308 to 308. The main advantage is greater precision in the mantissa.
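These limits can be observed directly. A small sketch (Python for illustration; Python floats are IEEE 754 doubles, so the limits shown are the double-precision ones):

```python
import math

# Overflow: the largest finite double is just under 2^1024 (about 1.8 * 10^308).
x = math.ldexp(1.0, 1023)   # 2^1023, still representable
print(math.isinf(x * 2.0))  # True: 2^1024 overflows to +infinity

# Underflow: values below about 10^-308 fall into the denormalized range,
# and anything below 2^-1074 underflows to zero.
y = math.ldexp(1.0, -1074)  # smallest positive (denormalized) double
print(y > 0.0)              # True
print(y / 2.0)              # 0.0: underflow past the smallest denorm
```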

The following chart illustrates specific types of overflow and underflow encountered in standard FP representation:

3.4.3. IEEE 754 Standard

Both single- and double-precision FP representations are supported by the IEEE 754 Standard, which has been used in the vast majority of computers since its publication in 1985. IEEE 754 facilitates the porting of FP programs, and ensures minimum standards of quality for FP computer arithmetic. The result is a signed representation: the sign bit is 1 if the FP number represented by IEEE 754 is negative, and 0 otherwise. A leading value of 1 in the significand is implicit for normalized numbers. Thus, the significand occupies 23 + 1 bits in single-precision FP and 52 + 1 bits in double precision, where the stored fraction always has a value between zero and one. Zero is represented by a zero significand and a zero exponent - there is no leading value of one in the significand. The IEEE 754 representation is thus computed as:

FP_number = (-1)^S · (1 + Significand) · 2^Exponent .

As a parenthetical note, the significand can be translated into decimal values via the following expansion:

With IEEE 754, it is possible to manipulate FP numbers without having special-purpose FP hardware. For example, consider the sorting of FP numbers. IEEE 754 facilitates breaking FP numbers up into three parts (sign, significand, exponent). The numbers to be sorted are ordered first according to sign (negative < positive), second according to exponent (larger exponent => larger number), and third according to significand (when at least two numbers have the same exponent).
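This ordering property means FP numbers can be sorted using only integer comparisons on their bit patterns. A sketch (Python for illustration; the sign-bit transformation is a well-known idiom rather than part of the standard, and sort_key is our own name):

```python
import struct

def sort_key(x):
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    # Non-negative values: set the sign bit so they order above all negatives.
    # Negative values: invert all bits so more-negative patterns order lower.
    return bits ^ 0xFFFFFFFF if bits & 0x80000000 else bits | 0x80000000

values = [3.5, -1.25, 0.0, -7.0, 2.0]
print(sorted(values, key=sort_key))  # [-7.0, -1.25, 0.0, 2.0, 3.5]
```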

Another issue of interest in IEEE 754 is biased notation for exponents. Observe that twos complement notation does not work for exponents: the largest negative (positive) exponent is 00000001_2 (11111111_2). Thus, we must add a bias term to the exponent to center the range of exponents on the bias number, which is then equated to zero. The bias term is 127 (1023) for the IEEE 754 single-precision (double-precision) representation. This implies that

FP_number = (-1)^S · (1 + Significand) · 2^(Exponent - Bias) .

As a result, we have the following example of binary-to-decimal floating point conversion:

Decimal-to-binary FP conversion is somewhat more difficult. Three cases pertain: (1) the decimal number can be expressed as a fraction n/d where d is a power of 2; (2) the decimal number has repeated digits (e.g., 0.33333); or (3) the decimal number does not fit either Case 1 or Case 2. In Case 1, one selects the exponent as -log2(d) and converts n to binary notation. Case 3 is more difficult, and will not be discussed here. Case 2 is exemplified in the following diagram:

Here, the significand is 101 0101 0101 0101 0101 0101, the sign is negative (representation = 1), and the exponent is computed as 1 + 127 = 128_10 = 1000 0000_2. This yields the following representation in IEEE 754 standard notation:
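Case 1 can also be sketched concretely. In this illustration (Python; the helper case1_to_binary is hypothetical), 5/8 has d = 8 = 2^3, so the exponent is -log2(8) = -3 and n = 5 = 101_2:

```python
import math

def case1_to_binary(n, d):
    # Case 1: d is a power of two, so n/d = n * 2^(-log2(d)) exactly.
    assert d > 0 and d & (d - 1) == 0, "d must be a power of 2"
    exponent = -int(math.log2(d))
    return bin(n)[2:], exponent   # binary digits of n, unnormalized exponent

digits, exp = case1_to_binary(5, 8)
print(f"{digits}_2 x 2^{exp}")      # 101_2 x 2^-3
print(int(digits, 2) * 2.0 ** exp)  # 0.625, with no representation error
```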

The following table summarizes special values that can be represented using the IEEE 754 standard.

Table 3.1. Special values in the IEEE 754 standard.

Of particular interest in the preceding table is the NaN (not a number) representation. For example, when taking the square root of a negative number, or when dividing zero by zero, we encounter operations that are undefined in arithmetic over the real numbers. These results are called NaNs and are represented with an exponent of 255 and a nonzero significand. NaNs can help with debugging, but they contaminate calculations (e.g., NaN + x = NaN). The recommended approach to NaNs, especially for software designers or engineers early in their respective careers, is not to use NaNs.
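NaN contamination is easy to demonstrate. A Python sketch (note that Python raises exceptions for 0.0/0.0 and math.sqrt(-1.0) instead of silently producing a NaN, so one is constructed with float('nan')):

```python
import math

nan = float('nan')             # the value 0/0 or sqrt(-1) would yield in IEEE 754
print(math.isnan(nan + 10.0))  # True: NaN + x = NaN, contaminating the result
print(nan == nan)              # False: NaN compares unequal even to itself,
                               # so detect it with isnan(), never with ==
```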

Another variant of FP representation is denormalized numbers, also called denorms. These number representations were developed to remedy the problem of a gap among representable FP numbers near zero. For example, the smallest positive normalized number is x = 1.0...0_2 · 2^-127, and the second smallest positive number is y = 1.0...01_2 · 2^-127 = 2^-127 + 2^-150. This implies that the gap between zero and x is 2^-127 and that the gap between x and y is 2^-150, as shown in Figure 3.22a.

(a)                                                                          (b)

Figure 3.22. Denorms: (a) gap between zero and 2^-127, and (b) denorms close this gap - adapted from [Maf01].

This situation can be remedied by omitting the leading 1 from the significand, thereby denormalizing the FP representation. The smallest positive number is now the denorm 0.0...1_2 · 2^-127 = 2^-150, and the second smallest positive number is 2^-149.
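The same gap-filling behavior can be observed in hardware doubles (Python floats), where the smallest normalized number is 2^-1022 and denorms extend the representable range down to 2^-1074:

```python
import math, sys

smallest_normal = sys.float_info.min      # 2^-1022, smallest normalized double
smallest_denorm = math.ldexp(1.0, -1074)  # 2^-1074, smallest denorm
print(0.0 < smallest_denorm < smallest_normal)  # True: denorms fill the gap
# Without denorms, the next representable value below 2^-1022 would be 0;
# with them, the spacing near zero shrinks to a uniform 2^-1074.
print(smallest_normal - smallest_denorm < smallest_normal)  # True
```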

3.4.4. FP Arithmetic

Applying mathematical operations to real numbers implies that some error will occur due to the floating point representation. For example, FP addition and subtraction are not associative, because the FP representation is only an approximation to a real number.

Example 1. Using decimal numbers for clarity, let x = -1.5 · 10^38, y = 1.5 · 10^38, and z = 1.0. With floating point representation, we have:

x + (y + z) = -1.5 · 10^38 + (1.5 · 10^38 + 1.0) = 0.0

and

(x + y) + z = (-1.5 · 10^38 + 1.5 · 10^38) + 1.0 = 1.0

The difference occurs because the value 1.0 cannot be distinguished in the significand of 1.5 · 10^38, due to insufficient precision (number of digits) in the significand of the FP representation of these numbers (IEEE 754 assumed).
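Example 1 can be reproduced directly in Python (whose floats are IEEE 754 doubles, which likewise lack the precision to distinguish 1.5 · 10^38 + 1.0 from 1.5 · 10^38):

```python
x, y, z = -1.5e38, 1.5e38, 1.0

print(x + (y + z))  # 0.0: the 1.0 is lost when rounded into y's significand
print((x + y) + z)  # 1.0: the large terms cancel exactly before z is added
```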

The preceding example leads to several implementational issues in FP arithmetic. Firstly, rounding occurs when performing math on real numbers, due to lack of sufficient precision. For example, when multiplying two N-bit numbers, a 2N-bit product results. Since only the upper N bits of the 2N-bit product are retained, the lower N bits are truncated. This is also called rounding toward zero.

Another type of rounding is called rounding to infinity. Here, if rounding toward +infinity, then we always round up: for example, 2.001 is rounded up to 3, and -2.001 is rounded up to -2. Conversely, if rounding toward -infinity, then we always round down: for example, 1.999 is rounded down to 1, and -1.999 is rounded down to -2. There is also the more familiar technique of rounding to nearest, where, for example, 3.7 is rounded to 4 and 3.1 is rounded to 3. In this case, ties at n.5 are resolved by rounding to the nearest even number, e.g., 3.5 is rounded to 4, and -2.5 is rounded to -2.
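Round-to-nearest-even is the default tie-breaking rule in IEEE 754 arithmetic, and Python's built-in round() follows the same rule, which makes it easy to demonstrate:

```python
print(round(3.7))   # 4: ordinary round to nearest
print(round(3.5))   # 4: tie resolved toward the even neighbor
print(round(2.5))   # 2: tie rounds down here, since 2 is even
print(round(-2.5))  # -2
```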

A second implementational issue in FP arithmetic is addition and subtraction of numbers that have nonzero significands and exponents. Unlike integer addition, we can't simply add the significands. Instead, one must:

  1. Denormalize the operands and shift one of the operands to make the exponents of both numbers equal (we denote the exponent by E).
  2. Add or subtract the significands to get the resulting significand.
  3. Normalize the resulting significand and change E to reflect any shifts incurred by normalization.
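The three steps above can be sketched with integer significands (Python for illustration; fp_add is a simplified model of ours, assuming positive finite inputs and truncating shifted-out bits rather than rounding them):

```python
import math

def fp_add(a, b):
    # Decompose each operand into a significand in [0.5, 1) and an exponent.
    (ma, ea), (mb, eb) = math.frexp(a), math.frexp(b)
    sa, sb = int(ma * 2**53), int(mb * 2**53)  # 53-bit integer significands
    # Step 1: equalize exponents, shifting the smaller operand's significand.
    if ea < eb:
        sa, ea = sa >> (eb - ea), eb
    else:
        sb, eb = sb >> (ea - eb), ea
    # Step 2: add the significands.
    s = sa + sb
    # Step 3: normalize; ldexp rescales s * 2^(E - 53) back into a float.
    return math.ldexp(s, ea - 53)

print(fp_add(1.5, 2.25))  # 3.75
```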

We will review several approaches to floating point operations in MIPS in the following section.

3.5. Floating Point in MIPS

Reading Assignments and Exercises

The MIPS FP architecture uses separate floating point instructions for IEEE 754 single and double precision. Single precision uses add.s, sub.s, mul.s, and div.s, whereas the double precision instructions are add.d, sub.d, mul.d, and div.d. These instructions are much more complicated than their integer counterparts. Problems with implementing FP arithmetic include inefficiencies in having different instructions that take significantly different times to execute (e.g., division versus addition). Also, FP operations require much more hardware than integer operations.

Thus, in the spirit of the RISC design philosophy, we note that (a) a particular datum is not likely to change its datatype within a program, and (b) some types of programs do not require FP computation. Thus, in 1990, the MIPS designers decided to separate the FP computations from the rest of the ALU operations, and use a separate chip for FP (called the coprocessor). A MIPS coprocessor contains 32 32-bit registers designated as $f0, $f1, ..., etc. Most of these registers are specified in the .s and .d instructions. Double precision operands are stored in register pairs (e.g., $f0,$f1 up to $f30,$f31).

The CPU thus handles all the regular computation, while the coprocessor handles the floating point operations. Special instructions are required to move data between the coprocessor(s) and the CPU (e.g., mfc0, mtc0, mfc1, mtc1, etc.), where cn refers to coprocessor #n. Similarly, special I/O operations are required to load and store data between the coprocessor and memory (e.g., lwc0, swc0, lwc1, swc1, etc.)

FP coprocessors require very complex hardware, as shown in Figure 3.23, which portrays only the hardware required for addition.

Figure 3.23. MIPS ALU supporting floating point addition, adapted from [Maf01].

The use of floating point operations in MIPS assembly code is described in the following simple example, which implements a C program designed to convert Fahrenheit temperatures to Celsius.

Here, we assume that there is a coprocessor c1 connected to the CPU. The values 5.0 and 9.0 are respectively loaded into registers $f16 and $f18 using the lwc1 instruction with the global pointer as base address and the variables const5 and const9 as offsets. The single precision division operation puts the quotient of 5.0/9.0 into $f16, and the remainder of the computation is straightforward. As in all MIPS procedure calls, the jr instruction returns control to the address stored in the $ra register.
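For reference, the C computation that the assembly example implements corresponds to the following sketch (shown in Python; the function name is ours):

```python
def fahr_to_celsius(fahr):
    # Same arithmetic as the MIPS example: celsius = (5.0 / 9.0) * (fahr - 32.0)
    return (5.0 / 9.0) * (fahr - 32.0)

print(fahr_to_celsius(32.0))   # 0.0 (freezing point)
print(fahr_to_celsius(212.0))  # approximately 100.0 (boiling point)
```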

References

[Maf01] Mafla, E. Course Notes, CDA3101, at URL http://www.cise.ufl.edu/~emafla/ (as of 11 April 2001).

[Pat98] Patterson, D.A. and J.L. Hennessy. Computer Organization and Design: The Hardware/Software Interface, Second Edition, San Francisco, CA: Morgan Kaufmann (1998).


Source: https://www.cise.ufl.edu/~mssz/CompOrg/CDA-arith.html
