MESSAGE
DATE | 2014-12-09 |
FROM | Ruben Safir
|
SUBJECT | [NYLXS - HANGOUT] CPU Instructions - Notes for Chapter 14
|
the details of branching decisions have been sacrificed for time
http://www.nylxs.com/docs/grad_school/arch/cpu_design.txt.html
Unless I missed something, I'm done with notes for this class for the time being.
HW and reviews are next. I'm showering and will be at the school in about 1.5 hours, doing HW.
15.0 CPU construction:

15.01 CU
15.02 ALU
    15.021 Status Flags
    15.022 Shifter
    15.025 Complementer
    15.026 Internal CPU Bus
    15.027 Registers

15.1 Registers: Two kinds

15.11 User Visible Registers
    15.111 Enable Machine Language Programs
    15.112 Minimize main memory references

15.12 Control Status Registers
    15.121 Used by CU
    15.122 Privileged Operations
    15.123 OS control to execute programs

15.13 The separation is not cut and dried

15.13 User Visible Register Usage:
    15.131 General Purpose Registers
        15.1311 Various programmatic functions
        15.1312 Can hold an operand for any opcode
        15.1313 Some may be dedicated to Floating Point or Stack Operations
        15.1314 Can be used, as shown, for addressing functions (Register, Indirect, Displacement)
        15.1315 Sometimes they are separated between Data and Address Registers
            15.13151 Data Registers: only hold data and cannot be employed in the calculation of an operand address
            15.13152 Address Registers:
                15.131521 Segment Pointers
                15.131522 Index Registers
                15.131523 Stack Pointers: point to the top of the stack

    15.132 Opcodes may be limited to specialized registers according to their function. This saves a bit but limits the programmer.
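The addressing functions in 15.1314 can be sketched as effective-address calculations. This is a toy model; the mode names, argument layout, and encodings are my own illustrative assumptions, not from the notes:

```python
def effective_address(mode, regs, mem, reg_name=None, disp=0, addr=0):
    """Compute an operand's effective address (EA) for a few classic modes.
    Mode names and argument layout are illustrative assumptions."""
    if mode == "register":            # operand is in the register itself
        return None                   # no memory address is involved
    if mode == "register_indirect":   # EA = contents of the named register
        return regs[reg_name]
    if mode == "displacement":        # EA = register contents + constant offset
        return regs[reg_name] + disp
    if mode == "indirect":            # EA = the memory word at the given address
        return mem[addr]
    raise ValueError(mode)
```

So with R1 = 100, register indirect yields EA 100 and an 8-byte displacement yields EA 108; indirect mode costs an extra memory read to fetch the real address, which is why it appears as an extra cycle in 15.22 below.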
    15.133 More registers require more operand-specifier bits
    15.134 Between 8 and 32 registers seems optimum
    15.135 RISC processors use 100s of registers
    15.136 Registers must be large enough to do their job of holding memory addresses or storing data

    15.137 Condition Code Registers - Flags
        15.1371 Reduce tests and compares
        15.1372 Branch flags are simpler than opcodes for these purposes
        15.1373 Facilitate multiway branching

        15.1374 They add complexity for the programmer
        15.1375 They are irregular and not part of the main data path
        15.1376 Often condition code machines must add special non-condition-code instructions for special situations anyway, such as bit checking, loop control, and atomic semaphore operations.
        15.1377 Need to be synchronized in pipelined usage
        15.1378 Subroutines will autosave all visible registers to be restored when the routine is finished

15.14 Control and Status Registers: not usually user visible
    15.1401 Some are visible to machine code and functions in operating system modes.

15.141 Essential Registers
    15.1411 Program Counter (PC)
    15.1412 Instruction Register (IR)
    15.1413 Memory Address Register (MAR)
    15.1414 Memory Buffer Register (MBR)
        15.14141 The fetched instruction is loaded into the IR, where the opcode and operand specifiers are analyzed.

15.142 The ALU might have direct access to the MBR and the registers

15.143 Program Status Word - register that contains status information
    15.1431 Sign: contains the sign bit of the result of the last arithmetic operation.
    15.1432 Zero: set when the result is 0.
    15.1433 Carry: set if an operation resulted in a carry (addition) into or borrow (subtraction) out of a high-order bit. Used for multiword arithmetic operations.
    15.1434 Equal: set if a logical compare result is equality.
    15.1435 Overflow: used to indicate arithmetic overflow.
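The arithmetic flags above (Sign, Zero, Carry, Overflow) can be illustrated with a toy adder; the 8-bit register width here is an assumption for illustration, not something the notes specify:

```python
def add8_flags(a, b):
    """8-bit two's-complement add; return the result and PSW-style flags."""
    full = a + b                      # unbounded sum, before truncation
    result = full & 0xFF              # keep only the low 8 bits
    sign = (result >> 7) & 1          # sign bit of the truncated result
    zero = int(result == 0)           # set when the result is 0
    carry = int(full > 0xFF)          # carry out of the high-order bit
    # signed overflow: both operands disagree in sign with the result
    overflow = int(((a ^ result) & (b ^ result) & 0x80) != 0)
    return result, {"S": sign, "Z": zero, "C": carry, "V": overflow}
```

For example, 0x7F + 0x01 (signed 127 + 1) sets Sign and Overflow but not Carry, while 0xFF + 0x01 (unsigned 255 + 1) sets Carry and Zero but not Overflow, showing why Carry serves unsigned multiword arithmetic and Overflow serves signed arithmetic.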
    15.1436 Interrupt Enable/Disable: used to enable or disable interrupts.
    15.1437 Supervisor: indicates whether the processor is executing in supervisor or user mode. Certain privileged instructions can be executed only in supervisor mode, and certain areas of memory can be accessed only in supervisor mode.

15.144 Blocks, sectors, stacks and subroutines need controls and pointers
15.145 Sample CPU Register Design

http://www.nylxs.com/images/sample_cpu_register_design.png

15.2 Instruction Cycle: as we learned before:
15.21 Fetch, Execute, Interrupt
15.22 Indirect Cycle: fetching indirect addresses adds one more instruction stage.
http://www.nylxs.com/images/instructioncycle_with_indirection.png

15.3 Data Flow:
.31 During the fetch cycle, an instruction is read from memory.
.32 The PC contains the address of the next instruction to be fetched. This address is moved to the MAR and placed on the address bus.
.33 The control unit requests a memory read
    .331 the result is placed on the data bus
    .332 the result is copied into the MBR
    .333 and then moved to the IR.

.34 The control unit examines the contents of the IR
    .341 Checks for indirection
        .3411 Indirection Cycle: puts the address field (A) in the MAR to fetch the real operand
.35 The Execute Cycle is performed: very specific to the hardware and difficult to generalize
.36 Interrupt Cycle: simple and predictable. The current contents of the PC must be saved so that the processor can resume normal activity after the interrupt.
    .361 The PC contents are transferred to the MBR to be written into memory.
    .362 A special memory location reserved for this purpose is loaded into the MAR from the control unit.
    .363 It might, for example, be a stack pointer.
    .364 The PC is loaded with the address of the interrupt routine.

15.4 Pipelining Strategy
.41 An assembly line approach to instruction processing.
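The fetch and indirect data flow in 15.3 (PC to MAR, memory read into the MBR, then into the IR) can be sketched on a toy machine. The memory layout, register dictionary, and the "low 8 bits hold the address field" convention are all my own assumptions for illustration:

```python
def fetch(mem, regs):
    """One fetch cycle: PC -> MAR -> (memory read) -> MBR -> IR."""
    regs["MAR"] = regs["PC"]          # address of the next instruction
    regs["MBR"] = mem[regs["MAR"]]    # control unit requests a memory read
    regs["IR"] = regs["MBR"]          # fetched instruction moved to the IR
    regs["PC"] += 1                   # ready for the next fetch

def indirect(mem, regs):
    """Indirect cycle: use the IR's address field to fetch the real operand address."""
    addr_field = regs["IR"] & 0xFF    # assumed: low 8 bits are the address field (A)
    regs["MAR"] = addr_field          # A is placed in the MAR
    regs["MBR"] = mem[regs["MAR"]]    # MBR now holds the real operand address
```

After fetching an instruction word like 0x0105 from location 0, the indirect cycle reads location 5, which is exactly the extra instruction stage shown in the indirection diagram above.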
.42 Inputs are received prior to the finish of the instruction cycle for the previous instruction
.43 Instruction prefetch or fetch overlap.
.44 When a conditional branch instruction is passed on from the fetch to the execute stage, the fetch stage fetches the next instruction in memory after the branch instruction. Then, if the branch is not taken, no time is lost. If the branch is taken, the fetched instruction must be discarded and a new instruction fetched.

.45 Example: consider the following decomposition of instruction processing:
    • Fetch instruction (FI): Read the next expected instruction into a buffer.
    • Decode instruction (DI): Determine the opcode and the operand specifiers.
    • Calculate operands (CO): Calculate the effective address of each source operand. This may involve displacement, register indirect, indirect, or other forms of address calculation.
    • Fetch operands (FO): Fetch each operand from memory. Operands in registers need not be fetched.
    • Execute instruction (EI): Perform the indicated operation and store the result, if any, in the specified destination operand location.
    • Write operand (WO): Store the result in memory.

The savings in time is viewable in this chart:
http://www.nylxs.com/images/pipeline_savings.png

.46 The pipeline has to have logic to know that if a condition changes that affects instructions in the pipeline, those pipelined instructions are no longer valid. Data in a memory location might be changed, for example.

    .461 Breaking the pipeline into increasingly smaller tasks has overhead and can limit pipelining efficiency

        .4611 Two factors frustrate this seemingly simple pattern for high-performance design [ANDE67a], and they remain elements that designers must still consider:

            .46111
At each stage of the pipeline, there is some overhead involved in moving data from buffer to buffer and in performing various preparation and delivery functions. This overhead can appreciably lengthen the total execution time of a single instruction. This is significant when sequential instructions are logically dependent, either through heavy use of branching or through memory access dependencies.

            .46112 The amount of control logic required to handle memory and register dependencies and to optimize the use of the pipeline increases enormously with the number of stages. This can lead to a situation where the logic controlling the gating between stages is more complex than the stages being controlled.

        .4612 Latching Delay - it takes time for the buffers to fill

    .462 Pipeline Performance:

        t = max[t_i] + d = t_m + d,    1 <= i <= k

        where

        t_i = time delay of the circuitry in the ith stage of the pipeline
        t_m = maximum stage delay (delay through the stage which experiences the largest delay)
        k   = number of stages in the instruction pipeline
        d   = time delay of a latch, needed to advance signals and data from one stage to the next

    .463 Pipeline Hazards:
        .4631 Resource Hazards: two instructions in the pipeline need the same resource
        .4632 Data Hazards: two instructions in the pipeline affect the same data and step on each other.
            .46321 Read after Write (RAW) - true dependency
            .46322 Write after Read (WAR) - anti-dependency
            .46323 Write after Write (WAW) - output dependency
        .4633 Control Hazard: unexpected branches - the pipeline has to be flushed
        .4634 Branching Strategies:
            .46341 Multiple Streams: guess both branches and do them both until one is discarded
            .46342 Prefetch Branch Target: do the target and store it in a cache until needed
            .46343 Loop Buffer: cache recent instructions and look to see if they are recalled.
                  If so, pull them from the buffer.
        .4635 Branch Prediction - good luck with that
            .46351 Predict never taken
            .46352 Predict always taken
            .46353 Predict by opcode
            .46354 Taken/not taken switch
            .46355 Branch history table
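The "taken/not taken switch" above is commonly built from 2-bit saturating counters, so a single mispredict does not flip a well-trained prediction. A minimal sketch; the table size and the trivial PC hash are my own choices, not from the notes:

```python
class BranchPredictor:
    """2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, size=16):
        self.size = size
        self.table = [1] * size       # start every entry weakly "not taken"

    def _index(self, pc):
        return pc % self.size         # trivial hash: low bits of the PC

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2   # True means "predict taken"

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:                     # saturate upward toward "strongly taken"
            self.table[i] = min(3, self.table[i] + 1)
        else:                         # saturate downward toward "strongly not taken"
            self.table[i] = max(0, self.table[i] - 1)
```

A loop branch that is taken on every iteration trains its counter to "strongly taken", so the single not-taken outcome at loop exit does not change the prediction the next time the loop runs - exactly the behavior a simple one-bit switch cannot provide.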