# Single-Issue, In-Order Processor Core

Note: at the time of writing, this task is 95-98% complete and requires
approximately 10-15 lines of Python code to get it running a first unit test.

* First steps for a newbie developer [[docs/firststeps]]
* Bug report <http://bugs.libre-riscv.org/show_bug.cgi?id=1039>

The Libre-SOC TestIssuer core
utilises a Finite-State Machine (FSM) to control the fetch/decode/issue/execute
Computational Units, with only one such CompUnit (an FSM or a pipeline) active
at any given time. This is good for debugging the HDL, but severely restricts
performance, as a single instruction takes tens of clock cycles to complete.
In development *(TODO: Andrey to research and link to the relevant bug report)*
is an in-order core, and following on from that will be an out-of-order core.

A Single-Issue In-Order control unit allows every pipeline to be active,
raising the ideal maximum throughput to 1 instruction per clock cycle,
barring any register hazards.

A first version of this control unit was already written in HDL 12+ months
ago: it is in `soc/`, and there are options in the Makefile to enable it.
The current task is to develop a standalone model first, which will be used to
estimate performance.

Luke's diagram comparing pipelines and FSMs, showing an approach that allows a
transition from FSM to in-order to out-of-order, and also allows "Micro-Coding":

[[!img /3d_gpu/pipeline_vs_fsms.jpg size="600x"]]

* [Bug description](https://bugs.libre-soc.org/show_bug.cgi?id=1039)

Rather than being added to the in-house Python simulator (`ISACaller`, called
by `pypowersim`), the model for the Single-Issue In-Order core works from the
simulator's output: `pypowersim` *outputs an execution trace log*, which
**after the fact** may be passed to **any** model, of which the in-order model
is **just the very first**. This allows basic *performance estimates*.

This model resides outside the simulator, and
is *completely standalone* **and will ALWAYS remain standalone**.

A subtask, to be carried out **as incremental development**, is to study the
avatools source code, extract its power consumption estimation, and add that
into the in-order model.

* [Bug comment #1](https://bugs.libre-soc.org/show_bug.cgi?id=1039#c1)
* [IRC log](https://libre-soc.org/irclog/%23libre-soc.2023-05-02.log.html#t2023-05-02T10:51:45)

The offline instruction ordering analyser, which models a (simple, initially
V3.0-only) **in-order core** and gives an estimate of instructions per clock
(IPC), needs to be **COMPLETED** (it is currently 98% complete).

Hazard Protection, **WHICH IS ALREADY COMPLETED**, is not a simple bit vector:
it is a "length of pipeline countdown until result is ready", which models the
clock cycles needed in the ACTUAL pipeline(s). The equivalent of a "bit" is
simply "is there an entry in the Python `set()` for this register, yes or no?"

- Take the write result register number: add num-cycles-until-ready to the `set()`.
- For all read registers, call the function that checks whether there is an
  entry in the "Python `set()` of expected outstanding results to be written".
  If there is, STALL (fake/

A stall is a delay in the execution of an instruction in order to resolve a
hazard (e.g. trying to read a register before a preceding instruction has
written its result to it). See the
[Wikipedia article on Pipeline Stall](https://en.wikipedia.org/wiki/Pipeline_stall).

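
The countdown-based hazard protection described above can be sketched as a
dictionary mapping each register to the number of clock cycles until its
result is ready. This is a minimal illustration only, not the code in
`cyclemodel`: the class name `CountdownHazards`, its method names and register
names such as `"GPR3"` are hypothetical.

```python
class CountdownHazards:
    """Hypothetical sketch: outstanding writes tracked with a cycle countdown."""

    def __init__(self):
        # register name -> clock cycles until its result is ready
        self.pending = {}

    def expect_write(self, reg, cycles_until_ready):
        # if the register is already pending, keep the longer countdown
        self.pending[reg] = max(cycles_until_ready, self.pending.get(reg, 0))

    def must_stall(self, read_regs):
        # "is there an entry for this register, yes or no?"
        return any(reg in self.pending for reg in read_regs)

    def tick(self):
        # one clock cycle passes: count down, dropping results now ready
        self.pending = {reg: n - 1 for reg, n in self.pending.items() if n > 1}


hazards = CountdownHazards()
hazards.expect_write("GPR3", 2)  # e.g. addi 3,4,5: result ready in 2 cycles
stalls = []
for clk in range(3):
    stalls.append(hazards.must_stall({"GPR3"}))  # e.g. cmpi reading GPR3
    hazards.tick()
print(stalls)  # [True, True, False]
```

A reader of `GPR3` keeps stalling until the countdown for that register runs
out, after which its entry is simply removed, so the "yes-or-no" check above
becomes false.
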
The input, which is already 98% complete, is:

- Instruction with its operands (as assembler listing)
- plus an optional memory-address and whether it is read or written.

The input comes as a trace output from the ISACaller simulator
([see bug comments #7-#16](https://bugs.libre-soc.org/show_bug.cgi?id=1039#c7)).

Classes which "model" the pipeline stages (fetch, decode, issue, execute) have
already been written.

One global "STALL" flag will cause all buses to stop:

- Tells fetch to stop fetching
- Decode stops (either because it is empty, or because it has an instruction whose read registers and
- Execute (pipelines) run as an empty slot (except for the initial instruction

Example (PC chosen arbitrarily):

```
addi 3, 4, 5     #PC=8
cmpi 1, 0, 3, 4  #PC=12
ld 1, 2(3)       #PC=16 EA=0x12345678
```

The third operand of `cmpi` is the register to use in the comparison, so
register 3 needs to be read. However, the preceding `addi` writes to this
register, and thus a STALL occurs while `cmpi` is in the decode phase.

The output diagram will look like this:

*(TODO: move this to a separate file, then include it twice, once with
triple-quotes and once without. Grep for "inline raw=yes" for examples of how
to include it in mdwn.)*


| clk # | fetch        | decode       | issue        | execute      |
|:-----:|:------------:|:------------:|:------------:|:------------:|
| 1     | addi 3,4,5   |              |              |              |
| 2     | cmpi 1,0,3,4 | addi 3,4,5   |              |              |
| 3     | STALL        | cmpi 1,0,3,4 | addi 3,4,5   |              |
| 4     | STALL        | cmpi 1,0,3,4 |              | addi 3,4,5   |
| 5     | ld 1,2(3)    |              | cmpi 1,0,3,4 |              |
| 6     |              | ld 1,2(3)    |              | cmpi 1,0,3,4 |
| 7     |              |              | ld 1,2(3)    |              |
| 8     |              |              |              | ld 1,2(3)    |

- 1: Fetched addi.
- 2: Decoded addi, fetched cmpi.
- 3: Issued addi, decoded cmpi; must stall the decode phase and stop fetching.
- 4: Executed addi, everything else stalled.
- 5: Issued cmpi, fetched ld.
- 6: Executed cmpi, decoded ld.
- 7: Issued ld.
- 8: Executed ld.

For this initial model, it is assumed that all instructions take one cycle to
execute (this is not the case for mul/div etc., but that will be dealt with
later).

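
Under this one-cycle assumption, the table above can be reproduced by a very
small model. The sketch below is a toy written only to illustrate the stall
rule, not the `cyclemodel` implementation: `Insn`, `simulate` and the register
names (`"r3"`, `"cr1"`, ...) are hypothetical, and each instruction's read and
write registers are written out by hand instead of coming from an `ISACaller`
trace.

```python
from typing import List, NamedTuple, Optional, Tuple


class Insn(NamedTuple):
    text: str
    reads: frozenset
    writes: frozenset


def writes_in_flight(*insns: Optional[Insn]) -> set:
    """Registers that instructions further down the pipeline have yet to write."""
    regs = set()
    for insn in insns:
        if insn is not None:
            regs |= insn.writes
    return regs


def label(insn: Optional[Insn]) -> str:
    return "" if insn is None else insn.text


def simulate(program: List[Insn]) -> List[Tuple[str, str, str, str]]:
    """One row per clock cycle: (fetch, decode, issue, execute)."""
    pending = list(program)  # instructions not yet fetched
    f = d = i = e = None     # contents of the fetch/decode/issue/execute slots
    rows = []
    while True:
        new_e = i  # issue -> execute: one cycle, never stalls
        if d is None or not (d.reads & writes_in_flight(new_e)):
            new_i, new_d, f = d, f, None  # decode -> issue, fetch -> decode
        else:
            new_i, new_d = None, d  # RaW hazard: hold the instruction in decode
        # fetch stalls while the instruction in decode is waiting on a result
        stalled = new_d is not None and bool(
            new_d.reads & writes_in_flight(new_i, new_e))
        if f is None and not stalled and pending:
            f = pending.pop(0)
        d, i, e = new_d, new_i, new_e
        if f is None and d is None and i is None and e is None:
            return rows
        rows.append(("STALL" if stalled else label(f), label(d), label(i), label(e)))


program = [
    Insn("addi 3,4,5", frozenset({"r4"}), frozenset({"r3"})),
    Insn("cmpi 1,0,3,4", frozenset({"r3"}), frozenset({"cr1"})),
    Insn("ld 1,2(3)", frozenset({"r3"}), frozenset({"r1"})),
]
rows = simulate(program)
for clk, row in enumerate(rows, start=1):
    print(clk, row)
print("IPC =", len(program) / len(rows))  # 3 instructions in 8 cycles
```

Decode only releases an instruction once none of its read registers is still
owed by an instruction ahead of it, and fetch stops in every cycle where decode
is held, which is what produces the two STALL rows. Multi-cycle instructions
(mul/div) would need the countdown approach instead of this fixed one-cycle
execute slot.
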
# Code Explanation - *IN PROGRESS*

*(Not all of the code has been explained, just the general classes.)*

Source code: <https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/openpower/cyclemodel>

## `Hazard` namedtuple data structure

A `namedtuple` object stores the attributes of a register access. A Python
`namedtuple` is immutable (like a normal tuple), while also allowing its
elements to be accessed by predefined names. Immutability is useful here
because the register access attributes do not change from the fetch to the
execute stage, which is why a mutable `list` or `dict` would not be
appropriate.

A `namedtuple` is also ordered: the field order given in its definition is
preserved, and fields can be accessed by index as well as by name. See the
[Python documentation on `namedtuple`](https://docs.python.org/3.7/library/collections.html#collections.namedtuple),
[online namedtuple tutorial](https://realpython.com/python-namedtuple/),

Because they are immutable (and their fields here are plain strings),
`namedtuple` instances are hashable and can be stored in sets, which is exactly
how they are used with the `RegisterWrite` class. One instruction trace may
contain zero or more `Hazard` register access objects (depending on whether
the instruction uses any registers).

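
A minimal sketch of the `Hazard` namedtuple. The field names are taken from
the example trace output shown later on this page (`action`, `target`,
`ident`, `offs`, `elwid`); the definition itself is an assumption, not copied
from `cyclemodel`.

```python
from collections import namedtuple

# Field names as they appear in the example trace output on this page.
Hazard = namedtuple("Hazard", ["action", "target", "ident", "offs", "elwid"])

read_gpr0 = Hazard(action="r", target="GPR", ident="0", offs="0", elwid="64")
print(read_gpr0.target, read_gpr0.ident)  # access by name: GPR 0
print(read_gpr0[0])                       # access by index: r

# Immutable: attributes cannot be changed once created.
try:
    read_gpr0.ident = "5"
except AttributeError:
    print("Hazard is immutable")

# Hashable, so register accesses can be collected in a set.
writes = {Hazard("w", "GPR", "1", "0", "64"), Hazard("w", "GPR", "1", "0", "64")}
print(len(writes))  # duplicates collapse: 1
```
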
A dictionary of currently supported register file types. Each entry (register
file type) defines the number of read and write ports as a tuple: the first
element is the number of read ports, and the second is the number of write
ports.

Having multiple read and/or write ports means that multiple **different**
entries in the same register file can be read from and/or written to in the
same clock cycle. This does not prevent a stall if the same register entry is
used by a consecutive instruction, even if a spare port is available
(Read-after-Write hazard).

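
A minimal sketch of how a `(read ports, write ports)` table limits register
accesses within one clock cycle. The dictionary name `REGFILE_PORTS`, the
helper `accesses_this_cycle` and the port counts are hypothetical placeholders
(the counts simply mirror the "3 reads, 1 write" assumption in
`CPU.reads_possible`/`writes_possible` further down), not the values used in
`cyclemodel`.

```python
# Hypothetical: register file type -> (read ports, write ports).
REGFILE_PORTS = {"GPR": (3, 1), "FPR": (3, 1), "CRf": (3, 1)}


def accesses_this_cycle(regfile, regnums, write=False):
    """Return the distinct register numbers that fit the ports this cycle."""
    read_ports, write_ports = REGFILE_PORTS[regfile]
    limit = write_ports if write else read_ports
    # sorted() only makes the choice deterministic for this example
    return set(sorted(set(regnums))[:limit])


print(accesses_this_cycle("GPR", [1, 2, 3, 4]))        # {1, 2, 3}
print(accesses_this_cycle("GPR", [5, 6], write=True))  # {5}
```

Ports only limit how many *different* registers are accessed per cycle; a
Read-after-Write hazard on a single register still stalls regardless of how
many ports are free.
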
## Parsing the trace file dump using the `read_file` function

The `CPU` model class takes as input a single instruction trace `list` object.

This trace `list` object is produced by the `read_file` function, which reads
an instruction trace file produced by a modified `ISACaller`
([link to code needed](LINK)).
From now on, the trace `list` object will simply be referred to as `trace`.

Each line of the trace dump is of the form
`[{rw}:FILE:regnum:offset:width]* # insn` where:

- `rw` is the access type: `r` if the register is read (operands), or `w` if
  it is written (to store the result, condition codes, etc.).
- `FILE` is the register file type (GPR/integer, FPR/floating-point, etc.; see
  the Additional Information section at the end of this page).
  *(TODO: use section reference link instead)*.
- `regnum` is the register number.
- `offset` *TODO: perhaps the offset of the data in bytes? Unknown (not
  important right now, as all examples show an offset of 0)*
- `width` is the width of the data, in bits, to be accessed in the register.
- `insn` is the full instruction written in PowerISA assembler.

The block `[{rw}:FILE:regnum:offset:width]` is used zero or more times,
depending on the total number of registers read and written by the
instruction.

Example trace file with three instructions:

```
r:GPR:0:0:64 w:GPR:1:0:64 # addi 1, 0, 0x0010
r:GPR:0:0:64 w:GPR:2:0:64 # addi 2, 0, 0x1234
r:GPR:1:0:64 r:GPR:2:0:64 # stw 2, 0(1)
```

The instruction trace file is processed line by line: each line is split into
its register access attributes, from which a new `Hazard` namedtuple is
created using `_make()` (see the
[Python documentation on the `_make()` method](https://docs.python.org/3.7/library/collections.html#collections.somenamedtuple._make)).

Each line is converted to a `trace` object of the form
`[insn, Hazard(...), Hazard(...), ...]`. An example trace looks like this:

```
['addi 1, 0, 0x0010',
 Hazard(action='r', target='GPR', ident='0', offs='0', elwid='64'),
 Hazard(action='w', target='GPR', ident='1', offs='0', elwid='64')]
```

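
A sketch of the per-line conversion described above, assuming the `Hazard`
fields shown in the example output. `parse_trace_line` is a hypothetical name;
in `cyclemodel` this logic lives inside `read_file`.

```python
from collections import namedtuple

Hazard = namedtuple("Hazard", ["action", "target", "ident", "offs", "elwid"])


def parse_trace_line(line):
    """Convert 'r:GPR:0:0:64 w:GPR:1:0:64 # insn' to [insn, Hazard, ...]."""
    regs, insn = line.split("#", 1)  # register accesses, then the instruction
    trace = [insn.strip()]
    for field in regs.split():
        # "r:GPR:0:0:64" -> Hazard(action='r', target='GPR', ident='0', ...)
        trace.append(Hazard._make(field.split(":")))
    return trace


print(parse_trace_line("r:GPR:0:0:64 w:GPR:1:0:64 # addi 1, 0, 0x0010"))
```

A line with no register accesses (just `# insn`) yields a trace containing
only the instruction text.
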
The function `read_file` yields (see [python wiki on yield]()) a single `trace`
for each line of the trace file, i.e. it is a generator. To produce the full
set of traces, call `read_file` with the filename of the `ISACaller`
instruction trace dump and assign the result to a new variable: this generator
can be iterated over directly by the CPU model, or turned into a `list` of
`trace` objects with `list()`.

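
A sketch of the generator pattern described above. `read_trace` is a
hypothetical stand-in for `read_file`, assumed here to take a filename and to
skip lines that have no `# insn` part; the temporary file only exists to make
the example self-contained.

```python
import os
import tempfile
from collections import namedtuple

Hazard = namedtuple("Hazard", ["action", "target", "ident", "offs", "elwid"])


def read_trace(filename):
    """Yield one [insn, Hazard, ...] trace per line of an ISACaller dump."""
    with open(filename) as f:
        for line in f:
            if "#" not in line:
                continue  # assumption: ignore blank or malformed lines
            regs, insn = line.split("#", 1)
            yield [insn.strip()] + [Hazard._make(r.split(":")) for r in regs.split()]


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("r:GPR:0:0:64 w:GPR:1:0:64 # addi 1, 0, 0x0010\n"
              "r:GPR:0:0:64 w:GPR:2:0:64 # addi 2, 0, 0x1234\n"
              "r:GPR:1:0:64 r:GPR:2:0:64 # stw 2, 0(1)\n")

traces = read_trace(tmp.name)  # a generator: nothing has been read yet
trace_list = list(traces)      # now every line has been parsed
os.remove(tmp.name)
print(len(trace_list), trace_list[2][0])  # 3 stw 2, 0(1)
```

Iterating the generator directly avoids holding a long trace in memory;
`list()` is only needed when the traces must be visited more than once.
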
`RegisterWrite` is a class based on a Python set, used to keep track of the
registers currently waiting to be written (for detecting Read-after-Write
hazards).

A Python [set](https://docs.python.org/3.7/tutorial/datastructures.html#sets)
is an unordered collection with **no duplicate elements**.

By checking whether the next instruction's read registers match any of the
write registers in the `RegisterWrite` set, the model can raise a STALL.

Any instruction that reads a register in the set **MUST STALL** at the Decode
phase, because the currently issued/executing instruction's result has not
yet been written to the register(s) it needs.

Initialises the `RegisterWrite` set.

```python
def expect_write(self, regs):
    return self.storage.update(regs)
```

Adds the registers that are about to be written to the current set of
expected writes.

```python
def write_expected(self, regs):
    return (len(self.storage.intersection(regs)) != 0)
```

Returns `True` if any of the given (read) registers is still expected to be
written by a previous instruction, i.e. a Read-after-Write hazard exists and
the instruction must stall.

```python
def retire_write(self, regs):
    return self.storage.difference_update(regs)
```

Removes the given registers from the `RegisterWrite` set once their results
have been written (retired).

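
The three methods above, assembled into one runnable class with a short usage
example. The constructor (an empty `storage` set) is an assumption, since that
part of the code is not shown on this page, and register names such as
`"GPR3"` are hypothetical. The `return` of the original methods is dropped
here, as `set.update()` and `set.difference_update()` return `None` anyway.

```python
class RegisterWrite:
    """Set of registers that in-flight instructions have yet to write."""

    def __init__(self):
        self.storage = set()  # assumed: starts empty

    def expect_write(self, regs):
        # an instruction is issued: its destination registers become pending
        self.storage.update(regs)

    def write_expected(self, regs):
        # True if any register in regs is still pending, i.e. a RaW hazard
        return len(self.storage.intersection(regs)) != 0

    def retire_write(self, regs):
        # results have been written to the regfile: no longer pending
        self.storage.difference_update(regs)


rw = RegisterWrite()
rw.expect_write({"GPR3"})           # addi 3,4,5 issued
print(rw.write_expected({"GPR3"}))  # cmpi reads GPR3 -> True: STALL
rw.retire_write({"GPR3"})           # addi result written
print(rw.write_expected({"GPR3"}))  # False: cmpi may proceed
```
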
## `get_input_regs` and `get_output_regs` functions

The `CPU` class models the in-order, single-issue core. It contains the
`RegisterWrite` set for tracking Read-after-Write hazards, the fetch, decode,
issue and execute stages, as well as a `stall` flag for indicating if the CPU
is stalled.

The input to the model is a trace `list` object.

The main method used while running the model is `process_instructions()`,
which is called every time an instruction trace `list` object is read from a
trace file.

```python
class CPU:
    def __init__(self):
        self.regs = RegisterWrite()
        self.fetch = Fetch(self)
        self.decode = Decode(self)
        self.issue = Issue(self)
        self.exe = Execute(self)

    def reads_possible(self, regs):
        # TODO: subdivide this down by GPR FPR CR-field.
        # currently assumes total of 3 regs are readable at one time
        possible = set()
        r = regs.copy()  # copy, so the caller's set is not emptied by pop()
        while len(possible) < 3 and len(r) > 0:
            possible.add(r.pop())
        return possible

    def writes_possible(self, regs):
        # TODO: subdivide this down by GPR FPR CR-field.
        # currently assumes total of 1 reg is possible regardless of what it is
        possible = set()
        r = regs.copy()  # copy, so the caller's set is not emptied by pop()
        while len(possible) < 1 and len(r) > 0:
            possible.add(r.pop())
        return possible

    def process_instructions(self):
        stall = False  # assumed starting value (this line is not shown here)
        # each stage receives the stall flag from the stage before it
        stall = self.fetch.process_instructions(stall)
        stall = self.decode.process_instructions(stall)
        stall = self.issue.process_instructions(stall)
        stall = self.exe.process_instructions(stall)
```

The `Execute` class models the execute phase of the processor. Each entry of
`self.stages` is a list of the instructions whose results are that many clock
cycles away: `add_instruction()` places a new instruction two cycles away, and
`tick()` drops the entry at time "zero".

```python
class Execute:
    def __init__(self, cpu):
        self.stages = []
        self.cpu = cpu

    def add_stage(self, cycles_away, stage):
        # ">=" (not ">") so that index cycles_away exists before the append
        while cycles_away >= len(self.stages):
            self.stages.append([])
        self.stages[cycles_away].append(stage)

    def add_instruction(self, insn, writeregs):
        self.add_stage(2, {'insn': insn, 'writes': writeregs})

    def tick(self):
        self.stages.pop(0)  # tick drops anything at time "zero"

    def process_instructions(self, stall):
        instructions = self.stages[0]  # get list of instructions
        to_write = set()  # need to know total writes
        for instruction in instructions:
            to_write.update(instruction['writes'])
        # see if all writes can be done, otherwise stall
        writes_possible = self.cpu.writes_possible(to_write)
        if writes_possible != to_write:
            stall = True
        # retire the writes that are possible in this cycle (regfile writes)
        self.cpu.regs.retire_write(writes_possible)
        # and now go through the instructions, removing those regs written
        for instruction in instructions:
            instruction['writes'].difference_update(writes_possible)
        return stall
```

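
To see the countdown and the single write port interact, the `Execute` class
can be driven on its own with stand-ins for the parts of `CPU` it uses.
`StubCPU` and the minimal set-based `RegisterWrite` below are hypothetical
stand-ins; `writes_possible` mirrors the "1 write per cycle" assumption of
`CPU.writes_possible`, but picks registers in sorted order so the example is
deterministic. The two-register write is `ldu 1, 8(2)`, a load with update,
which writes both RT and RA.

```python
class RegisterWrite(set):
    """Stand-in: a set with the retire_write() method that Execute calls."""

    def retire_write(self, regs):
        self.difference_update(regs)


class StubCPU:
    """Stand-in for CPU: pending writes plus a single regfile write port."""

    def __init__(self):
        self.regs = RegisterWrite()

    def writes_possible(self, regs):
        return set(sorted(regs)[:1])  # at most one write per clock cycle


class Execute:
    def __init__(self, cpu):
        self.stages = []  # stages[n]: instructions whose results are n cycles away
        self.cpu = cpu

    def add_stage(self, cycles_away, stage):
        while cycles_away >= len(self.stages):
            self.stages.append([])
        self.stages[cycles_away].append(stage)

    def add_instruction(self, insn, writeregs):
        # copy, so that retiring writes does not modify the caller's set
        self.add_stage(2, {'insn': insn, 'writes': set(writeregs)})

    def tick(self):
        self.stages.pop(0)  # drop anything at time "zero"

    def process_instructions(self, stall):
        instructions = self.stages[0] if self.stages else []
        to_write = set()
        for instruction in instructions:
            to_write.update(instruction['writes'])
        writes_possible = self.cpu.writes_possible(to_write)
        if writes_possible != to_write:
            stall = True  # not every result fits the write ports this cycle
        self.cpu.regs.retire_write(writes_possible)
        for instruction in instructions:
            instruction['writes'].difference_update(writes_possible)
        return stall


cpu = StubCPU()
exe = Execute(cpu)
cpu.regs.update({"GPR1", "GPR2"})  # ldu 1, 8(2) writes RT and RA
exe.add_instruction("ldu 1, 8(2)", {"GPR1", "GPR2"})
log = []
for clk in range(4):
    stall = exe.process_instructions(False)
    log.append((clk, stall, sorted(cpu.regs)))
    if not stall:
        exe.tick()  # only advance the countdown when not stalled
print(log)
```

With only one write port, the two results of `ldu` take two cycles to retire,
so the model raises a stall in the cycle where only one of them can be
written, and the countdown does not advance until the second write is done.
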
# Additional Information

## On register file types

Currently (20th Aug 2023), the following register files are included in the
CPU model:

- General Purpose Registers (GPR) - stores integers (0-31 in default PowerISA,
  0-127 for Libre-SOC with SVP64)
- Floating Point Registers (FPR) - stores floating-point numbers
- Condition Register (CR) - broken up into 4-bit fields
- Condition Register Fields (CRf) - stores the arithmetic condition of an
  operation (less than, greater than, equal to zero, summary overflow)
- Fixed-Point Exception Register (XER)
- Machine State Register (MSR)
- Floating-Point Status and Control Register (FPSCR)
- Program Counter (PC); the PowerISA spec primarily calls this the *Current
  Instruction Address (CIA)*. See PowerISA v3.1, section 1.3.4 Description of
  Instruction Operation
- Slow Special Purpose Registers (SPRs)

*TODO: Special Purpose Registers and fields need a better explanation. The
initial writer of this page (Andrey) has very little understanding of whether
SPR is actually a register, or just a category of registers (XER, etc.)*

See the [PowerISA 3.1 spec](LINK) for detailed information on register files
(Book I, sections 1.3.4, 2.3, 3.2, 4.2, 5.2, 5.3).