# Decoder

* Context and walkthrough <https://libre-soc.org/irclog/%23libre-soc.2021-07-13.log.html>
* First steps for a newbie developer [[docs/firststeps]]
* Bug report <http://bugs.libre-riscv.org/show_bug.cgi?id=186>

The decoder is in charge of translating the POWER instruction stream into operations that can be handled by the backend.

Source code: <https://git.libre-riscv.org/?p=soc.git;a=tree;f=src/soc/decoder;hb=HEAD>

# POWER

The decoder is written in Python and parses CSV files and other information taken directly from the Power ISA Standards PDF files. This significantly reduces the possibility of manual transcription errors and greatly reduces code size.  The tables, based on Anton Blanchard's excellent microwatt design, are in [[openpower/isatables]], which includes links to download the CSV files.

The top level decoder object recursively drops through progressive levels of case statement groups, each covering additional portions of the incoming instruction bits.  More on this technique - for which Python and nmigen were *specifically* and strategically chosen - is outlined here: <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/004882.html>

The PowerDecoder2, on encountering for example an ADD
operation, needs to know whether Rc=0/1, whether OE=0/1, whether
RB is to be read, whether an immediate is to be read and so on.
With all of this information being specified in the CSV files, on
a per-instruction basis, it is simply a matter of expanding that
information out into a data structure called Decode2ToExecute1Type.
From there it becomes straightforward for other parts of the processor
to take appropriate action.

* [Decode2ToExecute1Type](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/decode2execute1.py;hb=HEAD)
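As a rough illustration of that expansion, the sketch below reads one CSV row (the `add` row from the table under "Decoder internals" on this page, trimmed to a handful of columns) and turns it into a small record. `DecodedOp` and `expand_row` are hypothetical names used only here; the real Decode2ToExecute1Type is an nmigen data structure with many more fields, so treat this as a plain-Python analogy rather than the project's code.

```python
import csv
import io
from dataclasses import dataclass

# Trimmed-down CSV text: the column names follow the table on this page,
# but most columns are left out to keep the sketch short.
CSV_TEXT = """opcode,unit,internal op,in1,in2,out,cry in,rc,form
0b0100001010,ALU,OP_ADD,RA,RB,RT,ZERO,RC,XO
"""


@dataclass
class DecodedOp:
    """Hypothetical stand-in for Decode2ToExecute1Type."""
    unit: str
    internal_op: str
    read_ra: bool
    read_rb: bool
    write_rt: bool
    rc_from_insn: bool  # True when Rc is taken from the instruction's Rc bit


def expand_row(row):
    """Expand one CSV row (a dict keyed by column name) into a DecodedOp."""
    return DecodedOp(
        unit=row["unit"],
        internal_op=row["internal op"],
        read_ra=(row["in1"] == "RA"),
        read_rb=(row["in2"] == "RB"),
        write_rt=(row["out"] == "RT"),
        rc_from_insn=(row["rc"] == "RC"),
    )


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
add_op = expand_row(rows[0])
print(add_op)
```

The point of the sketch is only that no per-instruction code is needed: everything the later stages want to know is a column lookup.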

## Link to Function Units

The Decoder (PowerDecoder2) knows which registers are needed. What
it does not know is:

* which Register file ports to connect to (this is defined by regspecs)
* the order of those regfile ports (again: defined by regspecs)

Neither do the Phase-aware Function Units (derived from MultiCompUnit)
themselves know anything about the PowerDecoder, and they certainly
do not know when a given instruction will need to tell *them* to read
RA, or RB.  For example: negation of RA only requires one operand,
whereas add RA, RB requires two.  Who tells whom that information, when
the ALU's job is simply to add, and the Decoder's job is simply to decode?

This is where a special function called "rdflags()" comes into play.
rdflags works closely in conjunction with regspecs and the PowerDecoder2,
in each Function Unit's "pipe\_data.py" file.  It defines the flags that
determine, from the current instruction, whether the Function Unit actually
*wants* any given Register Read Ports activated or not.

That dynamically-determined information will then actively disable
(or allow) Register file Read requests (rd.req) on a per-port basis.

Example:

    class ALUInputData(IntegerData):
        regspec = [('INT', 'ra', '0:63'), # RA
                   ('INT', 'rb', '0:63'), # RB/immediate
                   ('XER', 'xer_so', '32'), # XER bit 32: SO
                   ('XER', 'xer_ca', '34,45')] # XER bit 34/45: CA/CA32

This shows us that the ALU pipeline expects two INTEGER
operands (RA and RB), both 64-bit, and the XER SO, CA and CA32
bits.  However this information - as to which operands are required -
is *dynamic*.

Continuing from the OP_ADD example, where inspection of the CSV files
(or the ISA tables) shows that we optionally need xer_so (OE=1, or Rc=1
because CR0 records a copy of SO), optionally need xer_ca (the carry-in
forms such as adde), and even optionally need RB (not read by add with
immediate), we begin to understand that a dynamic system linking the
PowerDecoder2 information to the Function Units is needed.  This is
where power\_regspec\_map.py comes into play.

    def regspec_decode_read(e, regfile, name):
        if regfile == 'INT':
            # Int register numbering is *unary* encoded
            if name == 'ra': # RA
                return e.read_reg1.ok, 1<<e.read_reg1.data
            if name == 'rb': # RB
                return e.read_reg2.ok, 1<<e.read_reg2.data

Here we can see that, for INTEGER registers, if the Function Unit
has a connection (an incoming operand) named "ra", the tuple returned
contains two crucial pieces of information:

1. The field from PowerDecoder2 which tells us if RA is even actually
  required by this (decoded) instruction
2. The INTEGER Register file read port activation signal (its read-enable
  line-activation) which, if sent to the INTEGER Register file, will
  request the actual register required by this current (decoded)
  instruction.

Thus we have the *dynamic* information - not hardcoded in RTL but
specified in *python* - encoding both *whether* (first item of tuple) and
*which* (second item of tuple) register each Function Unit receives, and
this for each and every operand.  A corresponding process exists for write,
as well.

* [[architecture/regfile]]
* [CompUnits](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/compunits/compunits.py;hb=HEAD)
* Example [ALU pipe_data.py specification](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/alu/pipe_data.py;hb=HEAD)
* [power_regspec_map.py](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_regspec_map.py;hb=HEAD)
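To make the link between regspecs and the decoder a little more concrete, here is a pure-Python sketch (no nmigen involved) of how a regspec list and a function in the style of regspec_decode_read could be combined to work out which read ports to activate. `Field`, `FakeDecoded` and `active_read_ports` are hypothetical names invented for illustration, only the INT regfile is covered, and plain integers stand in for hardware signals.

```python
from collections import namedtuple

# Hypothetical stand-ins for the decoder output: each field has an "ok"
# flag (is the register needed?) and "data" (the register number).
Field = namedtuple("Field", ["ok", "data"])
FakeDecoded = namedtuple("FakeDecoded", ["read_reg1", "read_reg2"])

# Same shape as the INT entries of ALUInputData.regspec.
regspec = [("INT", "ra", "0:63"),
           ("INT", "rb", "0:63")]


def regspec_decode_read(e, regfile, name):
    """Plain-Python imitation: return (needed, unary read-enable mask)."""
    if regfile == "INT":
        if name == "ra":
            return e.read_reg1.ok, 1 << e.read_reg1.data
        if name == "rb":
            return e.read_reg2.ok, 1 << e.read_reg2.data
    return False, 0


def active_read_ports(e, regspec):
    """Map each operand name to its read-enable mask, skipping unused ones."""
    ports = {}
    for regfile, name, _width in regspec:
        needed, mask = regspec_decode_read(e, regfile, name)
        if needed:
            ports[name] = mask
    return ports


# Negation of RA: only RA (here r3) is read.
neg = FakeDecoded(Field(True, 3), Field(False, 0))
# add RA, RB: both RA (r3) and RB (r5) are read.
add = FakeDecoded(Field(True, 3), Field(True, 5))

print(active_read_ports(neg, regspec))  # {'ra': 8}
print(active_read_ports(add, regspec))  # {'ra': 8, 'rb': 32}
```

The same Function Unit description (the regspec) serves both instructions; only the decoded flags differ.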

## Fixed point instructions

 - addi, addis, mulli - fairly straightforward: extract registers and immediate and translate to the appropriate op
 - addic, addic., subfic - similar to above, but now the carry needs to be saved somewhere
 - add[o][.], subf[o][.], adde\*, subfe\*, addze\*, neg\*, mullw\*, divw\* - These are more fun. They need to update CR0 (if . is present) and the overflow flags (if o is present), and the extended versions (adde, subfe, addze) also take in and set the carry flag.
 - addex - uses the overflow flag as a carry in and, if CY is set to 0 (the only value the v3.0B specification defines), sets overflow like it would carry.
 - cmp, cmpi - set bits of the selected CR field based on whether the first operand is less than, greater than, or equal to the second
 - andi., ori, andis., oris, xori, xoris - register-immediate operations similar to addi above, though the and versions set the flags in CR0
 - and\*, or\*, xor\*, nand\*, eqv\*, andc\*, orc\* - similar to the register-register arithmetic instructions above
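As an illustration of the flag handling that the list refers to, the following sketch models a 64-bit add with an optional carry-in in plain Python and derives the carry-out (CA) and signed overflow (OV). `add64` is a hypothetical helper written for this page, not code from the project, and it deliberately ignores CA32/OV32 and the sticky SO bit.

```python
MASK64 = (1 << 64) - 1


def add64(ra, rb, carry_in=0):
    """Return (result, ca, ov) for a 64-bit add of ra + rb + carry_in."""
    ra &= MASK64
    rb &= MASK64
    total = ra + rb + carry_in
    result = total & MASK64
    # CA: carry out of the most significant bit.
    ca = (total >> 64) & 1
    # OV: both operands share a sign but the result's sign differs.
    sign_a, sign_b, sign_r = ra >> 63, rb >> 63, result >> 63
    ov = int(sign_a == sign_b and sign_r != sign_a)
    return result, ca, ov


# Plain add: 1 + 2, no carry, no overflow.
print(add64(1, 2))                    # (3, 0, 0)
# Extended add: all-ones + 0 with carry-in wraps to 0 and sets CA.
print(add64(MASK64, 0, carry_in=1))   # (0, 1, 0)
# Largest positive value + 1 overflows into the sign bit.
print(add64((1 << 63) - 1, 1))        # (9223372036854775808, 0, 1)
```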

# Decoder internals

The Decoder uses a class called PowerOp which gets instantiated
for every instruction. Each PowerOp instance has member signals
whose values are set appropriately for that instruction.

We use Python Enums to help with common decoder values.
Below is the POWER add instruction.

| opcode       | unit | internal op | in1 | in2 | in3  | out | CR in | CR out | inv A | inv out | cry in | cry out | ldst len | BR | sgn ext | upd | rsrv | 32b | sgn | rc | lk | sgl pipe | comment | form |
|--------------|------|-------------|-----|-----|------|-----|-------|--------|-------|---------|--------|---------|----------|----|---------|-----|------|-----|-----|----|----|----------|---------|------|
| 0b0100001010 | ALU  | OP_ADD      | RA  | RB  | NONE | RT  | 0     | 0      | 0     | 0       | ZERO   | 0       | NONE     | 0  | 0       | 0   | 0    | 0   | 0   | RC | 0  | 0        | add     | XO   |

Here is an example of a toy multiplexer that sets various fields in the
PowerOp signal class to the correct values for the add instruction when
select is set equal to 1.  This should give you a feel for how we work with
enums and PowerOp.

    from nmigen import Module, Elaboratable, Signal
    from soc.decoder.power_enums import (Function, Form, InternalOp,
                             In1Sel, In2Sel, In3Sel, OutSel, RC, LdstLen,
                             CryIn)
    from soc.decoder.power_decoder import PowerOp

    class Op_Add_Example(Elaboratable):
        def __init__(self):
            self.select = Signal(reset_less=True)
            self.op_add = PowerOp()

        def elaborate(self, platform):
            m = Module()
            op_add = self.op_add

            # only drive the PowerOp fields when select is asserted
            with m.If(self.select == 1):
                m.d.comb += op_add.function_unit.eq(Function.ALU)
                m.d.comb += op_add.form.eq(Form.XO)
                m.d.comb += op_add.internal_op.eq(InternalOp.OP_ADD)
                m.d.comb += op_add.in1_sel.eq(In1Sel.RA)
                m.d.comb += op_add.in2_sel.eq(In2Sel.RB)
                m.d.comb += op_add.in3_sel.eq(In3Sel.NONE)
                m.d.comb += op_add.out_sel.eq(OutSel.RT)
                m.d.comb += op_add.rc_sel.eq(RC.RC)
                m.d.comb += op_add.ldst_len.eq(LdstLen.NONE)
                m.d.comb += op_add.cry_in.eq(CryIn.ZERO)

            return m

    from nmigen.back import verilog
    verilog_file = "op_add_example.v"
    top = Op_Add_Example()
    # convert to verilog text first, then write it out and close the file
    vl = verilog.convert(top, name='top', strip_internal_attrs=True,
                         ports=top.op_add.ports())
    with open(verilog_file, "w") as f:
        f.write(vl)
    print(f"Verilog Written to: {verilog_file}")

The [actual POWER9 Decoder](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_decoder2.py;hb=HEAD)
uses this principle, in conjunction with reading the information shown
in the table above from CSV files (as opposed to hardcoding it in
python source).  These [[CSV files|openpower/isatables]],
being machine-readable in a wide variety
of programming languages, are conveniently available for use by
other projects well beyond just this SOC.

This also demonstrates one of the design decisions taken in this project: to
use python's full capabilities in order to create
advanced dynamically generated HDL, rather than (as is done with MyHDL)
limiting python code to a subset of its full capabilities.

The CSV Files are loaded by
[power_decoder.py](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_decoder.py;hb=HEAD)
and are used to construct a hierarchical cascade of switch statements.  The original code came from
[microwatt](https://github.com/antonblanchard/microwatt/blob/master/decode1.vhdl)
where the original hardcoded cascade can be seen.

The docstring for power_decoder.py gives more details: each level in the hierarchy, just as in the original decode1.vhdl, takes slices of the instruction bitpattern and matches against them. If a match succeeds, decoding continues with further subdecoders until a line is met that contains the required Operand Information (a PowerOp), as shown in the table above.

In this way, different sections of the instruction are successively decoded (major opcode, then minor opcode, then sub-patterns under those) until the required instruction is fully recognised, and the hierarchical cascade of switch patterns results in a flat interpretation being produced that is useful internally.
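A minimal plain-Python sketch of such a cascade follows. The nested tables stand in for the CSV-derived subdecoders and the helper names (`major`, `xo10`, `decode`) are illustrative assumptions, not the project's API. The table contents are limited to two entries: major opcode 31 with the 10-bit extended opcode 0b0100001010 (the add row in the table above) and major opcode 14 (addi).

```python
def major(insn):
    """Primary opcode: the top 6 bits of the 32-bit instruction."""
    return (insn >> 26) & 0x3F


def xo10(insn):
    """10-bit extended opcode field, just above the Rc bit."""
    return (insn >> 1) & 0x3FF


# Each level is (field extractor, table). A table entry is either a leaf
# (here just a mnemonic) or another level to descend into.
MINOR_31 = (xo10, {0b0100001010: "add"})
TOP = (major, {14: "addi", 31: MINOR_31})


def decode(insn, level=TOP):
    """Walk the cascade until a leaf is found; None means 'not recognised'."""
    extract, table = level
    entry = table.get(extract(insn))
    if isinstance(entry, tuple):      # a subdecoder: match further bits
        return decode(insn, entry)
    return entry


print(decode(0x7C642A14))  # add r3,r4,r5
print(decode(0x38640001))  # addi r3,r4,1
print(decode(0x00000000))  # not in these toy tables
```

In the real decoder the leaves are full PowerOp rows rather than mnemonics, and the levels are generated from the CSV files instead of being written by hand.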

# second explanation / walkthrough

the general idea here is to minimise the actual amount of work
by using human-and-machine-readable files as much as possible,
and performing automated translation (compilation) into executable
form.

we (manually) extracted the pseudo-code from the v3.0B specification:
<https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/isa/fixedlogical.mdwn;hb=HEAD>

then wrote a parser and language translator (aka compiler) to convert
those code-fragments to python:
<https://git.libre-soc.org/?p=soc.git;a=tree;f=src/soc/decoder/pseudo;hb=HEAD>

then went to a lot of trouble over the course of several months to
co-simulate them, update them, and make them accurate according to the
actual spec:
<https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/isa/fixedarith.mdwn;h=470a833ca2b8a826f5511c4122114583ef169e55;hb=HEAD#l721>

and created a fully-functioning python-based OpenPOWER ISA simulator:
<https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/isa/caller.py;hb=HEAD>

there is absolutely no reason why this language-translator (aka compiler)
here
<https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/pseudo/parser.py;hb=HEAD>

should not be joined by another compiler, targeting C for use inside
the Linux kernel, or another compiler which auto-generates C++ for use
inside power-gem5, such that this:
<https://github.com/power-gem5/gem5/blob/cae53531103ebc5bccddf874db85f2659b64000a/src/arch/power/isa/decoder.isa#L1214>

becomes an absolute breeze to update.

note that we maintain a decoder which is based on Microwatt: we extracted
microwatt's decode1.vhdl into CSV files, and parse them in python as
hierarchical recursive data structures:
<https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_decoder.py;hb=HEAD>

where the actual CSV files that it reads are here:
<https://git.libre-soc.org/?p=libreriscv.git;a=tree;f=openpower/isatables;hb=HEAD>

this is then combined with *another* table that was extracted from the
OpenPOWER v3.0B PDF:
<https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/isatables/fields.text;hb=HEAD>

(the parser for that recognises "vertical bars" as being
field-separators):
<https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_fields.py;hb=HEAD>

and FINALLY - and this is about the only major piece of code that
actually involves any kind of manual coding - again it is based on Microwatt
decode2.vhdl - we put everything together to turn a binary opcode into
"something that needs to be executed":
<https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_decoder2.py;hb=HEAD>

so our OpenPOWER simulator is actually based on:

* machine-readable CSV files
* machine-readable Field-Form files
* machine-readable spec-accurate pseudocode files

the only reason we haven't used those to generate HDL is that
doing so is a massive research project, where a first pass would be
highly likely to generate sub-optimal HDL.
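As a small illustration of why the Field-Form files count as machine-readable, the sketch below parses one form description that uses vertical bars as field separators. The two-line layout (start bit positions on one line, field names on the next) and the helper name `parse_form` are assumptions made for this example; the actual file and its parser (power_fields.py) may differ in detail.

```python
def parse_form(positions_line, names_line, total_bits=32):
    """Return {field name: (first bit, last bit)} in MSB0 numbering."""
    # Split on vertical bars, dropping the empty pieces at both edges.
    starts = [int(p) for p in positions_line.strip().strip("|").split("|")]
    names = [n.strip() for n in names_line.strip().strip("|").split("|")]
    fields = {}
    for i, (start, name) in enumerate(zip(starts, names)):
        # A field ends just before the next one starts.
        end = starts[i + 1] - 1 if i + 1 < len(starts) else total_bits - 1
        fields[name] = (start, end)
    return fields


# Assumed layout for the I-Form: positions above, field names below.
positions = "|0     |6                        |30 |31 |"
names     = "| OPCD |            LI           | AA| LK|"

i_form = parse_form(positions, names)
print(i_form)
# {'OPCD': (0, 5), 'LI': (6, 29), 'AA': (30, 30), 'LK': (31, 31)}
```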