From: lkcl
Date: Mon, 10 Apr 2023 14:04:47 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls012_v1~27
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=454dacbb79174dbd3db91cee5edf4839a1ca869b;p=libreriscv.git

---

diff --git a/openpower/sv/rfc/ls012.mdwn b/openpower/sv/rfc/ls012.mdwn
index aaf55281c..0e903519c 100644
--- a/openpower/sv/rfc/ls012.mdwn
+++ b/openpower/sv/rfc/ls012.mdwn
@@ -405,6 +405,52 @@ EXT022 Sandbox.
 
 **How many registers does it need?**
 
+The basic RISC Paradigm is not only to make instruction encoding simple (often
+"wasting" encoding space compared to highly-compacted ISAs such as x86), but
+also to keep the number of registers used down to a minimum.
+
+Counter-examples include FMAC, which had to be added to IEEE754 because the
+*internal* product requires more accuracy than can fit into a register.
+Another would be a dotproduct instruction, which again requires an accumulator
+of at least double the width of the two vector inputs. And in the AMDGPU
+ISA, there are Texture-mapping instructions taking up to an astounding
+*twelve* input operands!
+
+Going too far in this direction, however, has to be traded off against the next
+question. Both MIPS and RISC-V lack Condition Codes, which means that emulating
+x86 Branch-Conditional requires *ten* MIPS instructions.
+
+The downside of creating overly complex instructions is that the Dependency Hazard
+Management in high-performance multi-issue out-of-order microarchitectures
+becomes infeasibly large, and even simple in-order systems may have performance
+severely compromised by an overabundance of stalls. Also worth remembering
+is that register file ports are insanely costly, not just to design: they also
+use considerable power.
+
+That said, there do exist genuine reasons why more registers are better than fewer:
+Compare-and-Swap has huge benefits but is costly to implement, and DCT/FFT Twin-Butterfly
+instructions allow creation of in-place in-register algorithms, reducing the number
+of registers needed and thus saving power by making the *overall* algorithm
+more efficient, as opposed to micro-focussing on a localised power increase.
+
+**Can other existing instructions (plural) do the same job?**
+
+The general
+rule is: if two or more instructions can do the same job, leave it out...
+*unless* the absence of that instruction is causing
+huge increases in binary size. RISC-V has gone too far in this regard,
+as explained here:
+
+Good examples are LD-ST-Indexed-shifted (multiply RB by 2, 4, 8 or 16)
+which are high-priority instructions in x86 and ARM, but lacking in
+Power ISA, MIPS, and RISC-V. With many critical hot-loops in Computer
+Science having to perform shift and add as explicit instructions, adding
+LD/ST-shifted should be considered high priority, except that the sheer
+*number* of such instructions needing to be added takes us into the next
+question.
+
+**How costly is the encoding?**
+
 # Tables