From: Luke Kenneth Casson Leighton
Date: Wed, 12 Apr 2023 13:10:57 +0000 (+0100)
Subject: whitespace
X-Git-Tag: opf_rfc_ls010_v1~55
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=ba0a0055b8618aad9f75a6f5a138a6a2e14a7133;p=libreriscv.git

whitespace
---
diff --git a/openpower/sv/svp64.mdwn b/openpower/sv/svp64.mdwn
index 54f25aabf..f662ba2a4 100644
--- a/openpower/sv/svp64.mdwn
+++ b/openpower/sv/svp64.mdwn
@@ -44,12 +44,12 @@ Table of contents
 Simple-V is a type of Vectorisation best described as a "Prefix Loop
 Subsystem" similar to the 5 decades-old Zilog Z80 `LDIR` instruction
 and to the 8086 `REP` Prefix instruction. More advanced features are similar
-to the Z80 `CPIR` instruction. If naively viewed one-dimensionally as an actual
-Vector ISA it introduces over 1.5 million 64-bit True-Scalable Vector instructions
-on the SFFS Subset and closer to 10 million 64-bit True-Scalable Vector
-instructions if introduced on VSX.
-SVP64, the instruction format used by Simple-V, is therefore best viewed
-as an orthogonal RISC-paradigm "Prefixing" subsystem instead.
+to the Z80 `CPIR` instruction. If naively viewed one-dimensionally as an
+actual Vector ISA it introduces over 1.5 million 64-bit True-Scalable
+Vector instructions on the SFFS Subset and closer to 10 million 64-bit
+True-Scalable Vector instructions if introduced on VSX. SVP64, the
+instruction format used by Simple-V, is therefore best viewed as an
+orthogonal RISC-paradigm "Prefixing" subsystem instead.
 
 Except where explicitly stated all bit numbers remain as in the rest of
 the Power ISA: in MSB0 form (the bits are numbered from 0 at the MSB on
@@ -64,16 +64,15 @@ the following instruction, but does **not** change the actual Decoding
 of that following instruction. **All prefixed 32-bit instructions
 (Defined Words) retain their non-prefixed encoding and definition**.
 
-Two apparent exceptions to the above hard rule exist: SV Branch-Conditional
-operations and LD/ST-update "Post-Increment" Mode. Post-Increment
-was considered sufficiently high priority (significantly reducing hot-loop
-instruction count) that one bit in the Prefix is reserved for it
-(Note the intention to release that bit and move Post-Increment instructions
-to EXT2xx).
-Vectorised Branch-Conditional operations "embed" the original Scalar
-Branch-Conditional behaviour into a much more advanced variant that
-is highly suited to High-Performance Computation (HPC), Supercomputing,
-and parallel GPU Workloads.
+Two apparent exceptions to the above hard rule exist: SV
+Branch-Conditional operations and LD/ST-update "Post-Increment" Mode.
+Post-Increment was considered sufficiently high priority (significantly
+reducing hot-loop instruction count) that one bit in the Prefix
+is reserved for it (Note the intention to release that bit and move
+Post-Increment instructions to EXT2xx). Vectorised Branch-Conditional
+operations "embed" the original Scalar Branch-Conditional behaviour into
+a much more advanced variant that is highly suited to High-Performance
+Computation (HPC), Supercomputing, and parallel GPU Workloads.
 
 *Architectural Resource Allocation note: it is prohibited to accept RFCs
 which fundamentally violate this hard requirement. Under no circumstances
@@ -150,8 +149,8 @@ overrides is expressed as follows:
 * element-width overrides set the width of the *elements* in the
   sequentially-numbered contiguous array.
 
-The relationship is best defined in Canonical form, below, in ANSI c
-as a union data structure. A key difference is that VSR elements are bounded
+The relationship is best defined in Canonical form, below, in ANSI c as a
+union data structure. A key difference is that VSR elements are bounded
 fixed at 128-bit, where SVP64 elements are conceptually unbounded and
 only limited by the Maximum Vector Length.
 
@@ -196,14 +195,13 @@ sequentially from the LSB end incrementally to the MSB end (confusingly
 numbered the lowest in MSB0 ordering).
 
 
-When exclusively using MSB0-numbering, SVP64
-becomes unnecessarily complex to both express and subsequently understand:
-the required conditional subtractions from 63,
-31, 15 and 7 needed to express the fact that elements are LSB0-sequential
-unfortunately become a hostile minefield, obscuring both
-intent and meaning. Therefore for the
-purposes of this section the more natural **LSB0 numbering is assumed**
-and it is left to the reader to translate to MSB0 numbering.
+When exclusively using MSB0-numbering, SVP64 becomes unnecessarily complex
+to both express and subsequently understand: the required conditional
+subtractions from 63, 31, 15 and 7 needed to express the fact that
+elements are LSB0-sequential unfortunately become a hostile minefield,
+obscuring both intent and meaning. Therefore for the purposes of this
+section the more natural **LSB0 numbering is assumed** and it is left
+to the reader to translate to MSB0 numbering.
 
 The Canonical specification for how element-sequential numbering and
 element-width overrides is defined is expressed in the following c
@@ -265,14 +263,15 @@ However if elwidth overrides are set to 16 for both source and destination:
     int_regfile[RT].hwords[i] = int_regfile[RA].hwords[i] + int_regfile[RB].hwords[i]
 ```
 
-The most fundamental aspect here to understand is that the wrapping into
-subsequent Scalar GPRs that occurs on larger-numbered elements
-including and especially on smaller element widths is **deliberate and intentional**.
-From this Canonical definition it should be clear that sequential elements begin
-at the LSB end of any given underlying Scalar GPR, progress to the MSB end, and
-then to the LSB end of the *next numerically-larger Scalar GPR*. In the
-example above if VL=5 and RT=1 then the contents of GPR(1) and GPR(2) will
-be as follows. For clarity in the table below:
+The most fundamental aspect here to understand is that the wrapping
+into subsequent Scalar GPRs that occurs on larger-numbered elements
+including and especially on smaller element widths is **deliberate
+and intentional**. From this Canonical definition it should be clear
+that sequential elements begin at the LSB end of any given underlying
+Scalar GPR, progress to the MSB end, and then to the LSB end of the
+*next numerically-larger Scalar GPR*. In the example above if VL=5
+and RT=1 then the contents of GPR(1) and GPR(2) will be as follows.
+For clarity in the table below:
 
 * Both MSB0-ordered bitnumbering *and* LSB-ordered bitnumbering are shown
 * The GPR-numbering is considered LSB0-ordered
@@ -293,9 +292,9 @@ be as follows. For clarity in the table below:
 ```
 
 Note that the upper 48 bits of GPR(2) would **not** be modified due to
-the example having VL=5. Thus on "wrapping" - sequential progression from
-GPR(1) into GPR(2) - the 5th result modifies
-**only** the bottom 16 LSBs of GPR(1).
+the example having VL=5. Thus on "wrapping" - sequential progression
+from GPR(1) into GPR(2) - the 5th result modifies **only** the bottom
+16 LSBs of GPR(1).
 
 Hardware Architectural note: to avoid a Read-Modify-Write at the register
 file it is strongly recommended to implement byte-level write-enable lines
@@ -328,27 +327,27 @@ Operation, the exact same contents would be viewed as follows:
 ```
 
 In other words, this perspective really is no different from the situation
-where the actual Register File is treated as an Industry-standard byte-level-addressable
-Little-Endian-addressed SRAM. Note that this perspective does **not**
-involve `MSR.LE` in any way shape or form because `MSR.LE` is directly
-in control of the Memory-to-Register byte-ordering. This section is
-exclusively about how to correctly perceive Simple-V-Augmented **Register**
-Files.
+where the actual Register File is treated as an Industry-standard
+byte-level-addressable Little-Endian-addressed SRAM. Note that
+this perspective does **not** involve `MSR.LE` in any way shape or
+form because `MSR.LE` is directly in control of the Memory-to-Register
+byte-ordering. This section is exclusively about how to correctly perceive
+Simple-V-Augmented **Register** Files.
 
 **Comparative equivalent using VSR registers**
 
 For a comparative data point the VSR Registers may be expressed in the
 same fashion. The c code below is directly an expression of Figure 97 in
-Power ISA Public v3.1 Book I Section 6.3 page 258, *after compensating for
-MSB0 numbering in both bits and elements, adapting in full to LSB0 numbering,
-and obeying LE ordering*.
+Power ISA Public v3.1 Book I Section 6.3 page 258, *after compensating
+for MSB0 numbering in both bits and elements, adapting in full to LSB0
+numbering, and obeying LE ordering*.
 
-**Crucial to understanding why the subtraction from 1,3,7,15 is present
-is because the Power ISA numbers VSX Registers elements also in MSB0 order**.
+**Crucial to understanding why the subtraction from 1,3,7,15 is present is
+because the Power ISA numbers VSX Registers elements also in MSB0 order**.
 SVP64 very specifically numbers elements in **LSB0** order with the first
-element (numbered zero) being at the bitwise-numbered **LSB** end of the register, where VSX
-does the reverse: places the numerically-*highest* (last-numbered) element at
-the LSB end of the register.
+element (numbered zero) being at the bitwise-numbered **LSB** end of the
+register, where VSX does the reverse: places the numerically-*highest*
+(last-numbered) element at the LSB end of the register.
 
 ```
 
@@ -394,21 +393,21 @@ the LSB end of the register.
 }
 ```
 
-For VSR Registers one key difference is that the overlay of different element
-widths is clearly a *bounded static quantity*, whereas for Simple-V the
-elements are
-unrestrained and permitted to flow into *successive underlying Scalar registers*.
-This difference is absolutely critical to a full understanding of the entire
-Simple-V paradigm and why element-ordering, bit-numbering *and register numbering*
-are all so strictly defined.
+For VSR Registers one key difference is that the overlay of different
+element widths is clearly a *bounded static quantity*, whereas for
+Simple-V the elements are unrestrained and permitted to flow into
+*successive underlying Scalar registers*. This difference is absolutely
+critical to a full understanding of the entire Simple-V paradigm and
+why element-ordering, bit-numbering *and register numbering* are all so
+strictly defined.
 
-Implementations are not permitted to violate the Canonical definition. Software
-will be critically relying on the wrapped (overflow) behaviour inherently
-implied by the unbounded variable-length c arrays.
+Implementations are not permitted to violate the Canonical
+definition. Software will be critically relying on the wrapped (overflow)
+behaviour inherently implied by the unbounded variable-length c arrays.
 
-Illustrating the exact same loop with the exact same effect as achieved by Simple-V
-we are first forced to create wrapper functions, to cater for the fact
-that VSR register elements are static bounded:
+Illustrating the exact same loop with the exact same effect as achieved
+by Simple-V we are first forced to create wrapper functions, to cater
+for the fact that VSR register elements are static bounded:
 
 ```
 int calc_VSR_reg_offs(int elt, int width) {
@@ -462,19 +461,19 @@ whereas when VL=1 and the SV prefix is all zeros, the operation simply
 acts as if SV had not been applied at all to the instruction (an
 "identity transformation").
 
-The fact that `VL` is dynamic and can be set to any value at runtime based
-on program conditions and behaviour means very specifically that
-`scalar identity behaviour` is **not** a redundant encoding. If the
-only means by which VL could be set was by way of static-compiled
-immediates then this assertion would be false. VL should not
-be confused with MAXVL when understanding this key aspect of SimpleV.
+The fact that `VL` is dynamic and can be set to any value at runtime
+based on program conditions and behaviour means very specifically that
+`scalar identity behaviour` is **not** a redundant encoding. If the only
+means by which VL could be set was by way of static-compiled immediates
+then this assertion would be false. VL should not be confused with
+MAXVL when understanding this key aspect of SimpleV.
 
 ## Register Naming and size
 
-As indicated above SV Registers are simply the GPR, FPR and CR
-register files extended linearly to larger sizes; SV Vectorisation
-iterates sequentially through these registers (LSB0 sequential ordering
-from 0 to VL-1).
+As indicated above SV Registers are simply the GPR, FPR and CR register
+files extended linearly to larger sizes; SV Vectorisation iterates
+sequentially through these registers (LSB0 sequential ordering from 0
+to VL-1).
 
 Where the integer regfile in standard scalar Power ISA v3.0B/v3.1B is
 r0 to r31, SV extends this as r0 to r127. Likewise FP registers are