# RFC ls008 SVP64 Management instructions

**URLs**:

*
*
*
*

**Severity**: Major

**Status**: New

**Date**: 24 Mar 2023

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
    Book I, new Scalar Chapter. (Or, new Book on "Zero-Overhead Loop Subsystem")
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    setvl - Cray-style "Set Vector Length" instruction
    svstep - Vertical-First Mode explicit Step and Status
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of two new "Zero-Overhead-Loop-Control" DSP-style Vector-style
    Management Instructions which can be implemented extremely efficiently
    and effectively by inserting an additional phase between Decode and Issue.
    More complex designs are NOT adversely impacted and in fact greatly benefit
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control (ZOLC),
    Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model,
    Digital Signal Processing (DSP)
```

**Motivation**

Power ISA is synonymous with Supercomputing and the early Supercomputers
(ETA-10, ILLIAC-IV, CDC200, Cray) had Vectorisation. It is therefore anomalous
that Power ISA does not have Scalable Vectors. This presents the opportunity to
modernise Power ISA keeping it at the top of Supercomputing.

**Notes and Observations**:

1. SVP64 is very much designed for ultra-light-weight Embedded use-cases all the
   way up to moving the bar of Supercomputing orders of magnitude above its
   present perception, whilst retaining at all times Sequential Programming
   Execution.
2. This proposal is the **base** for further Extensions.
   These include extending SVP64 onto the Scalar VSX instructions (with a
   **LONG TERM** view in 10+ years to deprecating the PackedSIMD aspects of
   VSX), to be discussed at a later time, the potential for extending VSX
   registers to 128 or beyond, and Arithmetic operations to a
   runtime-selectable choice of 128-bit, 256-bit, 512-bit or 1024-bit.
3. Massive reductions in instruction count of between 2x and 20x have been
   demonstrated with SVP64, which is far beyond anything ever achieved by any
   *general-purpose* ISA Extension added to any ISA in the history of Computing.

**Changes**

Add the following entries to:

* Section 1.3.2 Notation
* the Appendices of Book I
* Instructions of Book I as a new Section
* SVL-Form of Book I Section 1.6.1.6 and 1.6.2

----------------

\newpage{}

# Notation, Section 1.3.2

When register operands (`RA, RT, BF`) are prefixed by a single underscore
(`_RT, _RA, _BF`) the variable contains the contents of the instruction field,
not the contents of the Register File referenced *by* that field. Example:
`_RT` contains the contents of bits 6 thru 10. The relationship
`RT = GPR(_RT)` is thus always true. Uses include making alternative
decisions within an instruction based on whether the operand field is zero or
non-zero.


----------------

\newpage{}

[[!inline pages="openpower/sv/svstep" raw=yes ]]

[[!inline pages="openpower/sv/setvl" raw=yes ]]

# SVSTATE SPR

The format of the SVSTATE SPR is as follows:

| Field | Name     | Description                 |
| ----- | -------- | --------------------------- |
| 0:6   | maxvl    | Max Vector Length           |
| 7:13  | vl       | Vector Length               |
| 14:20 | srcstep  | for srcstep = 0..VL-1       |
| 21:27 | dststep  | for dststep = 0..VL-1       |
| 28:29 | dsubstep | for substep = 0..SUBVL-1    |
| 30:31 | ssubstep | for substep = 0..SUBVL-1    |
| 32:33 | mi0      | REMAP RA/FRA/BFA SVSHAPE0-3 |
| 34:35 | mi1      | REMAP RB/FRB/BFB SVSHAPE0-3 |
| 36:37 | mi2      | REMAP RC/FRT SVSHAPE0-3     |
| 38:39 | mo0      | REMAP RT/FRT/BF SVSHAPE0-3  |
| 40:41 | mo1      | REMAP EA/RS/FRS SVSHAPE0-3  |
| 42:46 | SVme     | REMAP enable (RA-RT)        |
| 47:52 | rsvd     | reserved                    |
| 53    | pack     | PACK (srcstep reorder)      |
| 54    | unpack   | UNPACK (dststep order)      |
| 55:61 | hphint   | Horizontal Hint             |
| 62    | RMpst    | REMAP persistence           |
| 63    | vfirst   | Vertical First mode         |

Notes:

* The entries are truncated to be within range. Attempts to set VL to
  greater than MAXVL will truncate VL.
* Setting srcstep, dststep to 64 or greater, or VL or MVL to greater
  than 64 is reserved and will cause an illegal instruction trap.

**SVSTATE Fields**

SVSTATE is a standard SPR that (if REMAP is not activated) contains sufficient
self-contained information for a full context save/restore.
SVSTATE contains (and permits setting of):

* MVL (the Maximum Vector Length) - declares (statically) how
  much of a regfile is to be reserved for Vector elements
* VL - Vector Length
* dststep - the destination element offset of the current parallel
  instruction being executed
* srcstep - for twin-predication, the source element offset as well.
* ssubstep - the source subvector element offset of the current
  parallel instruction being executed
* dsubstep - the destination subvector element offset of the current
  parallel instruction being executed
* vfirst - Vertical First mode.
  srcstep, dststep and substep **do not advance** unless explicitly
  requested to do so with pseudo-op svstep (a mode of setvl)
* RMpst - REMAP persistence. REMAP will apply only to the following
  instruction unless this bit is set, in which case REMAP "persists".
  Reset (cleared) on use of the `setvl` instruction if used to
  alter VL or MVL.
* Pack - if set then srcstep/substep VL/SUBVL loop-ordering is inverted.
* UnPack - if set then dststep/substep VL/SUBVL loop-ordering is inverted.
* hphint - Horizontal Parallelism Hint. Indicates that
  no Hazards exist between groups of elements in sequential multiples of this
  number (before REMAP). By definition: elements for which
  `FLOOR(srcstep/hphint)` is equal *before REMAP* are in the same parallelism
  "group". In Vertical First Mode hardware **MUST ONLY** process elements
  in the same group, and must stop Horizontal Issue at the last element
  of a given group. Set to zero to indicate "no hint".
* SVme - REMAP enable bits, indicating which register is to be
  REMAPed: RA, RB, RC, RT and EA are the canonical (typical) register names
  associated with each bit, with RA being the LSB and EA being the MSB.
  See table below for ordering. When `SVme` is zero (0b00000) REMAP
  is **fully disabled and inactive** regardless of the contents of
  `SVSTATE`, `mi0-mi2/mo0-mo1`, or the four `SVSHAPEn` SPRs.
* mi0-mi2/mo0-mo1 - when the corresponding SVme bit is set, these
  indicate the SVSHAPE (0-3) that the corresponding register (RA etc)
  should use.

Programmer's Note: the fact that REMAP is entirely dormant when `SVme` is zero
allows establishment of REMAP context well in advance, followed by utilising
`svremap` at a precise (or the very last) moment. Some implementations may
exploit this to cache (or take some time to prepare caches) in the background
whilst other (unrelated) instructions are being executed.
This is particularly important to bear in mind when using `svindex`, which
will require hardware to perform (and cache) additional GPR reads.

Programmer's Note: when REMAP is activated it becomes necessary on any
context-switch (Interrupt or Function call) to detect (or know in advance)
that REMAP is enabled and to additionally save/restore the four SVSHAPE SPRs,
SVSHAPE0-3. Given that this is expected to be a rare occurrence it was deemed
unreasonable to burden every context-switch or function call with mandatory
save/restore of SVSHAPEs, and consequently it is a *callee* (and Trap Handler)
responsibility. Callees (and Trap Handlers) **MUST** avoid using any and all
SVP64 instructions during the period where state could be adversely affected.
SVP64 purely relies on Scalar instructions, so Scalar instructions (except
the SVP64 Management ones and mtspr and mfspr) are 100% guaranteed to have
zero impact on SVP64 state.

**Max Vector Length (maxvl)**

MAXVECTORLENGTH is the same concept as MVL in RISC-V RVV, except that it
is variable length and may be dynamically set (normally from an immediate
field only). MVL is limited to 7 bits (in the first version of SVP64) and
consequently the maximum number of elements is limited to between 0 and 127.

Programmer's Note: Except by directly using `mtspr` on SVSTATE, which may
result in performance penalties on some hardware implementations, SVSTATE's
`maxvl` field may only be set **statically** as an immediate, by the `setvl`
instruction. It may **NOT** be set dynamically from a register. Compiler
writers and assembly programmers are expected to perform static register file
analysis, subdivision, and allocation and only utilise `setvl`. Direct writing
to SVSTATE in order to "bypass" this Note could, in less-advanced
implementations, potentially cause stalling, particularly if SVP64
instructions are issued directly after the `mtspr` to SVSTATE.

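
The SVSTATE field layout and two of the rules stated above (REMAP is dormant when `SVme` is zero, and elements with equal `FLOOR(srcstep/hphint)` share a parallelism group) can be sketched in Python. This is a minimal sketch under stated assumptions, not a reference implementation: it assumes the bit ranges in the SVSTATE table use MSB0 numbering within a 64-bit SPR (bit 0 is the most significant), and the helper names `get_field`, `set_field`, `remap_active` and `parallel_group` are invented here for illustration.

```python
# Field name -> (first bit, last bit), copied from the SVSTATE table.
# Assumption: MSB0 numbering within a 64-bit SPR.
SVSTATE_FIELDS = {
    "maxvl": (0, 6), "vl": (7, 13), "srcstep": (14, 20), "dststep": (21, 27),
    "dsubstep": (28, 29), "ssubstep": (30, 31),
    "mi0": (32, 33), "mi1": (34, 35), "mi2": (36, 37),
    "mo0": (38, 39), "mo1": (40, 41), "SVme": (42, 46),
    "pack": (53, 53), "unpack": (54, 54), "hphint": (55, 61),
    "RMpst": (62, 62), "vfirst": (63, 63),
}

def get_field(svstate, name):
    """Read one named field out of a 64-bit SVSTATE value."""
    start, end = SVSTATE_FIELDS[name]
    shift = 63 - end
    return (svstate >> shift) & ((1 << (end - start + 1)) - 1)

def set_field(svstate, name, value):
    """Return a copy of svstate with one named field replaced."""
    start, end = SVSTATE_FIELDS[name]
    shift = 63 - end
    mask = ((1 << (end - start + 1)) - 1) << shift
    return (svstate & ~mask) | ((value << shift) & mask)

def remap_active(svstate):
    """REMAP is fully disabled whenever SVme is zero."""
    return get_field(svstate, "SVme") != 0

def parallel_group(srcstep, hphint):
    """Group index FLOOR(srcstep/hphint); None when hphint is 0 (no hint)."""
    return None if hphint == 0 else srcstep // hphint

state = 0
state = set_field(state, "maxvl", 8)
state = set_field(state, "vl", 5)
state = set_field(state, "vfirst", 1)
```

With `hphint` set to 4, elements 0 to 3 fall in group 0 and elements 4 to 7 in group 1, so `parallel_group(5, 4)` and `parallel_group(3, 4)` return different group numbers.
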

**Vector Length (vl)**

`SVSTATE.vl` is the actual Vector Length: the number of elements in a
"Vector". It may be set entirely dynamically at runtime from a number of
sources. `setvl` is the primary instruction for setting Vector Length.
`setvl` is conceptually similar to, but different from, the Cray, SX Aurora,
and RISC-V RVV equivalents. Similar to RVV, VL is set to be within the range
0 <= VL <= MVL. Unlike RVV, VL is set **exactly** according to the following:

    VL = (RT|0) = MIN(vlen, MVL)

where 0 <= MVL <= 127 and vlen may come from an immediate, `RA`, or from
the `CTR` SPR, depending on options selected with the `setvl` instruction.

Programmer's Note: conceptual understanding of Cray-style Vectors is far
beyond the scope of the Power ISA Technical Reference. Guidance on the
50-year-old Cray Vector paradigm is best sought elsewhere: good studies
include Academic Courses given on the 1970s Cray Supercomputers over at
least the past three decades.

**SUBVL - Sub Vector Length**

This is a "group by quantity" that effectively asks each iteration of the
hardware loop to load SUBVL elements of width elwidth at a time. Effectively,
SUBVL is like a SIMD multiplier: instead of just 1 operation issued, SUBVL
operations are issued.

The main effect of SUBVL is that predication bits are applied per **group**,
rather than by individual element. Legal values are 0 to 3, representing
1 operation (1 element) thru 4 operations (4 elements) respectively.
Elements are best thought of in the context of 3D, Audio and Video:
two Left and Right Channel "elements" or four ARGB "elements", or three
XYZ coordinate "elements".

`subvl` is again primarily set by the `setvl` instruction. Not to be confused
with `hphint`.

Directly related to `subvl` are the `pack` and `unpack` Mode bits of `SVSTATE`.
See `svstep` instruction for how to set Pack and Unpack Modes.

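
The VL rule and the per-group effect of SUBVL can be sketched in Python. This is a hedged illustration rather than the specified behaviour: the function names `set_vl` and `element_ops` are invented here, the sketch assumes predicate bit `step` (counting from the least significant bit) governs element group `step`, and it assumes the element loop is outermost with the sub-element loop inside it, ignoring the `pack` and `unpack` modes that invert loop ordering.

```python
def set_vl(vlen, mvl):
    """VL = MIN(vlen, MVL), with 0 <= MVL <= 127."""
    if not 0 <= mvl <= 127:
        raise ValueError("MVL must fit in 7 bits")
    return min(vlen, mvl)

def element_ops(vl, subvl_field, predicate):
    """Yield (step, substep) for each operation issued under a predicate.

    subvl_field is the encoded value 0..3, meaning 1..4 sub-elements.
    One predicate bit covers a whole group of sub-elements (assumed to be
    bit `step` of `predicate`, LSB first, for this sketch only).
    """
    subvl = subvl_field + 1
    for step in range(vl):
        if (predicate >> step) & 1:
            for substep in range(subvl):
                yield (step, substep)

# Request 10 elements when only 4 have been reserved: VL becomes 4.
vl = set_vl(vlen=10, mvl=4)

# Three sub-elements per group (for example XYZ), groups 0 and 2 enabled.
ops = list(element_ops(vl, subvl_field=2, predicate=0b0101))
```

Here a single predicate bit enables or disables all three XYZ sub-elements of a group together, which is the "per group" behaviour described above.
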

**Horizontal Parallelism**

A problem exists for hardware where it may not be able to detect
that a programmer (or compiler) knows of opportunities for parallelism
and lack of overlap between loops.

For hphint, the number chosen must be consistently
executed **every time**. Hardware is not permitted to execute five
computations for one instruction then three on the next.
hphint is a hint from the compiler to hardware that exactly this
many elements may be safely executed in parallel, without hazards
(including Memory accesses).
Interestingly, when hphint is set equal to VL, it is in effect
as if Vertical First mode were not set, because the hardware is
given the option to run through all elements in an instruction.
This is exactly what Horizontal-First is: a for-loop from 0 to VL-1
except that the hardware may *choose* the number of elements.

*Note to programmers: changing VL during the middle of such modes
should be done only with due care and respect for the fact that SVSTATE
has exactly the same peer-level status as a Program Counter.*

-------------

\newpage{}

# SVL-Form

Add the following to Book I, 1.6.1, SVL-Form

```
    |0     |6    |11    |16   |23 |24 |25 |26    |31 |
    | PO   |  RT |   RA | SVi |ms |vs |vf |  XO  |Rc |
    | PO   |  RT | /    | SVi |/  |/  |vf |  XO  |Rc |
```

* Add `SVL` to `RA (11:15)` Field in Book I, 1.6.2
* Add `SVL` to `RT (6:10)` Field in Book I, 1.6.2
* Add `SVL` to `Rc (31)` Field in Book I, 1.6.2
* Add `SVL` to `XO (26:30)` Field in Book I, 1.6.2

Add the following to Book I, 1.6.2

```
    ms (23)
        Field used in Simple-V to specify whether MVL (maxvl in the SVSTATE SPR)
        is to be set
        Formats: SVL
    vf (25)
        Field used in Simple-V to specify whether "Vertical" Mode is set
        (vfirst in the SVSTATE SPR)
        Formats: SVL
    vs (24)
        Field used in Simple-V to specify whether VL (vl in the SVSTATE SPR)
        is to be set
        Formats: SVL
    SVi (16:22)
        Simple-V immediate field used by setvl for setting VL or MVL
        (vl, maxvl in the SVSTATE SPR)
        and used as a "Mode of Operation" selector in svstep
        Formats: SVL
```

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| SVL  | I    | #    | 3.0B    | svstep   | Vertical-First Stepping and status reporting |
| SVL  | I    | #    | 3.0B    | setvl    | Cray-like establishment of Looping (Vector) context |

[[!tag opf_rfc]]