# RFC ls008 SVP64 Management instructions

**URLs**:

* <https://libre-soc.org/openpower/sv/>
* <https://libre-soc.org/openpower/sv/rfc/ls008/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1040>
* <https://git.openpower.foundation/isa/PowerISA/issues/87>

**Severity**: Major

**Status**: New

**Date**: 24 Mar 2023

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
    Book I, new Scalar Chapter.  (Or, new Book on "Zero-Overhead Loop Subsystem")
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    setvl    - Cray-style "Set Vector Length" instruction
    svstep   - Vertical-First Mode explicit Step and Status
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of two new "Zero-Overhead-Loop-Control" DSP-style Vector-style
    Management Instructions which can be implemented extremely efficiently
    and effectively by inserting an additional phase between Decode and Issue.
    More complex designs are NOT adversely impacted and in fact greatly benefit
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control (ZOLC),
    Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model,
    Digital Signal Processing (DSP)
```

**Motivation**

Power ISA is synonymous with Supercomputing, and the early Supercomputers
(ETA-10, ILLIAC-IV, CDC200, Cray) had Vectorisation. It is therefore anomalous
that Power ISA does not have Scalable Vectors.  This presents the opportunity to
modernise Power ISA, keeping it at the top of Supercomputing.

**Notes and Observations**:

1. SVP64 is designed to span everything from ultra-light-weight Embedded use-cases all the
  way up to moving the bar of Supercomputing orders of magnitude above its present
  perception, whilst retaining at all times Sequential Programming Execution.
2. This proposal is the **base** for further Extensions.  These include
  extending SVP64 onto the Scalar VSX instructions (with a **LONG TERM** view in 10+ years
  to deprecating the PackedSIMD aspects of VSX), to be discussed at a later
  time, the potential for extending VSX registers to 128 or beyond, and Arithmetic
  operations to a runtime-selectable choice of 128-bit, 256-bit, 512-bit or 1024-bit.
3. Massive reductions in instruction count of between 2x and 20x have been demonstrated
  with SVP64, which is far beyond anything ever achieved by any *general-purpose*
  ISA Extension added to any ISA in the history of Computing.

**Changes**

Add the following entries to:

* Section 1.3.2 Notation
* the Appendices of Book I
* Instructions of Book I as a new Section
* SVL-Form of Book I Section 1.6.1.6 and 1.6.2

----------------

\newpage{}

# Notation, Section 1.3.2

When register operands (`RA, RT, BF`) are prefixed by a single underscore
(`_RT, _RA, _BF`) the variable contains the contents of the instruction field,
not the contents of the Register File referenced *by* that field. Example:
`_RT` contains the contents of bits 6 thru 10. The relationship
`RT = GPR(_RT)` is thus always true. Uses include making alternative
decisions within an instruction based on whether the operand field
is zero or non-zero.

----------------

\newpage{}

# svstep: Vertical-First Stepping and status reporting

SVL-Form

* svstep RT,SVi,vf (Rc=0)
* svstep. RT,SVi,vf (Rc=1)

| 0-5|6-10|11-15|16-22 | 23-25    | 26-30 |31|   Form   |
|----|----|-----|------|----------|-------|--|--------- |
|PO  | RT | /   | SVi  |  / / vf  | XO    |Rc| SVL-Form |

Pseudo-code:

```
    if SVi[3:4] = 0b11 then
        # store pack and unpack in SVSTATE
        SVSTATE[53] <- SVi[5]
        SVSTATE[54] <- SVi[6]
        RT <- [0]*62 || SVSTATE[53:54]
    else
        # Vertical-First explicit stepping.
        step <- SVSTATE_NEXT(SVi, vf)
        RT <- [0]*57 || step
```

Special Registers Altered:

    CR0                     (if Rc=1)

**Description**

svstep may be used to enquire about the REMAP Schedule, and it may be used
to alter Vectorisation State.  When `vf=1` stepping occurs.
When `vf=0` the enquiry is performed without altering internal
state.  If `SVi=0, Rc=0, vf=0` the instruction is a `nop`.

The following Modes exist:

* `SVi=0`: appropriately step srcstep, dststep, subsrcstep and subdststep to the next
  element, taking pack and unpack into consideration.
* When `SVi` is 1-4 the REMAP Schedule for a given SVSHAPE may be
  returned in `RT`.  `SVi=1` selects the current state of SVSHAPE0,
  through to `SVi=4` which selects SVSHAPE3.
* When `SVi` is 5, `SVSTATE.srcstep` is returned.
* When `SVi` is 6, `SVSTATE.dststep` is returned.
* When `SVi` is 0b1100 pack/unpack in SVSTATE is cleared.
* When `SVi` is 0b1101 pack in SVSTATE is set, unpack is cleared.
* When `SVi` is 0b1110 unpack in SVSTATE is set, pack is cleared.
* When `SVi` is 0b1111 pack/unpack in SVSTATE are set.
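
The Mode list above can be read as a simple dispatch on `SVi`. The following Python sketch is a reading aid rather than a definition: the function name `svstep_mode` and the returned strings are hypothetical, the pack/unpack range follows the bullet list (0b1100 to 0b1111) rather than the `SVi[3:4]` test in the pseudo-code, and values the list does not mention are simply reported as such.

```python
def svstep_mode(svi):
    """Describe which svstep Mode a 7-bit SVi value selects (sketch only)."""
    if not 0 <= svi <= 0b1111111:
        raise ValueError("SVi is a 7-bit field")
    if svi == 0:
        return "step to next element"
    if 1 <= svi <= 4:
        # SVi=1 selects SVSHAPE0, through to SVi=4 selecting SVSHAPE3
        return f"return REMAP Schedule of SVSHAPE{svi - 1}"
    if svi == 5:
        return "return SVSTATE.srcstep"
    if svi == 6:
        return "return SVSTATE.dststep"
    if 0b1100 <= svi <= 0b1111:
        return "update pack/unpack"
    return "not described in this section"
```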

As this is a Single-Predicated (1P) instruction, predication may be applied
to skip (or zero) elements.

* Vertical-First Mode will return the requested index
  (and move to the next state if `vf=1`)
* Horizontal-First Mode can be used to return all indices,
  i.e. walks through all possible states.

**Vectorisation of svstep itself**

As a 32-bit instruction, `svstep` may itself be Vector-Prefixed, as
`sv.svstep`. This will work just as well in Horizontal-First
as it will in Vertical-First Mode.

Example: to obtain the full set of possible computed element
indices use `sv.svstep RT.v,SVi,1` which will store all computed element
indices, starting from RT.  If Rc=1 then a co-result Vector of CR Fields
will also be returned, comprising the "loop end-points" of each of the inner
loops when either Matrix Mode or DCT/FFT is set.  In other words,
for example, when the `xdim` inner loop reaches the end and on the next
iteration it will begin again at zero, the CR Field `EQ` will be set.
With a maximum of three loops within both Matrix and DCT/FFT Modes,
the CR Field's EQ bit will be set at the end of the first inner loop,
the LT bit for the second, the GT bit for the outermost loop and the
SO bit set on the very last element, when all loops reach their maximum
extent.

*Programmer's note (1): VL in some situations, particularly larger Matrices,
may exceed 64,
meaning that `sv.svstep` returns a considerable number of values. Under
such circumstances `sv.svstep/ew=8` is recommended.*

*Programmer's note (2): having conveniently obtained a pre-computed
Schedule with `sv.svstep`,
it may then be used as the input to Indexed REMAP Mode
to achieve the exact same Schedule. It is evident however that
before use some of the Indices may be arbitrarily altered as desired.
`sv.svstep` helps the programmer avoid having to manually recreate
Indices for certain
types of common Loop patterns, and in its simplest form, without REMAP
(SVi=5 or SVi=6),
is equivalent to the `iota` instruction found in other Vector ISAs.*

**Vertical First Mode**

Vertical First is effectively like an implicit single bit predicate
applied to every SVP64 instruction.  **ONLY** one element in each
SVP64 Vector instruction is executed; srcstep and dststep do **not**
increment, and the Program Counter progresses **immediately** to
the next instruction just as it would for any standard scalar v3.0B
instruction.

A mode of `svstep` (SVi=0) may be called which moves srcstep and
dststep on to the next element, still respecting predicate
masks.

In other words, where normal SVP64 Vectorisation acts "horizontally"
by looping first through 0 to VL-1 and only then moving the PC
to the next instruction, Vertical-First moves the PC onwards
(vertically) through multiple instructions **with the same
srcstep and dststep**, then an explicit instruction is used to
advance srcstep/dststep. An outer loop is expected to be
used (branch instruction) which completes a series of
Vector operations.
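
The difference in execution order can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: instructions are represented by hypothetical string labels, predication and REMAP are ignored, and srcstep and dststep are treated as a single `step` counter.

```python
def horizontal_first(instructions, vl):
    """Each instruction loops over all elements before the PC moves on."""
    return [(insn, step) for insn in instructions for step in range(vl)]

def vertical_first(instructions, vl):
    """Every instruction executes one element; an explicit step then advances."""
    order = []
    step = 0
    while step < vl:                  # outer loop: a branch in real code
        for insn in instructions:     # PC moves "vertically" with the same step
            order.append((insn, step))
        step += 1                     # stands in for svstep advancing the step
    return order
```

With two instructions `"a"` and `"b"` and `vl=2`, the first function yields a0, a1, b0, b1 whereas the second yields a0, b0, a1, b1.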

Testing any end condition of any loop of any REMAP state allows branches to be
used to create loops.

Programmer's note: when Predicate Non-Zeroing is used this indicates to
the underlying hardware that any masked-out element must be skipped.
*This includes in Vertical-First Mode*, and programmers should be keenly
aware that srcstep or dststep or both *may* jump by more than one as
a result, because the actual request under these circumstances was to execute
on the first available next *non-masked-out* element.
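
A minimal Python sketch of that skipping behaviour follows. It assumes, for illustration only, that the predicate is held as an integer in which bit `i` (counting from the least-significant bit) corresponds to element `i`; the function name `next_unmasked` is hypothetical.

```python
def next_unmasked(step, mask, vl):
    """Return the first element index above `step` whose mask bit is set.

    Returns vl when no such element remains (end of the Vector).
    """
    nxt = step + 1
    while nxt < vl and not (mask >> nxt) & 1:
        nxt += 1          # masked-out element: skipped, not executed
    return nxt
```

With `mask=0b10001` and `vl=8`, stepping from element 0 lands on element 4, a jump of four rather than one.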

*Programmers should be aware that VL, srcstep and dststep are global in nature.
Nested looping with different schedules is perfectly possible, as is
calling of functions, however SVSTATE (and any associated SVSHAPEs) should
obviously be stored on the stack in order to achieve this benefit.*

-------------

\newpage{}

# setvl

SVL-Form

| 0-5|6-10|11-15|16-22 | 23 24 25 | 26-30 |31|   Form   |
| -- | -- | --- | ---- |----------| ----- |--|----------|
|PO  | RT | RA  | SVi  | ms vs vf | XO    |Rc| SVL-Form |

* setvl RT,RA,SVi,vf,vs,ms (Rc=0)
* setvl. RT,RA,SVi,vf,vs,ms (Rc=1)

Pseudo-code:

```
    overflow <- 0b0    # sets CR.SO if set and if Rc=1
    VLimm <- SVi + 1
    # set or get MVL
    if ms = 1 then MVL <- VLimm[0:6]
    else           MVL <- SVSTATE[0:6]
    # set or get VL
    if vs = 0                then VL <- SVSTATE[7:13]
    else if _RA != 0         then
        if (RA) >u 0b1111111 then
            VL <- 0b1111111
            overflow <- 0b1
        else                      VL <- (RA)[57:63]
    else if _RT = 0          then VL <- VLimm[0:6]
    else if CTR >u 0b1111111 then
        VL <- 0b1111111
        overflow <- 0b1
    else                          VL <- CTR[57:63]
    # limit VL to within MVL
    if VL >u MVL then
        overflow <- 0b1
        VL <- MVL
    SVSTATE[0:6] <- MVL
    SVSTATE[7:13] <- VL
    if _RT != 0 then
        GPR(_RT) <- [0]*57 || VL
    # MAXVL is a static "state-reset" opportunity so VF is only set then.
    if ms = 1 then
        SVSTATE[63] <- vf   # set Vertical-First mode
        SVSTATE[62] <- 0b0  # clear persist bit
```

Special Registers Altered:

```
    CR0                     (if Rc=1)
```

* `SVi` - bits 16-22 - an immediate operand for setting MVL and/or VL
* `ms` - bit 23 - allows for setting of MVL
* `vs` - bit 24 - allows for setting of VL
* `vf` - bit 25 - sets "Vertical First Mode".

Note that in immediate setting mode VL and MVL start from **one**,
but that this is compensated for in the assembly notation:
an immediate value of 1 in assembler notation
actually places the value 0b0000000 in the `SVi` field bits.
On execution the `setvl` instruction adds one to the decoded
`SVi` field bits, resulting in
VL/MVL being set to 1. This allows VL to be set to values
ranging from 1 to 128 with only 7 bits instead of 8.
Setting VL/MVL
to 0 would result in all Vector operations becoming `nop`.  If this is
truly desired (nop behaviour) then setting VL and MVL to zero is to be
done via the [[SVSTATE SPR|sv/sprs]].
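
The off-by-one relationship between the assembler immediate and the `SVi` field can be sketched as a pair of Python helpers. The function names are hypothetical, and the 1 to 128 range is taken directly from the paragraph above.

```python
def encode_svi(imm):
    """Assembler side: map an immediate VL/MVL of 1..128 to the 7-bit SVi field."""
    if not 1 <= imm <= 128:
        raise ValueError("immediate VL/MVL must be in the range 1..128")
    return imm - 1

def decode_svi(svi):
    """Execution side: setvl adds one to the decoded SVi field bits."""
    return (svi & 0b1111111) + 1
```

For example `encode_svi(1)` gives the all-zeros field, and `decode_svi` applied to that field gives back 1.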

Note that setmvli is a pseudo-op, based on RA/RT=0, and setvli likewise:

    setvli   VL=8   : setvl  r0, r0, VL=8, vf=0, vs=1, ms=0
    setvli.  VL=8   : setvl. r0, r0, VL=8, vf=0, vs=1, ms=0
    setmvli  MVL=8  : setvl  r0, r0, MVL=8, vf=0, vs=0, ms=1
    setmvli. MVL=8  : setvl. r0, r0, MVL=8, vf=0, vs=0, ms=1

Additional pseudo-op for obtaining VL without modifying it (or any state):

    getvl  r5      : setvl  r5, r0, vf=0, vs=0, ms=0
    getvl. r5      : setvl. r5, r0, vf=0, vs=0, ms=0

Note that whilst it is possible to set both MVL and VL from the same
immediate, it is not possible to set them to different immediates in
the same instruction.  Doing so would require two instructions.

**Selecting sources for VL**

Because there is considerable opcode pressure, the source from which VL is set
is selected by combinations of `vs`, `RA` and `RT`, as follows:

| condition           | effect         |
| - | - |
| `vs=1, RA=0, RT!=0` | VL,RT set to MIN(MVL, CTR)  |
| `vs=1, RA=0, RT=0`  | VL set to MIN(MVL, SVi+1)  |
| `vs=1, RA!=0, RT=0` | VL set to MIN(MVL, RA)  |
| `vs=1, RA!=0, RT!=0` | VL,RT set to MIN(MVL, RA)  |

The reasoning here is that the opportunity to set RT equal to the
immediate `SVi+1` is sacrificed in favour of setting from CTR.
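
The source selection in the table can be modelled in a few lines of Python that follow the `setvl` pseudo-code. This is a sketch under stated assumptions, not a reference implementation: the `state` dictionary, the `gpr` list and the helper `saturate7` are hypothetical, plain integers are used so the 7-bit truncation `VLimm[0:6]` is not modelled, and the `vf` and persist bits are omitted.

```python
MAX7 = 0b1111111

def saturate7(value):
    """Clamp a register or CTR value to 7 bits, reporting overflow."""
    if value > MAX7:
        return MAX7, True
    return value, False

def setvl(state, gpr, ctr, _RT, _RA, SVi, ms, vs):
    """Update state["maxvl"] and state["vl"]; return the overflow flag."""
    overflow = False
    vlimm = SVi + 1
    mvl = vlimm if ms else state["maxvl"]       # set or get MVL
    if not vs:
        vl = state["vl"]                        # VL left as it was
    elif _RA != 0:
        vl, overflow = saturate7(gpr[_RA])      # VL from RA
    elif _RT == 0:
        vl = vlimm                              # VL from the immediate
    else:
        vl, overflow = saturate7(ctr)           # VL from CTR
    if vl > mvl:                                # limit VL to within MVL
        vl, overflow = mvl, True
    state["maxvl"], state["vl"] = mvl, vl
    if _RT != 0:
        gpr[_RT] = vl
    return overflow
```

Called with `vs=1`, `_RA=0`, a non-zero `_RT`, `ctr=100` and an existing MVL of 8, the model sets both VL and the target register to 8 and reports overflow, matching the first row of the table.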

# Unusual Rc=1 behaviour

Normally, the return result from an instruction is in `RT`. With
it being possible for `RT=0` to mean that `CTR` mode is to be read,
some different semantics are needed.

CR Field 0, when `Rc=1`, may be set even if `RT=0`. The reason is that
overflow may occur: `VL`, if set either from an immediate or from `CTR`,
may not exceed `MAXVL`, and if it does, `CR0.SO` must be set.

Additionally, in reality it is **`VL`** being set. Therefore, rather
than `CR0` testing `RT` when `Rc=1`, CR0.EQ is set if `VL=0`, CR0.GT
is set if `VL` is non-zero.
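
As a compact restatement, the `Rc=1` result can be sketched as a function of the final VL and the overflow flag. The function name and the dictionary representation are hypothetical, and the sketch assumes that the "non-zero" case maps onto the standard GT bit of a CR Field (whose bits are LT, GT, EQ and SO), with LT never set because VL is unsigned.

```python
def setvl_cr0(vl, overflow):
    """Return CR0 as a dict of flags, derived from VL rather than from RT."""
    return {
        "LT": False,        # assumption: never set, VL is unsigned
        "GT": vl != 0,      # VL is non-zero
        "EQ": vl == 0,      # VL is zero
        "SO": overflow,     # requested VL exceeded the permitted maximum
    }
```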

**SUBVL**

Sub-vector elements are not to be considered "Vertical". The vec2/3/4
is to be considered as if it were the "single element".  Caveats exist for
[[sv/mv.swizzle]] and [[sv/mv.vec]] when Pack/Unpack is enabled,
due to the order in which VL and SUBVL loops are applied being
swapped (outer-inner becomes inner-outer).

# Examples

## Core concept loop

```
loop:
    setvl a3, a0, MVL=8    #  update a3 with vl
                           # (# of elements this iteration)
                           # set MVL to 8
    # do vector operations at up to 8 length (MVL=8)
    # ...
    sub a0, a0, a3   # Decrement count by vl
    bnez a0, loop    # Any more?
```
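
The same strip-mining pattern can be traced in Python to see which VL each pass of the loop receives. This is a behavioural sketch only: `strip_mine` is a hypothetical name, and `min(remaining, mvl)` stands in for what `setvl` computes.

```python
def strip_mine(total, mvl):
    """Return the VL used on each pass of the loop, VL = MIN(remaining, MVL)."""
    chunks = []
    remaining = total
    while remaining != 0:
        vl = min(remaining, mvl)   # what setvl computes
        chunks.append(vl)          # vector operations on vl elements go here
        remaining -= vl            # decrement count by vl
    return chunks
```

For 20 elements with `mvl=8` the passes use VL values of 8, 8 and 4, so the final partial pass needs no separate clean-up code.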

## Loop using Rc=1

    my_fn:
      li r3, 1000
      b test
    loop:
      sub r3, r3, r4
      ...
    test:
      setvli. r4, r3, MVL=64
      bne cr0, loop
    end:
      blr

## Load/Store-Multi (selective)

Up to 64 FPRs will be loaded, here.  `r3` has one bit set
for each FP register required to be loaded.  The block of memory
from which the registers are loaded is contiguous (no gaps):
any FP register which has a corresponding zero bit in `r3`
is *unaltered*.  In essence this is a selective LD-multi with
"Scatter" capability.

    setvli r0, MVL=64, VL=64
    sv.fld/dm=r3 *r0, 0(r30) # selective load 64 FP registers

Up to 64 FPRs will be saved, here.  Again, `r3` has one bit set
for each FP register required to be saved.

    setvli r0, MVL=64, VL=64
    sv.stfd/sm=r3 *fp0, 0(r30) # selective store 64 FP registers

-------------

\newpage{}

# SVSTATE SPR

The format of the SVSTATE SPR is as follows:

| Field | Name     | Description           |
| ----- | -------- | --------------------- |
| 0:6   | maxvl    | Max Vector Length     |
| 7:13  |    vl    | Vector Length         |
| 14:20 | srcstep  | for srcstep = 0..VL-1 |
| 21:27 | dststep  | for dststep = 0..VL-1 |
| 28:29 | dsubstep | for substep = 0..SUBVL-1  |
| 30:31 | ssubstep | for substep = 0..SUBVL-1  |
| 32:33 | mi0      | REMAP RA/FRA/BFA SVSHAPE0-3    |
| 34:35 | mi1      | REMAP RB/FRB/BFB SVSHAPE0-3    |
| 36:37 | mi2      | REMAP RC/FRT SVSHAPE0-3    |
| 38:39 | mo0      | REMAP RT/FRT/BF SVSHAPE0-3    |
| 40:41 | mo1      | REMAP EA/RS/FRS SVSHAPE0-3    |
| 42:46 | SVme     | REMAP enable (RA-RT)  |
| 47:52 | rsvd     | reserved              |
| 53    | pack     | PACK (srcstep reorder)  |
| 54    | unpack   | UNPACK (dststep order)  |
| 55:61 | hphint   | Horizontal Hint       |
| 62    | RMpst    | REMAP persistence     |
| 63    | vfirst   | Vertical First mode   |

Notes:

* The entries are truncated to be within range.  Attempts to set VL to
  greater than MAXVL will truncate VL.
* Setting srcstep, dststep to 64 or greater, or VL or MVL to greater
  than 64 is reserved and will cause an illegal instruction trap.
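
The field layout in the table can be expressed as a small Python helper for packing and unpacking an SVSTATE value. This is a sketch under assumptions: SVSTATE is treated as a 64-bit integer using MSB0 bit numbering (bit 0 is the most significant), the reserved bits 47:52 are left out, and the names `SVSTATE_FIELDS`, `get_field` and `set_field` are hypothetical.

```python
# (start, end) bit positions from the table, MSB0 numbering in a 64-bit SPR
SVSTATE_FIELDS = {
    "maxvl": (0, 6), "vl": (7, 13), "srcstep": (14, 20), "dststep": (21, 27),
    "dsubstep": (28, 29), "ssubstep": (30, 31),
    "mi0": (32, 33), "mi1": (34, 35), "mi2": (36, 37),
    "mo0": (38, 39), "mo1": (40, 41), "SVme": (42, 46),
    "pack": (53, 53), "unpack": (54, 54), "hphint": (55, 61),
    "RMpst": (62, 62), "vfirst": (63, 63),
}

def get_field(svstate, name):
    """Read one named field out of a 64-bit SVSTATE value."""
    start, end = SVSTATE_FIELDS[name]
    width = end - start + 1
    return (svstate >> (63 - end)) & ((1 << width) - 1)

def set_field(svstate, name, value):
    """Return a copy of svstate with one named field replaced (value truncated)."""
    start, end = SVSTATE_FIELDS[name]
    width = end - start + 1
    mask = ((1 << width) - 1) << (63 - end)
    return (svstate & ~mask) | ((value << (63 - end)) & mask)
```

Such a helper is the kind of thing a context save/restore routine or a simulator might use when reading SVSTATE via `mfspr`.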

**SVSTATE Fields**

SVSTATE is a standard SPR that (if REMAP is not activated) contains sufficient
self-contained information for a full context save/restore.
SVSTATE contains (and permits setting of):

* MVL (the Maximum Vector Length) - declares (statically) how
  much of a regfile is to be reserved for Vector elements
* VL - Vector Length
* dststep - the destination element offset of the current parallel
  instruction being executed
* srcstep - for twin-predication, the source element offset as well.
* ssubstep - the source subvector element offset of the current
  parallel instruction being executed
* dsubstep - the destination subvector element offset of the current
  parallel instruction being executed
* vfirst - Vertical First mode.  srcstep, dststep and substep
  **do not advance** unless explicitly requested to do so with
  the `svstep` instruction
* RMpst - REMAP persistence.  REMAP will apply only to the following
  instruction unless this bit is set, in which case REMAP "persists".
  Reset (cleared) on use of the `setvl` instruction if used to
  alter VL or MVL.
* Pack - if set then srcstep/substep VL/SUBVL loop-ordering is inverted.
* UnPack - if set then dststep/substep VL/SUBVL loop-ordering is inverted.
* hphint - Horizontal Parallelism Hint. Indicates that
  no Hazards exist between groups of elements in sequential multiples of this number
  (before REMAP).  By definition: elements for which `FLOOR(srcstep/hphint)` is
  equal *before REMAP* are in the same parallelism "group". In Vertical First Mode
  hardware **MUST ONLY** process elements in the same group, and must stop
  Horizontal Issue at the last element of a given group. Set to zero to indicate "no hint".
* SVme - REMAP enable bits, indicating which register is to be
  REMAPed: RA, RB, RC, RT and EA are the canonical (typical) register names
  associated with each bit, with RA being the LSB and EA being the MSB.
  See table below for ordering. When `SVme` is zero (0b00000) REMAP
  is **fully disabled and inactive** regardless of the contents of
  `SVSTATE`, `mi0-mi2/mo0-mo1`, or the four `SVSHAPEn` SPRs
* mi0-mi2/mo0-mo1 - these indicate the SVSHAPE (0-3) that the
  corresponding register (RA etc) should use, as long as the
  register's corresponding SVme bit is set
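
The `hphint` grouping rule from the list above, `FLOOR(srcstep/hphint)`, can be illustrated with a short Python sketch. The function name is hypothetical, REMAP is ignored (the rule applies before REMAP), and an empty result is used here purely as a convention for the "no hint" case.

```python
def parallel_groups(vl, hphint):
    """Group element indices 0..vl-1 by FLOOR(srcstep / hphint)."""
    if hphint == 0:              # zero means "no hint": no grouping is declared
        return []
    groups = {}
    for srcstep in range(vl):
        groups.setdefault(srcstep // hphint, []).append(srcstep)
    return list(groups.values())
```

With `vl=7` and `hphint=3` the groups are elements 0 to 2, elements 3 to 5, and element 6 alone.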

Programmer's Note: the fact that REMAP is entirely dormant when `SVme` is zero
allows establishment of REMAP context well in advance, followed by utilising `svremap`
at a precise (or the very last) moment.  Some implementations may exploit this
to cache (or take some time to prepare caches) in the background whilst other
(unrelated) instructions are being executed. This is particularly important to
bear in mind when using `svindex` which will require hardware to perform (and
cache) additional GPR reads.

Programmer's Note: when REMAP is activated it becomes necessary on any
context-switch (Interrupt or Function call) to detect (or know in advance)
that REMAP is enabled and to additionally save/restore the four SVSHAPE
SPRs, SVSHAPE0-3.  Given that this is expected to be a rare occurrence it was
deemed unreasonable to burden every context-switch or function call with
mandatory save/restore of SVSHAPEs, and consequently it is a *callee*
(and Trap Handler) responsibility.  Callees (and Trap Handlers) **MUST**
avoid using any and all SVP64 instructions during the period where state
could be adversely affected.  SVP64 purely relies on Scalar instructions,
so Scalar instructions (except the SVP64 Management ones and mtspr and
mfspr) are 100% guaranteed to have zero impact on SVP64 state.

**Max Vector Length (maxvl)** <a name="mvl" />

MAXVECTORLENGTH is the same concept as MVL in RISC-V RVV, except that it
is variable length and may be dynamically set (normally from an immediate
field only).  MVL is limited to 7 bits
(in the first version of SVP64) and consequently the maximum number of
elements is limited to between 0 and 127.

Programmer's Note: Except by directly using `mtspr` on SVSTATE, which may
result in performance penalties on some hardware implementations, SVSTATE's `maxvl`
field may only be set **statically** as an immediate, by the `setvl` instruction.
It may **NOT** be set dynamically from a register.  Compiler writers and assembly
programmers are expected to perform static register file analysis, subdivision,
and allocation and only utilise `setvl`. Direct writing to SVSTATE in order to
"bypass" this Note could, in less-advanced implementations, potentially cause stalling,
particularly if SVP64 instructions are issued directly after the `mtspr` to SVSTATE.

**Vector Length (vl)** <a name="vl" />

The actual Vector length (the number of elements in a "Vector"), `SVSTATE.vl`, may be set
entirely dynamically at runtime from a number of sources. `setvl` is the primary
instruction for setting Vector Length.
`setvl` is conceptually similar to, but different from, the Cray, SX Aurora, and RISC-V RVV
equivalents. Similar to RVV, VL is set to be within
the range 0 <= VL <= MVL. Unlike RVV, VL is set **exactly** according to the following:

    VL = (RT|0) = MIN(vlen, MVL)

where 0 <= MVL <= 127 and vlen may come from an immediate, `RA`, or from the `CTR` SPR,
depending on options selected with the `setvl` instruction.

Programmer's Note: conceptual understanding of Cray-style Vectors is far beyond the scope
of the Power ISA Technical Reference.  Guidance on the 50-year-old Cray Vector paradigm is
best sought elsewhere: good studies include Academic Courses given on the 1970s
Cray Supercomputers over at least the past three decades.

**SUBVL - Sub Vector Length**

This is a "group by quantity" that effectively asks each iteration
of the hardware loop to load SUBVL elements of width elwidth at a
time. Effectively, SUBVL is like a SIMD multiplier: instead of just 1
operation issued, SUBVL operations are issued.

The main effect of SUBVL is that predication bits are applied per
**group**, rather than by individual element.  Legal values are 0 to 3,
representing 1 operation (1 element) thru 4 operations (4 elements) respectively.
Elements are best thought of in the context of 3D, Audio and Video: two Left and Right
Channel "elements" or four ARGB "elements", or three XYZ coordinate "elements".

`subvl` is again primarily set by the `setvl` instruction. Not to be confused
with `hphint`.

Directly related to `subvl` are the `pack` and `unpack` Mode bits of `SVSTATE`.
See `svstep` instruction for how to set Pack and Unpack Modes.

**Horizontal Parallelism**

A problem exists for hardware where it may not be able to detect
that a programmer (or compiler) knows of opportunities for parallelism
and lack of overlap between loops.

For hphint, the number chosen must be consistently
executed **every time**. Hardware is not permitted to execute five
computations for one instruction then three on the next.
hphint is a hint from the compiler to hardware that exactly this
many elements may be safely executed in parallel, without hazards
(including Memory accesses).
Interestingly, when hphint is set equal to VL, it is in effect
as if Vertical First mode were not set, because the hardware is
given the option to run through all elements in an instruction.
This is exactly what Horizontal-First is: a for-loop from 0 to VL-1
except that the hardware may *choose* the number of elements.

*Note to programmers: changing VL during the middle of such modes
should be done only with due care and respect for the fact that SVSTATE
has exactly the same peer-level status as a Program Counter.*

-------------

\newpage{}

# SVL-Form

Add the following to Book I, 1.6.1, SVL-Form

```
    |0     |6    |11    |16   |23 |24 |25 |26    |31 |
    | PO   |  RT |   RA | SVi |ms |vs |vf |   XO |Rc |
    | PO   |  RT | /    | SVi |/  |/  |vf |   XO |Rc |
```

* Add `SVL` to `RA (11:15)` Field in Book I, 1.6.2
* Add `SVL` to `RT (6:10)` Field in Book I, 1.6.2
* Add `SVL` to `Rc (31)` Field in Book I, 1.6.2
* Add `SVL` to `XO (26:30)` Field in Book I, 1.6.2

Add the following to Book I, 1.6.2

```
    ms (23)
        Field used in Simple-V to specify whether MVL (maxvl in the SVSTATE SPR)
        is to be set
        Formats: SVL
    vf (25)
        Field used in Simple-V to specify whether "Vertical" Mode is set
        (vfirst in the SVSTATE SPR)
        Formats: SVL
    vs (24)
        Field used in Simple-V to specify whether VL (vl in the SVSTATE SPR) is to be set
        Formats: SVL
    SVi (16:22)
        Simple-V immediate field used by setvl for setting VL or MVL
        (vl, maxvl in the SVSTATE SPR)
        and used as a "Mode of Operation" selector in svstep
        Formats: SVL
```
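
For tool writers, the field positions above translate directly into a decoder. The following Python sketch assumes a 32-bit instruction word with MSB0 bit numbering; the helper names `bits` and `decode_svl` are hypothetical, and no PO or XO opcode values are assumed since none are given in this section.

```python
def bits(insn, start, end):
    """Bits start..end (inclusive, MSB0 numbering) of a 32-bit instruction word."""
    return (insn >> (31 - end)) & ((1 << (end - start + 1)) - 1)

def decode_svl(insn):
    """Split a 32-bit word into the SVL-Form fields used by setvl."""
    return {
        "PO": bits(insn, 0, 5),
        "RT": bits(insn, 6, 10),
        "RA": bits(insn, 11, 15),
        "SVi": bits(insn, 16, 22),
        "ms": bits(insn, 23, 23),
        "vs": bits(insn, 24, 24),
        "vf": bits(insn, 25, 25),
        "XO": bits(insn, 26, 30),
        "Rc": bits(insn, 31, 31),
    }
```

For `svstep` the same positions apply, with the RA, ms and vs positions being unused (shown as `/` in the form).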

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| SVL  | I    | #    | 3.0B    | svstep   | Vertical-First Stepping and status reporting |
| SVL  | I    | #    | 3.0B    | setvl    | Cray-like establishment of Looping (Vector) context |

[[!tag opf_rfc]]