1 # RFC ls008 SVP64 Management instructions
7 * <https://libre-soc.org/openpower/sv/>
8 * <https://libre-soc.org/openpower/sv/rfc/ls008/>
9 * <https://bugs.libre-soc.org/show_bug.cgi?id=1040>
10 * <https://git.openpower.foundation/isa/PowerISA/issues/87>
22 **Books and Sections affected**:
25 Book I, new Scalar Chapter. (Or, new Book on "Zero-Overhead Loop Subsystem")
26 Appendix E Power ISA sorted by opcode
27 Appendix F Power ISA sorted by version
28 Appendix G Power ISA sorted by Compliancy Subset
29 Appendix H Power ISA sorted by mnemonic
36 setvl - Cray-style "Set Vector Length" instruction
37 svstep - Vertical-First Mode explicit Step and Status
38 svremap - Re-Mapping of Register Element Offsets
39 svindex - General-purpose setting of SHAPEs to be re-mapped
40 svshape - Hardware-level setting of SHAPEs for element re-mapping
41 svshape2 - Hardware-level setting of SHAPEs for element re-mapping (v2)
44 **Submitter**: Luke Leighton (Libre-SOC)
46 **Requester**: Libre-SOC
48 **Impact on processor**:
51 Addition of six new "Zero-Overhead-Loop-Control" DSP-style Vector-style
52 Management Instructions which can be implemented extremely efficiently
53 and effectively by inserting an additional phase between Decode and Issue.
54 More complex designs are NOT adversely impacted and in fact greatly benefit
55 whilst still retaining an obvious linear sequential execution programming model.
58 **Impact on software**:
61 Requires support for new instructions in assembler, debuggers,
68 Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control,
69 Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model
76 **Notes and Observations**:
82 Add the following entries to:
84 * the Appendices of Book I
85 * Instructions of Book I as a new Section
86 * SVL-Form of Book I Section 1.6.1.6 and 1.6.2
92 # svstep: Vertical-First Stepping and status reporting
96 * svstep RT,SVi,vf (Rc=0)
97 * svstep. RT,SVi,vf (Rc=1)
99 | 0-5|6-10|11-15|16-22| 23-25 | 26-30 |31| Form |
100 |----|----|-----|------|----------|-------|--|--------- |
101 |PO | RT | / | SVi | / / vf | XO |Rc| SVL-Form |
106 if SVi[3:4] = 0b11 then
107 # store pack and unpack in SVSTATE
108 SVSTATE[53] <- SVi[5]
109 SVSTATE[54] <- SVi[6]
110 RT <- [0]*62 || SVSTATE[53:54]
112 # Vertical-First explicit stepping.
113 step <- SVSTATE_NEXT(SVi, vf)
117 Special Registers Altered:
130 | 0-5|6-10|11-15|16-22 | 23 24 25 | 26-30 |31| FORM |
131 | -- | -- | --- | ---- |----------| ----- |--|----------|
132 |PO | RT | RA | SVi | ms vs vf | XO |Rc| SVL-Form |
134 * setvl RT,RA,SVi,vf,vs,ms (Rc=0)
135 * setvl. RT,RA,SVi,vf,vs,ms (Rc=1)
143 if ms = 1 then MVL <- VLimm[0:6]
144 else MVL <- SVSTATE[0:6]
146 if vs = 0 then VL <- SVSTATE[7:13]
147 else if _RA != 0 then
148 if (RA) >u 0b1111111 then
151 else VL <- (RA)[57:63]
152 else if _RT = 0 then VL <- VLimm[0:6]
153 else if CTR >u 0b1111111 then
156 else VL <- CTR[57:63]
157 # limit VL to within MVL
164 GPR(_RT) <- [0]*57 || VL
165 if ((¬vs) & ¬(ms)) = 0 then
166 # set requested Vertical-First mode, clear persist
171 Special Registers Altered:
183 The format of the SVSTATE SPR is as follows:
185 | Field | Name | Description |
186 | ----- | -------- | --------------------- |
187 | 0:6 | maxvl | Max Vector Length |
188 | 7:13 | vl | Vector Length |
189 | 14:20 | srcstep | for srcstep = 0..VL-1 |
190 | 21:27 | dststep | for dststep = 0..VL-1 |
191 | 28:29 | dsubstep | for substep = 0..SUBVL-1 |
192 | 30:31 | ssubstep | for substep = 0..SUBVL-1 |
193 | 32:33 | mi0 | REMAP RA/FRA/BFA SVSHAPE0-3 |
194 | 34:35 | mi1 | REMAP RB/FRB/BFB SVSHAPE0-3 |
195 | 36:37 | mi2 | REMAP RC/FRT SVSHAPE0-3 |
196 | 38:39 | mo0 | REMAP RT/FRT/BF SVSHAPE0-3 |
197 | 40:41 | mo1 | REMAP EA/RS/FRS SVSHAPE0-3 |
198 | 42:46 | SVme | REMAP enable (RA-RT) |
199 | 47:52 | rsvd | reserved |
200 | 53 | pack | PACK (srcstep reorder) |
201 | 54 | unpack | UNPACK (dststep order) |
202 | 55:61 | hphint | Horizontal Hint |
203 | 62 | RMpst | REMAP persistence |
204 | 63 | vfirst | Vertical First mode |
208 * The entries are truncated to be within range. Attempts to set VL to
209 greater than MAXVL will truncate VL.
210 * Setting srcstep, dststep to 64 or greater, or VL or MVL to greater
211 than 64 is reserved and will cause an illegal instruction trap.
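As an illustration of the bit numbering used in the SVSTATE table, the following Python sketch reads and writes SVSTATE fields. It assumes Power ISA MSB0 numbering (bit 0 is the most significant bit of the 64-bit SPR). The helper names `get_field`, `set_field` and `decode_svstate` are hypothetical, and the reserved bits 47:52 are simply omitted.

```python
# Sketch of reading/writing SVSTATE fields, using the bit ranges from the
# SVSTATE table and MSB0 numbering (bit 0 = most significant of 64 bits).
# Helper names are hypothetical; reserved bits 47:52 are not listed.
SVSTATE_FIELDS = {
    "maxvl": (0, 6), "vl": (7, 13), "srcstep": (14, 20), "dststep": (21, 27),
    "dsubstep": (28, 29), "ssubstep": (30, 31),
    "mi0": (32, 33), "mi1": (34, 35), "mi2": (36, 37),
    "mo0": (38, 39), "mo1": (40, 41), "SVme": (42, 46),
    "pack": (53, 53), "unpack": (54, 54), "hphint": (55, 61),
    "RMpst": (62, 62), "vfirst": (63, 63),
}


def get_field(svstate: int, first: int, last: int) -> int:
    """Extract MSB0 bits first..last (inclusive) of a 64-bit value."""
    width = last - first + 1
    return (svstate >> (63 - last)) & ((1 << width) - 1)


def set_field(svstate: int, first: int, last: int, value: int) -> int:
    """Return svstate with MSB0 bits first..last replaced by value."""
    width = last - first + 1
    shift = 63 - last                      # LSB0 position of the field's LSB
    mask = ((1 << width) - 1) << shift
    return (svstate & ~mask) | ((value << shift) & mask)


def decode_svstate(svstate: int) -> dict:
    """Split a 64-bit SVSTATE value into its named fields."""
    return {name: get_field(svstate, f, l)
            for name, (f, l) in SVSTATE_FIELDS.items()}


# Usage: MVL=8, VL=5, Vertical-First mode enabled.
s = set_field(0, *SVSTATE_FIELDS["maxvl"], 8)
s = set_field(s, *SVSTATE_FIELDS["vl"], 5)
s = set_field(s, *SVSTATE_FIELDS["vfirst"], 1)
fields = decode_svstate(s)
print(fields["maxvl"], fields["vl"], fields["vfirst"])  # 8 5 1
```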
215 SVSTATE is a standard SPR that (if REMAP is not activated) contains sufficient
216 self-contained information for a full context save/restore.
217 SVSTATE contains (and permits setting of):
219 * MVL (the Maximum Vector Length) - declares (statically) how
220 much of a regfile is to be reserved for Vector elements
222 * dststep - the destination element offset of the current parallel
223 instruction being executed
224 * srcstep - for twin-predication, the source element offset as well.
225 * ssubstep - the source subvector element offset of the current
226 parallel instruction being executed
227 * dsubstep - the destination subvector element offset of the current
228 parallel instruction being executed
229 * vfirst - Vertical First mode. srcstep, dststep, ssubstep and dsubstep
230 **do not advance** unless explicitly requested to do so with
231 the `svstep` instruction
232 * RMpst - REMAP persistence. REMAP will apply only to the following
233 instruction unless this bit is set, in which case REMAP "persists".
234 Reset (cleared) on use of the `setvl` instruction if used to
236 * Pack - if set then srcstep/substep VL/SUBVL loop-ordering is inverted.
237 * UnPack - if set then dststep/substep VL/SUBVL loop-ordering is inverted.
238 * hphint - Horizontal Parallelism Hint. Indicates that
239 no Hazards exist between groups of elements in sequential multiples of this number
240 (before REMAP). By definition: elements for which `FLOOR(srcstep/hphint)` is
241 equal *before REMAP* are in the same parallelism "group". In Vertical First Mode
242 hardware **MUST ONLY** process elements in the same group, and must stop
243 Horizontal Issue at the last element of a given group. Set to zero to indicate "no hint".
244 * SVme - REMAP enable bits, indicating which register is to be
245 REMAPed: RA, RB, RC, RT and EA are the canonical (typical) register names
246 associated with each bit, with RA being the LSB and EA being the MSB.
247 See table below for ordering. When `SVme` is zero (0b00000) REMAP
248 is **fully disabled and inactive** regardless of the contents of
249 `SVSTATE`, `mi0-mi2/mo0-mo1`, or the four `SVSHAPEn` SPRs.
250 * mi0-mi2/mo0-mo1 - these 2-bit fields indicate which SVSHAPE (0-3)
251 the corresponding register (RA etc.) should use, and take effect only
252 when that register's SVme bit is set.
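The relationship between `SVme` and the `mi0-mi2/mo0-mo1` selectors can be sketched as follows. This is a hypothetical helper, not part of any specification: it assumes bit k of `SVme` (counting from the LSB, RA first) pairs with the k-th selector in the order mi0, mi1, mi2, mo0, mo1, matching the register names listed in the SVSTATE table.

```python
# Hypothetical illustration: bit k of SVme (k=0 is the LSB) enables REMAP for
# the k-th canonical register, and the matching mi/mo field names its SVSHAPE.
# The pairing below is an assumption based on the SVSTATE table.
CANONICAL = (("RA", "mi0"), ("RB", "mi1"), ("RC", "mi2"),
             ("RT", "mo0"), ("EA", "mo1"))   # LSB ... MSB of SVme


def remap_shapes(svme: int, selectors: dict) -> dict:
    """Return {register: SVSHAPE index} for every register with REMAP enabled.

    svme      -- the 5-bit SVme field
    selectors -- the 2-bit mi0..mi2 / mo0..mo1 fields, keyed by name
    """
    shapes = {}
    for bit, (reg, field) in enumerate(CANONICAL):
        if (svme >> bit) & 1:
            shapes[reg] = selectors[field] & 0b11   # SVSHAPE0-3
    return shapes


sel = {"mi0": 2, "mi1": 1, "mi2": 0, "mo0": 3, "mo1": 0}
print(remap_shapes(0b01001, sel))  # {'RA': 2, 'RT': 3}
print(remap_shapes(0b00000, sel))  # {} : REMAP fully disabled
```

Returning an empty mapping when `SVme` is zero mirrors the rule that REMAP is fully dormant in that case, whatever the selector fields contain.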
254 Programmer's Note: the fact that REMAP is entirely dormant when `SVme` is zero
255 allows establishment of REMAP context well in advance, followed by utilising `svremap`
256 at a precise (or the very last) moment. Some implementations may exploit this
257 to cache (or take some time to prepare caches) in the background whilst other
258 (unrelated) instructions are being executed. This is particularly important to
259 bear in mind when using `svindex` which will require hardware to perform (and
260 cache) additional GPR reads.
262 Programmer's Note: when REMAP is activated it becomes necessary on any
263 context-switch (Interrupt or Function call) to detect (or know in advance)
264 that REMAP is enabled and to additionally save/restore the four SVSHAPE
265 SPRs, SVSHAPE0-3. Given that this is expected to be a rare occurrence, it was
266 deemed unreasonable to burden every context-switch or function call with
267 mandatory save/restore of SVSHAPEs, and consequently it is a *callee*
268 (and Trap Handler) responsibility. Callees (and Trap Handlers) **MUST**
269 avoid using all and any SVP64 instructions during the period where state
270 could be adversely affected. SVP64 purely relies on Scalar instructions,
271 so Scalar instructions (except the SVP64 Management ones and mtspr and
272 mfspr) are 100% guaranteed to have zero impact on SVP64 state.
274 **Max Vector Length (maxvl)** <a name="mvl" />
276 MAXVECTORLENGTH is the same concept as MVL in RISC-V RVV, except that it
277 is variable length and may be dynamically set (normally from an immediate
278 field only). MVL is limited to 7 bits
279 (in the first version of SVP64) and consequently the maximum number of
280 elements is limited to between 0 and 127.
282 Programmer's Note: Except by directly using `mtspr` on SVSTATE, which may
283 result in performance penalties on some hardware implementations, SVSTATE's `maxvl`
284 field may only be set **statically** as an immediate, by the `setvl` instruction.
285 It may **NOT** be set dynamically from a register. Compiler writers and assembly
286 programmers are expected to perform static register file analysis, subdivision,
287 and allocation and only utilise `setvl`. Direct writing to SVSTATE in order to
288 "bypass" this Note could, in less-advanced implementations, potentially cause stalling,
289 particularly if SVP64 instructions are issued directly after the `mtspr` to SVSTATE.
291 **Vector Length (vl)** <a name="vl" />
293 `SVSTATE.vl`, the actual Vector Length (the number of elements in a "Vector"), may be set
294 entirely dynamically at runtime from a number of sources. `setvl` is the primary
295 instruction for setting Vector Length.
296 `setvl` is conceptually similar to, but different from, the Cray, SX Aurora, and RISC-V RVV
297 equivalents. Similar to RVV, VL is set to be within
298 the range 0 <= VL <= MVL. Unlike RVV, VL is set **exactly** according to the following:
300 VL = (RT|0) = MIN(vlen, MVL)
302 where 0 <= MVL <= 127 and vlen may come from an immediate, `RA`, or from the `CTR` SPR,
303 depending on options selected with the `setvl` instruction.
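The VL and MVL selection performed by `setvl` can be sketched in Python. This is a non-authoritative model of the `setvl` pseudocode: the function and parameter names are hypothetical, `vlimm` is taken as already decoded, and saturation of `(RA)` and `CTR` at 127 is an assumption inferred from the `>u 0b1111111` comparisons in the pseudocode.

```python
LIMIT = 0b1111111  # VL and MVL are 7-bit quantities (0..127)


def setvl_lengths(ms, vs, ra_num, rt_num, ra_val, ctr, vlimm,
                  svstate_mvl, svstate_vl):
    """Return (MVL, VL) as computed by setvl (hypothetical model).

    ra_num, rt_num -- the register numbers _RA and _RT (0 has special meaning)
    ra_val         -- the unsigned contents of GPR(RA)
    vlimm          -- VLimm, taken here as already decoded
    """
    mvl = (vlimm & LIMIT) if ms else svstate_mvl
    if not vs:
        vl = svstate_vl              # VL not being set: keep current value
    elif ra_num != 0:
        vl = min(ra_val, LIMIT)      # assumed saturation of (RA) at 127
    elif rt_num == 0:
        vl = vlimm & LIMIT           # RA=0 and RT=0: VL from immediate
    else:
        vl = min(ctr, LIMIT)         # RA=0, RT!=0: VL from CTR, assumed saturating
    return mvl, min(vl, mvl)         # VL = MIN(vlen, MVL)


print(setvl_lengths(ms=1, vs=1, ra_num=3, rt_num=5, ra_val=200,
                    ctr=0, vlimm=16, svstate_mvl=0, svstate_vl=0))  # (16, 16)
```

The final `min` is the step that makes VL exact and deterministic: however large the requested length, the result is always the smaller of that length and MVL.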
305 Programmer's Note: conceptual understanding of Cray-style Vectors is far beyond the scope
306 of the Power ISA Technical Reference. Guidance on the 50-year-old Cray Vector paradigm is
307 best sought elsewhere: good studies include Academic Courses given on the 1970s
308 Cray Supercomputers over at least the past three decades.
310 **SUBVL - Sub Vector Length**
312 This is a "group by quantity" that effectively asks each iteration
313 of the hardware loop to load SUBVL elements of width elwidth at a
314 time. Effectively, SUBVL is like a SIMD multiplier: instead of just 1
315 operation issued, SUBVL operations are issued.
317 The main effect of SUBVL is that predication bits are applied per
318 **group**, rather than by individual element. Legal values are 0 to 3,
319 representing 1 operation (1 element) through 4 operations (4 elements) respectively.
320 Elements are best thought of in the context of 3D, Audio and Video: two Left and Right
321 Channel "elements" or four ARGB "elements", or three XYZ coordinate "elements".
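A minimal sketch of this "group by quantity" behaviour, in which one predicate bit enables or disables a whole group of SUBVL elements. The function name is hypothetical, and it assumes that group `i` occupies the contiguous elements `i*SUBVL` to `i*SUBVL+SUBVL-1`.

```python
def subvl_active_elements(vl, subvl_field, predicate):
    """Element indices processed when one predicate bit covers a whole group.

    subvl_field -- the 2-bit encoding 0..3, meaning SUBVL = 1..4
    predicate   -- integer mask; bit i (LSB first) governs group i
    Assumes group i occupies elements i*SUBVL .. i*SUBVL+SUBVL-1.
    """
    subvl = subvl_field + 1
    active = []
    for step in range(vl):
        if not (predicate >> step) & 1:
            continue                      # entire group masked out
        active.extend(step * subvl + sub for sub in range(subvl))
    return active


# VL=3 groups of XYZ (SUBVL=3, encoded as 2); the middle group is masked out.
print(subvl_active_elements(3, 2, 0b101))  # [0, 1, 2, 6, 7, 8]
```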
323 `subvl` is again primarily set by the `setvl` instruction. Not to be confused
326 Directly related to `subvl` are the `pack` and `unpack` Mode bits of `SVSTATE`.
327 See `svstep` instruction for how to set Pack and Unpack Modes.
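The pack/unpack branch of the `svstep` pseudocode, together with the loop-ordering inversion that Pack and UnPack select, can be modelled as below. Both functions are hypothetical illustrations. In particular, reading "inverted" as a swap of the VL and SUBVL loop nesting is an interpretation, not a quotation of the specification.

```python
def svstep_pack_unpack(svstate, svi):
    """Pack/unpack branch of svstep (taken when SVi[3:4] = 0b11).

    svi is the 7-bit SVi field in MSB0 order: SVi[0] is its most
    significant bit. Returns (new_svstate, RT).
    """
    if ((svi >> 2) & 0b11) != 0b11:          # SVi[3:4]
        raise ValueError("not the pack/unpack form of svstep")
    pack = (svi >> 1) & 1                    # SVi[5]
    unpack = svi & 1                         # SVi[6]
    # SVSTATE bits 53 and 54 (MSB0) are LSB0 bits 10 and 9 of 64.
    svstate &= ~((1 << 10) | (1 << 9))
    svstate |= (pack << 10) | (unpack << 9)
    rt = (pack << 1) | unpack                # RT <- [0]*62 || SVSTATE[53:54]
    return svstate, rt


def loop_order(vl, subvl, inverted):
    """Order in which (step, substep) pairs are visited.

    Normal: outer loop over VL, inner loop over SUBVL. Assumed meaning of
    Pack (source side) or UnPack (destination side): the nesting is swapped.
    """
    if inverted:
        return [(s, j) for j in range(subvl) for s in range(vl)]
    return [(s, j) for s in range(vl) for j in range(subvl)]


print(svstep_pack_unpack(0, 0b0001110))  # (1024, 2): pack=1, unpack=0
print(loop_order(2, 2, False))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(loop_order(2, 2, True))   # [(0, 0), (1, 0), (0, 1), (1, 1)]
```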
330 **Horizontal Parallelism**
332 A problem exists for hardware where it may not be able to detect
333 that a programmer (or compiler) knows of opportunities for parallelism
334 and lack of overlap between loops.
336 For hphint, the number chosen must be consistently
337 executed **every time**. Hardware is not permitted to execute five
338 computations for one instruction then three on the next.
339 hphint is a hint from the compiler to hardware that exactly this
340 many elements may be safely executed in parallel, without hazards
341 (including Memory accesses).
342 Interestingly, when hphint is set equal to VL, it is in effect
343 as if Vertical First mode were not set, because the hardware is
344 given the option to run through all elements in an instruction.
345 This is exactly what Horizontal-First is: a for-loop from 0 to VL-1
346 except that the hardware may *choose* the number of elements.
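Grouping by `FLOOR(srcstep/hphint)` can be illustrated with a short Python sketch (the function name is hypothetical). It shows only which elements fall into the same parallelism group before REMAP. It does not model hazards or memory ordering.

```python
from itertools import groupby


def hphint_groups(vl, hphint):
    """Partition srcstep 0..VL-1 into groups where FLOOR(srcstep/hphint) is equal."""
    if hphint == 0:
        # hphint=0 means "no hint": this sketch declines to imply any grouping.
        raise ValueError("hphint=0 means 'no hint': no grouping is implied")
    return [list(g) for _, g in groupby(range(vl), key=lambda s: s // hphint)]


print(hphint_groups(7, 3))  # [[0, 1, 2], [3, 4, 5], [6]]
print(hphint_groups(4, 4))  # [[0, 1, 2, 3]]  (hphint == VL: one group)
```

The second call reflects the observation above: with hphint equal to VL, every element is in a single group, which behaves like Horizontal-First execution.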
348 *Note to programmers: changing VL in the middle of such modes
349 should be done only with due care and respect for the fact that SVSTATE
350 has exactly the same peer-level status as a Program Counter.*
358 Add the following to Book I, 1.6.1, SVL-Form
361 |0 |6 |11 |16 |23 |24 |25 |26 |31 |
362 | PO | RT | RA | SVi |ms |vs |vf | XO |Rc |
363 | PO | RT | / | SVi |/ |/ |vf | XO |Rc |
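To make the SVL-Form bit ranges concrete, this sketch packs the fields into a 32-bit instruction word using MSB0 numbering. `encode_svl` is a hypothetical helper, and the PO and XO values in the usage line are placeholders, since the actual opcode assignments are not given here. For the `svstep` variant, the `/` positions (RA, ms, vs) are simply passed as zero.

```python
def encode_svl(po, rt, ra, svi, ms, vs, vf, xo, rc):
    """Assemble a 32-bit SVL-Form word from its fields (MSB0 bit ranges)."""
    layout = (                      # (value, first_bit, last_bit)
        (po, 0, 5), (rt, 6, 10), (ra, 11, 15), (svi, 16, 22),
        (ms, 23, 23), (vs, 24, 24), (vf, 25, 25), (xo, 26, 30), (rc, 31, 31),
    )
    word = 0
    for value, first, last in layout:
        width = last - first + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in bits {first}:{last}")
        word |= value << (31 - last)   # MSB0 bit 'last' is LSB0 bit 31-last
    return word


# Placeholder PO and XO values only: they are not the real opcode assignments.
word = encode_svl(po=5, rt=3, ra=0, svi=7, ms=1, vs=1, vf=0, xo=3, rc=0)
print(hex(word))  # 0x14600f86
```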
366 * Add `SVL` to `RA (11:15)` Field in Book I, 1.6.2
367 * Add `SVL` to `RT (6:10)` Field in Book I, 1.6.2
368 * Add `SVL` to `Rc (31)` Field in Book I, 1.6.2
369 * Add `SVL` to `XO (26:30)` Field in Book I, 1.6.2
371 Add the following to Book I, 1.6.2
375 Field used in Simple-V to specify whether MVL (maxvl in the SVSTATE SPR)
379 Field used in Simple-V to specify whether "Vertical" Mode is set
380 (vfirst in the SVSTATE SPR)
383 Field used in Simple-V to specify whether VL (vl in the SVSTATE SPR) is to be set
386 Simple-V immediate field for setting VL or MVL (vl, maxvl in the SVSTATE SPR)
393 Appendix E Power ISA sorted by opcode
394 Appendix F Power ISA sorted by version
395 Appendix G Power ISA sorted by Compliancy Subset
396 Appendix H Power ISA sorted by mnemonic
398 | Form | Book | Page | Version | mnemonic | Description |
399 |------|------|------|---------|----------|-------------|
400 | SVL | I | # | 3.0B | svstep | Vertical-First Stepping and status reporting |