# RFC ls008 SVP64 Management instructions

[[!tag opf_rfc]]

**URLs**:

* <https://libre-soc.org/openpower/sv/>
* <https://libre-soc.org/openpower/sv/rfc/ls008/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1040>
* <https://git.openpower.foundation/isa/PowerISA/issues/87>

**Severity**: Major

**Status**: New

**Date**: 24 Mar 2023

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
Book I, new Scalar Chapter. (Or, new Book on "Zero-Overhead Loop Subsystem")
Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
Instructions added
    setvl    - Cray-style "Set Vector Length" instruction
    svstep   - Vertical-First Mode explicit Step and Status
    svremap  - Re-Mapping of Register Element Offsets
    svindex  - General-purpose setting of SHAPEs to be re-mapped
    svshape  - Hardware-level setting of SHAPEs for element re-mapping
    svshape2 - Hardware-level setting of SHAPEs for element re-mapping (v2)
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of six new "Zero-Overhead-Loop-Control" DSP-style Vector-style
Management Instructions which can be implemented extremely efficiently
and effectively by inserting an additional phase between Decode and Issue.
More complex designs are NOT adversely impacted and in fact greatly benefit
whilst still retaining an obvious linear sequential execution programming model.
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers,
and related tools.
```

**Keywords**:

```
Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control,
Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model
```

**Motivation**

TODO

**Notes and Observations**:

1. TODO

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I as a new Section
* SVL-Form of Book I Section 1.6.1.6 and 1.6.2

----------------

\newpage{}

# svstep: Vertical-First Stepping and status reporting

SVL-Form

* svstep RT,SVi,vf (Rc=0)
* svstep. RT,SVi,vf (Rc=1)

| 0-5|6-10|11-15|16-22 | 23-25    | 26-30 |31| Form     |
|----|----|-----|------|----------|-------|--|----------|
|PO  | RT | /   | SVi  | / / vf   | XO    |Rc| SVL-Form |

Pseudo-code:

```
if SVi[3:4] = 0b11 then
    # store pack and unpack in SVSTATE
    SVSTATE[53] <- SVi[5]
    SVSTATE[54] <- SVi[6]
    RT <- [0]*62 || SVSTATE[53:54]
else
    # Vertical-First explicit stepping.
    step <- SVSTATE_NEXT(SVi, vf)
    RT <- [0]*57 || step
```

Special Registers Altered:

    CR0 (if Rc=1)
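
The pack/unpack branch of the pseudo-code above can be sketched in Python as follows. This is an illustration only: the function name `svstep_pack_unpack` is hypothetical, and the sketch assumes that the bits of the 7-bit `SVi` field are numbered MSB0 (bit 0 is the most significant bit, bit 6 the least), in line with Power ISA convention. The Vertical-First stepping branch is not modelled because `SVSTATE_NEXT` is not defined in this section.

```python
def svstep_pack_unpack(svi):
    """Hypothetical model of the SVi[3:4] = 0b11 branch of svstep.

    Assumes MSB0 bit numbering inside the 7-bit SVi field.
    Returns (pack, unpack, rt_value), or None if the branch is not taken.
    """
    if not 0 <= svi < 128:
        raise ValueError("SVi is a 7-bit field")
    # SVi[3:4] are the bits at shift positions 3 and 2 (MSB0 numbering)
    if (svi >> 2) & 0b11 != 0b11:
        return None
    pack = (svi >> 1) & 1      # SVi[5] -> SVSTATE[53]
    unpack = svi & 1           # SVi[6] -> SVSTATE[54]
    # RT <- [0]*62 || SVSTATE[53:54]
    return pack, unpack, (pack << 1) | unpack


# SVi = 0b0001110 selects the branch, sets pack and clears unpack
print(svstep_pack_unpack(0b0001110))
```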

-------------

\newpage{}


# setvl

SVL-Form

| 0-5|6-10|11-15|16-22 | 23 24 25 | 26-30 |31| FORM     |
| -- | -- | --- | ---- |----------| ----- |--|----------|
|PO  | RT | RA  | SVi  | ms vs vf | XO    |Rc| SVL-Form |

* setvl RT,RA,SVi,vf,vs,ms (Rc=0)
* setvl. RT,RA,SVi,vf,vs,ms (Rc=1)

Pseudo-code:

```
overflow <- 0b0 # sets CR.SO if set and if Rc=1
VLimm <- SVi + 1
# set or get MVL
if ms = 1 then MVL <- VLimm[0:6]
else MVL <- SVSTATE[0:6]
# set or get VL
if vs = 0 then VL <- SVSTATE[7:13]
else if _RA != 0 then
    if (RA) >u 0b1111111 then
        VL <- 0b1111111
        overflow <- 0b1
    else VL <- (RA)[57:63]
else if _RT = 0 then VL <- VLimm[0:6]
else if CTR >u 0b1111111 then
    VL <- 0b1111111
    overflow <- 0b1
else VL <- CTR[57:63]
# limit VL to within MVL
if VL >u MVL then
    overflow <- 0b1
    VL <- MVL
SVSTATE[0:6] <- MVL
SVSTATE[7:13] <- VL
if _RT != 0 then
    GPR(_RT) <- [0]*57 || VL
if ((¬vs) & ¬(ms)) = 0 then
    # set requested Vertical-First mode, clear persist
    SVSTATE[63] <- vf
    SVSTATE[62] <- 0b0
```

Special Registers Altered:

```
CR0 (if Rc=1)
```
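
For readers who find executable code easier to follow, the `setvl` pseudo-code above can be approximated by the Python sketch below. It is not the reference simulator. The function name `setvl_model`, the dict-based `state` and the argument names `ra_val` and `ctr` are assumptions made for illustration: `rt` and `ra` are the register *numbers* (`_RT`, `_RA`), while `ra_val` and `ctr` stand for the contents of `(RA)` and `CTR`. The sketch does not model the truncation `VLimm[0:6]`, because the width of `VLimm` is not stated in the pseudo-code.

```python
def setvl_model(state, rt, ra, svi, vf, vs, ms, ra_val=0, ctr=0):
    """Hypothetical model of setvl.

    state: dict holding the SVSTATE fields 'maxvl', 'vl', 'vfirst', 'RMpst'.
    Returns (value written to RT or None if _RT = 0, overflow flag).
    """
    overflow = False
    vlimm = svi + 1                      # the immediate is stored minus one
    # set or get MVL
    mvl = vlimm if ms else state["maxvl"]
    # set or get VL
    if not vs:
        vl = state["vl"]
    elif ra != 0:
        vl = min(ra_val, 127)            # saturate at 0b1111111
        overflow = ra_val > 127
    elif rt == 0:
        vl = vlimm
    else:
        vl = min(ctr, 127)
        overflow = ctr > 127
    # limit VL to within MVL
    if vl > mvl:
        overflow = True
        vl = mvl
    state["maxvl"], state["vl"] = mvl, vl
    # ((not vs) & (not ms)) = 0 is true whenever vs or ms is set
    if vs or ms:
        state["vfirst"] = vf
        state["RMpst"] = 0
    return (vl if rt != 0 else None), overflow


state = {"maxvl": 16, "vl": 0, "vfirst": 1, "RMpst": 1}
# "setvli VL=8": the assembler places 8 - 1 = 7 in the SVi field
print(setvl_model(state, rt=0, ra=0, svi=7, vf=0, vs=1, ms=0), state)
```

Note that with `vs=0` and `ms=0` the sketch leaves `state` untouched and only returns the current VL, which matches the `getvl` pseudo-op.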

* `SVi` - bits 16-22 - an immediate operand for setting MVL and/or VL
* `ms` - bit 23 - allows for setting of MVL
* `vs` - bit 24 - allows for setting of VL
* `vf` - bit 25 - sets "Vertical First Mode".

Note that in immediate setting mode VL and MVL start from **one**
but that this is compensated for in the assembly notation,
i.e. an immediate value of 1 in assembler notation
actually places the value 0b0000000 in the `SVi` field bits:
on execution the `setvl` instruction adds one to the decoded
`SVi` field bits, resulting in
VL/MVL being set to 1. This allows VL to be set to values
ranging from 1 to 128 with only 7 bits instead of 8.
Setting VL/MVL
to 0 would result in all Vector operations becoming `nop`. If this is
truly desired (nop behaviour) then setting VL and MVL to zero is to be
done via the [[SVSTATE SPR|sv/sprs]].

Note that `setmvli` is a pseudo-op, based on RA/RT=0, and `setvli` likewise:

    setvli VL=8    : setvl  r0, r0, VL=8, vf=0, vs=1, ms=0
    setvli. VL=8   : setvl. r0, r0, VL=8, vf=0, vs=1, ms=0
    setmvli MVL=8  : setvl  r0, r0, MVL=8, vf=0, vs=0, ms=1
    setmvli. MVL=8 : setvl. r0, r0, MVL=8, vf=0, vs=0, ms=1

Additional pseudo-op for obtaining VL without modifying it (or any state):

    getvl r5  : setvl  r5, r0, vf=0, vs=0, ms=0
    getvl. r5 : setvl. r5, r0, vf=0, vs=0, ms=0

-------------

\newpage{}

# SVSTATE SPR

The format of the SVSTATE SPR is as follows:

| Field | Name     | Description                 |
| ----- | -------- | --------------------------- |
| 0:6   | maxvl    | Max Vector Length           |
| 7:13  | vl       | Vector Length               |
| 14:20 | srcstep  | for srcstep = 0..VL-1       |
| 21:27 | dststep  | for dststep = 0..VL-1       |
| 28:29 | dsubstep | for substep = 0..SUBVL-1    |
| 30:31 | ssubstep | for substep = 0..SUBVL-1    |
| 32:33 | mi0      | REMAP RA/FRA/BFA SVSHAPE0-3 |
| 34:35 | mi1      | REMAP RB/FRB/BFB SVSHAPE0-3 |
| 36:37 | mi2      | REMAP RC/FRT SVSHAPE0-3     |
| 38:39 | mo0      | REMAP RT/FRT/BF SVSHAPE0-3  |
| 40:41 | mo1      | REMAP EA/RS/FRS SVSHAPE0-3  |
| 42:46 | SVme     | REMAP enable (RA-RT)        |
| 47:52 | rsvd     | reserved                    |
| 53    | pack     | PACK (srcstep reorder)      |
| 54    | unpack   | UNPACK (dststep order)      |
| 55:61 | hphint   | Horizontal Hint             |
| 62    | RMpst    | REMAP persistence           |
| 63    | vfirst   | Vertical First mode         |

Notes:

* The entries are truncated to be within range. Attempts to set VL to
  greater than MAXVL will truncate VL.
* Setting srcstep, dststep to 64 or greater, or VL or MVL to greater
  than 64 is reserved and will cause an illegal instruction trap.
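
The field table above can be turned into a small Python helper for extracting and updating individual SVSTATE fields. This is a sketch under stated assumptions: SVSTATE is treated as a 64-bit value with Power ISA MSB0 bit numbering (bit 0 is the most significant bit, bit 63 the least), and the names `SVSTATE_FIELDS`, `get_field` and `set_field` are hypothetical, chosen for this example only.

```python
# (first bit, last bit) of each field, MSB0 numbering, copied from the table
SVSTATE_FIELDS = {
    "maxvl": (0, 6), "vl": (7, 13), "srcstep": (14, 20), "dststep": (21, 27),
    "dsubstep": (28, 29), "ssubstep": (30, 31),
    "mi0": (32, 33), "mi1": (34, 35), "mi2": (36, 37),
    "mo0": (38, 39), "mo1": (40, 41), "SVme": (42, 46),
    "pack": (53, 53), "unpack": (54, 54), "hphint": (55, 61),
    "RMpst": (62, 62), "vfirst": (63, 63),
}


def get_field(svstate, name):
    """Extract one named field from a 64-bit SVSTATE value."""
    first, last = SVSTATE_FIELDS[name]
    width = last - first + 1
    return (svstate >> (63 - last)) & ((1 << width) - 1)


def set_field(svstate, name, value):
    """Return a copy of svstate with one field replaced (value is masked)."""
    first, last = SVSTATE_FIELDS[name]
    width = last - first + 1
    shift = 63 - last
    mask = ((1 << width) - 1) << shift
    return (svstate & ~mask) | ((value << shift) & mask)


s = set_field(0, "maxvl", 8)
s = set_field(s, "vl", 5)
s = set_field(s, "vfirst", 1)
print(hex(s), get_field(s, "maxvl"), get_field(s, "vl"), get_field(s, "vfirst"))
```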

**SVSTATE Fields**

SVSTATE is a standard SPR that (if REMAP is not activated) contains sufficient
self-contained information for a full context save/restore.
SVSTATE contains (and permits setting of):

* MVL (the Maximum Vector Length) - declares (statically) how
  much of a regfile is to be reserved for Vector elements
* VL - Vector Length
* dststep - the destination element offset of the current parallel
  instruction being executed
* srcstep - for twin-predication, the source element offset as well.
* ssubstep - the source subvector element offset of the current
  parallel instruction being executed
* dsubstep - the destination subvector element offset of the current
  parallel instruction being executed
* vfirst - Vertical First mode. srcstep, dststep and substep
  **do not advance** unless explicitly requested to do so with
  the `svstep` instruction
* RMpst - REMAP persistence. REMAP will apply only to the following
  instruction unless this bit is set, in which case REMAP "persists".
  Reset (cleared) on use of the `setvl` instruction if used to
  alter VL or MVL.
* Pack - if set then srcstep/substep VL/SUBVL loop-ordering is inverted.
* UnPack - if set then dststep/substep VL/SUBVL loop-ordering is inverted.
* hphint - Horizontal Parallelism Hint. Indicates that
  no Hazards exist between groups of elements in sequential multiples of this number
  (before REMAP). By definition: elements for which `FLOOR(srcstep/hphint)` is
  equal *before REMAP* are in the same parallelism "group". In Vertical First Mode
  hardware **MUST ONLY** process elements in the same group, and must stop
  Horizontal Issue at the last element of a given group. Set to zero to indicate "no hint".
* SVme - REMAP enable bits, indicating which register is to be
  REMAPed: RA, RB, RC, RT and EA are the canonical (typical) register names
  associated with each bit, with RA being the LSB and EA being the MSB.
  See table below for ordering. When `SVme` is zero (0b00000) REMAP
  is **fully disabled and inactive** regardless of the contents of
  `SVSTATE`, `mi0-mi2/mo0-mo1`, or the four `SVSHAPEn` SPRs
* mi0-mi2/mo0-mo1 - when the corresponding SVme bit is set, these
  indicate the SVSHAPE (0-3) that the corresponding register (RA etc)
  should use

Programmer's Note: the fact that REMAP is entirely dormant when `SVme` is zero
allows establishment of REMAP context well in advance, followed by utilising `svremap`
at a precise (or the very last) moment. Some implementations may exploit this
to cache (or take some time to prepare caches) in the background whilst other
(unrelated) instructions are being executed. This is particularly important to
bear in mind when using `svindex` which will require hardware to perform (and
cache) additional GPR reads.

Programmer's Note: when REMAP is activated it becomes necessary on any
context-switch (Interrupt or Function call) to detect (or know in advance)
that REMAP is enabled and to additionally save/restore the four SVSHAPE
SPRs, SVSHAPE0-3. Given that this is expected to be a rare occurrence it was
deemed unreasonable to burden every context-switch or function call with
mandatory save/restore of SVSHAPEs, and consequently it is a *callee*
(and Trap Handler) responsibility. Callees (and Trap Handlers) **MUST**
avoid using any SVP64 instructions during the period where state
could be adversely affected. SVP64 purely relies on Scalar instructions,
so Scalar instructions (except the SVP64 Management ones and mtspr and
mfspr) are 100% guaranteed to have zero impact on SVP64 state.

**Max Vector Length (maxvl)** <a name="mvl" />

MAXVECTORLENGTH is the same concept as MVL in RISC-V RVV, except that it
is variable length and may be dynamically set (normally from an immediate
field only). MVL is limited to 7 bits
(in the first version of SVP64) and consequently the maximum number of
elements is limited to between 0 and 127.

Programmer's Note: Except by directly using `mtspr` on SVSTATE, which may
result in performance penalties on some hardware implementations, SVSTATE's `maxvl`
field may only be set **statically** as an immediate, by the `setvl` instruction.
It may **NOT** be set dynamically from a register. Compiler writers and assembly
programmers are expected to perform static register file analysis, subdivision,
and allocation and only utilise `setvl`. Direct writing to SVSTATE in order to
"bypass" this Note could, in less-advanced implementations, potentially cause stalling,
particularly if SVP64 instructions are issued directly after the `mtspr` to SVSTATE.

**Vector Length (vl)** <a name="vl" />

`SVSTATE.vl`, the actual Vector length (the number of elements in a "Vector"), may be set
entirely dynamically at runtime from a number of sources. `setvl` is the primary
instruction for setting Vector Length.
`setvl` is conceptually similar to, but different from, the Cray, SX Aurora, and RISC-V RVV
equivalents. Similar to RVV, VL is set to be within
the range 0 <= VL <= MVL. Unlike RVV, VL is set **exactly** according to the following:

    VL = (RT|0) = MIN(vlen, MVL)

where 0 <= MVL <= 127 and vlen may come from an immediate, `RA`, or from the `CTR` SPR,
depending on options selected with the `setvl` instruction.
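
The practical consequence of `VL = MIN(vlen, MVL)` is the classic strip-mining loop: a long array is processed in chunks of at most MVL elements, with the final chunk shortened automatically. The Python sketch below illustrates only that arithmetic; `strip_mine` is a hypothetical helper and does not correspond to any instruction.

```python
def strip_mine(total, mvl):
    """Yield (start, vl) pairs, where vl = min(remaining elements, mvl)."""
    start = 0
    while start < total:
        vl = min(total - start, mvl)   # what setvl would compute for VL
        yield start, vl
        start += vl


# 20 elements with MVL=8 are processed as chunks of 8, 8 and 4
print(list(strip_mine(20, 8)))   # [(0, 8), (8, 8), (16, 4)]
```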

Programmer's Note: conceptual understanding of Cray-style Vectors is far beyond the scope
of the Power ISA Technical Reference. Guidance on the 50-year-old Cray Vector paradigm is
best sought elsewhere: good studies include Academic Courses given on the 1970s
Cray Supercomputers over at least the past three decades.

**SUBVL - Sub Vector Length**

This is a "group by quantity" that effectively asks each iteration
of the hardware loop to load SUBVL elements of width elwidth at a
time. Effectively, SUBVL is like a SIMD multiplier: instead of just 1
operation issued, SUBVL operations are issued.

The main effect of SUBVL is that predication bits are applied per
**group**, rather than by individual element. Legal values are 0 to 3,
representing 1 operation (1 element) through 4 operations (4 elements) respectively.
Elements are best thought of in the context of 3D, Audio and Video: two Left and Right
Channel "elements" or four ARGB "elements", or three XYZ coordinate "elements".

`subvl` is again primarily set by the `setvl` instruction. Not to be confused
with `hphint`.

Directly related to `subvl` are the `pack` and `unpack` Mode bits of `SVSTATE`.
See `svstep` instruction for how to set Pack and Unpack Modes.
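
The effect of inverting the VL/SUBVL loop ordering can be visualised with the Python sketch below. It is an assumption-laden illustration: it assumes that the default ordering has the element step as the outer loop and the sub-vector step as the inner loop, and that Pack/Unpack swap the two. The name `element_order` is hypothetical, and `subvl` here is the element count (1 to 4), not the encoded field value (0 to 3).

```python
def element_order(vl, subvl, inverted=False):
    """List (step, substep) pairs in the order they would be visited."""
    if inverted:
        # assumed Pack/Unpack ordering: sub-vector index is the outer loop
        return [(step, sub) for sub in range(subvl) for step in range(vl)]
    # assumed default ordering: element step is the outer loop
    return [(step, sub) for step in range(vl) for sub in range(subvl)]


# Two XYZ vectors (VL=2, SUBVL=3)
print(element_order(2, 3))
print(element_order(2, 3, inverted=True))
```

With the default ordering each XYZ group is visited as a unit; with the inverted ordering all X components are visited first, then all Y, then all Z.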


**Horizontal Parallelism**

A problem exists for hardware where it may not be able to detect
that a programmer (or compiler) knows of opportunities for parallelism
and lack of overlap between loops.

For hphint, the number chosen must be consistently
executed **every time**. Hardware is not permitted to execute five
computations for one instruction then three on the next.
hphint is a hint from the compiler to hardware that exactly this
many elements may be safely executed in parallel, without hazards
(including Memory accesses).
Interestingly, when hphint is set equal to VL, it is in effect
as if Vertical First mode were not set, because the hardware is
given the option to run through all elements in an instruction.
This is exactly what Horizontal-First is: a for-loop from 0 to VL-1
except that the hardware may *choose* the number of elements.
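
The grouping rule given for `hphint` in the SVSTATE field list (elements with equal `FLOOR(srcstep/hphint)` belong to the same parallelism group) can be sketched in Python as below. The function name `hphint_groups` is hypothetical, and returning `None` for a zero hint is simply this sketch's way of representing "no hint".

```python
from itertools import groupby


def hphint_groups(vl, hphint):
    """Split srcstep values 0..vl-1 into hazard-free groups of size hphint."""
    if hphint == 0:
        return None                      # zero means "no hint"
    steps = range(vl)
    return [list(group)
            for _, group in groupby(steps, key=lambda s: s // hphint)]


print(hphint_groups(10, 4))    # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(hphint_groups(10, 10))   # one group covering every element
```

When `hphint` equals VL the result is a single group, which corresponds to the observation above that hardware may then run through all elements of an instruction.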

*Note to programmers: changing VL during the middle of such modes
should be done only with due care and respect for the fact that SVSTATE
has exactly the same peer-level status as a Program Counter.*

-------------

\newpage{}

# SVL-Form

Add the following to Book I, 1.6.1, SVL-Form

```
|0     |6     |11    |16    |23 |24 |25 |26    |31 |
| PO   |  RT  |  RA  | SVi  |ms |vs |vf | XO   |Rc |
| PO   |  RT  |  /   | SVi  |/  |/  |vf | XO   |Rc |
```
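
As a cross-check of the field positions, the following Python sketch packs the SVL-Form fields into a 32-bit instruction word, most significant field first. The function name `encode_svl` is hypothetical. The primary opcode (`po`) and extended opcode (`xo`) values are not given in this section, so they are left as caller-supplied parameters rather than filled in.

```python
def encode_svl(po, rt, ra, svi, ms, vs, vf, xo, rc):
    """Pack SVL-Form fields into a 32-bit word (bit 0 is the MSB)."""
    fields = [
        (po, 6),   # bits 0-5
        (rt, 5),   # bits 6-10
        (ra, 5),   # bits 11-15
        (svi, 7),  # bits 16-22
        (ms, 1),   # bit 23
        (vs, 1),   # bit 24
        (vf, 1),   # bit 25
        (xo, 5),   # bits 26-30
        (rc, 1),   # bit 31
    ]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


# Field placement only: po and xo are left at zero as placeholders
print(hex(encode_svl(po=0, rt=0, ra=0, svi=7, ms=0, vs=1, vf=0, xo=0, rc=0)))
```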

* Add `SVL` to `RA (11:15)` Field in Book I, 1.6.2
* Add `SVL` to `RT (6:10)` Field in Book I, 1.6.2
* Add `SVL` to `Rc (31)` Field in Book I, 1.6.2
* Add `SVL` to `XO (26:30)` Field in Book I, 1.6.2

Add the following to Book I, 1.6.2

```
ms (23)
    Field used in Simple-V to specify whether MVL (maxvl in the SVSTATE SPR)
    is to be set
    Formats: SVL
vf (25)
    Field used in Simple-V to specify whether "Vertical" Mode is set
    (vfirst in the SVSTATE SPR)
    Formats: SVL
vs (24)
    Field used in Simple-V to specify whether VL (vl in the SVSTATE SPR) is to be set
    Formats: SVL
SVi (16:22)
    Simple-V immediate field for setting VL or MVL (vl, maxvl in the SVSTATE SPR)
    Formats: SVL
```


# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| SVL  | I    | #    | 3.0B    | svstep   | Vertical-First Stepping and status reporting |
