1 # RFC ls008 SVP64 Management instructions
2
3 [[!tag opf_rfc]]
4
5 **URLs**:
6
7 * <https://libre-soc.org/openpower/sv/>
8 * <https://libre-soc.org/openpower/sv/rfc/ls008/>
9 * <https://bugs.libre-soc.org/show_bug.cgi?id=1040>
10 * <https://git.openpower.foundation/isa/PowerISA/issues/87>
11
12 **Severity**: Major
13
14 **Status**: New
15
16 **Date**: 24 Mar 2023
17
18 **Target**: v3.2B
19
20 **Source**: v3.0B
21
22 **Books and Section affected**:
23
24 ```
25 Book I, new Scalar Chapter. (Or, new Book on "Zero-Overhead Loop Subsystem")
26 Appendix E Power ISA sorted by opcode
27 Appendix F Power ISA sorted by version
28 Appendix G Power ISA sorted by Compliancy Subset
29 Appendix H Power ISA sorted by mnemonic
30 ```
31
32 **Summary**
33
34 ```
35 Instructions added
36 setvl - Cray-style "Set Vector Length" instruction
37 svstep - Vertical-First Mode explicit Step and Status
38 svremap - Re-Mapping of Register Element Offsets
39 svindex - General-purpose setting of SHAPEs to be re-mapped
40 svshape - Hardware-level setting of SHAPEs for element re-mapping
41 svshape2 - Hardware-level setting of SHAPEs for element re-mapping (v2)
42 ```
43
44 **Submitter**: Luke Leighton (Libre-SOC)
45
46 **Requester**: Libre-SOC
47
48 **Impact on processor**:
49
50 ```
51 Addition of six new "Zero-Overhead-Loop-Control" DSP-style Vector-style
52 Management Instructions which can be implemented extremely efficiently
53 and effectively by inserting an additional phase between Decode and Issue.
54 More complex designs are NOT adversely impacted and in fact greatly benefit
55 whilst still retaining an obvious linear sequential execution programming model.
56 ```
57
58 **Impact on software**:
59
60 ```
61 Requires support for new instructions in assembler, debuggers,
62 and related tools.
63 ```
64
65 **Keywords**:
66
67 ```
68 Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control,
69 Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model
70 ```
71
72 **Motivation**
73
74 TODO
75
76 **Notes and Observations**:
77
78 1. TODO
79
80 **Changes**
81
82 Add the following entries to:
83
84 * Section 1.3.2 Notation
85 * the Appendices of Book I
86 * Instructions of Book I as a new Section
87 * SVL-Form of Book I Section 1.6.1.6 and 1.6.2
88
89 ----------------
90
91 \newpage{}
92
93 # Notation, Section 1.3.2
94
When register operands (RA, RT, BF) are prefixed by a single underscore
(_RT, _RA, _BF) the variable contains the contents of the instruction field,
not the contents of the Register File referenced *by* that field. Example:
`_RT` contains the contents of bits 6 thru 10. The relationship
`RT = GPR(_RT)` is thus always true. Uses include making alternative
decisions within an instruction based on whether the operand field
is zero or non-zero.
102
103 ----------------
104
105 \newpage{}
106
107 # svstep: Vertical-First Stepping and status reporting
108
109 SVL-Form
110
111 * svstep RT,SVi,vf (Rc=0)
112 * svstep. RT,SVi,vf (Rc=1)
113
| 0-5|6-10|11-15|16-22 | 23-25    | 26-30 |31| Form     |
|----|----|-----|------|----------|-------|--|----------|
|PO  | RT | /   | SVi  | / / vf   | XO    |Rc| SVL-Form |
117
118 Pseudo-code:
119
```
if SVi[3:4] = 0b11 then
    # store pack and unpack in SVSTATE
    SVSTATE[53] <- SVi[5]
    SVSTATE[54] <- SVi[6]
    RT <- [0]*62 || SVSTATE[53:54]
else
    # Vertical-First explicit stepping.
    step <- SVSTATE_NEXT(SVi, vf)
    RT <- [0]*57 || step
```
131
132 Special Registers Altered:
133
134 CR0 (if Rc=1)
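
The pack/unpack branch of the `svstep` pseudo-code can be sketched in Python as below. This is an illustration only: it assumes `svi` is the raw 7-bit `SVi` field numbered MSB0 (so `SVi[6]` is the least significant bit), and it does not model `SVSTATE_NEXT`, which is not defined in this section. The function name is hypothetical.

```python
def svstep_pack_unpack(svi: int):
    """Model only the SVi[3:4] = 0b11 branch of svstep.

    Returns (pack, unpack, rt_value), or None when SVi selects the
    Vertical-First stepping branch (not modelled here).
    """
    if (svi >> 2) & 0b11 != 0b11:     # SVi[3:4] in MSB0 numbering of 7 bits
        return None
    pack = (svi >> 1) & 1             # SVi[5] -> SVSTATE[53]
    unpack = svi & 1                  # SVi[6] -> SVSTATE[54]
    rt_value = (pack << 1) | unpack   # [0]*62 || SVSTATE[53:54]
    return pack, unpack, rt_value

result = svstep_pack_unpack(0b0001110)  # SVi[3:4]=0b11, SVi[5]=1, SVi[6]=0
```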
135
136 -------------
137
138 \newpage{}
139
140
141 # setvl
142
143 SVL-Form
144
145 | 0-5|6-10|11-15|16-22 | 23 24 25 | 26-30 |31| FORM |
146 | -- | -- | --- | ---- |----------| ----- |--|----------|
147 |PO | RT | RA | SVi | ms vs vf | XO |Rc| SVL-Form |
148
149 * setvl RT,RA,SVi,vf,vs,ms (Rc=0)
150 * setvl. RT,RA,SVi,vf,vs,ms (Rc=1)
151
152 Pseudo-code:
153
```
overflow <- 0b0 # sets CR.SO if set and if Rc=1
VLimm <- SVi + 1
# set or get MVL
if ms = 1 then MVL <- VLimm[0:6]
else           MVL <- SVSTATE[0:6]
# set or get VL
if vs = 0                then VL <- SVSTATE[7:13]
else if _RA != 0         then
    if (RA) >u 0b1111111 then
        VL <- 0b1111111
        overflow <- 0b1
    else                      VL <- (RA)[57:63]
else if _RT = 0          then VL <- VLimm[0:6]
else if CTR >u 0b1111111 then
    VL <- 0b1111111
    overflow <- 0b1
else                          VL <- CTR[57:63]
# limit VL to within MVL
if VL >u MVL then
    overflow <- 0b1
    VL <- MVL
SVSTATE[0:6] <- MVL
SVSTATE[7:13] <- VL
if _RT != 0 then
    GPR(_RT) <- [0]*57 || VL
# MAXVL is a static "state-reset" opportunity so VF is only set then.
if ms = 1 then
    SVSTATE[63] <- vf  # set Vertical-First mode
    SVSTATE[62] <- 0b0 # clear persist bit
```
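
As an aid to reading the pseudo-code, the following Python sketch models the same decision tree. It is not a reference implementation. The `State` class and all names are hypothetical, register values are assumed to be non-negative Python integers, and `SVi = 127` (where `VLimm` would be 128 and no longer fit a 7-bit field) is rejected rather than modelled.

```python
from dataclasses import dataclass

MAX7 = 0b1111111

@dataclass
class State:
    maxvl: int = 0    # SVSTATE[0:6]
    vl: int = 0       # SVSTATE[7:13]
    rmpst: int = 0    # SVSTATE[62], REMAP persistence
    vfirst: int = 0   # SVSTATE[63], Vertical-First mode

def setvl(st, gpr, ctr, rt, ra, svi, vf, vs, ms):
    """Update st and gpr in place; rt and ra are field numbers (_RT, _RA).

    Returns the overflow flag (reported in CR0.SO when Rc=1).
    """
    if not 0 <= svi < 127:
        raise ValueError("SVi=127 is not modelled in this sketch")
    overflow = False
    vlimm = svi + 1
    mvl = vlimm if ms else st.maxvl           # set or get MVL
    if not vs:                                # get VL
        vl = st.vl
    elif ra != 0:                             # VL from GPR(_RA)
        if gpr[ra] > MAX7:
            vl, overflow = MAX7, True
        else:
            vl = gpr[ra]
    elif rt == 0:                             # VL from the immediate
        vl = vlimm
    elif ctr > MAX7:                          # VL from CTR
        vl, overflow = MAX7, True
    else:
        vl = ctr
    if vl > mvl:                              # limit VL to within MVL
        vl, overflow = mvl, True
    st.maxvl, st.vl = mvl, vl
    if rt != 0:
        gpr[rt] = vl
    if ms:                                    # "state-reset" opportunity
        st.vfirst = vf
        st.rmpst = 0
    return overflow

st = State()
gpr = [0] * 32
setvl(st, gpr, ctr=0, rt=0, ra=0, svi=7, vf=0, vs=0, ms=1)   # MVL <- 8
gpr[4] = 100
overflow = setvl(st, gpr, ctr=0, rt=5, ra=4, svi=0, vf=0, vs=1, ms=0)
```

In the second call the requested length (100) exceeds MVL (8), so VL is clamped to 8, the clamped value is written to GPR 5, and the overflow flag is raised.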
185
186 Special Registers Altered:
187
188 ```
189 CR0 (if Rc=1)
190 ```
191
192 * `SVi` - bits 16-22 - an immediate operand for setting MVL and/or VL
193 * `ms` - bit 23 - allows for setting of MVL
194 * `vs` - bit 24 - allows for setting of VL
195 * `vf` - bit 25 - sets "Vertical First Mode".
196
Note that in immediate-setting mode VL and MVL start from **one**,
and that the assembly notation compensates for this:
an immediate value of 1 in assembler notation
places the value 0b0000000 in the `SVi` field bits.
On execution, the `setvl` instruction adds one to the decoded
`SVi` field, resulting in VL/MVL being set to 1.
This allows VL to be set to values
ranging from 1 to 128 with only 7 bits instead of 8.
Setting VL/MVL to 0 would result in all Vector operations becoming `nop`.
If nop behaviour is truly desired, VL and MVL must be set to zero
via the [[SVSTATE SPR|sv/sprs]].
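
The off-by-one relationship between assembler notation and the `SVi` field can be written out as a short Python sketch. The function names are hypothetical, and the accepted range of 1 to 128 simply follows the paragraph above.

```python
def encode_svi(asm_value: int) -> int:
    """Assembler operand (1..128) -> 7-bit SVi field contents (0..127)."""
    if not 1 <= asm_value <= 128:
        raise ValueError("immediate VL/MVL must be in the range 1..128")
    return asm_value - 1

def decode_svi(svi_field: int) -> int:
    """7-bit SVi field contents -> value applied on execution (VLimm)."""
    return svi_field + 1

svi_for_one = encode_svi(1)      # the field holds 0b0000000
```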
209
Note that `setmvli` and `setvli` are both pseudo-ops, based on RA/RT=0:

    setvli   VL=8   : setvl  r0, r0, VL=8, vf=0, vs=1, ms=0
    setvli.  VL=8   : setvl. r0, r0, VL=8, vf=0, vs=1, ms=0
    setmvli  MVL=8  : setvl  r0, r0, MVL=8, vf=0, vs=0, ms=1
    setmvli. MVL=8  : setvl. r0, r0, MVL=8, vf=0, vs=0, ms=1

Additional pseudo-op for obtaining VL without modifying it (or any state):

    getvl  r5 : setvl  r5, r0, vf=0, vs=0, ms=0
    getvl. r5 : setvl. r5, r0, vf=0, vs=0, ms=0
221
222 -------------
223
224 \newpage{}
225
226 # SVSTATE SPR
227
228 The format of the SVSTATE SPR is as follows:
229
| Field | Name     | Description                 |
| ----- | -------- | --------------------------- |
| 0:6   | maxvl    | Max Vector Length           |
| 7:13  | vl       | Vector Length               |
| 14:20 | srcstep  | for srcstep = 0..VL-1       |
| 21:27 | dststep  | for dststep = 0..VL-1       |
| 28:29 | dsubstep | for substep = 0..SUBVL-1    |
| 30:31 | ssubstep | for substep = 0..SUBVL-1    |
| 32:33 | mi0      | REMAP RA/FRA/BFA SVSHAPE0-3 |
| 34:35 | mi1      | REMAP RB/FRB/BFB SVSHAPE0-3 |
| 36:37 | mi2      | REMAP RC/FRT SVSHAPE0-3     |
| 38:39 | mo0      | REMAP RT/FRT/BF SVSHAPE0-3  |
| 40:41 | mo1      | REMAP EA/RS/FRS SVSHAPE0-3  |
| 42:46 | SVme     | REMAP enable (RA-RT)        |
| 47:52 | rsvd     | reserved                    |
| 53    | pack     | PACK (srcstep reorder)      |
| 54    | unpack   | UNPACK (dststep order)      |
| 55:61 | hphint   | Horizontal Hint             |
| 62    | RMpst    | REMAP persistence           |
| 63    | vfirst   | Vertical First mode         |
250
251 Notes:
252
253 * The entries are truncated to be within range. Attempts to set VL to
254 greater than MAXVL will truncate VL.
255 * Setting srcstep, dststep to 64 or greater, or VL or MVL to greater
256 than 64 is reserved and will cause an illegal instruction trap.
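
The field layout in the table above can be exercised with a small Python sketch that reads and writes fields of a 64-bit SVSTATE value. It assumes MSB0 bit numbering (bit 63 is the least significant bit), as used throughout the table. The dictionary and helper names are hypothetical, and the reserved field is omitted.

```python
# (start, end) bit positions, MSB0 numbering, taken from the table above.
SVSTATE_FIELDS = {
    "maxvl": (0, 6), "vl": (7, 13), "srcstep": (14, 20), "dststep": (21, 27),
    "dsubstep": (28, 29), "ssubstep": (30, 31),
    "mi0": (32, 33), "mi1": (34, 35), "mi2": (36, 37),
    "mo0": (38, 39), "mo1": (40, 41), "SVme": (42, 46),
    "pack": (53, 53), "unpack": (54, 54), "hphint": (55, 61),
    "RMpst": (62, 62), "vfirst": (63, 63),
}

def get_field(svstate: int, name: str) -> int:
    start, end = SVSTATE_FIELDS[name]
    width = end - start + 1
    return (svstate >> (63 - end)) & ((1 << width) - 1)

def set_field(svstate: int, name: str, value: int) -> int:
    start, end = SVSTATE_FIELDS[name]
    width = end - start + 1
    mask = ((1 << width) - 1) << (63 - end)
    # Values wider than the field are silently truncated in this sketch.
    return (svstate & ~mask) | ((value << (63 - end)) & mask)

svstate = 0
svstate = set_field(svstate, "maxvl", 8)
svstate = set_field(svstate, "vl", 5)
svstate = set_field(svstate, "vfirst", 1)
```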
257
258 **SVSTATE Fields**
259
SVSTATE is a standard SPR that (if REMAP is not activated) contains sufficient
self-contained information for a full context save/restore.
SVSTATE contains (and permits setting of):
263
264 * MVL (the Maximum Vector Length) - declares (statically) how
265 much of a regfile is to be reserved for Vector elements
266 * VL - Vector Length
267 * dststep - the destination element offset of the current parallel
268 instruction being executed
269 * srcstep - for twin-predication, the source element offset as well.
270 * ssubstep - the source subvector element offset of the current
271 parallel instruction being executed
272 * dsubstep - the destination subvector element offset of the current
273 parallel instruction being executed
* vfirst - Vertical First mode. srcstep, dststep and substep
  **do not advance** unless explicitly requested to do so with
  the `svstep` instruction
277 * RMpst - REMAP persistence. REMAP will apply only to the following
278 instruction unless this bit is set, in which case REMAP "persists".
279 Reset (cleared) on use of the `setvl` instruction if used to
280 alter VL or MVL.
281 * Pack - if set then srcstep/substep VL/SUBVL loop-ordering is inverted.
282 * UnPack - if set then dststep/substep VL/SUBVL loop-ordering is inverted.
283 * hphint - Horizontal Parallelism Hint. Indicates that
284 no Hazards exist between groups of elements in sequential multiples of this number
285 (before REMAP). By definition: elements for which `FLOOR(srcstep/hphint)` is
286 equal *before REMAP* are in the same parallelism "group". In Vertical First Mode
287 hardware **MUST ONLY** process elements in the same group, and must stop
288 Horizontal Issue at the last element of a given group. Set to zero to indicate "no hint".
* SVme - REMAP enable bits, indicating which register is to be
  REMAPed: RA, RB, RC, RT and EA are the canonical (typical) register names
  associated with each bit, with RA being the LSB and EA being the MSB.
  See table below for ordering. When `SVme` is zero (0b00000) REMAP
  is **fully disabled and inactive** regardless of the contents of
  `SVSTATE`, `mi0-mi2/mo0-mo1`, or the four `SVSHAPEn` SPRs.
* mi0-mi2/mo0-mo1 - these indicate the SVSHAPE (0-3) that the
  corresponding register (RA etc) should use, and take effect only
  when that register's corresponding SVme bit is set.
298
299 Programmer's Note: the fact that REMAP is entirely dormant when `SVme` is zero
300 allows establishment of REMAP context well in advance, followed by utilising `svremap`
301 at a precise (or the very last) moment. Some implementations may exploit this
302 to cache (or take some time to prepare caches) in the background whilst other
303 (unrelated) instructions are being executed. This is particularly important to
304 bear in mind when using `svindex` which will require hardware to perform (and
305 cache) additional GPR reads.
306
Programmer's Note: when REMAP is activated it becomes necessary on any
context-switch (Interrupt or Function call) to detect (or know in advance)
that REMAP is enabled and to additionally save/restore the four SVSHAPE
SPRs, SVSHAPE0-3. Given that this is expected to be a rare occurrence it was
deemed unreasonable to burden every context-switch or function call with
mandatory save/restore of SVSHAPEs, and consequently it is a *callee*
(and Trap Handler) responsibility. Callees (and Trap Handlers) **MUST**
avoid using any SVP64 instructions during the period where state
could be adversely affected. SVP64 relies purely on Scalar instructions,
so Scalar instructions (except the SVP64 Management ones and mtspr and
mfspr) are 100% guaranteed to have zero impact on SVP64 state.
318
319 **Max Vector Length (maxvl)** <a name="mvl" />
320
321 MAXVECTORLENGTH is the same concept as MVL in RISC-V RVV, except that it
322 is variable length and may be dynamically set (normally from an immediate
323 field only). MVL is limited to 7 bits
324 (in the first version of SVP64) and consequently the maximum number of
325 elements is limited to between 0 and 127.
326
327 Programmer's Note: Except by directly using `mtspr` on SVSTATE, which may
328 result in performance penalties on some hardware implementations, SVSTATE's `maxvl`
329 field may only be set **statically** as an immediate, by the `setvl` instruction.
330 It may **NOT** be set dynamically from a register. Compiler writers and assembly
331 programmers are expected to perform static register file analysis, subdivision,
332 and allocation and only utilise `setvl`. Direct writing to SVSTATE in order to
333 "bypass" this Note could, in less-advanced implementations, potentially cause stalling,
334 particularly if SVP64 instructions are issued directly after the `mtspr` to SVSTATE.
335
336 **Vector Length (vl)** <a name="vl" />
337
`SVSTATE.vl` is the actual Vector Length: the number of elements in a "Vector".
It may be set entirely dynamically at runtime from a number of sources.
`setvl` is the primary instruction for setting Vector Length.
`setvl` is conceptually similar to, but different from, the Cray, SX Aurora,
and RISC-V RVV equivalents. Similar to RVV, VL is set to be within
the range 0 <= VL <= MVL. Unlike RVV, VL is set **exactly** according to the following:

    VL = (RT|0) = MIN(vlen, MVL)

where 0 <= MVL <= 127 and vlen may come from an immediate, `RA`, or from the `CTR` SPR,
depending on options selected with the `setvl` instruction.
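
One conventional way to use the `MIN(vlen, MVL)` rule is a strip-mined loop, in which each pass requests all remaining elements and processes however many were granted. The Python sketch below illustrates that pattern only. The generator name is hypothetical and the loop structure is an assumption about typical usage, not something this document mandates.

```python
def strip_mine(n: int, mvl: int):
    """Yield (start, vl) chunks covering n elements, vl = MIN(remaining, MVL)."""
    if mvl < 1:
        raise ValueError("MVL must be at least 1 for the loop to progress")
    start = 0
    while start < n:
        vl = min(n - start, mvl)   # what setvl would grant for this pass
        yield start, vl
        start += vl

chunks = list(strip_mine(20, 8))   # 20 elements with MVL = 8
```

With 20 elements and MVL of 8 the passes cover 8, 8 and then 4 elements: because VL is set exactly, the final pass needs no separate scalar clean-up loop.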
349
350 Programmer's Note: conceptual understanding of Cray-style Vectors is far beyond the scope
351 of the Power ISA Technical Reference. Guidance on the 50-year-old Cray Vector paradigm is
352 best sought elsewhere: good studies include Academic Courses given on the 1970s
353 Cray Supercomputers over at least the past three decades.
354
355 **SUBVL - Sub Vector Length**
356
357 This is a "group by quantity" that effectively asks each iteration
358 of the hardware loop to load SUBVL elements of width elwidth at a
359 time. Effectively, SUBVL is like a SIMD multiplier: instead of just 1
360 operation issued, SUBVL operations are issued.
361
The main effect of SUBVL is that predication bits are applied per
**group**, rather than by individual element. Legal values are 0 to 3,
representing 1 operation (1 element) thru 4 operations (4 elements) respectively.
Elements are best thought of in the context of 3D, Audio and Video: two Left and Right
Channel "elements" or four ARGB "elements", or three XYZ coordinate "elements".
367
368 `subvl` is again primarily set by the `setvl` instruction. Not to be confused
369 with `hphint`.
370
Directly related to `subvl` are the `pack` and `unpack` Mode bits of `SVSTATE`.
See `svstep` instruction for how to set Pack and Unpack Modes.
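
The interaction between VL, SUBVL, per-group predication and Pack mode can be pictured with the Python sketch below. Several points are assumptions made for illustration: that "inverted" loop-ordering means swapping the two nested loops, that predicate bit 0 corresponds to step 0, and that the SUBVL encoding 0 to 3 maps to 1 to 4 elements as stated above. The function name is hypothetical.

```python
def element_order(vl: int, subvl_field: int, pred=None, pack=False):
    """List (step, substep) pairs in the order they would be visited."""
    subvl = subvl_field + 1                    # encoded 0..3 -> 1..4 elements
    # One predicate bit enables or disables a whole subvector group.
    active = [s for s in range(vl) if pred is None or (pred >> s) & 1]
    if pack:
        # Assumed meaning of "inverted": substep becomes the outer loop.
        return [(step, sub) for sub in range(subvl) for step in active]
    return [(step, sub) for step in active for sub in range(subvl)]

normal = element_order(2, 1)                   # two stereo (L, R) samples
packed = element_order(2, 1, pack=True)
masked = element_order(3, 1, pred=0b101)       # middle group predicated out
```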
373
374
375 **Horizontal Parallelism**
376
377 A problem exists for hardware where it may not be able to detect
378 that a programmer (or compiler) knows of opportunities for parallelism
379 and lack of overlap between loops.
380
381 For hphint, the number chosen must be consistently
382 executed **every time**. Hardware is not permitted to execute five
383 computations for one instruction then three on the next.
384 hphint is a hint from the compiler to hardware that exactly this
385 many elements may be safely executed in parallel, without hazards
386 (including Memory accesses).
387 Interestingly, when hphint is set equal to VL, it is in effect
388 as if Vertical First mode were not set, because the hardware is
389 given the option to run through all elements in an instruction.
390 This is exactly what Horizontal-First is: a for-loop from 0 to VL-1
391 except that the hardware may *choose* the number of elements.
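
The grouping rule given for `hphint`, where elements with equal `FLOOR(srcstep/hphint)` belong to the same parallelism group, can be sketched in Python as follows. The function name is hypothetical, and returning `None` for a hint of zero is just this sketch's way of representing "no hint".

```python
def hphint_groups(vl: int, hphint: int):
    """Partition srcstep values 0..VL-1 into hazard-free groups."""
    if hphint == 0:
        return None                            # "no hint": nothing is promised
    groups = {}
    for srcstep in range(vl):
        groups.setdefault(srcstep // hphint, []).append(srcstep)
    return list(groups.values())

groups = hphint_groups(10, 4)
```

For VL of 10 and a hint of 4 this gives groups of 4, 4 and 2 elements. Setting the hint equal to VL yields a single group containing every element.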
392
393 *Note to programmers: changing VL during the middle of such modes
394 should be done only with due care and respect for the fact that SVSTATE
395 has exactly the same peer-level status as a Program Counter.*
396
397 -------------
398
399 \newpage{}
400
401 # SVL-Form
402
403 Add the following to Book I, 1.6.1, SVL-Form
404
405 ```
406 |0 |6 |11 |16 |23 |24 |25 |26 |31 |
407 | PO | RT | RA | SVi |ms |vs |vf | XO |Rc |
408 | PO | RT | / | SVi |/ |/ |vf | XO |Rc |
409 ```
410
* Add `SVL` to `RA (11:15)` Field in Book I, 1.6.2
* Add `SVL` to `RT (6:10)` Field in Book I, 1.6.2
* Add `SVL` to `Rc (31)` Field in Book I, 1.6.2
* Add `SVL` to `XO (26:30)` Field in Book I, 1.6.2
415
416 Add the following to Book I, 1.6.2
417
```
ms (23)
    Field used in Simple-V to specify whether MVL (maxvl in the SVSTATE SPR)
    is to be set
    Formats: SVL
vf (25)
    Field used in Simple-V to specify whether "Vertical" Mode is set
    (vfirst in the SVSTATE SPR)
    Formats: SVL
vs (24)
    Field used in Simple-V to specify whether VL (vl in the SVSTATE SPR) is to be set
    Formats: SVL
SVi (16:22)
    Simple-V immediate field used by setvl for setting VL or MVL
    (vl, maxvl in the SVSTATE SPR)
    and used as a "Mode of Operation" selector in svstep
    Formats: SVL
```
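
The SVL-Form layout can be checked with a short Python encode/decode sketch. It assumes MSB0 numbering of a 32-bit instruction word and uses the field positions from the encoding tables in this document. The PO and XO values are not given in this section, so the example uses zero as a placeholder for both. All function names are hypothetical.

```python
# (start, end) bit positions of each SVL-Form field, MSB0 numbering.
SVL_FIELDS = {
    "PO": (0, 5), "RT": (6, 10), "RA": (11, 15), "SVi": (16, 22),
    "ms": (23, 23), "vs": (24, 24), "vf": (25, 25),
    "XO": (26, 30), "Rc": (31, 31),
}

def encode_svl(**fields) -> int:
    """Pack named field values into a 32-bit word; omitted fields are zero."""
    insn = 0
    for name, value in fields.items():
        start, end = SVL_FIELDS[name]
        width = end - start + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        insn |= value << (31 - end)
    return insn

def decode_svl(insn: int) -> dict:
    """Unpack a 32-bit word into a dict of SVL-Form field values."""
    return {
        name: (insn >> (31 - end)) & ((1 << (end - start + 1)) - 1)
        for name, (start, end) in SVL_FIELDS.items()
    }

# PO=0 and XO=0 are placeholders, not the real opcode values.
insn = encode_svl(PO=0, RT=5, RA=0, SVi=7, ms=1, vs=1, vf=0, XO=0, Rc=0)
decoded = decode_svl(insn)
```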
436
437 # Appendices
438
439 Appendix E Power ISA sorted by opcode
440 Appendix F Power ISA sorted by version
441 Appendix G Power ISA sorted by Compliancy Subset
442 Appendix H Power ISA sorted by mnemonic
443
444 | Form | Book | Page | Version | mnemonic | Description |
445 |------|------|------|---------|----------|-------------|
446 | SVL | I | # | 3.0B | svstep | Vertical-First Stepping and status reporting |
447 | SVL | I | # | 3.0B | setvl | Cray-like establishment of Looping (Vector) context |
448