# RFC ls008 SVP64 Management instructions

[[!tag opf_rfc]]

**URLs**:

* <https://libre-soc.org/openpower/sv/>
* <https://libre-soc.org/openpower/sv/rfc/ls008/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1040>
* <https://git.openpower.foundation/isa/PowerISA/issues/87>

**Severity**: Major

**Status**: New

**Date**: 24 Mar 2023

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
Book I, new Scalar Chapter. (Or, new Book on "Zero-Overhead Loop Subsystem")
Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
Instructions added
setvl - Cray-style "Set Vector Length" instruction
svstep - Vertical-First Mode explicit Step and Status
svremap - Re-Mapping of Register Element Offsets
svindex - General-purpose setting of SHAPEs to be re-mapped
svshape - Hardware-level setting of SHAPEs for element re-mapping
svshape2 - Hardware-level setting of SHAPEs for element re-mapping (v2)
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of six new "Zero-Overhead-Loop-Control" DSP-style Vector-style
Management Instructions which can be implemented extremely efficiently
and effectively by inserting an additional phase between Decode and Issue.
More complex designs are NOT adversely impacted and in fact greatly benefit
whilst still retaining an obvious linear sequential execution programming model.
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers,
and related tools.
```

**Keywords**:

```
Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control,
Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model
```

**Motivation**

TODO

**Notes and Observations**:

1. TODO

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I as a new Section
* SVL-Form of Book I Section 1.6.1.6 and 1.6.2

----------------

\newpage{}

# svstep: Vertical-First Stepping and status reporting

SVL-Form

* svstep RT,SVi,vf (Rc=0)
* svstep. RT,SVi,vf (Rc=1)

| 0-5|6-10|11-15|16-22| 23-25    | 26-30 |31| Form     |
|----|----|-----|-----|----------|-------|--|----------|
|PO  | RT | /   | SVi | / / vf   | XO    |Rc| SVL-Form |

Pseudo-code:

```
if SVi[3:4] = 0b11 then
    # store pack and unpack in SVSTATE
    SVSTATE[53] <- SVi[5]
    SVSTATE[54] <- SVi[6]
    RT <- [0]*62 || SVSTATE[53:54]
else
    step <- SVSTATE_NEXT(SVi, vf)
    RT <- [0]*57 || step
```

Special Registers Altered:

CR0 (if Rc=1)

-------------

\newpage{}

# setvl

SVL-Form

* setvl RT,RA,SVi,vf,vs,ms (Rc=0)
* setvl. RT,RA,SVi,vf,vs,ms (Rc=1)

Pseudo-code:

```
overflow <- 0b0
VLimm <- SVi + 1
# set or get MVL
if ms = 1 then MVL <- VLimm[0:6]
else MVL <- SVSTATE[0:6]
# set or get VL
if vs = 0 then VL <- SVSTATE[7:13]
else if _RA != 0 then
    if (RA) >u 0b1111111 then
        VL <- 0b1111111
        overflow <- 0b1
    else VL <- (RA)[57:63]
else if _RT = 0 then VL <- VLimm[0:6]
else if CTR >u 0b1111111 then
    VL <- 0b1111111
    overflow <- 0b1
else VL <- CTR[57:63]
# limit VL to within MVL
if VL >u MVL then
    overflow <- 0b1
    VL <- MVL
SVSTATE[0:6] <- MVL
SVSTATE[7:13] <- VL
if _RT != 0 then
    GPR(_RT) <- [0]*57 || VL
if ((¬vs) & ¬(ms)) = 0 then
    # set requested Vertical-First mode, clear persist
    SVSTATE[63] <- vf
    SVSTATE[62] <- 0b0
```
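
As an informal, non-normative illustration, the `setvl` pseudo-code can be modelled in Python. This is a minimal sketch under stated assumptions: the `SVState` class, the `setvl` function signature and its argument names are hypothetical, `SVi + 1` is assumed to wrap to 7 bits, and the effect of Rc=1 on CR0 is not modelled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SVState:
    """Hypothetical container for the SVSTATE fields touched by setvl."""
    maxvl: int = 0   # SVSTATE[0:6]
    vl: int = 0      # SVSTATE[7:13]
    rmpst: int = 0   # SVSTATE[62]
    vfirst: int = 0  # SVSTATE[63]


def setvl(st, svi, vf, vs, ms, ra_idx=0, ra_val=0, rt_idx=0, ctr=0):
    """Return (new_state, value written to RT or None, overflow flag)."""
    overflow = False
    vlimm = (svi + 1) & 0x7F  # assumption: SVi + 1 wraps to 7 bits
    # set or get MVL
    mvl = vlimm if ms == 1 else st.maxvl
    # set or get VL
    if vs == 0:
        vl = st.vl
    elif ra_idx != 0:
        if ra_val > 0x7F:
            vl, overflow = 0x7F, True
        else:
            vl = ra_val
    elif rt_idx == 0:
        vl = vlimm
    elif ctr > 0x7F:
        vl, overflow = 0x7F, True
    else:
        vl = ctr
    # limit VL to within MVL
    if vl > mvl:
        vl, overflow = mvl, True
    new = replace(st, maxvl=mvl, vl=vl)
    rt = vl if rt_idx != 0 else None
    # "((not vs) & (not ms)) = 0" is true when vs or ms is set
    if vs == 1 or ms == 1:
        new = replace(new, vfirst=vf, rmpst=0)
    return new, rt, overflow
```

For example, under these assumptions a call with `svi=7, vs=1, ms=1` and a nonzero RA register holding 100 sets MVL to 8, clamps VL to 8 and reports overflow.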

Special Registers Altered:

```
CR0 (if Rc=1)
```

-------------

\newpage{}

# SVSTATE SPR

The format of the SVSTATE SPR is as follows:

| Field | Name     | Description                 |
| ----- | -------- | --------------------------- |
| 0:6   | maxvl    | Max Vector Length           |
| 7:13  | vl       | Vector Length               |
| 14:20 | srcstep  | for srcstep = 0..VL-1       |
| 21:27 | dststep  | for dststep = 0..VL-1       |
| 28:29 | dsubstep | for substep = 0..SUBVL-1    |
| 30:31 | ssubstep | for substep = 0..SUBVL-1    |
| 32:33 | mi0      | REMAP RA/FRA/BFA SVSHAPE0-3 |
| 34:35 | mi1      | REMAP RB/FRB/BFB SVSHAPE0-3 |
| 36:37 | mi2      | REMAP RC/FRT SVSHAPE0-3     |
| 38:39 | mo0      | REMAP RT/FRT/BF SVSHAPE0-3  |
| 40:41 | mo1      | REMAP EA/RS/FRS SVSHAPE0-3  |
| 42:46 | SVme     | REMAP enable (RA-RT)        |
| 47:52 | rsvd     | reserved                    |
| 53    | pack     | PACK (srcstep reorder)      |
| 54    | unpack   | UNPACK (dststep order)      |
| 55:61 | hphint   | Horizontal Hint             |
| 62    | RMpst    | REMAP persistence           |
| 63    | vfirst   | Vertical First mode         |
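
The table uses Power ISA MSB0 numbering, in which bit 0 is the most significant bit of the 64-bit SPR. As an informal illustration, the following Python sketch shows how the fields could be extracted from and inserted into a 64-bit integer. The helper names (`get_field`, `set_field`, `decode_svstate`) are hypothetical and not part of any Libre-SOC tool.

```python
# (name, first_bit, last_bit) in MSB0 numbering, copied from the table above
SVSTATE_FIELDS = [
    ("maxvl", 0, 6), ("vl", 7, 13), ("srcstep", 14, 20), ("dststep", 21, 27),
    ("dsubstep", 28, 29), ("ssubstep", 30, 31), ("mi0", 32, 33),
    ("mi1", 34, 35), ("mi2", 36, 37), ("mo0", 38, 39), ("mo1", 40, 41),
    ("SVme", 42, 46), ("rsvd", 47, 52), ("pack", 53, 53), ("unpack", 54, 54),
    ("hphint", 55, 61), ("RMpst", 62, 62), ("vfirst", 63, 63),
]


def get_field(value, first, last, width=64):
    """Extract MSB0 bits first..last (inclusive) from an integer."""
    shift = width - 1 - last
    mask = (1 << (last - first + 1)) - 1
    return (value >> shift) & mask


def set_field(value, first, last, new, width=64):
    """Return value with MSB0 bits first..last replaced by new."""
    shift = width - 1 - last
    mask = (1 << (last - first + 1)) - 1
    return (value & ~(mask << shift)) | ((new & mask) << shift)


def decode_svstate(value):
    """Split a 64-bit SVSTATE value into a dict of named fields."""
    return {name: get_field(value, lo, hi) for name, lo, hi in SVSTATE_FIELDS}
```

Note that `vfirst` (bit 63) is the least significant bit of the integer, whilst `maxvl` (bits 0:6) occupies the top seven bits.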

Notes:

* Field values are truncated to fit within range. In particular, an attempt
  to set VL to greater than MAXVL truncates VL to MAXVL.
* Setting srcstep or dststep to 64 or greater, or VL or MVL to greater
  than 64, is reserved and will cause an illegal instruction trap.

**SVSTATE Fields**

SVSTATE is a standard SPR that (if REMAP is not activated) contains sufficient
self-contained information for a full context save/restore.
SVSTATE contains (and permits setting of):

* MVL (the Maximum Vector Length) - declares (statically) how
  much of a regfile is to be reserved for Vector elements
* VL - Vector Length
* dststep - the destination element offset of the current parallel
  instruction being executed
* srcstep - for twin-predication, the source element offset as well.
* ssubstep - the source subvector element offset of the current
  parallel instruction being executed
* dsubstep - the destination subvector element offset of the current
  parallel instruction being executed
* vfirst - Vertical First mode. srcstep, dststep and substep
  **do not advance** unless explicitly requested to do so with
  the `svstep` instruction
* RMpst - REMAP persistence. REMAP will apply only to the following
  instruction unless this bit is set, in which case REMAP "persists".
  Reset (cleared) on use of the `setvl` instruction if used to
  alter VL or MVL.
* Pack - if set then srcstep/substep VL/SUBVL loop-ordering is inverted.
* UnPack - if set then dststep/substep VL/SUBVL loop-ordering is inverted.
* hphint - Horizontal Parallelism Hint. Indicates that
  no Hazards exist between groups of elements in sequential multiples of this number
  (before REMAP). By definition: elements for which `FLOOR(srcstep/hphint)` is
  equal *before REMAP* are in the same parallelism "group". In Vertical First Mode
  hardware **MUST ONLY** process elements in the same group, and must stop
  Horizontal Issue at the last element of a given group. Set to zero to indicate "no hint".
* SVme - REMAP enable bits, indicating which register is to be
  REMAPed: RA, RB, RC, RT and EA are the canonical (typical) register names
  associated with each bit, with RA being the LSB and EA being the MSB.
  See table below for ordering. When `SVme` is zero (0b00000) REMAP
  is **fully disabled and inactive** regardless of the contents of
  `SVSTATE`, `mi0-mi2/mo0-mo1`, or the four `SVSHAPEn` SPRs
* mi0-mi2/mo0-mo1 - these indicate the SVSHAPE (0-3) that the
  corresponding register (RA etc) should use, provided that the
  register's corresponding SVme bit is set

Programmer's Note: the fact that REMAP is entirely dormant when `SVme` is zero
allows establishment of REMAP context well in advance, followed by utilising `svremap`
at a precise (or the very last) moment. Some implementations may exploit this
to cache (or take some time to prepare caches) in the background whilst other
(unrelated) instructions are being executed. This is particularly important to
bear in mind when using `svindex` which will require hardware to perform (and
cache) additional GPR reads.

Programmer's Note: when REMAP is activated it becomes necessary on any
context-switch (Interrupt or Function call) to detect (or know in advance)
that REMAP is enabled and to additionally save/restore the four SVSHAPE
SPRs, SVSHAPE0-3. Given that this is expected to be a rare occurrence it was
deemed unreasonable to burden every context-switch or function call with
mandatory save/restore of SVSHAPEs, and consequently it is a *callee*
(and Trap Handler) responsibility. Callees (and Trap Handlers) **MUST**
avoid using any SVP64 instructions during the period where state
could be adversely affected. SVP64 purely relies on Scalar instructions,
so Scalar instructions (except the SVP64 Management ones and mtspr and
mfspr) are 100% guaranteed to have zero impact on SVP64 state.

**Max Vector Length (maxvl)** <a name="mvl" />

MAXVECTORLENGTH is the same concept as MVL in RISC-V RVV, except that it
is variable length and may be dynamically set (normally from an immediate
field only). MVL is limited to 7 bits
(in the first version of SVP64) and consequently the maximum number of
elements is limited to between 0 and 127.

Programmer's Note: Except by directly using `mtspr` on SVSTATE, which may
result in performance penalties on some hardware implementations, SVSTATE's `maxvl`
field may only be set **statically** as an immediate, by the `setvl` instruction.
It may **NOT** be set dynamically from a register. Compiler writers and assembly
programmers are expected to perform static register file analysis, subdivision,
and allocation and only utilise `setvl`. Direct writing to SVSTATE in order to
"bypass" this Note could, in less-advanced implementations, potentially cause stalling,
particularly if SVP64 instructions are issued directly after the `mtspr` to SVSTATE.

**Vector Length (vl)** <a name="vl" />

The actual Vector Length (`vl`), the number of elements in a "Vector", may be set
entirely dynamically at runtime from a number of sources. `setvl` is the primary
instruction for setting Vector Length.
`setvl` is conceptually similar to, but different from, the Cray, SX Aurora, and RISC-V RVV
equivalents. Similar to RVV, VL is set to be within
the range 0 <= VL <= MVL. Unlike RVV, VL is set **exactly** according to the following:

    VL = (RT|0) = MIN(vlen, MVL)

where 0 <= MVL <= 127 and vlen may come from an immediate, `RA`, or from the `CTR` SPR,
depending on options selected with the `setvl` instruction.
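
As an informal illustration of why an exact `MIN(vlen, MVL)` is useful, the following Python sketch models a Cray-style "strip-mining" loop over `n` elements. It assumes (hypothetically) that each loop iteration requests the remaining element count as `vlen`. The function name `strip_mine` is invented for this example and the sketch does not model any actual instruction sequence.

```python
def strip_mine(n, mvl):
    """Yield (start, vl) chunks covering n elements, with vl = min(remaining, mvl)."""
    if mvl <= 0:
        raise ValueError("mvl must be at least 1 for the loop to make progress")
    start = 0
    while start < n:
        vl = min(n - start, mvl)  # VL = MIN(vlen, MVL)
        yield start, vl
        start += vl
```

With `n=10` and `mvl=4` this produces chunks of 4, 4 and 2 elements: because VL is exact, the final short chunk needs no special-case clean-up code.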

Programmer's Note: understanding of Cray-style Vectors is far beyond the scope
of the Power ISA Technical Reference. Guidance on the Cray Vector paradigm is
best sought elsewhere: good studies include Academic Courses given on the 1970s
Cray Supercomputers over the past 30 years.

**SUBVL - Sub Vector Length**

This is a "group by quantity" that effectively asks each iteration
of the hardware loop to load SUBVL elements of width elwidth at a
time. Effectively, SUBVL is like a SIMD multiplier: instead of just 1
operation issued, SUBVL operations are issued.

The main effect of SUBVL is that predication bits are applied per
**group**, rather than by individual element. Legal values are 0 to 3,
representing 1 operation (1 element) through 4 operations (4 elements) respectively.
Elements are best thought of in the context of 3D, Audio and Video: two Left and Right
Channel "elements" or four ARGB "elements", or three XYZ coordinate "elements".
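
As an informal illustration of per-group predication, the following Python sketch lists which flat element indices would be processed for a given predicate mask. It rests on assumptions that are not stated in this section: that predicate bit `step` (counting from the least significant bit) gates group `step`, and that group `step` occupies flat elements `step*SUBVL` to `step*SUBVL + SUBVL-1` (no Pack/UnPack reordering). The function name `active_elements` is hypothetical.

```python
def active_elements(vl, subvl_field, predicate_bits):
    """Return flat element indices processed; one predicate bit gates a whole group."""
    subvl = subvl_field + 1  # field values 0..3 encode 1..4 elements per group
    out = []
    for step in range(vl):
        # assumption: bit `step` (LSB0) of the mask enables group `step`
        if (predicate_bits >> step) & 1:
            out.extend(step * subvl + s for s in range(subvl))
    return out
```

For three XYZ groups (`subvl_field=2`) and a mask of `0b101`, the first and third groups are processed in full and the second group is skipped in full.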

`subvl` is again primarily set by the `setvl` instruction.

**Horizontal Parallelism**

A problem exists for hardware where it may not be able to detect
that a programmer (or compiler) knows of opportunities for parallelism
and lack of overlap between loops.

For hphint, the number chosen must be consistently
executed **every time**. Hardware is not permitted to execute five
computations for one instruction then three on the next.
hphint is a hint from the compiler to hardware that exactly this
many elements may be safely executed in parallel, without hazards
(including Memory accesses).
Interestingly, when hphint is set equal to VL, it is in effect
as if Vertical First mode were not set, because the hardware is
given the option to run through all elements in an instruction.
This is exactly what Horizontal-First is: a for-loop from 0 to VL-1
except that the hardware may *choose* the number of elements.
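
As an informal illustration, the grouping rule `FLOOR(srcstep/hphint)` can be sketched in Python as follows. The function name `hphint_groups` is hypothetical, and returning `None` for a zero hint is merely this sketch's way of representing "no hint".

```python
def hphint_groups(vl, hphint):
    """Partition srcstep values 0..vl-1 into groups sharing floor(srcstep / hphint)."""
    if hphint == 0:
        return None  # zero means "no hint": no grouping information is given
    groups = {}
    for srcstep in range(vl):
        groups.setdefault(srcstep // hphint, []).append(srcstep)
    return list(groups.values())
```

With `vl=10` and `hphint=4` the groups are elements 0-3, 4-7 and 8-9. With `hphint` equal to VL there is a single group containing every element, matching the observation above.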

*Note to programmers: changing VL during the middle of such modes
should be done only with due care and respect for the fact that SVSTATE
has exactly the same peer-level status as a Program Counter.*

-------------

\newpage{}

# SVL-Form

Add the following to Book I, 1.6.1, SVL-Form

```
|0     |6     |11    |16   |23 |24 |25 |26   |31 |
| PO   |  RT  |   RA | SVi |ms |vs |vf | XO  |Rc |
| PO   |  RT  | /    | SVi |/  |/  |vf | XO  |Rc |
```

* Add `SVL` to `RA (11:15)` Field in Book I, 1.6.2
* Add `SVL` to `RT (6:10)` Field in Book I, 1.6.2
* Add `SVL` to `Rc (31)` Field in Book I, 1.6.2
* Add `SVL` to `XO (26:30)` Field in Book I, 1.6.2

Add the following to Book I, 1.6.2

```
ms (23)
    Field used in Simple-V to specify whether MVL (maxvl in the SVSTATE SPR)
    is to be set
    Formats: SVL
vf (25)
    Field used in Simple-V to specify whether "Vertical" Mode is set
    (vfirst in the SVSTATE SPR)
    Formats: SVL
vs (24)
    Field used in Simple-V to specify whether VL (vl in the SVSTATE SPR) is to be set
    Formats: SVL
SVi (16:22)
    Simple-V immediate field for setting VL or MVL (vl, maxvl in the SVSTATE SPR)
    Formats: SVL
```
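
As an informal illustration of the SVL-Form layout, the following Python sketch packs and unpacks the fields of a 32-bit instruction word using MSB0 bit numbering. The helper names `encode_svl` and `decode_svl` are hypothetical. No PO or XO values are given in this section, so any values passed for them are placeholders chosen by the caller and not actual opcode allocations.

```python
# field name -> (first_bit, last_bit), MSB0 numbering within a 32-bit word
SVL_FIELDS = {
    "PO": (0, 5), "RT": (6, 10), "RA": (11, 15), "SVi": (16, 22),
    "ms": (23, 23), "vs": (24, 24), "vf": (25, 25),
    "XO": (26, 30), "Rc": (31, 31),
}


def encode_svl(**fields):
    """Pack named SVL-Form fields into a 32-bit word; omitted fields are zero."""
    word = 0
    for name, value in fields.items():
        first, last = SVL_FIELDS[name]
        width = last - first + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << (31 - last)
    return word


def decode_svl(word):
    """Unpack a 32-bit word into a dict of SVL-Form fields."""
    return {
        name: (word >> (31 - last)) & ((1 << (last - first + 1)) - 1)
        for name, (first, last) in SVL_FIELDS.items()
    }
```

For `svstep`, which has no RA, ms or vs operands, the corresponding fields would simply be left at zero in this sketch.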

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| SVL  | I    | #    | 3.0B    | svstep   | Vertical-First Stepping and status reporting |
