# RFC ls008 SVP64 Management instructions

[[!tag opf_rfc]]

**URLs**:

* <https://libre-soc.org/openpower/sv/>
* <https://libre-soc.org/openpower/sv/rfc/ls008/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1040>
* <https://git.openpower.foundation/isa/PowerISA/issues/87>

**Severity**: Major

**Status**: New

**Date**: 24 Mar 2023

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
    Book I, new Scalar Chapter. (Or, new Book on "Zero-Overhead Loop Subsystem")
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    Instructions added
    setvl - Cray-style "Set Vector Length" instruction
    svstep - Vertical-First Mode explicit Step and Status
    svremap - Re-Mapping of Register Element Offsets
    svindex - General-purpose setting of SHAPEs to be re-mapped
    svshape - Hardware-level setting of SHAPEs for element re-mapping
    svshape2 - Hardware-level setting of SHAPEs for element re-mapping (v2)
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of six new "Zero-Overhead-Loop-Control" DSP-style Vector-style
    Management Instructions which can be implemented extremely efficiently
    and effectively by inserting an additional phase between Decode and Issue.
    More complex designs are NOT adversely impacted and in fact greatly benefit
    whilst still retaining an obvious linear sequential execution programming model.
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control,
    Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model
```

**Motivation**

TODO

**Notes and Observations**:

1. TODO

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I as a new Section
* SVL-Form of Book I Section 1.6.1.6 and 1.6.2

----------------

\newpage{}

# svstep: Vertical-First Stepping and status reporting

SVL-Form

* svstep RT,SVi,vf (Rc=0)
* svstep. RT,SVi,vf (Rc=1)

| 0-5|6-10|11-15|16-22| 23-25    | 26-30 |31| Form     |
|----|----|-----|-----|----------|-------|--|----------|
|PO  | RT | /   | SVi | / / vf   | XO    |Rc| SVL-Form |

Pseudo-code:

```
    if SVi[3:4] = 0b11 then
        # store pack and unpack in SVSTATE
        SVSTATE[53] <- SVi[5]
        SVSTATE[54] <- SVi[6]
        RT <- [0]*62 || SVSTATE[53:54]
    else
        step <- SVSTATE_NEXT(SVi, vf)
        RT <- [0]*57 || step
```

Special Registers Altered:

    CR0 (if Rc=1)

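As a reading aid, the svstep pseudo-code can be transliterated into Python. This is a minimal sketch, not the reference model: `svi_bits` and `svstep_model` are hypothetical names, `SVSTATE_NEXT` is not defined in this section so a caller-supplied stand-in (`svstate_next`) takes its place, and the MSB0 bit numbers are converted to shifts by hand. The CR0 update for Rc=1 is not modelled.

```python
def svi_bits(svi, first, last):
    """MSB0-numbered slice first..last of the 7-bit SVi field."""
    width = last - first + 1
    return (svi >> (6 - last)) & ((1 << width) - 1)


def svstep_model(svstate, svi, vf, svstate_next):
    """Return (new_svstate, rt) following the svstep pseudo-code.

    svstate_next is a caller-supplied stand-in for SVSTATE_NEXT (hypothetical).
    """
    if svi_bits(svi, 3, 4) == 0b11:
        pack = svi_bits(svi, 5, 5)
        unpack = svi_bits(svi, 6, 6)
        # SVSTATE[53] and SVSTATE[54] (MSB0) sit at LSB0 positions 10 and 9.
        svstate &= ~((1 << 10) | (1 << 9))
        svstate |= (pack << 10) | (unpack << 9)
        rt = (pack << 1) | unpack          # [0]*62 || SVSTATE[53:54]
    else:
        rt = svstate_next(svi, vf) & 0x7F  # [0]*57 || step
    return svstate, rt


# Usage: SVi = 0b0001110 selects the pack/unpack branch with pack=1, unpack=0.
new_state, rt = svstep_model(0, 0b0001110, 0, lambda svi, vf: 0)
```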

-------------

\newpage{}


# setvl

SVL-Form

* setvl RT,RA,SVi,vf,vs,ms (Rc=0)
* setvl. RT,RA,SVi,vf,vs,ms (Rc=1)

Pseudo-code:

    overflow <- 0b0
    VLimm <- SVi + 1
    # set or get MVL
    if ms = 1 then MVL <- VLimm[0:6]
    else           MVL <- SVSTATE[0:6]
    # set or get VL
    if vs = 0                then VL <- SVSTATE[7:13]
    else if _RA != 0         then
        if (RA) >u 0b1111111 then
            VL <- 0b1111111
            overflow <- 0b1
        else                      VL <- (RA)[57:63]
    else if _RT = 0          then VL <- VLimm[0:6]
    else if CTR >u 0b1111111 then
        VL <- 0b1111111
        overflow <- 0b1
    else                          VL <- CTR[57:63]
    # limit VL to within MVL
    if VL >u MVL then
        overflow <- 0b1
        VL <- MVL
    SVSTATE[0:6] <- MVL
    SVSTATE[7:13] <- VL
    if _RT != 0 then
        GPR(_RT) <- [0]*57 || VL
    if ((¬vs) & ¬(ms)) = 0 then
        # set requested Vertical-First mode, clear persist
        SVSTATE[63] <- vf
        SVSTATE[62] <- 0b0

Special Registers Altered:

    CR0 (if Rc=1)

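The setvl pseudo-code can be followed step by step in a small Python model. This is a sketch under stated assumptions rather than the reference implementation: `get_field`, `set_field` and `setvl_model` are hypothetical names, the register file is a plain dict of unsigned integers, `VLimm` is assumed to be 7 bits wide, and the CR0 update for Rc=1 is left out. The usage lines at the end show a request for 100 elements being clamped to an MVL of 8.

```python
def get_field(value, first, last):
    """Read MSB0-numbered bits first..last of a 64-bit value."""
    width = last - first + 1
    return (value >> (63 - last)) & ((1 << width) - 1)


def set_field(value, first, last, new):
    """Return value with MSB0-numbered bits first..last replaced by new."""
    width = last - first + 1
    mask = ((1 << width) - 1) << (63 - last)
    return (value & ~mask) | ((new << (63 - last)) & mask)


def setvl_model(svstate, svi, ms, vs, vf, rt, ra, gpr, ctr):
    """Return (new_svstate, overflow); gpr is a dict of register values."""
    overflow = 0
    vlimm = (svi + 1) & 0x7F   # assumption: VLimm is 7 bits wide
    # set or get MVL
    mvl = vlimm if ms == 1 else get_field(svstate, 0, 6)
    # set or get VL
    if vs == 0:
        vl = get_field(svstate, 7, 13)
    elif ra != 0:
        if gpr[ra] > 0x7F:
            vl, overflow = 0x7F, 1
        else:
            vl = gpr[ra]
    elif rt == 0:
        vl = vlimm
    elif ctr > 0x7F:
        vl, overflow = 0x7F, 1
    else:
        vl = ctr
    # limit VL to within MVL
    if vl > mvl:
        vl, overflow = mvl, 1
    svstate = set_field(svstate, 0, 6, mvl)
    svstate = set_field(svstate, 7, 13, vl)
    if rt != 0:
        gpr[rt] = vl
    if vs or ms:               # ((¬vs) & ¬(ms)) = 0
        # set requested Vertical-First mode, clear persist
        svstate = set_field(svstate, 63, 63, vf)
        svstate = set_field(svstate, 62, 62, 0)
    return svstate, overflow


# Usage: request VL=100 from r4 while the immediate (SVi=7) sets MVL to 8.
regs = {3: 0, 4: 100}
state, ovf = setvl_model(0, svi=7, ms=1, vs=1, vf=0, rt=3, ra=4, gpr=regs, ctr=0)
```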
-------------

\newpage{}

# SVSTATE SPR

The format of the SVSTATE SPR is as follows:

| Field | Name     | Description                 |
| ----- | -------- | --------------------------- |
| 0:6   | maxvl    | Max Vector Length           |
| 7:13  | vl       | Vector Length               |
| 14:20 | srcstep  | for srcstep = 0..VL-1       |
| 21:27 | dststep  | for dststep = 0..VL-1       |
| 28:29 | dsubstep | for substep = 0..SUBVL-1    |
| 30:31 | ssubstep | for substep = 0..SUBVL-1    |
| 32:33 | mi0      | REMAP RA/FRA/BFA SVSHAPE0-3 |
| 34:35 | mi1      | REMAP RB/FRB/BFB SVSHAPE0-3 |
| 36:37 | mi2      | REMAP RC/FRT SVSHAPE0-3     |
| 38:39 | mo0      | REMAP RT/FRT/BF SVSHAPE0-3  |
| 40:41 | mo1      | REMAP EA/RS/FRS SVSHAPE0-3  |
| 42:46 | SVme     | REMAP enable (RA-RT)        |
| 47:52 | rsvd     | reserved                    |
| 53    | pack     | PACK (srcstep reorder)      |
| 54    | unpack   | UNPACK (dststep order)      |
| 55:61 | hphint   | Horizontal Hint             |
| 62    | RMpst    | REMAP persistence           |
| 63    | vfirst   | Vertical First mode         |

Notes:

* The entries are truncated to be within range. Attempts to set VL to
  greater than MAXVL will truncate VL to MAXVL.
* Setting srcstep, dststep to 64 or greater, or VL or MVL to greater
  than 64 is reserved and will cause an illegal instruction trap.

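The field table maps directly onto shift-and-mask code. The following is a minimal sketch, assuming a 64-bit SVSTATE with MSB0 bit numbering as in the table; `SVSTATE_FIELDS`, `encode_svstate` and `decode_svstate` are hypothetical names, and the reserved-value trap described in the notes is not modelled.

```python
# (first, last) MSB0 bit positions, copied from the SVSTATE table.
SVSTATE_FIELDS = {
    "maxvl": (0, 6), "vl": (7, 13), "srcstep": (14, 20), "dststep": (21, 27),
    "dsubstep": (28, 29), "ssubstep": (30, 31), "mi0": (32, 33),
    "mi1": (34, 35), "mi2": (36, 37), "mo0": (38, 39), "mo1": (40, 41),
    "SVme": (42, 46), "rsvd": (47, 52), "pack": (53, 53), "unpack": (54, 54),
    "hphint": (55, 61), "RMpst": (62, 62), "vfirst": (63, 63),
}


def encode_svstate(**fields):
    """Pack named fields into a 64-bit SVSTATE value; unnamed fields are 0."""
    value = 0
    for name, field_value in fields.items():
        first, last = SVSTATE_FIELDS[name]
        width = last - first + 1
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        value |= field_value << (63 - last)
    return value


def decode_svstate(value):
    """Split a 64-bit SVSTATE value into a dict of named fields."""
    decoded = {}
    for name, (first, last) in SVSTATE_FIELDS.items():
        width = last - first + 1
        decoded[name] = (value >> (63 - last)) & ((1 << width) - 1)
    return decoded


# Usage: MAXVL=8, VL=5, Vertical-First mode enabled.
raw = encode_svstate(maxvl=8, vl=5, vfirst=1)
decoded = decode_svstate(raw)
```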
**SVSTATE Fields**

SVSTATE is a standard SPR that (if REMAP is not activated) contains sufficient
self-contained information for a full context save/restore.
SVSTATE contains (and permits setting of):

* MVL (the Maximum Vector Length) - declares (statically) how
  much of a regfile is to be reserved for Vector elements
* VL - Vector Length
* dststep - the destination element offset of the current parallel
  instruction being executed
* srcstep - for twin-predication, the source element offset as well.
* ssubstep - the source subvector element offset of the current
  parallel instruction being executed
* dsubstep - the destination subvector element offset of the current
  parallel instruction being executed
* vfirst - Vertical First mode. srcstep, dststep and substep
  **do not advance** unless explicitly requested to do so with
  the `svstep` instruction
* RMpst - REMAP persistence. REMAP will apply only to the following
  instruction unless this bit is set, in which case REMAP "persists".
  Reset (cleared) on use of the `setvl` instruction if used to
  alter VL or MVL.
* Pack - if set then srcstep/substep VL/SUBVL loop-ordering is inverted.
* UnPack - if set then dststep/substep VL/SUBVL loop-ordering is inverted.
* hphint - Horizontal Parallelism Hint. Indicates that
  no Hazards exist between this number of sequentially-accessed
  elements (including after REMAP). In Vertical First Mode
  hardware **MUST** perform this many elements in parallel
  per instruction. Set to zero to indicate "no hint".
* SVme - REMAP enable bits, indicating which register is to be
  REMAPed: RA, RB, RC, RT and EA are the canonical (typical) register names
  associated with each bit, with RA being the LSB and EA being the MSB.
  See table below for ordering. When `SVme` is zero (0b00000) REMAP
  is **fully disabled and inactive** regardless of the contents of
  `SVSTATE`, `mi0-mi2/mo0-mo1`, or the four `SVSHAPEn` SPRs.
* mi0-mi2/mo0-mo1 - when the corresponding SVme bit is set, these
  indicate the SVSHAPE (0-3) that the corresponding register (RA etc)
  should use.

Programmer's Note: the fact that REMAP is entirely dormant when `SVme` is zero
allows establishment of REMAP context well in advance, followed by utilising `svremap`
at a precise (or the very last) moment. Some implementations may exploit this
to cache (or take some time to prepare caches) in the background whilst other
(unrelated) instructions are being executed. This is particularly important to
bear in mind when using `svindex`, which will require hardware to perform (and
cache) additional GPR reads.

Programmer's Note: when REMAP is activated it becomes necessary on any
context-switch (Interrupt or Function call) to detect (or know in advance)
that REMAP is enabled and to additionally save/restore the four SVSHAPE
SPRs, SVSHAPE0-3. Given that this is expected to be a rare occurrence it was
deemed unreasonable to burden every context-switch or function call with
mandatory save/restore of SVSHAPEs, and consequently it is a *callee*
(and Trap Handler) responsibility. Callees (and Trap Handlers) **MUST**
avoid using any and all SVP64 instructions during the period where state
could be adversely affected. SVP64 purely relies on Scalar instructions,
so Scalar instructions (except the SVP64 Management ones and mtspr and
mfspr) are 100% guaranteed to have zero impact on SVP64 state.

**Max Vector Length (maxvl)** <a name="mvl" />

MAXVECTORLENGTH is the same concept as MVL in RISC-V RVV, except that it
is variable length and may be dynamically set (normally from an immediate
field only). MVL is limited to 7 bits (in the first version of SVP64) and
consequently the maximum number of elements is limited to between 0 and 127.

Programmer's Note: Except by directly using `mtspr` on SVSTATE, which may
result in performance penalties on some hardware implementations, SVSTATE's `maxvl`
field may only be set **statically** as an immediate, by the `setvl` instruction.
It may **NOT** be set dynamically from a register. Compiler writers and assembly
programmers are expected to perform static register file analysis, subdivision,
and allocation and only utilise `setvl`. Direct writing to SVSTATE in order to
"bypass" this Note could, in less-advanced implementations, potentially cause stalling.

**Vector Length (vl)** <a name="vl" />

`setvl` is conceptually similar to, but different from, the Cray, SX Aurora, and
RISC-V RVV equivalents. Similar to RVV, VL is set to be within
the range 0 <= VL <= MVL. Unlike RVV, VL is set **exactly** according to the following:

    VL = (RT|0) = MIN(vlen, MVL)

where 0 <= MVL <= XLEN and vlen may come from an immediate, `RA`, or from the `CTR` SPR,
depending on options set with the `setvl` instruction.

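Because VL is set to exactly `MIN(vlen, MVL)`, a long loop can be strip-mined in the Cray style: each pass asks for everything that remains and processes however many elements it was granted. The sketch below models only that arithmetic; `strip_mine` is a hypothetical helper, not part of the proposal.

```python
def strip_mine(total, mvl):
    """Return (start, vl) pairs covering `total` elements in chunks of at most mvl."""
    chunks = []
    done = 0
    while done < total:
        vl = min(total - done, mvl)   # VL = MIN(vlen, MVL)
        chunks.append((done, vl))
        done += vl
    return chunks


# Usage: ten elements with MVL=4 take three passes, the last one partial.
passes = strip_mine(10, 4)
```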
**SUBVL - Sub Vector Length**

This is a "group by quantity" that effectively asks each iteration
of the hardware loop to load SUBVL elements of width elwidth at a
time. Effectively, SUBVL is like a SIMD multiplier: instead of just 1
operation issued, SUBVL operations are issued.

The main effect of SUBVL is that predication bits are applied per
**group**, rather than by individual element. Legal values are 0 to 3,
representing 1 operation (1 element) through 4 operations (4 elements) respectively.
Elements are best thought of in the context of 3D, Audio and Video: two Left and Right
Channel "elements" or four ARGB "elements", or three XYZ coordinate "elements".

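The per-group predication described above can be pictured as a nested loop. This is an illustrative sketch only: `subvl_loop` is a hypothetical name, the predicate is assumed to be an integer mask whose bit `i` (counting from the least significant bit) governs group `i`, and groups are assumed to occupy consecutive elements.

```python
def subvl_loop(vl, subvl_field, pred_mask, src, op):
    """Apply op to every element of each predicated group of SUBVL elements."""
    subvl = subvl_field + 1            # field values 0..3 mean 1..4 elements
    dst = list(src)
    for group in range(vl):
        if (pred_mask >> group) & 1 == 0:
            continue                   # one predicate bit masks the whole group
        for sub in range(subvl):
            index = group * subvl + sub
            dst[index] = op(src[index])
    return dst


# Usage: two XYZ triples (SUBVL field = 2), only the second triple enabled.
scaled = subvl_loop(2, 2, 0b10, [1, 2, 3, 4, 5, 6], lambda x: x * 10)
```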
**Horizontal Parallelism**

A problem exists for hardware where it may not be able to detect
that a programmer (or compiler) knows of opportunities for parallelism
and lack of overlap between loops.

For hphint, the number chosen must be consistently
executed **every time**. Hardware is not permitted to execute five
computations for one instruction then three on the next.
hphint is a hint from the compiler to hardware that exactly this
many elements may be safely executed in parallel, without hazards
(including Memory accesses).
Interestingly, when hphint is set equal to VL, it is in effect
as if Vertical First mode were not set, because the hardware is
given the option to run through all elements in an instruction.
This is exactly what Horizontal-First is: a for-loop from 0 to VL-1
except that the hardware may *choose* the number of elements.

*Note to programmers: changing VL during the middle of such modes
should be done only with due care and respect for the fact that SVSTATE
has exactly the same peer-level status as a Program Counter.*

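The relationship between Horizontal-First, Vertical-First and hphint can be illustrated by listing the order in which (instruction, element) pairs are visited. This is a conceptual sketch, not a description of any implementation: the function names are hypothetical, a zero hphint ("no hint") is assumed to mean one element at a time, and a final partial batch is assumed to be permitted when VL is not a multiple of hphint.

```python
def horizontal_first(n_insns, vl):
    """Each instruction runs over all elements before the next one starts."""
    return [(insn, el) for insn in range(n_insns) for el in range(vl)]


def vertical_first(n_insns, vl, hphint=0):
    """Each batch of elements passes through every instruction in turn."""
    batch = hphint if hphint > 0 else 1   # assumption: "no hint" means one element
    order = []
    for start in range(0, vl, batch):
        for insn in range(n_insns):
            for el in range(start, min(start + batch, vl)):
                order.append((insn, el))
    return order


# Usage: with hphint equal to VL the two orderings coincide.
same = vertical_first(2, 4, hphint=4) == horizontal_first(2, 4)
```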
-------------

\newpage{}

# SVL-Form

Add the following to Book I, 1.6.1, SVL-Form

```
    |0     |6    |11    |16   |23 |24 |25 |26    |31 |
    | PO   |  RT |   RA | SVi |ms |vs |vf |   XO |Rc |
    | PO   |  RT | /    | SVi |/  |/  |vf |   XO |Rc |
```

* Add `SVL` to `RA (11:15)` Field in Book I, 1.6.2
* Add `SVL` to `RT (6:10)` Field in Book I, 1.6.2
* Add `SVL` to `Rc (31)` Field in Book I, 1.6.2
* Add `SVL` to `XO (26:30)` Field in Book I, 1.6.2

Add the following to Book I, 1.6.2

```
    ms (23)
        Field used in Simple-V to specify whether MVL (maxvl in the SVSTATE SPR)
        is to be set
        Formats: SVL
    vf (25)
        Field used in Simple-V to specify whether "Vertical" Mode is set
        (vfirst in the SVSTATE SPR)
        Formats: SVL
    vs (24)
        Field used in Simple-V to specify whether VL (vl in the SVSTATE SPR) is to be set
        Formats: SVL
    SVi (16:22)
        Simple-V immediate field for setting VL or MVL (vl, maxvl in the SVSTATE SPR)
        Formats: SVL
```

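The SVL-Form layout can be checked by packing the fields into a 32-bit word. The sketch below assumes MSB0 bit numbering with XO occupying bits 26:30 and Rc bit 31, as in the form diagram; `SVL_FIELDS` and `encode_svl` are hypothetical names, and the PO and XO values in the usage line are placeholders because the actual opcode assignments are not given in this section.

```python
# (first, last) MSB0 bit positions within the 32-bit instruction word.
SVL_FIELDS = {
    "PO": (0, 5), "RT": (6, 10), "RA": (11, 15), "SVi": (16, 22),
    "ms": (23, 23), "vs": (24, 24), "vf": (25, 25), "XO": (26, 30),
    "Rc": (31, 31),
}


def encode_svl(**values):
    """Pack SVL-Form fields into a 32-bit word; omitted fields are 0."""
    word = 0
    for name, value in values.items():
        first, last = SVL_FIELDS[name]
        width = last - first + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << (31 - last)
    return word


# Usage: field placement for "setvl 3,4,8,0,1,1" style operands (SVi holds 8-1).
# PO=0 and XO=0 are placeholders, not the real opcode values.
word = encode_svl(PO=0, RT=3, RA=4, SVi=7, ms=1, vs=1, vf=0, XO=0, Rc=0)
```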

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| SVL  | I    | #    | 3.0B    | svstep   | Vertical-First Stepping and status reporting |

376 | SVL | I | # | 3.0B | svstep | Vertical-First Stepping and status reporting |
377