1 # RFC ls008 SVP64 Management instructions
2
3 **URLs**:
4
5 * <https://libre-soc.org/openpower/sv/>
6 * <https://libre-soc.org/openpower/sv/rfc/ls008/>
7 * <https://bugs.libre-soc.org/show_bug.cgi?id=1040>
8 * <https://git.openpower.foundation/isa/PowerISA/issues/87>
9
10 **Severity**: Major
11
12 **Status**: New
13
14 **Date**: 24 Mar 2023
15
16 **Target**: v3.2B
17
18 **Source**: v3.0B
19
20 **Books and Section affected**:
21
22 ```
23 Book I, new Scalar Chapter. (Or, new Book on "Zero-Overhead Loop Subsystem")
24 Appendix E Power ISA sorted by opcode
25 Appendix F Power ISA sorted by version
26 Appendix G Power ISA sorted by Compliancy Subset
27 Appendix H Power ISA sorted by mnemonic
28 ```
29
30 **Summary**
31
32 ```
33 setvl - Cray-style "Set Vector Length" instruction
34 svstep - Vertical-First Mode explicit Step and Status
35 ```
36
37 **Submitter**: Luke Leighton (Libre-SOC)
38
39 **Requester**: Libre-SOC
40
41 **Impact on processor**:
42
43 ```
44 Addition of two new "Zero-Overhead-Loop-Control" DSP-style Vector-style
45 Management Instructions which can be implemented extremely efficiently
46 and effectively by inserting an additional phase between Decode and Issue.
47 More complex designs are NOT adversely impacted and in fact greatly benefit
48 ```
49
50 **Impact on software**:
51
52 ```
53 Requires support for new instructions in assembler, debuggers,
54 and related tools.
55 ```
56
57 **Keywords**:
58
59 ```
60 Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control (ZOLC),
61 Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model,
62 Digital Signal Processing (DSP)
63 ```
64
65 **Motivation**
66
67 Power ISA is synonymous with Supercomputing, and the early Supercomputers
68 (ETA-10, ILLIAC-IV, CDC200, Cray) had Vectorisation. It is therefore anomalous
69 that Power ISA does not have Scalable Vectors. This presents the opportunity to
70 modernise Power ISA, keeping it at the top of Supercomputing.
71
72 **Notes and Observations**:
73
74 1. SVP64 is very much designed for ultra-light-weight Embedded use-cases all the
75 way up to moving the bar of Supercomputing orders of magnitude above its present
76 perception, whilst retaining at all times Sequential Programming Execution.
77 2. This proposal is the **base** for further Extensions. These include
78 extending SVP64 onto the Scalar VSX instructions (with a **LONG TERM** view in 10+ years
79 to deprecating the PackedSIMD aspects of VSX), to be discussed at a later
80 time, the potential for extending VSX registers to 128 or beyond, and Arithmetic
81 operations to a runtime-selectable choice of 128-bit, 256-bit, 512-bit or 1024-bit.
82 3. Massive reductions in instruction count of between 2x and 20x have been demonstrated
83 with SVP64, which is far beyond anything ever achieved by any *general-purpose*
84 ISA Extension added to any ISA in the history of Computing.
85
86 **Changes**
87
88 Add the following entries to:
89
90 * Section 1.3.2 Notation
91 * the Appendices of Book I
92 * Instructions of Book I as a new Section
93 * SVL-Form of Book I Section 1.6.1.6 and 1.6.2
94
95 ----------------
96
97 \newpage{}
98
99 # Notation, Section 1.3.2
100
101 When register operands (`RA, RT, BF`) are prefixed by a single underscore
102 (`_RT, _RA, _BF`) the variable contains the contents of the instruction field,
103 not the contents of the Register File referenced *by* that field. Example:
104 `_RT` contains the contents of bits 6 thru 10. The relationship
105 `RT = GPR(_RT)` is thus always true. Uses include making alternative
106 decisions within an instruction based on whether the operand field
107 is zero or non-zero.
108
109 ----------------
110
111 \newpage{}
112
113 [[!inline pages="openpower/sv/svstep" raw=yes ]]
114 [[!inline pages="openpower/sv/setvl" raw=yes ]]
115
116 # SVSTATE SPR
117
118 The format of the SVSTATE SPR is as follows:
119
120 | Field | Name | Description |
121 | ----- | -------- | --------------------- |
122 | 0:6 | maxvl | Max Vector Length |
123 | 7:13 | vl | Vector Length |
124 | 14:20 | srcstep | for srcstep = 0..VL-1 |
125 | 21:27 | dststep | for dststep = 0..VL-1 |
126 | 28:29 | dsubstep | for substep = 0..SUBVL-1 |
127 | 30:31 | ssubstep | for substep = 0..SUBVL-1 |
128 | 32:33 | mi0 | REMAP RA/FRA/BFA SVSHAPE0-3 |
129 | 34:35 | mi1 | REMAP RB/FRB/BFB SVSHAPE0-3 |
130 | 36:37 | mi2 | REMAP RC/FRT SVSHAPE0-3 |
131 | 38:39 | mo0 | REMAP RT/FRT/BF SVSHAPE0-3 |
132 | 40:41 | mo1 | REMAP EA/RS/FRS SVSHAPE0-3 |
133 | 42:46 | SVme | REMAP enable (RA-RT) |
134 | 47:52 | rsvd | reserved |
135 | 53 | pack | PACK (srcstep reorder) |
136 | 54 | unpack | UNPACK (dststep order) |
137 | 55:61 | hphint | Horizontal Hint |
138 | 62 | RMpst | REMAP persistence |
139 | 63 | vfirst | Vertical First mode |
140
141 Notes:
142
143 * The entries are truncated to be within range. Attempts to set VL to
144 greater than MAXVL will truncate VL.
145 * Setting srcstep, dststep to 64 or greater, or VL or MVL to greater
146 than 64 is reserved and will cause an illegal instruction trap.
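
As a reading aid, the table above can be turned into a small field accessor. The following is a minimal sketch, assuming the bit ranges use the usual Power ISA MSB0 numbering over a 64-bit SPR; `SVSTATE_FIELDS`, `get_field` and `set_field` are hypothetical names chosen for illustration and are not part of the specification.

```python
# Field positions copied from the SVSTATE table, in MSB0 numbering
# (bit 0 is the most significant bit of the 64-bit SPR).
SVSTATE_FIELDS = {
    "maxvl": (0, 6), "vl": (7, 13),
    "srcstep": (14, 20), "dststep": (21, 27),
    "dsubstep": (28, 29), "ssubstep": (30, 31),
    "mi0": (32, 33), "mi1": (34, 35), "mi2": (36, 37),
    "mo0": (38, 39), "mo1": (40, 41), "SVme": (42, 46),
    "pack": (53, 53), "unpack": (54, 54), "hphint": (55, 61),
    "RMpst": (62, 62), "vfirst": (63, 63),
}

def _shift_and_mask(name):
    start, end = SVSTATE_FIELDS[name]
    return 63 - end, (1 << (end - start + 1)) - 1

def get_field(svstate, name):
    """Read one named field out of a 64-bit SVSTATE value."""
    shift, mask = _shift_and_mask(name)
    return (svstate >> shift) & mask

def set_field(svstate, name, value):
    """Return a copy of svstate with one named field replaced."""
    shift, mask = _shift_and_mask(name)
    if not 0 <= value <= mask:
        raise ValueError(f"{name} value {value} does not fit in the field")
    return (svstate & ~(mask << shift)) | (value << shift)
```

For example, `set_field(set_field(0, "maxvl", 8), "vl", 5)` builds a value with MVL=8 and VL=5. The sketch only models the bit layout: it does not model truncation of VL to MAXVL or the reserved-value trap described in the Notes.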
147
148 **SVSTATE Fields**
149
150 SVSTATE is a standard SPR that (if REMAP is not activated) contains sufficient
151 self-contained information for a full context save/restore.
152 SVSTATE contains (and permits setting of):
153
154 * MVL (the Maximum Vector Length) - declares (statically) how
155 much of a regfile is to be reserved for Vector elements
156 * VL - Vector Length
157 * dststep - the destination element offset of the current parallel
158 instruction being executed
159 * srcstep - for twin-predication, the source element offset as well.
160 * ssubstep - the source subvector element offset of the current
161 parallel instruction being executed
162 * dsubstep - the destination subvector element offset of the current
163 parallel instruction being executed
164 * vfirst - Vertical First mode. srcstep, dststep and substep
165 **do not advance** unless explicitly requested to do so with
166 the `svstep` instruction
167 * RMpst - REMAP persistence. REMAP will apply only to the following
168 instruction unless this bit is set, in which case REMAP "persists".
169 Reset (cleared) on use of the `setvl` instruction if used to
170 alter VL or MVL.
171 * Pack - if set then srcstep/substep VL/SUBVL loop-ordering is inverted.
172 * UnPack - if set then dststep/substep VL/SUBVL loop-ordering is inverted.
173 * hphint - Horizontal Parallelism Hint. Indicates that
174 no Hazards exist between groups of elements in sequential multiples of this number
175 (before REMAP). By definition: elements for which `FLOOR(srcstep/hphint)` is
176 equal *before REMAP* are in the same parallelism "group". In Vertical First Mode
177 hardware **MUST ONLY** process elements in the same group, and must stop
178 Horizontal Issue at the last element of a given group. Set to zero to indicate "no hint".
179 * SVme - REMAP enable bits, indicating which register is to be
180 REMAPed: RA, RB, RC, RT and EA are the canonical (typical) register names
181 associated with each bit, with RA being the LSB and EA being the MSB.
182 See table below for ordering. When `SVme` is zero (0b00000) REMAP
183 is **fully disabled and inactive** regardless of the contents of
184 `SVSTATE`, `mi0-mi2/mo0-mo1`, or the four `SVSHAPEn` SPRs
185 * mi0-mi2/mo0-mo1 - these select the SVSHAPE (0-3) that the
186 corresponding register (RA etc) should use. Each selector only takes
187 effect when the register's corresponding SVme bit is set
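
The relationship between `SVme` and the five selector fields can be sketched as follows. This is an illustrative model only: `REMAP_SLOTS` and `active_remaps` are hypothetical names, the bit order (RA as LSB, EA as MSB) is taken from the SVme description above, and the pairing of each bit with `mi0-mi2/mo0-mo1` is assumed from the rows of the SVSTATE table.

```python
# Canonical register names in SVme bit order: RA is the LSB, EA the MSB.
REMAP_SLOTS = ["RA", "RB", "RC", "RT", "EA"]

def active_remaps(svme, mi0, mi1, mi2, mo0, mo1):
    """Map each REMAP-enabled register name to its SVSHAPE index (0-3)."""
    selectors = [mi0, mi1, mi2, mo0, mo1]
    active = {}
    for bit, (reg, shape) in enumerate(zip(REMAP_SLOTS, selectors)):
        if (svme >> bit) & 1:        # selector is ignored unless enabled
            active[reg] = shape
    return active
```

With `svme=0b00101` only RA and RC are REMAPed, and with `svme=0` the result is empty whatever the selectors contain, matching the statement that REMAP is fully disabled when `SVme` is zero.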
188
189 Programmer's Note: the fact that REMAP is entirely dormant when `SVme` is zero
190 allows establishment of REMAP context well in advance, followed by utilising `svremap`
191 at a precise (or the very last) moment. Some implementations may exploit this
192 to cache (or take some time to prepare caches) in the background whilst other
193 (unrelated) instructions are being executed. This is particularly important to
194 bear in mind when using `svindex` which will require hardware to perform (and
195 cache) additional GPR reads.
196
197 Programmer's Note: when REMAP is activated it becomes necessary on any
198 context-switch (Interrupt or Function call) to detect (or know in advance)
199 that REMAP is enabled and to additionally save/restore the four SVSHAPE
200 SPRs, SVSHAPE0-3. Given that this is expected to be a rare occurrence it was
201 deemed unreasonable to burden every context-switch or function call with
202 mandatory save/restore of SVSHAPEs, and consequently it is a *callee*
203 (and Trap Handler) responsibility. Callees (and Trap Handlers) **MUST**
204 avoid using any and all SVP64 instructions during the period where state
205 could be adversely affected. SVP64 purely relies on Scalar instructions,
206 so Scalar instructions (except the SVP64 Management ones and mtspr and
207 mfspr) are 100% guaranteed to have zero impact on SVP64 state.
208
209 **Max Vector Length (maxvl)** <a name="mvl" />
210
211 MAXVECTORLENGTH is the same concept as MVL in RISC-V RVV, except that it
212 is variable length and may be dynamically set (normally from an immediate
213 field only). MVL is limited to 7 bits
214 (in the first version of SVP64) and consequently the maximum number of
215 elements is limited to between 0 and 127.
216
217 Programmer's Note: Except by directly using `mtspr` on SVSTATE, which may
218 result in performance penalties on some hardware implementations, SVSTATE's `maxvl`
219 field may only be set **statically** as an immediate, by the `setvl` instruction.
220 It may **NOT** be set dynamically from a register. Compiler writers and assembly
221 programmers are expected to perform static register file analysis, subdivision,
222 and allocation and only utilise `setvl`. Direct writing to SVSTATE in order to
223 "bypass" this Note could, in less-advanced implementations, potentially cause stalling,
224 particularly if SVP64 instructions are issued directly after the `mtspr` to SVSTATE.
225
226 **Vector Length (vl)** <a name="vl" />
227
228 The actual Vector length, the number of elements in a "Vector", `SVSTATE.vl` may be set
229 entirely dynamically at runtime from a number of sources. `setvl` is the primary
230 instruction for setting Vector Length.
231 `setvl` is conceptually similar to, but different from, the Cray, SX Aurora, and
232 RISC-V RVV equivalents. Similar to RVV, VL is set to be within
233 the range 0 <= VL <= MVL. Unlike RVV, VL is set **exactly** according to the following:
234
235 VL = (RT|0) = MIN(vlen, MVL)
236
237 where 0 <= MVL <= 127 and vlen may come from an immediate, `RA`, or from the `CTR` SPR,
238 depending on options selected with the `setvl` instruction.
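
The `MIN(vlen, MVL)` rule is what makes a conventional strip-mined loop work. The sketch below models only that rule: the function names `setvl` and `strip_mine` are illustrative Python stand-ins, and the `RT`, `Rc`, `CTR` and immediate-selection options of the real instruction are deliberately left out.

```python
def setvl(vlen, mvl):
    """Model only the VL = MIN(vlen, MVL) rule of the setvl instruction."""
    if not 0 <= mvl <= 127:
        raise ValueError("MVL must be in the range 0..127")
    return min(vlen, mvl)

def strip_mine(n, mvl):
    """Return the VL chosen on each pass of a loop over n elements."""
    if mvl == 0:
        raise ValueError("MVL of zero would never make progress")
    lengths = []
    remaining = n
    while remaining > 0:
        vl = setvl(remaining, mvl)   # VL is set exactly, never rounded
        lengths.append(vl)
        remaining -= vl
    return lengths
```

With `n=10` and `mvl=4` the loop runs three passes of 4, 4 and 2 elements. Because VL is set exactly, the final pass needs no separate scalar clean-up code.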
239
240 Programmer's Note: conceptual understanding of Cray-style Vectors is far beyond the scope
241 of the Power ISA Technical Reference. Guidance on the 50-year-old Cray Vector paradigm is
242 best sought elsewhere: good studies include Academic Courses given on the 1970s
243 Cray Supercomputers over at least the past three decades.
244
245 **SUBVL - Sub Vector Length**
246
247 This is a "group by quantity" that effectively asks each iteration
248 of the hardware loop to load SUBVL elements of width elwidth at a
249 time. Effectively, SUBVL is like a SIMD multiplier: instead of just 1
250 operation issued, SUBVL operations are issued.
251
252 The main effect of SUBVL is that predication bits are applied per
253 **group**, rather than by individual element. Legal values are 0 to 3,
254 representing 1 operation (1 element) thru 4 operations (4 elements) respectively.
255 Elements are best thought of in the context of 3D, Audio and Video: two Left and Right
256 Channel "elements" or four ARGB "elements", or three XYZ coordinate "elements".
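
The per-group effect of predication can be illustrated with a short sketch. Several details here are assumptions made for illustration rather than statements from the specification: the name `subvector_add`, the flat list layout in which each group occupies SUBVL consecutive entries, predicate bit `i` (counting from the least significant bit) controlling group `i`, and masked-out groups keeping the value from `a`.

```python
def subvector_add(a, b, vl, subvl_field, predicate):
    """Add b to a in groups of SUBVL elements, one predicate bit per group."""
    subvl = subvl_field + 1          # field values 0..3 mean 1..4 elements
    result = list(a)                 # masked-out groups keep a's values
    for step in range(vl):
        if not (predicate >> step) & 1:
            continue                 # the whole group is skipped together
        for sub in range(subvl):
            idx = step * subvl + sub
            result[idx] = a[idx] + b[idx]
    return result
```

With two XYZ coordinates (`vl=2`, `subvl_field=2`) and a predicate of `0b10`, the first coordinate is left untouched and all three components of the second are updated together.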
257
258 `subvl` is again primarily set by the `setvl` instruction. Not to be confused
259 with `hphint`.
260
261 Directly related to `subvl` are the `pack` and `unpack` Mode bits of `SVSTATE`.
262 See `svstep` instruction for how to set Pack and Unpack Modes.
263
264
265 **Horizontal Parallelism**
266
267 Hardware may not be able to detect opportunities for parallelism,
268 or a lack of overlap between loops, that the programmer (or compiler)
269 already knows about.
270
271 For hphint, the number chosen must be consistently
272 executed **every time**. Hardware is not permitted to execute five
273 computations for one instruction then three on the next.
274 hphint is a hint from the compiler to hardware that exactly this
275 many elements may be safely executed in parallel, without hazards
276 (including Memory accesses).
277 Interestingly, when hphint is set equal to VL, it is in effect
278 as if Vertical First mode were not set, because the hardware is
279 given the option to run through all elements in an instruction.
280 This is exactly what Horizontal-First is: a for-loop from 0 to VL-1
281 except that the hardware may *choose* the number of elements.
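
The grouping rule given for `hphint` in the SVSTATE Fields list (elements with equal `FLOOR(srcstep/hphint)` belong to the same group) can be sketched as below. `hphint_groups` is a hypothetical helper name, and returning `None` for a hint of zero is simply this sketch's way of representing "no hint".

```python
def hphint_groups(vl, hphint):
    """Partition srcstep values 0..vl-1 into hazard-free parallelism groups."""
    if hphint == 0:
        return None                  # zero means "no hint"
    groups = {}
    for srcstep in range(vl):
        groups.setdefault(srcstep // hphint, []).append(srcstep)
    return [groups[key] for key in sorted(groups)]
```

For `vl=10` and `hphint=4` this yields groups of 4, 4 and 2 elements. When `hphint` equals VL there is a single group covering every element, which is the case the text describes as behaving like Horizontal-First.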
282
283 *Note to programmers: changing VL during the middle of such modes
284 should be done only with due care and respect for the fact that SVSTATE
285 has exactly the same peer-level status as a Program Counter.*
286
287 -------------
288
289 \newpage{}
290
291 # SVL-Form
292
293 Add the following to Book I, 1.6.1, SVL-Form
294
295 ```
296 |0 |6 |11 |16 |23 |24 |25 |26 |31 |
297 | PO | RT | RA | SVi |ms |vs |vf | XO |Rc |
298 | PO | RT | / | SVi |/ |/ |vf | XO |Rc |
299 ```
300
301 * Add `SVL` to `RA (11:15)` Field in Book I, 1.6.2
302 * Add `SVL` to `RT (6:10)` Field in Book I, 1.6.2
303 * Add `SVL` to `Rc (31)` Field in Book I, 1.6.2
304 * Add `SVL` to `XO (26:30)` Field in Book I, 1.6.2
305
306 Add the following to Book I, 1.6.2
307
308 ```
309 ms (23)
310 Field used in Simple-V to specify whether MVL (maxvl in the SVSTATE SPR)
311 is to be set
312 Formats: SVL
313 vf (25)
314 Field used in Simple-V to specify whether "Vertical" Mode is set
315 (vfirst in the SVSTATE SPR)
316 Formats: SVL
317 vs (24)
318 Field used in Simple-V to specify whether VL (vl in the SVSTATE SPR) is to be set
319 Formats: SVL
320 SVi (16:22)
321 Simple-V immediate field used by setvl for setting VL or MVL
322 (vl, maxvl in the SVSTATE SPR)
323 and used as a "Mode of Operation" selector in svstep
324 Formats: SVL
325 ```
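
For tool writers, the SVL-Form layout can be expressed as a small encoder and decoder. This is a sketch under stated assumptions: the field boundaries are read off the column headers of the layout diagram above (so XO is taken to occupy bits 26:30, with Rc in bit 31), bit numbering is MSB0, and `SVL_FIELDS`, `encode_svl` and `decode_svl` are hypothetical names. No real opcode values are implied.

```python
# (start, end) bit positions in MSB0 numbering, from the SVL-Form diagram.
SVL_FIELDS = {
    "PO": (0, 5), "RT": (6, 10), "RA": (11, 15), "SVi": (16, 22),
    "ms": (23, 23), "vs": (24, 24), "vf": (25, 25),
    "XO": (26, 30), "Rc": (31, 31),
}

def encode_svl(fields):
    """Pack a dict of SVL-Form field values into a 32-bit word."""
    insn = 0
    for name, value in fields.items():
        start, end = SVL_FIELDS[name]
        width = end - start + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        insn |= value << (31 - end)
    return insn

def decode_svl(insn):
    """Split a 32-bit word into its SVL-Form fields."""
    fields = {}
    for name, (start, end) in SVL_FIELDS.items():
        width = end - start + 1
        fields[name] = (insn >> (31 - end)) & ((1 << width) - 1)
    return fields
```

Encoding and then decoding a set of field values returns the same values, which is a convenient self-check when adding the form to an assembler or disassembler.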
326
327 # Appendices
328
329 Appendix E Power ISA sorted by opcode
330 Appendix F Power ISA sorted by version
331 Appendix G Power ISA sorted by Compliancy Subset
332 Appendix H Power ISA sorted by mnemonic
333
334 | Form | Book | Page | Version | mnemonic | Description |
335 |------|------|------|---------|----------|-------------|
336 | SVL | I | # | 3.0B | svstep | Vertical-First Stepping and status reporting |
337 | SVL | I | # | 3.0B | setvl | Cray-like establishment of Looping (Vector) context |
338
339 [[!tag opf_rfc]]