# RFC ls008 SVP64 Management instructions

**URLs**:

* <https://libre-soc.org/openpower/sv/>
* <https://libre-soc.org/openpower/sv/rfc/ls008/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1040>
* <https://git.openpower.foundation/isa/PowerISA/issues/123>

**Severity**: Major

**Status**: New

**Date**: 24 Mar 2023

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
Book I, new Scalar Chapter. (Or, new Book on "Zero-Overhead Loop Subsystem")
Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
setvl - Cray-style "Set Vector Length" instruction
svstep - Vertical-First Mode explicit Step and Status
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of two new "Zero-Overhead-Loop-Control" DSP-style Vector-style
Management Instructions which can be implemented extremely efficiently
and effectively by inserting an additional phase between Decode and Issue.
More complex designs are NOT adversely impacted and in fact greatly benefit.
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers,
and related tools.
```

**Keywords**:

```
Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control (ZOLC),
Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model,
Digital Signal Processing (DSP)
```

**Motivation**

Power ISA is synonymous with Supercomputing, and the early Supercomputers
(ETA-10, ILLIAC-IV, CDC200, Cray) had Vectorisation. It is therefore anomalous
that Power ISA does not have Scalable Vectors. This presents the opportunity to
modernise Power ISA, keeping it at the top of Supercomputing.

**Notes and Observations**:

1. SVP64 is designed to scale from ultra-light-weight Embedded use-cases all the
way up to moving the bar of Supercomputing orders of magnitude above its present
perception, whilst retaining Sequential Programming Execution at all times.
2. This proposal is the **base** for further Extensions, to be discussed at a later
time. These include extending SVP64 onto the Scalar VSX instructions (with a
**LONG TERM** view, 10+ years, to deprecating the PackedSIMD aspects of VSX),
the potential for extending VSX registers to 128 or beyond, and extending
Arithmetic operations to a runtime-selectable choice of 128-bit, 256-bit,
512-bit or 1024-bit.
3. Massive reductions in instruction count of between 2x and 20x have been demonstrated
with SVP64, which is far beyond anything ever achieved by any *general-purpose*
ISA Extension added to any ISA in the history of Computing.

**Changes**

Add the following entries to:

* Section 1.3.2 Notation
* the Appendices of Book I
* Instructions of Book I as a new Section
* SVL-Form of Book I Section 1.6.1.6 and 1.6.2

----------------

\newpage{}

# Notation, Section 1.3.2

When destination register operands (`RT, RS`) are prefixed by a single
underscore (`_RT, _RS`), the variable refers to the contents of the
instruction field itself (the register number encoded in the instruction)
rather than to the contents of the register that the field designates.
This avoids confusion in pseudocode when a destination register is
assigned (`RT <- x`) but earlier it was the operand bits that were
checked (`if RT = 0`): such a check is written `if _RT = 0`.

----------------

\newpage{}

[[!inline pages="openpower/sv/svstep" raw=yes ]]
[[!inline pages="openpower/sv/setvl" raw=yes ]]

# SVSTATE SPR

The format of the SVSTATE SPR is as follows:

| Field | Name     | Description                 |
| ----- | -------- | --------------------------- |
| 0:6   | maxvl    | Max Vector Length           |
| 7:13  | vl       | Vector Length               |
| 14:20 | srcstep  | for srcstep = 0..VL-1       |
| 21:27 | dststep  | for dststep = 0..VL-1       |
| 28:29 | dsubstep | for substep = 0..SUBVL-1    |
| 30:31 | ssubstep | for substep = 0..SUBVL-1    |
| 32:33 | mi0      | REMAP RA/FRA/BFA SVSHAPE0-3 |
| 34:35 | mi1      | REMAP RB/FRB/BFB SVSHAPE0-3 |
| 36:37 | mi2      | REMAP RC/FRT SVSHAPE0-3     |
| 38:39 | mo0      | REMAP RT/FRT/BF SVSHAPE0-3  |
| 40:41 | mo1      | REMAP EA/RS/FRS SVSHAPE0-3  |
| 42:46 | SVme     | REMAP enable (RA-RT)        |
| 47:52 | rsvd     | reserved                    |
| 53    | pack     | PACK (srcstep reorder)      |
| 54    | unpack   | UNPACK (dststep reorder)    |
| 55:61 | hphint   | Horizontal Hint             |
| 62    | RMpst    | REMAP persistence           |
| 63    | vfirst   | Vertical First mode         |

Notes:

* The entries are truncated to be within range. Attempts to set VL to
greater than MAXVL will truncate VL.
* Setting srcstep, dststep to 64 or greater, or VL or MVL to greater
than 64 is reserved and will cause an illegal instruction trap.
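The bit ranges in the table use the MSB0 convention of Power ISA: bit 0 is the most significant bit of the 64-bit SPR and bit 63 the least significant. As a minimal sketch, assuming exactly the layout in the table, the Python below packs and unpacks SVSTATE fields. The names `SVSTATE_FIELDS`, `get_field`, `set_field`, `encode_svstate` and `decode_svstate` are hypothetical illustrations, not part of this specification or of any simulator API. The sketch only rejects values too wide for their field; it does not model the VL/MAXVL truncation or the reserved-value traps described in the Notes.

```python
# Hypothetical helpers: SVSTATE field layout copied from the table above.
# Bit positions use MSB0 numbering: bit 0 is the MSB, bit 63 the LSB.
SVSTATE_FIELDS = {
    "maxvl": (0, 6), "vl": (7, 13), "srcstep": (14, 20), "dststep": (21, 27),
    "dsubstep": (28, 29), "ssubstep": (30, 31),
    "mi0": (32, 33), "mi1": (34, 35), "mi2": (36, 37),
    "mo0": (38, 39), "mo1": (40, 41), "SVme": (42, 46), "rsvd": (47, 52),
    "pack": (53, 53), "unpack": (54, 54), "hphint": (55, 61),
    "RMpst": (62, 62), "vfirst": (63, 63),
}


def get_field(value: int, start: int, end: int) -> int:
    """Extract MSB0 bits start..end (inclusive) of a 64-bit value."""
    width = end - start + 1
    return (value >> (63 - end)) & ((1 << width) - 1)


def set_field(value: int, start: int, end: int, field: int) -> int:
    """Return value with MSB0 bits start..end replaced by field."""
    width = end - start + 1
    if not 0 <= field < (1 << width):
        raise ValueError(f"{field} does not fit in {width} bits")
    shift = 63 - end
    mask = ((1 << width) - 1) << shift
    return (value & ~mask) | (field << shift)


def encode_svstate(**fields: int) -> int:
    """Build an SVSTATE value from named fields; unnamed fields are zero."""
    svstate = 0
    for name, field in fields.items():
        start, end = SVSTATE_FIELDS[name]
        svstate = set_field(svstate, start, end, field)
    return svstate


def decode_svstate(svstate: int) -> dict:
    """Split a 64-bit SVSTATE value into its named fields."""
    return {name: get_field(svstate, start, end)
            for name, (start, end) in SVSTATE_FIELDS.items()}


s = encode_svstate(maxvl=8, vl=5, vfirst=1)
print(hex(s))                     # 0x1014000000000001
print(decode_svstate(s)["vl"])    # 5
```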

**SVSTATE Fields**

SVSTATE is a standard SPR that (if REMAP is not activated) contains sufficient
self-contained information for a full context save/restore.
SVSTATE contains (and permits setting of):

* MVL (the Maximum Vector Length) - declares (statically) how
much of a regfile is to be reserved for Vector elements
* VL - Vector Length
* dststep - the destination element offset of the current parallel
instruction being executed
* srcstep - for twin-predication, the source element offset as well.
* ssubstep - the source subvector element offset of the current
parallel instruction being executed
* dsubstep - the destination subvector element offset of the current
parallel instruction being executed
* vfirst - Vertical First mode. srcstep, dststep and substep
**do not advance** unless explicitly requested to do so with
the `svstep` instruction
* RMpst - REMAP persistence. REMAP will apply only to the following
instruction unless this bit is set, in which case REMAP "persists".
Reset (cleared) on use of the `setvl` instruction if used to
alter VL or MVL.
* Pack - if set then srcstep/substep VL/SUBVL loop-ordering is inverted.
* UnPack - if set then dststep/substep VL/SUBVL loop-ordering is inverted.
* hphint - Horizontal Parallelism Hint. Indicates that
no Hazards exist between groups of elements in sequential multiples of this number
(before REMAP). By definition: elements for which `FLOOR(srcstep/hphint)` is
equal *before REMAP* are in the same parallelism "group". In Vertical First Mode
hardware **MUST ONLY** process elements in the same group, and must stop
Horizontal Issue at the last element of a given group. Set to zero to indicate "no hint".
* SVme - REMAP enable bits, indicating which register is to be
REMAPed: RA, RB, RC, RT and EA are the canonical (typical) register names
associated with each bit, with RA being the LSB and EA being the MSB.
See the mi0-mi2/mo0-mo1 rows of the SVSTATE table above for ordering. When `SVme` is zero (0b00000) REMAP
is **fully disabled and inactive** regardless of the contents of
`SVSTATE`, `mi0-mi2/mo0-mo1`, or the four `SVSHAPEn` SPRs.
* mi0-mi2/mo0-mo1 - these indicate the SVSHAPE (0-3) that the corresponding
register (RA etc) should use, as long as the register's corresponding SVme bit is set.

Programmer's Note: the fact that REMAP is entirely dormant when `SVme` is zero
allows establishment of REMAP context well in advance, followed by utilising `svremap`
at a precise (or the very last) moment. Some implementations may exploit this
to cache (or take some time to prepare caches) in the background whilst other
(unrelated) instructions are being executed. This is particularly important to
bear in mind when using `svindex`, which will require hardware to perform (and
cache) additional GPR reads.

Programmer's Note: when REMAP is activated it becomes necessary on any
context-switch (Interrupt or Function call) to detect (or know in advance)
that REMAP is enabled and to additionally save/restore the four SVSHAPE
SPRs, SVSHAPE0-3. Given that this is expected to be a rare occurrence it was
deemed unreasonable to burden every context-switch or function call with
mandatory save/restore of SVSHAPEs, and consequently it is a *callee*
(and Trap Handler) responsibility. Callees (and Trap Handlers) **MUST**
avoid using any SVP64 instructions during any period in which that state
could be adversely affected. Because SVP64 relies purely on Scalar instructions,
Scalar instructions (other than the SVP64 Management instructions, `mtspr` and
`mfspr`) are 100% guaranteed to have zero impact on SVP64 state.
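The save/restore decision described in the note above might be sketched as follows. This is a hypothetical illustration, not a required Trap Handler algorithm: `remap_enabled` and `sprs_to_save` are invented names, and the sketch assumes the only test needed is whether `SVme` (SVSTATE bits 42:46, MSB0) is non-zero, following the statement that REMAP is fully inactive when `SVme` is zero. It also assumes SVSTATE itself is always saved.

```python
# Hypothetical sketch: which SV-related SPRs a callee or Trap Handler saves.
SVME_SHIFT = 63 - 46      # SVme occupies MSB0 bits 42:46 of the 64-bit SVSTATE
SVME_MASK = 0b11111       # five enable bits: RA (LSB) ... EA (MSB)


def remap_enabled(svstate: int) -> bool:
    """REMAP is fully inactive when SVme is zero, whatever mi0-mi2/mo0-mo1 hold."""
    return ((svstate >> SVME_SHIFT) & SVME_MASK) != 0


def sprs_to_save(svstate: int) -> list:
    """SVSTATE always; the four SVSHAPEs only when REMAP is enabled (assumption)."""
    names = ["SVSTATE"]
    if remap_enabled(svstate):
        names += [f"SVSHAPE{i}" for i in range(4)]
    return names


print(sprs_to_save(0))  # ['SVSTATE']
```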

**Max Vector Length (maxvl)** <a name="mvl" />

MAXVECTORLENGTH is the same concept as MVL in RISC-V RVV, except that it
is variable length and may be changed at runtime (normally from an immediate
field only). MVL is limited to 7 bits
(in the first version of SVP64) and consequently the maximum number of
elements is limited to between 0 and 127.

Programmer's Note: Except by directly using `mtspr` on SVSTATE, which may
result in performance penalties on some hardware implementations, SVSTATE's `maxvl`
field may only be set **statically** as an immediate, by the `setvl` instruction.
It may **NOT** be set dynamically from a register. Compiler writers and assembly
programmers are expected to perform static register file analysis, subdivision,
and allocation and only utilise `setvl`. Direct writing to SVSTATE in order to
"bypass" this Note could, in less-advanced implementations, potentially cause stalling,
particularly if SVP64 instructions are issued directly after the `mtspr` to SVSTATE.

**Vector Length (vl)** <a name="vl" />

The actual Vector Length, `SVSTATE.vl` (the number of elements in a "Vector"), may be set
entirely dynamically at runtime from a number of sources. `setvl` is the primary
instruction for setting Vector Length.
`setvl` is conceptually similar to, but different from, the Cray, SX Aurora, and RISC-V RVV
equivalents. Similar to RVV, VL is set to be within
the range 0 <= VL <= MVL. Unlike RVV, VL is set **exactly** according to the following:

    VL = (RT|0) = MIN(vlen, MVL)

where 0 <= MVL <= 127 and vlen may come from an immediate, `RA`, or from the `CTR` SPR,
depending on options selected with the `setvl` instruction.
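A minimal sketch of the rule above, assuming `vlen` has already been selected from the immediate, `RA` or `CTR` (that selection belongs to the `setvl` instruction definition and is not modelled here). It reads `(RT|0)` in the sense of the `_RT` notation: the new VL is copied into RT only when the RT instruction field is non-zero. This reading, the function name `set_vector_length` and the `gpr` list standing in for the register file are all assumptions for illustration.

```python
# Hypothetical sketch of "VL = (RT|0) = MIN(vlen, MVL)".
MAX_MVL = 127  # MVL is a 7-bit field in the first version of SVP64


def set_vector_length(gpr: list, rt_field: int, vlen: int, mvl: int) -> int:
    """Return the new VL and, if the RT field is non-zero, copy it into GPR(RT)."""
    if not 0 <= mvl <= MAX_MVL:
        raise ValueError("MVL must be within 0..127")
    if vlen < 0:
        raise ValueError("vlen is an unsigned quantity")
    vl = min(vlen, mvl)      # VL is set exactly: never above MVL
    if rt_field != 0:        # _RT: the field value, not the register contents
        gpr[rt_field] = vl   # assumed reading of (RT|0): field 0 writes no register
    return vl


gpr = [0] * 32
print(set_vector_length(gpr, rt_field=3, vlen=200, mvl=64), gpr[3])  # 64 64
print(set_vector_length(gpr, rt_field=0, vlen=10, mvl=64), gpr[0])   # 10 0
```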

Programmer's Note: conceptual understanding of Cray-style Vectors is far beyond the scope
of the Power ISA Technical Reference. Guidance on the 50-year-old Cray Vector paradigm is
best sought elsewhere: good studies include Academic Courses given on the 1970s
Cray Supercomputers over at least the past three decades.

**SUBVL - Sub Vector Length**

This is a "group by quantity" that effectively asks each iteration
of the hardware loop to load SUBVL elements of width elwidth at a
time. Effectively, SUBVL is like a SIMD multiplier: instead of just 1
operation issued, SUBVL operations are issued.

The main effect of SUBVL is that predication bits are applied per
**group**, rather than by individual element. Legal values are 0 to 3,
representing 1 operation (1 element) through 4 operations (4 elements) respectively.
Elements are best thought of in the context of 3D, Audio and Video: two Left and Right
Channel "elements", four ARGB "elements", or three XYZ coordinate "elements".

`subvl` is again primarily set by the `setvl` instruction. It is not to be confused
with `hphint`.

Directly related to `subvl` are the `pack` and `unpack` Mode bits of `SVSTATE`.
See the `svstep` instruction for how to set Pack and Unpack Modes.
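The SUBVL loop structure described above can be illustrated with a hedged Python sketch: the 2-bit encoding 0..3 maps to 1..4 elements per group, a single predicate bit enables or disables a whole group, and setting Pack (or UnPack) swaps the nesting of the VL and SUBVL loops. Two details are assumptions not stated in this section: the flat element index is taken to be `step*SUBVL + substep`, and predicate bit `step` is bit `step` of an integer mask counting from the LSB. All function names are hypothetical.

```python
# Hypothetical sketch of SUBVL iteration; not a model of any specific implementation.


def subvl_count(subvl_field: int) -> int:
    """Legal encodings 0..3 represent 1..4 elements per group."""
    if not 0 <= subvl_field <= 3:
        raise ValueError("SUBVL encoding must be 0..3")
    return subvl_field + 1


def step_order(vl: int, subvl_field: int, inverted: bool = False) -> list:
    """(step, substep) pairs in issue order; inverted models Pack/UnPack being set."""
    subvl = subvl_count(subvl_field)
    if inverted:  # SUBVL becomes the outer loop, VL the inner loop
        return [(step, sub) for sub in range(subvl) for step in range(vl)]
    return [(step, sub) for step in range(vl) for sub in range(subvl)]


def active_elements(vl: int, subvl_field: int, predicate: int) -> list:
    """Flat element indices enabled when one predicate bit covers a whole group."""
    subvl = subvl_count(subvl_field)
    return [step * subvl + sub
            for step in range(vl) if (predicate >> step) & 1
            for sub in range(subvl)]


# Two stereo (Left, Right) samples: SUBVL encoding 1 means groups of 2.
print(step_order(2, 1))                 # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(step_order(2, 1, inverted=True))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
# Three XYZ groups, middle group masked out by the predicate.
print(active_elements(3, 2, 0b101))     # [0, 1, 2, 6, 7, 8]
```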

**Horizontal Parallelism**

A problem exists for hardware: it may not be able to detect opportunities
for parallelism, and lack of overlap between loops, that a programmer
(or compiler) already knows about.

For hphint, the number chosen must be consistently
executed **every time**. Hardware is not permitted to execute five
computations for one instruction then three on the next.
hphint is a hint from the compiler to hardware that exactly this
many elements may be safely executed in parallel, without hazards
(including Memory accesses).
Interestingly, when hphint is set equal to VL, it is in effect
as if Vertical First mode were not set, because the hardware is
given the option to run through all elements in an instruction.
This is exactly what Horizontal-First is: a for-loop from 0 to VL-1,
except that the hardware may *choose* the number of elements.
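The grouping rule given for `hphint` in the SVSTATE field descriptions (elements with equal `FLOOR(srcstep/hphint)`, before REMAP, form one group) can be sketched in Python as below. `parallel_groups` and `vf_issue_window` are hypothetical names; `vf_issue_window` returns the largest set of elements that Vertical-First hardware would be permitted to process together starting from a given `srcstep`. The treatment of `hphint = 0` (one element at a time) is an assumption for illustration: the text only states that zero means "no hint".

```python
# Hypothetical sketch of hphint grouping (element indices are before REMAP).
from typing import Optional


def parallel_groups(vl: int, hphint: int) -> Optional[list]:
    """Partition srcstep 0..VL-1 into groups with equal FLOOR(srcstep/hphint)."""
    if hphint == 0:
        return None  # "no hint": no grouping information is available
    groups = {}
    for srcstep in range(vl):
        groups.setdefault(srcstep // hphint, []).append(srcstep)
    return list(groups.values())


def vf_issue_window(srcstep: int, vl: int, hphint: int) -> list:
    """Largest set of elements Vertical-First hardware may process together
    from srcstep: never past the last element of srcstep's group, nor VL-1."""
    if hphint == 0:
        return [srcstep]  # assumption: without a hint, one element at a time
    group_end = (srcstep // hphint + 1) * hphint
    return list(range(srcstep, min(group_end, vl)))


print(parallel_groups(7, 3))     # [[0, 1, 2], [3, 4, 5], [6]]
print(vf_issue_window(4, 7, 3))  # [4, 5]
print(vf_issue_window(0, 8, 8))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The last call reflects the observation above: with `hphint` equal to VL, the whole vector falls in a single group, as in Horizontal-First mode.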

*Note to programmers: changing VL during the middle of such modes
should be done only with due care and respect for the fact that SVSTATE
has exactly the same peer-level status as a Program Counter.*

-------------

\newpage{}

# SVL-Form

Add the following to Book I, 1.6.1, SVL-Form

```
|0     |6     |11    |16    |23 |24 |25 |26    |31 |
| PO   | RT   | RA   | SVi  |ms |vs |vf | XO   |Rc |
| PO   | RT   | /    | SVi  |/  |/  |vf | XO   |Rc |
```

* Add `SVL` to `RA (11:15)` Field in Book I, 1.6.2
* Add `SVL` to `RT (6:10)` Field in Book I, 1.6.2
* Add `SVL` to `Rc (31)` Field in Book I, 1.6.2
* Add `SVL` to `XO (26:30)` Field in Book I, 1.6.2

Add the following to Book I, 1.6.2

```
ms (23)
    Field used in Simple-V to specify whether MVL (maxvl in the SVSTATE SPR)
    is to be set
    Formats: SVL
vf (25)
    Field used in Simple-V to specify whether "Vertical" Mode is set
    (vfirst in the SVSTATE SPR)
    Formats: SVL
vs (24)
    Field used in Simple-V to specify whether VL (vl in the SVSTATE SPR) is to be set
    Formats: SVL
SVi (16:22)
    Simple-V immediate field used by setvl for setting VL or MVL
    (vl, maxvl in the SVSTATE SPR)
    and used as a "Mode of Operation" selector in svstep
    Formats: SVL
```
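As a non-authoritative sketch, assuming the bit boundaries in the SVL-Form diagram above (MSB0 numbering over a 32-bit instruction word, with XO occupying bits 26:30 and Rc bit 31), the fields can be extracted and inserted as follows. `SVL_FIELDS`, `decode_svl` and `encode_svl` are hypothetical names, and the field values in the example are arbitrary: they are not the real opcode assignments for `setvl` or `svstep`.

```python
# Hypothetical SVL-Form field helpers; bit positions from the form diagram (MSB0).
SVL_FIELDS = {
    "PO": (0, 5), "RT": (6, 10), "RA": (11, 15), "SVi": (16, 22),
    "ms": (23, 23), "vs": (24, 24), "vf": (25, 25), "XO": (26, 30), "Rc": (31, 31),
}


def decode_svl(insn: int) -> dict:
    """Split a 32-bit SVL-Form word into its named fields."""
    out = {}
    for name, (start, end) in SVL_FIELDS.items():
        width = end - start + 1
        out[name] = (insn >> (31 - end)) & ((1 << width) - 1)
    return out


def encode_svl(**fields: int) -> int:
    """Assemble a 32-bit word; unspecified fields (e.g. '/' bits) are zero."""
    insn = 0
    for name, value in fields.items():
        start, end = SVL_FIELDS[name]
        width = end - start + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        insn |= value << (31 - end)
    return insn


# Arbitrary illustrative values (NOT the real setvl/svstep opcodes).
word = encode_svl(PO=1, RT=3, RA=4, SVi=7, vs=1, XO=2, Rc=1)
print(hex(word), decode_svl(word)["SVi"])  # 0x4640e85 7
```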

# Appendices

Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| SVL  | I    | #    | 3.0B    | svstep   | Vertical-First Stepping and status reporting |
| SVL  | I    | #    | 3.0B    | setvl    | Cray-like establishment of Looping (Vector) context |

[[!tag opf_rfc]]