1 # RFC ls008 SVP64 Management instructions
2
3 [[!tag opf_rfc]]
4
5 **URLs**:
6
7 * <https://libre-soc.org/openpower/sv/>
8 * <https://libre-soc.org/openpower/sv/rfc/ls008/>
9 * <https://bugs.libre-soc.org/show_bug.cgi?id=1040>
10 * <https://git.openpower.foundation/isa/PowerISA/issues/87>
11
12 **Severity**: Major
13
14 **Status**: New
15
16 **Date**: 24 Mar 2023
17
18 **Target**: v3.2B
19
20 **Source**: v3.0B
21
22 **Books and Section affected**:
23
24 ```
25 Book I, new Scalar Chapter. (Or, new Book on "Zero-Overhead Loop Subsystem")
26 Appendix E Power ISA sorted by opcode
27 Appendix F Power ISA sorted by version
28 Appendix G Power ISA sorted by Compliancy Subset
29 Appendix H Power ISA sorted by mnemonic
30 ```
31
32 **Summary**
33
34 ```
35 setvl - Cray-style "Set Vector Length" instruction
36 svstep - Vertical-First Mode explicit Step and Status
37 svremap - Re-Mapping of Register Element Offsets
38 svindex - General-purpose setting of SHAPEs to be re-mapped
39 svshape - Hardware-level setting of SHAPEs for element re-mapping
40 svshape2 - Hardware-level setting of SHAPEs for element re-mapping (v2)
41 ```
42
43 **Submitter**: Luke Leighton (Libre-SOC)
44
45 **Requester**: Libre-SOC
46
47 **Impact on processor**:
48
```
Addition of six new "Zero-Overhead-Loop-Control" DSP-style Vector-style
Management Instructions which can be implemented extremely efficiently
and effectively by inserting an additional phase between Decode and Issue.
More complex designs are NOT adversely impacted and in fact greatly benefit.
```

55 **Impact on software**:
56
57 ```
58 Requires support for new instructions in assembler, debuggers,
59 and related tools.
60 ```
61
62 **Keywords**:
63
64 ```
65 Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control (ZOLC),
66 Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model,
67 Digital Signal Processing (DSP)
68 ```
69
70 **Motivation**
71
Power ISA is synonymous with Supercomputing, and the early Supercomputers
(ETA-10, ILLIAC-IV, CDC200, Cray) had Vectorisation. It is therefore anomalous
that Power ISA does not have Scalable Vectors, instead having the legacy
"PackedSIMD" paradigm. Fortunately this presents
the opportunity to modernise Power ISA, learning from both past ISA features and
past mistakes, and to place it far above the top of Supercomputing for the next
two decades and beyond.
79
80 **Notes and Observations**:
81
82 1. SVP64 is very much designed for ultra-light-weight Embedded use-cases all the
83 way up to moving the bar of Supercomputing orders of magnitude above its present
84 perception, whilst retaining at all times Sequential Programming Execution.
85 2. This proposal is the **base** for further Extensions. These include
86 extending SVP64 onto the Scalar VSX instructions (with a **LONG TERM** view in 10+ years
87 to deprecating the PackedSIMD aspects of VSX), to be discussed at a later
88 time, the potential for extending VSX registers to 128 or beyond, and Arithmetic
89 operations to a runtime-selectable choice of 128-bit, 256-bit, 512-bit or 1024-bit.
3. Massive reductions in instruction count of between 2x and 20x have been demonstrated
   with SVP64, which is far beyond anything ever achieved by any *general-purpose*
   ISA Extension added to any ISA in the history of Computing. By comparison,
   reductions of the order of 5 to 10% are normally considered enough to make
   inclusion a highly worthwhile exercise to pursue.
95
96 **Changes**
97
98 Add the following entries to:
99
100 * Section 1.3.2 Notation
101 * the Appendices of Book I
102 * Instructions of Book I as a new Section
103 * SVL-Form of Book I Section 1.6.1.6 and 1.6.2
104
105 ----------------
106
107 \newpage{}
108
109 # Notation, Section 1.3.2
110
When register operands (RA, RT, BF) are prefixed by a single underscore
(_RT, _RA, _BF) the variable contains the contents of the instruction field,
not the contents of the Register File referenced *by* that field. Example:
`_RT` contains the contents of bits 6 thru 10. The relationship
`RT = GPR(_RT)` is thus always true. Uses include making alternative
decisions within an instruction based on whether the operand field
is zero or non-zero.
118
119 ----------------
120
121 \newpage{}
122
123 # svstep: Vertical-First Stepping and status reporting
124
125 SVL-Form
126
127 * svstep RT,SVi,vf (Rc=0)
128 * svstep. RT,SVi,vf (Rc=1)
129
| 0-5|6-10|11-15|16-22 | 23-25    | 26-30 |31| Form     |
|----|----|-----|------|----------|-------|--|----------|
|PO  | RT | /   | SVi  | / / vf   | XO    |Rc| SVL-Form |
133
134 Pseudo-code:
135
136 ```
137 if SVi[3:4] = 0b11 then
138 # store pack and unpack in SVSTATE
139 SVSTATE[53] <- SVi[5]
140 SVSTATE[54] <- SVi[6]
141 RT <- [0]*62 || SVSTATE[53:54]
142 else
143 # Vertical-First explicit stepping.
144 step <- SVSTATE_NEXT(SVi, vf)
145 RT <- [0]*57 || step
146 ```
147
148 Special Registers Altered:
149
150 CR0 (if Rc=1)
151
152 **Description**
153
154 svstep may be used
155 to enquire about the REMAP Schedule and it may be used to alter Vectorisation
156 State. When `vf=1` then stepping occurs.
157 When `vf=0` the enquiry is performed without altering internal
158 state. If `SVi=0, Rc=0, vf=0` the instruction is a `nop`.
159
160 The following Modes exist:
161
162 * `SVi=0`: appropriately step srcstep, dststep, subsrcstep and subdststep to the next
163 element, taking pack and unpack into consideration.
164 * When `SVi` is 1-4 the REMAP Schedule for a given SVSHAPE may be
165 returned in `RT`. SVi=1 selects SVSHAPE0 current state,
166 through to SVi=4 selects SVSHAPE3.
167 * When `SVi` is 5, `SVSTATE.srcstep` is returned.
168 * When `SVi` is 6, `SVSTATE.dststep` is returned.
169 * When `SVi` is 0b1100 pack/unpack in SVSTATE is cleared
170 * When `SVi` is 0b1101 pack in SVSTATE is set, unpack is cleared
171 * When `SVi` is 0b1110 unpack in SVSTATE is set, pack is cleared
172 * When `SVi` is 0b1111 pack/unpack in SVSTATE are set
173
174 As this is a Single-Predicated (1P) instruction, predication may be applied
175 to skip (or zero) elements.
176
177 * Vertical-First Mode will return the requested index
178 (and move to the next state if `vf=1`)
179 * Horizontal-First Mode can be used to return all indices,
180 i.e. walks through all possible states.
181
182 **Vectorisation of svstep itself**
183
As a 32-bit instruction, `svstep` may itself be Vector-Prefixed, as
`sv.svstep`. This will work just as well in Horizontal-First
as it will in Vertical-First Mode.
187
Example: to obtain the full set of possible computed element
indices use `sv.svstep RT.v,SVi,1` which will store all computed element
indices, starting from RT. If Rc=1 then a co-result Vector of CR Fields
will also be returned, comprising the "loop end-points" of each of the inner
loops when either Matrix Mode or DCT/FFT is set. For example,
when the `xdim` inner loop reaches the end, such that on the next
iteration it will begin again at zero, the CR Field `EQ` bit will be set.
With a maximum of three loops within both Matrix and DCT/FFT Modes,
the CR Field's EQ bit will be set at the end of the first inner loop,
the LT bit for the second, the GT bit for the outermost loop and the
SO bit set on the very last element, when all loops reach their maximum
extent.
200
*Programmer's note (1): VL in some situations, particularly larger Matrices,
may exceed 64,
meaning that `sv.svstep` returns a considerable number of values. Under
such circumstances `sv.svstep/ew=8` is recommended.*
205
206 *Programmer's note (2): having conveniently obtained a pre-computed
207 Schedule with `sv.svstep`,
208 it may then be used as the input to Indexed REMAP Mode
209 to achieve the exact same Schedule. It is evident however that
210 before use some of the Indices may be arbitrarily altered as desired.
211 `sv.svstep` helps the programmer avoid having to manually recreate
212 Indices for certain
213 types of common Loop patterns, and in its simplest form, without REMAP
214 (SVi=5 or SVi=6),
215 is equivalent to the `iota` instruction found in other Vector ISAs*
216
217 **Vertical First Mode**
218
219 Vertical First is effectively like an implicit single bit predicate
220 applied to every SVP64 instruction. **ONLY** one element in each
221 SVP64 Vector instruction is executed; srcstep and dststep do **not**
222 increment, and the Program Counter progresses **immediately** to
223 the next instruction just as it would for any standard scalar v3.0B
224 instruction.
225
226 A mode of srcstep (SVi=0) is called which can move srcstep and
227 dststep on to the next element, still respecting predicate
228 masks.
229
In other words, where normal SVP64 Vectorisation acts "horizontally"
by looping first through 0 to VL-1 and only then moving the PC
to the next instruction, Vertical-First moves the PC onwards
(vertically) through multiple instructions **with the same
srcstep and dststep**, after which an explicit instruction is used to
advance srcstep/dststep. An outer loop (a branch instruction)
is expected to be used to complete a series of
Vector operations.
238
239 Testing any end condition of any loop of any REMAP state allows branches to be
240 used to create loops.
241
242 Programmer's note: when Predicate Non-Zeroing is used this indicates to
243 the underlying hardware that any masked-out element must be skipped.
244 *This includes in Vertical-First Mode*, and programmers should be keenly
245 aware that srcstep or dststep or both *may* jump by more than one as
246 a result, because the actual request under these circumstances was to execute
247 on the first available next *non-masked-out* element.
248
*Programmers should be aware that VL, srcstep and dststep are global in nature.
Nested looping with different schedules is perfectly possible, as is
calling of functions, however SVSTATE (and any associated SVSHAPEs) should
be stored on the stack in order to achieve this benefit.*
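
The difference in execution order between the two modes can be sketched in Python. This is a simplified model only: it assumes no predication and no REMAP, treats instructions as opaque labels, and uses the hypothetical function names `horizontal_first` and `vertical_first`.

```python
def horizontal_first(program, vl):
    """Each instruction loops over elements 0..VL-1 before the PC moves on."""
    trace = []
    for insn in program:
        for step in range(vl):
            trace.append((insn, step))
    return trace


def vertical_first(program, vl):
    """Each instruction executes ONE element; the step advances explicitly."""
    trace = []
    step = 0
    while step < vl:                 # outer loop: a branch in real code
        for insn in program:         # PC moves on immediately, same step
            trace.append((insn, step))
        step += 1                    # stands in for svstep with vf=1
    return trace
```

For a two-instruction program with VL=2 the first function visits `a[0] a[1] b[0] b[1]`, the second visits `a[0] b[0] a[1] b[1]`.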
253
254 -------------
255
256 \newpage{}
257
258
259 # setvl
260
261 SVL-Form
262
263 | 0-5|6-10|11-15|16-22 | 23 24 25 | 26-30 |31| FORM |
264 | -- | -- | --- | ---- |----------| ----- |--|----------|
265 |PO | RT | RA | SVi | ms vs vf | XO |Rc| SVL-Form |
266
267 * setvl RT,RA,SVi,vf,vs,ms (Rc=0)
268 * setvl. RT,RA,SVi,vf,vs,ms (Rc=1)
269
270 Pseudo-code:
271
272 ```
273 overflow <- 0b0 # sets CR.SO if set and if Rc=1
274 VLimm <- SVi + 1
275 # set or get MVL
276 if ms = 1 then MVL <- VLimm[0:6]
277 else MVL <- SVSTATE[0:6]
278 # set or get VL
279 if vs = 0 then VL <- SVSTATE[7:13]
280 else if _RA != 0 then
281 if (RA) >u 0b1111111 then
282 VL <- 0b1111111
283 overflow <- 0b1
284 else VL <- (RA)[57:63]
285 else if _RT = 0 then VL <- VLimm[0:6]
286 else if CTR >u 0b1111111 then
287 VL <- 0b1111111
288 overflow <- 0b1
289 else VL <- CTR[57:63]
290 # limit VL to within MVL
291 if VL >u MVL then
292 overflow <- 0b1
293 VL <- MVL
294 SVSTATE[0:6] <- MVL
295 SVSTATE[7:13] <- VL
296 if _RT != 0 then
297 GPR(_RT) <- [0]*57 || VL
298 # MAXVL is a static "state-reset" opportunity so VF is only set then.
299 if ms = 1 then
300 SVSTATE[63] <- vf # set Vertical-First mode
301 SVSTATE[62] <- 0b0 # clear persist bit
302 ```
303
304 Special Registers Altered:
305
306 ```
307 CR0 (if Rc=1)
308 ```
309
310 * `SVi` - bits 16-22 - an immediate operand for setting MVL and/or VL
311 * `ms` - bit 23 - allows for setting of MVL
312 * `vs` - bit 24 - allows for setting of VL
313 * `vf` - bit 25 - sets "Vertical First Mode".
314
315 Note that in immediate setting mode VL and MVL start from **one**
316 but that this is compensated for in the assembly notation.
317 i.e. that an immediate value of 1 in assembler notation
318 actually places the value 0b0000000 in the `SVi` field bits:
319 on execution the `setvl` instruction adds one to the decoded
320 `SVi` field bits, resulting in
321 VL/MVL being set to 1. This allows VL to be set to values
322 ranging from 1 to 128 with only 7 bits instead of 8.
323 Setting VL/MVL
324 to 0 would result in all Vector operations becoming `nop`. If this is
325 truly desired (nop behaviour) then setting VL and MVL to zero is to be
326 done via the [[SVSTATE SPR|sv/sprs]].
327
Note that `setmvli` is a pseudo-op, based on RA/RT=0, and `setvli` likewise:

    setvli VL=8 : setvl r0, r0, VL=8, vf=0, vs=1, ms=0
    setvli. VL=8 : setvl. r0, r0, VL=8, vf=0, vs=1, ms=0
    setmvli MVL=8 : setvl r0, r0, MVL=8, vf=0, vs=0, ms=1
    setmvli. MVL=8 : setvl. r0, r0, MVL=8, vf=0, vs=0, ms=1
334
335 Additional pseudo-op for obtaining VL without modifying it (or any state):
336
337 getvl r5 : setvl r5, r0, vf=0, vs=0, ms=0
338 getvl. r5 : setvl. r5, r0, vf=0, vs=0, ms=0
339
340 Note that whilst it is possible to set both MVL and VL from the same
341 immediate, it is not possible to set them to different immediates in
342 the same instruction. Doing so would require two instructions.
343
344 **Selecting sources for VL**
345
Opcode space is under considerable pressure. Consequently the source from
which VL is set is selected by whether the `RA` and `RT` fields are zero, as follows:
348
349 | condition | effect |
350 | - | - |
351 | `vs=1, RA=0, RT!=0` | VL,RT set to MIN(MVL, CTR) |
352 | `vs=1, RA=0, RT=0` | VL set to MIN(MVL, SVi+1) |
353 | `vs=1, RA!=0, RT=0` | VL set to MIN(MVL, RA) |
354 | `vs=1, RA!=0, RT!=0` | VL,RT set to MIN(MVL, RA) |
355
356 The reasoning here is that the opportunity to set RT equal to the
357 immediate `SVi+1` is sacrificed in favour of setting from CTR.
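
The source-selection rules can be modelled with a short Python sketch that follows the `setvl` pseudo-code. It is an illustration under stated assumptions, not a reference implementation: the function name and argument names are hypothetical, `VLimm[0:6]` is assumed to mean truncation to 7 bits, and the updates to the Vertical-First and persist bits and to CR0 are omitted.

```python
def setvl(svi, ms, vs, ra_num, rt_num, gpr, ctr, mvl, vl):
    """Return (MVL, VL, overflow); gpr[rt_num] is written when rt_num != 0.

    svi is the raw 7-bit field: an assembler immediate of 8 is encoded as 7.
    mvl and vl are the current SVSTATE values on entry.
    """
    overflow = False
    vlimm = (svi + 1) & 0x7F          # assumption: truncation to 7 bits
    if ms:
        mvl = vlimm                   # set MVL from the immediate
    if vs:
        if ra_num != 0:               # VL from register RA
            vl = min(gpr[ra_num], 0x7F)
            overflow = gpr[ra_num] > 0x7F
        elif rt_num == 0:             # VL from the immediate
            vl = vlimm
        else:                         # RA field zero, RT field non-zero: CTR
            vl = min(ctr, 0x7F)
            overflow = ctr > 0x7F
    if vl > mvl:                      # limit VL to within MVL
        overflow = True
        vl = mvl
    if rt_num != 0:
        gpr[rt_num] = vl
    return mvl, vl, overflow
```

With `gpr[3] = 1000`, a call with `svi=7, ms=1, vs=1, ra_num=3, rt_num=4` sets MVL and VL to 8, flags overflow, and writes 8 into `gpr[4]`.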
358
359 # Unusual Rc=1 behaviour
360
361 Normally, the return result from an instruction is in `RT`. With
362 it being possible for `RT=0` to mean that `CTR` mode is to be read,
363 some different semantics are needed.
364
CR Field 0, when `Rc=1`, may be set even if `RT=0`. The reason is that
overflow may occur: `VL`, if set either from an immediate or from `CTR`,
may not exceed `MAXVL`, and if the requested value does, `CR0.SO` must be set.

Additionally, in reality it is **`VL`** being set. Therefore, rather
than `CR0` testing `RT` when `Rc=1`, CR0.EQ is set if `VL=0`, CR0.GT
is set if `VL` is non-zero.
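
A minimal sketch of this CR0 behaviour follows. It assumes that the bit set for a non-zero `VL` is the standard CR `GT` bit and that `LT` is never set (VL being unsigned); the function name `setvl_cr0` and the dictionary representation are illustrative only.

```python
def setvl_cr0(vl: int, overflow: bool) -> dict:
    """CR0 as produced by setvl. when Rc=1: tests VL, not RT."""
    return {
        "LT": False,         # assumption: VL is unsigned, so never "less than"
        "GT": vl != 0,
        "EQ": vl == 0,
        "SO": overflow,      # the requested VL exceeded MAXVL (or 127)
    }
```

A loop may therefore branch on `EQ` to detect that no elements remain, regardless of whether `RT` was written.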
372
373 **SUBVL**
374
Sub-vector elements are not to be considered "Vertical". The vec2/3/4
is to be treated as if it were a single element. Caveats exist for
[[sv/mv.swizzle]] and [[sv/mv.vec]] when Pack/Unpack is enabled,
due to the order in which VL and SUBVL loops are applied being
swapped (outer-inner becomes inner-outer).
380
381 # Examples
382
383 ## Core concept loop
384
385 ```
386 loop:
387 setvl a3, a0, MVL=8 # update a3 with vl
388 # (# of elements this iteration)
389 # set MVL to 8
390 # do vector operations at up to 8 length (MVL=8)
391 # ...
392 sub a0, a0, a3 # Decrement count by vl
393 bnez a0, loop # Any more?
394 ```
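
The same strip-mining pattern can be written as a Python sketch. The element-wise addition is an arbitrary stand-in for "vector operations", and the function names are hypothetical; the only property carried over from the loop above is that each iteration processes `MIN(remaining, MVL)` elements.

```python
def vl_sequence(count: int, mvl: int = 8) -> list:
    """The VL chosen on each iteration of the loop: MIN(remaining, MVL)."""
    lengths = []
    while count != 0:
        vl = min(count, mvl)      # setvl: VL = MIN(count, MVL)
        lengths.append(vl)
        count -= vl               # decrement count by VL
    return lengths


def strip_mined_add(a: list, b: list, mvl: int = 8) -> list:
    """Add two equal-length lists in chunks of at most MVL elements."""
    out, pos = [], 0
    for vl in vl_sequence(len(a), mvl):
        out.extend(x + y for x, y in zip(a[pos:pos + vl], b[pos:pos + vl]))
        pos += vl
    return out
```

For 19 elements and MVL=8 the loop runs three times, with VL of 8, 8 and 3.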
395
396 ## Loop using Rc=1
397
398 my_fn:
399 li r3, 1000
400 b test
401 loop:
402 sub r3, r3, r4
403 ...
404 test:
405 setvli. r4, r3, MVL=64
406 bne cr0, loop
407 end:
408 blr
409
410 ## Load/Store-Multi (selective)
411
412 Up to 64 FPRs will be loaded, here. `r3` is set one per bit
413 for each FP register required to be loaded. The block of memory
414 from which the registers are loaded is contiguous (no gaps):
415 any FP register which has a corresponding zero bit in `r3`
416 is *unaltered*. In essence this is a selective LD-multi with
417 "Scatter" capability.
418
419 setvli r0, MVL=64, VL=64
420 sv.fld/dm=r3 *r0, 0(r30) # selective load 64 FP registers
421
Up to 64 FPRs will be saved, here. Again, `r3` is set one per bit
for each FP register required to be saved.

    setvli r0, MVL=64, VL=64
    sv.stfd/sm=r3 *fp0, 0(r30) # selective store 64 FP registers
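
The selective load can be modelled as below. This is a sketch under two assumptions that are not spelled out in the text above: that bit `i` of the mask (counting from the least-significant bit) corresponds to FP register `i`, and that memory is consumed one contiguous element per selected register. The function name and arguments are hypothetical.

```python
def selective_load(fprs: list, memory: list, mask: int, vl: int = 64) -> int:
    """Load contiguous memory elements into the registers selected by mask.

    Registers whose mask bit is zero are left unaltered.
    Returns the number of elements consumed from memory.
    """
    src = 0
    for dst in range(vl):
        if (mask >> dst) & 1:        # assumption: LSB of mask is register 0
            fprs[dst] = memory[src]
            src += 1                 # memory is read without gaps
    return src
```

With a mask of `0b100101`, three consecutive memory values land in registers 0, 2 and 5, and all other registers keep their previous contents.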
426
427 -------------
428
429 \newpage{}
430
431 # SVSTATE SPR
432
433 The format of the SVSTATE SPR is as follows:
434
| Field | Name     | Description                 |
| ----- | -------- | --------------------------- |
| 0:6   | maxvl    | Max Vector Length           |
| 7:13  | vl       | Vector Length               |
| 14:20 | srcstep  | for srcstep = 0..VL-1       |
| 21:27 | dststep  | for dststep = 0..VL-1       |
| 28:29 | dsubstep | for substep = 0..SUBVL-1    |
| 30:31 | ssubstep | for substep = 0..SUBVL-1    |
| 32:33 | mi0      | REMAP RA/FRA/BFA SVSHAPE0-3 |
| 34:35 | mi1      | REMAP RB/FRB/BFB SVSHAPE0-3 |
| 36:37 | mi2      | REMAP RC/FRT SVSHAPE0-3     |
| 38:39 | mo0      | REMAP RT/FRT/BF SVSHAPE0-3  |
| 40:41 | mo1      | REMAP EA/RS/FRS SVSHAPE0-3  |
| 42:46 | SVme     | REMAP enable (RA-RT)        |
| 47:52 | rsvd     | reserved                    |
| 53    | pack     | PACK (srcstep reorder)      |
| 54    | unpack   | UNPACK (dststep order)      |
| 55:61 | hphint   | Horizontal Hint             |
| 62    | RMpst    | REMAP persistence           |
| 63    | vfirst   | Vertical First mode         |
455
456 Notes:
457
458 * The entries are truncated to be within range. Attempts to set VL to
459 greater than MAXVL will truncate VL.
460 * Setting srcstep, dststep to 64 or greater, or VL or MVL to greater
461 than 64 is reserved and will cause an illegal instruction trap.
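
The field layout can be exercised with a small Python sketch that reads and writes named fields of a 64-bit SVSTATE value. It assumes MSB0 bit numbering (bit 0 is the most significant bit), as used in the table; the dictionary and helper names are hypothetical, and no range checking or trap behaviour is modelled.

```python
# (start, end) bit positions, MSB0 numbering, taken from the SVSTATE table.
SVSTATE_FIELDS = {
    "maxvl": (0, 6), "vl": (7, 13), "srcstep": (14, 20), "dststep": (21, 27),
    "dsubstep": (28, 29), "ssubstep": (30, 31),
    "mi0": (32, 33), "mi1": (34, 35), "mi2": (36, 37),
    "mo0": (38, 39), "mo1": (40, 41), "SVme": (42, 46),
    "pack": (53, 53), "unpack": (54, 54), "hphint": (55, 61),
    "RMpst": (62, 62), "vfirst": (63, 63),
}


def get_field(svstate: int, name: str) -> int:
    """Read one named field from a 64-bit SVSTATE value."""
    start, end = SVSTATE_FIELDS[name]
    width = end - start + 1
    return (svstate >> (63 - end)) & ((1 << width) - 1)


def set_field(svstate: int, name: str, value: int) -> int:
    """Return a copy of svstate with one named field replaced."""
    start, end = SVSTATE_FIELDS[name]
    width = end - start + 1
    mask = ((1 << width) - 1) << (63 - end)
    return (svstate & ~mask) | ((value << (63 - end)) & mask)
```

Setting `maxvl` to 8, `vl` to 5 and `vfirst` to 1 on a zero value yields `(8 << 57) | (5 << 50) | 1`.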
462
463 **SVSTATE Fields**
464
SVSTATE is a standard SPR that (if REMAP is not activated) contains sufficient
self-contained information for a full context save/restore.
467 SVSTATE contains (and permits setting of):
468
469 * MVL (the Maximum Vector Length) - declares (statically) how
470 much of a regfile is to be reserved for Vector elements
471 * VL - Vector Length
472 * dststep - the destination element offset of the current parallel
473 instruction being executed
474 * srcstep - for twin-predication, the source element offset as well.
475 * ssubstep - the source subvector element offset of the current
476 parallel instruction being executed
477 * dsubstep - the destination subvector element offset of the current
478 parallel instruction being executed
* vfirst - Vertical First mode. srcstep, dststep and substep
  **do not advance** unless explicitly requested to do so with
  the `svstep` instruction
482 * RMpst - REMAP persistence. REMAP will apply only to the following
483 instruction unless this bit is set, in which case REMAP "persists".
484 Reset (cleared) on use of the `setvl` instruction if used to
485 alter VL or MVL.
486 * Pack - if set then srcstep/substep VL/SUBVL loop-ordering is inverted.
487 * UnPack - if set then dststep/substep VL/SUBVL loop-ordering is inverted.
488 * hphint - Horizontal Parallelism Hint. Indicates that
489 no Hazards exist between groups of elements in sequential multiples of this number
490 (before REMAP). By definition: elements for which `FLOOR(srcstep/hphint)` is
491 equal *before REMAP* are in the same parallelism "group". In Vertical First Mode
492 hardware **MUST ONLY** process elements in the same group, and must stop
493 Horizontal Issue at the last element of a given group. Set to zero to indicate "no hint".
494 * SVme - REMAP enable bits, indicating which register is to be
495 REMAPed: RA, RB, RC, RT and EA are the canonical (typical) register names
496 associated with each bit, with RA being the LSB and EA being the MSB.
497 See table below for ordering. When `SVme` is zero (0b00000) REMAP
498 is **fully disabled and inactive** regardless of the contents of
499 `SVSTATE`, `mi0-mi2/mo0-mo1`, or the four `SVSHAPEn` SPRs
* mi0-mi2/mo0-mo1 - these indicate the SVSHAPE (0-3) that the
  corresponding register (RA etc) should use, and take effect only
  when that register's corresponding SVme bit is set
503
504 Programmer's Note: the fact that REMAP is entirely dormant when `SVme` is zero
505 allows establishment of REMAP context well in advance, followed by utilising `svremap`
506 at a precise (or the very last) moment. Some implementations may exploit this
507 to cache (or take some time to prepare caches) in the background whilst other
508 (unrelated) instructions are being executed. This is particularly important to
509 bear in mind when using `svindex` which will require hardware to perform (and
510 cache) additional GPR reads.
511
Programmer's Note: when REMAP is activated it becomes necessary on any
context-switch (Interrupt or Function call) to detect (or know in advance)
that REMAP is enabled and to additionally save/restore the four SVSHAPE
SPRs, SVSHAPE0-3. Given that this is expected to be a rare occurrence it was
deemed unreasonable to burden every context-switch or function call with
mandatory save/restore of SVSHAPEs, and consequently it is a *callee*
(and Trap Handler) responsibility. Callees (and Trap Handlers) **MUST**
avoid using any and all SVP64 instructions during the period where state
could be adversely affected. SVP64 purely relies on Scalar instructions,
so Scalar instructions (except the SVP64 Management ones and mtspr and
mfspr) are 100% guaranteed to have zero impact on SVP64 state.
523
524 **Max Vector Length (maxvl)** <a name="mvl" />
525
526 MAXVECTORLENGTH is the same concept as MVL in RISC-V RVV, except that it
527 is variable length and may be dynamically set (normally from an immediate
528 field only). MVL is limited to 7 bits
529 (in the first version of SVP64) and consequently the maximum number of
530 elements is limited to between 0 and 127.
531
532 Programmer's Note: Except by directly using `mtspr` on SVSTATE, which may
533 result in performance penalties on some hardware implementations, SVSTATE's `maxvl`
534 field may only be set **statically** as an immediate, by the `setvl` instruction.
535 It may **NOT** be set dynamically from a register. Compiler writers and assembly
536 programmers are expected to perform static register file analysis, subdivision,
537 and allocation and only utilise `setvl`. Direct writing to SVSTATE in order to
538 "bypass" this Note could, in less-advanced implementations, potentially cause stalling,
539 particularly if SVP64 instructions are issued directly after the `mtspr` to SVSTATE.
540
541 **Vector Length (vl)** <a name="vl" />
542
The actual Vector length, the number of elements in a "Vector", `SVSTATE.vl` may be set
entirely dynamically at runtime from a number of sources. `setvl` is the primary
instruction for setting Vector Length.
`setvl` is conceptually similar to, but different from, the Cray, SX Aurora, and RISC-V RVV
equivalents. Similar to RVV, VL is set to be within
the range 0 <= VL <= MVL. Unlike RVV, VL is set **exactly** according to the following:
549
550 VL = (RT|0) = MIN(vlen, MVL)
551
552 where 0 <= MVL <= 127 and vlen may come from an immediate, `RA`, or from the `CTR` SPR,
553 depending on options selected with the `setvl` instruction.
554
555 Programmer's Note: conceptual understanding of Cray-style Vectors is far beyond the scope
556 of the Power ISA Technical Reference. Guidance on the 50-year-old Cray Vector paradigm is
557 best sought elsewhere: good studies include Academic Courses given on the 1970s
558 Cray Supercomputers over at least the past three decades.
559
560 **SUBVL - Sub Vector Length**
561
562 This is a "group by quantity" that effectively asks each iteration
563 of the hardware loop to load SUBVL elements of width elwidth at a
564 time. Effectively, SUBVL is like a SIMD multiplier: instead of just 1
565 operation issued, SUBVL operations are issued.
566
The main effect of SUBVL is that predication bits are applied per
**group**, rather than by individual element. Legal values are 0 to 3,
representing 1 operation (1 element) thru 4 operations (4 elements) respectively.
Elements are best thought of in the context of 3D, Audio and Video: two Left and Right
Channel "elements" or four ARGB "elements", or three XYZ coordinate "elements".
572
573 `subvl` is again primarily set by the `setvl` instruction. Not to be confused
574 with `hphint`.
575
Directly related to `subvl` are the `pack` and `unpack` Mode bits of `SVSTATE`.
See the `svstep` instruction for how to set Pack and Unpack Modes.
578
579
580 **Horizontal Parallelism**
581
582 A problem exists for hardware where it may not be able to detect
583 that a programmer (or compiler) knows of opportunities for parallelism
584 and lack of overlap between loops.
585
586 For hphint, the number chosen must be consistently
587 executed **every time**. Hardware is not permitted to execute five
588 computations for one instruction then three on the next.
589 hphint is a hint from the compiler to hardware that exactly this
590 many elements may be safely executed in parallel, without hazards
591 (including Memory accesses).
592 Interestingly, when hphint is set equal to VL, it is in effect
593 as if Vertical First mode were not set, because the hardware is
594 given the option to run through all elements in an instruction.
595 This is exactly what Horizontal-First is: a for-loop from 0 to VL-1
596 except that the hardware may *choose* the number of elements.
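
The grouping implied by `hphint` can be illustrated with a short Python sketch based on the definition given for the SVSTATE field: elements for which `FLOOR(srcstep/hphint)` is equal belong to the same group. The function name is hypothetical, REMAP is ignored, and returning `None` for "no hint" is a choice made for this sketch only.

```python
def parallel_groups(vl: int, hphint: int):
    """Partition element indices 0..VL-1 into hazard-free groups.

    Returns None when hphint is zero ("no hint").
    """
    if hphint == 0:
        return None
    groups = {}
    for srcstep in range(vl):
        groups.setdefault(srcstep // hphint, []).append(srcstep)
    return list(groups.values())
```

With VL=10 and hphint=4 the groups are `[0..3]`, `[4..7]` and `[8, 9]`; with hphint equal to VL there is a single group covering every element.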
597
598 *Note to programmers: changing VL during the middle of such modes
599 should be done only with due care and respect for the fact that SVSTATE
600 has exactly the same peer-level status as a Program Counter.*
601
602 -------------
603
604 \newpage{}
605
606 # SVL-Form
607
608 Add the following to Book I, 1.6.1, SVL-Form
609
610 ```
611 |0 |6 |11 |16 |23 |24 |25 |26 |31 |
612 | PO | RT | RA | SVi |ms |vs |vf | XO |Rc |
613 | PO | RT | / | SVi |/ |/ |vf | XO |Rc |
614 ```
615
616 * Add `SVL` to `RA (11:15)` Field in Book I, 1.6.2
617 * Add `SVL` to `RT (6:10)` Field in Book I, 1.6.2
618 * Add `SVL` to `Rc (31)` Field in Book I, 1.6.2
* Add `SVL` to `XO (26:30)` Field in Book I, 1.6.2
620
621 Add the following to Book I, 1.6.2
622
623 ```
624 ms (23)
625 Field used in Simple-V to specify whether MVL (maxvl in the SVSTATE SPR)
626 is to be set
627 Formats: SVL
628 vf (25)
629 Field used in Simple-V to specify whether "Vertical" Mode is set
630 (vfirst in the SVSTATE SPR)
631 Formats: SVL
632 vs (24)
633 Field used in Simple-V to specify whether VL (vl in the SVSTATE SPR) is to be set
634 Formats: SVL
635 SVi (16:22)
636 Simple-V immediate field used by setvl for setting VL or MVL
637 (vl, maxvl in the SVSTATE SPR)
638 and used as a "Mode of Operation" selector in svstep
639 Formats: SVL
640 ```
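
As an illustration of the SVL-Form layout, the following Python sketch packs the fields into a 32-bit word in MSB0 order. The function name is hypothetical, and the `po` and `xo` values must be supplied by the caller: no opcode values are given in this section, so any values used with this sketch are placeholders.

```python
def encode_svl(po, rt, ra, svi, ms, vs, vf, xo, rc) -> int:
    """Pack SVL-Form fields, most-significant field first, into 32 bits."""
    fields = [
        (po, 6), (rt, 5), (ra, 5), (svi, 7),   # bits 0:5, 6:10, 11:15, 16:22
        (ms, 1), (vs, 1), (vf, 1),             # bits 23, 24, 25
        (xo, 5), (rc, 1),                      # bits 26:30, 31
    ]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word
```

For `svstep` the `ra`, `ms` and `vs` positions are unused and would be passed as zero. For `setvl`, an assembler immediate of 8 corresponds to `svi=7`.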
641
642 # Appendices
643
644 Appendix E Power ISA sorted by opcode
645 Appendix F Power ISA sorted by version
646 Appendix G Power ISA sorted by Compliancy Subset
647 Appendix H Power ISA sorted by mnemonic
648
649 | Form | Book | Page | Version | mnemonic | Description |
650 |------|------|------|---------|----------|-------------|
651 | SVL | I | # | 3.0B | svstep | Vertical-First Stepping and status reporting |
652 | SVL | I | # | 3.0B | setvl | Cray-like establishment of Looping (Vector) context |
653