1 # RFC ls008 SVP64 Management instructions
2
3 [[!tag opf_rfc]]
4
5 **URLs**:
6
7 * <https://libre-soc.org/openpower/sv/>
8 * <https://libre-soc.org/openpower/sv/rfc/ls008/>
9 * <https://bugs.libre-soc.org/show_bug.cgi?id=1040>
10 * <https://git.openpower.foundation/isa/PowerISA/issues/87>
11
12 **Severity**: Major
13
14 **Status**: New
15
16 **Date**: 24 Mar 2023
17
18 **Target**: v3.2B
19
20 **Source**: v3.0B
21
22 **Books and Section affected**:
23
24 ```
25 Book I, new Scalar Chapter. (Or, new Book on "Zero-Overhead Loop Subsystem")
26 Appendix E Power ISA sorted by opcode
27 Appendix F Power ISA sorted by version
28 Appendix G Power ISA sorted by Compliancy Subset
29 Appendix H Power ISA sorted by mnemonic
30 ```
31
32 **Summary**
33
34 ```
35 Instructions added
36 setvl - Cray-style "Set Vector Length" instruction
37 svstep - Vertical-First Mode explicit Step and Status
38 svremap - Re-Mapping of Register Element Offsets
39 svindex - General-purpose setting of SHAPEs to be re-mapped
40 svshape - Hardware-level setting of SHAPEs for element re-mapping
41 svshape2 - Hardware-level setting of SHAPEs for element re-mapping (v2)
42 ```
43
44 **Submitter**: Luke Leighton (Libre-SOC)
45
46 **Requester**: Libre-SOC
47
48 **Impact on processor**:
49
50 ```
51 Addition of six new "Zero-Overhead-Loop-Control" DSP-style Vector-style
52 Management Instructions which can be implemented extremely efficiently
53 and effectively by inserting an additional phase between Decode and Issue.
54 More complex designs are NOT adversely impacted and in fact greatly benefit
55 whilst still retaining an obvious linear sequential execution programming model.
56 ```
57
58 **Impact on software**:
59
60 ```
61 Requires support for new instructions in assembler, debuggers,
62 and related tools.
63 ```
64
65 **Keywords**:
66
67 ```
68 Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control (ZOLC),
69 Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model,
70 Digital Signal Processing (DSP)
71 ```
72
73 **Motivation**
74
Power ISA is synonymous with Supercomputing, and the early Supercomputers
(ETA-10, ILLIAC-IV, CDC200, Cray) had Vectorisation. It is therefore anomalous
that Power ISA does not have Scalable Vectors, instead having the legacy
"PackedSIMD" paradigm. Fortunately this presents the opportunity to modernise
Power ISA, learning from both the features and the mistakes of past ISAs, and
placing it at the very top of Supercomputing for the next two decades
and beyond.
82
83 **Notes and Observations**:
84
85 1. SVP64 is very much designed for ultra-light-weight Embedded use-cases all the
86 way up to moving the bar of Supercomputing orders of magnitude above its present
87 perception, whilst retaining at all times the Sequential Programming Execution
88 Model.
89 2. This proposal is the **base** for further Extensions. These include
90 extending SVP64 onto the Scalar VSX instructions (with a **LONG TERM** view in 10+ years
91 to deprecating the PackedSIMD aspects of VSX), to be discussed at a later
92 time, the potential for extending VSX registers to 128 or beyond, and Arithmetic
93 operations to a runtime-selectable choice of 128-bit, 256-bit, 512-bit or 1024-bit.
3. Massive reductions in instruction count of between 2x and 20x have been demonstrated
   with SVP64, which is far beyond anything ever achieved by any *general-purpose*
   ISA Extension added to any ISA in the history of Computing. Normally, an expected
   reduction of the order of 5 to 10% is considered enough to make inclusion a highly
   worthwhile exercise to pursue, not a reduction to a fraction of the former size.
4. Other potential extensions include work inspired by EXTRA-V and ETH Zurich "Snitch"
   to reduce CPU workload by 95% in the case of EXTRA-V and power consumption by
   85% in the case of Snitch. Additional massive reductions from ZOLC Research are
   also anticipated.
103
104 **Changes**
105
106 Add the following entries to:
107
108 * Section 1.3.2 Notation
109 * the Appendices of Book I
110 * Instructions of Book I as a new Section
111 * SVL-Form of Book I Section 1.6.1.6 and 1.6.2
112
113 ----------------
114
115 \newpage{}
116
117 # Notation, Section 1.3.2
118
When register operands (RA, RT, BF) are prefixed by a single underscore
(_RT, _RA, _BF) the variable contains the contents of the instruction field,
not the contents of the Register File referenced *by* that field. Example:
`_RT` contains the contents of bits 6 thru 10 of the instruction. The relationship
`RT = GPR(_RT)` is thus always true. Uses include making alternative
decisions within an instruction based on whether the operand field
is zero or non-zero.
126
127 ----------------
128
129 \newpage{}
130
131 # svstep: Vertical-First Stepping and status reporting
132
133 SVL-Form
134
135 * svstep RT,SVi,vf (Rc=0)
136 * svstep. RT,SVi,vf (Rc=1)
137
| 0-5|6-10|11-15|16-22 | 23-25    | 26-30 |31| Form     |
|----|----|-----|------|----------|-------|--|----------|
|PO  | RT | /   | SVi  | / / vf   | XO    |Rc| SVL-Form |
141
142 Pseudo-code:
143
```
if SVi[3:4] = 0b11 then
    # store pack and unpack in SVSTATE
    SVSTATE[53] <- SVi[5]
    SVSTATE[54] <- SVi[6]
    RT <- [0]*62 || SVSTATE[53:54]
else
    # Vertical-First explicit stepping.
    step <- SVSTATE_NEXT(SVi, vf)
    RT <- [0]*57 || step
```
155
156 Special Registers Altered:
157
158 CR0 (if Rc=1)
159
160 -------------
161
162 **svstep "Mode of Enquiry"**
163
The `SVi` field selects what `svstep` does:

* `SVi=0`: appropriately step srcstep, dststep, subsrcstep and subdststep to the next
  element, taking pack and unpack into consideration.
* `SVi=1`: test inner, middle and outer
  loop end conditions from SVSTATE0 and store in CR.EQ CR.LT CR.GT
* `SVi=2`: test SVSTATE1 (and return conditions)
* `SVi=3`: test SVSTATE2 (and return conditions)
* `SVi=4`: test SVSTATE3 (and return conditions)
* `SVi=5`: `SVSTATE.srcstep` is returned.
* `SVi=6`: `SVSTATE.dststep` is returned.
* `SVi=12`: `SVSTATE.pack` is set to zero and `SVSTATE.unpack` set to zero
* `SVi=13`: `SVSTATE.pack` is set to zero and `SVSTATE.unpack` set to one
* `SVi=14`: `SVSTATE.pack` is set to one and `SVSTATE.unpack` set to zero
* `SVi=15`: `SVSTATE.pack` is set to one and `SVSTATE.unpack` set to one
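
Reading the pseudo-code literally, the pack and unpack bits come straight from the two least significant bits of `SVi` whenever `SVi[3:4]` is `0b11`. A minimal Python sketch of that decode follows; the function name is hypothetical, and `SVi` is assumed to be a 7-bit field numbered MSB0, so that `SVi[5]` has weight 2 and `SVi[6]` has weight 1.

```python
def svstep_pack_unpack(svi: int):
    """Return (pack, unpack) if SVi selects pack/unpack setting, else None.

    SVi is a 7-bit field numbered MSB0: SVi[3:4] are the bits of
    weight 8 and 4, SVi[5] has weight 2 and SVi[6] has weight 1.
    """
    if not 0 <= svi < 128:
        raise ValueError("SVi is a 7-bit field")
    if (svi >> 2) & 0b11 != 0b11:
        return None          # one of the stepping / enquiry modes instead
    pack = (svi >> 1) & 1    # SVi[5] -> SVSTATE[53]
    unpack = svi & 1         # SVi[6] -> SVSTATE[54]
    return pack, unpack
```

For example `svstep_pack_unpack(14)` gives `(1, 0)`: pack enabled, unpack disabled.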
179
180 \newpage{}
181
182
183 # setvl
184
185 SVL-Form
186
187 | 0-5|6-10|11-15|16-22 | 23 24 25 | 26-30 |31| FORM |
188 | -- | -- | --- | ---- |----------| ----- |--|----------|
189 |PO | RT | RA | SVi | ms vs vf | XO |Rc| SVL-Form |
190
191 * setvl RT,RA,SVi,vf,vs,ms (Rc=0)
192 * setvl. RT,RA,SVi,vf,vs,ms (Rc=1)
193
194 Pseudo-code:
195
```
overflow <- 0b0  # sets CR.SO if set and if Rc=1
VLimm <- SVi + 1
# set or get MVL
if ms = 1 then MVL <- VLimm[0:6]
else           MVL <- SVSTATE[0:6]
# set or get VL
if vs = 0                then VL <- SVSTATE[7:13]
else if _RA != 0         then
    if (RA) >u 0b1111111 then
        VL <- 0b1111111
        overflow <- 0b1
    else                      VL <- (RA)[57:63]
else if _RT = 0          then VL <- VLimm[0:6]
else if CTR >u 0b1111111 then
    VL <- 0b1111111
    overflow <- 0b1
else                          VL <- CTR[57:63]
# limit VL to within MVL
if VL >u MVL then
    overflow <- 0b1
    VL <- MVL
SVSTATE[0:6] <- MVL
SVSTATE[7:13] <- VL
if _RT != 0 then
    GPR(_RT) <- [0]*57 || VL
# MAXVL is a static "state-reset" opportunity so VF is only set then.
if ms = 1 then
    SVSTATE[63] <- vf   # set Vertical-First mode
    SVSTATE[62] <- 0b0  # clear persist bit
```
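
The pseudo-code can be exercised with a small Python model. This is a sketch under stated assumptions, not a reference implementation: `SVState`, `clamp7` and the argument names are hypothetical, `rt` and `ra` are the field numbers (`_RT`, `_RA`), register values are treated as unsigned integers, and `VLimm` is assumed to be kept to 7 bits.

```python
from dataclasses import dataclass

MASK7 = 0b1111111  # VL and MVL are 7-bit fields


@dataclass
class SVState:
    """Hypothetical stand-in for the SVSTATE fields that setvl touches."""
    maxvl: int = 0    # SVSTATE[0:6]
    vl: int = 0       # SVSTATE[7:13]
    persist: int = 0  # SVSTATE[62]
    vfirst: int = 0   # SVSTATE[63]


def clamp7(value):
    """Saturate an unsigned value to 7 bits, reporting overflow."""
    return (MASK7, 1) if value > MASK7 else (value, 0)


def setvl(st, gpr, ctr, rt, ra, svi, vf, vs, ms):
    """Model of setvl. Returns the overflow flag (CR0.SO when Rc=1)."""
    overflow = 0
    vlimm = (svi + 1) & MASK7  # assumption: VLimm is kept to 7 bits
    mvl = vlimm if ms == 1 else st.maxvl
    if vs == 0:
        vl = st.vl                      # VL is read, not set
    elif ra != 0:
        vl, overflow = clamp7(gpr[ra])  # VL from register RA
    elif rt == 0:
        vl = vlimm                      # VL from the immediate
    else:
        vl, overflow = clamp7(ctr)      # VL from CTR
    if vl > mvl:                        # limit VL to within MVL
        overflow = 1
        vl = mvl
    st.maxvl, st.vl = mvl, vl
    if rt != 0:
        gpr[rt] = vl
    if ms == 1:                         # VF is only set when MVL is set
        st.vfirst = vf
        st.persist = 0
    return overflow


st, gpr = SVState(), [0] * 32
gpr[3] = 1000
ov = setvl(st, gpr, ctr=0, rt=4, ra=3, svi=7, vf=0, vs=1, ms=1)
```

In this usage MVL is set to 8 from the immediate, the request for 1000 elements in `r3` is limited to 8, `r4` receives 8, and the overflow flag is raised.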
227
228 Special Registers Altered:
229
230 ```
231 CR0 (if Rc=1)
232 ```
233
234 * `SVi` - bits 16-22 - an immediate operand for setting MVL and/or VL
235 * `ms` - bit 23 - allows for setting of MVL
236 * `vs` - bit 24 - allows for setting of VL
237 * `vf` - bit 25 - sets "Vertical First Mode".
238
239 Note that in immediate setting mode VL and MVL start from **one**
240 but that this is compensated for in the assembly notation.
241 i.e. that an immediate value of 1 in assembler notation
242 actually places the value 0b0000000 in the `SVi` field bits:
243 on execution the `setvl` instruction adds one to the decoded
244 `SVi` field bits, resulting in
245 VL/MVL being set to 1. This allows VL to be set to values
246 ranging from 1 to 128 with only 7 bits instead of 8.
247 Setting VL/MVL
248 to 0 would result in all Vector operations becoming `nop`. If this is
249 truly desired (nop behaviour) then setting VL and MVL to zero is to be
250 done via the [[SVSTATE SPR|sv/sprs]].
251
Note that setmvli is a pseudo-op, based on RA/RT=0, and setvli likewise:

    setvli VL=8     : setvl  r0, r0, VL=8, vf=0, vs=1, ms=0
    setvli. VL=8    : setvl. r0, r0, VL=8, vf=0, vs=1, ms=0
    setmvli MVL=8   : setvl  r0, r0, MVL=8, vf=0, vs=0, ms=1
    setmvli. MVL=8  : setvl. r0, r0, MVL=8, vf=0, vs=0, ms=1

Additional pseudo-op for obtaining VL without modifying it (or any state):

    getvl r5   : setvl  r5, r0, vf=0, vs=0, ms=0
    getvl. r5  : setvl. r5, r0, vf=0, vs=0, ms=0
263
264 Note that whilst it is possible to set both MVL and VL from the same
265 immediate, it is not possible to set them to different immediates in
266 the same instruction. Doing so would require two instructions.
267
268 **Selecting sources for VL**
269
Because there is considerable opcode pressure, the source from which VL
is set is selected by whether the `RA` and `RT` fields are zero, as follows:
272
273 | condition | effect |
274 | - | - |
275 | `vs=1, RA=0, RT!=0` | VL,RT set to MIN(MVL, CTR) |
276 | `vs=1, RA=0, RT=0` | VL set to MIN(MVL, SVi+1) |
277 | `vs=1, RA!=0, RT=0` | VL set to MIN(MVL, RA) |
278 | `vs=1, RA!=0, RT!=0` | VL,RT set to MIN(MVL, RA) |
279
280 The reasoning here is that the opportunity to set RT equal to the
281 immediate `SVi+1` is sacrificed in favour of setting from CTR.
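
The table can be restated as a small decision function. The Python below is only a sketch: the function name and the returned strings are hypothetical labels, and the arguments are the instruction *field* numbers, not register contents.

```python
def vl_source(vs: int, ra_field: int, rt_field: int) -> str:
    """Name the source that setvl uses for VL, per the table above."""
    if vs == 0:
        return "SVSTATE.vl"   # VL is not being set at all
    if ra_field != 0:
        return "RA"           # RT, if non-zero, also receives the result
    if rt_field == 0:
        return "SVi+1"        # immediate
    return "CTR"              # RA=0 and RT!=0
```

In every `vs=1` case the chosen value is then limited to MVL, and the result is additionally written to `RT` whenever the `RT` field is non-zero.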
282
283 # Unusual Rc=1 behaviour
284
Normally, the result of an instruction is returned in `RT`. Because
`RT=0` can select `CTR` as the source for `VL`,
some different semantics are needed.

CR Field 0, when `Rc=1`, may be set even if `RT=0`. The reason is that
overflow may occur: `VL`, if set either from an immediate or from `CTR`,
may not exceed `MAXVL`, and if it would, `CR0.SO` must be set.

Additionally, in reality it is **`VL`** being set. Therefore, rather
than `CR0` testing `RT` when `Rc=1`, CR0.EQ is set if `VL=0`, and CR0.GT
is set if `VL` is non-zero.
296
297 # Vertical First Mode
298
299 Vertical First is effectively like an implicit single bit predicate
300 applied to every SVP64 instruction. **ONLY** one element in each
301 SVP64 Vector instruction is executed; srcstep and dststep do **not**
302 increment, and the Program Counter progresses **immediately** to
303 the next instruction just as it would for any standard scalar v3.0B
304 instruction.
305
306 An explicit mode of setvl is called which can move srcstep and
307 dststep on to the next element, still respecting predicate
308 masks.
309
310 In other words, where normal SVP64 Vectorisation acts "horizontally"
311 by looping first through 0 to VL-1 and only then moving the PC
312 to the next instruction, Vertical-First moves the PC onwards
313 (vertically) through multiple instructions **with the same
314 srcstep and dststep**, then an explict instruction used to
315 advance srcstep/dststep. An outer loop is expected to be
316 used (branch instruction) which completes a series of
317 Vector operations.
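
The difference in execution order can be sketched in Python. This is a deliberately simplified model: the function names are hypothetical, instructions are represented only by their names, and predicate masks, REMAP and SUBVL are ignored.

```python
def horizontal_first(instructions, vl):
    """Each instruction loops over all elements before the PC advances."""
    return [(insn, step) for insn in instructions for step in range(vl)]


def vertical_first(instructions, vl):
    """Each instruction executes one element; an explicit step then advances."""
    order = []
    step = 0
    while step < vl:               # outer loop: a branch in real code
        for insn in instructions:  # the PC moves on immediately
            order.append((insn, step))
        step += 1                  # the explicit stepping instruction
    return order
```

With two instructions `add` and `mul` and VL=2, Horizontal-First visits `(add,0) (add,1) (mul,0) (mul,1)` whereas Vertical-First visits `(add,0) (mul,0) (add,1) (mul,1)`.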
318
`svfstep` mode is enabled when vf=1, vs=0 and ms=0.
When Rc=1 it is possible to determine when any level of
loop reaches an end condition, or if VL has been reached. The immediate can
be reinterpreted as indicating which SVSTATE (0-3)
should be tested and placed into CR0 (when Rc=1).
324
325 When RT is not zero, an internal stepping index may also be returned,
326 either the REMAP index or srcstep or dststep. This table is identical
327 to that of [[sv/svstep]]:
328
* `SVi=1`: also include inner, middle and outer
  loop end conditions from SVSTATE0 into CR.EQ CR.LT CR.GT
331 * `SVi=2`: test SVSTATE1 (and return conditions)
332 * `SVi=3`: test SVSTATE2 (and return conditions)
333 * `SVi=4`: test SVSTATE3 (and return conditions)
334 * `SVi=5`: `SVSTATE.srcstep` is returned.
335 * `SVi=6`: `SVSTATE.dststep` is returned.
336
337 Testing any end condition of any loop of any REMAP state allows branches to be used to create loops.
338
*Programmers should be aware that VL, srcstep and dststep are global in nature.
Nested looping with different schedules is perfectly possible, as is
calling of functions, however SVSTATE (and any associated SVSHAPEs) should be stored on the stack.*
342
343 **SUBVL**
344
Sub-vector elements are not to be considered "Vertical". The vec2/3/4
is to be considered as if it were the "single element". Caveats exist for
[[sv/mv.swizzle]] and [[sv/mv.vec]] when Pack/Unpack is enabled,
due to the order in which VL and SUBVL loops are applied being
swapped (outer-inner becomes inner-outer).
350
351 # Examples
352
353 ## Core concept loop
354
355 ```
356 loop:
357 setvl a3, a0, MVL=8 # update a3 with vl
358 # (# of elements this iteration)
359 # set MVL to 8
360 # do vector operations at up to 8 length (MVL=8)
361 # ...
362 sub a0, a0, a3 # Decrement count by vl
363 bnez a0, loop # Any more?
364 ```
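
The strip-mining pattern of this loop can be modelled in Python. The sketch is hypothetical (the function name and the list of chunk sizes are for illustration only) and it models nothing but the counting: each pass takes `MIN(remaining, MVL)` elements, as `setvl` does, and subtracts that from the count.

```python
def strip_mine(count: int, mvl: int = 8):
    """Return the VL used on each pass of the core concept loop."""
    vls = []
    remaining = count
    while True:
        vl = min(remaining, mvl)  # setvl: VL = MIN(count, MVL)
        vls.append(vl)            # vector operations run on vl elements here
        remaining -= vl           # sub: decrement count by vl
        if remaining == 0:        # bnez: any more?
            break
    return vls
```

For a count of 20 with MVL=8 the passes process 8, 8 and then 4 elements. Like the assembly, the body runs once even when the count is zero, with VL=0.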
365
366 ## Loop using Rc=1
367
368 my_fn:
369 li r3, 1000
370 b test
371 loop:
372 sub r3, r3, r4
373 ...
374 test:
375 setvli. r4, r3, MVL=64
376 bne cr0, loop
377 end:
378 blr
379
380 ## Load/Store-Multi (selective)
381
Up to 64 FPRs will be loaded here. `r3` has one bit set
for each FP register required to be loaded. The block of memory
from which the registers are loaded is contiguous (no gaps):
any FP register which has a corresponding zero bit in `r3`
is *unaltered*. In essence this is a selective LD-multi with
"Scatter" capability.

    setvli r0, MVL=64, VL=64
    sv.fld/dm=r3 *r0, 0(r30) # selective load 64 FP registers

Up to 64 FPRs will be saved here. Again, `r3` has one bit set
for each FP register required to be saved.

    setvli r0, MVL=64, VL=64
    sv.stfd/sm=r3 *fp0, 0(r30) # selective store 64 FP registers
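
A rough Python model of the selective load may help. Several things here are assumptions rather than statements of the specification: the function name is hypothetical, bit `i` of the mask (counting from the least significant bit) is taken to correspond to register `i`, and "contiguous (no gaps)" is read as meaning that consecutive memory slots are consumed one per *selected* register.

```python
def selective_load(fprs, memory, dest_mask, vl=64):
    """Load consecutive memory slots into the registers selected by dest_mask.

    Assumptions: bit i (LSB0) of dest_mask selects fprs[i]; memory is
    consumed contiguously, one slot per selected register.
    """
    src = 0
    for dst in range(vl):
        if (dest_mask >> dst) & 1:
            fprs[dst] = memory[src]  # selected: take the next memory slot
            src += 1
        # unselected registers are left unaltered
    return fprs
```

With `dest_mask=0b10100010` and VL=8, registers 1, 5 and 7 receive the first three memory slots and all other registers keep their previous contents.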
396
397 -------------
398
399 \newpage{}
400
401 # SVSTATE SPR
402
403 The format of the SVSTATE SPR is as follows:
404
405 | Field | Name | Description |
406 | ----- | -------- | --------------------- |
407 | 0:6 | maxvl | Max Vector Length |
408 | 7:13 | vl | Vector Length |
409 | 14:20 | srcstep | for srcstep = 0..VL-1 |
410 | 21:27 | dststep | for dststep = 0..VL-1 |
411 | 28:29 | dsubstep | for substep = 0..SUBVL-1 |
412 | 30:31 | ssubstep | for substep = 0..SUBVL-1 |
413 | 32:33 | mi0 | REMAP RA/FRA/BFA SVSHAPE0-3 |
414 | 34:35 | mi1 | REMAP RB/FRB/BFB SVSHAPE0-3 |
415 | 36:37 | mi2 | REMAP RC/FRT SVSHAPE0-3 |
416 | 38:39 | mo0 | REMAP RT/FRT/BF SVSHAPE0-3 |
417 | 40:41 | mo1 | REMAP EA/RS/FRS SVSHAPE0-3 |
418 | 42:46 | SVme | REMAP enable (RA-RT) |
419 | 47:52 | rsvd | reserved |
| 53    | pack     | PACK (srcstep reorder) |
421 | 54 | unpack | UNPACK (dststep order) |
422 | 55:61 | hphint | Horizontal Hint |
423 | 62 | RMpst | REMAP persistence |
424 | 63 | vfirst | Vertical First mode |
425
426 Notes:
427
428 * The entries are truncated to be within range. Attempts to set VL to
429 greater than MAXVL will truncate VL.
430 * Setting srcstep, dststep to 64 or greater, or VL or MVL to greater
431 than 64 is reserved and will cause an illegal instruction trap.
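
The field layout can be expressed as a table of bit ranges with a pair of accessors. The Python below is a sketch only: `SVSTATE_FIELDS`, `get_field` and `set_field` are hypothetical names, SVSTATE is assumed to be a 64-bit value, and bit numbering is MSB0 as in the table above (bit 0 is the most significant bit). Range checks and traps from the Notes are not modelled.

```python
# (start, end) bit positions, inclusive, MSB0 numbering of a 64-bit SPR
SVSTATE_FIELDS = {
    "maxvl": (0, 6), "vl": (7, 13), "srcstep": (14, 20), "dststep": (21, 27),
    "dsubstep": (28, 29), "ssubstep": (30, 31),
    "mi0": (32, 33), "mi1": (34, 35), "mi2": (36, 37),
    "mo0": (38, 39), "mo1": (40, 41), "SVme": (42, 46),
    "pack": (53, 53), "unpack": (54, 54), "hphint": (55, 61),
    "RMpst": (62, 62), "vfirst": (63, 63),
}


def get_field(svstate: int, name: str) -> int:
    """Read one named field from a 64-bit SVSTATE value."""
    start, end = SVSTATE_FIELDS[name]
    width = end - start + 1
    return (svstate >> (63 - end)) & ((1 << width) - 1)


def set_field(svstate: int, name: str, value: int) -> int:
    """Return a copy of svstate with one named field replaced."""
    start, end = SVSTATE_FIELDS[name]
    width = end - start + 1
    shift = 63 - end
    mask = ((1 << width) - 1) << shift
    return (svstate & ~mask) | ((value << shift) & mask)
```

For instance `set_field(0, "vfirst", 1)` yields the value 1, because `vfirst` is bit 63, the least significant bit.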
432
433 **SVSTATE Fields**
434
SVSTATE is a standard SPR that (if REMAP is not activated) contains sufficient
self-contained information for a full context save/restore.
437 SVSTATE contains (and permits setting of):
438
439 * MVL (the Maximum Vector Length) - declares (statically) how
440 much of a regfile is to be reserved for Vector elements
441 * VL - Vector Length
442 * dststep - the destination element offset of the current parallel
443 instruction being executed
444 * srcstep - for twin-predication, the source element offset as well.
445 * ssubstep - the source subvector element offset of the current
446 parallel instruction being executed
447 * dsubstep - the destination subvector element offset of the current
448 parallel instruction being executed
449 * vfirst - Vertical First mode. srcstep, dststep and substep
450 **do not advance** unless explicitly requested to do so with
451 pseudo-op svstep (a mode of setvl)
452 * RMpst - REMAP persistence. REMAP will apply only to the following
453 instruction unless this bit is set, in which case REMAP "persists".
454 Reset (cleared) on use of the `setvl` instruction if used to
455 alter VL or MVL.
456 * Pack - if set then srcstep/substep VL/SUBVL loop-ordering is inverted.
457 * UnPack - if set then dststep/substep VL/SUBVL loop-ordering is inverted.
458 * hphint - Horizontal Parallelism Hint. Indicates that
459 no Hazards exist between groups of elements in sequential multiples of this number
460 (before REMAP). By definition: elements for which `FLOOR(srcstep/hphint)` is
461 equal *before REMAP* are in the same parallelism "group". In Vertical First Mode
462 hardware **MUST ONLY** process elements in the same group, and must stop
463 Horizontal Issue at the last element of a given group. Set to zero to indicate "no hint".
464 * SVme - REMAP enable bits, indicating which register is to be
465 REMAPed: RA, RB, RC, RT and EA are the canonical (typical) register names
466 associated with each bit, with RA being the LSB and EA being the MSB.
467 See table below for ordering. When `SVme` is zero (0b00000) REMAP
468 is **fully disabled and inactive** regardless of the contents of
469 `SVSTATE`, `mi0-mi2/mo0-mo1`, or the four `SVSHAPEn` SPRs
* mi0-mi2/mo0-mo1 - these indicate the SVSHAPE (0-3) that the
  corresponding register (RA etc) should use. Each takes effect only
  when that register's corresponding SVme bit is set.
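
The `hphint` grouping rule, `FLOOR(srcstep/hphint)`, can be illustrated in Python. The function name is hypothetical, REMAP is not modelled, and returning `None` for a zero `hphint` is merely this sketch's way of representing "no hint".

```python
def hphint_groups(vl: int, hphint: int):
    """Partition srcstep values 0..vl-1 into hphint parallelism groups."""
    if hphint == 0:
        return None  # "no hint": no grouping information is given
    groups = {}
    for srcstep in range(vl):
        # elements with equal FLOOR(srcstep/hphint) share a group
        groups.setdefault(srcstep // hphint, []).append(srcstep)
    return list(groups.values())
```

With VL=10 and `hphint=4` the groups are elements 0-3, 4-7 and 8-9: the last group is simply shorter.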
473
474 Programmer's Note: the fact that REMAP is entirely dormant when `SVme` is zero
475 allows establishment of REMAP context well in advance, followed by utilising `svremap`
476 at a precise (or the very last) moment. Some implementations may exploit this
477 to cache (or take some time to prepare caches) in the background whilst other
478 (unrelated) instructions are being executed. This is particularly important to
479 bear in mind when using `svindex` which will require hardware to perform (and
480 cache) additional GPR reads.
481
Programmer's Note: when REMAP is activated it becomes necessary on any
context-switch (Interrupt or Function call) to detect (or know in advance)
that REMAP is enabled and to additionally save/restore the four SVSHAPE
SPRs, SVSHAPE0-3. Given that this is expected to be a rare occurrence it was
deemed unreasonable to burden every context-switch or function call with
mandatory save/restore of SVSHAPEs, and consequently it is a *callee*
(and Trap Handler) responsibility. Callees (and Trap Handlers) **MUST**
avoid using any SVP64 instructions during the period where state
could be adversely affected. SVP64 relies purely on Scalar instructions,
so Scalar instructions (except the SVP64 Management ones and mtspr and
mfspr) are 100% guaranteed to have zero impact on SVP64 state.
493
494 **Max Vector Length (maxvl)** <a name="mvl" />
495
496 MAXVECTORLENGTH is the same concept as MVL in RISC-V RVV, except that it
497 is variable length and may be dynamically set (normally from an immediate
498 field only). MVL is limited to 7 bits
499 (in the first version of SVP64) and consequently the maximum number of
500 elements is limited to between 0 and 127.
501
502 Programmer's Note: Except by directly using `mtspr` on SVSTATE, which may
503 result in performance penalties on some hardware implementations, SVSTATE's `maxvl`
504 field may only be set **statically** as an immediate, by the `setvl` instruction.
505 It may **NOT** be set dynamically from a register. Compiler writers and assembly
506 programmers are expected to perform static register file analysis, subdivision,
507 and allocation and only utilise `setvl`. Direct writing to SVSTATE in order to
508 "bypass" this Note could, in less-advanced implementations, potentially cause stalling,
509 particularly if SVP64 instructions are issued directly after the `mtspr` to SVSTATE.
510
511 **Vector Length (vl)** <a name="vl" />
512
`SVSTATE.vl` is the actual Vector length: the number of elements in a "Vector". It may be set
entirely dynamically at runtime from a number of sources. `setvl` is the primary
instruction for setting Vector Length.
`setvl` is conceptually similar to, but different from, the Cray, SX Aurora, and RISC-V RVV
equivalents. Similar to RVV, VL is set to be within
the range 0 <= VL <= MVL. Unlike RVV, VL is set **exactly** according to the following:
519
520 VL = (RT|0) = MIN(vlen, MVL)
521
522 where 0 <= MVL <= 127 and vlen may come from an immediate, `RA`, or from the `CTR` SPR,
523 depending on options selected with the `setvl` instruction.
524
525 Programmer's Note: conceptual understanding of Cray-style Vectors is far beyond the scope
526 of the Power ISA Technical Reference. Guidance on the 50-year-old Cray Vector paradigm is
527 best sought elsewhere: good studies include Academic Courses given on the 1970s
528 Cray Supercomputers over at least the past three decades.
529
530 **SUBVL - Sub Vector Length**
531
532 This is a "group by quantity" that effectively asks each iteration
533 of the hardware loop to load SUBVL elements of width elwidth at a
534 time. Effectively, SUBVL is like a SIMD multiplier: instead of just 1
535 operation issued, SUBVL operations are issued.
536
537 The main effect of SUBVL is that predication bits are applied per
538 **group**, rather than by individual element. Legal values are 0 to 3,
539 representing 1 operation (1 element) thru 4 operations (4 elements) respectively.
Elements are best thought of in the context of 3D, Audio and Video: two Left and Right
541 Channel "elements" or four ARGB "elements", or three XYZ coordinate "elements".
542
543 `subvl` is again primarily set by the `setvl` instruction. Not to be confused
544 with `hphint`.
545
Directly related to `subvl` are the `pack` and `unpack` Mode bits of `SVSTATE`.
547 See `svstep` instruction for how to set Pack and Unpack Modes.
548
549
550 **Horizontal Parallelism**
551
552 A problem exists for hardware where it may not be able to detect
553 that a programmer (or compiler) knows of opportunities for parallelism
554 and lack of overlap between loops.
555
556 For hphint, the number chosen must be consistently
557 executed **every time**. Hardware is not permitted to execute five
558 computations for one instruction then three on the next.
559 hphint is a hint from the compiler to hardware that exactly this
560 many elements may be safely executed in parallel, without hazards
561 (including Memory accesses).
562 Interestingly, when hphint is set equal to VL, it is in effect
563 as if Vertical First mode were not set, because the hardware is
564 given the option to run through all elements in an instruction.
565 This is exactly what Horizontal-First is: a for-loop from 0 to VL-1
566 except that the hardware may *choose* the number of elements.
567
568 *Note to programmers: changing VL during the middle of such modes
569 should be done only with due care and respect for the fact that SVSTATE
570 has exactly the same peer-level status as a Program Counter.*
571
572 -------------
573
574 \newpage{}
575
576 # SVL-Form
577
578 Add the following to Book I, 1.6.1, SVL-Form
579
580 ```
581 |0 |6 |11 |16 |23 |24 |25 |26 |31 |
582 | PO | RT | RA | SVi |ms |vs |vf | XO |Rc |
583 | PO | RT | / | SVi |/ |/ |vf | XO |Rc |
584 ```
585
586 * Add `SVL` to `RA (11:15)` Field in Book I, 1.6.2
587 * Add `SVL` to `RT (6:10)` Field in Book I, 1.6.2
588 * Add `SVL` to `Rc (31)` Field in Book I, 1.6.2
* Add `SVL` to `XO (26:30)` Field in Book I, 1.6.2
590
591 Add the following to Book I, 1.6.2
592
593 ```
594 ms (23)
595 Field used in Simple-V to specify whether MVL (maxvl in the SVSTATE SPR)
596 is to be set
597 Formats: SVL
598 vf (25)
599 Field used in Simple-V to specify whether "Vertical" Mode is set
600 (vfirst in the SVSTATE SPR)
601 Formats: SVL
602 vs (24)
603 Field used in Simple-V to specify whether VL (vl in the SVSTATE SPR) is to be set
604 Formats: SVL
605 SVi (16:22)
606 Simple-V immediate field used by setvl for setting VL or MVL
607 (vl, maxvl in the SVSTATE SPR)
608 and used as a "Mode of Operation" selector in svstep
609 Formats: SVL
610 ```
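
The SVL-Form layout can be checked with a short Python decoder. It is a sketch only: `decode_svl` is a hypothetical helper, bit numbering is MSB0 within a 32-bit instruction word, and no `PO` or `XO` opcode values are assumed (they are left as zero in the example).

```python
def decode_svl(insn: int) -> dict:
    """Split a 32-bit word into SVL-Form fields (MSB0 bit numbering)."""
    def bits(start, end):
        width = end - start + 1
        return (insn >> (31 - end)) & ((1 << width) - 1)

    return {
        "PO": bits(0, 5), "RT": bits(6, 10), "RA": bits(11, 15),
        "SVi": bits(16, 22), "ms": bits(23, 23), "vs": bits(24, 24),
        "vf": bits(25, 25), "XO": bits(26, 30), "Rc": bits(31, 31),
    }


# Hypothetical word: RT=5, RA=3, SVi=7, ms=1, vs=1, vf=0, Rc=1; PO and XO zero.
example = (5 << 21) | (3 << 16) | (7 << 9) | (1 << 8) | (1 << 7) | 1
fields = decode_svl(example)
```

For `svstep` the `RA`, `ms` and `vs` positions are reserved (`/`), so a decoder for that instruction would ignore those entries.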
611
612 # Appendices
613
614 Appendix E Power ISA sorted by opcode
615 Appendix F Power ISA sorted by version
616 Appendix G Power ISA sorted by Compliancy Subset
617 Appendix H Power ISA sorted by mnemonic
618
619 | Form | Book | Page | Version | mnemonic | Description |
620 |------|------|------|---------|----------|-------------|
621 | SVL | I | # | 3.0B | svstep | Vertical-First Stepping and status reporting |
622 | SVL | I | # | 3.0B | setvl | Cray-like establishment of Looping (Vector) context |
623