-[[!tag standards]]
-
-# DRAFT setvl/setvli
+# setvl: Set Vector Length
See links:
* [[sv/svstep]]
* pseudocode [[openpower/isa/simplev]]
-Use of setvl results in changes to the SVSTATE SPR. see [[sv/sprs]]
-
-# Behaviour and Rationale
-
-SV's Vector Engine is based on Cray-style Variable-length Vectorisation,
-just like RVV. However unlike RVV, SV sits on top of the standard Scalar
-regfiles: there is no separate Vector register numbering. Therefore, also
-unlike RVV, SV does not have hard-coded "Lanes": microarchitects
-may use *ordinary* in-order, out-of-order, or superscalar designs
-as the basis for SV. By contrast, the relevant parameter
-in RVV is "MAXVL" and this is architecturally hard-coded into RVV systems,
-anywhere from 1 to tens of thousands of Lanes in supercomputers.
-
-SV is more like how MMX used to sit on top of the x86 FP regfile.
-Therefore when Vector operations are performed, the question has to
-be asked, "well, how much of the regfile do you want to allocate to
-this operation?" because if it is too small an amount performance may
-be affected, and if too large then other registers would overlap and
-cause data corruption, or even if allocated correctly would require
-spill to memory.
-
-The answer effectively needs to be parameterised. Hence: MAXVL (MVL)
-is set from an immediate, so that the compiler may decide, statically, a
-guaranteed resource allocation according to the needs of the application.
-
-While RVV's MAXVL was a hw limit, SV's MVL is simply a loop
-optimization. It does not carry side-effects for the arch, though for
-a specific cpu it may affect hw unit usage.
-
-Other than being able to set MVL, SV's VL (Vector Length) works just like
-RVV's VL, with one minor twist. RVV permits the `setvl` instruction to
-set VL to an arbitrary explicit value. Within the limit of MVL, VL
-**MUST** be set to the requested value. Given that RVV only works on Vector Loops,
-this is fine and part of its value and design. However, SV sits on top
-of the standard register files. When MVL=VL=2, a Vector Add on `r3`
-will perform two Scalar Adds: one on `r3` and one on `r4`.
-
-Thus there is the opportunity to set VL to an explicit value (within the
-limits of MVL) with the reasonable expectation that if two operations
-are requested (by setting VL=2) then two operations are guaranteed.
-This avoids the need for a loop (with not-insignificant use of the
-regfiles for counters), simply two instructions:
-
- setvli r0, MVL=64, VL=64
- sv.ld *r0, 0(r30) # load exactly 64 registers from memory
+Add the following section to the Simple-V Chapter.
-Page Faults etc. aside this is *guaranteed* 100% without fail to perform
-64 unit-strided LDs starting from the address pointed to by r30 and put
-the contents into r0 through r63. Thus it becomes a "LOAD-MULTI". Twin
-Predication could even be used to only load relevant registers from
-the stack. This *only works if VL is set to the requested value* rather
-than, as in RVV, allowing the hardware to set VL to an arbitrary value
-(due to variances in implementation choices).
+## setvl
-Also available is the option to set VL from CTR (`VL = MIN(CTR, MVL)`.
-In combination with SVP64 [[sv/branches]] this can save one instruction
-inside critical inner loops. A caveat: to avoid having an extra opcode
-bit in `setvl`, selection of CTR mode is slightly convoluted.
+SVL-Form
-# Format
+| 0-5|6-10|11-15|16-22 | 23 24 25 | 26-30 |31| FORM |
+| -- | -- | --- | ---- |----------| ----- |--|----------|
+|PO | RT | RA | SVi | ms vs vf | XO |Rc| SVL-Form |
-*(Allocation of opcode TBD pending OPF ISA WG approval)*,
-using EXT22 temporarily and fitting into the
-[[sv/bitmanip]] space
+* setvl RT,RA,SVi,vf,vs,ms (Rc=0)
+* setvl. RT,RA,SVi,vf,vs,ms (Rc=1)
-Form: SVL-Form (see [[isatables/fields.text]])
+Pseudo-code:
-| 0.5|6.10|11.15|16..22| 23...25 | 26.30 |31| name |
-| -- | -- | --- | ---- |----------- | ----- |--| ------- |
-|OPCD| RT | RA | SVi | ms vs vf | 11011 |Rc| setvl |
-
-Instruction format:
+```
+ overflow <- 0b0 # sets CR0.SO if set and if Rc=1
+ VLimm <- SVi + 1
+ # set or get MVL
+ if ms = 1 then MVL <- VLimm[0:6]
+ else MVL <- SVSTATE[0:6]
+ # set or get VL
+ if vs = 0 then VL <- SVSTATE[7:13]
+ else if _RA != 0 then
+ if (RA) >u 0b1111111 then
+ VL <- 0b1111111
+ overflow <- 0b1
+ else VL <- (RA)[57:63]
+ else if _RT = 0 then VL <- VLimm[0:6]
+ else if CTR >u 0b1111111 then
+ VL <- 0b1111111
+ overflow <- 0b1
+ else VL <- CTR[57:63]
+ # limit VL to within MVL
+ if VL >u MVL then
+ overflow <- 0b1
+ VL <- MVL
+ SVSTATE[0:6] <- MVL
+ SVSTATE[7:13] <- VL
+ if _RT != 0 then
+ GPR(_RT) <- [0]*57 || VL
+ # MAXVL is a static "state-reset" opportunity so VF is only set then.
+ if ms = 1 then
+ SVSTATE[63] <- vf # set Vertical-First mode
+ SVSTATE[62] <- 0b0 # clear persist bit
+```
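To make the `if`/`else` chain above easier to follow, here is a hypothetical Python model of the same pseudocode. It is a sketch, not the reference simulator. The function name `setvl`, its argument order, and the reduction of SVSTATE to a dict holding only the `MVL`, `VL`, `vf` and `persist` fields are all assumptions made for illustration. The 7-bit wrap of `SVi + 1` is likewise an assumption, read from the `VLimm[0:6]` slice.

```python
# Hypothetical Python model of the setvl pseudocode (not the reference
# simulator). SVSTATE is reduced to a dict of the fields setvl touches;
# GPRs are a list of non-negative ints and CTR is a plain int.

MASK7 = 0b1111111  # MVL and VL are 7-bit fields in SVSTATE


def setvl(state, gpr, ctr, RT, RA, SVi, vf, vs, ms):
    """Apply setvl to `state` and `gpr`; return the overflow flag."""
    overflow = False
    vlimm = (SVi + 1) & MASK7      # VLimm[0:6]: assumed to wrap at 7 bits
    # set or get MVL
    mvl = vlimm if ms else state["MVL"]
    # set or get VL
    if not vs:
        vl = state["VL"]
    elif RA != 0:                  # VL from (RA), saturating at 127
        vl = min(gpr[RA], MASK7)
        overflow = gpr[RA] > MASK7
    elif RT == 0:                  # RA=0, RT=0: VL from the immediate
        vl = vlimm
    else:                          # RA=0, RT!=0: VL from CTR, saturating
        vl = min(ctr, MASK7)
        overflow = ctr > MASK7
    # limit VL to within MVL
    if vl > mvl:
        overflow = True
        vl = mvl
    state["MVL"], state["VL"] = mvl, vl
    if RT != 0:
        gpr[RT] = vl               # zero-extended VL written to RT
    # MAXVL is a static "state-reset" opportunity so VF is only set then
    if ms:
        state["vf"] = vf           # SVSTATE[63]: Vertical-First mode
        state["persist"] = 0       # SVSTATE[62]: persist bit cleared
    return overflow
```

With this model, a call with `vs=0, ms=0, RA=0` and a non-zero `RT` leaves MVL and VL untouched and copies the current VL into `RT`. This is the "get VL" use of the instruction.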
- setvl RT,RA,SVi,vf,vs,ms
- setvl. RT,RA,SVi,vf,vs,ms
+Special Registers Altered:
-Note that the immediate (`SVi`) spans 7 bits (16 to 22)
+```
+ CR0 (if Rc=1)
+ SVSTATE
+```
+* `SVi` - bits 16-22 - an immediate operand for setting MVL and/or VL (the value used is `SVi+1`)
* `ms` - bit 23 - allows for setting of MVL
* `vs` - bit 24 - allows for setting of VL
* `vf` - bit 25 - sets "Vertical First Mode" (only takes effect when `ms=1`).
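The bit ranges in the SVL-Form table and in the field list above use Power ISA MSB0 numbering, where bit 0 is the most significant bit of the 32-bit word. The following hypothetical Python sketch extracts and packs those fields. `PO` and `XO` values are not given in this section, so any values passed for them are placeholders.

```python
# Hypothetical SVL-Form field decoder/encoder. MSB0 numbering: bit 0 is
# the most significant bit of the 32-bit instruction word, bit 31 the least.

SVL_FIELDS = (("PO", 0, 5), ("RT", 6, 10), ("RA", 11, 15), ("SVi", 16, 22),
              ("ms", 23, 23), ("vs", 24, 24), ("vf", 25, 25),
              ("XO", 26, 30), ("Rc", 31, 31))


def field(insn, start, end):
    """Extract MSB0 bits start..end (inclusive) of a 32-bit word."""
    width = end - start + 1
    return (insn >> (31 - end)) & ((1 << width) - 1)


def decode_svl(insn):
    """Split a 32-bit SVL-Form word into its named fields."""
    return {name: field(insn, start, end) for name, start, end in SVL_FIELDS}


def encode_svl(**values):
    """Pack named field values into a 32-bit SVL-Form word."""
    insn = 0
    for name, start, end in SVL_FIELDS:
        value = values[name]
        width = end - start + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit {width} bits")
        insn |= value << (31 - end)
    return insn
```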
getvl r5 : setvl r5, r0, vf=0, vs=0, ms=0
getvl. r5 : setvl. r5, r0, vf=0, vs=0, ms=0
-This pseudocode op is different from [[sv/svstep]] which is used to
-perform detailed enquiries about internal state.
-
Note that whilst it is possible to set both MVL and VL from the same
immediate, it is not possible to set them to different immediates in
the same instruction. Doing so would require two instructions.
+Use of setvl results in changes to the SVSTATE SPR. See [[sv/sprs]].
+
**Selecting sources for VL**
There is considerable opcode pressure; consequently, to set MVL and VL
The reasoning here is that the opportunity to set RT equal to the
immediate `SVi+1` is sacrificed in favour of setting from CTR.
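The source selection follows directly from the order of tests in the pseudocode. The hypothetical helper below shows which operand supplies VL for each `RA`/`RT` combination when `vs=1`. It also makes the trade-off explicit: `RA=0, RT!=0` reads `CTR`, so VL can never be set from the immediate and written to `RT` at the same time.

```python
# Hypothetical helper: which operand supplies VL in setvl, following the
# pseudocode's if/else chain. The second result says whether RT receives VL.

def vl_source(vs, RA, RT):
    if not vs:
        src = "SVSTATE (VL unchanged)"
    elif RA != 0:
        src = "MIN(MVL, (RA))"
    elif RT == 0:
        src = "MIN(MVL, SVi+1)"
    else:
        src = "MIN(MVL, CTR)"
    return src, RT != 0


for RA, RT in ((0, 0), (0, 5), (3, 0), (3, 5)):
    src, writes_rt = vl_source(1, RA, RT)
    print(f"vs=1 RA={RA} RT={RT}: VL = {src}, RT written: {writes_rt}")
```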
-# Unusual Rc=1 behaviour
+## Unusual Rc=1 behaviour
Normally, the return result from an instruction is in `RT`. With
it being possible for `RA=0`, `RT!=0` to mean that `CTR` mode is to be read,
overflow may occur: `VL`, whether set from an immediate, from `RA` or from `CTR`,
may not exceed `MAXVL`, and if it would, `CR0.SO` must be set.
-Additionally, in reality it is **`VL`** being set. Therefore, rather
+In reality, it is **`VL`** that is being set. Therefore, rather
than `CR0` testing `RT` when `Rc=1`, CR0.EQ is set if `VL=0`, CR0.GE
is set if `VL` is non-zero.
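A hypothetical sketch of CR0 after `setvl.` follows. It reflects VL rather than RT, as described above. The sketch rests on three assumptions. First, CR0 is returned as the four bits (LT, GT, EQ, SO). Second, the "GE" named in the text is taken to be the GT bit, because VL is unsigned and CR0 has no separate GE bit. Third, SO reflects only setvl's own overflow flag.

```python
# Hypothetical sketch: CR0 bits (LT, GT, EQ, SO) after "setvl." with Rc=1.
# Assumption: the text's "GE" maps onto GT, since VL is never negative.

def setvl_cr0(vl, overflow):
    lt = 0                      # VL can never be negative
    gt = 1 if vl != 0 else 0    # VL is non-zero
    eq = 1 if vl == 0 else 0    # VL is zero
    so = 1 if overflow else 0   # VL was clipped (to MVL or to 7 bits)
    return lt, gt, eq, so
```

A loop can therefore branch on CR0.EQ to exit once VL reaches zero, without a separate compare instruction.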
-
-*Programmers should be aware that VL, srcstep and dststep are global in nature.
-Nested looping with different schedules is perfectly possible, as is
-calling of functions, however SVSTATE (and any associated SVSTATE) should be stored on the stack.*
-
**SUBVL**
Sub-vector elements are not considered "Vertical". The vec2/3/4
due to the order in which VL and SUBVL loops are applied being
swapped (outer-inner becomes inner-outer)
-# Examples
+## Examples
-## Core concept loop
+### Core concept loop
```
loop:
bnez a0, loop # Any more?
```
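The core concept is strip-mining. On each pass, VL is set to `MIN(remaining, MVL)`, VL elements are processed, and the loop repeats while elements remain. A minimal Python sketch follows. `MVL = 8` and the element-wise add are illustrative choices only, and the function name is hypothetical.

```python
# Sketch of strip-mining: each pass asks for VL = MIN(remaining, MVL),
# processes VL elements, then advances. MVL=8 is an illustrative choice.

MVL = 8


def strip_mined_add(a, b):
    n = len(a)
    out = [0] * n
    vls = []                    # VL chosen on each pass, for inspection
    i = 0
    while n > 0:                # "Any more?"
        vl = min(n, MVL)        # setvl with VL taken from a register
        for j in range(vl):     # one vector add = vl element adds
            out[i + j] = a[i + j] + b[i + j]
        vls.append(vl)
        i += vl
        n -= vl
    return out, vls
```

With 20 elements and `MVL = 8`, the passes use VL = 8, 8 and then 4.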
-## Loop using Rc=1
+### Loop using Rc=1
my_fn:
li r3, 1000
end:
blr
-## Load/Store-Multi (selective)
+### Load/Store-Multi (selective)
Up to 64 FPRs will be loaded here. `r3` has one bit set
for each FP register required to be loaded. The block of memory
setvli r0, MVL=64, VL=64
sv.stfd/sm=r3 *fp0, 0(r30) # selectively store up to 64 FP registers
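A hypothetical Python sketch shows how a predicate mask in `r3` picks out FP registers for `sv.stfd/sm=r3 *fp0` with VL=64. It assumes that element `i` (and so `fp<i>`) is selected when bit `i` of the mask, counted from the least-significant bit, is set. Memory addressing is deliberately left out.

```python
# Sketch: predicate mask in r3 selecting FPRs for sv.stfd/sm=r3 *fp0.
# Assumption: bit i (LSB0) of the mask selects element i, i.e. fp<i>.

def mask_for(fprs):
    """Build an r3 value selecting the given FPR numbers."""
    mask = 0
    for n in fprs:
        mask |= 1 << n
    return mask


def selected_fprs(mask, vl=64):
    """FPR numbers whose mask bit is set, in element order."""
    return [i for i in range(vl) if (mask >> i) & 1]
```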
+
+[[!tag standards]]
+
+------
+
+\newpage{}
+