# setvl: Set Vector Length

See links:

* <http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-November/001366.html>
* <https://bugs.libre-soc.org/show_bug.cgi?id=535>
* <https://bugs.libre-soc.org/show_bug.cgi?id=587>
* <https://bugs.libre-soc.org/show_bug.cgi?id=568> TODO
* <https://bugs.libre-soc.org/show_bug.cgi?id=927> bug - RT>=32
* <https://bugs.libre-soc.org/show_bug.cgi?id=862> VF Predication
* <https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vsetvlivsetvl-instructions>
* [[sv/svstep]]
* pseudocode [[openpower/isa/simplev]]

Add the following section to the Simple-V Chapter

## setvl

SVL-Form

| 0-5|6-10|11-15|16-22 | 23 24 25 | 26-30 |31| FORM     |
| -- | -- | --- | ---- |----------| ----- |--|----------|
|PO  | RT | RA  | SVi  | ms vs vf | XO    |Rc| SVL-Form |

* setvl RT,RA,SVi,vf,vs,ms (Rc=0)
* setvl. RT,RA,SVi,vf,vs,ms (Rc=1)

Pseudo-code:

```
overflow <- 0b0    # sets CR.SO if set and if Rc=1
VLimm <- SVi + 1
# set or get MVL
if ms = 1 then MVL <- VLimm[0:6]
else           MVL <- SVSTATE[0:6]
# set or get VL
if vs = 0                then VL <- SVSTATE[7:13]
else if _RA != 0         then
    if (RA) >u 0b1111111 then
        VL <- 0b1111111
        overflow <- 0b1
    else                      VL <- (RA)[57:63]
else if _RT = 0          then VL <- VLimm[0:6]
else if CTR >u 0b1111111 then
    VL <- 0b1111111
    overflow <- 0b1
else                          VL <- CTR[57:63]
# limit VL to within MVL
if VL >u MVL then
    overflow <- 0b1
    VL <- MVL
SVSTATE[0:6] <- MVL
SVSTATE[7:13] <- VL
if _RT != 0 then
    GPR(_RT) <- [0]*57 || VL
# MAXVL is a static "state-reset" opportunity so VF is only set then.
if ms = 1 then
    SVSTATE[63] <- vf   # set Vertical-First mode
    SVSTATE[62] <- 0b0  # clear persist bit
```
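
The pseudocode above can be followed step by step in a small Python model. This is only a sketch for experimenting with the rules, not a reference implementation: the `SVState` class, the `setvl` function signature and the use of a plain list for the GPRs are all hypothetical. It also leaves out `SVi=127` (an immediate of 128), because the pseudocode keeps only seven bits of `VLimm` and the sketch does not guess how that case is held in the seven-bit fields.

```python
from dataclasses import dataclass

MAX7 = 0b1111111  # largest value a 7-bit VL or MVL field can hold


@dataclass
class SVState:
    """Hypothetical stand-in for the SVSTATE fields that setvl touches."""
    mvl: int = 0      # SVSTATE[0:6]
    vl: int = 0       # SVSTATE[7:13]
    persist: int = 0  # SVSTATE[62]
    vfirst: int = 0   # SVSTATE[63]


def setvl(st, gpr, ctr, rt, ra, svi, vf, vs, ms):
    """Update st and gpr in place; return the overflow flag.

    rt and ra are register numbers, gpr is a list of register values
    and ctr is the value of CTR. With Rc=1 the flag would go to CR0.SO.
    """
    if not 0 <= svi < MAX7:
        # Assumption: SVi=127 (VLimm=128) is left out of this sketch.
        raise ValueError("SVi outside the range modelled here")
    overflow = 0
    vlimm = svi + 1
    # set or get MVL
    mvl = vlimm if ms else st.mvl
    # set or get VL
    if not vs:
        vl = st.vl
    elif ra != 0:
        vl = gpr[ra]
    elif rt == 0:
        vl = vlimm
    else:
        vl = ctr
    # saturate a register or CTR source to seven bits
    if vl > MAX7:
        vl, overflow = MAX7, 1
    # limit VL to within MVL
    if vl > mvl:
        vl, overflow = mvl, 1
    st.mvl, st.vl = mvl, vl
    if rt != 0:
        gpr[rt] = vl
    # Vertical-First and persist are only updated when MVL is set
    if ms:
        st.vfirst, st.persist = vf, 0
    return overflow
```

For instance, with `gpr[3] = 1000` and a call using `ra=3`, `rt=4`, `svi=63`, `vs=1`, `ms=1`, the sketch first saturates and then clamps VL to 64, writes 64 to `gpr[4]` and returns an overflow of 1.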

Special Registers Altered:

```
CR0                     (if Rc=1)
SVSTATE
```

* `SVi` - bits 16-22 - an immediate operand for setting MVL and/or VL
* `ms` - bit 23 - allows for setting of MVL
* `vs` - bit 24 - allows for setting of VL
* `vf` - bit 25 - sets "Vertical First Mode".

Note that in immediate setting mode VL and MVL start from **one**,
but that this is compensated for in the assembly notation:
an immediate value of 1 in assembler notation
actually places the value 0b0000000 in the `SVi` field bits.
On execution the `setvl` instruction adds one to the decoded
`SVi` field bits, resulting in
VL/MVL being set to 1. This allows VL to be set to values
ranging from 1 to 128 with only 7 bits instead of 8.
Setting VL/MVL
to 0 would result in all Vector operations becoming `nop`. If this
(nop behaviour) is truly desired then VL and MVL are to be set to zero
via the [[SVSTATE SPR|sv/sprs]].
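
The off-by-one relationship between the assembler immediate and the `SVi` field can be written down as a pair of helper functions. The names `encode_svi` and `decode_svi` are hypothetical and exist only for this illustration; the range check simply follows the 1 to 128 range stated above.

```python
def encode_svi(value):
    """Return the 7-bit SVi field for an assembler-level VL/MVL value."""
    if not 1 <= value <= 128:
        raise ValueError("VL/MVL immediate must be between 1 and 128")
    return value - 1


def decode_svi(field):
    """Return the value setvl derives from the SVi field (SVi + 1)."""
    if not 0 <= field <= 0b1111111:
        raise ValueError("SVi is a 7-bit field")
    return field + 1
```

An assembler immediate of 1 therefore encodes as `0b0000000`, and there is no encoding at all for an immediate of 0.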

Note that `setmvli` is a pseudo-op, based on RA/RT=0, and `setvli` likewise:

    setvli VL=8    : setvl  r0, r0, VL=8, vf=0, vs=1, ms=0
    setvli. VL=8   : setvl. r0, r0, VL=8, vf=0, vs=1, ms=0
    setmvli MVL=8  : setvl  r0, r0, MVL=8, vf=0, vs=0, ms=1
    setmvli. MVL=8 : setvl. r0, r0, MVL=8, vf=0, vs=0, ms=1

Additional pseudo-op for obtaining VL without modifying it (or any state):

    getvl r5  : setvl  r5, r0, vf=0, vs=0, ms=0
    getvl. r5 : setvl. r5, r0, vf=0, vs=0, ms=0

Note that whilst it is possible to set both MVL and VL from the same
immediate, it is not possible to set them to different immediates in
the same instruction. Doing so would require two instructions.

Use of setvl results in changes to the SVSTATE SPR. See [[sv/sprs]].

**Selecting sources for VL**

There is considerable opcode pressure. Consequently, setting MVL and VL
from different sources works as follows:

| condition            | effect                        |
| -                    | -                             |
| `vs=1, RA=0, RT!=0`  | VL,RT set to MIN(MVL, CTR)    |
| `vs=1, RA=0, RT=0`   | VL set to MIN(MVL, SVi+1)     |
| `vs=1, RA!=0, RT=0`  | VL set to MIN(MVL, RA)        |
| `vs=1, RA!=0, RT!=0` | VL,RT set to MIN(MVL, RA)     |

The reasoning here is that the opportunity to set RT equal to the
immediate `SVi+1` is sacrificed in favour of setting from CTR.
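
The table can be read as a decision on the register *numbers* in the `RA` and `RT` fields, not on their contents. A minimal sketch, with the hypothetical function name `vl_source` and string labels chosen purely for illustration:

```python
def vl_source(vs, ra, rt):
    """Return (source, writes_rt) for register numbers ra and rt."""
    if not vs:
        source = "SVSTATE"   # VL is read back, not set
    elif ra != 0:
        source = "RA"
    elif rt == 0:
        source = "SVi+1"
    else:
        source = "CTR"
    # RT receives the resulting VL whenever its register number is non-zero
    return source, rt != 0
```

In every `vs=1` case the chosen value is still limited to MVL afterwards, as the MIN() entries in the table indicate.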

## Unusual Rc=1 behaviour

Normally, the return result from an instruction is in `RT`. With
it being possible for `RT=0` to mean that `CTR` mode is to be read,
some different semantics are needed.

CR Field 0, when `Rc=1`, may be set even if `RT=0`. The reason is that
overflow may occur: `VL`, if set either from an immediate or from `CTR`,
may not exceed `MAXVL`, and if it does, `CR0.SO` must be set.

In reality it is **`VL`** being set. Therefore, rather
than `CR0` testing `RT` when `Rc=1`, `CR0.EQ` is set if `VL=0`, and `CR0.GE`
is set if `VL` is non-zero.

**SUBVL**

Sub-vector elements are not to be considered "Vertical". The vec2/3/4
is to be treated as if it were a single element. Caveats exist for
[[sv/mv.swizzle]] and [[sv/mv.vec]] when Pack/Unpack is enabled,
due to the order in which VL and SUBVL loops are applied being
swapped (outer-inner becomes inner-outer).

## Examples

### Core concept loop

```
loop:
    setvl a3, a0, MVL=8    # update a3 with vl
                           # (# of elements this iteration)
                           # set MVL to 8
    # do vector operations at up to 8 length (MVL=8)
    # ...
    sub a0, a0, a3         # Decrement count by vl
    bnez a0, loop          # Any more?
```
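
The effect of the loop above is to split a count of elements into chunks of at most MVL. The following sketch mimics only the control flow, under the assumption that `setvl` here yields MIN(MVL, count); the function name `strip_mine` is hypothetical.

```python
def strip_mine(n, mvl=8):
    """Return the list of vl values used on successive loop passes."""
    chunks = []
    while True:
        vl = min(mvl, n)    # setvl a3, a0, MVL=8
        chunks.append(vl)   # vector operations would cover vl elements
        n -= vl             # sub a0, a0, a3
        if n == 0:          # bnez a0, loop
            break
    return chunks
```

For a count of 20 this gives passes of 8, 8 and 4 elements. As in the assembler, the body runs once even when the initial count is zero, in which case vl is 0.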

### Loop using Rc=1

    my_fn:
      li r3, 1000
      b test
    loop:
      sub r3, r3, r4
      ...
    test:
      setvli. r4, r3, MVL=64
      bne cr0, loop
    end:
      blr

### Load/Store-Multi (selective)

Up to 64 FPRs will be loaded here. `r3` has one bit set
for each FP register required to be loaded. The block of memory
from which the registers are loaded is contiguous (no gaps):
any FP register which has a corresponding zero bit in `r3`
is *unaltered*. In essence this is a selective LD-multi with
"Scatter" capability.

    setvli r0, MVL=64, VL=64
    sv.fld/dm=r3 *r0, 0(r30) # selective load 64 FP registers

Up to 64 FPRs will be saved here. Again, `r3` has one bit set
for each FP register required to be saved.

    setvli r0, MVL=64, VL=64
    sv.stfd/sm=r3 *fp0, 0(r30) # selective store 64 FP registers

[[!tag standards]]

------

\newpage{}