# svstep: Vertical-First Stepping and status reporting

SVL-Form

* svstep RT,SVi,vf (Rc=0)
* svstep. RT,SVi,vf (Rc=1)

| 0-5 | 6-10 | 11-15 | 16-22 | 23-25  | 26-30 | 31 | Form     |
|-----|------|-------|-------|--------|-------|----|----------|
| PO  | RT   | /     | SVi   | / / vf | XO    | Rc | SVL-Form |

Pseudo-code:

```
if SVi[3:4] = 0b11 then
    # store pack and unpack in SVSTATE
    SVSTATE[53] <- SVi[5]
    SVSTATE[54] <- SVi[6]
    RT <- [0]*62 || SVSTATE[53:54]
else
    # Vertical-First explicit stepping.
    step <- SVSTATE_NEXT(SVi, vf)
    RT <- [0]*57 || step
```

Special Registers Altered:

CR0 (if Rc=1)

**Description**

svstep may be used both to enquire about the REMAP Schedule and to alter
Vectorisation State. When `vf=1`, stepping occurs. When `vf=0`, the
enquiry is performed without altering internal state.
If `SVi=0, Rc=0, vf=0` the instruction is a `nop`.

The following Modes exist:

* `SVi=0`: appropriately step srcstep, dststep, ssubstep and dsubstep
  to the next element, taking pack and unpack into consideration.
* When `SVi` is 1-4 the REMAP Schedule for a given SVSHAPE may be
  returned in `RT`: `SVi=1` selects the current state of SVSHAPE0,
  through to `SVi=4`, which selects SVSHAPE3.
* When `SVi` is 5, `SVSTATE.srcstep` is returned.
* When `SVi` is 6, `SVSTATE.dststep` is returned.
* When `SVi` is 7, `SVSTATE.ssubstep` is returned.
* When `SVi` is 8, `SVSTATE.dsubstep` is returned.
* When `SVi` is 0b1100, pack and unpack in SVSTATE are both cleared.
* When `SVi` is 0b1101, pack in SVSTATE is set and unpack is cleared.
* When `SVi` is 0b1110, unpack in SVSTATE is set and pack is cleared.
* When `SVi` is 0b1111, pack and unpack in SVSTATE are both set.

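As a reading aid, the list of Modes can be expressed as a small Python decoder. This is only a sketch: `describe_svi` is a hypothetical helper, not part of any simulator, and it follows the bullet list as written, treating the low bit of `SVi` as pack and the next bit as unpack for the `0b11xx` values. Values the list does not mention are reported as such rather than guessed at.

```python
def describe_svi(svi: int) -> str:
    """Describe the svstep Mode selected by SVi (hypothetical helper)."""
    if not 0 <= svi < 128:
        raise ValueError("SVi occupies bits 16-22, so it is a 7-bit field")
    if svi == 0:
        return "step to next element"
    if 1 <= svi <= 4:
        # SVi=1 selects SVSHAPE0 ... SVi=4 selects SVSHAPE3
        return f"return REMAP Schedule for SVSHAPE{svi - 1}"
    named = {5: "srcstep", 6: "dststep", 7: "ssubstep", 8: "dsubstep"}
    if svi in named:
        return f"return SVSTATE.{named[svi]}"
    if 0b1100 <= svi <= 0b1111:
        # Assumption taken from the bullet list: bit 0 = pack, bit 1 = unpack
        pack = svi & 0b01
        unpack = (svi >> 1) & 0b01
        return f"set pack={pack}, unpack={unpack}"
    return "not listed"
```

For instance, `describe_svi(0b1101)` reports that pack is set and unpack is cleared, matching the corresponding bullet.
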
As this is a Single-Predicated (1P) instruction, predication may be applied
to skip (or zero) elements.

* Vertical-First Mode will return the requested index
  (and move to the next state if `vf=1`).
* Horizontal-First Mode can be used to return all indices,
  i.e. it walks through all possible states.

**Vectorisation of svstep itself**

As a 32-bit instruction, `svstep` may itself be Vector-Prefixed, as
`sv.svstep`. This will work just as well in Horizontal-First Mode
as it will in Vertical-First Mode.

Example: to obtain the full set of possible computed element
indices use `sv.svstep *RT,SVi,1`, which will store all computed element
indices, starting from RT. If Rc=1 then a co-result Vector of CR Fields
will also be returned, comprising the "loop end-points" of each of the inner
loops when either Matrix Mode or DCT/FFT is set. For example, when the
`xdim` inner loop reaches its end, such that on the next iteration it will
begin again at zero, the CR Field's `EQ` bit will be set.
With a maximum of three loops within both Matrix and DCT/FFT Modes,
the CR Field's EQ bit will be set at the end of the first inner loop,
the LT bit for the second, the GT bit for the outermost loop and the
SO bit on the very last element, when all loops reach their maximum
extent.

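The idea of per-element "loop end-point" flags can be illustrated with a plain three-deep loop nest. The sketch below is not the REMAP algorithm itself: `loop_end_flags` is a hypothetical helper, the flags are named after the loops rather than after CR bits, and it assumes that a loop only counts as ended when every loop nested inside it is also at its last index.

```python
from itertools import product

def loop_end_flags(xdim, ydim, zdim):
    """Yield (index, inner_end, middle_end, outer_end) per element.

    Assumption: x is the innermost (fastest) loop, z the outermost.
    """
    idx = 0
    # product() varies its last argument fastest, so x is innermost
    for z, y, x in product(range(zdim), range(ydim), range(xdim)):
        inner_end = x == xdim - 1
        middle_end = inner_end and y == ydim - 1
        outer_end = middle_end and z == zdim - 1
        yield idx, inner_end, middle_end, outer_end
        idx += 1

# Collect the flags for a small 2x2x2 schedule
flags = list(loop_end_flags(2, 2, 2))
```

Under these assumptions all three flags are set only on the final element, which is the point at which every loop has reached its maximum extent.
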
*Programmer's note: in some situations, particularly with larger
Matrices (5x7x3 will set MAXVL=105), VL will cause `sv.svstep` to return a
considerable number of values. Under such circumstances `sv.svstep/ew=8`
is recommended, followed likewise by setting elwidth=8 on `svindex`.*

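The saving can be estimated with simple arithmetic. The sketch below assumes 64-bit registers, a default element width of 64 bits, and elements packed contiguously into registers; `regs_needed` is a hypothetical helper used only for this estimate.

```python
def regs_needed(vl, elwidth_bits, reg_bits=64):
    """Registers occupied by vl elements of the given width (rounded up)."""
    total_bits = vl * elwidth_bits
    return -(-total_bits // reg_bits)  # ceiling division

vl = 5 * 7 * 3                    # 105 indices for a 5x7x3 Matrix
largest_index = vl - 1            # 104, which fits in 8 bits
default_regs = regs_needed(vl, 64)
packed_regs = regs_needed(vl, 8)
```

With these assumptions the 105 indices occupy 105 registers at the default width but only 14 when the element width is 8 bits.
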
*Programmer's note: having conveniently obtained a pre-computed Schedule
with `sv.svstep`, it may then be used as the input to Indexed REMAP
Mode to achieve the exact same Schedule. It is evident however that
before use some of the Indices may be arbitrarily altered as desired.
`sv.svstep` helps the programmer avoid having to manually recreate
Indices for certain types of common Loop patterns. In its simplest form,
without REMAP (SVi=5 or SVi=6), it is equivalent to the `iota` instruction
found in other Vector ISAs.*

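A minimal Python model of this workflow follows. It is a sketch only: `iota` stands in for an unpredicated `sv.svstep` with SVi=5 and no REMAP, and `gather` models Indexed REMAP as a plain indexed read, which is a simplification. Both names are hypothetical.

```python
def iota(vl):
    """Indices 0..vl-1, as an unpredicated step sequence would produce."""
    return list(range(vl))

def gather(data, indices):
    """Read data in the order given by indices (simplified Indexed REMAP)."""
    return [data[i] for i in indices]

# Obtain the Schedule, alter two of the Indices, then apply it
indices = iota(4)
indices[0], indices[1] = indices[1], indices[0]
result = gather(["a", "b", "c", "d"], indices)
```

The point of the example is the middle step: the Schedule is ordinary data, so it can be edited before being used.
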
**Vertical First Mode**

Vertical First is effectively like an implicit single-bit predicate
applied to every SVP64 instruction. **ONLY** one element in each SVP64
Vector instruction is executed; srcstep and dststep do **not** increment
automatically on completion of one instruction, and the Program Counter
progresses **immediately** to the next instruction just as it would for
any standard scalar v3.0B instruction.

The stepping mode of `svstep` (`SVi=0`) may be called to move srcstep and
dststep on to the next element, still respecting predicate masks.

In other words, where normal SVP64 Vectorisation acts "horizontally"
by looping first through 0 to VL-1 and only then moving the PC to the
next instruction, Vertical-First moves the PC onwards (vertically)
through multiple instructions **with the same srcstep and dststep**,
after which an explicit instruction is used to advance srcstep/dststep.
An outer loop (using a branch instruction) is expected, which completes
a series of Vector operations.

Testing any end condition of any loop of any REMAP state allows branches
to be used to create loops.

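The difference in execution order can be sketched in Python. This is an illustrative model only: `horizontal_first` and `vertical_first` are hypothetical helpers, instructions are plain strings, VL is assumed to be at least 1, and predication and REMAP are ignored.

```python
def horizontal_first(instrs, vl):
    """Each instruction loops over all elements before the PC moves on."""
    return [(ins, step) for ins in instrs for step in range(vl)]

def vertical_first(instrs, vl):
    """All instructions run on one element; an explicit step then advances."""
    trace = []
    step = 0
    while True:
        for ins in instrs:
            trace.append((ins, step))  # same srcstep/dststep for each
        if step == vl - 1:             # models the end-of-loop test
            break
        step += 1                      # models the explicit stepping
    return trace

hf = horizontal_first(["add", "mul"], 2)
vf = vertical_first(["add", "mul"], 2)
```

Here `hf` visits `add` for elements 0 and 1 before touching `mul`, whereas `vf` runs `add` then `mul` on element 0 and only then moves to element 1.
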
*Programmer's note: when Predicate Non-Zeroing is used this indicates to
the underlying hardware that any masked-out element must be skipped.
*This includes in Vertical-First Mode*, and programmers should be
keenly aware that srcstep or dststep or both *may* jump by more than
one as a result, because the actual request under these circumstances
was to execute on the first available next *non-masked-out* element.
It should be evident that it is the `sv.svstep` instruction that must
be Predicated in order for the **entire** loop to use the Predicate
correctly, and it is strongly recommended for all instructions within
the same Vertical-First Loop to utilise the exact same Predicate Mask(s).*

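The "jump by more than one" behaviour described in the note can be shown with a small sketch. `next_unmasked` is a hypothetical helper that models only the skipping of masked-out elements; it assumes bit `n` of the mask corresponds to element `n`.

```python
def next_unmasked(step, mask, vl):
    """Next element after `step` whose predicate bit is set, else None."""
    for nxt in range(step + 1, vl):
        if (mask >> nxt) & 1:
            return nxt
    return None  # no further active elements: end of loop

mask = 0b10010001  # elements 0, 4 and 7 are active
first_jump = next_unmasked(0, mask, 8)
second_jump = next_unmasked(first_jump, mask, 8)
after_last = next_unmasked(second_jump, mask, 8)
```

With this mask the step moves from 0 to 4 and then to 7, so any instruction in the loop that used a different mask would see steps it did not expect.
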
Programmers should be aware that VL, srcstep and dststep and the SUBVL
substeps are global in nature. Nested looping with different schedules
is perfectly possible, as is calling of functions. However, SVSTATE
(and any associated SVSHAPEs if REMAP is being used) must be saved on
the stack in order to achieve this benefit, which is not normally
found in Vector ISAs.

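The save-and-restore discipline can be modelled in Python with a context manager. This is a sketch of the idea rather than of any real calling convention: the dictionary holds a hypothetical subset of SVSTATE fields, and `preserved` is an invented helper standing in for pushing to and popping from the stack.

```python
import copy
from contextlib import contextmanager

@contextmanager
def preserved(state):
    """Save a copy of `state` on entry and restore it on exit."""
    saved = copy.deepcopy(state)
    try:
        yield state
    finally:
        state.clear()
        state.update(saved)

# Outer loop state (hypothetical field subset)
svstate = {"vl": 8, "srcstep": 3, "dststep": 3}

with preserved(svstate) as inner:
    # A nested loop or called function sets up its own schedule
    inner.update(vl=4, srcstep=0, dststep=0)

restored = dict(svstate)
```

After the `with` block the outer loop resumes at element 3 with its original VL, exactly as if the nested loop had never run.
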
-------------

\newpage{}

# Appendix

**src_iterate**

Note that `srcstep` and `ssubstep` are not the absolute final Element
(and Sub-Element) offsets. `srcstep` still has to go through individual
`REMAP` translation before becoming a per-operand (RA, RB, RC, RT, RS)
Element-level Source offset.

Note also critically that `PACK` mode simply inverts the outer/inner
loops, making SUBVL the outer loop and VL the inner.

```
# source-stepping iterator
subvl = SVSTATE.subvl
vl = SVSTATE.vl
pack = SVSTATE.pack
unpack = SVSTATE.unpack
ssubstep = SVSTATE.ssubstep
end_ssub = ssubstep == subvl
end_src = SVSTATE.srcstep == vl-1
# first source step.
srcstep = SVSTATE.srcstep
# used below:
# sz - from RM.MODE, source-zeroing
# srcmask - from RM.MODE, the source predicate
if pack:
    # pack advances subvl in *outer* loop
    while True:
        assert srcstep <= vl-1
        end_src = srcstep == vl-1
        if end_src:
            if end_ssub:
                loopend = True
            else:
                SVSTATE.ssubstep += 1
            srcstep = 0  # reset
            break
        else:
            srcstep += 1  # advance srcstep
            if not sz:
                break
            if ((1 << srcstep) & srcmask) != 0:
                break
else:
    # advance subvl in *inner* loop
    if end_ssub:
        while True:
            assert srcstep <= vl-1
            end_src = srcstep == vl-1
            if end_src:  # end-point
                loopend = True
                srcstep = 0
                break
            else:
                srcstep += 1
            if not sz:
                break
            if ((1 << srcstep) & srcmask) != 0:
                break
            else:
                log(" sskip", bin(srcmask), bin(1 << srcstep))
        SVSTATE.ssubstep = 0b00  # reset
    else:
        # advance ssubstep
        SVSTATE.ssubstep += 1

SVSTATE.srcstep = srcstep
```
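
For experimentation, the pseudo-code above can be transliterated into runnable Python. The sketch below keeps the control flow as written, including the `sz` test, and makes no claim beyond that. `SVState`, the `loopend` return value and the `walk` helper are illustrative assumptions; `subvl` is treated as the value that `ssubstep` reaches on its last sub-element, and the default `srcmask=-1` means all predicate bits are set.

```python
from dataclasses import dataclass

@dataclass
class SVState:  # hypothetical subset of SVSTATE
    vl: int
    subvl: int = 0
    srcstep: int = 0
    ssubstep: int = 0
    pack: bool = False

def src_iterate(st, sz=False, srcmask=-1):
    """Advance st in place; return True when the loop end was reached."""
    loopend = False
    end_ssub = st.ssubstep == st.subvl
    srcstep = st.srcstep
    if st.pack:
        # pack: the sub-element loop is the *outer* loop
        while True:
            assert srcstep <= st.vl - 1
            if srcstep == st.vl - 1:
                if end_ssub:
                    loopend = True
                else:
                    st.ssubstep += 1
                srcstep = 0  # reset
                break
            srcstep += 1
            if not sz or (srcmask >> srcstep) & 1:
                break
    elif end_ssub:
        # no pack: the sub-element loop is the *inner* loop
        while True:
            assert srcstep <= st.vl - 1
            if srcstep == st.vl - 1:
                loopend = True
                srcstep = 0
                break
            srcstep += 1
            if not sz or (srcmask >> srcstep) & 1:
                break
        st.ssubstep = 0  # reset
    else:
        st.ssubstep += 1
    st.srcstep = srcstep
    return loopend

def walk(st):
    """Record (srcstep, ssubstep) pairs until the loop end is signalled."""
    seen = []
    while True:
        seen.append((st.srcstep, st.ssubstep))
        if src_iterate(st):
            return seen
```

Calling `walk(SVState(vl=2, subvl=1))` and then the same with `pack=True` shows the inversion of the two loops: in the first case `ssubstep` varies fastest, in the second `srcstep` does.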

-------------

\newpage{}

**dest_iterate**

Note that `dststep` and `dsubstep` are not the absolute final Element
(and Sub-Element) offsets. `dststep` still has to go through individual
`REMAP` translation before becoming a per-operand (RT, RS/EA) destination
Element-level offset, and `dsubstep` may also go through `(f)mv.swizzle`
reordering.

Note also critically that `UNPACK` mode simply inverts the outer/inner
loops, making SUBVL the outer loop and VL the inner.

```
# dest step iterator
vl = SVSTATE.vl
subvl = SVSTATE.subvl
unpack = SVSTATE.unpack
dsubstep = SVSTATE.dsubstep
end_dsub = dsubstep == subvl
dststep = SVSTATE.dststep
end_dst = dststep == vl-1
# used below:
# dz - from RM.MODE, destination-zeroing
# dstmask - from RM.MODE, the destination predicate
if unpack:
    # unpack advances subvl in *outer* loop
    while True:
        assert dststep <= vl-1
        end_dst = dststep == vl-1
        if end_dst:
            if end_dsub:
                loopend = True
            else:
                SVSTATE.dsubstep += 1
            dststep = 0  # reset
            break
        else:
            dststep += 1  # advance dststep
            if not dz:
                break
            if ((1 << dststep) & dstmask) != 0:
                break
else:
    # advance subvl in *inner* loop
    if end_dsub:
        while True:
            assert dststep <= vl-1
            end_dst = dststep == vl-1
            if end_dst:  # end-point
                loopend = True
                dststep = 0
                break
            else:
                dststep += 1
            if not dz:
                break
            if ((1 << dststep) & dstmask) != 0:
                break
        SVSTATE.dsubstep = 0b00  # reset
    else:
        # advance dsubstep
        SVSTATE.dsubstep += 1

SVSTATE.dststep = dststep
```

-------------

\newpage{}

**SVSTATE_NEXT**

```
if SVi = 1 then return REMAP SVSHAPE0 current offset
if SVi = 2 then return REMAP SVSHAPE1 current offset
if SVi = 3 then return REMAP SVSHAPE2 current offset
if SVi = 4 then return REMAP SVSHAPE3 current offset
if SVi = 5 then return SVSTATE.srcstep  # VL source step
if SVi = 6 then return SVSTATE.dststep  # VL dest step
if SVi = 7 then return SVSTATE.ssubstep # SUBVL source step
if SVi = 8 then return SVSTATE.dsubstep # SUBVL dest step

# SVi=0, explicit iteration requested
src_iterate();
dst_iterate();
return 0
```

**at_loopend**

Both Vertical-First and Horizontal-First may use this algorithm to
determine if the "end-of-looping" (end of Sub-Program-Counter) has
been reached. Horizontal-First Mode will immediately move to the
next instruction, where `svstep.` will set `CR0.EQ` to 1.

```
# tells if this is the last possible element.
subvl = SVSTATE.subvl
vl = SVSTATE.vl
end_ssub = SVSTATE.ssubstep == subvl
end_dsub = SVSTATE.dsubstep == subvl
if SVSTATE.srcstep == vl-1 and end_ssub:
    return True
if SVSTATE.dststep == vl-1 and end_dsub:
    return True
return False
```

[[!tag standards]]

-------------

\newpage{}