# svstep: Vertical-First Stepping and status reporting

SVL-Form

* svstep RT,SVi,vf (Rc=0)
* svstep. RT,SVi,vf (Rc=1)

| 0-5 | 6-10 | 11-15 | 16-22 | 23-25  | 26-30 | 31 | Form     |
|-----|------|-------|-------|--------|-------|----|----------|
| PO  | RT   | /     | SVi   | / / vf | XO    | Rc | SVL-Form |

Pseudo-code:

```
if SVi[3:4] = 0b11 then
    # store pack and unpack in SVSTATE
    SVSTATE[53] <- SVi[5]
    SVSTATE[54] <- SVi[6]
    RT <- [0]*62 || SVSTATE[53:54]
else
    # Vertical-First explicit stepping.
    step <- SVSTATE_NEXT(SVi, vf)
    RT <- [0]*57 || step
```

Special Registers Altered:

    CR0 (if Rc=1)

**Description**

svstep may be used to enquire about the REMAP Schedule and it may be
used to alter Vectorisation State. When `vf=1` then stepping occurs.
When `vf=0` the enquiry is performed without altering internal state.
If `SVi=0, Rc=0, vf=0` the instruction is a `nop`.

The following Modes exist:

* `SVi=0`: appropriately step srcstep, dststep, ssubstep and dsubstep
  to the next element, taking pack and unpack into consideration.
* When `SVi` is 1-4 the REMAP Schedule for a given SVSHAPE may be
  returned in `RT`. `SVi=1` selects SVSHAPE0 current state,
  through to `SVi=4` which selects SVSHAPE3.
* When `SVi` is 5, `SVSTATE.srcstep` is returned.
* When `SVi` is 6, `SVSTATE.dststep` is returned.
* When `SVi` is 7, `SVSTATE.ssubstep` is returned.
* When `SVi` is 8, `SVSTATE.dsubstep` is returned.
* When `SVi` is 0b1100, pack and unpack in SVSTATE are both cleared.
* When `SVi` is 0b1101, pack in SVSTATE is set and unpack is cleared.
* When `SVi` is 0b1110, unpack in SVSTATE is set and pack is cleared.
* When `SVi` is 0b1111, pack and unpack in SVSTATE are both set.

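The Modes above can be summarised as a small decoder. This is only an illustrative sketch in Python: `decode_svi` is a hypothetical helper, not part of any simulator, and the returned strings are invented labels. The values 7 and 8 are taken from the SVSTATE_NEXT listing in the Appendix, and anything else is reported as not described rather than guessed at.

```python
def decode_svi(svi: int) -> str:
    """Describe what a 7-bit SVi value requests (hypothetical helper)."""
    if not 0 <= svi < 128:
        raise ValueError("SVi is a 7-bit field")
    # SVi[3:4] = 0b11 in the pseudo-code's MSB0 numbering of a 7-bit field
    # corresponds to bits 3 and 2 when counting from the least significant bit.
    if (svi >> 2) & 0b11 == 0b11:
        pack = svi & 1            # 0b1101 sets pack
        unpack = (svi >> 1) & 1   # 0b1110 sets unpack
        return f"set pack={pack} unpack={unpack}"
    if svi == 0:
        return "advance to next element"
    if 1 <= svi <= 4:
        return f"return REMAP Schedule of SVSHAPE{svi - 1}"
    names = {5: "srcstep", 6: "dststep", 7: "ssubstep", 8: "dsubstep"}
    if svi in names:
        return f"return SVSTATE.{names[svi]}"
    return "not described in this section"
```

For instance `decode_svi(0b1101)` reports that pack is set and unpack is cleared, matching the list entry for that value.
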
As this is a Single-Predicated (1P) instruction, predication may be applied
to skip (or zero) elements.

* Vertical-First Mode will return the requested index
  (and move to the next state if `vf=1`).
* Horizontal-First Mode can be used to return all indices,
  i.e. walks through all possible states.

**Vectorisation of svstep itself**

As a 32-bit instruction, `svstep` may itself be Vector-Prefixed, as
`sv.svstep`. This will work just as well in Horizontal-First Mode
as it will in Vertical-First Mode.

Example: to obtain the full set of possible computed element
indices use `sv.svstep RT.v,SVi,1`, which will store all computed element
indices, starting from RT. If Rc=1 then a co-result Vector of CR Fields
will also be returned, comprising the "loop end-points" of each of the inner
loops when either Matrix Mode or DCT/FFT is set. For example,
when the `xdim` inner loop reaches its end, such that on the next
iteration it will begin again at zero, the CR Field's `EQ` bit will be set.
With a maximum of three loops within both Matrix and DCT/FFT Modes,
the CR Field's EQ bit will be set at the end of the first inner loop,
the LT bit for the second, the GT bit for the outermost loop and the
SO bit on the very last element, when all loops reach their maximum
extent.

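To make the idea of "loop end-points" concrete, the following Python sketch walks a plain three-level loop nest and records, for each element, which loops are about to wrap back to zero. It is a model of the concept only: `loop_end_flags` is a hypothetical name, the un-permuted x-fastest ordering is an assumption, and the rule that an outer loop only counts as ended when every loop inside it has also ended is an assumption, not taken from the specification. No mapping to individual CR Field bits is attempted.

```python
from itertools import product


def loop_end_flags(xdim, ydim, zdim):
    """Return [(index, end_x, end_y, end_z)] for an x-fastest loop nest.

    Assumption: a loop is at its end-point when it and every loop nested
    inside it are on their final iteration.
    """
    schedule = []
    for z, y, x in product(range(zdim), range(ydim), range(xdim)):
        index = (z * ydim + y) * xdim + x   # plain linear element index
        end_x = x == xdim - 1               # innermost loop wraps next
        end_y = end_x and y == ydim - 1     # second loop wraps next
        end_z = end_y and z == zdim - 1     # outermost loop finished
        schedule.append((index, end_x, end_y, end_z))
    return schedule


schedule = loop_end_flags(5, 7, 3)
```

With dimensions 5x7x3 the schedule holds 105 entries, the innermost end-point occurs 21 times, and only the final entry has all three flags set.
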
*Programmer's note: in some situations, particularly larger Matrices
(5x7x3 will set MAXVL=105),
VL will cause `sv.svstep` to return a considerable number of values. Under
such circumstances `sv.svstep/ew=8` is recommended.*

*Programmer's note: having conveniently obtained a pre-computed
Schedule with `sv.svstep`,
it may then be used as the input to Indexed REMAP Mode
to achieve the exact same Schedule. Before use, however,
some of the Indices may be arbitrarily altered as desired.
`sv.svstep` helps the programmer avoid having to manually recreate
Indices for certain
types of common Loop patterns. In its simplest form, without REMAP
(SVi=5 or SVi=6),
it is equivalent to the `iota` instruction found in other Vector ISAs.*

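As a rough illustration of the note above, the Python sketch below models the simplest case: with no REMAP and no predicate masking (both assumptions), collecting the step value for every element gives 0 to VL-1, which is what `iota` produces. The list is then altered and used as an index vector, loosely standing in for Indexed REMAP. The names `iota` and `apply_indices` are hypothetical and exist only for this example.

```python
def iota(vl):
    """Indices 0..vl-1, as a step enquiry would yield per element
    when no REMAP and no predicate mask are active (assumption)."""
    return list(range(vl))


def apply_indices(data, indices):
    """Reorder data according to an index vector (stand-in for Indexed REMAP)."""
    return [data[i] for i in indices]


indices = iota(4)
# alter the pre-computed Schedule before use: swap elements 1 and 2
indices[1], indices[2] = indices[2], indices[1]
result = apply_indices(["a", "b", "c", "d"], indices)
```
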
**Vertical First Mode**

Vertical First is effectively like an implicit single bit predicate
applied to every SVP64 instruction. **ONLY** one element in each
SVP64 Vector instruction is executed; srcstep and dststep do **not**
increment automatically on completion of one instruction,
and the Program Counter progresses **immediately** to
the next instruction just as it would for any standard scalar v3.0B
instruction.

A mode of `svstep` (SVi=0) is provided which moves srcstep and
dststep on to the next element, still respecting predicate
masks.

In other words, where normal SVP64 Vectorisation acts "horizontally"
by looping first through 0 to VL-1 and only then moving the PC
to the next instruction, Vertical-First moves the PC onwards
(vertically) through multiple instructions **with the same
srcstep and dststep**, then an explicit instruction is used to
advance srcstep/dststep. An outer loop (closed by a branch instruction)
is expected to be used, which completes a series of
Vector operations.

Testing any end condition of any loop of any REMAP state allows branches to be
used to create loops.

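The difference in execution order can be shown with a short Python model. It is a sketch only: instructions are represented by name, predication and REMAP are ignored, `vl` is assumed to be at least 1, and the functions `horizontal_first` and `vertical_first` are hypothetical. The explicit step and end test at the bottom of the `while` loop play the role of `svstep` followed by a branch.

```python
def horizontal_first(instructions, vl):
    """Each instruction loops over all elements before the PC moves on."""
    return [(name, step) for name in instructions for step in range(vl)]


def vertical_first(instructions, vl):
    """Every instruction runs on one element, then the step is advanced."""
    trace = []
    step = 0
    while True:
        for name in instructions:
            trace.append((name, step))   # same step for the whole loop body
        if step == vl - 1:               # end condition tested by the branch
            break
        step += 1                        # explicit stepping (svstep, SVi=0)
    return trace
```

For two instructions and `vl=2`, the first function visits `load` on elements 0 and 1 before `add`, while the second interleaves `load` and `add` per element.
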
*Programmer's note:* when Predicate Non-Zeroing is used this indicates to
the underlying hardware that any masked-out element must be skipped.
*This applies in Vertical-First Mode too*, and programmers should be keenly
aware that srcstep or dststep or both *may* jump by more than one as
a result, because the actual request under these circumstances was to execute
on the first available next *non-masked-out* element. It should be
evident that it is the `sv.svstep` instruction that must be Predicated
in order for the **entire** loop to use the Predicate correctly, and
it is strongly recommended for all instructions within the same
Vertical-First Loop to utilise the exact same Predicate Mask(s).

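The jump described in this note can be reproduced with a minimal Python sketch, modelled on the non-pack skipping loop of `src_iterate` in the Appendix with sub-stepping left out. The function name `next_unmasked_step` and its tuple return value are assumptions made for the example.

```python
def next_unmasked_step(step, mask, vl):
    """Advance to the next element whose predicate bit is set.

    Returns (new_step, loopend). When the last element is passed the
    step resets to 0 and loopend is True.
    """
    while True:
        if step == vl - 1:          # end-point reached
            return 0, True
        step += 1
        if (mask >> step) & 1:      # element is not masked out
            return step, False
```

With `mask=0b10011` and `vl=5`, stepping from element 1 lands on element 4, skipping the masked-out elements 2 and 3.
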
Programmers should be aware that VL, srcstep and dststep and
the SUBVL substeps are global in nature.
Nested looping with different schedules is perfectly possible, as is
calling of functions, however SVSTATE (and any associated SVSHAPEs
if REMAP is being used) must be saved on the stack, and restored
afterwards, in order to achieve this benefit
not normally found in Vector ISAs.

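The save-and-restore discipline can be pictured with the Python sketch below. It is an analogy rather than real code for any target: the `regs` dictionary, its keys and the `preserved_svstate` helper are all hypothetical, standing in for pushing SVSTATE and the SVSHAPEs to the stack before an inner loop or function call and popping them afterwards.

```python
import copy
from contextlib import contextmanager


@contextmanager
def preserved_svstate(regs):
    """Save SVSTATE and SVSHAPE on entry, restore them on exit."""
    saved = copy.deepcopy({k: regs[k] for k in ("SVSTATE", "SVSHAPE")})
    try:
        yield regs
    finally:
        regs.update(saved)   # restore the outer loop's state


regs = {
    "SVSTATE": {"vl": 8, "srcstep": 3, "dststep": 3},
    "SVSHAPE": [0, 0, 0, 0],
}
with preserved_svstate(regs):
    # inner loop with its own schedule
    regs["SVSTATE"]["vl"] = 4
    regs["SVSTATE"]["srcstep"] = 0
```

After the `with` block the outer loop's `vl` and `srcstep` are back in place, so it can resume where it left off.
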
-------------

\newpage{}

# Appendix

**SVSTATE_NEXT**

```
if SVi = 1 then return REMAP SVSHAPE0 current offset
if SVi = 2 then return REMAP SVSHAPE1 current offset
if SVi = 3 then return REMAP SVSHAPE2 current offset
if SVi = 4 then return REMAP SVSHAPE3 current offset
if SVi = 5 then return SVSTATE.srcstep
if SVi = 6 then return SVSTATE.dststep
if SVi = 7 then return SVSTATE.ssubstep # SUBVL source step
if SVi = 8 then return SVSTATE.dsubstep # SUBVL dest step
# SVi=0, explicit iteration requested
src_iterate();
dst_iterate();
return 0
```

**ADVANCE_STEPS**

```
def src_iterate(self): # source-stepping iterator
    subvl = self.subvl
    vl = self.svstate.vl
    pack = self.svstate.pack
    unpack = self.svstate.unpack
    ssubstep = self.svstate.ssubstep
    end_ssub = ssubstep == subvl
    end_src = self.svstate.srcstep == vl-1
    # first source step
    srcstep = self.svstate.srcstep
    srcmask = self.srcmask
    if pack:
        # pack advances subvl in *outer* loop
        while True:
            assert srcstep <= vl-1
            end_src = srcstep == vl-1
            if end_src:
                if end_ssub:
                    self.loopend = True
                else:
                    self.svstate.ssubstep += SelectableInt(1, 2)
                srcstep = 0 # reset
                break
            else:
                srcstep += 1 # advance srcstep
                if not self.srcstep_skip:
                    break
                if ((1 << srcstep) & srcmask) != 0:
                    break
    else:
        # advance subvl in *inner* loop
        if end_ssub:
            while True:
                assert srcstep <= vl-1
                end_src = srcstep == vl-1
                if end_src: # end-point
                    self.loopend = True
                    srcstep = 0
                    break
                else:
                    srcstep += 1
                    if not self.srcstep_skip:
                        break
                    if ((1 << srcstep) & srcmask) != 0:
                        break
                    else:
                        log(" sskip", bin(srcmask), bin(1 << srcstep))
            self.svstate.ssubstep = SelectableInt(0, 2) # reset
        else:
            # advance ssubstep
            self.svstate.ssubstep += SelectableInt(1, 2)

    self.svstate.srcstep = SelectableInt(srcstep, 7)

def dst_iterate(self): # dest step iterator
    vl = self.svstate.vl
    subvl = self.subvl
    pack = self.svstate.pack
    unpack = self.svstate.unpack
    dsubstep = self.svstate.dsubstep
    end_dsub = dsubstep == subvl
    dststep = self.svstate.dststep
    end_dst = dststep == vl-1
    dstmask = self.dstmask
    # now dest step
    if unpack:
        # unpack advances subvl in *outer* loop
        while True:
            assert dststep <= vl-1
            end_dst = dststep == vl-1
            if end_dst:
                if end_dsub:
                    self.loopend = True
                else:
                    self.svstate.dsubstep += SelectableInt(1, 2)
                dststep = 0 # reset
                break
            else:
                dststep += 1 # advance dststep
                if not self.dststep_skip:
                    break
                if ((1 << dststep) & dstmask) != 0:
                    break
    else:
        # advance subvl in *inner* loop
        if end_dsub:
            while True:
                assert dststep <= vl-1
                end_dst = dststep == vl-1
                if end_dst: # end-point
                    self.loopend = True
                    dststep = 0
                    break
                else:
                    dststep += 1
                    if not self.dststep_skip:
                        break
                    if ((1 << dststep) & dstmask) != 0:
                        break
            self.svstate.dsubstep = SelectableInt(0, 2) # reset
        else:
            # advance dsubstep
            self.svstate.dsubstep += SelectableInt(1, 2)

    self.svstate.dststep = SelectableInt(dststep, 7)

```

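The effect of pack on stepping order is easier to see in a stripped-down, self-contained Python model of the source-side iterator above. This is a sketch under stated assumptions, not the simulator code: predicate skipping is omitted, plain integers replace `SelectableInt`, `subvl` is assumed to hold SUBVL-1 (so that `substep == subvl` marks the last sub-element), and the names `advance` and `walk` are hypothetical.

```python
def advance(step, substep, vl, subvl, pack):
    """One stepping operation. Returns (step, substep, loopend)."""
    end_sub = substep == subvl       # assumption: subvl holds SUBVL-1
    end_step = step == vl - 1
    if pack:
        # pack: element step is the inner loop, sub-step the outer loop
        if end_step:
            if end_sub:
                return 0, substep, True
            return 0, substep + 1, False
        return step + 1, substep, False
    # no pack: sub-step is the inner loop, element step the outer loop
    if end_sub:
        if end_step:
            return 0, 0, True
        return step + 1, 0, False
    return step, substep + 1, False


def walk(vl, subvl, pack):
    """List every (step, substep) pair visited, starting from (0, 0)."""
    step, substep = 0, 0
    visited = [(step, substep)]
    while True:
        step, substep, loopend = advance(step, substep, vl, subvl, pack)
        if loopend:
            return visited
        visited.append((step, substep))
```

Calling `walk(2, 1, pack=False)` visits both sub-elements of element 0 before moving to element 1, whereas `walk(2, 1, pack=True)` visits sub-element 0 of every element first.
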
[[!tag standards]]

-------------

\newpage{}