1 SimpleV Prefix (SVprefix) Proposal v0.2
2 =======================================
3
4 This proposal is designed to be able to operate without SVcsr, but not to
5 require the absence of SVcsr.
6
7 Conventions
8 ===========
9
10 Conventions used in this document:
11 - Bits are numbered starting from 0 at the LSB, so bit 3 is 1 in the integer 8.
12 - Bit ranges are inclusive on both ends, so 5:3 means bits 5, 4, and 3.
13
14 Operations work on variable-length vectors of sub-vectors, where each sub-vector
15 has a length *svlen*, and an element type *etype*. When the vectors are stored
16 in registers, all elements are packed so that there is no padding in-between
17 elements of the same vector. The number of bytes in a sub-vector, *svsz*, is the
18 product of *svlen* and the element size in bytes.
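
As a rough illustration of the size rule above, the following Python sketch computes *svsz* and the total packed size of a vector. The element-type labels and helper names are assumptions made for this example only; they are not defined by the proposal.

```python
# Element sizes in bytes for a few illustrative element types.
# The labels are hypothetical shorthand, not encodings from this proposal.
ELEMENT_BYTES = {"u8": 1, "u16": 2, "u32": 4, "f16": 2, "f32": 4}


def svsz(svlen: int, etype: str) -> int:
    """Bytes in one sub-vector: svlen times the element size in bytes."""
    return svlen * ELEMENT_BYTES[etype]


def vector_bytes(vl: int, svlen: int, etype: str) -> int:
    """Bytes occupied by vl sub-vectors packed with no padding in between."""
    return vl * svsz(svlen, etype)
```

For instance, a sub-vector of three 16-bit elements occupies 6 bytes, and four such sub-vectors occupy 24 contiguous bytes.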
19
20 Half-Precision Floating Point (FP16)
21 ====================================
22 If the F extension is supported, SVprefix adds support for FP16 in the
23 base FP instructions by using 10 (H) in the floating-point format field *fmt*
24 and using 001 (H) in the floating-point load/store *width* field.
25
26 Compressed Instructions
27 =======================
28 This proposal doesn't include any prefixed RVC instructions; instead, it will
29 include 32-bit instructions that are compressed forms of SVprefix 48-bit
30 instructions, in the same manner that RVC instructions are compressed forms of
31 RVI instructions. The compressed instructions will be defined later by
32 considering which 48-bit instructions are the most common.
33
34 48-bit Prefixed Instructions
35 ============================
36 All 48-bit prefixed instructions contain a 32-bit "base" instruction as the
37 last 4 bytes. Since all 32-bit instructions have bits 1:0 set to 11, those bits
38 are reused for additional encoding space in the 48-bit instructions.
39
40 64-bit Prefixed Instructions
41 ============================
42
43 TODO. Really need to resolve vitp7 by reducing lsk to 2 bits, or just use
44 0b111111 as the prefix, then lsk can remain at 3 bits.
45
46 48-bit Instruction Encodings
47 ============================
48
49 In the following table, *Reserved* entries must be zero. RV32 equivalent encodings
50 are included for side-by-side comparison (and listed below, separately).
51
52 First, bits 17:0:
53
54 +---------------+--------+------------+------------+-----+------------+-------------+------+------------+--------+
55 | Encoding      | 17     | 16         | 15         | 14  | 13         | 12          | 11:7 | 6          | 5:0    |
56 +---------------+--------+------------+------------+-----+------------+-------------+------+------------+--------+
57 | P48-LD-type   | rd[5]  | rs1[5]     | vitp7[6]   | vd  | vs1        | vitp7[5:0]         | *Reserved* | 011111 |
58 +---------------+--------+------------+------------+-----+------------+-------------+------+------------+--------+
59 | P48-ST-type   |vitp7[6]| rs1[5]     | rs2[5]     | vs2 | vs1        | vitp7[5:0]         | *Reserved* | 011111 |
60 +---------------+--------+------------+------------+-----+------------+-------------+------+------------+--------+
61 | P48-R-type    | rd[5]  | rs1[5]     | rs2[5]     | vs2 | vs1        | vitp6              | *Reserved* | 011111 |
62 +---------------+--------+------------+------------+-----+------------+--------------------+------------+--------+
63 | P48-I-type    | rd[5]  | rs1[5]     | vitp7[6]   | vd  | vs1        | vitp7[5:0]         | *Reserved* | 011111 |
64 +---------------+--------+------------+------------+-----+------------+--------------------+------------+--------+
65 | P48-U-type    | rd[5]  | *Reserved* | *Reserved* | vd  | *Reserved* | vitp6              | *Reserved* | 011111 |
66 +---------------+--------+------------+------------+-----+------------+-------------+------+------------+--------+
67 | P48-FR-type   | rd[5]  | rs1[5]     | rs2[5]     | vs2 | vs1        | *Reserved*  | vtp5 | *Reserved* | 011111 |
68 +---------------+--------+------------+------------+-----+------------+-------------+------+------------+--------+
69 | P48-FI-type   | rd[5]  | rs1[5]     | vitp7[6]   | vd  | vs1        | vitp7[5:0]         | *Reserved* | 011111 |
70 +---------------+--------+------------+------------+-----+------------+-------------+------+------------+--------+
71 | P48-FR4-type  | rd[5]  | rs1[5]     | rs2[5]     | vs2 | rs3[5]     | vs3 [#fr4]_ | vtp5 | *Reserved* | 011111 |
72 +---------------+--------+------------+------------+-----+------------+-------------+------+------------+--------+
73
74 .. [#fr4] Only vs2 and vs3 are included in the P48-FR4-type encoding because
75 there is not enough space for vs1 as well, and because it is more
76 useful to have a scalar argument for each of the multiplication and
77 addition portions of fmadd than to have two scalars on the
78 multiplication portion.
79
80 Table showing correspondence between P48-*-type and RV32-*-type. These are
81 bits 47:18 (RV32 shifted up by 16 bits):
82
83 +---------------+---------------+
84 | Encoding      | 47:18         |
85 +---------------+---------------+
86 | RV32 Encoding | 31:2          |
87 +---------------+---------------+
88 | P48-LD-type   | RV32-I-type   |
89 +---------------+---------------+
90 | P48-ST-type   | RV32-S-type   |
91 +---------------+---------------+
92 | P48-R-type    | RV32-R-type   |
93 +---------------+---------------+
94 | P48-I-type    | RV32-I-type   |
95 +---------------+---------------+
96 | P48-U-type    | RV32-U-type   |
97 +---------------+---------------+
98 | P48-FR-type   | RV32-FR-type  |
99 +---------------+---------------+
100 | P48-FI-type   | RV32-I-type   |
101 +---------------+---------------+
102 | P48-FR4-type  | RV32-FR4-type |
103 +---------------+---------------+
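
The bit layout described above (prefix fields in bits 17:0, RV32 bits 31:2 carried in bits 47:18) can be sketched in Python as follows. This is only an illustration under those stated assumptions; the helper names `split_p48` and `join_p48` are invented for this example and are not part of the proposal.

```python
def split_p48(insn: int) -> tuple[int, int]:
    """Split a 48-bit prefixed instruction into (prefix, base).

    prefix is bits 17:0 of the 48-bit word. base is the reconstructed
    32-bit instruction: bits 47:18 supply RV32 bits 31:2, and the
    implicit bits 1:0 = 11 are restored.
    """
    assert 0 <= insn < (1 << 48)
    prefix = insn & 0x3FFFF            # bits 17:0
    base_31_2 = insn >> 18             # bits 47:18
    base = (base_31_2 << 2) | 0b11     # re-attach bits 1:0 = 11
    return prefix, base


def join_p48(prefix: int, base: int) -> int:
    """Inverse of split_p48: drop base bits 1:0 and attach the prefix."""
    assert 0 <= prefix < (1 << 18)
    assert base & 0b11 == 0b11         # every 32-bit instruction ends in 11
    return ((base >> 2) << 18) | prefix
```

For example, joining a prefix of 0b011111 (all other prefix fields zero) with the base instruction 0x00B50533 (``add a0, a0, a1``) and splitting the result returns the same pair.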
104
105 Table showing Standard RV32 encodings:
106
107 +---------------+-------------+-------+----------+----------+--------+----------+--------+--------+------------+
108 | Encoding      | 31:27       | 26:25 | 24:20    | 19:15    | 14:12  | 11:7     | 6:2    | 1      | 0          |
109 +---------------+-------------+-------+----------+----------+--------+----------+--------+--------+------------+
110 | RV32-R-type   | funct7              | rs2[4:0] | rs1[4:0] | funct3 | rd[4:0]  | opcode | 1      | 1          |
111 +---------------+-------------+-------+----------+----------+--------+----------+--------+--------+------------+
112 | RV32-S-type   | imm[11:5]           | rs2[4:0] | rs1[4:0] | funct3 | imm[4:0] | opcode | 1      | 1          |
113 +---------------+-------------+-------+----------+----------+--------+----------+--------+--------+------------+
114 | RV32-I-type   | imm[11:0]                      | rs1[4:0] | funct3 | rd[4:0]  | opcode | 1      | 1          |
115 +---------------+-------------+-------+----------+----------+--------+----------+--------+--------+------------+
116 | RV32-U-type   | imm[31:12]                                         | rd[4:0]  | opcode | 1      | 1          |
117 +---------------+-------------+-------+----------+----------+--------+----------+--------+--------+------------+
118 | RV32-FR4-type | rs3[4:0]    | fmt   | rs2[4:0] | rs1[4:0] | funct3 | rd[4:0]  | opcode | 1      | 1          |
119 +---------------+-------------+-------+----------+----------+--------+----------+--------+--------+------------+
120 | RV32-FR-type  | funct5      | fmt   | rs2[4:0] | rs1[4:0] | rm     | rd[4:0]  | opcode | 1      | 1          |
121 +---------------+-------------+-------+----------+----------+--------+----------+--------+--------+------------+
122
123 64-bit Instruction Encodings
124 ============================
125
126 TODO (please disregard)
127
128 +--------------+-------+-------+--------+--------+--------+----------+
129 | Encoding | 63:58 | 57 | 56 | 55 | 54 | 53:48 |
130 +--------------+-------+-------+--------+--------+--------+----------+
131 | P64-LD-type | VLtyp | rd[6] | rs1[6] | | | MVLtp |
132 +--------------+-------+-------+--------+--------+--------+----------+
133 | P64-ST-type | VLtyp | | rs1[6] | rs2[6] | | MVLtp |
134 +--------------+-------+-------+--------+--------+--------+----------+
135 | P64-R-type | VLtyp | rd[6] | rs1[6] | rs2[6] | | MVLtp |
136 +--------------+-------+-------+--------+--------+--------+----------+
137 | P64-I-type | VLtyp | rd[6] | rs1[6] | | | MVLtp |
138 +--------------+-------+-------+--------+--------+--------+----------+
139 | P64-U-type | VLtyp | rd[6] | | | | MVLtp |
140 +--------------+-------+-------+--------+--------+--------+----------+
141 | P64-FR-type | VLtyp | | rs1[6] | rs2[6] | | MVLtp |
142 +--------------+-------+-------+--------+--------+--------+----------+
143 | P64-FI-type | VLtyp | rd[6] | rs1[6] | rs2[6] | | MVLtp |
144 +--------------+-------+-------+--------+--------+--------+----------+
145 | P64-FR4-type | VLtyp | rd[6] | rs1[6] | rs2[6] | rs3[6] | MVLtp |
146 +--------------+-------+-------+--------+--------+--------+----------+
147
148 VLtyp
149
150 +--------------+---------+
151 | vtyp[5:1] | vtyp[0] |
152 +--------------+---------+
153 | regnum | 1 |
154 +--------------+---------+
155 | immed | 0 |
156 +--------------+---------+
157
158 Just as in the VLIW format, when bit 0 of vtyp is zero, bits 1 to 5 specify the scalar register that VL is set from. When bit 0 is 1, VL is set to the immediate (plus one).
159
160 vs#/vd Fields' Encoding
161 =======================
162
163 +--------+----------+----------------------------------------------------------+
164 | vs#/vd | Mnemonic | Meaning |
165 +========+==========+==========================================================+
166 | 0 | S | the rs#/rd field specifies a scalar (single sub-vector); |
167 | | | the rs#/rd field is zero-extended to get the actual |
168 | | | 7-bit register number |
169 +--------+----------+----------------------------------------------------------+
170 | 1 | V | the rs#/rd field specifies a vector; the rs#/rd field is |
171 | | | decoded using the `Vector Register Number Encoding`_ to |
172 | | | get the actual 7-bit register number |
173 +--------+----------+----------------------------------------------------------+
174
175 If a vs#/vd field is not present, it is as if it was present with a value that
176 is the bitwise-or of all present vs#/vd fields.
177
178 * Scalar register numbers do NOT increment when allocated in the
179 hardware for-loop. The same scalar register number is handed
180 to every ALU.
181
182 * Vector register numbers *DO* increase when allocated in the
183 hardware for-loop. Sequentially-increasing register data
184 is handed to sequential ALUs.
185
186 Vector Register Number Encoding
187 ===============================
188
189 When vs#/vd is 1, the actual 7-bit register number is derived from the
190 corresponding 6-bit rs#/rd field:
191
192 +---------------------------------+
193 | Actual 7-bit register number |
194 +===========+=============+=======+
195 | Bit 6 | Bits 5:1 | Bit 0 |
196 +-----------+-------------+-------+
197 | rs#/rd[0] | rs#/rd[5:1] | 0 |
198 +-----------+-------------+-------+
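
The two register-number rules (zero-extension for scalars, the rotated encoding above for vectors) can be sketched in Python as below. The function name and argument names are hypothetical and are used only for illustration.

```python
def decode_regnum(field: int, is_vector: bool) -> int:
    """Map a 6-bit rs#/rd field to the actual 7-bit register number.

    is_vector is the value of the matching vs#/vd bit (0 = S, 1 = V).
    """
    assert 0 <= field < 64
    if not is_vector:
        # Scalar: the field is zero-extended to 7 bits.
        return field
    # Vector: field bit 0 becomes bit 6, field bits 5:1 stay in
    # bits 5:1, and bit 0 of the result is always 0.
    bit6 = field & 1
    bits_5_1 = (field >> 1) & 0x1F
    return (bit6 << 6) | (bits_5_1 << 1)
```

Under this reading, vector operands always land on even register numbers, and the low bit of the field selects the upper or lower half of the 128-entry range.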
199
200 TODO: similar scheme for 64-bit encoding (incorporating extra bit rs#/rd[6] from 64-bit encoding)
201
202 Load/Store Kind (lsk) Field Encoding
203 ====================================
204
205 +--------+-----+--------------------------------------------------------------------------------+
206 | vd/vs2 | vs1 | Meaning |
207 +========+=====+================================================================================+
208 | 0 | 0 | srcbase is scalar, LD/ST is pure scalar. |
209 +--------+-----+--------------------------------------------------------------------------------+
210 | 1 | 0 | srcbase is scalar, LD/ST is unit strided |
211 +--------+-----+--------------------------------------------------------------------------------+
212 | 0 | 1 | srcbase is a vector (gather/scatter aka array of srcbases). VSPLAT and VSELECT |
213 +--------+-----+--------------------------------------------------------------------------------+
214 | 1 | 1 | srcbase is a vector, LD/ST is a full vector LD/ST. |
215 +--------+-----+--------------------------------------------------------------------------------+
216
217 Notes:
218
219 * A register strided LD/ST would require *5* registers: srcbase, vd/vs2, predicate 1, predicate 2 and the stride register.
220 * Complex strides may all be done with a general purpose vector of srcbases.
221 * Twin predication may be used even when vd/vs1 is a scalar, to give VSPLAT and VSELECT, because the hardware loop ends on the first occurrence of a 1 in the predicate when a predicate is applied to a scalar.
222 * Full vectorised gather/scatter is enabled when both registers are marked as vectorised; however, unlike e.g. Intel AVX512, twin predication can be applied.
223
224 Open question: RVV overloads the width field of LOAD-FP/STORE-FP, using bit 2 to indicate additional interpretation of the 11-bit immediate. Should this be considered?
225
226
227 Sub-Vector Length (svlen) Field Encoding
228 =======================================================
229
230 +----------------+-------+
231 | svlen Encoding | Value |
232 +================+=======+
233 | 00 | 4 |
234 +----------------+-------+
235 | 01 | 1 |
236 +----------------+-------+
237 | 10 | 2 |
238 +----------------+-------+
239 | 11 | 3 |
240 +----------------+-------+
241
242 Predication (pred) Field Encoding
243 =================================
244
245 +------+------------+--------------------+----------------------------------------+
246 | pred | Mnemonic | Predicate Register | Meaning |
247 +======+============+====================+========================================+
248 | 000 | *None* | *None* | The instruction is unpredicated |
249 +------+------------+--------------------+----------------------------------------+
250 | 001 | *Reserved* | *Reserved* | |
251 +------+------------+--------------------+----------------------------------------+
252 | 010 | !x9 | x9 (s1) | execute vector op[0..i] on x9[i] == 0 |
253 +------+------------+ +----------------------------------------+
254 | 011 | x9 | | execute vector op[0..i] on x9[i] == 1 |
255 +------+------------+--------------------+----------------------------------------+
256 | 100 | !x10 | x10 (a0) | execute vector op[0..i] on x10[i] == 0 |
257 +------+------------+ +----------------------------------------+
258 | 101 | x10 | | execute vector op[0..i] on x10[i] == 1 |
259 +------+------------+--------------------+----------------------------------------+
260 | 110 | !x11 | x11 (a1) | execute vector op[0..i] on x11[i] == 0 |
261 +------+------------+ +----------------------------------------+
262 | 111 | x11 | | execute vector op[0..i] on x11[i] == 1 |
263 +------+------------+--------------------+----------------------------------------+
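
A minimal Python sketch of how the *pred* field might be decoded and applied to an element-wise operation is given below. It assumes that masked-out destination elements are left unchanged, which this table does not state, and the function names are invented for the example.

```python
# pred[2:1] selects the predicate register; pred[0] = 0 selects the
# inverted form (!x9, !x10, !x11), following the table above.
PRED_REGS = {0b01: 9, 0b10: 10, 0b11: 11}


def decode_pred(pred: int):
    """Return None if unpredicated, else (register_number, inverted)."""
    if pred == 0b000:
        return None
    if pred == 0b001:
        raise ValueError("pred=001 is reserved")
    return PRED_REGS[pred >> 1], (pred & 1) == 0


def predicated_add(dest, src1, src2, vl, mask, inverted=False):
    """Element-wise add over vl elements, gated by bit i of mask.

    Assumption: elements whose predicate bit does not match are skipped
    and keep their previous value in dest.
    """
    wanted = 0 if inverted else 1
    for i in range(vl):
        if (mask >> i) & 1 == wanted:
            dest[i] = src1[i] + src2[i]
    return dest
```

With a mask of 0b0101 and VL=4, only elements 0 and 2 are written; the inverted form writes elements 1 and 3 instead.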
264
265 Twin-predication (tpred) Field Encoding
266 =======================================
267
268 +-------+------------+--------------------+----------------------------------------------+
269 | tpred | Mnemonic | Predicate Register | Meaning |
270 +=======+============+====================+==============================================+
271 | 000 | *None* | *None* | The instruction is unpredicated |
272 +-------+------------+--------------------+----------------------------------------------+
273 | 001 | x9,off | src=x9, dest=none | src[0..i] uses x9[i], dest unpredicated |
274 +-------+------------+ +----------------------------------------------+
275 | 010 | off,x10 | src=none, dest=x10 | dest[0..i] uses x10[i], src unpredicated |
276 +-------+------------+ +----------------------------------------------+
277 | 011 | x9,10 | src=x9, dest=x10 | src[0..i] uses x9[i], dest[0..i] uses x10[i] |
278 +-------+------------+--------------------+----------------------------------------------+
279 | 100 | *None* | *RESERVED* | Instruction is unpredicated (TBD) |
280 +-------+------------+--------------------+----------------------------------------------+
281 | 101 | !x9,off | src=!x9, dest=none | |
282 +-------+------------+ +----------------------------------------------+
283 | 110 | off,!x10 | src=none, dest=!x10| |
284 +-------+------------+ +----------------------------------------------+
285 | 111 | !x9,!x10 | src=!x9, dest=!x10 | |
286 +-------+------------+--------------------+----------------------------------------------+
287
288 Integer Element Type (itype) Field Encoding
289 ===========================================
290
291 +------------+-------+--------------+--------------+-----------------+-------------------+
292 | Signedness | itype | Element Type | Mnemonic in  | Mnemonic in FP  | Meaning (INT may  |
293 | [#sgn_def]_|       |              | Integer      | Instructions    | be un/signed, FP  |
294 |            |       |              | Instructions | (such as fmv.x) | just re-sized)    |
295 +============+=======+==============+==============+=================+===================+
296 | Unsigned   | 01    | u8           | BU           | BU              | Unsigned 8-bit    |
297 |            +-------+--------------+--------------+-----------------+-------------------+
298 |            | 10    | u16          | HU           | HU              | Unsigned 16-bit   |
299 |            +-------+--------------+--------------+-----------------+-------------------+
300 |            | 11    | u32          | WU           | WU              | Unsigned 32-bit   |
301 |            +-------+--------------+--------------+-----------------+-------------------+
302 |            | 00    | uXLEN        | WU/DU/QU     | WU/LU/TU        | Unsigned XLEN-bit |
303 +------------+-------+--------------+--------------+-----------------+-------------------+
304 | Signed     | 01    | i8           | BS           | BS              | Signed 8-bit      |
305 |            +-------+--------------+--------------+-----------------+-------------------+
306 |            | 10    | i16          | HS           | HS              | Signed 16-bit     |
307 |            +-------+--------------+--------------+-----------------+-------------------+
308 |            | 11    | i32          | W            | W               | Signed 32-bit     |
309 |            +-------+--------------+--------------+-----------------+-------------------+
310 |            | 00    | iXLEN        | W/D/Q        | W/L/T           | Signed XLEN-bit   |
311 +------------+-------+--------------+--------------+-----------------+-------------------+
312
313 .. [#sgn_def] Signedness is defined in `Signedness Decision Procedure`_
314
315 Note: vector mode is effectively a type-cast of the register file
316 as if it was a sequential array being typecast to typedef itype[]
317 (C syntax). The starting point of the "typecast" is the vector
318 register rs#/rd.
319
320 Example: if itype=0b10 (u16), and rd is set to "vector", and
321 VL is set to 4, the 64-bit register at rd is subdivided into
322 *FOUR* 16-bit destination elements. It is *NOT* four
323 separate 64-bit destination registers (rd+0, rd+1, rd+2, rd+3)
324 that are sign-extended from the source width size out to 64-bit,
325 because that is itype=0b00 (uXLEN).
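
The example above can be modelled in Python by treating the register file as one flat byte array. This sketch assumes RV64 (8-byte registers) and little-endian element order within a register; both are assumptions of the example, and the helper names are hypothetical.

```python
XLEN_BYTES = 8  # assuming RV64: each integer register holds 8 bytes


def write_elements(regfile: bytearray, rd: int, values, elem_bytes: int) -> None:
    """Store values as packed elements starting at register rd.

    The register file is viewed as one sequential byte array, so
    elements narrower than XLEN share a register and wider runs spill
    into rd+1, rd+2, and so on.
    """
    base = rd * XLEN_BYTES
    mask = (1 << (8 * elem_bytes)) - 1
    for i, value in enumerate(values):
        off = base + i * elem_bytes
        regfile[off:off + elem_bytes] = (value & mask).to_bytes(elem_bytes, "little")


def read_reg(regfile: bytearray, r: int) -> int:
    """Read back one whole XLEN-bit register."""
    return int.from_bytes(regfile[r * XLEN_BYTES:(r + 1) * XLEN_BYTES], "little")
```

Writing four 16-bit elements at register 5 fills only register 5, whereas writing four XLEN-sized elements at register 8 fills registers 8 to 11, matching the distinction drawn in the example.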
326
327 Signedness Decision Procedure
328 =============================
329
330 1. If the opcode field is either OP or OP-IMM, then
331 1. Signedness is Unsigned.
332 2. If the opcode field is either OP-32 or OP-IMM-32, then
333 1. Signedness is Signed.
334 3. If Signedness is encoded in a field of the base instruction, [#sign_enc]_ then
335 1. Signedness uses the encoded value.
336 4. Otherwise,
337 1. Signedness is Unsigned.
338
339 .. [#sign_enc] Like in fcvt.d.l[u], but unlike in fmv.x.w, since there is no
340 fmv.x.wu
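
Read as an if/else-if chain, the procedure above could be sketched in Python as follows. The major opcode is passed as a string and the `encoded_signed` parameter (None when the base instruction has no signedness field) is a hypothetical interface chosen for this example.

```python
from typing import Optional


def signedness(opcode: str, encoded_signed: Optional[bool] = None) -> str:
    """Follow the decision procedure step by step, first match wins."""
    if opcode in ("OP", "OP-IMM"):
        return "unsigned"
    if opcode in ("OP-32", "OP-IMM-32"):
        return "signed"
    if encoded_signed is not None:
        # The base instruction carries its own signedness,
        # as fcvt.d.l versus fcvt.d.lu does.
        return "signed" if encoded_signed else "unsigned"
    return "unsigned"
```

Under this reading, an instruction such as fmv.x.w, which has no unsigned counterpart, falls through to the final default and is treated as unsigned.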
341
342 Vector Type and Predication 5-bit (vtp5) Field Encoding
343 =======================================================
344
345 In the following table, X denotes a wildcard that is 0 or 1 and can be a
346 different value for every occurrence.
347
348 +-------+-----------+-----------+
349 | vtp5 | pred | svlen |
350 +=======+===========+===========+
351 | 1XXXX | vtp5[4:2] | vtp5[1:0] |
352 +-------+ | |
353 | 01XXX | | |
354 +-------+ | |
355 | 000XX | | |
356 +-------+-----------+-----------+
357 | 001XX | *Reserved* |
358 +-------+-----------------------+
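
A small Python sketch of the vtp5 split follows, combining the table above with the svlen encoding given earlier. The function name is invented for this example.

```python
# svlen encoding from the Sub-Vector Length table: 00 means 4.
SVLEN_VALUES = {0b00: 4, 0b01: 1, 0b10: 2, 0b11: 3}


def decode_vtp5(vtp5: int):
    """Return (pred, svlen) for a 5-bit vtp5 value."""
    assert 0 <= vtp5 < 32
    pred = (vtp5 >> 2) & 0b111        # vtp5[4:2]
    if pred == 0b001:                 # the 001XX pattern is reserved
        raise ValueError("reserved vtp5 encoding")
    return pred, SVLEN_VALUES[vtp5 & 0b11]
```

An all-zero vtp5 therefore decodes to an unpredicated operation on sub-vectors of length 4.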
359
360 Vector Integer Type and Predication 6-bit (vitp6) Field Encoding
361 ================================================================
362
363 In the following table, X denotes a wildcard that is 0 or 1 and can be a
364 different value for every occurrence.
365
366 +--------+------------+---------+------------+------------+
367 | vitp6  | itype      | pred[2] | pred[1:0]  | svlen      |
368 +========+============+=========+============+============+
369 | XX1XXX | vitp6[5:4] | 0       | vitp6[3:2] | vitp6[1:0] |
370 +--------+            |         |            |            |
371 | XX00XX |            |         |            |            |
372 +--------+------------+---------+------------+------------+
373 | XX01XX | *Reserved*                                     |
374 +--------+------------------------------------------------+
375
376 vitp7 field: only tpred=
377
378 +---------+------------+----------+-------------+------------+
379 | vitp7   | itype      | tpred[2] | tpred[1:0]  | svlen      |
380 +=========+============+==========+=============+============+
381 | XXXXXXX | vitp7[5:4] | vitp7[6] | vitp7[3:2]  | vitp7[1:0] |
382 +---------+------------+----------+-------------+------------+
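
The vitp6 and vitp7 field splits can be sketched in Python as below. The sketch reads the two predicate bits taken from positions 3:2 as the low two bits of pred or tpred, consistent with the high:low bit-range convention of this document; the function names are hypothetical.

```python
# svlen encoding from the Sub-Vector Length table: 00 means 4.
SVLEN_VALUES = {0b00: 4, 0b01: 1, 0b10: 2, 0b11: 3}


def decode_vitp6(vitp6: int):
    """Return (itype, pred, svlen); pred[2] is implicitly 0."""
    assert 0 <= vitp6 < 64
    itype = (vitp6 >> 4) & 0b11
    pred = (vitp6 >> 2) & 0b11        # low two bits of pred
    if pred == 0b01:                  # XX01XX gives pred=001: reserved
        raise ValueError("reserved vitp6 encoding")
    return itype, pred, SVLEN_VALUES[vitp6 & 0b11]


def decode_vitp7(vitp7: int):
    """Return (itype, tpred, svlen); vitp7[6] supplies tpred[2]."""
    assert 0 <= vitp7 < 128
    itype = (vitp7 >> 4) & 0b11
    tpred = (((vitp7 >> 6) & 1) << 2) | ((vitp7 >> 2) & 0b11)
    return itype, tpred, SVLEN_VALUES[vitp7 & 0b11]
```

Because pred[2] is fixed at 0, vitp6 can only select no predication or the x9 forms, while vitp7 reaches all eight tpred values.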
383
384 48-bit Instruction Encoding Decision Procedure
385 ==============================================
386
387 In the following decision procedure, *Reserved* means that there is not yet a
388 defined 48-bit instruction encoding for the base instruction.
389
390 1. If the base instruction is a load instruction, then
391 a. If the base instruction is an I-type instruction, then
392 1. The encoding is P48-LD-type.
393 b. Otherwise
394 1. The encoding is *Reserved*.
395 2. If the base instruction is a store instruction, then
396 a. If the base instruction is an S-type instruction, then
397 1. The encoding is P48-ST-type.
398 b. Otherwise
399 1. The encoding is *Reserved*.
400 3. If the base instruction is a SYSTEM instruction, then
401 a. The encoding is *Reserved*.
402 4. If the base instruction is an integer instruction, then
403 a. If the base instruction is an R-type instruction, then
404 1. The encoding is P48-R-type.
405 b. If the base instruction is an I-type instruction, then
406 1. The encoding is P48-I-type.
407 c. If the base instruction is an S-type instruction, then
408 1. The encoding is *Reserved*.
409 d. If the base instruction is a B-type instruction, then
410 1. The encoding is *Reserved*.
411 e. If the base instruction is a U-type instruction, then
412 1. The encoding is P48-U-type.
413 f. If the base instruction is a J-type instruction, then
414 1. The encoding is *Reserved*.
415 g. Otherwise
416 1. The encoding is *Reserved*.
417 5. If the base instruction is a floating-point instruction, then
418 a. If the base instruction is an R-type instruction, then
419 1. The encoding is P48-FR-type.
420 b. If the base instruction is an I-type instruction, then
421 1. The encoding is P48-FI-type.
422 c. If the base instruction is an S-type instruction, then
423 1. The encoding is *Reserved*.
424 d. If the base instruction is a B-type instruction, then
425 1. The encoding is *Reserved*.
426 e. If the base instruction is a U-type instruction, then
427 1. The encoding is *Reserved*.
428 f. If the base instruction is a J-type instruction, then
429 1. The encoding is *Reserved*.
430 g. If the base instruction is an R4-type instruction, then
431 1. The encoding is P48-FR4-type.
432 h. Otherwise
433 1. The encoding is *Reserved*.
434 6. Otherwise
435 a. The encoding is *Reserved*.
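
Since every branch of the decision procedure above depends only on the instruction class and its RV32 format, it can be summarised as a lookup table. The Python sketch below is one possible reading; the class labels ("load", "store", "system", "integer", "float") and the function name are assumptions of this example, and it relies on the caller classifying loads and stores (including FP ones) before anything else, as the step order implies.

```python
# (instruction class, RV32 format) -> P48 encoding.
# Any pair not listed here is *Reserved*.
P48_ENCODINGS = {
    ("load", "I"): "P48-LD-type",
    ("store", "S"): "P48-ST-type",
    ("integer", "R"): "P48-R-type",
    ("integer", "I"): "P48-I-type",
    ("integer", "U"): "P48-U-type",
    ("float", "R"): "P48-FR-type",
    ("float", "I"): "P48-FI-type",
    ("float", "R4"): "P48-FR4-type",
}


def p48_encoding(kind: str, fmt: str):
    """Return the P48 encoding name, or None when it is Reserved."""
    return P48_ENCODINGS.get((kind, fmt))
```

A table keeps the eight defined cases visible at a glance, and everything else (SYSTEM, B-type, J-type and so on) falls out as Reserved without needing its own branch.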
436
437 CSR Registers
438 =============
439
440 +--------+--------------------+---------------------------------------------------+
441 | Name   | Legal Values       | Meaning                                           |
442 +========+====================+===================================================+
443 | VL     | 0 <= VL <= XLEN    | Vector Length. The number of sub-vectors operated |
444 |        |                    | on by vector instructions.                        |
445 +--------+--------------------+---------------------------------------------------+
446 | Vstart | 0 <= Vstart < XLEN | The sub-vector index to start execution at.       |
447 |        |                    | Successful completion of all elements in a vector |
448 |        |                    | instruction sets Vstart to 0. Set to the index of |
449 |        |                    | the failing sub-vector when a vector instruction  |
450 |        |                    | traps. Used to resume execution of vector         |
451 |        |                    | instructions after a trap. Is *NOT* "slow"        |
452 +--------+--------------------+---------------------------------------------------+
453
454 SetVL
455 =====
456
457 setvl rd, rs1, imm
458
459 imm is the amount of space allocated from the register file by the compiler.
460
461 Pseudocode:
462
463 1. Trap if imm > XLEN.
464 2. If rs1 is x0, then
465 1. Set VL to imm.
466 3. Else If regs[rs1] > 2 * imm, then
467 1. Set VL to XLEN.
468 4. Else If regs[rs1] > imm, then
469 1. Set VL to regs[rs1] / 2 rounded down.
470 5. Otherwise,
471 1. Set VL to regs[rs1].
472 6. Set regs[rd] to VL.
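
The pseudocode above can be transcribed literally into Python as a quick way to try out values. This sketch follows the listed steps exactly as written; the `Trap` exception, the `xlen` default of 64, and the choice to discard writes to x0 are assumptions made for the example.

```python
class Trap(Exception):
    """Stand-in for a hardware trap (hypothetical)."""


def setvl(regs: list, rd: int, rs1: int, imm: int, xlen: int = 64) -> int:
    """Literal transcription of the SetVL steps; returns the new VL."""
    if imm > xlen:                      # step 1
        raise Trap("imm exceeds XLEN")
    if rs1 == 0:                        # step 2: rs1 is x0
        vl = imm
    elif regs[rs1] > 2 * imm:           # step 3
        vl = xlen
    elif regs[rs1] > imm:               # step 4
        vl = regs[rs1] // 2             # rounded down
    else:                               # step 5
        vl = regs[rs1]
    if rd != 0:                         # step 6; x0 stays zero (assumption)
        regs[rd] = vl
    return vl
```

For example, with imm=4 a request of 3 yields VL=3 and a request of 6 yields VL=3 as well (6 / 2), while a request of 100 takes the step-3 branch.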
473
474 Additional Instructions
475 =======================
476
477 Add instructions to convert between integer types.
478
479 Add instructions to `swizzle`_ elements in sub-vectors. Note that the sub-vector
480 lengths of the source and destination won't necessarily match.
481
482 .. _swizzle: https://www.khronos.org/opengl/wiki/Data_Type_(GLSL)#Swizzling
483
484 Add instructions to transpose (2-4)x(2-4) element matrices.
485
486 Add instructions to insert or extract a sub-vector from a vector, with the index
487 allowed to be both immediate and from a register (*immediate can be covered partly
488 by twin-predication, register cannot: requires MV.X aka VSELECT*)
489
490 Add a register gather instruction (aka MV.X)
491
492 # Open questions <a name="questions"></a>
493
494 What is SUBVL and how does it work?
495
496 --
497
498 SVorig goes to a lot of effort to keep 1 <= VL <= MAXVL and MAXVL in the range 1..64, so that both CSRs may be stored internally in only 6 bits.
499
500 Thus, CSRRWI can reach 1..32 for VL and MAXVL.
501
502 In addition, instead of setting a hardware loop to zero so that its instructions turn into NOPs, just branch over them: start the first loop at the end, on the test for the loop variable being zero, a la C "while do" instead of "do while".
503
504 Or, does it not matter that VL only goes up to 31 on a CSRRWI, and that it only goes to a max of 63 rather than 64?
505
506 --
507
508 Should these questions be moved to the Discussion subpage?
509
510 --
511
512 Is MV.X good enough a substitute for swizzle?
513
514 --
515
516 Is a vectorised srcbase OK as a gather/scatter, and an OK substitute for register stride? 5 dependency registers (reg stride being the 5th) is quite scary.