SimpleV Prefix (SVprefix) Proposal v0.2
=======================================

This proposal is designed to be able to operate without SVcsr, but not to
require the absence of SVcsr.

Conventions
===========

Conventions used in this document:

- Bits are numbered starting from 0 at the LSB, so bit 3 is 1 in the integer 8.
- Bit ranges are inclusive on both ends, so 5:3 means bits 5, 4, and 3.

Operations work on variable-length vectors of sub-vectors, where each sub-vector
has a length *svlen*, and an element type *etype*. When the vectors are stored
in registers, all elements are packed so that there is no padding in-between
elements of the same vector. The number of bytes in a sub-vector, *svsz*, is the
product of *svlen* and the element size in bytes.

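The packing rule above can be sketched as simple arithmetic. This is only an illustration: the helper names ``svsz`` and ``element_offset`` are hypothetical, and the assumption that elements are laid out at increasing byte offsets is not stated by the proposal.

```python
def svsz(svlen: int, elem_bytes: int) -> int:
    """Bytes in one sub-vector: svlen times the element size in bytes."""
    return svlen * elem_bytes


def element_offset(sv_index: int, elem_index: int,
                   svlen: int, elem_bytes: int) -> int:
    """Byte offset of one element inside a packed vector (no padding).

    Assumption: sub-vectors and their elements are stored back to back
    at increasing byte offsets.
    """
    if not 0 <= elem_index < svlen:
        raise ValueError("element index out of range for this svlen")
    return sv_index * svsz(svlen, elem_bytes) + elem_index * elem_bytes
```

For example, with *svlen* = 3 and 2-byte elements, each sub-vector occupies 6 bytes, and element 1 of sub-vector 2 starts at byte 14.
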
Half-Precision Floating Point (FP16)
====================================
If the F extension is supported, SVprefix adds support for FP16 in the
base FP instructions by using 10 (H) in the floating-point format field *fmt*
and using 001 (H) in the floating-point load/store *width* field.

Compressed Instructions
=======================
This proposal doesn't include any prefixed RVC instructions. Instead, it will
include 32-bit instructions that are compressed forms of SVprefix 48-bit
instructions, in the same manner that RVC instructions are compressed forms of
RVI instructions. The compressed instructions will be defined later by
considering which 48-bit instructions are the most common.

48-bit Prefixed Instructions
============================
All 48-bit prefixed instructions contain a 32-bit "base" instruction as the
last 4 bytes. Since all 32-bit instructions have bits 1:0 set to 11, those bits
are reused for additional encoding space in the 48-bit instructions.

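A minimal sketch of recovering the base instruction, assuming the 48-bit instruction is held as a Python integer with bit 0 as the LSB, that bits 47:18 carry bits 31:2 of the base instruction, and that bits 5:0 hold the 011111 prefix shown in the encoding tables of this proposal. The function names are hypothetical.

```python
P48_PREFIX = 0b011111  # bits 5:0 of every 48-bit encoding in this proposal


def is_p48(insn48: int) -> bool:
    """True if the low six bits carry the 48-bit prefix pattern."""
    return (insn48 & 0x3F) == P48_PREFIX


def base_instruction(insn48: int) -> int:
    """Rebuild the 32-bit base instruction from a 48-bit prefixed one.

    The upper 32 bits are shifted down, the two reused low bits are
    cleared, and the implicit 11 in bits 1:0 is restored.
    """
    return ((insn48 >> 16) & 0xFFFF_FFFC) | 0b11
```

The two bits that are masked off are the ones the prefix reuses for its own fields, so they must not leak into the reconstructed base instruction.
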
64-bit Prefixed Instructions
============================

TODO.  Really need to resolve vitp7 by reducing lsk to 2 bits, or just use
0b111111 as the prefix, then lsk can remain at 3 bits.

48-bit Instruction Encodings
============================

In the following table, *Reserved* entries must be zero.  The equivalent RV32
encodings are included for side-by-side comparison (and are listed separately
below).

First, bits 17:0:

+---------------+--------+------------+------------+-----+------------+-------------+------+------------+--------+
| Encoding      | 17     | 16         | 15         | 14  | 13         | 12          | 11:7 | 6          | 5:0    |
+---------------+--------+------------+------------+-----+------------+-------------+------+------------+--------+
| P48-LD-type   | rd[5]  | rs1[5]     | vitp7[6]   | vd  | vs1        | vitp7[5:0]         | *Reserved* | 011111 |
+---------------+--------+------------+------------+-----+------------+-------------+------+------------+--------+
| P48-ST-type   |vitp7[6]| rs1[5]     | rs2[5]     | vs2 | vs1        | vitp7[5:0]         | *Reserved* | 011111 |
+---------------+--------+------------+------------+-----+------------+-------------+------+------------+--------+
| P48-R-type    | rd[5]  | rs1[5]     | rs2[5]     | vs2 | vs1        | vitp6              | *Reserved* | 011111 |
+---------------+--------+------------+------------+-----+------------+--------------------+------------+--------+
| P48-I-type    | rd[5]  | rs1[5]     | vitp7[6]   | vd  | vs1        | vitp7[5:0]         | *Reserved* | 011111 |
+---------------+--------+------------+------------+-----+------------+--------------------+------------+--------+
| P48-U-type    | rd[5]  | *Reserved* | *Reserved* | vd  | *Reserved* | vitp6              | *Reserved* | 011111 |
+---------------+--------+------------+------------+-----+------------+-------------+------+------------+--------+
| P48-FR-type   | rd[5]  | rs1[5]     | rs2[5]     | vs2 | vs1        | *Reserved*  | vtp5 | *Reserved* | 011111 |
+---------------+--------+------------+------------+-----+------------+-------------+------+------------+--------+
| P48-FI-type   | rd[5]  | rs1[5]     | vitp7[6]   | vd  | vs1        | vitp7[5:0]         | *Reserved* | 011111 |
+---------------+--------+------------+------------+-----+------------+-------------+------+------------+--------+
| P48-FR4-type  | rd[5]  | rs1[5]     | rs2[5]     | vs2 | rs3[5]     | vs3 [#fr4]_ | vtp5 | *Reserved* | 011111 |
+---------------+--------+------------+------------+-----+------------+-------------+------+------------+--------+

.. [#fr4] Only vs2 and vs3 are included in the P48-FR4-type encoding because
          there is not enough space for vs1 as well, and because it is more
          useful to have a scalar argument for each of the multiplication and
          addition portions of fmadd than to have two scalars on the
          multiplication portion.

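As an illustration of reading the table above, the following sketch pulls the prefix fields out of a P48-R-type instruction. It assumes the instruction is a Python integer with bit 0 as the LSB. The names ``bits``, ``P48RPrefix`` and ``decode_p48_r_prefix`` are hypothetical, not part of the proposal.

```python
from typing import NamedTuple


class P48RPrefix(NamedTuple):
    rd5: int    # bit 17: rd[5]
    rs1_5: int  # bit 16: rs1[5]
    rs2_5: int  # bit 15: rs2[5]
    vs2: int    # bit 14
    vs1: int    # bit 13
    vitp6: int  # bits 12:7


def bits(value: int, hi: int, lo: int) -> int:
    """Extract the inclusive bit range hi:lo from value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_p48_r_prefix(insn48: int) -> P48RPrefix:
    """Decode bits 17:0 of a P48-R-type instruction."""
    if bits(insn48, 5, 0) != 0b011111:
        raise ValueError("not a 48-bit prefixed instruction")
    if bits(insn48, 6, 6) != 0:
        raise ValueError("Reserved bit 6 must be zero")
    return P48RPrefix(
        rd5=bits(insn48, 17, 17),
        rs1_5=bits(insn48, 16, 16),
        rs2_5=bits(insn48, 15, 15),
        vs2=bits(insn48, 14, 14),
        vs1=bits(insn48, 13, 13),
        vitp6=bits(insn48, 12, 7),
    )
```

The other encodings differ only in which fields occupy bits 17:7, so the same ``bits`` helper would apply to them.
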
Table showing correspondence between ``P48-*-type`` and ``RV32-*-type``.  These
are bits 47:18 (RV32 shifted up by 16 bits):

+---------------+---------------+
| Encoding      | 47:18         |
+---------------+---------------+
| RV32 Encoding | 31:2          |
+---------------+---------------+
| P48-LD-type   | RV32-I-type   |
+---------------+---------------+
| P48-ST-type   | RV32-S-type   |
+---------------+---------------+
| P48-R-type    | RV32-R-type   |
+---------------+---------------+
| P48-I-type    | RV32-I-type   |
+---------------+---------------+
| P48-U-type    | RV32-U-type   |
+---------------+---------------+
| P48-FR-type   | RV32-FR-type  |
+---------------+---------------+
| P48-FI-type   | RV32-I-type   |
+---------------+---------------+
| P48-FR4-type  | RV32-FR4-type |
+---------------+---------------+

Table showing Standard RV32 encodings:

+---------------+-------------+-------+----------+----------+--------+----------+--------+--------+------------+
| Encoding      | 31:27       | 26:25 | 24:20    | 19:15    | 14:12  | 11:7     | 6:2    | 1      | 0          |
+---------------+-------------+-------+----------+----------+--------+----------+--------+--------+------------+
| RV32-R-type   | funct7              | rs2[4:0] | rs1[4:0] | funct3 | rd[4:0]  | opcode | 1      | 1          |
+---------------+-------------+-------+----------+----------+--------+----------+--------+--------+------------+
| RV32-S-type   | imm[11:5]           | rs2[4:0] | rs1[4:0] | funct3 | imm[4:0] | opcode | 1      | 1          |
+---------------+-------------+-------+----------+----------+--------+----------+--------+--------+------------+
| RV32-I-type   | imm[11:0]                      | rs1[4:0] | funct3 | rd[4:0]  | opcode | 1      | 1          |
+---------------+-------------+-------+----------+----------+--------+----------+--------+--------+------------+
| RV32-U-type   | imm[31:12]                                         | rd[4:0]  | opcode | 1      | 1          |
+---------------+-------------+-------+----------+----------+--------+----------+--------+--------+------------+
| RV32-FR4-type | rs3[4:0]    | fmt   | rs2[4:0] | rs1[4:0] | funct3 | rd[4:0]  | opcode | 1      | 1          |
+---------------+-------------+-------+----------+----------+--------+----------+--------+--------+------------+
| RV32-FR-type  | funct5      | fmt   | rs2[4:0] | rs1[4:0] | rm     | rd[4:0]  | opcode | 1      | 1          |
+---------------+-------------+-------+----------+----------+--------+----------+--------+--------+------------+

64-bit Instruction Encodings
============================

TODO (please disregard)

+--------------+-------+-------+--------+--------+--------+----------+
| Encoding     | 63:58 | 57    | 56     | 55     | 54     | 53:48    |
+--------------+-------+-------+--------+--------+--------+----------+
| P64-LD-type  | VLtyp | rd[6] | rs1[6] |        |        | MVLtp    |
+--------------+-------+-------+--------+--------+--------+----------+
| P64-ST-type  | VLtyp |       | rs1[6] | rs2[6] |        | MVLtp    |
+--------------+-------+-------+--------+--------+--------+----------+
| P64-R-type   | VLtyp | rd[6] | rs1[6] | rs2[6] |        | MVLtp    |
+--------------+-------+-------+--------+--------+--------+----------+
| P64-I-type   | VLtyp | rd[6] | rs1[6] |        |        | MVLtp    |
+--------------+-------+-------+--------+--------+--------+----------+
| P64-U-type   | VLtyp | rd[6] |        |        |        | MVLtp    |
+--------------+-------+-------+--------+--------+--------+----------+
| P64-FR-type  | VLtyp |       | rs1[6] | rs2[6] |        | MVLtp    |
+--------------+-------+-------+--------+--------+--------+----------+
| P64-FI-type  | VLtyp | rd[6] | rs1[6] | rs2[6] |        | MVLtp    |
+--------------+-------+-------+--------+--------+--------+----------+
| P64-FR4-type | VLtyp | rd[6] | rs1[6] | rs2[6] | rs3[6] | MVLtp    |
+--------------+-------+-------+--------+--------+--------+----------+

VLtyp

+--------------+---------+
| vtyp[5:1]    | vtyp[0] |
+--------------+---------+
| regnum       |  1      |
+--------------+---------+
| immed        |  0      |
+--------------+---------+

Just as in the VLIW format, when bit 0 of vtyp is zero, bits 1 to 5 specify the
scalar register that VL is set from.  When bit 0 is 1, VL is set to the
immediate (plus one).

vs#/vd Fields' Encoding
=======================

+--------+----------+----------------------------------------------------------+
| vs#/vd | Mnemonic | Meaning                                                  |
+========+==========+==========================================================+
| 0      | S        | the rs#/rd field specifies a scalar (single sub-vector); |
|        |          | the rs#/rd field is zero-extended to get the actual      |
|        |          | 7-bit register number                                    |
+--------+----------+----------------------------------------------------------+
| 1      | V        | the rs#/rd field specifies a vector; the rs#/rd field is |
|        |          | decoded using the `Vector Register Number Encoding`_ to  |
|        |          | get the actual 7-bit register number                     |
+--------+----------+----------------------------------------------------------+

If a vs#/vd field is not present, it is treated as if it were present with a
value that is the bitwise-or of all present vs#/vd fields.

* Scalar register numbers do NOT increment when allocated in the
  hardware for-loop.  The same scalar register number is handed
  to every ALU.

* Vector register numbers *DO* increase when allocated in the
  hardware for-loop.  Sequentially-increasing register data
  is handed to sequential ALUs.

Vector Register Number Encoding
===============================

When vs#/vd is 1, the actual 7-bit register number is derived from the
corresponding 6-bit rs#/rd field:

+---------------------------------+
| Actual 7-bit register number    |
+===========+=============+=======+
| Bit 6     | Bits 5:1    | Bit 0 |
+-----------+-------------+-------+
| rs#/rd[0] | rs#/rd[5:1] | 0     |
+-----------+-------------+-------+

TODO: similar scheme for 64-bit encoding (incorporating extra bit rs#/rd[6] from 64-bit encoding)

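The scalar and vector register-number rules for the 48-bit encoding can be sketched as follows. The function names are hypothetical, and the 6-bit field is assumed to be already assembled from rs#/rd[5] in the prefix and rs#/rd[4:0] in the base instruction.

```python
def actual_regnum(field6: int, is_vector: bool) -> int:
    """Map a 6-bit rs#/rd field to the actual 7-bit register number."""
    if not 0 <= field6 < 64:
        raise ValueError("rs#/rd field must fit in 6 bits")
    if not is_vector:
        # Scalar (vs#/vd = 0): the field is simply zero-extended.
        return field6
    # Vector (vs#/vd = 1): bit 6 = field[0], bits 5:1 = field[5:1], bit 0 = 0.
    return ((field6 & 1) << 6) | (field6 & 0b111110)


def implied_vfield(*present: int) -> int:
    """Value of an absent vs#/vd field: bitwise-or of all present ones."""
    result = 0
    for bit in present:
        result |= bit
    return result
```

One consequence of this mapping is that vector register numbers are always even, with odd field values selecting registers 64 and above.
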
Load/Store Kind (lsk) Field Encoding
====================================

+--------+-----+--------------------------------------------------------------------------------+
| vd/vs2 | vs1 | Meaning                                                                        |
+========+=====+================================================================================+
| 0      | 0   | srcbase is scalar, LD/ST is pure scalar.                                       |
+--------+-----+--------------------------------------------------------------------------------+
| 1      | 0   | srcbase is scalar, LD/ST is unit strided                                       |
+--------+-----+--------------------------------------------------------------------------------+
| 0      | 1   | srcbase is a vector (gather/scatter aka array of srcbases). VSPLAT and VSELECT |
+--------+-----+--------------------------------------------------------------------------------+
| 1      | 1   | srcbase is a vector, LD/ST is a full vector LD/ST.                             |
+--------+-----+--------------------------------------------------------------------------------+

Notes:

* A register-strided LD/ST would require *5* registers: srcbase, vd/vs2,
  predicate 1, predicate 2 and the stride register.
* Complex strides may all be done with a general purpose vector of srcbases.
* Twin predication may be used even when vd/vs1 is a scalar, to give VSPLAT and
  VSELECT, because the hardware loop ends on the first occurrence of a 1 in the
  predicate when a predicate is applied to a scalar.
* Full vectorised gather/scatter is enabled when both registers are marked as
  vectorised; however, unlike e.g. Intel AVX512, twin predication can be applied.

Open question: RVV overloads the width field of LOAD-FP/STORE-FP, using bit 2
to indicate additional interpretation of the 11-bit immediate. Should this be
considered?


Sub-Vector Length (svlen) Field Encoding
========================================

+----------------+-------+
| svlen Encoding | Value |
+================+=======+
| 00             | 4     |
+----------------+-------+
| 01             | 1     |
+----------------+-------+
| 10             | 2     |
+----------------+-------+
| 11             | 3     |
+----------------+-------+

Predication (pred) Field Encoding
=================================

+------+------------+--------------------+----------------------------------------+
| pred | Mnemonic   | Predicate Register | Meaning                                |
+======+============+====================+========================================+
| 000  | *None*     | *None*             | The instruction is unpredicated        |
+------+------------+--------------------+----------------------------------------+
| 001  | *Reserved* | *Reserved*         |                                        |
+------+------------+--------------------+----------------------------------------+
| 010  | !x9        | x9 (s1)            | execute vector op[0..i] on x9[i] == 0  |
+------+------------+                    +----------------------------------------+
| 011  | x9         |                    | execute vector op[0..i] on x9[i] == 1  |
+------+------------+--------------------+----------------------------------------+
| 100  | !x10       | x10 (a0)           | execute vector op[0..i] on x10[i] == 0 |
+------+------------+                    +----------------------------------------+
| 101  | x10        |                    | execute vector op[0..i] on x10[i] == 1 |
+------+------------+--------------------+----------------------------------------+
| 110  | !x11       | x11 (a1)           | execute vector op[0..i] on x11[i] == 0 |
+------+------------+                    +----------------------------------------+
| 111  | x11        |                    | execute vector op[0..i] on x11[i] == 1 |
+------+------------+--------------------+----------------------------------------+

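The pred table can be read as "upper two bits select the register, low bit selects the polarity". A hedged sketch of that reading follows. The function names are hypothetical, and the register file is modelled as a plain mapping from register number to integer value.

```python
from typing import Mapping, Optional, Tuple


def decode_pred(pred: int) -> Optional[Tuple[int, bool]]:
    """Return (register number, inverted) for a 3-bit pred, or None if unpredicated."""
    if not 0 <= pred <= 0b111:
        raise ValueError("pred is a 3-bit field")
    if pred == 0b000:
        return None
    if pred == 0b001:
        raise ValueError("pred encoding 001 is Reserved")
    reg = 8 + (pred >> 1)          # 01x -> x9, 10x -> x10, 11x -> x11
    inverted = (pred & 1) == 0     # even encodings are the "!" forms
    return reg, inverted


def element_enabled(pred: int, regs: Mapping[int, int], i: int) -> bool:
    """True if element i of the vector operation executes under this pred."""
    decoded = decode_pred(pred)
    if decoded is None:
        return True
    reg, inverted = decoded
    bit = (regs[reg] >> i) & 1
    return bit == (0 if inverted else 1)
```

With x9 = 0b0101, the ``x9`` form enables elements 0 and 2, while the ``!x9`` form enables elements 1 and 3.
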
Twin-predication (tpred) Field Encoding
=======================================

+-------+------------+--------------------+----------------------------------------------+
| tpred | Mnemonic   | Predicate Register | Meaning                                      |
+=======+============+====================+==============================================+
| 000   | *None*     | *None*             | The instruction is unpredicated              |
+-------+------------+--------------------+----------------------------------------------+
| 001   | x9,off     | src=x9, dest=none  | src[0..i] uses x9[i], dest unpredicated      |
+-------+------------+--------------------+----------------------------------------------+
| 010   | off,x10    | src=none, dest=x10 | dest[0..i] uses x10[i], src unpredicated     |
+-------+------------+--------------------+----------------------------------------------+
| 011   | x9,x10     | src=x9, dest=x10   | src[0..i] uses x9[i], dest[0..i] uses x10[i] |
+-------+------------+--------------------+----------------------------------------------+
| 100   | *None*     | *RESERVED*         | Instruction is unpredicated (TBD)            |
+-------+------------+--------------------+----------------------------------------------+
| 101   | !x9,off    | src=!x9, dest=none |                                              |
+-------+------------+--------------------+----------------------------------------------+
| 110   | off,!x10   | src=none, dest=!x10|                                              |
+-------+------------+--------------------+----------------------------------------------+
| 111   | !x9,!x10   | src=!x9, dest=!x10 |                                              |
+-------+------------+--------------------+----------------------------------------------+

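Read bit by bit, the tpred table suggests that bit 0 enables the source predicate (x9), bit 1 enables the destination predicate (x10), and bit 2 inverts whichever predicates are enabled. The sketch below encodes that reading. It is an interpretation of the table, not something the proposal states, and ``decode_tpred`` is a hypothetical name. Encoding 100 is marked *RESERVED* (TBD) in the table and is treated here as unpredicated, as the table's Meaning column says.

```python
from typing import Optional, Tuple

Pred = Optional[Tuple[int, bool]]  # (register number, inverted) or None


def decode_tpred(tpred: int) -> Tuple[Pred, Pred]:
    """Return (source predicate, destination predicate) for a 3-bit tpred."""
    if not 0 <= tpred <= 0b111:
        raise ValueError("tpred is a 3-bit field")
    inverted = bool(tpred & 0b100)
    src = (9, inverted) if tpred & 0b001 else None
    dest = (10, inverted) if tpred & 0b010 else None
    return src, dest
```

For example, ``decode_tpred(0b101)`` gives an inverted x9 source predicate and no destination predicate, matching the ``!x9,off`` row.
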
Integer Element Type (itype) Field Encoding
===========================================

+------------+-------+--------------+--------------+-----------------+-------------------+
| Signedness | itype | Element Type | Mnemonic in  | Mnemonic in FP  | Meaning (INT may  |
| [#sgn_def]_|       |              | Integer      | Instructions    | be un/signed, FP  |
|            |       |              | Instructions | (such as fmv.x) | just re-sized)    |
+============+=======+==============+==============+=================+===================+
| Unsigned   | 01    | u8           | BU           | BU              | Unsigned 8-bit    |
|            +-------+--------------+--------------+-----------------+-------------------+
|            | 10    | u16          | HU           | HU              | Unsigned 16-bit   |
|            +-------+--------------+--------------+-----------------+-------------------+
|            | 11    | u32          | WU           | WU              | Unsigned 32-bit   |
|            +-------+--------------+--------------+-----------------+-------------------+
|            | 00    | uXLEN        | WU/DU/QU     | WU/LU/TU        | Unsigned XLEN-bit |
+------------+-------+--------------+--------------+-----------------+-------------------+
| Signed     | 01    | i8           | BS           | BS              | Signed 8-bit      |
|            +-------+--------------+--------------+-----------------+-------------------+
|            | 10    | i16          | HS           | HS              | Signed 16-bit     |
|            +-------+--------------+--------------+-----------------+-------------------+
|            | 11    | i32          | W            | W               | Signed 32-bit     |
|            +-------+--------------+--------------+-----------------+-------------------+
|            | 00    | iXLEN        | W/D/Q        | W/L/T           | Signed XLEN-bit   |
+------------+-------+--------------+--------------+-----------------+-------------------+

.. [#sgn_def] Signedness is defined in `Signedness Decision Procedure`_

Note: vector mode is effectively a type-cast of the register file,
as if it were a sequential array being typecast to typedef itype[]
(C syntax).  The starting point of the "typecast" is the vector
register rs#/rd.

Example: if itype=0b10 (u16), rd is set to "vector", and
VL is set to 4, the 64-bit register at rd is subdivided into
*FOUR* 16-bit destination elements.  It is *NOT* four
separate 64-bit destination registers (rd+0, rd+1, rd+2, rd+3)
that are sign-extended from the source width size out to 64-bit,
because that is itype=0b00 (uXLEN).

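The "typecast" view in the example above can be sketched by slicing one 64-bit register value into 16-bit elements. The element order is an assumption: the sketch places element 0 in the least significant bits, which the proposal does not spell out. ``split_register`` is a hypothetical helper.

```python
from typing import List


def split_register(value: int, elem_bits: int, xlen: int = 64) -> List[int]:
    """View one XLEN-bit register as packed unsigned elements.

    Assumption: element 0 occupies the least significant elem_bits bits.
    """
    if xlen % elem_bits != 0:
        raise ValueError("element width must divide XLEN")
    mask = (1 << elem_bits) - 1
    return [(value >> (i * elem_bits)) & mask for i in range(xlen // elem_bits)]


# itype=0b10 (u16), VL=4: one 64-bit register holds all four elements.
elements = split_register(0x0004_0003_0002_0001, 16)
```

Here ``elements`` holds the four 16-bit values 1, 2, 3 and 4, all taken from the single register at rd rather than from rd+0 to rd+3.
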
Signedness Decision Procedure
=============================

1. If the opcode field is either OP or OP-IMM, then
    1. Signedness is Unsigned.
2. If the opcode field is either OP-32 or OP-IMM-32, then
    1. Signedness is Signed.
3. If Signedness is encoded in a field of the base instruction, [#sign_enc]_ then
    1. Signedness uses the encoded value.
4. Otherwise,
    1. Signedness is Unsigned.

.. [#sign_enc] Like in fcvt.d.l[u], but unlike in fmv.x.w, since there is no
               fmv.x.wu

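The procedure above is a first-match chain, which might be written as below. The opcode is passed as its name string, and ``encoded`` stands for a signedness that the base instruction itself encodes (as in fcvt.d.l[u]). Both are modelling choices for this sketch, not part of the proposal.

```python
from typing import Optional


def signedness(opcode: str, encoded: Optional[str] = None) -> str:
    """Return "signed" or "unsigned" following the first matching rule."""
    if opcode in ("OP", "OP-IMM"):
        return "unsigned"
    if opcode in ("OP-32", "OP-IMM-32"):
        return "signed"
    if encoded is not None:
        # The base instruction carries its own signedness field.
        return encoded
    return "unsigned"
```

Because the rules are tried in order, an explicit ``encoded`` value only matters for opcodes outside the four integer groups.
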
Vector Type and Predication 5-bit (vtp5) Field Encoding
=======================================================

In the following table, X denotes a wildcard that is 0 or 1 and can be a
different value for every occurrence.

+-------+-----------+-----------+
| vtp5  | pred      | svlen     |
+=======+===========+===========+
| 1XXXX | vtp5[4:2] | vtp5[1:0] |
+-------+           |           |
| 01XXX |           |           |
+-------+           |           |
| 000XX |           |           |
+-------+-----------+-----------+
| 001XX | *Reserved*            |
+-------+-----------------------+

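A small sketch of splitting vtp5 into its two sub-fields, also applying the svlen table (encoding 00 means a length of 4). ``decode_vtp5`` is a hypothetical name, and the returned pred is the raw 3-bit value from the pred table.

```python
from typing import Tuple


def decode_vtp5(vtp5: int) -> Tuple[int, int]:
    """Return (pred, svlen) for a 5-bit vtp5 field."""
    if not 0 <= vtp5 < 32:
        raise ValueError("vtp5 is a 5-bit field")
    pred = (vtp5 >> 2) & 0b111
    if pred == 0b001:
        raise ValueError("vtp5 pattern 001XX is Reserved")
    svlen_enc = vtp5 & 0b11
    svlen = 4 if svlen_enc == 0 else svlen_enc
    return pred, svlen
```

For instance, ``decode_vtp5(0b01110)`` yields pred 0b011 (predicate x9) and a sub-vector length of 2.
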
Vector Integer Type and Predication 6-bit (vitp6) Field Encoding
================================================================

In the following table, X denotes a wildcard that is 0 or 1 and can be a
different value for every occurrence.

+--------+------------+---------+------------+------------+
| vitp6  | itype      | pred[2] | pred[1:0]  | svlen      |
+========+============+=========+============+============+
| XX1XXX | vitp6[5:4] | 0       | vitp6[3:2] | vitp6[1:0] |
+--------+            |         |            |            |
| XX00XX |            |         |            |            |
+--------+------------+---------+------------+------------+
| XX01XX | *Reserved*                                     |
+--------+------------------------------------------------+

vitp7 field: only tpred=

+---------+------------+----------+-------------+------------+
| vitp7   | itype      | tpred[2] | tpred[1:0]  | svlen      |
+=========+============+==========+=============+============+
| XXXXXXX | vitp7[5:4] | vitp7[6] | vitp7[3:2]  | vitp7[1:0] |
+---------+------------+----------+-------------+------------+

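The two tables above can be sketched as field extraction. This assumes, following the bit-range convention of this document, that the two-bit predicate sub-field supplies pred[1:0] (or tpred[1:0]) with the high bit on the left. The function names are hypothetical.

```python
from typing import Tuple


def _svlen(enc: int) -> int:
    """svlen table: encoding 00 means 4, otherwise the value itself."""
    return 4 if enc == 0 else enc


def decode_vitp6(vitp6: int) -> Tuple[int, int, int]:
    """Return (itype, pred, svlen) for a 6-bit vitp6 field; pred[2] is always 0."""
    if not 0 <= vitp6 < 64:
        raise ValueError("vitp6 is a 6-bit field")
    itype = (vitp6 >> 4) & 0b11
    pred = (vitp6 >> 2) & 0b11
    if pred == 0b01:
        raise ValueError("vitp6 pattern XX01XX is Reserved")
    return itype, pred, _svlen(vitp6 & 0b11)


def decode_vitp7(vitp7: int) -> Tuple[int, int, int]:
    """Return (itype, tpred, svlen) for a 7-bit vitp7 field."""
    if not 0 <= vitp7 < 128:
        raise ValueError("vitp7 is a 7-bit field")
    itype = (vitp7 >> 4) & 0b11
    tpred = (((vitp7 >> 6) & 1) << 2) | ((vitp7 >> 2) & 0b11)
    return itype, tpred, _svlen(vitp7 & 0b11)
```

Since pred[2] is fixed at 0, vitp6 can only select the x9 predicate forms, whereas vitp7 reaches all eight tpred encodings.
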
48-bit Instruction Encoding Decision Procedure
==============================================

In the following decision procedure, *Reserved* means that there is not yet a
defined 48-bit instruction encoding for the base instruction.

1. If the base instruction is a load instruction, then
    a. If the base instruction is an I-type instruction, then
        1. The encoding is P48-LD-type.
    b. Otherwise
        1. The encoding is *Reserved*.
2. If the base instruction is a store instruction, then
    a. If the base instruction is an S-type instruction, then
        1. The encoding is P48-ST-type.
    b. Otherwise
        1. The encoding is *Reserved*.
3. If the base instruction is a SYSTEM instruction, then
    a. The encoding is *Reserved*.
4. If the base instruction is an integer instruction, then
    a. If the base instruction is an R-type instruction, then
        1. The encoding is P48-R-type.
    b. If the base instruction is an I-type instruction, then
        1. The encoding is P48-I-type.
    c. If the base instruction is an S-type instruction, then
        1. The encoding is *Reserved*.
    d. If the base instruction is a B-type instruction, then
        1. The encoding is *Reserved*.
    e. If the base instruction is a U-type instruction, then
        1. The encoding is P48-U-type.
    f. If the base instruction is a J-type instruction, then
        1. The encoding is *Reserved*.
    g. Otherwise
        1. The encoding is *Reserved*.
5. If the base instruction is a floating-point instruction, then
    a. If the base instruction is an R-type instruction, then
        1. The encoding is P48-FR-type.
    b. If the base instruction is an I-type instruction, then
        1. The encoding is P48-FI-type.
    c. If the base instruction is an S-type instruction, then
        1. The encoding is *Reserved*.
    d. If the base instruction is a B-type instruction, then
        1. The encoding is *Reserved*.
    e. If the base instruction is a U-type instruction, then
        1. The encoding is *Reserved*.
    f. If the base instruction is a J-type instruction, then
        1. The encoding is *Reserved*.
    g. If the base instruction is an R4-type instruction, then
        1. The encoding is P48-FR4-type.
    h. Otherwise
        1. The encoding is *Reserved*.
6. Otherwise
    a. The encoding is *Reserved*.

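The decision procedure above collapses into a short lookup. In this sketch the instruction class (``kind``) and base format (``fmt``) are passed in as strings, on the assumption that an earlier decode stage has already classified the base instruction. Those labels and the function name are hypothetical.

```python
INTEGER_ENCODINGS = {"R": "P48-R-type", "I": "P48-I-type", "U": "P48-U-type"}
FLOAT_ENCODINGS = {"R": "P48-FR-type", "I": "P48-FI-type", "R4": "P48-FR4-type"}


def p48_encoding(kind: str, fmt: str) -> str:
    """Return the P48 encoding name, or "Reserved" when none is defined."""
    if kind == "load":
        return "P48-LD-type" if fmt == "I" else "Reserved"
    if kind == "store":
        return "P48-ST-type" if fmt == "S" else "Reserved"
    if kind == "system":
        return "Reserved"
    if kind == "integer":
        return INTEGER_ENCODINGS.get(fmt, "Reserved")
    if kind == "float":
        return FLOAT_ENCODINGS.get(fmt, "Reserved")
    return "Reserved"
```

Every format not listed in the two dictionaries (S, B and J, plus U for floating point) falls through to "Reserved", mirroring the explicit *Reserved* branches of the procedure.
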
CSR Registers
=============

+--------+--------------------+---------------------------------------------------+
| Name   | Legal Values       | Meaning                                           |
+========+====================+===================================================+
| VL     | 0 <= VL <= XLEN    | Vector Length. The number of sub-vectors operated |
|        |                    | on by vector instructions.                        |
+--------+--------------------+---------------------------------------------------+
| Vstart | 0 <= Vstart < XLEN | The sub-vector index to start execution at.       |
|        |                    | Successful completion of all elements in a vector |
|        |                    | instruction sets Vstart to 0. Set to the index of |
|        |                    | the failing sub-vector when a vector instruction  |
|        |                    | traps.  Used to resume execution of vector        |
|        |                    | instructions after a trap. Is *NOT* "slow"        |
+--------+--------------------+---------------------------------------------------+

SetVL
=====

setvl rd, rs1, imm

imm is the amount of space allocated from the register file by the compiler.

Pseudocode:

1. Trap if imm > XLEN.
2. If rs1 is x0, then
    1. Set VL to imm.
3. Else If regs[rs1] > 2 * imm, then
    1. Set VL to XLEN.
4. Else If regs[rs1] > imm, then
    1. Set VL to regs[rs1] / 2 rounded down.
5. Otherwise,
    1. Set VL to regs[rs1].
6. Set regs[rd] to VL.

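A direct transcription of the pseudocode steps, as written, into Python. Several things are modelling assumptions rather than part of the proposal: the register file is a plain list, the trap is an exception named ``SetVLTrap``, XLEN defaults to 64, and the new VL is returned instead of being written to a CSR. Writes to x0 are discarded, as for any RISC-V register write.

```python
from typing import List


class SetVLTrap(Exception):
    """Raised where the pseudocode says "Trap"."""


def setvl(regs: List[int], rd: int, rs1: int, imm: int, xlen: int = 64) -> int:
    """Execute setvl rd, rs1, imm and return the new VL."""
    if imm > xlen:                      # step 1
        raise SetVLTrap("imm exceeds XLEN")
    if rs1 == 0:                        # step 2: rs1 is x0
        vl = imm
    elif regs[rs1] > 2 * imm:           # step 3
        vl = xlen
    elif regs[rs1] > imm:               # step 4: halve, rounding down
        vl = regs[rs1] // 2
    else:                               # step 5
        vl = regs[rs1]
    if rd != 0:                         # step 6: x0 stays hard-wired to zero
        regs[rd] = vl
    return vl
```

For example, with imm = 8 and regs[rs1] = 10, step 4 applies and VL becomes 5.
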
Additional Instructions
=======================

Add instructions to convert between integer types.

Add instructions to `swizzle`_ elements in sub-vectors. Note that the sub-vector
lengths of the source and destination won't necessarily match.

.. _swizzle: https://www.khronos.org/opengl/wiki/Data_Type_(GLSL)#Swizzling

Add instructions to transpose (2-4)x(2-4) element matrices.

Add instructions to insert or extract a sub-vector from a vector, with the index
allowed to be both immediate and from a register (*immediate can be covered partly
by twin-predication, register cannot: requires MV.X aka VSELECT*)

Add a register gather instruction (aka MV.X)

.. _questions:

Open questions
==============

What is SUBVL and how does it work?

----

SVorig goes to a lot of effort to constrain VL to 1 <= VL <= MAXVL and MAXVL to
1..64, so that both CSRs may be stored internally in only 6 bits.

Thus, CSRRWI can reach 1..32 for VL and MAXVL.

In addition, rather than setting a hardware loop to zero and so turning
instructions into NOPs, code can just branch over them, starting the first loop
at the end, on the test for the loop variable being zero, a la C "while do"
instead of "do while".

Or, does it not matter that VL only goes up to 31 on a CSRRWI, and that it only
goes to a max of 63 rather than 64?

----

Should these questions be moved to a Discussion subpage?

----

Is MV.X good enough a substitute for swizzle?

----

Is a vectorised srcbase OK as a gather/scatter, and an OK substitute for
register stride? 5 dependency registers (reg stride being the 5th) is quite
scary.