From: Luke Kenneth Casson Leighton
Date: Thu, 26 Apr 2018 03:08:10 +0000 (+0100)
Subject: rename harmonised pages
X-Git-Tag: convert-csv-opcode-to-binary~5508
X-Git-Url: https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff_plain;h=adf417bbb5b570c31f63cbabf1ec7938ccb245f5

rename harmonised pages
---
diff --git a/A_Harmonised_RVV_and_Packed_SIMD.mdwn b/A_Harmonised_RVV_and_Packed_SIMD.mdwn
deleted file mode 100644
index fa431f1b8..000000000
--- a/A_Harmonised_RVV_and_Packed_SIMD.mdwn
+++ /dev/null
@@ -1,49 +0,0 @@
-# Proposal to harmonise RV Vector spec with Andes Packed SIMD ("Harmonised" RVP)
-
-[[Comparative analysis Harmonised RVP vs Andes Packed SIMD ISA proposal]]
-
-##### MVL, setvl instruction & VL CSR work as per RV Vector spec.
-
-##### VLD and VST are supported
-
-RVP implementations may choose to load/store to/from the integer register file (rather than from a dedicated vector register file).
-
-* Thus, RVP implementations have a choice of providing a dedicated vector register file or sharing the integer register file, but not both simultaneously. (Supporting both would need a CSR mode switch bit.)
-* Mapping of v0-31 <-> r0-31 **is fixed** at 1:1. (An exception may be made to map v1 to r5, as mapping v1 to r1 may otherwise clash with procedure linkage.)
-* VLD and VST in this case will behave similarly to LW/LD and SW/SD respectively, but only operate on up to VL elements (see point #4 below).
-* If the integer register file is used for vector operations, any callee-saved registers (r2-4, 8-9, 18-27) must be saved with RVI SW or SD instructions before being used as vector registers (this register saving behaviour is harmless but redundant when RVP code is run on a machine with a dedicated vector register file).
-
-##### VLDX, VSTX, VLDS, VSTS are not supported in hardware
-To keep RVP implementations simple, these instructions will trap, and may instead be emulated in software.
-
-##### Default register "banks" and types
-
-In the absence of an explicit VCFG setup, the vector registers (when shared with the integer register file) default to two “banks” as follows:
-
-* v0-v15: vectors with INT8 elements, split into 8 x signed (v0-v7) & 8 x unsigned (v8-v15)
-* v16-v29: vectors with INT16 elements, split into 8 x signed (v16-v23) & 6 x unsigned (v24-v29)
-
-Having the above default vector type configuration harmonises most of the Andes SIMD instruction set (which explicitly encodes INT8 vs INT16 vector types as separate instructions). The main change from the Andes SIMD proposal is that instructions are restricted to 14 registers of each vector element type (with element size explicitly encoded in the most significant bit of the 5-bit register specifier fields).
-
-Notes:
-
-* To preserve forward RVV compatibility, programmers should still explicitly set up VDCFG to the above default vector types.
-* Essentially the same register allocation algorithm used for RVV can be used for RVP, except that the algorithm should preferentially use temporary registers before saved registers.
-* v30-v31 are reserved for 32-bit operations (see Section 2.3 of this document), and hence are not part of the register bank of INT16 vectors.
-* v0 is mapped to r0 (hardwired to zero), and v1 is used for predicate masks. However, both can be considered INT8 vectors.
-
-##### Default MVL
-
-The default RVV MVL value (in the absence of an explicit VCFG setup) is MVL = 2 on RV32I machines and MVL = 4 on RV64I machines.
-However, note that RV32I registers can fit 4x INT8 elements. To preserve Andes SIMD behaviour, all VOP instructions should still operate on all “unused” elements in the register, regardless of MVL. (This is still compliant with the RVV spec, provided elements from VL..MVL-1 are set to zero.) VMEM instructions, however, will only operate on VL elements, so where full Andes SIMD compliance is required (without RVV forward compatibility), LW/LD and SW/SD are to be used instead of VLD and VST.
-
-##### Alternative register "banks" and alternative MVL
-
-A programmer can configure VCFG with any mix of these alternative configurations:
-
-* v0-v31 are all INT16, and MVL is the same as for Default MVL above
-* v0-v31 are all INT8, and MVL is 4 on RV32I and 8 on RV64I
-* A lesser number of registers ( register operations:
-
-| 31 30 29 28 27 26 | 25 | 24 23 22 21 20 | 19 18 17 16 15 | 14 | 13 12 | 11 10 9 8 7 | 6 5 4 3 2 1 0 |
-| ----------------- | -- | -------------- | -------------- | -- | ----- | ----------- | ------------- |
-| func_6 | 0 | rs2 | rs1 | 0 | mm | rd1 | VOP opcode |
-
-Immediate + register -> register operations:
-
-| 31 30 29 | 28 27 26 | 25 | 24 23 22 21 20 | 19 18 17 16 15 | 14 | 13 12 | 11 10 9 8 7 | 6 5 4 3 2 1 0 |
-| -------- | -------- | -- | -------------- | -------------- | -- | ----- | ----------- | ------------- |
-| func_3 | imm[7:5] | 1 | imm[4:0] | rs1 | 0 | mm | rd1 | VOP opcode |
-
-Register x 3 -> register operations:
-
-| 31 30 29 28 27 | 26 25 | 24 23 22 21 20 | 19 18 17 16 15 | 14 | 13 12 | 11 10 9 8 7 | 6 5 4 3 2 1 0 |
-| -------------- | ----- | -------------- | -------------- | -- | ----- | ----------- | ------------- |
-| rs3 | func_2 | rs2 | rs1 | 1 | mm | rd1 | VOP opcode |
-
-Values for mm field (bits 13:12 above):
-
-* mm = 00 -> no predicate mask, and use current global saturation / rounding settings
-* mm = 01 -> no predicate mask, and force saturation or rounding for this instruction only
-* mm = 10 -> use v1 as predicate mask, and use global saturation / rounding settings
-* mm = 11 -> use ~v1 as predicate mask, and use global saturation / rounding settings
-
-## 16-bit Arithmetic
-
-| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent |
-| ------------------ | ------------------------- | ------------------- |
-| ADD16 rt, ra, rb | Add | VADD (v16 <= rt,ra,rb <= v29), mm=00 |
-| RADD16 rt, ra, rb | Signed Halving add | RADD (v16 <= rt,ra,rb <= v23), mm=00 |
-| URADD16 rt, ra, rb | Unsigned Halving add | RADD (v24 <= rt,ra,rb <= v29), mm=00 |
-| KADD16 rt, ra, rb | Signed Saturating add | VADD (v16 <= rt,ra,rb <= v23), mm=01 |
-| UKADD16 rt, ra, rb | Unsigned Saturating add | VADD (v24 <= rt,ra,rb <= v29), mm=01 |
-| SUB16 rt, ra, rb | Subtract | VSUB (v16 <= rt,ra,rb <= v29), mm=00 |
-| RSUB16 rt, ra, rb | Signed Halving sub | RSUB (v16 <= rt,ra,rb <= v23), mm=00 |
-| URSUB16 rt, ra, rb | Unsigned Halving sub | RSUB (v24 <= rt,ra,rb <= v29), mm=00 |
-| KSUB16 rt, ra, rb | Signed Saturating sub | VSUB (v16 <= rt,ra,rb <= v23), mm=01 |
-| UKSUB16 rt, ra, rb | Unsigned Saturating sub | VSUB (v24 <= rt,ra,rb <= v29), mm=01 |
-| CRAS16 rt, ra, rb | Cross Add & Sub | |
-| RCRAS16 rt, ra, rb | Signed Halving Cross Add & Sub | |
-| URCRAS16 rt, ra, rb | Unsigned Halving Cross Add & Sub | |
-| KCRAS16 rt, ra, rb | Signed Saturating Cross Add & Sub | |
-| UKCRAS16 rt, ra, rb | Unsigned Saturating Cross Add & Sub | |
-| CRSA16 rt, ra, rb | Cross Sub & Add | |
-| RCRSA16 rt, ra, rb | Signed Halving Cross Sub & Add | |
-| URCRSA16 rt, ra, rb | Unsigned Halving Cross Sub & Add | |
-| KCRSA16 rt, ra, rb | Signed Saturating Cross Sub & Add | |
-| UKCRSA16 rt, ra, rb | Unsigned Saturating Cross Sub & Add | |
-
-## 8-bit Arithmetic
-
-| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
-| ------------------ | ------------------------- | ------------------- |
-| ADD8 rt, ra, rb | Add | VADD (v2 <= rt,ra,rb <= v15), mm=00 |
-| RADD8 rt, ra, rb | Signed Halving add | RADD (v2 <= rt,ra,rb <= v7), mm=00 |
-| URADD8 rt, ra, rb | Unsigned Halving add | RADD (v8 <= rt,ra,rb <= v15), mm=00 |
-| KADD8 rt, ra, rb | Signed Saturating add | VADD (v2 <= rt,ra,rb <= v7), mm=01 |
-| UKADD8 rt, ra, rb | Unsigned Saturating add | VADD (v8 <= rt,ra,rb <= v15), mm=01 |
-| SUB8 rt, ra, rb | Subtract | VSUB (v2 <= rt,ra,rb <= v15), mm=00 |
-| RSUB8 rt, ra, rb | Signed Halving sub | RSUB (v2 <= rt,ra,rb <= v7), mm=00 |
-| URSUB8 rt, ra, rb | Unsigned Halving sub | RSUB (v8 <= rt,ra,rb <= v15), mm=00 |
-| KSUB8 rt, ra, rb | Signed Saturating sub | VSUB (v2 <= rt,ra,rb <= v7), mm=01 |
-| UKSUB8 rt, ra, rb | Unsigned Saturating sub | VSUB (v8 <= rt,ra,rb <= v15), mm=01 |
-
-## 16-bit Shifts
-
-SRA[I]16/SRL[I]16/SLL[I]16 are to be mapped to VOP shift instructions in the same manner as ADD16/SUB16.
-
-The “K” (saturation) and “.u” (rounding) variants could be encoded using VOP’s mm field (mm=01 is a saturated or rounded shift, mm=00 is a standard VOP shift).
-
-| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent |
-| ------------------ | ------------------------- | ------------------- |
-| SRA16 rt, ra, rb | Shift right arithmetic | VSRA (v16 <= rt,ra,rb <= v29), mm=00 |
-| SRAI16 rt, ra, im | Shift right arithmetic imm | VSRAI (v16 <= rt,ra <= v29), mm=00 |
-| SRA16.u rt, ra, rb | Rounding Shift right arithmetic | VSRA (v16 <= rt,ra,rb <= v29), mm=01 |
-| SRAI16.u rt, ra, im | Rounding Shift right arithmetic imm | VSRAI (v16 <= rt,ra <= v29), mm=01 |
-| SRL16 rt, ra, rb | Shift right logical | VSRL (v16 <= rt,ra,rb <= v29), mm=00 |
-| SRLI16 rt, ra, im | Shift right logical imm | VSRLI (v16 <= rt,ra <= v29), mm=00 |
-| SRL16.u rt, ra, rb | Rounding Shift right logical | VSRL (v16 <= rt,ra,rb <= v29), mm=01 |
-| SRLI16.u rt, ra, im | Rounding Shift right logical imm | VSRLI (v16 <= rt,ra <= v29), mm=01 |
-| SLL16 rt, ra, rb | Shift left logical | VSLL (v16 <= rt,ra,rb <= v29), mm=00 |
-| SLLI16 rt, ra, im | Shift left logical imm | VSLLI (v16 <= rt,ra <= v29), mm=00 |
-| KSLL16 rt, ra, rb | Saturating Shift left logical | VSLL (v16 <= rt,ra,rb <= v29), mm=01 |
-| KSLLI16 rt, ra, im | Saturating Shift left logical imm | VSLLI (v16 <= rt,ra <= v29), mm=01 |
-| KSLRA16 rt, ra, rb | Saturating Shift left logical or Shift right arithmetic | |
-| KSLRA16.u rt, ra, rb | Saturating Shift left logical or Rounding Shift right arithmetic | |
-
-## 8-bit Shifts
-
-Andes SIMD Packed ISA omits 8-bit shifts, but these can be encoded in Harmonised RVP as follows:
-
-| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
-| ------------------ | ------------------------- | ------------------- |
-| n/a | Shift right arithmetic | VSRA (v2 <= rt,ra,rb <= v15), mm=00 |
-| n/a | Shift right arithmetic imm | VSRAI (v2 <= rt,ra <= v15), mm=00 |
-| n/a | Rounding Shift right arithmetic | VSRA (v2 <= rt,ra,rb <= v15), mm=01 |
-| n/a | Rounding Shift right arithmetic imm | VSRAI (v2 <= rt,ra <= v15), mm=01 |
-| n/a | Shift right logical | VSRL (v2 <= rt,ra,rb <= v15), mm=00 |
-| n/a | Shift right logical imm | VSRLI (v2 <= rt,ra <= v15), mm=00 |
-| n/a | Rounding Shift right logical | VSRL (v2 <= rt,ra,rb <= v15), mm=01 |
-| n/a | Rounding Shift right logical imm | VSRLI (v2 <= rt,ra <= v15), mm=01 |
-| n/a | Shift left logical | VSLL (v2 <= rt,ra,rb <= v15), mm=00 |
-| n/a | Shift left logical imm | VSLLI (v2 <= rt,ra <= v15), mm=00 |
-| n/a | Saturating Shift left logical | VSLL (v2 <= rt,ra,rb <= v15), mm=01 |
-| n/a | Saturating Shift left logical imm | VSLLI (v2 <= rt,ra <= v15), mm=01 |
-
-## 16-bit Comparison instructions
-
-| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent |
-| ------------------ | ------------------------- | ------------------- |
-| CMPEQ16 rt, ra, rb | Compare equal | VSEQ (v16 <= rt,ra,rb <= v29), mm=00 |
-| SCMPLT16 rt, ra, rb | Signed Compare less than | !VSGT (v16 <= rt,ra,rb <= v23), mm=00 |
-| SCMPLE16 rt, ra, rb | Signed Compare less or equal | VSLE (v16 <= rt,ra,rb <= v23), mm=00 |
-| UCMPLT16 rt, ra, rb | Unsigned Compare less than | !VSGT (v24 <= rt,ra,rb <= v29), mm=00 |
-| UCMPLE16 rt, ra, rb | Unsigned Compare less or equal | VSLE (v24 <= rt,ra,rb <= v29), mm=00 |
-
-## 8-bit Comparison instructions
-
-| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
-| ------------------ | ------------------------- | ------------------- |
-| CMPEQ8 rt, ra, rb | Compare equal | VSEQ (v2 <= rt,ra,rb <= v15), mm=00 |
-| SCMPLT8 rt, ra, rb | Signed Compare less than | !VSGT (v2 <= rt,ra,rb <= v7), mm=00 |
-| SCMPLE8 rt, ra, rb | Signed Compare less or equal | VSLE (v2 <= rt,ra,rb <= v7), mm=00 |
-| UCMPLT8 rt, ra, rb | Unsigned Compare less than | !VSGT (v8 <= rt,ra,rb <= v15), mm=00 |
-| UCMPLE8 rt, ra, rb | Unsigned Compare less or equal | VSLE (v8 <= rt,ra,rb <= v15), mm=00 |
-
-## 16-bit Miscellaneous instructions
-
-| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent |
-| ------------------ | ------------------------ | ------------------- |
-| SMIN16 rt, ra, rb | Signed minimum | VMIN (v16 <= rt,ra,rb <= v23), mm=00 |
-| UMIN16 rt, ra, rb | Unsigned minimum | VMIN (v24 <= rt,ra,rb <= v29), mm=00 |
-| SMAX16 rt, ra, rb | Signed maximum | VMAX (v16 <= rt,ra,rb <= v23), mm=00 |
-| UMAX16 rt, ra, rb | Unsigned maximum | VMAX (v24 <= rt,ra,rb <= v29), mm=00 |
-| SCLIP16 rt, ra, im | Signed clip | ?VCLIP (v16 <= rt,ra,rb <= v23), mm=01 |
-| UCLIP16 rt, ra, im | Unsigned clip | ?VCLIP (v24 <= rt,ra,rb <= v29), mm=01 |
-| KMUL16 rt, ra, rb | Signed multiply 16x16->16 | VMUL (v16 <= rt,ra,rb <= v23), mm=01 |
-| KMULX16 rt, ra, rb | Signed crossed multiply 16x16->16 | |
-| SMUL16 rt, ra, rb | Signed multiply 16x16->32 | VMUL (v30 <= rt <= v31, v16 <= ra,rb <= v23), mm=00 |
-| SMULX16 rt, ra, rb | Signed crossed multiply 16x16->32 | |
-| UMUL16 rt, ra, rb | Unsigned multiply 16x16->32 | VMUL (v30 <= rt <= v31, v24 <= ra,rb <= v29), mm=00 |
-| UMULX16 rt, ra, rb | Unsigned crossed multiply 16x16->32 | |
-| KABS16 rt, ra | Saturated absolute value | VSGNX (v16 <= rt <= v29, v16 <= ra,rb <= v23), mm=01 |
-
-## 8-bit Miscellaneous instructions
-
-| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
-| ------------------ | ------------------------- | ------------------- |
-| SMIN8 rt, ra, rb | Signed minimum | VMIN (v2 <= rt,ra,rb <= v7), mm=00 |
-| UMIN8 rt, ra, rb | Unsigned minimum | VMIN (v8 <= rt,ra,rb <= v15), mm=00 |
-| SMAX8 rt, ra, rb | Signed maximum | VMAX (v2 <= rt,ra,rb <= v7), mm=00 |
-| UMAX8 rt, ra, rb | Unsigned maximum | VMAX (v8 <= rt,ra,rb <= v15), mm=00 |
-| KABS8 rt, ra | Saturated absolute value | VSGNX (v2 <= rt <= v15, v2 <= ra,rb <= v7), mm=01 |
-
-## 8-bit Unpacking instructions
-
-| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
-| ------------------ | ------------------------- | ------------------- |
-| SUNPKD810 rt, ra | Signed unpack bytes 1 & 0 | VMV (v16 <= rt <= v23, v2 <= ra <= v7), mm=00 |
-| SUNPKD820 rt, ra | Signed unpack bytes 2 & 0 | |
-| SUNPKD830 rt, ra | Signed unpack bytes 3 & 0 | |
-| SUNPKD831 rt, ra | Signed unpack bytes 3 & 1 | |
-| ZUNPKD810 rt, ra | Unsigned unpack bytes 1 & 0 | VMV (v24 <= rt <= v29, v8 <= ra <= v15), mm=00 |
-| ZUNPKD820 rt, ra | Unsigned unpack bytes 2 & 0 | |
-| ZUNPKD830 rt, ra | Unsigned unpack bytes 3 & 0 | |
-| ZUNPKD831 rt, ra | Unsigned unpack bytes 3 & 1 | |
diff --git a/harmonised_rvv_rvp.mdwn b/harmonised_rvv_rvp.mdwn
new file mode 100644
index 000000000..fa431f1b8
--- /dev/null
+++ b/harmonised_rvv_rvp.mdwn
@@ -0,0 +1,49 @@
+# Proposal to harmonise RV Vector spec with Andes Packed SIMD ("Harmonised" RVP)
+
+[[Comparative analysis Harmonised RVP vs Andes Packed SIMD ISA proposal]]
+
+##### MVL, setvl instruction & VL CSR work as per RV Vector spec.
+
+##### VLD and VST are supported
+
+RVP implementations may choose to load/store to/from the integer register file (rather than from a dedicated vector register file).
+
+* Thus, RVP implementations have a choice of providing a dedicated vector register file or sharing the integer register file, but not both simultaneously. (Supporting both would need a CSR mode switch bit.)
+* Mapping of v0-31 <-> r0-31 **is fixed** at 1:1. (An exception may be made to map v1 to r5, as mapping v1 to r1 may otherwise clash with procedure linkage.)
+* VLD and VST in this case will behave similarly to LW/LD and SW/SD respectively, but only operate on up to VL elements (see point #4 below).
+* If the integer register file is used for vector operations, any callee-saved registers (r2-4, 8-9, 18-27) must be saved with RVI SW or SD instructions before being used as vector registers (this register saving behaviour is harmless but redundant when RVP code is run on a machine with a dedicated vector register file).
+
+##### VLDX, VSTX, VLDS, VSTS are not supported in hardware
+To keep RVP implementations simple, these instructions will trap, and may instead be emulated in software.
+
+##### Default register "banks" and types
+
+In the absence of an explicit VCFG setup, the vector registers (when shared with the integer register file) default to two “banks” as follows:
+
+* v0-v15: vectors with INT8 elements, split into 8 x signed (v0-v7) & 8 x unsigned (v8-v15)
+* v16-v29: vectors with INT16 elements, split into 8 x signed (v16-v23) & 6 x unsigned (v24-v29)
+
+Having the above default vector type configuration harmonises most of the Andes SIMD instruction set (which explicitly encodes INT8 vs INT16 vector types as separate instructions). The main change from the Andes SIMD proposal is that instructions are restricted to 14 registers of each vector element type (with element size explicitly encoded in the most significant bit of the 5-bit register specifier fields).
+
+Notes:
+
+* To preserve forward RVV compatibility, programmers should still explicitly set up VDCFG to the above default vector types.
+* Essentially the same register allocation algorithm used for RVV can be used for RVP, except that the algorithm should preferentially use temporary registers before saved registers.
+* v30-v31 are reserved for 32-bit operations (see Section 2.3 of this document), and hence are not part of the register bank of INT16 vectors.
+* v0 is mapped to r0 (hardwired to zero), and v1 is used for predicate masks. However, both can be considered INT8 vectors.
+
+##### Default MVL
+
+The default RVV MVL value (in the absence of an explicit VCFG setup) is MVL = 2 on RV32I machines and MVL = 4 on RV64I machines.
+However, note that RV32I registers can fit 4x INT8 elements. To preserve Andes SIMD behaviour, all VOP instructions should still operate on all “unused” elements in the register, regardless of MVL. (This is still compliant with the RVV spec, provided elements from VL..MVL-1 are set to zero.) VMEM instructions, however, will only operate on VL elements, so where full Andes SIMD compliance is required (without RVV forward compatibility), LW/LD and SW/SD are to be used instead of VLD and VST.
+
+##### Alternative register "banks" and alternative MVL
+
+A programmer can configure VCFG with any mix of these alternative configurations:
+
+* v0-v31 are all INT16, and MVL is the same as for Default MVL above
+* v0-v31 are all INT8, and MVL is 4 on RV32I and 8 on RV64I
+* A lesser number of registers ( register operations:
+
+| 31 30 29 28 27 26 | 25 | 24 23 22 21 20 | 19 18 17 16 15 | 14 | 13 12 | 11 10 9 8 7 | 6 5 4 3 2 1 0 |
+| ----------------- | -- | -------------- | -------------- | -- | ----- | ----------- | ------------- |
+| func_6 | 0 | rs2 | rs1 | 0 | mm | rd1 | VOP opcode |
+
+Immediate + register -> register operations:
+
+| 31 30 29 | 28 27 26 | 25 | 24 23 22 21 20 | 19 18 17 16 15 | 14 | 13 12 | 11 10 9 8 7 | 6 5 4 3 2 1 0 |
+| -------- | -------- | -- | -------------- | -------------- | -- | ----- | ----------- | ------------- |
+| func_3 | imm[7:5] | 1 | imm[4:0] | rs1 | 0 | mm | rd1 | VOP opcode |
+
+Register x 3 -> register operations:
+
+| 31 30 29 28 27 | 26 25 | 24 23 22 21 20 | 19 18 17 16 15 | 14 | 13 12 | 11 10 9 8 7 | 6 5 4 3 2 1 0 |
+| -------------- | ----- | -------------- | -------------- | -- | ----- | ----------- | ------------- |
+| rs3 | func_2 | rs2 | rs1 | 1 | mm | rd1 | VOP opcode |
+
+Values for mm field (bits 13:12 above):
+
+* mm = 00 -> no predicate mask, and use current global saturation / rounding settings
+* mm = 01 -> no predicate mask, and force saturation or rounding for this instruction only
+* mm = 10 -> use v1 as predicate mask, and use global saturation / rounding settings
+* mm = 11 -> use ~v1 as predicate mask, and use global saturation / rounding settings
+
+## 16-bit Arithmetic
+
+| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent |
+| ------------------ | ------------------------- | ------------------- |
+| ADD16 rt, ra, rb | Add | VADD (v16 <= rt,ra,rb <= v29), mm=00 |
+| RADD16 rt, ra, rb | Signed Halving add | RADD (v16 <= rt,ra,rb <= v23), mm=00 |
+| URADD16 rt, ra, rb | Unsigned Halving add | RADD (v24 <= rt,ra,rb <= v29), mm=00 |
+| KADD16 rt, ra, rb | Signed Saturating add | VADD (v16 <= rt,ra,rb <= v23), mm=01 |
+| UKADD16 rt, ra, rb | Unsigned Saturating add | VADD (v24 <= rt,ra,rb <= v29), mm=01 |
+| SUB16 rt, ra, rb | Subtract | VSUB (v16 <= rt,ra,rb <= v29), mm=00 |
+| RSUB16 rt, ra, rb | Signed Halving sub | RSUB (v16 <= rt,ra,rb <= v23), mm=00 |
+| URSUB16 rt, ra, rb | Unsigned Halving sub | RSUB (v24 <= rt,ra,rb <= v29), mm=00 |
+| KSUB16 rt, ra, rb | Signed Saturating sub | VSUB (v16 <= rt,ra,rb <= v23), mm=01 |
+| UKSUB16 rt, ra, rb | Unsigned Saturating sub | VSUB (v24 <= rt,ra,rb <= v29), mm=01 |
+| CRAS16 rt, ra, rb | Cross Add & Sub | |
+| RCRAS16 rt, ra, rb | Signed Halving Cross Add & Sub | |
+| URCRAS16 rt, ra, rb | Unsigned Halving Cross Add & Sub | |
+| KCRAS16 rt, ra, rb | Signed Saturating Cross Add & Sub | |
+| UKCRAS16 rt, ra, rb | Unsigned Saturating Cross Add & Sub | |
+| CRSA16 rt, ra, rb | Cross Sub & Add | |
+| RCRSA16 rt, ra, rb | Signed Halving Cross Sub & Add | |
+| URCRSA16 rt, ra, rb | Unsigned Halving Cross Sub & Add | |
+| KCRSA16 rt, ra, rb | Signed Saturating Cross Sub & Add | |
+| UKCRSA16 rt, ra, rb | Unsigned Saturating Cross Sub & Add | |
+
+## 8-bit Arithmetic
+
+| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
+| ------------------ | ------------------------- | ------------------- |
+| ADD8 rt, ra, rb | Add | VADD (v2 <= rt,ra,rb <= v15), mm=00 |
+| RADD8 rt, ra, rb | Signed Halving add | RADD (v2 <= rt,ra,rb <= v7), mm=00 |
+| URADD8 rt, ra, rb | Unsigned Halving add | RADD (v8 <= rt,ra,rb <= v15), mm=00 |
+| KADD8 rt, ra, rb | Signed Saturating add | VADD (v2 <= rt,ra,rb <= v7), mm=01 |
+| UKADD8 rt, ra, rb | Unsigned Saturating add | VADD (v8 <= rt,ra,rb <= v15), mm=01 |
+| SUB8 rt, ra, rb | Subtract | VSUB (v2 <= rt,ra,rb <= v15), mm=00 |
+| RSUB8 rt, ra, rb | Signed Halving sub | RSUB (v2 <= rt,ra,rb <= v7), mm=00 |
+| URSUB8 rt, ra, rb | Unsigned Halving sub | RSUB (v8 <= rt,ra,rb <= v15), mm=00 |
+| KSUB8 rt, ra, rb | Signed Saturating sub | VSUB (v2 <= rt,ra,rb <= v7), mm=01 |
+| UKSUB8 rt, ra, rb | Unsigned Saturating sub | VSUB (v8 <= rt,ra,rb <= v15), mm=01 |
+
+## 16-bit Shifts
+
+SRA[I]16/SRL[I]16/SLL[I]16 are to be mapped to VOP shift instructions in the same manner as ADD16/SUB16.
+
+The “K” (saturation) and “.u” (rounding) variants could be encoded using VOP’s mm field (mm=01 is a saturated or rounded shift, mm=00 is a standard VOP shift).
+
+| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent |
+| ------------------ | ------------------------- | ------------------- |
+| SRA16 rt, ra, rb | Shift right arithmetic | VSRA (v16 <= rt,ra,rb <= v29), mm=00 |
+| SRAI16 rt, ra, im | Shift right arithmetic imm | VSRAI (v16 <= rt,ra <= v29), mm=00 |
+| SRA16.u rt, ra, rb | Rounding Shift right arithmetic | VSRA (v16 <= rt,ra,rb <= v29), mm=01 |
+| SRAI16.u rt, ra, im | Rounding Shift right arithmetic imm | VSRAI (v16 <= rt,ra <= v29), mm=01 |
+| SRL16 rt, ra, rb | Shift right logical | VSRL (v16 <= rt,ra,rb <= v29), mm=00 |
+| SRLI16 rt, ra, im | Shift right logical imm | VSRLI (v16 <= rt,ra <= v29), mm=00 |
+| SRL16.u rt, ra, rb | Rounding Shift right logical | VSRL (v16 <= rt,ra,rb <= v29), mm=01 |
+| SRLI16.u rt, ra, im | Rounding Shift right logical imm | VSRLI (v16 <= rt,ra <= v29), mm=01 |
+| SLL16 rt, ra, rb | Shift left logical | VSLL (v16 <= rt,ra,rb <= v29), mm=00 |
+| SLLI16 rt, ra, im | Shift left logical imm | VSLLI (v16 <= rt,ra <= v29), mm=00 |
+| KSLL16 rt, ra, rb | Saturating Shift left logical | VSLL (v16 <= rt,ra,rb <= v29), mm=01 |
+| KSLLI16 rt, ra, im | Saturating Shift left logical imm | VSLLI (v16 <= rt,ra <= v29), mm=01 |
+| KSLRA16 rt, ra, rb | Saturating Shift left logical or Shift right arithmetic | |
+| KSLRA16.u rt, ra, rb | Saturating Shift left logical or Rounding Shift right arithmetic | |
+
+## 8-bit Shifts
+
+Andes SIMD Packed ISA omits 8-bit shifts, but these can be encoded in Harmonised RVP as follows:
+
+| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
+| ------------------ | ------------------------- | ------------------- |
+| n/a | Shift right arithmetic | VSRA (v2 <= rt,ra,rb <= v15), mm=00 |
+| n/a | Shift right arithmetic imm | VSRAI (v2 <= rt,ra <= v15), mm=00 |
+| n/a | Rounding Shift right arithmetic | VSRA (v2 <= rt,ra,rb <= v15), mm=01 |
+| n/a | Rounding Shift right arithmetic imm | VSRAI (v2 <= rt,ra <= v15), mm=01 |
+| n/a | Shift right logical | VSRL (v2 <= rt,ra,rb <= v15), mm=00 |
+| n/a | Shift right logical imm | VSRLI (v2 <= rt,ra <= v15), mm=00 |
+| n/a | Rounding Shift right logical | VSRL (v2 <= rt,ra,rb <= v15), mm=01 |
+| n/a | Rounding Shift right logical imm | VSRLI (v2 <= rt,ra <= v15), mm=01 |
+| n/a | Shift left logical | VSLL (v2 <= rt,ra,rb <= v15), mm=00 |
+| n/a | Shift left logical imm | VSLLI (v2 <= rt,ra <= v15), mm=00 |
+| n/a | Saturating Shift left logical | VSLL (v2 <= rt,ra,rb <= v15), mm=01 |
+| n/a | Saturating Shift left logical imm | VSLLI (v2 <= rt,ra <= v15), mm=01 |
+
+## 16-bit Comparison instructions
+
+| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent |
+| ------------------ | ------------------------- | ------------------- |
+| CMPEQ16 rt, ra, rb | Compare equal | VSEQ (v16 <= rt,ra,rb <= v29), mm=00 |
+| SCMPLT16 rt, ra, rb | Signed Compare less than | !VSGT (v16 <= rt,ra,rb <= v23), mm=00 |
+| SCMPLE16 rt, ra, rb | Signed Compare less or equal | VSLE (v16 <= rt,ra,rb <= v23), mm=00 |
+| UCMPLT16 rt, ra, rb | Unsigned Compare less than | !VSGT (v24 <= rt,ra,rb <= v29), mm=00 |
+| UCMPLE16 rt, ra, rb | Unsigned Compare less or equal | VSLE (v24 <= rt,ra,rb <= v29), mm=00 |
+
+## 8-bit Comparison instructions
+
+| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
+| ------------------ | ------------------------- | ------------------- |
+| CMPEQ8 rt, ra, rb | Compare equal | VSEQ (v2 <= rt,ra,rb <= v15), mm=00 |
+| SCMPLT8 rt, ra, rb | Signed Compare less than | !VSGT (v2 <= rt,ra,rb <= v7), mm=00 |
+| SCMPLE8 rt, ra, rb | Signed Compare less or equal | VSLE (v2 <= rt,ra,rb <= v7), mm=00 |
+| UCMPLT8 rt, ra, rb | Unsigned Compare less than | !VSGT (v8 <= rt,ra,rb <= v15), mm=00 |
+| UCMPLE8 rt, ra, rb | Unsigned Compare less or equal | VSLE (v8 <= rt,ra,rb <= v15), mm=00 |
+
+## 16-bit Miscellaneous instructions
+
+| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent |
+| ------------------ | ------------------------ | ------------------- |
+| SMIN16 rt, ra, rb | Signed minimum | VMIN (v16 <= rt,ra,rb <= v23), mm=00 |
+| UMIN16 rt, ra, rb | Unsigned minimum | VMIN (v24 <= rt,ra,rb <= v29), mm=00 |
+| SMAX16 rt, ra, rb | Signed maximum | VMAX (v16 <= rt,ra,rb <= v23), mm=00 |
+| UMAX16 rt, ra, rb | Unsigned maximum | VMAX (v24 <= rt,ra,rb <= v29), mm=00 |
+| SCLIP16 rt, ra, im | Signed clip | ?VCLIP (v16 <= rt,ra,rb <= v23), mm=01 |
+| UCLIP16 rt, ra, im | Unsigned clip | ?VCLIP (v24 <= rt,ra,rb <= v29), mm=01 |
+| KMUL16 rt, ra, rb | Signed multiply 16x16->16 | VMUL (v16 <= rt,ra,rb <= v23), mm=01 |
+| KMULX16 rt, ra, rb | Signed crossed multiply 16x16->16 | |
+| SMUL16 rt, ra, rb | Signed multiply 16x16->32 | VMUL (v30 <= rt <= v31, v16 <= ra,rb <= v23), mm=00 |
+| SMULX16 rt, ra, rb | Signed crossed multiply 16x16->32 | |
+| UMUL16 rt, ra, rb | Unsigned multiply 16x16->32 | VMUL (v30 <= rt <= v31, v24 <= ra,rb <= v29), mm=00 |
+| UMULX16 rt, ra, rb | Unsigned crossed multiply 16x16->32 | |
+| KABS16 rt, ra | Saturated absolute value | VSGNX (v16 <= rt <= v29, v16 <= ra,rb <= v23), mm=01 |
+
+## 8-bit Miscellaneous instructions
+
+| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
+| ------------------ | ------------------------- | ------------------- |
+| SMIN8 rt, ra, rb | Signed minimum | VMIN (v2 <= rt,ra,rb <= v7), mm=00 |
+| UMIN8 rt, ra, rb | Unsigned minimum | VMIN (v8 <= rt,ra,rb <= v15), mm=00 |
+| SMAX8 rt, ra, rb | Signed maximum | VMAX (v2 <= rt,ra,rb <= v7), mm=00 |
+| UMAX8 rt, ra, rb | Unsigned maximum | VMAX (v8 <= rt,ra,rb <= v15), mm=00 |
+| KABS8 rt, ra | Saturated absolute value | VSGNX (v2 <= rt <= v15, v2 <= ra,rb <= v7), mm=01 |
+
+## 8-bit Unpacking instructions
+
+| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
+| ------------------ | ------------------------- | ------------------- |
+| SUNPKD810 rt, ra | Signed unpack bytes 1 & 0 | VMV (v16 <= rt <= v23, v2 <= ra <= v7), mm=00 |
+| SUNPKD820 rt, ra | Signed unpack bytes 2 & 0 | |
+| SUNPKD830 rt, ra | Signed unpack bytes 3 & 0 | |
+| SUNPKD831 rt, ra | Signed unpack bytes 3 & 1 | |
+| ZUNPKD810 rt, ra | Unsigned unpack bytes 1 & 0 | VMV (v24 <= rt <= v29, v8 <= ra <= v15), mm=00 |
+| ZUNPKD820 rt, ra | Unsigned unpack bytes 2 & 0 | |
+| ZUNPKD830 rt, ra | Unsigned unpack bytes 3 & 0 | |
+| ZUNPKD831 rt, ra | Unsigned unpack bytes 3 & 1 | |
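The three VOP instruction formats described in the page (register + register, immediate + register, and register x 3) can be checked mechanically. The following Python sketch packs each field at the bit positions given in those tables. It is a sketch under stated assumptions: the proposal does not assign numeric values to the VOP major opcode or to `func_6`, `func_3` or `func_2`, so the caller must supply them, and the parameter `rd` stands for the tables' `rd1`.

```python
def _field(value: int, bits: int, name: str) -> int:
    """Range-check an unsigned field before it is shifted into place."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return value


def encode_vop_rr(func6: int, rs2: int, rs1: int, mm: int, rd: int, opcode: int) -> int:
    """Register + register -> register: bit 25 = 0 and bit 14 = 0."""
    return (_field(func6, 6, "func6") << 26
            | _field(rs2, 5, "rs2") << 20
            | _field(rs1, 5, "rs1") << 15
            | _field(mm, 2, "mm") << 12
            | _field(rd, 5, "rd") << 7
            | _field(opcode, 7, "opcode"))


def encode_vop_ri(func3: int, imm8: int, rs1: int, mm: int, rd: int, opcode: int) -> int:
    """Immediate + register -> register: imm[7:5] in bits 28:26, bit 25 = 1,
    imm[4:0] in bits 24:20, bit 14 = 0."""
    _field(imm8, 8, "imm8")
    return (_field(func3, 3, "func3") << 29
            | (imm8 >> 5) << 26
            | 1 << 25
            | (imm8 & 0x1F) << 20
            | _field(rs1, 5, "rs1") << 15
            | _field(mm, 2, "mm") << 12
            | _field(rd, 5, "rd") << 7
            | _field(opcode, 7, "opcode"))


def encode_vop_r3(rs3: int, func2: int, rs2: int, rs1: int, mm: int, rd: int, opcode: int) -> int:
    """Register x 3 -> register: rs3 in bits 31:27, func_2 in bits 26:25, bit 14 = 1."""
    return (_field(rs3, 5, "rs3") << 27
            | _field(func2, 2, "func2") << 25
            | _field(rs2, 5, "rs2") << 20
            | _field(rs1, 5, "rs1") << 15
            | 1 << 14
            | _field(mm, 2, "mm") << 12
            | _field(rd, 5, "rd") << 7
            | _field(opcode, 7, "opcode"))
```

For example, KADD16 v16, v17, v18 would use the register + register form with `rd=16`, `rs1=17`, `rs2=18` and `mm=0b01`, combined with whatever `func_6` and opcode values are eventually assigned to VADD. Bits 25 and 14 act as format selectors, which is why the two-operand forms leave them as fixed constants.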
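The four `mm` values listed for the VOP formats can be summarised as a small decoder. The second entry is read here as `mm = 01`, which is how every saturating and rounding row in the mapping tables uses it. This is an illustrative sketch only: the names `MmControl`, `mask_reg`, `invert_mask` and `force_sat_round` are hypothetical and do not come from the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MmControl:
    mask_reg: Optional[int]  # vector register holding the predicate, or None if unmasked
    invert_mask: bool        # True when ~v1 (rather than v1) is the predicate
    force_sat_round: bool    # True when this instruction forces saturation/rounding


def decode_mm(mm: int) -> MmControl:
    """Decode the 2-bit mm field (instruction bits 13:12)."""
    if mm == 0b00:
        return MmControl(None, False, False)  # no mask, global sat/round settings
    if mm == 0b01:
        return MmControl(None, False, True)   # no mask, force sat/round for this op
    if mm == 0b10:
        return MmControl(1, False, False)     # v1 as predicate, global settings
    if mm == 0b11:
        return MmControl(1, True, False)      # ~v1 as predicate, global settings
    raise ValueError(f"mm must be a 2-bit value, got {mm}")
```

One consequence that the decoder makes visible: a predicated instruction (`mm = 10` or `11`) cannot also force saturation or rounding, so a masked KADD16 would have to rely on the global saturation setting.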
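The default bank layout (v0-v15 INT8, v16-v29 INT16, v30-v31 reserved for 32-bit operations) can be decoded from the top two bits of the 5-bit register number. Bit 4 selects the element width, as the page states, and, reading the listed ranges, bit 3 happens to select signed (0) versus unsigned (1). The sketch below encodes that reading. It also computes the default MVL (XLEN/16) alongside the number of elements a VOP instruction actually touches (XLEN divided by the element width). The function names are hypothetical.

```python
from typing import Optional, Tuple


def default_bank(vreg: int) -> Optional[Tuple[int, bool]]:
    """Return (element_width_bits, is_signed) for vreg under the default
    register banks, or None for v30-v31 (reserved for 32-bit operations)."""
    if not 0 <= vreg <= 31:
        raise ValueError(f"v{vreg} is not a vector register")
    if vreg >= 30:
        return None
    width = 16 if vreg & 0b10000 else 8   # bit 4: INT8 vs INT16
    is_signed = not (vreg & 0b01000)      # bit 3: signed vs unsigned
    return width, is_signed


def default_mvl(xlen: int) -> int:
    """Default MVL: 2 on RV32I and 4 on RV64I, i.e. XLEN / 16."""
    if xlen not in (32, 64):
        raise ValueError("expected XLEN of 32 or 64")
    return xlen // 16


def vop_lanes(xlen: int, width: int) -> int:
    """Elements a VOP instruction operates on in one register, regardless of MVL."""
    return xlen // width
```

For instance, `vop_lanes(32, 8)` is 4 while `default_mvl(32)` is 2. That is the mismatch the Default MVL section resolves by having VOP instructions process all four INT8 elements anyway.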
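The 16-bit arithmetic mapping table can be made concrete with a lane model. The table maps ADD16 to VADD with mm=00, KADD16/UKADD16 to VADD with mm=01 in the signed or unsigned bank, and RADD16/URADD16 to RADD. The Python sketch below models 16-bit lanes packed into an XLEN-bit register. It assumes that lane 0 occupies the least significant bits and that halving adds round toward negative infinity, i.e. compute `(a + b) >> 1` on the widened sum. Check both assumptions against the Andes specification before relying on them. The function names are hypothetical.

```python
from typing import Callable, List


def unpack(reg: int, width: int, xlen: int) -> List[int]:
    """Split an XLEN-bit register into unsigned lanes, lane 0 in the low bits."""
    mask = (1 << width) - 1
    return [(reg >> (i * width)) & mask for i in range(xlen // width)]


def pack(lanes: List[int], width: int) -> int:
    """Reassemble lanes; masking each value gives modular wraparound."""
    mask = (1 << width) - 1
    reg = 0
    for i, value in enumerate(lanes):
        reg |= (value & mask) << (i * width)
    return reg


def as_signed(value: int, width: int) -> int:
    """Reinterpret an unsigned lane value as two's complement."""
    return value - (1 << width) if value & (1 << (width - 1)) else value


def lanewise(a: int, b: int, width: int, xlen: int, signed: bool,
             op: Callable[[int, int], int]) -> int:
    xs, ys = unpack(a, width, xlen), unpack(b, width, xlen)
    if signed:
        xs = [as_signed(x, width) for x in xs]
        ys = [as_signed(y, width) for y in ys]
    return pack([op(x, y) for x, y in zip(xs, ys)], width)


def vadd(a: int, b: int, *, signed: bool, saturate: bool,
         width: int = 16, xlen: int = 32) -> int:
    """VADD: mm=00 wraps (ADD16); mm=01 saturates (KADD16 / UKADD16)."""
    lo, hi = ((-(1 << (width - 1)), (1 << (width - 1)) - 1) if signed
              else (0, (1 << width) - 1))

    def op(x: int, y: int) -> int:
        s = x + y
        return max(lo, min(hi, s)) if saturate else s

    return lanewise(a, b, width, xlen, signed, op)


def radd(a: int, b: int, *, signed: bool, width: int = 16, xlen: int = 32) -> int:
    """RADD: halving add (RADD16 / URADD16); Python's >> floors negative values."""
    return lanewise(a, b, width, xlen, signed, lambda x, y: (x + y) >> 1)
```

With `a = 0x7FFF0001` and `b = 0x00010001`, the wrapping add gives `0x80000002` while the signed saturating add clamps the upper lane and gives `0x7FFF0002`. Because a single `signed` flag selects the bank, the same two functions cover both the signed (v16-v23) and unsigned (v24-v29) rows of the table.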