# Comparative analysis with Andes Packed ISA proposal

## Register file

The harmonised RVP register file is divided into a lower bank of Vector[INT8] registers (v2 to v15) and an upper bank of Vector[INT16] registers (v16 to v29).

| Register | Andes ISA | Harmonised RVP ISA |
| ------------------ | ------------------------- | ------------------- |
| v0 | Hardwired zero | Hardwired zero |
| v1 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | Predicate mask |
| | | |
| v2 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[4xSINT8] |
| v3 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[4xSINT8] |
| v4 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[4xSINT8] |
| v5 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[4xSINT8] |
| v6 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[4xSINT8] |
| v7 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[4xSINT8] |
| v8 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[4xUINT8] |
| v9 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[4xUINT8] |
| v10 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[4xUINT8] |
| v11 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[4xUINT8] |
| v12 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[4xUINT8] |
| v13 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[4xUINT8] |
| v14 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[4xUINT8] |
| v15 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[4xUINT8] |
| | | |
| v16 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[2xSINT16] |
| v17 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[2xSINT16] |
| v18 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[2xSINT16] |
| v19 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[2xSINT16] |
| v20 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[2xSINT16] |
| v21 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[2xSINT16] |
| v22 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[2xSINT16] |
| v23 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[2xSINT16] |
| v24 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[2xUINT16] |
| v25 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[2xUINT16] |
| v26 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[2xUINT16] |
| v27 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[2xUINT16] |
| v28 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[2xUINT16] |
| v29 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[2xUINT16] |
| | | |
| v30 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[1xSINT32] |
| v31 | 32-bit GPR or Vector[4xINT8 or 2xINT16] | 32-bit GPR or Vector[1xSINT32] |
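
The register-to-type assignment in the table can be expressed as a small lookup. The following is only an illustrative sketch in Python: the function name `rvp_element_type` and the returned strings are hypothetical and not part of either proposal. It encodes nothing beyond the Harmonised RVP column above.

```python
def rvp_element_type(reg: int) -> str:
    """Return the element layout that the register file table assigns to v<reg>.

    Hypothetical helper for illustration only; the bank boundaries are
    copied from the Harmonised RVP column of the table.
    """
    if not 0 <= reg <= 31:
        raise ValueError("register index must be in 0..31")
    if reg == 0:
        return "zero"        # hardwired zero
    if reg == 1:
        return "predicate"   # predicate mask
    if reg <= 7:
        return "4xSINT8"     # v2..v7
    if reg <= 15:
        return "4xUINT8"     # v8..v15
    if reg <= 23:
        return "2xSINT16"    # v16..v23
    if reg <= 29:
        return "2xUINT16"    # v24..v29
    return "1xSINT32"        # v30..v31
```

Under this reading, the signedness and element width of an operation follow from the register numbers alone, which is why the tables below give register ranges rather than separate signed and unsigned mnemonics.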


## 16-bit Arithmetic

| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------- | ------------------- |
| ADD16 rt, ra, rb | Add | VADD (v16 <= rt,ra,rb <= v29), mm=00 |
| RADD16 rt, ra, rb | Signed Halving add | RADD (v16 <= rt,ra,rb <= v23), mm=00 |
| URADD16 rt, ra, rb | Unsigned Halving add | RADD (v24 <= rt,ra,rb <= v29), mm=00 |
| KADD16 rt, ra, rb | Signed Saturating add | VADD (v16 <= rt,ra,rb <= v23), mm=01 |
| UKADD16 rt, ra, rb | Unsigned Saturating add | VADD (v24 <= rt,ra,rb <= v29), mm=01 |
| SUB16 rt, ra, rb | Subtract | VSUB (v16 <= rt,ra,rb <= v29), mm=00 |
| RSUB16 rt, ra, rb | Signed Halving sub | RSUB (v16 <= rt,ra,rb <= v23), mm=00 |
| URSUB16 rt, ra, rb | Unsigned Halving sub | RSUB (v24 <= rt,ra,rb <= v29), mm=00 |
| KSUB16 rt, ra, rb | Signed Saturating sub | VSUB (v16 <= rt,ra,rb <= v23), mm=01 |
| UKSUB16 rt, ra, rb | Unsigned Saturating sub | VSUB (v24 <= rt,ra,rb <= v29), mm=01 |
| CRAS16 rt, ra, rb | Cross Add & Sub | |
| RCRAS16 rt, ra, rb | Signed Halving Cross Add & Sub | |
| URCRAS16 rt, ra, rb | Unsigned Halving Cross Add & Sub | |
| KCRAS16 rt, ra, rb | Signed Saturating Cross Add & Sub | |
| UKCRAS16 rt, ra, rb | Unsigned Saturating Cross Add & Sub | |
| CRSA16 rt, ra, rb | Cross Sub & Add | |
| RCRSA16 rt, ra, rb | Signed Halving Cross Sub & Add | |
| URCRSA16 rt, ra, rb | Unsigned Halving Cross Sub & Add | |
| KCRSA16 rt, ra, rb | Signed Saturating Cross Sub & Add | |
| UKCRSA16 rt, ra, rb | Unsigned Saturating Cross Sub & Add | |
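
To make the difference between the plain, halving and saturating adds concrete, here is a minimal Python sketch of a packed 2x16-bit add on a 32-bit word. The names `lanes16`, `pack16` and `add16` are hypothetical. The `signed` flag stands in for the choice of register bank and the `mode` string stands in for the mnemonic and mm selection; the per-lane semantics are an assumption based on the instruction names in the table, not a statement of either specification.

```python
MASK16 = 0xFFFF

def lanes16(word: int, signed: bool) -> list:
    """Split a 32-bit word into two 16-bit lanes, low lane first."""
    out = []
    for shift in (0, 16):
        v = (word >> shift) & MASK16
        if signed and v & 0x8000:
            v -= 0x10000  # reinterpret as two's complement
        out.append(v)
    return out

def pack16(lanes: list) -> int:
    """Pack two lane values back into a 32-bit word, truncating each to 16 bits."""
    return (lanes[0] & MASK16) | ((lanes[1] & MASK16) << 16)

def add16(a: int, b: int, signed: bool, mode: str = "wrap") -> int:
    """Lane-wise add: mode is "wrap", "halving" or "saturate" (assumed semantics)."""
    lo, hi = (-0x8000, 0x7FFF) if signed else (0, 0xFFFF)
    result = []
    for x, y in zip(lanes16(a, signed), lanes16(b, signed)):
        s = x + y
        if mode == "halving":
            s >>= 1                   # keep the carry, drop the low bit
        elif mode == "saturate":
            s = max(lo, min(hi, s))   # clamp to the lane's range
        result.append(s)              # "wrap" relies on pack16 truncation
    return pack16(result)
```

For example, adding `0x7FFF0001` and `0x00010001` as signed lanes gives `0x80000002` when wrapping, `0x7FFF0002` when saturating and `0x40000001` when halving.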

## 8-bit Arithmetic

| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------- | ------------------- |
| ADD8 rt, ra, rb | Add | VADD (v2 <= rt,ra,rb <= v15), mm=00 |
| RADD8 rt, ra, rb | Signed Halving add | RADD (v2 <= rt,ra,rb <= v7), mm=00 |
| URADD8 rt, ra, rb | Unsigned Halving add | RADD (v8 <= rt,ra,rb <= v15), mm=00 |
| KADD8 rt, ra, rb | Signed Saturating add | VADD (v2 <= rt,ra,rb <= v7), mm=01 |
| UKADD8 rt, ra, rb | Unsigned Saturating add | VADD (v8 <= rt,ra,rb <= v15), mm=01 |
| SUB8 rt, ra, rb | Subtract | VSUB (v2 <= rt,ra,rb <= v15), mm=00 |
| RSUB8 rt, ra, rb | Signed Halving sub | RSUB (v2 <= rt,ra,rb <= v7), mm=00 |
| URSUB8 rt, ra, rb | Unsigned Halving sub | RSUB (v8 <= rt,ra,rb <= v15), mm=00 |
| KSUB8 rt, ra, rb | Signed Saturating sub | VSUB (v2 <= rt,ra,rb <= v7), mm=01 |
| UKSUB8 rt, ra, rb | Unsigned Saturating sub | VSUB (v8 <= rt,ra,rb <= v15), mm=01 |

## 16-bit Shifts

SRA[I]16/SRL[I]16/SLL[I]16 are to be mapped to VOP shift instructions in the same manner as ADD16/SUB16.

The “K” (Saturation) and “u” (Rounding) variants could be encoded using VOP’s mm field: mm=01 selects a saturated or rounded shift, and mm=00 selects the standard VOP shift.

| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------- | ------------------- |
| SRA16 rt, ra, rb | Shift right arithmetic | VSRA (v16 <= rt,ra,rb <= v29), mm=00 |
| SRAI16 rt, ra, im | Shift right arithmetic imm | VSRAI (v16 <= rt,ra <= v29), mm=00 |
| SRA16.u rt, ra, rb | Rounding Shift right arithmetic | VSRA (v16 <= rt,ra,rb <= v29), mm=01 |
| SRAI16.u rt, ra, im | Rounding Shift right arithmetic imm | VSRAI (v16 <= rt,ra <= v29), mm=01 |
| SRL16 rt, ra, rb | Shift right logical | VSRL (v16 <= rt,ra,rb <= v29), mm=00 |
| SRLI16 rt, ra, im | Shift right logical imm | VSRLI (v16 <= rt,ra <= v29), mm=00 |
| SRL16.u rt, ra, rb | Rounding Shift right logical | VSRL (v16 <= rt,ra,rb <= v29), mm=01 |
| SRLI16.u rt, ra, im | Rounding Shift right logical imm | VSRLI (v16 <= rt,ra <= v29), mm=01 |
| SLL16 rt, ra, rb | Shift left logical | VSLL (v16 <= rt,ra,rb <= v29), mm=00 |
| SLLI16 rt, ra, im | Shift left logical imm | VSLLI (v16 <= rt,ra <= v29), mm=00 |
| KSLL16 rt, ra, rb | Saturating Shift left logical | VSLL (v16 <= rt,ra,rb <= v29), mm=01 |
| KSLLI16 rt, ra, im | Saturating Shift left logical imm | VSLLI (v16 <= rt,ra <= v29), mm=01 |
| KSLRA16 rt, ra, rb | Saturating Shift left logical or Shift right arithmetic | |
| KSLRA16.u rt, ra, rb | Saturating Shift left logical or Rounding Shift right arithmetic | |
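
As an illustration of what the mm=01 variants would compute per lane, the sketch below models a rounding arithmetic right shift and a saturating left shift on a single signed 16-bit lane value. The function names are hypothetical, and the rounding rule (add half of the discarded range before shifting) is an assumption about the intended ".u" behaviour rather than something this page specifies.

```python
INT16_MIN, INT16_MAX = -0x8000, 0x7FFF

def sra_round(x: int, sa: int) -> int:
    """Arithmetic shift right with rounding (assumed: add 2**(sa-1) first)."""
    if sa == 0:
        return x
    return (x + (1 << (sa - 1))) >> sa

def sll_sat16(x: int, sa: int) -> int:
    """Shift left, clamping the result to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, x << sa))
```

With these definitions `sra_round(-5, 1)` yields -2 where a plain `-5 >> 1` yields -3, and `sll_sat16(0x4000, 1)` yields `0x7FFF` instead of overflowing into the sign bit.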


## 8-bit Shifts

The Andes SIMD Packed ISA omits 8-bit shifts, but these can be encoded in Harmonised RVP as follows:

| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------- | ------------------- |
| n/a | Shift right arithmetic | VSRA (v2 <= rt,ra,rb <= v15), mm=00 |
| n/a | Shift right arithmetic imm | VSRAI (v2 <= rt,ra <= v15), mm=00 |
| n/a | Rounding Shift right arithmetic | VSRA (v2 <= rt,ra,rb <= v15), mm=01 |
| n/a | Rounding Shift right arithmetic imm | VSRAI (v2 <= rt,ra <= v15), mm=01 |
| n/a | Shift right logical | VSRL (v2 <= rt,ra,rb <= v15), mm=00 |
| n/a | Shift right logical imm | VSRLI (v2 <= rt,ra <= v15), mm=00 |
| n/a | Rounding Shift right logical | VSRL (v2 <= rt,ra,rb <= v15), mm=01 |
| n/a | Rounding Shift right logical imm | VSRLI (v2 <= rt,ra <= v15), mm=01 |
| n/a | Shift left logical | VSLL (v2 <= rt,ra,rb <= v15), mm=00 |
| n/a | Shift left logical imm | VSLLI (v2 <= rt,ra <= v15), mm=00 |
| n/a | Saturating Shift left logical | VSLL (v2 <= rt,ra,rb <= v15), mm=01 |
| n/a | Saturating Shift left logical imm | VSLLI (v2 <= rt,ra <= v15), mm=01 |

## 16-bit Comparison instructions

| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------- | ------------------- |
| CMPEQ16 rt, ra, rb | Compare equal | VSEQ (v16 <= rt,ra,rb <= v29), mm=00 |
| SCMPLT16 rt, ra, rb | Signed Compare less than | !VSGT (v16 <= rt,ra,rb <= v23), mm=00 |
| SCMPLE16 rt, ra, rb | Signed Compare less or equal | VSLE (v16 <= rt,ra,rb <= v23), mm=00 |
| UCMPLT16 rt, ra, rb | Unsigned Compare less than | !VSGT (v24 <= rt,ra,rb <= v29), mm=00 |
| UCMPLE16 rt, ra, rb | Unsigned Compare less or equal | VSLE (v24 <= rt,ra,rb <= v29), mm=00 |

## 8-bit Comparison instructions

| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------- | ------------------- |
| CMPEQ8 rt, ra, rb | Compare equal | VSEQ (v2 <= rt,ra,rb <= v7), mm=00 |
| SCMPLT8 rt, ra, rb | Signed Compare less than | !VSGT (v2 <= rt,ra,rb <= v7), mm=00 |
| SCMPLE8 rt, ra, rb | Signed Compare less or equal | VSLE (v2 <= rt,ra,rb <= v7), mm=00 |
| UCMPLT8 rt, ra, rb | Unsigned Compare less than | !VSGT (v8 <= rt,ra,rb <= v15), mm=00 |
| UCMPLE8 rt, ra, rb | Unsigned Compare less or equal | VSLE (v8 <= rt,ra,rb <= v15), mm=00 |

## 16-bit Miscellaneous instructions

| Andes Mnemonic | 16-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------ | ------------------- |
| SMIN16 rt, ra, rb | Signed minimum | VMIN (v16 <= rt,ra,rb <= v23), mm=00 |
| UMIN16 rt, ra, rb | Unsigned minimum | VMIN (v24 <= rt,ra,rb <= v29), mm=00 |
| SMAX16 rt, ra, rb | Signed maximum | VMAX (v16 <= rt,ra,rb <= v23), mm=00 |
| UMAX16 rt, ra, rb | Unsigned maximum | VMAX (v24 <= rt,ra,rb <= v29), mm=00 |
| SCLIP16 rt, ra, im | Signed clip | ?VCLIP (v16 <= rt,ra,rb <= v23), mm=01 |
| UCLIP16 rt, ra, im | Unsigned clip | ?VCLIP (v24 <= rt,ra,rb <= v29), mm=01 |
| KMUL16 rt, ra, rb | Signed multiply 16x16->16 | VMUL (v16 <= rt,ra,rb <= v23), mm=01 |
| KMULX16 rt, ra, rb | Signed crossed multiply 16x16->16 | |
| SMUL16 rt, ra, rb | Signed multiply 16x16->32 | VMUL (v30 <= rt <= v31, v16 <= ra,rb <= v23), mm=00 |
| SMULX16 rt, ra, rb | Signed crossed multiply 16x16->32 | |
| UMUL16 rt, ra, rb | Unsigned multiply 16x16->32 | VMUL (v30 <= rt <= v31, v24 <= ra,rb <= v29), mm=00 |
| UMULX16 rt, ra, rb | Unsigned crossed multiply 16x16->32 | |
| KABS16 rt, ra | Saturated absolute value | VSGNX (v16 <= rt <= v29, v16 <= ra,rb <= v23), mm=01 |
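
Two rows of this table are easier to follow with numbers in hand: the saturated absolute value and the widening multiply. The Python sketch below models one signed 16-bit lane for each. The function names are hypothetical and the semantics are inferred from the instruction descriptions in the table, so treat it as an assumption-laden illustration rather than a definition.

```python
INT16_MIN, INT16_MAX = -0x8000, 0x7FFF

def kabs16(x: int) -> int:
    """Saturating absolute value of one signed 16-bit lane."""
    if not INT16_MIN <= x <= INT16_MAX:
        raise ValueError("lane value out of signed 16-bit range")
    # abs(-32768) is 32768, which does not fit in 16 bits, so clamp to 32767.
    return min(abs(x), INT16_MAX)

def smul16(x: int, y: int) -> int:
    """Widening signed multiply of one lane pair: 16x16 -> 32 bits."""
    for v in (x, y):
        if not INT16_MIN <= v <= INT16_MAX:
            raise ValueError("lane value out of signed 16-bit range")
    # The largest magnitude is (-32768) * (-32768) = 2**30, which fits in 32 bits.
    return x * y
```

The worst case for both is the most negative input: `kabs16(-32768)` saturates to 32767, and `smul16(-32768, -32768)` produces `0x40000000`, which needs the wider destination that the table places in v30 to v31.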

## 8-bit Miscellaneous instructions

| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------- | ------------------- |
| SMIN8 rt, ra, rb | Signed minimum | VMIN (v2 <= rt,ra,rb <= v7), mm=00 |
| UMIN8 rt, ra, rb | Unsigned minimum | VMIN (v8 <= rt,ra,rb <= v15), mm=00 |
| SMAX8 rt, ra, rb | Signed maximum | VMAX (v2 <= rt,ra,rb <= v7), mm=00 |
| UMAX8 rt, ra, rb | Unsigned maximum | VMAX (v8 <= rt,ra,rb <= v15), mm=00 |
| KABS8 rt, ra | Saturated absolute value | VSGNX (v2 <= rt <= v15, v2 <= ra,rb <= v7), mm=01 |

## 8-bit Unpacking instructions

| Andes Mnemonic | 8-bit Instruction | Harmonised RVP Equivalent |
| ------------------ | ------------------------- | ------------------- |
| SUNPKD810 rt, ra | Signed unpack bytes 1 & 0 | VMV (v16 <= rt <= v23, v2 <= ra <= v7), mm=00 |
| SUNPKD820 rt, ra | Signed unpack bytes 2 & 0 | |
| SUNPKD830 rt, ra | Signed unpack bytes 3 & 0 | |
| SUNPKD831 rt, ra | Signed unpack bytes 3 & 1 | |
| ZUNPKD810 rt, ra | Unsigned unpack bytes 1 & 0 | VMV (v24 <= rt <= v29, v8 <= ra <= v15), mm=00 |
| ZUNPKD820 rt, ra | Unsigned unpack bytes 2 & 0 | |
| ZUNPKD830 rt, ra | Unsigned unpack bytes 3 & 0 | |
| ZUNPKD831 rt, ra | Unsigned unpack bytes 3 & 1 | |
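
The unpack operations listed above widen two selected bytes of a 32-bit source into the two 16-bit halves of the destination. A minimal Python sketch follows, assuming the first byte index named in the mnemonic lands in the upper half and the second in the lower half. The function name `unpack8` and its parameters are hypothetical.

```python
def unpack8(word: int, hi_byte: int, lo_byte: int, signed: bool) -> int:
    """Widen two bytes of a 32-bit word into a packed 2x16-bit result.

    hi_byte and lo_byte are byte indices (0..3); signed selects sign
    extension (SUNPKD8xy style) or zero extension (ZUNPKD8xy style).
    """
    def widen(index: int) -> int:
        v = (word >> (8 * index)) & 0xFF
        if signed and v & 0x80:
            v -= 0x100            # sign-extend the byte
        return v & 0xFFFF         # truncate to a 16-bit lane
    return (widen(hi_byte) << 16) | widen(lo_byte)
```

For the source word `0x123480FF`, `unpack8(w, 1, 0, True)` gives `0xFF80FFFF` and `unpack8(w, 1, 0, False)` gives `0x008000FF`. Only the "bytes 1 & 0" forms have a proposed VMV mapping in the table; the other byte selections are left open there.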