# RFC ls004 Shift-And-Add

**URLs**:

* <https://libre-soc.org/openpower/sv/biginteger/analysis/>
* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* bigint: <https://bugs.libre-soc.org/show_bug.cgi?id=960> TODO: maybe remove this link due to confusion and irrelevance?
* <https://git.openpower.foundation/isa/PowerISA/issues/91>
* shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add shaddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Severity**: Major

**Status**: New

**Date**: 31 Oct 2022

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
Book I Fixed-Point Shift Instructions 3.3.14.2
Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
Instructions added
shadd - Shift and Add
shaddw - Shift and Add Signed Word
shadduw - Shift and Add Unsigned Word
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of three new GPR-based instructions
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers,
and related tools.
```

**Keywords**:

```
GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing
```

**Motivation**

Power ISA lacks LD/ST Indexed with shift, which is present in both ARM
and x86. Adding shifted forms of the LD/ST Indexed instructions would take
thirty-eight new instructions; a compromise is to add shift-and-add, which
replaces a pair of explicit instructions (a shift followed by an add) in
hot loops.

**Notes and Observations**:

1. `shadd` and `shadduw` operate on unsigned integers; `shaddw` treats
   the lower 32 bits of RB as a signed integer.
2. `shadduw` is intended for computing address offsets,
   as the RB operand is constrained to its lower 32 bits
   and zero-extended.
3. All three are 2-in 1-out instructions.
4. Shift-add operations are present in both x86 and aarch64,
   since they are useful both for general arithmetic and for
   computing addresses, even when not immediately followed
   by a load/store.
5. `shaddw` is often more useful than `shadduw` because C/C++ programmers like
   to use `int` for array indexing. For additional details see
   <https://bugs.libre-soc.org/show_bug.cgi?id=996>.
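The "general arithmetic" use in observation 4 can be illustrated with a minimal sketch. It assumes the `shadd` pseudocode given in this RFC (shift of `sm + 1`, result truncated to 64 bits); the Python function `shadd` is a hypothetical software model, not part of the proposal. Passing the same value as both RA and RB multiplies it by 3, 5, 9 or 17.

```python
MASK64 = (1 << 64) - 1  # results are truncated to 64 bits


def shadd(ra, rb, sm):
    """Hypothetical model of shadd: RT <- ((RB) << (sm + 1)) + (RA)."""
    assert 0 <= sm <= 3  # sm is a 2-bit field
    return ((rb << (sm + 1)) + ra) & MASK64


x = 7
# With RA == RB == x the result is x * (2**(sm + 1) + 1)
products = [shadd(x, x, sm) for sm in range(4)]
print(products)  # [21, 35, 63, 119], i.e. x*3, x*5, x*9, x*17
```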

TODO: signed 32-bit shift-and-add should be added, this needs to be addressed
before submitting the RFC: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.14.2

----------------

\newpage{}

# Table of LD/ST-Indexed-Shift

The following table lists the alternative instructions that could
be considered for addition. All use a 9-bit XO, which is not hugely
costly. The totals are:

* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 6 Floating-Point Store Indexed Shifted (with Update)

Total count: 38 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode. With the savings
that these instructions represent in hot loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable. However, there is no point in placing these in EXT2xx: they
need to be in EXT0xx, because if added as 64-bit encodings the benefit
of reduced binary size is not achieved.

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction |
|-------|------|-------|-------|-------|-------|----------------------|
| PO | RT | RA | RB | sm | XO | lbzsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lbzusx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhzsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhzusx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhasx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhausx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwzsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwzusx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwasx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwausx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | ldsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | ldusx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhbrsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwbrsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | ldbrsx RT,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stbsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stbusx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | sthsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | sthusx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stwsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stwusx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stdsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stdusx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | sthbrsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stwbrsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stdbrsx RS,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfsxs FRT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfsuxs FRT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfdxs FRT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfduxs FRT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfiwaxs FRT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfiwzxs FRT,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfsxs FRS,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfsuxs FRS,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfdxs FRS,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfduxs FRS,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfiwxs FRS,RA,RB,sm |

----------------

\newpage{}

# Shift-and-Add

`shadd RT, RA, RB, sm`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | sm | XO | Rc | Z23-Form |
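To make the field layout concrete, here is a minimal sketch that packs the fields from the table into a 32-bit word using MSB0 bit numbering (bit 0 is the most significant bit). The function name `encode_z23` is hypothetical, and the `PO` and `XO` values are placeholders only, since this RFC does not assign opcodes.

```python
def encode_z23(po, rt, ra, rb, sm, xo, rc):
    """Pack Z23-Form fields into a 32-bit word (MSB0: bit 0 is the MSB)."""
    fields = [  # (value, width in bits), listed in order starting at bit 0
        (po, 6), (rt, 5), (ra, 5), (rb, 5), (sm, 2), (xo, 8), (rc, 1),
    ]
    word = 0
    for value, width in fields:
        assert 0 <= value < (1 << width), "field out of range"
        word = (word << width) | value
    return word


# PO=0 and XO=0 are placeholders: no opcode has been allocated.
insn = encode_z23(po=0, rt=4, ra=1, rb=2, sm=2, xo=0, rc=0)
print(f"{insn:#010x}")  # 0x00811400

# sm occupies bits 21-22, i.e. 9 bits above the least significant bit
print((insn >> 9) & 0b11)  # 2
```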

Pseudocode:

```
shift <- sm + 1                       # Shift is between 1-4
sum[0:63] <- ((RB) << shift) + (RA)   # Shift RB, add RA
RT <- sum                             # Result stored in RT
```

When `sm` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result stored in RT.

`sm` is a 2-bit field, and allows multiplication of RB by 2, 4, 8, 16.

Operands RA and RB, and the result RT are all 64-bit, unsigned integers.

**NEED EXAMPLES (not sure how to embed sm)!!!**

Examples:

```
# sm=2 gives a shift of 3: adds r1 to (r2*8)
shadd r4, r1, r2, 2
```
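The pseudocode above can be sketched as a small software model. This is an illustration only: the function name `shadd` and the sample values are assumptions, and the model covers just the arithmetic (no `Rc` handling).

```python
MASK64 = (1 << 64) - 1


def shadd(ra, rb, sm):
    """Hypothetical model of the shadd pseudocode on 64-bit unsigned values."""
    shift = sm + 1                        # shift is between 1 and 4
    return ((rb << shift) + ra) & MASK64  # sum[0:63], truncated to 64 bits


base, index = 0x1000, 5
# Address of element 5 in an array of 8-byte elements (sm=2 -> shift of 3)
addr = shadd(base, index, 2)
print(hex(addr))  # 0x1028

# Bits shifted out of the top of RB are discarded
wrapped = shadd(0, 1 << 63, 0)
print(wrapped)  # 0
```

The first call computes the same value as a separate shift-left-by-3 followed by an add, which is the instruction pair this proposal aims to replace.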

# Shift-and-Add Signed Word

`shaddw RT, RA, RB, sm`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | sm | XO | Rc | Z23-Form |

Pseudocode:

```
shift <- sm + 1                    # Shift is between 1-4
n <- EXTS64((RB)[32:63])           # Only use lower 32-bits of RB, sign-extended
sum[0:63] <- (n << shift) + (RA)   # Shift n, add RA
RT <- sum                          # Result stored in RT
```

When `sm` is zero, the lower word of register RB is sign-extended to 64 bits,
multiplied by 2, added to the contents of register RA, and the result stored
in RT.

`sm` is a 2-bit field, and allows multiplication of the sign-extended
lower word of RB by 2, 4, 8, 16.

Operand RA and the result RT are 64-bit, signed integers. Only the lower
32 bits of RB are used, as a signed integer.

*Programmer's Note:
This instruction is useful for computing address offsets. RA is the 64-bit
base address. RB is the offset into the data structure, limited to a signed
32-bit value.*

Examples:

```
# sm=2 gives a shift of 3: adds r1 to (sign-extended lower word of r2)*8
shaddw r4, r1, r2, 2
```
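As a minimal sketch of why sign extension matters for `int` array indexing, the model below applies the `shaddw` pseudocode to an index of -1 held in the lower word of RB, with unrelated data in the upper word. The helper names `exts64_word` and `shaddw` and the sample values are hypothetical.

```python
MASK64 = (1 << 64) - 1


def exts64_word(value):
    """Sign-extend the low 32 bits of value to a 64-bit two's-complement pattern."""
    word = value & 0xFFFFFFFF
    if word & 0x80000000:            # sign bit of the 32-bit word is set
        word |= 0xFFFFFFFF00000000
    return word


def shaddw(ra, rb, sm):
    """Hypothetical model of the shaddw pseudocode."""
    n = exts64_word(rb)              # n <- EXTS64((RB)[32:63])
    return ((n << (sm + 1)) + ra) & MASK64


base = 0x2000
# Index -1 as a 32-bit int; the upper half of the register holds stale data
rb = 0xDEADBEEF_FFFFFFFF
addr = shaddw(base, rb, 2)           # 8-byte elements -> sm=2
print(hex(addr))  # 0x1ff8, i.e. base - 8
```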

[[!tag opf_rfc]]

# Shift-and-Add Unsigned Word

`shadduw RT, RA, RB, sm`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | sm | XO | Rc | Z23-Form |

Pseudocode:

```
shift <- sm + 1                    # Shift is between 1-4
n <- (RB)[32:63]                   # Only use lower 32-bits of RB, zero-extended
sum[0:63] <- (n << shift) + (RA)   # Shift n, add RA
RT <- sum                          # Result stored in RT
```

When `sm` is zero, the lower word of register RB is zero-extended to 64 bits,
multiplied by 2, added to the contents of register RA, and the result stored
in RT.

`sm` is a 2-bit field, and allows multiplication of the zero-extended
lower word of RB by 2, 4, 8, 16.

Operand RA and the result RT are 64-bit, unsigned integers. Only the lower
32 bits of RB are used, as an unsigned integer.

*Programmer's Note:
This instruction is useful for computing address offsets. RA is the 64-bit
base address. RB is the offset into the data structure, limited to an
unsigned 32-bit value.*

Examples:

```
# sm=2 gives a shift of 3: adds r1 to (zero-extended lower word of r2)*8
shadduw r4, r1, r2, 2
```
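A minimal sketch of the zero-extending behaviour follows, using an index that fits in an unsigned 32-bit value but not in a signed one. The function name `shadduw` and the sample values are hypothetical; the model follows the pseudocode above.

```python
MASK64 = (1 << 64) - 1


def shadduw(ra, rb, sm):
    """Hypothetical model of the shadduw pseudocode."""
    n = rb & 0xFFFFFFFF              # n <- (RB)[32:63], upper word ignored
    return ((n << (sm + 1)) + ra) & MASK64


base = 0x2000
index = 3_000_000_000                # fits in uint32, too large for int32
rb = (0xDEADBEEF << 32) | index      # stale data in the upper word of RB
addr = shadduw(base, rb, 1)          # 4-byte elements -> sm=1
print(addr)  # 12000008192, i.e. base + index*4
```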

# Appendices

* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| Z23 | I | # | 3.0B | shadd | Shift-and-Add |
| Z23 | I | # | 3.0B | shaddw | Shift-and-Add Signed Word |
| Z23 | I | # | 3.0B | shadduw | Shift-and-Add Unsigned Word |