# RFC ls004 Shift-And-Add

* Funded by NLnet under the Privacy and Enhanced Trust Programme, EU
  Horizon2020 Grant 825310, and NGI0 Entrust No 101069594
* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* <https://git.openpower.foundation/isa/PowerISA/issues/125>
* feedback: <https://bugs.libre-soc.org/show_bug.cgi?id=1091>

**Changes**:

* initial shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add saddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>
* consider LD/ST-Shifted <https://bugs.libre-soc.org/show_bug.cgi?id=1055>

**Severity**: Major

**Status**: New

**Date**: 31 Oct 2022

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
Book I Fixed-Point Shift Instructions 3.3.14.2
Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
Instructions added
sadd - Shift and Add
saddw - Shift and Add Signed Word
sadduw - Shift and Add Unsigned Word
Also under consideration LD/ST-Indexed-Shifted
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of three new GPR-based instructions
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers,
and related tools.
```

**Keywords**:

```
GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing
```

**Motivation**

Power ISA is missing LD/ST-Indexed with a shifted index register, which is
present in both ARM and x86. Adding the full set of LD/ST-Indexed-Shifted
instructions would cost 37 new instructions, so as a compromise this RFC adds
shift-and-add instead, replacing a pair of explicit instructions (a shift
followed by an add) in hot-loops.

**Notes and Observations**:

1. `sadd` and `sadduw` operate on unsigned integers.
2. `sadduw` is intended for performing address offsets,
   as the second operand is constrained to the lower 32 bits
   and zero-extended.
3. All three are 2-in 1-out instructions.
4. Shift-add operations are present in both x86 and AArch64,
   since they are useful both for general arithmetic and for
   computing addresses, even when not immediately followed
   by a load/store.
5. `saddw` is often more useful than `sadduw` because C/C++ programmers like
   to use `int` for array indexing. For additional details see
   <https://bugs.libre-soc.org/show_bug.cgi?id=996>.
6. Even the Motorola 68020 has LD/ST-Indexed-Shifted (scaled index addressing):
   <https://tack.sourceforge.net/olddocs/m68020.html#2.2.2.%20Extra%20MC68020%20addressing%20modes>
7. Should average-shift-add also be included? What about CA-in / CA-out?
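Note 5 can be made concrete with a small Python model of the two word variants. This is a sketch that follows the `saddw` and `sadduw` pseudocode in this RFC; the helper names are hypothetical, and GPR contents are modelled as non-negative Python ints masked to 64 bits. It computes `&a[i]` for an array of 8-byte elements (`SH=2`, a shift of 3) with a C `int` index of -1:

```python
MASK64 = (1 << 64) - 1  # GPRs are 64 bits wide


def low_word(rb: int) -> int:
    """(RB)[32:63] in Power ISA MSB0 numbering: the low-order 32 bits."""
    return rb & 0xFFFF_FFFF


def exts64_word(w: int) -> int:
    """Sign-extend a 32-bit word to 64 bits, as an unsigned 64-bit pattern."""
    return (w - (1 << 32)) & MASK64 if w & 0x8000_0000 else w


def saddw(ra: int, rb: int, sh: int) -> int:
    n = exts64_word(low_word(rb))          # signed word index
    return ((n << (sh + 1)) + ra) & MASK64


def sadduw(ra: int, rb: int, sh: int) -> int:
    n = low_word(rb)                       # zero-extended word index
    return ((n << (sh + 1)) + ra) & MASK64


base = 0x1000        # hypothetical array base address
i = -1 & MASK64      # C `int i = -1`, as held in a 64-bit GPR
print(hex(saddw(base, i, 2)))    # 0xff8: one element before base
print(hex(sadduw(base, i, 2)))   # 0x800000ff8: index treated as 0xFFFFFFFF
```

With a negative `int` index only the signed variant yields the address a C compiler expects, which is why `saddw` is the more common fit for `int`-indexed loops.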

**Changes**

Add the following entries to:

* the Appendices of Book I
* Book I Section 3.3.14.2 (the new instructions)

----------------

\newpage{}

# Table of LD/ST-Indexed-Shift

The following lists the alternative instructions that could be
considered for addition instead. All of them use a 9-bit XO:

* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 5 Floating-Point Store Indexed Shifted (with Update)
* 6 Load Indexed Shifted Update Post-Increment
* 4 Store Indexed Shifted Update Post-Increment
* 2 Floating-Point Load Indexed Shifted Update Post-Increment
* 2 Floating-Point Store Indexed Shifted Update Post-Increment

Total count: 51 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode. Given the savings
these instructions represent in hot-loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable. However, there is no point in placing the 37 Shifted-only
instructions in EXT2xx: they need to be in EXT0xx, because as 64-bit
encodings they would not achieve the intended reduction in binary size.
Post-Increment-Shifted, on the other hand, could reasonably be proposed
in EXT2xx.

**LD/ST-Shifted**

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction |
|-------|------|-------|-------|-------|-------|----------------------|
| PO | RT | RA | RB | SH | XO | lbzsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhzsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhasx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwzsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwasx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | ldsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhbrsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwbrsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | ldbrsx RT,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stbsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | sthsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stwsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stdsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | sthbrsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stwbrsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stdbrsx RS,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfsxs FRT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfdxs FRT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfiwaxs FRT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfiwzxs FRT,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfsxs FRS,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfdxs FRS,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfiwxs FRS,RA,RB,SH |

**LD/ST-Shifted-Update**

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction |
|-------|------|-------|-------|-------|-------|----------------------|
| PO | RT | RA | RB | SH | XO | lbzusx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhzusx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhausx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwzusx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwausx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | ldusx RT,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stbusx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | sthusx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stwusx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stdusx RS,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfsuxs FRT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfduxs FRT,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfsuxs FRS,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfduxs FRS,RA,RB,SH |

**Post-Increment-Update LD/ST-Shifted**

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction |
|-------|------|-------|-------|-------|-------|----------------------|
| PO | RT | RA | RB | SH | XO | lbzupsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhzupsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhaupsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwzupsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwaupsx RT,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stbupsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | sthupsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stwupsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stdupsx RS,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | ldupsx RT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfdupxs FRT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfsupxs FRT,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfdupxs FRS,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfsupxs FRS,RA,RB,SH |
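As a cross-check on the instruction counts quoted for this group, the mnemonics in the three tables can be tallied mechanically. This is a plain Python sketch; the lists are transcribed from the tables, and the split into "Shifted-only" (with and without Update) versus Post-Increment follows the grouping used in the count paragraph.

```python
# Mnemonics transcribed from the three LD/ST-Shifted tables in this RFC.
shifted = """lbzsx lhzsx lhasx lwzsx lwasx ldsx lhbrsx lwbrsx ldbrsx
stbsx sthsx stwsx stdsx sthbrsx stwbrsx stdbrsx
lfsxs lfdxs lfiwaxs lfiwzxs stfsxs stfdxs stfiwxs""".split()

shifted_update = """lbzusx lhzusx lhausx lwzusx lwausx ldusx
stbusx sthusx stwusx stdusx lfsuxs lfduxs stfsuxs stfduxs""".split()

post_increment = """lbzupsx lhzupsx lhaupsx lwzupsx lwaupsx
stbupsx sthupsx stwupsx stdupsx ldupsx
lfdupxs lfsupxs stfdupxs stfsupxs""".split()

groups = {
    "LD/ST-Shifted": shifted,
    "LD/ST-Shifted-Update": shifted_update,
    "Post-Increment-Update LD/ST-Shifted": post_increment,
}
for name, mnemonics in groups.items():
    print(f"{name}: {len(mnemonics)}")

shifted_only = len(shifted) + len(shifted_update)   # EXT0xx candidates
total = shifted_only + len(post_increment)
# Sanity check: no mnemonic appears twice across the tables.
assert len(set(shifted + shifted_update + post_increment)) == total
print("Shifted-only:", shifted_only)   # 37
print("Total:", total)                 # 51
```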

----------------

\newpage{}

# Shift-and-Add

`sadd RT, RA, RB, SH`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | SH | XO | Rc | Z23-Form |

Pseudocode:

```
shift <- SH + 1 # Shift is between 1 and 4
sum[0:63] <- ((RB) << shift) + (RA) # Shift RB, add RA
RT <- sum # Result stored in RT
```

When `SH` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result is stored in RT.

`SH` is a 2-bit field, and allows multiplication of RB by 2, 4, 8 or 16.

Operands RA and RB, and the result RT, are all 64-bit unsigned integers.
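A minimal Python model of the `sadd` pseudocode, useful for checking hand-worked examples. It assumes GPR values are non-negative Python ints below 2**64 and that the 64-bit sum wraps (the carry out is discarded, as `sum[0:63]` implies). `sadd` here is an illustrative function, not part of any existing simulator API, and the CR0 update for `Rc=1` is not modelled.

```python
MASK64 = (1 << 64) - 1  # GPRs are 64 bits wide


def sadd(ra: int, rb: int, sh: int) -> int:
    """Model of sadd RT,RA,RB,SH: RT = ((RB) << (SH+1)) + (RA), modulo 2**64."""
    if not 0 <= sh <= 3:
        raise ValueError("SH is a 2-bit field (0-3)")
    shift = sh + 1                         # shift is between 1 and 4
    return ((rb << shift) + ra) & MASK64   # carry out of bit 0 is discarded


# Multiplier applied to RB for each SH value
print({sh: sadd(0, 1, sh) for sh in range(4)})   # {0: 2, 1: 4, 2: 8, 3: 16}
# Address of element 5 in an array of 8-byte elements at 0x1000
print(hex(sadd(0x1000, 5, 2)))                   # 0x1028
```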

The assembler convention for the `SH` operand is not yet settled. The
examples in this RFC write it as the encoded 2-bit field value (0-3),
matching the pseudocode, so the effective shift amount is `SH+1`.

Examples:

```
# r4 = r1 + (r2*8), SH=2 gives a shift of 3
sadd r4, r1, r2, 2
```

# Shift-and-Add Signed Word

`saddw RT, RA, RB, SH`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | SH | XO | Rc | Z23-Form |

Pseudocode:

```
shift <- SH + 1 # Shift is between 1 and 4
n <- EXTS64((RB)[32:63]) # Sign-extend lower 32 bits of RB
sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
RT <- sum # Result stored in RT
```

When `SH` is zero, the lower word of register RB is sign-extended to
64 bits, multiplied by 2, added to the contents of register RA, and the
result is stored in RT.

`SH` is a 2-bit field, and allows multiplication of the sign-extended
lower word of RB by 2, 4, 8 or 16.

RA and the result RT are 64-bit signed integers; only the lower 32 bits
of RB are used, interpreted as a signed word.
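The signed-word behaviour can be sketched in Python as follows, assuming GPRs are held as unsigned 64-bit bit patterns. `exts64`, `saddw` and `as_signed` are hypothetical helpers written for this illustration; the last one only converts a result back to a signed value for display.

```python
MASK64 = (1 << 64) - 1


def exts64(word: int) -> int:
    """EXTS64 of a 32-bit value, returned as a 64-bit unsigned bit pattern."""
    word &= 0xFFFF_FFFF                       # keep (RB)[32:63], the low word
    return word | (0xFFFF_FFFF_0000_0000 if word & 0x8000_0000 else 0)


def saddw(ra: int, rb: int, sh: int) -> int:
    """Model of saddw RT,RA,RB,SH."""
    shift = sh + 1                            # SH in 0-3 gives shift 1-4
    n = exts64(rb)                            # upper word of RB is ignored
    return ((n << shift) + ra) & MASK64


def as_signed(x: int) -> int:
    """Interpret a 64-bit pattern as a signed integer (display only)."""
    return x - (1 << 64) if x >> 63 else x


print(as_signed(saddw(100, 0xFFFF_FFFF, 3)))      # 100 + (-1)*16 = 84
print(saddw(0, 0xDEAD_BEEF_0000_0003, 0))         # upper word ignored: 6
```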

*Programmer's Note:
The main use of this instruction is computing address offsets: RA holds
the 64-bit base address, and RB holds a signed offset (index) into a data
structure, limited to 32 bits.*

Examples:

```
# r4 = r1 + (EXTS64(r2[32:63])*16), SH=3 gives a shift of 4
saddw r4, r1, r2, 3
```

----------------

\newpage{}

# Shift-and-Add Unsigned Word

`sadduw RT, RA, RB, SH`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | SH | XO | Rc | Z23-Form |

Pseudocode:

```
shift <- SH + 1 # Shift is between 1 and 4
n <- EXTZ64((RB)[32:63]) # Zero-extend lower 32 bits of RB
sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
RT <- sum # Result stored in RT
```

When `SH` is zero, the lower word of register RB is zero-extended to
64 bits, multiplied by 2, added to the contents of register RA, and the
result is stored in RT.

`SH` is a 2-bit field, and allows multiplication of the zero-extended
lower word of RB by 2, 4, 8 or 16.

RA and the result RT are 64-bit unsigned integers; only the lower 32 bits
of RB are used, interpreted as an unsigned word.
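A matching Python sketch for the unsigned-word form, under the same assumption that GPRs are unsigned 64-bit bit patterns; `sadduw` is a hypothetical model, not an existing API. The second call shows that the shifted index is not truncated to 32 bits, so the full 64-bit offset reaches the sum.

```python
MASK64 = (1 << 64) - 1


def sadduw(ra: int, rb: int, sh: int) -> int:
    """Model of sadduw RT,RA,RB,SH."""
    shift = sh + 1                 # SH in 0-3 gives shift 1-4
    n = rb & 0xFFFF_FFFF           # (RB)[32:63], zero-extended to 64 bits
    return ((n << shift) + ra) & MASK64


print(hex(sadduw(0x1000, 0xFFFF_FFFF_0000_0010, 1)))  # 0x1040
print(hex(sadduw(0, 0xFFFF_FFFF, 3)))                 # 0xffffffff0
```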

*Programmer's Note:
The main use of this instruction is computing address offsets: RA holds
the 64-bit base address, and RB holds an unsigned offset into a data
structure, limited to 32 bits.*

Examples:

```
# r4 = r1 + (EXTZ64(r2[32:63])*8), SH=2 gives a shift of 3
sadduw r4, r1, r2, 2
```

\newpage{}
[[!inline pages="openpower/isa/pifixedloadshift" raw=yes ]]
\newpage{}
[[!inline pages="openpower/isa/pifixedstoreshift" raw=yes ]]
\newpage{}
[[!inline pages="openpower/isa/pifploadshift" raw=yes ]]
\newpage{}
[[!inline pages="openpower/isa/pifpstoreshift" raw=yes ]]

\newpage{}

# Instruction Formats

**Add the following to Book I 1.6.1**

Z23-Form:

```
| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | SH | XO | Rc | Z23-Form |
| PO | RS | RA | RB | SH | XO | Rc | Z23-Form |
| PO | FRT | RA | RB | SH | XO | Rc | Z23-Form |
| PO | FRS | RA | RB | SH | XO | Rc | Z23-Form |
```

# Instruction Fields

In Book I 1.6.2, add Z23 to the Formats list of each of the following
fields: `FRS FRT RT RA RB XO Rc`

Add the following new fields:

```
SH (21:22)
Field used to specify a shift amount.
Formats: Z23
```
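The `SH` field sits in bits 21:22 using Power ISA MSB0 bit numbering (bit 0 is the most significant bit of the 32-bit word), which is easy to misplace when writing an assembler. The following Python sketch packs Z23-Form fields according to the layout in the table above. `encode_z23` is a hypothetical helper; `RT` stands for whichever of RT/RS/FRT/FRS occupies bits 6:10, and because this RFC does not yet assign a Primary Opcode or XO, both are left as placeholders (0) rather than guessed.

```python
# Field positions from the Z23-Form table, in Power ISA MSB0 bit numbering.
Z23_FIELDS = {
    "PO": (0, 5),
    "RT": (6, 10),    # also RS, FRT or FRS
    "RA": (11, 15),
    "RB": (16, 20),
    "SH": (21, 22),
    "XO": (23, 30),
    "Rc": (31, 31),
}


def encode_z23(**values: int) -> int:
    """Pack Z23-Form fields into a 32-bit instruction word."""
    unknown = set(values) - set(Z23_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, (first, last) in Z23_FIELDS.items():
        width = last - first + 1
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << (31 - last)   # MSB0: field ends at bit `last`
    return word


# sadd r4, r1, r2, 2 with placeholder PO=0 and XO=0 (not real assignments)
insn = encode_z23(PO=0, RT=4, RA=1, RB=2, SH=2, XO=0, Rc=0)
print(hex(insn))   # 0x811400
```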

# Appendices

* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| Z23 | I | # | 3.0B | sadd | Shift-and-Add |
| Z23 | I | # | 3.0B | saddw | Shift-and-Add Signed Word |
| Z23 | I | # | 3.0B | sadduw | Shift-and-Add Unsigned Word |

[[!tag opf_rfc]]
