# RFC ls004 Shift-And-Add

**URLs**:

* <https://libre-soc.org/openpower/sv/biginteger/analysis/>
* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* bigint: <https://bugs.libre-soc.org/show_bug.cgi?id=960> TODO: maybe remove this link due to confusion and irrelevance?
* <https://git.openpower.foundation/isa/PowerISA/issues/91>
* shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add shaddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Severity**: Major

**Status**: New

**Date**: 31 Oct 2022

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
Book I Fixed-Point Shift Instructions 3.3.14.2
Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
Instructions added
shadd - Shift and Add
shaddw - Shift and Add Signed Word
shadduw - Shift and Add Unsigned Word
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of three new GPR-based instructions
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers,
and related tools.
```

**Keywords**:

```
GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing
```

**Motivation**

Power ISA is missing LD/ST Indexed with shift, which is present in both ARM
and x86. Adding shifted variants of the LD/ST Indexed instructions would
require thirty-eight new instructions; a compromise is to add shift-and-add,
which replaces a pair of explicit instructions (a shift followed by an add)
in hot-loops.

**Notes and Observations**:

1. `shadd` and `shadduw` operate on unsigned integers.
2. `shadduw` is intended for performing address offsets,
   as the second operand is constrained to the lower 32 bits
   and zero-extended.
3. All three are 2-in 1-out instructions.
4. Shift-add operations are present in both x86 and aarch64,
   since they are useful both for general arithmetic and for
   computing addresses, even when not immediately followed
   by a load/store.
5. `shaddw` is often more useful than `shadduw` because C/C++ programmers like
   to use `int` for array indexing. For additional details see
   <https://bugs.libre-soc.org/show_bug.cgi?id=996>.

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.14.2

----------------

\newpage{}

# Table of LD/ST-Indexed-Shift

The following table lists the alternative instructions that could be
considered for addition. They all use a 9-bit XO, which is not hugely
costly. The totals are:

* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 6 Floating-Point Store Indexed Shifted (with Update)

Total count: 38 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode. With the savings
that these instructions represent in hot-loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable. However, there is no point in placing these in EXT2xx; they
need to be in EXT0xx, because if they are added as a 64-bit encoding the
benefit of reduced binary size is not achieved.

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction           |
|-----|------|-------|-------|-------|-------|-----------------------|
| PO  | RT   | RA    | RB    | sm    | XO    | lbzsx RT,RA,RB,sm     |
| PO  | RT   | RA    | RB    | sm    | XO    | lbzusx RT,RA,RB,sm    |
| PO  | RT   | RA    | RB    | sm    | XO    | lhzsx RT,RA,RB,sm     |
| PO  | RT   | RA    | RB    | sm    | XO    | lhzusx RT,RA,RB,sm    |
| PO  | RT   | RA    | RB    | sm    | XO    | lhasx RT,RA,RB,sm     |
| PO  | RT   | RA    | RB    | sm    | XO    | lhausx RT,RA,RB,sm    |
| PO  | RT   | RA    | RB    | sm    | XO    | lwzsx RT,RA,RB,sm     |
| PO  | RT   | RA    | RB    | sm    | XO    | lwzusx RT,RA,RB,sm    |
| PO  | RT   | RA    | RB    | sm    | XO    | lwasx RT,RA,RB,sm     |
| PO  | RT   | RA    | RB    | sm    | XO    | lwausx RT,RA,RB,sm    |
| PO  | RT   | RA    | RB    | sm    | XO    | ldsx RT,RA,RB,sm      |
| PO  | RT   | RA    | RB    | sm    | XO    | ldusx RT,RA,RB,sm     |
| PO  | RT   | RA    | RB    | sm    | XO    | lhbrsx RT,RA,RB,sm    |
| PO  | RT   | RA    | RB    | sm    | XO    | lwbrsx RT,RA,RB,sm    |
| PO  | RT   | RA    | RB    | sm    | XO    | ldbrsx RT,RA,RB,sm    |
| PO  | RS   | RA    | RB    | sm    | XO    | stbsx RS,RA,RB,sm     |
| PO  | RS   | RA    | RB    | sm    | XO    | stbusx RS,RA,RB,sm    |
| PO  | RS   | RA    | RB    | sm    | XO    | sthsx RS,RA,RB,sm     |
| PO  | RS   | RA    | RB    | sm    | XO    | sthusx RS,RA,RB,sm    |
| PO  | RS   | RA    | RB    | sm    | XO    | stwsx RS,RA,RB,sm     |
| PO  | RS   | RA    | RB    | sm    | XO    | stwusx RS,RA,RB,sm    |
| PO  | RS   | RA    | RB    | sm    | XO    | stdsx RS,RA,RB,sm     |
| PO  | RS   | RA    | RB    | sm    | XO    | stdusx RS,RA,RB,sm    |
| PO  | RS   | RA    | RB    | sm    | XO    | sthbrsx RS,RA,RB,sm   |
| PO  | RS   | RA    | RB    | sm    | XO    | stwbrsx RS,RA,RB,sm   |
| PO  | RS   | RA    | RB    | sm    | XO    | stdbrsx RS,RA,RB,sm   |
| PO  | FRT  | RA    | RB    | sm    | XO    | lfsxs FRT,RA,RB,sm    |
| PO  | FRT  | RA    | RB    | sm    | XO    | lfsuxs FRT,RA,RB,sm   |
| PO  | FRT  | RA    | RB    | sm    | XO    | lfdxs FRT,RA,RB,sm    |
| PO  | FRT  | RA    | RB    | sm    | XO    | lfduxs FRT,RA,RB,sm   |
| PO  | FRT  | RA    | RB    | sm    | XO    | lfiwaxs FRT,RA,RB,sm  |
| PO  | FRT  | RA    | RB    | sm    | XO    | lfiwzxs FRT,RA,RB,sm  |
| PO  | FRS  | RA    | RB    | sm    | XO    | stfsxs FRS,RA,RB,sm   |
| PO  | FRS  | RA    | RB    | sm    | XO    | stfsuxs FRS,RA,RB,sm  |
| PO  | FRS  | RA    | RB    | sm    | XO    | stfdxs FRS,RA,RB,sm   |
| PO  | FRS  | RA    | RB    | sm    | XO    | stfduxs FRS,RA,RB,sm  |
| PO  | FRS  | RA    | RB    | sm    | XO    | stfiwxs FRS,RA,RB,sm  |
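
The field layout in the table header can be illustrated with a small Python sketch that packs the six fields into a 32-bit word using MSB0 bit numbering (bit 0 is the most significant). The helper names `pack_indexed_shifted` and `unpack_indexed_shifted` are hypothetical, and any PO or XO values passed in are placeholders, since this RFC does not allocate them.

```python
# Field widths in MSB0 order, taken from the table header:
# bits 0-5, 6-10, 11-15, 16-20, 21-22, 23-31.
FIELD_WIDTHS = (("po", 6), ("rt", 5), ("ra", 5), ("rb", 5), ("sm", 2), ("xo", 9))


def pack_indexed_shifted(**fields):
    """Pack the six fields into one 32-bit instruction word.

    Hypothetical helper: PO and XO are caller-supplied placeholders,
    because no opcode values are allocated in the RFC text.
    """
    word = 0
    for name, width in FIELD_WIDTHS:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        # Bit 0 is the most significant bit, so earlier fields end up higher.
        word = (word << width) | value
    return word


def unpack_indexed_shifted(word):
    """Split a 32-bit word back into its fields (inverse of the packer)."""
    fields = {}
    for name, width in reversed(FIELD_WIDTHS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields
```

For the RS, FRT and FRS rows the same 5-bit slot at bits 6-10 is used, so the `rt` argument stands in for whichever register field the row names.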

----------------

\newpage{}

# Shift-and-Add

`shadd RT, RA, RB, sm`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-----|------|-------|-------|-------|-------|----|----------|
| PO  | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

Pseudocode:

```
shift <- sm + 1                        # Shift is between 1-4
sum[0:63] <- ((RB) << shift) + (RA)    # Shift RB, add RA
RT <- sum                              # Result stored in RT
```

When `sm` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result stored in RT.

`sm` is a 2-bit field that selects a shift of 1 to 4 bits, allowing
multiplication of RB by 2, 4, 8 or 16.

Operands RA and RB, and the result RT are all 64-bit, unsigned integers.
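
As a minimal sketch of the pseudocode, the following Python model treats registers as plain integers and truncates the sum to 64 bits. The function name `shadd` and the `MASK64` constant are illustrative only, and the Rc bit (CR0 update) is not modelled.

```python
MASK64 = (1 << 64) - 1  # keeps results within a 64-bit register


def shadd(ra, rb, sm):
    """Model of shadd: RT = ((RB << (sm + 1)) + RA) modulo 2**64."""
    if sm not in range(4):
        raise ValueError("sm is a 2-bit field, so it must be 0..3")
    shift = sm + 1                      # shift is between 1 and 4
    return ((rb << shift) + ra) & MASK64


# Index 5 into an array of 8-byte elements based at 0x1000.
address = shadd(0x1000, 5, 2)           # 0x1000 + 5 * 8
```

Here `sm=2` selects a shift of 3, so the index is scaled by 8 before the base is added; any carry out of bit 0 is discarded, matching `sum[0:63]`.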

**NEED EXAMPLES (not sure how to embed sm)!!!**

Examples:

```
# adds r1 to (r2*8), taking the 4th operand as sm: sm=2 gives shift=3
shadd r4, r1, r2, 2
```

# Shift-and-Add Signed Word

`shaddw RT, RA, RB, sm`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-----|------|-------|-------|-------|-------|----|----------|
| PO  | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

Pseudocode:

```
shift <- sm + 1                        # Shift is between 1-4
n <- EXTS64((RB)[32:63])               # Sign-extend lower 32 bits of RB
sum[0:63] <- (n << shift) + (RA)       # Shift n, add RA
RT <- sum                              # Result stored in RT
```

When `sm` is zero, the lower word contents of register RB are sign-extended
to 64 bits, multiplied by 2, added to the contents of register RA, and the
result stored in RT.

`sm` is a 2-bit field that selects a shift of 1 to 4 bits, allowing
multiplication of the sign-extended lower word of RB by 2, 4, 8 or 16.

Operand RA and the result RT are 64-bit, signed integers. Only the lower
32 bits of RB are used, as a signed integer.

*Programmer's Note:
The advantage of this instruction is in computing address offsets. RA is the
base 64-bit address. RB is the signed offset into the data structure, limited
to 32 bits.*
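
The sign extension is what makes a negative 32-bit index step backwards from the base address. A minimal Python sketch of the pseudocode follows; the names `exts32` and `shaddw` are illustrative, registers are modelled as plain integers, and the Rc bit is not modelled.

```python
MASK64 = (1 << 64) - 1  # keeps results within a 64-bit register


def exts32(value):
    """Sign-extend the lower 32 bits of value to a Python int."""
    low = value & 0xFFFF_FFFF
    return low - (1 << 32) if low & 0x8000_0000 else low


def shaddw(ra, rb, sm):
    """Model of shaddw: RT = ((EXTS64(RB[32:63]) << (sm + 1)) + RA) mod 2**64."""
    if sm not in range(4):
        raise ValueError("sm is a 2-bit field, so it must be 0..3")
    shift = sm + 1
    n = exts32(rb)                      # upper 32 bits of RB are ignored
    return ((n << shift) + ra) & MASK64


# A C `int` index of -1 is 0xFFFF_FFFF in the lower word of the register.
address = shaddw(0x1000, 0xFFFF_FFFF, 2)    # 0x1000 - 8
```

With an element size of 8 bytes (`sm=2`), the index -1 yields the address one element below the base, which is the behaviour expected when `int` is used for array indexing.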

Examples:

```
# adds r1 to (sign-extended lower word of r2, times 8), taking the 4th operand as sm
shaddw r4, r1, r2, 2
```

[[!tag opf_rfc]]

# Shift-and-Add Unsigned Word

`shadduw RT, RA, RB, sm`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-----|------|-------|-------|-------|-------|----|----------|
| PO  | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

Pseudocode:

```
shift <- sm + 1                        # Shift is between 1-4
n <- (RB)[32:63]                       # Zero-extend lower 32 bits of RB
sum[0:63] <- (n << shift) + (RA)       # Shift n, add RA
RT <- sum                              # Result stored in RT
```

When `sm` is zero, the lower word contents of register RB are zero-extended
to 64 bits, multiplied by 2, added to the contents of register RA, and the
result stored in RT.

`sm` is a 2-bit field that selects a shift of 1 to 4 bits, allowing
multiplication of the zero-extended lower word of RB by 2, 4, 8 or 16.

Operand RA and the result RT are 64-bit, unsigned integers. Only the lower
32 bits of RB are used, as an unsigned integer.

*Programmer's Note:
The advantage of this instruction is in computing address offsets. RA is the
base 64-bit address. RB is the unsigned offset into the data structure,
limited to 32 bits.*
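
For comparison with the signed form, a minimal Python sketch of the zero-extending behaviour is given here. The name `shadduw` is used as a plain function for illustration, registers are modelled as integers, and the Rc bit is not modelled.

```python
MASK64 = (1 << 64) - 1  # keeps results within a 64-bit register


def shadduw(ra, rb, sm):
    """Model of shadduw: RT = ((RB[32:63] << (sm + 1)) + RA) mod 2**64."""
    if sm not in range(4):
        raise ValueError("sm is a 2-bit field, so it must be 0..3")
    shift = sm + 1
    n = rb & 0xFFFF_FFFF                # zero-extend: upper 32 bits dropped
    return ((n << shift) + ra) & MASK64


# The same bit pattern that means -1 as a C `int` is a large positive
# offset here, because the lower word is zero-extended.
address = shadduw(0x1000, 0xFFFF_FFFF, 2)
```

The lower word `0xFFFF_FFFF` is treated as 4294967295, so the result lies far above the base rather than one element below it. This is the reason the unsigned form suits `unsigned` indices while `int` indices need the signed form.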

Examples:

```
# adds r1 to (zero-extended lower word of r2, times 8), taking the 4th operand as sm
shadduw r4, r1, r2, 2
```

[[!tag opf_rfc]]

# Appendices

* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description                 |
|------|------|------|---------|----------|-----------------------------|
| Z23  | I    | #    | 3.0B    | shadd    | Shift-and-Add               |
| Z23  | I    | #    | 3.0B    | shaddw   | Shift-and-Add Signed Word   |
| Z23  | I    | #    | 3.0B    | shadduw  | Shift-and-Add Unsigned Word |