# RFC ls004 Shift-And-Add

* <https://libre-soc.org/openpower/sv/biginteger/analysis/>
* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* bigint: <https://bugs.libre-soc.org/show_bug.cgi?id=960> TODO: maybe remove this link due to confusion and irrelevance?
* <https://git.openpower.foundation/isa/PowerISA/issues/91>
* shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add shaddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Books and Section affected**:

    Book I Fixed-Point Shift Instructions 3.3.14.2
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

    shadd - Shift and Add
    shaddw - Shift and Add Signed Word
    shadduw - Shift and Add Unsigned Word

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

    Addition of three new GPR-based instructions

**Impact on software**:

    Requires support for new instructions in assembler, debuggers,

GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing

Power ISA is missing LD/ST Indexed with shift, which is present in both ARM
and x86. Adding shifted variants of the LD/ST Indexed instructions would take
thirty-eight new instructions; a compromise is to add shift-and-add, which
replaces a pair of explicit instructions in hot loops.
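
The saving can be illustrated with a small model of indexed address computation. This is only a sketch: it assumes the replaced pair is a left shift followed by an add, and the helper names `address_two_step` and `address_fused`, as well as the example base address, are hypothetical and not taken from the Power ISA.

```python
MASK64 = (1 << 64) - 1  # model 64-bit GPR wrap-around


def address_two_step(base, index, log2_size):
    """Two explicit steps (assumed: shift the index, then add the base)."""
    scaled = (index << log2_size) & MASK64   # first instruction: shift left
    return (base + scaled) & MASK64          # second instruction: add


def address_fused(base, index, log2_size):
    """One step, modelling what a shift-and-add instruction would compute."""
    return (base + (index << log2_size)) & MASK64


# Hypothetical example: element 5 of an array of 8-byte values at 0x1000.
base, index = 0x1000, 5
assert address_two_step(base, index, 3) == address_fused(base, index, 3) == 0x1028
```

Both helpers return the same address; the difference in hardware terms would be one instruction in the loop body instead of two.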

**Notes and Observations**:

1. `shadd` and `shadduw` operate on unsigned integers.
2. `shadduw` is intended for performing address offsets,
   as the second operand is constrained to the lower 32 bits
3. All three are 2-in 1-out instructions.
4. Shift-add operations are present in both x86 and aarch64,
   since they are useful for both general arithmetic and for
   computing addresses even when not immediately followed
5. `shaddw` is often more useful than `shadduw` because C/C++ programmers like
   to use `int` for array indexing. For additional details see
   <https://bugs.libre-soc.org/show_bug.cgi?id=996>.

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.14.2

# Table of LD/ST-Indexed-Shift

The following table lists the alternative instructions that could
be considered for addition. They all use a 9-bit XO, which is not hugely
costly. The totals are:

* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 6 Floating-Point Store Indexed Shifted (with Update)

Total count: 38 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode. Given the savings
that these instructions represent in hot loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable. However, there is no point in placing these in EXT2xx: they
need to be in EXT0xx, because if they are added as 64-bit encodings the
benefit of reduced binary size is not achieved.

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction          |
|-------|------|-------|-------|-------|-------|----------------------|
| PO    | RT   | RA    | RB    | sm    | XO    | lbzsx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lbzusx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lhzsx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lhzusx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lhasx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lhausx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwzsx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lwzusx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwasx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lwausx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | ldsx RT,RA,RB,sm     |
| PO    | RT   | RA    | RB    | sm    | XO    | ldusx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lhbrsx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwbrsx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | ldbrsx RT,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stbsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | stbusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | sthsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | sthusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stwsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | stwusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stdsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | stdusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | sthbrsx RS,RA,RB,sm  |
| PO    | RS   | RA    | RB    | sm    | XO    | stwbrsx RS,RA,RB,sm  |
| PO    | RS   | RA    | RB    | sm    | XO    | stdbrsx RS,RA,RB,sm  |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfsxs FRT,RA,RB,sm   |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfsuxs FRT,RA,RB,sm  |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfdxs FRT,RA,RB,sm   |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfduxs FRT,RA,RB,sm  |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfiwaxs FRT,RA,RB,sm |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfiwzxs FRT,RA,RB,sm |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfsxs FRS,RA,RB,sm  |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfsuxs FRS,RA,RB,sm |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfdxs FRS,RA,RB,sm  |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfduxs FRS,RA,RB,sm |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfiwxs FRS,RA,RB,sm |
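
The field layout in the table above can be sketched as a packing function. This is an illustration only: the function name `encode_ldst_shifted` is hypothetical, and the PO and XO values used in the example are placeholders, since the table does not assign actual opcode values. Bit 0 is taken to be the most significant bit of the 32-bit word, as in the column headings.

```python
def encode_ldst_shifted(po, rt, ra, rb, sm, xo):
    """Pack PO, RT (or RS/FRT/FRS), RA, RB, sm and XO into a 32-bit word.

    Field widths follow the table columns: 0-5, 6-10, 11-15, 16-20,
    21-22 and 23-31, with bit 0 as the most significant bit.
    """
    fields = [(po, 6), (rt, 5), (ra, 5), (rb, 5), (sm, 2), (xo, 9)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        word = (word << width) | value   # append each field below the previous one
    return word


# Placeholder values only: PO=31 and XO=0 are NOT assigned by this RFC.
word = encode_ldst_shifted(po=31, rt=3, ra=4, rb=5, sm=2, xo=0)
assert word == 0x7C642C00
```

The six widths sum to 32 bits, which is why the two-bit `sm` field leaves only 9 bits for XO in this layout.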

# Shift-and-Add

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
| PO    | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

    shift <- sm + 1                        # Shift is between 1-4
    sum[0:63] <- ((RB) << shift) + (RA)    # Shift RB, add RA
    RT <- sum                              # Result stored in RT

When `sm` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result stored in RT.

`sm` is a 2-bit bit-field, and allows multiplication of RB by 2, 4, 8, or 16.

Operands RA and RB, and the result RT, are all 64-bit unsigned integers.

**TODO: examples needed (it is not yet clear how to express `sm` in assembler syntax).**
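
The pseudocode can be modelled in a few lines to check expected results by hand. This is a minimal sketch, not a reference implementation: it assumes that `sum[0:63]` means the sum is truncated to 64 bits (wrapping modulo 2^64), and the Python function name `shadd` with keyword arguments is purely illustrative rather than proposed assembler syntax.

```python
MASK64 = (1 << 64) - 1


def shadd(ra, rb, sm):
    """Model of RT <- ((RB) << (sm + 1)) + (RA), assumed to wrap modulo 2**64."""
    if sm not in range(4):
        raise ValueError("sm is a 2-bit field: 0..3")
    shift = sm + 1                       # shift is between 1 and 4
    return ((rb << shift) + ra) & MASK64


assert shadd(ra=100, rb=7, sm=0) == 114        # 7*2  + 100
assert shadd(ra=100, rb=7, sm=3) == 212        # 7*16 + 100
assert shadd(ra=0x1000, rb=5, sm=2) == 0x1028  # base + 5*8
assert shadd(ra=0, rb=1 << 63, sm=0) == 0      # high bit shifted out
```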

# Shift-and-Add Signed Word

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
| PO    | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

    shift <- sm + 1                      # Shift is between 1-4
    n <- EXTS64((RB)[32:63])             # Only use lower 32-bits of RB
    sum[0:63] <- (n << shift) + (RA)     # Shift n, add RA
    RT <- sum                            # Result stored in RT

When `sm` is zero, the lower word contents of register RB are multiplied by 2,
added to the contents of register RA, and the result stored in RT.

`sm` is a 2-bit bit-field, and allows multiplication of the lower word of RB
by 2, 4, 8, or 16.

Operands RA and RB, and the result RT, are all 64-bit signed integers; only
the lower 32 bits of RB are used, sign-extended to 64 bits.

*The advantage of this instruction is in computing address offsets. RA is the
64-bit base address. RB is the offset into the data structure, limited to 32 bits.*
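
A small model of the signed-word pseudocode shows why sign extension matters for `int` indices. This is a sketch under stated assumptions: the sum is taken to wrap modulo 2^64, and the names `exts64_word` and `shaddw` are illustrative Python helpers, not assembler syntax.

```python
MASK64 = (1 << 64) - 1


def exts64_word(value):
    """Sign-extend the low 32 bits of value to a 64-bit two's-complement pattern."""
    low = value & 0xFFFF_FFFF
    if low & 0x8000_0000:        # bit 32 (MSB0 numbering) is the word's sign bit
        low -= 1 << 32
    return low & MASK64


def shaddw(ra, rb, sm):
    """Model of RT <- (EXTS64((RB)[32:63]) << (sm + 1)) + (RA), wrapping assumed."""
    shift = sm + 1
    n = exts64_word(rb)
    return ((n << shift) + ra) & MASK64


# Index +3 into 4-byte elements at a hypothetical base address 0x2000.
assert shaddw(ra=0x2000, rb=3, sm=1) == 0x200C
# Index -1 held as a 32-bit int; the upper half of RB is ignored.
assert shaddw(ra=0x2000, rb=0xDEAD_BEEF_FFFF_FFFF, sm=1) == 0x1FFC
```

With a negative 32-bit index the result lands below the base address, which is the behaviour a C expression such as `p[i]` with `int i = -1` relies on.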

# Shift-and-Add Unsigned Word

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
| PO    | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

    shift <- sm + 1                      # Shift is between 1-4
    n <- (RB)[32:63]                     # Only use lower 32-bits of RB
    sum[0:63] <- (n << shift) + (RA)     # Shift n, add RA
    RT <- sum                            # Result stored in RT

When `sm` is zero, the lower word contents of register RB are multiplied by 2,
added to the contents of register RA, and the result stored in RT.

`sm` is a 2-bit bit-field, and allows multiplication of the lower word of RB
by 2, 4, 8, or 16.

Operands RA and RB, and the result RT, are all 64-bit unsigned integers; only
the lower 32 bits of RB are used, zero-extended to 64 bits.

*The advantage of this instruction is in computing address offsets. RA is the
64-bit base address. RB is the offset into the data structure, limited to 32 bits.*
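
The unsigned-word pseudocode can be modelled the same way. Again this is only a sketch: wrapping modulo 2^64 is an assumption, and the Python name `shadduw` is illustrative rather than assembler syntax.

```python
MASK64 = (1 << 64) - 1


def shadduw(ra, rb, sm):
    """Model of RT <- ((RB)[32:63] << (sm + 1)) + (RA), wrapping assumed."""
    shift = sm + 1
    n = rb & 0xFFFF_FFFF          # lower word of RB, zero-extended
    return ((n << shift) + ra) & MASK64


# The upper half of RB is ignored.
assert shadduw(ra=0, rb=0xDEAD_BEEF_0000_0003, sm=0) == 6
# The bit pattern 0xFFFFFFFF is treated as a large positive offset.
assert shadduw(ra=0x2000, rb=0xFFFF_FFFF, sm=1) == 0x4_0000_1FFC
```

The second assertion uses the same lower-word bit pattern that a signed interpretation would read as -1; here it moves the address forward by almost 16 GiB instead of back by 4 bytes.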

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| Z23  | I    | #    | 3.0B    | shadd    | Shift-and-Add |
| Z23  | I    | #    | 3.0B    | shaddw   | Shift-and-Add Signed Word |
| Z23  | I    | #    | 3.0B    | shadduw  | Shift-and-Add Unsigned Word |