# RFC ls004 Shift-And-Add

* <https://libre-soc.org/openpower/sv/biginteger/analysis/>
* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* bigint: <https://bugs.libre-soc.org/show_bug.cgi?id=960> TODO: maybe remove this link due to confusion and irrelevance?
* <https://git.openpower.foundation/isa/PowerISA/issues/91>
* shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add shaddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>
**Books and Section affected**:

    Book I Fixed-Point Shift Instructions 3.3.14.2
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

    shaddw - Shift and Add Signed Word
    shadduw - Shift and Add Unsigned Word
**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

Addition of three new GPR-based instructions

**Impact on software**:

Requires support for new instructions in assembler, debuggers,

GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing

Power ISA is missing LD/ST Indexed with shift, which is present in both ARM
and x86. Adding it to LD/ST would require thirty-eight new instructions; a
compromise is to add shift-and-add instead, which replaces a pair of explicit
instructions in hot loops.
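
As a rough illustration of the pair of instructions being replaced, the following Python sketch models computing an array element address two ways: as a separate shift followed by an add, and as one fused shift-and-add. The function names and the base address are hypothetical, chosen only for illustration, and 64-bit register wraparound is modelled with a mask.

```python
MASK64 = (1 << 64) - 1  # models a 64-bit GPR


def shift_then_add(base, index, shift):
    """Two explicit steps: shift the index, then add the base."""
    scaled = (index << shift) & MASK64   # first instruction: shift left
    return (base + scaled) & MASK64      # second instruction: add


def fused_shift_add(base, index, shift):
    """One step: what a single shift-and-add instruction would compute."""
    return (base + (index << shift)) & MASK64


# Address of element 5 in an array of 8-byte elements (shift of 3).
base = 0x1000_0000  # hypothetical base address
assert shift_then_add(base, 5, 3) == fused_shift_add(base, 5, 3) == base + 40
```

Both forms give the same result modulo 2**64; the fused form simply removes one instruction from the loop body.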

**Notes and Observations**:

1. `shadd` and `shadduw` operate on unsigned integers.
2. `shadduw` is intended for performing address offsets,
   as the second operand is constrained to the lower 32 bits
3. All three are 2-in 1-out instructions.
4. Shift-add operations are present in both x86 and aarch64,
   since they are useful for both general arithmetic and for
   computing addresses even when not immediately followed
5. `shaddw` is often more useful than `shadduw` because C/C++ programmers like
   to use `int` for array indexing. For additional details see
   <https://bugs.libre-soc.org/show_bug.cgi?id=996>.

TODO: signed 32-bit shift-and-add should be added; this needs to be addressed
before submitting the RFC: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.14.2

# Table of LD/ST-Indexed-Shift

The following table lists the alternative instructions that could
be considered for addition. They all use a 9-bit XO, which is not hugely
costly. The totals are:

* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 6 Floating-Point Store Indexed Shifted (with Update)

Total count: 38 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode. Given the savings
that these instructions represent in hot loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable. However, there is no point in placing these in EXT2xx: they
need to be in EXT0xx, because if they are added as 64-bit encodings the
benefit of reduced binary size is not achieved.
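
The listed totals can be cross-checked with a few lines of Python. The reading of "an XO cost of 3 bits" used here is an assumption rather than something the RFC states explicitly: 38 encodings round up to a 64-entry block (6 bits), which takes the same share of a 9-bit XO space as a single 3-bit XO would.

```python
# Per-category counts as given in the list of totals.
counts = {
    "Load Indexed Shifted (with Update)": 12,
    "Load Indexed Shifted Byte-reverse": 3,
    "Store Indexed Shifted (with Update)": 8,
    "Store Indexed Shifted Byte-reverse": 3,
    "Floating-Point Load Indexed Shifted (with Update)": 6,
    "Floating-Point Store Indexed Shifted (with Update)": 6,
}

total = sum(counts.values())
xo_bits = 9                               # width of the XO field (bits 23-31)
bits_needed = (total - 1).bit_length()    # bits needed to enumerate all encodings
equivalent_xo = xo_bits - bits_needed     # assumed meaning of "XO cost"

print(total, bits_needed, equivalent_xo)  # prints: 38 6 3
```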

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction          |
|-------|------|-------|-------|-------|-------|----------------------|
| PO    | RT   | RA    | RB    | sm    | XO    | lbzsx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lbzusx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lhzsx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lhzusx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lhasx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lhausx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwzsx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lwzusx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwasx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lwausx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | ldsx RT,RA,RB,sm     |
| PO    | RT   | RA    | RB    | sm    | XO    | ldusx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lhbrsx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwbrsx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | ldbrsx RT,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stbsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | stbusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | sthsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | sthusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stwsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | stwusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stdsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | stdusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | sthbrsx RS,RA,RB,sm  |
| PO    | RS   | RA    | RB    | sm    | XO    | stwbrsx RS,RA,RB,sm  |
| PO    | RS   | RA    | RB    | sm    | XO    | stdbrsx RS,RA,RB,sm  |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfsxs FRT,RA,RB,sm   |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfsuxs FRT,RA,RB,sm  |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfdxs FRT,RA,RB,sm   |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfduxs FRT,RA,RB,sm  |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfiwaxs FRT,RA,RB,sm |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfiwzxs FRT,RA,RB,sm |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfsxs FRS,RA,RB,sm  |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfsuxs FRS,RA,RB,sm |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfdxs FRS,RA,RB,sm  |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfduxs FRS,RA,RB,sm |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfiwxs FRS,RA,RB,sm |

# Shift-and-Add

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
| PO    | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

    shift <- sm + 1                      # Shift is between 1-4
    sum[0:63] <- ((RB) << shift) + (RA)  # Shift RB, add RA
    RT <- sum                            # Result stored in RT

When `sm` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result is stored in RT.

`sm` is a 2-bit field, and allows multiplication of RB by 2, 4, 8 or 16.

Operands RA and RB, and the result RT, are all 64-bit unsigned integers.

**NEED EXAMPLES (not sure how to embed sm)!!!**
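
Pending assembler-level examples, a minimal Python model of the pseudocode above may help show the effect of `sm`. It is a sketch, not a reference implementation: the function name `shadd` mirrors the mnemonic, and passing `sm` as a plain third argument is an assumption made for illustration, not a statement about the eventual assembler syntax.

```python
MASK64 = (1 << 64) - 1  # GPRs are 64 bits wide


def shadd(ra, rb, sm):
    """Model of shadd: RT <- ((RB) << (sm + 1)) + (RA), modulo 2**64."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field")
    shift = sm + 1                        # shift is between 1 and 4
    return ((rb << shift) + ra) & MASK64  # keep only bits 0:63 of the sum


# sm = 0..3 scales RB by 2, 4, 8, 16 before adding RA.
for sm, scale in enumerate((2, 4, 8, 16)):
    assert shadd(100, 7, sm) == 100 + 7 * scale

# The sum wraps at 64 bits: bits shifted out of RB are discarded.
assert shadd(0, 1 << 63, 0) == 0
```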

# Shift-and-Add Signed Word

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
| PO    | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

    shift <- sm + 1                    # Shift is between 1-4
    n <- EXTS64((RB)[32:63])           # Only use lower 32-bits of RB
    sum[0:63] <- (n << shift) + (RA)   # Shift n, add RA
    RT <- sum                          # Result stored in RT

When `sm` is zero, the lower word of register RB is sign-extended,
multiplied by 2, added to the contents of register RA, and the result
is stored in RT.

`sm` is a 2-bit field, and allows multiplication of the lower word of RB
by 2, 4, 8 or 16.

Operands RA and RB, and the result RT, are all 64-bit signed integers.

*The advantage of this instruction is in computing address offsets: RA is the
64-bit base address, and RB is the offset into the data structure, limited to
32 bits.*
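
The signed-word behaviour can be sketched in Python as follows. This is an illustrative model of the pseudocode above, not a reference implementation: the helper `exts64_word` and the base address are hypothetical names and values introduced only for this example.

```python
MASK64 = (1 << 64) - 1  # GPRs are 64 bits wide


def exts64_word(value):
    """Sign-extend the low 32 bits of value, as EXTS64((RB)[32:63]) does."""
    word = value & 0xFFFF_FFFF
    return word - (1 << 32) if word & 0x8000_0000 else word


def shaddw(ra, rb, sm):
    """Model of shaddw: sign-extend the low word of RB, shift, add RA."""
    shift = sm + 1                       # shift is between 1 and 4
    n = exts64_word(rb)
    return ((n << shift) + ra) & MASK64  # keep only bits 0:63 of the sum


base = 0x2000  # hypothetical base address
# An index of -1 held as a 32-bit int (0xFFFFFFFF), scaled by 8 (sm = 2),
# addresses the element just before the base.
assert shaddw(base, 0xFFFF_FFFF, 2) == base - 8
# The upper 32 bits of RB are ignored.
assert shaddw(base, 0xDEAD_BEEF_0000_0003, 2) == base + 24
```

This matches the C/C++ case where an `int` index may be negative: the offset is subtracted from the base rather than treated as a large positive value.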

# Shift-and-Add Unsigned Word

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
| PO    | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

    shift <- sm + 1                    # Shift is between 1-4
    n <- (RB)[32:63]                   # Only use lower 32-bits of RB
    sum[0:63] <- (n << shift) + (RA)   # Shift n, add RA
    RT <- sum                          # Result stored in RT

When `sm` is zero, the lower word of register RB is zero-extended,
multiplied by 2, added to the contents of register RA, and the result
is stored in RT.

`sm` is a 2-bit field, and allows multiplication of the lower word of RB
by 2, 4, 8 or 16.

Operands RA and RB, and the result RT, are all 64-bit unsigned integers.

*The advantage of this instruction is in computing address offsets: RA is the
64-bit base address, and RB is the offset into the data structure, limited to
32 bits.*
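
A corresponding Python sketch of the unsigned-word form follows, again as an illustrative model of the pseudocode above rather than a reference implementation. The base address is a hypothetical value chosen for the example.

```python
MASK64 = (1 << 64) - 1  # GPRs are 64 bits wide


def shadduw(ra, rb, sm):
    """Model of shadduw: zero-extend the low word of RB, shift, add RA."""
    shift = sm + 1                       # shift is between 1 and 4
    n = rb & 0xFFFF_FFFF                 # (RB)[32:63]; upper bits treated as zero
    return ((n << shift) + ra) & MASK64  # keep only bits 0:63 of the sum


base = 0x2000  # hypothetical base address
assert shadduw(base, 3, 2) == base + 24
# 0xFFFFFFFF is treated as 4294967295, not as -1.
assert shadduw(base, 0xFFFF_FFFF, 2) == base + 0xFFFF_FFFF * 8
# The upper 32 bits of RB are ignored.
assert shadduw(base, 0xDEAD_BEEF_0000_0003, 2) == base + 24
```

The only difference from the signed-word form is how bit 32 of RB is interpreted: here an offset with that bit set moves forward from the base, never backward.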

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description                 |
|------|------|------|---------|----------|-----------------------------|
| Z23  | I    | #    | 3.0B    | shadd    | Shift-and-Add               |
| Z23  | I    | #    | 3.0B    | shaddw   | Shift-and-Add Signed Word   |
| Z23  | I    | #    | 3.0B    | shadduw  | Shift-and-Add Unsigned Word |