# RFC ls004 Shift-And-Add

**URLs**:

* <https://libre-soc.org/openpower/sv/biginteger/analysis/>
* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* bigint: <https://bugs.libre-soc.org/show_bug.cgi?id=960> TODO: maybe remove this link due to confusion and irrelevance?
* <https://git.openpower.foundation/isa/PowerISA/issues/91>
* shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add shaddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Severity**: Major

**Status**: New

**Date**: 31 Oct 2022

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
Book I Fixed-Point Shift Instructions 3.3.14.2
Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
Instructions added
shadd - Shift and Add
shadduw - Shift and Add Unsigned Word
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of two new GPR-based instructions
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers,
and related tools.
```

**Keywords**:

```
GPR, Big-manip, Shift, Arithmetic
```

**Motivation**

Power ISA lacks Load/Store Indexed with shift, an addressing mode present in
both ARM and x86. Adding it to the existing LD/ST instructions would take
thirty-eight new instructions; a compromise is to add shift-and-add, which
replaces a pair of explicit instructions (a shift followed by an add) in hot loops.

**Notes and Observations**:

1. `shadd` and `shadduw` operate on unsigned integers.
2. `shadduw` is intended for computing address offsets,
   as the second operand is constrained to the lower 32 bits
   and zero-extended.
3. Both are 2-in 1-out instructions.
4. Shift-add operations are present in both x86 and aarch64,
   since they are useful both for general arithmetic and for
   computing addresses, even when not immediately followed
   by a load/store.
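
The general-arithmetic use in note 4 can be illustrated with a minimal Python sketch. It assumes the semantics given by the pseudocode later in this RFC (`RT = (RB << (sm+1)) + RA`, truncated to 64 bits). The helper names `shadd` and `shift_then_add` are hypothetical models for illustration only, not part of any toolchain.

```python
MASK64 = (1 << 64) - 1  # GPRs are 64 bits wide


def shadd(ra: int, rb: int, sm: int) -> int:
    """Hypothetical model of the proposed shadd: (RB << (sm + 1)) + RA, modulo 2**64."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field")
    return ((rb << (sm + 1)) + ra) & MASK64


def shift_then_add(ra: int, rb: int, shift: int) -> int:
    """The explicit two-step sequence (shift left, then add) that shadd would replace."""
    shifted = (rb << shift) & MASK64  # first instruction: shift left
    return (shifted + ra) & MASK64    # second instruction: add


# Multiplication by small constants: x*3, x*5, x*9, x*17
x = 7
times3 = shadd(x, x, 0)   # x + x*2
times5 = shadd(x, x, 1)   # x + x*4
times9 = shadd(x, x, 2)   # x + x*8
times17 = shadd(x, x, 3)  # x + x*16

# Address of element 5 in an array of 8-byte items starting at 0x1000
addr = shadd(0x1000, 5, 2)
```

Under these assumptions a single `shadd` gives the same result as the shift-then-add pair for every shift amount from 1 to 4.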

TODO: a signed 32-bit shift-and-add should be added; this needs to be addressed
before submitting the RFC: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Changes**

Add the following entries to:

* the Appendices of Book I
* Section 3.3.14.2 of Book I (the new instructions)

----------------

\newpage{}

# Table of LD/ST-Indexed-Shift

The following table lists the alternative instructions that could
be considered for addition. They all use a 9-bit XO, which is not hugely
costly. The totals are:

* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 6 Floating-Point Store Indexed Shifted (with Update)

Total count: 38 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode. Given the savings
that these instructions represent in hot loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable. However, there is no point in placing these in EXT2xx: they
need to be in EXT0xx, because if added as 64-bit encodings the benefit
of reduced binary size is not achieved.

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction |
|-------|------|-------|-------|-------|-------|----------|
| PO    |  RT  |  RA   |  RB   | sm    | XO    | lbzsx RT,RA,RB,sm |
| PO    |  RT  |  RA   |  RB   | sm    | XO    | lbzusx RT,RA,RB,sm |
| PO    |  RT  |  RA   |  RB   | sm    | XO    | lhzsx RT,RA,RB,sm |
| PO    |  RT  |  RA   |  RB   | sm    | XO    | lhzusx RT,RA,RB,sm |
| PO    |  RT  |  RA   |  RB   | sm    | XO    | lhasx RT,RA,RB,sm |
| PO    |  RT  |  RA   |  RB   | sm    | XO    | lhausx RT,RA,RB,sm |
| PO    |  RT  |  RA   |  RB   | sm    | XO    | lwzsx RT,RA,RB,sm |
| PO    |  RT  |  RA   |  RB   | sm    | XO    | lwzusx RT,RA,RB,sm |
| PO    |  RT  |  RA   |  RB   | sm    | XO    | lwasx RT,RA,RB,sm |
| PO    |  RT  |  RA   |  RB   | sm    | XO    | lwausx RT,RA,RB,sm |
| PO    |  RT  |  RA   |  RB   | sm    | XO    | ldsx RT,RA,RB,sm |
| PO    |  RT  |  RA   |  RB   | sm    | XO    | ldusx RT,RA,RB,sm |
| PO    |  RT  |  RA   |  RB   | sm    | XO    | lhbrsx RT,RA,RB,sm |
| PO    |  RT  |  RA   |  RB   | sm    | XO    | lwbrsx RT,RA,RB,sm |
| PO    |  RT  |  RA   |  RB   | sm    | XO    | ldbrsx RT,RA,RB,sm |
| PO    |  RS  |  RA   |  RB   | sm    | XO    | stbsx RS,RA,RB,sm |
| PO    |  RS  |  RA   |  RB   | sm    | XO    | stbusx RS,RA,RB,sm |
| PO    |  RS  |  RA   |  RB   | sm    | XO    | sthsx RS,RA,RB,sm |
| PO    |  RS  |  RA   |  RB   | sm    | XO    | sthusx RS,RA,RB,sm |
| PO    |  RS  |  RA   |  RB   | sm    | XO    | stwsx RS,RA,RB,sm |
| PO    |  RS  |  RA   |  RB   | sm    | XO    | stwusx RS,RA,RB,sm |
| PO    |  RS  |  RA   |  RB   | sm    | XO    | stdsx RS,RA,RB,sm |
| PO    |  RS  |  RA   |  RB   | sm    | XO    | stdusx RS,RA,RB,sm |
| PO    |  RS  |  RA   |  RB   | sm    | XO    | sthbrsx RS,RA,RB,sm |
| PO    |  RS  |  RA   |  RB   | sm    | XO    | stwbrsx RS,RA,RB,sm |
| PO    |  RS  |  RA   |  RB   | sm    | XO    | stdbrsx RS,RA,RB,sm |
| PO    |  FRT |  RA   |  RB   | sm    | XO    | lfsxs FRT,RA,RB,sm |
| PO    |  FRT |  RA   |  RB   | sm    | XO    | lfsuxs FRT,RA,RB,sm |
| PO    |  FRT |  RA   |  RB   | sm    | XO    | lfdxs FRT,RA,RB,sm |
| PO    |  FRT |  RA   |  RB   | sm    | XO    | lfduxs FRT,RA,RB,sm |
| PO    |  FRT |  RA   |  RB   | sm    | XO    | lfiwaxs FRT,RA,RB,sm |
| PO    |  FRT |  RA   |  RB   | sm    | XO    | lfiwzxs FRT,RA,RB,sm |
| PO    |  FRS |  RA   |  RB   | sm    | XO    | stfsxs FRS,RA,RB,sm |
| PO    |  FRS |  RA   |  RB   | sm    | XO    | stfsuxs FRS,RA,RB,sm |
| PO    |  FRS |  RA   |  RB   | sm    | XO    | stfdxs FRS,RA,RB,sm |
| PO    |  FRS |  RA   |  RB   | sm    | XO    | stfduxs FRS,RA,RB,sm |
| PO    |  FRS |  RA   |  RB   | sm    | XO    | stfiwxs FRS,RA,RB,sm |

----------------

\newpage{}

# Shift-and-Add

`shadd RT, RA, RB, sm`

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
| PO    |  RT  |  RA   |  RB   | sm    |  XO   | Rc | Z23-Form |
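
One way to read the field layout above, and in particular where `sm` sits in the instruction word, is the following minimal Python sketch. It packs the Z23-Form fields in MSB0 order (bit 0 is the most significant bit). The function name `encode_z23` is hypothetical, and the PO and XO values are placeholders only, since this RFC does not allocate opcodes.

```python
def encode_z23(po: int, rt: int, ra: int, rb: int, sm: int, xo: int, rc: int = 0) -> int:
    """Pack Z23-Form fields into a 32-bit word (MSB0 numbering, bit 0 first)."""
    fields = [
        ("PO", po, 6),  # bits 0-5
        ("RT", rt, 5),  # bits 6-10
        ("RA", ra, 5),  # bits 11-15
        ("RB", rb, 5),  # bits 16-20
        ("sm", sm, 2),  # bits 21-22
        ("XO", xo, 8),  # bits 23-30
        ("Rc", rc, 1),  # bit 31
    ]
    word = 0
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


# Placeholders: NOT allocated opcodes, chosen only so the sketch runs.
PLACEHOLDER_PO = 0
PLACEHOLDER_XO = 0

word = encode_z23(PLACEHOLDER_PO, rt=4, ra=1, rb=2, sm=2, xo=PLACEHOLDER_XO)

# MSB0 bits 21-22 correspond to LSB0 bits 10-9, so sm is recovered with a shift by 9.
sm_field = (word >> 9) & 0b11
```

In this reading, `sm` is carried as a raw 2-bit field and the actual shift amount is `sm + 1`, as the pseudocode states.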

Pseudocode:

```
shift <- sm + 1                                # Shift is between 1-4
sum[0:63] <- ((RB) << shift) + (RA)            # Shift RB, add RA
RT <- sum                                      # Result stored in RT
```

When `sm` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result stored in RT.

`sm` is a 2-bit bitfield, and allows multiplication of RB by 2, 4, 8, 16.

Operands RA and RB, and the result RT are all 64-bit, unsigned integers.
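
The pseudocode above can be modelled with a minimal Python sketch. The function name `shadd` is a hypothetical model, and it assumes that `sum[0:63]` means the result is truncated to 64 bits, with any carry out of the top bit discarded.

```python
MASK64 = (1 << 64) - 1  # 64-bit register width


def shadd(ra: int, rb: int, sm: int) -> int:
    """Model of the shadd pseudocode: RT <- ((RB) << (sm + 1)) + (RA), truncated to 64 bits."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0..3)")
    shift = sm + 1                       # shift is between 1 and 4
    return ((rb << shift) + ra) & MASK64


# sm selects the multiplier applied to RB: 2, 4, 8 or 16
multipliers = [shadd(0, 1, sm) for sm in range(4)]

# RA = 100, RB = 3, sm = 2: 100 + 3*8
example = shadd(100, 3, 2)

# Bits shifted or carried past bit 63 are discarded (assumed truncation)
wrapped = shadd(1, MASK64, 0)
```

The `wrapped` case shows the assumed unsigned wrap-around: the bits shifted out of the top of RB are lost.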

**NEED EXAMPLES (not sure how to embed sm)!!!**

Examples:

```
# adds r1 to (r2*8): sm=2 gives a shift of 3
shadd r4, r1, r2, 2
```

# Shift-and-Add Unsigned Word

`shadduw RT, RA, RB, sm`

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
| PO    |  RT  |  RA   |  RB   | sm    |  XO   | Rc | Z23-Form |

Pseudocode:

```
shift <- sm + 1                                # Shift is between 1-4
n <- (RB)[32:63]                               # Only use lower 32 bits of RB, zero-extended
sum[0:63] <- (n << shift) + (RA)               # Shift n, add RA
RT <- sum                                      # Result stored in RT
```

When `sm` is zero, the lower word contents of register RB are multiplied by 2,
added to the contents of register RA, and the result stored in RT.

`sm` is a 2-bit bitfield, and allows multiplication of the lower word of RB
by 2, 4, 8, 16.

Operand RA and the result RT are 64-bit, unsigned integers. Only the lower
32 bits of RB are used, treated as an unsigned integer and zero-extended.

*Programmer's Note:
This instruction is useful for computing address offsets. RA is the 64-bit base
address. RB is the offset into the data structure, limited to 32 bits.*
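
The address-offset use described in the note can be modelled with a minimal Python sketch. The function name `shadduw` is a hypothetical model following the pseudocode above, and the base address and index values are made up for illustration.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def shadduw(ra: int, rb: int, sm: int) -> int:
    """Model of the shadduw pseudocode: only the low word of RB is used, zero-extended."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0..3)")
    n = rb & MASK32                      # (RB)[32:63] in MSB0 numbering
    return ((n << (sm + 1)) + ra) & MASK64


# Hypothetical 64-bit base address and a register whose upper word holds stale data
base = 0x0000_7FFF_0000_0000
index_reg = 0xFFFF_FFFF_0000_0003        # only the low word (3) is used

# Address of element 3 in an array of 8-byte items (sm=2 gives a shift of 3)
addr = shadduw(base, index_reg, 2)

# The low word is zero-extended, not sign-extended: 0xFFFFFFFF is a large positive offset
big_offset = shadduw(0, 0xFFFF_FFFF_FFFF_FFFF, 0)
```

Because the upper 32 bits of RB are ignored, a 32-bit unsigned index can be used directly without a separate zero-extension step.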

Examples:

```
# adds r1 to (zero-extended lower word of r2)*4: sm=1 gives a shift of 2
shadduw r4, r1, r2, 1
```


[[!tag opf_rfc]]

# Appendices

* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| Z23  | I    | #    | 3.0B    | shadd    | Shift-and-Add |
| Z23  | I    | #    | 3.0B    | shadduw  | Shift-and-Add Unsigned Word |
