# RFC ls004 Shift-And-Add

**URLs**:

* <https://libre-soc.org/openpower/sv/biginteger/analysis/>
* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* bigint: <https://bugs.libre-soc.org/show_bug.cgi?id=960> TODO: maybe remove this link due to confusion and irrelevance?
* <https://git.openpower.foundation/isa/PowerISA/issues/91>
* shift-and-add: <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add shaddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Severity**: Major

**Status**: New

**Date**: 31 Oct 2022

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
Book I Fixed-Point Shift Instructions 3.3.14.2
Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
Instructions added
shadd - Shift and Add
shadduw - Shift and Add Unsigned Word
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of two new GPR-based instructions
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers,
and related tools.
```

**Keywords**:

```
GPR, Big-manip, Shift, Arithmetic
```

**Motivation**

Power ISA lacks LD/ST Indexed with shift, which is present in both ARM
and x86. Adding the full set of shifted LD/ST Indexed variants would take
thirty-seven new instructions, so a compromise is to add shift-and-add
instead. It replaces a pair of explicit instructions (a shift followed by
an add) in hot loops.

**Notes and Observations**:

1. `shadd` and `shadduw` operate on unsigned integers.
2. `shadduw` is intended for computing address offsets,
   as the second operand is constrained to the lower 32 bits
   and zero-extended.
3. Both are 2-in 1-out instructions.

TODO: signed 32-bit shift-and-add should be added; this needs to be addressed
before submitting the RFC: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.14.2

----------------

\newpage{}

# Table of LD/ST-Indexed-Shift

The following table lists the alternative instructions that could be
considered for addition instead. All of them use a 9-bit XO, which is not
hugely costly. The totals are:

* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 5 Floating-Point Store Indexed Shifted (with Update)

Total count: 37 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode. Given the savings
that these instructions represent in hot loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable. However, there is no point in placing these in EXT2xx: they
need to be in EXT0xx, because if they were added as 64-bit encodings the
intended reduction in binary size would not be achieved.

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction |
|-------|------|-------|-------|-------|-------|----------|
| PO | RT | RA | RB | sm | XO | lbzsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lbzusx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhzsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhzusx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhasx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhausx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwzsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwzusx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwasx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwausx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | ldsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | ldusx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhbrsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwbrsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | ldbrsx RT,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stbsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stbusx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | sthsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | sthusx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stwsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stwusx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stdsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stdusx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | sthbrsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stwbrsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stdbrsx RS,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfsxs FRT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfsuxs FRT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfdxs FRT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfduxs FRT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfiwaxs FRT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfiwzxs FRT,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfsxs FRS,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfsuxs FRS,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfdxs FRS,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfduxs FRS,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfiwxs FRS,RA,RB,sm |

----------------

\newpage{}

# Shift-and-Add

`shadd RT,RA,RB,sm`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | sm | XO | Rc | Z23-Form |

Pseudocode:

```
shift <- sm + 1                       # Shift is between 1-4
sum[0:63] <- ((RB) << shift) + (RA)   # Shift RB, add RA
RT <- sum                             # Result stored in RT
```

When `sm` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result is stored in RT.

`sm` is a 2-bit field, allowing RB to be multiplied by 2, 4, 8 or 16.

Operands RA and RB and the result RT are all 64-bit unsigned integers.
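
As a cross-check of the pseudocode, the following is a minimal Python model of `shadd`. It is a sketch under stated assumptions, not a reference implementation: register contents are represented as Python integers in the range 0 to 2^64-1, and any carry out of the top bit is discarded, as implied by the 64-bit `sum[0:63]`. The function name `shadd` and the constant `MASK64` are illustrative only.

```python
MASK64 = (1 << 64) - 1  # GPRs are 64 bits wide (assumed model)


def shadd(ra: int, rb: int, sm: int) -> int:
    """Model of shadd RT,RA,RB,sm: RT = (RB << (sm + 1)) + RA, modulo 2**64."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0-3)")
    shift = sm + 1  # shift amount is 1 to 4
    return ((rb << shift) + ra) & MASK64  # discard bits above the 64-bit result


# sm = 0..3 multiplies RB by 2, 4, 8 and 16 respectively
for sm in range(4):
    print(sm, shadd(0, 1, sm))

# Overflow wraps modulo 2**64: (2**63 << 1) + 1 leaves just 1
print(shadd(1, 1 << 63, 0))  # prints 1
```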

**NEED EXAMPLES (not sure how to embed sm)!!!**

Examples:

```
# r4 <- r1 + (r2 * 8): sm=2 gives a shift of 3
shadd r4, r1, r2, 2
```
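
The example above corresponds to indexing an array of 8-byte elements. The sketch below, again a plain Python model assuming 64-bit wraparound, checks that a single `shadd` gives the same result as the two-instruction sequence it is meant to replace in hot loops: `sldi` (shift left doubleword immediate) followed by `add`. The helper names `shadd` and `sldi_then_add`, the base address and the index are hypothetical, chosen only for illustration.

```python
MASK64 = (1 << 64) - 1


def shadd(ra: int, rb: int, sm: int) -> int:
    # shadd RT,RA,RB,sm: RT = (RB << (sm + 1)) + RA, truncated to 64 bits
    return ((rb << (sm + 1)) + ra) & MASK64


def sldi_then_add(ra: int, rb: int, n: int) -> int:
    # The instruction pair that shadd replaces:
    #   sldi rT, rB, n    (shift left, truncated to 64 bits)
    #   add  rT, rT, rA
    t = (rb << n) & MASK64
    return (t + ra) & MASK64


base = 0x10000  # address of element 0 (illustrative)
index = 5       # element number
addr = shadd(base, index, 2)  # sm=2 -> shift of 3 -> index * 8
assert addr == sldi_then_add(base, index, 3)
print(hex(addr))  # 0x10028
```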

# Shift-and-Add Unsigned Word

`shadduw RT,RA,RB,sm`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | sm | XO | Rc | Z23-Form |

Pseudocode:

```
shift <- sm + 1                      # Shift is between 1-4
n[0:63] <- [0]*32 || (RB)[32:63]     # Zero-extend lower 32 bits of RB
sum[0:63] <- (n << shift) + (RA)     # Shift n, add RA
RT <- sum                            # Result stored in RT
```

When `sm` is zero, the zero-extended lower word of register RB is multiplied
by 2, added to the contents of register RA, and the result is stored in RT.

`sm` is a 2-bit field, allowing the lower word of RB to be multiplied by
2, 4, 8 or 16.

RA and the result RT are 64-bit unsigned integers. Only the lower 32 bits
of RB are used, treated as an unsigned integer and zero-extended to 64 bits.
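
A similar Python sketch of `shadduw`, under the same assumptions (64-bit unsigned register values, carry out of the top bit discarded; `shadduw`, `MASK32` and `MASK64` are illustrative names). It shows that the upper 32 bits of RB are ignored, and that the lower word is zero-extended to 64 bits before shifting, so the scaled offset may exceed 32 bits.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def shadduw(ra: int, rb: int, sm: int) -> int:
    """Model of shadduw RT,RA,RB,sm: only the lower word of RB is used."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0-3)")
    n = rb & MASK32  # (RB)[32:63] in MSB-0 numbering, zero-extended to 64 bits
    return ((n << (sm + 1)) + ra) & MASK64


# Upper 32 bits of RB (here all ones) do not affect the result
print(hex(shadduw(0x1000, 0xFFFFFFFF_00000004, 1)))  # 0x1010
# The zero-extended word is shifted as a 64-bit value, not truncated to 32 bits
print(hex(shadduw(0, 0xFFFFFFFF, 0)))  # 0x1fffffffe
```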

*Programmer's Note:
This instruction is intended for computing address offsets. RA holds the
64-bit base address, and RB holds the offset into a data structure,
limited to 32 bits.*

Examples:

```
# r4 <- r1 + (zero-extended lower word of r2 * 4): sm=1 gives a shift of 2
shadduw r4, r1, r2, 1
```


[[!tag opf_rfc]]

# Appendices

* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| Z23 | I | # | 3.0B | shadd | Shift-and-Add |
| Z23 | I | # | 3.0B | shadduw | Shift-and-Add Unsigned Word |
