1 # RFC ls003 Big Integer
5 * <https://libre-soc.org/openpower/sv/biginteger/analysis/>
6 * <https://libre-soc.org/openpower/sv/rfc/ls003/>
7 * <https://bugs.libre-soc.org/show_bug.cgi?id=960>
8 * <https://git.openpower.foundation/isa/PowerISA/issues/91>
20 **Books and Section affected**: **UPDATE**
23 Book I 64-bit Fixed-Point Arithmetic Instructions 3.3.9.1
24 Appendix E Power ISA sorted by opcode
25 Appendix F Power ISA sorted by version
26 Appendix G Power ISA sorted by Compliancy Subset
27 Appendix H Power ISA sorted by mnemonic
35 maddedu - Multiply-Add Extended Double Unsigned
36 divmod2du - Divide/Modulo Quad-Double Unsigned
37 dsld - Double Shift Left Doubleword
38 dsrd - Double Shift Right Doubleword
41 **Submitter**: Luke Leighton (Libre-SOC)
43 **Requester**: Libre-SOC
45 **Impact on processor**:
48 Addition of two new GPR-based instructions
51 **Impact on software**:
54 Requires support for new instructions in assembler, debuggers,
61 GPR, Big-integer, Double-word
66 * Similar to `maddhdu` and `maddld`, but allow for a big-integer rolling
67 accumulation affect: `RC` effectively becomes a 64-bit carry in chains
68 of highly-efficient loop-unrolled arbitrary-length big-integer operations.
69 * Similar to `divdeu`, and has similar advantages to `maddedu`,
70 Modulo result is available with the quotient in a single instruction
71 allowing highly-efficient arbitrary-length big-integer division.
73 **Notes and Observations**:
75 1. It is not practical to add Rc=1 variants as VA-Form is used and
76 there is a **pair** of results produced.
77 2. An overflow variant (XER.OV set) of `divmod2du` would be valuable
78 but VA-Form EXT004 is under severe pressure.
79 3. Both `maddhdu` and `divmod2du` instructions have been present in Intel x86
80 for several decades. Likewise, variants of `dsld` and `dsrd`.
81 4. None of these instruction is present in VSX: these are 128/64 whereas
83 5. `maddedu` and `divmod2du` are full inverses of each other, including
84 when used for arbitrary-length big-integer arithmetic.
85 6. These are all 3-in 2-out instructions. If Power ISA did not already
86 have LD/ST-with-update instructions and instructions with `RAp`
87 and `RTp` then these instructions would not be proposed.
91 Add the following entries to:
93 * the Appendices of Book I
94 * Instructions of Book I added to Section 3.3.9.1
95 * VA2-Form of Book I Section 1.6.21.1 and 1.6.2
102 # Multiply-Add Extended Double Unsigned
104 `maddedu RT, RA, RB, RC`
106 | 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form |
107 |-------|------|-------|-------|-------|-------|---------|
108 | EXT04 | RT | RA | RB | RC | XO | VA-Form |
113 prod[0:127] <- (RA) * (RB) # Multiply RA and RB, result 128-bit
114 sum[0:127] <- EXTZ(RC) + prod # Zero extend RC, add product
115 RT <- sum[64:127] # Store low half in RT
116 RS <- sum[0:63] # RS implicit register, equal to RC
119 Special registers altered:
123 The 64-bit operands are (RA), (RB), and (RC).
124 RC is zero-extended (not shifted, not sign-extended).
125 The 128-bit product of the operands (RA) and (RB) is added to (RC).
126 The low-order 64 bits of the 128-bit sum are
127 placed into register RT.
128 The high-order 64 bits of the 128-bit sum are
129 placed into register RS.
130 RS is implictly defined as the same register as RC.
132 All three operands and the result are interpreted as
135 The differences here to `maddhdu` are that `maddhdu` stores the upper
136 half in RT, where `maddedu` stores the upper half in RS.
138 The value stored in RT is exactly equivalent to `maddld` despite `maddld`
139 performing sign-extension on RC, because RT is the full mathematical result
140 modulo 2^64 and sign/zero extension from 64 to 128 bits produces identical
141 results modulo 2^64. This is why there is no maddldu instruction.
144 To achieve a big-integer rolling-accumulation effect:
145 assuming the scalar to multiply is in r0, and r3 is
146 used (effectively) as a 64-bit carry,
147 the vector to multiply by starts at r4 and the result vector
148 in r20, instructions may be issued `maddedu r20,r4,r0,r3
149 maddedu r21,r5,r0,r3` etc. where the first `maddedu` will have
150 stored the upper half of the 128-bit multiply into r3, such
151 that it may be picked up by the second `maddedu`. Repeat inline
152 to construct a larger bigint scalar-vector multiply,
153 as Scalar GPR register file space permits. If register
154 spill is required then r3, as the effective 64-bit carry,
155 continues the chain.*
160 # (r0 * r1) + r2, store lower in r4, upper in r2
161 maddedu r4, r0, r1, r2
163 # Chaining together for larger bigint (see Programmer's Note above)
164 # r3 starts with zero (no carry-in)
174 # Divide/Modulo Quad-Double Unsigned
176 **Should name be Divide/Module Double Extended Unsigned?**
177 **Check the pseudo-code comments**
179 `divmod2du RT,RA,RB,RC`
181 | 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form |
182 |-------|------|-------|-------|-------|-------|---------|
183 | EXT04 | RT | RA | RB | RC | XO | VA-Form |
188 if ((RA) <u (RB)) & ((RB) != [0]*64) then # Check RA<RB, for divide-by-0
189 dividend[0:127] <- (RA) || (RC) # Combine RA/RC as 128-bit
190 divisor[0:127] <- [0]*64 || (RB) # Extend RB to 128-bit
191 result <- dividend / divisor # Unsigned Division
192 modulo <- dividend % divisor # Unsigned Modulo
193 RT <- result[64:127] # Store result in RT
194 RS <- modulo[64:127] # Modulo in RC, implicit
195 else # In case of error
196 RT <- [1]*64 # RT all 1's
197 RS <- [0]*64 # RS all 0's
200 Special registers altered:
204 The 128-bit dividend is (RA) || (RC). The 64-bit divisor is
205 (RB). If the quotient can be represented in 64 bits, it is
206 placed into register RT. The modulo is placed into register RS.
207 RS is implictly defined as the same register as RC, similarly to maddedu.
209 The quotient can be represented in 64-bits when both these conditions
212 * (RA) < (RB) (unsigned comparison)
213 * (RB) is NOT 0 (not divide-by-0)
215 If these conditions are not met, RT is set to all 1's, RS to all 0's.
217 All operands, quotient, and modulo are interpreted as unsigned integers.
219 Divide/Modulo Quad-Double Unsigned is a VA-Form instruction
220 that is near-identical to `divdeu` except that:
222 * the lower 64 bits of the dividend, instead of being zero, contain a
224 * it performs a fused divide and modulo in a single instruction, storing
225 the modulo in an implicit RS (similar to `maddedu`)
226 * There is no `UNDEFINED` behaviour.
228 RB, the divisor, remains 64 bit. The instruction is therefore a 128/64
229 division, producing a (pair) of 64 bit result(s), in the same way that
230 Intel [divq](https://www.felixcloutier.com/x86/div) works.
232 are detected in exactly the same fashion as `divdeu`, except that rather
233 than have `UNDEFINED` behaviour, RT is set to all ones and RS set to all
236 *Programmer's note: there are no Rc variants of any of these VA-Form
237 instructions. `cmpi` will need to be used to detect overflow conditions:
238 the saving in instruction count is that both RT and RS will have already
239 been set to useful values (all 1s and all zeros respectively)
240 needed as part of implementing Knuth's Algorithm D*
242 For Scalar usage, just as for `maddedu`, `RS=RC`
246 # ((r0 << 64) + r2) / r1, store in r4
247 # ((r0 << 64) + r2) % r1, store in r2
248 divmod2du r4, r0, r1, r2
253 Add the following to Book I, 1.6.21.1, VA2-Form
256 |0 |6 |11 |16 |21 |24|26 |31 |
257 | PO | RT | RA | RB | RC | XO | Rc |
260 Add `RA` to `XO` Field in Book I, 1.6.2
264 Field used to specify a GPR to be used as a
265 source or as a target.
266 Formats: A, BM2, D, DQ, DQE, DS, M, MD, MDS, TX, VA, VA2,
267 VX, X, XO, XS, SVL, XB, TLI, Z23
270 *TODO* other fields `RT, RB, RC, XO, and Rc`, see
271 <https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text;hb=HEAD>
275 Appendix E Power ISA sorted by opcode
276 Appendix F Power ISA sorted by version
277 Appendix G Power ISA sorted by Compliancy Subset
278 Appendix H Power ISA sorted by mnemonic
280 | Form | Book | Page | Version | mnemonic | Description |
281 |------|------|------|---------|----------|-------------|
282 | VA | I | # | 3.0B | maddedu | Multiply-Add Extend Double Unsigned |
283 | VA | I | # | 3.0B | divmod2du | Divide/Modulo Quad-Double Unsigned |