# RFC ls003 Big Integer

* <https://libre-soc.org/openpower/sv/biginteger/analysis/>
* <https://libre-soc.org/openpower/sv/rfc/ls003/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=960>
* <https://git.openpower.foundation/isa/PowerISA/issues/91>

**Books and Section affected**: **UPDATE**

* Book I 64-bit Fixed-Point Arithmetic Instructions 3.3.9.1
* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

**Instructions added**:

* maddedu - Multiply-Add Extended Double Unsigned
* divmod2du - Divide/Modulo Quad-Double Unsigned

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

Addition of two new GPR-based instructions

**Impact on software**:

Requires support for new instructions in assembler, debuggers,

**Keywords**:

GPR, Big-integer, Double-word

`maddedu` is similar to `maddhdu` and `maddld`, but allows for a big-integer
rolling-accumulation effect: `RC` effectively becomes a 64-bit carry in chains
of highly-efficient loop-unrolled arbitrary-length big-integer operations.

`divmod2du` is similar to `divdeu`, and has similar advantages to `maddedu`:
the modulo result is available with the quotient in a single instruction,
allowing highly-efficient arbitrary-length big-integer division.

**Notes and Observations**:

1. It is not practical to add Rc=1 variants as VA-Form is used and
   there is a **pair** of results produced.
2. An overflow variant (XER.OV set) of `divmod2du` would be valuable
   but VA-Form EXT004 is under severe pressure.
3. Both instructions have been present in Intel x86 for several decades.
4. Neither instruction is present in VSX: these are 128/64 whereas
5. `maddedu` and `divmod2du` are full inverses of each other, including
   when used for arbitrary-length big-integer arithmetic.
6. These are both 3-in 2-out instructions. If Power ISA did not already
   have LD/ST-with-update instructions and instructions with `RAp`
   and `RTp` then these instructions would not be proposed.

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.9.1

# Multiply-Add Extended Double Unsigned

`maddedu RT, RA, RB, RC`

| 0-5   | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
|-------|------|-------|-------|-------|-------|---------|
| EXT04 | RT   | RA    | RB    | RC    | XO    | VA-Form |

    prod[0:127] <- (RA) * (RB)      # Multiply RA and RB, result 128-bit
    sum[0:127] <- EXTZ(RC) + prod   # Zero extend RC, add product
    RT <- sum[64:127]               # Store low half in RT
    RS <- sum[0:63]                 # Store high half in RS, implicit register equal to RC

Special registers altered:

    None

The 64-bit operands are (RA), (RB), and (RC).
RC is zero-extended (not shifted, not sign-extended).
The 128-bit product of the operands (RA) and (RB) is added to (RC).
The low-order 64 bits of the 128-bit sum are
placed into register RT.
The high-order 64 bits of the 128-bit sum are
placed into register RS.
RS is implicitly defined as the same register as RC.

All three operands and the result are interpreted as unsigned integers.

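The operation described above can be modelled in a few lines of Python, whose integers are arbitrary-precision. This is only an illustrative sketch: the function name `maddedu` and the tuple return order `(RT, RS)` are choices made for this example, not part of the proposal.

```python
MASK64 = (1 << 64) - 1  # all-ones 64-bit value

def maddedu(ra, rb, rc):
    """Illustrative model: returns (RT, RS) for unsigned 64-bit inputs."""
    # 128-bit sum of the 128-bit product and the zero-extended RC.
    # The maximum is (2^64-1)^2 + (2^64-1) = 2^128 - 2^64, so it always fits.
    total = (ra & MASK64) * (rb & MASK64) + (rc & MASK64)
    rt = total & MASK64   # low-order 64 bits
    rs = total >> 64      # high-order 64 bits (written to the register named by RC)
    return rt, rs

rt, rs = maddedu(MASK64, MASK64, MASK64)
print(hex(rt), hex(rs))   # 0x0 0xffffffffffffffff
```

The worst-case inputs shown produce a sum of exactly 2^128 - 2^64, which illustrates why no carry-out beyond the RT/RS pair is ever needed.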
The difference from `maddhdu` is that `maddhdu` stores only the upper
half, in RT, whereas `maddedu` stores the lower half in RT and the upper
half in RS.

The value stored in RT is exactly equivalent to `maddld` despite `maddld`
performing sign-extension on RC, because RT is the full mathematical result
modulo 2^64 and sign/zero extension from 64 to 128 bits produces identical
results modulo 2^64. This is why there is no `maddldu` instruction.

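The claim that sign-extension and zero-extension agree modulo 2^64 can be checked numerically. The sketch below compares the low 64 bits of an all-unsigned multiply-add with an all-signed one over a handful of edge-case bit patterns; the helper names are hypothetical and exist only for this illustration.

```python
MASK64 = (1 << 64) - 1

def to_signed64(x):
    """Reinterpret a 64-bit pattern as a two's-complement signed integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def low64_unsigned(ra, rb, rc):
    # maddedu-style: operands treated as unsigned (zero-extended)
    return (ra * rb + rc) & MASK64

def low64_signed(ra, rb, rc):
    # maddld-style: operands treated as signed (sign-extended)
    return (to_signed64(ra) * to_signed64(rb) + to_signed64(rc)) & MASK64

samples = [0, 1, 2, 1 << 63, MASK64, 0x123456789ABCDEF0]
agree = all(low64_unsigned(a, b, c) == low64_signed(a, b, c)
            for a in samples for b in samples for c in samples)
print(agree)   # True
```

The two interpretations of each operand differ by a multiple of 2^64, so the difference vanishes once the result is reduced modulo 2^64.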
*Programmer's Note:
As a Scalar Power ISA operation, RS=RC.
To achieve a big-integer rolling-accumulation effect:
assuming the scalar to multiply is in r0, the 64-bit carry is held in r3,
the vector to multiply by starts at r4 and the result vector
in r20, instructions may be issued `maddedu r20,r4,r0,r3`
`maddedu r21,r5,r0,r3` etc. where the first `maddedu` will have
stored the upper half of the 128-bit sum into r3, such
that it may be picked up by the second `maddedu`. Repeat inline
to construct a larger bigint scalar-vector multiply,
as Scalar GPR register file space permits.*

    # (r0 * r1) + r2, store lower in r4, upper in r2
    maddedu r4, r0, r1, r2

    # Chaining together for larger bigint (see Programmer's Note above)
    # r3 carries the upper half from one instruction to the next
    maddedu r20,r4,r0,r3
    maddedu r21,r5,r0,r3

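The rolling-accumulation pattern can be sketched in Python by treating RC/RS as a running 64-bit carry, as in the operation description. The limb order (least-significant first) and the helper names `mul_vector_by_scalar` and `from_limbs` are assumptions made for this example only.

```python
MASK64 = (1 << 64) - 1

def maddedu(ra, rb, rc):
    """Illustrative model: returns (RT, RS) = (low, high) of ra*rb + rc."""
    total = ra * rb + rc
    return total & MASK64, total >> 64

def mul_vector_by_scalar(limbs, scalar):
    """Multiply a little-endian list of 64-bit limbs by a 64-bit scalar."""
    carry = 0                      # plays the role of RC/RS
    out = []
    for limb in limbs:             # least-significant limb first
        low, carry = maddedu(limb, scalar, carry)
        out.append(low)
    out.append(carry)              # final carry is the top limb
    return out

def from_limbs(limbs):
    """Recombine little-endian 64-bit limbs into one Python integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

limbs = [MASK64, MASK64, 1]
product = mul_vector_by_scalar(limbs, MASK64)
print(from_limbs(product) == from_limbs(limbs) * MASK64)   # True
```

Each loop iteration corresponds to one `maddedu` in the unrolled sequence: the high half produced by one step is consumed as the addend of the next.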
# Divide/Modulo Quad-Double Unsigned

**Should name be Divide/Modulo Double Extended Unsigned?**
**Check the pseudo-code comments**

`divmod2du RT,RA,RB,RC`

| 0-5   | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
|-------|------|-------|-------|-------|-------|---------|
| EXT04 | RT   | RA    | RB    | RC    | XO    | VA-Form |

    if ((RA) <u (RB)) & ((RB) != [0]*XLEN) then     # Check RA<RB and RB nonzero (no overflow, no divide-by-0)
        dividend[0:(XLEN*2)-1] <- (RA) || (RC)      # Concatenate RA (high half) and RC (low half)
        divisor[0:(XLEN*2)-1] <- [0]*XLEN || (RB)   # Zero-extend RB to 2*XLEN bits
        result <- dividend / divisor                # Division
        modulo <- dividend % divisor                # Modulo
        RT <- result[XLEN:(XLEN*2)-1]               # Store quotient in RT
        RS <- modulo[XLEN:(XLEN*2)-1]               # Store modulo in RS, implicit register equal to RC
    else                                            # Overflow or divide-by-0
        RT <- [1]*XLEN                              # RT all 1's
        RS <- [0]*XLEN                              # RS all 0's

Special registers altered:

    None

The 128-bit dividend is (RA) || (RC). The 64-bit divisor is
(RB). If the quotient can be represented in 64 bits, it is
placed into register RT. The modulo is placed into register RS.
RS is implicitly defined as the same register as RC, similarly to `maddedu`.

A valid quotient and modulo are only produced when both conditions are true:

* (RA) < (RB) (unsigned comparison)
* (RB) is NOT 0 (not divide-by-0)

If these conditions are not met, RT is set to all 1's, RS to all 0's.

The dividend, divisor, quotient, and modulo are all interpreted as unsigned integers.

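A small Python model of this behaviour, including the out-of-range case, might look as follows. It is a sketch for XLEN=64 only, and the function name and `(RT, RS)` return order are chosen for illustration.

```python
MASK64 = (1 << 64) - 1

def divmod2du(ra, rb, rc):
    """Illustrative model: returns (RT, RS) = (quotient, modulo)."""
    # For unsigned values ra < rb already implies rb != 0; both tests are
    # kept here only to mirror the pseudo-code.
    if ra < rb and rb != 0:
        dividend = (ra << 64) | rc          # (RA) || (RC)
        # ra < rb guarantees dividend < rb * 2^64, so the quotient fits in 64 bits
        return dividend // rb, dividend % rb
    return MASK64, 0                        # overflow or divide-by-zero

print(divmod2du(1, 2, 0) == (1 << 63, 0))   # True: 2^64 / 2
print(divmod2du(5, 5, 0) == (MASK64, 0))    # True: RA >= RB overflows
```

The condition (RA) < (RB) is exactly what is needed for the quotient to be representable in 64 bits, which is why it doubles as the overflow check.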
Divide/Modulo Quad-Double Unsigned is another VA-Form instruction
that is near-identical to `divdeu` except that:

* the lower 64 bits of the dividend, instead of being zero, contain
  the value of register RC
* it performs a fused divide and modulo in a single instruction, storing
  the modulo in an implicit RS (similar to `maddedu`)

RB, the divisor, remains 64 bit. The instruction is therefore a 128/64
division, producing a pair of 64-bit results, in the same way that
Intel [divq](https://www.felixcloutier.com/x86/div) works.

Overflow conditions (including divide-by-zero)
are detected in exactly the same fashion as `divdeu`, except that rather
than have `UNDEFINED` behaviour, RT is set to all ones and RS set to all
zeros.

*Programmer's note: there are no Rc variants of any of these VA-Form
instructions. `cmpi` will need to be used to detect overflow conditions:
the saving in instruction count is that both RT and RS will have already
been set to useful values (all 1s and all zeros respectively)
needed as part of implementing Knuth's Algorithm D.*

For Scalar usage, just as for `maddedu`, `RS=RC`.

    # ((r0 << 64) + r2) / r1, store in r4
    # ((r0 << 64) + r2) % r1, store in r2
    divmod2du r4, r0, r1, r2

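Chained use for arbitrary-length division by a single 64-bit word can be sketched as schoolbook short division. Working from the most-significant limb down, the modulo of each step becomes the high half (RA) of the next dividend. The limb layout and helper names are assumptions for this example, and register allocation in real assembly is not modelled.

```python
MASK64 = (1 << 64) - 1

def divmod2du(ra, rb, rc):
    """Illustrative model: returns (RT, RS) = (quotient, modulo)."""
    if ra < rb and rb != 0:
        dividend = (ra << 64) | rc
        return dividend // rb, dividend % rb
    return MASK64, 0

def div_vector_by_scalar(limbs, divisor):
    """Divide a little-endian list of 64-bit limbs by a nonzero 64-bit divisor."""
    assert divisor != 0
    remainder = 0                              # always < divisor, so RA < RB holds
    quotient = [0] * len(limbs)
    for i in reversed(range(len(limbs))):      # most-significant limb first
        quotient[i], remainder = divmod2du(remainder, divisor, limbs[i])
    return quotient, remainder

def from_limbs(limbs):
    """Recombine little-endian 64-bit limbs into one Python integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

limbs = [0x0123456789ABCDEF, MASK64, 7]
q, r = div_vector_by_scalar(limbs, 1000003)
print(from_limbs(q) * 1000003 + r == from_limbs(limbs))   # True
```

Because the running remainder is always smaller than the divisor, the overflow path is never taken inside the loop.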
* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic  | Description                           |
|------|------|------|---------|-----------|---------------------------------------|
| VA   | I    | #    | 3.0B    | maddedu   | Multiply-Add Extended Double Unsigned |
| VA   | I    | #    | 3.0B    | divmod2du | Divide/Modulo Quad-Double Unsigned    |