# RFC ls003 Big Integer

**URLs**:

* <https://libre-soc.org/openpower/sv/biginteger/analysis/>
* <https://libre-soc.org/openpower/sv/rfc/ls003/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=960>
* <https://git.openpower.foundation/isa/PowerISA/issues/91>

**Severity**: Major

**Status**: New

**Date**: 20 Oct 2022

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**: **UPDATE**

```
    Book I 64-bit Fixed-Point Arithmetic Instructions 3.3.9.1
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

Instructions added

```
    maddedu - Multiply-Add Extended Double Unsigned
    divmod2du - Divide/Modulo Quad-Double Unsigned
    dsld - Double Shift Left Doubleword
    dsrd - Double Shift Right Doubleword
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of two new GPR-based instructions
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    GPR, Big-integer, Double-word
```

**Motivation**

* `maddedu` is similar to `maddhdu` and `maddld`, but allows for a big-integer
  rolling accumulation effect: `RC` effectively becomes a 64-bit carry in
  chains of highly-efficient loop-unrolled arbitrary-length big-integer
  operations.
* `divmod2du` is similar to `divdeu`, and has similar advantages to `maddedu`:
  the modulo result is available with the quotient in a single instruction,
  allowing highly-efficient arbitrary-length big-integer division.

**Notes and Observations**:

1. It is not practical to add Rc=1 variants as VA-Form is used and
   there is a **pair** of results produced.
2. An overflow variant (XER.OV set) of `divmod2du` would be valuable
   but VA-Form EXT004 is under severe pressure.
3. Equivalents of both `maddedu` and `divmod2du` have been present in
   Intel x86 for several decades, as have variants of `dsld` and `dsrd`.
4. None of these instructions is present in VSX: these are 128/64 whereas
   VSX is 128/128.
5. `maddedu` and `divmod2du` are full inverses of each other, including
   when used for arbitrary-length big-integer arithmetic.
6. These are all 3-in 2-out instructions. If Power ISA did not already
   have LD/ST-with-update instructions and instructions with `RAp`
   and `RTp` then these instructions would not be proposed.

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.9.1
* VA2-Form of Book I Section 1.6.21.1 and 1.6.2

----------------

\newpage{}

# Multiply-Add Extended Double Unsigned

`maddedu RT, RA, RB, RC`

|  0-5  | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
|-------|------|-------|-------|-------|-------|---------|
| EXT04 | RT   |  RA   |  RB   |   RC  |  XO   | VA-Form |

Pseudocode:

```
prod[0:127] <- (RA) * (RB)    # Multiply RA and RB, result 128-bit
sum[0:127] <- EXTZ(RC) + prod # Zero extend RC, add product
RT <- sum[64:127]             # Store low half in RT
RS <- sum[0:63]               # RS implicit register, equal to RC
```

Special registers altered:

    None

The 64-bit operands are (RA), (RB), and (RC).
RC is zero-extended (not shifted, not sign-extended).
The 128-bit product of the operands (RA) and (RB) is added to (RC).
The low-order 64 bits of the 128-bit sum are
placed into register RT.
The high-order 64 bits of the 128-bit sum are
placed into register RS.
RS is implicitly defined as the same register as RC.

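As an illustration only (not part of the proposed specification text), the pseudocode can be modelled in a few lines of Python. The function name `maddedu` and the `(RT, RS)` tuple return are conventions chosen for this sketch. Python's arbitrary-precision integers stand in for the 128-bit intermediate sum.

```python
MASK64 = (1 << 64) - 1  # all-ones 64-bit value

def maddedu(ra, rb, rc):
    """Illustrative model: return (RT, RS) for unsigned 64-bit inputs."""
    # 128-bit sum of the product and the zero-extended addend.
    # The largest possible value is 2**128 - 2**64, so it always fits.
    total = (ra & MASK64) * (rb & MASK64) + (rc & MASK64)
    rt = total & MASK64   # low-order 64 bits, sum[64:127] in MSB0 numbering
    rs = total >> 64      # high-order 64 bits, sum[0:63] in MSB0 numbering
    return rt, rs

# Worst case: every operand is all ones.
rt, rs = maddedu(MASK64, MASK64, MASK64)
print(hex(rt), hex(rs))
```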
All three operands and the result are interpreted as
unsigned integers.

The difference from `maddhdu` is that `maddhdu` stores the upper
half in RT, whereas `maddedu` stores the upper half in RS.

The value stored in RT is exactly equivalent to `maddld` despite `maddld`
performing sign-extension on RC, because RT is the full mathematical result
modulo 2^64 and sign/zero extension from 64 to 128 bits produces identical
results modulo 2^64. This is why there is no `maddldu` instruction.

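The modulo-2^64 argument can be checked numerically. The sketch below is an illustration under stated assumptions: `sext64`, `low_zero_extended` and `low_sign_extended` are hypothetical helper names, and the signed variant models the low half of a signed multiply-add in the manner of `maddld`.

```python
MASK64 = (1 << 64) - 1

def sext64(x):
    """Interpret a 64-bit pattern as a signed two's-complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def low_zero_extended(ra, rb, rc):
    """Low 64 bits of an unsigned multiply-add (the RT result of maddedu)."""
    return (ra * rb + rc) & MASK64

def low_sign_extended(ra, rb, rc):
    """Low 64 bits of a signed multiply-add (operands sign-extended)."""
    return (sext64(ra) * sext64(rb) + sext64(rc)) & MASK64

# A few bit patterns, including ones with the top bit set.
samples = [0, 1, 0x7FFFFFFFFFFFFFFF, 0x8000000000000000, MASK64,
           0x123456789ABCDEF0, 0xFEDCBA9876543210]
for a in samples:
    for b in samples:
        for c in samples:
            assert low_zero_extended(a, b, c) == low_sign_extended(a, b, c)
```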
*Programmer's Note:
To achieve a big-integer rolling-accumulation effect,
assume that the scalar to multiply is in r0, that r3 is
used (effectively) as a 64-bit carry, that
the vector to multiply by starts at r4, and that the result vector
starts at r20. The instructions `maddedu r20,r4,r0,r3` and
`maddedu r21,r5,r0,r3` may then be issued in sequence: the first `maddedu`
stores the upper half of its 128-bit result into r3, where
it is picked up by the second `maddedu`. Repeat inline
to construct a larger bigint scalar-vector multiply,
as Scalar GPR register file space permits. If register
spill is required then r3, as the effective 64-bit carry,
continues the chain.*

Examples:

```
# (r0 * r1) + r2, store lower in r4, upper in r2
maddedu r4, r0, r1, r2

# Chaining together for larger bigint (see Programmer's Note above)
# r3 starts with zero (no carry-in)
maddedu r20,r4,r0,r3
maddedu r21,r5,r0,r3
maddedu r22,r6,r0,r3
```

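The chained sequence in the example can be modelled in Python to show how the carry rolls from one limb to the next. This is a minimal sketch, not the RFC's reference code: `bigint_mul_scalar` and `limbs_to_int` are hypothetical helper names, limbs are assumed to be stored least-significant first, and the `carry` variable plays the role of r3.

```python
MASK64 = (1 << 64) - 1

def maddedu(ra, rb, rc):
    """Illustrative model: return (RT, RS) = (low, high) of ra*rb + rc."""
    total = ra * rb + rc
    return total & MASK64, total >> 64

def bigint_mul_scalar(limbs, scalar):
    """Multiply a little-endian list of 64-bit limbs by a 64-bit scalar."""
    carry = 0                       # r3 starts at zero (no carry-in)
    out = []
    for limb in limbs:              # one maddedu per limb, as in the chain
        low, carry = maddedu(limb, scalar, carry)
        out.append(low)
    out.append(carry)               # final carry becomes the top limb
    return out

def limbs_to_int(limbs):
    """Reassemble a little-endian limb list into a Python integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

vector = [MASK64, MASK64, 1]
result = bigint_mul_scalar(vector, MASK64)
assert limbs_to_int(result) == limbs_to_int(vector) * MASK64
```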
----------

\newpage{}

# Divide/Modulo Quad-Double Unsigned

**Should name be Divide/Modulo Double Extended Unsigned?**
**Check the pseudo-code comments**

`divmod2du RT,RA,RB,RC`

|  0-5  | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
|-------|------|-------|-------|-------|-------|---------|
| EXT04 | RT   |  RA   |  RB   |   RC  |  XO   | VA-Form |

Pseudocode:

```
if ((RA) <u (RB)) & ((RB) != [0]*64) then  # Check for overflow, divide-by-0
    dividend[0:127] <- (RA) || (RC)        # Combine RA/RC as 128-bit
    divisor[0:127] <- [0]*64 || (RB)       # Extend RB to 128-bit
    result <- dividend / divisor           # Unsigned Division
    modulo <- dividend % divisor           # Unsigned Modulo
    RT <- result[64:127]                   # Store result in RT
    RS <- modulo[64:127]                   # Modulo in RS, implicitly RC
else                                       # In case of error
    RT <- [1]*64                           # RT all 1's
    RS <- [0]*64                           # RS all 0's
```

Special registers altered:

    None

The 128-bit dividend is (RA) || (RC). The 64-bit divisor is
(RB). If the quotient can be represented in 64 bits, it is
placed into register RT. The modulo is placed into register RS.
RS is implicitly defined as the same register as RC, similarly to `maddedu`.

The quotient can be represented in 64 bits when both these conditions
are true:

* (RA) < (RB) (unsigned comparison)
* (RB) is NOT 0 (not divide-by-0)

If these conditions are not met, RT is set to all 1's, RS to all 0's.

All operands, quotient, and modulo are interpreted as unsigned integers.

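The behaviour described above, including the defined results on overflow and divide-by-zero, can be sketched in Python. This is an illustrative model only: the function name `divmod2du` and the `(RT, RS)` tuple return are conventions chosen here, and inputs are assumed to already be in the unsigned 64-bit range.

```python
MASK64 = (1 << 64) - 1

def divmod2du(ra, rb, rc):
    """Illustrative model: return (RT, RS) for unsigned 64-bit inputs."""
    if ra < rb and rb != 0:
        dividend = (ra << 64) | rc       # (RA) || (RC), 128 bits
        # ra < rb guarantees dividend < rb * 2**64, so the quotient fits.
        return dividend // rb, dividend % rb
    # Overflow or divide-by-zero: RT all ones, RS all zeros.
    return MASK64, 0

print(divmod2du(0, 7, 100))   # ordinary 64-bit case: 100 / 7
print(divmod2du(5, 5, 0))     # RA == RB: quotient would need 65 bits
print(divmod2du(0, 0, 5))     # divide-by-zero
```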
Divide/Modulo Quad-Double Unsigned is a VA-Form instruction
that is near-identical to `divdeu` except that:

* the lower 64 bits of the dividend, instead of being zero, are taken
  from a register, RC.
* it performs a fused divide and modulo in a single instruction, storing
  the modulo in an implicit RS (similar to `maddedu`).
* there is no `UNDEFINED` behaviour.

RB, the divisor, remains 64-bit.  The instruction is therefore a 128/64
division, producing a pair of 64-bit results, in the same way that
Intel [divq](https://www.felixcloutier.com/x86/div) works.
Overflow conditions are detected in exactly the same fashion as `divdeu`,
except that rather than have `UNDEFINED` behaviour, RT is set to all ones
and RS set to all zeros on overflow.

*Programmer's note: there are no Rc=1 variants of any of these VA-Form
instructions. `cmpi` will need to be used to detect overflow conditions:
the saving in instruction count is that both RT and RS will have already
been set to useful values (all 1s and all zeros respectively)
needed as part of implementing Knuth's Algorithm D.*

For Scalar usage, just as for `maddedu`, `RS=RC`.

Examples:

```
# ((r0 << 64) + r2) / r1, store in r4
# ((r0 << 64) + r2) % r1, store in r2
divmod2du r4, r0, r1, r2
```

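Note 5 under Notes and Observations states that `maddedu` and `divmod2du` are inverses of each other. A minimal Python sketch of that round trip follows. It assumes the addend (RC) is smaller than the multiplier (RB), so that it is a valid remainder. The function names and `(RT, RS)` return tuples are illustrative conventions, not part of the proposal.

```python
MASK64 = (1 << 64) - 1

def maddedu(ra, rb, rc):
    """Illustrative model: return (RT, RS) = (low, high) of ra*rb + rc."""
    total = ra * rb + rc
    return total & MASK64, total >> 64

def divmod2du(ra, rb, rc):
    """Illustrative model: return (RT, RS) = (quotient, modulo)."""
    if ra < rb and rb != 0:
        dividend = (ra << 64) | rc       # (RA) || (RC)
        return dividend // rb, dividend % rb
    return MASK64, 0                     # overflow or divide-by-zero

a, b, c = 0x123456789ABCDEF0, 0xFEDCBA9876543210, 0x0F0F0F0F0F0F0F0F
low, high = maddedu(a, b, c)             # low -> RT, high -> RS
# Feed the 128-bit sum back in: high half as RA, low half as RC.
quotient, modulo = divmod2du(high, b, low)
assert (quotient, modulo) == (a, c)      # original multiplicand and addend
```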
# VA2-Form

Add the following to Book I, 1.6.21.1, VA2-Form

```
|0      |6     |11     |16     |21  |24|26  |31  |
| PO    |  RT  |   RA  |   RB  | RC    | XO | Rc |
```

Add `VA2` to the Formats list of the `RA` field in Book I, 1.6.2

```
RA (11:15)
    Field used to specify a GPR to be used as a
    source or as a target.
    Formats: A, BM2, D, DQ, DQE, DS, M, MD, MDS, TX, VA, VA2,
             VX, X, XO, XS, SVL, XB, TLI, Z23
```

*TODO* other fields `RT, RB, RC, XO, and Rc`, see
<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text;hb=HEAD>

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic  | Description                          |
|------|------|------|---------|-----------|--------------------------------------|
| VA   | I    | #    | 3.0B    | maddedu   | Multiply-Add Extended Double Unsigned |
| VA   | I    | #    | 3.0B    | divmod2du | Divide/Modulo Quad-Double Unsigned   |

----------------

[[!tag opf_rfc]]