openpower/sv/rfc/ls003.mdwn

   1 # RFC ls003 Big Integer
   2
   3 **URLs**:
   4
   5 * <https://libre-soc.org/openpower/sv/biginteger/analysis/>
   6 * <https://libre-soc.org/openpower/sv/rfc/ls003/>
   7 * <https://bugs.libre-soc.org/show_bug.cgi?id=960>
   8 * <https://git.openpower.foundation/isa/PowerISA/issues/91>
   9
  10 **Severity**: Major
  11
  12 **Status**: New
  13
  14 **Date**: 20 Oct 2022
  15
  16 **Target**: v3.2B
  17
  18 **Source**: v3.0B
  19
  20 **Books and Section affected**: **UPDATE**
  21
  22 ```
  23     Book I 64-bit Fixed-Point Arithmetic Instructions 3.3.9.1
  24     Appendix E Power ISA sorted by opcode
  25     Appendix F Power ISA sorted by version
  26     Appendix G Power ISA sorted by Compliancy Subset
  27     Appendix H Power ISA sorted by mnemonic
  28 ```
  29
  30 **Summary**
  31
  32 Instructions added
  33
  34 ```
  35     maddedu - Multiply-Add Extended Double Unsigned
  36     divmod2du - Divide/Modulo Quad-Double Unsigned
  37     dsld - Double Shift Left Doubleword
  38     dsrd - Double Shift Right Doubleword
  39 ```
  40
  41 **Submitter**: Luke Leighton (Libre-SOC)
  42
  43 **Requester**: Libre-SOC
  44
  45 **Impact on processor**:
  46
  47 ```
  48     Addition of two new GPR-based instructions
  49 ```
  50
  51 **Impact on software**:
  52
  53 ```
  54     Requires support for new instructions in assembler, debuggers,
  55     and related tools.
  56 ```
  57
  58 **Keywords**:
  59
  60 ```
  61     GPR, Big-integer, Double-word
  62 ```
  63
  64 **Motivation**
  65
  66 * Similar to `maddhdu` and `maddld`, but allow for a big-integer rolling
  67   accumulation affect: `RC` effectively becomes a 64-bit carry in chains
  68   of highly-efficient loop-unrolled arbitrary-length big-integer operations.
  69 * Similar to `divdeu`, and has similar advantages to `maddedu`,
  70   Modulo result is available with the quotient in a single instruction
  71   allowing highly-efficient arbitrary-length big-integer division.
  72
  73 **Notes and Observations**:
  74
  75 1. It is not practical to add Rc=1 variants as VA-Form is used and
  76    there is a **pair** of results produced.
  77 2. An overflow variant (XER.OV set) of `divmod2du` would be valuable
  78    but VA-Form EXT004 is under severe pressure.
  79 3. Both `maddhdu` and `divmod2du` instructions have been present in Intel x86
  80    for several decades.  Likewise, variants of `dsld` and `dsrd`.
  81 4. None of these instruction is present in VSX: these are 128/64 whereas
  82    VSX is 128/128.
  83 5. `maddedu` and `divmod2du` are full inverses of each other, including
  84   when used for arbitrary-length big-integer arithmetic.
  85 6. These are all 3-in 2-out instructions. If Power ISA did not already
  86   have LD/ST-with-update instructions and instructions with `RAp`
  87   and `RTp` then these instructions would not be proposed.
  88
  89 **Changes**
  90
  91 Add the following entries to:
  92
  93 * the Appendices of Book I
  94 * Instructions of Book I added to Section 3.3.9.1
  95 * VA2-Form of Book I Section 1.6.21.1 and 1.6.2
  96
  97 ----------------
  98 [[!tag opf_rfc]]
  99
 100 \newpage{}
 101
 102 # Multiply-Add Extended Double Unsigned
 103
 104 `maddedu RT, RA, RB, RC`
 105
 106 |  0-5  | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
 107 |-------|------|-------|-------|-------|-------|---------|
 108 | EXT04 | RT   |  RA   |  RB   |   RC  |  XO   | VA-Form |
 109
 110 Pseudocode:
 111
 112 ```
 113 prod[0:127] <- (RA) * (RB)    # Multiply RA and RB, result 128-bit
 114 sum[0:127] <- EXTZ(RC) + prod # Zero extend RC, add product
 115 RT <- sum[64:127]             # Store low half in RT
 116 RS <- sum[0:63]               # RS implicit register, equal to RC
 117 ```
 118
 119 Special registers altered:
 120
 121     None
 122
 123 The 64-bit operands are (RA), (RB), and (RC).
 124 RC is zero-extended (not shifted, not sign-extended).
 125 The 128-bit product of the operands (RA) and (RB) is added to (RC).
 126 The low-order 64 bits of the 128-bit sum are
 127 placed into register RT.
 128 The high-order 64 bits of the 128-bit sum are
 129 placed into register RS.
 130 RS is implictly defined as the same register as RC.
 131
 132 All three operands and the result are interpreted as
 133 unsigned integers.
 134
 135 The differences here to `maddhdu` are that `maddhdu` stores the upper
 136 half in RT, where `maddedu` stores the upper half in RS.
 137
 138 The value stored in RT is exactly equivalent to `maddld` despite `maddld`
 139 performing sign-extension on RC, because RT is the full mathematical result
 140 modulo 2^64 and sign/zero extension from 64 to 128 bits produces identical
 141 results modulo 2^64. This is why there is no maddldu instruction.
 142
 143 *Programmer's Note:
 144 To achieve a big-integer rolling-accumulation effect:
 145 assuming the scalar to multiply is in r0, and r3 is
 146 used (effectively) as a 64-bit carry,
 147 the vector to multiply by starts at r4 and the result vector
 148 in r20, instructions may be issued `maddedu r20,r4,r0,r3
 149 maddedu r21,r5,r0,r3` etc. where the first `maddedu` will have
 150 stored the upper half of the 128-bit multiply into r3, such
 151 that it may be picked up by the second `maddedu`. Repeat inline
 152 to construct a larger bigint scalar-vector multiply,
 153 as Scalar GPR register file space permits. If register
 154 spill is required then r3, as the effective 64-bit carry,
 155 continues the chain.*
 156
 157 Examples:
 158
 159 ```
 160 # (r0 * r1) + r2, store lower in r4, upper in r2
 161 maddedu r4, r0, r1, r2
 162
 163 # Chaining together for larger bigint (see Programmer's Note above)
 164 # r3 starts with zero (no carry-in)
 165 maddedu r20,r4,r0,r3
 166 maddedu r21,r5,r0,r3
 167 maddedu r22,r6,r0,r3
 168 ```
 169
 170 ----------
 171
 172 \newpage{}
 173
 174 # Divide/Modulo Quad-Double Unsigned
 175
 176 **Should name be Divide/Module Double Extended Unsigned?**
 177 **Check the pseudo-code comments**
 178
 179 `divmod2du RT,RA,RB,RC`
 180
 181 |  0-5  | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
 182 |-------|------|-------|-------|-------|-------|---------|
 183 | EXT04 | RT   |  RA   |  RB   |   RC  |  XO   | VA-Form |
 184
 185 Pseudo-code:
 186
 187 ```
 188 if ((RA) <u (RB)) & ((RB) != [0]*64) then  # Check RA<RB, for divide-by-0
 189     dividend[0:127] <- (RA) || (RC)        # Combine RA/RC as 128-bit
 190     divisor[0:127] <- [0]*64 || (RB)       # Extend RB to 128-bit
 191     result <- dividend / divisor           # Unsigned Division
 192     modulo <- dividend % divisor           # Unsigned Modulo
 193     RT <- result[64:127]                   # Store result in RT
 194     RS <- modulo[64:127]                   # Modulo in RC, implicit
 195 else                                       # In case of error
 196     RT <- [1]*64                           # RT all 1's
 197     RS <- [0]*64                           # RS all 0's
 198 ```
 199
 200 Special registers altered:
 201
 202     None
 203
 204 The 128-bit dividend is (RA) || (RC). The 64-bit divisor is
 205 (RB). If the quotient can be represented in 64 bits, it is
 206 placed into register RT. The modulo is placed into register RS.
 207 RS is implictly defined as the same register as RC, similarly to maddedu.
 208
 209 The quotient can be represented in 64-bits when both these conditions
 210 are true:
 211
 212 * (RA) < (RB) (unsigned comparison)
 213 * (RB) is NOT 0 (not divide-by-0)
 214
 215 If these conditions are not met, RT is set to all 1's, RS to all 0's.
 216
 217 All operands, quotient, and modulo are interpreted as unsigned integers.
 218
 219 Divide/Modulo Quad-Double Unsigned is a VA-Form instruction
 220 that is near-identical to `divdeu` except that:
 221
 222 * the lower 64 bits of the dividend, instead of being zero, contain a
 223   register, RC.
 224 * it performs a fused divide and modulo in a single instruction, storing
 225   the modulo in an implicit RS (similar to `maddedu`)
 226 * There is no `UNDEFINED` behaviour.
 227
 228 RB, the divisor, remains 64 bit.  The instruction is therefore a 128/64
 229 division, producing a (pair) of 64 bit result(s), in the same way that
 230 Intel [divq](https://www.felixcloutier.com/x86/div) works.
 231 Overflow conditions
 232 are detected in exactly the same fashion as `divdeu`, except that rather
 233 than have `UNDEFINED` behaviour, RT is set to all ones and RS set to all
 234 zeros on overflow.
 235
 236 *Programmer's note: there are no Rc variants of any of these VA-Form
 237 instructions. `cmpi` will need to be used to detect overflow conditions:
 238 the saving in instruction count is that both RT and RS will have already
 239 been set to useful values (all 1s and all zeros respectively)
 240 needed as part of implementing Knuth's Algorithm D*
 241
 242 For Scalar usage, just as for `maddedu`, `RS=RC`
 243 Examples:
 244
 245 ```
 246 # ((r0 << 64) + r2) / r1, store in r4
 247 # ((r0 << 64) + r2) % r1, store in r2
 248 divmod2du r4, r0, r1, r2
 249 ```
 250
 251 # VA2-Form
 252
 253 Add the following to Book I, 1.6.21.1, VA2-Form
 254
 255 ```
 256 |0      |6     |11     |16     |21  |24|26  |31  |
 257 | PO    |  RT  |   RA  |   RB  | RC    | XO | Rc |
 258 ```
 259
 260 Add `RA` to `XO` Field in Book I, 1.6.2
 261
 262 ```
 263 RA (11:15)
 264     Field used to specify a GPR to be used as a
 265     source or as a target.
 266     Formats: A, BM2, D, DQ, DQE, DS, M, MD, MDS, TX, VA, VA2,
 267              VX, X, XO, XS, SVL, XB, TLI, Z23
 268 ```
 269
 270 *TODO* other fields `RT, RB, RC, XO, and Rc`, see
 271 <https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text;hb=HEAD>
 272
 273 # Appendices
 274
 275     Appendix E Power ISA sorted by opcode
 276     Appendix F Power ISA sorted by version
 277     Appendix G Power ISA sorted by Compliancy Subset
 278     Appendix H Power ISA sorted by mnemonic
 279
 280 | Form | Book | Page | Version | mnemonic | Description |
 281 |------|------|------|---------|----------|-------------|
 282 | VA   | I    | #    | 3.0B    | maddedu  | Multiply-Add Extend Double Unsigned |
 283 | VA   | I    | #    | 3.0B    | divmod2du | Divide/Modulo Quad-Double Unsigned |
 284