openpower/sv/rfc/ls003.mdwn

   1 # RFC ls003 Big Integer
   2
   3 **URLs**:
   4
   5 * <https://libre-soc.org/openpower/sv/biginteger/analysis/>
   6 * <https://libre-soc.org/openpower/sv/rfc/ls003/>
   7 * <https://bugs.libre-soc.org/show_bug.cgi?id=960>
   8 * <https://git.openpower.foundation/isa/PowerISA/issues/91>
   9
  10 **Severity**: Major
  11
  12 **Status**: New
  13
  14 **Date**: 20 Oct 2022
  15
  16 **Target**: v3.2B
  17
  18 **Source**: v3.0B
  19
  20 **Books and Section affected**: **UPDATE**
  21
  22 ```
  23     Book I 64-bit Fixed-Point Arithmetic Instructions 3.3.9.1
  24     Appendix E Power ISA sorted by opcode
  25     Appendix F Power ISA sorted by version
  26     Appendix G Power ISA sorted by Compliancy Subset
  27     Appendix H Power ISA sorted by mnemonic
  28 ```
  29
  30 **Summary**
  31
  32 Instructions added
  33
  34 ```
  35     maddedu - Multiply-Add Extended Double Unsigned
  36     divmod2du - Divide/Modulo Quad-Double Unsigned
  37     dsld - Double Shift Left Doubleword
  38     dsrd - Double Shift Right Doubleword
  39 ```
  40
  41 **Submitter**: Luke Leighton (Libre-SOC)
  42
  43 **Requester**: Libre-SOC
  44
  45 **Impact on processor**:
  46
  47 ```
  48     Addition of two new GPR-based instructions
  49 ```
  50
  51 **Impact on software**:
  52
  53 ```
  54     Requires support for new instructions in assembler, debuggers,
  55     and related tools.
  56 ```
  57
  58 **Keywords**:
  59
  60 ```
  61     GPR, Big-integer, Double-word
  62 ```
  63
  64 **Motivation**
  65
  66 * Similar to `maddhdu` and `maddld`, but allow for a big-integer rolling
  67   accumulation affect: `RC` effectively becomes a 64-bit carry in chains
  68   of highly-efficient loop-unrolled arbitrary-length big-integer operations.
  69 * Similar to `divdeu`, and has similar advantages to `maddedu`,
  70   Modulo result is available with the quotient in a single instruction
  71   allowing highly-efficient arbitrary-length big-integer division.
  72
  73 **Notes and Observations**:
  74
  75 TODO: address Jacob's comments: https://libre-soc.org/irclog/%23libre-soc.2022-10-28.log.html#t2022-10-28T18:00:27
  76
  77 1. It is not practical to add Rc=1 variants as VA-Form is used and
  78    there is a **pair** of results produced.
  79 2. An overflow variant (XER.OV set) of `divmod2du` would be valuable
  80    but VA-Form EXT004 is under severe pressure.
  81 3. Both `maddhdu` and `divmod2du` instructions have been present in Intel x86
  82    for several decades.  Likewise, variants of `dsld` and `dsrd`.
  83 4. None of these instruction is present in VSX: these are 128/64 whereas
  84    VSX is 128/128.
  85 5. `maddedu` and `divmod2du` are full inverses of each other, including
  86   when used for arbitrary-length big-integer arithmetic.
  87 6. These are all 3-in 2-out instructions. If Power ISA did not already
  88   have LD/ST-with-update instructions and instructions with `RAp`
  89   and `RTp` then these instructions would not be proposed.
  90
  91 **Changes**
  92
  93 Add the following entries to:
  94
  95 * the Appendices of Book I
  96 * Instructions of Book I added to Section 3.3.9.1
  97 * VA2-Form of Book I Section 1.6.21.1 and 1.6.2
  98
  99 ----------------
 100
 101 \newpage{}
 102
 103 # Multiply-Add Extended Double Unsigned
 104
 105 `maddedu RT, RA, RB, RC`
 106
 107 |  0-5  | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
 108 |-------|------|-------|-------|-------|-------|---------|
 109 | EXT04 | RT   |  RA   |  RB   |   RC  |  XO   | VA-Form |
 110
 111 Pseudocode:
 112
 113 ```
 114 prod[0:127] <- (RA) * (RB)    # Multiply RA and RB, result 128-bit
 115 sum[0:127] <- EXTZ(RC) + prod # Zero extend RC, add product
 116 RT <- sum[64:127]             # Store low half in RT
 117 RS <- sum[0:63]               # RS implicit register, equal to RC
 118 ```
 119
 120 Special registers altered:
 121
 122     None
 123
 124 The 64-bit operands are (RA), (RB), and (RC).
 125 RC is zero-extended (not shifted, not sign-extended).
 126 The 128-bit product of the operands (RA) and (RB) is added to (RC).
 127 The low-order 64 bits of the 128-bit sum are
 128 placed into register RT.
 129 The high-order 64 bits of the 128-bit sum are
 130 placed into register RS.
 131 RS is implictly defined as the same register as RC.
 132
 133 All three operands and the result are interpreted as
 134 unsigned integers.
 135
 136 The differences here to `maddhdu` are that `maddhdu` stores the upper
 137 half in RT, where `maddedu` stores the upper half in RS.
 138
 139 The value stored in RT is exactly equivalent to `maddld` despite `maddld`
 140 performing sign-extension on RC, because RT is the full mathematical result
 141 modulo 2^64 and sign/zero extension from 64 to 128 bits produces identical
 142 results modulo 2^64. This is why there is no maddldu instruction.
 143
 144 *Programmer's Note:
 145 To achieve a big-integer rolling-accumulation effect:
 146 assuming the scalar to multiply is in r0, and r3 is
 147 used (effectively) as a 64-bit carry,
 148 the vector to multiply by starts at r4 and the result vector
 149 in r20, instructions may be issued `maddedu r20,r4,r0,r3
 150 maddedu r21,r5,r0,r3` etc. where the first `maddedu` will have
 151 stored the upper half of the 128-bit multiply into r3, such
 152 that it may be picked up by the second `maddedu`. Repeat inline
 153 to construct a larger bigint scalar-vector multiply,
 154 as Scalar GPR register file space permits. If register
 155 spill is required then r3, as the effective 64-bit carry,
 156 continues the chain.*
 157
 158 Examples:
 159
 160 ```
 161 # (r0 * r1) + r2, store lower in r4, upper in r2
 162 maddedu r4, r0, r1, r2
 163
 164 # Chaining together for larger bigint (see Programmer's Note above)
 165 # r3 starts with zero (no carry-in)
 166 maddedu r20,r4,r0,r3
 167 maddedu r21,r5,r0,r3
 168 maddedu r22,r6,r0,r3
 169 ```
 170
 171 ----------
 172
 173 \newpage{}
 174
 175 # Divide/Modulo Quad-Double Unsigned
 176
 177 **Should name be Divide/Module Double Extended Unsigned?**
 178 **Check the pseudo-code comments**
 179
 180 `divmod2du RT,RA,RB,RC`
 181
 182 |  0-5  | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
 183 |-------|------|-------|-------|-------|-------|---------|
 184 | EXT04 | RT   |  RA   |  RB   |   RC  |  XO   | VA-Form |
 185
 186 Pseudo-code:
 187
 188 ```
 189 if ((RA) <u (RB)) & ((RB) != [0]*64) then  # Check RA<RB, for divide-by-0
 190     dividend[0:127] <- (RA) || (RC)        # Combine RA/RC as 128-bit
 191     divisor[0:127] <- [0]*64 || (RB)       # Extend RB to 128-bit
 192     result <- dividend / divisor           # Unsigned Division
 193     modulo <- dividend % divisor           # Unsigned Modulo
 194     RT <- result[64:127]                   # Store result in RT
 195     RS <- modulo[64:127]                   # Modulo in RC, implicit
 196 else                                       # In case of error
 197     RT <- [1]*64                           # RT all 1's
 198     RS <- [0]*64                           # RS all 0's
 199 ```
 200
 201 Special registers altered:
 202
 203     None
 204
 205 The 128-bit dividend is (RA) || (RC). The 64-bit divisor is
 206 (RB). If the quotient can be represented in 64 bits, it is
 207 placed into register RT. The modulo is placed into register RS.
 208 RS is implictly defined as the same register as RC, similarly to maddedu.
 209
 210 The quotient can be represented in 64-bits when both these conditions
 211 are true:
 212
 213 * (RA) < (RB) (unsigned comparison)
 214 * (RB) is NOT 0 (not divide-by-0)
 215
 216 If these conditions are not met, RT is set to all 1's, RS to all 0's.
 217
 218 All operands, quotient, and modulo are interpreted as unsigned integers.
 219
 220 Divide/Modulo Quad-Double Unsigned is a VA-Form instruction
 221 that is near-identical to `divdeu` except that:
 222
 223 * the lower 64 bits of the dividend, instead of being zero, contain a
 224   register, RC.
 225 * it performs a fused divide and modulo in a single instruction, storing
 226   the modulo in an implicit RS (similar to `maddedu`)
 227 * There is no `UNDEFINED` behaviour.
 228
 229 RB, the divisor, remains 64 bit.  The instruction is therefore a 128/64
 230 division, producing a (pair) of 64 bit result(s), in the same way that
 231 Intel [divq](https://www.felixcloutier.com/x86/div) works.
 232 Overflow conditions
 233 are detected in exactly the same fashion as `divdeu`, except that rather
 234 than have `UNDEFINED` behaviour, RT is set to all ones and RS set to all
 235 zeros on overflow.
 236
 237 *Programmer's note: there are no Rc variants of any of these VA-Form
 238 instructions. `cmpi` will need to be used to detect overflow conditions:
 239 the saving in instruction count is that both RT and RS will have already
 240 been set to useful values (all 1s and all zeros respectively)
 241 needed as part of implementing Knuth's Algorithm D*
 242
 243 For Scalar usage, just as for `maddedu`, `RS=RC`
 244 Examples:
 245
 246 ```
 247 # ((r0 << 64) + r2) / r1, store in r4
 248 # ((r0 << 64) + r2) % r1, store in r2
 249 divmod2du r4, r0, r1, r2
 250 ```
 251
 252 # Dynamic-Shift Left Doubleword
 253
 254 `dsld RT,RA,RB,RC`
 255
 256 |  0-5  | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31 | Form     |
 257 |-------|------|-------|-------|-------|-------|----|----------|
 258 | EXT04 | RT   |  RA   |  RB   |   RC  |  XO   | Rc | VA2-Form |
 259
 260 Pseudo-code:
 261
 262     n <- (RB)[58:63]                        # Take lower 6-bits of RB for shift
 263     v <- ROTL64((RA), n)                    # Rotate RA 64-bit left by n bits
 264     mask <- MASK(64, 63-n)                  # 1's mask, set mask[64-n:63] to 0's
 265     RT <- (v[0:63] & mask) | ((RC) & ¬mask) # Mask out bits
 266     RS <- v[0:63] & ¬mask                   # ?
 267     overflow = 0                            # Clear overflow flag
 268     if RS != [0]*64:                        # Check if RS is NOT zero
 269         overflow = 1                        # Set the overflow flag
 270
 271 Special Registers Altered:
 272
 273     CR0                    (if Rc=1)
 274
 275 **CHECK if overflow flag is the expected behaviour**
 276
 277 The contents of register RA are shifted left the number
 278 of bits specified by (RB) 58:63.
 279 **Please check if this is correct!!! This condition is taken
 280 from PowerISA spec page 253, definition of MASK128(x,y)!!!**
 281 A mask is generated having 0-bits from bit (64-n) through
 282 bit 63 and 1-bits elsewhere.
 283
 284 The rotated data is ANDed with the generated mask, and ORed
 285 with contents of RC ANDed with inverted mask.
 286 The result is placed into register RT.
 287
 288 Additionally, the rotated data is ANDed with inverted mask and
 289 placed into register RS. If value in RS is not all 0's, the
 290 overflow flag is raised.
 291
 292 Similarly maddedu and divmod2du, dsld can be chained (using RC).
 293
 294 # Dynamic-Shift Right Doubleword
 295
 296 `dsrd RT,RA,RB,RC`
 297
 298 |  0-5  | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31 | Form     |
 299 |-------|------|-------|-------|-------|-------|----|----------|
 300 | EXT04 | RT   |  RA   |  RB   |   RC  |  XO   | Rc | VA2-Form |
 301
 302 Pseudo-code:
 303
 304     n <- (RB)[58:63]                        # Take lower 6-bits of RB for shift
 305     v <- ROTL64((RA), 64-n)                 # Rotate RA 64-bit left by 64-n bits
 306     mask <- MASK(n, 63)                     # 0's mask, set mask[n:63] to 1'
 307     RT <- (v[0:63] & mask) | ((RC) & ¬mask) #
 308     RS <- v[0:63] & ¬mask
 309     overflow = 0
 310     if RS != [0]*64:
 311         overflow = 1
 312
 313 Special Registers Altered:
 314
 315     CR0                    (if Rc=1)
 316
 317
 318 \newpage{}
 319
 320 # VA2-Form
 321
 322 Add the following to Book I, 1.6.21.1, VA2-Form
 323
 324 ```
 325 |0      |6     |11     |16     |21  |24|26  |31  |
 326 | PO    |  RT  |   RA  |   RB  | RC    | XO | Rc |
 327 ```
 328
 329 Add `RA` to `XO` Field in Book I, 1.6.2
 330
 331 ```
 332 RA (11:15)
 333     Field used to specify a GPR to be used as a
 334     source or as a target.
 335     Formats: A, BM2, D, DQ, DQE, DS, M, MD, MDS, TX, VA, VA2,
 336              VX, X, XO, XS, SVL, XB, TLI, Z23
 337 ```
 338
 339 *TODO* other fields `RT, RB, RC, XO, and Rc`, see
 340 <https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text;hb=HEAD>
 341
 342 # Appendices
 343
 344     Appendix E Power ISA sorted by opcode
 345     Appendix F Power ISA sorted by version
 346     Appendix G Power ISA sorted by Compliancy Subset
 347     Appendix H Power ISA sorted by mnemonic
 348
 349 | Form | Book | Page | Version | mnemonic | Description |
 350 |------|------|------|---------|----------|-------------|
 351 | VA   | I    | #    | 3.0B    | maddedu  | Multiply-Add Extend Double Unsigned |
 352 | VA   | I    | #    | 3.0B    | divmod2du | Divide/Modulo Quad-Double Unsigned |
 353
 354 ----------------
 355
 356 [[!tag opf_rfc]]