openpower/sv/rfc/ls003.mdwn

   1 # RFC ls003 Big Integer
   2
   3 **URLs**:
   4
   5 * <https://libre-soc.org/openpower/sv/biginteger/analysis/>
   6 * <https://libre-soc.org/openpower/sv/rfc/ls003/>
   7 * <https://bugs.libre-soc.org/show_bug.cgi?id=960>
   8 * <https://git.openpower.foundation/isa/PowerISA/issues/91>
   9
  10 **Severity**: Major
  11
  12 **Status**: New
  13
  14 **Date**: 20 Oct 2022
  15
  16 **Target**: v3.2B
  17
  18 **Source**: v3.0B
  19
  20 **Books and Section affected**: **UPDATE**
  21
  22 ```
  23     Book I 64-bit Fixed-Point Arithmetic Instructions 3.3.9.1
  24     Appendix E Power ISA sorted by opcode
  25     Appendix F Power ISA sorted by version
  26     Appendix G Power ISA sorted by Compliancy Subset
  27     Appendix H Power ISA sorted by mnemonic
  28 ```
  29
  30 **Summary**
  31
  32 ```
  33     Instructions added
  34     maddedu - Multiply-Add Extended Double Unsigned
  35     divmod2du - Divide/Modulo Quad-Double Unsigned
  36 ```
  37
  38 **Submitter**: Luke Leighton (Libre-SOC)
  39
  40 **Requester**: Libre-SOC
  41
  42 **Impact on processor**:
  43
  44 ```
  45     Addition of two new GPR-based instructions
  46 ```
  47
  48 **Impact on software**:
  49
  50 ```
  51     Requires support for new instructions in assembler, debuggers,
  52     and related tools.
  53 ```
  54
  55 **Keywords**:
  56
  57 ```
  58     GPR, Big-integer, Double-word
  59 ```
  60
  61 **Motivation**
  62
  63 Similar to `maddhdu` and `maddld`, but allow for a big-integer rolling
  64 accumulation affect: `RC` effectively becomes a 64-bit carry in chains
  65 of highly efficient loop-unrolled arbitrary-length big-integer operations.
  66 Similar to `divdeu`, and has similar advantages to `maddedu`,
  67 Modulo result is available with the quotient.
  68
  69 **Notes and Observations**:
  70
  71 1. There is no need for an Rc=1 variant as VA-Form is being used.
  72 2. There is no need for Special Registers as VA-Form is being used.
  73 3. Both instructions have been present in Intel x86 for several decades.
  74 4. Neither instruction is present in VSX: these are 128/64 whereas
  75    VSX is 128/128.
  76 5. `maddedu` and `divmod2du` are inverses of each other, including
  77   when used for arbitrary-length big-integer arithmetic
  78
  79 **Changes**
  80
  81 Add the following entries to:
  82
  83 * the Appendices of Book I
  84 * Instructions of Book I added to Section 3.3.9.1
  85
  86 ----------------
  87
  88 \newpage{}
  89
  90 # Multiply-Add Extended Double Unsigned
  91
  92 `maddedu RT, RA, RB, RC`
  93
  94 |  0-5  | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
  95 |-------|------|-------|-------|-------|-------|---------|
  96 | EXT04 | RT   |  RA   |  RB   |   RC  |  XO   | VA-Form |
  97
  98 Pseudocode:
  99
 100 ```
 101     prod[0:127] <- (RA) * (RB)    # Multiply RA and RB, result 128-bit
 102     sum[0:127] <- EXTZ(RC) + prod # Zero extend RC, add product
 103     RT <- sum[64:127]             # Store low half in RT
 104     RS <- sum[0:63]               # RS implicit register, see below
 105 ```
 106
 107 Special registers altered:
 108
 109     None
 110
 111 RC is zero-extended (not shifted, not sign-extended), the 128-bit product added
 112 to it; the lower half of that result stored in RT and the upper half
 113 in RS.
 114
 115 The differences here to `maddhdu` are that `maddhdu` stores the upper
 116 half in RT, where `maddedu` stores the upper half in RS.
 117
 118 The value stored in RT is exactly equivalent to `maddld` despite `maddld`
 119 performing sign-extension on RC, because RT is the full mathematical result
 120 modulo 2^64 and sign/zero extension from 64 to 128 bits produces identical
 121 results modulo 2^64. This is why there is no maddldu instruction.
 122
 123 RS is implictly defined as the register following RT (RS=RT+1).
 124
 125 *Programmer's Note:
 126 As a Scalar Power ISA operation, like `lq` and `stq`, RS=RT+1.
 127 To achieve a big-integer rolling-accumulation effect:
 128 assuming the scalar to multiply is in r0,
 129 the vector to multiply by starts at r4 and the result vector
 130 in r20, instructions may be issued `maddedu r20,r4,r0,r20
 131 maddedu r21,r5,r0,r21` etc. where the first `maddedu` will have
 132 stored the upper half of the 128-bit multiply into r21, such
 133 that it may be picked up by the second `maddedu`. Repeat inline
 134 to construct a larger bigint scalar-vector multiply,
 135 as Scalar GPR register file space permits.*
 136
 137 Examples:
 138
 139 ```
 140     maddedu r4, r0, r1, r2 # ((r0)*(r1))+(r2), store lower in r4, upper in r5
 141 ```
 142
 143 # Divide/Modulo Quad-Double Unsigned
 144
 145 **Should name be Divide/Module Double Extended Unsigned?**
 146 **Check the pseudo-code comments**
 147
 148 `divmod2du RT,RA,RB,RC`
 149
 150 |  0-5  | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
 151 |-------|------|-------|-------|-------|-------|---------|
 152 | EXT04 | RT   |  RA   |  RB   |   RC  |  XO   | VA-Form |
 153
 154 Pseudo-code:
 155
 156     if ((RA) <u (RB)) & ((RB) != [0]*XLEN) then   # Check RA<RB, for divide-by-0
 157         dividend[0:(XLEN*2)-1] <- (RA) || (RC)    # Combine RA/RC, zero extend
 158         divisor[0:(XLEN*2)-1] <- [0]*XLEN || (RB) # Extend to 128-bit
 159         result <- dividend / divisor              # Division
 160         modulo <- dividend % divisor              # Modulo
 161         RT <- result[XLEN:(XLEN*2)-1]             # Store result in RT
 162         RS <- modulo[XLEN:(XLEN*2)-1]             # Modulo in RC, implicit
 163     else                                          # In case of error
 164         RT <- [1]*XLEN                            # RT all 1's
 165         RS <- [0]*XLEN                            # RS all 0's
 166
 167 Special registers altered:
 168
 169     None
 170
 171 Divide/Modulo Quad-Double Unsigned is another VA-Form instruction
 172 that is near-identical to `divdeu` except that:
 173
 174 * the lower 64 bits of the dividend, instead of being zero, contain a
 175   register, RC.
 176 * it performs a fused divide and modulo in a single instruction, storing
 177   the modulo in an implicit RS (similar to `maddedu`)
 178
 179 RB, the divisor, remains 64 bit.  The instruction is therefore a 128/64
 180 division, producing a (pair) of 64 bit result(s), in the same way that
 181 Intel [divq](https://www.felixcloutier.com/x86/div) works.
 182 Overflow conditions
 183 are detected in exactly the same fashion as `divdeu`, except that rather
 184 than have `UNDEFINED` behaviour, RT is set to all ones and RS set to all
 185 zeros on overflow.
 186
 187 *Programmer's note: there are no Rc variants of any of these VA-Form
 188 instructions. `cmpi` will need to be used to detect overflow conditions:
 189 the saving in instruction count is that both RT and RS will have already
 190 been set to useful values (all 1s and all zeros respectively)
 191 needed as part of implementing Knuth's
 192 Algorithm D*
 193
 194 For Scalar usage, just as for `maddedu`, `RS=RT+1` (similar to `lq` and `stq`).
 195
 196 Examples:
 197
 198 ```
 199     divmod2du r4, r0, r1, r2 # ((r0)||(r2)) / (r1), store in r4
 200                              # ((r0)||(r2)) % (r1), store in r5
 201 ```
 202
 203 [[!tag opf_rfc]]
 204
 205 # Appendices
 206
 207     Appendix E Power ISA sorted by opcode
 208     Appendix F Power ISA sorted by version
 209     Appendix G Power ISA sorted by Compliancy Subset
 210     Appendix H Power ISA sorted by mnemonic
 211
 212 | Form | Book | Page | Version | mnemonic | Description |
 213 |------|------|------|---------|----------|-------------|
 214 | VA   | I    | #    | 3.0B    | maddedu  | Multiply-Add Extend Double Unsigned |
 215 | VA   | I    | #    | 3.0B    | divmod2du | Floatif Move | Divide/Modulo Quad-Double Unsigned
 216