# RFC ls003 Big Integer

**URLs**:

* <https://libre-soc.org/openpower/sv/>
* <https://libre-soc.org/openpower/sv/rfc/ls003/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=944>
* <https://git.openpower.foundation/isa/PowerISA/issues/87>

**Severity**: Major

**Status**: New

**Date**: -- Oct 2022 **(UPDATE)**

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**: **UPDATE**

```
    Book I 64-bit Fixed-Point Arithmetic Instructions 3.3.9.1
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    Instructions added
    maddedu - Multiply-Add Extended Double Unsigned
    divmod2du - Divide/Modulo Quad-Double Unsigned
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of two new GPR-based instructions
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    GPR, Big-integer, Double-word
```

**Motivation**

`maddedu` is similar to `maddhdu` and `maddld`, but allows for a big-integer
rolling accumulation effect. Because the second result location is implicitly
defined as the register after the first result (RS=RT+1), the Scalar Register
set can be used for vector computation.

`divmod2du` is similar to `divdeu`, and has similar advantages to `maddedu`:
the modulo result is available together with the quotient.

**Notes and Observations**:

1. There is no need for an Rc=1 variant as VA-Form is being used.
2. There is no need for Special Registers as VA-Form is being used.

**Changes**

Add the following entries to:

* the Appendices of Book I
* Section 3.3.9.1 of Book I (the new instructions)

----------------

\newpage{}

# Multiply-Add Extended Double Unsigned

`maddedu RT, RA, RB, RC`

|  0-5  | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
|-------|------|-------|-------|-------|-------|---------|
| EXT04 | RT   |  RA   |  RB   |   RC  |  XO   | VA-Form |

Pseudocode:

```
    prod[0:127] <- (RA) * (RB)    # Multiply RA and RB, result 128-bit
    sum[0:127] <- EXTZ(RC) + prod # Zero extend RC, add product
    RT <- sum[64:127]             # Store low half in RT
    RS <- sum[0:63]               # RS implicit register, see below
```

Special registers altered:

    None

RC is zero-extended (not shifted, not sign-extended) to 128 bits and the
128-bit product is added to it. The lower half of that result is stored in RT
and the upper half in RS.
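
As an illustration only, the behaviour described above can be modelled in a few lines of Python. The function name `maddedu`, the `MASK64` constant and the `(RT, RS)` return convention are assumptions of this sketch, not part of the RFC; Python integers stand in for 64-bit registers.

```python
MASK64 = (1 << 64) - 1  # hypothetical helper: all-ones 64-bit mask


def maddedu(ra, rb, rc):
    """Sketch of maddedu: returns (RT, RS) for 64-bit unsigned inputs."""
    prod = (ra & MASK64) * (rb & MASK64)  # 128-bit unsigned product
    total = prod + (rc & MASK64)          # RC zero-extended, then added
    rt = total & MASK64                   # low half goes to RT
    rs = total >> 64                      # high half goes to RS (RT+1)
    return rt, rs
```

The sum cannot overflow 128 bits: the largest possible value is (2^64-1)^2 + (2^64-1) = 2^128 - 2^64, so the value destined for RS always fits in 64 bits.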

The difference from `maddhdu` is that `maddhdu` stores the upper
half in RT, whereas `maddedu` stores the upper half in RS.

The value stored in RT is exactly equivalent to `maddld` despite `maddld`
performing sign-extension on RC, because RT is the full mathematical result
modulo 2^64 and sign/zero extension from 64 to 128 bits produces identical
results modulo 2^64. This is why there is no `maddldu` instruction.
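
The modulo-2^64 argument above can be spot-checked numerically. The following is a minimal sketch, not a proof: the helper names are hypothetical, and the "signed" variant only approximates the `maddld` behaviour as described in this section (signed operands, sign-extended RC).

```python
MASK64 = (1 << 64) - 1


def as_signed64(x):
    """Reinterpret a 64-bit pattern as a two's-complement signed value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def low64_unsigned(ra, rb, rc):
    # maddedu-style: unsigned product, RC zero-extended, keep low 64 bits
    return (ra * rb + rc) & MASK64


def low64_signed(ra, rb, rc):
    # maddld-style: signed product, RC sign-extended, keep low 64 bits
    return (as_signed64(ra) * as_signed64(rb) + as_signed64(rc)) & MASK64


# Edge-case bit patterns, including values with the top bit set
samples = [0, 1, 2, 1 << 63, MASK64 - 1, MASK64]
agree = all(low64_unsigned(a, b, c) == low64_signed(a, b, c)
            for a in samples for b in samples for c in samples)
```

Both variants agree because reinterpreting a 64-bit pattern as signed changes its value by a multiple of 2^64, which vanishes once only the low 64 bits are kept.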

RS is implicitly defined as the register following RT (RS=RT+1).

*Programmer's Note:
As a Scalar Power ISA operation, like `lq` and `stq`, RS=RT+1.
To achieve a big-integer rolling-accumulation effect,
assume the scalar to multiply is in r0,
the vector to multiply by starts at r4, and the result vector
starts at r20. The instructions `maddedu r20,r4,r0,r20` and
`maddedu r21,r5,r0,r21` (and so on) may then be issued: the first `maddedu`
stores the upper half of its 128-bit result into r21, where
it is picked up as RC by the second `maddedu`. Repeat inline
to construct a larger bigint scalar-vector multiply,
as Scalar GPR register file space permits.*
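
The chained sequence in the note above can be sketched in Python, assuming little-endian lists of 64-bit words stand in for the register ranges and the result words start at zero. `maddedu` here is the same hypothetical model of the instruction, and `bigint_mul_scalar` and `limbs_to_int` are names invented for this illustration.

```python
MASK64 = (1 << 64) - 1


def maddedu(ra, rb, rc):
    """Hypothetical model of maddedu: returns (RT, RS)."""
    total = ra * rb + rc
    return total & MASK64, total >> 64


def bigint_mul_scalar(limbs, scalar):
    """Multiply a little-endian list of 64-bit words by a 64-bit scalar.

    Each step mirrors one `maddedu rN, rM, r0, rN`: result[i] is both RC
    (carrying the previous upper half) and RT; result[i + 1] plays RS.
    """
    result = [0] * (len(limbs) + 1)
    for i, limb in enumerate(limbs):
        result[i], result[i + 1] = maddedu(limb, scalar, result[i])
    return result


def limbs_to_int(limbs):
    """Helper for checking: assemble little-endian words into one integer."""
    return sum(word << (64 * i) for i, word in enumerate(limbs))
```

For example, `bigint_mul_scalar([2], 3)` returns `[6, 0]`: the low word of the product followed by the final upper half.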

Examples:

```
    maddedu r4, r0, r1, r2 # ((r0)*(r1))+(r2), store lower in r4, upper in r5
```

# Divide/Modulo Quad-Double Unsigned

**Should name be Divide/Modulo Double Extended Unsigned?**
**Check the pseudo-code comments**

`divmod2du RT,RA,RB,RC`

|  0-5  | 6-10 | 11-15 | 16-20 | 21-25 | 26-31 | Form    |
|-------|------|-------|-------|-------|-------|---------|
| EXT04 | RT   |  RA   |  RB   |   RC  |  XO   | VA-Form |

Pseudo-code:

    if ((RA) <u (RB)) & ((RB) != [0]*XLEN) then   # Check RA<RB (overflow), RB!=0 (divide-by-0)
        dividend[0:(XLEN*2)-1] <- (RA) || (RC)    # Concatenate: RA upper half, RC lower half
        divisor[0:(XLEN*2)-1] <- [0]*XLEN || (RB) # Zero-extend RB to 2*XLEN bits
        result <- dividend / divisor              # Division
        modulo <- dividend % divisor              # Modulo
        RT <- result[XLEN:(XLEN*2)-1]             # Store quotient in RT
        RS <- modulo[XLEN:(XLEN*2)-1]             # Store modulo in RS, implicit
    else                                          # Overflow or divide-by-zero
        RT <- [1]*XLEN                            # RT all 1's
        RS <- [0]*XLEN                            # RS all 0's

Special registers altered:

    None

Divide/Modulo Quad-Double Unsigned is another VA-Form instruction
that is near-identical to `divdeu` except that:

* the lower 64 bits of the dividend, instead of being zero, are taken from a
  register, RC.
* it performs a fused divide and modulo in a single instruction, storing
  the modulo in an implicit RS (similar to `maddedu`)

RB, the divisor, remains 64-bit. The instruction is therefore a 128/64
division, producing a pair of 64-bit results (quotient and modulo), in the
same way that Intel [divq](https://www.felixcloutier.com/x86/div) works.
Overflow conditions are detected in exactly the same fashion as `divdeu`,
except that rather than having `UNDEFINED` behaviour, RT is set to all ones
and RS is set to all zeros on overflow.
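
A minimal Python model of the behaviour described above may help when checking the pseudo-code. It assumes XLEN=64; the function name, the `MASK` constant and the `(RT, RS)` return convention are assumptions of this sketch rather than anything defined by the RFC.

```python
XLEN = 64
MASK = (1 << XLEN) - 1  # all-ones XLEN-bit value


def divmod2du(ra, rb, rc):
    """Sketch of divmod2du: returns (RT, RS) = (quotient, modulo)."""
    if ra < rb and rb != 0:
        dividend = (ra << XLEN) | rc      # RA is the upper half, RC the lower
        return dividend // rb, dividend % rb
    # Overflow or divide-by-zero: RT all ones, RS all zeros
    return MASK, 0
```

When RA is below RB the dividend is less than RB * 2^64, so the quotient is guaranteed to fit in 64 bits; this is why the single `RA <u RB` comparison is enough to detect overflow.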

*Programmer's note: there are no Rc variants of any of these VA-Form
instructions. `cmpi` will need to be used to detect overflow conditions:
the saving in instruction count is that both RT and RS will have already
been set to useful values (all 1s and all zeros respectively)
needed as part of implementing Knuth's Algorithm D.*

For Scalar usage, just as for `maddedu`, `RS=RT+1` (similar to `lq` and `stq`).

Examples:

```
    divmod2du r4, r0, r1, r2 # ((r0)||(r2)) / (r1), store in r4
                             # ((r0)||(r2)) % (r1), store in r5
```
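
One use of the quotient/modulo pair, offered here as an illustration rather than something stated in the RFC, is dividing a multi-word integer by a single 64-bit word: the modulo from each step becomes the upper half of the next dividend. The sketch below reuses a hypothetical Python model of `divmod2du`; `bigint_div_word` and `limbs_to_int` are invented names, and words are stored little-endian.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def divmod2du(ra, rb, rc):
    """Hypothetical model of divmod2du: returns (RT, RS)."""
    if ra < rb and rb != 0:
        dividend = (ra << XLEN) | rc
        return dividend // rb, dividend % rb
    return MASK, 0


def bigint_div_word(limbs, divisor):
    """Divide little-endian 64-bit words by a nonzero 64-bit divisor.

    Returns (quotient_words, remainder). Words are processed from most
    to least significant, with the running remainder playing RA.
    """
    quotient = [0] * len(limbs)
    rem = 0
    for i in reversed(range(len(limbs))):
        quotient[i], rem = divmod2du(rem, divisor, limbs[i])
    return quotient, rem


def limbs_to_int(limbs):
    """Helper for checking: assemble little-endian words into one integer."""
    return sum(word << (XLEN * i) for i, word in enumerate(limbs))
```

Because the running remainder is always smaller than the divisor, the overflow path is never taken in this loop as long as the divisor is nonzero.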

[[!tag opf_rfc]]

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic  | Description |
|------|------|------|---------|-----------|-------------|
| VA   | I    | #    | 3.0B    | maddedu   | Multiply-Add Extended Double Unsigned |
| VA   | I    | #    | 3.0B    | divmod2du | Divide/Modulo Quad-Double Unsigned |