# RFC ls013 Min/Max GPR/FPR

**URLs**:

* <https://libre-soc.org/openpower/sv/rfc/ls013/>
* <https://git.openpower.foundation/isa/PowerISA/issues/TODO>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1057>

**Severity**: Major

**Status**: New

**Date**: 14 Apr 2023

**Target**: v3.2B

**Source**: v3.1B

**Books and Section affected**:

```
    Book I Fixed-Point and Floating-Point Instructions
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    Instructions added
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of new GPR-based and FPR-based instructions
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    GPR, FPR, min, max, fmin, fmax
```

**Motivation**

Minimum and maximum are common operations that can take a surprisingly large
number of instructions to implement in software. Additionally, Vector
Reduce-Min/Max are common vector operations, and SVP64 Parallel Reduction needs
a single Scalar instruction in order to implement Reduce-Min/Max effectively.

**Notes and Observations**:

1. SVP64 REMAP Parallel Reduction needs a single Scalar instruction to
    work with, for best effectiveness.  With no SFFS minimum/maximum
    instructions, Simple-V min/max Parallel Reduction is severely compromised.
2. Once one FP min/max mode is implemented, the rest require little
    additional hardware.
3. Similar instructions exist in VSX (though not the IEEE 754-2019 variants).
    This is frequently used to justify not adding them. However, SVP64/VSX may
    have a different meaning from SVP64/SFFS, so it is *really* crucial to have
    SFFS ops, even if "equivalent" to VSX, in order for SVP64 not to be
    compromised (non-orthogonal).
4. FP min/max are rather complex to implement in software: the most commonly
    used FP max function, `fmax` from glibc, compiles to 32 instructions
    for SFFS.

**Changes**

Add the following entries to:

* the Appendices of Book I
* Book I 3.3.9 Fixed-Point Arithmetic Instructions
* Book I 4.6.6.1 Floating-Point Elementary Arithmetic Instructions
* Book I 1.6.1 and 1.6.2

----------------

\newpage{}

# Floating-Point Instructions

This group provides Floating-Point min/max. Because IEEE 754 was revised in
2019, there are now subtle differences between the min/max operations of the
2008 and 2019 editions. The variant is selected with a Mode Field, `FMM`.

## `FMM` -- Floating Min/Max Mode

<a id="fmm-floating-min-max-mode"></a>

| `FMM` | Extended Mnemonic             | Origin                         | Semantics                                       |
|-------|-------------------------------|--------------------------------|-------------------------------------------------|
| 0000  | fminnum08[s] FRT, FRA, FRB    | IEEE 754-2008                  | FRT = minNum(FRA, FRB)  (1)                     |
| 0001  | fmin19[s] FRT, FRA, FRB       | IEEE 754-2019                  | FRT = minimum(FRA, FRB)                         |
| 0010  | fminnum19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minimumNumber(FRA, FRB)                   |
| 0011  | fminc[s] FRT, FRA, FRB        | x86 minss or Win32's min macro | FRT = FRA \< FRB ? FRA : FRB                    |
| 0100  | fminmagnum08[s] FRT, FRA, FRB | IEEE 754-2008 (TODO: (3))      | FRT = minmaxmag(FRA, FRB, False, fminnum08) (2) |
| 0101  | fminmag19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, False, fmin19) (2)    |
| 0110  | fminmagnum19[s] FRT, FRA, FRB | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, False, fminnum19) (2) |
| 0111  | fminmagc[s] FRT, FRA, FRB     | -                              | FRT = minmaxmag(FRA, FRB, False, fminc) (2)     |
| 1000  | fmaxnum08[s] FRT, FRA, FRB    | IEEE 754-2008                  | FRT = maxNum(FRA, FRB)  (1)                     |
| 1001  | fmax19[s] FRT, FRA, FRB       | IEEE 754-2019                  | FRT = maximum(FRA, FRB)                         |
| 1010  | fmaxnum19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = maximumNumber(FRA, FRB)                   |
| 1011  | fmaxc[s] FRT, FRA, FRB        | x86 maxss or Win32's max macro | FRT = FRA > FRB ? FRA : FRB                     |
| 1100  | fmaxmagnum08[s] FRT, FRA, FRB | IEEE 754-2008 (TODO: (3))      | FRT = minmaxmag(FRA, FRB, True, fmaxnum08) (2)  |
| 1101  | fmaxmag19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, True, fmax19) (2)     |
| 1110  | fmaxmagnum19[s] FRT, FRA, FRB | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, True, fmaxnum19) (2)  |
| 1111  | fmaxmagc[s] FRT, FRA, FRB     | -                              | FRT = minmaxmag(FRA, FRB, True, fmaxc) (2)      |

Note (1): for the purposes of minNum/maxNum, -0.0 is defined to be less than
    +0.0. This is left unspecified in IEEE 754-2008.

Note (2): minmaxmag(x, y, is_max, fallback) is defined as:

```python
def minmaxmag(x, y, is_max, fallback):
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a  # swap
    if a:
        return x
    if b:
        return y
    # equal magnitudes, or NaN input(s)
    return fallback(x, y)
```
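The `fallback` argument above is one of the four base rules in the `FMM` table. To illustrate how those rules differ, here is a minimal, non-normative Python sketch of the `min` variants. It makes several simplifying assumptions: only quiet NaNs are modelled (signaling-NaN handling and FPSCR updates are omitted, so `fminnum08` and `fminnum19` behave identically here), the helper `_lt` is a name introduced for this sketch to apply the -0.0 < +0.0 ordering of Note (1), and the function names simply reuse the extended mnemonics.

```python
import math

def _lt(x, y):
    """x < y, except that -0.0 is ordered below +0.0 (Note (1))."""
    if x == 0.0 and y == 0.0:
        return math.copysign(1.0, x) < math.copysign(1.0, y)
    return x < y

def fminnum08(x, y):
    """FMM=0000 model: a quiet NaN loses to a number."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if _lt(x, y) else y

def fmin19(x, y):
    """FMM=0001 model: any NaN input gives a NaN result."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    return x if _lt(x, y) else y

# FMM=0010: differs from fminnum08 only for signaling NaNs,
# which this sketch does not model.
fminnum19 = fminnum08

def fminc(x, y):
    """FMM=0011 model: the plain C expression x < y ? x : y."""
    return x if x < y else y

nan = math.nan
print(fminnum08(nan, 1.0))   # 1.0  (the number wins)
print(fmin19(nan, 1.0))      # nan  (NaN propagates)
print(fminc(nan, 1.0))       # 1.0  (comparison false, second operand)
print(fminc(1.0, nan))       # nan  (comparison false, second operand)
print(fminnum08(0.0, -0.0))  # -0.0 (signed-zero ordering)
print(fminc(-0.0, 0.0))      # 0.0  (zeros compare equal, second operand)
```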

Note (3): TODO: confirm whether IEEE 754-2008 defines min/maxMagNum operations
    equivalent to IEEE 754-2019's minimum/maximumMagnitudeNumber.

----------------

\newpage{}

## Floating Minimum/Maximum MM-form

* fminmax FRT, FRA, FRB, FMM
* fminmax. FRT, FRA, FRB, FMM

```
    |0    |6    |11   |16   |21   |25  |31  |
    | PO  | FRT | FRA | FRB | FMM | XO | Rc |
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Special Registers altered:

```
    FX VXSNAN
    CR1     (if Rc=1)
```

Extended Mnemonics:

see [`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

----------

## Floating Minimum/Maximum Single MM-form

* fminmaxs FRT, FRA, FRB, FMM
* fminmaxs. FRT, FRA, FRB, FMM

```
    |0    |6    |11   |16   |21   |25  |31  |
    | PO  | FRT | FRA | FRB | FMM | XO | Rc |
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Special Registers altered:

```
    FX VXSNAN
    CR1     (if Rc=1)
```

Extended Mnemonics:

see [`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

----------

\newpage{}

# Fixed-Point Instructions

These are signed and unsigned, min or max.  SVP64 Prefixing defines Saturation
semantics, so Saturated variants of these instructions need not be proposed.

## `MMM` -- Integer Min/Max Mode

<a id="mmm-integer-min-max-mode"></a>

* bit 0: set if word variant else dword
* bit 1: set if signed else unsigned
* bit 2: set if max else min

| `MMM` | Extended Mnemonic | Semantics                                    |
|-------|-------------------|----------------------------------------------|
| 000   | `minu RT,RA,RB`   | `RT = (uint64_t)RA < (uint64_t)RB ? RA : RB` |
| 001   | `maxu RT,RA,RB`   | `RT = (uint64_t)RA > (uint64_t)RB ? RA : RB` |
| 010   | `mins RT,RA,RB`   | `RT =  (int64_t)RA < (int64_t)RB  ? RA : RB` |
| 011   | `maxs RT,RA,RB`   | `RT =  (int64_t)RA > (int64_t)RB  ? RA : RB` |
| 100   | `minuw RT,RA,RB`  | `RT = (uint32_t)RA < (uint32_t)RB ? RA : RB` |
| 101   | `maxuw RT,RA,RB`  | `RT = (uint32_t)RA > (uint32_t)RB ? RA : RB` |
| 110   | `minsw RT,RA,RB`  | `RT =  (int32_t)RA < (int32_t)RB  ? RA : RB` |
| 111   | `maxsw RT,RA,RB`  | `RT =  (int32_t)RA > (int32_t)RB  ? RA : RB` |

## Minimum/Maximum MM-Form

* minmax RT, RA, RB, MMM
* minmax. RT, RA, RB, MMM

```
    |0    |6    |11   |16   |21   |24 |25  |31  |
    | PO  | RT  | RA  | RB  | MMM | / | XO | Rc |
```

```
    a <- (RA|0)
    b <- (RB)
    if MMM[0] then  # word mode
        # shift left by XLEN/2 so that the dword comparison below
        # performs a word comparison of the original inputs
        a <- a[XLEN/2:XLEN-1] || [0] * XLEN/2
        b <- b[XLEN/2:XLEN-1] || [0] * XLEN/2
    if MMM[1] then  # signed mode
        # invert the sign bits so that the unsigned comparison below
        # performs a signed comparison of the original inputs
        a[0] <- ¬a[0]
        b[0] <- ¬b[0]
    # if Rc = 1 then store the result of comparing a and b to CR0
    if Rc = 1 then
        if a <u b then
            CR0 <- 0b100 || XER.SO
        if a = b then
            CR0 <- 0b001 || XER.SO
        if a >u b then
            CR0 <- 0b010 || XER.SO
    if MMM[2] then  # max mode
        # swap a and b so that the less-than comparison below
        # performs a greater-than comparison of the original inputs
        t <- a
        a <- b
        b <- t
    # store the entire selected source (even in word mode)
    if a <u b then RT <- (RA|0)
    else RT <- (RB)
```
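The pseudocode above can be modelled in a few lines of Python, which may help when checking individual `MMM` cases by hand. This is a sketch under stated assumptions, not a reference implementation: `minmax_model`, `XLEN` and `MASK` are names chosen here, operand values are passed in directly (the `(RA|0)` register selection is assumed to be already resolved), and `so` stands in for `XER.SO`.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def minmax_model(ra, rb, mmm, rc=False, so=0):
    """Return (RT, CR0); CR0 is None when rc is False.

    mmm is the 3-bit MMM field; MMM[0] is its most significant bit.
    """
    ra &= MASK
    rb &= MASK
    a, b = ra, rb
    if mmm & 0b100:  # MMM[0]: word mode, move the low word to the top
        a = (a << (XLEN // 2)) & MASK
        b = (b << (XLEN // 2)) & MASK
    if mmm & 0b010:  # MMM[1]: signed mode, flip the sign bits
        a ^= 1 << (XLEN - 1)
        b ^= 1 << (XLEN - 1)
    cr0 = None
    if rc:  # CR0 reflects the comparison before the max-mode swap
        if a < b:
            cr0 = 0b1000 | so
        elif a == b:
            cr0 = 0b0010 | so
        else:
            cr0 = 0b0100 | so
    if mmm & 0b001:  # MMM[2]: max mode, swap the comparison operands
        a, b = b, a
    # the entire selected source is returned, even in word mode
    rt = ra if a < b else rb
    return rt, cr0

print(minmax_model(3, 5, 0b000))                     # minu
print(minmax_model(3, 5, 0b001))                     # maxu
print(hex(minmax_model(-1, 1, 0b010)[0]))            # mins
print(hex(minmax_model(0x1_FFFF_FFFF, 2, 0b110)[0])) # minsw
```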

Compute the integer minimum/maximum according to `MMM` of `(RA|0)` and `(RB)`
and store the result in `RT`.

Special Registers altered:

```
    CR0     (if Rc=1)
```

Extended Mnemonics:

see [`MMM` -- Integer Min/Max Mode](#mmm-integer-min-max-mode)

----------

\newpage{}

# Instruction Formats

Add the following entries to Book I 1.6.1 Word Instruction Formats:

## MM-FORM

```
    |0    |6    |11   |16   |21   |24 |25  |31  |
    | PO  | FRT | FRA | FRB | FMM     | XO | Rc |
    | PO  | RT  | RA  | RB  | MMM | / | XO | Rc |
```

Add the following new fields to Book I 1.6.2 Word Instruction Fields:

```
    FMM (21:24)
        Field used to specify minimum/maximum mode for fminmax[s].

        Formats: MM

    MMM (21:23)
        Field used to specify minimum/maximum mode for integer minmax.

        Formats: MM
```
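As an illustration of the MM-Form layout, the following sketch packs the fields into a 32-bit word using the MSB0 bit positions shown above. The helper names (`_place`, `encode_fminmax`, `encode_minmax`, `decode_fmm`) are invented for this example, and because this RFC does not assign `PO` or `XO` values, callers must supply them; the zeros used below are placeholders, not real opcodes.

```python
def _place(value, first, last):
    """Put value into MSB0 bits first..last of a 32-bit word."""
    width = last - first + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"value {value} does not fit in {width} bits")
    return value << (31 - last)

def encode_fminmax(po, frt, fra, frb, fmm, xo, rc=0):
    """MM-Form, floating-point row: FMM occupies bits 21:24."""
    return (_place(po, 0, 5) | _place(frt, 6, 10) | _place(fra, 11, 15)
            | _place(frb, 16, 20) | _place(fmm, 21, 24)
            | _place(xo, 25, 30) | _place(rc, 31, 31))

def encode_minmax(po, rt, ra, rb, mmm, xo, rc=0):
    """MM-Form, integer row: MMM occupies bits 21:23."""
    # bit 24 is the reserved "/" field and is left as 0
    return (_place(po, 0, 5) | _place(rt, 6, 10) | _place(ra, 11, 15)
            | _place(rb, 16, 20) | _place(mmm, 21, 23)
            | _place(xo, 25, 30) | _place(rc, 31, 31))

def decode_fmm(word):
    """Extract FMM (bits 21:24) from an encoded word."""
    return (word >> (31 - 24)) & 0b1111

# PO=0 and XO=0 are placeholders only (no values are assigned in this RFC)
word = encode_fminmax(po=0, frt=1, fra=2, frb=3, fmm=0b1001, xo=0)
print(hex(word), bin(decode_fmm(word)))
```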

Add `MM` to the `Formats:` list for all of `FRT`, `FRA`, `FRB`, `XO (25:30)`,
`Rc`, `RT`, `RA` and `RB`.

----------

\newpage{}

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | Mnemonic | Description |
|------|------|------|---------|----------|-------------|
| MM   | I    | #    | 3.2B    | fminmax  | Floating Minimum/Maximum |
| MM   | I    | #    | 3.2B    | fminmaxs | Floating Minimum/Maximum Single |
| MM   | I    | #    | 3.2B    | minmax   | Minimum/Maximum |

## fmax instruction count

32 instructions are required in SFFS to emulate fmax.

```
#include <stdint.h>
#include <string.h>

inline uint64_t asuint64(double f) {
    union {
        double f;
        uint64_t i;
    } u = {f};
    return u.i;
}

inline int issignaling(double v) {
    // copied from glibc:
    // https://github.com/bminor/glibc/blob/e2756903329365134089d23548e9083d23bc3dd9/sysdeps/ieee754/dbl-64/math_config.h#L101
    uint64_t ix = asuint64(v);
    return 2 * (ix ^ 0x0008000000000000) > 2 * 0x7ff8000000000000ULL;
}

double fmax(double x, double y) {
    // copied from glibc:
    // https://github.com/bminor/glibc/blob/e2756903329365134089d23548e9083d23bc3dd9/math/s_fmax_template.c
    if(__builtin_isgreaterequal(x, y))
        return x;
    else if(__builtin_isless(x, y))
        return y;
    else if(issignaling(x) || issignaling(y))
        return x + y;
    else
        return __builtin_isnan(y) ? x : y;
}
```
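The `issignaling` test above relies on unsigned 64-bit wraparound: XOR-ing with `0x0008000000000000` flips the quiet bit, and doubling shifts the sign bit out, so only signaling NaNs end up above the threshold. A minimal Python sketch of the same check, with the wraparound made explicit by masking, is shown below. The function names are chosen for illustration, and `issignaling_bits` takes a raw bit pattern rather than a float, on the assumption that a Python float may not preserve a signaling NaN on every platform.

```python
import struct

MASK64 = (1 << 64) - 1

def asuint64(f):
    """Reinterpret a Python float (IEEE 754 binary64) as a 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", f))[0]

def issignaling_bits(ix):
    """Same comparison as the C code, on a raw binary64 bit pattern."""
    lhs = (2 * (ix ^ 0x0008000000000000)) & MASK64  # flip quiet bit, drop sign
    rhs = (2 * 0x7FF8000000000000) & MASK64
    return lhs > rhs

print(issignaling_bits(0x7FF0000000000001))       # signaling NaN pattern
print(issignaling_bits(0x7FF8000000000000))       # quiet NaN pattern
print(issignaling_bits(asuint64(float("inf"))))   # infinity
print(issignaling_bits(asuint64(1.0)))            # ordinary number
```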

Assembly listing:

```
    fmax(double, double):
        fcmpu 0,1,2
        fmr 0,1
        cror 30,1,2
        beq 7,.L12
        blt 0,.L13
        stfd 1,-16(1)
        lis 9,0x8
        li 8,-1
        sldi 9,9,32
        rldicr 8,8,0,11
        ori 2,2,0
        ld 10,-16(1)
        xor 10,10,9
        sldi 10,10,1
        cmpld 0,10,8
        bgt 0,.L5
        stfd 2,-16(1)
        ori 2,2,0
        ld 10,-16(1)
        xor 9,10,9
        sldi 9,9,1
        cmpld 0,9,8
        ble 0,.L6
.L5:
        fadd 1,0,2
        blr
.L13:
        fmr 1,2
        blr
.L6:
        fcmpu 0,2,2
        fmr 1,2
        bnulr 0
.L12:
        fmr 1,0
        blr
        .long 0
        .byte 0,9,0,0,0,0,0,0
```

[[!tag opf_rfc]]