# RFC ls013 Min/Max GPR/FPR

**URLs**:

* <https://libre-soc.org/openpower/sv/rfc/ls013/>
* <https://git.openpower.foundation/isa/PowerISA/issues/TODO>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1057>

**Severity**: Major

**Status**: New

**Date**: 14 Apr 2023

**Target**: v3.2B

**Source**: v3.1B

**Books and Section affected**:

```
    Book I Fixed-Point and Floating-Point Instructions
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    Instructions added
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of new GPR-based and FPR-based instructions
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    GPR, FPR, min, max, fmin, fmax
```

**Motivation**

Minimum/Maximum are common operations that can take an astounding number of
instructions to implement in software. Additionally, Vector Reduce-Min/Max are
common vector operations, and SVP64 Parallel Reduction needs a single Scalar
instruction in order to effectively implement Reduce-Min/Max.

**Notes and Observations**:

1. SVP64 REMAP Parallel Reduction needs a single Scalar instruction to
    work with, for best effectiveness.  With no SFFS minimum/maximum
    instructions, Simple-V min/max Parallel Reduction is severely compromised.
2. Once one FP min/max mode is implemented, the remaining modes require
    little additional hardware.
3. Similar instructions exist in VSX (though these do not follow IEEE 754-2019).
    Their existence is frequently used to justify not adding SFFS equivalents.
    However, SVP64/VSX may have different semantics from SVP64/SFFS, so it is
    *really* crucial to have SFFS ops, even if "equivalent" to VSX, in order
    for SVP64 not to be compromised (non-orthogonal).
4. FP min/max are rather complex to implement in software: the most commonly
    used FP max function, `fmax` from glibc, takes an astounding
    32 instructions when compiled for SFFS.
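
To illustrate why Parallel Reduction wants a single scalar instruction, here is a minimal Python sketch of a generic pairwise (tree) reduction. It assumes a plain binary-tree schedule, which is not necessarily the exact schedule SVP64 REMAP generates, and `tree_reduce` is a hypothetical helper. Every node of the tree applies one binary scalar operation, so a single `min`/`max` instruction slots straight in, whereas a multi-instruction software sequence would have to be repeated at every node.

```python
def tree_reduce(values, op):
    """Reduce `values` with binary `op` using a pairwise, log2-depth tree.

    Hypothetical illustration only: not the exact SVP64 REMAP schedule.
    """
    vals = list(values)
    if not vals:
        raise ValueError("cannot reduce an empty sequence")
    step = 1
    while step < len(vals):
        # at each level, fold element i + step into element i
        for i in range(0, len(vals) - step, 2 * step):
            vals[i] = op(vals[i], vals[i + step])
        step *= 2
    return vals[0]


print(tree_reduce([7, -3, 12, 5, 0], min))  # -3
print(tree_reduce([7, -3, 12, 5, 0], max))  # 12
```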

**Changes**

Add the following entries to:

* the Appendices of Book I
* Book I 3.3.9 Fixed-Point Arithmetic Instructions
* Book I 4.6.6.1 Floating-Point Elementary Arithmetic Instructions
* Book I 1.6.1 and 1.6.2

----------------

\newpage{}

# Floating-Point Instructions

This group provides Floating-Point min/max. However, since IEEE 754 was revised
in 2019, there are now subtle differences between the possible semantics. These
are selected with a Mode Field, `FMM`.

## `FMM` -- Floating Min/Max Mode

<a id="fmm-floating-min-max-mode"></a>

<!-- hyphens in table determine width of columns for pandoc --
please don't change just to make markdown source look better -->
| `FMM` | Extended Mnemonic               | Origin                            | Semantics                                         |
| ---   |---------------------------------|        -------------------        |---------------------------------------------------|
| 0000  | fminnum08[s] FRT, FRA, FRB      | IEEE 754-2008                     | FRT = minNum(FRA, FRB)  (1)                       |
| 0001  | fmin19[s] FRT, FRA, FRB         | IEEE 754-2019                     | FRT = minimum(FRA, FRB)                           |
| 0010  | fminnum19[s] FRT, FRA, FRB      | IEEE 754-2019                     | FRT = minimumNumber(FRA, FRB)                     |
| 0011  | fminc[s] FRT, FRA, FRB          | x86 minss or<br>Win32's min macro | FRT = FRA \< FRB ? FRA : FRB                      |
| 0100  | fminmagnum08[s] FRT, FRA, FRB   | IEEE 754-2008<br>(TODO: (3))      | FRT = minmaxmag(FRA, FRB, False, fminnum08) (2)   |
| 0101  | fminmag19[s] FRT, FRA, FRB      | IEEE 754-2019                     | FRT = minmaxmag(FRA, FRB, False, fmin19) (2)      |
| 0110  | fminmagnum19[s] FRT, FRA, FRB   | IEEE 754-2019                     | FRT = minmaxmag(FRA, FRB, False, fminnum19) (2)   |
| 0111  | fminmagc[s] FRT, FRA, FRB       | -                                 | FRT = minmaxmag(FRA, FRB, False, fminc) (2)       |
| 1000  | fmaxnum08[s] FRT, FRA, FRB      | IEEE 754-2008                     | FRT = maxNum(FRA, FRB)  (1)                       |
| 1001  | fmax19[s] FRT, FRA, FRB         | IEEE 754-2019                     | FRT = maximum(FRA, FRB)                           |
| 1010  | fmaxnum19[s] FRT, FRA, FRB      | IEEE 754-2019                     | FRT = maximumNumber(FRA, FRB)                     |
| 1011  | fmaxc[s] FRT, FRA, FRB          | x86 maxss or<br>Win32's max macro | FRT = FRA > FRB ? FRA : FRB                       |
| 1100  | fmaxmagnum08[s] FRT, FRA, FRB   | IEEE 754-2008<br>(TODO: (3))      | FRT = minmaxmag(FRA, FRB, True, fmaxnum08) (2)    |
| 1101  | fmaxmag19[s] FRT, FRA, FRB      | IEEE 754-2019                     | FRT = minmaxmag(FRA, FRB, True, fmax19) (2)       |
| 1110  | fmaxmagnum19[s] FRT, FRA, FRB   | IEEE 754-2019                     | FRT = minmaxmag(FRA, FRB, True, fmaxnum19) (2)    |
| 1111  | fmaxmagc[s] FRT, FRA, FRB       | -                                 | FRT = minmaxmag(FRA, FRB, True, fmaxc) (2)        |

Note (1): for the purposes of minNum/maxNum, -0.0 is defined to be less than
    +0.0. This is left unspecified in IEEE 754-2008.

Note (2): minmaxmag(x, y, is_max, fallback) is defined as:

```python
def minmaxmag(x, y, is_max, fallback):
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a  # swap
    if a:
        return x
    if b:
        return y
    # equal magnitudes, or NaN input(s)
    return fallback(x, y)
```

Note (3): TODO: confirm that IEEE 754-2008's minNumMag/maxNumMag are equivalent
    to IEEE 754-2019's minimumMagnitudeNumber/maximumMagnitudeNumber.
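
The table and notes above can be tied together in a small executable reference model. The following Python sketch is a hypothetical illustration, not part of the proposed ISA text: `fminmax_model`, `_lt`, `_MIN` and `_MAX` are illustrative names, `FMM` is decoded MSB0 as the table implies (bit 0 selects max, bit 1 selects magnitude comparison, bits 2-3 select the base variant), and only quiet NaNs are handled, because Python floats cannot portably carry signaling NaNs. As a consequence `fminnum08` and `fminnum19` coincide in this sketch; the real operations differ in their signaling-NaN behaviour.

```python
import math


def _lt(x, y):
    """x < y, except that -0.0 counts as less than +0.0 (see Note (1))."""
    if x == 0.0 and y == 0.0:
        return math.copysign(1.0, x) < math.copysign(1.0, y)
    return x < y


# Base (non-magnitude) operations. Hypothetical sketch: quiet NaNs only,
# signaling-NaN behaviour is not modelled.
def fminnum08(x, y):
    if math.isnan(x):
        return y              # NaN only if both inputs are NaN
    if math.isnan(y):
        return x
    return x if _lt(x, y) else y


def fmin19(x, y):
    if math.isnan(x) or math.isnan(y):
        return math.nan       # IEEE 754-2019 minimum propagates NaN
    return x if _lt(x, y) else y


fminnum19 = fminnum08         # identical to minNum for quiet NaNs


def fminc(x, y):
    return x if x < y else y  # FRA < FRB ? FRA : FRB


def fmaxnum08(x, y):
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if _lt(y, x) else y


def fmax19(x, y):
    if math.isnan(x) or math.isnan(y):
        return math.nan
    return x if _lt(y, x) else y


fmaxnum19 = fmaxnum08         # identical to maxNum for quiet NaNs


def fmaxc(x, y):
    return x if x > y else y  # FRA > FRB ? FRA : FRB


def minmaxmag(x, y, is_max, fallback):
    # as defined in Note (2)
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a
    if a:
        return x
    if b:
        return y
    return fallback(x, y)


_MIN = (fminnum08, fmin19, fminnum19, fminc)
_MAX = (fmaxnum08, fmax19, fmaxnum19, fmaxc)


def fminmax_model(fra, frb, fmm):
    """Hypothetical reference model of fminmax[s] for a 4-bit FMM value."""
    is_max = bool(fmm & 0b1000)  # FMM bit 0: max instead of min
    is_mag = bool(fmm & 0b0100)  # FMM bit 1: compare magnitudes first
    base = (_MAX if is_max else _MIN)[fmm & 0b0011]  # bits 2-3: variant
    if is_mag:
        return minmaxmag(fra, frb, is_max, base)
    return base(fra, frb)


print(fminmax_model(1.0, math.nan, 0b0000))  # 1.0 (fminnum08 ignores the NaN)
print(fminmax_model(1.0, math.nan, 0b0001))  # nan (fmin19 propagates it)
print(fminmax_model(-0.0, 0.0, 0b0000))      # -0.0
print(fminmax_model(-3.0, 2.0, 0b0100))      # 2.0 (smaller magnitude)
```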

----------------

\newpage{}

## Floating Minimum/Maximum MM-form

* fminmax FRT, FRA, FRB, FMM
* fminmax. FRT, FRA, FRB, FMM

```
    |0    |6    |11   |16   |21   |25  |31  |
    | PO  | FRT | FRA | FRB | FMM | XO | Rc |
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Special Registers altered:

```
    FX VXSNAN
    CR1     (if Rc=1)
```

Extended Mnemonics:

see [`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

----------

## Floating Minimum/Maximum Single MM-form

* fminmaxs FRT, FRA, FRB, FMM
* fminmaxs. FRT, FRA, FRB, FMM

```
    |0    |6    |11   |16   |21   |25  |31  |
    | PO  | FRT | FRA | FRB | FMM | XO | Rc |
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Special Registers altered:

```
    FX VXSNAN
    CR1     (if Rc=1)
```

Extended Mnemonics:

see [`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

----------

\newpage{}

# Fixed-Point Instructions

These cover signed and unsigned minimum and maximum, in both doubleword and word
variants. SVP64 Prefixing already defines Saturation semantics, therefore
Saturated variants of these instructions need not be proposed.

## `MMM` -- Integer Min/Max Mode

<a id="mmm-integer-min-max-mode"></a>

Bits are numbered MSB0, so bit 0 is the leftmost bit of `MMM`:

* bit 0: set if word variant else dword
* bit 1: set if signed else unsigned
* bit 2: set if max else min

| `MMM` | Extended Mnemonic | Semantics                                    |
|-------|-------------------|----------------------------------------------|
| 000   | `minu RT,RA,RB`   | `RT = (uint64_t)RA < (uint64_t)RB ? RA : RB` |
| 001   | `maxu RT,RA,RB`   | `RT = (uint64_t)RA > (uint64_t)RB ? RA : RB` |
| 010   | `mins RT,RA,RB`   | `RT =  (int64_t)RA < (int64_t)RB  ? RA : RB` |
| 011   | `maxs RT,RA,RB`   | `RT =  (int64_t)RA > (int64_t)RB  ? RA : RB` |
| 100   | `minuw RT,RA,RB`  | `RT = (uint32_t)RA < (uint32_t)RB ? RA : RB` |
| 101   | `maxuw RT,RA,RB`  | `RT = (uint32_t)RA > (uint32_t)RB ? RA : RB` |
| 110   | `minsw RT,RA,RB`  | `RT =  (int32_t)RA < (int32_t)RB  ? RA : RB` |
| 111   | `maxsw RT,RA,RB`  | `RT =  (int32_t)RA > (int32_t)RB  ? RA : RB` |
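
Read as three independent flags, the `MMM` table can be checked with a direct model of its Semantics column. This Python sketch is hypothetical: `minmax_direct` and `_cast` are illustrative names, it assumes XLEN = 64 with register values held as non-negative Python integers, and `ra` stands for the value of `(RA|0)`.

```python
XLEN = 64  # assumption: 64-bit GPRs


def _cast(value, bits, signed):
    """Reinterpret the low `bits` bits of `value` like a C (u)intN_t cast."""
    value &= (1 << bits) - 1
    if signed and value >> (bits - 1):
        value -= 1 << bits
    return value


def minmax_direct(ra, rb, mmm):
    """Hypothetical model of the MMM table's Semantics column.

    MSB0 bits: bit 0 (0b100) = word, bit 1 (0b010) = signed,
    bit 2 (0b001) = max. Only the comparison is narrowed in word mode;
    the whole selected register value is returned.
    """
    word = bool(mmm & 0b100)
    signed = bool(mmm & 0b010)
    is_max = bool(mmm & 0b001)
    bits = XLEN // 2 if word else XLEN
    a = _cast(ra, bits, signed)
    b = _cast(rb, bits, signed)
    take_ra = a > b if is_max else a < b
    return ra if take_ra else rb


print(hex(minmax_direct(0xFFFF_FFFF_FFFF_FFFF, 1, 0b000)))  # 0x1 (minu)
print(hex(minmax_direct(0xFFFF_FFFF_FFFF_FFFF, 1, 0b010)))  # 0xffffffffffffffff (mins: -1 < 1)
print(hex(minmax_direct(0x1_0000_0005, 7, 0b100)))          # 0x100000005 (minuw: 5 < 7)
```

The last example shows the word variants comparing only the low words while still returning the full source register.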

## Minimum/Maximum MM-Form

* minmax RT, RA, RB, MMM
* minmax. RT, RA, RB, MMM

```
    |0    |6    |11   |16   |21   |24 |25  |31  |
    | PO  | RT  | RA  | RB  | MMM | / | XO | Rc |
```

```
    a <- (RA|0)
    b <- (RB)
    if MMM[0] then  # word mode
        # shift left by XLEN/2 to make the dword comparison
        # do word comparison of the original inputs
        a <- a[XLEN/2:XLEN-1] || [0] * XLEN/2
        b <- b[XLEN/2:XLEN-1] || [0] * XLEN/2
    if MMM[1] then  # signed mode
        # invert sign bits to make the unsigned comparison
        # do signed comparison of the original inputs
        a[0] <- ¬a[0]
        b[0] <- ¬b[0]
    # if Rc = 1 then store the result of comparing a and b to CR0
    if Rc = 1 then
        if a <u b then
            CR0 <- 0b100 || XER.SO
        if a = b then
            CR0 <- 0b001 || XER.SO
        if a >u b then
            CR0 <- 0b010 || XER.SO
    if MMM[2] then  # max mode
        # swap a and b to make the less than comparison do
        # greater than comparison of the original inputs
        t <- a
        a <- b
        b <- t
    # store the entire selected source (even in word mode)
    if a <u b then RT <- (RA|0)
    else RT <- (RB)
```
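
The pseudocode uses two bit tricks so that one unsigned comparator serves all eight modes: shifting the low word into the upper half makes a doubleword comparison order the original words, and inverting the sign bit maps two's-complement order onto unsigned order (the most negative value becomes 0). A minimal Python transliteration follows; `minmax_pseudocode` is a hypothetical name, XLEN = 64 is assumed, and XER.SO is passed in as the `so` argument rather than read from a register. CR0 is returned as a 4-bit LT/GT/EQ/SO value, or `None` when `rc=0`.

```python
XLEN = 64                  # assumption: 64-bit GPRs
MASK = (1 << XLEN) - 1


def minmax_pseudocode(ra_or_0, rb, mmm, rc=0, so=0):
    """Hypothetical transliteration of the minmax pseudocode.

    `ra_or_0` is the value of (RA|0). MMM is MSB0: 0b100 = word,
    0b010 = signed, 0b001 = max. Returns (RT, CR0 or None).
    """
    a, b = ra_or_0 & MASK, rb & MASK
    if mmm & 0b100:        # word mode: move the low word into the high half
        a = (a << (XLEN // 2)) & MASK
        b = (b << (XLEN // 2)) & MASK
    if mmm & 0b010:        # signed mode: flip sign bits, then compare unsigned
        sign = 1 << (XLEN - 1)
        a ^= sign
        b ^= sign
    cr0 = None
    if rc:                 # CR0 = LT || GT || EQ || SO, taken before any swap
        if a < b:
            cr0 = 0b1000 | so
        elif a == b:
            cr0 = 0b0010 | so
        else:
            cr0 = 0b0100 | so
    if mmm & 0b001:        # max mode: swap so that "<" selects the larger
        a, b = b, a
    rt = ra_or_0 if a < b else rb  # whole source, even in word mode
    return rt, cr0


M = (1 << XLEN) - 1
print(minmax_pseudocode(M, 1, 0b010, rc=1))  # (18446744073709551615, 8): -1 < 1, CR0 = LT
```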

Compute the integer minimum or maximum of `(RA|0)` and `(RB)`, as selected by
`MMM`, and store the result in `RT`. In word mode only the low words are
compared, but the entire selected source register is stored in `RT`. If `Rc=1`,
CR0 records whether `(RA|0)` is less than, greater than or equal to `(RB)` under
the selected width and signedness, regardless of whether min or max is selected.

Special Registers altered:

```
    CR0     (if Rc=1)
```

Extended Mnemonics:

see [`MMM` -- Integer Min/Max Mode](#mmm-integer-min-max-mode)

----------

\newpage{}

# Instruction Formats

Add the following entries to Book I 1.6.1 Word Instruction Formats:

## MM-FORM

```
    |0    |6    |11   |16   |21   |24 |25  |31  |
    | PO  | FRT | FRA | FRB | FMM     | XO | Rc |
    | PO  | RT  | RA  | RB  | MMM | / | XO | Rc |
```
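
Power ISA numbers instruction bits MSB0, so a field occupying bits `start..end` is shifted left by `31 - end`. The Python sketch below packs both MM-form layouts on that basis. It is illustrative only: `field`, `encode_fminmax` and `encode_minmax` are hypothetical helpers, and because this RFC does not assign the primary opcode (PO) or extended opcode (XO), the values used in the example are placeholders, not real encodings.

```python
def field(value, start, end):
    """Place `value` in MSB0 bits start..end (inclusive) of a 32-bit word."""
    width = end - start + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return value << (31 - end)


def encode_fminmax(po, xo, frt, fra, frb, fmm, rc=0):
    # FMM occupies bits 21:24
    return (field(po, 0, 5) | field(frt, 6, 10) | field(fra, 11, 15)
            | field(frb, 16, 20) | field(fmm, 21, 24) | field(xo, 25, 30)
            | field(rc, 31, 31))


def encode_minmax(po, xo, rt, ra, rb, mmm, rc=0):
    # MMM occupies bits 21:23; bit 24 is reserved ("/") and left as zero
    return (field(po, 0, 5) | field(rt, 6, 10) | field(ra, 11, 15)
            | field(rb, 16, 20) | field(mmm, 21, 23) | field(xo, 25, 30)
            | field(rc, 31, 31))


# Placeholder opcodes: NOT assigned by this RFC, illustration only.
HYPOTHETICAL_PO = 0
HYPOTHETICAL_XO = 0
word = encode_minmax(HYPOTHETICAL_PO, HYPOTHETICAL_XO,
                     rt=3, ra=4, rb=5, mmm=0b011)  # maxs r3,r4,r5
print(f"{word:#010x}")  # 0x00642b00
```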

Add the following new fields to Book I 1.6.2 Word Instruction Fields:

```
    FMM (21:24)
        Field used to specify minimum/maximum mode for fminmax[s].

        Formats: MM

    MMM (21:23)
        Field used to specify minimum/maximum mode for integer minmax.

        Formats: MM
```

Add `MM` to the `Formats:` list for all of `FRT`, `FRA`, `FRB`, `XO (25:30)`,
`Rc`, `RT`, `RA` and `RB`.

----------

\newpage{}

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | Mnemonic | Description |
|------|------|------|---------|----------|-------------|
| MM   | I    | #    | 3.2B    | fminmax  | Floating Minimum/Maximum |
| MM   | I    | #    | 3.2B    | fminmaxs | Floating Minimum/Maximum Single |
| MM   | I    | #    | 3.2B    | minmax   | Minimum/Maximum |

## fmax instruction count

32 instructions are required in SFFS to emulate `fmax`, using the following
glibc-derived source:

```
#include <stdint.h>
#include <string.h>

inline uint64_t asuint64(double f) {
    union {
        double f;
        uint64_t i;
    } u = {f};
    return u.i;
}

inline int issignaling(double v) {
    // copied from glibc:
    // https://github.com/bminor/glibc/blob/e2756903329365134089d23548e9083d23bc3dd9/sysdeps/ieee754/dbl-64/math_config.h#L101
    uint64_t ix = asuint64(v);
    return 2 * (ix ^ 0x0008000000000000) > 2 * 0x7ff8000000000000ULL;
}

double fmax(double x, double y) {
    // copied from glibc:
    // https://github.com/bminor/glibc/blob/e2756903329365134089d23548e9083d23bc3dd9/math/s_fmax_template.c
    if(__builtin_isgreaterequal(x, y))
        return x;
    else if(__builtin_isless(x, y))
        return y;
    else if(issignaling(x) || issignaling(y))
        return x + y;
    else
        return __builtin_isnan(y) ? x : y;
}
```
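
The `issignaling` test above relies on the binary64 NaN encoding: all exponent bits set, a non-zero fraction, and the most significant fraction bit (0x0008000000000000) set for a quiet NaN and clear for a signaling NaN. After XORing with that bit and doubling (which discards the sign bit modulo 2^64), only signaling NaNs exceed `2 * 0x7ff8000000000000`. The following Python transliteration works on raw 64-bit patterns, since a signaling NaN is not guaranteed to survive conversion to a Python float; `issignaling_bits` and `asuint64` are illustrative names, and the explicit mask stands in for C's unsigned wrap-around.

```python
import struct

MASK64 = (1 << 64) - 1
QUIET_BIT = 0x0008000000000000


def asuint64(f):
    """Raw IEEE 754 binary64 bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", f))[0]


def issignaling_bits(ix):
    """Python version of the glibc-derived issignaling(), on a raw pattern."""
    flipped = ix ^ QUIET_BIT                 # flip the quiet bit
    doubled = (2 * flipped) & MASK64         # drop the sign bit, as C's uint64_t does
    return doubled > 2 * 0x7FF8000000000000  # i.e. > 0xFFF0000000000000


print(issignaling_bits(0x7FF0000000000001))      # True  (signaling NaN)
print(issignaling_bits(0x7FF8000000000000))      # False (quiet NaN)
print(issignaling_bits(asuint64(float("inf"))))  # False
print(issignaling_bits(asuint64(1.0)))           # False
```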

Assembly listing:

```
    fmax(double, double):
        fcmpu 0,1,2
        fmr 0,1
        cror 30,1,2
        beq 7,.L12
        blt 0,.L13
        stfd 1,-16(1)
        lis 9,0x8
        li 8,-1
        sldi 9,9,32
        rldicr 8,8,0,11
        ori 2,2,0
        ld 10,-16(1)
        xor 10,10,9
        sldi 10,10,1
        cmpld 0,10,8
        bgt 0,.L5
        stfd 2,-16(1)
        ori 2,2,0
        ld 10,-16(1)
        xor 9,10,9
        sldi 9,9,1
        cmpld 0,9,8
        ble 0,.L6
.L5:
        fadd 1,0,2
        blr
.L13:
        fmr 1,2
        blr
.L6:
        fcmpu 0,2,2
        fmr 1,2
        bnulr 0
.L12:
        fmr 1,0
        blr
        .long 0
        .byte 0,9,0,0,0,0,0,0
```

[[!tag opf_rfc]]
