From: Jacob Lifshay
Date: Tue, 11 Apr 2023 06:04:47 +0000 (-0700)
Subject: add fmin/max and mark as high priority
X-Git-Tag: opf_rfc_ls012_v1~10
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=a67c976f69b56b46bd0ff9849f9a691b31e4f4bf;p=libreriscv.git

add fmin/max and mark as high priority
---

diff --git a/openpower/sv/rfc/ls012/optable.csv b/openpower/sv/rfc/ls012/optable.csv
index 56503e7af..609952497 100644
--- a/openpower/sv/rfc/ls012/optable.csv
+++ b/openpower/sv/rfc/ls012/optable.csv
@@ -126,21 +126,21 @@ fpown(s), TBD, low, 10, yes, EXT2xx, no, transcendentals, 2R1W1w
 fpowr(s), TBD, low, 10, yes, EXT2xx, no, transcendentals, 2R1W1w
 frootn(s), TBD, low, 10, yes, EXT2xx, no, transcendentals, 2R1W1w
 fhypot(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fminnum08(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmaxnum08(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmin19(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmax19(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fminnum19(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmaxnum19(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fminc(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmaxc(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fminmagnum08(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmaxmagnum08(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fminmag19(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmaxmag19(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fminmagnum19(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmaxmagnum19(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fminmagc(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
-fmaxmagc(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
+fminnum08(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmaxnum08(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmin19(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmax19(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fminnum19(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmaxnum19(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fminc(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmaxc(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fminmagnum08(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmaxmagnum08(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fminmag19(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmaxmag19(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fminmagnum19(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmaxmagnum19(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fminmagc(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
+fmaxmagc(s), TBD, high, 10, yes, TBD, no, transcendentals, 2R1W1w
 fmod(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
 fremainder(s), TBD, TBD, 10, yes, TBD, no, transcendentals, 2R1W1w
diff --git a/openpower/transcendentals.mdwn b/openpower/transcendentals.mdwn
index f4b7f14b6..388865355 100644
--- a/openpower/transcendentals.mdwn
+++ b/openpower/transcendentals.mdwn
@@ -42,6 +42,7 @@ TODO: rename extension subsets -- we're not on RISC-V anymore.
 acosh, atanh (can be synthesised - see below)
 * **ZftransAdv**: much more complex to implement in hardware
 * **Zfrsqrt**: Reciprocal square-root.
+* **Zfminmax**: Min/Max.
 
 Minimum recommended requirements for 3D: Zftrans, Ztrignpi,
 Zarctrignpi, with Ztrigpi and Zarctrigpi as augmentations.
@@ -233,22 +234,22 @@ Note (6) 4xf32-only, requires VMX.
 | fpowr | x power of y (x +ve) | FRT = exp(FRA log(FRB)) | ZftransAdv |
 | frootn | x power 1/n (n integer) | FRT = pow(FRA, 1/RB) | ZftransAdv |
 | fhypot | hypotenuse | FRT = sqrt(FRA^2 + FRB^2) | ZftransAdv |
-| fminnum08 | IEEE 754-2008 minNum | FRT = minNum(FRA, FRB) (1) | TBD |
-| fmaxnum08 | IEEE 754-2008 maxNum | FRT = maxNum(FRA, FRB) (1) | TBD |
-| fmin19 | IEEE 754-2019 minimum | FRT = minimum(FRA, FRB) | TBD |
-| fmax19 | IEEE 754-2019 maximum | FRT = maximum(FRA, FRB) | TBD |
-| fminnum19 | IEEE 754-2019 minimumNumber | FRT = minimumNumber(FRA, FRB) | TBD |
-| fmaxnum19 | IEEE 754-2019 maximumNumber | FRT = maximumNumber(FRA, FRB) | TBD |
-| fminc | C ternary-op minimum | FRT = FRA \< FRB ? FRA : FRB | TBD |
-| fmaxc | C ternary-op maximum | FRT = FRA > FRB ? FRA : FRB | TBD |
-| fminmagnum08 | IEEE 754-2008 minNumMag | FRT = minmaxmag(FRA, FRB, False, fminnum08) (2)| TBD |
-| fmaxmagnum08 | IEEE 754-2008 maxNumMag | FRT = minmaxmag(FRA, FRB, True, fmaxnum08) (2) | TBD |
-| fminmag19 | IEEE 754-2019 minimumMagnitude | FRT = minmaxmag(FRA, FRB, False, fmin19) (2) | TBD |
-| fmaxmag19 | IEEE 754-2019 maximumMagnitude | FRT = minmaxmag(FRA, FRB, True, fmax19) (2) | TBD |
-| fminmagnum19 | IEEE 754-2019 minimumMagnitudeNumber | FRT = minmaxmag(FRA, FRB, False, fminnum19) (2)| TBD |
-| fmaxmagnum19 | IEEE 754-2019 maximumMagnitudeNumber | FRT = minmaxmag(FRA, FRB, True, fmaxnum19) (2) | TBD |
-| fminmagc | C ternary-op minimum magnitude | FRT = minmaxmag(FRA, FRB, False, fminc) (2) | TBD |
-| fmaxmagc | C ternary-op maximum magnitude | FRT = minmaxmag(FRA, FRB, True, fmaxc) (2) | TBD |
+| fminnum08 | IEEE 754-2008 minNum | FRT = minNum(FRA, FRB) (1) | Zfminmax |
+| fmaxnum08 | IEEE 754-2008 maxNum | FRT = maxNum(FRA, FRB) (1) | Zfminmax |
+| fmin19 | IEEE 754-2019 minimum | FRT = minimum(FRA, FRB) | Zfminmax |
+| fmax19 | IEEE 754-2019 maximum | FRT = maximum(FRA, FRB) | Zfminmax |
+| fminnum19 | IEEE 754-2019 minimumNumber | FRT = minimumNumber(FRA, FRB) | Zfminmax |
+| fmaxnum19 | IEEE 754-2019 maximumNumber | FRT = maximumNumber(FRA, FRB) | Zfminmax |
+| fminc | C ternary-op minimum | FRT = FRA \< FRB ? FRA : FRB | Zfminmax |
+| fmaxc | C ternary-op maximum | FRT = FRA > FRB ? FRA : FRB | Zfminmax |
+| fminmagnum08 | IEEE 754-2008 minNumMag | FRT = minmaxmag(FRA, FRB, False, fminnum08) (2)| Zfminmax |
+| fmaxmagnum08 | IEEE 754-2008 maxNumMag | FRT = minmaxmag(FRA, FRB, True, fmaxnum08) (2) | Zfminmax |
+| fminmag19 | IEEE 754-2019 minimumMagnitude | FRT = minmaxmag(FRA, FRB, False, fmin19) (2) | Zfminmax |
+| fmaxmag19 | IEEE 754-2019 maximumMagnitude | FRT = minmaxmag(FRA, FRB, True, fmax19) (2) | Zfminmax |
+| fminmagnum19 | IEEE 754-2019 minimumMagnitudeNumber | FRT = minmaxmag(FRA, FRB, False, fminnum19) (2)| Zfminmax |
+| fmaxmagnum19 | IEEE 754-2019 maximumMagnitudeNumber | FRT = minmaxmag(FRA, FRB, True, fmaxnum19) (2) | Zfminmax |
+| fminmagc | C ternary-op minimum magnitude | FRT = minmaxmag(FRA, FRB, False, fminc) (2) | Zfminmax |
+| fmaxmagc | C ternary-op maximum magnitude | FRT = minmaxmag(FRA, FRB, True, fmaxc) (2) | Zfminmax |
 | fmod | modulus | FRT = fmod(FRA, FRB) | TBD |
 | fremainder | IEEE 754 remainder | FRT = remainder(FRA, FRB) | TBD |
 
@@ -327,6 +328,8 @@ the less common subsets are still required for IEEE754 HPC.
 MALI Midgard, an embedded / mobile 3D GPU, for example only has the
 following opcodes:
 
+    28 - fmin
+    2C - fmax
     E8 - fatan_pt2
     F0 - frcp (reciprocal)
     F2 - frsqrt (inverse square root, 1/sqrt(x))
@@ -343,6 +346,7 @@ Vivante Embedded/Mobile 3D (etnaviv
 )
 only has the following:
 
+    fmin/fmax (implemented using SELECT)
     sin, cos2pi
     cos, sin2pi
     log2, exp
@@ -354,6 +358,7 @@ It also has fast variants of some of these, as a CSR Mode.
 AMD's R600 GPU (R600\_Instruction\_Set\_Architecture.pdf) and the
 RDNA ISA (RDNA\_Shader\_ISA\_5August2019.pdf, Table 22, Section 6.3) have:
 
+    MIN/MAX/MIN_DX10/MAX_DX10
     COS2PI (appx)
     EXP2
     LOG (IEEE754)
@@ -363,7 +368,7 @@ RDNA ISA (RDNA\_Shader\_ISA\_5August2019.pdf, Table 22, Section 6.3) have:
     SIN2PI (appx)
 
 AMD RDNA has F16 and F32 variants of all the above, and also has F64
-variants of SQRT, RSQRT and RECIP. It is interesting that even the
+variants of SQRT, RSQRT, MIN, MAX, and RECIP. It is interesting that even the
 modern high-end AMD GPU does not have TAN or ATAN, where MALI Midgard
 does.
 
@@ -482,6 +487,28 @@ is acceptable for 3D.
 
 Therefore they are their own subset extensions.
 
+### Zfminmax
+
+* fminnum08 fmaxnum08
+* fmin19 fmax19
+* fminnum19 fmaxnum19
+* fminc fmaxc
+* fminmagnum08 fmaxmagnum08
+* fminmag19 fmaxmag19
+* fminmagnum19 fmaxmagnum19
+* fminmagc fmaxmagc
+
+These are commonly used for vector reductions, where having them be a single
+instruction is critical. They are also commonly used in GPU shaders, HPC, and
+general-purpose FP algorithms.
+
+These min and max operations are quite cheap to implement hardware-wise,
+being comparable in cost to fcmp + some muxes. They're all in one extension
+because once you implement some of them, the rest require only slightly more
+hardware complexity.
+
+Therefore they are their own subset extension.
+
 # Synthesis, Pseudo-code ops and macro-ops
 
 The pseudo-ops are best left up to the compiler rather than being actual