# Zftrans - transcendental operations

See:

* <http://bugs.libre-riscv.org/show_bug.cgi?id=127>
* <https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
* Discussion: <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002342.html>
* [[rv_major_opcode_1010011]] for opcode listing.
* [[zfpacc_proposal]] for accuracy settings proposal

Extension subsets:

* **Zftrans**: standard transcendentals (best suited to 3D)
* **ZftransExt**: extra functions (useful, not generally needed for 3D,
  can be synthesised using Zftrans)
* **Ztrigpi**: trig. xxx-pi: sinpi, cospi, tanpi
* **Ztrignpi**: trig. non-xxx-pi: sin, cos, tan
* **Zarctrigpi**: arc-trig. a-xxx-pi: atan2pi, asinpi, acospi
* **Zarctrignpi**: arc-trig. non-a-xxx-pi: atan2, asin, acos
* **Zfhyp**: hyperbolic/inverse-hyperbolic: sinh, cosh, tanh, asinh,
  acosh, atanh (can be synthesised - see below)
* **ZftransAdv**: much more complex to implement in hardware
* **Zfrsqrt**: Reciprocal square-root.

Minimum recommended requirements for 3D: Zftrans, Ztrigpi, Zarctrigpi,
Zarctrignpi

[[!toc levels=2]]

# TODO:

* Decision on accuracy, moved to [[zfpacc_proposal]]
  <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002355.html>
* Errors **MUST** be repeatable.
* How about four Platform Specifications? 3DUNIX, UNIX, 3DEmbedded and Embedded?
  <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002361.html>
  Accuracy requirements for dual (triple) purpose implementations must
  meet the higher standard.
* Reciprocal Square-root is in its own separate extension (Zfrsqrt) as
  it is desirable on its own by other implementors.  This is to be evaluated.

# Proposed Opcodes vs Khronos OpenCL Opcodes <a name="khronos_equiv"></a>

This list shows the (direct) equivalence between proposed opcodes and
their Khronos OpenCL equivalents.

See
<https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>

Special FP16 opcodes are *not* being proposed, except by indirect / inherent
use of the "fmt" field that is already present in the RISC-V Specification.

"Native" opcodes are *not* being proposed: implementors will be expected
to use the (equivalent) proposed opcode covering the same function.

"Fast" opcodes are *not* being proposed, because the Khronos Specification
fast\_length, fast\_normalize and fast\_distance OpenCL opcodes require
vectors (or can be done as scalar operations using other RISC-V instructions).

The OpenCL FP32 opcodes are **direct** equivalents to the proposed opcodes.
Deviation from conformance with the Khronos Specification - including the
Khronos Specification accuracy requirements - is not an option.

[[!table data="""
Proposed opcode | OpenCL FP32 | OpenCL FP16 | OpenCL native | OpenCL fast |
FSIN            | sin         | half\_sin   | native\_sin   | NONE        |
FCOS            | cos         | half\_cos   | native\_cos   | NONE        |
FTAN            | tan         | half\_tan   | native\_tan   | NONE        |
FASIN           | asin        | NONE        | NONE          | NONE        |
FACOS           | acos        | NONE        | NONE          | NONE        |
FSINPI          | sinpi       | NONE        | NONE          | NONE        |
FCOSPI          | cospi       | NONE        | NONE          | NONE        |
FTANPI          | tanpi       | NONE        | NONE          | NONE        |
FASINPI         | asinpi      | NONE        | NONE          | NONE        |
FACOSPI         | acospi      | NONE        | NONE          | NONE        |
FATANPI         | atanpi      | NONE        | NONE          | NONE        |
FSINH           | sinh        | NONE        | NONE          | NONE        |
FCOSH           | cosh        | NONE        | NONE          | NONE        |
FTANH           | tanh        | NONE        | NONE          | NONE        |
FASINH          | asinh       | NONE        | NONE          | NONE        |
FACOSH          | acosh       | NONE        | NONE          | NONE        |
FATANH          | atanh       | NONE        | NONE          | NONE        |
FRSQRT          | rsqrt       | half\_rsqrt | native\_rsqrt | NONE        |
FCBRT           | cbrt        | NONE        | NONE          | NONE        |
FEXP2           | exp2        | half\_exp2  | native\_exp2  | NONE        |
FLOG2           | log2        | half\_log2  | native\_log2  | NONE        |
FEXPM1 (1)      | expm1       | NONE        | NONE          | NONE        |
FLOG1P (1)      | log1p       | NONE        | NONE          | NONE        |
FEXP (1)        | exp         | half\_exp   | native\_exp   | NONE        |
FLOG (1)        | log         | half\_log   | native\_log   | NONE        |
FEXP10          | exp10       | half\_exp10 | native\_exp10 | NONE        |
FLOG10          | log10       | half\_log10 | native\_log10 | NONE        |
FATAN2          | atan2       | NONE        | NONE          | NONE        |
FATAN2PI        | atan2pi     | NONE        | NONE          | NONE        |
FPOW            | pow         | NONE        | NONE          | NONE        |
FROOT           | rootn       | NONE        | NONE          | NONE        |
FHYPOT          | hypot       | NONE        | NONE          | NONE        |
"""]]

Note (1): See "synthesis", below.  FEXPM1 and FEXP may each be synthesised
in terms of the other, as may FLOG1P and FLOG.  FEXPM1 and FLOG1P are more
accurate for arguments close to zero.
It is likely therefore that FLOG and FEXP will be removed.

# List of 2-arg opcodes

[[!table  data="""
opcode    | Description            | pseudo-code                | Extension   |
FATAN2    | atan2 arc tangent      | rd = atan2(rs2, rs1)       | Zarctrignpi |
FATAN2PI  | atan2 arc tangent / pi | rd = atan2(rs2, rs1) / pi  | Zarctrigpi  |
FPOW      | x power of y           | rd = pow(rs1, rs2)         | ZftransAdv  |
FROOT     | x power 1/y            | rd = pow(rs1, 1/rs2)       | ZftransAdv  |
FHYPOT    | hypotenuse             | rd = sqrt(rs1^2 + rs2^2)   | Zftrans     |
"""]]
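
As an illustration of the FHYPOT and FROOT rows above, the following Python
sketch models the table's pseudo-code in double precision.  The function
names are hypothetical and are not part of the proposal.  It shows one
reason a dedicated hypotenuse operation can differ from the literal
`sqrt(rs1^2 + rs2^2)`: the intermediate squares may overflow even when the
result is representable.  This is a behavioural sketch only, not a statement
about how hardware would implement either opcode.

```python
import math

def fhypot_naive(rs1, rs2):
    # Literal reading of the table's pseudo-code: sqrt(rs1^2 + rs2^2).
    # The squares can overflow to infinity for large inputs.
    return math.sqrt(rs1 * rs1 + rs2 * rs2)

def fhypot_model(rs1, rs2):
    # Library hypotenuse, which avoids overflow of the intermediate squares.
    return math.hypot(rs1, rs2)

def froot_model(rs1, rs2):
    # FROOT per the table: rs1 raised to the power 1/rs2.
    return math.pow(rs1, 1.0 / rs2)
```

For moderate inputs both hypotenuse forms agree; for inputs around `1e200`
only the library form returns a finite result.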

# List of 1-arg transcendental opcodes

[[!table  data="""
opcode   | Description              | pseudo-code             | Extension  |
FRSQRT   | Reciprocal Square-root   | rd = 1.0 / sqrt(rs1)    | Zfrsqrt    |
FCBRT    | Cube Root                | rd = pow(rs1, 1.0 / 3)  | Zftrans    |
FEXP2    | power-of-2               | rd = pow(2, rs1)        | Zftrans    |
FLOG2    | log2                     | rd = log2(rs1)          | Zftrans    |
FEXPM1   | exponent minus 1         | rd = pow(e, rs1) - 1.0  | Zftrans    |
FLOG1P   | log plus 1               | rd = log(e, 1 + rs1)    | Zftrans    |
FEXP     | exponent                 | rd = pow(e, rs1)        | ZftransExt |
FLOG     | natural log (base e)     | rd = log(e, rs1)        | ZftransExt |
FEXP10   | power-of-10              | rd = pow(10, rs1)       | ZftransExt |
FLOG10   | log base 10              | rd = log10(rs1)         | ZftransExt |
"""]]
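
The ZftransExt rows above are described as synthesisable from Zftrans.  A
minimal Python sketch of what that could look like for FEXP10 and FLOG10,
using only power-of-2 and log2 as primitives, follows.  The function and
constant names are hypothetical, and the sketch works in double precision;
a software synthesis of this kind does not by itself guarantee the Khronos
accuracy limits, since the multiplication by a rounded constant adds error.

```python
import math

# Conversion constants, rounded to double precision (illustrative).
LOG2_OF_10 = math.log2(10.0)
LOG10_OF_2 = math.log10(2.0)

def fexp2_model(x):
    # FEXP2 per the table: 2 raised to the power x.
    return math.pow(2.0, x)

def flog2_model(x):
    # FLOG2 per the table: base-2 logarithm.
    return math.log2(x)

def exp10_via_exp2(x):
    # 10^x = 2^(x * log2(10))
    return fexp2_model(x * LOG2_OF_10)

def log10_via_log2(x):
    # log10(x) = log2(x) * log10(2)
    return flog2_model(x) * LOG10_OF_2
```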

# List of 1-arg trigonometric opcodes

[[!table  data="""
opcode   | Description              | pseudo-code             | Extension   |
FSIN     | sin (radians)            | rd = sin(rs1)           | Ztrignpi    |
FCOS     | cos (radians)            | rd = cos(rs1)           | Ztrignpi    |
FTAN     | tan (radians)            | rd = tan(rs1)           | Ztrignpi    |
FASIN    | arcsin (radians)         | rd = asin(rs1)          | Zarctrignpi |
FACOS    | arccos (radians)         | rd = acos(rs1)          | Zarctrignpi |
FSINPI   | sin times pi             | rd = sin(pi * rs1)      | Ztrigpi     |
FCOSPI   | cos times pi             | rd = cos(pi * rs1)      | Ztrigpi     |
FTANPI   | tan times pi             | rd = tan(pi * rs1)      | Ztrigpi     |
FASINPI  | arcsin / pi              | rd = asin(rs1) / pi     | Zarctrigpi  |
FACOSPI  | arccos / pi              | rd = acos(rs1) / pi     | Zarctrigpi  |
FATANPI  | arctan / pi              | rd = atan(rs1) / pi     | Zarctrigpi  |
FSINH    | hyperbolic sin (radians) | rd = sinh(rs1)          | Zfhyp       |
FCOSH    | hyperbolic cos (radians) | rd = cosh(rs1)          | Zfhyp       |
FTANH    | hyperbolic tan (radians) | rd = tanh(rs1)          | Zfhyp       |
FASINH   | inverse hyperbolic sin   | rd = asinh(rs1)         | Zfhyp       |
FACOSH   | inverse hyperbolic cos   | rd = acosh(rs1)         | Zfhyp       |
FATANH   | inverse hyperbolic tan   | rd = atanh(rs1)         | Zfhyp       |
"""]]
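
To illustrate how the "pi" variants above differ from simply multiplying
the argument by pi first, here is a Python sketch of a sinpi model.  The
name `sinpi_model` and the folding strategy are assumptions made for this
example, not something the proposal specifies.  Because the argument is in
units of half-turns, the range reduction can be done exactly in floating
point, so whole-number inputs give an exact zero, which `sin(pi * x)` with
a rounded pi does not.

```python
import math

def sinpi_model(x):
    # Finite inputs only: math.fmod raises ValueError for infinities.
    r = math.fmod(x, 2.0)      # exact; r lies in (-2, 2)
    if r > 1.0:                # fold into [-1, 1] using the period of 2
        r -= 2.0
    elif r < -1.0:
        r += 2.0
    if r > 0.5:                # sin(pi*r) == sin(pi*(1 - r))
        r = 1.0 - r
    elif r < -0.5:             # sin(pi*r) == sin(pi*(-1 - r))
        r = -1.0 - r
    # r now lies in [-0.5, 0.5], where the product pi*r is well behaved.
    return math.sin(math.pi * r)
```

With this model `sinpi_model(1.0)` is exactly zero, whereas
`math.sin(math.pi * 1.0)` returns a small non-zero value.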

# Synthesis, Pseudo-code ops and macro-ops

The pseudo-ops are best left up to the compiler rather than being actual
pseudo-ops, by allocating one scalar FP register for use as a constant
(loop invariant) set to "1.0" at the beginning of a function or other
suitable code block.

* FRCP rd, rs1 - pseudo-code alias for rd = 1.0 / rs1
* FATAN - pseudo-code alias for rd = atan2(rs1, 1.0) - FATAN2
* FATANPI - pseudo-code alias for rd = atan2pi(rs1, 1.0) - FATAN2PI
* FSINCOS - fused macro-op between FSIN and FCOS (issued in that order).
* FSINCOSPI - fused macro-op between FSINPI and FCOSPI (issued in that order).

FATANPI example pseudo-code:

    lui t0, 0x3F800          # upper 20 bits of f32 1.0 (0x3F800000)
    fmv.w.x ft0, t0          # integer bits to FP register (formerly fmv.s.x)
    fatan2pi.s rd, rs1, ft0
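
The two facts the example above relies on can be checked with a short Python
sketch: that `0x3F800` shifted into the upper 20 bits is the single-precision
bit pattern for 1.0, and that atan2pi with a second argument of 1.0 reduces
to atan divided by pi.  The helper names are hypothetical, and `atan2pi` here
takes its arguments in the OpenCL `(y, x)` order rather than in any
particular register order.

```python
import math
import struct

def f32_from_bits(bits):
    # Reinterpret a 32-bit integer as an IEEE 754 single-precision value.
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def atan2pi(y, x):
    # OpenCL-style atan2pi: atan2(y, x) / pi.
    return math.atan2(y, x) / math.pi

def atanpi(x):
    # The FATANPI alias: atan2pi with the constant 1.0 as second argument.
    return atan2pi(x, 1.0)

# "lui t0, 0x3F800" places the immediate in bits 31..12 of t0.
ONE = f32_from_bits(0x3F800 << 12)
```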

Hypotenuse example (obviates need for Zfhyp except for high-performance):

    ASINH( x ) = ln( x + SQRT(x**2+1) )
               = ln( x + HYPOT(x, 1.0) )
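
A minimal Python sketch of the identity above, using a library hypotenuse
in place of FHYPOT and a library natural log in place of FLOG.  The function
name is hypothetical.  Applying the identity to the magnitude and restoring
the sign afterwards is an addition made for this example: it avoids
cancellation for large negative arguments.  Relative accuracy for very small
arguments is not addressed here.

```python
import math

def asinh_via_hypot(x):
    # asinh is odd, so work on |x| and restore the sign afterwards.
    ax = abs(x)
    # sqrt(x**2 + 1) is the hypotenuse of a right triangle with sides x and 1.
    return math.copysign(math.log(ax + math.hypot(ax, 1.0)), x)
```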

LOG / LOG1P example:

    LOG(x) = LOG1P(x - 1.0)
    EXP(x) = EXPM1(x) + 1.0
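
The synthesis above, and the accuracy difference near zero, can be sketched
in Python using the standard library's `log1p` and `expm1` as stand-ins for
FLOG1P and FEXPM1.  The wrapper names are hypothetical, and the sketch runs
in double precision rather than FP32.

```python
import math

def log_via_log1p(x):
    # LOG(x) = LOG1P(x - 1.0), since log1p(y) = log(1 + y).
    return math.log1p(x - 1.0)

def exp_via_expm1(x):
    # EXP(x) = EXPM1(x) + 1.0, since expm1(y) = exp(y) - 1.
    return math.expm1(x) + 1.0

# Going the other way loses accuracy for small arguments: exp(x) rounds
# to a value very close to 1.0, and subtracting 1.0 exposes that rounding.
small = 1e-10
direct = math.expm1(small)            # keeps nearly full precision
round_trip = math.exp(small) - 1.0    # only about 7 significant digits
```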

# To evaluate: should LOG be replaced with LOG1P (and EXP with EXPM1)?

RISC principle says "exclude LOG because it's covered by LOG1P plus an ADD".
Research is needed to ensure that implementors are not compromised by such
a decision:
<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002358.html>
