ztrans_proposal.mdwn

   1 # Zftrans - transcendental operations
   2
   3 See:
   4
   5 * <http://bugs.libre-riscv.org/show_bug.cgi?id=127>
   6 * <https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
   7 * Discussion: <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002342.html>
   8 * [[rv_major_opcode_1010011]] for opcode listing.
   9 * [[zfpacc_proposal]] for accuracy settings proposal
  10
  11 Extension subsets:
  12
  13 * **Zftrans**: standard transcendentals (best suited to 3D)
  14 * **ZftransExt**: extra functions (useful, not generally needed for 3D,
  15   can be synthesised using Zftrans)
  16 * **Ztrigpi**: trig. xxx-pi: sinpi, cospi, tanpi
  17 * **Ztrignpi**: trig. non-xxx-pi: sin, cos, tan
  18 * **Zarctrigpi**: arc-trig. a-xxx-pi: atan2pi asinpi acospi
  19 * **Zarctrignpi**: arc-trig. non-a-xxx-pi: atan2, asin, acos
  20 * **Zfhyp**: hyperbolic/inverse-hyperbolic.  sinh, cosh, tanh, asinh,
  21   acosh, atanh (can be synthesised - see below)
  22 * **ZftransAdv**: much more complex to implement in hardware
  23 * **Zfrsqrt**: Reciprocal square-root.
  24
  25 Minimum recommended requirements for 3D: Zftrans, Ztrigpi, Zarctrigpi,
  26 Zarctrignpi
  27
  28 [[!toc levels=2]]
  29
  30 # TODO:
  31
  32 * Decision on accuracy, moved to [[zfpacc_proposal]]
  33 <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002355.html>
  34 * Errors **MUST** be repeatable.
  35 * How about four Platform Specifications? 3DUNIX, UNIX, 3DEmbedded and Embedded?
  36 <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002361.html>
  37   Accuracy requirements for dual (triple) purpose implementations must
  38   meet the higher standard.
  39 * Reciprocal Square-root is in its own separate extension (Zfrsqrt) as
  40   it is desirable on its own to other implementors.  This is to be evaluated.
  41
  42 # Proposed Opcodes vs Khronos OpenCL Opcodes <a name="khronos_equiv"></a>
  43
  44 This list shows the (direct) equivalence between proposed opcodes and
  45 their Khronos OpenCL equivalents.
  46
  47 See
  48 <https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
  49
  50 Special FP16 opcodes are *not* being proposed, except by indirect / inherent
  51 use of the "fmt" field that is already present in the RISC-V Specification.
  52
  53 "Native" opcodes are *not* being proposed: implementors will be expected
  54 to use the (equivalent) proposed opcode covering the same function.
  55
  56 "Fast" opcodes are *not* being proposed, because the Khronos Specification
  57 fast\_length, fast\_normalize and fast\_distance OpenCL opcodes require
  58 vectors (or can be done as scalar operations using other RISC-V instructions).
  59
  60 The OpenCL FP32 opcodes are **direct** equivalents to the proposed opcodes.
  61 Deviation from conformance with the Khronos Specification - including the
  62 Khronos Specification accuracy requirements - is not an option.
  63
  64 [[!table data="""
  65 Proposed opcode | OpenCL FP32 | OpenCL FP16 | OpenCL native | OpenCL fast |
  66 FSIN            | sin         | half\_sin   | native\_sin   | NONE        |
  67 FCOS            | cos         | half\_cos   | native\_cos   | NONE        |
  68 FTAN            | tan         | half\_tan   | native\_tan   | NONE        |
  69 FASIN           | asin        | NONE        | NONE          | NONE        |
  70 FACOS           | acos        | NONE        | NONE          | NONE        |
  71 FSINPI          | sinpi       | NONE        | NONE          | NONE        |
  72 FCOSPI          | cospi       | NONE        | NONE          | NONE        |
  73 FTANPI          | tanpi       | NONE        | NONE          | NONE        |
  74 FASINPI         | asinpi      | NONE        | NONE          | NONE        |
  75 FACOSPI         | acospi      | NONE        | NONE          | NONE        |
  76 FATANPI         | atanpi      | NONE        | NONE          | NONE        |
  77 FSINH           | sinh        | NONE        | NONE          | NONE        |
  78 FCOSH           | cosh        | NONE        | NONE          | NONE        |
  79 FTANH           | tanh        | NONE        | NONE          | NONE        |
  80 FASINH          | asinh       | NONE        | NONE          | NONE        |
  81 FACOSH          | acosh       | NONE        | NONE          | NONE        |
  82 FATANH          | atanh       | NONE        | NONE          | NONE        |
  83 FRSQRT          | rsqrt       | half\_rsqrt | native\_rsqrt | NONE        |
  84 FCBRT           | cbrt        | NONE        | NONE          | NONE        |
  85 FEXP2           | exp2        | half\_exp2  | native\_exp2  | NONE        |
  86 FLOG2           | log2        | half\_log2  | native\_log2  | NONE        |
  87 FEXPM1          | expm1       | NONE        | NONE          | NONE        |
  88 FLOG1P          | log1p       | NONE        | NONE          | NONE        |
  89 FEXP            | exp         | half\_exp   | native\_exp   | NONE        |
  90 FLOG            | log         | half\_log   | native\_log   | NONE        |
  91 FEXP10          | exp10       | half\_exp10 | native\_exp10 | NONE        |
  92 FLOG10          | log10       | half\_log10 | native\_log10 | NONE        |
  93 FATAN2          | atan2       | NONE        | NONE          | NONE        |
  94 FATAN2PI        | atan2pi     | NONE        | NONE          | NONE        |
  95 FPOW            | pow         | NONE        | NONE          | NONE        |
  96 FROOT           | rootn       | NONE        | NONE          | NONE        |
  97 FHYPOT          | hypot       | NONE        | NONE          | NONE        |
  98 """]]
  99
 100 # List of 2-arg opcodes
 101
 102 [[!table  data="""
 103 opcode    | Description            | pseudo-code                | Extension   |
 104 FATAN2    | atan2 arc tangent      | rd = atan2(rs2, rs1)       | Zarctrignpi |
 105 FATAN2PI  | atan2 arc tangent / pi | rd = atan2(rs2, rs1) / pi  | Zarctrigpi  |
 106 FPOW      | x power of y           | rd = pow(rs1, rs2)         | ZftransAdv  |
 107 FROOT     | x power 1/y            | rd = pow(rs1, 1/rs2)       | ZftransAdv  |
 108 FHYPOT    | hypotenuse             | rd = sqrt(rs1^2 + rs2^2)   | Zftrans     |
 109 """]]
 110
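The pseudo-code column above can be read as a small reference model. The following is a minimal sketch in Python, assuming FP64 host arithmetic and keeping the operand order exactly as written in the table; the function names are hypothetical and it says nothing about the accuracy an implementation must deliver.

```python
import math

# Reference models of the 2-arg opcodes, transcribed from the table's
# pseudo-code column. Names are illustrative, not part of the proposal.

def fatan2(rs1, rs2):
    return math.atan2(rs2, rs1)

def fatan2pi(rs1, rs2):
    # Same as FATAN2, with the result scaled into units of pi.
    return math.atan2(rs2, rs1) / math.pi

def fpow(rs1, rs2):
    return math.pow(rs1, rs2)

def froot(rs1, rs2):
    # Assumption: rs1 >= 0. math.pow raises ValueError for a negative
    # base with a fractional exponent, which real rootn handling avoids.
    return math.pow(rs1, 1.0 / rs2)

def fhypot(rs1, rs2):
    return math.hypot(rs1, rs2)

if __name__ == "__main__":
    print(fhypot(3.0, 4.0), fpow(2.0, 10.0), fatan2pi(1.0, 1.0))
```

Using `math.hypot` rather than a literal `sqrt(rs1^2 + rs2^2)` avoids intermediate overflow for large inputs, which is one reason a dedicated FHYPOT opcode is attractive.
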
 111 # List of 1-arg transcendental opcodes
 112
 113 [[!table  data="""
 114 opcode   | Description              | pseudo-code             | Extension |
 115 FRSQRT   | Reciprocal Square-root   | rd = 1.0 / sqrt(rs1)    | Zfrsqrt    |
 116 FCBRT    | Cube Root                | rd = pow(rs1, 1.0 / 3)  | Zftrans    |
 117 FEXP2    | power-of-2               | rd = pow(2, rs1)        | Zftrans    |
 118 FLOG2    | log2                     | rd = log2(rs1)          | Zftrans    |
 119 FEXPM1   | exponent minus 1         | rd = pow(e, rs1) - 1.0  | Zftrans    |
 120 FLOG1P   | log plus 1               | rd = log(e, 1 + rs1)    | Zftrans    |
 121 FEXP     | exponent                 | rd = pow(e, rs1)        | ZftransExt |
 122 FLOG     | natural log (base e)     | rd = log(e, rs1)        | ZftransExt |
 123 FEXP10   | power-of-10              | rd = pow(10, rs1)       | ZftransExt |
 124 FLOG10   | log base 10              | rd = log10(rs1)         | ZftransExt |
 125 """]]
 126
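As a sanity check on the 1-arg definitions, here is a hedged Python sketch that follows the OpenCL meanings of `rsqrt`, `cbrt`, `exp2`, `log2`, `expm1` and `log1p`. The function names are hypothetical, the arithmetic is FP64, and the cube root is built from `abs` and `copysign` only so that negative inputs work on Python versions without `math.cbrt`.

```python
import math

# Illustrative models of a few 1-arg opcodes (OpenCL semantics assumed).

def frsqrt(rs1):
    return 1.0 / math.sqrt(rs1)          # reciprocal square root

def fcbrt(rs1):
    # Real cube root, defined for negative inputs as well.
    return math.copysign(abs(rs1) ** (1.0 / 3.0), rs1)

def fexp2(rs1):
    return 2.0 ** rs1

def flog2(rs1):
    return math.log2(rs1)

def fexpm1(rs1):
    return math.expm1(rs1)               # e**rs1 - 1.0

def flog1p(rs1):
    return math.log1p(rs1)               # ln(1 + rs1)

if __name__ == "__main__":
    print(frsqrt(4.0), fexp2(3.0), flog2(8.0))
```
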
 127 # List of 1-arg trigonometric opcodes
 128
 129 [[!table  data="""
 130 opcode   | Description              | pseudo-code             | Extension |
 131 FSIN     | sin (radians)            | rd = sin(rs1)           | Ztrignpi    |
 132 FCOS     | cos (radians)            | rd = cos(rs1)           | Ztrignpi    |
 133 FTAN     | tan (radians)            | rd = tan(rs1)           | Ztrignpi    |
 134 FASIN    | arcsin (radians)         | rd = asin(rs1)          | Zarctrignpi |
 135 FACOS    | arccos (radians)         | rd = acos(rs1)          | Zarctrignpi |
 136 FSINPI   | sin times pi             | rd = sin(pi * rs1)      | Ztrigpi |
 137 FCOSPI   | cos times pi             | rd = cos(pi * rs1)      | Ztrigpi |
 138 FTANPI   | tan times pi             | rd = tan(pi * rs1)      | Ztrigpi |
 139 FASINPI  | arcsin / pi              | rd = asin(rs1) / pi     | Zarctrigpi |
 140 FACOSPI  | arccos / pi              | rd = acos(rs1) / pi     | Zarctrigpi |
 141 FATANPI  | arctan / pi              | rd = atan(rs1) / pi     | Zarctrigpi |
 142 FSINH    | hyperbolic sin (radians) | rd = sinh(rs1)          | Zfhyp |
 143 FCOSH    | hyperbolic cos (radians) | rd = cosh(rs1)          | Zfhyp |
 144 FTANH    | hyperbolic tan (radians) | rd = tanh(rs1)          | Zfhyp |
 145 FASINH   | inverse hyperbolic sin   | rd = asinh(rs1)         | Zfhyp |
 146 FACOSH   | inverse hyperbolic cos   | rd = acosh(rs1)         | Zfhyp |
 147 FATANH   | inverse hyperbolic tan   | rd = atanh(rs1)         | Zfhyp |
 148 """]]
 149
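The "pi" variants differ from the plain ones in where the factor of pi sits: OpenCL defines `sinpi(x)` as `sin(pi * x)` and `asinpi(x)` as `asin(x) / pi`. The sketch below is one possible illustration, not a proposed implementation: it reduces the argument exactly before multiplying by pi, so that `sinpi` of an integer is exactly zero, which a naive `sin(pi * x)` does not achieve. The function names and the reduction scheme are assumptions made for this example.

```python
import math

def sinpi(x):
    """sin(pi * x) with exact argument reduction (illustrative sketch)."""
    y = math.fmod(x, 2.0)        # exact; y lies in (-2, 2)
    if y > 1.0:
        y -= 2.0                 # shift by one full period
    elif y < -1.0:
        y += 2.0
    # Fold [-1, 1] into [-0.5, 0.5] using sin(pi - t) = sin(t).
    if y > 0.5:
        y = 1.0 - y
    elif y < -0.5:
        y = -1.0 - y
    return math.sin(math.pi * y)

def asinpi(x):
    """asin(x) / pi: the result is a fraction of pi, in [-0.5, 0.5]."""
    return math.asin(x) / math.pi

if __name__ == "__main__":
    print(sinpi(1.0), math.sin(math.pi * 1.0))
```

The reduction steps are all exact in binary floating point for these ranges, which is the main attraction of the xxx-pi forms for hardware: no multi-word approximation of pi is needed to reduce large arguments.
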
 150 # Synthesis, Pseudo-code ops and macro-ops
 151
 152 The pseudo-ops are best left to the compiler rather than being defined
 153 as actual pseudo-ops: the compiler allocates one scalar FP register as a
 154 (loop-invariant) constant set to "1.0" at the beginning of a function or
 155 other suitable code block.
 156
 157 * FRCP rd, rs1 - pseudo-code alias for rd = 1.0 / rs1
 158 * FATAN - pseudo-code alias for rd = atan2(rs1, 1.0) - FATAN2
 159 * FATANPI - pseudo-code alias for rd = atan2pi(rs1, 1.0) - FATAN2PI
 160 * FSINCOS - fused macro-op between FSIN and FCOS (issued in that order).
 161 * FSINCOSPI - fused macro-op between FSINPI and FCOSPI (issued in that order).
 162
 163 FATANPI example pseudo-code:
 164
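 165     lui t0, 0x3F800          # t0 = 0x3F800000, the bit pattern of f32 1.0
 166     fmv.s.x ft0, t0          # integer-to-FP bit move (fmv.w.x in newer naming)
 167     fatan2pi.s rd, rs1, ft0  # FATANPI via FATAN2PI with the constant 1.0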
 168
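The constant used in that sequence can be checked on a host machine. The sketch below, with hypothetical helper names, assumes only that `lui` places its 20-bit immediate in bits 31:12, and confirms that the resulting word is the IEEE 754 binary32 encoding of 1.0. It also models the alias itself as `atan2(x, 1.0) / pi`.

```python
import math
import struct

def lui(imm20):
    """Model of RISC-V LUI on RV32: immediate goes into bits 31:12."""
    return (imm20 << 12) & 0xFFFFFFFF

def bits_to_f32(word):
    """Reinterpret a 32-bit integer as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", word))[0]

def fatanpi(x):
    """FATANPI modelled as the FATAN2PI alias with a constant 1.0."""
    one = bits_to_f32(lui(0x3F800))
    return math.atan2(x, one) / math.pi

if __name__ == "__main__":
    print(hex(lui(0x3F800)), bits_to_f32(lui(0x3F800)), fatanpi(1.0))
```
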
 169 Hyperbolic example (obviates need for Zfhyp except for high-performance):
 170
 171     ASINH( x ) = ln( x + SQRT(x**2+1) )
 172
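A quick numerical check of the ASINH synthesis, written as a rough Python sketch. The helper name is hypothetical, and the comparison against `math.asinh` is only made for moderate positive inputs: the naive formula is known to lose accuracy through cancellation for negative or very small `x`, which is part of why a dedicated Zfhyp may still matter.

```python
import math

def asinh_synth(x):
    """asinh built from LOG and SQRT, as in the formula above."""
    return math.log(x + math.sqrt(x * x + 1.0))

if __name__ == "__main__":
    for x in (0.5, 1.0, 10.0, 1000.0):
        print(x, asinh_synth(x), math.asinh(x))
```
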
 173 LOG / LOG1P example:
 174
 175     LOG(x) = LOG1P(x - 1.0)
 176     EXP(x) = EXPM1(x) + 1.0
 177
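Taking the table definitions FLOG1P = ln(1 + rs1) and FEXPM1 = e^rs1 - 1.0, the relations between the plain and offset forms are ln(x) = log1p(x - 1.0) and e^x = expm1(x) + 1.0. The following sketch checks both numerically for moderate inputs; the helper names are hypothetical and the arithmetic is FP64.

```python
import math

def log_via_log1p(x):
    """LOG synthesised from LOG1P and one subtract."""
    return math.log1p(x - 1.0)

def exp_via_expm1(x):
    """EXP synthesised from EXPM1 and one add."""
    return math.expm1(x) + 1.0

if __name__ == "__main__":
    for x in (0.5, 1.0, 1.5, 2.0, 10.0):
        print(x, log_via_log1p(x), math.log(x))
        print(x, exp_via_expm1(x), math.exp(x))
```
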
 178 # To evaluate: should LOG be replaced with LOG1P (and EXP with EXPM1)?
 179
 180 RISC principle says "exclude LOG because it's covered by LOG1P plus an ADD".
 181 Research is needed to ensure that implementors are not compromised by such
 182 a decision:
 183 <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002358.html>
 184
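One concrete thing such research could look at is accuracy for small arguments. The sketch below is an illustration under stated assumptions (FP64 host arithmetic, a single hand-picked input) and not a conformance test: it synthesises LOG from LOG1P plus an ADD and compares it with a direct logarithm. For inputs far below 1.0, the subtraction `x - 1.0` rounds away low-order bits of `x` before LOG1P ever sees them, so the synthesised result is noticeably less accurate. The same effect would be larger in FP32, which has fewer mantissa bits.

```python
import math

x = 1e-10                               # far below 1.0 (illustrative choice)

direct = math.log(x)                    # dedicated LOG
synth = math.log1p(x - 1.0)             # LOG1P plus an ADD of -1.0

# Relative error of the synthesised form versus the direct one.
rel_err = abs(synth - direct) / abs(direct)

# The opposite direction (EXP from EXPM1 plus an ADD) behaves well here.
exp_direct = math.exp(x)
exp_synth = math.expm1(x) + 1.0

if __name__ == "__main__":
    print("log rel. error:", rel_err)
    print("exp agrees:", math.isclose(exp_synth, exp_direct, rel_tol=1e-14))
```
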