# Zftrans - transcendental operations

See:

* <http://bugs.libre-riscv.org/show_bug.cgi?id=127>
* <https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
* Discussion: <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002342.html>
* [[rv_major_opcode_1010011]] for opcode listing.
* [[zfpacc_proposal]] for accuracy settings proposal

Extension subsets:

* **Zftrans**: standard transcendentals (best suited to 3D)
* **ZftransExt**: extra functions (useful, not generally needed for 3D,
  can be synthesised using Zftrans)
* **Ztrigpi**: trig. xxx-pi: sinpi, cospi, tanpi
* **Ztrignpi**: trig. non-xxx-pi: sin, cos, tan
* **Zarctrigpi**: arc-trig. a-xxx-pi: atan2pi, asinpi, acospi
* **Zarctrignpi**: arc-trig. non-a-xxx-pi: atan2, asin, acos
* **Zfhyp**: hyperbolic/inverse-hyperbolic: sinh, cosh, tanh, asinh,
  acosh, atanh (can be synthesised - see below)
* **ZftransAdv**: much more complex to implement in hardware
* **Zfrsqrt**: Reciprocal square-root.

Minimum recommended requirements for 3D: Zftrans, Ztrigpi, Zarctrigpi,
Zarctrignpi

[[!toc levels=2]]

# TODO:

* Decision on accuracy, moved to [[zfpacc_proposal]]
<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002355.html>
* Errors **MUST** be repeatable.
* How about four Platform Specifications? 3DUNIX, UNIX, 3DEmbedded and Embedded?
<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002361.html>
  Accuracy requirements for dual (triple) purpose implementations must
  meet the higher standard.
* Reciprocal Square-root is in its own separate extension (Zfrsqrt) as
  it is desirable on its own by other implementors.  This is to be evaluated.

# Proposed Opcodes vs Khronos OpenCL Opcodes <a name="khronos_equiv"></a>

This list shows the (direct) equivalence between proposed opcodes and
their Khronos OpenCL equivalents.

See
<https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>

Special FP16 opcodes are *not* being proposed, except by indirect / inherent
use of the "fmt" field that is already present in the RISC-V Specification.

"Native" opcodes are *not* being proposed: implementors will be expected
to use the (equivalent) proposed opcode covering the same function.

"Fast" opcodes are *not* being proposed, because the Khronos Specification
fast\_length, fast\_normalize and fast\_distance OpenCL opcodes require
vectors (or can be done as scalar operations using other RISC-V instructions).

The OpenCL FP32 opcodes are **direct** equivalents to the proposed opcodes.
Deviation from conformance with the Khronos Specification - including the
Khronos Specification accuracy requirements - is not an option.

[[!table data="""
Proposed opcode | OpenCL FP32 | OpenCL FP16 | OpenCL native | OpenCL fast |
FSIN            | sin         | half\_sin   | native\_sin   | NONE        |
FCOS            | cos         | half\_cos   | native\_cos   | NONE        |
FTAN            | tan         | half\_tan   | native\_tan   | NONE        |
NONE (1)        | sincos      | NONE        | NONE          | NONE        |
FASIN           | asin        | NONE        | NONE          | NONE        |
FACOS           | acos        | NONE        | NONE          | NONE        |
NONE (3)        | atan        | NONE        | NONE          | NONE        |
FSINPI          | sinpi       | NONE        | NONE          | NONE        |
FCOSPI          | cospi       | NONE        | NONE          | NONE        |
FTANPI          | tanpi       | NONE        | NONE          | NONE        |
FASINPI         | asinpi      | NONE        | NONE          | NONE        |
FACOSPI         | acospi      | NONE        | NONE          | NONE        |
NONE (2)        | atanpi      | NONE        | NONE          | NONE        |
FSINH           | sinh        | NONE        | NONE          | NONE        |
FCOSH           | cosh        | NONE        | NONE          | NONE        |
FTANH           | tanh        | NONE        | NONE          | NONE        |
FASINH          | asinh       | NONE        | NONE          | NONE        |
FACOSH          | acosh       | NONE        | NONE          | NONE        |
FATANH          | atanh       | NONE        | NONE          | NONE        |
FRSQRT          | rsqrt       | half\_rsqrt | native\_rsqrt | NONE        |
FCBRT           | cbrt        | NONE        | NONE          | NONE        |
FEXP2           | exp2        | half\_exp2  | native\_exp2  | NONE        |
FLOG2           | log2        | half\_log2  | native\_log2  | NONE        |
FEXPM1          | expm1       | NONE        | NONE          | NONE        |
FLOG1P          | log1p       | NONE        | NONE          | NONE        |
FEXP            | exp         | half\_exp   | native\_exp   | NONE        |
FLOG            | log         | half\_log   | native\_log   | NONE        |
FEXP10          | exp10       | half\_exp10 | native\_exp10 | NONE        |
FLOG10          | log10       | half\_log10 | native\_log10 | NONE        |
FATAN2          | atan2       | NONE        | NONE          | NONE        |
FATAN2PI        | atan2pi     | NONE        | NONE          | NONE        |
FPOW            | pow         | NONE        | NONE          | NONE        |
FROOT           | rootn       | NONE        | NONE          | NONE        |
FHYPOT          | hypot       | NONE        | NONE          | NONE        |
"""]]

Note (1) FSINCOS is macro-op fused (see below).

Note (2) FATANPI is a synthesised alias of FATAN2PI (see below).

Note (3) FATAN is a synthesised alias of FATAN2 (see below).

# List of 2-arg opcodes

[[!table  data="""
opcode    | Description            | pseudo-code                | Extension   |
FATAN2    | atan2 arc tangent      | rd = atan2(rs2, rs1)       | Zarctrignpi |
FATAN2PI  | atan2 arc tangent / pi | rd = atan2(rs2, rs1) / pi  | Zarctrigpi  |
FPOW      | x power of y           | rd = pow(rs1, rs2)         | ZftransAdv  |
FROOT     | x power 1/y            | rd = pow(rs1, 1/rs2)       | ZftransAdv  |
FHYPOT    | hypotenuse             | rd = sqrt(rs1^2 + rs2^2)   | Zftrans     |
"""]]

# List of 1-arg transcendental opcodes

[[!table  data="""
opcode   | Description              | pseudo-code              | Extension  |
FRSQRT   | Reciprocal Square-root   | rd = 1.0 / sqrt(rs1)     | Zfrsqrt    |
FCBRT    | Cube Root                | rd = pow(rs1, 1.0 / 3)   | Zftrans    |
FEXP2    | power-of-2               | rd = pow(2, rs1)         | Zftrans    |
FLOG2    | log2                     | rd = log2(rs1)           | Zftrans    |
FEXPM1   | exponential minus 1      | rd = pow(e, rs1) - 1.0   | Zftrans    |
FLOG1P   | log plus 1               | rd = log(e, 1 + rs1)     | Zftrans    |
FEXP     | exponential              | rd = pow(e, rs1)         | ZftransExt |
FLOG     | natural log (base e)     | rd = log(e, rs1)         | ZftransExt |
FEXP10   | power-of-10              | rd = pow(10, rs1)        | ZftransExt |
FLOG10   | log base 10              | rd = log10(rs1)          | ZftransExt |
"""]]

# List of 1-arg trigonometric opcodes

[[!table  data="""
opcode   | Description              | pseudo-code             | Extension   |
FSIN     | sin (radians)            | rd = sin(rs1)           | Ztrignpi    |
FCOS     | cos (radians)            | rd = cos(rs1)           | Ztrignpi    |
FTAN     | tan (radians)            | rd = tan(rs1)           | Ztrignpi    |
FASIN    | arcsin (radians)         | rd = asin(rs1)          | Zarctrignpi |
FACOS    | arccos (radians)         | rd = acos(rs1)          | Zarctrignpi |
FSINPI   | sin times pi             | rd = sin(pi * rs1)      | Ztrigpi     |
FCOSPI   | cos times pi             | rd = cos(pi * rs1)      | Ztrigpi     |
FTANPI   | tan times pi             | rd = tan(pi * rs1)      | Ztrigpi     |
FASINPI  | arcsin / pi              | rd = asin(rs1) / pi     | Zarctrigpi  |
FACOSPI  | arccos / pi              | rd = acos(rs1) / pi     | Zarctrigpi  |
FATANPI  | arctan / pi              | rd = atan(rs1) / pi     | Zarctrigpi  |
FSINH    | hyperbolic sin (radians) | rd = sinh(rs1)          | Zfhyp       |
FCOSH    | hyperbolic cos (radians) | rd = cosh(rs1)          | Zfhyp       |
FTANH    | hyperbolic tan (radians) | rd = tanh(rs1)          | Zfhyp       |
FASINH   | inverse hyperbolic sin   | rd = asinh(rs1)         | Zfhyp       |
FACOSH   | inverse hyperbolic cos   | rd = acosh(rs1)         | Zfhyp       |
FATANH   | inverse hyperbolic tan   | rd = atanh(rs1)         | Zfhyp       |
"""]]

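To make the difference between the xxx-pi and non-xxx-pi forms concrete, here is a naive Python transcription of the pseudo-code column for the xxx-pi opcodes. The helper names are hypothetical. Note that this sketch multiplies or divides by a rounded value of pi, so it only approximates the mathematical functions: `fsinpi(1.0)`, for instance, returns a tiny non-zero value here, whereas the mathematical result is exactly zero.

```python
import math

# Hypothetical, naive reference model of the xxx-pi opcodes, following the
# pseudo-code column of the table literally.

def fsinpi(x):
    return math.sin(math.pi * x)

def fcospi(x):
    return math.cos(math.pi * x)

def ftanpi(x):
    return math.tan(math.pi * x)

def fasinpi(x):
    return math.asin(x) / math.pi

def facospi(x):
    return math.acos(x) / math.pi

def fatanpi(x):
    return math.atan(x) / math.pi

# Arguments of the forward functions are in units of pi radians:
# 0.5 corresponds to a quarter of a full circle.
print(fsinpi(0.5), fcospi(1.0), fasinpi(1.0), facospi(-1.0))

# The rounding of pi shows up at exact multiples: mathematically sinpi(1) = 0.
print(fsinpi(1.0))
```
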
# Synthesis, Pseudo-code ops and macro-ops

The pseudo-ops are best left up to the compiler rather than being actual
pseudo-ops, by allocating one scalar FP register for use as a constant
(loop invariant) set to "1.0" at the beginning of a function or other
suitable code block.

* FRCP rd, rs1 - pseudo-code alias for rd = 1.0 / rs1
* FATAN - pseudo-code alias for rd = atan2(rs1, 1.0), using FATAN2
* FATANPI - pseudo-code alias for rd = atan2pi(rs1, 1.0), using FATAN2PI
* FSINCOS - fused macro-op between FSIN and FCOS (issued in that order).
* FSINCOSPI - fused macro-op between FSINPI and FCOSPI (issued in that order).

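The aliases listed above can be sanity-checked numerically. The following Python sketch uses hypothetical helper names and a module-level constant `ONE` standing in for the FP register that the compiler would keep set to 1.0; it confirms that atan2 with a second argument of 1.0 agrees with plain atan.

```python
import math

ONE = 1.0  # stands in for the loop-invariant constant held in an FP register

def frcp(rs1):
    # FRCP: rd = 1.0 / rs1
    return ONE / rs1

def fatan(rs1):
    # FATAN: rd = atan2(rs1, 1.0)
    return math.atan2(rs1, ONE)

def fatanpi(rs1):
    # FATANPI: rd = atan2pi(rs1, 1.0)
    return math.atan2(rs1, ONE) / math.pi

def fsincos(rs1):
    # FSINCOS: FSIN then FCOS on the same source value
    return math.sin(rs1), math.cos(rs1)

# atan2(x, 1.0) and atan(x) should agree to within rounding.
for x in (-2.0, 0.0, 0.5, 3.0):
    assert math.isclose(fatan(x), math.atan(x), abs_tol=1e-15)

print(frcp(4.0), fatanpi(1.0), fsincos(0.0))
```
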
FATANPI example pseudo-code:

    lui t0, 0x3F800          # upper 20 bits of f32 1.0 (0x3F800000)
    fmv.w.x ft0, t0          # move the bit pattern from integer to FP register
    fatan2pi.s rd, rs1, ft0

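The immediate used with `lui` in the example can be verified with a few lines of Python. This sketch relies only on the standard `struct` module; the helper names `f32_from_bits` and `f32_to_bits` are made up for illustration.

```python
import struct

def f32_from_bits(bits):
    # Reinterpret a 32-bit integer as an IEEE 754 single-precision float
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def f32_to_bits(value):
    # Reinterpret a single-precision float as its 32-bit integer pattern
    return struct.unpack("<I", struct.pack("<f", value))[0]

lui_immediate = 0x3F800               # 20-bit immediate used by LUI
register_value = lui_immediate << 12  # LUI places it in bits 31..12

print(hex(register_value), f32_from_bits(register_value))
print(hex(f32_to_bits(1.0) >> 12))
```

Because the low 12 bits of the f32 encoding of 1.0 are all zero, a single `lui` is enough to materialise the constant.
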
Hyperbolic function example (obviates the need for Zfhyp except for
high-performance or correctly-rounded implementations):

    ASINH( x ) = ln( x + SQRT(x**2+1))

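A quick numerical comparison shows how such a synthesis behaves. In the Python sketch below only the asinh identity comes from the text above; the other five are standard textbook identities added for completeness, and all `*_synth` names are hypothetical. The printed differences against the `math` library are small for these inputs but generally not zero, which is in line with the remark that a synthesised form is not correctly rounded.

```python
import math

# Textbook identities; only the asinh line appears in the proposal itself.

def sinh_synth(x):
    return (math.exp(x) - math.exp(-x)) / 2.0

def cosh_synth(x):
    return (math.exp(x) + math.exp(-x)) / 2.0

def tanh_synth(x):
    return sinh_synth(x) / cosh_synth(x)

def asinh_synth(x):
    # ASINH(x) = ln(x + SQRT(x**2 + 1))
    return math.log(x + math.sqrt(x * x + 1.0))

def acosh_synth(x):
    # valid for x >= 1
    return math.log(x + math.sqrt(x * x - 1.0))

def atanh_synth(x):
    # valid for -1 < x < 1
    return 0.5 * math.log((1.0 + x) / (1.0 - x))

for x in (0.25, 0.5, 0.9):
    print(x, asinh_synth(x) - math.asinh(x), atanh_synth(x) - math.atanh(x))
```
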
LOG / LOG1P example:

    LOG(x) = LOG1P(x - 1.0)
    EXP(x) = EXPM1(x) + 1.0

# To evaluate: should LOG be replaced with LOG1P (and EXP with EXPM1)?

RISC principle says "exclude LOG because it's covered by LOG1P plus an ADD".
Research is needed to ensure that implementors are not compromised by such
a decision.
<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002358.html>

A correctly-rounded LOG can return different results from LOG1P plus an ADD,
because the two-instruction sequence rounds twice.  Likewise for EXP and EXPM1.

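The concern can be illustrated with a short Python experiment. It is a sketch only: it uses double precision from the `math` module rather than the hardware formats under discussion, and the helper names `log_via_log1p` and `exp_via_expm1` are hypothetical. The inputs were picked to show where the two-instruction sequences lose information.

```python
import math

def log_via_log1p(x):
    # LOG(x) = LOG1P(x - 1.0): an ADD followed by LOG1P
    return math.log1p(x - 1.0)

def exp_via_expm1(x):
    # EXP(x) = EXPM1(x) + 1.0: EXPM1 followed by an ADD
    return math.expm1(x) + 1.0

# Near x = 1 the subtraction is exact and the two routes agree closely.
print(math.log(1.5), log_via_log1p(1.5))

# For tiny x the ADD rounds x - 1.0 to exactly -1.0, so the information
# needed by LOG1P is already lost before it executes.
tiny = 1e-20
print(tiny - 1.0 == -1.0, math.log(tiny))

# For very negative x, EXPM1 is within rounding of -1.0 and the ADD
# cancels almost everything, while EXP itself is still representable.
print(math.exp(-40.0), exp_via_expm1(-40.0))
```

Cases like these are the kind of evidence the evaluation would need to weigh before dropping LOG or EXP.
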