ztrans_proposal.mdwn

# Zftrans - transcendental operations

See:

* <http://bugs.libre-riscv.org/show_bug.cgi?id=127>
* <https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
* Discussion: <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002342.html>
* [[rv_major_opcode_1010011]] for opcode listing.

Extension subsets:

* **Zftrans**: standard transcendentals (best suited to 3D)
* **ZftransExt**: extra functions (useful, but not generally needed for 3D;
  can be synthesised using Zftrans)
* **Ztrigpi**: trigonometric functions of pi times the argument:
  sinpi, cospi, tanpi
* **Ztrignpi**: trigonometric functions without pi scaling (radians):
  sin, cos, tan
* **Zarctrigpi**: pi-scaled inverse trigonometric functions:
  atan2pi, asinpi, acospi
* **Zarctrignpi**: inverse trigonometric functions without pi scaling:
  atan2, asin, acos
* **Zfhyp**: hyperbolic and inverse hyperbolic functions: sinh, cosh, tanh,
  asinh, acosh, atanh (can be synthesised - see below)
* **ZftransAdv**: functions that are much more complex to implement in hardware
* **Zfrsqrt**: reciprocal square root.

Minimum recommended requirements for 3D: Zftrans, Ztrigpi, Zarctrigpi,
Zarctrignpi

[[!toc levels=2]]

# TODO:

* Decision on accuracy:
  <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002355.html>
* Errors **MUST** be repeatable.
* How about three Platform Specifications: 3D, UNIX and Embedded?
  <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002361.html>
  Implementations that serve two (or all three) purposes must meet the
  highest of the applicable accuracy requirements.
* Reciprocal square root is in its own separate extension (Zfrsqrt), as
  it is desirable on its own to other implementors.  This is to be evaluated.

# List of 2-arg opcodes

[[!table  data="""
opcode    | Description            | pseudo-code                | Extension   |
FATAN2    | atan2 arc tangent      | rd = atan2(rs2, rs1)       | Zarctrignpi |
FATAN2PI  | atan2 arc tangent / pi | rd = atan2(rs2, rs1) / pi  | Zarctrigpi  |
FPOW      | x power of y           | rd = pow(rs1, rs2)         | ZftransAdv  |
FROOT     | x power 1/y            | rd = pow(rs1, 1/rs2)       | ZftransAdv  |
FHYPOT    | hypotenuse             | rd = sqrt(rs1^2 + rs2^2)   | Zftrans     |
"""]]
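
The pseudo-code column of the 2-arg table can be read as a small reference model. The sketch below is illustrative only: it uses Python double-precision floats rather than bit-accurate f32/f64 arithmetic, the function names are hypothetical, and edge cases (NaN, infinities, negative bases in FROOT, overflow) are left to whatever Python's `math` module does, which the table does not specify.

```python
import math

# Illustrative reference model of the 2-arg pseudo-code column.
# Argument order (rs1, rs2) mirrors the instruction operands.

def fatan2(rs1: float, rs2: float) -> float:
    # Table: rd = atan2(rs2, rs1); math.atan2 takes (y, x)
    return math.atan2(rs2, rs1)

def fatan2pi(rs1: float, rs2: float) -> float:
    # Table: rd = atan2(rs2, rs1) / pi, so the result lies in [-1, 1]
    return math.atan2(rs2, rs1) / math.pi

def fpow(rs1: float, rs2: float) -> float:
    # Table: rd = pow(rs1, rs2)
    return math.pow(rs1, rs2)

def froot(rs1: float, rs2: float) -> float:
    # Table: rd = pow(rs1, 1/rs2); the reciprocal adds a rounding step
    return math.pow(rs1, 1.0 / rs2)

def fhypot(rs1: float, rs2: float) -> float:
    # Table: rd = sqrt(rs1^2 + rs2^2); math.hypot avoids overflow
    # in the intermediate squares
    return math.hypot(rs1, rs2)
```

For example, `fatan2pi(1.0, 1.0)` is close to 0.25 (an eighth of a turn) and `fhypot(3.0, 4.0)` is close to 5.0.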

# List of 1-arg transcendental opcodes

[[!table  data="""
opcode   | Description              | pseudo-code             | Extension  |
FRSQRT   | Reciprocal Square-root   | rd = 1.0 / sqrt(rs1)    | Zfrsqrt    |
FCBRT    | Cube Root                | rd = pow(rs1, 1.0 / 3)  | Zftrans    |
FEXP2    | power-of-2               | rd = pow(2, rs1)        | Zftrans    |
FLOG2    | log2                     | rd = log2(rs1)          | Zftrans    |
FEXPM1   | exponent minus 1         | rd = pow(e, rs1) - 1.0  | Zftrans    |
FLOG1P   | log plus 1               | rd = log(e, 1 + rs1)    | Zftrans    |
FEXP     | exponent                 | rd = pow(e, rs1)        | ZftransExt |
FLOG     | natural log (base e)     | rd = log(e, rs1)        | ZftransExt |
FEXP10   | power-of-10              | rd = pow(10, rs1)       | ZftransExt |
FLOG10   | log base 10              | rd = log10(rs1)         | ZftransExt |
"""]]
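
The ZftransExt rows are described as synthesisable from Zftrans. One possible route, assumed here rather than taken from the proposal, is a change of base through FEXP2 and FLOG2 with a precomputed constant. The sketch uses Python doubles; the constant and function names are illustrative. Each synthesised result carries one extra multiply rounding, so it may not meet a strict accuracy requirement.

```python
import math

# Precomputed base-conversion constants (names are illustrative)
LOG2_E = math.log2(math.e)    # log2(e)
LOG2_10 = math.log2(10.0)     # log2(10)
LN_2 = math.log(2.0)          # ln(2)
LOG10_2 = math.log10(2.0)     # log10(2)

# Stand-ins for the Zftrans opcodes FEXP2 and FLOG2
def fexp2(x: float) -> float:
    return 2.0 ** x

def flog2(x: float) -> float:
    return math.log2(x)

# ZftransExt operations synthesised from the two opcodes above
def fexp(x: float) -> float:
    return fexp2(x * LOG2_E)      # e**x = 2**(x * log2(e))

def flog(x: float) -> float:
    return flog2(x) * LN_2        # ln(x) = log2(x) * ln(2)

def fexp10(x: float) -> float:
    return fexp2(x * LOG2_10)     # 10**x = 2**(x * log2(10))

def flog10(x: float) -> float:
    return flog2(x) * LOG10_2     # log10(x) = log2(x) * log10(2)
```

The multiply before FEXP2 amplifies its rounding error in proportion to the magnitude of the argument, which is one reason a hardware FEXP could still be preferred where accuracy matters.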

# List of 1-arg trigonometric opcodes

[[!table  data="""
opcode   | Description              | pseudo-code             | Extension   |
FSIN     | sin (radians)            | rd = sin(rs1)           | Ztrignpi    |
FCOS     | cos (radians)            | rd = cos(rs1)           | Ztrignpi    |
FTAN     | tan (radians)            | rd = tan(rs1)           | Ztrignpi    |
FASIN    | arcsin (radians)         | rd = asin(rs1)          | Zarctrignpi |
FACOS    | arccos (radians)         | rd = acos(rs1)          | Zarctrignpi |
FSINPI   | sin of pi * x            | rd = sin(pi * rs1)      | Ztrigpi     |
FCOSPI   | cos of pi * x            | rd = cos(pi * rs1)      | Ztrigpi     |
FTANPI   | tan of pi * x            | rd = tan(pi * rs1)      | Ztrigpi     |
FASINPI  | arcsin / pi              | rd = asin(rs1) / pi     | Zarctrigpi  |
FACOSPI  | arccos / pi              | rd = acos(rs1) / pi     | Zarctrigpi  |
FATANPI  | arctan / pi              | rd = atan(rs1) / pi     | Zarctrigpi  |
FSINH    | hyperbolic sin (radians) | rd = sinh(rs1)          | Zfhyp       |
FCOSH    | hyperbolic cos (radians) | rd = cosh(rs1)          | Zfhyp       |
FTANH    | hyperbolic tan (radians) | rd = tanh(rs1)          | Zfhyp       |
FASINH   | inverse hyperbolic sin   | rd = asinh(rs1)         | Zfhyp       |
FACOSH   | inverse hyperbolic cos   | rd = acosh(rs1)         | Zfhyp       |
FATANH   | inverse hyperbolic tan   | rd = atanh(rs1)         | Zfhyp       |
"""]]
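
One property of the pi-scaled forms follows from their definition, though the table does not spell it out: because the period of `sin(pi * x)` is exactly 2, the range reduction can be done without rounding error. The sketch below is a hypothetical software model of FSINPI in Python doubles, not a proposed hardware algorithm; the function name `sinpi` is chosen here for illustration.

```python
import math

def sinpi(x: float) -> float:
    """Illustrative model of sin(pi * x) with exact range reduction."""
    # fmod is exact in floating point; r lies in (-2, 2)
    r = math.fmod(x, 2.0)
    # Bring r into [-1, 1] using the period of 2 (exact subtraction)
    if r > 1.0:
        r -= 2.0
    elif r < -1.0:
        r += 2.0
    # Fold into [-0.5, 0.5] using sin(pi * (1 - r)) = sin(pi * r)
    if r > 0.5:
        r = 1.0 - r
    elif r < -0.5:
        r = -1.0 - r
    # Only now multiply by the rounded constant pi
    return math.sin(math.pi * r)
```

With this reduction `sinpi(1.0)` is exactly zero, whereas `math.sin(math.pi * 1.0)` is a small non-zero value because `math.pi` is itself rounded.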

# Synthesis, Pseudo-code ops and macro-ops

The pseudo-ops below are best left to the compiler rather than defined as
actual assembler pseudo-ops: the compiler allocates one scalar FP register
as a (loop-invariant) constant, set to "1.0" at the beginning of a function
or other suitable code block.

* FRCP rd, rs1 - pseudo-op alias for rd = 1.0 / rs1
* FATAN - pseudo-op alias for rd = atan2(rs1, 1.0), implemented with FATAN2
* FATANPI - pseudo-op alias for rd = atan2pi(rs1, 1.0), implemented with FATAN2PI
* FSINCOS - fused macro-op of FSIN and FCOS (issued in that order).
* FSINCOSPI - fused macro-op of FSINPI and FCOSPI (issued in that order).

FATANPI example pseudo-code:

    lui t0, 0x3F800          # upper 20 bits of f32 1.0 (0x3F800000)
    fmv.w.x ft0, t0          # integer-to-FP bit move (older name: fmv.s.x)
    fatan2pi.s rd, rs1, ft0
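
The constant in this sequence can be checked numerically. The sketch below is a Python model, not RISC-V code: `lui` and `f32_from_bits` are hypothetical helpers that mimic what the first two instructions do, and the last function checks the mathematical identity behind the alias using Python's `math.atan2(y, x)` argument convention.

```python
import math
import struct

def lui(imm20: int) -> int:
    """Low 32 bits that LUI leaves in rd: the immediate shifted left by 12."""
    return (imm20 << 12) & 0xFFFFFFFF

def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# The register constant built by the first two instructions
one = f32_from_bits(lui(0x3F800))

def atanpi_via_atan2pi(x: float) -> float:
    # math.atan2 takes (y, x); with the second argument fixed at 1.0
    # the result equals atan(x), here divided by pi
    return math.atan2(x, one) / math.pi
```

Here `lui(0x3F800)` gives `0x3F800000`, which decodes to 1.0, and `atanpi_via_atan2pi(1.0)` is close to 0.25.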

Hypotenuse example (obviates the need for Zfhyp except in high-performance
implementations; note that SQRT(x**2+1) is FHYPOT(x, 1.0)):

    ASINH( x ) = ln( x + SQRT(x**2+1) )
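
The identity above, together with the standard identities for acosh and atanh, can be sketched in Python using only a logarithm, a square root and a hypotenuse. This is an illustration, not the proposal's algorithm: the function names are hypothetical, and the use of odd symmetry in `asinh_synth` is an added choice to avoid cancellation for negative inputs. Accuracy near zero is poorer than a dedicated implementation.

```python
import math

def asinh_synth(x: float) -> float:
    # asinh(x) = ln(x + sqrt(x**2 + 1)), evaluated on |x| and then
    # given the sign of x, since asinh is an odd function
    ax = abs(x)
    return math.copysign(math.log(ax + math.hypot(ax, 1.0)), x)

def acosh_synth(x: float) -> float:
    # acosh(x) = ln(x + sqrt(x**2 - 1)), defined for x >= 1
    return math.log(x + math.sqrt(x * x - 1.0))

def atanh_synth(x: float) -> float:
    # atanh(x) = 0.5 * ln((1 + x) / (1 - x)), defined for |x| < 1
    return 0.5 * math.log((1.0 + x) / (1.0 - x))
```

For moderate arguments such as 0.5 or 10.0 these agree closely with `math.asinh`, `math.acosh` and `math.atanh`.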

LOG / LOG1P example:

    LOG(x) = LOG1P(x - 1.0)
    EXP(x) = EXPM1(x) + 1.0
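
Taking LOG1P(x) = log(1 + x) and EXPM1(x) = exp(x) - 1, as in the opcode table, the substitution can be modelled in Python to see where the extra add or subtract costs precision. The function names are hypothetical and the arithmetic is Python double precision, so this is a sketch of the effect rather than a statement about any particular hardware.

```python
import math

def log_via_log1p(x: float) -> float:
    # LOG(x) as LOG1P(x - 1.0); the subtraction is rounded before
    # LOG1P sees its argument
    return math.log1p(x - 1.0)

def exp_via_expm1(x: float) -> float:
    # EXP(x) as EXPM1(x) + 1.0; the addition is rounded after
    # EXPM1 has produced its result
    return math.expm1(x) + 1.0
```

Near x = 1 (for LOG) and near x = 0 (for EXP) the substitution is harmless. At the extremes it is not: `1e-20 - 1.0` rounds to exactly -1.0, so the input is lost before LOG1P is applied (Python's `math.log1p` raises `ValueError` at -1.0), and `exp_via_expm1(-40.0)` returns 0.0 although `math.exp(-40.0)` is a small positive number.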

# To evaluate: should LOG be replaced with LOG1P (and EXP with EXPM1)?

The RISC principle says "exclude LOG because it's covered by LOG1P plus an ADD".
Research is needed to ensure that implementors are not compromised by such
a decision:
<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002358.html>

# Dynamic accuracy CSR

One possible solution is to add an extra field to the FP control CSR
that selects one of several accurate or fast modes:

- machine-learning-mode: as fast as possible
  (may need additional requirements, such as monotonicity for atanh?)
- GPU-mode: accurate to within a few ULP
  (see Vulkan, OpenGL, and OpenCL specs for accuracy guidelines)
- almost-accurate-mode: accurate to <1 ULP
  (would 0.51 or some other value be better?)
- fully-accurate-mode: correctly rounded in all cases
- maybe more modes?
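
The modes above are stated in ULP (units in the last place). As a rough illustration of how such a bound could be checked in a test bench, the sketch below measures the error of a binary32 result against a double-precision reference. All names are hypothetical, only the normal (non-subnormal) range is handled, and the threshold is a parameter because the proposal does not yet fix one per mode.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a Python double to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def ulp_f32(x: float) -> float:
    """Spacing of binary32 values around x (normal range only)."""
    _, e = math.frexp(x)             # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(1.0, e - 24)   # binary32 has a 24-bit significand

def error_in_ulp(result: float, reference: float) -> float:
    """Absolute error of result, in binary32 ULPs of the reference."""
    return abs(result - reference) / ulp_f32(reference)

def meets_bound(result: float, reference: float, max_ulp: float) -> bool:
    """True if result is within max_ulp ULPs of the reference."""
    return error_in_ulp(result, reference) <= max_ulp
```

A correctly rounded result (round-to-nearest) always satisfies `meets_bound(result, reference, 0.5)`; for instance `to_f32(math.sin(1.0))` compared against `math.sin(1.0)` does, while a result that is one binary32 step away from 1.0 has an error of exactly 1 ULP.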

Question: should better accuracy than is requested be permitted? Example:
Amdahl-370 issues.

Comments:

    Yes, embedded systems typically can do with 12, 16 or 32 bit
    accuracy. Rarely does it require 64 bits. But the idea of making
    a low power 32 bit FPU/DSP that can accommodate 64 bits is already
    being done in other designs such as PIC etc I believe. For embedded
    graphics 16 bit is more than adequate. In fact, Cornell had a very
    innovative 18-bit floating point format described here (useful for
    FPGA designs with 18-bit DSPs):

    <https://people.ece.cornell.edu/land/courses/ece5760/FloatingPoint/index.html>

    A very interesting GPU using the 18-bit FPU is also described here:

    <https://people.ece.cornell.edu/land/courses/ece5760/FinalProjects/f2008/ap328_sjp45/website/hardwaredesign.html>

    There are also 8 and 9-bit floating point formats that could be useful

    <https://en.wikipedia.org/wiki/Minifloat>