# Zftrans - transcendental operations

See:

* <http://bugs.libre-riscv.org/show_bug.cgi?id=127>
* <https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
* Discussion: <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002342.html>
* [[rv_major_opcode_1010011]] for opcode listing.

Extension subsets:

* **Zftrans**: standard transcendentals (best suited to 3D)
* **ZftransExt**: extra functions (useful, not generally needed for 3D,
  can be synthesised using Zftrans)
* **Ztrigpi**: trig. xxx-pi: sinpi, cospi, tanpi
* **Ztrignpi**: trig. non-xxx-pi: sin, cos, tan
* **Zarctrigpi**: arc-trig. a-xxx-pi: atan2pi, asinpi, acospi
* **Zarctrignpi**: arc-trig. non-a-xxx-pi: atan2, asin, acos
* **Zfhyp**: hyperbolic/inverse-hyperbolic: sinh, cosh, tanh, asinh,
  acosh, atanh (can be synthesised; see below)
* **ZftransAdv**: much more complex to implement in hardware
* **Zfrsqrt**: reciprocal square-root.

Minimum recommended requirements for 3D: Zftrans, Ztrigpi, Zarctrigpi,
Zarctrignpi

[[!toc levels=2]]

# TODO:

* Decision on accuracy
  <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002355.html>
* Errors **MUST** be repeatable.
* How about three Platform Specifications: 3D, UNIX and Embedded?
  <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002361.html>
  Accuracy requirements for dual (triple) purpose implementations must
  meet the higher standard.
* Reciprocal square-root is in its own separate extension (Zfrsqrt) as
  it is desirable on its own to other implementors. This is to be evaluated.

# List of 2-arg opcodes

[[!table data="""
opcode | Description | pseudo-code | Extension |
FATAN2 | atan2 arc tangent | rd = atan2(rs1, rs2) | Zarctrignpi |
FATAN2PI | atan2 arc tangent / pi | rd = atan2(rs1, rs2) / pi | Zarctrigpi |
FPOW | x to the power of y | rd = pow(rs1, rs2) | ZftransAdv |
FROOT | x to the power of 1/y | rd = pow(rs1, 1/rs2) | ZftransAdv |
FHYPOT | hypotenuse | rd = sqrt(rs1^2 + rs2^2) | Zftrans |
"""]]

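The table semantics can be cross-checked with a few lines of Python. This is a minimal sketch, not a bit-accurate model. It runs in double precision through Python's `math` module. It assumes the FATAN2 operand order y = rs1, x = rs2, which is the order the FATAN pseudo-op alias `atan2(rs1, 1.0)` relies on. The `naive_hypot` helper is hypothetical and exists only to show why FHYPOT is more than a convenience.

```python
import math

# Hypothetical double-precision reference models of the 2-arg opcodes
# (Python's math module; not bit-exact to any hardware implementation).
def fatan2(rs1: float, rs2: float) -> float:
    # Operand order assumed: y = rs1, x = rs2.
    return math.atan2(rs1, rs2)

def fatan2pi(rs1: float, rs2: float) -> float:
    return math.atan2(rs1, rs2) / math.pi

def fpow(rs1: float, rs2: float) -> float:
    return math.pow(rs1, rs2)

def froot(rs1: float, rs2: float) -> float:
    # 1/rs2 is rounded before pow() sees it, so froot(64.0, 3.0)
    # need not be exactly 4.0.
    return math.pow(rs1, 1.0 / rs2)

def fhypot(rs1: float, rs2: float) -> float:
    return math.hypot(rs1, rs2)

def naive_hypot(rs1: float, rs2: float) -> float:
    # What FHYPOT avoids: the squares overflow even when the result fits.
    return math.sqrt(rs1 * rs1 + rs2 * rs2)

print(fhypot(3.0, 4.0))                                 # 5.0
print(naive_hypot(1e200, 1e200), fhypot(1e200, 1e200))  # inf vs ~1.414e200
```

In this sketch, `naive_hypot` overflows in its intermediate squares and returns infinity, while `fhypot` does not. That behaviour is presumably one reason to provide FHYPOT as an opcode rather than leaving it to an FMUL/FADD/FSQRT sequence.
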
# List of 1-arg transcendental opcodes

[[!table data="""
opcode | Description | pseudo-code | Extension |
FRSQRT | Reciprocal Square-root | rd = 1.0 / sqrt(rs1) | Zfrsqrt |
FCBRT | Cube Root | rd = pow(rs1, 1.0 / 3) | Zftrans |
FEXP2 | power-of-2 | rd = pow(2, rs1) | Zftrans |
FLOG2 | log2 | rd = log2(rs1) | Zftrans |
FEXPM1 | exponent minus 1 | rd = pow(e, rs1) - 1.0 | Zftrans |
FLOG1P | log of 1 plus x | rd = log(e, 1 + rs1) | Zftrans |
FEXP | exponent | rd = pow(e, rs1) | ZftransExt |
FLOG | natural log (base e) | rd = log(e, rs1) | ZftransExt |
FEXP10 | power-of-10 | rd = pow(10, rs1) | ZftransExt |
FLOG10 | log base 10 | rd = log10(rs1) | ZftransExt |
"""]]

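A minimal Python sketch shows the intended semantics of a few of these opcodes. It also shows why FEXPM1 and FLOG1P earn their place alongside FEXP and FLOG. The function names are hypothetical, and the sketch runs in double precision through `math`, not at any hardware precision. The sign handling in `fcbrt` is an assumption: a plain `pow(x, 1.0 / 3)` is undefined for negative x (C returns NaN).

```python
import math

# Hypothetical double-precision reference models (not bit-exact to hardware).
def frsqrt(rs1: float) -> float:
    return 1.0 / math.sqrt(rs1)

def fcbrt(rs1: float) -> float:
    # Take the root of |x| and restore the sign; not correctly rounded.
    return math.copysign(abs(rs1) ** (1.0 / 3.0), rs1)

def fexp2(rs1: float) -> float:
    return 2.0 ** rs1

def flog2(rs1: float) -> float:
    return math.log2(rs1)

def fexpm1(rs1: float) -> float:
    return math.expm1(rs1)

def flog1p(rs1: float) -> float:
    return math.log1p(rs1)

# For small x, forming exp(x) or 1 + x first rounds away most of the
# digits of x; the naive results are only good to about 7 digits here.
x = 1e-10
print(math.exp(x) - 1.0, fexpm1(x))
print(math.log(1.0 + x), flog1p(x))
```
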
# List of 1-arg trigonometric opcodes

[[!table data="""
opcode | Description | pseudo-code | Extension |
FSIN | sin (radians) | rd = sin(rs1) | Ztrignpi |
FCOS | cos (radians) | rd = cos(rs1) | Ztrignpi |
FTAN | tan (radians) | rd = tan(rs1) | Ztrignpi |
FASIN | arcsin (radians) | rd = asin(rs1) | Zarctrignpi |
FACOS | arccos (radians) | rd = acos(rs1) | Zarctrignpi |
FSINPI | sin (argument times pi) | rd = sin(pi * rs1) | Ztrigpi |
FCOSPI | cos (argument times pi) | rd = cos(pi * rs1) | Ztrigpi |
FTANPI | tan (argument times pi) | rd = tan(pi * rs1) | Ztrigpi |
FASINPI | arcsin / pi | rd = asin(rs1) / pi | Zarctrigpi |
FACOSPI | arccos / pi | rd = acos(rs1) / pi | Zarctrigpi |
FATANPI | arctan / pi | rd = atan(rs1) / pi | Zarctrigpi |
FSINH | hyperbolic sin | rd = sinh(rs1) | Zfhyp |
FCOSH | hyperbolic cos | rd = cosh(rs1) | Zfhyp |
FTANH | hyperbolic tan | rd = tanh(rs1) | Zfhyp |
FASINH | inverse hyperbolic sin | rd = asinh(rs1) | Zfhyp |
FACOSH | inverse hyperbolic cos | rd = acosh(rs1) | Zfhyp |
FATANH | inverse hyperbolic tan | rd = atanh(rs1) | Zfhyp |
"""]]

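The xxx-pi variants do more than save a multiply. Computing `sin(pi * x)` in software rounds `pi * x` first. As a result, exact zeros such as sin(pi * 1.0) come out as tiny non-zero values, and for large x the result is meaningless. The hypothetical `sinpi` helper below sketches the usual alternative: reduce x modulo 2 exactly, then multiply by pi. It works in double precision and is not correctly rounded. It assumes results in the style of IEEE 754-2019 sinPi: exact zeros at integers and NaN for infinite inputs.

```python
import math

def sinpi(x: float) -> float:
    """Sketch of sin(pi * x) with exact argument reduction."""
    if not math.isfinite(x):
        return math.nan                  # sin of +-inf or NaN is NaN
    if x < 0.0:
        return -sinpi(-x)                # odd symmetry
    r = math.fmod(x, 2.0)                # fmod is exact: r in [0, 2)
    if r == 0.0 or r == 1.0:
        return math.copysign(0.0, x)     # exact zeros at integers
    sign = 1.0
    if r > 1.0:
        r -= 1.0                         # exact (Sterbenz); sin(pi*(r+1)) = -sin(pi*r)
        sign = -1.0
    if r > 0.5:
        r = 1.0 - r                      # exact (Sterbenz); sin(pi*(1-r)) = sin(pi*r)
    # r is now in (0, 0.5]: only the multiply by pi is rounded.
    return sign * math.sin(math.pi * r)

print(math.sin(math.pi * 1.0), sinpi(1.0))   # tiny non-zero vs exact 0.0
print(sinpi(1e22))                           # 1e22 is an even integer: 0.0
```

Each reduction step is exact, so the only rounding before `math.sin` is the final multiply by pi on a small argument. This is why hardware with Ztrigpi can deliver exact results at integers and half-integers.
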
# Synthesis, Pseudo-code ops and macro-ops

These pseudo-ops are best left to the compiler rather than defined as
actual assembler pseudo-ops. The compiler allocates one scalar FP register
as a constant (loop invariant), set to 1.0 at the beginning of a function
or other suitable code block.

* FRCP rd, rs1 - pseudo-code alias for rd = 1.0 / rs1
* FATAN rd, rs1 - pseudo-code alias for rd = atan2(rs1, 1.0), using FATAN2
* FATANPI rd, rs1 - pseudo-code alias for rd = atan2pi(rs1, 1.0), using FATAN2PI
* FSINCOS - fused macro-op between FSIN and FCOS (issued in that order).
* FSINCOSPI - fused macro-op between FSINPI and FCOSPI (issued in that order).

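These aliases depend only on the register holding 1.0. The Python sketch below models that register as a module constant, `ONE`. All names are hypothetical and the arithmetic is double precision. It assumes the FATAN2 operand order y = rs1, x = rs2, which is the order the aliases imply. `atan2(x, 1.0)` reproduces `atan(x)` because the second argument is positive.

```python
import math

ONE = 1.0   # stands in for the loop-invariant FP register holding 1.0

def fatan2(rs1: float, rs2: float) -> float:
    return math.atan2(rs1, rs2)          # assumed: y = rs1, x = rs2

def fatan2pi(rs1: float, rs2: float) -> float:
    return math.atan2(rs1, rs2) / math.pi

def frcp(rs1: float) -> float:
    return ONE / rs1                     # FRCP: an FDIV with the 1.0 register

def fatan(rs1: float) -> float:
    return fatan2(rs1, ONE)              # atan(x) == atan2(x, 1.0)

def fatanpi(rs1: float) -> float:
    return fatan2pi(rs1, ONE)

def fsincos(rs1: float) -> tuple[float, float]:
    return math.sin(rs1), math.cos(rs1)  # FSIN then FCOS on the same operand

print(frcp(4.0), fatan(0.5), math.atan(0.5), fatanpi(1.0))
```

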
FATANPI example pseudo-code:

    lui t0, 0x3F800           # upper 20 bits of f32 1.0 (0x3F800000)
    fmv.w.x ft0, t0           # move the raw bits from integer to FP register
    fatan2pi.s rd, rs1, ft0   # rd = atan2(rs1, 1.0) / pi

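The `lui` immediate can be checked on the host. `lui` writes its 20-bit immediate into bits 31:12 of the destination. The raw bits are then copied unchanged into an FP register; in RV32F/RV64F the integer-to-FP move is `fmv.w.x`, and `fmv.x.w` goes the other way. The sketch below simply reinterprets those bits as an IEEE 754 binary32 value.

```python
import struct

# lui places its 20-bit immediate in bits 31:12 of the destination.
LUI_IMM = 0x3F800
bits = (LUI_IMM << 12) & 0xFFFFFFFF        # value left in t0

# Reinterpret the same 32 bits as binary32, as the FP register sees them.
as_f32 = struct.unpack("<f", struct.pack("<I", bits))[0]
print(hex(bits), as_f32)                   # 0x3f800000 1.0
```

On RV64, `lui` sign-extends its result. Bit 31 is clear here, and `fmv.w.x` uses only the low 32 bits, so the same sequence works there too.
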
Hypotenuse example (obviates the need for Zfhyp except for high-performance
implementations), noting that SQRT(x**2+1) = HYPOT(x, 1.0):

    ASINH(x) = ln(x + SQRT(x**2 + 1)) = ln(x + HYPOT(x, 1.0))

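A minimal Python sketch of this synthesis follows; `asinh_via_hypot` is a hypothetical name. Using hypot avoids the overflow of `x*x` for |x| above roughly 1e154. The sketch also applies the odd symmetry asinh(-x) = -asinh(x), because for negative x the sum `x + hypot(x, 1.0)` would cancel catastrophically.

```python
import math

def asinh_via_hypot(x: float) -> float:
    """asinh(x) = ln(x + hypot(x, 1.0)), evaluated on |x| and re-signed."""
    ax = abs(x)
    return math.copysign(math.log(ax + math.hypot(ax, 1.0)), x)

print(asinh_via_hypot(2.0), math.asinh(2.0))
print(asinh_via_hypot(1e300), math.asinh(1e300))   # no overflow in x*x
```

Relative accuracy still degrades for |x| much smaller than 1, because `ln` is applied to a value close to 1.0. This is the same problem that FLOG1P addresses, so a careful synthesis would prefer LOG1P there.
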
LOG / LOG1P example:

    LOG(x) = LOG1P(x - 1.0)
    EXP(x) = EXPM1(x) + 1.0

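These identities are exact in real arithmetic, but not in floating point. A short Python sketch in double precision shows where they hold and where they break; the helper names are hypothetical. `x - 1.0` is exact only for 0.5 <= x <= 2 (Sterbenz lemma). `EXPM1(x) + 1.0` loses everything once `EXPM1(x)` rounds to -1.0.

```python
import math

def log_via_log1p(x: float) -> float:
    # LOG(x) = LOG1P(x - 1.0); the subtraction rounds outside [0.5, 2].
    return math.log1p(x - 1.0)

def exp_via_expm1(x: float) -> float:
    # EXP(x) = EXPM1(x) + 1.0
    return math.expm1(x) + 1.0

print(log_via_log1p(1.5), math.log(1.5))     # agree closely
print(exp_via_expm1(1.0), math.exp(1.0))

# Failure cases a LOG1P-only / EXPM1-only ISA would have to handle:
print(exp_via_expm1(-40.0), math.exp(-40.0))  # 0.0 vs about 4.2e-18
try:
    log_via_log1p(1e-20)                      # 1e-20 - 1.0 rounds to -1.0
except ValueError:
    print("log1p(-1.0) is -inf (Python raises ValueError)")
```

These failures bear directly on the question below: replacing LOG and EXP with LOG1P and EXPM1 plus an add is only safe if software handles the ranges where the add rounds.
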
# To evaluate: should LOG be replaced with LOG1P (and EXP with EXPM1)?

The RISC principle says "exclude LOG because it is covered by LOG1P plus an ADD".
Research is needed to ensure that implementors are not compromised by such
a decision:
<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002358.html>

# Dynamic accuracy CSR

Maybe a solution would be to add an extra field to the FP control and
status register (fcsr) to allow selecting one of several accurate or fast modes:

- machine-learning-mode: as fast as possible
  (maybe need additional requirements such as monotonicity for atanh?)
- GPU-mode: accurate to within a few ULP
  (see Vulkan, OpenGL, and OpenCL specs for accuracy guidelines)
- almost-accurate-mode: accurate to <1 ULP
  (would 0.51 or some other value be better?)
- fully-accurate-mode: correctly rounded in all cases
- maybe more modes?

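Pinning these modes down requires a way to measure error in ULPs. The Python sketch below takes the exact result as a rational number (`fractions.Fraction`). It measures `|approx - exact|` in units of the ULP at the double nearest the exact value. The numeric bounds in `MODES` are assumptions, not part of the proposal. The 4 ULP figure for GPU-mode is a placeholder for "a few". Treating "correctly rounded" as "at most 0.5 ULP" holds for round-to-nearest, except right at powers of two where the ULP size changes.

```python
import math
from fractions import Fraction

# Hypothetical (limit_in_ulp, strict) pairs for the proposed modes.
MODES = {
    "machine-learning": (math.inf, False),  # no bound stated
    "gpu": (4.0, False),                    # "a few ULP": assumed value
    "almost-accurate": (1.0, True),         # "< 1 ULP"
    "fully-accurate": (0.5, False),         # correctly rounded
}

def ulp_error(approx: float, exact: Fraction) -> float:
    """|approx - exact| in units of the ULP at float(exact)."""
    ulp = Fraction(math.ulp(float(exact)))
    return float(abs(Fraction(approx) - exact) / ulp)

def meets_mode(approx: float, exact: Fraction, mode: str) -> bool:
    limit, strict = MODES[mode]
    err = ulp_error(approx, exact)
    return err < limit if strict else err <= limit

third = Fraction(1, 3)
above = math.nextafter(1 / 3, 1.0)          # next double above 1/3
print(ulp_error(1 / 3, third), ulp_error(above, third))
```

For example, the double nearest 1/3 lies a third of an ULP from the exact value and passes the fully-accurate check. Its upper neighbour, about 0.67 ULP away, passes only almost-accurate-mode and the looser modes.
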
Question: should better accuracy than is requested be permitted? Example:
Amdahl-370 issues.

Comments:

Yes, embedded systems typically can do with 12, 16 or 32 bit
accuracy; they rarely require 64 bits. But the idea of making
a low-power 32-bit FPU/DSP that can accommodate 64 bits is, I believe,
already being done in other designs such as PIC. For embedded
graphics, 16 bit is more than adequate. In fact, Cornell had a very
innovative 18-bit floating point format described here (useful for
FPGA designs with 18-bit DSPs):

<https://people.ece.cornell.edu/land/courses/ece5760/FloatingPoint/index.html>

A very interesting GPU using the 18-bit FPU is also described here:

<https://people.ece.cornell.edu/land/courses/ece5760/FinalProjects/f2008/ap328_sjp45/website/hardwaredesign.html>

There are also 8 and 9-bit floating point formats that could be useful:

<https://en.wikipedia.org/wiki/Minifloat>
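
These formats differ mainly in how many exponent and mantissa bits follow the sign bit. The hypothetical `decode_minifloat` helper below sketches decoding any such layout. It assumes IEEE 754 conventions: a biased exponent, subnormals, and an all-ones exponent reserved for Inf/NaN. Some published minifloat variants drop those conventions, and the field widths of the Cornell 18-bit format are not assumed here.

```python
import math
from typing import Optional

def decode_minifloat(bits: int, exp_bits: int, man_bits: int,
                     bias: Optional[int] = None) -> float:
    """Decode sign | exponent | mantissa, assuming IEEE 754-style rules."""
    if bias is None:
        bias = (1 << (exp_bits - 1)) - 1            # IEEE-style default bias
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    if exp == (1 << exp_bits) - 1:                  # all-ones exponent
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:                                    # zero or subnormal
        return sign * math.ldexp(man, 1 - bias - man_bits)
    # normal: restore the implicit leading 1
    return sign * math.ldexp((1 << man_bits) | man, exp - bias - man_bits)

print(decode_minifloat(0b0_1000_100, 4, 3))         # 1-4-3 layout, bias 7
print(decode_minifloat(0x7BFF, 5, 10))              # IEEE binary16 max
```

With `exp_bits=4, man_bits=3` (bias 7), `0b0_1000_100` decodes to 3.0. With `exp_bits=5, man_bits=10` the same function decodes IEEE binary16, whose largest finite value is 65504.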