# Zftrans - transcendental operations

See:

* <http://bugs.libre-riscv.org/show_bug.cgi?id=127>
* <https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>
* Discussion: <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002342.html>
* [[rv_major_opcode_1010011]] for opcode listing.
* [[zfpacc_proposal]] for accuracy settings proposal

Extension subsets:

* **Zftrans**: standard transcendentals (best suited to 3D)
* **ZftransExt**: extra functions (useful, not generally needed for 3D,
  can be synthesised using Zftrans)
* **Ztrigpi**: trig. xxx-pi: sinpi, cospi, tanpi
* **Ztrignpi**: trig. non-xxx-pi: sin, cos, tan
* **Zarctrigpi**: arc-trig. a-xxx-pi: atan2pi, asinpi, acospi
* **Zarctrignpi**: arc-trig. non-a-xxx-pi: atan2, asin, acos
* **Zfhyp**: hyperbolic/inverse-hyperbolic. sinh, cosh, tanh, asinh,
  acosh, atanh (can be synthesised - see below)
* **ZftransAdv**: much more complex to implement in hardware
* **Zfrsqrt**: Reciprocal square-root.

Minimum recommended requirements for 3D: Zftrans, Ztrigpi, Zarctrigpi,
Zarctrignpi

[[!toc levels=2]]

# TODO:

* Decision on accuracy, moved to [[zfpacc_proposal]]
  <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002355.html>
* Errors **MUST** be repeatable.
* How about four Platform Specifications? 3DUNIX, UNIX, 3DEmbedded and Embedded?
  <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002361.html>
  Accuracy requirements for dual (triple) purpose implementations must
  meet the higher standard.
* Reciprocal Square-root is in its own separate extension (Zfrsqrt) as
  it is desirable on its own by other implementors. This is to be evaluated.

# Proposed Opcodes vs Khronos OpenCL Opcodes <a name="khronos_equiv"></a>

This list shows the (direct) equivalence between proposed opcodes and
their Khronos OpenCL equivalents.

See
<https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html>

Special FP16 opcodes are *not* being proposed, except by indirect / inherent
use of the "fmt" field that is already present in the RISC-V Specification.

"Native" opcodes are *not* being proposed: implementors will be expected
to use the (equivalent) proposed opcode covering the same function.

"Fast" opcodes are *not* being proposed, because the Khronos Specification
fast\_length, fast\_normalize and fast\_distance OpenCL opcodes require
vectors (or can be done as scalar operations using other RISC-V instructions).

The OpenCL FP32 opcodes are **direct** equivalents to the proposed opcodes.
Deviation from conformance with the Khronos Specification - including the
Khronos Specification accuracy requirements - is not an option.

[[!table data="""
Proposed opcode | OpenCL FP32 | OpenCL FP16 | OpenCL native | OpenCL fast |
FSIN | sin | half\_sin | native\_sin | NONE |
FCOS | cos | half\_cos | native\_cos | NONE |
FTAN | tan | half\_tan | native\_tan | NONE |
FASIN | asin | NONE | NONE | NONE |
FACOS | acos | NONE | NONE | NONE |
NONE (3) | atan | NONE | NONE | NONE |
FSINPI | sinpi | NONE | NONE | NONE |
FCOSPI | cospi | NONE | NONE | NONE |
FTANPI | tanpi | NONE | NONE | NONE |
FASINPI | asinpi | NONE | NONE | NONE |
FACOSPI | acospi | NONE | NONE | NONE |
NONE (2) | atanpi | NONE | NONE | NONE |
FSINH | sinh | NONE | NONE | NONE |
FCOSH | cosh | NONE | NONE | NONE |
FTANH | tanh | NONE | NONE | NONE |
FASINH | asinh | NONE | NONE | NONE |
FACOSH | acosh | NONE | NONE | NONE |
FATANH | atanh | NONE | NONE | NONE |
FRSQRT | rsqrt | half\_rsqrt | native\_rsqrt | NONE |
FCBRT | cbrt | NONE | NONE | NONE |
FEXP2 | exp2 | half\_exp2 | native\_exp2 | NONE |
FLOG2 | log2 | half\_log2 | native\_log2 | NONE |
FEXPM1 | expm1 | NONE | NONE | NONE |
FLOG1P | log1p | NONE | NONE | NONE |
FEXP | exp | half\_exp | native\_exp | NONE |
FLOG | log | half\_log | native\_log | NONE |
FEXP10 | exp10 | half\_exp10 | native\_exp10 | NONE |
FLOG10 | log10 | half\_log10 | native\_log10 | NONE |
FATAN2 | atan2 | NONE | NONE | NONE |
FATAN2PI | atan2pi | NONE | NONE | NONE |
FPOW | pow | NONE | NONE | NONE |
FROOT | rootn | NONE | NONE | NONE |
FHYPOT | hypot | NONE | NONE | NONE |
"""]]

Note (2) FATANPI is a synthesised alias, below.

Note (3) FATAN is a synthesised alias, below.

# List of 2-arg opcodes

[[!table data="""
opcode | Description | pseudo-code | Extension |
FATAN2 | atan2 arc tangent | rd = atan2(rs2, rs1) | Zarctrignpi |
FATAN2PI | atan2 arc tangent / pi | rd = atan2(rs2, rs1) / pi | Zarctrigpi |
FPOW | x power of y | rd = pow(rs1, rs2) | ZftransAdv |
FROOT | x power 1/y | rd = pow(rs1, 1/rs2) | ZftransAdv |
FHYPOT | hypotenuse | rd = sqrt(rs1^2 + rs2^2) | Zftrans |
"""]]
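
The pseudo-code column can be transcribed into a small reference model.
The following is a minimal sketch in Python, not part of the proposal: the
function names are hypothetical, the operand order follows the pseudo-code
column as written, and Python floats are binary64, so it models the
function being computed rather than FP32 accuracy.

```python
import math

# Hypothetical reference models; rs1 and rs2 mirror the table's operand names.

def fatan2(rs1, rs2):
    # rd = atan2(rs2, rs1), as written in the pseudo-code column
    return math.atan2(rs2, rs1)

def fatan2pi(rs1, rs2):
    # rd = atan2(rs2, rs1) / pi
    return math.atan2(rs2, rs1) / math.pi

def fpow(rs1, rs2):
    # rd = pow(rs1, rs2)
    return math.pow(rs1, rs2)

def froot(rs1, rs2):
    # rd = pow(rs1, 1/rs2)
    return math.pow(rs1, 1.0 / rs2)

def fhypot(rs1, rs2):
    # rd = sqrt(rs1^2 + rs2^2), without intermediate overflow
    return math.hypot(rs1, rs2)

print(fhypot(3.0, 4.0))
print(fatan2pi(1.0, 1.0))
```

Such a model is only useful for checking the intended function of each
opcode; accuracy requirements are covered by [[zfpacc_proposal]].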

# List of 1-arg transcendental opcodes

[[!table data="""
opcode | Description | pseudo-code | Extension |
FRSQRT | Reciprocal Square-root | rd = 1.0 / sqrt(rs1) | Zfrsqrt |
FCBRT | Cube Root | rd = pow(rs1, 1.0 / 3) | Zftrans |
FEXP2 | power-of-2 | rd = pow(2, rs1) | Zftrans |
FLOG2 | log2 | rd = log2(rs1) | Zftrans |
FEXPM1 | exponential minus 1 | rd = pow(e, rs1) - 1.0 | Zftrans |
FLOG1P | log plus 1 | rd = log(e, 1 + rs1) | Zftrans |
FEXP | exponential | rd = pow(e, rs1) | ZftransExt |
FLOG | natural log (base e) | rd = log(e, rs1) | ZftransExt |
FEXP10 | power-of-10 | rd = pow(10, rs1) | ZftransExt |
FLOG10 | log base 10 | rd = log10(rs1) | ZftransExt |
"""]]
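
As an illustration of the Zfrsqrt and Zftrans entries, here is a minimal
Python sketch following the Description column (reciprocal square root as
1/sqrt, cube root as the power 1/3). The function names are hypothetical,
and the sign handling in `fcbrt` is an assumption made so that negative
inputs work with Python's `math.pow`.

```python
import math

def frsqrt(x):
    # Reciprocal square root: 1 / sqrt(x)
    return 1.0 / math.sqrt(x)

def fcbrt(x):
    # math.pow rejects a negative base with a fractional exponent,
    # so take the root of |x| and restore the sign afterwards.
    return math.copysign(math.pow(abs(x), 1.0 / 3.0), x)

def fexp2(x):
    # 2 raised to the power x
    return math.pow(2.0, x)

def flog2(x):
    # Base-2 logarithm
    return math.log2(x)

def fexpm1(x):
    # e**x - 1, accurate for x near zero
    return math.expm1(x)

def flog1p(x):
    # ln(1 + x), accurate for x near zero
    return math.log1p(x)

print(frsqrt(4.0), fexp2(3.0), flog2(8.0))
```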

# List of 1-arg trigonometric opcodes

[[!table data="""
opcode | Description | pseudo-code | Extension |
FSIN | sin (radians) | rd = sin(rs1) | Ztrignpi |
FCOS | cos (radians) | rd = cos(rs1) | Ztrignpi |
FTAN | tan (radians) | rd = tan(rs1) | Ztrignpi |
FASIN | arcsin (radians) | rd = asin(rs1) | Zarctrignpi |
FACOS | arccos (radians) | rd = acos(rs1) | Zarctrignpi |
FSINPI | sin times pi | rd = sin(pi * rs1) | Ztrigpi |
FCOSPI | cos times pi | rd = cos(pi * rs1) | Ztrigpi |
FTANPI | tan times pi | rd = tan(pi * rs1) | Ztrigpi |
FASINPI | arcsin / pi | rd = asin(rs1) / pi | Zarctrigpi |
FACOSPI | arccos / pi | rd = acos(rs1) / pi | Zarctrigpi |
FATANPI | arctan / pi | rd = atan(rs1) / pi | Zarctrigpi |
FSINH | hyperbolic sin (radians) | rd = sinh(rs1) | Zfhyp |
FCOSH | hyperbolic cos (radians) | rd = cosh(rs1) | Zfhyp |
FTANH | hyperbolic tan (radians) | rd = tanh(rs1) | Zfhyp |
FASINH | inverse hyperbolic sin | rd = asinh(rs1) | Zfhyp |
FACOSH | inverse hyperbolic cos | rd = acosh(rs1) | Zfhyp |
FATANH | inverse hyperbolic tan | rd = atanh(rs1) | Zfhyp |
"""]]
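
The xxx-pi variants differ from the radian variants in where the
multiplication by pi happens. As a hedged illustration, the sketch below
models FSINPI and FCOSPI in Python with the argument reduced modulo 2
*before* multiplying by pi. The function names, the exact-reduction
strategy and the choice of zero sign are assumptions for illustration;
the proposal itself only specifies `sin(pi * rs1)` and `cos(pi * rs1)`.

```python
import math

def fsinpi(x):
    # fmod is exact, so r equals x modulo 2 with no rounding error.
    r = math.fmod(x, 2.0)
    if r == 0.0 or abs(r) == 1.0:
        # sin(pi * n) is exactly zero for integer n
        return math.copysign(0.0, x)
    return math.sin(math.pi * r)

def fcospi(x):
    # cos is even, so the sign of the reduced argument can be dropped.
    r = abs(math.fmod(x, 2.0))
    if r == 0.5 or r == 1.5:
        # cos(pi * (n + 1/2)) is exactly zero for integer n
        return 0.0
    return math.cos(math.pi * r)

print(fsinpi(1.0))        # exact zero from the reduced argument
print(math.sin(math.pi))  # small nonzero value: math.pi is itself rounded
```

The comparison in the last two lines shows the practical difference:
the radian form inherits the rounding error of the constant pi, whereas
the xxx-pi form can return exact results at multiples of one half.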

# Synthesis, Pseudo-code ops and macro-ops

The pseudo-ops are best left up to the compiler rather than being defined
as actual pseudo-ops: the compiler allocates one scalar FP register as a
(loop-invariant) constant, set to "1.0" at the beginning of a function or
other suitable code block.

* FRCP rd, rs1 - pseudo-code alias for rd = 1.0 / rs1
* FATAN - pseudo-code alias for rd = atan2(rs1, 1.0) - FATAN2
* FATANPI - pseudo-code alias for rd = atan2pi(rs1, 1.0) - FATAN2PI
* FSINCOS - fused macro-op between FSIN and FCOS (issued in that order).
* FSINCOSPI - fused macro-op between FSINPI and FCOSPI (issued in that order).
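
The aliases in this list can be modelled as thin wrappers around the
underlying operations. This is a minimal Python sketch with hypothetical
function names; `ONE` stands in for the FP register that the compiler
keeps set to 1.0, and the fused macro-ops are modelled simply as two
results computed in the stated order.

```python
import math

ONE = 1.0  # stands in for the loop-invariant FP register holding 1.0

def frcp(rs1):
    # FRCP: rd = 1.0 / rs1
    return ONE / rs1

def fatan(rs1):
    # FATAN: rd = atan2(rs1, 1.0)
    return math.atan2(rs1, ONE)

def fatanpi(rs1):
    # FATANPI: rd = atan2pi(rs1, 1.0)
    return math.atan2(rs1, ONE) / math.pi

def fsincos(rs1):
    # FSINCOS: FSIN then FCOS on the same operand
    return math.sin(rs1), math.cos(rs1)

def fsincospi(rs1):
    # FSINCOSPI: FSINPI then FCOSPI on the same operand
    return math.sin(math.pi * rs1), math.cos(math.pi * rs1)

print(fatan(1.0), math.atan(1.0))
```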

FATANPI example pseudo-code:

    lui t0, 0x3F800 // upper bits of f32 1.0
    fmv.s.x ft0, t0 // move the bit pattern into an FP register
    fatan2pi.s rd, rs1, ft0
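
The `0x3F800` immediate is the upper 20 bits of the IEEE 754 binary32
encoding of 1.0 (`0x3F800000`). A short Python sketch to check this, or
to derive the immediate for other constants; the helper names are
hypothetical:

```python
import struct

def f32_bits(value):
    """Return the IEEE 754 binary32 bit pattern of value as an int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def lui_immediate(value):
    """Upper 20 bits for LUI; only valid when the low 12 bits are zero."""
    bits = f32_bits(value)
    if bits & 0xFFF:
        raise ValueError("constant needs more than a single LUI")
    return bits >> 12

print(hex(f32_bits(1.0)))       # 0x3f800000
print(hex(lui_immediate(1.0)))  # 0x3f800
```

A single `lui` suffices only for constants whose low 12 mantissa bits are
zero, which holds for 1.0 and other small powers of two.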

Hyperbolic function example (obviates need for Zfhyp except for
high-performance or correctly-rounded implementations):

    ASINH( x ) = ln( x + SQRT(x**2+1))
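
To illustrate why the synthesised form is not a substitute where correct
rounding matters, the following Python sketch transcribes the formula
directly and compares it with the library `math.asinh`. The function name
`asinh_synth` is hypothetical, and the comparison uses binary64 rather
than FP32.

```python
import math

def asinh_synth(x):
    # Direct transcription of ASINH(x) = ln(x + SQRT(x**2 + 1)).
    return math.log(x + math.sqrt(x * x + 1.0))

for x in (1.0, 1e-9):
    ref = math.asinh(x)
    rel_err = abs(asinh_synth(x) - ref) / ref
    print(x, rel_err)
```

On a typical IEEE 754 binary64 platform the relative error at `x = 1.0` is
at the level of the last bit, but at `x = 1e-9` it grows to the order of
1e-7, because `x + 1.0` rounds away most of the bits of `x` before the
logarithm is taken.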

LOG / LOG1P example:

    LOG(x) = LOG1P(x - 1.0)
    EXP(x) = EXPM1(x) + 1.0
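
Both identities can be checked numerically for moderate arguments. A
minimal Python sketch, with hypothetical function names, in binary64:

```python
import math

def log_via_log1p(x):
    # LOG(x) synthesised as LOG1P(x - 1.0)
    return math.log1p(x - 1.0)

def exp_via_expm1(x):
    # EXP(x) synthesised as EXPM1(x) + 1.0
    return math.expm1(x) + 1.0

for x in (0.5, 1.0, 2.0, 10.0):
    print(x, log_via_log1p(x), math.log(x))
    print(x, exp_via_expm1(x), math.exp(x))
```

For these inputs the synthesised and direct results agree to within a few
units in the last place; that is not the same as agreeing bit-for-bit.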

# To evaluate: should LOG be replaced with LOG1P (and EXP with EXPM1)?

RISC principle says "exclude LOG because it's covered by LOG1P plus an ADD".
Research is needed to ensure that implementors are not compromised by such
a decision:
<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002358.html>

A correctly-rounded LOG will return different results than LOG1P plus ADD.
Likewise for EXP and EXPM1.
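
Besides last-bit rounding differences, the extra ADD can discard the
operand entirely at the extremes of the input range. The Python sketch
below shows two such inputs in binary64; the specific values are chosen
for illustration only, and in FP32 the same effect sets in for much less
extreme arguments.

```python
import math

# LOG via LOG1P: the subtraction rounds x - 1.0 to exactly -1.0 for tiny x,
# so LOG1P would see -1.0 (a pole) instead of anything related to x.
x = 1e-20
print(x - 1.0 == -1.0)   # True: all information about x is lost
print(math.log(x))       # finite, about -46.05

# EXP via EXPM1: the final add cancels for very negative arguments.
y = -50.0
direct = math.exp(y)            # tiny but nonzero
synth = math.expm1(y) + 1.0     # dominated by rounding in the add
print(direct, synth)
```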