(no commit message)
[libreriscv.git] / simple_v_extension / vector_ops.mdwn
1 [[!tag standards]]
2
3 # Vector Operations Extension to SV
4
5 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
6
7 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
8
9 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
10
11 However given that some of the parameters are vectors (with and without SUBVL set), and some are scalars (where SUBVL will not apply), some clear rules need to be defined as to how the operations work.
12
13 Examples which can require SUBVL include cross product and may in future involve complex numbers.
14
15 ## CORDIC
16
17 * SUBVL=2, vd, vs; SUBVL ignored on beta.
18 * VL > 1 ok. beta as scalar ok (applies across all vectors)
19 * non vector args vd, vs, or SUBVL!=2 reserved.
20
21 * CORDIC.lin.rot vd, vs, beta
22 * CORDIC.cir.rot vd, vs, beta
23 * CORDIC.hyp.rot vd, vs, beta
24 * CORDIC.lin.vec vd, vs, beta
25 * CORDIC.cir.vec vd, vs, beta
26 * CORDIC.hyp.vec vd, vs, beta
27
28 CORDIC is an extremely general-purpose algorithm useful for a huge number
29 of diverse purposes. In its full form it does however require quite a
30 few parameters, one of which is a vector, making it awkward to include in
31 a standard "scalar" ISA. Additionally the coordinates can be set to circular,
32 linear or hyperbolic, producing three different modes, and the algorithm
33 may also be run in either "vector" mode or "rotation" mode. See [[discussion]]
34
35 CORDIC can also be used for performing DCT. See
36 <https://arxiv.org/abs/1606.02424>
37
38 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
39
40 int i = 0;
41 int iterations = 0; // Number of times to run the algorithm
42 float arctanTable[iterations]; // in Radians
43 float K = 0.6073; // K
44 float v_x,v_y; // Vector v; x and y components
45
46 for(i=0; i < iterations; i++) {
47 arctanTable[i] = atan(pow(2,-i));
48 }
49
50 float vnew_x; // To store the new value of x;
51 for(i = 0; i < iterations; i++) {
52 // If beta is negative, we need to do a counter-clockwise rotation:
53 if( beta < 0) {
54 vnew_x = v_x + (v_y*pow(2,-i));
55 v_y -= (v_x*pow(2,-i));
56 beta += arctanTable[i];
57 }
58 // If beta is positive, we need to do a clockwise rotation:
59 else {
60 vnew_x = v_x - (v_y*pow(2,-i));
61 v_y += (v_x*pow(2,-i));
62 beta -= arctanTable[i];
63 }
64 v_x = vnew_x;
65 }
66 v_x *= K;
67 v_y *= K;
68
69 Links:
70
71 * <http://www.myhdl.org/docs/examples/sinecomp/>
72
73 ## Vector cross product
74
75 SUBVL=2,3,4 all regs
76
77 * VCROSS vd, vs1, vs1
78
79 Result is the cross product of x and y, i.e., the resulting components are, in order:
80
81 x[1] * y[2] - y[1] * x[2]
82 x[2] * y[0] - y[2] * x[0]
83 x[0] * y[1] - y[0] * x[1]
84
85 All the operands must be vectors of 3 components of a floating-point type.
86
87 Pseudocode:
88
89 vec3 a, b; // elements in order a.x, a.y, a.z
90 // compute a cross b:
91 vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
92 vec3 t2 = b.zxy;
93 vec3 t3 = a.zxy;
94 vec3 t4 = b.yzx;
95 vec3 p = t3 * t4;
96 vec3 cross = t1 * t2 - p;
97
98 ## Vector dot product
99
100 * SUBVL ignored on rd. SUBVL=2,3,4 vs1,vs2
101 * rd=scalar, SUBVL=1, vs1, vs2=vec
102
103 * VDOT rd, vs1, vs2
104
105 Computes the dot product of two vectors. Internal accuracy must be
106 greater than the input vectors and the result.
107
108 Pseudocode in python:
109
110 from operator import mul
111 sum(map(mul, A, B))
112
113 Pseudocode in c:
114
115 double dot_product(float v[], float u[], int n)
116 {
117 double result = 0.0;
118 for (int i = 0; i < n; i++)
119 result += v[i] * u[i];
120 return result;
121 }
122
123 ## Vector length
124
125 * rd=scalar, vs1=vec (SUBVL=1)
126 * rd=scalar, vs1=vec (SUBVL=2,3,4) only 1 (predication rules apply)
127 * rd=vec, SUBVL ignored; vs1=vec, SUBVL=2,3,4
128 * rd=vec, SUBVL ignored; vs1=vec, SUBVL=1: reserved encoding.
129
130 * VLEN rd, vs1
131
132 The scalar length of a vector:
133
134 sqrt(x[0]^2 + x[1]^2 + ...).
135
136 ## Vector distance
137
138 * VDIST rd, vs1, vs2
139
140 The scalar distance between two vectors. Subtracts one vector from the
141 other and returns length:
142
143 length(v0 - v1)
144
145 ## Vector LERP
146
147 * VLERP rd, vs1, rs2 # SUBVL=2: vs1.v0 vs1.v1
148
149 Known as **fmix** in GLSL.
150
151 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
152
153 Pseudocode:
154
155 // Imprecise method, which does not guarantee v = v1 when t = 1,
156 // due to floating-point arithmetic error.
157 // This form may be used when the hardware has a native fused
158 // multiply-add instruction.
159 float lerp(float v0, float v1, float t) {
160 return v0 + t * (v1 - v0);
161 }
162
163 // Precise method, which guarantees v = v1 when t = 1.
164 float lerp(float v0, float v1, float t) {
165 return (1 - t) * v0 + t * v1;
166 }
167
168 ## Vector SLERP
169
170 * VSLERP vd, vs1, vs2, rs3
171
172 Not recommended as it is not commonly used and has several trigonometric
173 functions, although CORDIC in vector rotate circular mode is designed for this purpose. Also a costly 4 arg operation.
174
175 <https://en.m.wikipedia.org/wiki/Slerp>
176
177 Pseudocode:
178
179 Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
180 // Only unit quaternions are valid rotations.
181 // Normalize to avoid undefined behavior.
182 v0.normalize();
183 v1.normalize();
184
185 // Compute the cosine of the angle between the two vectors.
186 double dot = dot_product(v0, v1);
187
188 // If the dot product is negative, slerp won't take
189 // the shorter path. Note that v1 and -v1 are equivalent when
190 // the negation is applied to all four components. Fix by
191 // reversing one quaternion.
192 if (dot < 0.0f) {
193 v1 = -v1;
194 dot = -dot;
195 }
196
197 const double DOT_THRESHOLD = 0.9995;
198 if (dot > DOT_THRESHOLD) {
199 // If the inputs are too close for comfort, linearly interpolate
200 // and normalize the result.
201
202 Quaternion result = v0 + t*(v1 - v0);
203 result.normalize();
204 return result;
205 }
206
207 // Since dot is in range [0, DOT_THRESHOLD], acos is safe
208 double theta_0 = acos(dot); // theta_0 = angle between input vectors
209 double theta = theta_0*t; // theta = angle between v0 and result
210 double sin_theta = sin(theta); // compute this value only once
211 double sin_theta_0 = sin(theta_0); // compute this value only once
212
213 double s0 = cos(theta) - dot * sin_theta / sin_theta_0; // == sin(theta_0 - theta) / sin(theta_0)
214 double s1 = sin_theta / sin_theta_0;
215
216 return (s0 * v0) + (s1 * v1);
217 }
218
219 However this algorithm does not involve transcendentals except in
220 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
221
222 function v = cordic(beta,n)
223 % This function computes v = [cos(beta), sin(beta)] (beta in radians)
224 % using n iterations. Increasing n will increase the precision.
225
226 if beta < -pi/2 || beta > pi/2
227 if beta < 0
228 v = cordic(beta + pi, n);
229 else
230 v = cordic(beta - pi, n);
231 end
232 v = -v; % flip the sign for second or third quadrant
233 return
234 end
235
236 % Initialization of tables of constants used by CORDIC
237 % need a table of arctangents of negative powers of two, in radians:
238 % angles = atan(2.^-(0:27));
239 angles = [ ...
240 0.78539816339745 0.46364760900081
241 0.24497866312686 0.12435499454676 ...
242 0.06241880999596 0.03123983343027
243 0.01562372862048 0.00781234106010 ...
244 0.00390623013197 0.00195312251648
245 0.00097656218956 0.00048828121119 ...
246 0.00024414062015 0.00012207031189
247 0.00006103515617 0.00003051757812 ...
248 0.00001525878906 0.00000762939453
249 0.00000381469727 0.00000190734863 ...
250 0.00000095367432 0.00000047683716
251 0.00000023841858 0.00000011920929 ...
252 0.00000005960464 0.00000002980232
253 0.00000001490116 0.00000000745058 ];
254 % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
255 % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
256 Kvalues = [ ...
257 0.70710678118655 0.63245553203368
258 0.61357199107790 0.60883391251775 ...
259 0.60764825625617 0.60735177014130
260 0.60727764409353 0.60725911229889 ...
261 0.60725447933256 0.60725332108988
262 0.60725303152913 0.60725295913894 ...
263 0.60725294104140 0.60725293651701
264 0.60725293538591 0.60725293510314 ...
265 0.60725293503245 0.60725293501477
266 0.60725293501035 0.60725293500925 ...
267 0.60725293500897 0.60725293500890
268 0.60725293500889 0.60725293500888 ];
269 Kn = Kvalues(min(n, length(Kvalues)));
270
271 % Initialize loop variables:
272 v = [1;0]; % start with 2-vector cosine and sine of zero
273 poweroftwo = 1;
274 angle = angles(1);
275
276 % Iterations
277 for j = 0:n-1;
278 if beta < 0
279 sigma = -1;
280 else
281 sigma = 1;
282 end
283 factor = sigma * poweroftwo;
284 % Note the matrix multiplication can be done using scaling by
285 % powers of two and addition subtraction
286 R = [1, -factor; factor, 1];
287 v = R * v; % 2-by-2 matrix multiply
288 beta = beta - sigma * angle; % update the remaining angle
289 poweroftwo = poweroftwo / 2;
290 % update the angle from table, or eventually by just dividing by two
291 if j+2 > length(angles)
292 angle = angle / 2;
293 else
294 angle = angles(j+2);
295 end
296 end
297
298 % Adjust length of output vector to be [cos(beta), sin(beta)]:
299 v = v * Kn;
300 return
301
302 endfunction
303
304 2x2 matrix multiply can be done with shifts and adds:
305
306 x = v[0] - sigma * (v[1] * 2^(-j));
307 y = sigma * (v[0] * 2^(-j)) + v[1];
308 v = [x; y];
309
310 The technique is outlined in a paper as being applicable to 3D:
311 <https://www.atlantis-press.com/proceedings/jcis2006/232>
312
313 # Expensive 3-operand OP32 operations
314
315 3-operand operations are extremely expensive in terms of OP32 encoding space. A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd
316
317 Another is to overwrite one of the src registers.
318
319 # Opcode Table
320
321 TODO
322
323 # Links
324
325 * <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-September/002736.html>
326 * <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-September/002733.html>
327 * <http://bugs.libre-riscv.org/show_bug.cgi?id=142>