(no commit message)
[libreriscv.git] / simple_v_extension / vector_ops.mdwn
1 [[!tag standards]]
2
3 # Vector Operations Extension to SV
4
5 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
6
7 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
8
9 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
10
11 However given that some of the parameters are vectors (with and without SUBVL set), and some are scalars (where SUBVL will not apply), some clear rules need to be defined as to how the operations work.
12
13 Examples which can require SUBVL include cross product and may in future involve complex numbers.
14
15 ## CORDIC
16
17 * SUBVL=2, vd, vs; SUBVL ignored on beta.
18
19 * CORDIC.lin.rot vd, vs, beta
20 * CORDIC.cir.rot vd, vs, beta
21 * CORDIC.hyp.rot vd, vs, beta
22 * CORDIC.lin.vec vd, vs, beta
23 * CORDIC.cir.vec vd, vs, beta
24 * CORDIC.hyp.vec vd, vs, beta
25
26 CORDIC is an extremely general-purpose algorithm useful for a huge number
27 of diverse purposes. In its full form it does however require quite a
28 few parameters, one of which is a vector, making it awkward to include in
29 a standard "scalar" ISA. Additionally the coordinates can be set to circular,
30 linear or hyperbolic, producing three different modes, and the algorithm
31 may also be run in either "vector" mode or "rotation" mode. See [[discussion]]
32
33 CORDIC can also be used for performing DCT. See
34 <https://arxiv.org/abs/1606.02424>
35
36 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
37
38 int i = 0;
39 int iterations = 0; // Number of times to run the algorithm
40 float arctanTable[iterations]; // in Radians
41 float K = 0.6073; // K
42 float v_x,v_y; // Vector v; x and y components
43
44 for(i=0; i < iterations; i++) {
45 arctanTable[i] = atan(pow(2,-i));
46 }
47
48 float vnew_x; // To store the new value of x;
49 for(i = 0; i < iterations; i++) {
50 // If beta is negative, we need to do a counter-clockwise rotation:
51 if( beta < 0) {
52 vnew_x = v_x + (v_y*pow(2,-i));
53 v_y -= (v_x*pow(2,-i));
54 beta += arctanTable[i];
55 }
56 // If beta is positive, we need to do a clockwise rotation:
57 else {
58 vnew_x = v_x - (v_y*pow(2,-i));
59 v_y += (v_x*pow(2,-i));
60 beta -= arctanTable[i];
61 }
62 v_x = vnew_x;
63 }
64 v_x *= K;
65 v_y *= K;
66
67 Links:
68
69 * <http://www.myhdl.org/docs/examples/sinecomp/>
70
71 ## Vector cross product
72
73 SUBVL=2,3,4 all regs
74
75 * VCROSS vd, vs1, vs1
76
77 Result is the cross product of x and y, i.e., the resulting components are, in order:
78
79 x[1] * y[2] - y[1] * x[2]
80 x[2] * y[0] - y[2] * x[0]
81 x[0] * y[1] - y[0] * x[1]
82
83 All the operands must be vectors of 3 components of a floating-point type.
84
85 Pseudocode:
86
87 vec3 a, b; // elements in order a.x, a.y, a.z
88 // compute a cross b:
89 vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
90 vec3 t2 = b.zxy;
91 vec3 t3 = a.zxy;
92 vec3 t4 = b.yzx;
93 vec3 p = t3 * t4;
94 vec3 cross = t1 * t2 - p;
95
96 ## Vector dot product
97
98 * SUBVL ignored on rd. SUBVL=2,3,4 vs1,vs2
99 * rd=scalar, SUBVL=1, vs1, vs2=vec
100
101 * VDOT rd, vs1, vs2
102
103 Computes the dot product of two vectors. Internal accuracy must be
104 greater than the input vectors and the result.
105
106 Pseudocode in python:
107
108 from operator import mul
109 sum(map(mul, A, B))
110
111 Pseudocode in c:
112
113 double dot_product(float v[], float u[], int n)
114 {
115 double result = 0.0;
116 for (int i = 0; i < n; i++)
117 result += v[i] * u[i];
118 return result;
119 }
120
121 ## Vector length
122
123 * rd=scalar, vs1=vec (SUBVL=1)
124 * rd=scalar, vs1=vec (SUBVL=2,3,4) only 1 (predication rules apply)
125 * rd=vec, SUBVL ignored; vs1=vec, SUBVL=2,3,4
126 * rd=vec, SUBVL ignored; vs1=vec, SUBVL=1: reserved encoding.
127
128 * VLEN rd, vs1
129
130 The scalar length of a vector:
131
132 sqrt(x[0]^2 + x[1]^2 + ...).
133
134 ## Vector distance
135
136 * VDIST rd, vs1, vs2
137
138 The scalar distance between two vectors. Subtracts one vector from the
139 other and returns length:
140
141 length(v0 - v1)
142
143 ## Vector LERP
144
145 * VLERP rd, vs1, rs2 # SUBVL=2: vs1.v0 vs1.v1
146
147 Known as **fmix** in GLSL.
148
149 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
150
151 Pseudocode:
152
153 // Imprecise method, which does not guarantee v = v1 when t = 1,
154 // due to floating-point arithmetic error.
155 // This form may be used when the hardware has a native fused
156 // multiply-add instruction.
157 float lerp(float v0, float v1, float t) {
158 return v0 + t * (v1 - v0);
159 }
160
161 // Precise method, which guarantees v = v1 when t = 1.
162 float lerp(float v0, float v1, float t) {
163 return (1 - t) * v0 + t * v1;
164 }
165
166 ## Vector SLERP
167
168 * VSLERP vd, vs1, vs2, rs3
169
170 Not recommended as it is not commonly used and has several trigonometric
171 functions, although CORDIC in vector rotate circular mode is designed for this purpose. Also a costly 4 arg operation.
172
173 <https://en.m.wikipedia.org/wiki/Slerp>
174
175 Pseudocode:
176
177 Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
178 // Only unit quaternions are valid rotations.
179 // Normalize to avoid undefined behavior.
180 v0.normalize();
181 v1.normalize();
182
183 // Compute the cosine of the angle between the two vectors.
184 double dot = dot_product(v0, v1);
185
186 // If the dot product is negative, slerp won't take
187 // the shorter path. Note that v1 and -v1 are equivalent when
188 // the negation is applied to all four components. Fix by
189 // reversing one quaternion.
190 if (dot < 0.0f) {
191 v1 = -v1;
192 dot = -dot;
193 }
194
195 const double DOT_THRESHOLD = 0.9995;
196 if (dot > DOT_THRESHOLD) {
197 // If the inputs are too close for comfort, linearly interpolate
198 // and normalize the result.
199
200 Quaternion result = v0 + t*(v1 - v0);
201 result.normalize();
202 return result;
203 }
204
205 // Since dot is in range [0, DOT_THRESHOLD], acos is safe
206 double theta_0 = acos(dot); // theta_0 = angle between input vectors
207 double theta = theta_0*t; // theta = angle between v0 and result
208 double sin_theta = sin(theta); // compute this value only once
209 double sin_theta_0 = sin(theta_0); // compute this value only once
210
211 double s0 = cos(theta) - dot * sin_theta / sin_theta_0; // == sin(theta_0 - theta) / sin(theta_0)
212 double s1 = sin_theta / sin_theta_0;
213
214 return (s0 * v0) + (s1 * v1);
215 }
216
217 However this algorithm does not involve transcendentals except in
218 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
219
220 function v = cordic(beta,n)
221 % This function computes v = [cos(beta), sin(beta)] (beta in radians)
222 % using n iterations. Increasing n will increase the precision.
223
224 if beta < -pi/2 || beta > pi/2
225 if beta < 0
226 v = cordic(beta + pi, n);
227 else
228 v = cordic(beta - pi, n);
229 end
230 v = -v; % flip the sign for second or third quadrant
231 return
232 end
233
234 % Initialization of tables of constants used by CORDIC
235 % need a table of arctangents of negative powers of two, in radians:
236 % angles = atan(2.^-(0:27));
237 angles = [ ...
238 0.78539816339745 0.46364760900081
239 0.24497866312686 0.12435499454676 ...
240 0.06241880999596 0.03123983343027
241 0.01562372862048 0.00781234106010 ...
242 0.00390623013197 0.00195312251648
243 0.00097656218956 0.00048828121119 ...
244 0.00024414062015 0.00012207031189
245 0.00006103515617 0.00003051757812 ...
246 0.00001525878906 0.00000762939453
247 0.00000381469727 0.00000190734863 ...
248 0.00000095367432 0.00000047683716
249 0.00000023841858 0.00000011920929 ...
250 0.00000005960464 0.00000002980232
251 0.00000001490116 0.00000000745058 ];
252 % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
253 % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
254 Kvalues = [ ...
255 0.70710678118655 0.63245553203368
256 0.61357199107790 0.60883391251775 ...
257 0.60764825625617 0.60735177014130
258 0.60727764409353 0.60725911229889 ...
259 0.60725447933256 0.60725332108988
260 0.60725303152913 0.60725295913894 ...
261 0.60725294104140 0.60725293651701
262 0.60725293538591 0.60725293510314 ...
263 0.60725293503245 0.60725293501477
264 0.60725293501035 0.60725293500925 ...
265 0.60725293500897 0.60725293500890
266 0.60725293500889 0.60725293500888 ];
267 Kn = Kvalues(min(n, length(Kvalues)));
268
269 % Initialize loop variables:
270 v = [1;0]; % start with 2-vector cosine and sine of zero
271 poweroftwo = 1;
272 angle = angles(1);
273
274 % Iterations
275 for j = 0:n-1;
276 if beta < 0
277 sigma = -1;
278 else
279 sigma = 1;
280 end
281 factor = sigma * poweroftwo;
282 % Note the matrix multiplication can be done using scaling by
283 % powers of two and addition subtraction
284 R = [1, -factor; factor, 1];
285 v = R * v; % 2-by-2 matrix multiply
286 beta = beta - sigma * angle; % update the remaining angle
287 poweroftwo = poweroftwo / 2;
288 % update the angle from table, or eventually by just dividing by two
289 if j+2 > length(angles)
290 angle = angle / 2;
291 else
292 angle = angles(j+2);
293 end
294 end
295
296 % Adjust length of output vector to be [cos(beta), sin(beta)]:
297 v = v * Kn;
298 return
299
300 endfunction
301
302 2x2 matrix multiply can be done with shifts and adds:
303
304 x = v[0] - sigma * (v[1] * 2^(-j));
305 y = sigma * (v[0] * 2^(-j)) + v[1];
306 v = [x; y];
307
308 The technique is outlined in a paper as being applicable to 3D:
309 <https://www.atlantis-press.com/proceedings/jcis2006/232>
310
311 # Expensive 3-operand OP32 operations
312
313 3-operand operations are extremely expensive in terms of OP32 encoding space. A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd
314
315 Another is to overwrite one of the src registers.
316
317 # Opcode Table
318
319 TODO
320
321 # Links
322
323 * <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-September/002736.html>
324 * <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-September/002733.html>
325 * <http://bugs.libre-riscv.org/show_bug.cgi?id=142>