(no commit message)
[libreriscv.git] / simple_v_extension / vector_ops.mdwn
1 [[!tag standards]]
2
3 # Vector Operations Extension to SV
4
5 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
6
7 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
8
9 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
10
11 Examples which can require SUBVL include cross product and may in future involve complex numbers.
12
13 ## CORDIC
14
15 SUBVL=2
16
17 * CORDIC.lin.rot vd, vs, beta
18 * CORDIC.cir.rot vd, vs, beta
19 * CORDIC.hyp.rot vd, vs, beta
20 * CORDIC.lin.vec vd, vs, beta
21 * CORDIC.cir.vec vd, vs, beta
22 * CORDIC.hyp.vec vd, vs, beta
23
24 CORDIC is an extremely general-purpose algorithm useful for a huge number
25 of diverse purposes. In its full form it does however require quite a
26 few parameters, one of which is a vector, making it awkward to include in
27 a standard "scalar" ISA. Additionally the coordinates can be set to circular,
28 linear or hyperbolic, producing three different modes, and the algorithm
29 may also be run in either "vector" mode or "rotation" mode. See [[discussion]]
30
31 CORDIC can also be used for performing DCT. See
32 <https://arxiv.org/abs/1606.02424>
33
34 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
35
36 int i = 0;
37 int iterations = 0; // Number of times to run the algorithm
38 float arctanTable[iterations]; // in Radians
39 float K = 0.6073; // K
40 float v_x,v_y; // Vector v; x and y components
41
42 for(i=0; i < iterations; i++) {
43 arctanTable[i] = atan(pow(2,-i));
44 }
45
46 float vnew_x; // To store the new value of x;
47 for(i = 0; i < iterations; i++) {
48 // If beta is negative, we need to do a counter-clockwise rotation:
49 if( beta < 0) {
50 vnew_x = v_x + (v_y*pow(2,-i));
51 v_y -= (v_x*pow(2,-i));
52 beta += arctanTable[i];
53 }
54 // If beta is positive, we need to do a clockwise rotation:
55 else {
56 vnew_x = v_x - (v_y*pow(2,-i));
57 v_y += (v_x*pow(2,-i));
58 beta -= arctanTable[i];
59 }
60 v_x = vnew_x;
61 }
62 v_x *= K;
63 v_y *= K;
64
65 Links:
66
67 * <http://www.myhdl.org/docs/examples/sinecomp/>
68
69 ## Vector cross product
70
71 * VCROSS vd, vs1, vs1
72
73 Result is the cross product of x and y, i.e., the resulting components are, in order:
74
75 x[1] * y[2] - y[1] * x[2]
76 x[2] * y[0] - y[2] * x[0]
77 x[0] * y[1] - y[0] * x[1]
78
79 All the operands must be vectors of 3 components of a floating-point type.
80
81 Pseudocode:
82
83 vec3 a, b; // elements in order a.x, a.y, a.z
84 // compute a cross b:
85 vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
86 vec3 t2 = b.zxy;
87 vec3 t3 = a.zxy;
88 vec3 t4 = b.yzx;
89 vec3 p = t3 * t4;
90 vec3 cross = t1 * t2 - p;
91
92 ## Vector dot product
93
94 * VDOT rd, vs1, vs2
95
96 Computes the dot product of two vectors. Internal accuracy must be
97 greater than the input vectors and the result.
98
99 Pseudocode in python:
100
101 from operator import mul
102 sum(map(mul, A, B))
103
104 Pseudocode in c:
105
106 double dot_product(float v[], float u[], int n)
107 {
108 double result = 0.0;
109 for (int i = 0; i < n; i++)
110 result += v[i] * u[i];
111 return result;
112 }
113
114 ## Vector length
115
116 * VLEN rd, vs1
117
118 The scalar length of a vector:
119
120 sqrt(x[0]^2 + x[1]^2 + ...).
121
122 ## Vector distance
123
124 * VDIST rd, vs1, vs2
125
126 The scalar distance between two vectors. Subtracts one vector from the
127 other and returns length:
128
129 length(v0 - v1)
130
131 ## Vector LERP
132
133 * VLERP rd, vs1, rs2 # SUBVL=2: vs1.v0 vs1.v1
134
135 Known as **fmix** in GLSL.
136
137 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
138
139 Pseudocode:
140
141 // Imprecise method, which does not guarantee v = v1 when t = 1,
142 // due to floating-point arithmetic error.
143 // This form may be used when the hardware has a native fused
144 // multiply-add instruction.
145 float lerp(float v0, float v1, float t) {
146 return v0 + t * (v1 - v0);
147 }
148
149 // Precise method, which guarantees v = v1 when t = 1.
150 float lerp(float v0, float v1, float t) {
151 return (1 - t) * v0 + t * v1;
152 }
153
154 ## Vector SLERP
155
156 * VSLERP vd, vs1, vs2, rs3
157
158 Not recommended as it is not commonly used and has several trigonometric
159 functions. Also a costly 4 arg operation.
160
161 <https://en.m.wikipedia.org/wiki/Slerp>
162
163 Pseudocode:
164
165 Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
166 // Only unit quaternions are valid rotations.
167 // Normalize to avoid undefined behavior.
168 v0.normalize();
169 v1.normalize();
170
171 // Compute the cosine of the angle between the two vectors.
172 double dot = dot_product(v0, v1);
173
174 // If the dot product is negative, slerp won't take
175 // the shorter path. Note that v1 and -v1 are equivalent when
176 // the negation is applied to all four components. Fix by
177 // reversing one quaternion.
178 if (dot < 0.0f) {
179 v1 = -v1;
180 dot = -dot;
181 }
182
183 const double DOT_THRESHOLD = 0.9995;
184 if (dot > DOT_THRESHOLD) {
185 // If the inputs are too close for comfort, linearly interpolate
186 // and normalize the result.
187
188 Quaternion result = v0 + t*(v1 - v0);
189 result.normalize();
190 return result;
191 }
192
193 // Since dot is in range [0, DOT_THRESHOLD], acos is safe
194 double theta_0 = acos(dot); // theta_0 = angle between input vectors
195 double theta = theta_0*t; // theta = angle between v0 and result
196 double sin_theta = sin(theta); // compute this value only once
197 double sin_theta_0 = sin(theta_0); // compute this value only once
198
199 double s0 = cos(theta) - dot * sin_theta / sin_theta_0; // == sin(theta_0 - theta) / sin(theta_0)
200 double s1 = sin_theta / sin_theta_0;
201
202 return (s0 * v0) + (s1 * v1);
203 }
204
205 However this algorithm does not involve transcendentals except in
206 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
207
208 function v = cordic(beta,n)
209 % This function computes v = [cos(beta), sin(beta)] (beta in radians)
210 % using n iterations. Increasing n will increase the precision.
211
212 if beta < -pi/2 || beta > pi/2
213 if beta < 0
214 v = cordic(beta + pi, n);
215 else
216 v = cordic(beta - pi, n);
217 end
218 v = -v; % flip the sign for second or third quadrant
219 return
220 end
221
222 % Initialization of tables of constants used by CORDIC
223 % need a table of arctangents of negative powers of two, in radians:
224 % angles = atan(2.^-(0:27));
225 angles = [ ...
226 0.78539816339745 0.46364760900081
227 0.24497866312686 0.12435499454676 ...
228 0.06241880999596 0.03123983343027
229 0.01562372862048 0.00781234106010 ...
230 0.00390623013197 0.00195312251648
231 0.00097656218956 0.00048828121119 ...
232 0.00024414062015 0.00012207031189
233 0.00006103515617 0.00003051757812 ...
234 0.00001525878906 0.00000762939453
235 0.00000381469727 0.00000190734863 ...
236 0.00000095367432 0.00000047683716
237 0.00000023841858 0.00000011920929 ...
238 0.00000005960464 0.00000002980232
239 0.00000001490116 0.00000000745058 ];
240 % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
241 % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
242 Kvalues = [ ...
243 0.70710678118655 0.63245553203368
244 0.61357199107790 0.60883391251775 ...
245 0.60764825625617 0.60735177014130
246 0.60727764409353 0.60725911229889 ...
247 0.60725447933256 0.60725332108988
248 0.60725303152913 0.60725295913894 ...
249 0.60725294104140 0.60725293651701
250 0.60725293538591 0.60725293510314 ...
251 0.60725293503245 0.60725293501477
252 0.60725293501035 0.60725293500925 ...
253 0.60725293500897 0.60725293500890
254 0.60725293500889 0.60725293500888 ];
255 Kn = Kvalues(min(n, length(Kvalues)));
256
257 % Initialize loop variables:
258 v = [1;0]; % start with 2-vector cosine and sine of zero
259 poweroftwo = 1;
260 angle = angles(1);
261
262 % Iterations
263 for j = 0:n-1;
264 if beta < 0
265 sigma = -1;
266 else
267 sigma = 1;
268 end
269 factor = sigma * poweroftwo;
270 % Note the matrix multiplication can be done using scaling by
271 % powers of two and addition subtraction
272 R = [1, -factor; factor, 1];
273 v = R * v; % 2-by-2 matrix multiply
274 beta = beta - sigma * angle; % update the remaining angle
275 poweroftwo = poweroftwo / 2;
276 % update the angle from table, or eventually by just dividing by two
277 if j+2 > length(angles)
278 angle = angle / 2;
279 else
280 angle = angles(j+2);
281 end
282 end
283
284 % Adjust length of output vector to be [cos(beta), sin(beta)]:
285 v = v * Kn;
286 return
287
288 endfunction
289
290 2x2 matrix multiply can be done with shifts and adds:
291
292 x = v[0] - sigma * (v[1] * 2^(-j));
293 y = sigma * (v[0] * 2^(-j)) + v[1];
294 v = [x; y];
295
296 The technique is outlined in a paper as being applicable to 3D:
297 <https://www.atlantis-press.com/proceedings/jcis2006/232>
298
299 # Expensive 3-operand OP32 operations
300
301 3-operand operations are extremely expensive in terms of OP32 encoding space. A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd
302
303 Another is to overwrite one of the src registers.
304
305 # Opcode Table
306
307 TODO