add standards tag
[libreriscv.git] / simple_v_extension / vector_ops.mdwn
1 [[!tag standards]]
2
3 # Vector Operations Extension to SV
4
5 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
6
7 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
8
9 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
10
11 Examples which can require SUBVL include cross product and may in future involve complex numbers.
12
13 ## CORDIC
14
15 CORDIC is an extremely general-purpose algorithm useful for a huge number
16 of diverse purposes. In its full form it does however require quite a
17 few parameters, one of which is a vector, making it awkward to include in
18 a standard "scalar" ISA. Additionally the coordinates can be set to circular,
19 linear or hyperbolic, producing three different modes, and the algorithm
20 may also be run in either "vector" mode or "rotation" mode. See [[discussion]]
21
22 CORDIC can also be used for performing DCT. See
23 <https://arxiv.org/abs/1606.02424>
24
25 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
26
27 int i = 0;
28 int iterations = 0; // Number of times to run the algorithm
29 float arctanTable[iterations]; // in Radians
30 float K = 0.6073; // K
31 float v_x,v_y; // Vector v; x and y components
32
33 for(i=0; i < iterations; i++) {
34 arctanTable[i] = atan(pow(2,-i));
35 }
36
37 float vnew_x; // To store the new value of x;
38 for(i = 0; i < iterations; i++) {
39 // If beta is negative, we need to do a counter-clockwise rotation:
40 if( beta < 0) {
41 vnew_x = v_x + (v_y*pow(2,-i));
42 v_y -= (v_x*pow(2,-i));
43 beta += arctanTable[i];
44 }
45 // If beta is positive, we need to do a clockwise rotation:
46 else {
47 vnew_x = v_x - (v_y*pow(2,-i));
48 v_y += (v_x*pow(2,-i));
49 beta -= arctanTable[i];
50 }
51 v_x = vnew_x;
52 }
53 v_x *= K;
54 v_y *= K;
55
56 Links:
57
58 * <http://www.myhdl.org/docs/examples/sinecomp/>
59
60 ## Vector cross product
61
62 Result is the cross product of x and y, i.e., the resulting components are, in order:
63
64 x[1] * y[2] - y[1] * x[2]
65 x[2] * y[0] - y[2] * x[0]
66 x[0] * y[1] - y[0] * x[1]
67
68 All the operands must be vectors of 3 components of a floating-point type.
69
70 Pseudocode:
71
72 vec3 a, b; // elements in order a.x, a.y, a.z
73 // compute a cross b:
74 vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
75 vec3 t2 = b.zxy;
76 vec3 t3 = a.zxy;
77 vec3 t4 = b.yzx;
78 vec3 p = t3 * t4;
79 vec3 cross = t1 * t2 - p;
80
81 ## Vector dot product
82
83 Computes the dot product of two vectors. Internal accuracy must be
84 greater than the input vectors and the result.
85
86 Pseudocode in python:
87
88 from operator import mul
89 sum(map(mul, A, B))
90
91 Pseudocode in c:
92
93 double dot_product(float v[], float u[], int n)
94 {
95 double result = 0.0;
96 for (int i = 0; i < n; i++)
97 result += v[i] * u[i];
98 return result;
99 }
100
101 ## Vector length
102
103 The scalar length of a vector:
104
105 sqrt(x[0]^2 + x[1]^2 + ...).
106
107 ## Vector distance
108
109 The scalar distance between two vectors. Subtracts one vector from the
110 other and returns length:
111
112 length(v0 - v1)
113
114 ## Vector LERP
115
116 Known as **fmix** in GLSL.
117
118 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
119
120 Pseudocode:
121
122 // Imprecise method, which does not guarantee v = v1 when t = 1,
123 // due to floating-point arithmetic error.
124 // This form may be used when the hardware has a native fused
125 // multiply-add instruction.
126 float lerp(float v0, float v1, float t) {
127 return v0 + t * (v1 - v0);
128 }
129
130 // Precise method, which guarantees v = v1 when t = 1.
131 float lerp(float v0, float v1, float t) {
132 return (1 - t) * v0 + t * v1;
133 }
134
135 ## Vector SLERP
136
137 Not recommended as it is not commonly used and has several trigonometric
138 functions.
139
140 <https://en.m.wikipedia.org/wiki/Slerp>
141
142 Pseudocode:
143
144 Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
145 // Only unit quaternions are valid rotations.
146 // Normalize to avoid undefined behavior.
147 v0.normalize();
148 v1.normalize();
149
150 // Compute the cosine of the angle between the two vectors.
151 double dot = dot_product(v0, v1);
152
153 // If the dot product is negative, slerp won't take
154 // the shorter path. Note that v1 and -v1 are equivalent when
155 // the negation is applied to all four components. Fix by
156 // reversing one quaternion.
157 if (dot < 0.0f) {
158 v1 = -v1;
159 dot = -dot;
160 }
161
162 const double DOT_THRESHOLD = 0.9995;
163 if (dot > DOT_THRESHOLD) {
164 // If the inputs are too close for comfort, linearly interpolate
165 // and normalize the result.
166
167 Quaternion result = v0 + t*(v1 - v0);
168 result.normalize();
169 return result;
170 }
171
172 // Since dot is in range [0, DOT_THRESHOLD], acos is safe
173 double theta_0 = acos(dot); // theta_0 = angle between input vectors
174 double theta = theta_0*t; // theta = angle between v0 and result
175 double sin_theta = sin(theta); // compute this value only once
176 double sin_theta_0 = sin(theta_0); // compute this value only once
177
178 double s0 = cos(theta) - dot * sin_theta / sin_theta_0; // == sin(theta_0 - theta) / sin(theta_0)
179 double s1 = sin_theta / sin_theta_0;
180
181 return (s0 * v0) + (s1 * v1);
182 }
183
184 However this algorithm does not involve transcendentals except in
185 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
186
187 function v = cordic(beta,n)
188 % This function computes v = [cos(beta), sin(beta)] (beta in radians)
189 % using n iterations. Increasing n will increase the precision.
190
191 if beta < -pi/2 || beta > pi/2
192 if beta < 0
193 v = cordic(beta + pi, n);
194 else
195 v = cordic(beta - pi, n);
196 end
197 v = -v; % flip the sign for second or third quadrant
198 return
199 end
200
201 % Initialization of tables of constants used by CORDIC
202 % need a table of arctangents of negative powers of two, in radians:
203 % angles = atan(2.^-(0:27));
204 angles = [ ...
205 0.78539816339745 0.46364760900081
206 0.24497866312686 0.12435499454676 ...
207 0.06241880999596 0.03123983343027
208 0.01562372862048 0.00781234106010 ...
209 0.00390623013197 0.00195312251648
210 0.00097656218956 0.00048828121119 ...
211 0.00024414062015 0.00012207031189
212 0.00006103515617 0.00003051757812 ...
213 0.00001525878906 0.00000762939453
214 0.00000381469727 0.00000190734863 ...
215 0.00000095367432 0.00000047683716
216 0.00000023841858 0.00000011920929 ...
217 0.00000005960464 0.00000002980232
218 0.00000001490116 0.00000000745058 ];
219 % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
220 % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
221 Kvalues = [ ...
222 0.70710678118655 0.63245553203368
223 0.61357199107790 0.60883391251775 ...
224 0.60764825625617 0.60735177014130
225 0.60727764409353 0.60725911229889 ...
226 0.60725447933256 0.60725332108988
227 0.60725303152913 0.60725295913894 ...
228 0.60725294104140 0.60725293651701
229 0.60725293538591 0.60725293510314 ...
230 0.60725293503245 0.60725293501477
231 0.60725293501035 0.60725293500925 ...
232 0.60725293500897 0.60725293500890
233 0.60725293500889 0.60725293500888 ];
234 Kn = Kvalues(min(n, length(Kvalues)));
235
236 % Initialize loop variables:
237 v = [1;0]; % start with 2-vector cosine and sine of zero
238 poweroftwo = 1;
239 angle = angles(1);
240
241 % Iterations
242 for j = 0:n-1;
243 if beta < 0
244 sigma = -1;
245 else
246 sigma = 1;
247 end
248 factor = sigma * poweroftwo;
249 % Note the matrix multiplication can be done using scaling by
250 % powers of two and addition subtraction
251 R = [1, -factor; factor, 1];
252 v = R * v; % 2-by-2 matrix multiply
253 beta = beta - sigma * angle; % update the remaining angle
254 poweroftwo = poweroftwo / 2;
255 % update the angle from table, or eventually by just dividing by two
256 if j+2 > length(angles)
257 angle = angle / 2;
258 else
259 angle = angles(j+2);
260 end
261 end
262
263 % Adjust length of output vector to be [cos(beta), sin(beta)]:
264 v = v * Kn;
265 return
266
267 endfunction
268
269 2x2 matrix multiply can be done with shifts and adds:
270
271 x = v[0] - sigma * (v[1] * 2^(-j));
272 y = sigma * (v[0] * 2^(-j)) + v[1];
273 v = [x; y];
274
275 The technique is outlined in a paper as being applicable to 3D:
276 <https://www.atlantis-press.com/proceedings/jcis2006/232>
277
278 # Expensive 3-operand OP32 operations
279
280 3-operand operations are extremely expensive in terms of OP32 encoding space. A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd