add vector CORDIC
[libreriscv.git] / simple_v_extension / vector_ops.mdwn
1 # Vector Operations Extension to SV
2
3 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
4
5 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
6
7 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
8
9 Examples which can require SUBVL include cross product and may in future involve complex numbers.
10
11 ## CORDIC
12
13 CORDIC is an extremely general-purpose algorithm useful for a huge number
14 of diverse purposes. In its full form it does however require quite a
15 few parameters, one of which is a vector, making it awkward to include in
16 a standard "scalar" ISA. Additionally the coordinates can be set to circular,
17 linear or hyperbolic, producing three different modes, and the algorithm
18 may also be run in either "vector" mode or "rotation" mode. See [[discussion]]
19
20 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
21
22 int i = 0;
23 int iterations = 0; // Number of times to run the algorithm
24 float arctanTable[iterations]; // in Radians
25 float K = 0.6073; // K
26 float v_x,v_y; // Vector v; x and y components
27
28 for(i=0; i < iterations; i++) {
29 arctanTable[i] = atan(pow(2,-i));
30 }
31
32 float vnew_x; // To store the new value of x;
33 for(i = 0; i < iterations; i++) {
34 // If beta is negative, we need to do a counter-clockwise rotation:
35 if( beta < 0) {
36 vnew_x = v_x + (v_y*pow(2,-i));
37 v_y -= (v_x*pow(2,-i));
38 beta += arctanTable[i];
39 }
40 // If beta is positive, we need to do a clockwise rotation:
41 else {
42 vnew_x = v_x - (v_y*pow(2,-i));
43 v_y += (v_x*pow(2,-i));
44 beta -= arctanTable[i];
45 }
46 v_x = vnew_x;
47 }
48 v_x *= K;
49 v_y *= K;
50
51 Links:
52
53 * <http://www.myhdl.org/docs/examples/sinecomp/>
54
55 ## Vector cross product
56
57 Result is the cross product of x and y, i.e., the resulting components are, in order:
58
59 x[1] * y[2] - y[1] * x[2]
60 x[2] * y[0] - y[2] * x[0]
61 x[0] * y[1] - y[0] * x[1]
62
63 All the operands must be vectors of 3 components of a floating-point type.
64
65 Pseudocode:
66
67 vec3 a, b; // elements in order a.x, a.y, a.z
68 // compute a cross b:
69 vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
70 vec3 t2 = b.zxy;
71 vec3 t3 = a.zxy;
72 vec3 t4 = b.yzx;
73 vec3 p = t3 * t4;
74 vec3 cross = t1 * t2 - p;
75
76 ## Vector dot product
77
78 Computes the dot product of two vectors. Internal accuracy must be
79 greater than the input vectors and the result.
80
81 Pseudocode in python:
82
83 from operator import mul
84 sum(map(mul, A, B))
85
86 Pseudocode in c:
87
88 double dot_product(float v[], float u[], int n)
89 {
90 double result = 0.0;
91 for (int i = 0; i < n; i++)
92 result += v[i] * u[i];
93 return result;
94 }
95
96 ## Vector length
97
98 The scalar length of a vector:
99
100 sqrt(x[0]^2 + x[1]^2 + ...).
101
102 ## Vector distance
103
104 The scalar distance between two vectors. Subtracts one vector from the
105 other and returns length:
106
107 length(v0 - v1)
108
109 ## Vector LERP
110
111 Known as **fmix** in GLSL.
112
113 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
114
115 Pseudocode:
116
117 // Imprecise method, which does not guarantee v = v1 when t = 1,
118 // due to floating-point arithmetic error.
119 // This form may be used when the hardware has a native fused
120 // multiply-add instruction.
121 float lerp(float v0, float v1, float t) {
122 return v0 + t * (v1 - v0);
123 }
124
125 // Precise method, which guarantees v = v1 when t = 1.
126 float lerp(float v0, float v1, float t) {
127 return (1 - t) * v0 + t * v1;
128 }
129
130 ## Vector SLERP
131
132 Not recommended as it is not commonly used and has several trigonometric
133 functions.
134
135 <https://en.m.wikipedia.org/wiki/Slerp>
136
137 Pseudocode:
138
139 Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
140 // Only unit quaternions are valid rotations.
141 // Normalize to avoid undefined behavior.
142 v0.normalize();
143 v1.normalize();
144
145 // Compute the cosine of the angle between the two vectors.
146 double dot = dot_product(v0, v1);
147
148 // If the dot product is negative, slerp won't take
149 // the shorter path. Note that v1 and -v1 are equivalent when
150 // the negation is applied to all four components. Fix by
151 // reversing one quaternion.
152 if (dot < 0.0f) {
153 v1 = -v1;
154 dot = -dot;
155 }
156
157 const double DOT_THRESHOLD = 0.9995;
158 if (dot > DOT_THRESHOLD) {
159 // If the inputs are too close for comfort, linearly interpolate
160 // and normalize the result.
161
162 Quaternion result = v0 + t*(v1 - v0);
163 result.normalize();
164 return result;
165 }
166
167 // Since dot is in range [0, DOT_THRESHOLD], acos is safe
168 double theta_0 = acos(dot); // theta_0 = angle between input vectors
169 double theta = theta_0*t; // theta = angle between v0 and result
170 double sin_theta = sin(theta); // compute this value only once
171 double sin_theta_0 = sin(theta_0); // compute this value only once
172
173 double s0 = cos(theta) - dot * sin_theta / sin_theta_0; // == sin(theta_0 - theta) / sin(theta_0)
174 double s1 = sin_theta / sin_theta_0;
175
176 return (s0 * v0) + (s1 * v1);
177 }
178
179 However this algorithm does not involve transcendentals except in
180 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
181
182 function v = cordic(beta,n)
183 % This function computes v = [cos(beta), sin(beta)] (beta in radians)
184 % using n iterations. Increasing n will increase the precision.
185
186 if beta < -pi/2 || beta > pi/2
187 if beta < 0
188 v = cordic(beta + pi, n);
189 else
190 v = cordic(beta - pi, n);
191 end
192 v = -v; % flip the sign for second or third quadrant
193 return
194 end
195
196 % Initialization of tables of constants used by CORDIC
197 % need a table of arctangents of negative powers of two, in radians:
198 % angles = atan(2.^-(0:27));
199 angles = [ ...
200 0.78539816339745 0.46364760900081
201 0.24497866312686 0.12435499454676 ...
202 0.06241880999596 0.03123983343027
203 0.01562372862048 0.00781234106010 ...
204 0.00390623013197 0.00195312251648
205 0.00097656218956 0.00048828121119 ...
206 0.00024414062015 0.00012207031189
207 0.00006103515617 0.00003051757812 ...
208 0.00001525878906 0.00000762939453
209 0.00000381469727 0.00000190734863 ...
210 0.00000095367432 0.00000047683716
211 0.00000023841858 0.00000011920929 ...
212 0.00000005960464 0.00000002980232
213 0.00000001490116 0.00000000745058 ];
214 % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
215 % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
216 Kvalues = [ ...
217 0.70710678118655 0.63245553203368
218 0.61357199107790 0.60883391251775 ...
219 0.60764825625617 0.60735177014130
220 0.60727764409353 0.60725911229889 ...
221 0.60725447933256 0.60725332108988
222 0.60725303152913 0.60725295913894 ...
223 0.60725294104140 0.60725293651701
224 0.60725293538591 0.60725293510314 ...
225 0.60725293503245 0.60725293501477
226 0.60725293501035 0.60725293500925 ...
227 0.60725293500897 0.60725293500890
228 0.60725293500889 0.60725293500888 ];
229 Kn = Kvalues(min(n, length(Kvalues)));
230
231 % Initialize loop variables:
232 v = [1;0]; % start with 2-vector cosine and sine of zero
233 poweroftwo = 1;
234 angle = angles(1);
235
236 % Iterations
237 for j = 0:n-1;
238 if beta < 0
239 sigma = -1;
240 else
241 sigma = 1;
242 end
243 factor = sigma * poweroftwo;
244 % Note the matrix multiplication can be done using scaling by
245 % powers of two and addition subtraction
246 R = [1, -factor; factor, 1];
247 v = R * v; % 2-by-2 matrix multiply
248 beta = beta - sigma * angle; % update the remaining angle
249 poweroftwo = poweroftwo / 2;
250 % update the angle from table, or eventually by just dividing by two
251 if j+2 > length(angles)
252 angle = angle / 2;
253 else
254 angle = angles(j+2);
255 end
256 end
257
258 % Adjust length of output vector to be [cos(beta), sin(beta)]:
259 v = v * Kn;
260 return
261
262 endfunction
263
264 2x2 matrix multiply can be done with shifts and adds:
265
266 x = v[0] - sigma * (v[1] * 2^(-j));
267 y = sigma * (v[0] * 2^(-j)) + v[1];
268 v = [x; y];
269
270 The technique is outlined in a paper as being applicable to 3D:
271 <https://www.atlantis-press.com/proceedings/jcis2006/232>
272
273 # Expensive 3-operand OP32 operations
274
275 3-operand operations are extremely expensive in terms of OP32 encoding space. A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd