simple_v_extension/vector_ops.mdwn

   1 # Vector Operations Extension to SV
   2
   3 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
   4
   5 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
   6
   7 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
   8
   9 Examples which can require SUBVL include cross product and may in future involve complex numbers.
  10
  11 ## CORDIC
  12
  13 CORDIC is an extremely general-purpose algorithm useful for a huge number
  14 of diverse purposes.  In its full form it does however require quite a
  15 few parameters, one of which is a vector, making it awkward to include in
  16 a standard "scalar" ISA.  Additionally the coordinates can be set to circular,
  17 linear or hyperbolic, producing three different modes, and the algorithm
  18 may also be run in either "vector" mode or "rotation" mode.  See [[discussion]]
  19
  20 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
  21
  22      int i = 0;
  23      int iterations = 0; // Number of times to run the algorithm
  24      float arctanTable[iterations]; // in Radians
  25      float K = 0.6073; // K
  26      float v_x,v_y; // Vector v; x and y components
  27
  28      for(i=0; i < iterations; i++) {
  29         arctanTable[i] = atan(pow(2,-i));
  30      }
  31
  32      float vnew_x;   // To store the new value of x;
  33      for(i = 0; i < iterations; i++) {
  34          // If beta is negative, we need to do a counter-clockwise rotation:
  35          if( beta < 0) {
  36             vnew_x = v_x + (v_y*pow(2,-i));
  37             v_y -= (v_x*pow(2,-i));
  38             beta += arctanTable[i];
  39          }
  40          // If beta is positive, we need to do a clockwise rotation:
  41          else {
  42             vnew_x = v_x - (v_y*pow(2,-i));
  43             v_y += (v_x*pow(2,-i));
  44             beta -= arctanTable[i];
  45          }
  46          v_x = vnew_x;
  47      }
  48      v_x *= K;
  49      v_y *= K;
  50
  51 Links:
  52
  53 * <http://www.myhdl.org/docs/examples/sinecomp/>
  54
  55 ## Vector cross product
  56
  57 Result is the cross product of x and y, i.e., the resulting components are, in order:
  58
  59     x[1] * y[2] - y[1] * x[2]
  60     x[2] * y[0] - y[2] * x[0]
  61     x[0] * y[1] - y[0] * x[1]
  62
  63 All the operands must be vectors of 3 components of a floating-point type.
  64
  65 Pseudocode:
  66
  67     vec3 a, b; // elements in order a.x, a.y, a.z
  68     // compute a cross b:
  69     vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
  70     vec3 t2 = b.zxy;
  71     vec3 t3 = a.zxy;
  72     vec3 t4 = b.yzx;
  73     vec3 p = t3 * t4;
  74     vec3 cross = t1 * t2 - p;
  75
  76 ## Vector dot product
  77
  78 Computes the dot product of two vectors. Internal accuracy must be
  79 greater than the input vectors and the result.
  80
  81 Pseudocode in python:
  82
  83     from operator import mul
  84     sum(map(mul, A, B))
  85
  86 Pseudocode in c:
  87
  88     double dot_product(float v[], float u[], int n)
  89     {
  90         double result = 0.0;
  91         for (int i = 0; i < n; i++)
  92             result += v[i] * u[i];
  93         return result;
  94     }
  95
  96 ## Vector length
  97
  98 The scalar length of a vector:
  99
 100     sqrt(x[0]^2 + x[1]^2 + ...).
 101
 102 ## Vector distance
 103
 104 The scalar distance between two vectors. Subtracts one vector from the
 105 other and returns length:
 106
 107     length(v0 - v1)
 108
 109 ## Vector LERP
 110
 111 Known as **fmix** in GLSL.
 112
 113 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
 114
 115 Pseudocode:
 116
 117     // Imprecise method, which does not guarantee v = v1 when t = 1,
 118     // due to floating-point arithmetic error.
 119     // This form may be used when the hardware has a native fused
 120     // multiply-add instruction.
 121     float lerp(float v0, float v1, float t) {
 122       return v0 + t * (v1 - v0);
 123     }
 124
 125     // Precise method, which guarantees v = v1 when t = 1.
 126     float lerp(float v0, float v1, float t) {
 127       return (1 - t) * v0 + t * v1;
 128     }
 129
 130 ## Vector SLERP
 131
 132 Not recommended as it is not commonly used and has several trigonometric
 133 functions.
 134
 135 <https://en.m.wikipedia.org/wiki/Slerp>
 136
 137 Pseudocode:
 138
 139     Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
 140         // Only unit quaternions are valid rotations.
 141         // Normalize to avoid undefined behavior.
 142         v0.normalize();
 143         v1.normalize();
 144
 145         // Compute the cosine of the angle between the two vectors.
 146         double dot = dot_product(v0, v1);
 147
 148         // If the dot product is negative, slerp won't take
 149         // the shorter path. Note that v1 and -v1 are equivalent when
 150         // the negation is applied to all four components. Fix by
 151         // reversing one quaternion.
 152         if (dot < 0.0f) {
 153             v1 = -v1;
 154             dot = -dot;
 155         }
 156
 157         const double DOT_THRESHOLD = 0.9995;
 158         if (dot > DOT_THRESHOLD) {
 159             // If the inputs are too close for comfort, linearly interpolate
 160             // and normalize the result.
 161
 162             Quaternion result = v0 + t*(v1 - v0);
 163             result.normalize();
 164             return result;
 165         }
 166
 167         // Since dot is in range [0, DOT_THRESHOLD], acos is safe
 168         double theta_0 = acos(dot);        // theta_0 = angle between input vectors
 169         double theta = theta_0*t;          // theta = angle between v0 and result
 170         double sin_theta = sin(theta);     // compute this value only once
 171         double sin_theta_0 = sin(theta_0); // compute this value only once
 172
 173         double s0 = cos(theta) - dot * sin_theta / sin_theta_0;  // == sin(theta_0 - theta) / sin(theta_0)
 174         double s1 = sin_theta / sin_theta_0;
 175
 176         return (s0 * v0) + (s1 * v1);
 177     }
 178
 179 However this algorithm does not involve transcendentals except in
 180 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
 181
 182     function v = cordic(beta,n)
 183         % This function computes v = [cos(beta), sin(beta)] (beta in radians)
 184         % using n iterations. Increasing n will increase the precision.
 185
 186         if beta < -pi/2 || beta > pi/2
 187             if beta < 0
 188                 v = cordic(beta + pi, n);
 189             else
 190                 v = cordic(beta - pi, n);
 191             end
 192             v = -v; % flip the sign for second or third quadrant
 193             return
 194         end
 195
 196         % Initialization of tables of constants used by CORDIC
 197         % need a table of arctangents of negative powers of two, in radians:
 198         % angles = atan(2.^-(0:27));
 199         angles =  [  ...
 200             0.78539816339745   0.46364760900081
 201             0.24497866312686   0.12435499454676 ...
 202             0.06241880999596   0.03123983343027
 203             0.01562372862048   0.00781234106010 ...
 204             0.00390623013197   0.00195312251648
 205             0.00097656218956   0.00048828121119 ...
 206             0.00024414062015   0.00012207031189
 207             0.00006103515617   0.00003051757812 ...
 208             0.00001525878906   0.00000762939453
 209             0.00000381469727   0.00000190734863 ...
 210             0.00000095367432   0.00000047683716
 211             0.00000023841858   0.00000011920929 ...
 212             0.00000005960464   0.00000002980232
 213             0.00000001490116   0.00000000745058 ];
 214         % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
 215         % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
 216         Kvalues = [ ...
 217             0.70710678118655   0.63245553203368
 218             0.61357199107790   0.60883391251775 ...
 219             0.60764825625617   0.60735177014130
 220             0.60727764409353   0.60725911229889 ...
 221             0.60725447933256   0.60725332108988
 222             0.60725303152913   0.60725295913894 ...
 223             0.60725294104140   0.60725293651701
 224             0.60725293538591   0.60725293510314 ...
 225             0.60725293503245   0.60725293501477
 226             0.60725293501035   0.60725293500925 ...
 227             0.60725293500897   0.60725293500890
 228             0.60725293500889   0.60725293500888 ];
 229         Kn = Kvalues(min(n, length(Kvalues)));
 230
 231         % Initialize loop variables:
 232         v = [1;0]; % start with 2-vector cosine and sine of zero
 233         poweroftwo = 1;
 234         angle = angles(1);
 235
 236         % Iterations
 237         for j = 0:n-1;
 238             if beta < 0
 239                 sigma = -1;
 240             else
 241                 sigma = 1;
 242             end
 243             factor = sigma * poweroftwo;
 244             % Note the matrix multiplication can be done using scaling by
 245             % powers of two and addition subtraction
 246             R = [1, -factor; factor, 1];
 247             v = R * v; % 2-by-2 matrix multiply
 248             beta = beta - sigma * angle; % update the remaining angle
 249             poweroftwo = poweroftwo / 2;
 250             % update the angle from table, or eventually by just dividing by two
 251             if j+2 > length(angles)
 252                 angle = angle / 2;
 253             else
 254                 angle = angles(j+2);
 255             end
 256         end
 257
 258         % Adjust length of output vector to be [cos(beta), sin(beta)]:
 259         v = v * Kn;
 260         return
 261
 262     endfunction
 263
 264 2x2 matrix multiply can be done with shifts and adds:
 265
 266     x = v[0] - sigma * (v[1] * 2^(-j));
 267     y = sigma * (v[0] * 2^(-j)) + v[1];
 268     v = [x; y];
 269
 270 The technique is outlined in a paper as being applicable to 3D:
 271 <https://www.atlantis-press.com/proceedings/jcis2006/232>
 272
 273 # Expensive 3-operand OP32 operations
 274
 275 3-operand operations are extremely expensive in terms of OP32 encoding space.  A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd