simple_v_extension/vector_ops.mdwn

   1 [[!tag standards]]
   2
   3 # Vector Operations Extension to SV
   4
   5 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
   6
   7 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
   8
   9 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
  10
  11 Examples which can require SUBVL include cross product and may in future involve complex numbers.
  12
  13 ## CORDIC
  14
  15 CORDIC is an extremely general-purpose algorithm useful for a huge number
  16 of diverse purposes.  In its full form it does however require quite a
  17 few parameters, one of which is a vector, making it awkward to include in
  18 a standard "scalar" ISA.  Additionally the coordinates can be set to circular,
  19 linear or hyperbolic, producing three different modes, and the algorithm
  20 may also be run in either "vector" mode or "rotation" mode.  See [[discussion]]
  21
  22 CORDIC can also be used for performing DCT.  See
  23 <https://arxiv.org/abs/1606.02424>
  24
  25 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
  26
  27      int i = 0;
  28      int iterations = 0; // Number of times to run the algorithm
  29      float arctanTable[iterations]; // in Radians
  30      float K = 0.6073; // K
  31      float v_x,v_y; // Vector v; x and y components
  32
  33      for(i=0; i < iterations; i++) {
  34         arctanTable[i] = atan(pow(2,-i));
  35      }
  36
  37      float vnew_x;   // To store the new value of x;
  38      for(i = 0; i < iterations; i++) {
  39          // If beta is negative, we need to do a counter-clockwise rotation:
  40          if( beta < 0) {
  41             vnew_x = v_x + (v_y*pow(2,-i));
  42             v_y -= (v_x*pow(2,-i));
  43             beta += arctanTable[i];
  44          }
  45          // If beta is positive, we need to do a clockwise rotation:
  46          else {
  47             vnew_x = v_x - (v_y*pow(2,-i));
  48             v_y += (v_x*pow(2,-i));
  49             beta -= arctanTable[i];
  50          }
  51          v_x = vnew_x;
  52      }
  53      v_x *= K;
  54      v_y *= K;
  55
  56 Links:
  57
  58 * <http://www.myhdl.org/docs/examples/sinecomp/>
  59
  60 ## Vector cross product
  61
  62 Result is the cross product of x and y, i.e., the resulting components are, in order:
  63
  64     x[1] * y[2] - y[1] * x[2]
  65     x[2] * y[0] - y[2] * x[0]
  66     x[0] * y[1] - y[0] * x[1]
  67
  68 All the operands must be vectors of 3 components of a floating-point type.
  69
  70 Pseudocode:
  71
  72     vec3 a, b; // elements in order a.x, a.y, a.z
  73     // compute a cross b:
  74     vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
  75     vec3 t2 = b.zxy;
  76     vec3 t3 = a.zxy;
  77     vec3 t4 = b.yzx;
  78     vec3 p = t3 * t4;
  79     vec3 cross = t1 * t2 - p;
  80
  81 ## Vector dot product
  82
  83 Computes the dot product of two vectors. Internal accuracy must be
  84 greater than the input vectors and the result.
  85
  86 Pseudocode in python:
  87
  88     from operator import mul
  89     sum(map(mul, A, B))
  90
  91 Pseudocode in c:
  92
  93     double dot_product(float v[], float u[], int n)
  94     {
  95         double result = 0.0;
  96         for (int i = 0; i < n; i++)
  97             result += v[i] * u[i];
  98         return result;
  99     }
 100
 101 ## Vector length
 102
 103 The scalar length of a vector:
 104
 105     sqrt(x[0]^2 + x[1]^2 + ...).
 106
 107 ## Vector distance
 108
 109 The scalar distance between two vectors. Subtracts one vector from the
 110 other and returns length:
 111
 112     length(v0 - v1)
 113
 114 ## Vector LERP
 115
 116 Known as **fmix** in GLSL.
 117
 118 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
 119
 120 Pseudocode:
 121
 122     // Imprecise method, which does not guarantee v = v1 when t = 1,
 123     // due to floating-point arithmetic error.
 124     // This form may be used when the hardware has a native fused
 125     // multiply-add instruction.
 126     float lerp(float v0, float v1, float t) {
 127       return v0 + t * (v1 - v0);
 128     }
 129
 130     // Precise method, which guarantees v = v1 when t = 1.
 131     float lerp(float v0, float v1, float t) {
 132       return (1 - t) * v0 + t * v1;
 133     }
 134
 135 ## Vector SLERP
 136
 137 Not recommended as it is not commonly used and has several trigonometric
 138 functions.
 139
 140 <https://en.m.wikipedia.org/wiki/Slerp>
 141
 142 Pseudocode:
 143
 144     Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
 145         // Only unit quaternions are valid rotations.
 146         // Normalize to avoid undefined behavior.
 147         v0.normalize();
 148         v1.normalize();
 149
 150         // Compute the cosine of the angle between the two vectors.
 151         double dot = dot_product(v0, v1);
 152
 153         // If the dot product is negative, slerp won't take
 154         // the shorter path. Note that v1 and -v1 are equivalent when
 155         // the negation is applied to all four components. Fix by
 156         // reversing one quaternion.
 157         if (dot < 0.0f) {
 158             v1 = -v1;
 159             dot = -dot;
 160         }
 161
 162         const double DOT_THRESHOLD = 0.9995;
 163         if (dot > DOT_THRESHOLD) {
 164             // If the inputs are too close for comfort, linearly interpolate
 165             // and normalize the result.
 166
 167             Quaternion result = v0 + t*(v1 - v0);
 168             result.normalize();
 169             return result;
 170         }
 171
 172         // Since dot is in range [0, DOT_THRESHOLD], acos is safe
 173         double theta_0 = acos(dot);        // theta_0 = angle between input vectors
 174         double theta = theta_0*t;          // theta = angle between v0 and result
 175         double sin_theta = sin(theta);     // compute this value only once
 176         double sin_theta_0 = sin(theta_0); // compute this value only once
 177
 178         double s0 = cos(theta) - dot * sin_theta / sin_theta_0;  // == sin(theta_0 - theta) / sin(theta_0)
 179         double s1 = sin_theta / sin_theta_0;
 180
 181         return (s0 * v0) + (s1 * v1);
 182     }
 183
 184 However this algorithm does not involve transcendentals except in
 185 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
 186
 187     function v = cordic(beta,n)
 188         % This function computes v = [cos(beta), sin(beta)] (beta in radians)
 189         % using n iterations. Increasing n will increase the precision.
 190
 191         if beta < -pi/2 || beta > pi/2
 192             if beta < 0
 193                 v = cordic(beta + pi, n);
 194             else
 195                 v = cordic(beta - pi, n);
 196             end
 197             v = -v; % flip the sign for second or third quadrant
 198             return
 199         end
 200
 201         % Initialization of tables of constants used by CORDIC
 202         % need a table of arctangents of negative powers of two, in radians:
 203         % angles = atan(2.^-(0:27));
 204         angles =  [  ...
 205             0.78539816339745   0.46364760900081
 206             0.24497866312686   0.12435499454676 ...
 207             0.06241880999596   0.03123983343027
 208             0.01562372862048   0.00781234106010 ...
 209             0.00390623013197   0.00195312251648
 210             0.00097656218956   0.00048828121119 ...
 211             0.00024414062015   0.00012207031189
 212             0.00006103515617   0.00003051757812 ...
 213             0.00001525878906   0.00000762939453
 214             0.00000381469727   0.00000190734863 ...
 215             0.00000095367432   0.00000047683716
 216             0.00000023841858   0.00000011920929 ...
 217             0.00000005960464   0.00000002980232
 218             0.00000001490116   0.00000000745058 ];
 219         % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
 220         % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
 221         Kvalues = [ ...
 222             0.70710678118655   0.63245553203368
 223             0.61357199107790   0.60883391251775 ...
 224             0.60764825625617   0.60735177014130
 225             0.60727764409353   0.60725911229889 ...
 226             0.60725447933256   0.60725332108988
 227             0.60725303152913   0.60725295913894 ...
 228             0.60725294104140   0.60725293651701
 229             0.60725293538591   0.60725293510314 ...
 230             0.60725293503245   0.60725293501477
 231             0.60725293501035   0.60725293500925 ...
 232             0.60725293500897   0.60725293500890
 233             0.60725293500889   0.60725293500888 ];
 234         Kn = Kvalues(min(n, length(Kvalues)));
 235
 236         % Initialize loop variables:
 237         v = [1;0]; % start with 2-vector cosine and sine of zero
 238         poweroftwo = 1;
 239         angle = angles(1);
 240
 241         % Iterations
 242         for j = 0:n-1;
 243             if beta < 0
 244                 sigma = -1;
 245             else
 246                 sigma = 1;
 247             end
 248             factor = sigma * poweroftwo;
 249             % Note the matrix multiplication can be done using scaling by
 250             % powers of two and addition subtraction
 251             R = [1, -factor; factor, 1];
 252             v = R * v; % 2-by-2 matrix multiply
 253             beta = beta - sigma * angle; % update the remaining angle
 254             poweroftwo = poweroftwo / 2;
 255             % update the angle from table, or eventually by just dividing by two
 256             if j+2 > length(angles)
 257                 angle = angle / 2;
 258             else
 259                 angle = angles(j+2);
 260             end
 261         end
 262
 263         % Adjust length of output vector to be [cos(beta), sin(beta)]:
 264         v = v * Kn;
 265         return
 266
 267     endfunction
 268
 269 2x2 matrix multiply can be done with shifts and adds:
 270
 271     x = v[0] - sigma * (v[1] * 2^(-j));
 272     y = sigma * (v[0] * 2^(-j)) + v[1];
 273     v = [x; y];
 274
 275 The technique is outlined in a paper as being applicable to 3D:
 276 <https://www.atlantis-press.com/proceedings/jcis2006/232>
 277
 278 # Expensive 3-operand OP32 operations
 279
 280 3-operand operations are extremely expensive in terms of OP32 encoding space.  A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd