simple_v_extension/vector_ops.mdwn

   1 [[!tag standards]]
   2
   3 # Vector Operations Extension to SV
   4
   5 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
   6
   7 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
   8
   9 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
  10
  11 Examples which can require SUBVL include cross product and may in future involve complex numbers.
  12
  13 ## CORDIC
  14
  15 SUBVL=2
  16
  17 * CORDIC.lin.rot vd, vs, beta
  18 * CORDIC.cir.rot vd, vs, beta
  19 * CORDIC.hyp.rot vd, vs, beta
  20 * CORDIC.lin.vec vd, vs, beta
  21 * CORDIC.cir.vec vd, vs, beta
  22 * CORDIC.hyp.vec vd, vs, beta
  23
  24 CORDIC is an extremely general-purpose algorithm useful for a huge number
  25 of diverse purposes.  In its full form it does however require quite a
  26 few parameters, one of which is a vector, making it awkward to include in
  27 a standard "scalar" ISA.  Additionally the coordinates can be set to circular,
  28 linear or hyperbolic, producing three different modes, and the algorithm
  29 may also be run in either "vector" mode or "rotation" mode.  See [[discussion]]
  30
  31 CORDIC can also be used for performing DCT.  See
  32 <https://arxiv.org/abs/1606.02424>
  33
  34 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
  35
  36      int i = 0;
  37      int iterations = 0; // Number of times to run the algorithm
  38      float arctanTable[iterations]; // in Radians
  39      float K = 0.6073; // K
  40      float v_x,v_y; // Vector v; x and y components
  41
  42      for(i=0; i < iterations; i++) {
  43         arctanTable[i] = atan(pow(2,-i));
  44      }
  45
  46      float vnew_x;   // To store the new value of x;
  47      for(i = 0; i < iterations; i++) {
  48          // If beta is negative, we need to do a counter-clockwise rotation:
  49          if( beta < 0) {
  50             vnew_x = v_x + (v_y*pow(2,-i));
  51             v_y -= (v_x*pow(2,-i));
  52             beta += arctanTable[i];
  53          }
  54          // If beta is positive, we need to do a clockwise rotation:
  55          else {
  56             vnew_x = v_x - (v_y*pow(2,-i));
  57             v_y += (v_x*pow(2,-i));
  58             beta -= arctanTable[i];
  59          }
  60          v_x = vnew_x;
  61      }
  62      v_x *= K;
  63      v_y *= K;
  64
  65 Links:
  66
  67 * <http://www.myhdl.org/docs/examples/sinecomp/>
  68
  69 ## Vector cross product
  70
  71 * VCROSS vd, vs1, vs1
  72
  73 Result is the cross product of x and y, i.e., the resulting components are, in order:
  74
  75     x[1] * y[2] - y[1] * x[2]
  76     x[2] * y[0] - y[2] * x[0]
  77     x[0] * y[1] - y[0] * x[1]
  78
  79 All the operands must be vectors of 3 components of a floating-point type.
  80
  81 Pseudocode:
  82
  83     vec3 a, b; // elements in order a.x, a.y, a.z
  84     // compute a cross b:
  85     vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
  86     vec3 t2 = b.zxy;
  87     vec3 t3 = a.zxy;
  88     vec3 t4 = b.yzx;
  89     vec3 p = t3 * t4;
  90     vec3 cross = t1 * t2 - p;
  91
  92 ## Vector dot product
  93
  94 * VDOT rd, vs1, vs2
  95
  96 Computes the dot product of two vectors. Internal accuracy must be
  97 greater than the input vectors and the result.
  98
  99 Pseudocode in python:
 100
 101     from operator import mul
 102     sum(map(mul, A, B))
 103
 104 Pseudocode in c:
 105
 106     double dot_product(float v[], float u[], int n)
 107     {
 108         double result = 0.0;
 109         for (int i = 0; i < n; i++)
 110             result += v[i] * u[i];
 111         return result;
 112     }
 113
 114 ## Vector length
 115
 116 * VLEN rd, vs1
 117
 118 The scalar length of a vector:
 119
 120     sqrt(x[0]^2 + x[1]^2 + ...).
 121
 122 ## Vector distance
 123
 124 * VDIST rd, vs1, vs2
 125
 126 The scalar distance between two vectors. Subtracts one vector from the
 127 other and returns length:
 128
 129     length(v0 - v1)
 130
 131 ## Vector LERP
 132
 133 * VLERP rd, vs1, rs2 # SUBVL=2: vs1.v0 vs1.v1
 134
 135 Known as **fmix** in GLSL.
 136
 137 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
 138
 139 Pseudocode:
 140
 141     // Imprecise method, which does not guarantee v = v1 when t = 1,
 142     // due to floating-point arithmetic error.
 143     // This form may be used when the hardware has a native fused
 144     // multiply-add instruction.
 145     float lerp(float v0, float v1, float t) {
 146       return v0 + t * (v1 - v0);
 147     }
 148
 149     // Precise method, which guarantees v = v1 when t = 1.
 150     float lerp(float v0, float v1, float t) {
 151       return (1 - t) * v0 + t * v1;
 152     }
 153
 154 ## Vector SLERP
 155
 156 * VSLERP vd, vs1, vs2, rs3
 157
 158 Not recommended as it is not commonly used and has several trigonometric
 159 functions. Also a costly 4 arg operation.
 160
 161 <https://en.m.wikipedia.org/wiki/Slerp>
 162
 163 Pseudocode:
 164
 165     Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
 166         // Only unit quaternions are valid rotations.
 167         // Normalize to avoid undefined behavior.
 168         v0.normalize();
 169         v1.normalize();
 170
 171         // Compute the cosine of the angle between the two vectors.
 172         double dot = dot_product(v0, v1);
 173
 174         // If the dot product is negative, slerp won't take
 175         // the shorter path. Note that v1 and -v1 are equivalent when
 176         // the negation is applied to all four components. Fix by
 177         // reversing one quaternion.
 178         if (dot < 0.0f) {
 179             v1 = -v1;
 180             dot = -dot;
 181         }
 182
 183         const double DOT_THRESHOLD = 0.9995;
 184         if (dot > DOT_THRESHOLD) {
 185             // If the inputs are too close for comfort, linearly interpolate
 186             // and normalize the result.
 187
 188             Quaternion result = v0 + t*(v1 - v0);
 189             result.normalize();
 190             return result;
 191         }
 192
 193         // Since dot is in range [0, DOT_THRESHOLD], acos is safe
 194         double theta_0 = acos(dot);        // theta_0 = angle between input vectors
 195         double theta = theta_0*t;          // theta = angle between v0 and result
 196         double sin_theta = sin(theta);     // compute this value only once
 197         double sin_theta_0 = sin(theta_0); // compute this value only once
 198
 199         double s0 = cos(theta) - dot * sin_theta / sin_theta_0;  // == sin(theta_0 - theta) / sin(theta_0)
 200         double s1 = sin_theta / sin_theta_0;
 201
 202         return (s0 * v0) + (s1 * v1);
 203     }
 204
 205 However this algorithm does not involve transcendentals except in
 206 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
 207
 208     function v = cordic(beta,n)
 209         % This function computes v = [cos(beta), sin(beta)] (beta in radians)
 210         % using n iterations. Increasing n will increase the precision.
 211
 212         if beta < -pi/2 || beta > pi/2
 213             if beta < 0
 214                 v = cordic(beta + pi, n);
 215             else
 216                 v = cordic(beta - pi, n);
 217             end
 218             v = -v; % flip the sign for second or third quadrant
 219             return
 220         end
 221
 222         % Initialization of tables of constants used by CORDIC
 223         % need a table of arctangents of negative powers of two, in radians:
 224         % angles = atan(2.^-(0:27));
 225         angles =  [  ...
 226             0.78539816339745   0.46364760900081
 227             0.24497866312686   0.12435499454676 ...
 228             0.06241880999596   0.03123983343027
 229             0.01562372862048   0.00781234106010 ...
 230             0.00390623013197   0.00195312251648
 231             0.00097656218956   0.00048828121119 ...
 232             0.00024414062015   0.00012207031189
 233             0.00006103515617   0.00003051757812 ...
 234             0.00001525878906   0.00000762939453
 235             0.00000381469727   0.00000190734863 ...
 236             0.00000095367432   0.00000047683716
 237             0.00000023841858   0.00000011920929 ...
 238             0.00000005960464   0.00000002980232
 239             0.00000001490116   0.00000000745058 ];
 240         % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
 241         % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
 242         Kvalues = [ ...
 243             0.70710678118655   0.63245553203368
 244             0.61357199107790   0.60883391251775 ...
 245             0.60764825625617   0.60735177014130
 246             0.60727764409353   0.60725911229889 ...
 247             0.60725447933256   0.60725332108988
 248             0.60725303152913   0.60725295913894 ...
 249             0.60725294104140   0.60725293651701
 250             0.60725293538591   0.60725293510314 ...
 251             0.60725293503245   0.60725293501477
 252             0.60725293501035   0.60725293500925 ...
 253             0.60725293500897   0.60725293500890
 254             0.60725293500889   0.60725293500888 ];
 255         Kn = Kvalues(min(n, length(Kvalues)));
 256
 257         % Initialize loop variables:
 258         v = [1;0]; % start with 2-vector cosine and sine of zero
 259         poweroftwo = 1;
 260         angle = angles(1);
 261
 262         % Iterations
 263         for j = 0:n-1;
 264             if beta < 0
 265                 sigma = -1;
 266             else
 267                 sigma = 1;
 268             end
 269             factor = sigma * poweroftwo;
 270             % Note the matrix multiplication can be done using scaling by
 271             % powers of two and addition subtraction
 272             R = [1, -factor; factor, 1];
 273             v = R * v; % 2-by-2 matrix multiply
 274             beta = beta - sigma * angle; % update the remaining angle
 275             poweroftwo = poweroftwo / 2;
 276             % update the angle from table, or eventually by just dividing by two
 277             if j+2 > length(angles)
 278                 angle = angle / 2;
 279             else
 280                 angle = angles(j+2);
 281             end
 282         end
 283
 284         % Adjust length of output vector to be [cos(beta), sin(beta)]:
 285         v = v * Kn;
 286         return
 287
 288     endfunction
 289
 290 2x2 matrix multiply can be done with shifts and adds:
 291
 292     x = v[0] - sigma * (v[1] * 2^(-j));
 293     y = sigma * (v[0] * 2^(-j)) + v[1];
 294     v = [x; y];
 295
 296 The technique is outlined in a paper as being applicable to 3D:
 297 <https://www.atlantis-press.com/proceedings/jcis2006/232>
 298
 299 # Expensive 3-operand OP32 operations
 300
 301 3-operand operations are extremely expensive in terms of OP32 encoding space.  A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd
 302
 303 Another is to overwrite one of the src registers.
 304
 305 # Opcode Table
 306
 307 TODO