simple_v_extension/vector_ops.mdwn

   1 [[!tag standards]]
   2
   3 # Vector Operations Extension to SV
   4
   5 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
   6
   7 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
   8
   9 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
  10
  11 However given that some of the parameters are vectors (with and without SUBVL set), and some are scalars (where SUBVL will not apply), some clear rules need to be defined as to how the operations work.
  12
  13 Examples which can require SUBVL include cross product and may in future involve complex numbers.
  14
  15 ## CORDIC
  16
  17 * SUBVL=2, vd, vs; SUBVL ignored on beta.
  18 * VL > 1 ok.  beta as scalar ok (applies across all vectors)
  19 * non vector args vd, vs, or SUBVL!=2 reserved.
  20
  21 * CORDIC.lin.rot vd, vs, beta
  22 * CORDIC.cir.rot vd, vs, beta
  23 * CORDIC.hyp.rot vd, vs, beta
  24 * CORDIC.lin.vec vd, vs, beta
  25 * CORDIC.cir.vec vd, vs, beta
  26 * CORDIC.hyp.vec vd, vs, beta
  27
  28 CORDIC is an extremely general-purpose algorithm useful for a huge number
  29 of diverse purposes.  In its full form it does however require quite a
  30 few parameters, one of which is a vector, making it awkward to include in
  31 a standard "scalar" ISA.  Additionally the coordinates can be set to circular,
  32 linear or hyperbolic, producing three different modes, and the algorithm
  33 may also be run in either "vector" mode or "rotation" mode.  See [[discussion]]
  34
  35 CORDIC can also be used for performing DCT.  See
  36 <https://arxiv.org/abs/1606.02424>
  37
  38 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
  39
  40      int i = 0;
  41      int iterations = 0; // Number of times to run the algorithm
  42      float arctanTable[iterations]; // in Radians
  43      float K = 0.6073; // K
  44      float v_x,v_y; // Vector v; x and y components
  45
  46      for(i=0; i < iterations; i++) {
  47         arctanTable[i] = atan(pow(2,-i));
  48      }
  49
  50      float vnew_x;   // To store the new value of x;
  51      for(i = 0; i < iterations; i++) {
  52          // If beta is negative, we need to do a counter-clockwise rotation:
  53          if( beta < 0) {
  54             vnew_x = v_x + (v_y*pow(2,-i));
  55             v_y -= (v_x*pow(2,-i));
  56             beta += arctanTable[i];
  57          }
  58          // If beta is positive, we need to do a clockwise rotation:
  59          else {
  60             vnew_x = v_x - (v_y*pow(2,-i));
  61             v_y += (v_x*pow(2,-i));
  62             beta -= arctanTable[i];
  63          }
  64          v_x = vnew_x;
  65      }
  66      v_x *= K;
  67      v_y *= K;
  68
  69 Links:
  70
  71 * <http://www.myhdl.org/docs/examples/sinecomp/>
  72
  73 ## Vector cross product
  74
  75 SUBVL=2,3,4 all regs
  76
  77 * VCROSS vd, vs1, vs1
  78
  79 Result is the cross product of x and y, i.e., the resulting components are, in order:
  80
  81     x[1] * y[2] - y[1] * x[2]
  82     x[2] * y[0] - y[2] * x[0]
  83     x[0] * y[1] - y[0] * x[1]
  84
  85 All the operands must be vectors of 3 components of a floating-point type.
  86
  87 Pseudocode:
  88
  89     vec3 a, b; // elements in order a.x, a.y, a.z
  90     // compute a cross b:
  91     vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
  92     vec3 t2 = b.zxy;
  93     vec3 t3 = a.zxy;
  94     vec3 t4 = b.yzx;
  95     vec3 p = t3 * t4;
  96     vec3 cross = t1 * t2 - p;
  97
  98 ## Vector dot product
  99
 100 * SUBVL ignored on rd.  SUBVL=2,3,4 vs1,vs2
 101 * rd=scalar, SUBVL=1, vs1, vs2=vec
 102
 103 * VDOT rd, vs1, vs2
 104
 105 Computes the dot product of two vectors. Internal accuracy must be
 106 greater than the input vectors and the result.
 107
 108 Pseudocode in python:
 109
 110     from operator import mul
 111     sum(map(mul, A, B))
 112
 113 Pseudocode in c:
 114
 115     double dot_product(float v[], float u[], int n)
 116     {
 117         double result = 0.0;
 118         for (int i = 0; i < n; i++)
 119             result += v[i] * u[i];
 120         return result;
 121     }
 122
 123 ## Vector length
 124
 125 * rd=scalar, vs1=vec (SUBVL=1)
 126 * rd=scalar, vs1=vec (SUBVL=2,3,4) only 1 (predication rules apply)
 127 * rd=vec, SUBVL ignored; vs1=vec, SUBVL=2,3,4
 128 * rd=vec, SUBVL ignored; vs1=vec, SUBVL=1: reserved encoding.
 129
 130 * VLEN rd, vs1
 131
 132 The scalar length of a vector:
 133
 134     sqrt(x[0]^2 + x[1]^2 + ...).
 135
 136 ## Vector distance
 137
 138 * VDIST rd, vs1, vs2
 139
 140 The scalar distance between two vectors. Subtracts one vector from the
 141 other and returns length:
 142
 143     length(v0 - v1)
 144
 145 ## Vector LERP
 146
 147 * VLERP rd, vs1, rs2 # SUBVL=2: vs1.v0 vs1.v1
 148
 149 Known as **fmix** in GLSL.
 150
 151 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
 152
 153 Pseudocode:
 154
 155     // Imprecise method, which does not guarantee v = v1 when t = 1,
 156     // due to floating-point arithmetic error.
 157     // This form may be used when the hardware has a native fused
 158     // multiply-add instruction.
 159     float lerp(float v0, float v1, float t) {
 160       return v0 + t * (v1 - v0);
 161     }
 162
 163     // Precise method, which guarantees v = v1 when t = 1.
 164     float lerp(float v0, float v1, float t) {
 165       return (1 - t) * v0 + t * v1;
 166     }
 167
 168 ## Vector SLERP
 169
 170 * VSLERP vd, vs1, vs2, rs3
 171
 172 Not recommended as it is not commonly used and has several trigonometric
 173 functions, although CORDIC in vector rotate circular mode is designed for this purpose. Also a costly 4 arg operation.
 174
 175 <https://en.m.wikipedia.org/wiki/Slerp>
 176
 177 Pseudocode:
 178
 179     Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
 180         // Only unit quaternions are valid rotations.
 181         // Normalize to avoid undefined behavior.
 182         v0.normalize();
 183         v1.normalize();
 184
 185         // Compute the cosine of the angle between the two vectors.
 186         double dot = dot_product(v0, v1);
 187
 188         // If the dot product is negative, slerp won't take
 189         // the shorter path. Note that v1 and -v1 are equivalent when
 190         // the negation is applied to all four components. Fix by
 191         // reversing one quaternion.
 192         if (dot < 0.0f) {
 193             v1 = -v1;
 194             dot = -dot;
 195         }
 196
 197         const double DOT_THRESHOLD = 0.9995;
 198         if (dot > DOT_THRESHOLD) {
 199             // If the inputs are too close for comfort, linearly interpolate
 200             // and normalize the result.
 201
 202             Quaternion result = v0 + t*(v1 - v0);
 203             result.normalize();
 204             return result;
 205         }
 206
 207         // Since dot is in range [0, DOT_THRESHOLD], acos is safe
 208         double theta_0 = acos(dot);        // theta_0 = angle between input vectors
 209         double theta = theta_0*t;          // theta = angle between v0 and result
 210         double sin_theta = sin(theta);     // compute this value only once
 211         double sin_theta_0 = sin(theta_0); // compute this value only once
 212
 213         double s0 = cos(theta) - dot * sin_theta / sin_theta_0;  // == sin(theta_0 - theta) / sin(theta_0)
 214         double s1 = sin_theta / sin_theta_0;
 215
 216         return (s0 * v0) + (s1 * v1);
 217     }
 218
 219 However this algorithm does not involve transcendentals except in
 220 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
 221
 222     function v = cordic(beta,n)
 223         % This function computes v = [cos(beta), sin(beta)] (beta in radians)
 224         % using n iterations. Increasing n will increase the precision.
 225
 226         if beta < -pi/2 || beta > pi/2
 227             if beta < 0
 228                 v = cordic(beta + pi, n);
 229             else
 230                 v = cordic(beta - pi, n);
 231             end
 232             v = -v; % flip the sign for second or third quadrant
 233             return
 234         end
 235
 236         % Initialization of tables of constants used by CORDIC
 237         % need a table of arctangents of negative powers of two, in radians:
 238         % angles = atan(2.^-(0:27));
 239         angles =  [  ...
 240             0.78539816339745   0.46364760900081
 241             0.24497866312686   0.12435499454676 ...
 242             0.06241880999596   0.03123983343027
 243             0.01562372862048   0.00781234106010 ...
 244             0.00390623013197   0.00195312251648
 245             0.00097656218956   0.00048828121119 ...
 246             0.00024414062015   0.00012207031189
 247             0.00006103515617   0.00003051757812 ...
 248             0.00001525878906   0.00000762939453
 249             0.00000381469727   0.00000190734863 ...
 250             0.00000095367432   0.00000047683716
 251             0.00000023841858   0.00000011920929 ...
 252             0.00000005960464   0.00000002980232
 253             0.00000001490116   0.00000000745058 ];
 254         % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
 255         % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
 256         Kvalues = [ ...
 257             0.70710678118655   0.63245553203368
 258             0.61357199107790   0.60883391251775 ...
 259             0.60764825625617   0.60735177014130
 260             0.60727764409353   0.60725911229889 ...
 261             0.60725447933256   0.60725332108988
 262             0.60725303152913   0.60725295913894 ...
 263             0.60725294104140   0.60725293651701
 264             0.60725293538591   0.60725293510314 ...
 265             0.60725293503245   0.60725293501477
 266             0.60725293501035   0.60725293500925 ...
 267             0.60725293500897   0.60725293500890
 268             0.60725293500889   0.60725293500888 ];
 269         Kn = Kvalues(min(n, length(Kvalues)));
 270
 271         % Initialize loop variables:
 272         v = [1;0]; % start with 2-vector cosine and sine of zero
 273         poweroftwo = 1;
 274         angle = angles(1);
 275
 276         % Iterations
 277         for j = 0:n-1;
 278             if beta < 0
 279                 sigma = -1;
 280             else
 281                 sigma = 1;
 282             end
 283             factor = sigma * poweroftwo;
 284             % Note the matrix multiplication can be done using scaling by
 285             % powers of two and addition subtraction
 286             R = [1, -factor; factor, 1];
 287             v = R * v; % 2-by-2 matrix multiply
 288             beta = beta - sigma * angle; % update the remaining angle
 289             poweroftwo = poweroftwo / 2;
 290             % update the angle from table, or eventually by just dividing by two
 291             if j+2 > length(angles)
 292                 angle = angle / 2;
 293             else
 294                 angle = angles(j+2);
 295             end
 296         end
 297
 298         % Adjust length of output vector to be [cos(beta), sin(beta)]:
 299         v = v * Kn;
 300         return
 301
 302     endfunction
 303
 304 2x2 matrix multiply can be done with shifts and adds:
 305
 306     x = v[0] - sigma * (v[1] * 2^(-j));
 307     y = sigma * (v[0] * 2^(-j)) + v[1];
 308     v = [x; y];
 309
 310 The technique is outlined in a paper as being applicable to 3D:
 311 <https://www.atlantis-press.com/proceedings/jcis2006/232>
 312
 313 # Expensive 3-operand OP32 operations
 314
 315 3-operand operations are extremely expensive in terms of OP32 encoding space.  A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd
 316
 317 Another is to overwrite one of the src registers.
 318
 319 # Opcode Table
 320
 321 TODO
 322
 323 # Links
 324
 325 * <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-September/002736.html>
 326 * <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-September/002733.html>
 327 * <http://bugs.libre-riscv.org/show_bug.cgi?id=142>