simple_v_extension/vector_ops.mdwn

   1 [[!tag standards]]
   2
   3 # Vector Operations Extension to SV
   4
   5 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
   6
   7 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
   8
   9 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
  10
  11 However given that some of the parameters are vectors (with and without SUBVL set), and some are scalars (where SUBVL will not apply), some clear rules need to be defined as to how the operations work.
  12
  13 Examples which can require SUBVL include cross product and may in future involve complex numbers.
  14
  15 ## CORDIC
  16
  17 * SUBVL=2, vd, vs; SUBVL ignored on beta.
  18
  19 * CORDIC.lin.rot vd, vs, beta
  20 * CORDIC.cir.rot vd, vs, beta
  21 * CORDIC.hyp.rot vd, vs, beta
  22 * CORDIC.lin.vec vd, vs, beta
  23 * CORDIC.cir.vec vd, vs, beta
  24 * CORDIC.hyp.vec vd, vs, beta
  25
  26 CORDIC is an extremely general-purpose algorithm useful for a huge number
  27 of diverse purposes.  In its full form it does however require quite a
  28 few parameters, one of which is a vector, making it awkward to include in
  29 a standard "scalar" ISA.  Additionally the coordinates can be set to circular,
  30 linear or hyperbolic, producing three different modes, and the algorithm
  31 may also be run in either "vector" mode or "rotation" mode.  See [[discussion]]
  32
  33 CORDIC can also be used for performing DCT.  See
  34 <https://arxiv.org/abs/1606.02424>
  35
  36 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
  37
  38      int i = 0;
  39      int iterations = 0; // Number of times to run the algorithm
  40      float arctanTable[iterations]; // in Radians
  41      float K = 0.6073; // K
  42      float v_x,v_y; // Vector v; x and y components
  43
  44      for(i=0; i < iterations; i++) {
  45         arctanTable[i] = atan(pow(2,-i));
  46      }
  47
  48      float vnew_x;   // To store the new value of x;
  49      for(i = 0; i < iterations; i++) {
  50          // If beta is negative, we need to do a counter-clockwise rotation:
  51          if( beta < 0) {
  52             vnew_x = v_x + (v_y*pow(2,-i));
  53             v_y -= (v_x*pow(2,-i));
  54             beta += arctanTable[i];
  55          }
  56          // If beta is positive, we need to do a clockwise rotation:
  57          else {
  58             vnew_x = v_x - (v_y*pow(2,-i));
  59             v_y += (v_x*pow(2,-i));
  60             beta -= arctanTable[i];
  61          }
  62          v_x = vnew_x;
  63      }
  64      v_x *= K;
  65      v_y *= K;
  66
  67 Links:
  68
  69 * <http://www.myhdl.org/docs/examples/sinecomp/>
  70
  71 ## Vector cross product
  72
  73 SUBVL=2,3,4 all regs
  74
  75 * VCROSS vd, vs1, vs1
  76
  77 Result is the cross product of x and y, i.e., the resulting components are, in order:
  78
  79     x[1] * y[2] - y[1] * x[2]
  80     x[2] * y[0] - y[2] * x[0]
  81     x[0] * y[1] - y[0] * x[1]
  82
  83 All the operands must be vectors of 3 components of a floating-point type.
  84
  85 Pseudocode:
  86
  87     vec3 a, b; // elements in order a.x, a.y, a.z
  88     // compute a cross b:
  89     vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
  90     vec3 t2 = b.zxy;
  91     vec3 t3 = a.zxy;
  92     vec3 t4 = b.yzx;
  93     vec3 p = t3 * t4;
  94     vec3 cross = t1 * t2 - p;
  95
  96 ## Vector dot product
  97
  98 * SUBVL ignored on rd.  SUBVL=2,3,4 vs1,vs2
  99 * rd=scalar, SUBVL=1, vs1, vs2=vec
 100
 101 * VDOT rd, vs1, vs2
 102
 103 Computes the dot product of two vectors. Internal accuracy must be
 104 greater than the input vectors and the result.
 105
 106 Pseudocode in python:
 107
 108     from operator import mul
 109     sum(map(mul, A, B))
 110
 111 Pseudocode in c:
 112
 113     double dot_product(float v[], float u[], int n)
 114     {
 115         double result = 0.0;
 116         for (int i = 0; i < n; i++)
 117             result += v[i] * u[i];
 118         return result;
 119     }
 120
 121 ## Vector length
 122
 123 * rd=scalar, vs1=vec (SUBVL=1)
 124 * rd=scalar, vs1=vec (SUBVL=2,3,4) only 1 (predication rules apply)
 125 * rd=vec, SUBVL ignored; vs1=vec, SUBVL=2,3,4
 126 * rd=vec, SUBVL ignored; vs1=vec, SUBVL=1: reserved encoding.
 127
 128 * VLEN rd, vs1
 129
 130 The scalar length of a vector:
 131
 132     sqrt(x[0]^2 + x[1]^2 + ...).
 133
 134 ## Vector distance
 135
 136 * VDIST rd, vs1, vs2
 137
 138 The scalar distance between two vectors. Subtracts one vector from the
 139 other and returns length:
 140
 141     length(v0 - v1)
 142
 143 ## Vector LERP
 144
 145 * VLERP rd, vs1, rs2 # SUBVL=2: vs1.v0 vs1.v1
 146
 147 Known as **fmix** in GLSL.
 148
 149 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
 150
 151 Pseudocode:
 152
 153     // Imprecise method, which does not guarantee v = v1 when t = 1,
 154     // due to floating-point arithmetic error.
 155     // This form may be used when the hardware has a native fused
 156     // multiply-add instruction.
 157     float lerp(float v0, float v1, float t) {
 158       return v0 + t * (v1 - v0);
 159     }
 160
 161     // Precise method, which guarantees v = v1 when t = 1.
 162     float lerp(float v0, float v1, float t) {
 163       return (1 - t) * v0 + t * v1;
 164     }
 165
 166 ## Vector SLERP
 167
 168 * VSLERP vd, vs1, vs2, rs3
 169
 170 Not recommended as it is not commonly used and has several trigonometric
 171 functions, although CORDIC in vector rotate circular mode is designed for this purpose. Also a costly 4 arg operation.
 172
 173 <https://en.m.wikipedia.org/wiki/Slerp>
 174
 175 Pseudocode:
 176
 177     Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
 178         // Only unit quaternions are valid rotations.
 179         // Normalize to avoid undefined behavior.
 180         v0.normalize();
 181         v1.normalize();
 182
 183         // Compute the cosine of the angle between the two vectors.
 184         double dot = dot_product(v0, v1);
 185
 186         // If the dot product is negative, slerp won't take
 187         // the shorter path. Note that v1 and -v1 are equivalent when
 188         // the negation is applied to all four components. Fix by
 189         // reversing one quaternion.
 190         if (dot < 0.0f) {
 191             v1 = -v1;
 192             dot = -dot;
 193         }
 194
 195         const double DOT_THRESHOLD = 0.9995;
 196         if (dot > DOT_THRESHOLD) {
 197             // If the inputs are too close for comfort, linearly interpolate
 198             // and normalize the result.
 199
 200             Quaternion result = v0 + t*(v1 - v0);
 201             result.normalize();
 202             return result;
 203         }
 204
 205         // Since dot is in range [0, DOT_THRESHOLD], acos is safe
 206         double theta_0 = acos(dot);        // theta_0 = angle between input vectors
 207         double theta = theta_0*t;          // theta = angle between v0 and result
 208         double sin_theta = sin(theta);     // compute this value only once
 209         double sin_theta_0 = sin(theta_0); // compute this value only once
 210
 211         double s0 = cos(theta) - dot * sin_theta / sin_theta_0;  // == sin(theta_0 - theta) / sin(theta_0)
 212         double s1 = sin_theta / sin_theta_0;
 213
 214         return (s0 * v0) + (s1 * v1);
 215     }
 216
 217 However this algorithm does not involve transcendentals except in
 218 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
 219
 220     function v = cordic(beta,n)
 221         % This function computes v = [cos(beta), sin(beta)] (beta in radians)
 222         % using n iterations. Increasing n will increase the precision.
 223
 224         if beta < -pi/2 || beta > pi/2
 225             if beta < 0
 226                 v = cordic(beta + pi, n);
 227             else
 228                 v = cordic(beta - pi, n);
 229             end
 230             v = -v; % flip the sign for second or third quadrant
 231             return
 232         end
 233
 234         % Initialization of tables of constants used by CORDIC
 235         % need a table of arctangents of negative powers of two, in radians:
 236         % angles = atan(2.^-(0:27));
 237         angles =  [  ...
 238             0.78539816339745   0.46364760900081
 239             0.24497866312686   0.12435499454676 ...
 240             0.06241880999596   0.03123983343027
 241             0.01562372862048   0.00781234106010 ...
 242             0.00390623013197   0.00195312251648
 243             0.00097656218956   0.00048828121119 ...
 244             0.00024414062015   0.00012207031189
 245             0.00006103515617   0.00003051757812 ...
 246             0.00001525878906   0.00000762939453
 247             0.00000381469727   0.00000190734863 ...
 248             0.00000095367432   0.00000047683716
 249             0.00000023841858   0.00000011920929 ...
 250             0.00000005960464   0.00000002980232
 251             0.00000001490116   0.00000000745058 ];
 252         % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
 253         % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
 254         Kvalues = [ ...
 255             0.70710678118655   0.63245553203368
 256             0.61357199107790   0.60883391251775 ...
 257             0.60764825625617   0.60735177014130
 258             0.60727764409353   0.60725911229889 ...
 259             0.60725447933256   0.60725332108988
 260             0.60725303152913   0.60725295913894 ...
 261             0.60725294104140   0.60725293651701
 262             0.60725293538591   0.60725293510314 ...
 263             0.60725293503245   0.60725293501477
 264             0.60725293501035   0.60725293500925 ...
 265             0.60725293500897   0.60725293500890
 266             0.60725293500889   0.60725293500888 ];
 267         Kn = Kvalues(min(n, length(Kvalues)));
 268
 269         % Initialize loop variables:
 270         v = [1;0]; % start with 2-vector cosine and sine of zero
 271         poweroftwo = 1;
 272         angle = angles(1);
 273
 274         % Iterations
 275         for j = 0:n-1;
 276             if beta < 0
 277                 sigma = -1;
 278             else
 279                 sigma = 1;
 280             end
 281             factor = sigma * poweroftwo;
 282             % Note the matrix multiplication can be done using scaling by
 283             % powers of two and addition subtraction
 284             R = [1, -factor; factor, 1];
 285             v = R * v; % 2-by-2 matrix multiply
 286             beta = beta - sigma * angle; % update the remaining angle
 287             poweroftwo = poweroftwo / 2;
 288             % update the angle from table, or eventually by just dividing by two
 289             if j+2 > length(angles)
 290                 angle = angle / 2;
 291             else
 292                 angle = angles(j+2);
 293             end
 294         end
 295
 296         % Adjust length of output vector to be [cos(beta), sin(beta)]:
 297         v = v * Kn;
 298         return
 299
 300     endfunction
 301
 302 2x2 matrix multiply can be done with shifts and adds:
 303
 304     x = v[0] - sigma * (v[1] * 2^(-j));
 305     y = sigma * (v[0] * 2^(-j)) + v[1];
 306     v = [x; y];
 307
 308 The technique is outlined in a paper as being applicable to 3D:
 309 <https://www.atlantis-press.com/proceedings/jcis2006/232>
 310
 311 # Expensive 3-operand OP32 operations
 312
 313 3-operand operations are extremely expensive in terms of OP32 encoding space.  A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd
 314
 315 Another is to overwrite one of the src registers.
 316
 317 # Opcode Table
 318
 319 TODO
 320
 321 # Links
 322
 323 * <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-September/002736.html>
 324 * <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-September/002733.html>
 325 * <http://bugs.libre-riscv.org/show_bug.cgi?id=142>