simple_v_extension/vector_ops.mdwn

   1 [[!tag standards]]
   2
   3 # Vector Operations Extension to SV
   4
   5 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
   6
   7 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
   8
   9 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
  10
  11 However given that some of the parameters are vectors (with and without SUBVL set), and some are scalars (where SUBVL will not apply), some clear rules need to be defined as to how the operations work.
  12
  13 Examples which can require SUBVL include cross product and may in future involve complex numbers.
  14
  15 ## CORDIC
  16
  17 * SUBVL=2, vd, vs; SUBVL ignored on beta.
  18 * VL nonzero ok.  beta as scalar ok (applies across all vectors)
  19 * non vector args vd, vs, or SUBVL!=2 reserved.
  20
  21 6 opcode options (fmt3):
  22
  23 * CORDIC.lin.rot vd, vs, beta
  24 * CORDIC.cir.rot vd, vs, beta
  25 * CORDIC.hyp.rot vd, vs, beta
  26 * CORDIC.lin.vec vd, vs, beta
  27 * CORDIC.cir.vec vd, vs, beta
  28 * CORDIC.hyp.vec vd, vs, beta
  29
  30 CORDIC is an extremely general-purpose algorithm useful for a huge number
  31 of diverse purposes.  In its full form it does however require quite a
  32 few parameters, one of which is a vector, making it awkward to include in
  33 a standard "scalar" ISA.  Additionally the coordinates can be set to circular,
  34 linear or hyperbolic, producing three different modes, and the algorithm
  35 may also be run in either "vector" mode or "rotation" mode.  See [[discussion]]
  36
  37 CORDIC can also be used for performing DCT.  See
  38 <https://arxiv.org/abs/1606.02424>
  39
  40 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
  41
  42      int i = 0;
  43      int iterations = 0; // Number of times to run the algorithm
  44      float arctanTable[iterations]; // in Radians
  45      float K = 0.6073; // K
  46      float v_x,v_y; // Vector v; x and y components
  47
  48      for(i=0; i < iterations; i++) {
  49         arctanTable[i] = atan(pow(2,-i));
  50      }
  51
  52      float vnew_x;   // To store the new value of x;
  53      for(i = 0; i < iterations; i++) {
  54          // If beta is negative, we need to do a counter-clockwise rotation:
  55          if( beta < 0) {
  56             vnew_x = v_x + (v_y*pow(2,-i));
  57             v_y -= (v_x*pow(2,-i));
  58             beta += arctanTable[i];
  59          }
  60          // If beta is positive, we need to do a clockwise rotation:
  61          else {
  62             vnew_x = v_x - (v_y*pow(2,-i));
  63             v_y += (v_x*pow(2,-i));
  64             beta -= arctanTable[i];
  65          }
  66          v_x = vnew_x;
  67      }
  68      v_x *= K;
  69      v_y *= K;
  70
  71 Links:
  72
  73 * <http://www.myhdl.org/docs/examples/sinecomp/>
  74
  75 ## Vector cross product
  76
  77 SUBVL=3, all regs. VL nonzero produces multiple vd results.
  78
  79 * VCROSS vd, vs1, vs1
  80
  81 Result is the cross product of x and y, i.e., the resulting components are, in order:
  82
  83     x[1] * y[2] - y[1] * x[2]
  84     x[2] * y[0] - y[2] * x[0]
  85     x[0] * y[1] - y[0] * x[1]
  86
  87 All the operands must be vectors of 3 components of a floating-point type.
  88
  89 Pseudocode:
  90
  91     vec3 a, b; // elements in order a.x, a.y, a.z
  92     // compute a cross b:
  93     vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
  94     vec3 t2 = b.zxy;
  95     vec3 t3 = a.zxy;
  96     vec3 t4 = b.yzx;
  97     vec3 p = t3 * t4;
  98     vec3 cross = t1 * t2 - p;
  99
 100 Assembler:
 101
 102     fpermute,2130 F4, F1
 103     fpermute,1320 F5, F1
 104     fpermute,2130 F6, F2
 105     fpermute,1320 F7, F2
 106     fmul F8, F5, F6
 107     fmulsub F3, F4, F7, F8
 108
 109 ## Vector dot product
 110
 111 * SUBVL ignored on rd.  SUBVL=2,3,4 vs1,vs2, if all vectors, multiple results generated. If rd scalar, only first (unpredicated) SUBVector is used.
 112 * rd=scalar, SUBVL=1 and vs1, vs2=vec will produce one scalar result. Predication allowed on src vectors.
 113
 114 * VDOT rd, vs1, vs2
 115
 116 Computes the dot product of two vectors. Internal accuracy must be
 117 greater than the input vectors and the result.
 118
 119 Pseudocode in python:
 120
 121     from operator import mul
 122     sum(map(mul, A, B))
 123
 124 Pseudocode in c:
 125
 126     double dot_product(float v[], float u[], int n)
 127     {
 128         double result = 0.0;
 129         for (int i = 0; i < n; i++)
 130             result += v[i] * u[i];
 131         return result;
 132     }
 133
 134 ## Vector length
 135
 136 * rd=scalar, vs1=vec (SUBVL=1)
 137 * rd=scalar, vs1=vec (SUBVL=2,3,4) only 1 (predication rules apply)
 138 * rd=vec, SUBVL ignored; vs1=vec, SUBVL=2,3,4
 139 * rd=vec, SUBVL ignored; vs1=vec, SUBVL=1: reserved encoding.
 140
 141 * VLEN rd, vs1
 142
 143 The scalar length of a vector:
 144
 145     sqrt(x[0]^2 + x[1]^2 + ...).
 146
 147 ## Vector distance
 148
 149 * VDIST rd, vs1, vs2
 150
 151 The scalar distance between two vectors. Subtracts one vector from the
 152 other and returns length:
 153
 154     length(v0 - v1)
 155
 156 ## Vector LERP
 157
 158 * VLERP rd, vs1, rs2 # SUBVL=2: vs1.v0 vs1.v1
 159
 160 Known as **fmix** in GLSL.
 161
 162 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
 163
 164 Pseudocode:
 165
 166     // Imprecise method, which does not guarantee v = v1 when t = 1,
 167     // due to floating-point arithmetic error.
 168     // This form may be used when the hardware has a native fused
 169     // multiply-add instruction.
 170     float lerp(float v0, float v1, float t) {
 171       return v0 + t * (v1 - v0);
 172     }
 173
 174     // Precise method, which guarantees v = v1 when t = 1.
 175     float lerp(float v0, float v1, float t) {
 176       return (1 - t) * v0 + t * v1;
 177     }
 178
 179 ## Vector SLERP
 180
 181 * VSLERP vd, vs1, vs2, rs3
 182
 183 Not recommended as it is not commonly used and has several trigonometric
 184 functions, although CORDIC in vector rotate circular mode is designed for this purpose. Also a costly 4 arg operation.
 185
 186 <https://en.m.wikipedia.org/wiki/Slerp>
 187
 188 Pseudocode:
 189
 190     Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
 191         // Only unit quaternions are valid rotations.
 192         // Normalize to avoid undefined behavior.
 193         v0.normalize();
 194         v1.normalize();
 195
 196         // Compute the cosine of the angle between the two vectors.
 197         double dot = dot_product(v0, v1);
 198
 199         // If the dot product is negative, slerp won't take
 200         // the shorter path. Note that v1 and -v1 are equivalent when
 201         // the negation is applied to all four components. Fix by
 202         // reversing one quaternion.
 203         if (dot < 0.0f) {
 204             v1 = -v1;
 205             dot = -dot;
 206         }
 207
 208         const double DOT_THRESHOLD = 0.9995;
 209         if (dot > DOT_THRESHOLD) {
 210             // If the inputs are too close for comfort, linearly interpolate
 211             // and normalize the result.
 212
 213             Quaternion result = v0 + t*(v1 - v0);
 214             result.normalize();
 215             return result;
 216         }
 217
 218         // Since dot is in range [0, DOT_THRESHOLD], acos is safe
 219         double theta_0 = acos(dot);        // theta_0 = angle between input vectors
 220         double theta = theta_0*t;          // theta = angle between v0 and result
 221         double sin_theta = sin(theta);     // compute this value only once
 222         double sin_theta_0 = sin(theta_0); // compute this value only once
 223
 224         double s0 = cos(theta) - dot * sin_theta / sin_theta_0;  // == sin(theta_0 - theta) / sin(theta_0)
 225         double s1 = sin_theta / sin_theta_0;
 226
 227         return (s0 * v0) + (s1 * v1);
 228     }
 229
 230 However this algorithm does not involve transcendentals except in
 231 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
 232
 233     function v = cordic(beta,n)
 234         % This function computes v = [cos(beta), sin(beta)] (beta in radians)
 235         % using n iterations. Increasing n will increase the precision.
 236
 237         if beta < -pi/2 || beta > pi/2
 238             if beta < 0
 239                 v = cordic(beta + pi, n);
 240             else
 241                 v = cordic(beta - pi, n);
 242             end
 243             v = -v; % flip the sign for second or third quadrant
 244             return
 245         end
 246
 247         % Initialization of tables of constants used by CORDIC
 248         % need a table of arctangents of negative powers of two, in radians:
 249         % angles = atan(2.^-(0:27));
 250         angles =  [  ...
 251             0.78539816339745   0.46364760900081
 252             0.24497866312686   0.12435499454676 ...
 253             0.06241880999596   0.03123983343027
 254             0.01562372862048   0.00781234106010 ...
 255             0.00390623013197   0.00195312251648
 256             0.00097656218956   0.00048828121119 ...
 257             0.00024414062015   0.00012207031189
 258             0.00006103515617   0.00003051757812 ...
 259             0.00001525878906   0.00000762939453
 260             0.00000381469727   0.00000190734863 ...
 261             0.00000095367432   0.00000047683716
 262             0.00000023841858   0.00000011920929 ...
 263             0.00000005960464   0.00000002980232
 264             0.00000001490116   0.00000000745058 ];
 265         % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
 266         % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
 267         Kvalues = [ ...
 268             0.70710678118655   0.63245553203368
 269             0.61357199107790   0.60883391251775 ...
 270             0.60764825625617   0.60735177014130
 271             0.60727764409353   0.60725911229889 ...
 272             0.60725447933256   0.60725332108988
 273             0.60725303152913   0.60725295913894 ...
 274             0.60725294104140   0.60725293651701
 275             0.60725293538591   0.60725293510314 ...
 276             0.60725293503245   0.60725293501477
 277             0.60725293501035   0.60725293500925 ...
 278             0.60725293500897   0.60725293500890
 279             0.60725293500889   0.60725293500888 ];
 280         Kn = Kvalues(min(n, length(Kvalues)));
 281
 282         % Initialize loop variables:
 283         v = [1;0]; % start with 2-vector cosine and sine of zero
 284         poweroftwo = 1;
 285         angle = angles(1);
 286
 287         % Iterations
 288         for j = 0:n-1;
 289             if beta < 0
 290                 sigma = -1;
 291             else
 292                 sigma = 1;
 293             end
 294             factor = sigma * poweroftwo;
 295             % Note the matrix multiplication can be done using scaling by
 296             % powers of two and addition subtraction
 297             R = [1, -factor; factor, 1];
 298             v = R * v; % 2-by-2 matrix multiply
 299             beta = beta - sigma * angle; % update the remaining angle
 300             poweroftwo = poweroftwo / 2;
 301             % update the angle from table, or eventually by just dividing by two
 302             if j+2 > length(angles)
 303                 angle = angle / 2;
 304             else
 305                 angle = angles(j+2);
 306             end
 307         end
 308
 309         % Adjust length of output vector to be [cos(beta), sin(beta)]:
 310         v = v * Kn;
 311         return
 312
 313     endfunction
 314
 315 2x2 matrix multiply can be done with shifts and adds:
 316
 317     x = v[0] - sigma * (v[1] * 2^(-j));
 318     y = sigma * (v[0] * 2^(-j)) + v[1];
 319     v = [x; y];
 320
 321 The technique is outlined in a paper as being applicable to 3D:
 322 <https://www.atlantis-press.com/proceedings/jcis2006/232>
 323
 324 # Expensive 3-operand OP32 operations
 325
 326 3-operand operations are extremely expensive in terms of OP32 encoding space.  A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd
 327
 328 Another is to overwrite one of the src registers.
 329
 330 # Opcode Table
 331
 332 TODO
 333
 334 # Links
 335
 336 * <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-September/002736.html>
 337 * <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-September/002733.html>
 338 * <http://bugs.libre-riscv.org/show_bug.cgi?id=142>