simple_v_extension/vector_ops.mdwn

   1 [[!tag standards]]
   2
   3 # Vector Operations Extension to SV
   4
   5 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
   6
   7 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
   8
   9 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
  10
  11 However given that some of the parameters are vectors (with and without SUBVL set), and some are scalars (where SUBVL will not apply), some clear rules need to be defined as to how the operations work.
  12
  13 Examples which can require SUBVL include cross product and may in future involve complex numbers.
  14
  15 ## CORDIC
  16
  17 * SUBVL=2, vd, vs; SUBVL ignored on beta.
  18 * VL nonzero ok.  beta as scalar ok (applies across all vectors)
  19 * non vector args vd, vs, or SUBVL!=2 reserved.
  20
  21 6 opcode options (fmt3):
  22
  23 * CORDIC.lin.rot vd, vs, beta
  24 * CORDIC.cir.rot vd, vs, beta
  25 * CORDIC.hyp.rot vd, vs, beta
  26 * CORDIC.lin.vec vd, vs, beta
  27 * CORDIC.cir.vec vd, vs, beta
  28 * CORDIC.hyp.vec vd, vs, beta
  29
  30 CORDIC is an extremely general-purpose algorithm useful for a huge number
  31 of diverse purposes.  In its full form it does however require quite a
  32 few parameters, one of which is a vector, making it awkward to include in
  33 a standard "scalar" ISA.  Additionally the coordinates can be set to circular,
  34 linear or hyperbolic, producing three different modes, and the algorithm
  35 may also be run in either "vector" mode or "rotation" mode.  See [[discussion]]
  36
  37 CORDIC can also be used for performing DCT.  See
  38 <https://arxiv.org/abs/1606.02424>
  39
  40 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
  41
  42      int i = 0;
  43      int iterations = 0; // Number of times to run the algorithm
  44      float arctanTable[iterations]; // in Radians
  45      float K = 0.6073; // K
  46      float v_x,v_y; // Vector v; x and y components
  47
  48      for(i=0; i < iterations; i++) {
  49         arctanTable[i] = atan(pow(2,-i));
  50      }
  51
  52      float vnew_x;   // To store the new value of x;
  53      for(i = 0; i < iterations; i++) {
  54          // If beta is negative, we need to do a counter-clockwise rotation:
  55          if( beta < 0) {
  56             vnew_x = v_x + (v_y*pow(2,-i));
  57             v_y -= (v_x*pow(2,-i));
  58             beta += arctanTable[i];
  59          }
  60          // If beta is positive, we need to do a clockwise rotation:
  61          else {
  62             vnew_x = v_x - (v_y*pow(2,-i));
  63             v_y += (v_x*pow(2,-i));
  64             beta -= arctanTable[i];
  65          }
  66          v_x = vnew_x;
  67      }
  68      v_x *= K;
  69      v_y *= K;
  70
  71 Links:
  72
  73 * <http://www.myhdl.org/docs/examples/sinecomp/>
  74
  75 ## Vector cross product
  76
  77 SUBVL=2,3,4 all regs. VL nonzero produces multiple vd results.
  78
  79 * VCROSS vd, vs1, vs1
  80
  81 Result is the cross product of x and y, i.e., the resulting components are, in order:
  82
  83     x[1] * y[2] - y[1] * x[2]
  84     x[2] * y[0] - y[2] * x[0]
  85     x[0] * y[1] - y[0] * x[1]
  86
  87 All the operands must be vectors of 3 components of a floating-point type.
  88
  89 Pseudocode:
  90
  91     vec3 a, b; // elements in order a.x, a.y, a.z
  92     // compute a cross b:
  93     vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
  94     vec3 t2 = b.zxy;
  95     vec3 t3 = a.zxy;
  96     vec3 t4 = b.yzx;
  97     vec3 p = t3 * t4;
  98     vec3 cross = t1 * t2 - p;
  99
 100 ## Vector dot product
 101
 102 * SUBVL ignored on rd.  SUBVL=2,3,4 vs1,vs2
 103 * rd=scalar, SUBVL=1, vs1, vs2=vec
 104
 105 * VDOT rd, vs1, vs2
 106
 107 Computes the dot product of two vectors. Internal accuracy must be
 108 greater than the input vectors and the result.
 109
 110 Pseudocode in python:
 111
 112     from operator import mul
 113     sum(map(mul, A, B))
 114
 115 Pseudocode in c:
 116
 117     double dot_product(float v[], float u[], int n)
 118     {
 119         double result = 0.0;
 120         for (int i = 0; i < n; i++)
 121             result += v[i] * u[i];
 122         return result;
 123     }
 124
 125 ## Vector length
 126
 127 * rd=scalar, vs1=vec (SUBVL=1)
 128 * rd=scalar, vs1=vec (SUBVL=2,3,4) only 1 (predication rules apply)
 129 * rd=vec, SUBVL ignored; vs1=vec, SUBVL=2,3,4
 130 * rd=vec, SUBVL ignored; vs1=vec, SUBVL=1: reserved encoding.
 131
 132 * VLEN rd, vs1
 133
 134 The scalar length of a vector:
 135
 136     sqrt(x[0]^2 + x[1]^2 + ...).
 137
 138 ## Vector distance
 139
 140 * VDIST rd, vs1, vs2
 141
 142 The scalar distance between two vectors. Subtracts one vector from the
 143 other and returns length:
 144
 145     length(v0 - v1)
 146
 147 ## Vector LERP
 148
 149 * VLERP rd, vs1, rs2 # SUBVL=2: vs1.v0 vs1.v1
 150
 151 Known as **fmix** in GLSL.
 152
 153 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
 154
 155 Pseudocode:
 156
 157     // Imprecise method, which does not guarantee v = v1 when t = 1,
 158     // due to floating-point arithmetic error.
 159     // This form may be used when the hardware has a native fused
 160     // multiply-add instruction.
 161     float lerp(float v0, float v1, float t) {
 162       return v0 + t * (v1 - v0);
 163     }
 164
 165     // Precise method, which guarantees v = v1 when t = 1.
 166     float lerp(float v0, float v1, float t) {
 167       return (1 - t) * v0 + t * v1;
 168     }
 169
 170 ## Vector SLERP
 171
 172 * VSLERP vd, vs1, vs2, rs3
 173
 174 Not recommended as it is not commonly used and has several trigonometric
 175 functions, although CORDIC in vector rotate circular mode is designed for this purpose. Also a costly 4 arg operation.
 176
 177 <https://en.m.wikipedia.org/wiki/Slerp>
 178
 179 Pseudocode:
 180
 181     Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
 182         // Only unit quaternions are valid rotations.
 183         // Normalize to avoid undefined behavior.
 184         v0.normalize();
 185         v1.normalize();
 186
 187         // Compute the cosine of the angle between the two vectors.
 188         double dot = dot_product(v0, v1);
 189
 190         // If the dot product is negative, slerp won't take
 191         // the shorter path. Note that v1 and -v1 are equivalent when
 192         // the negation is applied to all four components. Fix by
 193         // reversing one quaternion.
 194         if (dot < 0.0f) {
 195             v1 = -v1;
 196             dot = -dot;
 197         }
 198
 199         const double DOT_THRESHOLD = 0.9995;
 200         if (dot > DOT_THRESHOLD) {
 201             // If the inputs are too close for comfort, linearly interpolate
 202             // and normalize the result.
 203
 204             Quaternion result = v0 + t*(v1 - v0);
 205             result.normalize();
 206             return result;
 207         }
 208
 209         // Since dot is in range [0, DOT_THRESHOLD], acos is safe
 210         double theta_0 = acos(dot);        // theta_0 = angle between input vectors
 211         double theta = theta_0*t;          // theta = angle between v0 and result
 212         double sin_theta = sin(theta);     // compute this value only once
 213         double sin_theta_0 = sin(theta_0); // compute this value only once
 214
 215         double s0 = cos(theta) - dot * sin_theta / sin_theta_0;  // == sin(theta_0 - theta) / sin(theta_0)
 216         double s1 = sin_theta / sin_theta_0;
 217
 218         return (s0 * v0) + (s1 * v1);
 219     }
 220
 221 However this algorithm does not involve transcendentals except in
 222 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
 223
 224     function v = cordic(beta,n)
 225         % This function computes v = [cos(beta), sin(beta)] (beta in radians)
 226         % using n iterations. Increasing n will increase the precision.
 227
 228         if beta < -pi/2 || beta > pi/2
 229             if beta < 0
 230                 v = cordic(beta + pi, n);
 231             else
 232                 v = cordic(beta - pi, n);
 233             end
 234             v = -v; % flip the sign for second or third quadrant
 235             return
 236         end
 237
 238         % Initialization of tables of constants used by CORDIC
 239         % need a table of arctangents of negative powers of two, in radians:
 240         % angles = atan(2.^-(0:27));
 241         angles =  [  ...
 242             0.78539816339745   0.46364760900081
 243             0.24497866312686   0.12435499454676 ...
 244             0.06241880999596   0.03123983343027
 245             0.01562372862048   0.00781234106010 ...
 246             0.00390623013197   0.00195312251648
 247             0.00097656218956   0.00048828121119 ...
 248             0.00024414062015   0.00012207031189
 249             0.00006103515617   0.00003051757812 ...
 250             0.00001525878906   0.00000762939453
 251             0.00000381469727   0.00000190734863 ...
 252             0.00000095367432   0.00000047683716
 253             0.00000023841858   0.00000011920929 ...
 254             0.00000005960464   0.00000002980232
 255             0.00000001490116   0.00000000745058 ];
 256         % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
 257         % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
 258         Kvalues = [ ...
 259             0.70710678118655   0.63245553203368
 260             0.61357199107790   0.60883391251775 ...
 261             0.60764825625617   0.60735177014130
 262             0.60727764409353   0.60725911229889 ...
 263             0.60725447933256   0.60725332108988
 264             0.60725303152913   0.60725295913894 ...
 265             0.60725294104140   0.60725293651701
 266             0.60725293538591   0.60725293510314 ...
 267             0.60725293503245   0.60725293501477
 268             0.60725293501035   0.60725293500925 ...
 269             0.60725293500897   0.60725293500890
 270             0.60725293500889   0.60725293500888 ];
 271         Kn = Kvalues(min(n, length(Kvalues)));
 272
 273         % Initialize loop variables:
 274         v = [1;0]; % start with 2-vector cosine and sine of zero
 275         poweroftwo = 1;
 276         angle = angles(1);
 277
 278         % Iterations
 279         for j = 0:n-1;
 280             if beta < 0
 281                 sigma = -1;
 282             else
 283                 sigma = 1;
 284             end
 285             factor = sigma * poweroftwo;
 286             % Note the matrix multiplication can be done using scaling by
 287             % powers of two and addition subtraction
 288             R = [1, -factor; factor, 1];
 289             v = R * v; % 2-by-2 matrix multiply
 290             beta = beta - sigma * angle; % update the remaining angle
 291             poweroftwo = poweroftwo / 2;
 292             % update the angle from table, or eventually by just dividing by two
 293             if j+2 > length(angles)
 294                 angle = angle / 2;
 295             else
 296                 angle = angles(j+2);
 297             end
 298         end
 299
 300         % Adjust length of output vector to be [cos(beta), sin(beta)]:
 301         v = v * Kn;
 302         return
 303
 304     endfunction
 305
 306 2x2 matrix multiply can be done with shifts and adds:
 307
 308     x = v[0] - sigma * (v[1] * 2^(-j));
 309     y = sigma * (v[0] * 2^(-j)) + v[1];
 310     v = [x; y];
 311
 312 The technique is outlined in a paper as being applicable to 3D:
 313 <https://www.atlantis-press.com/proceedings/jcis2006/232>
 314
 315 # Expensive 3-operand OP32 operations
 316
 317 3-operand operations are extremely expensive in terms of OP32 encoding space.  A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd
 318
 319 Another is to overwrite one of the src registers.
 320
 321 # Opcode Table
 322
 323 TODO
 324
 325 # Links
 326
 327 * <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-September/002736.html>
 328 * <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-September/002733.html>
 329 * <http://bugs.libre-riscv.org/show_bug.cgi?id=142>