openpower/sv/remap/appendix.mdwn

   1 ## REMAP Matrix pseudocode
   2
   3 The algorithm below shows how REMAP works more clearly, and may be
   4 executed as a python program:
   5
   6 ```
   7 [[!inline pages="openpower/sv/remap.py" quick="yes" raw="yes" ]]
   8 ```
   9
  10 An easier-to-read version (using python iterators) is given in
  11 a later section of this Appendix.
  12
  13 Each element index from the for-loop `0..VL-1`
  14 is run through the above algorithm to work out the **actual** element
  15 index, instead.  Given that there are four possible SHAPE entries, up to
  16 four separate registers in any given operation may be simultaneously
  17 remapped:
  18
  19 ```
  20     function op_add(RT, RA, RB) # add not VADD!
  21       for (i=0,id=0,irs1=0,irs2=0; i < VL; i++)
  22         SVSTATE.srcstep = i # save context
  23         if (predval & 1<<i) # predication mask
  24            GPR[RT+remap1(id)] <= GPR[RA+remap2(irs1)] +
  25                                  GPR[RB+remap3(irs2)];
  26            if (!RT.isvector) break;
  27         if (RT.isvector)  { id += 1; }
  28         if (RA.isvector)  { irs1 += 1; }
  29         if (RB.isvector)  { irs2 += 1; }
  30 ```
  31
  32 By changing remappings, 2D matrices may be transposed "in-place" for one
  33 operation, followed by setting a different permutation order without
  34 having to move the values in the registers to or from memory.
  35
  36 Note that:
  37
  38 * Over-running the register file clearly has to be detected and
  39   an illegal instruction exception thrown
  40 * When non-default elwidths are set, the exact same algorithm still
  41   applies (i.e. it offsets *polymorphic* elements *within* registers rather
  42   than entire registers).
  43 * If permute option 000 is utilised, the actual order of the
  44   reindexing does not change.  However, modulo MVL still occurs
  45   which will result in repeated operations (use with caution).
  46 * If two or more dimensions are set to zero, the actual order does not change!
  47 * The above algorithm is pseudo-code **only**.  Actual implementations
  48   will need to take into account the fact that the element for-looping
  49   must be **re-entrant**, due to the possibility of exceptions occurring.
  50   See SVSTATE SPR, which records the current element index.
  51   Continuing after return from an interrupt may introduce latency
  52   due to re-computation of the remapped offsets.
  53 * Twin-predicated operations require **two** separate and distinct
  54   element offsets.  The above pseudo-code algorithm will be applied
  55   separately and independently to each, should each of the two
  56   operands be remapped.  *This even includes unit-strided LD/ST*
  57   and other operations
  58   in that category, where in that case it will be the **address offset**
  59   that is remapped: `EA <- (RA) + immediate * REMAP(elementoffset)`.
  60 * Offset is especially useful, on its own, for accessing elements
  61   within the middle of a register.  Without offsets, it is necessary
  62   to either use a predicated MV, skipping the first elements, or
  63   performing a LOAD/STORE cycle to memory.
  64   With offsets, the data does not have to be moved.
  65 * Setting the total elements (xdim+1) times (ydim+1) times (zdim+1) to
  66   less than MVL is **perfectly legal**, albeit very obscure.  It permits
  67   entries to be regularly presented to operands **more than once**, thus
  68   allowing the same underlying registers to act as an accumulator of
  69   multiple vector or matrix operations, for example.
  70 * Note especially that Program Order **must** still be respected
  71   even when overlaps occur that read or write the same register
  72   elements *including polymorphic ones*
  73
  74 Clearly here some considerable care needs to be taken as the remapping
  75 could hypothetically create arithmetic operations that target the
  76 exact same underlying registers, resulting in data corruption due to
  77 pipeline overlaps.  Out-of-order / Superscalar micro-architectures with
  78 register-renaming will have an easier time dealing with this than
  79 DSP-style SIMD micro-architectures.
  80
  81
  82 ### 4x4 Matrix to vec4 Multiply (4x4 by 1x4)
  83
  84 The following settings will allow a 4x4 matrix (starting at f8), expressed
  85 as a sequence of 16 numbers first by row then by column, to be multiplied
  86 by a vector of length 4 (starting at f0), using a single FMAC instruction.
  87
  88 * SHAPE0: xdim=4, ydim=4, permute=yx, applied to f0
  89 * SHAPE1: xdim=4, ydim=1, permute=xy, applied to f4
  90 * VL=16, f4=vec, f0=vec, f8=vec
  91 * FMAC f4, f0, f8, f4
  92
  93 The permutation on SHAPE0 will use f0 as a vec4 source. On the first
  94 four iterations through the hardware loop, the REMAPed index will not
  95 increment. On the second four, the index will increase by one. Likewise
  96 on each subsequent group of four.
  97
  98 The permutation on SHAPE1 will increment f4 continuously cycling through
  99 f4-f7 every four iterations of the hardware loop.
 100
 101 At the same time, VL will, because there is no SHAPE on f8, increment
 102 straight sequentially through the 16 values f8-f23 in the Matrix. The
 103 equivalent sequence thus is issued:
 104
 105 ```
 106     fmac f4, f0, f8, f4
 107     fmac f5, f0, f9, f5
 108     fmac f6, f0, f10, f6
 109     fmac f7, f0, f11, f7
 110     fmac f4, f1, f12, f4
 111     fmac f5, f1, f13, f5
 112     fmac f6, f1, f14, f6
 113     fmac f7, f1, f15, f7
 114     fmac f4, f2, f16, f4
 115     fmac f5, f2, f17, f5
 116     fmac f6, f2, f18, f6
 117     fmac f7, f2, f19, f7
 118     fmac f4, f3, f20, f4
 119     fmac f5, f3, f21, f5
 120     fmac f6, f3, f22, f6
 121     fmac f7, f3, f23, f7
 122 ```
 123
 124 Hardware should easily pipeline the above FMACs and as long as each FMAC
 125 completes in 4 cycles or less there should be 100% sustained throughput,
 126 from the one single Vector FMAC.
 127
 128 The only other instruction required is to ensure that f4-f7 are
 129 initialised (usually to zero) however obviously if used as part
 130 of some other computation, which is frequently the case, then
 131 clearly the zeroing is not needed.
 132
 133 ## REMAP FFT, DFT, NTT
 134
 135 The algorithm from a later section of this Appendix shows how FFT REMAP works,
 136 and it may be executed as a standalone python3 program.
 137 The executable code is designed to illustrate how a hardware
 138 implementation may generate Indices which are completely
 139 independent of the Execution of element-level operations,
 140 even for something as complex as a Triple-loop Tukey-Cooley
 141 Schedule. A comprehensive demo and test suite may be found
 142 [here](https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_fft.py;hb=HEAD)
 143 including Complex Number FFT which deploys Vertical-First Mode
 144 on top of the REMAP Schedules.
 145
 146 Other uses include more than DFT and NTT: as abstracted RISC-paradigm
 147 the Schedules are not
 148 restricted in any way or tied to any particular instruction.
 149 If the programmer can find any algorithm
 150 which has identical triple nesting then the FFT Schedule may be
 151 used even there.
 152
 153 ## svshape pseudocode
 154
 155 ```
 156     # for convenience, VL to be calculated and stored in SVSTATE
 157     vlen <- [0] * 7
 158     mscale[0:5] <- 0b000001 # for scaling MAXVL
 159     itercount[0:6] <- [0] * 7
 160     SVSTATE[0:31] <- [0] * 32
 161     # only overwrite REMAP if "persistence" is zero
 162     if (SVSTATE[62] = 0b0) then
 163         SVSTATE[32:33] <- 0b00
 164         SVSTATE[34:35] <- 0b00
 165         SVSTATE[36:37] <- 0b00
 166         SVSTATE[38:39] <- 0b00
 167         SVSTATE[40:41] <- 0b00
 168         SVSTATE[42:46] <- 0b00000
 169         SVSTATE[62] <- 0b0
 170         SVSTATE[63] <- 0b0
 171     # clear out all SVSHAPEs
 172     SVSHAPE0[0:31] <- [0] * 32
 173     SVSHAPE1[0:31] <- [0] * 32
 174     SVSHAPE2[0:31] <- [0] * 32
 175     SVSHAPE3[0:31] <- [0] * 32
 176
 177     # set schedule up for multiply
 178     if (SVrm = 0b0000) then
 179         # VL in Matrix Multiply is xd*yd*zd
 180         xd <- (0b00 || SVxd) + 1
 181         yd <- (0b00 || SVyd) + 1
 182         zd <- (0b00 || SVzd) + 1
 183         n <- xd * yd * zd
 184         vlen[0:6] <- n[14:20]
 185         # set up template in SVSHAPE0, then copy to 1-3
 186         SVSHAPE0[0:5] <- (0b0 || SVxd)   # xdim
 187         SVSHAPE0[6:11] <- (0b0 || SVyd)   # ydim
 188         SVSHAPE0[12:17] <- (0b0 || SVzd)   # zdim
 189         SVSHAPE0[28:29] <- 0b11           # skip z
 190         # copy
 191         SVSHAPE1[0:31] <- SVSHAPE0[0:31]
 192         SVSHAPE2[0:31] <- SVSHAPE0[0:31]
 193         SVSHAPE3[0:31] <- SVSHAPE0[0:31]
 194         # set up FRA
 195         SVSHAPE1[18:20] <- 0b001          # permute x,z,y
 196         SVSHAPE1[28:29] <- 0b01           # skip z
 197         # FRC
 198         SVSHAPE2[18:20] <- 0b001          # permute x,z,y
 199         SVSHAPE2[28:29] <- 0b11           # skip y
 200
 201     # set schedule up for FFT butterfly
 202     if (SVrm = 0b0001) then
 203         # calculate O(N log2 N)
 204         n <- [0] * 3
 205         do while n < 5
 206            if SVxd[4-n] = 0 then
 207                leave
 208            n <- n + 1
 209         n <- ((0b0 || SVxd) + 1) * n
 210         vlen[0:6] <- n[1:7]
 211         # set up template in SVSHAPE0, then copy to 1-3
 212         # for FRA and FRT
 213         SVSHAPE0[0:5] <- (0b0 || SVxd)   # xdim
 214         SVSHAPE0[12:17] <- (0b0 || SVzd)   # zdim - "striding" (2D FFT)
 215         mscale <- (0b0 || SVzd) + 1
 216         SVSHAPE0[30:31] <- 0b01          # Butterfly mode
 217         # copy
 218         SVSHAPE1[0:31] <- SVSHAPE0[0:31]
 219         SVSHAPE2[0:31] <- SVSHAPE0[0:31]
 220         # set up FRB and FRS
 221         SVSHAPE1[28:29] <- 0b01           # j+halfstep schedule
 222         # FRC (coefficients)
 223         SVSHAPE2[28:29] <- 0b10           # k schedule
 224
 225     # set schedule up for (i)DCT Inner butterfly
 226     # SVrm Mode 4 (Mode 12 for iDCT) is for on-the-fly (Vertical-First Mode)
 227     if ((SVrm = 0b0100) |
 228         (SVrm = 0b1100)) then
 229         # calculate O(N log2 N)
 230         n <- [0] * 3
 231         do while n < 5
 232            if SVxd[4-n] = 0 then
 233                leave
 234            n <- n + 1
 235         n <- ((0b0 || SVxd) + 1) * n
 236         vlen[0:6] <- n[1:7]
 237         # set up template in SVSHAPE0, then copy to 1-3
 238         # set up FRB and FRS
 239         SVSHAPE0[0:5] <- (0b0 || SVxd)   # xdim
 240         SVSHAPE0[12:17] <- (0b0 || SVzd)   # zdim - "striding" (2D DCT)
 241         mscale <- (0b0 || SVzd) + 1
 242         if (SVrm = 0b1100) then
 243             SVSHAPE0[30:31] <- 0b11          # iDCT mode
 244             SVSHAPE0[18:20] <- 0b011         # iDCT Inner Butterfly sub-mode
 245         else
 246             SVSHAPE0[30:31] <- 0b01          # DCT mode
 247             SVSHAPE0[18:20] <- 0b001         # DCT Inner Butterfly sub-mode
 248             SVSHAPE0[21:23] <- 0b001         # "inverse" on outer loop
 249         SVSHAPE0[6:11] <- 0b000011       # (i)DCT Inner Butterfly mode 4
 250         # copy
 251         SVSHAPE1[0:31] <- SVSHAPE0[0:31]
 252         SVSHAPE2[0:31] <- SVSHAPE0[0:31]
 253         if (SVrm != 0b0100) & (SVrm != 0b1100) then
 254             SVSHAPE3[0:31] <- SVSHAPE0[0:31]
 255         # for FRA and FRT
 256         SVSHAPE0[28:29] <- 0b01           # j+halfstep schedule
 257         # for cos coefficient
 258         SVSHAPE2[28:29] <- 0b10           # ci (k for mode 4) schedule
 259         SVSHAPE2[12:17] <- 0b000000       # reset costable "striding" to 1
 260         if (SVrm != 0b0100) & (SVrm != 0b1100) then
 261             SVSHAPE3[28:29] <- 0b11           # size schedule
 262
 263     # set schedule up for (i)DCT Outer butterfly
 264     if (SVrm = 0b0011) | (SVrm = 0b1011) then
 265         # calculate O(N log2 N) number of outer butterfly overlapping adds
 266         vlen[0:6] <- [0] * 7
 267         n <- 0b000
 268         size <- 0b0000001
 269         itercount[0:6] <- (0b00 || SVxd) + 0b0000001
 270         itercount[0:6] <- (0b0 || itercount[0:5])
 271         do while n < 5
 272            if SVxd[4-n] = 0 then
 273                leave
 274            n <- n + 1
 275            count <- (itercount - 0b0000001) * size
 276            vlen[0:6] <- vlen + count[7:13]
 277            size[0:6] <- (size[1:6] || 0b0)
 278            itercount[0:6] <- (0b0 || itercount[0:5])
 279         # set up template in SVSHAPE0, then copy to 1-3
 280         # set up FRB and FRS
 281         SVSHAPE0[0:5] <- (0b0 || SVxd)   # xdim
 282         SVSHAPE0[12:17] <- (0b0 || SVzd)   # zdim - "striding" (2D DCT)
 283         mscale <- (0b0 || SVzd) + 1
 284         if (SVrm = 0b1011) then
 285             SVSHAPE0[30:31] <- 0b11      # iDCT mode
 286             SVSHAPE0[18:20] <- 0b011     # iDCT Outer Butterfly sub-mode
 287             SVSHAPE0[21:23] <- 0b101     # "inverse" on outer and inner loop
 288         else
 289             SVSHAPE0[30:31] <- 0b01      # DCT mode
 290             SVSHAPE0[18:20] <- 0b100     # DCT Outer Butterfly sub-mode
 291         SVSHAPE0[6:11] <- 0b000010       # DCT Butterfly mode
 292         # copy
 293         SVSHAPE1[0:31] <- SVSHAPE0[0:31] # j+halfstep schedule
 294         SVSHAPE2[0:31] <- SVSHAPE0[0:31] # costable coefficients
 295         # for FRA and FRT
 296         SVSHAPE1[28:29] <- 0b01           # j+halfstep schedule
 297         # reset costable "striding" to 1
 298         SVSHAPE2[12:17] <- 0b000000
 299
 300     # set schedule up for DCT COS table generation
 301     if (SVrm = 0b0101) | (SVrm = 0b1101) then
 302         # calculate O(N log2 N)
 303         vlen[0:6] <- [0] * 7
 304         itercount[0:6] <- (0b00 || SVxd) + 0b0000001
 305         itercount[0:6] <- (0b0 || itercount[0:5])
 306         n <- [0] * 3
 307         do while n < 5
 308            if SVxd[4-n] = 0 then
 309                leave
 310            n <- n + 1
 311            vlen[0:6] <- vlen + itercount
 312            itercount[0:6] <- (0b0 || itercount[0:5])
 313         # set up template in SVSHAPE0, then copy to 1-3
 314         # set up FRB and FRS
 315         SVSHAPE0[0:5] <- (0b0 || SVxd)   # xdim
 316         SVSHAPE0[12:17] <- (0b0 || SVzd)   # zdim - "striding" (2D DCT)
 317         mscale <- (0b0 || SVzd) + 1
 318         SVSHAPE0[30:31] <- 0b01          # DCT/FFT mode
 319         SVSHAPE0[6:11] <- 0b000100       # DCT Inner Butterfly COS-gen mode
 320         if (SVrm = 0b0101) then
 321             SVSHAPE0[21:23] <- 0b001     # "inverse" on outer loop for DCT
 322         # copy
 323         SVSHAPE1[0:31] <- SVSHAPE0[0:31]
 324         SVSHAPE2[0:31] <- SVSHAPE0[0:31]
 325         # for cos coefficient
 326         SVSHAPE1[28:29] <- 0b10           # ci schedule
 327         SVSHAPE2[28:29] <- 0b11           # size schedule
 328
 329     # set schedule up for iDCT / DCT inverse of half-swapped ordering
 330     if (SVrm = 0b0110) | (SVrm = 0b1110) | (SVrm = 0b1111) then
 331         vlen[0:6] <- (0b00 || SVxd) + 0b0000001
 332         # set up template in SVSHAPE0
 333         SVSHAPE0[0:5] <- (0b0 || SVxd)   # xdim
 334         SVSHAPE0[12:17] <- (0b0 || SVzd)   # zdim - "striding" (2D DCT)
 335         mscale <- (0b0 || SVzd) + 1
 336         if (SVrm = 0b1110) then
 337             SVSHAPE0[18:20] <- 0b001     # DCT opposite half-swap
 338         if (SVrm = 0b1111) then
 339             SVSHAPE0[30:31] <- 0b01          # FFT mode
 340         else
 341             SVSHAPE0[30:31] <- 0b11          # DCT mode
 342         SVSHAPE0[6:11] <- 0b000101       # DCT "half-swap" mode
 343
 344     # set schedule up for parallel reduction
 345     if (SVrm = 0b0111) then
 346         # calculate the total number of operations (brute-force)
 347         vlen[0:6] <- [0] * 7
 348         itercount[0:6] <- (0b00 || SVxd) + 0b0000001
 349         step[0:6] <- 0b0000001
 350         i[0:6] <- 0b0000000
 351         do while step <u itercount
 352             newstep <- step[1:6] || 0b0
 353             j[0:6] <- 0b0000000
 354             do while (j+step <u itercount)
 355                 j <- j + newstep
 356                 i <- i + 1
 357             step <- newstep
 358         # VL in Parallel-Reduce is the number of operations
 359         vlen[0:6] <- i
 360         # set up template in SVSHAPE0, then copy to 1. only 2 needed
 361         SVSHAPE0[0:5] <- (0b0 || SVxd)   # xdim
 362         SVSHAPE0[12:17] <- (0b0 || SVzd)   # zdim - "striding" (2D DCT)
 363         mscale <- (0b0 || SVzd) + 1
 364         SVSHAPE0[30:31] <- 0b10          # parallel reduce submode
 365         # copy
 366         SVSHAPE1[0:31] <- SVSHAPE0[0:31]
 367         # set up right operand (left operand 28:29 is zero)
 368         SVSHAPE1[28:29] <- 0b01           # right operand
 369
 370     # set VL, MVL and Vertical-First
 371     m[0:12] <- vlen * mscale
 372     maxvl[0:6] <- m[6:12]
 373     SVSTATE[0:6] <- maxvl  # MAVXL
 374     SVSTATE[7:13] <- vlen  # VL
 375     SVSTATE[63] <- vf
 376 ```
 377
 378 ## svindex pseudocode
 379
 380 ```
 381     # based on nearest MAXVL compute other dimension
 382     MVL <- SVSTATE[0:6]
 383     d <- [0] * 6
 384     dim <- SVd+1
 385     do while d*dim <u ([0]*4 || MVL)
 386        d <- d + 1
 387
 388     # set up template, then copy once location identified
 389     shape <- [0]*32
 390     shape[30:31] <- 0b00            # mode
 391     if SVyx = 0 then
 392         shape[18:20] <- 0b110       # indexed xd/yd
 393         shape[0:5] <- (0b0 || SVd)  # xdim
 394         if sk = 0 then shape[6:11] <- 0 # ydim
 395         else           shape[6:11] <- 0b111111 # ydim max
 396     else
 397         shape[18:20] <- 0b111       # indexed yd/xd
 398         if sk = 1 then shape[6:11] <- 0 # ydim
 399         else           shape[6:11] <- d-1 # ydim max
 400         shape[0:5] <- (0b0 || SVd) # ydim
 401     shape[12:17] <- (0b0 || SVG)        # SVGPR
 402     shape[28:29] <- ew                  # element-width override
 403     shape[21] <- sk                     # skip 1st dimension
 404
 405     # select the mode for updating SVSHAPEs
 406     SVSTATE[62] <- mm # set or clear persistence
 407     if mm = 0 then
 408         # clear out all SVSHAPEs first
 409         SVSHAPE0[0:31] <- [0] * 32
 410         SVSHAPE1[0:31] <- [0] * 32
 411         SVSHAPE2[0:31] <- [0] * 32
 412         SVSHAPE3[0:31] <- [0] * 32
 413         SVSTATE[32:41] <- [0] * 10 # clear REMAP.mi/o
 414         SVSTATE[42:46] <- rmm # rmm exactly REMAP.SVme
 415         idx <- 0
 416         for bit = 0 to 4
 417             if rmm[4-bit] then
 418                 # activate requested shape
 419                 if idx = 0 then SVSHAPE0 <- shape
 420                 if idx = 1 then SVSHAPE1 <- shape
 421                 if idx = 2 then SVSHAPE2 <- shape
 422                 if idx = 3 then SVSHAPE3 <- shape
 423                 SVSTATE[bit*2+32:bit*2+33] <- idx
 424                 # increment shape index, modulo 4
 425                 if idx = 3 then idx <- 0
 426                 else            idx <- idx + 1
 427     else
 428         # refined SVSHAPE/REMAP update mode
 429         bit <- rmm[0:2]
 430         idx <- rmm[3:4]
 431         if idx = 0 then SVSHAPE0 <- shape
 432         if idx = 1 then SVSHAPE1 <- shape
 433         if idx = 2 then SVSHAPE2 <- shape
 434         if idx = 3 then SVSHAPE3 <- shape
 435         SVSTATE[bit*2+32:bit*2+33] <- idx
 436         SVSTATE[46-bit] <- 1
 437 ```
 438
 439 ## svshape2 pseudocode
 440
 441 ```
 442     # based on nearest MAXVL compute other dimension
 443     MVL <- SVSTATE[0:6]
 444     d <- [0] * 6
 445     dim <- SVd+1
 446     do while d*dim <u ([0]*4 || MVL)
 447        d <- d + 1
 448     # set up template, then copy once location identified
 449     shape <- [0]*32
 450     shape[30:31] <- 0b00            # mode
 451     shape[0:5] <- (0b0 || SVd)      # x/ydim
 452     if SVyx = 0 then
 453         shape[18:20] <- 0b000       # ordering xd/yd(/zd)
 454         if sk = 0 then shape[6:11] <- 0 # ydim
 455         else           shape[6:11] <- 0b111111 # ydim max
 456     else
 457         shape[18:20] <- 0b010       # ordering yd/xd(/zd)
 458         if sk = 1 then shape[6:11] <- 0 # ydim
 459         else           shape[6:11] <- d-1 # ydim max
 460     # offset (the prime purpose of this instruction)
 461     shape[24:27] <- SVo         # offset
 462     if sk = 1 then shape[28:29] <- 0b01 # skip 1st dimension
 463     else           shape[28:29] <- 0b00 # no skipping
 464     # select the mode for updating SVSHAPEs
 465     SVSTATE[62] <- mm # set or clear persistence
 466     if mm = 0 then
 467         # clear out all SVSHAPEs first
 468         SVSHAPE0[0:31] <- [0] * 32
 469         SVSHAPE1[0:31] <- [0] * 32
 470         SVSHAPE2[0:31] <- [0] * 32
 471         SVSHAPE3[0:31] <- [0] * 32
 472         SVSTATE[32:41] <- [0] * 10 # clear REMAP.mi/o
 473         SVSTATE[42:46] <- rmm # rmm exactly REMAP.SVme
 474         idx <- 0
 475         for bit = 0 to 4
 476             if rmm[4-bit] then
 477                 # activate requested shape
 478                 if idx = 0 then SVSHAPE0 <- shape
 479                 if idx = 1 then SVSHAPE1 <- shape
 480                 if idx = 2 then SVSHAPE2 <- shape
 481                 if idx = 3 then SVSHAPE3 <- shape
 482                 SVSTATE[bit*2+32:bit*2+33] <- idx
 483                 # increment shape index, modulo 4
 484                 if idx = 3 then idx <- 0
 485                 else            idx <- idx + 1
 486     else
 487         # refined SVSHAPE/REMAP update mode
 488         bit <- rmm[0:2]
 489         idx <- rmm[3:4]
 490         if idx = 0 then SVSHAPE0 <- shape
 491         if idx = 1 then SVSHAPE1 <- shape
 492         if idx = 2 then SVSHAPE2 <- shape
 493         if idx = 3 then SVSHAPE3 <- shape
 494         SVSTATE[bit*2+32:bit*2+33] <- idx
 495         SVSTATE[46-bit] <- 1
 496 ```
 497
 498 [[!tag standards]]
 499
 500 ---------
 501
 502 \newpage{}
 503