# RFC ls009 Simple-V REMAP Subsystem

* <https://libre-soc.org/openpower/sv/rfc/ls009/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1042>
* <https://git.openpower.foundation/isa/PowerISA/issues/124>
**Books and Sections affected**:

    Book I, new Zero-Overhead-Loop Chapter.
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

    svremap - Re-Mapping of Register Element Offsets
    svindex - General-purpose setting of SHAPEs to be re-mapped
    svshape - Hardware-level setting of SHAPEs for element re-mapping
    svshape2 - Hardware-level setting of SHAPEs for element re-mapping (v2)
**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

Addition of four new "Zero-Overhead-Loop-Control" DSP-style Vector-style
Management Instructions which provide advanced features such as Matrix,
FFT and DCT Hardware-Assist Schedules and general-purpose Index reordering.
**Impact on software**:

Requires support for new instructions in assembler, debuggers,

Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control (ZOLC),
Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model,
Digital Signal Processing (DSP)

These REMAP Management instructions provide state-of-the-art advanced capabilities
that dramatically decrease instruction count and power consumption whilst retaining
unprecedented general-purpose capability and a standard Sequential Execution Model.

**Notes and Observations**:

Add the following entries to:

* the Appendices of SV Book
* Instructions of SV Book as a new Section
* SVI, SVM, SVM2, SVRM Forms of Book I Sections 1.6.1.6 and 1.6.2

[[!inline pages="openpower/sv/remap" raw=yes ]]

Add `SVI, SVM, SVM2, SVRM` to `XO (26:31)` Field in Book I, 1.6.2

Add the following to Book I, 1.6.1, SVI-Form

    |0   |6   |11  |16  |21 |23  |24|25|26  31|
    | PO | SVG|rmm | SVd|ew |SVyx|mm|sk|  XO  |

Add the following to Book I, 1.6.1, SVM-Form

    |0   |6   |11  |16  |21  |25 |26  |31 |
    | PO |SVxd|SVyd|SVzd|SVrm|vf |   XO   |

Add the following to Book I, 1.6.1, SVM2-Form

    |0   |6   |10  |11  |16  |21 |24|25 |26  |31 |
    | PO |SVo |SVyx|rmm |SVd |XO |mm|sk |   XO   |

Add the following to Book I, 1.6.1, SVRM-Form

    |0   |6   |11 |13 |15 |17 |19 |21 |22 |26  |31 |
    | PO |SVme|mi0|mi1|mi2|mo0|mo1|pst|///|   XO   |

Add the following to Book I, 1.6.2
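
As an illustration of how the SVRM-Form layout above maps onto a 32-bit instruction word, the following sketch extracts each field using Power ISA MSB0 bit numbering (bit 0 is the most significant bit). The helper names `get_field`, `decode_svrm` and `encode_svrm` are hypothetical, and the field values used in the example are arbitrary placeholders, not allocated opcodes.

```python
# Field boundaries (first bit, last bit) read off the SVRM-Form layout.
# MSB0 numbering: bit 0 is the most significant bit of the 32-bit word.
# Bits 22:25 are reserved (///) and are left as zero by this sketch.
SVRM_FIELDS = {
    "PO":   (0, 5),
    "SVme": (6, 10),
    "mi0":  (11, 12),
    "mi1":  (13, 14),
    "mi2":  (15, 16),
    "mo0":  (17, 18),
    "mo1":  (19, 20),
    "pst":  (21, 21),
    "XO":   (26, 31),
}

def get_field(word, start, end):
    """Extract bits start..end (inclusive, MSB0) from a 32-bit word."""
    width = end - start + 1
    return (word >> (31 - end)) & ((1 << width) - 1)

def decode_svrm(word):
    """Split a 32-bit word into its SVRM-Form fields."""
    return {name: get_field(word, s, e) for name, (s, e) in SVRM_FIELDS.items()}

def encode_svrm(**values):
    """Pack named field values into a 32-bit word (inverse of decode_svrm)."""
    word = 0
    for name, value in values.items():
        start, end = SVRM_FIELDS[name]
        width = end - start + 1
        if not 0 <= value < (1 << width):
            raise ValueError("%s does not fit in %d bits" % (name, width))
        word |= value << (31 - end)
    return word

# placeholder values only: PO and XO here are NOT the real opcode numbers
example = dict(PO=1, SVme=0b10101, mi0=2, mi1=1, mi2=0,
               mo0=3, mo1=0, pst=1, XO=2)
word = encode_svrm(**example)
print(hex(word), decode_svrm(word) == example)
```

The same approach applies to the SVI, SVM and SVM2 Forms by swapping in their field boundaries.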

Field used in REMAP to select the SVSHAPE for 1st input register

Field used in REMAP to select the SVSHAPE for 2nd input register

Field used in REMAP to select the SVSHAPE for 3rd input register

Field used to specify the meaning of the rmm field for SVI-Form

Field used in REMAP to select the SVSHAPE for 1st output register

Field used in REMAP to select the SVSHAPE for 2nd output register

Field used in REMAP to indicate "persistence" mode (REMAP
continues to apply to multiple instructions)

REMAP Mode field for SVI-Form and SVM2-Form

Field used to specify dimensional skipping in svindex

Immediate field used to specify the size of the REMAP dimension
in the svindex and svshape2 instructions

Immediate field used to specify a 9-bit signed
two's complement integer which is concatenated
on the right with 0b00 and sign-extended to 64 bits.

Field used to specify a GPR to be used as a

Simple-V immediate field for setting VL or MVL

Simple-V "REMAP" map-enable bits (0-4)

Field used by the svshape2 instruction as an offset

Simple-V "REMAP" Mode

Simple-V "REMAP" x-dimension size

Simple-V "REMAP" y-dimension size

Simple-V "REMAP" z-dimension size

Extended opcode field. Note that bit 21 must be 1, 22 and 23
must be zero, and bits 26-31 must be exactly the same as

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| SVRM | I    | #    | 3.0B    | svremap  | REMAP enabling instruction |
| SVM  | I    | #    | 3.0B    | svshape  | REMAP shape instruction |
| SVM2 | I    | #    | 3.0B    | svshape2 | REMAP shape instruction (2) |
| SVI  | I    | #    | 3.0B    | svindex  | REMAP General-purpose Indexing |

[[!inline pages="openpower/sv/remap/appendix" raw=yes ]]

The following stand-alone executable source code, written in python3, is the Canonical
Specification for each REMAP. Vectors of "loopends" are returned when Rc=1:
in Vectors of CR Fields on `sv.svstep.`, or, in Vertical-First Mode,
in a single CR Field (CR0) on `svstep.`. The `SVSTATE.srcstep` or `SVSTATE.dststep` sequential
offset is put through each algorithm to determine the actual Element Offset.
Alternative implementations producing different ordering
are prohibited, as software will critically rely on these Deterministic Schedules.

### REMAP 2D/3D Matrix

The following stand-alone executable source code is the Canonical
Specification for Matrix (2D/3D) REMAP.
Hardware implementations are achievable with simple cascading counter-and-compares.
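
To make the "cascading counter-and-compare" remark concrete, here is a minimal two-dimensional sketch. It is not the Canonical Specification: the function name `matrix_schedule_2d`, the `transpose` flag, and the two-bit `loopends` value (bit 0 for end of the inner loop, bit 1 for end of both loops) are simplifying assumptions made for illustration only.

```python
def matrix_schedule_2d(xdim, ydim, transpose=False):
    """Yield (element_offset, loopends) using two cascading counters.

    Simplified illustration only: no offset, inversion, skip or
    third dimension, all of which the full specification supports.
    """
    x = y = 0
    while True:
        # the "compares": has each counter reached its limit?
        x_end = x == xdim - 1
        y_end = y == ydim - 1
        # row-major offset, or column-major when transposed
        offset = x * ydim + y if transpose else y * xdim + x
        loopends = int(x_end) | (int(x_end and y_end) << 1)
        yield offset, loopends
        # the "cascade": the inner counter wraps and carries into the outer
        if x_end:
            x = 0
            if y_end:
                return
            y += 1
        else:
            x += 1

print([o for o, _ in matrix_schedule_2d(3, 2)])                  # [0, 1, 2, 3, 4, 5]
print([o for o, _ in matrix_schedule_2d(3, 2, transpose=True)])  # [0, 2, 4, 1, 3, 5]
print([e for _, e in matrix_schedule_2d(3, 2)])                  # [0, 0, 1, 0, 0, 3]
```

Each step needs only an equality compare and an increment-or-reset per dimension, which is why no multiplier-heavy address generation is required for the loop control itself.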
227 # python "yield" can be iterated. use this to make it clear how
228 # the indices are generated by using natural-looking nested loops
229 def iterate_indices(SVSHAPE):
230 # get indices to iterate over, in the required order
234 # create lists of indices to iterate over in each dimension
235 x_r = list(range(xd))
236 y_r = list(range(yd))
237 z_r = list(range(zd))
238 # invert the indices if needed
239 if SVSHAPE.invxyz[0]: x_r.reverse()
240 if SVSHAPE.invxyz[1]: y_r.reverse()
241 if SVSHAPE.invxyz[2]: z_r.reverse()
242 # start an infinite (wrapping) loop
243 step = 0 # track src/dst step
245 for z in z_r: # loop over 1st order dimension
247 for y in y_r: # loop over 2nd order dimension
249 for x in x_r: # loop over 3rd order dimension
251 # ok work out which order to construct things in.
252 # start by creating a list of tuples of the dimension
254 vals = [(SVSHAPE.lims[0], x, "x"),
255 (SVSHAPE.lims[1], y, "y"),
256 (SVSHAPE.lims[2], z, "z")
258 # now select those by order. this allows us to
259 # create schedules for [z][x], [x][y], or [y][z]
260 # for matrix multiply.
261 vals = [vals[SVSHAPE.order[0]],
262 vals[SVSHAPE.order[1]],
263 vals[SVSHAPE.order[2]]
265 # ok now we can construct the result, using bits of
266 # "order" to say which ones get stacked on
270 lim, idx, dbg = vals[i]
271 # some of the dimensions can be "skipped". the order
272 # was actually selected above on all 3 dimensions,
273 # e.g. [z][x][y] or [y][z][x]. "skip" allows one of
274 # those to be knocked out
275 if SVSHAPE.skip == i+1: continue
276 idx *= mult # shifts up by previous dimension(s)
277 result += idx # adds on this dimension
278 mult *= lim # for the next dimension
281 ((y_end and x_end)<<1) |
282 ((y_end and x_end and z_end)<<2))
284 yield result + SVSHAPE.offset, loopends
288 # set the dimension sizes here
293 # set total (can repeat, e.g. VL=x*y*z*4)
294 VL = xdim * ydim * zdim
300 SVSHAPE0.lims = [xdim, ydim, zdim]
301 SVSHAPE0.order = [1,0,2] # experiment with different permutations, here
304 SVSHAPE0.offset = 0 # experiment with different offset, here
305 SVSHAPE0.invxyz = [0,0,0] # inversion if desired
307 # enumerate over the iterator function, getting new indices
308 for idx, (new_idx, end) in enumerate(iterate_indices(SVSHAPE0)):
311 print ("%d->%d" % (idx, new_idx), "end", bin(end)[2:])
314 if __name__ == '__main__':

### REMAP Parallel Reduction pseudocode

The python3 program below is stand-alone executable and is the Canonical Specification
for Parallel Reduction REMAP.
The Algorithm below is not limited to RADIX2 sizes, and Predicate
sources, unlike in Matrix REMAP, apply to the Element Indices **after** REMAP
has been applied, not before. MV operations are not required: the algorithm
tracks the positions of elements that would normally be moved and, when applying
an Element Reduction Operation, sources the operands from their last-known (tracked)
position.
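
The tracking idea can be sketched in a few lines. The function below is a simplified illustration, not the Canonical Specification: the name `reduction_pairs` is hypothetical, it returns plain `(left, right)` index pairs with no loopends, offset or inversion, and the way a masked-out left operand is redirected (`ix[i] = oi`) is an assumption about how such tracking can be done.

```python
def reduction_pairs(n, pred=None):
    """Return (left, right) element pairs for an in-place tree reduction.

    n    : number of elements (need not be a power of two)
    pred : optional list of 0/1 mask bits, applied to element indices
    """
    ix = list(range(n))  # ix[i]: where the value for slot i currently lives
    pairs = []
    step = 1
    while step < n:
        step *= 2
        for i in range(0, n, step):
            other = i + step // 2
            if other >= n:
                continue  # no partner at this level (non-power-of-two size)
            ci, oi = ix[i], ix[other]
            left_ok = pred is None or pred[ci]
            right_ok = pred is None or pred[oi]
            if left_ok and right_ok:
                pairs.append((ci, oi))  # reduce: vec[ci] = op(vec[ci], vec[oi])
            elif right_ok:
                ix[i] = oi  # no MV issued: just remember where the value is
    return pairs

# unpredicated, 5 elements: the result accumulates in element 0
vec = [1, 2, 3, 4, 5]
for left, right in reduction_pairs(5):
    vec[left] += vec[right]
print(vec[0])  # 15

# element 0 masked out: the sum of elements 1..3 accumulates in element 1
vec = [10, 1, 2, 3]
for left, right in reduction_pairs(4, pred=[0, 1, 1, 1]):
    vec[left] += vec[right]
print(vec[1])  # 6
```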
330 # a "yield" version of the Parallel Reduction REMAP algorithm.
331 # the algorithm is in-place. it does not perform "MV" operations.
332 # instead, where a masked-out value *should* be read from is tracked
334 def iterate_indices(SVSHAPE, pred=None):
335 # get indices to iterate over, in the required order
337 # create lists of indices to iterate over in each dimension
339 # invert the indices if needed
340 if SVSHAPE.invxyz[0]: ix.reverse()
341 # start a loop from the lowest step
347 # invert the indices if needed
348 if SVSHAPE.invxyz[1]: steps.reverse()
350 stepend = (step == steps[-1]) # note end of steps
351 idxs = list(range(0, xd, step))
354 other = i + step // 2
356 oi = ix[other] if other < xd else None
357 other_pred = other < xd and (pred is None or pred[oi])
358 if (pred is None or pred[ci]) and other_pred:
359 if SVSHAPE.skip == 0b00: # submode 00
361 elif SVSHAPE.skip == 0b01: # submode 01
363 results.append([result + SVSHAPE.offset, 0])
367 results[-1][1] = (stepend<<1) | 1 # notify end of loops
371 # set the dimension sizes here
378 SVSHAPE0.lims = [xdim, 0, 0]
379 SVSHAPE0.order = [0,1,2]
382 SVSHAPE0.offset = 0 # experiment with different offset, here
383 SVSHAPE0.invxyz = [0,0,0] # inversion if desired
386 SVSHAPE1.lims = [xdim, 0, 0]
387 SVSHAPE1.order = [0,1,2]
390 SVSHAPE1.offset = 0 # experiment with different offset, here
391 SVSHAPE1.invxyz = [0,0,0] # inversion if desired
393 # enumerate over the iterator function, getting new indices
394 shapes = list(iterate_indices(SVSHAPE0)), \
395 list(iterate_indices(SVSHAPE1))
396 for idx in range(len(shapes[0])):
401 print ("%d->%d:%d" % (idx, l_idx, r_idx),
402 "end", bin(lend)[2:], bin(rend)[2:])
405 if __name__ == '__main__':

### REMAP FFT pseudocode

The FFT REMAP is RADIX2 only.
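
As a compact illustration of what a radix-2 butterfly schedule contains, the sketch below lists the `(jl, jh, k)` index triples of an iterative decimation-in-time loop nest. It is a simplification, not the Canonical Specification: the name `butterfly_schedule` is hypothetical, stride, offset, inversion and loopends are omitted, and `k` is assumed to advance by `tablestep` on each inner iteration.

```python
def butterfly_schedule(n):
    """Return the (jl, jh, k) triples of a radix-2 butterfly loop nest.

    jl, jh : lower and upper data indices of each butterfly
    k      : index into the twiddle (exponent) table
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two (RADIX2 only)")
    triples = []
    size = 2
    while size <= n:
        halfsize = size // 2
        tablestep = n // size
        for i in range(0, n, size):
            k = 0
            for j in range(i, i + halfsize):
                triples.append((j, j + halfsize, k))
                k += tablestep  # assumption: table index steps by tablestep
        size *= 2
    return triples

print(butterfly_schedule(4))
# [(0, 1, 0), (2, 3, 0), (0, 2, 0), (1, 3, 1)]
print(len(butterfly_schedule(8)))  # 12
```

In this sketch each of the log2(n) stages contributes n/2 butterflies, so the schedule has (n/2) * log2(n) entries. The three columns correspond to what three separately configured SVSHAPEs would each deliver: one for `jl`, one for `jh`, and one for `k`.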
# a "yield" version of the REMAP algorithm, for FFT Cooley-Tukey schedules
# original code for the FFT Cooley-Tukey schedule:
# https://www.nayuki.io/res/free-small-fft-in-multiple-languages/fft.py
418 # Radix-2 decimation-in-time FFT (real, not complex)
422 tablestep = n // size
423 for i in range(0, n, size):
425 for j in range(i, i + halfsize):
428 temp1 = vec[jh] * exptable[k]
430 vec[jh] = temp2 - temp1
431 vec[jl] = temp2 + temp1
436 # python "yield" can be iterated. use this to make it clear how
437 # the indices are generated by using natural-looking nested loops
438 def iterate_butterfly_indices(SVSHAPE):
439 # get indices to iterate over, in the required order
441 stride = SVSHAPE.lims[2] # stride-multiplier on reg access
442 # creating lists of indices to iterate over in each dimension
443 # has to be done dynamically, because it depends on the size
444 # first, the size-based loop (which can be done statically)
450 # invert order if requested
451 if SVSHAPE.invxyz[0]: x_r.reverse()
456 # start an infinite (wrapping) loop
459 for size in x_r: # loop over 3rd order dimension (size)
460 x_end = size == x_r[-1]
461 # y_r schedule depends on size
463 tablestep = n // size
465 for i in range(0, n, size):
467 # invert if requested
468 if SVSHAPE.invxyz[1]: y_r.reverse()
469 for i in y_r: # loop over 2nd order dimension
474 for j in range(i, i+halfsize):
478 # invert if requested
479 if SVSHAPE.invxyz[2]: k_r.reverse()
480 if SVSHAPE.invxyz[2]: j_r.reverse()
481 for j, k in zip(j_r, k_r): # loop over 1st order dimension
483 # now depending on MODE return the index
484 if SVSHAPE.skip == 0b00:
485 result = j # for vec[j]
486 elif SVSHAPE.skip == 0b01:
487 result = j + halfsize # for vec[j+halfsize]
488 elif SVSHAPE.skip == 0b10:
489 result = k # for exptable[k]
492 ((y_end and z_end)<<1) |
493 ((y_end and x_end and z_end)<<2))
495 yield (result * stride) + SVSHAPE.offset, loopends
498 # set the dimension sizes here
500 ydim = 0 # not needed
501 zdim = 1 # stride must be set to 1
503 # set total. err don't know how to calculate how many there are...
504 # do it manually for now
510 tablestep = n // size
511 for i in range(0, n, size):
512 for j in range(i, i + halfsize):
521 SVSHAPE0.lims = [xdim, ydim, zdim]
522 SVSHAPE0.order = [0,1,2] # experiment with different permutations, here
525 SVSHAPE0.offset = 0 # experiment with different offset, here
526 SVSHAPE0.invxyz = [0,0,0] # inversion if desired
527 # j+halfstep schedule
529 SVSHAPE1.lims = [xdim, ydim, zdim]
530 SVSHAPE1.order = [0,1,2] # experiment with different permutations, here
533 SVSHAPE1.offset = 0 # experiment with different offset, here
534 SVSHAPE1.invxyz = [0,0,0] # inversion if desired
537 SVSHAPE2.lims = [xdim, ydim, zdim]
538 SVSHAPE2.order = [0,1,2] # experiment with different permutations, here
541 SVSHAPE2.offset = 0 # experiment with different offset, here
542 SVSHAPE2.invxyz = [0,0,0] # inversion if desired
544 # enumerate over the iterator function, getting new indices
546 for idx, (jl, jh, k) in enumerate(zip(iterate_butterfly_indices(SVSHAPE0),
547 iterate_butterfly_indices(SVSHAPE1),
548 iterate_butterfly_indices(SVSHAPE2))):
551 schedule.append((jl, jh, k))
553 # ok now pretty-print the results, with some debug output
558 tablestep = n // size
559 print ("size %d halfsize %d tablestep %d" % \
560 (size, halfsize, tablestep))
561 for i in range(0, n, size):
562 prefix = "i %d\t" % i
564 for j in range(i, i + halfsize):
565 (jl, je), (jh, he), (ks, ke) = schedule[idx]
566 print (" %-3d\t%s j=%-2d jh=%-2d k=%-2d -> "
567 "j[jl=%-2d] j[jh=%-2d] ex[k=%d]" % \
568 (idx, prefix, j, j+halfsize, k,
"end", bin(je)[2:], bin(he)[2:], bin(ke)[2:])
577 if __name__ == '__main__':

DCT REMAP is RADIX2 only. Convolutions may be applied as usual
to create non-RADIX2 DCT. Combined with appropriate Twin-butterfly
instructions, the algorithm below (written in python3) becomes part
of an in-place in-registers Vectorised DCT. The algorithms work
by loading data such that, as the nested loops progress, the result
is sorted into correct sequential order.
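
Two index permutations underpin the DCT schedules: bit-reversal and the Gray-code reordering that the iterative half-reversing turns out to be. The sketch below shows both on an 8-element index list. The helper `gray_permute` is a hypothetical name used only here, and this is an illustration of the permutations themselves, not of the full load/store or butterfly schedules.

```python
def reverse_bits(val, width):
    """Reverse the lowest 'width' bits of the integer 'val'."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (val & 1)
        val >>= 1
    return result

def gray_permute(vec):
    """Reorder vec so position i takes the element at index i ^ (i >> 1).

    Assumes len(vec) is a power of two, so every Gray-coded index is in range.
    """
    return [vec[i ^ (i >> 1)] for i in range(len(vec))]

n = 8
levels = n.bit_length() - 1  # number of index bits: 3 for n = 8
bitrev = [reverse_bits(i, levels) for i in range(n)]
gray = gray_permute(list(range(n)))
print(bitrev)  # [0, 4, 2, 6, 1, 5, 3, 7]
print(gray)    # [0, 1, 3, 2, 6, 7, 5, 4]
```

Because both are pure index permutations, they can be applied to the list of element offsets rather than to the data, which is what lets the data stay in place in registers.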
591 # DCT "REMAP" scheduler to create an in-place iterative DCT.
594 # bits of the integer 'val' of width 'width' are reversed
595 def reverse_bits(val, width):
597 for _ in range(width):
598 result = (result << 1) | (val & 1)
603 # iterative version of [recursively-applied] half-reversing
604 # turns out this is Gray-Encoding.
605 def halfrev2(vec, pre_rev=True):
607 for i in range(len(vec)):
609 res.append(vec[i ^ (i>>1)])
613 for ji in range(1, bl):
619 def iterate_dct_inner_halfswap_loadstore(SVSHAPE):
620 # get indices to iterate over, in the required order
622 mode = SVSHAPE.lims[1]
623 stride = SVSHAPE.lims[2]
625 # reference list for not needing to do data-swaps, just swap what
626 # *indices* are referenced (two levels of indirection at the moment)
627 # pre-reverse the data-swap list so that it *ends up* in the order 0123..
630 levels = n.bit_length() - 1
631 ri = [reverse_bits(i, levels) for i in range(n)]
633 if SVSHAPE.mode == 0b01: # FFT, bitrev only
634 ji = [ji[ri[i]] for i in range(n)]
635 elif SVSHAPE.submode2 == 0b001:
636 ji = [ji[ri[i]] for i in range(n)]
637 ji = halfrev2(ji, True)
639 ji = halfrev2(ji, False)
640 ji = [ji[ri[i]] for i in range(n)]
642 # invert order if requested
643 if SVSHAPE.invxyz[0]:
646 for i, jl in enumerate(ji):
648 yield jl * stride, (0b111 if y_end else 0b000)
650 def iterate_dct_inner_costable_indices(SVSHAPE):
651 # get indices to iterate over, in the required order
653 mode = SVSHAPE.lims[1]
654 stride = SVSHAPE.lims[2]
655 # creating lists of indices to iterate over in each dimension
656 # has to be done dynamically, because it depends on the size
657 # first, the size-based loop (which can be done statically)
663 # invert order if requested
664 if SVSHAPE.invxyz[0]:
670 # start an infinite (wrapping) loop
672 z_end = 1 # doesn't exist in this, only 2 loops
675 for size in x_r: # loop over 3rd order dimension (size)
676 x_end = size == x_r[-1]
677 # y_r schedule depends on size
680 for i in range(0, n, size):
682 # invert if requested
683 if SVSHAPE.invxyz[1]: y_r.reverse()
684 # two lists of half-range indices, e.g. j 0123, jr 7654
685 j = list(range(0, halfsize))
686 # invert if requested
687 if SVSHAPE.invxyz[2]: j_r.reverse()
688 # loop over 1st order dimension
689 for ci, jl in enumerate(j):
691 # now depending on MODE return the index. inner butterfly
692 if SVSHAPE.skip == 0b00: # in [0b00, 0b10]:
693 result = k # offset into COS table
694 elif SVSHAPE.skip == 0b10: #
695 result = ci # coefficient helper
696 elif SVSHAPE.skip == 0b11: #
697 result = size # coefficient helper
699 ((y_end and z_end)<<1) |
700 ((y_end and x_end and z_end)<<2))
702 yield (result * stride) + SVSHAPE.offset, loopends
705 def iterate_dct_inner_butterfly_indices(SVSHAPE):
706 # get indices to iterate over, in the required order
708 mode = SVSHAPE.lims[1]
709 stride = SVSHAPE.lims[2]
710 # creating lists of indices to iterate over in each dimension
711 # has to be done dynamically, because it depends on the size
712 # first, the size-based loop (which can be done statically)
718 # invert order if requested
719 if SVSHAPE.invxyz[0]:
725 # reference (read/write) the in-place data in *reverse-bit-order*
727 if SVSHAPE.submode2 == 0b01:
728 levels = n.bit_length() - 1
729 ri = [ri[reverse_bits(i, levels)] for i in range(n)]
731 # reference list for not needing to do data-swaps, just swap what
732 # *indices* are referenced (two levels of indirection at the moment)
733 # pre-reverse the data-swap list so that it *ends up* in the order 0123..
736 if inplace_mode and SVSHAPE.submode2 == 0b01:
737 ji = halfrev2(ji, True)
738 if inplace_mode and SVSHAPE.submode2 == 0b11:
739 ji = halfrev2(ji, False)
741 # start an infinite (wrapping) loop
745 for size in x_r: # loop over 3rd order dimension (size)
746 x_end = size == x_r[-1]
747 # y_r schedule depends on size
750 for i in range(0, n, size):
752 # invert if requested
753 if SVSHAPE.invxyz[1]: y_r.reverse()
754 for i in y_r: # loop over 2nd order dimension
756 # two lists of half-range indices, e.g. j 0123, jr 7654
757 j = list(range(i, i + halfsize))
758 jr = list(range(i+halfsize, i + size))
760 # invert if requested
761 if SVSHAPE.invxyz[2]:
764 hz2 = halfsize // 2 # zero stops reversing 1-item lists
765 # loop over 1st order dimension
767 for ci, (jl, jh) in enumerate(zip(j, jr)):
769 # now depending on MODE return the index. inner butterfly
770 if SVSHAPE.skip == 0b00: # in [0b00, 0b10]:
771 if SVSHAPE.submode2 == 0b11: # iDCT
772 result = ji[ri[jl]] # lower half
774 result = ri[ji[jl]] # lower half
775 elif SVSHAPE.skip == 0b01: # in [0b01, 0b11]:
776 if SVSHAPE.submode2 == 0b11: # iDCT
777 result = ji[ri[jl+halfsize]] # upper half
779 result = ri[ji[jh]] # upper half
781 # COS table pre-generated mode
782 if SVSHAPE.skip == 0b10: #
783 result = k # cos table offset
785 # COS table generated on-demand ("Vertical-First") mode
786 if SVSHAPE.skip == 0b10: #
787 result = ci # coefficient helper
788 elif SVSHAPE.skip == 0b11: #
789 result = size # coefficient helper
791 ((y_end and z_end)<<1) |
792 ((y_end and x_end and z_end)<<2))
794 yield (result * stride) + SVSHAPE.offset, loopends
799 for ci, (jl, jh) in enumerate(zip(j[:hz2], jr[:hz2])):
801 tmp1, tmp2 = ji[jlh], ji[jh]
802 ji[jlh], ji[jh] = tmp2, tmp1
804 # new k_start point for cos tables( runs inside x_r loop NOT i loop)
808 # python "yield" can be iterated. use this to make it clear how
809 # the indices are generated by using natural-looking nested loops
810 def iterate_dct_outer_butterfly_indices(SVSHAPE):
811 # get indices to iterate over, in the required order
813 mode = SVSHAPE.lims[1]
814 stride = SVSHAPE.lims[2]
815 # creating lists of indices to iterate over in each dimension
816 # has to be done dynamically, because it depends on the size
817 # first, the size-based loop (which can be done statically)
823 # invert order if requested
824 if SVSHAPE.invxyz[0]:
830 # I-DCT, reference (read/write) the in-place data in *reverse-bit-order*
832 if SVSHAPE.submode2 in [0b11, 0b01]:
833 levels = n.bit_length() - 1
834 ri = [ri[reverse_bits(i, levels)] for i in range(n)]
836 # reference list for not needing to do data-swaps, just swap what
837 # *indices* are referenced (two levels of indirection at the moment)
838 # pre-reverse the data-swap list so that it *ends up* in the order 0123..
840 inplace_mode = False # need the space... SVSHAPE.skip in [0b10, 0b11]
841 if SVSHAPE.submode2 == 0b11:
842 ji = halfrev2(ji, False)
844 # start an infinite (wrapping) loop
848 for size in x_r: # loop over 3rd order dimension (size)
850 x_end = size == x_r[-1]
851 y_r = list(range(0, halfsize))
852 # invert if requested
853 if SVSHAPE.invxyz[1]: y_r.reverse()
854 for i in y_r: # loop over 2nd order dimension
856 # one list to create iterative-sum schedule
857 jr = list(range(i+halfsize, i+n-halfsize, size))
858 # invert if requested
859 if SVSHAPE.invxyz[2]: jr.reverse()
860 hz2 = halfsize // 2 # zero stops reversing 1-item lists
862 for ci, jh in enumerate(jr): # loop over 1st order dimension
865 # COS table pre-generated mode
866 if SVSHAPE.skip == 0b00: # in [0b00, 0b10]:
867 if SVSHAPE.submode2 == 0b11: # iDCT
868 result = ji[ri[jh]] # upper half
870 result = ri[ji[jh]] # lower half
871 elif SVSHAPE.skip == 0b01: # in [0b01, 0b11]:
872 if SVSHAPE.submode2 == 0b11: # iDCT
873 result = ji[ri[jh+size]] # upper half
875 result = ri[ji[jh+size]] # upper half
876 elif SVSHAPE.skip == 0b10: #
877 result = k # cos table offset
879 # COS table generated on-demand ("Vertical-First") mode
880 if SVSHAPE.skip == 0b00: # in [0b00, 0b10]:
881 if SVSHAPE.submode2 == 0b11: # iDCT
882 result = ji[ri[jh]] # lower half
884 result = ri[ji[jh]] # lower half
885 elif SVSHAPE.skip == 0b01: # in [0b01, 0b11]:
886 if SVSHAPE.submode2 == 0b11: # iDCT
887 result = ji[ri[jh+size]] # upper half
889 result = ri[ji[jh+size]] # upper half
890 elif SVSHAPE.skip == 0b10: #
891 result = ci # coefficient helper
892 elif SVSHAPE.skip == 0b11: #
893 result = size # coefficient helper
895 ((y_end and z_end)<<1) |
896 ((y_end and x_end and z_end)<<2))
898 yield (result * stride) + SVSHAPE.offset, loopends
901 # new k_start point for cos tables( runs inside x_r loop NOT i loop)

Selecting which REMAP Schedule to use is shown by the pseudocode below.
Each SVSHAPE 0-3 goes through this selection process.

    if SVSHAPEn.mode == 0b00: iterate_fn = iterate_indices
    if SVSHAPEn.mode == 0b10: iterate_fn = iterate_preduce_indices
    if SVSHAPEn.mode in [0b01, 0b11]:
        # further sub-selection
        if SVSHAPEn.ydimsz == 1: iterate_fn = iterate_butterfly_indices
        if SVSHAPEn.ydimsz == 2: iterate_fn = iterate_dct_inner_butterfly_indices
        if SVSHAPEn.ydimsz == 3: iterate_fn = iterate_dct_outer_butterfly_indices
        if SVSHAPEn.ydimsz in [5, 13]: iterate_fn = iterate_dct_inner_costable_indices
        if SVSHAPEn.ydimsz in [6, 14, 15]: iterate_fn = iterate_dct_inner_halfswap_loadstore