(no commit message)
[libreriscv.git] / openpower / sv / rfc / ls009.mdwn
1 # RFC ls009 Simple-V REMAP Subsystem
2
3 **URLs**:
4
5 * <https://libre-soc.org/openpower/sv/rfc/ls009/>
6 * <https://bugs.libre-soc.org/show_bug.cgi?id=1042>
7 * <https://git.openpower.foundation/isa/PowerISA/issues/124>
8
9 **Severity**: Major
10
11 **Status**: New
12
13 **Date**: 26 Mar 2023
14
15 **Target**: v3.2B
16
17 **Source**: v3.0B
18
19 **Books and Section affected**:
20
21 ```
22 Book I, new Zero-Overhead-Loop Chapter.
23 Appendix E Power ISA sorted by opcode
24 Appendix F Power ISA sorted by version
25 Appendix G Power ISA sorted by Compliancy Subset
26 Appendix H Power ISA sorted by mnemonic
27 ```
28
29 **Summary**
30
31 ```
32 svremap - Re-Mapping of Register Element Offsets
33 svindex - General-purpose setting of SHAPEs to be re-mapped
34 svshape - Hardware-level setting of SHAPEs for element re-mapping
35 svshape2 - Hardware-level setting of SHAPEs for element re-mapping (v2)
36 ```
37
38 **Submitter**: Luke Leighton (Libre-SOC)
39
40 **Requester**: Libre-SOC
41
42 **Impact on processor**:
43
44 ```
45 Addition of four new "Zero-Overhead-Loop-Control" DSP-style Vector-style
46 Management Instructions which provide advanced features such as Matrix
47 FFT DCT Hardware-Assist Schedules and general-purpose Index reordering.
48 ```
49
50 **Impact on software**:
51
52 ```
53 Requires support for new instructions in assembler, debuggers,
54 and related tools.
55 ```
56
57 **Keywords**:
58
59 ```
60 Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control (ZOLC),
61 Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model,
62 Digital Signal Processing (DSP)
63 ```
64
65 **Motivation**
66
67 These REMAP Management instructions provide state-of-the-art advanced capabilities
68 to dramatically decrease instruction count and power reduction whilst retaining
69 unprecedented general-purpose capability and a standard Sequential Execution Model.
70
71 **Notes and Observations**:
72
73 1. TODO
74
75 **Changes**
76
77 Add the following entries to:
78
79 * the Appendices of SV Book
80 * Instructions of SV Book as a new Section
81 * SVI, SVM, SVM2, SVRM Form of Book I Section 1.6.1.6 and 1.6.2
82
83 ----------------
84
85 \newpage{}
86
87 [[!inline pages="openpower/sv/remap" raw=yes ]]
88
89 # Forms
90
91 Add `SVI, SVM, SVRM` to `XO (26:31)` Field in Book I, 1.6.2
92
93 Add the following to Book I, 1.6.1, SVI-Form
94
95 ```
96 |0 |6 |11 |16 |21 |23 |24|25|26 31|
97 | PO | SVG|rmm | SVd |ew |SVyx|mm|sk| XO |
98 ```
99
100 Add the following to Book I, 1.6.1, SVM-Form
101
102 ```
103 |0 |6 |11 |16 |21 |25 |26 |31 |
104 | PO | SVxd | SVyd | SVzd | SVrm |vf | XO |
105 ```
106
107 Add the following to Book I, 1.6.1, SVM2-Form
108
109 ```
110 |0 |6 |10 |11 |16 |21 |24|25 |26 |31 |
111 | PO | SVo |SVyx| rmm | SVd |XO |mm|sk | XO |
112 ```
113
114 Add the following to Book I, 1.6.1, SVRM-Form
115
116 ```
117 |0 |6 |11 |13 |15 |17 |19 |21 |22 |26 |31 |
118 | PO | SVme |mi0 | mi1 | mi2 | mo0 | mo1 |pst |/// | XO |
119 ```
120
121 Add the following to Book I, 1.6.2
122
123 ```
124 mi0 (11:12)
125 Field used in REMAP to select the SVSHAPE for 1st input register
126 Formats: SVRM
127 mi1 (13:14)
128 Field used in REMAP to select the SVSHAPE for 2nd input register
129 Formats: SVRM
130 mi2 (15:16)
131 Field used in REMAP to select the SVSHAPE for 3rd input register
132 Formats: SVRM
133 mm (24)
134 Field used to specify the meaning of the rmm field for SVI-Form
135 and SVM2-Form
136 Formats: SVI, SVM2
137 mo0 (17:18)
138 Field used in REMAP to select the SVSHAPE for 1st output register
139 Formats: SVRM
140 mo1 (19:20)
141 Field used in REMAP to select the SVSHAPE for 2nd output register
142 Formats: SVRM
143 pst (21)
144 Field used in REMAP to indicate "persistence" mode (REMAP
145 continues to apply to multiple instructions)
146 Formats: SVRM
147 rmm (11:15)
148 REMAP Mode field for SVI-Form and SVM2-Form
149 Formats: SVI, SVM2
150 sk (25)
151 Field used to specify dimensional skipping in svindex
152 Formats: SVI, SVM2
153 SVd (16:20)
154 Immediate field used to specify the size of the REMAP dimension
155 in the svindex and svshape2 instructions
156 Formats: SVI, SVM2
157 SVDS (16:29)
158 Immediate field used to specify a 9-bit signed
159 two's complement integer which is concatenated
160 on the right with 0b00 and sign-extended to 64 bits.
161 Formats: SVDS
162 SVG (6:10)
163 Field used to specify a GPR to be used as a
164 source for indexing.
165 Formats: SVI
166 SVi (16:22)
167 Simple-V immediate field for setting VL or MVL
168 Formats: SVL
169 SVme (6:10)
170 Simple-V "REMAP" map-enable bits (0-4)
171 Formats: SVRM
172 SVo (6:9)
173 Field used by the svshape2 instruction as an offset
174 Formats: SVM2
175 SVrm (21:24)
176 Simple-V "REMAP" Mode
177 Formats: SVM
178 SVxd (6:10)
179 Simple-V "REMAP" x-dimension size
180 Formats: SVM
181 SVyd (11:15)
182 Simple-V "REMAP" y-dimension size
183 Formats: SVM
184 SVzd (16:20)
185 Simple-V "REMAP" z-dimension size
186 Formats: SVM
187 XO (21:23,26:31)
188 Extended opcode field. Note that bit 21 must be 1, 22 and 23
189 must be zero, and bits 26-31 must be exactly the same as
190 used for svshape.
191 Formats: SVM2
192 ```
193
194 # Appendices
195
196 Appendix E Power ISA sorted by opcode
197 Appendix F Power ISA sorted by version
198 Appendix G Power ISA sorted by Compliancy Subset
199 Appendix H Power ISA sorted by mnemonic
200
201 | Form | Book | Page | Version | mnemonic | Description |
202 |------|------|------|---------|----------|-------------|
203 | SVRM | I | # | 3.0B | svremap | REMAP enabling instruction |
204 | SVM | I | # | 3.0B | svshape | REMAP shape instruction |
205 | SVM2 | I | # | 3.0B | svshape2 | REMAP shape instruction (2) |
206 | SVI | I | # | 3.0B | svindex | REMAP General-purpose Indexing |
207
208 [[!inline pages="openpower/sv/remap/appendix" raw=yes ]]
209
210 ## REMAP pseudocode
211
212 Written in python3 the following stand-alone executable source code is the Canonical
213 Specification for each REMAP. Vectors of "loopends" are returned when Rc=1
214 in Vectors of CR Fields on `sv.svstep.`, or in Vertical-First Mode
215 a single CR Field (CR0) on `svstep.`. The `SVSTATE.srcstep` or `SVSTATE.dststep` sequential
216 offset is put through each algorithm to determine the actual Element Offset.
217 Alternative implementations producing different ordering
218 is prohibited as software will be critically relying on these Deterministic Schedules.
219
220 ### REMAP 2D/3D Matrix
221
222 The following stand-alone executable source code is the Canonical
223 Specification for Matrix (2D/3D) REMAP.
224 Hardware implementations are achievable with simple cascading counter-and-compares.
225
226 ```
227 # python "yield" can be iterated. use this to make it clear how
228 # the indices are generated by using natural-looking nested loops
229 def iterate_indices(SVSHAPE):
230 # get indices to iterate over, in the required order
231 xd = SVSHAPE.lims[0]
232 yd = SVSHAPE.lims[1]
233 zd = SVSHAPE.lims[2]
234 # create lists of indices to iterate over in each dimension
235 x_r = list(range(xd))
236 y_r = list(range(yd))
237 z_r = list(range(zd))
238 # invert the indices if needed
239 if SVSHAPE.invxyz[0]: x_r.reverse()
240 if SVSHAPE.invxyz[1]: y_r.reverse()
241 if SVSHAPE.invxyz[2]: z_r.reverse()
242 # start an infinite (wrapping) loop
243 step = 0 # track src/dst step
244 while True:
245 for z in z_r: # loop over 1st order dimension
246 z_end = z == z_r[-1]
247 for y in y_r: # loop over 2nd order dimension
248 y_end = y == y_r[-1]
249 for x in x_r: # loop over 3rd order dimension
250 x_end = x == x_r[-1]
251 # ok work out which order to construct things in.
252 # start by creating a list of tuples of the dimension
253 # and its limit
254 vals = [(SVSHAPE.lims[0], x, "x"),
255 (SVSHAPE.lims[1], y, "y"),
256 (SVSHAPE.lims[2], z, "z")
257 ]
258 # now select those by order. this allows us to
259 # create schedules for [z][x], [x][y], or [y][z]
260 # for matrix multiply.
261 vals = [vals[SVSHAPE.order[0]],
262 vals[SVSHAPE.order[1]],
263 vals[SVSHAPE.order[2]]
264 ]
265 # ok now we can construct the result, using bits of
266 # "order" to say which ones get stacked on
267 result = 0
268 mult = 1
269 for i in range(3):
270 lim, idx, dbg = vals[i]
271 # some of the dimensions can be "skipped". the order
272 # was actually selected above on all 3 dimensions,
273 # e.g. [z][x][y] or [y][z][x]. "skip" allows one of
274 # those to be knocked out
275 if SVSHAPE.skip == i+1: continue
276 idx *= mult # shifts up by previous dimension(s)
277 result += idx # adds on this dimension
278 mult *= lim # for the next dimension
279
280 loopends = (x_end |
281 ((y_end and x_end)<<1) |
282 ((y_end and x_end and z_end)<<2))
283
284 yield result + SVSHAPE.offset, loopends
285 step += 1
286
287 def demo():
288 # set the dimension sizes here
289 xdim = 3
290 ydim = 2
291 zdim = 4
292
293 # set total (can repeat, e.g. VL=x*y*z*4)
294 VL = xdim * ydim * zdim
295
296 # set up an SVSHAPE
297 class SVSHAPE:
298 pass
299 SVSHAPE0 = SVSHAPE()
300 SVSHAPE0.lims = [xdim, ydim, zdim]
301 SVSHAPE0.order = [1,0,2] # experiment with different permutations, here
302 SVSHAPE0.mode = 0b00
303 SVSHAPE0.skip = 0b00
304 SVSHAPE0.offset = 0 # experiment with different offset, here
305 SVSHAPE0.invxyz = [0,0,0] # inversion if desired
306
307 # enumerate over the iterator function, getting new indices
308 for idx, (new_idx, end) in enumerate(iterate_indices(SVSHAPE0)):
309 if idx >= VL:
310 break
311 print ("%d->%d" % (idx, new_idx), "end", bin(end)[2:])
312
313 # run the demo
314 if __name__ == '__main__':
315 demo()
316 ```
317
318 ### REMAP Parallel Reduction pseudocode
319
320 The python3 program below is stand-alone executable and is the Canonical Specification
321 for Parallel Reduction REMAP.
322 The Algorithm below is not limited to RADIX2 sizes, and Predicate
323 sources, unlike in Matrix REMAP, apply to the Element Indices **after** REMAP
324 has been applied, not before. MV operations are not required: the algorithm
325 tracks positions of elements that would normally be moved and when applying
326 an Element Reduction Operation sources the operands from their last-known (tracked)
327 position.
328
329 ```
330 # a "yield" version of the Parallel Reduction REMAP algorithm.
331 # the algorithm is in-place. it does not perform "MV" operations.
332 # instead, where a masked-out value *should* be read from is tracked
333
334 def iterate_indices(SVSHAPE, pred=None):
335 # get indices to iterate over, in the required order
336 xd = SVSHAPE.lims[0]
337 # create lists of indices to iterate over in each dimension
338 ix = list(range(xd))
339 # invert the indices if needed
340 if SVSHAPE.invxyz[0]: ix.reverse()
341 # start a loop from the lowest step
342 step = 1
343 steps = []
344 while step < xd:
345 step *= 2
346 steps.append(step)
347 # invert the indices if needed
348 if SVSHAPE.invxyz[1]: steps.reverse()
349 for step in steps:
350 stepend = (step == steps[-1]) # note end of steps
351 idxs = list(range(0, xd, step))
352 results = []
353 for i in idxs:
354 other = i + step // 2
355 ci = ix[i]
356 oi = ix[other] if other < xd else None
357 other_pred = other < xd and (pred is None or pred[oi])
358 if (pred is None or pred[ci]) and other_pred:
359 if SVSHAPE.skip == 0b00: # submode 00
360 result = ci
361 elif SVSHAPE.skip == 0b01: # submode 01
362 result = oi
363 results.append([result + SVSHAPE.offset, 0])
364 elif other_pred:
365 ix[i] = oi
366 if results:
367 results[-1][1] = (stepend<<1) | 1 # notify end of loops
368 yield from results
369
370 def demo():
371 # set the dimension sizes here
372 xdim = 9
373
374 # set up an SVSHAPE
375 class SVSHAPE:
376 pass
377 SVSHAPE0 = SVSHAPE()
378 SVSHAPE0.lims = [xdim, 0, 0]
379 SVSHAPE0.order = [0,1,2]
380 SVSHAPE0.mode = 0b10
381 SVSHAPE0.skip = 0b00
382 SVSHAPE0.offset = 0 # experiment with different offset, here
383 SVSHAPE0.invxyz = [0,0,0] # inversion if desired
384
385 SVSHAPE1 = SVSHAPE()
386 SVSHAPE1.lims = [xdim, 0, 0]
387 SVSHAPE1.order = [0,1,2]
388 SVSHAPE1.mode = 0b10
389 SVSHAPE1.skip = 0b01
390 SVSHAPE1.offset = 0 # experiment with different offset, here
391 SVSHAPE1.invxyz = [0,0,0] # inversion if desired
392
393 # enumerate over the iterator function, getting new indices
394 shapes = list(iterate_indices(SVSHAPE0)), \
395 list(iterate_indices(SVSHAPE1))
396 for idx in range(len(shapes[0])):
397 l = shapes[0][idx]
398 r = shapes[1][idx]
399 (l_idx, lend) = l
400 (r_idx, rend) = r
401 print ("%d->%d:%d" % (idx, l_idx, r_idx),
402 "end", bin(lend)[2:], bin(rend)[2:])
403
404 # run the demo
405 if __name__ == '__main__':
406 demo()
407 ```
408
409 ### REMAP FFT pseudocode
410
411 The FFT REMAP is RADIX2 only.
412
413 ```
414 # a "yield" version of the REMAP algorithm, for FFT Tukey-Cooley schedules
415 # original code for the FFT Tukey-Cooley schedule:
416 # https://www.nayuki.io/res/free-small-fft-in-multiple-languages/fft.py
417 """
418 # Radix-2 decimation-in-time FFT (real, not complex)
419 size = 2
420 while size <= n:
421 halfsize = size // 2
422 tablestep = n // size
423 for i in range(0, n, size):
424 k = 0
425 for j in range(i, i + halfsize):
426 jh = j+halfsize
427 jl = j
428 temp1 = vec[jh] * exptable[k]
429 temp2 = vec[jl]
430 vec[jh] = temp2 - temp1
431 vec[jl] = temp2 + temp1
432 k += tablestep
433 size *= 2
434 """
435
436 # python "yield" can be iterated. use this to make it clear how
437 # the indices are generated by using natural-looking nested loops
438 def iterate_butterfly_indices(SVSHAPE):
439 # get indices to iterate over, in the required order
440 n = SVSHAPE.lims[0]
441 stride = SVSHAPE.lims[2] # stride-multiplier on reg access
442 # creating lists of indices to iterate over in each dimension
443 # has to be done dynamically, because it depends on the size
444 # first, the size-based loop (which can be done statically)
445 x_r = []
446 size = 2
447 while size <= n:
448 x_r.append(size)
449 size *= 2
450 # invert order if requested
451 if SVSHAPE.invxyz[0]: x_r.reverse()
452
453 if len(x_r) == 0:
454 return
455
456 # start an infinite (wrapping) loop
457 skip = 0
458 while True:
459 for size in x_r: # loop over 3rd order dimension (size)
460 x_end = size == x_r[-1]
461 # y_r schedule depends on size
462 halfsize = size // 2
463 tablestep = n // size
464 y_r = []
465 for i in range(0, n, size):
466 y_r.append(i)
467 # invert if requested
468 if SVSHAPE.invxyz[1]: y_r.reverse()
469 for i in y_r: # loop over 2nd order dimension
470 y_end = i == y_r[-1]
471 k_r = []
472 j_r = []
473 k = 0
474 for j in range(i, i+halfsize):
475 k_r.append(k)
476 j_r.append(j)
477 k += tablestep
478 # invert if requested
479 if SVSHAPE.invxyz[2]: k_r.reverse()
480 if SVSHAPE.invxyz[2]: j_r.reverse()
481 for j, k in zip(j_r, k_r): # loop over 1st order dimension
482 z_end = j == j_r[-1]
483 # now depending on MODE return the index
484 if SVSHAPE.skip == 0b00:
485 result = j # for vec[j]
486 elif SVSHAPE.skip == 0b01:
487 result = j + halfsize # for vec[j+halfsize]
488 elif SVSHAPE.skip == 0b10:
489 result = k # for exptable[k]
490
491 loopends = (z_end |
492 ((y_end and z_end)<<1) |
493 ((y_end and x_end and z_end)<<2))
494
495 yield (result * stride) + SVSHAPE.offset, loopends
496
497 def demo():
498 # set the dimension sizes here
499 xdim = 8
500 ydim = 0 # not needed
501 zdim = 1 # stride must be set to 1
502
503 # set total. err don't know how to calculate how many there are...
504 # do it manually for now
505 VL = 0
506 size = 2
507 n = xdim
508 while size <= n:
509 halfsize = size // 2
510 tablestep = n // size
511 for i in range(0, n, size):
512 for j in range(i, i + halfsize):
513 VL += 1
514 size *= 2
515
516 # set up an SVSHAPE
517 class SVSHAPE:
518 pass
519 # j schedule
520 SVSHAPE0 = SVSHAPE()
521 SVSHAPE0.lims = [xdim, ydim, zdim]
522 SVSHAPE0.order = [0,1,2] # experiment with different permutations, here
523 SVSHAPE0.mode = 0b01
524 SVSHAPE0.skip = 0b00
525 SVSHAPE0.offset = 0 # experiment with different offset, here
526 SVSHAPE0.invxyz = [0,0,0] # inversion if desired
527 # j+halfstep schedule
528 SVSHAPE1 = SVSHAPE()
529 SVSHAPE1.lims = [xdim, ydim, zdim]
530 SVSHAPE1.order = [0,1,2] # experiment with different permutations, here
531 SVSHAPE0.mode = 0b01
532 SVSHAPE1.skip = 0b01
533 SVSHAPE1.offset = 0 # experiment with different offset, here
534 SVSHAPE1.invxyz = [0,0,0] # inversion if desired
535 # k schedule
536 SVSHAPE2 = SVSHAPE()
537 SVSHAPE2.lims = [xdim, ydim, zdim]
538 SVSHAPE2.order = [0,1,2] # experiment with different permutations, here
539 SVSHAPE0.mode = 0b01
540 SVSHAPE2.skip = 0b10
541 SVSHAPE2.offset = 0 # experiment with different offset, here
542 SVSHAPE2.invxyz = [0,0,0] # inversion if desired
543
544 # enumerate over the iterator function, getting new indices
545 schedule = []
546 for idx, (jl, jh, k) in enumerate(zip(iterate_butterfly_indices(SVSHAPE0),
547 iterate_butterfly_indices(SVSHAPE1),
548 iterate_butterfly_indices(SVSHAPE2))):
549 if idx >= VL:
550 break
551 schedule.append((jl, jh, k))
552
553 # ok now pretty-print the results, with some debug output
554 size = 2
555 idx = 0
556 while size <= n:
557 halfsize = size // 2
558 tablestep = n // size
559 print ("size %d halfsize %d tablestep %d" % \
560 (size, halfsize, tablestep))
561 for i in range(0, n, size):
562 prefix = "i %d\t" % i
563 k = 0
564 for j in range(i, i + halfsize):
565 (jl, je), (jh, he), (ks, ke) = schedule[idx]
566 print (" %-3d\t%s j=%-2d jh=%-2d k=%-2d -> "
567 "j[jl=%-2d] j[jh=%-2d] ex[k=%d]" % \
568 (idx, prefix, j, j+halfsize, k,
569 jl, jh, ks,
570 ),
571 "end", bin(je)[2:], bin(je)[2:], bin(ke)[2:])
572 k += tablestep
573 idx += 1
574 size *= 2
575
576 # run the demo
577 if __name__ == '__main__':
578 demo()
579 ```
580
581 ### DCT REMAP
582
583 DCT REMAP is RADIX2 only. Convolutions may be applied as usual
584 to create non-RADIX2 DCT. Combined with appropriate Twin-butterfly
585 instructions the algorithm below (written in python3), becomes part
586 of an in-place in-registers Vectorised DCT. The algorithms work
587 by loading data such that as the nested loops progress the result
588 is sorted into correct sequential order.
589
590 ```
591 # DCT "REMAP" scheduler to create an in-place iterative DCT.
592 #
593
594 # bits of the integer 'val' of width 'width' are reversed
595 def reverse_bits(val, width):
596 result = 0
597 for _ in range(width):
598 result = (result << 1) | (val & 1)
599 val >>= 1
600 return result
601
602
603 # iterative version of [recursively-applied] half-reversing
604 # turns out this is Gray-Encoding.
605 def halfrev2(vec, pre_rev=True):
606 res = []
607 for i in range(len(vec)):
608 if pre_rev:
609 res.append(vec[i ^ (i>>1)])
610 else:
611 ri = i
612 bl = i.bit_length()
613 for ji in range(1, bl):
614 ri ^= (i >> ji)
615 res.append(vec[ri])
616 return res
617
618
619 def iterate_dct_inner_halfswap_loadstore(SVSHAPE):
620 # get indices to iterate over, in the required order
621 n = SVSHAPE.lims[0]
622 mode = SVSHAPE.lims[1]
623 stride = SVSHAPE.lims[2]
624
625 # reference list for not needing to do data-swaps, just swap what
626 # *indices* are referenced (two levels of indirection at the moment)
627 # pre-reverse the data-swap list so that it *ends up* in the order 0123..
628 ji = list(range(n))
629
630 levels = n.bit_length() - 1
631 ri = [reverse_bits(i, levels) for i in range(n)]
632
633 if SVSHAPE.mode == 0b01: # FFT, bitrev only
634 ji = [ji[ri[i]] for i in range(n)]
635 elif SVSHAPE.submode2 == 0b001:
636 ji = [ji[ri[i]] for i in range(n)]
637 ji = halfrev2(ji, True)
638 else:
639 ji = halfrev2(ji, False)
640 ji = [ji[ri[i]] for i in range(n)]
641
642 # invert order if requested
643 if SVSHAPE.invxyz[0]:
644 ji.reverse()
645
646 for i, jl in enumerate(ji):
647 y_end = jl == ji[-1]
648 yield jl * stride, (0b111 if y_end else 0b000)
649
650 def iterate_dct_inner_costable_indices(SVSHAPE):
651 # get indices to iterate over, in the required order
652 n = SVSHAPE.lims[0]
653 mode = SVSHAPE.lims[1]
654 stride = SVSHAPE.lims[2]
655 # creating lists of indices to iterate over in each dimension
656 # has to be done dynamically, because it depends on the size
657 # first, the size-based loop (which can be done statically)
658 x_r = []
659 size = 2
660 while size <= n:
661 x_r.append(size)
662 size *= 2
663 # invert order if requested
664 if SVSHAPE.invxyz[0]:
665 x_r.reverse()
666
667 if len(x_r) == 0:
668 return
669
670 # start an infinite (wrapping) loop
671 skip = 0
672 z_end = 1 # doesn't exist in this, only 2 loops
673 k = 0
674 while True:
675 for size in x_r: # loop over 3rd order dimension (size)
676 x_end = size == x_r[-1]
677 # y_r schedule depends on size
678 halfsize = size // 2
679 y_r = []
680 for i in range(0, n, size):
681 y_r.append(i)
682 # invert if requested
683 if SVSHAPE.invxyz[1]: y_r.reverse()
684 # two lists of half-range indices, e.g. j 0123, jr 7654
685 j = list(range(0, halfsize))
686 # invert if requested
687 if SVSHAPE.invxyz[2]: j_r.reverse()
688 # loop over 1st order dimension
689 for ci, jl in enumerate(j):
690 y_end = jl == j[-1]
691 # now depending on MODE return the index. inner butterfly
692 if SVSHAPE.skip == 0b00: # in [0b00, 0b10]:
693 result = k # offset into COS table
694 elif SVSHAPE.skip == 0b10: #
695 result = ci # coefficient helper
696 elif SVSHAPE.skip == 0b11: #
697 result = size # coefficient helper
698 loopends = (z_end |
699 ((y_end and z_end)<<1) |
700 ((y_end and x_end and z_end)<<2))
701
702 yield (result * stride) + SVSHAPE.offset, loopends
703 k += 1
704
705 def iterate_dct_inner_butterfly_indices(SVSHAPE):
706 # get indices to iterate over, in the required order
707 n = SVSHAPE.lims[0]
708 mode = SVSHAPE.lims[1]
709 stride = SVSHAPE.lims[2]
710 # creating lists of indices to iterate over in each dimension
711 # has to be done dynamically, because it depends on the size
712 # first, the size-based loop (which can be done statically)
713 x_r = []
714 size = 2
715 while size <= n:
716 x_r.append(size)
717 size *= 2
718 # invert order if requested
719 if SVSHAPE.invxyz[0]:
720 x_r.reverse()
721
722 if len(x_r) == 0:
723 return
724
725 # reference (read/write) the in-place data in *reverse-bit-order*
726 ri = list(range(n))
727 if SVSHAPE.submode2 == 0b01:
728 levels = n.bit_length() - 1
729 ri = [ri[reverse_bits(i, levels)] for i in range(n)]
730
731 # reference list for not needing to do data-swaps, just swap what
732 # *indices* are referenced (two levels of indirection at the moment)
733 # pre-reverse the data-swap list so that it *ends up* in the order 0123..
734 ji = list(range(n))
735 inplace_mode = True
736 if inplace_mode and SVSHAPE.submode2 == 0b01:
737 ji = halfrev2(ji, True)
738 if inplace_mode and SVSHAPE.submode2 == 0b11:
739 ji = halfrev2(ji, False)
740
741 # start an infinite (wrapping) loop
742 while True:
743 k = 0
744 k_start = 0
745 for size in x_r: # loop over 3rd order dimension (size)
746 x_end = size == x_r[-1]
747 # y_r schedule depends on size
748 halfsize = size // 2
749 y_r = []
750 for i in range(0, n, size):
751 y_r.append(i)
752 # invert if requested
753 if SVSHAPE.invxyz[1]: y_r.reverse()
754 for i in y_r: # loop over 2nd order dimension
755 y_end = i == y_r[-1]
756 # two lists of half-range indices, e.g. j 0123, jr 7654
757 j = list(range(i, i + halfsize))
758 jr = list(range(i+halfsize, i + size))
759 jr.reverse()
760 # invert if requested
761 if SVSHAPE.invxyz[2]:
762 j.reverse()
763 jr.reverse()
764 hz2 = halfsize // 2 # zero stops reversing 1-item lists
765 # loop over 1st order dimension
766 k = k_start
767 for ci, (jl, jh) in enumerate(zip(j, jr)):
768 z_end = jl == j[-1]
769 # now depending on MODE return the index. inner butterfly
770 if SVSHAPE.skip == 0b00: # in [0b00, 0b10]:
771 if SVSHAPE.submode2 == 0b11: # iDCT
772 result = ji[ri[jl]] # lower half
773 else:
774 result = ri[ji[jl]] # lower half
775 elif SVSHAPE.skip == 0b01: # in [0b01, 0b11]:
776 if SVSHAPE.submode2 == 0b11: # iDCT
777 result = ji[ri[jl+halfsize]] # upper half
778 else:
779 result = ri[ji[jh]] # upper half
780 elif mode == 4:
781 # COS table pre-generated mode
782 if SVSHAPE.skip == 0b10: #
783 result = k # cos table offset
784 else: # mode 2
785 # COS table generated on-demand ("Vertical-First") mode
786 if SVSHAPE.skip == 0b10: #
787 result = ci # coefficient helper
788 elif SVSHAPE.skip == 0b11: #
789 result = size # coefficient helper
790 loopends = (z_end |
791 ((y_end and z_end)<<1) |
792 ((y_end and x_end and z_end)<<2))
793
794 yield (result * stride) + SVSHAPE.offset, loopends
795 k += 1
796
797 # now in-place swap
798 if inplace_mode:
799 for ci, (jl, jh) in enumerate(zip(j[:hz2], jr[:hz2])):
800 jlh = jl+halfsize
801 tmp1, tmp2 = ji[jlh], ji[jh]
802 ji[jlh], ji[jh] = tmp2, tmp1
803
804 # new k_start point for cos tables( runs inside x_r loop NOT i loop)
805 k_start += halfsize
806
807
808 # python "yield" can be iterated. use this to make it clear how
809 # the indices are generated by using natural-looking nested loops
810 def iterate_dct_outer_butterfly_indices(SVSHAPE):
811 # get indices to iterate over, in the required order
812 n = SVSHAPE.lims[0]
813 mode = SVSHAPE.lims[1]
814 stride = SVSHAPE.lims[2]
815 # creating lists of indices to iterate over in each dimension
816 # has to be done dynamically, because it depends on the size
817 # first, the size-based loop (which can be done statically)
818 x_r = []
819 size = n // 2
820 while size >= 2:
821 x_r.append(size)
822 size //= 2
823 # invert order if requested
824 if SVSHAPE.invxyz[0]:
825 x_r.reverse()
826
827 if len(x_r) == 0:
828 return
829
830 # I-DCT, reference (read/write) the in-place data in *reverse-bit-order*
831 ri = list(range(n))
832 if SVSHAPE.submode2 in [0b11, 0b01]:
833 levels = n.bit_length() - 1
834 ri = [ri[reverse_bits(i, levels)] for i in range(n)]
835
836 # reference list for not needing to do data-swaps, just swap what
837 # *indices* are referenced (two levels of indirection at the moment)
838 # pre-reverse the data-swap list so that it *ends up* in the order 0123..
839 ji = list(range(n))
840 inplace_mode = False # need the space... SVSHAPE.skip in [0b10, 0b11]
841 if SVSHAPE.submode2 == 0b11:
842 ji = halfrev2(ji, False)
843
844 # start an infinite (wrapping) loop
845 while True:
846 k = 0
847 k_start = 0
848 for size in x_r: # loop over 3rd order dimension (size)
849 halfsize = size//2
850 x_end = size == x_r[-1]
851 y_r = list(range(0, halfsize))
852 # invert if requested
853 if SVSHAPE.invxyz[1]: y_r.reverse()
854 for i in y_r: # loop over 2nd order dimension
855 y_end = i == y_r[-1]
856 # one list to create iterative-sum schedule
857 jr = list(range(i+halfsize, i+n-halfsize, size))
858 # invert if requested
859 if SVSHAPE.invxyz[2]: jr.reverse()
860 hz2 = halfsize // 2 # zero stops reversing 1-item lists
861 k = k_start
862 for ci, jh in enumerate(jr): # loop over 1st order dimension
863 z_end = jh == jr[-1]
864 if mode == 4:
865 # COS table pre-generated mode
866 if SVSHAPE.skip == 0b00: # in [0b00, 0b10]:
867 if SVSHAPE.submode2 == 0b11: # iDCT
868 result = ji[ri[jh]] # upper half
869 else:
870 result = ri[ji[jh]] # lower half
871 elif SVSHAPE.skip == 0b01: # in [0b01, 0b11]:
872 if SVSHAPE.submode2 == 0b11: # iDCT
873 result = ji[ri[jh+size]] # upper half
874 else:
875 result = ri[ji[jh+size]] # upper half
876 elif SVSHAPE.skip == 0b10: #
877 result = k # cos table offset
878 else:
879 # COS table generated on-demand ("Vertical-First") mode
880 if SVSHAPE.skip == 0b00: # in [0b00, 0b10]:
881 if SVSHAPE.submode2 == 0b11: # iDCT
882 result = ji[ri[jh]] # lower half
883 else:
884 result = ri[ji[jh]] # lower half
885 elif SVSHAPE.skip == 0b01: # in [0b01, 0b11]:
886 if SVSHAPE.submode2 == 0b11: # iDCT
887 result = ji[ri[jh+size]] # upper half
888 else:
889 result = ri[ji[jh+size]] # upper half
890 elif SVSHAPE.skip == 0b10: #
891 result = ci # coefficient helper
892 elif SVSHAPE.skip == 0b11: #
893 result = size # coefficient helper
894 loopends = (z_end |
895 ((y_end and z_end)<<1) |
896 ((y_end and x_end and z_end)<<2))
897
898 yield (result * stride) + SVSHAPE.offset, loopends
899 k += 1
900
901 # new k_start point for cos tables( runs inside x_r loop NOT i loop)
902 k_start += halfsize
903
904 ```
905
906 ## REMAP selector
907
908 Selecting which REMAP Schedule to use is shown by the pseudocode below.
909 Each SVSHAPE 0-3 goes through this selection process.
910
911 ```
912 if SVSHAPEn.mode == 0b00: iterate_fn = iterate_indices
913 if SVSHAPEn.mode == 0b10: iterate_fn = iterate_preduce_indices
914 if SVSHAPEn.mode in [0b01, 0b11]:
915 # further sub-selection
916 if SVSHAPEn.ydimsz == 1: iterate_fn = iterate_butterfly_indices
917 if SVSHAPEn.ydimsz == 2: iterate_fn = iterate_dct_inner_butterfly_indices
918 if SVSHAPEn.ydimsz == 3: iterate_fn = iterate_dct_outer_butterfly_indices
919 if SVSHAPEn.ydimsz in [5, 13]: iterate_fn = iterate_dct_inner_costable_indices
920 if SVSHAPEn.ydimsz in [6, 14, 15]: iterate_fn = iterate_dct_inner_halfswap_loadstore
921 ```
922
923
924 [[!tag opf_rfc]]