# Dynamic Partitioned SIMD

The Dynamic Partitioned SIMD Signal is effectively a parallelisation
of nmigen's Signal. It is expected to work transparently as if it were
an nmigen Signal, in every way, as a full peer of an nmigen Signal, with
no requirement on the part of the developer to even know that it is
performing parallel dynamically-partitioned operations.

nmigen 32-bit Signal:

    a        : .... .... .... .... (32 bits, illustrated as 4x8)

Dynamically-partitioned 32-bit Signal subdivided into four 8-bit
sections, by 3 partition bits:

    partition:     P    P    P     (3 bits)
    a        : .... .... .... .... (32 bits, illustrated as 4x8)
    exp-a    : ....P....P....P.... (32+3 bits, P=0 if no partition)

Each partitioned section shall act as an independent Signal, where the
**partitioning is dynamic at runtime** and may subdivide the above example
into any of the 8 possible combinations of the 3 partition bits:

    exp-a : ....0....0....0.... 1x 32-bit
    exp-a : ....0....0....1.... 1x 24-bit plus 1x 8-bit
    exp-a : ....0....1....0.... 2x 16-bit
    ...
    ...
    exp-a : ....1....1....0.... 2x 8-bit, 1x 16-bit
    exp-a : ....1....1....1.... 4x 8-bit

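The mapping from partition bits to lane widths can be sketched as a plain-Python behavioural model. This is an illustration only: the function name `lane_widths` and the bit numbering (partition bit 0 is the rightmost P in the diagrams, so formatting the bits with `03b` reproduces the left-to-right P order) are assumptions, not part of the PartitionedSignal API.

```python
def lane_widths(partition, width=32, lane=8):
    """Hypothetical helper: return the lane widths selected by the
    partition bits, most significant lane first (the diagram order).

    Partition bit i is the P between slot i and slot i+1, where slot 0 is
    the least significant 8 bits; a 1 splits the lanes, a 0 joins them.
    """
    widths = []
    current = lane
    for i in range(width // lane - 1):
        if (partition >> i) & 1:   # partition point active: close this lane
            widths.append(current)
            current = lane
        else:                      # no partition: extend this lane upwards
            current += lane
    widths.append(current)
    return widths[::-1]            # most significant lane first


# list all 8 layouts of the 32-bit example
for p in range(8):
    print(f"{p:03b}", lane_widths(p))
```

Among its output are the lines `001 [24, 8]` and `110 [8, 8, 16]`, matching the diagram rows above. With `width=64` the same function covers all 128 layouts selected by 7 partition bits, from a single 64-bit lane to eight 8-bit lanes.
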
Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=458> m.If/Switch
* <https://bugs.libre-soc.org/show_bug.cgi?id=115> top level SIMD
* <https://bugs.libre-soc.org/show_bug.cgi?id=707> Limited Cat
* <https://bugs.libre-soc.org/show_bug.cgi?id=594> RFC for nmigen integration
* <https://bugs.libre-soc.org/show_bug.cgi?id=565> Formal proof of PartitionedSignal
* <https://bugs.libre-soc.org/show_bug.cgi?id=596> Formal proof of PartitionedSignal nmigen interaction

To save hugely on gate count, the normal practice of having separate scalar ALUs and separate SIMD ALUs is not followed.

Instead, a suite of "partition points", identical in fashion to those of the Aspex Microelectronics ASP (Associative String Processor) architecture, is deployed.

Basic principle: when all partition gates are open, the ALU is subdivided into isolated and independent 8-bit SIMD ALUs. Whenever any one gate is closed, the relevant 8-bit "part-results" on either side of it are chained together in a downstream cascade to create 16-bit, 32-bit, 64-bit and 128-bit compound results.

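The carry chaining described above can be modelled in plain Python. This is a sketch of one well-known technique, assumed here for illustration rather than taken from the libre-soc code: an extra "gap" bit is inserted at each partition point in a single wide adder, set so that it either passes the carry on (gate closed) or swallows it (gate open). The function name `partitioned_add` and the convention that partition bit i is the gate between 8-bit slot i and slot i+1 (slot 0 least significant) are hypothetical.

```python
def partitioned_add(a, b, partition, width=32, lane=8):
    """Hypothetical model: add a and b as independent lanes split at every
    set partition bit.

    Partition bit i is the gate between slot i and slot i+1 (slot 0 is the
    least significant). A 1 opens the gate (lanes isolated); a 0 closes it
    (the carry chains into the next slot).
    """
    nslots = width // lane
    slot_mask = (1 << lane) - 1
    ea = eb = 0
    pos = 0
    for i in range(nslots):
        # copy slot i of each operand into the expanded operands
        ea |= ((a >> (i * lane)) & slot_mask) << pos
        eb |= ((b >> (i * lane)) & slot_mask) << pos
        pos += lane
        if i < nslots - 1:
            # gap bit: 1+0 passes an incoming carry on, 0+0 swallows it
            if not (partition >> i) & 1:
                ea |= 1 << pos
            pos += 1
    esum = ea + eb  # one wide addition serves every partitioning
    # drop the gap bits to recover the width-bit result
    result = 0
    pos = 0
    for i in range(nslots):
        result |= ((esum >> pos) & slot_mask) << (i * lane)
        pos += lane + 1
    return result
```

For example, `partitioned_add(0xFF, 0x01, 0b000)` gives `0x100` (one 32-bit lane, the carry crosses into slot 1), while `partitioned_add(0xFF, 0x01, 0b111)` gives `0` (four 8-bit lanes, the carry is discarded). In this sketch the partitioning only changes the gap bits, never the adder itself, which is the kind of sharing that avoids a separate scalar ALU.
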
The pages below describe the basic features of each operation and track the relevant bug reports.

* [[dynamic_simd/eq]] aka `__eq__`, not to be confused with nmigen `eq`
* [[dynamic_simd/assign]] nmigen `eq` (assignment)
* [[dynamic_simd/gt]]
* [[dynamic_simd/add]]
* [[dynamic_simd/cat]] - limited capability
* [[dynamic_simd/mul]]
* [[dynamic_simd/shift]]
* [[dynamic_simd/logicops]] some, all, xor

# Integration with nmigen

Dynamic partitioning of signals is not enough on its own. Normal nmigen programs involve conditional decisions, which means if statements and switch statements (`m.If` and `m.Switch`).

With the PartitionedSignal class, basic operations such as `x + y` are functional, producing results of 1x64 bit, 2x32, 4x16, 8x8 or anywhere in between, but what about control and decisions? Here is the "normal" way in which SIMD decisions are performed:

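    # "partitions" is a runtime Signal, so each possible layout has to be
    # tested with m.If/m.Elif. PART_1X64, PART_2X32 are illustrative names.
    with m.If(partitions == PART_1X64):
        with m.If(x > y):
            ...  # do something
    with m.Elif(partitions == PART_2X32):
        with m.If(x[0:32] > y[0:32]):
            ...  # do something on 1st half
        with m.If(x[32:64] > y[32:64]):
            ...  # do something on 2nd half
    with m.Elif(...):
        ...
    with m.Elif(...):
        ...
    # many more lines of repeated, laborious, hand-written
    # SIMD nonsense, all exactly the same except for the
    # for loop and sizes.
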
Clearly this is a totally unmaintainable nightmare of worthless crud. If continued throughout a large project that would be 40,000 lines of code when written without SIMD, it would completely destroy all chances of that project being successful, turning those 40,000 lines into 400,000 lines of unreadable spaghetti.

A much more intelligent approach is needed. What we actually want is:

    with m.If(x > y):  # do a partitioned compare here
        ...  # do something dynamic here

where, behind the scenes, the laborious per-layout cases shown earlier are (conceptually) generated and hidden, so that to all intents and purposes the code looks exactly as it would for any other nmigen Signal.

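Conceptually, the hidden loops compute one comparison per active lane. The plain-Python behavioural model below is only a sketch: the name `partitioned_gt`, the unsigned comparison, and the choice to return one result bit per 8-bit slot (replicated across every slot of a joined lane) are assumptions for illustration, not the PartitionedSignal implementation.

```python
def partitioned_gt(x, y, partition, width=32, lane=8):
    """Hypothetical model of a partitioned unsigned x > y.

    Returns a mask with one bit per 8-bit slot; every slot belonging to a
    lane carries that lane's comparison result.
    """
    nslots = width // lane
    mask = 0
    start = 0  # lowest slot of the lane currently being built
    for i in range(nslots):
        lane_ends = i == nslots - 1 or (partition >> i) & 1
        if lane_ends:
            # the lane spans slots start..i: compare just those bits
            lo, hi = start * lane, (i + 1) * lane
            bits = (1 << (hi - lo)) - 1
            if ((x >> lo) & bits) > ((y >> lo) & bits):
                # replicate the result across all slots of this lane
                mask |= ((1 << (i + 1 - start)) - 1) << start
            start = i + 1
    return mask
```

With `partition=0b000` (one 32-bit lane), `partitioned_gt(0x100, 0xFF, 0b000)` sets all four slot bits; with `partition=0b111` (four 8-bit lanes) the same operands give `0b0010`, because only slot 1 has `x > y`.
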
This means that nmigen needs to "understand" the partitioning in m.If, m.Else and m.Switch at the bare minimum.

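As an illustration of what an `m.If`/`m.Else` pair has to become once its condition is partitioned: each assignment turns into a lane-by-lane selection. This sketch assumes the condition arrives as one bit per 8-bit slot, with every slot of a joined lane carrying the same bit; the helper name `partitioned_if_else` is hypothetical.

```python
def partitioned_if_else(mask, then_val, else_val, width=32, lane=8):
    """Hypothetical per-lane equivalent of:

        with m.If(cond):
            m.d.comb += o.eq(then_val)
        with m.Else():
            m.d.comb += o.eq(else_val)

    where bit i of mask is the condition for 8-bit slot i.
    """
    sel = 0
    for i in range(width // lane):
        if (mask >> i) & 1:
            # widen the slot's condition bit to cover every bit of the slot
            sel |= ((1 << lane) - 1) << (i * lane)
    full = (1 << width) - 1
    return (then_val & sel) | (else_val & ~sel & full)
```

For instance, with `mask=0b0101` slots 0 and 2 take `then_val` and slots 1 and 3 take `else_val`, so `partitioned_if_else(0b0101, 0x11223344, 0xAABBCCDD)` yields `0xAA22CC44`.
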
Analysis of the internals of nmigen shows that m.If, m.Else and m.Switch are all redirected to `Value.cases`. Within that function, Mux and other "global" functions (similar to Python's operator functions) are used. The hypothesis is therefore proposed that if `Value.mux` is added, so that Mux defers to it in an identical way to how `operator.add` calls `__add__`, this may turn out to be all (or most of what) is needed.
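The hypothesis can be illustrated without nmigen at all. In the sketch below, a stand-in global `Mux(sel, val1, val0)` (same argument order as nmigen's `Mux`) first checks whether the selector's type provides a `mux` method and defers to it, much as `operator.add(a, b)` defers to `a.__add__(b)`; otherwise it falls back to ordinary scalar selection. The class `LaneBools` and the list-per-lane representation are hypothetical, chosen only to make the dispatch visible.

```python
class LaneBools:
    """Hypothetical stand-in for a partitioned comparison result:
    one boolean per SIMD lane."""

    def __init__(self, lanes):
        self.lanes = list(lanes)

    def mux(self, val1, val0):
        # per-lane selection: lane i takes val1[i] where its condition holds
        return [a if s else b for s, a, b in zip(self.lanes, val1, val0)]


def Mux(sel, val1, val0):
    """Hypothetical global Mux: defer to sel's own mux() if its type has
    one, the way operator.add(a, b) defers to a.__add__(b)."""
    hook = getattr(type(sel), "mux", None)
    if hook is not None:
        return hook(sel, val1, val0)
    return val1 if sel else val0  # ordinary scalar selection


print(Mux(LaneBools([True, False, True]), [1, 2, 3], [10, 20, 30]))  # [1, 20, 3]
print(Mux(True, "a", "b"))  # a
```

The design point is that existing code calling `Mux` (and hence anything built on it, such as `m.If`) needs no change: a partitioned selector simply supplies its own `mux`.
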