# Dynamic Partitioned SIMD

The Dynamic Partitioned SIMD Signal is effectively a parallelisation
of nmigen's Signal. It is expected to work transparently as if it was
a nmigen Signal, in every way, as a full peer of a nmigen Signal, with
no requirement on the part of the developer to even know that it is
performing parallel dynamically-partitioned operations.

nmigen 32-bit Signal:

    a        : .... .... .... .... (32 bits, illustrated as 4x8)

Dynamically-partitioned 32-bit Signal subdivided into four 8-bit
sections, by 3 partition bits:

    partition:     P    P    P     (3 bits)
    a        : .... .... .... .... (32 bits, illustrated as 4x8)
    exp-a    : ....P....P....P.... (32+3 bits, P=0 if no partition)

Each partitioned section shall act as an independent Signal where the
**partitioning is dynamic at runtime** and may subdivide the above example
into any of the 8 possible combinations of the 3 Partition bits:

    exp-a : ....0....0....0.... 1x 32-bit
    exp-a : ....0....0....1.... 1x 24-bit plus 1x 8-bit
    exp-a : ....0....1....0.... 2x 16-bit
    ...
    ...
    exp-a : ....1....1....0.... 2x 8-bit, 1x 16-bit
    exp-a : ....1....1....1.... 4x 8-bit

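The mapping from partition bits to lane widths shown above can be sketched in plain Python. This is only an illustrative software model, not part of the PartitionedSignal code: the names `lane_widths` and `all_partitionings` are hypothetical, and the bits are read left to right exactly as drawn in the `exp-a` diagrams.

```python
from itertools import product


def lane_widths(partition_bits, section_width=8):
    """Return the lane widths, left to right as drawn in the diagrams.

    partition_bits: sequence of 0/1 values, one per partition point.
    A 1 ends the current lane; a 0 merges the next section into it.
    (Hypothetical helper, for illustration only.)
    """
    widths = []
    current = section_width
    for p in partition_bits:
        if p:
            widths.append(current)      # partition here: close this lane
            current = section_width     # start a fresh lane
        else:
            current += section_width    # no partition: lane keeps growing
    widths.append(current)              # the last lane is always closed
    return widths


def all_partitionings(n_points=3, section_width=8):
    """Map every combination of partition bits to its lane widths."""
    return {bits: lane_widths(bits, section_width)
            for bits in product((0, 1), repeat=n_points)}
```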
Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=458> m.If/Switch
* <https://bugs.libre-soc.org/show_bug.cgi?id=115> top level SIMD
* <https://bugs.libre-soc.org/show_bug.cgi?id=594> RFC for nmigen integration
* <https://bugs.libre-soc.org/show_bug.cgi?id=565> Formal proof of PartitionedSignal
* <https://bugs.libre-soc.org/show_bug.cgi?id=596> Formal proof of PartitionedSignal nmigen interaction

To save hugely on gate count the normal practice of having separate scalar ALUs and separate SIMD ALUs is not followed.

Instead a suite of "partition points" identical in fashion to the Aspex Microelectronics ASP (Array-String-Architecture) architecture is deployed.

Basic principle: when all partition gates are open the ALU is subdivided into isolated and independent 8 bit SIMD ALUs. Whenever any one gate is closed, the relevant 8 bit "part-results" are chained together in a downstream cascade to create 16 bit, 32 bit, 64 bit and 128 bit compound results.

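As a rough behavioural illustration of this principle, here is a plain-Python model of a partitioned add. It is a sketch under stated assumptions, not the gate-level design: the name `partitioned_add` is hypothetical, partition bit `i` is assumed to sit between 8-bit section `i` and section `i+1` counting from the least significant end, and a bit value of 1 is assumed to mean "gate open" (sections isolated).

```python
def partitioned_add(a, b, partition_bits, section_width=8):
    """Add a and b section by section, chaining carries across closed gates.

    partition_bits[i] == 1: gate open, no carry from section i to i+1.
    partition_bits[i] == 0: gate closed, carry ripples into section i+1.
    (Hypothetical software model; carry out of the top section is dropped.)
    """
    mask = (1 << section_width) - 1
    n_sections = len(partition_bits) + 1
    result = 0
    carry = 0
    for i in range(n_sections):
        shift = i * section_width
        part_a = (a >> shift) & mask
        part_b = (b >> shift) & mask
        total = part_a + part_b + carry
        result |= (total & mask) << shift       # the 8-bit "part-result"
        if i < len(partition_bits) and partition_bits[i] == 0:
            carry = total >> section_width      # chain into the next section
        else:
            carry = 0                           # isolated: discard the carry
    return result
```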
The pages below describe the basic features of each operation and track the relevant bug reports.

* [[dynamic_simd/eq]]
* [[dynamic_simd/gt]]
* [[dynamic_simd/add]]
* [[dynamic_simd/mul]]
* [[dynamic_simd/shift]]
* [[dynamic_simd/logicops]] some all xor

# Integration with nmigen

Dynamic partitioning of signals is not enough on its own. Normal nmigen programs involve conditional decisions: that means if statements and switch statements.

With the PartitionedSignal class, basic operations such as `x + y` are functional, producing results 1x64 bit, or 2x32 or 4x16 or 8x8 or anywhere in between, but what about control and decisions? Here is the "normal" way in which SIMD decisions are performed:

    if partitions == 1x64:
        with m.If(x > y):
            do something
    elif partitions == 2x32:
        with m.If(x[0:32] > y[0:32]):
            do something on 1st half
        elif ...
    elif ...
    # many more lines of repeated laborious hand written
    # SIMD nonsense all exactly the same except for the
    # for loop and sizes.

Clearly this is a total unmaintainable nightmare of worthless crud which, if continued throughout a large project with 40,000 lines of code when written without SIMD, would completely destroy all chances of that project being successful by turning 40,000 lines into 400,000 lines of unreadable spaghetti.

A much more intelligent approach is needed. What we actually want is:

    with m.If(x > y): # do a partitioned compare here
        do something dynamic here

where behind the scenes the above laborious for-loops are (conceptually) created and hidden, so that to all intents and purposes this looks exactly like any other nmigen Signal.

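To make "behind the scenes" a little more concrete, the following plain-Python model shows what a partitioned compare followed by a per-lane select would compute. It is a software sketch only, not nmigen code: `lane_bounds`, `partitioned_gt` and `partitioned_mux` are hypothetical names, lanes are numbered from the least significant end, comparisons are assumed unsigned, and a partition bit of 1 is assumed to split two neighbouring 8-bit sections.

```python
def lane_bounds(partition_bits, section_width=8):
    """Return (start_bit, width) for each lane, least significant first."""
    bounds = []
    start, width = 0, section_width
    for p in partition_bits:
        if p:
            bounds.append((start, width))   # partition: lane ends here
            start += width
            width = section_width
        else:
            width += section_width          # no partition: lane widens
    bounds.append((start, width))
    return bounds


def partitioned_gt(x, y, partition_bits, section_width=8):
    """Unsigned x > y evaluated independently in every lane."""
    flags = []
    for start, width in lane_bounds(partition_bits, section_width):
        mask = (1 << width) - 1
        flags.append(((x >> start) & mask) > ((y >> start) & mask))
    return flags


def partitioned_mux(flags, if_true, if_false, partition_bits, section_width=8):
    """Per lane, take the bits of if_true where the flag is set, else if_false."""
    result = 0
    bounds = lane_bounds(partition_bits, section_width)
    for flag, (start, width) in zip(flags, bounds):
        mask = ((1 << width) - 1) << start
        result |= (if_true if flag else if_false) & mask
    return result
```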
This means that nmigen needs to "understand" the partitioning, in m.If, m.Else and m.Switch, at the bare minimum.

Analysis of the internals of nmigen shows that m.If, m.Else and m.Switch are all redirected to `Value.cases`. Within that function, Mux and other "global" functions (similar to Python operator functions) are used. The hypothesis is therefore proposed that if `Value.mux` is added in an identical way to how `operator.add` calls `__add__` this may turn out to be all that (or most of what) is needed.
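
The dispatch pattern behind this hypothesis can be illustrated in plain Python. The classes below are toy stand-ins, not nmigen's `Value` or the real PartitionedSignal, and the `mux` method is only the hook the hypothesis proposes; nothing here claims that nmigen currently provides it. The point is merely that a global `Mux` function which delegates to a method on its selector lets a new type supply partitioned behaviour without callers changing.

```python
class ScalarCond:
    """Toy stand-in for an ordinary one-bit condition (hypothetical)."""

    def __init__(self, flag):
        self.flag = bool(flag)

    def mux(self, if_true, if_false):
        # Ordinary behaviour: one decision for the whole value.
        return if_true if self.flag else if_false


class PartitionedCond:
    """Toy stand-in for a per-lane condition (hypothetical)."""

    def __init__(self, lane_flags):
        self.lane_flags = [bool(f) for f in lane_flags]

    def mux(self, if_true, if_false):
        # Partitioned behaviour: an independent decision in every lane.
        return [t if f else e
                for f, t, e in zip(self.lane_flags, if_true, if_false)]


def Mux(sel, if_true, if_false):
    """Global function that delegates to the selector's own mux method,
    in the same spirit as operator.add(a, b) ending up in a.__add__(b)."""
    return type(sel).mux(sel, if_true, if_false)
```

With this arrangement `Mux(ScalarCond(True), 1, 2)` takes the scalar path, while `Mux(PartitionedCond([True, False]), [1, 2], [3, 4])` selects lane by lane, and the calling code is identical in both cases.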