# Dynamic Partitioned SIMD

The Dynamic Partitioned SIMD Signal is effectively a parallelisation
of nmigen's Signal.  It is expected to work transparently as if it were
a nmigen Signal, in every way, as a full peer of a nmigen Signal, with
no requirement on the part of the developer to even know that it is
performing parallel dynamically-partitioned operations.

nmigen 32-bit Signal:

    a        : .... .... .... .... (32 bits, illustrated as 4x8)

Dynamically-partitioned 32-bit Signal subdivided into four 8-bit
sections, by 3 partition bits:

    partition:     P    P    P     (3 bits)
    a        : .... .... .... .... (32 bits, illustrated as 4x8)
    exp-a    : ....P....P....P.... (32+3 bits, P=0 if no partition)

Each partitioned section shall act as an independent Signal where the
**partitioning is dynamic at runtime** and may subdivide the above example
into all 8 possible combinations of the 3 Partition bits:

    exp-a    : ....0....0....0.... 1x 32-bit
    exp-a    : ....0....0....1.... 1x 24-bit plus 1x 8-bit
    exp-a    : ....0....1....0.... 2x 16-bit
    ...
    ...
    exp-a    : ....1....1....0.... 2x 8-bit, 1x 16-bit
    exp-a    : ....1....1....1.... 4x 8-bit

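As an illustration of the combinations listed above, the following plain-Python sketch enumerates every setting of the 3 partition bits and reports the widths of the resulting independent sections. It is not part of the PartitionedSignal code: the function name `section_widths`, and the convention that the tuple of bits is read in the same left-to-right order as the `exp-a` diagram, are assumptions made purely for this example.

```python
from itertools import product


def section_widths(partition_bits, lane_bits=8):
    """Return the widths of the independent sections, left to right.

    partition_bits is read in the same left-to-right order as the
    exp-a diagram (an assumption of this sketch): 1 means a partition
    is present at that point, 0 means the neighbouring 8-bit sections
    are joined.
    """
    widths = []
    current = lane_bits              # the first 8-bit section always exists
    for p in partition_bits:
        if p:                        # partition present: close this section
            widths.append(current)
            current = lane_bits
        else:                        # no partition: next 8 bits join this one
            current += lane_bits
    widths.append(current)           # close the final section
    return widths


# Enumerate all 8 combinations of the 3 partition bits.
for bits in product((0, 1), repeat=3):
    print("".join(str(b) for b in bits), section_widths(bits))
```

Whatever the setting of the partition bits, the section widths always add up to the full 32 bits of the underlying Signal.
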

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=458> m.If/Switch
* <https://bugs.libre-soc.org/show_bug.cgi?id=115> top level SIMD
* <https://bugs.libre-soc.org/show_bug.cgi?id=707> Limited Cat
* <https://bugs.libre-soc.org/show_bug.cgi?id=594> RFC for nmigen integration
* <https://bugs.libre-soc.org/show_bug.cgi?id=565> Formal proof of PartitionedSignal
* <https://bugs.libre-soc.org/show_bug.cgi?id=596> Formal proof of PartitionedSignal nmigen interaction


To save hugely on gate count, the normal practice of having separate scalar ALUs and separate SIMD ALUs is not followed.

Instead a suite of "partition points", identical in fashion to those of the Aspex Microelectronics ASP (Associative String Processor) architecture, is deployed.

Basic principle: when all partition gates are open the ALU is subdivided into isolated and independent 8 bit SIMD ALUs.  Whenever any one gate is closed, the relevant 8 bit "part-results" are chained together in a downstream cascade to create 16 bit, 32 bit, 64 bit and 128 bit compound results.

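The gate principle can be modelled in plain Python using addition as the example operation. This is only a behavioural sketch under stated assumptions, not the hardware design: the function name `partitioned_add` is hypothetical, lane 0 is taken to be the least significant 8 bits, and `partition_bits[i]` is taken to be the gate between lane `i` and lane `i+1`, with 1 meaning open (lanes isolated) and 0 meaning closed (lanes chained).

```python
def partitioned_add(a, b, partition_bits, lane_bits=8):
    """Add a and b as a set of dynamically partitioned lanes.

    Assumptions of this sketch: lane 0 is the least significant lane
    and partition_bits[i] is the gate between lane i and lane i+1.
    An open gate (1) isolates the lanes, a closed gate (0) chains the
    carry from lane i into lane i+1.
    """
    lanes = len(partition_bits) + 1
    mask = (1 << lane_bits) - 1
    result = 0
    carry = 0
    for i in range(lanes):
        lane_a = (a >> (i * lane_bits)) & mask
        lane_b = (b >> (i * lane_bits)) & mask
        total = lane_a + lane_b + carry          # 8 bit part-result plus carry-in
        result |= (total & mask) << (i * lane_bits)
        carry_out = total >> lane_bits
        if i < lanes - 1 and partition_bits[i] == 0:
            carry = carry_out                    # gate closed: chain downstream
        else:
            carry = 0                            # gate open: carry is discarded
    return result


# All gates closed: behaves as one 32 bit adder.
print(hex(partitioned_add(0x0000FFFF, 0x00000001, (0, 0, 0))))  # 0x10000
# Only the lowest gate open: the carry out of lane 0 is discarded.
print(hex(partitioned_add(0x0000FFFF, 0x00000001, (1, 0, 0))))  # 0xff00
```
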
Pages below describe the basic features of each operator and track the relevant bugreports.

* [[dynamic_simd/eq]] aka `__eq__`, not to be confused with nmigen `eq`
* [[dynamic_simd/assign]] nmigen `eq` (assignment)
* [[dynamic_simd/gt]]
* [[dynamic_simd/add]]
* [[dynamic_simd/cat]] - limited capability
* [[dynamic_simd/mul]]
* [[dynamic_simd/shift]]
* [[dynamic_simd/logicops]] `some`, `all`, `xor`

# Integration with nmigen

Dynamic partitioning of signals is not enough on its own. Normal nmigen programs involve conditional decisions, which means if statements and switch statements.

With the PartitionedSignal class, basic operations such as `x + y` are functional, producing results as 1x64 bit, 2x32, 4x16, 8x8, or anywhere in between, but what about control and decisions? Here is the "normal" way in which SIMD decisions are performed:

    if partitions == 1x64:
         with m.If(x > y):
              do something
    elif partitions == 2x32:
         with m.If(x[0:32] > y[0:32]):
              do something on 1st half
         elif ...
    elif ...
    # many more lines of repeated laborious hand written
    # SIMD nonsense all exactly the same except for the
    # for loop and sizes.

Clearly this is a totally unmaintainable nightmare of worthless crud which, if continued throughout a large project that is 40,000 lines of code when written without SIMD, would completely destroy all chances of that project being successful by turning 40,000 lines into 400,000 lines of unreadable spaghetti.

A much more intelligent approach is needed. What we actually want is:

    with m.If(x > y): # do a partitioned compare here
         do something dynamic here

where behind the scenes the above laborious for-loops are (conceptually) created and hidden, so that to all intents and purposes this looks exactly like any other nmigen Signal.

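To make the intended behaviour concrete, here is a plain-Python model of what a partitioned `x > y` followed by a per-section decision would compute. It is a sketch under stated assumptions and does not use nmigen: the helper names `sections`, `field`, `partitioned_gt` and `partitioned_mux` are hypothetical, lane 0 is taken to be the least significant 8 bits, and `partition_bits[i]` is the partition point between lane `i` and lane `i+1`.

```python
def sections(partition_bits, lane_bits=8):
    """Yield (low_bit, width) for each independent section, lowest first."""
    low, width = 0, lane_bits
    for p in partition_bits:
        if p:                          # partition present: section ends here
            yield low, width
            low, width = low + width, lane_bits
        else:                          # no partition: section keeps growing
            width += lane_bits
    yield low, width


def field(value, low, width):
    """Extract `width` bits of `value`, starting at bit `low`."""
    return (value >> low) & ((1 << width) - 1)


def partitioned_gt(x, y, partition_bits):
    """One unsigned comparison result per section."""
    return [field(x, lo, w) > field(y, lo, w)
            for lo, w in sections(partition_bits)]


def partitioned_mux(conds, if_true, if_false, partition_bits):
    """Per section, take the bits of if_true where cond holds, else if_false."""
    out = 0
    for cond, (lo, w) in zip(conds, sections(partition_bits)):
        chosen = if_true if cond else if_false
        out |= field(chosen, lo, w) << lo
    return out


x, y = 0x01FF0302, 0x02000401
for parts in [(0, 0, 0), (1, 1, 1)]:
    conds = partitioned_gt(x, y, parts)
    # "if x > y: result = x, else: result = y", decided once per section
    print(parts, hex(partitioned_mux(conds, x, y, parts)))
```
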
This means that nmigen needs to "understand" the partitioning, in m.If, m.Else and m.Switch, at the bare minimum.

Analysis of the internals of nmigen shows that m.If, m.Else and m.Switch are all redirected to `Value.cases`.  Within that function, Mux and other "global" functions (similar to python operator functions) are used.  The hypothesis is therefore proposed that if `Value.mux` is added, in an identical way to how `operator.add` calls `__add__`, this may turn out to be all that (or most of what) is needed.
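
The dispatch pattern behind this hypothesis can be shown in plain Python. The sketch below is not nmigen code and makes no claim about nmigen's internals: the classes `Scalar` and `Lanes`, the global function `mux` and the hook name `__mux__` are all hypothetical stand-ins. It only demonstrates that a global function which defers to a method on its first argument lets a partitioned type substitute its own behaviour, in the same way that `operator.add` ends up in `__add__`.

```python
import operator


class Scalar:
    """Stand-in for an ordinary value: one selector, one result."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return Scalar(self.value + other.value)

    def __mux__(self, if_true, if_false):
        # A single decision covers the whole value.
        return if_true if self.value else if_false


class Lanes:
    """Stand-in for a partitioned value: one entry per section."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        return Lanes(a + b for a, b in zip(self.values, other.values))

    def __mux__(self, if_true, if_false):
        # One decision per section, made independently.
        return Lanes(t if s else f
                     for s, t, f in zip(self.values,
                                        if_true.values,
                                        if_false.values))


def mux(sel, if_true, if_false):
    """Global function that defers to the selector's own type
    (hypothetical hook), as operator.add defers to __add__."""
    return type(sel).__mux__(sel, if_true, if_false)


# operator.add reaches Lanes.__add__ without knowing about lanes...
print(operator.add(Lanes([1, 2]), Lanes([10, 20])).values)
# ...and mux reaches Lanes.__mux__ in exactly the same way.
print(mux(Lanes([1, 0]), Lanes([5, 6]), Lanes([7, 8])).values)
```

With such a hook, code that calls `mux` does not need to know whether its selector is partitioned, which is the property wanted for m.If, m.Else and m.Switch.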