From 0170fa3e4a03e2e92d3651e563a3ce4b9ba5671b Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Mon, 17 Dec 2018 15:56:41 +0000
Subject: [PATCH] add update 005

---
 updates/005_2018dec14_simd_without_simd.mdwn | 62 ++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/updates/005_2018dec14_simd_without_simd.mdwn b/updates/005_2018dec14_simd_without_simd.mdwn
index ce952f7..69f19af 100644
--- a/updates/005_2018dec14_simd_without_simd.mdwn
+++ b/updates/005_2018dec14_simd_without_simd.mdwn
@@ -61,4 +61,66 @@ the Register and Function Unit Matrices?  It looks like this:
 
 {{reorder_alias_bytemask_scheme.png}}
 
+As you may recall from the previous updates about Scoreboards, the key
+is not the "scoreboard" itself: it is the Register-to-Function-Unit and
+FU-to-FU Dependency Matrices, and they are widely misunderstood.
+So let's explain this diagram.  Firstly, in purple in the bottom left
+is a massive matrix of FU to FU, just as with the standard CDC 6600,
+except now there are separate 32-bit FUs, 16-bit FUs, and 8-bit FUs.
+In this way, we can have 32-bit ADD depending on and waiting for
+an 8-bit computation, or 16-bit MUL on a 32-bit SQRT and so on.  Nothing
+immediately obviously different there.
 
+Likewise, in the bottom right, in red, we see matrices that have
+FU along rows, and Registers along the columns, exactly again as with
+the CDC 6600 standard scoreboard: however, again, we note that
+because there are separate 32-bit FUs and separate 16-bit and 8-bit
+FUs, there are *three* separate sets of FU-to-Register Matrices.
+Note also that these are kept separate, where you might expect them
+to be grouped together.  However, they're *not* independent, and that's
+where the diagram at the top (middle) comes in.
+
+The diagram at the top says, in words, "if you need a 32-bit register
+for an operation (using a 32-bit Function Unit), the 16-bit and 8-bit
+Function Units *also* connected to that exact same register **must**
+be prevented from using it.  Also, if you need 8 bits of a register,
+whilst it does not prevent the other bytes of the register from being
+used, it *does* prevent the overlapping 16-bit portion **and the 32-bit
+and the 64-bit** portions of that same named register from being used".
+
+This "cascading" relationship is absolutely essential to understand.
+If you need Register R1 (all of it), you **cannot** allocate any part of
+that register to any 32-bit, 16-bit or 8-bit operation.  This is
+common sense!  However, if you use the lowest byte (byte 1), you can still
+use the top three 16-bit portions of R1, and you can also still use byte 2.
+This is also common sense!
+
+So in fact it's quite simple, and this "cascade" is easily propagated
+down to the Function Unit Dependency Matrices, stopping 32-bit operations
+from overwriting 8-bit ones and vice versa.
+
+The fourth part is the grid in green, in the top left corner.  This is
+a "virtual" to "real" one-bit table.  It's here because the size of
+these matrices is so enormous that there is deep concern about the line
+driver strength, as well as the actual size.  128 registers means
+that one single gate, when it goes high or low, has to "drive" the
+input of 128 other gates.  That takes longer and longer to do, the higher
+the number of gates, so it becomes a critical factor in determining the
+maximum speed of the entire processor.  We will have to keep an eye
+on this.
+
+So, to keep the FU to Register matrix size down, this "virtual" register
+concept was introduced.  Only one bit in each row of the green table
+may be active: it says, for example, "IR1 actually represents that there
+is an instruction being executed using R3".  This does mean however that
+if this table does not have enough rows (not enough IRs), the processor
+has to stall until an instruction is completed, so that one register
+becomes free.  Again, another thing to keep an eye on in simulations.
+
+The second major concern is the purple matrix: the FU-to-FU one.  Where
+previously FU1 would cover all ADDs, FU2 all MUL operations, FU3 BRANCH
+and so on, we now have to multiply the number of FUs by **four** (64-bit,
+32-bit, 16-bit and 8-bit ops).  Because the FU-to-FU Matrix has one row
+and one column per FU, its size goes up by four squared: a staggering
+**sixteen** times.  This is not really acceptable, so we have to do
+something different.
-- 
2.30.2