# Modernising 1960s Computer Technology: what can be learned from the CDC 6600

Firstly, many thanks to [Heise.de](https://www.heise.de/newsticker/meldung/Mobilprozessor-mit-freier-GPU-Libre-RISC-V-M-Class-geplant-4242802.html) for publishing a story on this project. I replied to some of the [Heise Forum](https://www.heise.de/forum/heise-online/News-Kommentare/Mobilprozessor-mit-freier-GPU-Libre-RISC-V-M-Class-geplant/forum-414986/comment/) comments there, endeavouring to use translation software out of respect for the fact that the forum is in German.

In this update, following on from the analysis of the Tomasulo Algorithm, by a process of osmosis I was finally able to make out a light at the end of the "Scoreboard" tunnel, and it is not an oncoming train. The conversations with [Mitch Alsup](https://groups.google.com/d/msg/comp.arch/w5fUBkrcw-s/-9JNF0cUCAAJ) are becoming clearer.

In the previous update, I really did not like the [Scoreboard](https://en.wikipedia.org/wiki/Scoreboarding) technique for doing out-of-order superscalar execution because, *as described*, it is hopelessly inadequate:

* there is no roll-back method for exceptions;
* there is no method for coping with register "hazards" (Read-after-Write and so on), so register "renaming" has to be done as a precursor step;
* there is no way to do branch prediction;
* only a single LOAD/STORE can be done at any one time.

The only *well-known* documentation on the CDC 6600 Scoreboarding technique is the 1967 patent. Here's the kicker: the patent *does not* describe the key strategic part of Scoreboarding, the Dependency Matrices, which are what make it so powerful and so much more power-efficient than the Tomasulo Algorithm combined with Reorder Buffers.

Before getting to that stage, I thought it would be a good idea to make people aware of a book that Mitch told me about, called "Design of a Computer: The Control Data 6600" by James Thornton, who worked with Seymour Cray on the 6600.
The 6600 was literally constructed from PCB modules using hand-soldered transistors. Memory was magnetic rings (which is where the term "core memory" comes from), and the bootloader was a bank of toggle switches. In 2002, Tom Uban sought permission from James Thornton and his wife to make the book available online, as, historically, the CDC 6600 is quite literally the precursor to modern supercomputing:

[[design_of_a_computer_6600_permission.jpg]]

I particularly wanted to show the Dependency Matrix, which is the key strategic part of the Scoreboard:

[[design_of_a_computer_6600.jpg]]

Basically, the patent shows a table with src1 and src2 and "ready" signals: what it does *not* show is the "Go Read" and "Go Write" signals, nor the way in which one Function Unit blocks others via the Dependency Matrix.

It is well known that the Tomasulo Reorder Buffer requires a CAM on the Destination Register, which is power-hungry and expensive; the academic literature describes this as data coming "to" the destination. The Scoreboard technique is described as data coming "from" source registers. However, because the Dependency Matrix is left out of these discussions, what they fail to mention is that there are *multiple single-line* source wires, achieving the exact same purpose as the Reorder Buffer's CAM with *far less power and die area*.

Not only that: it is quite easy to add incremental register-renaming tags on top of the Scoreboard plus Dependency Matrix, again with no need for a CAM. And Mitch describes, in an unpublished book chapter, several techniques that each bring in capabilities usually exclusively associated with Reorder Buffers, such as Branch Prediction, speculative execution, precise exceptions and multi-issue LOAD/STORE hazard avoidance.
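To make the idea concrete, here is a minimal Python sketch (entirely my own illustration, not taken from Thornton's book or Mitch's chapter, and simplified to Read-after-Write hazards only): each Function Unit row in the matrix carries one single-bit "I must wait for you" wire per other Function Unit, and "Go Read" may only be raised once every blocking wire has dropped. No tag broadcast or CAM match is involved, just per-pair wires.

```python
# Hypothetical sketch of a Function-Unit Dependency Matrix (names and
# structure are mine, for illustration).  matrix[i][j] is a single
# wire: "FU i must wait for FU j".  Contrast with a Tomasulo Reorder
# Buffer, where a CAM matches broadcast destination-register tags.

class FunctionUnit:
    def __init__(self, name, dest, srcs):
        self.name = name
        self.dest = dest      # destination register number
        self.srcs = srcs      # source register numbers
        self.done = False     # has the result been written back?

def build_matrix(fus):
    """matrix[i][j] = True when FU i must wait for an earlier FU j,
    because FU j writes a register that FU i reads (RAW hazard).
    Simplified: WAR/WAW hazards are omitted here."""
    n = len(fus)
    m = [[False] * n for _ in range(n)]
    for i, fi in enumerate(fus):
        for j, fj in enumerate(fus):
            if j < i and fj.dest in fi.srcs:   # j issued earlier
                m[i][j] = True
    return m

def go_read(fus, m, i):
    """'Go Read' may be raised only when every FU that blocks FU i
    has written back its result (all blocking wires dropped)."""
    return all(fus[j].done for j in range(len(fus)) if m[i][j])
```

Example: a MUL writing register 6, followed by an ADD reading register 6, produces a single matrix bit blocking the ADD until the MUL's write-back; clearing `done` on the MUL releases the ADD's "Go Read".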
The diagram below is reproduced with Mitch's permission:

[[mitch_ld_st_augmentation.jpg]]

This high-level diagram includes some subtle modifications that augment a standard CDC 6600 design to allow speculative execution. A "Schroedinger" wire is added ("neither alive nor dead") which, very simply put, prohibits the Function Unit from *writing* its result. Because the "Read" signals are independent of "Write" (something that is, again, completely missing from academic discussions of 6600 Scoreboards), an instruction may *begin* execution but is prevented from *completing* it. All that is required is to add one extra line to the Dependency Matrix per "branch" that is to be speculatively executed, treating the branch, in effect, just like any other Function Unit.

Mitch also has a high-level diagram of an additional LOAD/STORE Matrix with, again, extremely simple rules: LOADs block STOREs, and STOREs block LOADs, and the resulting "Read / Write" signals are then passed down to the Function Unit Dependency Matrix as well. The blocking rules need only be based on "there is no possibility of a conflict" rather than "on which exact and precise address does a conflict occur". This in turn means that the number of address bits needed to detect a conflict may be significantly reduced. Interestingly, the RISC-V "Fence" instruction rules are based on the same idea.

So this is just amazing. Let's recap. It's 2018, there are absolutely zero Libre SoCs in existence anywhere on our planet of almost 8 billion people, and we are looking for inspiration, on how to make a modern, power-efficient 3D-capable processor, at a literally 55-year-old computer design that occupied an entire room and was hand-built with transistors. Not only that: the project has accidentally unearthed incredibly valuable historic processor design information that has eluded Intel and ARM (billion-dollar companies), as well as the academic community, for several decades.
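Returning to the LOAD/STORE Matrix rules for a moment, the "reduced address bits" point can be sketched in a few lines of Python (my own illustration, not Mitch's diagram): two accesses can only touch the same location if *every* address bit matches, so comparing just a few low-order bits is sound. A mismatch on those bits proves "there is no possibility of a conflict"; a match conservatively blocks, even when the full addresses actually differ.

```python
# Hypothetical illustration of conservative LOAD/STORE conflict
# detection using only a few low-order address bits (the exact bit
# count in a real design is a tuning choice, assumed here).

ADDR_BITS_CHECKED = 6   # compare only address bits [5:0]

def may_conflict(addr_a, addr_b, bits=ADDR_BITS_CHECKED):
    """False means "no possibility of a conflict" (proven by the
    low-bit mismatch); True means "cannot rule it out", which may be
    a false positive for addresses that differ only in high bits."""
    mask = (1 << bits) - 1
    return (addr_a & mask) == (addr_b & mask)

def must_block(op_a, addr_a, op_b, addr_b):
    """LOADs block STOREs and STOREs block LOADs (and STOREs block
    STOREs) whenever a conflict cannot be ruled out; two LOADs never
    need to block each other."""
    if op_a == "LOAD" and op_b == "LOAD":
        return False
    return may_conflict(addr_a, addr_b)
```

The occasional false positive (two addresses sharing their low bits) merely costs a needless stall; correctness is preserved because blocking is always the safe direction, which is exactly the "no possibility of a conflict" framing above.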
I'd like to take a minute to especially thank Mitch Alsup for his time in these ongoing discussions, without which there would be absolutely no chance that I could have learned about, let alone understood, any of the above. As I mentioned in the very first update, new processor designs get one shot at success. Basing the core of the design on a 55-year-old, well-documented, extremely compact and efficient design is a reasonable strategy: it is just that, without Mitch's help, there would have been no way to understand the 6600's true value.

Bottom line: we do not need to follow Intel's power-inefficient lead here.