# Modernising 1960s Computer Technology: what can be learned from the CDC 6600

Firstly, many thanks to [Heise.de](https://www.heise.de/newsticker/meldung/Mobilprozessor-mit-freier-GPU-Libre-RISC-V-M-Class-geplant-4242802.html) for publishing a story on this project. I replied to some of the [Heise Forum](https://www.heise.de/forum/heise-online/News-Kommentare/Mobilprozessor-mit-freier-GPU-Libre-RISC-V-M-Class-geplant/forum-414986/comment/) comments there, endeavouring to use translation software out of respect for the fact that the forum is in German.

In this update, following on from the analysis of the Tomasulo Algorithm, by a process of osmosis I was finally able to make out a light at the end of the "Scoreboard" tunnel, and it is not an oncoming train. The conversations with [Mitch Alsup](https://groups.google.com/d/msg/comp.arch/w5fUBkrcw-s/-9JNF0cUCAAJ) are becoming clearer.

In the previous update, I really did not like the [Scoreboard](https://en.wikipedia.org/wiki/Scoreboarding) technique for doing out-of-order superscalar execution because, *as described*, it is hopelessly inadequate:

* there is no roll-back method for exceptions;
* there is no method for coping with register "hazards" (Read-after-Write and so on), so register "renaming" has to be done as a precursor step;
* there is no way to do branch prediction;
* only a single LOAD/STORE can be done at any one time.

The only *well-known* documentation on the CDC 6600 Scoreboarding technique is the 1967 patent. Here's the kicker: the patent *does not* describe the key strategic part of Scoreboarding, the Dependency Matrices, which are what make it so powerful and so much more power-efficient than the Tomasulo Algorithm combined with Reorder Buffers.

Before getting to that stage, I thought it would be a good idea to make people aware of a book that Mitch told me about, called "Design of a Computer: The Control Data 6600" by James Thornton, who worked with Seymour Cray on the 6600.
The 6600 was literally constructed from PCB modules using hand-soldered transistors. Memory was magnetic rings (which is where the term "core memory" comes from), and the bootloader was a bank of toggle switches. In 2002, Tom Uban sought permission from James Thornton and his wife to make the book available online, as, historically, the CDC 6600 is quite literally the precursor to modern supercomputing:

[[design_of_a_computer_6600_permission.jpg]]

I particularly wanted to show the Dependency Matrix, which is the key strategic part of the Scoreboard:

[[design_of_a_computer_6600.jpg]]

Basically, the patent shows a table with src1 and src2 and "ready" signals: what it does *not* show is the "Go Read" and "Go Write" signals, nor the way in which one Function Unit blocks others via the Dependency Matrix.

It is well known that the Tomasulo Reorder Buffer requires a CAM on the Destination Register, which is power-hungry and expensive; the academic literature describes this as data coming "to" the destination. The Scoreboard technique is described as data coming "from" source registers. However, because the Dependency Matrix is left out of these discussions, what they fail to mention is that there are *multiple single-line* source wires, achieving the exact same purpose as the Reorder Buffer's CAM with *far less power and die area*.

Not only that: it is quite easy to add incremental register-renaming tags on top of the Scoreboard plus Dependency Matrix, again with no need for a CAM. And Mitch describes, in an unpublished book chapter, several techniques that each bring in capabilities usually exclusively associated with Reorder Buffers, such as Branch Prediction, speculative execution, precise exceptions and multi-issue LOAD/STORE hazard avoidance.
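To make the idea concrete, here is a minimal Python sketch (entirely my own illustration, not taken from Thornton's book or Mitch's chapter, and simplified to Read-after-Write hazards only): each Function Unit row in the matrix carries one single-bit "I must wait for you" wire per other Function Unit, and "Go Read" may only be raised once every blocking wire has dropped. No tag broadcast or CAM match is involved, just per-pair wires.

```python
# Hypothetical sketch of a Function-Unit Dependency Matrix (names and
# structure are mine, for illustration).  matrix[i][j] is a single
# wire: "FU i must wait for FU j".  Contrast with a Tomasulo Reorder
# Buffer, where a CAM matches broadcast destination-register tags.

class FunctionUnit:
    def __init__(self, name, dest, srcs):
        self.name = name
        self.dest = dest      # destination register number
        self.srcs = srcs      # source register numbers
        self.done = False     # has the result been written back?

def build_matrix(fus):
    """matrix[i][j] = True when FU i must wait for an earlier FU j,
    because FU j writes a register that FU i reads (RAW hazard).
    Simplified: WAR/WAW hazards are omitted here."""
    n = len(fus)
    m = [[False] * n for _ in range(n)]
    for i, fi in enumerate(fus):
        for j, fj in enumerate(fus):
            if j < i and fj.dest in fi.srcs:   # j issued earlier
                m[i][j] = True
    return m

def go_read(fus, m, i):
    """'Go Read' may be raised only when every FU that blocks FU i
    has written back its result (all blocking wires dropped)."""
    return all(fus[j].done for j in range(len(fus)) if m[i][j])
```

Example: a MUL writing register 6, followed by an ADD reading register 6, produces a single matrix bit blocking the ADD until the MUL's write-back; clearing `done` on the MUL releases the ADD's "Go Read".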
The diagram below is reproduced with Mitch's permission:

[[mitch_ld_st_augmentation.jpg]]

This high-level diagram includes some subtle modifications that augment a standard CDC 6600 design to allow speculative execution. A "Schroedinger" wire is added ("neither alive nor dead") which, very simply put, prohibits the Function Unit from *writing* its result. Because the "Read" signals are independent of "Write" (something that is, again, completely missing from academic discussions of 6600 Scoreboards), an instruction may *begin* execution but is prevented from *completing* it. All that is required is to add one extra line to the Dependency Matrix per "branch" that is to be speculatively executed, treating the branch, in effect, just like any other Function Unit.

Mitch also has a high-level diagram of an additional LOAD/STORE Matrix with, again, extremely simple rules: LOADs block STOREs, and STOREs block LOADs, and the resulting "Read / Write" signals are then passed down to the Function Unit Dependency Matrix as well. The blocking rules need only be based on "there is no possibility of a conflict" rather than "on which exact and precise address does a conflict occur". This in turn means that the number of address bits needed to detect a conflict may be significantly reduced. Interestingly, the RISC-V "Fence" instruction rules are based on the same idea.

So this is just amazing. Let's recap. It's 2018, there are absolutely zero Libre SoCs in existence anywhere on our planet of almost 8 billion people, and we are looking for inspiration, on how to make a modern, power-efficient 3D-capable processor, at a literally 55-year-old computer design that occupied an entire room and was hand-built with transistors. Not only that: the project has accidentally unearthed incredibly valuable historic processor design information that has eluded Intel and ARM (billion-dollar companies), as well as the academic community, for several decades.
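Returning to the LOAD/STORE Matrix rules for a moment, the "reduced address bits" point can be sketched in a few lines of Python (my own illustration, not Mitch's diagram): two accesses can only touch the same location if *every* address bit matches, so comparing just a few low-order bits is sound. A mismatch on those bits proves "there is no possibility of a conflict"; a match conservatively blocks, even when the full addresses actually differ.

```python
# Hypothetical illustration of conservative LOAD/STORE conflict
# detection using only a few low-order address bits (the exact bit
# count in a real design is a tuning choice, assumed here).

ADDR_BITS_CHECKED = 6   # compare only address bits [5:0]

def may_conflict(addr_a, addr_b, bits=ADDR_BITS_CHECKED):
    """False means "no possibility of a conflict" (proven by the
    low-bit mismatch); True means "cannot rule it out", which may be
    a false positive for addresses that differ only in high bits."""
    mask = (1 << bits) - 1
    return (addr_a & mask) == (addr_b & mask)

def must_block(op_a, addr_a, op_b, addr_b):
    """LOADs block STOREs and STOREs block LOADs (and STOREs block
    STOREs) whenever a conflict cannot be ruled out; two LOADs never
    need to block each other."""
    if op_a == "LOAD" and op_b == "LOAD":
        return False
    return may_conflict(addr_a, addr_b)
```

The occasional false positive (two addresses sharing their low bits) merely costs a needless stall; correctness is preserved because blocking is always the safe direction, which is exactly the "no possibility of a conflict" framing above.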
I'd like to take a minute to especially thank Mitch Alsup for his time in these ongoing discussions, without which there would be absolutely no chance that I could have learned about, let alone understood, any of the above. As I mentioned in the very first update, new processor designs get one shot at success. Basing the core of the design on a 55-year-old, well-documented, extremely compact and efficient design is a reasonable strategy: it is just that, without Mitch's help, there would have been no way to understand the 6600's true value.

Bottom line: we do not need to follow Intel's power-inefficient lead here.