-add one extra line to the Dependency Matrix per "branch" that is to be
-speculatively executed. The "Branch Speculation" Unit is just like any
-other Functional Unit, in effect. In this way, we gain *exactly* the
-same capability as a Reorder Buffer, including all of the benefits.
-The same trick will work just as well for Exceptions.
-
-Mitch also has a high-level diagram of an additional LOAD/STORE Matrix that
-has, again, extremely simple rules: LOADs block STOREs, and
-STOREs block LOADs, and the signals "Read / Write" are then passed
-down to the Function Unit Dependency Matrix as well. The rules for
-the blocking need only be based on "there is no possibility of a conflict"
-rather than "on which exact and precise address does a conflict occur".
-This in turn means that the number of address bits needed to detect a
-conflict may be significantly reduced, i.e. only the top bits are
-needed.
-
-Interestingly, RISC-V "Fence" instruction rules are based on the same idea,
-and it may turn out to be possible to leverage the L1 Cache Line numbers
-instead of the (full) address.
-
-Also, thanks to Mitch's help, his unpublished book chapters help
-to identify and make clear that the CDC 6600's register file is designed with
-"write-through" capability, i.e. that a register that's written will
-go through *on the same clock cycle* to a "read" request. This makes
-the 6600's register file pretty much synonymous with the Tomasulo
-Algorithm "Common Data Bus".
-
-So this is just amazing. Let's recap. It's 2018, there's absolutely zero
-Libre SoCs in existence anywhere on our planet of 8 billion people, and
-we're looking for inspiration at literally a 55-year-old computer design
-that occupied an entire room and was hand-built with transistors,
-on how to make a modern, power-efficient 3D-capable processor.
-
-Not only that: the project has accidentally unearthed incredibly valuable
-historic processor design information that has eluded the Intels and
-ARMs - billion-dollar companies - as well as the Academic community -
-for several decades.
-
-I'd like to take a minute to especially thank Mitch Alsup for his
-time in ongoing discussions, without which there would be absolutely
-no chance that I could possibly have learned about, let alone understood,
-any of the above. As I mentioned in the very first update: new processor
-designs get one shot at success. Basing the core of the design on
-a 55-year-old well-documented and extremely compact and efficient design
-is a reasonable strategy: it's just that, without Mitch's help, there
-would have been no way to understand the 6600's true value.
-
-Bottom line is, we have a way forward that will result in significantly
-less hardware, a simpler design, using a lot less power than modern
-designs today, yet providing all of the features normally the exclusive
-domain of top-end processors. Thanks to a refresh of a 55-year-old
-processor and the willingness of Mitch Alsup and James Thornton to share
-their expertise with the world.
-
+add to the dependency matrix one extra line per "branch" that is to be
+speculatively executed. The "branch speculation" unit is just like
+any other functional unit, in effect. In this way, we gain *exactly*
+the same capability as a reorder buffer, including all of the
+benefits. The same trick will work just as well for exceptions.
+
+Mitch also has a high-level diagram of an additional LOAD/STORE matrix
+that has, again, extremely simple rules: LOADs block STOREs, and
+STOREs block LOADs. The resulting "read / write" signals are then
+passed down to the functional unit dependency matrix as well. The
+blocking rules need only establish that "there is no possibility of a
+conflict" rather than "at which exact and precise address a conflict
+occurs". This in turn means that the number of address bits needed to
+detect a conflict may be significantly reduced, i.e., only the top
+bits are needed.
+
+Interestingly, RISC-V "fence" instruction rules are based on the same
+idea, and it may turn out to be possible to leverage the L1 cache line
+numbers instead of the (full) address.
+
+Also, Mitch's unpublished book chapters make clear that the CDC 6600's
+register file is designed
+with "write-through" capability, i.e., that a register that's written
+will go through *on the same clock cycle* to a "read" request. This
+makes the 6600's register file pretty much synonymous with the
+Tomasulo algorithm's "common data bus." This same-cycle feature *also
+provides operand forwarding for free*!
+
+This is just amazing. Let's recap. It's 2018, there are absolutely no
+Libre SoCs in existence anywhere on our planet of 8 billion people,
+and we're looking for inspiration on how to make a modern,
+power-efficient 3D-capable processor, only to find it in a literally
+55-year-old design for a computer that occupied an entire room and was
+hand-built with transistors!
+
+Not only that, but the project has accidentally unearthed incredibly
+valuable historic processor design information that has eluded the
+Intels and ARMs (billion-dollar companies), as well as the academic
+community, for several decades.
+
+I'd like to take a minute to especially thank Mitch Alsup for his time
+in ongoing discussions, without which there would be absolutely no
+chance I could possibly have learned about, let alone understood, any
+of the above. As I mentioned in the [very first project
+update](https://www.crowdsupply.com/libre-risc-v/m-class/updates/why-make-a-quad-core-64-bit-soc-surely-there-are-enough-already):
+new processor designs get one shot at success. Basing the core of the
+design on a 55-year-old, well-documented, and extremely compact and
+efficient design is a reasonable strategy: it's just that, without
+Mitch's help, there would have been no way to understand the 6600's
+true value.
+
+The bottom line is, we have a way forward that will result in
+significantly less hardware and a simpler design, using a lot less
+power than modern designs, yet providing all of the features normally
+the exclusive domain of top-end processors, all thanks to a refresh of
+a 55-year-old processor and the willingness of Mitch Alsup and James
+Thornton to share their expertise with the world.
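
As a rough illustration of the LOAD/STORE matrix rule described in the patch above, here is a minimal Python sketch of the "no possibility of a conflict" check. It is not taken from Mitch's diagram or from the project's code: the names `granule`, `may_conflict`, and `must_wait`, and the choice of ignoring the low 6 address bits (one 64-byte cache line), are assumptions made purely for illustration. It also assumes no access straddles a granule boundary, and it models only the two stated rules (LOADs block STOREs, STOREs block LOADs).

```python
# Hypothetical sketch: all names and bit widths are illustrative assumptions.
LOW_BITS_IGNORED = 6  # assumed: compare at 64-byte (cache-line) granularity


def granule(addr: int) -> int:
    """Keep only the upper address bits by dropping the low ones."""
    return addr >> LOW_BITS_IGNORED


def may_conflict(addr_a: int, addr_b: int) -> bool:
    """Conservative test: True means a conflict cannot be ruled out.

    If the retained upper bits differ, the two addresses are certainly
    different, so there is no possibility of a conflict.
    """
    return granule(addr_a) == granule(addr_b)


def must_wait(new_op: str, new_addr: int, in_flight: list) -> bool:
    """Return True if the new operation is blocked by an in-flight one.

    in_flight is a list of (op, addr) pairs, op being "LOAD" or "STORE".
    Only opposite-kind operations block each other in this sketch.
    """
    for op, addr in in_flight:
        if op != new_op and may_conflict(new_addr, addr):
            return True
    return False


if __name__ == "__main__":
    pending = [("LOAD", 0x1000)]
    print(must_wait("STORE", 0x1008, pending))  # same granule as the LOAD
    print(must_wait("STORE", 0x1040, pending))  # different granule
```

The comparison errs on the side of blocking: two accesses to different bytes in the same granule are treated as a possible conflict, which costs a little parallelism but needs a far narrower comparator than a full-address match.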