From: Luke Kenneth Casson Leighton
Date: Mon, 27 May 2019 09:45:38 +0000 (+0100)
Subject: add new update for nlnet grant
X-Git-Url: https://git.libre-soc.org/?p=crowdsupply.git;a=commitdiff_plain;h=7d6313a5fe6e4ef0448d04dc3498ddf7d51b40b7

add new update for nlnet grant
---

diff --git a/updates/013_2019jan21_virtual_memory.mdwn b/updates/013_2019jan21_virtual_memory.mdwn
new file mode 100644
index 0000000..6d891c8
--- /dev/null
+++ b/updates/013_2019jan21_virtual_memory.mdwn
@@ -0,0 +1,3 @@

# TLB Design

https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-gras.pdf

diff --git a/updates/018_2019may27_nlnet_grant_approved.mdwn b/updates/018_2019may27_nlnet_grant_approved.mdwn
new file mode 100644
index 0000000..a877257
--- /dev/null
+++ b/updates/018_2019may27_nlnet_grant_approved.mdwn
@@ -0,0 +1,125 @@

# First NLnet Grant approved to fund development of the Libre RISC-V SoC

The application for funding from NLnet, submitted back in November of last
year, has been approved. It means that we have EUR 50,000 to pay for
full-time engineering work to be carried out over the next year, and to
pay for "bounty"-style tasks.

However, this is not all: by splitting the tasks up into separate groups,
and using a second European-based individual as the applicant, we can
apply for a second grant (also of up to EUR 50,000). In the next couple
of days we will put in an application for "Formal Mathematical Proofs"
of the processor design.

There are several reasons for doing so. The primary one is that we
anticipate this (commercial, libre) product will be closely and
independently examined by third parties, who will want to verify for
themselves that it contains no spying backdoor co-processors, as well as
to check the usual security and correctness guarantees. If there exist
*formal mathematical proofs* that the processor and its sub-components
operate correctly, that independent third-party verification task is a
lot easier.

In addition, it turns out that when writing unit tests, using formal
mathematical proofs gives *complete* code coverage - far better than any
other "comprehensive" multiple-unit-test technique could ever hope to
achieve - with less code, and not just better accuracy but *100%
provable* accuracy. Additional, much simpler unit tests can then be
written which are more along the lines of "HOWTOs": examples of how to
use the unit.

This is one of those "epiphany" moments that, as a Software Engineer of
25 years' experience, has me stunned and wondering why on earth this
technique is not more generally and widely deployed in software. The
answer, I believe, is down to the nature of what a processor actually is.

A processor is developed much more along the lines of how Functional
Programming works. Functional Programming can have formal mathematical
proofs applied to it because, for any given inputs, the output is
*guaranteed* to be the same. This of course breaks down when the function
has "side-effects" (such as reading from a file, or accessing other
external state outside the "control" of the function). And in the design
of a processor, by the very nature of hardware, you simply cannot create
a Verilog module that has access to "files" or to "global variables".

In addition, hardware is based purely on boolean logic and on if/else
constructs. In essence, then, the *entire hardware design* has to be made
according to far simpler rules than "normal" software is expected to
conform to.
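To make that concrete, here is a toy sketch in plain Python (not the
project's actual HDL or formal flow - a real proof would run a solver
over the hardware description itself): a combinational block is a pure
function, and because its input space is finite, correctness can be
*proven* outright rather than merely sampled by hand-picked test cases.

```python
# Toy illustration in plain Python: a combinational hardware block is a
# *pure function* - no files, no global variables, the output is
# determined entirely by the inputs.
def add4(a, b, cin):
    """4-bit adder with carry-in; returns (sum, carry-out)."""
    total = a + b + cin
    return total & 0xF, (total >> 4) & 1

# Because the function is pure and its input space is finite, it can be
# proven correct by exhaustive enumeration: 100% coverage, which no
# hand-written selection of unit tests can guarantee.  A formal solver
# reaches the same conclusion symbolically, without enumerating at all.
for a in range(16):
    for b in range(16):
        for cin in (0, 1):
            s, cout = add4(a, b, cin)
            assert (cout << 4) | s == a + b + cin

print("add4 proven correct for all 512 input combinations")
```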
Even memory accesses in hardware have to be implemented according to
these strict rules (we are *implementing* LOAD and STORE instructions
here, not *using* them).

Consequently, adding in formal proofs is a little bit easier; it brings
huge benefits in terms of code readability, reliability and time-cost
savings, and it has the crucial advantage of being aligned with the
overall privacy goal *and* with NLnet's funding remit.

# Summary from the past couple of months

The past few months have seen a *lot* of activity. The IEEE754 ADD unit
has been completed, both as a Finite State Machine (FSM) and as a fully
pipelined design, both of which have parameters that allow them to do
FP16, FP32 or FP64. The DIV unit has been implemented as an FSM, and
will stay that way for now. MUL has been completed; however, it needs to
be turned into an FMAC (3 operands: Multiply and Accumulate).

A provisional pipeline API has been developed, which the IEEE754 FPU is
using. It includes data "funneling" (multiplexing) blocks that allow for
the creation of what Mitch Alsup calls a "Concurrent Computation Unit":
basically an array of matched operand latches and result latches (as
input and output, respectively) in front of a single pipeline. This
arrangement allows a batch of operations to be presented to the
CDC 6600-style "Dependency Matrices".

Jacob has been working on a fascinating design: a dynamically
partitionable Adder and Multiplier Unit. Given that we are doing a
Vector Processing front-end onto SIMD back-end operations, it makes
sense to save gates by allowing the ADD and MUL units to optionally
handle a batch of 8-bit operations, or half that number of 16-bit
operations, or a quarter of that number of 32-bit operations, or just
one 64-bit operation. The unit tests demonstrate that the code Jacob has
written provides RISC-V mul, mulh, mulhu and mulhsu functionality. The
pipelined version should be particularly interesting for doing 64-bit
multiply, although 64-bit performance is not a high priority in this
design, so it could be done as an FSM.

The augmented 6600 Scoreboard took literally six weeks to correctly
implement Read-after-Write and Write-after-Read hazards. It required
extraordinary and excruciating patience to get right. Adding in
Write-after-Write, however, took only two days, as the infrastructure to
do so had already been developed.

Currently being implemented is "branch shadowing" - this is not the same
as branch *prediction*: that is a different algorithm which, when
*combined* with "branch shadowing", provides the feature known as branch
*speculation*. This is the source of a lot of confusion about
Out-of-Order designs in general. It seems to be assumed that an OoO
design *has* to have branch speculation: it doesn't. It's just that,
given all the pieces, adding in branch speculation is actually quite
straightforward, and provides such a high performance increase that it
is hard to justify leaving it out.

One huge surprise came out of a recent discussion with Mitch Alsup. It
had been assumed all along that turning this design from single-issue
into multi-issue would be difficult, or would require significant gates
and latency. The "simple" approach to multi-issue, using the Dependency
Matrices, would be to analyse a batch of instructions and, if there are
no overlaps (no registers in common), allow that batch to proceed in
parallel. This is a naive approach (sketched below).
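Here is a sketch of that naive batch check, again in plain Python; the
(reads, writes) register-bitmask format is purely illustrative, as the
real design works on 6600-style Dependency Matrices, not Python
integers.

```python
# Illustrative sketch of the naive multi-issue rule: issue the whole
# batch in parallel only if no two instructions share a register.
# Each instruction is reduced to a pair of register bitmasks
# (bit i set = register xi is read / written).
def regs_overlap(batch):
    """Return True if any two instructions in the batch share a register."""
    seen = 0
    for reads, writes in batch:
        used = reads | writes
        if used & seen:
            return True
        seen |= used
    return False

# add x3, x1, x2  -> reads {x1, x2}, writes {x3}
# mul x6, x4, x5  -> reads {x4, x5}, writes {x6}
batch = [(0b0000110, 0b0001000),
         (0b0110000, 0b1000000)]
# No registers in common, so the naive rule issues both in parallel -
# but "add x3, x1, x2 ; mul x5, x3, x4" would have to stall entirely.
assert not regs_overlap(batch)
```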
Mitch pointed out that in his work on the AMD Opteron (the processor
family for which AMD had to publish "Intel equivalent" speed numbers,
because it was so much more efficient and effective than Intel's
designs), each instruction "accumulated" the dependencies of all prior
instructions being issued in the same batch. This works because read and
write dependencies are *transitive* (google it...); a short sketch of
the idea in code is given at the end of this update.

What that means, in practical terms, is that we have a way to create a
design that could, if ramped up, "take on the big boys". To make that
clear: there is no technical barrier that would prevent us from creating
a quad-issue (or higher) design.

There is still a heck of a lot to get done here. However, it has to be
said that actually adding an instruction decoder onto the 6600-style
Dependency Matrices is relatively straightforward, this being RISC,
after all. It is possible, then, that we may have a subset of
functionality operational far sooner than anticipated.
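As promised above, a sketch of the dependency-accumulation idea, in the
same illustrative plain-Python bitmask format as the earlier sketch; how
the real Dependency Matrices encode this is a different matter entirely.

```python
# Sketch of per-instruction dependency accumulation.  Rather than the
# all-or-nothing overlap rule, each instruction in the batch inherits
# the accumulated read/write sets of every instruction before it, so a
# dependent instruction simply waits on more results - the batch as a
# whole still goes in as one issue group.
def accumulate_deps(batch):
    """Return, per instruction, a bitmask of registers it must wait on."""
    acc_reads, acc_writes = 0, 0
    deps = []
    for reads, writes in batch:
        raw = reads & acc_writes                      # Read-after-Write
        war_waw = writes & (acc_reads | acc_writes)   # Write-after-Read/Write
        deps.append(raw | war_waw)
        acc_reads |= reads
        acc_writes |= writes
    return deps

# add x3, x1, x2 ; mul x5, x3, x4 - the mul reads x3, a RAW hazard,
# yet both instructions can still be issued in the same batch.
batch = [(0b0000110, 0b0001000),   # reads x1,x2  writes x3
         (0b0011000, 0b0100000)]   # reads x3,x4  writes x5
print(accumulate_deps(batch))      # [0, 8] - bit 3 set: mul waits on x3
```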