From b1f66bc27d1401e6ead52dfd4643f061d8c66a2e Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Tue, 16 Jul 2019 11:22:13 +0100
Subject: [PATCH] add purism draft update

---
 updates/019_2019jul16_purism_donation.mdwn | 196 +++++++++++++++++++++
 1 file changed, 196 insertions(+)
 create mode 100644 updates/019_2019jul16_purism_donation.mdwn

diff --git a/updates/019_2019jul16_purism_donation.mdwn b/updates/019_2019jul16_purism_donation.mdwn
new file mode 100644
index 0000000..45bb7d9
--- /dev/null
+++ b/updates/019_2019jul16_purism_donation.mdwn
@@ -0,0 +1,196 @@
+**DRAFT STATUS. last edit 16jul2019**
+
+We are delighted to be able to announce additional sponsorship by
+[Purism](http://puri.sm), through [NLnet](http://nlnet.nl).
+
+# Purism Sponsorship
+
+As a Benefit Corporation, Purism is empowered to balance ethics, social
+enterprise and profitable business. I am delighted that they chose to
+fund the Libre RISC-V hybrid CPU/GPU through the NLnet Foundation. Their
+donation provides us with some extra flexibility in how we reach the goal of
+bringing to market a hybrid CPU, VPU and GPU that is libre to the bedrock.
+
+Purism started with a Crowd Supply campaign to deliver a modern laptop
+with full software support and a coreboot BIOS. I know that, after
+this initial success, they worked hard to try to solve the "NSA backdoor
+coprocessor" issue, known as the "Management Engine". Ironically, inspired
+by Purism, Intel's internal efforts became moot, as a third party
+reverse-engineered an Intel BIOS and discovered the "nsa\_me\_off\_switch" parameter,
+designed to be used by the NSA when Intel equipment is deployed within
+NSA premises.
+
+Purism then moved quickly to provide a BIOS update to disable this
+"feature", eliminating the last and most important barrier to being able
+to declare a full privacy software stack.
+
+It is these kinds of brave strategic decisions to buck the trend towards
+privacy-invading hardware "by default" for which Purism deserves our
+respect and gratitude.
+
+However, just as NLnet recognise, Purism also appreciate that we cannot
+stop at just the software. Profit-maximising Corporations just do not
+take the brave decisions that can compromise profits, particularly when
+faced with competition: the risk is simply too great.
+
+So we are extremely grateful for their donation, managed through NLnet,
+the Charitable Foundation.
+
+# Progress
+
+So much has happened already since the last update that it is hard to know
+where to begin.
+
+* The IEEE754 FPU has a simulation-proven FADD pipeline, and FMUL,
+  FDIV, FSQRT and FCVT are on the way.
+* A RISC-V Reciprocal Square Root FP Opcode has been proposed, which is
+  needed for 3D operations, particularly normalisation of vectors. With
+  other RISC-V implementors needing this opcode, it makes sense for it
+  to be a Standard Extension.
+* The SimpleV extension has had a major overhaul, with the addition of a
+  single-instruction prefix (P32C, P48 and P64), and a "VBLOCK" format that
+  adds Vectorisation Context to a batch of instructions.
+* Implementation of the precise-augmented 6600 style scoreboard system has
+  begun, with ALU register hazards and shadowing already completed, and
+  memory hazards underway.
+
+# Multi Issue
+
+Multi Issue is absolutely critical for this CPU/VPU/GPU because the
+[SimpleV](https://libre-riscv.org/simple_v_extension/specification)
+engine relies on being able to turn one "vector"
+operation into multiple "scalar element" instructions, in every cycle. The
+simplest way to do this is to throw equivalent scalar opcodes into a
+multi issue execution engine, and let the engine sort it out.
+
+So, regarding the Dependency Matrices: thanks to Mitch Alsup's absolutely
+invaluable input we now know how to do multi-issue. On top of a precise
+6600 style Dependency Matrix it is almost comically trivial.
+
+The key insight that Mitch gave us was that instruction dependencies are
+transitive. In other words: if there are 4 instructions to be issued,
+the second instruction may have the dependencies of the first added to it;
+the 3rd may accumulate the dependencies of the first and second and so on.
+
+Where this trick does not work well (or takes significant hardware to
+implement) is when, for example with the Tomasulo Algorithm (or the
+original 6600 Q-Table), the Register Dependency Hazards are expressed
+in *binary* (r5 = 0b00101, r3 = 0b00011). If instead the registers are
+expressed in *unary* (r5 = 0b00010000, r3 = 0b00000100) then it should
+be pretty obvious that in a multi issue design, all that is needed in
+each clock cycle is to OR the cumulative register dependencies in a
+cascading fashion. Aside from now also needing to increase the number of
+register ports and other resources to cope with the increased workload,
+amazingly that's all it takes!
+
+To achieve the same trick with a Tomasulo Reorder Buffer (ROB) requires
+the addition of an entire extra CAM for every extra issue to be added to
+the architecture: four way multi issue would require four ROB CAMs! The
+power consumption and gate count would be prohibitively expensive,
+and resolving the commits of multiple parallel operations is also fraught.
+
+# SimpleV
+
+What began ironically as "simple" still bears some vestige of its
+original name, in that the ISA needs no new opcodes: any scalar RISC-V
+implementation may be turned parallel through the addition of SV at the
+instruction issue phase.
+
+However, one of the major drawbacks of the initial draft spec was that
+the use of CSRs took a huge number of instructions just to set up and
+then tear down the vectorisation context.
+
+This had to be solved.
+
+The idea which came to mind was to embed RISC-V opcodes within
+a longer, variable-length encoding, which we've called the
+[VBLOCK Format](https://libre-riscv.org/simple_v_extension/vblock_format/).
+At the beginning of this new format, the vectorisation and predication
+context could be embedded, which "changes" the standard *scalar* opcodes
+to become "parallel" (multi-issue) operations.
+
+The advantage of this approach is that, firstly, the context is much
+smaller: the actual CSR opcodes are gone, leaving only the "data",
+which is now batched together. Secondly, there is no need to "reset"
+(tear down) the vectorisation context, because that automatically goes
+when the long-format ends.
+
+The other issue that needed to be fixed is that we really need a
+[SETVL](https://libre-riscv.org/simple_v_extension/specification/sv.setvl/)
+instruction. This is really unfortunate as it breaks the "no new opcodes"
+paradigm. However, what we are going to do is simply to reuse the RVV
+SETVL opcode, now that RVV has reached its last anticipated draft before
+ratification. Besides, it's not an *actual* instruction related to
+elements (it doesn't perform a parallel add, for example). It's more an
+"infrastructure support" instruction.
+
+The reason for needing SETVL is complex. It is down to the fact that,
+unlike in RVV, the Maximum Vector Length is **not** an architectural hard
+design parameter: it is a runtime dynamic one. Thus, it is absolutely
+crucial that not only VL is set on every loop (or SV Prefix instruction),
+but that MVL is also set.
+
+This means that SV has two additional instructions for any algorithm,
+when compared to RVV, and this kind of penalty is just not acceptable. The
+solution therefore was to create a special SV.SETVL opcode that always
+takes the MVL as an *additional* extra parameter over and above those
+provided to the RVV equivalent opcode. That basically puts SV on par with
+RVV as far as instruction count is concerned.
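
The role of SV.SETVL described above can be illustrated with a minimal Python sketch. It is a software model only, not the specification: it assumes that the instruction records the MVL it is given and clamps the requested length to it (`VL = min(requested, MVL)`), and the names `SVState`, `sv_setvl` and `vector_add` are hypothetical, introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class SVState:
    """Hypothetical model of the two SV length registers."""
    mvl: int = 0  # Maximum Vector Length: a runtime value in SV
    vl: int = 0   # current Vector Length


def sv_setvl(state: SVState, requested: int, mvl: int) -> int:
    """Assumed SV.SETVL behaviour: set MVL and VL in a single step.

    VL is clamped to MVL. This clamping rule is an assumption of the
    sketch, not a quotation from the SV specification.
    """
    if mvl < 1:
        raise ValueError("MVL must be at least 1")
    state.mvl = mvl
    state.vl = min(requested, mvl)
    return state.vl


def vector_add(a, b, mvl=4):
    """Strip-mined element-wise add: one sv_setvl call per loop pass."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    state = SVState()
    out = []
    i = 0
    while i < len(a):
        # One call sets both MVL and VL, so the loop needs no separate
        # "set MVL" step before it.
        vl = sv_setvl(state, len(a) - i, mvl)
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return out
```

Because `sv_setvl` carries the MVL as an extra argument, the loop body contains a single length-setting call per pass, which is the instruction-count parity with RVV that the text aims for.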
+
+# Fail on First
+
+The other really nice addition, which came with a small reorganisation
+of the Vector and Predicate Contexts, is data dependent
+["fail on first"](https://libre-riscv.org/simple_v_extension/appendix/#ffirst).
+
+ARM's SVE, RVV, and the Mill Architecture all have an incredibly neat
+feature where if data is being loaded from memory in parallel, and the
+LD operations run off the end of a page boundary, this may be detected
+and the *legal* parallel operations may complete, all without needing
+to drop into "scalar" mode.
+
+In the case of the Mill Architecture, this is achieved through the
+extremely innovative feature of simply marking the result of the
+operation as "invalid", and that "tag" cascades through all subsequent
+operations. Thus, any attempts to ADD or STORE the data will result in
+the invalid data being simply ignored.
+
+RVV instead detects the point at which the LD became invalid, "fails"
+at the "first" such illegal memory access, and truncates all subsequent
+vector operations to within that limit, by *changing VL*. This is an
+extremely effective and very simple idea: it was worth adding to SV.
+
+However, when doing so, the idea sprang to mind: why not extend the
+"fail on first" concept to not just cover LD/ST operations, but to cover
+actual ALU operations as well? Why not, if any of the results from
+a sequence of parallel operations is zero ("fail"), similarly truncate VL?
+
+This idea was tested out on strncpy (the typical canonical function
+used to test out data-dependent ISA concepts), and it worked! So, that
+is going into SV as well. It does mean that after every ALU operation,
+a comparator against zero will be optionally activated: given that it
+is optional and under the control of the ffirst bit, it is not a power
+penalty on every single instruction.
+
+# Summary
+
+There is so much to do, and so much that has already been achieved, that
+it is almost overwhelming. We still cannot lose sight of the fact that
+there is an enormous amount that we do not yet know, yet at the same
+time, we must never let that stop us from moving forward. A journey starts with
+a first step, and continues with each step.
+
+With help from NLnet and companies like Purism we can look forward
+to actually paying people to contribute to solving what was formerly
+considered an impossible task.
+
+It is worthwhile emphasising: for any individual or Corporation wishing to
+see this project succeed (so that you can use it as the basis for one
+of your products, for example), donations through NLnet, as a Registered
+Charitable Foundation, are tax deductible.
+
+Likewise, for anyone who would like to help with the project's Milestones,
+payments from NLnet are *donations*, and, depending on jurisdiction,
+may also be tax deductible. If you are interested in learning more, do
+get in touch.
+
-- 
2.30.2
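
The data-dependent "fail on first" idea from the Fail on First section of the update can be sketched in Python as a software model. This is an illustration under stated assumptions, not the SV specification: the update does not say whether the element that produced the zero stays inside the truncated VL, so the sketch assumes it is excluded. The function names `ffirst_mv` and `copy_string` are hypothetical.

```python
def ffirst_mv(src, vl):
    """Model a vector move of `vl` elements with ffirst enabled.

    Returns (results, new_vl). If an element's result is zero, VL is
    truncated at that element. Assumption of this sketch: the zero
    element itself is excluded from the new VL.
    """
    results = []
    for i in range(vl):
        if src[i] == 0:          # the optional compare-against-zero
            return results, i    # truncate VL at the first "fail"
        results.append(src[i])
    return results, vl


def copy_string(src: bytes, n: int, mvl: int = 4) -> bytes:
    """strncpy-like copy: at most n bytes, stopping at the first NUL.

    Unlike C strncpy, the result is not padded with NULs and the
    terminator is not copied.
    """
    out = []
    i = 0
    while i < n:
        # Never request more than MVL elements or read past the source.
        vl = min(n - i, mvl, len(src) - i)
        if vl <= 0:
            break
        chunk, new_vl = ffirst_mv(src[i:i + vl], vl)
        out.extend(chunk)
        if new_vl < vl:          # VL was truncated: terminator found
            break
        i += new_vl
    return bytes(out)
```

For example, `copy_string(b"hello\0junk", 10)` copies one full group of four bytes, then a second group that is truncated to a single element when the NUL is met, giving `b"hello"` without ever dropping into an element-by-element scalar loop.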