X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=updates%2F023_2020mar26_decoder_emulator_started.mdwn;h=1fd0606f9a2be89dd2393cf569b87211092724fb;hb=3e5f9725d044001aceaf95482fcc101bdfa854ad;hp=14f800b64f404d54ecd9d482b60a100c6f8e0754;hpb=035d23c2261fdcf7a87f35763300e5261fa98bca;p=crowdsupply.git

diff --git a/updates/023_2020mar26_decoder_emulator_started.mdwn b/updates/023_2020mar26_decoder_emulator_started.mdwn
index 14f800b..1fd0606 100644
--- a/updates/023_2020mar26_decoder_emulator_started.mdwn
+++ b/updates/023_2020mar26_decoder_emulator_started.mdwn
@@ -1,6 +1,7 @@
 So many things happened since the last update they actually need to go
-in the main update, even in summary form. One big thing: Raptor Engineering
-sponsored us with remote access to a TALOS II Workstation!
+in the main update, even in summary form. One big thing:
+[Raptor CS](https://www.raptorcs.com/)
+sponsored us with remote access to a Monster spec'd TALOS II Workstation!

 # Introduction

@@ -13,20 +14,26 @@ Here's the summary (if it can be called a summary):
 * New team member, Jock (hello Jock!) starts on the coriolis2 layout,
   after Jean-Paul from LIP6.fr helped to dramatically improve how
   coriolis2 can be used. This resulted in a
-  [tutoria](https://libre-riscv.org/HDL_workflow/coriolis2/) and a
-  [huge bugreport discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=178)
+  [tutorial](https://libre-riscv.org/HDL_workflow/coriolis2/) and a
+  [huge bug report discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=178)
 * Work has started on the
   [POWER ISA decoder](http://bugs.libre-riscv.org/show_bug.cgi?id=186),
   verified through
   [calling GNU AS](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/test/test_decoder_gas.py;h=9238d3878d964907c5569a3468d6895effb7dc02;hb=56d145e42ac75626423915af22d1493f1e7bb143)
   (yes, really!) and on a mini-simulator
-  [calling qemu](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/qemu.py;h=9eb103bae227e00a2a1d2ec4f43d7e39e4f44960;hb=56d145e42ac75626423915af22d1493f1e7bb143)
+  [calling QEMU](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/qemu.py;h=9eb103bae227e00a2a1d2ec4f43d7e39e4f44960;hb=56d145e42ac75626423915af22d1493f1e7bb143)
   for verification.
-* Jacob's algorithmic library grows
+* Jacob's simple-soft-float library is gaining
   [Power FP compatibility](http://bugs.libre-riscv.org/show_bug.cgi?id=258)
-  and python bindings.
+  and python bindings.
+* Kazan, the Vulkan driver Jacob is writing, is getting
+  a [new shader compiler IR](http://bugs.libre-riscv.org/show_bug.cgi?id=161).
 * A Conference call with OpenPOWER Foundation Director, Hugh, and
   Timothy Pearson from RaptorCS has been established every two weeks.
+* The OpenPOWER Foundation is also running some open
+  ["Virtual Coffee"](https://openpowerfoundation.org/openpower-virtual-coffee-calls/)
+  weekly round-table calls for anyone interested, generally, in OpenPOWER
+  development.
 * Tim sponsors our team with access to a Monster Talos II system with a
   whopping 128 GB RAM. htop lists a staggering 72 cores (18 real
   with 4-way hyperthreading).
@@ -38,9 +45,9 @@ Here's the summary (if it can be called a summary):
   [Milestone tasks list(s)](http://bugs.libre-riscv.org/buglist.cgi?component=Milestones&list_id=567&resolution=---)
   and a
   [boat-load](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/thread.html)
-  of bugreports to the list.
+  of bug reports to the list.
 * Immanuel Yehowshua is participating in the Georgia Tech
-  [https://create-x.gatech.edu/](Create-X) Programme, and is establishing
+  [Create-X](https://create-x.gatech.edu/) Programme, and is establishing
   a Public Benefit Corporation in Atlanta, as an ethical vehicle for VC
   Funding.
 * A [Load/Store Buffer](http://bugs.libre-riscv.org/show_bug.cgi?id=216)
@@ -52,53 +59,769 @@ Here's the summary (if it can be called a summary):
   on the
   [6600 scoreboard](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/)
   page.
+* [Public-Inbox](http://bugs.libre-riscv.org/show_bug.cgi?id=181) was
+  installed successfully on the server, which is in the process of
+  moving to a [new domain name](http://bugs.libre-riscv.org/show_bug.cgi?id=182),
+  [Libre-SOC](http://libre-soc.org).
+* Build Servers have been set up, with
+  [automated testing](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005364.html)
+  being established.

 Well dang, as you can see, suddenly it just went ballistic. There's
 almost certainly things left off the list. For such a small team there's
 a heck of a lot going on. We have an awful lot to do, in a short amount
 of time: the 180nm tape-out is in October 2020 - only 7 months away.

+With this update we're doing something slightly different: a request
+has gone out [to the other team members](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005428.html)
+to say a little bit about what each of them is doing. This also helps me,
+because these updates do take quite a bit of time to write.
+
 # NLNet Funding announcement

-TODO
+An announcement went out
+[last year](https://lists.gnu.org/archive/html/libreplanet-discuss/2019-09/msg00170.html)
+that we'd applied for funding, and we got some great responses and
+feedback (such as "don't use patented AXI4"). The second time, we
+sent out a "we got it!" message and got some really nice private and
+public replies, as well as requests from people to join the team.
+More on that when it happens.

 # Coriolis2 experimentation started

-TODO
+Jock, a really enthusiastic and clearly skilled and experienced Python
+developer, has this to say about coriolis2:
+
+> As a humble Python developer, I understand the unique status and
+> significance of the Coriolis project, nevertheless I cannot help
+> but notice that it has a huge room for improvement. I genuinely hope
+> that my participation in libre-riscv will also help improve Coriolis.
+
+This was the short version. A much more
+[detailed insight](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005478.html)
+is linked here, and it would make a good bug report. However, the time it would
+take is quite significant. We do have funding available from NLNet,
+so if anyone would like to take this on, under the supervision
+of Jean-Paul at LIP6.fr, we can look at facilitating that.
+
+One of the key insights that Jock came up with was that the coding style,
+whilst consistent, is something that specifically has to be learned, and,
+being contrary to PEP8 in so many ways, it creates an artificially
+high barrier and learning curve.
+
+Even particularly experienced cross-language developers such as
+myself can *read* such code. Editing it is another matter: the commas
+separating list items sit at the *beginning* of lines, and we introduce
+syntax errors *without thinking*, because out of habit we also add a comma
+*at the end* of the previous line where one looks to be missing.
+
+This is why we insisted on PEP8 in the
+[HDL workflow](http://libre-riscv.org/HDL_workflow) document.
+
+Other than that: coriolis2 is actually extremely exciting to work with.
+Anyone who has done manual PCB layout will know quite how much of a relief
+it is to have auto-routing: this is what coriolis2 has by the bucket-load,
+*as well* as auto-placement. We are looking at half a *million* objects
+(Cells) to place. Without an auto-router / auto-placer this is just a
+flat-out impossible task.
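The leading-comma point above can be illustrated with a small Python example. The snippets are hypothetical and were written for this illustration, not taken from coriolis2. `compile()` only checks whether each version parses.

```python
# Illustrative snippets only: not taken from coriolis2's source.
LEADING_COMMA = """
cells = [ "alu"
        , "regfile"
        , "fpmul"
        ]
"""

# The habitual edit: a comma is added at the end of a line "because it
# looks like one is missing", duplicating the leading comma on the next line.
ACCIDENTAL_EDIT = """
cells = [ "alu"
        , "regfile",
        , "fpmul"
        ]
"""

# PEP8-style layout: one item per line, each with a trailing comma, so
# adding or reordering items never touches a neighbouring line.
PEP8_STYLE = """
cells = [
    "alu",
    "regfile",
    "fpmul",
]
"""


def compiles(source: str) -> bool:
    """Return True if the snippet is syntactically valid Python."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


for name, snippet in [("leading commas", LEADING_COMMA),
                      ("accidental edit", ACCIDENTAL_EDIT),
                      ("PEP8 style", PEP8_STYLE)]:
    print(f"{name}: {'ok' if compiles(snippet) else 'SyntaxError'}")
```

The accidental edit produces `"regfile", , "fpmul"`, which is a syntax error. With the trailing-comma layout, that same habit is harmless.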
+
+The first step was to
+[learn and adapt coriolis2](http://bugs.libre-riscv.org/show_bug.cgi?id=178).
+As much as anything else, this was needed to find out how much work
+would be involved, so that we could accurately assign the fixed budgets
+to the NLNet milestones. Following on from that, when Jock joined,
+we needed to work out a compact way to express the
+[layout of blocks](http://bugs.libre-riscv.org/show_bug.cgi?id=217#c44),
+and he's well on the way to achieving that.
+
+Some of the pictures from coriolis2 are
+[stunning](http://bugs.libre-riscv.org/attachment.cgi?id=29). This was an
+experimental routing of the IEEE754 FP 64-bit multiplier. It took
+5 minutes to run, and is around 50,000 gates: as big as most silicon
+ASICs that have formerly been done with Coriolis2, and 50% of the
+practical size that can be handed in one go to the auto-place/auto-router.
+
+Other designs using coriolis2 have been of the form where the major "blocks"
+(such as FPMUL, or Register File) are laid out automatically in a single-level
+hierarchy, followed by full and total manual layout from that point onwards,
+in what is termed in the industry a "Floorplan".
+With around 500,000 gates to do and many blocks being repeated, this approach
+is not viable for us. We therefore need a *two*-level or potentially
+three-level hierarchy.
+
+[Explaining this](http://bugs.libre-riscv.org/show_bug.cgi?id=178#c146)
+to Jean-Paul was amusing and challenging. Much bashing of heads against
+walls and keyboards was involved. The basic plan: rather than have
+coriolis2 perform an *entire* layout, in a flat and all-or-nothing fashion,
+we need a much more subtle, fine-grained approach, where *sub-blocks* are
+laid out, then *included* at a given level of hierarchy as "pre-done blocks".
+
+Save and repeat.
+
+This apparently had never been done before, and explaining it in words was
+extremely challenging. A massive hack (temporarily editing the underlying
+HDL files in between tasks) was the only way to illustrate it.
+However, once the lightbulb went on, Jean-Paul was able to get coriolis2's
+C++ code into shape extremely rapidly, and this alone has opened up an
+*entire new avenue* of potential for coriolis2 to be used in industry
+for doing much larger ASICs. This is precisely the kind of thing that
+our NLNet sponsors (and the EU, from the Horizon 2020 Grant) love. Hooray.
+Now if only we could actually go to a conference and talk about it.

 # POWER ISA decoder and Simulator

-TODO
+*(kindly written by Michael)*

-# Arbitrary IEEE754 Algorithmic Library and POWER FP emulation
+The decoder we have is based on that of IBM's
+[microwatt reference design](https://github.com/antonblanchard/microwatt).
+As microwatt's decoder is quite regular, consisting of a bunch of large
+switch statements returning fields of a struct, we elected not to
+pursue a direct conversion of the VHDL to nmigen. Instead, we
+extracted the information in the switch statements into several
+[CSV tables](https://libre-riscv.org/openpower/isatables/),
+and leveraged nmigen to construct the decoder from these
+tables. We applied the same technique to extract the subfields
+(register numbers, branch offset, immediates, etc.) from the
+instruction: Luke converted the information in the POWER ISA
+specification to text, and wrote a module in Python to extract those
+fields from an instruction.

-TODO
+To test the decoder, we initially verified it against the tables we
+extracted, and manually against the [POWER ISA
+specification](https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0).
+Later, however, we came up with the idea of [verifying the
+decoder](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/test/test_decoder_gas.py;h=9238d3878d964907c5569a3468d6895effb7dc02;hb=433ab59cf9b7ab1ae10754798fc1c110e705db76)
+against the output of the GNU assembler. This is done by selecting an
+instruction type (integer reg/reg, integer immediate, load/store,
+etc.), and randomly selecting the opcode, registers, immediates, and
+other operands. We then feed this instruction to GNU AS to assemble,
+and the assembled instruction is then sent to our decoder. From this,
+we can verify that the output of the decoder matches what was
+generated earlier.
+
+We also explored using a similar idea to test the functionality of the
+entire SOC. By using the [QEMU](https://www.qemu.org/) PowerPC
+emulator, we can compare the execution of our SOC against that of the
+emulator to verify that our decoder and backend are working correctly.
+We would write snippets of test code (or potentially randomly generate
+instructions) and send the resulting binary to both the SOC and
+QEMU. We would then simulate our SOC until it had finished executing
+instructions, and use QEMU's GDB interface to do the same in QEMU. Finally,
+we would use QEMU's GDB interface to compare QEMU's register file and memory
+with those of our SOC, to verify that the SOC is working correctly. I did
+some experimentation using this technique to verify a [rudimentary
+simulator](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/test_sim.py;h=aadaf667eff7317b1aa514993cd82b9abedf1047;hb=433ab59cf9b7ab1ae10754798fc1c110e705db76)
+of the SOC backend, and it seemed to work quite well.
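The table-driven decoder and the random round-trip check described above can be sketched in plain Python. This is not nmigen and not the project's actual code. Several things are hypothetical: the CSV excerpts, their column names, and the helpers `load_table`, `field`, `decode` and `encode_xo`. Only `addi`, `add` and `subf` are covered. A tiny hand-written encoder stands in for GNU AS, so the example runs without a toolchain. Field positions use the POWER ISA's MSB0 bit numbering.

```python
import csv
import io
import random

# Hypothetical, heavily trimmed decoder tables. The real project tables have
# more columns and rows; these names are illustrative only.
MAJOR_CSV = """opcode,form,mnemonic
14,D,addi
31,EXT,
"""
MINOR_31_CSV = """xo,form,mnemonic
40,XO,subf
266,XO,add
"""


def load_table(text, key):
    """Turn a CSV table into a dict keyed on one integer column."""
    return {int(row[key]): row for row in csv.DictReader(io.StringIO(text))}


MAJOR = load_table(MAJOR_CSV, "opcode")
MINOR_31 = load_table(MINOR_31_CSV, "xo")


def field(insn, start, end):
    """Bits start..end (inclusive) of a 32-bit word, using MSB0 numbering
    as in the POWER ISA specification (bit 0 is the most significant)."""
    width = end - start + 1
    return (insn >> (31 - end)) & ((1 << width) - 1)


def decode(insn):
    """Look up the primary opcode, then (for opcode 31) the extended opcode."""
    row = MAJOR[field(insn, 0, 5)]
    if row["form"] == "EXT":
        # Simplification: XO-form extended opcode in bits 22-30; the OE bit
        # (21) and 10-bit X-form opcodes are not modelled here.
        row = MINOR_31[field(insn, 22, 30)]
    out = {"mnemonic": row["mnemonic"],
           "RT": field(insn, 6, 10), "RA": field(insn, 11, 15)}
    if row["form"] == "D":
        si = field(insn, 16, 31)
        out["SI"] = si - 0x10000 if si & 0x8000 else si  # sign-extend
    else:
        out["RB"] = field(insn, 16, 20)
    return out


def encode_xo(xo, rt, ra, rb):
    """Stand-in for GNU AS: encode an opcode-31 XO-form instruction."""
    return (31 << 26) | (rt << 21) | (ra << 16) | (rb << 11) | (xo << 1)


def random_roundtrip(trials=1000, seed=0):
    """Pick random opcodes and registers, 'assemble', decode, and compare."""
    rng = random.Random(seed)
    for _ in range(trials):
        xo = rng.choice(sorted(MINOR_31))
        rt, ra, rb = (rng.randrange(32) for _ in range(3))
        insn = encode_xo(xo, rt, ra, rb)
        expected = {"mnemonic": MINOR_31[xo]["mnemonic"],
                    "RT": rt, "RA": ra, "RB": rb}
        assert decode(insn) == expected, hex(insn)
    return trials


print(decode(0x7C642A14))  # add r3, r4, r5
print(decode(0x3864FFFF))  # addi r3, r4, -1
print(random_roundtrip(), "random instructions decoded correctly")
```

`0x7C642A14` is the standard encoding of `add r3, r4, r5` (primary opcode 31, extended opcode 266). Because the decoder is driven by data, adding an instruction to this sketch is a one-line table change.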
+
+*(Note from Luke: this automated approach, taking either other people's
+regularly-written code or actual PDF specifications, not only saves us a
+vast amount of time, it also ensures that our implementation is
+correct and does not contain transcription errors.)*
+
+# simple-soft-float Library and POWER FP emulation
+
+The [simple-soft-float](https://salsa.debian.org/Kazan-team/simple-soft-float)
+library is a floating-point library Jacob wrote with the intention
+of being a reference implementation of IEEE 754 for hardware testing
+purposes. It's specifically designed to be easy to understand,
+rather than having the code obscured in pursuit of speed:
+
+* Being easier to understand helps prevent bugs where the code does not
+  match the IEEE spec.
+* It uses the [algebraics](https://salsa.debian.org/Kazan-team/algebraics)
+  library that Jacob wrote, which allows using numbers that behave
+  like exact real numbers, making reasoning about the code simpler.
+* It is written in Rust rather than highly-macro-ified C. This helps
+  readability, since operations aren't obscured. It also helps safety, since Rust
+  proves at compile time that the code won't seg-fault unless you specifically
+  opt out of those guarantees by using `unsafe`.
+
+It currently supports 16-, 32-, 64- and 128-bit FP for RISC-V, along with
+a `DynamicFloat` type which allows dynamically specifying all
+aspects of how a particular floating-point type behaves: if one wanted,
+they could configure it as a 2048-bit floating-point type.
+
+It also has Python bindings, thanks to the awesome
+[PyO3](https://pyo3.rs/) library for writing Python bindings in Rust.
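To show why exact arithmetic makes a reference implementation easier to reason about, here is a minimal Python sketch. It is not simple-soft-float's API, and the names `Rounding` and `round_to_precision` are hypothetical. It uses the standard `fractions.Fraction` as a stand-in for exact real numbers, and takes the significand width as a runtime parameter, loosely in the spirit of `DynamicFloat`. It rounds an exact value under four rounding modes and reports whether the result is inexact. By assumption it ignores the exponent range entirely, so overflow, underflow, subnormals, infinities and NaNs are not modelled.

```python
from enum import Enum
from fractions import Fraction


class Rounding(Enum):
    NEAREST_EVEN = "rne"
    TOWARD_ZERO = "rtz"
    TOWARD_POSITIVE = "rup"
    TOWARD_NEGATIVE = "rdn"


def round_to_precision(x, precision, mode):
    """Round the exact rational `x` to a `precision`-bit significand.

    Returns (rounded_value, inexact). Simplifying assumption: the exponent
    range is unbounded, so overflow, underflow, subnormals, infinities and
    NaNs are not modelled."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0), False
    sign = -1 if x < 0 else 1
    mag = abs(x)
    # Pick e so that 2**(precision - 1) <= mag / 2**e < 2**precision.
    e = mag.numerator.bit_length() - mag.denominator.bit_length() - precision
    while mag / Fraction(2) ** e >= 2 ** precision:
        e += 1
    while mag / Fraction(2) ** e < 2 ** (precision - 1):
        e -= 1
    scaled = mag / Fraction(2) ** e
    significand = scaled.numerator // scaled.denominator  # truncated
    remainder = scaled - significand                      # exact, in [0, 1)
    if remainder:
        if mode is Rounding.NEAREST_EVEN:
            half = Fraction(1, 2)
            if remainder > half or (remainder == half and significand % 2):
                significand += 1
        elif mode is Rounding.TOWARD_POSITIVE and sign > 0:
            significand += 1
        elif mode is Rounding.TOWARD_NEGATIVE and sign < 0:
            significand += 1
        # TOWARD_ZERO (and the remaining directed cases) keep the truncation.
    return sign * significand * Fraction(2) ** e, remainder != 0


# 1.125 needs 4 significant bits; with 3 bits it lies exactly halfway
# between 1.0 and 1.25, so the modes resolve the tie differently.
for mode in Rounding:
    print(mode.name, round_to_precision(Fraction(9, 8), 3, mode))

# With 53 bits the result should match Python's correctly rounded binary64.
print(round_to_precision(Fraction(1, 10), 53, Rounding.NEAREST_EVEN)[0] == Fraction(0.1))  # True
```

Because `x` is exact, correct rounding reduces to comparing an exact remainder against one half. With `precision=53`, the results can be cross-checked against Python's own binary64 floats, in the same spirit as checking one implementation against another trusted one.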
+
+We decided to write simple-soft-float instead
+of extending the industry-standard [Berkeley
+softfloat](http://www.jhauser.us/arithmetic/SoftFloat.html) library
+because of a range of issues. These include not supporting Power FP, requiring
+recompilation to switch which ISA is being emulated, not supporting
+all the required operations, and architectural issues such as depending on
+global variables. We are still testing simple-soft-float against
+Berkeley softfloat where we can, however, since Berkeley softfloat is
+widely used and highly likely to be correct.
+
+simple-soft-float is [gaining support for Power
+FP](http://bugs.libre-riscv.org/show_bug.cgi?id=258), which requires
+rewriting a lot of the status-flag handling code, since Power supports a
+much larger set of floating-point status flags and exceptions than most
+other ISAs.
+
+Thanks to Raptor CS for giving us remote access to a Power9 system:
+it makes it much easier to verify that the test cases are correct
+(more on this below).
+
+API docs for stable releases of both
+[simple-soft-float](https://docs.rs/simple-soft-float) and
+[algebraics](https://docs.rs/algebraics) are available on docs.rs.
+
+The algebraics library was chosen as the
+[Crate of the Week for October 8, 2019 on This Week in
+Rust](https://this-week-in-rust.org/blog/2019/10/08/this-week-in-rust-307/#crate-of-the-week).
+
+One of the really important things about these libraries is that they're not
+coded exclusively for Libre-SOC. Like Berkeley softfloat itself
+(and also like the [IEEE754 FPU](https://git.libre-riscv.org/?p=ieee754fpu.git)),
+they're intended for *general-purpose* use by other projects. These are
+exactly the kinds of side-benefits for the wider Libre community that
+sponsorship from individuals, Foundations (such as NLNet) and Companies
+(such as Purism and Raptor CS) brings.
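The "global variables" issue mentioned above is about where the rounding mode and status flags live. Python's standard `decimal` module offers a loose analogy for the alternative design. Note that `decimal` is decimal arithmetic, not IEEE 754 binary floating point, and it is unrelated to simple-soft-float and Berkeley softfloat. Each `Context` object carries its own rounding mode and sticky flags, and arithmetic can be called as a method on an explicit context. The variable names below are illustrative.

```python
from decimal import Context, Decimal, Inexact, ROUND_HALF_EVEN, ROUND_HALF_UP

# Two independent environments, each with its own rounding mode and its
# own sticky status flags, rather than one shared global state.
nearest_even = Context(prec=3, rounding=ROUND_HALF_EVEN)
half_up = Context(prec=3, rounding=ROUND_HALF_UP)

a, b = Decimal("1.00"), Decimal("0.005")

# The exact sum 1.005 needs 4 digits, so both contexts must round; it is
# a tie, which the two rounding modes resolve differently.
r1 = nearest_even.add(a, b)
r2 = half_up.add(a, b)
print(r1, r2)  # 1.00 1.01

# Each context recorded Inexact in its own flags...
print(bool(nearest_even.flags[Inexact]), bool(half_up.flags[Inexact]))  # True True

# ...and clearing one context's flags leaves the other untouched.
half_up.clear_flags()
print(bool(nearest_even.flags[Inexact]), bool(half_up.flags[Inexact]))  # True False
```

The module also offers a per-thread "current context" through `decimal.getcontext()`, which is the implicit, global-style API. The sketch deliberately avoids it, so two differently-configured environments can coexist in one test run without interfering.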
+
+# Kazan Getting a New Shader Compiler IR
+
+Several weeks of work revealed that translating directly from
+SPIR-V to LLVM IR, vectorizing, and all the other front-end stuff in a
+single step is not really feasible, so Jacob has switched to [creating a new
+shader compiler IR](http://bugs.libre-riscv.org/show_bug.cgi?id=161) to allow
+decomposing the translation process into several smaller steps.
+
+The IR and the
+SPIR-V to IR translator are being written simultaneously, since that makes it
+easier to find the things that need to be represented in the shader
+compiler IR. Because writing both the IR and the SPIR-V translator together is
+such a big task, we decided to pick an arbitrary point ([translating a totally
+trivial shader into the IR](http://bugs.libre-riscv.org/show_bug.cgi?id=177))
+and split it into tasks at that point, so Jacob would be able to get paid
+after several months of work.
+
+The IR uses structured control-flow inspired by WebAssembly's control-flow
+constructs, as well as
+[SSA](https://en.wikipedia.org/wiki/Static_single_assignment_form). However, instead
+of using traditional phi instructions, it uses block and loop parameters and
+return values (inspired by [Cranelift's EBB
+parameters](https://github.com/bytecodealliance/wasmtime/blob/master/cranelift/docs/ir.md#static-single-assignment-form)
+as well as both the [Rust](https://www.rust-lang.org/) and [Lua](https://www.lua.org/) programming languages).
+
+The IR has a single pointer type for all data pointers (`data_ptr`), unlike LLVM IR, where pointer types have a type they point to (like `i32*`, where `i32` is the type the pointer points to).
+
+Having a serialized form is important for any good IR, so the IR, like
+LLVM IR, has a user-friendly textual form that can be both read and
+written without losing any information (assuming the IR is valid; comments are
+ignored). A binary form may be added later.
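The idea of "block and loop parameters instead of phi instructions" can be made concrete with a small Python model. It is hypothetical: it is not Kazan code, and it does not capture the IR's semantics in every detail. A loop starts from initial argument values. Each `continue` supplies the values for the next iteration, and `break` returns values out of the construct. This mirrors `continue loop1[loop_var]` and `break block3[1i32, 2i32]` in the IR example below. The names `Continue`, `Break`, `run_loop` and `sum_to` are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Continue:
    """Jump back to the loop header, supplying the next parameter values."""
    args: Tuple[int, ...]


@dataclass
class Break:
    """Leave the loop, producing its return values."""
    values: Tuple[int, ...]


def run_loop(initial: Tuple[int, ...], body: Callable[..., object]) -> Tuple[int, ...]:
    """Interpret a structured loop whose parameters take the place of phis.

    In phi-based SSA the loop header would contain something like
        i   = phi [entry: 1, latch: i_next]
        acc = phi [entry: 0, latch: acc_next]
    Here the entry values are the loop's initial arguments, and the values
    arriving along the back-edge are simply the arguments given to Continue.
    """
    params = tuple(initial)
    while True:
        step = body(*params)
        if isinstance(step, Break):
            return step.values
        params = step.args


def sum_to(n: int) -> int:
    """Sum 1..n using a loop with two parameters, (i, acc)."""
    def body(i, acc):
        if i > n:
            return Break((acc,))
        return Continue((i + 1, acc + i))

    (total,) = run_loop((1, 0), body)
    return total


print(sum_to(10))  # 55
```

With parameters, every value that flows around the loop is passed explicitly at the jump site. A phi-based IR instead lists those values at the join point, keyed by predecessor block.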
+
+Some example code (the IR is likely to change somewhat):
+
+```
+# this is a comment, comments go from the `#` character
+# to the end of the line.
+
+fn function1[] -> ! {
+    # declares a function named function1 that takes
+    # zero parameters and doesn't return
+    # (the return type is !, taken from Rust).
+    # If the function could return, there would instead be
+    # a list of return types:
+    # fn my_fn[] -> [i32, i64] {...}
+    # my_fn returns an i32 and an i64. The multiple
+    # returned values are inspired by Lua's multiple return values.
+
+    # the hints for this function
+    hints {
+        # there are no inlining hints for this function
+        inlining_hint: none,
+        # this function doesn't have a side-effect hint
+        side_effects: normal,
+    }
+
+    # function local variables
+    {
+        # the local variable is an i32 with an
+        # alignment of 4 bytes
+        i32, align: 0x4 -> local_var1: data_ptr;
+        # the pointer to the local variable is
+        # assigned to local_var1 which has the type data_ptr
+    }
+
+    # the function body is a single block -- block1.
+    # block1's return types are instead attached to the
+    # function signature above
+    # (the `-> !` in the `fn function1[] -> !`).
+    block1 {
+        # the first instruction is a loop named loop1.
+        # the initial value of loop_var is the_const,
+        # which is a named constant.
+        # the value of the_const is the address of the
+        # function `function1`.
+        loop loop1[the_const: fn function1] -> ! {
+            # loop1 takes 1 parameter, which is assigned
+            # to loop_var. the type of loop_var is a pointer to a
+            # function which takes no parameters and doesn't
+            # return.
+            -> [loop_var: fn[] -> !];
+
+            # the loop body is a single block -- block2.
+            # block2's return value definitions are instead
+            # attached to the loop instruction above
+            # (the `-> !` in the `loop loop1[...] -> !`).
+            block2 {
+
+                # block3 is a block instruction, it returns
+                # two values, which are assigned to a and b.
+                # Both of a and b have type i32.
+                block block3 -> [a: i32, b: i32] {
+                    # the only way a block can return is by
+                    # being broken out of using the break
+                    # instruction. It is invalid for execution
+                    # to reach the end of a block.
+
+                    # this break instruction breaks out of
+                    # block3, making block3 return the
+                    # constants 1 and 2, both of type i32.
+                    break block3[1i32, 2i32];
+                };
+
+                # an add instruction. The instruction adds
+                # the value `a` (returned by block3 above) to
+                # the constant `increment` (which is an i32
+                # with the value 0x1), and stores the
+                # result in the value `"a"1`. The source-code
+                # location for the add instruction is specified
+                # as being line 12, column 34, in the file
+                # `source_file.vertex`.
+                add [a, increment: 0x1i32]
+                    -> ["a"1: i32] @ "source_file.vertex":12:34;
+
+                # The `"a"1` name is stored as just `a` in
+                # the IR, where the 1 is a numerical name
+                # suffix to differentiate between the two
+                # values with name `a`. This allows robustly
+                # handling duplicate names, by using the
+                # numerical name suffix to disambiguate.
+                #
+                # If a name is specified without the numerical
+                # name suffix, the suffix is assumed to be the
+                # number 0. This also allows handling names that
+                # have unusual characters or are just the empty
+                # string by using the form with the numerical
+                # suffix:
+                # `""0` (empty string)
+                # `"\n"0` (a newline)
+                # `"\u{12345}"0` (the unicode scalar value 0x12345)
+
+
+                # this continue instruction jumps back to
+                # the beginning of loop1, supplying the new
+                # values of the loop parameters. In this case,
+                # we just supply loop_var as the value for
+                # the parameter, which just gets assigned to
+                # loop_var in the next iteration.
+                continue loop1[loop_var];
+            }
+        };
+    }
+}
+```

 # OpenPOWER Conference calls

-TODO
+We've now established a routine fortnightly conference call with Hugh Blemings,
+OpenPOWER Foundation Director, and Timothy Pearson, CEO of Raptor CS. This
+allows us to keep up-to-date (each way) on both our new venture and
+the newly-announced OpenPOWER Foundation effort as it progresses.
+
+One of the most important things that we, Libre-SOC, need, and are
+discussing with Hugh and Tim, is a way to switch functionality on and off
+in the (limited) 32-bit opcode space. That way we can have one mode for
+"POWER 3.0B compliance" and another for "things that are absolutely
+essential to make a decent GPU". Because these two are strongly
+mutually incompatible, this is absolutely critical.
+
+Khronos Vulkan Floating-point Compliance, for example, is critical not
+just from a Khronos Trademark Compliance perspective: it's essential
+from a power-saving and thus commercial success perspective. If we
+have absolute strict compliance with IEEE754 for POWER 3.0B, this will
+result in far more silicon than any commercially-competitive GPU on
+the market, and we will not be able to sell product. Thus it is
+*commercially* essential to be able to swap between POWER Compliance
+and Khronos Compliance *at the silicon level*.
+
+POWER 3.0B does not have C++-style LR/SC atomic operations, for example.
+Suppose we have half a **million** 3D GPU data structures **per second**
+that need SMP-level inter-core mutexes, and the current POWER 3.0B
+multi-instruction atomic operations are used (conforming strictly to
+the standard). Then we're highly likely to see 10 to 15 **percent** of processing
+power consumed by spin-locking. Finding out from Tim on one of these
+calls that C++ atomics are something end-users
+have been asking about is therefore a good sign.
+
+New and essential features that could well end up in a future version
+of the POWER ISA *need* to be firewalled in a clean way, and we've been
+asked to [draft a letter](https://libre-riscv.org/openpower/isans_letter/)
+to some of the (very busy) engineers with a huge amount of knowledge
+and experience inside IBM, for them to consider. Some help in reviewing
+it would be greatly appreciated.
+
+These and many other things are why the calls with Tim and Hugh are a
+good idea. The amazing thing is that they're taking us seriously, and
+we can discuss things like those above with them.
+
+We also learned some other nice things (more on this below). Epic Games
+and RaptorCS are collaborating to get POWER9 supported in Unreal Engine.
+And the idea of using our design as the BMC (baseboard management
+controller) processor, running [OpenBMC](https://github.com/openbmc/openbmc),
+has been very tentatively considered. These are early days;
+it's just ideas, ok! Aside from anything, we actually have to get a chip
+done first.
+
+# OpenPower Virtual Coffee Meetings
+
+The "Virtual Coffee Meetings", announced
+[here](https://openpowerfoundation.org/openpower-virtual-coffee-calls/),
+are literally open to anyone interested in OpenPOWER (if you're strictly
+Libre there's a dial-in method). These calls are not recorded: it's
+just an informal conversation.
+
+A really nice surprise was finding
+out that Paul Mackerras, whom I used to work with 20 years ago, is *also*
+working on OpenPOWER, specifically on
+[microwatt](https://github.com/antonblanchard/microwatt), which is managed
+by Anton Blanchard.
+
+A brief discussion led to learning that Paul is looking at adding TLB
+(Virtual Memory) support to microwatt, specifically the RADIX TLB.
+I therefore pointed him at the same resource
+[(power-gem5)](https://github.com/power-gem5/gem5/tree/gem5-experimental)
+that Hugh had kindly pointed me at the week before, and did a
+[late-night write-up](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005445.html).
+
+My feeling is that these weekly round-table meetings are going to be
+really important for everyone involved in OpenPOWER. It's a community:
+we help each other.

 # Sponsorship by RaptorCS with a TALOS II Workstation

-TODO
+With many thanks to Timothy from
+[RaptorCS](https://raptorcs.com), we have a shiny new
+online server that needs
+[setting up](http://bugs.libre-riscv.org/show_bug.cgi?id=265).
+This machine is not just a "nice-to-have": it's actually essential for
+us to be able to verify against. As you can see in the bug report, the idea
+is to bootstrap our way from running IEEE754 FP on a *POWER* system
+(typically using GNU libm), verifying Jacob's algorithmic FP library,
+particularly and specifically its rounding modes and exception modes.
+
+Once that is done, we will have a general-purpose library that
+is compliant with POWER IEEE754 which *anyone else can use*. We can also use
+that library to run unit tests against our
+[hardware IEEE754 FP library](https://git.libre-riscv.org/?p=ieee754fpu.git;a=summary)
+(again, a resource that anyone may use in any arbitrary project), verifying
+that it is also correct. We are deploying this stepping-stone "bootstrap"
+method all over the place. However, to do so we need access to resources
+that have correctly-compliant implementations in the first place. Hence
+the critical importance of access to a TALOS II POWER9 workstation.

 # Epic Megagrants

-TODO
+Several months back I got word of the existence of Epic Games' "Megagrants".
+In December 2019 they announced that they've given
+[USD $13 million](https://www.unrealengine.com/en-US/blog/epic-megagrants-reaches-13-million-milestone-in-2019)
+to 200 recipients so far. One of them, the Blender Foundation, received
+[USD $1.2 million](https://www.blender.org/press/epic-games-supports-blender-foundation-with-1-2-million-epic-megagrant/)!
+This is an amazing and humbling show of support for the 3D Community,
+world-wide.
+
+It's not just "games", or products specifically using the Unreal Engine:
+they're happy to look at anything that "enhances Libre / Open source"
+capabilities for the 3D Graphics Community.
+
+We're building a full hybrid 3D-capable CPU-GPU-VPU which is fully documented,
+and not just in its capabilities: its [documentation](http://libre-riscv.org) and
+[full source code](http://git.libre-riscv.org) extend
+right the way through the *entire development process*, down to the bedrock
+of the actual silicon (not just the firmware, bootloader and BIOS:
+*everything*). In my mind that kinda qualifies, in some delightful way,
+for what could delicately be characterised as "complete overkill".
+
+Interestingly, guys, if you're reading this: Tim, the CEO of RaptorCS,
+informs us that you're working closely with his team to get the Unreal
+Engine up and running on the POWER architecture? Wouldn't that be highly
+amusing, for us to be able to run the Unreal Engine on the Libre-SOC,
+given that it's going to be POWER-compatible hardware, as a test,
+initially in FPGA and then, in 18-24 months, on actual silicon, eh?
+
+So, as I mentioned
+[on the list](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005262.html)
+(reiterating what I put in the original application), we're happy with
+USD $25,000, and we're happy with USD $10 million. It's really up to you guys
+at Epic Games as to what level you'd like to see us get to, and how fast.
+
+Take USD $600,000, for example. Instead of paying USD $1 million to a proprietary
+company to license a DDR3 PHY for a limited one-time use, with only a 32-bit
+wide interface, we can contract SymbioticEDA to *design* a DDR3 PHY for us.
+Both we *and the rest of the worldwide Silicon Community can use it
+without limitation*, because we will ask SymbioticEDA to make the design
+(and layout) libre-licensed, for anyone to use.
+
+USD $250,000 pays for the mask charges that will allow us to do the 40nm
+quad-core ASIC that we have on the roadmap for the second chip. USD
+$1m pays for 28nm masks (and so on, in an exponential ramp-up). No, we
+don't want to do that straight away: yes, we do want to go through a first
+proving test ASIC in 180nm, which, thanks to NLNet, is already funded.
+This is just good, sane, sensible use of funds.
+
+Even USD $25,000 helps us to cover things such as administration of the
+website (which is taking up a *lot* of time) and little things that we
+didn't quite foresee when putting in the NLNet Grant Applications.
+
+Lastly, one of the conditions, as I understood it from the Megagrants
+process, is that the funds are paid in "stages". This is exactly
+what NLNet does for (and with) us, right now. If you wanted to save
+administrative costs, there may be some benefit to having a conversation
+with the [30-year-old](https://nlnet.nl/foundation/history/)
+NLNet Charitable Foundation. Something to think about?

 # NLNet Milestone tasks

-TODO
+Part of applying for NLNet's Grants is a requirement to create a list
+of tasks, each of which is assigned a budget. On 100% completion of a task,
+donations can be sent out. We had *six* new proposals accepted, each of which
+required between five (minimum) and *nineteen* separate and distinct tasks.
+As a result, a call with Michiel and Joost turned into an unexpected three-hour online
+marathon, scrambling to write almost fifty bug reports as part of the Schedule
+to be attached to each Memorandum of Understanding. The mailing list
+got a [leeetle bit busy](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005003.html)
+right around here.
+
+This emphasised for us the need to subdivide the mailing list into
+separate lists (below).

 # Georgia Tech CREATE-X

-TODO
+(*This section kindly written by Yehowshua*)
+
+Yehowshua is a student at Georgia Tech currently pursuing a Master's in
+Computer Engineering, graduating this summer. He started working
+on LibreSOC in December and wanted to get LibreSOC more funding so
+he could work on it full time.
+
+He originally asked if the ECE Chair at Georgia Tech would be willing
+to fund an in-department effort to deliver an SOC in collaboration
+with LibreSOC (an idea to which he was quite receptive). Through Luke,
+Yehowshua got in contact with Christopher Klaus, who suggested Yehowshua
+should look into Klaus's startup accelerator program, Create-X, and perhaps
+consider taking LibreSOC down the startup route. Robert Rhinehart, who
+had funded LibreSOC a little in the past (*note from Luke: he donated
+the ZC706 and also funded modernisation of Richard Herveille's excellent
+[vga_lcd](https://github.com/RoaLogic/vga_lcd) Library*),
+also suggested that Yehowshua
+incorporate LibreSOC with help from Create-X, and said he would be willing
+to be a seed investor. All this happened by February.
+
+As of March, Yehowshua has been talking with Robert about what type of
+customers would be interested in LibreSOC. Robert is largely interested in
+biological applications. Yehowshua also had a couple of meetings with Rahul
+from Create-X. Yehowshua has started the incorporation of LibreSOC. The
+parent company will probably be called Systèmes-Libres, with LibreSOC
+simply being one of the products we will offer. Yehowshua also attended
+HPCA in late February and mentioned LibreSOC during his talk. People
+seemed to find the idea quite interesting.
+
+He will later be speaking with some well-known startup lawyers that have
+an HQ in Atlanta to discuss business-related things such as S corps,
+C corps, taxes, wages, equity, etc.
+
+Yehowshua plans for Systèmes-Libres to hire full-time employees. Part-time
+work on Libre-SOC will still be possible through donations and
+support from NLNet and companies like Purism.
+
+Currently, Yehowshua plans to take the Create-X summer launch program
+and fund Systèmes-Libres by August. Full-time wages would probably be
+set at around 100k USD.

# LOAD/STORE Buffer and 6600 design documentation

A critical part of this project is not just to create a chip but to
*document* the chip design and the decisions made along the way, for
educational, research, and ongoing maintenance purposes. With an
augmented CDC 6600 design chosen as the fundamental basis,
[documenting that](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/)
as well as the key differences is particularly important. The 6600's
hardware is extremely simple and highly effective, yet the circular loops
at the heart of its design are timing-critical, and James Thornton (the
co-designer of the 6600) recognised that it is paradoxically challenging
to understand why so few gates could be so effective (being, as it was,
literally the world's first out-of-order superscalar architecture).
Consequently, documenting it just to be able to *develop* it is extremely
important.

We're getting to the point where we need to connect the LOAD/STORE Computation
Units up to an actual memory architecture. We've chosen
[minerva](https://github.com/lambdaconcept/minerva/blob/master/minerva/units/loadstore.py)
as the basis because it is written in nmigen, works, and, crucially, uses
wishbone (which we decided to use as the main Bus Backbone a few months ago).

However, minerva is a single-issue 32-bit embedded core, where it's
perfectly OK to have one single LD/ST operation per clock, and even for
that operation to take a few clock cycles. To get anything like the level
of performance needed of a GPU, we need at least four 64-bit LOADs or
STOREs *every clock cycle*.

For a first ASIC from a team that's never done a chip before, this is,
officially, "Bonkers Territory". Where minerva has 32-bit-wide
buses (and does not support 64-bit LD/ST at all), we need internal
data buses a whopping **2000** wires wide, at a minimum.

Let that sink in for a moment.

The reason why the internal buses need to be 2000 wires wide comes down
to the fact that we need, realistically, six to eight LOAD/STORE Computation
Units. Four of them will be operational, while two to four of them will be
waiting with pending instructions from the multi-issue Vectorisation Engine.

We chose to use a system which expands the lowest 4 bits of the address
(the byte offset within a 16-byte line), plus the operation width (1, 2, 4
or 8 bytes), into a "bitmap": a byte-mask that corresponds directly to the
16-byte "cache line" byte-enable columns in the L1 Cache. These bitmaps can
then be "merged" such that requests that go to the same cache line can be
served *in the same clock cycle* to multiple LOAD/STORE Computation Units.
This is absolutely critical for effective Vector Processing.

Additionally, in order to deal with misaligned memory requests, each of those
units needs to put out *two* such 16-byte-wide requests (see where this is going?)
to the L1 Cache, one for each cache line that a misaligned access straddles.
So we now have eight times two times 128 bits, which is a staggering
2048 wires *just for the data*. There are ways to get that down
(potentially to half), and ways to halve it again, but doing so would
miss opportunities for merging requests into cache lines.

At that point, thanks to Mitch Alsup's input (Mitch is the designer of
the Motorola 68000, Motorola 88120, key architect on AMD's Opteron
series, the AMD K9, AMDGPU and Samsung's latest GPU), we learned that
L1 cache design critically depends on what type of SRAM you have. We
initially, naively, wanted dual-ported L1 SRAM, and that's when Staf
and Mitch taught us that this results in a half-duty rate. Only
1-Read **or** 1-Write SRAM cells give fast enough (single-cycle)
data rates to be usable for L1 Caches.
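
To make the byte-mask idea concrete, here is a minimal Python sketch (not the project's nmigen code) of how a single LOAD/STORE could be turned into per-line byte-enable masks, split in two when misaligned, and then merged by cache line. It assumes 16-byte lines and takes the address bits used for the mask to be the low 4 bits (the byte offset within a line). The function names `byte_masks`, `merge_by_line` and `data_wires` are hypothetical, and the merge is a plain OR that ignores conflicts between overlapping stores and does not track which unit receives which bytes.

```python
LINE_BYTES = 16  # assumed L1 "cache line" width: one byte-enable column per byte


def byte_masks(addr, width):
    """Turn one LD/ST into (line_address, 16-bit byte-enable mask) requests.

    Assumption: the low 4 address bits select the starting byte within the
    line. A misaligned access that crosses a line boundary becomes two
    requests, one per cache line it touches.
    """
    if width not in (1, 2, 4, 8):
        raise ValueError("operation width must be 1, 2, 4 or 8 bytes")
    offset = addr & (LINE_BYTES - 1)          # low 4 bits: byte offset in line
    line = addr & ~(LINE_BYTES - 1)           # line-aligned base address
    span = ((1 << width) - 1) << offset       # may spill past bit 15
    first = span & ((1 << LINE_BYTES) - 1)    # bytes that land in this line
    spill = span >> LINE_BYTES                # bytes that land in the next line
    requests = [(line, first)]
    if spill:
        requests.append((line + LINE_BYTES, spill))
    return requests


def merge_by_line(requests):
    """OR together the masks of requests that target the same cache line,
    so that (in principle) they can be served in one clock cycle.
    Overlapping-store conflicts are deliberately not modelled."""
    merged = {}
    for line, mask in requests:
        merged[line] = merged.get(line, 0) | mask
    return merged


def data_wires(units=8, requests_per_unit=2):
    """Data wires needed if every unit drives two full 16-byte (128-bit) requests."""
    return units * requests_per_unit * LINE_BYTES * 8


if __name__ == "__main__":
    pending = byte_masks(0x100, 8) + byte_masks(0x108, 8) + byte_masks(0x10C, 8)
    for line, mask in sorted(merge_by_line(pending).items()):
        print(f"line {line:#x}: mask {mask:016b}")
    print(data_wires())
```

Here the three 8-byte loads at 0x100, 0x108 and 0x10C collapse into a full mask for line 0x100 plus a 4-byte mask for line 0x110, and `data_wires()` reproduces the 8 × 2 × 128 = 2048 figure (six units would give 1536).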

Part of the conversation has wandered into
[why we chose dynamic pipelines](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005459.html),
and we also received that
[important advice](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005354.html)
from both Mitch Alsup and Staf Verhaegen.

(Staf is also [sponsored by NLNet](https://nlnet.nl/project/Chips4Makers/)
to create Libre-licensed Cell Libraries, busting through one of the (many)
layers of NDAs and reducing NREs and other unnecessary, artificial
barriers to ASIC development. I helped him put in the submission, and
he was really happy to do the Cell Libraries that we will be using for
LibreSOC's 180nm test tape-out in October 2020.)

# Public-Inbox and Domain Migration

As mentioned before, one of the important aspects of this project is
documentation and archiving. It also turns out that when working
over an extremely unreliable or ultra-expensive mobile broadband link,
having *local* (offline) access to every available development resource
is critically important.

This is why we are going to the trouble of installing public-inbox. Not
only does it store a mailing list entirely in a git repository, but the
"web service" which provides access to that git-backed archive can also be
mirrored elsewhere, and can even be *run on your own local machine* while
offline. Combined with the right mailer setup, any replies to the
(offline-copied) messages can be stored and then forwarded once internet
connectivity is restored, so that you can remain a productive, collaborative
developer.

Now you know why we absolutely do not accept "Slack", or any other proprietary
"online oh-so-convenient" service.
Not only is such a service highly inappropriate for
Libre Projects, and not only does it make us critically dependent on the
Corporation running it (yes, GitHub has been entirely offline several times):
we also have remote developers (such as myself, working from Scotland last
month with sporadic access to a single cell tower) and developers in emerging
markets whose only internet access is via a library or internet cafe, and
we absolutely do not want to exclude or penalise such people just because
they have fewer resources.

Fascinatingly, Linus Torvalds is *specifically*
[on record](https://www.linuxjournal.com/content/line-length-limits)
about making sure that "Linux development does not favour wealthy people".

We are also, as mentioned before, moving to a new domain name. We'll take
the opportunity to fix some of the issues with HTTPS (wrong certificate),
and also to introduce some
[better mailing list names](http://bugs.libre-riscv.org/show_bug.cgi?id=184)
at the same time.

TODO (Veera?) bit about what was actually done, how it links into mailman2.

# OpenPOWER HDL Mailing List opens up

It is early days, but it is fantastic to see responses from IBM
regarding requests for access to the POWER ISA Specification
documents in
[machine-readable form](http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-March/000007.html).
I took Jeff at his word and explained, in some detail,
[exactly why](http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-March/000008.html)
machine-readable versions of specifications are critically important.

The takeaway is: *we haven't got time to do manual transliteration of the spec*
into "code". We're expending considerable effort making sure that we
"bounce" or "bootstrap" off pre-existing resources, using computer
programs to do so.
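
As a rough illustration of that "bootstrapping" idea, the following Python sketch reads a field layout written in a *hypothetical*, simplified text format and generates field-extractor source code from it, so that no human transliterates each field by hand. The format is loosely modelled on the POWER ISA D-Form (PO in bits 0-5, RT 6-10, RA 11-15, SI 16-31, with bit 0 as the most significant bit). The input format and the names `parse_form` and `generate_extractors` are assumptions for illustration only. The sketch emits Python rather than C so that the example stays self-contained.

```python
import re

# Hypothetical, simplified text layout for one instruction format; the real
# POWER ISA document is not available in this machine-readable form.
SPEC_TEXT = "D-Form: PO 0:5 | RT 6:10 | RA 11:15 | SI 16:31"

FIELD_RE = re.compile(r"(\w+)\s+(\d+):(\d+)")


def parse_form(line):
    """Split 'NAME: FIELD msb:lsb | ...' into (name, [(field, msb, lsb), ...])."""
    name, _, body = line.partition(":")
    fields = [(f, int(msb), int(lsb)) for f, msb, lsb in FIELD_RE.findall(body)]
    return name.strip(), fields


def generate_extractors(form, fields, width=32):
    """Emit Python source with one extractor function per field.

    POWER numbers bits MSB-first (bit 0 is the most significant), so a field
    occupying bits msb..lsb is (insn >> (width - 1 - lsb)) masked to its size.
    Values are returned unsigned (no sign extension of immediates).
    """
    prefix = form.replace("-", "_")
    out = []
    for field, msb, lsb in fields:
        shift = width - 1 - lsb
        mask = (1 << (lsb - msb + 1)) - 1
        out.append(f"def {prefix}_{field}(insn):")
        out.append(f"    return (insn >> {shift}) & {mask:#x}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    form, fields = parse_form(SPEC_TEXT)
    print(generate_extractors(form, fields))
```

Generating the source and running it with `exec` on `addi r3, r1, 100` (encoded as `0x38610064`) yields PO = 14, RT = 3, RA = 1 and SI = 100. SI is really a signed immediate, and this sketch does not sign-extend it.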

This "trick" of bootstrapping off the specification is something that I
learned over 20 years ago, when developing an SMB Client and Server in
something like two weeks flat. I wrote a parser which read the packet
formats *from the IETF Draft Specification*, and output C code.

This leaves me wondering, as I mentioned on the HDL list, whether we can do
the same thing with large sections of the POWER Spec.

# Build Servers

TODO

# Conclusion

I'm not going to mention anything about the current world climate: you've
seen enough news reports. I will say (more about this in the
[EOMA68](https://www.crowdsupply.com/eoma68/micro-desktop) updates) that,
over ten years ago, I anticipated something like what is happening right now.
I wasn't expecting precisely what *has* happened, just the consequences:
a world-wide travel shut-down, and people the world over returning to
their local community roots.

However, what I definitely wasn't expecting was for a United States President
to be voted in who was eager, and frankly stupid enough, to start *and
escalate* a trade war with China. The impact on the U.S. economy alone, and
on the reputation of the whole country, has been detrimental in the extreme.

This combination leaves us, world-wide, with a strong possibility that
seemed so "preposterous" that I could in no way discuss it widely, let alone
mention it in something like a Crowd Supply update: several large
Corporations producing proprietary, binary-only drivers for hardware on
which we critically rely for our internet-connected way of life
**may soon go out of business**. This is thanks to the business model on
which their entire product lifecycle is predicated, in combination with
the extremely high NREs and development costs of ASICs (custom silicon
can cost USD $100 million these days).

Right at a critical time when video conferencing is taking off massively,
you rely on your proprietary hardware (your smartphone, your tablet, your
laptop) for connectivity to the rest of the world, and all of a sudden
**you may not be able to get software updates** for it or, worse,
it could even be
[remotely shut down](https://www.theguardian.com/technology/2016/apr/05/revolv-devices-bricked-google-nest-smart-home)
**without warning**.

I do not want to hammer the point home too strongly, but you should be
getting, in no uncertain terms, exactly how strategically critical this
project has just become in the current world climate. We need to get it
accelerated, completed, and into production, in an expedited and responsible
fashion.