updates/023_2020mar26_decoder_emulator_started.mdwn

   1 So many things happened since the last update they actually need to go
   2 in the main update, even in summary form.  One big thing:
   3 [Raptor CS](https://www.raptorcs.com/)
   4 sponsored us with remote access to a Monster spec'd TALOS II Workstation!
   5
   6 # Introduction
   7
   8 Here's the summary (if it can be called a summary):
   9
  10 * [An announcement](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/004995.html)
  11   that we got the funding (which is open to anyone - hint, hint) resulted in
  12   at least three people reaching out to join the team.  "We don't need
  13   permission to own our own hardware" got a *really* positive reaction.
  14 * New team member, Jock (hello Jock!) starts on the coriolis2 layout,
  15   after Jean-Paul from LIP6.fr helped to dramatically improve how coriolis2
  16   can be used.  This resulted in a
  17   [tutorial](https://libre-riscv.org/HDL_workflow/coriolis2/) and a
  18   [huge bug report discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=178)
  19 * Work has started on the
  20   [POWER ISA decoder](http://bugs.libre-riscv.org/show_bug.cgi?id=186),
  21   verified through
  22   [calling GNU AS](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/test/test_decoder_gas.py;h=9238d3878d964907c5569a3468d6895effb7dc02;hb=56d145e42ac75626423915af22d1493f1e7bb143) (yes, really!)
  23   and on a mini-simulator
  24   [calling QEMU](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/qemu.py;h=9eb103bae227e00a2a1d2ec4f43d7e39e4f44960;hb=56d145e42ac75626423915af22d1493f1e7bb143)
  25   for verification.
  26 * Jacob's simple-soft-float library growing
  27   [Power FP compatibility](http://bugs.libre-riscv.org/show_bug.cgi?id=258)
  28   and python bindings.
  29 * A Conference call with OpenPOWER Foundation Director, Hugh, and Timothy
  30   Pearson from RaptorCS has been established every two weeks.
  31 * The OpenPOWER Foundation is also running some open
  32   ["Virtual Coffee"](https://openpowerfoundation.org/openpower-virtual-coffee-calls/)
  33   weekly round-table calls for anyone interested, generally, in OpenPOWER
  34   development.
  35 * Tim sponsors our team with access to a Monster Talos II system with a
  36   whopping 128 GB RAM.  htop lists a staggering 72 cores (18 real
  37   with 4-way hyperthreading).
  38 * [Epic MegaGrants](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005262.html)
  39   reached out (hello!) to say they're still considering our
  40   request.
  41 * A marathon 3-hour session with [NLNet](http://nlnet.nl) resulted
  42   in the completion of the
  43   [Milestone tasks list(s)](http://bugs.libre-riscv.org/buglist.cgi?component=Milestones&list_id=567&resolution=---)
  44   and a
  45   [boat-load](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/thread.html)
  46   of bug reports to the list.
  47 * Immanuel Yehowshua is participating in the Georgia Tech
  48   [Create-X](https://create-x.gatech.edu/) Programme, and is establishing
  49   a Public Benefit Corporation in Atlanta, as an ethical vehicle for VC
  50   Funding.
  51 * A [Load/Store Buffer](http://bugs.libre-riscv.org/show_bug.cgi?id=216)
  52   design and
  53   [further discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=257)
  54   including on
  55   [comp.arch](https://groups.google.com/forum/#!topic/comp.arch/cbGAlcCjiZE)
  56   inspired additional writeup
  57   on the
  58   [6600 scoreboard](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/)
  59   page.
  60 * [Public-Inbox](http://bugs.libre-riscv.org/show_bug.cgi?id=181) was
  61   installed successfully on the server, which is in the process of
  moving to a [new domain name](http://bugs.libre-riscv.org/show_bug.cgi?id=182),
  63   [Libre-SOC](http://libre-soc.org)
  64 * Build Servers have been set up with
  65   [automated testing](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005364.html)
  66   being established
  67
Well dang, as you can see, suddenly it just went ballistic.  There are
almost certainly things left off the list.  For such a small team there's
  70 a heck of a lot going on.  We have an awful lot to do, in a short amount
  71 of time: the 180nm tape-out is in October 2020 - only 7 months away.
  72
  73 With this update we're doing something slightly different: a request
  74 has gone out [to the other team members](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005428.html)
  75 to say a little bit about what each of them is doing.  This also helps me
  76 because these updates do take quite a bit of time to write.
  77
  78 # NLNet Funding announcement
  79
  80 An announcement went out
  81 [last year](https://lists.gnu.org/archive/html/libreplanet-discuss/2019-09/msg00170.html)
  82 that we'd applied for funding, and we got some great responses and
  83 feedback (such as "don't use patented AXI4").  The second time, we
  84 sent out a "we got it!" message and got some really nice private and
  85 public replies, as well as requests from people to join the team.
  86 More on that when it happens.
  87
  88 # Coriolis2 experimentation started
  89
  90 Jock, a really enthusiastic and clearly skilled and experienced python
  91 developer, has this to say about coriolis2:
  92
  93     As a humble Python developer, I understand the unique status and
  94     significance of the Coriolis project, nevertheless I cannot help
  95     but notice that it has a huge room for improvement. I genuinely hope
  96     that my participation in libre-riscv will also help improve Coriolis.
  97
  98 This was the short version, with a much more
  99 [detailed insight](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005478.html)
 100 listed here which would do well as a bugreport.  However the time it would
 101 take is quite significant.  We do have funding available from NLNet,
 102 so if there is anyone that would like to take this on, under the supervision
 103 of Jean-Paul at LIP6.fr, we can look at facilitating that.
 104
 105 One of the key insights that Jock came up with was that the coding style,
 106 whilst consistent, is something that specifically has to be learned, and,
 107 as such, being contrary to PEP8 in so many ways, creates an artificially
 108 high barrier and learning curve.
 109
 110 Even particularly experienced cross-language developers such as
 111 myself tend to be able to *read* such code, but editing it, when
 112 commas separating list items are on the beginning of lines, results in
 113 syntax errors automatically introduced *without thinking* because we
 114 automatically add them *at the end* because it looks like one is missing.
 115
 116 This is why we insisted on PEP8 in the
 117 [HDL workflow](http://libre-riscv.org/HDL_workflow) document.
 118
 119 Other than that: coriolis2 is actually extremely exciting to work with.
 120 Anyone who has done manual PCB layout will know quite how much of a relief
 121 it is to have auto-routing: this is what coriolis2 has by the bucket-load,
 122 *as well* as auto-placement.  We are looking at half a *million* objects
 123 (Cells) to place.  Without an auto-router / auto-placer this is just a
 124 flat-out impossible task.
 125
 126 The first step was to
 127 [learn and adapt coriolis2](http://bugs.libre-riscv.org/show_bug.cgi?id=178)
 128 which was needed to find out how much work would be involved, as much as
 129 anything else, in order to be able to accurately assign the fixed budgets
 130 to the NLNet milestones.  Following on from that, when Jock joined,
 131 we needed to work out a compact way to express the
 132 [layout of blocks](http://bugs.libre-riscv.org/show_bug.cgi?id=217#c44)
 133 and he's well on the way to achieving that.
 134
 135 Some of the pictures from coriolis2 are
[stunning](http://bugs.libre-riscv.org/attachment.cgi?id=29).  This was an
experimental routing of the IEEE754 FP 64-bit multiplier.  It took
5 minutes to run, and is around 50,000 gates: as big as most silicon
ASICs that have formerly been done with Coriolis2, and 50% of the
practical size that can be handed in one go to the auto-placer/auto-router.
 141
 142 Other designs using coriolis2 have been of the form where the major "blocks"
 143 (such as FPMUL, or Register File) are laid-out automatically in a single-level
hierarchy, followed by full and total manual layout from that point onwards,
 145 in what is termed in the industry as a "Floorplan".
 146 With around 500,000 gates to do and many blocks being repeated, this approach
 147 is not viable for us.  We therefore need a *two* level or potentially three
 148 level hierarchy.
 149
 150 [Explaining this](http://bugs.libre-riscv.org/show_bug.cgi?id=178#c146)
 151 to Jean-Paul was amusing and challenging.  Much bashing of heads against
 152 walls and keyboards was involved.  The basic plan: rather than have
 153 coriolis2 perform an *entire* layout, in a flat and all-or-nothing fashion,
 154 we need a much more subtle fine-grained approach, where *sub-blocks* are
 155 laid-out, then *included* at a given level of hierarchy as "pre-done blocks".
 156
 157 Save and repeat.
 158
This apparently had never been done before, and explaining it in words was
extremely challenging.  A massive hack (actively editing the underlying
HDL files temporarily in between tasks) was the only way to illustrate it.
 162 However once the lightbulb went on, Jean-Paul was able to get coriolis2's
 163 c++ code into shape extremely rapidly, and this alone has opened up an
 164 *entire new avenue* of potential for coriolis2 to be used in industry
 165 for doing much larger ASICs.  Which is precisely the kind of thing that
 166 our NLNet sponsors (and the EU, from the Horizon 2020 Grant) love.  hooray.
 167 Now if only we could actually go to a conference and talk about it.
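
The sub-block approach described above can be sketched in a few lines of
Python.  This is purely illustrative: it does not use the coriolis2 API,
and the block names and the `DESIGN` hierarchy are made up.  The point is
only that each distinct block is laid out once and then reused as a
"pre-done block", however many times it is instantiated.

```python
# Hypothetical hierarchy: block name -> list of child block names.
DESIGN = {
    "soc": ["core", "core", "core", "core"],
    "core": ["fpmul", "fpmul", "regfile"],
    "fpmul": [],
    "regfile": [],
}


def layout(block, design, done):
    """Lay out a block after its children, reusing blocks already done.

    Returns the number of place-and-route runs this call triggered.
    """
    if block in done:
        return 0                      # included as a pre-done block
    runs = sum(layout(child, design, done) for child in design[block])
    done.add(block)                   # stand-in for "place, route, save"
    return runs + 1


def count_instances(block, design):
    """Number of block instances a flat, all-in-one layout would see."""
    return 1 + sum(count_instances(child, design) for child in design[block])


print(layout("soc", DESIGN, set()))      # 4 layout runs
print(count_instances("soc", DESIGN))    # 17 instances if done flat
```

With this toy hierarchy the placer runs four times (once per distinct
block) rather than having to handle all seventeen instances in one go.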
 168
 169 # POWER ISA decoder and Simulator
 170
 171 *(kindly written by Michael)*
 172
 173 The decoder we have is based on that of IBM's
 174 [microwatt reference design](https://github.com/antonblanchard/microwatt).
 175 As microwatt's decoder is quite regular, consisting of a bunch of large
 176 switch statements returning fields of a struct, we elected not to
 177 pursue a direct conversion of the VHDL to nmigen. Instead, we
 178 extracted the information in the switch statements into several
 179 [CSV tables](https://libre-riscv.org/openpower/isatables/),
 180 and leveraged nmigen to construct the decoder from these
 181 tables. We applied the same technique to extract the subfields
 182 (register numbers, branch offset, immediates, etc.) from the
 183 instruction, where Luke converted the information in the POWER ISA
 184 specification to text, and wrote a module in python to extract those
 185 fields from an instruction.
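
As a rough illustration of the table-driven idea, here is a minimal
plain-Python sketch (not nmigen, and not the project's actual code).
The CSV column names and the two-row table are hypothetical; the D-form
field positions and the `addi` / `lwz` major opcodes are taken from the
POWER ISA, which numbers bit 0 as the most significant bit.

```python
import csv
import io

# Hypothetical two-row excerpt; the real tables have many more columns.
MAJOR_CSV = """opcode,unit,in1,in2,out,comment
14,ALU,RA_OR_ZERO,CONST_SI,RT,addi
32,LDST,RA_OR_ZERO,CONST_SI,RT,lwz
"""

# D-form fields as (first_bit, last_bit), using the POWER convention
# that bit 0 is the most significant bit of the 32-bit instruction.
D_FORM = {"OPCD": (0, 5), "RT": (6, 10), "RA": (11, 15), "SI": (16, 31)}


def extract(insn, first, last):
    """Return bits first..last (inclusive, MSB-0 numbering) of insn."""
    width = last - first + 1
    shift = 31 - last
    return (insn >> shift) & ((1 << width) - 1)


def load_table(text):
    """Index the CSV rows by major opcode."""
    return {int(row["opcode"]): row for row in csv.DictReader(io.StringIO(text))}


def decode(insn, table):
    """Look up the table row for insn and attach its D-form subfields."""
    fields = {name: extract(insn, *bits) for name, bits in D_FORM.items()}
    row = table.get(fields["OPCD"])
    if row is None:
        raise ValueError(f"no table entry for opcode {fields['OPCD']}")
    return {**row, **fields}


table = load_table(MAJOR_CSV)
decoded = decode(0x38640005, table)   # addi 3, 4, 5
print(decoded["comment"], decoded["RT"], decoded["RA"], decoded["SI"])  # addi 3 4 5
```

Note that `SI` is returned here as a raw unsigned field; a real decoder
would also sign-extend it.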
 186
 187 To test the decoder, we initially verified it against the tables we
extracted, and manually against the POWER ISA specification. Later
 189 however, we came up with the idea of verifying the decoder against the
 190 output of the GNU assembler. This is done by selecting an instruction
 191 type (integer reg/reg, integer immediate, load store, etc), and
 192 randomly selecting the opcode, registers, immediates, and other
 193 operands. We then feed this instruction to GNU AS to assemble, and
 194 then the assembled instruction is sent to our decoder. From this, we
 195 can then verify that the output of the decoder matches what was generated
 196 earlier.
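
The shape of that randomised round-trip test can be sketched as follows.
To keep the example self-contained, `encode_d_form` is a hand-written
stand-in for the assembler (the real test shells out to GNU AS), and only
two D-form opcodes (`addi` = 14, `addis` = 15) are generated; both
function names are made up for this sketch.

```python
import random


def encode_d_form(opcode, rt, ra, si):
    """Stand-in for the assembler: pack a D-form instruction word."""
    return (opcode << 26) | (rt << 21) | (ra << 16) | (si & 0xFFFF)


def decode_d_form(insn):
    """Decoder under test: unpack the same four fields."""
    si = insn & 0xFFFF
    if si & 0x8000:          # sign-extend the 16-bit immediate
        si -= 0x10000
    return (insn >> 26) & 0x3F, (insn >> 21) & 0x1F, (insn >> 16) & 0x1F, si


def run_random_tests(count, seed=0):
    """Generate random operands, encode, decode, and count mismatches."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(count):
        expected = (
            rng.choice([14, 15]),            # addi, addis
            rng.randrange(32),               # RT
            rng.randrange(32),               # RA
            rng.randrange(-0x8000, 0x8000),  # SI
        )
        if decode_d_form(encode_d_form(*expected)) != expected:
            failures += 1
    return failures


print(run_random_tests(1000))   # 0 mismatches expected
```

Using an independent assembler as the encoder, as the real test does, is
what gives the check its value: a bug would have to be present in both
GNU AS and our decoder to go unnoticed.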
 197
 198 We also explored using a similar idea to test the functionality of the
 199 entire SOC. By using the [QEMU](https://www.qemu.org/) powerpc
 200 emulator, we can compare the execution of our SOC against that of the
 201 emulator to verify that our decoder and backend are working correctly.
 202 We would write snippets of test code (or potentially randomly generate
 203 instructions) and send the resulting binary to both the SOC and
QEMU. We would then simulate our SOC until it had finished executing
instructions, and use QEMU's gdb interface to do the same. Finally, we would
use QEMU's gdb interface to compare the register file and memory
 207 with that of our SOC to verify that it is working correctly. I did
 208 some experimentation using this technique to verify a rudimentary
 209 simulator of the SOC backend, and it seemed to work quite well.
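
The final comparison step might look something like the sketch below.
It assumes both sides have already been reduced to plain dictionaries of
register name to value; how those snapshots are obtained (QEMU's gdb
interface on one side, our simulator on the other) is left out, and
`diff_state` and the example values are hypothetical.

```python
def diff_state(reference, candidate):
    """Return {name: (reference_value, candidate_value)} for every mismatch.

    A register missing from one side shows up with None on that side.
    """
    names = sorted(set(reference) | set(candidate))
    return {
        name: (reference.get(name), candidate.get(name))
        for name in names
        if reference.get(name) != candidate.get(name)
    }


# Hypothetical snapshots taken after running the same test snippet.
qemu_regs = {"r1": 0x10, "r2": 0x20, "r3": 0x30}
soc_regs = {"r1": 0x10, "r2": 0x21, "r3": 0x30}

mismatches = diff_state(qemu_regs, soc_regs)
print(mismatches)   # {'r2': (32, 33)}
```

An empty result means the two models agree on everything that was
captured; memory can be compared the same way, keyed by address.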
 210
 211 *(Note from Luke: this automated approach, taking either other people's
 212 regularly-written code or actual PDF specifications, not only saves us a
 213 vast amount of time, it also ensures that our implementation is
 214 correct and does not contain transcription errors).*
 215
 216 # simple-soft-float Library and POWER FP emulation
 217
 218 The
 219 [simple-soft-float](https://salsa.debian.org/Kazan-team/simple-soft-float)
 220 library is a floating-point library Jacob wrote with the intention
 221 of being a reference implementation of IEEE 754 for hardware testing
purposes. It's specifically written to be easy to understand, rather
than having the code obscured in pursuit of speed:
 224
 225 * Being easier to understand helps prevent bugs where the code does not
 226   match the IEEE spec.
 227 * It uses the [algebraics](https://salsa.debian.org/Kazan-team/algebraics)
 228   library that Jacob wrote since that allows using numbers that behave
 229   like exact real numbers, making reasoning about the code simpler.
* It is written in Rust rather than highly-macro-ified C.  This helps with
  readability, because operations aren't obscured, and with safety, because
  Rust proves at compile time that the code won't seg-fault unless you
  specifically opt out of those guarantees by using `unsafe`.
 234
 235 It currently supports 16, 32, 64, 128-bit FP for RISC-V, along with
 236 having a `DynamicFloat` type which allows dynamically specifying all
 237 aspects of how a particular floating-point type behaves -- if one wanted,
 238 they could configure it as a 2048-bit floating-point type.
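
To give a flavour of what "dynamically specifying" a floating-point type
means, here is a small Python sketch that decodes an IEEE 754 binary
encoding of any exponent and mantissa width into an exact value.  It is
not simple-soft-float's API: `decode_float` is a made-up name, and
Python's `fractions.Fraction` stands in for the exact arithmetic that the
algebraics library provides.

```python
from fractions import Fraction


def decode_float(bits, exp_width, mant_width):
    """Decode an IEEE 754 binary interchange encoding to an exact value.

    Returns a Fraction for finite values (the sign of zero is not
    preserved), or one of the strings "inf", "-inf", "nan".
    """
    mant = bits & ((1 << mant_width) - 1)
    exp = (bits >> mant_width) & ((1 << exp_width) - 1)
    sign = -1 if (bits >> (exp_width + mant_width)) & 1 else 1
    bias = (1 << (exp_width - 1)) - 1
    if exp == (1 << exp_width) - 1:           # all-ones exponent
        if mant:
            return "nan"
        return "inf" if sign > 0 else "-inf"
    if exp == 0:                               # zero or subnormal
        return sign * Fraction(mant, 1 << mant_width) * Fraction(2) ** (1 - bias)
    return sign * (1 + Fraction(mant, 1 << mant_width)) * Fraction(2) ** (exp - bias)


print(decode_float(0x3C00, 5, 10))       # half precision 1.0 -> 1
print(decode_float(0x3FC00000, 8, 23))   # single precision 1.5 -> 3/2
```

Because the widths are ordinary parameters, the same function handles
16, 32, 64 or 128-bit formats, or anything else one cares to configure.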
 239
 240 It also has Python bindings, thanks to the awesome
 241 [PyO3](https://pyo3.rs/) library for writing Python bindings in Rust.
 242
 243 We decided to write simple-soft-float instead
 244 of extending the industry-standard [Berkeley
 245 softfloat](http://www.jhauser.us/arithmetic/SoftFloat.html) library
 246 because of a range of issues, including not supporting Power FP, requiring
 247 recompilation to switch which ISA is being emulated, not supporting
 248 all the required operations, architectural issues such as depending on
 249 global variables, etc. We are still testing simple-soft-float against
 250 Berkeley softfloat where we can, however, since Berkeley softfloat is
 251 widely used and highly likely to be correct.
 252
 253 simple-soft-float is [gaining support for Power
 254 FP](http://bugs.libre-riscv.org/show_bug.cgi?id=258), which requires
 255 rewriting a lot of the status-flag handling code since Power supports a
 256 much larger set of floating-point status flags and exceptions than most
 257 other ISAs.
 258
Thanks to RaptorCS for giving us remote access to a POWER9 system,
since that makes it much easier to verify that the test cases are correct
(more on this below).
 262
 263 API Docs for stable releases of both
 264 [simple-soft-float](https://docs.rs/simple-soft-float) and
 265 [algebraics](https://docs.rs/algebraics) are available on docs.rs.
 266
 267 One of the really important things about these libraries: they're not
 268 specifically coded exclusively for Libre-SOC: like softfloat-3 itself
 269 (and also like the [IEEE754 FPU](https://git.libre-riscv.org/?p=ieee754fpu.git))
 270 they're intended for *general-purpose* use by other projects.  These are
 271 exactly the kinds of side-benefits for the wider Libre community that
 272 sponsorship, from individuals, Foundations (such as NLNet) and Companies
 273 (such as Purism and Raptor CS) brings.
 274
 275 # OpenPOWER Conference calls
 276
We've now established a routine conference call, every two weeks, with Hugh Blemings,
 278 OpenPOWER Foundation Director, and Timothy Pearson, CEO of Raptor CS.  This
 279 allows us to keep up-to-date (each way) on both our new venture and also
 280 the newly-announced OpenPOWER Foundation effort as it progresses.
 281
 282 One of the most important things that we, Libre-SOC, need, and are
 283 discussing with Hugh and Tim is: a way to switch on/off functionality
 284 in the (limited) 32-bit opcode space, so that we have one mode for
 285 "POWER 3.0B compliance" and another for "things that are absolutely
essential to make a decent GPU".  With these two being mutually
incompatible, this is absolutely critical.
 288
 289 Khronos Vulkan Floating-point Compliance is, for example, critical not
 290 just from a Khronos Trademark Compliance perspective, it's essential
 291 from a power-saving and thus commercial success perspective.  If we
 292 have absolute strict compliance with IEEE754 for POWER 3.0B, this will
 293 result in far more silicon than any commercially-competitive GPU on
 294 the market, and we will not be able to sell product.  Thus it is
 295 *commercially* essential to be able to swap between POWER Compliance
 296 and Khronos Compliance *at the silicon level*.
 297
POWER 3.0B does not have C++ style LR/SC atomic operations, for example.
If we have half a **million** 3D GPU data structures **per second**
that need SMP-level inter-core mutexes, and the current POWER 3.0B
multi-instruction atomic operations are used - conforming strictly to
the standard - we're highly likely to see 10 to 15 **percent** of processing
power consumed on spin-locking.  Finding out from Tim on one of these
calls that C++ atomics are something that end-users
have been asking about is therefore a good sign.
 306
New and essential features that could well end up in a future version
 308 of the POWER ISA *need* to be firewalled in a clean way, and we've been
 309 asked to [draft a letter](https://libre-riscv.org/openpower/isans_letter/)
 310 to some of the (very busy) engineers with a huge amount of knowledge
 311 and experience inside IBM, for them to consider.  Some help in reviewing
 312 it would be greatly appreciated.
 313
 314 These and many other things are why the calls with Tim and Hugh are a
 315 good idea.  The amazing thing is that they're taking us seriously, and
 316 we can discuss things like those above with them.
 317
Another nice thing we learned (more on this below) is that Epic Games
 319 and RaptorCS are collaborating to get POWER9 supported in Unreal Engine.
 320 And that the idea has been very tentatively considered to use our design
 321 for the "boot management" processor, running
 322 [OpenBMC](https://github.com/openbmc/openbmc).  These are early days,
 323 it's just ideas, ok!  Aside from anything, we actually have to get a chip
 324 done, first.
 325
 326 # OpenPower Virtual Coffee Meetings
 327
 328 The "Virtual Coffee Meetings", announced
 329 [here](https://openpowerfoundation.org/openpower-virtual-coffee-calls/)
 330 are literally open to anyone interested in OpenPOWER (if you're strictly
 331 Libre there's a dial-in method).  These calls are not recorded, it's
 332 just an informal conversation.
 333
 334 What's a really nice surprise is finding
 335 out that Paul Mackerras, whom I used to work with 20 years ago, is *also*
 336 working on OpenPOWER, specifically
 337 [microwatt](https://github.com/antonblanchard/microwatt), being managed
 338 by Anton Blanchard.
 339
 340 A brief discussion led to learning that Paul is looking at adding TLB
 341 (Virtual Memory) support to microwatt, specifically the RADIX TLB.
 342 I therefore pointed him at the same resource
 343 [(power-gem5)](https://github.com/power-gem5/gem5/tree/gem5-experimental)
 344 that Hugh had kindly pointed me at, the week before, and did a
[late night write-up](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005445.html).
 346
 347 My feeling is that these weekly round-table meetings are going to be
 348 really important for everyone involved in OpenPOWER.  It's a community:
 349 we help each other.
 350
 351 # Sponsorship by RaptorCS with a TALOS II Workstation
 352
 353 With many thanks to Timothy from
 354 [RaptorCS](https://raptorcs.com), we've a new shiny
 355 online server that needs
 356 [setting up](http://bugs.libre-riscv.org/show_bug.cgi?id=265).
 357 This machine is not just a "nice-to-have", it's actually essential for
 358 us to be able to verify against.  As you can see in the bugreport, the idea
 359 is to bootstrap our way from running IEEE754 FP on a *POWER* system
 360 (using typically gnu libm), verifying Jacob's algorithmic FP library
 361 particularly and specifically for its rounding modes and exception modes.
 362
 363 Once that is done, then apart from having a general-purpose library that
 364 is compliant with POWER IEEE754 which *anyone else can use*, we can use
 365 that to run unit tests against our[
 366 hardware IEEE754 FP library](https://git.libre-riscv.org/?p=ieee754fpu.git;a=summary) -
 367 again, a resource that anyone may use in any arbitrary project - verifying
 368 that it is also correct.  This stepping-stone "bootstrap" method we are
 369 deploying all over the place, however to do so we need access to resources
 370 that have correctly-compliant implementations in the first place.  Thus,
 371 the critical importance of access to a TALOS II POWER9 workstation.
 372
 373 # Epic Megagrants
 374
 375 Several months back I got word of the existence of Epic Games' "Megagrants".
In December 2019 they announced that so far they've given
[USD $13 million](https://www.unrealengine.com/en-US/blog/epic-megagrants-reaches-13-million-milestone-in-2019)
to 200 recipients: one of them, the Blender Foundation, received
[USD $1.2 million](https://www.blender.org/press/epic-games-supports-blender-foundation-with-1-2-million-epic-megagrant/)!
 380 This is an amazing and humbling show of support for the 3D Community,
 381 world-wide.
 382
 383 It's not just "games", or products specifically using the Unreal Engine:
 384 they're happy to look at anything that "enhances Libre / Open source"
 385 capabilities for the 3D Graphics Community.
 386
A full hybrid 3D-capable CPU-GPU-VPU which is fully documented, and not just in
its capabilities: the [documentation](http://libre-riscv.org) and
[full source code](http://git.libre-riscv.org) extend
right the way through the *entire development process* down to the bedrock
of the actual silicon - not just the firmware, bootloader and BIOS,
*everything*.  In my mind it kinda qualifies, in a way that can, in some
delightful way, be characterised delicately as "complete overkill".
 394
 395 Interestingly, guys, if you're reading this: Tim, the CEO of RaptorCS
 396 informs us that you're working closely with his team to get the Unreal
 397 Engine up and running on the POWER architecture?  Wouldn't that be highly
 398 amusing, for us to be able to run the Unreal Engine on the Libre-SOC,
 399 given that it's going to be POWER compatible hardware, as a test,
 400 first initially in FPGA and then in 18-24 months, on actual silicon, eh?
 401
 402 So, as I mentioned
 403 [on the list](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005262.html)
 404 (reiterating what I put in the original application), we're happy with
 405 USD $25,000, we're happy with USD $10 million.  It's really up to you guys,
 406 at Epic Games, as to what level you'd like to see us get to, and how fast.
 407
With USD $600,000, for example, instead of paying USD $1 million to a proprietary
company to license a DDR3 PHY for a limited one-time use and only a 32-bit
wide interface, we can contract SymbioticEDA to *design* a DDR3 PHY for us,
 411 which both we *and the rest of the worldwide Silicon Community can use
 412 without limitation* because we will ask SymbioticEDA to make the design
 413 (and layout) libre-licensed, for anyone to use.
 414
USD $250,000 pays for the mask charges that will allow us to do the 40nm
 416 quad-core ASIC that we have on the roadmap for the second chip. USD
 417 $1m pays for 28nm masks (and so on, in an exponential ramp-up).  No, we
 418 don't want to do that straight away: yes we do want to go through a first
 419 proving test ASIC in 180nm, which, thanks to NLNet, is already funded.
 420 This is just good sane sensible use of funds.
 421
 422 Even USD $25,000 helps us to cover things such as administration of the
 423 website (which is taking up a *lot* of time) and little things that we
 424 didn't quite foresee when putting in the NLNet Grant Applications.
 425
 426 Lastly, one of the conditions as I understood it from the Megagrants
 427 process is that the funds are paid in "stages".  This is exactly
 428 what NLNet does for (and with) us, right now.  If you wanted to save
 429 administrative costs, there may be some benefit to having a conversation
 430 with the [30-year-old](https://nlnet.nl/foundation/history/)
 431 NLNet Charitable Foundation.  Something to think about?
 432
 433 # NLNet Milestone tasks
 434
 435 Part of applying for NLNet's Grants is a requirement to create a list
 436 of tasks, each of which is assigned a budget.  On 100% completion of the task,
 437 donations can be sent out.  With *six* new proposals accepted, each of which
required between five (minimum) and *nineteen* separate and distinct tasks,
a call with Michiel and Joost turned into an unexpected three-hour online
 440 marathon, scrambling to write almost fifty bugreports as part of the Schedule
 441 to be attached to each Memorandum of Understanding.  The mailing list
 442 got a [leeetle bit busy](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005003.html)
 443 right around here.
 444
This emphasised for us the need to subdivide the mailing list into
separate lists (below).
 447
 448 # Georgia Tech CREATE-X
 449
 450 TODO
 451
 452 # LOAD/STORE Buffer and 6600 design documentation
 453
A critical part of this project is not just to create a chip, it's to
*document* the chip design and the decisions along the way, for
educational, research, and ongoing maintenance purposes.  With an
augmented CDC 6600 design being chosen as the fundamental basis,
[documenting that](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/)
as well as the key differences is particularly important.  At the very least,
the extremely simple, highly effective but timing-critical
design aspects of the circular loops in the 6600 were recognised by James
Thornton (the co-designer of the 6600) as being paradoxically challenging:
it is hard to understand why so few gates could be so effective (this being,
literally, the world's first ever out-of-order superscalar architecture).
Consequently, documenting it just to be able to *develop* it is extremely
important.
 467
 468 We're getting to the point where we need to connect the LOAD/STORE Computation
 469 Units up to an actual memory architecture.  We've chosen
 470 [minerva](https://github.com/lambdaconcept/minerva/blob/master/minerva/units/loadstore.py)
 471 as the basis because it is written in nmigen, works, and, crucially, uses
 472 wishbone (which we decided to use as the main Bus Backbone a few months ago).
 473
 474 However, unlike minerva, which is a single-issue 32-bit embedded chip,
 475 where it's perfectly ok to have one single LD/ST operation per clock,
 476 and not only that but to have that operation take a few clock cycles,
 477 to get anything like the level of performance needed of a GPU, we need
 478 at least four 64-bit LOADs or STOREs *every clock cycle*.
 479
 480 For a first ASIC from a team that's never done a chip before, this is,
 481 officially, "Bonkers Territory".  Where minerva is doing 32-bit-wide
 482 Buses (and does not support 64-bit LD/ST at all), we need internal
 483 data buses of a minimum whopping **2000** wires wide.
 484
 485 Let that sink in for a moment.
 486
 487 The reason why the internal buses need to be 2000 wires wide comes down
to the fact that we need, realistically, six to eight LOAD/STORE Computation
Units.  Four of them will be operational, two to four of them will be waiting
 490 with pending instructions from the multi-issue Vectorisation Engine.
 491
 492 We chose to use a system which expands the first 4 bits of the address,
 493 plus the operation width (1,2,4,8 bytes) into a "bitmap" - a byte-mask -
 494 that corresponds directly with the 16 byte "cache line" byte enable
 495 columns, in the L1 Cache.  These bitmaps can then be "merged" such
 496 that requests that go to the same cache line can be served *in the
same clock cycle* to multiple LOAD/STORE Computation Units.  This
is absolutely critical for effective Vector Processing.
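
A minimal Python model of the byte-mask expansion and merging might look
like this.  It assumes the "first 4 bits" are the low four address bits,
that bit 0 of the mask corresponds to byte 0 of the line, and that the
access stays within one line; the function names are made up for
illustration and this is not the hardware implementation.

```python
LINE_BYTES = 16


def byte_mask(addr, width):
    """16-bit mask with one bit per byte touched within the cache line.

    Assumes the access does not cross the end of the line.
    """
    offset = addr & (LINE_BYTES - 1)
    if offset + width > LINE_BYTES:
        raise ValueError("access crosses a cache-line boundary")
    return ((1 << width) - 1) << offset


def merge_by_line(requests):
    """Group (addr, width) requests by cache line and OR their masks."""
    merged = {}
    for addr, width in requests:
        line = addr // LINE_BYTES
        merged[line] = merged.get(line, 0) | byte_mask(addr, width)
    return merged


requests = [(0x100, 8), (0x108, 4), (0x10C, 4), (0x200, 2)]
for line, mask in merge_by_line(requests).items():
    print(hex(line), format(mask, "016b"))
```

Here the first three requests all land in cache line `0x10` and merge
into a single full-line mask, so they can be served together, while the
fourth goes to a different line.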
 499
Additionally, in order to deal with misaligned memory requests, each of those
needs to put *two* such 16-byte-wide requests (see where this is going?)
out to the L1 Cache.
 503 So, we now have eight times two times 128 bits which is a staggering
 504 2048 wires *just for the data*.  There do exist ways to get that down
 505 (potentially to half), and there do exist ways to get that cut in half
 506 again, however doing so would miss opportunities for merging of requests
 507 into cache lines.
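
The misaligned split and the wire count can be checked with a few lines
of Python.  As before this is only a sketch under the assumption that
mask bit 0 is byte 0 of a 16-byte line; `split_request` is a hypothetical
helper, not part of the actual design.

```python
LINE_BYTES = 16


def split_request(addr, width):
    """Return one or two (line, mask) pairs covering addr..addr+width-1."""
    offset = addr & (LINE_BYTES - 1)
    full = ((1 << width) - 1) << offset      # may spill past bit 15
    line = addr // LINE_BYTES
    parts = [(line, full & 0xFFFF)]
    if full >> LINE_BYTES:                   # bytes that fall in the next line
        parts.append((line + 1, full >> LINE_BYTES))
    return parts


UNITS = 8                # LOAD/STORE Computation Units
REQUESTS_PER_UNIT = 2    # to cover a misaligned access
DATA_WIRES = UNITS * REQUESTS_PER_UNIT * LINE_BYTES * 8

print(split_request(0x10E, 4))   # a 4-byte access straddling two lines
print(DATA_WIRES)                # 2048
```

An aligned access yields a single pair; the 4-byte access at `0x10E`
yields two, which is why every unit has to be able to issue two
line-wide requests.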
 508
 509 At that point, thanks to Mitch Alsup's input (Mitch is the designer of
 510 the Motorola 68000, Motorola 88120, key architecture on AMD's Opteron
 511 Series, the AMD K9, AMDGPU and Samsung's latest GPU), we learned that
 512 L1 cache design critically depends on what type of SRAM you have.  We
 513 initially, naively, wanted dual-ported L1 SRAM and that's when Staf
 514 and Mitch taught us that this results in half-duty rate.  Only
 515 1-Read **or** 1-Write SRAM Cells give you fast enough (single-cycle)
 516 data rates to be useable for L1 Caches.
 517
 518 Part of the conversation has wandered into
 519 [why we chose dynamic pipelines](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005459.html)
 520 as well as receiving that
 521 [important advice](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005354.html)
 522 from both Mitch Alsup and Staf Verhaegen.
 523
 524 (Staf is also [sponsored by NLNet](https://nlnet.nl/project/Chips4Makers/)
 525 to create Libre-licensed Cell Libraries, busting through one of the -
 526 many - layers of NDAs and reducing NREs and unnecessary and artificial
 527 barriers for ASIC development: I helped him put in the submission, and
 528 he was really happy to do the Cell Libraries that we will be using for
 529 LibreSOC's 180nm test tape-out in October 2020.)
 530
 531 # Public-Inbox and Domain Migration
 532
 533 As mentioned before, one of the important aspects of this project is
 534 the documentation and archiving.  It also turns out that when working
 535 over an extremely unreliable or ultra-expensive mobile broadband link,
 536 having *local* (offline) access to every available development resource
 537 is critically important.
 538
This is why we are going to the trouble of installing public-inbox.  Not
only does it store a mailing list entirely in a git repository: the
"web service" which provides access to that git-backed archive can be
mirrored elsewhere, and can even be *run on your own local machine* when
offline.  In combination with the right mailer setup, replies to the
(offline-copied) messages can be stored and forwarded, to be sent when
internet connectivity is restored, so that you remain a productive
collaborative developer throughout.
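The store-and-forward half of this can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions, not our
actual mailer setup: it uses the standard-library `mailbox` module, the
function names `queue_reply` and `flush_outbox` are hypothetical, and the
`send` callable stands in for whatever really delivers mail (for example
a thin wrapper around `smtplib`).

```python
import mailbox
from email.message import EmailMessage


def queue_reply(outbox_dir, sender, recipient, subject, body,
                in_reply_to=None):
    """Write a reply into a local Maildir outbox; no network is needed."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    if in_reply_to:
        # Threading header, so the reply attaches to the archived message.
        msg["In-Reply-To"] = in_reply_to
    msg.set_content(body)
    outbox = mailbox.Maildir(outbox_dir, create=True)
    key = outbox.add(msg)
    outbox.close()
    return key


def flush_outbox(outbox_dir, send):
    """Try to deliver every queued message; keep the ones that fail.

    `send` is an assumed callable that takes a message and raises
    OSError when delivery is not possible (e.g. no connectivity).
    Returns the number of messages delivered and removed from the queue.
    """
    outbox = mailbox.Maildir(outbox_dir, create=True)
    sent = 0
    for key in list(outbox.keys()):
        try:
            send(outbox[key])
        except OSError:
            continue  # still offline: leave it queued for next time
        outbox.remove(key)
        sent += 1
    outbox.close()
    return sent
```

Replies written while offline simply accumulate in the outbox directory;
calling `flush_outbox` again once the link is back delivers whatever is
still queued.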
 547
Now you know why we absolutely do not accept "slack", or any other proprietary
"online oh-so-convenient" service.  Not only are such services highly
inappropriate for Libre Projects, and not only would we become critically
dependent on the Corporation running the service (yes, github has been
entirely offline, several times): if we have remote developers (such as
myself, working from Scotland last month with sporadic access to a single
Cell Tower) or developers in emerging markets whose only internet access is
via a Library or Internet Cafe, we absolutely do not want to exclude or
penalise such people just because they have fewer resources.
 557
Fascinatingly, Linus Torvalds is *specifically*
 559 [on record](https://www.linuxjournal.com/content/line-length-limits)
 560 about making sure that "Linux development does not favour wealthy people".
 561
We are also, as mentioned before, moving to a new domain name.  We'll take
the opportunity to fix some of the issues with HTTPS (wrong certificate),
and also choose some
[better mailing list names](http://bugs.libre-riscv.org/show_bug.cgi?id=184)
at the same time.
 567
 568 TODO (Veera?) bit about what was actually done, how it links into mailman2.
 569
# OpenPOWER HDL Mailing List opens up
 571
It is early days, but it is fantastic to see responses from IBM
regarding requests for access to the POWER ISA Specification
documents in
[machine-readable form](http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-March/000007.html).
I took Jeff at his word and explained, in some detail,
[exactly why](http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-March/000008.html)
machine-readable versions of specifications are critically important.
 579
 580 The takeaway is: *we haven't got time to do manual transliteration of the spec*
 581 into "code".  We're expending considerable effort making sure that we
 582 "bounce" or "bootstrap" off of pre-existing resources, using computer
 583 programs to do so.
 584
This "trick" is something that I learned over 20 years ago, when developing
an SMB Client and Server in something like two weeks flat.  I wrote a
parser which read the packet formats *from the IETF Draft Specification*,
and output C code.
 589
 590 This leaves me wondering, as I mention on the HDL list, if we can do the same
 591 thing with large sections of the POWER Spec.
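To make the idea concrete, here is a minimal sketch of that kind of
spec-to-code generator.  Everything specific in it is an assumption made
for illustration: the plain-text layout of `SPEC_TEXT`, the packet and
field names, and the helper names `parse_spec` and `emit_c` are all
invented, and are not taken from the SMB draft, the POWER ISA, or the
original parser.

```python
import re

# Hypothetical spec excerpt: the layout and the field names are invented
# for illustration and are NOT taken from any SMB draft or the POWER ISA.
SPEC_TEXT = """\
Packet example_header
    command     1 byte
    flags       2 bytes
    length      4 bytes
"""

# Map a field size in bytes onto a fixed-width C type.
C_TYPES = {1: "uint8_t", 2: "uint16_t", 4: "uint32_t"}

PACKET_RE = re.compile(r"^Packet\s+(\w+)$")
FIELD_RE = re.compile(r"^\s+(\w+)\s+(\d+)\s+bytes?$")


def parse_spec(text):
    """Return a list of (packet_name, [(field_name, size_in_bytes), ...])."""
    packets = []
    for line in text.splitlines():
        header = PACKET_RE.match(line)
        if header:
            packets.append((header.group(1), []))
            continue
        field = FIELD_RE.match(line)
        if field and packets:
            packets[-1][1].append((field.group(1), int(field.group(2))))
    return packets


def emit_c(packets):
    """Render each parsed packet as a C struct definition."""
    out = ["#include <stdint.h>", ""]
    for name, fields in packets:
        out.append(f"struct {name} {{")
        for field_name, size in fields:
            out.append(f"    {C_TYPES[size]} {field_name};")
        out.append("};")
        out.append("")
    return "\n".join(out)


if __name__ == "__main__":
    print(emit_c(parse_spec(SPEC_TEXT)))
```

The point of the design is that the specification text stays the single
source of truth: when the document changes, the generator is re-run
rather than the code being re-typed by hand.  A real generator would also
have to deal with struct packing and byte order, which this sketch
ignores.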
 592
# Build Servers
 594
 595 TODO
 596
# Conclusion
 598
 599 I'm not going to mention anything about the current world climate: you've
 600 seen enough news reports.  I will say (more about this through the
 601 [EOMA68](https://www.crowdsupply.com/eoma68/micro-desktop) updates) that
 602 I anticipated something like what is happening right now, over ten years
 603 ago.  I wasn't precisely expecting what *has* happened, just the consequences:
 604 world-wide travel shut-down, and for people - the world over - to return to
 605 local community roots.
 606
However, what I definitely wasn't expecting was a United States President
to be voted in who was eager and, frankly, stupid enough, to start *and
escalate* a Trade war with China.  The impact on the U.S. economy alone, and
the reputation of the whole country, has been detrimental in the extreme.
 611
This combination leaves us - world-wide - with a strong possibility, one that
seemed so "preposterous" that I could in no way discuss it widely, let alone
mention it on something like a Crowdsupply update: that thanks to the
business model on which their entire product lifecycle is predicated,
in combination with the extremely high NREs and development costs for
ASICs (custom silicon costs USD $100 million, these days), several
large Corporations producing proprietary binary-only drivers for
hardware on which we critically rely for our internet-connected way
of life **may soon go out of business**.
 621
Right at a critical time when video conferencing is taking off massively,
**you may suddenly not be able to get software updates** for your
proprietary hardware - your smartphone, your tablet, your laptop,
everything you rely on for connectivity to the rest of the world - or,
worse, your products could even be
[remotely shut down](https://www.theguardian.com/technology/2016/apr/05/revolv-devices-bricked-google-nest-smart-home)
**without warning**.
 629
 630 I do not want to hammer the point home too strongly but you should be
 631 getting, in no uncertain terms, exactly how strategically critical, in
 632 the current world climate, this project just became.  We need to get it
 633 accelerated, completed, and into production, in an expedited and responsible
 634 fashion.
 635