So many things have happened since the last update that they need to go
in the main update, even in summary form. One big thing:
[Raptor CS](https://www.raptorcs.com/)
sponsored us with remote access to a Monster-spec'd TALOS II Workstation!

# Introduction

Here's the summary (if it can be called a summary):

* [An announcement](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/004995.html)
that we got the funding (which is open to anyone - hint, hint) resulted in
at least three people reaching out to join the team. "We don't need
permission to own our own hardware" got a *really* positive reaction.
* New team member, Jock (hello Jock!) starts on the coriolis2 layout,
after Jean-Paul from LIP6.fr helped to dramatically improve how coriolis2
can be used. This resulted in a
[tutorial](https://libre-riscv.org/HDL_workflow/coriolis2/) and a
[huge bug report discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=178).
* Work has started on the
[POWER ISA decoder](http://bugs.libre-riscv.org/show_bug.cgi?id=186),
verified through
[calling GNU AS](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/test/test_decoder_gas.py;h=9238d3878d964907c5569a3468d6895effb7dc02;hb=56d145e42ac75626423915af22d1493f1e7bb143) (yes, really!)
and on a mini-simulator
[calling QEMU](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/qemu.py;h=9eb103bae227e00a2a1d2ec4f43d7e39e4f44960;hb=56d145e42ac75626423915af22d1493f1e7bb143)
for verification.
* Jacob's simple-soft-float library is growing
[Power FP compatibility](http://bugs.libre-riscv.org/show_bug.cgi?id=258)
and Python bindings.
* A conference call with OpenPOWER Foundation Director, Hugh, and Timothy
Pearson from RaptorCS has been established, taking place every two weeks.
* The OpenPOWER Foundation is also running some open
["Virtual Coffee"](https://openpowerfoundation.org/openpower-virtual-coffee-calls/)
weekly round-table calls for anyone interested, generally, in OpenPOWER
development.
* Tim sponsors our team with access to a Monster TALOS II system with a
whopping 128 GB RAM. htop lists a staggering 72 cores (18 real,
with 4-way hyperthreading).
* [Epic MegaGrants](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005262.html)
reached out (hello!) to say they're still considering our
request.
* A marathon 3-hour session with [NLNet](http://nlnet.nl) resulted
in the completion of the
[Milestone tasks list(s)](http://bugs.libre-riscv.org/buglist.cgi?component=Milestones&list_id=567&resolution=---)
and a
[boat-load](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/thread.html)
of bug reports to the list.
* Immanuel Yehowshua is participating in the Georgia Tech
[Create-X](https://create-x.gatech.edu/) Programme, and is establishing
a Public Benefit Corporation in Atlanta, as an ethical vehicle for VC
Funding.
* A [Load/Store Buffer](http://bugs.libre-riscv.org/show_bug.cgi?id=216)
design and
[further discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=257),
including on
[comp.arch](https://groups.google.com/forum/#!topic/comp.arch/cbGAlcCjiZE),
inspired an additional writeup on the
[6600 scoreboard](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/)
page.
* [Public-Inbox](http://bugs.libre-riscv.org/show_bug.cgi?id=181) was
installed successfully on the server, which is in the process of
moving to a [new domain name](http://bugs.libre-riscv.org/show_bug.cgi?id=182):
[Libre-SOC](http://libre-soc.org).
* Build servers have been set up, with
[automated testing](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005364.html)
being established.

Well dang, as you can see, suddenly it just went ballistic. There are
almost certainly things left off the list. For such a small team there's
a heck of a lot going on. We have an awful lot to do, in a short amount
of time: the 180nm tape-out is in October 2020 - only 7 months away.

With this update we're doing something slightly different: a request
has gone out [to the other team members](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005428.html)
to say a little bit about what each of them is doing. This also helps me,
because these updates do take quite a bit of time to write.

# NLNet Funding announcement

An announcement went out
[last year](https://lists.gnu.org/archive/html/libreplanet-discuss/2019-09/msg00170.html)
that we'd applied for funding, and we got some great responses and
feedback (such as "don't use patented AXI4"). The second time, we
sent out a "we got it!" message and got some really nice private and
public replies, as well as requests from people to join the team.
More on that when it happens.

# Coriolis2 experimentation started

TODO by Jock http://bugs.libre-riscv.org/show_bug.cgi?id=217#c44

# POWER ISA decoder and Simulator

TODO

# simple-soft-float Library and POWER FP emulation

The
[simple-soft-float](https://salsa.debian.org/Kazan-team/simple-soft-float)
library is a floating-point library Jacob wrote with the intention
of being a reference implementation of IEEE 754 for hardware testing
purposes. It is specifically written to be easy to
understand, instead of having the code obscured in pursuit of speed:

* Being easier to understand helps prevent bugs where the code does not
match the IEEE spec.
* It uses the [algebraics](https://salsa.debian.org/Kazan-team/algebraics)
library that Jacob wrote, since that allows using numbers that behave
like exact real numbers, making reasoning about the code simpler.
* It is written in Rust rather than highly-macro-ified C: this helps with
readability, since operations aren't obscured, and with safety, since Rust
proves at compile time that the code won't seg-fault unless you specifically
opt out of those guarantees by using `unsafe`.

It currently supports 16-, 32-, 64- and 128-bit FP for RISC-V, along with
having a `DynamicFloat` type which allows dynamically specifying all
aspects of how a particular floating-point type behaves -- if one wanted,
one could configure it as a 2048-bit floating-point type.

It also has Python bindings, thanks to the awesome
[PyO3](https://pyo3.rs/) library for writing Python bindings in Rust.

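The concept behind `DynamicFloat` can be sketched in a few lines of Python (purely an illustration of the idea, not simple-soft-float's actual API): a floating-point format is fully described by a handful of parameters, so it can be plain data rather than a hard-coded type.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FloatProperties:
    """Parameters that fully describe an IEEE 754-style binary format."""
    exponent_bits: int
    mantissa_bits: int  # stored fraction bits, excluding the implicit leading 1

    @property
    def width(self) -> int:
        # 1 sign bit + exponent field + stored fraction field
        return 1 + self.exponent_bits + self.mantissa_bits

    @property
    def exponent_bias(self) -> int:
        return (1 << (self.exponent_bits - 1)) - 1

# The standard IEEE 754 formats, expressed as data
F16 = FloatProperties(5, 10)
F32 = FloatProperties(8, 23)
F64 = FloatProperties(11, 52)
F128 = FloatProperties(15, 112)

# Nothing stops us describing an absurdly wide format the same way
F2048 = FloatProperties(31, 2016)
```

With the format held as data, a 2048-bit type is no harder to describe than a 32-bit one.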
We decided to write simple-soft-float instead
of extending the industry-standard [Berkeley
softfloat](http://www.jhauser.us/arithmetic/SoftFloat.html) library
because of a range of issues, including not supporting Power FP, requiring
recompilation to switch which ISA is being emulated, not supporting
all the required operations, architectural issues such as depending on
global variables, etc. We are still testing simple-soft-float against
Berkeley softfloat where we can, however, since Berkeley softfloat is
widely used and highly likely to be correct.

simple-soft-float is [gaining support for Power
FP](http://bugs.libre-riscv.org/show_bug.cgi?id=258), which requires
rewriting a lot of the status-flag handling code, since Power supports a
much larger set of floating-point status flags and exceptions than most
other ISAs.

Thanks to RaptorCS for giving us remote access to a Power9 system,
since that makes it much easier to verify that the test cases are correct
(more on this below).

API docs for stable releases of both
[simple-soft-float](https://docs.rs/simple-soft-float) and
[algebraics](https://docs.rs/algebraics) are available on docs.rs.

One of the really important things about these libraries: they're not
coded exclusively for Libre-SOC. Like softfloat-3 itself
(and also like the [IEEE754 FPU](https://git.libre-riscv.org/?p=ieee754fpu.git)),
they're intended for *general-purpose* use by other projects. These are
exactly the kinds of side-benefits for the wider Libre community that
sponsorship from individuals, Foundations (such as NLNet) and Companies
(such as Purism and Raptor CS) brings.

# OpenPOWER Conference calls

TODO

# OpenPOWER Virtual Coffee Meetings

The "Virtual Coffee Meetings", announced
[here](https://openpowerfoundation.org/openpower-virtual-coffee-calls/),
are literally open to anyone interested in OpenPOWER (if you're strictly
Libre there's a dial-in method). These calls are not recorded; it's
just an informal conversation.

A really nice surprise was finding
out that Paul Mackerras, whom I used to work with 20 years ago, is *also*
working on OpenPOWER, specifically
[microwatt](https://github.com/antonblanchard/microwatt), which is managed
by Anton Blanchard.

A brief discussion led to learning that Paul is looking at adding TLB
(Virtual Memory) support to microwatt, specifically the RADIX TLB.
I therefore pointed him at the same resource
[(power-gem5)](https://github.com/power-gem5/gem5/tree/gem5-experimental)
that Hugh had kindly pointed me at the week before, and did a
[late-night write-up](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005445.html).

My feeling is that these weekly round-table meetings are going to be
really important for everyone involved in OpenPOWER. It's a community:
we help each other.

# Sponsorship by RaptorCS with a TALOS II Workstation

TODO http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005291.html

# Epic Megagrants

TODO

# NLNet Milestone tasks

Part of applying for NLNet's Grants is a requirement to create a list
of tasks, each of which is assigned a budget. On 100% completion of a task,
donations can be sent out. With *six* new proposals accepted, each of which
required between five (minimum) and *nineteen* separate and distinct tasks,
a call with Michiel and Joost turned into an unexpected three-hour online
marathon, scrambling to write almost fifty bug reports as part of the Schedule
to be attached to each Memorandum of Understanding. The mailing list
got a [leeetle bit busy](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005003.html)
right around here.

This emphasised for us the need to subdivide the mailing list into
separate lists (below).

# Georgia Tech CREATE-X

TODO

# LOAD/STORE Buffer and 6600 design documentation

A critical part of this project is not just to create a chip: it's to
*document* the chip design and the decisions made along the way, for
educational, research, and ongoing maintenance purposes. With an
augmented CDC 6600 design being chosen as the fundamental basis,
[documenting that](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/)
as well as the key differences is particularly important. The timing-critical
circular loops of the 6600 are extremely simple and highly effective in
hardware, yet James Thornton (the co-designer of the 6600) recognised that
they are paradoxically challenging: it is hard to understand why so few
gates can be so effective. Consequently,
documenting the design just to be able to *develop* it is extremely important.

We're getting to the point where we need to connect the LOAD/STORE Computation
Units up to an actual memory architecture. We've chosen
[minerva](https://github.com/lambdaconcept/minerva/blob/master/minerva/units/loadstore.py)
as the basis because it is written in nmigen, works, and, crucially, uses
wishbone (which we decided to use as the main Bus Backbone a few months ago).

However, minerva is a single-issue 32-bit embedded chip, where it's
perfectly ok to have one single LD/ST operation per clock, and even for
that operation to take a few clock cycles. To get anything like the
level of performance needed of a GPU, we need at least four 64-bit
LOADs or STOREs *every clock cycle*.

For a first ASIC from a team that's never done a chip before, this is,
officially, "Bonkers Territory". Where minerva is doing 32-bit-wide
Buses (and does not support 64-bit LD/ST at all), we need internal
data buses a minimum of a whopping **2000** wires wide.

Let that sink in for a moment.

The reason why the internal buses need to be 2000 wires wide comes down
to the fact that we need, realistically, six to eight LOAD/STORE Computation
Units. Four of them will be operational; two to four will be waiting
with pending instructions from the multi-issue Vectorisation Engine.

We chose to use a system which expands the first 4 bits of the address,
plus the operation width (1, 2, 4 or 8 bytes), into a "bitmap" - a byte-mask -
that corresponds directly with the 16-byte "cache line" byte-enable
columns in the L1 Cache. These bitmaps can then be "merged", such
that requests that go to the same cache line can be served *in the
same clock cycle* to multiple LOAD/STORE Computation Units. This
is absolutely critical for effective Vector Processing.

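The expand-and-merge idea can be sketched in Python (a conceptual model only, with hypothetical helper names - not our actual HDL):

```python
CACHE_LINE = 16  # bytes per L1 cache line

def byte_mask(addr: int, width: int) -> int:
    """Expand the low 4 bits of an address plus an operation width
    (1, 2, 4 or 8 bytes) into a 16-bit byte-enable mask."""
    offset = addr & (CACHE_LINE - 1)   # the first 4 bits of the address
    assert offset + width <= CACHE_LINE, "crosses a line: needs a second request"
    return ((1 << width) - 1) << offset

def can_merge(masks) -> bool:
    """Requests to the same cache line can share a clock cycle
    as long as their byte-masks don't overlap."""
    combined = 0
    for m in masks:
        if combined & m:
            return False
        combined |= m
    return True

# two 4-byte LOADs landing in different parts of the same line: mergeable
m0 = byte_mask(0x1000, 4)   # enables bytes 0-3
m1 = byte_mask(0x1008, 4)   # enables bytes 8-11
```

Non-overlapping masks OR together cleanly, which is what allows several Computation Units to be served from one cache line in a single cycle.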
Additionally, in order to deal with misaligned memory requests, each of those
Units needs to put out *two* such 16-byte-wide requests (see where this is
going?) to the L1 Cache.
So, we now have eight times two times 128 bits, which is a staggering
2048 wires *just for the data*. There do exist ways to get that down
(potentially to half), and there do exist ways to get that cut in half
again; however, doing so would miss opportunities for merging of requests
into cache lines.

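A quick sketch (illustrative Python with a hypothetical split helper) of why a misaligned access needs two line-sized requests, and where the 2048-wire figure comes from:

```python
CACHE_LINE = 16  # bytes per L1 cache line

def split_misaligned(addr: int, width: int):
    """Split an access into one or two per-cache-line (offset, nbytes) pieces."""
    offset = addr & (CACHE_LINE - 1)
    first = min(width, CACHE_LINE - offset)
    pieces = [(offset, first)]
    if first < width:                  # the access spills into the next line
        pieces.append((0, width - first))
    return pieces

# an 8-byte LOAD starting 12 bytes into a line spans two cache lines
two_pieces = split_misaligned(0x100C, 8)

# the wire count from the text: 8 units x 2 line-requests x 128 data bits
units, requests_per_unit, bits_per_request = 8, 2, CACHE_LINE * 8
total_wires = units * requests_per_unit * bits_per_request
```

Every unit must be wired for the worst case (two full line-width requests), which is why the total lands at 2048 even though many cycles won't use it all.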
At that point, thanks to Mitch Alsup's input (Mitch is the architect of
the Motorola 88120, was a key architect on AMD's Opteron
Series and the AMD K9, and worked on AMDGPU and Samsung's latest GPU),
we learned that
L1 cache design critically depends on what type of SRAM you have. We
initially, naively, wanted dual-ported L1 SRAM, and that's when Staf
and Mitch taught us that this results in a half-duty rate. Only
1-Read **or** 1-Write SRAM cells give you fast enough (single-cycle)
data rates to be usable for L1 Caches.

Part of the conversation has wandered into
[why we chose dynamic pipelines](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005459.html)
as well as receiving that
[important advice](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005354.html)
from both Mitch Alsup and Staf Verhaegen.

(Staf is also [sponsored by NLNet](https://nlnet.nl/project/Chips4Makers/)
to create Libre-licensed Cell Libraries, busting through one of the -
many - layers of NDAs and reducing NREs for ASIC development: I helped him
put in the submission, and he was really happy to do the Cell Libraries
that we will be using for Libre-SOC's 180nm test tape-out in October 2020.)

# Public-Inbox and Domain Migration

As mentioned before, one of the important aspects of this project is
the documentation and archiving. It also turns out that when working
over an extremely unreliable or ultra-expensive mobile broadband link,
having *local* (offline) access to every available development resource
is critically important.

Hence why we are going to the trouble of installing public-inbox: not
only is the mailing list stored entirely in a git repository, the "web
service" which provides access to that git-backed archive can not only
be mirrored elsewhere, it can be *run locally on your own offline
machine*. This, in combination with the right mailer setup, can
store-and-forward any replies to the (offline-copied) messages.

Now you know why we absolutely do not accept "slack" or other proprietary
"online oh-so-convenient" services. Not only are they highly inappropriate for
Libre Projects; not only do we become critically dependent on the Corporation
running the service (yes, github has been entirely offline, several times):
if we have remote developers (such as myself, working from Scotland last
month with sporadic access to a single Cell Tower), or developers in emerging
markets where their only internet access is via a Library or Internet Cafe,
we absolutely do not want to exclude or penalise such people just because
they have fewer resources.

Fascinatingly, Linus Torvalds is *specifically*
[on record](https://www.linuxjournal.com/content/line-length-limits)
about making sure that "Linux development does not favour wealthy people".

TODO (Veera?) bit about what was actually done, how it links into mailman2.

# OpenPOWER HDL Mailing List opens up

It is early days; however, it is fantastic to see responses from IBM with
regards to requests for access to the POWER ISA Specification
documents in
[machine-readable form](http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-March/000007.html).
I took Jeff at his word and explained, in some detail,
[exactly why](http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-March/000008.html)
machine-readable versions of specifications are critically important.

The takeaway is: *we haven't got time to do manual transliteration of the spec*
into "code". We're expending considerable effort making sure that we
"bounce" or "bootstrap" off of pre-existing resources, using computer
programs to do so.

This "trick" is something that I learned over 20 years ago, when developing
an SMB Client and Server in something like two weeks flat. I wrote a
parser which read the packet formats *from the IETF Draft Specification*,
and output C code.

This leaves me wondering, as I mention on the HDL list, if we can do the same
thing with large sections of the POWER Spec.

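As a tiny sketch of what that could look like (entirely hypothetical spec text and helper names, not the actual POWER ISA documents), a parser can read a machine-readable field table and emit code directly:

```python
import re

# A hypothetical scrap of machine-readable spec: "field-name bit-width" lines,
# loosely modelled on a D-Form instruction layout.
SPEC = """
opcode   6
RT       5
RA       5
SI      16
"""

def parse_fields(spec: str):
    """Turn 'name width' lines into (name, width) tuples."""
    fields = []
    for line in spec.strip().splitlines():
        m = re.match(r"(\w+)\s+(\d+)", line.strip())
        if m:
            fields.append((m.group(1), int(m.group(2))))
    return fields

def generate_struct(name: str, fields) -> str:
    """Emit a C-style bitfield struct straight from the parsed table."""
    body = "\n".join(f"    unsigned {fname} : {width};" for fname, width in fields)
    return f"struct {name} {{\n{body}\n}};"

fields = parse_fields(SPEC)
code = generate_struct("d_form", fields)
```

The generator, not a human, keeps the code in sync with the spec - exactly the "bounce off pre-existing resources" trick described above.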
# Build Servers

TODO

# Conclusion

TODO