So many things have happened since the last update that they need to go into the main update, even in summary form. One big thing: [Raptor CS](https://www.raptorcs.com/) sponsored us with remote access to a monster-spec'd TALOS II Workstation!

# Introduction

Here's the summary (if it can be called a summary):

* [An announcement](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/004995.html) that we got the funding (which is open to anyone - hint, hint) resulted in at least three people reaching out to join the team. "We don't need permission to own our own hardware" got a *really* positive reaction.
* New team member Jock (hello, Jock!) has started on the coriolis2 layout, after Jean-Paul from LIP6.fr helped to dramatically improve how coriolis2 can be used. This resulted in a [tutorial](https://libre-riscv.org/HDL_workflow/coriolis2/) and a [huge bug report discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=178).
* Work has started on the [POWER ISA decoder](http://bugs.libre-riscv.org/show_bug.cgi?id=186), verified through [calling GNU AS](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/test/test_decoder_gas.py;h=9238d3878d964907c5569a3468d6895effb7dc02;hb=56d145e42ac75626423915af22d1493f1e7bb143) (yes, really!), and on a mini-simulator [calling QEMU](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/qemu.py;h=9eb103bae227e00a2a1d2ec4f43d7e39e4f44960;hb=56d145e42ac75626423915af22d1493f1e7bb143) for verification.
* Jacob's simple-soft-float library is growing [Power FP compatibility](http://bugs.libre-riscv.org/show_bug.cgi?id=258) and Python bindings.
* A conference call with OpenPOWER Foundation Director Hugh and Timothy Pearson from RaptorCS has been established, every two weeks.
* The OpenPOWER Foundation is also running open ["Virtual Coffee"](https://openpowerfoundation.org/openpower-virtual-coffee-calls/) weekly round-table calls for anyone interested, generally, in OpenPOWER development.
* Tim sponsors our team with access to a monster TALOS II system with a whopping 128 GB RAM. htop lists a staggering 72 cores (18 real, with 4-way hyperthreading).
* [Epic MegaGrants](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005262.html) reached out (hello!) to say they're still considering our request.
* A marathon 3-hour session with [NLNet](http://nlnet.nl) resulted in the completion of the [Milestone task lists](http://bugs.libre-riscv.org/buglist.cgi?component=Milestones&list_id=567&resolution=---) and a [boat-load](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/thread.html) of bug reports to the list.
* Immanuel Yehowshua is participating in the Georgia Tech [Create-X](https://create-x.gatech.edu/) Programme, and is establishing a Public Benefit Corporation in Atlanta as an ethical vehicle for VC funding.
* A [Load/Store Buffer](http://bugs.libre-riscv.org/show_bug.cgi?id=216) design and [further discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=257), including on [comp.arch](https://groups.google.com/forum/#!topic/comp.arch/cbGAlcCjiZE), inspired additional write-ups on the [6600 scoreboard](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/) page.
* [Public-Inbox](http://bugs.libre-riscv.org/show_bug.cgi?id=181) was installed successfully on the server, which is in the process of moving to a [new domain name](http://bugs.libre-riscv.org/show_bug.cgi?id=182): [Libre-SOC](http://libre-soc.org).
* Build servers have been set up, with [automated testing](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005364.html) being established.

Well dang, as you can see, suddenly it just went ballistic. There are almost certainly things left off the list: for such a small team there's a heck of a lot going on. We have an awful lot to do in a short amount of time: the 180nm tape-out is in October 2020 - only 7 months away.
With this update we're doing something slightly different: a request went out [to the other team members](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005428.html) asking each of them to say a little about what they're doing. This also helps me, because these updates take quite a bit of time to write.

# NLNet Funding announcement

An announcement went out [last year](https://lists.gnu.org/archive/html/libreplanet-discuss/2019-09/msg00170.html) that we'd applied for funding, and we got some great responses and feedback (such as "don't use patented AXI4"). The second time, we sent out a "we got it!" message and received some really nice private and public replies, as well as requests from people to join the team. More on that when it happens.

# Coriolis2 experimentation started

TODO by Jock

http://bugs.libre-riscv.org/show_bug.cgi?id=217#c44

# POWER ISA decoder and Simulator

TODO

# simple-soft-float Library and POWER FP emulation

The [simple-soft-float](https://salsa.debian.org/Kazan-team/simple-soft-float) library is a floating-point library Jacob wrote with the intention of it being a reference implementation of IEEE 754 for hardware testing purposes. It is deliberately written to be easy to understand, rather than having the code obscured in pursuit of speed:

* Being easier to understand helps prevent bugs where the code does not match the IEEE 754 spec.
* It uses the [algebraics](https://salsa.debian.org/Kazan-team/algebraics) library that Jacob wrote, which provides numbers that behave like exact real numbers, making reasoning about the code simpler.
* It is written in Rust rather than highly-macro-ified C. This helps with readability, since operations aren't obscured, and with safety, since Rust proves at compile time that the code won't seg-fault unless you specifically opt out of those guarantees with `unsafe`.
It currently supports 16, 32, 64 and 128-bit FP for RISC-V, along with a `DynamicFloat` type which allows dynamically specifying all aspects of how a particular floating-point type behaves -- if you wanted, you could configure it as a 2048-bit floating-point type. It also has Python bindings, thanks to the awesome [PyO3](https://pyo3.rs/) library for writing Python bindings in Rust.

We decided to write simple-soft-float instead of extending the industry-standard [Berkeley softfloat](http://www.jhauser.us/arithmetic/SoftFloat.html) library because of a range of issues: it does not support Power FP, it requires recompilation to switch which ISA is being emulated, it does not support all the required operations, and it has architectural issues such as depending on global variables. We are still testing simple-soft-float against Berkeley softfloat where we can, however, since Berkeley softfloat is widely used and highly likely to be correct.

simple-soft-float is [gaining support for Power FP](http://bugs.libre-riscv.org/show_bug.cgi?id=258), which requires rewriting a lot of the status-flag handling code, since Power supports a much larger set of floating-point status flags and exceptions than most other ISAs. Thanks to RaptorCS for giving us remote access to a Power9 system, since that makes it much easier to verify that the test cases are correct (more on this below).

API docs for stable releases of both [simple-soft-float](https://docs.rs/simple-soft-float) and [algebraics](https://docs.rs/algebraics) are available on docs.rs.

One of the really important things about these libraries: they're not coded exclusively for Libre-SOC. Like softfloat-3 itself (and like the [IEEE754 FPU](https://git.libre-riscv.org/?p=ieee754fpu.git)), they're intended for *general-purpose* use by other projects.
These are exactly the kinds of side-benefits for the wider Libre community that sponsorship - from individuals, from Foundations (such as NLNet), and from Companies (such as Purism and Raptor CS) - brings.

# OpenPOWER Conference calls

TODO

# OpenPOWER Virtual Coffee Meetings

The "Virtual Coffee Meetings", announced [here](https://openpowerfoundation.org/openpower-virtual-coffee-calls/), are literally open to anyone interested in OpenPOWER (if you're strictly Libre, there's a dial-in method). These calls are not recorded; it's just an informal conversation.

What was a really nice surprise was finding out that Paul Mackerras, whom I used to work with 20 years ago, is *also* working on OpenPOWER - specifically [microwatt](https://github.com/antonblanchard/microwatt), managed by Anton Blanchard. A brief discussion revealed that Paul is looking at adding TLB (Virtual Memory) support to microwatt, specifically the RADIX TLB. I therefore pointed him at the same resource [(power-gem5)](https://github.com/power-gem5/gem5/tree/gem5-experimental) that Hugh had kindly pointed me at the week before, and did a [late-night write-up](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005445.html).

My feeling is that these weekly round-table meetings are going to be really important for everyone involved in OpenPOWER. It's a community: we help each other.

# Sponsorship by RaptorCS with a TALOS II Workstation

TODO

http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005291.html

# Epic Megagrants

TODO

# NLNet Milestone tasks

TODO

# Georgia Tech CREATE-X

TODO

# LOAD/STORE Buffer and 6600 design documentation

A critical part of this project is not just to create a chip: it's to *document* the chip design and the decisions made along the way, for educational, research, and ongoing maintenance purposes.
With an augmented CDC 6600 design chosen as the fundamental basis, [documenting it](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/), along with the key differences, is particularly important. James Thornton, co-designer of the 6600, recognised that the circular loops at the heart of its design - extremely simple and highly effective hardware, but timing-critical - are paradoxically challenging: it is hard to understand why so few gates can be so effective. Consequently, documenting the design just to be able to *develop* it is extremely important.

We're getting to the point where we need to connect the LOAD/STORE Computation Units up to an actual memory architecture. We've chosen [minerva](https://github.com/lambdaconcept/minerva/blob/master/minerva/units/loadstore.py) as the basis because it is written in nmigen, it works, and, crucially, it uses wishbone (which we decided to use as the main Bus Backbone a few months ago).

However, minerva is a single-issue 32-bit embedded chip, where it's perfectly OK to have one single LD/ST operation per clock - and for that operation to take a few clock cycles. To get anything like the level of performance needed of a GPU, we need at least four 64-bit LOADs or STOREs *every clock cycle*. For a first ASIC from a team that's never done a chip before, this is, officially, "Bonkers Territory". Where minerva has 32-bit-wide buses (and does not support 64-bit LD/ST at all), we need internal data buses a minimum of a whopping **2000** wires wide. Let that sink in for a moment.

The reason the internal buses need to be 2000 wires wide comes down to the fact that we need, realistically, six to eight LOAD/STORE Computation Units: four of them operational, and two to four waiting with pending instructions from the multi-issue Vectorisation Engine.
We chose a system which expands the lowest 4 bits of the address, plus the operation width (1, 2, 4 or 8 bytes), into a "bitmap" - a byte-mask - that corresponds directly with the 16-byte "cache line" byte-enable columns in the L1 Cache. These bitmaps can then be "merged", such that requests going to the same cache line can be served *in the same clock cycle* to multiple LOAD/STORE Computation Units. This is absolutely critical for effective Vector Processing.

Additionally, in order to deal with misaligned memory requests, each of those units needs to put out *two* such 16-byte-wide requests (see where this is going?) to the L1 Cache. So we now have eight times two times 128 bits, which is a staggering 2048 wires *just for the data*. There do exist ways to get that down (potentially to half), and ways to cut it in half again; however, doing so would miss opportunities for merging requests into cache lines.

At that point, thanks to input from Mitch Alsup (Mitch is the designer of the Motorola 68000, the Motorola 88120, key architecture on AMD's Opteron series, the AMD K9, AMDGPU, and Samsung's latest GPU), we learned that L1 cache design critically depends on what type of SRAM you have. We initially, naively, wanted dual-ported L1 SRAM, and that's when Staf and Mitch taught us that this results in a half-duty rate. Only 1-Read **or** 1-Write SRAM cells give you fast enough (single-cycle) data rates to be usable for L1 Caches.

Part of the conversation wandered into [why we chose dynamic pipelines](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005459.html), as well as receiving that [important advice](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005354.html) from both Mitch Alsup and Staf Verhaegen.
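The expand-and-merge scheme described above can be sketched in a few lines of Python. This is a behavioural model only - the function names are illustrative, not the actual soc.git code:

```python
def ld_st_bytemask(addr: int, width: int) -> tuple[int, int]:
    """Expand the lowest 4 bits of an address, plus an operation width
    (1, 2, 4 or 8 bytes), into byte-masks over two consecutive 16-byte
    cache lines.  The second mask is non-zero only when a misaligned
    access straddles a cache-line boundary - hence *two* requests per
    LOAD/STORE Computation Unit."""
    offset = addr & 0xF                   # byte position within the line
    mask = ((1 << width) - 1) << offset   # up to 23 bits before splitting
    return mask & 0xFFFF, mask >> 16      # (this line, next line)

def can_merge(masks: list[int]) -> bool:
    """Requests to the same cache line can be served in the same clock
    cycle as long as their byte-enable masks do not overlap."""
    combined = 0
    for m in masks:
        if combined & m:
            return False                  # byte conflict: separate cycles
        combined |= m
    return True
```

An 8-byte load at offset 12, for instance, produces `(0xF000, 0x000F)`: four bytes in this cache line and four in the next, which is exactly why each unit has to drive two 16-byte-wide requests.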
(Staf is also [sponsored by NLNet](https://nlnet.nl/project/Chips4Makers/) to create Libre-licensed Cell Libraries, busting through one of the - many - layers of NDAs and reducing NREs for ASIC development. I helped him put in the submission, and he was really happy to do the Cell Libraries that we will be using for Libre-SOC's 180nm test tape-out in October 2020.)

# Public-Inbox and Domain Migration

TODO

# Build Servers

TODO

# Conclusion

TODO