So many things have happened since the last update that they need to go
in the main update, even if only in summary form. One big thing: Raptor Engineering
sponsored us with remote access to a TALOS II Workstation!

# Introduction

Here's the summary (if it can be called a summary):

* [An announcement](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/004995.html)
  that we got the funding (which is open to anyone - hint, hint) resulted in
  at least three people reaching out to join the team. "We don't need
  permission to own our own hardware" got a *really* positive reaction.
* New team member Jock (hello Jock!) has started on the coriolis2 layout,
  after Jean-Paul from LIP6.fr helped to dramatically improve how coriolis2
  can be used. This resulted in a
  [tutorial](https://libre-riscv.org/HDL_workflow/coriolis2/) and a
  [huge bug report discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=178).
* Work has started on the
  [POWER ISA decoder](http://bugs.libre-riscv.org/show_bug.cgi?id=186),
  verified through
  [calling GNU AS](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/test/test_decoder_gas.py;h=9238d3878d964907c5569a3468d6895effb7dc02;hb=56d145e42ac75626423915af22d1493f1e7bb143) (yes, really!)
  and on a mini-simulator
  [calling QEMU](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/qemu.py;h=9eb103bae227e00a2a1d2ec4f43d7e39e4f44960;hb=56d145e42ac75626423915af22d1493f1e7bb143)
  for verification.
* Jacob's simple-soft-float library is growing
  [Power FP compatibility](http://bugs.libre-riscv.org/show_bug.cgi?id=258)
  and Python bindings.
* A conference call with OpenPOWER Foundation Director, Hugh, and Timothy
  Pearson from RaptorCS now takes place every two weeks.
* The OpenPOWER Foundation is also running some open
  ["Virtual Coffee"](https://openpowerfoundation.org/openpower-virtual-coffee-calls/)
  weekly round-table calls for anyone interested, generally, in OpenPOWER
  development.
* Tim sponsors our team with access to a Monster Talos II system with a
  whopping 128 GB RAM. htop lists a staggering 72 cores (18 real cores,
  each with 4-way SMT).
* [Epic MegaGrants](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005262.html)
  reached out (hello!) to say they're still considering our
  request.
* A marathon 3-hour session with [NLNet](http://nlnet.nl) resulted
  in the completion of the
  [Milestone tasks list(s)](http://bugs.libre-riscv.org/buglist.cgi?component=Milestones&list_id=567&resolution=---)
  and a
  [boat-load](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/thread.html)
  of bug reports to the list.
* Immanuel Yehowshua is participating in the Georgia Tech
  [Create-X](https://create-x.gatech.edu/) Programme, and is establishing
  a Public Benefit Corporation in Atlanta, as an ethical vehicle for VC
  Funding.
* A [Load/Store Buffer](http://bugs.libre-riscv.org/show_bug.cgi?id=216)
  design and
  [further discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=257),
  including on
  [comp.arch](https://groups.google.com/forum/#!topic/comp.arch/cbGAlcCjiZE),
  inspired an additional writeup on the
  [6600 scoreboard](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/)
  page.
* [Public-Inbox](http://bugs.libre-riscv.org/show_bug.cgi?id=181) was
  installed successfully on the server, which is in the process of
  moving to a [new domain name](http://bugs.libre-riscv.org/show_bug.cgi?id=182),
  [Libre-SOC](http://libre-soc.org).
* Build Servers have been set up, with
  [automated testing](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005364.html)
  being established.

Well dang, as you can see, suddenly it just went ballistic. There are
almost certainly things left off the list. For such a small team there's
a heck of a lot going on. We have an awful lot to do, in a short amount
of time: the 180nm tape-out is in October 2020 - only 7 months away.

With this update we're doing something slightly different: a request
has gone out [to the other team members](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005428.html)
to say a little bit about what each of them is doing. This also helps me
because these updates do take quite a bit of time to write.

# NLNet Funding announcement

An announcement went out
[last year](https://lists.gnu.org/archive/html/libreplanet-discuss/2019-09/msg00170.html)
that we'd applied for funding, and we got some great responses and
feedback (such as "don't use patented AXI4"). The second time, we
sent out a "we got it!" message and got some really nice private and
public replies, as well as requests from people to join the team.
More on that when it happens.

# Coriolis2 experimentation started

TODO by Jock http://bugs.libre-riscv.org/show_bug.cgi?id=217#c44

# POWER ISA decoder and Simulator

TODO

# simple-soft-float Library and POWER FP emulation

The
[simple-soft-float](https://salsa.debian.org/Kazan-team/simple-soft-float)
library is a floating-point library Jacob wrote with the intention
of being a reference implementation of IEEE 754 for hardware testing
purposes. It is specifically written to be easy to
understand, rather than having the code obscured in pursuit of speed:

* Being easier to understand helps prevent bugs where the code does not
  match the IEEE spec.
* It uses the [algebraics](https://salsa.debian.org/Kazan-team/algebraics)
  library that Jacob wrote, since that allows using numbers that behave
  like exact real numbers, making reasoning about the code simpler.
* It is written in Rust rather than highly-macro-ified C. That helps with
  readability, since operations aren't obscured, as well as safety, since Rust
  proves at compile time that the code won't seg-fault unless you specifically
  opt out of those guarantees by using `unsafe`.

It currently supports 16, 32, 64 and 128-bit FP for RISC-V, along with
a `DynamicFloat` type which allows dynamically specifying all
aspects of how a particular floating-point type behaves -- if one wanted,
they could configure it as a 2048-bit floating-point type.

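To give a feel for what "dynamically specifying" a format means, here is a minimal Python sketch that decodes an IEEE 754-style bit pattern, for any exponent and mantissa width, into an exact rational number. This is not simple-soft-float's actual API and does not use its Python bindings: the function name `decode_float` and its parameters are made up for this illustration, and Python's `fractions.Fraction` merely stands in for the kind of exact arithmetic that algebraics provides.

```python
from fractions import Fraction


def decode_float(bits: int, exponent_width: int, mantissa_width: int):
    """Decode an IEEE 754-style binary encoding into an exact value.

    Returns a Fraction for finite values, or one of the strings
    "inf", "-inf" or "nan". The sign of zero is not preserved.
    """
    total_width = 1 + exponent_width + mantissa_width
    if not 0 <= bits < (1 << total_width):
        raise ValueError("bit pattern does not fit the format")

    # Split the pattern into sign, biased exponent and mantissa fields.
    mantissa = bits & ((1 << mantissa_width) - 1)
    exponent = (bits >> mantissa_width) & ((1 << exponent_width) - 1)
    sign = -1 if bits >> (total_width - 1) else 1

    bias = (1 << (exponent_width - 1)) - 1
    max_exponent = (1 << exponent_width) - 1

    if exponent == max_exponent:
        # All-ones exponent: infinity if the mantissa is zero, else NaN.
        if mantissa:
            return "nan"
        return "inf" if sign > 0 else "-inf"

    fraction = Fraction(mantissa, 1 << mantissa_width)
    if exponent == 0:
        # Subnormal (or zero): no implicit leading 1, fixed exponent.
        significand = fraction
        scale = 1 - bias
    else:
        # Normal number: implicit leading 1.
        significand = 1 + fraction
        scale = exponent - bias

    return sign * significand * Fraction(2) ** scale


# The same function handles 16-bit and 32-bit formats unchanged.
half_one = decode_float(0x3C00, 5, 10)
single_one_and_a_half = decode_float(0x3FC00000, 8, 23)
```

The widths are ordinary run-time arguments, so nothing in the code changes when they are set to something far larger than any hardware format. A real reference implementation also has to handle rounding and status flags, none of which is attempted here.
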
It also has Python bindings, thanks to the awesome
[PyO3](https://pyo3.rs/) library for writing Python bindings in Rust.

We decided to write simple-soft-float instead
of extending the industry-standard [Berkeley
softfloat](http://www.jhauser.us/arithmetic/SoftFloat.html) library
because of a range of issues: it does not support Power FP, it requires
recompilation to switch which ISA is being emulated, it does not support
all the required operations, and it has architectural issues such as
depending on global variables. We are still testing simple-soft-float against
Berkeley softfloat where we can, however, since Berkeley softfloat is
widely used and highly likely to be correct.

simple-soft-float is [gaining support for Power
FP](http://bugs.libre-riscv.org/show_bug.cgi?id=258), which requires
rewriting a lot of the status-flag handling code, since Power supports a
much larger set of floating-point status flags and exceptions than most
other ISAs.

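As a rough illustration of why the flag handling needs rewriting, the sketch below contrasts the five IEEE 754 exception flags with a Power-style set in which "invalid operation" is broken out into separate cause bits. The cause-bit names follow the FPSCR field names as we understand them from the Power ISA specification, so please check them against the spec before relying on them. The `IEEEFlags` and `PowerFlags` classes and the `to_ieee` helper are hypothetical and are not simple-soft-float's types.

```python
from enum import Flag, auto


class IEEEFlags(Flag):
    """The five exception flags defined by IEEE 754."""
    INVALID = auto()
    DIV_BY_ZERO = auto()
    OVERFLOW = auto()
    UNDERFLOW = auto()
    INEXACT = auto()


class PowerFlags(Flag):
    """A subset of Power FPSCR exception bits (names assumed from the spec)."""
    OX = auto()      # overflow
    UX = auto()      # underflow
    ZX = auto()      # divide by zero
    XX = auto()      # inexact
    VXSNAN = auto()  # invalid: signalling NaN operand
    VXISI = auto()   # invalid: infinity - infinity
    VXIDI = auto()   # invalid: infinity / infinity
    VXZDZ = auto()   # invalid: zero / zero
    VXIMZ = auto()   # invalid: infinity * zero
    VXVC = auto()    # invalid: compare
    VXSQRT = auto()  # invalid: square root
    VXCVI = auto()   # invalid: integer conversion


# Every cause bit that collapses into the single IEEE "invalid" flag.
INVALID_CAUSES = (
    PowerFlags.VXSNAN | PowerFlags.VXISI | PowerFlags.VXIDI
    | PowerFlags.VXZDZ | PowerFlags.VXIMZ | PowerFlags.VXVC
    | PowerFlags.VXSQRT | PowerFlags.VXCVI
)

# Flags that map one-to-one onto an IEEE flag.
ONE_TO_ONE = {
    PowerFlags.OX: IEEEFlags.OVERFLOW,
    PowerFlags.UX: IEEEFlags.UNDERFLOW,
    PowerFlags.ZX: IEEEFlags.DIV_BY_ZERO,
    PowerFlags.XX: IEEEFlags.INEXACT,
}


def to_ieee(flags: PowerFlags) -> IEEEFlags:
    """Collapse a Power-style flag set down to the five IEEE flags."""
    result = IEEEFlags(0)
    if flags & INVALID_CAUSES:
        result |= IEEEFlags.INVALID
    for power_flag, ieee_flag in ONE_TO_ONE.items():
        if flags & power_flag:
            result |= ieee_flag
    return result
```

The mapping only works in one direction: the five IEEE flags alone cannot say which cause bit to set, so an emulator has to record the cause at the point where each operation detects it.
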
Thanks to RaptorCS for giving us remote access to a Power9 system,
since that makes it much easier to verify that the test cases are correct.

API Docs for stable releases of both
[simple-soft-float](https://docs.rs/simple-soft-float) and
[algebraics](https://docs.rs/algebraics) are available on docs.rs.

One of the really important things about these libraries: they're not
coded exclusively for Libre-SOC. Like softfloat-3 itself
(and also like the [IEEE754 FPU](https://git.libre-riscv.org/?p=ieee754fpu.git))
they're intended for *general-purpose* use by other projects. These are
exactly the kinds of side-benefits for the wider Libre community that
sponsorship from individuals, Foundations (such as NLNet) and Companies
(such as Purism and Raptor Engineering) brings.

# OpenPOWER Conference calls

TODO

# OpenPOWER Virtual Coffee Meetings

The "Virtual Coffee Meetings", announced
[here](https://openpowerfoundation.org/openpower-virtual-coffee-calls/),
are literally open to anyone interested in OpenPOWER (if you're strictly
Libre there's a dial-in method). These calls are not recorded: it's
just an informal conversation.

What's a really nice surprise is finding
out that Paul Mackerras, whom I used to work with 20 years ago, is *also*
working on OpenPOWER, specifically on
[microwatt](https://github.com/antonblanchard/microwatt), which is being
managed by Anton Blanchard.

A brief discussion led to learning that Paul is looking at adding TLB
(Virtual Memory) support to microwatt, specifically the RADIX TLB.
I therefore pointed him at the same resource
[(power-gem5)](https://github.com/power-gem5/gem5/tree/gem5-experimental)
that Hugh had kindly pointed me at the week before, and did a
[late night write-up](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005445.html).

My feeling is that these weekly round-table meetings are going to be
really important for everyone involved in OpenPOWER. It's a community:
we help each other.

# Sponsorship by RaptorCS with a TALOS II Workstation

TODO http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005291.html

# Epic MegaGrants

TODO

# NLNet Milestone tasks

TODO

# Georgia Tech CREATE-X

TODO

# LOAD/STORE Buffer and 6600 design documentation

A critical part of this project is not just to create a chip, it's to
*document* the chip design, and the decisions along the way, for
educational, research, and ongoing maintenance purposes. With an
augmented CDC 6600 design being chosen as the fundamental basis,
[documenting that](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/)
as well as the key differences is particularly important. At the very least,
James Thornton (the co-designer of the 6600) recognised that the circular
loops in the 6600, extremely simple and highly effective in hardware yet
timing-critical, are paradoxically challenging to understand: it is hard
to see why so few gates could be so effective. Consequently,
documenting it just to be able to *develop* it is extremely important.

We're getting to the point where we need to connect the LOAD/STORE Computation
Units up to an actual memory architecture. We've chosen
[minerva](https://github.com/lambdaconcept/minerva/blob/master/minerva/units/loadstore.py)
as the basis because it is written in nmigen, works, and, crucially, uses
wishbone (which we decided to use as the main Bus Backbone a few months ago).

However, minerva is a single-issue 32-bit embedded chip,
where it's perfectly ok to have one single LD/ST operation per clock,
and not only that but to have that operation take a few clock cycles.
To get anything like the level of performance needed of a GPU, we need
at least four 64-bit LOADs or STOREs *every clock cycle*.

For a first ASIC from a team that's never done a chip before, this is,
officially, "Bonkers Territory". Where minerva is doing 32-bit-wide
Buses (and does not support 64-bit LD/ST at all), we need internal
data buses a whopping **2000** wires wide at minimum.

Let that sink in for a moment.

The reason why the internal buses need to be 2000 wires wide comes down
to the fact that we need, realistically, six to eight LOAD/STORE Computation
Units. Four of them will be operational, and two to four of them will be
waiting with pending instructions from the multi-issue Vectorisation Engine.

We chose to use a system which expands the lowest 4 bits of the address,
plus the operation width (1, 2, 4 or 8 bytes), into a "bitmap" - a byte-mask -
that corresponds directly with the 16 byte "cache line" byte enable
columns in the L1 Cache. These bitmaps can then be "merged" such
that requests that go to the same cache line can be served *in the
same clock cycle* to multiple LOAD/STORE Computation Units. This
is absolutely critical for effective Vector Processing.

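The bitmap expansion and merging can be sketched in a few lines of plain Python. This is a behavioural model only, not the nmigen hardware: the names `byte_mask` and `merge_requests` are made up for this illustration, requests are assumed to be `(address, width)` pairs, and an access that runs past the end of the 16-byte line is simply truncated here.

```python
CACHE_LINE_BYTES = 16  # one byte-enable bit per byte of the cache line


def byte_mask(address: int, width: int) -> int:
    """Expand the low 4 address bits plus the width into a 16-bit byte mask.

    Bit i of the result is set when byte i of the cache line is accessed.
    """
    if width not in (1, 2, 4, 8):
        raise ValueError("width must be 1, 2, 4 or 8 bytes")
    offset = address & (CACHE_LINE_BYTES - 1)
    mask = ((1 << width) - 1) << offset
    # Drop any bits that would fall into the next cache line.
    return mask & ((1 << CACHE_LINE_BYTES) - 1)


def merge_requests(requests):
    """Group requests by cache line, OR-ing the byte masks together.

    Returns a dict mapping cache line number to the merged byte mask:
    each entry is one cache line access serving several requests at once.
    """
    merged = {}
    for address, width in requests:
        line = address >> 4  # 16-byte lines: drop the low 4 bits
        merged[line] = merged.get(line, 0) | byte_mask(address, width)
    return merged


# Two 8-byte requests to the same line merge into one full-line access,
# while the request to a different line stays separate.
lines = merge_requests([(0x1000, 8), (0x1008, 8), (0x2000, 4)])
```

In this model the three requests collapse into two cache line accesses, which is the effect the merging is after: several LOAD/STORE Computation Units served by one line access.
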
Additionally, in order to deal with misaligned memory requests, each
LOAD/STORE Computation Unit needs to put out *two* such 16-byte-wide
requests (see where this is going?) to the L1 Cache.
So, we now have eight times two times 128 bits, which is a staggering
2048 wires *just for the data*. There do exist ways to get that down
(potentially to half), and there do exist ways to get that cut in half
again; however, doing so would miss opportunities for merging of requests
into cache lines.

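Here is a small behavioural sketch of the misaligned case in plain Python (not nmigen), together with the wire-count arithmetic. `split_request` is a hypothetical helper: it assumes 16-byte cache lines and returns one `(line, mask)` pair when the access fits within a line, and two when it crosses a line boundary.

```python
CACHE_LINE_BYTES = 16
LINE_MASK = (1 << CACHE_LINE_BYTES) - 1


def split_request(address: int, width: int):
    """Split one LD/ST into per-cache-line (line, byte_mask) requests."""
    if width not in (1, 2, 4, 8):
        raise ValueError("width must be 1, 2, 4 or 8 bytes")
    offset = address & (CACHE_LINE_BYTES - 1)
    line = address >> 4
    # The unsplit mask can be up to 15 + 8 = 23 bits wide.
    full_mask = ((1 << width) - 1) << offset
    parts = [(line, full_mask & LINE_MASK)]
    overflow = full_mask >> CACHE_LINE_BYTES
    if overflow:
        # The bytes past the end of the line go to the next line.
        parts.append((line + 1, overflow))
    return parts


# Wire count for the data alone, using the figures from the text.
LDST_UNITS = 8                          # upper end of "six to eight"
REQUESTS_PER_UNIT = 2                   # two requests for misalignment
LINE_DATA_BITS = CACHE_LINE_BYTES * 8   # 128 bits per cache line
DATA_WIRES = LDST_UNITS * REQUESTS_PER_UNIT * LINE_DATA_BITS
```

A 4-byte access at offset 14 of a line, for example, yields a two-byte mask at the top of one line and a two-byte mask at the bottom of the next, which is why every unit needs two request ports even though most accesses only use one.
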
At that point, thanks to Mitch Alsup's input (Mitch is the designer of
the Motorola 68000, Motorola 88120, key architecture on AMD's Opteron
Series, the AMD K9, AMDGPU and Samsung's latest GPU), we learned that
L1 cache design critically depends on what type of SRAM you have. We
initially, naively, wanted dual-ported L1 SRAM, and that's when Staf
and Mitch taught us that this results in half-duty rate. Only
1-Read **or** 1-Write SRAM Cells give you fast enough (single-cycle)
data rates to be usable for L1 Caches.

The conversation has also wandered into
[why we chose dynamic pipelines](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005459.html),
as well as bringing us that
[important advice](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005354.html)
from both Mitch Alsup and Staf Verhaegen.

(Staf is also [sponsored by NLNet](https://nlnet.nl/project/Chips4Makers/)
to create Libre-licensed Cell Libraries, busting through one of the -
many - layers of NDAs and reducing NREs for ASIC development: I helped him
put in the submission, and he was really happy to do the Cell Libraries
that we will be using for Libre-SOC's 180nm test tape-out in October 2020.)

# Public-Inbox and Domain Migration

TODO

# Build Servers

TODO

# Conclusion

TODO
