So many things happened since the last update they actually need to go
in the main update, even in summary form. One big thing:
[Raptor CS](https://www.raptorcs.com/)
sponsored us with remote access to a Monster spec'd TALOS II Workstation!

# Introduction

Here's the summary (if it can be called a summary):

* [An announcement](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/004995.html)
that we got the funding (which is open to anyone - hint, hint) resulted in
at least three people reaching out to join the team. "We don't need
permission to own our own hardware" got a *really* positive reaction.
* New team member, Jock (hello Jock!) starts on the coriolis2 layout,
after Jean-Paul from LIP6.fr helped to dramatically improve how coriolis2
can be used. This resulted in a
[tutorial](https://libre-riscv.org/HDL_workflow/coriolis2/) and a
[huge bug report discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=178).
* Work has started on the
[POWER ISA decoder](http://bugs.libre-riscv.org/show_bug.cgi?id=186),
verified through
[calling GNU AS](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/test/test_decoder_gas.py;h=9238d3878d964907c5569a3468d6895effb7dc02;hb=56d145e42ac75626423915af22d1493f1e7bb143) (yes, really!)
and on a mini-simulator
[calling QEMU](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/qemu.py;h=9eb103bae227e00a2a1d2ec4f43d7e39e4f44960;hb=56d145e42ac75626423915af22d1493f1e7bb143)
for verification.
* Jacob's simple-soft-float library is growing
[Power FP compatibility](http://bugs.libre-riscv.org/show_bug.cgi?id=258)
and python bindings.
* A Conference call with OpenPOWER Foundation Director, Hugh, and Timothy
Pearson from RaptorCS has been established every two weeks.
* The OpenPOWER Foundation is also running some open
["Virtual Coffee"](https://openpowerfoundation.org/openpower-virtual-coffee-calls/)
weekly round-table calls for anyone interested, generally, in OpenPOWER
development.
* Tim sponsors our team with access to a Monster Talos II system with a
whopping 128 GB RAM. htop lists a staggering 72 cores (18 real
with 4-way hyperthreading).
* [Epic MegaGrants](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005262.html)
reached out (hello!) to say they're still considering our
request.
* A marathon 3-hour session with [NLNet](http://nlnet.nl) resulted
in the completion of the
[Milestone tasks list(s)](http://bugs.libre-riscv.org/buglist.cgi?component=Milestones&list_id=567&resolution=---)
and a
[boat-load](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/thread.html)
of bug reports to the list.
* Immanuel Yehowshua is participating in the Georgia Tech
[Create-X](https://create-x.gatech.edu/) Programme, and is establishing
a Public Benefit Corporation in Atlanta, as an ethical vehicle for VC
Funding.
* A [Load/Store Buffer](http://bugs.libre-riscv.org/show_bug.cgi?id=216)
design and
[further discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=257)
including on
[comp.arch](https://groups.google.com/forum/#!topic/comp.arch/cbGAlcCjiZE)
inspired additional writeup
on the
[6600 scoreboard](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/)
page.
* [Public-Inbox](http://bugs.libre-riscv.org/show_bug.cgi?id=181) was
installed successfully on the server, which is in the process of
moving to a [new domain name](http://bugs.libre-riscv.org/show_bug.cgi?id=182),
[Libre-SOC](http://libre-soc.org).
* Build Servers have been set up with
[automated testing](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005364.html)
being established.

Well dang, as you can see, suddenly it just went ballistic. There's
almost certainly things left off the list. For such a small team there's
a heck of a lot going on. We have an awful lot to do, in a short amount
of time: the 180nm tape-out is in October 2020 - only 7 months away.

With this update we're doing something slightly different: a request
has gone out [to the other team members](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005428.html)
to say a little bit about what each of them is doing. This also helps me
because these updates do take quite a bit of time to write.

# NLNet Funding announcement

An announcement went out
[last year](https://lists.gnu.org/archive/html/libreplanet-discuss/2019-09/msg00170.html)
that we'd applied for funding, and we got some great responses and
feedback (such as "don't use patented AXI4"). The second time, we
sent out a "we got it!" message and got some really nice private and
public replies, as well as requests from people to join the team.
More on that when it happens.

# Coriolis2 experimentation started

TODO by Jock http://bugs.libre-riscv.org/show_bug.cgi?id=217#c44

# POWER ISA decoder and Simulator

TODO

# simple-soft-float Library and POWER FP emulation

The
[simple-soft-float](https://salsa.debian.org/Kazan-team/simple-soft-float)
library is a floating-point library Jacob wrote with the intention
of being a reference implementation of IEEE 754 for hardware testing
purposes. It's specifically written to be easy to
understand instead of having the code obscured in pursuit of speed:

* Being easier to understand helps prevent bugs where the code does not
match the IEEE spec.
* It uses the [algebraics](https://salsa.debian.org/Kazan-team/algebraics)
library that Jacob wrote since that allows using numbers that behave
like exact real numbers, making reasoning about the code simpler.
* It is written in Rust rather than highly-macro-ified C. That helps with
readability, because operations aren't obscured, as well as safety, because Rust
proves at compile time that the code won't seg-fault unless you specifically
opt-out of those guarantees by using `unsafe`.

It currently supports 16, 32, 64 and 128-bit FP for RISC-V, along with
having a `DynamicFloat` type which allows dynamically specifying all
aspects of how a particular floating-point type behaves -- if one wanted,
they could configure it as a 2048-bit floating-point type.

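To give a feel for what "dynamically specifying" a floating-point type
means, here is a minimal Python sketch. It is *not* the simple-soft-float
API: the function name `decode_float` and its parameters are made up for
illustration. It decodes an IEEE 754-style bit pattern of any exponent and
mantissa width into an exact `Fraction`, which is roughly the spirit of
working with exact numbers rather than rounding as you go.

```python
from fractions import Fraction


def decode_float(bits, exponent_width, mantissa_width):
    """Decode an IEEE 754-style bit pattern into an exact Fraction.

    Hypothetical helper for illustration only. Returns None for
    infinities and NaNs, which have no rational value.
    """
    mantissa = bits & ((1 << mantissa_width) - 1)
    exponent = (bits >> mantissa_width) & ((1 << exponent_width) - 1)
    negative = (bits >> (mantissa_width + exponent_width)) & 1
    bias = (1 << (exponent_width - 1)) - 1

    if exponent == (1 << exponent_width) - 1:
        return None  # all-ones exponent: infinity or NaN
    if exponent == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        significand = Fraction(mantissa, 1 << mantissa_width)
        scale = 1 - bias
    else:
        # Normal: implicit leading 1.
        significand = 1 + Fraction(mantissa, 1 << mantissa_width)
        scale = exponent - bias

    value = significand * Fraction(2) ** scale
    return -value if negative else value


# The same code handles half, single and double precision:
half_one = decode_float(0x3C00, 5, 10)
single_one = decode_float(0x3F800000, 8, 23)
double_minus_two = decode_float(0xC000000000000000, 11, 52)
```

Nothing in the function is specific to 16, 32 or 64 bits: a wider format is
just a different pair of width arguments.
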
It also has Python bindings, thanks to the awesome
[PyO3](https://pyo3.rs/) library for writing Python bindings in Rust.

We decided to write simple-soft-float instead
of extending the industry-standard [Berkeley
softfloat](http://www.jhauser.us/arithmetic/SoftFloat.html) library
because of a range of issues, including not supporting Power FP, requiring
recompilation to switch which ISA is being emulated, not supporting
all the required operations, architectural issues such as depending on
global variables, etc. We are still testing simple-soft-float against
Berkeley softfloat where we can, however, since Berkeley softfloat is
widely used and highly likely to be correct.

simple-soft-float is [gaining support for Power
FP](http://bugs.libre-riscv.org/show_bug.cgi?id=258), which requires
rewriting a lot of the status-flag handling code since Power supports a
much larger set of floating-point status flags and exceptions than most
other ISAs.

Thanks to RaptorCS for giving us remote access to a Power9 system,
since that makes it much easier to verify that the test cases are correct
(more on this below).

API Docs for stable releases of both
[simple-soft-float](https://docs.rs/simple-soft-float) and
[algebraics](https://docs.rs/algebraics) are available on docs.rs.

One of the really important things about these libraries: they're not
coded exclusively for Libre-SOC: like softfloat-3 itself
(and also like the [IEEE754 FPU](https://git.libre-riscv.org/?p=ieee754fpu.git))
they're intended for *general-purpose* use by other projects. These are
exactly the kinds of side-benefits for the wider Libre community that
sponsorship, from individuals, Foundations (such as NLNet) and Companies
(such as Purism and Raptor CS) brings.

# OpenPOWER Conference calls

TODO

# OpenPOWER Virtual Coffee Meetings

The "Virtual Coffee Meetings", announced
[here](https://openpowerfoundation.org/openpower-virtual-coffee-calls/),
are literally open to anyone interested in OpenPOWER (if you're strictly
Libre there's a dial-in method). These calls are not recorded: it's
just an informal conversation.

What's a really nice surprise is finding
out that Paul Mackerras, whom I used to work with 20 years ago, is *also*
working on OpenPOWER, specifically
[microwatt](https://github.com/antonblanchard/microwatt), being managed
by Anton Blanchard.

A brief discussion led to learning that Paul is looking at adding TLB
(Virtual Memory) support to microwatt, specifically the RADIX TLB.
I therefore pointed him at the same resource
[(power-gem5)](https://github.com/power-gem5/gem5/tree/gem5-experimental)
that Hugh had kindly pointed me at, the week before, and did a
[late night write-up](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005445.html).

My feeling is that these weekly round-table meetings are going to be
really important for everyone involved in OpenPOWER. It's a community:
we help each other.

# Sponsorship by RaptorCS with a TALOS II Workstation

TODO http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005291.html

# Epic Megagrants

Several months back I got word of the existence of Epic Games' "Megagrants".
In December 2019 they announced that so far they've given
[USD $13 million](https://www.unrealengine.com/en-US/blog/epic-megagrants-reaches-13-million-milestone-in-2019)
to 200 recipients: one of them, the Blender Foundation, received
[USD $1.2 million](https://www.blender.org/press/epic-games-supports-blender-foundation-with-1-2-million-epic-megagrant/)!
This is an amazing and humbling show of support for the 3D Community,
world-wide.

It's not just "games", or products specifically using the Unreal Engine:
they're happy to look at anything that "enhances Libre / Open source"
capabilities for the 3D Graphics Community.

A full hybrid 3D-capable CPU-GPU-VPU which is fully documented, and not just in
its capabilities: the [documentation](http://libre-riscv.org) and
[full source code](http://git.libre-riscv.org) extend
right the way through the *entire development process* down to the bedrock
of the actual silicon - not just the firmware, bootloader and BIOS,
*everything*. In my mind it kinda qualifies, in a way that can, in some
delightful way, be characterised delicately as "complete overkill".

Interestingly, guys, if you're reading this: Tim, the CEO of RaptorCS
informs us that you're working closely with his team to get the Unreal
Engine up and running on the POWER architecture? Wouldn't that be highly
amusing, for us to be able to run the Unreal Engine on the Libre-SOC,
given that it's going to be POWER compatible hardware, as a test,
first in FPGA and then in 18-24 months, on actual silicon, eh?

So, as I mentioned
[on the list](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005262.html)
(reiterating what I put in the original application), we're happy with
USD $25,000, we're happy with USD $10 million. It's really up to you guys,
at Epic Games, as to what level you'd like to see us get to, and how fast.

With USD $600,000, for example, instead of paying USD $1 million to a proprietary
company to license a DDR3 PHY for a limited one-time use and only a 32-bit
wide interface, we can contract SymbioticEDA to *design* a DDR3 PHY for us,
which both we *and the rest of the worldwide Silicon Community can use
without limitation* because we will ask SymbioticEDA to make the design
libre-licensed, for anyone to use.

USD $250,000 pays for the mask charges that will allow us to do the 40nm
quad-core ASIC that we have on the roadmap for the second chip. USD
$1m pays for 28nm masks (and so on, in an exponential ramp-up). No, we
don't want to do that straight away: yes we do want to go through a first
proving test ASIC in 180nm, which, thanks to NLNet, is already funded.
This is just good sane sensible use of funds.

Even USD $25,000 helps us to cover things such as administration of the
website (which is taking up a *lot* of time) and little things that we
didn't quite foresee when putting in the NLNet Grant Applications.

Lastly, one of the conditions as I understood it from the Megagrants
process is that the funds are paid in "stages". This is exactly
what NLNet does for (and with) us, right now. If you wanted to save
administrative costs, there may be some benefit to having a conversation
with the [30-year-old](https://nlnet.nl/foundation/history/)
NLNet Charitable Foundation. Something to think about?

# NLNet Milestone tasks

Part of applying for NLNet's Grants is a requirement to create a list
of tasks, each of which is assigned a budget. On 100% completion of the task,
donations can be sent out. With *six* new proposals accepted, each of which
required between five (minimum) and *nineteen* separate and distinct tasks,
a call with Michiel and Joost turned into an unexpected three-hour online
marathon, scrambling to write almost fifty bugreports as part of the Schedule
to be attached to each Memorandum of Understanding. The mailing list
got a [leeetle bit busy](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005003.html)
right around here.

This emphasised for us the need to subdivide the mailing list into
separate lists (below).

# Georgia Tech CREATE-X

TODO

# LOAD/STORE Buffer and 6600 design documentation

A critical part of this project is not just to create a chip, it's to
*document* the chip design, and the decisions along the way, for
educational, research, and ongoing maintenance purposes. With an
augmented CDC 6600 design being chosen as the fundamental basis,
[documenting that](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/)
as well as the key differences is particularly important. At the very least,
James Thornton (the co-designer of the 6600) recognised that the circular
loops in the 6600, an extremely simple and highly effective but
timing-critical part of the hardware design, are paradoxically challenging
to understand: it is hard to see why so few gates could be so effective. Consequently,
documenting it just to be able to *develop* it is extremely important.

We're getting to the point where we need to connect the LOAD/STORE Computation
Units up to an actual memory architecture. We've chosen
[minerva](https://github.com/lambdaconcept/minerva/blob/master/minerva/units/loadstore.py)
as the basis because it is written in nmigen, works, and, crucially, uses
wishbone (which we decided to use as the main Bus Backbone a few months ago).

However, minerva is a single-issue 32-bit embedded chip,
where it's perfectly ok to have one single LD/ST operation per clock,
and not only that but to have that operation take a few clock cycles.
To get anything like the level of performance needed of a GPU, we need
at least four 64-bit LOADs or STOREs *every clock cycle*.

For a first ASIC from a team that's never done a chip before, this is,
officially, "Bonkers Territory". Where minerva is doing 32-bit-wide
Buses (and does not support 64-bit LD/ST at all), we need internal
data buses of a minimum whopping **2000** wires wide.

Let that sink in for a moment.

The reason why the internal buses need to be 2000 wires wide comes down
to the fact that we need, realistically, six to eight LOAD/STORE Computation
Units. Four of them will be operational, two to four of them will be waiting
with pending instructions from the multi-issue Vectorisation Engine.

We chose to use a system which expands the first 4 bits of the address,
plus the operation width (1,2,4,8 bytes) into a "bitmap" - a byte-mask -
that corresponds directly with the 16 byte "cache line" byte enable
columns, in the L1 Cache. These bitmaps can then be "merged" such
that requests that go to the same cache line can be served *in the
same clock cycle* to multiple LOAD/STORE Computation Units. This
is absolutely critical for effective Vector Processing.

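As a rough software model of that bitmap idea (this is plain Python, not
our nmigen HDL, and the names `byte_mask` and `merge_requests` are invented
for this sketch), the expansion and the merge could look something like
the following. It assumes every request fits inside one 16-byte cache line
and rejects anything that does not.

```python
CACHE_LINE_BYTES = 16


def byte_mask(address, width):
    """Expand the low 4 address bits plus the width (1, 2, 4 or 8 bytes)
    into a 16-bit byte-enable mask, one bit per byte of the cache line."""
    offset = address & (CACHE_LINE_BYTES - 1)
    mask = ((1 << width) - 1) << offset
    if mask >> CACHE_LINE_BYTES:
        raise ValueError("request crosses a cache line boundary")
    return mask


def merge_requests(requests):
    """OR together the masks of all (address, width) requests that hit
    the same cache line, so that each line is accessed only once."""
    merged = {}
    for address, width in requests:
        line = address // CACHE_LINE_BYTES
        merged[line] = merged.get(line, 0) | byte_mask(address, width)
    return merged


# Two 8-byte requests to the same line merge into one full-line access;
# the single-byte request to a different line stays separate.
merged = merge_requests([(0x1000, 8), (0x1008, 8), (0x2003, 1)])
```
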
Additionally, in order to deal with misaligned memory requests, each of those
needs to put *two* such 16-byte-wide requests (see where this is going?)
out to the L1 Cache.
So, we now have eight times two times 128 bits, which is a staggering
2048 wires *just for the data*. There do exist ways to get that down
(potentially to half), and there do exist ways to get that cut in half
again; however, doing so would miss opportunities for merging of requests
into cache lines.

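To make the arithmetic concrete, here is a small Python model (again a
sketch with made-up names such as `split_request`, not the actual design)
of how one misaligned request turns into two cache-line requests, followed
by the worst-case wire count that results.

```python
CACHE_LINE_BYTES = 16
LINE_MASK = (1 << CACHE_LINE_BYTES) - 1


def split_request(address, width):
    """Return a list of (line_number, byte_mask) pairs: one entry if the
    request fits in a single cache line, two if it straddles a boundary."""
    offset = address % CACHE_LINE_BYTES
    line = address // CACHE_LINE_BYTES
    wide_mask = ((1 << width) - 1) << offset  # may spill past bit 15
    parts = [(line, wide_mask & LINE_MASK)]
    overflow = wide_mask >> CACHE_LINE_BYTES
    if overflow:
        parts.append((line + 1, overflow))
    return parts


# Worst case: every unit is misaligned and needs both requests.
units = 8
requests_per_unit = 2
bits_per_request = CACHE_LINE_BYTES * 8  # 128 data bits per cache line
data_wires = units * requests_per_unit * bits_per_request
```

A 4-byte access at offset 14 of a line, for example, touches the last two
bytes of that line and the first two bytes of the next one.
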
At that point, thanks to Mitch Alsup's input (Mitch is the designer of
the Motorola 68000, Motorola 88120, key architecture on AMD's Opteron
Series, the AMD K9, AMDGPU and Samsung's latest GPU), we learned that
L1 cache design critically depends on what type of SRAM you have. We
initially, naively, wanted dual-ported L1 SRAM and that's when Staf
and Mitch taught us that this results in half-duty rate. Only
1-Read **or** 1-Write SRAM Cells give you fast enough (single-cycle)
data rates to be useable for L1 Caches.

Part of the conversation wandered into
[why we chose dynamic pipelines](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005459.html)
and it is also where we received that
[important advice](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005354.html)
from both Mitch Alsup and Staf Verhaegen.

(Staf is also [sponsored by NLNet](https://nlnet.nl/project/Chips4Makers/)
to create Libre-licensed Cell Libraries, busting through one of the -
many - layers of NDAs and reducing NREs for ASIC development: I helped him
put in the submission, and he was really happy to do the Cell Libraries
that we will be using for LibreSOC's 180nm test tape-out in October 2020.)

# Public-Inbox and Domain Migration

As mentioned before, one of the important aspects of this project is
the documentation and archiving. It also turns out that when working
over an extremely unreliable or ultra-expensive mobile broadband link,
having *local* (offline) access to every available development resource
is critically important.

This is why we are going to the trouble of installing public-inbox: not
only is the mailing list stored entirely in a
git repository, the "web service" which provides access to that git-backed
archive can be not only mirrored elsewhere, it can be *run locally on
your own offline machine*. This, in combination with the right mailer
setup, can store-and-forward any replies to the (offline-copied) messages.

Now you know why we absolutely do not accept "slack", or other proprietary
"online oh-so-convenient" services. Not only is it highly inappropriate for
Libre Projects, not only do we become critically dependent on the Corporation
running the service (yes, github has been entirely offline, several times),
if we have remote developers (such as myself, working from Scotland last
month with sporadic access to a single Cell Tower) or developers in emerging
markets where their only internet access is via a Library or Internet Cafe,
we absolutely do not want to exclude or penalise such people, just because
they have fewer resources.

Fascinatingly, Linus Torvalds is *specifically*
[on record](https://www.linuxjournal.com/content/line-length-limits)
about making sure that "Linux development does not favour wealthy people".

We are also, as mentioned before, moving to a new domain name. We'll take
the opportunity to fix some of the issues with HTTPS (wrong certificate),
and also do some
[better mailing list names](http://bugs.libre-riscv.org/show_bug.cgi?id=184)
at the same time.

TODO (Veera?) bit about what was actually done, how it links into mailman2.

# OpenPOWER HDL Mailing List opens up

It is early days, however it is fantastic to see responses from IBM with
regards to requests for access to the POWER ISA Specification
documents in
[machine-readable form](http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-March/000007.html).
I took Jeff at his word and explained, in some detail,
[exactly why](http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-March/000008.html)
machine readable versions of specifications are critically important.

The takeaway is: *we haven't got time to do manual transliteration of the spec*
into "code". We're expending considerable effort making sure that we
"bounce" or "bootstrap" off of pre-existing resources, using computer
programs to do so.

This "trick" is something that I learned over 20 years ago, when developing
an SMB Client and Server in something like two weeks flat. I wrote a
parser which read the packet formats *from the IETF Draft Specification*,
and output C code.

This leaves me wondering, as I mention on the HDL list, if we can do the same
thing with large sections of the POWER Spec.

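As a toy illustration of the "generate code from the spec" trick, here is
a Python sketch. Everything in it is an assumption made up for the example:
the table format in `SPEC_TEXT`, the field names and bit positions, and the
helper names `parse_fields` and `emit_c`. It is not taken from the POWER
Spec or from the original SMB parser. It reads a plain-text table of
instruction fields and emits C accessor functions, with bit 0 treated as
the most significant bit.

```python
# Hypothetical machine-readable field table: name, first bit, last bit.
SPEC_TEXT = """
# field  first_bit  last_bit
opcode   0          5
rt       6          10
ra       11         15
imm      16         31
"""


def parse_fields(text):
    """Parse the table into a list of (name, first_bit, last_bit)."""
    fields = []
    for line in text.splitlines():
        line = line.split("#")[0].strip()  # drop comments and blanks
        if not line:
            continue
        name, first, last = line.split()
        fields.append((name, int(first), int(last)))
    return fields


def emit_c(fields, word_bits=32):
    """Emit one C accessor per field (bit 0 = most significant bit)."""
    out = ["#include <stdint.h>", ""]
    for name, first, last in fields:
        width = last - first + 1
        shift = word_bits - 1 - last
        mask = (1 << width) - 1
        out.append(
            f"static inline uint32_t get_{name}(uint32_t insn) "
            f"{{ return (insn >> {shift}) & 0x{mask:x}u; }}"
        )
    return "\n".join(out)


c_source = emit_c(parse_fields(SPEC_TEXT))
```

The point is that the table is the single source of truth: fix the table
and every generated accessor is fixed with it.
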
# Build Servers

TODO

# Conclusion

I'm not going to mention anything about the current world climate: you've
seen enough news reports. I will say (more about this through the
[EOMA68](https://www.crowdsupply.com/eoma68/micro-desktop) updates) that
I anticipated something like what is happening right now, over ten years
ago. I wasn't precisely expecting what *has* happened, just the consequences:
world-wide travel shut-down, and for people - the world over - to return to
local community roots.

However what I definitely wasn't expecting was a United States President
to be voted in who was eager and, frankly, stupid enough, to start *and
escalate* a Trade war with China. The impact on the U.S. economy alone, and the
reputation of the whole country, has been detrimental in the extreme.

This combination leaves us - world-wide - with a strong possibility, one that
seemed so "preposterous" that I could in no way discuss it widely, let alone
mention it on something like a Crowdsupply update: that thanks to the
business model on which their entire product lifecycle is predicated,
in combination with the extremely high NREs and development costs for
ASICs (custom silicon costs USD $100 million, these days), several
large Corporations producing proprietary binary-only drivers for
hardware on which we critically rely for our internet-connected way
of life **may soon go out of business**.

Right at a critical time where video conferencing is taking off massively,
for your proprietary hardware - your smartphone, your tablet, your laptop,
everything you rely on for connectivity to the rest of the world - all of
a sudden **you may not be able to get software updates** or, worse,
your products could even be
[remotely shut down](https://www.theguardian.com/technology/2016/apr/05/revolv-devices-bricked-google-nest-smart-home)
**without warning**.

I do not want to hammer the point home too strongly but you should be
getting, in no uncertain terms, exactly how strategically critical, in
the current world climate, this project just became. We need to get it
accelerated, completed, and into production, in an expedited and responsible
fashion.