1 So many things happened since the last update they actually need to go
2 in the main update, even in summary form. One big thing:
3 [Raptor CS](https://www.raptorcs.com/)
4 sponsored us with remote access to a Monster spec'd TALOS II Workstation!
5
# Introduction
7
8 Here's the summary (if it can be called a summary):
9
10 * [An announcement](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/004995.html)
11 that we got the funding (which is open to anyone - hint, hint) resulted in
12 at least three people reaching out to join the team. "We don't need
13 permission to own our own hardware" got a *really* positive reaction.
14 * New team member, Jock (hello Jock!) starts on the coriolis2 layout,
15 after Jean-Paul from LIP6.fr helped to dramatically improve how coriolis2
16 can be used. This resulted in a
17 [tutorial](https://libre-riscv.org/HDL_workflow/coriolis2/) and a
18 [huge bug report discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=178)
19 * Work has started on the
20 [POWER ISA decoder](http://bugs.libre-riscv.org/show_bug.cgi?id=186),
21 verified through
22 [calling GNU AS](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/test/test_decoder_gas.py;h=9238d3878d964907c5569a3468d6895effb7dc02;hb=56d145e42ac75626423915af22d1493f1e7bb143) (yes, really!)
23 and on a mini-simulator
24 [calling QEMU](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/qemu.py;h=9eb103bae227e00a2a1d2ec4f43d7e39e4f44960;hb=56d145e42ac75626423915af22d1493f1e7bb143)
25 for verification.
* Jacob's simple-soft-float library is growing
[Power FP compatibility](http://bugs.libre-riscv.org/show_bug.cgi?id=258)
and Python bindings.
29 * Kazan, the Vulkan driver Jacob is writing, is getting
30 a [new shader compiler IR](http://bugs.libre-riscv.org/show_bug.cgi?id=161).
31 * A Conference call with OpenPOWER Foundation Director, Hugh, and Timothy
32 Pearson from RaptorCS has been established every two weeks.
33 * The OpenPOWER Foundation is also running some open
34 ["Virtual Coffee"](https://openpowerfoundation.org/openpower-virtual-coffee-calls/)
35 weekly round-table calls for anyone interested, generally, in OpenPOWER
36 development.
37 * Tim sponsors our team with access to a Monster Talos II system with a
38 whopping 128 GB RAM. htop lists a staggering 72 cores (18 real
39 with 4-way hyperthreading).
40 * [Epic MegaGrants](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005262.html)
41 reached out (hello!) to say they're still considering our
42 request.
43 * A marathon 3-hour session with [NLNet](http://nlnet.nl) resulted
44 in the completion of the
45 [Milestone tasks list(s)](http://bugs.libre-riscv.org/buglist.cgi?component=Milestones&list_id=567&resolution=---)
46 and a
47 [boat-load](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/thread.html)
48 of bug reports to the list.
49 * Immanuel Yehowshua is participating in the Georgia Tech
50 [Create-X](https://create-x.gatech.edu/) Programme, and is establishing
51 a Public Benefit Corporation in Atlanta, as an ethical vehicle for VC
52 Funding.
53 * A [Load/Store Buffer](http://bugs.libre-riscv.org/show_bug.cgi?id=216)
54 design and
55 [further discussion](http://bugs.libre-riscv.org/show_bug.cgi?id=257)
56 including on
57 [comp.arch](https://groups.google.com/forum/#!topic/comp.arch/cbGAlcCjiZE)
58 inspired additional writeup
59 on the
60 [6600 scoreboard](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/)
61 page.
* [Public-Inbox](http://bugs.libre-riscv.org/show_bug.cgi?id=181) was
installed successfully on the server, which is in the process of
moving to a [new domain name](http://bugs.libre-riscv.org/show_bug.cgi?id=182):
[Libre-SOC](http://libre-soc.org)
66 * Build Servers have been set up with
67 [automated testing](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005364.html)
68 being established
69
Well dang, as you can see, suddenly it just went ballistic. There are
almost certainly things left off the list. For such a small team there's
72 a heck of a lot going on. We have an awful lot to do, in a short amount
73 of time: the 180nm tape-out is in October 2020 - only 7 months away.
74
75 With this update we're doing something slightly different: a request
76 has gone out [to the other team members](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005428.html)
77 to say a little bit about what each of them is doing. This also helps me
78 because these updates do take quite a bit of time to write.
79
# NLNet Funding announcement
81
82 An announcement went out
83 [last year](https://lists.gnu.org/archive/html/libreplanet-discuss/2019-09/msg00170.html)
84 that we'd applied for funding, and we got some great responses and
85 feedback (such as "don't use patented AXI4"). The second time, we
86 sent out a "we got it!" message and got some really nice private and
87 public replies, as well as requests from people to join the team.
88 More on that when it happens.
89
# Coriolis2 experimentation started
91
92 Jock, a really enthusiastic and clearly skilled and experienced python
93 developer, has this to say about coriolis2:
94
95 As a humble Python developer, I understand the unique status and
96 significance of the Coriolis project, nevertheless I cannot help
97 but notice that it has a huge room for improvement. I genuinely hope
98 that my participation in libre-riscv will also help improve Coriolis.
99
100 This was the short version, with a much more
101 [detailed insight](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005478.html)
102 listed here which would do well as a bugreport. However the time it would
103 take is quite significant. We do have funding available from NLNet,
104 so if there is anyone that would like to take this on, under the supervision
105 of Jean-Paul at LIP6.fr, we can look at facilitating that.
106
107 One of the key insights that Jock came up with was that the coding style,
108 whilst consistent, is something that specifically has to be learned, and,
109 as such, being contrary to PEP8 in so many ways, creates an artificially
110 high barrier and learning curve.
111
112 Even particularly experienced cross-language developers such as
113 myself tend to be able to *read* such code, but editing it, when
114 commas separating list items are on the beginning of lines, results in
115 syntax errors automatically introduced *without thinking* because we
116 automatically add them *at the end* because it looks like one is missing.
117
118 This is why we insisted on PEP8 in the
119 [HDL workflow](http://libre-riscv.org/HDL_workflow) document.
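
To make the leading-comma hazard concrete, here is a small sketch. The list contents are made up for illustration and are not taken from coriolis2; the last part mimics the habit of "fixing" a line that looks like it is missing its comma.

```python
# Leading-comma style: each separator sits at the start of the next line.
cells_leading = [ "fpmul"
                , "regfile"
                , "alu"
                ]

# PEP8-friendly style: separators at the end of each line, with a
# trailing comma so that appending an item is a one-line change.
cells_pep8 = [
    "fpmul",
    "regfile",
    "alu",
]

# Both spellings build exactly the same list.
same_list = cells_leading == cells_pep8

# The habitual edit: add the "missing" comma at the end of the first
# line of leading-comma code. The result has two commas in a row.
habit_edit = '["fpmul",\n , "regfile"]'
try:
    compile(habit_edit, "<example>", "eval")
    habit_edit_is_error = False
except SyntaxError:
    habit_edit_is_error = True

print(same_list, habit_edit_is_error)
```

The two styles are equivalent to the interpreter; the difference is purely in how easy it is to edit the code without breaking it.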
120
121 Other than that: coriolis2 is actually extremely exciting to work with.
122 Anyone who has done manual PCB layout will know quite how much of a relief
123 it is to have auto-routing: this is what coriolis2 has by the bucket-load,
124 *as well* as auto-placement. We are looking at half a *million* objects
125 (Cells) to place. Without an auto-router / auto-placer this is just a
126 flat-out impossible task.
127
128 The first step was to
129 [learn and adapt coriolis2](http://bugs.libre-riscv.org/show_bug.cgi?id=178)
130 which was needed to find out how much work would be involved, as much as
131 anything else, in order to be able to accurately assign the fixed budgets
132 to the NLNet milestones. Following on from that, when Jock joined,
133 we needed to work out a compact way to express the
134 [layout of blocks](http://bugs.libre-riscv.org/show_bug.cgi?id=217#c44)
135 and he's well on the way to achieving that.
136
137 Some of the pictures from coriolis2 are
[stunning](http://bugs.libre-riscv.org/attachment.cgi?id=29). This was an
139 experimental routing of the IEEE754 FP 64-bit multiplier. It took
140 5 minutes to run, and is around 50,000 gates: as big as most silicon
141 ASICs that have formerly been done with Coriolis2, and 50% of the
142 practical size that can be handed in one go to the auto-place/auto-router.
143
144 Other designs using coriolis2 have been of the form where the major "blocks"
145 (such as FPMUL, or Register File) are laid-out automatically in a single-level
hierarchy, followed by full and total manual layout from that point onwards,
147 in what is termed in the industry as a "Floorplan".
148 With around 500,000 gates to do and many blocks being repeated, this approach
149 is not viable for us. We therefore need a *two* level or potentially three
150 level hierarchy.
151
152 [Explaining this](http://bugs.libre-riscv.org/show_bug.cgi?id=178#c146)
153 to Jean-Paul was amusing and challenging. Much bashing of heads against
154 walls and keyboards was involved. The basic plan: rather than have
155 coriolis2 perform an *entire* layout, in a flat and all-or-nothing fashion,
156 we need a much more subtle fine-grained approach, where *sub-blocks* are
157 laid-out, then *included* at a given level of hierarchy as "pre-done blocks".
158
159 Save and repeat.
160
This apparently had never been done before, and explaining it in words was
extremely challenging. A massive hack (actively editing the underlying
HDL files temporarily in between tasks) was the only way to illustrate it.
However once the lightbulb went on, Jean-Paul was able to get coriolis2's
C++ code into shape extremely rapidly, and this alone has opened up an
*entire new avenue* of potential for coriolis2 to be used in industry
for doing much larger ASICs. Which is precisely the kind of thing that
our NLNet sponsors (and the EU, from the Horizon 2020 Grant) love. Hooray.
169 Now if only we could actually go to a conference and talk about it.
170
# POWER ISA decoder and Simulator
172
173 *(kindly written by Michael)*
174
175 The decoder we have is based on that of IBM's
176 [microwatt reference design](https://github.com/antonblanchard/microwatt).
177 As microwatt's decoder is quite regular, consisting of a bunch of large
178 switch statements returning fields of a struct, we elected not to
179 pursue a direct conversion of the VHDL to nmigen. Instead, we
180 extracted the information in the switch statements into several
181 [CSV tables](https://libre-riscv.org/openpower/isatables/),
182 and leveraged nmigen to construct the decoder from these
183 tables. We applied the same technique to extract the subfields
184 (register numbers, branch offset, immediates, etc.) from the
185 instruction, where Luke converted the information in the POWER ISA
186 specification to text, and wrote a module in python to extract those
187 fields from an instruction.
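
As a rough illustration of the table-driven idea, here is a minimal pure-Python sketch. It is not the nmigen decoder: the column names (`opcode`, `unit`, `comment`) and the unit labels are hypothetical, chosen only for this example, and only the primary opcode (the top 6 bits of the instruction) is looked up.

```python
import csv
import io

# Hypothetical table layout; the real CSV files have their own columns.
TABLE = """opcode,unit,comment
14,ALU,addi
24,LOGICAL,ori
32,LDST,lwz
36,LDST,stw
"""

def build_decoder(csv_text):
    """Turn the CSV rows into a lookup table keyed by primary opcode."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {int(row["opcode"]): row for row in rows}

def primary_opcode(insn):
    """Extract the top 6 bits of a 32-bit instruction word."""
    return (insn >> 26) & 0x3F

def decode(table, insn):
    """Return the table row for this instruction, or None if unknown."""
    return table.get(primary_opcode(insn))

DECODER = build_decoder(TABLE)

# 0x38600001 is `addi r3, r0, 1` (primary opcode 14).
entry = decode(DECODER, 0x38600001)
print(entry["comment"], entry["unit"])
```

The point of the approach is that adding an instruction means adding a table row, not writing more decoder logic.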
188
189 To test the decoder, we initially verified it against the tables we
190 extracted, and manually against the [POWER ISA
191 specification](https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0). Later
192 however, we came up with the idea of [verifying the
193 decoder](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/test/test_decoder_gas.py;h=9238d3878d964907c5569a3468d6895effb7dc02;hb=433ab59cf9b7ab1ae10754798fc1c110e705db76)
194 against the output of the GNU assembler. This is done by selecting an
195 instruction type (integer reg/reg, integer immediate, load store,
196 etc), and randomly selecting the opcode, registers, immediates, and
197 other operands. We then feed this instruction to GNU AS to assemble,
198 and then the assembled instruction is sent to our decoder. From this,
199 we can then verify that the output of the decoder matches what was
200 generated earlier.
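
The shape of that randomised check can be sketched as follows. To keep the example self-contained, a hand-written encoder for `addi` stands in for GNU AS, and `decode_d_form` is a toy stand-in for the decoder under test; both function names are made up for this sketch and neither is the project's actual test code.

```python
import random

def encode_addi(rt, ra, si):
    """Reference encoding of `addi rt, ra, si` (D-form, primary opcode 14).
    In this sketch it plays the role of the assembler."""
    return (14 << 26) | (rt << 21) | (ra << 16) | (si & 0xFFFF)

def decode_d_form(insn):
    """Toy decoder under test: split a D-form instruction into fields."""
    si = insn & 0xFFFF
    if si & 0x8000:  # sign-extend the 16-bit immediate
        si -= 0x10000
    return {
        "opcode": (insn >> 26) & 0x3F,
        "rt": (insn >> 21) & 0x1F,
        "ra": (insn >> 16) & 0x1F,
        "si": si,
    }

def random_check(rng, runs=1000):
    """Pick random operands, encode, decode, and compare the fields."""
    for _ in range(runs):
        rt = rng.randrange(32)
        ra = rng.randrange(32)
        si = rng.randrange(-0x8000, 0x8000)
        expected = {"opcode": 14, "rt": rt, "ra": ra, "si": si}
        if decode_d_form(encode_addi(rt, ra, si)) != expected:
            return False
    return True

print(random_check(random.Random(0)))
```

Using a real assembler as the reference, as described above, has the advantage that the reference and the decoder do not share any code or any misreading of the specification.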
201
202 We also explored using a similar idea to test the functionality of the
203 entire SOC. By using the [QEMU](https://www.qemu.org/) PowerPC
204 emulator, we can compare the execution of our SOC against that of the
205 emulator to verify that our decoder and backend are working correctly.
206 We would write snippets of test code (or potentially randomly generate
207 instructions) and send the resulting binary to both the SOC and
208 QEMU. We would then simulate our SOC until it was finished executing
instructions, and use QEMU's gdb interface to do the same. We would
then use QEMU's gdb interface to compare the register file and memory
211 with that of our SOC to verify that it is working correctly. I did
212 some experimentation using this technique to verify a [rudimentary
213 simulator](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/test_sim.py;h=aadaf667eff7317b1aa514993cd82b9abedf1047;hb=433ab59cf9b7ab1ae10754798fc1c110e705db76)
214 of the SOC backend, and it seemed to work quite well.
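
The final comparison step might look something like the sketch below. The register names and values are invented, and how the two dumps are obtained (from the SOC simulation and from QEMU over gdb) is deliberately left out; `diff_state` is a hypothetical helper, not part of the repository.

```python
def diff_state(soc_regs, reference_regs):
    """Return {register: (soc_value, reference_value)} for each mismatch."""
    names = sorted(set(soc_regs) | set(reference_regs))
    return {
        name: (soc_regs.get(name), reference_regs.get(name))
        for name in names
        if soc_regs.get(name) != reference_regs.get(name)
    }

# Made-up register dumps after running the same snippet on both sides.
soc = {"r1": 0x10, "r2": 0x20, "r3": 0x2F}
qemu = {"r1": 0x10, "r2": 0x20, "r3": 0x30}

mismatches = diff_state(soc, qemu)
for name, (ours, theirs) in mismatches.items():
    print(f"{name}: soc={ours:#x} qemu={theirs:#x}")
```

An empty result means the two register files agree; anything else points straight at the register that diverged.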
215
216 *(Note from Luke: this automated approach, taking either other people's
217 regularly-written code or actual PDF specifications, not only saves us a
218 vast amount of time, it also ensures that our implementation is
219 correct and does not contain transcription errors).*
220
# simple-soft-float Library and POWER FP emulation
222
223 The
224 [simple-soft-float](https://salsa.debian.org/Kazan-team/simple-soft-float)
225 library is a floating-point library Jacob wrote with the intention
226 of being a reference implementation of IEEE 754 for hardware testing
purposes. It's specifically written to be easy to
understand, rather than having the code obscured in pursuit of speed:
229
230 * Being easier to understand helps prevent bugs where the code does not
231 match the IEEE spec.
232 * It uses the [algebraics](https://salsa.debian.org/Kazan-team/algebraics)
233 library that Jacob wrote since that allows using numbers that behave
234 like exact real numbers, making reasoning about the code simpler.
235 * It is written in Rust rather than highly-macro-ified C, since that helps with
236 readability since operations aren't obscured, as well as safety, since Rust
237 proves at compile time that the code won't seg-fault unless you specifically
238 opt-out of those guarantees by using `unsafe`.
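
To show why exact numbers make the reasoning simpler, here is a small sketch in Python rather than Rust. It uses the standard `fractions.Fraction` type as a stand-in for exact real numbers and rounds a non-negative value to IEEE 754 binary16 with round-to-nearest, ties-to-even. It is not simple-soft-float's algorithm or API; `f16_bits_from_exact` is a name invented for this example, and negative inputs, NaNs and status flags are left out.

```python
from fractions import Fraction
import struct

def f16_bits_from_exact(x):
    """Round a non-negative exact rational to IEEE 754 binary16 bits
    (round-to-nearest, ties-to-even)."""
    x = Fraction(x)
    if x < 0:
        raise ValueError("this sketch only handles non-negative values")
    if x == 0:
        return 0
    # floor(log2(x)), computed exactly from the integer bit lengths.
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    # Below 2**-14 the format is subnormal: the spacing stops shrinking.
    e = max(e, -14)
    # Express x in units of the spacing between representable values,
    # then round that exact quotient once (Fraction rounds ties to even).
    q = round(x / Fraction(2) ** (e - 10))
    # q lies in [0, 2048]. The formula also covers subnormals (q < 1024)
    # and a carry into the next binade (q == 2048).
    bits = ((e + 15) << 10) + q - 1024
    return min(bits, 0x7C00)  # 0x7C00 encodes +infinity

# Cross-check against Python's own binary16 packing (struct format "e").
for value in (1.0, 0.1, 1 / 3, 65504.0):
    reference = struct.unpack("<H", struct.pack("<e", value))[0]
    print(value, hex(f16_bits_from_exact(Fraction(value))), hex(reference))
```

Because the quotient is exact, there is only one rounding step to reason about, which is the property the bullet points above are describing.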
239
It currently supports 16, 32, 64, and 128-bit FP for RISC-V, along with
241 having a `DynamicFloat` type which allows dynamically specifying all
242 aspects of how a particular floating-point type behaves -- if one wanted,
243 they could configure it as a 2048-bit floating-point type.
244
245 It also has Python bindings, thanks to the awesome
246 [PyO3](https://pyo3.rs/) library for writing Python bindings in Rust.
247
248 We decided to write simple-soft-float instead
249 of extending the industry-standard [Berkeley
250 softfloat](http://www.jhauser.us/arithmetic/SoftFloat.html) library
251 because of a range of issues, including not supporting Power FP, requiring
252 recompilation to switch which ISA is being emulated, not supporting
253 all the required operations, architectural issues such as depending on
254 global variables, etc. We are still testing simple-soft-float against
255 Berkeley softfloat where we can, however, since Berkeley softfloat is
256 widely used and highly likely to be correct.
257
258 simple-soft-float is [gaining support for Power
259 FP](http://bugs.libre-riscv.org/show_bug.cgi?id=258), which requires
260 rewriting a lot of the status-flag handling code since Power supports a
261 much larger set of floating-point status flags and exceptions than most
262 other ISAs.
263
264 Thanks to RaptorCS for giving us remote access to a Power9 system,
since that makes it much easier to verify that the test cases are correct
266 (more on this below).
267
268 API Docs for stable releases of both
269 [simple-soft-float](https://docs.rs/simple-soft-float) and
270 [algebraics](https://docs.rs/algebraics) are available on docs.rs.
271
272 The algebraics library was chosen as the
273 [Crate of the Week for October 8, 2019 on This Week in
274 Rust](https://this-week-in-rust.org/blog/2019/10/08/this-week-in-rust-307/#crate-of-the-week).
275
276 One of the really important things about these libraries: they're not
277 specifically coded exclusively for Libre-SOC: like Berkeley softfloat itself
278 (and also like the [IEEE754 FPU](https://git.libre-riscv.org/?p=ieee754fpu.git))
279 they're intended for *general-purpose* use by other projects. These are
280 exactly the kinds of side-benefits for the wider Libre community that
281 sponsorship, from individuals, Foundations (such as NLNet) and Companies
282 (such as Purism and Raptor CS) brings.
283
# Kazan Getting a New Shader Compiler IR
285
286 After spending several weeks only to discover that translating directly from
287 SPIR-V to LLVM IR, Vectorizing, and all the other front-end stuff all in a
288 single step is not really feasible, Jacob has switched to [creating a new
289 shader compiler IR](http://bugs.libre-riscv.org/show_bug.cgi?id=161) to allow
290 decomposing the translation process into several smaller steps.
291
292 The IR and
293 SPIR-V to IR translator are being written simultaneously, since that allows
294 more easily finding the things that need to be represented in the shader
compiler IR. Because writing both the IR and the SPIR-V translator together is
296 such a big task, we decided to pick an arbitrary point ([translating a totally
297 trivial shader into the IR](http://bugs.libre-riscv.org/show_bug.cgi?id=177))
298 and split it into tasks at that point so Jacob would be able to get paid
299 after several months of work.
300
301 The IR uses structured control-flow inspired by WebAssembly's control-flow
302 constructs as well as
303 [SSA](https://en.wikipedia.org/wiki/Static_single_assignment_form) but, instead
304 of using traditional phi instructions, it uses block and loop parameters and
305 return values (inspired by [Cranelift's EBB
306 parameters](https://github.com/bytecodealliance/wasmtime/blob/master/cranelift/docs/ir.md#static-single-assignment-form)
307 as well as both of the [Rust](https://www.rust-lang.org/) and [Lua](https://www.lua.org/) programming languages).
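
The idea of loop parameters replacing phi instructions can be modelled in a few lines of Python. This is only an analogy, not Kazan's IR or its implementation: `run_loop` and `sum_body` are invented names, and the tuples `("continue", ...)` and `("break", ...)` stand in for the IR's `continue` and `break` instructions.

```python
def run_loop(initial_args, body):
    """Run a loop whose state lives entirely in its parameters.

    `body` receives the current parameter values and returns either
    ("continue", new_args) to start the next iteration with new values,
    or ("break", results) to leave the loop with return values.
    """
    args = tuple(initial_args)
    while True:
        action, values = body(*args)
        if action == "break":
            return tuple(values)
        args = tuple(values)

def sum_body(i, total):
    """Add i, i-1, ..., 1 to total, one step per iteration."""
    if i == 0:
        return "break", (total,)
    return "continue", (i - 1, total + i)

print(run_loop((4, 0), sum_body))
```

Every value that changes between iterations is passed explicitly at the `continue`, so there is no need for a phi instruction at the top of the loop to merge the "first time" and "next time" values.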
308
The IR has a single pointer type for all data pointers (`data_ptr`), unlike LLVM IR where pointer types have a type they point to (like `i32*`, where `i32` is the type the pointer points to).
310
Because having a serialized form is important for any good IR, this IR, like
LLVM IR, has a user-friendly textual form that can be both read and
written without losing any information (assuming the IR is valid; comments are
ignored). A binary form may be added later.
315
316 Some example code (the IR is likely to change somewhat):
317
```
# this is a comment, comments go from the `#` character
# to the end of the line.

fn function1[] -> ! {
    # declares a function named function1 that takes
    # zero parameters and doesn't return
    # (the return type is !, taken from Rust).
    # If the function could return, there would instead be
    # a list of return types:
    # fn my_fn[] -> [i32, i64] {...}
    # my_fn returns an i32 and an i64. The multiple
    # returned values are inspired by Lua's multiple return values.

    # the hints for this function
    hints {
        # there are no inlining hints for this function
        inlining_hint: none,
        # this function doesn't have a side-effect hint
        side_effects: normal,
    }

    # function local variables
    {
        # the local variable is an i32 with an
        # alignment of 4 bytes
        i32, align: 0x4 -> local_var1: data_ptr;
        # the pointer to the local variable is
        # assigned to local_var1 which has the type data_ptr
    }

    # the function body is a single block -- block1.
    # block1's return types are instead attached to the
    # function signature above
    # (the `-> !` in the `fn function1[] -> !`).
    block1 {
        # the first instruction is a loop named loop1.
        # the initial value of loop_var is the_const,
        # which is a named constant.
        # the value of the_const is the address of the
        # function `function1`.
        loop loop1[the_const: fn function1] -> ! {
            # loop1 takes 1 parameter, which is assigned
            # to loop_var. the type of loop_var is a pointer to a
            # function which takes no parameters and doesn't
            # return.
            -> [loop_var: fn[] -> !];

            # the loop body is a single block -- block2.
            # block2's return value definitions are instead
            # attached to the loop instruction above
            # (the `-> !` in the `loop loop1[...] -> !`).
            block2 {

                # block3 is a block instruction, it returns
                # two values, which are assigned to a and b.
                # Both of a and b have type i32.
                block block3 -> [a: i32, b: i32] {
                    # the only way a block can return is by
                    # being broken out of using the break
                    # instruction. It is invalid for execution
                    # to reach the end of a block.

                    # this break instruction breaks out of
                    # block3, making block3 return the
                    # constants 1 and 2, both of type i32.
                    break block3[1i32, 2i32];
                };

                # an add instruction. The instruction adds
                # the value `a` (returned by block3 above) to
                # the constant `increment` (which is an i32
                # with the value 0x1), and stores the
                # result in the value `"a"1`. The source-code
                # location for the add instruction is specified
                # as being line 12, column 34, in the file
                # `source_file.vertex`.
                add [a, increment: 0x1i32]
                    -> ["a"1: i32] @ "source_file.vertex":12:34;

                # The `"a"1` name is stored as just `a` in
                # the IR, where the 1 is a numerical name
                # suffix to differentiate between the two
                # values with name `a`. This allows robustly
                # handling duplicate names, by using the
                # numerical name suffix to disambiguate.
                #
                # If a name is specified without the numerical
                # name suffix, the suffix is assumed to be the
                # number 0. This also allows handling names that
                # have unusual characters or are just the empty
                # string by using the form with the numerical
                # suffix:
                #     `""0` (empty string)
                #     `"\n"0` (a newline)
                #     `"\u{12345}"0` (the unicode scalar value 0x12345)


                # this continue instruction jumps back to
                # the beginning of loop1, supplying the new
                # values of the loop parameters. In this case,
                # we just supply loop_var as the value for
                # the parameter, which just gets assigned to
                # loop_var in the next iteration.
                continue loop1[loop_var];
            }
        };
    }
}
```
428
# OpenPOWER Conference calls
430
We've now established a routine conference call every two weeks with Hugh Blemings,
432 OpenPOWER Foundation Director, and Timothy Pearson, CEO of Raptor CS. This
433 allows us to keep up-to-date (each way) on both our new venture and also
434 the newly-announced OpenPOWER Foundation effort as it progresses.
435
436 One of the most important things that we, Libre-SOC, need, and are
437 discussing with Hugh and Tim is: a way to switch on/off functionality
438 in the (limited) 32-bit opcode space, so that we have one mode for
439 "POWER 3.0B compliance" and another for "things that are absolutely
essential to make a decent GPU". With these two being
mutually exclusive, this is just absolutely critical.
442
443 Khronos Vulkan Floating-point Compliance is, for example, critical not
444 just from a Khronos Trademark Compliance perspective, it's essential
445 from a power-saving and thus commercial success perspective. If we
446 have absolute strict compliance with IEEE754 for POWER 3.0B, this will
447 result in far more silicon than any commercially-competitive GPU on
448 the market, and we will not be able to sell product. Thus it is
449 *commercially* essential to be able to swap between POWER Compliance
450 and Khronos Compliance *at the silicon level*.
451
POWER 3.0B does not have C++ style LR/SC atomic operations for example,
and if we have half a **million** 3D GPU data structures **per second**
that need SMP-level inter-core mutexes, and the current POWER 3.0B
multi-instruction atomic operations are used - conforming strictly to
the standard - we're highly likely to see 10 to 15 **percent** of processing
power consumed on spin-locking. Finding out from Tim on one of these
calls that C++ atomics are something that end-users
have been asking about is therefore a good sign.
460
New and essential features that could well end up in a future version
of the POWER ISA *need* to be firewalled in a clean way, and we've been
463 asked to [draft a letter](https://libre-riscv.org/openpower/isans_letter/)
464 to some of the (very busy) engineers with a huge amount of knowledge
465 and experience inside IBM, for them to consider. Some help in reviewing
466 it would be greatly appreciated.
467
468 These and many other things are why the calls with Tim and Hugh are a
469 good idea. The amazing thing is that they're taking us seriously, and
470 we can discuss things like those above with them.
471
Other nice things we learned (more on this below) are that Epic Games
and RaptorCS are collaborating to get POWER9 supported in Unreal Engine,
and that the idea has been very tentatively considered to use our design
475 for the "boot management" processor, running
476 [OpenBMC](https://github.com/openbmc/openbmc). These are early days,
477 it's just ideas, ok! Aside from anything, we actually have to get a chip
478 done, first.
479
# OpenPOWER Virtual Coffee Meetings
481
482 The "Virtual Coffee Meetings", announced
483 [here](https://openpowerfoundation.org/openpower-virtual-coffee-calls/)
484 are literally open to anyone interested in OpenPOWER (if you're strictly
485 Libre there's a dial-in method). These calls are not recorded, it's
486 just an informal conversation.
487
488 What's a really nice surprise is finding
489 out that Paul Mackerras, whom I used to work with 20 years ago, is *also*
490 working on OpenPOWER, specifically
491 [microwatt](https://github.com/antonblanchard/microwatt), being managed
492 by Anton Blanchard.
493
494 A brief discussion led to learning that Paul is looking at adding TLB
495 (Virtual Memory) support to microwatt, specifically the RADIX TLB.
496 I therefore pointed him at the same resource
497 [(power-gem5)](https://github.com/power-gem5/gem5/tree/gem5-experimental)
that Hugh had kindly pointed me at, the week before, and did a
[late night write-up](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005445.html).
500
501 My feeling is that these weekly round-table meetings are going to be
502 really important for everyone involved in OpenPOWER. It's a community:
503 we help each other.
504
# Sponsorship by RaptorCS with a TALOS II Workstation
506
507 With many thanks to Timothy from
508 [RaptorCS](https://raptorcs.com), we've a new shiny
509 online server that needs
510 [setting up](http://bugs.libre-riscv.org/show_bug.cgi?id=265).
511 This machine is not just a "nice-to-have", it's actually essential for
512 us to be able to verify against. As you can see in the bugreport, the idea
513 is to bootstrap our way from running IEEE754 FP on a *POWER* system
514 (using typically gnu libm), verifying Jacob's algorithmic FP library
515 particularly and specifically for its rounding modes and exception modes.
516
517 Once that is done, then apart from having a general-purpose library that
518 is compliant with POWER IEEE754 which *anyone else can use*, we can use
that to run unit tests against our
[hardware IEEE754 FP library](https://git.libre-riscv.org/?p=ieee754fpu.git;a=summary) -
521 again, a resource that anyone may use in any arbitrary project - verifying
that it is also correct. We are deploying this stepping-stone "bootstrap" method
all over the place; however, to do so we need access to resources
524 that have correctly-compliant implementations in the first place. Thus,
525 the critical importance of access to a TALOS II POWER9 workstation.
526
# Epic Megagrants
528
529 Several months back I got word of the existence of Epic Games' "Megagrants".
In December 2019 they announced that so far they've given
[USD $13 million](https://www.unrealengine.com/en-US/blog/epic-megagrants-reaches-13-million-milestone-in-2019)
to 200 recipients: one of the grants, to the Blender Foundation, was
533 [USD $1.2 million](https://www.blender.org/press/epic-games-supports-blender-foundation-with-1-2-million-epic-megagrant/)!
534 This is an amazing and humbling show of support for the 3D Community,
535 world-wide.
536
537 It's not just "games", or products specifically using the Unreal Engine:
538 they're happy to look at anything that "enhances Libre / Open source"
539 capabilities for the 3D Graphics Community.
540
A full hybrid 3D-capable CPU-GPU-VPU which is fully-documented not just in
its capabilities, where that [documentation](http://libre-riscv.org) and
[full source code](http://git.libre-riscv.org) kinda extend
right the way through the *entire development process* down to the bedrock
of the actual silicon - not just the firmware, bootloader and BIOS,
*everything* - in my mind kinda qualifies in a way that can, in some
delightful way, be characterised delicately as "complete overkill".
548
549 Interestingly, guys, if you're reading this: Tim, the CEO of RaptorCS
550 informs us that you're working closely with his team to get the Unreal
551 Engine up and running on the POWER architecture? Wouldn't that be highly
552 amusing, for us to be able to run the Unreal Engine on the Libre-SOC,
553 given that it's going to be POWER compatible hardware, as a test,
554 first initially in FPGA and then in 18-24 months, on actual silicon, eh?
555
556 So, as I mentioned
557 [on the list](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005262.html)
558 (reiterating what I put in the original application), we're happy with
559 USD $25,000, we're happy with USD $10 million. It's really up to you guys,
560 at Epic Games, as to what level you'd like to see us get to, and how fast.
561
With USD $600,000, for example, instead of paying USD $1 million to a proprietary
563 company to license a DDR3 PHY for a limited one-time use and only a 32-bit
564 wide interface, we can contract SymbioticEDA to *design* a DDR3 PHY for us,
565 which both we *and the rest of the worldwide Silicon Community can use
566 without limitation* because we will ask SymbioticEDA to make the design
567 (and layout) libre-licensed, for anyone to use.
568
USD $250,000 pays for the mask charges that will allow us to do the 40nm
570 quad-core ASIC that we have on the roadmap for the second chip. USD
571 $1m pays for 28nm masks (and so on, in an exponential ramp-up). No, we
572 don't want to do that straight away: yes we do want to go through a first
573 proving test ASIC in 180nm, which, thanks to NLNet, is already funded.
574 This is just good sane sensible use of funds.
575
576 Even USD $25,000 helps us to cover things such as administration of the
577 website (which is taking up a *lot* of time) and little things that we
578 didn't quite foresee when putting in the NLNet Grant Applications.
579
580 Lastly, one of the conditions as I understood it from the Megagrants
581 process is that the funds are paid in "stages". This is exactly
582 what NLNet does for (and with) us, right now. If you wanted to save
583 administrative costs, there may be some benefit to having a conversation
584 with the [30-year-old](https://nlnet.nl/foundation/history/)
585 NLNet Charitable Foundation. Something to think about?
586
# NLNet Milestone tasks
588
589 Part of applying for NLNet's Grants is a requirement to create a list
590 of tasks, each of which is assigned a budget. On 100% completion of the task,
591 donations can be sent out. With *six* new proposals accepted, each of which
required between five (minimum) and *nineteen* separate and distinct tasks,
a call with Michiel and Joost turned into an unexpected three-hour online
594 marathon, scrambling to write almost fifty bugreports as part of the Schedule
595 to be attached to each Memorandum of Understanding. The mailing list
596 got a [leeetle bit busy](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005003.html)
597 right around here.
598
599 This emphasised for us the need to subdivide the mailing list into
600 separate lists (below).
601
602 # Georgia Tech CREATE-X
603
604 TODO
605
606 # LOAD/STORE Buffer and 6600 design documentation
607
608 A critical part of this project is not just to create a chip, it's to
609 *document* the chip design, and the decisions along the way, for
610 educational, research, and ongoing maintenance purposes. With an
611 augmented CDC 6600 design being chosen as the fundamental basis,
612 [documenting that](https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/)
613 as well as the key differences is particularly important. Even James
614 Thornton (the co-designer of the 6600) recognised that the circular loops
615 in the 6600, although extremely simple and highly effective hardware, are
616 timing-critical, and that it is paradoxically challenging to understand
617 why so few gates could be so effective (this being, literally, the
618 world's first ever out-of-order superscalar architecture).
619 Consequently, documenting it just to be able to *develop* it is extremely
620 important.
621
622 We're getting to the point where we need to connect the LOAD/STORE Computation
623 Units up to an actual memory architecture. We've chosen
624 [minerva](https://github.com/lambdaconcept/minerva/blob/master/minerva/units/loadstore.py)
625 as the basis because it is written in nmigen, works, and, crucially, uses
626 wishbone (which we decided to use as the main Bus Backbone a few months ago).
627
628 However, minerva is a single-issue 32-bit embedded chip,
629 where it's perfectly ok to have one single LD/ST operation per clock,
630 and not only that but to have that operation take a few clock cycles.
631 To get anything like the level of performance needed of a GPU, we need
632 at least four 64-bit LOADs or STOREs *every clock cycle*.
633
634 For a first ASIC from a team that's never done a chip before, this is,
635 officially, "Bonkers Territory". Where minerva is doing 32-bit-wide
636 Buses (and does not support 64-bit LD/ST at all), we need internal
637 data buses a whopping **2000** wires wide, at minimum.
638
639 Let that sink in for a moment.
640
641 The reason why the internal buses need to be 2000 wires wide comes down
642 to the fact that we need, realistically, six to eight LOAD/STORE Computation
643 Units. Four of them will be operational; two to four of them will be waiting
644 with pending instructions from the multi-issue Vectorisation Engine.
645
646 We chose to use a system which expands the first 4 bits of the address,
647 plus the operation width (1, 2, 4 or 8 bytes), into a "bitmap" - a byte-mask -
648 that corresponds directly with the 16-byte "cache line" byte enable
649 columns, in the L1 Cache. These bitmaps can then be "merged" such
650 that requests that go to the same cache line can be served *in the
651 same clock cycle* to multiple LOAD/STORE Computation Units. This
652 is absolutely critical for effective Vector Processing.
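
As a rough software model of the byte-mask expansion and merging just described (a minimal sketch only, not the actual nmigen hardware: the names `byte_mask` and `merge_requests` are made up for illustration, and requests that cross a cache line are simply rejected here):

```python
LINE_BYTES = 16  # width of one L1 cache line in bytes, as in the text


def byte_mask(addr: int, width: int) -> int:
    """Expand the low 4 address bits plus the width into a byte-enable bitmap.

    Bit i set means byte i of the 16-byte cache line is accessed.
    Bits 16 and above would belong to the *next* line (misaligned case).
    """
    if width not in (1, 2, 4, 8):
        raise ValueError("width must be 1, 2, 4 or 8 bytes")
    offset = addr & (LINE_BYTES - 1)       # first 4 bits of the address
    return ((1 << width) - 1) << offset    # 'width' ones, shifted into place


def merge_requests(requests):
    """OR together the masks of all (addr, width) requests per cache line.

    Returns {line_number: merged_mask}. Each dictionary entry models one
    cache-line access that can serve several units at once.
    """
    merged = {}
    for addr, width in requests:
        mask = byte_mask(addr, width)
        if mask >> LINE_BYTES:
            raise ValueError("request crosses a cache line (not handled here)")
        line = addr >> 4                   # remaining address bits select the line
        merged[line] = merged.get(line, 0) | mask
    return merged


if __name__ == "__main__":
    reqs = [(0x1000, 8), (0x1008, 8), (0x2004, 2)]
    for line, mask in merge_requests(reqs).items():
        print(f"line {line:#x}: byte enables {mask:016b}")
```

In this model the two 8-byte requests at `0x1000` and `0x1008` collapse into a single access with all sixteen byte enables set, which is the merging opportunity the paragraph above refers to.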
653
654 Additionally, in order to deal with misaligned memory requests, each of those
655 units needs to put out *two* such 16-byte-wide requests (see where this is
656 going?) to the L1 Cache.
657 So, we now have eight times two times 128 bits, which is a staggering
658 2048 wires *just for the data*. There do exist ways to get that down
659 (potentially to half), and there do exist ways to get that cut in half
660 again; however, doing so would miss opportunities for merging of requests
661 into cache lines.
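
The misaligned case and the wire count can be sanity-checked with a few lines of Python. This is only an illustrative sketch: `split_request` and `data_wires` are hypothetical names, and the default of eight units is taken from the upper end of the "six to eight" estimate above.

```python
LINE_BYTES = 16
LINE_MASK = (1 << LINE_BYTES) - 1  # sixteen byte-enable bits


def split_request(addr: int, width: int):
    """Split one LD/ST into two per-cache-line (line_number, byte_mask) pairs.

    The second mask is zero when the access fits inside a single line.
    """
    offset = addr & (LINE_BYTES - 1)
    wide = ((1 << width) - 1) << offset    # may spill past bit 15
    line = addr >> 4
    first = (line, wide & LINE_MASK)       # bytes in the addressed line
    second = (line + 1, wide >> LINE_BYTES)  # bytes spilling into the next line
    return first, second


def data_wires(units: int = 8, requests_per_unit: int = 2,
               line_bytes: int = LINE_BYTES) -> int:
    """Number of data wires if every unit drives two full-line requests."""
    return units * requests_per_unit * line_bytes * 8


if __name__ == "__main__":
    # A 4-byte access starting 2 bytes before the end of a line
    print(split_request(0x100E, 4))
    print(data_wires())
```

A 4-byte access at `0x100E` touches the last two bytes of one line and the first two of the next, which is why each unit needs two request ports, and eight units times two requests times 128 bits gives the 2048 figure.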
662
663 At that point, thanks to Mitch Alsup's input (Mitch is the designer of
664 the Motorola 68000, Motorola 88120, key architecture on AMD's Opteron
665 Series, the AMD K9, AMDGPU and Samsung's latest GPU), we learned that
666 L1 cache design critically depends on what type of SRAM you have. We
667 initially, naively, wanted dual-ported L1 SRAM and that's when Staf
668 and Mitch taught us that this results in half-duty rate. Only
669 1-Read **or** 1-Write SRAM Cells give you fast enough (single-cycle)
670 data rates to be useable for L1 Caches.
671
672 Part of the conversation wandered into
673 [why we chose dynamic pipelines](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005459.html)
674 as well as bringing us that
675 [important advice](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/005354.html)
676 from both Mitch Alsup and Staf Verhaegen.
677
678 (Staf is also [sponsored by NLNet](https://nlnet.nl/project/Chips4Makers/)
679 to create Libre-licensed Cell Libraries, busting through one of the -
680 many - layers of NDAs and reducing NREs and unnecessary and artificial
681 barriers for ASIC development: I helped him put in the submission, and
682 he was really happy to do the Cell Libraries that we will be using for
683 LibreSOC's 180nm test tape-out in October 2020.)
684
685 # Public-Inbox and Domain Migration
686
687 As mentioned before, one of the important aspects of this project is
688 the documentation and archiving. It also turns out that when working
689 over an extremely unreliable or ultra-expensive mobile broadband link,
690 having *local* (offline) access to every available development resource
691 is critically important.
692
693 This is why we are going to the trouble of installing public-inbox. Not
694 only does it store a mailing list entirely in a
695 git repository, the "web service" which provides access to that git-backed
696 archive can be not only mirrored elsewhere, it can be *run locally on
697 your own local machine* even when offline. This, in combination
698 with the right mailer setup, can store-and-forward any replies to the
699 (offline-copied) messages, such that they can be sent when internet
700 connectivity is restored, letting you remain a productive collaborative developer.
701
702 Now you know why we absolutely do not accept "slack", or other proprietary
703 "online oh-so-convenient" services. Not only is it highly inappropriate for
704 Libre Projects, not only do we become critically dependent on the Corporation
705 running the service (yes, github has been entirely offline, several times):
706 if we have remote developers (such as myself, working from Scotland last
707 month with sporadic access to a single Cell Tower) or developers in emerging
708 markets where their only internet access is via a Library or Internet Cafe,
709 we absolutely do not want to exclude or penalise such people, just because
710 they have fewer resources.
711
712 Fascinatingly, Linus Torvalds is *specifically*
713 [on record](https://www.linuxjournal.com/content/line-length-limits)
714 about making sure that "Linux development does not favour wealthy people".
715
716 We are also, as mentioned before, moving to a new domain name. We'll take
717 the opportunity to fix some of the issues with HTTPS (wrong certificate),
718 and also do some
719 [better mailing list names](http://bugs.libre-riscv.org/show_bug.cgi?id=184)
720 at the same time.
721
722 TODO (Veera?) bit about what was actually done, how it links into mailman2.
723
724 # OpenPOWER HDL Mailing List opens up
725
726 It is early days, however it is fantastic to see responses from IBM with
727 regard to requests for access to the POWER ISA Specification
728 documents in
729 [machine-readable form](http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-March/000007.html).
730 I took Jeff at his word and explained, in some detail,
731 [exactly why](http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-March/000008.html)
732 machine readable versions of specifications are critically important.
733
734 The takeaway is: *we haven't got time to do manual transliteration of the spec*
735 into "code". We're expending considerable effort making sure that we
736 "bounce" or "bootstrap" off of pre-existing resources, using computer
737 programs to do so.
738
739 This "trick" is something that I learned over 20 years ago, when developing
740 an SMB Client and Server in something like two weeks flat. I wrote a
741 parser which read the packet formats *from the IETF Draft Specification*,
742 and output C code.
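
To give a flavour of the technique (this is not the original SMB parser, which is not shown here: the table layout in `SPEC_TEXT`, the field names, and the helper names below are all invented for illustration), a spec-to-code generator can be as small as this:

```python
import re

# Hypothetical excerpt in the style of a plain-text specification table.
# The layout and field names are assumptions, not taken from any real draft.
SPEC_TEXT = """
   Field            Size (bytes)
   command          1
   flags            2
   tree_id          2
   length           4
"""

# One field per line: a lowercase identifier followed by a byte count
FIELD_RE = re.compile(r"^\s*([a-z_][a-z0-9_]*)\s+(\d+)\s*$")
C_TYPES = {1: "uint8_t", 2: "uint16_t", 4: "uint32_t", 8: "uint64_t"}


def parse_fields(text):
    """Return [(name, size_in_bytes), ...] for every line that looks like a field."""
    fields = []
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:  # header and blank lines simply do not match
            fields.append((match.group(1), int(match.group(2))))
    return fields


def emit_c_struct(name, fields):
    """Generate C source for a struct holding the parsed fields."""
    lines = ["#include <stdint.h>", "", f"struct {name} {{"]
    for field_name, size in fields:
        lines.append(f"    {C_TYPES[size]} {field_name};")
    lines.append("};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(emit_c_struct("packet_header", parse_fields(SPEC_TEXT)))
```

A real generator would also have to deal with struct packing, byte order, and variable-length fields. The point is only that the specification text itself is the input, so nothing is transcribed by hand.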
743
744 This leaves me wondering, as I mention on the HDL list, if we can do the same
745 thing with large sections of the POWER Spec.
746
747 # Build Servers
748
749 TODO
750
751 # Conclusion
752
753 I'm not going to mention anything about the current world climate: you've
754 seen enough news reports. I will say (more about this through the
755 [EOMA68](https://www.crowdsupply.com/eoma68/micro-desktop) updates) that
756 I anticipated something like what is happening right now, over ten years
757 ago. I wasn't precisely expecting what *has* happened, just the consequences:
758 world-wide travel shut-down, and for people - the world over - to return to
759 local community roots.
760
761 However what I definitely wasn't expecting was a United States President
762 to be voted in who was eager and, frankly, stupid enough, to start *and
763 escalate* a Trade war with China. The impact on the U.S. economy alone, and the
764 reputation of the whole country, has been detrimental in the extreme.
765
766 This combination leaves us - world-wide - with a strong possibility, one that
767 seemed so "preposterous" that I could in no way discuss it widely, let alone
768 mention it on something like a Crowdsupply update: that, thanks to the
769 business model on which their entire product lifecycle is predicated,
770 in combination with the extremely high NREs and development costs for
771 ASICs (custom silicon costs USD $100 million, these days), several
772 large Corporations producing proprietary binary-only drivers for
773 hardware on which we critically rely for our internet-connected way
774 of life **may soon go out of business**.
775
776 Right at a critical time when video conferencing is taking off massively,
777 for your proprietary hardware - your smartphone, your tablet, your laptop,
778 everything you rely on for connectivity to the rest of the world - all of
779 a sudden **you may not be able to get software updates** or, worse,
780 your products could even be
781 [remotely shut down](https://www.theguardian.com/technology/2016/apr/05/revolv-devices-bricked-google-nest-smart-home)
782 **without warning**.
783
784 I do not want to hammer the point home too strongly but you should be
785 getting, in no uncertain terms, exactly how strategically critical, in
786 the current world climate, this project just became. We need to get it
787 accelerated, completed, and into production, in an expedited and responsible
788 fashion.
789