Many thanks to Michael Larabel, who has been writing articles about
this project since before we had a chance to set up this pre-launch
page. What follows are some of my observations on, and responses to,
those articles.

#### List of Phoronix Articles

For the last few months, Michael has been covering various aspects of
this project. The first article covered a lot of the technical
details, the second covered the announcement of Kazan, which
implements the Vulkan 3D API, and the most recent picks up on
[the first project
update](why-make-a-quad-core-64-bit-soc-surely-there-are-enough-already):

- [There's A New Libre GPU Effort Building On RISC-V, Rust, LLVM & Vulkan](https://www.phoronix.com/scan.php?page=news_item&px=Libre-GPU-RISC-V-Vulkan) (posted 28 September 2018)
- [The Kazan Vulkan CPU/Software-Based Implementation Being Rewritten In Rust](https://www.phoronix.com/scan.php?page=news_item&px=Kazan-Vulkan-Rust) (posted 04 October 2018)
- [The EOMA68 Libre Computer Developer Wants To Tackle A Quad-Core RISC-V Libre SoC Design](http://www.phoronix.com/scan.php?page=news_item&px=Quad-Core-Libre-SoC-Proposal) (posted 29 November 2018)

There has been quite a lot going on, including an enormous amount of
planning over six to eight months, so there are quite
a few catch-up updates to write. It's worthwhile doing one that incorporates
responses to Michael and to some of the people who also kindly asked
questions and made comments on the
[Phoronix Forum](https://www.phoronix.com/forums/node/1064199).

#### Comments and Responses

I have no illusions about the cost of development of this project:
it's going to be somewhere north of USD $6 million, with contingency
of up to USD $10 million. This is just how it is. Interestingly,
that means there's provision both for attracting investment and
really, really good talent, and for paying that talent properly. The
project's origins reflect what can be achieved with the current
resources. I've been kindly sponsored with a ZC706 FPGA board (worth
over USD $2,500), which will allow us to clear one major hurdle that
many investors set as a criterion: making sure the design is
FPGA-proven.

Secondly, I note Michael is a bit sceptical about the goal of achieving
mobile-class 3D performance. The target is actually extremely modest: 100
million pixels per second, 30 million triangles per second, and around
5 to 6 GFLOPs. These statistics were taken from the benchmarks for
Vivante's GC800. Achieving these kinds of numbers is dead easy.
Achieving them within a power envelope of under 2.5 watts? Not so
easy!

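To put those targets in perspective, here is a rough back-of-the-envelope
sketch in Python. The 1280x720 display size is purely an assumption for
illustration, and 2.5 watts is treated as the whole budget even though the
text only gives it as an upper bound. None of these derived figures come from
the GC800 benchmarks themselves.

```python
# Targets quoted in the text.
PIXELS_PER_SECOND = 100e6
GFLOPS_RANGE = (5.0, 6.0)
POWER_WATTS = 2.5  # upper bound from the text, treated here as the budget

# Assumption for illustration only: a 1280x720 display.
WIDTH, HEIGHT = 1280, 720

# How many times per second the whole screen could be filled.
fills_per_second = PIXELS_PER_SECOND / (WIDTH * HEIGHT)

# Energy available per pixel if the entire budget went on pixel fill.
nanojoules_per_pixel = POWER_WATTS / PIXELS_PER_SECOND * 1e9

# Compute efficiency the design would need to reach.
gflops_per_watt = tuple(g / POWER_WATTS for g in GFLOPS_RANGE)

print(f"{fills_per_second:.1f} full-screen fills per second at {WIDTH}x{HEIGHT}")
print(f"{nanojoules_per_pixel:.0f} nJ per pixel at {POWER_WATTS} W")
print(f"{gflops_per_watt[0]:.1f} to {gflops_per_watt[1]:.1f} GFLOPs per watt")
```
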
To that end, I spent a considerable amount of time speaking to Jeff
Bush, who developed
[Nyuzi](https://www.phoronix.com/scan.php?page=news_item&px=LGPL-GPGPU-NyuziProcessor).
Jeff's work is fascinating and extremely valuable because, although
Nyuzi's 3D performance is low, the technical documentation and academic
analysis of *why* that performance is so low is absolutely
critical. The
[paper](http://www.cs.binghamton.edu/~millerti/nyuziraster.pdf) that
Jeff co-authored compares software and hardware
rasterisation, and he actually developed a fixed-function hardware
renderer called [ChiselGPU](https://github.com/jbush001/ChiselGPU) in
order to do the comparisons.

One fascinating insight that came out of Jeff's work was that just getting
data through the L1/L2 cache has a massive impact on power consumption.
A way to deal with that is to increase the number of registers in the
design until the data being processed (for example, a tile,
or an inter-dependent four-wide bank of 4x3 floating-point numbers) can
all fit into the register file, so that it does not need to be pushed down
to the L1 cache and back. Some GPUs have a "scratch RAM" area to deal
with this. Staggeringly, even for (or, especially for) a mobile-class
GPU, we had to increase the register file size to a whopping 128 64-bit
entries, which can be broken down into **256** 32-bit single-precision
floating-point entries! What are we *doing*?! This is supposed to be
a modest design!

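The register file arithmetic above can be checked with a few lines of
Python. Reading "a four-wide bank of 4x3 floating-point numbers" as four
4x3 matrices of 32-bit values is an assumption made only for this sketch.

```python
REGISTERS = 128         # 64-bit entries, from the text
BITS_PER_REGISTER = 64

total_bits = REGISTERS * BITS_PER_REGISTER
total_bytes = total_bits // 8
fp32_slots = total_bits // 32   # each 64-bit entry holds two FP32 values

# Assumed reading of the example working set: four 4x3 FP32 matrices.
bank_values = 4 * 4 * 3

print(f"register file: {total_bytes} bytes, {fp32_slots} FP32 slots")
print(f"working set:   {bank_values} FP32 values, fits: {bank_values <= fp32_slots}")
```
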
A little digging around the Internet reveals that even mobile-class
GPUs genuinely have this number of registers. More than that, though,
it turns out that we may have a hidden advantage through implementing
Kazan as a Vulkan driver. In [this mailing list
discussion](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-October/000065.html),
which follows on from a proposal on the LLVM mailing list about making
matrices a first-class type, Jacob informed me that the Vulkan API
also passes in large batches of data that contain matrices, and also
arrays of data structures that need to be processed together. The
problem here is that you want the elements of the arrays (or the
matrix) to be processed as if they were linear, preferably without
having to move them around. Matrix multiplication, for example,
typically requires the second matrix to be transposed (X swapped with
Y) in order to access the elements in a linear fashion. What we
decided to do instead was to add [1D/2D/3D data
shaping](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-October/000087.html).
The elements *stay in place* in the original registers; however, a
"remapper" engine makes them appear, as far as the parallel
(SIMD/Vector) engine is concerned, as if the registers were contiguous.
In the case of transposing, register numbers 0 1 2 3 / 4 5 6 7 / 8 9
10 11 get "remapped" to 0 4 8 / 1 5 9 / 2 6 10 / 3 7 11 without the
need for "MV" instructions. Thanks to Mitch Alsup, the designer of
the 66000 ISA, we learned that this re-invented wheel has [also been
implemented](https://groups.google.com/d/msg/comp.arch/bGBeaNjAKvc/_vbqyxTUAQAJ)
in production GPUs and vectorisation systems.

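The transposing remap described above comes down to simple index
arithmetic, which this minimal Python sketch illustrates. The function names
`transpose_remap` and `read_remapped` are hypothetical and chosen just for
this example. The sketch shows only the order in which registers would be
visited, not how the hardware remapper is actually built.

```python
def transpose_remap(rows, cols):
    """Offsets of a rows x cols block, visited column by column."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


def read_remapped(regs, base, remap):
    """Read registers through the remap table; nothing is moved or copied back."""
    return [regs[base + offset] for offset in remap]


# Registers 0..11 hold a 3x4 block laid out as 0 1 2 3 / 4 5 6 7 / 8 9 10 11.
# Each register holds its own number here so the visiting order is easy to see.
regs = list(range(12))
remap = transpose_remap(rows=3, cols=4)

print(remap)                                  # visiting order: 0 4 8 / 1 5 9 / ...
print(read_remapped(regs, base=0, remap=remap))
print(regs == list(range(12)))                # the register contents are untouched
```
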
The point is that, by picking Vulkan and implementing both the hardware
design and the software at the same time, we are both constrained
*and guided* towards a successful design. In addition, as we know from
previous successful (truly) open projects, the very fact that you are
at liberty to talk about what you're doing (unlike a secretive
proprietary company) means that people with specialist expertise are
more than happy to come forward and comment, and help guide you away
from areas that have caught out billion-dollar companies in the past.

The project thus becomes a *synthesis of the expertise and efforts
of many more people than just those who are implementing it*.

Just having the opportunity to do that is extremely humbling. Mitch
Alsup, the designer of the famous Motorola 68000 series of processors,
is giving us some feedback and input! Like... wow! For example, [he
made an extremely valuable
recommendation](https://libre-riscv.org/3d_gpu/microarchitecture/) on
how to save on register file space: by stretching out the pipeline phases
to load operands sequentially rather than in parallel, only 1R1W (one
read port, one write port) SRAM is needed. I cannot express how
grateful I am for his input, and to all the other people who have
helped.

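As a rough illustration of the 1R1W idea, the toy Python model below
allows at most one register read and one register write per cycle, so
a three-operand instruction spends three cycles fetching its operands. The
class `RegFile1R1W` and the function `fused_multiply_add` are hypothetical
names invented for this sketch. It is not the project's microarchitecture,
and it ignores the overlap between instructions that a real pipeline would
exploit.

```python
class RegFile1R1W:
    """Toy register file: at most one read and one write per cycle."""

    def __init__(self, size):
        self.regs = [0] * size
        self.cycle = 0

    def step(self, read=None, write=None):
        """Advance one cycle, optionally using the read port and the write port."""
        value = self.regs[read] if read is not None else None
        if write is not None:
            index, data = write
            self.regs[index] = data
        self.cycle += 1
        return value


def fused_multiply_add(rf, dest, a, b, c):
    """regs[dest] = regs[a] * regs[b] + regs[c], one operand read per cycle."""
    x, y, z = (rf.step(read=src) for src in (a, b, c))  # three read cycles
    rf.step(write=(dest, x * y + z))                    # one write cycle


rf = RegFile1R1W(8)
rf.regs[1:4] = [2, 3, 4]
fused_multiply_add(rf, dest=0, a=1, b=2, c=3)
print(rf.regs[0], rf.cycle)
```
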
So, given this community, I believe we're in good hands. Ultimately,
what is being presented here is an opportunity for anyone who
has wanted something like this to succeed (or even exist): it empowers
them to go from "I wish someone would do this" to "I can help make it
happen." This is one of the reasons why, if I am honest, I get
slightly aggravated by people who write, "oh this project could not
possibly succeed" or "this person could not possibly achieve this
goal," as such comments entirely miss the point.

As this update is quite long, I'll answer more in the [Phoronix
comments](https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/1064199-the-eoma68-libre-computer-developer-wants-to-tackle-a-quad-core-risc-v-libre-soc-design);
however, I particularly wanted to address one comment here:
*"This religious war target seems to be way off based on skill set."*
I believe that one aspect of that comment has already been
addressed above: as it's a libre project, unlike a proprietary
company we're at liberty to get out on the Internet and ask people's
advice before even committing to a design.

The second aspect is the very silly "religious war" implication. It's
absolutely nothing of the kind. What many people do not know about me
is that I very, very deliberately pick projects that nobody else is doing.
I pick projects that make an ethical difference: projects that, if successful,
would make many people's lives much better, less painful. It has absolutely
sod-all to do with "religious frothing fervour" (foam, foam).

Many companies choose to make ethical compromises in order to make a profit.
People are finally beginning to wake up to the consequences of this kind
of concentration of financial and informational power. In India,
people have been *murdered* based on viral WhatsApp hearsay. In the USA,
democratic elections have been interfered with (e.g., Cambridge Analytica).
I could very easily go to any of these massively unethical corporations,
and make an absolute fortune in the process of empowering them to do a
hundred more Cambridge Analyticas. *I choose not to do so.*

There are plenty of companies that make decisions without
a moral compass, because, financially, it is easier to do so. And, more
poignantly, it is legally permissible and *actively encouraged* by
legal frameworks, tax incentives, and Government-sanctioned monopolies
known by the name "patents."

What I am doing here is demonstrating that none of that is necessary:
that it is possible to design - and get funding for - a desirable
product that happens also to be ethical. This is why the goal is
as it is: a mobile-class processor, because that's the kind of product
that could sell in large volumes at around the USD $4 mark.
You won't see any corporation taking on such a goal, as they're required
to prioritise profits over ethics. So, it's down to you,
if you want this project to succeed, to help make it happen.