Many thanks to Michael Larabel, who has been writing early articles
on this project since before we had a chance to get this pre-launch stage
up and running. He picked up on the first update
[here](http://www.phoronix.com/scan.php?page=news_item&px=Quad-Core-Libre-SoC-Proposal).
A further article covered a lot more of the
[technical details](https://www.phoronix.com/scan.php?page=news_item&px=Libre-GPU-RISC-V-Vulkan),
and another covered an announcement of
[Kazan](https://www.phoronix.com/scan.php?page=news_item&px=Kazan-Vulkan-Rust),
which implements the Vulkan 3D API.

There has been quite a lot going on, including an enormous amount of
planning over the past six to eight months, so there are quite
a few catch-up updates to write. It's worthwhile doing one that incorporates
responses to Michael and to some of the people who also kindly asked
questions and made comments on the
[Phoronix Forum](https://www.phoronix.com/forums/node/1064199).

I have no illusions about the cost of development of this project: it's
going to be somewhere north of USD $6 million, with contingency of up
to USD $10 million. This is just how it is. Interestingly, what that
means is that there's provision both for investment and for attracting
really, really good talent, and for properly paying for it.
Where the project has started from is what can be achieved with the
current resources. I've been kindly sponsored with a ZC706 FPGA board
(worth over USD $2,500), which will allow one major hurdle to be cleared
that will meet the criteria of many investors: making sure that the
design is FPGA proven.

Secondly, Michael, I note some incredulity at the goal of meeting the
target of mobile-class 3D performance. It's actually extremely modest:
100 million pixels per second, 30 million triangles per second, and around
5 to 6 GFLOPs. These statistics were taken from the benchmarks for
Vivante's GC800. Achieving these kinds of numbers is dead easy. Achieving
them within a power envelope of under 2.5 watts? Not so easy!
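
To put the power envelope in perspective, here is a rough back-of-envelope
sketch in Python. It assumes, purely for illustration, that the entire
2.5 watt budget is charged to one metric at a time; in practice the budget
is shared between everything running on the chip, so the real per-item
budget is tighter. The variable and function names are made up for this
example.

```python
# Back-of-envelope energy budget for the targets quoted above.
# ASSUMPTION: the whole 2.5 W envelope is attributed to a single metric
# at a time, which overstates the energy actually available per item.

POWER_WATTS = 2.5            # upper bound of the power envelope
PIXELS_PER_SEC = 100e6       # 100 million pixels per second
TRIANGLES_PER_SEC = 30e6     # 30 million triangles per second
GFLOPS_RANGE = (5.0, 6.0)    # "around 5 to 6 GFLOPs"


def nanojoules_per_item(power_watts, items_per_second):
    """Energy available per item, in nanojoules, at a given power draw."""
    return power_watts / items_per_second * 1e9


nj_per_pixel = nanojoules_per_item(POWER_WATTS, PIXELS_PER_SEC)
nj_per_triangle = nanojoules_per_item(POWER_WATTS, TRIANGLES_PER_SEC)
gflops_per_watt = tuple(g / POWER_WATTS for g in GFLOPS_RANGE)

print(f"{nj_per_pixel:.1f} nJ per pixel")          # 25.0 nJ per pixel
print(f"{nj_per_triangle:.1f} nJ per triangle")    # 83.3 nJ per triangle
print(f"{gflops_per_watt[0]:.1f} to {gflops_per_watt[1]:.1f} GFLOPs per watt")
```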

So what I did here was spend a considerable amount of time speaking to
Jeff Bush, who developed
[Nyuzi](https://www.phoronix.com/scan.php?page=news_item&px=LGPL-GPGPU-NyuziProcessor).
Jeff's work is fascinating and extremely valuable because, despite its
3D performance being so low, the technical documentation and academic
analysis of *why* that performance is so low is absolutely, absolutely critical.
The [paper](http://www.cs.binghamton.edu/~millerti/nyuziraster.pdf) that
Jeff co-authored makes a comparison of software and hardware rasterisation,
and he actually developed a fixed-function hardware renderer called
[ChiselGPU](https://github.com/jbush001/ChiselGPU) in order to do
comparisons.

One fascinating insight that came out of Jeff's work was that just getting
data through the L1/L2 cache has a massive impact on power consumption.
A way to deal with that is to increase the number of registers in the
design until such time as the data being processed (a tile for example,
or an inter-dependent 4-wide bank of 4x3 floating-point numbers) can
all fit into the register file, so as not to need to be pushed back down
to the L1 cache and back. Some GPUs have a "scratch RAM" area to deal
with this. Staggeringly, even for (or, especially for) a mobile-class
GPU, we had to increase the register file size to a whopping 128 64-bit
entries, which can be broken down into **256** 32-bit single-precision
floating-point entries! What are we *doing*?! This is supposed to be
a modest design!
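
The register-file arithmetic can be checked with a few lines of Python.
This is only a sketch: it reads "a 4-wide bank of 4x3 floating-point
numbers" as 4 x (4 x 3) = 48 single-precision values, which is an
assumption made for this example, and the names are illustrative.

```python
# Register file sizing, using the figures quoted above.
REG_COUNT = 128          # number of architectural registers
BITS_PER_REG = 64        # width of each register
FP32_BITS = 32           # single-precision float width

total_bits = REG_COUNT * BITS_PER_REG
total_bytes = total_bits // 8
fp32_slots = total_bits // FP32_BITS

# ASSUMPTION: a "4-wide bank of 4x3" means four 4x3 blocks of FP32 values.
bank_fp32 = 4 * (4 * 3)
banks_resident = fp32_slots // bank_fp32   # whole banks that fit at once

print(total_bytes)      # 1024 bytes of register file
print(fp32_slots)       # 256 single-precision entries
print(bank_fp32)        # 48 values per bank under the assumption above
print(banks_resident)   # 5 such banks fit without touching the L1 cache
```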

A little digging around the Internet reveals that even mobile-class GPUs
genuinely have this number of registers. More than that, though, it
turns out that we may have a hidden advantage through implementing
Kazan as a Vulkan Driver. In this
[discussion](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-October/000065.html),
which follows on from a proposal on the LLVM mailing list about making
Matrices a first-class type, Jacob informed me that the Vulkan API also
passes in large batches of data that contain Matrices, and also arrays of
data structures that need to be processed together. The problem here
is that you want the elements of the arrays (or the Matrix) to be
processed as if they were linear, preferably without having to move
them around. Matrix multiplication for example typically requires the
2nd matrix to be transposed (X swapped with Y) in order to access the
elements in a linear fashion. What we decided to do instead was to
add [1D/2D/3D data shaping](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-October/000087.html).
The elements *stay in-place* in the
original registers; however, a "remapper" engine makes them appear,
as far as the parallel (SIMD/Vector) engine is concerned, as if the
registers are contiguous. Register numbers 0 1 2 3 / 4 5 6 7 / 8 9 10 11
get "remapped" to 0 4 8 / 1 5 9 / 2 6 10 / 3 7 11 without the need
for "MV" instructions. Thanks to Mitch Alsup, the designer of the
66000 ISA, we learned that this re-invented wheel has
[also been implemented](https://groups.google.com/d/msg/comp.arch/bGBeaNjAKvc/_vbqyxTUAQAJ)
in production GPUs and Vectorisation Systems.
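
The remapping pattern quoted above can be reproduced with a small Python
sketch. This is a software model of the index arithmetic only, not the
hardware design: `remap_order` and `RemappedView` are names invented for
this example, and the register file is stood in for by a plain list.

```python
def remap_order(rows, cols):
    """Register numbers in the order a column-first walk would visit them.

    The block is stored row by row (register r*cols + c holds element
    [r][c]); the returned order reads it column by column instead.
    """
    return [r * cols + c for c in range(cols) for r in range(rows)]


class RemappedView:
    """Presents registers in remapped order without copying or moving them."""

    def __init__(self, registers, order):
        self.registers = registers   # a reference to the original storage
        self.order = order

    def __len__(self):
        return len(self.order)

    def __getitem__(self, position):
        return self.registers[self.order[position]]


registers = [10 * n for n in range(12)]   # stand-in values for registers 0..11
order = remap_order(rows=3, cols=4)
view = RemappedView(registers, order)

print(order)
# [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
print([view[i] for i in range(len(view))])
# [0, 40, 80, 10, 50, 90, 20, 60, 100, 30, 70, 110]
```

The consumer iterates over `view` as if it were contiguous, while
`registers` is never modified, which is the software analogue of avoiding
the "MV" instructions.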

The point is that by picking Vulkan, and implementing both the hardware
design and the software at the same time, we are both constrained
*and guided* towards a successful design. In addition, as we know from
previous successful (truly) open projects, the very fact that you are
at liberty to talk about what you're doing (as compared to a secretive
proprietary company) means that people with specialist expertise are
more than happy to come forward and comment, and help guide you away
from areas that have caught out billion-dollar companies in the past.

The project thus becomes a *synthesis of the expertise and efforts
of much more than just the people who are implementing it*.

Just having the opportunity to do that is extremely humbling. Mitch Alsup,
the designer of the famous Motorola 68000 series of processors,
is giving us some feedback and input! Like... wow! For example,
he made an extremely valuable recommendation
[here](https://libre-riscv.org/3d_gpu/microarchitecture/) on how
to save on register file space, only needing 1R1W (1x read-port,
1x write-port) SRAM, by stretching out the pipeline phases to
load operands sequentially rather than in parallel. I cannot express
how grateful I am for his input, and for all the other people who
have helped.
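
As a very rough illustration of the 1R1W idea, the toy Python model below
enforces a single read per clock cycle and fetches the operands of a
three-source operation over three consecutive cycles. It models only the
read-port constraint; the class and function names are hypothetical, and
it makes no claim to reflect the actual microarchitecture that was
recommended.

```python
class OneReadPortRegFile:
    """Toy register file allowing at most one read per clock cycle."""

    def __init__(self, values):
        self.values = list(values)
        self.last_read_cycle = None

    def read(self, cycle, index):
        # A second read in the same cycle would need a second read port.
        if cycle == self.last_read_cycle:
            raise RuntimeError(f"read port already used in cycle {cycle}")
        self.last_read_cycle = cycle
        return self.values[index]


def fetch_operands_sequentially(regfile, indices, start_cycle=0):
    """Read one operand per cycle; return (operands, first free cycle)."""
    operands = []
    cycle = start_cycle
    for index in indices:
        operands.append(regfile.read(cycle, index))
        cycle += 1
    return operands, cycle


regfile = OneReadPortRegFile([1.0, 2.0, 3.0, 4.0])
(a, b, c), ready_cycle = fetch_operands_sequentially(regfile, [0, 1, 2])
result = a * b + c   # a multiply-add style operation with three sources

print(ready_cycle)   # 3: the three reads occupied cycles 0, 1 and 2
print(result)        # 5.0
```

The trade-off shown is latency for area: the operands arrive over three
cycles instead of one, but the storage needs only a single read port.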

So I believe we're in good hands here. Ultimately, what is being
presented here is more of an opportunity for anyone who has wanted
something like this to succeed (or even exist), empowering them to
go from "I wish someone would do this" to "I can help make it happen".
This is one of the reasons why, if I am honest, I get slightly aggravated
by people who write, "oh this project could not possibly succeed"
or "this person could not possibly achieve this goal", as such comments
entirely miss the point.

As this update is quite long, I'll answer more on the Phoronix
[comments](https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/1064199-the-eoma68-libre-computer-developer-wants-to-tackle-a-quad-core-risc-v-libre-soc-design).
However, I particularly wanted to address one comment here:
*"This religious war target seems to be way off based on skill-set."*
I believe that one aspect of that comment has
been addressed above: as it's a libre project, unlike a proprietary
company we're at liberty to get out on the Internet and ask people's
advice before even committing to a design.

The second aspect is the very silly "Religious War" implication. It's
absolutely nothing of the kind. What many people do not know about me
is that I pick projects that nobody else is doing, very very deliberately.
I pick projects that make an ethical difference: projects that, if
successful, would make many people's lives much better, less painful.
It has absolutely
sod-all to do with "Religious frothing fervour" (foam, foam).

Many companies choose to make ethical compromises in order to make a profit.
People are finally beginning to wake up to the consequences of this kind
of concentration of financial and informational power. In India,
people have been *murdered* based on WhatsApp viral hearsay. In the USA,
democratic elections have been interfered with (Cambridge Analytica).
I could very easily go to any of these massively-unethical Corporations,
and make an absolute fortune in the process of empowering them to do a
hundred more Cambridge Analyticas. *I choose not to do so*.

So there are plenty of companies that make decisions without
a moral compass, because, financially, it is easier to do so. And, more
poignantly, it is legally permissible and *actively encouraged* by
legal frameworks, tax incentives and Government-sanctioned monopolies
known by the name "patents".

What I am doing here is demonstrating that none of that is necessary:
that it is possible to design - and get funding for - a desirable
product that happens also to be ethical. This is why the goal is
as it is: a mobile-class processor, because that's the kind of product
that could sell in large volumes at around the USD $4 mark.
You won't see any Corporation taking on such a goal, as they're required
to prioritise profits over ethics. So it's down to you,
if you want this project to succeed, to help make it happen.