Intriguing Ideas

Pixilica starts a 3D Open Graphics Alliance initiative;
We decide to go with a "reconfigurable" pipeline;

# The possibility of a 3D Open Graphics Alliance

{https://youtu.be/HeVz-z4D8os}

At SIGGRAPH 2019 there was a very interesting BoF, where the
[idea was put forward](https://www.pixilica.com/forum/event/risc-v-graphical-isa-at-siggraph-2019/p-1/dl-5d62b6282dc27100170a4a05)
by Atif, of Pixilica, to use RISC-V as the core
basis of a 3D Embedded flexible GPGPU (hybrid / general purpose GPU).
Whilst the idea of a GPGPU has been floated before (in particular by
ICubeCorp), the reasons *why* were what particularly caught people's
attention at the BoF.

The current 3D GPU designs (NVIDIA, AMD, Intel) are heavily optimised
for mass-volume appeal. Niche markets, by virtue of the profit
opportunities being lower or even negative given the design choices of
the incumbents, are inherently penalised. Not only that, but the source
code of the 3D engines is proprietary, meaning that anything outside of
what is dictated by the incumbents is out of the question.

At the BoF, one attendee described how they are implementing *transparent*
shader algorithms. Most shader hardware provides triangle algorithms that
assume a solid surface. Using such hardware for transparent shaders is a
two-pass process, which clearly comes with an inherent *100%* performance
penalty. If on the other hand they had some input into a new 3D core,
one that was designed to be flexible...

The level of interest was sufficiently high that Atif is reaching out to
people (including our team) to set up a 3D Open Graphics Alliance. The
basic idea is to have people work together to create an appropriate,
efficient "Hybrid CPU/GPU" Instruction Set (ISA) suitable for a diverse
range of architectures and requirements: all the way from small embedded
softcores, to embedded GPUs for use in mobile processors, to HPC servers,
to high-end Machine Learning and Robotics applications.

One interesting thing that has to be made clear (the lesson from
Nyuzi and Larrabee) is that a good Vector Processor does **not**
automatically make a good 3D GPU. Jeff Bush designed Nyuzi very
specifically to replicate the Larrabee team's work: in particular, their
use of a recursive software-based tiling algorithm. Because it deliberately
does not include custom 3D Hardware Accelerated Opcodes, Nyuzi has only
25% of the performance of a modern GPU consuming the same amount of power.
Put another way: if you want to use a pure Vector Engine to get the same
performance as a commercially-competitive GPU, you need *four times*
the power consumption and four times the silicon area.

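The "four times" figure follows directly from the 25% figure. A minimal sketch of that arithmetic, assuming performance scales linearly with power and silicon area (a simplification; the function name is purely illustrative):

```python
def scale_factor_to_match(relative_perf_at_equal_power: float) -> float:
    """Multiplier on power (and area) needed to match the reference design,
    assuming performance scales linearly with both (an assumption)."""
    if relative_perf_at_equal_power <= 0:
        raise ValueError("relative performance must be positive")
    return 1.0 / relative_perf_at_equal_power


# Nyuzi: 25% of the performance of a modern GPU at the same power.
nyuzi_relative_perf = 0.25
print(scale_factor_to_match(nyuzi_relative_perf))
```
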
Thus we simply cannot use an off-the-shelf Vector extension such as the
upcoming RISC-V Vector Extension, or even SimpleV, and expect to
automatically have a commercially competitive 3D GPU. It takes texture
opcodes, Z-Buffers, pixel conversion, Linear Interpolation, Transcendentals
(sin, cos, exp, log), and much more, all of which has to be designed,
thought through, implemented *and then used behind a suitable API*.

In addition, given that the Alliance is to meet the needs of "unusual"
markets, it is no good creating an ISA that has such a high barrier to
entry and such a power-performance penalty that it inherently excludes
the very implementors it is targeted at, particularly in Embedded markets.

Thus we need a Hybrid Architecture, not just to reduce complexity, not
just to meet Libre criteria, but to meet the needs of the long tail in
3D and kick-start some real innovation.

These were the challenges discussed at the first
[meetup](https://www.meetup.com/Bay-Area-RISC-V-Meetup/events/264231095/)
at Western Digital's Milpitas HQ. Experts in 3D at the Meetup were really
enthusiastic and praised this approach.

# Reconfigurable Pipelines

Jacob came up with a fascinating idea: a reconfigurable pipeline. The
basic idea behind pipelines is that combinatorial blocks are separated
by latches. The reason is that when gates are chained together,
there is a ripple effect which has to have time to stabilise. If the
clock is run too fast, computations no longer have time to become valid.

So the solution is to split the combinatorial blocks into shorter chains,
and have "latches" in between them which capture the intermediate
results. This is termed a "pipeline". Actually it's more like an
escalator.

The problem comes when you want to vary the clock speed: if the pipeline
is long and the clock rate is lowered, the latency (completion time of an
instruction) becomes long as well.

Conversely, if the pipeline is short (large numbers of gates connected
together) then, as mentioned above, this inherently limits the maximum
frequency that the processor can run at.

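The trade-off described above can be put into numbers. This is only an illustrative sketch: the 8 ns of total gate delay, the stage counts, the 250 MHz comparison clock and the function names are all assumptions, not measurements from any real design, and latch overhead is ignored.

```python
def max_clock_mhz(total_logic_delay_ns: float, stages: int) -> float:
    """Fastest clock at which each stage's share of the logic can still settle."""
    if stages < 1:
        raise ValueError("a pipeline needs at least one stage")
    period_ns = total_logic_delay_ns / stages  # logic split evenly (assumption)
    return 1000.0 / period_ns


def latency_ns(stages: int, clock_mhz: float) -> float:
    """Instruction completion time: one clock period per pipeline stage."""
    return stages * 1000.0 / clock_mhz


TOTAL_LOGIC_DELAY_NS = 8.0  # hypothetical end-to-end gate delay

for stages in (8, 4):
    fmax = max_clock_mhz(TOTAL_LOGIC_DELAY_NS, stages)
    print(stages, "stages:",
          "max clock", fmax, "MHz,",
          "latency at max clock", latency_ns(stages, fmax), "ns,",
          "latency at 250 MHz", latency_ns(stages, 250.0), "ns")
```

With these assumed numbers the 8-stage pipeline tops out at 1000 MHz and the 4-stage one at 500 MHz, but at a 250 MHz clock the 8-stage version takes 32 ns per instruction against 16 ns for the 4-stage version.
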
What if there was a solution which allowed *both* options? What if you
could actually reconfigure the pipeline to be shorter or longer?

It turns out that by using what are termed "transparent latches" it
is possible to do precisely that. The advantages are enormous and were
described in detail on comp.arch.

Earlier in
[this thread](https://groups.google.com/d/msg/comp.arch/fcq-GLQqvas/SY2F9Hd8AQAJ),
someone kindly pointed out that IBM published
papers on the technique. Basically, the latches normally present in the
pipeline have a combinatorial "bypass" in the form of a Mux. The output
is dynamically selected from either the input *or* the input after it
has been put through a flip-flop. The flip-flop basically stores (and
delays) its input for one clock cycle.

By putting these transparent latches on every other combinatorial stage
in the processing chain, the length of the pipeline may be halved, such
that when the clock rate is also halved the *instruction completion time
remains the same*.

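To make the mechanism concrete, here is a small behavioural model in plain Python. It is a sketch under stated assumptions, not our actual hardware description: the `Stage` class, the four "add one" logic blocks and the 1 ns / 2 ns clock periods are all hypothetical, and the model counts clock edges only (it does not check that the chained logic would actually settle in time).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Stage:
    logic: Callable[[int], int]  # combinatorial block feeding this stage's latch
    bypass: bool = False         # True: the mux selects the un-registered value
    reg: Optional[int] = None    # flip-flop contents (None means "no data")


def clock_tick(stages: List[Stage], value_in: Optional[int]) -> Optional[int]:
    """Simulate one clock cycle; return the value seen at the pipeline output."""
    captured: List[Optional[int]] = []
    v = value_in
    for s in stages:
        if v is not None:
            v = s.logic(v)
        if s.bypass:
            captured.append(None)  # flip-flop unused: value ripples straight on
        else:
            captured.append(v)     # flip-flop samples this at the clock edge
            v = s.reg              # downstream logic sees last cycle's value
    for s, c in zip(stages, captured):  # clock edge: all flip-flops update
        s.reg = c
    return v


def edges_to_complete(stages: List[Stage], x: int) -> Tuple[int, int]:
    """Feed one value in; count the clock edges before the result appears."""
    for s in stages:
        s.reg = None
    pending: Optional[int] = x
    for edges in range(len(stages) + 1):
        out = clock_tick(stages, pending)
        pending = None
        if out is not None:
            return out, edges
    raise RuntimeError("value never emerged from the pipeline")


def make_pipeline(bypass_pattern: List[bool]) -> List[Stage]:
    # Four hypothetical stages, each of which just adds one.
    return [Stage(logic=lambda v: v + 1, bypass=b) for b in bypass_pattern]


full = make_pipeline([False, False, False, False])  # every latch active
half = make_pipeline([True, False, True, False])    # every other latch bypassed

result_full, edges_full = edges_to_complete(full, 10)
result_half, edges_half = edges_to_complete(half, 10)

PERIOD_FAST_NS = 1.0  # assumed full-speed clock period
PERIOD_SLOW_NS = 2.0  # half the clock rate

print(result_full, edges_full, edges_full * PERIOD_FAST_NS)
print(result_half, edges_half, edges_half * PERIOD_SLOW_NS)
```

Both configurations compute the same result. The full pipeline needs four clock edges and the half-length one needs two, so at half the clock rate the completion time in nanoseconds is unchanged.
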
Normally, lowering the processor speed would have an adverse
impact on instruction latency.

It's a fantastic idea that will allow us to reconfigure the processor
to reach a 1.5 GHz clock rate for high performance bursts.

# NLnet Funding Proposals

The next step is to put in half a dozen NLnet Funding proposals. No,
literally:
[seven new proposals](https://libre-riscv.org/nlnet_proposals/),
each for EUR 50,000. One for gcc, one for a port of MESA RADV to the
new processor, another for writing experimental assembly code to go into
libswscale, libx264 etc. ultimately for use in VLC and ffmpeg and so on.

Best of all, two for actually doing a test ASIC: one working with
chips4makers, the other with lip6.fr. It turns out that 180nm ASIC shuttle
services cost only USD 600 per square mm, and we can get away with around
20 sq.mm, which is about USD 12,000 for an estimated 800,000 gates.

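As a quick sanity check of those shuttle figures (the gates-per-sq.mm and cost-per-gate values are simply derived from the numbers quoted above, not quoted by any foundry or shuttle service):

```python
cost_per_sq_mm_usd = 600   # 180nm shuttle price quoted above
die_area_sq_mm = 20        # area we expect to need
estimated_gates = 800_000  # gate estimate quoted above

total_cost_usd = cost_per_sq_mm_usd * die_area_sq_mm
gates_per_sq_mm = estimated_gates // die_area_sq_mm
cost_per_gate_usd = total_cost_usd / estimated_gates

print("total cost (USD):", total_cost_usd)
print("gates per sq.mm:", gates_per_sq_mm)
print("cost per gate (USD):", cost_per_gate_usd)
```
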
At that low cost, we can iterate before going to lower geometries, plus
actually have something which, even at 350 MHz, if it was dual issue,
would be a reasonable saleable product in its own right. The only thing
we have to watch out for, there, is that it will be a bit of a monster,
so power consumption is going to be high at 350 MHz. Still, for a first
ASIC ever, it's just exciting to think that it's possible at all.

Regarding the NLnet proposals: we need people! In particular, we need two
EU Citizens to come forward, one for gcc and another for the MESA/RADV
port, to satisfy NLnet's backers' requirements. Thanks to
[NGI.eu](https://ngi.eu), NLnet has received its money under
the EU Horizon 2020 Programme, so at least one EU Citizen has to be
part of each proposal. Please do contact me for details. There's no
contract or obligation, because these are charitable donations.

In addition, if anyone wants to receive tax deductible charitable
donations direct from NLnet for working on aspects of this project,
do get in touch: there is plenty to do. Application reviews start in 2
weeks; we will hear from NLnet by December as to what has been approved,
and will be able to expand the project scope around January 2020.

Also remember, if you work for a Corporation that could financially
benefit from this project being a reality, sponsorship, via NLnet,
is tax deductible because it is a charitable donation.