\documentclass[slidestop]{beamer}
\usepackage{beamerthemesplit}
\usepackage{graphics}
\usepackage{pstricks}

\title{Commercial Libre-RISCV SoC}
\author{Luke Kenneth Casson Leighton}


\begin{document}

\frame{
 \begin{center}
 \huge{Designing a Commercial Libre RISC-V SoC}\\
 \vspace{32pt}
 \Large{Ethical Strategic Leveraging of the benefits}\\
 \Large{of Libre and Open SW/HW}\\
 \Large{for pure unadulterated Commercial gain}\\
 \vspace{24pt}
 \Large{Chennai 9th RISC-V Workshop}\\
 \vspace{16pt}
 \large{\today}
 \end{center}
}


\frame{\frametitle{Credits and Acknowledgements}

 \begin{itemize}
 \item The Designers of RISC-V
 \item The RISC-V Foundation
 \item The Shakti Group, and IIT Madras RISE Group
 \item Prof. G S Madhusudan
 \item Neel Gala
 \item Rishabh Jain
 \item Members of the RISC-V Open Groups (SW/HW/ISA)
 \item Libre and Open Software and Hardware Communities
 \item Richard Herveille (RoaLogic), Edmund Humenberger, Clifford Wolf
 (Symbiotic EDA), Rudi (Asics.ws), Enjoy-Digital.fr,
 Alex Forencich, LowRISC Team
 \item Anonymous Sponsor
 \end{itemize}
}


\frame{\frametitle{Why, How, What?}

 \begin{itemize}
 \item Why? Because these days it's just not necessary to
 make [un]ethical compromises in order to make a profitable,
 desirable mass-volume product\\
 {\it (There are enough companies doing that: where has it got us?)}
 \item How? By leveraging the long-established strategic cost and
 maintenance benefits of libre-licensed software (and
 HDL) and
 {\it making sure that the people who provide it are
 financially rewarded}. Also by empowering diverse team
 collaboration
 \item What? A 2.5GHz RISC-V 64-bit SoC that has
 a 3D Embedded GPU, 1080p Video decode, and interfaces
 to make it attractive for use in tablets, netbooks, industrial
 embedded and more. 22nm or less, under 400 pins, under USD \$4.\\
 {\it All sounds obvious... but is it practical and achievable?}
 \end{itemize}
}


\frame{\frametitle{Definitions}

 \begin{itemize}
 \item {\bf Business}: the provision of a service and being
 commensurately financially rewarded for doing so
 \item {\bf Sponging}: the provision of a service and being
 taken advantage of for doing so {\it (cf: Professor Yunus)}
 \item {\bf An ethical act}: an act that increases truth,
 love, awareness or creativity for one or more people
 (including yourself), {\it without} reducing those
 same four qualities {\it for anyone}
 \item {\bf The Four Freedoms}: the rights and guarantees
 associated with and embedded within GNU Licenses {\it (cf: FSF)}
 \end{itemize}
 {\it Is it possible to ethically do business and respect the
 Four Freedoms? That's where it gets interesting, as there are
 even cases where the Four Freedoms are unethical. Note: Google's
 former motto ``don't be evil'' is clearly (unintentionally) unethical}
}


\frame{\frametitle{Does what we want already exist? Surely this is nonsense!}
 \begin{center}
 \includegraphics[height=2.4in]{nolibresocs.jpg}\\
 {\bf Analysis of SoCs over the past 7+ years (answer: no)}
 \end{center}
}


\frame{\frametitle{Breakdown of non-existence of fully-Libre SoCs}

 \begin{itemize}
 \item {\bf iMX6}: Libre bootable, Vivante 3D GPU (libre etnaviv)
 but proprietary VPU (and a power-hungry Cortex A9)
 \item {\bf Allwinner SoCs}: mostly Libre bootable,
 VPU reverse engineered; GPU: MALI or PowerVR (i.e. proprietary)
 \item {\bf Rockchip SoCs}: good but using MALI or PowerVR.
 \item {\bf TI OMAP}: good but using PowerVR, and expensive.
 \item {\bf Samsung}: good but using MALI.
 \item {\bf Ingenic jz4775}: GREAT! But performance
 sucks (1GHz MIPS32).
 \item {\bf Broadcom SoCs}: Cartelled, and boots from the GPU.
 \end{itemize}
 {\it Basically there does not exist one single commercial SoC that
 provides full source code for all functions (CPU, GPU, VPU)
 with modern performance. Which is kinda bizarre if you think about it}
}


\frame{\frametitle{What would a good (Libre) boring, mundane SoC have?}

 \begin{itemize}
 \item Cover a lot of different scenarios (embedded, tablets, industrial,
 netbooks, crypto-currency mining).
 \item Decent performance with high efficiency. RISC-V: 40\%
 more efficient than ARM / Intel. Shakti a good
 candidate: 2.5GHz and 120mW per core @ 22nm.
 \item 1080p video: y'all gotta watch cute kittens on youtube, right?
 \item 3D GPU: y'all gotta play Angri Burds, right? (or Minecraft)
 \item No spying back-door co-processors (to steal crypto-wallets)
 \item No Spectres, no Meltdowns.
 \end{itemize}
 {\it Basically quite boring and mundane. No Monster Performance,
 no AI stuff, no special sauce. Just a plain-old SoC,
 40\% more power efficient than ARM/Intel,
 and not spying on end-users, that's all}
}


\frame{\frametitle{How on earth does an ethical Libre SoC make money???}

 \begin{itemize}
 \item Simple answer: Mask Rights.
 \item Without Mask Rights: by having a desirable
 product, and packaging it for a customer (i.e. by being a middle-man,
 a service is still being provided for which payment etc. etc.)
 \item Without a desirable product or customer(s): err... you don't.\\
 (cf: definition of Business)
 \item By not having high NREs (leveraging back-to-back deals,
 and helping others fulfil their needs and goals)
 \end{itemize}
 {\it Detachment from the goal also helps. If someone else makes this
 product then GREAT! I can go do something else}\\
 \vspace{4pt}
 {\bf Main point: please do not automatically assume Ethical and Libre is
 non-commercial. It's not nice, and it's not helping}

}

\frame{\frametitle{Things wot are ``off-limits''}

 \begin{itemize}
 \item Customer entrapment (through proprietary software).\\
 Strong business case for not entrapping customers:\\
 \url{https://tinyurl.com/most-productive-meeting-ever}
 \item Funding, endorsing, supporting or empowering unethical
 Companies, Organisations, Cartels and Individuals.\\
 (cf: definition of an ethical act).
 \item Being totally inflexible / unrealistic. Goals have
 to be met: it's no good being an idiot about that. e.g. if
 a Libre 3D GPU really can't be made, use Vivante GC800
 (with etnaviv).
 \item Spying back-door co-processors are a no-no. Sovereignty
 is critical. Russia has Baikal. China has Loongson.

 \end{itemize}
 {\it Still no real show-stoppers to making money (or product):
 it's just slightly harder, that's all. Ultimately it's about
 confidence.}
}


\frame{\frametitle{Interfaces, Block Diagram, of the Libre-RISCV SoC}
 \begin{center}
 \includegraphics[height=2.1in]{../shakti_libre_riscv.jpg}\\
 {\bf Separate Power Domains for GPIO banks, Variable voltages
 required, low-power sleep states etc. Quite involved}
 \end{center}
}


\frame{\frametitle{Hardware / Development Complexity Comparison}

 \begin{itemize}
 \item {\bf Server}: relatively easy. PCIe, RapidIO, XAUI, SATA, GbE, 10GE,
 DDR3/4 (or HMC) etc. etc. No multiplexing: all interfaces dedicated
 and high-speed differential pairs.
 \item {\bf Desktop}: really just a variant of Server.
 Graphics is a PCIe Card (except if integrated). Peripherals
 often done in dedicated external ICs (``Southbridge'' concept)
 \item {\bf Embedded}: also pretty easy. Really needs a pinmux. Low clock
 rate, low power mode. e.g. SiFive Freedom U310.
 \item {\bf Mobile}: HARD. Performance/Watt matters $\Rightarrow$ variable core
 voltage domains {\it per core}. Number of pins matters (affects
 yield and package cost). Cost
 matters. Pinmux critical.
 \end{itemize}
 {\it Bottom line: Mobile-class processors are challenging!}
}


\frame{\frametitle{Proprietary vs Libre-licensed Interface HDL}

 \begin{itemize}
 \item DDR3/4: challenging! \$1m for single-use, single instance.\\
 Symbiotic EDA: \$600k for PHY; CERN developed a Controller\\
 \url{http://libre-riscv.org/shakti/m_class/DDR/}
 \item HyperRAM (JEDEC xSPI): lower risk than DDR3/4\\
 \url{http://libre-riscv.org/shakti/m_class/HyperRAM/}
 \item RGMII: several available (saves \$50k)\\
 \url{http://libre-riscv.org/shakti/m_class/RGMII/}
 \item UART, SPI, I2C, PWM, SD/MMC: all libre (except eMMC).
 \item Shakti Group has FlexBus, QuadSPI, SRAM, many more.
 \item RGB/TTL: R. Herveille (SSD2828, SN75LVDS83b, TFP410a)
 \end{itemize}
 {\it Basically there's no compelling reason to spend vast sums
 on proprietary HDL. Sorry Cadence / Mentor / Synopsys / whoever}
}


\frame{\frametitle{Challenging Stuff [1] - Memory Interfaces}

 \begin{itemize}
 \item DDR3/4 PHYs are analog and very high speed.
 Impedance training. Extreme timing tolerances on parallel buses.\\
 No surprise proprietary cost is USD \$1m and above.
 \item Symbiotic EDA will do (Libre) PHY layout for USD \$300k,
 time to completion for chosen geometry: 8-12 months.
 \end{itemize}
 {\it Silicon-proven but still risky. What are the alternatives?}
 \vspace{4pt}
 \begin{itemize}
 \item FlexBus/SDRAM (low clock, lots of pins, single-data-rate).
 \item HyperRAM (aka JEDEC xSPI) 8-bit SPI 166MHz or DDR-300.\\
 300MB/s for only 13 wires, not bad! (We'll take several)\\
 \url{http://libre-riscv.org/shakti/m_class/HyperRAM/}
 \item HMC: insanely fast, very low power. OpenHMC (LGPL)\\
 \url{https://opencores.org/project/openhmc}
 \end{itemize}
}


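% Worked arithmetic for the HyperRAM figure quoted on the Memory Interfaces
% slide. This is a rough sketch, not a datasheet calculation.
% ASSUMPTION (not stated in the original deck): "DDR-300" is read here as an
% 8-bit data bus clocked at 150MHz with two transfers per clock cycle.
% The 13-wire count is taken from the slide as-is.
\frame{\frametitle{HyperRAM bandwidth: a rough sanity check}

 {\it A sketch under stated assumptions (8-bit bus, 150MHz clock,
 two transfers per clock), not a datasheet figure:}
 \[
 8\,\mathrm{bit} \times 2 \times 150\,\mathrm{MHz}
 = 2400\,\mathrm{Mbit/s}
 = 300\,\mathrm{MB/s}
 \]
 \[
 \frac{300\,\mathrm{MB/s}}{13\,\mathrm{wires}}
 \approx 23\,\mathrm{MB/s\ per\ wire}
 \]
 {\it If several interfaces are used in parallel and share no wires,
 bandwidth and pin count both scale linearly: for example
 four would give roughly 1200MB/s on 52 wires.}
}

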
\frame{\frametitle{Challenging Stuff [2] - Video Decode Engine}

 \begin{itemize}
 \item Richard Herveille's Video Core Blocks\\
 \url{https://opencores.org/project/video_systems}
 \item Symbiotic EDA MP4 decoder in FPGA
 \item H.264 seems to have been done...\\
 \url{https://github.com/adsc-hls/synthesizable_h264}
 \item Really needs SIMD (or better, not-SIMD)\\
 \url{http://libre-riscv.org/simple_v_extension/}
 \item Definitely needs xBitManip (parallelised by Simple-V)\\
 \url{https://github.com/cliffordwolf/xbitmanip}
 \end{itemize}
 {\it SIMD is insane. $O(N^6)$ opcode proliferation. See\\
 \url{https://www.sigarch.org/simd-instructions-considered-harmful/}\\
 (1): P-Ext designed for Audio. (2): Investigate RI5CY's SIMD
 }
}


\frame{\frametitle{Challenging Stuff [3] - Power Management}

 \begin{itemize}
 \item Been done before (many times), but not as a Libre Design.
 \item Sanjay Charagulla: GlobalFoundries 22nm mobile process
 can reach as low as 0.4V
 \item GPIO Banks need per-bank VREF (1.8V? to 3.3V)\\
 IO pads need built-in
 level-shifting to convert to CPU VCORE
 \item Each core needs independent variable-voltage capability
 and independent shut-down (PMIC supplies external voltage)
 \item DDR RAM still needs refreshing (even in sleep mode)
 \item Extra RV32 (PicoRV32?) always-on core for wake-up / RTC
 \item PLLs are Analog. fun fun fun in the sun sun sun...
 \end{itemize}
 {\it Really need help. PLLs, Analog stuff: specific
 domain expertise. Fall-back example:}
 \url{https://www.dolphin-integration.com}?

}


\frame{\frametitle{Challenging Stuff [4] - Libre 3D GPU. Sigh.}

 \begin{itemize}
 \item Actual requirements quite modest: 30MP/s, 100MT/s, 5GFLOPS
 but power/area is crucial ($2\,\mathrm{mm}^2$ @ 40nm, 1W)
 \item Nyuzi, MIAOW, GPLGPU (Number Nine), OGP.
 \item Nyuzi based on Larrabee. Jeff Bush really helpful.
 \item MIAOW is an OpenCL engine. GPLGPU is fixed-function.
 \item Nyuzi lessons: Software-only rendering not enough.
 Getting through L1 cache takes most power. Fixed functions
 such as parallel FP-Quad to ARGB Pixel, and Z-Buffer
 needed.
 \item Fallback is GC800 (\$250k) {\it contact me if you can do better!}
 \end{itemize}
 {\it Jacob Bachmeyer's Cache-control proposal turns L1 Cache into
 scratchpad RAM. RVV is just too heavy (sorry!), Simple-V much
 more light-weight and flexible ($O(1)$ ISA proliferation)
 }
}


\frame{\frametitle{Challenging Stuff [5] - Public Custom Extensions}

 \begin{itemize}
 \item GPUs are usually done with incompatible ISAs and effectively
 doing OpenGL over IPC / RPC (Remote Procedure Calls)
 \item Much simpler: GPGPU ``one ISA'' approach. Custom-extend the
 core ISA to handle 3D, use Gallium3D-LLVM.
 \item Now add Video Extensions, SIMD etc., and
 {\bf we are well beyond the only 2 available 32-bit custom opcodes}
 \item Due to the Libre nature of this project, the custom opcode
 space will be ``dominated'' by
 high-profile public hard-forks of gcc, binutils, llvm etc.
 Which isn't going to go down well.
 \item ISA ``Conflict Resolution'' is therefore absolutely critical\\
 \url{http://libre-riscv.org/isa_conflict_resolution/}
 \end{itemize}
 {\it Remember Altivec. Learn from Intel.
 \underline{This is everyone's problem.}
 }
}


\frame{\frametitle{Interesting Missing Stuff [1] - Pinmux}

 \begin{itemize}
 \item Pinmux: multiplexer of functions onto pins\\
 {\it DRAM Cell != DDR3/4, Mux Cell != Muxer}
 \item Strategically extremely important to Commercial SoC success\\
 STMicro, Rockchip, Freescale, Samsung, TI, {\bf EVERYONE}
 \item Bizarrely, a libre-licensed multi-way Pinmux doesn't exist.\\
 {\it not on anyone's radar. at all.}
 SiFive IOF not enough.
 \item Verification (scenario analysis) and auto-generation of
 TRM, header files, device-tree files, pretty much everything
 makes sense (to any ``lazy'' Software Engineer...)
 \item Corporations with legacy pinmux unlikely to be interested.
 \item \url{http://git.libre-riscv.org/?p=pinmux.git} \\
 \url{http://hands.com/~lkcl/pinmux_chennai_2018.pdf}
 \end{itemize}
}


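% Illustration of the "multiplexer of functions onto pins" idea from the
% Pinmux slide. A sketch only, to make the concept concrete.
% HYPOTHETICAL: the pin names, the number of mux settings and every function
% assignment below are invented for illustration; none of them are taken
% from the Libre-RISCV pinmux specification.
\frame{\frametitle{Pinmux: an illustrative (hypothetical) example}
 \begin{center}
 \begin{tabular}{l|llll}
 Pin & Mux 0 & Mux 1 & Mux 2 & Mux 3 \\
 \hline
 A0 & GPIO A0 & UART0 TX & SPI0 CLK & PWM0 \\
 A1 & GPIO A1 & UART0 RX & SPI0 MOSI & PWM1 \\
 A2 & GPIO A2 & I2C0 SDA & SPI0 MISO & (none) \\
 A3 & GPIO A3 & I2C0 SCL & SPI0 CS & (none) \\
 \end{tabular}
 \end{center}
 {\it Made-up pin map, for illustration only. Each pin carries one
 function at a time. In this example UART0 and SPI0 share pins A0 and A1,
 so a product needing both at once does not fit: finding such clashes
 early is the kind of check that scenario analysis is for.}
}

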
\frame{\frametitle{Interesting Missing Stuff [2] - AC97/I2S, USB2 PHY}


 \begin{itemize}
 \item Rudi (Asics.ws) donating time to create a Multi-Protocol
 Audio Controller: AC97, PCM, PDM, I2S\\
 \url{http://libre-riscv.org/shakti/m_class/AC97/}
 \item USB2 is... convoluted. UTMI-ULPI-USB2 PHY\\
 USB2-PHY not confirmed (Rudi has one)\\
 Also Rudi has DDR (8-pin) variant of ULPI\\
 \url{http://libre-riscv.org/shakti/m_class/ULPI/}
 \item USB3 not necessarily a good idea to put into Libre-RISCV\\
 Daisho USB3 Pipe exists, TUSB1310a PHY is 175 pin FBGA!
 \item Libre SD/MMC typically at ``Open'' Level, approx. 20MB/s.
 Full spec and eMMC requires membership (obtained already).
 \end{itemize}
 {\it Trying to keep interfaces all-digital (USB3 isn't,
 HP/Mic definitely isn't). Use
 external PHYs or Multi-chip Module.
 }
}


\frame{\frametitle{TODO}

 \begin{itemize}
 \item TODO\vspace{8pt}
 \end{itemize}
}


\frame{\frametitle{Summary}

 \begin{itemize}
 \item Making a commercially-desirable SoC is neither academically
 nor standard-investor sexy! No AI. Boring. zzzz
 \item Luckily there is an anonymous sponsor who needs an SoC that
 doesn't exist (who knows the commercial benefits of Libre)
 \item Shakti Group know the benefits (cost, sovereignty) of a Libre
 Mobile-Class SoC as well (No spying on Indian citizens!)
 \item A Libre GPU, even a modest performer (100MT/s etc.)
 is the biggest technical risk/unknown, besides DDR3/4.\\
 (fall-back is GC800. Do please help with a Libre GPU!)
 \item DDR3/4 and eMMC are the main high-risk interfaces\\
 (there are fall-back strategies in place)
 \item Ultimately the strategy is all about cost reduction
 vs risk mitigation,
 with Libre/Ethical prioritised over ``convenience''
 \end{itemize}
}


\frame{
 \begin{center}
 {\Huge The end\vspace{20pt}\\
 Thank you\vspace{20pt}\\
 Questions?\vspace{20pt}
 }
 \end{center}

 \begin{itemize}
 \item Contact: lkcl@lkcl.net
 \item \url{http://libre-riscv.org/shakti/m_class/}
 \end{itemize}
}


\end{document}