\documentclass[slidestop]{beamer}
\usepackage{beamerthemesplit}
\usepackage{graphics}
\usepackage{pstricks}

\title{Open 3D Alliance RISC-V}
\author{Luke Kenneth Casson Leighton}


\begin{document}

\frame{
\begin{center}
\huge{Open 3D Alliance: RISC-V}\\
\vspace{32pt}
\Large{An open invitation to collaborate on 3D Graphics}\\
\Large{Hardware and Software}\\
\Large{for mobile, embedded, and innovative purposes}\\
\vspace{24pt}
\Large{With thanks to Pixilica, GoWin, and Western Digital}\\
\vspace{16pt}
\large{\today}
\end{center}
}


\frame{\frametitle{Why collaborate?}

\begin{itemize}
\item 3D is hard. It's also not the same as HPC\vspace{15pt}
\item NVIDIA, AMD, Imagination - cannot meet ``unusual'' needs\vspace{15pt}
\item Working together on flexible standards, everyone wins\vspace{15pt}
\item Without collaboration: 10-20 man-years of development\vspace{10pt}
\item With collaboration: cross-verification (avoids mistakes)
\end{itemize}
}


\frame{\frametitle{What is the goal?}

\begin{itemize}
\item You get to decide! No, really!\vspace{12pt}
\item Outlined here: some ideas and cost/time-saving approaches\vspace{12pt}
\item Two new platforms: 3D ``Embedded'', 3D ``UNIX''\vspace{12pt}
\item Flexible optional extensions (Transcendentals, Vectors,\\
Texturisation, Pixel/Z-Buffers - all optional)\vspace{12pt}
\item Good software support absolutely essential\\
(basically, that means Vulkan)\vspace{15pt}
\end{itemize}
}

\frame{\frametitle{Libre RISC-V Team}

\begin{itemize}
\item Small team, sponsored by Purism and the NLnet Foundation\vspace{8pt}
\item Therefore, focus is on efficiency: leap-frogging ahead\\
without requiring huge resources.\vspace{8pt}
\item OpenGL API? Gallium3D / Vulkan is better\vspace{8pt}
\item Gallium3D turns out to be a single-threaded interpreter\\
(Vulkan is compiled, and can be parallelised)\vspace{8pt}
\item Independent teams have provided OpenGL to Vulkan adaptors\vspace{8pt}
\item Same approach on hardware: seek highest bang-per-buck.\\
Save design time, save implementation time\vspace{8pt}
\end{itemize}
}

\frame{\frametitle{What (optional) things are needed?}

\begin{itemize}
\item Vectorisation. (SIMD? RVV? Other?)\vspace{12pt}
\item Transcendentals (SIN, COS, EXP, LOG)\vspace{12pt}
\item Texture opcodes, Pixel/Z-Buffers\vspace{12pt}
\item Pixel conversion (YUV/RGB etc.)\vspace{12pt}
\item Optional accuracy (embedded space needs less accuracy)\vspace{12pt}
\item Options give implementors flexibility. No imposition:\\
imposition risks fragmentation (however, collaboration does\\
need some hard, logically justifiable rules)
\end{itemize}
}

\frame{\frametitle{What is essential (not really optional)}

\begin{itemize}
\item The software, basically. Anything other than Vulkan\\
is a 10+ man-year effort
\item Two new 3D ``platforms''. Vulkan compliance has implications\\
for hardware, and, with the API being public, interoperability\\
(and Khronos Compliance - which is Trademarked) is critical.
\item Respecting that standards are hard to get right\\
(and that consequences of mistakes are severe:\\
no opportunity for corrections after a freeze)
\item Respecting that, for collaboration and interoperability,\\
some things go into a standard that you might not ``need''
\item Mutually respectful open and fully transparent collaboration.\\
No NDAs, no ``closed forums''. We need the help of experts\\
(such as Mitch Alsup) in this highly technical specialist area.
\end{itemize}
}

\frame{\frametitle{Why Two new Platforms?}

\begin{itemize}
\item Unique pragmatic consequences of ``Hybrid'' CPU/GPU
\item Embedded - no traps need be raised. (Interoperability is\\
impossible, software toolchain collaboration is incidental).
\item UNIX - illegal instruction traps mandatory: software\\
interoperability is mandatory and essential.
\item 3D Embedded - failure to allow implementors the freedom\\
to reduce FP accuracy automatically results in product failure\\
(too many gates, too much power, equals end-user rejection).
\item 3D UNIX - likewise. Also: failure to comply with Khronos\\
Specifications (yet still using ``Vulkan'') is a Trademark violation.
\item Solution: allow software to select FP accuracy level\\
\textbf{at runtime}. (UNIX Platform: IEEE754. 3D UNIX: Vulkan).
\item HW: slow for IEEE754, fast for 3D. Product now competitive!
\end{itemize}
}

\frame{\frametitle{What has our team done already?}

\begin{itemize}
\item Decided to go the ``Hybrid'' route (separate GPUs require a\\
full-blown RPC/IPC mechanism to transfer all 3D API calls\\
to and from userspace memory to GPU memory... and back).
\item Developed Simple-V (a ``Parallelising'' API)\\
(Simple-V is very hard to describe, because it is unique:\\
there is no common Computer Science terminology)
\item Started on Kazan (a Vulkan SPIR-V to LLVM compiler)
\item Started work on a highly flexible IEEE754 FPU
\item Started work on a ``Precise'' CDC 6600 style OoO Engine,\\
with help from Mitch Alsup, architect of the Motorola 88000
\item Variable-issue, predicated SIMD backend, Vector front-end,\\
``precise'' exceptions, branch shadowing, much more
\item All Libre-licensed and developed publicly and transparently.
\end{itemize}
}

\frame{\frametitle{Why Simple-V? Why not RVV?}

\begin{itemize}
\item RVV is designed exclusively for supercomputing\\
(RVV simply has not been designed with 3D in mind).
\item Like SIMD, RVV uses dedicated opcodes\\
(google ``SIMD considered harmful'')
\item 98\% of FP opcodes are duplicated in RVV. A large portion\\
of BitManip opcodes is duplicated in predicate masks
\item OP32 space is extremely precious: 48 and 64 bit opcode space\\
comes with an inherent I-Cache power consumption penalty
\item Simple-V ``prefixes'' scalar opcodes (all of them).\\
No need for any new ``vector'' opcodes (at all).\\
Can therefore use the RVV major opcode for 3D
\item SV augments ``scalar'' opcodes. Implications: it is relatively\\
straightforward to convert an \textit{existing design} to SV.\\
SV ``slots in'' between instruction decode and the ALU.
\end{itemize}
}

\frame{\frametitle{Simple-V ``Prefixing''}

\begin{itemize}
\item SV ``Prefix'' does exactly that: takes RVC and OP32 opcodes\\
and ``prefixes'' them with predication and a ``vector'' tag\vspace{8pt}
\item Three prefix types: SV P32 (prefixed RVC), P48 and P64\vspace{8pt}
\item Prefixed RVC takes 3 ``Custom'' OP32 opcodes.\\
P48 takes standard OP32 scalar opcodes and ``prefixes'' them.\\
P64 adds additional vector context on top of P48.\vspace{8pt}
\item ``Prefixing'' is a bit like SIMD. Vectors may be specified\\
of length 2 to 4, elements may be ``packed'' into registers,\\
opcode element widths over-ridden.\vspace{8pt}
\item Convenient, but not very space-efficient (VBLOCK is)\vspace{8pt}
\end{itemize}
}
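
\begin{frame}[fragile]
\frametitle{Element ``packing'': an illustrative sketch}

A minimal Python sketch of what ``packing'' elements into a register
with an overridden element width could look like. The function name,
the 64-bit register width and the 16-bit default element width are
assumptions made for illustration; they are not taken from the SV
specification.

{\footnotesize
\begin{verbatim}
def packed_add(a, b, elwidth=16, regwidth=64):
    # a, b: unsigned integers holding regwidth // elwidth lanes
    mask = (1 << elwidth) - 1
    out = 0
    for lane in range(regwidth // elwidth):
        sh = lane * elwidth
        # add one lane; the result wraps within that lane only
        s = ((a >> sh) + (b >> sh)) & mask
        out |= s << sh
    return out
\end{verbatim}
}

A carry out of one lane never reaches its neighbour, which is what
separates a packed add from a plain 64-bit add.
\end{frame}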

\frame{\frametitle{VBLOCK Format}

\begin{itemize}
\item Again: hard to describe. It is a bit like VLIW (only not really).\\
A ``block'' of instructions is ``prefixed'' with register ``tags''\\
which give extra context to scalar instructions within the block
\item Sub-blocks include: Vector Length, Swizzling, Vector/Width\\
overrides, and predication. All this is added to scalar opcodes!\\
\textbf{There are NO vector opcodes} (and no need for any)
\item In the ``context'', it goes like this: if a register is used\\
by a scalar opcode, and the register is listed in the ``context'',\\
SV mode is ``activated''
\item ``Activation'' results in a hardware-level ``for-loop'' issuing\\
\textbf{multiple} contiguous scalar operations (instead of just one).
\item Implementors are free to implement the ``loop'' in any fashion\\
they see fit. SIMD, Multi-issue, single-execution: anything.
\end{itemize}
}
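
\begin{frame}[fragile]
\frametitle{The VBLOCK ``for-loop'': an illustrative sketch}

A simplified Python sketch of the hardware ``for-loop'' for a scalar
ADD, not the normative SV pseudocode. The names \texttt{regs},
\texttt{ctx}, \texttt{VL} and \texttt{pred} are assumptions made
for illustration only.

{\footnotesize
\begin{verbatim}
def sv_add(regs, ctx, VL, pred, rd, rs1, rs2):
    vec = lambda r: r in ctx      # listed in context => vector
    if not (vec(rd) or vec(rs1) or vec(rs2)):
        regs[rd] = regs[rs1] + regs[rs2]   # plain scalar add
        return
    for i in range(VL):           # SV "activated": the loop
        if not (pred >> i) & 1:
            continue              # element masked out
        d, a, b = (r + i if vec(r) else r
                   for r in (rd, rs1, rs2))
        regs[d] = regs[a] + regs[b]
        if not vec(rd):
            break                 # scalar destination: stop
\end{verbatim}
}

The loop is written sequentially here; an implementation may run the
same element operations as SIMD, multi-issue or one at a time.
\end{frame}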

\frame{\frametitle{Other Standard Proposals}

\begin{itemize}
\item Ztrans and Ztrig* - Transcendentals and Trigonometrics\\
(optional so that Embedded implementors have some leeway)
\item ISAMUX / ISANS - stops arguments over OP32 space\\
(also allows clean ``paging'' of new opcodes into e.g. RVC)
\item MV.SWIZZLE and MV.X - RV has no dedicated MV opcode.
\item Zfacc - dynamic FP accuracy. Needed for ``fast'' Vulkan native\\
and to switch between fast 3D accuracy and IEEE754 modes.
\item These - and more - need your input! 3D is hard!
\item The key strategic premise: these are required as \textbf{public}\\
standards, because the \textbf{software} is to be public.
\item This is \textbf{not} understood by the RISC-V Foundation.\\
(``custom'' status not appropriate for high-profile mass-volume\\
end-user APIs such as Vulkan).
\end{itemize}
}
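
\begin{frame}[fragile]
\frametitle{Swizzling: an illustrative sketch}

A swizzle, in the usual 3D-shader sense, copies a short vector while
re-ordering or duplicating its elements. The Python below shows only
that general idea. It is not the MV.SWIZZLE encoding, and the names
\texttt{swizzle} and \texttt{sel} are assumptions made for
illustration.

{\footnotesize
\begin{verbatim}
def swizzle(src, sel):
    # sel[i] names the source element copied into slot i
    return [src[s] for s in sel]

xyzw = [1.0, 2.0, 3.0, 4.0]
zyxw = swizzle(xyzw, [2, 1, 0, 3])   # re-order: swap x and z
xxxx = swizzle(xyzw, [0, 0, 0, 0])   # duplicate: broadcast x
\end{verbatim}
}
\end{frame}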


\frame{\frametitle{Summary}

\begin{itemize}
\item 3D is hard (and pure Vectorisation gets you only 25\% of\\
commercially-acceptable performance).
\item Layered optional extensions are going to be key to\\
acceptance by a wide variety of 3D Alliance Members.
\item With a custom specialised SPIR-V (Vulkan) Compiler\\
being an absolutely critical strategic requirement,\\
RVV and its associated compiler (still not developed)\\
are of marginal value (no clear benefits, extra cost)
\item Question everything! Your input, and a willingness to\\
take active responsibility for tasks that your Company\\
is critically dependent on, are extremely important.
\item Public and transparent Collaboration is key. There is simply\\
too much to do.
\end{itemize}
}


\frame{
\begin{center}
{\Huge \vspace{20pt}
The end\vspace{20pt}\\
Thank you\vspace{20pt}\\
}
\end{center}

\begin{itemize}
\item http://lists.libre-riscv.org/pipermail/libre-riscv-dev/
\item http://libre-riscv.org/simple\_v\_extension/abridged\_spec/
\item https://libre-riscv.org/ztrans\_proposal/
\item https://libre-riscv.org/simple\_v\_extension/specification/mv.x/
\end{itemize}
}


\end{document}