# Tutorial on how to get into Libre-SOC development

This tutorial is a guide for anyone wishing, literally, to start from
scratch and learn how to contribute to the Libre-SOC project. You should
go through (skim and extract from) the [[HDL_workflow]] document;
however, until you begin to participate, much of that document is not
fully relevant. This one is intended to get you "up to speed" with
basic concepts.

Discussions here:

* <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-February/004166.html>
* <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/004804.html>

# Programming vs hardware

We are assuming here that you know some programming language, and that
you know it works in sequence (unless you went to Imperial College in
the 80s and have heard of [Parlog](https://en.m.wikipedia.org/wiki/Parlog)).

Hardware basically comprises transistor circuits. There's nothing in the
universe or the laws of physics that says light and electricity have to
operate sequentially. Consequently, all digital ASICs are absolutely
massive arrays of unbelievably, excruciatingly, tediously low-level
"gates", in parallel. These are separated occasionally by
clock-synchronised "latches" that capture data from one processing
section before passing it on to the next.

Thus, it is imperative to conceptually remind yourself, at all times,
that everything that you do, even when writing your HDL code line by line,
is, in fact, at the gate level, done as massively-parallel processing,
**not** as sequential processing, at all. If you want "sequential",
you have to store the results of one parallel block of processing in
"latches", wait for the clock to "tick", and pass them on to the next block.
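
To make the idea of "parallel blocks separated by latches" concrete, here is a minimal pure-Python sketch. It is not nmigen, and it is not how a real simulator works internally. The rule it models is this: on each clock tick, every combinatorial block computes from the *current* inputs and register values, all at once, and only then do all registers latch their new values together. The two-stage pipeline, its register names `stage1` and `stage2`, and the "add one" / "double" logic are hypothetical, chosen purely for illustration.

```python
def clock_tick(regs, data_in):
    """Advance a hypothetical 2-stage pipeline by one clock cycle.

    Both combinatorial blocks read only the *current* input and register
    values, so conceptually they run at the same time; the registers then
    latch their new values simultaneously, exactly once per tick.
    """
    # Combinatorial block 1 (feeds register stage1): add one.
    next_stage1 = data_in + 1
    # Combinatorial block 2 (feeds register stage2): double the OLD stage1.
    next_stage2 = regs["stage1"] * 2
    # The clock "ticks": every register updates at once.
    return {"stage1": next_stage1, "stage2": next_stage2}


regs = {"stage1": 0, "stage2": 0}
history = []
for value in [10, 20, 30]:
    regs = clock_tick(regs, value)
    history.append(regs)

print(history)
# [{'stage1': 11, 'stage2': 0}, {'stage1': 21, 'stage2': 22},
#  {'stage1': 31, 'stage2': 42}]
```

The input 10 becomes 11 in `stage1` after the first tick. It only reaches `stage2` (as 22) one tick later: each latch boundary costs a clock cycle, and that is what "sequential" means in hardware.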

ASIC designers avoid going completely off their heads at the level of
detail involved by "compartmentalising" designs into a huge modular
hierarchy. The tools they select aid and assist in isolating as much of
the contextually-irrelevant detail as practical, allowing the developer
to think in relevant concepts without being overwhelmed.

Understanding this is particularly important because the design
hierarchy may be *one hundred* or more modules deep *just in nmigen
alone* (and that's before understanding the abstraction that nmigen
itself provides). yosys goes through several layers as well, and
finally, alliance / coriolis2 have their own separate layers.

Throughout each layer, the abstractions of a higher layer are removed
and replaced with topologically-equivalent but increasingly detailed
equivalents. nmigen has the *concept* of integers (not really: it has
the concept of something that, when the tool is executed, will create
a representation *of* an integer), and this is passed through intact to
yosys. yosys, however, knows that integers need to be replaced by wires,
buses and gates, so that's what it does.

Thus, you can safely think in terms of "integers" when designing and
writing the HDL, confident that the details of converting to gates and
wires are taken care of.

It is also critically important to remember that, unlike a software
environment, there is no memory or stack unless you create an actual
SRAM and lay out the gates to address it with a binary-to-unary
selector. There is no register file unless you actually create one.
There is no ALU unless you make one... and so on.

Beyond that, in hardware, if you forget to add something that might be
needed for exceptional purposes, you simply cannot add it later like you
can in software. If it's not there, it's not there, and that's the end
of the discussion. Consequently, a vast amount of time goes into
planning and simulation (software, FPGA and SPICE), as mistakes and
omissions can literally cost tens of millions of dollars to rectify.
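
A "binary to unary selector" is usually called a decoder. It takes n address bits in and drives 2^n select lines out, exactly one of which is active. The pure-Python sketch below models one built only from AND and NOT gates, with one AND gate per output line. The gate functions and `decoder` are hypothetical helpers written for this illustration; a real SRAM's address decoding is considerably more elaborate.

```python
from itertools import product


def AND(*bits):
    # Multi-input AND gate: 1 only if every input is 1.
    return int(all(bits))


def NOT(bit):
    return 1 - bit


def decoder(address_bits):
    """Binary-to-unary (one-hot) selector built from AND and NOT gates.

    address_bits is a tuple of 0/1 values, most significant bit first.
    Output line i is 1 only when the address equals i, so exactly one
    word line of a (hypothetical) SRAM is selected at a time.
    """
    n = len(address_bits)
    lines = []
    for i in range(2 ** n):
        # Line i gets one AND gate, fed by each address bit either
        # directly (where bit k of i is 1) or through a NOT gate.
        pattern = [(i >> (n - 1 - k)) & 1 for k in range(n)]
        terms = [a if p else NOT(a) for a, p in zip(address_bits, pattern)]
        lines.append(AND(*terms))
    return lines


for address in product([0, 1], repeat=2):
    print(address, decoder(address))
# (0, 0) [1, 0, 0, 0]
# (0, 1) [0, 1, 0, 0]
# (1, 0) [0, 0, 1, 0]
# (1, 1) [0, 0, 0, 1]
```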

# Debian

Sorry, Ubuntu, macOS and Windows lovers: start by installing Debian,
either on actual hardware or in a VM. A VM has the psychological
disadvantage of making you feel like you are not taking things seriously
(it's a toy), so consider dual booting or getting a second machine.

# Python

First: learn python (python3, to be precise). Start by learning the
basic data types: string, int and float, then dict, list and tuple.
Then move on to functions, then classes, exceptions and the "with"
statement. Along the way you will pick up imports. Do not use
"import \*": it will cause you and everyone else who tries to read your
code a world of pain.

# Git

Git is essential. Look up the git workflow: clone, pull, push, add,
commit. Create some test repos and get familiar with it. Read the
[[HDL_workflow]] document.

# Basics of gates

You need to understand what gates are. Look up AND, OR, XOR, NOT, NAND,
NOR, MUX, DFF and SR latch on electronics forums and wikipedia. Also
look up "register latches", then HALF ADDER and FULL ADDER. If you
would like a particularly amusing relevant distraction, look up the guy
who built an entire functional computer out of 74-series logic chips,
on breadboards. It's now in a museum.
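
To see how these gates compose, here is a minimal pure-Python model in which each gate is simply a function on single bits (0 or 1). It only captures truth tables: real gates also have delay, area and power, none of which appears here. The half adder is an XOR plus an AND. The full adder is built from two half adders and an OR, which is one common construction among several.

```python
from itertools import product


# Basic gates on single bits (0 or 1).
def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def XOR(a, b):
    return a ^ b


def NOT(a):
    return 1 - a


def NAND(a, b):
    return NOT(AND(a, b))


def NOR(a, b):
    return NOT(OR(a, b))


def MUX(sel, a, b):
    # 2-input multiplexer: outputs a when sel is 0, b when sel is 1.
    return OR(AND(NOT(sel), a), AND(sel, b))


def half_adder(a, b):
    # Returns (sum, carry).
    return XOR(a, b), AND(a, b)


def full_adder(a, b, carry_in):
    # Two half adders, with their carries combined by an OR gate.
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, OR(c1, c2)


print("a b cin | sum cout")
for a, b, cin in product([0, 1], repeat=3):
    s, cout = full_adder(a, b, cin)
    print(f"{a} {b}  {cin}  |  {s}    {cout}")
```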

For some reason, ASIC designers call collections of gates (such as MUXes)
"Cells", no matter how large they are. There are some more complex
"Cells" such as "4-input MUX" or "3-input XOR" and so on, which should
be self-explanatory. Thus you will see the words "Cell Library" used.

Yes, you can create your own cell libraries; however, you will also see
Foundries refer to things called "Standard Cell Libraries", which they
expect you to use (under NDA, sigh).

Also look up "boolean algebra", "Karnaugh maps", truth tables and things
like that.
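
A block with n one-bit inputs has only 2^n possible input combinations. That means two Boolean expressions can be proven equal by checking every row of their truth tables. The brute-force sketch below checks De Morgan's law and a simplification of the kind a Karnaugh map makes visible. It is fine for a handful of inputs but hopeless for a 64-bit datapath, which is why real tools use smarter methods. `equivalent` is a hypothetical helper written for this example.

```python
from itertools import product


def equivalent(f, g, n_inputs):
    """True if f and g agree on every row of their truth table."""
    return all(f(*bits) == g(*bits)
               for bits in product([0, 1], repeat=n_inputs))


# De Morgan: NOT(a AND b) == (NOT a) OR (NOT b)
de_morgan = equivalent(lambda a, b: 1 - (a & b),
                       lambda a, b: (1 - a) | (1 - b), 2)

# The kind of simplification a Karnaugh map exposes:
# (a AND b) OR (a AND NOT b) is just a, so several gates become one wire.
simplified = equivalent(lambda a, b: (a & b) | (a & (1 - b)),
                        lambda a, b: a, 2)

# A pair that is NOT equivalent, for contrast.
wrong = equivalent(lambda a, b: a | b, lambda a, b: a & b, 2)

print(de_morgan, simplified, wrong)   # True True False
```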

From there you can begin to appreciate how deeply, ridiculously low-level
this all is, and why we are using nmigen. nmigen constructs "useful"
concepts like "32-bit numbers", which actually do not exist at the gate
level: they only exist by way of being constructed from chains of 1-bit
(binary) processing!

So, for example, a 32-bit adder is "constructed" from a batch of 32
FULL ADDERs (actually, 31 FULL and one HALF). Even things like comparing
two numbers, the simple "==" or ">=" operators, are done entirely with
a bit-level cascade!
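
The following pure-Python sketch chains one HALF ADDER and 31 FULL ADDERs into a 32-bit ripple-carry adder. It also builds "==" from per-bit XNORs feeding a chain of ANDs. It only models the logic: a real design, or whatever the synthesis tools choose to emit, may use faster adder structures than a plain ripple carry. All function names here are hypothetical.

```python
def half_adder(a, b):
    return a ^ b, a & b                    # (sum, carry)


def full_adder(a, b, cin):
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, cin)
    return s2, c1 | c2                     # (sum, carry out)


def to_bits(x, width):
    # Least significant bit first.
    return [(x >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_add(x, y, width=32):
    """Add two unsigned integers using 1 half adder + (width-1) full adders.

    The carry "ripples" from bit 0 upwards; the final carry out is dropped,
    so the result wraps around modulo 2**width.
    """
    a, b = to_bits(x, width), to_bits(y, width)
    s, carry = half_adder(a[0], b[0])      # bit 0 has no carry in: HALF
    out = [s]
    for i in range(1, width):              # bits 1..width-1: FULL adders
        s, carry = full_adder(a[i], b[i], carry)
        out.append(s)
    return from_bits(out)


def equal(x, y, width=32):
    """'==' as a bit-level cascade: XNOR every bit pair, then AND them all."""
    result = 1
    for a, b in zip(to_bits(x, width), to_bits(y, width)):
        result &= 1 - (a ^ b)              # XNOR is 1 when the bits match
    return result


print(ripple_add(1234, 5678), ripple_add(0xFFFFFFFF, 1),
      equal(42, 42), equal(42, 43))        # 6912 0 1 0
```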

This would drive you nuts if you had to think at this level all the time;
consequently, the "High" in "High Level Language" was invented. Luckily,
in python you can override \_\_add\_\_ and so on, so that when you put
"a + b" into an nmigen program it gives you the *impression* that two
"actual" numbers are being added. In fact, you have requested that the
HDL create a massive bunch of "gates" on your behalf.

In other words, *behind the scenes* the HDL uses "cells" that, in a
massive hierarchical cascade, ultimately end up as nothing more than
"gates".
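
Here is a toy version of that trick in plain Python. The classes `Expr`, `Sig` and `Op` are hypothetical and far simpler than nmigen's own, but the principle is the same: `__add__` and `__eq__` return new tree nodes that *describe* the hardware to build, instead of computing a result. Something like this tree is what eventually gets turned into gates.

```python
class Expr:
    """Toy stand-in for an HDL value: operators build a tree, not a number.

    Hypothetical classes for illustration only; this is not nmigen's API.
    """

    def __add__(self, other):
        return Op("+", self, other)

    def __eq__(self, other):
        return Op("==", self, other)


class Sig(Expr):
    """A named signal of a given bit width."""

    def __init__(self, name, width):
        self.name = name
        self.width = width

    def __repr__(self):
        return self.name


class Op(Expr):
    """An operator node whose operands are other Expr nodes."""

    def __init__(self, op, lhs, rhs):
        self.op = op
        self.lhs = lhs
        self.rhs = rhs

    def __repr__(self):
        return f"({self.lhs!r} {self.op} {self.rhs!r})"


a, b, c = Sig("a", 32), Sig("b", 32), Sig("c", 32)
tree = (a + b) == c         # no addition happens: a description is built
print(tree)                 # ((a + b) == c)
print(type(tree).__name__)  # Op
```

This is also why, in nmigen, `==` on two signals does not give you a Python `True` or `False`: it gives you a description of a comparator.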

Yes, you really do need to know this, because those "gates" cost power
and space, and take time to switch. So if you have too many of them
in a chain, your chip is limited in its top speed. This is the point
at which you should be looking up "pipelines" and "register latches",
as well as "combinatorial blocks".
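
The arithmetic behind "too many gates in a chain limits top speed" fits in a few lines. The sketch below uses made-up delay figures, not taken from any real process. It shows how inserting a pipeline register into a long combinatorial chain shortens the clock period.

```python
# All figures are hypothetical, chosen only to show the arithmetic;
# real delays depend entirely on the process and the cell library.
GATE_DELAY_PS = 50          # assumed delay through one gate (picoseconds)
REGISTER_OVERHEAD_PS = 100  # assumed setup + clock-to-output of a register


def max_clock_mhz(gates_in_chain):
    """Top clock speed when the longest combinatorial chain has this many
    gates: the clock period must cover the whole chain plus the register
    that captures its result."""
    period_ps = gates_in_chain * GATE_DELAY_PS + REGISTER_OVERHEAD_PS
    return 1e6 / period_ps  # 1e6 / period-in-ps gives MHz


# Suppose the carry of a 32-bit ripple adder passes through about two
# gates (an AND and an OR) per bit.
chain = 2 * 32
unpipelined = max_clock_mhz(chain)

# Put a register halfway along: two stages of 32 gates each. Each result
# now takes 2 clock cycles, but the clock itself can run faster.
pipelined = max_clock_mhz(chain // 2)

print(f"{unpipelined:.0f} MHz unpipelined, {pipelined:.0f} MHz pipelined")
# 303 MHz unpipelined, 588 MHz pipelined
```

The speed-up is less than 2x because every added register brings its own overhead. Also, each result now takes two cycles to emerge (latency), even though a new one can start every cycle (throughput).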

You also want to look up the concept of an FSM (Finite State Machine)
and the difference between a Mealy and a Moore FSM.
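
As a concrete comparison, here is a rising-edge detector written both ways, as a pure-Python cycle-by-cycle model. The state names and functions are hypothetical, for illustration only. Both versions output a 1 for one cycle per rising edge of the input; the difference is *when* that 1 appears, and *what* the output depends on.

```python
def moore_edge_detector(inputs):
    """Moore FSM: the output depends ONLY on the current state."""
    output_of = {"LOW": 0, "EDGE": 1, "HIGH": 0}
    next_state = {
        ("LOW", 0): "LOW", ("LOW", 1): "EDGE",
        ("EDGE", 0): "LOW", ("EDGE", 1): "HIGH",
        ("HIGH", 0): "LOW", ("HIGH", 1): "HIGH",
    }
    state, outputs = "LOW", []
    for bit in inputs:
        outputs.append(output_of[state])   # output from the state alone
        state = next_state[(state, bit)]   # clock tick: move to next state
    return outputs


def mealy_edge_detector(inputs):
    """Mealy FSM: the output depends on the current state AND the input."""
    prev, outputs = 0, []                  # the state is the previous input
    for bit in inputs:
        outputs.append(bit & (1 - prev))   # 1 when the input rose this cycle
        prev = bit                         # clock tick: remember the input
    return outputs


signal = [0, 1, 1, 0, 1, 0]
print("moore:", moore_edge_detector(signal))   # moore: [0, 0, 1, 0, 0, 1]
print("mealy:", mealy_edge_detector(signal))   # mealy: [0, 1, 0, 0, 1, 0]
```

The Moore version reports each edge one cycle later, because its output is a function of the stored state only, with no combinatorial path from input to output. The Mealy version responds in the same cycle, at the cost of such a path.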

## NDAs...

These are a nuisance. There are around 4 levels of NDAs to bust through:
full chip designs, peripherals and other third-party components, Cell
Libraries, and Foundries. Often, the Foundries supply their own Standard
Cell Libraries (see above).

Sometimes you want to design something not under NDA (as we do), but
in order to do so you still need to know the "shape" of the Cells.
Occasionally, then, the licensor of those Cells will allow you to use
"phantoms", which are the same shape and have the same connections.
The official industry term for these is "phantom views". See
<http://bugs.libre-riscv.org/show_bug.cgi?id=178#c106> for discussion.

Then there are also "abstract" views: these are also under NDA.
So, we will be doing the layout in generic "lambda" design, and a
conversion pass (under NDA) is carried out which maps to TSMC. See
<http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/004804.html>

# nmigen

Once you understand gates and python, nmigen starts to make sense.

nmigen works by creating an in-memory "Abstract Syntax Tree" (AST).
This is handed to yosys (via the yosys "ILANG" format), which in turn
actually generates the cells and netlists.

So you write code in python, using the nmigen library of classes and
helper routines, to construct an AST which *represents* the actual
hardware. Yosys takes care of the level *below* nmigen, and is just
a tool.

Install nmigen (and yosys) by following [[HDL_workflow]], then follow
the excellent tutorial by Robert Baruch,
<https://github.com/RobertBaruch/nmigen-tutorial>, and also look up the
resources here: <https://m-labs.hk/gateware/nmigen/>

Pay particular attention to the bits in [[HDL_workflow]] about using the
yosys "show" command. This is essential because the nmigen code gets
turned into gates, and yosys "show" will bring up a graph that allows
you to see that. It's also very useful to run the "proc" and "opt"
commands, followed by a second "show top" (or "show {insert module
name}"). The yosys "proc" (process) and "opt" (optimise) commands
transform the design into something closer to what is actually
synthesised at the gate level.

In nmigen, pay particular attention to "comb" (combinatorial)
and "sync" (synchronous). Comb is a sequence of gates without any
clock-synchronised latches. With comb it is absolutely essential that
you **do not** create a "loop" by mistake: i.e. combinatorial output
must never, under any circumstances, loop back to combinatorial input.
In other words, "comb" blocks must be DAGs (Directed Acyclic Graphs).

"sync" will *automatically* create a clock-synchronised register for
you. This is how you construct pipelines. Also, if you want to create
cyclic graphs, you absolutely **must** store partial results of
combinatorial blocks in registers (with sync) *before* passing those
partial results back into more (or the same) combinatorial blocks.
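
The "comb must be a DAG" rule can be checked mechanically. Below is a hypothetical, simplified netlist format, which is not nmigen's or yosys's internal representation. Each signal records its inputs and whether it is driven combinatorially or by a register. Edges into a register are cut, because a register's output only changes at the clock tick. A depth-first search then looks for any cycle that remains.

```python
def has_comb_loop(netlist):
    """Return True if some combinatorial path feeds back into itself.

    netlist maps each signal name to (kind, [input signal names]), where
    kind is "comb" or "sync". A "sync" signal is a register output: its
    value this cycle does not depend on its inputs this cycle, so edges
    into it are cut. Whatever remains must be a DAG.
    """
    deps = {name: (inputs if kind == "comb" else [])
            for name, (kind, inputs) in netlist.items()}

    WHITE, GREY, BLACK = 0, 1, 2      # unvisited / on current path / done
    colour = {name: WHITE for name in deps}

    def visit(name):
        colour[name] = GREY
        for dep in deps[name]:
            state = colour.get(dep, BLACK)  # unknown names: primary inputs
            if state == GREY:               # back to the current path: loop
                return True
            if state == WHITE and visit(dep):
                return True
        colour[name] = BLACK
        return False

    return any(colour[name] == WHITE and visit(name) for name in deps)


# A counter done right: the adder is comb, its result goes into a register.
good = {
    "count": ("sync", ["next_count"]),
    "next_count": ("comb", ["count", "one"]),   # "one" is a primary input
}
# The same counter with the register forgotten: comb feeds back into comb.
bad = {
    "count": ("comb", ["next_count"]),
    "next_count": ("comb", ["count", "one"]),
}
print(has_comb_loop(good), has_comb_loop(bad))   # False True
```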

* <https://github.com/YosysHQ/yosys>

# verilog

Verilog is really worth mentioning in passing. You will see it a lot.
Verilog was designed in the 1980s, when the state of the art in computer
programming was BASIC, FORTRAN, and, if you were lucky, PASCAL.

Object-orientated design was a buzzword in Universities. Java did
not exist. C++ did not exist. Consequently, even just for testing of
ASICs, which were still being designed at the gate level, some bright
spark decided to write a test suite in a high level language.

That language was: verilog.

Soon afterwards, someone realised that actual ASICs themselves could
be written *in* verilog. Unfortunately, however, because verilog was
designed around 1980s state-of-the-art programming concepts, it has
hampered ASIC design ever since.

We use nmigen because we can do proper OO: multiple inheritance, class
MixIns, proper parameterisation and much more. All of these would be
absolute hell to do in verilog: we would need some form of massive
macro preprocessing system or a nonstandard version of verilog.

Rather than inflict that kind of pain onto both ourselves and the rest
of the world, we went with nmigen. Now you know why. Hurrah.
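
For a flavour of what that buys, here is a plain-Python sketch of parameterisation and a MixIn. It deliberately does not use nmigen, and the class names are hypothetical. The bit width is just a constructor argument. `RegisteredOutputMixIn` adds an output register stage to whatever class it is combined with, through Python's method resolution order and `super()`. The "cells" are just strings standing in for real hardware.

```python
class AdderDescription:
    """Hypothetical, parameterised description of a ripple-carry adder.

    The width is an ordinary constructor argument, so one class covers
    8-, 32- or 64-bit adders; no macro preprocessor is needed.
    """

    def __init__(self, width):
        self.width = width

    def cells(self):
        # One HALF adder for bit 0, FULL adders for the remaining bits.
        return ["HALF"] + ["FULL"] * (self.width - 1)


class RegisteredOutputMixIn:
    """MixIn that adds one output register (DFF) per bit to any description."""

    def cells(self):
        return super().cells() + ["DFF"] * self.width


class RegisteredAdder(RegisteredOutputMixIn, AdderDescription):
    """Combined via multiple inheritance: the MixIn's cells() runs first."""


def cell_counts(design):
    cells = design.cells()
    return {name: cells.count(name) for name in sorted(set(cells))}


print(cell_counts(AdderDescription(8)))   # {'FULL': 7, 'HALF': 1}
print(cell_counts(RegisteredAdder(8)))    # {'DFF': 8, 'FULL': 7, 'HALF': 1}
```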

P.S. here's the
[full discussion](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-November/000171.html)
and a [more recent one](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/004703.html).