# Kazan

So after deciding to sponsor Jacob to work on a 3D Graphics Driver,
for some reason I thought using Rust would be a good idea. Normally,
3D Graphics Drivers are written in C or C++ for performance reasons;
however, in this case I was attracted to the security and memory-safety
inherent in Rust.

Hilariously, it wasn't until some time last week that the way that Vulkan
works actually sank in. I thought it was some sort of interpreter of
a 3D API, just like gallium3d: it most definitely is not. The core of
Vulkan is [SPIR-V](https://en.wikipedia.org/wiki/SPIR-V), an Intermediate
Representation (IR) language. Its predecessor, SPIR, was based on LLVM's IR
and was developed for OpenCL Parallel Compute; somewhere along the line
someone realised that an IR of this kind would also do well for representing
shaders in 3D applications, and SPIR-V covers both.

So whereas previously I was deeply concerned that I had made a huge mistake
in using Rust, actually, the Rust driver isn't so much a "driver" as it
is a **compiler**. As in: the purpose of a Vulkan implementation is to
**compile** the 3D shader SPIR-V binary provided by the 3D application
into something that will execute directly on the underlying hardware.

We chose to compile SPIR-V IR into LLVM IR, and for that task, the fact
that the compiler is written in Rust does **not** affect the performance
of the compiled shaders **in any way**. Once compiled to LLVM IR, the
result will be handed to the standard LLVM JIT (Just-in-Time) low-level
compiler, and it will execute **directly** as native machine code, **and**
it will execute in parallel, as well.

Contrast this with gallium3d-llvmpipe where the API is **interpreted**
(and also single-threaded).

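To make the "compiler, not interpreter" point a little more concrete, here
is a toy sketch of what translating instructions to LLVM IR can look like.
It is an illustration only: `ToyOp`, `translate` and `translate_all` are
made-up names, the two variants are loosely modelled on SPIR-V's `OpFAdd`
and `OpFMul`, and none of this is taken from the Kazan source. Real SPIR-V
is a binary format with types, decorations and control flow that this
sketch ignores entirely.

```rust
// Toy illustration only: a hypothetical, heavily simplified subset of
// SPIR-V-style instructions and their textual LLVM IR equivalents.
// None of these type or function names come from Kazan.

/// A made-up instruction form: a result id and two operand ids,
/// all assumed to be 32-bit floats.
enum ToyOp {
    FAdd { result: u32, a: u32, b: u32 },
    FMul { result: u32, a: u32, b: u32 },
}

/// Translate one toy instruction into one line of textual LLVM IR.
fn translate(op: &ToyOp) -> String {
    match op {
        ToyOp::FAdd { result, a, b } => {
            format!("%v{} = fadd float %v{}, %v{}", result, a, b)
        }
        ToyOp::FMul { result, a, b } => {
            format!("%v{} = fmul float %v{}, %v{}", result, a, b)
        }
    }
}

/// Translate a whole sequence. This happens once, ahead of execution:
/// nothing here runs per pixel or per vertex.
fn translate_all(ops: &[ToyOp]) -> Vec<String> {
    ops.iter().map(translate).collect()
}

fn main() {
    // Roughly: %3 = OpFAdd %float %1 %2, then %5 = OpFMul %float %3 %4
    let ops = [
        ToyOp::FAdd { result: 3, a: 1, b: 2 },
        ToyOp::FMul { result: 5, a: 3, b: 4 },
    ];
    for line in translate_all(&ops) {
        println!("{}", line);
    }
}
```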
# Example

So I asked Jacob if he could do a quick write-up of an
[example translation](https://salsa.debian.org/Kazan-team/kazan/blob/master/docs/Example%20Translation%20from%20SPIR-V%20to%20LLVM%20IR.md).
I wanted to see what goes on, as I quite like compilers and language
translators. Also, a couple of weeks back he ran into some roadblocks on
how the data structures would work in the compiler, so I figured it
would be nice to do a visual worked example.

It looks pretty straightforward. Start with C code, compile it to SPIR-V
(the writer of the 3D or OpenCL application does that part). The interesting
bit is that SPIR-V kinda assumes a SIMD (or SIMT - which is basically
"predicated SIMD") micro-architecture.

Unlike in a standard sequential algorithm, branches are not done as
"branches": they're done by testing a set of conditions (in parallel),
which produces a bit-field of 1s and 0s (representing success or
failure of each of the parallel compares). Then the "THEN" part of the
statement - bear in mind this is all parallel - will be executed on each
element where its corresponding "predicate" bit is set to "1", and the
"ELSE" part of the statement will be executed on each element where the
bit is "0".

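The branch-as-predicate idea described above can be sketched in plain Rust.
This is only a model of the concept, written as sequential loops: the lane
count `LANES`, the function names, and the example computation
(`if x > t { x - t } else { x * 2 }`) are all assumptions chosen for
illustration, not anything from SPIR-V or Kazan.

```rust
// Sketch of predicated execution over a fixed number of lanes.
// LANES and the function names are illustrative assumptions.

const LANES: usize = 8;

/// Test every lane against a threshold and pack the results into a
/// bit-field: bit i is 1 where the compare on lane i succeeded.
fn compare_gt(values: &[i32; LANES], threshold: i32) -> u8 {
    let mut mask = 0u8;
    for (i, &v) in values.iter().enumerate() {
        if v > threshold {
            mask |= 1 << i;
        }
    }
    mask
}

/// Predicated form of `if x > t { x - t } else { x * 2 }`, per lane.
fn predicated_if_else(values: &[i32; LANES], threshold: i32) -> [i32; LANES] {
    let mask = compare_gt(values, threshold);
    let mut out = *values;

    // "THEN" part: only touches lanes whose predicate bit is 1.
    for i in 0..LANES {
        if ((mask >> i) & 1) == 1 {
            out[i] = values[i] - threshold;
        }
    }

    // "ELSE" part: only touches lanes whose predicate bit is 0.
    for i in 0..LANES {
        if ((mask >> i) & 1) == 0 {
            out[i] = values[i] * 2;
        }
    }

    out
}

fn main() {
    let values = [1, 5, 3, 8, 0, 10, 4, 7];
    let mask = compare_gt(&values, 4);
    let out = predicated_if_else(&values, 4);
    println!("mask = {:#010b}", mask);
    println!("out  = {:?}", out);
}
```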
Predication is not very popular outside of the parallel world, because
CPU cycles are "wasted" by having to send both the "THEN" *and* the "ELSE"
statements through to the execution unit. Remember, though, that in
the Libre-RISCV micro-architecture, as is described in the
update on predication, a zero predication bit results in that element
being **skipped**. Whilst it may be sent to the ALU, once the predicate
bit is known, the operation is **cancelled** and the Function Units
may be allocated alternate resources. So, unlike more traditional
Vector and SIMT micro-architectures, our design does not suffer a performance
penalty due to predication.

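A deliberately crude counting model shows the size of the difference being
claimed here. It assumes 8 lanes, counts one "slot" per element operation,
and ignores the detail that a cancelled operation may briefly occupy a
Function Unit before its predicate bit is known. The function names and the
model itself are assumptions for illustration, not a description of the
actual hardware.

```rust
// Crude slot-counting model of a predicated if/else over 8 lanes.
// Assumption: one "slot" per element operation; all names are made up.

const LANES: u32 = 8;

/// Traditional masked execution, as described above: both the THEN and
/// the ELSE statements go through the execution unit for every lane,
/// so masked-off lanes still occupy a slot.
fn slots_masked() -> u32 {
    2 * LANES
}

/// Skipping model: an element with a zero predicate bit is skipped, so
/// THEN costs one slot per 1 bit and ELSE costs one slot per 0 bit.
fn slots_skipping(mask: u8) -> u32 {
    let then_slots = mask.count_ones();
    let else_slots = (!mask).count_ones();
    then_slots + else_slots
}

fn main() {
    let mask = 0b1010_1010u8;
    println!("masked:   {} slots", slots_masked());
    println!("skipping: {} slots", slots_skipping(mask));
}
```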
We do have a couple of issues to contend with, in LLVM. Firstly: whilst
this is a variable-length vectorisation micro-architecture, LLVM itself
does not yet support variable-length data structures. It's all based around
fixed SIMD. There is work underway to deal with that: we can adjust
accordingly as it happens.
Secondly: LLVM's IR support for predication is not as feature-rich as
we would like: it's incomplete.

However we have to start somewhere, and, as this is mostly software, there
is plenty of room to improve performance as time and resources allow.
Interestingly, AMD are planning some improvements to LLVM that will help
us out here. The AMDGPU has similar polymorphic registers, so there are
plans to add in support for register "types" that have the ability to
span (use) more than one "hardware" register. It will be fascinating
to watch that unfold.