updates/008_2018dec28_kazan.mdwn

# Kazan

So after deciding to sponsor Jacob to work on a 3D Graphics Driver,
for some reason I thought using Rust would be a good idea.  Normally,
3D Graphics Drivers are written in C or C++ for performance reasons;
however, in this case I was attracted to the security and memory-safety
inherent in Rust.

Hilariously, it wasn't until some time last week that the way that Vulkan
works actually sank in.  I thought it was some sort of interpreter of
a 3D API, just like gallium3d: it most definitely is not.  The core of
Vulkan is [SPIR-V](https://en.wikipedia.org/wiki/SPIR-V), an Intermediate
Representation (IR) language whose predecessor, SPIR, was based on LLVM's
IR.  The idea was originally developed for OpenCL Parallel Compute;
somewhere along the line someone realised that it would also do well for
representing shaders in 3D applications.

So whereas previously I was deeply concerned that I had made a huge mistake
in using Rust, it turns out that the Rust driver isn't so much a "driver"
as it is a **compiler**.  As in: the purpose of a Vulkan implementation is
to **compile** the 3D shader SPIR-V binary provided by the 3D application
into something that will execute directly on the underlying hardware.

We chose to compile SPIR-V IR into LLVM IR, and for that task, the fact
that the compiler is written in Rust does **not** affect the performance
of the compiled shaders **in any way**.  Once translated, the resultant
LLVM IR will be handed to the standard LLVM JIT (Just-in-Time) low-level
compiler, and it will execute **directly** as native machine code, **and**
it will execute in parallel, as well.
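
To make the "compiler, not interpreter" point concrete, here is a toy
sketch of translating a single arithmetic instruction.  The `Op` enum and
the `to_llvm_ir` function are hypothetical names made up for illustration:
they are not Kazan's actual data structures.  Only the SPIR-V opcode names
(`OpFAdd`, `OpFMul`) and the LLVM IR instruction syntax are real.

```rust
/// Hypothetical, heavily simplified stand-in for a decoded SPIR-V instruction.
/// Real SPIR-V refers to operands by numeric result ids, not by strings.
enum Op {
    /// Corresponds to SPIR-V `OpFAdd` on 32-bit floats.
    FAdd { result: String, lhs: String, rhs: String },
    /// Corresponds to SPIR-V `OpFMul` on 32-bit floats.
    FMul { result: String, lhs: String, rhs: String },
}

/// Translate one instruction into a line of textual LLVM IR.
/// Nothing is executed here: the output is code for LLVM to compile later.
fn to_llvm_ir(op: &Op) -> String {
    match op {
        Op::FAdd { result, lhs, rhs } => {
            format!("%{} = fadd float %{}, %{}", result, lhs, rhs)
        }
        Op::FMul { result, lhs, rhs } => {
            format!("%{} = fmul float %{}, %{}", result, lhs, rhs)
        }
    }
}

fn main() {
    let program = vec![
        Op::FMul { result: "prod".to_string(), lhs: "a".to_string(), rhs: "b".to_string() },
        Op::FAdd { result: "sum".to_string(), lhs: "prod".to_string(), rhs: "c".to_string() },
    ];
    for op in &program {
        println!("{}", to_llvm_ir(op));
    }
}
```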

Contrast this with gallium3d-llvmpipe, where the API is **interpreted**
(and also single-threaded).

# Example

So I asked Jacob if he could do a quick write-up of an
[example translation](https://salsa.debian.org/Kazan-team/kazan/blob/master/docs/Example%20Translation%20from%20SPIR-V%20to%20LLVM%20IR.md).
I wanted to see what goes on, as I quite like compilers and language
translators.  Also, a couple of weeks back he ran into some roadblocks on
how the data structures would work in the compiler, so I figured it
would be nice to do a visual worked example.

It looks pretty straightforward.  Start at C code, compile to SPIR-V
(the writer of the 3D or OpenCL application does that part), then
translate the SPIR-V into LLVM IR (that's our part).  The interesting
bit is that SPIR-V kinda assumes a SIMD (or SIMT - which is basically
"predicated SIMD") micro-architecture.

Unlike in a standard sequential algorithm, branches are not done as
"branches": they're done by testing a set of conditions (in parallel),
which produces a bit-field of 1s and 0s (representing success or
failure of each of the parallel compares).  The "THEN" part of the
statement - bear in mind this is all parallel - is then executed on each
element whose corresponding "predicate" bit is set to "1", and the
"ELSE" part of the statement is executed on each element whose bit is "0".
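
As a minimal sketch of that scheme, the following emulates a predicated
if/else in plain scalar Rust.  The lane count of 4, the particular
branch bodies and the function names are all arbitrary choices for
illustration; real hardware would do each pass across all lanes at once
rather than in a loop.

```rust
/// Number of lanes processed together; 4 is an arbitrary choice for this sketch.
const LANES: usize = 4;

/// Compare every lane against `threshold`, producing one predicate bit per
/// lane: bit i is 1 where x[i] > threshold, otherwise 0.
fn compare_gt(x: &[i32; LANES], threshold: i32) -> u8 {
    let mut mask = 0u8;
    for (i, &v) in x.iter().enumerate() {
        if v > threshold {
            mask |= 1 << i;
        }
    }
    mask
}

/// Predicated form of:
///     if x[i] > 0 { out[i] = x[i] * 2 } else { out[i] = -x[i] }
fn predicated_if_else(x: &[i32; LANES]) -> [i32; LANES] {
    let mask = compare_gt(x, 0);
    let mut out = [0i32; LANES];
    // "THEN" pass: only lanes whose predicate bit is 1 are written.
    for i in 0..LANES {
        if mask & (1 << i) != 0 {
            out[i] = x[i] * 2;
        }
    }
    // "ELSE" pass: the same mask, this time selecting the 0 bits.
    for i in 0..LANES {
        if mask & (1 << i) == 0 {
            out[i] = -x[i];
        }
    }
    out
}

fn main() {
    let x = [3, -1, 0, 7];
    println!("mask   = {:04b}", compare_gt(&x, 0)); // prints "mask   = 1001"
    println!("result = {:?}", predicated_if_else(&x)); // prints "result = [6, 1, 0, 14]"
}
```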

Predication is not very popular outside of the parallel world, because
CPU cycles are "wasted" by having to send both the "THEN" *and* the "ELSE"
statements through to the execution unit.  Remember, though, that in
the Libre-RISCV micro-architecture, as described in the
update on predication, a zero predication bit results in that element
being **skipped**.  Whilst it may be sent to the ALU, once the predicate
bit is known, the operation is **cancelled** and the Function Units
are freed up for other operations.  So, unlike more traditional
Vector and SIMT micro-architectures, our design does not suffer a performance
penalty due to predication.
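
A back-of-envelope way to see the difference is to count element
operations.  This is a deliberately crude model with made-up function
names, not a description of the actual pipeline: it assumes one operation
per element per branch body and ignores everything else.

```rust
/// Element operations when both the "THEN" and the "ELSE" body are pushed
/// through every lane and the unwanted results are thrown away.
fn ops_without_skipping(lanes: u32) -> u32 {
    2 * lanes
}

/// Element operations when a zero predicate bit causes the element to be
/// skipped: "THEN" costs one op per 1 bit, "ELSE" one op per 0 bit.
fn ops_with_skipping(mask: u32, lanes: u32) -> u32 {
    assert!(lanes <= 32, "this sketch keeps the predicate in a single u32");
    // Only the low `lanes` bits of the mask are meaningful.
    let lane_bits = if lanes == 32 { u32::MAX } else { (1u32 << lanes) - 1 };
    let then_ops = (mask & lane_bits).count_ones();
    let else_ops = (!mask & lane_bits).count_ones();
    then_ops + else_ops // always adds up to `lanes`
}

fn main() {
    let (mask, lanes) = (0b1001, 4);
    println!("without skipping: {}", ops_without_skipping(lanes)); // prints 8
    println!("with skipping:    {}", ops_with_skipping(mask, lanes)); // prints 4
}
```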

We do have a couple of issues to contend with in LLVM.  Firstly: whilst
ours is a variable-length vectorisation micro-architecture, LLVM itself
does not yet support variable-length vectors.  It's all based around
fixed-width SIMD.  There is work underway to deal with that: we can adjust
accordingly as it happens.
Secondly: LLVM's IR support for predication is not as feature-rich as
we would like: it's incomplete.
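
One generic way of running a variable-length operation on top of
fixed-width SIMD is "strip-mining": process the data in fixed-width
chunks, with a predicate mask switching off the unused lanes of the last,
partial chunk.  The sketch below shows the idea in scalar Rust.  It is
not necessarily what Kazan or LLVM will end up doing, and `WIDTH` and
`add_vl` are names invented for this example.

```rust
/// Fixed SIMD width assumed for this sketch.
const WIDTH: usize = 4;

/// Add two vectors whose length is only known at run time, using
/// fixed-width chunks.  Returns the sums and the number of chunks issued.
fn add_vl(a: &[f32], b: &[f32]) -> (Vec<f32>, usize) {
    assert_eq!(a.len(), b.len());
    let vl = a.len();
    let mut out = vec![0.0f32; vl];
    let mut chunks = 0;
    let mut base = 0;
    while base < vl {
        // Predicate mask for this chunk: all ones, except in a partial tail.
        let active = (vl - base).min(WIDTH);
        let mask: u8 = (1u8 << active) - 1;
        // Emulate one fixed-width operation; masked-off lanes do nothing.
        for lane in 0..WIDTH {
            if mask & (1 << lane) != 0 {
                out[base + lane] = a[base + lane] + b[base + lane];
            }
        }
        chunks += 1;
        base += WIDTH;
    }
    (out, chunks)
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let b = [10.0; 6];
    let (sum, chunks) = add_vl(&a, &b);
    // Six elements need two chunks of four; the second runs with mask 0b0011.
    println!("{:?} in {} chunks", sum, chunks);
}
```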

However we have to start somewhere, and, as this is mostly software, there
is plenty of room to improve performance as time and resources allow.
Interestingly, AMD are planning some improvements to LLVM that will help
us out here.  The AMDGPU has similar polymorphic registers, so there are
plans to add in support for register "types" that have the ability to
span (use) more than one "hardware" register.  It will be fascinating
to watch that unfold.