add Georgia Tech section

diff --git a/updates/023_2020mar26_decoder_emulator_started.mdwn b/updates/023_2020mar26_decoder_emulator_started.mdwn

index 072de1d684ffdfb8395b25b0ebddc798c5306cd0..1fd0606f9a2be89dd2393cf569b87211092724fb 100644
--- a/updates/023_2020mar26_decoder_emulator_started.mdwn
+++ b/updates/023_2020mar26_decoder_emulator_started.mdwn
@@ -26,6 +26,8 @@ Here's the summary (if it can be called a summary):
  * Jacob's simple-soft-float library growing
    [Power FP compatibility](http://bugs.libre-riscv.org/show_bug.cgi?id=258)
    and python bindings.
+* Kazan, the Vulkan driver Jacob is writing, is getting
+  a [new shader compiler IR](http://bugs.libre-riscv.org/show_bug.cgi?id=161).
  * A Conference call with OpenPOWER Foundation Director, Hugh, and Timothy
    Pearson from RaptorCS has been established every two weeks.
  * The OpenPOWER Foundation is also running some open
@@ -185,18 +187,20 @@ specification to text, and wrote a module in python to extract those
  fields from an instruction.
  
  To test the decoder, we initially verified it against the tables we
-extracted, and manually against the power ISA specification. Later
-however, we came up with the idea of verifying the decoder against the
-output of the GNU assembler. This is done by selecting an instruction
-type (integer reg/reg, integer immediate, load store, etc), and
-randomly selecting the opcode, registers, immediates, and other
-operands. We then feed this instruction to GNU AS to assemble, and
-then the assembled instruction is sent to our decoder. From this, we
-can then verify that the output of the decoder matches what was generated
-earlier.
+extracted, and manually against the [POWER ISA
+specification](https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0). Later
+however, we came up with the idea of [verifying the
+decoder](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/test/test_decoder_gas.py;h=9238d3878d964907c5569a3468d6895effb7dc02;hb=433ab59cf9b7ab1ae10754798fc1c110e705db76)
+against the output of the GNU assembler. This is done by selecting an
+instruction type (integer reg/reg, integer immediate, load store,
+etc), and randomly selecting the opcode, registers, immediates, and
+other operands. We then feed this instruction to GNU AS to assemble,
+and then the assembled instruction is sent to our decoder. From this,
+we can then verify that the output of the decoder matches what was
+generated earlier.
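The round-trip idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual test: `reference_encode_addi` is a hypothetical stand-in for GNU AS (it packs a D-form `addi` word by hand instead of invoking the assembler), and `decode_d_form` is a hypothetical stand-in for the decoder under test.

```python
import random

ADDI_OPCODE = 14  # primary opcode of `addi` (D-form) in the Power ISA


def reference_encode_addi(rt, ra, si):
    """Hypothetical stand-in for the assembler: pack `addi RT,RA,SI`."""
    return (ADDI_OPCODE << 26) | (rt << 21) | (ra << 16) | (si & 0xFFFF)


def decode_d_form(word):
    """Hypothetical decoder under test: split a 32-bit word into fields."""
    si = word & 0xFFFF
    if si & 0x8000:  # sign-extend the 16-bit immediate
        si -= 0x10000
    return {
        "opcode": word >> 26,
        "rt": (word >> 21) & 0x1F,
        "ra": (word >> 16) & 0x1F,
        "si": si,
    }


def check_random_addi(rng, count):
    """Randomly pick operands, encode, decode, and compare the fields."""
    for _ in range(count):
        expected = {
            "opcode": ADDI_OPCODE,
            "rt": rng.randrange(32),
            "ra": rng.randrange(32),
            "si": rng.randrange(-0x8000, 0x8000),
        }
        word = reference_encode_addi(expected["rt"], expected["ra"],
                                     expected["si"])
        decoded = decode_d_form(word)
        if decoded != expected:
            raise AssertionError(f"mismatch for {word:#010x}: {decoded}")
    return count


if __name__ == "__main__":
    checked = check_random_addi(random.Random(0), 1000)
    print(f"{checked} random addi encodings round-tripped")
```

In a real setup the reference encoder would be replaced by a call out to the assembler, so that the encoder and decoder do not share any hand-written assumptions about field positions.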
  
  We also explored using a similar idea to test the functionality of the
-entire SOC. By using the [QEMU](https://www.qemu.org/) powerpc
+entire SOC. By using the [QEMU](https://www.qemu.org/) PowerPC
  emulator, we can compare the execution of our SOC against that of the
  emulator to verify that our decoder and backend are working correctly.
  We would write snippets of test code (or potentially randomly generate
@@ -205,8 +209,9 @@ QEMU. We would then simulate our SOC until it was finished executing
  instructions, and use Qemu's gdb interface to do the same. We would
  then use Qemu's gdb interface to compare the register file and memory
  with that of our SOC to verify that it is working correctly. I did
-some experimentation using this technique to verify a rudimentary
-simulator of the SOC backend, and it seemed to work quite well.
+some experimentation using this technique to verify a [rudimentary
+simulator](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/test_sim.py;h=aadaf667eff7317b1aa514993cd82b9abedf1047;hb=433ab59cf9b7ab1ae10754798fc1c110e705db76)
+of the SOC backend, and it seemed to work quite well.
  
  *(Note from Luke: this automated approach, taking either other people's
  regularly-written code or actual PDF specifications, not only saves us a
@@ -215,8 +220,7 @@ correct and does not contain transcription errors).*
  
  # simple-soft-float Library and POWER FP emulation
  
-The
-[simple-soft-float](https://salsa.debian.org/Kazan-team/simple-soft-float)
+The [simple-soft-float](https://salsa.debian.org/Kazan-team/simple-soft-float)
  library is a floating-point library Jacob wrote with the intention
  of being a reference implementation of IEEE 754 for hardware testing
  purposes. It's specifically designed to be written to be easier to
@@ -256,7 +260,7 @@ rewriting a lot of the status-flag handling code since Power supports a
  much larger set of floating-point status flags and exceptions than most
  other ISAs.
  
-Thanks to RaptorCS for giving us remote access to a Power9 system,
+Thanks to Raptor CS for giving us remote access to a Power9 system,
  since that makes it much easier verifying that the test cases are correct
  (more on this below).
  
@@ -264,14 +268,163 @@ API Docs for stable releases of both
  [simple-soft-float](https://docs.rs/simple-soft-float) and
  [algebraics](https://docs.rs/algebraics) are available on docs.rs.
  
+The algebraics library was chosen as the
+[Crate of the Week for October 8, 2019 on This Week in
+Rust](https://this-week-in-rust.org/blog/2019/10/08/this-week-in-rust-307/#crate-of-the-week).
+
  One of the really important things about these libraries: they're not
-specifically coded exclusively for Libre-SOC: like softfloat-3 itself
+specifically coded exclusively for Libre-SOC: like Berkeley softfloat itself
  (and also like the [IEEE754 FPU](https://git.libre-riscv.org/?p=ieee754fpu.git))
  they're intended for *general-purpose* use by other projects.  These are
  exactly the kinds of side-benefits for the wider Libre community that
  sponsorship, from individuals, Foundations (such as NLNet) and Companies
  (such as Purism and Raptor CS) brings.
  
+# Kazan Getting a New Shader Compiler IR
+
+After spending several weeks discovering that translating from SPIR-V to
+LLVM IR, vectorizing, and doing all the other front-end work in a single
+step is not really feasible, Jacob has switched to [creating a new
+shader compiler IR](http://bugs.libre-riscv.org/show_bug.cgi?id=161) so that
+the translation process can be decomposed into several smaller steps.
+
+The IR and
+SPIR-V to IR translator are being written simultaneously, since that makes
+it easier to find the things that need to be represented in the shader
+compiler IR. Because writing both the IR and the SPIR-V translator together is
+such a big task, we decided to pick an arbitrary point ([translating a totally
+trivial shader into the IR](http://bugs.libre-riscv.org/show_bug.cgi?id=177))
+and split the work into tasks at that point so Jacob would be able to get paid
+after several months of work.
+
+The IR uses structured control-flow inspired by WebAssembly's control-flow
+constructs as well as
+[SSA](https://en.wikipedia.org/wiki/Static_single_assignment_form) but, instead
+of using traditional phi instructions, it uses block and loop parameters and
+return values (inspired by [Cranelift's EBB
+parameters](https://github.com/bytecodealliance/wasmtime/blob/master/cranelift/docs/ir.md#static-single-assignment-form)
+as well as both of the [Rust](https://www.rust-lang.org/) and [Lua](https://www.lua.org/) programming languages).
+
+The IR has a single pointer type for all data pointers (`data_ptr`), unlike LLVM IR where pointer types have a type they point to (like `i32*`, where `i32` is the type the pointer points to).
+
+Because a serialized form is important for any good IR, this IR, like
+LLVM IR, has a user-friendly textual form that can be both read and
+written without losing any information (assuming the IR is valid; comments
+are ignored). A binary form may be added later.
+
+Some example code (the IR is likely to change somewhat):
+
+```
+# this is a comment, comments go from the `#` character
+# to the end of the line.
+
+fn function1[] -> ! {
+    # declares a function named function1 that takes
+    # zero parameters and doesn't return
+    # (the return type is !, taken from Rust).
+    # If the function could return, there would instead be
+    # a list of return types:
+    # fn my_fn[] -> [i32, i64] {...}
+    # my_fn returns an i32 and an i64. Returning multiple
+    # values is inspired by Lua's multiple return values.
+
+    # the hints for this function
+    hints {
+        # there are no inlining hints for this function
+        inlining_hint: none,
+        # this function doesn't have a side-effect hint
+        side_effects: normal,
+    }
+
+    # function local variables
+    {
+        # the local variable is an i32 with an
+        # alignment of 4 bytes
+        i32, align: 0x4 -> local_var1: data_ptr;
+        # the pointer to the local variable is
+        # assigned to local_var1 which has the type data_ptr
+    }
+
+    # the function body is a single block -- block1.
+    # block1's return types are instead attached to the
+    # function signature above
+    # (the `-> !` in the `fn function1[] -> !`).
+    block1 {
+        # the first instruction is a loop named loop1.
+        # the initial value of loop_var is the_const,
+        # which is a named constant.
+        # the value of the_const is the address of the
+        # function `function1`.
+        loop loop1[the_const: fn function1] -> ! {
+            # loop1 takes 1 parameter, which is assigned
+            # to loop_var. the type of loop_var is a pointer to a
+            # function which takes no parameters and doesn't
+            # return.
+            -> [loop_var: fn[] -> !];
+
+            # the loop body is a single block -- block2.
+            # block2's return value definitions are instead
+            # attached to the loop instruction above
+            # (the `-> !` in the `loop loop1[...] -> !`).
+            block2 {
+
+                # block3 is a block instruction, it returns
+                # two values, which are assigned to a and b.
+                # Both of a and b have type i32.
+                block block3 -> [a: i32, b: i32] {
+                    # the only way a block can return is by
+                    # being broken out of using the break
+                    # instruction. It is invalid for execution
+                    # to reach the end of a block.
+
+                    # this break instruction breaks out of
+                    # block3, making block3 return the
+                    # constants 1 and 2, both of type i32.
+                    break block3[1i32, 2i32];
+                };
+
+                # an add instruction. The instruction adds
+                # the value `a` (returned by block3 above) to
+                # the constant `increment` (which is an i32
+                # with the value 0x1), and stores the
+                # result in the value `"a"1`. The source-code
+                # location for the add instruction is specified
+                # as being line 12, column 34, in the file
+                # `source_file.vertex`.
+                add [a, increment: 0x1i32]
+                    -> ["a"1: i32] @ "source_file.vertex":12:34;
+
+                # The `"a"1` name is stored as just `a` in
+                # the IR, where the 1 is a numerical name
+                # suffix to differentiate between the two
+                # values with name `a`. This allows robustly
+                # handling duplicate names, by using the
+                # numerical name suffix to disambiguate.
+                #
+                # If a name is specified without the numerical
+                # name suffix, the suffix is assumed to be the
+                # number 0. This also allows handling names that
+                # have unusual characters or are just the empty
+                # string by using the form with the numerical
+                # suffix:
+                # `""0` (empty string)
+                # `"\n"0` (a newline)
+                # `"\u{12345}"0` (the unicode scalar value 0x12345)
+
+
+                # this continue instruction jumps back to
+                # the beginning of loop1, supplying the new
+                # values of the loop parameters. In this case,
+                # we just supply loop_var as the value for
+                # the parameter, which just gets assigned to
+                # loop_var in the next iteration.
+                continue loop1[loop_var];
+            }
+        };
+    }
+}
+```
+
  # OpenPOWER Conference calls
  
  We've now established a routine two-week conference call with Hugh Blemings,
@@ -281,16 +434,28 @@ the newly-announced OpenPOWER Foundation effort as it progresses.
  
  One of the most important things that we, Libre-SOC, need, and are
  discussing with Hugh and Tim is: a way to switch on/off functionality
-over a limited 32-bit opcode space, so that we have one mode for
+in the (limited) 32-bit opcode space, so that we have one mode for
  "POWER 3.0B compliance" and another for "things that are absolutely
  essential to make a decent GPU".  With these two being strongly
  mutually exclusively incompatible, this is just absolutely critical.
  
+Khronos Vulkan Floating-point Compliance is, for example, critical not
+just from a Khronos Trademark Compliance perspective: it's also essential
+from a power-saving and thus commercial success perspective.  If we
+have absolute strict compliance with IEEE754 for POWER 3.0B, this will
+result in far more silicon than any commercially-competitive GPU on
+the market uses, and we will not be able to sell product.  Thus it is
+*commercially* essential to be able to swap between POWER Compliance
+and Khronos Compliance *at the silicon level*.
+
  POWER 3.0B does not have c++ style LR/SC atomic operations for example,
-and if we have half a **million** data structures **per second** that need
-SMP-level inter-core mutexes, and the current POWER 3.0B multi-instruction
-atomic operations are used, we're highly likely to use 10 to 15 **percent**
-processing power consumed on spin-locking.
+and if we have half a **million** 3D GPU data structures **per second**
+that need SMP-level inter-core mutexes, and the current POWER 3.0B
+multi-instruction atomic operations are used - conforming strictly to
+the standard - we're highly likely to see 10 to 15 **percent** of processing
+power consumed by spin-locking.  Finding out from Tim on one of these
+calls that c++ atomics are something that end-users
+have been asking about is therefore a good sign.
  
  Adding new and essential features that could well end up in a future version
  of the POWER ISA *need* to be firewalled in a clean way, and we've been
@@ -300,13 +465,8 @@ and experience inside IBM, for them to consider.  Some help in reviewing
  it would be greatly appreciated.
  
  These and many other things are why the calls with Tim and Hugh are a
-good idea.  The amazing thing is that they're taking us seriously.  I believe
-I may have mentioned (at least on the mailing list), that there have been
-several people inside IBM and other places, working quietly for a long time
-to get OpenPOWER in place - and, critically, *done right*.  I believe Hugh
-mentioned that it was quite amusing that several people contacted him to
-say "why aren't you doing what RISC-V is doing, being open and all?" whilst
-he and others had been quietly spearheading an effort to that for some time!
+good idea.  The amazing thing is that they're taking us seriously, and
+we can discuss things like those above with them.
  
  Other nice things we learned (more on this below) is that Epic Games
  and RaptorCS are collaborating to get POWER9 supported in Unreal Engine.
@@ -440,7 +600,46 @@ separate lists (below).
  
  # Georgia Tech CREATE-X
  
-TODO
+(*This section kindly written by Yehowshua*)
+
+Yehowshua is a student at Georgia Tech currently pursuing a Master's in
+Computer Engineering, due to graduate this summer. He started working
+on LibreSOC in December and wanted to get LibreSOC more funding so
+he could work on it full time.
+
+He originally asked if the ECE Chair at Georgia Tech would be willing
+to fund an in-department effort to deliver an SOC in collaboration
+with LibreSOC (an idea to which the Chair was quite receptive). Through Luke,
+Yehowshua got in contact with Christopher Klaus, who suggested Yehowshua
+should look into Klaus's startup accelerator program Create-X and perhaps
+consider taking LibreSOC down the startup route.  Robert Rhinehart, who
+had funded LibreSOC a little in the past (*note from Luke: he donated
+the ZC706 and also funded modernisation of Richard Herveille's excellent
+[vga_lcd](https://github.com/RoaLogic/vga_lcd) Library*)
+also suggested that Yehowshua
+incorporate LibreSOC with help from Create-X and said he would be willing
+to be a seed investor. All this happened by February.
+
+As of March, Yehowshua has been talking with Robert about what type of
+customers would be interested in LibreSOC. Robert is largely interested in
+biological applications. Yehowshua also had a couple of meetings with Rahul
+from Create-X. Yehowshua has started the incorporation of LibreSOC. The
+parent company will probably be called Systèmes-Libres, with LibreSOC
+simply being one of the products it will offer. Yehowshua also attended
+HPCA in late February and mentioned LibreSOC during his talk. People
+seemed to find the idea quite interesting.
+
+He will later be speaking with some well-known startup lawyers that have
+an HQ in Atlanta to discuss business-related things such as S Corps,
+C Corps, taxes, wages, equity, etc.
+
+Yehowshua plans for Systèmes-Libres to hire full-time employees. Part-time
+work on Libre-SOC will still be possible through donations and
+support from NLNet and companies like Purism.
+
+Currently, Yehowshua plans to take the Create-X summer launch program
+and fund Systèmes-Libres by August. Full-time wages would probably be
+set around 100k USD.
  
  # LOAD/STORE Buffer and 6600 design documentation