# Questions

Dear Lauri and Luke,

you applied to the 2019-10 open call from NLnet. We have some questions
regarding your project The Libre-RISCV SoC, Video Acceleration.

This is a very ambitious project, and involves a huge skill set across
hardware, software and industry politics - and after that requires a lot
of active downstream uptake to become viable. Who are the main drivers
in addition to yourself that will push for this? Who will be involved?

What amount of the work would be enough to validate the most risky
assumptions and/or pick the first low hanging fruit? What happens if
there is not enough buy in?

To what extent is this work specific to the RISC-V community? How
reusable would efforts be in say the (Open)Power ecosystem? Is there a
possibility to reuse ideas from e.g. the Cell processor that was used
previously in a number of game consoles which were graphically very
demanding (perhaps IBM would be willing to open source these)? That
would also be a good way to get IP clearance, since there likely are a
lot of patents from various parties. For RISC-V that was meticulously
documented, but for these new ideas likely nothing exists.

Can you provide a budget breakdown in terms of tasks, and in particular
identify the rates used?

# Answers (1)

> Dear Lauri and Luke,
> you applied to the 2019-10 open call from NLnet. We have some questions regarding your project The Libre-RISCV SoC, Video Acceleration.

sure.

> This is a very ambitious project, and involves a huge skill set across hardware, software and industry politics - and after that requires a lot of active downstream uptake to become viable. Who are the main drivers in addition to yourself that will push for this? Who will be involved?

i sent out just the one message to mailing lists asking if anyone would like to help, and was surprised to get two responses in under 36 hours, from people with expertise in assembler: lauri was one of them.

for the hardware side (which will come later in the project: the first priority is the simulator), if we absolutely have to use it, there is work already done and released on opencores: https://opencores.org/projects/video_systems

"active upstream uptake" is much later in the project's lifecycle.  there are several routes:

(1) a large customer orders a large number of units for a custom project.  we provide a BSP: deployment and software maintenance becomes "their problem" (with assistance, and therefore paid support contracts, from us).  at this point, upstream is unlikely.

(2) several large customers appear, resulting in mass-produced products ending up in the market which end-users expect to be able to re-program and re-purpose.  this results in pressure on upstream to accept patches.

(3) the project is adopted by (or becomes) "Samsung-like" or "Texas Instruments-like", at which point it is taken seriously and we do not have the political mess.

(4) we become members of the OpenPower Foundation (which we can do because the Director, Hugh Blemings, has the same goals as we do: to permit Libre Members to join), and on the strength of that, what we do becomes part of *mainstream* official OpenPower Standards.  it should be clear at that point that support from IBM, NXP and other Members would kick in, and "upstreaming" is no longer a problem.

all of these things take time - however there is one common theme: if we do not start, it is *guaranteed* that they will never reach upstream :)

> What amount of the work would be enough to validate the most risky assumptions and/or pick the first low hanging fruit?

the simulator will be key, here.  it has always been part of the strategy (even the "base" project - the 2018.02P one) to use cycle-accurate simulation with logging analysis to see how much time the *simulator* takes to execute applications.

the most important part of using a simulator is that the assembly-instruction implementation is *not* written in a hardware description language (HDL): it's written in *C* code.

thus the iterative loop to "prove" that [proposed] instructions will do the job is actually extremely quick... and does *NOT* involve costly or time-consuming tasks.

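as a rough sketch of that iterative loop (an illustration only: it is python rather than C for brevity, and the opcode names, the `sat_add8` instruction and the one-cycle costs are all made-up assumptions, not anything taken from the actual simulator), each instruction is just a plain function. trying out a proposed instruction then means adding one function and re-running with logging:

```python
from collections import Counter

def op_add(regs, rd, rs1, rs2):
    # ordinary 32-bit add
    regs[rd] = (regs[rs1] + regs[rs2]) & 0xFFFFFFFF

def op_sat_add8(regs, rd, rs1, rs2):
    # hypothetical *proposed* instruction: unsigned add clamped to 255
    regs[rd] = min(regs[rs1] + regs[rs2], 255)

# opcode name -> (implementation, assumed cycle cost)
ISA = {
    "add": (op_add, 1),
    "sat_add8": (op_sat_add8, 1),
}

def run(program, regs):
    """Execute a list of (opcode, rd, rs1, rs2) tuples, logging as we go."""
    cycles = 0
    log = Counter()
    for opcode, rd, rs1, rs2 in program:
        impl, cost = ISA[opcode]
        impl(regs, rd, rs1, rs2)
        cycles += cost
        log[opcode] += 1
    return cycles, log

regs = [0] * 8
regs[1], regs[2] = 200, 100
cycles, log = run([("add", 3, 1, 2), ("sat_add8", 4, 1, 2)], regs)
# regs[3] is now 300, regs[4] is clamped to 255, and cycles is 2
```
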
> What happens if there is not enough buy in?

i believe this was partly answered above with routes (1)-(4)?

> To what extent is this work specific to the RISC-V community? How reusable would efforts be in say the (Open)Power ecosystem?

right.  assembly code as you know is very specific to the processor it is written for.  if we go with a particular processor, the hot-loops can only be used on that processor.

i investigated avutil and associated libraries used by ffmpeg and found that there were already *eight* hard-coded assembler "#ifdefs" to bring in different implementations of the hot-loops.  YUV2RGB, RGB2YUV, at 15/16/24/32 BPP and so on.

these are all actually not that big, they're extremely localised, and very specific.

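to give a feel for how small and localised such a hot-loop is, here is a hedged python sketch of a YUV-to-RGB conversion, with a per-architecture lookup standing in for the "#ifdef" selection. the JFIF-style full-range BT.601 coefficients, the function names and the architecture keys are assumptions chosen for illustration; this is not ffmpeg's code.

```python
def clamp8(x):
    # round to the nearest integer and clamp to the 8-bit range
    return max(0, min(255, int(round(x))))

def yuv_to_rgb_pixel(y, cb, cr):
    # full-range BT.601 (JFIF) conversion, assumed here for illustration
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)

def yuv_to_rgb(pixels):
    # the "hot-loop": one small, self-contained conversion per pixel
    return [yuv_to_rgb_pixel(y, cb, cr) for (y, cb, cr) in pixels]

# stand-in for the per-architecture "#ifdef" blocks: a processor-specific
# version would be registered here, with the generic loop as the fallback
HOT_LOOPS = {"generic": yuv_to_rgb}

def select_hot_loop(arch):
    return HOT_LOOPS.get(arch, HOT_LOOPS["generic"])

convert = select_hot_loop("some-new-processor")  # falls back to generic
grey = convert([(128, 128, 128)])                # [(128, 128, 128)]
```
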
if however we need to use e.g. richard herveille's https://opencores.org/projects/video_systems and even if we end up implementing a hard-coded YUV2RGB instruction, these *can* be re-used (at the hardware level).

just as with the IEEE754 FPU that is about... 80% completed already... the hardware engine is *not* specific to a particular processor [caveat: in the case of the IEEE754 FPU, RISC-V "NaNs" are slightly different from other NaN formats.  great, eh?  IEEE754 is a standard that isn't... a standard....  *sigh*]

> Is there a possibility to reuse ideas from e.g. the Cell processor that was used previously in a number of game consoles which were graphically very demanding (perhaps IBM would be willing to open source these)?

ok long story [apologies]

we're going to have to switch off the IBM Vector Processor system (and not implement it).  there are several reasons for that: not least is that the number of instructions added is immense (which is detrimental in several ways).

the "Simple-V" system that i invented takes *all* scalar instructions - all *scalar* instructions - and parallelises them with a "Vector Context".

a "standard" vector processor is implemented as follows:

* take all scalar instructions, duplicate them, and create "vector" versions of the exact same thing
* now add scalar-vector operations as well (allowing some operands to be scalar, some to be vector)
* now add scalar-to-vector data-moving instructions because you can't get data in and out between the two without them
* now add vector-to-scalar data-moving instructions
* now add "predication" instructions
* now add "predicate mask manipulation instructions" which usually duplicate Bit-Manipulation opcodes
* now add scalar-to-predicate and predicate-to-scalar instructions to get data between the two
* now add vector-to-predicate instructions

and much, much more.  you see how insane that is?  Simple-V *literally* adds *NO* new instructions.  at all.  (ok, there's one or two).

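the contrast can be sketched in a few lines of python.  this is only a toy model under stated assumptions: the `execute` helper, the `vl` (vector length) parameter and the set of registers tagged as vectors are hypothetical names for illustration, and the real Simple-V context has far more to it.  the point is that the *same* scalar add serves the scalar, vector-vector and scalar-vector cases, with no separate vector opcode:

```python
def scalar_add(a, b):
    # the ordinary scalar operation, truncated to 32 bits
    return (a + b) & 0xFFFFFFFF

def execute(op, regs, rd, rs1, rs2, vl=1, vector_regs=frozenset()):
    """Run one scalar instruction under an (assumed) vector context."""
    # a scalar destination gets exactly one result, whatever vl is
    count = vl if rd in vector_regs else 1
    for i in range(count):
        # registers tagged as vectors step through consecutive registers;
        # untagged (scalar) operands stay where they are
        a = rs1 + i if rs1 in vector_regs else rs1
        b = rs2 + i if rs2 in vector_regs else rs2
        regs[rd + i] = op(regs[a], regs[b])

regs = [0] * 16
regs[1] = 100
regs[4:8] = [10, 20, 30, 40]
regs[8:12] = [1, 2, 3, 4]

# no context: plain scalar add, regs[2] = regs[4] + regs[8] = 11
execute(scalar_add, regs, 2, 4, 8)

# same instruction, vector context on all three registers
execute(scalar_add, regs, 12, 4, 8, vl=4, vector_regs={4, 8, 12})
vector_vector = regs[12:16]   # [11, 22, 33, 44]

# same instruction again, second operand left scalar
execute(scalar_add, regs, 12, 4, 1, vl=4, vector_regs={4, 12})
scalar_vector = regs[12:16]   # [110, 120, 130, 140]
```
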
IBM went the route that they did because it is "traditional".  they have the resources.  i've picked an "intelligent" route which is specifically designed to minimise time and resources for implementing it.

> That would also be a good way to get IP clearance, since there likely are a lot of patents from various parties. For RISC-V that was meticulously documented, but for these new ideas likely nothing exists.

> Can you provide a budget breakdown in terms of tasks, and in particular identify the rates used?

i'll get back to you on that one, it'll take a bit longer - the answers above are "off the top of my head".

thanks michiel.