sync_up: Added Dmitry, Sadoon

[libreriscv.git] / zfpacc_proposal.mdwn
diff --git a/zfpacc_proposal.mdwn b/zfpacc_proposal.mdwn

index 2e21c0740c3bdec9693edabfbc39077876828e12..8b22582a5d984dacb72bb766a487a30045959e86 100644
--- a/zfpacc_proposal.mdwn
+++ b/zfpacc_proposal.mdwn
@@ -1,13 +1,39 @@
+
  # FP Accuracy proposal
  
+Credits:
+
+* Bruce Hoult
+* Allen Baum
+* Dan Petroski
+* Jacob Lifshay
+
  TODO: complete writeup
  
  * <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002400.html>
  * <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002412.html>
  
-Zfpacc: a proposal to allow implementations to dynamically set the bit-accuracy
-of results, trading speed (reduced latency) for accuracy (higher latency).
-IEE754 format is preserved: only ULP (Unit in Last Place) is permitted to be non-zero.
+Zfpacc: a proposal to allow implementations to dynamically set the
+bit-accuracy of floating-point results, trading speed (reduced latency)
+*at runtime* for accuracy (higher latency).  IEEE754 format is preserved:
+instruction operand and result format requirements are unmodified by
+this proposal.  Only ULP (Unit in Last Place) of the instruction *result*
+is permitted to meet alternative accuracy requirements, whilst still
+retaining the instruction's requested format.
+
+This proposal is *only* suitable for adding pre-existing accuracy standards
+where it is clearly established, well in advance of applications being
+written that conform to that standard, that dealing with variations in
+accuracy across hardware implementations is the responsibility of the
+application writer.  This is the case for both Vulkan and OpenCL.
+
+This proposal is *not* suitable for inclusion of "de-facto" (proprietary)
+accuracy standards (historic IBM Mainframe vs Amdahl incompatibility)
+where there was no prior agreement or notification to applications
+writers that variations in accuracy across hardware implementations
+would occur.  In the unlikely event that they *are* ever to be included
+in the future, rather than as a Custom Extension, then, unlike Vulkan
+and OpenCL, they must **only** be added as "bit-for-bit compatible".
  
  # Extension of FCSR
  
@@ -85,9 +111,19 @@ The values for the field facc to include the following:
  | 0b000 | IEEE754 | correctly rounded   | 
  | 0b010 | ULP<1   | Unit Last Place < 1 | 
  | 0b100 | Vulkan  | Vulkan compliant    | 
-| 0b110 | Appx    | Machine Learning    |
+| 0b110 | Appx    | Machine Learning    |
+
+(TODO: review alternative idea: ULP0.5, ULP1, ULP2, ULP4, ULP16)
  
-Note that the format of the operands and result remain the same for all opcodes. The only change is in the *accuracy* of the result, not its format.
+Notes: 
+
+* facc=0 matches current RISC-V behaviour, where these bits were formerly reserved and set to zero.
+* The format of the operands and result remains the same for
+all opcodes. The only change is in the *accuracy* of the result, not
+its format.
+* facc sets the *minimum* accuracy. It is acceptable to provide *more* accurate results than is requested by a given facc mode (although, clearly, the opportunity for reduced power and latency would be missed).
+
+## Discussion
  
  maybe a solution would be to add an extra field to the fp control csr
  to allow selecting one of several accurate or fast modes:
@@ -101,6 +137,12 @@ to allow selecting one of several accurate or fast modes:
  - fully-accurate-mode: correctly rounded in all cases
  - maybe more modes?
  
+extra mode suggestions:
+
+    it might be reasonable to add a mode saying you're prepared to accept
+    worse than 0.5 ULP accuracy, perhaps with a few options: 1, 2, 4,
+    16 or something like that.
+
  Question: should better accuracy than is requested be permitted? Example:
  Amdahl-370 issues.
  
@@ -123,3 +165,52 @@ Comments:
      There are also 8 and 9-bit floating point formats that could be useful
  
      <https://en.wikipedia.org/wiki/Minifloat>
+
+### Function accuracy in standards (OpenCL, Vulkan)
+
+[[resources]] for OpenCL and Vulkan
+
+Vulkan requires full IEEE754 precision for all F/D instructions except for fdiv and fsqrt.
+
+<https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/chap40.html#spirvenv-precision-operation>
+
+Source is here:
+<https://github.com/KhronosGroup/Vulkan-Docs/blob/master/appendices/spirvenv.txt#L1172>
+
+OpenCL is slightly different; suggest adding it as an extra entry.
+
+<https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_Env.html#relative-error-as-ulps>
+
+The search link below finds version 2.1 of the OpenCL environment specification (table 8.4.1). It needs checking whether that is the same as the above, which has "SPIRV" in it and is version 2.2, not 2.1:
+
+<https://www.google.com/search?q=opencl+environment+specification>
+
+Version 2.1 is superseded by 2.2:
+<https://github.com/KhronosGroup/OpenCL-Docs/blob/master/env/numerical_compliance.asciidoc>
+
+### Compliance
+
+Dan Petroski:
+
+    It’s a bit more complicated than that. Different FP
+    representations/algorithms have different quantization ranges, so you
+    can get more or less precise depending on how large the arguments are.
+
+    For instance, machine A can compute within ULP3 from 0 to 10000, but
+    ULP2 from 10000 upwards. Machine B can compute within ULP2 from 0 to
+    6000, then ULP3 for 6000+. How do you design a compliance suite which
+    guarantees behavior across all fpaccs?
+
+and from Allen Baum:
+
+    In the example above, you'd need a ratified spec with the defined
+    ranges (possibly per range and per op) - and then implementations
+    would need to at least meet that spec (but could be more accurate)
+
+    so - not impossible, but a lot more work to write different kinds
+    of tests than standard IEEE compatible test would have.
+
+    And, by the way, if you want it to be a ratified spec, it needs a
+    compliance suite, and whoever has defined the spec is responsible
+    for writing it.
+
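
As a rough illustration of the per-range compliance check discussed in the quotes above, the following Python sketch (Python 3.9+ is needed for `math.ulp` and `math.nextafter`) measures a result's error in ULPs against an exact reference and compares it with a range-dependent bound. The names `ulp_error`, `max_ulp_for`, `meets_spec` and the table `SPEC_A` are hypothetical and are not part of the proposal. The ranges copy "machine A" from Dan Petroski's example, with the boundary at 10000 assumed to belong to the upper range. The ULP is taken at the computed result rather than at the exact value, which is a simplifying assumption.

```python
import math
from fractions import Fraction

def ulp_error(result: float, exact: Fraction) -> Fraction:
    """Distance between `result` and `exact`, in ULPs of `result` (double precision)."""
    # Fraction(float) is exact, so no extra rounding error is introduced here.
    return abs(Fraction(result) - exact) / Fraction(math.ulp(result))

# Hypothetical per-range spec: (lower inclusive, upper exclusive, max ULP error).
# Values mirror "machine A" from the discussion; they come from no ratified spec.
SPEC_A = [
    (0.0, 10000.0, 3),
    (10000.0, math.inf, 2),
]

def max_ulp_for(arg: float, spec) -> int:
    """Look up the permitted ULP error for an argument value."""
    for lower, upper, bound in spec:
        if lower <= arg < upper:
            return bound
    raise ValueError(f"argument {arg!r} is outside every range in the spec")

def meets_spec(arg: float, result: float, exact: Fraction, spec) -> bool:
    """True if `result` is at least as accurate as the spec demands for `arg`."""
    # Being *more* accurate than the bound is always acceptable.
    return ulp_error(result, exact) <= max_ulp_for(arg, spec)

# Example: a result that is 3 ULPs above the exact value 2.
three_ulps_off = 2.0
for _ in range(3):
    three_ulps_off = math.nextafter(three_ulps_off, math.inf)

in_low_range = meets_spec(4.0, three_ulps_off, Fraction(2), SPEC_A)   # bound is 3
in_high_range = meets_spec(20000.0, three_ulps_off, Fraction(2), SPEC_A)  # bound is 2
```

Here `in_low_range` is true and `in_high_range` is false, because the same 3-ULP error is inside the bound for the lower range but outside it for the upper range. A real compliance suite would also need the ranges to be defined per operation, as Allen Baum notes.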