# FP Accuracy proposal

TODO: complete writeup

Zfpacc: a proposal to allow implementations to dynamically set the bit-accuracy of floating-point results, trading accuracy for speed: reduced-accuracy modes permit reduced latency, while full accuracy may come at higher latency.

# Extension of FCSR

Zfpacc would use some of the reserved bits of FCSR. It would be treated very similarly to how dynamic frm is treated.

frm is treated as follows:

* Floating-point operations use either a static rounding mode encoded in the instruction, or a dynamic rounding mode held in frm.
* Rounding modes are encoded as shown in Table 11.1 of the RISC-V ISA Spec.
* A value of 111 in the instruction's rm field selects the dynamic rounding mode held in frm. If frm is set to an invalid value (101-111), any subsequent attempt to execute a floating-point operation with a dynamic rounding mode will raise an illegal-instruction exception.

If we wish to support up to four accuracy modes, that would require two "fam" bits. The default would be IEEE 754-compliant, encoded as 00; this means that all current hardware would already be compliant with the default mode. Unsupported modes cause a trap, to allow emulation where traps are supported; emulation of unsupported modes would be required for UNIX platforms.

As with frm, an implementation may choose to support any permutation of dynamic fam-instruction pairs. It will raise an illegal-instruction trap upon executing an unsupported fam-instruction pair, and the trap handler can then emulate the accuracy mode required.

If the bits are in FCSR, then the mode switch itself is exposed to user mode. User mode would not, however, be able to detect emulated versus hardware-supported instructions (by design); that would require some platform-specific code.

TODO: a mechanism for user-mode code to detect which modes are emulated (a CSR? a syscall?), if the supervisor decides to make the emulation visible. That would allow user code to switch to faster software implementations if it chooses to.

TODO: choose which accuracy modes are required. Which accuracy modes should be included is a question outside of my expertise: it would require a literature review of instruction frequency in key workloads, PPA analysis of simple and advanced implementations, and so on.

TODO: reduced accuracy. I don't see why UNIX should be required to emulate some arbitrary reduced-accuracy ML mode. My guess would be that the UNIX Platform Spec requires support for IEEE 754, whereas an arbitrary ML platform requires support for Mode XYZ. Of course, implementations of either platform would be free to support any or all modes that they find valuable.

Compiling for a specific platform means that support for the required accuracy modes is guaranteed (and therefore needs no discovery sequence), while portable code remains free to execute discovery sequences to detect support for alternative accuracy modes (see the sketches below).
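To make the FCSR mechanism above concrete, here is a minimal sketch of selecting a dynamic accuracy mode from C on RISC-V. The field position is an assumption: the proposal does not yet fix which reserved FCSR bits would hold fam, so bits 9:8 (immediately above frm at bits 7:5) are used purely for illustration, and the names `set_fam` and `FAM_IEEE754` are likewise hypothetical.

```c
#include <stdint.h>

/* HYPOTHETICAL field position: the proposal does not yet fix where in
 * FCSR the fam bits would live; bits 9:8, just above frm (bits 7:5),
 * are assumed here purely for illustration. */
#define FAM_SHIFT   8
#define FAM_MASK    (0x3u << FAM_SHIFT)

#define FAM_IEEE754 0u  /* default mode 00: full IEEE 754 compliance */

/* Read-modify-write fcsr to select a dynamic accuracy mode. */
static inline void set_fam(uint32_t mode)
{
    uint32_t fcsr;
    __asm__ volatile ("csrr %0, fcsr" : "=r"(fcsr));
    fcsr = (fcsr & ~FAM_MASK) | ((mode << FAM_SHIFT) & FAM_MASK);
    __asm__ volatile ("csrw fcsr, %0" : : "r"(fcsr));
}
```

Because the field lives in FCSR alongside frm and fflags, a context switch saves and restores it for free, which is part of the appeal of this encoding.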
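And a sketch of the discovery sequence mentioned above, for portable code probing an alternative accuracy mode. It assumes an unsupported fam-instruction pair is reported to user mode as SIGILL; on a UNIX platform the supervisor would be required to emulate instead, so this probe only detects modes that are neither implemented nor emulated, and it cannot answer the separate TODO of distinguishing emulation from hardware. `set_fam` and `FAM_IEEE754` are the hypothetical helpers from the previous sketch.

```c
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>

static sigjmp_buf probe_env;

static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);   /* unsupported fam-instruction pair */
}

/* Probe whether a given accuracy mode executes without trapping:
 * select it, then run a representative FP operation under it. */
static bool fam_mode_usable(uint32_t mode)
{
    struct sigaction sa = {0}, old;
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old);

    bool ok = false;
    if (sigsetjmp(probe_env, 1) == 0) {
        set_fam(mode);
        volatile double x = 1.0, y = 3.0, z;
        z = x / y;              /* fdiv.d executed under the probed mode */
        (void)z;
        ok = true;
    }
    set_fam(FAM_IEEE754);       /* always restore the default mode */
    sigaction(SIGILL, &old, NULL);
    return ok;
}
```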