# FP Accuracy proposal

TODO: writeup
* <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002400.html>
* <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002412.html>

    A natural place for a standard reduced accuracy extension "Zfpacc"
    would be in the reserved bits of FCSR.  It could be treated very
    similarly to how dynamic frm is treated now. Currently, there are 5
    bits of fflags, 3 bits of frm and 24 Reserved bits. The L (decimal
    floating-point) extension will presumably use some, but not all of
    them. I'm unable to find any public proposals for L bit encodings
    in FCSR.

    For reference, frm is treated as follows: Floating-point operations
    use either a static rounding mode encoded in the instruction, or
    a dynamic rounding mode held in frm. Rounding modes are encoded
    as shown in Table 11.1. A value of 111 in the instruction’s rm
    field selects the dynamic rounding mode held in frm. If frm is set
    to an invalid value (101–111), any subsequent attempt to execute
    a floating-point operation with a dynamic rounding mode will raise
    an illegal instruction exception.
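
The frm rule quoted above can be sketched as a small decision function. This is a minimal illustration, not spec text: the names `effective_rounding_mode` and `IllegalInstruction` are made up here, and the handling of a static rm of 101 or 110 is an assumption, since the quoted passage only covers invalid values held in frm.

```python
class IllegalInstruction(Exception):
    """Stands in for the illegal instruction exception."""


DYN = 0b111                          # rm value selecting the dynamic mode in frm
INVALID_FRM = (0b101, 0b110, 0b111)  # the invalid frm values (101-111)


def effective_rounding_mode(instr_rm: int, frm: int) -> int:
    """Return the rounding mode a floating-point operation would use."""
    if instr_rm == DYN:
        # Dynamic rounding: the mode comes from frm, which must be valid.
        if frm in INVALID_FRM:
            raise IllegalInstruction(f"frm holds invalid value {frm:03b}")
        return frm
    if instr_rm in (0b101, 0b110):
        # Assumption: reserved static encodings are rejected as well.
        raise IllegalInstruction(f"reserved static rm {instr_rm:03b}")
    # Static rounding: frm is not consulted, even if it holds an invalid value.
    return instr_rm
```

Note that an invalid frm only matters once an instruction actually asks for dynamic rounding, which is the behaviour the fam proposal mirrors.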

    Let's say that we wish to support up to 4 accuracy modes, which
    takes 2 'fam' bits.  The default would be IEEE-compliant, encoded
    as 00.  This means that all current hardware, which leaves the
    reserved bits at zero, would be compliant with the default mode.
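
To make the layout concrete, here is a sketch of the FCSR fields with a 2-bit fam field added. The fflags (bits 4:0) and frm (bits 7:5) positions are the existing ones; placing fam at bits 9:8 is purely an assumption for illustration, as the discussion does not fix a position, and the helper names are hypothetical.

```python
# (shift, mask) pairs for each FCSR field.
FFLAGS = (0, 0x1F)  # bits 4:0, accrued exception flags
FRM = (5, 0x7)      # bits 7:5, dynamic rounding mode
FAM = (8, 0x3)      # bits 9:8, HYPOTHETICAL placement of the accuracy mode

FAM_IEEE = 0b00     # default mode: IEEE-compliant


def get_field(fcsr: int, field: tuple) -> int:
    """Extract one field from a 32-bit FCSR value."""
    shift, mask = field
    return (fcsr >> shift) & mask


def set_field(fcsr: int, field: tuple, value: int) -> int:
    """Return fcsr with one field replaced, leaving the others untouched."""
    shift, mask = field
    if value & ~mask:
        raise ValueError("value does not fit in field")
    return (fcsr & ~(mask << shift)) | (value << shift)
```

Because the default is encoded as 00, any FCSR value written by existing software (reserved bits zero) reads back as `FAM_IEEE`, which is why current hardware would already comply with the default mode.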

    Unsupported modes would cause a trap, allowing emulation where
    traps are supported. Emulation of unsupported modes would be
    required for Unix platforms.

    As with frm, an implementation can choose to support any subset
    of dynamic fam-instruction pairs. Executing an unsupported
    fam-instruction pair would raise an illegal-instruction trap.
    The implementation can then emulate the accuracy mode required.
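
The trap-and-emulate flow described above might look roughly like the following simulation. Everything here is hypothetical: the `FAM_REDUCED` encoding, the set of pairs supported in hardware, and the choice to emulate a reduced-accuracy mode by simply computing the full-precision result (assuming that a more accurate answer is acceptable under a reduced-accuracy mode).

```python
import math


class IllegalInstruction(Exception):
    """Stands in for the illegal-instruction trap."""


FAM_IEEE = 0b00
FAM_REDUCED = 0b01  # hypothetical reduced-accuracy mode

# Hypothetical hardware: the (instruction, fam) pairs it implements natively.
HW_SUPPORTED = {("fsqrt", FAM_IEEE), ("fdiv", FAM_IEEE), ("fdiv", FAM_REDUCED)}


def compute(op, *args):
    """Reference full-precision result for the two example operations."""
    if op == "fsqrt":
        return math.sqrt(args[0])
    if op == "fdiv":
        return args[0] / args[1]
    raise ValueError(f"unknown operation {op}")


def hw_execute(op, fam, *args):
    """Execute in 'hardware', trapping on an unsupported pair."""
    if (op, fam) not in HW_SUPPORTED:
        raise IllegalInstruction((op, fam))
    return compute(op, *args)


def execute(op, fam, *args):
    """Return (result, emulated): try hardware first, emulate on a trap."""
    try:
        return hw_execute(op, fam, *args), False
    except IllegalInstruction:
        # Trap handler path: emulate the requested accuracy mode in software.
        return compute(op, *args), True
```

From the caller's side the result is the same either way; only the `emulated` flag, which real user-mode code would not see, tells the two paths apart.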

    There would be a mechanism (a CSR? a syscall?) for user-mode code
    to detect which modes are emulated, if the supervisor decides to
    make the emulation visible. That would allow user code to switch
    to faster software implementations if it chooses to.

    If the bits are in FCSR, then the switch itself would be exposed
    to user mode.  User mode would not, however, be able to tell
    emulated instructions from hardware-supported ones (by design).
    Detecting that would require some platform-specific code.

    Now, which accuracy modes should be included is a question outside
    of my expertise and would require a literature review of instruction
    frequency in key workloads, PPA analysis of simple and advanced
    implementations, etc.  (Thanks for the insights, Mitch!)

    > emulation of unsupported modes would be required for unix platforms.

    I don't see why Unix should be required to emulate some arbitrary
    reduced accuracy ML mode.  My guess would be that the Unix Platform
    Spec requires support for IEEE, whereas an arbitrary ML platform
    requires support for Mode XYZ.  Of course, implementations of either
    platform would be free to support any/all modes that they find
    valuable. Compiling for a specific platform means that support for
    the required accuracy modes is guaranteed (and therefore does not
    need discovery sequences), while allowing portable code to execute
    discovery sequences to detect support for alternative accuracy modes.
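
The split between platform guarantees and discovery can be illustrated with a small simulation. The discussion does not define what a discovery sequence looks like, so the probe used here (attempt one operation under the mode and catch the trap) is an assumption, as are the names `Platform`, `probe`, `pick_mode` and the `FAM_XYZ` encoding.

```python
class IllegalInstruction(Exception):
    """Stands in for the illegal-instruction trap."""


FAM_IEEE = 0b00
FAM_XYZ = 0b01  # hypothetical encoding for "Mode XYZ"


class Platform:
    """A platform spec: modes it requires plus modes an implementation adds."""

    def __init__(self, required, optional=()):
        self.required = frozenset(required)
        self.supported = self.required | frozenset(optional)

    def fp_add(self, fam, a, b):
        """One FP operation under a given accuracy mode; traps if unsupported."""
        if fam not in self.supported:
            raise IllegalInstruction(fam)
        return a + b


def probe(platform, fam):
    """Hypothetical discovery sequence: try the mode and catch the trap."""
    try:
        platform.fp_add(fam, 1.0, 1.0)
        return True
    except IllegalInstruction:
        return False


def pick_mode(platform, preferred, guaranteed=frozenset()):
    """Use the preferred mode if guaranteed or discovered, else fall back to IEEE."""
    if preferred in guaranteed:
        return preferred  # compiled for this platform: no discovery needed
    return preferred if probe(platform, preferred) else FAM_IEEE


unix = Platform(required={FAM_IEEE})
ml = Platform(required={FAM_XYZ}, optional={FAM_IEEE})
```

Portable code would call `pick_mode(unix, FAM_XYZ)` and fall back to IEEE, while code built specifically for the ML platform would pass `guaranteed=ml.required` and skip the probe entirely.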