# Letter regarding ISAMUX / NS

* [Full revision history](https://git.libre-soc.org/?p=libreriscv.git;a=history;f=openpower/isans_letter.mdwn)
* Revision 0.0 draft: 03 Mar 2020
* Revision 0.1 addw review: 16 Apr 2020
* Revision 0.9 pre-final: 18 Apr 2020
* Revision 0.91 mention dual ISA: 22 Apr 2020
* Revision 0.92 mention countdown idea: 22 Apr 2020
* Revision 0.93 illegal instruction trap: 27 Apr 2020

## Why has Libre-SOC chosen PowerPC?

For a hybrid CPU-VPU-GPU, intended for mass-volume adoption in tablets,
netbooks, chromebooks and industrial embedded (SBC) systems, our choice
was between Nyuzi, MIAOW, RISC-V, PowerPC, MIPS and OpenRISC.

Of all the options, the PowerPC architecture is the most complete and by
far the most mature. It also has deeper adoption by Linux distributions.

Following IBM's release of the Power Architecture instruction set to the
Linux Foundation in August 2019, the barrier to using it is no higher than
that of using RISC-V. We are encouraged that the OpenPOWER Foundation is
supportive of what we are doing and is helping, e.g. by putting us in touch
with people who can help us.

## Summary

* We propose the standardisation of the way that the PowerPC Instruction
  Set Architecture (PPC ISA) is extended, enabling many different flavours
  within a well-supported family to co-exist, long-term, without conflict,
  right across the board.
* This is about more than just our project. Our proposals will facilitate
  the use of PPC in novel or niche applications without breaking the PPC
  ISA into incompatible islands.
* PPC will gain a competitive market advantage by removing the need
  for separate VPU or GPU functions in RTL or ASICs, thus enabling
  lower-cost systems. Libre-SOC's project is to extend PPC to integrate
  the GPU and VPU functionality directly as part of the PPC ISA (example:
  Broadcom VideoCore IV being based around extensions to an ARC core).
* Libre-SOC's extensions will be easily adopted, as the standard GNU/Linux
  distributions will very deliberately run unmodified on our ISA,
  including full compatibility with illegal instruction trap requirements.

## One CPU multiple ISAs

This is a quick overview of how we would like to add the changes
that we are proposing to the PowerPC instruction set (ISA). It is based on
an Open Standardisation of the way that existing "mode switches",
already found in the POWER instruction set, are added:

* FPSCR's "NI" bit, setting non-IEEE754 FP mode
* MSR's "LE" bit (and associated HILE bit), setting little-endian mode
* MSR's "SF" bit, setting either 32-bit or 64-bit mode
* PCR's "compatibility" bits 60-62, setting V2.05, V2.06 or V2.07 mode

[It is well-noted that unless the relevant "mode switch" bit is set, any
alternative (additional) instructions (and functionality) are completely
inaccessible, and will result in "illegal instruction" traps being thrown.
This is recognised as being critically important.]

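
The gating behaviour described above can be illustrated with a minimal Python sketch. It is a toy model, not POWER decode logic: the mode bit `MODE_GPU`, the opcode table `REQUIRED_MODE` and the `decode` function are hypothetical names invented purely for illustration.

```python
class IllegalInstruction(Exception):
    """Stands in for the hardware illegal instruction trap."""


# Hypothetical mode bits, chosen purely for illustration.
MODE_BASE = 0        # no extra mode bit required
MODE_GPU = 1 << 0    # assumed bit that unlocks the additional instructions

# Hypothetical opcode table: opcode name -> mode bits it requires.
REQUIRED_MODE = {
    "add": MODE_BASE,
    "atan2": MODE_GPU,
}


def decode(opcode, mode_bits):
    """Return the opcode if it is legal under mode_bits, otherwise trap."""
    if opcode not in REQUIRED_MODE:
        raise IllegalInstruction(opcode)
    required = REQUIRED_MODE[opcode]
    # Every required bit must be set, or the instruction is inaccessible.
    if mode_bits & required != required:
        raise IllegalInstruction(opcode)
    return opcode
```
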
These bits effectively create multiple, incompatible, run-time switchable ISAs
within one CPU. They are selectable for the needs of the individual
program (or OS) being run.

Each of these bits is set by an instruction and, once set, radically
changes the entire behaviour and characteristics of subsequent instructions.

With these (and other) long-established precedents already in POWER,
there is essentially nothing conceptually new about what we
propose: we simply seek that the process by which such "switching" is
added is formalised and standardised, such that we (and others, including
IBM itself) have a clear, well-defined, standards-non-disruptive, atomic
and non-intrusive path to extend the POWER ISA for use in markets that
it presently cannot enter.

We advocate that some of the "mode-setting" (escape-sequencing) bits be
binary encoded and some unary encoded, and that some space be marked for
"official" use, some "experimental", some "custom" and some "reserved".
We propose that the available space in a suitably-chosen SPR be formalised,
and recommend that the OpenPOWER Foundation be given the IANA-like role of
atomically allocating mode bits.

The IANA-like atomic role ensures that every new PCR mode bit allocation is unique world-wide. In combination with a mandatory illegal instruction exception to be thrown on any system not supporting any given mode, the opportunity exists for all systems to trap and emulate all other systems and thus retain some semblance of interoperability. (*Contrast this with either allocating the same mode bit(s) to two (or more) designers, or not making illegal instruction exceptions mandatory: binary interoperability becomes unachievable and the result is irrevocable damage to POWER's reputation.*)

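
As a rough illustration of what "atomic" allocation means here, the following Python sketch models a central registry that never hands the same bit to two parties. Everything in it is an assumption made for the example: the class name `ModeBitRegistry`, the 6-bit field and the split between "official" and "experimental" ranges do not come from any specification, and a lock merely stands in for whatever process the allocating body would actually follow.

```python
import threading


class ModeBitRegistry:
    """Toy model of a single, central allocator for mode bits."""

    # Hypothetical partition of a small field into allocatable categories.
    # "custom" and "reserved" space is deliberately not handed out here.
    RANGES = {
        "official": range(0, 4),
        "experimental": range(4, 6),
    }

    def __init__(self):
        self._lock = threading.Lock()
        self._owners = {}  # bit index -> owner name

    def allocate(self, category, owner):
        """Atomically hand out the lowest free bit in the given category."""
        with self._lock:
            for bit in self.RANGES[category]:
                if bit not in self._owners:
                    self._owners[bit] = owner
                    return bit
        raise RuntimeError(f"no free {category} bits")

    def owner_of(self, bit):
        """Return the owner of a bit, or None if it is unallocated."""
        return self._owners.get(bit)
```
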
We also advocate considering the reservation of some bits as a "countdown", where the new mode is enabled only for a certain *number* of instructions. This avoids an explicit need to "flip back", reducing binary code size. Note that it is not a good idea to let the counter cross a branch or other change in PC: an illegal instruction trap should be thrown if this is attempted. However, traps and exceptions themselves will need to save (and restore) the countdown, just as the rest of the PCR and other mode-switching bits need to be saved.

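
The countdown idea can be sketched as a small Python model. This is only one possible reading of the paragraph above: the `Core` class, its method names and the choice to clear the countdown on trap entry are assumptions made for illustration, not part of the proposal.

```python
class IllegalInstruction(Exception):
    """Stands in for the hardware illegal instruction trap."""


class Core:
    """Toy core with a countdown-limited alternative mode."""

    def __init__(self):
        self.countdown = 0  # instructions remaining in the alternative mode
        self._saved = []    # countdown values saved across traps

    def enter_mode(self, n):
        """Enable the alternative mode for the next n instructions."""
        self.countdown = n

    def step(self, is_branch=False):
        """Execute one instruction; return True if it ran in the new mode."""
        active = self.countdown > 0
        if active and is_branch:
            # The counter must not cross a change in PC.
            raise IllegalInstruction("branch while countdown is active")
        if active:
            self.countdown -= 1
        return active

    def trap_entry(self):
        """Save the countdown so the handler runs in the base mode (assumed)."""
        self._saved.append(self.countdown)
        self.countdown = 0

    def trap_return(self):
        """Restore the countdown that was active when the trap occurred."""
        self.countdown = self._saved.pop()
```
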
Instructions that we need to add, which are a normal part of GPUs,
include ATAN2, LOG, NORMALISE, YUV2RGB, Khronos Compliance FP mode
(different from both IEEE754 and "NI" mode), and many more. Many of
these may turn out to be useful in a wider context: they however need
to be fully isolated behind "mode-setting" before being in any way
considered for Standards-track formal adoption.

Some mode-setting bits are privileged, i.e. can only be set by
the kernel (e.g. 32 or 64 bit mode). Most of the escape sequences that we
propose will be (have to be) usable without the overhead of an expensive
system call (because some of the instructions needed will be
in extremely tight inner loops).

# About Libre-SOC Commercial Project

The Libre-SOC Commercial Product is a hybrid CPU-GPU-VPU intended for
mass-volume production. There is no separate GPU, because the CPU
*is* the GPU. There is no separate VPU, because the CPU *is* the VPU.
There is not even a separate pipeline: the CPU pipelines *are* the
GPU and VPU pipelines.

Closest equivalents include the ARC core (which has VPU extensions and
3D extensions in the form of Broadcom's VideoCore IV) and the ICubeCorp
IC3128. Both are considered "hybrid" CPU-GPU-VPU processors.

"Normal" Commercial GPUs are entirely separate processors. The development
cost and complexity purely in terms of Software Drivers alone is immense.
We reject that approach (and as a small team we do not have the resources
anyway).

With the project being Libre - not proprietary and secretive and never
to be published, ever - it is no good having the extensions as "custom"
because "custom" is specifically for the cases where the augmented
toolchain is never, under any circumstances, published and made public by
the proprietary company (and would never be accepted upstream anyway).
For commercial reasons, Libre-SOC is the total opposite of this
proprietary, secretive approach.

Therefore, to meet our business objectives:

* As shown by Nyuzi and Larrabee, although ideally suited to high
  performance compute tasks, a "traditional" general-purpose full
  IEEE754-compliant Vector ISA (such as that in POWER9) is not an adequate
  basis for a commercially competitive GPU. Nyuzi's conclusion is that
  using such general-purpose Vector ISAs results in reaching only 25%
  of the performance of current commercial-grade GPUs (or requiring a
  4-fold increase in power consumption to achieve parity).
* We are not going the "traditional" (separate custom GPU) route because
  it is not practical for a new team to design hardware and spend 8+
  man-years on massively complex inter-processor driver development as well.
* We cannot meet our objectives with a "custom extension" because the
  financial burden on our team to maintain a total hard fork of not just
  toolchains, but also entire GNU/Linux Distros, is highly undesirable,
  and completely impractical (we know for certain that Red Hat would
  strongly object to any efforts to hard-fork Fedora).
* We could invent our own custom GPU instruction set (or use and extend an existing one, to save a man-decade on toolchain development). However, even switching over to that "Dual ISA" GPU instruction set in the next clock cycle *still requires a PCR modeswitch bit* in order to avoid needing a full Inter-Processor Bus Architecture like on "traditional" GPUs.
* If extending any instruction set, rather than have a Dual ISA (which needs the PCR modeswitch bit to access it), we would rather extend POWER.
* We cannot "go ahead anyway" because to do so would be highly irresponsible
  and cause massive disruption to the POWER community.

With all impractical options eliminated, the only remaining responsible
option is to extend the POWER ISA in an atomically-managed (IANA-style)
formal fashion, whilst (critically and absolutely essentially) always
providing a PCR compatibility mode that is fully POWER compliant, including
all illegal instruction traps.