# Resolving ISA conflicts and providing a pain-free RISC-V Standards Upgrade Path

In a lengthy thread that ironically was full of conflict (itself indicative
of the future direction in which RISC-V will go if the issue is left
unresolved), it was noted that multiple Custom Extensions are permitted
free rein to introduce global binary-encoding conflict, with no means of
resolution described or endorsed by the RISC-V Standard: a practice that
has known disastrous and irreversible consequences for any architecture
that permits it (1).

Much later on in the discussion it was realised that there is also no way
within the current RISC-V Specification to transition to improved versions
of the standard, regardless of whether the fixes are absolutely critical
show-stoppers or whether they are just keeping the standard up-to-date (2).

With no transition path there is guaranteed to be tension and conflict
within the RISC-V Community over whether revisions should be made:
should existing legacy designs be prioritised, mutually exclusively, over
future designs? What happens during the transition period is absolute
chaos, with the compiler toolchain, software ecosystem and ultimately
the end-users bearing the full brunt of the impact. If several
overlapping revisions are required that have not yet transitioned out
of use (which could take well over two decades to occur) the situation
becomes disastrous for the credibility of the entire RISC-V ecosystem.

It was also pointed out that Compliance is an extremely important factor
to take into consideration, and that Custom Extensions (being optional)
effectively and quite reasonably fall entirely outside of the scope of
Compliance Testing (3). At this point in the discussion, however, the
stark problem that the *mandatory* RISC-V Specification also faces had
not yet been noted: there is no transitional way to bring in
show-stopping critical alterations.

To put this into perspective, just taking into account hardware costs
alone: with production mask charges for 28nm being around USD $1.5m, and
engineering development costs and licensing of RTLs for peripherals
being of a similar magnitude, no manufacturer is going to back away
from selling a "flawed" or "legacy" product (whether it complies with
the RISC-V Specification or not) without a bitter fight.

It was also pointed out that there will be significant software tool
maintenance costs for manufacturers, meaning that the probability is
extremely high that they will refuse to shoulder such costs, and
will publish and continue to publish (and use) hopelessly out-of-date
unpatched tools. This practice is well-known to result in security
flaws going unpatched, with one of many immediate undesirable consequences
being that product in extremely large volume gets discarded into landfill.

Any and all of the issues that were discussed, and all of those that
were not, can be avoided by providing a forwards and backwards
compatible transition path between the current and future *mandatory*
parts of revisions of the RISC-V ISA Standard.

The rest of the discussion - indicative as it was of the stark mutually
exclusive gap being faced by the RISC-V ISA Standard given that it does
not cope with the problem - was split between two clear
camps: one that wanted things to remain as they are, and another that
made efforts to point out that the consequences of not taking action
are clearly extreme and irreversible (which, unfortunately, given the
severity, some of the first group were unable to believe, despite there
being clear historical precedent for the same mistake being made in
other architectures).

However, after a significant amount of time, certain clear requirements came
out of the discussion:

* Any proposal must be a minimal change with minimal (or zero) impact
* Any proposal should place no restriction on existing or future
  ISA encoding space
* Any proposal should take into account that there are existing implementors
  of the (yet to be finalised but still "partly frozen") Standard who may
  resist, for financial investment reasons, efforts to make any change
  (at all) that could cost them immediate short-term profits.

Several proposals were put forward (and some are still under discussion):

* "Do nothing": problem is not severe: no action needed.
* "Do nothing": problem is out-of-scope for RISC-V Foundation.
* "Do nothing": problem complicates Compliance Testing (and is out of scope)
* "MISA": the MISA CSR enables and disables extensions already: use that
* "MISA-like": a new CSR which switches in and out new encodings
  (without destroying state)
* "mvendorid/marchid WARL": switching the entire "identity" of a machine
* "ioctl-like": an OO proposal based around the Linux kernel "ioctl" system.

Each of these will be discussed below in their own sections.

# Do nothing (no problem exists)

TBD (basically not an option).

There were several solutions offered that fell into this category.
A few of them are listed in the introduction; more are listed below,
and it was exhaustively (and exhaustingly) established that none of
them are workable.

Initially it was pointed out that Fabless Semiconductor companies could
simply license multiple Custom Extensions and a suitable RISC-V core, and
modify them accordingly. The Fabless Semi Company would be responsible
for paying the NREs on re-developing the test vectors (as the extension
licensers would be extremely unlikely to do that without payment), and
given that said Companies have an "integration" job to do, it would
be reasonable to expect them to have such additional costs as well.

The costs of this approach were outlined and discussed as being
disproportionate and extreme compared to the actual likely cost of
licensing the Custom Extensions in the first place. Additionally it
was pointed out that not only hardware NREs would be involved but
custom software tools (compilers and more) would also be required
(and maintained separately, on the basis that upstream would not
accept them except under extreme pressure, and then only with
prejudice).

All similar schemes involving customisation of the custom extensions
were likewise rejected, but not before the customisation process was
mistakenly conflated with the *normal* integration process of developing
a custom processor (Bus Architectures, Cache layouts, peripheral layouts).

The most compelling hardware-related reason (excluding the severe impact on
the software ecosystem) for rejecting the customisation-of-customisation
approach was the case where Extensions were using an instruction encoding
space (48-bit, 64-bit) *greater* than that which the chosen core could
cope with (32-bit, 48-bit).

Overall, none of the options presented were feasible, and, in addition,
even if they were followed through, they would still result in the failure
of the RISC-V ecosystem due to global conflicting ISA binary-encoding
meanings (PowerPC's AltiVec / SPE nightmare).

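The encoding-width mismatch described in this section can be made concrete
with a minimal sketch. It assumes the variable-length instruction encoding
convention from the RISC-V user-level specification, where the low bits of
the first 16-bit parcel indicate the instruction length. The function names
`insn_length_bits` and `core_can_decode` are hypothetical and purely
illustrative; they are not taken from the discussion.

```python
def insn_length_bits(first_parcel: int) -> int:
    """Return the instruction length in bits, given the lowest 16-bit parcel.

    Assumption: the length-encoding convention of the RISC-V user-level
    specification. Only the 16/32/48/64-bit cases are modelled here.
    """
    if first_parcel & 0b11 != 0b11:
        return 16  # bits [1:0] != 11: compressed instruction
    if first_parcel & 0b11100 != 0b11100:
        return 32  # bits [4:2] != 111: standard 32-bit instruction
    if first_parcel & 0b111111 == 0b011111:
        return 48  # bits [5:0] == 011111
    if first_parcel & 0b1111111 == 0b0111111:
        return 64  # bits [6:0] == 0111111
    raise ValueError("length encodings beyond 64 bits are not modelled here")


def core_can_decode(first_parcel: int, core_max_bits: int) -> bool:
    """Hypothetical check: does the instruction fit the core's widest format?"""
    return insn_length_bits(first_parcel) <= core_max_bits


if __name__ == "__main__":
    # A Custom Extension using 48-bit encodings, licensed for a core whose
    # decoder stops at 32 bits, cannot be integrated without redesign.
    for parcel in (0x0013, 0x001F, 0x003F):
        print(hex(parcel), insn_length_bits(parcel), core_can_decode(parcel, 32))
```

In this toy model no amount of re-assigning opcodes helps when the check
fails: the extension's instructions are simply wider than anything the core
can fetch and decode.
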
# Do nothing (out of scope)

TBD (basically, may not be RV Foundation's "scope", still results in
problem, so not an option)

This was one of the first arguments presented: that the RISC-V Foundation
considers Custom Extensions to be "out of scope", and that "it's not their
problem, therefore there isn't a problem".

The logical errors in this argument were quickly enumerated: namely
that the RISC-V Foundation is not in control of the use-cases, such
that binary-encoding conflict is a hundred percent guaranteed to occur, and
a hundred percent guaranteed to occur in *commodity* hardware where
Debian, Fedora, SUSE and other distros will be hardest hit by the
resultant chaos, and that will just be the more "visible" aspect of
the underlying problem.

# Do nothing (Compliance too complex, therefore out of scope)

TBD (basically, may not be RV Foundation's "scope", still results in
problem, so not an option)

Two diametrically-opposed yet equally valid arguments exist here:

* Whilst Compliance testing of Custom Extensions is definitely legitimately
  out of scope, Compliance testing of simultaneous legacy (old revisions of
  ISA Standards) and current (new revisions of ISA Standard) definitely
  is not. Efforts to reduce *Compliance Testing* complexity are therefore
  "Compliance Tail Wagging Standard Dog".
* Beyond a certain threshold, complexity of Compliance Testing is so
  burdensome that it risks outright rejection of the entire Standard.

Meeting these two diametrically-opposed perspectives requires that the
solution be very, very simple.

# MISA

TBD, basically MISA not suitable

MISA permits extensions to be disabled by masking out the relevant bit.
Hypothetically it could be used to disable one extension, then enable
another that happens to use the same binary encoding.

*However*:

* MISA Extension disabling is permitted (optionally) to **destroy**
  the state information. Thus it is totally unsuitable for cases
  where instructions from different Custom extensions are needed in
  quick succession.
* MISA was only designed to cover Standard Extensions.
* There is nothing to prevent multiple Extensions being enabled
  that wish to simultaneously interpret the same binary encoding.

Overall, whilst the MISA concept is a step in the right direction it's
a hundred percent unsuitable for solving the problem.

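The first two objections can be illustrated with a small toy model. This is
a minimal sketch, assuming only that MISA holds one bit per single-letter
extension (bit 0 for "A" through bit 25 for "Z") and that an implementation
takes the permitted option of discarding extension state on disable. The
class `MisaToyHart` and its methods are hypothetical names invented for this
example.

```python
def misa_bit(letter: str) -> int:
    """Bit mask for a single-letter extension: 'A' is bit 0 ... 'Z' is bit 25."""
    return 1 << (ord(letter.upper()) - ord("A"))


class MisaToyHart:
    """Toy model only: real harts keep this state in registers, not dicts."""

    def __init__(self, extensions: str):
        self.misa = 0
        for letter in extensions:
            self.misa |= misa_bit(letter)
        self.ext_state = {}  # hypothetical per-extension architectural state

    def enabled(self, letter: str) -> bool:
        return bool(self.misa & misa_bit(letter))

    def set_state(self, letter: str, value) -> None:
        if not self.enabled(letter):
            raise RuntimeError(f"extension {letter} is disabled")
        self.ext_state[letter] = value

    def disable(self, letter: str) -> None:
        self.misa &= ~misa_bit(letter)
        # Assumption: this implementation uses the (permitted) option of
        # destroying the extension's state when it is switched off.
        self.ext_state.pop(letter, None)

    def enable(self, letter: str) -> None:
        self.misa |= misa_bit(letter)


if __name__ == "__main__":
    hart = MisaToyHart("IMF")
    hart.set_state("F", [1.5] * 32)
    hart.disable("F")
    hart.enable("F")
    # The state written before the disable is gone and must be re-initialised.
    print(hart.ext_state.get("F"))
    # All non-standard extensions share the single "X" bit, so MISA alone
    # cannot say *which* Custom Extension owns a given encoding.
    print(hex(misa_bit("X")))
```

Switching between two conflicting Custom Extensions in an inner loop would
repeat the disable/enable round trip, and hence the state loss, on every
iteration in such an implementation.
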
# MISA-like

TBD, basically same as mvendorid/marchid WARL except needs an extra CSR
where mvendorid/marchid doesn't.

# mvendorid/marchid WARL

TBD paraphrase and clarify

> In an earlier part of the thread someone kindly pointed out that MISA
> already switches out entire sets of instructions [which interacts at the
> "decode" phase]. However it was noted after a few days of investigating
> that particular lead that:
>
> * MISA Extension disabling is permitted (optionally) to DESTROY the state
>   information (which means that it *has* to be re-initialised just to be
>   safe... mistake in the standard, there), and
> * MISA was only designed to cover Standard Extensions.
>
> So the practice of switching extensions in and out - and the resultant
> "disablement" and "enablement" at the *instruction decode phase* is
> *already* a hard requirement as part of conforming with the present
> RISC-V Specification.
>
> Around the same MISA discussion, someone else also kindly pointed out
> that one solution to the heavyweight nature of the switching would
> be to deliberately introduce a pipeline stall whilst the switching is
> occurring: I can see the sense in that approach, even if I don't know the
> full details of what each implementor might choose to do. They may even
> choose two, or three, or N pipeline stalls: it really doesn't matter,
> as it's an implementors' choice (and problem to solve).
>
> So yes it's pretty heavy-duty... and also already required.
>
> For the case where "legacy" variants of the RISC-V Standard are
> backwards-forwards-compatibly supported over a 10-20 year period
> in Industrial and Military/Government-procurement scenarios (so that
> the impossible-to-achieve pressure is off to get the spec ABSOLUTELY
> correct, RIGHT now), nobody would expect a seriously heavy-duty amount
> of instruction-by-instruction switching: it'd be used pretty much once
> and only once at boot-up (or once in a Hypervisor Virtual Machine client)
> and that's it.
>
> I can however foresee instances where implementors would actually
> genuinely want a bank of operations to be carried out using one extension,
> followed immediately by another bank from a (conflicting binary-encoding)
> extension, in an inner loop: Software-defined MPEG / MP4 decode to call
> DCT block decode Custom Extension followed immediately by Custom Video
> Processing Extension followed immediately by Custom DSP Processing
> Extension to do YUV-to-RGB conversion for example is something that
> is clearly desirable. Solving that one would be entiiirely their
> problem... and the RISC-V Specification really really should give them
> the space to do that in a clear-cut unambiguous way.

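As a rough illustration of what "switching the entire identity of a machine"
might look like at the decode phase, here is a minimal sketch. It is not the
proposal itself, whose details are still TBD above. It assumes that writing a
(mvendorid, marchid) pair selects which decoder table is active, and that
WARL means any value may be written but only a legal value is ever read back.
The class `IdentityToyHart`, the identity values and the handler names are
all hypothetical.

```python
CUSTOM_0 = 0b0001011  # major opcode set aside for custom extensions


class IdentityToyHart:
    """Toy decode-phase model keyed on a (mvendorid, marchid) pair."""

    def __init__(self, decoders: dict):
        # decoders maps (mvendorid, marchid) -> {major_opcode: handler_name}
        self.decoders = decoders
        # Assumption: the first table listed is the identity at reset.
        self.identity = next(iter(decoders))

    def write_identity(self, mvendorid: int, marchid: int) -> tuple:
        """WARL-style write: unsupported values are ignored, legal ones stick."""
        if (mvendorid, marchid) in self.decoders:
            self.identity = (mvendorid, marchid)
        return self.identity  # the read-back value is always legal

    def decode(self, insn: int) -> str:
        """Look up the major opcode (low 7 bits) in the active decoder table."""
        return self.decoders[self.identity].get(insn & 0x7F, "illegal")


if __name__ == "__main__":
    hart = IdentityToyHart({
        (0x1, 0x10): {CUSTOM_0: "dct_block_decode"},  # hypothetical vendor A
        (0x2, 0x20): {CUSTOM_0: "yuv_to_rgb"},        # hypothetical vendor B
    })
    insn = 0x0000000B  # the same bit pattern under both identities
    print(hart.decode(insn))
    hart.write_identity(0x2, 0x20)
    print(hart.decode(insn))
```

Unlike the MISA toggle, nothing in this model requires extension state to be
discarded when the identity changes, which is the property the inner-loop
use-case in the quote depends on.
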
# ioctl-like

TBD - [[ioctl]] for full details, summary kept here

# Discussion and analysis

TBD

# Conclusion

TBD

# Conversation Excerpts

The following conversation excerpts are taken from the ISA-dev discussion

## (1) Albert Calahan on SPE / AltiVec conflict in PowerPC

> Yes. Well, it should be blocked via legal means. Incompatibility is
> a disaster for an architecture.
>
> The viability of PowerPC was badly damaged when SPE was
> introduced. This was a vector instruction set that was incompatible
> with the AltiVec instruction set. Software vendors had to choose,
> and typically the choice was "neither". Nobody wants to put in the
> effort when there is uncertainty and a market fragmented into
> small bits.
> Note how Intel did not screw up. When SSE was added, MMX remained.
> Software vendors could trust that instructions would be supported.
> Both MMX and SSE remain today, in all shipping processors. With very
> few exceptions, Intel does not ship chips with missing functionality.
> There is a unified software ecosystem.
>
> This goes beyond the instruction set. MMU functionality also matters.
> You can add stuff, but then it must be implemented in every future CPU.
> You can not take stuff away without harming the architecture.

## (2) Luke Kenneth Casson Leighton on Standards backwards-compatibility

> For the case where "legacy" variants of the RISC-V Standard are
> backwards-forwards-compatibly supported over a 10-20 year period in
> Industrial and Military/Government-procurement scenarios (so that the
> impossible-to-achieve pressure is off to get the spec ABSOLUTELY
> correct, RIGHT now), nobody would expect a seriously heavy-duty amount
> of instruction-by-instruction switching: it'd be used pretty much once
> and only once at boot-up (or once in a Hypervisor Virtual Machine
> client) and that's it.

## (3) Allen Baum on Standards Compliance

> Putting my compliance chair hat on: One point that was made quite
> clear to me is that compliance will only test that an implementation
> correctly implements the portions of the spec that are mandatory, and
> the portions of the spec that are optional and the implementor claims
> it is implementing. It will test nothing in the custom extension space,
> and doesn't monitor or care what is in that space.