1 # Resolving ISA conflicts and providing a pain-free RISC-V Standards Upgrade Path
2
3 ## Executive Summary
4
5 This proposal is a non-invasive, backwards-compatible change: mvendorid
6 and marchid being read-only becomes a formal declaration that an
7 architecture has no Custom Extensions, and both CSRs may instead be WARL
8 in order to support multiple simultaneous architectures on the same
9 processor (or per hart or harts). This not only permits backwards and
10 forwards compatibility with existing implementations of the RISC-V
11 Standard, and seamless transitions to future versions of it (something
12 that is not possible at the moment), but also fixes the problem of clashes
13 in Custom Extension opcodes on a global, permanent and ongoing basis.
14
15 Summary of impact and benefits:
16
17 * Implementation impact for existing implementations (even though
18 the Standard is not finalised) is zero.
19 * Impact for future implementations compliant with (only one) version of the
20 RISC-V Standard is zero.
21 * The benefit for implementations complying with (one or more) versions
22 of the RISC-V Standard is increased customer acceptance due to
23 a smooth upgrade path at the customer's pace and initiative vis-a-vis
24 legacy proprietary software.
25 * Benefits for implementations deploying multiple Custom Extensions
26 are a massive reduction in NREs and the hugely reduced ongoing software
27 toolchain maintenance costs plus the benefit of having security updates
28 from upstream software sources due to
29 *globally unique identifying information* resulting in zero binary
30 encoding conflicts in the toolchains and resultant binaries
31 *even for Custom Extensions*.
32
33 ## Introduction
34
35 In a lengthy thread that ironically was full of conflict indicative
36 of the future direction in which RISC-V will go if left unresolved,
37 multiple Custom Extensions were noted to be permitted free rein to
38 introduce global binary-encoding conflict with no means of resolution
39 described or endorsed by the RISC-V Standard: a practice that has known
40 disastrous and irreversible consequences for any architecture that
41 permits such practices (1).
42
43 Much later on in the discussion it was realised that there is also no way
44 within the current RISC-V Specification to transition to improved versions
45 of the standard, regardless of whether the fixes are absolutely critical
46 show-stoppers or whether they are just keeping the standard up-to-date (2).
47
48 With no transition path there is guaranteed to be tension and conflict
49 within the RISC-V Community over whether revisions should be made:
50 should existing legacy designs be prioritised, mutually exclusively, over
51 future designs? What happens during the transition period is absolute
52 chaos, with the compiler toolchain, software ecosystem and ultimately
53 the end-users bearing the full brunt of the impact. If several
54 overlapping revisions are required that have not yet transitioned out
55 of use (which could take well over two decades to occur) the situation
56 becomes disastrous for the credibility of the entire RISC-V ecosystem.
57
58 It was also pointed out that Compliance is an extremely important factor
59 to take into consideration, and that Custom Extensions (as being optional)
60 effectively and quite reasonably fall entirely outside of the scope of
61 Compliance Testing. At this point in the discussion, however, the stark
62 problem that the *mandatory* RISC-V Specification also faces had not yet
63 been noted: there is no transitional way to bring in show-stopping
64 critical alterations.
65
66 To put this into perspective, just taking into account hardware costs
67 alone: with production mask charges for 28nm being around USD $1.5m,
68 engineering development costs and licensing of RTLs for peripherals
69 being of a similar magnitude, no manufacturer is going to back away
70 from selling a "flawed" or "legacy" product (whether it complies with
71 the RISC-V Specification or not) without a bitter fight.
72
73 It was also pointed out that there will be significant software tool
74 maintenance costs for manufacturers, meaning that the probability will
75 be extremely high that they will refuse to shoulder such costs, and
76 will publish and continue to publish (and use) hopelessly out-of-date
77 unpatched tools. This practice is well-known to result in security
78 flaws going unpatched, with one of many immediate undesirable consequences
79 being that product in extremely large volume gets discarded into landfill.
80
81 **All and any of the issues that were discussed, and all of those that
82 were not, can be avoided by providing a hardware-level runtime-enabled
83 forwards and backwards compatible transition path between *all* parts
84 (mandatory or not) of current and future revisions of the RISC-V ISA
85 Standard.**
86
87 The rest of the discussion (indicative as it was of the stark, mutually
88 exclusive gap faced by the RISC-V ISA Standard, given that it does not
89 cope with the problem) was a debate between two clear camps: one that
90 wanted things to remain as they are, and another that made efforts to
91 point out that the consequences of not taking action are clearly extreme
92 and irreversible. Unfortunately, given the severity, some of the first
93 group were unable to believe this, despite there being clear historical
94 precedent for the exact same mistake being made in other architectures,
95 and the consequences for those architectures being absolutely
96 clear.
97
98 However after a significant amount of time, certain clear requirements came
99 out of the discussion:
100
101 * Any proposal must be a minimal change with minimal (or zero) impact
102 * Any proposal should place no restriction on existing or future
103 ISA encoding space
104 * Any proposal should take into account that there are existing implementors
105 of the (yet to be finalised but still "partly frozen") Standard who may
106 resist, for financial investment reasons, efforts to make any change
107 (at all) that could cost them immediate short-term profits.
108
109 Several proposals were put forward (and some are still under discussion):
110
111 * "Do nothing": problem is not severe: no action needed.
112 * "Do nothing": problem is out-of-scope for RISC-V Foundation.
113 * "Do nothing": problem complicates Compliance Testing (and is out of scope)
114 * "MISA": the MISA CSR enables and disables extensions already: use that
115 * "MISA-like": a new CSR which switches in and out new encodings
116 (without destroying state)
117 * "mvendorid/marchid WARL": switching the entire "identity" of a machine
118 * "ioctl-like": an OO proposal based around the Linux kernel "ioctl" system.
119
120 Each of these will be discussed below in their own sections.
121
122 # Do nothing (no problem exists)
123
124 (Summary: not an option)
125
126 There were several solutions offered that fell into this category.
127 A few of them are listed in the introduction; more are listed below,
128 and it was exhaustively (and exhaustingly) established that none of
129 them are workable.
130
131 Initially it was pointed out that Fabless Semiconductor companies could
132 simply license multiple Custom Extensions and a suitable RISC-V core, and
133 modify them accordingly. The Fabless Semi Company would be responsible
134 for paying the NREs on re-developing the test vectors (as the extension
135 licensers would be extremely unlikely to do that without payment), and
136 given that said Companies have an "integration" job to do, it would
137 be reasonable to expect them to have such additional costs as well.
138
139 The costs of this approach were outlined and discussed as being
140 disproportionate and extreme compared to the actual likely cost of
141 licensing the Custom Extensions in the first place. Additionally it
142 was pointed out that not only hardware NREs would be involved but
143 custom software tools (compilers and more) would also be required
144 (and maintained separately, on the basis that upstream would not
145 accept them except under extreme pressure, and then only with
146 prejudice).
147
148 All similar schemes involving customisation of the custom extensions
149 were likewise rejected, but not before the customisation process was
150 mistakenly conflated with the *normal* integration process of developing
151 a custom processor (Bus Architectures, Cache layouts, peripheral layouts).
152
153 The most compelling hardware-related reason (excluding the severe impact on
154 the software ecosystem) for rejecting the customisation-of-customisation
155 approach was the case where Extensions were using an instruction encoding
156 space (48-bit, 64-bit) *greater* than that which the chosen core could
157 cope with (32-bit, 48-bit).
158
159 Overall, none of the options presented were feasible. In addition,
160 with no clear leadership from the RISC-V Foundation on how to avoid
161 global world-wide encoding conflict, even if they were followed through
162 they would still result in the failure of the RISC-V ecosystem due to
163 irreversible global conflicting ISA binary-encoding meanings (PowerPC's
164 AltiVec / SPE nightmare).
165
166 This is in addition to the case where the RISC-V Foundation wishes to
167 make a critical show-stopping fix to the Standard, post-release,
168 where billions of dollars have been spent on deploying RISC-V in the
169 field.
170
171 # Do nothing (out of scope)
172
173 (Summary: may not be RV Foundation's "scope", still results in
174 problem, so not an option)
175
176 This was one of the first arguments presented: The RISC-V Foundation
177 considers Custom Extensions to be "out of scope"; that "it's not their
178 problem, therefore there isn't a problem".
179
180 The logical errors in this argument were quickly enumerated: namely that
181 the RISC-V Foundation is not in control of the uses to which RISC-V is
182 put, such that public global conflicts in binary-encoding are a hundred
183 percent guaranteed to occur (*outside* of the control and remit of the
184 RISC-V Foundation), and a hundred percent guaranteed to occur in
185 *commodity* hardware where Debian, Fedora, SUSE and other distros will
186 be hardest hit by the resultant chaos, and that will just be the more
187 "visible" aspect of the underlying problem.
188
189 # Do nothing (Compliance too complex, therefore out of scope)
190
191 (Summary: may not be RV Foundation's "scope", still results in
192 problem, so not an option)
193
194 The summary here was that Compliance testing of Custom Extensions is
195 not just out-of-scope, but even if it was taken into account that
196 binary-encoding meanings could change, it would still be out-of-scope.
197
198 However at the time that this argument was made, the impact that
199 revisions to the Standard would have, once billions of dollars worth of
200 (older, legacy) RISC-V hardware had already been deployed, had not yet
201 been fully appreciated.
202
203 Two interestingly diametrically-opposed equally valid arguments exist here:
204
205 * Whilst Compliance testing of Custom Extensions is definitely legitimately
206 out of scope, Compliance testing of simultaneous legacy (old revisions of
207 ISA Standards) and current (new revisions of ISA Standard) definitely
208 is not. Efforts to reduce *Compliance Testing* complexity are therefore
209 "Compliance Tail Wagging Standard Dog".
210 * Beyond a certain threshold, complexity of Compliance Testing is so
211 burdensome that it risks outright rejection of the entire Standard.
212
213 Meeting these two diametrically-opposed perspectives requires that the
214 solution be very, very simple.
215
216 # MISA
217
218 (Summary: MISA not suitable, leads to better idea)
219
220 MISA permits extensions to be disabled by masking out the relevant bit.
221 Hypothetically it could be used to disable one extension, then enable
222 another that happens to use the same binary encoding.
223
224 *However*:
225
226 * MISA Extension disabling is permitted (optionally) to **destroy**
227 the state information. Thus it is totally unsuitable for cases
228 where instructions from different Custom extensions are needed in
229 quick succession.
230 * MISA was only designed to cover Standard Extensions.
231 * There is nothing to prevent multiple Extensions being enabled
232 that wish to simultaneously interpret the same binary encoding.
233 * There is nothing in the MISA specification which permits
234 *future* versions (bug-fixes) of the RISC-V ISA to be "switched in".
235
236 Overall, whilst the MISA concept is a step in the right direction it's
237 a hundred percent unsuitable for solving the problem.
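
The first objection above can be made concrete with a toy model. This is purely an illustrative sketch: the `MisaModel` class, its `set_extension` method and the register name `f0` are hypothetical, and it assumes an implementation that takes the (optional) state-destroying route.

```python
class MisaModel:
    """Toy model of MISA-style enabling where disabling destroys state."""

    def __init__(self):
        self.enabled = {"F": True}
        # Hypothetical per-extension state (a single register, for illustration)
        self.state = {"F": {"f0": 1.5}}

    def set_extension(self, letter, on):
        if not on and self.enabled.get(letter):
            # Assumed behaviour: the implementation discards the state
            self.state[letter] = {}
        self.enabled[letter] = on


hart = MisaModel()
hart.set_extension("F", False)  # switch out to use a conflicting encoding
hart.set_extension("F", True)   # switch back in
print(hart.state["F"])          # the earlier register contents are gone
```

Any scheme that alternates between two Extensions in quick succession would pay this cost on every switch, which is why state preservation becomes a requirement in the proposals that follow.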
238
239 # MISA-like
240
241 (Summary: basically the same as mvendorid/marchid WARL, except that it needs
242 an extra CSR. Along the right lines, but doesn't meet the full requirements)
243
244 Out of the MISA discussion came a "MISA-like" proposal, which would
245 take into account the flaws pointed out by trying to use "MISA":
246
247 * The MISA-like CSR's meaning would be identified by compilers using the
248 mvendor-id/march-id tuple as a compiler target
249 * Each custom-defined bit of the MISA-like CSR would (mutually-exclusively)
250 redirect binary encoding(s) to specific encodings
251 * No Extension would *actually* be disabled: its internal state would
252 be left on (permanently) so that switching of ISA decoding
253 could be done inside inner loops without adverse impact on
254 performance.
255
256 Whilst it was the first "workable" solution it was also noted that the
257 scheme is invasive: it requires an entirely new CSR to be added
258 to the privileged spec (thus making existing implementations redundant).
259 This does not fulfil the "minimum impact" requirement.
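
As a rough illustration of the "MISA-like" behaviour described above, the following sketch models a CSR in which at most one bit may be set, each bit redirecting a shared encoding to a different decoder while every Extension's state stays alive. The class name `MisaLikeCsr`, the bit assignments and the choice to ignore illegal writes are all assumptions made for the example, not part of any proposal text.

```python
class MisaLikeCsr:
    """Hypothetical MISA-like CSR: one bit per decoder, mutually exclusive."""

    def __init__(self, decoders):
        self.decoders = decoders  # bit position -> decode function
        self.value = 0
        # State is kept per Extension and is never cleared by a CSR write
        self.ext_state = {bit: {"issued": 0} for bit in decoders}

    def write(self, value):
        one_hot = value != 0 and (value & (value - 1)) == 0
        if value == 0 or (one_hot and (value.bit_length() - 1) in self.decoders):
            self.value = value
        # Anything else is ignored in this sketch (an assumption)
        return self.value

    def decode(self, encoding):
        if self.value == 0:
            raise ValueError("no decoder selected for this encoding")
        bit = self.value.bit_length() - 1
        self.ext_state[bit]["issued"] += 1
        return self.decoders[bit](encoding)


csr = MisaLikeCsr({0: lambda enc: ("ext_a", enc), 1: lambda enc: ("ext_b", enc)})
csr.write(0b01)
first = csr.decode(0x0B)   # handled by the first Extension
csr.write(0b10)
second = csr.decode(0x0B)  # same encoding, now handled by the second
csr.write(0b11)            # not one-hot: ignored, selection unchanged
```

Note that switching selections leaves `ext_state` untouched, which is the property that makes inner-loop switching plausible.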
260
261 Also of interest: around the same time, an additional discussion was
262 raised that covered the *compiler* side of the same equation. This
263 revolved around using mvendorid-marchid tuples at the compiler level,
264 to be put into assembly output (by gcc), preserving the required
265 *globally* unique identifying information for binutils to successfully
266 turn the custom instruction into an actual binary-encoding (plus
267 binary-encoding of the context-switching information). (**TBD, Jacob,
268 separate page? review this para?**)
269
270 # mvendorid/marchid WARL
271
272 (Summary: the only idea that meets the full requirements. Needs
273 toolchain backup, but only when the first chip is released)
274
275 Coming out of the software-related proposal by Jacob Bachmeyer, which
276 hinged on the idea of a globally-maintained gcc / binutils database
277 that kept and coordinated architectural encodings (curated by the Free
278 Software Foundation), was the idea to quite simply make the mvendorid and
279 marchid CSRs have WARL (writeable) characteristics. Where mvendorid
280 and marchid are read-only, that would be taken to be a Standards-mandatory
281 "declaration" that the architecture has *no* Custom Extensions (and that
282 it conforms precisely to one and only one specific variant of the
283 RISC-V Specification).
284
285 This incredibly simple non-invasive idea has some unique and distinct
286 advantages over other proposals:
287
288 * Existing designs - even though the specification is not finalised
289 (but has "frozen" aspects) - would be completely unaffected: the
290 change is to the "wording" of the specification to "retrospectively"
291 fit reality.
292 * Unlike with the MISA idea this is *purely* at the "decode" phase:
293 no internal Extension state information is permitted to be disabled,
294 altered or destroyed as a direct result of writing to the
295 mvendor/march-id CSRs.
296 * Compliance Testing may be carried out with a different vendorid/marchid
297 tuple set prior to a test, allowing a vendor to claim *Certified*
298 compatibility with *both* one (or more) legacy variants of the RISC-V
299 Specification *and* with a present one.
300 * With sufficient care taken in the implementation an implementor
301 may have multiple interpretations of the same binary encoding within
302 an inner loop, with a single instruction (to the WARL register)
303 changing the meaning.
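
The behaviour listed above can be sketched as a small model. Everything named here is an assumption for illustration only: the `Hart` class, the vendor/architecture id values and the extension names are invented, and keeping the previous identity on an illegal write is just one legal WARL behaviour among several.

```python
from dataclasses import dataclass, field


@dataclass
class Hart:
    # (mvendorid, marchid) tuple -> decode table for conflicting encodings
    decode_tables: dict
    ident: tuple
    # Per-Extension state: never cleared by an identity switch
    ext_state: dict = field(default_factory=dict)

    def write_ident(self, mvendorid, marchid):
        # WARL: any value may be written, only legal values are read back
        if (mvendorid, marchid) in self.decode_tables:
            self.ident = (mvendorid, marchid)
        return self.ident

    def execute(self, encoding):
        # Purely a decode-phase redirection, keyed on the current identity
        name = self.decode_tables[self.ident][encoding]
        self.ext_state[name] = self.ext_state.get(name, 0) + 1
        return name


ENC = 0x0B  # illustrative encoding claimed by both Extensions
hart = Hart(
    decode_tables={
        (0x1, 0x10): {ENC: "vendor_a.dsp"},
        (0x2, 0x20): {ENC: "vendor_b.crypto"},
    },
    ident=(0x1, 0x10),
)
before = hart.execute(ENC)
hart.write_ident(0x2, 0x20)
after = hart.execute(ENC)
readback = hart.write_ident(0xDEAD, 0x0)  # illegal tuple: identity unchanged
```

A Compliance Test harness could use the same `write_ident` step to select a legacy or current identity before each test run.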
304
305 A couple of points were made:
306
307 * Compliance Testing may **fail** any system that has mvendorid/marchid
308 as WARL. This however is a clear case of "Compliance Tail Wagging Standard
309 Dog".
310 * The redirection of meaning of certain binary encodings to multiple
311 engines was considered extreme, eyebrow-raising, and also (importantly)
312 potentially expensive, introducing significant latency at the decode
313 phase.
314
315 On this latter point, it was observed that MISA already switches out entire
316 sets of instructions (interacts at the "decode" phase). The difference
317 between what MISA does and the mvendor/march-id WARL idea is that whilst
318 MISA only switches instruction decoding on (or off), the WARL idea
319 *redirects* encoding, effectively to *different* simultaneous engines,
320 fortunately in a deliberately mutually-exclusive fashion.
321
322 Implementations would therefore, in each Extension (assuming one separate
323 "decode" engine per Extension), simply have an extra (mutually-exclusively
324 enabled) wire in the AND gate for any given binary encoding, and in this
325 way there would actually be very little impact on the latency. The assumption
326 here is that there are not dozens of Extensions vying for the same binary
327 encoding (at which point the Fabless Semi Company has other much more
328 pressing issues to deal with that make resolving binary encoding conflicts
329 trivial by comparison).
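
The "extra wire in the AND gate" can be modelled in software as follows. This is a minimal sketch under stated assumptions: the function names, the 7-bit opcode mask and the id tuples are hypothetical, and real hardware would of course express this in an HDL rather than Python.

```python
def select_wires(ident, known_idents):
    """One select wire per Extension; at most one is high for any identity."""
    return [ident == candidate for candidate in known_idents]


def decode_hit(instr, mask, match, select):
    """An Extension's usual match term, ANDed with its select wire."""
    return bool((instr & mask) == match and select)


OPCODE_MASK = 0x7F            # low seven bits, used here as the match field
SHARED_OPCODE = 0x0B          # illustrative encoding claimed by both Extensions
idents = [(0x1, 0x10), (0x2, 0x20)]

instr = 0x0000000B
selects = select_wires((0x2, 0x20), idents)
hits = [decode_hit(instr, OPCODE_MASK, SHARED_OPCODE, s) for s in selects]
```

Both Extensions match the instruction bits, yet only the selected one reports a hit, so the added cost per Extension is a single extra input to an existing comparison.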
330
331 Also pointed out was that in certain cases pipeline stalls could be introduced
332 during the switching phase, if needed, just as they may be needed for
333 correct implementation of (mandatory) support for MISA.
334
335 **This is the only one of the proposals that meets the full requirements**
336
337 # ioctl-like <a name="#ioctl-like"></a>
338
339 (Summary: good solid orthogonal idea. See [[ioctl]] for full details)
340
341 NOTE: under discussion.
342
343 This proposal basically mirrors the concept of POSIX ioctls, providing
344 (arbitrarily) 8 functions (opcodes) whose meaning may be over-ridden
345 in an object-orientated fashion by calling an "open handle" (and close)
346 function (instruction) that switches (redirects) the 8 functions over to
347 different opcodes.
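
Since the proposal is still under discussion, the following is only a loose software analogy of the open/close idea, not a description of the actual instruction semantics. The class `IoctlLikeDecoder`, its method names and the example extension tables are all hypothetical.

```python
class IoctlLikeDecoder:
    """Eight fixed function slots whose meaning depends on the open handle."""

    SLOTS = 8

    def __init__(self, extensions):
        # extension id -> list of exactly SLOTS callables
        for table in extensions.values():
            if len(table) != self.SLOTS:
                raise ValueError("each extension must provide 8 functions")
        self.extensions = extensions
        self.current = None

    def open(self, ext_id):
        # Redirect the eight slots to the chosen extension
        self.current = self.extensions[ext_id]

    def close(self):
        self.current = None

    def call(self, slot, *args):
        if self.current is None:
            raise RuntimeError("no handle open")
        return self.current[slot](*args)


adders = [lambda x, n=n: x + n for n in range(8)]
scalers = [lambda x, n=n: x * n for n in range(8)]
dec = IoctlLikeDecoder({"ext_add": adders, "ext_mul": scalers})

dec.open("ext_add")
r1 = dec.call(3, 10)   # slot 3 currently means "add 3"
dec.close()
dec.open("ext_mul")
r2 = dec.call(3, 10)   # the same slot now means "multiply by 3"
```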
348
349 The proposal is functionally near-identical to that of the mvendor/march-id
350 except extended down to individual opcodes. As such it could hypothetically
351 be proposed as an independent Standard Extension in its own right that extends
352 the Custom Opcode space *or* fits into the brownfield spaces within the
353 existing ISA opcode space *or* is used as the basis of an independent
354 Custom Extension in its own right.
355
356 ==RB==
357 I really think it should be in browncode
358 ==RB==
359
360 One of the reasons for seeking an extension of the Custom opcode space is
361 that the Custom opcode space is severely limited: only 2 opcodes are free
362 within the 32-bit space, and only four total remain in the 48 and 64-bit
363 space.
364
365 Despite the proposal (which is still undergoing clarification)
366 standing on its own merits, and thus definitely being worth pursuing,
367 it is non-trivial and much more invasive than the mvendor/march-id
368 WARL concept.
369
370 # Comments, Discussion and analysis
371
372 TBD: placeholder as of 26apr2018
373
374 # Summary and Conclusion
375
376 In the early sections (those in the category "no action") it was established
377 in each case that the problem is not solved. Avoidance of responsibility,
378 or conflation of "not our problem" with "no problem" does not make "problem"
379 go away. Even "making it the Fabless Semiconductor's design problem" resulted
380 in a chip being *more costly to engineer as hardware **and** more costly
381 from a software-support perspective to maintain*... without actually
382 fixing the problem.
383
384 The first idea considered which could fix the problem was to just use
385 the pre-existing MISA CSR, however this was determined not to have
386 the right coverage (Standard Extensions only), and also crucially it
387 destroyed state. Whilst unworkable it did lead to the first "workable"
388 solution, "MISA-like".
389
390 The "MISA-like" proposal, whilst meeting most of the requirements, led to
391 a better idea: "mvendor/march-id WARL", which, in combination with an offshoot
392 idea related to gcc and binutils, is the only proposal that fully meets the
393 requirements.
394
395 The "ioctl-like" idea *also* solves the problem but, unlike the WARL idea,
396 does not meet the full requirements to be "non-invasive" and "backwards
397 compatible" with pre-existing (pre-Standards-finalised) implementations.
398 It does however stand on its own merit as a way to extend the extremely
399 small Custom Extension opcode space, even if it is itself implemented *as*
400 a Custom Extension into which *other* Custom Extensions are subsequently
401 shoe-horned. This approach has the advantage that it requires no "approval"
402 from the RISC-V Foundation... but without the RISC-V Standard "approval"
403 guaranteeing no binary-encoding conflicts, still does not actually solve the
404 problem (if deployed as a Custom Extension for extending Custom Extensions).
405
406 Overall the mvendor/march-id WARL idea meets the three requirements,
407 and is the only idea that meets the three requirements:
408
409 * **Any proposal must be a minimal change with minimal (or zero) impact**
410 (met through being purely a single backwards-compatible change to the
411 wording of the specification: mvendor/march-id changes from read-only
412 to WARL)
413 * **Any proposal should place no restriction on existing or future
414 ISA encoding space**
415 (met because it is just a change to one pre-existing CSR, as opposed
416 to requiring additional CSRs or requiring extra opcodes or changes
417 to existing opcodes)
418 * **Any proposal should take into account that there are existing implementors
419 of the (yet to be finalised but still "partly frozen") Standard who may
420 resist, for financial investment reasons, efforts to make any change
421 (at all) that could cost them immediate short-term profits.**
422 (met because existing implementations, with the exception of those
423 that have Custom Extensions, come under the "vendor/arch-id read only
424 is a formal declaration of an implementation having no Custom Extensions"
425 fall-back category)
426
427 So to summarise:
428
429 * The consequences of not tackling this are severe: the RISC-V Foundation
430 cannot take a back seat. If it does, clear historical precedent shows
431 100% what the outcome will be (1).
432 * Making the mvendorid and marchid CSRs WARL solves the problem in a
433 minimal to zero-disruptive backwards-compatible fashion that provides
434 indefinite transparent *forwards*-compatibility.
435 * The retro-fitting cost onto existing implementations (even though the
436 specification has not been finalised) is zero to negligible
437 (only changes to words in the specification required at this time:
438 no vendor need discard existing designs, either being designed,
439 taped out, or actually in production).
440 * The benefits are clear (pain-free transition path for vendors to safely
441 upgrade over time; no fights over Custom opcode space; no hassle for
442 software toolchain; no hassle for GNU/Linux Distros)
443 * The implementation details are clear (and problem-free except for
444 vendors who insist on deploying dozens of conflicting Custom Extensions:
445 an extremely unlikely outlier).
446 * Compliance Testing is straightforward and allows vendors to seek and
447 obtain *multiple* Compliance Certificates with past, present and future
448 variants of the RISC-V Standard (in the exact same processor,
449 simultaneously), in order to support end-customer legacy scenarios and
450 provide the same with a way to avoid "impossible-to-make" decisions that
451 throw out ultra-costly multi-decade-investment in proprietary legacy
452 software at the same time as the (legacy) hardware.
453
454 -------
455
456 # Conversation Excerpts
457
458 The following conversation excerpts are taken from the ISA-dev discussion
459
460 ## (1) Albert Calahan on SPE / AltiVec conflict in PowerPC
461
462 > Yes. Well, it should be blocked via legal means. Incompatibility is
463 > a disaster for an architecture.
464 >
465 > The viability of PowerPC was badly damaged when SPE was
466 > introduced. This was a vector instruction set that was incompatible
467 > with the AltiVec instruction set. Software vendors had to choose,
468 > and typically the choice was "neither". Nobody wants to put in the
469 > effort when there is uncertainty and a market fragmented into
470 > small bits.
471 >
472 > Note how Intel did not screw up. When SSE was added, MMX remained.
473 > Software vendors could trust that instructions would be supported.
474 > Both MMX and SSE remain today, in all shipping processors. With very
475 > few exceptions, Intel does not ship chips with missing functionality.
476 > There is a unified software ecosystem.
477 >
478 > This goes beyond the instruction set. MMU functionality also matters.
479 > You can add stuff, but then it must be implemented in every future CPU.
480 > You can not take stuff away without harming the architecture.
481
482 ## (2) Luke Kenneth Casson Leighton on Standards backwards-compatibility
483
484 > For the case where "legacy" variants of the RISC-V Standard are
485 > backwards-forwards-compatibly supported over a 10-20 year period in
486 > Industrial and Military/Government-procurement scenarios (so that the
487 > impossible-to-achieve pressure is off to get the spec ABSOLUTELY
488 > correct, RIGHT now), nobody would expect a seriously heavy-duty amount
489 > of instruction-by-instruction switching: it'd be used pretty much once
490 > and only once at boot-up (or once in a Hypervisor Virtual Machine
491 > client) and that's it.
492
493 ## (3) Allen Baum on Standards Compliance
494
495 > Putting my compliance chair hat on: One point that was made quite
496 > clear to me is that compliance will only test that an implementation
497 > correctly implements the portions of the spec that are mandatory, and
498 > the portions of the spec that are optional and the implementor claims
499 > it is implementing. It will test nothing in the custom extension space,
500 > and doesn't monitor or care what is in that space.
501
502 # References
503
504 * <https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/7bbwSIW5aqM>
505 * <https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/InzQ1wr_3Ak%5B1-25%5D>