# Resolving ISA conflicts and providing a pain-free RISC-V Standards Upgrade Path

In a lengthy thread that, ironically, was itself full of conflict indicative
of the direction RISC-V will take if this is left unresolved, it was noted
that multiple Custom Extensions are permitted free rein to introduce
global binary-encoding conflicts, with no means of resolution described
or endorsed by the RISC-V Standard: a practice that has known disastrous
and irreversible consequences for any architecture that permits it (1).

Much later in the discussion it was realised that there is also no way
within the current RISC-V Specification to transition to improved versions
of the standard, regardless of whether the fixes are absolutely critical
show-stoppers or simply keep the standard up-to-date (2).

With no transition path, there is guaranteed to be tension and conflict
within the RISC-V Community over whether revisions should be made at all:
should existing legacy designs be prioritised, mutually-exclusively, over
future designs? (During any transition period there would be absolute
chaos, with the compiler toolchain, the software ecosystem and ultimately
the end-users bearing the full brunt of the impact.) If several
overlapping revisions are required that have not yet transitioned out
of use (which could take well over two decades), the situation
becomes disastrous for the credibility of the entire RISC-V ecosystem.

It was also pointed out that Compliance is an extremely important factor
to take into consideration, and that Custom Extensions, being optional,
quite reasonably fall entirely outside the scope of Compliance Testing.
At that point in the discussion, however, no-one had yet noted the stark
problem that the *mandatory* RISC-V Specification also faces: there is
no transitional way to bring in show-stopping critical alterations.

To put this into perspective, taking into account hardware costs alone:
with production mask charges for 28nm at around USD $1.5m, and
engineering development costs and RTL licensing for peripherals of a
similar magnitude, no manufacturer is going to back away from selling
a "flawed" or "legacy" product (whether it complies with the RISC-V
Specification or not) without a bitter fight.

It was also pointed out that manufacturers will face significant software
tool maintenance costs, making it extremely likely that they will refuse
to shoulder those costs and will instead continue to publish (and use)
hopelessly out-of-date, unpatched tools. This practice is well known to
leave security flaws unpatched, one of many undesirable consequences
being that products in extremely large volumes get discarded into landfill.

**All of the issues that were discussed, and all of those that were
not, can be avoided by providing a hardware-level, runtime-enabled,
forwards- and backwards-compatible transition path between *all* parts
(mandatory or not) of current and future revisions of the RISC-V ISA
Standard.**

The rest of the discussion (indicative as it was of the stark,
mutually-exclusive gap faced by a RISC-V ISA Standard that does not
cope with the problem) was a contest between two clear camps: one that
wanted things to remain as they are, and another that tried to point out
that the consequences of not taking action are clearly extreme and
irreversible. Given the severity, some of the first group were,
unfortunately, unable to believe this, despite clear historical precedent
for exactly the same mistake having been made in other architectures,
with equally clear consequences.

However, after a significant amount of time, certain clear requirements
came out of the discussion:

* Any proposal must be a minimal change with minimal (or zero) impact
* Any proposal should place no restriction on existing or future
  ISA encoding space
* Any proposal should take into account that there are existing implementors
  of the (yet to be finalised but still "partly frozen") Standard who may
  resist, for financial investment reasons, efforts to make any change
  (at all) that could cost them immediate short-term profits.

Several proposals were put forward (some of which are still under discussion):

* "Do nothing": problem is not severe: no action needed.
* "Do nothing": problem is out-of-scope for the RISC-V Foundation.
* "Do nothing": problem complicates Compliance Testing (and is out of scope).
* "MISA": the MISA CSR already enables and disables extensions: use that.
* "MISA-like": a new CSR which switches new encodings in and out
  (without destroying state).
* "mvendorid/marchid WARL": switching the entire "identity" of a machine.
* "ioctl-like": an OO proposal based around the Linux kernel "ioctl" system.

Each of these is discussed below in its own section.

# Do nothing (no problem exists)

(Summary: not an option)

There were several solutions offered that fell into this category.
A few of them are listed in the introduction; more are listed below,
and it was exhaustively (and exhaustingly) established that none of
them is workable.

Initially it was pointed out that Fabless Semiconductor companies could
simply license multiple Custom Extensions and a suitable RISC-V core, and
modify them accordingly. The Fabless Semi Company would be responsible
for paying the NREs (non-recurring engineering costs) of re-developing
the test vectors (as the extension licensors would be extremely unlikely
to do that without payment), and given that such Companies have an
"integration" job to do anyway, it would be reasonable to expect them to
bear these additional costs as well.

The costs of this approach were outlined and discussed as being
disproportionate and extreme compared to the likely cost of licensing
the Custom Extensions in the first place. Additionally, it was pointed
out that not only hardware NREs would be involved: custom software tools
(compilers and more) would also be required, and would have to be
maintained separately, on the basis that upstream would not accept them
except under extreme pressure, and then only with prejudice.

All similar schemes involving customisation of the Custom Extensions
were likewise rejected, but not before this customisation process was
mistakenly conflated with the *normal* integration process of developing
a custom processor (Bus Architectures, Cache layouts, peripheral layouts).

The most compelling hardware-related reason (setting aside the severe
impact on the software ecosystem) for rejecting the
customisation-of-customisation approach was the case where an Extension
uses an instruction encoding space (48-bit, 64-bit) *larger* than the
chosen core can cope with (32-bit, 48-bit).

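The length of a RISC-V instruction is encoded in the low-order bits of its first 16-bit parcel (bits [1:0] not 11: 16-bit; bits [4:2] not 111: 32-bit; bits [5:0] = 011111: 48-bit; bits [6:0] = 0111111: 64-bit), so this mismatch shows up at the very first decode step. A minimal sketch of that rule, checked against a hypothetical core limit `max_bits`; the function names are illustrative, and formats of 80 bits or longer are lumped together rather than decoded:

```python
def insn_length_bits(parcel: int) -> int:
    """Instruction length in bits, from the lowest 16-bit parcel (RISC-V length encoding)."""
    if (parcel & 0b11) != 0b11:
        return 16                           # compressed: bits[1:0] != 11
    if (parcel & 0b11100) != 0b11100:
        return 32                           # bits[4:2] != 111
    if (parcel & 0b111111) == 0b011111:
        return 48                           # bits[5:0] == 011111
    if (parcel & 0b1111111) == 0b0111111:
        return 64                           # bits[6:0] == 0111111
    return 80                               # 80 bits or longer (or reserved): not distinguished


def core_can_decode(parcel: int, max_bits: int = 32) -> bool:
    """Hypothetical check: can a core limited to max_bits-long instructions decode this?"""
    return insn_length_bits(parcel) <= max_bits


# custom-0 (32-bit), then a 48-bit and a 64-bit encoding, on a 32-bit-only core
for parcel in (0b0001011, 0b0011111, 0b0111111):
    print(insn_length_bits(parcel), core_can_decode(parcel))
# 32 True, then 48 False, then 64 False
```

An Extension whose instructions fall in the second or third case simply cannot be "customised" onto such a core without redesigning its decoder.
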
Overall, none of the options presented was feasible. In addition, with
no clear leadership from the RISC-V Foundation on how to avoid global,
world-wide encoding conflict, even if they were followed through they
would still result in the failure of the RISC-V ecosystem due to
irreversible, globally conflicting ISA binary-encoding meanings
(PowerPC's AltiVec / SPE nightmare).

This is in addition to the case where the RISC-V Foundation needs to
make a critical, show-stopping fix to the Standard post-release, after
billions of dollars have been spent on deploying RISC-V in the field.

# Do nothing (out of scope)

(Summary: may not be the RV Foundation's "scope", but still results in a
problem, so not an option)

This was one of the first arguments presented: the RISC-V Foundation
considers Custom Extensions to be "out of scope", i.e. "it's not their
problem, therefore there isn't a problem".

The logical errors in this argument were quickly enumerated: namely, that
the RISC-V Foundation is not in control of the uses to which RISC-V is
put, so public global conflicts in binary encoding are a hundred percent
guaranteed to occur (*outside* the control and remit of the RISC-V
Foundation); that they are a hundred percent guaranteed to occur in
*commodity* hardware, where Debian, Fedora, SUSE and other distros will
be hardest hit by the resultant chaos; and that this will merely be the
most "visible" aspect of the underlying problem.

# Do nothing (Compliance too complex, therefore out of scope)

(Summary: may not be the RV Foundation's "scope", but still results in a
problem, so not an option)

The argument here was that Compliance testing of Custom Extensions is
not just out of scope, but that even if it were taken into account that
binary-encoding meanings could change, it would still be out of scope.

However, at the time this argument was made, the impact that revisions
to the Standard would have, once billions of dollars' worth of (older,
legacy) RISC-V hardware had already been deployed, had not yet been
fully appreciated.

Two interesting, diametrically-opposed and equally valid arguments exist here:

* Whilst Compliance testing of Custom Extensions is definitely, legitimately
  out of scope, Compliance testing of simultaneous legacy (old revisions of
  the ISA Standard) and current (new revisions of the ISA Standard) variants
  definitely is not. Efforts to reduce *Compliance Testing* complexity at
  the expense of the Standard are therefore a case of "Compliance Tail
  Wagging Standard Dog".
* Beyond a certain threshold, Compliance Testing becomes so burdensome
  that it risks outright rejection of the entire Standard.

Meeting these two diametrically-opposed perspectives requires that the
solution be very, very simple.

# MISA

(Summary: MISA not suitable, but leads to a better idea)

MISA permits extensions to be disabled by masking out the relevant bit.
Hypothetically it could be used to disable one extension, then enable
another that happens to use the same binary encoding.

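In the MISA Extensions field (bits 25..0), bit *n* stands for the letter 'A' + *n*, and a writable bit can be cleared to disable that extension. A minimal sketch of the masking, assuming a plain integer for the field and ignoring MXL and the WARL legality rules a real implementation applies; it also shows that every non-standard extension shares the single "X" bit, so MISA alone cannot choose *which* Custom Extension owns an encoding:

```python
def misa_bit(letter: str) -> int:
    """Mask for an extension letter in the MISA Extensions field (bit 0 = 'A' ... bit 25 = 'Z')."""
    index = ord(letter.upper()) - ord("A")
    if not 0 <= index < 26:
        raise ValueError(f"not an extension letter: {letter!r}")
    return 1 << index


def enable(misa: int, letter: str) -> int:
    return misa | misa_bit(letter)


def disable(misa: int, letter: str) -> int:
    return misa & ~misa_bit(letter)


def letters(misa: int) -> str:
    """Decode the Extensions field back into its letters, lowest bit first."""
    return "".join(chr(ord("A") + i) for i in range(26) if (misa >> i) & 1)


misa = 0
for ext in "IMAFDCX":           # X = "non-standard extensions present": one bit for all of them
    misa = enable(misa, ext)
print(letters(misa))            # ACDFIMX
print(letters(disable(misa, "X")))  # ACDFIM: every Custom Extension is switched off at once
```
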
*However*:

* MISA Extension disabling is permitted (optionally) to **destroy**
  the state information. It is therefore totally unsuitable for cases
  where instructions from different Custom Extensions are needed in
  quick succession.
* MISA was only designed to cover Standard Extensions.
* There is nothing to prevent multiple Extensions being enabled
  that wish to simultaneously interpret the same binary encoding.
* There is nothing in the MISA specification which permits
  *future* versions (bug-fixes) of the RISC-V ISA to be "switched in".

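The third point can be illustrated with a sketch in which each Extension's decoder is a match/mask comparator gated by a MISA-style enable bit: nothing in the enable mechanism stops two enabled decoders from claiming the same encoding. The two extensions and their names are hypothetical; the custom-0 major opcode value (`0001011`) is the real one from the base ISA opcode map:

```python
from dataclasses import dataclass

CUSTOM0 = 0b0001011      # RISC-V major opcode "custom-0" (instruction bits 6..0)
OPCODE_MASK = 0b1111111


@dataclass
class Decoder:
    name: str
    match: int             # required value of the masked instruction bits
    mask: int              # which instruction bits are compared
    enabled: bool = True   # a MISA-style on/off bit, nothing more


def claimants(insn: int, decoders: list) -> list:
    """Names of every enabled decoder that would accept this instruction."""
    return [d.name for d in decoders if d.enabled and (insn & d.mask) == d.match]


# Two hypothetical Custom Extensions that both chose custom-0
decoders = [Decoder("ext_a", CUSTOM0, OPCODE_MASK), Decoder("ext_b", CUSTOM0, OPCODE_MASK)]
print(claimants(0x0000000B, decoders))   # ['ext_a', 'ext_b']: ambiguous
decoders[1].enabled = False             # resolves it, but MISA may destroy ext_b's state
print(claimants(0x0000000B, decoders))   # ['ext_a']
```
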
Overall, whilst the MISA concept is a step in the right direction, it is
wholly unsuitable for solving the problem.

# MISA-like

(Summary: basically the same as mvendorid/marchid WARL, except that it
needs an extra CSR where mvendorid/marchid WARL does not. Along the right
lines, but does not meet the full requirements)

Out of the MISA discussion came a "MISA-like" proposal, which would
take into account the flaws pointed out in trying to use MISA:

* The MISA-like CSR's meaning would be identified by compilers using the
  mvendorid/marchid tuple as a compiler target
* Each custom-defined bit of the MISA-like CSR would (mutually-exclusively)
  redirect binary encoding(s) to specific encodings
* No Extension would *actually* be disabled: its internal state would
  be left on (permanently) so that switching of ISA decoding
  could be done inside inner loops without adverse impact on
  performance.

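A minimal sketch of the MISA-like idea, under stated assumptions: the new CSR is modelled as an integer in which each custom-defined bit routes one shared encoding to one engine, and any write setting more than one bit is rejected to keep the redirection mutually exclusive. Both engines keep their internal state across every switch, which is the property MISA lacks. The class, method and engine names are hypothetical:

```python
class MisaLikeCSR:
    """Hypothetical MISA-like CSR: a one-hot choice of which engine decodes a shared encoding."""

    def __init__(self, engines):
        self.engines = dict(engines)                               # bit position -> engine name
        self.state = {name: 0 for name in self.engines.values()}  # kept across every switch
        self.value = 0

    def write(self, value):
        # Mutual exclusion: reject any value with more than one bit set.
        if value & (value - 1):
            raise ValueError("more than one encoding redirect selected")
        self.value = value

    def execute_shared_opcode(self):
        for bit, name in self.engines.items():
            if (self.value >> bit) & 1:
                self.state[name] += 1                              # stand-in for real work
                return name
        raise RuntimeError("illegal instruction: no engine selected")


csr = MisaLikeCSR({0: "ext_a", 1: "ext_b"})
for selector in (0b01, 0b10, 0b01):     # switching back and forth, as in an inner loop
    csr.write(selector)
    csr.execute_shared_opcode()
print(csr.state)                        # {'ext_a': 2, 'ext_b': 1}: no state was destroyed
```
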
Whilst it was the first "workable" solution, it was also noted that the
scheme is invasive: it requires an entirely new CSR to be added to the
privileged spec (thus making existing implementations redundant).
This does not fulfil the "minimum impact" requirement.

Also of interest: around the same time, an additional discussion was
raised that covered the *compiler* side of the same equation. This
revolved around using mvendorid/marchid tuples at the compiler level,
to be put into the assembly output (by gcc), preserving the *globally*
unique identifying information that binutils needs to turn the custom
instruction into an actual binary encoding (plus the binary encoding of
the context-switching information). (**TBD, Jacob, separate page?
review this para?**)

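The toolchain half can be sketched as a lookup keyed on the full (mvendorid, marchid) tuple plus the mnemonic, so the same bits from two vendors can never silently resolve to one meaning, with the context switch emitted ahead of the instructions. Everything here (the database layout, the tuple values, the mnemonics and their encodings) is hypothetical and is not an existing gcc or binutils interface:

```python
# Hypothetical database: (mvendorid, marchid) -> {custom mnemonic: 32-bit encoding}
ENCODINGS = {
    (0x1, 0x1): {"foo.mac": 0x0000000B},
    (0x2, 0x7): {"bar.crc": 0x0000000B},   # identical bits, different meaning
}


def assemble(target, mnemonic):
    """Resolve a custom mnemonic for one (mvendorid, marchid) target, or fail loudly."""
    try:
        return ENCODINGS[target][mnemonic]
    except KeyError:
        raise ValueError(f"{mnemonic!r} is not defined for target {target}") from None


def emit(target, mnemonics):
    """Emit the context switch (the target tuple) first, then the resolved encodings."""
    return [("set-target", target)] + [("insn", assemble(target, m)) for m in mnemonics]


print(emit((0x1, 0x1), ["foo.mac"]))   # [('set-target', (1, 1)), ('insn', 11)]
```

Keying on the whole tuple, rather than on the mnemonic alone, is what lets two vendors reuse the same encoding without the assembler ever guessing.
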
# mvendorid/marchid WARL

(Summary: the only idea that meets the full requirements. Needs
toolchain backup, but only when the first chip is released)

Coming out of the software-related proposal by Jacob Bachmeyer, which
hinged on the idea of a globally-maintained gcc / binutils database
that kept and coordinated architectural encodings (curated by the Free
Software Foundation), was the idea of quite simply making the mvendorid
and marchid CSRs WARL (Write Any values, Reads Legal values: i.e.
writeable) rather than read-only. Where mvendorid and marchid are
read-only, that would be taken as a Standards-mandated "declaration"
that the architecture has *no* Custom Extensions (and that it conforms
precisely to one and only one specific variant of the RISC-V
Specification).

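A sketch of the proposed semantics, under assumptions: the mvendorid/marchid pair is modelled as one tuple; a write of a tuple the implementation supports takes effect, while any other write leaves the old value (one permissible WARL behaviour, chosen here for simplicity); and an implementation with only its reset tuple behaves as read-only, which the proposal treats as a declaration of no Custom Extensions. The class name and tuple values are hypothetical:

```python
class VendorArchIdWARL:
    """Hypothetical model of the mvendorid/marchid pair made WARL."""

    def __init__(self, reset_id, supported_ids=()):
        self.supported = {reset_id, *supported_ids}  # legal (mvendorid, marchid) tuples
        self.current = reset_id

    def write(self, new_id):
        # WARL: a legal value is accepted; for an illegal one this sketch keeps the old value.
        if new_id in self.supported:
            self.current = new_id

    def read(self):
        return self.current

    @property
    def is_read_only(self):
        # Only one legal value means writes can never change it: the CSRs are read-only,
        # which the proposal treats as declaring "no Custom Extensions".
        return len(self.supported) == 1


LEGACY = (0x1, 0x10)     # hypothetical tuple for an older RISC-V variant
CURRENT = (0x1, 0x11)    # hypothetical tuple for the present variant
core = VendorArchIdWARL(CURRENT, [LEGACY])
core.write(LEGACY)       # e.g. before running a legacy Compliance suite
print(core.read())       # (1, 16)
core.write((0x9, 0x99))  # not supported: ignored
print(core.read())       # (1, 16)
print(core.is_read_only) # False
```

Note that nothing in `write` touches any Extension's state: only the decode selection changes.
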
This incredibly simple, non-invasive idea has some unique and distinct
advantages over the other proposals:

* Existing designs (even though the specification is not finalised, it
  has "frozen" aspects) would be completely unaffected: the change is to
  the "wording" of the specification, "retrospectively" fitting it to
  reality.
* Unlike with the MISA idea, this is *purely* at the "decode" phase:
  no internal Extension state information is permitted to be disabled,
  altered or destroyed as a direct result of writing to the
  mvendorid/marchid CSRs.
* Compliance Testing may be carried out with a different mvendorid/marchid
  tuple set prior to each test, allowing a vendor to claim *Certified*
  compatibility with *both* one (or more) legacy variants of the RISC-V
  Specification *and* the present one.
* With sufficient care taken in the implementation, an implementor
  may have multiple interpretations of the same binary encoding within
  an inner loop, with a single instruction (a write to the WARL register)
  changing the meaning.

A couple of points were made:

* Compliance Testing may **fail** any system that has mvendorid/marchid
  as WARL. This, however, is a clear case of "Compliance Tail Wagging
  Standard Dog".
* The redirection of the meaning of certain binary encodings to multiple
  engines was considered extreme, eyebrow-raising, and also (importantly)
  potentially expensive, introducing significant latency at the decode
  phase.

On this latter point, it was observed that MISA already switches out
entire sets of instructions (interacting at the "decode" phase). The
difference between what MISA does and the mvendorid/marchid WARL idea is
that whilst MISA only switches instruction decoding on (or off), the
WARL idea *redirects* encodings to *different* engines, fortunately in a
deliberately mutually-exclusive fashion.

Implementations would therefore, in each Extension (assuming one separate
"decode" engine per Extension), simply have an extra (mutually-exclusively
enabled) wire in the AND gate for any given binary encoding, and in this
way there would actually be very little impact on latency. The assumption
here is that there are not dozens of Extensions vying for the same binary
encoding (at which point the Fabless Semi Company has other, much more
pressing issues to deal with, which make resolving encoding conflicts
trivial by comparison).

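The gate-level argument can be checked exhaustively with a small truth-table sketch: each Extension decoder's output is its existing opcode-match term ANDed with one extra select wire, and the select wires are one-hot (driven, under this proposal, by the current mvendorid/marchid value). The function names and the choice of three Extensions are hypothetical:

```python
from itertools import product


def one_hot(n, index):
    """Select wires: exactly one of n is driven high."""
    return [i == index for i in range(n)]


def decoder_hits(opcode_match, select):
    """One AND gate per Extension decoder: (opcode matches) AND (this Extension selected)."""
    return [m and s for m, s in zip(opcode_match, select)]


N = 3  # hypothetical: three Extensions whose decoders all match the same encoding
for chosen in range(N):
    select = one_hot(N, chosen)
    for match in product([False, True], repeat=N):
        hits = decoder_hits(match, select)
        assert sum(hits) <= 1                 # never more than one engine fires
        assert hits[chosen] == match[chosen]  # the selected engine's decode is unchanged
print("exhaustive check passed")
```

The only added logic per decoder is one AND input, which is why the latency cost stays small as long as the number of competing Extensions stays small.
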
It was also pointed out that, in certain cases, pipeline stalls could be
introduced during the switching phase if needed, just as they may be
needed for correct implementation of (mandatory) support for MISA.

**This is the only one of the proposals that meets the full requirements.**

# ioctl-like

(Summary: good, solid, orthogonal idea. See [[ioctl]] for full details)

This proposal basically mirrors the concept of POSIX ioctls, providing
(arbitrarily) 8 functions (opcodes) whose meaning may be overridden
in an object-orientated fashion by calling an "open handle" (and "close
handle") function (instruction) that switches (redirects) the 8 functions
over to different opcodes.

The "open handle" opcode takes a UUID (universally unique identifier,
also known as a GUID) and an ioctl number, and stores the UUID in a table
indexed by the ioctl number:

    handle_global_state = [-1] * 8  # stores UUID (or index of same); -1 = closed

    def open_handle(uuid, ioctl_num):
        handle_global_state[ioctl_num] = uuid

    def close_handle(ioctl_num):
        handle_global_state[ioctl_num] = -1  # clear table entry

"Ioctls" (arbitrarily, 8 separate R-type opcodes) then perform a redirect
based on what the global state for that numbered "ioctl" has been set to:

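    # Hypothetical identifiers and per-extension functions, for illustration only
    CUSTOMEXT1UUID, CUSTOMEXT2UUID = "customext1-uuid", "customext2-uuid"
    def CUSTOMEXT1_FN0(*rargs): return ("customext1", rargs)
    def CUSTOMEXT2_FN0(*rargs): return ("customext2", rargs)

    def ioctl_fn0(*rargs):  # star means "take all arguments as a tuple"
        if handle_global_state[0] == CUSTOMEXT1UUID:
            return CUSTOMEXT1_FN0(*rargs)  # apply all arguments to function
        elif handle_global_state[0] == CUSTOMEXT2UUID:
            return CUSTOMEXT2_FN0(*rargs)  # apply all arguments to function
        else:
            raise Exception("undefined opcode")
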
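The `if`/`elif` chain in `ioctl_fn0` grows with every extension and would be repeated for each of the 8 ioctls. A self-contained variant of the same redirect, offered as an illustration rather than as the proposal's own definition, replaces it with one dispatch table keyed on (UUID, ioctl number); the UUIDs, the two extensions and their arithmetic behaviour are hypothetical:

```python
import uuid

NUM_IOCTLS = 8                         # arbitrary, as in the proposal
CLOSED = None
handle_table = [CLOSED] * NUM_IOCTLS   # ioctl number -> UUID of the open handle

# Hypothetical extension identifiers and behaviour, for illustration only
EXT1 = uuid.UUID(int=1)
EXT2 = uuid.UUID(int=2)
DISPATCH = {                           # (extension UUID, ioctl number) -> implementation
    (EXT1, 0): lambda a, b: a + b,
    (EXT2, 0): lambda a, b: a * b,
}


def open_handle(ext_uuid, ioctl_num):
    handle_table[ioctl_num] = ext_uuid


def close_handle(ioctl_num):
    handle_table[ioctl_num] = CLOSED


def ioctl(ioctl_num, *rargs):
    """Redirect ioctl `ioctl_num` to whichever extension currently holds that handle."""
    impl = DISPATCH.get((handle_table[ioctl_num], ioctl_num))
    if impl is None:
        raise RuntimeError("undefined opcode")  # stands in for an illegal-instruction trap
    return impl(*rargs)


open_handle(EXT1, 0)
print(ioctl(0, 6, 7))   # 13
open_handle(EXT2, 0)
print(ioctl(0, 6, 7))   # 42
close_handle(0)         # ioctl 0 is now undefined again
```

A closed handle and an unregistered (UUID, ioctl) pair both fall through to the same "undefined opcode" path, mirroring the `else` branch above.
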
The proposal is functionally near-identical to mvendorid/marchid WARL,
except that it extends down to individual opcodes. As such it could
hypothetically be proposed as an independent Standard Extension in its
own right that extends the Custom Opcode space, *or* that fits into the
brownfield spaces within the existing ISA opcode space, *or* it could be
used as the basis of an independent Custom Extension in its own right.

One of the reasons for seeking an extension of the Custom opcode space is
that the Custom opcode space is severely limited: only 2 opcodes are free
within the 32-bit space, and only four in total remain in the 48- and
64-bit spaces.

Despite the proposal (which is still undergoing clarification) standing
on its own merits, and thus definitely being worth pursuing, it is
non-trivial and much more invasive than the mvendorid/marchid WARL
concept.

# Comments, Discussion and analysis

TBD: placeholder as of 26apr2018

# Summary and Conclusion

In the early sections (those in the "no action" category) it was
established in each case that the problem is not solved. Avoiding
responsibility, or conflating "not our problem" with "no problem", does
not make the problem go away. Even "making it the Fabless Semiconductor
company's design problem" resulted in a chip that is *more costly to
engineer as hardware **and** more costly to maintain from a
software-support perspective*... without actually fixing the problem.

The first idea considered which could fix the problem was simply to use
the pre-existing MISA CSR; however, this was determined not to have the
right coverage (Standard Extensions only), and, crucially, it is
permitted to destroy state. Whilst unworkable, it did lead to the first
"workable" solution, "MISA-like".

The "MISA-like" proposal, whilst meeting most of the requirements, led to
a better idea: "mvendorid/marchid WARL", which, in combination with an
offshoot idea related to gcc and binutils, is the only proposal that
fully meets the requirements.

The "ioctl-like" idea *also* solves the problem but, unlike the WARL idea,
does not meet the full requirements of being "non-invasive" and "backwards
compatible" with pre-existing (pre-Standards-finalised) implementations.
It does, however, stand on its own merit as a way to extend the extremely
small Custom Extension opcode space, even if it is itself implemented *as*
a Custom Extension into which *other* Custom Extensions are subsequently
shoe-horned. This approach has the advantage that it requires no "approval"
from the RISC-V Foundation... but without the RISC-V Standard's "approval"
guaranteeing no binary-encoding conflicts, it still does not actually solve
the problem (if deployed as a Custom Extension for extending Custom
Extensions).

Overall, the mvendorid/marchid WARL idea meets the three requirements,
and is the only idea that does so:

* **Any proposal must be a minimal change with minimal (or zero) impact**
  (met through being purely a single change to the specification:
  mvendorid/marchid changes from read-only to WARL)
* **Any proposal should place no restriction on existing or future
  ISA encoding space**
  (met because it is just a change to the pre-existing mvendorid and
  marchid CSRs)
* **Any proposal should take into account that there are existing implementors
  of the (yet to be finalised but still "partly frozen") Standard who may
  resist, for financial investment reasons, efforts to make any change
  (at all) that could cost them immediate short-term profits.**
  (met because existing implementations, with the exception of those
  that have Custom Extensions, come under the "mvendorid/marchid read-only
  is a declaration of having no Custom Extensions" fall-back category)

So, to summarise:

* The consequences of not tackling this are severe: the RISC-V Foundation
  cannot take a back seat. If it does, clear historical precedent shows
  exactly what the outcome will be (1).
* The retro-fitting cost onto existing implementations (even though the
  specification has not been finalised) is negligible
  (changes to words in the specification).
* The benefits are clear (a pain-free transition path for vendors to safely
  upgrade over time; no fights over Custom opcode space; no hassle for the
  software toolchain; no hassle for GNU/Linux Distros).
* The implementation details are clear (and problem-free except for
  vendors who insist on deploying dozens of conflicting Custom Extensions:
  an extreme, unlikely outlier).
* Compliance Testing is straightforward and allows vendors to seek and
  obtain *multiple* Compliance Certificates with past, present and future
  variants of the RISC-V Standard (in the exact same processor), in order
  to support legacy customers and provide those customers with a way to
  avoid "impossible-to-make" decisions that throw out ultra-expensive,
  multi-decade proprietary legacy software at the same time as the hardware.

# Conversation Excerpts

The following conversation excerpts are taken from the ISA-dev discussion.

## (1) Albert Cahalan on SPE / AltiVec conflict in PowerPC

> Yes. Well, it should be blocked via legal means. Incompatibility is
> a disaster for an architecture.
>
> The viability of PowerPC was badly damaged when SPE was
> introduced. This was a vector instruction set that was incompatible
> with the AltiVec instruction set. Software vendors had to choose,
> and typically the choice was "neither". Nobody wants to put in the
> effort when there is uncertainty and a market fragmented into
> small bits.
>
> Note how Intel did not screw up. When SSE was added, MMX remained.
> Software vendors could trust that instructions would be supported.
> Both MMX and SSE remain today, in all shipping processors. With very
> few exceptions, Intel does not ship chips with missing functionality.
> There is a unified software ecosystem.
>
> This goes beyond the instruction set. MMU functionality also matters.
> You can add stuff, but then it must be implemented in every future CPU.
> You can not take stuff away without harming the architecture.

## (2) Luke Kenneth Casson Leighton on Standards backwards-compatibility

> For the case where "legacy" variants of the RISC-V Standard are
> backwards-forwards-compatibly supported over a 10-20 year period in
> Industrial and Military/Government-procurement scenarios (so that the
> impossible-to-achieve pressure is off to get the spec ABSOLUTELY
> correct, RIGHT now), nobody would expect a seriously heavy-duty amount
> of instruction-by-instruction switching: it'd be used pretty much once
> and only once at boot-up (or once in a Hypervisor Virtual Machine
> client) and that's it.

## (3) Allen Baum on Standards Compliance

> Putting my compliance chair hat on: One point that was made quite
> clear to me is that compliance will only test that an implementation
> correctly implements the portions of the spec that are mandatory, and
> the portions of the spec that are optional and the implementor claims
> it is implementing. It will test nothing in the custom extension space,
> and doesn't monitor or care what is in that space.

# References

* <https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/7bbwSIW5aqM>
* <https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/InzQ1wr_3Ak%5B1-25%5D>
