# Overloadable opcodes

The xext proposal defines a small number N (e.g. N = 8) of standardised R-type instructions
xcmd0, xcmd1, ..., xcmd[N-1], preferably in the brownfield opcode space.
Each xcmd takes (in rs1) a 12 bit "logical unit" (lun), identifying a (sub)device on the cpu
that implements some "extension interface" (xintf), together with some additional data.
Extension devices may be implemented in any convenient form, e.g. non-standard extensions
of the CPU itself, IP tiles, or closely coupled external devices.

An xintf is a set of up to N commands with 2 input ports and 1 output port (i.e. like an
R-type instruction), together with a description of the semantics of the commands. Calling
e.g. xcmd3 routes its two input ports and one output port to command 3 on the device determined
by the lun bits in rs1. Thus, the N standard xcmd instructions are standard-designated
overloadable opcodes, with the non-standard semantics of the opcode determined by the lun.

Portable software does not use luns directly. Instead, it goes through a level of
indirection using a further instruction, xext. The xext instruction translates a 20 bit globally
unique identifier (UUID) of an xintf to the lun of a device on the cpu that implements that xintf.
The cpu can do this because it knows (at manufacturing or boot time) which devices it has, and
which xintfs they provide. This includes devices that would be described as non-standard extensions
of the cpu if the designers had used custom opcodes instead of xintf as an interface. If the
UUID of the xintf is not recognised at the current privilege level, the xext instruction returns
the special lun = 0, causing any xcmd to trap. Minor variations of this scheme (requiring two
more instructions, xext0 and xextm1) cause xcmd instructions to fall back to always returning 0
or -1 instead of trapping.

Remark 1: the main difference with a previous "ioctl-like" proposal is that UUID translation
is stateless and does not use resources. The xext instruction _neither_ initialises a
device _nor_ builds global state identified by a cookie. If a device needs initialisation,
it can do this using xcmds as init and deinit instructions. Likewise, it can hand out
cookies (which can include the lun) as a return value.

Remark 2: implementing devices can respond to an (essentially) arbitrary number of xintfs.
Hence, while an xintf is restricted to N commands, an implementing device can have an
arbitrary number of commands. Organising related commands in xintfs helps avoid UUID space
pollution, and allows the (small) cost of UUID to lun translation to be amortised if related
commands are used in combination.

TL;DR: see below for a C description of how this is supposed to work.

== Description of the instructions ==

    xcmd0 rd, rs1, rs2
    xcmd1 rd, rs1, rs2
    ....
    xcmd[N-1] rd, rs1, rs2

* rs1 contains a 12 bit "logical unit" (lun) together with XLEN - 12 bits of additional data.
* rs2 is arbitrary

For e.g. xcmd3, route the input ports rs1, rs2 and the output port rd to command 3 of the (sub)device on the cpu identified by the lun bits of rs1.

After execution:

* rd contains the value of the output port of the implementing device
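As a rough illustration of the rs1 layout described above, the following C sketch splits a register value into its lun and data parts and packs them again. It assumes XLEN = 64 and that the lun sits in bits 0..11 (as the xext description in this document states for rd); the helper names `rs1_lun`, `rs1_data` and `rs1_pack` are made up for this example and are not part of the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: XLEN = 64; the lun occupies bits 0..11, the data bits 12..63. */
#define LUN_BITS 12u
#define LUN_MASK ((UINT64_C(1) << LUN_BITS) - 1)

/* Extract the 12 bit lun from an rs1 register value. */
static inline uint64_t rs1_lun(uint64_t rs1)
{
    return rs1 & LUN_MASK;
}

/* Extract the XLEN - 12 bits of additional data from an rs1 register value. */
static inline uint64_t rs1_data(uint64_t rs1)
{
    return rs1 >> LUN_BITS;
}

/* Combine a lun and data into one register value.
   The top 12 bits of data do not fit and are dropped. */
static inline uint64_t rs1_pack(uint64_t lun, uint64_t data)
{
    assert(lun <= LUN_MASK);
    return (data << LUN_BITS) | lun;
}
```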

--------

    xext rd, rs1, rs2
    xext0 rd, rs1, rs2
    xextm1 rd, rs1, rs2

* rs1 contains
    * a UUID of at least 20 bits in bits 12..XLEN-1 of rs1, identifying an xintf.
    * the sequence number of a device at the current privilege level on the cpu implementing the xintf in bits 0..11.
      In particular, if bits 0..11 are zero, the default implementation is requested.
* rs2 is arbitrary (but bits XLEN-12 to XLEN-1 are discarded)

After execution:

* if the cpu recognises the UUID and device at the current privilege level, rd contains the lun of a device
  implementing the xintf in bits 0..11, followed by bits 0..XLEN-13 of rs2.
* if the cpu does not recognise the UUID and device, it returns the numbers 0 (for xext), 1 (for xext0) or 2 (for xextm1); in particular, bits 12..XLEN-1 are 0.
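The register-level behaviour of the three xext variants described above might be sketched as follows. This is a simplified model, not the proposal's definition: it assumes XLEN = 64, ignores privilege levels, and uses a hypothetical one-entry table (`toy_table`, mapping UUID 0xABCDE, device 0, to lun 32). The names `xext_model` and `xext_variant` are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Which variant is executed; the values double as the fallback luns. */
enum xext_variant { XEXT = 0, XEXT0 = 1, XEXTM1 = 2 };

/* Hypothetical table entry: (UUID, device sequence number) -> lun. */
struct xintf_entry { uint32_t uuid; uint16_t dev; uint16_t lun; };

/* Hypothetical configuration: one interface 0xABCDE, reachable as lun 32. */
static const struct xintf_entry toy_table[] = {
    { 0xABCDE, 0, 32 },
};

/* Sketch of xext / xext0 / xextm1 for XLEN = 64, ignoring privilege levels. */
static uint64_t xext_model(enum xext_variant v, uint64_t rs1, uint64_t rs2)
{
    assert(v == XEXT || v == XEXT0 || v == XEXTM1);
    uint64_t uuid = rs1 >> 12;     /* bits 12..63: UUID of the xintf   */
    uint64_t dev  = rs1 & 0xFFF;   /* bits 0..11: device sequence no.  */
    for (size_t i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++) {
        if (toy_table[i].uuid == uuid && toy_table[i].dev == dev) {
            /* lun in bits 0..11, the low 52 bits of rs2 above it */
            return (rs2 << 12) | toy_table[i].lun;
        }
    }
    return (uint64_t)v;            /* 0, 1 or 2; bits 12..63 are zero  */
}
```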

---

The net effect is that, when the CPU implements an xintf with UUID 0xABCDE, a sequence like

    //fake UUID of an xintf
    lui rd 0xABCDE
    xext rd rd rs1
    xcmd0 rd rd rs2

acts like a single namespaced instruction cmd0_ABCDE rd rs1 rs2 (with the annoying caveat that the top 12 bits of rs1 are discarded). The sequence is not indivisible, but the crucial semantics that you might want to be indivisible is in xcmd0.

Delegation and UUID translation are expected to come at a small performance price compared to a "native" instruction. This should, however, be an acceptable tradeoff in many cases. Moreover, implementations may opcode-fuse the whole instruction sequence, or the first or last two instructions.
If several instructions of the same interface are used, one can also use instruction sequences like

    lui t1 0xABCDE //org_tinker_tinker__RocknRoll__uuid
    xext t1 t1 zero
    xcmd0 a5, t1, a0 // org_tinker_tinker__RocknRoll__rock(a5, t1, a0)
    xcmd1 t2, t1, a1 // org_tinker_tinker__RocknRoll__roll(t2, t1, a5)
    xcmd0 a0, t1, t2 // org_tinker_tinker__RocknRoll__rock(a0, t1, t2)

If 0xABCDE is an unknown UUID at the current privilege level, the sequence results in a trap, just like cmd0_ABCDE rd rs1 rs2 would. The sequence

    //fake UUID of an xintf
    lui rd 0xABCDE
    xext0 rd rd rs1
    xcmd0 rd rd rs2

acts exactly like the sequence with xext, except that 0 is returned by xcmd0 if the UUID is unknown at the current privilege level. Likewise, usage of xextm1 results in -1 being returned. This requires lun = 0, 1 and 2 to be routed to the three mandatory fallback
interfaces defined below.

On the software level, the xintf is just a set of glorified assembler macros

    org.tinker.tinker:RocknRoll{
        uuid : 0xABCDE
        rock rd rs1 rs2 : xcmd0 rd rs1 rs2
        roll rd rs1 rs2 : xcmd1 rd rs1 rs2
    }

so that the above sequence can be more clearly written as

    import(org.tinker.tinker:RocknRoll)

    lui rd org.tinker.tinker:RocknRoll:uuid
    xext rd rd rs1
    org.tinker.tinker:RocknRoll:rock rd rd rs2

------
The following standard xintfs shall be implemented by the CPU.

For lun == 0:

At privilege levels user mode, supervisor mode and hypervisor mode

    org.RiscV:Fallback:Trap{
        uuid: 0
        trap0 rd rs1 rs2: xcmd0 rd rs1 rs2
        ...
        trap[N-1] rd rs1 rs2: xcmd[N-1] rd rs1 rs2
    }

each of the xcmd instructions shall trap to one level higher.

At privilege level machine mode, each trap command has unspecified behaviour, but in debug mode
it should cause an exception to a debug environment.

For lun == 1, at all privilege levels

    org.RiscV:Fallback:ReturnZero{
        uuid: 1
        return_zero0 rd rs1 rs2: xcmd0 rd rs1 rs2
        ...
        return_zero[N-1] rd rs1 rs2: xcmd[N-1] rd rs1 rs2
    }

each return_zero command shall return 0 in rd.

For lun == 2, at all privilege levels

    org.RiscV:Fallback:ReturnMinusOne{
        uuid: 2
        return_minusone0 rd rs1 rs2: xcmd0 rd rs1 rs2
        ...
        return_minusone[N-1] rd rs1 rs2: xcmd[N-1] rd rs1 rs2
    }

each return_minusone command shall return -1 in rd.

---

Remark:
Quite possibly even glorified standard assembler macros are overkill, and it is easier to just use defines or ordinary macros with long names. E.g. writing

    #define org_tinker_tinker__RocknRoll__uuid 0xABCDE
    #define org_tinker_tinker__RocknRoll__rock(rd, rs1, rs2) xcmd0 rd, rs1, rs2
    #define org_tinker_tinker__RocknRoll__roll(rd, rs1, rs2) xcmd1 rd, rs1, rs2

allows the same sequence to be written as

    lui rd org_tinker_tinker__RocknRoll__uuid
    xext rd rd rs1
    org_tinker_tinker__RocknRoll__rock(rd, rd, rs2)

Readability of assembler is no big deal for a compiler, but people are supposed to _document_ the semantics of the interface. In particular, specifying the semantics of the xintf in the same way as the semantics of the cpu should allow formal verification.

== Implications for the RISC-V ecosystem ==

The proposal allows independent groups to define one or more extension
interfaces of (slightly crippled) R-type instructions implemented by an
extension device. Such an extension device would be a native but non-standard
extension of the CPU, an IP tile or a closely coupled external chip, and would
be configured at manufacturing time or bootup of the CPU.

The 20 bits provided by the UUID of an xintf are much more room than is provided by the 2 custom 32 bit, or even 4 custom 64/48 bit, opcode spaces. Thus the overloadable opcodes proposal avoids most of the need to put a claim on opcode space, and the associated collisions when combining independent extensions. In this respect it is similar to POSIX ioctls, which (almost) obviate the need for defining new syscalls to control new or non-standard hardware.

The expanded flexibility comes at a cost: the standard can specify the
semantics of the delegation mechanism and the interfacing with the rest
of the cpu, but the actual semantics of the overloaded instructions can
only be defined by the designer of the interface. Likewise, a device
can be conforming as far as delegation and interaction with the CPU
is concerned, but whether the hardware conforms to the semantics
of the interface is outside the scope of the spec. Being able to specify
that semantics using the methods used for RV itself is clearly very
valuable. One impetus for doing that is using it for purposes of its own,
effectively freeing opcode space for other purposes. Also, some interfaces
may become de facto or de jure standards themselves, necessitating
hardware to implement competing interfaces. I.e., facilitating a
free-for-all may lead to standards proliferation. C'est la vie.

The only "ISA-collisions" that can still occur are in the 20 bit (~10^6)
interface identifier space, with 12 more bits to identify a device on
a hart that implements the interface. One suggestion is setting aside
2^19 ids that are handed out for a small fee by a central (automated)
registration (making sure the space is not just claimed), while the
remaining 2^19 are used as a good hash on a long, plausibly globally
unique, human-readable interface name. This gives implementors the choice
between paying a fee for a guaranteed private identifier, or relying on low
collision probabilities. On RV64 the UUID can also be extended to 52 bits (> 10^15).
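To make the hashed half of the identifier space concrete, here is a minimal sketch of deriving a 20 bit id from an interface name. The text above does not prescribe a hash function or say which half of the space is hashed; this example assumes 32 bit FNV-1a, xor-folded to 19 bits, and assumes that hashed ids are the ones with bit 19 set. The function names are hypothetical. Note that with 2^19 slots the birthday bound puts the chance of at least one collision at about one half once roughly 850 independent names are in use, so "low probabilities" holds only while the number of hashed interfaces stays small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32 bit FNV-1a hash of a NUL-terminated string. */
static uint32_t fnv1a32(const char *s)
{
    uint32_t h = 2166136261u;        /* FNV offset basis */
    for (; *s; s++) {
        h ^= (uint8_t)*s;
        h *= 16777619u;              /* FNV prime */
    }
    return h;
}

/* Assumption: hashed ids live in the upper half of the 20 bit space
   (bit 19 set), registered ids in the lower half. */
static uint32_t xintf_hashed_uuid(const char *name)
{
    assert(name != NULL);
    uint32_t h = fnv1a32(name);
    h = (h ^ (h >> 19)) & 0x7FFFFu;  /* xor-fold 32 bits down to 19 */
    return h | 0x80000u;             /* mark as a hashed id */
}
```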

== Description of the extension as C functions ==

    /* register format of rs1 for xext instructions */
    typedef struct uuid_device{
        unsigned long dev:12;
        unsigned long uuid: 8*sizeof(long) - 12;
    } uuid_dev_t;

    /* register format for rd of xext and rs1 of xcmd instructions, packs lun and data */
    typedef struct lun_data{
        unsigned long lun:12;
        unsigned long data: 8*sizeof(long) - 12;
    } lun_data_t;

    /* proposed R-type instructions
    xext rd rs1 rs2
    xcmd0 rd rs1 rs2
    xcmd1 rd rs1 rs2
    ...
    xcmd7 rd rs1 rs2
    */

    lun_data_t xext(uuid_dev_t rs1, long rs2);
    long xcmd0(lun_data_t rs1, long rs2);
    long xcmd1(lun_data_t rs1, long rs2);
    ...
    long xcmd<N-1>(lun_data_t rs1, long rs2);

    /* hardware interface presented by an implementing device. */
    typedef
    long device_fn(unsigned short subdevice_xcmd, lun_data_t rs1, long rs2);

    /* cpu internal datatypes */

    enum privilege {user = 0x1, super = 0x2, hyper = 0x4, mach = 0x8};

    /* cpu internal, does what is on the label */
    static
    enum privilege cpu__current_privilege_level();

    typedef
    struct lun{
        unsigned short id:12;
    } lun_t;

    struct uuid_device_priv2lun{
        struct{
            uuid_dev_t uuid_dev;
            enum privilege reqpriv;
        };
        lun_t lun;
    };

    struct device_subdevice{
        device_fn* device_addr;
        unsigned short subdeviceId:12;
    };

    struct lun_priv2device_subdevice{
        struct{
            lun_t lun;
            enum privilege reqpriv;
        };
        struct device_subdevice devAddr_subdevId;
    };

    static
    struct uuid_device_priv2lun cpu__lun_map[];

    /*
    map (UUID, device, privilege) to a 12 bit lun,
    return (lun_t){0} on unknown (at access level)

    does associative memory lookup and tests privilege.
    */
    static
    lun_t cpu__lookup_lun(const struct uuid_device_priv2lun* lun_map, uuid_dev_t uuid_dev, enum privilege priv);

    lun_data_t xext(uuid_dev_t rs1, long rs2)
    {
        lun_t lun = cpu__lookup_lun(cpu__lun_map, rs1, cpu__current_privilege_level());

        /* keep the low XLEN - 12 bits of rs2 as data */
        return (lun_data_t){.lun = lun.id, .data = rs2 & ((1UL << (8*sizeof(long) - 12)) - 1)};
    }


    struct lun_priv2device_subdevice cpu__device_subdevice_map[];

    /* map (lun, priv) to struct device_subdevice pair.
    For lun = 0, or unknown (lun, priv) pair, returns (struct device_subdevice){NULL,0}
    */
    static
    struct device_subdevice cpu__lookup_device_subdevice(const struct lun_priv2device_subdevice* dev_subdev_map,
                                                         lun_t lun, enum privilege priv);


    /* functional description of the delegating xcmd0 .. xcmd7 instructions */
    template<k = 0..N-1> //pretend this is C
    long xcmd<k>(lun_data_t rs1, long rs2)
    {
        struct device_subdevice dev_subdev = cpu__lookup_device_subdevice(cpu__device_subdevice_map, (lun_t){rs1.lun}, cpu__current_privilege_level());
        /* subdevice id in bits 0..11, command number k in bits 12 and up */
        return dev_subdev.device_addr(dev_subdev.subdeviceId | k << 12, rs1, rs2);
    }

    /* Fallback interfaces */
    #define org_RiscV__Fallback__Trap__uuid 0
    #define org_RiscV__Fallback__ReturnZero__uuid 1
    #define org_RiscV__Fallback__ReturnMinusOne__uuid 2

    /* fallback device */
    static
    long cpu__fallback(unsigned short subdevice_xcmd, lun_data_t rs1, long rs2)
    {
        switch(subdevice_xcmd % (1 << 12)){
        case 0 /* org.RiscV:Trap */: trap_to(cpu__next_higher_privilege_level()); /* does not return */
        case 1 /* org.RiscV:ReturnZero */: return 0;
        case 2 /* org.RiscV:ReturnMinus1 */: return -1;
        case 3 /* org.RiscV:Trap Machinelevel */: printf("something is rotten in machinemode: unknown xintf device"); return 31415926;
        default: trap("hardware configuration error");
        }
    }

Example:

    #define com_bigbucks__Frobate__uuid 0xABCDE
    #define org_tinker_tinker__RocknRoll__uuid 0x12345
    #define org_tinker_tinker__Jazz__uuid 0xD0B0D
    /*
    com.bigbucks:Frobate{
        uuid: com_bigbucks__Frobate__uuid
        frobate rd rs1 rs2 : xcmd0 rd rs1 rs2
        foo rd rs1 rs2 : xcmd1 rd rs1 rs2
        bar rd rs1 rs2 : xcmd2 rd rs1 rs2
    }
    */
    /*
    org.tinker.tinker:RocknRoll{
        uuid: org_tinker_tinker__RocknRoll__uuid
        rock rd rs1 rs2: xcmd0 rd rs1 rs2
        roll rd rs1 rs2: xcmd1 rd rs1 rs2
    }
    */

    long com_bigbucks__device1(unsigned short subdevice_xcmd, lun_data_t rs1, long rs2)
    {
        /* low 12 bits: subdevice (interface) id, bits 12 and up: command number */
        switch(subdevice_xcmd) {
        case 0 | 0 << 12 /* com.bigbucks:Frobate:frobate */ : return device1_frobate(rs1, rs2);
        case 64 | 0 << 12 /* com.bigbucks:FrobateMach:frobate */ : return device1_frobate_machine_level(rs1, rs2);
        case 0 | 1 << 12 /* com.bigbucks:Frobate:foo */ : return device1_foo(rs1, rs2);
        case 0 | 2 << 12 /* com.bigbucks:Frobate:bar */ : return device1_bar(rs1, rs2);
        case 1 | 0 << 12 /* org.tinker.tinker:RocknRoll:rock */ : return device1_rock(rs1, rs2);
        case 1 | 1 << 12 /* org.tinker.tinker:RocknRoll:roll */ : return device1_roll(rs1, rs2);
        default: trap("hardware configuration error");
        }
    }

    /*
    org.tinker.tinker:Jazz{
        uuid: org_tinker_tinker__Jazz__uuid
        boogy rd rs1 rs2: xcmd0 rd rs1 rs2
    }
    */

    long org_tinker_tinker__device2(unsigned short subdevice_xcmd, lun_data_t rs1, long rs2)
    {
        switch(subdevice_xcmd){
        case 0 | 0 << 12 /* com.bigbucks:Frobate:frobate */ : return device2_frobate(rs1, rs2);
        case 0 | 1 << 12 /* com.bigbucks:Frobate:foo */ : return device2_foo(rs1, rs2);
        case 0 | 2 << 12 /* com.bigbucks:Frobate:bar */ : return device2_bar(rs1, rs2);
        case 1 | 0 << 12 /* org.tinker.tinker:Jazz:boogy */ : return device2_boogy(rs1, rs2);
        default: trap("hardware configuration error");
        }
    }

    /* struct uuid_device_priv2lun cpu__lun_map[] */
    cpu__lun_map = {
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__Trap__uuid, .dev = 0}, .reqpriv = user}, .lun = 0},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__Trap__uuid, .dev = 0}, .reqpriv = super}, .lun = 0},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__Trap__uuid, .dev = 0}, .reqpriv = hyper}, .lun = 0},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__Trap__uuid, .dev = 0}, .reqpriv = mach}, .lun = 0},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__ReturnZero__uuid, .dev = 0}, .reqpriv = user}, .lun = 1},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__ReturnZero__uuid, .dev = 0}, .reqpriv = super}, .lun = 1},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__ReturnZero__uuid, .dev = 0}, .reqpriv = hyper}, .lun = 1},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__ReturnZero__uuid, .dev = 0}, .reqpriv = mach}, .lun = 1},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__ReturnMinusOne__uuid, .dev = 0}, .reqpriv = user}, .lun = 2},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__ReturnMinusOne__uuid, .dev = 0}, .reqpriv = super}, .lun = 2},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__ReturnMinusOne__uuid, .dev = 0}, .reqpriv = hyper}, .lun = 2},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__ReturnMinusOne__uuid, .dev = 0}, .reqpriv = mach}, .lun = 2},
        {{.uuid_dev = {.uuid = com_bigbucks__Frobate__uuid, .dev = 0}, .reqpriv = user}, .lun = 32}, //32 sic!
        {{.uuid_dev = {.uuid = com_bigbucks__Frobate__uuid, .dev = 1}, .reqpriv = super}, .lun = 32},
        {{.uuid_dev = {.uuid = com_bigbucks__Frobate__uuid, .dev = 1}, .reqpriv = hyper}, .lun = 32},
        {{.uuid_dev = {.uuid = com_bigbucks__Frobate__uuid, .dev = 1}, .reqpriv = mach}, .lun = 32},
        {{.uuid_dev = {.uuid = com_bigbucks__Frobate__uuid, .dev = 0}, .reqpriv = super}, .lun = 34}, //34 sic!
        {{.uuid_dev = {.uuid = com_bigbucks__Frobate__uuid, .dev = 0}, .reqpriv = hyper}, .lun = 34},
        {{.uuid_dev = {.uuid = com_bigbucks__Frobate__uuid, .dev = 0}, .reqpriv = mach}, .lun = 34},
        {{.uuid_dev = {.uuid = org_tinker_tinker__RocknRoll__uuid, .dev = 0}, .reqpriv = user}, .lun = 33}, //33 sic!
        {{.uuid_dev = {.uuid = org_tinker_tinker__RocknRoll__uuid, .dev = 0}, .reqpriv = super}, .lun = 33},
        {{.uuid_dev = {.uuid = org_tinker_tinker__RocknRoll__uuid, .dev = 0}, .reqpriv = hyper}, .lun = 33},
        {{.uuid_dev = {.uuid = org_tinker_tinker__Jazz__uuid, .dev = 0}, .reqpriv = super}, .lun = 35},
        {{.uuid_dev = {.uuid = org_tinker_tinker__Jazz__uuid, .dev = 0}, .reqpriv = hyper}, .lun = 35},
    };

    /* struct lun_priv2device_subdevice cpu__device_subdevice_map[] */
    cpu__device_subdevice_map = {
        {{.lun = 0, .reqpriv = user}, .devAddr_subdevId = {cpu__fallback, 0 /* Trap */}},
        {{.lun = 0, .reqpriv = super}, .devAddr_subdevId = {cpu__fallback, 0 /* Trap */}},
        {{.lun = 0, .reqpriv = hyper}, .devAddr_subdevId = {cpu__fallback, 0 /* Trap */}},
        {{.lun = 0, .reqpriv = mach}, .devAddr_subdevId = {cpu__fallback, 3 /* Trap Machinelevel */}},
        {{.lun = 1, .reqpriv = user}, .devAddr_subdevId = {cpu__fallback, 1 /* ReturnZero */}},
        {{.lun = 1, .reqpriv = super}, .devAddr_subdevId = {cpu__fallback, 1 /* ReturnZero */}},
        {{.lun = 1, .reqpriv = hyper}, .devAddr_subdevId = {cpu__fallback, 1 /* ReturnZero */}},
        {{.lun = 1, .reqpriv = mach}, .devAddr_subdevId = {cpu__fallback, 1 /* ReturnZero */}},
        {{.lun = 2, .reqpriv = user}, .devAddr_subdevId = {cpu__fallback, 2 /* ReturnMinusOne */}},
        {{.lun = 2, .reqpriv = super}, .devAddr_subdevId = {cpu__fallback, 2 /* ReturnMinusOne */}},
        {{.lun = 2, .reqpriv = hyper}, .devAddr_subdevId = {cpu__fallback, 2 /* ReturnMinusOne */}},
        {{.lun = 2, .reqpriv = mach}, .devAddr_subdevId = {cpu__fallback, 2 /* ReturnMinusOne */}},
        // .lun = 3 .. 7 reserved for other fallback RV interfaces
        // .lun = 8 .. 30 reserved as error numbers, c.li t1 31; bltu rd t1 L_fail tests errors
        // .lun = 31 reserved out of caution
        {{.lun = 32, .reqpriv = user}, .devAddr_subdevId = {com_bigbucks__device1, 0 /* Frobate interface */}},
        {{.lun = 32, .reqpriv = super}, .devAddr_subdevId = {com_bigbucks__device1, 0 /* Frobate interface */}},
        {{.lun = 32, .reqpriv = hyper}, .devAddr_subdevId = {com_bigbucks__device1, 0 /* Frobate interface */}},
        {{.lun = 32, .reqpriv = mach}, .devAddr_subdevId = {com_bigbucks__device1, 64 /* Frobate machine level interface */}},
        {{.lun = 33, .reqpriv = user}, .devAddr_subdevId = {com_bigbucks__device1, 1 /* RocknRoll interface */}},
        {{.lun = 33, .reqpriv = super}, .devAddr_subdevId = {com_bigbucks__device1, 1 /* RocknRoll interface */}},
        {{.lun = 33, .reqpriv = hyper}, .devAddr_subdevId = {com_bigbucks__device1, 1 /* RocknRoll interface */}},
        {{.lun = 34, .reqpriv = super}, .devAddr_subdevId = {org_tinker_tinker__device2, 0 /* Frobate interface */}},
        {{.lun = 34, .reqpriv = hyper}, .devAddr_subdevId = {org_tinker_tinker__device2, 0 /* Frobate interface */}},
        {{.lun = 34, .reqpriv = mach}, .devAddr_subdevId = {org_tinker_tinker__device2, 0 /* Frobate interface */}},
        {{.lun = 35, .reqpriv = super}, .devAddr_subdevId = {org_tinker_tinker__device2, 1 /* Jazz interface */}},
        {{.lun = 35, .reqpriv = hyper}, .devAddr_subdevId = {org_tinker_tinker__device2, 1 /* Jazz interface */}},
    };
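Since the description above is deliberately "pretend C", a much reduced but compilable model of the delegation may help when experimenting. It is only a sketch under several assumptions: XLEN = 64, N = 8, no privilege levels, no device sequence numbers, one table instead of two, and a trap modelled as `abort()`. The RocknRoll UUID and lun are taken from the example; the device behaviour (rock doubles rs2, roll adds one) is invented purely so that results can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define N_CMDS 8   /* assumption: N = 8 xcmd instructions */

/* A device receives the command number and both operands. */
typedef int64_t (*device_fn)(unsigned cmd, uint64_t rs1, int64_t rs2);

/* Hypothetical RocknRoll device: rock doubles rs2, roll adds one. */
static int64_t rocknroll(unsigned cmd, uint64_t rs1, int64_t rs2)
{
    (void)rs1;
    switch (cmd) {
    case 0:  return 2 * rs2;   /* rock */
    case 1:  return rs2 + 1;   /* roll */
    default: return -1;        /* unused command slots */
    }
}

/* Fallback devices behind lun 1 and lun 2. */
static int64_t return_zero(unsigned cmd, uint64_t rs1, int64_t rs2)
{
    (void)cmd; (void)rs1; (void)rs2;
    return 0;
}

static int64_t return_minus_one(unsigned cmd, uint64_t rs1, int64_t rs2)
{
    (void)cmd; (void)rs1; (void)rs2;
    return -1;
}

/* One merged table: UUID -> lun -> device. */
struct lun_entry { uint32_t uuid; uint16_t lun; device_fn dev; };

static const struct lun_entry lun_table[] = {
    { 1,       1,  return_zero },
    { 2,       2,  return_minus_one },
    { 0x12345, 33, rocknroll },   /* UUID and lun as in the example */
};
#define LUN_TABLE_LEN (sizeof lun_table / sizeof lun_table[0])

/* xext0: translate the UUID in bits 12..63 of rs1; unknown UUID -> lun 1. */
static uint64_t xext0(uint64_t rs1, uint64_t rs2)
{
    for (size_t i = 0; i < LUN_TABLE_LEN; i++)
        if (lun_table[i].uuid == (rs1 >> 12))
            return (rs2 << 12) | lun_table[i].lun;
    return 1;
}

/* xcmd<k>: route the operands to command k of the device behind the lun. */
static int64_t xcmd(unsigned k, uint64_t rs1, int64_t rs2)
{
    assert(k < N_CMDS);
    for (size_t i = 0; i < LUN_TABLE_LEN; i++)
        if (lun_table[i].lun == (rs1 & 0xFFF))
            return lun_table[i].dev(k, rs1, rs2);
    abort();   /* lun 0 or unknown lun: stands in for the trap */
}
```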