# Overloadable opcodes

The xext proposal defines a small number N (e.g. N = 8) of standardised R-type instructions xcmd0, xcmd1, ..., xcmd[N-1] (preferably in the brownfield opcode space).
Each xcmd takes (in rs1) a 12 bit "logical unit" (lun), identifying a (sub)device on the cpu that implements some "extension interface" (xintf), together with some additional data. Extension devices may be implemented in any convenient form, e.g. non-standard extensions of the CPU itself, IP tiles, or closely coupled external devices. An xintf is a set of up to N commands with 2 input ports and 1 output port (i.e. like an R-type instruction), together with a description of the semantics of the commands. Calling e.g. xcmd3 routes its two input ports and one output port to command 3 on the device determined by the lun bits in rs1. Thus, the N standard xcmd instructions are standard-designated overloadable opcodes, with the non-standard semantics of the opcode determined by the lun.

Portable software does not use luns directly. Instead, it goes through a level of indirection using a further instruction, xext. The xext instruction translates a 20 bit globally unique identifier (UUID) of an xintf to the lun of a device on the cpu that implements that xintf. The cpu can do this because it knows (at manufacturing or boot time) which devices it has and which xintfs they provide. This includes devices that would be described as non-standard extensions of the cpu if the designers had used custom opcodes instead of an xintf as an interface. If the UUID of the xintf is not recognised at the current privilege level, the xext instruction returns the special lun = 0, causing any subsequent xcmd to trap. Minor variations of this scheme (requiring two more instructions, xext0 and xextm1) return a fallback lun instead, so that subsequent xcmd instructions always return 0 or -1 rather than trapping.

Remark1: the main difference with a previous "ioctl like proposal" is that UUID translation is stateless and does not use resources. The xext instruction _neither_ initialises a device _nor_ builds global state identified by a cookie. If a device needs initialisation, it can do this using xcmds as init and deinit instructions. Likewise, it can hand out cookies (which can include the lun) as a return value.

Remark2: Implementing devices can respond to an (essentially) arbitrary number of xintfs. Hence, while an xintf is restricted to N commands, an implementing device can have an arbitrary number of commands. Organising related commands in xintfs helps avoid UUID space pollution, and allows the (small) cost of UUID to lun translation to be amortised if related commands are used in combination.

TL;DR: see below for a C description of how this is supposed to work.

== Description of the instructions ==

    xcmd0 rd, rs1, rs2
    xcmd1 rd, rs1, rs2
    ....
    xcmd[N-1] rd, rs1, rs2

* rs1 contains a 12 bit "logical unit" (lun) together with XLEN - 12 bits of additional data.
* rs2 is arbitrary.

For e.g. xcmd3, route the inputs rs1, rs2 and the output port rd to command 3 of the (sub)device on the cpu identified by the lun bits of rs1.

After execution:

* rd contains the value of the output port of the implementing device.

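To make the operand layout concrete, here is a minimal C sketch of how the rs1 operand of an xcmd could be split into its lun and data parts. It assumes XLEN = 64 and that the lun occupies bits 0..11 (the position suggested by the C description further down); the helper names `xcmd_rs1_lun` and `xcmd_rs1_data` are hypothetical and not part of the proposal.

```c
#include <stdint.h>

#define XCMD_LUN_BITS 12u
#define XCMD_LUN_MASK ((UINT64_C(1) << XCMD_LUN_BITS) - 1)

/* lun: bits 0..11 of rs1 (assumed position) */
static inline unsigned xcmd_rs1_lun(uint64_t rs1)
{
    return (unsigned)(rs1 & XCMD_LUN_MASK);
}

/* additional data: the remaining XLEN - 12 bits of rs1 */
static inline uint64_t xcmd_rs1_data(uint64_t rs1)
{
    return rs1 >> XCMD_LUN_BITS;
}
```

With this layout, the device only ever sees the data part and rs2 as "free" operand bits; the lun part is consumed by the routing.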
--------

    xext rd, rs1, rs2
    xext0 rd, rs1, rs2
    xextm1 rd, rs1, rs2

* rs1 contains
    * a UUID of at least 20 bits in bits 12 .. XLEN-1 of rs1, identifying an xintf;
    * the sequence number of a device at the current privilege level on the cpu implementing the xintf, in bits 0 .. 11. In particular, if bits 0 .. 11 are zero, the default implementation is requested.
* rs2 is arbitrary (but bits XLEN-12 to XLEN-1 are discarded).

After execution, rd contains the lun of a device implementing the xintf or, if the UUID is not recognised, the fallback lun 0 (for xext), 1 (for xext0) or 2 (for xextm1). The remaining bits of rd hold the lower XLEN - 12 bits of rs2 as data (see the C description below).

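The operand encoding of xext described above can be sketched in C as follows. This is only an illustration, assuming XLEN = 64; the function names `xext_rs1` and `xext_rd` are hypothetical helpers that build the register values, not instructions or APIs defined by the proposal.

```c
#include <stdint.h>

#define XEXT_LOW_BITS 12u
#define XEXT_LOW_MASK ((UINT64_C(1) << XEXT_LOW_BITS) - 1)

/* rs1 of xext: UUID in bits 12..63, device sequence number in bits 0..11 */
static inline uint64_t xext_rs1(uint64_t uuid, unsigned seq)
{
    return (uuid << XEXT_LOW_BITS) | (seq & XEXT_LOW_MASK);
}

/* rd of xext: lun in bits 0..11, the lower 52 bits of rs2 above it.
   The upper 12 bits of rs2 are shifted out, i.e. discarded. */
static inline uint64_t xext_rd(unsigned lun, uint64_t rs2)
{
    return (rs2 << XEXT_LOW_BITS) | (lun & XEXT_LOW_MASK);
}
```

For example, `xext_rs1(0xABCDE, 0)` yields 0xABCDE000, which is the value a `lui rd 0xABCDE` leaves in the low 32 bits of rd, with sequence number 0 requesting the default implementation.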
---

The net effect is that, when the CPU implements an xintf with UUID 0xABCDE, a sequence like

    //fake UUID of an xintf
    lui rd 0xABCDE
    xext rd rd rs1
    xcmd0 rd rd rs2

acts like a single namespaced instruction cmd0_ABCDE rd rs1 rs2 (with the annoying caveat that the upper 12 bits of rs1 are discarded). The sequence is not indivisible, but the crucial semantics that you might want to be indivisible are in xcmd0.

Delegation and UUID translation are expected to come at a small performance price compared to a "native" instruction. This should, however, be an acceptable tradeoff in many cases. Moreover, implementations may opcode-fuse the whole instruction sequence, or the first or last two instructions.
If several instructions of the same interface are used, one can also use instruction sequences like

    lui t1 0xABCDE //org_tinker_tinker__RocknRoll__uuid
    xext t1 t1 zero
    xcmd0 a5, t1, a0 // org_tinker_tinker__RocknRoll__rock(a5, t1, a0)
    xcmd1 t2, t1, a1 // org_tinker_tinker__RocknRoll__roll(t2, t1, a1)
    xcmd0 a0, t1, t2 // org_tinker_tinker__RocknRoll__rock(a0, t1, t2)

If 0xABCDE is an unknown UUID at the current privilege level, the sequence results in a trap, just like cmd0_ABCDE rd rs1 rs2 would. The sequence

    //fake UUID of an xintf
    lui rd 0xABCDE
    xext0 rd rd rs1
    xcmd0 rd rd rs2

acts exactly like the sequence with xext, except that xcmd0 returns 0 if the UUID is unknown at the current privilege level. Likewise, using xextm1 results in -1 being returned. This requires lun = 0, 1 and 2 to be routed to the three mandatory fallback interfaces defined below.

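The difference between xext, xext0 and xextm1 is only in which fallback lun they return for an unknown UUID. The following C sketch models this under loud assumptions: a hypothetical CPU that implements exactly one xintf (UUID 0xABCDE, at lun 32), privilege levels ignored, and a trap represented by a flag rather than a real trap. All function names are made up for illustration.

```c
#include <stdint.h>

/* The three mandatory fallback luns. */
enum { LUN_TRAP = 0, LUN_RETURN_ZERO = 1, LUN_RETURN_MINUS_ONE = 2 };

/* Hypothetical CPU: knows only UUID 0xABCDE, which it maps to lun 32. */
static unsigned lookup_or(uint32_t uuid, unsigned fallback_lun)
{
    return uuid == 0xABCDEu ? 32u : fallback_lun;
}

/* The three variants share the lookup and differ only in the fallback. */
unsigned model_xext(uint32_t uuid)   { return lookup_or(uuid, LUN_TRAP); }
unsigned model_xext0(uint32_t uuid)  { return lookup_or(uuid, LUN_RETURN_ZERO); }
unsigned model_xextm1(uint32_t uuid) { return lookup_or(uuid, LUN_RETURN_MINUS_ONE); }

/* Any xcmd issued on a fallback lun (only meaningful for lun 0, 1, 2).
   Sets *trapped instead of really trapping. */
long model_xcmd_on_fallback(unsigned lun, int *trapped)
{
    *trapped = (lun == LUN_TRAP);
    return lun == LUN_RETURN_MINUS_ONE ? -1 : 0;
}
```

Software that wants to probe for an optional interface without installing a trap handler would use the xext0 or xextm1 form, choosing whichever of 0 or -1 is not a legitimate result of the command it calls.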
On the software level, the xintf is just a set of glorified assembler macros

    org.tinker.tinker:RocknRoll{
        uuid : 0xABCDE
        rock rd rs1 rs2 : xcmd0 rd rs1 rs2
        roll rd rs1 rs2 : xcmd1 rd rs1 rs2
    }

so that the above sequence can be more clearly written as

    import(org.tinker.tinker:RocknRoll)

    lui rd org.tinker.tinker:RocknRoll:uuid
    xext rd rd rs1
    org.tinker.tinker:RocknRoll:rock rd rd rs2

------

The following standard xintfs shall be implemented by the CPU.

For lun == 0:

At privilege levels user mode, supervisor mode and hypervisor mode

    org.RiscV:Fallback:Trap{
        uuid: 0
        trap0 rd rs1 rs2: xcmd0 rd rs1 rs2
        ...
        trap[N-1] rd rs1 rs2: xcmd[N-1] rd rs1 rs2
    }

each of the xcmd instructions shall trap to the next higher privilege level.

At privilege level machine mode, each trap command has unspecified behaviour, but in debug mode it should cause an exception to a debug environment.

For lun == 1, at all privilege levels

    org.RiscV:Fallback:ReturnZero{
        uuid: 1
        return_zero0 rd rs1 rs2: xcmd0 rd rs1 rs2
        ...
        return_zero[N-1] rd rs1 rs2: xcmd[N-1] rd rs1 rs2
    }

each return_zero command shall return 0 in rd.

For lun == 2, at all privilege levels

    org.RiscV:Fallback:ReturnMinusOne{
        uuid: 2
        return_minusone0 rd rs1 rs2: xcmd0 rd rs1 rs2
        ...
        return_minusone[N-1] rd rs1 rs2: xcmd[N-1] rd rs1 rs2
    }

each return_minusone command shall return -1 in rd.

---

Remark:
Quite possibly even glorified standard assembler macros are overkill, and it is easier to just use defines or ordinary macros with long names. E.g. writing

    #define org_tinker_tinker__RocknRoll__uuid 0xABCDE
    #define org_tinker_tinker__RocknRoll__rock(rd, rs1, rs2) xcmd0 rd, rs1, rs2
    #define org_tinker_tinker__RocknRoll__roll(rd, rs1, rs2) xcmd1 rd, rs1, rs2

allows the same sequence to be written as

    lui rd org_tinker_tinker__RocknRoll__uuid
    xext rd rd rs1
    org_tinker_tinker__RocknRoll__rock(rd, rd, rs2)

Readability of assembler is no big deal for a compiler, but people are supposed to _document_ the semantics of the interface. In particular, specifying the semantics of the xintf in the same way as the semantics of the cpu should allow formal verification.

== Implications for the RiscV ecosystem ==

The proposal allows independent groups to define one or more extension
interfaces of (slightly crippled) R-type instructions implemented by an
extension device. Such an extension device would be a native but non-standard
extension of the CPU, an IP tile or a closely coupled external chip, and would
be configured at manufacturing time or bootup of the CPU.

The 20 bits provided by the UUID of an xintf give much more room than is provided by the 2 custom 32 bit, or even the 4 custom 64/48 bit opcode spaces. Thus the overloadable opcodes proposal avoids most of the need to put a claim on opcode space, and the associated collisions when combining independent extensions. In this respect it is similar to POSIX ioctls, which (almost) obviate the need for defining new syscalls to control new or non-standard hardware.

The expanded flexibility comes at a cost: the standard can specify the
semantics of the delegation mechanism and the interfacing with the rest
of the cpu, but the actual semantics of the overloaded instructions can
only be defined by the designer of the interface. Likewise, a device
can be conforming as far as delegation and interaction with the CPU
are concerned, but whether the hardware conforms to the semantics
of the interface is outside the scope of the spec. Being able to specify
those semantics using the methods used for RV itself is clearly very
valuable. One impetus for doing that is for RV to use the mechanism for
purposes of its own, effectively freeing opcode space for other purposes.
Also, some interfaces may become de facto or de jure standards themselves,
necessitating hardware to implement competing interfaces. I.e., facilitating
a free for all may lead to standards proliferation. C'est la vie.

The only "ISA-collisions" that can still occur are in the 20 bit (~10^6)
interface identifier space, with 12 more bits to identify a device on
a hart that implements the interface. One suggestion is to set aside
2^19 ids that are handed out for a small fee by a central (automated)
registration (making sure the space is not just claimed), while the
remaining 2^19 are used as a good hash of a long, plausibly globally
unique, human readable interface name. This gives implementors the choice
between paying a fee for a guaranteed private identifier, or relying on low
collision probabilities. On RV64 the UUID can also be extended to 52 bits (> 10^15).

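As an illustration of the hashed half of the identifier space, the sketch below derives a 20 bit id from an interface name. Everything specific here is an assumption made for the example: the proposal does not name a hash function (32-bit FNV-1a is used as a stand-in for "a good hash"), it does not say which half of the space is the hashed one (bit 19 is set to mark it here), and the function name `xintf_uuid_from_name` is hypothetical.

```c
#include <stdint.h>

/* 32-bit FNV-1a, used here only as a stand-in for "a good hash" (assumption) */
static uint32_t fnv1a32(const char *s)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* assumption: ids with bit 19 set form the hashed half of the 20 bit space */
#define XINTF_HASHED_HALF (1u << 19)

uint32_t xintf_uuid_from_name(const char *name)
{
    uint32_t h = fnv1a32(name);
    /* xor-fold the 32 bit hash down to 19 bits */
    h = (h >> 19) ^ (h & (XINTF_HASHED_HALF - 1u));
    return XINTF_HASHED_HALF | h;
}
```

A call such as `xintf_uuid_from_name("org.tinker.tinker:RocknRoll")` always yields the same id in the range 2^19 .. 2^20 - 1, so independent parties hashing the same name agree on the id without a registry.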
== Description of the extension as C functions ==

    /* register format of rs1 for xext instructions */
    typedef struct uuid_device{
        unsigned long dev:12;
        unsigned long uuid: 8*sizeof(long) - 12;
    } uuid_dev_t;

    /* register format for rd of xext and rs1 of xcmd instructions, packs lun and data */
    typedef struct lun_data{
        unsigned long lun:12;
        unsigned long data: 8*sizeof(long) - 12;
    } lun_data_t;

    /* proposed R-type instructions
    xext rd rs1 rs2
    xcmd0 rd rs1 rs2
    xcmd1 rd rs1 rs2
    ...
    xcmd7 rd rs1 rs2
    */

    lun_data_t xext(uuid_dev_t rs1, long rs2);
    long xcmd0(lun_data_t rs1, long rs2);
    long xcmd1(lun_data_t rs1, long rs2);
    ...
    long xcmd<N-1>(lun_data_t rs1, long rs2);

    /* hardware interface presented by an implementing device. */
    typedef
    long device_fn(unsigned short subdevice_xcmd, lun_data_t rs1, long rs2);

    /* cpu internal datatypes */

    enum privilege {user = 0b0001, super = 0b0010, hyper = 0b0100, mach = 0b1000};

    /* cpu internal, does what is on the label */
    static
    enum privilege cpu__current_privilege_level();

    typedef
    struct lun{
        unsigned short id:12;
    } lun_t;

    struct uuid_device_priv2lun{
        struct{
            uuid_dev_t uuid_dev;
            enum privilege reqpriv;
        };
        lun_t lun;
    };

    struct device_subdevice{
        device_fn* device_addr;
        unsigned short subdeviceId:12;
    };

    struct lun_priv2device_subdevice{
        struct{
            lun_t lun;
            enum privilege reqpriv;
        };
        struct device_subdevice devAddr_subdevId;
    };

    static
    struct uuid_device_priv2lun cpu__lun_map[];

    /*
    map (UUID, device, privilege) to a 12 bit lun,
    return (lun_t){0} on unknown (at access level)

    does associative memory lookup and tests privilege.
    */
    static
    lun_t cpu__lookup_lun(const struct uuid_device_priv2lun* lun_map, uuid_dev_t uuid_dev, enum privilege priv);

    lun_data_t xext(uuid_dev_t rs1, long rs2)
    {
        lun_t lun = cpu__lookup_lun(cpu__lun_map, rs1, cpu__current_privilege_level());

        /* keep only the lower XLEN - 12 bits of rs2 as data */
        return (lun_data_t){.lun = lun.id, .data = rs2 % (1L << (8*sizeof(long) - 12))};
    }

    struct lun_priv2device_subdevice cpu__device_subdevice_map[];

    /* map (lun, priv) to struct device_subdevice pair.
    For lun = 0, or unknown (lun, priv) pair, returns (struct device_subdevice){NULL,0}
    */
    static
    struct device_subdevice cpu__lookup_device_subdevice(const struct lun_priv2device_subdevice* dev_subdev_map,
                                                        lun_t lun, enum privilege priv);

    /* functional description of the delegating xcmd0 .. xcmd7 instructions */
    template<k = 0..N-1> //pretend this is C
    long xcmd<k>(lun_data_t rs1, long rs2)
    {
        struct device_subdevice dev_subdev = cpu__lookup_device_subdevice(cpu__device_subdevice_map, (lun_t){rs1.lun}, cpu__current_privilege_level());
        /* the command number k goes into the bits above the 12 bit subdevice id */
        return dev_subdev.device_addr(dev_subdev.subdeviceId | k << 12, rs1, rs2);
    }

    /*Fallback interfaces*/
    #define org_RiscV__Fallback__Trap__uuid 0
    #define org_RiscV__Fallback__ReturnZero__uuid 1
    #define org_RiscV__Fallback__ReturnMinusOne__uuid 2

    /* fallback device */
    static
    long cpu__fallback(unsigned short subdevice_xcmd, lun_data_t rs1, long rs2)
    {
        switch(subdevice_xcmd % (1 << 12) ){ /* low 12 bits: subdevice id */
        case 0 /* org.RiscV:Trap */: trap_to(cpu__next_higher_privilege_level()); /* assumed not to return */
        case 1 /* org.RiscV:ReturnZero */: return 0;
        case 2 /* org.RiscV:ReturnMinus1 */: return -1;
        case 3 /* org.RiscV:Trap Machinelevel */: printf("something is rotten in machinemode: unknown xintf device"); return 31415926;
        default: trap("hardware configuration error");
        }
    }

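The device functions in this description receive a single `subdevice_xcmd` argument that combines the 12 bit subdevice id with the number k of the xcmd being executed. A minimal sketch of that packing, assuming the subdevice id sits in bits 0..11 and k in the bits above it (the helper names are hypothetical):

```c
#define SUBDEV_BITS 12u
#define SUBDEV_MASK ((1u << SUBDEV_BITS) - 1u)

/* combine a 12 bit subdevice id with the xcmd number k */
static inline unsigned short subdevice_xcmd_pack(unsigned subdev_id, unsigned k)
{
    return (unsigned short)((subdev_id & SUBDEV_MASK) | (k << SUBDEV_BITS));
}

/* recover the subdevice id (low 12 bits) */
static inline unsigned subdevice_of(unsigned short v)
{
    return v & SUBDEV_MASK;
}

/* recover the xcmd number k (bits 12 and up) */
static inline unsigned command_of(unsigned short v)
{
    return (unsigned)v >> SUBDEV_BITS;
}
```

With a 16 bit `unsigned short` this leaves 4 bits for k, which is enough for N up to 16; the N = 8 used as an example in this proposal needs only 3.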
Example:

    #define com_bigbucks__Frobate__uuid 0xABCDE
    #define org_tinker_tinker__RocknRoll__uuid 0x12345
    #define org_tinker_tinker__Jazz__uuid 0xD0B0D
    /*
    com.bigbucks:Frobate{
        uuid: com_bigbucks__Frobate__uuid
        frobate rd rs1 rs2 : xcmd0 rd rs1 rs2
        foo rd rs1 rs2 : xcmd1 rd rs1 rs2
        bar rd rs1 rs2 : xcmd2 rd rs1 rs2
    }

    org.tinker.tinker:RocknRoll{
        uuid: org_tinker_tinker__RocknRoll__uuid
        rock rd rs1 rs2: xcmd0 rd rs1 rs2
        roll rd rs1 rs2: xcmd1 rd rs1 rs2
    }
    */

    long com_bigbucks__device1(unsigned short subdevice_xcmd, lun_data_t rs1, long rs2)
    {
        switch(subdevice_xcmd) {
        case 0 | 0 << 12 /* com.bigbucks:Frobate:frobate */ : return device1_frobate(rs1, rs2);
        case 64| 0 << 12 /* com.bigbucks:FrobateMach:frobate, subdevice id as in the device map below */ : return device1_frobate_machine_level(rs1, rs2);
        case 0 | 1 << 12 /* com.bigbucks:Frobate:foo */ : return device1_foo(rs1, rs2);
        case 0 | 2 << 12 /* com.bigbucks:Frobate:bar */ : return device1_bar(rs1, rs2);
        case 1 | 0 << 12 /* org.tinker.tinker:RocknRoll:rock */ : return device1_rock(rs1, rs2);
        case 1 | 1 << 12 /* org.tinker.tinker:RocknRoll:roll */ : return device1_roll(rs1, rs2);
        default: trap("hardware configuration error");
        }
    }

    /*
    org.tinker.tinker:Jazz{
        uuid: org_tinker_tinker__Jazz__uuid
        boogy rd rs1 rs2: xcmd0 rd rs1 rs2
    }
    */

    long org_tinker_tinker__device2(unsigned short subdevice_xcmd, lun_data_t rs1, long rs2)
    {
        switch(subdevice_xcmd){
        case 0 | 0 << 12 /* com.bigbucks:Frobate:frobate */: return device2_frobate(rs1, rs2);
        case 0 | 1 << 12 /* com.bigbucks:Frobate:foo */ : return device2_foo(rs1, rs2);
        case 0 | 2 << 12 /* com.bigbucks:Frobate:bar */ : return device2_bar(rs1, rs2);
        case 1 | 0 << 12 /* org.tinker.tinker:Jazz:boogy */: return device2_boogy(rs1, rs2);
        default: trap("hardware configuration error");
        }
    }

    /* contents of cpu__lun_map[], entries of type struct uuid_device_priv2lun */
    struct uuid_device_priv2lun cpu__lun_map[] = {
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__Trap__uuid, .dev = 0}, .reqpriv = user}, .lun = {0}},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__Trap__uuid, .dev = 0}, .reqpriv = super}, .lun = {0}},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__Trap__uuid, .dev = 0}, .reqpriv = hyper}, .lun = {0}},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__Trap__uuid, .dev = 0}, .reqpriv = mach}, .lun = {0}},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__ReturnZero__uuid, .dev = 0}, .reqpriv = user}, .lun = {1}},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__ReturnZero__uuid, .dev = 0}, .reqpriv = super}, .lun = {1}},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__ReturnZero__uuid, .dev = 0}, .reqpriv = hyper}, .lun = {1}},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__ReturnZero__uuid, .dev = 0}, .reqpriv = mach}, .lun = {1}},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__ReturnMinusOne__uuid, .dev = 0}, .reqpriv = user}, .lun = {2}},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__ReturnMinusOne__uuid, .dev = 0}, .reqpriv = super}, .lun = {2}},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__ReturnMinusOne__uuid, .dev = 0}, .reqpriv = hyper}, .lun = {2}},
        {{.uuid_dev = {.uuid = org_RiscV__Fallback__ReturnMinusOne__uuid, .dev = 0}, .reqpriv = mach}, .lun = {2}},
        {{.uuid_dev = {.uuid = com_bigbucks__Frobate__uuid, .dev = 0}, .reqpriv = user}, .lun = {32}}, //32 sic!
        {{.uuid_dev = {.uuid = com_bigbucks__Frobate__uuid, .dev = 1}, .reqpriv = super}, .lun = {32}},
        {{.uuid_dev = {.uuid = com_bigbucks__Frobate__uuid, .dev = 1}, .reqpriv = hyper}, .lun = {32}},
        {{.uuid_dev = {.uuid = com_bigbucks__Frobate__uuid, .dev = 1}, .reqpriv = mach}, .lun = {32}},
        {{.uuid_dev = {.uuid = com_bigbucks__Frobate__uuid, .dev = 0}, .reqpriv = super}, .lun = {34}}, //34 sic!
        {{.uuid_dev = {.uuid = com_bigbucks__Frobate__uuid, .dev = 0}, .reqpriv = hyper}, .lun = {34}},
        {{.uuid_dev = {.uuid = com_bigbucks__Frobate__uuid, .dev = 0}, .reqpriv = mach}, .lun = {34}},
        {{.uuid_dev = {.uuid = org_tinker_tinker__RocknRoll__uuid, .dev = 0}, .reqpriv = user}, .lun = {33}}, //33 sic!
        {{.uuid_dev = {.uuid = org_tinker_tinker__RocknRoll__uuid, .dev = 0}, .reqpriv = super}, .lun = {33}},
        {{.uuid_dev = {.uuid = org_tinker_tinker__RocknRoll__uuid, .dev = 0}, .reqpriv = hyper}, .lun = {33}},
        {{.uuid_dev = {.uuid = org_tinker_tinker__Jazz__uuid, .dev = 0}, .reqpriv = super}, .lun = {35}}, // lun 35 is the Jazz interface of device2
        {{.uuid_dev = {.uuid = org_tinker_tinker__Jazz__uuid, .dev = 0}, .reqpriv = hyper}, .lun = {35}},
    };

    /* contents of cpu__device_subdevice_map[], entries of type struct lun_priv2device_subdevice */
    struct lun_priv2device_subdevice cpu__device_subdevice_map[] = {
        {{.lun = {0}, .reqpriv = user}, .devAddr_subdevId = {cpu__fallback, 0 /* Trap */}},
        {{.lun = {0}, .reqpriv = super}, .devAddr_subdevId = {cpu__fallback, 0 /* Trap */}},
        {{.lun = {0}, .reqpriv = hyper}, .devAddr_subdevId = {cpu__fallback, 0 /* Trap */}},
        {{.lun = {0}, .reqpriv = mach}, .devAddr_subdevId = {cpu__fallback, 3 /* Trap, machine level */}},
        {{.lun = {1}, .reqpriv = user}, .devAddr_subdevId = {cpu__fallback, 1 /* ReturnZero */}},
        {{.lun = {1}, .reqpriv = super}, .devAddr_subdevId = {cpu__fallback, 1 /* ReturnZero */}},
        {{.lun = {1}, .reqpriv = hyper}, .devAddr_subdevId = {cpu__fallback, 1 /* ReturnZero */}},
        {{.lun = {1}, .reqpriv = mach}, .devAddr_subdevId = {cpu__fallback, 1 /* ReturnZero */}},
        {{.lun = {2}, .reqpriv = user}, .devAddr_subdevId = {cpu__fallback, 2 /* ReturnMinusOne */}},
        {{.lun = {2}, .reqpriv = super}, .devAddr_subdevId = {cpu__fallback, 2 /* ReturnMinusOne */}},
        {{.lun = {2}, .reqpriv = hyper}, .devAddr_subdevId = {cpu__fallback, 2 /* ReturnMinusOne */}},
        {{.lun = {2}, .reqpriv = mach}, .devAddr_subdevId = {cpu__fallback, 2 /* ReturnMinusOne */}},
        // .lun = 3 .. 7 reserved for other fallback RV interfaces
        // .lun = 8 .. 30 reserved as error numbers, c.li t1 31; bltu rd t1 L_fail tests errors
        // .lun = 31 reserved out of caution
        {{.lun = {32}, .reqpriv = user}, .devAddr_subdevId = {com_bigbucks__device1, 0 /* Frobate interface */}},
        {{.lun = {32}, .reqpriv = super}, .devAddr_subdevId = {com_bigbucks__device1, 0 /* Frobate interface */}},
        {{.lun = {32}, .reqpriv = hyper}, .devAddr_subdevId = {com_bigbucks__device1, 0 /* Frobate interface */}},
        {{.lun = {32}, .reqpriv = mach}, .devAddr_subdevId = {com_bigbucks__device1, 64 /* Frobate machine level interface */}},
        {{.lun = {33}, .reqpriv = user}, .devAddr_subdevId = {com_bigbucks__device1, 1 /* RocknRoll interface */}},
        {{.lun = {33}, .reqpriv = super}, .devAddr_subdevId = {com_bigbucks__device1, 1 /* RocknRoll interface */}},
        {{.lun = {33}, .reqpriv = hyper}, .devAddr_subdevId = {com_bigbucks__device1, 1 /* RocknRoll interface */}},
        {{.lun = {34}, .reqpriv = super}, .devAddr_subdevId = {org_tinker_tinker__device2, 0 /* Frobate interface */}},
        {{.lun = {34}, .reqpriv = hyper}, .devAddr_subdevId = {org_tinker_tinker__device2, 0 /* Frobate interface */}},
        {{.lun = {34}, .reqpriv = mach}, .devAddr_subdevId = {org_tinker_tinker__device2, 0 /* Frobate interface */}},
        {{.lun = {35}, .reqpriv = super}, .devAddr_subdevId = {org_tinker_tinker__device2, 1 /* Jazz interface */}},
        {{.lun = {35}, .reqpriv = hyper}, .devAddr_subdevId = {org_tinker_tinker__device2, 1 /* Jazz interface */}},
    };
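
The privilege-dependent lookup that xext performs on a table like the one in this example can be sketched as a plain linear search. This is a toy model only: the proposal describes the real thing as an associative memory lookup, the flattened `struct lun_entry` and the names `toy_lun_map` and `toy_lookup_lun` are hypothetical, and only a few entries are included.

```c
#include <stddef.h>
#include <stdint.h>

enum privilege { user = 1, super = 2, hyper = 4, mach = 8 };

/* one (UUID, device sequence number, privilege) -> lun mapping */
struct lun_entry {
    uint32_t uuid;
    uint16_t dev;
    enum privilege priv;
    uint16_t lun;
};

/* a few entries in the spirit of the example (Frobate = 0xABCDE, RocknRoll = 0x12345) */
static const struct lun_entry toy_lun_map[] = {
    { 0xABCDE, 0, user,  32 },
    { 0xABCDE, 0, super, 34 },
    { 0xABCDE, 1, super, 32 },
    { 0x12345, 0, user,  33 },
};

/* returns the lun, or 0 (the trapping fallback lun) if nothing matches */
uint16_t toy_lookup_lun(uint32_t uuid, uint16_t dev, enum privilege priv)
{
    for (size_t i = 0; i < sizeof toy_lun_map / sizeof toy_lun_map[0]; i++) {
        const struct lun_entry *e = &toy_lun_map[i];
        if (e->uuid == uuid && e->dev == dev && e->priv == priv)
            return e->lun;
    }
    return 0;
}
```

Note how the same UUID and sequence number resolve to different luns at user and supervisor level, which is how the example gives the default Frobate implementation to different devices depending on privilege.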