# Overloadable opcodes

This proposal adds standardised extension instructions to the RV
instruction set by introducing a small fixed number N (e.g. N = 8) of
R-type opcodes xcmd0 rd, rs1, rs2, .. , xcmd[N-1] rd, rs1, rs2 that are intended to be used as "overloadable" R-type instructions for independently developed extensions. Extensions may be implemented as non-standard CPU extensions, IP tiles, or closely coupled external devices.

TL;DR: see below for a C description of how this is supposed to work.

-----

    xcmd0 rd, rs1, rs2
    xcmd1 rd, rs1, rs2
    ....
    xcmd[N-1] rd, rs1, rs2

* rs1 contains a 12 bit "logical unit" (lun) together with XLEN - 12 bits of additional data.
* rs2 is arbitrary.

For example, xcmd3 routes the inputs rs1 and rs2 and the output port rd to command 3 of the (sub)device on the CPU identified by the lun bits of rs1.

After execution:

* rd contains the value of the output port of the implementing device.

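The rs1 layout of xcmd can be sketched in C as follows. This is only an illustration: it assumes that the lun sits in the low 12 bits and the data in the remaining bits (the order used by the `lun_data` struct in the C description further down), and the names `pack_lun_data`, `unpack_lun` and `unpack_data` are made up for this sketch. Data bits that do not fit in XLEN - 12 bits are silently dropped.

```c
#include <assert.h>
#include <stdint.h>

#define LUN_BITS 12u
#define LUN_MASK ((UINT64_C(1) << LUN_BITS) - 1)   /* 0xFFF */

/* Pack a 12 bit lun and XLEN-12 bits of data into one register value.
   xlen is 32 or 64; the result is held in a uint64_t either way. */
static uint64_t pack_lun_data(unsigned xlen, uint64_t lun, uint64_t data)
{
    assert(xlen == 32 || xlen == 64);
    assert(lun <= LUN_MASK);
    uint64_t reg_mask = (xlen == 64) ? UINT64_MAX : UINT64_C(0xFFFFFFFF);
    return ((data << LUN_BITS) | lun) & reg_mask;
}

/* Extract the lun (bits 0..11) */
static uint64_t unpack_lun(uint64_t reg)  { return reg & LUN_MASK; }

/* Extract the data (bits 12..XLEN-1) */
static uint64_t unpack_data(uint64_t reg) { return reg >> LUN_BITS; }
```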
--------

    xext   rd, rs1, rs2
    xext0  rd, rs1, rs2
    xextm1 rd, rs1, rs2

* rs1 contains
    * a UUID of at least 20 bits, in bits 12..XLEN-1 of rs1, identifying an xintf;
    * the sequence number, in bits 0..11, of a device on the CPU that implements the xintf at the current privilege level. In particular, if bits 0..11 are zero, the default implementation is meant.
* rs2 is arbitrary (but bits XLEN-12 to XLEN-1 are unused).

After execution, rd contains the lun of a device implementing the xintf or, if there is none, the fallback lun 0 (xext), 1 (xext0) or 2 (xextm1).

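As a small illustration of the rs1 operand of xext, the sketch below builds and takes apart the RV32 form (20 bit UUID, 12 bit device sequence number). The helper names are hypothetical. Note that `lui rd, 0xABCDE` produces exactly the value `xext_operand_rv32(0xABCDE, 0)`, since lui writes its 20 bit immediate to bits 12..31 and clears bits 0..11.

```c
#include <assert.h>
#include <stdint.h>

/* Build the rs1 operand of xext for RV32: UUID in bits 12..31,
   device sequence number in bits 0..11 (0 = default implementation). */
static uint32_t xext_operand_rv32(uint32_t uuid, uint32_t dev_seq)
{
    assert(uuid < (1u << 20));
    assert(dev_seq < (1u << 12));
    return (uuid << 12) | dev_seq;
}

/* Recover the two fields from an operand */
static uint32_t operand_uuid(uint32_t rs1)    { return rs1 >> 12; }
static uint32_t operand_dev_seq(uint32_t rs1) { return rs1 & 0xFFFu; }
```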
---

The net effect is that, when the CPU implements an xintf with UUID 0xABCDE, a sequence like

    // fake UUID of an xintf
    lui   rd, 0xABCDE
    xext  rd, rd, rs1
    xcmd0 rd, rd, rs2

acts like a single namespaced instruction cmd0_ABCDE rd, rs1, rs2, with the annoying caveat that rs1 can only use its low XLEN-12 bits (bits 0..XLEN-13). The sequence is not indivisible, but the crucial semantics that you might want to be indivisible are in xcmd0. Delegation is expected to come at a small
additional performance price compared to a "native" instruction. This should, however, be an acceptable tradeoff in many cases. Moreover, implementations may opcode-fuse the whole instruction sequence (or the first or last two instructions).

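To see how the three instructions compose, here is a toy RV32 model. Everything about the device is an assumption made for illustration: the lun number 32 is arbitrary, the device semantics "data + rs2" are invented, and a trap is modelled as a flag rather than a control transfer. The model also shows the caveat about rs1: the upper 12 bits of the original rs1 are lost when xext packs it next to the lun.

```c
#include <assert.h>
#include <stdint.h>

enum { LUN_TRAP = 0, LUN_DEMO = 32 };   /* LUN_DEMO is a made-up lun */
#define DEMO_UUID 0xABCDEu              /* the fake UUID from the text */

/* lui: put a 20 bit immediate in bits 12..31, clear bits 0..11 */
static uint32_t model_lui(uint32_t imm20) { return imm20 << 12; }

/* xext: look up the UUID; rd = lun in bits 0..11, low 20 bits of rs2 above it */
static uint32_t model_xext(uint32_t rs1, uint32_t rs2)
{
    uint32_t known = ((rs1 >> 12) == DEMO_UUID) && ((rs1 & 0xFFFu) == 0);
    uint32_t lun = known ? LUN_DEMO : LUN_TRAP;
    return (rs2 << 12) | lun;
}

/* xcmd0: delegate to the (hypothetical) device, which computes data + rs2.
   *trapped is set when the lun does not belong to the demo device. */
static uint32_t model_xcmd0(uint32_t rs1, uint32_t rs2, int *trapped)
{
    uint32_t lun = rs1 & 0xFFFu, data = rs1 >> 12;
    *trapped = (lun != LUN_DEMO);
    return *trapped ? 0 : data + rs2;
}

/* The fused view: cmd0_ABCDE rd, rs1, rs2 */
static uint32_t cmd0_ABCDE(uint32_t rs1, uint32_t rs2, int *trapped)
{
    uint32_t rd = model_lui(DEMO_UUID);
    rd = model_xext(rd, rs1);
    return model_xcmd0(rd, rs2, trapped);
}
```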
If several instructions of the same interface are used, one can also use instruction sequences like

    lui   t1, 0xABCDE   // org_tinker_tinker__RocknRoll__uuid
    xext  t1, t1, zero
    xcmd0 a5, t1, a0    // org_tinker_tinker__RocknRoll__rock(a5, t1, a0)
    xcmd1 t2, t1, a5    // org_tinker_tinker__RocknRoll__roll(t2, t1, a5)
    xcmd0 a0, t1, t2    // org_tinker_tinker__RocknRoll__rock(a0, t1, t2)

This amortises the cost of the xext instruction.

When the xintf UUID is not recognised, the xcmd in the above sequence traps. Using xext0 instead of xext ensures that the xcmd returns 0 instead.
Likewise, using xextm1 ensures that the xcmd returns -1. This requires lun = 0, 1 and 2 to be routed to the three mandatory fallback
interfaces defined below.

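The difference between xext, xext0 and xextm1 can be sketched as one lookup with three different defaults. The table `known_xintfs`, its single entry, and the helper names are assumptions for illustration only; a trap is modelled by returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { LUN_TRAP = 0, LUN_RETURN_ZERO = 1, LUN_RETURN_MINUS_ONE = 2 };

/* Hypothetical table of the interfaces this CPU implements: UUID -> lun */
struct xintf_entry { uint32_t uuid; uint32_t lun; };
static const struct xintf_entry known_xintfs[] = { { 0xABCDEu, 32u } };

/* Shared lookup: the lun for uuid, or fallback_lun when the UUID is unknown */
static uint32_t lookup_or(uint32_t uuid, uint32_t fallback_lun)
{
    for (size_t i = 0; i < sizeof known_xintfs / sizeof known_xintfs[0]; i++)
        if (known_xintfs[i].uuid == uuid)
            return known_xintfs[i].lun;
    return fallback_lun;
}

static uint32_t model_xext(uint32_t uuid)   { return lookup_or(uuid, LUN_TRAP); }
static uint32_t model_xext0(uint32_t uuid)  { return lookup_or(uuid, LUN_RETURN_ZERO); }
static uint32_t model_xextm1(uint32_t uuid) { return lookup_or(uuid, LUN_RETURN_MINUS_ONE); }

/* What a later xcmd does on one of the three fallback luns.
   Returns false if the xcmd traps, otherwise stores the result in *rd. */
static bool fallback_xcmd(uint32_t lun, long *rd)
{
    assert(lun <= LUN_RETURN_MINUS_ONE);
    switch (lun) {
    case LUN_RETURN_ZERO:      *rd = 0;  return true;
    case LUN_RETURN_MINUS_ONE: *rd = -1; return true;
    default:                   return false;   /* lun 0: trap */
    }
}
```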
On the software level, the xintf is just a set of glorified assembler macros

    org.tinker.tinker:RocknRoll{
        uuid : 0xABCDE
        rock rd rs1 rs2 : xcmd0 rd rs1 rs2
        roll rd rs1 rs2 : xcmd1 rd rs1 rs2
    }

so that the above sequence can be written more clearly as

    import(org.tinker.tinker:RocknRoll)

    lui  rd, org.tinker.tinker:RocknRoll:uuid
    xext rd, rd, rs1
    org.tinker.tinker:RocknRoll:rock rd, rd, rs2

------

The following standard xintfs shall be implemented by the CPU.

For lun == 0, at the user, supervisor and hypervisor privilege levels

    org.RiscV:Fallback:Trap{
        uuid: 0
        trap0 rd rs1 rs2: xcmd0 rd rs1 rs2
        ...
        trap[N-1] rd rs1 rs2: xcmd[N-1] rd rs1 rs2
    }

each of the xcmd instructions shall trap to the next higher privilege level.

At the machine mode privilege level each trap command has unspecified behaviour, but in debug mode
it should cause an exception to a debug environment.

For lun == 1, at all privilege levels

    org.RiscV:Fallback:ReturnZero{
        uuid: 1
        return_zero0 rd rs1 rs2: xcmd0 rd rs1 rs2
        ...
        return_zero[N-1] rd rs1 rs2: xcmd[N-1] rd rs1 rs2
    }

each return_zero command shall return 0 in rd.

For lun == 2, at all privilege levels

    org.RiscV:Fallback:ReturnMinusOne{
        uuid: 2
        return_minusone0 rd rs1 rs2: xcmd0 rd rs1 rs2
        ...
        return_minusone[N-1] rd rs1 rs2: xcmd[N-1] rd rs1 rs2
    }

each return_minusone command shall return -1 in rd.

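"Trap to the next higher privilege level" can be written down directly if the levels are encoded one-hot, as they are in the C description of this proposal. The sketch below assumes that the hart implements all four levels, which need not be true in practice; it deliberately rejects machine mode, where the behaviour is left unspecified.

```c
#include <assert.h>

/* One-hot privilege levels, same values as in the C description */
enum privilege { user = 1, super = 2, hyper = 4, mach = 8 };

/* Level that a Fallback:Trap command traps to.
   Only defined below machine mode; assumes every level is implemented. */
static enum privilege next_higher_privilege_level(enum privilege p)
{
    assert(p == user || p == super || p == hyper);
    return (enum privilege)(p << 1);
}
```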
---

Remark:
Quite possibly even glorified standard assembler macros are overkill, and it is easier to just use defines or ordinary macros with long names. E.g. writing

    #define org_tinker_tinker__RocknRoll__uuid 0xABCDE
    #define org_tinker_tinker__RocknRoll__rock(rd, rs1, rs2) xcmd0 rd, rs1, rs2
    #define org_tinker_tinker__RocknRoll__roll(rd, rs1, rs2) xcmd1 rd, rs1, rs2

allows the same sequence to be written as

    lui  rd, org_tinker_tinker__RocknRoll__uuid
    xext rd, rd, rs1
    org_tinker_tinker__RocknRoll__rock(rd, rd, rs2)

Readability of assembler is no big deal for a compiler, but people are supposed to _document_ the interface and its semantics. In particular, specifying the semantics of the xintf in the same way as the semantics of the CPU should allow formal verification.

==Implications for the RiscV ecosystem ==

The proposal allows independent groups to define one or more extension
interfaces of (slightly crippled) R-type instructions implemented by an
extension device. Such an extension device would be a native but non-standard
extension of the CPU, an IP tile or a closely coupled external chip, and would
be configured at manufacturing time or at bootup of the CPU.

Having a standardised overloadable interface avoids much of the
need for ISA extensions for hardware with non-standard interfaces and
semantics. This is analogous to the way that the standardised overloadable
ioctl interface of the kernel almost completely avoids the need for
extending the kernel with syscalls for the myriad of hardware devices
with their specific interfaces and semantics.

The expanded flexibility comes at a cost: the standard can specify the
semantics of the delegation mechanism and the interfacing with the rest
of the CPU, but the actual semantics of the overloaded instructions can
only be defined by the designer of the interface. Likewise, a device
can be conforming as far as delegation and interaction with the CPU
is concerned, but whether the hardware conforms to the semantics
of the interface is outside the scope of the spec. Being able to specify
those semantics using the methods used for RV itself is clearly very
valuable. One impetus for doing so is that RV itself could use the mechanism
for purposes of its own, effectively freeing opcode space for other purposes.
Also, some interfaces may become de facto or de jure standards themselves,
requiring hardware to implement competing interfaces. I.e., facilitating a
free-for-all may lead to standards proliferation. C'est la vie.

The only "ISA collisions" that can still occur are in the 20 bit (~10^6)
interface identifier space, with 12 more bits to identify a device on
a hart that implements the interface. One suggestion is to set aside
2^19 ids that are handed out for a small fee by a central (automated)
registry (the fee ensures that the space is not simply claimed wholesale),
while the remaining 2^19 are filled by a good hash of a long, plausibly
globally unique, human-readable interface name. This gives implementors the
choice between paying a fee for a guaranteed private identifier and relying
on a low collision probability. On RV64 the UUID can also be extended to 52 bits (> 10^15).

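The "good hash" half of the identifier space could look like the sketch below. The choice of hash (32 bit FNV-1a, xor-folded to 19 bits) and the decision to put the hashed ids in the upper half of the 20 bit space (top bit set) are assumptions of this sketch; the proposal itself fixes neither. The function names are likewise made up.

```c
#include <assert.h>
#include <stdint.h>

/* 32 bit FNV-1a over a NUL-terminated string */
static uint32_t fnv1a32(const char *s)
{
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* Map an interface name to the hashed half of the 20 bit id space:
   ids 2^19 .. 2^20-1. Ids below 2^19 are left for central registration. */
static uint32_t hashed_xintf_id(const char *name)
{
    uint32_t h = fnv1a32(name);
    uint32_t folded = (h ^ (h >> 19)) & ((1u << 19) - 1u);   /* xor-fold to 19 bits */
    return (1u << 19) | folded;
}
```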
==Description of the extension as C functions ==

    /* register format of rs1 for xext instructions */
    typedef struct uuid_dev{
        unsigned long dev:12;
        unsigned long uuid: 8*sizeof(long) - 12;
    } uuid_dev_t;

    /* register format for rd of xext and rs1 of xcmd instructions, packs lun and data */
    typedef struct lun_data{
        unsigned long lun:12;
        unsigned long data: 8*sizeof(long) - 12;
    } lun_data_t;

    /* proposed R-type instructions
       xext rd rs1 rs2
       xcmd0 rd rs1 rs2
       xcmd1 rd rs1 rs2
       ...
       xcmd7 rd rs1 rs2
    */

    lun_data_t xext(uuid_dev_t rs1, long rs2);
    long xcmd0(lun_data_t rs1, long rs2);
    long xcmd1(lun_data_t rs1, long rs2);
    /* ... */
    long xcmd7(lun_data_t rs1, long rs2);

    /* hardware interface presented by an implementing device. */
    typedef
    long device_fn(unsigned short subdevice_xcmd, lun_data_t rs1, long rs2);

    /* cpu internal datatypes */

    enum privilege {user = 0b0001, super = 0b0010, hyper = 0b0100, mach = 0b1000};

    /* cpu internal, does what is on the label */
    static
    enum privilege cpu__current_privilege_level(void);

    typedef
    struct lun{
        unsigned short id:12;
    } lun_t;

    /* key: (UUID, device, privilege), value: lun */
    struct uuid_device_priv2lun{
        struct{
            uuid_dev_t uuid_devId;
            enum privilege priv;
        };
        lun_t lun;
    };

    struct device_subdevice{
        device_fn* devAddr;
        unsigned short subdevId:12;
    };

    /* key: (lun, privilege), value: (device, subdevice) */
    struct lun_priv2device_subdevice{
        struct{
            lun_t lun;
            enum privilege priv;
        };
        struct device_subdevice devAddr_interfId;
    };

    static
    struct uuid_device_priv2lun cpu__lun_map[];

    /*
      map (UUID, device, privilege) to a 12 bit lun,
      return (lun_t){0} on unknown (at access level)

      does associative memory lookup and tests privilege.
    */
    static
    lun_t cpu__lookup_lun(const struct uuid_device_priv2lun* lun_map, uuid_dev_t uuid_dev, enum privilege priv);



    lun_data_t xext(uuid_dev_t rs1, long rs2)
    {
        lun_t lun = cpu__lookup_lun(cpu__lun_map, rs1, cpu__current_privilege_level());

        /* only the low XLEN-12 bits of rs2 are kept as data */
        return (lun_data_t){.lun = lun.id, .data = (unsigned long)rs2 % (1UL << (8*sizeof(long) - 12))};
    }


    struct lun_priv2device_subdevice cpu__device_subdevice_map[];

    /* map (lun, priv) to struct device_subdevice pair.
       For lun = 0, or unknown (lun, priv) pair, returns (struct device_subdevice){NULL,0}
    */
    static
    struct device_subdevice cpu__lookup_device_subdevice(const struct lun_priv2device_subdevice* dev_subdev_map,
                                                         lun_t lun, enum privilege priv);



    /* functional description of the delegating xcmd0 .. xcmd7 instructions */
    template<k = 0..N-1> //pretend this is C
    long xcmd<k>(lun_data_t rs1, long rs2)
    {
        struct device_subdevice dev_subdev = cpu__lookup_device_subdevice(cpu__device_subdevice_map, (lun_t){rs1.lun}, cpu__current_privilege_level());
        /* subdevice id in bits 0..11, command number k in the bits above */
        return dev_subdev.devAddr(dev_subdev.subdevId | (k << 12), rs1, rs2);
    }

    /* Fallback interfaces */
    #define org_RiscV__Fallback__Trap__uuid 0
    #define org_RiscV__Fallback__ReturnZero__uuid 1
    #define org_RiscV__Fallback__ReturnMinusOne__uuid 2

    /* fallback device */
    static
    long cpu__fallback(unsigned short subdevice_xcmd, lun_data_t rs1, long rs2)
    {
        switch(subdevice_xcmd % (1 << 12) ){
        case 0 /* org.RiscV:Fallback:Trap */: trap_to(cpu__next_higher_privilege_level()); break;
        case 1 /* org.RiscV:Fallback:ReturnZero */: return 0;
        case 2 /* org.RiscV:Fallback:ReturnMinusOne */: return -1;
        case 3 /* org.RiscV:Fallback:Trap, machine level */: printf("something is rotten in machine mode: unknown xintf device"); return 31415926;
        default: trap("hardware configuration error");
        }
        return 0; /* not reached, assuming trap_to() and trap() do not return */
    }

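The first argument that xcmd passes to a device combines the subdevice id (bits 0..11) with the command number k (the bits above), which is the form the `case` labels of the example devices use (e.g. `1 | 1 << 12`). A minimal sketch of that encoding follows; the helper names are invented, and N = 8 is assumed so that the result fits in 15 bits.

```c
#include <assert.h>

#define SUBDEV_BITS 12u
#define SUBDEV_MASK ((1u << SUBDEV_BITS) - 1u)

/* Combine a 12 bit subdevice id with command number k (0..7, N = 8 assumed) */
static unsigned short encode_subdevice_xcmd(unsigned subdev_id, unsigned k)
{
    assert(subdev_id <= SUBDEV_MASK);
    assert(k < 8u);
    return (unsigned short)(subdev_id | (k << SUBDEV_BITS));
}

/* Split the combined value again, as a device would do */
static unsigned decode_subdev(unsigned short v) { return v & SUBDEV_MASK; }
static unsigned decode_xcmd(unsigned short v)   { return (unsigned)v >> SUBDEV_BITS; }
```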
Example:

    #define com_bigbucks__Frobate__uuid 0xABCDE
    #define org_tinker_tinker__RocknRoll__uuid 0x12345
    #define org_tinker_tinker__Jazz__uuid 0xD0B0D
    /*
    com.bigbucks:Frobate{
        uuid: com_bigbucks__Frobate__uuid
        frobate rd rs1 rs2 : xcmd0 rd rs1 rs2
        foo rd rs1 rs2 : xcmd1 rd rs1 rs2
        bar rd rs1 rs2 : xcmd2 rd rs1 rs2
    }
    */
    /*
    org.tinker.tinker:RocknRoll{
        uuid: org_tinker_tinker__RocknRoll__uuid
        rock rd rs1 rs2: xcmd0 rd rs1 rs2
        roll rd rs1 rs2: xcmd1 rd rs1 rs2
    }
    */

    /* device1 implements Frobate (subdevice 0, and 64 at machine level) and RocknRoll (subdevice 1) */
    long com_bigbucks__device1(unsigned short subdevice_xcmd, lun_data_t rs1, long rs2)
    {
        switch(subdevice_xcmd) {
        case 0  | 0 << 12 /* com.bigbucks:Frobate:frobate */     : return device1_frobate(rs1, rs2);
        case 64 | 0 << 12 /* com.bigbucks:FrobateMach:frobate */ : return device1_frobate_machine_level(rs1, rs2);
        case 0  | 1 << 12 /* com.bigbucks:Frobate:foo */         : return device1_foo(rs1, rs2);
        case 0  | 2 << 12 /* com.bigbucks:Frobate:bar */         : return device1_bar(rs1, rs2);
        case 1  | 0 << 12 /* org.tinker.tinker:RocknRoll:rock */ : return device1_rock(rs1, rs2);
        case 1  | 1 << 12 /* org.tinker.tinker:RocknRoll:roll */ : return device1_roll(rs1, rs2);
        default: trap("hardware configuration error");
        }
    }

    /*
    org.tinker.tinker:Jazz{
        uuid: org_tinker_tinker__Jazz__uuid
        boogy rd rs1 rs2: xcmd0 rd rs1 rs2
    }
    */

    /* device2 implements Frobate (subdevice 0) and Jazz (subdevice 1) */
    long org_tinker_tinker__device2(unsigned short subdevice_xcmd, lun_data_t rs1, long rs2)
    {
        switch(subdevice_xcmd){
        case 0 | 0 << 12 /* com.bigbucks:Frobate:frobate */  : return device2_frobate(rs1, rs2);
        case 0 | 1 << 12 /* com.bigbucks:Frobate:foo */      : return device2_foo(rs1, rs2);
        case 0 | 2 << 12 /* com.bigbucks:Frobate:bar */      : return device2_bar(rs1, rs2);
        case 1 | 0 << 12 /* org.tinker.tinker:Jazz:boogy */  : return device2_boogy(rs1, rs2);
        default: trap("hardware configuration error");
        }
    }


    /* struct uuid_device_priv2lun[] */
    static struct uuid_device_priv2lun cpu__lun_map[] = {
        {{.uuid_devId = {.uuid = org_RiscV__Fallback__Trap__uuid, .dev = 0}, .priv = user},  .lun = 0},
        {{.uuid_devId = {.uuid = org_RiscV__Fallback__Trap__uuid, .dev = 0}, .priv = super}, .lun = 0},
        {{.uuid_devId = {.uuid = org_RiscV__Fallback__Trap__uuid, .dev = 0}, .priv = hyper}, .lun = 0},
        {{.uuid_devId = {.uuid = org_RiscV__Fallback__Trap__uuid, .dev = 0}, .priv = mach},  .lun = 0},
        {{.uuid_devId = {.uuid = org_RiscV__Fallback__ReturnZero__uuid, .dev = 0}, .priv = user},  .lun = 1},
        {{.uuid_devId = {.uuid = org_RiscV__Fallback__ReturnZero__uuid, .dev = 0}, .priv = super}, .lun = 1},
        {{.uuid_devId = {.uuid = org_RiscV__Fallback__ReturnZero__uuid, .dev = 0}, .priv = hyper}, .lun = 1},
        {{.uuid_devId = {.uuid = org_RiscV__Fallback__ReturnZero__uuid, .dev = 0}, .priv = mach},  .lun = 1},
        {{.uuid_devId = {.uuid = org_RiscV__Fallback__ReturnMinusOne__uuid, .dev = 0}, .priv = user},  .lun = 2},
        {{.uuid_devId = {.uuid = org_RiscV__Fallback__ReturnMinusOne__uuid, .dev = 0}, .priv = super}, .lun = 2},
        {{.uuid_devId = {.uuid = org_RiscV__Fallback__ReturnMinusOne__uuid, .dev = 0}, .priv = hyper}, .lun = 2},
        {{.uuid_devId = {.uuid = org_RiscV__Fallback__ReturnMinusOne__uuid, .dev = 0}, .priv = mach},  .lun = 2},
        {{.uuid_devId = {.uuid = com_bigbucks__Frobate__uuid, .dev = 0}, .priv = user},  .lun = 32}, //32 sic!
        {{.uuid_devId = {.uuid = com_bigbucks__Frobate__uuid, .dev = 1}, .priv = super}, .lun = 32},
        {{.uuid_devId = {.uuid = com_bigbucks__Frobate__uuid, .dev = 1}, .priv = hyper}, .lun = 32},
        {{.uuid_devId = {.uuid = com_bigbucks__Frobate__uuid, .dev = 1}, .priv = mach},  .lun = 32},
        {{.uuid_devId = {.uuid = com_bigbucks__Frobate__uuid, .dev = 0}, .priv = super}, .lun = 34}, //34 sic!
        {{.uuid_devId = {.uuid = com_bigbucks__Frobate__uuid, .dev = 0}, .priv = hyper}, .lun = 34},
        {{.uuid_devId = {.uuid = com_bigbucks__Frobate__uuid, .dev = 0}, .priv = mach},  .lun = 34},
        {{.uuid_devId = {.uuid = org_tinker_tinker__RocknRoll__uuid, .dev = 0}, .priv = user},  .lun = 33}, //33 sic!
        {{.uuid_devId = {.uuid = org_tinker_tinker__RocknRoll__uuid, .dev = 0}, .priv = super}, .lun = 33},
        {{.uuid_devId = {.uuid = org_tinker_tinker__RocknRoll__uuid, .dev = 0}, .priv = hyper}, .lun = 33},
        {{.uuid_devId = {.uuid = org_tinker_tinker__Jazz__uuid, .dev = 0}, .priv = super}, .lun = 35},
        {{.uuid_devId = {.uuid = org_tinker_tinker__Jazz__uuid, .dev = 0}, .priv = hyper}, .lun = 35},
    };

    /* struct lun_priv2device_subdevice[] */
    struct lun_priv2device_subdevice cpu__device_subdevice_map[] = {
        {{.lun = 0, .priv = user},  .devAddr_interfId = {cpu__fallback, 0 /* Trap */}},
        {{.lun = 0, .priv = super}, .devAddr_interfId = {cpu__fallback, 0 /* Trap */}},
        {{.lun = 0, .priv = hyper}, .devAddr_interfId = {cpu__fallback, 0 /* Trap */}},
        {{.lun = 0, .priv = mach},  .devAddr_interfId = {cpu__fallback, 3 /* Trap, machine level */}},
        {{.lun = 1, .priv = user},  .devAddr_interfId = {cpu__fallback, 1 /* ReturnZero */}},
        {{.lun = 1, .priv = super}, .devAddr_interfId = {cpu__fallback, 1 /* ReturnZero */}},
        {{.lun = 1, .priv = hyper}, .devAddr_interfId = {cpu__fallback, 1 /* ReturnZero */}},
        {{.lun = 1, .priv = mach},  .devAddr_interfId = {cpu__fallback, 1 /* ReturnZero */}},
        {{.lun = 2, .priv = user},  .devAddr_interfId = {cpu__fallback, 2 /* ReturnMinusOne */}},
        {{.lun = 2, .priv = super}, .devAddr_interfId = {cpu__fallback, 2 /* ReturnMinusOne */}},
        {{.lun = 2, .priv = hyper}, .devAddr_interfId = {cpu__fallback, 2 /* ReturnMinusOne */}},
        {{.lun = 2, .priv = mach},  .devAddr_interfId = {cpu__fallback, 2 /* ReturnMinusOne */}},
        // .lun = 3 .. 7 reserved for other fallback RV interfaces
        // .lun = 8 .. 30 reserved as error numbers, c.li t1 31; bltu rd t1 L_fail tests errors
        // .lun = 31 reserved out of caution
        {{.lun = 32, .priv = user},  .devAddr_interfId = {com_bigbucks__device1, 0 /* Frobate interface */}},
        {{.lun = 32, .priv = super}, .devAddr_interfId = {com_bigbucks__device1, 0 /* Frobate interface */}},
        {{.lun = 32, .priv = hyper}, .devAddr_interfId = {com_bigbucks__device1, 0 /* Frobate interface */}},
        {{.lun = 32, .priv = mach},  .devAddr_interfId = {com_bigbucks__device1, 64 /* Frobate machine level interface */}},
        {{.lun = 33, .priv = user},  .devAddr_interfId = {com_bigbucks__device1, 1 /* RocknRoll interface */}},
        {{.lun = 33, .priv = super}, .devAddr_interfId = {com_bigbucks__device1, 1 /* RocknRoll interface */}},
        {{.lun = 33, .priv = hyper}, .devAddr_interfId = {com_bigbucks__device1, 1 /* RocknRoll interface */}},
        {{.lun = 34, .priv = super}, .devAddr_interfId = {org_tinker_tinker__device2, 0 /* Frobate interface */}},
        {{.lun = 34, .priv = hyper}, .devAddr_interfId = {org_tinker_tinker__device2, 0 /* Frobate interface */}},
        {{.lun = 34, .priv = mach},  .devAddr_interfId = {org_tinker_tinker__device2, 0 /* Frobate interface */}},
        {{.lun = 35, .priv = super}, .devAddr_interfId = {org_tinker_tinker__device2, 1 /* Jazz interface */}},
        {{.lun = 35, .priv = hyper}, .devAddr_interfId = {org_tinker_tinker__device2, 1 /* Jazz interface */}},
    };
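The lun ranges noted in the comments of the table above (0..2 fallback, 3..7 reserved for further fallback interfaces, 8..30 error numbers, 31 reserved, 32 and up for devices) can be captured in a small classifier. This is a sketch with invented names; `xext_failed` mirrors the `bltu` test against 31 mentioned in the comment and assumes that it is applied to the bare lun value.

```c
#include <assert.h>
#include <stdbool.h>

enum lun_class { LUN_FALLBACK, LUN_RESERVED, LUN_ERROR, LUN_DEVICE };

/* Classify a 12 bit lun according to the ranges used in the example tables */
static enum lun_class classify_lun(unsigned lun)
{
    assert(lun < (1u << 12));
    if (lun <= 2)  return LUN_FALLBACK;   /* Trap, ReturnZero, ReturnMinusOne */
    if (lun <= 7)  return LUN_RESERVED;   /* other fallback RV interfaces */
    if (lun <= 30) return LUN_ERROR;      /* error numbers */
    if (lun == 31) return LUN_RESERVED;   /* reserved out of caution */
    return LUN_DEVICE;
}

/* Same test as "c.li t1 31; bltu rd t1 L_fail": unsigned lun < 31 means failure */
static bool xext_failed(unsigned lun) { return lun < 31u; }
```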