# Pinmux, IO Pads, and JTAG Boundary scan

Links:

* <http://www2.eng.cam.ac.uk/~dmh/4b7/resource/section14.htm>
* <https://www10.edacafe.com/book/ASIC/CH02/CH02.7.php>
* <https://ftp.libre-soc.org/Pin_Control_Subsystem_Overview.pdf>
* <https://bugs.libre-soc.org/show_bug.cgi?id=50>
* <https://bugs.libre-soc.org/show_bug.cgi?id=750>
* <https://bugs.libre-soc.org/show_bug.cgi?id=762>
* <https://git.libre-soc.org/?p=c4m-jtag.git;a=tree;hb=HEAD>
* Extra info: [[/docs/pinmux/temp_pinmux_info]]
* <https://git.libre-soc.org/?p=pinmux.git;a=blob;f=src/stage2.py> - Latest
  manual demo of pinmux generation

Managing IO on an ASIC is nowhere near as simple as on an FPGA.
An FPGA has built-in IO Pads: the wires terminate inside an
existing silicon block which has been tested for you.
In an ASIC, you are going to have to do everything yourself.
A bi-directional IO Pad requires three wires (in, out,
out-enable) to be routed all the way from the core logic to
the IO Pad, where only then does a wire bond connect
it to a single external pin.

Below, therefore, is a (simplified) diagram of what is
usually contained in an FPGA's bi-directional IO Pad,
and consequently this is what you must also provide, and explicitly
wire up, in your ASIC's HDL.

[[!img asic_iopad_gen.svg]]

When designing an ASIC, there is no guarantee that the IO pad is
working when manufactured. Worse, the peripheral could be
faulty. How can you tell what the cause is? There are two
possible faults, but only one symptom ("it dunt wurk").
This problem is what JTAG Boundary Scan is designed to solve.
JTAG can be operated from an external digital clock
at very low frequencies (5 kHz is perfectly acceptable),
so there is very little risk of clock skew during that testing.

Additionally, an SoC is designed to be low cost, and to use low cost
packaging. ASICs typically come in QFP packages of only 32 to 128 pins
in the Embedded Controller range, and in FBGA packages of between
300 and 650 pins in the Tablet / Smartphone range, with an absolute
maximum of 19 mm on a side.
The 2 to 3 inch square 1,000 pin packages common to Intel desktop
processors are absolutely out of the question.

(*With each pin wire bond smashing
into the ASIC using purely heat of impact to melt the wire,
cracks in the die can occur. The more times
the bonding equipment smashes into the die, the higher the
chances of irreversible damage, hence why larger pin packaged
ASICs are much more expensive: not because of their manufacturing
cost but because far more of them fail due to having been
literally hit with a hammer many more times*)

Yet the expectation from the market is to be able to fit 1,000+
pins worth of peripherals into only 200 to 400 worth of actual
IO Pads. The solution here is a GPIO Pinmux, described in some
detail here: <https://ftp.libre-soc.org/Pin_Control_Subsystem_Overview.pdf>

This page goes over the details and issues involved in creating
an ASIC that combines **both** JTAG Boundary Scan **and** GPIO
Muxing, down to layout considerations using coriolis2.

# Resources, Platforms and Pins

nmigen HDL written as Modules typically knows nothing about FPGA
Boards or ASICs. Modules especially do not know anything about the
Peripheral ICs (UART, I2C, USB, SPI, PCIe) connected to a given FPGA
on a given PCB, and they should not have to.

Through the Resources, Platforms and Pins API, a level of abstraction
between peripherals, boards and HDL designs is provided. Peripherals
may be given `(name, number)` tuples, the HDL design may "request"
a peripheral, which is described in terms of Resources, managed
by a ResourceManager, and a Platform may provide that peripheral.
The Platform is given
the responsibility to wire up the Pins to the correct FPGA (or ASIC)
IO Pads, and it is the HDL design's responsibility to connect up
those same named Pins, on the other side, to the implementation
of the PHY/Controller, in the HDL.

Here is a function that defines a UART Resource:

    #!/usr/bin/env python3
    from nmigen.build.dsl import Resource, Subsignal, Pins

    def UARTResource(*args, rx, tx):
        io = []
        io.append(Subsignal("rx", Pins(rx, dir="i", assert_width=1)))
        io.append(Subsignal("tx", Pins(tx, dir="o", assert_width=1)))
        return Resource.family(*args, default_name="uart", ios=io)

Note that the Subsignal is given a convenient name (tx, rx) and that
there are Pins associated with it.
UARTResource would typically be part of a larger function that defines,
for either an FPGA or an ASIC, a full array of IO Connections:

    def create_resources(pinset=None):
        # pinset is unused in this simplified example
        resources = []
        resources.append(UARTResource('uart', 0, tx='A20', rx='A21'))
        # add clock and reset
        clk = Resource("clk", 0, Pins("sys_clk", dir="i"))
        rst = Resource("rst", 0, Pins("sys_rst", dir="i"))
        resources.append(clk)
        resources.append(rst)
        return resources

For an FPGA, the Pins names are typically the Ball Grid Array
Pad or Pin name: A12, or N20. ASICs can do likewise: it is
convenient, when referring to schematics, to use the most
recognisable well-known name.

Next, these Resources need to be handed to a ResourceManager or
a Platform (Platform derives from ResourceManager):

    from nmigen.build.plat import TemplatedPlatform

    class ASICPlatform(TemplatedPlatform):
        def __init__(self, resources):
            super().__init__()
            self.add_resources(resources)

An HDL Module may now be created, which, if given
a platform instance during elaboration, may request
a UART (caveat below):

    from nmigen import Elaboratable, Module, Signal

    class Blinker(Elaboratable):
        def elaborate(self, platform):
            m = Module()
            # get the UART resource, mess with the output tx
            uart = platform.request('uart')
            intermediary = Signal()
            m.d.comb += uart.tx.eq(~intermediary) # invert, for fun
            m.d.comb += intermediary.eq(uart.rx) # pass rx to tx

            return m

The caveat here is that the Resources of the platform actually
have to have a UART in order for it to be requestable! Thus:

    resources = create_resources() # contains resource named "uart"
    asic = ASICPlatform(resources)
    hdl = Blinker()
    asic.build(hdl)

Finally the association between HDL, Resources, and ASIC Platform
is made:

* The Resources contain the abstract expression of the
  type of peripheral, its port names, and the corresponding
  names of the IO Pads associated with each port.
* The HDL, which knows nothing about IO Pad names, requests
  a Resource by name.
* The ASIC Platform, given the list of Resources, takes care
  of connecting requests for Resources to actual IO Pads.
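
The division of labour in the list above can be illustrated without nmigen at all. The following is a minimal pure-Python sketch, not nmigen's actual ResourceManager: the class `ToyResourceManager` and its methods are hypothetical names chosen for illustration, and it models only the lookup by name and the "must exist to be requestable" caveat.

```python
class ToyResourceManager:
    """Hypothetical stand-in for a Platform: maps (name, number) to pad names."""

    def __init__(self):
        self.resources = {}     # (name, number) -> {port name: pad name}
        self.requested = set()  # resources already handed out

    def add_resources(self, resources):
        for name, number, ports in resources:
            self.resources[(name, number)] = dict(ports)

    def request(self, name, number=0):
        key = (name, number)
        if key not in self.resources:
            raise KeyError("resource %s#%d does not exist" % key)
        if key in self.requested:
            raise RuntimeError("resource %s#%d already requested" % key)
        self.requested.add(key)
        return self.resources[key]


plat = ToyResourceManager()
plat.add_resources([("uart", 0, {"tx": "A20", "rx": "A21"})])
uart_pads = plat.request("uart")  # the HDL side only ever uses the name
print(uart_pads)
```

Requesting a name that was never added (for example `plat.request("i2c")`) raises an error in this sketch, which mirrors the caveat described earlier.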

This is the simple version. When JTAG Boundary Scan needs
to be added, it gets a lot more complex.

# JTAG Boundary Scan

JTAG Scanning is a (paywalled) IEEE Standard, 1149.1, which with
a little searching can be found online. Its purpose is to allow
a well-defined method of testing ASIC IO pads that a Foundry or
ASIC test house may apply easily with off-the-shelf equipment.
Scan chaining can also connect multiple ASICs together so that
the same test can be run on a large batch of ASICs at the same
time.

IO Pads generally come in four primary types:

* Input
* Output
* Output with Tristate (enable)
* Bi-directional Tristate Input/Output with direction enable

Interestingly these can all be synthesised from one
Bi-directional Tristate IO Pad. Other types such as Differential
Pair Transmit may also be constructed from an inverter and a pair
of IO Pads. Other more advanced features include pull-up
and pull-down resistors, Schmitt triggering for interrupts,
different drive strengths, and so on, but the basics are
that the Pad is either an input, or an output, or both.

The JTAG Boundary Scan therefore needs to know what type
each pad is (In/Out/Bi) and has to "insert" itself in between
*all* the Pad's wires, which may be just an input, or just an output,
and, if bi-directional, an "output enable" line.

The "insertion" (or, "Tap") into those wires requires a
pair of Muxes for each wire. Under normal operation
the Muxes bypass JTAG entirely: the IO Pad is connected,
through the two Muxes,
directly to the Core (a hardware term for what Software
terminology calls a "peripheral").

When JTAG Scan is enabled, then for every pin that is
"tapped into", the Muxes flip such that:

* The IO Pad is connected directly to latches controlled
  by the JTAG Shift Register
* The Core (peripheral) likewise but to *different bits*
  from those that the Pad is connected to

In this way, not only can JTAG control or read the IO Pad,
but it can also read or control the Core (peripheral).
This is its entire purpose: interception to allow for the detection
and triaging of faults.

* Software may be uploaded and run which sets a bit on
  one of the peripheral outputs (UART Tx for example).
  If the UART Tx IO Pad was faulty, no possibility existed
  without Boundary Scan to determine if the peripheral
  was at fault. With the UART Tx pin function being
  redirected to a JTAG Shift Register, the results of the
  software setting UART Tx may be detected by checking
  the appropriate Shift Register bit.
* Likewise, a voltage may be applied to the UART Rx Pad,
  and the corresponding SR bit checked to see if the
  pad is working. If the UART Rx peripheral was faulty
  this would not be possible.
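
The mux behaviour described above can be modelled in a few lines of plain Python. This is a deliberately simplified sketch, not the IEEE 1149.1 cell design: the function and parameter names are hypothetical, and the capture/update staging of a real boundary scan cell is left out. It shows only that in normal mode pad and core are wired straight through, while in scan mode each side talks to its own Shift Register bit.

```python
def scan_cell_input(pad_i, sr_bit_to_core, scan_enabled):
    """Input pin: returns (value seen by the core, value captured from the pad)."""
    core_i = sr_bit_to_core if scan_enabled else pad_i
    captured_pad = pad_i        # the shift register samples the pad side
    return core_i, captured_pad


def scan_cell_output(core_o, sr_bit_to_pad, scan_enabled):
    """Output pin: returns (value driven onto the pad, value captured from the core)."""
    pad_o = sr_bit_to_pad if scan_enabled else core_o
    captured_core = core_o      # the shift register samples the core side
    return pad_o, captured_core


# Normal operation: UART Tx from the core reaches the pad unchanged.
print(scan_cell_output(core_o=1, sr_bit_to_pad=0, scan_enabled=False))
# Scan mode: the pad is driven from the shift register, and the core's
# Tx value can still be read back to check the peripheral itself.
print(scan_cell_output(core_o=1, sr_bit_to_pad=0, scan_enabled=True))
```

Because the pad and the core are observed through different bits, a fault can be attributed to one or the other, which is the triage described in the two bullet points.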

[[!img jtag-block.svg ]]

## C4M JTAG TAP

Staf Verhaegen's Chips4Makers JTAG TAP module includes everything
needed to create JTAG Boundary Scan Shift Registers,
as well as the IEEE 1149.1 Finite State Machine to access
them through TMS, TDO, TDI and TCK Signalling. However,
connecting up cores (a hardware term: the equivalent software
term is "peripherals") on one side and the pads on the other is
especially confusing, but deceptively simple. The actual addition
to the Scan Shift Register is this straightforward:

    from c4m.nmigen.jtag.tap import IOType, TAP

    class JTAG(TAP):
        def __init__(self):
            TAP.__init__(self, ir_width=4)
            self.u_tx = self.add_io(iotype=IOType.Out, name="tx")
            self.u_rx = self.add_io(iotype=IOType.In, name="rx")

This results in the creation of:

* Two Records, one of type In named rx, the other an output
  named tx
* Each Record contains a pair of sub-Records: one core-side
  and the other pad-side
* Entries in the Boundary Scan Shift Register which if set
  may control (or read) either the peripheral / core or
  the IO Pad
* A suite of Muxes (as shown in the diagrams above) which
  allow either direct connection between pad and core
  (bypassing JTAG) or interception

During Interception Mode (Scanning) pad and core are connected
to the Shift Register. During "Production" Mode, pad and
core are wired directly to each other (on a per-pin basis,
for every pin. Clearly this is a lot of work).

It is then your responsibility to:

* connect up each and every peripheral input and output
  to the right core-side Record in your HDL
* connect up each and every pad-side Record input and output
  to the right IO Pad in the Platform.
* **This does not happen automatically and is not the
  responsibility of the TAP Interface, it is yours**

The TAP interface connects the **other** side of the pads
and cores Records: **to the Muxes**. You **have** to
connect **your** side of both core and pads Records in
order for the Scan to be fully functional.

Both of these tasks are painstaking and tedious in the
extreme if done manually, and prone to either sheer boredom,
transliteration errors, dyslexia triggering or just utter
confusion. Despite this, let us proceed, and, augmenting
the Blinky example, wire up a JTAG instance:

    class Blinker(Elaboratable):
        def elaborate(self, platform):
            m = Module()
            m.submodules.jtag = jtag = JTAG()

            # get the records from JTAG instance
            utx, urx = jtag.u_tx, jtag.u_rx
            # get the UART resource from the Platform
            uart = platform.request('uart')

            # uart core-side from JTAG
            intermediary = Signal()
            m.d.comb += utx.core.o.eq(~intermediary) # invert, for fun
            m.d.comb += intermediary.eq(urx.core.i) # pass rx to tx

            # wire up the IO Pads (in right direction) to Platform
            m.d.comb += urx.pad.i.eq(uart.rx) # Platform rx pin into JTAG pad side
            m.d.comb += uart.tx.eq(utx.pad.o) # JTAG pad side out to Platform tx pin
            return m

Compared to the non-scan-capable version, which connected UART
Core Tx and Rx directly to the Platform Resource (and the Platform
took care of wiring to IO Pads):

* Core HDL is instead wired to the core-side of JTAG Scan
* JTAG Pad side is instead wired to the Platform
* (the Platform still takes care of wiring to actual IO Pads)

JTAG TAP capability on UART TX and RX has now been inserted into
the chain. Using openocd or another program it is possible to
send TDI, TMS, TDO and TCK signals according to IEEE 1149.1 in order
to intercept both the core and IO Pads, both input and output,
and confirm the correct functionality of one even if the other is
broken, during ASIC testing.

## Libre-SOC Automatic Boundary Scan

Libre-SOC's JTAG TAP Boundary Scan system is a little more sophisticated:
it hooks into (replaces) ResourceManager.request(), intercepting the request
and recording what was requested. The above manual linkup to JTAG TAP
is then taken care of **automatically and transparently**, while to
all intents and purposes looking exactly like a Platform, even to
the extent of taking the exact same list of Resources.

    class Blinker(Elaboratable):
        def __init__(self, resources):
            self.jtag = JTAG(resources)

        def elaborate(self, platform):
            m = Module()
            m.submodules.jtag = jtag = self.jtag

            # get the UART resource, mess with the output tx
            uart = jtag.request('uart')
            intermediary = Signal()
            m.d.comb += uart.tx.eq(~intermediary) # invert, for fun
            m.d.comb += intermediary.eq(uart.rx) # pass rx to tx

            return jtag.boundary_elaborate(m, platform)

Connecting up and building the ASIC is as simple as a non-JTAG,
non-scanning-aware Platform:

    resources = create_resources()
    asic = ASICPlatform(resources)
    hdl = Blinker(resources)
    asic.build(hdl)

The differences:

* The list of resources was also passed to the HDL Module
  such that JTAG may create a complete identical list
  of both core and pad matching Pins
* Resources were requested from the JTAG instance,
  not the Platform
* A "magic function" (JTAG.boundary_elaborate) is called
  which wires up all of the seamlessly intercepted
  Platform resources to the JTAG core/pads Resources,
  where the HDL connected to the core side, exactly
  as if this was a non-JTAG-Scan-aware Platform.
* ASICPlatform still takes care of connecting to actual
  IO Pads, except that the Platform resource requests were
  triggered "behind the scenes". For that to work it
  is absolutely essential that the JTAG instance and the
  ASICPlatform be given the exact same list of Resources.

## Clock synchronisation

Take for example USB ULPI:

<img src="https://www.crifan.com/files/pic/serial_story/other_site/p_blog_bb.JPG"
width=400 />

Here there is an external incoming clock, generated by the PHY, to which
both Received *and Transmitted* data and control is synchronised. Notice
very specifically that it is *not the main processor* generating that clock
Signal, but the external peripheral (known as a PHY in Hardware terminology).

Firstly: note that the Clock will, obviously, also need to be routed
through JTAG Boundary Scan, because, after all, it is being received
through just another ordinary IO Pad. Secondly: note that
if it were not, then clock skew would occur for that peripheral because
although the Data Wires went through JTAG Boundary Scan MUXes, the
clock did not. Clearly this would be a problem.

However, clocks are very special signals: they have to be distributed
evenly to each and every Latch (DFF) inside the peripheral so that
data corruption does not occur because of tiny delays.
To avoid that scenario, Clock Domain Crossing (CDC) is used, with
Asynchronous FIFOs:

    rx_fifo = stream.AsyncFIFO([("data", 8)], self.rx_depth, w_domain="ulpi", r_domain="sync")
    tx_fifo = stream.AsyncFIFO([("data", 8)], self.tx_depth, w_domain="sync", r_domain="ulpi")
    m.submodules.rx_fifo = rx_fifo
    m.submodules.tx_fifo = tx_fifo

However the entire FIFO must be covered by two Clock H-Trees: one
driven by the ULPI external clock, and the other by the main system clock.
The size of the ULPI clock H-Tree, and consequently the size of
the PHY on-chip, will result in more Clock Tree Buffers being
inserted into the chain, and, correspondingly, matching buffers
on the ULPI data input side likewise must be inserted so that
the input data timing precisely matches that of its clock.

The problem is not receiving of data, though: it is transmission
on the output ULPI side. With the ULPI Clock Tree having buffers
inserted, each buffer creates delay. The ULPI output FIFO has to
correspondingly be synchronised not to the original incoming clock
but to that clock *after going through H-Tree Buffers*. Therefore,
there will be a lag on the output data compared to the incoming
(external) clock.
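
To get a feel for the size of that lag, here is a back-of-envelope sketch. The only figure taken from the ULPI interface is the 60 MHz clock; the per-buffer delay and the number of buffer stages are invented placeholder values, since the real ones depend entirely on the cell library and the layout.

```python
ULPI_CLOCK_HZ = 60e6     # ULPI interface clock frequency
BUFFER_DELAY_NS = 0.15   # ASSUMED delay of one clock-tree buffer stage
N_BUFFER_STAGES = 6      # ASSUMED depth of the ULPI clock H-Tree

period_ns = 1e9 / ULPI_CLOCK_HZ
lag_ns = BUFFER_DELAY_NS * N_BUFFER_STAGES

print("clock period : %.2f ns" % period_ns)
print("output lag   : %.2f ns (%.1f%% of a cycle)"
      % (lag_ns, 100.0 * lag_ns / period_ns))
```

The point of the arithmetic is only that the lag grows linearly with the depth of the clock tree while the clock period stays fixed, so a larger on-chip PHY leaves less margin on the transmit side.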

# Pinmux GPIO Block

The following diagram is an example of a mux'd GPIO block that comes from the
Ericson presentation on a GPIO architecture.

[[!img gpio-block.svg size="800x"]]

## Our Pinmux Block

The block we are developing is very similar, but lacks some of the
configuration options of the former (due to complexity and time constraints).

The implemented pinmux uses two sub-blocks:

1. A Wishbone controlled N-GPIO block.

1. An N-port I/O multiplexer (for the current use case set to 4
   ports).

### Terminology

For clearer explanation, the following definitions will be used in the text.
As the documentation is actively being written, the experimental code may not
adhere to these all the time.

* Bank - A group of contiguous pins under a common name.
* Pin - Bi-directional wire connecting to the chip's pads.
* Function (pin) - A signal used by the peripheral.
* Port - Bi-directional signal on the peripheral and pad sides of the
  multiplexer.
* Muxwidth - Number of input ports to the multiplexers.
* PinMux - Multiplexer for connecting multiple peripheral functions to one IO
  pad.

For example:

A 128-pin chip has 4 banks N/S/E/W corresponding to the 4 sides of the chip.
Each bank has 32 pins. Each pin can have up to 4 multiplexed functions, thus
the multiplexer width for each pin is 4.
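
The arithmetic behind that example can be written down directly. This is only an illustration of the terminology: the dictionary and variable names are made up for the sketch and do not come from the pinmux code.

```python
from math import ceil, log2

banks = {"N": 32, "S": 32, "E": 32, "W": 32}  # bank name -> pins in that bank
muxwidth = 4                                   # functions selectable per pin

total_pins = sum(banks.values())
function_slots = total_pins * muxwidth         # upper bound on muxed functions
select_bits = ceil(log2(muxwidth))             # bits needed to pick one function

print(total_pins, function_slots, select_bits)
```

With these numbers, two select bits per pin are enough to choose between the four functions.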

### PinSpec Class

* <https://git.libre-soc.org/?p=pinmux.git;a=blob;f=src/spec/base.py;h=c8fa2b09d650b1b7cfdb499bfe711a3ebaf5848b;hb=HEAD> PinSpec class
  defined here.

PinSpec is a powerful construct designed to hold all the necessary information
about the chip's banks. This includes:

1. Number of banks
1. Number of pins per bank
1. Peripherals present (UART, GPIO, I2C, etc.)
1. Mux configuration for every pin (mux0: gpio, mux1: uart tx, etc.)
1. Peripheral function signal type (out/in/bi)

### Pinouts

* <https://git.libre-soc.org/?p=pinmux.git;a=blob;f=src/spec/interfaces.py;h=f5ecf4817ba439b607a1909a4fcb6aa2589e2afd;hb=HEAD> Pinouts class
  defined here.

The PinSpec class inherits from the Pinouts class, which makes it possible to
view the dictionaries containing bank names and pin information.

* keys() - returns an iterable dict_keys object of all the pins (summing all
  the banks)
* items() - returns a dict_items object of pins, with each pin's mux information

For example, PinSpec object 'ps' has one bank 'A' with 4 pins:

    ps.keys() returns dict_keys([0, 1, 2, 3])
    ps.items() returns
    dict_items([(0, {0: ('GPIOA_A0', 'A'), 1: ('UART0_TX', 'A'),
                     2: ('TWI0_SDA', 'A')}),
                (1, {0: ('GPIOA_A1', 'A'), 1: ('UART0_RX', 'A'),
                     2: ('TWI0_SCL', 'A')}),
                (2, {0: ('GPIOA_A2', 'A')}), (3, {0: ('GPIOA_A3', 'A')})])
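
Since the result behaves like an ordinary dictionary of dictionaries, it can be searched with plain Python. The sketch below copies the example data into a normal `dict` rather than using a real PinSpec object, and `find_function` is a hypothetical helper written for this illustration.

```python
# Same data as the ps.items() example, held in an ordinary dict.
pins = {
    0: {0: ('GPIOA_A0', 'A'), 1: ('UART0_TX', 'A'), 2: ('TWI0_SDA', 'A')},
    1: {0: ('GPIOA_A1', 'A'), 1: ('UART0_RX', 'A'), 2: ('TWI0_SCL', 'A')},
    2: {0: ('GPIOA_A2', 'A')},
    3: {0: ('GPIOA_A3', 'A')},
}


def find_function(pins, fname):
    """Return a list of (pin, mux) pairs at which function `fname` is available."""
    return [(pin, mux)
            for pin, muxes in pins.items()
            for mux, (name, bank) in muxes.items()
            if name == fname]


print(find_function(pins, 'UART0_TX'))
print(find_function(pins, 'TWI0_SCL'))
```

Each result pair says which pin carries the function and which mux setting selects it.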

### PinGen

pinfunctions.py contains the "pinspec" list containing the Python functions
which generate the necessary signals for gpio, uart, i2c, etc. (with IOType
information). The PinGen class uses "__call__" and "pinspec" to effectively
create a Lambda function for generating the specified peripheral signals.

## The GPIO block

*NOTE !* - Need to change 'bank' terminology for the GPIO block in doc and code!

[[!img n-gpio.svg size="600x"]]

The GPIO module is a multi-GPIO block integral to the pinmux system.
To make the block flexible, it has a variable number of I/Os based on an
input parameter.

### Configuration Word

After a discussion with Luke on IRC (14th January 2022), the new layout of the
8-bit data word for configuring the GPIO (through WB) is:

* oe - Output Enable (see the Ericson presentation for the GPIO diagram)
* ie - Input Enable *(Not used, as IOPad only supports i/o/oe)*
* puen - Pull-Up resistor enable
* pden - Pull-Down resistor enable
* i/o - When configured as output (oe set), this bit sets/clears output. When
  configured as input, shows the current state of input (read-only)
* bank[2:0] - Bank Select *(only 4 banks used, bank[2] used for JTAG chain)*
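
The fields above add up to exactly eight bits (five single-bit flags plus a 3-bit bank select). The following sketch packs and unpacks such a byte. The bit positions used here (oe in bit 0, through to bank in bits 5 to 7) are an assumption that simply follows the order of the list; the authoritative layout is the one in the configuration byte diagram, and `pack`/`unpack` are hypothetical helper names.

```python
# (field name, width in bits), lowest bit first. ASSUMED ordering.
FIELDS = [("oe", 1), ("ie", 1), ("puen", 1), ("pden", 1), ("io", 1), ("bank", 3)]


def pack(**values):
    """Build an 8-bit configuration word from named field values."""
    word, shift = 0, 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        assert 0 <= v < (1 << width), "%s out of range" % name
        word |= v << shift
        shift += width
    return word


def unpack(word):
    """Split an 8-bit configuration word back into its fields."""
    out, shift = {}, 0
    for name, width in FIELDS:
        out[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return out


cfg = pack(oe=1, io=1, bank=2)
print(hex(cfg), unpack(cfg))
```

If the real layout differs, only the `FIELDS` table needs changing.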

### Simultaneous/Packed Configuration

To make the configuration more efficient, multiple GPIOs can be configured with
one data word. The number of GPIOs in one "row" is dependent on the WB data bus
*width* and *granularity* (see Wishbone B4 spec, section 3.5 Data Organization
for more details).

If for example, the data bus is 64-bits wide and granularity is 8, eight GPIO
configuration bytes - and thus eight GPIOs - can be configured in one go.
To configure only certain GPIOs, the WB sel signal can be used (see next
section).

*(NOTE: Currently the code doesn't support granularity higher than 8)*

The diagram below shows the layout of the configuration byte.

[[!img gpio-config-word.jpg size="600x"]]

If the block is created with more GPIOs than can fit in a single data word,
the next set of GPIOs can be accessed by incrementing the address.
For example, if 16 GPIOs are instantiated and a 64-bit data bus is used, GPIOs
0-7 are accessed via address 0, whereas GPIOs 8-15 are accessed by address 1.
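
The mapping from a GPIO number to its row address and byte lane is a simple division. A small sketch, with `gpio_location` as a hypothetical helper name and the granularity fixed at 8 as in the current code:

```python
def gpio_location(gpio_num, bus_width=64, granularity=8):
    """Return (row address, byte lane, sel mask) for one GPIO."""
    lanes = bus_width // granularity   # GPIOs per row
    adr = gpio_num // lanes
    lane = gpio_num % lanes
    sel = 1 << lane                    # one sel bit per GPIO in the row
    return adr, lane, sel


for n in (0, 7, 8, 15):
    print(n, gpio_location(n), gpio_location(n, bus_width=32))
```

With a 64-bit bus GPIO 8 lands in row 1, lane 0; with a 32-bit bus the same GPIO is in row 2, lane 0.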

### Example Memory Map

[[!img gpio-mem-layout.jpg size="600x"]]

The diagrams above show the difference in memory layout between a 16-GPIO block
implemented with 64-bit and 32-bit WB data buses.
The 64-bit case shows there are two rows with eight GPIOs in each, and it will
take two writes (assuming simple WB write) to completely configure all 16 GPIOs.
The 32-bit case on the other hand has four address rows, and so will take four
write transactions.

64-bit:

* 0x00 - Configure GPIOs 0-7 - requires 8-bit `sel` one bit per GPIO
* 0x01 - Configure GPIOs 8-15 - requires 8-bit `sel` one bit per GPIO

32-bit:

* 0x00 - Configure GPIOs 0-3 - requires 4-bit `sel` one bit per GPIO
* 0x01 - Configure GPIOs 4-7 - requires 4-bit `sel` one bit per GPIO
* 0x02 - Configure GPIOs 8-11 - requires 4-bit `sel` one bit per GPIO
* 0x03 - Configure GPIOs 12-15 - requires 4-bit `sel` one bit per GPIO

Here is the pseudocode for reading the GPIO
data structs:

    read_bytes = []
    for i in range(len(sel)):
        GPIO_num = adr*len(sel)+i
        if sel[i]:
            read_bytes.append(GPIO[GPIO_num])
        else:
            read_bytes.append(Const(0, 8))
    if not wen:
        dat_r.eq(Cat(read_bytes))

and for writing, in a slightly different style:

    if wen:
        for i in range(len(sel)):
            GPIO_num = adr*len(sel)+i
            write_byte = dat_w.bit_select(i*8, 8)
            if sel[i]:
                GPIO[GPIO_num].eq(write_byte)

As explained in this video <https://m.youtube.com/watch?v=Pf6gmDQnw_4>,
if each GPIO is mapped to one single byte, and it is assumed that
the `sel` lines are enough to **always** give byte-level read/write,
then the GPIO number *becomes* the Memory-mapped byte number, hence
the use of `len(sel)` above. `len(dat_r)//8` would do as well
because these should be equal.
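
The same read and write behaviour can be exercised as an ordinary Python model, which is handy for checking expected bus values before writing HDL tests. This is a behavioural sketch only: `ToyGPIOBlock` is a hypothetical class, integers stand in for signals, and Wishbone handshaking (cyc, stb, ack) is ignored.

```python
class ToyGPIOBlock:
    """Behavioural model: one configuration byte per GPIO, byte-wide sel lines."""

    def __init__(self, n_gpios=16, bus_width=64):
        self.lanes = bus_width // 8   # equals len(sel)
        self.gpio = [0] * n_gpios

    def write(self, adr, dat_w, sel):
        for i in range(self.lanes):
            if (sel >> i) & 1:
                self.gpio[adr * self.lanes + i] = (dat_w >> (i * 8)) & 0xFF

    def read(self, adr, sel):
        dat_r = 0
        for i in range(self.lanes):
            if (sel >> i) & 1:        # unselected lanes read back as zero
                dat_r |= self.gpio[adr * self.lanes + i] << (i * 8)
        return dat_r


blk = ToyGPIOBlock()
blk.write(adr=1, dat_w=0xAA << 16, sel=0b00000100)  # touches GPIO 10 only
print(blk.gpio[10], hex(blk.read(adr=1, sel=0xFF)))
```

Only the byte lane whose `sel` bit is set is modified, so neighbouring GPIOs in the same row keep their configuration.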


## The IO Mux block

[[!img iomux-4bank.svg size="600x"]]

This block is an N-to-1 (4-port shown above) mux and it simultaneously connects:

* o/oe signals from one of N peripheral ports, to the pad output port

* i pad port signal to one of N peripheral ports (the rest being set to 0).

The block is then used in a higher-level pinmux block, and instantiated for each
pin.
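
The two simultaneous connections can be captured in one small function. This is a plain-Python behavioural sketch with hypothetical names, using integers for the single-bit signals; it is not the HDL of the IO Mux block.

```python
def iomux(select, periph_o, periph_oe, pad_i):
    """N-to-1 IO mux for one pin.

    periph_o / periph_oe: lists of N output and output-enable bits, one per port.
    pad_i: the bit currently seen at the pad input.
    Returns (pad_o, pad_oe, periph_i) where periph_i is a list of N input bits.
    """
    n = len(periph_o)
    pad_o = periph_o[select]
    pad_oe = periph_oe[select]
    # Only the selected port sees the pad input; all other ports see 0.
    periph_i = [pad_i if port == select else 0 for port in range(n)]
    return pad_o, pad_oe, periph_i


# Port 2 selected: its o/oe reach the pad, and only port 2 receives pad_i.
print(iomux(2, periph_o=[0, 1, 1, 0], periph_oe=[1, 0, 1, 0], pad_i=1))
```

Instantiating one such mux per pin, each with its own select value, gives the higher-level pinmux described above.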

## Combined Block

*NOTE !* - Need to change 'bank' terminology for the GPIO block in doc and code!

[[!img pinmux-1pin.svg size="600x"]]

The GPIO and IOMux blocks are combined in a single block called the
Pinmux block.

By default, bank 0 is hard-wired to the memory-mapped WB bus GPIO. The CPU
core can just write the configuration word to the GPIO row address. From this
perspective, it is no different to a conventional GPIO block.

Bank select allows control of the IO pad to be switched over to
another peripheral. The peripheral will be given sole connectivity to the
o/oe/i signals, while additional parameters such as pull up/down will either
be automatically configured (as is the case for I2C), or will be configurable
via the WB bus. *(This has not been implemented yet, so open to discussion)*

### Bank Select Options

* bank 0 - WB bus has full control (GPIO peripheral)
* bank 1,2,3 - WB bus only controls puen/pden, peripheral gets o/oe/i
  (whether ie should be routed out is not finalised yet)


### Adding JTAG BS Chain to the Pinmux block (In Progress)

The JTAG BS chain needs to have access to the bank select bits, to allow
selecting different peripherals during testing. At the same time, JTAG may
also require access to the WB bus to access GPIO configuration options
not available to bank 1/2/3 peripherals.

The proposed JTAG BS chain is as follows:

* Connect puen/pden/bank from GPIO block to the IOMux through JTAG BS chain.
* Connect the i/o/oe pad port from IOMux via JTAG BS chain.
* (?) Test port for configuring GPIO without WB? - utilising bank bit 2?
* (?) Way to drive WB via JTAG?

Such a setup would allow the JTAG chain to control the bank select when testing
connectivity of the peripherals, as well as give full control to the GPIO
configuration when bank select bit 2 is set.

For the purposes of muxing peripherals, bank select bit 2 is ignored. This
means that even if JTAG is handed over full control, the peripheral is
still connected to the GPIO block (via the BS chain).

Signals for various ports:

* WB bus or Periph0: WB data read, data write, address, sel, cyc, stb, ack
* Periph1/2/3: o,oe,i (puen/pden are only controlled by WB, test port, or
  fixed by functionality; ie not used yet)
* (?) Test port: bank[2:0], o,oe,i,ie,puen,pden. In addition, an internal
  address to access individual GPIOs will be available (this will consist of a
  few bits, as more than 16 GPIOs per block is likely to be too big).

As you can see by the above list, the pinmux block is becoming quite a complex
beast. If there are suggestions to simplify or reduce some of the signals,
that will be helpful.

The diagrams above showed 1-bit GPIO connectivity. Below you'll find the
4-bit case *(NOT IMPLEMENTED YET)*.

[[!img gpio_jtag_4bit.jpg size="600x"]]

# Core/Pad Connection + JTAG Mux

Diagram constructed from the nmigen plat.py file.

[[!img i_o_io_tristate_jtag.svg ]]
